Skip to content

Interface to fetch entries in primitive types from DataPack#900

Merged
mylibrar merged 13 commits into
asyml:masterfrom
Pushkar-Bhuse:get_raw
Jan 3, 2023
Merged

Interface to fetch entries in primitive types from DataPack#900
mylibrar merged 13 commits into
asyml:masterfrom
Pushkar-Bhuse:get_raw

Conversation

@Pushkar-Bhuse
Copy link
Copy Markdown
Collaborator

This PR is the first step towards fixing #881

Description of changes

Current, when fetching entries from a DataPack or MultiPack using the get method, Forte converts data store entries into object form. We wanted a way for users to directly interact with DataStore entries. In this PR, we provide a modification to the get method of DataPack to be able to return an entry in its primitive form directly from DataStore without needing to be converted to an object.

Additionally, since DataStore entries are not very interpretable (since they are in a list format), this PR introduces a way to retain data store entries in their primitive form and also represent them in a more interpretable way by converting it to a dictionary. This happens by the transform_data_store_entry method in data_store.py. An example of this is as follows:

# Entry of type 'ft.onto.base_ontology.Sentence'
            data_store_entry = [
                171792711812874531962213686690228233530,
                'ft.onto.base_ontology.Sentence',
                0,
                164,
                0,
                '-',
                0,
                {},
                {},
                {}
            ]

            transformed_entry = pack.transform_data_store_entry(
                data_store_entry
            )

            # transformed_entry = {
            #   'begin': 0,
            #   'end': 164,
            #   'payload_idx': 0,
            #   'speaker': '-',
            #   'part_id': 0,
            #   'sentiment': {},
            #   'classification': {},
            #   'classifications': {},
            #   'tid': 171792711812874531962213686690228233530,
            #   'type': 'ft.onto.base_ontology.Sentence'}
            # }

Possible influences of this PR.

By allowing DataPack or MultiRack to fetch entries in their primitive form, users can interact with DataStore more easily.

Test Conducted

The working of the get method with the get_raw attribute set to True was tested in data_pack_test.py and multi_pack_test.py

@Pushkar-Bhuse Pushkar-Bhuse requested a review from mylibrar August 24, 2022 20:32
@codecov
Copy link
Copy Markdown

codecov Bot commented Aug 24, 2022

Codecov Report

Merging #900 (77f4483) into master (72e8bce) will increase coverage by 0.05%.
The diff coverage is 92.59%.

@@            Coverage Diff             @@
##           master     #900      +/-   ##
==========================================
+ Coverage   80.87%   80.93%   +0.05%     
==========================================
  Files         253      253              
  Lines       19619    19677      +58     
==========================================
+ Hits        15867    15925      +58     
  Misses       3752     3752              
Impacted Files Coverage Δ
tests/forte/data/data_store_serialization_test.py 98.43% <ø> (ø)
tests/forte/data/data_store_test.py 95.58% <ø> (ø)
forte/data/multi_pack.py 83.01% <80.00%> (+0.82%) ⬆️
forte/data/data_pack.py 84.90% <86.36%> (-0.37%) ⬇️
forte/data/data_store.py 93.31% <95.23%> (+0.39%) ⬆️
forte/data/base_pack.py 76.75% <100.00%> (+0.07%) ⬆️
forte/data/ontology/top.py 78.16% <100.00%> (+0.05%) ⬆️
tests/forte/data/data_pack_test.py 98.98% <100.00%> (+0.13%) ⬆️
tests/forte/data/multi_pack_test.py 97.05% <100.00%> (+0.15%) ⬆️
... and 2 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Copy Markdown
Collaborator

@mylibrar mylibrar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a few more places to refactor in the current get method:

  • entry_type can just be a string. This will save the cost of get_full_module_name
  • range_annotation could be a tid instead of an entry object.

To maintain backward compatibility, there is a potential workaround:

  • We may add a new get_raw method with the following signature
    def get_raw(  # type: ignore
        self,
        entry_type: str,
        range_annotation_tid: Optional[int] = None,
        components: Optional[Union[str, Iterable[str]]] = None,
        include_sub_type: bool = True,
    ) -> Iterator[List]
  • And then in the get method, we can do:
    def get(  # type: ignore
          self,
          entry_type: Union[str, Type[EntryType]],
          range_annotation: Optional[Union[Annotation, AudioAnnotation]] = None,
          components: Optional[Union[str, Iterable[str]]] = None,
          include_sub_type: bool = True
      ):
        # Convert entry_type to string if it's Type[EntryType]
        ...
        # Convert range_annotation to tid
        ...
        # Convert result from get_raw to entry objects
        for entry_data in get_raw(...):
          yield self.get_entry(tid=entry_data[TID])

You can try out other solutions as well.

Comment thread forte/data/base_pack.py Outdated
Comment thread forte/data/data_pack.py Outdated
Comment thread forte/data/base_pack.py Outdated
Comment thread forte/data/data_pack.py
Comment thread forte/data/data_pack.py
Comment thread forte/data/data_pack.py Outdated
@hunterhector
Copy link
Copy Markdown
Member

quick comment on the title, not "fetch entries directly from Data Store", but fetch primitive types from data pack. Data store is still invisible to users.

@Pushkar-Bhuse Pushkar-Bhuse changed the title Interface to fetch entries directly from Data Store Interface to fetch entries in primitive types from DataPack Aug 30, 2022
Comment thread forte/data/base_pack.py
Comment thread forte/data/data_pack.py
Comment thread forte/data/data_pack.py Outdated
Comment thread forte/data/data_store.py Outdated
Comment thread forte/data/ontology/top.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants