Building indices removes user defined metadata by pavankumar-jamanjyothi-by · Pull Request #489 · JDASoftwareGroup/kartothek

pavankumar-jamanjyothi-by · 2021-08-03T07:12:28Z

Description:

When building indices for an existing dataset via the build_dataset_indices functions, user-defined metadata (the "metadata" key in the deserialized by-metadata.json file) is removed. The reason is that these functions pass load_dataset_metadata=False to the DatasetFactory, e.g. here:
https://github.com/JDASoftwareGroup/kartothek/blob/master/kartothek/io/eager.py#L817-L822
This has the effect of actively removing user-defined metadata:
https://github.com/JDASoftwareGroup/kartothek/blob/master/kartothek/core/factory.py#L99-L100
so no metadata is written when the by-metadata.json file is written in the end.

The fix is to pass load_dataset_metadata=True to the DatasetFactory.
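The mechanism described above can be illustrated with a small toy model. This is only a sketch in plain Python, not kartothek code: the in-memory store, the key name `KEY`, and the helpers `make_store`, `load_dataset` and `build_indices` are hypothetical stand-ins for the store, the DatasetFactory and the build_dataset_indices functions. It assumes, as the description says, that loading with load_dataset_metadata=False empties the "metadata" key and that the loaded document is then written back.

```python
import json

# Hypothetical key for the dataset's metadata document (illustration only).
KEY = "by-metadata.json"


def make_store():
    """Return a toy in-memory store holding one dataset document."""
    doc = {"dataset_uuid": "dataset", "metadata": {"owner": "team-a"}, "indices": {}}
    return {KEY: json.dumps(doc)}


def load_dataset(store, load_dataset_metadata=True):
    """Stand-in for the factory: drops user metadata when the flag is False."""
    doc = json.loads(store[KEY])
    if not load_dataset_metadata:
        doc["metadata"] = {}
    return doc


def build_indices(store, columns, load_dataset_metadata):
    """Stand-in for index building: load, add index entries, write back."""
    doc = load_dataset(store, load_dataset_metadata=load_dataset_metadata)
    for column in columns:
        doc["indices"][column] = f"{column}.index"
    # Whatever was loaded is what gets persisted, including emptied metadata.
    store[KEY] = json.dumps(doc)
    return doc


if __name__ == "__main__":
    for flag in (False, True):
        store = make_store()
        build_indices(store, ["p"], load_dataset_metadata=flag)
        print(flag, json.loads(store[KEY])["metadata"])
```

In this model the write-back step is what turns a read-time optimisation into data loss, which is why flipping the flag to True is enough to keep the metadata.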

steffen-schroeder-by

Thanks, all in all this looks good to me. There are two things I'd like to see in addition:

- This is worth an entry in the changelog.
- We should understand why load_dataset_metadata was always set to False, and what the implications (functionality, performance, ...) are of setting it to True by default now. Maybe @fjetter has an idea.

pavankumar-jamanjyothi-by · 2021-08-03T09:18:31Z

        dataset_uuid=dataset_uuid,
        store=store_factory,
        factory=factory,
-        load_dataset_metadata=False,


@jochen-ott-by As this is gc, I think we should set it to False here.

I think it does not really matter and we can use load_dataset_metadata=True everywhere.

ghost · 2021-08-24T10:55:08Z

        dataset_uuid=dataset_uuid,
        store=store_factory,
        factory=factory,
-        load_dataset_metadata=False,


I think it does not really matter and we can use load_dataset_metadata=True everywhere.

ghost · 2021-08-24T10:59:21Z

+        {"label": "cluster_1", "data": [("core", pd.DataFrame({"p": [1, 2]}))]},
+        {"label": "cluster_2", "data": [("core", pd.DataFrame({"p": [2, 3]}))]},
+    ]
+    with freeze_time(TIME_TO_FREEZE_ISO):


IIRC, we had some issues with the freeze_time approach in the past, which is why almost no test nowadays uses it. I think this test can be re-written without using freeze_time, simply by not checking a value for metadata["creation_time"]. This would not only drop the dependency on freezegun here, but also make the test clearer.

Makes sense. Removed freeze_time and pushed the changes.
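A minimal sketch of the suggested approach, comparing metadata while ignoring the time-dependent key instead of freezing the clock. The helper `strip_volatile` and the literal dictionaries are hypothetical; in the real test the two dictionaries would be the dataset metadata read before and after building the indices.

```python
def strip_volatile(metadata, volatile_keys=("creation_time",)):
    """Return a copy of metadata without keys whose values depend on the clock."""
    return {key: value for key, value in metadata.items() if key not in volatile_keys}


def test_metadata_survives_index_build():
    # Made-up values standing in for metadata read before and after the index build.
    before = {"creation_time": "2021-08-03T07:12:28", "owner": "team-a"}
    after = {"creation_time": "2021-09-01T09:37:00", "owner": "team-a"}

    # The user-defined part must be unchanged; creation_time is not checked.
    assert strip_volatile(after) == strip_volatile(before)
    # The key should still be present, whatever its value is.
    assert "creation_time" in after
```

Written this way the test needs no freezegun dependency and states directly which part of the metadata it cares about.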

fixed removing metadata while building indices.

23d1e95

pavankumar-jamanjyothi-by requested review from aaron-tal-by, fjetter, florian-jetter-by, ilia-zaitcev-by, jakob-ernst-by, johan-olsson-by, lr4d and steffen-schroeder-by August 3, 2021 07:12

steffen-schroeder-by requested changes Aug 3, 2021


pavankumar-jamanjyothi-by commented Aug 3, 2021


ghost suggested changes Aug 24, 2021


added tests to assert metadata is not lost after adding indices

0618558

pavankumar-jamanjyothi-by force-pushed the building-indices-removes-user-defined-metadata branch from 1c8ed8b to 0618558 Compare September 1, 2021 09:37

pavankumar-jamanjyothi-by requested review from a user and steffen-schroeder-by and removed request for florian-jetter-by September 1, 2021 09:43

johan-olsson-by removed their request for review April 20, 2022 11:47

pavankumar-jamanjyothi-by wants to merge 2 commits into master from building-indices-removes-user-defined-metadata
