
Propagation and reduction of the metadata in MapDataset.stack #5228

Open
bkhelifi opened this issue Apr 19, 2024 · 4 comments

@bkhelifi
Member

No description provided.

@bkhelifi bkhelifi added this to the 1.3 milestone Apr 19, 2024
@AtreyeeS
Member

AtreyeeS commented May 2, 2024

Thanks @bkhelifi for bringing this up.
What stacking should do to the MetaData was discussed at length in #4853 without conclusion.

  1. The main point was whether we should have parallel lists, leading to code duplication. An approach with a RootModel and BaseModel was proposed by @adonath (see Add MapDatasetMetaData container #4853 (comment))

  2. How much info should be kept on a stacked dataset? Currently we have a minimal approach where we throw away all the meta info and keep only the creation info. If required, a meta container can be created from the meta_table. This approach is obviously ill-suited. In Add MapDatasetMetaData container #4853 I had initially tried keeping everything, but that was ill-planned and difficult to maintain.
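For illustration, the container-model idea from #4853 could look roughly like this. This is a minimal sketch using plain dataclasses as a stand-in for the proposed Pydantic RootModel/BaseModel; all class and field names (`ObsMetaData`, `StackedMetaData`, `obs_id`, `telescope`) are hypothetical, not the actual Gammapy API:

```python
from dataclasses import dataclass, field
from typing import List

# Instead of parallel lists per attribute (one list of obs_ids, one list of
# telescopes, ...), keep a single list of per-observation metadata objects.
# Names are illustrative only.

@dataclass
class ObsMetaData:
    obs_id: int
    telescope: str

@dataclass
class StackedMetaData:
    """Container wrapping the per-observation metadata of a stacked dataset."""
    entries: List[ObsMetaData] = field(default_factory=list)

    def append(self, other: ObsMetaData) -> None:
        self.entries.append(other)

    @property
    def obs_ids(self) -> List[int]:
        return [e.obs_id for e in self.entries]

meta = StackedMetaData()
meta.append(ObsMetaData(obs_id=1, telescope="CTA-North"))
meta.append(ObsMetaData(obs_id=2, telescope="CTA-North"))
```

With Pydantic, the container would be a `RootModel`/`BaseModel` wrapping the same list, which additionally gives validation and serialization.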

A similar question might arise for the estimators, where the question would be what meta info is propagated from the individual datasets.

@AtreyeeS
Member

AtreyeeS commented May 2, 2024

What should be the difference between the Datasets metadata and the metadata of a stacked dataset?

@AtreyeeS AtreyeeS closed this as completed May 2, 2024
@AtreyeeS AtreyeeS reopened this May 2, 2024
@bkhelifi
Member Author

bkhelifi commented May 2, 2024

For the fixity metadata, there is no stacking (of course).
For the context metadata, it depends a bit on the retained data model. But if, e.g., it contains the datapipe version and the calibration version, one should keep only one instance, as these data will be unique for a given release.
For the reference metadata, I propose that we append the ObsId list...
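A per-field reduction policy along these lines could be sketched as follows. The field names and rules here are purely illustrative, not a proposal for the actual schema:

```python
# Hypothetical per-field stacking rules:
#   "unique": keep a single value, error on mismatch (context metadata)
#   "append": accumulate values across datasets (reference metadata)
#   "drop":   discard the field on stacking (no meaningful stacked value)
STACK_RULES = {
    "datapipe_version": "unique",
    "calibration_version": "unique",
    "obs_id": "append",
    "pointing": "drop",
}

def stack_meta(meta_a: dict, meta_b: dict) -> dict:
    """Reduce two metadata dicts into one, following STACK_RULES."""
    stacked = {}
    for key, rule in STACK_RULES.items():
        if rule == "unique":
            if meta_a[key] != meta_b[key]:
                raise ValueError(f"Inconsistent {key!r} on stacking")
            stacked[key] = meta_a[key]
        elif rule == "append":
            stacked[key] = list(meta_a[key]) + list(meta_b[key])
        # "drop": field is simply omitted from the stacked metadata
    return stacked
```

The spreadsheet exercise mentioned below would essentially decide, for each real field, which of these rules applies.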

We have to go through all the individual metadata fields and make a proposal to VODF/CTA (i.e. @kosack, myself, ...). We should draft a spreadsheet and then discuss which fields to keep as unique, which to append, and which to skip...
I think that this is the hardest part of this 'project'.

@adonath
Member

adonath commented May 6, 2024

@bkhelifi Internally in Gammapy I think we can almost always just propagate the metadata to the higher level by building hierarchical structures. There is not necessarily a need to reduce the metadata at each step, unless we find performance issues with Pydantic. The reduction can then finally happen when serializing. The problem with reducing the metadata "on the fly" is that different data formats might require different metadata, and a priori we cannot know to which format the user will serialize.
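To illustrate the "propagate everything, reduce only at serialization" idea: the format names, keys, and the `serialize_meta` function below are made up for the sketch, not real Gammapy serializers:

```python
def serialize_meta(entries: list, fmt: str = "gadf") -> dict:
    """Reduce the full per-dataset metadata list only at write time.

    The target format decides which fields survive; before this point the
    stacked dataset simply carries the whole list. Keys are illustrative.
    """
    if fmt == "gadf":
        # e.g. this format wants the accumulated ObsId list
        return {"OBS_ID": [e["obs_id"] for e in entries]}
    if fmt == "ogip":
        # e.g. this format only records how many observations were stacked
        return {"N_OBS": len(entries)}
    raise ValueError(f"Unknown format: {fmt!r}")

# Full metadata is carried along unreduced until serialization:
entries = [{"obs_id": 1}, {"obs_id": 2}]
```

Calling `serialize_meta(entries, "gadf")` and `serialize_meta(entries, "ogip")` on the same unreduced list yields different headers, which is exactly why reducing "on the fly" would lose information.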

> What should be the difference between the Datasets metadata and the metadata of a stacked dataset?

The metadata of a stacked dataset is transposed and homogeneous in dataset type; the metadata of a Datasets collection is not.
