
Propagation and reduction of the metadata in MapDataset.stack #5228

Open
bkhelifi opened this issue Apr 19, 2024 · 4 comments

@bkhelifi
Member

No description provided.

@bkhelifi bkhelifi added this to the 1.3 milestone Apr 19, 2024
@AtreyeeS
Member

AtreyeeS commented May 2, 2024

Thanks @bkhelifi for bringing this up.
What stacking should do to the MetaData was discussed at length in #4853 without conclusion.

  1. The main point was whether we should have parallel lists, leading to code duplication. An approach with a RootModel and BaseModel was proposed by @adonath (see Add MapDatasetMetaData container #4853 (comment))

  2. How much info should be kept on a stacked dataset? Currently we have a minimal approach where we throw away all the meta info and keep only the creation info. If required, a meta container can be created from the meta_table. This approach is obviously ill-suited. In Add MapDatasetMetaData container #4853 I had initially tried keeping everything, but that was ill-planned and difficult to maintain.
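For illustration, the container-model idea from #4853 could look roughly like this. This is a minimal sketch using plain dataclasses as a stand-in for the proposed Pydantic RootModel/BaseModel; all class and field names (`ObsMetaData`, `StackedMetaData`, `obs_id`, `telescope`) are hypothetical, not the actual Gammapy API:

```python
from dataclasses import dataclass, field
from typing import List

# Instead of parallel lists per attribute (one list of obs_ids, one list of
# telescopes, ...), keep a single list of per-observation metadata objects.
# Names are illustrative only.

@dataclass
class ObsMetaData:
    obs_id: int
    telescope: str

@dataclass
class StackedMetaData:
    """Container wrapping the per-observation metadata of a stacked dataset."""
    entries: List[ObsMetaData] = field(default_factory=list)

    def append(self, other: ObsMetaData) -> None:
        self.entries.append(other)

    @property
    def obs_ids(self) -> List[int]:
        return [e.obs_id for e in self.entries]

meta = StackedMetaData()
meta.append(ObsMetaData(obs_id=1, telescope="CTA-North"))
meta.append(ObsMetaData(obs_id=2, telescope="CTA-North"))
```

With Pydantic, the container would be a `RootModel`/`BaseModel` wrapping the same list, which additionally gives validation and serialization.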

A similar question might arise for the estimators, where the question would be what meta info is propagated from the individual datasets.

@AtreyeeS
Member

AtreyeeS commented May 2, 2024

What should be the difference between the Datasets metadata and the metadata of a stacked dataset?

@AtreyeeS AtreyeeS closed this as completed May 2, 2024
@AtreyeeS AtreyeeS reopened this May 2, 2024
@bkhelifi
Member Author

bkhelifi commented May 2, 2024

For the fixity metadata, there is no stacking (of course).
For the context metadata, it depends a bit on the retained data model. But if, e.g., it contains the datapipe version and the calibration version, one should keep only one instance, as these data will be unique for a given release.
For the reference metadata, I propose that we append the ObsId list...
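A per-field reduction policy along these lines could be sketched as follows. The field names and rules here are purely illustrative, not a proposal for the actual schema:

```python
# Hypothetical per-field stacking rules:
#   "unique": keep a single value, error on mismatch (context metadata)
#   "append": accumulate values across datasets (reference metadata)
#   "drop":   discard the field on stacking (no meaningful stacked value)
STACK_RULES = {
    "datapipe_version": "unique",
    "calibration_version": "unique",
    "obs_id": "append",
    "pointing": "drop",
}

def stack_meta(meta_a: dict, meta_b: dict) -> dict:
    """Reduce two metadata dicts into one, following STACK_RULES."""
    stacked = {}
    for key, rule in STACK_RULES.items():
        if rule == "unique":
            if meta_a[key] != meta_b[key]:
                raise ValueError(f"Inconsistent {key!r} on stacking")
            stacked[key] = meta_a[key]
        elif rule == "append":
            stacked[key] = list(meta_a[key]) + list(meta_b[key])
        # "drop": field is simply omitted from the stacked metadata
    return stacked
```

The spreadsheet exercise mentioned below would essentially decide, for each real field, which of these rules applies.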

We have to go through all the individual metadata fields and make a proposal to VODF/CTA (i.e. @kosack, myself, ...). We should draft a spreadsheet and then discuss which fields to keep as unique, which to append, and which to skip...
I think that this is the hardest part of this 'project'.

@adonath
Member

adonath commented May 6, 2024

@bkhelifi Internally in Gammapy I think we can almost always just propagate the metadata to the higher level by building hierarchical structures. There is not necessarily a need to reduce the metadata at each step, unless we find performance issues with Pydantic. The reduction can then finally happen when serializing. The problem with reducing the metadata "on the fly" is that different data formats might require different metadata, and a priori we cannot know to which format the user will serialize.
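To illustrate the "propagate everything, reduce only at serialization" idea: the format names, keys, and the `serialize_meta` function below are made up for the sketch, not real Gammapy serializers:

```python
def serialize_meta(entries: list, fmt: str = "gadf") -> dict:
    """Reduce the full per-dataset metadata list only at write time.

    The target format decides which fields survive; before this point the
    stacked dataset simply carries the whole list. Keys are illustrative.
    """
    if fmt == "gadf":
        # e.g. this format wants the accumulated ObsId list
        return {"OBS_ID": [e["obs_id"] for e in entries]}
    if fmt == "ogip":
        # e.g. this format only records how many observations were stacked
        return {"N_OBS": len(entries)}
    raise ValueError(f"Unknown format: {fmt!r}")

# Full metadata is carried along unreduced until serialization:
entries = [{"obs_id": 1}, {"obs_id": 2}]
```

Calling `serialize_meta(entries, "gadf")` and `serialize_meta(entries, "ogip")` on the same unreduced list yields different headers, which is exactly why reducing "on the fly" would lose information.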

> What should be the difference between the Datasets metadata and the metadata of a stacked dataset?

The metadata of a stacked dataset is transposed and homogeneous in dataset type; the metadata of a Datasets collection is not.
