Codec: clarify expectations regarding unknown metadata fields #270

Open
sbesson opened this issue Oct 27, 2023 · 0 comments

sbesson commented Oct 27, 2023

As a preamble, I wanted to highlight that the Zarr v3 specification provides a list of officially supported codecs, each with its own specification, e.g. blosc. Even though https://zarr-specs.readthedocs.io/en/latest/v3/codecs.html is still marked as under construction, this is a noticeable improvement over the Zarr v2 specification. Having an official registry of codecs also allows new codecs to be proposed through the standard Zarr Enhancement Proposals process.

This issue is motivated by a compatibility problem initially raised in zarr-developers/jzarr#14: a new feature of the dev.zarr:jzarr:0.4.0 implementation added an extra key (numThreads) to the blosc object, which in turn prevented the data from being opened with zarr-python because of its stricter semantics when reading the blosc dictionary. In that case the extra key is not essential, and a fix removing it is under review.
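
One plausible way a reader ends up this strict is by forwarding a codec's JSON configuration directly as keyword arguments to the codec constructor, so that any key the constructor does not declare raises an error. The sketch below is a hypothetical Python illustration of that pattern, not zarr-python's actual code; the `Blosc` class, its field names and defaults, and the `from_config_lenient` helper are all assumptions made for the example.

```python
from dataclasses import dataclass, fields


@dataclass
class Blosc:
    """Hypothetical Blosc codec; field names and defaults are assumed."""
    cname: str = "lz4"
    clevel: int = 5
    shuffle: int = 1
    blocksize: int = 0

    @classmethod
    def from_config(cls, config: dict) -> "Blosc":
        # Strict: every key is forwarded as a keyword argument, so any key
        # the constructor does not declare raises a TypeError.
        return cls(**config)


def from_config_lenient(config: dict) -> Blosc:
    # Lenient alternative: silently drop keys this reader does not know.
    known = {f.name for f in fields(Blosc)}
    return Blosc(**{k: v for k, v in config.items() if k in known})


# Configuration as written by another implementation with an extra key.
written_elsewhere = {"cname": "lz4", "clevel": 5, "shuffle": 1,
                     "blocksize": 0, "numThreads": 4}

try:
    Blosc.from_config(written_elsewhere)
except TypeError as err:
    print("strict reader fails:", err)

print("lenient reader:", from_config_lenient(written_elsewhere))
```

The lenient variant avoids the failure but silently discards information, which is one reason the expected behaviour is worth settling in the specification rather than per implementation.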

This raises the wider question of how implementations should deal with codec objects containing unknown metadata fields. The must_understand key/value pair introduced in the v3 specification aims to handle similar scenarios. However, as per the current terms of the specification:

Future versions of this specification may also add new core features by adding new top-level metadata keys. Such features are required by default. However, if the value of an unknown feature is an object containing the key-value pair "must_understand": false, it can be ignored.
...
The array metadata object must not contain any other names. Those are reserved for future versions of this specification. An implementation must fail to open zarr hierarchies, groups or arrays with unknown metadata fields, with the exception of objects with a "must_understand": false key-value pair.
...
The group metadata object must not contain any other names. Those are reserved for future versions of this specification. An implementation must fail to open zarr hierarchies or groups with unknown metadata fields, with the exception of objects with a "must_understand": false key-value pair.

the scope of this key seems to be limited to unknown top-level objects; it does not say anything about unknown fields nested inside known objects such as codecs.
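
To make the quoted rule concrete, here is a minimal sketch of the check it implies for array metadata. It assumes the set of known top-level keys listed in the code, taken from the v3 array metadata definition at the time of writing (verify against the current spec); `check_array_metadata` is a hypothetical helper, not part of any Zarr library.

```python
# Assumed top-level keys of v3 array metadata; check the current spec.
KNOWN_ARRAY_KEYS = {
    "zarr_format", "node_type", "shape", "data_type", "chunk_grid",
    "chunk_key_encoding", "fill_value", "codecs", "attributes",
    "dimension_names", "storage_transformers",
}


def check_array_metadata(metadata: dict) -> list[str]:
    """Hypothetical helper applying the must_understand rule quoted above.

    Returns the unknown top-level keys that may be ignored, and raises
    ValueError for any unknown key that must be understood.
    """
    ignorable = []
    for key, value in metadata.items():
        if key in KNOWN_ARRAY_KEYS:
            continue  # known key: nested content is NOT inspected here
        # Unknown keys are required by default, unless the value is an
        # object containing "must_understand": false.
        if isinstance(value, dict) and value.get("must_understand") is False:
            ignorable.append(key)
        else:
            raise ValueError(f"cannot open array: unknown metadata field {key!r}")
    return ignorable


metadata = {
    "zarr_format": 3, "node_type": "array", "shape": [100],
    "codecs": [{"name": "blosc", "configuration": {"numThreads": 4}}],
    "my_extension": {"must_understand": False},
}
print(check_array_metadata(metadata))  # prints ['my_extension']
```

Because only top-level keys are inspected, an extra field such as numThreads nested inside an entry of codecs passes this check unnoticed.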

Ideally, the expectations regarding unspecified codec metadata fields should be stated explicitly in the specification. Note also that there is an ongoing discussion in #72 (comment) about whether must_understand should be defined and supported at arbitrary levels, which might be relevant to this issue.
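
If must_understand were allowed at arbitrary levels, as discussed in #72, one hypothetical reading for codec configurations could look like the sketch below. Neither `check_codec_configuration` nor this behaviour is part of the specification, and the set of Blosc configuration keys is an assumption for illustration.

```python
# Assumed configuration keys of the v3 blosc codec, for illustration only.
BLOSC_KNOWN_KEYS = {"cname", "clevel", "shuffle", "typesize", "blocksize"}


def check_codec_configuration(name: str, configuration: dict,
                              known_keys: set[str]) -> dict:
    """Hypothetical: apply the must_understand rule inside a codec configuration.

    Known keys are kept; unknown keys are dropped when their value is an
    object containing "must_understand": false; anything else fails.
    """
    accepted = {}
    for key, value in configuration.items():
        if key in known_keys:
            accepted[key] = value
        elif isinstance(value, dict) and value.get("must_understand") is False:
            continue  # optional extension: safe to ignore
        else:
            raise ValueError(f"codec {name!r}: unknown configuration key {key!r}")
    return accepted


config = {"cname": "lz4", "clevel": 5, "shuffle": "shuffle",
          "typesize": 4, "blocksize": 0}

# A bare scalar cannot carry the flag, so this is still rejected.
try:
    check_codec_configuration("blosc", {**config, "numThreads": 4}, BLOSC_KNOWN_KEYS)
except ValueError as err:
    print(err)
```

One consequence of this reading: a scalar value such as "numThreads": 4 cannot carry a must_understand flag, so it would only be ignorable if written as an object, for example {"value": 4, "must_understand": false} (a purely hypothetical shape).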
