ZEP 2: Consider versioning #277

Open
jakirkham opened this issue Nov 1, 2023 · 6 comments

Comments

@jakirkham
Member

It would be useful to include versioning in the shard implementation to allow improvements over time. This might make sense in the metadata.

@normanrz
Contributor

normanrz commented Nov 1, 2023

I think that is a good idea.
I was wondering if the entire Zarr spec should be versioned, instead of versioning extensions separately. Otherwise, we might need to track incompatibilities among extension versions.

@clbarnes
Contributor

clbarnes commented Nov 1, 2023

> I was wondering if the entire Zarr spec should be versioned

I'm in favour of this, as the spec is continuing to evolve. Would it imply any backwards compatibility, though? If a field were renamed in 3.3, would a 3.3-compliant implementation be expected to support the 3.2 name? Also, where ZEPs represent distinct features like the sharding codec, how would an implementation indicate that it supports e.g. ZEP3 but not ZEP2?

@rabernat
Contributor

rabernat commented Nov 1, 2023

For an example of a community that has successfully used semver in a spec with multiple implementations, check out STAC: https://github.com/radiantearth/stac-spec/blob/master/process.md

@jbms
Copy link
Contributor

jbms commented Nov 1, 2023

> For an example of a community that has successfully used semver in a spec with multiple implementations, check out STAC: https://github.com/radiantearth/stac-spec/blob/master/process.md

Can you provide some pointers to examples of how the version is actually used, e.g. examples of code in STAC implementations that handle multiple versions?

@rabernat
Contributor

rabernat commented Nov 1, 2023

PyStac is a good example: https://github.com/stac-utils/pystac/blob/main/pystac/version.py

This is used, e.g. to migrate versions of a catalog: https://github.com/stac-utils/pystac/blob/master/tests/data-files/change_stac_version.py

edit: I think one of the most useful applications of semver for STAC came before the 1.0 release, when the implementation community could move towards the stable version incrementally, in discrete steps, rather than all at once.

@jbms
Contributor

jbms commented Nov 1, 2023

I previously advocated against "version numbers" --- I'll reiterate the arguments I made in previous discussions:

As the spec/format evolves, various new functionality will be added, and implementations will evolve to support subsets of that functionality. However, a version number assumes a linear progression.

For example, let's say we want to know what functionality is supported by a given zarr v3 implementation. We can't necessarily convey that by just saying it supports zarr version 3.5, because it may support some features but not others. Instead, I prefer the HTML model where you have a table that indicates for each feature the minimum version of each implementation that supports it (along with any relevant notes if there are caveats).
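To make the table idea concrete, here is a minimal sketch of a per-feature support table and a lookup against it. The feature names, implementation names, and version tuples are all hypothetical placeholders invented for illustration; they do not describe any real implementation.

```python
# Hypothetical support table: feature -> {implementation: minimum version
# of that implementation which supports the feature}. All entries are
# made up for illustration.
SUPPORT = {
    "sharding_indexed": {"ImplA": (2, 1), "ImplB": (1, 4)},
    "index_location": {"ImplA": (2, 3)},  # ImplB has no support yet
}


def supports(impl, impl_version, features):
    """Return True if `impl` at `impl_version` supports every listed feature."""
    for feature in features:
        min_version = SUPPORT.get(feature, {}).get(impl)
        if min_version is None or impl_version < min_version:
            return False
    return True
```

With this table, `supports("ImplB", (1, 5), ["sharding_indexed"])` is true while `supports("ImplB", (1, 5), ["index_location"])` is false, which a single spec version number could not express.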

If we store the version number in the array metadata, then presumably it gets set when creating the array. But which version number do we choose? Let's say a given zarr implementation ImplA supports all of the required functionality of zarr version 3.7. It might then always specify "version": "3.7" when creating an array. But then a zarr implementation ImplB that hasn't been updated since version 3.6 of the spec was published might decide that it cannot read this array, even if in fact ImplB supports all of the features used by the array. Instead we could add logic to ImplA to choose the minimum version number of the spec that includes all of the features used by the array. But that adds implementation complexity, and given the possibility of optional features or that an implementation may add support for some but not all of the functionality added in a given zarr version, still does not really help other implementations determine whether they support a given array.
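The ImplA/ImplB scenario can be sketched roughly as follows. This is an illustration only: the mapping of features to spec versions and all function names are assumptions made up for the example, and versions 3.6 and 3.7 are the hypothetical ones from the comment, not published spec versions.

```python
# Hypothetical: the spec version in which each feature was introduced.
FEATURE_INTRODUCED_IN = {
    "blosc": (3, 0),
    "sharding": (3, 6),
    "new_codec": (3, 7),
}


def readable_by_version(array_version, impl_spec_version):
    """Version-number check: reject arrays newer than the known spec version."""
    return array_version <= impl_spec_version


def minimum_version(features_used):
    """Lowest spec version that includes every feature the array uses."""
    return max((FEATURE_INTRODUCED_IN[f] for f in features_used), default=(3, 0))


def readable_by_features(features_used, impl_features):
    """Feature check: the reader only needs the features actually used."""
    return set(features_used) <= set(impl_features)


array_features = ["blosc", "sharding"]
impl_b_features = {"blosc", "sharding"}

# ImplA always stamps the newest version it knows, so ImplB (at 3.6)
# rejects an array whose features it fully supports.
rejected = not readable_by_version((3, 7), (3, 6))

# Stamping the minimum version fixes this case, at the cost of extra
# logic in the writer.
accepted = readable_by_version(minimum_version(array_features), (3, 6))
```

Here `rejected` and `accepted` are both true, while `readable_by_features(array_features, impl_b_features)` gives the right answer directly. The minimum-version workaround also stops helping once an implementation supports only part of what a given spec version added.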

An additional problem with version numbers is that as we develop the spec, e.g. adding an index_location parameter, there isn't any assigned version number for these experimental changes.

JSON plus the requirement that all attributes must be known unless they are marked with {"must_understand":false} was specifically intended to facilitate format evolution in a backwards-compatible way without the need for a version number. (We may want to revise the must_understand mechanism, and/or clarify how it applies to nested configuration objects.)
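As a rough sketch of how a reader might apply that rule to top-level metadata fields: the set of known field names and the helper name below are assumptions for illustration, and the sketch does not attempt to handle nested configuration objects, which the comment notes may need clarification.

```python
import json

# Hypothetical set of top-level fields this reader understands.
KNOWN_FIELDS = {"zarr_format", "node_type", "shape", "data_type", "codecs"}


def unknown_required_fields(metadata):
    """Return unknown fields that are not marked {"must_understand": false}."""
    blocking = []
    for key, value in metadata.items():
        if key in KNOWN_FIELDS:
            continue
        # An unknown field may be ignored only if it is an object that
        # explicitly sets must_understand to false.
        if isinstance(value, dict) and value.get("must_understand") is False:
            continue
        blocking.append(key)
    return blocking


doc = json.loads("""
{
  "zarr_format": 3,
  "shape": [10, 10],
  "optional_hint": {"must_understand": false, "note": "safe to ignore"},
  "new_required_thing": {"level": 2}
}
""")

blocking = unknown_required_fields(doc)
```

In this example `blocking` contains only `"new_required_thing"`, so the reader would refuse the array because of that field and skip `"optional_hint"`, with no version number involved.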

5 participants