
sharding as a codec rather than array storage transformer #220

Open
jbms opened this issue Mar 9, 2023 · 4 comments

Comments

@jbms
Contributor

jbms commented Mar 9, 2023

In the ZEP meeting on 2023-03-02, @jstriebel, @normanrz and I discussed representing sharding (ZEP 2) in the metadata as a codec rather than a storage transformer:

{
  "chunk_grid": {
    "name": "regular",
    "configuration": {"chunk_shape": [1000, 1000, 1000]}
  },
  "codecs": [{
    "name": "sharding_indexed",
    "configuration": {
      "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [100, 100, 100]}},
      "codecs": [{"name": "blosc"}]
    }
  }],
  "shape": [50000, 50000, 50000],
  "data_type": "uint16",
  "zarr_format": 3,
  "node_type": "array",
  "fill_value": 0
}

In general, the idea here is to specify the partitioning of the array top-down: the chunk_grid specifies the top-level partitioning, and the codec is responsible for any further subdivision. With the previous proposal, which represented sharding as a storage transformer, the partitioning is specified bottom-up: the chunk_grid specifies the bottom-level partitioning, and the storage transformer builds up larger shards from the individual chunks.
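
To make the top-down direction concrete, here is a purely illustrative sketch (not part of either proposal, and not an API of any implementation) of how a global voxel coordinate maps first to a shard and then to a sub-chunk, using the grid sizes from the example metadata above:

```python
# Illustrative only: the shapes come from the example metadata above;
# the function name `locate` is hypothetical.
shard_shape = (1000, 1000, 1000)   # outer chunk_grid: one "chunk" = one shard
sub_chunk_shape = (100, 100, 100)  # inner grid inside the sharding codec

def locate(voxel):
    """Map a global voxel coordinate to (shard index, sub-chunk index within the shard)."""
    shard = tuple(v // s for v, s in zip(voxel, shard_shape))
    within = tuple(v % s for v, s in zip(voxel, shard_shape))
    sub_chunk = tuple(w // c for w, c in zip(within, sub_chunk_shape))
    return shard, sub_chunk

print(locate((1234, 0, 56789)))  # ((1, 0, 56), (2, 0, 7))
```

The point is that only the outer division (into shards) is visible to the store; the inner division is entirely the codec's business.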

While the sharding format as proposed in ZEP2 works fine as a storage transformer, in terms of the spec there are some advantages to making it a codec:

  • Previously, sharding had to "hijack" chunk_key_encoding, which is meant to specify the keys for chunks, to instead specify the keys for shards. At the same time, chunk_key_encoding was still used to create fake chunk keys for the individual chunks, which served only as internal identifiers within the implementation, since the sharding format does not store keys. With sharding as a codec, each shard is in fact a chunk, and there is no need for string keys for the sub-chunks.
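
A hedged sketch of why sub-chunks need no string keys: in a sharding_indexed shard, each sub-chunk is located by a numeric grid index into a binary index of (offset, nbytes) pairs, not by a chunk key. The layout details below (little-endian uint64 pairs, index appended at the end) are illustrative assumptions for this toy, not a normative description of the ZEP 2 format:

```python
import struct

def build_shard(encoded_chunks):
    """Concatenate encoded sub-chunks and append an (offset, nbytes) index."""
    body, index = b"", b""
    for chunk in encoded_chunks:
        index += struct.pack("<QQ", len(body), len(chunk))
        body += chunk
    return body + index

def read_sub_chunk(shard, n_chunks, i):
    """Fetch sub-chunk i using only its numeric index and a byte range."""
    index = shard[len(shard) - 16 * n_chunks:]
    offset, nbytes = struct.unpack_from("<QQ", index, 16 * i)
    return shard[offset:offset + nbytes]

shard = build_shard([b"aaa", b"bbbb", b"cc"])
print(read_sub_chunk(shard, 3, 1))  # b'bbbb'
```

In a real store, the two reads in `read_sub_chunk` would be byte-range requests against the shard's single key; no per-sub-chunk keys exist anywhere.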

The greater benefit to representing sharding as a codec is in enabling some interesting future extensions beyond the sharding format of ZEP2:

  • The NVIDIA nvcomp project is interested in a sharding format where the compression is specified within the shard format, and may differ across shards. With sharding as a codec, this is straightforward and very natural. With sharding as a storage transformer, this can be done but basically requires that we leave the codecs empty in the array metadata.

  • It is easy to imagine a sharding format where the sub-chunking of each shard is specified as part of a header/footer in the shard itself, and may differ across shards. That is impossible to support with a storage transformer, because the bottom-level chunk grid is fixed in the array metadata and cannot vary across shards.

There are a few things that are possible with storage transformers but are not possible with a codec:

  • Shards corresponding to non-rectangular regions of the array.
  • Shards encoded as more than one key on the underlying store.

However, it is not clear that there is a use case for those things, and it would still be possible to use a storage transformer as well.

@normanrz has updated zarrita to support zarr v3 with sharding as a codec: https://github.com/scalableminds/zarrita/blob/v3/zarrita/sharding.py

@joshmoore
Member

Primary outcome from the ZEP meeting discussion: for those interested in this issue, we should really work through the API (changes) for each of the main abstractions in Zarr V3.

@jstriebel jstriebel added the core-protocol-v3.0 Issue relates to the core protocol version 3.0 spec label Mar 13, 2023
@jstriebel
Member

@jbms @normanrz I just had a chat with @joshmoore, and we had one more argument for keeping sharding as a storage transformer: when sharding is a codec, the implementation must use partial reads to be able to read single chunks. This basically boils down to: are implementations assumed to use/support a byte-range API, or is that an optional optimization? To keep zarr as simple as possible I tend towards the latter, and would therefore argue for keeping sharding as a storage transformer.

In any case, I'd vote to keep storage transformers in ZEP 1, since we do have a working example in zarr-python, even if it turns out that sharding should be a codec in the end. If not, this discussion would certainly block v3 and ZEP 1, which otherwise is almost ready for voting.

@normanrz
Contributor

> When using sharding as a codec, the implementation must use partial reads to be able to read single chunks.

I wouldn't see it that way. Partial reads are implementation details that implementations can choose to implement. If they don't implement it, reading works the same as accessing a slice of a chunk in Zarr right now (i.e. read the full thing and make a virtual cut-out). For writing, it is like that, anyways. Shards essentially become chunks (with sub chunks). Of course, partial reads make sense and are recommended.
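
A minimal sketch of the fallback described here; the function names and the toy decoder are hypothetical. A store without byte-range support simply fetches the whole shard and slices it in memory, exactly as it would fetch a whole chunk today; partial reads are an optimization layered on top:

```python
def read_sub_chunk_simple(store, key, decode_shard, sub_chunk_index):
    shard_bytes = store[key]                  # full-object read: works on any store
    sub_chunks = decode_shard(shard_bytes)    # decode all sub-chunks in memory
    return sub_chunks[sub_chunk_index]        # virtual cut-out of the one we need

# Toy stand-ins: a dict "store" and a decoder that splits on b'|'.
store = {"c/0/0/0": b"one|two|three"}
decode = lambda b: b.split(b"|")
print(read_sub_chunk_simple(store, "c/0/0/0", decode, 2))  # b'three'
```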

> In any case, I'd vote to keep storage transformers in ZEP 1, since we do have a working example in zarr-python, even if it turns out that sharding should be a codec in the end. If not, this discussion would certainly block v3 and ZEP 1, which otherwise is almost ready for voting.

No objections. But I still think codec is the better abstraction for sharding.

@jbms
Contributor Author

jbms commented Mar 16, 2023

> When using sharding as a codec, the implementation must use partial reads to be able to read single chunks.

> I wouldn't see it that way. Partial reads are implementation details that implementations can choose to implement. If they don't implement it, reading works the same as accessing a slice of a chunk in Zarr right now (i.e. read the full thing and make a virtual cut-out). For writing, it is like that, anyways. Shards essentially become chunks (with sub chunks). Of course, partial reads make sense and are recommended.

Agreed. I'm also not sure I understand the concern being raised here. If sharding is through a storage transformer, then the underlying store still needs to support byte range reads in order to read just a single chunk within the shard; that requirement is the same regardless of whether sharding is through a storage transformer or a codec. The only difference is that if sharding is through a codec, then the codec interface must also support partial reads. But that difference is essentially just a matter of the internal organization of the code, rather than a difference in capabilities that must be implemented for each store.

> In any case, I'd vote to keep storage transformers in ZEP 1, since we do have a working example in zarr-python, even if it turns out that sharding should be a codec in the end. If not, this discussion would certainly block v3 and ZEP 1, which otherwise is almost ready for voting.

> No objections. But I still think codec is the better abstraction for sharding.

Also no objections from me.
