Test Sharding Implementation #260

Open
CSSFrancis opened this issue May 13, 2024 · 0 comments

CSSFrancis commented May 13, 2024

Describe the functionality you would like to see.

For 4-D STEM, some operations would work significantly better if the dataset were chunked equally in all dimensions, for example creating a virtual image or applying a Gaussian filter in real space. This has traditionally been a pain: zarr favors large chunks, which translate into fast parallel operations, while HyperSpy prefers the signal dimensions to be unchunked for the map function and for plotting.
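
To make the trade-off concrete, here is a rough back-of-the-envelope sketch (plain Python, not HyperSpy or zarr code) that counts how many chunks, and how much data, must be read to build a virtual image from a box-shaped region of the detector. The dataset shape, the chunk shapes and the detector box are assumed values; a real virtual detector (e.g. an annulus) would be approximated here by its bounding box.

```python
import math

def chunks_touched(chunks, region):
    """Number of chunks overlapping `region`, given as (start, stop) per axis."""
    count = 1
    for c, (start, stop) in zip(chunks, region):
        count *= (stop - 1) // c - start // c + 1
    return count

def elements_read(shape, chunks, region):
    """Elements decoded when every overlapping chunk is read in full."""
    total = 1
    for n, c, (start, stop) in zip(shape, chunks, region):
        first, last = start // c, (stop - 1) // c
        total *= min((last + 1) * c, n) - first * c  # clip at the array edge
    return total

shape = (256, 256, 128, 128)  # (scan_y, scan_x, det_y, det_x), assumed sizes
roi = ((0, 256), (0, 256), (54, 74), (54, 74))  # hypothetical virtual detector box

for name, chunks in [("signal unchunked", (32, 32, 128, 128)),
                     ("equal chunks", (32, 32, 32, 32))]:
    frac = elements_read(shape, chunks, roi) / math.prod(shape)
    print(f"{name}: {chunks_touched(chunks, roi)} chunks, {frac:.0%} of data")
```

With the signal dimensions unchunked, every chunk overlaps the box, so the whole dataset is read (64 chunks, 100%). With 32-pixel chunks in every dimension only a quarter of the data is read, but the number of chunks quadruples (256), which is where zarr's preference for large chunks pushes back.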

With the v3 spec for zarr and its sharding implementation, we might be able to rethink how we handle this. For example, we could store the data in a layout like:

[Image: proposed sharded data layout]

Here the shards essentially act like the current ideal data structure, but within each shard there are small inner chunks that are fast to read along certain dimensions. This would let us create virtual images without loading the entire dataset into memory and reduce the memory footprint of operations like rechunking.
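
A sharded layout aims to get both: few, large storage objects (shards) and small units of decoding (inner chunks). The sketch below is a hypothetical read planner, not zarr-python's API. It assumes shards shaped like the current HyperSpy-friendly layout, 16-pixel inner chunks, the same assumed dataset shape and detector box as a simple example, and a reader that can fetch individual inner chunks from a shard via the shard index.

```python
import math

def _covered(start, stop, size, n):
    """Blocks of `size` overlapping [start, stop) on an axis of length n,
    and the number of elements those blocks span (clipped at the edge)."""
    first, last = start // size, (stop - 1) // size
    return last - first + 1, min((last + 1) * size, n) - first * size

def plan_read(shape, shard, inner, region):
    """Hypothetical planner: (shards touched, inner chunks decoded, elements decoded)."""
    if any(s % c for s, c in zip(shard, inner)):
        raise ValueError("inner chunk shape must evenly divide the shard shape")
    shards = chunks = elements = 1
    for n, s, c, (start, stop) in zip(shape, shard, inner, region):
        shards *= _covered(start, stop, s, n)[0]
        k, extent = _covered(start, stop, c, n)
        chunks *= k
        elements *= extent
    return shards, chunks, elements

shape = (256, 256, 128, 128)  # (scan_y, scan_x, det_y, det_x), assumed
shard = (32, 32, 128, 128)    # shard = current unsharded chunk shape, assumed
inner = (16, 16, 16, 16)      # small inner chunks, assumed
roi = ((0, 256), (0, 256), (54, 74), (54, 74))  # hypothetical detector box

# Each shard is one stored object, so this is the object count in the store.
n_objects = math.prod(math.ceil(n / s) for n, s in zip(shape, shard))
shards, chunks, elements = plan_read(shape, shard, inner, roi)
print(n_objects, shards, chunks, elements / math.prod(shape))
# 64 64 1024 0.0625
```

Under these assumptions the store still holds only 64 objects, yet the virtual image decodes 1,024 small inner chunks covering about 6% of the data. Whether a real reader fetches only those byte ranges efficiently is exactly the kind of sharding performance question mentioned below.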

This might not be ready quite yet, as there are some issues to solve regarding the speed of the sharding implementation: zarr-developers/zarr-python#1338

This warrants a discussion about whether it is worth pursuing.
