
Support rechunking of non-row axes in xds_from_parquet. #211

Open
JSKenyon opened this issue Apr 26, 2022 · 0 comments
@JSKenyon
Collaborator

Description

Parquet does not allow for on-disk chunking of non-row dimensions. In the interests of providing as uniform an interface as possible to the casa, zarr and parquet backends, I propose that we support chunking in non-row dimensions using dask's rechunk functionality (dask.array.Array.rechunk). This will have some obvious limitations, as we will still end up with a row-only chunked array in memory, i.e. we will read all channels for a number of rows, even if we subsequently process them in smaller chunks. This will also introduce a shared root in the resulting graph (although we may be able to circumvent this with inlining/caching).
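To make the first option concrete, here is a minimal sketch of rechunking a row-chunked array along the channel axis. It does not use xds_from_parquet: the in-memory array stands in for a column read from parquet, and the shapes and chunk sizes are assumptions chosen only for illustration.

```python
import numpy as np
import dask.array as da

# Hypothetical shapes: 1000 rows, 64 channels, 4 correlations.
n_row, n_chan, n_corr = 1000, 64, 4

# Stand-in for a column read from parquet: chunked along row only,
# so every chunk spans all channels and all correlations.
data = da.from_array(
    np.zeros((n_row, n_chan, n_corr), dtype=np.complex64),
    chunks=(250, n_chan, n_corr),
)

# Rechunk the channel axis (axis 1) in the graph.
# Axes not named in the dict keep their existing chunks.
rechunked = data.rechunk({1: 16})

print(data.chunks)
print(rechunked.chunks)
```

In this sketch each row block is still produced as a single full-channel chunk, and the smaller channel chunks are sliced out of it afterwards, which is the shared-root limitation described above.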

An alternative would be to re-read and slice the data for each non-row chunk. This would incur substantial memory and disk overhead, as we would need to repeatedly allocate memory and read the same rows from disk.
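A rough sketch of why the re-read approach is costly, assuming a hypothetical reader `read_rows` that can only return all channels for a range of rows (as with a row-chunked parquet file). The function, shapes and chunk sizes are illustrative and not part of any existing API.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_row, n_chan = 1000, 64
row_chunk, chan_chunk = 250, 16

read_calls = 0


def read_rows(start, stop):
    """Hypothetical reader: returns all channels for rows [start, stop)."""
    global read_calls
    read_calls += 1
    return np.zeros((stop - start, n_chan), dtype=np.float32)


blocks = []
for r0 in range(0, n_row, row_chunk):
    for c0 in range(0, n_chan, chan_chunk):
        # Each (row, channel) chunk re-reads the full row range,
        # then keeps only its own slice of channels.
        full = read_rows(r0, r0 + row_chunk)
        blocks.append(full[:, c0:c0 + chan_chunk])

print(read_calls)
```

With these assumed sizes, every row range is read once per channel chunk rather than once in total, so the number of reads and full-size allocations grows with the number of non-row chunks.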

My instinct is to go with the first option for now as this is still highly experimental functionality.

@JSKenyon JSKenyon self-assigned this Apr 26, 2022