Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dask.array.to_zarr() does not parse for chunks kwarg #10522

Open
aaptss opened this issue Sep 20, 2023 · 4 comments · May be fixed by #10561
Open

dask.array.to_zarr() does not parse for chunks kwarg #10522

aaptss opened this issue Sep 20, 2023 · 4 comments · May be fixed by #10561
Labels
needs attention It's been a while since this was pushed on. Needs attention from the owner or a maintainer. needs triage Needs a response from a contributor

Comments

@aaptss
Copy link

aaptss commented Sep 20, 2023

I was trying to specify chunk size for the output zarr when writing from a dask array:

# cc_img is a dask array
cc_img.to_zarr(ccd_zarr_path,
               component=f"0/{current_pyr_lvl}",
               overwrite=True,
               compressor=compressor,
               chunks=(1,1,1,256,256)
               )

cc_img's chunks were (1,1,3,256,256)

I started getting a hard error, that zarr.create gets multiple instances of chunks keyword.

This is easily fixeable by changing expression chunks = [c[0] for c in arr.chunks] to chunks = [c[0] for c in arr.chunks] if 'chunks' not in kwargs.keys() else kwargs.pop('chunks') on line 3714.

Overall, da.to_zarr() needs a way to resolve this, as right now it implies the chunks' shape from an input array

@github-actions github-actions bot added the needs triage Needs a response from a contributor label Sep 20, 2023
@bartbroere
Copy link
Contributor

I took the liberty to try to implement your solution. Let's see if it passes the automated checks.

@aaptss
Copy link
Author

aaptss commented Oct 16, 2023

@bartbroere thank you for doing this!

all tests have passed with one of your solutions. however, i wonder what is the original intention of this method's developer, as there might be another approach to resolving this conflict.

It seems that all the tests have passed simply because this behaviour is not covered (the current state would fail the test anyways if this would've been covered by one).

I am afraid that this particular solution might actually break something outside of my narrow usecase.

@martindurant you seem to be the one who has implemented the code according to git blame (5yrs ago, wow!). can you please weigh in on this ?

@bartbroere
Copy link
Contributor

I am afraid that this particular solution might actually break something outside of my narrow usecase.

I share that concern. In the PR I currently have this open checklist item:

Investigate unintended effects. What happens if chunks is incorrect? Can we "chunk arrays along any dimension"?

@martindurant
Copy link
Member

An obvious workaround, is to first rechunk the original dask array, which does not of course use any memory of CPU, but presents the right attributes to zarr.

@github-actions github-actions bot added the needs attention It's been a while since this was pushed on. Needs attention from the owner or a maintainer. label Jan 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs attention It's been a while since this was pushed on. Needs attention from the owner or a maintainer. needs triage Needs a response from a contributor
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants