Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dask Delayed not returned from to_raster with compute=False #664

Open
jjwindow opened this issue Apr 12, 2023 · 3 comments
Open

Dask Delayed not returned from to_raster with compute=False #664

jjwindow opened this issue Apr 12, 2023 · 3 comments
Labels
bug Something isn't working dask Dask related issue.

Comments

@jjwindow
Copy link

import rioxarray as rxr
import dask

# Open raster - problem is independent of raster source
raster = rxr.open_rasterio("raster_example.tif", chunks=True)
# Raster is split into chunks with dask as expected

# Write to disk using dask
job = raster.rio.to_raster("write_example.tif", compute=False)
# job is None, write_example.tif written anyway.
# Setting compute=True does the same thing.
# Explicit delay works fine
explicit_job = dask.delayed(raster.rio.to_raster)("write_example.tif")
dask.compute(explicit_job)  # Works normally

# Providing a lock as well as compute=False returns delayed
from threading import Lock
job = raster.rio.to_raster("write_example.tif", lock=Lock(), compute=False)  # Returns delayed

Problem description

The compute keyword in to_raster seems to be ignored in my installation, unless a lock is provided. The documentation does not mention that returning a delayed is conditional on lock being provided. From the docs, the expected behaviour is that compute=False always returns a delayed object, which is preferable when creating multiple delayed tasks to avoid explicitly making many instances of threading.Lock.

Environment Information

This example was recreated on Windows 11 using conda 23.1.0, in a conda-managed virtual environment. The environment was built with
conda create -n name -c conda-forge rioxarray dask
and has rioxarray==0.14.0, dask=2023.3.2. This problem has occurred for me on a different machine running Windows 10 and older versions of conda, rioxarray and dask.

@jjwindow jjwindow added the bug Something isn't working label Apr 12, 2023
@snowman2
Copy link
Member

@djhoese, do you have any thoughts?

@snowman2
Copy link
Member

Related #219

@djhoese
Copy link
Contributor

djhoese commented Apr 13, 2023

Yeah this makes a lot of sense and at first I agreed with @jjwindow until I started re-reading the code. The unfortunate if statement is here:

https://github.com/djhoese/rioxarray/blob/413aef31e27f32a90b9f82e0731f847595869e54/rioxarray/raster_writer.py#L179

This could maybe be adjusted to say if is_dask_collection(xarray_dataarray.data) and (lock or not compute):. However, the Delayed object returned by dask needs to have that lock object already. I don't think there is a way to give the Delayed object a lock after the fact. If someone can correct me on that then I'm fine with the above if statement change.

If this is changed the docstrings updated in #219 would all need to be updated to somehow succinctly say that lock and compute control whether or not dask is used. They would also need to mention that if compute=False you still need to use a lock and how to provide it.

@snowman2 snowman2 added the dask Dask related issue. label May 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working dask Dask related issue.
Projects
None yet
Development

No branches or pull requests

3 participants