Xrspatial Proximity doesn't work with large rasters even with dask backed array #773

Open
GrahamReveley opened this issue Mar 8, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@GrahamReveley

GrahamReveley commented Mar 8, 2023

Xrspatial not working with large dask backed arrays
I believe there's a bug in the xrspatial.proximity method: the coordinates for a large array are loaded into memory as numpy arrays, even when the input is a dask array, instead of being kept as dask arrays. For context, I'm attempting to compute a distance to coast for a 90m global dataset. I had a look at the source code and I believe the issue is here (in the _process function of proximity):

```python
raster_dims = raster.dims
if raster_dims != (y, x):
    raise ValueError(
        "raster.coords should be named as coordinates:"
        "({0}, {1})".format(y, x)
    )

distance_metric = DISTANCE_METRICS.get(distance_metric, None)
if distance_metric is None:
    distance_metric = DISTANCE_METRICS["EUCLIDEAN"]

target_values = np.asarray(target_values)

# x-y coordinates of each pixel.
# flatten the coords of input raster and reshape to 2d
xs = np.tile(raster[x].data, raster.shape[0]).reshape(raster.shape)
ys = np.repeat(raster[y].data, raster.shape[1]).reshape(raster.shape)
```

Therefore xs and ys are huge numpy arrays, each the full size of the raster, that don't fit into memory. If the input data is a dask array, these should probably be dask arrays too. Later in the processing sequence the proximity calculation is done with either dask or numpy, and I think the same dask/numpy split should apply here.
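To illustrate the idea, here is a minimal sketch of how the two coordinate grids could be built lazily when the raster is dask-backed. It is not the xrspatial implementation or a proposed patch: the helper name `lazy_coord_grids` is hypothetical, and it assumes a 2D raster with dims `(y, x)`. It only covers the grid construction, not the rest of the proximity pipeline.

```python
import dask.array as da
import numpy as np
import xarray as xr


def lazy_coord_grids(raster: xr.DataArray, x: str = "x", y: str = "y"):
    """Hypothetical helper: return (xs, ys) grids shaped like `raster`.

    For dask-backed input the grids are lazy dask arrays with the same
    chunks as the raster; otherwise they are built with numpy as in the
    snippet quoted above.
    """
    x_coords = raster[x].data  # 1D, length = number of columns
    y_coords = raster[y].data  # 1D, length = number of rows

    if isinstance(raster.data, da.Array):
        row_chunks, col_chunks = raster.chunks
        # Only the small 1D coordinate vectors are held in memory.
        xs_1d = da.from_array(x_coords, chunks=(col_chunks,))
        ys_1d = da.from_array(y_coords, chunks=(row_chunks,))
        # Broadcast lazily to 2D, chunk by chunk.
        xs = da.broadcast_to(
            xs_1d[np.newaxis, :], raster.shape, chunks=raster.chunks
        )
        ys = da.broadcast_to(
            ys_1d[:, np.newaxis], raster.shape, chunks=raster.chunks
        )
        return xs, ys

    # numpy path, unchanged from the quoted code
    xs = np.tile(x_coords, raster.shape[0]).reshape(raster.shape)
    ys = np.repeat(y_coords, raster.shape[1]).reshape(raster.shape)
    return xs, ys
```

With grids like these, each chunk of xs and ys would only be materialized when a downstream dask task needs it, so peak memory would scale with the chunk size rather than with the full raster.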

If I've missed something please let me know. I'm more than happy to share some more code if need be. In the meantime I can use gdal_proximity.py directly, but I guess it would be slower than using the dask-backed xrspatial.

Thanks!

@GrahamReveley GrahamReveley added the bug Something isn't working label Mar 8, 2023
@bstadlbauer

Also ran into this 👍

@brendancol
Contributor

@GrahamReveley @bstadlbauer thanks for contributing here. I believe this is a known issue to @thuydotm, but it hasn't been formally documented. Also, sorry for the delay in responding.

  • @thuydotm at a minimum we need to document the limitation in proximity.

We are going to prioritize proximity over the next sprints, but happy to help review / test a PR if y'all want to contribute a fix.

@bstadlbauer

@brendancol We do have a constrained use case and are working with a modified fork of the proximity code here, but IIRC it's nothing that we can easily generalize to work for all use cases.
