Explore Dask as a scaler #181

Open
darribas opened this issue Aug 15, 2023 · 0 comments
Labels
enhancement New feature or request

Comments

darribas commented Aug 15, 2023

This is a spin-off issue from the conversation in #180, so we don't lose track of it and also don't distract from the discussion in that PR.

Original suggestion from @knaaptime:

Categoricals are important, for example, to interpolate rasters (e.g., land use), and having the functionality out in the wild would help it get tested.

it would be useful to see whether this can provide a boost to the existing functionality we have for vectorizing rasters

And response from @darribas:

It’s slightly different. We could think of a way of vectorizing pixels and doing a spatial dissolve with dask. I don’t know if that’d be faster (it'd be at least parallel/out-of-core), but it’s definitely different code (though similar philosophy), so I'd be tempted to leave that for a different PR, perhaps create an issue to remember this option in case we have bandwidth (or need) in the future to explore it.

In the case suggested above, a strategy to use Dask would be:

  • Read in the raster w/ rioxarray
  • Extract pixel centroids with to_pandas (there might be a way to go directly into a dask.DataFrame)
  • Turn into a dask_geopandas.GeoDataFrame
  • Build pixels as vectors with buffer(xxx, cap_style=3)
  • Dissolve vector pixels by value
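
To make the geometry of the steps above concrete, here is a minimal, dependency-free sketch in plain Python. It is not the rioxarray/dask_geopandas code itself: the helper names (`pixel_square`, `pixels_by_value`), the toy grid, and the pixel size are all made up for illustration. It assumes square pixels, and that buffering a centroid by half the pixel size with a square cap yields the four-corner pixel footprint.

```python
from collections import defaultdict


def pixel_square(cx, cy, half):
    """Corners of the square of half-width `half` centred on (cx, cy).

    Stands in for buffering a centroid with a square cap (hypothetical helper).
    """
    return [
        (cx - half, cy - half),
        (cx + half, cy - half),
        (cx + half, cy + half),
        (cx - half, cy + half),
    ]


def pixels_by_value(grid, res=1.0):
    """Group square pixel footprints by raster value (hypothetical helper).

    `grid` is a list of rows; row 0 is treated as the top of the raster.
    """
    groups = defaultdict(list)
    n_rows = len(grid)
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            cx = (c + 0.5) * res           # centroid x
            cy = (n_rows - r - 0.5) * res  # centroid y, counted from the bottom
            groups[value].append(pixel_square(cx, cy, res / 2))
    return dict(groups)


# Toy categorical raster (e.g. two land-use classes), 10-unit pixels.
grid = [
    [1, 1, 2],
    [1, 2, 2],
]
groups = pixels_by_value(grid, res=10.0)
```

Each entry of `groups` holds the squares that a dissolve by value would then merge into a single geometry; in the Dask version, that grouping and merging would happen per partition and lazily.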

Once the data are in a Dask data structure, all computations are lazy: nothing runs until .compute() is called, at which point the work is executed in parallel, which is what provides scalability and parallelism. But I'm not sure that will make it faster than rasterio's vectorisation, which I imagine relies on GEOS? It might, because the dissolve should be a fast one given that all the polygons to dissolve are four-point squares. Worth a shot for sure.
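
As a rough illustration of why dissolving grid-aligned squares could be cheap, the sketch below finds the outline of each value class by cancelling edges shared by two pixels of the same value. This is plain Python written only for this comment; it is not how GEOS, rasterio, or dask_geopandas implement a dissolve, and the function name `dissolved_outline_edges` is hypothetical.

```python
from collections import Counter, defaultdict


def dissolved_outline_edges(grid):
    """Return, per raster value, the unit edges on the outline of its pixels.

    An edge shared by two pixels of the same value is interior to the
    dissolved shape, so only edges seen exactly once are kept.
    """
    edge_counts = defaultdict(Counter)
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            corners = [(c, r), (c + 1, r), (c + 1, r + 1), (c, r + 1)]
            # Walk the four sides of the pixel, closing the ring at the end.
            for a, b in zip(corners, corners[1:] + corners[:1]):
                edge_counts[value][tuple(sorted((a, b)))] += 1
    return {
        value: {edge for edge, n in counts.items() if n == 1}
        for value, counts in edge_counts.items()
    }


grid = [
    [1, 1, 2],
    [1, 2, 2],
]
outline = dissolved_outline_edges(grid)
```

Here each class is an L-shaped group of three pixels: 12 raw edges, of which two are shared internally, leaving 8 outline edges per class. Whether a general-purpose polygon union gets close to this kind of shortcut is exactly what would need benchmarking.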

darribas added the enhancement label on Aug 15, 2023