Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible improvements to new xr_percentile function #209

Open
robbibt opened this issue Jun 3, 2021 · 3 comments
Open

Possible improvements to new xr_percentile function #209

robbibt opened this issue Jun 3, 2021 · 3 comments

Comments

@robbibt
Copy link
Contributor

robbibt commented Jun 3, 2021

Hey @kieranricardo and @Kirill888, I've just had a chance to test out the new xr_percentile function. It seems to function far faster than the built-in .quantile functionality in xarray, which is really exciting! I did have a few pieces of feedback though which might help it be easier to use and to fit into existing workflows.

Issue 1: The function currently returns an xr.Dataset with a data variable for each percentile the user requests. E.g. below, I've requested a 0.01 and a 0.999 percentile. These are used to label the new variables by appending the percentiles onto the original variable name.

image

This approach feels a bit clunky to to me as the user can't anticipate the naming convention used for the new band names, especially as the input 0.01 value is converted to an integer 1. It also produces issues where the user requests higher than 0.01 precision, e.g. the 0.999 above is clipped to 0.99 in the band name (e.g. if a user requests both 0.995 and 0.999, they get combined into one in the output).

The native xarray solution is to instead produce an xr.Dataset with a new "quantile" dimension, which is then labelled with the requested quantiles. This feels a bit more elegant to me:

image

Issue 2: It would be nice if this function also supported non-dask input data too, as for a lot of smaller-scale science team applications we don't always need to use dask throughout an entire workflow. If I try running it on a xr.Dataset in memory, I get this error:

image

@Kirill888
Copy link
Member

Agree on both counts.

@RichardScottOZ
Copy link
Contributor

This sounds great given quantile doesn't work on dask I think?

@RichardScottOZ
Copy link
Contributor

and a few hundred times faster than this would be great:-
3 variables, 44 bands, 100 million pixels each approx

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants