Feature: Have other linear interpolation for percentiles #797
Comments
Question: there are many places in the xclim code where we use
Well, I guess it depends. Plus, if you wish to have an unbiased mean instead of an unbiased median, other methods could give better results. I found this in this paper, but it is not published in a scientific journal AFAIK. I'm not qualified to know if it would be a good idea to use method 8 everywhere.
@bzah Precipitation is clearly not normally distributed, but temperature is much closer to normal... and the median is better than the mean for sure because it takes the number of samples into account better. So I would say method 9 may be better for temperature, and method 8 for precipitation...
I spent some time jumping from library to library to see how others do this. It's kind of a mess, so I will summarize here what I learned.

TL;DR
IMO, it is safe and best to use

On the method to use
As stated above, method 8 of Hyndman & Fan should be used if we don't know the distribution function of the sample. In the case of day-of-year samples, I don't think there is a recognizable distribution function, even for temperatures. Instinctively, I would think the daily averaged temperature of a date is independent of the temperature on the same date in the previous year, but maybe I'm wrong.

About performance
In order to find a quantile, we need to look at each value in the array and reorder them. In the case of day-of-year values, it doesn't matter much because ::percentile_doy expects daily values or coarser, and even if the dataset spans 100 years, the sort will be done on 100 doy values. In fact, it might even perform better to use

About dask
To make things even simpler, Dask has its own implementation of quantiles.

About other methods
Because the fun never ends, there are other methods to compute quantiles.

Final note
@huard I think it is best to see case by case if
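To illustrate the performance point above, here is a minimal sketch of a per-day-of-year percentile on synthetic data. It assumes a 365-day calendar with no leap days and 100 years of daily values. It is not xclim's percentile_doy. It uses np.percentile with its default interpolation (type 7 of Hyndman & Fan) only as a stand-in, and the array names are made up for the example.

```python
import numpy as np

# Synthetic stand-in for 100 years of daily values on a 365-day calendar
# (assumption: no leap days, so the data reshapes cleanly to years x days).
rng = np.random.default_rng(0)
n_years, n_days = 100, 365
data = rng.normal(loc=10.0, scale=5.0, size=(n_years, n_days))

# Each day of year has its own small sample of n_years values, so every
# quantile is computed on only 100 values, whatever the total record length.
# np.percentile's default interpolation is used here (type 7), not type 8.
p90_per_doy = np.percentile(data, 90, axis=0)

print(p90_per_doy.shape)
```

The cost per day of year depends on the number of years, not on the total number of time steps, which is why the sort is cheap in this case.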
Thanks @bzah for the explanations. I'm happy to let you decide on a course of action. I suggest we focus on percentile_doy for now and not try to tweak DataArray.quantile.
[Warning] _calc_perc now returns a masked array ! It is now possible to use other linear interpolation methods for percentiles. By default, it uses the type 8 of Hyndman&Fan which is similar to climdex and icclim behaviors.
[#797] Update linear interpolation of percentiles
Description
Follow up to the discussion with @huard.
In other climate indices libraries such as climdex and icclim, the percentile interpolation method used is the 8th method of Hyndman and Fan. Both libraries have their own C/C++ implementation.
In xclim, numpy's default interpolation is used, which corresponds to the 7th method of Hyndman and Fan.
However, numpy does not implement the 8th method (see numpy/numpy#10736 (comment)).
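For reference, the difference between the methods can be sketched with the general Hyndman and Fan formulation, where a pair (alpha, beta) selects the type: (1, 1) for type 7, (1/3, 1/3) for type 8 and (3/8, 3/8) for type 9. The function below is a hypothetical helper written only for this illustration. It is not the climdex, icclim or xclim implementation, and it ignores NaN handling.

```python
import numpy as np

def hf_quantile(values, p, alpha, beta):
    """Quantile of a 1-D sample using the Hyndman & Fan (alpha, beta) family.

    Hypothetical helper for illustration only; no NaN or masked-value handling.
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    # Fractional 1-based position of the quantile in the sorted sample.
    m = alpha + p * (1.0 - alpha - beta)
    pos = n * p + m
    j = int(np.floor(pos))
    g = pos - j
    # Clamp both neighbours to the valid 1..n range at the tails.
    lo = min(max(j, 1), n)
    hi = min(max(j + 1, 1), n)
    # Linear interpolation between the two neighbouring order statistics.
    return (1.0 - g) * x[lo - 1] + g * x[hi - 1]

sample = np.arange(1, 11)  # 1, 2, ..., 10

q7 = hf_quantile(sample, 0.9, 1.0, 1.0)      # type 7, about 9.1
q8 = hf_quantile(sample, 0.9, 1 / 3, 1 / 3)  # type 8, about 9.633
q9 = hf_quantile(sample, 0.9, 3 / 8, 3 / 8)  # type 9, about 9.6
print(q7, q8, q9)
```

On this small sample, type 7 matches np.percentile(sample, 90), while types 8 and 9 give a higher 90th percentile. The gap shrinks as the sample size grows.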
It would be great if xclim:
np.percentile
calendar.percentile_doy
As usual, I can work on this issue.