Incorrect quantile computation #353
Comments
We're currently using quickselect for quantiles, so unfortunately it's a little complicated. I'll take a look at the implementation, see if there's a fix, and potentially replace it with a simpler (albeit less performant) implementation.
@tmcw I think quickselect doesn't matter here. What defines the behavior is this logic for sorted quantile. We seem to use some kind of a mix between Python's equivalent
I agree that we should stick to numpy's behavior, but just know that there are still controversies about their choices, which are currently being discussed by Python's core concerning the addition of the
Any update on this?
I think we should switch from the blend of nearest and midpoint to just one or the other. Reading through the Python documentation for quantiles, it seems like they shipped a version with just It's tempting to support a linear/nearest/etc. option so that this change doesn't have to be a major version bump, but if the current behavior isn't what anyone would want as a default, the change to fix this issue will be a major version bump.
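To make the "blend of nearest and midpoint" concrete, here is a minimal Python sketch of one way such a blend can arise. It is an assumption based on the behavior described in this thread, not a copy of the library's source: the name `blended_quantile_sorted` is hypothetical, and the rule it encodes (pick a single observation when `len * p` is fractional, average two neighbours when it is whole and the length is even) is only my reading of the discussion.

```python
import math


def blended_quantile_sorted(xs, p):
    """Hypothetical sketch of a nearest/midpoint blend on already-sorted data."""
    if not xs:
        raise ValueError("quantile requires at least one value")
    if p < 0 or p > 1:
        raise ValueError("p must be between 0 and 1")
    if p == 1:
        return xs[-1]
    if p == 0:
        return xs[0]

    idx = len(xs) * p
    if idx % 1 != 0:
        # Fractional position: return one observation, with no interpolation.
        return xs[math.ceil(idx) - 1]

    idx = int(idx)
    if len(xs) % 2 == 0:
        # Whole position and even length: average the two neighbours.
        return (xs[idx - 1] + xs[idx]) / 2
    # Whole position and odd length: return the observation at that index.
    return xs[idx]


data = [0, 0, 0.3, 1.2, 1.23, 3.5, 10, 12, 23.3, 32.1]
for p in (0.25, 0.5, 0.75):
    print(p, blended_quantile_sorted(data, p))
```

Under this reading, the quartiles are taken from single observations while the median of an even-length sample is a midpoint, so no single interpolation rule describes all three results.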
Somewhat related: this comment line is a bit misleading (I was misled by it and so made an incorrect comment): simple-statistics/src/quantile_sorted.js Line 27 in 1db09fc
p is definitely not an integer by this point. but
@tmcw In the long-established R stats ecosystem, the quantile function implements 9 different algorithms, of which
gives
Running all types
gives output (with annotations):
recovering the ss Python implementation as
While looking at the quantile computation, I have noticed that the quantile calculation does not match up with the calculations returned by numpy in python (can be considered as the reference).
There are 5 different ways to interpolate quantiles when a quantile does not land on an exact value: linear, lower, higher, midpoint, and nearest. However, it seems like none of these matches the way simple-statistics computes percentiles.
Using data = [0, 0, 0.3, 1.2, 1.23, 3.5, 10, 12, 23.3, 32.1] and computing the 25th percentile, the median, and the 75th percentile, we get:
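As a rough illustration of how the five interpolation rules differ, here is a pure-Python sketch that places the quantile at position `q * (n - 1)` in the sorted data and then applies each rule. It is not the code used for this report and does not call numpy: the function name `quantile_sorted_by_method` is hypothetical, and tie handling for `nearest` (Python's `round`, which rounds halves to the even index) is an assumption of this sketch.

```python
import math


def quantile_sorted_by_method(xs, q, method="linear"):
    """Quantile of already-sorted data at virtual position q * (n - 1)."""
    if not xs:
        raise ValueError("quantile requires at least one value")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")

    pos = q * (len(xs) - 1)           # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    lower, upper = xs[lo], xs[hi]     # the two surrounding observations

    if method == "linear":
        return lower + (pos - lo) * (upper - lower)
    if method == "lower":
        return lower
    if method == "higher":
        return upper
    if method == "midpoint":
        return (lower + upper) / 2
    if method == "nearest":
        # Assumption: halves round to the even index (Python's round()).
        return xs[round(pos)]
    raise ValueError(f"unknown method: {method}")


data = [0, 0, 0.3, 1.2, 1.23, 3.5, 10, 12, 23.3, 32.1]
for method in ("linear", "lower", "higher", "midpoint", "nearest"):
    print(method, [quantile_sorted_by_method(data, q, method) for q in (0.25, 0.5, 0.75)])
```

For this data the 25th percentile sits at position 2.25, between 0.3 and 1.2, so the rules spread out visibly there; the median sits at position 4.5, between 1.23 and 3.5.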
For reproducibility, this is the code I used: