quantile_rank #119

Open
jonlachlan opened this issue Aug 5, 2015 · 15 comments

Comments

@jonlachlan

I'm looking for a way to calculate the quantile_rank of a specific data point. For example, I may want to know that a data point is in the 80th percentile, or in the 3rd quartile. I think I'd prefer a precise value, for example "this value is at the 3.245 quartile", which I can round to 3 if I want.

I've done this in Postgres using the ntile() window function (http://www.postgresql.org/docs/9.4/static/functions-window.html), but I'd like to do it in Javascript.

@jonlachlan
Author

I suppose if I have a sorted array, I know the rank of each value, so I can take that as a fraction of the array length and then multiply by the number of quantiles I'm looking for.

So if 75 is the 100th element in an array of 200, then its rank is 100/200, or 0.5. Multiply by 100 to get the percentile rank (50), or multiply by 4 for the quartile rank (2).
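
The arithmetic above can be sketched in JavaScript. This is only an illustration: `quantileRankOfIndex` is a hypothetical helper name, not part of simple-statistics, and it assumes a 1-based rank in an already sorted array.

```javascript
// Hypothetical helper (not a simple-statistics function).
// rank: 1-based position of the value in the sorted array
// length: number of elements in the array
// quantiles: 100 for percentiles, 4 for quartiles, and so on
function quantileRankOfIndex(rank, length, quantiles) {
  return (rank / length) * quantiles;
}

var percentile = quantileRankOfIndex(100, 200, 100); // 50
var quartile = quantileRankOfIndex(100, 200, 4); // 2
```

Rounding the result (or taking `Math.floor` or `Math.ceil`, depending on the convention you want) gives a whole-number quantile.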

I think I'm good to do this on my own, but thanks anyways :)

@tmcw
Member

tmcw commented Aug 5, 2015

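So the way to do this would probably be something like:

var myNumber = ...;
var myArray = [...];
// Sort numerically: the default sort() compares elements as strings.
var qRank = ss.bisect(myArray.sort(function (a, b) { return a - b; }), myNumber) / myArray.length;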

@jonlachlan
Author

Hmm I'm not seeing a bisect function, is that in a new ss dist?

@tmcw
Member

tmcw commented Aug 6, 2015

There currently isn't one - I'm proposing we could include one

@denizdogan

Has anyone made any progress on this? I have an unsorted array of values as [x, y, z, w] and I'm looking to know what percentile z is in, is that a separate issue or is this the one? I noticed that stats-lite has such a function, but since I already use simple-statistics, I'm not too keen on switching now.

@Yomguithereal
Member

@denizdogan it seems the bisect function is now part of the current release on npm. Have you tried @tmcw's solution?

@Yomguithereal
Member

What's more, I don't see a function such as the one you need in the stats-lite module.

@Yomguithereal
Member

@tmcw on a side note, should we implement a quantile_rank function based on scipy? https://github.com/scipy/scipy/blob/v1.0.0/scipy/stats/stats.py#L1709-L1802

@denizdogan

@Yomguithereal I spoke too quickly, the function in stats-lite is slightly different than the one I proposed. Anyways, I honestly don't understand how I'm supposed to use the bisect function, its signature is different than the one in the comment above. :/

@Yomguithereal
Member

@denizdogan yes, it seems the signature changed to take a function rather than an array. So instead, you should probably use a basic binary search (I assume the value you want the percentile of is in your array; otherwise it would be slightly more complex):

function binarySearch(array, value) {
  var lo = 0;
  var hi = array.length - 1;
  var mid;
  var current;

  // Invariant: if value is present, its index lies in [lo, hi].
  while (lo <= hi) {
    mid = (lo + hi) >>> 1; // floor of the midpoint

    current = array[mid];

    if (current > value) {
      hi = mid - 1;
    }
    else if (current < value) {
      lo = mid + 1;
    }
    else {
      return mid;
    }
  }

  // Not found.
  return -1;
}

Then:

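// Sort numerically: the default sort() compares elements as strings.
var qRank = binarySearch(myArray.sort(function (a, b) { return a - b; }), myNumber) / myArray.length;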
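
If the value might not be in the array, one option is to search for the insertion point instead of an exact match. The following is a minimal sketch under that assumption: `lowerBound` and `quantileRankOf` are hypothetical names, not simple-statistics functions, and the numeric comparator is there because the default `sort()` compares elements as strings.

```javascript
// Hypothetical helper: index of the first element >= value in a sorted
// array, or sorted.length if every element is smaller.
function lowerBound(sorted, value) {
  var lo = 0;
  var hi = sorted.length;

  while (lo < hi) {
    var mid = (lo + hi) >>> 1;

    if (sorted[mid] < value) {
      lo = mid + 1;
    }
    else {
      hi = mid;
    }
  }

  return lo;
}

// Fraction of elements strictly below value (between 0 and 1).
function quantileRankOf(array, value) {
  // Copy before sorting so the caller's array is left untouched.
  var sorted = array.slice().sort(function (a, b) { return a - b; });

  return lowerBound(sorted, value) / sorted.length;
}

var rank = quantileRankOf([10, 2, 33, 4], 5); // 2 of 4 elements are below 5, so 0.5
```

Counting only the elements strictly below the value is one convention among several; counting elements less than or equal to it would give a different rank for values that are present.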

@Yomguithereal
Member

To get the percentile, multiply by 100 obviously :)

@Yomguithereal
Member

Or, even simpler if you don't care about performance:

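// Sort numerically: the default sort() compares elements as strings.
var qRank = myArray.sort(function (a, b) { return a - b; }).indexOf(myNumber) / myArray.length;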

@denizdogan

@Yomguithereal Thanks a lot, much appreciated! :)

@Yomguithereal
Member

Note however that if the value occurs multiple times in the array, this formula is not completely correct, and there are many ways to answer the question. For instance, scipy by default takes the mean of the indices where your value is found in the sorted array.
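
One way to handle repeated values is to average the positions of all the matches, in the spirit of the scipy behaviour mentioned above. This is a rough sketch, not scipy's or simple-statistics' actual implementation: `meanQuantileRank` is a hypothetical name, positions are counted from 1, and the value is assumed to be present in the array.

```javascript
// Hypothetical helper: quantile rank (between 0 and 1) using the mean
// 1-based position of every occurrence of value in the sorted array.
// Returns NaN when value does not occur in the array.
function meanQuantileRank(array, value) {
  var below = 0; // elements strictly smaller than value
  var equal = 0; // elements equal to value

  for (var i = 0; i < array.length; i++) {
    if (array[i] < value) below++;
    else if (array[i] === value) equal++;
  }

  if (equal === 0) return NaN;

  // Once sorted, the matches occupy 1-based positions
  // below + 1 ... below + equal, so their mean is below + (equal + 1) / 2.
  return (below + (equal + 1) / 2) / array.length;
}

var tiedRank = meanQuantileRank([1, 2, 2, 3], 2); // (1 + 1.5) / 4 = 0.625
```

Counting instead of sorting keeps this a single pass over the array; the result is the same as averaging the matching positions after a numeric sort.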

Yomguithereal added a commit that referenced this issue Mar 20, 2018
@Yomguithereal
Member

@denizdogan @tmcw I drafted a PR to add the quantileRankSorted method to the library. @denizdogan you should probably base your work on it, since it should be more precise than what I told you earlier.

Yomguithereal added a commit that referenced this issue Mar 20, 2018
Yomguithereal added a commit that referenced this issue Mar 23, 2018
Yomguithereal added a commit that referenced this issue Mar 23, 2018