stats: Add Q1, Q3, IQR (feature request) #87

Open
sjackman opened this issue Jul 12, 2017 · 2 comments · May be fixed by #273

Comments

@sjackman

Hi. Would you consider adding the first and third quartile (25th and 75th percentile) to the output of stats? I'd like to calculate the interquartile range (IQR = Q3 - Q1), for the purposes of calculating the boxplot whisker thresholds:
lower = Q1 - 1.5 * IQR
upper = Q3 + 1.5 * IQR

Although it's easily calculated from Q1 and Q3, it'd be helpful to also output the IQR.
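For illustration, the calculation requested above can be sketched in Python. This is not xsv code: the function names are hypothetical, and the quartiles use linear interpolation between sorted values, which is only one of several common conventions (other tools may report slightly different Q1 and Q3 for the same data).

```python
def quantile(sorted_values, q):
    """Return the q-th quantile (0 <= q <= 1) of pre-sorted data.

    Assumption: linear interpolation between neighbouring values.
    Other quantile conventions exist and give different results.
    """
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q       # fractional index into the data
    lo = int(pos)                            # index at or below pos
    hi = min(lo + 1, len(sorted_values) - 1) # next index, clamped to the end
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def whisker_thresholds(values):
    """Compute Q1, Q3, IQR and the boxplot whisker thresholds."""
    data = sorted(values)
    q1 = quantile(data, 0.25)
    q3 = quantile(data, 0.75)
    iqr = q3 - q1
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "lower": q1 - 1.5 * iqr,  # lower = Q1 - 1.5 * IQR
        "upper": q3 + 1.5 * iqr,  # upper = Q3 + 1.5 * IQR
    }


if __name__ == "__main__":
    print(whisker_thresholds([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```

Values below lower or above upper would be the points a boxplot draws as outliers.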

@BurntSushi
Owner

Do other CSV tools do this?

I'm not necessarily opposed to doing this, but I would like to solidify some sort of policy around these things before going forward. Should we be relaxed about this and add any statistic we come up with? Or should we stick to a small set of very common statistics and avoid niche stats like this?

@sjackman
Author

sjackman commented Jul 12, 2017

Hi, Andrew. Thanks for your quick response.

I use Miller (mlr) a lot. It can compute arbitrary percentiles, each named with a p prefix followed by the percentile number, e.g.

mlr stats1 -a p10,p25,p50,p75,p90

See http://johnkerl.org/miller-releases/miller-2.2.0/doc/reference.html#stats1
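As a rough sketch of what a percentile spec like the one above asks for, here is a small Python function that parses names such as p25 and computes each percentile. It is not Miller's implementation: the function name is hypothetical, and the linear interpolation rule is an assumption, so Miller's own output may differ for the same input.

```python
def percentiles(values, spec):
    """Compute percentiles named in a comma-separated spec such as "p25,p75".

    Assumption: linear interpolation between sorted values; this is a
    sketch, not a reproduction of any particular tool's percentile rule.
    """
    data = sorted(values)
    if not data:
        raise ValueError("need at least one value")
    result = {}
    for name in spec.split(","):
        name = name.strip()
        if not name.startswith("p"):
            raise ValueError(f"unsupported statistic: {name!r}")
        pct = float(name[1:])                # "p25" -> 25.0
        if not 0 <= pct <= 100:
            raise ValueError(f"percentile out of range: {name!r}")
        pos = (len(data) - 1) * pct / 100    # fractional index into the data
        lo = int(pos)
        hi = min(lo + 1, len(data) - 1)      # clamp at the last element
        result[name] = data[lo] + (data[hi] - data[lo]) * (pos - lo)
    return result


if __name__ == "__main__":
    stats = percentiles(range(101), "p10,p25,p50,p75,p90")
    print(stats)
    print("IQR:", stats["p75"] - stats["p25"])
```

With p25 and p75 available, the IQR is a single subtraction, which is why exposing arbitrary percentiles covers the Q1/Q3 request as well.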

Other than percentiles, it computes…

  count     Count instances of fields
  mode      Find most-frequently-occurring values for fields; first-found wins tie
  sum       Compute sums of specified fields
  mean      Compute averages (sample means) of specified fields
  stddev    Compute sample standard deviation of specified fields
  var       Compute sample variance of specified fields
  meaneb    Estimate error bars for averages (assuming no sample autocorrelation)
  skewness  Compute sample skewness of specified fields
  kurtosis  Compute sample kurtosis of specified fields
  min       Compute minimum values of specified fields
  max       Compute maximum values of specified fields

m15a pushed a commit to m15a/xsv that referenced this issue May 28, 2021
m15a pushed a commit to m15a/xsv that referenced this issue May 31, 2021
@m15a m15a linked a pull request Jun 1, 2021 that will close this issue