This repository has been archived by the owner on May 14, 2018. It is now read-only.

Identify outliers #6

Open
karthik opened this issue Mar 31, 2014 · 4 comments

Comments

@karthik
Owner

karthik commented Mar 31, 2014

So read all numeric columns and return rows that seem like they contain outliers.
We can use the row index to make a whisker page (@sckott could help here).
@hilaryparker Want to take this one?
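A minimal sketch of the scan described above. It is written in Python rather than R purely for illustration, and it assumes the table is a list of dicts. A column counts as numeric only if every value is an int or float, and a value is flagged when it falls outside Tukey's 1.5 × IQR fences, which is the rule standard box-and-whisker plots use to decide which points to draw beyond the whiskers. `numeric_columns`, `tukey_fences`, and `outlier_rows` are hypothetical names, not functions from this repository. The returned row indices are what a whisker-plot page could highlight.

```python
import statistics
from numbers import Real


def numeric_columns(rows):
    """Names of columns whose values are all real numbers (bools excluded)."""
    if not rows:
        return []
    names = []
    for name in rows[0]:
        values = [row.get(name) for row in rows]
        if all(isinstance(v, Real) and not isinstance(v, bool) for v in values):
            names.append(name)
    return names


def tukey_fences(values, k=1.5):
    """Return (low, high) = (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outlier_rows(rows, k=1.5):
    """Sorted indices of rows with at least one numeric value outside the fences."""
    flagged = set()
    for name in numeric_columns(rows):
        values = [row[name] for row in rows]
        low, high = tukey_fences(values, k)
        flagged.update(i for i, v in enumerate(values) if v < low or v > high)
    return sorted(flagged)


# Hypothetical example table: "site" is skipped because it is not numeric.
rows = [
    {"site": "a", "mass": 10, "length": 30},
    {"site": "b", "mass": 11, "length": 31},
    {"site": "c", "mass": 12, "length": 29},
    {"site": "d", "mass": 11, "length": 30},
    {"site": "e", "mass": 10, "length": 90},
    {"site": "f", "mass": 12, "length": 32},
    {"site": "g", "mass": 11, "length": 31},
    {"site": "h", "mass": 95, "length": 30},
]
print(outlier_rows(rows))  # [4, 7]
```

Quartile conventions differ between tools (this uses `statistics.quantiles` with its default method), so fence values can shift slightly on small samples.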

@sckott
Contributor

sckott commented Mar 31, 2014

@hilaryparker assign yourself if you want to do it; otherwise I can take a whack at it.

@emhart

emhart commented Mar 31, 2014

@hilaryparker in issue #1 we talked about visualizing this. I wrote a little gist to tackle it using an IQR cutoff. It may or may not be useful: https://gist.github.com/emhart/9025719

@davharris
Contributor

Possibly relevant CRAN packages:

The first one only does simple one-dimensional checks, possibly assuming everything is Gaussian, so it's probably not useful. The second one looks much more interesting: it looks at the multivariate distribution and claims to use more robust methods.

I've never used either package or checked them for correctness.
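To make the one-dimensional vs. multivariate distinction concrete, here is a minimal Python sketch of a plain Mahalanobis distance for two columns. It is deliberately the non-robust version (ordinary mean and sample covariance), so it is not meant to reproduce either package; robust methods swap in resistant estimates of location and scatter, such as the minimum covariance determinant. `mahalanobis_2d` is a hypothetical helper. In the example, the last point (2, 8) lies inside the range of each column taken on its own but breaks the trend between x and y, so it gets the largest distance.

```python
import math
import statistics


def mahalanobis_2d(xs, ys):
    """Mahalanobis distance of each (x, y) point from the sample mean,
    using the ordinary (non-robust) mean and sample covariance."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2  # must be non-zero (e.g. points not all on one line)
    # Closed-form inverse of the 2x2 covariance matrix
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    dists = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        d2 = ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy
        dists.append(math.sqrt(d2))
    return dists


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 2]
ys = [1, 2, 3, 4, 5, 6, 7, 8, 9, 8]
d = mahalanobis_2d(xs, ys)
print(max(range(len(d)), key=d.__getitem__))  # 9, i.e. the point (2, 8)
```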

@hilaryparker

Just to close the loop on this: the function for identifying outliers currently flags values outside mean ± 1.96 × SD. It's also flexible, so you can switch the outlier detection method to 1.5 × IQR, etc., if you so choose.
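A minimal Python sketch of a single-column detector with a switchable rule, mirroring the behaviour described above: mean ± 1.96 × SD by default, with 1.5 × IQR as an option. The name `find_outliers` and its `method`/`k` parameters are hypothetical; this is not the function in the repository.

```python
import statistics


def find_outliers(values, method="sd", k=None):
    """Return indices of values flagged as outliers.

    method="sd":  outside mean +/- k * SD            (default k = 1.96)
    method="iqr": outside [Q1 - k*IQR, Q3 + k*IQR]   (default k = 1.5)
    """
    if method == "sd":
        k = 1.96 if k is None else k
        centre = statistics.fmean(values)
        spread = statistics.stdev(values)  # sample standard deviation
        low, high = centre - k * spread, centre + k * spread
    elif method == "iqr":
        k = 1.5 if k is None else k
        q1, _, q3 = statistics.quantiles(values, n=4)
        low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    else:
        raise ValueError(f"unknown method: {method!r}")
    return [i for i, v in enumerate(values) if v < low or v > high]


mass = [10, 11, 12, 11, 10, 12, 11, 95]
print(find_outliers(mass))                # [7]
print(find_outliers(mass, method="iqr"))  # [7]
```

Keeping the rule behind a `method` switch means new detectors can be added without changing how callers collect the flagged indices.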
