Delete incorrect measures (outliers) #66

Open
jacobworsoe opened this issue Jun 14, 2017 · 3 comments

Comments

@jacobworsoe

One of my measurements reported a load time of 54 seconds, which is clearly an outlier and not representative of the actual load time: https://jacobworsoe.github.io/speedtracker/default/?period=year

The problem is that a spike like that makes the rest of the chart almost flatline, so it becomes difficult to read.

It would be cool to be able to delete such outliers. Maybe this could be done from https://speedtracker.org using the encryption key, so that only the admin can delete measurements?

@edqwebdev

I also run into this issue a lot, and it makes viewing the long-term trends almost impossible. It would be fantastic if there were some easy way to remove the outliers.

@eduardoboucas
Member

If we could make the chart library aware of this and ignore these outliers when defining the scale, that would be ideal. It would avoid deleting actual data from the data files, which could potentially lead to corrupt data files.
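
To illustrate the idea, here is a minimal sketch in Python (purely illustrative, not SpeedTracker code). It picks an upper bound for the y-axis that ignores extreme spikes, while leaving the data itself untouched. The function name `axis_upper_bound` and the choice of Tukey's fence (Q3 + 1.5 × IQR) as the cutoff are assumptions made for this example; how the chart library would actually accept such a bound is not covered here.

```python
from statistics import quantiles


def axis_upper_bound(values, k=1.5):
    """Return a y-axis maximum that ignores extreme high outliers.

    Hypothetical helper: uses Tukey's fence (Q3 + k * IQR) as the cutoff.
    The data is never modified; only the suggested axis bound changes.
    """
    if not values:
        return None
    if len(values) < 4:
        # Too few points to estimate quartiles meaningfully.
        return max(values)
    q1, _, q3 = quantiles(values, n=4)
    fence = q3 + k * (q3 - q1)
    # The largest value that is still inside the fence becomes the bound.
    return max(v for v in values if v <= fence)


# Load times in seconds, including one 54-second spike.
load_times = [2.1, 2.3, 2.0, 2.4, 2.2, 54.0, 2.5, 2.3]
bound = axis_upper_bound(load_times)
```

With a bound like this, the 54-second point would simply be drawn off the top of the chart (or clipped), and the rest of the series would stay readable.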

I'll try to look into this in the next few days. If anyone else wants to have a go as well, let me know how I can help.

@jacobworsoe
Author

@eduardoboucas thanks a lot for taking the time to look into this!

I actually started by deleting the entry in the data files manually, but quickly decided that I would most likely mess up the syntax of the file :)

A statistical solution would be cool. Something like: data points that are more than 4 standard deviations from the average should not be shown in the chart. But to avoid false positives (it could be that the site simply became a lot slower or faster all of a sudden), an outlier should only be removed if it is a single isolated high/low measurement. If there are multiple measurements with similarly high numbers, it must mean that the measurements are correct and the site is indeed now a lot faster/slower.
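
A rough sketch of that rule, in Python just for illustration. The function name `isolated_outliers` is made up for this example, and there is one deliberate variation: the mean and standard deviation for each point are computed from the *other* measurements, so that a big spike does not inflate its own threshold. A point is only reported if no directly adjacent measurement is flagged in the same direction.

```python
from statistics import mean, stdev


def isolated_outliers(values, threshold=4.0):
    """Return indices of single, isolated spikes or dips.

    Hypothetical helper. A point is a candidate when it lies more than
    `threshold` standard deviations from the mean of all other points.
    Candidates with a neighbouring candidate on the same side are kept,
    since consecutive extreme values suggest a real change in speed.
    """
    values = list(values)

    def direction(i):
        # +1 for a high candidate, -1 for a low one, 0 for a normal point.
        others = values[:i] + values[i + 1:]
        if len(others) < 2:
            return 0
        mu, sigma = mean(others), stdev(others)
        diff = values[i] - mu
        if sigma == 0:
            return (diff > 0) - (diff < 0)
        z = diff / sigma
        return 1 if z > threshold else -1 if z < -threshold else 0

    signs = [direction(i) for i in range(len(values))]
    isolated = []
    for i, sign in enumerate(signs):
        if sign == 0:
            continue
        prev_same = i > 0 and signs[i - 1] == sign
        next_same = i + 1 < len(signs) and signs[i + 1] == sign
        if not (prev_same or next_same):
            isolated.append(i)
    return isolated


spike = [2.1, 2.3, 2.0, 2.4, 2.2, 54.0, 2.5, 2.3, 2.2, 2.1]
shift = [2.1, 2.3, 2.0, 2.4, 2.2, 2.5, 54.0, 53.0, 55.0]
spike_result = isolated_outliers(spike)
shift_result = isolated_outliers(shift)
```

In the first series only the single 54-second measurement is reported; in the second, where the site stays slow for several measurements in a row, nothing is reported. The threshold of 4 comes from the suggestion above and would likely need tuning on real data.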

As far as I can see, it can be difficult to come up with a statistical rule that is always correct. So maybe it would be faster to build a feature that allows manual deletion from the data files?
