Attribute drift detection #124

Open
rajivsam opened this issue Jun 19, 2020 · 0 comments
rajivsam (Contributor) commented Jun 19, 2020

Implement a feature to detect attribute drift. We already have drift detection at the dataset (joint distribution) level; Azure appears to offer this at the attribute level as well. This is not difficult to do. It requires checking the nature of each attribute:
(1) If it is continuous (numeric, i.e. the numpy dtype is float), use the two-sample Kolmogorov-Smirnov test to check whether the attribute has the same distribution in the training data and in the data received in deployment: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html
(2) If it is categorical (the numpy dtype is object), use the chi-square test of independence:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html
This requires a contingency table, which we can build with the group-by / crosstab functionality in pandas:
https://stackoverflow.com/questions/29901436/is-there-a-pythonic-way-to-do-a-contingency-table-in-pandas
A sketch covering both cases is given below.
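
A minimal sketch of what this could look like, assuming the training and deployment data arrive as pandas DataFrames with matching columns. The function name `detect_attribute_drift` and the 0.05 significance threshold are illustrative, not part of this issue:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, ks_2samp


def detect_attribute_drift(train_df, deployed_df, alpha=0.05):
    """Flag attributes whose distribution differs between training and deployment data."""
    drifted = {}
    for col in train_df.columns:
        if col not in deployed_df.columns:
            continue
        if np.issubdtype(train_df[col].dtype, np.floating):
            # Continuous attribute: two-sample Kolmogorov-Smirnov test.
            stat, p_value = ks_2samp(train_df[col].dropna(), deployed_df[col].dropna())
        else:
            # Categorical attribute: chi-square test of independence on a
            # categories-by-dataset contingency table built with pandas.
            combined = pd.concat([
                pd.DataFrame({"value": train_df[col], "dataset": "train"}),
                pd.DataFrame({"value": deployed_df[col], "dataset": "deployed"}),
            ])
            table = pd.crosstab(combined["value"], combined["dataset"])
            stat, p_value, dof, expected = chi2_contingency(table)
        # A small p-value suggests the attribute's distribution has shifted.
        if p_value < alpha:
            drifted[col] = p_value
    return drifted
```

Whether to report raw p-values, apply a multiple-testing correction across attributes, or expose the threshold as a configuration option is a design choice to settle when implementing this.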

Note: Check https://towardsdatascience.com/how-to-compare-two-distributions-in-practice-8c676904a285 to see if a completely discrete non-parametric test makes sense.

@rajivsam rajivsam self-assigned this Jun 19, 2020