Attribute drift detection #124

Open
rajivsam opened this issue Jun 19, 2020 · 0 comments
rajivsam (Contributor) commented Jun 19, 2020

Implement a feature to detect attribute drift. We already have drift detection at the dataset (joint distribution) level; Azure appears to offer this at the attribute level as well. This is not difficult to do. It requires checking the nature of each attribute:
(1) If it is continuous (numeric, i.e. the numpy dtype is float), use the two-sample Kolmogorov-Smirnov test to check whether the attribute has the same distribution in the training data and in the data received in deployment: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.ks_2samp.html
(2) If it is categorical (the numpy dtype is object), use the chi-square test of independence:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html
This requires a contingency table, which we can build with the group-by / crosstab functionality in pandas:
https://stackoverflow.com/questions/29901436/is-there-a-pythonic-way-to-do-a-contingency-table-in-pandas
A sketch covering both cases is given below.
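
A minimal sketch of what this could look like, assuming the training and deployment data arrive as pandas DataFrames with matching columns. The function name `detect_attribute_drift` and the 0.05 significance threshold are illustrative, not part of this issue:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, ks_2samp


def detect_attribute_drift(train_df, deployed_df, alpha=0.05):
    """Flag attributes whose distribution differs between training and deployment data."""
    drifted = {}
    for col in train_df.columns:
        if col not in deployed_df.columns:
            continue
        if np.issubdtype(train_df[col].dtype, np.floating):
            # Continuous attribute: two-sample Kolmogorov-Smirnov test.
            stat, p_value = ks_2samp(train_df[col].dropna(), deployed_df[col].dropna())
        else:
            # Categorical attribute: chi-square test of independence on a
            # categories-by-dataset contingency table built with pandas.
            combined = pd.concat([
                pd.DataFrame({"value": train_df[col], "dataset": "train"}),
                pd.DataFrame({"value": deployed_df[col], "dataset": "deployed"}),
            ])
            table = pd.crosstab(combined["value"], combined["dataset"])
            stat, p_value, dof, expected = chi2_contingency(table)
        # A small p-value suggests the attribute's distribution has shifted.
        if p_value < alpha:
            drifted[col] = p_value
    return drifted
```

Whether to report raw p-values, apply a multiple-testing correction across attributes, or expose the threshold as a configuration option is a design choice to settle when implementing this.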

Note: Check https://towardsdatascience.com/how-to-compare-two-distributions-in-practice-8c676904a285 to see if a completely discrete non-parametric test makes sense.

@rajivsam rajivsam self-assigned this Jun 19, 2020