Cramer's V measure for feature drift #1786

Answered by nirhutnik
arpan-sil asked this question in Q&A

@arpan-sil Thank you so much for your question!

So, this is right: Cramer's V gets closer to 1 the more strongly the 2 variables are associated.
However, our "2 variables" here are not "variable A in train" and "variable A in test".

Cramer's V (and by proxy, Chi-Square) checks for co-occurrences in the data, meaning how many times feature A had value x while feature B had value y in the same sample. It counts all of these and uses the counts to calculate the Chi-Square statistic, and Cramer's V is just a simple normalization of that statistic.
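To make the "count co-occurrences, then normalize" idea concrete, here is a minimal NumPy sketch. The helper names `contingency_table` and `cramers_v` are made up for illustration and are not taken from any library. It uses the textbook formula V = sqrt(chi2 / (n * min(rows - 1, cols - 1))) with no bias or continuity correction.

```python
import numpy as np


def contingency_table(a, b):
    """Count how often each value of `a` co-occurs with each value of `b`.

    `a` and `b` must be aligned: position i in both describes the same sample.
    """
    _, a_idx = np.unique(np.asarray(a), return_inverse=True)
    _, b_idx = np.unique(np.asarray(b), return_inverse=True)
    table = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    # Add 1 to cell (value of a, value of b) for every sample.
    np.add.at(table, (a_idx, b_idx), 1)
    return table


def cramers_v(table):
    """Cramer's V from a table of co-occurrence counts.

    Assumes every row and every column holds at least one count.
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()
    min_dim = min(table.shape) - 1
    if n == 0 or min_dim == 0:
        # A constant variable carries no association information.
        return 0.0
    # Counts expected if the two variables were independent.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    # Normalize Chi-Square into the [0, 1] range.
    return float(np.sqrt(chi2 / (n * min_dim)))
```

For example, `cramers_v(contingency_table(["x", "x", "y", "y"], ["p", "p", "q", "q"]))` describes two features that always move together, while pairing the same first list with `["p", "q", "p", "q"]` describes two features that are unrelated.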

However, when comparing 2 different datasets (train and test), there are no co-occurrences. This is the same feature in different datasets with no overlap (a sample cannot appear in both train and test), so there is no single sample that has both a "train value" and a "test value" to count.
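One way to still get co-occurrences out of two separate datasets is to stack them and treat "which dataset did this sample come from" as the second variable. This is an assumption on my part and is not spelled out in the answer above. The sketch below uses `pandas.crosstab` and `scipy.stats.contingency.association` (available in SciPy 1.7 and later); the function name `drift_cramers_v` is hypothetical. Under this construction a value near 0 means the feature looks the same in both datasets, and a value near 1 means strong drift.

```python
import pandas as pd
from scipy.stats.contingency import association


def drift_cramers_v(train_values, test_values):
    """Cramer's V between a categorical feature's value and its dataset of origin.

    Assumes the feature takes at least two distinct values overall;
    with a single category the statistic is undefined.
    """
    train_values = list(train_values)
    test_values = list(test_values)
    # Stack both datasets into one column of feature values...
    values = pd.Series(train_values + test_values, name="value")
    # ...and a parallel column recording where each sample came from.
    origin = pd.Series(
        ["train"] * len(train_values) + ["test"] * len(test_values),
        name="dataset",
    )
    # Rows: dataset of origin. Columns: feature value. Cells: counts.
    table = pd.crosstab(origin, values)
    return float(association(table.to_numpy(), method="cramer"))
```

With identical category proportions in train and test this gives a score of 0, and with completely disjoint categories it gives 1.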


Answer selected by noamzbr