Information theory metrics calculation issue #16

Open
surbhir08 opened this issue Apr 29, 2022 · 0 comments


Hi Team,
Thanks for addressing the issue of density estimation for multidimensional data.
I have a few questions as I am trying to implement information theory metrics:

  • Q1. Is this method suitable for high-dimensional tabular data?
  • Q2. I have been running RBIG mutual_info() on tabular data, and the results are exactly the same for every target. I checked them with scikit-learn's MI score and got different values for each target (the results are not normalized in either case, scikit-learn or RBIG). I don't understand the discrepancy. Can you help me with this?

Below is the code I used:

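X: list of feature columns (attributes not in Y)
Y: list of target columns (attributes not in X), let's say y1, y2, y3, y4

import numpy as np
import pandas as pd
# MutualInfoRBIG comes from the rbig package (import path not shown here)

def calculate_miscore_xa(data, X, Y):
    mis_xy = []
    y_attributes = []
    for y in Y:
        # One model per target: all feature columns X against the single column y
        rbig_model = MutualInfoRBIG(max_layers=10000)
        rbig_model.fit(data[X], data[[y]])
        # Multiplying by ln(2) converts a value in bits to nats
        mi_rbig = rbig_model.mutual_info() * np.log(2)
        mis_xy.append(mi_rbig)
        y_attributes.append(y)  # record which target this score belongs to
    mis_xy = pd.DataFrame({'Y': y_attributes, 'I(Xi,Y)': mis_xy})
    return mis_xy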
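As an RBIG-independent sanity check, one option is a closed-form baseline that assumes (X, y) is jointly Gaussian, where I(X; y) = 0.5 * (log|Cov_X| + log|Cov_y| - log|Cov_XY|) in nats. This is only a sketch: `gaussian_mi` and `gaussian_mi_per_target` are hypothetical helpers, and the estimate is exact only under the Gaussian assumption (otherwise it reflects linear dependence only). On synthetic targets with deliberately different dependence on X, it should return clearly different values, which is the behaviour one would expect from any MI estimator run with the loop above.

```python
import numpy as np
import pandas as pd


def gaussian_mi(x, y):
    """I(x; y) in nats, ASSUMING (x, y) is jointly Gaussian.

    I = 0.5 * (log|Cov_x| + log|Cov_y| - log|Cov_xy|)
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)  # n x d_x
    y = np.asarray(y, dtype=float).reshape(len(y), -1)  # n x d_y
    cov = np.cov(np.hstack([x, y]), rowvar=False)       # joint covariance
    d_x = x.shape[1]
    _, logdet_x = np.linalg.slogdet(cov[:d_x, :d_x])
    _, logdet_y = np.linalg.slogdet(cov[d_x:, d_x:])
    _, logdet_xy = np.linalg.slogdet(cov)
    return 0.5 * (logdet_x + logdet_y - logdet_xy)


def gaussian_mi_per_target(data, X, Y):
    """Same loop shape as calculate_miscore_xa, using the Gaussian baseline."""
    scores = [gaussian_mi(data[X], data[[y]]) for y in Y]
    return pd.DataFrame({'Y': list(Y), 'I(X,Y)': scores})


# Synthetic data: three targets with known, different dependence on X
rng = np.random.default_rng(0)
n = 5000
feats = rng.normal(size=(n, 3))
data = pd.DataFrame(feats, columns=['x1', 'x2', 'x3'])
data['y1'] = feats.sum(axis=1) + 0.1 * rng.normal(size=n)  # strong dependence
data['y2'] = feats[:, 0] + rng.normal(size=n)               # moderate dependence
data['y3'] = rng.normal(size=n)                              # independent of X

result = gaussian_mi_per_target(data, ['x1', 'x2', 'x3'], ['y1', 'y2', 'y3'])
print(result)
```

For these targets the Gaussian values are known in closed form: 0.5 * ln(301) for y1, 0.5 * ln(2) for y2 and 0 for y3, so identical scores across y1..y3 from any estimator on data like this would point to a problem in the pipeline rather than in the data.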

Basically, the results I get are identical:
I(X,y1) = I(X,y2) = I(X,y3) = I(X,y4)
That seemed unusual, so I checked the results using https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mutual_info_score.html, and there the values of I(X,y1), I(X,y2), I(X,y3) and I(X,y4) differ.
Can you help me understand whether I am doing anything wrong?
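When comparing against scikit-learn, note that `mutual_info_score` scores two 1-D label vectors, treats every distinct value as a discrete class, and reports MI in nats. So it is not directly the same quantity as a joint I(X, y) over several continuous feature columns. The sketch below illustrates this on toy inputs chosen for illustration only; the continuous case shows that independent real-valued samples still score ln(n), because each float becomes its own "label".

```python
import warnings

import numpy as np
from sklearn.metrics import mutual_info_score

# Discrete labels: MI is computed from the contingency table, in nats
a = np.array([0, 0, 1, 1])
same = mutual_info_score(a, a)                         # equals H(a) = ln 2
indep = mutual_info_score(a, np.array([0, 1, 0, 1]))   # balanced 2x2 table -> 0

# Continuous values: every distinct float becomes its own class,
# so two independent samples of size n score ln(n)
rng = np.random.default_rng(0)
x, y = rng.normal(size=100), rng.normal(size=100)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # newer scikit-learn may warn about continuous labels
    cont = mutual_info_score(x, y)

print(same, indep, cont)
```

If the features are continuous, discretizing them first (or using an estimator meant for continuous data) gives a more meaningful comparison than passing raw floats to `mutual_info_score`.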

Also, can the original entropy-based calculation implemented in the information theory notebook be used as a basis for tabular data, by substituting the respective X and Y as 2D arrays?
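The entropy-based route rests on the standard identity below. With the samples of X stored as an n × d_x array and those of Y as an n × d_y array, the joint entropy term is estimated from their column-wise concatenation. Whether a given entropy estimator behaves well in that setting is the open question here; the identity itself holds regardless, and the last line relates the bit and nat units used in the code above.

$$
\begin{aligned}
I(X;Y) &= H(X) + H(Y) - H(X,Y), \\
X &\in \mathbb{R}^{n \times d_x},\quad Y \in \mathbb{R}^{n \times d_y},\quad [X,\,Y] \in \mathbb{R}^{n \times (d_x + d_y)}, \\
I_{\text{nats}} &= \ln 2 \cdot I_{\text{bits}}.
\end{aligned}
$$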

Thanks and Regards
Surbhi
