EBM Classifier Global Feature Importance x Random Forest Classifier with Morris Sensitivity Analysis #533

Open
gatihe opened this issue Apr 23, 2024 · 1 comment

gatihe commented Apr 23, 2024

I'm trying to use InterpretML to identify the most relevant features for a classification problem.

After applying two different classifiers (an EBM classifier and a random forest classifier) to the same data and getting similar scores, I used InterpretML to identify the most relevant features in each model.

The EBM's feature importance uses the weighted mean absolute score, while the random forest is explained with Morris sensitivity analysis. Even though the models' performances are very similar, the two methods list different features as most relevant. (A minimal sketch of my setup is below.)
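For reference, here is a minimal sketch of the setup, with a placeholder dataset standing in for my data (variable names are illustrative, and the MorrisSensitivity constructor signature varies across interpret versions -- recent releases accept a fitted model directly):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from interpret.glassbox import ExplainableBoostingClassifier
from interpret.blackbox import MorrisSensitivity

# Placeholder data; the real problem uses a different dataset.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Glassbox model: the global explanation is read directly off the model,
# ranked by the weighted mean absolute score of each term.
ebm = ExplainableBoostingClassifier()
ebm.fit(X_train, y_train)
ebm_global = ebm.explain_global()

# Black-box model: global importance is approximated externally
# with Morris sensitivity analysis.
rf = RandomForestClassifier(random_state=0)
rf.fit(X_train, y_train)
morris = MorrisSensitivity(rf, X_train)  # older releases take a predict function
morris_global = morris.explain_global()
```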

This raises a few questions:

  • Is either of these methods (global feature importance on the EBM classifier, or Morris sensitivity analysis on the random forest classifier) valid for confirming the key factors of a classification process?
  • If so, which one is more reliable?
  • Is there a method or rule of thumb for identifying the most suitable approach for a given scenario?

Best regards

paulbkoch (Collaborator) commented

Hi @gatihe -- The models tend to "think" differently, and if their performances are similar it would be difficult to choose which model is a better representation of the underlying generative function. At least, I'm not aware of a way to do this. Perhaps @richcaruana has more thoughts on it.

The main benefit you get from using an EBM is that the EBM's global explanations are an exact and complete representation of the model itself, so you aren't relying on the approximate explanations that a black-box model like a random forest requires. EBMs make no guarantees, however, regarding how well they match the underlying generative function. If the only thing you need is a feature importance metric, then I don't think the exactness of the explanation is a critical aspect.
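As an illustration of that exactness, here is a hypothetical sketch reusing the `ebm` and `X_test` objects from the snippet above, assuming a recent interpret version where `eval_terms` and `decision_function` are available: summing the per-term contributions plus the intercept reproduces the model's raw output.

```python
import numpy as np

# eval_terms returns each additive term's contribution per sample.
contribs = ebm.eval_terms(X_test)

# Adding the intercept back recovers the model's raw (logit) prediction,
# so the explanation and the model are the same thing, not an approximation.
raw = contribs.sum(axis=1) + ebm.intercept_
assert np.allclose(raw, ebm.decision_function(X_test))
```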

There are also multiple ways to measure feature importance, so that's another thing to consider in your scenario. Within the interpret package we offer the mean absolute score and the min-max score, but you can also calculate other alternatives yourself, like the change in AUC when you remove each feature. Each of these feature importance metrics will tell you something different about your model and data.
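A sketch of those options, again reusing the objects from the first snippet (`term_importances` with `"avg_weight"` and `"min_max"` reflects recent interpret releases; the drop-one-feature AUC loop is just an illustrative do-it-yourself alternative):

```python
from sklearn.metrics import roc_auc_score

# Built-in EBM importances: mean absolute score and min-max range.
imp_avg = ebm.term_importances("avg_weight")
imp_minmax = ebm.term_importances("min_max")

# Do-it-yourself alternative: retrain without each feature and record
# how much the test AUC changes (a larger drop suggests a more important feature).
base_auc = roc_auc_score(y_test, ebm.predict_proba(X_test)[:, 1])
for col in X_train.columns:
    reduced = ExplainableBoostingClassifier()
    reduced.fit(X_train.drop(columns=[col]), y_train)
    proba = reduced.predict_proba(X_test.drop(columns=[col]))[:, 1]
    print(f"{col}: AUC change = {base_auc - roc_auc_score(y_test, proba):.4f}")
```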
