Main factor identifiers #273

Closed
lukedex opened this issue Aug 16, 2021 · 6 comments

Comments

@lukedex

lukedex commented Aug 16, 2021

Hello,

I was wondering whether it's possible to tell a GAM model which factor(s) to build tree(s) on first.

The idea behind this is that if I know a factor is very predictive, I want most of my model output to be driven by it. By extracting information from this factor first (fitting the first tree on it), I assume the model will be influenced by it the most.

Is my reasoning correct, and if so, is it possible to do this? I believe it currently chooses a factor at random, as I've tried to repeat my results and never managed to replicate the models perfectly, even with the same dataset fed in.

@interpret-ml
Collaborator

Hi @lukedex,

Good question! In general, the final EBM models should not change much based on the order of the features, due to the very low default learning rate (learning_rate = 0.01). If you do wish to influence the order of training within each round, our code simply loops through the columns of your training dataset (i.e., X[:, 0] -> X[:, 1] -> ... -> X[:, n]), so reordering the columns of your training set will change the order of learning per round, as in the sketch below.
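
For illustration, a minimal reordering sketch (assuming X is a NumPy array; the index 3 below is a hypothetical stand-in for your most predictive feature):

import numpy as np
from interpret.glassbox import ExplainableBoostingRegressor

# Hypothetical: suppose column 3 holds the feature you believe is most predictive
important = 3
order = [important] + [i for i in range(X.shape[1]) if i != important]
X_reordered = X[:, order]  # the chosen feature now sits in column 0

ebm = ExplainableBoostingRegressor()
ebm.fit(X_reordered, y)  # boosting visits the chosen feature first in each round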

Your larger question about forcing a higher importance on a single feature based on prior knowledge is a trickier one. One option is to train EBM models in stages. For example, if the first feature is the feature of interest:

from interpret.glassbox import ExplainableBoostingRegressor

# Stage 1: fit an EBM on only the important feature
# (X[:, [0]] keeps the input 2-D, as the estimator expects)
ebm = ExplainableBoostingRegressor()
ebm.fit(X[:, [0]], y)

# Generate new targets from the residual left in the problem
y_new = y - ebm.predict(X[:, [0]])

# Stage 2: fit a second EBM on the remaining features against the residuals
ebm_secondary = ExplainableBoostingRegressor()
ebm_secondary.fit(X[:, 1:], y_new)

The idea here is to use the public interface to force the boosting process to learn exclusively on the first feature, and only then allow boosting on the subsequent features. You can then simply interpret the global explanations from both objects. However, to generate new predictions, you'll need to treat both of these models like a pipeline and sum their predictions (see the sketch below); unfortunately, we don't have a utility today for combining two independent EBM models into one object.
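
A minimal sketch of that pipeline step, using the same column slicing as the fitting code above:

# Combined prediction: stage-1 output plus the stage-2 residual correction
y_pred = ebm.predict(X[:, [0]]) + ebm_secondary.predict(X[:, 1:])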

Hope this helps answer your question!
-InterpretML Team

@flysky555

Does the staged training using the public interface work for a classification problem? Let's say I need to apply a post-hoc treatment to the main effects; it would then be ideal if I could fix the main effects and retrain the interactions. I'd appreciate your feedback on this.

@xiaohk
Contributor

xiaohk commented Oct 4, 2021

Hey @flysky555, what kind of post-hoc treatments do you want to apply to the main effects? I think this two-stage training should help you fix the main effects and then learn other features (e.g., interactions).

By the way, I am Jay Wang, a research intern on the InterpretML team. We are developing a new visualization tool for EBM editing and are recruiting participants for a user study (see #283 for more details).

We think you would be a good fit for this paid user study! If you are interested, you can sign up with the link in #283. Let me know if you have any questions. Thank you!

@flysky555

flysky555 commented Oct 4, 2021 via email

@xiaohk
Contributor

xiaohk commented Oct 4, 2021

@flysky555 I see. Yes, I guess it would be harder to do the two-stage classifier training with the public interface. Maybe you can play around with the internal training code; there you can store the internal residuals as described in this paper.
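
For a rough illustration of the residual idea (not EBM's internal procedure; a hand-rolled approximation under a standard gradient-boosting view of logistic loss, where the pseudo-residual for a binary target is y - p):

import numpy as np
from interpret.glassbox import (
    ExplainableBoostingClassifier,
    ExplainableBoostingRegressor,
)

# Stage 1: fit a classifier on the main feature(s) you want to fix
ebm_main = ExplainableBoostingClassifier()
ebm_main.fit(X[:, [0]], y)  # y is binary (0/1)

# Logistic-loss pseudo-residuals: y - p, with p the predicted probability
# of the positive class from stage 1
p = ebm_main.predict_proba(X[:, [0]])[:, 1]
residuals = y - p

# Stage 2 (approximation): fit a regressor on the remaining features
# against the pseudo-residuals; this mimics one gradient-boosting round
# rather than reproducing EBM's internal staged training exactly
ebm_residual = ExplainableBoostingRegressor()
ebm_residual.fit(X[:, 1:], residuals)

Turning the two stages back into calibrated probabilities takes more care (e.g., working in log-odds space), which is part of why the internal code is the better place to do this.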

If you want to add monotonicity constraints during training, you might want to check out #184. There is some discussion on why monotonicity through post-processing is preferred.
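
For illustration only, a hypothetical post-processing sketch that monotonizes one shape function with isotonic regression. The attribute name additive_terms_ is an assumption matching interpret releases from around that time (later versions renamed it), so check your version:

import numpy as np
from sklearn.isotonic import IsotonicRegression

# ASSUMPTION: ebm.additive_terms_[i] holds the per-bin scores of term i
# (attribute name from interpret ~0.2.x; newer releases use a different name)
i = 0  # index of the feature whose shape function should be increasing
scores = ebm.additive_terms_[i]

# Fit an increasing step function over the bin order and overwrite the scores
bins = np.arange(len(scores))
iso = IsotonicRegression(increasing=True)
ebm.additive_terms_[i] = iso.fit_transform(bins, scores)

Because an EBM's prediction is just the sum of these per-bin scores plus an intercept, editing the scores this way changes predictions and global explanations consistently.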

Spoiler alert: for the user study, we will also discuss model monotonicity :)

@richcaruana
Contributor

richcaruana commented Oct 4, 2021 via email
