Main factor identifiers #273
Comments
Hi @lukedex, good question! In general, the final EBM models should not change much based on the order of the features, due to the very low default learning rate.

Your larger question about forcing higher importance on a single feature due to prior knowledge is a trickier one. One option is to train EBM models in stages. For example, if the first feature is the feature of interest:

```python
from interpret.glassbox import ExplainableBoostingRegressor

# Fit first EBM on only the important feature
# (indexing with [0] keeps X two-dimensional, as fit expects)
ebm = ExplainableBoostingRegressor()
ebm.fit(X[:, [0]], y)

# Generate new targets from the residual left in the problem
y_new = y - ebm.predict(X[:, [0]])

# Fit second-stage EBM on the residuals, using the remaining features
ebm_secondary = ExplainableBoostingRegressor()
ebm_secondary.fit(X[:, 1:], y_new)
```

The idea here is to use the public interface to force the boosting process to learn exclusively on the first feature, and only then allow boosting on the subsequent features. You can then simply interpret the global explanations from both objects. However, to generate new predictions, you'll need to treat both of these models like a pipeline and sum their predictions -- unfortunately we don't have a utility today for combining two independent EBM models into one object. Hope this helps answer your question!
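To make the "pipeline" step concrete, here is a minimal sketch of a wrapper that sums the two stages' predictions. `TwoStageRegressor` is a hypothetical name (not part of interpret), and the stand-in models below are simple placeholders just to show the wiring; in practice you would pass the two fitted EBM objects.

```python
import numpy as np

class TwoStageRegressor:
    """Hypothetical wrapper: sums the predictions of two independently
    fitted stages (a sketch, not an interpret utility)."""

    def __init__(self, stage1, stage2, n_first=1):
        self.stage1 = stage1    # fitted on the first n_first column(s)
        self.stage2 = stage2    # fitted on the remaining columns, against residuals
        self.n_first = n_first

    def predict(self, X):
        X = np.asarray(X)
        return (self.stage1.predict(X[:, :self.n_first]) +
                self.stage2.predict(X[:, self.n_first:]))


# Stand-in "model" with a predict method, only to demonstrate the wiring
class _SumModel:
    def predict(self, X):
        return np.asarray(X).sum(axis=1)

combined = TwoStageRegressor(_SumModel(), _SumModel())
X = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(combined.predict(X))  # stage1 sees column 0, stage2 sees columns 1-2
```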
Does the staged training using the public interface work for classification problems? Let's say I need to apply a post-hoc treatment on the main effects. Then it would be ideal if I could fix the main effects and then retrain the interactions. Appreciate your feedback on this.
Hey @flysky555, what kind of post-hoc treatments do you want to apply on the main effects? I think this two-stage training should help you fix the main effects and then learn other features (e.g., interactions). By the way, I am Jay Wang, a research intern on the InterpretML team. We are developing a new visualization tool for EBM editing and recruiting participants for a user study (see #283 for more details). We think you are a good fit for this paid user study! If you are interested, you can sign up with the link in #283. Let me know if you have any questions. Thank you!
I am trying to impose post-hoc monotonic constraints on the main effects. Then the model needs to be recalibrated for the intercept and interactions. I understand the two-stage training can be straightforward for regression by training on the residual y - y_hat. But for classification, how do you define the residual? Can you provide more insights on this?

Thanks for the links on the user study. Let me check it out.

Thanks.
@flysky555 I see. Yes, I guess it would be harder to do the two-stage classifier training with the public interface. Maybe you can play around with the internal training code. There you can store the internal residuals as described in this paper. If you want to add monotonicity constraints during training, you might want to check out #184. There is some discussion on why monotonicity through post-processing is preferred. Spoiler alert: for the user study, we will also discuss model monotonicity :)
Hello,
I was wondering if it's possible to tell a GAM model which factor(s) to build its first trees on.
The idea behind this is that if I know a factor is very predictive, I want most of my model's output to be driven by that factor. By extracting the information from this factor first (fitting the first trees on it), I assume the model will be most influenced by it.
Is my reasoning correct, and if so, is it possible to do this? I believe the model currently chooses factors at random, as I've tried to repeat my results and never managed to replicate the models perfectly, even with the same dataset fed in.