Monotone models #184

Closed

Garve opened this issue Nov 22, 2020 · 42 comments

Labels
enhancement New feature or request

Comments

@Garve

Garve commented Nov 22, 2020

Hi!

Are there plans to implement monotonic regressors, like it is possible for LightGBM, for example?

Thank you!

Best
Robert

@interpret-ml
Collaborator

Hi @Garve,

Thanks for bringing this up, and sorry for our delay in getting back to you. We completely agree that the ability to enforce monotonicity would be a nice addition for EBMs, but we haven't had time to do it. There are a few different ways to implement this -- we can enforce it during training, like LightGBM, or we can provide options to enable it as a post-processing step on a trained EBM model.

When enforcing monotonicity during boosting, we've noticed that models tend to take advantage of correlated features to bypass the constraint. Enforcing monotonicity as a post-processing step may be preferable, but it still requires further investigation for our model class. One way we've done this in the past is by applying isotonic regression (the Pool-Adjacent-Violators algorithm, or PAV) to the graphs that need to be monotonic.
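
For readers unfamiliar with PAV, here is a minimal sketch of the idea using scikit-learn's IsotonicRegression (which implements PAV); the per-bin scores below are made-up illustrative values, not output from a real EBM:

import numpy as np
from sklearn.isotonic import IsotonicRegression

# made-up per-bin scores of a learned shape function with two small dips
bin_scores = np.array([0.10, 0.30, 0.25, 0.60, 0.55, 0.90])
bin_index = np.arange(len(bin_scores))

# PAV finds the closest non-decreasing sequence in the least-squares sense;
# violating neighbours (0.30/0.25 and 0.60/0.55) are pooled to their means
monotone_scores = IsotonicRegression(increasing=True).fit_transform(bin_index, bin_scores)
print(monotone_scores)  # [0.1, 0.275, 0.275, 0.575, 0.575, 0.9]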

We'll leave this issue open to track the demand for this feature, and will update this thread once we've made some progress on the research or implementation sides. If anyone would like to discuss this further or help out on this feature, we'd be happy to talk with you!

Thanks!
-InterpretML Team

@Garve
Author

Garve commented Feb 9, 2021

Hello @interpret-ml team!

Thanks for the answer :) I would say that enforcement during training makes more sense. Doing it after training only alters what the model is actually saying, right? As a very naive approach, I could use max(0, [model output]). Then the model would say some negative value, but we make it a 0. That feels kind of hacky to me.

The direct approach might have some issues with correlations, but those problems are always there anyway, no? For example, we can create a dataset X, y and insert a copy of some column of X into X again:

import numpy as np
from interpret.glassbox import ExplainableBoostingRegressor
from interpret import show

X = np.random.randn(10000, 2)
X = np.hstack([X, X[:,[0]]]) # insert copy

y = X[:, 0] + X[:, 1] 

ebm = ExplainableBoostingRegressor(interactions=False)
ebm.fit(X, y)

ebm_global = ebm.explain_global(name='EBM')
show(ebm_global)

The ExplainableBoostingRegressor also can't tell whether feature 1 or feature 3 is more important. Each even comes out only half as important as feature 2. This, too, is a problem caused by correlation. Therefore, I think users should take care of correlation problems themselves.

What are your thoughts about this?

Thank you very much! :)

Best regards
Robert

@JoshuaC3

JoshuaC3 commented Feb 9, 2021

I agree with Garve here. I feel the model will be more accurate if we enforce monotonicity on Xn during training. At least this behaviour will be "understood" by the model and thus taken into account when boosting (possibly even learned by another (co-)variable).

I am currently doing a lot of research on using EBMs/GBMs to find heat coefficients and change-points in gas data when compared with outside air temperature and other weather and non-weather variables. See here for some examples using piece-wise linear regression on the univariate case of temperature. I have also managed to recover change-points and some crude heating coefficients from the EBM models, but only when the data is very well behaved, or a good deal of care is taken cleaning it beforehand. I was planning to do a detailed write-up on this and propose a Python notebook example on how to treat the model after training, but it seems this is a good time to raise one of my findings/thoughts on monotonicity:

In post-processing, one of the issues is that if there is a sizeable negative step, then in the monotonically increasing case a smoother doesn't know which way to smooth it. At the x = 22.5 mark here, we see that a single anomalous reading has caused an undesired jump in the final level. This means all values x > 23 predict approximately 5 units too high.

[figure: learned shape function with an anomalous upward step near x = 22.5]

This is also true at the far left side of the graph where x = 0 should give y ~ 0.

I am now experimenting with weighted smoothing as a post-processing step; however, it seems rather tricky and requires the original training data. Thus, it seems better to handle this at training time!
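
To sketch what I mean by weighted smoothing (assuming access to the original training values for the feature; the data, bin edges and scores below are all placeholders, and the binning is a plain histogram rather than the EBM's own binning):

import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
# placeholder training data: the bin around x = 17.5-20 contains only a handful of points
x_train = np.concatenate([rng.uniform(0, 17.5, 800),
                          rng.uniform(17.5, 20.0, 5),
                          rng.uniform(20.0, 25.0, 195)])

bin_edges = np.linspace(0, 25, 11)        # placeholder: 10 equal-width bins
bin_scores = np.linspace(0.0, 5.0, 10)    # placeholder: learned per-bin scores
bin_scores[7] += 5.0                      # an anomalous step caused by the sparse bin

# weight each bin by how much training data it actually saw
counts, _ = np.histogram(x_train, bins=bin_edges)

# weighted isotonic regression: the barely-populated bin carries little weight,
# so its step is pulled back towards the well-supported neighbouring bins
smoothed = IsotonicRegression(increasing=True).fit_transform(
    np.arange(len(bin_scores)), bin_scores, sample_weight=counts)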

FYI - to save potential confusion, this case is not another variable vs gas, but one that is KNOWN to be monotonically increasing (vs temperature which is decreasing).

@Garve
Author

Garve commented Mar 1, 2021

Hi again!

I implemented a very naive proof-of-concept version of an ExplainableBoostingMetaRegressor that takes any base regressor as input; see here on my GitHub. I can even give each feature its own base regressor.

Is it an option to implement it like this, just more efficiently? :D

To come back to the original problem: if I want monotonically increasing behaviour in some feature, I can give it an IsotonicRegression() from scikit-learn. If I want it decreasing, I give it an IsotonicRegression(increasing=False). If I need positive values, I can give it an IsotonicRegression(y_min=0), etc.

If I don't specify anything, it uses a DecisionTree with some small depth. Seems to work well!
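
To be concrete, these are the scikit-learn base regressors I mean (how they get passed into the meta-regressor depends on the proof-of-concept's constructor, so only the scikit-learn side is shown here):

from sklearn.isotonic import IsotonicRegression
from sklearn.tree import DecisionTreeRegressor

increasing = IsotonicRegression()                  # monotonically increasing shape function
decreasing = IsotonicRegression(increasing=False)  # monotonically decreasing shape function
nonnegative = IsotonicRegression(y_min=0)          # fitted values bounded below by zero
fallback = DecisionTreeRegressor(max_depth=3)      # unconstrained default: a shallow decision tree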

Again, a word of caution: the regressor seems to work, but I haven't tested it very thoroughly so far. It's also not really efficient and doesn't work together with the show() function of interpret. It also doesn't support interactions yet. You can, however, get the nice graphs using the outputs_ attribute, i.e.

import matplotlib.pyplot as plt

e = ExplainableBoostingMetaRegressor()
e.fit(X, y)

for i in range(X.shape[1]):  # one shape function per feature
    plt.plot(e.domains_[i], e.outputs_[i])
    plt.title(i)
    plt.show()

I also didn't check how you guys implemented it; I just watched this YouTube video on how the algorithm works at a high level and tried to replicate it in code.

Best regards
Robert

@paulsendavidjay

paulsendavidjay commented Apr 15, 2021

I also have a need for monotonicity, and would prefer to have it enforced during training. I don't really have a solution other than those that have been mentioned, but posting to +1 the column of demand.
David

@richcaruana
Contributor

Hi David, do you need monotonicity for just one or a few variables, or for all variables?

@paulsendavidjay

Hi Rich, I think we've talked about this topic some months ago. Ultimately I would need monotonicity on all features, as I'm in a regulated space.

@JoshuaC3

@richcaruana all of the above. Something like:
monotonic = None/0 implies no constraints
monotonic = 1 implies all increasing
monotonic = -1 implies all decreasing
monotonic = [1, 0, 1, -1, 0] implies [increasing, no constraint, increasing, decreasing, no constraint] for an X with 5 features.
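
For reference, a minimal sketch of this convention as LightGBM's scikit-learn API already exposes it (one integer per column of X, toy data for illustration):

import numpy as np
import lightgbm as lgb

X = np.random.randn(1000, 5)
y = X[:, 0] - X[:, 3] + 0.1 * np.random.randn(1000)

# -1 = decreasing, 0 = unconstrained, +1 = increasing; one entry per column of X
model = lgb.LGBMRegressor(monotone_constraints=[1, 0, 1, -1, 0])
model.fit(X, y)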

@richcaruana
Contributor

@paulsendavidjay: thanks for reminding me of our previous discussion. Completely agree with you that if you need monotonicity on all features then the best way to achieve that is via constraints imposed during training. Not sure how quickly we'll have that implemented, but it is on our radar.

@JoshuaC3: the interface you suggest (-1 = decreasing, 0 = no constraint, +1 = increasing) makes sense. Adding constraints to only a subset of features doesn't always achieve the effect you want. If there is no correlation among features, then imposing constraints per feature works exactly as you would expect. But in the usual case where there is correlation among features, learning will do everything it can to get around the monotonicity constraints while still appearing to be monotone on the features you constrained. For example, imagine you have two copies of a feature (but aren't aware of it) and put a monotonicity constraint on one copy but not the other. The model will satisfy the constraint on the constrained copy, but will use the unconstrained copy to undo what it has learned on the constrained one, so in the end it is not correct to think of the model as being monotone on the constrained feature: the model has used correlation among the features to undo that monotonicity. There are almost always many correlations among features in complex datasets, so this is a real problem and makes applying monotonicity constraints to subsets of features problematic. And this is a problem with monotonicity constraints for all learning methods, not just EBMs. At least the effects are more visible with glassbox methods like ours.

@JoshuaC3

@richcaruana I should have said, it is the interface used by LightGBM, CatBoost and XGBoost.

I hadn't considered the collinearity effects for monotonic constraints in general here - what an excellent insight!! That said, I don't think it should cause many issues. Checking collinearity is something an ML practitioner should do as standard as part of EDA/train-test-split/feature selection. Additionally, a domain expert of the kind who is likely to set monotonic constraints should understand which of their independent variables are monotonically correlated with the dependent variable and with one another.

In my main use case, the latter is certainly true. I know from the physics of the system I am predicting that the independent variables are all either positively or negatively monotonic. I intend to share my use case at some point, as I feel it will be interesting and will stimulate the discussion further!

Your last point is very pertinent - the fact that this is a glassbox model and has the rest of the interpret toolkit (counterfactuals etc) allows you to understand if/when this behaviour occurs. This is EXACTLY why I wish to use EBM over some of the more established GBMs with monotonic constraints! :D

@JoshuaC3

JoshuaC3 commented Apr 19, 2021

I was considering @richcaruana's concern above: collinear, correlated or highly descriptive independent variables. Depending on the application and the reason for wanting to constrain some variable to be monotonic, including 2nd order terms could cause problems.

Some ideas for how to control for this would be as follows (a small sketch of the combination rules appears after the list):

  1. Exclude constrained variables from 2nd order features: monotonic_second_order='exclude'.
  2. If both are 1, make the 2nd order term 1. If both are -1, make it -1. A mix of 0, 1 or -1, 0 could then be strictly constrained to 1 and -1 respectively. Finally, 1 and -1 would give 0: monotonic_second_order='strict'.
  3. Or, as above, but with the mixed cases 0, 1 and -1, 0 weakly constrained to 0 and 0: monotonic_second_order='weak'.
  4. Finally, ignore the constraints on the variables entirely: monotonic_second_order='ignore'.
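
To make options 2 and 3 concrete, here is a small sketch of the combination rules in plain Python (the monotonic_second_order parameter name is hypothetical, not an existing interpret API):

def pair_constraint(c1, c2, mode='strict'):
    """Derive a constraint for a 2nd order term from its two main-effect
    constraints c1, c2 in {-1, 0, +1}. Returns None for 'exclude',
    meaning the pair should not be built at all."""
    if mode == 'exclude':
        return None
    if mode == 'ignore':
        return 0
    if c1 == c2:                    # (1,1) -> 1, (-1,-1) -> -1, (0,0) -> 0
        return c1
    if 0 in (c1, c2):               # a mix of one constrained and one free feature
        nonzero = c1 if c1 != 0 else c2
        return nonzero if mode == 'strict' else 0  # 'strict' keeps the sign, 'weak' relaxes to 0
    return 0                        # (1,-1): conflicting signs -> unconstrained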

@interpret-ml
Collaborator

interpret-ml commented Apr 19, 2021

Hi @Garve, @JoshuaC3, and @paulsendavidjay,

Thanks for the spirited discussion around this! We wanted to add to this thread some utility code that post-processes any main-effect graph to enforce monotonicity (after training):

from sklearn.isotonic import IsotonicRegression
from copy import deepcopy
import plotly.graph_objects as go
import numpy as np

def make_monotone(ebm, feature, direction='auto', inplace=False, visualize_changes=True):
    ''' Adjusts an individual feature to be monotone using isotonic regression. 
    
        Args:
            ebm: Fitted ExplainableBoostingClassifier or ExplainableBoostingRegressor.
            feature: Index or name of the continuous univariate feature to make monotone.
            direction: 'auto', 'increasing' or 'decreasing'. 'auto' decides the sign based on a Spearman correlation estimate.
            inplace: If True, modifies existing EBM in place. If False, returns new EBM.
            visualize_changes: Produces Plotly visualization highlighting edits.
            
        Returns:
            If not inplace, returns new EBM with monotonicity constraints.      
    '''
    if isinstance(feature, str): # Find feature index if passed as string
        feature_index = ebm.feature_names.index(feature)
    else:
        feature_index = feature
    
    x = np.array(range(len(ebm.additive_terms_[feature_index])))
    y = ebm.additive_terms_[feature_index]
    w = ebm.preprocessor_.col_bin_counts_[feature_index]

    # Fit isotonic regression weighted by training data bin counts
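    # Map `direction` onto sklearn's `increasing` argument: 'increasing' -> True,
    # 'decreasing' -> False, anything else -> 'auto' (sign chosen via a Spearman correlation estimate)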
    direction = 'auto' if direction not in ['increasing', 'decreasing'] else direction == 'increasing'
    ir = IsotonicRegression(out_of_bounds="clip", increasing=direction)
    y_ = ir.fit_transform(x, y, sample_weight=w)
    
    ebm_mono = deepcopy(ebm)
    ebm_mono.additive_terms_[feature_index][1:] = y_[1:]
    
    # Plot changes to model
    if visualize_changes:
        ebm_global = ebm.explain_global()
        trace = ebm_mono.explain_global().visualize(feature_index)
        trace['data'][1]['line']['color'] = 'red'
        trace['data'][1]['name'] = "Monotone"

        source_layout = ebm_global.visualize(feature_index)['layout']
        source_data = list(ebm_global.visualize(feature_index)['data'])
        source_data = [source_data[index] for index, trace in enumerate(source_data) 
                       if trace.name in ["Main", "Distribution"]]
        source_data[0]['fill'] = None
        source_data.append(trace['data'][1])
        source_layout['showlegend'] = True

        fig_mono = go.Figure(
            data=source_data,
            layout=source_layout
        )

        fig_mono.show()

    # Modify in place or return copy
    if inplace:
        ebm.additive_terms_[feature_index][1:] = y_[1:]
    else:
        return ebm_mono

Here's a quick usage example:

modified_ebm = make_monotone(ebm, feature='Age', direction='auto', inplace=False, visualize_changes=True)

which produces a new EBM and the following visualization (if visualize_changes=True) highlighting the changes made to the model. You can also modify an existing EBM in place with the inplace flag.

[figure: original and monotonized shape functions overlaid, with the monotone version shown in red]

This function isn't fully featured or tested yet, but we wanted to share it here first to provide a temporary solution and get feedback. As @JoshuaC3 points out, this also may not enforce true monotone constraints when pairwise interactions containing the feature are present -- maybe we should throw a warning in those cases, or explore ways to postprocess constraints on pairwise interaction terms?

We don't intend for this to be a replacement for monotone constraints at training time, but it could be a nice supplemental utility function for the cases where monotonicity via post-processing makes sense. It would be useful for us to hear whether this function works on your problems as we work on training-time constraints!

-InterpretML Team

@paulsendavidjay

@interpret-ml Very nice! I had spent some time a while ago looking at just such a post-processing method, but was having difficulty accessing the right data given my unfamiliarity with the objects, and had to drop it to work on other business items. This is a great solution that could be applied to many business cases, with a clear visualization of the trade-off. Thank you for such a quick turnaround!

@paulsendavidjay

paulsendavidjay commented Apr 19, 2021

From the above code I get the following error:
AttributeError: 'EBMPreprocessor' object has no attribute 'col_bin_counts_'

Modifying the code by replacing 'col_bin_counts_' with 'col_bin_edges_' makes it run, and the result looks good!

@interpret-ml
Collaborator

Hi @paulsendavidjay,

Same to you -- thanks for the quick feedback! It's a bit surprising that your EBMPreprocessor doesn't have the col_bin_counts_ attribute exposed. Any chance you can check what version of interpret you're on? 0.2.4 (our latest release) should have support for this.

From the command line:
pip show interpret

or in a python environment:

import interpret
interpret.__version__

should both show the version number. If you can upgrade, pip install -U interpret should do the trick. It won't make a big difference, but using the counts instead of the edges for weighting the isotonic regression would help the algorithm make better tradeoffs. Thanks again for testing it out so quickly!

@JoshuaC3

Having given some further thought to the discussion here, I have raised the above issue. I think it would address some of the fears we had around constrained variables being used in 2nd order features, as well as 2nd order features in regulated spaces.

@flippercy

Hi @interpret-ml:

Are you still planning to add monotonic constraints during training to the algorithm? It would be great if this feature could be implemented, since domain knowledge is crucial when a model is used in practice.

Thank you.

@huanvo88

Hi @interpret-ml
I second @flippercy's comment. I work in the insurance industry, and monotonic constraints are very important. Do you plan to add this to EBM soon?

Thank you.

@xiaohk
Contributor

xiaohk commented Jun 24, 2021

Hey @huanvo88, I plan to work on EBM monotonicity through post-processing.

Just curious, are there laws or regulations that require insurance companies to use monotone ML models? If so, could you please point me to some related documents?

@huanvo88

Hi @xiaohk, I think insurance in Canada is more regulated, and since I don't deal with filings I don't have any legal documents to point you to. But sometimes when we present models to the business, they require certain features to be increasing or decreasing. From the discussion on this thread, it seems better to incorporate the constraint during fitting (like XGBoost or LightGBM) rather than as post-processing, but we can use post-processing if there is no better alternative.

@xiaohk
Contributor

xiaohk commented Jun 24, 2021

Got it @huanvo88 , thanks!

If you only want certain features (not all) to be increasing or decreasing, post-processing might be a better solution than a training-time monotonic constraint. See #184 (comment):

[quoting @richcaruana's comment above on how correlated features can undo per-feature monotonicity constraints]

@huanvo88

Ah ok, I see, thanks @xiaohk. Also, just out of curiosity: XGBoost, LightGBM and CatBoost also have monotone constraints. I assume that is also post-processing? Or did they implement it during the fitting process?

@xiaohk
Contributor

xiaohk commented Jun 24, 2021

[quoting @huanvo88's question above]

They implement it as a monotonicity constraint during training. I believe a monotonicity constraint during EBM training is on the development roadmap too.

@huanvo88

@xiaohk it is good to know that it is on the development roadmap. So I assume for now you will work on the monotone post processing and push it to the next release?

@paulsendavidjay

@huanvo88 During the fitting process, if the direction of the identified split (L > R) conflicts with the constraint (L < R), then the split is simply not made.
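
A sketch of that gating rule (the function and argument names are illustrative, not any library's actual internals):

def split_allowed(left_leaf_value, right_leaf_value, constraint):
    """constraint: +1 = increasing, -1 = decreasing, 0 = unconstrained.
    Reject any candidate split whose leaf values run against the constraint.
    Real boosters also propagate bounds to other leaves, but this is the core idea."""
    if constraint == +1 and left_leaf_value > right_leaf_value:
        return False   # would make the function decrease across the split threshold
    if constraint == -1 and left_leaf_value < right_leaf_value:
        return False   # would make the function increase across the split threshold
    return True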

In the regulatory space, we are often required to give plain-language explanations for adverse decisions based on model scores. Business leaders need to make sure that these explanations are sensible. For example, it would make sense to say 'you were declined a loan offer because your total debt is too high' if debt is the most impactful feature in the model for that individual. But if debt had a U-shaped pattern, you could end up having to say 'you were declined a loan because your debt is both too high and too low', which would not make sense. Monotonic constraints eliminate that possibility, with rare exceptions.

@xiaohk
Contributor

xiaohk commented Jun 24, 2021

[quoting @huanvo88's question above]

@huanvo88 My stuff is still work-in-progress, but I will keep you updated. If you are interested, I can also show you the pre-release version in the next few weeks. I'd really love to get some feedback from you :)

For now, I suggest using isotonic regression to find the best monotonic approximation of your learned shape function. The code is included in #184 (comment).

@xiaohk
Contributor

xiaohk commented Jun 24, 2021

[quoting @paulsendavidjay's comment above on split gating and plain-language adverse-action explanations]

Hey @paulsendavidjay, thanks for the reply! Your example makes a lot of sense. Just out of curiosity, what is the rare exception where a monotonic constraint doesn't help?

@huanvo88

huanvo88 commented Jun 24, 2021 via email

@paulsendavidjay

You can check this out: https://cs.stackexchange.com/questions/69220/random-forests-on-monotone-training-set-yields-a-monotone-classifier
It's not what I was thinking of, which was a paper demonstrating a cleverer example of training a GBM with monotonic constraints that specifically violates monotonicity in the final model. I'm unable to find the reference, however.

@JoshuaC3

JoshuaC3 commented Jul 2, 2021

An interesting paper on better monotonic splits in Trees: https://arxiv.org/pdf/2011.00986.pdf

Having quickly read the paper, my initial understanding is that it improves on the monotonicity constraints as follows:

Then, when we make any split (monotone or not) in a branch having a
monotone node as a parent somewhere, after making the split, we need to check
that the new outputs are not violating any constraint on other leaves of the
tree. The general idea is that we should start from the node where a split was
just made, go up the tree, and every time a monotone node is encountered,
we should go down in the opposite branch and check that the constraints and
the new outputs from the new split are compatible. If they are not, then the
constraints need to be updated. Therefore making a split in a branch can very
well update the constraints of other leaves in another branch.

My intuition tells me that this may only be needed at the 2nd order interaction stage of training.

Additionally, if my intuition is correct, the very small impact on training time would matter even less, as the "opposite-branch check" described in the quote above would only need to be run on a small subset of cases.

Finally, I accept that because it might apply to only a small subset of cases, it might not be worth implementing for a potentially small accuracy improvement. Nonetheless, it would be interesting to test and find out!

@xiaohk
Contributor

xiaohk commented Sep 27, 2021

Hello @Garve, @paulsendavidjay, @JoshuaC3, @flippercy and @huanvo88 , thank you so much for using Interpret! I am Jay Wang, a research intern at the InterpretML team. We are developing a new visualization tool for EBM and recruiting participants for a user study to test out the new tool (see #283 for more details).

We think you are a good fit for this paid user study! If you are interested, you can sign up with the link in #283. Let me know if you have any questions. Thank you!

@discdiver

This would be a great feature! Any update on its implementation timeline?

@xiaohk
Contributor

xiaohk commented Jan 9, 2022

@discdiver Maybe GAM Changer can help! You can try out this interactive tool to edit your EBM models and make them monotonic.

@SrayUM

SrayUM commented Jan 16, 2023

Great thread! Thanks, everyone! I just want to follow up: is there any update on adding monotonicity constraints to the training process? As @paulsendavidjay noted, many industries are heavily regulated and we need to apply monotonicity constraints to meet those requirements. Thank you!

paulbkoch mentioned this issue Jan 24, 2023
paulbkoch added the enhancement (New feature or request) label Jan 24, 2023
@Jebin1999

I get the error: Can't get attribute 'EBMPreprocessor' on <module 'interpret.glassbox.ebm.ebm' from '/file/python3.10/site-packages/interpret/glassbox/ebm/ebm.py'>

while loading an EBM model from a pickle file.

@paulbkoch
Collaborator

@Jebin1999 -- You should probably open a new thread for questions like this, but I'll just say that this error is what I'd expect if you were to open a model built in interpret 0.2.7 in a 0.3.x version.

@Jebin1999

Jebin1999 commented Apr 4, 2023 via email

@paulbkoch
Collaborator

Our latest v0.4.0 release includes a post-processing monotonize function. Details on how to call it are available in our docs at: https://interpret.ml/docs/ebm.html#interpret.glassbox.ExplainableBoostingRegressor.monotonize
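
A quick usage sketch with toy data (see the linked docs for the exact signature and parameters of monotonize):

import numpy as np
import pandas as pd
from interpret.glassbox import ExplainableBoostingRegressor

# toy data with a named column so the term can be referenced by name
X_train = pd.DataFrame({"Age": np.random.uniform(18, 80, 1000),
                        "Income": np.random.uniform(20_000, 120_000, 1000)})
y_train = 0.05 * X_train["Age"] + np.random.randn(1000)

ebm = ExplainableBoostingRegressor()
ebm.fit(X_train, y_train)

# post-process the 'Age' term's shape function to be monotonically increasing;
# argument names follow the docs linked above -- verify against your installed version
ebm.monotonize("Age", increasing=True)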

@Guillermogsjc

It is great to have monotonicity constraints in EBM already. I am just wondering whether it would be possible to add concavity and convexity constraints to the roadmap.

By having those constraints configurable per feature:

  • monotonically increasing
  • monotonically decreasing
  • convex
  • concave

I guess that all the typical prior information would be available to adjust the models.

Interpretability is often needed together with those kinds of prior constraints between features and the target variable.

Kind regards.

@paulbkoch
Collaborator

Hi @Guillermogsjc -- That's an interesting option. Using concave as an example, you could probably do a fairly good job by selecting the bin with the highest score and monotonizing the graph to the left of it as increasing and the graph to the right as decreasing. You could do an even better job by repeating this for a few of the highest bins instead of just the highest one, and selecting the result with the smallest change.
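
A rough sketch of that single-peak idea on an array of per-bin scores; note it enforces a unimodal (increase then decrease) shape rather than true concavity, and the example scores are illustrative:

import numpy as np
from sklearn.isotonic import IsotonicRegression

def make_unimodal(scores, weights=None):
    """Monotonize everything left of the highest bin as increasing and
    everything right of it as decreasing, as suggested above."""
    scores = np.asarray(scores, dtype=float)
    peak = int(np.argmax(scores))
    x = np.arange(len(scores))
    w = np.ones_like(scores) if weights is None else np.asarray(weights, dtype=float)
    left = IsotonicRegression(increasing=True).fit_transform(
        x[:peak + 1], scores[:peak + 1], sample_weight=w[:peak + 1])
    right = IsotonicRegression(increasing=False).fit_transform(
        x[peak:], scores[peak:], sample_weight=w[peak:])
    return np.concatenate([left[:-1], right])  # the peak bin is taken from the right segment

print(make_unimodal([0.0, 0.5, 0.3, 1.2, 0.9, 1.0, 0.2]))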

I'm really curious to know what applications you had in mind for convex and concave constraints?

@Guillermogsjc

Thanks for the proxy :)

Well, convexity restrictions arise as a prior in any relationship between a covariate and the target variable where it is known that the feature must have a unique minimum or maximum in its effect on the target. Often the goal is even to find that minimum or maximum when modeling such a case.

@paulbkoch
Collaborator

We now support monotone constraints during fitting, so I'm closing this issue. As detailed in the thread above, we almost always recommend using post-processed model editing instead of applying constraints during fitting, except for investigative purposes or when you're 100% sure the underlying generating function has a fundamental monotonic relationship. An example would be modeling a physical system (e.g. the dark matter modeling papers on our readme). I've also updated our documentation to reflect this recommendation.
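
For anyone landing here later, a brief sketch contrasting the two routes on toy data. The monotonize call follows the documented post-processing API; the monotone_constraints constructor parameter name is assumed here to mirror the LightGBM convention, so please check the current docs for the exact spelling:

import numpy as np
import pandas as pd
from interpret.glassbox import ExplainableBoostingRegressor

X = pd.DataFrame({"Age": np.random.uniform(18, 80, 500),
                  "Income": np.random.uniform(20_000, 120_000, 500),
                  "Debt": np.random.uniform(0, 50_000, 500)})
y = 0.05 * X["Age"] - 0.00001 * X["Debt"] + np.random.randn(500)

# Route 1 (usually recommended): fit unconstrained, then edit the model afterwards
ebm = ExplainableBoostingRegressor().fit(X, y)
ebm.monotonize("Age", increasing=True)

# Route 2: constrain during fitting, for cases where the relationship is known
# to be fundamentally monotone (parameter name assumed; -1/0/+1 per feature)
ebm_constrained = ExplainableBoostingRegressor(
    monotone_constraints=[1, 0, -1]).fit(X, y)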
