WIP sklearn compatible ThresholdOptimizer #282
Conversation
Signed-off-by: adrinjalali <adrin.jalali@gmail.com>
def __init__(self, *, estimator=None, constraints=DEMOGRAPHIC_PARITY,
             grid_size=1000, flip=True, plot=False, random_state=None):
We have lost `unconstrained_predictor` and all of the related logic. Was this intentional? The idea is that if this argument is present, we should use it in `fit` to initialize `self._estimator` (without calling `self.unconstrained_predictor.fit(...)`).
@adrinjalali can elaborate, but from what I understand scikit-learn doesn't take already-fitted models; it fits them itself. We can still have another version that takes an already-fitted one, but it wouldn't be scikit-learn compatible. Is that about accurate?
Yes. `GridSearchCV` clones the estimators before fitting them, and that clears all the state they have. If we want to allow the user to re-fit using a pre-fitted estimator, we can implement a `warm_start` strategy for the estimator to use the previously fitted one. But by default all of these should be fit from scratch, since grid search fits them on a new part of the data each time anyway.
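As a quick illustration of the point above, sklearn's `clone` copies only hyper-parameters and discards all fitted state:

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

est = LogisticRegression().fit(X, y)
assert hasattr(est, "coef_")        # fitted attribute present

fresh = clone(est)                  # copies hyper-parameters only
assert not hasattr(fresh, "coef_")  # fitted state is gone
```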
That sounds reasonable. Would be sad to lose the existing functionality if we're planning to add it back in anyway... any chance we can add that warm start in this PR, or is that a major change?
added warm_start
Sorry for the delay on this, but I think that `warm_start` is confusing in this context.
Just to state the obvious: the high-level semantics of `ThresholdOptimizer` (after it is fitted) is really that of a two-stage pipeline. The first step is implemented by `self._estimator.predict()`. It takes as input `X` and returns `y`. The returned `y` is then transformed into `y'` (inside `self.predict()`).
We would like to enable two styles of fitting this pipeline: (1) fitting both steps from the provided data; or (2) fitting only the second step from the provided data (and using the provided `unconstrained_predictor` as the first step). Now if I think about a warm start, I would expect re-fitting a previously fitted pipeline, using the previous pipeline as a warm start. This to me is quite different from (2), which is just fitting the second stage of the pipeline (from scratch) while keeping the first stage of the pipeline untouched.
I'm not quite sure I understand the issue with the interaction between the cloning behavior and the `unconstrained_predictor` argument.
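The two fitting styles described above could be sketched roughly like this (names such as `TwoStageSketch` and `prefit` are illustrative assumptions, not the actual fairlearn API; the second stage is a trivial placeholder):

```python
from sklearn.base import BaseEstimator, ClassifierMixin, clone


class TwoStageSketch(ClassifierMixin, BaseEstimator):
    """Illustrative only: shows fit styles (1) and (2) from the discussion."""

    def __init__(self, estimator=None, prefit=False):
        self.estimator = estimator
        self.prefit = prefit

    def fit(self, X, y):
        if self.prefit:
            # style (2): keep the user-provided, already-fitted first stage untouched
            self._estimator = self.estimator
        else:
            # style (1): clone so the user's object is left untouched,
            # then fit the first stage from the provided data
            self._estimator = clone(self.estimator).fit(X, y)
        # the second stage (threshold selection) is always fit from scratch;
        # here it is a trivial placeholder
        self._threshold = 0.5
        return self

    def predict(self, X):
        return self._estimator.predict(X)
```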
maybe you're referring to this?
https://scikit-learn.org/stable/glossary.html#term-metaestimators
To help resolve this, I suggest that we also implement the pattern from `CalibratedClassifierCV` (link). That would mean using `prefit=False` instead of `warm_start=False`.
Sorry, it took me a while to concentrate enough to understand what you mean.
I understand that `warm_start` is different from what we have here. I'm just not sure why it's really needed. In a typical pipeline, we don't usually have pre-fitted estimators to feed to meta-estimators. Let me open a separate issue to talk about it there.
Should there be a set of test cases to ensure we remain sklearn compatible?
I'll add the sklearn tests once the existing tests pass. I'm finding it difficult to make them pass since they're a bit obscure to me, but getting there. One question: the output of …
I don't think I understand your question properly. Please don't hesitate to ask if tests seem weird. They need to be made less obscure or explained with a little comment if they are a huge obstacle.
@@ -292,7 +293,7 @@ class ThresholdOptimizer(ClassifierMixin, BaseEstimator):
     :param estimator: An untrained estimator that will be trained, and
         subsequently its output will be postprocessed
-    :type estimator: An untrained estimator
+    :type estimator: An untrained estimator, default=SGDClassifier(loss='log')
Do we want to provide a default? If so, why that one? Just generally wondering, but I don't have a strong opinion
I thought we couldn't, since this was mutually exclusive with `unconstrained_predictor`?
Hmmmm… this would be related to the `warm_start` below.
@@ -7,10 +7,8 @@
 learn how to adjust the predictor's output from the training data.
 """

-from ._postprocessing import PostProcessing  # noqa: F401
this probably warrants a description in the CHANGES.md as well :-)
    EQUALIZED_ODDS: _threshold_optimization_equalized_odds}


def _get_soft_predictions(estimator, X):
Right now we're using `predict`, not `predict_proba`, for this. However, when the scores are binary this method doesn't work as well as when using real-valued scores, so for examples (such as one of the notebooks) we've actually done this manually by wrapping the estimator and mapping `predict_proba` to `predict`. I'm not familiar with `decision_function`, but would we just use `predict` if the others aren't available?
Also adding @MiroDudik for awareness
Interesting. As @romanlutz says, previously we were only using `predict` as the first stage of the pipeline before applying thresholding. I think it's reasonable to consider the back-off sequence `predict_proba` -> `decision_function` -> `predict`, where for `predict_proba` we only look at the predicted probability of the positive class.
So maybe two fixes to the below:
- for `predict_proba`, just extract the second column
- after `else`, don't throw an error, but instead return `predict`
`predict` has very different semantics than `predict_proba` or `decision_function`. The probabilities `predict_proba` gives usually come from the raw values that `decision_function` gives. You can think of the `decision_function` values as the distances from the hyperplanes in the case of an SVM.
`predict` always gives categorical results, which are not smooth. Either the method (`ThresholdOptimizer`) knows how to deal with that or it doesn't. If it doesn't, I don't think falling back to `predict` is a good idea. The `predict` function of a classifier shouldn't be giving soft outputs.
Alternatively, we could ask the user to either pass a classifier with a `predict_proba` or a `decision_function` provided, or a regression on a binary classifier which gives soft outputs as the predictions. One of our classifiers uses a regressor under the hood in sklearn.
I'm lacking relevant sklearn knowledge here: does every classifier have either `predict_proba` or `decision_function`, or are there any that don't?
I don't understand what you mean by regression on a binary classifier. How would that be called? In some examples we use `predict_proba` from `LogisticRegression` and take the second entry (assuming binary classification), which is the probability for label 1.
`ThresholdOptimizer` has no problem handling hard 0/1 predictions, and that's in fact what the original paper was assuming. That's also the reason why we used `predict` in our original implementation. However, in those cases when the classifier also provides a soft prediction (in the form of `decision_function` or `predict_proba`), it is obviously better to use that.
So I definitely think that we should allow users to run `ThresholdOptimizer` based on `predict` of the underlying classifier. What I wonder about is whether we should allow folks to force `ThresholdOptimizer` to use `predict` even in those cases when the classifier also implements the soft versions.
PROS: it allows you to run the "textbook version" of the algorithm
CONS: it's essentially guaranteed to lead to inferior performance
> I don't understand what you mean by regression on a binary classifier.

One example is that `GradientBoostingClassifier` uses a `DecisionTreeRegressor` at each step instead of a `DecisionTreeClassifier`. In some cases people use a regressor on the binary classification problem with 0/1 as the target, and they use the soft predicted output as a proxy for probability.
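A toy illustration of that regressor-on-a-0/1-target idea (a hypothetical example, not code from this PR; `Ridge` is chosen arbitrarily):

```python
import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])       # binary target fed to a regressor

reg = Ridge().fit(X, y)
scores = reg.predict(X)          # continuous outputs, not hard 0/1 labels
proba_proxy = np.clip(scores, 0.0, 1.0)  # crude probability proxy
```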
We can have a `predict_method` parameter for users to set to whatever they want, and have it default to `auto`.
I like the final suggestion! Let's go with a `predict_method` parameter defaulted to `auto`. For `auto`, we can try `predict_proba` -> `decision_function` -> `predict` (also I don't mind dropping the final `predict`).
The only weirdness is that whenever `predict_method` is `predict_proba`, we will need to extract the second column of the output.
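The agreed-upon back-off could look roughly like this sketch (the name and exact signature are assumptions, not the final fairlearn implementation):

```python
import numpy as np


def get_soft_predictions(estimator, X, predict_method="auto"):
    """Sketch of the proposed back-off predict_proba -> decision_function
    -> predict; illustrative only, not the actual fairlearn code."""
    if predict_method == "auto":
        if hasattr(estimator, "predict_proba"):
            predict_method = "predict_proba"
        elif hasattr(estimator, "decision_function"):
            predict_method = "decision_function"
        else:
            predict_method = "predict"
    output = getattr(estimator, predict_method)(X)
    if predict_method == "predict_proba":
        # binary classification: keep only the positive-class column
        output = output[:, 1]
    return np.asarray(output)
```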
    """

    def __init__(self, *, estimator=SGDClassifier(loss='log'),
                 constraints=DEMOGRAPHIC_PARITY,
If we're using `SGDClassifier` as a default, shall we run it with `random_state=0` (or something of the sort) to make it reproducible? Alternatively, I don't think we necessarily need to have a default here.
I'm not very set on the `SGDClassifier`, really no hard feelings there. We could instead use `HistGradientBoostingClassifier`, which is a much more modern one, and people seem to like it.
Regarding the random state: since the user can set the `estimator` parameter, they can pass a random state to it themselves. It's the same with `GridSearchCV` and its underlying estimator, for example. If the user needs a deterministic result, they should pass an rng to both.
Should we have a default at all? Is this commonly done?
I'm in favor of dropping the default value for the estimator. It might even make sense to drop a default on constraints.
sklearn's common tests assume all estimators can be constructed using default values, except for the `estimator` of a meta-estimator. So we could leave this as required, and follow the [private] protocol to make the common tests understand how it should be constructed. They'll use a `Ridge` if the meta-estimator is a regressor and a `LinearDiscriminantAnalysis` if it's a classifier.
Alternatively, the tests on the class can be disabled/skipped and only instances tested.
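For reference, the "only test instances" option works because sklearn's common checks accept an already-constructed instance. `LogisticRegression` stands in below purely so the snippet runs; for this PR it would presumably be something like `ThresholdOptimizer(estimator=...)`:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

# check_estimator takes an instance, so required constructor arguments
# (such as a nested estimator) are not an obstacle. It raises if any
# of the common compatibility checks fail.
check_estimator(LogisticRegression())
```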
I think that the right approach is to go the meta-estimator route; this is something we will also want to use in our reductions, so if we implement it here, we can propagate it there.
On the API level, I think that `ThresholdOptimizer` should be similar to `CalibratedClassifierCV` (link).
Re. defaults: in `CalibratedClassifierCV`, the `base_estimator` is defaulted to `None`, which becomes `LinearSVC(random_state=0)` during `fit`. I'm happy to go with whatever plays better with sklearn; if we need to have some default, I think that any linear model would work (`LinearSVC`, `LinearDiscriminantAnalysis`, or `SGDClassifier`), I just thought it would be nicer if the default was deterministic.
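The resolve-the-default-inside-`fit` pattern mentioned above could be sketched like this (class and attribute names are illustrative assumptions, not fairlearn's actual API; `LinearSVC(random_state=0)` matches the default cited for `CalibratedClassifierCV`):

```python
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.svm import LinearSVC


class DefaultInFitSketch(ClassifierMixin, BaseEstimator):
    """Sketch of the CalibratedClassifierCV-style default pattern."""

    def __init__(self, estimator=None):
        self.estimator = estimator  # stored unchanged, per sklearn convention

    def fit(self, X, y):
        base = self.estimator
        if base is None:
            # deterministic default resolved at fit time, not in __init__
            base = LinearSVC(random_state=0)
        self._estimator = clone(base).fit(X, y)
        return self

    def predict(self, X):
        return self._estimator.predict(X)
```

This keeps `__init__` side-effect free (it only stores parameters), which is what sklearn's `get_params`/`set_params` machinery expects.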
@adrinjalali this is closable because you did this in multiple smaller PRs, right?
#342 should close this one :)
You probably mean the issue. I meant this PR :-) Are you still using this or is the PR closable?
This PR moves the `ThresholdOptimizer` towards a sklearn-compatible one. Still working on fixing the tests, and adding sklearn's common tests for the estimator.
The changes are substantial, but I hope none of them are controversial.
p.s. the line limit is 80 here, since longer ones don't easily fit on my screen :D