
FEA add TunedThresholdClassifier meta-estimator to post-tune the cut-off threshold #26120

Merged
merged 228 commits into scikit-learn:main on May 3, 2024

Conversation

@glemaitre (Member) commented Apr 7, 2023

supersedes #16525
closes #16525
closes #8614
closes #10117
supersedes #10117

builds upon #26037

relates to #4813

Summary

We introduce a TunedThresholdClassifier that post-tunes the cut-off point used to convert the soft decisions of decision_function or predict_proba into the hard decisions returned by predict.

Important features to keep in mind:

objective_metric: the objective metric can be set either to a single metric to be maximized, or to a pair of metrics where one is optimized under a constraint on the other (to find a trade-off). Additionally, we can pass a cost/gain matrix that can be used to optimize a business metric. For this case, we are limited to constant costs/gains. In the future, we could think of costs/gains that depend on the matrix X, but we would then need to be able to forward metadata to the scorer (a good additional use case for SLEP006 @adrinjalali). A minimal usage sketch follows this list.

cv and refit: we provide some flexibility to pass an already-fitted model or a single train/test split. We add limitations and document the caveats with an example.
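
A minimal sketch of the intended usage, following the parameter names described above (objective_metric, constraint_value, cv); note that these names were still under discussion in this PR and differ from the final merged API:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TunedThresholdClassifier  # as added by this PR

X, y = make_classification(n_samples=1_000, weights=[0.9, 0.1], random_state=0)
model = TunedThresholdClassifier(
    estimator=LogisticRegression(),
    objective_metric="max_tpr_at_tnr_constraint",  # maximize TPR...
    constraint_value=0.7,                          # ...under the constraint TNR >= 0.7
    cv=5,
)
model.fit(X, y)                      # the cut-off is tuned on internal validation splits
hard_predictions = model.predict(X)  # `predict` now uses the tuned threshold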

Points to discuss

  • Are we fine with the name TunedThresholdClassifier? Or shall we instead have something about "threshold" (e.g. ThresholdTuner)?
  • We are using the terms objective_metric, constraint_value and objective_score. Is the naming fine? Would "utility" be a better alternative to "objective"?

Further work

I currently implemented a single example that shows the feature in the context of post-tuning the decision threshold.

The current example uses a single train/test split for the figure, and I think it would be nice to also show ROC/precision-recall curves obtained from cross-validation to be complete. However, this requires some new features to be implemented.

I am also planning to analyse the usage of this feature for the problem of calibration on imbalanced classification problems. The feeling on this topic is that resampling strategies involve an implicit tuning of the decision threshold at the cost of a badly calibrated model. It might be better to learn a model on the imbalanced problem directly, make sure that it is well calibrated, and then post-tune the decision threshold for "hard" predictions. In this case, you get the best of both worlds: a calibrated model if the output of predict_proba is important to you, and an optimal hard predictor for your specific utility metric. However, this needs some investigation and is better suited for another PR.
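
For concreteness, the workflow described in the last paragraph could look roughly like the sketch below. This is only an illustration: CalibratedClassifierCV, the isotonic method, and the exact tuner parameters are assumptions for the example, not part of this PR.

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TunedThresholdClassifier  # as added by this PR

X, y = make_classification(n_samples=1_000, weights=[0.95, 0.05], random_state=0)

# 1. learn on the imbalanced problem directly and calibrate the probabilities
calibrated = CalibratedClassifierCV(LogisticRegression(), method="isotonic", cv=5)

# 2. post-tune the decision threshold for the utility metric of interest
tuned = TunedThresholdClassifier(estimator=calibrated, objective_metric="balanced_accuracy")
tuned.fit(X, y)

probabilities = tuned.predict_proba(X)  # calibrated probabilities from the inner model
hard_labels = tuned.predict(X)          # hard labels using the post-tuned cut-off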

@glemaitre glemaitre changed the title from "Cutoff classifier again" to "FEA add CutOffClassifier meta-estimator to post-tune the decision threshold" Apr 7, 2023
@glemaitre glemaitre marked this pull request as draft April 7, 2023 12:38
@adrinjalali (Member) left a comment

Had a look into the API

Comment on lines 614 to 634
if curve_scorer is roc_curve or (
    isinstance(curve_scorer, _BaseScorer) and curve_scorer._score_func is roc_curve
):
    fpr, tpr, potential_thresholds = curve_scorer(
        classifier, X_val, y_val, **score_params_val
    )
    # For fpr=0/tpr=0, the threshold is set to `np.inf`. We need to remove it.
    fpr, tpr, potential_thresholds = fpr[1:], tpr[1:], potential_thresholds[1:]
    # thresholds are in decreasing order
    return potential_thresholds[::-1], ((1 - fpr)[::-1], tpr[::-1])
elif curve_scorer is precision_recall_curve or (
    isinstance(curve_scorer, _BaseScorer)
    and curve_scorer._score_func is precision_recall_curve
):
    precision, recall, potential_thresholds = curve_scorer(
        classifier, X_val, y_val, **score_params_val
    )
    # thresholds are in increasing order
    # the last element of the precision and recall is not associated with any
    # threshold and should be discarded
    return potential_thresholds, (precision[:-1], recall[:-1])
Member

I would rather have these in two separate CurveScorers and not have this specific code here. It looks odd to have this very specific treatment of the score functions here. This method shouldn't have to care about it.

Member Author

I would rather have these in two separate CurveScorers and not have this specific code here.

I'm not sure I understand exactly what you want. Here, we call make_scorer on these parametric curves (not _CurveScorer), and since the API of the two functions is not consistent, we need this specific treatment.

Could you provide a small pseudo-code of your vision? Maybe I would understand better.

Member

What I mean is to have a new CurveScorerMixin and have roc_curve and precision_recall_curve each be such a scorer as well.
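
A rough sketch of this suggestion (class names are hypothetical, not scikit-learn API; the ordering fixes mirror the snippet quoted above):

from sklearn.metrics import precision_recall_curve, roc_curve

class _BaseCurveScorer:
    """Return (thresholds, curve_values) with thresholds in increasing order."""

    def __call__(self, classifier, X, y, **score_params):
        raise NotImplementedError

class _RocCurveScorer(_BaseCurveScorer):
    def __call__(self, classifier, X, y, **score_params):
        y_score = classifier.predict_proba(X)[:, 1]  # binary case, positive class
        fpr, tpr, thresholds = roc_curve(y, y_score)
        # drop the leading `np.inf` threshold and flip to increasing order
        return thresholds[1:][::-1], ((1 - fpr)[1:][::-1], tpr[1:][::-1])

class _PrecisionRecallCurveScorer(_BaseCurveScorer):
    def __call__(self, classifier, X, y, **score_params):
        y_score = classifier.predict_proba(X)[:, 1]  # binary case, positive class
        precision, recall, thresholds = precision_recall_curve(y, y_score)
        # the last precision/recall values have no associated threshold
        return thresholds, (precision[:-1], recall[:-1])

With such classes, the tuning method would only call the curve scorer and never special-case roc_curve versus precision_recall_curve.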

Member

As I mentioned earlier, I also think that there's room for improvement in the scorers' design to simplify this, but it is the kind of internal detail that I think can be sorted out in a follow-up PR because it doesn't impact the public API.

Member Author

Indeed, we had this exact discussion with @jeremiedbb; as he mentioned, we thought about not addressing it in this PR.

Member

It is the kind of technical debt that we tend not to pay off immediately afterwards, that's why I'm worried. Could you please open an issue for this?

Member

+1 for creating an issue after merging, because it's hard to point to code that does not exist yet 😃

Member

We can point to this PR

Member Author

I opened the following issue to be edited in the future: #28941

Comment on lines 689 to 711
objective_metric : {"max_tpr_at_tnr_constraint", "max_tnr_at_tpr_constraint", \
        "max_precision_at_recall_constraint", "max_recall_at_precision_constraint"}, \
        str, dict or callable, default="balanced_accuracy"
    The objective metric to be optimized. Can be one of:

    * a string associated to a scoring function for binary classification
      (see model evaluation documentation);
    * a scorer callable object created with :func:`~sklearn.metrics.make_scorer`;
    * `"max_tnr_at_tpr_constraint"`: find the decision threshold for a true
      positive rate (TPR) of `constraint_value`;
    * `"max_tpr_at_tnr_constraint"`: find the decision threshold for a true
      negative rate (TNR) of `constraint_value`;
    * `"max_precision_at_recall_constraint"`: find the decision threshold for a
      recall of `constraint_value`;
    * `"max_recall_at_precision_constraint"`: find the decision threshold for a
      precision of `constraint_value`.

constraint_value : float, default=None
    The value associated with the `objective_metric` metric for which we
    want to find the decision threshold when `objective_metric` is either
    `"max_tnr_at_tpr_constraint"`, `"max_tpr_at_tnr_constraint"`,
    `"max_precision_at_recall_constraint"`, or
    `"max_recall_at_precision_constraint"`.
Member

This is, in a sense, reinventing what we already have in GridSearch as a callable refit. I find the API here quite odd. It also doesn't make the implementation here clean.

I think it would make more sense to accept a callable here that takes the cv results and gives you what needs to be used; then we provide 4 pre-defined callables which users (and we, internally) can use, and then we might consider allowing a shortcut for using those callables.
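
For reference, the existing GridSearchCV pattern being referred to: refit can be a callable that receives cv_results_ and returns the index of the selected candidate (the parameter grid below is only a placeholder):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

def select_best(cv_results):
    # custom selection logic over the CV results; here simply the highest mean score
    return int(np.argmax(cv_results["mean_test_score"]))

X, y = make_classification(random_state=0)
search = GridSearchCV(
    LogisticRegression(), param_grid={"C": [0.1, 1.0, 10.0]}, refit=select_best
)
search.fit(X, y)  # `search.best_index_` is whatever `select_best` returned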

Member Author

I think that it makes the code cleaner but not necessarily more accessible for end users. Having to provide a string and a single value is a low entry bar. Providing a callable (even a pre-defined one) means that people need to understand what this thing is doing, and discoverability is also more complex.

However, this again looks like the dilemma between a score provided as a string or as a scorer. So maybe the current API will bite us in the future. At least here, we should not need to extend the options.

@jeremiedbb @ogrisel what are your thoughts on the API? Do you see an alternative to what is discussed here?

Member

I had a similar concern, assuming that users will need more flexibility, e.g. a constraint on different metrics. I also don't find the current 4 strings ideal, but I understand that they probably represent 90%+ of the use cases, so I think it's acceptable as a start. We can add the option for a callable later if we receive requests.

Member

This is what I think users should do:

clf = TunedThresholdClassifierCV(
    estimator=LogisticRegression(),
    objective=MaxTprAtTnrConstraint(tnr=0.3),
    ...
)

And we expose those callable classes to users, and document clearly how users can write their own.

This also makes the implementation here in this class much cleaner.

Member

I also think it's worth special-casing the precision/recall and TNR/TPR trade-offs with the string / float value combo because:

  • it makes the very common cases easy to implement without having to write custom filtering logic,
  • the intent of an expression such as TunedThresholdClassifier(clf, objective_metric="max_precision_at_recall_constraint", constraint_value=0.3) is more easily understandable in code than something like:
def max_precision_at_recall(results_at_thresholds, recall=0.3):
   # complex user written code here
   ...
   return selected_threshold_idx

TunedThresholdClassifier(
    clf, objective_metric=["precision", "recall"], constraint=max_precision_at_recall
)

So I am +1 for keeping it this way in this first iteration and exploring more flexible custom filtering logic in a follow-up PR.

Member

I personally don't mind either way, but I consider both of them better than what we have now. Historically, we haven't had a good experience passing a metric name and then metric parameters to estimators. We've had things like Estimator(..., metric=..., p=..., metric_params=...) and I prefer the metric that is passed in to carry all that information, since that's where it belongs.

Member

Well, I'd prefer to have metrics as objects as well, but every time the discussion came up, there was strong opposition against it.

Member

The reason I'm not happy with going forward with this as is, is that I don't want us to release and then immediately deprecate constraint_value, which I think doesn't belong to the estimator but to the constraint.

Member

We have many estimators with coupled parameters. On the other hand, we almost never have parameters with a complex type, so this would actually make it a special case in our API. I also think a motivation for keeping separate parameters is to make parameter search easier.

I don't think we'll need to deprecate constraint_value right away. We can keep the combination objective_metric=<str> + constraint_value=<value>, which is easy to use and covers most use cases, and add support for objective_metric=callable for advanced use cases.

Member Author

Be aware that we have exactly the same issue with pos_label (@glemaitre shooting himself in the foot).

sklearn/model_selection/_classification_threshold.py (outdated, resolved)
@ogrisel (Member) left a comment

Some more feedback.

sklearn/model_selection/_classification_threshold.py (outdated, resolved)
sklearn/model_selection/_classification_threshold.py (outdated, resolved)
sklearn/model_selection/_classification_threshold.py (outdated, resolved)
sklearn/model_selection/_classification_threshold.py (outdated, resolved)
examples/model_selection/plot_tuned_decision_threshold.py (outdated, resolved)
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
sklearn/model_selection/_classification_threshold.py (outdated, resolved)
"max_recall_at_precision_constraint",
}
),
StrOptions(set(get_scorer_names())),
Member

We should limit to classification metrics

Member Author

Hmm, I'm not sure how we should handle that. We don't have any public mechanism available (it only exists in the tests). I recall having opened such a PR at some point: #17930

Member

let's figure this out separately then :)

sklearn/model_selection/_classification_threshold.py (outdated, resolved)
objective_metric : {"max_tpr_at_tnr_constraint", "max_tnr_at_tpr_constraint", \
        "max_precision_at_recall_constraint", "max_recall_at_precision_constraint"}, \
        str, dict or callable, default="balanced_accuracy"
objective_metric : str, dict or callable, default="balanced_accuracy"
Member

shouldn't this be called scoring, the same as in *SearchCV? We're optimizing for one or more metrics using a cv object, exactly like *SearchCV; it would be nice to have a consistent name for this.

Member

I agree that it makes the API more consistent

Member Author

I was trying to find where we had this discussion in the past but I could not find it.
I think that the fact that we had constrained metrics made it different enough from the "scoring" parameter that we decided to change the name.

Member

I'm pretty optimistic that we can find an API which we can share with *SearchCV that is both user friendly and capable of handling the use cases we want. And as @jeremiedbb says in #26120 (review), we're not far from nicely supporting it. So I'd say we can safely rename the parameter.

Member Author

Now I have a question: do we need the pos_label and response_method parameters? Those two can be delegated to make_scorer: response_method is a parameter of that function, while pos_label is a parameter of the function provided to make_scorer and is forwarded to it.

Member

The downside comes down to "how many people are using make_scorer?"; the people who don't are fine with the defaults.

@jeremiedbb (Member) commented May 3, 2024

Currently, if I want to tune a LogisticRegression with a grid search and recall as scoring, either I pass scoring="recall" and rely on the defaults, or I pass a scorer made out of recall_score if I want to change the pos_label. I think it's fine to mimic this behavior.
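
For concreteness, the two options mentioned above look like this with the standard scoring API (the parameter grid is only a placeholder):

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV

param_grid = {"C": [0.1, 1.0, 10.0]}

# rely on the default positive label
search_default = GridSearchCV(LogisticRegression(), param_grid, scoring="recall")

# or build a scorer out of recall_score to change the positive label
search_pos0 = GridSearchCV(
    LogisticRegression(), param_grid, scoring=make_scorer(recall_score, pos_label=0)
)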

Member

But they will be requested for the FixedThresholdClassifier because there we don't have any metric.

For this one, I think we indeed need to keep them, because if a user passes a threshold of, say, 0.3, we need to know whether it applies to a probability or to a decision function value.

Member

Actually, the response_method of the estimator and the response_method for the scorer are not the same, so we need to keep it.

Member Author

Indeed. response_method is dispatched to _CurveScorer and not to the scorer. It would be weird to ask our users to pass make_scorer(balanced_accuracy_score, response_method="predict_proba") when that scorer alone does not make any sense.

@jeremiedbb (Member) left a comment

I took a quick look, and all references to constrained metrics seem to have been removed.

Also, I think the constrained-metric use case is still doable with the current API. I expect that something like the following works:

import numpy as np
from sklearn.metrics import make_scorer, recall_score

def max_tpr_at_tnr_constraint_score(y_true, y_pred, min_tnr):
    # TPR is the recall of the positive class; TNR is the recall of the negative class
    tpr = recall_score(y_true, y_pred, pos_label=1)
    tnr = recall_score(y_true, y_pred, pos_label=0)
    if tnr < min_tnr:  # reject thresholds that violate the TNR constraint
        return -np.inf
    return tpr

my_scorer = make_scorer(max_tpr_at_tnr_constraint_score, min_tnr=0.5)
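
If that holds, such a scorer would presumably be passed as the tuner's scoring parameter, e.g. (hypothetical usage, assuming the CV-suffixed class name from the discussion above and its import path):

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TunedThresholdClassifierCV  # assumed import path

tuned = TunedThresholdClassifierCV(LogisticRegression(), scoring=my_scorer)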

doc/modules/classification_threshold.rst (outdated, resolved)
@glemaitre (Member Author)

I think that this is good for another round of review.

@jeremiedbb (Member) left a comment

LGTM

@glemaitre (Member Author)

LGTM

Deja-vu :)

@adrinjalali (Member) left a comment

LGTM. Could you please open separate issues for the remaining work? (Some refactoring and some documentation concerns still seem to be open.)

@adrinjalali adrinjalali enabled auto-merge (squash) May 3, 2024 16:11
@adrinjalali adrinjalali merged commit 1e49c34 into scikit-learn:main May 3, 2024
28 checks passed
@jeremiedbb (Member)

🎉

@glemaitre (Member Author)

The issue about refactoring is already open here: #28941.

For the documentation, I'll open an issue to see how to articulate the constrained part. The comment raised by @amueller is no longer meaningful since we don't really allow choosing a point on the PR or ROC curve in a straightforward manner.

I'll also revive #17930

@lorentzenchr (Member)

🚀 @glemaitre 🎉
Thanks for this great addition that many have been longing for.

In a way, I like the explicit FixedThresholdClassifier. Out of curiosity: is there a path forward for this to end up as a mixin class in every classifier, without the user needing to wrap it in a meta-estimator?

@ogrisel (Member) commented May 6, 2024

Out of curiosity: is there a path forward for this to end up as a mixin class in every classifier, without the user needing to wrap it in a meta-estimator?

Maybe, but that would entail adding at least 2 new constructor parameters to all classifiers in scikit-learn that implement predict_proba or decision_function. That's kind of an invasive API change...


Successfully merging this pull request may close these issues.

Add wrapper class that changes threshold value for predict