scikit-learn · adrinjalali · May 3, 2024 · Mar 31, 2023 · Apr 1, 2023 · Apr 1, 2023
diff --git a/doc/model_selection.rst b/doc/model_selection.rst
@@ -14,5 +14,6 @@ Model selection and evaluation
 
     modules/cross_validation
     modules/grid_search
+    modules/classification_threshold
     modules/model_evaluation
     modules/learning_curve
diff --git a/doc/modules/classes.rst b/doc/modules/classes.rst
@@ -1248,6 +1248,16 @@ Hyper-parameter optimizers
    model_selection.RandomizedSearchCV
    model_selection.HalvingRandomSearchCV
 
+Model post-fit tuning
+---------------------
+
+.. currentmodule:: sklearn
+
+.. autosummary::
+   :toctree: generated/
+   :template: class.rst
+
+   model_selection.TunedThresholdClassifier
 
 Model validation
 ----------------

diff --git a/doc/modules/classification_threshold.rst b/doc/modules/classification_threshold.rst
@@ -0,0 +1,174 @@
+.. currentmodule:: sklearn.model_selection
+
+.. _tunedthresholdclassifier:
+
+==================================================
+Tuning the decision threshold for class prediction
+==================================================
+
+Classification is best divided into two parts:
+
+* the statistical problem of learning a model to predict, ideally, class probabilities;
+* the decision problem to take concrete action based on those probability predictions.
+
+Let's take a straightforward example related to weather forecasting: the first point
+is related to answering "what is the chance of rain tomorrow?" while the second point
+is related to answering "should I take an umbrella tomorrow?".
+
+When it comes to the scikit-learn API, the first point is addressed by providing scores
+using :term:`predict_proba` or :term:`decision_function`. The former returns posterior
+probability estimates for each class, while the latter returns a decision score for each
+class.
+
+The decisions corresponding to the labels are obtained with :term:`predict`. In binary
+classification, a decision rule or action is then defined by thresholding the scores,
+leading to the prediction of a single class label for each sample. For binary
+classification in scikit-learn, class label predictions are obtained by hard-coded
+cut-off rules: a positive class is predicted when the posterior probability is greater
+than 0.5 (obtained with :term:`predict_proba`) or if the decision score is greater than
+0 (obtained with :term:`decision_function`).
+
+Here, we show an example that illustrates the relation between posterior
+probability estimates and class labels::
+
+    >>> from sklearn.datasets import make_classification
+    >>> from sklearn.tree import DecisionTreeClassifier
+    >>> X, y = make_classification(random_state=0)
+    >>> classifier = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
+    >>> classifier.predict_proba(X[:4])
+    array([[0.94     , 0.06     ],
+           [0.94     , 0.06     ],
+           [0.0416..., 0.9583...],
+           [0.0416..., 0.9583...]])
+    >>> classifier.predict(X[:4])
+    array([0, 0, 1, 1])
+
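As a minimal sketch of the cut-off rules described above, the labels returned by
`predict` can be reproduced by thresholding the scores by hand. The use of
`LogisticRegression` here is an assumption made for illustration (it is not part of
the example above); it was chosen because it exposes both `predict_proba` and
`decision_function`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(random_state=0)
clf = LogisticRegression().fit(X, y)

# Probability of the positive class, i.e. the column matching clf.classes_[1].
proba_pos = clf.predict_proba(X)[:, 1]
# Signed decision score: positive values favour clf.classes_[1].
scores = clf.decision_function(X)

# Apply the two hard-coded cut-offs by hand and map back to class labels.
labels_from_proba = clf.classes_[(proba_pos > 0.5).astype(int)]
labels_from_scores = clf.classes_[(scores > 0).astype(int)]
labels = clf.predict(X)

print(np.array_equal(labels_from_proba, labels))
print(np.array_equal(labels_from_scores, labels))
```

Both comparisons are expected to agree with `predict` on this data; a sample whose
score falls exactly on the cut-off would be the only source of disagreement.
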
+While these hard-coded rules might at first seem reasonable as default behavior, they
+are most certainly not ideal for most use cases. Let's illustrate with an example.
+
+Let's consider a scenario where a predictive model is being deployed to assist
+physicians in detecting tumors. In this setting, physicians will most likely be
+interested in identifying every patient with cancer, without missing any, so
+that they can provide them with the right treatment. In other words, physicians
+prioritize achieving a high recall rate. This emphasis on recall comes, of course, with
+the trade-off of potentially more false-positive predictions, reducing the precision of
+the model. That is a risk physicians are willing to take because the cost of a missed
+cancer is much higher than the cost of further diagnostic tests. Consequently, when it
+comes to deciding whether to classify a patient as having cancer or not, it may be more
+beneficial to classify them as positive for cancer when the posterior probability
+estimate is much lower than 0.5.
+
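The trade-off described above can be sketched by applying two cut-offs by hand to
the same probability estimates. The synthetic imbalanced dataset, the
`LogisticRegression` model, and the value 0.1 are all assumptions chosen for
illustration; they do not come from a real screening problem.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic data where the positive class (label 1) is rare (assumption).
X, y = make_classification(n_samples=1_000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)
clf = LogisticRegression().fit(X_train, y_train)
proba_pos = clf.predict_proba(X_test)[:, 1]

results = {}
for threshold in (0.5, 0.1):  # 0.1 is an arbitrary illustrative cut-off
    y_pred = (proba_pos > threshold).astype(int)
    results[threshold] = {
        "recall": recall_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
    }

for threshold, metrics in results.items():
    print(threshold, metrics)
```

Lowering the cut-off can only turn negative predictions into positive ones, so the
recall at 0.1 is never lower than the recall at 0.5; the precision is free to drop.
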
+Post-tuning the decision threshold
+==================================
+
+One solution to address the problem stated in the introduction is to tune the decision
+threshold of the classifier once the model has been trained. The
+:class:`~sklearn.model_selection.TunedThresholdClassifier` tunes this threshold using an
+internal cross-validation. The optimum threshold is chosen to maximize a given metric
+with or without constraints.
+
+The following image illustrates the tuning of the cut-off point for a gradient boosting
+classifier. While the vanilla and tuned classifiers provide the same Receiver Operating
+Characteristic (ROC) and Precision-Recall curves, and thus the same
+:term:`predict_proba` outputs, the class label predictions differ because of the tuned
+decision threshold. The vanilla classifier predicts the class of interest for a
+posterior probability greater than 0.5 while the tuned classifier predicts the class of
+interest for a very low probability (around 0.02). This cut-off point optimizes a
+utility metric defined by the business (in this case an insurance company).
+
+.. figure:: ../auto_examples/model_selection/images/sphx_glr_plot_cost_sensitive_learning_002.png
+   :target: ../auto_examples/model_selection/plot_cost_sensitive_learning.html
+   :align: center
+
+Options to tune the cut-off point
+---------------------------------
+
+The cut-off point can be tuned through different strategies controlled by the parameter
+`objective_metric`.
+
+One way to tune the threshold is by maximizing a pre-defined scikit-learn metric. These
+metrics can be found by calling the function :func:`~sklearn.metrics.get_scorer_names`.
+In this example, we maximize the balanced accuracy.
+
+.. note::
+
+    It is important to notice that these metrics come with default parameters, notably
+    the label of the class of interest (i.e. `pos_label`). Thus, if this label is not
+    the right one for your application, you need to define a scorer and pass the right
+    `pos_label` (and additional parameters) using
+    :func:`~sklearn.metrics.make_scorer`. Refer to :ref:`scoring` to get
+    information to define your own scoring function. For instance, we show how to pass
+    the information to the scorer that the label of interest is `0` when maximizing the
+    :func:`~sklearn.metrics.f1_score`:
+
+        >>> from sklearn.linear_model import LogisticRegression
+        >>> from sklearn.model_selection import (
+        ...     TunedThresholdClassifier, train_test_split
+        ... )
+        >>> from sklearn.metrics import make_scorer, f1_score
+        >>> X, y = make_classification(
+        ...    n_samples=1_000, weights=[0.1, 0.9], random_state=0)
+        >>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
+        >>> pos_label = 0
+        >>> scorer = make_scorer(f1_score, pos_label=pos_label)
+        >>> base_model = LogisticRegression()
+        >>> model = TunedThresholdClassifier(base_model, objective_metric=scorer).fit(
+        ...     X_train, y_train)
+        >>> scorer(model, X_test, y_test)
+        0.79...
+        >>> # compare it with the internal score found by cross-validation
+        >>> model.best_score_
+        0.86...
+
+A second strategy aims to maximize one metric while imposing constraints on another
+metric. There are four pre-defined options: two use the Receiver Operating
+Characteristic (ROC) statistics and two use the Precision-Recall statistics.
+
+- `"max_tpr_at_tnr_constraint"`: maximizes the True Positive Rate (TPR) such that the
+  True Negative Rate (TNR) is the closest to a given value.
+- `"max_tnr_at_tpr_constraint"`: maximizes the TNR such that the TPR is the closest to
+  a given value.
+- `"max_precision_at_recall_constraint"`: maximizes the precision such that the recall
+  is the closest to a given value.
+- `"max_recall_at_precision_constraint"`: maximizes the recall such that the precision
+  is the closest to a given value.
+
+For these options, the `constraint_value` parameter needs to be defined. In addition,
+you can use the `pos_label` parameter to indicate the label of the class of interest.
+
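To make the idea behind the constrained options more concrete, here is a manual
sketch in the spirit of `"max_recall_at_precision_constraint"`, written with
:func:`~sklearn.metrics.precision_recall_curve`. It is not the implementation used
by the class: the data, the model, the held-out split, and the target precision of
0.7 are assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, weights=[0.9, 0.1], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, stratify=y, random_state=0
)
clf = LogisticRegression().fit(X_train, y_train)
proba_pos = clf.predict_proba(X_val)[:, 1]

target_precision = 0.7  # hypothetical constraint value

precision, recall, thresholds = precision_recall_curve(y_val, proba_pos)
# The last precision/recall pair has no associated threshold, so drop it.
precision, recall = precision[:-1], recall[:-1]

# Keep the operating points whose precision is closest to the target ...
distance = np.abs(precision - target_precision)
candidates = np.flatnonzero(np.isclose(distance, distance.min()))
# ... and, among those, pick the one with the highest recall.
best = candidates[np.argmax(recall[candidates])]
threshold = thresholds[best]

print(threshold, precision[best], recall[best])
```

The same pattern could be applied to the ROC-based options by swapping in
:func:`~sklearn.metrics.roc_curve` and working with the TPR and TNR instead.
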
+Important notes regarding the internal cross-validation
+-------------------------------------------------------
+
+By default :class:`~sklearn.model_selection.TunedThresholdClassifier` uses a 5-fold
+stratified cross-validation to tune the cut-off point. The parameter `cv` controls
+the cross-validation strategy. It is possible to bypass cross-validation by
+setting `cv="prefit"` and providing a fitted classifier. In this case, the cut-off point
+is tuned on the data provided to the `fit` method.
+
+However, you should be extremely careful when using this option. You should never use
+the same data for training the classifier and tuning the cut-off point due to the risk
+of overfitting. Refer to the following example section for more details (cf.
+:ref:`tunedthresholdclassifier_no_cv`). If you have limited resources, consider
+passing a float to `cv` to limit the procedure to a single internal train-test split.
+
+The option `cv="prefit"` should only be used when the provided classifier was already
+trained, and you just want to find the best cut-off using a new validation set.
+
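The principle behind this recommendation can be sketched by hand: the classifier is
fitted on one part of the data and the cut-off is selected on a separate validation
part. This is only an illustration of the idea, not the procedure implemented by the
class; the data, the model, the threshold grid, and the choice of balanced accuracy
are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, weights=[0.9, 0.1], random_state=0)
# The classifier only ever sees the training part ...
X_train, X_val, y_train, y_val = train_test_split(
    X, y, stratify=y, random_state=0
)
clf = LogisticRegression().fit(X_train, y_train)

# ... and the cut-off is selected on the held-out validation part.
proba_val = clf.predict_proba(X_val)[:, 1]
grid = np.arange(101) / 100  # candidate thresholds 0.00, 0.01, ..., 1.00
scores = np.array(
    [balanced_accuracy_score(y_val, (proba_val > t).astype(int)) for t in grid]
)
best_threshold = grid[np.argmax(scores)]
best_score = scores.max()

print(best_threshold, best_score)
```

Because 0.5 is part of the grid, the selected cut-off scores at least as well as the
default one on the validation data; a further untouched test set would still be
needed to estimate how well it generalizes.
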
+Manually setting the decision threshold
+---------------------------------------
+
+The previous sections discussed strategies to find an optimal decision threshold. It is
+also possible to manually set the decision threshold in
+:class:`~sklearn.model_selection.TunedThresholdClassifier` by setting the parameter
+`strategy` to `"constant"` and providing the desired threshold using the parameter
+`constant_threshold`.
+
+Examples
+--------
+
+- See the example entitled
+  :ref:`sphx_glr_auto_examples_model_selection_plot_tuned_decision_threshold.py`,
+  to get insights on the post-tuning of the decision threshold.
+- See the example entitled
+  :ref:`sphx_glr_auto_examples_model_selection_plot_cost_sensitive_learning.py`,
+  to learn about cost-sensitive learning and decision threshold tuning.
diff --git a/doc/whats_new/v1.5.rst b/doc/whats_new/v1.5.rst
@@ -304,6 +304,11 @@ Changelog
 :mod:`sklearn.model_selection`
 ..............................
 
+- |MajorFeature| :class:`model_selection.TunedThresholdClassifier` finds
+  the decision threshold of a binary classifier that maximizes a
+  classification metric through cross-validation.
+  :pr:`26120` by :user:`Guillaume Lemaitre <glemaitre>`.
+
 - |Enhancement| :term:`CV splitters <CV splitter>` that ignores the group parameter now
   raises a warning when groups are passed in to :term:`split`. :pr:`28210` by
 - |Fix| the ``cv_results_`` attribute (of :class:`model_selection.GridSearchCV`) now