
[WIP] Blog post on predict_proba #152

Draft
wants to merge 7 commits into main

Conversation


@aperezlebel commented Dec 2, 2022

Closes #147.

Work in progress.

TODO:

  • Update illustration image + credits
  • Update title
  • Check content

@aperezlebel mentioned this pull request Dec 2, 2022
These probability estimates are typically accessible from the `predict_proba` method of scikit-learn's classifiers.

However, the quality of the estimated probabilities must be validated to ensure trustworthiness, fairness, and robustness to operating conditions.
To be reliable, the estimated probabilities must be close to the true underlying posterior probabilities of the classes `P(Y=1|X)`.
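
For a minimal sketch of where these estimates come from, assuming a synthetic dataset and a logistic regression chosen purely for illustration:

```python
# Illustrative setup: synthetic data and a logistic regression (both arbitrary choices).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

labels = clf.predict(X_test)        # hard class predictions
probas = clf.predict_proba(X_test)  # estimated P(Y=k|X), one column per class
```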
Member

As you know, there are several points of view on what such a probability may mean (controlled average error rate versus controlled individual probabilities). Maybe it would be good to first explain what these two mean.

Similarly to validating a discriminant classifier through accuracy or ROC curves, tools have been developed to evaluate a probabilistic classifier.
Calibration is one of them [1-4]: it is used as a proxy to evaluate how close the estimated probabilities are to the true ones. Many recalibration techniques have been developed to improve the estimated probabilities (see [scikit-learn's user guide on calibration](https://scikit-learn.org/stable/modules/calibration.html)). The estimated probabilities of a calibrated classifier can be interpreted as the probability of correctness within the population sharing the same estimated probability, but not as the true posterior class probability.
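
As a rough sketch of what this looks like in practice (reusing the illustrative `clf` and data split from above), one can inspect a reliability diagram and apply a post-hoc recalibration:

```python
# Sketch: inspecting calibration and recalibrating (illustrative, not prescriptive).
from sklearn.calibration import CalibratedClassifierCV, calibration_curve

# Reliability-diagram data: fraction of positives vs. mean predicted probability per bin.
prob_pos = clf.predict_proba(X_test)[:, 1]
frac_pos, mean_pred = calibration_curve(y_test, prob_pos, n_bins=10)

# Post-hoc recalibration; isotonic regression is one built-in option, sigmoid the other.
calibrated = CalibratedClassifierCV(clf, method="isotonic", cv=5).fit(X_train, y_train)
prob_pos_cal = calibrated.predict_proba(X_test)[:, 1]
```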

Indeed, it is important to highlight that calibration only captures part of the error on the estimated probabilities. The remaining term is the grouping loss [5]. Together, the calibration and grouping losses fully characterize the error on the estimated probabilities, called the epistemic loss.
Member

You are too formal. Give us the intuitions of why calibration is not the full story, rather than the maths.


$$\text{Epistemic loss} = \text{Calibration loss} + \text{Grouping loss}$$

Member

First mention Brier score for model selection. Later mention grouping loss.


However, estimating the grouping loss is a harder problem than calibration, as its estimation directly involves the true probabilities. Recent work has focused on approximating the grouping loss through local estimations of the true probabilities [6].

When working with scikit-learn's classifiers, users must be equally cautious about results obtained from `predict_proba` as about results from `predict`. Both output estimated quantities (probabilities and labels, respectively) with no prior guarantee on their quality. In both cases, the model's quality must be assessed with appropriate metrics: expected calibration error, Brier score, accuracy, ROC AUC.
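
A short sketch of such an assessment (again reusing the illustrative setup above; an expected calibration error would typically be derived from the `calibration_curve` bins shown earlier):

```python
# Sketch: assessing both the labels and the probabilities with appropriate metrics.
from sklearn.metrics import accuracy_score, brier_score_loss, roc_auc_score

y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]

print("accuracy:   ", accuracy_score(y_test, y_pred))   # quality of the hard labels
print("ROC AUC:    ", roc_auc_score(y_test, y_prob))    # ranking quality of the scores
print("Brier score:", brier_score_loss(y_test, y_prob)) # overall quality of the probabilities
```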
Member

Put links to relevant pages in the scikit-learn documentation


Member

Maybe mention quickly (with a separate section title) recalibration (and link to corresponding docs)

@lorentzenchr
Member

@aperezlebel While I value the effort for such a blog post, I do not agree with some parts of its current content, e.g. the grouping loss. Unfortunately, I can't promise a fast review atm.

@aperezlebel
Author

@lorentzenchr I appreciate your feedback, thanks. Could you elaborate on the parts you disagree with?

@lorentzenchr
Member

I have 3 main points of critique:

  1. The main message is unclear to me. If I fit a logistic regression, do I need to re-calibrate it, and how do I assess the calibration after all?
  2. As @GaelVaroquaux knows, I prefer the decomposition of proper scoring rules in terms of reliability (or miscalibration), resolution (or discrimination) and uncertainty (or entropy), according to Murphy's original decomposition of the Brier score and many others after him, e.g. Bröcker. I would prefer to avoid the term "grouping loss".
  3. You cite yourself, exclusively for one topic.
