Net benefit curve and decision curve analysis #22136

Morgan243 · 2022-01-06T20:11:06Z

Describe the workflow you want to enable

Computing a net benefit curve in support of decision curve analysis.

See [1] for the primary publication (2,301 citations on Google Scholar) and [2] for a publication on usage and interpretation (148 citations on Google Scholar).

[1] Vickers, Andrew J., and Elena B. Elkin. "Decision curve analysis: a novel method for evaluating prediction models." Medical Decision Making 26.6 (2006): 565-574.
[2] Vickers, Andrew J., Ben van Calster, and Ewout W. Steyerberg. "A simple, step-by-step guide to interpreting decision curve analysis." Diagnostic and Prognostic Research 3.1 (2019): 1-8.

Describe your proposed solution

Given predicted likelihoods and ground truth labels, compute the net benefit at each of a configurable set of decision threshold cut-points p_t, where p_t is the likelihood threshold for considering the prediction a positive (e.g. {0.01, 0.05, 0.1, ..., 1.}).
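
For reference, the net benefit at threshold p_t as defined in [1] can be written as below. The symbol names are chosen here for illustration: TP(p_t) and FP(p_t) are the counts of true and false positives when predictions at or above p_t are treated as positive, and N is the total number of samples. Note that the weight p_t / (1 - p_t) is undefined at p_t = 1, so in practice the cut-points need to stay below 1.

```latex
\mathrm{NB}(p_t) = \frac{\mathrm{TP}(p_t)}{N} - \frac{\mathrm{FP}(p_t)}{N} \cdot \frac{p_t}{1 - p_t}
```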

I could see this feature being implemented similarly to calibration curves - methods to produce the raw net benefit data and possibly additional functions to plot it.
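
To make the idea concrete, here is a minimal sketch of what producing the raw net benefit data could look like. This is not the unpublished implementation mentioned in this issue and not an existing sklearn API: the function name `net_benefit_curve`, its signature, and the choice of `>=` for the threshold comparison are all assumptions made for illustration.

```python
import numpy as np


def net_benefit_curve(y_true, y_prob, thresholds):
    """Hypothetical helper: net benefit of the model, of treat-all and of treat-none.

    Thresholds are assumed to lie in [0, 1); at 1 the odds are undefined.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    n = y_true.size
    prevalence = y_true.mean()

    # The odds of the threshold weight false positives against true positives.
    odds = thresholds / (1.0 - thresholds)

    model = np.empty_like(thresholds)
    for i, p_t in enumerate(thresholds):
        predicted_pos = y_prob >= p_t  # assumption: ties count as positive
        tp = np.sum(predicted_pos & y_true)
        fp = np.sum(predicted_pos & ~y_true)
        model[i] = tp / n - (fp / n) * odds[i]

    # Treat all: every sample is positive, so TP/N = prevalence, FP/N = 1 - prevalence.
    treat_all = prevalence - (1.0 - prevalence) * odds
    # Treat none: nothing is predicted positive, so the net benefit is zero.
    treat_none = np.zeros_like(thresholds)
    return model, treat_all, treat_none
```

With the toy inputs `y_true = [0, 0, 1, 1]`, `y_prob = [0.1, 0.4, 0.35, 0.8]` and thresholds `[0.2, 0.5]`, this sketch gives a model net benefit of 0.4375 and 0.25, against 0.375 and 0.0 for treat-all. Returning plain arrays mirrors how `calibration_curve` returns raw data and leaves plotting to a separate function.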

I have an implementation that isn't public yet and I'm open to working on a PR to get this into sklearn if the community is interested. I'd also be open to a 'contrib' package.

Describe alternatives you've considered, if relevant

Some code exists in other languages to produce these curves and subsequent plots, but I'm not aware of a well-maintained Python implementation.

Additional context

(I would suggest reading citation 2 above for a more thorough explanation and examples - see the later examples for common questions)

Net benefit is primarily for assessing models that are intended to aid an individual in making a decision.

For example, given a model for the likelihood of rain later in the day: what level of certainty do you need to pack a jacket/umbrella? If you are risk averse, you may decide that even a 5% chance of rain is enough motivation. In contrast, a risk tolerant user may need a 75% chance of rain to warrant rain gear.

A net benefit curve accounts for both calibration and classification performance, helping users understand the value of using the model versus always acting in one specific way. Given a net benefit curve and its comparison to naively "treating all" or "treating none":

The risk averse user may find that the model has higher net-benefit than always wearing a coat (i.e. treat all) for their 5% threshold. This user would then only prepare for rain if the model's output was above 5%.
Instead, the risk tolerant user may find that there is little net benefit to relying on the model for their 75% threshold, and they will have the same net benefit if they simply ignore the model and never bring an umbrella (i.e. treat none).
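
As a rough illustration of that comparison at a single user's threshold, the sketch below picks whichever of "model", "treat all" and "treat none" has the highest net benefit. The helper names `net_benefit` and `best_strategy` are hypothetical, and the rain data is made up purely for the example.

```python
import numpy as np


def net_benefit(y_true, y_prob, p_t):
    """Net benefit at one threshold p_t (assumed to be in [0, 1))."""
    y_true = np.asarray(y_true, dtype=bool)
    treat = np.asarray(y_prob, dtype=float) >= p_t
    n = y_true.size
    tp = np.sum(treat & y_true)
    fp = np.sum(treat & ~y_true)
    return tp / n - fp / n * p_t / (1.0 - p_t)


def best_strategy(y_true, y_prob, p_t):
    """Compare the model against treat-all and treat-none at threshold p_t."""
    y_true = np.asarray(y_true, dtype=bool)
    candidates = {
        "model": net_benefit(y_true, y_prob, p_t),
        # Treat all is the same as a model that always predicts probability 1.
        "treat all": net_benefit(y_true, np.ones(y_true.size), p_t),
        # Treat none has no true or false positives, hence zero net benefit.
        "treat none": 0.0,
    }
    return max(candidates, key=candidates.get), candidates


# Made-up data: whether it rained, and the forecast probability for that day.
rained = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
forecast = [0.9, 0.2, 0.03, 0.6, 0.1, 0.02, 0.3, 0.8, 0.04, 0.5]

for p_t in (0.05, 0.75):
    name, values = best_strategy(rained, forecast, p_t)
    print(p_t, name, values)
```

Which strategy wins depends entirely on the data and the threshold, so the made-up numbers here are not meant to reproduce the two users described above.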

Net benefit is in units of true positives and lies in the range (-inf, prevalence], where the prevalence is the fraction of samples that are truly positive. Negative values indicate harm: at that threshold it is better to assume no positives (i.e. treat none) than to rely on the model.

thomasjpfan · 2022-02-10T17:03:08Z

@lorentzenchr What do you think of this metric?

lorentzenchr · 2022-02-10T19:28:11Z

I think the authors should have read Elkan 2001 "The Foundations of Cost-Sensitive Learning" or Granger and Pesaran 1999 "Economic and Statistical Measures of Forecast Accuracy" (which might have been much harder to find at that time and in a different scientific field). I have not worked out the math to see if the two are equivalent, but if we need a new metric in that direction, it would be the cost-weighted misclassification error, see Elkan Eq. (1), given some cost ratio (=probability threshold) or cost matrix.
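
The link between a cost ratio and a probability threshold can be sketched with the standard cost-sensitive argument. The symbols are assumptions introduced here: c_FP is the cost of a false positive, c_FN the cost of a false negative, correct decisions cost nothing, and p is the predicted probability of the positive class. Predicting positive has the lower expected cost exactly when:

```latex
\begin{aligned}
(1 - p)\, c_{FP} \le p\, c_{FN}
  \;&\iff\; p \ge \frac{c_{FP}}{c_{FP} + c_{FN}} =: p_t, \\
\text{so that}\quad \frac{c_{FP}}{c_{FN}} &= \frac{p_t}{1 - p_t}.
\end{aligned}
```

Under these assumptions, the cost ratio c_FP / c_FN is the odds of the threshold, which is the same factor that weights false positives in the net benefit. This is only a sketch of the correspondence, not a worked-out proof that the two metrics are equivalent.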

Morgan243 · 2022-03-24T17:38:42Z

Thanks for the input @lorentzenchr - I plan to review the publications you shared and I'll get back to this thread. Thanks!

lorentzenchr · 2022-06-02T20:21:32Z

I'll close this issue. In case of further interest, we can open discussion any time.

ck37 · 2024-05-09T13:50:38Z

For anyone else who finds this issue, the dcurves package can be used to generate net benefit curves in Python: https://mskcc-epi-bio.github.io/decisioncurveanalysis/dca-tutorial.html

Morgan243 added the Needs Triage and New Feature labels · Jan 6, 2022

thomasjpfan added the module:metrics and Needs Decision - Include Feature labels and removed the Needs Triage label · Feb 10, 2022

lorentzenchr closed this as completed Jun 2, 2022
