Net benefit curve and decision curve analysis #22136

Closed
Morgan243 opened this issue Jan 6, 2022 · 5 comments
Labels: module:metrics, Needs Decision - Include Feature, New Feature

Comments

@Morgan243

Describe the workflow you want to enable

Computing a net benefit curve in support of decision curve analysis.

See [1] for the primary publication (2,301 citations on Google Scholar) and [2] for a publication on usage and interpretation (148 citations on Google Scholar).

  1. Vickers, Andrew J., and Elena B. Elkin. "Decision curve analysis: a novel method for evaluating prediction models." Medical Decision Making 26.6 (2006): 565-574.
  2. Vickers, Andrew J., Ben van Calster, and Ewout W. Steyerberg. "A simple, step-by-step guide to interpreting decision curve analysis." Diagnostic and Prognostic Research 3.1 (2019): 1-8.

Describe your proposed solution

Given predicted likelihoods and ground truth labels, produce 'net benefit' over a configurable set of decision threshold cut-points:

$$\text{net benefit} = \frac{\text{TP}}{n} - \frac{\text{FP}}{n} \cdot \frac{p_t}{1 - p_t}$$

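Where TP and FP are the numbers of true and false positives at threshold p_t, n is the total number of samples, and p_t is the likelihood threshold for considering a prediction positive (e.g. {0.01, 0.05, 0.1, ..., 1.})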
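Net benefit at a single threshold only needs the true- and false-positive counts at that threshold, with false positives weighted by the threshold odds p_t / (1 - p_t). A minimal sketch, assuming binary 0/1 labels and that a prediction at or above p_t counts as positive; `net_benefit` is a hypothetical name, not an existing scikit-learn function. Because the odds weight is undefined at p_t = 1, the sketch rejects that value.

```python
import numpy as np


def net_benefit(y_true, y_prob, threshold):
    """Hypothetical helper: net benefit at probability threshold p_t.

    Assumes labels in {0, 1} and 0 <= threshold < 1; the weight
    p_t / (1 - p_t) is undefined at p_t = 1.
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must lie in [0, 1)")
    n = y_true.shape[0]
    # Assumption: a prediction at or above p_t counts as positive.
    predicted_positive = y_prob >= threshold
    tp = np.sum(predicted_positive & (y_true == 1))
    fp = np.sum(predicted_positive & (y_true == 0))
    # TP/n minus FP/n weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)
```

For example, with labels [1, 0, 1, 0, 0] and probabilities [0.9, 0.6, 0.4, 0.2, 0.1], the threshold 0.5 flags one true and one false positive; the odds weight at p_t = 0.5 is 1, so the net benefit is 1/5 - 1/5 = 0.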

I could see this feature being implemented similarly to calibration curves - methods to produce the raw net benefit data and possibly additional functions to plot it.
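Following the calibration-curve analogy, a curve function could return the thresholds together with the net benefit at each one and leave plotting to a separate helper. The sketch below is hypothetical: `net_benefit_curve` and its `thresholds` parameter are illustrative names, not part of scikit-learn, and the default grid from 0.01 to 0.99 is an assumption chosen to stay below p_t = 1.

```python
import numpy as np


def net_benefit_curve(y_true, y_prob, thresholds=None):
    """Hypothetical API sketch: return (thresholds, net_benefit) arrays."""
    y_true = np.asarray(y_true).astype(bool)
    y_prob = np.asarray(y_prob, dtype=float)
    if thresholds is None:
        # Assumed default grid; p_t = 1 is excluded because the
        # false-positive weight p_t / (1 - p_t) is undefined there.
        thresholds = np.linspace(0.01, 0.99, 99)
    thresholds = np.asarray(thresholds, dtype=float)
    if np.any((thresholds < 0.0) | (thresholds >= 1.0)):
        raise ValueError("thresholds must lie in [0, 1)")
    n = y_true.size
    # Boolean matrix: one row per threshold, one column per sample.
    predicted_positive = y_prob[np.newaxis, :] >= thresholds[:, np.newaxis]
    tp = (predicted_positive & y_true).sum(axis=1)
    fp = (predicted_positive & ~y_true).sum(axis=1)
    net_benefit = tp / n - (fp / n) * thresholds / (1.0 - thresholds)
    return thresholds, net_benefit
```

Returning plain arrays keeps the computation independent of any plotting library; plotting `net_benefit` against `thresholds` (for instance with matplotlib) would then give the curve.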

I have an implementation that isn't public yet and I'm open to working on a PR to get this into sklearn if the community is interested. I'd also be open to a 'contrib' package.

Describe alternatives you've considered, if relevant

Some code exists in other languages to produce these curves and subsequent plots, but I'm not aware of a well-maintained Python implementation.

Additional context

(I would suggest reading citation [2] above for a more thorough explanation and examples; see its later examples for common questions.)

Net benefit is primarily for assessing models that are intended to aid an individual in making a decision.

For example, given a model for the likelihood of rain later in the day: what level of certainty do you need to pack a jacket/umbrella? If you are risk-averse, you may decide that even a 5% chance of rain is enough motivation. In contrast, another user may be risk-tolerant and would need a 75% chance of rain to warrant rain gear.

A net benefit curve accounts for both calibration and classification performance, helping users understand the value of using the model versus always acting a specific way. Given a net benefit curve and its comparison to naively "treating all" or "treating none":

  • The risk-averse user may find that the model has a higher net benefit than always bringing a jacket (i.e. treat all) at their 5% threshold. This user would then only prepare for rain if the model's output was above 5%.
  • The risk-tolerant user, on the other hand, may find that relying on the model adds little net benefit at their 75% threshold, and that they would get the same net benefit by ignoring the model and never bringing an umbrella (i.e. treat none).
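The comparison in the two bullets reduces to three numbers at the user's threshold: the model's net benefit, the treat-all net benefit (every case is flagged, so TP/n equals the prevalence and FP/n equals one minus the prevalence), and treat-none, which is always 0. A minimal sketch; `treat_all_net_benefit` and `best_strategy` are hypothetical helpers, and the prevalence and model scores in the usage lines are made-up illustrative numbers, not results from any rain model.

```python
def treat_all_net_benefit(prevalence, threshold):
    """Net benefit of flagging every case as positive ("treat all")."""
    # Everyone is flagged, so TP/n = prevalence and FP/n = 1 - prevalence.
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


def best_strategy(model_net_benefit, prevalence, threshold):
    """Hypothetical helper: name the strategy with the highest net benefit."""
    candidates = {
        "model": model_net_benefit,
        "treat all": treat_all_net_benefit(prevalence, threshold),
        "treat none": 0.0,  # flagging nothing gives TP = FP = 0
    }
    # max() returns the first maximal key, so ties favour insertion order.
    return max(candidates, key=candidates.get)


# Illustrative numbers only: an assumed rain prevalence of 0.3 and
# made-up model net benefits at the two users' thresholds.
print(best_strategy(0.27, prevalence=0.3, threshold=0.05))   # prints: model
print(best_strategy(-0.01, prevalence=0.3, threshold=0.75))  # prints: treat none
```

With the assumed prevalence of 0.3, treat-all scores about 0.263 at p_t = 0.05, so a model scoring 0.27 there is preferred; at p_t = 0.75 the treat-all score falls to 0.3 - 0.7 × 3 = -1.8, and a model scoring slightly below zero loses to treat-none.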

Net benefit is expressed in units of true positives per sample and lies in the range (-inf, TP/n], so it can never exceed the prevalence of positives. Negative values indicate harm: at that threshold it is better to assume no positives (i.e. treat none) than to rely on the model.
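Both ends of that range follow from the definition. Writing P for the number of actual positives and π = P/n for the prevalence, the false-positive weight w is non-negative for thresholds in [0, 1), so subtracting it can only lower the score:

```math
\begin{aligned}
w &= \frac{p_t}{1 - p_t} \ge 0 \quad \text{for } 0 \le p_t < 1,\\
\text{NB} &= \frac{\text{TP}}{n} - w\,\frac{\text{FP}}{n} \;\le\; \frac{\text{TP}}{n} \;\le\; \frac{P}{n} = \pi .
\end{aligned}
```

Since w grows without bound as p_t approaches 1, even a single false positive at a high threshold can push NB arbitrarily far below zero, which is where the open lower end of the range comes from.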

@Morgan243 Morgan243 added the Needs Triage and New Feature labels Jan 6, 2022
@thomasjpfan
Member

@lorentzenchr What do you think of this metric?

@thomasjpfan thomasjpfan added the module:metrics and Needs Decision - Include Feature labels and removed the Needs Triage label Feb 10, 2022
@lorentzenchr
Member

lorentzenchr commented Feb 10, 2022

I think the authors should have read Elkan 2001 "The Foundations of Cost-Sensitive Learning" or Granger and Pesaran 1999 "Economic and Statistical Measures of Forecast Accuracy" (which might have been much harder to find at that time and in a different scientific field). I have not worked out the math to see if the two are equivalent, but if we need a new metric in that direction, it would be the cost-weighted misclassification error, see Elkan Eq. (1), given some cost ratio (=probability threshold) or cost matrix.
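For a fixed threshold the relationship can be checked under one assumed cost matrix: a false positive costs p_t, a false negative costs 1 - p_t, and correct predictions cost nothing, so the cost ratio equals the threshold odds p_t / (1 - p_t). The cost-weighted error is then L = (p_t·FP + (1 - p_t)·FN) / n, and substituting TP = P - FN into the net-benefit definition gives NB = π - L / (1 - p_t), with π = P/n the prevalence. Under this assumption, net benefit at a fixed threshold is a decreasing affine function of the cost-weighted error, so on the same data the two would rank classifiers identically. The sketch below checks the identity numerically on synthetic labels and scores; the helper names are hypothetical.

```python
import numpy as np


def net_benefit_from_predictions(y_true, y_pred, p_t):
    """Net benefit from boolean labels and boolean predictions."""
    n = y_true.size
    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    return tp / n - (fp / n) * p_t / (1.0 - p_t)


def cost_weighted_error(y_true, y_pred, p_t):
    """Assumed cost matrix: FP costs p_t, FN costs 1 - p_t, correct costs 0."""
    n = y_true.size
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    return (p_t * fp + (1.0 - p_t) * fn) / n


rng = np.random.default_rng(0)
y_true = rng.random(1000) < 0.3  # synthetic boolean labels, ~30% positive
y_prob = rng.random(1000)        # stand-in scores, not a real model
prevalence = y_true.mean()

for p_t in (0.05, 0.2, 0.5, 0.75):
    y_pred = y_prob >= p_t
    nb = net_benefit_from_predictions(y_true, y_pred, p_t)
    cost = cost_weighted_error(y_true, y_pred, p_t)
    # Identity under the assumed costs: NB = prevalence - cost / (1 - p_t)
    assert np.isclose(nb, prevalence - cost / (1.0 - p_t))
```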

@Morgan243
Author

Thanks for the input @lorentzenchr - I plan to review the publications you shared and I'll get back to this thread. Thanks!

@lorentzenchr
Member

I'll close this issue. In case of further interest, we can reopen the discussion at any time.

@ck37

ck37 commented May 9, 2024

For anyone else who finds this issue, the dcurves package can be used to generate net benefit curves in python: https://mskcc-epi-bio.github.io/decisioncurveanalysis/dca-tutorial.html
