Net benefit curve and decision curve analysis #22136
Comments
@lorentzenchr What do you think of this metric?
I think the authors should have read Elkan 2001 "The Foundations of Cost-Sensitive Learning" or Granger and Pesaran 1999 "Economic and Statistical Measures of Forecast Accuracy" (which might have been much harder to find at that time and in a different scientific field). I have not worked out the math to see if the two are equivalent, but if we need a new metric in that direction, it would be the cost-weighted misclassification error, see Elkan Eq. (1), given some cost ratio (=probability threshold) or cost matrix.
Thanks for the input @lorentzenchr - I plan to review the publications you shared and I'll get back to this thread. Thanks!
I'll close this issue. In case of further interest, we can open discussion any time.
For anyone else who finds this issue, the dcurves package can be used to generate net benefit curves in Python: https://mskcc-epi-bio.github.io/decisioncurveanalysis/dca-tutorial.html
Describe the workflow you want to enable
Computing a net benefit curve in support of decision curve analysis.
See [1] for the primary publication (2,301 citations on Google Scholar) and [2] for a publication on usage and interpretation (148 citations on Google Scholar).
Describe your proposed solution
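Given predicted likelihoods and ground truth labels, produce 'net benefit' over a configurable set of decision threshold cut-points:
net benefit = TP/n - (FP/n) * p_t / (1 - p_t)
Where p_t is the likelihood threshold for considering the prediction a positive (e.g. {0.01, 0.05, 0.1, ..., 1.}), TP and FP are the counts of true and false positives at that threshold, and n is the total number of samples.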
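A minimal sketch of the computation described above, assuming the standard decision curve analysis definition of net benefit, TP/n - (FP/n) * p_t / (1 - p_t). The function names `net_benefit` and `net_benefit_curve` and the toy data are hypothetical, not an existing sklearn API. Thresholds are restricted to [0, 1) here because the odds p_t / (1 - p_t) are undefined at p_t = 1.

```python
from typing import List, Sequence, Tuple


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of treating every sample whose predicted likelihood is >= threshold.

    Assumes binary labels (0/1) and the standard definition
    TP/n - (FP/n) * p_t / (1 - p_t).
    """
    if not 0.0 <= threshold < 1.0:
        # The odds p_t / (1 - p_t) are undefined at p_t = 1.
        raise ValueError("threshold must be in [0, 1)")
    n = len(y_true)
    # Count true and false positives among samples flagged at this threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def net_benefit_curve(
    y_true: Sequence[int], y_prob: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (threshold, net benefit) pairs, the raw data behind a decision curve."""
    return [(t, net_benefit(y_true, y_prob, t)) for t in thresholds]


# Toy data (hypothetical): 8 samples, 3 of them positive.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]

curve = net_benefit_curve(y_true, y_prob, [0.1, 0.25, 0.5])
for threshold, nb in curve:
    print(f"p_t={threshold:.2f}  net benefit={nb:.4f}")
```

Returning raw (threshold, value) pairs mirrors how calibration curve data is usually kept separate from plotting, so a plotting helper could be layered on top.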
I could see this feature being implemented similarly to calibration curves - methods to produce the raw net benefit data and possibly additional functions to plot it.
I have an implementation that isn't public yet and I'm open to working on a PR to get this into sklearn if the community is interested. I'd also be open to a 'contrib' package.
Describe alternatives you've considered, if relevant
Some code exists in other languages to produce these curves and subsequent plots, but I'm not aware of a well-maintained Python implementation.
Additional context
(I would suggest reading citation 2 above for a more thorough explanation and examples - see the later examples for common questions)
Net benefit is primarily for assessing models that are intended to aid an individual in making a decision.
For example, given a model for the likelihood of rain later in the day: what level of certainty do you need to pack a jacket/umbrella? If you are risk averse, you may decide that even a 5% chance of rain is enough motivation. Or in contrast, another user may be risk tolerant and would need 75% chance of rain to warrant rain gear.
A net benefit curve accounts for both calibration and classification performance, helping users understand the value of using the model versus always acting a specific way. Given a net benefit curve and its comparison to naively "treating all" or "treating none":
Net benefit is in units of true positives and lies in the range (-inf, true positive rate]. Negative values indicate harm: at that threshold it is better to assume no positives (i.e. treat none) than to rely on the model.
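To illustrate the comparison against "treat all" and "treat none", here is a small sketch, again assuming the standard definition TP/n - (FP/n) * p_t / (1 - p_t). The helper names and toy data are hypothetical. "Treat none" always has a net benefit of 0, so a model with negative net benefit at a threshold is worse than not acting at all.

```python
from typing import Dict, Sequence


def net_benefit_of_decisions(
    y_true: Sequence[int], treat: Sequence[bool], threshold: float
) -> float:
    """Net benefit of an arbitrary treat / do-not-treat decision per sample."""
    n = len(y_true)
    tp = sum(1 for y, d in zip(y_true, treat) if d and y == 1)
    fp = sum(1 for y, d in zip(y_true, treat) if d and y == 0)
    # False positives are weighted by the odds of the threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def compare_strategies(
    y_true: Sequence[int], y_prob: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Net benefit of the model versus the two naive strategies at one threshold."""
    n = len(y_true)
    return {
        "model": net_benefit_of_decisions(y_true, [p >= threshold for p in y_prob], threshold),
        "treat_all": net_benefit_of_decisions(y_true, [True] * n, threshold),
        "treat_none": net_benefit_of_decisions(y_true, [False] * n, threshold),
    }


# Toy data (hypothetical): 8 samples, 3 of them positive.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]

for threshold in (0.5, 0.65):
    result = compare_strategies(y_true, y_prob, threshold)
    best = max(result, key=result.get)
    print(f"p_t={threshold:.2f}  {result}  best={best}")
```

On this toy data the model beats both naive strategies at p_t = 0.5, while at p_t = 0.65 its net benefit is negative and "treat none" is the better choice.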