ci.cvAUC needs 0.5 for ties #6
Comments
Thanks, @sgruber65, for pointing this out. Did you have a specific code fix in mind to resolve this? Is there an easy way to identify which rows, i, should be w1 * 0.5 instead of w1 * 1.0? If so, then perhaps we can add a line of code right after the one above, which corrects the weights. It's been a long time since I wrote this code, so it would take me a while to get familiar with it again, in order to dig in deeper.
Hi Erin,
The AUC calculation returned by the call to ROCR is correct — the only problem is the IC. It captures the formula in the 2015 paper, but that isn’t correct.
Here’s the IC function inside of the cvAUC function (v1.1.0 of the cvAUC package from CRAN)
.IC <- function(fold_preds, fold_labels, pos, neg, w1, w0) {
  n_rows <- length(fold_labels)
  n_pos <- sum(fold_labels == pos)
  n_neg <- n_rows - n_pos
  auc <- AUC(fold_preds, fold_labels)
  DT <- data.table(pred = fold_preds, label = fold_labels)
  DT <- DT[order(pred, -xtfrm(label))]
  DT[, `:=`(fracNegLabelsWithSmallerPreds, cumsum(label == neg)/n_neg)]
  DT <- DT[order(-pred, label)]
  DT[, `:=`(fracPosLabelsWithLargerPreds, cumsum(label == pos)/n_pos)]
  DT[, `:=`(icVal, ifelse(label == pos,
                          w1 * (fracNegLabelsWithSmallerPreds - auc),
                          w0 * (fracPosLabelsWithLargerPreds - auc)))]
  return(mean(DT$icVal^2))
}
We want to add 0.5 points for ties. Also notice that when there are ties, ordering the observations and using cumsum won’t work, since some negative observations with the same predicted value might be ranked both before and after positive observations with that value.
Here’s a version that works. Nothing else has to change.
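.ICv2 <- function(fold_preds, fold_labels, pos, neg, w1, w0) {
  n_rows <- length(fold_labels)
  n_pos <- sum(fold_labels == pos)
  n_neg <- n_rows - n_pos
  pos_rows <- fold_labels == pos
  neg_rows <- fold_labels == neg
  auc <- AUC(fold_preds, fold_labels)
  # Compare numeric predictions directly; apply() over mixed-type rows would coerce them.
  pos_preds <- fold_preds[pos_rows]
  neg_preds <- fold_preds[neg_rows]
  DT <- data.table(pred = fold_preds, label = fold_labels)
  # Positive rows: fraction of negatives with a smaller prediction, ties count 0.5.
  # The comparison is parenthesized because `+` binds tighter than `>` in R.
  DT[pos_rows, `:=`(icVal, w1 * (vapply(pos_preds, function(p) {
    sum((p > neg_preds) + 0.5 * (p == neg_preds))}, numeric(1)) / n_neg - auc))]
  # Negative rows: fraction of positives with a larger prediction, ties count 0.5.
  DT[neg_rows, `:=`(icVal, w0 * (vapply(neg_preds, function(p) {
    sum((p < pos_preds) + 0.5 * (p == pos_preds))}, numeric(1)) / n_pos - auc))]
  return(mean(DT$icVal^2))
}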
—Susan
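For anyone following along, the same tie-aware quantities can be sketched without the row-wise loop. This is only an illustration in base R, not code from the cvAUC package: `ic_vals_ties` is a hypothetical name, `auc` is passed in as an argument instead of being computed by `AUC()`, and the weights `w1 = w0 = 1` in the usage lines are placeholders.

```r
# Hypothetical, vectorized sketch of the tie-aware influence-curve values.
# Not part of cvAUC; auc is supplied by the caller to keep the block self-contained.
ic_vals_ties <- function(preds, labels, pos, neg, w1, w0, auc) {
  is_pos <- labels == pos
  is_neg <- labels == neg
  pos_preds <- preds[is_pos]
  neg_preds <- preds[is_neg]
  # score[i, j] is 1 if positive i outranks negative j, 0.5 if tied, 0 otherwise
  score <- outer(pos_preds, neg_preds, ">") +
    0.5 * outer(pos_preds, neg_preds, "==")
  ic <- numeric(length(preds))
  ic[is_pos] <- w1 * (rowMeans(score) - auc)  # fraction of negatives beaten
  ic[is_neg] <- w0 * (colMeans(score) - auc)  # fraction of positives that win
  ic
}

# Toy fold with ties at 0.5 and 0.8; its tie-aware AUC is 5.5 / 9
preds  <- c(0.2, 0.5, 0.5, 0.5, 0.8, 0.8)
labels <- c(0,   0,   1,   1,   0,   1)
ic <- ic_vals_ties(preds, labels, pos = 1, neg = 0, w1 = 1, w0 = 1, auc = 5.5 / 9)
mean_sq_ic <- mean(ic^2)
```

With unit weights, the values average to zero within the positive rows and within the negative rows, which is a quick sanity check that the 0.5 tie credit matches the AUC being centered on.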
Hi Erin,
When you get a chance, can you upload a new version to CRAN that uses the .IC function I defined in my earlier reply?
Thanks,
Susan
Hi @sgruber65, I am sorry for the delay on this. I was locked out of my berkeley.edu email, so I had to sort that out before being able to update the package (since this package uses my old email and you can't update a package w/o access). Thank you for providing the code! I think I can use the same code for the pooled version, as well. I have opened a PR here with some remaining tasks noted: #11
Thanks, Erin. And I agree, this should be the same for the pooled version.
Hi Erin,
The ROCR package's calculation of the AUC assigns 0.5 points for a tie. I was looking at your code for calculating the CIs, and saw that it ignores that possibility. Although people argue over strategies for dealing with ties, since the code is estimating the variance of the cv-AUC, as calculated by the ROCR package, it ought to respect the underlying calculation of the AUC.
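To make the tie convention concrete, here is a small base-R sketch that scores every (positive, negative) pair as 1, 0.5, or 0 and compares the result with the midrank (Mann-Whitney) form. It does not call ROCR; `pairwise_auc` and `rank_auc` are hypothetical helper names, and the toy data are made up for illustration.

```r
# Hypothetical helper (not from ROCR or cvAUC): AUC as the average score over
# all (positive, negative) pairs, where a tie contributes 0.5.
pairwise_auc <- function(preds, labels, pos = 1) {
  pos_preds <- preds[labels == pos]
  neg_preds <- preds[labels != pos]
  wins <- outer(pos_preds, neg_preds, ">")   # n_pos x n_neg logical matrix
  ties <- outer(pos_preds, neg_preds, "==")
  mean(wins + 0.5 * ties)
}

# Same quantity from midranks: rank() averages the ranks of tied values by default.
rank_auc <- function(preds, labels, pos = 1) {
  is_pos <- labels == pos
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  r <- rank(preds)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

preds  <- c(0.2, 0.5, 0.5, 0.5, 0.8, 0.8)
labels <- c(0,   0,   1,   1,   0,   1)
auc_pairs <- pairwise_auc(preds, labels)
auc_ranks <- rank_auc(preds, labels)
```

On this toy fold both forms give 5.5 / 9, since three of the nine pairs are ties and each contributes half a point.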
DT[, `:=`(icVal, ifelse(label == pos, w1 * (fracNegLabelsWithSmallerPreds - auc), w0 * (fracPosLabelsWithLargerPreds - auc)))]
For some positive observation, i, this line will assign w1 * 1 to each negLabel earlier in the ordering, when for some subset of those it should possibly be w1 * 0.5. Also, there may be one or more negLabel observations immediately after i in the ordering that should be counted as 0.5, instead of 0. (Of course, similar logic applies to the negative label calculations.)
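A toy fold shows the effect. The sketch below mirrors the `order(pred, -xtfrm(label))` plus `cumsum` step in base R (no data.table) and compares it with a tie-aware count for the positive observations. It assumes numeric 0/1 labels with `pos = 1`; the variable names are illustrative only.

```r
preds  <- c(0.2, 0.5, 0.5, 0.5, 0.8, 0.8)
labels <- c(0,   0,   1,   1,   0,   1)
pos <- 1
neg <- 0
n_neg <- sum(labels == neg)

# Ordering as in the existing code: ascending prediction and, with 0/1 labels,
# positives placed before negatives inside each group of tied predictions.
ord <- order(preds, -xtfrm(labels))
frac_cumsum <- numeric(length(preds))
frac_cumsum[ord] <- cumsum(labels[ord] == neg) / n_neg

# Tie-aware alternative: each tied negative contributes 0.5 instead of 0.
neg_preds <- preds[labels == neg]
frac_ties <- vapply(preds[labels == pos], function(p) {
  sum((p > neg_preds) + 0.5 * (p == neg_preds)) / n_neg
}, numeric(1))

cumsum_pos <- frac_cumsum[labels == pos]
```

Here the cumsum-based fractions for the three positives are 1/3, 1/3, and 2/3, while the tie-aware fractions are 1/2, 1/2, and 5/6. Only the tie-aware values average to the AUC of 5.5 / 9 for this fold.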
--Susan Gruber