ci.cvAUC needs 0.5 for ties #6

sgruber65 · 2020-09-09T01:47:34Z

Hi Erin,
The ROCR package's calculation of the AUC assigns 0.5 points for a tie. I was looking at your code for calculating the CIs, and saw that it ignores that possibility. Although people argue over strategies for dealing with ties, since the code is estimating the variance of the cv-AUC, as calculated by the ROCR package, it ought to respect the underlying calculation of the AUC.

DT[, `:=`(icVal, ifelse(label == pos, w1 * (fracNegLabelsWithSmallerPreds - auc), w0 * (fracPosLabelsWithLargerPreds - auc)))]

For some positive observation, i, this line will assign w1 * 1 to each negLabel earlier in the ordering, when for some subset of those it should possibly be w1 * 0.5. Also, there may be one or more negLabel observations immediately after i in the ordering that should be counted as 0.5, instead of 0. (Of course, similar logic applies to the negative label calculations.)
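To make the half-credit convention concrete, here is a minimal base-R sketch. The helper name `auc_ties` and the toy data are illustrative assumptions, not part of cvAUC or ROCR. It computes the AUC as the average score over all positive/negative pairs, giving 1 when the positive observation has the larger prediction and 0.5 when the two predictions are tied.

```r
# Pairwise (Mann-Whitney style) AUC with half credit for ties.
# `auc_ties` is a hypothetical helper used only for illustration.
auc_ties <- function(preds, labels, pos = 1) {
  p <- preds[labels == pos]   # predictions for positive observations
  n <- preds[labels != pos]   # predictions for negative observations
  # outer() builds the length(p) x length(n) grid of all pos/neg pairs
  wins <- outer(p, n, ">")    # positive scored strictly higher
  ties <- outer(p, n, "==")   # positive and negative scored the same
  mean(wins + 0.5 * ties)
}

preds  <- c(0.2, 0.5, 0.5, 0.8)
labels <- c(0,   0,   1,   1)

auc_ties(preds, labels)
# 0.875: three clear wins plus one tie worth 0.5, out of four pairs
```

On this toy data, counting the tied pair as 0 would give 0.75 and counting it as 1 would give 1.0, so a variance estimate built on either of those pair scores is centered on a different quantity than the reported AUC.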

--Susan Gruber


ledell · 2021-01-18T06:38:40Z

Thanks, @sgruber65, for pointing this out. Did you have a specific code fix in mind to resolve this?

Is there an easy way to identify which rows, i, should be w1 * 0.5 instead of w1 * 1.0? If so, then perhaps we can add a line of code right after the one above, which corrects the weights. It's been a long time since I wrote this code, so it would take me a while to get familiar with it again, in order to dig in deeper.
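One possible way to flag the affected rows, sketched in base R under the assumption that a row only matters here if its predicted value also occurs in the opposite class (ties within a class never form a positive/negative pair). The helper name `cross_class_ties` is hypothetical and not part of cvAUC.

```r
# Flag observations whose predicted value also occurs in the opposite class.
# `cross_class_ties` is a hypothetical helper, not part of cvAUC.
cross_class_ties <- function(fold_preds, fold_labels, pos) {
  is_pos <- fold_labels == pos
  # prediction values shared by at least one positive and one negative
  shared <- intersect(fold_preds[is_pos], fold_preds[!is_pos])
  fold_preds %in% shared
}

fold_preds  <- c(0.2, 0.5, 0.5, 0.8, 0.8)
fold_labels <- c(0,   0,   1,   1,   1)

cross_class_ties(fold_preds, fold_labels, pos = 1)
# FALSE  TRUE  TRUE FALSE FALSE
```

This is only a diagnostic. Each row's value is an average over all observations in the opposite class, so it is the individual tied pairs, rather than whole rows, that would receive 0.5.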

sgruber65 · 2021-01-19T20:53:27Z

Hi Erin,

The AUC calculation returned by the call to ROCR is correct; the only problem is the IC. It captures the formula in the 2015 paper, but that isn't correct. Here's the IC function inside of the cvAUC function (v1.1.0 of the cvAUC package from CRAN):

    .IC <- function(fold_preds, fold_labels, pos, neg, w1, w0) {
      n_rows <- length(fold_labels)
      n_pos <- sum(fold_labels == pos)
      n_neg <- n_rows - n_pos
      auc <- AUC(fold_preds, fold_labels)
      DT <- data.table(pred = fold_preds, label = fold_labels)
      DT <- DT[order(pred, -xtfrm(label))]
      DT[, `:=`(fracNegLabelsWithSmallerPreds, cumsum(label == neg)/n_neg)]
      DT <- DT[order(-pred, label)]
      DT[, `:=`(fracPosLabelsWithLargerPreds, cumsum(label == pos)/n_pos)]
      DT[, `:=`(icVal, ifelse(label == pos, w1 * (fracNegLabelsWithSmallerPreds - auc), w0 * (fracPosLabelsWithLargerPreds - auc)))]
      return(mean(DT$icVal^2))
    }

We want to add 0.5 points for ties. Also notice that when there are ties, ordering the observations and using cumsum won't work, since some negative observations with the same predicted value might be ranked both before and after positive observations with that value. Here's a version that works. Nothing else has to change.

    .ICv2 <- function(fold_preds, fold_labels, pos, neg, w1, w0) {
      n_rows <- length(fold_labels)
      n_pos <- sum(fold_labels == pos)
      n_neg <- n_rows - n_pos
      pos_rows <- fold_labels == pos
      neg_rows <- fold_labels == neg
      auc <- AUC(fold_preds, fold_labels)
      DT <- data.table(pred = fold_preds, label = fold_labels)
      # Each comparison is parenthesized because `+` binds tighter than `>` in R:
      # 1 point per strictly lower-scored negative, 0.5 per tied negative.
      DT[pos_rows, `:=`(icVal, apply(DT[pos_rows,], 1, function(x){
        sum((x["pred"] > DT[neg_rows, pred]) + .5*(x["pred"] == DT[neg_rows, pred]))})/n_neg * w1 - auc*w1)]
      # Mirror image for negatives: 1 point per strictly higher-scored positive, 0.5 per tie.
      DT[neg_rows, `:=`(icVal, apply(DT[neg_rows,], 1, function(x){
        sum((x["pred"] < DT[pos_rows, pred]) + .5*(x["pred"] == DT[pos_rows, pred]))})/n_pos * w0 - auc*w0)]
      return(mean(DT$icVal^2))
    }

--Susan
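For comparison, here is a hedged base-R sketch of the same tie-aware calculation written without `apply()`. The name `ic_ties` is hypothetical, and the AUC is recomputed from the pairwise scores instead of calling `AUC()`; the two should agree if `AUC()` gives ties half credit, as described in this thread. Two R details motivate the rewrite. Arithmetic operators bind more tightly than comparisons, so an expression of the form `a > b + 0.5 * (a == b)` is read as `a > (b + 0.5 * (a == b))`. And `apply()` first converts its input to a matrix, so when `label` is a factor or character column, `pred` may reach the inner function as a string. Building the two comparisons separately with `outer()` on numeric vectors avoids both.

```r
# Tie-aware per-fold influence-curve term, base R only.
# `ic_ties` is a hypothetical name; w1 and w0 are supplied by the caller,
# as in the .IC function quoted above.
ic_ties <- function(fold_preds, fold_labels, pos, neg, w1, w0) {
  pos_preds <- fold_preds[fold_labels == pos]
  neg_preds <- fold_preds[fold_labels == neg]
  # score[i, j] = 1 if positive i outranks negative j, 0.5 if tied, else 0
  score <- outer(pos_preds, neg_preds, ">") +
    0.5 * outer(pos_preds, neg_preds, "==")
  auc <- mean(score)                      # tie-aware AUC for this fold
  ic_pos <- w1 * (rowMeans(score) - auc)  # one value per positive observation
  ic_neg <- w0 * (colMeans(score) - auc)  # one value per negative observation
  mean(c(ic_pos, ic_neg)^2)
}

# Toy fold with one cross-class tie at 0.5; the weights are placeholders.
fold_preds  <- c(0.2, 0.5, 0.5, 0.8)
fold_labels <- c(0,   0,   1,   1)
ic_ties(fold_preds, fold_labels, pos = 1, neg = 0, w1 = 2, w0 = 2)
# 0.0625
```

The `outer()` calls allocate an n_pos by n_neg matrix, which is fine for typical fold sizes but could be replaced by a rank-based calculation for very large folds.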


sgruber65 · 2021-03-04T17:24:49Z

Hi Erin, when you get a chance, can you upload a new version to CRAN that uses the .IC function I defined in my January 19 comment? Thanks, Susan


ledell · 2021-04-27T22:06:14Z

Hi @sgruber65, I am sorry for the delay on this. I was locked out of my berkeley.edu email, so I had to sort that out before I could update the package (the package uses my old email, and you can't update a package without access to it).

Thank you for providing the code! I think I can use the same code for the pooled version, as well.

I have opened a PR here with some remaining tasks noted: #11

sgruber65 · 2021-05-06T18:58:19Z

Thanks, Erin. And I agree, this should be the same for the pooled version.

ledell self-assigned this Apr 27, 2021

ledell linked a pull request Apr 27, 2021 that will close this issue

ci.cvAUC needs 0.5 for ties #11

Open

3 tasks

ledell mentioned this issue May 29, 2021

Observation weights? #10

Open
