What does "direction" mean in roc function #125

Open
Ivy-ops opened this issue Feb 7, 2024 · 3 comments
Labels
doc more-info-needed The problem is unclear and the issue needs more information

Comments


Ivy-ops commented Feb 7, 2024

Hi developer,
I am trying to use the roc() function with my dataset. After reading the description of the "direction" argument, I still cannot understand what it means. I would highly appreciate your help with this:
I use a random forest and get the class probabilities for each sample (shown below); the second column is for the "Case" group.
My dataset rf$predictions:
Control Case
[1,] 0.24642643 0.7535736
[2,] 0.33507026 0.6649297
[3,] 0.45731121 0.5426888
[4,] 0.46547831 0.5345217
[5,] 0.53042247 0.4695775
[6,] 0.31020475 0.6897952
[7,] 0.15786178 0.8421382
[8,] 0.15340136 0.8465986
[9,] 0.15774135 0.8422587
[10,] 0.18421489 0.8157851
[11,] 0.64663338 0.3533666
[12,] 0.40697185 0.5930282
[13,] 0.37198661 0.6280134
[14,] 0.57076432 0.4292357
[15,] 0.18086131 0.8191387
[16,] 0.58201416 0.4179858
[17,] 0.19227444 0.8077256
[18,] 0.46165459 0.5383454
[19,] 0.19301864 0.8069814
[20,] 0.66767106 0.3323289
[21,] 0.80801017 0.1919898
[22,] 0.66952125 0.3304788
[23,] 0.62995097 0.3700490
[24,] 0.50042121 0.4995788
[25,] 0.77477208 0.2252279
[26,] 0.60949394 0.3905061
[27,] 0.82625698 0.1737430
[28,] 0.65935287 0.3406471
[29,] 0.07350427 0.9264957
[30,] 0.72550278 0.2744972
[31,] 0.72104726 0.2789527
[32,] 0.65799964 0.3420004
[33,] 0.70231445 0.2976856
[34,] 0.32174162 0.6782584
[35,] 0.86845567 0.1315443
[36,] 0.50935250 0.4906475
[37,] 0.44772867 0.5522713
[38,] 0.78675787 0.2132421

actual
[1] Case Case Case Case Case Case Case Case Case Case Case Case Case Case Case Case Case Case Case Control
[21] Control Control Control Control Control Control Control Control Control Control Control Control Control Control Control Control Control Control
Levels: Case Control

table(actual, predict)
         predict
actual    Case Control
  Case      15       4
  Control    3      16

Then I use roc function:

pROC::roc(actual, rf$predictions[,2], levels = c('Case','Control'), plot=T, direction = '>')
Call:
roc.default(response = actual, predictor = rf$predictions[, 2], levels = c("Case", "Control"), direction = ">", plot = T)
Data: rf$predictions[, 2] in 19 controls (actual Case) > 19 cases (actual Control).
Area under the curve: 0.8726
pROC::roc(actual, rf$predictions[,2], levels = c('Case','Control'), plot=T, direction = '<')
Call:
roc.default(response = actual, predictor = rf$predictions[, 2], levels = c("Case", "Control"), direction = "<", plot = T)
Data: rf$predictions[, 2] in 19 controls (actual Case) < 19 cases (actual Control).
Area under the curve: 0.1274

As the code above shows, I get two different AUCs depending on direction. I referred to the roc() documentation and https://stackoverflow.com/questions/31756682/what-does-coercing-the-direction-argument-input-in-roc-function-package-proc, which say that direction determines whether the probability is compared to the threshold with < or >.

Does direction mean the following: for the 1st sample, if I use threshold = 0.5 and direction ">", then 0.7535736 > 0.5, so sample 1 will be predicted as "Case"? And if I use threshold = 0.5 and direction "<", what does direction mean? I am quite confused. When should I use ">" and when "<"?
Looking forward to your help! Much appreciated!
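
One way to see why the two calls return 0.8726 and 0.1274 is to compute the AUC by hand as the share of (control, case) pairs ordered the way direction states. The sketch below uses base R only and the Case-column probabilities posted above; `pair_auc` is a hypothetical helper for illustration, not part of pROC and not how pROC computes the AUC internally.

```r
# Case-column probabilities from the post, split by the true label.
actual_case <- c(0.7535736, 0.6649297, 0.5426888, 0.5345217, 0.4695775,
                 0.6897952, 0.8421382, 0.8465986, 0.8422587, 0.8157851,
                 0.3533666, 0.5930282, 0.6280134, 0.4292357, 0.8191387,
                 0.4179858, 0.8077256, 0.5383454, 0.8069814)
actual_control <- c(0.3323289, 0.1919898, 0.3304788, 0.3700490, 0.4995788,
                    0.2252279, 0.3905061, 0.1737430, 0.3406471, 0.9264957,
                    0.2744972, 0.2789527, 0.3420004, 0.2976856, 0.6782584,
                    0.4906475, 0.5522713, 0.2132421, 0.1315443)

# Hypothetical helper (not part of pROC): share of (control, case) pairs whose
# order agrees with `direction`; tied pairs count as half.
pair_auc <- function(controls, cases, direction) {
  diffs <- outer(controls, cases, "-")  # controls[i] - cases[j]
  agree <- if (direction == ">") diffs > 0 else diffs < 0
  mean(agree + 0.5 * (diffs == 0))
}

# levels = c('Case', 'Control') makes the actual Case group the "controls"
# and the actual Control group the "cases", as the roc() output reports.
auc_gt <- pair_auc(actual_case, actual_control, ">")
auc_lt <- pair_auc(actual_case, actual_control, "<")
print(round(c(auc_gt, auc_lt), 4))  # 0.8726 0.1274
```

The two values sum to 1: flipping direction mirrors the curve, so an AUC far below 0.5 usually signals that direction (or the order of levels) is the opposite of what the data show.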

Owner

xrobin commented Feb 9, 2024

Thanks for your report.

I'm not sure what's unclear exactly. What do you suggest should be clarified precisely, and can you maybe make some suggestions of better ways to explain that?

@xrobin xrobin added the more-info-needed The problem is unclear and the issue needs more information label Feb 9, 2024
Author

Ivy-ops commented Feb 9, 2024

Hi @xrobin , Thanks for the reply.
Based on the tutorial:

“>”: if the predictor values for the control group are higher than the values of the case group (controls > t >= cases)
“<”: if the predictor values for the control group are lower or equal than the values of the case group (controls < t <= cases).

In my case:
Does direction mean: for the 1st sample (predicted probability for Control = 0.24642643; Case = 0.7535736), if I use threshold = 0.5 and direction ">", then 0.7535736 > 0.5, so sample 1 will be predicted as "Case"?
And if I use threshold = 0.5 and direction "<", what does direction mean?
Thank you for your patience!
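
The tutorial wording quoted above can be applied to sample 1 directly. The sketch below is a minimal base R illustration, assuming the rule reads as "case when x <= t" for ">" and "case when x >= t" for "<"; `label_at_threshold` is a hypothetical helper, not a pROC function, and the threshold 0.5 is only the value used in the question (roc() itself evaluates many thresholds).

```r
# Hypothetical helper, not a pROC function: label one predictor value x at
# threshold t. `levels` is c(control, case), in the same order as in roc().
label_at_threshold <- function(x, t, direction, levels) {
  # ">": controls > t >= cases, so x is a case when x <= t
  # "<": controls < t <= cases, so x is a case when x >= t
  is_case <- if (direction == ">") x <= t else x >= t
  ifelse(is_case, levels[2], levels[1])
}

lv <- c("Case", "Control")  # as in the posted call: "Control" is the case level
x1 <- 0.7535736             # sample 1, Case column

print(label_at_threshold(x1, 0.5, ">", lv))  # "Case"
print(label_at_threshold(x1, 0.5, "<", lv))  # "Control"
```

Note that with levels = c('Case', 'Control') the actual "Case" group plays the control role, which is why ">" is the direction that matches these data.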

@xrobin xrobin added the doc label Mar 5, 2024
xrobin added a commit that referenced this issue Mar 5, 2024
Owner

xrobin commented Mar 5, 2024

I attempted to clarify the documentation. Here is the new description of direction:

how are positive observations defined? “<”: observations are positive when they are greater than or equal (>=) to the threshold. “>”: observations are positive when they are smaller than or equal (<=) to the threshold. “auto” (default): automatically detect in which group the median is higher and take the direction accordingly. See details. You should set this explicitly to “>” or “<” whenever you are resampling or randomizing the data, otherwise the curves will be biased towards higher AUC values.

Is it clearer like this?
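
The “auto” rule in the description above can be sketched as a comparison of group medians. This is a rough base R illustration, not pROC's code: `auto_direction` is a hypothetical helper, the tie-breaking toward "<" is an assumption, and the values are a few Case-column probabilities taken from the original post.

```r
# Hypothetical helper, not pROC's code: choose a direction from group medians.
# Assumption: equal medians resolve to "<".
auto_direction <- function(controls, cases) {
  if (median(controls) > median(cases)) ">" else "<"
}

# A few Case-column probabilities from the post, grouped by true label.
true_case    <- c(0.7535736, 0.6649297, 0.5426888, 0.5345217, 0.4695775)
true_control <- c(0.3323289, 0.1919898, 0.3304788, 0.3700490, 0.4995788)

# levels = c("Case", "Control"): the true Case group plays the control role.
dir_as_posted <- auto_direction(controls = true_case, cases = true_control)
# levels = c("Control", "Case"): the true Control group plays the control role.
dir_swapped <- auto_direction(controls = true_control, cases = true_case)

print(c(dir_as_posted, dir_swapped))  # ">" "<"
```

Under these assumptions, the direction that fits the data depends on which level is listed first: with the Case-column probability as predictor, listing "Control" first pairs with "<", while the order used in the post pairs with ">".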
