ASReview Knee Criterion Help #1336
Dear all,

First of all, thank you for this wonderful resource and the many helpful insights in the discussion forum. Our team (with @gspadaro90) is having trouble determining a stopping criterion for our current systematic search using ASReview. After some reading, we concluded that the knee criterion (Cormack & Grossman, 2016) might be robust and suitable for our purposes, and decided to use it. With the help of @MaxvanHaastrecht, we managed to use his Python script to calculate this stopping criterion from the Progress Analytics .csv file exported by ASReview. However, we found the results a bit puzzling, and we wonder whether we are making a mistake in interpreting the output or in implementing the script.

For reference, we have 10703 records in total, we have labeled 2560 of them (approaching 25%), and we have found 914 relevant records so far. From our understanding of the script, the only values we should set are k = 10 and rho (slope ratio) = 6, as suggested by Cormack and Grossman (2016). The output from the script indicates the following:

We stop at s=94 since the slope from 0 to i=1 is 9.3 times higher than the intermediate slope between i and s.

Our main confusion is about the interpretation of s=94. Based on the updated script, s is an evaluation point. Correct me if I am wrong, but I assume that it corresponds to a number of labeled records (in this case 94), some of which are relevant. If that is correct, we do not understand why the knee criterion gives us such a low value, given that we have already found 914 relevant records, far more than anything under 100.

I have also attached the .csv file we are currently using: I am not sure how best to share the accompanying script, which is based on the original from @MaxvanHaastrecht. From what I understand, it is easiest to do this through a repository, so I have linked that here:

Any suggestions or help would be highly appreciated!

Tycho
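To make the quantities in that output concrete, here is a minimal sketch of the slope-ratio check as we understand it from Cormack and Grossman (2016). It is not the script from @MaxvanHaastrecht. The function names `find_knee` and `slope_ratio` are made up for illustration, the toy series is invented, and we assume that `i` and `s` are 1-based counts of labeled records and that the input is a cumulative count of relevant records.

```python
def find_knee(cum_relevant, s):
    """Return the 1-based index i < s that lies furthest above the straight
    line from (0, 0) to (s, cum_relevant[s - 1])."""
    rel_s = cum_relevant[s - 1]
    best_i, best_gap = 1, float("-inf")
    for i in range(1, s):
        # Proportional to the signed perpendicular distance from the chord;
        # the constant denominator is dropped because only the argmax matters.
        gap = s * cum_relevant[i - 1] - rel_s * i
        if gap > best_gap:
            best_i, best_gap = i, gap
    return best_i


def slope_ratio(cum_relevant, i, s):
    """Slope from 0 to i divided by the slope from i to s.

    The +1 in the second slope is the smoothing term we believe the paper
    uses to avoid dividing by zero when nothing relevant is found after i.
    """
    rel_i = cum_relevant[i - 1]
    rel_s = cum_relevant[s - 1]
    slope_before = rel_i / i
    slope_after = (rel_s - rel_i + 1) / (s - i)
    return slope_before / slope_after


# Invented toy data: relevant records found after 1, 2, ..., 10 labels.
cum_relevant = [1, 2, 3, 4, 5, 5, 5, 5, 5, 5]
s = len(cum_relevant)
i = find_knee(cum_relevant, s)
rho = slope_ratio(cum_relevant, i, s)
should_stop = rho >= 6  # 6 is the threshold value we set for rho
print(f"s={s}, knee i={i}, slope ratio={rho:.1f}, stop={should_stop}")
```

Under these assumptions, a knee at i=1 means the curve is at its steepest relative to the chord right after the first labeled record, which is what makes our result look suspicious.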
Replies: 1 comment 2 replies
Hey @tychovt99. I checked your CSV and I think the problem is in the way the 'Relevant records' column is defined.

First of all, I should note that I am not familiar with how the current ASReview app exports statistics. However, when I wrote the script I assumed that the 'Relevant records' column would contain a progressively increasing series indicating how many records had been marked as relevant up to and including the index currently being considered. I used this CSV as my reference point at that time: https://github.com/MaxvanHaastrecht/ASReview-Knee-Method/blob/main/ASReviewLABprogressRecall.2.csv. The 'Relevant by ASReview LAB' column in that CSV corresponds to what in your case would be the 'Relevant records' column.

By 'progressively increasing series' I mean a series of numbers that always either increases or stays constant. That is not what happens in the 'Relevant records' column of your CSV. I think your CSV uses a slightly different definition to produce the numbers in that column. This is just a guess, but I think your column corresponds to something like 'Number of records in the last 10 that were relevant'. That yields a series that will sometimes increase and sometimes decrease, which means it is not suitable (in that form) for the knee algorithm.

Again, the CSV may simply be the output that ASReview produces. I'm not fully up to date on the current ASReview implementation. But it is not quite the CSV you need.

To help you along the way, attached is your CSV with an added column 'Relevant records knee', which contains a version of your series adapted for the knee algorithm. I have assumed the definition of your original column is 'Number of records in the last 10 that were relevant'. If that is not the original definition, my CSV will still be incorrect. If it is, then I think you should be able to calculate the knee criterion using this new column and get a more logical result.

In any case, good luck!
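In case it helps, here is a minimal sketch of how such a cumulative column could be rebuilt. It rests entirely on my guess that each row holds 'Number of records in the last 10 that were relevant' with one row per labeled record. If the export is defined differently, this will not give the right numbers. The function names are made up for illustration, and the example uses a window of 3 and invented values to keep it short.

```python
from itertools import accumulate


def is_non_decreasing(series):
    """True if every value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(series, series[1:]))


def rolling_to_cumulative(rolling_counts, window=10):
    """Rebuild a cumulative relevant count from a rolling-window count.

    Assumes rolling_counts[t] is the number of relevant records among the
    last `window` labeled records, with one entry per labeled record.
    """
    labels = []   # recovered 0/1 label for each record
    previous = 0  # rolling count at the previous record
    for t, current in enumerate(rolling_counts):
        # Label of the record that just dropped out of the window, if any.
        dropped = labels[t - window] if t >= window else 0
        label = current - previous + dropped
        if label not in (0, 1):
            raise ValueError(
                f"row {t}: series is not consistent with a rolling count "
                f"over a window of {window}"
            )
        labels.append(label)
        previous = current
    return list(accumulate(labels))


# Invented example with a window of 3 records.
rolling = [1, 1, 2, 2, 2, 1, 1]
cumulative = rolling_to_cumulative(rolling, window=3)
print(is_non_decreasing(rolling))     # the rolling series goes up and down
print(cumulative)
print(is_non_decreasing(cumulative))  # the rebuilt series never decreases
```

The `ValueError` is there on purpose: if it triggers on your data, that is a sign my guess about the column definition is wrong. Running `is_non_decreasing` on whichever column you feed into the knee script is a quick sanity check before trusting the output.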