Dear @mruderman, regardless of the de-duplication method, there is always a risk of starting off with some duplicates in your set. Having removed more than 3,000 and still finding 97 duplicates is not strange. Depending on whether they are relevant, you risk seeing them twice if you mark them relevant. Marking the first of the two as irrelevant will send its duplicate to the end of the list.
After completing about a third of our review, we realized that our external de-duplication software (Covidence) removed 3,343 duplicates but apparently missed about 97 duplicates from a set of 1,753 identified using Zotero.
The question is whether we should:
Run ASReview datatools de-duplication mid-way through (replacing the RIS file in the .asreview directory with the further de-duplicated one exported from Zotero)? Some potential issues come to mind, e.g. whether having multiple labels for relevant or irrelevant will confuse ASReview, how it interacts with the RIS file, etc.
Simply start over with the final deduplicated set?
Continue with the original dataset containing the 97 duplicates (it is unknown whether they are randomly distributed or not)?
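For context on what the mid-way de-duplication step would involve, here is a minimal sketch of DOI/title-based de-duplication with pandas. This is not ASReview's or Zotero's actual algorithm, just an illustration; the column names ("title", "doi") and the normalization rules are assumptions you would adapt to your RIS export.

```python
# Hypothetical de-duplication sketch: drop exact DOI matches, then
# near-identical titles after normalization. Keeps the first occurrence.
import pandas as pd

def deduplicate(records: pd.DataFrame) -> pd.DataFrame:
    df = records.copy()
    # Normalize keys: lowercase, collapse non-word characters to spaces.
    df["_title"] = (
        df["title"].str.lower().str.replace(r"\W+", " ", regex=True).str.strip()
    )
    df["_doi"] = df["doi"].str.lower().str.strip()
    df = df.drop_duplicates(subset="_doi")
    df = df.drop_duplicates(subset="_title")
    return df.drop(columns=["_title", "_doi"])

# Illustration data (made up): rows 0 and 1 are the same paper with
# different DOIs, which is why a DOI-only check would miss them.
records = pd.DataFrame({
    "title": ["A Study of X", "a study of  x!", "Another Paper"],
    "doi": ["10.1/x", "10.9/y", "10.2/z"],
})
deduped = deduplicate(records)
print(len(deduped))  # 2
```

The two drop_duplicates passes are deliberately separate: DOI matching is precise but misses records whose DOIs differ or are missing, while normalized-title matching catches those at the cost of occasional false positives.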
I know @boer0107 (or possibly @boer0107-zz) previously suggested that the risks are unknown and depend on the randomness of the data; however, that was in 2021. Does anyone have additional insights? Or perhaps a method to test whether the duplicated articles are randomly distributed?
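On the last question, one rough way to test this yourself is a Monte Carlo permutation test: compare the observed mean list-position of the duplicates against the distribution of means you get when the same number of duplicates is placed uniformly at random. A sketch, using only the standard library; the duplicate positions below are made-up illustration data, only the counts (97 of 1,753) come from this thread.

```python
# Permutation test: is the average position of the duplicates consistent
# with them being scattered uniformly through the record list?
import random

def mean_position_pvalue(dup_positions, n_records, n_sims=10_000, seed=0):
    rng = random.Random(seed)
    observed = sum(dup_positions) / len(dup_positions)
    expected = (n_records - 1) / 2  # mean position under uniform placement
    obs_dev = abs(observed - expected)
    hits = 0
    for _ in range(n_sims):
        sample = rng.sample(range(n_records), len(dup_positions))
        if abs(sum(sample) / len(sample) - expected) >= obs_dev:
            hits += 1
    return hits / n_sims  # small p-value suggests clustering, not randomness

# Illustration: all 97 duplicates bunched at the start of the list.
clustered = list(range(97))
print(mean_position_pvalue(clustered, 1753))  # near 0.0
```

A caveat: this only detects clustering by position. Duplicates could be non-random in other ways (e.g. concentrated in one source database) that a positional test will not see, so it is a sanity check rather than proof of randomness.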
Any thoughts are appreciated. Thank you.
@eltrank