Implications of running deduplication function on ASReview already in process #1578

mruderman · 2023-11-08T23:43:55Z

mruderman
Nov 8, 2023

After completing about a third of our review, we realized that our external de-duplication software (Covidence) removed 3,343 duplicates, but it apparently missed about 97 duplicates from a set of 1753 identified using Zotero.

The question is whether we should:

Run datatools ASReview deduplication mid-way through (replace the RIS in the .asreview directory with the further deduplicated one exported from Zoters)? Potentially some issues top of mind like whether having multpl labels for relevant or irrelevant will bother ASReview, how it interacts with the RIS, etc.?
Simply start over with the final deduplicated set?
Continue on with the original dataset containing 97 duplicates (unknown whether random or not)?

I know @boer0107 (or possibly @boer0107-zz) previously suggested the risks are unknown and depend on the randomness of the data; however, this was in 2021. Does anyone have any additional insights? Or perhaps a method to test whether the duplicated articles were at random?

Any thoughts are appreciated. Thank you.

@eltrank

FelixWdm · 2023-11-13T15:20:18Z

FelixWdm
Nov 13, 2023
Collaborator

Dear @mruderman, regardless of the deduplication method, there's always the risk of starting of with some duplicates in your set. Having removed more than 3000 and still seeing 97 duplicates doesn't seem strange. Depending on whether they are relevant or not, you risk seeing them twice if you mark them relevant. Marking them (the first of the two) as irrelevant will send the duplicate to the end of the list.
I would advice not to also use the ASReview Datatools, since the deduplication is very generic and will very likely not find anything additionally once you've used Covidence's deduplication.
I hope this helps, if not, let us know. Best Felix

2 replies

mruderman Nov 13, 2023
Author

Thank you for your answer, I really appreciate it. To clarify, would you also advise to avoid Zotero's deduplication tool? (Zotero's de-duplication add-in is the one that found the remaining 97 duplicates). It sounds like you're recommending to continue on with the original dataset that includes the 97 duplicates; just wanting to confirm.

FelixWdm Nov 17, 2023
Collaborator

@mruderman aplogies for the delay. I don't know all the pro's and con's of the Zotero deduplication but in general it's slow but reliable. My main concern was that the time spent adjusting the process was to high compared to the risks/benefits of removing the duplicates. Screening with duplicates is not ideal, but in the end you'll end up with the relevant papers. Hope this helps.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implications of running deduplication function on ASReview already in process #1578

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 2 replies

{{title}}

{{title}}

{{title}}

Select a reply

Implications of running deduplication function on ASReview already in process #1578

mruderman Nov 8, 2023

Replies: 1 comment · 2 replies

FelixWdm Nov 13, 2023 Collaborator

mruderman Nov 13, 2023 Author

FelixWdm Nov 17, 2023 Collaborator

mruderman
Nov 8, 2023

Replies: 1 comment 2 replies

FelixWdm
Nov 13, 2023
Collaborator

mruderman Nov 13, 2023
Author

FelixWdm Nov 17, 2023
Collaborator