How to collaborate with ASReview? #975
Replies: 9 comments · 10 replies
-
Both researchers install the software locally, upload the same data and precisely the same prior knowledge (e.g., ten relevant + ten irrelevant), train the same model, and independently start screening for relevance. After both researchers are done screening (either because the time is up or because the model presents only irrelevant records), the results, containing the labeling decisions and the ranking of the unseen records, are exported as a RIS, CSV, or XLSX file. Both files can be merged in, for example, Excel or R. Now you can compute the similarity in decisions, just like with a classical PRISMA-based review. The only difference is that there might be records seen by only one of the two researchers. My advice would be to discuss such papers with the team and make a joint decision.
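If you prefer to script this step rather than do it in Excel, here is a minimal R sketch. It assumes both exports are CSV files with a record identifier column (here called `record_id`) and a label column (here called `included`, with 1 = relevant, 0 = irrelevant, and empty for unseen records); the actual column names depend on your ASReview version and export format, so adjust them accordingly.

```r
# Minimal sketch: merge two ASReview exports and compare labeling decisions.
# Column names `record_id` and `included` are assumptions; adjust to your export.
a <- read.csv("screener_A.csv")
b <- read.csv("screener_B.csv")

m <- merge(a[, c("record_id", "included")],
           b[, c("record_id", "included")],
           by = "record_id", suffixes = c("_A", "_B"))

# Agreement on the records that both screeners labeled
both <- subset(m, !is.na(included_A) & !is.na(included_B))
cat("Agreement on jointly screened records:",
    round(mean(both$included_A == both$included_B), 2), "\n")

# Records seen by only one of the two screeners: discuss these with the team
only_a <- subset(m, !is.na(included_A) & is.na(included_B))
only_b <- subset(m, is.na(included_A) & !is.na(included_B))
cat("Seen only by A:", nrow(only_a), "- seen only by B:", nrow(only_b), "\n")
```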
-
Both researchers use the same data but different settings: for example, different prior knowledge, so that you can check whether different starting values result in the same set of relevant papers. A similar procedure can be applied by choosing different models (many different feature extraction techniques and classifiers are available).
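If you want a quick way to check how much the two runs converge on the same set of relevant papers, a rough R sketch could look like this (again assuming `record_id` and `included` columns in the exports; adjust to your own files):

```r
# Rough sketch: compare the relevant records found in two runs that used
# different prior knowledge or different models. Column names are assumptions.
run1 <- read.csv("run_prior_set_1.csv")
run2 <- read.csv("run_prior_set_2.csv")

rel1 <- run1$record_id[run1$included %in% 1]
rel2 <- run2$record_id[run2$included %in% 1]

overlap <- length(intersect(rel1, rel2))
jaccard <- overlap / length(union(rel1, rel2))
cat("Relevant in run 1:", length(rel1),
    "- relevant in run 2:", length(rel2),
    "- overlap:", overlap,
    "- Jaccard similarity:", round(jaccard, 2), "\n")
```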
-
Sequential screening: Researcher A starts screening, and Researcher B takes over when A's screening time is up. When B is done, A takes over again (this is especially interesting if you collaborate with researchers in different time zones, so screening can continue 24 hours per day!). You can export the project file and share it with a colleague, who can import the file into ASReview to continue screening where the first researcher stopped (officially supported). Alternatively, ASReview can be put on a server (not officially supported, but successfully implemented by some users).
-
Researcher A starts screening until only irrelevant records are presented. Researcher B imports the same dataset but uses different prior knowledge: all relevant records found by Researcher A as well as the unseen records, but not the excluded records. This can easily be done in Excel (or R) by adding a column to the dataset containing the labeling decisions that should be used as prior knowledge. Researcher B trains a new model and screens until only irrelevant records are found. This strategy is a quality check on the labeling decisions made by the first screener (i.e., to check for incorrectly excluded records).
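As a sketch of that Excel/R step: the snippet below takes researcher A's export, keeps A's relevant records labeled (so they can serve as prior knowledge), and blanks out A's exclusions so researcher B can re-screen them. The column name `included` and the idea that a partially filled label column is picked up as prior knowledge on import are assumptions; check them against your own export and ASReview version.

```r
# Minimal sketch: build researcher B's input dataset from researcher A's export.
# `included` (1 = relevant, 0 = irrelevant, empty = unseen) is an assumed column name.
a <- read.csv("screener_A_export.csv")

b_input <- a
b_input$included[b_input$included %in% 0] <- NA  # drop A's exclusions as labels

# Write empty cells for NA so the previously excluded records re-enter the pool
write.csv(b_input, "screener_B_input.csv", row.names = FALSE, na = "")
```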
-
After researcher A is done, the data is exported and imported again (the software will automatically recognize the previous labeling decisions and use them as prior knowledge to train the first iteration of the model). Researcher B trains a different model, preferably with a more advanced feature extractor (e.g., SBERT, doc2vec) in combination with a neural network (e.g., a 2-layer NN, LSTM, or 17-layer CNN). Such deep learning models typically dig deeper into the contextual meaning of the text but require much more training data. Training the deep learning model might take some time, up to a couple of days depending on the size of your data and your PC (you might want to run it on a [cloud] server). Researcher B then screens another set of records.
-
Once researchers A and B have completed their reviews, they might be interested in this: an R Markdown file that automatically generates a report summarising the level of agreement between their decisions. You can view all the material required to generate your own report in the corresponding GitHub repository and look at a demonstration report on the HTML preview page. A number of descriptives are produced in the report, including the number of abstracts reviewed, the number flagged as relevant, the number left unreviewed, the Kappa statistic, and the number of abstracts on which researchers A and B disagreed (e.g., A said relevant, B said irrelevant). For these disagreements, tables are produced that detail the specific records involved. The report also identifies the number of cases of 'implicit' disagreement: one researcher rates an abstract as relevant, but the other researcher never reviews that abstract. This is likely to occur because of the 'stop rule' used by the team (e.g., one hundred irrelevant articles in a row). If this number is greater than zero for either researcher, it indicates that one or both researchers stopped too early. The report is a work in progress. I would appreciate any checks, feedback, comparisons to your own findings, and suggestions for other features the report could include.
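If you want a quick check before (or alongside) the full report, the core numbers can be approximated in R along these lines. This is not the report itself, just a sketch; it reuses a merged data frame `m` with columns `included_A` and `included_B` (as in the merging sketch earlier in this thread) and uses the `irr` package for Cohen's Kappa.

```r
# Rough approximation of the report's core descriptives.
library(irr)  # install.packages("irr") if needed

# Jointly screened records and Cohen's Kappa
both <- subset(m, !is.na(included_A) & !is.na(included_B))
cat("Jointly screened:", nrow(both), "\n")
print(kappa2(both[, c("included_A", "included_B")]))

# Explicit disagreements: one screener says relevant, the other irrelevant
cat("Explicit disagreements:", sum(both$included_A != both$included_B), "\n")

# 'Implicit' disagreements: relevant for one screener, never seen by the other
cat("A relevant, B unseen:", sum(m$included_A %in% 1 & is.na(m$included_B)), "\n")
cat("B relevant, A unseen:", sum(m$included_B %in% 1 & is.na(m$included_A)), "\n")
```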
-
For inspiration: we have a list of systematic reviews with multiple screeners where ASReview was used as a screening tool.
-
How can I share my working list with another screener so we can include and exclude together?
-
Thanks for doing that — I am curious what they have to say. It might not be a big issue, as the two differently trained models might accurately reveal the disagreements between screeners in the pool. But maybe there is some bias. I am not an expert in this either, but a concerned user, so if you can post something based on what your contacts say, I’d appreciate it.
On 9 Jun 2023, at 09:58, Rens van de Schoot wrote:
That's an interesting perspective! You've raised a valid point about the potential value of evaluating the Inter-Rater Reliability (IRR) based on the set of records screened by both reviewers. This could indeed provide additional insights into the agreement regarding the inclusion criteria between the reviewers, as you've rightly noted.
However, when considering missing data, the dynamics might change, and I'm not currently aware of any study that has delved into exploring the IRR in this specific context. Your suggestion about a simulation study seems like a promising way to shed light on this issue and I appreciate the thoughtfulness behind it.
There are some notable experts in the field of missing data that I am acquainted with. Given the unique nature of this issue, I believe their insights could significantly contribute to this discussion. I'll reach out to them and see what they think.
-
Systematic reviewing with software that implements Active Learning (AL) is relatively new. Many users (and reviewers) still have to become familiar with the many different ways AL can be used in practice. In this discussion thread, we discuss various options, meant to inspire others and to help answer questions from your collaborators or reviewers.
Let’s assume you conducted a systematic search in multiple databases, the results were merged into one dataset, the data was de-duplicated, and as many abstracts as possible were retrieved (why?). You found 10,000 potentially relevant records, and you want to screen them against predefined inclusion/exclusion criteria with two screeners, researchers A and B. Also, you have ten records that you already know are relevant and ten that you know are irrelevant; this is what we call prior knowledge.
What options do you have?