How to collaborate with ASReview? #975
Replies: 9 comments · 10 replies
-
Both researchers install the software locally, upload the same data and precisely the same prior knowledge (e.g., ten relevant + ten irrelevant), train the same model, and independently start screening for relevance. After both researchers are done screening (either because the time is up or because the model presents only irrelevant records), the results, containing the labeling decisions and the ranking of the unseen records, are exported as a RIS, CSV, or XLSX file. Both files can be merged in, for example, Excel or R. Now you can compute the similarity in decisions, just like with a classical PRISMA-based review. The only difference is that there might be records seen by only one of the two researchers. My advice would be to discuss such papers with the team and make a joint decision.
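If you prefer to script this step rather than do it in Excel, here is a minimal R sketch. It assumes both exports are CSV files with a record identifier column (here called `record_id`) and a label column (here called `included`, with 1 = relevant, 0 = irrelevant, and empty for unseen records); the actual column names depend on your ASReview version and export format, so adjust them accordingly.

```r
# Minimal sketch: merge two ASReview exports and compare labeling decisions.
# Column names `record_id` and `included` are assumptions; adjust to your export.
a <- read.csv("screener_A.csv")
b <- read.csv("screener_B.csv")

m <- merge(a[, c("record_id", "included")],
           b[, c("record_id", "included")],
           by = "record_id", suffixes = c("_A", "_B"))

# Agreement on the records that both screeners labeled
both <- subset(m, !is.na(included_A) & !is.na(included_B))
cat("Agreement on jointly screened records:",
    round(mean(both$included_A == both$included_B), 2), "\n")

# Records seen by only one of the two screeners: discuss these with the team
only_a <- subset(m, !is.na(included_A) & is.na(included_B))
only_b <- subset(m, is.na(included_A) & !is.na(included_B))
cat("Seen only by A:", nrow(only_a), "- seen only by B:", nrow(only_b), "\n")
```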
-
Both researchers use the same data but different settings: for example, different prior knowledge, so that you can check whether different starting values result in the same set of relevant papers. A similar procedure can be applied by choosing different models (many different feature extraction techniques and classifiers are available).
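If you want a quick way to check how much the two runs converge on the same set of relevant papers, a rough R sketch could look like this (again assuming `record_id` and `included` columns in the exports; adjust to your own files):

```r
# Rough sketch: compare the relevant records found in two runs that used
# different prior knowledge or different models. Column names are assumptions.
run1 <- read.csv("run_prior_set_1.csv")
run2 <- read.csv("run_prior_set_2.csv")

rel1 <- run1$record_id[run1$included %in% 1]
rel2 <- run2$record_id[run2$included %in% 1]

overlap <- length(intersect(rel1, rel2))
jaccard <- overlap / length(union(rel1, rel2))
cat("Relevant in run 1:", length(rel1),
    "- relevant in run 2:", length(rel2),
    "- overlap:", overlap,
    "- Jaccard similarity:", round(jaccard, 2), "\n")
```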
-
Sequential screening: Researcher A starts screening, and Researcher B takes over when A's screening time is up. When B is done, A takes over again (this is especially interesting if you collaborate with researchers in different time zones, so screening can continue 24 hours per day!). You can export the project file and share it with a colleague, who can import the file into ASReview to continue screening where the first researcher stopped (officially supported). Alternatively, ASReview can be put on a server (not officially supported, but successfully implemented by some users).
-
Researcher A starts screening until only irrelevant records are presented. Researcher B imports the same dataset but uses different prior knowledge: all relevant records found by Researcher A as well as the unseen records, but not the excluded records. This can easily be done in Excel (or R) by adding a column to the dataset containing the labeling decisions that should be used as prior knowledge. Researcher B trains a new model and screens until only irrelevant records are found. This strategy is a quality check on the labeling decisions made by the first screener (i.e., to check for incorrectly excluded records).
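As a sketch of that Excel/R step: the snippet below takes researcher A's export, keeps A's relevant records labeled (so they can serve as prior knowledge), and blanks out A's exclusions so researcher B can re-screen them. The column name `included` and the idea that a partially filled label column is picked up as prior knowledge on import are assumptions; check them against your own export and ASReview version.

```r
# Minimal sketch: build researcher B's input dataset from researcher A's export.
# `included` (1 = relevant, 0 = irrelevant, empty = unseen) is an assumed column name.
a <- read.csv("screener_A_export.csv")

b_input <- a
b_input$included[b_input$included %in% 0] <- NA  # drop A's exclusions as labels

# Write empty cells for NA so the previously excluded records re-enter the pool
write.csv(b_input, "screener_B_input.csv", row.names = FALSE, na = "")
```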
-
After researcher A is done, the data is exported and imported again (the software will automatically recognize the previous labeling decisions and use them as prior knowledge to train the first iteration of the model). Researcher B trains a different model, preferably with a more advanced feature extractor (e.g., SBERT, doc2vec) in combination with a neural network (e.g., a 2-layer NN, LSTM, or 17-layer CNN). Such deep learning models typically dig deeper into the contextual meaning of the text but require much more training data. Training the deep learning model might take some time, up to a couple of days depending on the size of your data and your PC (you might want to run it on a [cloud] server). Researcher B then screens another set of records.
-
Once researchers A and B have completed their reviews, they might be interested in this: an R Markdown file that automatically generates a report summarising the level of agreement between their decisions. You can view all the material required to generate your own report in the corresponding GitHub repository and look at a demonstration report on the HTML preview page. A number of descriptives are produced in the report, including the number of abstracts reviewed, the number flagged as relevant, the number left unreviewed, the Kappa statistic, and the number of abstracts on which researchers A and B disagreed (e.g., A said relevant, B said irrelevant). For these disagreements, tables are produced that detail the specific records involved. The report also identifies the number of cases of 'implicit' disagreement: one researcher rates an abstract as relevant, but the other researcher never reviews that abstract. This is likely to occur because of the 'stop rule' used by the team (e.g., one hundred irrelevant articles in a row). If this number is greater than zero for either researcher, it indicates that one or both researchers stopped too early. The report is a work in progress. I would appreciate any checks, feedback, comparisons to your own findings, and suggestions for other features the report could include.
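If you want a quick check before (or alongside) the full report, the core numbers can be approximated in R along these lines. This is not the report itself, just a sketch; it reuses a merged data frame `m` with columns `included_A` and `included_B` (as in the merging sketch earlier in this thread) and uses the `irr` package for Cohen's Kappa.

```r
# Rough approximation of the report's core descriptives.
library(irr)  # install.packages("irr") if needed

# Jointly screened records and Cohen's Kappa
both <- subset(m, !is.na(included_A) & !is.na(included_B))
cat("Jointly screened:", nrow(both), "\n")
print(kappa2(both[, c("included_A", "included_B")]))

# Explicit disagreements: one screener says relevant, the other irrelevant
cat("Explicit disagreements:", sum(both$included_A != both$included_B), "\n")

# 'Implicit' disagreements: relevant for one screener, never seen by the other
cat("A relevant, B unseen:", sum(m$included_A %in% 1 & is.na(m$included_B)), "\n")
cat("B relevant, A unseen:", sum(m$included_B %in% 1 & is.na(m$included_A)), "\n")
```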
-
For inspiration: we have a list of systematic reviews with multiple screeners where ASReview was used as a screening tool.
-
How can I share my working list with another screener so we can include and exclude together?
-
Thanks for doing that — I am curious what they have to say. It might not be a big issue, as the two differently trained models might accurately reveal the disagreements between screeners in the pool. But maybe there is some bias. I am not an expert in this either, but a concerned user, so if you can post something based on what your contacts say, I’d appreciate it.
On 9 Jun 2023, at 09:58, Rens van de Schoot wrote:
That's an interesting perspective! You've raised a valid point about the potential value of evaluating the Inter-Rater Reliability (IRR) based on the set of records screened by both reviewers. This could indeed provide additional insights into the agreement regarding the inclusion criteria between the reviewers, as you've rightly noted.
However, when considering missing data, the dynamics might change, and I'm not currently aware of any study that has delved into exploring the IRR in this specific context. Your suggestion about a simulation study seems like a promising way to shed light on this issue and I appreciate the thoughtfulness behind it.
There are some notable experts in the field of missing data that I am acquainted with. Given the unique nature of this issue, I believe their insights could significantly contribute to this discussion. I'll reach out to them and see what they think.
-
Systematic reviewing with software that implements Active Learning (AL) is relatively new. Many users (and reviewers) still have to become familiar with the many different ways AL can be used in practice. In this discussion thread, we discuss various options, meant to inspire others and to help answer questions from your collaborators or reviewers.
Let’s assume you conducted a systematic search in multiple databases, the results were merged into one dataset, the data was de-duplicated, and as many abstracts as possible were retrieved (why?). You found 10,000 potentially relevant records, and you want to screen them against predefined inclusion/exclusion criteria with two screeners, researchers A and B. Also, you have ten records that you already know are relevant and ten that you know are irrelevant; this is what we call prior knowledge.
What options do you have?