Segmented screening to improve ASReview performance for transdisciplinary SLRs? #1580
Replies: 1 comment 3 replies
Hi @TimothyMarcroft, I am partly copying my answer from #1547. Some of the models available in ASReview are context-based classification models, as opposed to vocabulary-based ones. These models are much more context dependent and have proven resistant to divergent terminology. To simplify (a lot): these models analyze the contextual usage of every word relative to every other word, so if two terms refer to the same concept and are used in similar contexts, they will end up close together in the embedding space. This is in contrast to simpler techniques like TF-IDF, which rely on the frequency of individual terms and do not capture such nuances, leaving the classification model to make those connections on its own. In practical terms, if you are dealing with a corpus that has a lot of heterogeneous terminology, more advanced models like doc2vec or sBERT may provide more accurate representations of the underlying semantic structures: they can capture the semantic similarity between terms that are contextually similar, even if the terms themselves differ. Keep in mind that sBERT will take some time to train, but in your situation I would highly recommend its use as a feature extractor.
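To see why purely term-frequency features struggle here, consider a minimal stdlib sketch (not ASReview code, and deliberately simplified: no TF-IDF weighting or stopword removal). Two abstracts describe the same real-world phenomenon in different disciplinary lexicons, so their bag-of-words similarity comes almost entirely from function words rather than content:

```python
from collections import Counter
from math import sqrt

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Two toy abstracts about the same phenomenon, phrased in
# different disciplinary vocabularies (made-up examples).
doc_psych = "participant burnout and emotional exhaustion in care settings"
doc_econ = "workforce attrition and labour turnover in the care sector"

vec_psych = Counter(doc_psych.split())
vec_econ = Counter(doc_econ.split())

# The only shared tokens are "and", "in", and "care" — the
# similarity says little about the shared underlying concept.
print(f"term-overlap similarity: {cosine_similarity(vec_psych, vec_econ):.2f}")
```

A context-based embedding model such as sBERT would instead place "burnout"/"exhaustion" and "attrition"/"turnover" near each other in the vector space, which is exactly what a transdisciplinary corpus needs.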
Hello ASReview community,
PhD student in the social sciences here. I joined the summer school put on by Utrecht University a few months ago and I am now working on the methodology for my first systematic literature review. I plan to use ASReview during the screening process, but I am a little concerned that my specific context and research question will make the tool less effective than it would otherwise be. The thing is, my research question is quite pragmatic and my approach is transdisciplinary. This means there are many different ways to refer to my objects of study in the literature, with each discipline often having its own lexicon. I want to include all of the papers that discuss an instance of a real-world phenomenon, not just those that use a single set of vocabulary to describe it. So, compared to a relatively mono-disciplinary SLR, the papers I want to include will be more linguistically different from one another, which seems like it could cause performance issues for ASReview. I have an idea of how to overcome this challenge while still getting good value out of ASReview: journal-segmented screening.
In this approach, I would perform a relatively broad search in several databases, combine the results, deduplicate, and then segment my screening phase by journal. That is, I would create a separate .csv file and run a separate screening procedure for each journal, with each following the same stopping criteria. My logic is that journals are probably the most accessible proxy I am going to find for disciplinary affiliation, and that (on average) the vocabulary used by a given article will be more similar to that of other articles in the same journal than to articles published elsewhere. If I understand things correctly, this should improve the performance of ASReview. I would then merge the resulting labeled datasets once I had finished and deduplicate again (although I would expect few duplicates at this stage).
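The pre-screening half of that workflow can be sketched with the standard library alone. This is a hypothetical illustration, not ASReview tooling; the column names (`journal`, `title`, `abstract`) and the output directory name are assumptions you would adapt to your own export format:

```python
import csv
from collections import defaultdict
from pathlib import Path

def split_by_journal(rows, out_dir: Path, journal_col: str = "journal"):
    """Group already-deduplicated records by journal and write one
    CSV per journal, ready for a separate ASReview screening run."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[journal_col]].append(row)
    out_dir.mkdir(parents=True, exist_ok=True)
    for journal, records in groups.items():
        # Sanitize the journal name so it is a safe filename.
        safe_name = "".join(c if c.isalnum() else "_" for c in journal)
        with (out_dir / f"{safe_name}.csv").open(
            "w", newline="", encoding="utf-8"
        ) as f:
            writer = csv.DictWriter(f, fieldnames=records[0].keys())
            writer.writeheader()
            writer.writerows(records)
    return groups

# Toy deduplicated search result (titles and journals are made up).
rows = [
    {"title": "A", "abstract": "...", "journal": "J. of Sociology"},
    {"title": "B", "abstract": "...", "journal": "Health Economics"},
    {"title": "C", "abstract": "...", "journal": "J. of Sociology"},
]
groups = split_by_journal(rows, Path("segmented_screening"))
print({journal: len(records) for journal, records in groups.items()})
```

Merging the labeled outputs afterwards would be the reverse: read each per-journal CSV back in, concatenate, and deduplicate on a stable identifier such as DOI.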
One problem I see with this approach is that I can't be sure there are relevant records in every journal. In fact, I can be quite sure that some of the journals will include zero relevant records, making the selection of a stopping rule more complicated. Perhaps this is solvable, but I'm not sure.
Does this seem like an approach that would be worth the extra effort? Is the SAFE procedure, by combining multiple models, strong enough not to need this extra complication? I would love some feedback from people who have experience with the tool and a deeper technical understanding of it than I do. Thanks!