-
ASReview indeed uses a machine learning model to learn from your inclusion and exclusion decisions, based on the information provided in the articles. However, it does not directly rely on specific keywords that you might be interested in. The model learns to recognize patterns in the text from your decisions, which can be more complex than simply the presence or absence of certain keywords. For the step where we translate text into vectors that can be used by the classifier, there are multiple options available that each work slightly differently:

- **TF-IDF**: This stands for Term Frequency-Inverse Document Frequency, a numerical statistic used to reflect how important a word is to a document in a collection or corpus. It is based on the occurrence of words in documents. The TF (Term Frequency) part counts how often a term occurs in a document, while the IDF (Inverse Document Frequency) part scales down the weight of terms that occur very frequently across the document set and scales up the weight of terms that occur rarely. This way, it gives higher weight to more 'informative' words.
- **Doc2Vec**: This is an algorithm that extends the word2vec method to larger blocks of text, such as sentences, paragraphs, or entire documents. It learns to represent these blocks of text as vectors in a high-dimensional space in such a way that semantically similar texts are close together in that space. This makes it possible to perform semantic operations on the vectors, for instance finding texts that are similar to a given text.
- **SBERT**: SBERT (Sentence-BERT) is a modification of the pre-trained BERT network (a transformer-based machine learning technique for natural language processing developed by Google) that uses siamese and triplet network structures to derive semantically meaningful sentence embeddings. This means you can calculate semantically meaningful similarity between sentences, allowing more fine-grained matching than at the document level.

These are the default models used in ASReview because they are efficient and effective ways of converting the text of articles into a form that machine learning algorithms can use to learn from your inclusion and exclusion decisions. Users can also plug in their own feature extractor, such as a multilingual one that can deal with texts written in different languages.
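To make the TF-IDF idea above concrete, here is a minimal pure-Python sketch of the weighting scheme. This is only an illustration of the principle, not ASReview's actual implementation (which also applies normalization and other refinements):

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute a TF-IDF weight for every term in every document.

    Returns a list of {term: weight} dicts, one per document.
    """
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Rare terms (low df) get a high IDF; common terms get a low one.
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

docs = [
    "violence against nursing staff",
    "violence in animal models",
    "prevalence of workplace aggression",
]
vectors = tfidf(docs)
```

Here 'violence' appears in two of the three documents, so in the first document it receives a lower weight than 'staff', which appears in only one: the more 'informative' word wins.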
-
I sense that ASReview works by figuring out which words I'm looking for, so it keeps showing me records that include those words (staff, violence, aggression, etc.).
It also keeps showing me records containing words that tell me quickly the article is almost certainly something I don't want; ASReview is clearly not learning to downgrade these records:
music
cross-sectional
Halipronel (other drugs)
psychotropic
qualitative
animal
prevalence
It would be so cool if I could keep adding to a list of words that basically means "downgrade these studies severely for likely relevance" as I come upon them. This is how real systematic reviewing happens: you start by looking for irrelevant words like mice or rabbit to quickly get rid of the most irrelevant studies. You don't always know in advance which irrelevant terms will come up, so you can't prespecify them all in the initial training phase.
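If such a keyword blocklist existed, one way it could work is as a post-hoc penalty applied to the model's relevance scores for records matching a user-maintained word list. Everything below (function name, keyword list, penalty factor) is a hypothetical sketch of the requested feature, not part of the ASReview API:

```python
# Hypothetical sketch: severely downgrade the relevance score of any
# record whose text contains a user-supplied "deal-breaker" keyword.
STOP_KEYWORDS = {"mice", "rabbit", "animal", "qualitative", "prevalence"}

def downgrade(records, scores, keywords=STOP_KEYWORDS, penalty=0.9):
    """Multiply the score of any matching record by (1 - penalty)."""
    adjusted = []
    for text, score in zip(records, scores):
        words = set(text.lower().split())
        if words & keywords:  # record mentions at least one blocked word
            score *= (1 - penalty)
        adjusted.append(score)
    return adjusted

records = ["violence against staff", "aggression in animal models"]
adjusted = downgrade(records, [0.8, 0.7])
# The animal-model record is heavily penalized; the other is untouched.
```

The list could grow during screening, with the penalty re-applied each time the ranking is refreshed, matching the workflow described above.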