

CitationScreeningReplicability


This repository is the official implementation of the ECIR 2022 paper Automation of Citation Screening for Systematic Literature Reviews using Neural Networks: A Replicability Study.

Citing

If you find our code useful, please cite our paper:

@inproceedings{kusa2022automation,
  title={Automation of Citation Screening for Systematic Literature Reviews Using Neural Networks: A Replicability Study},
  author={Kusa, Wojciech and Hanbury, Allan and Knoth, Petr},
  booktitle={European Conference on Information Retrieval},
  pages={584--598},
  year={2022},
  organization={Springer}
}

Installation

Tested with Python 3.8.

Install requirements with pip:

$ pip install -r requirements.txt

Datasets

Clinical

Original Clinical review datasets can be downloaded from here. Use the src/data/prepare_clinical_data.py script to prepare the datasets. Make sure that the repository_path variable points to the root of your local copy of the bwallace/citation-screening/ repository.

Drug

Original Drug review datasets can be downloaded from here.

This dataset does not contain titles or abstracts, so they must be downloaded from PubMed using each article's PubMed ID. Place the epc-ir.clean.tsv input file in the data/external/drug/ folder and run the src/data/prepare_drug_data.py script.

SWIFT

Original SWIFT review datasets can be downloaded from here.

  • OHAT datasets (PFOA/PFOS, Bisphenol A (BPA), Transgenerational, and Fluoride and neurotoxicity) are stored as four sheets of a single Excel file.

  • CAMRADES dataset (Neuropathic pain) is stored as a separate Excel file.

The Fluoride and neurotoxicity and Neuropathic pain datasets already contain titles and abstracts, so the only preparation step needed is converting the Label column into a common format.
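The raw values of the Label column and the exact common format are not documented here. As a minimal sketch, assuming the common format is a binary label (1 = included, 0 = excluded), the conversion could look like the following. The raw strings in INCLUDE_VALUES and EXCLUDE_VALUES are hypothetical placeholders and should be replaced with the values actually present in each sheet; this is not the repository's own code.

```python
# Hypothetical raw label values; inspect each dataset's Label column and
# adjust these sets before relying on the mapping.
INCLUDE_VALUES = {"included", "include", "yes", "1"}
EXCLUDE_VALUES = {"excluded", "exclude", "no", "0"}


def normalize_label(raw) -> int:
    """Map a raw label value to 1 (included) or 0 (excluded).

    Assumes a binary target format. Unknown values raise an error instead
    of being silently mapped to a class.
    """
    value = str(raw).strip().lower()
    if value in INCLUDE_VALUES:
        return 1
    if value in EXCLUDE_VALUES:
        return 0
    raise ValueError(f"Unrecognised label value: {raw!r}")


if __name__ == "__main__":
    print([normalize_label(v) for v in ["Included", " excluded ", 1, "No"]])
```

Raising on unknown values is a deliberate choice: a silent default would hide label values the mapping does not cover.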

The other datasets consist only of PubMed IDs and assigned labels, so their titles and abstracts must be downloaded using Biopython.

The src/data/prepare_swift_data.py script accepts .tsv files, so convert each dataset into a separate .tsv file and place it in the data/external/SWIFT/ folder.
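One way to do this conversion is sketched below with pandas, which can read every sheet of a workbook at once. The helper names and the filename-sanitising rule are illustrative assumptions, not part of the repository. Sheet names such as "PFOA/PFOS" contain characters that are not valid in filenames, so they are normalised before writing. Reading .xlsx files with pandas also requires openpyxl.

```python
import re
from pathlib import Path

import pandas as pd


def safe_filename(sheet_name: str) -> str:
    """Turn a sheet name such as 'PFOA/PFOS' into a filesystem-safe stem."""
    return re.sub(r"[^A-Za-z0-9]+", "_", sheet_name).strip("_")


def sheets_to_tsv(sheets: dict, out_dir) -> list:
    """Write each DataFrame in `sheets` to its own tab-separated file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, frame in sheets.items():
        path = out / f"{safe_filename(name)}.tsv"
        frame.to_csv(path, sep="\t", index=False)
        written.append(path)
    return written


def excel_to_tsv(xlsx_path, out_dir="data/external/SWIFT") -> list:
    """Split every sheet of an Excel workbook into a separate .tsv file."""
    # sheet_name=None makes pandas return a dict {sheet name: DataFrame}.
    sheets = pd.read_excel(xlsx_path, sheet_name=None)
    return sheets_to_tsv(sheets, out_dir)
```

For example, excel_to_tsv("path/to/workbook.xlsx") writes one .tsv per sheet into data/external/SWIFT/; the workbook path is a placeholder. The single-sheet CAMRADES file can be converted the same way.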


For the Drug and SWIFT datasets, downloading documents from PubMed requires setting the Entrez.email variable to your email address.
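The sketch below shows one way to fetch titles and abstracts by PubMed ID with Biopython's Entrez module. It is an independent illustration, not the code in the prepare scripts. The helper names and the batch size of 200 are assumptions. Biopython is imported inside the download function so the parsing helper can be used on its own.

```python
def fetch_pubmed_records(pmids, email, batch_size=200):
    """Download PubMed records for the given PMIDs using Biopython's Entrez."""
    from Bio import Entrez  # requires `pip install biopython`

    Entrez.email = email  # NCBI asks E-utilities clients to identify themselves
    articles = []
    for start in range(0, len(pmids), batch_size):
        batch = [str(p) for p in pmids[start:start + batch_size]]
        handle = Entrez.efetch(db="pubmed", id=",".join(batch), retmode="xml")
        try:
            records = Entrez.read(handle)
        finally:
            handle.close()
        articles.extend(records["PubmedArticle"])
    return articles


def extract_title_abstract(article) -> dict:
    """Pull the PMID, title and abstract out of one parsed PubmedArticle."""
    citation = article["MedlineCitation"]
    info = citation["Article"]
    # Structured abstracts are split into several AbstractText sections;
    # articles without an abstract have no "Abstract" key at all.
    abstract_parts = info.get("Abstract", {}).get("AbstractText", [])
    return {
        "pmid": str(citation["PMID"]),
        "title": str(info.get("ArticleTitle", "")),
        "abstract": " ".join(str(part) for part in abstract_parts),
    }
```

Usage: [extract_title_abstract(a) for a in fetch_pubmed_records(pmids, email="you@example.org")], where pmids is the list of IDs from the input file.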

Results

Detailed results are stored in the reports/ directory:

  • results-document_features.csv contains detailed results on the influence of input document features for all models and datasets.
  • results-precision_at_95recall.csv contains detailed precision at 95% recall results for all models and datasets.
  • results-time.csv contains training time measurements for all models and datasets.
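As a reference point for reading the precision at 95% recall results, one common definition works like this. Rank the candidate documents by model score. Walk down the ranking until at least 95% of the included documents have been retrieved. Report the precision at that cut-off. The sketch below implements that definition. The paper's evaluation code may differ in details such as tie handling, so treat this as illustrative.

```python
import math
from typing import Sequence


def precision_at_recall(labels: Sequence[int], scores: Sequence[float],
                        target_recall: float = 0.95) -> float:
    """Precision at the first rank cut-off whose recall reaches `target_recall`.

    labels: 1 for relevant (included) documents, 0 otherwise.
    scores: model scores; higher means more likely to be included.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    total_relevant = sum(labels)
    if total_relevant == 0:
        raise ValueError("precision at recall is undefined without relevant documents")
    # Number of relevant documents that must be retrieved; the small epsilon
    # guards against floating-point error in target_recall * total_relevant.
    needed = math.ceil(target_recall * total_relevant - 1e-9)
    # Rank by descending score; sorted() is stable, so ties keep input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives >= needed:
            return true_positives / k
    # Only reached when target_recall > 1, since every relevant document is ranked.
    raise ValueError("target_recall must be in (0, 1]")
```

For example, with labels [1, 0, 1, 0, 0, 1] already sorted by descending score, all three relevant documents are needed (ceil(0.95 × 3) = 3). The third one appears at rank 6, so precision at 95% recall is 3/6 = 0.5.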

Figures

To recreate the figures, run the notebooks/plotting.ipynb Jupyter notebook.

Dataset statistics

To calculate dataset statistics, run the src/data/dataset_statistics.py script.
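The exact statistics the script reports are not listed here. As an illustration only, the sketch below computes a summary commonly reported for screening datasets (number of documents, number of included documents, inclusion rate) from a prepared .tsv file. The column name "label" and its binary 0/1 values are assumptions about the prepared data format.

```python
import csv
from pathlib import Path


def screening_stats(tsv_path, label_column="label") -> dict:
    """Summarise a prepared screening dataset stored as a .tsv file.

    Assumes (hypothetically) a binary label column where 1 means included.
    """
    with Path(tsv_path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        labels = [int(row[label_column]) for row in reader]
    total = len(labels)
    included = sum(labels)
    return {
        "documents": total,
        "included": included,
        # Guard against empty files to avoid division by zero.
        "inclusion_rate": included / total if total else 0.0,
    }
```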
