This repo contains the raw annotations for the PICO dataset used in the paper:
Aggregating and Predicting Sequence Labels from Crowd Annotations. An Thanh Nguyen, Byron C. Wallace, Junyi Jessy Li, Ani Nenkova, and Matthew Lease. Association for Computational Linguistics (ACL), 2017.
An SDK and sample code are provided for retrieving the annotations.
The dataset is in annotations/ and is split into 4 parts:
- train/ contains 3549 randomly selected abstracts.
- dev/ contains 500 randomly selected abstracts.
- test/ contains 500 randomly selected abstracts.
- acl17-test/ contains 191 abstracts with annotations by a medical student.
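As a quick sanity check after loading a split, the abstract counts listed above can be compared against whatever was actually read. The sketch below is only illustrative: `check_split_size` and the `records` argument are hypothetical names, not part of the SDK, and it assumes `records` holds one entry per abstract.

```python
# Abstract counts per split, as listed in this README.
EXPECTED_ABSTRACTS = {
    "train": 3549,
    "dev": 500,
    "test": 500,
    "acl17-test": 191,
}


def check_split_size(split, records):
    """Compare the number of loaded records with the documented count.

    `records` is assumed to be a sequence with one entry per abstract
    (hypothetical; adapt it to however the annotations are loaded).
    Returns a (matches, expected, actual) tuple.
    """
    expected = EXPECTED_ABSTRACTS[split]
    actual = len(records)
    return actual == expected, expected, actual
```

A mismatch does not necessarily mean the data is broken; it may only mean that the loaded structure is not keyed by abstract.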
In each folder:
- PICO-annos-crowdsourcing.json contains annotations from crowdsourced workers.
- PICO-annos-crowdsourcing-agg.json contains aggregated results from the crowdsourced annotations. The aggregation methods are described in the paper, Aggregating and Predicting Sequence Labels from Crowd Annotations.
- PICO-annos-professional.json (acl17-test only) contains annotations from a medical student.
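The folder layout described above can be sketched in a few lines of standard-library Python, independent of the SDK. The helper names `expected_files` and `load_records` are hypothetical. The on-disk encoding of the `.json` files is not specified here, so the loader is written under the assumption that a file is either a single JSON document or one JSON object per line; check the sample code in the repo for the authoritative way to read them.

```python
import json
import os

SPLITS = ["train", "dev", "test", "acl17-test"]
CROWD = "PICO-annos-crowdsourcing.json"
CROWD_AGG = "PICO-annos-crowdsourcing-agg.json"
PROFESSIONAL = "PICO-annos-professional.json"  # present in acl17-test only


def expected_files(root="annotations"):
    """Map each split name to the annotation file paths described above."""
    layout = {}
    for split in SPLITS:
        names = [CROWD, CROWD_AGG]
        if split == "acl17-test":
            names.append(PROFESSIONAL)
        layout[split] = [os.path.join(root, split, name) for name in names]
    return layout


def load_records(path):
    """Parse a file as one JSON document, else as one JSON object per line.

    Both encodings are assumptions about the file format. Returns the parsed
    document as-is in the first case, or a list of per-line objects in the
    second.
    """
    with open(path) as handle:
        text = handle.read()
    try:
        return json.loads(text)
    except ValueError:
        # Not a single JSON document: try parsing each non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
```

For example, `expected_files()["acl17-test"]` lists three paths, while the other splits list two.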
Requirements:
- Python 2
- spaCy, for basic tokenization etc.
Usage:
- Sample code is in the src/examples/ folder. To run the annotation-loading example:

    cd src
    python -m examples.load_annotation