- tensorflow
- keras
- numpy
- shutil
- sklearn
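A minimal sketch to verify the environment before running the scripts (note that shutil ships with the Python standard library and needs no installation, and sklearn is installed as the scikit-learn package):

```python
# Quick environment check for the dependencies listed above.
import shutil  # standard library, no installation needed

import tensorflow as tf
import keras
import numpy as np
import sklearn  # installed via the scikit-learn package

print("tensorflow:", tf.__version__)
print("keras:", keras.__version__)
print("numpy:", np.__version__)
print("scikit-learn:", sklearn.__version__)
```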
- crf-ma-datasets: Download the original crowdsourced dataset annotated by Rodrigues et al. from http://amilab.dei.uc.pt/fmpr/crf-ma-datasets.tar.gz.
- pre_trained_bert: Download the pre-trained BERT model and place it in this folder.
- NER: This folder contains the processed dataset. The NER/original_data folder contains the processed crf-ma-datasets, and the NER/processed_test_data folder contains the processed test set.
- execution: All execution results are stored in this folder. The execution/calculations folder contains iteration-wise results (a sketch that creates this layout follows the list).
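The sketch below prepares the folder layout described above before the first run; the names are taken from the list, and some of these directories may also be created by the scripts themselves.

```python
import os

# Folder layout assumed by the scripts, as described in the list above.
folders = [
    "crf-ma-datasets",          # extracted crowdsourced dataset (Rodrigues et al.)
    "pre_trained_bert",         # pre-trained BERT model files
    "NER/original_data",        # processed crf-ma-datasets
    "NER/processed_test_data",  # processed test set
    "execution",                # execution results
    "execution/calculations",   # iteration-wise results
]

for folder in folders:
    os.makedirs(folder, exist_ok=True)
    print("ready:", folder)
```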
- NER: The dataset can be found at http://amilab.dei.uc.pt/fmpr/crf-ma-datasets.tar.gz (see the download sketch after this list)
- PICO: The dataset can be found at https://github.com/yinfeiy/PICO-data
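A hedged sketch for fetching and unpacking the NER archive with the standard library; the URL is the one listed above, while the archive filename and extraction directory are assumptions.

```python
import tarfile
import urllib.request

# URL from the list above; the local filename and extraction path are assumptions.
URL = "http://amilab.dei.uc.pt/fmpr/crf-ma-datasets.tar.gz"
ARCHIVE = "crf-ma-datasets.tar.gz"

urllib.request.urlretrieve(URL, ARCHIVE)  # download the archive
with tarfile.open(ARCHIVE, "r:gz") as tar:
    tar.extractall(".")                   # unpack into the working directory
```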
Dataset | Precision(S) | Recall(S) | F1(S) | Precision(R) | Recall(R) | F1(R) |
---|---|---|---|---|---|---|
NER | 83.02 | 78.69 | 80.79 | 92.64 | 92.47 | 91.63 |
PICO | 64.03 | 52.62 | 57.77 | 92.20 | 95.15 | 93.65 |
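As a sanity check on the table, F1 is the harmonic mean of precision and recall; the snippet below reproduces the PICO F1(S) value from its precision and recall columns, assuming the standard F1 definition.

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall (standard F1 definition).
    return 2 * precision * recall / (precision + recall)

# PICO row, (S) columns from the table above.
print(round(f1(64.03, 52.62), 2))  # 57.77, matching the F1(S) column
```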
- original data preprocessing: `python data_preprocessing.py`
- execution data preprocessing: `python execution_data_preprocessing.py`
- execution: `python execution.py`
- result calculation: `python calculations.py` (a runner sketch chaining these steps follows this list)
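To run the steps end to end, a minimal runner sketch is below; it assumes the four scripts sit in the repository root and take no command-line arguments, which may need adjusting.

```python
import subprocess
import sys

# Run the pipeline steps listed above in order; each script is assumed to sit
# in the repository root and to take no command-line arguments.
steps = [
    "data_preprocessing.py",
    "execution_data_preprocessing.py",
    "execution.py",
    "calculations.py",
]

for script in steps:
    print("running", script)
    subprocess.run([sys.executable, script], check=True)  # stop on the first failure
```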
NOTE 1: The execution outputs are stored in the execution folder and the result calculations are stored in the execution/calculations folder. We use the conlleval evaluation script from http://amilab.dei.uc.pt/fmpr/ma-crf.tar.gz.
NOTE 2: The PICO results can be reproduced by running the same scripts on the PICO dataset.
NOTE 3: We provide the execution run results for each iteration in the execution/calculations folder.
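For scoring with the conlleval script mentioned in NOTE 1, a hedged sketch is below; the script and prediction file paths are placeholders, and the input file is assumed to follow the standard CoNLL convention (one token per line with the gold and predicted tags in the last two columns, blank lines between sentences).

```python
import subprocess

# Placeholder paths: adjust to where the conlleval script (from ma-crf.tar.gz)
# and the aggregated prediction file actually live.
CONLLEVAL = "conlleval.pl"
PREDICTIONS = "execution/calculations/predictions.txt"

with open(PREDICTIONS) as f:
    result = subprocess.run(
        ["perl", CONLLEVAL], stdin=f, capture_output=True, text=True
    )
print(result.stdout)  # per-entity and overall precision / recall / F1
```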
- CRF-MA: @article{rodrigues2014sequence, title={Sequence labeling with multiple annotators}, author={Rodrigues, Filipe and Pereira, Francisco and Ribeiro, Bernardete}, journal={Machine learning}, volume={95}, number={2}, pages={165--181}, year={2014}, publisher={Springer} }
- AggSLC: @inproceedings{9679072, title={Truth Discovery in Sequence Labels from Crowds}, author={Sabetpour, Nasim and Kulkarni, Adithya and Xie, Sihong and Li, Qi}, booktitle={2021 IEEE International Conference on Data Mining (ICDM)}, year={2021}, pages={539--548}, doi={10.1109/ICDM51629.2021.00065} }
- DL-CL: @inproceedings{rodrigues2018deep, title={Deep learning from crowds}, author={Rodrigues, Filipe and Pereira, Francisco C}, booktitle={Thirty-Second AAAI Conference on Artificial Intelligence}, year={2018} }
- BERT Pre-trained: @article{devlin2018bert, title={BERT: Pre-training of deep bidirectional transformers for language understanding}, author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina}, journal={Proceedings of NAACL-HLT 2019, Association for Computational Linguistics}, year={2019}, pages={4171--4186} }
- OPTSLA: @inproceedings{sabetpour-etal-2020-optsla, title = "{O}pt{SLA}: an Optimization-Based Approach for Sequential Label Aggregation", author = "Sabetpour, Nasim and Kulkarni, Adithya and Li, Qi", booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020", month = nov, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.findings-emnlp.119", doi = "10.18653/v1/2020.findings-emnlp.119", pages = "1335--1340"}