- tensorflow
- keras
- numpy
- shutil
- sklearn
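A minimal sketch to verify the environment before running the scripts (note that shutil ships with the Python standard library and needs no installation, and sklearn is installed as the scikit-learn package):

```python
# Quick environment check for the dependencies listed above.
import shutil  # standard library, no installation needed

import tensorflow as tf
import keras
import numpy as np
import sklearn  # installed via the scikit-learn package

print("tensorflow:", tf.__version__)
print("keras:", keras.__version__)
print("numpy:", np.__version__)
print("scikit-learn:", sklearn.__version__)
```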
- crf-ma-datasets: Download the original crowdsourced dataset annotated by Rodrigues et al. from http://amilab.dei.uc.pt/fmpr/crf-ma-datasets.tar.gz.
- pre_trained_bert: Download the pre-trained BERT model and place it in this folder.
- NER: This folder contains the processed dataset. The NER/original_data folder contains the processed crf-ma-datasets, and the NER/processed_test_data folder contains the processed test set.
- execution: All execution results are stored in this folder. The execution/calculations folder contains iteration-wise results (a sketch that creates this layout follows the list).
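The sketch below prepares the folder layout described above before the first run; the names are taken from the list, and some of these directories may also be created by the scripts themselves.

```python
import os

# Folder layout assumed by the scripts, as described in the list above.
folders = [
    "crf-ma-datasets",          # extracted crowdsourced dataset (Rodrigues et al.)
    "pre_trained_bert",         # pre-trained BERT model files
    "NER/original_data",        # processed crf-ma-datasets
    "NER/processed_test_data",  # processed test set
    "execution",                # execution results
    "execution/calculations",   # iteration-wise results
]

for folder in folders:
    os.makedirs(folder, exist_ok=True)
    print("ready:", folder)
```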
- NER: The dataset can be found at http://amilab.dei.uc.pt/fmpr/crf-ma-datasets.tar.gz (see the download sketch after this list)
- PICO: The dataset can be found at https://github.com/yinfeiy/PICO-data
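A hedged sketch for fetching and unpacking the NER archive with the standard library; the URL is the one listed above, while the archive filename and extraction directory are assumptions.

```python
import tarfile
import urllib.request

# URL from the list above; the local filename and extraction path are assumptions.
URL = "http://amilab.dei.uc.pt/fmpr/crf-ma-datasets.tar.gz"
ARCHIVE = "crf-ma-datasets.tar.gz"

urllib.request.urlretrieve(URL, ARCHIVE)  # download the archive
with tarfile.open(ARCHIVE, "r:gz") as tar:
    tar.extractall(".")                   # unpack into the working directory
```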
Dataset | Precision(S) | Recall(S) | F1(S) | Precision(R) | Recall(R) | F1(R) |
---|---|---|---|---|---|---|
NER | 83.02 | 78.69 | 80.79 | 92.64 | 92.47 | 91.63 |
PICO | 64.03 | 52.62 | 57.77 | 92.20 | 95.15 | 93.65 |
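As a sanity check on the table, F1 is the harmonic mean of precision and recall; the snippet below reproduces the PICO F1(S) value from its precision and recall columns, assuming the standard F1 definition.

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall (standard F1 definition).
    return 2 * precision * recall / (precision + recall)

# PICO row, (S) columns from the table above.
print(round(f1(64.03, 52.62), 2))  # 57.77, matching the F1(S) column
```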
- original data preprocessing: `python data_preprocessing.py`
- execution data preprocessing: `python execution_data_preprocessing.py`
- execution: `python execution.py`
- result calculation: `python calculations.py` (a runner sketch chaining these steps follows this list)
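To run the steps end to end, a minimal runner sketch is below; it assumes the four scripts sit in the repository root and take no command-line arguments, which may need adjusting.

```python
import subprocess
import sys

# Run the pipeline steps listed above in order; each script is assumed to sit
# in the repository root and to take no command-line arguments.
steps = [
    "data_preprocessing.py",
    "execution_data_preprocessing.py",
    "execution.py",
    "calculations.py",
]

for script in steps:
    print("running", script)
    subprocess.run([sys.executable, script], check=True)  # stop on the first failure
```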
NOTE 1: The execution outputs are stored in the execution folder and the result calculations are stored in the execution/calculations folder. We use the conlleval evaluation script from http://amilab.dei.uc.pt/fmpr/ma-crf.tar.gz.
NOTE 2: The PICO results can be reproduced by running the same scripts on the PICO dataset.
NOTE 3: We provide the execution run results for each iteration in the execution/calculations folder.
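For scoring with the conlleval script mentioned in NOTE 1, a hedged sketch is below; the script and prediction file paths are placeholders, and the input file is assumed to follow the standard CoNLL convention (one token per line with the gold and predicted tags in the last two columns, blank lines between sentences).

```python
import subprocess

# Placeholder paths: adjust to where the conlleval script (from ma-crf.tar.gz)
# and the aggregated prediction file actually live.
CONLLEVAL = "conlleval.pl"
PREDICTIONS = "execution/calculations/predictions.txt"

with open(PREDICTIONS) as f:
    result = subprocess.run(
        ["perl", CONLLEVAL], stdin=f, capture_output=True, text=True
    )
print(result.stdout)  # per-entity and overall precision / recall / F1
```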
- CRF-MA: @article{rodrigues2014sequence, title={Sequence labeling with multiple annotators}, author={Rodrigues, Filipe and Pereira, Francisco and Ribeiro, Bernardete}, journal={Machine learning}, volume={95}, number={2}, pages={165--181}, year={2014}, publisher={Springer} }
- AggSLC: @inproceedings{9679072, title={Truth Discovery in Sequence Labels from Crowds}, author={Sabetpour, Nasim and Kulkarni, Adithya and Xie, Sihong and Li, Qi}, booktitle={2021 IEEE International Conference on Data Mining (ICDM)}, year={2021}, pages={539--548}, doi={10.1109/ICDM51629.2021.00065} }
- DL-CL: @inproceedings{rodrigues2018deep, title={Deep learning from crowds}, author={Rodrigues, Filipe and Pereira, Francisco C}, booktitle={Thirty-Second AAAI Conference on Artificial Intelligence}, year={2018} }
- BERT Pre-trained: @article{devlin2018bert, title={BERT: Pre-training of deep bidirectional transformers for language understanding}, author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina}, journal={Proceedings of NAACL-HLT 2019, Association for Computational Linguistics}, year={2019}, pages={4171--4186} }
- OPTSLA: @inproceedings{sabetpour-etal-2020-optsla, title = "{O}pt{SLA}: an Optimization-Based Approach for Sequential Label Aggregation", author = "Sabetpour, Nasim and Kulkarni, Adithya and Li, Qi", booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020", month = nov, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.findings-emnlp.119", doi = "10.18653/v1/2020.findings-emnlp.119", pages = "1335--1340"}