NasimISU/Truth-Discovery-in-Sequence-Labels-from-Crowds

This is the repository for the paper: Truth Discovery in Sequence Labels from Crowds

Requirements

  • tensorflow
  • keras
  • numpy
  • scikit-learn (imported as sklearn)
  • shutil (part of the Python standard library; no separate installation required)
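Before running the scripts, it can help to verify that the dependencies resolve. A minimal sketch (not part of the repository) that checks each requirement under its import name, assuming the package-to-module mapping above:

```python
import importlib.util

# Map each requirement to the module name it is imported under.
# Note: sklearn is installed via the "scikit-learn" package, and
# shutil ships with the Python standard library.
REQUIRED_MODULES = {
    "tensorflow": "tensorflow",
    "keras": "keras",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "shutil (stdlib)": "shutil",
}

def missing_requirements(requirements=REQUIRED_MODULES):
    """Return the requirements whose import module cannot be found."""
    return [pkg for pkg, module in requirements.items()
            if importlib.util.find_spec(module) is None]

if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All requirements satisfied.")
```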

Folder Description

  1. crf-ma-datasets: Download the original crowdsourced dataset annotated by Rodrigues et al. from http://amilab.dei.uc.pt/fmpr/crf-ma-datasets.tar.gz and place it in this folder.
  2. pre_trained_bert: Download the pre-trained BERT model and place it in this folder.
  3. NER: This folder contains the processed datasets. The NER/original_data folder contains the processed crf-ma-datasets, and the NER/processed_test_data folder contains the processed test set.
  4. execution: All execution results are stored in this folder. The execution/calculations folder contains iteration-wise results.

Dataset details

  1. NER: Dataset can be found at http://amilab.dei.uc.pt/fmpr/crf-ma-datasets.tar.gz
  2. PICO: Dataset can be found at https://github.com/yinfeiy/PICO-data

System Description

All executions were run on a MacBook Pro with a 2.6 GHz 6-core Intel Core i7 processor and 16 GB of memory.

Results

| Dataset | Precision (S) | Recall (S) | F1 (S) | Precision (R) | Recall (R) | F1 (R) |
|---------|---------------|------------|--------|---------------|------------|--------|
| NER     | 83.02         | 78.69      | 80.79  | 92.64         | 92.47      | 91.63  |
| PICO    | 64.03         | 52.62      | 57.77  | 92.20         | 95.15      | 93.65  |

NOTE: S refers to strict metrics and R refers to relaxed metrics.
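The reported numbers come from the conlleval evaluation script referenced below; the README itself does not spell out the metric definitions. As an illustration only, the following sketch uses one common convention: a strict match requires exact span boundaries and entity type, while a relaxed match accepts any overlapping span of the same type. The function name and span encoding `(start, end, type)` are assumptions for this example, not the repository's API.

```python
def span_f1(pred_spans, gold_spans, strict=True):
    """Precision/recall/F1 over labeled spans encoded as (start, end, type).

    strict=True : a prediction counts only on an exact boundary + type match.
    strict=False: a prediction counts if it overlaps a gold span of the
                  same type (one common "relaxed" convention).
    """
    def matches(p, g):
        if strict:
            return p == g
        # Same type and half-open intervals [start, end) overlap.
        return p[2] == g[2] and p[0] < g[1] and g[0] < p[1]

    tp_pred = sum(any(matches(p, g) for g in gold_spans) for p in pred_spans)
    tp_gold = sum(any(matches(p, g) for p in pred_spans) for g in gold_spans)
    precision = tp_pred / len(pred_spans) if pred_spans else 0.0
    recall = tp_gold / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, a prediction `(5, 6, "LOC")` against a gold span `(5, 7, "LOC")` misses under the strict setting but counts under the relaxed one, which is why relaxed scores are uniformly higher in the table above.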

Results reproduction commands

  1. Original data preprocessing: `python data_preprocessing.py`
  2. Execution data preprocessing: `python execution_data_preprocessing.py`
  3. Execution: `python execution.py`
  4. Result calculation: `python calculations.py`
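The four steps above can be chained into a single driver script. A minimal sketch, assuming the script names above sit in the repository root and must run in the listed order:

```python
import subprocess
import sys

# The four reproduction steps from the README, in order.
PIPELINE = [
    "data_preprocessing.py",            # 1. original data preprocessing
    "execution_data_preprocessing.py",  # 2. execution data preprocessing
    "execution.py",                     # 3. execution
    "calculations.py",                  # 4. result calculation
]

def run_pipeline(scripts=PIPELINE):
    """Run each step with the current interpreter, stopping on failure."""
    for script in scripts:
        print(f"Running {script} ...")
        subprocess.run([sys.executable, script], check=True)

if __name__ == "__main__":
    run_pipeline()
```

Using `check=True` makes the driver abort if any step exits with a non-zero status, so a failed preprocessing run cannot silently feed into the execution step.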

NOTE 1: The execution outputs are stored in the execution folder, and the result calculations are stored in the execution/calculations folder. We use the conlleval evaluation script from http://amilab.dei.uc.pt/fmpr/ma-crf.tar.gz.

NOTE 2: The PICO results can be reproduced by running the same scripts on the PICO dataset.

NOTE 3: We provide the execution results for each iteration in the execution/calculations folder.

References

  1. CRF-MA: @article{rodrigues2014sequence, title={Sequence labeling with multiple annotators}, author={Rodrigues, Filipe and Pereira, Francisco and Ribeiro, Bernardete}, journal={Machine learning}, volume={95}, number={2}, pages={165--181}, year={2014}, publisher={Springer} }
  2. AggSLC: @inproceedings{9679072, author={Sabetpour, Nasim and Kulkarni, Adithya and Xie, Sihong and Li, Qi}, booktitle={2021 IEEE International Conference on Data Mining (ICDM)}, title={Truth Discovery in Sequence Labels from Crowds}, year={2021}, pages={539--548}, doi={10.1109/ICDM51629.2021.00065} }
  3. DL-CL: @inproceedings{rodrigues2018deep, title={Deep learning from crowds}, author={Rodrigues, Filipe and Pereira, Francisco C}, booktitle={Thirty-Second AAAI Conference on Artificial Intelligence}, year={2018} }
  4. BERT (pre-trained): @inproceedings{devlin2018bert, title={{BERT}: Pre-training of deep bidirectional transformers for language understanding}, author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina}, booktitle={Proceedings of NAACL-HLT 2019}, publisher={Association for Computational Linguistics}, year={2019}, pages={4171--4186} }
  5. OPTSLA: @inproceedings{sabetpour-etal-2020-optsla, title = "{O}pt{SLA}: an Optimization-Based Approach for Sequential Label Aggregation", author = "Sabetpour, Nasim and Kulkarni, Adithya and Li, Qi", booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020", month = nov, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.findings-emnlp.119", doi = "10.18653/v1/2020.findings-emnlp.119", pages = "1335--1340"}

About

This repository contains the implementation code for the paper "Truth Discovery in Sequence Labels from Crowds", accepted at IEEE ICDM 2021.
