NasimISU/Truth-Discovery-in-Sequence-Labels-from-Crowds

This is the repository for the paper: Truth Discovery in Sequence Labels from Crowds

Requirements

  • tensorflow
  • keras
  • numpy
  • scikit-learn (imported as sklearn)
  • shutil (part of the Python standard library; no separate installation required)
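Before running the scripts, it can help to verify that the dependencies resolve. A minimal sketch (not part of the repository) that checks each requirement under its import name, assuming the package-to-module mapping above:

```python
import importlib.util

# Map each requirement to the module name it is imported under.
# Note: sklearn is installed via the "scikit-learn" package, and
# shutil ships with the Python standard library.
REQUIRED_MODULES = {
    "tensorflow": "tensorflow",
    "keras": "keras",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "shutil (stdlib)": "shutil",
}

def missing_requirements(requirements=REQUIRED_MODULES):
    """Return the requirements whose import module cannot be found."""
    return [pkg for pkg, module in requirements.items()
            if importlib.util.find_spec(module) is None]

if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All requirements satisfied.")
```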

Folder Description

  1. crf-ma-datasets: Download the original crowdsourced dataset annotated by Rodrigues et al. from http://amilab.dei.uc.pt/fmpr/crf-ma-datasets.tar.gz and place it in this folder.
  2. pre_trained_bert: Download the pre-trained BERT model and place it in this folder.
  3. NER: This folder contains the processed datasets. The NER/original_data folder contains the processed crf-ma-datasets, and the NER/processed_test_data folder contains the processed test set.
  4. execution: All execution results are stored in this folder. The execution/calculations folder contains iteration-wise results.

Dataset details

  1. NER: Dataset can be found at http://amilab.dei.uc.pt/fmpr/crf-ma-datasets.tar.gz
  2. PICO: Dataset can be found at https://github.com/yinfeiy/PICO-data

System Description

All executions were run on a MacBook Pro with a 2.6 GHz 6-core Intel Core i7 processor and 16 GB of memory.

Results

| Dataset | Precision (S) | Recall (S) | F1 (S) | Precision (R) | Recall (R) | F1 (R) |
|---------|---------------|------------|--------|---------------|------------|--------|
| NER     | 83.02         | 78.69      | 80.79  | 92.64         | 92.47      | 91.63  |
| PICO    | 64.03         | 52.62      | 57.77  | 92.20         | 95.15      | 93.65  |

NOTE: S refers to strict metrics and R refers to relaxed metrics.
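The reported numbers come from the conlleval evaluation script referenced below; the README itself does not spell out the metric definitions. As an illustration only, the following sketch uses one common convention: a strict match requires exact span boundaries and entity type, while a relaxed match accepts any overlapping span of the same type. The function name and span encoding `(start, end, type)` are assumptions for this example, not the repository's API.

```python
def span_f1(pred_spans, gold_spans, strict=True):
    """Precision/recall/F1 over labeled spans encoded as (start, end, type).

    strict=True : a prediction counts only on an exact boundary + type match.
    strict=False: a prediction counts if it overlaps a gold span of the
                  same type (one common "relaxed" convention).
    """
    def matches(p, g):
        if strict:
            return p == g
        # Same type and half-open intervals [start, end) overlap.
        return p[2] == g[2] and p[0] < g[1] and g[0] < p[1]

    tp_pred = sum(any(matches(p, g) for g in gold_spans) for p in pred_spans)
    tp_gold = sum(any(matches(p, g) for p in pred_spans) for g in gold_spans)
    precision = tp_pred / len(pred_spans) if pred_spans else 0.0
    recall = tp_gold / len(gold_spans) if gold_spans else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, a prediction `(5, 6, "LOC")` against a gold span `(5, 7, "LOC")` misses under the strict setting but counts under the relaxed one, which is why relaxed scores are uniformly higher in the table above.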

Results reproduction commands

  1. Original data preprocessing: `python data_preprocessing.py`
  2. Execution data preprocessing: `python execution_data_preprocessing.py`
  3. Execution: `python execution.py`
  4. Result calculation: `python calculations.py`
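The four steps above can be chained into a single driver script. A minimal sketch, assuming the script names above sit in the repository root and must run in the listed order:

```python
import subprocess
import sys

# The four reproduction steps from the README, in order.
PIPELINE = [
    "data_preprocessing.py",            # 1. original data preprocessing
    "execution_data_preprocessing.py",  # 2. execution data preprocessing
    "execution.py",                     # 3. execution
    "calculations.py",                  # 4. result calculation
]

def run_pipeline(scripts=PIPELINE):
    """Run each step with the current interpreter, stopping on failure."""
    for script in scripts:
        print(f"Running {script} ...")
        subprocess.run([sys.executable, script], check=True)

if __name__ == "__main__":
    run_pipeline()
```

Using `check=True` makes the driver abort if any step exits with a non-zero status, so a failed preprocessing run cannot silently feed into the execution step.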

NOTE 1: The execution outputs are stored in the execution folder, and the result calculations are stored in the execution/calculations folder. We use the conlleval evaluation script from http://amilab.dei.uc.pt/fmpr/ma-crf.tar.gz.

NOTE 2: The PICO results can be reproduced by running the same scripts on the PICO dataset.

NOTE 3: We provide the execution results for each iteration in the execution/calculations folder.

References

  1. CRF-MA: @article{rodrigues2014sequence, title={Sequence labeling with multiple annotators}, author={Rodrigues, Filipe and Pereira, Francisco and Ribeiro, Bernardete}, journal={Machine learning}, volume={95}, number={2}, pages={165--181}, year={2014}, publisher={Springer} }
  2. AggSLC: @inproceedings{9679072, author={Sabetpour, Nasim and Kulkarni, Adithya and Xie, Sihong and Li, Qi}, booktitle={2021 IEEE International Conference on Data Mining (ICDM)}, title={Truth Discovery in Sequence Labels from Crowds}, year={2021}, pages={539--548}, doi={10.1109/ICDM51629.2021.00065} }
  3. DL-CL: @inproceedings{rodrigues2018deep, title={Deep learning from crowds}, author={Rodrigues, Filipe and Pereira, Francisco C}, booktitle={Thirty-Second AAAI Conference on Artificial Intelligence}, year={2018} }
  4. BERT (pre-trained): @inproceedings{devlin2018bert, title={{BERT}: Pre-training of deep bidirectional transformers for language understanding}, author={Devlin, Jacob and Chang, Ming-Wei and Lee, Kenton and Toutanova, Kristina}, booktitle={Proceedings of NAACL-HLT 2019}, publisher={Association for Computational Linguistics}, year={2019}, pages={4171--4186} }
  5. OPTSLA: @inproceedings{sabetpour-etal-2020-optsla, title = "{O}pt{SLA}: an Optimization-Based Approach for Sequential Label Aggregation", author = "Sabetpour, Nasim and Kulkarni, Adithya and Li, Qi", booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020", month = nov, year = "2020", address = "Online", publisher = "Association for Computational Linguistics", url = "https://www.aclweb.org/anthology/2020.findings-emnlp.119", doi = "10.18653/v1/2020.findings-emnlp.119", pages = "1335--1340"}

About

This repository contains the implementation code for the paper "Truth Discovery in Sequence Labels from Crowds", accepted at IEEE ICDM 2021.
