BOND

This repo contains our code and pre-processed distantly/weakly labeled data for the paper BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision (KDD 2020).


Benchmark

The results (entity-level F1 score) are summarized as follows:

Method	CoNLL03	Tweet	OntoNotes 5.0	Webpage	Wikigold
Full Supervision	91.21	52.19	86.20	72.39	86.43
Previous SOTA	76.00	26.10	67.69	51.39	47.54
BOND	81.48	48.01	68.35	65.74	60.07

Full Supervision: RoBERTa fine-tuning / BiLSTM-CRF
Previous SOTA: BiLSTM-CRF/AutoNER/LR-CRF/KALM/CONNET
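
Entity-level F1 treats each entity mention as one unit: a prediction counts as correct only when its boundaries and its type both exactly match a gold entity. A minimal sketch of the metric follows, assuming BIO-tagged sequences and the exact-match convention of the CoNLL evaluation script; `bio_to_spans` and `entity_f1` are illustrative names, not the evaluation code shipped with this repo.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect typed entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start, ent_type = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last open span
        if tag.startswith("I-") and tag[2:] == ent_type:
            continue  # same entity keeps going
        if start is not None:
            spans.add((ent_type, start, i))
        start, ent_type = None, None
        if tag != "O":
            # "B-X" opens a span; a stray "I-X" with no open span of type X is treated the same way.
            start, ent_type = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1 over exactly matching (type, start, end) spans."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        g_spans, p_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(g_spans & p_spans)
        n_gold += len(g_spans)
        n_pred += len(p_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]
print(entity_f1(gold, pred))  # 0.5
```

The PER span matches but the LOC/ORG type disagrees, so precision and recall are both 1/2 and F1 is 0.5; token-level accuracy on the same example would be 3/4, which is why entity-level scores are the stricter measure.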

Data

We release five open-domain distantly/weakly labeled NER datasets in the dataset folder. For gazetteer information and the distant-label generation code, please email cliang73@gatech.edu directly.
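
The gazetteers and label-generation code are available only on request, so the following is a hypothetical illustration of how distant labels can be produced by dictionary matching, not the authors' pipeline: every token span found in a gazetteer receives that entry's type, and longer matches win. The function name `distant_label`, the gazetteer format (lowercased token tuples mapped to an entity type) and the `max_len` cap are all assumptions.

```python
from typing import Dict, List, Tuple


def distant_label(tokens: List[str],
                  gazetteer: Dict[Tuple[str, ...], str],
                  max_len: int = 4) -> List[str]:
    """Hypothetical helper: BIO-tag tokens by greedy longest-match gazetteer lookup."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so "New York City" beats "New York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in gazetteer:
                match = (n, gazetteer[key])
                break
        if match is None:
            i += 1  # no entry starts here; leave the token as "O"
            continue
        n, ent_type = match
        tags[i] = "B-" + ent_type
        for j in range(i + 1, i + n):
            tags[j] = "I-" + ent_type
        i += n  # skip past the matched span
    return tags


gazetteer = {("new", "york"): "LOC", ("new", "york", "city"): "LOC", ("obama",): "PER"}
tokens = ["Obama", "visited", "New", "York", "City", "today"]
print(distant_label(tokens, gazetteer))
# ['B-PER', 'O', 'B-LOC', 'I-LOC', 'I-LOC', 'O']
```

Any entity missing from the gazetteer is silently tagged O, and ambiguous entries can assign the wrong type; this is why distant labels are both incomplete and noisy.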

Environment

Python 3.7, PyTorch 1.3, Hugging Face Transformers v2.3.0.
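
To confirm that an environment matches these versions before running the scripts, a small check like the one below may help. It is a hypothetical helper, not part of the repo: it imports `torch` and `transformers` and reads their `__version__` attributes. It treats `1.3` as matching any `1.3.x` release, which is an assumption about how strictly the versions must match. It uses `importlib.import_module` rather than `importlib.metadata` because the latter does not exist on Python 3.7.

```python
import importlib
import sys
from typing import Dict, Iterable, Optional

# Versions listed in this README, keyed by import name.
EXPECTED = {"torch": "1.3", "transformers": "2.3.0"}


def installed_versions(modules: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each module name to its __version__, or None if it cannot be imported."""
    found: Dict[str, Optional[str]] = {}
    for name in modules:
        try:
            found[name] = importlib.import_module(name).__version__
        except ImportError:
            found[name] = None
    return found


def matches(want: str, have: Optional[str]) -> bool:
    """True if `have` agrees with `want` on every component `want` specifies."""
    if have is None:
        return False
    want_parts = want.split(".")
    return have.split(".")[:len(want_parts)] == want_parts


if __name__ == "__main__":
    print(f"python: expected 3.7, found {sys.version_info[0]}.{sys.version_info[1]}")
    for name, have in installed_versions(EXPECTED).items():
        status = "ok" if matches(EXPECTED[name], have) else "MISMATCH"
        print(f"{name}: expected {EXPECTED[name]}, found {have} [{status}]")
```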

Training & Evaluation

We provide training scripts for all five open-domain distantly/weakly labeled NER datasets in the scripts folder. For example, for BOND training and evaluation on CoNLL03:

cd BOND
./scripts/conll_self_training.sh
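
In the BOND paper, the second stage is teacher-student self-training. The current model acts as a teacher that relabels the training tokens, and a student is trained on those pseudo-labels, then becomes the next teacher. One pseudo-labeling round is sketched below in pure Python. The soft-label formula (square each probability, divide by the class's total mass, renormalize, as in deep embedded clustering) and the 0.9 confidence threshold are assumptions for illustration. `soft_pseudo_labels`, `confident_tokens`, `self_train`, `teacher_predict` and `train_student` are hypothetical names, not functions in run_self_training_ner.py.

```python
from typing import Callable, List

Probs = List[List[float]]  # one row of class probabilities per token


def soft_pseudo_labels(probs: Probs) -> Probs:
    """Sharpen teacher probabilities: square each entry, divide by that class's total
    probability mass over all tokens, then renormalize each row to sum to 1."""
    n_classes = len(probs[0])
    class_mass = [sum(row[c] for row in probs) for c in range(n_classes)]
    labels = []
    for row in probs:
        scores = [row[c] ** 2 / class_mass[c] if class_mass[c] > 0 else 0.0
                  for c in range(n_classes)]
        total = sum(scores)
        labels.append([s / total for s in scores])
    return labels


def confident_tokens(probs: Probs, threshold: float = 0.9) -> List[int]:
    """Indices of tokens whose top teacher probability reaches the threshold."""
    return [i for i, row in enumerate(probs) if max(row) >= threshold]


def self_train(teacher_predict: Callable[[], Probs],
               train_student: Callable[[Probs, List[int]], Callable[[], Probs]],
               n_rounds: int = 3, threshold: float = 0.9) -> Callable[[], Probs]:
    """Hypothetical outer loop: each trained student becomes the next round's teacher."""
    predict = teacher_predict
    for _ in range(n_rounds):
        probs = predict()
        targets = soft_pseudo_labels(probs)
        keep = confident_tokens(probs, threshold)
        predict = train_student(targets, keep)
    return predict


print(soft_pseudo_labels([[0.9, 0.1], [0.6, 0.4]]))
```

In this example the first row becomes sharper (class 0 rises above 0.96), while the second row flips toward class 1: dividing by the class mass boosts the rarer class. Whether that re-balancing helps depends on the label distribution, which is one reason a sketch like this should not be read as the script's exact behavior.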

For Stage I training and evaluation on CoNLL03:

cd BOND
./scripts/conll_baseline.sh
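
Stage I fine-tunes the pre-trained language model on the distant labels alone. Because those labels are incomplete and noisy, the BOND paper stops this stage early instead of training to convergence. A generic patience-based early-stopping loop is sketched below. `train_one_epoch`, `evaluate`, `max_epochs` and `patience` are hypothetical names, not options of conll_baseline.sh, and the patience rule itself is an assumed stopping criterion.

```python
from typing import Callable, Tuple


def train_with_early_stopping(
    train_one_epoch: Callable[[], None],
    evaluate: Callable[[], float],
    max_epochs: int = 50,
    patience: int = 3,
) -> Tuple[int, float]:
    """Train until the dev score fails to improve for `patience` consecutive epochs.

    Returns (best_epoch, best_score); epochs are counted from 0.
    """
    best_epoch, best_score = -1, float("-inf")
    epochs_without_gain = 0
    for epoch in range(max_epochs):
        train_one_epoch()
        score = evaluate()  # e.g. entity-level F1 on a held-out development set
        if score > best_score:
            best_epoch, best_score = epoch, score
            epochs_without_gain = 0
            # A real run would save a model checkpoint here.
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break
    return best_epoch, best_score
```

With dev scores 0.50, 0.60, 0.55, 0.58, 0.57, ... and `patience=3`, training stops after the fifth epoch and the checkpoint from epoch 1 (score 0.60) is kept, even if later epochs would have fit the noisy labels more closely.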

The test results (entity-level F1 score) of Stage I alone and of the full BOND pipeline are summarized as follows:

Method	CoNLL03	Tweet	OntoNotes 5.0	Webpage	Wikigold
Stage I	75.61	46.61	68.11	59.11	52.15
BOND	81.48	48.01	68.35	65.74	60.07
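
The difference between the two rows is what self-training adds on top of Stage I. The arithmetic is spelled out below using the scores copied from the table; the variable names are only for illustration.

```python
# Entity-level F1 scores copied from the table above.
DATASETS = ["CoNLL03", "Tweet", "OntoNotes 5.0", "Webpage", "Wikigold"]
STAGE_I = [75.61, 46.61, 68.11, 59.11, 52.15]
BOND = [81.48, 48.01, 68.35, 65.74, 60.07]

# Rounding to 2 decimals removes floating-point noise from the subtraction.
gains = {name: round(b - s, 2) for name, s, b in zip(DATASETS, STAGE_I, BOND)}
for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} F1")
# CoNLL03: +5.87 F1
# Tweet: +1.40 F1
# OntoNotes 5.0: +0.24 F1
# Webpage: +6.63 F1
# Wikigold: +7.92 F1
```

The gain varies widely across datasets: Wikigold improves the most (+7.92 F1) and OntoNotes 5.0 the least (+0.24 F1).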

Citation

Please cite the following paper if you are using our datasets/tool. Thanks!

@inproceedings{liang2020bond,
  title={BOND: Bert-Assisted Open-Domain Named Entity Recognition with Distant Supervision},
  author={Liang, Chen and Yu, Yue and Jiang, Haoming and Er, Siawpeng and Wang, Ruijia and Zhao, Tuo and Zhang, Chao},
  booktitle={ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year={2020}
}

