Revisiting Unsupervised Relation Extraction

Source code for the ACL 2020 paper "Revisiting Unsupervised Relation Extraction".

Environment

pip3 install -r requirements.txt

The experiments were conducted on NVIDIA V100 GPUs (16 GB of GPU memory). However, the models are small, so they should run on most GPUs.

Datasets

NYT: contact Diego Marcheggiani
TACRED: TACRED
Input format: same as the sample

Both NYT and TACRED are pre-processed (tokenisation, entity typing).
We use Stanford CoreNLP to get dependency features for TACRED.
The entity types in NYT are a subset of those in TACRED, so we map all TACRED entity types that do not appear in NYT to MISC.
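The TACRED-to-NYT entity type mapping can be sketched as below. This is a minimal illustration, not the repository's preprocessing code; the function names and the example NYT type set are hypothetical placeholders.

```python
from __future__ import annotations


def map_entity_type(tacred_type: str, nyt_types: set[str], fallback: str = "MISC") -> str:
    """Keep the type if NYT also uses it; otherwise fall back to MISC."""
    return tacred_type if tacred_type in nyt_types else fallback


def map_sentence_types(types: list[str], nyt_types: set[str]) -> list[str]:
    """Apply the mapping to every entity type in one instance."""
    return [map_entity_type(t, nyt_types) for t in types]


if __name__ == "__main__":
    # Hypothetical NYT type inventory, for illustration only.
    example_nyt_types = {"PERSON", "ORGANIZATION", "LOCATION", "MISC"}
    print(map_sentence_types(["PERSON", "CAUSE_OF_DEATH", "LOCATION"], example_nyt_types))
    # ['PERSON', 'MISC', 'LOCATION']
```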

Some vocabulary files need to be generated in advance. You can use the script:

bash ure/preprocessing/run.sh

We also provide a script for feature extraction. To generate the lexicon_file:

python ure/preprocessing/feature_extractor.py --generate_lexicon --input_file [file] --lexicon_file [file] --output_file [file] --threshold [occurrence threshold]
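The `--threshold` option suggests that only features occurring at least that many times are kept in the lexicon. Assuming that reading, lexicon generation amounts to counting and filtering, as in this hypothetical sketch. `build_lexicon` and the feature strings are illustrative; the real `feature_extractor.py` may use a different file format and id assignment.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def build_lexicon(feature_lists: Iterable[list[str]], threshold: int) -> dict[str, int]:
    """Map every feature seen at least `threshold` times to an integer id.

    Ids follow decreasing frequency (ties broken alphabetically) so the
    result is deterministic. Hypothetical helper, not the repository's code.
    """
    counts = Counter(f for feats in feature_lists for f in feats)
    kept = sorted((f for f, c in counts.items() if c >= threshold),
                  key=lambda f: (-counts[f], f))
    return {feat: idx for idx, feat in enumerate(kept)}


if __name__ == "__main__":
    # Made-up feature strings for three instances.
    data = [["trigger:born", "path:nsubj"], ["trigger:born"], ["path:nsubj", "trigger:died"]]
    print(build_lexicon(data, threshold=2))  # {'path:nsubj': 0, 'trigger:born': 1}
```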

To generate features:

python ure/preprocessing/feature_extractor.py --input_file [file] --lexicon_file [file] --output_file [file] --threshold [occurrence threshold]
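Once a lexicon exists, generating features plausibly means replacing each instance's feature strings with their lexicon ids and discarding those that were pruned by the threshold. A hypothetical sketch with made-up feature strings; the actual output format of `feature_extractor.py` is not specified here.

```python
from __future__ import annotations


def featurize(features: list[str], lexicon: dict[str, int]) -> list[int]:
    """Map feature strings to lexicon ids, dropping features absent from the lexicon."""
    return [lexicon[f] for f in features if f in lexicon]


if __name__ == "__main__":
    lexicon = {"path:nsubj": 0, "trigger:born": 1}  # hypothetical lexicon
    print(featurize(["trigger:born", "trigger:died", "path:nsubj"], lexicon))  # [1, 0]
```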

Usage

Training

EType+: the B3 score usually reaches around 41% after one epoch.

python -u -m ure.etypeplus.main --config models/etypeplus.yml

Feature (Marcheggiani and Titov): expect a B3 score of around 32-33% after one epoch.

python -u -m ure.feature.main --config models/feature.yml

PCNN (Simon et al.)

python -u -m ure.pcnn.main --config models/pcnn.yml
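To train all three models in sequence, a small wrapper around the training commands could look like this. The names `METHODS`, `build_command` and `train` are hypothetical; only the module paths and config files come from this README. The sketch uses the current interpreter (`sys.executable`) in place of `python` and assumes it is run from the repository root.

```python
from __future__ import annotations

import subprocess
import sys

# (module, config) pairs copied from the training commands in this README.
METHODS = {
    "etypeplus": ("ure.etypeplus.main", "models/etypeplus.yml"),
    "feature": ("ure.feature.main", "models/feature.yml"),
    "pcnn": ("ure.pcnn.main", "models/pcnn.yml"),
}


def build_command(method: str) -> list[str]:
    """Return the argv for `python -u -m <module> --config <config>`."""
    module, config = METHODS[method]
    return [sys.executable, "-u", "-m", module, "--config", config]


def train(method: str, dry_run: bool = False) -> list[str]:
    """Print the command and, unless dry_run is set, run it."""
    cmd = build_command(method)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if training fails
    return cmd


if __name__ == "__main__":
    for name in METHODS:
        train(name, dry_run=True)  # set dry_run=False to actually launch training
```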

Evaluation

python -u -m ure.etypeplus.main --config models/etypeplus.yml --mode test
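B3 (B-cubed) is the clustering metric quoted in the training notes. For reference, the standard definition averages, over all instances, the overlap between an instance's predicted cluster and its gold relation class. The sketch below implements that textbook definition; it is not the repository's evaluation code, which may differ in details such as which of precision, recall or F1 is reported.

```python
from __future__ import annotations

from collections import Counter


def b_cubed(pred: list[int], gold: list[str]) -> tuple[float, float, float]:
    """B-cubed precision, recall and F1 for a flat clustering."""
    if not pred or len(pred) != len(gold):
        raise ValueError("pred and gold must be non-empty and of equal length")
    overlap = Counter(zip(pred, gold))  # |C(i) ∩ L(i)| per (cluster, label) pair
    cluster_size = Counter(pred)        # |C(i)|
    label_size = Counter(gold)          # |L(i)|
    n = len(pred)
    precision = sum(overlap[(c, g)] / cluster_size[c] for c, g in zip(pred, gold)) / n
    recall = sum(overlap[(c, g)] / label_size[g] for c, g in zip(pred, gold)) / n
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f = b_cubed([0, 0, 1, 1], ["a", "a", "a", "b"])
    print(round(p, 3), round(r, 3), round(f, 3))  # 0.75 0.667 0.706
```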

Reproducibility & Bug Fixes & FAQ

L_s coefficient: rel_dist.py is now shared by all three methods, and in it loss_s is scaled down by [B x k_samples]. Hence, the coefficient of L_s for EType+ is set to 0.01 instead of the 0.0001 given in the paper (line 91 in /ure/rel_dist.py).
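Put differently, once loss_s is divided by B x k_samples, its coefficient must grow by the same factor to keep the same effective weight relative to the other loss terms. The sketch below shows that relationship with hypothetical helper names and illustrative values of B and k_samples; whether 0.01 reproduces 0.0001 exactly depends on the batch size and k_samples in the configs, which this README does not specify.

```python
from __future__ import annotations

import math


def summed_term(losses: list[float], coef: float) -> float:
    """coef * sum of per-sample losses, with no normalisation."""
    return coef * sum(losses)


def normalised_term(losses: list[float], coef: float, batch_size: int, k_samples: int) -> float:
    """coef * sum of per-sample losses, scaled down by B * k_samples."""
    return coef * sum(losses) / (batch_size * k_samples)


def rescaled_coef(summed_coef: float, batch_size: int, k_samples: int) -> float:
    """Coefficient that makes normalised_term equal to summed_term."""
    return summed_coef * batch_size * k_samples


if __name__ == "__main__":
    batch_size, k_samples = 4, 5  # illustrative values, not the repository defaults
    losses = [0.5] * (batch_size * k_samples)
    a = summed_term(losses, 0.001)
    b = normalised_term(losses, rescaled_coef(0.001, batch_size, k_samples), batch_size, k_samples)
    print(math.isclose(a, b))  # True
```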

Entity type dimension in Table 4 (b, c) of the appendix: this is a mistake. It should be the entity dimension in the link predictor, and we use the same dimension of 10 for all methods. (There are no entity types in PCNN.)
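For orientation, here is where that entity dimension enters. Assuming the link predictor follows Marcheggiani and Titov's combination of a RESCAL bilinear term and selectional-preference terms, scoring an (e1, r, e2) triple and predicting the second argument looks roughly like the pure-Python sketch below. Function names, initialisation and the full softmax over candidates (rather than any sampling approximation) are illustrative choices, not the repository's code.

```python
from __future__ import annotations

import math
import random

ENTITY_DIM = 10  # entity dimension in the link predictor, as stated above


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def bilinear(u: list[float], c: list[list[float]], v: list[float]) -> float:
    """Compute u^T C v."""
    return sum(u[i] * c[i][j] * v[j] for i in range(len(u)) for j in range(len(v)))


def score(e1: list[float], e2: list[float], relation) -> float:
    """psi(e1, e2, r): RESCAL term plus one selectional-preference term per argument."""
    c_r, w_r1, w_r2 = relation
    return bilinear(e1, c_r, e2) + dot(e1, w_r1) + dot(e2, w_r2)


def second_argument_probs(e1: list[float], candidates: list[list[float]], relation) -> list[float]:
    """Softmax over candidates: p(e2 | e1, r) proportional to exp(psi(e1, e2, r))."""
    scores = [score(e1, e2, relation) for e2 in candidates]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def random_vector(rng: random.Random, dim: int = ENTITY_DIM) -> list[float]:
    return [rng.gauss(0.0, 0.1) for _ in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    entities = [random_vector(rng) for _ in range(5)]
    # Hypothetical relation parameters: (C_r, w_r1, w_r2).
    relation = ([random_vector(rng) for _ in range(ENTITY_DIM)], random_vector(rng), random_vector(rng))
    print([round(p, 3) for p in second_argument_probs(entities[0], entities, relation)])
```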

Typos in the paper: in the second paragraph of Appendix A, the number of relation labels in NYT-FB should be 262 (the paper says 253). The same applies to the caption of Figure 2a: NYT-FB has 262 relation types in total. The last x-axis label of Figure 2a should be "each of the rest 249 relation types".

Citation

If you use this code, please cite the following paper =)

@inproceedings{tran-etal-2020-revisiting,
    title = "Revisiting Unsupervised Relation Extraction",
    author = "Tran, Thy Thy  and
      Le, Phong  and
      Ananiadou, Sophia",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.669",
    pages = "7498--7505"
}
