Relation Extraction with Question-Answer Pairs (ReQuest)

Source code and data for the WSDM'18 paper Indirect Supervision for Relation Extraction Using Question-Answer Pairs.

Performance

Performance comparison with several relation extraction systems on the KBP 2013 dataset (sentence-level extraction).

Method                                          | Precision | Recall | F1
Mintz (our implementation, Mintz et al., 2009)  | 0.296     | 0.387  | 0.335
LINE + Dist Sup (Tang et al., 2015)             | 0.360     | 0.257  | 0.299
MultiR (Hoffmann et al., 2011)                  | 0.325     | 0.278  | 0.301
FCM + Dist Sup (Gormley et al., 2015)           | 0.151     | 0.498  | 0.300
CoType-RM (Ren et al., 2017)                    | 0.342     | 0.339  | 0.340
ReQuest (our model, Wu et al., 2018)            | 0.386     | 0.410  | 0.397

Dependencies

The following instructions assume Ubuntu.

  • python 2.7
  • Python library dependencies
$ pip install pexpect ujson tqdm
$ cd code/DataProcessor/
$ git clone git@github.com:stanfordnlp/stanza.git
$ cd stanza
$ pip install -e .
$ wget http://nlp.stanford.edu/software/stanford-corenlp-full-2016-10-31.zip
$ unzip stanford-corenlp-full-2016-10-31.zip

Data

We processed two public RE datasets into our JSON format using our data pipeline. We ran Stanford NER on the training sets to detect entity mentions, and performed distant supervision using DBpedia Spotlight to assign type labels:

  • NYT (Riedel et al., 2011): 1.18M sentences sampled from 294K New York Times news articles. 395 sentences are manually annotated with 24 relation types and 47 entity types. (Download JSON)
  • Wiki-KBP: the training corpus contains 1.5M sentences sampled from 780k Wikipedia articles (Ling & Weld, 2012) plus ~7,000 sentences from the 2013 KBP corpus. The test data consists of 14k manually labeled sentences from the 2013 KBP slot filling assessment results. It has 13 relation types and 126 entity types after filtering out relations over numeric values. (Download JSON)

Please put the data files in the corresponding subdirectories under ReQuest/data/source.

We use the answer sentence selection dataset from TREC QA as our source of indirect supervision. We ran Stanford NER to extract entity mentions from both question and answer sentences, and processed the dataset into a JSON format containing QA pairs. Details of how we construct QA pairs can be found in our paper.

We provide the processed qa.json file; it should be placed in each data folder under ReQuest/data/source.
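
As a quick sanity check after downloading, the JSON files can be inspected with a few lines of Python. This is only a minimal sketch: it assumes a JSON-lines layout (one object per line), and the path and the field names mentioned in the comments are illustrative, not guaranteed by the pipeline.

import json

# Hypothetical path; point this at the file you downloaded,
# e.g. ReQuest/data/source/KBP/train.json or qa.json.
path = "data/source/KBP/train.json"

with open(path) as f:
    for i, line in enumerate(f):
        record = json.loads(line)        # one JSON object per line (assumed layout)
        print(sorted(record.keys()))     # e.g. sentence text, entity mentions, relation mentions (names may differ)
        if i == 2:                       # only peek at the first few records
            break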

Makefile

To compile request.cpp under your own g++ environment:

$ cd ReQuest/code/Model/request; make

Default Run & Parameters

Run ReQuest for the task of relation extraction on the Wiki-KBP dataset as follows.

Start the Stanford CoreNLP server for the Python wrapper.

$ java -mx4g -cp "code/DataProcessor/stanford-corenlp-full-2016-10-31/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer
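
To verify that the server is up before running the pipeline, you can send it a test request over its HTTP API. This is a minimal sketch: it assumes the server's default port 9000, and the requests library it uses is not among the dependencies listed above.

import json
import requests

# Ask the CoreNLP server to tokenize and NER-tag one test sentence.
props = {"annotators": "tokenize,ssplit,ner", "outputFormat": "json"}
resp = requests.post("http://localhost:9000",
                     params={"properties": json.dumps(props)},
                     data="Barack Obama was born in Hawaii.")
doc = resp.json()
for sentence in doc["sentences"]:
    for token in sentence["tokens"]:
        print("%s\t%s" % (token["word"], token["ner"]))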

Feature extraction, embedding learning on training data, and evaluation on test data.

$ ./run_kbp.sh  

The hyperparameters for embedding learning are included in the run_{dataname}.sh script.

Evaluation

Evaluate relation extraction performance (precision, recall, F1): produce predictions along with their confidence scores, then filter the predicted instances by tuning the threshold.

$ python code/Evaluation/emb_test.py extract KBP request cosine 0.0
$ python code/Evaluation/tune_threshold.py extract KBP emb request cosine
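
The threshold-tuning step filters out predictions whose confidence falls below a cutoff and scores what remains. The sketch below only illustrates that idea; it is not the repository's evaluation code, and the prediction format and field names are hypothetical.

import json

def precision_recall_f1(predictions, gold, threshold):
    # Keep only predictions whose confidence clears the threshold.
    kept = {(p["entity_pair"], p["relation"]) for p in predictions if p["score"] >= threshold}
    gold_set = {(g["entity_pair"], g["relation"]) for g in gold}
    if not kept or not gold_set:
        return 0.0, 0.0, 0.0
    tp = len(kept & gold_set)
    precision = tp / float(len(kept))
    recall = tp / float(len(gold_set))
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1

# Hypothetical usage: sweep cutoffs and keep the one with the best F1.
# predictions = [{"entity_pair": ("e1", "e2"), "relation": "r", "score": 0.9}, ...]
# gold        = [{"entity_pair": ("e1", "e2"), "relation": "r"}, ...]
# best = max(((t,) + precision_recall_f1(predictions, gold, t)
#             for t in [i / 20.0 for i in range(20)]),
#            key=lambda row: row[3])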