Corpus-based Set Expansion (CaSE)

This repo contains the experiment scripts for the SIGIR '19 short paper, "Corpus-based Set Expansion with Lexical Features and Distributed Representations".

To reproduce results similar to those of the CaSE model, run the whole pipeline. If you are only interested in the algorithm itself, see 09-empirical.py.
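The scripts in this repo carry numeric prefixes (01- through 09-), which suggests they are meant to run in that order. The following is a minimal sketch for listing them in sequence. The helper name `numbered_scripts` is hypothetical and not part of this repo. It is also an assumption that each script is launched with plain `python`; this README does not say whether a script needs arguments or edited paths, so check each one before running it.

```python
import re
from pathlib import Path


# Hypothetical helper, not part of this repo.
def numbered_scripts(repo_dir="."):
    """Return the NN-*.py scripts found in repo_dir, sorted by numeric prefix."""
    pattern = re.compile(r"^(\d+)-.+\.py$")
    found = []
    for path in Path(repo_dir).iterdir():
        match = pattern.match(path.name)
        if match and path.is_file():
            found.append((int(match.group(1)), path.name))
    return [name for _, name in sorted(found)]


if __name__ == "__main__":
    for name in numbered_scripts():
        # Only print the commands; nothing is executed here.
        print(f"python {name}")
```

Printing the commands rather than executing them keeps the sketch safe to run before the data is in place.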

data preparation

Before running the pipeline, pre-process the raw corpora the same way SetExpan does. Place the corpus data in the following structure:

data
|
|------ dataset1 
|          |------ source
|          |------ intermediate
|------ dataset2
|          |------ source
|          |------ intermediate
...
|------ eval

Each intermediate folder should contain the output files of SetExpan's preprocessing module, as well as the intermediate data files that this pipeline generates.
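The empty folder layout above can be created with a few lines of Python. This is a minimal sketch: the function name `make_layout` is hypothetical, and `dataset1` / `dataset2` are only the placeholder names from the tree. It creates empty folders; the raw corpora and SetExpan's preprocessing output still have to be put in place separately.

```python
from pathlib import Path


# Hypothetical helper, not part of this repo.
def make_layout(root, datasets):
    """Create data/<dataset>/{source,intermediate} and data/eval under root."""
    data = Path(root) / "data"
    for name in datasets:
        for sub in ("source", "intermediate"):
            # exist_ok=True makes the call safe to repeat on an existing tree.
            (data / name / sub).mkdir(parents=True, exist_ok=True)
    (data / "eval").mkdir(parents=True, exist_ok=True)
    return data
```

For example, `make_layout(".", ["dataset1", "dataset2"])` run from the repo root would create the folders shown in the tree without touching any files already present.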

