Skip to content

Towards Open Domain Topic Classification

License

Notifications You must be signed in to change notification settings

CogComp/ZeroShotWiki

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

19 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Code and data for NAACL 2022 Demo: Towards Open Domain Topic Classification

Requirements

To install required Python packages, run

pip install -r requirement.txt

We recommend to create a new conda environment of Python 3.7 to avoid potential incompatibility.

Using our models

To download datasets/models on this page, please right-click on the links to "Save link as ..."

Users can download our pre-trained ESA-WikiCate and TE-WikiCate models (model.zip), and evaluation data (data.zip). To evaluate ESA_WikiCate, for example, run

python src/eval_ESA.py --data_path data/yahoo/test.txt --label_path data/yahoo/label_names.txt --esa_path model/ESA_WikiCate --n_jobs N_JOBS

To evaluate TE-WikiCate, for example, run

CUDA_VISIBLE_DEVICES=xxx python src/eval_TE.py --data_path data/yahoo/test.txt --label_path data/yahoo/label_names.txt --model_path model/TE_WikiCate/ --simple_hypo

For the multi-labeled Situation dataset, add --multi_label to the above two commands.

Training

We provide the data for training TE-WikiCate (wikipedia.zip), which is about 12GB. To train your own model, extract the zip file under data directory and run

CUDA_VISIBLE_DEVICES=xxx python src/train_wiki_entailment.py --data_file data/wikipedia/tokenized_wiki.txt --cate_file data/wikipedia/page2content_cate.json --save_dir SAVE_DIR

About

Towards Open Domain Topic Classification

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages