THU-KEG/MAVEN-dataset

MAVEN-dataset

Source code and dataset for the EMNLP 2020 paper "MAVEN: A Massive General Domain Event Detection Dataset".

Data

The dataset (ver. 1.0) can be obtained from Tsinghua Cloud or Google Drive. The data format is described in this document.

We also release the document topics for data analysis and model development. The docid2topic.json file maps document IDs to their EventWiki topic labels.
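
As a minimal sketch of how the topic file might be used, the Python snippet below groups document IDs by topic. It assumes docid2topic.json holds a single flat JSON object of the form {document id: topic label}; that layout is an assumption, so check the released file before relying on it. The helper names load_docid2topic and group_docs_by_topic, and the sample IDs and labels, are hypothetical and not part of the released code.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_docid2topic(path):
    """Read the topic file.

    Assumption: the file is one flat JSON object {doc_id: topic_label}.
    """
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def group_docs_by_topic(docid2topic):
    """Invert {doc_id: topic} into {topic: [doc_id, ...]} with sorted ids."""
    topic2docids = defaultdict(list)
    for doc_id, topic in docid2topic.items():
        topic2docids[topic].append(doc_id)
    return {topic: sorted(ids) for topic, ids in topic2docids.items()}


# Made-up ids and labels, for illustration only (not taken from the dataset).
sample = {
    "doc-a": "Military conflict",
    "doc-b": "Earthquake",
    "doc-c": "Military conflict",
}
grouped = group_docs_by_topic(sample)
print(grouped["Military conflict"])
```

With the real file, `group_docs_by_topic(load_docid2topic("docid2topic.json"))` would give a per-topic list of document IDs, which could be used, for example, to inspect how topics are distributed across a split.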

CodaLab

To obtain test results, submit your predictions to our permanent CodaLab competition (the older version will be phased out soon). For the evaluation method, refer to the evaluation script.

Code

We release the source code for the baselines: DMCNN, BiLSTM, BiLSTM+CRF, MOGANED, and DMBERT.

Citation

If the data and code help you, please cite this paper.

@inproceedings{wang2020MAVEN,
  title={{MAVEN}: A Massive General Domain Event Detection Dataset},
  author={Wang, Xiaozhi and Wang, Ziqi and Han, Xu and Jiang, Wangyi and Han, Rong and Liu, Zhiyuan and Li, Juanzi and Li, Peng and Lin, Yankai and Zhou, Jie},
  booktitle={Proceedings of EMNLP 2020},
  year={2020}
}
