MUSIED

Dataset and baselines for the paper "MUSIED: A Benchmark for Event Detection from Multi-Source Heterogeneous Informal Texts".

Data

The dataset is available in the “data” folder. The data format is described in this document.

Data preprocessing

Run preprocessing.py to generate the sentence-level input for the models. The results are saved in the data directory:

├── data
│     ├── train_sentence.json
│     ├── dev_sentence.json
│     └── test_sentence.json
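
As a quick sanity check after preprocessing, the three output files can be loaded and counted. The following is a minimal sketch, not part of the released code: the helper names `load_split` and `load_all` are hypothetical, and because the record layout is not specified here, the loader only assumes that each file is either a single JSON document or one JSON object per line.

```python
import json
from pathlib import Path


def load_split(path):
    """Load one split file (hypothetical helper, not part of the repository).

    Assumption: the file is either one JSON document or JSON Lines.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        # Case 1: the whole file is a single JSON value (e.g. a list of sentences).
        return json.loads(text)
    except json.JSONDecodeError:
        # Case 2: one JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_all(data_dir="data"):
    """Return {split_name: content} for the split files that exist in data_dir."""
    splits = {}
    for name in ("train", "dev", "test"):
        path = Path(data_dir) / f"{name}_sentence.json"
        if path.exists():
            splits[name] = load_split(path)
    return splits


if __name__ == "__main__":
    # Print the container type and size of each split that was found.
    for name, records in load_all().items():
        print(name, type(records).__name__, len(records))
```

If a split is missing from the output, preprocessing.py most likely has not been run yet or wrote its results to a different directory.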

Code

We release the source code for the baselines, including

sentence-level models:

--DMCNN

--BiLSTM

--BERT

--C-BiLSTM

--DMBERT

document-level models:

--HBTNGMA

--MLBiNet