Investigating Self-Attention Network for Chinese Word Segmentation

This repository investigates self-attention networks for Chinese word segmentation.

Models and results are described in our paper, Investigating Self-Attention Network for Chinese Word Segmentation.

Requirements:

Python: 3.6.2  
PyTorch: 1.0.1 

Input format:

CoNLL format (the BMES tag scheme is preferred), with each character and its label on one line. Sentences are separated by a blank line. A minimal parsing sketch in Python follows the example below.

中 B-SEG
国 E-SEG
最 B-SEG
大 E-SEG
氨 B-SEG
纶 M-SEG
丝 E-SEG
生 B-SEG
产 E-SEG
基 B-SEG
地 E-SEG
在 S-SEG
连 B-SEG
云 M-SEG
港 E-SEG
建 B-SEG
成 E-SEG

新 B-SEG
华 M-SEG
社 E-SEG
北 B-SEG
京 E-SEG
十 B-SEG
二 M-SEG
月 E-SEG
二 B-SEG
十 M-SEG
六 M-SEG
日 E-SEG
电 S-SEG
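
A minimal sketch of reading this format into (characters, labels) sentence pairs; the function name read_conll is illustrative and is not part of this repository:

def read_conll(path):
    """Read a CoNLL-style file: one 'char label' pair per line, blank line between sentences."""
    sentences, chars, labels = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                   # a blank line ends the current sentence
                if chars:
                    sentences.append((chars, labels))
                    chars, labels = [], []
                continue
            char, label = line.split()     # e.g. "中 B-SEG"
            chars.append(char)
            labels.append(label)
    if chars:                              # handle a file that does not end with a blank line
        sentences.append((chars, labels))
    return sentences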

Pretrained Embeddings:

How to run the code?

  1. Download the character embeddings and character bigram embeddings, and set their paths in main.py (a sketch of a typical embedding loader follows these steps).
  2. Modify run_seg.sh to point to your train/dev/test files.
  3. sh run_seg.sh
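
The embedding files referred to in step 1 are typically plain-text, word2vec-style files (one token followed by its vector per line); whether this repository uses exactly that format is an assumption, and load_embeddings below is only an illustrative helper, not the repository's API:

import numpy as np

def load_embeddings(path, dim):
    """Load a word2vec-style text file where each line is 'token v1 v2 ... v_dim'.
    Returns a dict mapping token -> float32 vector. Illustrative only."""
    table = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) != dim + 1:      # skip a possible header line such as "100000 50"
                continue
            table[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return table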

Cite:

If you use this code, please cite our paper as:

@article{gan2019investigating,
  title={Investigating Self-Attention Network for Chinese Word Segmentation},
  author={Gan, Leilei and Zhang, Yue},
  journal={arXiv preprint arXiv:1907.11512},
  year={2019}
}
