taedlar/wordcept


wordcept

A Python toolkit for machine learning on Chinese words.

Word Segmentation Tool: dartfrog.py

To train the word segmentation tool on a corpus of segmented text, run:

dartfrog.py --fit TRAIN-DATA-FILE

To process raw text and produce segmented text, run:

dartfrog.py --transform INPUT-FILE OUTPUT-FILE
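The two commands above can be chained from a Python script. The following is a minimal sketch, assuming `dartfrog.py` sits in the current working directory and is launched with the same interpreter. The helper names `fit_command`, `transform_command`, and `run_pipeline` are hypothetical; only the `--fit` and `--transform` flags and their argument order come from the usage shown above.

```python
import subprocess
import sys

# Assumed location of the script; adjust to wherever dartfrog.py lives.
SCRIPT = "dartfrog.py"


def fit_command(train_file):
    """Build the argv list for training on a segmented corpus."""
    return [sys.executable, SCRIPT, "--fit", str(train_file)]


def transform_command(input_file, output_file):
    """Build the argv list for segmenting raw text into output_file."""
    return [sys.executable, SCRIPT, "--transform", str(input_file), str(output_file)]


def run_pipeline(train_file, input_file, output_file):
    """Train first, then segment; raises CalledProcessError on a non-zero exit."""
    for cmd in (fit_command(train_file), transform_command(input_file, output_file)):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("train.txt", "raw.txt", "segmented.txt")` would run both steps in order; the file names here are placeholders.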

Performance

| Dataset (SIGHAN Bakeoff 2005) | F1 | Recall | OOV Recall |
|---|---|---|---|
| AS | 0.928 | 0.935 | 0.390 |
| CityU | 0.911 | 0.927 | 0.388 |
| MSRA | 0.946 | 0.963 | 0.205 |
| PKU | 0.924 | 0.932 | 0.499 |
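For readers unfamiliar with these metrics, the sketch below shows how word-level F1, recall, and OOV recall are commonly computed for segmentation output. It is an illustration of the usual definitions, not the scoring script used to produce the numbers above; the function names `spans` and `score` and the toy sentence are hypothetical. A word counts as correct only when both of its boundaries match the gold segmentation, and OOV recall is recall restricted to gold words absent from the training vocabulary.

```python
def spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    result = set()
    start = 0
    for word in words:
        result.add((start, start + len(word)))
        start += len(word)
    return result


def score(gold, pred, train_vocab):
    """Return precision, recall, F1 and OOV recall for one segmented sentence."""
    text = "".join(gold)
    if not text or text != "".join(pred):
        raise ValueError("gold and pred must segment the same non-empty text")
    gold_spans, pred_spans = spans(gold), spans(pred)
    correct = gold_spans & pred_spans
    precision = len(correct) / len(pred_spans)
    recall = len(correct) / len(gold_spans)
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    # Gold words never seen in the training vocabulary are out-of-vocabulary.
    oov = {s for s in gold_spans if text[s[0]:s[1]] not in train_vocab}
    oov_recall = len(oov & pred_spans) / len(oov) if oov else None
    return {"precision": precision, "recall": recall, "f1": f1, "oov_recall": oov_recall}


# Toy example (made up for illustration, not taken from the SIGHAN data).
gold = ["我", "喜欢", "北京", "烤鸭"]
pred = ["我", "喜欢", "北京烤鸭"]
result = score(gold, pred, train_vocab={"我", "喜欢", "北京"})
```

In practice the counts would be summed over every sentence in the test set before the ratios are taken, rather than averaged per sentence.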
