
Tokenization models and training scripts for Vaporetto fast tokenizer

License: Apache-2.0 (LICENSE-APACHE) or MIT (LICENSE-MIT)

daac-tools/vaporetto-models


🚤 Vaporetto models

This repository provides word segmentation models for Vaporetto, a fast tokenizer, together with the programs used to build each model.

Usage

To build the models:

1. Create a resources directory directly under the repository root.
2. Copy the *.xml files from the BCCWJ M-XML directory and lex_3_1.csv from UniDic 3.1.1 into it.
3. Run build.sh in the models directory.
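The preparation steps can be scripted. The following is a minimal Python sketch, not part of this repository: the helper names prepare_resources and run_build are hypothetical, the locations of the BCCWJ M-XML directory and of lex_3_1.csv are supplied by you (both resources are distributed separately), and it assumes the *.xml files sit directly inside the M-XML directory and that build.sh can be invoked with bash from inside models.

```python
import shutil
import subprocess
from pathlib import Path


def prepare_resources(repo_root: Path, mxml_dir: Path, lex_csv: Path) -> list[Path]:
    """Hypothetical helper: copy BCCWJ M-XML *.xml files and UniDic's
    lex_3_1.csv into <repo_root>/resources, returning the copied paths."""
    resources = repo_root / "resources"
    resources.mkdir(exist_ok=True)

    # Assumption: the *.xml files live directly in mxml_dir (non-recursive glob).
    xml_files = sorted(mxml_dir.glob("*.xml"))
    if not xml_files:
        raise FileNotFoundError(f"no *.xml files found in {mxml_dir}")

    # Assumption: build.sh looks for this exact file name.
    if lex_csv.name != "lex_3_1.csv":
        raise ValueError(f"expected lex_3_1.csv, got {lex_csv.name}")

    copied = []
    for src in [*xml_files, lex_csv]:
        # copy2 returns the path of the new file when the destination is a directory.
        copied.append(Path(shutil.copy2(src, resources)))
    return copied


def run_build(repo_root: Path) -> None:
    """Hypothetical helper: run build.sh with models/ as the working directory;
    raises CalledProcessError if the script exits with a non-zero status."""
    subprocess.run(["bash", "build.sh"], cwd=repo_root / "models", check=True)
```

For example, call prepare_resources(Path("."), Path("/path/to/M-XML"), Path("/path/to/lex_3_1.csv")) from the repository root, then run_build(Path(".")); the paths shown are placeholders. Copying rather than symlinking keeps the resources directory self-contained at the cost of extra disk space.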

License

Licensed under either of the Apache License, Version 2.0 (LICENSE-APACHE) or the MIT license (LICENSE-MIT), at your option.

Contribution

See the guidelines.
