Releases: meilisearch/charabia
Tokenizer v0.2.8
Changes
- Changes related to the rebranding (#66)
- Update LICENSE (#67) @curquiza
- Small fix in `benches/` (#71) @Thearas
- Set up the lindera tokenizer for Japanese (ja) support (related to #49) (#70) @miiton
- Benchmark and optimize Japanese tokenization (#73) @ManyTheFish
- Decompose Japanese compound words (#75) @mosuka
- Update the dependencies (#80) @Kerollmops
Thanks again to @Kerollmops, @ManyTheFish, @Thearas, @curquiza, @miiton and @mosuka! 🎉
Tokenizer v0.2.7
Tokenizer v0.2.6
Changes
- Test Meilisearch issue 1714 (#58) @ManyTheFish
- Exclude Hangul from `is_cjk` (#60) @datamaker
- Add mapping between bytes in original word and normalized word (#59) @Samyak2
Thanks again to @ManyTheFish, @Samyak2, @datamaker and JB! 🎉
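The `is_cjk` change above can be illustrated with a minimal sketch. The function names and the exact Unicode ranges below are illustrative assumptions, not charabia's actual implementation: the point is that Hangul blocks are classified separately instead of being lumped in with the CJK ideograph ranges.

```rust
// Hedged sketch: Hangul is matched by its own predicate and is
// deliberately NOT part of the is_cjk check. Ranges are a subset
// of the relevant Unicode blocks, chosen for illustration.
fn is_hangul(c: char) -> bool {
    matches!(c,
        '\u{1100}'..='\u{11FF}'   // Hangul Jamo
        | '\u{3130}'..='\u{318F}' // Hangul Compatibility Jamo
        | '\u{AC00}'..='\u{D7AF}' // Hangul Syllables
    )
}

fn is_cjk(c: char) -> bool {
    // CJK Unified Ideographs ranges only; Hangul excluded.
    matches!(c,
        '\u{4E00}'..='\u{9FFF}'
        | '\u{3400}'..='\u{4DBF}'
        | '\u{20000}'..='\u{2A6DF}'
    )
}

fn main() {
    assert!(is_cjk('漢'));      // CJK ideograph
    assert!(!is_cjk('한'));     // Hangul syllable no longer counts as CJK
    assert!(is_hangul('한'));
    assert!(!is_hangul('漢'));
    println!("ok");
}
```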
Tokenizer v0.2.5
Changes
- Rename ZeroRemover to ControlCharacterRemover (#55) @ManyTheFish
- Add a rustfmt config file into the project (#57) @Kerollmops
Thanks again to @Kerollmops, @ManyTheFish, and @curquiza! 🎉
Tokenizer v0.2.4
Changes
- Introduce a new default normalizer that removes zeroes from tokens (#52) @Kerollmops
Thanks again to @Kerollmops! 🎉
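The zero-removing normalizer introduced in #52 can be sketched as a simple character filter. The function name here is hypothetical and charabia's real `Normalizer` trait works on tokens rather than bare strings; this only shows the idea of stripping NUL characters.

```rust
// Hedged sketch: drop NUL ('\0') characters from a token's text.
// Illustrative only; not charabia's actual Normalizer API.
fn remove_zeroes(token: &str) -> String {
    token.chars().filter(|&c| c != '\u{0}').collect()
}

fn main() {
    assert_eq!(remove_zeroes("he\u{0}llo"), "hello");
    assert_eq!(remove_zeroes("clean"), "clean");
    println!("ok");
}
```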
Tokenizer v0.2.3
Changes
- Make legacy tokenizer handle unicode separators (#47) @ManyTheFish
Thanks again to @ManyTheFish! 🎉
Tokenizer v0.2.2
Changes
- Fix non-breaking space separator (#44) @shekhirin
Thanks again to @LegendreM, and @shekhirin! 🎉
Tokenizer v0.2.1
Changes
- Add release drafter files (#37) @curquiza
- Add bors (#41) @curquiza
- Fix separators: treat cyrillic chars as non-separators (#39) @shekhirin
Thanks again to @shekhirin! 🎉
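The two separator fixes above (treating the non-breaking space as a separator in #44, and Cyrillic characters as non-separators in #39) can be illustrated with a hedged sketch. The predicate below is an assumption for illustration; charabia's real separator logic is more elaborate.

```rust
// Hedged sketch of separator classification. Rust's is_whitespace()
// follows the Unicode White_Space property, which includes U+00A0
// (non-breaking space); alphabetic characters such as Cyrillic
// letters are never separators here.
fn is_separator(c: char) -> bool {
    c.is_whitespace() || c.is_ascii_punctuation()
}

fn main() {
    assert!(is_separator('\u{00A0}')); // non-breaking space separates
    assert!(is_separator(' '));
    assert!(!is_separator('б'));       // Cyrillic letter does not
    assert!(!is_separator('a'));
    println!("ok");
}
```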
Use HMM feature on jieba
Changes
- Merge pull request #23 from meilisearch/use-hmm-on-jieba: use the HMM feature on jieba