Releases: mingruimingrui/fast-mosestokenizer

Version 0.0.8.2

29 Oct 11:15

Bugfixes

  • Fixed underflow error in detokenization
  • Fixed underflow error in trim function

Version 0.0.8.1

14 Aug 17:19
Pre-release

Changes

  • other_letters option exposed in python API.

Version 0.0.8

13 Aug 12:45
Pre-release

Changes

  • Segmentation by \p{So} is not automatically enabled.
  • Performance of \p{So} segmentation has been drastically improved.
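
For readers unfamiliar with \p{So}: it is the Unicode general category "Symbol, other". The following is a minimal Python sketch of what segmenting on that category can look like. It is an illustration only, not this library's implementation; the function name split_other_symbols is hypothetical, and whitespace handling is left out.

```python
import unicodedata


def split_other_symbols(text):
    """Emit every character in Unicode category So as its own token.

    Hypothetical helper for illustration; not part of this package.
    """
    tokens = []
    current = ""
    for ch in text:
        if unicodedata.category(ch) == "So":
            # Flush the pending run, then emit the symbol by itself.
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)
        else:
            current += ch
    if current:
        tokens.append(current)
    return tokens
```

Calling split_other_symbols("hello\u263aworld") separates the U+263A symbol from the surrounding letters.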

Version 0.0.7.2

06 Aug 13:24
Pre-release

Hotfix

Fixed regex.

Version 0.0.7.1

06 Aug 13:19
Pre-release

Hotfix

Hotfix for other_letters handling, since these characters may be followed by nonspacing marks.
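
To illustrate why nonspacing marks matter here: a letter in Unicode category Lo can be followed by a combining mark in category Mn, and splitting between the two breaks the character apart. The sketch below is a generic Python illustration under that assumption, not the fix applied in this release; attach_marks is a hypothetical name.

```python
import unicodedata


def attach_marks(text):
    """Group each nonspacing mark (category Mn) with the character before it.

    Hypothetical helper for illustration; not part of this package.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch) == "Mn":
            # Keep the combining mark attached to its base character.
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters
```

For example, the Arabic letter U+0628 followed by the mark U+064E stays together as one unit instead of being split into two.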

Version 0.0.6

06 Aug 09:41
Pre-release

Features

Improved tokenization rules for logographic languages.

Version 0.0.5

01 Aug 10:07
Pre-release

Features

  • Installation of the C++ library and command-line tools can finally be done using make install
  • make build-cli has been changed to make build

Bug fixes

  • Handle the case where in_num_p is not switched off.
    • Before: "文字123汉语" -> ["文字", "123", "汉", "语"]
    • After: "文字123汉语" -> ["文字", "123", "汉语"]
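
The before/after pair above is the kind of result a stale "inside a number" flag produces. As a rough Python sketch of the idea (not the library's C++ logic; split_runs and in_num are hypothetical names standing in for the real in_num_p state), the flag is refreshed on every character so that text after a digit run is grouped normally again:

```python
def split_runs(text):
    """Split text wherever it switches between digit and non-digit runs.

    Hypothetical helper for illustration; not part of this package.
    """
    tokens = []
    current = ""
    in_num = False
    for ch in text:
        is_digit = ch.isdigit()
        # A change of state closes the run collected so far.
        if current and is_digit != in_num:
            tokens.append(current)
            current = ""
        # Update the flag on every character so it never stays stuck on.
        in_num = is_digit
        current += ch
    if current:
        tokens.append(current)
    return tokens
```

With this sketch, split_runs("文字123汉语") yields ["文字", "123", "汉语"], matching the "After" behavior.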

Todo

  • Determine how characters belonging to the "other letters" category
    should be handled by the tokenizer.
  • Reduce the number of flags.
    • Remove those outside the scope of this package, e.g. lowercase.
    • Remove those that add unnecessary bloat to the logic, e.g. URL handling.

Version 0.0.4

17 Jul 19:46
Pre-release
  • Fixed detokenization for "@-@"
  • Now builds Linux images using base ubuntu:16.04
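
For context on "@-@": in Moses-style tokenization, a hyphen split out of a word is written as the marker "@-@", and detokenization turns it back into a plain hyphen. A minimal Python sketch of that step follows. It assumes the marker is surrounded by at most one space on each side and is not the code used by this package; join_hyphen_markers is a hypothetical name.

```python
import re


def join_hyphen_markers(text):
    """Replace the "@-@" marker (and adjacent single spaces) with a hyphen.

    Hypothetical helper for illustration; not part of this package.
    """
    return re.sub(r" ?@-@ ?", "-", text)
```

For example, join_hyphen_markers("well @-@ known") gives "well-known".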

Version 0.0.3

15 Jul 04:06
Pre-release

Fix for GitHub workflow.

Version 0.0.2

14 Jul 18:02
Pre-release
  • Build static libs locally
  • Build python packages with static lib