v4.5.6: Lemmatizer & Tokenizer bugfixes

@AngledLuffa AngledLuffa released this 01 Feb 20:39
· 33 commits to main since this release

English Lemmatizer upgrades

  • Use the American spellings enroll and appall instead of enrol and appal; treat de- as a verb prefix; add blog and xfer as double-letter exceptions 8adcbfe
  • cowritten 2dd08da
  • elder / eldest 9b5bec8
  • Yazidi as a demonym 2852da8

Tokenizer upgrades

  • Treat #number as a single token after an abbreviation #1396 ad37f2a

UD Processing upgrades

  • 'twas and 'tis as MWT in the UD converter b9f19a6
  • Sort morpho features in alphabetical order when writing out UD f77a9b4
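As a rough illustration of the feature ordering mentioned above, here is a minimal standalone sketch that sorts a CoNLL-U FEATS string by feature name. It is not the code from the commit: the class FeatsSorter and the method sortFeats are hypothetical names, and the case-insensitive comparison is an assumption based on the UD convention for ordering features.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper for illustration only; not part of CoreNLP.
class FeatsSorter {
  /** Sorts a FEATS string such as "Number=Sing|Case=Nom" by feature name. */
  static String sortFeats(String feats) {
    // "_" is the CoNLL-U placeholder for an empty feature set
    if (feats == null || feats.isEmpty() || feats.equals("_")) {
      return "_";
    }
    String[] pairs = feats.split("\\|");
    // Compare on the name to the left of "=", ignoring case (assumed UD convention)
    Arrays.sort(pairs, Comparator.comparing(
        (String pair) -> pair.split("=", 2)[0],
        String.CASE_INSENSITIVE_ORDER));
    return String.join("|", pairs);
  }
}
```

For example, FeatsSorter.sortFeats("Number=Sing|Case=Nom|Gender=Fem") returns "Case=Nom|Gender=Fem|Number=Sing".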

Other Bugfixes

  • Crash when deleting the endpoints of an IntervalTree #1405 6d17c23
  • Find and remove extraneous uses of yield, which became a keyword in newer Java versions e5c9d44 b084233

Minor API changes

  • Updating the text on a CoreLabel no longer wipes out the Lemma c03522b
  • Update to more recent Jakarta Servlet 8a671fd

Ssurgeon

  • UpdateMorphoFeatures edit 27c6703
  • Lemmatize operation (only works on English) c26b25e