v4.5.1: Bugfixes

@AngledLuffa released this 30 Aug 04:13

CoreNLP 4.5.1

Bugfixes!

  • Fix a tokenizer regression: 4.5.0 tokenized ",5" as a single word 974383a
  • Use a LinkedHashMap instead of Properties in the PTBTokenizer, which keeps the option processing order predictable. #1289 6550188
  • Fix \r\n not being properly processed on Windows: #1291 9889f4e
  • Handle one half of a surrogate character pair in the tokenizer without crashing #1298 1b12faa
  • Attempt to fix the semgrex "Unknown vertex" errors which have plagued CoreNLP for years in hard-to-track-down circumstances: #1296 #1229 #1169 f99b5ab
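
Two of the fixes above are easy to picture in isolation. The following is a minimal Java sketch, not the actual CoreNLP code: the class `TokenizerNotesSketch` and both helper methods are hypothetical names made up for illustration. It shows why a LinkedHashMap gives a predictable option order (it iterates in insertion order, whereas Properties is hash-based and makes no ordering promise), and how a lone half of a surrogate pair can be detected before it causes trouble.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: these helpers are hypothetical and are not part of CoreNLP.
final class TokenizerNotesSketch {

    // Parses a comma-separated option string such as "a=1,b,c=3" into a map
    // that iterates in the order the options were written.
    static Map<String, String> parseOptions(String spec) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String part : spec.split(",")) {
            if (part.isEmpty()) {
                continue; // ignore empty segments like "a=1,,b"
            }
            int eq = part.indexOf('=');
            // Assumption for this sketch: a bare option name means "true".
            String key = eq < 0 ? part : part.substring(0, eq);
            String value = eq < 0 ? "true" : part.substring(eq + 1);
            options.put(key.trim(), value.trim());
        }
        return options;
    }

    // Returns true if the text contains a high surrogate without a following
    // low surrogate, or a low surrogate without a preceding high surrogate.
    static boolean hasUnpairedSurrogate(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < text.length() && Character.isLowSurrogate(text.charAt(i + 1))) {
                    i++; // well-formed pair: skip its second half
                } else {
                    return true;
                }
            } else if (Character.isLowSurrogate(c)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Option names here are only examples of the "name=value" format.
        System.out.println(parseOptions("ptb3Escaping=false,invertible,splitHyphenated=true").keySet());
        System.out.println(hasUnpairedSurrogate("a\uD83Db"));
    }
}
```

With a hash-ordered container, the iteration order of the same options can differ from the order in which they were written, so options that interact may be applied in a surprising sequence; an insertion-ordered map avoids that.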