@trungtv trungtv released this 17 May 09:24
· 6 commits to master since this release

New features:

  • Retrain the tokenization model on a much larger dataset (F1 score = 0.985)
  • Add training data and training code
  • Better integration with spacy.io: remove redundant spaces between tokens after tokenization
    (e.g. "Việt Nam , 12 / 22 / 2020" => "Việt Nam, 12/22/2020")
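The space cleanup described in the last item can be illustrated with a small sketch. This is not the library's actual implementation; `collapse_token_spaces` is a hypothetical helper, and the two regular expressions are an assumption that covers only the example above (spaces before common punctuation, and spaces around a slash between digits).

```python
import re


def collapse_token_spaces(text: str) -> str:
    """Hypothetical helper: undo the extra spaces a tokenizer leaves around punctuation."""
    # Drop whitespace that precedes common punctuation marks: "Nam ," -> "Nam,"
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)
    # Drop whitespace around a slash that sits between two digits: "12 / 22" -> "12/22"
    text = re.sub(r"(?<=\d)\s*/\s*(?=\d)", "/", text)
    return text


print(collapse_token_spaces("Việt Nam , 12 / 22 / 2020"))
```

A real integration would more likely track, per token, whether it was followed by whitespace in the original text, rather than patching the joined string afterwards; the regex version is only meant to show the before and after behaviour.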