
Minor Performance Enhancements & Tidy Up

Pre-release
@husnusensoy husnusensoy released this 07 Jan 23:13
· 454 commits to master since this release

In one month, we have added a lot to the sadedegel library.

News

  • We have resolved an old and major issue caused by improper from transformers import AutoTokenizer calls scattered across the code base, and the sentence boundary detector (sbd) is now lazily loaded. To give an idea of the effect:
    • The sadedegel config CLI call, which shows the sadedegel configuration, took 11 sec in the 0.16.1.1 release versus 2 sec in 0.16.2.1+
    • The from sadedegel import Doc call (usually the first statement when starting to work with sadedegel) took 9.5 sec in the 0.16.1.1 release versus 1 sec in 0.16.2.1+
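
The general idea behind lazy loading can be sketched as follows. This is a minimal illustration, not sadedegel's actual code: lazy_module and tokenize are hypothetical names, and the standard-library re module stands in for a heavy dependency such as transformers.

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=None)
def lazy_module(name: str):
    """Import the module `name` on first use and cache the module object."""
    return importlib.import_module(name)


def tokenize(text: str):
    # Hypothetical helper: the dependency is imported here, on the first
    # call, rather than at the top of the enclosing module. Importing the
    # enclosing module therefore stays cheap.
    regex = lazy_module("re")  # stand-in for a heavy dependency
    return regex.findall(r"\w+", text)
```

To see where import time goes in your own environment, Python 3.7+ offers python -X importtime -c "from sadedegel import Doc", which prints a per-module breakdown to stderr.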

Feature Drop & Deprecation

  • Old configuration capabilities are deprecated (unfortunately, this time without prior warnings in earlier releases)
    • A DeprecationWarning indicates that you are accessing one of these APIs, which will be removed completely by 0.18
    • Please use the new config_context API (tf_context and idf_context are just simplified wrappers around it)
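
Python hides DeprecationWarning in many contexts by default, so deprecated calls are easy to miss. The sketch below uses only the standard warnings module to collect them; old_config_getter is a hypothetical stand-in for a deprecated API, not a real sadedegel function.

```python
import warnings


def old_config_getter():
    # Hypothetical stand-in for a deprecated configuration API.
    warnings.warn(
        "old configuration API is deprecated; use config_context",
        DeprecationWarning,
        stacklevel=2,
    )
    return "value"


def find_deprecated_calls(func):
    """Run `func` and return the DeprecationWarning messages it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure deprecation warnings are recorded, not filtered out.
        warnings.simplefilter("always", DeprecationWarning)
        func()
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, DeprecationWarning)
    ]
```

Running a test suite with python -W error::DeprecationWarning is another option: it turns each such warning into an exception, so remaining uses of the old API fail loudly before 0.18 removes them.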

Documentation

  • CONFIG.md details the configuration of sadedegel.

Others

  • Sentence now has a __getitem__ method to access any of its tokens
  • Iterating over a Sentence yields all of its Tokens in order
  • The default tf method is now log_norm instead of binary, thanks to @dafajon's most recent summarizer experiments
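
The items above can be illustrated with a toy sketch. The Sentence class here is a stand-in holding plain strings, not sadedegel's own class, and tf_log_norm assumes the common textbook definition log(1 + count); sadedegel's exact formula may differ.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Iterator, List


class Sentence:
    """Toy stand-in (not sadedegel's Sentence): wraps a list of tokens."""

    def __init__(self, tokens: List[str]):
        self._tokens = list(tokens)

    def __getitem__(self, index):
        # Index access to a single token (slices return a list).
        return self._tokens[index]

    def __iter__(self) -> Iterator[str]:
        # Yields tokens in their original order.
        return iter(self._tokens)

    def __len__(self) -> int:
        return len(self._tokens)


def tf_binary(tokens: Iterable[str]) -> Dict[str, float]:
    # Binary tf: 1 for every token that occurs at least once.
    return {t: 1.0 for t in set(tokens)}


def tf_log_norm(tokens: Iterable[str]) -> Dict[str, float]:
    # Assumed definition: log(1 + raw count), so repeats count,
    # but with diminishing weight.
    return {t: math.log1p(c) for t, c in Counter(tokens).items()}
```

With binary tf, a token that occurs ten times weighs the same as one that occurs once; the log-normalized variant keeps that frequency signal while damping it.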