Regular Expression based Simple Word Tokenizer & Code Quality

Pre-release
@husnusensoy husnusensoy released this 05 Sep 21:38
· 656 commits to master since this release
  • ADD: The major change in this release is the simple word tokenizer implemented by @dafajon after seeing the issues with the BERT tokenizer. Note that the simple tokenizer is still experimental and is not compatible with all summarizers (cluster-based summarizers automatically switch to the BERT tokenizer in order to use BERT embeddings).
  • ADD: Introduction of sadedegel.set_config to modify some sadedegel configurations, such as the word tokenizer.
  • ADD: Tags are added to ExtractiveSummarizer so that summarizers can be filtered easily (for example, during evaluation).
  • ADD: Thanks to Code Inspector, sadedeGel is under constant code quality monitoring, with an initial grade of A (score 94). We will keep it as high as we can as the capabilities of the library grow.
  • CHANGE: Downgrade sklearn dependency back to 0.23.1 to prevent serialization compatibility warnings.
  • CHANGE: Score normalization of summarizers is pushed up to the parent abstract class ExtractiveSummarizer, improving code quality by reducing repetitive code blocks.
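
To give a feel for what a regular-expression-based word tokenizer does, here is a minimal sketch. It is not sadedeGel's actual implementation: the pattern and the function name `simple_word_tokenize` are assumptions made up for illustration. It treats runs of word characters (optionally joined by an ASCII apostrophe, as in Turkish suffixed proper nouns) as one token and every other non-space character as its own token.

```python
import re

# Hypothetical pattern, not taken from sadedeGel:
#   \w+(?:'\w+)*  a run of word characters, optionally continued after an
#                 ASCII apostrophe (e.g. "Ankara'da")
#   [^\w\s]       any single character that is neither a word character
#                 nor whitespace (punctuation)
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def simple_word_tokenize(text):
    """Split text into word and punctuation tokens using a single regex."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    print(simple_word_tokenize("Ankara'da hava güzel."))
```

Because `\w` is Unicode-aware for Python 3 strings, Turkish letters such as `ü` and `ş` stay inside their words. A real tokenizer would also need to handle typographic apostrophes, URLs, numbers with separators, and similar cases that this sketch ignores.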