Direction to General Purpose NLP Library for Turkish

@husnusensoy husnusensoy released this 17 Mar 09:40
· 388 commits to master since this release

The 0.17 release introduces several NLP capabilities in sadedegel that are not related to summarisation.

News

  • Starting with this release, sadedegel ships prebuilt models for various basic NLP tasks. The purpose is to allow developers to load and use those models with minimal configuration.
    • Our first model is a news classifier (Thanks Taner Sezer for his corpus support)
  • We now report the accuracy of our word tokenizers to identify potential enhancement points for future releases (Thanks Taner Sezer for his corpus support)
  • To support the development of prebuilt models, the sklearn-compatible extension.sklearn module is introduced for feature engineering.
  • Token.is_stopword is added to flag stopword token types.
  • LexRankSummarizer (based on the external lexrank module, to be deprecated in future releases) and LexRankPureSummarizer (a pure sadedegel version of the same method) are added to the set of extractive summarizers.
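
To make the idea behind the LexRank summarizers concrete, here is a minimal pure-Python sketch of LexRank-style sentence scoring. It is not sadedegel's implementation: the function names, the whitespace tokenizer, and the use of raw term counts (the original method uses tf-idf weights and optionally thresholds the similarities) are all simplifying assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def lexrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences by centrality in a cosine-similarity graph (toy LexRank)."""
    # Hypothetical tokenizer: lowercase + whitespace split.
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    # Pairwise similarity matrix; self-similarity is excluded.
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    # Row-normalise into transition probabilities; an isolated sentence gets a uniform row.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([v / total for v in row] if total else [1.0 / n] * n)
    # Power iteration with damping, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(scores[i] * trans[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores
```

Sentences that share vocabulary with many others receive higher scores, so an extractive summary can be formed by keeping the top-scoring sentences in their original order.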

Feature Drop & Deprecation

  • sents property on Doc is dropped. Iterate over the Doc instead (via its __iter__ method).
  • tf property on Doc is deprecated (will be dropped by 0.18) in favor of get_tf function which gives a more flexible way to access document level tf vectors.
  • tfidf function on Doc is deprecated (will be dropped by 0.18) in favor of get_tfidf function which gives a more flexible way to access document level tf-idf vectors.
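
The migration pattern described above can be sketched with a toy class. This is an illustration under stated assumptions, not sadedegel's code: ToyDoc, the whitespace tokenizer, and the normalize parameter are hypothetical, and the real get_tf signature may differ.

```python
import warnings
from collections import Counter


class ToyDoc:
    """Hypothetical stand-in for a document type; not sadedegel's Doc."""

    def __init__(self, sentences):
        self._sentences = list(sentences)

    def __iter__(self):
        # Iterating the document yields its sentences, replacing a `sents` property.
        return iter(self._sentences)

    def get_tf(self, normalize=False):
        """Term frequencies; `normalize` is an illustrative option only."""
        counts = Counter(tok for sent in self for tok in sent.lower().split())
        if not normalize:
            return dict(counts)
        total = sum(counts.values())
        return {tok: c / total for tok, c in counts.items()}

    @property
    def tf(self):
        # Old access path: still works, but warns callers to migrate.
        warnings.warn("tf is deprecated; use get_tf() instead", DeprecationWarning, stacklevel=2)
        return self.get_tf()
```

Callers that previously wrote `for s in doc.sents` would write `for s in doc`, and a function with keyword options leaves room for variants that a bare property cannot express.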

Others

  • We have moved the TF and IDF implementations out of Sentence and Doc into separate classes, using Python's multiple inheritance support to reduce code duplication.
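
A minimal sketch of this kind of refactoring, assuming mixin classes shared by a sentence type and a document type. All class and method names here (TFMixin, IDFMixin, ToySentence, ToyDocument) are hypothetical and do not reflect sadedegel's actual class layout.

```python
import math
from collections import Counter


class TFMixin:
    """Shared term-frequency logic; the host class must provide `tokens`."""

    def term_frequencies(self):
        return Counter(self.tokens)


class IDFMixin:
    """Shared tf-idf weighting; relies on `term_frequencies` from TFMixin."""

    def tfidf(self, doc_freq, n_docs):
        tf = self.term_frequencies()
        # Terms missing from doc_freq are skipped rather than guessed.
        return {t: c * math.log(n_docs / doc_freq[t]) for t, c in tf.items() if doc_freq.get(t)}


class ToySentence(TFMixin, IDFMixin):
    def __init__(self, text):
        # Hypothetical tokenizer: lowercase + whitespace split.
        self.tokens = text.lower().split()


class ToyDocument(TFMixin, IDFMixin):
    def __init__(self, sentences):
        self.sentences = [ToySentence(s) for s in sentences]

    @property
    def tokens(self):
        # A document's tokens are the concatenation of its sentences' tokens.
        return [t for s in self.sentences for t in s.tokens]
```

Both classes inherit the same counting and weighting code, so a fix to either mixin applies at sentence and document level at once.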