More Prebuilt Models

@husnusensoy husnusensoy released this 17 Mar 09:59
· 296 commits to master since this release

0.18 adds more prebuilt models to the sadedegel library.

News

  • Our main contributor @dafajon has implemented a new BM25Summarizer, similar to the TfIdf summarizer. BM25Summarizer performs slightly better on short summaries.

  • We have packaged two new prebuilt models (refer to the README for model accuracies):

    1. Tweet profanity classification (sadedegel.prebuilt.tweet_profanity)
    2. Tweet sentiment classification (sadedegel.prebuilt.tweet_sentiment)
  • We have changed the way we report summarizer performance. Instead of a grid search over summarizer options, we now use a random search to choose the best summarizer and parameters. Refer to the README for details.
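
To illustrate the difference between the two search strategies mentioned above, here is a minimal, self-contained sketch. It is not sadedegel's evaluation code: the search space, the option names, and the `toy_score` function are all hypothetical stand-ins chosen only to show that a grid search evaluates every combination while a random search evaluates a fixed number of sampled ones.

```python
import itertools
import random

# Hypothetical search space; the names and values are illustrative only,
# not sadedegel's actual summarizer options.
SEARCH_SPACE = {
    "summarizer": ["tfidf", "bm25", "lexrank"],
    "tf_method": ["binary", "raw", "log_norm"],
    "k1": [1.2, 1.5, 2.0],
}


def toy_score(config):
    """Stand-in for an expensive evaluation of one configuration."""
    score = 0.5
    if config["summarizer"] == "bm25":
        score += 0.2
    if config["tf_method"] == "log_norm":
        score += 0.1
    score -= abs(config["k1"] - 1.5) * 0.1
    return score


def grid_search(space, score_fn):
    """Evaluate every combination in the space and return the best one."""
    keys = list(space)
    configs = [dict(zip(keys, values)) for values in itertools.product(*space.values())]
    return max(configs, key=score_fn), len(configs)


def random_search(space, score_fn, n_trials, seed=0):
    """Evaluate only n_trials randomly sampled combinations."""
    rng = random.Random(seed)
    configs = [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_trials)]
    return max(configs, key=score_fn), len(configs)


best_grid, grid_evals = grid_search(SEARCH_SPACE, toy_score)
best_random, random_evals = random_search(SEARCH_SPACE, toy_score, n_trials=10)
print(f"grid search:   {grid_evals} evaluations, best = {best_grid}")
print(f"random search: {random_evals} evaluations, best = {best_random}")
```

The cost of the grid grows multiplicatively with every added option, whereas the random search budget is set directly by `n_trials`. The trade-off is that a random search may miss the single best combination.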

Feature Drop & Deprecation

  • The sents property on Doc is dropped. Use __iter__ on Doc (that is, iterate over the Doc directly) instead.
  • The tf property on Doc is deprecated (will be dropped by 0.18) in favor of the get_tf function, which gives a more flexible way to access document-level tf vectors.
  • The tfidf function on Doc is deprecated (will be dropped by 0.18) in favor of the get_tfidf function, which gives a more flexible way to access document-level tf-idf vectors.
  • The lexrank external dependency is dropped, and LexRankPureSummarizer is renamed to LexRankSummarizer.
  • set_config, get_config, describe_config and get_all_configs are dropped in favor of the new configuration implementation.
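
The deprecation pattern described above (a property kept temporarily while a more flexible function replaces it) can be sketched as follows. This is only an illustration under assumptions: `ToyDoc`, its whitespace tokenization, and the `method` parameter are hypothetical and do not reflect the signatures of sadedegel's Doc or get_tf.

```python
import warnings
from collections import Counter


class ToyDoc:
    """Hypothetical stand-in for a document class; not sadedegel's Doc."""

    def __init__(self, text):
        # Naive whitespace tokenization, for illustration only.
        self.tokens = text.lower().split()

    def get_tf(self, method="raw"):
        """Flexible accessor: the caller chooses how term frequency is computed."""
        counts = Counter(self.tokens)
        if method == "raw":
            return dict(counts)
        if method == "binary":
            return {token: 1 for token in counts}
        raise ValueError(f"unknown tf method: {method}")

    @property
    def tf(self):
        """Deprecated accessor that warns and delegates to get_tf."""
        warnings.warn(
            "tf is deprecated; use get_tf() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.get_tf()


doc = ToyDoc("bir iki bir")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = doc.tf  # still works, but emits a DeprecationWarning

print(legacy, [str(w.message) for w in caught])
```

Keeping the old accessor as a thin wrapper lets existing callers keep working during the deprecation window while pointing them to the replacement.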

Others

  • The tf property is now part of the TfImpl class and uses the default configuration settings to yield a tf vector for a Doc or Sentence.
  • We've updated the documentation for our datasets.
  • The idf property is now part of the IdfImp class and uses the default configuration settings to yield an idf vector for a Doc or Sentence.
  • More default parameters in default.ini based on our summarizer performance.
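
For readers unfamiliar with the tf and idf vectors mentioned in this section, the sketch below computes both over a small vocabulary using only the standard library. It is not sadedegel's implementation: the toy corpus, the whitespace tokenization, and the smoothed idf formula are assumptions chosen for illustration, and the library's defaults may differ.

```python
import math
from collections import Counter

# Toy corpus; whitespace tokenization is an assumption for illustration.
corpus = [
    "kedi evde uyudu",
    "köpek bahçede koştu",
    "kedi bahçede oynadı",
]
tokenized = [text.split() for text in corpus]
vocabulary = sorted({token for tokens in tokenized for token in tokens})


def tf_vector(tokens):
    """Raw term counts, one entry per vocabulary term."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def idf_vector(docs):
    """Smoothed idf (one common variant): log((1 + N) / (1 + df)) + 1."""
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in docs for term in set(tokens))
    return [math.log((1 + n_docs) / (1 + doc_freq[term])) + 1 for term in vocabulary]


tf = tf_vector(tokenized[0])
idf = idf_vector(tokenized)
# Element-wise product gives a tf-idf vector for the first document.
tfidf = [count * weight for count, weight in zip(tf, idf)]
```

Terms that appear in fewer documents receive a larger idf weight, so they contribute more to the tf-idf vector than terms shared across the corpus.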