One Big Release

Pre-release
@husnusensoy released this 09 Oct 19:57
· 543 commits to master since this release

In one month, we have added a lot to the sadedegel library.

News

  • We have @doruktiktiklar as the first code contributor from outside the Global Maksimum AI team.

New Capabilities

  • ADD: Vocabulary and Token concepts added to the library
    • Token: singleton per word (case sensitive) to store unique token features (lower form, shape, document frequency, etc.)
    • New sadedegel-build-vocabulary to manage sadedegel vocabularies.
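
The "singleton per word" idea can be illustrated with a small sketch. This is not sadedegel's implementation: the `Token` class below, its `_cache` dictionary, and the attribute names `lower_`, `shape`, and `document_frequency` are hypothetical stand-ins chosen for illustration.

```python
class Token:
    """Hypothetical per-word singleton; not sadedegel's actual class."""

    _cache = {}  # maps the exact (case-sensitive) word to its single Token

    def __new__(cls, word):
        # Return the cached instance if this exact spelling was seen before.
        if word in cls._cache:
            return cls._cache[word]
        token = super().__new__(cls)
        token.word = word
        token.lower_ = word.lower()
        # Shape maps uppercase to X, lowercase to x, digits to d.
        token.shape = "".join(
            "X" if ch.isupper() else "x" if ch.islower() else "d" if ch.isdigit() else ch
            for ch in word
        )
        token.document_frequency = 0  # would be filled in from a vocabulary
        cls._cache[word] = token
        return token


a = Token("Ankara")
b = Token("Ankara")
c = Token("ankara")

print(a is b)  # same spelling, same object
print(a is c)  # different case, different object
print(a.shape, a.lower_)
```

Because features are stored once per distinct spelling, a word that occurs many times in a corpus shares a single feature record.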

New Summarizers

  • ADD: TextRank Summarizer
    The TextRank summarizer runs Google's PageRank algorithm on a sentence graph whose edge weights are the cosine similarity of BERT sentence embeddings (the only similarity measure as of this release; more to come).
  • ADD: TFIDF Summarizer
    The TFIDF summarizer uses the sum of the elements of a sentence's tfidf vector as the relevance score of that sentence in a document.
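
Both scoring schemes can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's code: the function names are hypothetical, the smoothed idf formula is one common choice, and tfidf vectors stand in for BERT embeddings so that the example needs no model download.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one tfidf vector (list of floats) per tokenized sentence."""
    vocab = sorted({w for s in sentences for w in s})
    n = len(sentences)
    # Document frequency counts sentences here; the smoothing is an assumption.
    df = Counter(w for s in sentences for w in set(s))
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    vectors = []
    for s in sentences:
        tf = Counter(s)
        vectors.append([tf[w] * idf[w] for w in vocab])
    return vectors


def tfidf_scores(sentences):
    """Relevance score of a sentence = sum of the elements of its tfidf vector."""
    return [sum(vec) for vec in tfidf_vectors(sentences)]


def cosine(u, v):
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def textrank_scores(vectors, damping=0.85, iterations=50):
    """PageRank by power iteration on a cosine-similarity sentence graph."""
    n = len(vectors)
    # Edge weight between two different sentences is their cosine similarity.
    w = [[cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing weight per sentence
    rank = [1.0 / n] * n
    for _ in range(iterations):
        rank = [
            (1 - damping) / n
            + damping * sum(rank[j] * w[j][i] / out[j] for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return rank


sentences = [
    ["the", "cat", "drank", "milk"],
    ["the", "cat", "slept"],
    ["the", "dog", "slept"],
]
print(tfidf_scores(sentences))
print(textrank_scores(tfidf_vectors(sentences)))
```

The two scores behave differently: the element sum favors long sentences with rare words, while the graph-based score favors sentences that are similar to many others.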

Others

  • UPDATE: Resolved some annotator consensus issues in the summary corpus.
  • UPDATE: A better command-line interface for summarizer evaluation. See sadedegel-summarize evaluate for more.
  • ADD: Sentence-level tf, idf, and tfidf embeddings
  • ADD: Doc has tfidf_embeddings property similar to bert_embeddings property.

Documentation

Contribution Guidelines

  • ADD: Commit Guidelines
  • ADD: New Feature checklist

Feature Drop & Deprecation

  • DROP: Code quality guidelines are removed since Code Inspector limits the number of lines per open source project. We might continue with other providers in the future.

  • DEPRECATED: Doc.sents will be removed by version 0.17

    • Use [i] to access the ith sentence of a document.
    • The Doc object now implements __iter__ so that you can iterate over all sentences of a document.
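
The indexing and iteration protocol described above can be sketched with a toy class. `MiniDoc` and its naive regex sentence splitter are hypothetical and exist only to show the access pattern; sadedegel's `Doc` uses its own sentence boundary detection.

```python
import re


class MiniDoc:
    """Hypothetical stand-in for a document type; not sadedegel's Doc."""

    def __init__(self, raw):
        # Naive splitter: break after ., ! or ? followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", raw.strip())
        self._sents = [p for p in parts if p]  # an empty string yields no sentences

    def __len__(self):
        return len(self._sents)

    def __getitem__(self, i):
        # Supports doc[i] instead of a separate sentence list attribute.
        return self._sents[i]

    def __iter__(self):
        # Supports `for sentence in doc`.
        return iter(self._sents)


doc = MiniDoc("The weather is nice. We went out. Then we came home.")
print(doc[1])
for sentence in doc:
    print(sentence)

print(len(MiniDoc("")))
```

With `__getitem__` and `__iter__` in place, callers do not need a dedicated sentence list attribute, and an empty input simply produces a document with zero sentences.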

Bugfix

  • Properly handle empty documents, e.g. Doc("") or Doc('').