One Big Release

Pre-release

husnusensoy released this 09 Oct 19:57

In just one month, we have added a lot to the sadedeGel library.

News

@doruktiktiklar is our first code contributor from outside the Global Maksimum AI team.

New Capabilities

ADD: Vocabulary and Token concepts added to the library
- Token: a singleton per word (case-sensitive) that stores token features (lowercase form, shape, document frequency, etc.)
- New sadedegel-build-vocabulary command to build and manage sadedeGel vocabularies.
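The Token concept above can be illustrated with a minimal sketch, assuming a plain Python cache keyed by the exact surface form, so that "Ankara" and "ankara" are distinct tokens. `Token`, `build_vocabulary`, the `lower`/`shape`/`df` attribute names, the whitespace tokenizer, and the X/x/d shape scheme are all hypothetical and do not reflect sadedeGel's actual API or implementation.

```python
def _shape(word: str) -> str:
    """Coarse word shape: X for uppercase, x for lowercase, d for digits."""
    out = []
    for ch in word:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)


class Token:
    """Hypothetical sketch: exactly one shared instance per case-sensitive word."""

    _cache = {}  # maps the exact surface form to its single Token instance

    def __new__(cls, word: str):
        token = cls._cache.get(word)
        if token is None:
            # First time this exact string is seen: create and cache it.
            token = super().__new__(cls)
            token.word = word
            # str.lower() is not Turkish-aware ("I" becomes "i", not "ı").
            token.lower = word.lower()
            token.shape = _shape(word)
            token.df = 0  # document frequency, filled in by build_vocabulary
            cls._cache[word] = token
        return token


def build_vocabulary(documents):
    """Hypothetical helper: count how many documents contain each token."""
    for doc in documents:
        for word in set(doc.split()):  # naive whitespace tokenization
            Token(word).df += 1
    return Token._cache


vocab = build_vocabulary(["Ankara güzel", "Ankara büyük", "ankara"])
print(Token("Ankara").df, Token("ankara").df)  # 2 1
```

Overriding `__new__` rather than `__init__` is what makes the singleton work: repeated calls return the cached object without resetting its counters.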

New Summarizers

ADD: TextRank Summarizer
The TextRank summarizer runs Google's PageRank algorithm over a sentence graph whose edge weights are the cosine similarities (or distances) between sentence BERT embeddings. This is the only similarity measure as of this release; more are to come.
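As a rough illustration of TextRank, the sketch below runs weighted PageRank (power iteration) over a graph whose edge weights are cosine similarities between sentence vectors. It assumes toy 2-D vectors in place of real BERT embeddings, the common PageRank damping factor of 0.85, and clipping of negative similarities to zero; `cosine`, `textrank`, and `textrank_summarize` are hypothetical names, and this is not sadedeGel's summarizer code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def textrank(embeddings, damping=0.85, max_iter=100, tol=1e-8):
    """Weighted PageRank over a sentence graph built from cosine similarity."""
    n = len(embeddings)
    if n == 0:
        return []
    # Edge weights: cosine similarity, no self-loops, negatives clipped to 0.
    w = [[max(cosine(embeddings[i], embeddings[j]), 0.0) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        # Sentences with no similar neighbours spread their score uniformly.
        dangling = sum(scores[j] for j in range(n) if out_weight[j] == 0)
        new = []
        for i in range(n):
            incoming = sum(scores[j] * w[j][i] / out_weight[j]
                           for j in range(n) if out_weight[j] > 0)
            new.append((1 - damping) / n + damping * (incoming + dangling / n))
        converged = max(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def textrank_summarize(embeddings, k):
    """Indices of the k highest-ranked sentences, in original document order."""
    scores = textrank(embeddings)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


# Toy 2-D vectors standing in for BERT sentence embeddings.
emb = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
print(textrank_summarize(emb, 3))  # [0, 1, 2]
```

The last sentence is nearly orthogonal to the others, so it receives little incoming weight and is the one left out of a three-sentence summary.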
ADD: TFIDF Summarizer
The TFIDF summarizer scores each sentence by the element sum of its tfidf vector and uses that sum as the sentence's relevance within the document.
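Here is a minimal sketch of that scoring rule, in which each sentence's score is the sum of the elements of its tf-idf vector. It assumes lowercased whitespace tokens, raw term counts for tf, and an idf of log(N/df) computed over the sentences of the same document. sadedeGel's own tf and idf definitions (for example, idf taken from a corpus) may differ, and `tfidf_scores` and `tfidf_summarize` are hypothetical names.

```python
import math
from collections import Counter


def tfidf_scores(sentences):
    """Relevance of each sentence = element sum of its tf-idf vector."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    # Sentence-level document frequency: in how many sentences each term occurs.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    idf = {term: math.log(n / d) for term, d in df.items()}
    scores = []
    for toks in tokenized:
        tf = Counter(toks)  # raw counts as term frequency
        scores.append(sum(count * idf[term] for term, count in tf.items()))
    return scores


def tfidf_summarize(sentences, k):
    """Return the k highest-scoring sentences in original document order."""
    scores = tfidf_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


sentences = ["kedi uyudu", "kedi süt içti ve uyudu", "kedi"]
print(tfidf_summarize(sentences, 1))  # ['kedi süt içti ve uyudu']
```

Under this idf, a term that appears in every sentence ("kedi" here) gets weight log(1) = 0, so only distinguishing terms raise a sentence's score.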

Others

UPDATE: Fixed some annotator consensus issues in the summary corpus.
UPDATE: Improved command-line interface for summarizer evaluation. See sadedegel-summarize evaluate for details.
ADD: Sentence-level tf, idf, and tfidf embeddings.
ADD: Doc now has a tfidf_embeddings property, analogous to the bert_embeddings property.

Documentation

ADD: Webinar videos (in Turkish) on the sadedeGel YouTube channel

Contribution Guidelines

ADD: Commit Guidelines
ADD: New Feature checklist

Feature Drop & Deprecation

DROP: Code quality guidelines have been removed because Code Inspector limits the number of lines per open-source project. We may switch to another provider in the future.
DEPRECATED: Doc.sents will be removed by version 0.17.
- Use doc[i] to access the i-th sentence of a document.
- Doc now implements __iter__, so you can iterate over all sentences of a document.
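The migration path above can be sketched with a toy `Doc` class. It is hypothetical and not sadedeGel's implementation. It assumes a naive regex sentence splitter, exposes sentences through `__getitem__` and `__iter__`, and keeps `sents` as a property that emits a `DeprecationWarning`. Because empty fragments are dropped, an empty input yields zero sentences, in the spirit of the empty-document bugfix in this release.

```python
import re
import warnings


class Doc:
    """Toy stand-in for a document; the regex splitter is an assumption."""

    def __init__(self, text: str):
        self.raw = text
        # Split after ., ! or ? followed by whitespace; drop empty pieces so that
        # Doc("") has zero sentences instead of one empty sentence.
        self._sents = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def __getitem__(self, i):
        return self._sents[i]  # doc[i] -> i-th sentence

    def __len__(self):
        return len(self._sents)

    def __iter__(self):
        return iter(self._sents)  # for sentence in doc: ...

    @property
    def sents(self):
        # Old accessor kept for compatibility, but warns on every use.
        warnings.warn("Doc.sents is deprecated and will be removed by version 0.17; "
                      "use doc[i] or iterate over doc instead.",
                      DeprecationWarning, stacklevel=2)
        return self._sents


doc = Doc("Merhaba dünya. Nasılsın?")
print(doc[0])        # Merhaba dünya.
print(len(Doc("")))  # 0
```

Passing `stacklevel=2` makes the warning point at the caller's line that used `doc.sents`, which is usually what a user needs to update.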

Bugfix

Empty documents, e.g. Doc("") or Doc(''), are now handled properly.