SadedeGel prebuilt models extend the capabilities of the sadedegel library to common NLP tasks we encounter every day, such as sentiment analysis and profanity detection.
The open source prebuilt models are not designed to achieve state-of-the-art accuracy. Instead, they provide a good starting point by training sklearn-based, limited-memory models (partial_fit on micro batches) with a single pass over the training data.
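The single-pass, micro-batch training described above can be sketched roughly as follows. This is a minimal illustration, not the actual sadedegel training code: the helper names micro_batches and train_single_pass, the choice of HashingVectorizer and SGDClassifier, and the default batch size are all assumptions made for the example.

```python
from itertools import islice


def micro_batches(pairs, size):
    """Yield successive lists of at most `size` (text, label) pairs."""
    iterator = iter(pairs)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def train_single_pass(pairs, classes, batch_size=1000):
    """Fit a linear classifier with one pass over `pairs` via partial_fit."""
    # Imported here so the batching helper above works without scikit-learn.
    from sklearn.feature_extraction.text import HashingVectorizer
    from sklearn.linear_model import SGDClassifier

    # HashingVectorizer is stateless, so it needs no fitting pass of its own.
    vectorizer = HashingVectorizer(n_features=2 ** 18, alternate_sign=False)
    model = SGDClassifier()

    for batch in micro_batches(pairs, batch_size):
        texts, labels = zip(*batch)
        X = vectorizer.transform(texts)
        # partial_fit must be told every class up front, because a single
        # micro batch may not contain all of them.
        model.partial_fit(X, list(labels), classes=classes)

    return vectorizer, model
```

Because only one micro batch is vectorized at a time, memory use stays bounded by the batch size rather than by the size of the training set.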
If you need access to our state-of-the-art models, please reach us at info@sadedegel.ai
This classifier assigns each Turkish text to one of 12 categories (sadedegel.dataset.tscorpus.CATEGORIES) by using a pipeline that includes sadedegel.
from sadedegel.prebuilt import news_classification
model = news_classification.load()
y_pred = model.predict([''])
To convert class ids to class labels, use:
from sadedegel.dataset.tscorpus import CATEGORIES
# Look up each predicted class id in CATEGORIES to get its label.
y_pred_label = [CATEGORIES[_y_pred] for _y_pred in y_pred]
The current prebuilt model has an average class prediction accuracy of 0.746 under 3-fold cross-validation.
This classifier assigns each Turkish tweet to one of two classes, OFF or NOT, based on whether the tweet contains profane language, by using a sadedegel pipeline.
from sadedegel.prebuilt import tweet_profanity
model = tweet_profanity.load() # Load latest version
y_pred = model.predict(['bir takım ağza alınmayacak sözcükler.'])
To convert predictions to profanity labels using the class mapping:
from sadedegel.prebuilt.tweet_profanity import load, CLASS_VALUES
model = load()
y_pred = model.predict(['bir takım ağza alınmayacak sözcükler.'])
y_pred_value = [CLASS_VALUES[_y_pred] for _y_pred in y_pred]
The current prebuilt tweet profanity model has a macro-F1 score of 0.6619 on the test set. The best model in SemEval-2020 Task 12 has an accuracy of 0.8258.
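For readers unfamiliar with the metric, macro-F1 is the unweighted mean of the per-class F1 scores, so the minority class counts as much as the majority class. The sketch below shows the arithmetic in plain Python; the function name macro_f1 and the toy labels are made up for illustration and are not part of sadedegel or of the evaluation data.

```python
def macro_f1(y_true, y_pred, labels):
    """Return the unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never
        # appears in either the true or the predicted labels.
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for this example only.
y_true = ["OFF", "OFF", "NOT", "NOT"]
y_pred = ["OFF", "NOT", "NOT", "NOT"]
print(macro_f1(y_true, y_pred, labels=["OFF", "NOT"]))
```

Note that macro-F1 and accuracy are different metrics, so the two figures are not directly comparable.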
This classifier assigns each Turkish tweet to one of two classes ('POSITIVE', 'NEGATIVE') by using the sadedegel built-in pipeline.
from sadedegel.prebuilt import tweet_sentiment
# We load our prebuilt model:
model = tweet_sentiment.load()
# Replace the empty list with the tweet texts to get sentiment predictions for.
y_pred = model.predict([])
The current prebuilt model has:
- a 3-fold cross-validation macro-F1 score of mean 0.7946 (std 0.0043)
- a 5-fold cross-validation macro-F1 score of mean 0.7989 (std 0.0055)
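The mean and std figures summarize the per-fold scores of a cross-validation run. A minimal sketch of that summary step is shown below. The helper name summarize_folds and the fold scores are hypothetical, and the use of the population standard deviation (the default in NumPy) is an assumption; the source does not state which variant was used.

```python
from statistics import mean, pstdev


def summarize_folds(fold_scores):
    """Return (mean, population std) of per-fold scores."""
    return mean(fold_scores), pstdev(fold_scores)


# Hypothetical per-fold macro-F1 scores, not the real sadedegel results.
fold_scores = [0.79, 0.80, 0.81]
score_mean, score_std = summarize_folds(fold_scores)
print(f"mean {score_mean:.4f}, std {score_std:.4f}")
```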