# Annotators


## Overview

Annotators are components (connectors/services) that annotate a given user's utterance.

An example of an annotator is NER, which may return a dictionary with `tokens` and `tags` keys:

{"tokens": ["Paris"], "tags": ["I-LOC"]}

Another example is the Sentiment Classification annotator, which can return a list of labels, e.g.:

["neutral", "speech"]

## Available English Annotators

| Name | Requirements | Description |
| --- | --- | --- |
| ASR | 40 MB RAM | calculates the overall ASR confidence for a given utterance and grades it as either very low, low, medium, or high (for Amazon markup) |
| Badlisted Words | 150 MB RAM | detects words and phrases from the badlist |
| Combined Classification | 1.5 GB RAM, 3.5 GB GPU | BERT-based model covering topic, dialog act, sentiment, toxicity, emotion, and factoid classification |
| COMeT Atomic | 2 GB RAM, 1.1 GB GPU | COMeT commonsense prediction model for Atomic |
| COMeT ConceptNet | 2 GB RAM, 1.1 GB GPU | COMeT commonsense prediction model for ConceptNet |
| Convers Evaluator Annotator | 1 GB RAM, 4.5 GB GPU | trained on Alexa Prize data from previous competitions; predicts whether the candidate response is interesting, comprehensible, on-topic, engaging, or erroneous |
| Emotion Classification | 2.5 GB RAM | emotion classification annotator |
| Entity Detection | 1.5 GB RAM, 3.2 GB GPU | extracts entities and their types from utterances |
| Entity Linking | 2.5 GB RAM, 1.3 GB GPU | finds Wikidata entity ids for the entities detected with Entity Detection |
| Entity Storer | 220 MB RAM | rule-based component that stores entities from the user's and socialbot's utterances, along with the detected attitude, in the dialogue state whenever an opinion expression is detected by patterns or by the MIDAS Classifier |
| Fact Random | 50 MB RAM | returns random facts for a given entity (for entities from the user utterance) |
| Fact Retrieval | 7.4 GB RAM, 1.2 GB GPU | extracts facts from Wikipedia and wikiHow |
| Intent Catcher | 1.7 GB RAM, 2.4 GB GPU | classifies user utterances into a number of predefined intents, trained on a set of phrases and regexps |
| KBQA | 2 GB RAM, 1.4 GB GPU | answers the user's factoid questions based on the Wikidata KB |
| MIDAS Classification | 1.1 GB RAM, 4.5 GB GPU | BERT-based model trained on a semantic classes subset of the MIDAS dataset |
| MIDAS Predictor | 30 MB RAM | BERT-based model trained on a semantic classes subset of the MIDAS dataset |
| NER | 2.2 GB RAM, 5 GB GPU | extracts person names and names of locations and organizations from uncased text |
| News API Annotator | 80 MB RAM | extracts the latest news about entities or topics using the GNews API; DeepPavlov Dream deployments use our own API key |
| Personality Catcher | 30 MB RAM | changes the system's personality description via the chat interface; works as a system command, and the response is a system-like message |
| Prompt Selector | 50 MB RAM | annotator that uses Sentence Ranker to rank prompts and select the `N_SENTENCES_TO_RETURN` most relevant ones (based on the questions provided in the prompts) |
| Property Extraction | 6.3 GiB RAM | extracts user attributes from utterances |
| Rake Keywords | 40 MB RAM | extracts keywords from utterances with the RAKE algorithm |
| Relative Persona Extractor | 50 MB RAM | annotator that uses Sentence Ranker to rank persona sentences and select the `N_SENTENCES_TO_RETURN` most relevant ones |
| Sentrewrite | 200 MB RAM | rewrites the user's utterances by replacing pronouns with specific names, providing more useful information to downstream components |
| Sentseg | 1 GB RAM | handles long and complex user utterances by splitting them into sentences and recovering punctuation |
| Spacy Nounphrases | 180 MB RAM | extracts noun phrases using spaCy and filters out generic ones |
| Speech Function Classifier | 1.1 GB RAM, 4.5 GB GPU | hierarchical algorithm based on several linear models and a rule-based approach that predicts the speech functions described by Eggins and Slade |
| Speech Function Predictor | 1.1 GB RAM, 4.5 GB GPU | yields probabilities of speech functions that can follow a speech function predicted by the Speech Function Classifier |
| Spelling Preprocessing | 50 MB RAM | pattern-based component that rewrites colloquial expressions into a more formal conversational style |
| Topic Recommendation | 40 MB RAM | offers a topic for further conversation using information about the discussed topics and the user's preferences; the current version is based on Reddit personalities (see the Dream Report for Alexa Prize 4) |
| Toxic Classification | 3.5 GB RAM, 3 GB GPU | toxic classification model from Transformers, specified via `PRETRAINED_MODEL_NAME_OR_PATH` |
| User Persona Extractor | 40 MB RAM | determines which age category the user belongs to based on keywords |
| Wiki Parser | 100 MB RAM | extracts Wikidata triplets for the entities detected with Entity Linking |
| Wiki Facts | 1.7 GB RAM | model that extracts related facts from Wikipedia and wikiHow pages |
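
Since each annotator above is deployed as a separate service, it can be queried (and smoke-tested) on its own. A hypothetical client call, assuming an annotator exposed locally on port 8021 with the same `/respond` endpoint and `sentences` payload as the sketch in the overview (all three are assumptions, not any specific component's contract):

```python
# Hypothetical smoke test for a locally deployed annotator;
# the URL, endpoint, and payload key are assumptions (see the sketch above).
import requests

response = requests.post(
    "http://localhost:8021/respond",
    json={"sentences": ["I moved to Paris last year."]},
    timeout=5,
)
response.raise_for_status()
print(response.json())  # e.g. [{"tokens": [...], "tags": [...]}]
```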

## Available Russian Annotators

| Name | Requirements | Description |
| --- | --- | --- |
| Badlisted Words | 50 MB RAM | detects obscene Russian words from the badlist |
| Entity Detection | 5.5 GB RAM | extracts entities and their types from utterances |
| Entity Linking | 400 MB RAM | finds Wikidata entity ids for the entities detected with Entity Detection |
| Fact Retrieval | 6.5 GiB RAM, 1 GiB GPU | extracts Wikipedia paragraphs relevant to the dialogue history |
| Intent Catcher | 900 MB RAM | classifies user utterances into a number of predefined intents, trained on a set of phrases and regexps |
| NER | 1.7 GB RAM, 4.9 GB GPU | extracts person names and names of locations and organizations from uncased text using a ruBERT-based (PyTorch) model |
| Sentseg | 2.4 GB RAM, 4.9 GB GPU | recovers punctuation using a ruBERT-based (PyTorch) model and splits text into sentences |
| Spacy Annotator | 250 MB RAM | token-wise annotations by spaCy |
| Spelling Preprocessing | 8 GB RAM | Russian Levenshtein correction model |
| Toxic Classification | 3.5 GB RAM, 3 GB GPU | toxic classification model from Transformers, specified via `PRETRAINED_MODEL_NAME_OR_PATH` |
| Wiki Parser | 100 MB RAM | extracts Wikidata triplets for the entities detected with Entity Linking |
| DialogRPT | 3.8 GB RAM, 2 GB GPU | DialogRPT model based on Russian DialoGPT by DeepPavlov and fine-tuned on Russian Pikabu comment sequences |

## Developing Your Own Annotator

TBD

## Resources