Merge pull request #594 from deeppavlov/dev
Release v1.12.0
dilyararimovna committed Dec 27, 2023
2 parents 3b8f2b7 + b6f6726 commit 8dab819
Showing 275 changed files with 12,427 additions and 262 deletions.
34 changes: 18 additions & 16 deletions MODELS.md


7 changes: 7 additions & 0 deletions README.md
@@ -72,6 +72,13 @@ and the provided information will be used in LLM-powered reply generation as a p

# Quick Start

### System Requirements

- Operating system: Ubuntu 18.04+, Windows 10+ (via WSL & WSL2), MacOS Big Sur;
- `docker` version 20 or higher;
- `docker-compose` version v1.29.2;
- RAM: from 2 GB (when using proxy), from 4 GB (for LLM-based prompted distributions), and from 20 GB (for the older scripted distributions).
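The version requirements above can be checked with a small shell helper that compares dotted version strings (a sketch; the literal versions below are illustrative, substitute the output of `docker --version` and `docker-compose --version`):

```shell
# Sketch: compare a dotted version string against a required minimum using
# sort -V. The literal version numbers below are illustrative only.
version_ge() {
  # succeeds when version $1 is greater than or equal to version $2
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# e.g. check the version reported by `docker --version` against the minimum of 20
if version_ge "20.10.7" "20"; then
  echo "docker version OK"
fi
```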

### Clone the repo

```
61 changes: 36 additions & 25 deletions README_ru.md
@@ -65,6 +65,15 @@ Deepy GoBot Base содержит аннотатор исправления оп

# Quick Start


### System Requirements

- Operating system: Ubuntu 18.04+, Windows 10+ (via WSL & WSL2), MacOS Big Sur;
- `docker` version 20 or higher;
- `docker-compose` version v1.29.2;
- RAM: from 2 GB (when using proxy containers), from 4 GB (for LLM-based distributions), and from 20 GB (for scripted distributions).


### Clone the repository

```
@@ -189,33 +198,35 @@ docker-compose -f docker-compose.yml -f assistant_dists/dream/docker-compose.ove

## Annotators

| Name | Requirements | Description |
|----------------------------|------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Badlisted Words | 50 MB RAM | detects obscene Russian words from the badlist |
| Entity Detection | 5.5 GB RAM | extracts entities and their types from utterances |
| Entity Linking | 400 MB RAM | finds Wikidata entity ids for the entities detected with Entity Detection |
| Fact Retrieval             | 6.5 GiB RAM, 1 GiB GPU | extracts Wikipedia paragraphs relevant to the dialog history                                                                                                                                  |
| Intent Catcher | 900 MB RAM | classifies user utterances into a number of predefined intents which are trained on a set of phrases and regexps |
| NER | 1.7 GB RAM, 4.9 GB GPU | extracts person names, names of locations, organizations from uncased text using ruBert-based (pyTorch) model |
| Relative Persona Extractor | 50 MB RAM | Annotator utilizing Sentence Ranker to rank persona sentences and selecting `N_SENTENCES_TO_RETURN` the most relevant sentences |
| Sentseg | 2.4 GB RAM, 4.9 GB GPU | recovers punctuation using ruBert-based (pyTorch) model and splits into sentences |
| Spacy Annotator | 250 MB RAM | token-wise annotations by Spacy |
| Spelling Preprocessing | 8 GB RAM | Russian Levenshtein correction model |
| Toxic Classification | 3.5 GB RAM, 3 GB GPU | Toxic classification model from Transformers specified as PRETRAINED_MODEL_NAME_OR_PATH |
| Wiki Parser | 100 MB RAM | extracts Wikidata triplets for the entities detected with Entity Linking |
| DialogRPT | 3.8 GB RAM, 2 GB GPU | DialogRPT model which is based on [Russian DialoGPT by DeepPavlov](https://huggingface.co/DeepPavlov/rudialogpt3_medium_based_on_gpt2_v2) and fine-tuned on Russian Pikabu Comment sequences |
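The Relative Persona Extractor row above can be illustrated with a toy ranking sketch (the word-overlap scorer below is a hypothetical stand-in; the real annotator calls the Sentence Ranker service to score relevance):

```python
# Toy sketch of the Relative Persona Extractor idea: rank persona sentences
# by relevance to the dialog context and keep the top N_SENTENCES_TO_RETURN.
# The word-overlap scorer is illustrative only.
N_SENTENCES_TO_RETURN = 2

def rank_persona(context, persona_sentences):
    def overlap(sentence):
        # count shared lowercase words between context and persona sentence
        return len(set(context.lower().split()) & set(sentence.lower().split()))
    ranked = sorted(persona_sentences, key=overlap, reverse=True)
    return ranked[:N_SENTENCES_TO_RETURN]

top = rank_persona(
    "i love hiking in the mountains",
    ["i enjoy hiking", "i have two cats", "mountains are my favorite place"],
)
# top[0] is the persona sentence sharing the most words with the context
```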

## Skills & Services
| Name | Requirements | Description |
|-----------------------|--------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| DialoGPT | 2.8 GB RAM, 2 GB GPU | [Russian DialoGPT by DeepPavlov](https://huggingface.co/DeepPavlov/rudialogpt3_medium_based_on_gpt2_v2) |
| Dummy Skill | | a fallback skill with multiple non-toxic candidate responses and random Russian questions |
| Personal Info Skill | 40 MB RAM | queries and stores user's name, birthplace, and location |
| DFF Generative Skill | 50 MB RAM | **[New DFF version]** generative skill which uses DialoGPT service to generate 3 different hypotheses |
| DFF Intent Responder | 50 MB RAM | provides template-based replies for some of the intents detected by Intent Catcher annotator |
| DFF Program Y Skill | 80 MB RAM | **[New DFF version]** Chatbot Program Y (https://github.com/keiffster/program-y) adapted for Dream socialbot |
| DFF Friendship Skill | 70 MB RAM | **[New DFF version]** DFF-based skill to greet the user in the beginning of the dialog, and forward the user to some scripted skill |
| DFF Template Skill | 50 MB RAM | **[New DFF version]** DFF-based skill that provides an example of DFF usage |
| Seq2seq Persona-based | 1.5 GB RAM, 1.5 GB GPU | generative service based on Transformers seq2seq model, the model was pre-trained on the PersonaChat dataset to generate a response conditioned on a several sentences of the socialbot's persona |
| Text QA               | 3.8 GiB RAM, 5.2 GiB GPU | answers questions based on a given text                                                                                                                                                           |



14 changes: 11 additions & 3 deletions annotators/BadlistedWordsDetector/README.md
@@ -4,8 +4,16 @@
Spacy-based user utterance annotator that detects words and phrases from the badlist

## I/O
**Input:** a list of user's utterances
```
["fucking hell", "he mishit the shot", "you asshole"]
```

**Output:** words and their tags
```
[{"bad_words": True}, {"bad_words": False}, {"bad_words": True}]
```
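The matching logic can be sketched roughly as follows (a simplified stand-in: the badlist here is a hypothetical two-word set, and plain string splitting replaces the real spaCy tokenization):

```python
# Simplified sketch of badlist matching (hypothetical word list; the real
# annotator uses spaCy tokenization and a curated badlist).
BADLIST = {"fucking", "asshole"}

def detect_bad_words(sentences):
    results = []
    for sentence in sentences:
        # crude tokenization: split on whitespace, strip common punctuation
        tokens = {token.strip(".,!?").lower() for token in sentence.split()}
        results.append({"bad_words": bool(tokens & BADLIST)})
    return results

annotations = detect_bad_words(["fucking hell", "he mishit the shot", "you asshole"])
# annotations -> [{"bad_words": True}, {"bad_words": False}, {"bad_words": True}]
```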


## Dependencies
none
20 changes: 16 additions & 4 deletions annotators/BadlistedWordsDetector_ru/README.md
@@ -1,11 +1,23 @@
# BadlistedWordsDetector for Russian

## Description

Spacy-based user utterance annotator that detects words and phrases from the badlist.

This version of the annotator works for the Russian Language.

## I/O
**Input:**
Takes a list of user's utterances
```
["не пизди.", "застрахуйте уже его", "пошел нахер!"]
```

**Output:**
Returns words and their tags
```
[{"bad_words": True}, {"bad_words": False}, {"bad_words": True}]
```

## Dependencies
none
102 changes: 102 additions & 0 deletions annotators/COMeT/README.md
@@ -5,6 +5,7 @@
COMeT is a Commonsense Transformers for Automatic Knowledge Graph Construction service based on the [comet-commonsense](https://github.com/atcbosselut/comet-commonsense) framework, written in Python 3.


### Quickstart from docker for COMeT with Atomic graph

```bash
@@ -36,6 +37,107 @@ docker-compose -f docker-compose.yml -f local.yml exec comet-conceptnet bash tes
| Average starting time | 4s | 3s |
| Average request execution time | 0.4s | 0.2s |

## Input/Output

**Input**
- `input`: the event or utterance to reason about
- `category`: a list of relation types to generate inferences for

an input example:
```
{
"input": "PersonX went to a mall",
"category": [
"xReact",
"xNeed",
"xAttr",
"xWant",
"oEffect",
"xIntent",
"oReact"
]
}
```
**Output**
generated commonsense inferences (`beams`) for each of the requested categories:
- xReact
- xNeed
- xAttr
- xWant
- oEffect
- xIntent
- oReact

an output example:
```
{
"xReact": {
"beams": [
"satisfied",
"happy",
"excited"
],
"effect_type": "xReact",
"event": "PersonX went to a mall"
},
"xNeed": {
"beams": [
"to drive to the mall",
"to get in the car",
"to drive to the mall"
],
"effect_type": "xNeed",
"event": "PersonX went to a mall"
},
"xAttr": {
"beams": [
"curious",
"fashionable",
"interested"
],
"effect_type": "xAttr",
"event": "PersonX went to a mall"
},
"xWant": {
"beams": [
"to buy something",
"to go home",
"to shop"
],
"effect_type": "xWant",
"event": "PersonX went to a mall"
},
"oEffect": {
"beams": [
"they go to the store",
"they go to the mall"
],
"effect_type": "oEffect",
"event": "PersonX went to a mall"
},
"xIntent": {
"beams": [
"to buy something",
"to shop",
"to buy things"
],
"effect_type": "xIntent",
"event": "PersonX went to a mall"
},
"oReact": {
"beams": [
"happy",
"interested"
],
"effect_type": "oReact",
"event": "PersonX went to a mall"
}
}
```
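A downstream component can iterate over the returned relations, for example flattening the beams into (relation, phrase) pairs (a sketch over a trimmed copy of the example output above):

```python
# Sketch: flattening COMeT output beams into (relation, phrase) pairs.
# comet_output is a trimmed copy of the example response above.
comet_output = {
    "xReact": {"beams": ["satisfied", "happy", "excited"],
               "effect_type": "xReact", "event": "PersonX went to a mall"},
    "xWant": {"beams": ["to buy something", "to go home", "to shop"],
              "effect_type": "xWant", "event": "PersonX went to a mall"},
}

def flatten_beams(output):
    # one (relation, phrase) pair per generated beam
    return [(rel, phrase)
            for rel, data in output.items()
            for phrase in data["beams"]]

pairs = flatten_beams(comet_output)
# pairs[0] -> ("xReact", "satisfied")
```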



## Dependencies

none
22 changes: 22 additions & 0 deletions annotators/ConversationEvaluator/README.md
@@ -0,0 +1,22 @@
# Conversation Evaluator

## Description
This annotator is trained on the Alexa Prize data from the previous competitions and predicts whether the candidate response is interesting, comprehensible, on-topic, engaging, or erroneous.

## Input/Output

**Input**
- possible assistant's replies
- user's past responses

**Output**
tags
- `isResponseComprehensible`
- `isResponseErroneous`
- `isResponseInteresting`
- `isResponseOnTopic`
- `responseEngagesUser`

with their probabilities
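One way a response selector might consume these probabilities is a weighted sum over the tags (a hypothetical sketch; the weights and scores below are illustrative, not the ones used by Dream):

```python
# Hypothetical ranking of response candidates by evaluator tag probabilities.
# Tag names match the output above; weights are illustrative only.
WEIGHTS = {
    "isResponseInteresting": 1.0,
    "isResponseOnTopic": 1.0,
    "responseEngagesUser": 1.0,
    "isResponseComprehensible": 0.5,
    "isResponseErroneous": -2.0,  # erroneous responses are penalized
}

def score(evaluation):
    return sum(WEIGHTS[tag] * prob for tag, prob in evaluation.items())

candidates = [
    {"isResponseInteresting": 0.9, "isResponseOnTopic": 0.8,
     "responseEngagesUser": 0.7, "isResponseComprehensible": 0.9,
     "isResponseErroneous": 0.1},
    {"isResponseInteresting": 0.2, "isResponseOnTopic": 0.3,
     "responseEngagesUser": 0.2, "isResponseComprehensible": 0.8,
     "isResponseErroneous": 0.6},
]
best = max(range(len(candidates)), key=lambda i: score(candidates[i]))
# best -> index of the highest-scoring candidate
```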

## Dependencies
none
16 changes: 15 additions & 1 deletion annotators/DeepPavlovEmotionClassification/README.md
@@ -1 +1,15 @@
# DeepPavlov Emotion Classification Annotator

## Description

BERT Base model for emotion classification trained on a custom dataset (described in more detail in our article)

## I/O

**Input**

**Output:**


## Dependencies
none
9 changes: 9 additions & 0 deletions annotators/DeepPavlovFactoidClassification/README.md
@@ -0,0 +1,9 @@
# DeepPavlov Factoid Classification
## Description

## Input/Output

**Input**
**Output**

## Dependencies
4 changes: 3 additions & 1 deletion annotators/IntentCatcherTransformers/README.md
@@ -1,5 +1,7 @@
## Intent Catcher based on Transformers

The Intent Catcher annotator adapts the dialog system to particular tasks.
It detects user intents that are then addressed by the DFF Intent Responder Skill.

The English version was trained on the `intent_phrases.json` dataset using the `DeepPavlov` library via the command:
```
28 changes: 28 additions & 0 deletions annotators/NER/README.md
@@ -0,0 +1,28 @@
# Named Entity Recognition Annotator

## Description
Extracts person names, locations, and organization names from uncased text

## Input/Output

**Input**
A list of user utterances
```
["john peterson is my brother.", "he lives in New York."]
```


**Output**
Each detected named entity annotated with
- confidence level
- the entity's position in the sentence (`start_pos` and `end_pos`)
- the entity text itself
- the entity type

```
[{"confidence": 1, "end_pos": 5, "start_pos": 3, "text": "New York", "type": "LOC"}]
```
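Under the assumption (inferred from the example above) that `start_pos`/`end_pos` index whitespace tokens with an exclusive end, the entity span can be recovered like this:

```python
# Sketch: recovering an entity span from token positions in the annotation.
# Assumption: start_pos/end_pos index whitespace tokens, end exclusive.
utterance = "he lives in New York."
annotation = {"confidence": 1, "end_pos": 5, "start_pos": 3,
              "text": "New York", "type": "LOC"}

tokens = utterance.split()
# join the annotated token range, dropping trailing sentence punctuation
span = " ".join(tokens[annotation["start_pos"]:annotation["end_pos"]]).rstrip(".")
# span -> "New York"
```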

## Dependencies
none
9 changes: 9 additions & 0 deletions annotators/NER_deeppavlov/README.md
@@ -0,0 +1,9 @@
# NER (DeepPavlov)
## Description

## Input/Output

**Input**
**Output**

## Dependencies
9 changes: 9 additions & 0 deletions annotators/SentRewrite/README.md
@@ -0,0 +1,9 @@
# Sentence Rewriting
## Description

## Input/Output

**Input**
**Output**

## Dependencies
9 changes: 9 additions & 0 deletions annotators/SentSeg/README.md
@@ -0,0 +1,9 @@
# Sentence Segmentation
## Description

## Input/Output

**Input**
**Output**

## Dependencies
7 changes: 7 additions & 0 deletions annotators/asr/README.md
@@ -5,6 +5,13 @@
The ASR component lets users provide speech input via its `http://_service_name_:4343/asr?user_id=` endpoint. To do so, attach the recorded voice as a 16 kHz `.wav` file.

## I/O
**Input:**
the user's utterance: recorded voice as a 16 kHz `.wav` file

**Output:**
`asr_confidence`: the estimated confidence of the speech recognition

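Building the request URL can be sketched as follows (the service name and user id below are placeholders; the recorded `.wav` is then attached to a POST request to this URL):

```python
from urllib.parse import urlencode

# Sketch: constructing the ASR endpoint URL. The service name and user id
# are placeholders -- substitute the actual values for your deployment.
service_name = "asr"   # placeholder for _service_name_
user_id = "some_user"  # placeholder
url = f"http://{service_name}:4343/asr?{urlencode({'user_id': user_id})}"
# url -> "http://asr:4343/asr?user_id=some_user"
# the 16 kHz .wav recording is then attached to a POST request to this URL
```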

## Dependencies
none
