Feat dists readmes #596

Open
wants to merge 28 commits into base: dev
1f7480a
first batch of readmes to the key services, TBC
nstsj Jul 5, 2023
871fa3e
readmes for dream multilingual dist
nstsj Jul 8, 2023
cb96cbd
main distribution readmes upd, TBC
nstsj Jul 8, 2023
cd3fef8
first batch of readmes to the key services, TBC
nstsj Jul 5, 2023
c0ed857
readmes for dream multilingual dist
nstsj Jul 8, 2023
1b76ab2
main distribution readmes upd, TBC
nstsj Jul 8, 2023
bb8ca29
readmes upd
nstsj Jul 10, 2023
4f52edc
fixed readmes, those old ones with wrong format
nstsj Jul 10, 2023
7f05210
some more readmes
nstsj Jul 10, 2023
99be99d
added readme templates for the components that hadn't have them yet. …
nstsj Jul 26, 2023
3bcb9bd
readme upd for skills -- templates added
nstsj Oct 25, 2023
b771590
readme upd for skills -- templates added
nstsj Oct 25, 2023
1f3be1f
fixed readmes
nstsj Jul 27, 2023
6fc1aff
upd readmes in skill selectors and response selectors
nstsj Jul 29, 2023
e4a467a
upd in dependencies and I/O
nstsj Aug 4, 2023
7d7e41a
readmes for dream multilingual dist
nstsj Jul 8, 2023
889d8b1
readme upd - rebasing the branch after pulling the fresh dev
nstsj Oct 25, 2023
9762bdf
readme upd - rebasing the branch after pulling the fresh dev
nstsj Oct 25, 2023
f0851ab
fixes during rebase
nstsj Oct 25, 2023
fee4410
fixes during rebase
nstsj Oct 25, 2023
11318c7
fixes during rebase
nstsj Oct 25, 2023
7fb04ff
fixes during rebase
nstsj Oct 25, 2023
4ea75ac
fixes during rebase
nstsj Oct 25, 2023
2429faa
missing readme added
nstsj Aug 28, 2023
4788051
fixes during rebase
nstsj Oct 25, 2023
97956c4
fixed merge conflicts, re-updated files
nstsj Nov 28, 2023
58bf98b
ancient components readmes upd: added more content and examples
nstsj Nov 28, 2023
8fc38c9
added readmes for main Dream distributions, explaining their purpose …
nstsj Nov 28, 2023
14 changes: 11 additions & 3 deletions annotators/BadlistedWordsDetector/README.md
@@ -4,8 +4,16 @@
Spacy-based user utterance annotator that detects words and phrases from the badlist

## I/O
input: "sentences": ["fucking hell", "he mishit the shot", "you asshole"],
output: words and their tags
[{"bad_words": True}, {"bad_words": False}, {"bad_words": True}]
**Input:** a list of user utterances
```
["fucking hell", "he mishit the shot", "you asshole"]
```

**Output:** words and their tags
```
[{"bad_words": True}, {"bad_words": False}, {"bad_words": True}]
```
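For readers wiring this annotator into a pipeline, the request/response pair above can be sketched in Python. The payload and response shapes are copied from the I/O section; how the response is fetched from the service is not shown:

```python
# Shapes taken from the I/O section above; the service call itself is omitted.
payload = {"sentences": ["fucking hell", "he mishit the shot", "you asshole"]}

# The annotator returns one dict per input sentence:
response = [{"bad_words": True}, {"bad_words": False}, {"bad_words": True}]

# A typical consumer keeps only the flagged sentences:
flagged = [s for s, r in zip(payload["sentences"], response) if r["bad_words"]]
# flagged == ["fucking hell", "you asshole"]
```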


## Dependencies
none
20 changes: 16 additions & 4 deletions annotators/BadlistedWordsDetector_ru/README.md
@@ -1,11 +1,23 @@
# BadlistedWordsDetector
# BadlistedWordsDetector for Russian

## Description

Spacy-based user utterance annotator that detects words and phrases from the badlist. This version of the annotator works for the Russian Language.
Spacy-based user utterance annotator that detects words and phrases from the badlist.

This version of the annotator works for the Russian language.

## I/O
input: user input as a str, lang = ru
output: json dict
**Input:**
Takes a list of user utterances
```
["не пизди.", "застрахуйте уже его", "пошел нахер!"]
```

**Output:**
Returns words and their tags
```
[{"bad_words": True}, {"bad_words": False}, {"bad_words": True}]
```

## Dependencies
none
102 changes: 102 additions & 0 deletions annotators/COMeT/README.md
@@ -5,6 +5,7 @@
COMeT is a Commonsense Transformers for Automatic Knowledge Graph Construction service based
on [comet-commonsense](https://github.com/atcbosselut/comet-commonsense) framework written in Python 3.


### Quickstart from docker for COMeT with Atomic graph

```bash
@@ -36,6 +37,107 @@ docker-compose -f docker-compose.yml -f local.yml exec comet-conceptnet bash tes
| Average starting time | 4s | 3s |
| Average request execution time | 0.4s | 0.2s |

## Input/Output

**Input**
- `input`: a short event description involving PersonX
- `category`: a list of Atomic relation categories to generate inferences for

An input example:
```
{
"input": "PersonX went to a mall",
"category": [
"xReact",
"xNeed",
"xAttr",
"xWant",
"oEffect",
"xIntent",
"oReact"
]
}

```
**Output**
For each requested category, a list of generated commonsense inferences (beams) about the event:
- xReact
- xNeed
- xAttr
- xWant
- oEffect
- xIntent
- oReact

An output example:
```
{
"xReact": {
"beams": [
"satisfied",
"happy",
"excited"
],
"effect_type": "xReact",
"event": "PersonX went to a mall"
},
"xNeed": {
"beams": [
"to drive to the mall",
"to get in the car",
"to drive to the mall"
],
"effect_type": "xNeed",
"event": "PersonX went to a mall"
},
"xAttr": {
"beams": [
"curious",
"fashionable",
"interested"
],
"effect_type": "xAttr",
"event": "PersonX went to a mall"
},
"xWant": {
"beams": [
"to buy something",
"to go home",
"to shop"
],
"effect_type": "xWant",
"event": "PersonX went to a mall"
},
"oEffect": {
"beams": [
"they go to the store",
"they go to the mall"
],
"effect_type": "oEffect",
"event": "PersonX went to a mall"
},
"xIntent": {
"beams": [
"to buy something",
"to shop",
"to buy things"
],
"effect_type": "xIntent",
"event": "PersonX went to a mall"
},
"oReact": {
"beams": [
"happy",
"interested"
],
"effect_type": "oReact",
"event": "PersonX went to a mall"
}
}
```
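A minimal sketch of consuming a COMeT response shaped like the example above (the response dict is abbreviated from the example; the service call itself is not shown):

```python
# Abbreviated copy of the example response above (two categories shown).
comet_response = {
    "xReact": {
        "beams": ["satisfied", "happy", "excited"],
        "effect_type": "xReact",
        "event": "PersonX went to a mall",
    },
    "xWant": {
        "beams": ["to buy something", "to go home", "to shop"],
        "effect_type": "xWant",
        "event": "PersonX went to a mall",
    },
}

# Each category maps to ranked "beams" (commonsense inferences);
# the first beam is the top-ranked one.
top_inferences = {cat: body["beams"][0] for cat, body in comet_response.items()}
# top_inferences == {"xReact": "satisfied", "xWant": "to buy something"}
```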



## Dependencies

none
22 changes: 22 additions & 0 deletions annotators/ConversationEvaluator/README.md
@@ -0,0 +1,22 @@
# Conversation Evaluator

## Description
This annotator is trained on the Alexa Prize data from the previous competitions and predicts whether the candidate response is interesting, comprehensible, on-topic, engaging, or erroneous.

## Input/Output

**Input**
- possible assistant's replies
- the user's past responses

**Output**
The following tags, each with its probability:
- `isResponseComprehensible`
- `isResponseErroneous`
- `isResponseInteresting`
- `isResponseOnTopic`
- `responseEngagesUser`

## Dependencies
none
16 changes: 15 additions & 1 deletion annotators/DeepPavlovEmotionClassification/README.md
@@ -1 +1,15 @@
BERT Base model for emotion classification which learned at the custom dataset(described more precisely in our article)
# DeepPavlov Emotion Classification Annotator

## Description

BERT Base model for emotion classification, trained on a custom dataset (described in more detail in our article)

## I/O

**Input:**

**Output:**


## Dependencies
none
9 changes: 9 additions & 0 deletions annotators/DeepPavlovFactoidClassification/README.md
@@ -0,0 +1,9 @@
# Title
## Description

## Input/Output

**Input**
**Output**

## Dependencies
28 changes: 28 additions & 0 deletions annotators/NER/README.md
@@ -0,0 +1,28 @@
# Named Entity Recognition Annotator

## Description
Extracts person names, locations, and organization names from uncased text

## Input/Output

**Input**
A list of user utterances
```
["john peterson is my brother.", "he lives in New York."]
```


**Output**
A user utterance annotated with:
- confidence level
- the named entity's position in the sentence (`start_pos` and `end_pos`)
- the named entity itself
- the named entity type

```
[{"confidence": 1, "end_pos": 5, "start_pos": 3, "text": "New York", "type": "LOC"}],
```
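To make the annotation format concrete, here is a small sketch that filters entities from a response shaped like the example above (the entity dict is copied from the example; fetching it from the service is not shown, and the empty first sentence annotation is a simplification):

```python
# One list of entity dicts per input sentence, as in the example above.
ner_response = [
    [],  # "john peterson is my brother." -- entities omitted in this sketch
    [{"confidence": 1, "end_pos": 5, "start_pos": 3,
      "text": "New York", "type": "LOC"}],
]

# Collect all detected locations across sentences:
locations = [ent["text"]
             for sentence in ner_response
             for ent in sentence
             if ent["type"] == "LOC"]
# locations == ["New York"]
```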

## Dependencies
none
9 changes: 9 additions & 0 deletions annotators/NER_deeppavlov/README.md
@@ -0,0 +1,9 @@
# Title
## Description

## Input/Output

**Input**
**Output**

## Dependencies
9 changes: 9 additions & 0 deletions annotators/SentRewrite/README.md
@@ -0,0 +1,9 @@
# Title
## Description

## Input/Output

**Input**
**Output**

## Dependencies
9 changes: 9 additions & 0 deletions annotators/SentSeg/README.md
@@ -0,0 +1,9 @@
# Title
## Description

## Input/Output

**Input**
**Output**

## Dependencies
7 changes: 7 additions & 0 deletions annotators/asr/README.md
@@ -5,6 +5,13 @@
ASR component allows users to provide speech input via its `http://_service_name_:4343/asr?user_id=` endpoint. To do so, attach the recorded voice as a `.wav` file, 16KHz.

## I/O
**Input:**
user utterance: recorded voice as a `.wav` file

**Output:**
`asr_confidence`: the estimated confidence of the user's speech recognition
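The documented endpoint takes the recording as an attached `.wav` file. A sketch of building the request in Python follows; the `asr_url` helper, the service name `asr`, and the response field access in the comment are illustrative assumptions, while the port and query parameter come from the endpoint documented above:

```python
def asr_url(service_name: str, user_id: str) -> str:
    """Build the documented ASR endpoint for a given service name and user."""
    return f"http://{service_name}:4343/asr?user_id={user_id}"

# Sending the recording would then look roughly like:
#   import requests
#   with open("utterance.wav", "rb") as f:  # 16 KHz WAV, as required above
#       resp = requests.post(asr_url("asr", "some_user"), files={"file": f})

# asr_url("asr", "some_user") == "http://asr:4343/asr?user_id=some_user"
```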


## Dependencies
none

6 changes: 4 additions & 2 deletions annotators/combined_classification/README.md
@@ -21,12 +21,14 @@ The models were trained on the following datasets:

The model also contains 3 replacement models for Amazon services.

The models (multitask and comparative single task) were trained with initial learning rate 2e-5(with validation patience 2 it could be dropped 2 times), batch size 32,optimizer adamW(betas (0.9,0.99) and early stop on 3 epochs. The criteria on early stopping was average accuracy for all tasks for multitask models, or the single-task accuracy for singletask models.
The models (multitask and comparative single-task) were trained with an initial learning rate of 2e-5 (with validation patience 2, it could be dropped 2 times), batch size 32, optimizer AdamW (betas (0.9, 0.99)), and early stopping after 3 epochs. The early-stopping criterion was the average accuracy across all tasks for multitask models, or the single-task accuracy for single-task models.

This model (with a distilbert-base-uncased backbone) takes only 2439 Mb for 9 tasks, whereas a single-task model with the same backbone takes almost the same memory (~2437 Mb) for each of these 9 tasks.

## I/O
text here if i/o specified
**Input:** immediate user utterances (+ optional history of previous utterances)

**Output:** tags for each utterance (based on toxic/topic/emotion/sentiment/factoid/midas classification)

## Dependencies
none

6 changes: 4 additions & 2 deletions annotators/combined_classification_lightweight/README.md
@@ -21,13 +21,15 @@ The models were trained on the following datasets:

The model also contains 3 replacement models for Amazon services.

The models (multitask and comparative single task) were trained with initial learning rate 2e-5(with validation patience 2 it could be dropped 2 times), batch size 32,optimizer adamW(betas (0.9,0.99) and early stop on 3 epochs. The criteria on early stopping was average accuracy for all tasks for multitask models, or the single-task accuracy for singletask models.
The models (multitask and comparative single-task) were trained with an initial learning rate of 2e-5 (with validation patience 2, it could be dropped 2 times), batch size 32, optimizer AdamW (betas (0.9, 0.99)), and early stopping after 3 epochs. The early-stopping criterion was the average accuracy across all tasks for multitask models, or the single-task accuracy for single-task models.

This model (with a huawei-noah/TinyBERT_General_4L_312D backbone) takes 42% less time for CPU-only inference than combined_classification, while using only ~1.5 Gb of memory instead of the 2909 Mb for combined_classification. At the same time, its average accuracy and average F1 are only ~1.5% lower than for combined_classification, and this drop is consistent across all tasks.


## I/O

**Input:** immediate user utterances (+ optional history of previous utterances)

**Output:** tags for each utterance (based on toxic/topic/emotion/sentiment/factoid/midas classification)

## Dependencies
none

41 changes: 26 additions & 15 deletions annotators/custom_entity_linking/README.md
@@ -4,29 +4,40 @@
This component is an annotator that semantically links entities detected in user utterances. Entities are then bound via relations.

Relation examples:
- favorite animal
- like animal
- favorite book
- like read
- favorite movie
- favorite food
- like food
- favorite drink
- like drink
- favorite sport
- like sports
- `favorite animal`
- `like animal`
- `favorite book`
- `like read`
- `favorite movie`
- `favorite food`
- `like food`
- `favorite drink`
- `like drink`
- `favorite sport`
- `like sports`


## I/O

**Inpunt**
**Input**
Takes a list of `user_id`, entity substrings, and `entity_tags`

An input example:
```
```

**Output:**
the annotator returns:
**Output:**
processed information about:
- entities
- entity_id (ids for multiple entities)
- entity_confidence score
- entity_id_tags


An output example:
```
```

## Dependencies
- annotators.ner
- annotators.entity_detection
- annotators.spacy_nounphrases
9 changes: 9 additions & 0 deletions annotators/dialog_breakdown/README.md
@@ -0,0 +1,9 @@
# Title
## Description

## Input/Output

**Input**
**Output**

## Dependencies