Added new annotator solving the NLI problem #311

Kolpnick · 2023-02-03T08:38:38Z

No description provided.

dilyararimovna · 2023-02-06T05:32:19Z

annotators/ConveRTBasedNLI/Dockerfile

+
+RUN mkdir /cache
+
+COPY . .


Copy only the dependencies file first and install the dependencies, and only then copy the whole folder. In your current version, the dependencies will be reinstalled every time any file in the folder changes.

dilyararimovna · 2023-02-06T05:33:35Z

annotators/ConveRTBasedNLI/Dockerfile

+RUN mkdir /convert_model
+RUN tar -xf nocontext_tf_model.tar.gz --directory /convert_model
+
+ENV TRAINED_MODEL_PATH model.h5


I see that you added the model.h5 weights file to git. That is not allowed: weights must be downloaded, even if the file is small.

dilyararimovna · 2023-02-06T05:34:08Z

annotators/ConveRTBasedNLI/requirements.txt

+sentry-sdk==0.12.3
+jinja2<=3.0.3
+Werkzeug<=2.0.3
+protobuf==3.20.*


Please use either == or <= with a fixed version, no wildcards.
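A minimal sketch of how this rule could be checked automatically. The helper `unpinned_requirements` and the regular expression are hypothetical, not part of the repository; they accept only `==` or `<=` followed by a fully numeric version.

```python
import re

# Hypothetical rule: package name, then == or <=, then a fully numeric version.
PINNED = re.compile(r"^[A-Za-z0-9_.\-\[\]]+\s*(==|<=)\s*[0-9]+(\.[0-9]+)*$")


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned with == or <= to a fixed version."""
    stripped = (line.strip() for line in lines)
    return [
        line
        for line in stripped
        # Skip blank lines and comments, keep everything that fails the pin rule.
        if line and not line.startswith("#") and not PINNED.match(line)
    ]
```

Run against the four lines quoted above, only `protobuf==3.20.*` is reported.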

dilyararimovna · 2023-02-06T05:34:35Z

annotators/ConveRTBasedNLI/server.py

+    candidates = request.json.get("candidates", [])
+    history = request.json.get("history", [])
+    logger.info(f"Candidates: {candidates}")
+    logger.info(f"History: {history}")


Extra logs can be moved to debug level.
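A minimal sketch of this suggestion with the standard `logging` module: the payload dumps go to DEBUG, so they disappear at the usual INFO level. The function name `log_request_payload` and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("convert_based_nli")  # hypothetical logger name
logger.setLevel(logging.INFO)


def log_request_payload(candidates, history):
    """Log the request payload at DEBUG level so INFO logs stay compact."""
    # %-style arguments are only formatted if DEBUG is actually enabled.
    logger.debug("Candidates: %s", candidates)
    logger.debug("History: %s", history)
```

Lowering the logger level to `logging.DEBUG` brings the payload back when it is needed for troubleshooting.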

dilyararimovna · 2023-02-06T05:34:54Z

annotators/ConveRTBasedNLI/server.py

+    logger.info(f"History: {history}")
+    result = annotator.candidate_selection(candidates, history)
+    total_time = time.time() - start_time
+    logger.info(f"Annotator candidate prediction time: {total_time: .3f}s")


Which annotator? The log should contain the component name, as in the other components.

dilyararimovna · 2023-02-06T05:34:57Z

annotators/ConveRTBasedNLI/server.py

+    logger.info(f"sentence: {response}")
+    result = annotator.response_encoding(response)
+    total_time = time.time() - start_time
+    logger.info(f"Annotator response encoding time: {total_time: .3f}s")


Which annotator? The log should contain the component name, as in the other components.
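One way to address both timing-log comments, as a sketch. The name `convert-based-nli` is taken from the container name mentioned elsewhere in this thread. The exact wording used by other components is not shown here, so the message format and the helpers `format_exec_time` and `timed_call` are assumptions.

```python
import logging
import time

logger = logging.getLogger(__name__)

COMPONENT_NAME = "convert-based-nli"  # assumed from the container name in this thread


def format_exec_time(component, action, seconds):
    """Build a timing message that names the component explicitly."""
    return f"{component} {action} exec time: {seconds:.3f}s"


def timed_call(func, *args, action="candidate selection"):
    """Call func(*args), log how long it took, and return its result."""
    start_time = time.time()
    result = func(*args)
    total_time = time.time() - start_time
    logger.info(format_exec_time(COMPONENT_NAME, action, total_time))
    return result
```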

dilyararimovna · 2023-02-06T06:00:51Z

state_formatters/dp_formatters.py

+        history = [u["text"] for u in dialog["bot_utterances"][-20:]]
+    else:
+        history = []
+    return [{"candidates": hypots, "history": history}]


No, the lengths within a batch must be equal. You have a single history sample, so it has to be duplicated for each hypothesis, because batches can get shuffled.

dilyararimovna · 2023-03-28T06:45:40Z

state_formatters/dp_formatters.py

+    hypots = [h["text"] for h in hypotheses]
+    last_bot_utterances = [u["text"] for u in dialog["bot_utterances"][-20:]]
+    last_bot_utterances = [last_bot_utterances for h in hypotheses]
+    return [{"sentences": hypots, "last_bot_utterances": last_bot_utterances}]


    last_bot_utterances = [u["text"] for u in dialog["bot_utterances"][-20:]]
    return [{"sentences": hypots, "last_bot_utterances": [last_bot_utterances] * len(hypots)}]
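For illustration, the suggestion can be wrapped in a self-contained function. The name `convert_nli_formatter` and its two-argument signature are hypothetical; the real formatter in `state_formatters/dp_formatters.py` may obtain the hypotheses differently.

```python
def convert_nli_formatter(dialog, hypotheses):
    """Build a batch in which every hypothesis is paired with the same bot history."""
    hypots = [h["text"] for h in hypotheses]
    # Keep at most the last 20 bot utterances as the history sample.
    last_bot_utterances = [u["text"] for u in dialog["bot_utterances"][-20:]]
    # Repeat the single history sample once per hypothesis so both lists have equal length.
    return [{"sentences": hypots, "last_bot_utterances": [last_bot_utterances] * len(hypots)}]
```

Note that `[x] * n` repeats a reference to the same list object, which is fine as long as the history is only read downstream.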

smilni

In addition to the per-file comments:

Run black (black -l 120 .) and flake8 over all files, otherwise the PR will not merge.
Resolve the merge conflicts.
Add the new annotator in three places: components.json in the root folder, plus a component.yml and a pipeline.yml created inside your annotator's folder. You can see how to do this in a fresh dev, for example in annotators/asr.

smilni · 2023-03-27T18:34:01Z

annotators/ConveRTBasedNLI/test.py

+                       'entailment': 0.0019739873241633177,
+                       'neutral': 0.0290225762873888,
+                       'contradiction': 0.9690034985542297}]
+


The test does not pass for me: the values differ slightly.

Wild guess: a random seed is not fixed somewhere it should be?
Or the model is not switched to inference mode; this needs to be checked.

Do not remove the numbers; use round() to two decimal places.
But it is still better to additionally check what I wrote about the seed and inference mode.
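A sketch of the `round()` approach for the test, using the three scores quoted above. The helper `round_scores` is hypothetical; it only makes the comparison tolerant of small numeric differences and does not replace checking the seed and inference mode.

```python
def round_scores(scores, ndigits=2):
    """Round every class probability so small numeric drift does not fail the test."""
    return {label: round(value, ndigits) for label, value in scores.items()}


# Scores copied from the test file quoted in this comment.
predicted = {
    "entailment": 0.0019739873241633177,
    "neutral": 0.0290225762873888,
    "contradiction": 0.9690034985542297,
}
expected = {"entailment": 0.0, "neutral": 0.03, "contradiction": 0.97}
```

The test would then assert `round_scores(predicted) == expected` instead of comparing the raw floats.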

smilni · 2023-03-27T19:03:09Z

response_selectors/convers_evaluation_based_selector/server.py

+
+                curr_is_toxics.append(is_toxic_or_contr_utterance)
+
+                if is_toxic_or_contr_utterance:


If this changes, it is better to also change sentry_sdk.capture_message and the msg for the logger (lines 105-111) to say that this covers not only badlisted phrases but also contradiction. Otherwise the message will be very confusing later)

smilni · 2023-03-27T19:32:08Z

assistant_dists/dream/docker-compose.override.yml

+        limits:
+          memory: 1G
+        reservations:
+          memory: 1G


Did you test the annotator inside Dream? For me it started fine, but on the third utterance it crashed: dream_convert-based-nli_1 exited with code 137, so it ran out of memory. You must test on 3+ dialogs, the longer the better, with at least five to ten user turns.

Right now you allocate 1 GB for it; it probably needs more. Try raising the limit and watching docker stats.

smilni · 2023-03-27T19:42:24Z

common/utils.py

@@ -1289,6 +1289,13 @@ def is_toxic_or_badlisted_utterance(annotated_utterance):
    return toxic_result or any([badlist_result.get(bad, False) for bad in ["bad_words", "inappropriate", "profanity"]])


+def is_contradiction_utterance(annotated_utterance):
+    contradiction_result = annotated_utterance.get("annotations", {}).get("convert_based_nli")["decision"]


It is safer to always fetch values through get with a default value. I suggest changing this to contradiction_result = annotated_utterance.get("annotations", {}).get("convert_based_nli", {}).get("decision", "")

smilni · 2023-03-27T19:42:58Z

common/utils.py

+    contradiction_result = True if "contradiction" in contradiction_result else False
+
+    return contradiction_result


More readable as a single line: return "contradiction" in contradiction_result
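Putting the two suggestions for `common/utils.py` together gives roughly the following sketch. The shape of the `decision` value is not shown in this thread; the `in` check works whether it is a string or a list of labels.

```python
def is_contradiction_utterance(annotated_utterance):
    """Return True if the NLI annotation marks the utterance as a contradiction."""
    # Every lookup has a default, so a missing annotation yields "" instead of raising.
    contradiction_result = (
        annotated_utterance.get("annotations", {}).get("convert_based_nli", {}).get("decision", "")
    )
    return "contradiction" in contradiction_result
```

With the original `.get("convert_based_nli")["decision"]`, an utterance without this annotation would raise `TypeError`, because `None` is not subscriptable.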

smilni · 2023-03-28T06:36:28Z

annotators/ConveRTBasedNLI/Dockerfile

+ARG DATA_URL=https://github.com/davidalami/ConveRT/releases/download/1.0/nocontext_tf_model.tar.gz
+ARG NEL_URL=https://github.com/Kolpnick/dream/raw/convert_based_nli/annotators/ConveRTBasedNLI/model.h5


You need to ask Fedor Ignatov to put these files on our share and give you the path for downloading them (they must be downloaded from our share, not from GitHub).

smilni · 2023-03-28T06:38:58Z

annotators/ConveRTBasedNLI/Dockerfile

+COPY requirements.txt .
+RUN pip install -r requirements.txt


Install all dependencies first, and only then download the model (move these lines above the model downloads).

smilni · 2023-03-29T10:08:50Z

response_selectors/convers_evaluation_based_selector/server.py

                    with sentry_sdk.push_scope() as scope:
                        scope.set_extra("utterance", skill_data["text"])
                        scope.set_extra("selected_skills", skill_data)
-                        sentry_sdk.capture_message("response selector got candidate with badlisted phrases")
+                        sentry_sdk.capture_message("response selector got candidate with badlisted phrases and detected contradiction")


or, not and
it is one or the other, not both)
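One possible way to keep the message accurate is to build it from the reasons that actually fired. This is a sketch; the helper `filter_reason_message` is hypothetical and not the code that was merged.

```python
def filter_reason_message(is_toxic, is_contradiction):
    """Describe why a candidate was filtered, naming only the checks that fired."""
    reasons = []
    if is_toxic:
        reasons.append("badlisted phrases")
    if is_contradiction:
        reasons.append("detected contradiction")
    if not reasons:
        # Nothing fired, so there is nothing to report.
        return ""
    return "response selector got candidate with " + " and ".join(reasons)
```

The same string can then be passed to both `sentry_sdk.capture_message` and the logger, so the two never disagree.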

smilni · 2023-03-29T10:08:59Z

response_selectors/convers_evaluation_based_selector/server.py

                        msg = (
-                            "response selector got candidate with badlisted phrases:\n"
+                            "response selector got candidate with badlisted phrases and detected contradiction:\n"


smilni · 2023-03-29T10:52:08Z

annotators/ConveRTBasedNLI/test.py

+                       'entailment': 0.0019739873241633177,
+                       'neutral': 0.0290225762873888,
+                       'contradiction': 0.9690034985542297}]
+


Do not remove the numbers; use round() to two decimal places.
But it is still better to additionally check what I wrote about the seed and inference mode.

smilni · 2023-03-29T10:53:47Z

assistant_dists/dream/docker-compose.override.yml

        reservations:
-          memory: 1G


Did you check that it works with 1.5G of memory in Dream? On at least 3 dialogs and 7 user utterances.

I mean, did you bring up the whole Dream distribution and talk to it?
If not, this needs to be done.

If anything fails during startup, pull and merge a fresh dev; that should fix it.

dilyararimovna · 2023-04-04T15:11:22Z

annotators/ConveRTBasedNLI/Dockerfile

+FROM python:3.7.4
+
+ARG DATA_URL=http://files.deeppavlov.ai/tmp/nocontext_tf_model.tar.gz
+ARG NEL_URL=http://files.deeppavlov.ai/tmp/model.h5


Why is the h5 file not an archive? Maybe archive it? How large is the file?

Let's accept the model paths as parameters without default values, and set the values in docker-compose.

The model takes little space: model.h5 is only 10.5 MB.

dilyararimovna · 2023-04-04T15:13:01Z

annotators/ConveRTBasedNLI/Dockerfile

+ARG NEL_URL=http://files.deeppavlov.ai/tmp/model.h5
+
+ENV CACHE_DIR /cache
+ENV TRAINED_MODEL_PATH /data/nli_model/model.h5


Setting the variable in the Dockerfile is bad; better not to set it at all. Why is the variable needed?

The cache variable can be removed, but the model path variable is needed, in case someone wants to download the model to a different location)

dilyararimovna · 2023-04-04T15:13:18Z

annotators/ConveRTBasedNLI/Dockerfile

+ENV CONVERT_MODEL_PATH /data/convert_model
+
+WORKDIR /src
+RUN mkdir /cache


Why cache and not data?

What is stored in the cache?

The cache is needed for training: when the model is trained from scratch, datasets are downloaded into the cache folder and model checkpoints are saved there.

dilyararimovna · 2023-05-29T12:51:23Z

response_selectors/convers_evaluation_based_selector/server.py

-                if is_toxic_utterance:
+                is_toxic_or_contr_utterance = is_toxic_utterance or is_contr_utterance
+
+                curr_is_toxics.append(is_toxic_or_contr_utterance)


It is better to make these separate parameters, so that a parameter in the docker-compose file can switch each one independently: "filter the badlist or not, filter contradictions or not".
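A sketch of what separate switches could look like on the Python side. The environment variable names `FILTER_TOXIC_OR_BADLISTED` and `FILTER_CONTRADICTION`, and both helpers, are hypothetical; the values would be set in the docker-compose file.

```python
import os


def env_flag(name, default="1"):
    """Read a boolean switch from the environment; '1', 'true' and 'yes' enable it."""
    return os.getenv(name, default).strip().lower() in {"1", "true", "yes"}


# Hypothetical variable names, each toggled independently in docker-compose.
FILTER_TOXIC = env_flag("FILTER_TOXIC_OR_BADLISTED")
FILTER_CONTRADICTION = env_flag("FILTER_CONTRADICTION")


def should_drop_candidate(is_toxic, is_contradiction, filter_toxic, filter_contradiction):
    """Drop a candidate only for the checks that are switched on."""
    return bool((filter_toxic and is_toxic) or (filter_contradiction and is_contradiction))
```

Keeping the two results separate also makes it possible to log which check removed a candidate.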


NeoIsALie · 2023-05-30T07:26:00Z

annotators/ConveRTBasedNLI/Dockerfile

+RUN mkdir /cache
+RUN mkdir /data
+RUN mkdir /data/nli_model/
+RUN mkdir /data/convert_model/


mkdir accepts several arguments, so the 4 lines can be collapsed into one of the form mkdir /cache /data ...

NeoIsALie · 2023-05-30T07:26:39Z

annotators/ConveRTBasedNLI/Dockerfile

+RUN curl -L $NLI_URL --output /tmp/nli_model.tar.gz && tar -xf /tmp/nli_model.tar.gz -C /data/nli_model && rm /tmp/nli_model.tar.gz
+RUN curl -L $CONVERT_URL --output /tmp/conv_model.tar.gz && tar -xf /tmp/conv_model.tar.gz -C /data/convert_model && rm /tmp/conv_model.tar.gz


These can be combined into a single RUN call with &&.

Kolpnick and others added 5 commits January 19, 2023 11:45

Added model that solves NLI between bot replicas

7122cbe

Merge branch 'deeppavlov:dev' into dev

57ce7d6

Added tests to ConveRTBasedNLI

7e4cc4b

Added ConveRTBasedNLI annotator

5a765f5

Changed nli annotator output format

17faff4

dilyararimovna requested changes Feb 6, 2023

Kolpnick added 6 commits February 7, 2023 11:24

Changed data copying and added model downloading

14d737a

Changed protobuf requirements

64df82a

Made lengths in batch the same

eed477b

Changed logs outputs

e73f96c

Added NLI contradiction selection into convers_evaluation_selector

a3d91a2

Fixed history batch preprocessing and updated test file

77becf2

dilyararimovna reviewed Mar 28, 2023

smilni reviewed Mar 28, 2023

Kolpnick and others added 9 commits March 28, 2023 20:45

Fixed code readability

47743ac

Added component cards

3e7aa63

Fixed assert in a test

4ba1726

Changed debug message output

9a7216a

Increased model memory limit

9cef446

Updated models paths

7c64fdd

Updated model port

75c80c5

Merge branch 'dev' into convert_based_nli

b3adfad

Deleted model file from git

9f41374

smilni reviewed Mar 29, 2023

Kolpnick added 2 commits March 29, 2023 16:58

Changed test

0a614b5

Fixed debug message output

d57f7e3

smilni previously approved these changes Mar 30, 2023

dilyararimovna requested changes Apr 4, 2023

Kolpnick added 2 commits April 6, 2023 10:22

Initialization of variables in Dockerfile changed

a34ecdc

Fixed issues with custom models training

5ce22f4

Kolpnick dismissed smilni’s stale review via 5ce22f4 April 6, 2023 07:25

Kolpnick added 2 commits April 6, 2023 15:23

Updated model path

03df165

Added README

4b7341b

dilyararimovna reviewed May 29, 2023

dilyararimovna added 2 commits May 29, 2023 17:52

Merge branch 'dev' into convert_based_nli

f3ddf53

# Conflicts: # assistant_dists/dream/dev.yml # components.tsv

fix: port

07376bc

NeoIsALie requested changes May 30, 2023

Kolpnick and others added 7 commits May 30, 2023 14:10

Updated Dockerfile

f2f9e99

Separated toxic and contradiction check

1754759

Fixed black and flake8 codestyles errors

a05b8af

Added component card

c2487a5

Separated logger out for toxic and contradiction

c2f0894

Refactored dockerfile

755c98e

Updated environment.yml

e6bbda5

