Bot kg annotator (#586) · deeppavlov/dream@442bb26

Bot kg annotator (#586)

* update pipeline_conf

* fix: Use dream_kg pipeline for that dist

* update: extract multiple triplets

* fix typo

* remove bot uttr for custom-el

* use generic relations

* fix url

* switch to t5 lite

* chore: Update user-kg to CRUD batch operations

* fix: Separate storing entities and relationships

This lets us avoid an assertion error about list lengths in deeppavlov-kg

* fix: Make all entities lowercase before storing

This fixes the problem of storing the same entity many times

* remove repetitive triplets

* minor fix

* fix: Check if entity exists in kg before adding it

This prevents creating the same entity many times in KG

* fix: Delete invalid condition

* fix ckpt name

* created kg_prompted distribution

* fix: Update name_scenario func after DP-kg changes

* chore: Modify logs

* generative skill and prop_ext update

* update lite model

* fix: custom-el ports

* user-kg: prompt key to write into ctx

* remove repeating triplets with low score

* fix: Update entity-detection file location

* chore: Fix 'added' to return only added triplets

* test: Update test.py for user-kg annotator

* test: Fix according to prompted dist

* test: Correct path of index sqlite db

* llm update: gpt-j replaced by gpt-jt, oasst added

* fix: removed extra services, added property-extraction to pipeline

* fix proxy

* feat: Turn KG triplets to NL & insert into prompt

* update custom_el

* chore: Add component & container cards

* style: optimize Dockerfile for custom_el

* style: flake and black up

* style: flake and black up properly

* feat: extract multiple triplets

* codestyle

* codestyle

* fix

* fix: Add env vars & correct ports

* test: Add test.py

* codestyle

* update user-knowledge-graph

Bring updates from feat/new_prop_ex

* chore: Add component card & occupy a port

* chore: Delete add entities from custom_el

* The user-kg does the work now utilizing index module of deeppavlov-kg

* chore: Change app port & print test results

* test: Fix test_el.py after deleting add_entities

* fix

* refactor: Clear user-kg formatter up

* chore: Get rel_kind_dict out to a file

* update: kg_prompt processing in generative skill

* update: dff_travel_italy_skill modified

* fix: bug in dff_travel_italy_skill

* update: container cards for dff_travel_italy_skill

* style: flake8 and black reformatting

* update: Dockerfiles refactoring

* dff_travel_italy_skill
* dff_user_kg_skill

* update: container cards for user-kg-skill

* style: flake8 and black reformat for user-kg-skill

* refactor: Divide into more comprehensible functions

* update: source data added for KG skills

* update: del repeated lines, add .env to skill

* update: edit description, ram_usage changed

* update: add .env_secret to travel-italy-skill

* refactor: Make it work

* doc: Add docstring to functions

* style: Black up

* update: kg part added to template-prompted-skill

* removed badlisted classifier

* doc: Add readme to terminusdb

* doc: update terminusdb_service readme

* refactor: assistant_dists for dream_kg_prompted

* update download path

* fix: Correct user_id, PE input & rel_list

* container and component cards for kg-prompted-skill

* update: pipeline for dream_kg_prompted dist

* fix

* test: Delete '_' from relationship in test.py

* refactor: Rename to user-knowledge-memorizer

* typo

* delete file

* update: knowledge_prompted_skill moved from template_prompted_skill

* refactor: Add abstract kind scenario

* doc: Complete check_entities_in_kg docstring

* fix: skill tallied with refactored user-km

* chore: container cards for knowledge_prompted_skill

* in prompt_selector
* in agent_services

* fix: dream_kg dist optimized

* fix: pipeline changed for dream_kg_prompted dist

* fix: prop-ex ports, terminusdb server db

* refactor: Add the user properties scenario

* fix: knowledge prompt added, pipeline optimized

* refactor: KG code deleted from template-prompted-skill

* Merge branch 'dev' into feat/kg_annotator

* chore: Replace env vars into args of container

* chore: Replace env vars to skill container args

* fix: travel_italy_skill test

* Revert "chore: Replace env vars into args of container"

This reverts commit 3dbaccd.

* Revert "Merge branch 'dev' into feat/kg_annotator"

This reverts commit c70dc5a.

* chore: Replace env vars into args of container

* chore: Delete proxy2

* remove unnecessary services

* style: add line to end

* fix: terminusdb bug

* chore: optimized services for dream_kg dist

* use prop extraction in dff skills

* fix: bug in wiki_skill

* feat: pipeline change for bot uttr in prop-ex

* chore: PE formatter change in kg_prompted dist

* chore: user-kg changed to user-km in prompted dist

* user-kg deleted from annotators
* dream_kg_prompted dist changed to user-km

* bot property extraction

* chore: prop-ex formatter changed in dream_kg

* Revert "chore: prop-ex formatter changed in dream_kg"

This reverts commit ed895cd.

* chore: prop-ex formatter changed in dream_kg

* fix: prop_ex_bot_formatter fix for segmented uttrs

* chore: user-km formatters changed

* fix: formatter for user-km last bot uttr

* prop-ex input as segments

* analyze user&bot utterance

* chore: prop-ex bot deleted

* fix: changes for user-km in dream_kg_prompted

* ports changed in docker-compose, dev, pipeline and component card

* terminusdb variables added to user-km in docker-compose

* source added for user-km in pipeline

* fix: pipeline_conf in dream_kg_prompted

* for prop-ex and user-km

* fix: user_knowledge_memorizer

* if-statement added to accept error if no bot_utterances

* user variables changed to bot

* fix: User to Bot in user-km

* fix: function name revert

* feat: custom-el changes to annotate bot_utterances

* custom-el added to pipeline to response_annotators

* custom-el formatter for last_bot_utt

* drafts for some other funcs to process last_bot_utt

* fix: custom_el_last_bot formatter bug

* feat: changes for entity detection

* entity-detection formatter changed for last_bot_utt

* entity-detection moved to response_annotators in pipeline

* database changed in docker-compose

* fix: entity-detection formatter for last_bot_utt

* feat: entity_linking to response_annot in pipeline

* feat: entity-linking changed for last_bot_utt

* updates for entity-linking formatter for last_bot_utt

* formatter corrected for entity-linking in pipeline

* fix: EL error on an empty uttr

* update: prop-ex last_bot_uttr formatter

* added get_entities func

* fix: bugs in bot-km and custom-el

* bot_id changed in bot-km to avoid mismatch with index

* bot_id is passed to custom-el via formatter as user_id

* fix: Decode symbols in custom_el

* when getting user_id from index

* fix: check existence of a dict key

* fix: Let terminusdb run locally

* fix: log an error instead of exiting the annotator

* fix: Iterate over reversed indices

* to avoid iterating over incorrect elements

* chore: bot-km as a separate service

* bot-km moved to its own folder and renamed

* container cards for bot-km created

* component card created for bot-km

* bot-km under its own name in dream_kg_prompted distribution

* fix: bot-km building along the whole platform

* fix: bot-km error when adding entity to index

* chore: one log info is off in bot-km

* fix: decode symbols in custom_el

* chore: timeout added to response_annotators in ppl

* revert: user-km to its original view

* INDEX_LOAD_PATH moved to CONFIG

* chore: minor changes for bot-km

* INDEX_LOAD_PATH moved to config

* timeout increased for bot-km

* fix: index path upload from config in bot-km

* chore: bot_km updates

* BOT_ID is moved to global variables

* tests edited for bot-km

* config fix in Dockerfile

* remove: spacy-nounphrases from proxy

* update: dist renamed into dream_bot_kg_prompted

* fix: removed unused variable from the skill

* fix: wrong utterance annotation

* fix: Delete duplicates from to-create kinds

* fix: del duplicates from to-create kinds in bot-km

* feat: bot-km part to convert triplets into NL

* generate_prompt function to bot-km and a test example

* env variables in docker-compose and Dockerfile

* variable to .env to use bot's knowledge

* fix: pipeline for skill to wait for bot-km info

* feat: skill adds info about bot to prompt

* fix: bug in custom-el formatter

* fix: timeout increase for bot-km & entity-linking

* fix: iterate over reversed indices

* to avoid iterating over incorrect elements

* fix: accept and return a batch of utterances

* chore: delete unnecessary index from bot-km test

* chore: bot-km func to convert triplets to NL

* added function to retrieve knowledge about bot

* added functionality to convert triplets to natural language

* some variables changed and added in Dockerfile

* USE_BOT_KG_DATA moved from .env into docker-compose

* bot-km service cards updated

* chore: updates for knowledge-prompted-skill

* USE_BOT_KG_DATA added to docker-compose and Dockerfile

* Dockerfile refactored

* service cards updated for the skill

* USE_BOT_KG_DATA converted into integer when uploaded

* feat: added functionality to bot-km

* added func to filter related knowledge

* added function to create prompt for the skill

* added last_human_utterance for bot-km in dp_formatters

* args added for bot-km to docker-compose, Dockerfile

* service_configs updated for bot-km

* chore: triplet preprocessing in skill

* feat: bot-kg-skill use knowledge-skill as template

* skill renamed to dff_knowledge_prompted_skill

* service cards added

* bot_knowledge prompt added

* bot_knowledge prompt added to prompt_selector

* chore: triplet processing

* triplet format changed in bot-km

* triplet processing updated in skill

* chore: component cards and service configs

* for dream_bot_kg_prompted dist created/modified

* changes reflected in pipeline

* fix: Check in kg ONLY triplets that ARE IN index

* chore: ports reservation for bot-km and the skill

* ports reserved in components.tsv

* changes reflected in distribution and components cards

* chore: spacy-nounphrases deleted from distribution

* chore: variables added to build args for bot-km

* changes reflected in Dockerfile and service cards

* chore: variable added to bot-km build args

* changes reflected in Dockerfile and service cards

* refactor: unused vars and loops deleted from skill

* refactor: generative func in bot-km

* chore: change of ports for bot-km and bot-skill

* changes reflected in distribution and service cards

* chore: formatters update

* fix: custom-el, knowledge-skill

* custom-el service_port fixed

* knowledge skill refactored

* style format

* fix: prop-ex and prompt-selector

* blocks of code returned to prop-ex to extract triplets from segments

* sentence-ranker url added to .env for prompt-selector to work

* fix: triplets conversion in bot-km and style

* fix: variable upload to use bot knowledge in skill

* fix: GPUs in dream_bot_kg_prompted

* fix: bot-km port in tests

* fix: prop-ex

* fix: tests for bot-km

* revert: merge errors

* fix: removed old files and merge error

* fix: package versions, component card

* component card changed in pipeline_conf

* sentence-ranker url removed from .env

* fix: bot_id retrieved from context

* changes in bot-km server file

* custom-el formatter changed

* property extraction formatter update

* revert: changes in dream_kg

* fix: prop-ex port in dream_bot_kg_prompted dist

* fix: added default values to skill Dockerfile

* USE_KG_DATA and USE_BOT_KG_DATA

* fix: entity-detection formatter

* fix: environment for bot-km

* fixes prop-ex

* fix if bot uttr

* fix: entity-linking and custom-el formatters

* changes reflected in services server-files

* style format

* fix: Send unique list of kinds to be created in db

* fix: default output for bot-km

* returns the output in the usual form and prevents skill from error

* fix: component cards for dream_bot_kg_prompted

* property-extraction

* entity-detection

* entity-linking

* custom-entity-linking

* ner

* response_selector

* bot-knowledge-memorizer

* dff-bot-knowledge-prompted-skill

* fix: Send unique list of kinds to be created in db

* fix: sentence-ranker in args for prompt-selector

* changes reflected in service_configs for dream_kg_prompted and dream_bot_kg_prompted

* fix: ner component card

* fix: pipeline and component cards

* bot-knowledge-prompted-skill

* response_selector

---------

Co-authored-by: dmitry <dmitrij.euseew@yandex.ru>
Co-authored-by: Ramimashkouk <rami.n.mashkouk@gmail.com>
Co-authored-by: Raushan <raushan.turganbay@alumni.nu.edu.kz>
Co-authored-by: kpetyxova <kpetyxova@mail.ru>
Co-authored-by: Rami <54779216+Ramimashkouk@users.noreply.github.com>
Co-authored-by: Fedor Ignatov <ignatov.fedor@gmail.com>
Co-authored-by: Федор Игнатов <ignatov@gpu2.ipavlov.mipt.ru>
Co-authored-by: dilyararimovna <dilyara.rimovna@gmail.com>

9 people committed Dec 12, 2023

1 parent 9f9f749 commit 442bb26

.env

@@ -33,4 +33,4 @@ DIALOGPT_CONTINUE_SERVICE_URL=http://dialogpt:8125/continue
     PROMPT_STORYGPT_SERVICE_URL=http://prompt-storygpt:8127/respond
     STORYGPT_SERVICE_URL=http://storygpt:8126/respond
     FILE_SERVER_URL=http://files:3000
-    SUMMARIZATION_SERVICE_URL=http://dialog-summarizer:8059/respond_batch
+    SUMMARIZATION_SERVICE_URL=http://dialog-summarizer:8059/respond_batch

annotators/bot_knowledge_memorizer/Dockerfile

@@ -0,0 +1,47 @@
+    FROM python:3.9.1
+    WORKDIR /src
+    ARG SERVICE_PORT
+    ARG SRC_DIR
+    ARG TERMINUSDB_SERVER_PASSWORD
+    ARG TERMINUSDB_SERVER_URL
+    ARG TERMINUSDB_SERVER_TEAM
+    ARG TERMINUSDB_SERVER_DB
+    ARG BOT_KM_SERVICE_CONFIG
+    ARG GENERATIVE_SERVICE_URL
+    ARG GENERATIVE_SERVICE_CONFIG
+    ARG GENERATIVE_SERVICE_TIMEOUT
+    ARG SENTENCE_RANKER_URL
+    ARG SENTENCE_RANKER_TIMEOUT
+    ARG RELEVANT_KNOWLEDGE_THRESHOLD
+    ARG ENVVARS_TO_SEND
+    ARG USE_BOT_KG_DATA
+    ENV SERVICE_PORT=$SERVICE_PORT
+    ENV TERMINUSDB_SERVER_PASSWORD=$TERMINUSDB_SERVER_PASSWORD
+    ENV TERMINUSDB_SERVER_URL=$TERMINUSDB_SERVER_URL
+    ENV TERMINUSDB_SERVER_TEAM=$TERMINUSDB_SERVER_TEAM
+    ENV TERMINUSDB_SERVER_DB=$TERMINUSDB_SERVER_DB
+    ENV BOT_KM_SERVICE_CONFIG=$BOT_KM_SERVICE_CONFIG
+    ENV GENERATIVE_SERVICE_URL=$GENERATIVE_SERVICE_URL
+    ENV GENERATIVE_SERVICE_CONFIG=$GENERATIVE_SERVICE_CONFIG
+    ENV GENERATIVE_SERVICE_TIMEOUT=$GENERATIVE_SERVICE_TIMEOUT
+    ENV SENTENCE_RANKER_URL=$SENTENCE_RANKER_URL
+    ENV SENTENCE_RANKER_TIMEOUT=$SENTENCE_RANKER_TIMEOUT
+    ENV RELEVANT_KNOWLEDGE_THRESHOLD=$RELEVANT_KNOWLEDGE_THRESHOLD
+    ENV ENVVARS_TO_SEND=$ENVVARS_TO_SEND
+    ENV USE_BOT_KG_DATA=$USE_BOT_KG_DATA
+    RUN pip install -U pip wheel setuptools
+    COPY ./annotators/bot_knowledge_memorizer/requirements.txt .
+    RUN pip install --upgrade pip && \
+        pip install --no-cache -r /src/requirements.txt && \
+        python -m nltk.downloader wordnet && \
+        pip install git+https://github.com/deeppavlov/custom_kg_svc.git@feat/support_index
+    COPY $SRC_DIR .
+    CMD gunicorn --workers=1 server:app -b 0.0.0.0:$SERVICE_PORT
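
The Dockerfile above forwards every build `ARG` into an `ENV` variable of the same name, so the service presumably reads its configuration from the process environment at startup. The Python sketch below illustrates one way to do that; it is not the code from the service's `server.py`. The helper names `_get` and `load_settings`, the default values, the port `8027`, and the comma-separated format of `ENVVARS_TO_SEND` are assumptions made for illustration. Only the integer conversion of `USE_BOT_KG_DATA` is taken from the commit message ("USE_BOT_KG_DATA converted into integer when uploaded").

```python
import os
from typing import Mapping, Optional


def _get(env: Mapping[str, str], name: str, default: str) -> str:
    # A build ARG that was not passed ends up as an empty ENV value,
    # so an empty string is treated the same as a missing variable.
    return env.get(name) or default


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Parse a few of the variables declared in the Dockerfile (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        # Default port is an assumption, not a value from the repository.
        "service_port": int(_get(env, "SERVICE_PORT", "8000")),
        # Converted to an integer first, as the commit message describes.
        "use_bot_kg_data": bool(int(_get(env, "USE_BOT_KG_DATA", "0"))),
        # Float type and default threshold are assumptions.
        "relevant_knowledge_threshold": float(
            _get(env, "RELEVANT_KNOWLEDGE_THRESHOLD", "0.5")
        ),
        # Comma-separated list format is an assumption.
        "envvars_to_send": [
            v for v in _get(env, "ENVVARS_TO_SEND", "").split(",") if v
        ],
    }


# Example with an explicit mapping instead of the real environment.
settings = load_settings(
    {"SERVICE_PORT": "8027", "USE_BOT_KG_DATA": "1", "ENVVARS_TO_SEND": "A,B"}
)
print(settings)
```

Passing a plain dictionary instead of `os.environ` keeps the parsing logic testable without touching the container environment.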

annotators/bot_knowledge_memorizer/abstract_rels.txt

@@ -0,0 +1,11 @@
+    favorite animal
+    like animal
+    favorite book
+    like read
+    favorite movie
+    favorite food
+    like food
+    favorite drink
+    like drink
+    favorite sport
+    like sports

annotators/bot_knowledge_memorizer/config.json

@@ -0,0 +1,9 @@
+    {
+        "chainer": {
+        },
+        "metadata": {
+          "variables": {
+            "CUSTOM_EL": "/root/.deeppavlov/downloads/entity_linking_eng/custom_el_eng_dream"
+          }
+        }
+      }

annotators/bot_knowledge_memorizer/rel_list.json

@@ -0,0 +1,15 @@
+    {
+        "FAVORITE ANIMAL": "Animal",
+        "HAVE PET": "Animal",
+        "LIKE ANIMAL": "Animal",
+        "FAVORITE BOOK": "Book",
+        "LIKE READ": "Book",
+        "FAVORITE MOVIE": "Film",
+        "FAVORITE FOOD": "Food",
+        "LIKE FOOD": "Food",
+        "FAVORITE DRINK": "Food",
+        "LIKE DRINK": "Food",
+        "FAVORITE SPORT": "Type_of_sport",
+        "LIKE SPORTS": "Type_of_sport",
+        "LIKE GOTO": "Place"
+    }
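
`rel_list.json` maps each relation name to the kind of entity it points at, and `abstract_rels.txt` lists a lowercase subset of the same relations. As a rough sketch of how the two files relate, the Python below looks up the kind for a relation and reports which relations from `rel_list.json` are absent from `abstract_rels.txt`. The function name `kind_for_relation` and the normalization step (underscores to spaces, upper-casing) are assumptions for illustration; the data is copied from the two files in this diff, and the real annotator may load and use them differently.

```python
import json
from typing import Dict, Optional

# Contents of rel_list.json as added in this commit.
REL_KINDS: Dict[str, str] = json.loads("""
{
    "FAVORITE ANIMAL": "Animal", "HAVE PET": "Animal", "LIKE ANIMAL": "Animal",
    "FAVORITE BOOK": "Book", "LIKE READ": "Book",
    "FAVORITE MOVIE": "Film",
    "FAVORITE FOOD": "Food", "LIKE FOOD": "Food",
    "FAVORITE DRINK": "Food", "LIKE DRINK": "Food",
    "FAVORITE SPORT": "Type_of_sport", "LIKE SPORTS": "Type_of_sport",
    "LIKE GOTO": "Place"
}
""")

# Contents of abstract_rels.txt as added in this commit, one relation per line.
ABSTRACT_RELS = {
    "favorite animal", "like animal", "favorite book", "like read",
    "favorite movie", "favorite food", "like food", "favorite drink",
    "like drink", "favorite sport", "like sports",
}


def kind_for_relation(relation: str, rel_kinds: Dict[str, str]) -> Optional[str]:
    """Return the entity kind for a relation, or None if it is not listed."""
    # Normalization is an assumption: keys in rel_list.json are upper-case
    # and use spaces, while extracted relations may arrive in another form.
    key = relation.replace("_", " ").strip().upper()
    return rel_kinds.get(key)


# Relations that have a kind but do not appear in abstract_rels.txt.
non_abstract = sorted(k for k in REL_KINDS if k.lower() not in ABSTRACT_RELS)

print(kind_for_relation("favorite movie", REL_KINDS))
print(non_abstract)
```

Note that "FAVORITE DRINK" and "LIKE DRINK" map to the "Food" kind in the committed file, so a lookup for a drink relation returns "Food" rather than a separate drink kind.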

annotators/bot_knowledge_memorizer/requirements.txt

@@ -0,0 +1,12 @@
+    Flask==1.1.1
+    inflect==0.2.4
+    gunicorn==19.9.0
+    requests==2.27.1
+    jinja2<=3.0.3
+    Werkzeug<=2.0.3
+    sentry-sdk==0.12.3
+    pyopenssl==22.0.0
+    itsdangerous==2.0.1
+    click==7.1.2
+    nltk==3.5
+    uuid==1.30
