
Prompts refactoring #517

Merged
@quovadim merged 38 commits into prerelease from quovadim/qna-generation-improvement on May 20, 2024

Conversation

@quovadim (Collaborator) commented Apr 30, 2024:

Closes #505.
Prompts have been moved into their own subclasses, better prompts for text generation have been added, and chain-of-thought capabilities have been added to prompts.

Main new features are:

  • Added native, automatic support for JSON generation using the OpenAI response format structure (see the first sketch after this list)
  • Added prompt-level validation of generated answers
  • Added prompt classes with automatic validation of prompts and their parameters
  • Added support for non-strict queries: a particular prompt can now be marked as allowed to fail, and a failed response is tolerated (see the second sketch after this list)
  • Added support for chain-of-thought reasoning (take a look at the examples in the Q&A generation prompts)
  • Moved prompts into separate .txt files
  • Q&A generation can now fail to generate some questions without consequences, as long as at least one question is generated
  • Changed the Python version to 3.11
  • Some prompts that previously returned text now return JSON
  • Some prompts that previously returned JSON now return text
  • The signature of generate_response now depends on the prompt's input parameters
  • Refactored all prompts (except ragas) and added examples
  • Removed some try/except blocks under the new system for handling responses to non-strict queries: every exception raised from response_generator is now treated as critical. If the NonStrict tag is added, a failed execution returns None instead, unless the exception is critical (i.e. a bug in the code rather than an LLM generation failure)
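
As a first sketch of the JSON generation support, here is a minimal example using OpenAI's JSON mode. The model name and the generate_json helper are illustrative assumptions, not the actual code in this PR:

```python
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def generate_json(system_prompt: str, user_prompt: str) -> dict:
    """Request a JSON object from the model using OpenAI's JSON mode."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # illustrative model name
        response_format={"type": "json_object"},  # model must emit valid JSON
        # JSON mode requires the word "json" to appear somewhere in the messages
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    )
    return json.loads(response.choices[0].message.content)
```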

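And a second, minimal sketch of how non-strict prompts are intended to behave. The class and function names here are hypothetical, chosen for illustration only:

```python
from typing import Any, Callable


class PromptValidationError(Exception):
    """Raised when a generated answer fails the prompt's validator."""


class Prompt:
    """A prompt template with its own answer validator and strictness flag."""

    def __init__(self, template: str, validator: Callable[[str], bool], strict: bool = True):
        self.template = template
        self.validator = validator
        self.strict = strict


def generate_response(prompt: Prompt, llm_call: Callable[[str], str], **params: Any) -> str | None:
    """Render the prompt with its parameters, call the LLM, and validate.

    Non-strict prompts swallow validation failures and return None;
    code bugs (e.g. a missing template parameter) always raise.
    """
    rendered = prompt.template.format(**params)  # a KeyError here is a code bug, so it propagates
    answer = llm_call(rendered)
    if not prompt.validator(answer):
        if prompt.strict:
            raise PromptValidationError(f"answer failed validation: {answer!r}")
        return None  # non-strict: a failed generation is acceptable
    return answer
```

Under a scheme like this, Q&A generation can simply drop the None results and still succeed as long as at least one question is generated.
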
Metrics and comparison

Comparison on data generated using this branch

| Metric | New Data, New Prompts | New Data, Old Prompts |
| --- | --- | --- |
| fuzzy | 75.88 | 77.05 |
| bert_all_MiniLM_L6_v2 | 80.58 | 79.60 |
| cosine | 75.21 | 68.88 |
| bert_distilbert_base_nli_stsb_mean_tokens | 76.90 | 76.14 |
| llm_answer_relevance | 72.59 | 67.60 |
| llm_context_precision | 91.03 | 84.62 |

Comparison on data generated using the development branch

| Metric | Old Data, New Prompts | Old Data, Old Prompts |
| --- | --- | --- |
| fuzzy | 81.14 | 80.56 |
| bert_all_MiniLM_L6_v2 | 69.71 | 66.58 |
| cosine | 66.78 | 57.51 |
| bert_distilbert_base_nli_stsb_mean_tokens | 70.49 | 67.78 |
| llm_answer_relevance | 62.18 | 57.63 |
| llm_context_precision | 84.62 | 78.85 |

@martinpeck (Member) commented:

You've removed quite a few try/except blocks.
This might be fine, but it's not mentioned in the PR notes, and I wasn't sure if this change was desired/expected.
Probably best to mention how the error handling has changed in the PR notes.

@ritesh-modi self-requested a review May 1, 2024 11:00
@quovadim requested a review from martinpeck May 3, 2024 07:43
@quovadim self-assigned this May 3, 2024
@martinpeck removed their request for review May 7, 2024 10:01
@quovadim requested a review from martinpeck May 9, 2024 07:58
@ritesh-modi (Collaborator) left a comment:

LGTM

rag_experiment_accelerator/config/config.py (review thread resolved)
@quovadim changed the base branch from development to prerelease May 20, 2024 09:37
@quovadim merged commit 0341e88 into prerelease May 20, 2024
3 checks passed
@quovadim deleted the quovadim/qna-generation-improvement branch May 20, 2024 11:48
Development

Successfully merging this pull request may close these issues:

  • Bring better Q&A generation capabilities (#505)