
Bring better Q&A generation capabilities #505

Open
quovadim opened this issue Apr 24, 2024 · 0 comments · Fixed by #517
@quovadim
Collaborator

No description provided.

@quovadim quovadim self-assigned this Apr 24, 2024
quovadim added a commit that referenced this issue May 20, 2024
Closes #505

Prompts are moved into a subclass; better prompts for text generation were added, and chain-of-thought capabilities were added for prompts.

Main new features:
- Added native, automatic support for JSON generation using the OpenAI response-format structure
- Added prompt-level validation of generated answers
- Added prompt classes with automatic validation of prompts and parameters
- Added support for non-strict queries: a particular prompt response may now fail without aborting the run
- Added support for chain-of-thought reasoning (see the examples in the Q&A generation prompts)
- Moved prompts into separate .txt files
- Q&A generation can now fail to generate some questions without consequences, as long as at least one question is generated
- Changed the Python version to 3.11
- Some prompts that previously returned text now return JSON
- Some prompts that previously returned JSON now return text
- The signature of generate_response now depends on the prompt's input parameters
- Refactored all prompts (except ragas) and added examples
- Removed some try/except blocks thanks to the new handling of non-strict responses: any exception raised by response_generator is now treated as critical. If the NonStrict tag is set, a failed execution returns None instead, unless the exception is critical (i.e. a problem in the code rather than in LLM generation)
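
The non-strict behaviour described above can be sketched roughly as follows. This is an illustrative sketch only, not the repository's actual API: the names `generate_response`, `CriticalError`, and the `non_strict` flag are assumptions standing in for the real classes.

```python
import json


class CriticalError(Exception):
    """A code-level failure (not an LLM hiccup) that must always propagate."""


def generate_response(prompt, llm_call, non_strict=False):
    """Run an LLM prompt and parse its JSON payload.

    Hypothetical sketch of the non-strict pattern: on a non-strict prompt,
    any generation failure yields None; critical errors always raise.
    """
    try:
        raw = llm_call(prompt)
        return json.loads(raw)  # prompts now return JSON payloads
    except CriticalError:
        raise                   # bugs in our own code always surface
    except Exception:
        if non_strict:
            return None         # a failed generation is tolerated
        raise                   # strict prompts must succeed
```

Under this scheme, Q&A generation can invoke the sketch once per question and keep every non-None result, which matches the "at least one question is generated" behaviour listed above.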

Metrics and comparison

Comparison on data generated using this branch

| Metric | New Data, New Prompts | New Data, Old Prompts |
|-------------------------------------------|-----------------------|-----------------------|
| fuzzy | 75.88 | **77.05** |
| bert_all_MiniLM_L6_v2 | **80.58** | 79.60 |
| cosine | **75.21** | 68.88 |
| bert_distilbert_base_nli_stsb_mean_tokens | **76.90** | 76.14 |
| llm_answer_relevance | **72.59** | 67.60 |
| llm_context_precision | **91.03** | 84.62 |

Comparison on data generated using development branch

| Metric | Old Data, New Prompts | Old Data, Old Prompts |
|-------------------------------------------|-----------------------|-----------------------|
| fuzzy | **81.14** | 80.56 |
| bert_all_MiniLM_L6_v2 | **69.71** | 66.58 |
| cosine | **66.78** | 57.51 |
| bert_distilbert_base_nli_stsb_mean_tokens | **70.49** | 67.78 |
| llm_answer_relevance | **62.18** | 57.63 |
| llm_context_precision | **84.62** | 78.85 |

---------

Co-authored-by: Vadim Kirilin <vadimkirilin@Vadims-MacBook-Pro.local>