
Prerelease #556 (Draft)

guybartal wants to merge 8 commits into development
Conversation

@guybartal guybartal commented May 18, 2024

guybartal and others added 5 commits May 18, 2024 15:52
closes #480 

### This PR includes

- Log all hyperparameters to MLflow
- Config refactoring: lowercase for all attributes, since they are not constants (matches coding conventions)
- Print the Azure ML monitoring URL right after its creation to allow easy access to monitoring (Ctrl + left click)
- Fix issues with experiment and job names, allowing Azure ML commands to open experiments and MLflow runs automatically, while locally we create them manually
- Remove unused experiment settings from the `.env` sample file
- Hide AzureML warnings by using `CliV2AnonymousEnvironment` as the Azure ML environment name
- Temporary workaround for malformed JSON in the Q&A Gen step with the current CI generation model version, by removing all `"..."` strings from the model response (rough sketch below)
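
A minimal sketch of the kind of cleanup the last bullet describes; the helper name and the exact replacement strategy are assumptions, not the repo's actual code:

```
import json

def parse_model_json(raw: str) -> dict:
    # Hypothetical helper: replace the literal "..." placeholder strings the
    # model sometimes emits with empty strings so the payload stays parseable.
    cleaned = raw.replace('"..."', '""')
    return json.loads(cleaned)
```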

### WIP
#540
Resolves #186

This PR introduces a checkpointing feature to skip the processing of
data that has already been processed in previous runs.
A checkpoint object is used to wrap methods: when a wrapped method is
called with an ID it has already seen, the checkpoint returns the result
of the previous execution instead of running the method again.

To use it, first initialize the checkpoint (this should be done only
once, and can be done anywhere in the pipeline):
```
init_checkpoint(checkpoint_name, config)
```

and then decorate the method you want to cache:
```
@cache_with_checkpoint(id="arg2.id")
def method(arg1, arg2):
    pass
```

or wrap the method using the checkpoint object:

```
get_checkpoint().load_or_run(method, arg2.id, arg1, arg2)
```

(`arg2.id` is the ID that uniquely identifies the call in this example)

This call checks whether the provided method has previously been executed
with the given ID. If it has, it returns the cached result; otherwise,
it executes the method with the given arguments and caches the result
for future calls.
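
As an illustration only, here is one way such a decorator could resolve an ID expression like `"arg2.id"` at call time. This is a hedged sketch, not the repo's implementation; it assumes `get_checkpoint()` is importable from the checkpoint module:

```
import functools
import inspect

def cache_with_checkpoint(id: str):
    # Hypothetical sketch: resolve an ID expression such as "arg2.id"
    # against the wrapped function's arguments at call time.
    arg_name, _, attr_path = id.partition(".")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = inspect.signature(func).bind(*args, **kwargs)
            call_id = bound.arguments[arg_name]
            for attr in (attr_path.split(".") if attr_path else []):
                call_id = getattr(call_id, attr)
            # Run once per unique ID; replay the cached result afterwards.
            # get_checkpoint() is assumed to return the shared checkpoint.
            return get_checkpoint().load_or_run(func, call_id, *args, **kwargs)
        return wrapper
    return decorator
```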

The design for this feature can be found in #186 

# This PR includes:

* Added `Checkpoint` - an abstract class defining the checkpoint interface (sketched below)
* Added `LocalStorageCheckpoint` - a `Checkpoint` implementation for checkpoints on the local machine
* Added `NullCheckpoint` - a dummy `Checkpoint` implementation for skipping the checkpoint feature
* Implemented the usage of checkpoints in the `01_index` step
* Added a Make command to delete the checkpoint data
* Added tests
* Tiny fix for the misconfiguration of flake8
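
A rough sketch of how these three classes might fit together, assuming a pickle-per-ID layout for the local implementation (the actual storage format is not specified here; only the class names and `load_or_run` come from this PR):

```
import pickle
from abc import ABC, abstractmethod
from pathlib import Path

class Checkpoint(ABC):
    # Abstract interface; method name taken from the usage shown above.
    @abstractmethod
    def load_or_run(self, method, call_id, *args, **kwargs): ...

class NullCheckpoint(Checkpoint):
    # Dummy implementation: never caches, always runs the method.
    def load_or_run(self, method, call_id, *args, **kwargs):
        return method(*args, **kwargs)

class LocalStorageCheckpoint(Checkpoint):
    # Stores each result on local disk, keyed by the call ID.
    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def load_or_run(self, method, call_id, *args, **kwargs):
        path = self.root / f"{call_id}.pkl"
        if path.exists():
            return pickle.loads(path.read_bytes())
        result = method(*args, **kwargs)
        path.write_bytes(pickle.dumps(result))
        return result
```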

# Next PRs:
* Checkpointing for the other steps (this PR covers only the `index` step)
* Implementation of `AzureMLCheckpoint` - see the design in #186

---------

Co-authored-by: Martin Peck <martinjohnpeck@gmail.com>
Co-authored-by: Guy Bertental <gubert@microsoft.com>
…ig (#557)

This PR fixes two config attributes, USE_CHECKPOINTS and
EXECUTION_ENVIROMENT, renaming them to lowercase to match the naming
conventions, and removes an unused comment.
It also adds checkpoints to the `02_qa_generation` and `03_querying` steps.

The only code changes, other than the addition of the checkpoints, were
to extract code into methods so they could be wrapped with the checkpoint
decorator. There are no functional changes in this PR.

The `01_index` step is already covered as part of #472.
The `04_evaluation` step is not covered, as a refactor will soon take
place there.

---------

Co-authored-by: Martin Peck <martinjohnpeck@gmail.com>
Co-authored-by: Guy Bertental <gubert@microsoft.com>
Co-authored-by: Guy Bertental <guybartal@gmail.com>
@guybartal guybartal marked this pull request as draft May 18, 2024 19:43
quovadim and others added 3 commits May 20, 2024 12:48
closes #505 
Prompts are moved into a subclass, better prompts for text generation
are added, and chain-of-thought capabilities are added to the prompts.

Main new features:
- Added native and automatic support for JSON generation using the OpenAI format structure
- Added prompt-level validation capabilities for the generated answer
- Added prompt classes with automatic validation of prompts and parameters
- Added support for non-strict queries, meaning you can now specify that a particular prompt response may fail and that is still acceptable
- Added support for chain-of-thought reasoning (see the examples in the Q&A generation prompts)
- Moved prompts into separate `.txt` files
- Q&A generation can now fail to generate some questions without consequences, as long as at least one question is generated
- Changed the Python version to 3.11
- Some prompts that previously returned text now return JSON
- Some prompts that previously returned JSON now return text
- The signature of `generate_response` now depends on the prompt's input parameters
- Refactored all prompts (except ragas) and added examples
- Some try/except blocks were removed due to the new system for handling responses to non-strict queries, meaning that all exceptions thrown from `response_generator` are critical. If the NonStrict tag is added, the result of a failed execution will be None, unless the exception is critical (i.e. unrelated to LLM generation, e.g. a bug in the code); see the sketch below
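
A minimal sketch of the non-strict behaviour described in the last bullet; the class, function, and flag names here are illustrative assumptions, not the repo's actual API:

```
# Illustrative only: names below are assumptions, not the repo's API.
class CriticalError(Exception):
    """A problem in the code itself, not in LLM generation."""

def run_prompt(generate, prompt, strict: bool = True):
    try:
        return generate(prompt)
    except CriticalError:
        raise  # bugs in the pipeline always surface
    except Exception:
        if strict:
            raise  # strict prompts must not fail silently
        return None  # non-strict: a failed response is acceptable
```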

Metrics and comparison

Comparison on data generated using this branch

| Metric | New Data, New Prompts | New Data, Old Prompts |
|-------------------------------------------|-----------------------|-----------------------|
| fuzzy | 75.88 | **77.05** |
| bert_all_MiniLM_L6_v2 | **80.58** | 79.60 |
| cosine | **75.21** | 68.88 |
| bert_distilbert_base_nli_stsb_mean_tokens | **76.90** | 76.14 |
| llm_answer_relevance | **72.59** | 67.60 |
| llm_context_precision | **91.03** | 84.62 |

Comparison on data generated using development branch

| Metric | Old Data, New Prompts | Old Data, Old Prompts |
|-------------------------------------------|-----------------------|-----------------------|
| fuzzy | **81.14** | 80.56 |
| bert_all_MiniLM_L6_v2 | **69.71** | 66.58 |
| cosine | **66.78** | 57.51 |
| bert_distilbert_base_nli_stsb_mean_tokens | **70.49** | 67.78 |
| llm_answer_relevance | **62.18** | 57.63 |
| llm_context_precision | **84.62** | 78.85 |

---------

Co-authored-by: Vadim Kirilin <vadimkirilin@Vadims-MacBook-Pro.local>
This PR fixes:

- **llm_context_precision**: the metric considers the context used to
generate the question-answer pair in step 2 (`02_qa_generation.py`)
instead of the retrieved contexts. To assess the system's ability
to retrieve relevant chunks/contexts, we need to consider the relevancy
of the retrieved contexts against the question. The computation of the
metric also used the generated answer (`actual`) as input instead of
the question. The updated metric computes a simple average precision:
the proportion of relevant chunks, without consideration of ranking
order (see the sketch after this list).
- **llm_context_recall**: this metric also used the qna context
instead of the retrieved contexts. To assess the system's ability
to retrieve contexts that are aligned with the ground-truth answer, we
need to consider the retrieved contexts. The temperature parameter is
removed, as it is already part of the initialisation of
`ResponseGenerator`.
- **llm_answer_relevance**: the metric currently takes the generated
answer and the ground-truth answer as inputs, instead of the question
and the generated answer.
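
A minimal sketch of the simple average precision described above; `is_relevant` stands in for whatever LLM-based relevance judgement the metric uses and is an assumption here:

```
def context_precision(question, retrieved_contexts, is_relevant):
    # Proportion of retrieved contexts judged relevant to the question,
    # ignoring ranking order.
    if not retrieved_contexts:
        return 0.0
    hits = sum(1 for ctx in retrieved_contexts if is_relevant(question, ctx))
    return hits / len(retrieved_contexts)
```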
Removed the separation of checkpoints into folders by their `index_name`
and step; all checkpoint data is now stored together.