Prerelease #556 (Draft)
guybartal wants to merge 8 commits into development from prerelease
+3,293
−1,011
Conversation
closes #480

### This PR includes

- Log all hyperparameters to MLflow (see the sketch after this list)
- Config refactoring: lowercase all attributes, as they are not constants (matches coding conventions)
- Print the Azure ML monitoring URL right after its creation to allow easy access to monitoring (ctrl + left click)
- Fix issues with experiment and job names, allowing Azure ML commands to open experiments and MLflow runs automatically, while locally we create those manually
- Remove unused experiment settings from the `.env` sample file
- Hide Azure ML warnings by using `CliV2AnonymousEnvironment` as the Azure ML environment name
- Temporary workaround for the wrong JSON format produced in the Q&A Gen step by the current CI generation model version: remove all `"..."` strings from the model response

### WIP #540
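A minimal sketch of hyperparameter logging with MLflow (the parameter names and values below are illustrative placeholders, not the repository's actual config):

```
import mlflow

# Hypothetical flat dict of hyperparameters pulled from the pipeline config.
hyperparams = {
    "chunk_size": 512,
    "chunk_overlap": 128,
    "embedding_model": "text-embedding-ada-002",
    "top_k": 5,
}

with mlflow.start_run(run_name="prerelease-demo"):
    # Log every hyperparameter so runs are comparable in the MLflow UI.
    mlflow.log_params(hyperparams)
```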
# Resolves #186

This PR introduces a checkpointing feature to skip the processing of data that has already been processed in previous runs. A checkpoint object is used to wrap methods, so when a method is called with an ID it was previously called with, the checkpoint returns the result of the previous execution instead of executing the method again.

To use it, first initialize the checkpoint (this should be done only once and can be done anywhere in the pipeline):

```
init_checkpoint(checkpoint_name, config)
```

and decorate the method you want to use it for:

```
@cache_with_checkpoint(id="arg2.id")
def method(arg1, arg2):
    pass
```

or wrap the method using the checkpoint object:

```
get_checkpoint().load_or_run(method, arg2.id, arg1, arg2)
```

(`arg2.id` is the ID that uniquely identifies the call in this example.) This call checks whether the provided method has previously been executed with the given ID. If it has, it returns the cached result; otherwise, it executes the method with the given arguments and caches the result for future calls.

The design for this feature can be found in #186.

# This PR includes:

* Added `Checkpoint` - an abstract class for the checkpoint interface (a rough sketch follows below)
* Added `LocalStorageCheckpoint` - a `Checkpoint` implementation for checkpoints on the local machine
* Added `NullCheckpoint` - a dummy `Checkpoint` implementation for skipping the checkpoint feature
* Implemented the usage of checkpoints in the `01_index` step
* Added a Make command to delete the checkpoint data
* Added tests
* Tiny fix for the misconfiguration of flake8

# Next PRs:

* Checkpointing for the other steps (this PR covers the `index` step only)
* Implementation of `AzureMLCheckpoint` - see design in #186

---------

Co-authored-by: Martin Peck <martinjohnpeck@gmail.com>
Co-authored-by: Guy Bertental <gubert@microsoft.com>
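To make the interface concrete, here is a minimal sketch of what the `Checkpoint` abstraction could look like; the private method names are assumptions for illustration, and only `load_or_run` appears in the PR description:

```
from abc import ABC, abstractmethod
from typing import Any, Callable


class Checkpoint(ABC):
    """Caches method results keyed by a caller-supplied ID."""

    @abstractmethod
    def _exists(self, checkpoint_id: str) -> bool: ...

    @abstractmethod
    def _load(self, checkpoint_id: str) -> Any: ...

    @abstractmethod
    def _save(self, checkpoint_id: str, result: Any) -> None: ...

    def load_or_run(self, method: Callable, checkpoint_id: str, *args, **kwargs) -> Any:
        # Return the cached result if this ID was handled in a previous run...
        if self._exists(checkpoint_id):
            return self._load(checkpoint_id)
        # ...otherwise execute the method and cache its result for future calls.
        result = method(*args, **kwargs)
        self._save(checkpoint_id, result)
        return result
```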
…ig (#557) This PR fixes two config attributes, USE_CHECKPOINTS and EXECUTION_ENVIROMENT, to be lowercase to match the naming conventions, and removes an unused comment.
Added checkpoints to the `02_qa_generation` and `03_querying` steps. The only code changes, other than the addition of the checkpoints, were to extract code into methods so they could be wrapped with the checkpoint decorator (see the sketch below). There are no functional changes in this PR. The `01_index` step is already covered as part of #472. The `04_evaluation` step is not covered, as a refactor will soon take place there.

---------

Co-authored-by: Martin Peck <martinjohnpeck@gmail.com>
Co-authored-by: Guy Bertental <gubert@microsoft.com>
Co-authored-by: Guy Bertental <guybartal@gmail.com>
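A hypothetical illustration of the extraction pattern; the function name, arguments, and ID path are invented for this example and assume the repository's `cache_with_checkpoint` decorator is in scope:

```
# Hypothetical: per-document Q&A generation pulled out of the step's main
# loop into its own method so the checkpoint decorator can cache it.
@cache_with_checkpoint(id="document.id")
def generate_qa_for_document(generator, document):
    return generator.generate(document.text)
```

Calling the method twice with the same document ID executes the generation only once; the second call returns the checkpointed result.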
closes #505

Prompts are moved into a subclass, better prompts for text generation are added, and chain-of-thought capabilities are added for prompts.

Main new features:

- Added native and automatic support for JSON generation using the OpenAI format structure (see the sketch after the tables below)
- Added prompt-level validation capabilities for the generated answer
- Added prompt classes with automatic validation of prompts and parameters
- Added support for non-strict queries, meaning you can now specify that a particular prompt response may fail and it will still be OK
- Added support for chain-of-thought reasoning (see the examples in the Q&A generation prompts)
- Moved prompts into separate txt files
- Q&A generation can now fail to generate some questions without consequences, as long as at least one question is generated
- Changed the Python version to Python 3.11
- Some prompts that previously returned text now return JSON
- Some prompts that previously returned JSON now return text
- The signature of `generate_response` now depends on the prompt's input parameters
- Refactored all prompts (except ragas) and added examples
- Removed some try/except blocks due to the new system for handling responses of non-strict queries: all exceptions thrown from `response_generator` are now critical. If the NonStrict tag is added, a failed execution yields None, unless the exception is critical (i.e. unrelated to LLM generation, such as a problem in the code)

Metrics and comparison

Comparison on data generated using this branch:

| Metric | New Data, New Prompts | New Data, Old Prompts |
|-------------------------------------------|-----------------------|-----------------------|
| fuzzy | 75.88 | **77.05** |
| bert_all_MiniLM_L6_v2 | **80.58** | 79.60 |
| cosine | **75.21** | 68.88 |
| bert_distilbert_base_nli_stsb_mean_tokens | **76.90** | 76.14 |
| llm_answer_relevance | **72.59** | 67.60 |
| llm_context_precision | **91.03** | 84.62 |

Comparison on data generated using the development branch:

| Metric | Old Data, New Prompts | Old Data, Old Prompts |
|-------------------------------------------|-----------------------|-----------------------|
| fuzzy | **81.14** | 80.56 |
| bert_all_MiniLM_L6_v2 | **69.71** | 66.58 |
| cosine | **66.78** | 57.51 |
| bert_distilbert_base_nli_stsb_mean_tokens | **70.49** | 67.78 |
| llm_answer_relevance | **62.18** | 57.63 |
| llm_context_precision | **84.62** | 78.85 |

---------

Co-authored-by: Vadim Kirilin <vadimkirilin@Vadims-MacBook-Pro.local>
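A hedged sketch of JSON-mode generation with the OpenAI chat completions API (the model name, prompts, and keys are placeholders; this is not the repository's `ResponseGenerator` implementation):

```
import json

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    # JSON mode: the API constrains the reply to be valid JSON.
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Reply in JSON with keys 'question' and 'answer'."},
        {"role": "user", "content": "Generate one Q&A pair about checkpointing."},
    ],
)
qa_pair = json.loads(response.choices[0].message.content)
```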
This PR fixes:

- **llm_context_precision**: the metric considered the context used to generate the question-answer pair in step 2 (`02_qa_generation.py`) instead of the retrieved contexts. To assess the system's ability to retrieve relevant chunks/contexts, we need to consider the relevancy of the retrieved contexts against the question. The computation of the metric was also using the generated answer (`actual`) as input instead of the question. The updated metric computes a simple average precision, i.e. the proportion of relevant chunks without consideration of the ranking order (see the sketch below).
- **llm_context_recall**: this metric was also using the Q&A context instead of the retrieved contexts. To assess the system's ability to retrieve contexts that are aligned with the ground-truth answer, we need to consider the retrieved contexts. The temperature parameter is removed, as it is already part of the initialisation of `ResponseGenerator`.
- **llm_answer_relevance**: the metric was taking the generated answer and the ground-truth answer as inputs instead of the question and the generated answer.
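A minimal sketch of the simple average precision described above, i.e. the proportion of retrieved contexts judged relevant, ignoring rank order (in the actual metric the relevance judgments would come from an LLM):

```
def context_precision(relevance_flags: list[bool]) -> float:
    """Proportion of retrieved contexts judged relevant, ignoring rank."""
    if not relevance_flags:
        return 0.0
    return sum(relevance_flags) / len(relevance_flags)

# Example: 3 of 4 retrieved chunks judged relevant -> 0.75
print(context_precision([True, True, False, True]))
```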
Removed the separation of checkpoints into folders by their index_name and step. All checkpoint data is now stored together.