
Prerelease #556 (Draft)

guybartal wants to merge 8 commits into development
Conversation

@guybartal guybartal commented May 18, 2024

guybartal and others added 5 commits May 18, 2024 15:52
closes #480 

### This PR includes

- Log all hyperparameters to MLflow
- Config refactoring: lowercase for all attributes, since they are not constants (matches coding conventions)
- Print the Azure ML monitoring URL right after its creation to allow easy access to monitoring (Ctrl + left click)
- Fix issues with experiment and job names, allowing Azure ML commands to open experiments and MLflow runs automatically, while locally we create them manually
- Remove unused experiment settings from the `.env` sample file
- Hide AzureML warnings by using `CliV2AnonymousEnvironment` as the Azure ML environment name
- Temporary workaround for malformed JSON in the Q&A Gen step with the current CI generation model version, by removing all `"..."` strings from the model response (rough sketch below)
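
A minimal sketch of the kind of cleanup the last bullet describes; the helper name and the exact replacement strategy are assumptions, not the repo's actual code:

```
import json

def parse_model_json(raw: str) -> dict:
    # Hypothetical helper: replace the literal "..." placeholder strings the
    # model sometimes emits with empty strings so the payload stays parseable.
    cleaned = raw.replace('"..."', '""')
    return json.loads(cleaned)
```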

### WIP
#540
Resolves #186

This PR introduces a checkpointing feature to skip the processing of
data that has already been processed in previous runs.
A checkpoint object is used to wrap methods: when a wrapped method is
called with an ID it has already seen, the checkpoint returns the result
of the previous execution instead of running the method again.

To use it, first initialize the checkpoint (this should be done only
once, and can be done anywhere in the pipeline):
```
init_checkpoint(checkpoint_name, config)
```

and then decorate the method you want to cache:
```
@cache_with_checkpoint(id="arg2.id")
def method(arg1, arg2):
    pass
```

or wrap the method using the checkpoint object:

```
get_checkpoint().load_or_run(method, arg2.id, arg1, arg2)
```

(`arg2.id` is the ID that uniquely identifies the call in this example)

This call checks whether the provided method has previously been executed
with the given ID. If it has, it returns the cached result; otherwise,
it executes the method with the given arguments and caches the result
for future calls.
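
As an illustration only, here is one way such a decorator could resolve an ID expression like `"arg2.id"` at call time. This is a hedged sketch, not the repo's implementation; it assumes `get_checkpoint()` is importable from the checkpoint module:

```
import functools
import inspect

def cache_with_checkpoint(id: str):
    # Hypothetical sketch: resolve an ID expression such as "arg2.id"
    # against the wrapped function's arguments at call time.
    arg_name, _, attr_path = id.partition(".")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = inspect.signature(func).bind(*args, **kwargs)
            call_id = bound.arguments[arg_name]
            for attr in (attr_path.split(".") if attr_path else []):
                call_id = getattr(call_id, attr)
            # Run once per unique ID; replay the cached result afterwards.
            # get_checkpoint() is assumed to return the shared checkpoint.
            return get_checkpoint().load_or_run(func, call_id, *args, **kwargs)
        return wrapper
    return decorator
```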

The design for this feature can be found in #186 

# This PR includes:

* Added `Checkpoint` - an abstract class defining the checkpoint interface (sketched below)
* Added `LocalStorageCheckpoint` - a `Checkpoint` implementation for checkpoints on the local machine
* Added `NullCheckpoint` - a dummy `Checkpoint` implementation for skipping the checkpoint feature
* Implemented the usage of checkpoints in the `01_index` step
* Added a Make command to delete the checkpoint data
* Added tests
* Tiny fix for the misconfiguration of flake8
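
A rough sketch of how these three classes might fit together, assuming a pickle-per-ID layout for the local implementation (the actual storage format is not specified here; only the class names and `load_or_run` come from this PR):

```
import pickle
from abc import ABC, abstractmethod
from pathlib import Path

class Checkpoint(ABC):
    # Abstract interface; method name taken from the usage shown above.
    @abstractmethod
    def load_or_run(self, method, call_id, *args, **kwargs): ...

class NullCheckpoint(Checkpoint):
    # Dummy implementation: never caches, always runs the method.
    def load_or_run(self, method, call_id, *args, **kwargs):
        return method(*args, **kwargs)

class LocalStorageCheckpoint(Checkpoint):
    # Stores each result on local disk, keyed by the call ID.
    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def load_or_run(self, method, call_id, *args, **kwargs):
        path = self.root / f"{call_id}.pkl"
        if path.exists():
            return pickle.loads(path.read_bytes())
        result = method(*args, **kwargs)
        path.write_bytes(pickle.dumps(result))
        return result
```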

# Next PRs:
* Checkpointing for the other steps (this PR covers only the `index` step)
* Implementation of `AzureMLCheckpoint` - see the design in #186

---------

Co-authored-by: Martin Peck <martinjohnpeck@gmail.com>
Co-authored-by: Guy Bertental <gubert@microsoft.com>
…ig (#557)

This PR fixes two config attributes, USE_CHECKPOINTS and
EXECUTION_ENVIROMENT, renaming them to lowercase to match the naming
conventions, and removes an unused comment.
It also adds checkpoints to the `02_qa_generation` and `03_querying` steps.

The only code changes, other than the addition of the checkpoints, were
to extract code into methods so they could be wrapped with the checkpoint
decorator. There are no functional changes in this PR.

The `01_index` step is already covered as part of #472.
The `04_evaluation` step is not covered, as a refactor will soon take
place there.

---------

Co-authored-by: Martin Peck <martinjohnpeck@gmail.com>
Co-authored-by: Guy Bertental <gubert@microsoft.com>
Co-authored-by: Guy Bertental <guybartal@gmail.com>
@guybartal guybartal marked this pull request as draft May 18, 2024 19:43
quovadim and others added 3 commits May 20, 2024 12:48
closes #505 
Prompts are moved into a subclass, better prompts for text generation
are added, and chain-of-thought capabilities are added to the prompts.

Main new features:
- Added native and automatic support for JSON generation using the OpenAI format structure
- Added prompt-level validation capabilities for the generated answer
- Added prompt classes with automatic validation of prompts and parameters
- Added support for non-strict queries, meaning you can now specify that a particular prompt response may fail and that is still acceptable
- Added support for chain-of-thought reasoning (see the examples in the Q&A generation prompts)
- Moved prompts into separate `.txt` files
- Q&A generation can now fail to generate some questions without consequences, as long as at least one question is generated
- Changed the Python version to 3.11
- Some prompts that previously returned text now return JSON
- Some prompts that previously returned JSON now return text
- The signature of `generate_response` now depends on the prompt's input parameters
- Refactored all prompts (except ragas) and added examples
- Some try/except blocks were removed due to the new system for handling responses to non-strict queries, meaning that all exceptions thrown from `response_generator` are critical. If the NonStrict tag is added, the result of a failed execution will be None, unless the exception is critical (i.e. unrelated to LLM generation, e.g. a bug in the code); see the sketch below
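
A minimal sketch of the non-strict behaviour described in the last bullet; the class, function, and flag names here are illustrative assumptions, not the repo's actual API:

```
# Illustrative only: names below are assumptions, not the repo's API.
class CriticalError(Exception):
    """A problem in the code itself, not in LLM generation."""

def run_prompt(generate, prompt, strict: bool = True):
    try:
        return generate(prompt)
    except CriticalError:
        raise  # bugs in the pipeline always surface
    except Exception:
        if strict:
            raise  # strict prompts must not fail silently
        return None  # non-strict: a failed response is acceptable
```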

Metrics and comparison

Comparison on data generated using this branch

| Metric | New Data, New Prompts | New Data, Old Prompts |
|-------------------------------------------|-----------------------|-----------------------|
| fuzzy | 75.88 | **77.05** |
| bert_all_MiniLM_L6_v2 | **80.58** | 79.60 |
| cosine | **75.21** | 68.88 |
| bert_distilbert_base_nli_stsb_mean_tokens | **76.90** | 76.14 |
| llm_answer_relevance | **72.59** | 67.60 |
| llm_context_precision | **91.03** | 84.62 |

Comparison on data generated using development branch

| Metric | Old Data, New Prompts | Old Data, Old Prompts |
|-------------------------------------------|-----------------------|-----------------------|
| fuzzy | **81.14** | 80.56 |
| bert_all_MiniLM_L6_v2 | **69.71** | 66.58 |
| cosine | **66.78** | 57.51 |
| bert_distilbert_base_nli_stsb_mean_tokens | **70.49** | 67.78 |
| llm_answer_relevance | **62.18** | 57.63 |
| llm_context_precision | **84.62** | 78.85 |

---------

Co-authored-by: Vadim Kirilin <vadimkirilin@Vadims-MacBook-Pro.local>
This PR fixes:

- **llm_context_precision**: the metric considers the context used to
generate the question-answer pair in step 2 (`02_qa_generation.py`)
instead of the retrieved contexts. To assess the system's ability
to retrieve relevant chunks/contexts, we need to consider the relevancy
of the retrieved contexts against the question. The computation of the
metric also used the generated answer (`actual`) as input instead of
the question. The updated metric computes a simple average precision:
the proportion of relevant chunks, without consideration of ranking
order (see the sketch after this list).
- **llm_context_recall**: this metric also used the qna context
instead of the retrieved contexts. To assess the system's ability
to retrieve contexts that are aligned with the ground-truth answer, we
need to consider the retrieved contexts. The temperature parameter is
removed, as it is already part of the initialisation of
`ResponseGenerator`.
- **llm_answer_relevance**: the metric currently takes the generated
answer and the ground-truth answer as inputs, instead of the question
and the generated answer.
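
A minimal sketch of the simple average precision described above; `is_relevant` stands in for whatever LLM-based relevance judgement the metric uses and is an assumption here:

```
def context_precision(question, retrieved_contexts, is_relevant):
    # Proportion of retrieved contexts judged relevant to the question,
    # ignoring ranking order.
    if not retrieved_contexts:
        return 0.0
    hits = sum(1 for ctx in retrieved_contexts if is_relevant(question, ctx))
    return hits / len(retrieved_contexts)
```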
Removed the separation of checkpoints into folders by their `index_name`
and step; all checkpoint data is now stored together.