Log all hyper-parameters to mlflow #529

Merged 16 commits on May 18, 2024

@guybartal guybartal commented May 6, 2024

closes #480

This PR includes:

  1. Log all hyperparameters to MLflow.
  2. Config refactoring: use lowercase for all attributes, since they are not constants (matches coding conventions).
  3. Print the Azure ML monitoring URL as soon as it is created to allow easy monitoring (Ctrl + left click).
  4. Fix issues with experiment and job names, allowing Azure ML commands to open the experiment and MLflow runs automatically, while locally we create them manually.
  5. Remove unused experiment settings from the .env sample file.
  6. Hide Azure ML warnings by using CliV2AnonymousEnvironment as the Azure ML environment name.
  7. Temporary workaround for malformed JSON in the Q&A Gen step with the current CI generation model version: remove all "..." strings from the model response.
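
As a rough illustration of item 1, here is a minimal sketch of flattening a nested config into string parameters before handing them to MLflow. The helper names `flatten_config` and `log_hyperparameters` are hypothetical and are not taken from this PR; the sample keys mirror the configuration settings shown in the log further down. The logging callable is passed in so the sketch runs without MLflow installed.

```python
from typing import Any, Callable, Dict


def flatten_config(config: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Flatten nested dicts into dotted keys; every value becomes a string."""
    flat: Dict[str, str] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections such as "chunking".
            flat.update(flatten_config(value, name))
        else:
            # Lists and scalars are logged via their string representation.
            flat[name] = str(value)
    return flat


def log_hyperparameters(
    config: Dict[str, Any],
    log_params: Callable[[Dict[str, str]], Any],
) -> Dict[str, str]:
    """Flatten the config and pass it to a dict-accepting logging callable."""
    params = flatten_config(config)
    log_params(params)
    return params


# Hypothetical sample config, using keys that appear in the debug log.
sample_config = {
    "rerank": True,
    "rerank_type": "crossencoder",
    "chunking": {"chunk_size": [1000, 1500], "overlap_size": [100]},
}
```

With MLflow installed and a run active, the callable could be `mlflow.log_params` (or `mlflow.set_tags`, as the linked issue title mentions tags); which of the two this PR actually uses is not stated here.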

WIP

#540

Example

image

Example output showing the Azure ML job monitoring URLs:

vscode ➜ /workspaces/rag-experiment-accelerator (guy/fix-mlflow-tags) $ make azureml

»»» 🧩  running on Azure ML...
python3 azureml/pipeline.py  --data_dir ./data   --config_path ./config.json
2024-05-06 08:07:50,635 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: index_name_prefix = surface
2024-05-06 08:07:50,635 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: experiment_name = search types and chunk sizes
2024-05-06 08:07:50,635 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: job_name = search types with different chunk sizes
2024-05-06 08:07:50,635 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: job_description = experimenting with different search types and chunking strategies
2024-05-06 08:07:50,635 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: preprocess = False
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: chunking = {'chunk_size': [1000, 1500], 'overlap_size': [100], 'generate_title': False, 'generate_summary': False, 'override_content_with_summary': False}
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: embedding_models = [{'type': 'sentence-transformer', 'model_name': 'all-mpnet-base-v2'}]
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: ef_construction = [400]
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: ef_search = [400]
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: language = {'analyzers': {'analyzer_name': 'en.microsoft', 'index_analyzer_name': '', 'search_analyzer_name': '', 'char_filters': [], 'tokenizers': [], 'token_filters': []}, 'query_language': 'en-us'}
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: rerank = True
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: rerank_type = crossencoder
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: llm_re_rank_threshold = 3
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: cross_encoder_at_k = 4
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: crossencoder_model = cross-encoder/stsb-roberta-base
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: search_types = ['search_for_match_semantic', 'search_for_manual_hybrid']
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: retrieve_num_of_documents = 5
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: metric_types = ['cosine']
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: azure_oai_chat_deployment_name = gpt-35-turbo
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: azure_oai_eval_deployment_name = gpt-35-turbo
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: openai_temperature = 0
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: search_relevancy_threshold = 0.8
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: data_formats = all
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: eval_data_jsonl_file_path = ./artifacts/eval_data.jsonl
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: chunking_strategy = basic
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: chain_of_thoughts = True
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: hyde = disabled
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: query_expansion = False
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: min_query_expansion_related_question_similarity_score = 90
2024-05-06 08:07:50,636 - DEBUG - rag_experiment_accelerator.config.config - Configuration setting: azure_document_intelligence_model = prebuilt-read
2024-05-06 08:07:54,206 - INFO - __main__ - Starting pipeline for index: surface_p-0_cs-1000_o-100_efc-400_efs-400_sp-0_t-0_s-0_oc-0_all-mpnet-base-v2
Class AutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class AutoDeleteConditionSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseAutoDeleteSettingSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class IntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class ProtectionLevelSchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class BaseIntellectualPropertySchema: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Uploading rag-experiment-accelerator (28.15 MBs): 100%|████████████| 28145635/28145635 [00:28<00:00, 977345.83it/s]


Uploading config.json (< 1 MB): 100%|█████████████████████████████████████████| 1.85k/1.85k [00:00<00:00, 20.6kB/s]


2024-05-06 08:09:20,884 - INFO - __main__ - Pipeline job started...
Index name: surface_p-0_cs-1000_o-100_efc-400_efs-400_sp-0_t-0_s-0_oc-0_all-mpnet-base-v2
Monitoring url: https://ml.azure.com/runs/sleepy_egg_f3n0qvqf5f?wsid=/subscriptions/xxxx-xxxxx-xxxxxxx-xxxxxx/resourcegroups/xxxx/workspaces/xxxx&tid=xxxx-xxxxx-xxxxxxx-xxxxxx
2024-05-06 08:09:20,884 - INFO - __main__ - Starting pipeline for index: surface_p-0_cs-1500_o-100_efc-400_efs-400_sp-0_t-0_s-0_oc-0_all-mpnet-base-v2
2024-05-06 08:09:56,348 - INFO - __main__ - Pipeline job started...
Index name: surface_p-0_cs-1500_o-100_efc-400_efs-400_sp-0_t-0_s-0_oc-0_all-mpnet-base-v2
Monitoring url: https://ml.azure.com/runs/red_box_7sxtp59ftl?wsid=/subscriptions/xxxx-xxxxx-xxxxxxx-xxxxxx/resourcegroups/rg-guyb/workspaces/xxxx/workspaces/xxxx&tid=xxxx-xxxxx-xxxxxxx-xxxxxx
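
The log above starts one pipeline job per index, and the two index names differ only in the `cs-` segment (1000 vs. 1500), which lines up with the two `chunk_size` values in the config. A minimal sketch of how list-valued settings could expand into one run per combination follows; `expand_grid` is a hypothetical helper, not the project's function, and reading `cs` as chunk size is an inference from the log.

```python
from itertools import product
from typing import Dict, Iterator, List


def expand_grid(search_space: Dict[str, List[int]]) -> Iterator[Dict[str, int]]:
    """Yield one dict per combination of the list-valued settings."""
    names = list(search_space)
    for values in product(*(search_space[name] for name in names)):
        yield dict(zip(names, values))


# Values copied from the configuration settings printed in the log.
search_space = {
    "chunk_size": [1000, 1500],
    "overlap_size": [100],
    "ef_construction": [400],
    "ef_search": [400],
}

combinations = list(expand_grid(search_space))
for combo in combinations:
    print(combo)
```

Two combinations result, matching the two "Starting pipeline for index" lines in the log.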

Comparing experiment runs

You can easily compare metrics and hyperparameters between experiment runs using the Azure ML / MLflow UI:

image

@guybartal guybartal changed the title from "Fix mlflow logging" to "Log all hyper-parameters to mlflow" May 6, 2024
guybartal and others added 8 commits May 9, 2024 15:30
The `expand_to_multiple_questions` flag in the config file was set to `false`. This commit updates the flag to `true` to enable the feature of splitting complex queries into multiple questions. This change allows the Language Model to generate multiple queries and retrieve relevant documents accordingly.
@guybartal guybartal self-assigned this May 16, 2024
@guybartal guybartal changed the base branch from development to prerelease May 18, 2024 12:48
@guybartal guybartal merged commit 618007a into prerelease May 18, 2024
3 checks passed
@guybartal guybartal mentioned this pull request May 18, 2024

Successfully merging this pull request may close these issues.

Log all config hyper parameters to mlflow tags