GridSearchCV support callback for MLFlow #26395

tianhuil · 2023-05-18T14:10:58Z

Describe the workflow you want to enable

I would like to log the results of every run in GridSearchCV to MLFlow. A manual MLFlow loop looks like this:

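import mlflow
from sklearn.linear_model import ElasticNet

# params, train_x, train_y, test_x and test_y are assumed to be defined already
for param in params:
    with mlflow.start_run():
        est = ElasticNet(**param)
        est.fit(train_x, train_y)
        # score() returns a single float (R^2); log_metrics expects a dict
        metrics = {"r2": est.score(test_x, test_y)}
        mlflow.log_params(param)
        mlflow.log_metrics(metrics)
        mlflow.sklearn.log_model(est, "model")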

See https://mlflow.org/docs/latest/tutorials-and-examples/tutorial.html for more details.

I would like to use GridSearchCV to do the above because it brings many other features (e.g. parallel execution via n_jobs, and related searches such as HalvingGridSearchCV).

Describe your proposed solution

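A callback parameter for GridSearchCV, perhaps something like:

def log_candidate(model, test_x, test_y):
    with mlflow.start_run():
        mlflow.log_params(model.get_params())
        # compute the metric from the fitted candidate and log it as a dict
        mlflow.log_metrics({"score": model.score(test_x, test_y)})
        mlflow.sklearn.log_model(model, "model")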
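
To make the proposal a bit more concrete, here is a rough sketch of where such a hook could fire, written as a plain loop rather than as a change to GridSearchCV. The names `expand_grid` and `search_with_callback` are hypothetical and are not scikit-learn API. The sketch also assumes a single train/test split instead of cross-validation, and only reuses the `callback(model, test_x, test_y)` signature from the snippet above.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one parameter dict per combination of the grid values."""
    keys = sorted(param_grid)
    for values in product(*(param_grid[key] for key in keys)):
        yield dict(zip(keys, values))


def search_with_callback(make_estimator, param_grid, train, test, callback):
    """Fit one estimator per candidate and pass each fitted model to `callback`.

    Hypothetical helper: `make_estimator` is any callable that accepts the
    candidate parameters as keyword arguments (e.g. the ElasticNet class).
    Returns the number of candidates evaluated.
    """
    train_x, train_y = train
    test_x, test_y = test
    n_candidates = 0
    for params in expand_grid(param_grid):
        model = make_estimator(**params)
        model.fit(train_x, train_y)
        # The hook fires once per fitted candidate, e.g. log_candidate above.
        callback(model, test_x, test_y)
        n_candidates += 1
    return n_candidates
```

With something like this, `search_with_callback(ElasticNet, grid, (train_x, train_y), (test_x, test_y), log_candidate)` would start one MLFlow run per candidate. What it does not give me is the cross-validation, parallelism, or halving that GridSearchCV already has, which is why I would prefer the hook to live there.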

Describe alternatives you've considered, if relevant

To hack the scorer for this purpose: https://danielhnyk.cz/adding-callback-to-a-sklearn-gridsearch/

This is suboptimal because:

If you want to return multiple metrics, you cannot save multiple scores using the provided API. This is because we have to pass multiple scorers, not a function that generates multiple scores.
Enabling return_train_score will call the scorer callback too many times and it is not easy to distinguish between the training and testing scoring.
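
Another workaround I can think of is to skip the scorer entirely and log after the search has finished, reading everything from `cv_results_`. A minimal sketch, assuming the usual `cv_results_` layout (a `params` list plus `mean_test_*`, `std_test_*` and, with return_train_score, `mean_train_*`, `std_train_*` columns); `runs_from_cv_results` is a hypothetical helper name, not part of scikit-learn or MLFlow:

```python
def runs_from_cv_results(cv_results):
    """Turn a cv_results_-style dict into one (params, metrics) pair per candidate."""
    prefixes = ("mean_test_", "std_test_", "mean_train_", "std_train_")
    metric_keys = [key for key in cv_results if key.startswith(prefixes)]
    runs = []
    for i, params in enumerate(cv_results["params"]):
        # Cast to float so NumPy scalars become plain Python numbers.
        metrics = {key: float(cv_results[key][i]) for key in metric_keys}
        runs.append((dict(params), metrics))
    return runs


# Intended use after `search.fit(X, y)` (assumes mlflow is installed):
#
#     for params, metrics in runs_from_cv_results(search.cv_results_):
#         with mlflow.start_run():
#             mlflow.log_params(params)
#             mlflow.log_metrics(metrics)
```

Here train and test scores stay distinguishable by key name, and multiple metrics show up as separate columns. It is still not a full replacement for a callback: nothing is logged while the search is running, and the fitted model for each candidate is not available afterwards, so `mlflow.sklearn.log_model` can only be called on `best_estimator_`.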

Additional context

No response


Micky774 · 2023-05-19T17:55:55Z

Hi @tianhuil! Adding a callback API is a fairly large undertaking, and indeed already in progress (#22000)!

I'll leave this issue open for now, since afaik it is a new/unique use-case and is helpful to keep in mind, but bear in mind that this feature is probably not going to be released for some time and still requires much work.

tianhuil added the "Needs Triage" and "New Feature" labels on May 18, 2023

Micky774 removed the "Needs Triage" label on May 19, 2023
