GridSearchCV support callback for MLFlow #26395

Open
tianhuil opened this issue May 18, 2023 · 1 comment

tianhuil commented May 18, 2023

Describe the workflow you want to enable

I would like to log the results of every run in GridSearchCV to MLFlow. Done by hand, logging to MLFlow looks like this:

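import mlflow
import mlflow.sklearn
from sklearn.linear_model import ElasticNet

# params is a list of hyperparameter dicts; each candidate gets its own run
for param in params:
    with mlflow.start_run():
        est = ElasticNet(**param)
        est.fit(train_x, train_y)
        # score() returns a single float (R^2); log_metrics expects a dict
        metrics = {"r2": est.score(test_x, test_y)}
        mlflow.log_params(param)
        mlflow.log_metrics(metrics)
        mlflow.sklearn.log_model(est, "model")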

See https://mlflow.org/docs/latest/tutorials-and-examples/tutorial.html for more details.

I would like to use GridSearchCV to do the above because it comes with many other features (e.g. successive halving via HalvingGridSearchCV, parallel fitting via n_jobs, etc.).

Describe your proposed solution

A callback parameter to GridSearchCV, called once per fitted candidate. Perhaps something like:

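def log_candidate(model, test_x, test_y):
    # Proposed callback: receives a fitted candidate and its test split
    with mlflow.start_run():
        mlflow.log_params(model.get_params())
        # score() returns a float; wrap it in a dict for log_metrics
        mlflow.log_metrics({"score": model.score(test_x, test_y)})
        mlflow.sklearn.log_model(model, "model")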
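
To make the intended calling convention concrete, here is a minimal sketch of a search loop that invokes such a callback. It is plain Python and is not how GridSearchCV works internally. The names expand_grid and search_with_callback are hypothetical, there is no cross-validation or parallelism, and the model only needs fit and score methods.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per combination in a {name: [values]} grid."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def search_with_callback(make_model, param_grid, train, test, callback):
    """Fit one model per candidate and pass each fitted model to callback.

    Hypothetical helper for illustration only, not a scikit-learn API.
    The callback is called as callback(model, test_x, test_y).
    """
    train_x, train_y = train
    test_x, test_y = test
    best_params, best_score = None, None
    for params in expand_grid(param_grid):
        model = make_model(**params)
        model.fit(train_x, train_y)
        # The hook fires once per candidate, right after fitting
        callback(model, test_x, test_y)
        score = model.score(test_x, test_y)
        if best_score is None or score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

With this shape, a function like log_candidate above could be passed directly as the callback argument.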

Describe alternatives you've considered, if relevant

To hack the scorer for this purpose: https://danielhnyk.cz/adding-callback-to-a-sklearn-gridsearch/

This is suboptimal because:

  1. If you want to return multiple metrics, you cannot save multiple scores using the provided API. This is because we have to pass multiple scorers, not a function that generates multiple scores.
  2. Enabling return_train_score calls the scorer callback on the training folds as well, and inside the callback it is not easy to tell the training calls from the test calls.
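
For comparison, a partial workaround that needs no callback is to log after the search finishes, reading from the fitted search object's cv_results_ attribute. Below is a minimal sketch assuming the usual key layout ("params" plus "mean_test_<name>" and "mean_train_<name>" arrays); candidate_records is a hypothetical helper name. Because the keys carry the train/test prefix and the scorer name, multiple metrics and the train/test split stay distinguishable.

```python
def candidate_records(cv_results):
    """Turn a cv_results_-style dict into one (params, metrics) pair per candidate.

    Hypothetical helper; only the mean train/test score columns are kept.
    """
    records = []
    for i, params in enumerate(cv_results["params"]):
        metrics = {
            key: float(values[i])
            for key, values in cv_results.items()
            if key.startswith(("mean_test_", "mean_train_"))
        }
        records.append((dict(params), metrics))
    return records
```

Each pair could then be logged inside its own mlflow.start_run() block with mlflow.log_params(params) and mlflow.log_metrics(metrics). This does not cover the full request: the per-candidate fitted models are not kept after the search, so only the metrics and parameters can be logged this way.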

Additional context

No response

@tianhuil added the Needs Triage and New Feature labels May 18, 2023
@Micky774 removed the Needs Triage label May 19, 2023
@Micky774
Contributor

Hi @tianhuil! Adding a callback API is a fairly large undertaking, and indeed already in progress (#22000)!

I'll leave this issue open for now, since afaik it is a new/unique use-case and is helpful to keep in mind, but bear in mind that this feature is probably not going to be released for some time and still requires much work.
