[FR] Get or create run #11783

Open
1 of 22 tasks
BalkanFlink opened this issue Apr 22, 2024 · 5 comments · May be fixed by #11896
Labels
area/tracking Tracking service, tracking client APIs, autologging enhancement New feature or request has-closing-pr This issue has a closing PR

Comments

@BalkanFlink

Willingness to contribute

Yes. I can contribute this feature independently.

Proposal Summary

It would be convenient for our data scientists to have a single function that searches an experiment for a run by name and returns the matching run object if it exists, or creates the run if it does not. If more than one run with that name exists in the experiment, the function would raise an error.

Motivation

What is the use case for this feature?

Users would like to pick up where they left off with a run. This would be easier and quicker to do with the run name than with the run ID.

Why is this use case valuable to support for MLflow users in general?

It would save them having to search for the run id in the MLflow UI before they can obtain the run via Python API and resume their work.

Why is this use case valuable to support for your project(s) or organization?

Making the lives of Data Scientists easier by removing a step from their workflow

Why is it currently difficult to achieve this use case?

Resuming a specific run currently requires knowing its run ID (found via the UI), whereas it would be a smoother experience to just search for or create the run by name.

Details

I'm happy to contribute this feature. I would add a method to the MLflow client that first searches for an existing run with the given run name (using mlflow.search_runs(filter_string="run_name='myexistingrun'")) and creates a new run with that name if none exists. If more than one run has this name, it would raise an error.
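
For illustration, the behaviour proposed above could be sketched roughly as follows. This is only a sketch, not the implementation from the linked PR: the names get_or_create_run and DuplicateRunNameError are hypothetical, and client is assumed to behave like mlflow.tracking.MlflowClient (only its search_runs and create_run methods are used). Rejecting run names that contain a single quote is a simplification chosen here to keep the filter string well-formed.

```python
class DuplicateRunNameError(Exception):
    """Hypothetical error: several runs in one experiment share the name."""


def get_or_create_run(client, experiment_id, run_name):
    """Return the single run called `run_name`, creating it if none exists.

    `client` is assumed to behave like mlflow.tracking.MlflowClient.
    """
    if "'" in run_name:
        # Simplification for this sketch: quote escaping is not handled.
        raise ValueError("run_name must not contain single quotes")

    matches = client.search_runs(
        experiment_ids=[experiment_id],
        filter_string=f"attributes.run_name = '{run_name}'",
        max_results=2,  # two hits are enough to detect a duplicate
    )
    if len(matches) > 1:
        raise DuplicateRunNameError(
            f"More than one run named {run_name!r} in experiment {experiment_id}"
        )
    if matches:
        return matches[0]  # exactly one match: reuse it
    return client.create_run(experiment_id, run_name=run_name)
```

With a real client this would be called as get_or_create_run(MlflowClient(), "0", "myexistingrun"); the returned run's info.run_id could then be passed to mlflow.start_run(run_id=...) to continue logging.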

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations
@BalkanFlink BalkanFlink added the enhancement New feature or request label Apr 22, 2024
@github-actions github-actions bot added the area/tracking Tracking service, tracking client APIs, autologging label Apr 22, 2024
@daniellok-db
Collaborator

Hi @BalkanFlink, I think the request makes sense. With regard to searching runs by name, there is already some support for this via the mlflow.search_runs() API. The syntax is not as convenient as just passing a run_name, but you can use it like this:

import mlflow

mlflow.search_runs(
  experiment_ids=["0"],
  # if you know the exact run name
  filter_string="attributes.run_name='shivering-fox-792'"
)

mlflow.search_runs(
  experiment_ids=["0"],
  # if you only know part of the run name
  filter_string="attributes.run_name LIKE '%fox%'"
)

Depending on the result of the search, you can either create a new run or retrieve the run ID from the search result. Let me know if this solves your use case!
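
As a rough sketch of that last step with the fluent API: the helper below (start_run_kwargs is a hypothetical name, not part of MLflow) turns the run IDs found by a name search into keyword arguments for mlflow.start_run, resuming the run when exactly one match exists and creating a new one otherwise. The commented usage assumes experiment "0" is also the active experiment, so that a newly created run lands in the experiment that was searched.

```python
def start_run_kwargs(matching_run_ids, run_name):
    """Pick keyword arguments for mlflow.start_run from a name search result."""
    if len(matching_run_ids) > 1:
        raise ValueError(f"{len(matching_run_ids)} runs are named {run_name!r}")
    if matching_run_ids:
        return {"run_id": matching_run_ids[0]}  # resume the existing run
    return {"run_name": run_name}  # no match: start a new run with this name


# Intended use (requires mlflow; shown as comments, not executed here):
#   import mlflow
#   name = "shivering-fox-792"
#   runs = mlflow.search_runs(
#       experiment_ids=["0"],
#       filter_string=f"attributes.run_name = '{name}'",
#       output_format="list",  # list of Run objects instead of a DataFrame
#   )
#   ids = [run.info.run_id for run in runs]
#   with mlflow.start_run(**start_run_kwargs(ids, name)):
#       mlflow.log_metric("loss", 0.1)
```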

@BalkanFlink
Author

Hi @daniellok-db, thanks for the info. I know this is already possible, but in my opinion it would be convenient as a small standalone function. I will be developing this function anyway as part of a ticket; the decision now is whether I can contribute it to MLflow directly instead of building our own internal wrapper function. Should I fork the repo and raise a PR related to this issue (#11783)?

@daniellok-db
Collaborator

I see! Yes, feel free to file a PR and the MLflow team will review it 😄

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.

@m-blasiak m-blasiak linked a pull request May 3, 2024 that will close this issue
39 tasks
@github-actions github-actions bot added the has-closing-pr This issue has a closing PR label May 3, 2024
@BalkanFlink
Author

Hey @daniellok-db, would you be able to take a look at our PR for this, please?
