[FR] Add regex comparator ~ to search functions for string attributes and tags #11898

Open
2 of 22 tasks
sydneyw-spotify opened this issue May 3, 2024 · 2 comments
Labels
area/server-infra MLflow Tracking server backend area/tracking Tracking service, tracking client APIs, autologging enhancement New feature or request help wanted We would like help from the community to add this support


sydneyw-spotify commented May 3, 2024

Willingness to contribute

Yes. I can contribute this feature independently.

Proposal Summary

Currently, the only string comparators available to Search Experiments and Search Runs are =, !=, LIKE, and ILIKE. These comparators are sufficient for many use cases, but they only allow relatively simple queries, especially since MLflow only lets you chain expressions with the AND operator and has no OR operator (see: #6075). LIKE and ILIKE allow some pattern matching, but both lack the expressiveness of full regular expressions.

In addition, the FileStore already implements LIKE as a regex match, so queries that work against the FileStore error out against a SqlAlchemyStore.
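
To illustrate the kind of mismatch described here, below is a minimal sketch of a LIKE-to-regex translation that does not escape regex metacharacters, compared against a real SQL LIKE evaluated by SQLite. The helper names `like_to_regex`, `regex_backed_like`, and `sql_like` are hypothetical and this is not copied from MLflow; it only shows how a regex-backed LIKE can accept patterns that a SQL backend treats literally.

```python
import re
import sqlite3


def like_to_regex(pattern: str) -> re.Pattern:
    """Hypothetical helper: translate a SQL LIKE pattern into a regex
    without escaping regex metacharacters."""
    # Anchor the pattern unless it begins or ends with the % wildcard.
    if not pattern.startswith("%"):
        pattern = "^" + pattern
    if not pattern.endswith("%"):
        pattern = pattern + "$"
    # "_" matches one character, "%" matches any run of characters.
    return re.compile(pattern.replace("_", ".").replace("%", ".*"))


def regex_backed_like(value: str, pattern: str) -> bool:
    """Evaluate LIKE through the regex translation above."""
    return like_to_regex(pattern).match(value) is not None


def sql_like(value: str, pattern: str) -> bool:
    """Evaluate LIKE with an actual SQL engine (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    try:
        return bool(conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()[0])
    finally:
        conn.close()


# Plain wildcards agree in both implementations.
print(regex_backed_like("prod-1", "prod%"), sql_like("prod-1", "prod%"))
# A regex alternation leaks through the translation but is literal in SQL.
print(regex_backed_like("prod", "prod|staging"), sql_like("prod", "prod|staging"))
```

An explicit ~ comparator would make this behavior intentional and consistent instead of an accident of one backend.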

Motivation

What is the use case for this feature?

Users can make more complex queries of their experiments and runs.

Why is this use case valuable to support for MLflow users in general?

String querying is currently limited to relatively simple operations; regex support unlocks much deeper querying. Additionally, regex supports alternation (an OR operator), which at least partially unblocks users who need that operation while being much simpler to implement.
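
As a rough illustration of the OR point, the sketch below uses Python's `re` module on made-up tag values. The `run_tags` data and the `runs_matching` helper are hypothetical and only stand in for what a ~ comparator could express in a single filter clause.

```python
import re

# Hypothetical tag values keyed by run ID, for illustration only.
run_tags = {
    "run-1": "prod",
    "run-2": "staging",
    "run-3": "dev",
}


def runs_matching(pattern: str) -> list[str]:
    """Return the run IDs whose tag value matches the regex pattern."""
    regex = re.compile(pattern)
    return sorted(run_id for run_id, env in run_tags.items() if regex.search(env))


# Alternation plays the role of OR: env is "prod" OR env is "staging".
print(runs_matching(r"^(prod|staging)$"))  # ['run-1', 'run-2']
```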

Why is this use case valuable to support for your project(s) or organization?

We need to query tags in ways that are outside the scope of LIKE.

Why is it currently difficult to achieve this use case?

MLflow string querying is limited.

Details

Modify this function as follows:

@staticmethod
def get_sql_comparison_func(comparator, dialect):
    import sqlalchemy as sa

    def comparison_func(column, value):
        if comparator == "LIKE":
            return column.like(value)
        elif comparator == "ILIKE":
            return column.ilike(value)
        elif comparator == "IN":
            return column.in_(value)
        elif comparator == "NOT IN":
            return ~column.in_(value)
        elif comparator == "~":
            # New: regex match via SQLAlchemy's regexp_match (available since 1.4)
            return column.regexp_match(value)
        return SearchUtils.get_comparison_func(comparator)(column, value)
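
How `regexp_match` compiles depends on the database dialect, so each supported backend would need to be checked. As one data point, SQLite accepts the `REGEXP` operator syntax but needs a user-defined `regexp` function to back it. The sketch below shows this at the SQL level using only the standard library; the table layout and values are made up for illustration and do not reflect MLflow's schema.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, value second.
    if value is None:
        return None
    return re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
# Without this registration, a query using REGEXP fails on stock SQLite.
conn.create_function("regexp", 2, regexp)

# Hypothetical table, for illustration only.
conn.execute("CREATE TABLE tags (run_id TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO tags VALUES (?, ?)",
    [("run-1", "model-v1"), ("run-2", "model-v12"), ("run-3", "baseline")],
)

rows = conn.execute(
    "SELECT run_id FROM tags WHERE value REGEXP ? ORDER BY run_id",
    (r"^model-v\d$",),
).fetchall()
matched = [row[0] for row in rows]
print(matched)  # ['run-1']
```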

And update all comparator checks to include the ~ operator, such as here and here:

  VALID_PARAM_COMPARATORS = {"!=", "=", "~", LIKE_OPERATOR, ILIKE_OPERATOR}
  VALID_TAG_COMPARATORS = {"!=", "=", "~", LIKE_OPERATOR, ILIKE_OPERATOR}
  VALID_STRING_ATTRIBUTE_COMPARATORS = {"!=", "=", "~", LIKE_OPERATOR, ILIKE_OPERATOR, "IN", "NOT IN"}

and

if comparator not in ("=", "!=", "LIKE", "ILIKE", "~"):
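
For backends that filter in Python rather than SQL, the same comparator would also need an in-memory implementation. A minimal sketch of the validation plus comparison, assuming `re.search` semantics: the names `REGEX_OPERATOR`, `validate_tag_comparator`, and `regex_compare` are hypothetical and are not part of MLflow.

```python
import re

LIKE_OPERATOR = "LIKE"
ILIKE_OPERATOR = "ILIKE"
REGEX_OPERATOR = "~"  # hypothetical constant name

VALID_TAG_COMPARATORS = {"!=", "=", REGEX_OPERATOR, LIKE_OPERATOR, ILIKE_OPERATOR}


def validate_tag_comparator(comparator: str) -> None:
    """Reject comparators that are not allowed for tags."""
    if comparator not in VALID_TAG_COMPARATORS:
        raise ValueError(f"Invalid comparator {comparator!r} for tags")


def regex_compare(value: str, pattern: str) -> bool:
    """Return True if the pattern matches anywhere in the value."""
    try:
        return re.search(pattern, value) is not None
    except re.error as exc:
        # Surface malformed user patterns as a validation error.
        raise ValueError(f"Invalid regular expression {pattern!r}: {exc}") from exc


validate_tag_comparator("~")
print(regex_compare("model-v1", r"v\d"))  # True
```

Whether ~ should match anywhere in the string or only the whole string is a design choice that would need to be consistent across stores.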

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations
@sydneyw-spotify sydneyw-spotify added the enhancement New feature or request label May 3, 2024
@github-actions github-actions bot added the area/server-infra MLflow Tracking server backend label May 3, 2024
@sydneyw-spotify sydneyw-spotify changed the title [FR] Add regex comparator ~ to Search functions for string attributes and tags [FR] Add regex comparator ~ to search functions for string attributes and tags May 3, 2024
@github-actions github-actions bot added the area/tracking Tracking service, tracking client APIs, autologging label May 6, 2024
@harupy harupy added the help wanted We would like help from the community to add this support label May 9, 2024
@sydneyw-spotify
Author

@harupy I opened this issue and am happy to take it on myself.


@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.


2 participants