[FR] log_table support for csv (and not only json) #11829

Open
2 of 22 tasks
turbotimon opened this issue Apr 25, 2024 · 5 comments
Labels
area/artifacts Artifact stores and artifact logging area/uiux Front-end, user experience, plotting, JavaScript, JavaScript dev server enhancement New feature or request

Comments

@turbotimon
Contributor

turbotimon commented Apr 25, 2024

Willingness to contribute

Yes. I can contribute this feature independently.

Proposal Summary

The experimental log_table feature is great! However, it only supports JSON. I think also supporting CSV would benefit users and would be easy to implement.

Motivation

What is the use case for this feature?

Log data as csv (instead of only json)

Why is this use case valuable to support for MLflow users in general?

  • CSV is easier to view, as MLflow already renders CSV files as HTML tables in the browser (while JSON is shown as raw JSON).
  • It adds flexibility; for example, log_dict already supports both JSON and YAML.

Why is this use case valuable to support for your project(s) or organization?

Why is it currently difficult to achieve this use case?

  • It is not really difficult: mlflow.log_text(df.to_csv(), artifact_file="example.csv") would do the job. However, this is not intuitive given that a log_table function exists.

Details

  • I would make use of the existing pandas to_csv function.
  • I would choose between CSV and JSON based on the provided artifact_file string: if the extension is .csv, write CSV; otherwise write JSON (similar to how log_dict handles JSON/YAML).
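
The extension-based choice described above could look roughly like the following. This is only a sketch of the proposal, not MLflow's actual implementation: the helper name `serialize_table` is hypothetical, and the `orient="split"` JSON layout is an assumption chosen for illustration.

```python
import os

import pandas as pd


def serialize_table(df: pd.DataFrame, artifact_file: str) -> str:
    """Hypothetical helper: pick the output format from the file extension."""
    ext = os.path.splitext(artifact_file)[1].lower()
    if ext == ".csv":
        # Proposed new branch: reuse pandas' existing CSV writer.
        return df.to_csv(index=False)
    # Fallback for every other extension: JSON (layout is an assumption).
    return df.to_json(orient="split", index=False)


df = pd.DataFrame({"input": ["a", "b"], "score": [0.1, 0.9]})
csv_text = serialize_table(df, "eval/results.csv")
json_text = serialize_table(df, "eval/results.json")
```

The returned string could then be passed to mlflow.log_text(text, artifact_file=...), which is essentially the workaround mentioned earlier, just hidden behind a single table-oriented call.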

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations
@turbotimon turbotimon added the enhancement New feature or request label Apr 25, 2024
@github-actions github-actions bot added area/artifacts Artifact stores and artifact logging area/uiux Front-end, user experience, plotting, JavaScript, JavaScript dev server labels Apr 25, 2024
@WeichenXu123
Collaborator

This feature makes sense! :)

@turbotimon
Contributor Author

The function in its current form seems to do much more than one would expect and has some inconsistencies:

  • It supports images in tables and does some "magic" with them, such as saving each image to a predefined folder in two sizes. This is not documented, and I don't see a broad use case for it.

  • The corresponding load_table has no matching functionality for loading images.

@marcosjt7
Copy link

marcosjt7 commented May 1, 2024

I ran into another issue with log_table: unlike any other of the log_artifact functions, it uses a pattern in which it appends the artifact path to an mlflow tag mlflow.loggedArtifacts (see here). Since mlflow tags have a max length of 5000, this creates an artificial limit to the number of artifacts that can be saved per run. Basically, the summed total length of all table artifact paths cannot exceed an mlflow tag's character limit, or log_table will error. For now one can get around this by using log_artifact instead, but this pattern should not be generalized.
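
To make the limit concrete, here is a rough back-of-the-envelope sketch. The 5000-character figure comes from the comment above; the way each path is encoded inside the tag (a JSON list of path/type entries) is an assumption for illustration, and the function names and the path template are hypothetical.

```python
import json

TAG_MAX_LENGTH = 5000  # tag length limit quoted in the comment above


def tag_value_after_logging(paths):
    # Assumption: every logged table adds one {"path", "type"} entry
    # to a single JSON list stored in the tag.
    return json.dumps([{"path": p, "type": "table"} for p in paths])


def max_tables_before_overflow(path_template, limit=TAG_MAX_LENGTH):
    """Count how many table paths fit before the tag value exceeds the limit."""
    paths = []
    while True:
        candidate = paths + [path_template.format(i=len(paths))]
        if len(tag_value_after_logging(candidate)) > limit:
            return len(paths)
        paths = candidate


n = max_tables_before_overflow("tables/eval_step_{i:05d}.json")
print(f"Tables that fit under the tag limit: {n}")
```

Under these assumptions, a run that logs one table per evaluation step would hit the ceiling after a fairly small number of steps, and longer artifact paths lower that number further.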

@turbotimon
Contributor Author

@marcosjt7 thanks for pointing this out. I think the overall purpose of log_table is unclear and it should be reconsidered as a whole. I searched for the original issue behind this feature, which might help clarify its intent, but I couldn't find anything.


github-actions bot commented May 3, 2024

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.
