Issues Policy acknowledgement
I have read and agree to submit bug reports in accordance with the issues policy
Where did you encounter this bug?
Local machine
Willingness to contribute
Yes. I can contribute a fix for this bug independently.
MLflow version
Client: 2.8.1
Tracking server: 2.8.1
System information
OS Platform and Distribution: MacOS Sonoma 14.4.1
Python version: 3.10.13
Describe the problem
Unlike the other mlflow.log_* functions, mlflow.log_table internally appends an entry to the run tag mlflow.loggedArtifacts (code).
Since MLflow tag values have a character limit of 5000, this caps the number of table artifacts that can be logged to a single run: once the combined length of the logged artifact paths (serialized as JSON in the tag) exceeds 5000 characters, log_table raises RestException: INVALID_PARAMETER_VALUE.
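To see where the limit trips, the tag growth can be estimated offline. This is a sketch of the arithmetic, not MLflow code; the path template matches the reproduction below:

```python
import json

# Each log_table call appends {"path": ..., "type": "table"} to the
# mlflow.loggedArtifacts tag, stored as a JSON-serialized list.
path_template = "test_directory/this_is_my_test_sub_directory/test_table_{}.csv"
entries = []
n = 0
while True:
    entries.append({"path": path_template.format(n), "type": "table"})
    if len(json.dumps(entries)) > 5000:
        break
    n += 1
print(n)  # → 53: the 54th entry pushes the JSON to 5012 chars,
          # matching the "had length 5012" in the error below
```

So with paths of this length, the run fails well before the loop's 80th iteration.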
For now, one can work around this by writing the data out to a file and using mlflow.log_artifact instead. My primary concern is that this pattern of tracking artifacts with a tag of finite length does not get extended to log_artifact.
Tracking information
REPLACE_ME
Code to reproduce issue
import mlflow
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})  # any DataFrame triggers the issue

with mlflow.start_run() as mlflow_run:
    for i in range(80):
        artifact_path = "test_directory/this_is_my_test_sub_directory/test_table_{}.csv".format(i)
        mlflow.log_table(df, artifact_path)
Stack trace
---------------------------------------------------------------------------
RestException Traceback (most recent call last)
Cell In[68], line 10
8 for i in range(80):
9 artifact_path = 'test_directory/this_is_my_test_sub_directory/test_table_{}.csv'.format(i)
---> 10 mlflow.log_table(df, artifact_path)
File /venv/lib/python3.10/site-packages/mlflow/tracking/fluent.py:1244, in log_table(data, artifact_file)
1202 """
1203 Log a table to MLflow Tracking as a JSON artifact. If the artifact_file already exists
1204 in the run, the data would be appended to the existing artifact_file.
(...)
1241 mlflow.log_table(data=df, artifact_file="qabot_eval_results.json")
1242 """
1243 run_id = _get_or_start_run().info.run_id
-> 1244 MlflowClient().log_table(run_id, data, artifact_file)
File /venv/lib/python3.10/site-packages/mlflow/tracking/client.py:1611, in MlflowClient.log_table(self, run_id, data, artifact_file)
1609 current_tag_value.append(tag_value)
1610 # Set the tag with the updated list
-> 1611 self.set_tag(run_id, MLFLOW_LOGGED_ARTIFACTS, json.dumps(current_tag_value))
File /venv/lib/python3.10/site-packages/mlflow/tracking/client.py:924, in MlflowClient.set_tag(self, run_id, key, value, synchronous)
870 def set_tag(
871 self, run_id: str, key: str, value: Any, synchronous: bool = True
872 ) -> Optional[RunOperations]:
873 """
874 Set a tag on the run with the specified ID. Value is converted to a string.
875
(...)
922 Tags: {'nlp.framework': 'Spark NLP'}
923 """
--> 924 return self._tracking_client.set_tag(run_id, key, value, synchronous=synchronous)
File /venv/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/client.py:366, in TrackingServiceClient.set_tag(self, run_id, key, value, synchronous)
364 tag = RunTag(key, str(value))
365 if synchronous:
--> 366 self.store.set_tag(run_id, tag)
367 else:
368 return self.store.set_tag_async(run_id, tag)
File /venv/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py:234, in RestStore.set_tag(self, run_id, tag)
225 """
226 Set a tag for the specified run
227
228 :param run_id: String ID of the run
229 :param tag: RunTag instance to log
230 """
231 req_body = message_to_json(
232 SetTag(run_uuid=run_id, run_id=run_id, key=tag.key, value=tag.value)
233 )
--> 234 self._call_endpoint(SetTag, req_body)
File /venv/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py:59, in RestStore._call_endpoint(self, api, json_body)
57 endpoint, method = _METHOD_TO_INFO[api]
58 response_proto = api.Response()
---> 59 return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)
File /venv/lib/python3.10/site-packages/mlflow/utils/rest_utils.py:210, in call_endpoint(host_creds, endpoint, method, json_body, response_proto, extra_headers)
208 call_kwargs["json"] = json_body
209 response = http_request(**call_kwargs)
--> 210 response = verify_rest_response(response, endpoint)
211 js_dict = json.loads(response.text)
212 parse_dict(js_dict=js_dict, message=response_proto)
File /venv/lib/python3.10/site-packages/mlflow/utils/rest_utils.py:142, in verify_rest_response(response, endpoint)
140 if response.status_code != 200:
141 if _can_parse_as_json_object(response.text):
--> 142 raise RestException(json.loads(response.text))
143 else:
144 base_msg = (
145 f"API request to endpoint {endpoint} "
146 f"failed with error code {response.status_code} != 200"
147 )
RestException: INVALID_PARAMETER_VALUE: Tag value '[{"path": "test_directory/this_is_my_test_sub_directory/test_table_0.csv", "type": "table"}, {"path": "test_directory/this_is_my_test_sub_directory/test_table_1.csv", "type": "table"}, {"path": "test_directory/this_is_my_test_sub_directory/test_table' had length 5012, which exceeded length limit of 5000
Other info / logs
REPLACE_ME
What component(s) does this bug affect?
area/artifacts: Artifact stores and artifact logging
area/build: Build and test infrastructure for MLflow
area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
area/docs: MLflow documentation pages
area/examples: Example code
area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
area/models: MLmodel format, model serialization/deserialization, flavors
area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
area/projects: MLproject format, project running backends
area/scoring: MLflow Model server, model deployment tools, Spark UDFs
area/server-infra: MLflow Tracking server backend
area/tracking: Tracking Service, tracking client APIs, autologging
What interface(s) does this bug affect?
area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support
What language(s) does this bug affect?
language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages
What integration(s) does this bug affect?
integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations