Artificial artifact limit in `log_table` [BUG] #11874

marcosjt7 · 2024-05-01T20:54:37Z

Issues Policy acknowledgement

I have read and agree to submit bug reports in accordance with the issues policy

Where did you encounter this bug?

Local machine

Willingness to contribute

Yes. I can contribute a fix for this bug independently.

MLflow version

Client: 2.8.1
Tracking server: 2.8.1

System information

OS Platform and Distribution: MacOS Sonoma 14.4.1
Python version: 3.10.13

Describe the problem

Unlike the other mlflow.log_* functions, mlflow.log_table internally appends to the MLflow tag mlflow.loggedArtifacts (code).
Since MLflow tag values have a character limit of 5000, this limits the number of table artifacts that can be logged to a single run: once the JSON-serialized list of logged artifact paths exceeds 5000 characters, log_table raises RestException: INVALID_PARAMETER_VALUE.
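
To make the limit concrete, here is a rough sketch that estimates how many tables fit before the tag overflows. It assumes the tag value is a `json.dumps` of a list of `{"path": ..., "type": "table"}` entries, which is what the error message in the stack trace suggests. The helper names are hypothetical and not part of MLflow.

```python
import json

TAG_VALUE_LIMIT = 5000  # tag value length limit quoted in the RestException


def tag_value_length(paths):
    """Length of the JSON list that would be stored in the tag (assumed format)."""
    entries = [{"path": p, "type": "table"} for p in paths]
    return len(json.dumps(entries))


def max_tables_before_overflow(path_template, limit=TAG_VALUE_LIMIT):
    """Count how many paths fit before the serialized list exceeds the limit."""
    paths = []
    while True:
        paths.append(path_template.format(len(paths)))
        if tag_value_length(paths) > limit:
            # The path just appended is the one that overflows.
            return len(paths) - 1


template = "test_directory/this_is_my_test_sub_directory/test_table_{}.csv"
print(max_tables_before_overflow(template))  # prints 53
```

With the paths from the reproduction, 53 tables fit and the 54th call pushes the serialized value to 5012 characters, which matches the length reported in the exception.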

For now, one can get around this by writing the data to a local file and logging it with mlflow.log_artifact instead. My primary concern is that this pattern of tracking artifacts in a length-limited tag should not be extended to log_artifact.
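
A minimal sketch of that workaround, assuming the table can be expressed as a header plus rows and that a CSV file is an acceptable artifact. The helper names `write_rows_to_csv` and `log_rows_as_artifact` are hypothetical; only `mlflow.log_artifact(local_path, artifact_path=...)` is an MLflow call.

```python
import csv
import os
import tempfile


def write_rows_to_csv(header, rows, local_path):
    """Write a header and rows to a CSV file using the standard library."""
    with open(local_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def log_rows_as_artifact(header, rows, file_name, artifact_dir):
    """Write the table locally, then upload it with mlflow.log_artifact."""
    import mlflow  # imported lazily so the CSV helper works without MLflow

    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, file_name)
        write_rows_to_csv(header, rows, local_path)
        # artifact_path is the destination directory inside the run's artifacts.
        mlflow.log_artifact(local_path, artifact_path=artifact_dir)
```

Inside an active run, this could be called as `log_rows_as_artifact(["a"], [[1], [2]], "test_table_0.csv", "test_directory/this_is_my_test_sub_directory")`. Because it goes through log_artifact rather than log_table, it should not append to the mlflow.loggedArtifacts tag.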

Code to reproduce issue

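import mlflow
import pandas as pd

# df was not defined in the original snippet; any small DataFrame should do.
df = pd.DataFrame({"a": [1, 2, 3]})

with mlflow.start_run() as mlflow_run:
    # The tag value exceeds 5000 characters before the loop finishes.
    for i in range(80):
        artifact_path = 'test_directory/this_is_my_test_sub_directory/test_table_{}.csv'.format(i)
        mlflow.log_table(df, artifact_path)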

Stack trace

---------------------------------------------------------------------------
RestException                             Traceback (most recent call last)
Cell In[68], line 10
      8 for i in range(80):
      9     artifact_path = 'test_directory/this_is_my_test_sub_directory/test_table_{}.csv'.format(i)
---> 10     mlflow.log_table(df, artifact_path)

File /venv/lib/python3.10/site-packages/mlflow/tracking/fluent.py:1244, in log_table(data, artifact_file)
   1202 """
   1203 Log a table to MLflow Tracking as a JSON artifact. If the artifact_file already exists
   1204 in the run, the data would be appended to the existing artifact_file.
   (...)
   1241         mlflow.log_table(data=df, artifact_file="qabot_eval_results.json")
   1242 """
   1243 run_id = _get_or_start_run().info.run_id
-> 1244 MlflowClient().log_table(run_id, data, artifact_file)

File /venv/lib/python3.10/site-packages/mlflow/tracking/client.py:1611, in MlflowClient.log_table(self, run_id, data, artifact_file)
   1609 current_tag_value.append(tag_value)
   1610 # Set the tag with the updated list
-> 1611 self.set_tag(run_id, MLFLOW_LOGGED_ARTIFACTS, json.dumps(current_tag_value))

File /venv/lib/python3.10/site-packages/mlflow/tracking/client.py:924, in MlflowClient.set_tag(self, run_id, key, value, synchronous)
    870 def set_tag(
    871     self, run_id: str, key: str, value: Any, synchronous: bool = True
    872 ) -> Optional[RunOperations]:
    873     """
    874     Set a tag on the run with the specified ID. Value is converted to a string.
    875 
   (...)
    922         Tags: {'nlp.framework': 'Spark NLP'}
    923     """
--> 924     return self._tracking_client.set_tag(run_id, key, value, synchronous=synchronous)

File /venv/lib/python3.10/site-packages/mlflow/tracking/_tracking_service/client.py:366, in TrackingServiceClient.set_tag(self, run_id, key, value, synchronous)
    364 tag = RunTag(key, str(value))
    365 if synchronous:
--> 366     self.store.set_tag(run_id, tag)
    367 else:
    368     return self.store.set_tag_async(run_id, tag)

File /venv/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py:234, in RestStore.set_tag(self, run_id, tag)
    225 """
    226 Set a tag for the specified run
    227 
    228 :param run_id: String ID of the run
    229 :param tag: RunTag instance to log
    230 """
    231 req_body = message_to_json(
    232     SetTag(run_uuid=run_id, run_id=run_id, key=tag.key, value=tag.value)
    233 )
--> 234 self._call_endpoint(SetTag, req_body)

File /venv/lib/python3.10/site-packages/mlflow/store/tracking/rest_store.py:59, in RestStore._call_endpoint(self, api, json_body)
     57 endpoint, method = _METHOD_TO_INFO[api]
     58 response_proto = api.Response()
---> 59 return call_endpoint(self.get_host_creds(), endpoint, method, json_body, response_proto)

File /venv/lib/python3.10/site-packages/mlflow/utils/rest_utils.py:210, in call_endpoint(host_creds, endpoint, method, json_body, response_proto, extra_headers)
    208     call_kwargs["json"] = json_body
    209     response = http_request(**call_kwargs)
--> 210 response = verify_rest_response(response, endpoint)
    211 js_dict = json.loads(response.text)
    212 parse_dict(js_dict=js_dict, message=response_proto)

File /venv/lib/python3.10/site-packages/mlflow/utils/rest_utils.py:142, in verify_rest_response(response, endpoint)
    140 if response.status_code != 200:
    141     if _can_parse_as_json_object(response.text):
--> 142         raise RestException(json.loads(response.text))
    143     else:
    144         base_msg = (
    145             f"API request to endpoint {endpoint} "
    146             f"failed with error code {response.status_code} != 200"
    147         )

RestException: INVALID_PARAMETER_VALUE: Tag value '[{"path": "test_directory/this_is_my_test_sub_directory/test_table_0.csv", "type": "table"}, {"path": "test_directory/this_is_my_test_sub_directory/test_table_1.csv", "type": "table"}, {"path": "test_directory/this_is_my_test_sub_directory/test_table' had length 5012, which exceeded length limit of 5000

What component(s) does this bug affect?

What interface(s) does this bug affect?

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

What language(s) does this bug affect?

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

What integration(s) does this bug affect?

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

github-actions · 2024-05-09T00:13:09Z

@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.

marcosjt7 added the bug label (Something isn't working) on May 1, 2024

github-actions bot added the area/artifacts label (Artifact stores and artifact logging) on May 1, 2024
