
[BUG] Conversion of non-dataframe type to dataframe in a pyfunc.PythonModel predict calls #11930

Open
2 of 23 tasks
ctufts-ncino opened this issue May 7, 2024 · 3 comments
Labels
area/models MLmodel format, model serialization/deserialization, flavors bug Something isn't working

Comments

@ctufts-ncino

Issues Policy acknowledgement

  • I have read and agree to submit bug reports in accordance with the issues policy

Where did you encounter this bug?

Local machine

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

MLflow version

  • mlflow, version 2.10.2

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Sonoma 14.4.1
  • Python version: 3.9.15

Describe the problem

I'm unsure whether this is a bug, a documentation issue, or a feature request.
When a dictionary is passed to the mlflow.pyfunc.PythonModel predict function, it is converted to a pandas.DataFrame. I can't find documentation clearly stating that this is the expected behavior, but I did see the following in the save_model docs:

If the predict method or function has type annotations, MLflow automatically constructs a model signature based on the type annotations (unless the signature argument is explicitly specified), and converts the input value to the specified type before passing it to the function. Currently, the following type annotations are supported:

        List[str]

        List[Dict[str, str]]
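
For contrast, here is a minimal sketch (plain Python, no MLflow involved; the ChatModel class and the "prompt" key are illustrative names, not MLflow API) of a predict method using one of the supported annotations, List[Dict[str, str]], which per the docs MLflow passes through without DataFrame conversion:

```python
from typing import Dict, List


class ChatModel:
    # Stand-in for an mlflow.pyfunc.PythonModel subclass (illustrative only).
    def predict(self, context, model_input: List[Dict[str, str]]) -> List[str]:
        # With a supported annotation, the input reaches predict as a plain list.
        return [record.get("prompt", "") for record in model_input]


print(ChatModel().predict(None, [{"prompt": "hello"}]))  # ['hello']
```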

The issue is reproduced by:

  • Define an mlflow.pyfunc.PythonModel whose predict function accepts a Dict.
  • Infer the signature from the input and output of a predict call.
  • Save the model with the signature.
  • Load the model and call predict: the dictionary input now arrives as a pandas.DataFrame.

My primary goal is to infer a signature of type Dict and to process the input as a dict at inference time. Is it expected behavior to infer a Dict signature but then always convert the input to a DataFrame at inference time?
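
In the meantime, a workaround sketch (assuming pandas is available and that the loaded model hands predict a one-row DataFrame, as observed below; normalize_input is a hypothetical helper, not MLflow API) is to normalize the input back to a dict inside predict:

```python
import pandas as pd


def normalize_input(model_input):
    # Workaround sketch: if MLflow has converted the dict to a one-row
    # DataFrame, convert it back to a dict before processing.
    if isinstance(model_input, pd.DataFrame):
        return model_input.to_dict(orient="records")[0]
    return model_input


print(normalize_input(pd.DataFrame([{"input": 1}])))  # {'input': 1}
print(normalize_input({"input": 1}))  # {'input': 1}
```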

Tracking information

NA

Code to reproduce issue

from typing import Dict

import mlflow


class MyModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input: Dict) -> Dict:
        print(f"model input type: {type(model_input)}")
        return model_input


example_input = {"input": 1}

print("infer signature")
signature = mlflow.models.infer_signature(
    example_input, MyModel().predict(context=None, model_input=example_input)
)
mlflow.pyfunc.save_model(
    "model", python_model=MyModel(), input_example=example_input, signature=signature
)

print("load model and predict")
mlflow.pyfunc.load_model("model").predict(example_input)

Output:

2024/05/07 16:27:28 INFO mlflow.types.utils: MLflow 2.9.0 introduces model signature with new data types for lists and dictionaries. For input such as Dict[str, Union[scalars, List, Dict]], we infer dictionary values types as `List -> Array` and `Dict -> Object`. 
2024/05/07 16:27:28 INFO mlflow.types.utils: MLflow 2.9.0 introduces model signature with new data types for lists and dictionaries. For input such as Dict[str, Union[scalars, List, Dict]], we infer dictionary values types as `List -> Array` and `Dict -> Object`. 
infer signature
model input type: <class 'dict'>
load model and predict
model input type: <class 'pandas.core.frame.DataFrame'>
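
The conversion observed above can be reproduced in isolation (a sketch of what the pyfunc wrapper appears to do to a dict input, not MLflow's actual code path):

```python
import pandas as pd

example_input = {"input": 1}
# Wrapping the dict in a list yields a one-row DataFrame, matching the
# type seen inside predict after loading the model.
converted = pd.DataFrame([example_input])
print(type(converted))  # <class 'pandas.core.frame.DataFrame'>
```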

Stack trace

NA

Other info / logs

No response

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations
@ctufts-ncino ctufts-ncino added the bug Something isn't working label May 7, 2024
@github-actions github-actions bot added the area/models MLmodel format, model serialization/deserialization, flavors label May 7, 2024
@harupy
Member

harupy commented May 9, 2024

@ctufts-ncino As documented, we only support:

  • List[str]
  • List[Dict[str, str]]

@ctufts-ncino
Author

@harupy Thank you for your response. Is there explicit documentation stating that all inputs (other than the types supported by type hints) are converted to a DataFrame before being passed to the predict function? If so, my apologies; otherwise, I'd request that the docs be updated to state this clearly, especially since MLflow's more recent releases advertise wider support for different input data types when, in practice, those inputs are converted back to a DataFrame.

The only documentation I can find regarding DataFrame conversion concerns the input_example and the example_no_conversion flag, but I don't see anything covering a model logged without an example.


@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.
