
[BUG] Conversion of non-dataframe type to dataframe in a pyfunc.PythonModel predict calls #11930

Open
2 of 23 tasks
ctufts-ncino opened this issue May 7, 2024 · 3 comments
Labels
area/models MLmodel format, model serialization/deserialization, flavors bug Something isn't working

Comments

@ctufts-ncino

Issues Policy acknowledgement

  • I have read and agree to submit bug reports in accordance with the issues policy

Where did you encounter this bug?

Local machine

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the MLflow community.

MLflow version

  • mlflow, version 2.10.2

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS Sonoma 14.4.1
  • Python version: 3.9.15

Describe the problem

I'm unsure whether this is a bug, a documentation issue, or a feature request.
When a dictionary is passed to the mlflow.pyfunc.PythonModel predict function, it is converted to a pandas.DataFrame. I can't find documentation clearly stating that this is the expected behavior, but I did see the following in the save_model docs:

If the predict method or function has type annotations, MLflow automatically constructs a model signature based on the type annotations (unless the signature argument is explicitly specified), and converts the input value to the specified type before passing it to the function. Currently, the following type annotations are supported:

        List[str]

        List[Dict[str, str]]
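
For contrast, here is a minimal sketch (plain Python, no MLflow involved; the ChatModel class and the "prompt" key are illustrative names, not MLflow API) of a predict method using one of the supported annotations, List[Dict[str, str]], which per the docs MLflow passes through without DataFrame conversion:

```python
from typing import Dict, List


class ChatModel:
    # Stand-in for an mlflow.pyfunc.PythonModel subclass (illustrative only).
    def predict(self, context, model_input: List[Dict[str, str]]) -> List[str]:
        # With a supported annotation, the input reaches predict as a plain list.
        return [record.get("prompt", "") for record in model_input]


print(ChatModel().predict(None, [{"prompt": "hello"}]))  # ['hello']
```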

The issue is reproduced by:

  • Define an mlflow.pyfunc.PythonModel whose predict function accepts a Dict.
  • Infer the signature from the input and output of a predict call.
  • Save the model with the signature.
  • Load the model and call predict: the dictionary input now arrives as a pandas.DataFrame.

My primary goal is to infer a signature of type Dict and to process the input as a dict at inference time. Is it expected behavior to infer a Dict signature but then always convert the input to a DataFrame at inference time?
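
In the meantime, a workaround sketch (assuming pandas is available and that the loaded model hands predict a one-row DataFrame, as observed below; normalize_input is a hypothetical helper, not MLflow API) is to normalize the input back to a dict inside predict:

```python
import pandas as pd


def normalize_input(model_input):
    # Workaround sketch: if MLflow has converted the dict to a one-row
    # DataFrame, convert it back to a dict before processing.
    if isinstance(model_input, pd.DataFrame):
        return model_input.to_dict(orient="records")[0]
    return model_input


print(normalize_input(pd.DataFrame([{"input": 1}])))  # {'input': 1}
print(normalize_input({"input": 1}))  # {'input': 1}
```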

Tracking information

NA

Code to reproduce issue

from typing import Dict

import mlflow


class MyModel(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input: Dict) -> Dict:
        print(f"model input type: {type(model_input)}")
        return model_input


example_input = {"input": 1}

print("infer signature")
signature = mlflow.models.infer_signature(
    example_input, MyModel().predict(context=None, model_input=example_input)
)
mlflow.pyfunc.save_model(
    "model", python_model=MyModel(), input_example=example_input, signature=signature
)

print("load model and predict")
mlflow.pyfunc.load_model("model").predict(example_input)

Output:

2024/05/07 16:27:28 INFO mlflow.types.utils: MLflow 2.9.0 introduces model signature with new data types for lists and dictionaries. For input such as Dict[str, Union[scalars, List, Dict]], we infer dictionary values types as `List -> Array` and `Dict -> Object`. 
2024/05/07 16:27:28 INFO mlflow.types.utils: MLflow 2.9.0 introduces model signature with new data types for lists and dictionaries. For input such as Dict[str, Union[scalars, List, Dict]], we infer dictionary values types as `List -> Array` and `Dict -> Object`. 
infer signature
model input type: <class 'dict'>
load model and predict
model input type: <class 'pandas.core.frame.DataFrame'>
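
The conversion observed above can be reproduced in isolation (a sketch of what the pyfunc wrapper appears to do to a dict input, not MLflow's actual code path):

```python
import pandas as pd

example_input = {"input": 1}
# Wrapping the dict in a list yields a one-row DataFrame, matching the
# type seen inside predict after loading the model.
converted = pd.DataFrame([example_input])
print(type(converted))  # <class 'pandas.core.frame.DataFrame'>
```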

Stack trace

NA

Other info / logs

No response

What component(s) does this bug affect?

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

What interface(s) does this bug affect?

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

What language(s) does this bug affect?

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations
@ctufts-ncino ctufts-ncino added the bug Something isn't working label May 7, 2024
@github-actions github-actions bot added the area/models MLmodel format, model serialization/deserialization, flavors label May 7, 2024
@harupy
Member

harupy commented May 9, 2024

@ctufts-ncino As documented, we only support:

  • List[str]
  • List[Dict[str, str]]

@ctufts-ncino
Author

@harupy Thank you for your response. Is there explicit documentation stating that all inputs (other than the types supported by type hints) are converted to a DataFrame before being passed to the predict function? If so, my apologies; otherwise, I'd request that the docs be updated to state this clearly, especially since MLflow's more recent releases advertise wider support for different input data types when, in practice, those inputs are converted back to a DataFrame.

The only documentation I can find regarding DataFrame conversion concerns the input_example and the example_no_conversion flag, but I don't see anything covering a model logged without an example.


@mlflow/mlflow-team Please assign a maintainer and start triaging this issue.
