Support files for pyfunc models and model_config #11951

annzhang-db · 2024-05-09T00:06:54Z

🛠 DevTools 🛠

Install mlflow from this PR

pip install git+https://github.com/mlflow/mlflow.git@refs/pull/11951/merge

Checkout with GitHub CLI

gh pr checkout 11951

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Move shared utilities for LangChain and pyfunc models to models/utils
Support logging and loading pyfunc models from code files
- Introduce a code model loader module
Support YAML files for model_config in pyfunc models
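The models-from-code flow described above can be sketched in plain Python. The loader executes the model file as a module, and the file hands its model object back through a set_model-style hook. This is a minimal illustration under assumptions, not MLflow's implementation: set_model and load_model_from_code here are hypothetical stand-ins for mlflow.models.set_model and the new code model loader module.

```python
import importlib.util
import pathlib
import tempfile

# Hypothetical registry slot that a model file fills in by calling set_model().
_REGISTERED = {}


def set_model(model):
    """Record the model object defined by the code file being loaded."""
    _REGISTERED["model"] = model


def load_model_from_code(code_path):
    """Execute a Python file as a module and return the model it registered."""
    _REGISTERED.clear()
    spec = importlib.util.spec_from_file_location("model_code", str(code_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module from {code_path}")
    module = importlib.util.module_from_spec(spec)
    # Expose the hook to the file as a global name before running it.
    module.set_model = set_model
    spec.loader.exec_module(module)
    if "model" not in _REGISTERED:
        raise RuntimeError(f"{code_path} did not call set_model()")
    return _REGISTERED["model"]


if __name__ == "__main__":
    source = (
        "class Model:\n"
        "    def predict(self, model_input):\n"
        "        return [x + 1 for x in model_input]\n"
        "\n"
        "set_model(Model())\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = pathlib.Path(tmp, "model.py")
        path.write_text(source)
        model = load_model_from_code(path)
        print(model.predict([1, 2, 3]))  # [2, 3, 4]
```

The key design point is that the file is executed rather than pickled, so the logged artifact is readable source code and the loader only needs a well-known hook to find the model object.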

How is this PR tested?

Existing unit/integration tests
New unit/integration tests
Manual tests

Does this PR require documentation update?

Release Notes

Is this a user-facing change?

No. You can skip the rest of this section.
Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

Interface

area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
area/windows: Windows support

Language

language/r: R APIs and clients
language/java: Java APIs and clients
language/new: Proposals for new client languages

Integrations

integrations/azure: Azure and Azure ML integrations
integrations/sagemaker: SageMaker integrations
integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
rn/feature - A new user-facing feature worth mentioning in the release notes
rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

Yes should be selected for bug fixes, documentation updates, and other small changes. No should be selected for new features and larger changes. If you're unsure about the release classification of this PR, leave this unchecked to let the maintainers decide.

What is a minor/patch release?

Minor release: a release that increments the second part of the version number (e.g., 1.2.0 -> 1.3.0).
Bug fixes, doc updates and new features usually go into minor releases.
Patch release: a release that increments the third part of the version number (e.g., 1.2.0 -> 1.2.1).
Bug fixes and doc updates usually go into patch releases.

Yes (this PR will be cherry-picked and included in the next patch release)
No (this PR will be included in the next minor release)

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

github-actions · 2024-05-09T00:07:16Z

Documentation preview for afb8b84 will be available when this CircleCI job completes successfully.


Ignore this comment if this PR does not change the documentation.
It takes a few minutes for the preview to be available.
The preview is updated when a new commit is pushed to this PR.
This comment was created by https://github.com/mlflow/mlflow/actions/runs/9123100722.


sunishsheth2009

@harupy can you help review the code as well? Especially pyfunc.load_model.

Other than that, it looks good to me.

mlflow/pyfunc/__init__.py

bbqiu

looks great - would it be possible for you to adjust the error message here as well?

mlflow/mlflow/pyfunc/__init__.py

Lines 2154 to 2155 in 1a665ca

    
    "When `python_model` is a callable object, it must accept exactly one argument. "
    f"Found {num_args} arguments.",
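To make the distinction behind this error message concrete, here is a hedged sketch of how a log_model-style function might classify its python_model argument now that it can be a file path, a model object, or a plain callable. The helper classify_python_model and the PythonModel stand-in class are hypothetical; only the one-argument rule for callables is taken from the error message quoted above.

```python
import inspect
import os


class PythonModel:
    """Minimal stand-in for a model base class (hypothetical, not mlflow.pyfunc.PythonModel)."""

    def predict(self, context, model_input, params=None):
        raise NotImplementedError


def classify_python_model(python_model):
    """Return "path", "instance", or "callable"; raise ValueError otherwise."""
    if isinstance(python_model, (str, os.PathLike)):
        # A path to a Python file that defines the model (models from code).
        if not os.path.isfile(python_model):
            raise ValueError(f"Model code file not found: {python_model!r}")
        return "path"
    if isinstance(python_model, PythonModel):
        return "instance"
    if callable(python_model):
        num_args = len(inspect.signature(python_model).parameters)
        if num_args != 1:
            raise ValueError(
                "When `python_model` is a callable object, it must accept exactly one argument. "
                f"Found {num_args} arguments."
            )
        return "callable"
    raise ValueError(
        "`python_model` must be a PythonModel instance, a callable, "
        f"or a path to a Python file; got {type(python_model).__name__}."
    )
```

Checking for a path first matters because strings are not callable, so an unrecognized string would otherwise fall through to the generic error and hide the more useful "file not found" message.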

mlflow/langchain/__init__.py

mlflow/pyfunc/__init__.py

bbqiu · 2024-05-09T17:59:12Z

Could you also fill in the testing section with any Databricks notebooks you used to test loading from paths that point to Databricks notebooks?

mlflow/pyfunc/__init__.py

harupy · 2024-05-10T10:29:33Z

mlflow/pyfunc/__init__.py

    if python_model:
+        model_code_path = None
+        if isinstance(python_model, str):
+            model_code_path = _validate_and_get_model_code_path(python_model)


What would happen if the python_model file depends on other custom modules or external resources? Do they need to be logged as artifacts?

Not exactly sure how that would work - can we address this in a follow-up?

What would happen if the python_model file depends on other custom modules or external resources? Do they need to be logged as artifacts?

@harupy I think those custom modules and resources can be passed in as pip_requirements or code_paths, and those should still work here, right? Or am I missing something?

@annzhang-db @sunishsheth2009 can we write a test for that case, if we assume it will work? If it doesn't work, we need to either fix it or update documentation to indicate that it isn't supported

Wrote a test for this - code_paths works with the functionality 😄
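Why code_paths helps here can be sketched without MLflow: a model file that imports a sibling helper module only loads if the helper's directory is importable. The sketch below assumes, for illustration, that code_paths dependencies end up on sys.path when the model is loaded; it does not reproduce how MLflow actually copies and registers those files, and load_module_with_code_paths is a hypothetical name.

```python
import importlib
import importlib.util
import sys


def load_module_with_code_paths(code_path, code_dirs):
    """Execute code_path as a module after making each directory in code_dirs importable."""
    added = []
    for directory in map(str, code_dirs):
        if directory not in sys.path:
            sys.path.insert(0, directory)
            added.append(directory)
    # Drop stale finder caches so newly written files are visible.
    importlib.invalidate_caches()
    try:
        spec = importlib.util.spec_from_file_location("model_code", str(code_path))
        if spec is None or spec.loader is None:
            raise ImportError(f"Cannot load module from {code_path}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module
    finally:
        # Undo the sys.path changes; modules already imported stay cached in sys.modules.
        for directory in added:
            sys.path.remove(directory)
```

Without the helper's directory, executing the model file fails with ModuleNotFoundError; with it, the import resolves. That mirrors the behavior the new test is meant to pin down.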

mlflow/pyfunc/__init__.py

tests/pyfunc/pyfunc_sample_code_with_config.py

mlflow/utils/model_utils.py

mlflow/pyfunc/__init__.py

dbczumar

Left some comments that should simplify and de-risk these changes by removing a lot of if/else logic.


mlflow/pyfunc/model.py

mlflow/pyfunc/__init__.py

mlflow/utils/model_utils.py

dbczumar

Thanks @annzhang-db ! Left some comments - let me know if you have questions!


annzhang-db · 2024-05-16T23:41:56Z

mlflow/langchain/__init__.py

Changes to this file are all for clarity, no functionality changes.

annzhang-db · 2024-05-16T23:42:38Z

mlflow/langchain/__init__.py

+        lc_model = _load_model_code_path(model_code_path, model_config)
+        _validate_and_copy_file_to_directory(model_code_path, path, "code")
+    else:
+        lc_model = lc_model_or_path


After this point, lc_model is always a LangChain model instance; it can no longer be a path.

annzhang-db · 2024-05-16T23:43:00Z

mlflow/langchain/__init__.py

@@ -252,49 +252,32 @@ def load_retriever(persist_directory):
    import langchain
    from langchain.schema import BaseRetriever

-    lc_model = _validate_and_wrap_lc_model(lc_model, loader_fn)
+    lc_model_or_path = _validate_and_prepare_lc_model_or_path(lc_model, loader_fn)


Use the variable name lc_model_or_path while the value can be either a LangChain model or a file path.
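The naming convention follows a common narrowing pattern: one variable holds either a path or a model until a single branch resolves it, and only the resolved name is used afterwards. A minimal, hypothetical sketch (resolve_model and the injected load_from_code callable are illustrative names, not MLflow functions):

```python
import os
from typing import Any, Callable


def resolve_model(model_or_path: Any, load_from_code: Callable[[str], Any]) -> Any:
    """Return a model object, loading it from code only when given a path.

    model_or_path is either a str/os.PathLike pointing at a model code file,
    or an already-constructed model object.
    """
    if isinstance(model_or_path, (str, os.PathLike)):
        # Path branch: the file is executed to obtain the model object.
        return load_from_code(os.fspath(model_or_path))
    # Object branch: the value already is the model.
    return model_or_path
```

After the call, the result is never a path, which is the same guarantee described for lc_model in the earlier comment.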

annzhang-db · 2024-05-16T23:44:23Z

mlflow/langchain/utils.py

Refactor - move shared functions from langchain/utils.py to models/utils.py

annzhang-db · 2024-05-16T23:46:52Z

mlflow/models/utils.py

Refactor - moved functions verbatim from langchain/utils.py here to be used for pyfunc as well. No code changes.


annzhang-db · 2024-05-16T23:50:29Z

mlflow/langchain/__init__.py

@@ -900,49 +882,6 @@ def load_model(model_uri, dst_path=None):
    return _load_model_from_local_fs(local_model_path)


-@contextmanager


Moved code to models/utils.py


annzhang-db · 2024-05-17T00:03:30Z

mlflow/langchain/utils.py

Moved functions to models/utils.py.

annzhang-db · 2024-05-17T00:07:19Z

mlflow/pyfunc/__init__.py

@@ -2328,8 +2363,6 @@ def predict(model_input: List[str]) -> List[str]:
        )
        raise MlflowException(message=msg, error_code=INVALID_PARAMETER_VALUE)

-    _validate_and_prepare_target_save_path(path)


Moved up so we can copy the model code file to the save path.
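Ordering matters here because the target directory has to be validated and created before the model code file can be copied into it. The sketch below shows that ordering with hypothetical helpers prepare_save_path and copy_code_file; the "code" subdirectory mirrors the _validate_and_copy_file_to_directory(model_code_path, path, "code") call in the LangChain diff in this thread, but the function bodies are assumptions, not MLflow's code.

```python
import pathlib
import shutil


def prepare_save_path(path):
    """Create the save directory, refusing to reuse a non-empty one."""
    path = pathlib.Path(path)
    if path.exists() and any(path.iterdir()):
        raise FileExistsError(f"Path '{path}' already exists and is not empty")
    path.mkdir(parents=True, exist_ok=True)
    return path


def copy_code_file(code_path, save_path, subdir="code"):
    """Copy a model code file into save_path/subdir and return the new location."""
    code_path = pathlib.Path(code_path)
    if not code_path.is_file():
        raise FileNotFoundError(f"Model code file not found: {code_path}")
    target_dir = pathlib.Path(save_path, subdir)
    target_dir.mkdir(parents=True, exist_ok=True)
    return pathlib.Path(shutil.copy2(code_path, target_dir))
```

Calling prepare_save_path first and copy_code_file second reproduces the order described in the comment; swapping them would make the emptiness check fail, because the copy would already have populated the directory.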


mlflow/pyfunc/model.py


dbczumar · 2024-05-17T05:43:58Z

mlflow/models/model.py

+            from mlflow.langchain import _validate_and_prepare_lc_model_or_path

-            # If its not a PyFuncModel, then it should be a Langchain model
-            _validate_and_wrap_lc_model(model, None)
+            # If its not a PyFuncModel, then it should be a Langchain model (not a path)
+            # Check this since the validation function does not
+            if isinstance(model, str):
+                raise mlflow.MlflowException(
+                    "Model should either be an instance of PyFuncModel or Langchain type."
+                )
+            model = _validate_and_prepare_lc_model_or_path(model, None)


@annzhang-db @sunishsheth2009 I recall that we talked about making set_model support a Python function too. This is also supported by mlflow.pyfunc.log_model(). Can we support this if it's easy or file a follow-up internal ticket?

Yes, we have a ticket for that option as well. We will have to do it as a follow-up. :)

dbczumar · 2024-05-17T05:56:02Z

mlflow/langchain/__init__.py

@@ -839,7 +820,8 @@ def predict(
        return result


-def _load_pyfunc(path):
+# TODO: Support loading langchain with model_config. For now, this is a no-op.


@annzhang-db Do we have an internal ticket to track this? It would be very useful.

I'll make a ticket!

dbczumar

@annzhang-db LGTM once remaining comments are addressed. Tried this out manually with a few variations on the following scripts; it works quite well:

script.py

import mlflow
from mlflow.models import ModelConfig, set_model

# Reads the model_config supplied when the model is logged or loaded
config = ModelConfig()


class Model(mlflow.pyfunc.PythonModel):
    def predict(self, context, model_input, params=None):
        print("CONFIG", config.config)
        print("CONFIG", config.get("A"))
        print("PARAMS", params)
        # Add 1 to every column of the input DataFrame
        return model_input.apply(lambda column: column + 1)


# Register the model instance that MLflow loads from this file
set_model(Model())

logger.py

import mlflow
import pandas
from mlflow.models import infer_signature

df = pandas.DataFrame(data=[[1, 2, 3]], columns=["a", "b", "c"])

# Infer a signature that also declares the "BAZ" inference parameter
sig = infer_signature(df, params={"BAZ": "BAR"})

# Log the model from its source file (script.py) instead of a Python object
model_info = mlflow.pyfunc.log_model(
    python_model="script.py",
    artifact_path="baz",
    signature=sig,
    model_config={"A": "B"},
)

print(model_info.model_uri)

# model_config can also be supplied at load time
mod = mlflow.pyfunc.load_model(model_info.model_uri, model_config={"A": "B"})
print(mod)
print(mod.predict(df, params={"BAZ": "BAR"}))
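Because model_config may be either a dictionary (as in logger.py) or, with this PR, a path to a YAML file, a config accessor has to normalize both forms. The following is a minimal sketch of that idea under stated assumptions: SimpleModelConfig is a hypothetical class, not mlflow.models.ModelConfig, and its YAML branch assumes PyYAML is installed.

```python
import os
from typing import Any, Dict, Union


class SimpleModelConfig:
    """Hypothetical config wrapper accepting a dict or a YAML file path."""

    def __init__(self, config: Union[Dict[str, Any], str, os.PathLike]):
        if isinstance(config, dict):
            self._config = dict(config)
        elif isinstance(config, (str, os.PathLike)):
            import yaml  # PyYAML, imported lazily so dict configs need no extra dependency

            with open(config) as f:
                loaded = yaml.safe_load(f) or {}
            if not isinstance(loaded, dict):
                raise ValueError(f"{config} must contain a YAML mapping")
            self._config = loaded
        else:
            raise TypeError("config must be a dict or a path to a YAML file")

    def to_dict(self) -> Dict[str, Any]:
        """Return a shallow copy of the full configuration."""
        return dict(self._config)

    def get(self, key: str) -> Any:
        """Return the value for key, failing loudly if it is missing."""
        if key not in self._config:
            raise KeyError(f"Key {key!r} not found in model config")
        return self._config[key]
```

Raising on a missing key, rather than returning None, surfaces typos in config names at prediction time instead of silently propagating a default.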

support files in pyfunc

96c0cf3

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

annzhang-db requested review from sunishsheth2009 and bbqiu May 9, 2024 00:06

github-actions bot added the rn/none List under Small Changes in Changelogs. label May 9, 2024

annzhang-db added 2 commits May 8, 2024 17:53

pyfunc tests

d994610

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

load langchain fix

670db08

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

bbqiu mentioned this pull request May 9, 2024

add pyfunc code log model support #11901

Closed

39 tasks

sunishsheth2009 reviewed May 9, 2024

mlflow/pyfunc/__init__.py (outdated, resolved)

Merge remote-tracking branch 'upstream/master' into clean-pyfunc

1a665ca

bbqiu reviewed May 9, 2024

mlflow/langchain/__init__.py (outdated, resolved)

mlflow/pyfunc/__init__.py (outdated, resolved)

mlflow/pyfunc/__init__.py (outdated, resolved)

annzhang-db requested a review from harupy May 10, 2024 08:38

harupy reviewed May 10, 2024

mlflow/pyfunc/__init__.py (outdated, resolved)

harupy reviewed May 10, 2024

mlflow/pyfunc/__init__.py (outdated, resolved)

harupy reviewed May 10, 2024

mlflow/pyfunc/__init__.py (resolved)

harupy reviewed May 10, 2024

tests/pyfunc/pyfunc_sample_code_with_config.py (outdated, resolved)

harupy reviewed May 10, 2024

mlflow/utils/model_utils.py (outdated, resolved)

harupy reviewed May 10, 2024

mlflow/utils/model_utils.py (outdated, resolved)

dbczumar reviewed May 14, 2024

mlflow/pyfunc/__init__.py (outdated, resolved)

dbczumar reviewed May 14, 2024

mlflow/pyfunc/__init__.py (outdated, resolved)

dbczumar requested changes May 14, 2024

annzhang-db added 7 commits May 14, 2024 17:40

update

7509c0b

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

update

cfa8cd4

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

update

f766f09

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

update

debbc3d

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

Merge remote-tracking branch 'upstream/master' into clean-pyfunc

7bada7e

fix

4576ec4

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

fix

6b7ddaf

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

dbczumar reviewed May 16, 2024

mlflow/pyfunc/model.py (outdated, resolved)

dbczumar reviewed May 16, 2024

mlflow/pyfunc/__init__.py (outdated, resolved)

dbczumar reviewed May 16, 2024

mlflow/utils/model_utils.py (outdated, resolved)

dbczumar reviewed May 16, 2024

mlflow/utils/model_utils.py (outdated, resolved)

dbczumar reviewed May 16, 2024

mlflow/utils/model_utils.py (resolved)

dbczumar reviewed May 16, 2024

mlflow/utils/model_utils.py (outdated, resolved)

dbczumar requested changes May 16, 2024

update

550fe56

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>


refactor

afe4b10

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>


annzhang-db added 2 commits May 16, 2024 17:00

clean

d97ac3b

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

rename var

5c2129c

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>


annzhang-db added 2 commits May 16, 2024 17:22

code paths testing

816fa3f

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

check for chatmodel type

1368889

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

dbczumar requested a review from ian-ack-db May 17, 2024 01:33

dbczumar reviewed May 17, 2024

mlflow/pyfunc/model.py (resolved)

annzhang-db requested a review from dbczumar May 17, 2024 04:40

annzhang-db added 2 commits May 16, 2024 21:49

add streamable model as code test

c21f4eb

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

formatting

afb8b84

Signed-off-by: Ann Zhang <ann.zhang@databricks.com>

dbczumar reviewed May 17, 2024

dbczumar approved these changes May 17, 2024

annzhang-db merged commit 9eeaef9 into mlflow:master May 17, 2024
44 checks passed
