[BUG]: Unable to deploy a ML model locally to MLFlow #2235

Open
1 task done
PriyanshBhardwaj opened this issue Jan 7, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@PriyanshBhardwaj

PriyanshBhardwaj commented Jan 7, 2024

System Information

python = 3.9
zenml version = 0.53.1
os = macos
integration = mlflow (installed separately with pip install mlflow)

What happened?

I am unable to deploy an ML model locally with MLflow. I believe the problem lies in the MLFlowDeploymentService class in mlflow_deployment.py.

Please check the reproduction steps to understand the issue clearly.

Reproduction steps

I followed all the steps: I registered the experiment tracker and the model deployer, then created the stack:

zenml experiment-tracker register mlflow_tracker --flavor=mlflow

zenml model-deployer register mlflow --flavor=mlflow

zenml stack register mlflow_stack -a default -o default -d mlflow -e mlflow_tracker --set

I created a pipeline that ingests the data, trains the model, evaluates its performance, and then deploys the model if it passes the deployment trigger:

df = ingest_data(data_path = data_path)
x_train, x_test, y_train, y_test = clean_data(df)
model = train_model(x_train, x_test, y_train, y_test)
r2, mse, rmse = evaluate_model(model, x_test, y_test)
deployment_decision = deployment_trigger(accuracy=mse)  # deploy based on the MSE score
mlflow_model_deployer_step(
    model=model,
    deploy_decision=deployment_decision,
    workers=workers,
    timeout=timeout,
)
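For context, a deployment trigger of this kind can be sketched in plain Python as follows. This is only an illustration under assumptions, not the actual step from the pipeline: the max_mse parameter and its default value are hypothetical, and in a real ZenML pipeline the function would also be wrapped as a step.

```python
def deployment_trigger(accuracy: float, max_mse: float = 0.5) -> bool:
    """Decide whether the model should be deployed.

    The pipeline passes the MSE through the `accuracy` argument, so a
    lower value is better. `max_mse` is a hypothetical threshold chosen
    purely for illustration.
    """
    return accuracy <= max_mse


if __name__ == "__main__":
    # A model with a low error passes the trigger, a poor one does not.
    print(deployment_trigger(accuracy=0.3))
    print(deployment_trigger(accuracy=0.9))
```

Because the value passed in is an error metric, the comparison is "less than or equal", which is the opposite of what the argument name accuracy would suggest.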

I debugged it: everything else works and the model passes the deployment trigger, but the model deployer does not work properly. The problem is with this step:

mlflow_model_deployer_step(
    model=model,
    deploy_decision=deployment_decision,
    workers=workers,
    timeout=timeout,
)

When the pipeline calls it, the following is logged:

Updating an existing MLflow deployment service: MLFlowDeploymentService[2ade1153-7fd3-45d1-8ecd-412f86b264b5] (type: model-serving, flavor: mlflow)

This is logged by deploy_model() in /zenml/integrations/mlflow/model_deployers/mlflow_model_deployer.py. At line 210, the same function calls service.start(), which is defined in /zenml/services/local/local_service.py. At line 387, start() logs Starting service 'MLFlowDeploymentService[2ade1153-7fd3-45d1-8ecd-412f86b264b5] (type: model-serving, flavor: mlflow)'. Then, at line 391, the check if not self.poll_service_status(timeout): fails and this error is logged:

Timed out waiting for service MLFlowDeploymentService[2ade1153-7fd3-45d1-8ecd-412f86b264b5] (type: model-serving, flavor: mlflow) to become active:
  Administrative state: active
  Operational state: inactive
  Last status message: 'service daemon is not running'
For more information on the service status, please see the following log file: 

When I opened the log file, it contained:

TypeError: Cannot load service with unregistered service type:
type='model-serving' flavor='mlflow' name='mlflow-deployment'
description='MLflow prediction service'

The error is raised at /zenml/services/service_registry.py:193, in load_service_from_dict.

However, at line 187 of mlflow_model_deployer.py, service is assigned via service = cast(MLFlowDeploymentService, existing_service), which gives MLFlowDeploymentService[2ade1153-7fd3-45d1-8ecd-412f86b264b5] (type: model-serving, flavor: mlflow).

And in the MLFlowDeploymentService class in mlflow_deployment.py, at line 128, the type is already defined as "model-serving", which I don't think can be changed:

SERVICE_TYPE = ServiceType(
    name="mlflow-deployment",
    type="model-serving",
    flavor="mlflow",
    description="MLflow prediction service",
)

The timeout happens because MLFlowDeploymentService[2ade1153-7fd3-45d1-8ecd-412f86b264b5] (type: model-serving, flavor: mlflow) never starts, so the step will always time out regardless of the timeout value.

So why does the log file show this error?
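The reasoning about the timeout can be sketched with a generic polling loop. This is a simplified stand-in written for illustration, not the code of poll_service_status: the function poll_until_active and its parameters are hypothetical.

```python
import time
from typing import Callable


def poll_until_active(
    get_state: Callable[[], str], timeout: float, interval: float = 0.01
) -> bool:
    """Poll `get_state` until it returns "active" or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_state() == "active":
            return True
        time.sleep(interval)
    return False


if __name__ == "__main__":
    # A daemon that crashed at startup reports "inactive" forever, so a
    # longer timeout only delays the same failure.
    def crashed_daemon() -> str:
        return "inactive"

    for timeout in (0.05, 0.1):
        print(timeout, poll_until_active(crashed_daemon, timeout))
```

If the daemon process has already exited with an error, no timeout value can make the poll succeed, which matches what I saw with both 60 and 120.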

TypeError: Cannot load service with unregistered service type:

I tried everything I could think of to solve it:
Created a new stack from scratch
Created a new pipeline in different stacks
Re-initialized ZenML by deleting the .zen folder and running zenml init again in the same directory

I also tried passing --type=mlflow when registering the model deployer, as mentioned somewhere in your old docs:

zenml model-deployer register mlflow --type=mlflow --flavor=mlflow

It obviously didn't work.

Nothing worked, and your docs don't offer a solution for this kind of problem.

In short, I have tried everything I can to debug this, but I cannot deploy my model locally to MLflow because I can't change the type in your internal class, which I believe is the root cause. Please resolve the issue, and please make the logs clearer for users.

P.S. The model is only 1000 bytes, and timeouts of 60 and 120 both failed.
I'm using a separate environment for this.
I'm on the latest versions of zenml and mlflow.

Relevant log output

TypeError: Cannot load service with unregistered service type:
type='model-serving' flavor='mlflow' name='mlflow-deployment'
description='MLflow prediction service'
Cleanup: terminating children processes...

Code of Conduct

  • I agree to follow this project's Code of Conduct
@Vishal-Padia
Contributor

@PriyanshBhardwaj

The inconsistent type naming between the model deployer registration and the MLFlowDeploymentService definition could also be contributing to this issue.
One thing you can try is to register the MLflow model deployer as:

zenml model-deployer register mlflow --type=model-serving --flavor=mlflow

since the type is defined in ServiceType as:

SERVICE_TYPE = ServiceType(
    name="mlflow-deployment",
    type="model-serving",
    flavor="mlflow",
    description="MLflow prediction service",
)

Alternatively, we could change the type in src\zenml\integrations\mlflow\services\mlflow_deployment.py on line 128 from model-serving to just mlflow, so that the service type name matches.
The mismatch may be contributing to the issue you are facing.

@PriyanshBhardwaj
Author

PriyanshBhardwaj commented Jan 8, 2024

@Vishal-Padia thanks for your response.

zenml model-deployer register mlflow --type=model-serving --flavor=mlflow
This will not work, because --type is not an accepted field in the model deployer registration, as the error below shows. I only tried it because I saw it in some old docs.

ValidationError: 1 validation error for MLFlowModelDeployerConfig
type
  extra fields not permitted (type=value_error.extra)

As for your second suggestion, I tried it, but it didn't work either.

@Vishal-Padia
Contributor

@PriyanshBhardwaj
Okay, I got it.
I guess Alex will be able to zero in on the issue!

@strickvl strickvl changed the title [BUG]: I'm unable to deploy a ML model locally to MLFlow using zenml [BUG]: Unable to deploy a ML model locally to MLFlow using zenml Feb 5, 2024
@strickvl strickvl changed the title [BUG]: Unable to deploy a ML model locally to MLFlow using zenml [BUG]: Unable to deploy a ML model locally to MLFlow Feb 5, 2024
@avishniakov
Contributor

Hello @PriyanshBhardwaj, sorry about the delay!

From the info provided, I see that you installed mlflow directly with pip, which might not work well with zenml because of a version mismatch. Moreover, the mlflow integration also pulls in components that are important for model deployment. The full list (for the 0.53.1 version you used) is: 'mlflow>=2.1.1,<=2.9.2', 'mlserver>=1.3.3', 'mlserver-mlflow>=1.3.3'.

Can you do the following and retest?

pip3 uninstall mlflow
zenml integration install mlflow -y
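If it helps, a quick check like the one below could confirm that the three packages are present in the environment after reinstalling. It is only a sketch: the helper installed_versions is hypothetical, it reports versions without validating them against the ranges above, and it relies solely on the standard importlib.metadata module.

```python
from importlib import metadata
from typing import Callable, Dict, Iterable, Optional

# Package names taken from the integration requirements listed above.
REQUIRED = ("mlflow", "mlserver", "mlserver-mlflow")


def installed_versions(
    packages: Iterable[str] = REQUIRED,
    lookup: Callable[[str], str] = metadata.version,
) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if missing."""
    result: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            result[name] = lookup(name)
        except metadata.PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it with the same interpreter that runs the pipeline, since a different virtual environment would report different packages.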

Moreover, forking on macOS does not always work smoothly unless OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES is set, so please also make sure this environment variable is properly set before rerunning.
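A small guard along these lines could be placed at the top of the run script to flag a missing variable early. This is a sketch under assumptions: the function fork_safety_warning is hypothetical, and it only checks the variable rather than setting it, on the assumption that the variable is normally exported in the shell before Python starts.

```python
import os
import platform
from typing import Mapping, Optional

ENV_NAME = "OBJC_DISABLE_INITIALIZE_FORK_SAFETY"


def fork_safety_warning(
    environ: Optional[Mapping[str, str]] = None,
    system: Optional[str] = None,
) -> Optional[str]:
    """Return a warning if running on macOS without the variable set to YES."""
    environ = os.environ if environ is None else environ
    system = platform.system() if system is None else system
    if system != "Darwin":  # the variable only matters on macOS
        return None
    if environ.get(ENV_NAME) == "YES":
        return None
    return f"{ENV_NAME}=YES is not set; export it before running the pipeline."


if __name__ == "__main__":
    message = fork_safety_warning()
    if message:
        print(message)
```

The environment and platform are passed in as optional arguments only so the check can be exercised without changing the real environment.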
