Failed to initialize Python stub + ModuleNotFoundError: No module named 'nvtabular', 'merlin' #7158

Open
zwei2016 opened this issue Apr 25, 2024 · 2 comments

Comments

zwei2016 commented Apr 25, 2024

Description

I am following the tutorial online: https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/stable/examples/getting-started-session-based/03-serving-session-based-model-torch-backend.ipynb

After creating the "executor_model" model, I tried to run the Triton Inference Server with:

docker run --gpus=1 --rm --net=host -v /home/***/workspace/data/models:/models nvcr.io/nvidia/tritonserver:24.03-py3 tritonserver --model-repository=/models
=============================
== Triton Inference Server ==
=============================

NVIDIA Release 24.03 (build 86102629)
Triton Server Version 2.44.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

I0425 00:38:39.764967 1 pinned_memory_manager.cc:275] Pinned memory pool is created at '0x205000000' with size 268435456
I0425 00:38:39.765025 1 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 67108864
I0425 00:38:39.770824 1 model_lifecycle.cc:469] loading: 1_predictpytorchtriton:1
I0425 00:38:39.770870 1 model_lifecycle.cc:469] loading: executor_model:1
I0425 00:38:39.770891 1 model_lifecycle.cc:469] loading: 0_transformworkflowtriton:1
I0425 00:38:40.340567 1 libtorch.cc:2467] TRITONBACKEND_Initialize: pytorch
I0425 00:38:40.340593 1 libtorch.cc:2477] Triton TRITONBACKEND API version: 1.19
I0425 00:38:40.340603 1 libtorch.cc:2483] 'pytorch' TRITONBACKEND API version: 1.19
I0425 00:38:40.340635 1 libtorch.cc:2516] TRITONBACKEND_ModelInitialize: 1_predictpytorchtriton (version 1)
W0425 00:38:40.342571 1 libtorch.cc:318] skipping model configuration auto-complete for '1_predictpytorchtriton': not supported for pytorch backend
I0425 00:38:40.343279 1 libtorch.cc:347] Optimized execution is enabled for model instance '1_predictpytorchtriton'
I0425 00:38:40.343301 1 libtorch.cc:366] Cache Cleaning is disabled for model instance '1_predictpytorchtriton'
I0425 00:38:40.343304 1 libtorch.cc:383] Inference Mode is enabled for model instance '1_predictpytorchtriton'
I0425 00:38:40.343350 1 libtorch.cc:2560] TRITONBACKEND_ModelInstanceInitialize: 1_predictpytorchtriton_0_0 (GPU device 0)
I0425 00:38:40.431502 1 model_lifecycle.cc:835] successfully loaded '1_predictpytorchtriton'
I0425 00:38:40.653469 157 pb_stub.cc:290] I0425 00:38:40.653469 156 pb_stub.cc:290]  Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'nvtabular'

At:
  /models/0_transformworkflowtriton/1/model.py(32): <module>
  <frozen importlib._bootstrap>(241): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(883): exec_module
  <frozen importlib._bootstrap>(703): _load_unlocked
  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1027): _find_and_load
 Failed to initialize Python stub for auto-complete: ModuleNotFoundError: No module named 'merlin'

At:
  /models/executor_model/1/model.py(31): <module>
  <frozen importlib._bootstrap>(241): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(883): exec_module
  <frozen importlib._bootstrap>(703): _load_unlocked
  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1027): _find_and_load


E0425 00:38:40.655660 1 model_lifecycle.cc:638] failed to load 'executor_model' version 1: Internal: ModuleNotFoundError: No module named 'merlin'

At:
  /models/executor_model/1/model.py(31): <module>
  <frozen importlib._bootstrap>(241): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(883): exec_module
  <frozen importlib._bootstrap>(703): _load_unlocked
  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1027): _find_and_load

E0425 00:38:40.655687 1 model_lifecycle.cc:638] failed to load '0_transformworkflowtriton' version 1: Internal: ModuleNotFoundError: No module named 'nvtabular'

At:
  /models/0_transformworkflowtriton/1/model.py(32): <module>
  <frozen importlib._bootstrap>(241): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(883): exec_module
  <frozen importlib._bootstrap>(703): _load_unlocked
  <frozen importlib._bootstrap>(1006): _find_and_load_unlocked
  <frozen importlib._bootstrap>(1027): _find_and_load

I0425 00:38:40.655692 1 model_lifecycle.cc:773] failed to load 'executor_model'
I0425 00:38:40.655723 1 model_lifecycle.cc:773] failed to load '0_transformworkflowtriton'
I0425 00:38:40.655779 1 server.cc:607]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0425 00:38:40.655820 1 server.cc:634]
+---------+---------------------------------------------------------+------------------------------------------------------------------------------+
| Backend | Path                                                    | Config                                                                       |
+---------+---------------------------------------------------------+------------------------------------------------------------------------------+
| pytorch | /opt/tritonserver/backends/pytorch/libtriton_pytorch.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| python  | /opt/tritonserver/backends/python/libtriton_python.so   | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+---------+---------------------------------------------------------+------------------------------------------------------------------------------+

I0425 00:38:40.655868 1 server.cc:677]
+---------------------------+---------+-------------------------------------------------------------------------+
| Model                     | Version | Status                                                                  |
+---------------------------+---------+-------------------------------------------------------------------------+
| 0_transformworkflowtriton | 1       | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'nvtabular' |
|                           |         |                                                                         |
|                           |         | At:                                                                     |
|                           |         |   /models/0_transformworkflowtriton/1/model.py(32): <module>            |
|                           |         |   <frozen importlib._bootstrap>(241): _call_with_frames_removed         |
|                           |         |   <frozen importlib._bootstrap_external>(883): exec_module              |
|                           |         |   <frozen importlib._bootstrap>(703): _load_unlocked                    |
|                           |         |   <frozen importlib._bootstrap>(1006): _find_and_load_unlocked          |
|                           |         |   <frozen importlib._bootstrap>(1027): _find_and_load                   |
| 1_predictpytorchtriton    | 1       | READY                                                                   |
| executor_model            | 1       | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'merlin'    |
|                           |         |                                                                         |
|                           |         | At:                                                                     |
|                           |         |   /models/executor_model/1/model.py(31): <module>                       |
|                           |         |   <frozen importlib._bootstrap>(241): _call_with_frames_removed         |
|                           |         |   <frozen importlib._bootstrap_external>(883): exec_module              |
|                           |         |   <frozen importlib._bootstrap>(703): _load_unlocked                    |
|                           |         |   <frozen importlib._bootstrap>(1006): _find_and_load_unlocked          |
|                           |         |   <frozen importlib._bootstrap>(1027): _find_and_load                   |
+---------------------------+---------+-------------------------------------------------------------------------+

I0425 00:38:40.679675 1 metrics.cc:877] Collecting metrics for GPU 0: NVIDIA GeForce RTX 3070 Ti Laptop GPU
I0425 00:38:40.689920 1 metrics.cc:770] Collecting CPU metrics
I0425 00:38:40.690196 1 tritonserver.cc:2538]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                          |
| server_version                   | 2.44.0                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /models                                                                                                         |
| model_control_mode               | MODE_NONE                                                                                                       |
| strict_model_config              | 0                                                                                                               |
| rate_limit                       | OFF                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                       |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                        |
| min_supported_compute_capability | 6.0                                                                                                             |
| strict_readiness                 | 1                                                                                                               |
| exit_timeout                     | 30                                                                                                              |
| cache_enabled                    | 0                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------+

I0425 00:38:40.690214 1 server.cc:307] Waiting for in-flight requests to complete.
I0425 00:38:40.690219 1 server.cc:323] Timeout 30: Found 0 model versions that have in-flight inferences
I0425 00:38:40.690350 1 server.cc:338] All models are stopped, unloading models
I0425 00:38:40.690365 1 server.cc:347] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0425 00:38:40.690430 1 libtorch.cc:2594] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0425 00:38:40.691390 1 libtorch.cc:2539] TRITONBACKEND_ModelFinalize: delete model state
I0425 00:38:40.691732 1 model_lifecycle.cc:620] successfully unloaded '1_predictpytorchtriton' version 1
I0425 00:38:41.690694 1 server.cc:347] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
W0425 00:38:41.693252 1 metrics.cc:631] Unable to get power limit for GPU 0. Status:Success, value:0.000000
error: creating server: Internal - failed to load all models
W0425 00:38:42.701892 1 metrics.cc:631] Unable to get power limit for GPU 0. Status:Success, value:0.000000

Triton Information
What version of Triton are you using?
tritonserver:24.03-py3

Are you using the Triton container or did you build it yourself?
docker nvcr.io/nvidia/tritonserver:24.03-py3

To Reproduce

I followed this online tutorial: https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/stable/examples/getting-started-session-based/03-serving-session-based-model-torch-backend.ipynb

Expected behavior
The server should reply to the client with the following message:
<HTTPSocketPoolResponse status=200 headers={'content-type': 'application/json', 'content-length': '188'}>
bytearray(b'[{"name":"0_transformworkflowtriton","version":"1","state":"READY"},{"name":"1_predictpytorchtriton","version":"1","state":"READY"},{"name":"executor_model","version":"1","state":"READY"}]')

rmccorm4 (Collaborator) commented
Hi @zwei2016,

You'll need to install any Python dependencies required by your Python model inside the container before starting the server, e.g. via pip install ....
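As a quick sanity check before launching tritonserver, you can confirm inside the container that the modules the Python-backend stubs will import actually resolve. This is a minimal sketch, not part of Triton itself:

```python
import importlib.util

def missing_modules(mods):
    """Return the module names that cannot be found on sys.path."""
    return [m for m in mods if importlib.util.find_spec(m) is None]

# The two modules the stubs failed to import in the log above:
print(missing_modules(["nvtabular", "merlin"]))
```

If this prints a non-empty list, the server will hit the same ModuleNotFoundError when it spawns the stub processes.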

You can also prepare a custom Docker image so you can reuse it across runs:

FROM nvcr.io/nvidia/tritonserver:24.03-py3
RUN pip install ...

You can also look into packaging the dependencies along with your Python model through custom execution environments: https://github.com/triton-inference-server/python_backend?tab=readme-ov-file#creating-custom-execution-environments
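For the custom-execution-environment route, the linked README describes packing a conda environment with conda-pack and pointing the model's config.pbtxt at the resulting tarball via the EXECUTION_ENV_PATH parameter. A sketch, where the environment name and tarball path are illustrative, not from this thread:

```
# Pack a conda env that has the model's dependencies installed, e.g.:
#   conda pack -n merlin-env -o merlin-env.tar.gz
# Then, in the model's config.pbtxt, add:
parameters: {
  key: "EXECUTION_ENV_PATH",
  value: {string_value: "$$TRITON_MODEL_DIRECTORY/merlin-env.tar.gz"}
}
```

With this in place, the Python backend unpacks and activates the environment for that model instead of using the container's system Python.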

zwei2016 commented May 9, 2024

Thanks Ryan @rmccorm4
I customized the Docker image nvcr.io/nvidia/tritonserver:24.03-py3 by installing the necessary libraries and committed it as a new image. It works. Thank you.

By the way, when I try to use the server as described in the tutorial:

from merlin.systems.triton.utils import send_triton_request
response = send_triton_request(workflow.input_schema, df, output_schema.column_names, endpoint="localhost:8001")

I got another error: Failed to open the cudaIpcHandle.
After searching around, I found the cause might be that CUDA shared memory is not supported on Windows. Since I deployed the server in WSL2 on Windows 11, will this error always occur? Is there any solution now?

Best
Wei
