Description
I am following this online tutorial: https://github.com/NVIDIA-Merlin/Transformers4Rec/blob/stable/examples/getting-started-session-based/03-serving-session-based-model-torch-backend.ipynb
After creating the model "executor_model", I tried to run the Triton Inference Server with it.

Triton Information
What version of Triton are you using?
tritonserver:24.03-py3
Are you using the Triton container or did you build it yourself?
I am using the Triton container nvcr.io/nvidia/tritonserver:24.03-py3.

To Reproduce
Steps to reproduce the behavior: follow the online tutorial linked above and start the server, as sketched below.
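The exact launch command isn't preserved in the report; as an assumption, from inside the 24.03 container it would typically look like this (the repository path /workspace/ensemble is illustrative, use wherever the notebook exported the models):

# Serve the exported models; HTTP, gRPC, and metrics ports default
# to 8000, 8001, and 8002 respectively.
tritonserver --model-repository=/workspace/ensemble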
Expected behavior
The server should reply to the client with the following message:
<HTTPSocketPoolResponse status=200 headers={'content-type': 'application/json', 'content-length': '188'}>
bytearray(b'[{"name":"0_transformworkflowtriton","version":"1","state":"READY"},{"name":"1_predictpytorchtriton","version":"1","state":"READY"},{"name":"executor_model","version":"1","state":"READY"}]')
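For reference, a response in this shape is what tritonclient's model-repository index call returns; a minimal sketch of that readiness check, assuming the server's HTTP endpoint is on the default port 8000:

import tritonclient.http as httpclient

# Connect to the locally running Triton server over HTTP.
client = httpclient.InferenceServerClient(url="localhost:8000")

# Liveness check, then the per-model state listing shown above.
print(client.is_server_live())
print(client.get_model_repository_index())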
You'll need to install any Python dependencies required by your Python model inside the container before starting the server, for example via pip install ....
You can also prepare a custom Dockerfile so you can reuse the image across runs:
# Bake the model's Python dependencies into a reusable Triton image.
FROM nvcr.io/nvidia/tritonserver:24.03-py3
RUN pip install ...
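For example, assuming that Dockerfile sits in the current directory, you could build and launch it like this (the image tag and mount paths are illustrative):

# Build the customized image once, then reuse it across runs.
docker build -t tritonserver-custom:24.03 .

# Run it with GPU access and the exported model repository mounted in.
docker run --rm --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/ensemble:/models \
  tritonserver-custom:24.03 tritonserver --model-repository=/models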
Thanks Ryan @rmccorm4!
I customized the Docker image nvcr.io/nvidia/tritonserver:24.03-py3 by installing the necessary libraries and committed it as a new image. It works, thank you.
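For anyone else taking the same route, a rough sketch of that commit-based flow (the container and image names here are made up):

# Start a container from the stock image and install the libraries interactively.
docker run -it --name triton-custom nvcr.io/nvidia/tritonserver:24.03-py3 bash
# ... inside the container: pip install the required libraries, then exit ...

# Snapshot the modified container as a reusable image.
docker commit triton-custom tritonserver-custom:24.03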
By the way, when I try to use the server as described in the tutorial:
from merlin.systems.triton.utils import send_triton_request
response = send_triton_request(workflow.input_schema, df, output_schema.column_names, endpoint="localhost:8001")
I get another error: Failed to open the cudaIpcHandle
After searching around, I found that the cause might be that CUDA shared memory is not supported on Windows. Since I deployed the server in WSL2 on Windows 11, will I always hit this error? Is there any solution at the moment?
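If CUDA shared memory really is the blocker under WSL2, one possible workaround (a sketch under that assumption, not a confirmed fix) is to bypass send_triton_request and call the server over plain HTTP with tritonclient, which keeps tensors off CUDA IPC entirely. The per-column reshaping below is illustrative and depends on the workflow's input schema; df, workflow, and output_schema are the objects from the notebook:

import tritonclient.http as httpclient
from tritonclient.utils import np_to_triton_dtype

# Plain HTTP inference: tensors travel over the network instead of
# through CUDA shared memory, so no cudaIpcHandle is involved.
client = httpclient.InferenceServerClient(url="localhost:8000")

inputs = []
for col in workflow.input_schema.column_names:
    arr = df[col].to_numpy().reshape(-1, 1)  # assumes scalar columns
    inp = httpclient.InferInput(col, arr.shape, np_to_triton_dtype(arr.dtype))
    inp.set_data_from_numpy(arr)
    inputs.append(inp)

outputs = [httpclient.InferRequestedOutput(name) for name in output_schema.column_names]
response = client.infer("executor_model", inputs, outputs=outputs)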