CUDA Graph not working #7150
Comments
@tanmayv25 @Tabrizian @fpetrini15
@SunnyGhj Can you share complete verbose Triton server logs? (--log-verbose=1)
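(For reference, the flag goes on the server command line, e.g. tritonserver --model-repository=/models --log-verbose=1; the model repository path here is just a placeholder.)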
Yes, of course. The log is below:
https://github.com/triton-inference-server/tensorrt_backend/blob/5c881ce8f74988deedc473bb78a9417ffc650757/src/instance_state.cc#L3817
Description
CUDA Graph does not work in the TensorRT backend. The model config is as below:
I have manually padded the batch size of every request to 25 in the client, so our expectation is that all inference should be performed with CUDA Graph. However, according to the nsys analysis results, none of the inferences use CUDA Graph: only the Enqueue API is called. If CUDA Graph were used for inference, the cudaGraphLaunch API would be called instead, according to the TensorRT backend source code:
https://github.com/triton-inference-server/tensorrt_backend/blob/main/src/instance_state.cc#L893.
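For context, CUDA graph usage in the TensorRT backend is controlled from the optimization block of the model's config.pbtxt. A minimal sketch is below; it is illustrative only, and the input name "input" and its shape are placeholders, not taken from the model in this report:

```
optimization {
  cuda {
    graphs: true        # enable CUDA graph capture in the TensorRT backend
    graph_spec {
      batch_size: 25    # capture a graph for the padded batch size used here
      input {
        key: "input"             # placeholder input name
        value { dim: [ 16 ] }    # placeholder shape (batch dim excluded)
      }
    }
  }
}
```

The backend replays a captured graph only when the request's batch size and input shapes match a captured graph, so the verbose logs are also worth checking for whether a graph was captured for batch size 25 at all.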
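A paraphrased C++ sketch of the decision the linked code makes; cudaGraphLaunch is the real CUDA API, but the surrounding structure and names are simplified illustrations, not Triton's actual types:

```cpp
#include <cuda_runtime.h>
#include <map>

// Simplified stand-in for the TensorRT backend's per-instance state.
struct TrtExecSketch {
  std::map<int, cudaGraphExec_t> graph_execs;  // captured graphs by batch size
  cudaStream_t stream = nullptr;

  // Stand-in for the regular TensorRT enqueue path.
  bool Enqueue(int /*batch_size*/) { return true; }

  bool Run(int batch_size) {
    auto it = graph_execs.find(batch_size);
    if (it != graph_execs.end()) {
      // A graph exists for this exact batch size: replay it. This path is
      // what would show up as cudaGraphLaunch in an nsys trace.
      return cudaGraphLaunch(it->second, stream) == cudaSuccess;
    }
    // No matching graph: fall back to enqueue, which is the only API
    // visible in the trace described above.
    return Enqueue(batch_size);
  }
};
```

If nsys shows only the Enqueue API, the lookup above is presumably never finding a graph for batch size 25, which points at graph capture (configuration or a capture-time failure) rather than the launch path.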
Triton Information
nvcr.io/nvidia/tritonserver:23.06-py3 (GPU version)