
Error when using TF-TRT on an NVIDIA T4 GPU for Faster R-CNN model inference #497

Open
XueweiHan21 opened this issue Jul 26, 2021 · 0 comments
System information (version)
Docker images:
registry.cn-beijing.aliyuncs.com/adlik/model-compiler:v0.3.0_trt7.2.1.6_cuda11.0
registry.cn-beijing.aliyuncs.com/adlik/serving-tftrt-gpu:v0.3.0
Inference engine: TF-TRT
Model name: Faster R-CNN ResNet50 V1 640x640

Detailed description
When I use the TF-TRT inference engine on an NVIDIA T4 GPU to run Faster R-CNN model inference, the following errors are reported:

2021-06-28 21:29:25.597593: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.5
2021-06-28 21:29:25.629687: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.5
2021-06-28 21:29:27.796481: E external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger safeContext.cpp (124) - Cudnn Error in initializeCommonContext: 1 (Could not initialize cudnn, please check cudnn installation.)
2021-06-28 21:29:27.798160: E external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger safeContext.cpp (124) - Cudnn Error in initializeCommonContext: 1 (Could not initialize cudnn, please check cudnn installation.)
2021-06-28 21:29:27.798260: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:758] TF-TRT Warning: Engine creation for TRTEngineOp_0_0 failed. The native segment will be used instead. Reason: Internal: Failed to build TensorRT engine
2021-06-28 21:29:27.798284: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:628] TF-TRT Warning: Engine retrieval for input shapes: [[300,1], [300,1], [300,1], [300,1]] failed. Running native segment for TRTEngineOp_0_0
2021-06-28 21:29:27.804854: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.5
2021-06-28 21:29:27.828653: E external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger safeContext.cpp (124) - Cudnn Error in initializeCommonContext: 1 (Could not initialize cudnn, please check cudnn installation.)
2021-06-28 21:29:27.830466: E external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger safeContext.cpp (124) - Cudnn Error in initializeCommonContext: 1 (Could not initialize cudnn, please check cudnn installation.)
2021-06-28 21:29:27.830544: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:758] TF-TRT Warning: Engine creation for StatefulPartitionedCall/SecondStagePostprocessor/TRTEngineOp_0_13 failed. The native segment will be used instead. Reason: Internal: Failed to build TensorRT engine
2021-06-28 21:29:27.830570: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:628] TF-TRT Warning: Engine retrieval for input shapes: [[1,300,91]] failed. Running native segment for StatefulPartitionedCall/SecondStagePostprocessor/TRTEngineOp_0_13
2021-06-28 21:29:27.885787: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.5
2021-06-28 21:29:27.956809: E external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger safeContext.cpp (124) - Cudnn Error in initializeCommonContext: 1 (Could not initialize cudnn, please check cudnn installation.)
2021-06-28 21:29:27.958320: E external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger safeContext.cpp (124) - Cudnn Error in initializeCommonContext: 1 (Could not initialize cudnn, please check cudnn installation.)
2021-06-28 21:29:27.958386: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:758] TF-TRT Warning: Engine creation for StatefulPartitionedCall/SecondStagePostprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/ClipToWindow/Area/TRTEngineOp_0_12 failed. The native segment will be used instead. Reason: Internal: Failed to build TensorRT engine
2021-06-28 21:29:27.958406: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:628] TF-TRT Warning: Engine retrieval for input shapes: [[9000,1], [9000,1], [9000,1], [9000,1]] failed. Running native segment for StatefulPartitionedCall/SecondStagePostprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/ClipToWindow/Area/TRTEngineOp_0_12
2021-06-28 21:29:27.965526: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.5
2021-06-28 21:29:28.028448: E external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger safeContext.cpp (124) - Cudnn Error in initializeCommonContext: 1 (Could not initialize cudnn, please check cudnn installation.)
2021-06-28 21:29:28.029964: E external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger safeContext.cpp (124) - Cudnn Error in initializeCommonContext: 1 (Could not initialize cudnn, please check cudnn installation.)
2021-06-28 21:29:28.030028: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:758] TF-TRT Warning: Engine creation for StatefulPartitionedCall/SecondStagePostprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Area/TRTEngineOp_0_11 failed. The native segment will be used instead. Reason: Internal: Failed to build TensorRT engine
2021-06-28 21:29:28.030049: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:628] TF-TRT Warning: Engine retrieval for input shapes: [[9000,1], [9000,1], [9000,1], [9000,1]] failed. Running native segment for StatefulPartitionedCall/SecondStagePostprocessor/BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Area/TRTEngineOp_0_11
2021-06-28 21:29:28: I adlik_serving/framework/manager/time_stats.cc:18] PredictServiceImpl::predict: process request, model fasterrcnn, batch size 1, time (milliseconds): 62123.6
2021-06-28 21:29:28: D adlik_serving/apis/predict_impl.cc:138] After predict, code: 0, status:
2021-06-28 21:29:28: D adlik_serving/apis/predict_impl.cc:142] After postProcessOutputs, status: , code: 0
2021-06-28 21:29:28: D adlik_serving/server/grpc/grpc_service.cc:35] Receive predict request, model_name: fasterrcnn, batch size: 1
2021-06-28 21:29:28.058662: E external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger ../rtSafe/cuda/cudaActivationRunner.cpp (103) - Cudnn Error in execute: 8 (CUDNN_STATUS_EXECUTION_FAILED)
2021-06-28 21:29:28.058717: E external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:42] DefaultLogger FAILED_EXECUTION: std::exception
2021-06-28 21:29:28.058729: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/kernels/trt_engine_op.cc:639] TF-TRT Warning: Failed to execute engine: Internal: Failed to enqueue batch for TRT engine Retrying with native segment for StatefulPartitionedCall/TRTEngineOp_0_4
2021-06-28 21:29:28.554509: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-06-28 21:29:28.573206: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-06-28 21:29:28.663224: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_blas.cc:226] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
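The log repeatedly warns that TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.5 before each engine build fails. As a quick sanity check when triaging logs like this (a standalone sketch, not part of Adlik or TF-TRT), the linked vs. loaded cuDNN versions can be extracted directly from the warning text:

```python
import re

# Matches the TF-TRT warning text seen in the log, e.g.
# "TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.5"
PATTERN = re.compile(
    r"TensorRT was linked against cuDNN (\d+\.\d+\.\d+) "
    r"but loaded cuDNN (\d+\.\d+\.\d+)"
)

def cudnn_mismatch(log_line):
    """Return (linked, loaded) cuDNN versions if the line reports a mismatch,
    otherwise None."""
    m = PATTERN.search(log_line)
    if m is None:
        return None
    linked, loaded = m.group(1), m.group(2)
    return (linked, loaded) if linked != loaded else None

line = ("TF-TRT Warning: DefaultLogger TensorRT was linked against "
        "cuDNN 8.1.0 but loaded cuDNN 8.0.5")
print(cudnn_mismatch(line))  # ('8.1.0', '8.0.5')
```

If such a mismatch is confirmed, aligning the cuDNN version in the image with the one TensorRT was built against is worth trying, although the same warning also appears in the working V100 run below, so the mismatch alone may not be the root cause.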

However, when I run the same model and inference-engine image on a Tesla V100 GPU, no error is reported and inference completes correctly. Part of the inference log is as follows:

2021-07-23 16:23:49.960729: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.5
2021-07-23 16:23:50.010199: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.5
2021-07-23 16:23:50.013737: W external/org_tensorflow/tensorflow/compiler/tf2tensorrt/utils/trt_logger.cc:38] TF-TRT Warning: DefaultLogger TensorRT was linked against cuDNN 8.1.0 but loaded cuDNN 8.0.5
2021-07-23 16:23:50: I adlik_serving/framework/manager/time_stats.cc:18] PredictServiceImpl::predict: process request, model fasterrcnn, batch size 1, time (milliseconds): 55320.3
2021-07-23 16:23:50: D adlik_serving/apis/predict_impl.cc:138] After predict, code: 0, status:
2021-07-23 16:23:50: D adlik_serving/apis/predict_impl.cc:142] After postProcessOutputs, status: , code: 0
2021-07-23 16:23:50: D adlik_serving/server/grpc/grpc_service.cc:35] Receive predict request, model_name: fasterrcnn, batch size: 1
2021-07-23 16:23:50: I adlik_serving/framework/manager/time_stats.cc:18] PredictServiceImpl::predict: process request, model fasterrcnn, batch size 1, time (milliseconds): 91.7115
2021-07-23 16:23:50: D adlik_serving/apis/predict_impl.cc:138] After predict, code: 0, status:
2021-07-23 16:23:50: D adlik_serving/apis/predict_impl.cc:142] After postProcessOutputs, status: , code: 0
2021-07-23 16:23:50: D adlik_serving/server/grpc/grpc_service.cc:35] Receive predict request, model_name: fasterrcnn, batch size: 1
2021-07-23 16:23:50: I adlik_serving/framework/manager/time_stats.cc:18] PredictServiceImpl::predict: process request, model fasterrcnn, batch size 1, time (milliseconds): 90.111
2021-07-23 16:23:50: D adlik_serving/apis/predict_impl.cc:138] After predict, code: 0, status:
2021-07-23 16:23:50: D adlik_serving/apis/predict_impl.cc:142] After postProcessOutputs, status: , code: 0
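The `CUBLAS_STATUS_ALLOC_FAILED` errors in the T4 log often indicate GPU memory exhaustion; the T4 has less memory than the V100, and TensorFlow's default behavior of reserving nearly all GPU memory can starve the TensorRT engine builder. One possible mitigation to try before launching the serving process (an assumption based on the error code, not a confirmed fix for this issue) is to make TensorFlow allocate GPU memory on demand:

```shell
# Let TensorFlow grow its GPU memory allocation as needed instead of
# reserving almost all device memory up front, which can leave too little
# for TF-TRT engine building and trigger CUBLAS_STATUS_ALLOC_FAILED.
export TF_FORCE_GPU_ALLOW_GROWTH=true
```

This environment variable would need to be set inside the serving container before the Adlik serving process starts; whether it resolves the T4 failure here is untested.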