trtexec fails to build TensorRT engine from ONNX model: error from graphShapeAnalyzer.cpp #3846

Open
timf34 opened this issue May 7, 2024 · 2 comments
Labels: triaged (Issue has been triaged by maintainers)

Comments

@timf34

timf34 commented May 7, 2024

Error portion of logs:

 Error[4]: [graphShapeAnalyzer.cpp::nvinfer1::builder::`anonymous-namespace'::ShapeAnalyzerImpl::analyzeShapes::2084] Error Code 4: Miscellaneous (ITopKLayer /TopK: /TopK: K exceeds the maximum value allowed (3840).)
[05/07/2024-15:50:04] [E] Engine could not be created from network
[05/07/2024-15:50:04] [E] Building engine failed
[05/07/2024-15:50:04] [E] Failed to create engine from model or file.
[05/07/2024-15:50:04] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100001] # C:\Program Files\TensorRT-10.0.1.6\bin\trtexec.exe --onnx=C:\Users\timf3\PycharmProjects\BallNet\footandball_model.onnx --minShapes=input:1x3x1080x1920 --optShapes=input:1x3x1080x1920 --maxShapes=input:1x3x1080x1920 --fp16 --saveEngine=resnet_engine.trt

I am running this on the most recent version of TensorRT (10.0.1.6) with an NVIDIA GeForce RTX 3050 Ti Laptop GPU on Windows 11.

Here is the output from nvcc --version:

PS C:\Program Files\TensorRT-10.0.1.6\bin> nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Here are the full logs:

PS C:\Program Files\TensorRT-10.0.1.6\bin> ./trtexec.exe --onnx=C:\Users\timf3\PycharmProjects\BallNet\footandball_model.onnx --minShapes=input:1x3x1080x1920 --optShapes=input:1x3x1080x1920 --maxShapes=input:1x3x1080x1920 --fp16 --saveEngine=resnet_engine.trt
&&&& RUNNING TensorRT.trtexec [TensorRT v100001] # C:\Program Files\TensorRT-10.0.1.6\bin\trtexec.exe --onnx=C:\Users\timf3\PycharmProjects\BallNet\footandball_model.onnx --minShapes=input:1x3x1080x1920 --optShapes=input:1x3x1080x1920 --maxShapes=input:1x3x1080x1920 --fp16 --saveEngine=resnet_engine.trt
[05/07/2024-15:49:51] [I] === Model Options ===
[05/07/2024-15:49:51] [I] Format: ONNX
[05/07/2024-15:49:51] [I] Model: C:\Users\timf3\PycharmProjects\BallNet\footandball_model.onnx
[05/07/2024-15:49:51] [I] Output:
[05/07/2024-15:49:51] [I] === Build Options ===
[05/07/2024-15:49:51] [I] Memory Pools: workspace: default, dlaSRAM: default, dlaLocalDRAM: default, dlaGlobalDRAM: default, tacticSharedMem: default
[05/07/2024-15:49:51] [I] avgTiming: 8
[05/07/2024-15:49:51] [I] Precision: FP32+FP16
[05/07/2024-15:49:51] [I] LayerPrecisions:
[05/07/2024-15:49:51] [I] Layer Device Types:
[05/07/2024-15:49:51] [I] Calibration:
[05/07/2024-15:49:51] [I] Refit: Disabled
[05/07/2024-15:49:51] [I] Strip weights: Disabled
[05/07/2024-15:49:51] [I] Version Compatible: Disabled
[05/07/2024-15:49:51] [I] ONNX Plugin InstanceNorm: Disabled
[05/07/2024-15:49:51] [I] TensorRT runtime: full
[05/07/2024-15:49:51] [I] Lean DLL Path:
[05/07/2024-15:49:51] [I] Tempfile Controls: { in_memory: allow, temporary: allow }
[05/07/2024-15:49:51] [I] Exclude Lean Runtime: Disabled
[05/07/2024-15:49:51] [I] Sparsity: Disabled
[05/07/2024-15:49:51] [I] Safe mode: Disabled
[05/07/2024-15:49:51] [I] Build DLA standalone loadable: Disabled
[05/07/2024-15:49:51] [I] Allow GPU fallback for DLA: Disabled
[05/07/2024-15:49:51] [I] DirectIO mode: Disabled
[05/07/2024-15:49:51] [I] Restricted mode: Disabled
[05/07/2024-15:49:51] [I] Skip inference: Disabled
[05/07/2024-15:49:51] [I] Save engine: resnet_engine.trt
[05/07/2024-15:49:51] [I] Load engine:
[05/07/2024-15:49:51] [I] Profiling verbosity: 0
[05/07/2024-15:49:51] [I] Tactic sources: Using default tactic sources
[05/07/2024-15:49:51] [I] timingCacheMode: local
[05/07/2024-15:49:51] [I] timingCacheFile:
[05/07/2024-15:49:51] [I] Enable Compilation Cache: Enabled
[05/07/2024-15:49:51] [I] errorOnTimingCacheMiss: Disabled
[05/07/2024-15:49:51] [I] Preview Features: Use default preview flags.
[05/07/2024-15:49:51] [I] MaxAuxStreams: -1
[05/07/2024-15:49:51] [I] BuilderOptimizationLevel: -1
[05/07/2024-15:49:51] [I] Calibration Profile Index: 0
[05/07/2024-15:49:51] [I] Weight Streaming: Disabled
[05/07/2024-15:49:51] [I] Debug Tensors:
[05/07/2024-15:49:51] [I] Input(s)s format: fp32:CHW
[05/07/2024-15:49:51] [I] Output(s)s format: fp32:CHW
[05/07/2024-15:49:51] [I] Input build shape (profile 0): input=1x3x1080x1920+1x3x1080x1920+1x3x1080x1920
[05/07/2024-15:49:51] [I] Input calibration shapes: model
[05/07/2024-15:49:51] [I] === System Options ===
[05/07/2024-15:49:51] [I] Device: 0
[05/07/2024-15:49:51] [I] DLACore:
[05/07/2024-15:49:51] [I] Plugins:
[05/07/2024-15:49:51] [I] setPluginsToSerialize:
[05/07/2024-15:49:51] [I] dynamicPlugins:
[05/07/2024-15:49:51] [I] ignoreParsedPluginLibs: 0
[05/07/2024-15:49:51] [I]
[05/07/2024-15:49:51] [I] === Inference Options ===
[05/07/2024-15:49:51] [I] Batch: Explicit
[05/07/2024-15:49:51] [I] Input inference shape : input=1x3x1080x1920
[05/07/2024-15:49:51] [I] Iterations: 10
[05/07/2024-15:49:51] [I] Duration: 3s (+ 200ms warm up)
[05/07/2024-15:49:51] [I] Sleep time: 0ms
[05/07/2024-15:49:51] [I] Idle time: 0ms
[05/07/2024-15:49:51] [I] Inference Streams: 1
[05/07/2024-15:49:51] [I] ExposeDMA: Disabled
[05/07/2024-15:49:51] [I] Data transfers: Enabled
[05/07/2024-15:49:51] [I] Spin-wait: Disabled
[05/07/2024-15:49:51] [I] Multithreading: Disabled
[05/07/2024-15:49:51] [I] CUDA Graph: Disabled
[05/07/2024-15:49:51] [I] Separate profiling: Disabled
[05/07/2024-15:49:51] [I] Time Deserialize: Disabled
[05/07/2024-15:49:51] [I] Time Refit: Disabled
[05/07/2024-15:49:51] [I] NVTX verbosity: 0
[05/07/2024-15:49:51] [I] Persistent Cache Ratio: 0
[05/07/2024-15:49:51] [I] Optimization Profile Index: 0
[05/07/2024-15:49:51] [I] Weight Streaming Budget: Disabled
[05/07/2024-15:49:51] [I] Inputs:
[05/07/2024-15:49:51] [I] Debug Tensor Save Destinations:
[05/07/2024-15:49:51] [I] === Reporting Options ===
[05/07/2024-15:49:51] [I] Verbose: Disabled
[05/07/2024-15:49:51] [I] Averages: 10 inferences
[05/07/2024-15:49:51] [I] Percentiles: 90,95,99
[05/07/2024-15:49:51] [I] Dump refittable layers:Disabled
[05/07/2024-15:49:51] [I] Dump output: Disabled
[05/07/2024-15:49:51] [I] Profile: Disabled
[05/07/2024-15:49:51] [I] Export timing to JSON file:
[05/07/2024-15:49:51] [I] Export output to JSON file:
[05/07/2024-15:49:51] [I] Export profile to JSON file:
[05/07/2024-15:49:51] [I]
[05/07/2024-15:49:51] [I] === Device Information ===
[05/07/2024-15:49:51] [I] Available Devices:
[05/07/2024-15:49:51] [I]   Device 0: "NVIDIA GeForce RTX 3050 Ti Laptop GPU" UUID: GPU-173983d4-c1c9-ad3a-5330-e883c4542db5
[05/07/2024-15:49:51] [I] Selected Device: NVIDIA GeForce RTX 3050 Ti Laptop GPU
[05/07/2024-15:49:51] [I] Selected Device ID: 0
[05/07/2024-15:49:51] [I] Selected Device UUID: GPU-173983d4-c1c9-ad3a-5330-e883c4542db5
[05/07/2024-15:49:51] [I] Compute Capability: 8.6
[05/07/2024-15:49:51] [I] SMs: 20
[05/07/2024-15:49:51] [I] Device Global Memory: 4095 MiB
[05/07/2024-15:49:51] [I] Shared Memory per SM: 100 KiB
[05/07/2024-15:49:51] [I] Memory Bus Width: 128 bits (ECC disabled)
[05/07/2024-15:49:51] [I] Application Compute Clock Rate: 1.035 GHz
[05/07/2024-15:49:51] [I] Application Memory Clock Rate: 5.501 GHz
[05/07/2024-15:49:51] [I]
[05/07/2024-15:49:51] [I] Note: The application clock rates do not reflect the actual clock rates that the GPU is currently running at.
[05/07/2024-15:49:51] [I]
[05/07/2024-15:49:51] [I] TensorRT version: 10.0.1
[05/07/2024-15:49:51] [I] Loading standard plugins
[05/07/2024-15:49:51] [I] [TRT] [MemUsageChange] Init CUDA: CPU +92, GPU +0, now: CPU 22297, GPU 792 (MiB)
[05/07/2024-15:50:04] [I] [TRT] [MemUsageChange] Init builder kernel library: CPU +2601, GPU +310, now: CPU 25624, GPU 1102 (MiB)
[05/07/2024-15:50:04] [I] Start parsing network model.
[05/07/2024-15:50:04] [I] [TRT] ----------------------------------------------------------------
[05/07/2024-15:50:04] [I] [TRT] Input filename:   C:\Users\timf3\PycharmProjects\BallNet\footandball_model.onnx
[05/07/2024-15:50:04] [I] [TRT] ONNX IR version:  0.0.6
[05/07/2024-15:50:04] [I] [TRT] Opset version:    11
[05/07/2024-15:50:04] [I] [TRT] Producer name:    pytorch
[05/07/2024-15:50:04] [I] [TRT] Producer version: 2.0.1
[05/07/2024-15:50:04] [I] [TRT] Domain:
[05/07/2024-15:50:04] [I] [TRT] Model version:    0
[05/07/2024-15:50:04] [I] [TRT] Doc string:
[05/07/2024-15:50:04] [I] [TRT] ----------------------------------------------------------------
[05/07/2024-15:50:04] [W] [TRT] ModelImporter.cpp:680: Make sure output 614 has Int64 binding.
[05/07/2024-15:50:04] [I] Finished parsing network model. Parse time: 0.0570069
[05/07/2024-15:50:04] [I] Set shape of input tensor input for optimization profile 0 to: MIN=1x3x1080x1920 OPT=1x3x1080x1920 MAX=1x3x1080x1920
[05/07/2024-15:50:04] [E] Error[4]: [graphShapeAnalyzer.cpp::nvinfer1::builder::`anonymous-namespace'::ShapeAnalyzerImpl::analyzeShapes::2084] Error Code 4: Miscellaneous (ITopKLayer /TopK: /TopK: K exceeds the maximum value allowed (3840).)
[05/07/2024-15:50:04] [E] Engine could not be created from network
[05/07/2024-15:50:04] [E] Building engine failed
[05/07/2024-15:50:04] [E] Failed to create engine from model or file.
[05/07/2024-15:50:04] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v100001] # C:\Program Files\TensorRT-10.0.1.6\bin\trtexec.exe --onnx=C:\Users\timf3\PycharmProjects\BallNet\footandball_model.onnx --minShapes=input:1x3x1080x1920 --optShapes=input:1x3x1080x1920 --maxShapes=input:1x3x1080x1920 --fp16 --saveEngine=resnet_engine.trt

I'm not sure how to fix this or how to debug it. I have found the TopK layer using Netron, but I can't work out where it sits in my network (I am using a custom CNN architecture).
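One way to locate the offending node without clicking through Netron is a small script over the ONNX graph. The sketch below is a hypothetical helper (`topk_report` is not part of any library) that lists every TopK node, the op that feeds its data input, and its K value when K is a constant. It assumes opset 10 or later (this model uses opset 11), where K is the second input, an int64 tensor of shape [1]. If K is computed at runtime (for example from a Shape op), `k` is reported as `None` and `k_producer` names the op that computes it; its value then depends on the input shape in the optimization profile. With PyTorch 2.x exports, node names usually encode the module path, so a bare name like `/TopK` suggests the call sits in the top-level module's `forward()` rather than inside a submodule.

```python
import struct

TRT_TOPK_MAX_K = 3840  # limit quoted in the TensorRT 10.0.1 error message


def _int64_values(tensor):
    """Decode an int64 TensorProto (TopK's K input must be int64) into a list."""
    if len(tensor.int64_data) > 0:
        return list(tensor.int64_data)
    raw = tensor.raw_data  # ONNX stores raw_data little-endian
    return list(struct.unpack(f"<{len(raw) // 8}q", raw))


def topk_report(graph, limit=TRT_TOPK_MAX_K):
    """Describe every TopK node in an onnx GraphProto (or a look-alike object)."""
    constants = {init.name: init for init in graph.initializer}
    producers = {}  # tensor name -> node that outputs it
    for node in graph.node:
        for out in node.output:
            producers[out] = node
        if node.op_type == "Constant":
            for attr in node.attribute:
                if attr.name == "value":
                    constants[node.output[0]] = attr.t

    report = []
    for node in graph.node:
        if node.op_type != "TopK":
            continue
        data_in = node.input[0]
        # Opset < 10 stores K as an attribute instead of a second input.
        k_in = node.input[1] if len(node.input) > 1 else None
        k = None  # stays None when K is computed at runtime
        if k_in in constants:
            k = _int64_values(constants[k_in])[0]
        src = producers.get(data_in)
        k_src = producers.get(k_in)
        report.append({
            "name": node.name,
            "data_input": data_in,
            "data_producer": f"{src.op_type} {src.name}" if src else "graph input",
            "k": k,
            "k_producer": f"{k_src.op_type} {k_src.name}" if k_src else None,
            "too_large": k is not None and k > limit,
        })
    return report
```

With the `onnx` package installed, calling `topk_report(onnx.load("footandball_model.onnx").graph)` and printing each row should show which TopK exceeds the limit and what feeds it. The helper only reads protobuf fields, so it works on a real `GraphProto` without any other dependency.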

@lix19937

/TopK: K exceeds the maximum value allowed (3840)

TopK with K > 3840 is not supported, so the engine build fails.
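If the model really needs more than 3840 results, one possible workaround (an assumption, not something suggested in this thread) is to split the single TopK into several rounds, each with K of at most 3840, excluding the elements already selected before the next round. The stdlib sketch below shows the idea on a plain Python list, with `heapq.nlargest` standing in for one bounded TopK; `chunked_topk` is a hypothetical name. In a PyTorch model the analogue would be repeated `torch.topk` calls with already-picked positions masked to `-inf`; whether TensorRT 10.0 builds such an exported graph has not been checked here. If downstream code never uses that many candidates, capping K at 3840 in the model is likely the simpler fix.

```python
import heapq

TRT_TOPK_MAX_K = 3840  # limit quoted in the TensorRT error message


def chunked_topk(values, k, max_k=TRT_TOPK_MAX_K):
    """Top-k (largest first) built from rounds of top-k with k <= max_k.

    Returns (top_values, top_indices). Equal values keep their original order.
    """
    remaining = list(enumerate(values))  # (index, value) pairs not yet selected
    top_values, top_indices = [], []
    while k > 0 and remaining:
        step = min(k, max_k)  # each round stays within the limit
        best = heapq.nlargest(step, remaining, key=lambda pair: pair[1])
        top_indices.extend(i for i, _ in best)
        top_values.extend(v for _, v in best)
        picked = {i for i, _ in best}
        # Drop selected entries; a tensor version would mask them with -inf.
        remaining = [pair for pair in remaining if pair[0] not in picked]
        k -= step
    return top_values, top_indices
```

Because each round only sees elements smaller than or equal to everything picked earlier, concatenating the rounds gives the same ordering as one large top-k.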

@zerollzeng
Collaborator

It's a known limitation, and we are actively working on removing it.

@zerollzeng zerollzeng self-assigned this May 12, 2024
@zerollzeng zerollzeng added the triaged Issue has been triaged by maintainers label May 12, 2024