ONNX with TensorRT fails to run #2715
From the 3/29/2024 nightly. NVIDIA RTX 3060. Fresh install of chaiNNer; deleted and redownloaded the Python, ONNX, and NCNN environments. Error: An error occurred in a onnx Upscale Image node: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1 Input values: Stack Trace:
Traceback (most recent call last):
File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\process.py", line 155, in run_node
raw_output = node.run(context, *enforced_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 124, in upscale_image_node
return convenient_upscale(
^^^^^^^^^^^^^^^^^^^
File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\upscale\convenient_upscale.py", line 58, in convenient_upscale
return upscale(img)
^^^^^^^^^^^^
File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\image_op.py", line 18, in <lambda>
return lambda i: np.clip(op(i), 0, 1)
^^^^^
File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 128, in <lambda>
lambda i: upscale(i, session, tile_size, change_shape, exact_size),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 51, in upscale
return onnx_auto_split(img, session, change_shape=change_shape, tiler=tiler)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\onnx\auto_split.py", line 103, in onnx_auto_split
return auto_split(img, upscale, tiler)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\upscale\auto_split.py", line 45, in auto_split
return split(
^^^^^^
File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\upscale\auto_split.py", line 174, in _max_split
upscale_result = upscale(padded_tile.read_from(img), padded_tile)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\onnx\auto_split.py", line 84, in upscale
output: np.ndarray = session.run([output_name], {input_name: lr_img})[0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\USER\AppData\Roaming\chaiNNer\python\python\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
return self._sess.run(output_names, input_feed, run_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1
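For context, the actual shapes here come from the model's attention layers, but "left operand cannot broadcast on dim 1" is the same class of failure NumPy reports when the batch dimensions of a stacked matrix multiply cannot broadcast. An illustrative sketch with made-up shapes:

```python
import numpy as np

# Stacked matmul: dim 0 acts as a batch dimension. Batch sizes 3 and 2
# cannot broadcast against each other, so the product is rejected up
# front, before any multiplication happens.
a = np.ones((3, 4, 5))
b = np.ones((2, 5, 6))
try:
    np.matmul(a, b)
except ValueError as e:
    print("broadcast failure:", e)
```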
Does this happen with every model?
@joeyballentine I tried two different models just now and I get the same error.
The GPU fails to run multiple ONNX models; the same models do complete on the CPU, just much more slowly.
Make sure to try the nightly; we fixed an ONNX issue there. And when you do, make sure to update ONNX.
@joeyballentine I tried the nightly build from 2024-04-07, extracted it to a separate folder, installed all dependencies, and I still get the error. Captured just after the first error: logs.zip. Captured after some time (it begins to spam other errors regarding localhost): logs_1.zip
Sorry, but could you try this again on tonight's nightly when it comes out? We had an issue where some things aren't logging properly, so the important logs that would tell us what's going wrong are currently being missed. Thanks.
I updated to the new nightly build on top of yesterday's; still the same error. Captured after some time (it begins to spam other errors regarding localhost): logs_11.zip. I also just tried to delete everything and install it again, but no change.
Damn, whatever's going wrong still isn't logging. Are you sure you have TensorRT set up properly and added to your PATH env var? And just to be sure, CUDA works fine for you?
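One stdlib-only way to check whether TensorRT's runtime libraries are discoverable on PATH. The DLL names below are assumptions for illustration; the exact file names vary by TensorRT version (e.g. nvinfer.dll ships with TensorRT 8.x on Windows):

```python
import os

# Hypothetical DLL names -- adjust to match the TensorRT version installed.
candidates = ["nvinfer.dll", "nvinfer_plugin.dll"]

def find_on_path(name):
    """Return the first PATH directory containing `name`, or None."""
    for d in os.environ.get("PATH", "").split(os.pathsep):
        if d and os.path.isfile(os.path.join(d, name)):
            return d
    return None

for name in candidates:
    print(name, "->", find_on_path(name) or "NOT FOUND on PATH")
```

If a library is reported as missing here, onnxruntime's TensorRT provider would typically fail to load in the same environment.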
Here is the CUDA failure: An error occurred in a onnx Upscale Image node: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1 Input values: Stack Trace: Here is the TensorRT failure: An error occurred in a onnx Upscale Image node: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1 Input values: Stack Trace:
Well, I copied TensorRT's libraries to CUDA's /bin folder, which is in PATH. Though I can't check any samples like https://github.com/NVIDIA/TensorRT/tree/main/samples/sampleOnnxMNIST (I don't have Visual Studio installed atm).
CUDA works.
TensorRT's Python wheels aren't used for ONNX's TensorRT support. Anyway, I'll look into this more. Thanks for the updates.
Does the image dimension matter in chaiNNer when creating a TensorRT model from ONNX? Specifically, if I have, for example, 500 images whose dimensions vary slightly for almost every other image, will the chain create a cached file for each resolution? That would take a considerable amount of time. If it doesn't create separate caches, that would be great!
AFAIK TensorRT should first create an engine file for your model, and then it will use it for all input data.
If the image size varies, it might make a new model for each one, as it is not set to use dynamic size. I tried to set that up in the past and was unable to get it to work.
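For reference, supporting varying input sizes with a single engine normally starts at ONNX export time by declaring the spatial axes dynamic. A hedged sketch of what that setup looks like with torch.onnx.export; the tensor names and the commented-out export call are illustrative assumptions, not chaiNNer's actual code:

```python
# Declaring height/width as dynamic axes at export time is what allows a
# single TensorRT engine (built with an optimization profile) to serve
# varying input sizes. Axis indices assume NCHW layout.
dynamic_axes = {
    "input": {2: "height", 3: "width"},
    "output": {2: "height", 3: "width"},
}

# Hypothetical export call (requires torch and a loaded model):
# torch.onnx.export(
#     model, dummy_input, "model.onnx",
#     input_names=["input"], output_names=["output"],
#     dynamic_axes=dynamic_axes,
# )
print(sorted(dynamic_axes))
```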
Yes, I assumed that. It's the same way in Selur's Hybrid program. But for video files, waiting isn't really a problem, since they're mostly standard sizes. So if they stay cached on the hard drive, there's no need to wait again when the input file is the same resolution. Anyway, we're eagerly awaiting the video super-resolution updates to chaiNNer in one of the next major builds, and we're keeping our fingers crossed that everything works out as planned. Thanks for everything you've done so far.
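On the on-disk caching point: onnxruntime's TensorRT execution provider does expose engine-cache options so built engines persist across runs instead of being rebuilt for every already-seen shape. A sketch of the configuration (the option names come from the onnxruntime TensorRT EP documentation; the cache path is a hypothetical example):

```python
# Provider configuration for onnxruntime's TensorRT execution provider.
# With engine caching enabled, serialized engines are written to disk and
# reused on later runs for input shapes that have already been built.
providers = [
    (
        "TensorrtExecutionProvider",
        {
            "trt_engine_cache_enable": True,
            "trt_engine_cache_path": "./trt_cache",  # hypothetical path
        },
    ),
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

# Session creation (requires onnxruntime-gpu with TensorRT support):
# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx", providers=providers)
print(providers[0][0])
```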
Information:
Description
I'm getting
TypeError: Failed to fetch
when trying to run upscaling with ONNX and TensorRT. Of course I'm aware that the provided onnxruntime-gpu 1.15.1 doesn't support CUDA 12.x, so I manually updated it to 1.17 with CUDA 12.x support with pip, using chaiNNer's Python distro. I also installed TensorRT's wheels the same way. The CUDA runner works fine (although it is much slower than even CPU processing).
Logs
logs.zip
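When packages are updated manually like this, it's worth confirming which onnxruntime build the bundled interpreter actually picked up. A generic sketch (not part of chaiNNer) to run with that distro's Python:

```python
from importlib.metadata import PackageNotFoundError, version

# onnxruntime and onnxruntime-gpu are separate packages; having both
# installed at once can cause the GPU providers to silently not load.
for pkg in ("onnxruntime", "onnxruntime-gpu"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```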