
ONNX with TensorRT fails to run #2715

Open
Andryshik345 opened this issue Mar 29, 2024 · 16 comments
Labels
bug Something isn't working ONNX

Comments

@Andryshik345

Information:

  • Chainner version: 0.22.2
  • OS: Windows 10 10.0.19045.4170
  • CPU: AMD Ryzen 5 5600
  • GPU: RTX 3060
  • CUDA: 12.4
  • TensorRT: 10.0.0.6

Description
I'm getting TypeError: Failed to fetch when trying to run upscaling with ONNX and TensorRT. I'm aware that the bundled onnxruntime-gpu 1.15.1 doesn't support CUDA 12.x, so I manually updated it to 1.17 (which has CUDA 12.x support) using pip with chaiNNer's Python distro, and installed TensorRT's wheels the same way. The CUDA runner works fine (although it is much slower than even CPU processing).

Logs
logs.zip

@Andryshik345 Andryshik345 added the bug Something isn't working label Mar 29, 2024
@UffernKur

UffernKur commented Mar 30, 2024

From the 3/29/2024 nightly. NVIDIA RTX 3060. Fresh install of chaiNNer; deleted and redownloaded the Python, ONNX, and NCNN environments.

Error

An error occurred in a onnx Upscale Image node:

[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1

Input values:
• Image: RGB Image 1280x1024
• Model: Value of type 'nodes.impl.onnx.model.OnnxGeneric'
• Tile Size: 256
• Separate Alpha: No

Stack Trace:
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\process.py", line 155, in run_node
    raw_output = node.run(context, *enforced_inputs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 124, in upscale_image_node
    return convenient_upscale(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\upscale\convenient_upscale.py", line 58, in convenient_upscale
    return upscale(img)
           ^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\image_op.py", line 18, in <lambda>
    return lambda i: np.clip(op(i), 0, 1)
                             ^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 128, in <lambda>
    lambda i: upscale(i, session, tile_size, change_shape, exact_size),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 51, in upscale
    return onnx_auto_split(img, session, change_shape=change_shape, tiler=tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\onnx\auto_split.py", line 103, in onnx_auto_split
    return auto_split(img, upscale, tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\upscale\auto_split.py", line 45, in auto_split
    return split(
           ^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\upscale\auto_split.py", line 174, in _max_split
    upscale_result = upscale(padded_tile.read_from(img), padded_tile)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\onnx\auto_split.py", line 84, in upscale
    output: np.ndarray = session.run([output_name], {input_name: lr_img})[0]
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Roaming\chaiNNer\python\python\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1
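For context (editorial note, not part of the thread): the "left operand cannot broadcast on dim 1" message is ONNX Runtime reporting that the two matmul operands have incompatible batch dimensions under numpy-style broadcasting. A minimal sketch of that rule in plain Python, illustrative only:

```python
def can_broadcast(shape_a, shape_b):
    """Numpy-style broadcast check: align shapes from the right;
    each dimension pair must be equal, or one of them must be 1."""
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True

# The batch dims of a matmul's operands must broadcast. A tile whose
# attention-window count differs from what the fused op expects fails:
print(can_broadcast((1, 64, 256), (1, 1, 256)))   # True
print(can_broadcast((1, 64, 256), (1, 48, 256)))  # False
```

This kind of mismatch often shows up in window-attention models when the tile size isn't a multiple of the window size, which is consistent with the error only appearing at certain tile/image dimensions.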

@joeyballentine
Member

Does this happen with every model?

@Andryshik345
Author

@joeyballentine I tried two different models just now and I get the same error.
I also decided to record logs for the whole process (converting a PyTorch model to ONNX and trying to upscale an image with that model), so maybe the problem arises somewhere there.

logs.zip

@UffernKur

GPU is failing to run on multiple ONNX models; however, those models will complete (much more slowly) on CPU.

@joeyballentine
Member

Make sure to try the nightly; we fixed an ONNX issue there. And when you do, make sure to update ONNX.

@Andryshik345
Author

@joeyballentine Tried the nightly build from 2024-04-07, extracted to a separate folder, installed all dependencies, and now I get Error: An unexpected error occurred: Error: The application encountered an unexpected error and could not continue.
Tried with several models (including ones I have already used successfully in AnimeJaNaiConverterGui with TensorRT); same error with all of them.

captured just after the first error: logs.zip

captured after some time (it begins to spam other errors regarding localhost): logs_1.zip

@joeyballentine
Member

Sorry, but could you try this again on tonight's nightly when it comes out? We had an issue where some things aren't logging properly, so the important logs that would tell us what's going wrong are currently being missed. Thanks.

@Andryshik345
Author

Andryshik345 commented Apr 9, 2024

Updated to the new nightly build on top of yesterday's; still the same error.
And just like yesterday:
captured just after the first error: logs.zip

captured after some time (it begins to spam other errors regarding localhost): logs_11.zip

I also just tried to delete everything and install it again, but no changes.

@joeyballentine
Member

Damn, whatever's going wrong still isn't logging. Are you sure you have TensorRT set up properly and added to your PATH env var?

And just to be sure, CUDA works fine for you?

@UffernKur

Here is the CUDA failure
Error

An error occurred in a onnx Upscale Image node:

[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1

Input values:
• Image: RGB Image 1074x1515
• Model: Value of type 'nodes.impl.onnx.model.OnnxGeneric'
• Tile Size: 256
• Separate Alpha: No

Stack Trace:
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\process.py", line 155, in run_node
    raw_output = node.run(context, *enforced_inputs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 124, in upscale_image_node
    return convenient_upscale(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\convenient_upscale.py", line 58, in convenient_upscale
    return upscale(img)
           ^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\image_op.py", line 18, in <lambda>
    return lambda i: np.clip(op(i), 0, 1)
                             ^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 128, in <lambda>
    lambda i: upscale(i, session, tile_size, change_shape, exact_size),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 51, in upscale
    return onnx_auto_split(img, session, change_shape=change_shape, tiler=tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\onnx\auto_split.py", line 103, in onnx_auto_split
    return auto_split(img, upscale, tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\auto_split.py", line 45, in auto_split
    return split(
           ^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\auto_split.py", line 174, in _max_split
    upscale_result = upscale(padded_tile.read_from(img), padded_tile)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\onnx\auto_split.py", line 84, in upscale
    output: np.ndarray = session.run([output_name], {input_name: lr_img})[0]
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Roaming\chaiNNer\python\python\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1

Here is the TensorRT failure

Error

An error occurred in a onnx Upscale Image node:

[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1

Input values:
• Image: RGB Image 1074x1515
• Model: Value of type 'nodes.impl.onnx.model.OnnxGeneric'
• Tile Size: 256
• Separate Alpha: No

Stack Trace:
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\process.py", line 155, in run_node
    raw_output = node.run(context, *enforced_inputs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 124, in upscale_image_node
    return convenient_upscale(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\convenient_upscale.py", line 58, in convenient_upscale
    return upscale(img)
           ^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\image_op.py", line 18, in <lambda>
    return lambda i: np.clip(op(i), 0, 1)
                             ^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 128, in <lambda>
    lambda i: upscale(i, session, tile_size, change_shape, exact_size),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 51, in upscale
    return onnx_auto_split(img, session, change_shape=change_shape, tiler=tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\onnx\auto_split.py", line 103, in onnx_auto_split
    return auto_split(img, upscale, tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\auto_split.py", line 45, in auto_split
    return split(
           ^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\auto_split.py", line 174, in _max_split
    upscale_result = upscale(padded_tile.read_from(img), padded_tile)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\onnx\auto_split.py", line 84, in upscale
    output: np.ndarray = session.run([output_name], {input_name: lr_img})[0]
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Roaming\chaiNNer\python\python\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1

@Andryshik345
Author

Andryshik345 commented Apr 9, 2024

Are you sure you have tensorrt set up properly and added to your path env var?

Well, I copied TensorRT's libraries to CUDA's /bin folder, which is on PATH. Though I can't check any samples like https://github.com/NVIDIA/TensorRT/tree/main/samples/sampleOnnxMNIST (I don't have Visual Studio installed atm).
I also forgot to install TensorRT's Python wheels at first, but installing them didn't change anything.

>python -m pip show tensorrt
Name: tensorrt
Version: 10.0.0b6
Summary: A high performance deep learning inference library
Home-page: https://developer.nvidia.com/tensorrt
Author: NVIDIA Corporation
Author-email:
License: Proprietary
Location: D:\upscale_software\chainner-nightly\python\python\Lib\site-packages
Requires:
Required-by:
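(Editorial aside: a quick way to verify the TensorRT runtime libraries are actually discoverable is to scan PATH for them from Python. This is a generic stdlib sketch; the DLL names vary by TensorRT version, e.g. nvinfer.dll in TensorRT 8/9 versus versioned names like nvinfer_10.dll in TensorRT 10, so adjust as needed.)

```python
import os

def find_on_path(filename):
    """Return the first PATH directory containing `filename`, or None."""
    for d in os.environ.get("PATH", "").split(os.pathsep):
        if d and os.path.isfile(os.path.join(d, filename)):
            return d
    return None

# Hypothetical check for TensorRT 10 core libraries on Windows;
# prints the containing directory if found, otherwise None:
for dll in ("nvinfer_10.dll", "nvonnxparser_10.dll"):
    print(dll, "->", find_on_path(dll))
```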

And just to be sure, CUDA works fine for you?

CUDA works.

Screenshot (image attached)

@joeyballentine
Member

Tensorrt's python wheels aren't used for onnx's tensorrt support. Anyway, I'll look into this more. Thanks for the updates

@zelenooki87

Does the image dimension matter in chaiNNer when creating a TensorRT model from ONNX?

Specifically, if I have, for example, 500 images whose dimensions vary slightly from one image to the next, will chaiNNer create a cached file for each resolution? That would take a considerable amount of time. If it doesn't create separate caches, that would be great!

@Andryshik345
Author

Andryshik345 commented Apr 23, 2024 via email

@joeyballentine
Member

If the image size varies, it might build a new model for each one, as it is not set up to use dynamic sizes. I tried to set that up in the past and was unable to get it to work.
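(Editorial aside: with static shapes, the natural structure is a cache keyed by the exact input shape, which is why every previously unseen resolution pays the engine-build cost once. A minimal illustrative sketch with hypothetical names, not chaiNNer's actual code:)

```python
from typing import Callable, Dict, Tuple

class EngineCache:
    """Cache one built 'engine' per exact input shape (H, W).

    With static shapes, every unseen resolution triggers a slow build;
    repeated resolutions reuse the cached engine."""
    def __init__(self, build: Callable[[Tuple[int, int]], object]):
        self._build = build
        self._engines: Dict[Tuple[int, int], object] = {}
        self.builds = 0  # counts slow-path engine builds

    def get(self, shape: Tuple[int, int]) -> object:
        if shape not in self._engines:
            self.builds += 1                      # slow path: build the engine
            self._engines[shape] = self._build(shape)
        return self._engines[shape]

cache = EngineCache(build=lambda s: f"engine{s}")
cache.get((1080, 1920))
cache.get((1080, 1920))   # cache hit, no rebuild
cache.get((1074, 1515))   # new resolution, rebuild
print(cache.builds)       # 2
```

A batch of 500 slightly-varying images therefore approaches 500 builds unless inputs are padded or resized to a small set of canonical shapes, or the engine is built with dynamic shape profiles.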

@zelenooki87

Yes, I assumed that. It works the same way in Selur's Hybrid program. But for video files, waiting isn't really a problem since they're mostly standard sizes. So if the engines stay cached on the hard drive, there's no need to wait again when the input file is the same resolution.

Anyway, we're eagerly awaiting the video super-resolution chaiNNer updates in one of the next major builds, and we're keeping our fingers crossed that everything works out as planned. Thanks for everything you've done so far.
