
ONNX with TensorRT fails to run #2715

Open
Andryshik345 opened this issue Mar 29, 2024 · 16 comments
Labels
bug Something isn't working ONNX

Comments

@Andryshik345

Information:

  • Chainner version: 0.22.2
  • OS: Windows 10 10.0.19045.4170
  • CPU: AMD Ryzen 5 5600
  • GPU: RTX 3060
  • CUDA: 12.4
  • TensorRT: 10.0.0.6

Description
I'm getting TypeError: Failed to fetch when trying to run upscaling with ONNX and TensorRT. I'm aware that the bundled onnxruntime-gpu 1.15.1 doesn't support CUDA 12.x, so I manually updated it to 1.17 (which has CUDA 12.x support) using pip with chaiNNer's Python distro, and installed TensorRT's wheels the same way. The CUDA runner works fine (although it is much slower than even CPU processing).

Logs
logs.zip

@Andryshik345 Andryshik345 added the bug Something isn't working label Mar 29, 2024
@UffernKur

UffernKur commented Mar 30, 2024

From the 3/29/2024 nightly. NVIDIA RTX 3060. Fresh install of chaiNNer; deleted and redownloaded the Python, ONNX, and NCNN environments.

Error

An error occurred in a onnx Upscale Image node:

[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1

Input values:
• Image: RGB Image 1280x1024
• Model: Value of type 'nodes.impl.onnx.model.OnnxGeneric'
• Tile Size: 256
• Separate Alpha: No

Stack Trace:
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\process.py", line 155, in run_node
    raw_output = node.run(context, *enforced_inputs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 124, in upscale_image_node
    return convenient_upscale(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\upscale\convenient_upscale.py", line 58, in convenient_upscale
    return upscale(img)
           ^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\image_op.py", line 18, in <lambda>
    return lambda i: np.clip(op(i), 0, 1)
                             ^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 128, in <lambda>
    lambda i: upscale(i, session, tile_size, change_shape, exact_size),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 51, in upscale
    return onnx_auto_split(img, session, change_shape=change_shape, tiler=tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\onnx\auto_split.py", line 103, in onnx_auto_split
    return auto_split(img, upscale, tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\upscale\auto_split.py", line 45, in auto_split
    return split(
           ^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\upscale\auto_split.py", line 174, in _max_split
    upscale_result = upscale(padded_tile.read_from(img), padded_tile)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\onnx\auto_split.py", line 84, in upscale
    output: np.ndarray = session.run([output_name], {input_name: lr_img})[0]
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Roaming\chaiNNer\python\python\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1
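For context (editorial note, not part of the thread): the "left operand cannot broadcast on dim 1" message is ONNX Runtime reporting that the two matmul operands have incompatible batch dimensions under numpy-style broadcasting. A minimal sketch of that rule in plain Python, illustrative only:

```python
def can_broadcast(shape_a, shape_b):
    """Numpy-style broadcast check: align shapes from the right;
    each dimension pair must be equal, or one of them must be 1."""
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a != b and a != 1 and b != 1:
            return False
    return True

# The batch dims of a matmul's operands must broadcast. A tile whose
# attention-window count differs from what the fused op expects fails:
print(can_broadcast((1, 64, 256), (1, 1, 256)))   # True
print(can_broadcast((1, 64, 256), (1, 48, 256)))  # False
```

This kind of mismatch often shows up in window-attention models when the tile size isn't a multiple of the window size, which is consistent with the error only appearing at certain tile/image dimensions.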

@joeyballentine
Member

Does this happen with every model?

@Andryshik345
Author

@joeyballentine I tried two different models just now and I get the same error.
I also decided to record logs for the whole process (converting a PyTorch model to ONNX and trying to upscale an image with that model), so maybe the problem arises somewhere there.

logs.zip

@UffernKur

GPU is failing to run on multiple ONNX models; however, those models will complete (much more slowly) on CPU.

@joeyballentine
Member

Make sure to try the nightly; we fixed an ONNX issue there. And when you do, make sure to update ONNX.

@Andryshik345
Author

@joeyballentine Tried the nightly build from 2024-04-07, extracted to a separate folder, installed all dependencies, and now I get Error: An unexpected error occurred: Error: The application encountered an unexpected error and could not continue.
Tried with several models (including ones I have already used successfully in AnimeJaNaiConverterGui with TensorRT); same error with all of them.

captured just after the first error: logs.zip

captured after some time (it begins to spam other errors regarding localhost): logs_1.zip

@joeyballentine
Member

Sorry, but could you try this again on tonight's nightly when it comes out? We had an issue where some things aren't logging properly, so the important logs that would tell us what's going wrong are currently being missed. Thanks.

@Andryshik345
Author

Andryshik345 commented Apr 9, 2024

Updated to the new nightly build on top of yesterday's; still the same error.
And just like yesterday:
captured just after the first error: logs.zip

captured after some time (it begins to spam other errors regarding localhost): logs_11.zip

I also just tried to delete everything and install it again, but no changes.

@joeyballentine
Member

Damn, whatever's going wrong still isn't logging. Are you sure you have TensorRT set up properly and added to your PATH env var?

And just to be sure, CUDA works fine for you?

@UffernKur

Here is the CUDA failure
Error

An error occurred in a onnx Upscale Image node:

[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1

Input values:
• Image: RGB Image 1074x1515
• Model: Value of type 'nodes.impl.onnx.model.OnnxGeneric'
• Tile Size: 256
• Separate Alpha: No

Stack Trace:
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\process.py", line 155, in run_node
    raw_output = node.run(context, *enforced_inputs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 124, in upscale_image_node
    return convenient_upscale(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\convenient_upscale.py", line 58, in convenient_upscale
    return upscale(img)
           ^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\image_op.py", line 18, in <lambda>
    return lambda i: np.clip(op(i), 0, 1)
                             ^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 128, in <lambda>
    lambda i: upscale(i, session, tile_size, change_shape, exact_size),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 51, in upscale
    return onnx_auto_split(img, session, change_shape=change_shape, tiler=tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\onnx\auto_split.py", line 103, in onnx_auto_split
    return auto_split(img, upscale, tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\auto_split.py", line 45, in auto_split
    return split(
           ^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\auto_split.py", line 174, in _max_split
    upscale_result = upscale(padded_tile.read_from(img), padded_tile)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\onnx\auto_split.py", line 84, in upscale
    output: np.ndarray = session.run([output_name], {input_name: lr_img})[0]
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Roaming\chaiNNer\python\python\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1

Here is the TensorRT failure

Error

An error occurred in a onnx Upscale Image node:

[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1

Input values:
• Image: RGB Image 1074x1515
• Model: Value of type 'nodes.impl.onnx.model.OnnxGeneric'
• Tile Size: 256
• Separate Alpha: No

Stack Trace:
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\process.py", line 155, in run_node
    raw_output = node.run(context, *enforced_inputs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 124, in upscale_image_node
    return convenient_upscale(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\convenient_upscale.py", line 58, in convenient_upscale
    return upscale(img)
           ^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\image_op.py", line 18, in <lambda>
    return lambda i: np.clip(op(i), 0, 1)
                             ^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 128, in <lambda>
    lambda i: upscale(i, session, tile_size, change_shape, exact_size),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 51, in upscale
    return onnx_auto_split(img, session, change_shape=change_shape, tiler=tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\onnx\auto_split.py", line 103, in onnx_auto_split
    return auto_split(img, upscale, tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\auto_split.py", line 45, in auto_split
    return split(
           ^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\auto_split.py", line 174, in _max_split
    upscale_result = upscale(padded_tile.read_from(img), padded_tile)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\onnx\auto_split.py", line 84, in upscale
    output: np.ndarray = session.run([output_name], {input_name: lr_img})[0]
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Roaming\chaiNNer\python\python\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1

@Andryshik345
Author

Andryshik345 commented Apr 9, 2024

Are you sure you have tensorrt set up properly and added to your path env var?

Well, I copied TensorRT's libraries to CUDA's /bin folder, which is on PATH. Though I can't check any samples like https://github.com/NVIDIA/TensorRT/tree/main/samples/sampleOnnxMNIST (I don't have Visual Studio installed atm).
I also forgot to install TensorRT's Python wheels at first, but installing them didn't change anything.

>python -m pip show tensorrt
Name: tensorrt
Version: 10.0.0b6
Summary: A high performance deep learning inference library
Home-page: https://developer.nvidia.com/tensorrt
Author: NVIDIA Corporation
Author-email:
License: Proprietary
Location: D:\upscale_software\chainner-nightly\python\python\Lib\site-packages
Requires:
Required-by:
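(Editorial aside: a quick way to verify the TensorRT runtime libraries are actually discoverable is to scan PATH for them from Python. This is a generic stdlib sketch; the DLL names vary by TensorRT version, e.g. nvinfer.dll in TensorRT 8/9 versus versioned names like nvinfer_10.dll in TensorRT 10, so adjust as needed.)

```python
import os

def find_on_path(filename):
    """Return the first PATH directory containing `filename`, or None."""
    for d in os.environ.get("PATH", "").split(os.pathsep):
        if d and os.path.isfile(os.path.join(d, filename)):
            return d
    return None

# Hypothetical check for TensorRT 10 core libraries on Windows;
# prints the containing directory if found, otherwise None:
for dll in ("nvinfer_10.dll", "nvonnxparser_10.dll"):
    print(dll, "->", find_on_path(dll))
```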

And just to be sure, CUDA works fine for you?

CUDA works.

Screenshot (image attached)

@joeyballentine
Member

Tensorrt's python wheels aren't used for onnx's tensorrt support. Anyway, I'll look into this more. Thanks for the updates

@zelenooki87

Does the image dimension matter in chaiNNer when creating a TensorRT model from ONNX?

Specifically, if I have, for example, 500 images whose dimensions vary slightly from one image to the next, will chaiNNer create a cached file for each resolution? That would take a considerable amount of time. If it doesn't create separate caches, that would be great!

@Andryshik345
Author

Andryshik345 commented Apr 23, 2024 via email

@joeyballentine
Member

If the image size varies, it might build a new model for each one, as it is not set up to use dynamic sizes. I tried to set that up in the past and was unable to get it to work.
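(Editorial aside: with static shapes, the natural structure is a cache keyed by the exact input shape, which is why every previously unseen resolution pays the engine-build cost once. A minimal illustrative sketch with hypothetical names, not chaiNNer's actual code:)

```python
from typing import Callable, Dict, Tuple

class EngineCache:
    """Cache one built 'engine' per exact input shape (H, W).

    With static shapes, every unseen resolution triggers a slow build;
    repeated resolutions reuse the cached engine."""
    def __init__(self, build: Callable[[Tuple[int, int]], object]):
        self._build = build
        self._engines: Dict[Tuple[int, int], object] = {}
        self.builds = 0  # counts slow-path engine builds

    def get(self, shape: Tuple[int, int]) -> object:
        if shape not in self._engines:
            self.builds += 1                      # slow path: build the engine
            self._engines[shape] = self._build(shape)
        return self._engines[shape]

cache = EngineCache(build=lambda s: f"engine{s}")
cache.get((1080, 1920))
cache.get((1080, 1920))   # cache hit, no rebuild
cache.get((1074, 1515))   # new resolution, rebuild
print(cache.builds)       # 2
```

A batch of 500 slightly-varying images therefore approaches 500 builds unless inputs are padded or resized to a small set of canonical shapes, or the engine is built with dynamic shape profiles.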

@zelenooki87

Yes, I assumed that. It works the same way in Selur's Hybrid program. But for video files, waiting isn't really a problem since they're mostly standard sizes. So if the engines stay cached on the hard drive, there's no need to wait again when the input file is the same resolution.

Anyway, we're eagerly awaiting the video super-resolution chaiNNer updates in one of the next major builds, and we're keeping our fingers crossed that everything works out as planned. Thanks for everything you've done so far.
