Error in torch2trt of inference segmentation.ipynb #1

Open
flow-dev opened this issue Apr 11, 2020 · 20 comments
Labels
good first issue Good for newcomers

Comments

@flow-dev

Thanks for sharing great code!

However, I get an error when converting the deeplabv3 models with torch2trt.
--> "inference segmentation.ipynb"

A backbone alone, such as resnet18, converts and runs without problems.
-->python3 inference_tensorrt.py

Which torch2trt installation method or jetpack version are you using?

My environment is a Jetson Nano with JetPack 4.2.

I installed torch2trt using "Option 2 - With plugins (experimental)" from the README (https://github.com/NVIDIA-AI-IOT/torch2trt):

Option 2 - With plugins (experimental)
To install with plugins to support some operations in PyTorch that are not natively supported with TensorRT, call the following

sudo apt-get install libprotobuf* protobuf-compiler ninja-build
git clone https://github.com/NVIDIA-AI-IOT/torch2trt
cd torch2trt
sudo python setup.py install --plugins

Error log of "inference_segmentation.ipynb"

$ jupyter nbconvert inference_segmentation.ipynb --to python
$ python3 inference_segmentation.py 
model: fcn_resnet50
Avg execution time (ms): 0.039
Traceback (most recent call last):
  File "inference_segmentation.py", line 109, in <module>
    model_trt = torch2trt(model_w, [x])
  File "/usr/local/lib/python3.6/dist-packages/torch2trt/torch2trt.py", line 377, in torch2trt
    outputs = module(*inputs)
  File "/home/hogehoge/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "inference_segmentation.py", line 47, in forward
    return self.model(x)['out']
  File "/home/hogehoge/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hogehoge/.local/lib/python3.6/site-packages/torchvision/models/segmentation/_utils.py", line 25, in forward
    x = F.interpolate(x, size=input_shape, mode='bilinear', align_corners=False)
  File "/usr/local/lib/python3.6/dist-packages/torch2trt/torch2trt.py", line 202, in wrapper
    converter['converter'](ctx)
  File "/usr/local/lib/python3.6/dist-packages/torch2trt/converters/interpolate/interpolate.py", line 35, in convert_interpolate
    plugin = get_interpolate_plugin(size=size, mode=mode, align_corners=align_corners)
  File "/usr/local/lib/python3.6/dist-packages/torch2trt/converters/interpolate/interpolate.py", line 11, in get_interpolate_plugin
    creator = [c for c in registry.plugin_creator_list if c.name == PLUGIN_NAME and c.plugin_namespace == 'torch2trt'][0]
IndexError: list index out of range
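
The `IndexError` arises because the list comprehension in `get_interpolate_plugin` finds no matching creator, so indexing `[0]` on the empty list fails. Below is a minimal sketch of a check that could be run before conversion. `find_plugin_creators` and `describe_registry` are hypothetical helpers, the stand-in creator names are made up, and the plugin name `'interpolate'` is an assumption about the value of `PLUGIN_NAME`, which the traceback does not show. Only `plugin_creator_list`, `name`, and `plugin_namespace` are taken from the traceback.

```python
from types import SimpleNamespace


def find_plugin_creators(creators, name, namespace="torch2trt"):
    """Return the creators matching name and namespace (may be empty)."""
    return [c for c in creators
            if c.name == name and c.plugin_namespace == namespace]


def describe_registry(creators):
    """List (namespace, name) pairs so a missing plugin is easy to spot."""
    return sorted((c.plugin_namespace, c.name) for c in creators)


# Stand-in for a registry in which the torch2trt plugin failed to load.
# These creator names are invented for illustration only.
fake_creators = [
    SimpleNamespace(name="SomeBuiltinPlugin", plugin_namespace=""),
    SimpleNamespace(name="AnotherBuiltinPlugin", plugin_namespace=""),
]

# 'interpolate' is an assumed plugin name, not confirmed by the traceback.
matches = find_plugin_creators(fake_creators, "interpolate")
if not matches:
    # This is the situation in which `[...][0]` raises IndexError.
    print("torch2trt interpolate plugin is not registered")
    print(describe_registry(fake_creators))
```

On a real installation the list would presumably come from `tensorrt.get_plugin_registry().plugin_creator_list`. An empty match there would suggest that the plugin library was not built or could not be loaded.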

"inference_tensorrt.py" runs without problems.

$ python3 inference_tensorrt.py 

Avg execution time (ms): 0.002
model: resnet18
Avg execution time (ms): 0.001
running fp16 models..
Avg execution time (ms): 0.001
running int8 models..
Avg execution time (ms): 0.001
Avg execution time (ms): 0.004
model: resnet34
Avg execution time (ms): 0.002
running fp16 models..
Avg execution time (ms): 0.002
running int8 models..
Avg execution time (ms): 0.001
Avg execution time (ms): 0.005

I would appreciate any advice.

@kentaroy47
Owner

Thanks for trying it out!
Your error occurs because the interpolation op is not included in your torch2trt build. Did you build torch2trt with the latest commit?
I see that FCN does not run, but does DeepLab run?

@flow-dev
Author

Thank you for your reply.

I installed the latest commit, and DeepLab does not run either.

> Your error comes from the interpolation not included in torch2trt. Did you build torch2trt with the latest commit?
> I see that FCN does not run, but Do deeplab run?

The interpolation problem seems to depend on the environment.
There does not seem to be a clear solution yet.
I am trying various things based on the issues below, but it is a difficult problem.

NVIDIA-AI-IOT/torch2trt#274
NVIDIA-AI-IOT/torch2trt#119

There is no problem in your code,
but I would like to know how you installed torch2trt on Jetson Nano and Xavier.

@luhang-HPU

> Thank you for your reply.
>
> I installed latest commit and deeplab could not run too.
>
> Your error comes from the interpolation not included in torch2trt. Did you build torch2trt with the latest commit?
> I see that FCN does not run, but Do deeplab run?
>
> The interpolation has environment-dependent problems.
> There seems to be no clear solution...
> I am trying various things referring to the issue, but it is a difficult problem.
>
> NVIDIA-AI-IOT/torch2trt#274
> NVIDIA-AI-IOT/torch2trt#119
>
> There is no problem in your code,
> but I would like to know where you installed torch2trt with jetson nano and xavier.

I am facing the same problem with the latest TensorRT 7 and torch2trt installed with plugins.
Any idea how to solve this?
Thanks for this wonderful project!

@kentaroy47 kentaroy47 added the good first issue Good for newcomers label Apr 12, 2020
@kentaroy47
Owner

kentaroy47 commented Apr 13, 2020

I only tested segmentation on Xavier, using JetPack 4.3, TensorRT 7, and the latest torch2trt with plugins.

I just tried running segmentation on Jetson Nano as well (JetPack 4.1, TensorRT 5), but I got stuck running the native PyTorch segmentation model. I will report back if I get this working.

The steps I followed to set up Xavier are as below:

  1. Install torchvision
    I followed these instructions and installed torchvision==0.3.0:
    https://medium.com/hackers-terminal/installing-pytorch-torchvision-on-nvidias-jetson-tx2-81591d03ce32
    sudo apt-get install libjpeg-dev zlib1g-dev
    git clone -b v0.3.0 https://github.com/pytorch/vision torchvision
    cd torchvision
    sudo python3 setup.py install
  2. Install torch2trt
    I followed the README:
    https://github.com/NVIDIA-AI-IOT/torch2trt
    sudo apt-get install libprotobuf* protobuf-compiler ninja-build
    git clone https://github.com/NVIDIA-AI-IOT/torch2trt
    cd torch2trt
    sudo python3 setup.py install --plugins

@kentaroy47
Owner

Actually, by following this setup, I was able to run the torch2trt conversion on Jetson Nano as well. Can you try building torchvision as above? I think that was the issue.

@flow-dev
Author

flow-dev commented Apr 13, 2020

Thank you for your reply.

The fact that it works on both Jetson Nano and Xavier is very valuable information!

I'm now trying on an Ubuntu 18.04 machine with a 2080 Ti,
but what I really need is for torch2trt to work on Jetson.
I will check both in parallel.

The problem seems to be in how libtorch2trt.so is built.
I ran the command below and found that libtorch2trt.so has undefined symbols.

ldd -r /usr/local/lib/python3.6/dist-packages/torch2trt/libtorch2trt.so

linux-vdso.so.1 (0x00007ffcbe5c6000)
	libc10.so => not found
	libc10_cuda.so => not found
	libtorch.so => not found
	libcudart.so.10.0 => /usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudart.so.10.0 (0x00007f0e800d2000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f0e7feb3000)
	libnvinfer.so.7 => /usr/lib/x86_64-linux-gnu/libnvinfer.so.7 (0x00007f0e72238000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f0e71eaf000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f0e71c97000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0e718a6000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f0e806e1000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f0e716a2000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f0e7149a000)
	libcudnn.so.7 => /usr/lib/x86_64-linux-gnu/libcudnn.so.7 (0x00007f0e59bcc000)
	libcublas.so.10.0 => /usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas.so.10.0 (0x00007f0e54789000)
	libmyelin.so.1 => /usr/lib/x86_64-linux-gnu/libmyelin.so.1 (0x00007f0e53f78000)
	libnvrtc.so.10.0 => /usr/local/cuda-10.0/targets/x86_64-linux/lib/libnvrtc.so.10.0 (0x00007f0e5295c000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f0e525be000)
undefined symbol: _ZN3c1019UndefinedTensorImpl10_singletonE	(/usr/local/lib/python3.6/dist-packages/torch2trt/libtorch2trt.so)
....

The problem is that the torch libraries libc10.so, libc10_cuda.so, and libtorch.so cannot be resolved.
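
A small sketch of how the unresolved entries could be pulled out of `ldd` output automatically. `unresolved_libraries` is a hypothetical helper, and the sample text is an excerpt of the output posted above.

```python
def unresolved_libraries(ldd_output):
    """Return library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        line = line.strip()
        if line.endswith("=> not found"):
            # Keep only the library name on the left of the arrow.
            missing.append(line.split("=>")[0].strip())
    return missing


# Excerpt of the ldd output from this comment.
sample = """\
linux-vdso.so.1 (0x00007ffcbe5c6000)
\tlibc10.so => not found
\tlibc10_cuda.so => not found
\tlibtorch.so => not found
\tlibcudart.so.10.0 => /usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudart.so.10.0 (0x00007f0e800d2000)
"""

print(unresolved_libraries(sample))
# prints ['libc10.so', 'libc10_cuda.so', 'libtorch.so']
```

In a typical pip installation these three libraries sit in the `lib` directory of the torch package, which `ldd` does not search by default, so this output alone may not be conclusive.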

This problem depends on the torch version and the g++ version.

The following issue is likely to be helpful.
I haven't solved it yet...

NVIDIA-AI-IOT/torch2trt#53

@flow-dev
Author

I read the article about installing torch that you pointed me to. You probably installed torch 1.1.0.
I'm using torch 1.4.0, so the difference seems to be important.

I plan to use torch 1.1.0 until torch2trt supports torch 1.4.0.

https://medium.com/hackers-terminal/installing-pytorch-torchvision-on-nvidias-jetson-tx2-81591d03ce32

@kentaroy47
Owner

Yes, I use torch 1.1.0 and torchvision 0.3.0 for Jetson Nano.
I think torch 1.4.0 is not fully supported by torch2trt yet.

For Xavier, I used 1.3.0 with the NVIDIA-built binaries.
https://forums.developer.nvidia.com/t/pytorch-for-jetson-nano-version-1-4-0-now-available/72048

wget https://nvidia.box.com/shared/static/phqe92v26cbhqjohwtvxorrwnmrnfx1o.whl -O torch-1.3.0-cp36-cp36m-linux_aarch64.whl
pip3 install numpy torch-1.3.0-cp36-cp36m-linux_aarch64.whl
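
As a rough sketch, the versions mentioned in this comment could be checked programmatically before attempting a conversion. `REPORTED_WORKING`, `base_version`, and `mismatches` are hypothetical names. The table records only what was reported in this comment and is not an official compatibility matrix.

```python
import re

# Combinations reported as working in this thread (not an official matrix).
REPORTED_WORKING = {
    "Jetson Nano": {"torch": "1.1.0", "torchvision": "0.3.0"},
    "Xavier": {"torch": "1.3.0"},
}


def base_version(version):
    """Strip build suffixes, e.g. '1.3.0a0+abc123' becomes '1.3.0'."""
    match = re.match(r"\d+\.\d+\.\d+", version)
    return match.group(0) if match else version


def mismatches(device, installed):
    """Map package -> (installed, reported) for every version that differs."""
    expected = REPORTED_WORKING[device]
    return {
        pkg: (installed.get(pkg), want)
        for pkg, want in expected.items()
        if base_version(installed.get(pkg, "")) != want
    }


# Example: torch 1.4.0 on a Nano differs from the reported 1.1.0.
print(mismatches("Jetson Nano", {"torch": "1.4.0", "torchvision": "0.3.0"}))
# prints {'torch': ('1.4.0', '1.1.0')}
```

On a device, the `installed` dictionary could be filled from `torch.__version__` and `torchvision.__version__`.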

@flow-dev
Author

Xavier worked with torch 1.3.0. This is great information.
I will rebuild my environment.

Thanks for the useful information and for your contribution!

@kentaroy47
Owner

@flow-dev
Please tell us if your Jetson Nano/Xavier works with the pinned torch version!

It may be useful to open an issue asking which torch versions work with torch2trt; that would help others.

@flow-dev
Author

@kentaroy47
That's a good suggestion. I will write this issue if I can confirm it.

@kentaroy47 kentaroy47 pinned this issue Apr 13, 2020
@luhang-HPU

luhang-HPU commented Apr 14, 2020

I have read all your feedback.
A quick update: I am also using PyTorch 1.4.0, with a Titan V on AMD64, not on an ARM platform. I think that may be the cause of this problem.

@kentaroy47
Owner

Thanks for the comments.
@flow-dev @hive-cas , did the model run for you guys by changing the torch version?

@flow-dev
Author

> Thanks for the comments.
> @flow-dev @hive-cas , did the model run for you guys by changing the torch version?

In my case, it does not work in the following environment.
There are likely other dependencies on AMD64.

Ubuntu 18.04, JetPack 4.3
2080 Ti
pytorch 1.4.0 -> cannot build
pytorch 1.3.0 -> cannot build
pytorch 1.1.0 -> cannot build

I am now trying to build on Jetson Nano.
It will take a little longer...

@kentaroy47
Owner

@flow-dev
Thanks for the updates.
Can't you simply pip install the NVIDIA-built torch wheels? (The link has PyTorch 1.0 to 1.4.)
https://forums.developer.nvidia.com/t/pytorch-for-jetson-nano-version-1-4-0-now-available/72048

@flow-dev
Author

> @flow-dev
> Thanks for the updates.
> You can simply pip install the Nvidia build torch? (there is pytorch1.0-1.4 in the link)
> https://forums.developer.nvidia.com/t/pytorch-for-jetson-nano-version-1-4-0-now-available/72048

Yes, I was able to simply pip install it.

@luhang-HPU

@flow-dev @kentaroy47
I also changed the PyTorch version from 1.4 to 1.2 and 1.1, and none of them worked.
I use a Titan V on Ubuntu 18.04 LTS with CUDA 10.2 and TensorRT 7, and the latest commit of torch2trt. My goal is not to use it on the Nano, but on my amd64 server.

@kentaroy47
Owner

@hive-cas
Hmm. Have you discussed this with the maintainers in the torch2trt repo? Reporting the error message there would help greatly.
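
When reporting upstream, it may help to attach the environment details alongside the error message. The following is a minimal sketch of a collector. `collect_environment` is a hypothetical helper, and the default package list is only an assumption about what is relevant here.

```python
import importlib
import platform
import sys


def collect_environment(packages=("torch", "torchvision", "tensorrt", "torch2trt")):
    """Gather version facts worth pasting into a bug report."""
    report = {
        "python": sys.version.split()[0],
        "machine": platform.machine(),  # e.g. x86_64 or aarch64
        "system": platform.platform(),
    }
    for name in packages:
        try:
            module = importlib.import_module(name)
            # Not every package defines __version__, so fall back to 'unknown'.
            report[name] = getattr(module, "__version__", "unknown")
        except Exception as exc:  # ImportError or a shared-library load failure
            report[name] = "import failed: {}".format(exc)
    return report


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print("{}: {}".format(key, value))
```

The import failure text is kept in the report because a shared-library load error can itself be a useful clue.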

@flow-dev
Author

> @kentaroy47
> That's a good suggestion. I will write this issue if I can confirm it.

@kentaroy47
I set up the same environment as yours,
and I was able to run it on Jetson Nano.
Thank you very much!
(But you have to install everything in exactly the same way. For details, we need to wait for an official response.)

@kentaroy47
Owner

It is really weird that linking fails on amd64 Ubuntu 18.04 servers, since they should behave the same as the Xavier hardware.
Can you post the link to the torch2trt issue so that others can help themselves if they hit the same error? Thanks!
