[bug] Failed to copy to symbol host-to-device: invalid device symbol #192

Open
anand97 opened this issue Jul 18, 2022 · 9 comments

@anand97

anand97 commented Jul 18, 2022

I am able to compile and run the CCTag detection application with no issues on CPU on a Jetson Nano, with the sample images in the repository. On the other hand, when I pass the --use-cuda flag, I get this error:

`You called ./build/Linux-aarch64/detection with:
--input sample/02.png
--nbrings 3
--bank
--params
--output
--parallel 1
--use-cuda

******************* Image mode **********************
Creating TagPipe 0
Initializing TagPipe 0
/home/dozer/git/CCTag/src/./cctag/cuda/frame_02_gaussian.cu:144
Failed to copy to symbol host-to-device: invalid device symbol
src ptr=7faa0113a0
dst ptr=7faa4b73e8
`

CCTag is being built with CUDA support, and CMake was able to find the appropriate CUDA libraries on my Nano. I am running CUDA 10.2.300. Please let me know if you need any additional information to debug this issue!

@anand97 anand97 changed the title Failed to copy to symbol host-to-device: invalid device symbol [bug] Failed to copy to symbol host-to-device: invalid device symbol Jul 18, 2022
@simogasp
Member

simogasp commented Jul 19, 2022

I think the problem might come from the missing architecture flags for the Jetson Nano.
It requires arch=compute_53,code=sm_53 (see https://forums.developer.nvidia.com/t/jetson-nano-running-openpose-example-gives-a-cuda-check-failed/77196/3), but that architecture is not in our list:
https://github.com/alicevision/CCTag/blob/develop/CMakeLists.txt#L166

You can try adding it in the CMake file (just add 5.3 to the list), but I'm not sure the code is compatible.

@griwodz would know better. We should also update the list of compatible architectures with respect to the CUDA version.
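
To check which architecture flag a given board needs, the compute capability can be read from the CUDA runtime. The following is a minimal sketch, not part of CCTag: it uses only `cudaGetDeviceCount` and `cudaGetDeviceProperties`, and the helper `archNumber` is a hypothetical name introduced for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: turn major/minor into the number used in
// flags such as compute_53 / sm_53.
static int archNumber(int major, int minor) { return major * 10 + minor; }

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
            return 1;
        }
        const int arch = archNumber(prop.major, prop.minor);
        // Print the device name and the matching -gencode style flag.
        std::printf("device %d: %s, compute capability %d.%d "
                    "(arch=compute_%d,code=sm_%d)\n",
                    dev, prop.name, prop.major, prop.minor, arch, arch);
    }
    return 0;
}
```

If the value printed for the board is missing from the architectures the project is built for, the binary may contain no device code that the GPU can load, which would be consistent with the "invalid device symbol" failure reported above.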

@simogasp simogasp added the cuda label Jul 19, 2022
@anand97
Author

anand97 commented Jul 20, 2022

Thanks for your reply. I tried adding the 5.3 compute capability specifier to the list and did a clean rebuild of the repository. I now get a different error, though, which seems to be related to running out of memory. I'm not sure whether this is the kind of error we would expect if the code did not support this architecture:
`
You called ./Linux-aarch64/detection with:
--input ../sample/01.png
--nbrings 3
--bank
--params
--output
--parallel 1
--use-cuda

******************* Image mode **********************
Creating TagPipe 0
Initializing TagPipe 0
Loading image 0 into TagPipe 0
terminate called after throwing an instance of 'thrust::system::system_error'
what(): copy_if failed on 2nd step: cudaErrorLaunchOutOfResources: too many resources requested for launch
Aborted (core dumped)
`
I was running tegrastats on the side to look at RAM utilisation, and I had about 2.5 GB of available system memory (which is shared with the GPU on a Nano) throughout the execution. Please let me know if you have any other ideas that I can try out, thanks!

@simogasp
Member

Memory could very well be the issue.
You can try cropping that image around one of the CCTags and processing the crop. A smaller image requires less memory.
I suggest cropping because if you scale the image down too much, the CCTag becomes too small to be detected.
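
Whether device memory is actually exhausted can be checked from inside a CUDA program. Below is a minimal sketch using `cudaMemGetInfo`; it is not CCTag code, and the helper `toMiB` is a hypothetical name. Note that the CUDA runtime documentation describes `cudaErrorLaunchOutOfResources` as a kernel launch problem, typically too many threads for the kernel's register count, so that error can occur even when plenty of memory is free.

```cuda
#include <cstddef>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: convert a byte count to MiB for printing.
static double toMiB(size_t bytes) { return bytes / (1024.0 * 1024.0); }

int main() {
    size_t freeBytes = 0;
    size_t totalBytes = 0;
    // Query free and total memory as seen by the current CUDA device.
    cudaError_t err = cudaMemGetInfo(&freeBytes, &totalBytes);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemGetInfo: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("device memory: %.1f MiB free of %.1f MiB total\n",
                toMiB(freeBytes), toMiB(totalBytes));
    return 0;
}
```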

@anand97
Author

anand97 commented Jul 20, 2022

Hello! Thanks again for your suggestion and reply. I tried cropping the sample image down to 260x260 px with one tag fully visible in the center, but I still receive a similar error.
`You called ./Linux-aarch64/detection with:
--input ../samples/02_crop.png
--nbrings 3
--bank
--params
--output
--parallel 1
--use-cuda

******************* Image mode **********************
Creating TagPipe 0
Initializing TagPipe 0
Loading image 0 into TagPipe 0
terminate called after throwing an instance of 'terminate called recursively
terminate called recursively
Aborted (core dumped)
`

For an image size of 640x640 px, I get the same cudaErrorLaunchOutOfResources: too many resources requested for launch error.
I'm not familiar with CUDA programming but am pretty well versed in C++, so I can try out any code changes you suggest. Happy to try anything else as well.

@anand97
Author

anand97 commented Jul 21, 2022

Some more information: when running with the --sync option, I sometimes get this message:
~/git/CCTag/src/cctag/cuda/debug_macros.cpp:27
called from ~/git/CCTag/src/./cctag/cuda/frame_07c_eval.cu:243
cudaGetLastError failed: invalid configuration argument
terminate called recursively

@simogasp
Member

I'm not an expert in CUDA either.
The CUDA code was optimized to run in real time on GTX cards.
It is possible that some of the launch configurations (block sizes, number of threads, etc.) are not supported on the Nano.
Looking at this, for example:
https://stackoverflow.com/questions/16125389/invalid-configuration-argument-error-for-the-call-of-cuda-kernel
it seems that the error you are getting with the --sync option might refer to that (an unsupported configuration).
We have to wait for @griwodz for confirmation.
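
One way to test the launch-configuration hypothesis is to ask the runtime how many threads per block a kernel can actually use on the current device, then check for errors right after the launch. The sketch below assumes a hypothetical kernel `scaleKernel` and a hypothetical helper `blockFits`; neither is part of CCTag, and the 1024-thread block size is only an example value.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel (not from CCTag): scales an array in place.
__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// True if a block of `requested` threads does not exceed the per-kernel limit.
static bool blockFits(int requested, int kernelMaxThreads) {
    return requested > 0 && requested <= kernelMaxThreads;
}

int main() {
    // Per-kernel limits on the current device; they depend on register usage.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scaleKernel);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFuncGetAttributes: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("registers per thread: %d, max threads per block: %d\n",
                attr.numRegs, attr.maxThreadsPerBlock);

    const int n = 1 << 16;
    const int requested = 1024;  // example block size (an assumption)
    if (!blockFits(requested, attr.maxThreadsPerBlock)) {
        std::printf("a %d-thread block would be rejected for this kernel\n", requested);
        return 0;
    }

    float* d = nullptr;
    err = cudaMalloc(&d, n * sizeof(float));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemset(d, 0, n * sizeof(float));

    scaleKernel<<<(n + requested - 1) / requested, requested>>>(d, n, 2.0f);
    err = cudaGetLastError();                               // launch-time errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    std::printf("launch result: %s\n", cudaGetErrorString(err));

    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

Because `maxThreadsPerBlock` in `cudaFuncAttributes` is computed per kernel and per device, a block size that works on a desktop GPU can be rejected on a smaller device for the same kernel.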

@anand97
Author

anand97 commented Jul 21, 2022

Hello @simogasp, thanks for your reply. I'm happy to report that I seem to have solved the problem.
As you mentioned, the block size seems to have been the issue. Based on another issue, #170 (comment), and https://www.wikiwand.com/en/CUDA#/Version_features_and_specifications, it seems that the Jetson Nano supports only 16 'grids per resident device', so I changed the block size everywhere it is set in frame_07c_eval.cu. The code then compiled and ran perfectly at about 0.9 s per frame, even at the full resolution of the sample files. I can attach a diff for posterity if you'd like to incorporate it into the framework. Thanks again for all your help!

Note: I suspect the same change is needed in the other CUDA files as well, but changing just this file fixed it for me.

@simogasp
Member

Thanks for testing that.
I don't know if it is doable, but it would be nice if we could set these parameters according to the architecture. It's nice to hear that it can work even on the Jetson Nano with decent performance. We should definitely find a way to enable that at compile time.
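
One possible way to avoid hard-coded block sizes is to let the runtime suggest one per device with `cudaOccupancyMaxPotentialBlockSize`. This is a runtime choice rather than the compile-time parametrization discussed above, and it is only a sketch: `scaleKernel` and `ceilDiv` are hypothetical names, not CCTag code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in kernel (not from CCTag): scales an array in place.
__global__ void scaleKernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: integer division rounded up.
static int ceilDiv(int a, int b) { return (a + b - 1) / b; }

int main() {
    // Ask the runtime for a block size that this kernel can use on this device.
    int minGridSize = 0;
    int blockSize = 0;
    cudaError_t err = cudaOccupancyMaxPotentialBlockSize(
        &minGridSize, &blockSize, scaleKernel, 0, 0);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "occupancy query: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("suggested block size: %d\n", blockSize);

    const int n = 1 << 20;
    float* d = nullptr;
    err = cudaMalloc(&d, n * sizeof(float));
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return 1;
    }
    cudaMemset(d, 0, n * sizeof(float));

    // Size the grid from the suggested block size instead of a constant.
    scaleKernel<<<ceilDiv(n, blockSize), blockSize>>>(d, n, 0.5f);
    err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    std::printf("launch result: %s\n", cudaGetErrorString(err));

    cudaFree(d);
    return err == cudaSuccess ? 0 : 1;
}
```

Kernels that assume a fixed block shape, for example through shared-memory array sizes, would need more than this change, so the approach may not apply everywhere.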

@github-actions

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.


2 participants