Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR #64

Open
dheerajmk opened this issue Apr 11, 2018 · 5 comments


@dheerajmk

I have implemented this ENet project on an NVIDIA Jetson TX2 with JetPack 3.0 (CUDA 8, cuDNN 5.1, Ubuntu 16.04). During training of the encoder, the error shown below occurs.
Some forums suggested running with "sudo", which I did, but the error remains. Please suggest how to resolve it.

My CMake summary is as follows:
Caffe Configuration Summary
-- General:
-- Version : 1.0.0-rc3
-- Git : 22d356c
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- Release CXX flags : -O3 -DNDEBUG -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
-- Debug CXX flags : -g -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
-- Build type : Release
-- BUILD_SHARED_LIBS : ON
-- BUILD_python : ON
-- BUILD_matlab : OFF
-- BUILD_docs : ON
-- CPU_ONLY : OFF
-- USE_OPENCV : ON
-- USE_LEVELDB : ON
-- USE_LMDB : ON
-- ALLOW_LMDB_NOLOCK : OFF
-- Dependencies:
-- BLAS : Yes (Atlas)
-- Boost : Yes (ver. 1.61)
-- glog : Yes
-- gflags : Yes
-- protobuf : Yes (ver. 3.1.0)
-- lmdb : Yes (ver. 0.9.17)
-- LevelDB : Yes (ver. 1.18)
-- Snappy : Yes (ver. 1.1.3)
-- OpenCV : Yes (ver. 2.4.13)
-- CUDA : Yes (ver. 8.0)
-- NVIDIA CUDA:
-- Target GPU(s) : Auto
-- GPU arch(s) : sm_62
-- cuDNN : Yes (ver. 5.1.10)
-- Python:
-- Interpreter : /usr/bin/python2.7 (ver. 2.7.12)
-- Libraries : /usr/lib/aarch64-linux-gnu/libpython2.7.so (ver 2.7.12)
-- NumPy : /usr/local/lib/python2.7/dist-packages/numpy/core/include (ver 1.14.2)
-- Documentaion:
-- Doxygen : /usr/bin/doxygen (1.8.11)
-- config_file : /home/nvidia/ENet/caffe-enet/.Doxyfile
-- Install:
-- Install path : /home/nvidia/ENet/caffe-enet/build/install
-- Configuring done
-- Generating done
-- Build files have been written to: /home/nvidia/ENet/caffe-enet/build

And the error during the encoder training stage is as follows:

I0411 10:37:39.844830 4349 layer_factory.hpp:77] Creating layer conv3_3_1
I0411 10:37:39.844874 4349 net.cpp:100] Creating Layer conv3_3_1
I0411 10:37:39.844895 4349 net.cpp:434] conv3_3_1 <- conv3_3_1_a
I0411 10:37:39.844918 4349 net.cpp:408] conv3_3_1 -> conv3_3_1
F0411 10:37:39.854919 4349 cudnn_conv_layer.cpp:53] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR
*** Check failure stack trace: ***
@ 0x7f98134718 google::LogMessage::Fail()
@ 0x7f98136614 google::LogMessage::SendToLog()
@ 0x7f98134290 google::LogMessage::Flush()
@ 0x7f98136eb4 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9840c988 caffe::CuDNNConvolutionLayer<>::LayerSetUp()
@ 0x7f98438774 caffe::Net<>::Init()
@ 0x7f98439ff0 caffe::Net<>::Net()
@ 0x7f9841b510 caffe::Solver<>::InitTestNets()
@ 0x7f9841bd84 caffe::Solver<>::Init()
@ 0x7f9841c034 caffe::Solver<>::Solver()
@ 0x7f98455c7c caffe::Creator_AdamSolver<>()
@ 0x40c3cc train()
@ 0x4093e0 main
@ 0x7f9777e8a0 __libc_start_main
Aborted (core dumped)

@vsuryamurthy

If you have not solved the problem yet, I suggest you check the following:
i) Check whether the GPU has enough memory; the image resolution or the batch size might be too large (see the sketch after this list).
ii) You might have to change the name of the last layer if you are using a different number of classes (this only applies if you are fine-tuning a pretrained network).
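A quick way to check point (i) on the TX2 is sketched below; the tegrastats location and the prototxt file name are assumptions, so adjust them for your own setup.

```sh
# The TX2 shares memory between CPU and GPU, so plain `free` already reflects GPU use.
free -m

# JetPack's tegrastats prints live memory/GPU utilisation; its location varies
# between releases (sometimes /usr/bin/tegrastats, sometimes the nvidia home dir).
sudo tegrastats

# If memory is tight, lower batch_size in your train/test prototxt.
# The file name here is only an example, not the exact ENet file.
grep -n "batch_size" prototxt/enet_train_encoder.prototxt
```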

@WellXiong

If you have tried everything and it still is not fixed, try `sudo rm -rf ~/.nv/`.

@ASONG0506

> If you have tried everything and it still is not fixed, try `sudo rm -rf ~/.nv/`.

That really worked for me, THX!

@yuxwind

yuxwind commented Feb 28, 2020

Actually, I was out of GPU memory. After killing some applications, the error was fixed.
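In case it helps others, here is a sketch of how to find and stop whatever is holding GPU memory on a desktop card (the TX2 has no nvidia-smi, so there you would use `free`/`top` as shown above):

```sh
# List processes currently holding GPU memory, then stop one of them.
nvidia-smi
kill <PID>   # replace <PID> with a process id from the nvidia-smi output
```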

@Xpangz

Xpangz commented Oct 29, 2020

When I was troubled by this problem, my final fix was simply to change the GPU ID.
Caffe's default GPU number is 0, so if your GPU 0 is occupied you had better change the ID, otherwise the 'Aborted (core dumped)' error can occur.
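For example, the stand-alone caffe tool lets you pick the device explicitly; the solver path below is only an illustration, not the exact ENet file name.

```sh
# Train on GPU 1 instead of the default GPU 0.
./caffe-enet/build/tools/caffe train \
    -solver prototxt/enet_solver_encoder.prototxt \
    -gpu 1
```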
