Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR #64

Open
dheerajmk opened this issue Apr 11, 2018 · 5 comments


@dheerajmk

I have implemented this ENet project on an NVIDIA Jetson TX2 with JetPack 3.0 (CUDA 8, cuDNN 5.1, Ubuntu 16.04). During training of the encoder, the error shown below occurs.
Some forums suggested running with "sudo", which I did, but the error remains. Please suggest how to resolve it.

My CMake summary is as follows:
Caffe Configuration Summary
-- General:
-- Version : 1.0.0-rc3
-- Git : 22d356c
-- System : Linux
-- C++ compiler : /usr/bin/c++
-- Release CXX flags : -O3 -DNDEBUG -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
-- Debug CXX flags : -g -fPIC -Wall -Wno-sign-compare -Wno-uninitialized
-- Build type : Release
-- BUILD_SHARED_LIBS : ON
-- BUILD_python : ON
-- BUILD_matlab : OFF
-- BUILD_docs : ON
-- CPU_ONLY : OFF
-- USE_OPENCV : ON
-- USE_LEVELDB : ON
-- USE_LMDB : ON
-- ALLOW_LMDB_NOLOCK : OFF
-- Dependencies:
-- BLAS : Yes (Atlas)
-- Boost : Yes (ver. 1.61)
-- glog : Yes
-- gflags : Yes
-- protobuf : Yes (ver. 3.1.0)
-- lmdb : Yes (ver. 0.9.17)
-- LevelDB : Yes (ver. 1.18)
-- Snappy : Yes (ver. 1.1.3)
-- OpenCV : Yes (ver. 2.4.13)
-- CUDA : Yes (ver. 8.0)
-- NVIDIA CUDA:
-- Target GPU(s) : Auto
-- GPU arch(s) : sm_62
-- cuDNN : Yes (ver. 5.1.10)
-- Python:
-- Interpreter : /usr/bin/python2.7 (ver. 2.7.12)
-- Libraries : /usr/lib/aarch64-linux-gnu/libpython2.7.so (ver 2.7.12)
-- NumPy : /usr/local/lib/python2.7/dist-packages/numpy/core/include (ver 1.14.2)
-- Documentaion:
-- Doxygen : /usr/bin/doxygen (1.8.11)
-- config_file : /home/nvidia/ENet/caffe-enet/.Doxyfile
-- Install:
-- Install path : /home/nvidia/ENet/caffe-enet/build/install
-- Configuring done
-- Generating done
-- Build files have been written to: /home/nvidia/ENet/caffe-enet/build

And the error during the encoder training stage is as follows:

I0411 10:37:39.844830 4349 layer_factory.hpp:77] Creating layer conv3_3_1
I0411 10:37:39.844874 4349 net.cpp:100] Creating Layer conv3_3_1
I0411 10:37:39.844895 4349 net.cpp:434] conv3_3_1 <- conv3_3_1_a
I0411 10:37:39.844918 4349 net.cpp:408] conv3_3_1 -> conv3_3_1
F0411 10:37:39.854919 4349 cudnn_conv_layer.cpp:53] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR
*** Check failure stack trace: ***
@ 0x7f98134718 google::LogMessage::Fail()
@ 0x7f98136614 google::LogMessage::SendToLog()
@ 0x7f98134290 google::LogMessage::Flush()
@ 0x7f98136eb4 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9840c988 caffe::CuDNNConvolutionLayer<>::LayerSetUp()
@ 0x7f98438774 caffe::Net<>::Init()
@ 0x7f98439ff0 caffe::Net<>::Net()
@ 0x7f9841b510 caffe::Solver<>::InitTestNets()
@ 0x7f9841bd84 caffe::Solver<>::Init()
@ 0x7f9841c034 caffe::Solver<>::Solver()
@ 0x7f98455c7c caffe::Creator_AdamSolver<>()
@ 0x40c3cc train()
@ 0x4093e0 main
@ 0x7f9777e8a0 __libc_start_main
Aborted (core dumped)

@vsuryamurthy

If you have not solved the problem yet, I suggest you check the following:
i) Check whether the GPU has enough memory; the image resolution or the batch size might be too large (see the sketch after this list).
ii) You might have to change the name of the last layer if you are using a different number of classes (this only applies if you are fine-tuning a pretrained network).
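A quick way to check point (i) on the TX2 is sketched below; the tegrastats location and the prototxt file name are assumptions, so adjust them for your own setup.

```sh
# The TX2 shares memory between CPU and GPU, so plain `free` already reflects GPU use.
free -m

# JetPack's tegrastats prints live memory/GPU utilisation; its location varies
# between releases (sometimes /usr/bin/tegrastats, sometimes the nvidia home dir).
sudo tegrastats

# If memory is tight, lower batch_size in your train/test prototxt.
# The file name here is only an example, not the exact ENet file.
grep -n "batch_size" prototxt/enet_train_encoder.prototxt
```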

@WellXiong

If you have tried everything and it still is not fixed, try `sudo rm -rf ~/.nv/`.

@ASONG0506

> If you have tried everything and it still is not fixed, try `sudo rm -rf ~/.nv/`.

That really worked for me, THX!

@yuxwind

yuxwind commented Feb 28, 2020

Actually, I was out of GPU memory. After killing some applications, the error was fixed.
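In case it helps others, here is a sketch of how to find and stop whatever is holding GPU memory on a desktop card (the TX2 has no nvidia-smi, so there you would use `free`/`top` as shown above):

```sh
# List processes currently holding GPU memory, then stop one of them.
nvidia-smi
kill <PID>   # replace <PID> with a process id from the nvidia-smi output
```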

@Xpangz

Xpangz commented Oct 29, 2020

When I was troubled by this problem, my final fix was simply to change the GPU ID.
Caffe's default GPU number is 0, so if your GPU 0 is occupied you had better change the ID, otherwise the 'Aborted (core dumped)' error can occur.
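For example, the stand-alone caffe tool lets you pick the device explicitly; the solver path below is only an illustration, not the exact ENet file name.

```sh
# Train on GPU 1 instead of the default GPU 0.
./caffe-enet/build/tools/caffe train \
    -solver prototxt/enet_solver_encoder.prototxt \
    -gpu 1
```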
