Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR #1212

Closed
jmaicas opened this issue May 2, 2021 · 2 comments
jmaicas commented May 2, 2021

Hi, I keep running into the same issue that I tried to explain in #1200 (comment). I am working on Ubuntu 18.04 with an NVIDIA Quadro RTX 4000.

After trying the Docker image and getting the same issue, I am back to using the DLC-GPU environment, and I still get the same problem when I try to train.

Right now I have gone back to installing CUDA 10.0 (added to the PATH in my .bashrc file), because that is the version used by the DLC-GPU environment when I install it in miniconda.
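For reference, the .bashrc entries for this usually look something like the lines below. The /usr/local/cuda-10.0 path is the default install location and an assumption here; adjust it if CUDA was installed elsewhere.

```shell
# Hypothetical ~/.bashrc entries for CUDA 10.0 in the default install location.
# PATH makes nvcc visible; LD_LIBRARY_PATH lets TensorFlow find the CUDA libraries.
export PATH=/usr/local/cuda-10.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH
```

After editing .bashrc, run `source ~/.bashrc` (or open a new terminal) and confirm with `nvcc -V`.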

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130

During training, I opted for allow_growth = True because I saw it suggested in this GitHub issue: tensorflow/tensorflow#24496

```python
deeplabcut.train_network(path_config_file, shuffle=1, allow_growth=True, gputouse=0, displayiters=50, saveiters=20000, maxiters=50000)
```

However, it still does not work. This is the output:

2021-05-02 22:16:22.447624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 7018 MB memory) -> physical GPU (device: 0, name: NVIDIA Quadro RTX 4000, pci bus id: 0000:2d:00.0, compute capability: 7.5)
2021-05-02 22:16:22.449495: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55d6db8d6fd0 executing computations on platform CUDA. Devices:
2021-05-02 22:16:22.449516: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): NVIDIA Quadro RTX 4000, Compute Capability 7.5
2021-05-02 22:16:29.122700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2021-05-02 22:16:29.122776: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-02 22:16:29.122783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990]      0 
2021-05-02 22:16:29.122790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0:   N 
2021-05-02 22:16:29.122871: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7018 MB memory) -> physical GPU (device: 0, name: NVIDIA Quadro RTX 4000, pci bus id: 0000:2d:00.0, compute capability: 7.5)
2021-05-02 22:16:37.593240: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-05-02 22:16:37.598426: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

I also get this:

UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node resnet_v1_50/conv1/Conv2D (defined at /home/jorge/miniconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/pose_net.py:124) ]]
	 [[node sigmoid_cross_entropy_loss/value (defined at /home/jorge/miniconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/pose_net.py:283) ]]

AlexEMG commented May 3, 2021

This is a TensorFlow installation issue: tensorflow/tensorflow#24828


jmaicas commented May 9, 2021

Thanks Alexander!

After going through the different options in the link you shared, this one worked for me:

Set the TF_FORCE_GPU_ALLOW_GROWTH environment variable to true. In your terminal, run:

$ export TF_FORCE_GPU_ALLOW_GROWTH=true

Curiously enough, setting allow_growth to True as a Python argument did not work for me, but setting it as a system environment variable did.

tensorflow/tensorflow#24828 (comment)
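For anyone who prefers to keep everything in Python rather than in the shell, the same environment variable can also be set from the script itself, as long as it is set before TensorFlow (and therefore deeplabcut) is imported. A minimal sketch using only the standard library:

```python
import os

# Equivalent of `export TF_FORCE_GPU_ALLOW_GROWTH=true`, set from Python.
# This must run before TensorFlow is imported, because TensorFlow reads
# the variable once during its GPU initialization.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# import deeplabcut  # import only after the variable is set
```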
