
Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR #24496

Closed
michaelmyc opened this issue Dec 21, 2018 · 186 comments
Labels: comp:gpu, stale, stat:awaiting response, TF 2.0, type:bug

Comments

@michaelmyc

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes and No (described below)
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Manjaro
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): tf-nightly-gpu (Dec 19, r1.13)
  • TensorFlow version (use command below): 1.13.0-dev20181219
  • Python version: 3.7.1
  • Bazel version (if compiling from source):
  • GCC/Compiler version (if compiling from source):
  • CUDA/cuDNN version: CUDA 10 with cuDNN 7.4.1
  • GPU model and memory: RTX 2070 8GB

Describe the current behavior
I'm running a CNN model on MNIST. When running on the GPU, I encounter:
2018-12-20 20:09:13.644176: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

I did some digging and realized that it is a memory issue (which shouldn't be the case, as I have 32GB of RAM and 64GB of swap). I ran htop while running the model and had 20+GB free, which is more than enough to fit the 8GB of VRAM mappings.

Using gpu_options.allow_growth = True gets the model to work properly, and setting os.environ['CUDA_VISIBLE_DEVICES'] = '-1' also works. This means that I AM facing a memory issue, but I don't see how.
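
For reference, a minimal sketch of the two workarounds (TF 1.x API; the same options appear in the reproduction code below):

import os
import tensorflow as tf

# Workaround 1: allocate GPU memory on demand instead of grabbing
# almost all of it up front
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

# Workaround 2: hide the GPU entirely so TensorFlow falls back to the CPU
# (must be set before TensorFlow initializes its devices)
# os.environ['CUDA_VISIBLE_DEVICES'] = '-1'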

Also, using gpu_options.allow_growth = True does not fix the same issue when trying to run the tensorflow/models/official/mnist/ model, which should behave similarly to my code.

Code to reproduce the issue

import os
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import math
import time
# Killing optional CPU driver warnings
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
# os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
tf.logging.set_verbosity(tf.logging.ERROR)


class Model:

    def __init__(self, image, label):
        """
        A Model class contains a computational graph that classifies images
        to predictions. Each of its methods builds part of the graph
        on Model initialization. Do not modify the constructor, as doing so
        would break the autograder. You may, however, add class variables
        to use in your graph-building, e.g. the learning rate.

        image: the input image to the computational graph as a tensor
        label: the correct label of an image as a tensor
        prediction: the output prediction of the computational graph,
                    produced by self.forward_pass()
        optimize: the model's optimizing tensor produced by self.optimizer()
        loss: the model's loss produced by computing self.loss_function()
        accuracy: the model's prediction accuracy
        """
        self.image = image
        self.label = label

        # TO-DO: Add any class variables you want to use.

        self.prediction = self.forward_pass()
        self.loss = self.loss_function()
        self.optimize = self.optimizer()
        self.accuracy = self.accuracy_function()

    def forward_pass(self):
        """
        Predicts a label given an image using convolution layers

        :return: the prediction as a tensor
        """
        filter_1 = tf.Variable(tf.truncated_normal([3, 3, 1, 8], stddev=0.1))
        conv_1 = tf.nn.conv2d(self.image, filter_1, [1, 1, 1, 1], "SAME")

        reshaped = tf.reshape(conv_1, shape=[50, -1])

        L1 = reshaped.shape[1].value
        L2 = 500
        W1 = tf.Variable(tf.random_normal([L1, L2], mean=0, stddev=0.01))
        b1 = tf.Variable(tf.random_normal([L2], mean=0, stddev=0.01))
        relu_1 = tf.nn.relu(tf.matmul(reshaped, W1) + b1)

        W2 = tf.Variable(tf.random_normal([L2, 10], mean=0, stddev=0.01))
        b2 = tf.Variable(tf.random_normal([10], mean=0, stddev=0.01))
        logits = tf.nn.relu(tf.matmul(relu_1, W2) + b2)
        return logits

    def loss_function(self):
        """
        Calculates the model cross-entropy loss

        :return: the loss of the model as a tensor
        """
        loss = tf.losses.softmax_cross_entropy(onehot_labels=self.label, logits=self.prediction)
        return loss

    def optimizer(self):
        """
        Optimizes the model loss using a gradient descent optimizer

        :return: the optimizer as a tensor
        """
        learning_rate = 0.1
        sgd = tf.train.GradientDescentOptimizer(learning_rate)
        train = sgd.minimize(self.loss)
        return train

    def accuracy_function(self):
        """
        Calculates the model's prediction accuracy by comparing
        predictions to correct labels – no need to modify this

        :return: the accuracy of the model as a tensor
        """
        correct_prediction = tf.equal(tf.argmax(self.prediction, 1),
                                      tf.argmax(self.label, 1))
        return tf.reduce_mean(tf.cast(correct_prediction, tf.float32))


def main():
    t_start = time.time()

    mnist = input_data.read_data_sets("data/mnist/", one_hot=True)
    batch_sz = 50
    batch = 2000

    inputs = tf.placeholder(shape=[batch_sz, 28, 28, 1], dtype=tf.float32)
    labels = tf.placeholder(shape=[batch_sz, 10], dtype=tf.float32)

    model = Model(inputs, labels)

    session_config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
    sess = tf.Session(config=session_config)

    # sess = tf.Session()

    sess.run(tf.global_variables_initializer())
    for i in range(batch):
        next_image, next_label = mnist.train.next_batch(batch_sz)
        next_image = next_image.reshape((batch_sz, 28, 28, 1))
        sess.run(model.optimize, feed_dict={inputs: next_image, labels: next_label})

    acc, test_images, test_labels = 0, mnist.test.images, mnist.test.labels
    test_batch = math.ceil(len(test_images) / batch_sz)
    for i in range(test_batch):
        batch_images = test_images[i * batch_sz: (i + 1) * batch_sz]
        batch_images = batch_images.reshape((batch_sz, 28, 28, 1))
        batch_labels = test_labels[i * batch_sz: (i + 1) * batch_sz]
        acc += sess.run(model.accuracy, feed_dict={inputs: batch_images, labels: batch_labels})
    acc /= test_batch
    print(acc)

    print(time.time() - t_start, 'seconds')

    return


if __name__ == '__main__':
    main()
@va-andrew

I've been running into the same issue with the same GPU: "CUDNN_STATUS_INTERNAL_ERROR".

RTX 2070 GPU
CUDA 10
cuDNN 7.4.2
Ubuntu 18.04
tf-nightly-gpu (r1.13, Jan 13)
Python 3.6.7

2019-01-15 05:01:03.503415: I tensorflow/stream_executor/platform/default/dso_loader.cc:154] successfully opened CUDA library libcublas.so.10.0 locally
2019-01-15 05:01:03.752563: I tensorflow/stream_executor/platform/default/dso_loader.cc:154] successfully opened CUDA library libcudnn.so.7 locally
2019-01-15 05:01:04.905618: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-01-15 05:01:04.908147: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-01-15 05:01:04.908191: W tensorflow/core/framework/op_kernel.cc:1412] OP_REQUIRES failed at conv_ops_fused.cc:801 : Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

@dennisjay

dennisjay commented Jan 17, 2019

I have the same problem, running on:

RTX2080 GPU
CUDA 10
cudnn 7.4.2

I tried the following TF versions: tf-nightly-gpu and a self-compiled version from master (060b6e3).
I found out that it's possible to set the following environment variables to get further debug info:

CUDNN_LOGINFO_DBG=1;
CUDNN_LOGDEST_DBG=stdout
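
(From Python, a minimal sketch would be to set them before TensorFlow loads cuDNN, assuming you launch your script directly rather than exporting them in the shell:)

import os

# cuDNN reads these variables when the library is loaded,
# so set them before importing TensorFlow
os.environ['CUDNN_LOGINFO_DBG'] = '1'
os.environ['CUDNN_LOGDEST_DBG'] = 'stdout'

import tensorflow as tf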

Then I get the following error:

I0117 14:11:24.441819 140433563125568 basic_session_run_hooks.py:594] Saving checkpoints for 0 into /tmp/mnist/model.ckpt.
2019-01-17 14:11:25.916269: I tensorflow/stream_executor/platform/default/dso_loader.cc:154] successfully opened CUDA library libcublas.so.10.0 locally

I! CuDNN (v7402) function cudnnCreate() called:
i! Time: 2019-01-17T14:11:26.079184 (0d+0h+0m+0s since start)
i! Process=29255; Thread=29356; GPU=NULL; Handle=NULL; StreamId=NULL.

2019-01-17 14:11:26.079151: I tensorflow/stream_executor/platform/default/dso_loader.cc:154] successfully opened CUDA library libcudnn.so.7 locally

I! CuDNN (v7402) function cudnnCreate() called:
i! Time: 2019-01-17T14:11:26.571897 (0d+0h+0m+0s since start)
i! Process=29255; Thread=29356; GPU=NULL; Handle=NULL; StreamId=NULL.

2019-01-17 14:11:26.571858: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-01-17 14:11:26.579375: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

I! CuDNN (v7402) function cudnnCreate() called:
i! Time: 2019-01-17T14:11:26.579803 (0d+0h+0m+0s since start)
i! Process=29255; Thread=29356; GPU=NULL; Handle=NULL; StreamId=NULL.

2019-01-17 14:11:26.585818: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-01-17 14:11:26.585850: W ./tensorflow/stream_executor/stream.h:2109] attempting to perform DNN operation using StreamExecutor without DNN support
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1320, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1408, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
  [[{{node Discriminator_1/Conv/Conv2D}}]]
  [[train/discriminator_train/train_op/control_dependency/_569]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 151, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 147, in main
    get_hooks_fn=tfgan.get_joint_train_hooks())
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/gan/python/train.py", line 1200, in gan_train
    config=config)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/training/python/training/training.py", line 546, in train
    loss = session.run(train_op, run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 693, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1188, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1287, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1272, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1336, in run
    feed_dict, options)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/monitored_session.py", line 1362, in _call_hook_before_run
    request = hook.before_run(run_context)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/gan/python/train.py", line 1061, in before_run
    run_context.session.run(self._train_ops)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 930, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1153, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1329, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1349, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
  [[node Discriminator_1/Conv/Conv2D (defined at /home/dj/projects/gan/tf_models/research/gan/mnist/networks.py:152) ]]
  [[train/discriminator_train/train_op/control_dependency/_569]]

Errors may have originated from an input operation.
Input Source operations connected to node Discriminator_1/Conv/Conv2D:
inputs/batch/n (defined at /home/dj/projects/gan/tf_models/research/gan/mnist/data_provider.py:67)

Original stack trace for 'Discriminator_1/Conv/Conv2D':
  File "/home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 151, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/dj/projects/gan/tf_models/research/gan/mnist/train.py", line 87, in main
    [FLAGS.batch_size, FLAGS.noise_dims]))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/gan/python/train.py", line 118, in gan_model
    discriminator_real_outputs = discriminator_fn(real_data, generator_inputs)
  File "/home/dj/projects/gan/tf_models/research/gan/mnist/networks.py", line 176, in unconditional_discriminator
    net = _discriminator_helper(img, False, None, weight_decay)
  File "/home/dj/projects/gan/tf_models/research/gan/mnist/networks.py", line 152, in _discriminator_helper
    net = layers.conv2d(img, 64, [4, 4], stride=2)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1155, in convolution2d
    conv_dims=2)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1058, in convolution
    outputs = layer.apply(inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 1228, in apply
    return self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/layers/base.py", line 531, in call
    outputs = super(Layer, self).call(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 564, in call
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/layers/convolutional.py", line 196, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 966, in call
    return self.conv_op(inp, filter)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 591, in call
    return self.call(inp, filter)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 208, in call
    name=self.name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/nn_ops.py", line 1578, in conv2d
    name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_nn_ops.py", line 1040, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 501, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

Any ideas, anybody? I'm about to reinstall my complete environment :-(

@michaelmyc
Author

Try compiling r1.13 from source. It will take a long time, but it should fix your problem. At least it fixed mine.

@va-andrew

I did try compiling from source, but I ran into the same issue. I was finally able to fix my problem by setting config.gpu_options.allow_growth = True.

@nickovs

nickovs commented Jan 22, 2019

I've been having the same issue (on an RTX 2060, Ubuntu 18.04, Python 3.6.7, CUDA 10.0.130, cuDNN 7.4.2, Tensorflow 1.13.0-rc0 from source). Thanks to @va-andrew's suggestion I have it working with the allow_growth option set.

FWIW, in the course of searching for solutions it seems that this is a common problem with the RTX series (although it might be a general problem with CUDA 10.0, since the new cards don't support the older versions). It would be great if the defaults could be updated in the 1.13 release so that special options don't need to be set for these cards.

@newhouseb

Chiming in to say I also experienced this under the following configuration:

Tensorflow Docker GPU containers with stable releases of everything don't work either (they straight up segfault rather than report CUDNN_STATUS_INTERNAL_ERROR).

Curiously, things work fine on Windows 10 with Tensorflow v1.12!

And as others have reported, setting allow_growth allows things to run properly.

@nkdsoft

nkdsoft commented Jan 29, 2019

Same problem here.

  • RTX 2070
  • Ubuntu 18.04
  • cuDNN 7.4.2 (but I have tried other, older versions with no luck)
  • Tensorflow 1.13.0-dev20190125 (also tried Tensorflow 1.12 compiled with Cuda 10)

And as others have reported, setting allow_growth=True allows things to run.

@ymodak ymodak added the comp:gpu label Jan 31, 2019
@ymodak
Contributor

ymodak commented Jan 31, 2019

Closing this issue since it's resolved. Thanks!

@ymodak ymodak closed this as completed Jan 31, 2019
@nickovs

nickovs commented Jan 31, 2019

@ymodak Can you please reference the PR that fixed this bug?

@peterroelants

I have a similar issue with tf-nightly-gpu-2.0-preview on the RTX 2080

@hoermannpaul

Same issue with an RTX 2080; I spent two days recompiling and bug hunting until I found this fix.
(The allow_growth=true thing fixed it.)

You made my day

@oscarlinux

How do you actually set allow_growth=true? I have tf-nightly-gpu-2.0-preview and tried:

import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

but get this error:

AttributeError                            Traceback (most recent call last)
in <module>()
      1 import tensorflow as tf
----> 2 config = tf.ConfigProto()

AttributeError: module 'tensorflow' has no attribute 'ConfigProto'

How can I set allow_growth in tensorflow 2.0?

@oscarlinux

OK, I made it work in tf-nightly-gpu-2.0-preview and an IPython notebook by adding this to my code:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
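
(Side note: recent 2.0 nightlies also expose a native equivalent, so the compat shim isn't strictly required; a minimal sketch, assuming tf.config.experimental is available in your build:)

import tensorflow as tf

# Enable on-demand GPU memory allocation without the v1 compat layer;
# this must run before the GPUs are initialized
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)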

@sandacn

sandacn commented Mar 27, 2019

Same issue; with gpu_options.allow_growth = True the issue is fixed.

@diego898

diego898 commented Apr 1, 2019

@newhouseb how/where did you set that to true for all benchmarks? Was it an easy change?

@samhodge

samhodge commented Apr 6, 2019

Is blanket allow_growth a solution?

It is turned off by default for a reason; see
https://www.tensorflow.org/guide/using_gpu#allowing_gpu_memory_growth

In my program, memory management is important.

I would like to limit the amount of GPU memory used by TF because, in my graphics application, GPU memory will be used for other things, and keeping TF in a limited space is important to prevent out-of-memory errors.
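
For what it's worth, a minimal sketch of that kind of cap in the Python TF 1.x API (the 0.5 fraction is just a placeholder value):

import tensorflow as tf

# Cap TensorFlow at a fixed fraction of total GPU memory so the rest
# of the card stays free for other GPU work
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))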

@samhodge

samhodge commented Apr 7, 2019

I am working in C++ under Windows.

Adding the allow_growth option results in an OOM error.

Without this line of code the model runs fine on the same machine with the same card.

With OOM error

options.config.mutable_gpu_options()->set_allow_growth(true);
options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(fraction);

Without OOM error

//options.config.mutable_gpu_options()->set_allow_growth(true);
options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(fraction);

So trying to solve this problem by setting allow_growth results in a segfault.

@yorickvP

@ymodak This bug is not fixed. Arguably, using any sort of convnet should work in the default configuration. Either allow_growth should be true by default, the bug should be fixed so this works, or there should be a better error message than CUDNN_STATUS_INTERNAL_ERROR.

@nickovs

nickovs commented Apr 13, 2019

@ymodak It looks like this issue was closed prematurely. While there is a workaround for this issue, it involves changing application code. As a result, the example code does not work out of the box on RTX cards, and most recipes online will also need modification.

@ymodak ymodak reopened this Apr 13, 2019
@ymodak ymodak added the type:bug label Apr 13, 2019
@roebel

roebel commented Aug 21, 2020

In case your problem has the same origin as the problems treated in the present issue (which I cannot know from your report), there are a few solutions that you can easily find by reading the last 10-20 posts in this thread.

@bigboy32

I fixed it with this:

config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.compat.v1.Session(config=config)
sess.as_default()

@Gangadharsmg

Gangadharsmg commented Aug 24, 2020

I had this same issue with an RTX 2080. Then the following code worked for me.

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

Thanks everyone

@nikste
Contributor

nikste commented Aug 24, 2020

I think we can stop posting the allow_growth fix now :)

@drscotthawley

drscotthawley commented Oct 17, 2020

RTX 2070 here. I was getting this error, but running with TF_FORCE_GPU_ALLOW_GROWTH=true (which, as other commenters have pointed out, fixes it for them) changes the error message to an out-of-memory error (even though I've got plenty of memory):

2020-10-17 16:35:11.717658: I tensorflow/stream_executor/cuda/cuda_driver.cc:831] failed to allocate 3.87G (4159818752 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory

But my GPU has 8GB, and only about 250MB were in use before I started the process. So I don't understand: why can't it allocate 3.87GB? (Lowering the batch size had no effect; the weights hdf5 file is less than 200MB.)

@TiruBokka

TF_FORCE_GPU_ALLOW_GROWTH=true worked for me.
tf.config.experimental.set_memory_growth(gpu, True) worked too.

Here is my configuration:
GPU GTX 1650
cuda-10-1 10.1.243-1
libcudnn7 7.6.5.32-1+cuda10.1
Ubuntu 18.04.5 LTS

Whoever cannot set the environment variable could try this, as suggested in https://www.tensorflow.org/guide/gpu:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Currently, memory growth needs to be the same across GPUs
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        logical_gpus = tf.config.experimental.list_logical_devices('GPU')
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Memory growth must be set before GPUs have been initialized
        print(e)

@sachinkmohan

Typing the command mentioned in tensorflow/tfjs#671 (comment) into the terminal just worked for me.

@zzhuolun

zzhuolun commented Nov 12, 2020

Just upgrade to TensorFlow 2.3 with CUDA 11 and cuDNN 8.0. It magically solved all my problems, and I don't even need the workaround config.gpu_options.allow_growth = True now.

It seems the issue was noticed and fixed in TensorFlow 2.3.0. My previous setup:

  • CUDA 10.1
  • GPU: Quadro RTX 6000
  • Tensorflow 2.2.0
  • cudnn 7.6.5

Same problem:
tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.

And the workaround allow_growth = True does not help.

After I upgraded TensorFlow to 2.3.0, the problem disappeared, even without adding the line allow_growth = True.

@duongdqq

ok, made it work in tf-nightly-gpu-2.0-preview and ipython notebook adding this to my code:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

It works in my case.

@wojdzi1607

ok, made it work in tf-nightly-gpu-2.0-preview and ipython notebook adding this to my code:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

It works; paste it at the start of the Python file you execute. Ubuntu 20.04, Nvidia Docker, TensorFlow 1.15, GTX 1060.

@LiUzHiAn

Hi,

The config.gpu_options.allow_growth = True option also works well with Keras. One can initialize a session and pass it to Keras, something like the following:

from tensorflow.keras import backend as K
import tensorflow as tf

session_config = tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))
sess = tf.Session(config=session_config)
K.set_session(sess)

Hope it helps.

@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale label Jul 26, 2021
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.

