Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR #24496
I've been running into the same "CUDNN_STATUS_INTERNAL_ERROR" issue with the same GPU: an RTX 2070.
I have the same problem running on an RTX 2080 GPU. I tried the following TF versions: tf-nightly-gpu and a self-compiled version from master (060b6e3). With CUDNN_LOGINFO_DBG=1 I get the following error:

I0117 14:11:24.441819 140433563125568 basic_session_run_hooks.py:594] Saving checkpoints for 0 into /tmp/mnist/model.ckpt.
I! CuDNN (v7402) function cudnnCreate() called:
2019-01-17 14:11:26.079151: I tensorflow/stream_executor/platform/default/dso_loader.cc:154] successfully opened CUDA library libcudnn.so.7 locally
I! CuDNN (v7402) function cudnnCreate() called:
2019-01-17 14:11:26.571858: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
I! CuDNN (v7402) function cudnnCreate() called:
2019-01-17 14:11:26.585818: E tensorflow/stream_executor/cuda/cuda_dnn.cc:493] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

During handling of the above exception, another exception occurred: Traceback (most recent call last): Errors may have originated from an input operation. Original stack trace for 'Discriminator_1/Conv/Conv2D':

Any ideas, somebody? I am just about to reinstall my complete environment :-(
Try to compile r1.13 from source. It would take a long time, but it should fix your problem. At least it fixed mine.
I did try compiling from source, but ran into the same issue. What finally fixed my problem was setting the allow_growth option.
I've been having the same issue (on an RTX 2060, Ubuntu 18.04, Python 3.6.7, CUDA 10.0.130, cuDNN 7.4.2, TensorFlow 1.13.0-rc0 built from source). Thanks to @va-andrew's suggestion, I have it working with the allow_growth option set. FWIW, in the course of searching for solutions, it seems this is a common problem with the RTX series (although it might be a general problem with CUDA 10.0, since the new cards don't support the older versions). It would be great if the defaults could be updated in the 1.13 release so that special options don't need to be set for these cards.
Chiming in to say I also experienced this under the following configuration:
TensorFlow Docker GPU containers with stable releases of everything don't work either (they straight up segfault rather than report CUDNN_STATUS_INTERNAL_ERROR). Curiously, things work fine on Windows 10 with TensorFlow v1.12! And as others have reported, setting allow_growth allows things to run properly.
Same problem here.
And as others have reported, setting allow_growth=True allows things to run.
Closing this issue since it's resolved. Thanks!
@ymodak Can you please reference the PR that fixed this bug?
I have a similar issue with
Same issue with an RTX 2080. Spent two days recompiling and bug hunting until I found this fix. You made my day!
How do you actually set allow_growth=true? I have tf-nightly-gpu-2.0-preview and tried `import tensorflow as tf`, but get this error: AttributeError: module 'tensorflow' has no attribute 'ConfigProto'. How can I set allow_growth in TensorFlow 2.0?
OK, made it work in tf-nightly-gpu-2.0-preview and an IPython notebook by adding this to my code: from tensorflow.compat.v1 import ConfigProto; config = ConfigProto()
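For reference, a complete version of this workaround might look like the sketch below. `InteractiveSession` is the notebook-friendly session class from the same `tensorflow.compat.v1` namespace; the import guard is only there so the snippet degrades gracefully on machines without TensorFlow installed, and is not part of the workaround itself.

```python
config = None
try:
    from tensorflow.compat.v1 import ConfigProto, InteractiveSession

    config = ConfigProto()
    # Ask the allocator to grow GPU memory on demand instead of
    # grabbing almost all of it up front.
    config.gpu_options.allow_growth = True
    session = InteractiveSession(config=config)
except ImportError:
    # TensorFlow is not available in this environment; the body of the
    # try block is the workaround shape reported in this thread.
    pass
```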
Same issue; with gpu_options.allow_growth = True the issue is fixed.
@newhouseb how/where did you set that true for all benchmarks? Was it an easy change?
Is blanket allow_growth a solution? It is turned off by default for a reason. In my program, memory management is important: I would like to limit the amount of GPU memory used by TF, because in my graphics application the GPU memory will be used for other things, and confining TF to a limited space is important to prevent out-of-memory errors.
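The tradeoff this comment describes can be sketched with a toy model (pure Python, illustrative only; the `ToyGpu` class and the numbers are made up, not a TensorFlow API): a default allocator that preallocates nearly the whole device leaves no room for other consumers of GPU memory, while a grow-on-demand allocator does.

```python
class ToyGpu:
    """Toy model of a GPU's memory pool, in MB (not a TensorFlow API)."""

    def __init__(self, total_mb):
        self.total_mb = total_mb
        self.used_mb = 0

    def alloc(self, mb):
        if self.used_mb + mb > self.total_mb:
            raise MemoryError("OOM: %d MB requested, %d MB free"
                              % (mb, self.total_mb - self.used_mb))
        self.used_mb += mb

# Default strategy: the framework preallocates ~95% of an 8 GB device.
gpu = ToyGpu(8000)
gpu.alloc(int(8000 * 0.95))

other_app_failed = False
try:
    gpu.alloc(1000)  # another consumer (e.g. a graphics app) now fails
except MemoryError:
    other_app_failed = True

# allow_growth-style strategy: allocate only what is actually needed.
gpu2 = ToyGpu(8000)
gpu2.alloc(2500)   # model tensors
gpu2.alloc(1000)   # the other consumer fits alongside
print(gpu2.used_mb)  # 3500
```

Blanket allow_growth avoids the preallocation conflict, but, as the comment notes, it does not bound TF's eventual footprint; capping total usage is a separate knob.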
I am working in C++ under Windows. Adding the allow growth option results in an OOM error. Without this line of code the model runs fine on the same machine with the same card.

With OOM error:

Without OOM error:

So attempting to solve this problem by setting allow growth results in a segfault.
@ymodak This bug is not fixed. Arguably, using any sort of convnet should work in the default configuration. Either allow_growth should be true by default, it should be fixed so this works, or there should be a better error than CUDNN_STATUS_INTERNAL_ERROR.
@ymodak It looks like this issue was closed prematurely. While there is a workaround for this issue, it involves changing application code. As a result, the example code does not work out of the box on RTX cards, and most recipes online will also need modification.
In case your problem has the same origin as the problems treated in the present issue (which I cannot know from your report), there are a few solutions you can easily find by reading the last 10-20 posts in this thread.
I fixed it with this:
I had this same issue with an RTX 2080. Then the following code worked for me. Thanks everyone!
I think we can stop posting the allow_growth workaround now.
RTX 2070 here. Was getting this error, but it now runs with the allow-growth workaround. But my GPU has 8 GB and only about 250 MB were in use before I started the process, so I don't understand why it can't allocate 3.87 GB. (Lowering the batch size had no effect; the weights HDF5 file is less than 200 MB.)
TF_FORCE_GPU_ALLOW_GROWTH=true worked for me. Here is my configuration: Whoever cannot set the environment variable could try this instead, as suggested in https://www.tensorflow.org/guide/gpu:
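A sketch of both options: the environment variable can also be set from Python, as long as it happens before TensorFlow is imported; the in-code equivalent from the linked guide is shown as comments, since it needs a TensorFlow build with GPU support to run.

```python
import os

# Must be set before TensorFlow is imported so the GPU allocator sees it.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# In-code equivalent from https://www.tensorflow.org/guide/gpu:
# import tensorflow as tf
# for gpu in tf.config.experimental.list_physical_devices('GPU'):
#     tf.config.experimental.set_memory_growth(gpu, True)
```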
Typing the mentioned command in the terminal just worked for me.
It seems that the issue was noticed and fixed in TensorFlow 2.3.0.
Same problem, and the same workaround. After I upgraded TensorFlow to 2.3.0, the problem disappeared, even without adding the workaround line.
It works in my case.
It works; paste it at the start of the Python file you execute. Ubuntu 20.04, Nvidia Docker, TensorFlow 1.15, GTX 1060.
Hi, The
Hope it helps.
This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you. |
Closing as stale. Please reopen if you'd like to work on this further. |
Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:bug_template
System information
Describe the current behavior
I'm running the CNN model on MNIST. When I'm running with the GPU, I am encountering
2018-12-20 20:09:13.644176: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
I did some digging and realized that it is a memory issue (which shouldn't be the case, as I have 32 GB of RAM and 64 GB of swap). I ran htop while running the model and have 20+ GB free, which is more than enough to fit the 8 GB vRAM mappings.
Using `gpu_options.allow_growth = True` gets the model to work properly, and setting `os.environ['CUDA_VISIBLE_DEVICES'] = '-1'` also works. This means that I AM facing a memory issue, but I don't see how. Also, using `gpu_options.allow_growth = True` does not fix the same issue when trying to run the tensorflow/models/official/mnist/ model, which should behave similarly to my code.

Code to reproduce the issue
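As a minimal sketch of the two workarounds named in the report above (not the reporter's reproduction code): hiding the GPU via `CUDA_VISIBLE_DEVICES` forces a CPU-only fallback, while the TF1-style allow-growth session (shown commented out, since it requires a TensorFlow GPU build) keeps the GPU in use.

```python
import os

# Workaround 2 from the report: hide the GPU so TensorFlow falls back
# to CPU. Must be set before TensorFlow is imported.
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'

# Workaround 1 (TF1-style, requires a TensorFlow GPU build):
# import tensorflow as tf
# config = tf.ConfigProto()
# config.gpu_options.allow_growth = True
# sess = tf.Session(config=config)
```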