During the evaluation phase on the Pascal VOC dataset with DeepLabv3/xception_65, a 'Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR' error is emitted
#9661
Open
ssnirgudkar opened this issue on Jan 23, 2021 · 2 comments
6. System information
OS Platform and Distribution: Linux Ubuntu 18.04
TensorFlow installed from (source or binary): Source
TensorFlow version (use command below): 1.15
Python version: 2.7
Bazel version (if compiling from source): 0.26.1
GCC/Compiler version (if compiling from source): 7.5.0
CUDA/cuDNN version: CUDA: 10.2, cuDNN:7.6.5.32-1
GPU model and memory: GeForce RTX 2070 SUPER, 8GB
I have created a Docker image based on nvidia/cuda:10.0-base-ubuntu18.04 and have built TF 1.15 in it.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
1. The entire URL of the file you are using
https://github.com/tensorflow/models/tree/master/research/deeplab
2. Describe the bug
As per the documentation, I am trying to run the Pascal VOC dataset on DeepLabV3, and I am getting an error during the evaluation phase (eval.py).
The error is as follows:
INFO:tensorflow:Starting evaluation at 2021-01-23-03:43:35
I0123 03:43:35.323748 140297229682496 evaluation.py:450] Starting evaluation at 2021-01-23-03:43:35
2021-01-23 03:43:36.424183: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-01-23 03:43:36.907355: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-01-23 03:43:36.918801: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
I do not know why this error is emitted.
3. Steps to reproduce
Comment out the initial call to train.py, then run:
sh local_test.sh (see the 'Testing the Installation' section at )
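For reference, the steps above boil down to something like the following sketch. It assumes the tensorflow/models repository layout at the time of TF 1.15; the exact contents of local_test.sh may differ from what is described in the comments.

```shell
# Hypothetical reproduction sketch, assuming a checkout of
# https://github.com/tensorflow/models under ./models.
cd models/research

# Make the slim and deeplab code importable, as the DeepLab
# installation docs describe.
export PYTHONPATH=$PYTHONPATH:$(pwd):$(pwd)/slim

cd deeplab
# local_test.sh downloads PASCAL VOC 2012, converts it to TFRecord,
# then runs train.py, eval.py, vis.py, and export_model.py in sequence.
# With the train.py invocation commented out, eval.py becomes the
# first GPU-heavy step, which is where the cuDNN error appears.
sh local_test.sh
```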
4. Expected behavior
eval.py (invoked via local_test.sh) should run to completion without any errors.
5. Additional context
I understand that this may or may not be a DeepLabV3 code issue, but I do not know how to fix it.
While running train.py I had hit an 'out of memory' error, which I fixed by reducing the batch size to 1. Now, however, I am not running train.py at all; I am only running eval.py, and I can see GPU memory usage climbing to the card's limit (watched from another shell with nvidia-smi). How can I control the GPU memory usage? Which parameters in eval.py can be changed to keep the memory footprint manageable?
If you think my issue is the same as tensorflow/tensorflow#24496 because of the NVIDIA GeForce RTX 2070 series, then please let me know how to create a 'configuration object' and incorporate it into eval.py.
train.py has a 'configuration object' near its beginning, but eval.py has none, and I do not know where to hook one up if I create it!
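For context on the 'configuration object' question: on RTX 20xx cards, cuDNN initialization often fails under TF 1.x because the default allocator claims nearly all GPU memory up front, leaving cuDNN no workspace. A common workaround is a session config with on-demand memory growth. The following is a minimal sketch of such a config using the TF 1.x API; where exactly it would be passed inside eval.py depends on how its evaluation loop creates the session, so the hook point shown here is an assumption, not the DeepLab code's actual structure.

```python
import tensorflow as tf  # TF 1.x API assumed (tf.ConfigProto)

# Build a session config that lets the GPU allocator grow on demand
# instead of grabbing (almost) all memory at startup.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Alternatively, cap TF to a fixed fraction of the 8 GB card:
# config.gpu_options.per_process_gpu_memory_fraction = 0.7

# Hypothetical hook point: wherever eval.py ultimately creates its
# session, the config would be passed in like this.
with tf.Session(config=config) as sess:
    pass  # run the evaluation ops here
```

On TF 1.14+, setting the environment variable TF_FORCE_GPU_ALLOW_GROWTH=true before launching eval.py reportedly has the same effect without modifying any code.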