cuDNN issue in TF #45423

Closed
jaehochang92 opened this issue Dec 5, 2020 · 10 comments
Labels
comp:gpu GPU related issues TF 2.3 Issues related to TF 2.3 type:support Support issues

@jaehochang92

jaehochang92 commented Dec 5, 2020

System information

  • I use custom code
  • Windows 10 Pro
  • TensorFlow installed from pip
  • TensorFlow version == 2.3.1
  • python == 3.8.6
  • CUDA/cuDNN version: 10.1 / 7.6.5
  • GPU model and memory: GeForce RTX 2080 SUPER

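Describe the current behavior

model.fit() fails at the start of the first epoch with "Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED", followed by "Failed to get convolution algorithm" (full log below).

Describe the expected behavior

The model trains on the GPU without cuDNN initialization errors.

Standalone code to reproduce the issue

Posted in a follow-up comment below.

Other info / logs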

D:\00.dev\Anaconda\envs\jh-ip\python.exe D:/02.users/jaehochang/gits/Autoencoders/main.py
2020-12-05 14:22:07.727822: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
NVIDIA GPU info.:
[{'index': '0',
  'mem_total': 8192,
  'mem_used': 2475,
  'mem_used_percent': 30.21240234375,
  'type': 'GeForce RTX 2080 SUPER',
  'uuid': '...'}]

{'videos': ['D:/20.share/jaehochang/SP2Robotics/videos\\vid1.mp4',
            'D:/20.share/jaehochang/SP2Robotics/videos\\vid2.mkv',
            'D:/20.share/jaehochang/SP2Robotics/videos\\vid3.mkv']}
Videos found correctly? */n: 
Which video? 1/2/3/...: 3
Capturing vid3.mkv ...
Volume shape: (165, 1088, 1920, 3)
Write volume array? */n: 

Capturing vid3_AGN.avi ...
Volume shape: (165, 1088, 1920, 3)
Write volume array? */n: 

2020-12-05 14:22:40.544902: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-12-05 14:22:40.605546: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 462.00GiB/s
2020-12-05 14:22:40.605721: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-12-05 14:22:41.024671: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-12-05 14:22:41.268418: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-12-05 14:22:41.292114: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-12-05 14:22:41.515595: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-12-05 14:22:41.712017: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-12-05 14:22:41.917475: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-12-05 14:22:41.917629: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-12-05 14:22:41.918243: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-05 14:22:41.926740: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x26380efa3e0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-05 14:22:41.926862: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-12-05 14:22:41.927041: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 462.00GiB/s
2020-12-05 14:22:41.927191: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-12-05 14:22:41.927267: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-12-05 14:22:41.927337: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-12-05 14:22:41.927412: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-12-05 14:22:41.927484: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-12-05 14:22:41.927554: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-12-05 14:22:41.927623: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-12-05 14:22:41.927717: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-12-05 14:22:42.588437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-05 14:22:42.588531: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-12-05 14:22:42.588582: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-12-05 14:22:42.588761: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6598 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 SUPER, pci bus id: 0000:08:00.0, compute capability: 7.5)
2020-12-05 14:22:42.591519: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x263ae5c79c0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-12-05 14:22:42.591639: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2080 SUPER, Compute Capability 7.5
Train volume shape:
   (99, 2, 432, 768, 3)
Test volume shape: 
   (66, 2, 432, 768, 3)


===== You're trying ...
mname:      vid3-AGN-500epc-1btc
=====

Proceed? */n: 
Virtual devices cannot be modified after being initialized
Your model:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 432, 768, 36)      1008      
_________________________________________________________________
batch_normalization (BatchNo (None, 432, 768, 36)      144       
_________________________________________________________________
up_sampling2d (UpSampling2D) (None, 864, 1536, 36)     0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 864, 1536, 36)     144       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 864, 1536, 36)     11700     
_________________________________________________________________
batch_normalization_2 (Batch (None, 864, 1536, 36)     144       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 432, 768, 36)      0         
_________________________________________________________________
batch_normalization_3 (Batch (None, 432, 768, 36)      144       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 432, 768, 3)       975       
=================================================================
Total params: 14,259
Trainable params: 13,971
Non-trainable params: 288
_________________________________________________________________
None
Fit? */n: 
Epoch 1/500
2020-12-05 14:22:51.499832: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-12-05 14:22:52.667191: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-12-05 14:22:52.668713: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-12-05 14:22:52.668804: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at conv_ops_fused_impl.h:642 : Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
Traceback (most recent call last):
  File "D:/02.users/jaehochang/gits/Autoencoders/main.py", line 55, in <module>
    history = my_model.fit(train[:, 1], train[:, 0],  # noisy train, clean train
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\keras\engine\training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1098, in fit
    tmp_logs = train_function(iterator)
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\eager\def_function.py", line 840, in _call
    return self._stateless_fn(*args, **kwds)
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\eager\function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\eager\function.py", line 1843, in _filtered_call
    return self._call_flat(
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\eager\function.py", line 1923, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\eager\function.py", line 545, in call
    outputs = execute.execute(
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node sequential/conv2d/Relu (defined at /02.users/jaehochang/gits/Autoencoders/main.py:55) ]] [Op:__inference_train_function_2470]

Function call stack:
train_function


Process finished with exit code 1
@jaehochang92
Author

I'm constantly getting Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED issue.

@tonystratum

> I'm constantly getting Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED issue.

https://starriet.medium.com/tensorflow-2-0-wanna-limit-gpu-memory-10ad474e2528
The fix is limiting the VRAM (in my case to 512-1024 MB less than the VRAM capacity, YMMV).
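A minimal sketch of that suggestion, assuming TensorFlow 2.1 or later (for the non-experimental tf.config.set_logical_device_configuration) and a single GPU. The helper names memory_limit_mb and apply_memory_limit and the 1024 MB default headroom are illustrative assumptions, not taken from the linked article. TensorFlow is imported inside apply_memory_limit so the arithmetic helper works without TensorFlow installed.

```python
def memory_limit_mb(total_mb: int, headroom_mb: int = 1024) -> int:
    """Return a VRAM cap (in MB) that leaves `headroom_mb` of the card unallocated.

    Hypothetical helper; the 512-1024 MB headroom range comes from the comment above.
    """
    if not 0 <= headroom_mb < total_mb:
        raise ValueError("headroom must be non-negative and smaller than total VRAM")
    return total_mb - headroom_mb


def apply_memory_limit(limit_mb: int) -> None:
    """Cap TensorFlow's allocation on the first GPU at `limit_mb` MB.

    Must run before any TensorFlow op initializes the GPU runtime.
    """
    import tensorflow as tf  # lazy import (assumes TF >= 2.1); keeps the helper above TF-free

    gpus = tf.config.list_physical_devices("GPU")
    if gpus:
        tf.config.set_logical_device_configuration(
            gpus[0],
            [tf.config.LogicalDeviceConfiguration(memory_limit=limit_mb)],
        )
```

On the 8192 MB card from this report, memory_limit_mb(8192) returns 7168 and memory_limit_mb(8192, 512) returns 7680, so apply_memory_limit(memory_limit_mb(8192)) would cap TensorFlow at 7168 MB. On a shared GPU, the cap also has to stay below what the other processes leave free.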

@amahendrakar
Contributor

@jaehochang92,
Could you please try setting a hard limit on the total GPU memory as mentioned in this guide and let us know if it helps.

Also, please go through issue #24496 with a similar error and check if it works. Thanks!
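Besides a hard limit, TensorFlow can be asked to allocate GPU memory on demand instead of reserving most of the card at startup. The sketch below shows two TF 2.x mechanisms for this: the TF_FORCE_GPU_ALLOW_GROWTH environment variable and tf.config.experimental.set_memory_growth. It is offered as a general alternative, not as a summary of the fix in #24496, and whether it helps on a shared GPU depends on how much memory other processes hold. The function names are hypothetical.

```python
import os


def enable_growth_via_env() -> None:
    """Ask TensorFlow to grow GPU memory on demand.

    Set this before TensorFlow initializes the GPU (safest: before importing it).
    """
    os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"


def enable_growth_via_api() -> int:
    """Enable memory growth on every visible GPU; return how many were configured.

    Like a memory limit, this must be called before any op initializes the GPU.
    """
    import tensorflow as tf  # lazy import so this module loads without TensorFlow

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Either call belongs at the very top of the script, before any tensor is created.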

@amahendrakar amahendrakar added comp:gpu GPU related issues stat:awaiting response Status - Awaiting response from author TF 2.3 Issues related to TF 2.3 type:support Support issues and removed type:bug Bug labels Dec 6, 2020
@jaehochang92
Author

jaehochang92 commented Dec 7, 2020

> > I'm constantly getting Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED issue.
>
> https://starriet.medium.com/tensorflow-2-0-wanna-limit-gpu-memory-10ad474e2528
> The fix is limiting the VRAM (in my case to 512-1024 MB less than the VRAM capacity, YMMV).

Thanks for the help, but I'm still getting the same errors, plus failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED ;((

@jaehochang92
Author

jaehochang92 commented Dec 7, 2020

> @jaehochang92,
> Could you please try setting a hard limit on the total GPU memory as mentioned in this guide and let us know if it helps.
>
> Also, please go through issue #24496 with a similar error and check if it works. Thanks!

Thank you. I'm working on a shared GPU, so I added the following, as the TF documentation recommends for setting a memory limit on a shared GPU:

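import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Cap TensorFlow's allocation on the first GPU at 5 GB (memory_limit is in MB)
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5 * 1024)]
        )
    except RuntimeError as e:
        # Raised if the GPU runtime has already been initialized
        print(e)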

But I'm still getting failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED and Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED.

@jaehochang92
Author

jaehochang92 commented Dec 7, 2020

This reproduces my error:

# Standalone code for reproducing the issue.
import tensorflow as tf  # tf == 2.3.1
import numpy as np
import nvgpu

from pprint import pprint
from sklearn import model_selection
from tensorflow.keras.layers import Input, BatchNormalization
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D, Dense

tf.keras.backend.clear_session()

print('NVIDIA GPU info.:')
pprint(nvgpu.gpu_info())
print()


def prepare_dataset(volume: np.array, ts_size: float) -> np.array:
    zipped_vol = np.array([*zip(volume[:, 0], volume[:, 1])])
    tr, ts = split_trts(zipped_vol, ts_size)
    print('Train volume shape:')
    print('  ', tr.shape)
    print('Test volume shape: ')
    print('  ', ts.shape)
    print()
    return tr, ts


def split_trts(video_volume, ts_size):
    vol_tr, vol_ts = model_selection.train_test_split(video_volume, test_size=ts_size)
    vol_tr, vol_ts = np.asarray(vol_tr), np.asarray(vol_ts)
    vol_tr = vol_tr.astype("float32") / 255.
    vol_ts = vol_ts.astype("float32") / 255.
    return vol_tr, vol_ts


def config_gpus(memory_limit):
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        try:
            tf.config.experimental.set_virtual_device_configuration(
                gpus[0],
                [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=memory_limit * 1024)]
            )
        except RuntimeError as e:
            print(e)


def build_model(input_shape, cnn_filters):
    model = tf.keras.Sequential()
    model.add(Input(input_shape))
    for depth in cnn_filters:
        model.add(Conv2D(depth, (3, 3), activation='relu', padding='same'))
        model.add(BatchNormalization())
    model.add(UpSampling2D((2, 2)))
    model.add(BatchNormalization())
    for depth in cnn_filters[::-1]:
        model.add(Conv2D(depth, (3, 3), activation='relu', padding='same'))
        model.add(BatchNormalization())
    model.add(MaxPooling2D((2, 2), padding='same'))
    model.add(BatchNormalization())
    for depth in cnn_filters:
        model.add(Conv2D(depth, (3, 3), activation='relu', padding='same'))
        model.add(BatchNormalization())
    model.add(Dense(3))
    optmz = tf.keras.optimizers.SGD(momentum=.05)
    loss = tf.keras.losses.MeanSquaredError()
    model.compile(optmz, loss)
    return model


foo_volume = tf.random.uniform(
    (1000, 2, 128, 128, 3), minval=0, maxval=255, dtype=tf.dtypes.int32, seed=None, name=None
)
train, test = prepare_dataset(foo_volume, ts_size=0.4)
config_gpus(5)
tf.debugging.set_log_device_placement(True)
my_model = build_model(train.shape[2:], [64, 64, 64])
print('Your model:'), print(my_model.summary())
if input("Proceed? */n: ") != 'n':
    history = my_model.fit(train[:, 1], train[:, 0],  # noisy train, clean train
                           batch_size=4, epochs=20000, verbose=True,
                           validation_data=(test[:, 1], test[:, 0])).history

And this results in...

D:\00.dev\Anaconda\envs\jh-ip\python.exe D:/02.users/jaehochang/gits/Autoencoders/debugging.py
2020-12-07 12:48:13.226923: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
NVIDIA GPU info.:
[{'index': '0',
  'mem_total': 8192,
  'mem_used': 1620,
  'mem_used_percent': 19.775390625,
  'type': 'GeForce RTX 2080 SUPER',
  'uuid': ...}]

2020-12-07 12:48:15.909482: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library nvcuda.dll
2020-12-07 12:48:15.948027: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 462.00GiB/s
2020-12-07 12:48:15.948351: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-12-07 12:48:15.955311: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-12-07 12:48:15.958050: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-12-07 12:48:15.959191: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-12-07 12:48:15.962691: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-12-07 12:48:15.965691: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-12-07 12:48:15.972593: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-12-07 12:48:15.972749: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-12-07 12:48:15.973284: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-12-07 12:48:15.983121: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x210a1480c90 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-12-07 12:48:15.983253: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-12-07 12:48:15.983451: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties: 
pciBusID: 0000:08:00.0 name: GeForce RTX 2080 SUPER computeCapability: 7.5
coreClock: 1.815GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 462.00GiB/s
2020-12-07 12:48:15.983590: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudart64_101.dll
2020-12-07 12:48:15.983661: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-12-07 12:48:15.983730: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cufft64_10.dll
2020-12-07 12:48:15.983803: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library curand64_10.dll
2020-12-07 12:48:15.983877: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusolver64_10.dll
2020-12-07 12:48:15.983951: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cusparse64_10.dll
2020-12-07 12:48:15.984022: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-12-07 12:48:15.984107: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-12-07 12:48:16.594642: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-12-07 12:48:16.594732: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]      0 
2020-12-07 12:48:16.594780: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0:   N 
2020-12-07 12:48:16.594938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6598 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 SUPER, pci bus id: 0000:08:00.0, compute capability: 7.5)
2020-12-07 12:48:16.597744: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x210cf29d150 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-12-07 12:48:16.597859: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2080 SUPER, Compute Capability 7.5
Train volume shape:
   (600, 2, 128, 128, 3)
Test volume shape: 
   (400, 2, 128, 128, 3)

Virtual devices cannot be modified after being initialized
Your model:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d (Conv2D)              (None, 128, 128, 64)      1792      
_________________________________________________________________
batch_normalization (BatchNo (None, 128, 128, 64)      256       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 128, 128, 64)      36928     
_________________________________________________________________
batch_normalization_1 (Batch (None, 128, 128, 64)      256       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 128, 128, 64)      36928     
_________________________________________________________________
batch_normalization_2 (Batch (None, 128, 128, 64)      256       
_________________________________________________________________
up_sampling2d (UpSampling2D) (None, 256, 256, 64)      0         
_________________________________________________________________
batch_normalization_3 (Batch (None, 256, 256, 64)      256       
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 256, 256, 64)      36928     
_________________________________________________________________
batch_normalization_4 (Batch (None, 256, 256, 64)      256       
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 256, 256, 64)      36928     
_________________________________________________________________
batch_normalization_5 (Batch (None, 256, 256, 64)      256       
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 256, 256, 64)      36928     
_________________________________________________________________
batch_normalization_6 (Batch (None, 256, 256, 64)      256       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 128, 128, 64)      0         
_________________________________________________________________
batch_normalization_7 (Batch (None, 128, 128, 64)      256       
_________________________________________________________________
conv2d_6 (Conv2D)            (None, 128, 128, 64)      36928     
_________________________________________________________________
batch_normalization_8 (Batch (None, 128, 128, 64)      256       
_________________________________________________________________
conv2d_7 (Conv2D)            (None, 128, 128, 64)      36928     
_________________________________________________________________
batch_normalization_9 (Batch (None, 128, 128, 64)      256       
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 128, 128, 64)      36928     
_________________________________________________________________
batch_normalization_10 (Batc (None, 128, 128, 64)      256       
_________________________________________________________________
dense (Dense)                (None, 128, 128, 3)       195       
=================================================================
Total params: 300,227
Trainable params: 298,819
Non-trainable params: 1,408
_________________________________________________________________
None
Proceed? */n: 
Epoch 1/20000
2020-12-07 12:48:21.149807: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cublas64_10.dll
2020-12-07 12:48:21.431219: E tensorflow/stream_executor/cuda/cuda_blas.cc:225] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-07 12:48:21.431686: E tensorflow/stream_executor/cuda/cuda_blas.cc:225] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-07 12:48:21.431805: E tensorflow/stream_executor/cuda/cuda_blas.cc:225] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2020-12-07 12:48:21.440341: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library cudnn64_7.dll
2020-12-07 12:48:22.542557: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-12-07 12:48:22.545222: E tensorflow/stream_executor/cuda/cuda_dnn.cc:328] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-12-07 12:48:22.545329: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at conv_ops_fused_impl.h:642 : Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
Traceback (most recent call last):
  File "D:/02.users/jaehochang/gits/Autoencoders/debugging.py", line 84, in <module>
    history = my_model.fit(train[:, 1], train[:, 0],  # noisy train, clean train
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\keras\engine\training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1098, in fit
    tmp_logs = train_function(iterator)
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\eager\def_function.py", line 780, in __call__
    result = self._call(*args, **kwds)
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\eager\def_function.py", line 840, in _call
    return self._stateless_fn(*args, **kwds)
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\eager\function.py", line 2829, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\eager\function.py", line 1843, in _filtered_call
    return self._call_flat(
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\eager\function.py", line 1923, in _call_flat
    return self._build_call_outputs(self._inference_function.call(
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\eager\function.py", line 545, in call
    outputs = execute.execute(
  File "D:\00.dev\Anaconda\envs\jh-ip\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnknownError:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
	 [[node sequential/conv2d/Relu (defined at /02.users/jaehochang/gits/Autoencoders/debugging.py:84) ]] [Op:__inference_train_function_10722]

Function call stack:
train_function


Process finished with exit code 1

@amahendrakar
Contributor

@jaehochang92,
Thank you for the update. I'm facing issues while running the code: the prepare_dataset method seems to run indefinitely.

Could you please provide a minimal code snippet so that we can reproduce the issue on our end? Thanks!

@jaehochang92
Author

jaehochang92 commented Dec 8, 2020

> @jaehochang92,
> Thank you for the update. I'm facing issues while running the code: the prepare_dataset method seems to run indefinitely.
>
> Could you please provide a minimal code snippet so that we can reproduce the issue on our end? Thanks!

Thank you for the feedback. It's strange, since prepare_dataset works fine on my machine... By the way, I unexpectedly found a simple solution: all I needed to do was move the following code above the foo_volume declaration:

config_gpus(5)
tf.debugging.set_log_device_placement(True)

I think the error appeared because I tried to set up the virtual device after a TensorFlow method had already been called. This may be a code-design pitfall for newbies like me who configure the virtual device in the wrong order...
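The fix boils down to an ordering rule: configure the GPU before anything creates a tensor. In the repro, tf.random.uniform runs before config_gpus(5), which initializes the GPU runtime; after that, TensorFlow refuses to change the device configuration. The "Virtual devices cannot be modified after being initialized" line in both logs is that RuntimeError, printed by the except branch while the script carried on with the default configuration (the log shows a 6598 MB device being created). The sketch below assumes TF 2.1+ (non-experimental tf.config names) and deliberately lets the RuntimeError propagate so a misordered call fails immediately; configure_gpu, build_random_volume and main are hypothetical names.

```python
def configure_gpu(memory_limit_mb: int) -> None:
    """Cap TensorFlow on the first GPU. Call this before creating any tensor."""
    import tensorflow as tf  # lazy import so this module loads without TensorFlow

    gpus = tf.config.list_physical_devices("GPU")
    if not gpus:
        return
    # No try/except: if the runtime is already initialized, TensorFlow raises
    # RuntimeError("Virtual devices cannot be modified after being initialized"),
    # and the script should stop instead of silently training with defaults.
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit_mb)],
    )


def build_random_volume():
    """Same dummy data as the repro, created only after configure_gpu()."""
    import tensorflow as tf

    return tf.random.uniform(
        (1000, 2, 128, 128, 3), minval=0, maxval=255, dtype=tf.int32
    )


def main():
    configure_gpu(5 * 1024)         # 1. device configuration comes first
    volume = build_random_volume()  # 2. only then create tensors
    return volume
```

The rest of the repro (prepare_dataset, build_model, fit) would follow in main. Without the try/except, the misordered version would have stopped at the configuration call rather than failing later inside fit().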

@amahendrakar
Contributor

> By the way, I unexpectedly found a simple solution: all I needed to do was move the following code above the foo_volume declaration

@jaehochang92,
Thank you for the update. Glad it's working now. Marking the issue as closed since it is resolved. Please feel free to reopen if necessary.
