cuDNN issue in TF #45423
I'm constantly getting a cuDNN error. (error log omitted)
https://starriet.medium.com/tensorflow-2-0-wanna-limit-gpu-memory-10ad474e2528
@jaehochang92, please also go through issue #24496, which reports a similar error, and check whether the suggestions there work. Thanks!
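For reference, both links discuss workarounds along these lines: let TensorFlow allocate GPU memory on demand instead of reserving the whole device. A minimal sketch of that memory-growth approach, assuming TF 2.x and that it runs before the first GPU op (my illustration, not code from the thread):

```python
import tensorflow as tf

# Allocate GPU memory on demand rather than reserving the full device.
# This must run before any op initializes the GPU.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    try:
        tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        # Raised if the GPU has already been initialized.
        print(e)
```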
Thank you for the aid, but I'm still getting the same errors, plus a new one. (error log omitted)
Thank you. I'm working on a shared GPU, so I've added the following, as the TF documentation directs for setting a memory limit on a shared GPU:

```python
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        # Cap this process at 5 GB (memory_limit is in MB) on the first GPU.
        tf.config.experimental.set_virtual_device_configuration(
            gpus[0],
            [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5 * 1024)]
        )
    except RuntimeError as e:
        print(e)
```

But I'm still getting the same error. (error log omitted)
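One way to check whether that limit actually took effect is to inspect the logical devices afterwards. A hedged sketch (note that if the GPU was already initialized, the try/except above swallows the RuntimeError and no limit is applied):

```python
import tensorflow as tf

# After set_virtual_device_configuration succeeds, the logical GPU list
# reflects the configured virtual device; an unchanged default list
# suggests the configuration call did not take effect.
logical_gpus = tf.config.experimental.list_logical_devices('GPU')
print('Logical GPUs:', logical_gpus)
```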
This reproduces my error:

```python
# This is a standalone script for reproducing the issue.
import tensorflow as tf  # tf == 2.3.1
import numpy as np
import nvgpu
from pprint import pprint
from sklearn import model_selection
from tensorflow.keras.layers import Input, BatchNormalization
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D, Dense

tf.keras.backend.clear_session()
print('NVIDIA GPU info.:')
pprint(nvgpu.gpu_info())
print()


def prepare_dataset(volume, ts_size: float):
    # volume: (N, 2, H, W, C) stack of (noisy, clean) frame pairs.
    zipped_vol = np.array([*zip(volume[:, 0], volume[:, 1])])
    tr, ts = split_trts(zipped_vol, ts_size)
    print('Train volume shape:')
    print(' ', tr.shape)
    print('Test volume shape: ')
    print(' ', ts.shape)
    print()
    return tr, ts


def split_trts(video_volume, ts_size):
    vol_tr, vol_ts = model_selection.train_test_split(video_volume, test_size=ts_size)
    vol_tr, vol_ts = np.asarray(vol_tr), np.asarray(vol_ts)
    vol_tr = vol_tr.astype("float32") / 255.
    vol_ts = vol_ts.astype("float32") / 255.
    return vol_tr, vol_ts


def config_gpus(memory_limit):
    # memory_limit is given in GB; VirtualDeviceConfiguration expects MB.
    gpus = tf.config.experimental.list_physical_devices('GPU')
    if gpus:
        try:
            tf.config.experimental.set_virtual_device_configuration(
                gpus[0],
                [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=memory_limit * 1024)]
            )
        except RuntimeError as e:
            # Only printed, so a failure to apply the limit passes silently.
            print(e)


def build_model(input_shape, cnn_filters):
    model = tf.keras.Sequential()
    model.add(Input(input_shape))
    for depth in cnn_filters:
        model.add(Conv2D(depth, (3, 3), activation='relu', padding='same'))
        model.add(BatchNormalization())
        model.add(UpSampling2D((2, 2)))
        model.add(BatchNormalization())
    for depth in cnn_filters[::-1]:
        model.add(Conv2D(depth, (3, 3), activation='relu', padding='same'))
        model.add(BatchNormalization())
        model.add(MaxPooling2D((2, 2), padding='same'))
        model.add(BatchNormalization())
    for depth in cnn_filters:
        model.add(Conv2D(depth, (3, 3), activation='relu', padding='same'))
        model.add(BatchNormalization())
    model.add(Dense(3))
    optmz = tf.keras.optimizers.SGD(momentum=.05)
    loss = tf.keras.losses.MeanSquaredError()
    model.compile(optmz, loss)
    return model


# NOTE: this op initializes the GPU before config_gpus() below runs,
# so the virtual-device memory limit cannot take effect.
foo_volume = tf.random.uniform(
    (1000, 2, 128, 128, 3), minval=0, maxval=255, dtype=tf.dtypes.int32
)
train, test = prepare_dataset(foo_volume, ts_size=0.4)
config_gpus(5)
tf.debugging.set_log_device_placement(True)
my_model = build_model(train.shape[2:], [64, 64, 64])
print('Your model:')
my_model.summary()  # summary() prints itself; wrapping it in print() adds a stray "None"
if input("Proceed? */n: ") != 'n':
    history = my_model.fit(train[:, 1], train[:, 0],  # noisy train, clean train
                           batch_size=4, epochs=20000, verbose=True,
                           validation_data=(test[:, 1], test[:, 0])).history
```

And this results in... (error log omitted)
@jaehochang92, could you please provide a minimal code snippet so that we can reproduce the issue on our end? Thanks!
Thank you for the feedback. It is weird, since I do call

```python
config_gpus(5)
tf.debugging.set_log_device_placement(True)
```

before building the model. I think the bug appeared because I tried to set the virtual device configuration after the GPU had already been initialized by an earlier call.
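That diagnosis is consistent with TensorFlow's documented behavior: virtual devices must be configured before the runtime initializes the GPU, and in the repro script tf.random.uniform runs first. A minimal sketch of the corrected ordering (my illustration, not code from the thread):

```python
import tensorflow as tf

# Configure the memory cap FIRST, before any op touches the GPU.
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.experimental.set_virtual_device_configuration(
        gpus[0],
        [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=5 * 1024)]
    )

# Only now create tensors or build models; the 5 GB cap is in effect.
foo_volume = tf.random.uniform((1000, 2, 128, 128, 3),
                               minval=0, maxval=255, dtype=tf.dtypes.int32)
```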