Hi, I keep having the same issue I tried to explain in #1200 (comment). I am working on Ubuntu 18.04 with an NVIDIA Quadro RTX 4000.
After trying the Docker image and hitting the same issue, I went back to the DLC-GPU environment, but I still get the same problem when I try to train.
I have now reinstalled CUDA 10.0 (added to the PATH in my .bashrc), because that is the version used by the DLC-GPU environment when it is installed through miniconda.
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
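For completeness, this is how I check which cuDNN build the environment actually sees (assuming a conda-based install and the default system CUDA location; paths may differ on other setups):

```shell
# cuDNN package pinned inside the active conda environment
conda list cudnn
# Version recorded in the system-wide CUDA headers, if a system CUDA is installed
grep CUDNN_MAJOR /usr/local/cuda/include/cudnn.h
```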
During training, I set allow_growth = True because I saw it suggested in tensorflow/tensorflow#24496.
2021-05-02 22:16:22.447624: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/device:GPU:0 with 7018 MB memory) -> physical GPU (device: 0, name: NVIDIA Quadro RTX 4000, pci bus id: 0000:2d:00.0, compute capability: 7.5)
2021-05-02 22:16:22.449495: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x55d6db8d6fd0 executing computations on platform CUDA. Devices:
2021-05-02 22:16:22.449516: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): NVIDIA Quadro RTX 4000, Compute Capability 7.5
2021-05-02 22:16:29.122700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2021-05-02 22:16:29.122776: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-05-02 22:16:29.122783: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2021-05-02 22:16:29.122790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2021-05-02 22:16:29.122871: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7018 MB memory) -> physical GPU (device: 0, name: NVIDIA Quadro RTX 4000, pci bus id: 0000:2d:00.0, compute capability: 7.5)
2021-05-02 22:16:37.593240: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-05-02 22:16:37.598426: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
I also get this:
UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node resnet_v1_50/conv1/Conv2D (defined at /home/jorge/miniconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/pose_net.py:124) ]]
[[node sigmoid_cross_entropy_loss/value (defined at /home/jorge/miniconda3/envs/DLC-GPU/lib/python3.7/site-packages/deeplabcut/pose_estimation_tensorflow/nnet/pose_net.py:283) ]]
However, it still does not work even with allow_growth enabled:

deeplabcut.train_network(path_config_file, shuffle=1, allow_growth=True, gputouse=0, displayiters=50, saveiters=20000, maxiters=50000)
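For reference, a minimal sketch of what allow_growth means at the TensorFlow 1.x level (the Session-based API the DLC-GPU environment ships; shown as a config fragment, not DLC's exact internals):

```python
import tensorflow as tf

# Ask TensorFlow to allocate GPU memory incrementally instead of grabbing
# nearly all of it up front; this is the workaround that sometimes avoids
# CUDNN_STATUS_INTERNAL_ERROR on RTX-generation cards.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)
```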