CUDNN not installed #14

seaniedan · 2021-06-03T11:38:38Z

Again, this might be a problem with how I'm using the repo, since there are no instructions, so I'm attaching the Bash terminal output.

Everything works great (ffmpeg, sorting scripts) until I try to train:

(deepfacelab) anaconda@8edf84a647e9:~/scripts$ ./6_train_Quick96_no_preview.sh 
Running trainer.

[new] No saved models found. Enter a name of a new model : 
new

Model first run.

Choose one or several GPU idxs (separated by comma).

[CPU] : CPU
  [0] : Quadro RTX 5000

[0] Which GPU indexes to choose? : 
0

Initializing models:   0%|                                                                                                               | 0/5 [00:00<?, ?it/s]
Error: No OpKernel was registered to support Op 'DepthToSpace' used by node DepthToSpace (defined at /deepfacelab/core/leras/ops/__init__.py:336)  with these attrs: [data_format="NCHW", block_size=2, T=DT_FLOAT]
Registered devices: [CPU]
Registered kernels:
  device='GPU'; T in [DT_QINT8]
  device='GPU'; T in [DT_HALF]
  device='GPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_VARIANT]; data_format in ["NHWC"]
  device='CPU'; T in [DT_RESOURCE]; data_format in ["NHWC"]
  device='CPU'; T in [DT_STRING]; data_format in ["NHWC"]
  device='CPU'; T in [DT_BOOL]; data_format in ["NHWC"]
  device='CPU'; T in [DT_COMPLEX128]; data_format in ["NHWC"]
  device='CPU'; T in [DT_COMPLEX64]; data_format in ["NHWC"]
  device='CPU'; T in [DT_DOUBLE]; data_format in ["NHWC"]
  device='CPU'; T in [DT_FLOAT]; data_format in ["NHWC"]
  device='CPU'; T in [DT_BFLOAT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_HALF]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT32]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT8]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT8]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT32]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT64]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT64]; data_format in ["NHWC"]

	 [[DepthToSpace]]

Errors may have originated from an input operation.
Input Source operations connected to node DepthToSpace:
 LeakyRelu_4 (defined at /deepfacelab/core/leras/archis/DeepFakeArchi.py:58)
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)
  File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1358, in _run_fn
    self._extend_graph()
  File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1398, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'DepthToSpace' used by {{node DepthToSpace}} with these attrs: [data_format="NCHW", block_size=2, T=DT_FLOAT]
Registered devices: [CPU]
Registered kernels:
  device='GPU'; T in [DT_QINT8]
  device='GPU'; T in [DT_HALF]
  device='GPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_VARIANT]; data_format in ["NHWC"]
  device='CPU'; T in [DT_RESOURCE]; data_format in ["NHWC"]
  device='CPU'; T in [DT_STRING]; data_format in ["NHWC"]
  device='CPU'; T in [DT_BOOL]; data_format in ["NHWC"]
  device='CPU'; T in [DT_COMPLEX128]; data_format in ["NHWC"]
  device='CPU'; T in [DT_COMPLEX64]; data_format in ["NHWC"]
  device='CPU'; T in [DT_DOUBLE]; data_format in ["NHWC"]
  device='CPU'; T in [DT_FLOAT]; data_format in ["NHWC"]
  device='CPU'; T in [DT_BFLOAT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_HALF]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT32]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT8]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT8]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT32]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT64]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT64]; data_format in ["NHWC"]

	 [[DepthToSpace]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/deepfacelab/mainscripts/Trainer.py", line 46, in trainerThread
    model = models.import_model(model_class_name)(
  File "/usr/local/deepfacelab/models/ModelBase.py", line 189, in __init__
    self.on_initialize()
  File "/usr/local/deepfacelab/models/Model_Quick96/Model.py", line 222, in on_initialize
    model.init_weights()
  File "/usr/local/deepfacelab/core/leras/layers/Saveable.py", line 104, in init_weights
    nn.init_weights(self.get_weights())
  File "/usr/local/deepfacelab/core/leras/ops/__init__.py", line 48, in init_weights
    nn.tf_sess.run (ops)
  File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 967, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1190, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1368, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'DepthToSpace' used by node DepthToSpace (defined at /deepfacelab/core/leras/ops/__init__.py:336)  with these attrs: [data_format="NCHW", block_size=2, T=DT_FLOAT]
Registered devices: [CPU]
Registered kernels:
  device='GPU'; T in [DT_QINT8]
  device='GPU'; T in [DT_HALF]
  device='GPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_VARIANT]; data_format in ["NHWC"]
  device='CPU'; T in [DT_RESOURCE]; data_format in ["NHWC"]
  device='CPU'; T in [DT_STRING]; data_format in ["NHWC"]
  device='CPU'; T in [DT_BOOL]; data_format in ["NHWC"]
  device='CPU'; T in [DT_COMPLEX128]; data_format in ["NHWC"]
  device='CPU'; T in [DT_COMPLEX64]; data_format in ["NHWC"]
  device='CPU'; T in [DT_DOUBLE]; data_format in ["NHWC"]
  device='CPU'; T in [DT_FLOAT]; data_format in ["NHWC"]
  device='CPU'; T in [DT_BFLOAT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_HALF]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT32]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT8]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT8]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT32]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT64]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT64]; data_format in ["NHWC"]

	 [[DepthToSpace]]

Errors may have originated from an input operation.
Input Source operations connected to node DepthToSpace:
 LeakyRelu_4 (defined at /deepfacelab/core/leras/archis/DeepFakeArchi.py:58)
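
Reading the error above: the op asks for `data_format="NCHW"` with `T=DT_FLOAT`, the CPU kernels are all restricted to `"NHWC"`, and `Registered devices` lists only `[CPU]`. The following is a minimal sketch, not part of the repo, that extracts this from an error message like the one above. The function name `diagnose` and the shortened `SAMPLE` text are illustrative assumptions, and it assumes that a kernel line without a `data_format` restriction accepts any layout.

```python
import re

# Matches lines such as:  device='CPU'; T in [DT_FLOAT]; data_format in ["NHWC"]
KERNEL_RE = re.compile(
    r"device='(?P<device>\w+)'; T in \[(?P<dtype>\w+)\]"
    r"(?:; data_format in \[\"(?P<fmt>\w+)\"\])?"
)

def diagnose(log_text):
    """Return (registered devices, devices that have a kernel for the requested attrs)."""
    devices = re.search(r"Registered devices: \[([^\]]*)\]", log_text)
    registered = {d.strip() for d in devices.group(1).split(",") if d.strip()}

    attrs = re.search(r'data_format="(\w+)".*?T=(\w+)', log_text)
    wanted_fmt, wanted_dtype = attrs.group(1), attrs.group(2)

    capable = set()
    for m in KERNEL_RE.finditer(log_text):
        # Assumption: no data_format restriction means any layout is accepted.
        fmt_ok = m.group("fmt") is None or m.group("fmt") == wanted_fmt
        if m.group("dtype") == wanted_dtype and fmt_ok:
            capable.add(m.group("device"))
    return registered, capable

# Shortened excerpt of the error text, for illustration only.
SAMPLE = '''No OpKernel was registered to support Op 'DepthToSpace' with these attrs: [data_format="NCHW", block_size=2, T=DT_FLOAT]
Registered devices: [CPU]
Registered kernels:
  device='GPU'; T in [DT_HALF]
  device='GPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_FLOAT]; data_format in ["NHWC"]
'''

if __name__ == "__main__":
    registered, capable = diagnose(SAMPLE)
    print("registered devices:", sorted(registered))
    print("devices with a matching kernel:", sorted(capable))
    print("missing:", sorted(capable - registered))
```

If the "missing" set contains `GPU`, TensorFlow did not register the GPU as a device in this session, which points at the CUDA/cuDNN setup rather than at the model code.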

I have CUDA in Docker:

(deepfacelab) anaconda@8edf84a647e9:~/scripts$ nvidia-smi
Thu Jun  3 08:51:38 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.80       Driver Version: 460.80       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 5000     Off  | 00000000:65:00.0  On |                  Off |
| 34%   34C    P8    18W / 230W |    853MiB / 16124MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|

Before running the above command, I installed cuDNN inside Docker:

(deepfacelab) anaconda@8edf84a647e9:~/scripts$ conda install -c conda-forge cudnn
Collecting package metadata (current_repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.9.2
  latest version: 4.10.1

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /usr/local/anaconda3/envs/deepfacelab

  added / updated specs:
    - cudnn


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2021.5.30  |       ha878542_0         136 KB  conda-forge
    certifi-2021.5.30          |   py38h578d9bd_0         141 KB  conda-forge
    cudatoolkit-11.2.2         |       he111cf0_8       877.3 MB  conda-forge
    cudnn-8.1.0.77             |       h90431f1_0       634.8 MB  conda-forge
    openssl-1.1.1k             |       h7f98852_0         2.1 MB  conda-forge
    ------------------------------------------------------------
                                           Total:        1.48 GB

The following NEW packages will be INSTALLED:

  cudatoolkit        conda-forge/linux-64::cudatoolkit-11.2.2-he111cf0_8
  cudnn              conda-forge/linux-64::cudnn-8.1.0.77-h90431f1_0

The following packages will be UPDATED:

  ca-certificates                      2020.12.5-ha878542_0 --> 2021.5.30-ha878542_0
  certifi                          2020.12.5-py38h578d9bd_1 --> 2021.5.30-py38h578d9bd_0
  openssl                                 1.1.1j-h7f98852_0 --> 1.1.1k-h7f98852_0


Proceed ([y]/n)? y


Downloading and Extracting Packages
cudatoolkit-11.2.2   | 877.3 MB  | #################################################################################################################### | 100% 
openssl-1.1.1k       | 2.1 MB    | #################################################################################################################### | 100% 
certifi-2021.5.30    | 141 KB    | #################################################################################################################### | 100% 
ca-certificates-2021 | 136 KB    | #################################################################################################################### | 100% 
cudnn-8.1.0.77       | 634.8 MB  | #################################################################################################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: \ By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html

- By downloading and using the cuDNN conda packages, you accept the terms and conditions of the NVIDIA cuDNN EULA -
  https://docs.nvidia.com/deeplearning/cudnn/sla/index.html

done

I believe the repo installs CUDA correctly but doesn't install cuDNN. Both are requirements, so this is where I looked for the cause of the error. Any advice or help is appreciated!
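
One way to narrow this down is to check whether the CUDA and cuDNN shared libraries can actually be loaded from the Python environment inside the container. The sketch below uses only the standard-library `ctypes` module. The library names in `CANDIDATE_LIBS` are assumptions (typical names for CUDA 11.x and cuDNN 8, in line with the conda packages installed above) and may need adjusting to whatever the installed TensorFlow build expects; `check_libs` is a hypothetical helper, not part of the repo.

```python
import ctypes

# Assumed library names for CUDA 11.x / cuDNN 8; adjust to the installed TensorFlow build.
CANDIDATE_LIBS = ["libcuda.so.1", "libcudart.so.11.0", "libcudnn.so.8"]

def check_libs(names):
    """Map each library name to True if the dynamic loader can open it."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Raised when the loader cannot find or open the library.
            results[name] = False
    return results

if __name__ == "__main__":
    for lib, ok in check_libs(CANDIDATE_LIBS).items():
        print(f"{lib}: {'found' if ok else 'NOT found'}")
```

If a library is reported as not found even though conda installed it, the loader search path (for example `LD_LIBRARY_PATH`) is one thing worth checking.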


xychelsea · 2021-07-29T16:52:18Z

OK, thank you. I am updating the repository for v0.4. Sorry for taking so long to get back to you.
