This repository has been archived by the owner on Dec 10, 2023. It is now read-only.

CUDNN not installed #14

Open
seaniedan opened this issue Jun 3, 2021 · 1 comment

Comments

@seaniedan
Contributor

Again, this might be a problem with how I'm using the repo (there are no instructions), so I'm pasting the Bash terminal output.

Everything works great (ffmpeg, sorting scripts) until I try to train:

(deepfacelab) anaconda@8edf84a647e9:~/scripts$ ./6_train_Quick96_no_preview.sh 
Running trainer.

[new] No saved models found. Enter a name of a new model : 
new

Model first run.

Choose one or several GPU idxs (separated by comma).

[CPU] : CPU
  [0] : Quadro RTX 5000

[0] Which GPU indexes to choose? : 
0

Initializing models:   0%|                                                                                                               | 0/5 [00:00<?, ?it/s]
Error: No OpKernel was registered to support Op 'DepthToSpace' used by node DepthToSpace (defined at /deepfacelab/core/leras/ops/__init__.py:336)  with these attrs: [data_format="NCHW", block_size=2, T=DT_FLOAT]
Registered devices: [CPU]
Registered kernels:
  device='GPU'; T in [DT_QINT8]
  device='GPU'; T in [DT_HALF]
  device='GPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_VARIANT]; data_format in ["NHWC"]
  device='CPU'; T in [DT_RESOURCE]; data_format in ["NHWC"]
  device='CPU'; T in [DT_STRING]; data_format in ["NHWC"]
  device='CPU'; T in [DT_BOOL]; data_format in ["NHWC"]
  device='CPU'; T in [DT_COMPLEX128]; data_format in ["NHWC"]
  device='CPU'; T in [DT_COMPLEX64]; data_format in ["NHWC"]
  device='CPU'; T in [DT_DOUBLE]; data_format in ["NHWC"]
  device='CPU'; T in [DT_FLOAT]; data_format in ["NHWC"]
  device='CPU'; T in [DT_BFLOAT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_HALF]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT32]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT8]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT8]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT32]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT64]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT64]; data_format in ["NHWC"]

	 [[DepthToSpace]]

Errors may have originated from an input operation.
Input Source operations connected to node DepthToSpace:
 LeakyRelu_4 (defined at /deepfacelab/core/leras/archis/DeepFakeArchi.py:58)
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)
  File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1358, in _run_fn
    self._extend_graph()
  File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1398, in _extend_graph
    tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'DepthToSpace' used by {{node DepthToSpace}} with these attrs: [data_format="NCHW", block_size=2, T=DT_FLOAT]
Registered devices: [CPU]
Registered kernels:
  device='GPU'; T in [DT_QINT8]
  device='GPU'; T in [DT_HALF]
  device='GPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_VARIANT]; data_format in ["NHWC"]
  device='CPU'; T in [DT_RESOURCE]; data_format in ["NHWC"]
  device='CPU'; T in [DT_STRING]; data_format in ["NHWC"]
  device='CPU'; T in [DT_BOOL]; data_format in ["NHWC"]
  device='CPU'; T in [DT_COMPLEX128]; data_format in ["NHWC"]
  device='CPU'; T in [DT_COMPLEX64]; data_format in ["NHWC"]
  device='CPU'; T in [DT_DOUBLE]; data_format in ["NHWC"]
  device='CPU'; T in [DT_FLOAT]; data_format in ["NHWC"]
  device='CPU'; T in [DT_BFLOAT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_HALF]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT32]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT8]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT8]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT32]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT64]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT64]; data_format in ["NHWC"]

	 [[DepthToSpace]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/deepfacelab/mainscripts/Trainer.py", line 46, in trainerThread
    model = models.import_model(model_class_name)(
  File "/usr/local/deepfacelab/models/ModelBase.py", line 189, in __init__
    self.on_initialize()
  File "/usr/local/deepfacelab/models/Model_Quick96/Model.py", line 222, in on_initialize
    model.init_weights()
  File "/usr/local/deepfacelab/core/leras/layers/Saveable.py", line 104, in init_weights
    nn.init_weights(self.get_weights())
  File "/usr/local/deepfacelab/core/leras/ops/__init__.py", line 48, in init_weights
    nn.tf_sess.run (ops)
  File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 967, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1190, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1368, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/usr/local/anaconda3/envs/deepfacelab/lib/python3.8/site-packages/tensorflow/python/client/session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'DepthToSpace' used by node DepthToSpace (defined at /deepfacelab/core/leras/ops/__init__.py:336)  with these attrs: [data_format="NCHW", block_size=2, T=DT_FLOAT]
Registered devices: [CPU]
Registered kernels:
  device='GPU'; T in [DT_QINT8]
  device='GPU'; T in [DT_HALF]
  device='GPU'; T in [DT_FLOAT]
  device='CPU'; T in [DT_VARIANT]; data_format in ["NHWC"]
  device='CPU'; T in [DT_RESOURCE]; data_format in ["NHWC"]
  device='CPU'; T in [DT_STRING]; data_format in ["NHWC"]
  device='CPU'; T in [DT_BOOL]; data_format in ["NHWC"]
  device='CPU'; T in [DT_COMPLEX128]; data_format in ["NHWC"]
  device='CPU'; T in [DT_COMPLEX64]; data_format in ["NHWC"]
  device='CPU'; T in [DT_DOUBLE]; data_format in ["NHWC"]
  device='CPU'; T in [DT_FLOAT]; data_format in ["NHWC"]
  device='CPU'; T in [DT_BFLOAT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_HALF]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT32]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT8]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT8]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT16]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT32]; data_format in ["NHWC"]
  device='CPU'; T in [DT_INT64]; data_format in ["NHWC"]
  device='CPU'; T in [DT_UINT64]; data_format in ["NHWC"]

	 [[DepthToSpace]]

Errors may have originated from an input operation.
Input Source operations connected to node DepthToSpace:
 LeakyRelu_4 (defined at /deepfacelab/core/leras/archis/DeepFakeArchi.py:58)
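
The line in the output above that matters most is `Registered devices: [CPU]`: TensorFlow lists GPU kernels for `DepthToSpace`, but no GPU device was registered in the session, so the `NCHW` variant of the op has nowhere to run. As a minimal sketch (the helper name `registered_devices` is hypothetical, and the sample text is an abbreviated excerpt of the log above), the device list can be pulled out of such an error message like this:

```python
import re


def registered_devices(error_text):
    """Return the device names from the first 'Registered devices: [...]' line."""
    match = re.search(r"Registered devices:\s*\[([^\]]*)\]", error_text)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",") if name.strip()]


# Abbreviated excerpt of the error message shown above.
sample = (
    "Registered devices: [CPU]\n"
    "Registered kernels:\n"
    "  device='GPU'; T in [DT_FLOAT]\n"
)

devices = registered_devices(sample)
print(devices)                              # ['CPU']
print("GPU registered:", "GPU" in devices)  # GPU registered: False
```

If `GPU` is missing from that list, the problem is likely in how TensorFlow finds the CUDA and cuDNN libraries rather than in the model code itself.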

I have CUDA in Docker:

(deepfacelab) anaconda@8edf84a647e9:~/scripts$ nvidia-smi
Thu Jun  3 08:51:38 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.80       Driver Version: 460.80       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 5000     Off  | 00000000:65:00.0  On |                  Off |
| 34%   34C    P8    18W / 230W |    853MiB / 16124MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|

Before running the above command, I installed CUDNN inside Docker:

(deepfacelab) anaconda@8edf84a647e9:~/scripts$ conda install -c conda-forge cudnn
Collecting package metadata (current_repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.9.2
  latest version: 4.10.1

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /usr/local/anaconda3/envs/deepfacelab

  added / updated specs:
    - cudnn


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2021.5.30  |       ha878542_0         136 KB  conda-forge
    certifi-2021.5.30          |   py38h578d9bd_0         141 KB  conda-forge
    cudatoolkit-11.2.2         |       he111cf0_8       877.3 MB  conda-forge
    cudnn-8.1.0.77             |       h90431f1_0       634.8 MB  conda-forge
    openssl-1.1.1k             |       h7f98852_0         2.1 MB  conda-forge
    ------------------------------------------------------------
                                           Total:        1.48 GB

The following NEW packages will be INSTALLED:

  cudatoolkit        conda-forge/linux-64::cudatoolkit-11.2.2-he111cf0_8
  cudnn              conda-forge/linux-64::cudnn-8.1.0.77-h90431f1_0

The following packages will be UPDATED:

  ca-certificates                      2020.12.5-ha878542_0 --> 2021.5.30-ha878542_0
  certifi                          2020.12.5-py38h578d9bd_1 --> 2021.5.30-py38h578d9bd_0
  openssl                                 1.1.1j-h7f98852_0 --> 1.1.1k-h7f98852_0


Proceed ([y]/n)? y


Downloading and Extracting Packages
cudatoolkit-11.2.2   | 877.3 MB  | #################################################################################################################### | 100% 
openssl-1.1.1k       | 2.1 MB    | #################################################################################################################### | 100% 
certifi-2021.5.30    | 141 KB    | #################################################################################################################### | 100% 
ca-certificates-2021 | 136 KB    | #################################################################################################################### | 100% 
cudnn-8.1.0.77       | 634.8 MB  | #################################################################################################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: \ By downloading and using the CUDA Toolkit conda packages, you accept the terms and conditions of the CUDA End User License Agreement (EULA): https://docs.nvidia.com/cuda/eula/index.html

- By downloading and using the cuDNN conda packages, you accept the terms and conditions of the NVIDIA cuDNN EULA -
  https://docs.nvidia.com/deeplearning/cudnn/sla/index.html

done

I believe the repo installs CUDA correctly but doesn't install CUDNN. These are requirements; I was looking here for the error. Any advice or help is appreciated!
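
One way to narrow this down is to check whether the CUDA runtime and cuDNN shared libraries can actually be loaded from inside the container's Python environment. The sketch below uses only the standard library. The library file names are assumptions based on the `cudatoolkit-11.2.2` and `cudnn-8.1.0.77` packages shown in the conda output above, and the TensorFlow build in the image may expect different versions; the helper names `can_load` and `check_libs` are hypothetical.

```python
import ctypes

# Assumed sonames for CUDA 11.x and cuDNN 8.x; adjust to whatever the
# installed TensorFlow build actually expects.
CANDIDATE_LIBS = ["libcudart.so.11.0", "libcudnn.so.8"]


def can_load(name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


def check_libs(names=CANDIDATE_LIBS):
    """Map each library name to whether it could be loaded."""
    return {name: can_load(name) for name in names}


results = check_libs()
for name, ok in results.items():
    print(f"{name}: {'found' if ok else 'NOT found'}")
```

A `NOT found` result does not prove the package is missing; it may only mean the library is not on the loader's search path (for example, not in `LD_LIBRARY_PATH`). If the installed TensorFlow is version 2.1 or later, `tf.config.list_physical_devices("GPU")` gives a more direct answer about whether TensorFlow itself sees the GPU.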

@xychelsea
Owner

OK, thank you. I am updating the repository for v0.4. Sorry for taking so long to get back to you.
