
Assertion `graph->check_support(cudnn_handle).is_good()' failed #366

Open
wfoy opened this issue May 6, 2024 · 18 comments

Comments

@wfoy commented May 6, 2024

I'm getting the following error when running `./train_gpt2cu` after building with `make train_gpt2cu USE_CUDNN=1`:

allocated 237 MiB for model parameters
allocated 1703 MiB for activations
train_gpt2cu: train_gpt2.cu:582: auto lookup_cache_or_build_graph_fwd(Args ...) [with Args = {int, int, int, int, bool}]: Assertion `graph->check_support(cudnn_handle).is_good()' failed.
[ip-172-31-71-31:07018] *** Process received signal ***
[ip-172-31-71-31:07018] Signal: Aborted (6)
[ip-172-31-71-31:07018] Signal code:  (-6)
[ip-172-31-71-31:07018] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7d3327442520]
[ip-172-31-71-31:07018] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7d33274969fc]
[ip-172-31-71-31:07018] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7d3327442476]
[ip-172-31-71-31:07018] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7d33274287f3]
[ip-172-31-71-31:07018] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x2871b)[0x7d332742871b]
[ip-172-31-71-31:07018] [ 5] /lib/x86_64-linux-gnu/libc.so.6(+0x39e96)[0x7d3327439e96]
[ip-172-31-71-31:07018] [ 6] ./train_gpt2cu(+0xc09cb)[0x5f73a90349cb]
[ip-172-31-71-31:07018] [ 7] ./train_gpt2cu(+0x2a0a2)[0x5f73a8f9e0a2]
[ip-172-31-71-31:07018] [ 8] ./train_gpt2cu(+0x2b543)[0x5f73a8f9f543]
[ip-172-31-71-31:07018] [ 9] ./train_gpt2cu(+0x15a64)[0x5f73a8f89a64]
[ip-172-31-71-31:07018] [10] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7d3327429d90]
[ip-172-31-71-31:07018] [11] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7d3327429e40]
[ip-172-31-71-31:07018] [12] ./train_gpt2cu(+0x177e5)[0x5f73a8f8b7e5]
[ip-172-31-71-31:07018] *** End of error message ***
[1]    7018 IOT instruction (core dumped)  ./train_gpt2cu

I'm running CUDA 12.4 on Ubuntu 22.04.
Any help or pointers would be great, thanks!

@Anerudhan (Contributor)

Can you add which GPU device and cuDNN version you are using?
A log with `CUDNN_LOGLEVEL_DBG=3` would be useful for debugging as well.

https://docs.nvidia.com/deeplearning/cudnn/latest/reference/troubleshooting.html
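For reference, a run with verbose logging enabled might look like the following sketch (the binary name is the one from this thread; `CUDNN_LOGDEST_DBG` is the companion variable that chooses where the log goes, i.e. stdout, stderr, or a file):

```shell
# Sketch: enable verbose cuDNN logging for one run and capture it in a file.
# CUDNN_LOGLEVEL_DBG=3 turns on info, warning, and error messages;
# CUDNN_LOGDEST_DBG selects the destination (stdout, stderr, or a filename).
CUDNN_LOGLEVEL_DBG=3 CUDNN_LOGDEST_DBG=cudnn.log ./train_gpt2cu
```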

@wfoy (Author) commented May 6, 2024

Fixed by upgrading the cuDNN version; I was previously on 8.9.2, which broke with the above error.
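For anyone checking which cuDNN version is actually installed before upgrading, one hedged approach on Linux is to read the version macros from the header (header locations vary by distro and install method, so both paths below are assumptions to adjust for your system):

```shell
# Print the cuDNN version macros (MAJOR/MINOR/PATCHLEVEL) from the version
# header, trying two common install locations.
grep -A2 "#define CUDNN_MAJOR" /usr/include/cudnn_version.h 2>/dev/null \
  || grep -A2 "#define CUDNN_MAJOR" /usr/include/x86_64-linux-gnu/cudnn_version.h
```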

@ifromeast

After compiling with `make train_gpt2cu USE_CUDNN=1` and running `./train_gpt2cu`, I get the following error:

+-----------------------+----------------------------------------------------+
| Parameter             | Value                                              |
+-----------------------+----------------------------------------------------+
| input dataset prefix  | data/tiny_shakespeare                              |
| output log file       | NULL                                               |
| batch size B          | 4                                                  |
| sequence length T     | 1024                                               |
| learning rate         | 3.000000e-04                                       |
| max_steps             | -1                                                 |
| val_loss_every        | 20                                                 |
| val_max_batches       | 20                                                 |
| sample_every          | 20                                                 |
| genT                  | 64                                                 |
| overfit_single_batch  | 0                                                  |
| use_master_weights    | enabled                                            |
+-----------------------+----------------------------------------------------+
| device                | NVIDIA GeForce RTX 4090                            |
| precision             | BF16                                               |
+-----------------------+----------------------------------------------------+
| load_filename         | gpt2_124M_bf16.bin                                 |
| max_sequence_length T | 1024                                               |
| vocab_size V          | 50257                                              |
| padded_vocab_size Vp  | 50304                                              |
| num_layers L          | 12                                                 |
| num_heads NH          | 12                                                 |
| channels C            | 768                                                |
| num_parameters        | 124475904                                          |
+-----------------------+----------------------------------------------------+
| train_num_batches     | 74                                                 |
| val_num_batches       | 20                                                 |
+-----------------------+----------------------------------------------------+
| num_processes         | 1                                                  |
+-----------------------+----------------------------------------------------+
num_parameters: 124475904 ==> bytes: 248951808
allocated 237 MiB for model parameters
allocated 1703 MiB for activations
[CUDNN ERROR] at file cudnn_att.cpp:141:
[cudnn_frontend] Error: No execution plans built successfully.

My CUDA version is 12.4, cuDNN is 9.1, and cudnn-frontend is 1.4.0, on Ubuntu 22.04.

@Anerudhan (Contributor) commented May 13, 2024

Hi @ifromeast

Is it possible for you to dump the cudnn log?

If you set `export CUDNN_LOGLEVEL_DBG=3`, the log will be dumped to your stdout.

The log will look something like this:

I! CuDNN (v90100 70) function cudnnCreate() called:
i!     handle: location=host; addr=0x563c9b4a01a0;
i! Time: 2024-05-13T07:49:20.051230 (0d+0h+0m+0s since start)
i! Process=975; Thread=975; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v90100 70) function cudnnGraphLibraryConfigInit() called:
i!     apiLog: type=cudnnLibConfig_t; val=CUDNN_STANDARD;
i! Time: 2024-05-13T07:49:20.051266 (0d+0h+0m+0s since start)
i! Process=975; Thread=975; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v90100 70) function cudnnGetVersion() called:
i! Time: 2024-05-13T07:49:20.216976 (0d+0h+0m+0s since start)
i! Process=975; Thread=975; GPU=NULL; Handle=NULL; StreamId=NULL.

I am able to run the exact same configuration locally.

+-----------------------+----------------------------------------------------+
| Parameter             | Value                                              |
+-----------------------+----------------------------------------------------+
| input dataset prefix  | data/tiny_shakespeare                              |
| output log file       | NULL                                               |
| batch size B          | 4                                                  |
| sequence length T     | 1024                                               |
| learning rate         | 3.000000e-04                                       |
| max_steps             | -1                                                 |
| val_loss_every        | 20                                                 |
| val_max_batches       | 20                                                 |
| sample_every          | 20                                                 |
| genT                  | 64                                                 |
| overfit_single_batch  | 0                                                  |
| use_master_weights    | enabled                                            |
+-----------------------+----------------------------------------------------+
| device                | NVIDIA GeForce RTX 4090                            |
| precision             | BF16                                               |
+-----------------------+----------------------------------------------------+
| load_filename         | gpt2_124M_bf16.bin                                 |
| max_sequence_length T | 1024                                               |
| vocab_size V          | 50257                                              |
| padded_vocab_size Vp  | 50304                                              |
| num_layers L          | 12                                                 |
| num_heads NH          | 12                                                 |
| channels C            | 768                                                |
| num_parameters        | 124475904                                          |
+-----------------------+----------------------------------------------------+
| train_num_batches     | 74                                                 |
| val_num_batches       | 20                                                 |
+-----------------------+----------------------------------------------------+
| num_processes         | 1                                                  |
+-----------------------+----------------------------------------------------+
num_parameters: 124475904 ==> bytes: 248951808
allocated 237 MiB for model parameters
allocated 1703 MiB for activations
val loss 4.505090
allocated 237 MiB for parameter gradients
allocated 30 MiB for activation gradients
allocated 474 MiB for AdamW optimizer state m
allocated 474 MiB for AdamW optimizer state v
allocated 474 MiB for master copy of params
step    1/74: train loss 4.370480 (acc 4.370480) (298.699646 ms, 13712.771484 tok/s)
step    2/74: train loss 4.502850 (acc 4.502850) (34.138111 ms, 119983.187500 tok/s)
step    3/74: train loss 4.414629 (acc 4.414629) (34.011135 ms, 120212.890625 tok/s)
step    4/74: train loss 3.958204 (acc 3.958204) (34.105343 ms, 120172.781250 tok/s)
step    5/74: train loss 3.607100 (acc 3.607100) (34.020351 ms, 120233.632812 tok/s)
step    6/74: train loss 3.782271 (acc 3.782271) (34.085888 ms, 120218.898438 tok/s)

@ifromeast

Hi @Anerudhan, thank you so much for your advice to print the log. I got:

E! CuDNN (v90101 17) function cudnnBackendFinalize() called:
e!     Info: Traceback contains 4 message(s)
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: rtc->loadModule()
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: ptr.isSupported()
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: engine_post_checks(*engine_iface, engine.getPerfKnobs(), req_size, engine.getTargetSMCount())
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: finalize_internal()
e! Time: 2024-05-13T16:09:12.824746 (0d+0h+0m+2s since start)
e! Process=781621; Thread=781621; GPU=NULL; Handle=NULL; StreamId=NULL.


I! CuDNN (v90101 17) function cudnnGetErrorString() called:
i!     status: type=int; val=5000;
i! Time: 2024-05-13T16:09:12.824882 (0d+0h+0m+2s since start)
i! Process=781621; Thread=781621; GPU=NULL; Handle=NULL; StreamId=NULL.

Do you know why this happens? I am new to CUDA. Thank you so much!

@Anerudhan (Contributor)

Could be a driver or toolkit issue. What driver version are you on?

nvidia-smi
Mon May 13 08:31:45 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+

Update instructions: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network

@ifromeast

This is my driver version:

Mon May 13 16:38:13 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4090        Off |   00000000:00:08.0 Off |                  Off |
| 30%   28C    P8             22W /  450W |      11MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 4090        Off |   00000000:00:09.0 Off |                  Off |
| 30%   27C    P8             18W /  450W |      11MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

@ifromeast

@Anerudhan cudnn-frontend was updated last week; have you updated it?

@ifromeast

Similarly, the error occurs when running the standalone attention kernel:

(llm-env) root@ubuntu22:~/llm.c/dev/cuda# nvcc -I../../cudnn-frontend/include -DENABLE_CUDNN -O3 --use_fast_math -lcublas -lcublasLt -lcudnn attention_forward.cu -o attention_forward
(llm-env) root@ubuntu22:~/llm.c/dev/cuda# ./attention_forward 10
enable_tf32: 1
Using kernel 10
Checking block size 32.
attention_forward: attention_forward.cu:1143: auto lookup_cache_or_build_graph_fwd(Args ...) [with Args = {int, int, int, int, bool}]: Assertion `graph->check_support(cudnn_handle).is_good()' failed.

Is there anything wrong with my cuDNN or cudnn-frontend?

@Anerudhan (Contributor)

Hi @ifromeast,
I am still trying to reproduce the issue (yes, I have the latest cudnn-frontend and cuDNN).

This does not look like a cudnn issue. I suspect this happens because of multi-GPU setup.

Is it possible for you to try two scenarios?
a) Try setting `CUDA_VISIBLE_DEVICES=0,-1,1` and check whether the execution is successful for you.
b) (Independent of the case above) Try setting `CUDA_MODULE_LOADING=EAGER` and `CUDA_MODULE_DATA_LOADING=EAGER`.

Thanks
Anerudhan
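Spelled out as commands, the two scenarios above would be run roughly like this (a sketch, assuming the `train_gpt2cu` binary from this thread):

```shell
# a) Restrict which devices CUDA sees, as suggested above.
CUDA_VISIBLE_DEVICES=0,-1,1 ./train_gpt2cu

# b) Independently of (a), force eager CUDA module loading.
CUDA_MODULE_LOADING=EAGER CUDA_MODULE_DATA_LOADING=EAGER ./train_gpt2cu
```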

@simonguozirui

I am having the exact same issue as @ifromeast. My CUDA version is 12.4, cuDNN version is 9.1.1.17-1, and cudnn-frontend is 1.4.0, on Debian 11.

@Anerudhan (Contributor)

Hi @simonguozirui, is it on multi-GPU 4090 as well?

Is it possible for you to try two scenarios?
a) Try setting `CUDA_VISIBLE_DEVICES=0,-1,1` and check whether the execution is successful for you.

b) (Independent of the case above) Try setting `CUDA_MODULE_LOADING=EAGER` and `CUDA_MODULE_DATA_LOADING=EAGER`.

Thanks

@simonguozirui

Hey @Anerudhan! Thanks so much for the suggestion. I tried both of those, but unfortunately neither changes the behavior. I am on a T4 GPU (single-GPU setup). Things break for me at `graph->check_support(cudnn_handle)` as well.

Curious which cuDNN and frontend versions you are using, so I can reference them while debugging.

@Anerudhan (Contributor)

I am using cudnn-frontend 1.4.0 and CUDA 12.4 (I have CUDA 12.3 installed as well, for debugging).

I think the issue is that the cuDNN SDPA operation is not supported on T4 (which is Turing; it requires Ampere or later GPUs). If you run with `export CUDNN_LOGLEVEL_DBG=2`, you will see more helpful messages.

Thanks
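A quick way to check whether a GPU meets that requirement is to query its compute capability (Ampere is compute capability 8.x, while the T4 is Turing at 7.5). The `compute_cap` query field needs a reasonably recent driver, so treat this as a sketch:

```shell
# Print each GPU's name and compute capability; SDPA via cuDNN reportedly
# requires compute capability >= 8.0 (Ampere or later).
nvidia-smi --query-gpu=name,compute_cap --format=csv
```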

@simonguozirui

@Anerudhan thanks, I will try on an Ampere GPU too.
With the new log level I see some messages like
`i! descriptor: type=CUDNN_BACKEND_ENGINEHEUR_DESCRIPTOR; val=NOT_IMPLEMENTED;`. Curious if you know what might be causing that.

@Anerudhan (Contributor)

Those are info messages (i!) and harmless, as they capture the library state. I would be more interested in messages that are warnings (w!) or errors (e!).

@simonguozirui

Hi @Anerudhan, I checked: no errors (e!), only one warning (w!). Here it is:

W! CuDNN (v90101 17) function cudnnBackendFinalize() called:
w!     Info: Traceback contains 2 message(s)
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: userGraph->getEntranceNodesSize() != 2
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: numUserNodes != 5 && numUserNodes != 6
w! Time: 2024-05-16T18:12:27.288935 (0d+0h+0m+0s since start)
w! Process=349188; Thread=349188; GPU=NULL; Handle=NULL; StreamId=NULL.

Anerudhan added a commit to NVIDIA/cudnn-frontend that referenced this issue May 16, 2024
karpathy/llm.c#366 (comment)
the error was seen on `rtc->loadModule()`. Adding cudaGetLastError() to
capture the associated cudaError.
@h53 commented May 27, 2024

Same error as @ifromeast; by the way, I tested it on WSL.

(base) h53@Nyx:~/repo/llm.c$ make train_gpt2cu USE_CUDNN=1
---------------------------------------------
✓ cuDNN found, will run with flash-attention
✓ OpenMP found
✗ OpenMPI is not found, disabling multi-GPU support
---> On Linux you can try install OpenMPI with `sudo apt install openmpi-bin openmpi-doc libopenmpi-dev`
✓ nvcc found, including GPU/CUDA support
---------------------------------------------
/usr/local/cuda-12.3/bin/nvcc -O3 -t=0 --use_fast_math --generate-code arch=compute_89,code=[compute_89,sm_89] -DENABLE_CUDNN -DENABLE_BF16 train_gpt2.cu cudnn_att.o -lcublas -lcublasLt -lcudnn -I/home/h53/cudnn-frontend/include  -o train_gpt2cu 
(base) h53@Nyx:~/repo/llm.c$ ./train_gpt2cu 
Multi-GPU support is disabled. Using a single GPU.
+-----------------------+----------------------------------------------------+
| Parameter             | Value                                              |
+-----------------------+----------------------------------------------------+
| train data pattern    | dev/data/tinyshakespeare/tiny_shakespeare_train.bin |
| val data pattern      | dev/data/tinyshakespeare/tiny_shakespeare_val.bin  |
| output log dir        | NULL                                               |
| checkpoint_every      | 0                                                  |
| resume                | 0                                                  |
| micro batch size B    | 4                                                  |
| sequence length T     | 1024                                               |
| total batch size      | 4096                                               |
| learning rate (LR)    | 3.000000e-04                                       |
| warmup iterations     | 0                                                  |
| final LR fraction     | 1.000000e+00                                       |
| weight decay          | 0.000000e+00                                       |
| grad_clip             | 1.000000e+00                                       |
| max_steps             | -1                                                 |
| val_loss_every        | 20                                                 |
| val_max_steps         | 20                                                 |
| sample_every          | 20                                                 |
| genT                  | 64                                                 |
| overfit_single_batch  | 0                                                  |
| use_master_weights    | enabled                                            |
| recompute             | 1                                                  |
+-----------------------+----------------------------------------------------+
| device                | NVIDIA GeForce RTX 4060 Ti                         |
| precision             | BF16                                               |
+-----------------------+----------------------------------------------------+
| load_filename         | gpt2_124M_bf16.bin                                 |
| max_sequence_length T | 1024                                               |
| vocab_size V          | 50257                                              |
| padded_vocab_size Vp  | 50304                                              |
| num_layers L          | 12                                                 |
| num_heads NH          | 12                                                 |
| channels C            | 768                                                |
| num_parameters        | 124475904                                          |
+-----------------------+----------------------------------------------------+
| train_num_batches     | 74                                                 |
| val_num_batches       | 20                                                 |
+-----------------------+----------------------------------------------------+
| run hellaswag         | no                                                 |
+-----------------------+----------------------------------------------------+
| Zero Optimization is disabled                                              |
| num_processes         | 1                                                  |
| zero_stage            | 0                                                  |
+-----------------------+----------------------------------------------------+
HellaSwag eval not found at dev/data/hellaswag/hellaswag_val.bin, skipping its evaluation
You can run `python dev/data/hellaswag.py` to export and use it with `-h 1`.
num_parameters: 124475904 => bytes: 248951808
allocated 237 MiB for model parameters
batch_size B=4 * seq_len T=1024 * num_processes=1 and total_batch_size=4096
=> setting grad_accum_steps=1
allocating 1439 MiB for activations

W! CuDNN (v90101 17) function cudnnBackendFinalize() called:
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: userGraph->getEntranceNodesSize() != 2
w!         Warning: CUDNN_STATUS_NOT_SUPPORTED; Reason: numUserNodes != 5 && numUserNodes != 6
w! Time: 2024-05-28T07:01:07.380708 (0d+0h+0m+0s since start)
w! Process=210452; Thread=210452; GPU=NULL; Handle=NULL; StreamId=NULL.


E! CuDNN (v90101 17) function cudnnBackendFinalize() called:
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: rtc->loadModule()
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: ptr.isSupported()
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: engine_post_checks(*engine_iface, engine.getPerfKnobs(), req_size, engine.getTargetSMCount())
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: finalize_internal()
e! Time: 2024-05-28T07:01:07.741508 (0d+0h+0m+0s since start)
e! Process=210452; Thread=210452; GPU=NULL; Handle=NULL; StreamId=NULL.


E! CuDNN (v90101 17) function cudnnBackendFinalize() called:
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: rtc->loadModule()
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: ptr.isSupported()
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: engine_post_checks(*engine_iface, engine.getPerfKnobs(), req_size, engine.getTargetSMCount())
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: finalize_internal()
e! Time: 2024-05-28T07:01:07.960496 (0d+0h+0m+0s since start)
e! Process=210452; Thread=210452; GPU=NULL; Handle=NULL; StreamId=NULL.


E! CuDNN (v90101 17) function cudnnBackendFinalize() called:
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: rtc->loadModule()
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: ptr.isSupported()
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: engine_post_checks(*engine_iface, engine.getPerfKnobs(), req_size, engine.getTargetSMCount())
e!         Error: CUDNN_STATUS_EXECUTION_FAILED; Reason: finalize_internal()
e! Time: 2024-05-28T07:01:08.192814 (0d+0h+0m+1s since start)
e! Process=210452; Thread=210452; GPU=NULL; Handle=NULL; StreamId=NULL.

[CUDNN ERROR] at file cudnn_att.cpp:141:
[cudnn_frontend] Error: No execution plans built successfully.
(base) h53@Nyx:~/repo/llm.c$ nvidia-smi
Tue May 28 07:02:59 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.65                 Driver Version: 551.86         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   37C    P8              8W /  165W |    1231MiB /  16380MiB |      3%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A        33      G   /Xwayland                                   N/A      |
+-----------------------------------------------------------------------------------------+
(base) h53@Nyx:~/repo/llm.c$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Nov__3_17:16:49_PDT_2023
Cuda compilation tools, release 12.3, V12.3.103
Build cuda_12.3.r12.3/compiler.33492891_0


5 participants