Assertion `graph->check_support(cudnn_handle).is_good()' failed #366
Comments
Can you add which GPU device and cuDNN version you are using? https://docs.nvidia.com/deeplearning/cudnn/latest/reference/troubleshooting.html
Fixed by upgrading the cuDNN version; I was previously on 8.9.2, which broke with the above error.
After compiling, I hit this error; my CUDA is 12.4, cuDNN is 9.1, and cudnn-frontend is 1.4.0 on Ubuntu 22.04.
Hi @ifromeast, is it possible for you to dump the cuDNN log? If you set the cuDNN logging environment variables, the log will look something like this:
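For reference, a minimal sketch of turning on cuDNN's text logging via environment variables; the variable names below are the cuDNN 9.x ones (older 8.x releases used `CUDNN_LOGINFO_DBG`/`CUDNN_LOGDEST_DBG` instead), and `./train_gpt2cu` stands in for whatever binary is failing:

```shell
# Enable verbose cuDNN logging (cuDNN 9.x naming):
# CUDN N log levels: 0=off, 1=errors, 2=+warnings, 3=+info
export CUDNN_LOGLEVEL_DBG=3
# Send the log to a file; stdout/stderr also work as destinations
export CUDNN_LOGDEST_DBG=cudnn.log

# Then re-run the failing binary, e.g.:
# ./train_gpt2cu
```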
I am able to run the exact same configuration locally.
Hi @Anerudhan, thank you so much for your advice to print the log, and here is what I got.
Do you know why this happens? I am new to CUDA, thank you so much!
Could be a driver or toolkit issue. Which driver version are you on?
Update instructions: https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=22.04&target_type=deb_network
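A quick way to answer the driver/toolkit question, assuming `nvidia-smi` and `nvcc` are on the PATH (the query field names come from `nvidia-smi --help-query-gpu`); the guards let the script run cleanly on a machine without an NVIDIA driver:

```shell
check_gpu_env() {
  # Report driver version and GPU name, if the NVIDIA driver is installed
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=driver_version,name --format=csv
  else
    echo "nvidia-smi: not found"
  fi
  # Report the CUDA toolkit (compiler) version, if installed
  if command -v nvcc >/dev/null 2>&1; then
    nvcc --version
  else
    echo "nvcc: not found"
  fi
}
check_gpu_env
```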
This is my driver version:
@Anerudhan cudnn-frontend was updated last week, have you updated it?
Similarly, the error occurs when
Is there anything wrong with my cuDNN or cudnn-frontend?
Hi @ifromeast, this does not look like a cuDNN issue. I suspect it happens because of the multi-GPU setup. Is it possible for you to try two scenarios? Thanks
I am having the exact same issue as @ifromeast. My CUDA version is
Hi @simonguozirui, is it on a multi-GPU 4090 setup as well? Is it possible for you to try two scenarios: b) (independent of the case above) try setting CUDA_MODULE_LOADING=EAGER and CUDA_MODULE_DATA_LOADING=EAGER. Thanks
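A minimal sketch of scenario (b); `CUDA_MODULE_LOADING` is the documented CUDA environment variable for choosing lazy vs. eager module loading, and `CUDA_MODULE_DATA_LOADING` is taken from the comment above. `./train_gpt2cu` again stands in for the failing binary:

```shell
# Force eager (rather than lazy) CUDA module loading for this run
export CUDA_MODULE_LOADING=EAGER
export CUDA_MODULE_DATA_LOADING=EAGER

# Then re-run the failing binary, e.g.:
# ./train_gpt2cu
```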
Hey @Anerudhan! Thanks so much for the suggestion. I tried both of those, but unfortunately it doesn't change the behavior. I am on a T4 GPU (single-GPU setup). Things break for me at
Curious which cuDNN and frontend versions you are using, so I can reference and debug.
I am using cudnn-frontend 1.4.0 and CUDA 12.4 (I have CUDA 12.3 installed as well for debugging). I think the issue is that the cuDNN SDPA operation is not supported on the T4 (Turing); it requires Ampere or later GPUs. If you run with
Thanks
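One way to confirm whether a GPU meets the Ampere requirement is to check its compute capability (Turing is 7.5; Ampere starts at 8.0). A sketch assuming a reasonably recent `nvidia-smi` that supports the `compute_cap` query field, with a fallback so it runs on machines without a driver:

```shell
get_compute_cap() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # Turing (e.g. T4) reports 7.5; cuDNN's fused SDPA needs 8.0 or newer
    nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1
  else
    echo "unknown"
  fi
}
get_compute_cap
```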
@Anerudhan thanks, I will try on an Ampere GPU too.
Those are info messages (i!) and harmless; they just capture the library state. I would be more interested in messages which are warnings (w!) or errors (e!).
Hi @Anerudhan, I checked; no errors.
karpathy/llm.c#366 (comment): the error was seen on `rtc->loadModule()`. Adding `cudaGetLastError()` would capture the associated `cudaError`.
Same error as @ifromeast; btw, I tested it on WSL.
I'm getting the following error when running `./train_gpt2cu` after building using `make train_gpt2cu USE_CUDNN=1`.
I'm running CUDA 12.4 on Ubuntu 22.04.
Any help or pointers would be great, thanks!