Support of CuDNN8 #7000

Open
artyom-beilis wants to merge 3 commits into master
Conversation

artyom-beilis

Support of CuDNN8

Some of the API that Caffe used was removed in cuDNN 8. Without it, it is impossible to run Caffe on the Ampere architecture.

It required:

  • switching to the cudnnFind* API, because the cudnnGet* API Caffe relied on was removed in version 8
  • caching the search results so that the algorithm search only runs when the shape actually changes - otherwise reshape costs too much (see the sketch after this list)
  • fixing the cuDNN version detection so that cuDNN 8 is recognized
  • adding a missing error code that was introduced in version 8
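
Roughly, the caching idea looks like the sketch below (a simplified illustration, not the exact code in this PR; `AlgoKey`, `fwd_algo_cache`, and `choose_fwd_algo` are made-up names, and the backward-data and backward-filter algorithms are handled the same way):

```cpp
// Cache the algorithm chosen by the (expensive) cudnnFind* search per input
// shape, so a reshape to an unchanged shape does not repeat the search.
#include <cudnn.h>
#include <map>
#include <tuple>

struct AlgoKey {
  int n, c, h, w;  // input blob dimensions
  bool operator<(const AlgoKey& o) const {
    return std::tie(n, c, h, w) < std::tie(o.n, o.c, o.h, o.w);
  }
};

static std::map<AlgoKey, cudnnConvolutionFwdAlgo_t> fwd_algo_cache;

cudnnConvolutionFwdAlgo_t choose_fwd_algo(cudnnHandle_t handle,
                                          cudnnTensorDescriptor_t x_desc,
                                          cudnnFilterDescriptor_t w_desc,
                                          cudnnConvolutionDescriptor_t conv_desc,
                                          cudnnTensorDescriptor_t y_desc,
                                          const AlgoKey& key) {
  auto it = fwd_algo_cache.find(key);
  if (it != fwd_algo_cache.end()) {
    return it->second;  // shape unchanged: reuse the cached search result
  }
  // cuDNN 8 removed the old cudnnGetConvolutionForwardAlgorithm heuristic,
  // so run the exhaustive search once for each new shape.
  cudnnConvolutionFwdAlgoPerf_t perf[CUDNN_CONVOLUTION_FWD_ALGO_COUNT];
  int returned = 0;
  cudnnFindConvolutionForwardAlgorithm(handle, x_desc, w_desc, conv_desc,
                                       y_desc, CUDNN_CONVOLUTION_FWD_ALGO_COUNT,
                                       &returned, perf);
  cudnnConvolutionFwdAlgo_t algo = perf[0].algo;  // results sorted fastest first
  fwd_algo_cache[key] = algo;
  return algo;
}
```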

The change was tested on

  • 3070/cuda11.2/cudnn8.1
  • 1080/cuda8/cudnn7
  • 1080/cuda8/cudnn6

- switch to cudnnFind* API instead of cudnnGet* that was removed in 8
- fixed cudnn version search
- search of the algorithms happens only in case the shape really changed
@artyom-beilis
Author

Anybody here?

@borisgribkov

@artyom-beilis Thanks for your patch! I have tried it, the same as #6970, but encountered large memory utilization with cuDNN 8.
After some tests I tried a model with a single conv layer and a (20 * 3 * 1280 * 720) input; it's the "head" of a ResNet used for a detection task. With CUDA 10 and cuDNN 7.6 I observed about 1.7 GB of usage for a forward pass, and with CUDA 11 and cuDNN 8 about 2.6 GB. Maybe this comparison is not fully correct, because different GPUs were used: a Titan XP in the first case and a 3060 in the second.
Have you seen something like this with the 3070 and 1080? Thank you!
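
One way to make such a comparison concrete, independent of the GPU, is to print the workspace each forward algorithm would request for this exact problem size. A rough sketch, assuming the cuDNN descriptors are already configured for the (20 * 3 * 1280 * 720) input; `dump_fwd_workspaces` is a made-up helper name:

```cpp
#include <cudnn.h>
#include <cstddef>
#include <cstdio>

// Dump the workspace size each forward algorithm requests for one problem
// size. The descriptors are assumed to be created and configured elsewhere.
void dump_fwd_workspaces(cudnnHandle_t handle,
                         cudnnTensorDescriptor_t x_desc,
                         cudnnFilterDescriptor_t w_desc,
                         cudnnConvolutionDescriptor_t conv_desc,
                         cudnnTensorDescriptor_t y_desc) {
  for (int a = 0; a < CUDNN_CONVOLUTION_FWD_ALGO_COUNT; ++a) {
    size_t bytes = 0;
    cudnnStatus_t st = cudnnGetConvolutionForwardWorkspaceSize(
        handle, x_desc, w_desc, conv_desc, y_desc,
        static_cast<cudnnConvolutionFwdAlgo_t>(a), &bytes);
    if (st == CUDNN_STATUS_SUCCESS) {
      printf("algo %d: %.1f MB workspace\n", a, bytes / (1024.0 * 1024.0));
    }
  }
}
```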

@artyom-beilis
Author

artyom-beilis commented Oct 14, 2021 via email

AFAIR I noticed the difference in memory use of cudnn7 vs cudnn8 with other frameworks as well.

@borisgribkov

borisgribkov commented Oct 14, 2021

AFAIR I noticed the difference in memory use of cudnn7 vs cudnn8 with other frameworks as well.

Could you tell me more about the other frameworks? I have tried to find similar mentions of GPU memory problems, but without success.

@artyom-beilis
Author

AFAIR I noticed the difference in memory use of cudnn7 vs cudnn8 with other frameworks as well.

Could you tell me more about the other frameworks? I have tried to find similar mentions of GPU memory usage, but without success.

I don't really remember. It was either pytorch or mxnet; it was a long time ago.

@borisgribkov

Anyway, thank you! )

…ch, so

switched to the cudnnGet*_v7 API instead of the much heavier cudnnFind, and
query the optimal algorithm on _any_ reshape - not ignoring batch size reductions
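
For reference, the *_v7 query looks roughly like the sketch below (illustrative only; `pick_fwd_algo_v7` and `workspace_limit` are made-up names). Unlike cudnnFind*, it returns a heuristic ranking without benchmarking kernels or allocating trial workspaces:

```cpp
#include <cudnn.h>
#include <cstddef>

cudnnConvolutionFwdAlgo_t pick_fwd_algo_v7(cudnnHandle_t handle,
                                           cudnnTensorDescriptor_t x_desc,
                                           cudnnFilterDescriptor_t w_desc,
                                           cudnnConvolutionDescriptor_t conv_desc,
                                           cudnnTensorDescriptor_t y_desc,
                                           size_t workspace_limit) {
  cudnnConvolutionFwdAlgoPerf_t perf[CUDNN_CONVOLUTION_FWD_ALGO_COUNT];
  int returned = 0;
  cudnnGetConvolutionForwardAlgorithm_v7(handle, x_desc, w_desc, conv_desc,
                                         y_desc, CUDNN_CONVOLUTION_FWD_ALGO_COUNT,
                                         &returned, perf);
  // Take the best-ranked algorithm whose workspace fits the caller's limit.
  for (int i = 0; i < returned; ++i) {
    if (perf[i].status == CUDNN_STATUS_SUCCESS &&
        perf[i].memory <= workspace_limit) {
      return perf[i].algo;
    }
  }
  // Fall back to implicit GEMM, which typically needs no extra workspace.
  return CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;
}
```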
@kmmanto

kmmanto commented Jan 11, 2022

Following this, as the current Caffe I built with nvcr.io/nvidia/cuda:11.4.1-cudnn8-devel-ubuntu20.04 and OpenPose results in a much larger GPU RAM footprint on an AWS G5 (Ampere).

@BigMuscle85

BigMuscle85 commented Jan 20, 2023

I tried the proposed changes to make cuDNN 8 work, but it does not: training immediately ends with the following error:

I0120 10:25:10.763470 1539595 solver.cpp:60] Solver scaffolding done.
I0120 10:25:10.765404 1539595 caffe.cpp:239] Starting Optimization
I0120 10:25:10.765410 1539595 solver.cpp:292] Solving squeezenet-ssd
I0120 10:25:10.765413 1539595 solver.cpp:293] Learning Rate Policy: poly
F0120 10:25:10.835502 1539595 cudnn_conv_layer.cu:118] Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0)  CUDNN_STATUS_INTERNAL_ERROR
*** Check failure stack trace: ***
    @     0x7f09cdf8f1c3  google::LogMessage::Fail()
    @     0x7f09cdf9425b  google::LogMessage::SendToLog()
    @     0x7f09cdf8eebf  google::LogMessage::Flush()
    @     0x7f09cdf8f6ef  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f09ce7753f0  caffe::CuDNNConvolutionLayer<>::Backward_gpu()
    @     0x7f09ce711c6a  caffe::Net<>::BackwardFromTo()
    @     0x7f09ce711da5  caffe::Net<>::Backward()
    @     0x7f09ce6ecaab  caffe::Solver<>::Step()
    @     0x7f09ce6ed492  caffe::Solver<>::Solve()
    @     0x55739e9b4a7a  train()
    @     0x55739e9b1eac  main
    @     0x7f09cd2fb083  __libc_start_main
    @     0x55739e9b290e  _start

Ubuntu 20.04
nVidia GeForce RTX 3060 12 GB
Driver Version: 510.108.03
CUDA Version: 11.6
cuDNN Version: 8.6

A build without cuDNN runs without problems.
