
Memory requirement formula correction #27

Open
kWeissenow opened this issue Feb 11, 2020 · 2 comments


kWeissenow commented Feb 11, 2020

While reducing the alignment sizes of my current dataset so that couplings could be computed on the GPU, I noticed a large discrepancy between the formula in the README and the actual RAM needed when running CCMpred.

I know that CCMpred is no longer actively maintained, but to help fellow researchers running into the same issue, here are the corrected formulas based on the calculation in the source code (ccmpred.c, lines 437-441):
Padded:   4 * (4 * (L * L * 32 * 21 + L * 20) + N * L * 2 + N * L * 32 + N) + 2 * N * L
Unpadded: 4 * (4 * (L * L * 21 * 21 + L * 20) + N * L * 2 + N * L * 21 + N) + 2 * N * L
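
For convenience, here is a minimal C sketch of the corrected estimate. The helper name `mem_needed` mirrors the variable in ccmpred.c; the example values L = 215, N = 538462 are inferred from the run logged below (215 * 20 + 215² * 441 = 20,389,525 variables) and reproduce its printed "Needed GPU RAM" figure:

```c
#include <stdio.h>
#include <stdint.h>

/* Corrected estimate, following ccmpred.c lines 437-441.
 * L = alignment length (columns), N = number of sequences;
 * `padded` selects the 32-state padded layout used on the GPU. */
static uint64_t mem_needed(uint64_t L, uint64_t N, int padded) {
    uint64_t s = padded ? 32 : 21;
    return 4 * (4 * (L * L * s * 21 + L * 20)
                + N * L * 2 + N * L * s + N)
           + 2 * N * L;
}

int main(void) {
    /* L = 215, N = 538462: the padded estimate comes out to
     * 16,475,401,388 bytes, matching the log below. */
    printf("padded:   %llu bytes\n", (unsigned long long)mem_needed(215, 538462, 1));
    printf("unpadded: %llu bytes\n", (unsigned long long)mem_needed(215, 538462, 0));
    return 0;
}
```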

The internal `size_t mem_needed` is, however, only used for the printed output; the actual allocation happens separately across a number of different memory blocks. I'll do some further testing with samples sized to barely fit into GPU memory to see whether the CUDA allocations are equivalent.


kWeissenow commented Feb 13, 2020

Apparently, the actual GPU memory needed is still larger than indicated, leading to a crash with CUDA error 2 (out of memory).

Found 1 CUDA devices, using device #0: Tesla V100-SXM2-16GB
Total GPU RAM:     16,914,055,168
Free GPU RAM:      16,475,422,720
Needed GPU RAM:    16,475,401,388
Reweighted 538462 sequences with threshold 0.8 to Beff=226100 weight mean=0.4199, min=8.95656e-05, max=1

Will optimize 20389525 32-bit variables

iter    eval    f(x)            ‖x‖             ‖g‖             step
CUDA error No. 2 in [...]/CCMpred/lib/libconjugrad/src/conjugrad_cuda.c at line 185
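
One way to pin down where the real allocations exceed the printed estimate would be a logging wrapper around cudaMalloc. This is a hypothetical diagnostic I'd add locally, not part of CCMpred; `logged_malloc` and the tag argument are made up:

```c
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical diagnostic: log every device allocation request against
 * the free memory reported by the driver, so the running total can be
 * compared with the "Needed GPU RAM" estimate printed at startup. */
static cudaError_t logged_malloc(void **ptr, size_t bytes, const char *tag) {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    fprintf(stderr, "[alloc %-12s] request %14zu B, free %14zu B\n",
            tag, bytes, free_b);
    return cudaMalloc(ptr, bytes);
}
```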

Even when alignment sizes are reduced far enough that memory consumption stops being a problem, large MSAs still cause crashes with CUDA error 77 (illegal memory access), as shown in the example below:

Found 1 CUDA devices, using device #0: Tesla V100-SXM2-16GB
Total GPU RAM:     16,914,055,168
Free GPU RAM:      16,475,422,720
Needed GPU RAM:    12,562,797,518
Reweighted 307029 sequences with threshold 0.8 to Beff=153460 weight mean=0.499823, min=0.00118765, max=1

Will optimize 33843029 32-bit variables

iter    eval    f(x)            ‖x‖             ‖g‖             step
CUDA error No. 77 in [...]/CCMpred/src/evaluate_cuda_kernels.cu at line 590

Since this apparently has not been a common occurrence in the past, I assume the very large alignment is causing the issue. I'll try to investigate and will report back if I find the problem in the CUDA kernels.
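
For what it's worth, one plausible failure mode with alignments this large (an assumption on my part, not something confirmed in CCMpred's kernels) is a 32-bit index overflow: with N = 307029 and L ≈ 277 (inferred from the 33,843,029 variables above), a product like N * L * 32 is roughly 2.7e9 and already exceeds INT_MAX. A minimal sketch of the bug pattern and its fix:

```cuda
/* Hypothetical illustration of the overflow pattern, not CCMpred's
 * actual kernel code: an index computed in 32-bit int wraps past
 * INT_MAX once N * L * states exceeds 2^31 - 1, producing exactly
 * this kind of illegal memory access. */
__global__ void touch_msa(float *x, int N, int L) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* sequence index */
    int j = blockIdx.y;                             /* column index   */
    if (i >= N || j >= L) return;

    /* BUG: the product is evaluated in 32-bit int before widening.   */
    /* size_t bad = (i * L + j) * 32;                                 */

    /* FIX: promote to 64 bits before the first multiplication.       */
    size_t idx = ((size_t)i * L + j) * 32;
    x[idx] = 0.0f;
}
```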

@jhschwartz

Hi, I wonder if this is related to #34? I just opened it and I'm curious whether you found a solution.
