
Memory requirement formula correction #27

Open
kWeissenow opened this issue Feb 11, 2020 · 2 comments


kWeissenow commented Feb 11, 2020

While reducing the alignment sizes of my current dataset so that couplings could be computed on the GPU, I noticed a large discrepancy between the formula in the README and the actual RAM needed when running CCMpred.

I know that CCMpred is no longer actively maintained, but to help fellow researchers running into the same issue, here are the corrected formulas based on the calculation in the source code (ccmpred.c, lines 437-441):
Padded:   4 * (4 * (L * L * 32 * 21 + L * 20) + N * L * 2 + N * L * 32 + N) + 2 * N * L
Unpadded: 4 * (4 * (L * L * 21 * 21 + L * 20) + N * L * 2 + N * L * 21 + N) + 2 * N * L
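
For convenience, here is a minimal C sketch of the corrected estimate. The helper name `mem_needed` mirrors the variable in ccmpred.c; the example values L = 215, N = 538462 are inferred from the run logged below (215 * 20 + 215² * 441 = 20,389,525 variables) and reproduce its printed "Needed GPU RAM" figure:

```c
#include <stdio.h>
#include <stdint.h>

/* Corrected estimate, following ccmpred.c lines 437-441.
 * L = alignment length (columns), N = number of sequences;
 * `padded` selects the 32-state padded layout used on the GPU. */
static uint64_t mem_needed(uint64_t L, uint64_t N, int padded) {
    uint64_t s = padded ? 32 : 21;
    return 4 * (4 * (L * L * s * 21 + L * 20)
                + N * L * 2 + N * L * s + N)
           + 2 * N * L;
}

int main(void) {
    /* L = 215, N = 538462: the padded estimate comes out to
     * 16,475,401,388 bytes, matching the log below. */
    printf("padded:   %llu bytes\n", (unsigned long long)mem_needed(215, 538462, 1));
    printf("unpadded: %llu bytes\n", (unsigned long long)mem_needed(215, 538462, 0));
    return 0;
}
```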

The internal `size_t mem_needed` is, however, only used for the printed output; the actual allocation happens separately across a number of different memory blocks. I'll do some further testing with samples sized to barely fit into GPU memory to see whether the CUDA allocations are equivalent.


kWeissenow commented Feb 13, 2020

Apparently, the actual GPU memory needed is still larger than indicated, leading to a crash with CUDA error 2 (out of memory).

Found 1 CUDA devices, using device #0: Tesla V100-SXM2-16GB
Total GPU RAM:     16,914,055,168
Free GPU RAM:      16,475,422,720
Needed GPU RAM:    16,475,401,388
Reweighted 538462 sequences with threshold 0.8 to Beff=226100 weight mean=0.4199, min=8.95656e-05, max=1

Will optimize 20389525 32-bit variables

iter    eval    f(x)            ‖x‖             ‖g‖             step
CUDA error No. 2 in [...]/CCMpred/lib/libconjugrad/src/conjugrad_cuda.c at line 185
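
One way to pin down where the real allocations exceed the printed estimate would be a logging wrapper around cudaMalloc. This is a hypothetical diagnostic I'd add locally, not part of CCMpred; `logged_malloc` and the tag argument are made up:

```c
#include <stdio.h>
#include <cuda_runtime.h>

/* Hypothetical diagnostic: log every device allocation request against
 * the free memory reported by the driver, so the running total can be
 * compared with the "Needed GPU RAM" estimate printed at startup. */
static cudaError_t logged_malloc(void **ptr, size_t bytes, const char *tag) {
    size_t free_b = 0, total_b = 0;
    cudaMemGetInfo(&free_b, &total_b);
    fprintf(stderr, "[alloc %-12s] request %14zu B, free %14zu B\n",
            tag, bytes, free_b);
    return cudaMalloc(ptr, bytes);
}
```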

Even when alignment sizes are reduced far enough that memory consumption stops being a problem, large MSAs still cause crashes with CUDA error 77 (illegal memory access), as shown in the example below:

Found 1 CUDA devices, using device #0: Tesla V100-SXM2-16GB
Total GPU RAM:     16,914,055,168
Free GPU RAM:      16,475,422,720
Needed GPU RAM:    12,562,797,518
Reweighted 307029 sequences with threshold 0.8 to Beff=153460 weight mean=0.499823, min=0.00118765, max=1

Will optimize 33843029 32-bit variables

iter    eval    f(x)            ‖x‖             ‖g‖             step
CUDA error No. 77 in [...]/CCMpred/src/evaluate_cuda_kernels.cu at line 590

Since this apparently has not been a common occurrence in the past, I assume the very large alignment is causing the issue. I'll try to investigate and will report back if I find the problem in the CUDA kernels.
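
For what it's worth, one plausible failure mode with alignments this large (an assumption on my part, not something confirmed in CCMpred's kernels) is a 32-bit index overflow: with N = 307029 and L ≈ 277 (inferred from the 33,843,029 variables above), a product like N * L * 32 is roughly 2.7e9 and already exceeds INT_MAX. A minimal sketch of the bug pattern and its fix:

```cuda
/* Hypothetical illustration of the overflow pattern, not CCMpred's
 * actual kernel code: an index computed in 32-bit int wraps past
 * INT_MAX once N * L * states exceeds 2^31 - 1, producing exactly
 * this kind of illegal memory access. */
__global__ void touch_msa(float *x, int N, int L) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* sequence index */
    int j = blockIdx.y;                             /* column index   */
    if (i >= N || j >= L) return;

    /* BUG: the product is evaluated in 32-bit int before widening.   */
    /* size_t bad = (i * L + j) * 32;                                 */

    /* FIX: promote to 64 bits before the first multiplication.       */
    size_t idx = ((size_t)i * L + j) * 32;
    x[idx] = 0.0f;
}
```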

@jhschwartz

Hi, I wonder if this is related to #34? I just opened it and I'm curious whether you found a solution.
