OutOfMemoryError running 16 atoms system sci on 4 * DCU node #4124
Comments
@Religious-J Hello, can you analyze the memory cost for this test case? You can test it on CPU first.
The same 32 atoms task with
OK, I analyzed the memory cost for this test case on CPU: Also running on c64_m64_cpu machine in the
NAME-------------------------|MEMORY(MB)--------
total                          39155.9037
Psi_PW                         37558.5117
PW_B_K::gcar                     485.6704
PW_B_K::gk2                      161.8901
Force::vkb1                      118.3359
Stress::dbecp_noevc              118.3359
Stress::vkb1                     118.3359
VNL::vkb                          59.1680
Force::dbecp                      48.9375
wavefunc::wfcatom                 47.6631
DiagSub::hpsi                     47.6631
DiagSub::spsi                     47.6631
DiagSub::evctemp                  47.6631
XC_Functional::gradcorr           29.4496
Broyden_Mixing::F&DF              28.7967
Nonlocal<PW>::becp                16.3125
Nonlocal<PW>::ps                  16.3125
Force::becp                       16.3125
Stress::becp                      16.3125
Stress::dbecp                     16.3125
FFT::grid                         15.0000
XC_Functional::aux&gaux           10.6996
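To make the breakdown easier to read, here is a minimal sketch that parses a few rows of the table above and computes how much of the total is taken by `Psi_PW`. The helper name `parse_memory_table` and the shortened `MEMORY_TABLE` string are illustrative assumptions, not part of ABACUS.

```python
# Minimal sketch: compute the share of total memory used by Psi_PW.
# MEMORY_TABLE holds a few rows copied from the report above;
# parse_memory_table is a hypothetical helper, not an ABACUS API.
MEMORY_TABLE = """\
total 39155.9037
Psi_PW 37558.5117
PW_B_K::gcar 485.6704
PW_B_K::gk2 161.8901
"""


def parse_memory_table(text):
    """Map each entry name to its memory cost in MB."""
    entries = {}
    for line in text.splitlines():
        # Split on the last run of whitespace: name on the left, MB on the right.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue
        name, value = parts
        entries[name] = float(value)
    return entries


entries = parse_memory_table(MEMORY_TABLE)
psi_share = entries["Psi_PW"] / entries["total"]
print(f"Psi_PW share of total: {psi_share:.1%}")
```

In this report the wavefunction storage (`Psi_PW`) accounts for roughly 96% of the total, so it is the entry to look at first when investigating the OOM.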
I tried to run this example on Bohrium with "4 * NVIDIA GPU_16g", and it also fails with the out-of-memory error.
I use Bohrium. I also tried using two nodes on Sugon DCU, but it still raises the OOM error.
I also tried other execution commands: it seems that running on more than one node does not effectively decrease the memory allocated on the DCU. @denghuilu Is this reasonable?
Using Bohrium '4 * DCU_32g', this example runs successfully.
Could you please help check whether the following Pb task has the OOM problem?
We need to check whether all 8 DCUs were actually used when two nodes were requested.
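One way to reason about that check is sketched below: given the hostname each MPI rank reports, count how many devices can be in use on each node. This is a hypothetical helper, and it assumes that the ranks on a node are bound to distinct local devices; how ABACUS actually binds ranks to DCUs is not stated in this thread.

```python
# Minimal sketch: estimate how many devices are in use per node.
# Assumption (not confirmed by this thread): ranks on one node are bound
# to distinct local devices, so a node uses at most devices_per_node.
from collections import Counter


def device_coverage(rank_hosts, devices_per_node):
    """rank_hosts[i] is the hostname reported by MPI rank i."""
    ranks_per_node = Counter(rank_hosts)
    return {
        host: min(n_ranks, devices_per_node)
        for host, n_ranks in ranks_per_node.items()
    }


# Hypothetical placements of 8 ranks on nodes with 4 DCUs each.
spread = device_coverage(["node0"] * 4 + ["node1"] * 4, devices_per_node=4)
packed = device_coverage(["node0"] * 8, devices_per_node=4)

print("spread over two nodes:", spread, "devices used:", sum(spread.values()))
print("packed on one node:   ", packed, "devices used:", sum(packed.values()))
```

If all ranks land on a single node, only 4 devices can be used even though two nodes were requested, which would be consistent with the memory per DCU not going down.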
Details
As described in the title. The 16 atoms task with kspacing=0.05 Bohr^-1 is given by: relax_task.zip
Image: registry.dp.tech/dptech/abacus:v3.6.0
node type: 4 * DCU_16g
command: OMP_NUM_THREADS=1 mpirun -np 4 abacus_pw > log
Error msg: