Hard-to-reach convergence for some magnetic elements & abnormally slow ABACUS SCF calculations #4149

Closed
ZLI-afk opened this issue May 10, 2024 · 8 comments

ZLI-afk commented May 10, 2024


Please see the attached cases, as described in the title:

INPUT:

INPUT_PARAMETERS
calculation scf
basis_type pw
symmetry 0
ecutwfc 100              # plane-wave cutoff (Ry)
scf_thr 1e-08            # SCF charge-density convergence threshold
scf_nmax 200             # max SCF iterations
cal_force 1
cal_stress 1
kspacing 0.08            # k-point spacing (1/Bohr); generates the k-mesh
pseudo_rcut 10
pseudo_mesh 1
ks_solver dav            # Davidson iterative diagonalization
relax_nmax 100
force_thr 0.001
stress_thr 0.5
smearing_method gaussian
smearing_sigma 0.01      # smearing width (Ry)

machine_type:
c64_m128_cpu_H

Task list for Issue attackers (only for developers)

  • Reproduce the performance issue on a similar system or environment.
  • Identify the specific section of the code causing the performance issue.
  • Investigate the issue and determine the root cause.
  • Research best practices and potential solutions for the identified performance issue.
  • Implement the chosen solution to address the performance issue.
  • Test the implemented solution to ensure it improves performance without introducing new issues.
  • Optimize the solution if necessary, considering trade-offs between performance and other factors (e.g., code complexity, readability, maintainability).
  • Review and incorporate any relevant feedback from users or developers.
  • Merge the improved solution into the main codebase and notify the issue reporter.
ZLI-afk added the Performance label (issues related to failures when running ABACUS) on May 10, 2024
WHUweiqingzhou (Collaborator) commented May 11, 2024

@ZLI-afk,

Following the ABACUS convergence troubleshooting manual (ABACUS收敛性问题解决手册), I tried reducing mixing_beta from 0.8 to 0.4/0.2 and increasing mixing_ndim from 8 to 15; you can check the results at the link. In total I tried 4 combinations (the corresponding INPUT changes are sketched after the list):

  1. mixing_beta=0.4 and mixing_ndim=8
  2. mixing_beta=0.4 and mixing_ndim=15
  3. mixing_beta=0.2 and mixing_ndim=8
  4. mixing_beta=0.2 and mixing_ndim=15
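
For reference, a minimal sketch of the INPUT change for combination 4, assuming the reporter's INPUT above (the other combinations swap in the values listed):

INPUT_PARAMETERS
# ... all other parameters as in the original INPUT ...
mixing_beta 0.2    # down from the default 0.8; smaller values damp charge sloshing
mixing_ndim 15     # up from the default 8; deeper mixing history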

Ni-hcp converges in all 4 combinations.
Mn-bcc converges in all 4 combinations.
Fe-fcc converges in 3 combinations; it fails only for mixing_beta=0.4 and mixing_ndim=8.
Cr-bcc converges in all 4 combinations.
Co-bcc converges in all 4 combinations.
Ce-bcc converges only for mixing_beta=0.2 and mixing_ndim=8.
[convergence plots for each case attached in the original comment]

Actually, Ce-bcc is not hard to converge; on the contrary, it converges very easily, as the drho history shows [drho plot attached in the original comment]: drho drops very quickly to 1e-7 but then fails to reach 1e-8. This indicates that the Ce calculation is numerically unstable, and the instability might be caused by the pseudopotential. It can also trigger numerical errors in iterative eigensolvers such as the Davidson method; this is not a bug, but rather an inherent property of the numerical technique. See Issue #4068 for more discussion.
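
A pragmatic workaround suggested by this behavior (an editorial sketch, not something tested in this thread) is to relax the density threshold to the level the Ce run can actually reach:

# hypothetical INPUT adjustment for the Ce-bcc case only
scf_thr 1e-07    # was 1e-08; drho stalls just below 1e-7

Whether 1e-7 is tight enough depends on the target force/stress accuracy, so treat this as a stopgap rather than a fix.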

pxlxingliang (Collaborator) commented May 11, 2024

I have checked some examples calculated previously (ecutwfc was also 100 Ry):

| example | natom | nbands | nelec | kpoints | bohrium_machine (parallel cores) | cpu | ave scf_time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 041_ZnMnGa | 49 | 290 | 481 | 63 | c32_m128_cpu (32) | Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz | 305 |
| 043_RuSc | 30 | 223 | 370 | 112 | c32_m128_cpu (32) | Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz | 530 |
| 055_ErAlNi | 24 | 217 | 360 | 152 | c32_m128_cpu (32) | Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz | 278 |
| V (this issue) | 32 | 250 | 416 | 112 | c64_m128_cpu_H (64) | AMD EPYC 7452 32-Core Processor | 605 |
| Sm (this issue) | 24 | 159 | 264 | 172 | c64_m128_cpu_H (64) | AMD EPYC 7452 32-Core Processor | 1568 |

ZLI-afk (Author) commented May 11, 2024

I have checked one example I calculated previously, where ecutwfc was also 100 Ry:

[quotes the table from the previous comment]

Is the average scf_time too high for V and Sm? Is there any way to solve this?

pxlxingliang (Collaborator) commented May 11, 2024

> Is the average scf_time too high for V and Sm? Is there any way to solve this?

Yes, the timings for these two examples look abnormal. I suspect the performance of c64_m128_cpu_H (64 cores) is poor. I will test them on c32_m128_cpu (paratera).

pxlxingliang (Collaborator) commented May 11, 2024

I used c32_m128_cpu (paratera) to run example V in parallel on 32 cores; the first 3 SCF steps are:

 ITER   ETOT(eV)       EDIFF(eV)      DRHO       TIME(s)
 DA1    -6.424983e+04  0.000000e+00   2.174e+00  7.761e+02  
 DA2    -6.425507e+04  -5.239252e+00  2.121e+00  4.531e+02  
 DA3    -6.425613e+04  -1.065634e+00  8.683e+00  6.127e+02  

While the results in this issue are:

 ITER   ETOT(eV)       EDIFF(eV)      DRHO       TIME(s)
 DA1    -6.424983e+04  0.000000e+00   2.174e+00  8.604e+02
 DA2    -6.425507e+04  -5.239372e+00  2.121e+00  5.164e+02
 DA3    -6.425613e+04  -1.063900e+00  8.689e+00  7.088e+02

As we can see, c32_m128_cpu (paratera) performs better than c64_m128_cpu_H (64 cores): each of the first three steps is roughly 10-14% faster (e.g., 776 s vs 860 s for DA1).

pxlxingliang (Collaborator) commented May 11, 2024

An update with the first 3 SCF steps of Sm on c32_m128_cpu:

 ITER   ETOT(eV)       EDIFF(eV)      DRHO       TIME(s)
 DA1    -2.616618e+04  0.000000e+00   7.877e-02  4.249e+03
 DA2    -2.616630e+04  -1.143728e-01  1.732e-02  1.742e+03
 DA3    -2.616623e+04  6.750572e-02   1.187e-01  1.925e+03

The results in this issue are:

 ITER   ETOT(eV)       EDIFF(eV)      DRHO       TIME(s)
 DA1    -2.616618e+04  0.000000e+00   7.877e-02  3.811e+03
 DA2    -2.616630e+04  -1.143728e-01  1.732e-02  1.372e+03
 DA3    -2.616623e+04  6.750572e-02   1.187e-01  1.624e+03

ZLI-afk (Author) commented May 11, 2024

> [quotes the V timing comparison from the previous comment]

Please see the latest V case, which fails to finish the SCF calculation due to KILLED BY SIGNAL: 6 (Aborted). Maybe this implies something:
V_failed_signal_6.zip

pxlxingliang (Collaborator) commented May 11, 2024

> Please see the latest V case, which fails to finish the SCF calculation due to KILLED BY SIGNAL: 6 (Aborted). Maybe this implies something: V_failed_signal_6.zip

The error in this test is related to SchmitOrth in the Davidson solver:

abacus: /abacus-develop/source/module_hsolver/diago_david.cpp:947: void hsolver::DiagoDavid<>::SchmitOrth(const int &, const int, const int, psi::Psi<T, Device> &, const T *, T *, const int, const int) [T = std::complex<double>, Device = psi::DEVICE_CPU]: Assertion `psi_norm > 0.0' failed.
(the same assertion message is printed by each of the four MPI ranks)

Usually, this is caused by numerical instability.
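
For context, here is a minimal, self-contained sketch of the Gram-Schmidt step that this assertion guards. It is illustrative only, not the ABACUS implementation; all names here (schmidt_orth, vec, basis) are invented for the example:

#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

using vec = std::vector<std::complex<double>>;

// Orthogonalize v against an already-orthonormal basis, then normalize it.
void schmidt_orth(vec& v, const std::vector<vec>& basis)
{
    for (const vec& b : basis) {
        std::complex<double> ov(0.0, 0.0);            // overlap <b|v>
        for (std::size_t i = 0; i < v.size(); ++i)
            ov += std::conj(b[i]) * v[i];
        for (std::size_t i = 0; i < v.size(); ++i)    // v <- v - <b|v> b
            v[i] -= ov * b[i];
    }
    double norm2 = 0.0;
    for (const auto& x : v)
        norm2 += std::norm(x);                        // accumulate |x|^2
    // If v is numerically a linear combination of the basis vectors, norm2
    // collapses to ~0 and the normalization below would divide by zero; this
    // is the situation the `psi_norm > 0.0' assertion in diago_david.cpp
    // rejects, aborting the run (SIGNAL 6 / SIGABRT).
    assert(norm2 > 0.0);
    const double s = 1.0 / std::sqrt(norm2);
    for (auto& x : v)
        x *= s;
}

In other words, an unstable case can hand the solver a trial vector whose residual after orthogonalization is numerically zero, and the assertion aborts rather than divide by zero. One mitigation sometimes worth trying (an editorial suggestion, not something tested in this thread) is switching ks_solver from dav to cg in INPUT, since the CG solver expands the solution differently and may sidestep the unstable step.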
