
SCF parallelization with xTB Hamiltonian #872

Open

steinmig opened this issue Oct 14, 2021 · 13 comments
Labels
Hamiltonian: xTB - Related to the extended tight-binding Hamiltonian
library: tblite - Related to tblite external dependency (xTB Hamiltonian)

Comments

@steinmig
Contributor

I compiled DFTB+ from the current master with OpenMP support. Running an energy calculation with the xTB Hamiltonian gave me virtually identical wall times of ~2 minutes for both 1 and 32 cores. I could confirm that multiple threads were started (CPU usage never reached 32, but was definitely > 1), and the DFTB+ output also reports a different CPU time.

Is the reason that

  • tblite does not support multiple threads for the SCF?
  • the OpenMP support for DFTB+ is not forwarded to tblite?
  • I am doing something wrong in the compilation?

To Reproduce

dftb_in.hsd.txt
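
The attached input is authoritative; for readers skimming the thread, a minimal input of the same shape looks roughly like this (the geometry file name and the method are placeholder assumptions):

Geometry = GenFormat {
  <<< "geo.gen"          # placeholder structure file
}
Hamiltonian = xTB {
  Method = "GFN2-xTB"    # assumed method, see the attached file
}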

@awvwgk
Member

awvwgk commented Oct 14, 2021

The library should be able to exploit OpenMP parallelization. We propagate the OpenMP setting (hopefully correctly) here:

# Propagate OpenMP option correctly to subproject
set(WITH_OpenMP ${WITH_OMP})
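
A typical configure step that switches this on might look as follows (a sketch; the build directory and any further options are placeholders):

cmake -B _build -DWITH_OMP=TRUE .
cmake --build _build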

You might be able to improve the threading performance by setting OMP_SCHEDULE=dynamic. Most OpenMP regions (both in DFTB+ and tblite) use a runtime schedule, which usually defaults to a static schedule and therefore leads to suboptimal load balancing over a symmetric neighbour list (this should be fixed at least in tblite). That said, in a quick test with tblite on 4 threads, performance with the default schedule seems to be better than with dynamic.
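
At run time that amounts to something like the following (a sketch; OMP_SCHEDULE only affects loops with a runtime schedule, which is what most regions in both codes use as noted above):

export OMP_NUM_THREADS=32
export OMP_SCHEDULE=dynamic
dftb+ > output.log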

@steinmig
Contributor Author

But do you get a significant speed-up, or is it more or less the same wall time?

@chemistza

I have noticed the same thing. xTB run through DFTB+ doesn't seem to scale with increasing OMP threads, whereas running the same calculation through the native tblite interface scales well.
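
A comparison of this kind can be set up roughly as follows (a sketch; the tblite CLI invocation and its --method flag are assumptions here, consult tblite --help):

# via DFTB+, with the xTB Hamiltonian configured in dftb_in.hsd
OMP_NUM_THREADS=8 dftb+ > dftbplus.log
# via the native tblite command line (flags assumed)
OMP_NUM_THREADS=8 tblite run --method gfn2 struct.xyz > tblite.log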

@awvwgk added the library: tblite label Nov 16, 2021
@awvwgk added the Hamiltonian: xTB label Mar 6, 2022
@stale

stale bot commented Sep 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label Sep 2, 2022
@steinmig
Contributor Author

steinmig commented Sep 3, 2022

Any updates on this?

stale bot removed the stale label Sep 3, 2022
@stale

stale bot commented Mar 3, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label Mar 3, 2023
@steinmig
Contributor Author

steinmig commented Mar 3, 2023

I think it is worth keeping this open

stale bot removed the stale label Mar 3, 2023
@aradi
Member

aradi commented Mar 3, 2023

I agree. I'll try to have a look at it. Is this still the case with the most recent version of DFTB+?

@stale

stale bot commented Sep 4, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label Sep 4, 2023
@steinmig
Contributor Author

steinmig commented Sep 4, 2023

I'll try to test it this month

@bhourahine removed the stale label Sep 4, 2023
@stefanoferr

stefanoferr commented Nov 17, 2023

I am experiencing this behaviour with DFTB+ 23.1 as well.
Do you have any updates on this topic?

@francescalb

Has this been solved? I also experience problems with parallel performance when running xTB.

@aradi
Member

aradi commented Feb 28, 2024

OK. Could you just add the following lines to the input and check the system above with 1 and 2 threads? (The system is rather small.)

Options {
  TimingVerbosity = -1
}

When I do this on my laptop with the Conda version of 24.1, I see that the diagonalization, where the program spends most of its time, takes almost identical wall-clock times in both cases. That means the matrix is too small for the diagonalization library (OpenBLAS in my case) to profit from multiple threads. If that is also the case here, you could try MKL instead of OpenBLAS, but otherwise there is not much we can do.
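
A minimal way to do that comparison (a sketch; assumes the dftb+ binary is on PATH and reads dftb_in.hsd from the working directory):

# run the same input serially and with two threads
OMP_NUM_THREADS=1 dftb+ | tee out.1thread.log
OMP_NUM_THREADS=2 dftb+ | tee out.2threads.log
# then compare the per-routine wall-clock times printed at the end of each log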
