
SCF parallelization with xTB Hamiltonian #872

Open

steinmig opened this issue Oct 14, 2021 · 13 comments
Labels
Hamiltonian: xTB - Related to the extended tight-binding Hamiltonian
library: tblite - Related to tblite external dependency (xTB Hamiltonian)

Comments

@steinmig
Contributor

I compiled DFTB+ from the current master with OpenMP support. Running an energy calculation with the xTB Hamiltonian gave me virtually identical wall times of ~2 minutes for both 1 and 32 cores. I could confirm that multiple threads were started (CPU usage never reached 32, but was definitely > 1), and the DFTB+ output also reports a different CPU time.

Is the reason that

  • tblite does not support multiple threads for the SCF?
  • the OpenMP support for DFTB+ is not forwarded to tblite?
  • I am doing something wrong in the compilation?

To Reproduce

dftb_in.hsd.txt
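
The attached input is authoritative; for readers skimming the thread, a minimal input of the same shape looks roughly like this (the geometry file name and the method are placeholder assumptions):

Geometry = GenFormat {
  <<< "geo.gen"          # placeholder structure file
}
Hamiltonian = xTB {
  Method = "GFN2-xTB"    # assumed method, see the attached file
}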

@awvwgk
Member

awvwgk commented Oct 14, 2021

The library should be able to exploit OpenMP parallelization. We propagate the OpenMP setting (hopefully correctly) here:

# Propagate OpenMP option correctly to subproject
set(WITH_OpenMP ${WITH_OMP})
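
A typical configure step that switches this on might look as follows (a sketch; the build directory and any further options are placeholders):

cmake -B _build -DWITH_OMP=TRUE .
cmake --build _build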

You might be able to improve the threading performance by setting OMP_SCHEDULE=dynamic. Most OpenMP regions (both in DFTB+ and tblite) use a runtime schedule, which usually defaults to a static schedule and therefore leads to suboptimal load balancing over a symmetric neighbour list (this should be fixed at least in tblite). That said, in a quick test with tblite on 4 threads, performance with the default schedule seems to be better than with dynamic.
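
At run time that amounts to something like the following (a sketch; OMP_SCHEDULE only affects loops with a runtime schedule, which is what most regions in both codes use as noted above):

export OMP_NUM_THREADS=32
export OMP_SCHEDULE=dynamic
dftb+ > output.log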

@steinmig
Contributor Author

But do you get a significant speed-up, or is it more or less the same wall time?

@chemistza

I have noticed the same thing. xTB run through DFTB+ doesn't seem to scale with increasing OMP threads, whereas running the same calculation through the native tblite interface scales well.
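
A comparison of this kind can be set up roughly as follows (a sketch; the tblite CLI invocation and its --method flag are assumptions here, consult tblite --help):

# via DFTB+, with the xTB Hamiltonian configured in dftb_in.hsd
OMP_NUM_THREADS=8 dftb+ > dftbplus.log
# via the native tblite command line (flags assumed)
OMP_NUM_THREADS=8 tblite run --method gfn2 struct.xyz > tblite.log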

@awvwgk added the library: tblite label Nov 16, 2021
@awvwgk added the Hamiltonian: xTB label Mar 6, 2022
@stale

stale bot commented Sep 2, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label Sep 2, 2022
@steinmig
Contributor Author

steinmig commented Sep 3, 2022

Any updates on this?

stale bot removed the stale label Sep 3, 2022
@stale

stale bot commented Mar 3, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label Mar 3, 2023
@steinmig
Contributor Author

steinmig commented Mar 3, 2023

I think it is worth keeping this open

stale bot removed the stale label Mar 3, 2023
@aradi
Member

aradi commented Mar 3, 2023

I agree. I'll try to have a look at it. Is this still the case with the most recent version of DFTB+?

@stale

stale bot commented Sep 4, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

stale bot added the stale label Sep 4, 2023
@steinmig
Contributor Author

steinmig commented Sep 4, 2023

I'll try to test it this month

@bhourahine removed the stale label Sep 4, 2023
@stefanoferr

stefanoferr commented Nov 17, 2023

I am experiencing this behaviour with DFTB+ 23.1 as well.
Do you have any updates on this topic?

@francescalb

Has this been solved? I also experience problems with parallel performance when running xTB.

@aradi
Member

aradi commented Feb 28, 2024

OK. Could you just add the following lines to the input and check the system above with 1 and 2 threads? (The system is rather small.)

Options {
  TimingVerbosity = -1
}

When I do this on my laptop with the Conda version of 24.1, I see that the diagonalization, where the program spends most of its time, takes almost identical wall-clock times in both cases. That means the matrix is too small for the diagonalization library (OpenBLAS in my case) to profit from multiple threads. If that is also the case here, you could try MKL instead of OpenBLAS, but otherwise there is not much we can do.
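
A minimal way to do that comparison (a sketch; assumes the dftb+ binary is on PATH and reads dftb_in.hsd from the working directory):

# run the same input serially and with two threads
OMP_NUM_THREADS=1 dftb+ | tee out.1thread.log
OMP_NUM_THREADS=2 dftb+ | tee out.2threads.log
# then compare the per-routine wall-clock times printed at the end of each log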
