SCF parallelization with xTB Hamiltonian #872
The library should be able to exploit OpenMP parallelization. We propagate the OpenMP setting (hopefully correctly) here: dftbplus/external/tblite/CMakeLists.txt, lines 3 to 4 in 7455efc |
But do you get a significant speed-up or is it more or less the same wall-time? |
I have noticed the same thing. xTB run through DFTB+ doesn't seem to scale with an increasing number of OMP threads, whereas running the same calculation through the native tblite interface scales well. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. |
Any updates on this? |
I think it is worth keeping this open |
I agree. I'll try to have a look at it. Is this also valid with the recent version of DFTB+? |
I'll try to test it this month |
I am experiencing this behaviour in the DFTB+ 23.1 version too. |
Has this been solved? I also experience problems with parallel performance when running xTB. |
OK. Could you just add the following lines to the input and check the system above with 1 and 2 threads? (The system is rather small).
When I do this on my laptop with the Conda version of 24.1, I see that the diagonalization, where the program spends most of its time, takes almost identical wall-clock time in both cases. That means the matrix is too small for the diagonalization library (OpenBLAS in my case) to profit from multiple threads. If that is also the case here, you could try MKL instead of OpenBLAS, but otherwise there is not much we can do. |
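To put a number on "almost identical wall-clock times", here is a minimal Python sketch that turns two measured wall times into speedup, parallel efficiency, and an estimated serial fraction (the Karp-Flatt metric). The function name `scaling_metrics` is purely illustrative and not part of DFTB+ or tblite; the timings in the example are the roughly 2 minutes on 1 and 32 cores reported in the original post.

```python
def scaling_metrics(t_serial, t_parallel, n_threads):
    """Return (speedup, efficiency, serial_fraction) from two wall-clock times.

    t_serial   -- wall time in seconds with 1 thread
    t_parallel -- wall time in seconds with n_threads threads
    """
    if n_threads < 2:
        raise ValueError("need at least 2 threads for a comparison")
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("wall times must be positive")

    speedup = t_serial / t_parallel
    efficiency = speedup / n_threads
    # Karp-Flatt metric: the serial fraction implied by the measured speedup.
    # 0 means perfect scaling, 1 means no benefit from the extra threads.
    serial_fraction = (1.0 / speedup - 1.0 / n_threads) / (1.0 - 1.0 / n_threads)
    return speedup, efficiency, serial_fraction


if __name__ == "__main__":
    # Approximate timings from the original report: ~120 s on 1 and on 32 cores.
    s, e, f = scaling_metrics(120.0, 120.0, 32)
    print(f"speedup={s:.2f} efficiency={e:.3f} serial_fraction={f:.2f}")
```

A serial fraction close to 1 is consistent with the time being dominated by a step that does not profit from threading, such as the diagonalization of a small matrix. It does not by itself tell you which step that is, so the per-step timings in the DFTB+ output are still needed.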
I compiled DFTB+ from the current master with OpenMP support. Running an energy calculation with the xTB Hamiltonian gave me virtually identical wall times of ~2 minutes for both 1 and 32 cores. I could confirm that multiple threads were started (the CPU usage never reached 32, but was definitely > 1), and the DFTB+ output also reports a different CPU time.
Is the reason that
To Reproduce
WITH_TBLITE=ON
dftb_in.hsd.txt