Detecting and linking against tcmalloc if present #961
Out of curiosity, which options did you use to run GW?
I used, I think, the same setup as in your benchmarks:
Okay, very interesting.
I did not know GW had that many allocations, especially with the PPM.
Just changing gw.mode to evGW slows it down tremendously, but there is a similar speedup with tcmalloc, e.g. 1250 vs. 1900 s (for a 2-core test).
But there are a lot of other options (some of which I am testing); any particular suggestions for things to try?
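The allocator comparisons discussed above can be reproduced without rebuilding by preloading the allocator library at run time. A minimal sketch follows; the library paths are distro-specific assumptions (locate yours with `ldconfig -p | grep tcmalloc`), and the commands are only printed here, not executed:

```shell
#!/bin/sh
# Compare the same dftgwbse run under the default allocator, tcmalloc,
# and jemalloc by swapping LD_PRELOAD. Paths below are assumptions.
JOB="xtp_tools -e dftgwbse"
for alloc in "" \
             "/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4" \
             "/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"; do
  # Print the command instead of running it, for illustration.
  echo "LD_PRELOAD=$alloc time $JOB"
done
```

An empty `LD_PRELOAD` falls back to the system allocator (ptmalloc on glibc), so the first iteration is the baseline.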
|
have you tried |
BTW, which system size are you using?
I am testing with benzene here (also doing tests with naphthalene). The exact solver shows the same pattern (550 s vs. 400 s with tcmalloc). CDA is a lot slower again (5500 s); I think I also had to change to gwbse.gw.qp_solver=cda for this (otherwise it seemed to freeze at some point, but maybe I was too impatient), and in this case there is no difference with tcmalloc. I also tried gwbse.gw.qp_solver=fixedpoint (the other option besides cda and grid). With the default PPM and G0W0, this is faster and shows some speedup with tcmalloc (9.2 vs. 7.3 s).
I am not sure, but qp_solver=cda should not be there. @JensWehner? From what you see there, the allocation business seems to be related to solving the QP equations on a grid. The fixed-point solver is expected to require far fewer evaluations of sigma (but can find "wrong" solutions). And as you saw yourself, CDA is a disaster, and grid is practically impossible to run. Maybe you can try with a very small molecule, like CO.
qp_solver=cda should not be valid; I am surprised it worked. Yes, CDA is not very performant, but that was expected.
Yeah, I think the option is fixed now.
@ipelupessy, the CMake refactor is done.
This issue serves as a reminder for a possible addition to the build system: (optionally) detecting and
linking against the tcmalloc library (Google's thread-aware allocator); as such it should probably wait for #950.
The reason for this is the improved performance of some parts of the xtp code (GW),
as can be seen in the attached timings.
alloc.pdf
(This plot shows timings of an xtp_tools -e dftgwbse run for the benzene molecule as a function of the number of threads. Different
colors indicate different tasks in the run (DFT, GW, or BSE), and line styles indicate different allocators (default (ptmalloc), tcmalloc, or jemalloc), though tcmalloc and jemalloc are mostly very similar. As can be seen, the GW task in particular is sensitive to the
malloc used.)
As an alternative, the code involved could be examined in more detail and the offending allocations perhaps refactored out.
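For the build-system side, a minimal sketch of optional tcmalloc detection is shown below. This is a hedged illustration, not the project's actual CMake: the option name `USE_TCMALLOC` and the target name `votca_xtp` are assumptions and would need to match the refactored build files.

```cmake
# Hedged sketch: optionally detect and link gperftools tcmalloc.
# Option and target names here are illustrative assumptions.
option(USE_TCMALLOC "Link against tcmalloc if found" OFF)
if(USE_TCMALLOC)
  find_library(TCMALLOC_LIBRARY NAMES tcmalloc tcmalloc_minimal)
  if(TCMALLOC_LIBRARY)
    message(STATUS "Found tcmalloc: ${TCMALLOC_LIBRARY}")
    target_link_libraries(votca_xtp PRIVATE ${TCMALLOC_LIBRARY})
  else()
    message(WARNING "USE_TCMALLOC requested but tcmalloc was not found")
  endif()
endif()
```

Making the option default to OFF keeps the default build dependency-free, so tcmalloc stays purely opt-in.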