Sudden jump in VMC and nan in DMC energies using Frontier #4903
Comments
Could you rerun with exactly the same condition and see if the issue is reproducible? |
I ran it twice and observed the jump in VMC both times. I didn't check for the nan errors in the first try. |
Here are the results from the first run I made (the results reported at the top are from the second run):
qmca -q eV *.scalar.dat
Comparing the first and second runs, different twists were affected, except for gamma, which seems to be problematic in both cases. Inputs and the statistics outputs of the first run are attached here: dmc_WSe2_AAp_pbe_u_None_4x4x1_2x2x1_2500_first.tar.gz
The first and the second run differ only in the "walkers_per_rank" parameter. |
Could you rerun with |
It seems that you are using hybridrep + GPU, which is still under development. Could you add gpu=no to the sposet_builder line and rerun? |
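The suggested change can also be applied programmatically. The following is a minimal sketch, assuming the input is well-formed XML and that the attribute is spelled `gpu` on `sposet_builder` elements, as in the comment above. The function name and the sample input in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET


def disable_gpu_on_sposet_builders(xml_text):
    """Set gpu="no" on every <sposet_builder> element.

    Returns the modified XML as a string together with the number of
    elements that were changed. Element and attribute names are taken
    from the comment above; nothing else in the input is modified.
    """
    root = ET.fromstring(xml_text)
    changed = 0
    # root.iter() also visits the root element itself if its tag matches.
    for builder in root.iter("sposet_builder"):
        builder.set("gpu", "no")
        changed += 1
    return ET.tostring(root, encoding="unicode"), changed
```

For example, `disable_gpu_on_sposet_builders('<wavefunction><sposet_builder type="bspline" hybridrep="yes"/></wavefunction>')` reports one changed element and leaves `hybridrep` untouched. Note that `ElementTree` discards XML comments when parsing, so an input file with comments worth keeping is better edited by hand.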
@ye-luo Is hybridrep+GPU incomplete or known to be buggy or just not tested enough (etc.)? If it is known to be incomplete then it should be blocked off or have an unmissable warning printed. @kayahans Have you been able to run this elsewhere (NERSC CPUs?)? It is more important that you can publish the science than spend any time chasing this. |
@prckent I ran these calculations on Cades. I have attached the input files I used and the trace data plots in the issue post at the top. |
@ye-luo Should I run this in Frontier again? |
Thanks @ye-luo, yes, I had no such issues when running this particular material or other bilayered materials on Cades, which is a CPU-only machine. I think your suggestion is to run the same calculation on Polaris? |
My suggestion is to put hybridrep on the CPU even if you are using the GPU. |
@ye-luo Running with the hybrid rep on CPU seems to solve the problem. I didn't see any spikes in VMC energy with hybrid rep on CPU. Here are the VMC total energies from the Cades and Frontier runs; they are identical:
Frontier:
Frontier VMC trace: |
Describe the bug
VMC energies and the variance suddenly jump for twist numbers 0 and 1. Although they seem to recover for both twists, twist number 1 later gets nan energies in the DMC calculation.
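A jump of this kind can be flagged automatically rather than by eye. The sketch below is one possible approach and is not part of QMCPACK or its tools: it marks samples that lie far from the median in units of the median absolute deviation. It assumes the scalar.dat layout of a `#`-prefixed header line naming the columns followed by whitespace-separated numeric rows, and a column named `LocalEnergy`. The threshold of 10 is an arbitrary illustrative choice.

```python
import math
import statistics


def read_column(path, name="LocalEnergy"):
    """Read one named column from a scalar.dat-style file (layout assumed)."""
    columns, values = [], []
    with open(path) as fh:
        for line in fh:
            stripped = line.strip()
            if not stripped:
                continue
            if stripped.startswith("#"):
                # Header line: column names after the leading '#'.
                columns = stripped.lstrip("#").split()
                continue
            fields = stripped.split()
            values.append(float(fields[columns.index(name)]))
    return values


def find_jumps(values, threshold=10.0):
    """Return indices of finite samples far from the median.

    Distance is measured in robust standard deviations (1.4826 * MAD).
    Non-finite samples (nan, inf) are ignored here; they need a
    separate check.
    """
    finite = [v for v in values if math.isfinite(v)]
    if len(finite) < 2:
        return []
    med = statistics.median(finite)
    mad = statistics.median([abs(v - med) for v in finite])
    scale = 1.4826 * mad
    if scale == 0.0:
        return []
    return [i for i, v in enumerate(values)
            if math.isfinite(v) and abs(v - med) > threshold * scale]
```

A call such as `find_jumps(read_column("some_file.scalar.dat"))` (file name hypothetical) returns the row indices of suspect blocks. The median and MAD are used instead of the mean and standard deviation so that the spike itself does not inflate the scale it is compared against.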
To Reproduce
Steps to reproduce the behavior:
QMCPACK 3.17.9 (Dec 22nd)
Frontier
Using the Frontier build script
All the input and smaller statistical output files are provided in the attachment
Wavefunctions are provided in
/lustre/orion/mat151/proj-shared/qmcpack_bug_issue_4903
Expected behavior
From Frontier
Local energy
Variance
In the figures, it looks like there is only a jump in the VMC energies, but
grep nan *scalar.dat
shows persistent nan values in the dmc.g001.s002.scalar.dat file upon inspection.
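The `grep` check above shows which files contain nan but not where it first appears. A minimal sketch that reports the row and column of each non-finite value follows. It assumes the scalar.dat layout of a `#`-prefixed header line naming the columns followed by whitespace-separated numeric rows; the function name is hypothetical.

```python
import glob
import math


def find_nan_rows(path):
    """Return (row_index, column_name) pairs for non-finite values.

    Rows are counted from 0 over data lines only; the header line that
    starts with '#' supplies the column names.
    """
    hits = []
    columns = []
    row = 0
    with open(path) as fh:
        for line in fh:
            stripped = line.strip()
            if not stripped:
                continue
            if stripped.startswith("#"):
                columns = stripped.lstrip("#").split()
                continue
            for i, token in enumerate(stripped.split()):
                try:
                    value = float(token)  # float() parses "nan" and "inf"
                except ValueError:
                    continue  # skip anything that is not a number
                if not math.isfinite(value):
                    name = columns[i] if i < len(columns) else f"col{i}"
                    hits.append((row, name))
            row += 1
    return hits


if __name__ == "__main__":
    for path in sorted(glob.glob("*.scalar.dat")):
        hits = find_nan_rows(path)
        if hits:
            first_row, first_col = hits[0]
            print(f"{path}: {len(hits)} non-finite values, "
                  f"first at row {first_row} ({first_col})")
```

Knowing the first affected row makes it easier to tell whether the nan values begin at the start of the DMC series or appear partway through.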
From Cades:
Local energy
Variance
System:
Frontier
Additional context
input and statistical output files
From Frontier
dmc_WSe2_AAp_pbe_u_None_4x4x1_2x2x1_2500.tar.gz
From Cades
dmc_WSe2_AAp_pbe_u_None_4x4x1_2x2x1_2500_cades.tar.gz