I'm testing an HMC workflow with the ILDG checkpointer.
The sample code can be accessed here
The code runs well with the NERSC checkpointer, used as: TheHMC.Resources.LoadNerscCheckpointer(CPparams);
It fails when using the ILDG checkpointer, used as: TheHMC.Resources.LoadILDGCheckpointer(CPparams);
The code runs until the first checkpoint, then it dumps a 'core' file and I get the following errors:
Can you please:
i) recompile with the configure flag --enable-debug;
ii) rerun on a single MPI rank with the same volume, using a cold start if necessary;
iii) rerun it interactively under gdb. The crash should then be trapped, and you can type "backtrace"
to find the line of code and, hopefully, the problem. If necessary, you can inspect variables in the local scope with print.
Recompiled with --enable-debug
Ran on a single MPI rank: the code works fine. Repeating with 2 ranks causes the same failure as above.
The rng file is written, but the issue occurs while writing the lat file, which is much bigger.
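For a sense of scale behind "much bigger", here is a minimal C++ sketch of the lattice file's payload size. It assumes the gauge field is stored as full 3x3 complex SU(3) matrices, four links per site, in single (4-byte) or double (8-byte) precision, as in ILDG binary data. The function name gauge_payload_bytes is hypothetical and not part of Grid, and the volume used in the note below is also hypothetical, since the actual volume is not stated in this report.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Payload size in bytes of an SU(3) gauge configuration stored as full
// 3x3 complex matrices, one per link, four links per site.
// bytes_per_real is 4 (single) or 8 (double precision).
std::uint64_t gauge_payload_bytes(const std::array<std::uint64_t, 4>& dims,
                                  std::uint64_t bytes_per_real) {
    assert(bytes_per_real == 4 || bytes_per_real == 8);
    std::uint64_t volume = 1;
    for (std::uint64_t d : dims) volume *= d;       // number of lattice sites
    const std::uint64_t links_per_site = 4;         // one link per direction
    const std::uint64_t reals_per_link = 3 * 3 * 2; // 9 complex entries
    return volume * links_per_site * reals_per_link * bytes_per_real;
}
```

For a hypothetical 16^3 x 32 lattice, gauge_payload_bytes({16, 16, 16, 32}, 8) gives 75,497,472 bytes (72 MiB) in double precision and half that in single precision; header and LIME record overhead are not included.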
Loading the core dump in gdb doesn't yield anything: "backtrace" gives "No stack".
Any idea why this happens only for ILDG (not the NERSC format), and only on multiple ranks?
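One thing that changes between one rank and two is that each rank owns only part of the lattice, so its sites land at scattered positions in a file ordered over the whole volume. The following is a hedged, self-contained C++ sketch of that mapping, not Grid's implementation. It assumes x-fastest lexicographic site ordering (then y, z, t), a regular block decomposition, and a fixed number of bytes per site; the names lex_index, global_coord and site_offset are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Coord = std::array<std::uint64_t, 4>; // (x, y, z, t)

// Lexicographic site index with x running fastest, then y, z, t.
std::uint64_t lex_index(const Coord& site, const Coord& dims) {
    return site[0] + dims[0] * (site[1] + dims[1] * (site[2] + dims[2] * site[3]));
}

// Global coordinate of a rank-local site, assuming every rank owns a
// block of local_dims sites and sits at proc_coord in the processor grid.
Coord global_coord(const Coord& local_site, const Coord& local_dims,
                   const Coord& proc_coord) {
    Coord g{};
    for (int mu = 0; mu < 4; ++mu) {
        assert(local_site[mu] < local_dims[mu]);
        g[mu] = proc_coord[mu] * local_dims[mu] + local_site[mu];
    }
    return g;
}

// Byte offset of a site's record in the binary payload.
std::uint64_t site_offset(const Coord& local_site, const Coord& local_dims,
                          const Coord& proc_coord, const Coord& global_dims,
                          std::uint64_t bytes_per_site) {
    const Coord g = global_coord(local_site, local_dims, proc_coord);
    return lex_index(g, global_dims) * bytes_per_site;
}
```

For example, with a hypothetical 16^3 x 32 lattice split in two along t (local volume 16^4) and 576 bytes per site (4 links x 18 doubles), the first local site of the rank at processor coordinate (0,0,0,1) starts at byte 37,748,736, exactly halfway through the payload. With a single rank every offset is simply the local index times the site size, so this bookkeeping only comes into play in the multi-rank case.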
The last few lines of the output are:
I have replicated the error on the Crusher (ORNL) and Tioga (LLNL) AMD machines.
Building Grid:
To build Grid, I use the standard procedure with LIME, documented here