Error: corrupted size vs. prev_size when running cp2k on cluster #3376

Open
HemanthHaridas opened this issue Apr 26, 2024 · 3 comments

@HemanthHaridas

I am running an AIMD metadynamics calculation with CP2K on a small system (89 atoms and 256 electrons). The CP2K build is patched with PLUMED to enable metadynamics. The calculation runs for a few tens of steps before crashing with the following error message:

corrupted size vs. prev_size

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x14b25ff01171 in ???
#1  0x14b25ff00313 in ???
#2  0x14b25eaa9b4f in ???
#3  0x14b25eaa9acf in ???
#4  0x14b25ea7cea4 in ???
#5  0x14b25eaeacd6 in ???
#6  0x14b25eaf1fdb in ???
#7  0x14b25eaf2885 in ???
#8  0x14b25eaf3f0a in ???
#9  0x2db8cce in __dbcsr_mm_csr_MOD_dbcsr_mm_csr_finalize
	at /uufs/chpc.utah.edu/common/home/u6046562/cp2k-2022.2/exts/dbcsr/src/mm/dbcsr_mm_csr.F:590
#10  0x2ce7cf6 in __dbcsr_mm_multrec_MOD_dbcsr_mm_multrec_finalize
	at /uufs/chpc.utah.edu/common/home/u6046562/cp2k-2022.2/exts/dbcsr/src/mm/dbcsr_mm_multrec.F:353
#11  0x2cc7001 in __dbcsr_mm_cannon_MOD_multiply_cannon._omp_fn.4
	at /uufs/chpc.utah.edu/common/home/u6046562/cp2k-2022.2/exts/dbcsr/src/mm/dbcsr_mm_cannon.F:1658
#12  0x14b25f2944bd in ???
#13  0x14b267e6d1c9 in ???
#14  0x14b25ea94e72 in ???
#15  0xffffffffffffffff in ???

Any help on troubleshooting this would be much appreciated.
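
For readers triaging a similar trace: "corrupted size vs. prev_size" is a consistency-check message from the glibc memory allocator, so the frame where it is reported is not necessarily where the memory was overwritten. A minimal Python sketch that pulls out only the frames carrying source information may still help to see which code path triggered the check. The regular expressions assume the gfortran-style layout shown above, and `resolved_frames` is a hypothetical helper name, not part of CP2K or DBCSR.

```python
import os
import re

# A frame line such as "#9  0x2db8cce in __dbcsr_mm_csr_MOD_dbcsr_mm_csr_finalize"
FRAME = re.compile(r"^#(\d+)\s+0x[0-9a-f]+ in (\S+)$")
# The follow-up line "at /path/to/file.F:590" (leading whitespace already stripped)
LOCATION = re.compile(r"^at (.+):(\d+)$")

# Excerpt copied from the backtrace in this issue.
SAMPLE = """\
#8  0x14b25eaf3f0a in ???
#9  0x2db8cce in __dbcsr_mm_csr_MOD_dbcsr_mm_csr_finalize
\tat /uufs/chpc.utah.edu/common/home/u6046562/cp2k-2022.2/exts/dbcsr/src/mm/dbcsr_mm_csr.F:590
#10  0x2ce7cf6 in __dbcsr_mm_multrec_MOD_dbcsr_mm_multrec_finalize
\tat /uufs/chpc.utah.edu/common/home/u6046562/cp2k-2022.2/exts/dbcsr/src/mm/dbcsr_mm_multrec.F:353
#11  0x2cc7001 in __dbcsr_mm_cannon_MOD_multiply_cannon._omp_fn.4
\tat /uufs/chpc.utah.edu/common/home/u6046562/cp2k-2022.2/exts/dbcsr/src/mm/dbcsr_mm_cannon.F:1658
#12  0x14b25f2944bd in ???
"""


def resolved_frames(backtrace):
    """Return (frame_no, symbol, source_file, line_no) for frames with debug info."""
    frames = []
    pending = None  # (frame_no, symbol) waiting for its "at file:line" line
    for raw in backtrace.splitlines():
        line = raw.strip()
        frame = FRAME.match(line)
        if frame:
            symbol = frame.group(2)
            # "???" marks a frame without symbol information; skip it.
            pending = None if symbol == "???" else (int(frame.group(1)), symbol)
            continue
        location = LOCATION.match(line)
        if location and pending:
            source = os.path.basename(location.group(1))
            frames.append((pending[0], pending[1], source, int(location.group(2))))
            pending = None
    return frames


if __name__ == "__main__":
    for number, symbol, source, line_no in resolved_frames(SAMPLE):
        print(f"#{number} {symbol} ({source}:{line_no})")
```

On the excerpt above, all resolved frames sit in the DBCSR multiplication sources (`dbcsr_mm_csr.F`, `dbcsr_mm_multrec.F`, `dbcsr_mm_cannon.F`), and the `_omp_fn` suffix indicates the call came from inside an OpenMP region.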

@hfp
Member

hfp commented Apr 29, 2024

This can be related to DBCSR (rather than CP2K). Anyhow, can you please share some more details about how the CP2K executable was built? Specifically, is this a CPU-only build, or is there any GPU support involved?

@alazzaro
Member

The error comes from DBCSR, right, but I doubt the error is strictly DBCSR related.
The message says that somehow the matrix gets defragmented.

We definitely need more info: threads/ranks, an input to reproduce it, compilation flags...
Can you reproduce it with a different number of ranks/threads?
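
One way to organise such a rank/thread sweep is sketched below in Python. It only builds command lines and does not launch anything. The executable name `cp2k.psmp`, the launcher `mpirun`, and the file name `md.inp` are assumptions for illustration; the actual cluster may use a different launcher (for example a batch-scheduler wrapper) and a different binary name.

```python
import shlex


def layouts(total_cores):
    """All (mpi_ranks, omp_threads) pairs with ranks * threads == total_cores."""
    return [
        (ranks, total_cores // ranks)
        for ranks in range(1, total_cores + 1)
        if total_cores % ranks == 0
    ]


def command(ranks, threads, exe="cp2k.psmp", inp="md.inp"):
    """Build one shell command line; exe and inp are hypothetical placeholders."""
    out = f"md_{ranks}x{threads}.out"  # one output file per layout
    return (
        f"OMP_NUM_THREADS={threads} mpirun -np {ranks} "
        f"{shlex.quote(exe)} -i {shlex.quote(inp)} -o {shlex.quote(out)}"
    )


if __name__ == "__main__":
    # 64 cores matches the single-node run described in this thread.
    for ranks, threads in layouts(64):
        print(command(ranks, threads))
```

Comparing which layouts crash (for instance pure MPI with one thread per rank versus heavily threaded runs) can hint at whether the problem is tied to the OpenMP code path that appears in the backtrace.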

@HemanthHaridas
Author

Thank you for your comments.

&GLOBAL 
  PROJECT Water_NaNO3
  RUN_TYPE MD 
  PRINT_LEVEL low
&END GLOBAL

&EXT_RESTART
  RESTART_FILE_NAME Water_NaNO3-1_20000.restart
&END
 
&FORCE_EVAL 
  METHOD Quickstep
  &DFT 
    BASIS_SET_FILE_NAME $HOME/Basis_Set.vanda
    POTENTIAL_FILE_NAME $HOME/Potential.vanda
    MULTIPLICITY 1 
    &SCF 
      SCF_GUESS ATOMIC 
      MAX_SCF 600 
      EPS_SCF 5.0E-06
      CHOLESKY OFF
      &OT 
        MINIMIZER DIIS
        LINESEARCH 3PNT 
        PRECONDITIONER FULL_ALL
      &END
      &PRINT 
        &RESTART 
          &EACH 
            MD 0 
          &END 
        &END 
      &END 
    &END SCF
    &XC
      &XC_FUNCTIONAL NO_SHORTCUT
         &GGA_X_RPBE T
         &END GGA_X_RPBE
         &GGA_C_PBE T
         &END GGA_C_PBE
      &END XC_FUNCTIONAL
      &VDW_POTENTIAL
         POTENTIAL_TYPE PAIR_POTENTIAL
         &PAIR_POTENTIAL
           TYPE DFTD3
           PARAMETER_FILE_NAME $HOME/cp2k/dftd3.dat
           REFERENCE_FUNCTIONAL "NONE"
           D3_SCALING  1.0000000000000000E+000  8.7200000000000000E-001  5.1400000000000001E-001
         &END PAIR_POTENTIAL
      &END VDW_POTENTIAL
    &END XC
    &MGRID
      CUTOFF 1000
      REL_CUTOFF 50 !default: 40 Ry
    &END MGRID
  &END 
  &SUBSYS 
    &CELL 
      ABC 9.2218 9.2218 9.2218
    &END 
    &KIND H 
      BASIS_SET DZVP-MOLOPT-SR-GTH
      POTENTIAL GTH-PBE 
    &END
    &KIND O
      BASIS_SET DZVP-MOLOPT-SR-GTH
      POTENTIAL GTH-PBE
    &END
    &KIND N
      BASIS_SET DZVP-MOLOPT-SR-GTH
      POTENTIAL GTH-PBE
    &END
    &KIND Na
      BASIS_SET DZVP-MOLOPT-SR-GTH
      POTENTIAL GTH-PBE
    &END
  &END
&END

&MOTION 
  &MD 
    ENSEMBLE NVT 
    STEPS 10000 
    TIMESTEP 1.0
    &THERMOSTAT 
      TYPE NOSE 
      &NOSE 
        TIMECON 10
      &END 
    &END 
    TEMPERATURE 298
  &END 
  &PRINT 
    &RESTART 
      &EACH 
        MD 1
      &END 
    &END
    &TRAJECTORY
      FORMAT XYZ
    &END
  &END

  &FREE_ENERGY
    &METADYN
      USE_PLUMED .TRUE.
      PLUMED_INPUT_FILE ./plumed.0
    &END
  &END
&END MOTION

This is the input file that I used for the run.

Regarding the compilation flags, I do not know how the executable was built, because it was built by a previous member of the group with whom I have no contact.
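
When the build recipe is lost, some build details can usually still be recovered from the program itself: CP2K normally writes a banner of lines prefixed with `CP2K|` near the top of its output file, including a `cp2kflags` entry, and the executable can be asked for similar information with `--version`. The Python sketch below simply collects those banner lines from an existing output file. The helper name `cp2k_banner` and the file name in the usage line are hypothetical, and whether the banner appears with `PRINT_LEVEL low` should be checked on the actual output.

```python
def cp2k_banner(output_text):
    """Return the stripped lines of a CP2K output that start with 'CP2K|'."""
    return [
        line.strip()
        for line in output_text.splitlines()
        if line.lstrip().startswith("CP2K|")
    ]


def read_banner(path):
    """Read an output file and return its CP2K| banner lines."""
    # errors="replace" avoids failing on stray non-UTF-8 bytes in large logs.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return cp2k_banner(handle.read())


if __name__ == "__main__":
    import sys

    # Usage (file name is a placeholder): python banner.py Water_NaNO3.out
    for entry in read_banner(sys.argv[1]):
        print(entry)
```

The `cp2kflags` entry in that banner is the relevant part for the questions above, since it lists the optional features the binary was compiled with.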

The error is observed when I run on a single node with 64 cores. If I run with 8 cores instead, the code runs for ~80 steps before crashing with an SCF error.
