Error: corrupted size vs. prev_size when running cp2k on cluster #3376

HemanthHaridas · 2024-04-26T16:36:40Z

I am running an AIMD metadynamics calculation with CP2K on a small system (89 atoms, 256 electrons). CP2K is patched with PLUMED to enable metadynamics. The calculation runs for a few tens of steps and then crashes with the following error message:

corrupted size vs. prev_size

Program received signal SIGABRT: Process abort signal.

Backtrace for this error:
#0  0x14b25ff01171 in ???
#1  0x14b25ff00313 in ???
#2  0x14b25eaa9b4f in ???
#3  0x14b25eaa9acf in ???
#4  0x14b25ea7cea4 in ???
#5  0x14b25eaeacd6 in ???
#6  0x14b25eaf1fdb in ???
#7  0x14b25eaf2885 in ???
#8  0x14b25eaf3f0a in ???
#9  0x2db8cce in __dbcsr_mm_csr_MOD_dbcsr_mm_csr_finalize
	at /uufs/chpc.utah.edu/common/home/u6046562/cp2k-2022.2/exts/dbcsr/src/mm/dbcsr_mm_csr.F:590
#10  0x2ce7cf6 in __dbcsr_mm_multrec_MOD_dbcsr_mm_multrec_finalize
	at /uufs/chpc.utah.edu/common/home/u6046562/cp2k-2022.2/exts/dbcsr/src/mm/dbcsr_mm_multrec.F:353
#11  0x2cc7001 in __dbcsr_mm_cannon_MOD_multiply_cannon._omp_fn.4
	at /uufs/chpc.utah.edu/common/home/u6046562/cp2k-2022.2/exts/dbcsr/src/mm/dbcsr_mm_cannon.F:1658
#12  0x14b25f2944bd in ???
#13  0x14b267e6d1c9 in ???
#14  0x14b25ea94e72 in ???
#15  0xffffffffffffffff in ???

Any help on troubleshooting this would be much appreciated.


hfp · 2024-04-29T09:15:27Z

This can be related to DBCSR (rather than CP2K). Anyhow, can you please share some more details about how the CP2K executable was built? Specifically, is this a CPU-only build, or is GPU support involved?
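
For anyone unsure how an inherited binary was built, the following is a rough sketch of how the build details might be recovered. It assumes the executable is called `cp2k.psmp` (the thread does not name it), that it accepts `--version` and prints its version and `cp2kflags` line in response, and that `ldd` is available on the cluster. The helper name `report_build` and the library name patterns are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: summarize how a CP2K executable was built.
# Usage: report_build /path/to/cp2k.psmp
report_build() {
    exe="$1"
    # Resolve the executable (works for names on PATH and for full paths).
    exe_path=$(command -v "$exe" 2>/dev/null)
    if [ -z "$exe_path" ]; then
        echo "executable not found: $exe"
        return 1
    fi
    # Assumption: --version prints the CP2K version and the cp2kflags line.
    "$exe_path" --version
    # List shared libraries that hint at GPU, MPI, or PLUMED support.
    ldd "$exe_path" | grep -Ei 'cuda|hip|mpi|plumed' \
        || echo "no GPU/MPI/PLUMED shared libraries found (static linking is possible)"
}
```

Calling `report_build cp2k.psmp` on a login node and pasting the output into the issue would cover most of what is being asked here. A statically linked binary will show little under `ldd`, so the flags line is the more reliable source.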

alazzaro · 2024-04-29T09:28:36Z

Right, the error surfaces in DBCSR, but I doubt it is strictly DBCSR-related.
The message says that somehow the matrix gets defragmented.

We definitely need more info: threads/ranks, an input to reproduce it, compilation flags...
Can you reproduce it with a different number of ranks/threads?
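
A minimal sketch of such a rank/thread sweep is given here, assuming a hybrid MPI/OpenMP executable named `cp2k.psmp`, an `mpirun` launcher (a cluster may require `srun` or similar instead), and an input file named `input.inp`. None of these names come from the report. The helper `print_runs` only prints the commands so that they can be reviewed or pasted into a job script.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) launch commands for several
# MPI-rank x OpenMP-thread splits of a fixed core count.
# Usage: print_runs TOTAL_CORES INPUT_FILE
print_runs() {
    total="$1"
    input="$2"
    for threads in 1 2 4 8; do
        # Skip splits that do not divide the core count evenly.
        [ $((total % threads)) -eq 0 ] || continue
        ranks=$((total / threads))
        echo "OMP_NUM_THREADS=$threads mpirun -np $ranks cp2k.psmp -i $input -o run_${ranks}x${threads}.out"
    done
}

print_runs 64 input.inp
```

If the crash appears only for some splits, or only with more than one thread per rank, that narrows down whether threading is involved.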

HemanthHaridas · 2024-04-29T15:34:58Z

Thank you for your comments.

&GLOBAL 
  PROJECT Water_NaNO3
  RUN_TYPE MD 
  PRINT_LEVEL low
&END GLOBAL

&EXT_RESTART
  RESTART_FILE_NAME Water_NaNO3-1_20000.restart
&END
 
&FORCE_EVAL 
  METHOD Quickstep
  &DFT 
    BASIS_SET_FILE_NAME $HOME/Basis_Set.vanda
    POTENTIAL_FILE_NAME $HOME/Potential.vanda
    MULTIPLICITY 1 
    &SCF 
      SCF_GUESS ATOMIC 
      MAX_SCF 600 
      EPS_SCF 5.0E-06
      CHOLESKY OFF
      &OT 
        MINIMIZER DIIS
        LINESEARCH 3PNT 
        PRECONDITIONER FULL_ALL
      &END
      &PRINT 
        &RESTART 
          &EACH 
            MD 0 
          &END 
        &END 
      &END 
    &END SCF
    &XC
      &XC_FUNCTIONAL NO_SHORTCUT
         &GGA_X_RPBE T
         &END GGA_X_RPBE
         &GGA_C_PBE T
         &END GGA_C_PBE
      &END XC_FUNCTIONAL
      &VDW_POTENTIAL
         POTENTIAL_TYPE PAIR_POTENTIAL
         &PAIR_POTENTIAL
           TYPE DFTD3
           PARAMETER_FILE_NAME $HOME/cp2k/dftd3.dat
           REFERENCE_FUNCTIONAL "NONE"
           D3_SCALING  1.0000000000000000E+000  8.7200000000000000E-001  5.1400000000000001E-001
         &END PAIR_POTENTIAL
      &END VDW_POTENTIAL
    &END XC
    &MGRID
      CUTOFF 1000
      REL_CUTOFF 50 !default: 40 Ry
    &END MGRID
  &END 
  &SUBSYS 
    &CELL 
      ABC 9.2218 9.2218 9.2218
    &END 
    &KIND H 
      BASIS_SET DZVP-MOLOPT-SR-GTH
      POTENTIAL GTH-PBE 
    &END
    &KIND O
      BASIS_SET DZVP-MOLOPT-SR-GTH
      POTENTIAL GTH-PBE
    &END
    &KIND N
      BASIS_SET DZVP-MOLOPT-SR-GTH
      POTENTIAL GTH-PBE
    &END
    &KIND Na
      BASIS_SET DZVP-MOLOPT-SR-GTH
      POTENTIAL GTH-PBE
    &END
  &END
&END

&MOTION 
  &MD 
    ENSEMBLE NVT 
    STEPS 10000 
    TIMESTEP 1.0
    &THERMOSTAT 
      TYPE NOSE 
      &NOSE 
        TIMECON 10
      &END 
    &END 
    TEMPERATURE 298
  &END 
  &PRINT 
    &RESTART 
      &EACH 
        MD 1
      &END 
    &END
    &TRAJECTORY
      FORMAT XYZ
    &END
  &END

  &FREE_ENERGY
    &METADYN
      USE_PLUMED .TRUE.
      PLUMED_INPUT_FILE ./plumed.0
    &END
  &END
&END MOTION

This is the input file that I used for the run.

Regarding the compilation flags, I do not know how the executable was built: it was built by a previous member of the group with whom I am no longer in contact.

The error occurs when I run on a single node with 64 cores. If I run with 8 cores instead, the code runs for about 80 steps before crashing with an SCF error.
