Segfault when above ~700 atoms using ASE #218

Open
tgmaxson opened this issue Oct 14, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@tgmaxson

Describe the bug
Systems that are too large segfault when using the ASE calculator.

To Reproduce
Steps to reproduce the behaviour:

from ase.build import bulk
from dftd4.ase import DFTD4

calc = DFTD4(method="PBE")

# 9x9x9 supercell of the one-atom fcc Ag primitive cell -> 729 atoms
size = 9
atoms = bulk("Ag") * (size, size, size)
print(len(atoms))  # 729

atoms.calc = calc
atoms.get_potential_energy()  # segfaults here

This crashes immediately with a segfault on multiple clusters. The cutoff is somewhere around 680 atoms, we think, but I do not remember the exact point where it starts failing. It does not seem to depend on the method, the atomic species, or the cell volume.

Interestingly, D3 works fine using the simple D3 calculator, and VASP also calculates this system without problems.

@marvinfriede
Member

It works if you set ulimit -s unlimited in your environment, as also suggested in the xtb docs.
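For scripted environments, the same workaround can be attempted from within Python via the standard resource module; a minimal sketch, assuming that raising the soft limit at runtime takes effect on your platform (the shell-level ulimit -s unlimited before launching Python remains the documented route):

import resource

# Raise the stack soft limit to the hard limit (the hard limit may be
# RLIM_INFINITY); whether this helps an already-running main thread is
# platform-dependent
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))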

@tgmaxson
Author

Isn't requiring a larger stack like this typically considered a bug / bad practice? Why is DFTD4 going down such an extreme recursive path that scales with atom count?

I will check this now, however.

@marvinfriede
Member

marvinfriede commented Oct 15, 2023

Additional info: I tested a large molecule (1000 atoms, coord.txt), and it

  • works with versions 3.5.0 and 3.6.0 of the standalone program
    (compiled with ifort and meson setup ... --buildtype=release --default-library=static -Dfortran_link_args="-static" -Dfortran_args="-Ofast -axAVX2 -mtune=core-avx2 -fma")
  • segfaults with older versions of the standalone program (tested 3.3.0 and 3.4.0)
  • segfaults with ase
  • segfaults with the dftd4 version (3.5.0) from conda

If more than one core is used, one also (expectedly) has to increase OMP_STACKSIZE, since each OpenMP worker thread gets its own stack.
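A minimal sketch of setting this from Python before the first dftd4 call (the 4G value is an arbitrary assumption, and the variable must be set before the OpenMP runtime initializes):

import os

# Per-thread stack size for OpenMP worker threads; must be exported before
# the first parallel region runs (the value chosen here is only an example)
os.environ["OMP_STACKSIZE"] = "4G"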

Inspecting the error with version 3.3.0 suggests that the problem comes from the multicharge library.

Error log
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source             
dftd4              00000000020E1F6A  Unknown               Unknown  Unknown
dftd4              00000000022DF200  Unknown               Unknown  Unknown
dftd4              0000000000453F41  multicharge_model         131  model.f90
dftd4              00000000021FE773  Unknown               Unknown  Unknown
dftd4              00000000021C0DCC  Unknown               Unknown  Unknown
dftd4              00000000021908B8  Unknown               Unknown  Unknown
dftd4              0000000000453A49  multicharge_model         131  model.f90
dftd4              000000000044FAD2  multicharge_model         462  model.f90
dftd4              000000000043A35D  dftd4_charge_mp_g          67  charge.f90
dftd4              000000000040F08D  dftd4_disp_mp_get          82  disp.f90
dftd4              00000000004063A2  MAIN__                    150  main.f90
dftd4              00000000004053D2  Unknown               Unknown  Unknown
dftd4              00000000022E06A0  Unknown               Unknown  Unknown
dftd4              00000000004052B7  Unknown               Unknown  Unknown

@tgmaxson
Author

I tested up to 6,000,000 atoms in ASE with the stack size changed, and it worked fine. It would still be good if we didn't need to increase the stack, but this at least works for us.

Maybe the code could throw a warning when the stack size is limited and more than 500 or so atoms are used? What I find weird is that it worked in VASP, I believe, so this is potentially an actual bug in the ASE interface / the mamba build.
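As a sketch of what such a check could look like on POSIX systems (the 500-atom threshold and the helper name are hypothetical, taken only from the suggestion above):

import resource
import warnings

def warn_if_stack_limited(n_atoms, threshold=500):
    # Hypothetical helper: warn when the soft stack limit is finite and the
    # system is large enough that stack temporaries may overflow
    soft, _ = resource.getrlimit(resource.RLIMIT_STACK)
    if soft != resource.RLIM_INFINITY and n_atoms > threshold:
        warnings.warn(
            f"{n_atoms} atoms with a stack limit of {soft / 1024**2:.0f} MB; "
            "consider 'ulimit -s unlimited' (and OMP_STACKSIZE for OpenMP runs)"
        )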

@marvinfriede
Member

I found more context and helpful explanations (here, here, and especially here).

To summarize the most important points:

  • it is compiler-dependent whether certain arrays go on the stack or the heap (there are flags to modify this behavior)
  • the default stack size is 8 MB, and even setting it to unlimited only increases it to 64 MB (on my machine, ulimit -a | grep "\-s")
  • coming back to the error trace in my previous comment: the error seems to come from the multicharge library, in particular from the get_amat_0d subroutine, so it is not surprising that D3 works without problems (a rough size estimate follows this list)
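A back-of-envelope check of the stack-size point above (assuming, which is not verified here, that the EEQ system matrix built in get_amat_0d, roughly (nat+1) x (nat+1) doubles, ends up as a stack temporary):

# One (nat+1) x (nat+1) double-precision matrix near the reported threshold
nat = 700
size_mb = (nat + 1) ** 2 * 8 / 1024**2
print(f"{size_mb:.1f} MB")  # ~3.7 MB; two such temporaries approach the 8 MB default

That would be roughly consistent with the ~680-700 atom threshold reported above.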

Since it comes from a dependency, I do not think we should change anything in DFT-D4. I am not sure whether increasing the stack size is problematic; it also seems to be accepted by the Fortran community.
