Cuda error out of memory with Wilson Fermions on Volta V100 GPUs #362

LupoA commented Jun 30, 2021

Hi, I am having trouble running some applications with Grid (develop branch) on Marconi100 at CINECA (IBM Power AC922 nodes, 2 x POWER9 CPUs and 4 NVIDIA Volta V100 GPUs per node, NVLink 2.0).

I am using

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Thu_Oct_24_17:58:26_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

gcc (GCC) 8.4.0

mpirun (IBM Spectrum MPI) 10.3.1.02rtm0

The error appears in several tests involving Wilson fermions. The application we ultimately need to run is Test_hmc_WCMixedRepFG_Production with Nc=4, but the error can be reproduced simply by running Test_wilson_force with Nc=3, see below. The error message is:

accelerator_barrier(): Cuda error out of memory 
File /m100/home/userexternal/alupo000/production_src/Grid/Grid/lattice/Lattice_local.h Line 83
Test_wilson_force: /m100/home/userexternal/alupo000/production_src/Grid/Grid/lattice/Lattice_local.h:83: Grid::Lattice<decltype (Grid::outerProduct(ll(), rr()))> Grid::outerProduct(const Grid::Lattice<vobj>&, const Grid::Lattice<obj2>&) [with ll = Grid::iScalar<Grid::iVector<Grid::iVector<Grid::Grid_simd<thrust::complex<double>, Grid::GpuVector<4, Grid::GpuComplex<double2> > >, 3>, 4> >; rr = Grid::iScalar<Grid::iVector<Grid::iVector<Grid::Grid_simd<thrust::complex<double>, Grid::GpuVector<4, Grid::GpuComplex<double2> > >, 3>, 4> >; decltype (Grid::outerProduct(ll(), rr())) = Grid::iScalar<Grid::iMatrix<Grid::iMatrix<Grid::Grid_simd<thrust::complex<double>, Grid::GpuVector<4, Grid::GpuComplex<double2> > >, 3>, 4> >]: Assertion `err==cudaSuccess' failed.
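For orientation: the assertion fires inside Grid's outerProduct() applied to two lattice fermion fields (the template arguments in the trace are SpinColourVectors). Schematically, and only as my own minimal illustration (this is not code taken from Test_wilson_force), the kind of call that trips it is:

#include <Grid/Grid.h>

using namespace Grid;

int main(int argc, char **argv)
{
  Grid_init(&argc, &argv);

  // Default 4d grid built from the --grid / --mpi command-line options.
  GridCartesian UGrid(GridDefaultLatt(),
                      GridDefaultSimd(Nd, vComplexD::Nsimd()),
                      GridDefaultMpi());

  GridParallelRNG RNG(&UGrid);
  RNG.SeedFixedIntegers(std::vector<int>({1, 2, 3, 4}));

  // Two random fermion fields; outerProduct() launches the GPU kernel whose
  // post-kernel error check is the assertion in Lattice_local.h above.
  LatticeFermionD left(&UGrid), right(&UGrid);
  gaussian(RNG, left);
  gaussian(RNG, right);

  auto colour_spin_matrix = outerProduct(left, right);

  Grid_finalize();
  return 0;
}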

Let me list a few cases in which the error appears. In all the following examples I am using 4^4 local lattices.

Test_wilson_force, Nc=3, 1 node and 2 GPUs: works (e.g. mpirun -np 2 Test_wilson_force --grid 4.4.4.8 --mpi 1.1.1.2)
Test_wilson_force, Nc=3, 1 node and 4 GPUs: fails (e.g. mpirun -np 4 Test_wilson_force --grid 4.4.8.8 --mpi 1.1.2.2)
Test_wilson_force, Nc=4, 1 node and 2 GPUs: fails (commands for the Nc=4 cases are spelled out below)
Test_wilson_force, Nc=4, 1 node and 4 GPUs: fails
Test_hmc_WCMixedRepFG_Production: always fails when running on GPUs.
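
For concreteness, the failing Nc=4 runs use the same 4^4 local volumes and node/GPU counts as the Nc=3 examples, just with a build configured with --enable-Nc=4, i.e. something like:

mpirun -np 2 Test_wilson_force --grid 4.4.4.8 --mpi 1.1.1.2
mpirun -np 4 Test_wilson_force --grid 4.4.8.8 --mpi 1.1.2.2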

Other information:
- Benchmark_dwf, Benchmark_ITT and Benchmark_comms_host_device work fine.
- The error does not appear on Jureca.
- I ran Test_wilson_force with the --mem-debug option, and the allocated-memory profile is the same on Jureca and Marconi100 (up to the point where the latter dies).
- Reducing --enable-gen-simd-width allows Test_wilson_force to run, but Test_hmc_WCMixedRepFG_Production still fails eventually, especially with Nc=4 (see the sketch after this list).

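To be explicit about what "reducing --enable-gen-simd-width" means: the trace above shows GpuVector<4, GpuComplex<double2> >, i.e. 64-byte SIMD vectors, and the workaround is to pass a smaller width explicitly at configure time, for example

--enable-gen-simd-width=32

(32 is just an example value) added to the configure line below.
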
The configure line I am using is based on the instructions for Summit on the Grid wiki.

../configure \
  --enable-comms=mpi \
  --enable-simd=GPU \
  CXX=nvcc \
  CXXFLAGS="-ccbin mpicxx -gencode arch=compute_70,code=sm_70 -I$HOME/prefix/include/ -std=c++11 -Xcompiler -mno-float128" \
  --disable-gparity \
  --enable-accelerator=cuda \
  --enable-shm=nvlink \
  --enable-accelerator-cshift \
  --enable-Nc=4

The CXXFLAGS entry "-Xcompiler -mno-float128" seems to be necessary when using CUDA 10 with gcc. The most recent CUDA version available on Marconi100 is 11.0, which I also tried (after disabling the macro error in CompilerCompatible.h), but the error persists.
I have tried adding and removing several configure options with no luck, e.g. the ones suggested in #346.

I am attaching config.log, grid.configure.summary and the output of make V=1.

thanks for the help,
Alessandro

config.log
grid.configure.summary.log
makeV1.log
