NERSC Perlmutter multiple-node run issue #112

Open
biweidai opened this issue Jan 27, 2023 · 40 comments
@biweidai
Collaborator

biweidai commented Jan 27, 2023

Hi,

I ran FastPM on two nodes (256 MPI tasks) on NERSC Perlmutter, but got the following result at z=0:

fastpm_2node

There is strange structure at the top left and bottom right, and we can also see a grid pattern that seems to correspond to the different MPI tasks.

This only happens with multiple nodes. It works well with a single node (128 MPI tasks) on Perlmutter:

fastpm_1node

This is the linear density field generated with two nodes, which looks fine to me:

lineardensity_2node

The IC is generated at z=9 with 2LPT. This is the snapshot at z=9:

z9_2node

The code is compiled with Makefile.local.example. I am not sure whether I compiled it correctly (there were some warnings but no errors during compilation).

@rainwoodman
Collaborator

rainwoodman commented Jan 27, 2023 via email

@biweidai
Collaborator Author

Thanks for the fast reply!

I didn't enable the lightcone outputs. Does the code still compute lightcones?

Could you explain what regression analysis and PR mean?

The number of particles is correct. I used the same script to run FastPM and read the files. The only difference is the number of nodes and MPI tasks. With one node, I tried lots of different particle numbers and different MPI tasks, and they all work well. With multiple nodes, I also tried several different particle numbers and MPI tasks, and they all fail in a similar way.

Here is another run with 1024^3 particles and 512 MPI tasks (4 nodes):
fastpm_n1024_4node

And here is another run with 1280^3 particles and 640 MPI tasks (5 nodes).
fastpm_5node

It seems this strange feature strongly depends on the number of nodes. I will try 2 nodes with 128 ranks.

@rainwoodman
Collaborator

rainwoodman commented Jan 27, 2023 via email

@biweidai
Collaborator Author

I submitted the job and will do the comparison between 1 node and 2 nodes.

I think the z=9 snapshot plot in my first post is just 2LPT, since the IC is generated at z=9? We can already see two rectangles there at top left and bottom right.

I run FastPM with OMP_NUM_THREADS=1, but compiled it with -fopenmp. And yes, I compiled with gcc.

@rainwoodman
Collaborator

rainwoodman commented Jan 27, 2023 via email

@biweidai
Collaborator Author

biweidai commented Jan 28, 2023

2LPT at z=9 with 1 node 128 MPI tasks:
2lpt_1node

2LPT at z=9 with 2 nodes, 128 MPI tasks:
2lpt_2node

These two plots use the same colorbar. It seems that the 2-node run gives a larger density contrast, plus the strange rectangles at the top left and bottom right.

6.25% of the particles are the same between the two runs. Here are the locations of those matching particles:
2lpt_samep

So it seems that the MPI ranks at top left and bottom right are correct, and the other MPI ranks are wrong?

I tried compiling the code with OpenMPI (the default is Cray MPICH), but currently have some problems running jobs with OpenMPI and mpirun. I have submitted a ticket to NERSC for help.

@rainwoodman
Collaborator

rainwoodman commented Jan 29, 2023 via email

@biweidai
Collaborator Author

biweidai commented Jan 29, 2023

Yes, I tried OpenMPI (I was using Cray MPICH before), but ran into different problems running the code with OpenMPI and mpirun. Most of the time the jobs just fail, but with a certain setup I do get correct results with 2 nodes. I have submitted a ticket to NERSC for help. Here is the link to the ticket:

I don't think Perlmutter supports the Intel compiler, so we cannot use Intel MPI?

I will check the other things you mentioned.

@biweidai
Collaborator Author

The code seems correct with 2 nodes and 2 tasks, but fails with 2 nodes and 4 tasks. DX1 and DX2 are correct on rank 0 and rank 3, but both are wrong on rank 1 and rank 2.

Looking at the log files, the first deviation happens when sending/receiving the ghosts before 2LPT. Here is the log from the correct run (1 node, 4 tasks, 4^3 particles):

Sending ghosts: min = 0 max = 0 mean = 0 std = 0 [ pmghosts.c:173 ]
Receiving ghosts: min = 0 max = 0 mean = 0 std = 0 [ pmghosts.c:177 ]

And here is the log from the incorrect run (2 nodes, 4 tasks, 4^3 particles):

Sending ghosts: min = 0 max = 16 mean = 8 std = 8 [ pmghosts.c:173 ]
Receiving ghosts: min = 0 max = 16 mean = 8 std = 8 [ pmghosts.c:177 ]

@rainwoodman
Collaborator

rainwoodman commented Jan 30, 2023 via email

@rainwoodman
Collaborator

rainwoodman commented Jan 30, 2023 via email

@biweidai
Collaborator Author

I was using 4 ranks for both runs (1 node × 4 ranks vs. 2 nodes × 2 ranks/node), so they should give the same results?

I printed the particle information before 2LPT (uniform grid). With 2 nodes, rank 0 and rank 3 are correct. But the particles on rank 1 should belong to rank 2, so rank 1 sends all of its particles to rank 2. Similarly, the particles on rank 2 should belong to rank 1, so rank 2 sends all of its particles to rank 1. Can you think of any reason that could cause this kind of bug?

@rainwoodman
Collaborator

rainwoodman commented Jan 30, 2023 via email

@biweidai
Collaborator Author

biweidai commented Jan 30, 2023

> Are the particle positions all wrong on the wrong ranks, or are they the same as the 1 node run, but somehow the 2 node run wants to send them?

The particle positions are all wrong on the wrong ranks (i.e., they are not the same as the 1 node run).

> The computation of the domain boundaries could be buggy?

Indeed. I checked "pm->IRegion.start" and "pm->IRegion.size" returned by the "pm_init" function and they are incorrect. Is this because I didn't compile pfft correctly? Do I need to modify these two lines in Makefile.local?

PFFT_CONFIGURE_FLAGS = --enable-sse2 --enable-avx
PFFT_CFLAGS =

Why does it work on one node but fail on several nodes?

@biweidai
Collaborator Author

> This can happen if one part of the program got a different value from MPI comm get rank. One possibility is e.g. the pfft library linked to a wrong version of MPI?

Thanks! This is exactly what happens! The fastpm comm (MPI_COMM_WORLD) and pfft comm (Comm2D from "pfft_create_procmesh") give me different rank values. How can I fix this issue?
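
For reference, the mismatch can be reproduced outside FastPM with plain MPI, since a Cartesian communicator created with reorder = 1 is allowed to renumber the ranks. Here is a minimal standalone sketch (not FastPM code; I am only assuming pfft_create_procmesh does something similar internally):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* 1 x N process mesh, periodic in both directions, reordering allowed */
    int dims[2] = {1, world_size};
    int periods[2] = {1, 1};
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);

    int cart_rank;
    MPI_Comm_rank(cart, &cart_rank);

    /* MPI_CONGRUENT = same processes in the same order;
       MPI_SIMILAR   = same processes but renumbered, i.e. the failure mode here */
    int cmp;
    MPI_Comm_compare(MPI_COMM_WORLD, cart, &cmp);

    printf("world rank %d -> cart rank %d (%s)\n", world_rank, cart_rank,
           cmp == MPI_CONGRUENT ? "same order" : "reordered");

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}

If srun on two nodes reproduces the renumbering with this, the problem is independent of FastPM itself.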

@rainwoodman
Collaborator

rainwoodman commented Jan 31, 2023 via email

@biweidai
Collaborator Author

Thanks. I also opened a ticket at NERSC. If you contact NERSC you could mention the ticket INC0197829.

I am not sure what moo is. The compiler seems to be cc, the same as fastpm. Here is the pfft-single.log file:

pfft-single.log

My Makefile.local is very similar to Makefile.local.example:

CC = cc
OPTIMIZE = -O3 -g
GSL_LIBS = -lgsl -lgslcblas
PFFT_CONFIGURE_FLAGS = --enable-sse2 --enable-avx
PFFT_CFLAGS =

I also tried CC = mpicc, but it doesn't make a difference.

@rainwoodman
Collaborator

rainwoodman commented Jan 31, 2023 via email

@biweidai
Collaborator Author

biweidai commented Jan 31, 2023

The static compilation fails with the following error:

Archive file install/lib/libfftw3.a not found.
make[1]: *** [Makefile:39: libfastpm-dep.a] Error 1
make[1]: Leaving directory '/global/u1/b/biwei/fastpm/depends'
make: *** [Makefile:10: all] Error 2

And here is the compilation output related to pfft:

(make "CPPFLAGS=" "OPENMP=-fopenmp" "CC=cc -static" -f Makefile.pfft "PFFT_CONFIGURE_FLAGS=--enable-sse2 --enable-avx" "PFFT_CFLAGS=")
make[2]: Entering directory '/global/u1/b/biwei/fastpm/depends'
mkdir -p download
curl -L -o download/pfft-1.0.8-alpha3-fftw3.tar.gz https://github.com/rainwoodman/pfft/releases/download/1.0.8-alpha3-fftw3/pfft-1.0.8-alpha3-fftw3.tar.gz ; \

% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 4408k 100 4408k 0 0 1794k 0 0:00:02 0:00:02 --:--:-- 3748k
mkdir -p src ;
gzip -dc download/pfft-1.0.8-alpha3-fftw3.tar.gz | tar xf - -C src ;
touch /global/u1/b/biwei/fastpm/depends/src/pfft-1.0.8-alpha3-fftw3/configure
mkdir -p double;
(cd double;
/global/u1/b/biwei/fastpm/depends/src/pfft-1.0.8-alpha3-fftw3/configure --prefix=/global/u1/b/biwei/fastpm/depends/install --disable-shared --enable-static
--disable-fortran --disable-doc --enable-mpi --enable-sse2 --enable-avx --enable-openmp "CFLAGS=" "CC=cc -static" "MPICC=cc -static"
2>&1 ;
make -j 4 2>&1 ;
make install 2>&1;
) | tee pfft-double.log | tail
checking for function MPI_Init in -lmpich... no
configure: error: in `/global/homes/b/biwei/fastpm/depends/double':
configure: error: PFFT requires an MPI C compiler.
See `config.log' for more details
make[3]: Entering directory '/global/u1/b/biwei/fastpm/depends/double'
make[3]: *** No targets specified and no makefile found. Stop.
make[3]: Leaving directory '/global/u1/b/biwei/fastpm/depends/double'
make[3]: Entering directory '/global/u1/b/biwei/fastpm/depends/double'
make[3]: *** No rule to make target 'install'. Stop.
make[3]: Leaving directory '/global/u1/b/biwei/fastpm/depends/double'
mkdir -p single;
(cd single;
/global/u1/b/biwei/fastpm/depends/src/pfft-1.0.8-alpha3-fftw3/configure --prefix=/global/u1/b/biwei/fastpm/depends/install --enable-single --disable-shared --enable-static
--disable-fortran --disable-doc --enable-mpi --enable-sse --enable-avx --enable-openmp "CFLAGS=" "CC=cc -static" "MPICC=cc -static"
2>&1 ;
make -j 4 2>&1 ;
make install 2>&1;
) | tee pfft-single.log | tail
checking for function MPI_Init in -lmpich... no
configure: error: in `/global/homes/b/biwei/fastpm/depends/single':
configure: error: PFFT requires an MPI C compiler.
See `config.log' for more details
make[3]: Entering directory '/global/u1/b/biwei/fastpm/depends/single'
make[3]: *** No targets specified and no makefile found. Stop.
make[3]: Leaving directory '/global/u1/b/biwei/fastpm/depends/single'
make[3]: Entering directory '/global/u1/b/biwei/fastpm/depends/single'
make[3]: *** No rule to make target 'install'. Stop.
make[3]: Leaving directory '/global/u1/b/biwei/fastpm/depends/single'
make[2]: Leaving directory '/global/u1/b/biwei/fastpm/depends'

Here are the config.log and pfft-single.log:

config.log
pfft-single.log

By the way, there is a warning for static compilation on Perlmutter (see https://docs.nersc.gov/development/compilers/wrappers/#static-compilation ):

> Static linking can fail on Perlmutter
> Please note that static compilation is not supported by NERSC, and it was observed that building statically linked executables can fail as the compiler wrappers may not properly link necessary static PE libraries.

@rainwoodman
Collaborator

rainwoodman commented Jan 31, 2023 via email

@biweidai
Collaborator Author

biweidai commented Feb 1, 2023

I got the error "C compiler cannot create executables":

(make "CPPFLAGS=" "OPENMP=-fopenmp" "CC=cc" -f Makefile.pfft "PFFT_CONFIGURE_FLAGS=--enable-sse2 --enable-avx" "PFFT_CFLAGS=-static")
make[2]: Entering directory '/global/u1/b/biwei/fastpm/depends'
mkdir -p src ;
gzip -dc download/pfft-1.0.8-alpha3-fftw3.tar.gz | tar xf - -C src ;
touch /global/u1/b/biwei/fastpm/depends/src/pfft-1.0.8-alpha3-fftw3/configure
mkdir -p double;
(cd double;
/global/u1/b/biwei/fastpm/depends/src/pfft-1.0.8-alpha3-fftw3/configure --prefix=/global/u1/b/biwei/fastpm/depends/install --disable-shared --enable-static
--disable-fortran --disable-doc --enable-mpi --enable-sse2 --enable-avx --enable-openmp "CFLAGS=-static" "CC=cc" "MPICC=cc"
2>&1 ;
make -j 4 2>&1 ;
make install 2>&1;
) | tee pfft-double.log | tail
checking whether the C compiler works... no
configure: error: in `/global/homes/b/biwei/fastpm/depends/double':
configure: error: C compiler cannot create executables
See `config.log' for more details
make[3]: Entering directory '/global/u1/b/biwei/fastpm/depends/double'
make[3]: *** No targets specified and no makefile found. Stop.
make[3]: Leaving directory '/global/u1/b/biwei/fastpm/depends/double'
make[3]: Entering directory '/global/u1/b/biwei/fastpm/depends/double'
make[3]: *** No rule to make target 'install'. Stop.
make[3]: Leaving directory '/global/u1/b/biwei/fastpm/depends/double'
mkdir -p single;
(cd single;
/global/u1/b/biwei/fastpm/depends/src/pfft-1.0.8-alpha3-fftw3/configure --prefix=/global/u1/b/biwei/fastpm/depends/install --enable-single --disable-shared --enable-static
--disable-fortran --disable-doc --enable-mpi --enable-sse --enable-avx --enable-openmp "CFLAGS=-static" "CC=cc" "MPICC=cc"
2>&1 ;
make -j 4 2>&1 ;
make install 2>&1;
) | tee pfft-single.log | tail
checking whether the C compiler works... no
configure: error: in `/global/homes/b/biwei/fastpm/depends/single':
configure: error: C compiler cannot create executables
See `config.log' for more details
make[3]: Entering directory '/global/u1/b/biwei/fastpm/depends/single'
make[3]: *** No targets specified and no makefile found. Stop.
make[3]: Leaving directory '/global/u1/b/biwei/fastpm/depends/single'
make[3]: Entering directory '/global/u1/b/biwei/fastpm/depends/single'
make[3]: *** No rule to make target 'install'. Stop.
make[3]: Leaving directory '/global/u1/b/biwei/fastpm/depends/single'
make[2]: Leaving directory '/global/u1/b/biwei/fastpm/depends'

config.log
pfft-single.log

@rainwoodman
Collaborator

rainwoodman commented Feb 1, 2023 via email

@biweidai
Collaborator Author

biweidai commented Feb 1, 2023

Yes, the gpu and cudatoolkit modules were loaded. After removing these two modules, I am back to the "PFFT requires an MPI C compiler" error:

(make "CPPFLAGS=" "OPENMP=-fopenmp" "CC=cc" -f Makefile.pfft "PFFT_CONFIGURE_FLAGS=--enable-sse2 --enable-avx" "PFFT_CFLAGS=-static")
make[2]: Entering directory '/global/u1/b/biwei/fastpm/depends'
mkdir -p src ;
gzip -dc download/pfft-1.0.8-alpha3-fftw3.tar.gz | tar xf - -C src ;
touch /global/u1/b/biwei/fastpm/depends/src/pfft-1.0.8-alpha3-fftw3/configure
mkdir -p double;
(cd double;
/global/u1/b/biwei/fastpm/depends/src/pfft-1.0.8-alpha3-fftw3/configure --prefix=/global/u1/b/biwei/fastpm/depends/install --disable-shared --enable-static
--disable-fortran --disable-doc --enable-mpi --enable-sse2 --enable-avx --enable-openmp "CFLAGS=-static" "CC=cc" "MPICC=cc"
2>&1 ;
make -j 4 2>&1 ;
make install 2>&1;
) | tee pfft-double.log | tail
checking for function MPI_Init in -lmpich... no
configure: error: in `/global/homes/b/biwei/fastpm/depends/double':
configure: error: PFFT requires an MPI C compiler.
See `config.log' for more details
make[3]: Entering directory '/global/u1/b/biwei/fastpm/depends/double'
make[3]: *** No targets specified and no makefile found. Stop.
make[3]: Leaving directory '/global/u1/b/biwei/fastpm/depends/double'
make[3]: Entering directory '/global/u1/b/biwei/fastpm/depends/double'
make[3]: *** No rule to make target 'install'. Stop.
make[3]: Leaving directory '/global/u1/b/biwei/fastpm/depends/double'
mkdir -p single;
(cd single;
/global/u1/b/biwei/fastpm/depends/src/pfft-1.0.8-alpha3-fftw3/configure --prefix=/global/u1/b/biwei/fastpm/depends/install --enable-single --disable-shared --enable-static
--disable-fortran --disable-doc --enable-mpi --enable-sse --enable-avx --enable-openmp "CFLAGS=-static" "CC=cc" "MPICC=cc"
2>&1 ;
make -j 4 2>&1 ;
make install 2>&1;
) | tee pfft-single.log | tail
checking for function MPI_Init in -lmpich... no
configure: error: in `/global/homes/b/biwei/fastpm/depends/single':
configure: error: PFFT requires an MPI C compiler.
See `config.log' for more details
make[3]: Entering directory '/global/u1/b/biwei/fastpm/depends/single'
make[3]: *** No targets specified and no makefile found. Stop.
make[3]: Leaving directory '/global/u1/b/biwei/fastpm/depends/single'
make[3]: Entering directory '/global/u1/b/biwei/fastpm/depends/single'
make[3]: *** No rule to make target 'install'. Stop.
make[3]: Leaving directory '/global/u1/b/biwei/fastpm/depends/single'
make[2]: Leaving directory '/global/u1/b/biwei/fastpm/depends'

pfft-single.log
config.log

Here are the modules I loaded:

Currently Loaded Modules:

  1) craype-x86-milan
  2) libfabric/1.15.2.0
  3) craype-network-ofi
  4) xpmem/2.5.2-2.4_3.20__gd0f7936.shasta
  5) PrgEnv-gnu/8.3.3
  6) cray-dsmml/0.2.2
  7) cray-libsci/22.11.1.2
  8) cray-mpich/8.1.22
  9) craype/2.7.19
  10) gcc/11.2.0
  11) perftools-base/22.09.0
  12) cpe/22.11
  13) xalt/2.10.2
  14) gsl/2.7
  15) cray-pmi/6.1.7

I also asked NERSC about static compilation on Perlmutter, and they said that "Static compilation is not supported on Perlmutter."

Can we reorder the rank values so that fastpm rank and pfft rank are consistent? Do you think this will fix the problem?

@rainwoodman
Collaborator

rainwoodman commented Feb 1, 2023 via email

@rainwoodman
Collaborator

rainwoodman commented Feb 1, 2023 via email

@biweidai
Collaborator Author

biweidai commented Feb 1, 2023

Here are the compilation log files from the build that works on one node.

pfft-single.log
config.log

I am not sure about shifter. I will read the NERSC shifter documentation and see if there is any related information.

Perlmutter is down now. I can check which MPI libraries cc uses once it is back online.

Maybe I can remove the --enable-static flag in Makefile.pfft?

@rainwoodman
Collaborator

rainwoodman commented Feb 1, 2023 via email

@biweidai
Collaborator Author

biweidai commented Feb 1, 2023

Here is the output of "ldd fastpm":

linux-vdso.so.1 (0x00007fffef5f8000)
libgsl.so.25 => /global/common/software/spackecp/perlmutter/e4s-22.05/75197/spack/opt/spack/cray-sles15-zen3/gcc-11.2.0/gsl-2.7-fhx3zdzzsac7koioqjzpx2uvg4wg4caw/lib/libgsl.so.25 (0x00001461cb6e0000)
libgslcblas.so.0 => /global/common/software/spackecp/perlmutter/e4s-22.05/75197/spack/opt/spack/cray-sles15-zen3/gcc-11.2.0/gsl-2.7-fhx3zdzzsac7koioqjzpx2uvg4wg4caw/lib/libgslcblas.so.0 (0x00001461cb4a3000)
libm.so.6 => /lib64/libm.so.6 (0x00001461cb158000)
libcuda.so.1 => /usr/lib64/libcuda.so.1 (0x00001461c9cf4000)
libmpi_gnu_91.so.12 => /opt/cray/pe/lib64/libmpi_gnu_91.so.12 (0x00001461c70d6000)
libmpi_gtl_cuda.so.0 => /opt/cray/pe/lib64/libmpi_gtl_cuda.so.0 (0x00001461c6e92000)
libdl.so.2 => /lib64/libdl.so.2 (0x00001461c6c8e000)
libxpmem.so.0 => /opt/cray/xpmem/default/lib64/libxpmem.so.0 (0x00001461c6a8b000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00001461c6868000)
libgomp.so.1 => /opt/cray/pe/gcc-libs/libgomp.so.1 (0x00001461c6621000)
libc.so.6 => /lib64/libc.so.6 (0x00001461c622c000)
/lib64/ld-linux-x86-64.so.2 (0x00001461cbbb4000)
librt.so.1 => /lib64/librt.so.1 (0x00001461c6023000)
libfabric.so.1 => /opt/cray/libfabric/1.15.2.0/lib64/libfabric.so.1 (0x00001461c5d31000)
libatomic.so.1 => /opt/cray/pe/gcc-libs/libatomic.so.1 (0x00001461c5b28000)
libpmi.so.0 => /opt/cray/pe/lib64/libpmi.so.0 (0x00001461c5926000)
libpmi2.so.0 => /opt/cray/pe/lib64/libpmi2.so.0 (0x00001461c56ed000)
libgfortran.so.5 => /opt/cray/pe/gcc-libs/libgfortran.so.5 (0x00001461c5222000)
libgcc_s.so.1 => /opt/cray/pe/gcc-libs/libgcc_s.so.1 (0x00001461c5003000)
libquadmath.so.0 => /opt/cray/pe/gcc-libs/libquadmath.so.0 (0x00001461c4dbe000)
libcudart.so.11.0 => /opt/nvidia/hpc_sdk/Linux_x86_64/22.5/cuda/11.7/lib64/libcudart.so.11.0 (0x00001461c4b19000)
libstdc++.so.6 => /opt/cray/pe/gcc-libs/libstdc++.so.6 (0x00001461c46f7000)
libcxi.so.1 => /usr/lib64/libcxi.so.1 (0x00001461c44d2000)
libcurl.so.4 => /usr/lib64/libcurl.so.4 (0x00001461c4234000)
libjson-c.so.3 => /usr/lib64/libjson-c.so.3 (0x00001461c4024000)
libpals.so.0 => /opt/cray/pe/lib64/libpals.so.0 (0x00001461c3e1f000)
libnghttp2.so.14 => /usr/lib64/libnghttp2.so.14 (0x00001461c3bf7000)
libidn2.so.0 => /usr/lib64/libidn2.so.0 (0x00001461c39da000)
libssh.so.4 => /usr/lib64/libssh.so.4 (0x00001461c376c000)
libpsl.so.5 => /usr/lib64/libpsl.so.5 (0x00001461c355a000)
libssl.so.1.1 => /usr/lib64/libssl.so.1.1 (0x00001461c32bc000)
libcrypto.so.1.1 => /usr/lib64/libcrypto.so.1.1 (0x00001461c2d82000)
libgssapi_krb5.so.2 => /usr/lib64/libgssapi_krb5.so.2 (0x00001461c2b30000)
libldap_r-2.4.so.2 => /usr/lib64/libldap_r-2.4.so.2 (0x00001461c28dc000)
liblber-2.4.so.2 => /usr/lib64/liblber-2.4.so.2 (0x00001461c26cd000)
libzstd.so.1 => /usr/lib64/libzstd.so.1 (0x00001461c239d000)
libbrotlidec.so.1 => /usr/lib64/libbrotlidec.so.1 (0x00001461c2191000)
libz.so.1 => /lib64/libz.so.1 (0x00001461c1f7a000)
libunistring.so.2 => /usr/lib64/libunistring.so.2 (0x00001461c1bf7000)
libkrb5.so.3 => /usr/lib64/libkrb5.so.3 (0x00001461c191e000)
libk5crypto.so.3 => /usr/lib64/libk5crypto.so.3 (0x00001461c1706000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00001461c1502000)
libkrb5support.so.0 => /usr/lib64/libkrb5support.so.0 (0x00001461c12f3000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00001461c10db000)
libsasl2.so.3 => /usr/lib64/libsasl2.so.3 (0x00001461c0ebe000)
libbrotlicommon.so.1 => /usr/lib64/libbrotlicommon.so.1 (0x00001461c0c9d000)
libkeyutils.so.1 => /usr/lib64/libkeyutils.so.1 (0x00001461c0a98000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00001461c086f000)
libpcre.so.1 => /usr/lib64/libpcre.so.1 (0x00001461c05e6000)

For the mpicc compiler, here is the output of "mpicc -show":

gcc -I/opt/cray/pe/mpich/8.1.22/ofi/gnu/9.1/include -L/opt/cray/pe/mpich/8.1.22/ofi/gnu/9.1/lib -lmpi_gnu_91

For the cc compiler wrapper, I am not sure how to print its full command line, since it doesn't recognize "cc -show" or "cc -link_info".

I tried disabling the Shifter modules with "#SBATCH --module=none" in the job script, but it makes no difference: https://docs.nersc.gov/development/shifter/how-to-use/#shifter-modules

Removing the "--disable-shared" and "--enable-static" flags in Makefile.pfft also makes no difference.

@rainwoodman
Collaborator

rainwoodman commented Feb 2, 2023 via email

@biweidai
Collaborator Author

biweidai commented Feb 2, 2023

Here is the output of "objdump -t -T fastpm" (it's too long so I put it in a text file):

output.txt

Is there a way to check whether fastpm and mpsort agree on MPI rank? For pfft, it returns its communicator Comm2D, so I can check the pfft rank values in fastpm. Does mpsort also have a similar MPI communicator in fastpm?
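
In the meantime, one generic check that does not depend on any library internals: gather each process's MPI_COMM_WORLD rank through the communicator in question and see whether the result is the identity permutation. A minimal sketch in plain MPI (the helper name is mine):

#include <mpi.h>
#include <stdlib.h>

/* Returns 1 if rank i in `other` is also rank i in MPI_COMM_WORLD, else 0. */
static int ranks_consistent(MPI_Comm other)
{
    int world_rank, other_size, ok = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(other, &other_size);

    int *world_of = malloc(other_size * sizeof(int));
    MPI_Allgather(&world_rank, 1, MPI_INT, world_of, 1, MPI_INT, other);

    for (int i = 0; i < other_size; i++)
        if (world_of[i] != i)
            ok = 0;               /* rank i in `other` is a different world rank */

    free(world_of);
    return ok;
}

Calling this on Comm2D right after pm_init should return 0 on the broken two-node runs, and the same check works for whatever communicator any other part of the code ends up using.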

@rainwoodman
Collaborator

rainwoodman commented Feb 2, 2023 via email

@rainwoodman
Collaborator

rainwoodman commented Feb 2, 2023 via email

@rainwoodman
Collaborator

rainwoodman commented Feb 2, 2023 via email

@rainwoodman
Collaborator

rainwoodman commented Feb 2, 2023 via email

@rainwoodman
Collaborator

rainwoodman commented Feb 2, 2023 via email

@biweidai
Collaborator Author

biweidai commented Feb 2, 2023

I see! So fastpm and pfft are using the same MPI library, but pfft reorders the ranks, so the rank numbering is no longer consistent with fastpm.

I wonder whether this reordering depends on the mesh size? For example, the particle mesh and the force mesh may have different resolutions. Do they give the same Comm2D?

If yes, then we could call pfft_create_procmesh at the beginning of fastpm and use the returned Comm2D?

Or can we replace pfft_create_procmesh with MPI_Cart_create and set reorder=0?

@rainwoodman
Collaborator

rainwoodman commented Feb 2, 2023 via email

@biweidai
Collaborator Author

biweidai commented Feb 2, 2023

For approach 2, simply replacing pfft_create_procmesh with

int periods[2] = {1, 1};                       /* periodic in both mesh directions */
MPI_Cart_create(comm, 2, pm->Nproc, periods, 0 /* reorder */, &pm->Comm2D);

fixes the problem. Does pfft_create_procmesh do anything more than the above two lines? Do you want me to make a pull request for this? Or do you want to try approach 1 and compare the performance first?
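
If it helps for a PR, this is how I would wrap the replacement, together with a sanity check that the ranks now agree. The helper name is mine, and pm->Nproc / pm->Comm2D are just the fields from the snippet above; it is a sketch, not tested FastPM code.

#include <assert.h>
#include <mpi.h>

/* Create the 2D process mesh without letting MPI renumber the ranks,
   so rank i in the returned communicator is also rank i in `comm`. */
static MPI_Comm create_procmesh_no_reorder(MPI_Comm comm, int nproc[2])
{
    int periods[2] = {1, 1};      /* periodic box in both mesh directions */
    MPI_Comm comm2d;
    MPI_Cart_create(comm, 2, nproc, periods, 0 /* reorder */, &comm2d);

    int rank, rank2d;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_rank(comm2d, &rank2d);
    assert(rank == rank2d);       /* guaranteed by the MPI standard when reorder = 0 */

    return comm2d;
}

The only thing given up with reorder = 0 is whatever rank placement optimization the MPI library could have done, which seems like a fair trade for a consistent rank-to-domain mapping.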

@biweidai
Collaborator Author

biweidai commented Feb 2, 2023

Ah, I was wrong about approach 1. It only changes the CPU-to-MPI-rank mapping and doesn't change the MPI-rank-to-domain mapping. Replacing all of the fastpm comms with comm2d should be enough.

@rainwoodman
Collaborator

rainwoodman commented Feb 2, 2023 via email
