
Issue with QE v6.3 with and without openmp #10

Open
rolly-ng opened this issue Sep 11, 2018 · 2 comments
rolly-ng commented Sep 11, 2018

Hi,
I have Parallel Studio 2017 Update 7 and have successfully compiled ELPA 2017.11.001 and then QE v6.3 via the configure-xxx-hsw.sh script.
QE v6.3 runs fine on a single node of the cluster, e.g. srun -p ABC -N 1 -n 176 pw.x < my.in > my.out.
However, as soon as I run on 2 nodes, e.g. srun -p ABC -N 2 -n 352 pw.x < my.in > my.out, it fails with the strange "Error in routine cdiaghg: problems computing cholesky" error.
If I compile ELPA and QE with the configure-xxx-hsw-omp.sh script, a single node is also fine; however, on 2 nodes it produces a "PMPI_Group_incl: Invalid rank, error stack:" message in slurm-xxx.out.
Could you please have a look at QE v6.3?

Moreover, a conventional build without xconfigure runs fine across multiple nodes, i.e. ./configure CC=icc CXX=icpc F77=ifort F90=ifort MPIF90=mpiifort --enable-shared --enable-parallel --disable-openmp --with-scalapack=intel CFLAGS="-O3 -I -xCORE-AVX2" CXXFLAGS="-O3 -I -xCORE-AVX2" FCFLAGS="-O3 -I -xCORE-AVX2" F90FLAGS="-O3 -I -xCORE-AVX2" FFLAGS="-O3 -I -xCORE-AVX2"

Thanks,
Rolly
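
A minimal sketch of the two failing multi-node launches described above; the hybrid rank/thread split (4 OpenMP threads per MPI rank) is an illustrative assumption on my part, not taken from the report.

# Pure-MPI build (configure-xxx-hsw.sh): 1 node works, 2 nodes fail with
# "Error in routine cdiaghg: problems computing cholesky"
srun -p ABC -N 2 -n 352 pw.x < my.in > my.out

# Hybrid MPI+OpenMP build (configure-xxx-hsw-omp.sh): 2 nodes fail with
# "PMPI_Group_incl: Invalid rank"; the thread count below is hypothetical
export OMP_NUM_THREADS=4
srun -p ABC -N 2 -n 88 --cpus-per-task=4 pw.x < my.in > my.out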

hfp self-assigned this Sep 23, 2018

hfp (Owner) commented Sep 25, 2018

Thank you for the report! At first glance, this looks like a problem that only occurs when ELPA is incorporated. I may step back from ELPA as a default with Xconfigure, or find a version that works again.

rolly-ng (Author) commented

Hi Hans,
I have done some further tests and found that the -D__NON_BLOCKING_SCATTER flag in QE's make.inc causes the problem.
I compiled ELPA as instructed, then removed this flag from QE's make.inc. Version 6.3 now runs, but I have to use pw.x -nk 2 to get a parallel speedup; otherwise, 2 nodes run slower than 1 node on the AUSURF112 benchmark.
I am not sure whether -nk 2 is a proper fix for the problem?
Thanks,
Rolly
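
A minimal sketch of the workaround described above, assuming -D__NON_BLOCKING_SCATTER appears in the DFLAGS line of QE's make.inc and that the AUSURF112 input file is named ausurf.in (both are assumptions, not from the report):

# strip the flag from make.inc after running the xconfigure script, then rebuild pw.x
sed -i 's/-D__NON_BLOCKING_SCATTER//' make.inc
make pw

# run across 2 nodes with 2 k-point pools (-nk 2), as in the test above
srun -p ABC -N 2 -n 352 pw.x -nk 2 < ausurf.in > ausurf.out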
