f-exx-opt: Results are dependent on compiler/library versions and # of MPI ranks #350

Open
connoraird opened this issue May 14, 2024 · 1 comment

connoraird commented May 14, 2024

Description

On the f-exx-opt branch, the results in Conquest_out appear to depend on the compiler and library versions and on the number of MPI ranks used.

Compiler and library versions and their results

All results below are for the benchmark test_EXX_isol_C2H4_4proc_PBE0ERI_fullSZP_0.4_SCF.

Mac, Apple ARM

On an M2 Mac, the following Homebrew versions of each dependency were used:

fftw v3.3.10
scalapack v2.2.0_1
openblas v0.3.27
libxc v6.2.2
lapack v3.12.0
openmpi v5.0.3
gcc (gfortran) v14.1.0

With the above, the Harris-Foulkes energies obtained were:
np-2, nt-1 - |* Harris-Foulkes energy = -13.443098419459020 Ha
np-4, nt-1 - |* Harris-Foulkes energy = -14.028830939450383 Ha

Linux

On a UCL cluster (Myriad), the following versions were used:

fftw v3.3.8
scalapack v2.1.0
openblas v0.3.7
libxc v6.2.2 (compiled from source with gcc v9.2.0)
lapack from the above openblas
openmpi v3.1.5
gcc (gfortran) v9.2.0

With the above, the Harris-Foulkes energies obtained were:
np-2, nt-1 - |* Harris-Foulkes energy = -13.300253717139059 Ha
np-4, nt-1 - |* Harris-Foulkes energy = -13.561798008220549 Ha
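To put a number on the rank dependence, here is a small sketch that takes the four gfortran energies quoted above and computes E(np=4) - E(np=2) for each platform. The platform labels (`mac-arm-gcc14`, `myriad-gcc9`) and the helper name `rank_spread` are made up for illustration; only the energies come from the runs.

```python
# Harris-Foulkes energies (Ha) copied from the runs reported in this issue.
# Keys are (platform label, number of MPI ranks); the labels are hypothetical.
energies = {
    ("mac-arm-gcc14", 2): -13.443098419459020,
    ("mac-arm-gcc14", 4): -14.028830939450383,
    ("myriad-gcc9", 2): -13.300253717139059,
    ("myriad-gcc9", 4): -13.561798008220549,
}


def rank_spread(energies):
    """Return {platform: E(np=4) - E(np=2)} in Ha."""
    platforms = sorted({platform for platform, _ in energies})
    return {p: energies[(p, 4)] - energies[(p, 2)] for p in platforms}


spread = rank_spread(energies)
for platform, delta in spread.items():
    print(f"{platform}: E(np=4) - E(np=2) = {delta:+.6f} Ha")
```

Both differences are a sizeable fraction of a Hartree, so this does not look like floating-point reordering noise between rank counts.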

connoraird commented May 14, 2024

Profiling results for test_EXX_isol_C2H4_4proc_PBE0ERI_fullSZP_0.4_SCF on Myriad, using the Intel compiler and the following modules.

Currently Loaded Modulefiles:
 1) beta-modules               5) libxc/6.2.2/intel-2022   9) git/2.41.0-lfs-3.3.0
 2) gcc-libs/10.2.0            6) cmake/3.21.1            10) emacs/28.1
 3) compilers/intel/2022.2     7) python/3.8.6            11) userscripts/1.4.0
 4) mpi/intel/2021.6.0/intel   8) gerun

The results of these runs are:

np-2:
      |* Harris-Foulkes energy   =       -13.038957181266369 Ha
np-4:
      |* Harris-Foulkes energy   =       -12.967703483818312 Ha
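For comparing further compiler/rank combinations, something like the following could pull the energy out of an output file automatically. It is only a sketch: it assumes the energy line always has the form quoted in this issue (`|* Harris-Foulkes energy = <number> Ha`, with or without a space before `Ha`), and the function name `harris_foulkes_energies` is hypothetical, not part of CONQUEST or its test suite.

```python
import re

# Matches the energy line as quoted in this issue; whitespace around "="
# and before "Ha" is optional. The pattern is an assumption based on those
# quoted lines only.
HF_PATTERN = re.compile(r"Harris-Foulkes energy\s*=\s*(-?\d+\.\d+)\s*Ha")


def harris_foulkes_energies(text):
    """Return every Harris-Foulkes energy (Ha) found in text, in order."""
    return [float(match.group(1)) for match in HF_PATTERN.finditer(text)]


# Lines copied from the Intel runs above (np-2 first, then np-4).
sample = """\
      |* Harris-Foulkes energy   =       -13.038957181266369 Ha
      |* Harris-Foulkes energy   =       -12.967703483818312 Ha
"""

found = harris_foulkes_energies(sample)
print(found)
```

In practice the text would come from reading a Conquest_out file; the function returns all matches and leaves it to the caller to decide which one to compare.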

When running with 4 ranks and 2 threads (np-4), about 25% of the time for rank 0's primary thread is spent at a call to MPI_Wait.

When running with 2 ranks and 4 threads (np-2), no time is spent in this call for the primary thread of rank 0.

connoraird added a commit that referenced this issue May 30, 2024
connoraird added a commit that referenced this issue May 30, 2024