MPI tests on s390x #3362

Open
LecrisUT opened this issue Apr 23, 2024 · 0 comments

Comments

@LecrisUT
Contributor

This issue documents the MPI test failures on the s390x architecture, in the unlikely case that a Red Hat engineer would like to take a look. If upstream wishes to debug it, s390x is available on Copr (as is ppc, should they wish to support it officially), so the issue can be reproduced there and more tracebacks collected.

The issue was discovered in https://src.fedoraproject.org/rpms/cp2k/pull-request/6 and should be reproducible from the final form of that PR by removing the ExcludeArch: s390x line. The failures vary between runs: sometimes 1 of the 6 unit tests fails, sometimes 3 of 6, but it appears to be an MPI issue.

Error traceback
------------------------------- Errors ---------------------------------
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
/builddir/build/BUILD/cp2k-2024.1/regtesting/local_mpich/psmp/TEST-local_mpich-psmp-2024-04-22_21-01-01/UNIT/libcp2k_unittest/libcp2k_unittest.out
Unit test starts ...
Testing cp_c_get_version(): CP2K version 2024.1.
Unit test starts ...
Testing cp_c_get_version(): CP2K version 2024.1.
  **** **** ******  **  PROGRAM STARTED AT               2024-04-22 21:01:03.378
 ***** ** ***  *** **   PROGRAM STARTED ON        fa7470e99a474cfa89624bb3b90776
 **    ****   ******    PROGRAM STARTED BY                             mockbuild
 ***** **    ** ** **   PROGRAM PROCESS ID                                 59164
  **** **  *******  **  PROGRAM STARTED IN /builddir/build/BUILD/cp2k-2024.1/reg
                                           testing/local_mpich/psmp/TEST-local_m
                                           pich-psmp-2024-04-22_21-01-01/UNIT/li
                                           bcp2k_unittest
 CP2K| version string:                                       CP2K version 2024.1
 CP2K| source code revision number:                                             
 CP2K| cp2kflags: omp fftw3 libxc parallel mpi_f08 scalapack spglib             
 CP2K| is freely available from                            https://www.cp2k.org/
 CP2K| Program compiled at                                   2024-04-22 00:00:00
 CP2K| Program compiled on                                                      
 CP2K| Program compiled for                                                s390x
 CP2K| Data directory path    /builddir/build/BUILDROOT/cp2k-2024.1-5.fc41.s390x
 CP2K| Input file name                                                    H2.inp
 *******************************************************************************
 *             MPI error 340866319 in mpi_bcast @ mp_bcast_iv_src : Other MPI  *
 *           error, error stack:
internal_Bcast(7723).......................:  *
 *   ___        MPI_Bcast(buffer=0x2aa255108c0, count=1000, MPI_INTEGER, 0,    *
 *  /   \  MPI_COMM_WORLD) failed
MPID_Bcast(272)............................: *
 * [ABORT]             
MPIDI_Bcast_allcomm_composition_json(225)..:           *
 *  \___/              
MPIDI_POSIX_mpi_bcast(238).................:           *
 *    |     
MPIDI_POSIX_mpi_release_gather_release(218): message sizes do not *
 *  O/|      match across processes in the collective routine: Received 0 but  *
 * /| |                                expected 4000                           *
 * / \                                                  message_passing.F:1305 *
 *******************************************************************************
 ===== Routine Calling Stack ===== 
            6 broadcast_input_information
            5 parser_read_line_low
            4 parser_read_line
            3 section_vals_parse
            2 read_input
            1 CP2K
Abort(1) on node 1 (rank 1 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
STOP 1
Runtime failure with code 1.
------------------------------- Timings --------------------------------
Plot: name="timings", title="Timing Distribution", ylabel="time [s]"
PlotPoint: name="100th_percentile", plot="timings", label="100th %ile", y=1.22, yerr=0.0
PlotPoint: name="99th_percentile", plot="timings", label="99th %ile", y=1.20, yerr=0.0
PlotPoint: name="98th_percentile", plot="timings", label="98th %ile", y=1.18, yerr=0.0
PlotPoint: name="95th_percentile", plot="timings", label="95th %ile", y=1.11, yerr=0.0
PlotPoint: name="90th_percentile", plot="timings", label="90th %ile", y=1.01, yerr=0.0
PlotPoint: name="80th_percentile", plot="timings", label="80th %ile", y=0.80, yerr=0.0
----------------------------- Slow Tests -------------------------------
Duration threshold (2x 95th %ile): 2.22 sec
Found 0 slow tests (0 suppressed):
------------------------------- Summary --------------------------------
Number of FAILED  tests 1
Number of WRONG   tests 0
Number of CORRECT tests 5
Total number of   tests 6
Summary: correct: 5 / 6; failed: 1; 0min
Status: FAILED
*************************** Testing ended ******************************
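
To make the size mismatch in the traceback easier to read, here is a small sketch that pulls the numbers out of the MPICH message and checks the arithmetic. It assumes that MPI_INTEGER is a 4-byte integer in this build and that the reported sizes are in bytes; neither is confirmed by the log itself. The helper name `summarize_bcast_mismatch` and the cleaned-up `ERROR_TEXT` excerpt are illustrative only and are not part of CP2K or MPICH.

```python
import re

# Excerpt of the MPICH error text from the log above, with the CP2K banner
# decoration removed and the wrapped lines rejoined by hand.
ERROR_TEXT = (
    "MPI_Bcast(buffer=0x2aa255108c0, count=1000, MPI_INTEGER, 0, "
    "MPI_COMM_WORLD) failed "
    "MPIDI_POSIX_mpi_release_gather_release(218): message sizes do not "
    "match across processes in the collective routine: "
    "Received 0 but expected 4000"
)

# Assumption: MPI_INTEGER is a default 4-byte integer in this build.
ASSUMED_INTEGER_BYTES = 4


def summarize_bcast_mismatch(text, elem_bytes=ASSUMED_INTEGER_BYTES):
    """Extract count and sizes from the message and compare them."""
    count = int(re.search(r"count=(\d+)", text).group(1))
    sizes = re.search(r"Received (\d+) but expected (\d+)", text)
    received, expected = int(sizes.group(1)), int(sizes.group(2))
    return {
        "count": count,
        "received": received,
        "expected": expected,
        # True if the expected size equals count * element size.
        "expected_matches_count": count * elem_bytes == expected,
    }


if __name__ == "__main__":
    print(summarize_bcast_mismatch(ERROR_TEXT))
```

Under those assumptions, the expected size of 4000 is consistent with count=1000 four-byte integers, so the failing rank appears to be waiting for a full-sized broadcast while the size it sees from the other side is 0. This only restates what the log says; it does not identify the cause.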