armci-mpi checks segfaulted on OpenMPI/3.1.4 #32

Open
wirawan0 opened this issue Apr 15, 2021 · 1 comment

Comments

@wirawan0
Hi Jeff,

I made a second attempt at building armci-mpi, this time outside a container, with OpenMPI 3.1.4 and UCX 1.9. The cluster has InfiniBand EDR hardware and runs Ubuntu Linux 18.04. I built the software with the stock gcc/gfortran that ships with the OS.

This time around, I encountered a different set of issues. I have 10 tests that fail consistently:

FAIL: benchmarks/ping-pong
FAIL: benchmarks/ring-flood
FAIL: benchmarks/contiguous-bench
FAIL: benchmarks/strided-bench
FAIL: benchmarks/rmw_perf
FAIL: tests/ARMCI_AccS_latency
FAIL: tests/test_rmw_fadd
FAIL: tests/mpi/test_mpi_dim
FAIL: tests/contrib/armci-perf
FAIL: tests/contrib/armci-test

The immediate cause is evident from the backtraces: every crash occurs inside PMPI_Accumulate or MPI_Fetch_and_op, which the tests call under the hood:

$ grep -e FAIL -e MPI_ test-suite.log
# XFAIL: 0
# FAIL:  10
FAIL: benchmarks/ping-pong
 8  /shared/apps/auto/openmpi/3.1.4-gcc-7.3.0-kesl/lib/libmpi.so.40(PMPI_Accumulate+0x101) [0x7f2cedc2cf51]
 8  /shared/apps/auto/openmpi/3.1.4-gcc-7.3.0-kesl/lib/libmpi.so.40(PMPI_Accumulate+0x101) [0x7fa2c6df9f51]
FAIL benchmarks/ping-pong (exit status: 139)
FAIL: benchmarks/ring-flood
 8  /shared/apps/auto/openmpi/3.1.4-gcc-7.3.0-kesl/lib/libmpi.so.40(PMPI_Accumulate+0x101) [0x7f85e9cabf51]
 8  /shared/apps/auto/openmpi/3.1.4-gcc-7.3.0-kesl/lib/libmpi.so.40(PMPI_Accumulate+0x101) [0x7f248dd0af51]
FAIL benchmarks/ring-flood (exit status: 139)
FAIL: benchmarks/contiguous-bench
 8  /shared/apps/auto/openmpi/3.1.4-gcc-7.3.0-kesl/lib/libmpi.so.40(PMPI_Accumulate+0x101) [0x7ffa903f6f51]
FAIL benchmarks/contiguous-bench (exit status: 139)
FAIL: benchmarks/strided-bench
 8  /shared/apps/auto/openmpi/3.1.4-gcc-7.3.0-kesl/lib/libmpi.so.40(PMPI_Accumulate+0x101) [0x7fd8ac115f51]
FAIL benchmarks/strided-bench (exit status: 139)
FAIL: benchmarks/rmw_perf
 8  /shared/apps/auto/openmpi/3.1.4-gcc-7.3.0-kesl/lib/libmpi.so.40(MPI_Fetch_and_op+0xf5) [0x7f193bf270c5]
FAIL benchmarks/rmw_perf (exit status: 139)
FAIL: tests/ARMCI_AccS_latency
 8  /shared/apps/auto/openmpi/3.1.4-gcc-7.3.0-kesl/lib/libmpi.so.40(PMPI_Accumulate+0x101) [0x7f0655c6ef51]
FAIL tests/ARMCI_AccS_latency (exit status: 139)
[redacted]

I may have to ask on the OpenMPI forum for help resolving this, but first I wanted to see whether you have encountered this kind of issue before or have any insight.

Wirawan

@wirawan0 changed the title from "armci-mpi checks failed on OpenMPI/3.1.4" to "armci-mpi checks segfaulted on OpenMPI/3.1.4" on Apr 15, 2021
@jeffhammond
Member

This is new to me but I'll try to test things later. I'm busy right now but please feel free to remind me if I don't post any updates by the end of April.
