mpich test failures on s390x #35

Open
drew-parsons opened this issue Feb 28, 2022 · 6 comments

@drew-parsons

A build of armci-mpi with mpich 4.0 fails tests on s390x. Tests pass on Intel and ARM architectures (amd64, arm64 and their lesser counterparts).

The build log is available at https://buildd.debian.org/status/fetch.php?pkg=armci-mpi&arch=s390x&ver=0.3.1%7Ebeta-5&stamp=1645753186&raw=0 .
Tests pass with openmpi but 16 tests fail with mpich:

mpicc.mpich -DHAVE_CONFIG_H -I. -I./src  -I./src -Wdate-time -D_FORTIFY_SOURCE=2  -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security  -pthread -c -o tests/contrib/non-blocking/simple.o tests/contrib/non-blocking/simple.c
/bin/bash ./libtool  --tag=CC   --mode=link mpicc.mpich  -g -O2 -ffile-prefix-map=/<<PKGBUILDDIR>>=. -fstack-protector-strong -Wformat -Werror=format-security  -pthread  -Wl,-z,relro -o tests/contrib/non-blocking/simple tests/contrib/non-blocking/simple.o libarmci-mpich.la -lm 
libtool: link: mpicc.mpich -g -O2 "-ffile-prefix-map=/<<PKGBUILDDIR>>=." -fstack-protector-strong -Wformat -Werror=format-security -pthread -Wl,-z -Wl,relro -o tests/contrib/non-blocking/simple tests/contrib/non-blocking/simple.o  ./.libs/libarmci-mpich.a -lm -pthread
make[3]: Leaving directory '/<<PKGBUILDDIR>>/build-mpich'
/usr/bin/make  check-TESTS
make[3]: Entering directory '/<<PKGBUILDDIR>>/build-mpich'
make[4]: Entering directory '/<<PKGBUILDDIR>>/build-mpich'
PASS: benchmarks/ping-pong
PASS: benchmarks/ring-flood
PASS: benchmarks/contiguous-bench
FAIL: benchmarks/strided-bench
PASS: benchmarks/rmw_perf
PASS: tests/test_onesided
PASS: tests/test_onesided_shared
PASS: tests/test_onesided_shared_dla
PASS: tests/test_mutex
PASS: tests/test_mutex_rmw
PASS: tests/test_mutex_trylock
PASS: tests/test_malloc_irreg
FAIL: tests/ARMCI_PutS_latency
FAIL: tests/ARMCI_AccS_latency
PASS: tests/test_groups
PASS: tests/test_group_split
PASS: tests/test_malloc_group
FAIL: tests/test_accs
FAIL: tests/test_accs_dla
FAIL: tests/test_puts
FAIL: tests/test_puts_gets
FAIL: tests/test_puts_gets_dla
FAIL: tests/test_putv
PASS: tests/test_igop
PASS: tests/test_rmw_fadd
PASS: tests/test_parmci
PASS: tests/mpi/test_mpi_accs
FAIL: tests/mpi/test_mpi_dim
FAIL: tests/mpi/test_mpi_indexed_accs
FAIL: tests/mpi/test_mpi_indexed_gets
FAIL: tests/mpi/test_mpi_indexed_puts_gets
FAIL: tests/mpi/test_mpi_subarray_accs
PASS: tests/mpi/test_win_create
PASS: tests/mpi/test_win_model
PASS: tests/ctree/ctree_test
PASS: tests/ctree/ctree_test_rand
PASS: tests/ctree/ctree_test_rand_interval
FAIL: tests/contrib/armci-perf
FAIL: tests/contrib/armci-test
PASS: tests/contrib/lu/lu-block
PASS: tests/contrib/lu/lu-b-bc
PASS: tests/contrib/transp1D/transp1D-c
PASS: tests/contrib/non-blocking/simple
============================================================================
Testsuite summary for armci 0.1
============================================================================
# TOTAL: 43
# PASS:  27
# SKIP:  0
# XFAIL: 0
# FAIL:  16
# XPASS: 0
# ERROR: 0

Further details of the errors are listed in the build log
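
For anyone triaging similar logs, here is a minimal sketch of how the PASS/FAIL lines above could be tallied from a saved build log. The function name `tally_results` and the sample text are illustrative only, not part of armci-mpi or the Debian tooling. It assumes the automake `STATUS: name` line format shown above and deduplicates by test name, because the log repeats `FAIL: <name>` as a header in the per-test details.

```python
import re
from collections import Counter

# Matches automake result lines such as "FAIL: tests/test_accs".
RESULT_RE = re.compile(r"^(PASS|FAIL|SKIP|XFAIL|XPASS|ERROR): (\S+)$")

def tally_results(log_text):
    """Return (count per status, sorted list of failing test names)."""
    status_by_test = {}
    for line in log_text.splitlines():
        match = RESULT_RE.match(line.strip())
        if match is None:
            continue  # compiler output, summary banner, backtraces, ...
        status, name = match.groups()
        # Keyed by name so a repeated "FAIL: <name>" header is counted once.
        status_by_test[name] = status
    counts = Counter(status_by_test.values())
    failed = sorted(n for n, s in status_by_test.items() if s == "FAIL")
    return counts, failed

# Hypothetical excerpt in the same format as the log above.
sample = """\
PASS: benchmarks/contiguous-bench
FAIL: benchmarks/strided-bench
PASS: benchmarks/rmw_perf
FAIL: tests/mpi/test_mpi_dim
# FAIL:  16
FAIL: benchmarks/strided-bench
==============================
"""
counts, failed = tally_results(sample)
print(dict(counts))
print(failed)
```

Summary lines such as `# FAIL:  16` start with `#`, so they are ignored.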

There are essentially only two distinct errors here. Most of the failures point at the same assertion:

Assertion failed in file src/mpi/datatype/typerep/dataloop/looputil.c at line 815: *lengthp > 0

e.g.

FAIL: benchmarks/strided-bench
==============================

Starting one-sided strided performance test with 2 processes
   Trg. Rank    Xdim Ydim   Get (usec)   Put (usec)   Acc (usec)  Get (MiB/s)  Put (MiB/s)  Acc (MiB/s)
Assertion failed in file src/mpi/datatype/typerep/dataloop/looputil.c at line 815: *lengthp > 0
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x2b44c6) [0x3ffa1f344c6]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1fcfee) [0x3ffa1e7cfee]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1c6f94) [0x3ffa1e46f94]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1cd63c) [0x3ffa1e4d63c]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x25727e) [0x3ffa1ed727e]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x25a036) [0x3ffa1eda036]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x25c590) [0x3ffa1edc590]
/usr/lib/s390x-linux-gnu/libmpich.so.12(PMPI_Accumulate+0xa94) [0x3ffa1d79864]
./benchmarks/strided-bench(+0x43ee) [0x2aa3c3843ee]
./benchmarks/strided-bench(+0x5828) [0x2aa3c385828]
./benchmarks/strided-bench(main+0x2ea) [0x2aa3c382f32]
/lib/s390x-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x3ffa1a24c5e]
./benchmarks/strided-bench(+0x31f4) [0x2aa3c3831f4]
internal ABORT - process 0
FAIL benchmarks/strided-bench (exit status: 1)
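
Most frames in the backtrace above are bare offsets into libmpich.so.12, so they say little without symbols. As a rough sketch, the offsets could be turned into `addr2line` invocations like this. It assumes the frame format shown above and that debug symbols for the library are installed on the machine where the commands are run; the helper names `parse_frames` and `addr2line_commands` are made up for illustration.

```python
import re

# Matches backtrace frames such as
# "/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x2b44c6) [0x3ffa1f344c6]".
FRAME_RE = re.compile(r"^(\S+)\(([^)]*)\) \[(0x[0-9a-f]+)\]$")

def parse_frames(backtrace_text):
    """Return (module, location, address) for every line that looks like a frame."""
    frames = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append(match.groups())
    return frames

def addr2line_commands(frames):
    """Build one addr2line command per frame that has a bare "+0x..." offset."""
    commands = []
    for module, location, _address in frames:
        if location.startswith("+0x"):
            # Frames like "PMPI_Accumulate+0xa94" already name a symbol; skip them.
            commands.append(["addr2line", "-f", "-e", module, location[1:]])
    return commands

# Two frames copied from the backtrace above, plus a non-frame line.
sample = """\
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x2b44c6) [0x3ffa1f344c6]
/usr/lib/s390x-linux-gnu/libmpich.so.12(PMPI_Accumulate+0xa94) [0x3ffa1d79864]
internal ABORT - process 0
"""
for command in addr2line_commands(parse_frames(sample)):
    print(" ".join(command))
```

The offsets are only meaningful against the exact libmpich build that produced the backtrace.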

looputil.c is part of mpich, not armci-mpi, so this may be an mpich bug.
I'm not sure whether it's relevant to looputil.c l.813 here, but we previously caught a bug caused by incorrect assumptions about how long double alignment is implemented on s390x, exposed in mpi4py; see mpi4py/mpi4py#91

The other error is in test_mpi_indexed_gets:

FAIL: tests/mpi/test_mpi_indexed_gets
=====================================

MPI RMA Strided Get Test:
0: Data validation failed at [318, 0] expected=1.000000 actual=19153196493101324300117002266184609761638785168706969756587673992816090829370440047833267676841021126741158161912149458901300240246916622245811317773215680681469166039489874870997064119253413911245961967859065159680.000000
1: Data validation failed at [318, 0] expected=2.000000 actual=19153196493101324300117002266184609761638785168706969756587673992816090829370440047833267676841021126741158161912149458901300240246916622245811317773215680681469166039489874870997064119253413911245961967859065159680.000000
0: Data validation failed at [345, 0] expected=1.000000 actual=2523265647856334203312318852546941707356501213688096169388892899082082058579155051647685988391931887920786317575791927786357084394346485273106592523647787658186823812963720560115712.000000
1: Data validation failed at [345, 0] expected=2.000000 actual=2523265647856334203312318852546941707356501213688096169388892899082082058579155051647685988391931887920786317575791927786357084394346485273106592523647787658186823812963720560115712.000000

I see an error like this if there is a mismatch in libmpich.so (e.g. on amd64, running armci-mpi tests with libarmci built against mpich 4.0 but then compiling tests using libmpich1.2 from mpich 3.4.1), but that kind of mismatch shouldn't apply to the s390x build-time test failure reported here.
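
One way to look more closely at the bogus `actual=` values in the validation failures is to dump their raw byte pattern and compare it with the expected values. This is only a diagnostic sketch; `double_bytes` is a hypothetical helper, and big-endian order is chosen because s390x is a big-endian platform.

```python
import struct

def double_bytes(value, byteorder=">"):
    """Hex dump of the 8 bytes of an IEEE 754 double (big-endian by default)."""
    return struct.pack(byteorder + "d", value).hex()

# Expected values from the test output: 1.0 on rank 0, 2.0 on rank 1.
print(double_bytes(1.0))  # 3ff0000000000000
print(double_bytes(2.0))  # 4000000000000000
```

Passing `float("...")` of an `actual=` field from the log to `double_bytes` gives the pattern that was actually read back.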

For reference, various tests also fail at build time for other less common architectures, evidently for different reasons. Build logs are collected at https://buildd.debian.org/status/package.php?p=armci-mpi
On mips64el, test_mpi_indexed_gets fails on mpich, all tests pass with openmpi. On mipsel tests pass with mpich but fail with openmpi.

CI runtime (installation) test logs are collected at https://ci.debian.net/packages/a/armci-mpi/ (the version building with mpich is 0.3.1~beta-5 or later), showing the same test failure on s390x.

@jeffhammond
Member

I have an idea of the problem. If MPICH fails and Open-MPI succeeds, then I suspect the MPICH datatypes code is broken.

Can you set the MPICH build to also use ARMCI_STRIDED_METHOD=IOV and ARMCI_IOV_METHOD=BATCHED on the s390x config?
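
A minimal sketch of how such a run could be scripted, assuming the ARMCI_* settings are picked up from the environment when the tests run and that the test suite is started with `make check` from the build directory. The helpers `armci_env` and `run_check` are hypothetical names, not part of armci-mpi.

```python
import os
import subprocess

def armci_env(**settings):
    """Copy the current environment and add the given ARMCI_* settings."""
    env = dict(os.environ)
    env.update({name: str(value) for name, value in settings.items()})
    return env

def run_check(build_dir, **settings):
    """Run `make check` in build_dir with the extra settings exported."""
    return subprocess.run(
        ["make", "check"],
        cwd=build_dir,
        env=armci_env(**settings),
        check=False,  # test failures are expected here; inspect returncode instead
    )
```

For example, `run_check("build-mpich", ARMCI_STRIDED_METHOD="IOV", ARMCI_IOV_METHOD="BATCHED")` would cover the combination asked about here.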

@drew-parsons
Author

With ARMCI_STRIDED_METHOD=IOV and ARMCI_IOV_METHOD=BATCHED, the five mpi tests still fail with the same error message (including test_mpi_indexed_gets reporting the different symptom), but the other 11 tests pass:

/usr/bin/make  check-TESTS
make[3]: Entering directory '/home/dparsons/armci/armci-mpi-0.3.1~beta/build-mpich'
make[4]: Entering directory '/home/dparsons/armci/armci-mpi-0.3.1~beta/build-mpich'
PASS: benchmarks/ping-pong
PASS: benchmarks/ring-flood
PASS: benchmarks/contiguous-bench
PASS: benchmarks/strided-bench
PASS: benchmarks/rmw_perf
PASS: tests/test_onesided
PASS: tests/test_onesided_shared
PASS: tests/test_onesided_shared_dla
PASS: tests/test_mutex
PASS: tests/test_mutex_rmw
PASS: tests/test_mutex_trylock
PASS: tests/test_malloc_irreg
PASS: tests/ARMCI_PutS_latency
PASS: tests/ARMCI_AccS_latency
PASS: tests/test_groups
PASS: tests/test_group_split
PASS: tests/test_malloc_group
PASS: tests/test_accs
PASS: tests/test_accs_dla
PASS: tests/test_puts
PASS: tests/test_puts_gets
PASS: tests/test_puts_gets_dla
PASS: tests/test_putv
PASS: tests/test_igop
PASS: tests/test_rmw_fadd
PASS: tests/test_parmci
PASS: tests/mpi/test_mpi_accs
FAIL: tests/mpi/test_mpi_dim
FAIL: tests/mpi/test_mpi_indexed_accs
FAIL: tests/mpi/test_mpi_indexed_gets
FAIL: tests/mpi/test_mpi_indexed_puts_gets
FAIL: tests/mpi/test_mpi_subarray_accs
PASS: tests/mpi/test_win_create
PASS: tests/mpi/test_win_model
PASS: tests/ctree/ctree_test
PASS: tests/ctree/ctree_test_rand
PASS: tests/ctree/ctree_test_rand_interval
PASS: tests/contrib/armci-perf
PASS: tests/contrib/armci-test
PASS: tests/contrib/lu/lu-block
PASS: tests/contrib/lu/lu-b-bc
PASS: tests/contrib/transp1D/transp1D-c
PASS: tests/contrib/non-blocking/simple
============================================================================
Testsuite summary for armci 0.1
============================================================================
# TOTAL: 43
# PASS:  38
# SKIP:  0
# XFAIL: 0
# FAIL:  5
# XPASS: 0
# ERROR: 0

There's a small variation in the PMPI function triggering the error. test_mpi_dim references PMPI_Accumulate:

FAIL: tests/mpi/test_mpi_dim
============================

MPI test program (2 processes)

Testing strided gets and puts
(Only std output for process 0 is printed)

--------array[5]--------
local[1:3] -> remote[0:2] -> local[1:3] 
Assertion failed in file src/mpi/datatype/typerep/dataloop/looputil.c at line 815: *lengthp > 0
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x2b3d76) [0x3ff7e2b3d76]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1fc89e) [0x3ff7e1fc89e]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1c6774) [0x3ff7e1c6774]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1cce1c) [0x3ff7e1cce1c]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x256b2e) [0x3ff7e256b2e]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x2598e6) [0x3ff7e2598e6]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x25be40) [0x3ff7e25be40]
/usr/lib/s390x-linux-gnu/libmpich.so.12(PMPI_Accumulate+0xa94) [0x3ff7e0f9044]
./tests/mpi/test_mpi_dim(+0x2980) [0x2aa1bf02980]
./tests/mpi/test_mpi_dim(main+0x6a) [0x2aa1bf0123a]
/lib/s390x-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x3ff7de24c5e]
./tests/mpi/test_mpi_dim(+0x1314) [0x2aa1bf01314]
internal ABORT - process 0
FAIL tests/mpi/test_mpi_dim (exit status: 1)

while the other 3 (apart from test_mpi_indexed_gets) reference PMPI_Win_unlock, e.g.

FAIL: tests/mpi/test_mpi_indexed_accs
=====================================

MPI RMA Strided Accumulate Test:
Assertion failed in file src/mpi/datatype/typerep/dataloop/looputil.c at line 815: *lengthp > 0
Assertion failed in file src/mpi/datatype/typerep/dataloop/looputil.c at line 815: *lengthp > 0
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x2b3d76) [0x3ff870b3d76]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1fc89e) [0x3ff86ffc89e]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1c6774) [0x3ff86fc6774]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1cce1c) [0x3ff86fcce1c]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x24dfde) [0x3ff8704dfde]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x270a40) [0x3ff87070a40]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x29125c) [0x3ff8709125c]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x24fd46) [0x3ff8704fd46]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x251b20) [0x3ff87051b20]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x25577a) [0x3ff8705577a]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x255ab6) [0x3ff87055ab6]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x238822) [0x3ff87038822]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x28c87e) [0x3ff8708c87e]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x2539e2) [0x3ff870539e2]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x26237c) [0x3ff8706237c]
/usr/lib/s390x-linux-gnu/libmpich.so.12(PMPI_Win_unlock+0x310) [0x3ff86f0f1c0]
./tests/mpi/test_mpi_indexed_accs(main+0x21e) [0x2aa0d180fa6]
/lib/s390x-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x3ff86c24c5e]
./tests/mpi/test_mpi_indexed_accs(+0x1314) [0x2aa0d181314]
internal ABORT - process 0
FAIL tests/mpi/test_mpi_indexed_accs (exit status: 1)

(likewise test_mpi_indexed_puts_gets and test_mpi_subarray_accs)
In the original build log, test_mpi_indexed_accs referenced PMPI_Accumulate, not PMPI_Win_unlock, though the other two already referenced PMPI_Win_unlock.

@drew-parsons
Author

drew-parsons commented Feb 28, 2022

Actually, I need to report that it might not be so straightforward. When I manually rebuild the original configuration on an s390x porterbox, without adding ARMCI_STRIDED_METHOD=IOV and ARMCI_IOV_METHOD=BATCHED, I get the same result: the five test_mpi_* tests fail for mpich and the other tests pass. Between the original build test errors and today's tests, our mpich was upgraded from 4.0 to 4.0.1, which might explain why the other tests now pass.

Without the extra flags, the assertion in test_mpi_indexed_accs is triggered from PMPI_Accumulate, as before, not from PMPI_Win_unlock:

FAIL: tests/mpi/test_mpi_indexed_accs
=====================================

MPI RMA Strided Accumulate Test:
Assertion failed in file src/mpi/datatype/typerep/dataloop/looputil.c at line 815: *lengthp > 0
Assertion failed in file src/mpi/datatype/typerep/dataloop/looputil.c at line 815: *lengthp > 0
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x2b3d76) [0x3ffbbbb3d76]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x2b3d76) [0x3ff8b133d76]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1fc89e) [0x3ff8b07c89e]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1c6774) [0x3ff8b046774]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1cce1c) [0x3ff8b04ce1c]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x24dfde) [0x3ff8b0cdfde]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x270a40) [0x3ff8b0f0a40]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x29125c) [0x3ff8b11125c]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x24fd46) [0x3ff8b0cfd46]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x251b20) [0x3ff8b0d1b20]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x25577a) [0x3ff8b0d577a]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x255ab6) [0x3ff8b0d5ab6]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x238822) [0x3ff8b0b8822]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x28c87e) [0x3ff8b10c87e]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x2539e2) [0x3ff8b0d39e2]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x25942e) [0x3ff8b0d942e]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x25be40) [0x3ff8b0dbe40]
/usr/lib/s390x-linux-gnu/libmpich.so.12(PMPI_Accumulate+0xa94) [0x3ff8af79044]
./tests/mpi/test_mpi_indexed_accs(main+0x20e) [0x2aa25d80f96]
/lib/s390x-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x3ff8aca4c5e]
./tests/mpi/test_mpi_indexed_accs(+0x1314) [0x2aa25d81314]
internal ABORT - process 0
FAIL tests/mpi/test_mpi_indexed_accs (exit status: 1)

@jeffhammond
Member

Can you try again with ARMCI_IOV_METHOD=CONSRV, ARMCI_IOV_CHECKS=1, ARMCI_SHR_BUF_METHOD=COPY, ARMCI_RMA_NOCHECK=0, and ARMCI_NO_FLUSH_LOCAL=1? Those are the most conservative settings I can come up with, and might reveal something.

@drew-parsons
Author

drew-parsons commented Mar 4, 2022

Hmm, with those settings (without ARMCI_STRIDED_METHOD=IOV) I'm back to 15 failures:

FAIL: benchmarks/strided-bench
FAIL: tests/ARMCI_PutS_latency
FAIL: tests/ARMCI_AccS_latency
FAIL: tests/test_accs
FAIL: tests/test_accs_dla
FAIL: tests/test_puts
FAIL: tests/test_puts_gets
FAIL: tests/test_puts_gets_dla
FAIL: tests/mpi/test_mpi_dim
FAIL: tests/mpi/test_mpi_indexed_accs
FAIL: tests/mpi/test_mpi_indexed_gets
FAIL: tests/mpi/test_mpi_indexed_puts_gets
FAIL: tests/mpi/test_mpi_subarray_accs
FAIL: tests/contrib/armci-perf
FAIL: tests/contrib/armci-test

with a touch more error output, which just adds a short description of the test:

FAIL: benchmarks/strided-bench
==============================

Starting one-sided strided performance test with 2 processes
   Trg. Rank    Xdim Ydim   Get (usec)   Put (usec)   Acc (usec)  Get (MiB/s)  Put (MiB/s)  Acc (MiB/s)
Assertion failed in file src/mpi/datatype/typerep/dataloop/looputil.c at line 815: *lengthp > 0
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x2b3d76) [0x3ff83333d76]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1fc89e) [0x3ff8327c89e]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1c6774) [0x3ff83246774]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1cce1c) [0x3ff8324ce1c]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x256b2e) [0x3ff832d6b2e]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x2598e6) [0x3ff832d98e6]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x25be40) [0x3ff832dbe40]
/usr/lib/s390x-linux-gnu/libmpich.so.12(PMPI_Accumulate+0xa94) [0x3ff83179044]
./benchmarks/strided-bench(+0x43ee) [0x2aa37e843ee]
./benchmarks/strided-bench(+0x5828) [0x2aa37e85828]
./benchmarks/strided-bench(main+0x2ea) [0x2aa37e82f32]
/lib/s390x-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x3ff82e24c5e]
./benchmarks/strided-bench(+0x31f4) [0x2aa37e831f4]
internal ABORT - process 0
FAIL benchmarks/strided-bench (exit status: 1)

FAIL: tests/ARMCI_PutS_latency
==============================

ARMCI_PutS Latency - local and remote completions - in usec 
  Dimensions(array of doubles) Latency-LocalCompeltion Latency-RemoteCompletion
Assertion failed in file src/mpi/datatype/typerep/dataloop/looputil.c at line 815: *lengthp > 0
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x2b3d76) [0x3ffb38b3d76]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1fc89e) [0x3ffb37fc89e]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1c6774) [0x3ffb37c6774]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x1cce1c) [0x3ffb37cce1c]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x256b2e) [0x3ffb3856b2e]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x2598e6) [0x3ffb38598e6]
/usr/lib/s390x-linux-gnu/libmpich.so.12(+0x25be40) [0x3ffb385be40]
/usr/lib/s390x-linux-gnu/libmpich.so.12(PMPI_Accumulate+0xa94) [0x3ffb36f9044]
./tests/ARMCI_PutS_latency(+0x45be) [0x2aa1e3045be]
./tests/ARMCI_PutS_latency(+0x59f8) [0x2aa1e3059f8]
./tests/ARMCI_PutS_latency(main+0x1ae) [0x2aa1e302e96]
/lib/s390x-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x3ffb33a4c5e]
./tests/ARMCI_PutS_latency(+0x33c4) [0x2aa1e3033c4]
internal ABORT - process 0
FAIL tests/ARMCI_PutS_latency (exit status: 1)

@drew-parsons
Author

If I activate ARMCI_STRIDED_METHOD=IOV alongside ARMCI_IOV_METHOD=CONSRV, ARMCI_IOV_CHECKS=1, ARMCI_SHR_BUF_METHOD=COPY, ARMCI_RMA_NOCHECK=0, and ARMCI_NO_FLUSH_LOCAL=1 then I'm back to the 5 failures.
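
To make the effect of ARMCI_STRIDED_METHOD=IOV explicit, the two failure lists reported in this thread can be compared as sets. This is a sketch that assumes the "5 failures" are the same five tests/mpi tests listed in the earlier IOV run; the variable names are illustrative.

```python
# Failures reported with ARMCI_STRIDED_METHOD=IOV (the five tests under tests/mpi).
with_iov = {
    "tests/mpi/test_mpi_dim",
    "tests/mpi/test_mpi_indexed_accs",
    "tests/mpi/test_mpi_indexed_gets",
    "tests/mpi/test_mpi_indexed_puts_gets",
    "tests/mpi/test_mpi_subarray_accs",
}

# Failures reported with the conservative settings but without STRIDED_METHOD=IOV.
without_iov = with_iov | {
    "benchmarks/strided-bench",
    "tests/ARMCI_PutS_latency",
    "tests/ARMCI_AccS_latency",
    "tests/test_accs",
    "tests/test_accs_dla",
    "tests/test_puts",
    "tests/test_puts_gets",
    "tests/test_puts_gets_dla",
    "tests/contrib/armci-perf",
    "tests/contrib/armci-test",
}

# Tests whose result depends on the strided method.
fixed_by_iov = sorted(without_iov - with_iov)
print(len(fixed_by_iov))
for name in fixed_by_iov:
    print(name)
```

The ten tests in the difference all go through ARMCI, while the five that fail either way live under tests/mpi.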
