You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
While trying to run some of the MPI tests under a test instance, I noticed that MPI bootstrap with mvapich2-2.3.7-intel hangs when there are multiple tasks per node:
$ flux run -N4 -n4 t/mpi/hello
f5Cm644CF: completed MPI_Init in 1.110s. There are 4 tasks
f5Cm644CF: completed first barrier in 0.028s
f5Cm644CF: completed MPI_Finalize in 0.041s
$ flux run -N4 -n8 t/mpi/hello
[hangs]
This is with v0.60.0. This reproduces on current master, but instead of a hang, a task get a Bus Error:
$ flux run -N4 -n8 t/mpi/hello
[corona212:mpi_rank_2][error_sighandler] Caught error: Bus error (signal 7)
The bus error occurs in MPIDI_CH3I_CM_SHMEM_Sync (). There are no details in the backtrace since debug symbols are not available.
The text was updated successfully, but these errors were encountered:
Does the gitlab CI run in a multiple brokers per node configuration? If not, I guess would could add that because having that working does aid in testing I suppose.
While trying to run some of the MPI tests under a test instance, I noticed that MPI bootstrap with mvapich2-2.3.7-intel hangs when there are multiple tasks per node:
This is with v0.60.0. This reproduces on current master, but instead of a hang, a task get a Bus Error:
The bus error occurs in
MPIDI_CH3I_CM_SHMEM_Sync ()
. There are no details in the backtrace since debug symbols are not available.The text was updated successfully, but these errors were encountered: