I ran into an issue with the ssh launcher when MPICH is configured to use CUDA on Ubuntu 22. hydra_pmi_proxy fails at load time because the dynamic linker can't find libcudart.so:
user@frogfish:~$ mpirun -n 2 -hosts frogfish,kingfish -launcher ssh echo test
test
/usr/local/bin/hydra_pmi_proxy: error while loading shared libraries: libcudart.so.12: cannot open shared object file: No such file or directory
[mpiexec@frogfish] ui_cmd_cb (mpiexec/pmiserv_pmci.c:51): Launch proxy failed.
[mpiexec@frogfish] HYDT_dmxu_poll_wait_for_event (lib/tools/demux/demux_poll.c:76): callback returned error status
[mpiexec@frogfish] HYD_pmci_wait_for_completion (mpiexec/pmiserv_pmci.c:173): error waiting for event
[mpiexec@frogfish] main (mpiexec/mpiexec.c:260): process manager error waiting for completion
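One quick way to confirm which libraries the remote proxy can't load is to run ldd on the remote host over ssh. `ssh HOST CMD` runs CMD in the same kind of non-interactive shell the launcher uses, so ldd sees the same LD_LIBRARY_PATH the proxy would. The sketch below wraps that check in two shell functions. The function names are hypothetical, and the binary path is the one from the log above.

```shell
#!/usr/bin/env bash
# Sketch: list shared libraries the dynamic loader cannot resolve for a binary.
# missing_libs and remote_missing_libs are hypothetical helper names.

# Print each dependency that ldd reports as "=> not found" on this machine.
missing_libs() {
  ldd "$1" 2>/dev/null | awk '/=> not found/ {print $1}'
}

# Same check on a remote host. "ssh HOST CMD" runs CMD in a non-interactive
# shell, which is how the ssh launcher starts hydra_pmi_proxy, so ldd sees the
# LD_LIBRARY_PATH the proxy would see.
remote_missing_libs() {
  local host=$1 bin=$2
  ssh "$host" "ldd '$bin' 2>/dev/null | awk '/=> not found/ {print \$1}'"
}
```

For example, `remote_missing_libs kingfish /usr/local/bin/hydra_pmi_proxy` would be expected to print libcudart.so.12 on a host that shows the error above.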
Workaround:
Because the failure happens when hydra_pmi_proxy itself is loaded, before the proxy can set up any environment, the intuitive places to set LD_LIBRARY_PATH don't work.
Ubuntu 22's default ~/.bashrc returns early when bash runs non-interactively, and the ssh launcher spawns hydra_pmi_proxy through a non-interactive shell, so any export placed after that check never takes effect.
My workaround was to add the LD_LIBRARY_PATH export at the top of ~/.bashrc, before the interactive/non-interactive mode check.
Inside ~/.bashrc:
# make sure cuda libs are exported, even in non-interactive mode.
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# If not running interactively, don't do anything
case $- in
*i*) ;;
*) return;;
esac
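To see why the export has to come before the check, the sketch below writes a throwaway file containing the same guard as Ubuntu's default ~/.bashrc and sources it from `bash -c`. That shell is non-interactive, so its `$-` contains no `i`. The variable names BEFORE_GUARD and AFTER_GUARD are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: show that a ~/.bashrc-style guard stops a non-interactive shell
# before any later export runs. BEFORE_GUARD and AFTER_GUARD are illustrative.
demo_guard() {
  local rc
  rc=$(mktemp)
  cat > "$rc" <<'EOF'
export BEFORE_GUARD=yes
# If not running interactively, don't do anything
case $- in
    *i*) ;;
      *) return;;
esac
export AFTER_GUARD=yes
EOF
  # bash -c is non-interactive, like the shell ssh runs on the remote host;
  # "return" inside the sourced file stops sourcing at the guard.
  bash -c '. "$1"; echo "before=${BEFORE_GUARD:-unset} after=${AFTER_GUARD:-unset}"' _ "$rc"
  rm -f "$rc"
}

demo_guard
```

Running it prints `before=yes after=unset`. Only the export placed above the guard survives, which matches the placement used in the workaround.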
fwiw, MPICH+CUDA works out of the box on Rocky 9; this issue is Ubuntu-specific.
I doubt this is a good long-term fix. It would probably be better if hydra_pmi_proxy had the directory containing libcudart.so in its RPATH, or used some other form of linker magic.
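You can check with readelf whether the proxy already carries a run-time search path. The sketch below is a minimal check and assumes binutils' readelf is installed. has_runpath_to is a hypothetical helper. The configure line in the comment is an untested assumption about how an rpath could be injected at build time. It is not a documented MPICH fix.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a binary's dynamic section has an RPATH or RUNPATH entry
# containing the given directory. has_runpath_to is a hypothetical name.
# Recent GNU toolchains (Ubuntu's included) typically emit RUNPATH rather than
# RPATH, so both tags are accepted.
has_runpath_to() {
  local bin=$1 dir=$2
  readelf -d "$bin" 2>/dev/null | grep -E '\((RPATH|RUNPATH)\)' | grep -qF "$dir"
}

# Assumed (untested) build-time approach: pass an rpath to the linker when
# configuring MPICH so the loader can find libcudart without LD_LIBRARY_PATH:
#   ./configure LDFLAGS="-Wl,-rpath,/usr/local/cuda/lib64" ...
```

Usage: `has_runpath_to /usr/local/bin/hydra_pmi_proxy /usr/local/cuda/lib64 && echo present || echo absent`.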
Thanks for the report! Hydra is getting the CUDA dependency from the embedded convenience library (MPL) that contains our GPU wrappers used in MPICH. Since none of that GPU code is actually used in Hydra, another solution might be to link Hydra against an MPL built without any GPU dependencies. Alternatively, we could move MPL's GPU support to dlopen/dlsym, which shouldn't get triggered by the Hydra processes.
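You can see the load-time dependency described here by listing a binary's direct DT_NEEDED entries. The loader must find every library named there before main() runs. A library opened later with dlopen does not appear in that list. The sketch assumes readelf is available. needed_libs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the direct shared-library dependencies (DT_NEEDED entries) of
# an ELF binary. needed_libs is a hypothetical name. Libraries loaded at run
# time via dlopen are not listed, so a dlopen/dlsym-based GPU layer would be
# expected to drop libcudart from this list for hydra_pmi_proxy.
needed_libs() {
  # readelf -d prints lines like:
  #   0x0000000000000001 (NEEDED)  Shared library: [libc.so.6]
  # sed keeps only the name between the square brackets.
  readelf -d "$1" 2>/dev/null | sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}
```

For example, `needed_libs /usr/local/bin/hydra_pmi_proxy | grep cudart` shows whether libcudart is a direct load-time requirement of the proxy.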