Internal hwloc shipped with OpenMPI 4.1.7 no longer compatible with SLURM 23.11 cgroup plugin / system hwloc #12470

Open
NicoMittenzwey opened this issue Apr 16, 2024 · 3 comments

Comments

NicoMittenzwey commented Apr 16, 2024

System

AlmaLinux 9.3
Open MPI 4.1.7 from NVIDIA HPC-X 2.18.0
NVIDIA InfiniBand NDR
Slurm 23.11

Issue

We are running Slurm 23.11 on AlmaLinux 9.3 with TaskPlugin=task/affinity,task/cgroup and Open MPI 4.1.7 from Mellanox / NVIDIA HPC-X 2.18.0. When starting jobs with fewer than the maximum number of processes per node and NOT defining --ntasks-per-node, Open MPI 4.1.7 crashes because it tries to bind processes to cores that are not available to it:

Open MPI tried to bind a new process, but something went wrong.  The
process was killed without launching the target application.  Your job
will now abort.

  Local host:        gpu004
  Application name:  ./hpcx
  Error message:     hwloc_set_cpubind returned "Error" for bitmap "2,114"
  Location:          rtc_hwloc.c:382
--------------------------------------------------------------------------
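
For anyone debugging a similar setup, one way to cross-check the problem is to print which cores the Slurm task cgroup actually grants to a job step and compare that with the bitmap Open MPI tried to bind to (the srun invocation and the output shown are only illustrative, not taken from the system above):

srun --ntasks=1 grep Cpus_allowed_list /proc/self/status
# run inside the affected allocation; prints e.g. "Cpus_allowed_list: 0-15,112-127"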

Workaround

Recompiling Open MPI and forcing it to use the system hwloc resolves this issue (you might need to dnf install hwloc-devel first):

./configure [...] --with-hwloc=/usr/ && make && make install
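
One way to confirm that the rebuilt Open MPI actually picked up the external hwloc is to inspect ompi_info (the exact component names and version strings vary between releases and are shown here only as an illustration):

ompi_info | grep -i hwloc
# a build against the system hwloc lists the "external" hwloc component,
# while the bundled build reports the internal hwloc 2.0.x component instead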

bgoglin (Contributor) commented Apr 16, 2024

Might be related to cgroup v2. hwloc has supported cgroup v2 since 2.2, but OMPI 4.1 still seems to ship hwloc 2.0.
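
For anyone checking their own nodes, whether the host runs cgroup v2 and which hwloc the distribution ships can be verified with standard tools (the outputs noted below are what one would expect on AlmaLinux 9 and are only illustrative):

stat -fc %T /sys/fs/cgroup/   # "cgroup2fs" indicates cgroup v2 (unified hierarchy); "tmpfs" indicates v1
hwloc-info --version          # the distro hwloc; 2.2 or newer is needed for cgroup v2 support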

jsquyres (Member) commented:

It's unlikely that we'll update the hwloc in Open MPI v4.1.x.

Your workaround is fine (use the system hwloc). You might also want to try bumping up to Open MPI v5.0.x, which will use the system-provided hwloc by default if one is available.
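
For completeness, a minimal sketch of a v5.0.x build that explicitly requests the external hwloc rather than relying on the default detection (this assumes the --with-hwloc=external value accepted by the v5 configure script; the install prefix is illustrative):

./configure --prefix=/opt/openmpi-5.0.x --with-hwloc=external && make && make install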

NicoMittenzwey (Author) commented Apr 18, 2024

Thanks. We did in fact also install Open MPI v5.0.2 in parallel. However, some applications run significantly faster with HCOLL, and with Open MPI v5 we ran into #10718.

We also try to stick with vendor-optimized environments for support reasons, and NVIDIA HPC-X ships Open MPI 4.1 built against the internal hwloc. This issue therefore also serves as documentation of our findings, in the hope that search engines will index it and others will not have to spend hours tracking down the root cause.
