Open MPI fails with 480 processes on a single node #12489
Comments
Sounds like the file limits on that machine are too low. See https://stackoverflow.com/questions/34588/how-do-i-change-the-number-of-open-files-limit-in-linux for details on how to raise them.
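To check which open-file limit a process actually sees, a minimal sketch using Python's standard `resource` module (Unix only) might look like the following. The helper names `open_file_limits` and `describe` are made up for illustration; only `resource.getrlimit`, `RLIMIT_NOFILE`, and `RLIM_INFINITY` come from the standard library. It assumes you run it from the same shell that launches the MPI job, so that the reported values match what the ranks inherit.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) per-process limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(value):
    """Render a limit value, mapping RLIM_INFINITY to a readable string."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


soft, hard = open_file_limits()
print(f"soft limit: {describe(soft)}, hard limit: {describe(hard)}")
```

The soft limit is the one enforced on the process; an unprivileged process can raise it only up to the hard limit.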
It looks like this issue is expecting a response, but hasn't gotten one yet. If there are no responses in the next 2 weeks, we'll assume that the issue has been abandoned and will close it.
@devreal the upper limit on files is 65536. Upon further testing, the failure happens at around np = 250. 65536 = 256^2, so that tracks (the system obviously has other file handles open as well). Is it possible that Open MPI is creating a direct connection between every pair of processes on the same node? That would explain this np^2 behavior.
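The arithmetic behind that guess can be sketched in a few lines of Python. This only models the hypothesis stated above and is not a description of what Open MPI actually does: it assumes every rank holds one descriptor per peer on the node and that all of them count against a single shared limit of 65536. The function names and the `per_process_overhead` parameter are hypothetical.

```python
def full_mesh_descriptors(np_local, per_process_overhead=0):
    """Descriptors used node-wide if each rank holds one per peer rank.

    Assumption for illustration: np_local * (np_local - 1) pairwise
    descriptors, plus an optional fixed overhead per rank.
    """
    return np_local * (np_local - 1) + np_local * per_process_overhead


def max_ranks_under_limit(limit):
    """Largest rank count whose full-mesh descriptor total fits in `limit`."""
    n = 1
    while full_mesh_descriptors(n + 1) <= limit:
        n += 1
    return n


LIMIT = 65536  # the limit reported in this thread

for n in (240, 250, 256, 480, 768):
    used = full_mesh_descriptors(n)
    print(f"np={n}: {used} descriptors, fits={used <= LIMIT}")

print("largest np that fits:", max_ranks_under_limit(LIMIT))
```

Under these assumptions the model tops out at np = 256 (256 * 255 = 65280), and any per-rank overhead pushes the ceiling lower, which is consistent with failures appearing somewhere around np = 240 to 250.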
That can happen if communications use TCP, but that should not be the case by default.
to force the shared memory component.
should tell you what is going on by default.
Background information
I am testing OpenFOAM on a Power 10 server node with 768 hardware threads. If I run -np 768 (anything over about 256, really), Open MPI crashes due to the operating system being out of file handles. I have increased the number of handles to 64k, and it still runs out. Another MPI code, LAMMPS, runs out at np = 240.
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
5.0.2
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
OS distribution package
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
Please describe the system on which you are running
Details of the problem
I am running the OpenFOAM motorbike test with various mesh sizes. I expect to be able to run with MPI processes populating all the hardware threads, so -np 768. However, the program crashes with an operating system error reporting insufficient file handles. This happens on other MPI codes when the process count is well over 200.