Problems executing mpirun in parallel, it hangs and sends the message: ORTE does not know how to route a message to the specified daemon located on the indicated node
#12476
Open
jonny261 opened this issue
Apr 18, 2024
· 0 comments
Background information
What version of Open MPI are you using? 4.1.2
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
I installed it with apt-get install openmpi-bin and libopenmpi-dev.
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
Please describe the system on which you are running
Operating system/version: Ubuntu 22.04.3 LTS
Details of the problem
I have a master with IP (192.168.1.10) and 4 nodes with IPs (.20, .30, .40, .50). I configured passwordless SSH, so from the master I can log in to each node without a password. I installed pssh, and I can run commands in parallel on each node from the master. I installed NFS, created a directory, and mounted it on each node; that works. I installed Open MPI, but when I run 'mpirun -hostfile hosts ./hello_world', it hangs.
I have to press Ctrl + Z to stop it, and it then shows this message:
^Z mpirun Forwarding signal 20 to job
ORTE does not know how to route a message to the specified daemon
located on the indicated node:
my node: master-H510M-H
target node: 192.168.1.20
This is usually an internal programming error that should be
reported to the developers. In the meantime, a workaround may
be to set the MCA param routed=direct on the command line or
in your environment. We apologize for the problem.
[master-H510M-H] 3 more processes have sent help message help-errmgr-base.txt / no-path
[master-H510M-H] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
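For reference, the workaround named in the message above (MCA param routed=direct) can be set either in the environment or on the command line. The following is only a sketch: it assumes the same hosts file and ./hello_world binary from the command above are in the current directory, and it is not confirmed to resolve the hang.

```shell
#!/bin/sh
# Environment form: Open MPI reads MCA parameters from OMPI_MCA_<name> variables.
export OMPI_MCA_routed=direct
echo "OMPI_MCA_routed=$OMPI_MCA_routed"

# Run only if mpirun and the hostfile are present (both assumed from the report above).
if command -v mpirun >/dev/null 2>&1 && [ -f hosts ]; then
    # Command-line form of the same setting; equivalent to the export above.
    mpirun --mca routed direct -hostfile hosts ./hello_world
fi
```

Either form alone is enough; the command-line form applies to a single run, while the exported variable applies to every mpirun started from that shell.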
Could you help me resolve this error so that I can run the job in parallel?