Port conflicts occur when multiple pods are dispatched to the same node under hostNetwork #593

Open
Saturnoul opened this issue Sep 11, 2023 · 3 comments


@Saturnoul

Enabling hostNetwork is required to use RDMA for high-performance transmission. In that case, port conflicts occur when multiple pods are dispatched to the same node, for the following reasons:

  • sshd in every worker listens on port 22
  • ssh connects to the default port 22

Can mpi-operator handle these port conflicts under hostNetwork?

@alculquicondor
Collaborator

Why not have one pod per node when using hostNetwork?

It would be convoluted to dynamically generate a port for each pod and put that into the hostfile.

@alculquicondor
Collaborator

FWIW, we use port 2222 by default in the base image, so that you can use hostNetwork without the pod's sshd conflicting with the host's sshd:

BASE_IMAGE_SSH_PORT?=2222

@tenzen-y
Member

FYI: instead of hostNetwork, you could use the SR-IOV device plugin and Multus CNI.

This approach works well in my production environment.
