Add tolerations only to specific worker pods #539

Open
anxietymonger opened this issue Mar 24, 2023 · 10 comments

Comments

@anxietymonger

I am currently facing a scenario where I need to schedule the worker0 pod onto specific nodes. Is there any way to configure custom tolerations or labels for specific worker pods only (like worker0)? Thank you in advance for any information.

@tenzen-y
Member

No, the mpi-operator doesn't support configuring parameters for a specific worker.

@tenzen-y
Member

/kind question

@anxietymonger
Author

Thank you very much for your information. Closing the issue.

@alculquicondor
Collaborator

Why is worker 0 special, in addition to the launcher, which is already "special"?

@anxietymonger
Author

In my case, worker0 is special because rank 0 needs to access some resources that only exist on specific nodes. It is true that the launcher is already special to some extent, but as I understand it, the launcher doesn't do any computation, right? BTW, is it possible to make rank 0 run on the launcher?

@alculquicondor
Collaborator

That is correct, the launcher just coordinates.

Having the workers in their own pods has the advantage that their resources can be dedicated exclusively to the worker computations, rather than being shared with launcher tasks. Nothing prevents the launcher pod from being scheduled on the same node as other workers, but the isolation provided by separate pod namespaces gives you better control.

@alculquicondor
Collaborator

I wonder how common the specialization you mention is.

Would we need to add support for an arbitrary number of pod templates?

cc @ahg-g @danielvegamyhre

@alculquicondor
Collaborator

/reopen

@google-oss-prow

@alculquicondor: Reopened this issue.

@google-oss-prow google-oss-prow bot reopened this Mar 29, 2023

@alculquicondor
Collaborator

This was also discussed in #384
