
Workbench: Tolerations for specific Pods (GPU) #447

Open
kenchrcum opened this issue Dec 8, 2023 · 4 comments
@kenchrcum

Hi everybody, we are currently trying to deploy Workbench on our Kubernetes cluster via Helm. Everything works fine, but we have some GPU nodes that should be reserved for Workbench GPU Sessions. We have no problems starting the GPU Sessions, but we can't get the nodes "reserved" for these sessions.
We are trying to do this by tainting the nodes, but we can't get the toleration applied exclusively to the GPU sessions. After reading through the chart and other repo issues, it seems that it is only possible to set tolerations for all sessions of a Workbench server. We hoped placement constraints would help us solve this, but they don't work as expected, since they match node labels rather than taints.
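For reference, this is roughly the node-side setup we have in mind (a generic Kubernetes sketch; the node name and label are just examples):

```yaml
# Sketch of a GPU node that should be reserved for GPU sessions.
# A label (as used by placement constraints / nodeSelector) only *attracts*
# matching pods; the taint is what keeps all other pods off the node
# unless they carry a matching toleration.
apiVersion: v1
kind: Node
metadata:
  name: gpu-node-1                  # example node name
  labels:
    workbench/gpu: "true"           # example label for placement constraints
spec:
  taints:
    - key: nvidia-gpu
      value: server
      effect: NoSchedule
```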
Is there any chance to make this work? Are we just missing some documentation or is this totally out of scope?

Thanks in advance for any help or suggestion :)

@iamsarat

+1

@iamsarat

We need a way to reserve GPU nodes exclusively for GPU resource requests, and the current configuration doesn't support this.

@colearendt
Member

Thanks for reporting this! I think you are right that this is less than ideal. If you are trying to set a toleration exclusively on a GPU session, that is something that may be possible by customizing templates. Customizing templates is generally a pretty advanced feature (and can definitely be tedious / annoying across chart versions), but it should be able to get you going here!

Can you share an example of a toleration as you would expect it to be defined on the pod that is launched? I should be able to mock up some helm values that can work with that input!

@kenchrcum
Author

Sorry for the long delay and thank you for your reply.

One taint we would set on the GPU nodes is, for example, nvidia-gpu=server:NoSchedule, and we would need to set the corresponding toleration on GPU Workbench sessions only.
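For completeness, this is the toleration as we'd expect it on the GPU session pod spec (plain Kubernetes YAML; how to attach it only to GPU sessions via the chart is the part we are missing):

```yaml
# Toleration matching the taint nvidia-gpu=server:NoSchedule,
# as it would need to appear in the GPU session pod spec
tolerations:
  - key: "nvidia-gpu"
    operator: "Equal"
    value: "server"
    effect: "NoSchedule"
```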
