Fix: no overwrite when run launcher as worker #628

Merged
merged 2 commits into from Mar 4, 2024
13 changes: 7 additions & 6 deletions pkg/controller/mpi_job_controller.go
@@ -1534,12 +1534,13 @@ func (c *MPIJobController) newLauncherPodTemplate(mpiJob *kubeflow.MPIJob) corev
 	case kubeflow.MPIImplementationMPICH:
 		container.Env = append(container.Env, mpichEnvVars...)
 	}
-
-	container.Env = append(container.Env,
-		// We overwrite these environment variables so that users will not
-		// be mistakenly using GPU resources for launcher due to potential
-		// issues with scheduler/container technologies.
-		nvidiaDisableEnvVars...)
+	if !ptr.Deref(mpiJob.Spec.RunLauncherAsWorker, false) {
+		container.Env = append(container.Env,
+			// We overwrite these environment variables so that users will not
+			// be mistakenly using GPU resources for launcher due to potential
+			// issues with scheduler/container technologies.
+			nvidiaDisableEnvVars...)
+	}
 	c.setupSSHOnPod(&podTemplate.Spec, mpiJob)
 
 	// Submit a warning event if the user specifies restart policy for
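The effect of the change can be sketched in isolation. This is a minimal, dependency-free illustration rather than the controller code: `EnvVar` stands in for `corev1.EnvVar`, `deref` mimics `ptr.Deref`, `launcherEnv` is a hypothetical helper, and the contents of `nvidiaDisableEnvVars` are assumed rather than taken from this diff.

```go
package main

import "fmt"

// EnvVar is a simplified stand-in for corev1.EnvVar (assumption, keeps the
// sketch free of Kubernetes dependencies).
type EnvVar struct {
	Name  string
	Value string
}

// nvidiaDisableEnvVars: assumed contents; the diff only shows the variable name.
var nvidiaDisableEnvVars = []EnvVar{
	{Name: "NVIDIA_VISIBLE_DEVICES"},
	{Name: "NVIDIA_DRIVER_CAPABILITIES"},
}

// deref returns *p when p is non-nil and def otherwise, like ptr.Deref.
func deref[T any](p *T, def T) T {
	if p != nil {
		return *p
	}
	return def
}

// launcherEnv (hypothetical helper) copies base and appends the
// GPU-disabling variables only when the launcher is not also a worker.
func launcherEnv(base []EnvVar, runLauncherAsWorker *bool) []EnvVar {
	env := append([]EnvVar(nil), base...)
	if !deref(runLauncherAsWorker, false) {
		env = append(env, nvidiaDisableEnvVars...)
	}
	return env
}

func main() {
	asWorker := true
	base := []EnvVar{{Name: "NVIDIA_VISIBLE_DEVICES", Value: "all"}}

	// Field unset (nil): treated as false, so the overrides are appended.
	fmt.Println(len(launcherEnv(base, nil))) // 3
	// Launcher runs as a worker: the user's GPU settings are left alone.
	fmt.Println(len(launcherEnv(base, &asWorker))) // 1
}
```

A nil `RunLauncherAsWorker` falls back to `false`, so jobs that never set the field keep the previous behaviour.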