Skip to content

Kueue v0.5.0

Compare
Choose a tag to compare
@alculquicondor alculquicondor released this 25 Oct 21:39
· 540 commits to main since this release
739ebb1

Changes since v0.4.0:

Highlights

  • AdmissionChecks: a mechanism for internal or external components to influence whether a Workload can be admitted.
  • Integration with cluster-autoscaler's ProvisioningRequest via AdmissionChecks.
  • Information about pending workloads in a ClusterQueue status.
  • Metrics for resource usage of ClusterQueues and LocalQueues.
  • Policy to control whether to preempt or borrow before trying the next flavors.
  • Partial admission graduated to Beta.
  • Workload priority, independent from Pod priority.
  • New integrations:
    • All Kubeflow training APIs
    • Single plain Pods

Changes by Kind

Feature

  • A mechanism for AdmissionChecks to provide labels, annotations, tolerations and node selectors to the pod templates when starting a job (#1180, @mimowo)
  • A reference standalone controller that can be used to support plain Pods using taints and tolerations, which can be used in Kubernetes versions that don't support scheduling gates. (#1111, @nstogner)
  • Add Active condition to AdmissionChecks (#1193, @trasc)
  • Add optional cluster queue resource quota and usage metrics. (#982, @trasc)
  • Add support for AdmissionChecks, a mechanism for internal or external components to influence whether a Workload can be admitted. (#1045, @trasc)
  • Add support for single plain Pods. (#1072, @achernevskii)
  • Add support for workload Priority (#1081, @Gekko0114)
  • Add tolerations to ResourceFlavor. Kueue injects these tolerations to the jobs that are assigned to the flavor when admitted. (#1248, @trasc)
  • Added pprof endpoints for profiling (#978, @stuton)
  • Allow the admission of multiple workloads within one scheduling cycle while borrowing. (#1039, @trasc)
  • An option to synchronize batch/job.completions with parallelism in case of partial admission (#971, @trasc)
  • Expose cluster queue information about pending workloads (#1069, @stuton)
  • Expose probe configurations to helm chart (#986, @yyzxw)
  • Graduate Partial admission to Beta. (#1221, @trasc)
  • Integrate with Cluster Autoscaler's ProvisioningRequest via two stage admission (#1154, @trasc)
  • Manage cluster queue active state based on admission checks life cycle. (#1079, @trasc)
  • Metrics for usage and reservations in ClusterQueues and LocalQueues. (#1206, @trasc)
  • Options to allow workloads to borrow quota or preempt other workloads before trying the next flavor in the list (#849, @KunWuLuan)
  • Support kubeflow.org/mxjob (#1183, @tenzen-y)
  • Support kubeflow.org/paddlejob (#1142, @tenzen-y)
  • Support kubeflow.org/pytorchjob (#995, @tenzen-y)
  • Support kubeflow.org/tfjob (#1068, @tenzen-y)
  • Support kubeflow.org/xgboostjob (#1114, @tenzen-y)
  • Workload objects have the label kueue.x-k8s.io/job-uid where the value matches the uid of the parent job, whether that's a Job, MPIJob, RayJob, JobSet (#1032, @achernevskii)

Bug or Regression

  • Adjust resources (based on LimitRanges, PodOverhead and resource limits) on existing Workloads when a LocalQueue is created (#1197, @alculquicondor)
  • Ensure the ClusterQueue status is updated as the number of pending workloads changes. (#1135, @mimowo)
  • Fix resuming of RayJob after preempted. (#1156, @kerthcet)
  • Fixed missing create verb for webhook (#1035, @stuton)
  • Fixed scheduler to only allow one admission or preemption per cycle within a cohort that has ClusterQueues borrowing quota (#1023, @alculquicondor)
  • Helm: Enable the JobSet integration by default (#1184, @tenzen-y)
  • Improve job controller to be resilient to API failures during preemption (#1005, @alculquicondor)
  • Prevent workloads in ClusterQueue with StrictFIFO from blocking higher priority workloads in other ClusterQueues in the same cohort that require preemption (#1024, @alculquicondor)
  • Terminate Kueue when there is an internal failure during setup, so that it can be retried. (#1077, @alculquicondor)

Other (Cleanup or Flake)