Kueue v0.7.0-rc.1

Pre-release
@alculquicondor alculquicondor released this 08 May 18:29
· 60 commits to main since this release
v0.7.0-rc.1
515c225

Changes since v0.6.0:

Urgent Upgrade Notes

(No, really, you MUST read this before you upgrade)

  • Added CRD validation rules to AdmissionCheck.

    Requires Kubernetes 1.25 or newer (#1975, @IrvingMg)

  • Added CRD validation rules to ClusterQueue.

    Requires Kubernetes 1.25 or newer (#1972, @IrvingMg)

  • Added CRD validation rules to ResourceFlavor.

    Requires Kubernetes 1.25 or newer (#1958, @IrvingMg)

  • Added CRD validation rules to Workload.

    Requires Kubernetes 1.25 or newer (#2008, @IrvingMg)

  • Replaced LocalQueue admission webhook with CRD validation rules.

    Requires Kubernetes 1.25 or newer (#1938, @IrvingMg)

  • Upgrade RayJob API to v1

    If you use a KubeRay version older than v1.0.0, you must upgrade your existing installation
    to KubeRay v1.0.0 or newer, which supports the KubeRay v1 APIs, so that it remains
    compatible with Kueue. (#1802, @astefanutti)

  • Use recommended labels and a uniquely identifying selector for Kueue deployment resources.

    If you previously installed Kueue, you need to recreate the Kueue deployment,
    because the label selector field is immutable. (#1695, @astefanutti)

Changes by Kind

API Change

  • Make ClusterQueue queueingStrategy field mutable. The field can be mutated while there are pending workloads. (#1934, @mimowo)
  • Users can now pass parameters to a ProvisioningRequest using the job's annotations (#1869, @PBundyra)

Feature

  • A new condition with type Preempted makes it possible to distinguish the different reasons for a preemption (#1942, @mimowo)

  • Add MultiKueue support for JobSet spec.managedBy field. (#1870, @trasc)

  • Add configuration to register Kinds as being managed by an external Kueue-compatible controller (#2059, @dgrove-oss)

  • Add fair sharing when borrowing unused resources from other ClusterQueues in a cohort.

    Fair sharing is based on DRF for usage above nominal quotas.
    When fair sharing is enabled, Kueue prefers to admit workloads from ClusterQueues with the lowest share first.
    Administrators can enable and configure fair sharing preemption using a combination of two policies: LessThanOrEqualToFinalShare and LessThanInitialShare.

    You can define a fair sharing weight for ClusterQueues. The weight determines how much of the unused resources each ClusterQueue can take in comparison to others. (#2070, @alculquicondor)

  • Add a kubectl kueue plugin that lets users create LocalQueues without writing YAML. (#2027, @mbobrovskyi)

  • Add support for configuring ipFamilyPolicy for dual-stack (ipDualStack) Kubernetes clusters (#1933, @dongjiang1989)

  • Add support for configuring custom annotations on the Service and on the Deployment's Pod (#2030, @tozastation)

  • Added MultiKueue worker connection monitoring and reconnect. (#1806, @trasc)

  • Added label copying from Pod/Job into the Kueue Workload. (#1959, @pajakd)

  • Added scalability test for scheduling performance (#1931, @trasc)

  • Added validations for the "multiKueue.origin", "multiKueue.gcInterval" and "multiKueue.workerLostTimeout" fields in the Configuration. (#2129, @tenzen-y)

  • Adds ObservedGeneration in conditions (#1939, @vladikkuzn)

  • Improve metrics related to workload's quota reservation and admission:

    • fix admission_wait_time_seconds so that it measures the time to the "Admitted" condition since creation time or last requeue (rather than the time to the "QuotaReserved" condition, as before)
    • add quota_reserved_wait_time_seconds, which measures the time to the "QuotaReserved" condition since creation time or last eviction time
    • add quota_reserved_workloads_total, which counts the number of workloads that got admitted
    • add admission_checks_wait_time_seconds, which measures the time to admit a workload with admission checks since quota reservation
    • use longer buckets (up to 10240s) for the histogram metrics admission_wait_time_seconds, quota_reserved_wait_time_seconds and admission_checks_wait_time_seconds (#1977, @mbobrovskyi)

  • Improve pod integration performance (#1952, @gabesaba)

  • Improve the kubectl output for workloads using admission checks. (#1991, @vladikkuzn)

  • Make the PodsReady base delay for requeuing configurable (#2040, @mimowo)

  • MultiKueue: Manage worker cluster unavailability (#1681, @trasc)

  • Pods created by Kueue now have the ProvisioningRequest's classname annotation (#2052, @PBundyra)

  • Provisioning Admission Check Controller (ProvisioningACC) feature is now enabled by default (#1968, @pajakd)

  • The message for a ProvisioningRequest being provisioned (which might include an ETA, depending on the implementation) is now propagated to workloads. (#2007, @pajakd)

  • Use PATCH updates for pods. This fixes support for Pods when using the latest features in Kubernetes v1.29 (#2074, @mbobrovskyi)

  • Users can define AdmissionChecks per ResourceFlavor in the ClusterQueue API, using admissionChecksStrategy. (#1960, @PBundyra)

  • The Workload finished reason is replaced with separate succeeded and failed reasons (#2026, @vladikkuzn)
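
As a rough illustration of the fair sharing entry above (#2070), the following sketch orders ClusterQueues by a DRF-style share computed only from usage above nominal quota and scaled by a weight. It is a simplified model under stated assumptions, not Kueue's implementation: the `QueueUsage` class, the function names, the `lendable` totals and all sample numbers are hypothetical, and Kueue's actual share computation may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QueueUsage:
    """Hypothetical snapshot of one ClusterQueue (not a Kueue type)."""
    name: str
    nominal: Dict[str, float]  # nominal quota per resource
    usage: Dict[str, float]    # current usage per resource
    weight: float = 1.0        # fair sharing weight (assumed > 0)


def dominant_share(q: QueueUsage, lendable: Dict[str, float]) -> float:
    """Largest borrowed fraction across resources, divided by the weight."""
    fractions = []
    for resource, total in lendable.items():
        # Only usage above the nominal quota counts as borrowing.
        borrowed = max(0.0, q.usage.get(resource, 0.0) - q.nominal.get(resource, 0.0))
        fractions.append(borrowed / total if total > 0 else 0.0)
    return max(fractions, default=0.0) / q.weight


def admission_order(queues: List[QueueUsage], lendable: Dict[str, float]) -> List[QueueUsage]:
    """Queues with the lowest share come first."""
    return sorted(queues, key=lambda q: dominant_share(q, lendable))


# Hypothetical cohort: total resources that can be shared.
lendable = {"cpu": 100.0, "memory": 400.0}
queues = [
    QueueUsage("team-a", {"cpu": 20, "memory": 80}, {"cpu": 40, "memory": 100}),
    QueueUsage("team-b", {"cpu": 20, "memory": 80}, {"cpu": 30, "memory": 200}, weight=2.0),
    QueueUsage("team-c", {"cpu": 20, "memory": 80}, {"cpu": 10, "memory": 40}),
]

order = [q.name for q in admission_order(queues, lendable)]
print(order)  # ['team-c', 'team-b', 'team-a']
```

In this sample, team-c borrows nothing, so it is preferred first. team-b borrows a larger fraction than team-a, but its weight of 2 halves its share, which places it ahead of team-a.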

Bug or Regression

  • Avoid unnecessary preemptions when there are multiple candidates for preemption with the same admission timestamp (#1875, @alculquicondor)

  • Do not default to suspending a job whose parent is already managed by Kueue (#1846, @astefanutti)

  • Exclude Pod labels, preemptionPolicy and container images when determining whether pods in a pod group have the same shape. (#1758, @alculquicondor)

  • Fix Pods in Pod groups stuck with finalizers when deleted immediately after Succeeded (#1905, @alculquicondor)

  • Fix chart values configuration for the number of reconcilers for the Pod integration. (#2046, @alculquicondor)

  • Fix handling of eviction in StrictFIFO to ensure the evicted workload is at the head of the queue.
    Previously, in case of priority-based preemption, a lower-priority workload could get
    admitted while the higher-priority workload was being evicted. (#2061, @mimowo)

  • Fix incorrect quota management during preemption when lendingLimit is enabled (#1770, @kerthcet)

  • Fix preemption algorithm to reduce the number of preemptions within a ClusterQueue when reclamation is not possible, and when using .preemption.borrowWithinCohort (#2110, @alculquicondor)

  • Fix preemption algorithm to reduce the number of preemptions within a ClusterQueue when reclamation is not possible. (#1979, @mimowo)

  • Fix preemption to reclaim quota that is blocked by an earlier pending Workload from another ClusterQueue in the same cohort. (#1866, @alculquicondor)

  • Fix the configuration for the number of reconcilers for the Pod integration. It was only reconciling one group at a time. (#1835, @alculquicondor)

  • Fix the counter of pending workloads in cluster queue status.

    Previously, the counter did not include the head workload of a StrictFIFO queue when that workload could not be admitted.

    This change also includes the blocked workload in the metrics and the visibility API for the list of pending workloads. (#1936, @mimowo)

  • Fix the resource requests computation taking into account sidecar containers. (#2099, @IrvingMg)

  • Fix transitions of Requeued condition. (#2063, @mbobrovskyi)

  • Helm Chart: Fix a bug where Kueue did not work with cert-manager. (#2087, @EladDolev)

  • Helm Chart: Fix a bug where integrations.podOptions.namespaceSelector was not propagated. (#2086, @EladDolev)

  • Kueue visibility API is no longer installed by default. Users can install it via helm or applying the visibility-api.yaml artifact. (#1746, @trasc)

  • Make the defaults for the PodsReadyTimeout backoff more practical. With the original values,
    the first few requeues appeared immediate to users (below 10s, which is negligible
    compared to the time spent waiting for PodsReady).

    The default values for the formula that determines the exponential backoff change as follows:

    • base: 1s -> 10s
    • exponent: 1.41284738 -> 2

    The consecutive times to requeue a workload are now: 10s, 20s, 40s, ... (#2025, @mimowo)

  • Reduce the number of Workload reconciliations caused by a wrong equality check. (#1897, @gabesaba)

  • Failed pods in a pod group are finalized once replacement pods are created. (#1766, @trasc)

  • WaitForPodsReady: Fix a bug that the requeueState isn't reset. (#1838, @tenzen-y)

  • Clear RequeueAt when the workload backoff finishes. (#2143, @mbobrovskyi)
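
To make the PodsReadyTimeout backoff change above (#2025) concrete, here is a small worked example. It assumes the delay for the n-th requeue is simply base × exponent^(n-1), which matches the 10s, 20s, 40s sequence quoted in that entry. Any jitter or upper limit Kueue may apply is ignored, and the function name is hypothetical.

```python
def requeue_delays(base_seconds, exponent, attempts):
    """Delay before each consecutive requeue, assuming base * exponent**(n-1)."""
    return [base_seconds * exponent ** n for n in range(attempts)]


# New defaults: base 10s, exponent 2.
new_defaults = requeue_delays(10, 2, 4)
print(new_defaults)  # [10, 20, 40, 80]

# Old defaults: base 1s, exponent 1.41284738. The early delays stay well under 10s.
old_defaults = requeue_delays(1, 1.41284738, 4)
print([round(d, 2) for d in old_defaults])
```

Under this assumption, the old defaults need several requeues before a single delay exceeds 10s, which is why the early requeues looked immediate.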

Other (Cleanup or Flake)

  • Avoid API calls for admission attempts when Workload already has condition Admitted=false (#1820, @alculquicondor)
  • Correctly log the workload status for workloads that have quota reserved but are awaiting admission checks. (#2062, @mimowo)
  • Dropped the usage of kueue.x-k8s.io/parent-workload annotation in favor of an object ownership based approach. (#1747, @trasc)
  • JobFramework: The eviction by inactivation mechanism was moved to the workload controller. (#2131, @tenzen-y)
  • Skip requeueing of Workloads when there is a status update for a ClusterQueue, saving on API calls for Workloads that were already attempted for admission. (#1822, @alculquicondor)
  • The hash suffix of the workload's name is now influenced by the job's object UID. Recreated jobs with the same name and kind will use different workload names. (#1732, @trasc)
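
The effect of the last entry (#1732) can be sketched as follows. This only illustrates the idea of deriving a name suffix from the owner's UID. The name layout, the choice of SHA-1, the suffix length and the function name are all assumptions for the example, not Kueue's actual naming scheme.

```python
import hashlib


def workload_name(kind, job_name, job_uid, hash_len=5):
    """Hypothetical naming scheme: <kind>-<name>-<short hash of kind and UID>."""
    digest = hashlib.sha1(f"{kind}/{job_uid}".encode("utf-8")).hexdigest()
    return f"{kind.lower()}-{job_name}-{digest[:hash_len]}"


# Same kind and name, but the job was deleted and recreated, so the UID differs.
first = workload_name("Job", "train", "uid-0001")
recreated = workload_name("Job", "train", "uid-0002")

print(first != recreated)  # the suffix changes with the UID
```

Because the UID changes every time an object is recreated, a stale Workload left over from the earlier job no longer shares a name with the Workload of the new job.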