
Releases: kubernetes-sigs/kueue

Kueue v0.4.2

11 Oct 20:01
417b060

Changes since v0.4.1:

Bug or Regression

  • Adjust resources (based on LimitRanges, PodOverhead and resource limits) on existing Workloads when a LocalQueue is created (#1197, @alculquicondor)
  • Fix resuming of a RayJob after it was preempted. (#1190, @kerthcet)

Kueue v0.4.1

15 Aug 13:40
328bb66

Bug or Regression

  • Fixed missing create verb for webhook (#1053, @stuton)
  • Fixed scheduler to only allow one admission or preemption per cycle within a cohort that has ClusterQueues borrowing quota (#1029, @alculquicondor)
  • Prevent workloads in ClusterQueue with StrictFIFO from blocking higher priority workloads in other ClusterQueues in the same cohort that require preemption (#1030, @alculquicondor)

Kueue v0.4.0

07 Jul 14:41
5cc79d1

Changes since v0.3.0:

API Change

Feature

  • Add client-go libraries. (#789, @tenzen-y)
  • Add support for Kuberay's RayJobs. (#667, @trasc)
  • Add support for dynamic reclaim in the JobSet integration. (#901, @trasc)
  • Add support for partial workload admission (#771, @trasc)
  • Add support for dynamic resource reclaim. (#756, @trasc)
  • Allow the scheduler to admit more jobs when the head job has not reached the PodReady=true status. (#708, @KunWuLuan)
  • Allow specifying the manager pod and container security context instead of hardcoded values (#878, @bh-tt)
  • Introduce feature gates for alpha/experimental features in the Kueue project. (#788, @kerthcet)
  • Ignore integrations whose CRD isn't installed; otherwise, all integrations are enabled by default (#883, @stuton)
  • Integrate JobSet into kueue (#762, @mcariatm)
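
As a rough illustration of what partial workload admission (#771) means, the sketch below picks the largest pod count between a minimum and the requested count that still fits the available quota. The function name, the single CPU resource, and the linear search are assumptions made for illustration; this is not Kueue's implementation.

```python
def largest_admissible_count(requested, minimum, per_pod_cpu, available_cpu):
    """Return the largest pod count in [minimum, requested] that fits, or None.

    Hypothetical helper: models a single resource (CPU) in abstract integer units.
    """
    for count in range(requested, minimum - 1, -1):
        if count * per_pod_cpu <= available_cpu:
            return count
    # Even the minimum does not fit, so the workload is not admitted at all.
    return None


if __name__ == "__main__":
    # 10 pods requested, at least 4 required, 2 CPU each, 13 CPU free.
    print(largest_admissible_count(10, 4, 2, 13))
```

With these inputs the workload would be admitted with a reduced pod count rather than waiting for the full request to fit.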

Bug or Regression

  • Add permission to update frameworkjob status. (#797, @tenzen-y)
  • Fix a bug where update events for ClusterQueues were created endlessly. (#907, @tenzen-y)
  • Fix a bug where a child batch/job of an unmanaged parent (doesn't have queue name) was being suspended. (#835, @tenzen-y)
  • Fix panic in cluster queue if resources and coveredResources do not have the same length. (#787, @kannon92)
  • Fix: Enforce borrowed=0 if ClusterQueue doesn't belong to a cohort. (#759, @tenzen-y)
  • Fix: Potential over-admission within cohort when borrowing. (#805, @trasc)
  • Fixed preemption to prefer preempting workloads that were more recently admitted. (#843, @stuton)
  • Fixed a bug where the suspend=true added to the job/mpijob by the default webhook did not take effect. (#758, @fjding)
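
The preemption fix (#843) can be pictured as an ordering over preemption candidates. The sketch below is a simplified model: the `Candidate` type is hypothetical, and sorting by lowest priority first is an assumption added here; only the "more recently admitted first" tie-break comes from the note above.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical stand-in for an admitted workload.
    name: str
    priority: int
    admitted_at: float  # admission time, in seconds since the epoch


def preemption_order(candidates):
    """Order candidates so that the first entries are preempted first.

    Assumption: lower priority goes first. Among equal priorities, the most
    recently admitted workload goes first, so long-running work is kept.
    """
    return sorted(candidates, key=lambda c: (c.priority, -c.admitted_at))


if __name__ == "__main__":
    pool = [
        Candidate("a", priority=1, admitted_at=100.0),
        Candidate("b", priority=1, admitted_at=200.0),
        Candidate("c", priority=0, admitted_at=50.0),
    ]
    print([c.name for c in preemption_order(pool)])
```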

Other (Cleanup or Flake)

  • Add validation for child jobs without ownerReference. (#865, @tenzen-y)

Kueue v0.3.2

13 Jun 14:51
ff63c63

Changes since v0.3.1:

Bug or Regression

  • Add permission to update frameworkjob status. (#798, @tenzen-y)
  • Fix a bug where a child batch/job of an unmanaged parent (doesn't have queue name) was being suspended. (#839, @tenzen-y)
  • Fix panic in cluster queue if resources and coveredResources do not have the same length. (#799, @kannon92)
  • Fix: Potential over-admission within cohort when borrowing. (#822, @trasc)
  • Fixed preemption to prefer preempting workloads that were more recently admitted. (#845, @stuton)

Kueue v0.3.1

16 May 18:55
50f628a

Changes since v0.3.0:

Bug fixes

  • Fix a bug where the validation webhook didn't validate the queue name set as a label when creating an MPIJob. #711
  • Fix a bug where the queue name in workloads was updated with an empty value when using framework jobs that use batch/job internally, such as MPIJob. #713
  • Fix a bug in which borrowed values are set to a non-zero value even though the ClusterQueue doesn't belong to a cohort. #761
  • Fixed adding suspend=true to job/mpijob by the default webhook. #765

Kueue v0.3.0

06 Apr 21:07
0e5db01

Changes since v0.2.1:

Features

  • Support for kubeflow's MPIJob (v2beta1)
  • Upgrade the config.kueue.x-k8s.io API version from v1alpha1 to v1beta1. v1alpha1 is no longer supported.
    v1beta1 includes the following changes:
    • Add namespace to propagate the namespace where kueue is deployed to the webhook certificate.
    • Add internalCertManagement with fields enable, webhookServiceName and webhookSecretName.
    • Remove enableInternalCertManagement. Use internalCertManagement.enable instead.
  • Upgrade the kueue.x-k8s.io API version from v1alpha2 to v1beta1.
    v1alpha2 is no longer supported.
    v1beta1 includes the following changes:
    • ClusterQueue:
      • Immutability of spec.queueingStrategy.
      • Refactor quota.min and quota.max into nominalQuota and borrowingLimit.
      • Swap hierarchy between resources and flavors.
      • Group flavors and resources into spec.resourceGroups to make
        co-dependent resources explicit.
      • Move admission from spec to status.
      • Add conditions field to status.
    • LocalQueue:
      • Add admitted field in status.
      • Add conditions field to status.
    • Workload:
      • Add metadata to podSet templates.
      • Move admission into status.
    • ResourceFlavor:
      • Introduce spec to hold all fields.
      • Rename labels to nodeLabels.
      • Rename taints to nodeTaints.
  • Reduce API calls by setting .status.admission and updating the Admitted condition in the same API call.
  • Obtain queue names from label kueue.x-k8s.io/queue-name. The annotation with
    the same name is still supported, but it's now deprecated.
  • Multiplatform support for linux/amd64 and linux/arm64.
  • Validating webhook for batch/v1.Job validates kueue-specific labels and
    annotations.
  • Sequential admission of jobs: https://kueue.sigs.k8s.io/docs/tasks/setup_sequential_admission/
  • Preemption within ClusterQueue and cohort: https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#preemption
  • Support for LimitRanges when calculating jobs usage.
  • Library for integrating job-like CRDs (controller and webhooks): https://sigs.k8s.io/kueue/pkg/controller/jobframework
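
To make the ClusterQueue quota changes in v1beta1 more concrete, here is a hedged sketch of how a v1alpha2-style resource list might be reshaped into `spec.resourceGroups`. It operates on plain dictionaries with integer quantities, and it rests on two assumptions that are not stated in these notes: that `quota.max` was an upper bound that included borrowed quota (so `borrowingLimit = max - min`), and that resources sharing the same flavor list belong in one group. The function name is hypothetical; check the API reference before migrating real manifests.

```python
def convert_resources(old_resources):
    """Reshape resources -> flavors (v1alpha2 style) into resourceGroups (v1beta1 style).

    Illustration only: quantities are plain integers, not Kubernetes quantities.
    """
    groups = {}  # keyed by the tuple of flavor names a resource uses
    for res in old_resources:
        key = tuple(f["name"] for f in res["flavors"])
        group = groups.setdefault(
            key,
            {"coveredResources": [], "flavors": [{"name": n, "resources": []} for n in key]},
        )
        group["coveredResources"].append(res["name"])
        for flavor_out, flavor_in in zip(group["flavors"], res["flavors"]):
            quota = flavor_in["quota"]
            entry = {"name": res["name"], "nominalQuota": quota["min"]}
            if "max" in quota:
                # Assumption: max counted nominal plus borrowed quota.
                entry["borrowingLimit"] = quota["max"] - quota["min"]
            flavor_out["resources"].append(entry)
    return list(groups.values())


if __name__ == "__main__":
    old = [
        {"name": "cpu", "flavors": [{"name": "default", "quota": {"min": 9, "max": 12}}]},
        {"name": "memory", "flavors": [{"name": "default", "quota": {"min": 36}}]},
    ]
    print(convert_resources(old))
```

Grouping by flavor list is what makes co-dependent resources explicit: `cpu` and `memory` end up under the same flavor entry instead of being listed independently.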

Production Readiness

Bug fixes

  • Fix job controller ClusterRole for clusters that enable OwnerReferencesPermissionEnforcement admission control validation #392
  • Fix race condition when admission attempt and requeuing happen at the same time #427
  • Atomically release quota and requeue previously inadmissible workloads #512
  • Fix support for leader election #580
  • Fix support for RuntimeClass when calculating jobs usage #565

Acknowledgments

Thanks to our contributors in this release, in no particular order:
@tenzen-y @mcariatm @moficodes @mwielgus @trasc @mimowo @alculquicondor @fjding @kerthcet @ArangoGutierrez @Fish-pro @rbarberop @cortespao @rptaylor @kannon92 @noryev @oginskis @charlieyu1996 @kincl @ahg-g

Kueue v0.2.1

25 Aug 23:43

Changes since v0.1.0:

Features

  • Upgrade the API version from v1alpha1 to v1alpha2. v1alpha1 is no longer supported.
    v1alpha2 includes the following changes:
    • Rename Queue to LocalQueue.
    • Remove ResourceFlavor.labels. Use ResourceFlavor.metadata.labels instead.
  • Add webhooks to validate and to add defaults to all kueue APIs.
  • Add internal cert manager to serve webhooks with TLS.
  • Use finalizers to prevent ClusterQueues and ResourceFlavors in use from being
    deleted prematurely.
  • Support codependent resources by assigning the same flavor to codependent resources in a pod set.
  • Support pod overhead in Workload pod sets.
  • Set requests to limits if requests are not set in a Workload pod set,
    matching internal defaulting for k8s Pods.
  • Add Prometheus metrics to monitor the health of the system and the status of ClusterQueues.
  • Use Server Side Apply for Workload admission to reduce API conflicts.
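
The pod overhead and requests-from-limits items above both affect how much quota a pod set is counted against. A minimal sketch of that arithmetic follows, assuming per-resource defaulting (a missing request falls back to the limit for that resource) and abstract integer units; the function and field layout are illustrative, not Kueue's code.

```python
def effective_requests(containers, overhead=None):
    """Sum per-pod requests, defaulting missing requests to limits, plus overhead.

    containers: list of dicts with optional "requests" and "limits" mappings.
    overhead:   optional mapping of resource name to pod overhead.
    """
    totals = {}
    for container in containers:
        # Start from limits, then let explicit requests override them.
        requests = dict(container.get("limits", {}))
        requests.update(container.get("requests", {}))
        for name, value in requests.items():
            totals[name] = totals.get(name, 0) + value
    for name, value in (overhead or {}).items():
        totals[name] = totals.get(name, 0) + value
    return totals


if __name__ == "__main__":
    pod = [
        {"limits": {"cpu": 2, "memory": 4}},            # no requests: limits are used
        {"requests": {"cpu": 1}, "limits": {"cpu": 3}},  # explicit request wins
    ]
    print(effective_requests(pod, overhead={"cpu": 1}))
```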

Bug fixes

  • Fix bug that caused Workloads that don't match the ClusterQueue's
    namespaceSelector to block other Workloads in StrictFIFO ClusterQueues.
  • Fix the number of pending workloads in BestEffortFIFO ClusterQueues status.
  • Fix a bug in BestEffortFIFO ClusterQueues where a workload might not be
    retried after a transient error.
  • Fix requeuing of an out-of-date workload when admitting it failed.
  • Fix a bug in BestEffortFIFO ClusterQueues where inadmissible workloads
    were not removed from the ClusterQueue when removing the corresponding Queue.

Thanks to all our contributors!

In no particular order: @ahg-g @alculquicondor @ArangoGutierrez @cmssczy @denkensk @kerthcet @knight42 @cortespao @shuheiktgw @thisisprasad

Full Changelog: v0.1.0...v0.2.1

Kueue v0.2.0

25 Aug 22:49
Pre-release

Do not use. The published container image doesn't match the release.

Kueue v0.1.1

13 Jun 21:07
1be9e66

Changes since v0.1.0:

  • Fixed number of pending workloads in a BestEffortFIFO ClusterQueue.
  • Fixed bug in a BestEffortFIFO ClusterQueue where a workload might not be
    retried after a transient error.
  • Fixed requeuing of an out-of-date workload when admitting it failed.
  • Fixed bug in a BestEffortFIFO ClusterQueue where inadmissible workloads
    were not removed from the ClusterQueue when removing the corresponding Queue.

Kueue v0.1.0

12 Apr 19:48
9b79226

First release of Kueue, a Kubernetes native set of APIs and controllers for job queueing.

The release includes:

  • The API group kueue.x-k8s.io/v1alpha1 that includes the ClusterQueue, Queue, ResourceFlavor, and Workload APIs.
  • A set of controllers that supports quota-based job queuing, with:
    • Resource sharing: you can define unused resources that can be borrowed by other tenants.
    • Resource flavors and fungibility: you can define multiple flavors or variants of a resource. Jobs are assigned to flavors that are still available.
    • Two queueing strategies: StrictFIFO and BestEffortFIFO.
  • Support for the Kubernetes batch/v1.Job API.
  • The Workload API abstraction allows you to integrate a third-party job API with Kueue.
  • Documentation available at https://sigs.k8s.io/kueue/docs
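
The difference between the two queueing strategies listed above can be sketched with a toy admission loop. This is a simplified model under stated assumptions: one resource, integer demands, and a single pass over the queue. The function name and data layout are hypothetical and do not reflect how the Kueue scheduler is structured.

```python
def admit(queue, free, strategy):
    """Return the names admitted from a FIFO-ordered list of (name, demand).

    StrictFIFO:     the first workload that does not fit blocks everything behind it.
    BestEffortFIFO: a workload that does not fit is skipped and later ones are tried.
    """
    admitted = []
    for name, demand in queue:
        if demand <= free:
            free -= demand
            admitted.append(name)
        elif strategy == "StrictFIFO":
            break  # head-of-line blocking
        # BestEffortFIFO: fall through and try the next workload
    return admitted


if __name__ == "__main__":
    pending = [("a", 3), ("b", 5), ("c", 2)]
    print(admit(pending, free=6, strategy="StrictFIFO"))
    print(admit(pending, free=6, strategy="BestEffortFIFO"))
```

In this model, `b` does not fit after `a` is admitted, so StrictFIFO stops there while BestEffortFIFO goes on to admit `c`.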

Thanks to all our contributors!

In no particular order: @alculquicondor @ahg-g @denkensk @ArangoGutierrez @kerthcet @cortespao @BinacsLee @jiwq @Huang-Wei