Organize tasks into folders #1888

Merged
merged 1 commit into from Mar 25, 2024
6 changes: 3 additions & 3 deletions README.md
@@ -21,12 +21,12 @@ Read the [overview](https://kueue.sigs.k8s.io/docs/overview/) to learn more.
- **Resource management:** Support resource fair sharing and [preemption](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#preemption) with a variety of policies between different tenants.
- **Dynamic resource reclaim:** A mechanism to [release](https://kueue.sigs.k8s.io/docs/concepts/workload/#dynamic-reclaim) quota as the pods of a Job complete.
- **Resource flavor fungibility:** Quota [borrowing or preemption](https://kueue.sigs.k8s.io/docs/concepts/cluster_queue/#flavorfungibility) in ClusterQueue and Cohort.
- **Integrations:** Built-in support for popular jobs, e.g. [BatchJob](https://kueue.sigs.k8s.io/docs/tasks/run_jobs/), [Kubeflow training jobs](https://kueue.sigs.k8s.io/docs/tasks/run_kubeflow_jobs/), [RayJob](https://kueue.sigs.k8s.io/docs/tasks/run_rayjobs/), [RayCluster](https://kueue.sigs.k8s.io/docs/tasks/run_rayclusters/), [JobSet](https://kueue.sigs.k8s.io/docs/tasks/run_jobsets/), [plain Pod](https://kueue.sigs.k8s.io/docs/tasks/run_plain_pods/).
- **Integrations:** Built-in support for popular jobs, e.g. [BatchJob](https://kueue.sigs.k8s.io/docs/tasks/run/jobs/), [Kubeflow training jobs](https://kueue.sigs.k8s.io/docs/tasks/run/kubeflow/), [RayJob](https://kueue.sigs.k8s.io/docs/tasks/run/rayjobs/), [RayCluster](https://kueue.sigs.k8s.io/docs/tasks/run/rayclusters/), [JobSet](https://kueue.sigs.k8s.io/docs/tasks/run/jobsets/), [plain Pod](https://kueue.sigs.k8s.io/docs/tasks/run/plain_pods/).
- **System insight:** Built-in [prometheus metrics](https://kueue.sigs.k8s.io/docs/reference/metrics/) to help monitor the state of the system, as well as Conditions.
- **AdmissionChecks:** A mechanism for internal or external components to influence whether a workload can be [admitted](https://kueue.sigs.k8s.io/docs/concepts/admission_check/).
- **Advanced autoscaling support:** Integration with cluster-autoscaler's [provisioningRequest](https://kueue.sigs.k8s.io/docs/admission-check-controllers/provisioning/#job-using-a-provisioningrequest) via admissionChecks.
- **Sequential admission:** A simple implementation of [all-or-nothing scheduling](https://kueue.sigs.k8s.io/docs/tasks/setup_sequential_admission/).
- **Partial admission:** Allows jobs to run with a [smaller parallelism](https://kueue.sigs.k8s.io/docs/tasks/run_jobs/#partial-admission), based on available quota, if the application supports it.
- **Sequential admission:** A simple implementation of [all-or-nothing scheduling](https://kueue.sigs.k8s.io/docs/tasks/manage/setup_sequential_admission/).
- **Partial admission:** Allows jobs to run with a [smaller parallelism](https://kueue.sigs.k8s.io/docs/tasks/run/jobs/#partial-admission), based on available quota, if the application supports it.

## Production Readiness status

2 changes: 1 addition & 1 deletion cmd/experimental/README.md
@@ -30,4 +30,4 @@ Keep in mind the following rules for each integration:
mark the integration as stale for at most 2 releases. After that, Kueue maintainers will remove
the folder.
- Based on user feedback, the [Kueue maintainers](/OWNERS), at their discretion, might choose to
move the [integration to pkg/controller/jobs](https://kueue.sigs.k8s.io/docs/tasks/integrate_a_custom_job/).
move the [integration to pkg/controller/jobs](https://kueue.sigs.k8s.io/docs/tasks/dev/integrate_a_custom_job/).
2 changes: 1 addition & 1 deletion site/content/en/docs/concepts/cluster_queue.md
@@ -487,5 +487,5 @@ If set to `None` or `spec.stopPolicy` is removed the ClusterQueue will to normal

- Create [local queues](/docs/concepts/local_queue)
- Create [resource flavors](/docs/concepts/resource_flavor) if you haven't already done so.
- Learn how to [administer cluster quotas](/docs/tasks/administer_cluster_quotas).
- Learn how to [administer cluster quotas](/docs/tasks/manage/administer_cluster_quotas).
- Read the [API reference](/docs/reference/kueue.v1beta1/#kueue-x-k8s-io-v1beta1-ClusterQueue) for `ClusterQueue`
8 changes: 4 additions & 4 deletions site/content/en/docs/concepts/multikueue.md
@@ -64,10 +64,10 @@ Known Limitations:
An approach similar to the one described for [`batch/Job`](#batchjob) is taken into account to overcome this.

## Submitting Jobs
In a [configured MultiKueue environemnt](/docs/tasks/setup_multikueue), you can submit any MultiKueue supported job to the Manager cluster, targeting a ClusterQueue configured for Multikueue.
In a [configured MultiKueue environment](/docs/tasks/manage/setup_multikueue), you can submit any MultiKueue supported job to the Manager cluster, targeting a ClusterQueue configured for MultiKueue.
Kueue delegates the job to the configured worker clusters without any additional configuration changes.

## What’s next?
- Learn how to [setup a MultiKueue environment](/docs/tasks/setup_multikueue/)
- Learn how to [submit JobSets](/docs/tasks/run_jobsets/#jobset-definition) to a running Kueue cluster.
- Learn how to [submit batch/Jobs](/docs/tasks/run_jobs/#1-define-the-job) to a running Kueue cluster.
- Learn how to [setup a MultiKueue environment](/docs/tasks/manage/setup_multikueue/)
- Learn how to [submit JobSets](/docs/tasks/run/jobsets/#jobset-definition) to a running Kueue cluster.
- Learn how to [submit batch/Jobs](/docs/tasks/run/jobs/#1-define-the-job) to a running Kueue cluster.
2 changes: 1 addition & 1 deletion site/content/en/docs/concepts/workload.md
@@ -154,5 +154,5 @@ the requeueState (`.status.requeueState`) will be reset to null.
## What's next

- Learn about [workload priority class](/docs/concepts/workload_priority_class).
- Learn how to [run jobs](/docs/tasks/run_jobs)
- Learn how to [run jobs](/docs/tasks/run/jobs)
- Read the [API reference](/docs/reference/kueue.v1beta1/#kueue-x-k8s-io-v1beta1-Workload) for `Workload`
4 changes: 2 additions & 2 deletions site/content/en/docs/concepts/workload_priority_class.md
@@ -110,6 +110,6 @@ Workload's `PriorityClassSource` and `PriorityClassName` fields are immutable.

## What's next?

- Learn how to [run jobs](/docs/tasks/run_jobs)
- Learn how to [run jobs with workload priority](/docs/tasks/run_job_with_workload_priority)
- Learn how to [run jobs](/docs/tasks/run/jobs)
- Learn how to [run jobs with workload priority](/docs/tasks/manage/run_job_with_workload_priority)
- Read the [API reference](/docs/reference/kueue.v1beta1/#kueue-x-k8s-io-v1beta1-WorkloadPriorityClass) for `WorkloadPriorityClass`
4 changes: 2 additions & 2 deletions site/content/en/docs/overview/_index.md
@@ -28,12 +28,12 @@ A core design principle for Kueue is to avoid duplicating mature functionality i
- **Resource management:** Support resource fair sharing and [preemption](/docs/concepts/cluster_queue/#preemption) with a variety of policies between different tenants.
- **Dynamic resource reclaim:** A mechanism to [release](/docs/concepts/workload/#dynamic-reclaim) quota as the pods of a Job complete.
- **Resource flavor fungibility:** Quota [borrowing or preemption](/docs/concepts/cluster_queue/#flavorfungibility) in ClusterQueue and Cohort.
- **Integrations:** Built-in support for popular jobs, e.g. [BatchJob](/docs/tasks/run_jobs/), [Kubeflow training jobs](/docs/tasks/run_kubeflow_jobs/), [RayJob](/docs/tasks/run_rayjobs/), [RayCluster](/docs/tasks/run_rayclusters/), [JobSet](/docs/tasks/run_jobsets/), [plain Pod](/docs/tasks/run_plain_pods/).
- **Integrations:** Built-in support for popular jobs, e.g. [BatchJob](/docs/tasks/run/jobs/), [Kubeflow training jobs](/docs/tasks/run/kubeflow/), [RayJob](/docs/tasks/run/rayjobs/), [RayCluster](/docs/tasks/run/rayclusters/), [JobSet](/docs/tasks/run/jobsets/), [plain Pod](/docs/tasks/run/plain_pods/).
- **System insight:** Built-in [prometheus metrics](/docs/reference/metrics/) to help monitor the state of the system, as well as Conditions.
- **AdmissionChecks:** A mechanism for internal or external components to influence whether a workload can be [admitted](/docs/concepts/admission_check/).
- **Advanced autoscaling support:** Integration with cluster-autoscaler's [provisioningRequest](/docs/admission-check-controllers/provisioning/#job-using-a-provisioningrequest) via admissionChecks.
- **Sequential admission:** A simple implementation of [all-or-nothing scheduling](/docs/tasks/setup_sequential_admission/).
- **Partial admission:** Allows jobs to run with a [smaller parallelism](/docs/tasks/run_jobs/#partial-admission), based on available quota, if the application supports it.
- **Partial admission:** Allows jobs to run with a [smaller parallelism](/docs/tasks/run/jobs/#partial-admission), based on available quota, if the application supports it.

## High-level Kueue operation

31 changes: 16 additions & 15 deletions site/content/en/docs/tasks/_index.md
@@ -19,37 +19,38 @@ quotas and queues.

As a batch administrator, you can learn how to:

- [Setup role-based access control](/docs/tasks/rbac)
- [Setup role-based access control](manage/rbac)
to Kueue objects.
- [Administer cluster quotas](/docs/tasks/administer_cluster_quotas) with ClusterQueues and LocalQueues.
- Setup [Sequential Admission with Ready Pods](/docs/tasks/setup_sequential_admission).
- [Administer cluster quotas](manage/administer_cluster_quotas) with ClusterQueues and LocalQueues.
- Setup [Sequential Admission with Ready Pods](manage/setup_sequential_admission).
- As a batch administrator, you can learn how to
[monitor pending workloads](/docs/tasks/monitor_pending_workloads).
- As a batch administrator, you can learn how to [run a Kueue managed Jobs with a custom WorkloadPriority](/docs/tasks/run_job_with_workload_priority).
- As a batch administrator, you can learn how to [setup a MultiKueue environment](/docs/tasks/setup_multikueue).
[monitor pending workloads](manage/monitor_pending_workloads).
- As a batch administrator, you can learn how to [run a Kueue managed Jobs with a custom WorkloadPriority](manage/run_job_with_workload_priority).
- As a batch administrator, you can learn how to [setup a MultiKueue environment](manage/setup_multikueue).

### Batch user

A _batch user_ runs [workloads](/docs/concepts/workload). A typical
batch user is a researcher, AI/ML engineer, data scientist, among others.

As a batch user, you can learn how to:
- [Run a Kueue managed batch/Job](/docs/tasks/run_jobs).
- [Run a Kueue managed Flux MiniCluster](/docs/tasks/run_flux_minicluster).
- [Run a Kueue managed Kubeflow Job](/docs/tasks/run_kubeflow_jobs).
- [Run a Kueue managed batch/Job](run/jobs).
- [Run a Kueue managed Flux MiniCluster](run/flux_miniclusters).
- [Run a Kueue managed Kubeflow Job](run/kubeflow).
Kueue supports MPIJob v2beta1, PyTorchJob, TFJob, XGBoostJob, PaddleJob, and MXJob.
- [Run a Kueue managed KubeRay RayJob](/docs/tasks/run_rayjobs).
- [Submit Kueue jobs from Python](/docs/tasks/run_python_jobs).
- [Run a Kueue managed plain Pod](/docs/tasks/run_plain_pods).
- [Run a Kueue managed JobSet](/docs/tasks/run_jobsets).
- [Run a Kueue managed KubeRay RayJob](run/rayjobs).
- [Run a Kueue managed KubeRay RayCluster](run/rayclusters).
- [Submit Kueue jobs from Python](run/python_jobs).
- [Run a Kueue managed plain Pod](run/plain_pods).
- [Run a Kueue managed JobSet](run/jobsets).

### Platform developer

A _platform developer_ integrates Kueue with other software and/or contributes to Kueue.

As a platform developer, you can learn how to:
- [Integrate a custom Job with Kueue](/docs/tasks/integrate_a_custom_job).
- [Enable pprof endpoints](/docs/tasks/enabling_pprof_endpoints).
- [Integrate a custom Job with Kueue](dev/integrate_a_custom_job).
- [Enable pprof endpoints](dev/enabling_pprof_endpoints).
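The task index now uses page-relative links (`manage/rbac`, `run/jobs`, `dev/...`) instead of site-absolute ones. A minimal sketch of how such links resolve, assuming the index page is served at `https://kueue.sigs.k8s.io/docs/tasks/` with a trailing slash (the base URL and the `resolve` helper are assumptions for illustration, not part of this change):

```python
from urllib.parse import urljoin

# Assumed rendered URL of site/content/en/docs/tasks/_index.md.
TASKS_INDEX = "https://kueue.sigs.k8s.io/docs/tasks/"


def resolve(link: str, base: str = TASKS_INDEX) -> str:
    """Resolve a link target against a base URL, as a browser resolves an href."""
    return urljoin(base, link)


if __name__ == "__main__":
    for link in ("manage/rbac", "run/jobs", "dev/integrate_a_custom_job"):
        print(resolve(link))
    # Without the trailing slash, the last path segment ("tasks") is replaced.
    print(resolve("manage/rbac", "https://kueue.sigs.k8s.io/docs/tasks"))
```

The last call shows why relative links are sensitive to the trailing slash: if the page were ever served as `/docs/tasks`, `manage/rbac` would resolve under `/docs/` rather than `/docs/tasks/`.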

## Troubleshooting

7 changes: 7 additions & 0 deletions site/content/en/docs/tasks/dev/_index.md
@@ -0,0 +1,7 @@
---
title: "Developer Tools"
weight: 3
date: 2024-03-22
description: >
As a _platform developer_, you can integrate with or develop for Kueue.
---
7 changes: 7 additions & 0 deletions site/content/en/docs/tasks/manage/_index.md
@@ -0,0 +1,7 @@
---
title: "Manage Kueue"
weight: 1
date: 2024-03-22
description: >
As a _batch administrator_, you can manage Kueue.
---
@@ -1,7 +1,7 @@
---
title: "Administer Cluster Quotas"
date: 2022-03-14
weight: 3
weight: 2
description: >
Manage your cluster resource quotas and to establish fair sharing rules among the tenants.
---
@@ -0,0 +1,11 @@
---

title: "Monitor pending Workloads"
linkTitle: "Monitor pending Workloads"
weight: 3
date: 2023-12-05
description: >
How to monitor pending Workloads
---

Kueue provides two ways of monitoring pending Workloads. For Kueue 0.6 and newer, the preferred way to monitor pending Workloads is using the on-demand API.
@@ -3,7 +3,7 @@ title: "Pending workloads in Status"
date: 2023-09-27
weight: 3
description: >
Pending workloads in Status
Obtain the pending workloads in ClusterQueue and LocalQueue statuses.
---

This page shows you how to monitor pending workloads.
@@ -3,7 +3,7 @@ title: "Pending Workloads on-demand"
date: 2023-12-05
weight: 3
description: >
Pending Workloads on-demand
Obtain the pending Workloads via the on-demand visibility API
---

This page shows you how to monitor pending workloads with VisibilityOnDemand feature.
@@ -1,7 +1,7 @@
---
title: "Setup RBAC"
date: 2022-02-14
weight: 2
weight: 1
description: >
Setup role-based access control (RBAC) in your cluster to control the types of users that can view and create Kueue objects.
---
@@ -1,7 +1,7 @@
---
title: "Run job with WorkloadPriority"
date: 2023-10-02
weight: 8
weight: 4
description: >
Run job with WorkloadPriority, which is independent from Pod's priority
---
@@ -1,7 +1,7 @@
---
title: "Sequential Admission with Ready Pods"
date: 2022-03-14
weight: 4
weight: 5
description: >
Simple implementation of the all-or-nothing scheduling
---
15 changes: 0 additions & 15 deletions site/content/en/docs/tasks/monitor_pending_workloads/_index.md

This file was deleted.

7 changes: 7 additions & 0 deletions site/content/en/docs/tasks/run/_index.md
@@ -0,0 +1,7 @@
---
title: "Run Workloads"
weight: 2
date: 2024-03-22
description: >
As a _batch user_, you can run workloads.
---
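The three new section index files carry `weight` values 1 (manage), 2 (run), and 3 (dev). A rough sketch of the resulting sidebar order, assuming sections are listed by ascending weight; the `Section` type and `menu_order` helper are hypothetical, and the title tie-breaker is a simplification of the site generator's real ordering rules:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    path: str
    title: str
    weight: int


# Titles and weights copied from the three new _index.md files in this change.
SECTIONS = [
    Section("tasks/dev", "Developer Tools", 3),
    Section("tasks/manage", "Manage Kueue", 1),
    Section("tasks/run", "Run Workloads", 2),
]


def menu_order(sections):
    """Order sections by ascending weight; ties fall back to title (simplified)."""
    return sorted(sections, key=lambda s: (s.weight, s.title))


if __name__ == "__main__":
    for section in menu_order(SECTIONS):
        print(section.weight, section.title)
```

Under that assumption the task folders appear as Manage Kueue, Run Workloads, Developer Tools, which matches the administrator, user, developer order of the persona list in the task index.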
@@ -1,5 +1,6 @@
---
title: "Run A Flux MiniCluster"
linkTitle: "Flux MiniClusters"
date: 2022-02-14
weight: 6
description: >
@@ -1,5 +1,6 @@
---
title: "Run A Job"
title: "Run A Kubernetes Job"
linkTitle: "Kubernetes Jobs"
date: 2022-02-14
weight: 5
description: >
@@ -1,5 +1,6 @@
---
title: "Run A JobSet"
linkTitle: "Jobsets"
date: 2023-06-16
weight: 7
description: >
@@ -1,7 +1,7 @@
---

title: "Run with Kubeflow"
linkTitle: "Run with Kubeflow"
linkTitle: "Kubeflow Jobs"
weight: 6
date: 2023-08-23
description: >
@@ -1,5 +1,6 @@
---
title: "Run Plain Pods"
linkTitle: "Plain Pods"
date: 2023-09-27
weight: 6
description: >
@@ -1,5 +1,6 @@
---
title: "Run Jobs Using Python"
linkTitle: "Python"
date: 2023-07-05
weight: 7
description: >
@@ -1,5 +1,6 @@
---
title: "Run A RayCluster"
linkTitle: "RayClusters"
date: 2024-01-17
weight: 6
description: >
@@ -1,5 +1,6 @@
---
title: "Run A RayJob"
linkTitle: "RayJobs"
date: 2023-05-18
weight: 6
description: >
30 changes: 30 additions & 0 deletions site/static/_redirects
@@ -0,0 +1,30 @@
###############################################
# set server-side redirects in this file #
# see https://www.netlify.com/docs/redirects/ #
# test at https://play.netlify.com/redirects #
###############################################

/docs/tasks/administer_cluster_quotas /docs/tasks/manage/administer_cluster_quotas 301
/docs/tasks/monitor_pending_workloads /docs/tasks/manage/monitor_pending_workloads 301
/docs/tasks/rbac /docs/tasks/manage/rbac 301
/docs/tasks/run_job_with_workload_priority /docs/tasks/manage/run_job_with_workload_priority 301
/docs/tasks/setup_multikueue /docs/tasks/manage/setup_multikueue 301
/docs/tasks/setup_sequential_admission /docs/tasks/manage/setup_sequential_admission 301

/docs/tasks/enabling_pprof_endpoints /docs/tasks/dev/enabling_pprof_endpoints 301
/docs/tasks/integrate_a_custom_job /docs/tasks/dev/integrate_a_custom_job 301

/docs/tasks/run_flux_minicluster /docs/tasks/run/flux_miniclusters 301
/docs/tasks/run_jobs /docs/tasks/run/jobs 301
/docs/tasks/run_jobsets /docs/tasks/run/jobsets 301
/docs/tasks/run_kubeflow_jobs /docs/tasks/run/kubeflow 301
/docs/tasks/run_plain_pods /docs/tasks/run/plain_pods 301
/docs/tasks/run_rayclusters /docs/tasks/run/rayclusters 301
/docs/tasks/run_rayjobs /docs/tasks/run/rayjobs 301

/docs/tasks/run_kubeflow_jobs/run_mpijobs /docs/tasks/run/kubeflow/mpijobs 301
/docs/tasks/run_kubeflow_jobs/run_mxjobs /docs/tasks/run/kubeflow/mxjobs 301
/docs/tasks/run_kubeflow_jobs/run_paddlejobs /docs/tasks/run/kubeflow/paddlejobs 301
/docs/tasks/run_kubeflow_jobs/run_pytorchjobs /docs/tasks/run/kubeflow/pytorchjobs 301
/docs/tasks/run_kubeflow_jobs/run_tfjobs /docs/tasks/run/kubeflow/tfjobs 301
/docs/tasks/run_kubeflow_jobs/run_xgboostjobs /docs/tasks/run/kubeflow/xgboostjobs 301
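
Each rule in `_redirects` is a whitespace-separated `source target status` triple, so a prefix typo in a target is easy to miss in review. A minimal sketch of a sanity check, assuming every moved page should stay under `/docs/tasks/`; the helper names and the sample rules are hypothetical and this is not part of the change itself:

```python
SAMPLE = """\
# comment lines and blank lines are ignored

/docs/tasks/run_jobs      /docs/tasks/run/jobs       301
# Hypothetical rule with a deliberate prefix typo, for illustration only.
/docs/tasks/example_old   /doc/tasks/example/new     301
"""


def parse_redirects(text):
    """Yield (source, target, status) tuples from redirect rules in `text`."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # not a complete rule
        # Assumption: a missing status is treated as a permanent redirect.
        status = fields[2] if len(fields) > 2 else "301"
        yield fields[0], fields[1], status


def misrouted(rules, prefix="/docs/tasks/"):
    """Return (source, target) pairs whose target leaves the expected prefix."""
    return [(s, t) for s, t, _ in rules if s.startswith(prefix) and not t.startswith(prefix)]


if __name__ == "__main__":
    for source, target in misrouted(parse_redirects(SAMPLE)):
        print(f"suspicious redirect: {source} -> {target}")
```

Running the same check over the real `site/static/_redirects` file (read with `open(...).read()`) would list any rule whose target falls outside `/docs/tasks/`.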
2 changes: 1 addition & 1 deletion site/static/examples/python/README.md
@@ -1,3 +1,3 @@
# Kueue in Python

Documentation for these examples can be found [on the Kueue documentation site](https://kueue.sigs.k8s.io/docs/tasks/run_python_jobs/).
Documentation for these examples can be found [on the Kueue documentation site](https://kueue.sigs.k8s.io/docs/tasks/run/python_jobs/).