---
title: "Run A RayCluster"
linkTitle: "RayClusters"
date: 2024-01-17
weight: 6
description: >
  Run a RayCluster on Kueue.
---
This page shows how to leverage Kueue's scheduling and resource management capabilities when running a RayCluster.
This guide is for batch users who have a basic understanding of Kueue. For more information, see Kueue's overview.
- Make sure you are using Kueue v0.6.0 or newer and KubeRay v1.1.0 or newer.
- Check Administer cluster quotas for details on the initial Kueue setup (a minimal sketch follows this list).
- See KubeRay Installation for installation and configuration details of KubeRay.
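If you have not configured quotas yet, the following is a minimal sketch of the objects that the cluster quotas guide describes. The flavor name, queue names, and quota values (`default-flavor`, `cluster-queue`, `local-queue`, 9 CPUs, 36Gi of memory) are illustrative assumptions, not required values:

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {} # admit workloads from all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: local-queue
  namespace: default
spec:
  clusterQueue: cluster-queue
```

The `local-queue` name is the one referenced by the `kueue.x-k8s.io/queue-name` label in the examples below.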
When running RayClusters on Kueue, take into consideration the following aspects:
The target local queue should be specified in the `metadata.labels` section of the RayCluster configuration:

```yaml
metadata:
  name: raycluster-sample
  namespace: default
  labels:
    kueue.x-k8s.io/queue-name: local-queue
```
The resource needs of the workload can be configured in the `spec`:
```yaml
headGroupSpec:
  template:
    spec:
      affinity: {}
      containers:
      - env: []
        image: rayproject/ray:2.7.0
        imagePullPolicy: IfNotPresent
        name: ray-head
        resources:
          limits:
            cpu: "1"
            memory: 2G
          requests:
            cpu: "1"
            memory: 2G
        securityContext: {}
        volumeMounts:
        - mountPath: /tmp/ray
          name: log-volume
workerGroupSpecs:
- template:
    spec:
      affinity: {}
      containers:
      - env: []
        image: rayproject/ray:2.7.0
        imagePullPolicy: IfNotPresent
        name: ray-worker
        resources:
          limits:
            cpu: "1"
            memory: 1G
          requests:
            cpu: "1"
            memory: 1G
```
Note that a RayCluster will hold resource quotas while it exists. For optimal resource management, you should delete a RayCluster that is no longer in use.
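For example, with the requests shown above and a single worker replica (the `replicas` field is not shown in the snippet), the admitted workload reserves 2 CPUs and 3G of memory against the local queue's quota, 1 CPU and 2G for the head pod plus 1 CPU and 1G per worker, until the RayCluster is deleted.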
- Limited Worker Groups: Because a Kueue workload can have a maximum of 8 PodSets, the maximum number of `spec.workerGroupSpecs` is 7.
- In-Tree Autoscaling Disabled: Kueue manages resource allocation for the RayCluster; therefore, the cluster's internal autoscaling mechanisms need to be disabled (see the fragment after this list).
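As a sketch of what disabling autoscaling means in the manifest: `enableInTreeAutoscaling` is a field of the RayCluster spec and already defaults to `false`, so leaving it unset has the same effect; the `ray.io/v1` apiVersion assumes KubeRay v1.1.0 or newer.

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-sample
  labels:
    kueue.x-k8s.io/queue-name: local-queue
spec:
  # Kueue owns admission and quota for this cluster, so KubeRay's
  # in-tree autoscaler must stay disabled (false is the default).
  enableInTreeAutoscaling: false
  # headGroupSpec and workerGroupSpecs (at most 7) follow as shown above.
```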
The RayCluster looks like the following:
{{< include "examples/jobs/ray-cluster-sample.yaml" "yaml" >}}
You can submit a Ray Job using the CLI, or log in to the Ray head node and execute a job following this example with a kind cluster.