
Deadlocked volcano jobs #3373

Open

johnhe-dev opened this issue Mar 27, 2024 · 4 comments

Labels: kind/bug

@johnhe-dev
What happened:
We are seeing deadlocked jobs in Volcano's queueing. Let's say the cluster has 100 nodes available. Two Volcano jobs are submitted simultaneously. Each job requires 100 Pods, each Pod must run on a separate node, and MinAvailable for both jobs is 100.

We observe that both jobs are in the Running state, but each job's `['status']['taskStatusCount']['x']['phase']['Running']` count is less than 100. In other words, each job gets only a subset of the 100 available nodes. Since MinAvailable for both jobs is 100, neither job can ever get all of its MinAvailable Pods allocated.

What you expected to happen:
Ideally, all 100 nodes would be allocated to one job's PodGroup while the second job stays pending. That would give high cluster resource utilization.

How to reproduce it (as minimally and precisely as possible):
Provision a K8s cluster with 100 available nodes. Create two Volcano jobs simultaneously. Each job requires 100 Pods, each Pod must run on a separate node, and MinAvailable for both jobs is 100.
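A minimal sketch of one of the two jobs (the `app: job-a` label, image, and resource request are placeholders for illustration; the second job would be identical apart from its names):

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-a                       # the second job would be job-b
spec:
  schedulerName: volcano
  minAvailable: 100                 # gang constraint: run only when all 100 Pods fit
  tasks:
  - name: worker
    replicas: 100
    template:
      metadata:
        labels:
          app: job-a                # placeholder label used by the anti-affinity rule
      spec:
        restartPolicy: Never
        affinity:
          podAntiAffinity:          # one Pod per node
            requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  app: job-a
        containers:
        - name: main
          image: busybox            # placeholder image
          command: ["sleep", "infinity"]
          resources:
            requests:
              cpu: "1"              # placeholder request
```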

Anything else we need to know?:

The volcano-scheduler.conf is as follows:

```yaml
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: gang
    enablePreemptable: false
  - name: conformance
- plugins:
  - name: overcommit
  - name: drf
    enablePreemptable: false
  - name: predicates
  - name: proportion
  - name: nodeorder
  - name: binpack
```

Environment:

  • Volcano Version: 1.7.0-beta.0
  • Kubernetes version (use kubectl version): 1.27
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
johnhe-dev added the kind/bug label on Mar 27, 2024
@Monokaix
Member

Do you mean the Pods have node affinity set?

@Vacant2333
Contributor

Can you try disabling the overcommit plugin?
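For reference, the posted scheduler config with the overcommit plugin removed would look like this (a sketch; everything else unchanged):

```yaml
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: gang
    enablePreemptable: false
  - name: conformance
- plugins:
  - name: drf
    enablePreemptable: false
  - name: predicates
  - name: proportion
  - name: nodeorder
  - name: binpack
```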

@noobzzw

noobzzw commented Apr 10, 2024

Hi, do you only have 100 nodes available in your cluster? Or are there other resources available, but the job has affinity or a node selector set? Volcano does not consider affinity or node selectors when moving a job into the Inqueue state, so if there are other free resources in the cluster but your job restricts where its Pods can run, the situation described above can occur.
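For example, Pod-level constraints like these are not taken into account when the job is admitted to the Inqueue state (the label values below are made up):

```yaml
spec:
  nodeSelector:
    node-pool: workers          # made-up label: only a subset of nodes qualifies
  affinity:
    podAntiAffinity:            # at most one matching Pod per node
      requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app: my-job         # made-up label
```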

@johnhe-dev
Author

@noobzzw, thanks for commenting on the issue. 1/ Yes, in the error scenario there are only 100 nodes available; the other nodes in the cluster are already running other jobs. 2/ Yes, I use node/pod affinity to ensure that only one Pod runs exclusively on each node.

To state it more clearly, the issue is about gang scheduling. There are two independent jobs, A and B, in the queue. Each job requests 100 nodes, and MinAvailable for each job is set to 100, meaning a job can run if and only if it is allocated 100 nodes. I was expecting the gang scheduler to respect the MinAvailable requirement and allocate all 100 available nodes to one job, but this is not happening: Volcano allocates nodes to both jobs, each job gets only a subset of the free nodes, and both get stuck in the queue.
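For context, the gang requirement ends up as minMember on the PodGroup that the Volcano job controller creates for each job; roughly like this sketch (the name is illustrative, the real PodGroup name is derived from the job):

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: job-a-podgroup          # illustrative; the controller derives the real name
spec:
  minMember: 100                # mirrors the job's MinAvailable
  queue: default
```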

Maybe my understanding of gang scheduling is wrong.
