
Deadlocked volcano jobs #3373

Open

johnhe-dev opened this issue Mar 27, 2024 · 4 comments

Labels: kind/bug

@johnhe-dev
What happened:
We are seeing deadlocked jobs in Volcano's queueing. Let's say the cluster has 100 nodes available. Two Volcano jobs are submitted simultaneously. Each job requires 100 Pods, each Pod must run on a separate node, and MinAvailable for both jobs is 100.

We observe that both jobs are in the Running state, but each job's `['status']['taskStatusCount']['x']['phase']['Running']` count is less than 100. In other words, each job gets only a subset of the 100 available nodes. Since MinAvailable for both jobs is 100, neither job can ever get all of its MinAvailable Pods allocated.

What you expected to happen:
Ideally, all 100 nodes would be allocated to one job's PodGroup while the second job stays pending. That would give high cluster resource utilization.

How to reproduce it (as minimally and precisely as possible):
Provision a K8s cluster with 100 available nodes. Create two Volcano jobs simultaneously. Each job requires 100 Pods, each Pod must run on a separate node, and MinAvailable for both jobs is 100.
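A minimal sketch of one of the two jobs (the `app: job-a` label, image, and resource request are placeholders for illustration; the second job would be identical apart from its names):

```yaml
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-a                       # the second job would be job-b
spec:
  schedulerName: volcano
  minAvailable: 100                 # gang constraint: run only when all 100 Pods fit
  tasks:
  - name: worker
    replicas: 100
    template:
      metadata:
        labels:
          app: job-a                # placeholder label used by the anti-affinity rule
      spec:
        restartPolicy: Never
        affinity:
          podAntiAffinity:          # one Pod per node
            requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  app: job-a
        containers:
        - name: main
          image: busybox            # placeholder image
          command: ["sleep", "infinity"]
          resources:
            requests:
              cpu: "1"              # placeholder request
```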

Anything else we need to know?:

The volcano-scheduler.conf is as follows:

```yaml
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: gang
    enablePreemptable: false
  - name: conformance
- plugins:
  - name: overcommit
  - name: drf
    enablePreemptable: false
  - name: predicates
  - name: proportion
  - name: nodeorder
  - name: binpack
```

Environment:

  • Volcano Version: 1.7.0-beta.0
  • Kubernetes version (use kubectl version): 1.27
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:
johnhe-dev added the kind/bug label on Mar 27, 2024
@Monokaix
Member

Do you mean the Pods have node affinity set?

@Vacant2333
Contributor

Can you try disabling the overcommit plugin?
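For reference, the posted scheduler config with the overcommit plugin removed would look like this (a sketch; everything else unchanged):

```yaml
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: priority
  - name: gang
    enablePreemptable: false
  - name: conformance
- plugins:
  - name: drf
    enablePreemptable: false
  - name: predicates
  - name: proportion
  - name: nodeorder
  - name: binpack
```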

@noobzzw

noobzzw commented Apr 10, 2024

Hi, do you only have 100 nodes available in your cluster? Or are there other resources available, but the job has affinity or a node selector set? Volcano does not consider affinity or node selectors when moving a job into the Inqueue state, so if there are other free resources in the cluster but your job restricts where its Pods can run, the situation described above can occur.
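For example, Pod-level constraints like these are not taken into account when the job is admitted to the Inqueue state (the label values below are made up):

```yaml
spec:
  nodeSelector:
    node-pool: workers          # made-up label: only a subset of nodes qualifies
  affinity:
    podAntiAffinity:            # at most one matching Pod per node
      requiredDuringSchedulingIgnoredDuringExecution:
      - topologyKey: kubernetes.io/hostname
        labelSelector:
          matchLabels:
            app: my-job         # made-up label
```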

@johnhe-dev
Author

@noobzzw, thanks for commenting on the issue. 1/ Yes, in the error scenario there are only 100 nodes available; the other nodes in the cluster are already running other jobs. 2/ Yes, I use node/pod affinity to ensure that only one Pod runs exclusively on each node.

To state it more clearly, the issue is about gang scheduling. There are two independent jobs, A and B, in the queue. Each job requests 100 nodes, and MinAvailable for each job is set to 100, meaning a job can run if and only if it is allocated 100 nodes. I was expecting the gang scheduler to respect the MinAvailable requirement and allocate all 100 available nodes to one job, but this is not happening: Volcano allocates nodes to both jobs, each job gets only a subset of the free nodes, and both get stuck in the queue.
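For context, the gang requirement ends up as minMember on the PodGroup that the Volcano job controller creates for each job; roughly like this sketch (the name is illustrative, the real PodGroup name is derived from the job):

```yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: job-a-podgroup          # illustrative; the controller derives the real name
spec:
  minMember: 100                # mirrors the job's MinAvailable
  queue: default
```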

Maybe my understanding of gang scheduling is wrong.
