In the latest version, the job cannot run because a GPU quota is set in the queue #3426

Closed
ffz12 opened this issue Apr 19, 2024 · 18 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@ffz12

ffz12 commented Apr 19, 2024

What happened:

In the latest version, a job cannot run when a GPU quota is set on the queue.

What you expected to happen:

GPU quotas can be set on queues and jobs in those queues can still be scheduled.

How to reproduce it (as minimally and precisely as possible):

Node resources are sufficient.
1. The queue configuration file is as follows:
a800.yaml

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: a800
spec:
    reclaimable: true
    weight: 1
    capability:
      nvidia.com/gpu: "4"
      cpu: "5"

2. The job configuration file is as follows:
job1.yaml

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-1
spec:
  minAvailable: 1
  schedulerName: volcano
  queue: a800
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: nginx
      policies:
      - event: TaskCompleted
        action: CompleteJob
      template:
        spec:
          containers:
            - command:
              - sleep
              - 10m
              image: harbor.unijn.cn/zhaofengfeng/dev:v1
              name: nginx
              resources:
                requests:
                  cpu: 4
                  nvidia.com/gpu: "3"
                limits:
                  cpu: 4
                  nvidia.com/gpu: "3"
          restartPolicy: Never

3. The job is Pending after applying it:

 kubectl  apply -f job1.yaml

 kubectl get vcjob
NAME    STATUS    MINAVAILABLE   RUNNINGS   AGE
job-1   Pending   1                         22s

kubectl describe vcjob job-1
...
 Warning  PodGroupPending  63s   vc-controller-manager  PodGroup default:job-1 unschedule,reason: 1/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable

4. In the earlier version (v1.7.2), the GPU quota function works normally:
cat queue_a6k_ada.yaml

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: a6kada
spec:
    reclaimable: true
    weight: 1
    capability:
      nvidia.com/gpu: "1"
    affinity:            # added field
      nodeGroupAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - A6k_ada

cat job2.yaml

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: job-3
spec:
  minAvailable: 1
  schedulerName: volcano
  queue: a6kada
  policies:
    - event: PodEvicted
      action: RestartJob
  tasks:
    - replicas: 1
      name: nginx
      policies:
      - event: TaskCompleted
        action: CompleteJob
      template:
        spec:
          containers:
            - command:
              - sleep
              - 10m
              image: nginx:latest
              name: nginx
              resources:
                requests:
                  cpu: 1
                  nvidia.com/gpu: "1"
                limits:
                  cpu: 1
                  nvidia.com/gpu: "1"
          restartPolicy: Never
 kubectl get po
NAME            READY   STATUS    RESTARTS   AGE
job-3-nginx-0   1/1     Running   0          46s

Anything else we need to know?:

Environment:

  • Volcano Version:
  • Kubernetes version (use kubectl version): 1.23.10
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release): CentOS Linux release 7.9.2009 (Core)
  • Kernel (e.g. uname -a): 3.10.0-1160.el7.x86_64
  • Install tools:
  • Others:
@ffz12 added the kind/bug label Apr 19, 2024
@lowang-bh
Member

Please use YAML code blocks in Markdown to format the YAML. Thanks.

@Monokaix
Member

Please also paste the Volcano scheduler logs :)

@ffz12
Author

ffz12 commented Apr 19, 2024

1.log

@lowang-bh
Member

1.log

The log shows the job is skipped because the PodGroup is in Pending status, not Inqueue.

I0419 06:46:59.356535       1 enqueue.go:79] Try to enqueue PodGroup to 1 Queues
I0419 06:46:59.356568       1 overcommit.go:123] Sufficient resources, permit job <default/job-1-ee4a4c4d-932b-4a74-872d-bb31c0565b47> to be inqueue
I0419 06:46:59.356670       1 allocate.go:74] Job <default/job-1-ee4a4c4d-932b-4a74-872d-bb31c0565b47> Queue <a800> skip allocate, reason: job status is pending.
I0419 06:46:59.356689       1 allocate.go:64] Try to allocate resource to 0 Queues

@ffz12
Author

ffz12 commented Apr 22, 2024

Similarly, running on the latest version shows the PodGroup as pending with no apparent cause, even though resources are sufficient. I saw this PodGroup error at the time, but I don't know how to fix it.

@ffz12
Author

ffz12 commented Apr 23, 2024

So how do we solve this problem? In the newer version, the queue's GPU capability does not work, while it does in the older version. For example:

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: a800
spec:
    reclaimable: true
    weight: 1
    capability:
      nvidia.com/gpu: "4"

@lowang-bh
Member

You should check the scheduler config, and raise the scheduler log level to see which plugin rejects the PodGroup from entering the Inqueue status.
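For reference, a minimal sketch of raising the verbosity, assuming a typical install where the scheduler runs as a Deployment named volcano-scheduler in the volcano-system namespace and accepts klog's -v flag in its args (keep any existing args and only change the -v value):

# Sketch only: deployment name, namespace, and default args are assumptions
# based on a standard Volcano install.
spec:
  template:
    spec:
      containers:
      - name: volcano-scheduler
        args:
        - --logtostderr
        - -v=5   # raise klog verbosity so per-plugin enqueue decisions are logged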

@Monokaix
Member

Please paste the scheduler ConfigMap, and try restarting the Volcano scheduler.

@ffz12
Author

ffz12 commented Apr 25, 2024

1. The volcano-scheduler log level has been set appropriately.
2. The volcano-scheduler-configmap is as follows:

apiVersion: v1
data:
  volcano-scheduler.conf: |
    actions: "enqueue, allocate, backfill, reclaim, preempt"
    tiers:
    - plugins:
      - name: priority
      - name: gang
        enablePreemptable: false
      - name: conformance
    - plugins:
      - name: overcommit
      - name: drf
        enablePreemptable: false
      - name: predicates
      - name: proportion
      - name: nodeorder
      - name: nodegroup
      - name: binpack
        arguments:
          binpack.weight: 10
          binpack.cpu: 1
          binpack.memory: 1
          binpack.resources: nvidia.com/gpu
          binpack.resources.nvidia.com/gpu: 8
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"volcano-scheduler.conf":"actions: \"enqueue, allocate, backfill\"\ntiers:\n- plugins:\n  - name: priority\n  - name: gang\n    enablePreemptable: false\n  - name: conformance\n- plugins:\n  - name: overcommit\n  - name: drf\n    enablePreemptable: false\n  - name: predicates\n  - name: proportion\n  - name: nodeorder\n  - name: binpack\n"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"volcano-scheduler-configmap","namespace":"volcano-system"}}
  creationTimestamp: "2024-04-10T03:18:37Z"
  name: volcano-scheduler-configmap
  namespace: volcano-system
  resourceVersion: "120372887"
  uid: 122c0b0b-fea7-4aa7-873d-3017bcb7722c

@ffz12
Author

ffz12 commented Apr 25, 2024

The log level is set to 5. The relevant log for job-1 is as follows:

Added Queue attributes.
I0425 06:28:19.398670 1 proportion.go:158] Queue a800 allocated <cpu 0.00, memory 0.00> request <cpu 0.00, memory 0.00> inqueue <cpu 0.00, memory 0.00> elastic <cpu 0.00, memory 0.00>
I0425 06:28:19.398730 1 proportion.go:204] Considering Queue : weight <1>, total weight <1>.
I0425 06:28:19.398782 1 proportion.go:220] Format queue deserved resource to <cpu 0.00, memory 0.00>
I0425 06:28:19.398825 1 proportion.go:224] queue is meet
I0425 06:28:19.398866 1 proportion.go:230] The attributes of queue in proportion: deserved <cpu 0.00, memory 0.00>, realCapability <cpu 2703200.00, memory 41159590465010.00, nvidia.com/gpu 5000.00, nvidia.com/hostdev_2 0.00, pods 0.00, nvidia.com/hostdev_1 0.00, ephemeral-storage 0.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00>, allocate <cpu 0.00, memory 0.00>, request <cpu 0.00, memory 0.00>, elastic <cpu 0.00, memory 0.00>, share <0.00>
I0425 06:28:19.398925 1 proportion.go:242] Remaining resource is <cpu 2703200.00, memory 41159590465010.00, nvidia.com/gpu 320000.00, nvidia.com/hostdev_2 40000.00, pods 4620.00, nvidia.com/hostdev_1 200000.00, ephemeral-storage 20403247824896000.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00>
I0425 06:28:19.398992 1 proportion.go:244] Exiting when remaining is empty or no queue has more resource request: <cpu 2703200.00, memory 41159590465010.00, hugepages-2Mi 0.00, nvidia.com/gpu 320000.00, nvidia.com/hostdev_2 40000.00, pods 4620.00, nvidia.com/hostdev_1 200000.00, ephemeral-storage 20403247824896000.00, hugepages-1Gi 0.00>
I0425 06:28:19.399086 1 nodegroup.go:217] queueGroupAffinity queueGroupAntiAffinityRequired <map[]> queueGroupAntiAffinityPreferred <map[]> queueGroupAffinityRequired <map[a800:map[a800:{}] a8001:map[a8001:{}]]> queueGroupAffinityPreferred <map[]> groupLabelName <volcano.sh/nodegroup-name>
I0425 06:28:19.399131 1 binpack.go:165] Enter binpack plugin ...
I0425 06:28:19.399148 1 binpack.go:183] resources [] record in weight but not found on any node
I0425 06:28:19.399169 1 binpack.go:167] Leaving binpack plugin. binpack.weight[10], binpack.cpu[1], binpack.memory[1], nvidia.com/gpu[8], cpu[1], memory[1] ...
I0425 06:28:19.399187 1 enqueue.go:45] Enter Enqueue ...
I0425 06:28:19.399206 1 enqueue.go:63] Added Queue for Job <r1/job-1-47412291-c368-4e58-ae68-4fcb9158cbec>
I0425 06:28:19.399233 1 enqueue.go:74] Added Job <r1/job-1-47412291-c368-4e58-ae68-4fcb9158cbec> into Queue
I0425 06:28:19.399254 1 enqueue.go:79] Try to enqueue PodGroup to 1 Queues
I0425 06:28:19.399286 1 overcommit.go:123] Sufficient resources, permit job <r1/job-1-47412291-c368-4e58-ae68-4fcb9158cbec> to be inqueue
I0425 06:28:19.399339 1 proportion.go:336] job job-1-47412291-c368-4e58-ae68-4fcb9158cbec min resource <cpu 4000.00, memory 4294967296.00, nvidia.com/gpu 2000.00, pods 1.00>, queue a800 capability <cpu 2703200.00, memory 41159590465010.00, ephemeral-storage 0.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00, nvidia.com/gpu 5000.00, nvidia.com/hostdev_2 0.00, pods 0.00, nvidia.com/hostdev_1 0.00> allocated <cpu 0.00, memory 0.00> inqueue <cpu 0.00, memory 0.00> elastic <cpu 0.00, memory 0.00>
I0425 06:28:19.399368 1 proportion.go:349] job job-1-47412291-c368-4e58-ae68-4fcb9158cbec inqueue false
I0425 06:28:19.399421 1 enqueue.go:104] Leaving Enqueue ...
I0425 06:28:19.399440 1 allocate.go:47] Enter Allocate ...
I0425 06:28:19.399454 1 allocate.go:74] Job <r1/job-1-47412291-c368-4e58-ae68-4fcb9158cbec> Queue skip allocate, reason: job status is pending.
I0425 06:28:19.399468 1 allocate.go:64] Try to allocate resource to 0 Queues

@lowang-bh
Member

proportion.go:336] job job-1-47412291-c368-4e58-ae68-4fcb9158cbec min resource <cpu 4000.00, memory 4294967296.00, nvidia.com/gpu 2000.00, pods 1.00>, queue a800 capability <cpu 2703200.00, memory 41159590465010.00, ephemeral-storage 0.00, hugepages-1Gi 0.00, hugepages-2Mi 0.00, nvidia.com/gpu 5000.00, nvidia.com/hostdev_2 0.00, pods 0.00, nvidia.com/hostdev_1 0.00> allocated <cpu 0.00, memory 0.00> inqueue <cpu 0.00, memory 0.00> elastic <cpu 0.00, memory 0.00>

Your queue's capability for pods is 0, so the job cannot be enqueued.

		klog.V(5).Infof("job %s min resource <%s>, queue %s capability <%s> allocated <%s> inqueue <%s> elastic <%s>",
			job.Name, minReq.String(), queue.Name, attr.realCapability.String(), attr.allocated.String(), attr.inqueue.String(), attr.elastic.String())
		// The queue resource quota limit has not reached
		r := minReq.Add(attr.allocated).Add(attr.inqueue).Sub(attr.elastic)
		rr := attr.realCapability.Clone()

		for name := range rr.ScalarResources {
			if _, ok := r.ScalarResources[name]; !ok {
				delete(rr.ScalarResources, name)
			}
		}

		inqueue := r.LessEqual(rr, api.Infinity)
		klog.V(5).Infof("job %s inqueue %v", job.Name, inqueue)

@ffz12
Author

ffz12 commented Apr 26, 2024

How do I do this?

@ffz12
Author

ffz12 commented Apr 29, 2024

This is how to write the queue so that the job runs normally:

cat a800.yaml
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: a800
spec:
    reclaimable: true
    weight: 1
    capability:
       nvidia.com/gpu: "5"
       pods: 200    # the pods count must be specified
    affinity:            # added field
      nodeGroupAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - a800

@lowang-bh
Member

lowang-bh commented Apr 29, 2024

Because #3188 added pods as a kind of extended resource to support preemption.
@Monokaix I think we'd better compile a changelog declaring the changes that will affect end users when publishing a release note.

@Monokaix
Member

Can v1.8.2 solve your problem?

@Monokaix
Member

Because #3188 added pods as a kind of extended resource to support preemption. @Monokaix I think we'd better compile a changelog declaring the changes that will affect end users when publishing a release note.

I think this can be solved after #3216 is merged.

@Monokaix
Member

Monokaix commented May 9, 2024

/close

@volcano-sh-bot
Contributor

@Monokaix: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
