Hard time understanding how PodGroup exactly works #3431

Open
Gygrus opened this issue Apr 21, 2024 · 7 comments

Comments


Gygrus commented Apr 21, 2024

This is more of a question about Volcano PodGroup functionality than an issue, because I am almost certain that I have misunderstood how it works and it confuses me. I tried to find an answer in other GitHub issues as well as in the official documentation, but no luck there.

I have a Kubernetes cluster (created via Minikube) with 4 nodes and Volcano is properly configured. I created a simple queue:

apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: tq
  namespace: default
spec:
  reclaimable: true
  weight: 1
  capability:
    cpu: "4"
    memory: "4096Mi"

Then a simple PodGroup with no constraints on resources:

apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: pg
  namespace: default
spec:
  queue: tq

and finally, a simple job that runs 3 tasks simultaneously, each of which just sleeps for 30 seconds:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: vcjob1
  namespace: default
spec:
  minAvailable: 3
  schedulerName: volcano
  policies:
    - event: PodEvicted
      action: RestartJob
  plugins:
    ssh: []
    env: []
    svc: []
  maxRetry: 5
  queue: tq
  tasks:
    - replicas: 3
      template:
        spec:
          containers:
          - name: dummy-job
            image: gcr.io/k8s-staging-perf-tests/sleep:latest
            imagePullPolicy: IfNotPresent
            args: ["30s"]
            resources:
              requests:
                cpu: 1
                memory: "200Mi"
          restartPolicy: Never

So when I deploy both the queue and the PodGroup, I (wrongly) expected that all created vcjob1 jobs would run on pods belonging to the defined pg PodGroup (as the job is connected to the tq queue and the queue is connected to the pg PodGroup). However, when the job is running, Volcano creates a new dynamic PodGroup, as if there were no PodGroup assigned to the queue to which the jobs were assigned:

[screenshots of the dynamically created PodGroup omitted]

I've tried multiple different PodGroup configurations, with the minMember and minResources fields defined as well (and I am quite certain that the cluster/jobs have the resources to meet those demands), but the result was always the same: the jobs were starting a new PodGroup and were executed on pods belonging to that group. So this is clearly how the system is supposed to work, but it raises a couple of questions:

  1. What do I need to do to make my newly created jobs use my custom PodGroup?
  2. What is the exact reason for PodGroups to exist? What are some use cases that would showcase those reasons? I don't think the documentation covers that part very well, at least not the "PodGroup" section.
  3. Is the minMember PodGroup property the minimum number of pods that a job requires in order to run? For example, if we want to run a job with 3 replicas on our PodGroup but its minMember field is set to 4, will the PodGroup never start?

Sorry if these questions are trivial and only come from my misunderstanding of the system, but maybe I'm not the only one who didn't get the idea of PodGroups from the documentation alone, and this thread might help them as well.

@lowang-bh
Member

You can find the docs at https://volcano.sh/zh/docs/. Your vcjob doesn't specify a queue and uses the default queue.


Gygrus commented Apr 21, 2024

You can find the docs at https://volcano.sh/zh/docs/. Your vcjob doesn't specify a queue and uses the default queue.

In the YAML file defining my vcjob there is a queue: tq field, so I guess it should point to the tq queue, not default?


PigNatovsky commented Apr 27, 2024

Probably you need to add the pod annotation scheduling.k8s.io/group-name: <groupname>.
This can be done through task.spec.metadata (I think so). If you don't explicitly set a podgroup (which is taken from this pod annotation), the pod will be added to a default podgroup.
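
Something like this in the vcjob task section (a rough sketch of the placement only, using the pg PodGroup defined earlier; not tested):

# fragment of the vcjob spec -- the annotation goes on the task's pod template
tasks:
  - replicas: 3
    template:
      metadata:
        annotations:
          scheduling.k8s.io/group-name: pg   # name of the pre-created PodGroup
      spec:
        # ...containers unchanged from the original job...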

@PigNatovsky

@Gygrus
Have you tried the option with the annotation in the task metadata?


Gygrus commented May 6, 2024

@PigNatovsky Sorry for not being active lately. I tried to add the annotation; however, I'm not sure if I did it in the right place in the vcjob YAML:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: vcjob
  namespace: default
spec:
  minAvailable: 3
  schedulerName: volcano
  policies:
    - event: PodEvicted
      action: RestartJob
  plugins:
    ssh: []
    env: []
    svc: []
  maxRetry: 5
  queue: tq
  tasks:
    - replicas: 3
      template:
        metadata:
            annotations:
              scheduling.k8s.io/group-name: pg
        spec:
          containers:
          - name: dummy-job
            image: gcr.io/k8s-staging-perf-tests/sleep:latest
            imagePullPolicy: IfNotPresent
            args: ["30s"]
            resources:
              requests:
                cpu: 1
                memory: "200Mi"
          restartPolicy: Never

I entered the name of my created podgroup as the value of the annotation. Unfortunately, my vcjobs still don't get assigned to the right podgroup, and when the jobs are submitted, a dynamic podgroup is created:
[screenshots of the dynamic podgroup and vcjob omitted]

What changed, though, is that now the dynamic podgroups, as well as the created vcjobs, have tq as their queue (which is generally good, as I connected the podgroup to the tq queue). What is weird is that the dynamic podgroup doesn't run the job, which stays in the pending status. In the dynamic podgroup YAML we can see:
[screenshot of the dynamic podgroup status omitted]
which isn't very clear to me. Why did Volcano create a podgroup which cannot run the specified jobs?

So to conclude, now my vcjobs don't execute, but at least they are assigned to the right queue, so that's progress :)

Actually, when I removed this additional annotation from the same vcjob YAML file, the behavior of those jobs seems to be the same regardless of the annotation. It's getting more and more confusing; it seems like I now get different results than when I created this thread (different queues are assigned to the dynamic podgroups and vcjobs, and the vcjobs are stuck in the pending status while earlier they executed normally), and it's somehow nondeterministic (??)

Still, I really appreciate your help and thanks for replying!

@PigNatovsky

Well, "pod group is not ready, 3 minAvailable" and NotEnoughResources are pretty self-descriptive. You've created a job that has minAvailable set to 3, so it will not be scheduled until there are enough resources in the cluster. How much CPU and memory do you have as allocatable resources per node?
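
Roughly, the gang arithmetic for your original job looks like this (just a sketch of the sums, not actual Volcano output): with minAvailable: 3 and each pod requesting cpu: 1 / memory: 200Mi, all of the following has to be free at the same time, and it also has to fit within the tq queue's capability (cpu: "4", memory: "4096Mi"):

# hypothetical aggregate the whole gang needs simultaneously (sketch)
minMember: 3
minResources:
  cpu: "3"        # 3 pods x 1 CPU each
  memory: "600Mi" # 3 pods x 200Mi each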


Gygrus commented May 7, 2024

OK, apparently the reason those jobs weren't running was that yunikorn was installed in the cluster. After uninstalling it, vcjobs run and complete normally, but they still aren't assigned to the right podgroup; a dynamic one is created on start.
With the following modified vcjob and podgroup configuration:

apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: vcjob1
  namespace: default
spec:
  minAvailable: 1
  schedulerName: volcano
  policies:
    - event: PodEvicted
      action: RestartJob
  plugins:
    ssh: []
    env: []
    svc: []
  maxRetry: 5
  queue: tq
  tasks:
    - replicas: 1
      template:
        metadata:
            annotations:
              scheduling.k8s.io/group-name: pg
        spec:
          containers:
          - name: dummy-job
            image: gcr.io/k8s-staging-perf-tests/sleep:latest
            imagePullPolicy: IfNotPresent
            args: ["30s"]
            resources:
              requests:
                cpu: 1
                memory: "200Mi"
          restartPolicy: Never
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: pg
  namespace: default
spec:
  minMember: 1
  minResources:
    cpu: "1"
    memory: "200Mi"
  queue: tq

Volcano creates a dynamic podgroup with the following definition:

apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"batch.volcano.sh/v1alpha1","kind":"Job","metadata":{"annotations":{},"name":"vcjob1","namespace":"default"},"spec":{"maxRetry":5,"minAvailable":1,"plugins":{"env":[],"ssh":[],"svc":[]},"policies":[{"action":"RestartJob","event":"PodEvicted"}],"queue":"tq","schedulerName":"volcano","tasks":[{"replicas":1,"template":{"metadata":{"annotations":{"scheduling.k8s.io/group-name":"pg"}},"spec":{"containers":[{"args":["30s"],"image":"gcr.io/k8s-staging-perf-tests/sleep:latest","imagePullPolicy":"IfNotPresent","name":"dummy-job","resources":{"requests":{"cpu":1,"memory":"200Mi"}}}],"restartPolicy":"Never"}}}]}}
  creationTimestamp: "2024-05-07T15:58:59Z"
  generation: 4
  name: vcjob1-5b34308d-1701-47f7-9a0c-930bfb5aacbd
  namespace: default
  ownerReferences:
  - apiVersion: batch.volcano.sh/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: vcjob1
    uid: 5b34308d-1701-47f7-9a0c-930bfb5aacbd
  resourceVersion: "35791"
  uid: 2eef20ec-828b-40df-868d-a86d4f43d5ee
spec:
  minMember: 1
  minResources:
    count/pods: "1"
    cpu: "1"
    memory: 200Mi
    pods: "1"
    requests.cpu: "1"
    requests.memory: 200Mi
  minTaskMember:
    default0: 1
  queue: tq
status:
  conditions:
  - lastTransitionTime: "2024-05-07T15:59:00Z"
    message: '1/0 tasks in gang unschedulable: pod group is not ready, 1 minAvailable'
    reason: NotEnoughResources
    status: "True"
    transitionID: ee48ea00-0450-4b05-9670-a9ca17283054
    type: Unschedulable
  - lastTransitionTime: "2024-05-07T15:59:03Z"
    reason: tasks in gang are ready to be scheduled
    status: "True"
    transitionID: a18d0bd7-6d12-4ac7-982f-ed2c3b9491ca
    type: Scheduled
  phase: Running
  running: 1

which is basically the same configuration as the podgroup I created statically. So why didn't it use my ready podgroup, which just sits in the 'Inqueue' status? I checked my cluster's node resources and there are enough resources on each of the 3 worker nodes:
[screenshot of node allocatable resources omitted]
