Hard time understanding how PodGroup exactly works #3431
Comments
You can find the docs at https://volcano.sh/zh/docs/. Your vcjob doesn't specify a queue, so it uses the default queue.
In the yaml file defining my vcjob there is a
You probably need to add a pod annotation
@Gygrus
@PigNatovsky Sorry for not being active lately. I tried to add the annotation, but I'm not sure I put it in the right place in the vcjob yaml:

    apiVersion: batch.volcano.sh/v1alpha1
    kind: Job
    metadata:
      name: vcjob
      namespace: default
    spec:
      minAvailable: 3
      schedulerName: volcano
      policies:
        - event: PodEvicted
          action: RestartJob
      plugins:
        ssh: []
        env: []
        svc: []
      maxRetry: 5
      queue: tq
      tasks:
        - replicas: 3
          template:
            metadata:
              annotations:
                scheduling.k8s.io/group-name: pg
            spec:
              containers:
                - name: dummy-job
                  image: gcr.io/k8s-staging-perf-tests/sleep:latest
                  imagePullPolicy: IfNotPresent
                  args: ["30s"]
                  resources:
                    requests:
                      cpu: 1
                      memory: "200Mi"
              restartPolicy: Never

I entered the name of the podgroup I created as the value of the annotation. Unfortunately, my vcjobs still don't get assigned to the right podgroup, and when the jobs are submitted, a dynamic podgroup is created:

What changed, though, is that the dynamic podgroups, as well as the created vcjobs, now have

So to conclude, my vcjobs now don't execute, but at least they are assigned to the right queue, so that's progress :)

Actually, when I removed this additional annotation from the same vcjob yaml file, the behavior of those jobs turned out to be the same regardless of the annotation. It's getting more and more confusing; it seems like now I get different results

Still, I really appreciate your help and thanks for replying!
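For comparison, here is a minimal sketch of the case where the `scheduling.k8s.io/group-name` annotation is normally set by hand: a plain Pod (not a vcjob) that should join an existing PodGroup. It assumes a PodGroup named `pg` already exists in the `default` namespace; the pod name `sleeper-0` is hypothetical. As far as I understand, for a vcjob the Volcano job controller creates its own PodGroup and sets this annotation on the job's pods itself, which would be consistent with the annotation in the job template making no visible difference.

```yaml
# Sketch only: a standalone Pod attached to a pre-created PodGroup.
# Assumes a PodGroup "pg" exists in namespace "default".
apiVersion: v1
kind: Pod
metadata:
  name: sleeper-0              # hypothetical name, not from the thread
  namespace: default
  annotations:
    scheduling.k8s.io/group-name: pg   # the PodGroup this pod belongs to
spec:
  schedulerName: volcano       # the pod must be scheduled by Volcano
  restartPolicy: Never
  containers:
    - name: dummy-job
      image: gcr.io/k8s-staging-perf-tests/sleep:latest
      args: ["30s"]
      resources:
        requests:
          cpu: 1
          memory: "200Mi"
```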
Well,
This is more of a question about Volcano PodGroup functionality than an issue, because I am almost certain that I misunderstood how it works, and it confuses me. I tried to find an answer in other GitHub issues as well as in the official documentation, but had no luck there.
I have a Kubernetes cluster (created via Minikube) with 4 nodes and Volcano is properly configured. I created a simple queue,
Then a simple PodGroup with no constraints on resources,
and finally, a simple job that runs 3 tasks simultaneously, that just sleep for 30 seconds:
So now, when I deploy both the queue and the PodGroup, I (wrongly) expected that all created `vcjob1` jobs would run on pods that belong to the defined `pg` PodGroup (as the job is connected to the `tq` queue and the queue is connected to the `pg` PodGroup). However, when the job is running, Volcano creates a new dynamic PodGroup, as if there were no PodGroup assigned to the queue to which the jobs were assigned:

I've tried multiple different PodGroup configurations, with some MinMember and MinResources fields defined as well (and I am quite certain that the cluster/jobs have the resources to meet those demands), but the result was always the same: jobs were starting a new PodGroup and were executed on pods belonging to that group. So this is clearly how the system is supposed to work, but it raises a couple of questions:
Is the `MinMember` PodGroup property the minimum number of pods that a job would require to run? For example, if we want to run a job with 3 replicas on our PodGroup, will the PodGroup not start if its `MinMember` field is set to 4?

Sorry if those questions are trivial and only come from my misunderstanding of the system, but maybe I'm not the only one who didn't get the idea of PodGroup from the documentation alone, and this thread might help them as well.
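To make the `MinMember` question concrete, here is a minimal sketch of a Queue and PodGroup pair. These are not the manifests from this post; the `weight` value and the choice of `minMember: 3` are assumptions for illustration. My understanding is that `minMember` is a gang-scheduling threshold: the scheduler starts the group's pods only once at least that many can be placed together, so a group with `minMember: 4` that only ever receives 3 pods would be expected to stay pending.

```yaml
# Sketch only: values are illustrative assumptions.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: tq                # queues are cluster-scoped, so no namespace
spec:
  weight: 1               # assumed value
---
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: pg
  namespace: default
spec:
  queue: tq               # the PodGroup points at the queue, not the other way round
  minMember: 3            # minimum pods that must be schedulable together
```

For a vcjob, the job's `minAvailable` appears to play the same role for the PodGroup that Volcano creates on the job's behalf, so it is the field to tune rather than a separately created PodGroup.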