In the latest version, the job cannot run because the gpu quota is set in the queue #3426
Comments
Please use YAML code blocks in Markdown to format the YAML files. Thanks. |
Please also paste the Volcano scheduler logs :) |
The log shows the job is skipped because its podgroup is in Pending status, not Inqueue.
|
Similarly, on the latest version the podgroup shows as suspended; no cause can be found, and resources are sufficient. I saw this podgroup error at the time but do not know how to fix it. |
So how do we solve this problem? The queue's GPU capability does not work in the newer version, but it works in the older version.
|
You should check the scheduler-config and raise the scheduler log level to see which plugin rejects the podgroup from moving to Inqueue status. |
Please paste the scheduler ConfigMap, and try restarting the Volcano scheduler. |
1. The appropriate log level has been set for volcano-scheduler:
|
The log level is set to 5. The error log for job1 is as follows: Added Queue attributes. |
Your queue's capacity for pods is 0, so it cannot enqueue the job.
klog.V(5).Infof("job %s min resource <%s>, queue %s capability <%s> allocated <%s> inqueue <%s> elastic <%s>",
	job.Name, minReq.String(), queue.Name, attr.realCapability.String(), attr.allocated.String(), attr.inqueue.String(), attr.elastic.String())
// The queue resource quota limit has not reached
r := minReq.Add(attr.allocated).Add(attr.inqueue).Sub(attr.elastic)
rr := attr.realCapability.Clone()
for name := range rr.ScalarResources {
	if _, ok := r.ScalarResources[name]; !ok {
		delete(rr.ScalarResources, name)
	}
}
inqueue := r.LessEqual(rr, api.Infinity)
klog.V(5).Infof("job %s inqueue %v", job.Name, inqueue) |
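To make the check in the quoted snippet easier to follow, here is a minimal, self-contained sketch of the same idea: the job's minimum request plus what the queue has already allocated and enqueued, minus elastic resources, must fit within the queue's real capability. It does not use Volcano's actual types. `Resource`, `combine`, `fits`, and `canEnqueue` are hypothetical names for this illustration, the resource name `nvidia.com/gpu` and all quantities are assumptions, and treating a dimension missing from the capability as unlimited is a simplification of what `api.Infinity` appears to do in the snippet.

```go
package main

import "fmt"

// Resource is a simplified, hypothetical stand-in for the scheduler's
// resource type; it is NOT Volcano's api.Resource.
type Resource struct {
	MilliCPU float64
	Memory   float64
	Scalars  map[string]float64 // e.g. "nvidia.com/gpu", "pods"
}

// combine returns a + sign*b for every dimension.
func combine(a, b Resource, sign float64) Resource {
	out := Resource{
		MilliCPU: a.MilliCPU + sign*b.MilliCPU,
		Memory:   a.Memory + sign*b.Memory,
		Scalars:  map[string]float64{},
	}
	for name, v := range a.Scalars {
		out.Scalars[name] = v
	}
	for name, v := range b.Scalars {
		out.Scalars[name] += sign * v
	}
	return out
}

// fits reports whether need is within limit. Only scalar dimensions that
// appear in need are compared, and a dimension absent from limit is treated
// as unlimited (an assumption that mirrors the Infinity default above).
func fits(need, limit Resource) bool {
	if need.MilliCPU > limit.MilliCPU || need.Memory > limit.Memory {
		return false
	}
	for name, v := range need.Scalars {
		if limitVal, ok := limit.Scalars[name]; ok && v > limitVal {
			return false
		}
	}
	return true
}

// canEnqueue computes minReq + allocated + inqueue - elastic and checks it
// against the queue capability.
func canEnqueue(minReq, allocated, inqueue, elastic, capability Resource) bool {
	need := combine(combine(combine(minReq, allocated, 1), inqueue, 1), elastic, -1)
	return fits(need, capability)
}

func main() {
	none := Resource{}
	minReq := Resource{MilliCPU: 1000, Memory: 1 << 30,
		Scalars: map[string]float64{"nvidia.com/gpu": 1, "pods": 1}}

	// Capability that ends up with pods = 0: the job can never fit.
	zeroPods := Resource{MilliCPU: 8000, Memory: 16 << 30,
		Scalars: map[string]float64{"nvidia.com/gpu": 8, "pods": 0}}
	// Capability with no pods dimension at all: pods are not limited.
	noPods := Resource{MilliCPU: 8000, Memory: 16 << 30,
		Scalars: map[string]float64{"nvidia.com/gpu": 8}}

	fmt.Println(canEnqueue(minReq, none, none, none, zeroPods)) // false
	fmt.Println(canEnqueue(minReq, none, none, none, noPods))   // true
}
```

In this sketch a single dimension with a limit of 0 is enough to keep the job out of the Inqueue state even when GPU, CPU, and memory are plentiful, which is consistent with the comment above about the queue's pod capacity being 0.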
How do I do this? |
This is the normal way to write a job
|
Can v1.8.2 solve your problem? |
/close |
@Monokaix: Closing this issue in response to the /close command above. |
What happened:
In the latest version, the job cannot run because the gpu quota is set in the queue
What you expected to happen:
gpu quotas can be set for queues
How to reproduce it (as minimally and precisely as possible):
Node resources are sufficient.
1. The queue configuration file is as follows:
a800.yaml
2. The job configuration file is as follows:
job1.yaml
3. The job stays pending after it is submitted:
4. In the earlier version 1.72, the GPU quota function works normally:
cat queue_a6k_ada.yaml
cat job2.yaml
Anything else we need to know?:
Environment:
kubectl version: 1.23.10
uname -a: 3.10.0-1160.el7.x86_64