victims in reclaim.go don't sort by ssn.TaskOrderFn #2807
Comments
/assign
I'm very sorry, things have been a little busy lately. I will fix this as soon as possible. @zhaizhch
Do you have time to fix it now? @wangyang0616
Sorry, we have had a lot going on recently and have not gotten around to fixing this yet. If any contributors are interested in this problem, we can fix it together.
Hello 👋 Looks like there was no activity on this issue for the last 90 days.
/remove lifecycle/stale
What happened:
Victims in reclaim.go are not sorted by ssn.TaskOrderFn.
What you expected to happen:
When reclaim happens, lower-priority tasks should be reclaimed first.
How to reproduce it (as minimally and precisely as possible):
1. Create two queues: queue1 with weight 5 and queue2 with weight 3. (There is only one node in the cluster, and it has 8 GPU cards.)
2. In queue2, submit 5 low-priority jobs and 3 medium-priority jobs, each requiring 1 GPU card, and wait until all of them are running.
3. In queue1, submit 1 high-priority job that requires 4 GPU cards.
Expected:
4 low-priority jobs in queue2 are killed and their resources are reclaimed. The high-priority job in queue1 then runs.
Actual:
4 jobs in queue2 are killed at random, regardless of priority.
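The figure of 4 reclaimed cards follows from the queue weights. Below is a minimal sketch of that arithmetic, assuming each queue's deserved share is simply proportional to its weight; the real proportion plugin is more involved. The `queue` type and both helper functions are hypothetical and are not Volcano APIs.

```go
package main

import "fmt"

// queue is a hypothetical stand-in for a scheduler queue; it is not a Volcano type.
type queue struct {
	name   string
	weight int
}

// deservedGPUs splits the total number of cards across queues in proportion
// to their weights. Assumption: plain integer division, which happens to be
// exact for the scenario in this issue (8 cards, weights 5 and 3).
func deservedGPUs(total int, queues []queue) map[string]int {
	sum := 0
	for _, q := range queues {
		sum += q.weight
	}
	out := make(map[string]int, len(queues))
	for _, q := range queues {
		out[q.name] = total * q.weight / sum
	}
	return out
}

// reclaimNeeded returns how many cards a requesting queue can take back from
// another queue: limited by the request and by how far the other queue is
// above its deserved share.
func reclaimNeeded(request, allocatedOther, deservedOther int) int {
	over := allocatedOther - deservedOther
	if over < 0 {
		over = 0
	}
	if request < over {
		return request
	}
	return over
}

func main() {
	d := deservedGPUs(8, []queue{{"queue1", 5}, {"queue2", 3}})
	// queue2 runs 8 single-card jobs (5 low + 3 medium); queue1 asks for 4 cards.
	n := reclaimNeeded(4, 8, d["queue2"])
	fmt.Printf("deserved: queue1=%d queue2=%d, cards to reclaim from queue2: %d\n",
		d["queue1"], d["queue2"], n)
}
```

Under these assumptions queue2 is 5 cards over its share and queue1's request fits within its own share, so exactly 4 single-card tasks in queue2 have to be evicted. The open question in this issue is which 4.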
Anything else we need to know?:
The config file looks like this:
volcano-scheduler.conf: |
    actions: "enqueue, reclaim, allocate, preempt, backfill"
    tiers:
    - plugins:
      - name: priority
      - name: gang
        enableJobStarving: false
        enableReclaimable: false
        enablePreemptable: false
      - name: conformance
      - name: sla
    - plugins:
      - name: drf
      - name: predicates
      - name: proportion
        enableJobEnqueued: false
      - name: nodeorder
      - name: binpack
        arguments:
          binpack.weight: 17
          binpack.cpu: 1
          binpack.memory: 1
          binpack.resources: nvidia.com/gpu, baidu.com/xpu, cambricon.com/mlu
          binpack.resources.nvidia.com/gpu: 5
          binpack.resources.baidu.com/xpu: 5
          binpack.resources.cambricon.com/mlu: 5
Environment:
Others:
Victims in reclaim.go are not sorted by ssn.TaskOrderFn, unlike in preempt.go.
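To illustrate the requested behavior, here is a minimal, self-contained sketch of ordering victims with a reversed task-order function so that the lowest-priority tasks are evicted first. It is not the Volcano implementation: the `task` type, `taskOrderFn`, `pickVictims`, and the numeric priorities are all hypothetical stand-ins, with `taskOrderFn` playing the role that `ssn.TaskOrderFn` plays in the scheduler.

```go
package main

import (
	"fmt"
	"sort"
)

// task is a hypothetical stand-in for a running task; it is not a Volcano type.
type task struct {
	name     string
	priority int
}

// taskOrderFn is a hypothetical analogue of ssn.TaskOrderFn: it reports
// whether l should be scheduled before r (higher priority first).
func taskOrderFn(l, r task) bool {
	return l.priority > r.priority
}

// pickVictims returns up to n tasks to evict, lowest priority first.
// The comparator swaps its arguments rather than negating the result, so
// tasks of equal priority do not compare as "less" in both directions.
func pickVictims(running []task, n int) []task {
	sorted := append([]task(nil), running...) // copy; leave the input untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return taskOrderFn(sorted[j], sorted[i])
	})
	if n > len(sorted) {
		n = len(sorted)
	}
	return sorted[:n]
}

func main() {
	// The queue2 workload from the reproduction steps: 5 low-priority and
	// 3 medium-priority tasks, deliberately interleaved.
	const low, medium = 1, 5 // assumed priority values
	running := []task{
		{"medium-0", medium}, {"low-0", low}, {"low-1", low}, {"medium-1", medium},
		{"low-2", low}, {"medium-2", medium}, {"low-3", low}, {"low-4", low},
	}
	for _, v := range pickVictims(running, 4) {
		fmt.Println(v.name, v.priority)
	}
}
```

With this ordering, the 4 victims chosen for the scenario above are all low-priority tasks, which matches the expected result described in the reproduction steps.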