Fix Issue 2262: add priority capability for reclaim action #3340
base: master
Conversation
Force-pushed from e468089 to 0590c26
/assign @Monokaix @william-wang |
If the jobs in other queues all have higher priority, the current queue cannot reclaim their resources, so reclaim will not happen. Is this reasonable? |
We just support this feature, and it is disabled by default. How to use it is up to the cluster admin. One solution is to limit, at the application layer, the resources used by higher-priority jobs so they do not exceed the queue's deserved amount. |
But this seems to place high demands on administrators and limits the job priority of the queue; we'd better add a design doc and provide a user guide. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@Monokaix docs is added. |
/close |
@lowang-bh: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/reopen |
@lowang-bh: Reopened this PR. In response to this:
/lgtm
/assign @william-wang |
New changes are detected. LGTM label has been removed. |
It conflicts with the existing allocation logic between queues: when we allocate tasks, we do not consider the priority of jobs across queues, only the priority at the queue level. |
I think there may be a bug in `Reclaimable` in `session_plugins.go`. For this ConfigMap tiers section:

```yaml
tiers:
- plugins:
  - name: priority
  - name: proportion
```

With reclaim enabled in the priority plugin, imagine that some victims are returned by priority; but if no victims are returned by the proportion plugin, the `Reclaimable` function will return no victims and reclaim will not work any more. Putting proportion in front of priority can fix this, but maybe it is a bug? @hwdef |
That is a problem of your config. You'd better put the resource-related plugins in the same tier, but in a tier different from gang/priority, e.g.:

```yaml
tiers:
- plugins:
  - name: priority
    enableReclaimable: false
  - name: gang
    enablePreemptable: false
  - name: conformance
- plugins:
  - name: overcommit
  - name: drf
    enablePreemptable: false
  - name: predicates
  - name: proportion
  - name: nodeorder
  - name: binpack
```
|
OK, makes sense. |
@lowang-bh, there is another problem, as follows:

```yaml
actions: "enqueue,allocate,backfill,preempt,reclaim"
tiers:
- plugins:
  - name: priority
  - name: gang
    enablePreemptable: false
    enableJobStarving: false
    enableReclaimable: false
    enabledQueueScoreOrder: false
  - name: conformance
- plugins:
  - name: predicates
  - name: proportion
```

With `enablePreemptable: false` on gang (making it true does not work either), no preempt/reclaim can happen. When a high-priority job comes and there are not enough resources to meet the gang constraint, I hope to get resources by reclaim; but since the gang constraint is not met, the job stays pending, and the reclaim action just skips pending jobs. So reclaim does not happen, and with no reclaim no resources are released to get the job running: like a deadlock. Deleting the enqueue action may work, but in my scenario enqueue is a must, and |
@zhoushuke There are two kinds of evictions:
If you have any problems with how to use Volcano, please file an issue describing them. |
…y-plugin note: set priority plugin conf: enableReclaimable default to false Signed-off-by: lowang-bh <lhui_wang@163.com>
I have some Volcano practice and use it in production to support about 30k–50k pods a day. |
So can you explain why you set `enablePreemptable: false` for gang? |
@lowang-bh: PR needs rebase. |
Fixes #2262
Add a `reclaimable` switch and UT for the priority plugin.

Note: `reclaimableFn` is usually used in the `reclaim` action to reclaim a queue's deserved resources when the cluster does not have enough resources to allocate newly arriving tasks in that queue. So please be careful about setting `enableReclaimable` to `true` in the `priority` plugin, in case a queue's resources owned by high-priority jobs cannot be released. `enableReclaimable` is disabled by default for compatibility.
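Given that the switch defaults to off, opting in would look roughly like the following scheduler ConfigMap fragment (a sketch assuming the plugin-option convention shown elsewhere in this thread; verify the key names against your Volcano version):

```yaml
actions: "enqueue, allocate, backfill, reclaim"
tiers:
- plugins:
  - name: priority
    enableReclaimable: true   # opt in; defaults to false per this PR
  - name: gang
  - name: conformance
- plugins:
  - name: drf
  - name: predicates
  - name: proportion
  - name: nodeorder
```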