Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preemption admission check controller. KEP update. #1178

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

trasc
Copy link
Contributor

@trasc trasc commented Oct 4, 2023

What type of PR is this?

/kind documentation

What this PR does / why we need it:

Preemption admission check controller. KEP update.

Which issue(s) this PR fixes:

Fixes #1677

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/documentation Categorizes issue or PR as related to documentation. labels Oct 4, 2023
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Oct 4, 2023
@netlify
Copy link

netlify bot commented Oct 4, 2023

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 9ddfc1b
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6572ef43336a6b00080893da

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Oct 4, 2023
@trasc
Copy link
Contributor Author

trasc commented Oct 4, 2023

/assign @alculquicondor

Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keps/993-two-phase-admission/README.md Outdated Show resolved Hide resolved
keps/993-two-phase-admission/README.md Outdated Show resolved Hide resolved
keps/993-two-phase-admission/README.md Outdated Show resolved Hide resolved
keps/993-two-phase-admission/README.md Outdated Show resolved Hide resolved
keps/993-two-phase-admission/README.md Outdated Show resolved Hide resolved
keps/993-two-phase-admission/README.md Outdated Show resolved Hide resolved

At every run the controller will get the list of workloads pending preemption.
Since for some of these workloads is not necessary to issue eviction at that given point (eg. Having a pending check that uses AfterCheckPassedOrOnDemand policy) their quota reservation will be ignored.
Fore every other preemption pending workloads, it will check if it can fit without evicting other workloads, case in which the preemption admission check condition will be set to `Ready`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Fore every other preemption pending workloads, it will check if it can fit without evicting other workloads, case in which the preemption admission check condition will be set to `Ready`.
For every other preemption pending workloads, it will check if it can fit without evicting other workloads, case in which the preemption admission check condition will be set to `Ready`.

Preemption can run before other checks have finished their transition to Ready. By the time every check has transitioned to Ready, we still need to check if we need more preemptions.
So, in a sense, there is no concept of "preemption ready", we need to continue checking until the workload is fully admitted. And once it fits, and the checks are all "ready", we can directly transition to admitted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quota reserved by a workload is considered in use by the scheduler, if another workload gets is reservation after the fist one gets to "preemption ready" , that second workload will need to trigger it's own set of evictions and so on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(fore -> for done)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The quota reserved by a workload is considered in use by the scheduler, if another workload gets is reservation after the fist one gets to "preemption ready" , that second workload will need to trigger it's own set of evictions and so on.

Yes, that I understand.

Let me state my thinking again. There are scenarios where preemption needs to run before other admission checks have completed, because other AdmissionChecks can request preemptions.
In that case, we will issue preemptions, but even when completed, we might still need preemptions later on, when all the AdmissionChecks actually complete.

That's why I say that there might not be a concept of "preemptions are ready".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preemption is finished when the workload can fit without the need of any additional evictions. Pass this point its quota is locked and can start execution, or wait more. The quote remains locked until this workload finishes or gets evicted.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes me think that the existing QuotaReserved condition will have different semantics. And the idea of QuotaReserved could actually be used to signal that preemption is "ready".

But maybe that's a bit of over-engineering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... we could a yet another Quota specific condition like QuotaLocked, to only be set when all the evictions are done, then change the Admission logic from QuoataReserved && AdmissionChecks==Ready to QuoataLocked && AdmissionChecks==Ready.

The preemption controller can then manage workloads that have QuoataReserved but not QuoataLocked.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would QuotaReserved be equivalent to "preemption ready" then?

Copy link
Contributor Author

@trasc trasc Dec 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other way around, will have less impact on the code, The scheduler creates the admission struct and sets the QuotaReserved, the preemption controller will set the new condition when no evictions are needed. The workload controller will set the Admitted condition when the new condition is true and all Admission checks are ready. (sure we can add a fast path in the scheduler so if at reservation time the wl fits without the need of evictions set both conditions)

keps/993-two-phase-admission/README.md Show resolved Hide resolved
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: trasc
Once this PR has been reviewed and has the lgtm label, please ask for approval from alculquicondor. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keps/993-two-phase-admission/README.md Outdated Show resolved Hide resolved
keps/993-two-phase-admission/README.md Show resolved Hide resolved
@trasc
Copy link
Contributor Author

trasc commented Oct 9, 2023

Any implications of this update to https://github.com/kubernetes-sigs/kueue/tree/main/keps/582-preempt-based-on-flavor-order ?

No, the changes done by that only have effect before quota reservation.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Oct 19, 2023
@alculquicondor
Copy link
Contributor

I think one important question is whether the preemption controller needs to be an admission check.

In some scenarios, the preemption check is running only once all the other admission checks have passed. This has the implication that now no other admission check can do the same, otherwise they will block each other.

So that's an argument against it being an admission check.

I'll give another pass to the proposal tomorrow and think about it more.

keps/993-two-phase-admission/README.md Outdated Show resolved Hide resolved
keps/993-two-phase-admission/README.md Show resolved Hide resolved
keps/993-two-phase-admission/README.md Outdated Show resolved Hide resolved
keps/993-two-phase-admission/README.md Outdated Show resolved Hide resolved
- Set its admission check to `Ready`
- Add it to the snapshot
- If it cannot fit
- Get an updated list of eviction candidates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify which ones are candidates? Is it only workloads that are fully Admitted? I don't think you want to consider ones that only have ResourceQuota.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The candidates are picked from the list of workloads that hold a quota reservation, just because maybe a admission check of a workload is transitioning slowly to ready dose not mean that it should be able to hold its quota against a higher priority workload.

keps/993-two-phase-admission/README.md Show resolved Hide resolved
keps/993-two-phase-admission/README.md Outdated Show resolved Hide resolved
keps/993-two-phase-admission/README.md Outdated Show resolved Hide resolved
keps/993-two-phase-admission/README.md Show resolved Hide resolved
- If the updated list is empty, meaning the preemption cannot be done.
- Set its admission check to `Retry`, the quota reservation will be lost and the workload placed in the queue waiting for a new QuotaReservation.

**NOTE** The list of candidates is picked out from the list of workloads holding a QuotaReservation, regardless if they are fully Admitted or not.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is QuotaReservation enough? Consider jobs waiting for ProvReq to be fulfilled could be waiting for hours. There is no point in preempting them yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ProvRequest Admission check can be configured with AfterCheckPassedOrOnDemand policy, the evictions will be done only when the provisioning is finished (and only if still needed).

In the future , if we can know that the provisioning request has no chance to succeed for long period of times, the ProvRecAC can set its state to Retry this will fully release the quota.After that time the state can be set to Pending to put wl back in the queue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is more about the candidates for eviction. Obviously, fully Admitted workloads are candidates for eviction.
Other than that, I think it should be only the ones that have checks ready or are requesting preemption themselves.

Copy link
Contributor Author

@trasc trasc Dec 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be a source of priority inversion. Say we have wl1 (p=1, reserving, one check pending), wl2 (p=2, fully admited), wl3(p=3, needs to evict wl1 or wl2), We should not evict wl2 just because wl1 has some pending admission check.

@k8s-ci-robot
Copy link
Contributor

@trasc: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kueue-test-multikueue-e2e-main-1-29 9ddfc1b link true /test pull-kueue-test-multikueue-e2e-main-1-29

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/documentation Categorizes issue or PR as related to documentation. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ability to defer preemption when using admission checks
4 participants