Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinguish preemption reasons #1942

Merged

Conversation

mimowo
Copy link
Contributor

@mimowo mimowo commented Apr 3, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #1874

Special notes for your reviewer:

Does this PR introduce a user-facing change?

A new condition with type Preempted allows to distinguish different reasons for the preemption to happen

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 3, 2024
Copy link

netlify bot commented Apr 3, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 3a4fc2d
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6616614b2f909d000869a295

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 3, 2024
@mimowo mimowo force-pushed the distinguish-preemption-reasons branch from 130989a to 03a9385 Compare April 4, 2024 08:04
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 4, 2024
@mimowo mimowo force-pushed the distinguish-preemption-reasons branch 2 times, most recently from 535e7cb to 4fcde5a Compare April 4, 2024 12:52
@mimowo mimowo force-pushed the distinguish-preemption-reasons branch 2 times, most recently from 4a38c6d to ca51d57 Compare April 4, 2024 17:41
@mimowo mimowo changed the title WIP: Distinguish preemption reasons Distinguish preemption reasons Apr 4, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 4, 2024
@mimowo
Copy link
Contributor Author

mimowo commented Apr 4, 2024

/assign @alculquicondor @astefanutti @tenzen-y

@alculquicondor
Copy link
Contributor

/release-note-edit

A new condition with type Preempted allows to distinguish different reasons for the preemption to happen

Maybe also include the names and how they might be expanded in the future?

pkg/scheduler/preemption/preemption.go Outdated Show resolved Hide resolved
pkg/scheduler/preemption/preemption.go Outdated Show resolved Hide resolved
apis/kueue/v1beta1/workload_types.go Outdated Show resolved Hide resolved
pkg/workload/workload.go Outdated Show resolved Hide resolved
pkg/workload/workload.go Outdated Show resolved Hide resolved
test/integration/scheduler/preemption_test.go Outdated Show resolved Hide resolved
pkg/scheduler/preemption/preemption.go Outdated Show resolved Hide resolved
test/integration/scheduler/preemption_test.go Outdated Show resolved Hide resolved
@mimowo mimowo force-pushed the distinguish-preemption-reasons branch from 6541c10 to 727ae15 Compare April 8, 2024 07:13
@mimowo mimowo force-pushed the distinguish-preemption-reasons branch from 727ae15 to e09a6f6 Compare April 8, 2024 07:20
apis/kueue/v1beta1/workload_types.go Outdated Show resolved Hide resolved
apis/kueue/v1beta1/workload_types.go Outdated Show resolved Hide resolved
Type: kueue.WorkloadPreempted,
Status: metav1.ConditionFalse,
Reason: "InClusterQueue",
Message: fmt.Sprintf("Preempted to accommodate a workload (UID: %s) in the ClusterQueue", alphaMidWl.UID),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uhm... this is rather confusing, because now the workload isn't preempted.

Should we add a prefix? Like: Previously:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we see only the message, it may bring any confusion. But doesn't avoid confusion since we have a Status: False field?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so I only flipped status for consistency with how it was done for the "Evicted" condition.

However, I think ideally, the condition should indicate the reason for the "last" transition.
The API comment, Also, an example in the Job controller when transitioning from True to False (see here.

In this case of transitoning Evicted / Preempted conditions from True->False, the reason for this transition is that it has now QuotaReserved. OTOH, it might be handy to still keep the information why the workload was preempted in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposed a change along the lines in the last commit. PTAL.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we see only the message, it may bring any confusion. But doesn't avoid confusion since we have a Status: False field?

I have already received tickets from customers that missed the status: false line. So better avoid the confusion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have already received tickets from customers that missed the status: false line. So better avoid the confusion.

I see. End-user's feedback is important. Thank you for sharing it with me.
/lgtm

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mimowo mimowo force-pushed the distinguish-preemption-reasons branch from 5eeaa40 to 6734c2a Compare April 10, 2024 09:33
Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@@ -397,8 +397,27 @@ func SetQuotaReservation(w *kueue.Workload, admission *kueue.Admission) {
//reset Evicted condition if present.
if evictedCond := apimeta.FindStatusCondition(w.Status.Conditions, kueue.WorkloadEvicted); evictedCond != nil {
evictedCond.Status = metav1.ConditionFalse
evictedCond.Reason = "QuotaReserved"
evictedCond.Message = "Previously: " + evictedCond.Message
evictedCond.LastTransitionTime = metav1.Now()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @vladikkuzn, given that you are working on #1939

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 10, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 1c5d2f5a818c1f699c4c742bb7b8d8325d3c55f9

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 10, 2024
@k8s-ci-robot k8s-ci-robot merged commit 4905cfd into kubernetes-sigs:main Apr 10, 2024
14 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.7 milestone Apr 10, 2024
vsoch pushed a commit to researchapps/kueue that referenced this pull request Apr 18, 2024
* Distinguish the preemption reasons

* Review comments

* Review comments2

* Make the conditions more accurate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Distinguish between preemption within ClusterQueue, reclaim within Cohort and preemption with borrowing
5 participants