Propagate the reason for Eviction into the pod TerminationTarget condition #2160

Merged
8 commits merged into kubernetes-sigs:main from the reason_propagate branch on May 13, 2024

Conversation

pajakd
Contributor

@pajakd pajakd commented May 8, 2024

What type of PR is this?

/kind feature

What this PR does / why we need it:

Whenever Kueue stops a pod, it adds a TerminationTarget condition to that pod. However, the reason recorded in the condition is always "StoppedByKueue". The goal of this PR is to set a specific reason depending on why the pod is stopped (for example, the workload was evicted or the workload was deleted).
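
To illustrate the intent, here is a minimal sketch that uses simplified local types instead of the real Kubernetes and Kueue APIs. The `Condition` and `Pod` structs and the `setTerminationTarget` helper are hypothetical names introduced for this example; only the condition type and the reason strings come from this PR.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for a pod condition.
// The real Kubernetes type carries more fields (timestamps, for example).
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// Pod is a hypothetical stand-in that only tracks a name and conditions.
type Pod struct {
	Name       string
	Conditions []Condition
}

const conditionTerminationTarget = "TerminationTarget"

// setTerminationTarget adds or overwrites the TerminationTarget condition,
// recording the caller-supplied reason instead of one fixed string.
func setTerminationTarget(p *Pod, reason, message string) {
	cond := Condition{
		Type:    conditionTerminationTarget,
		Status:  "True",
		Reason:  reason,
		Message: message,
	}
	for i := range p.Conditions {
		if p.Conditions[i].Type == conditionTerminationTarget {
			p.Conditions[i] = cond
			return
		}
	}
	p.Conditions = append(p.Conditions, cond)
}

func main() {
	pod := &Pod{Name: "sample-pod"}
	// Before this change every stop used the same generic reason.
	setTerminationTarget(pod, "StoppedByKueue", "generic reason")
	// With this change the reason reflects why the pod was stopped.
	setTerminationTarget(pod, "WorkloadEvicted", "the workload was evicted")
	fmt.Printf("%s: %s (%s)\n", pod.Name, pod.Conditions[0].Reason, pod.Conditions[0].Message)
}
```

The helper overwrites an existing condition rather than appending a second one, so a pod carries at most one TerminationTarget entry in this sketch.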

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

The reason for stopping a pod is now specified in the pod TerminationTarget condition

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 8, 2024
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 8, 2024
@k8s-ci-robot
Contributor

Hi @pajakd. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 8, 2024

netlify bot commented May 8, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit acc4914
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6641e14759424500086bd947

@trasc
Contributor

trasc commented May 8, 2024

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 8, 2024
StopReasonNoMatchingWorkload
StopReasonNotAdmitted
StopReasonWorkloadDeleted StopReason = "WorkloadDeleted"
StopReasonWorkloadEvicted StopReason = "WorkloadEvicted"
Contributor

Uhm... this one might not be enough. We also need to know if it was WaitForPodsReady or Preemption

Contributor Author

@pajakd pajakd May 9, 2024

OK, a second try. Please take a look. I don't know whether I should add all possible eviction reasons to this enum, or whether I can just cast evCond.Reason and propagate it directly.

Contributor

As discussed offline, let's see if we can use a hyphen to concatenate the reasons.

Member

@alculquicondor Could we decide the delimiter based on this discussion https://github.com/kubernetes/enhancements/pull/4479/files#r1581226166 since the reason should be CamelCase?

> In condition types, and everywhere else they appear in the API, Reason is intended to be a one-word, CamelCase representation of the category of cause of the current status

https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties

That discussion will seem to happen at the next sig apps meeting.
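
As a rough illustration of why a hyphen conflicts with the quoted convention, the sketch below checks candidate reasons against a simple CamelCase pattern. The regular expression and the `isCamelCaseReason` helper are assumptions made for this example; they approximate "one-word, CamelCase" and are not the validation Kubernetes itself applies.

```go
package main

import (
	"fmt"
	"regexp"
)

// camelCaseReason is a hypothetical approximation of a one-word CamelCase
// reason: an uppercase first letter followed by letters or digits only.
var camelCaseReason = regexp.MustCompile(`^[A-Z][A-Za-z0-9]*$`)

// isCamelCaseReason reports whether reason matches the pattern above.
func isCamelCaseReason(reason string) bool {
	return camelCaseReason.MatchString(reason)
}

func main() {
	candidates := []string{
		"WorkloadPreempted",
		"WorkloadEvicted-Preempted", // hyphen-joined variant
		"WorkloadEvictedBecausePreempted",
	}
	for _, c := range candidates {
		fmt.Printf("%-32s camelCase=%v\n", c, isCamelCaseReason(c))
	}
}
```

Under this pattern only the hyphen-joined candidate is rejected, which is the concern raised about the delimiter.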

Member

> That discussion will seem to happen at the next sig apps meeting.

Oh, that discussion is not on the agenda...

Contributor

I'm just trying to make the reason make sense.

WorkloadEvictedPreempted doesn't read that well.

So, the alternatives would be:

  1. WorkloadPreempted (drop the Evicted part)
  2. Preempted (just use the reason directly, and forget about the Workload bit)
  3. WorkloadEvictedBecausePreempted (same proposal as the hyphen, but with a word)

Maybe 1 reads best of all? WDYT?

Member

I agree with you, since we already have Evicted in the condition.type, right?

Contributor Author

@pajakd pajakd May 10, 2024

So the possible reasons for eviction are:

const (
	// WorkloadEvictedByPreemption indicates that the workload was evicted
	// in order to free resources for a workload with a higher priority.
	WorkloadEvictedByPreemption = "Preempted"
	// WorkloadEvictedByPodsReadyTimeout indicates that the eviction took
	// place due to a PodsReady timeout.
	WorkloadEvictedByPodsReadyTimeout = "PodsReadyTimeout"
	// WorkloadEvictedByAdmissionCheck indicates that the workload was evicted
	// because at least one admission check transitioned to False.
	WorkloadEvictedByAdmissionCheck = "AdmissionCheck"
	// WorkloadEvictedByClusterQueueStopped indicates that the workload was evicted
	// because the ClusterQueue is Stopped.
	WorkloadEvictedByClusterQueueStopped = "ClusterQueueStopped"
	// WorkloadEvictedByDeactivation indicates that the workload was evicted
	// because spec.active is set to false.
	WorkloadEvictedByDeactivation = "InactiveWorkload"
)

Option 1 definitely reads well for "Preempted", but for the other reasons it would be a bit confusing. In my opinion, concatenating with a hyphen would make the most sense. But if the guidelines advise against it, we can pick any of the alternatives. Would WorkloadEvictedReason{Preempted,PodsReadyTimeout,...} (a variant of 3) be too long?

Contributor

I like it, even if it's too long

Contributor Author

Ok, changed the joining word to "Reason".
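
A sketch of what joining with the word "Reason" could look like, based only on the names that appear in this thread. The `evictionStopReason` helper and its fallback for an empty eviction reason are assumptions for illustration; the merged code may build the string differently.

```go
package main

import "fmt"

// StopReason mirrors the string type shown in the diff excerpt above.
type StopReason string

const (
	StopReasonWorkloadDeleted StopReason = "WorkloadDeleted"
	StopReasonWorkloadEvicted StopReason = "WorkloadEvicted"
)

// evictionStopReason is a hypothetical helper that joins the base
// "WorkloadEvicted" reason and the eviction condition's reason with the
// word "Reason", keeping the result a single CamelCase token.
func evictionStopReason(evictionReason string) StopReason {
	if evictionReason == "" {
		// Assumption: fall back to the base reason when none is known.
		return StopReasonWorkloadEvicted
	}
	return StopReason(fmt.Sprintf("%sReason%s", StopReasonWorkloadEvicted, evictionReason))
}

func main() {
	// Eviction reasons listed earlier in this thread.
	reasons := []string{
		"Preempted",
		"PodsReadyTimeout",
		"AdmissionCheck",
		"ClusterQueueStopped",
		"InactiveWorkload",
	}
	for _, r := range reasons {
		fmt.Println(evictionStopReason(r))
	}
}
```

Because the eviction reason is appended as-is, a new eviction reason needs no new entry in the StopReason enum, which was the trade-off raised in the earlier question about casting evCond.Reason.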

@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 9, 2024
@pajakd pajakd marked this pull request as ready for review May 9, 2024 14:07
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 9, 2024
pajakd and others added 2 commits May 13, 2024 11:24
Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>
Contributor

@alculquicondor alculquicondor left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 13, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: d68c24335a60cf1b5fbb031ad04c1635bb7a6b32

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, pajakd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 13, 2024
@k8s-ci-robot k8s-ci-robot merged commit b6ec5fd into kubernetes-sigs:main May 13, 2024
15 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.7 milestone May 13, 2024
@pajakd pajakd deleted the reason_propagate branch May 14, 2024 06:38
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

6 participants