Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distinguish between preemption within ClusterQueue, reclaim within Cohort and preemption with borrowing #1874

Closed
3 tasks
alculquicondor opened this issue Mar 20, 2024 · 15 comments · Fixed by #1942
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@alculquicondor
Copy link
Contributor

alculquicondor commented Mar 20, 2024

What would you like to be added:

Preemption in Kueue can happen due to the following reasons:

  • Within a Cluster Queue, due to an incoming Pod with higher priority
  • Within a Cohort, to reclaim quota
  • Within a Cohort, due to preemption with borrowing, because the preempted workload is below a threshold.

We need to enhance the Condition reason and messages to incorporate this information.

Why is this needed:

For users to better understand why Kueue preempted their Jobs.

Administrators might want to aggregate information for multiple preemptions, so ideally the Reason should be different, with a common prefix.
A potential problem is to break users that already depend on the Preempted reason to identify preemptions. Perhaps highlighting the breaking change should be enough in the release notes.

We can also incorporate the information in a metric label.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@alculquicondor alculquicondor added the kind/feature Categorizes issue or PR as related to a new feature. label Mar 20, 2024
@alculquicondor
Copy link
Contributor Author

cc @tenzen-y @astefanutti @kerthcet wdyt of the breaking change?

@astefanutti
Copy link
Member

+1 not maintaining backward compatibility is OK on our side.

I think that relates to #1741 for the metrics part.

@alculquicondor
Copy link
Contributor Author

Actually, the current state could be considered buggy, because the message in the Evicted condition always says: "Preempted to accommodate a higher priority Workload".

So at the very least we can start by distinguishing the message.

@tenzen-y
Copy link
Member

@alculquicondor, I agree with making UX better.
And I'm ok with these breaking changes. I my side, we expose such reasons via our gRPC API. So, I can just modify the internal Server logic.

But, I think we need to announce this change as breaking in RELEASE NOTE and release announcement in google groups.

@alculquicondor
Copy link
Contributor Author

Yes, we can put ACTION REQUIRED in the notes to highlight at the top.

@mimowo
Copy link
Contributor

mimowo commented Mar 27, 2024

+1. The current message is misleading because when reclaimWithinCohort: Any then a workload might get preempted by a lower priority workload actually.

I'm not sure using reason for this purpose is the right way. Currently we have 3 modes of preemption, we are likely to have more in the future, for example, if fair sharing is implemented. So we may end up adding many reasons, and determining the reasons might be tricky at some point as we need to propagate the information via a couple of functions.

I'm thinking about enriching the message, similarly as we build the events in the k8s scheduler. We could have a message like: `Preempted to accommodate the "UID-x" workload submitted to the "team-y" ClusterQueue."

@alculquicondor
Copy link
Contributor Author

The thing is that messages are not machine-readable. For direct users of Kueue, messages are acceptable. But if you use kueue as a low-level component, then you might need a reason to distinguish between the different kinds of preemption.

The reasons having a common prefix could be good enough to group them as one thing with different variants.

Btw, in fair sharing, the preemption reason would still be PreemptDueToReclaim (or something similar). It's just that the "reclamation thresholds" are dynamic.

@mimowo
Copy link
Contributor

mimowo commented Mar 27, 2024

The thing is that messages are not machine-readable.

I see, so maybe instead we have an extra condition type Preempted, with a dedicated reason. So we have 2 conditions:

- type: Evicted
  reason: Preempted
  message: preempted to accomodate workload UID
- type: Preempted
  reason: ReclaimWithinCohort / PreemptionWithinClusterQueue / FairSharingWithinCohort ... 
  message: preempted to accomodate workload UID # possibly some more data here 

For direct users of Kueue, messages are acceptable. But if you use kueue as a low-level component, then you might need a reason to distinguish between the different kinds of preemption.

Yes, but then when a new reason is added by Kueue, then the external system also needs to be extended for monitoring, when a new version of Kueue introduces a new reason (if the system does not need fine grained information). I guess we could mitigate this by some convention, say all reasons start with "PreemptedDueTo".

@alculquicondor
Copy link
Contributor Author

I like the idea of a dedicated Preempted condition. It's fully backwards compatible. We just need to make sure that the documentation makes a distinction between the two conditions (or that one is a subset of the other).

@astefanutti
Copy link
Member

A separate Preempted condition looks like a good solution, bringing better structure maintaining backward compatibility.

@tenzen-y
Copy link
Member

+1 on having the Evicted and the Preempted condition type.
Also, we may need to clarify what the difference is between Preemption and Eviction in the documentation.

@mimowo
Copy link
Contributor

mimowo commented Apr 3, 2024

/assign

@mimowo
Copy link
Contributor

mimowo commented Apr 4, 2024

I'm wondering if we should distinguish reclaimWithinCohort and preemption while borrowing, at the reason level, as suggested in the issue description.

I think preemption while borrowing can happen it two scenarios:

  1. we are only preempting workloads within ClusterQueue here
  2. we are preempting workloads within cohort (enabled by preemption.BorrowWithinCohort), see here.

Then, there are variants of (2), because the threshold does not need to be specified. Also, preemption.BorrowWithinCohort can only be enabled when preemption.reclaimWithinCohort is enabled, so it is more of an additional configuration option, rather than a new "mode" of preempting.

My suggestion would be to just go with two reasons:
PreemptionInClusterQueue and ReclaimWithinCohort . Then, we put the preemptor UID into the message so that one can figure out more if has access to the workload (this is is what I do in the initial version of the PR: #1942).

Any views?

@alculquicondor
Copy link
Contributor Author

If we can make the distinction between the two, that would be better IMO.

End users might have access to the preemption condition, but they might not have access to the names or priorities or other workloads. But they still would want to know roughly why they were preempted.

A nit on PreemptionInClusterQueue: It shouldn't start with preemption, because that's already the name of the condition. But maybe we can say HigherPriorityWithinClusterQueue. And similarly UnderPriorityThresholdForPreemptionWithBorrowing (I'm open to ideas for a shorter version).

@mimowo
Copy link
Contributor

mimowo commented Apr 4, 2024

From the offline discussions we want to balance two aspects: satisfy the MVP which distinguishes "In ClusterQueue" and "In cohort", and being as granular as possible.

I think there will always be a possibility, that a new config option or threshold is introduced, making the reason to split into two or more detailed reasons.

In that case, I would suggest to start with something simple for now, two reasons: InClusterQueue and InCohort. Then, we say that reasons prefixed by them are meant to provide more detailed information. So, in the future we may have:
InClusterQueueByHigherPriorityWorkload or InCohortByBorrowingWorkload, etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants