KEP-4368: Job API managed-by label #4370
Conversation
mimowo
commented
Dec 20, 2023
- One-line PR description: Proposes Job API label for Multi-Kueue
- Issue link: Job API managed-by mechanism #4368
- Other comments:
Force-pushed from 8ed11fd to 93eb6dc.
Below are some examples to consider, in addition to the aforementioned [maturity levels][maturity-levels].

#### Alpha
Is there a reason or some background as to why this is going straight to beta?
The base-functionality implementation is very simple: we just short-circuit the `syncJob` invocation. Also, going through Alpha is typically a way to allow for version skew between kube-apiservers in an HA setup, ensuring that all apiserver instances recognize a new field. However, since this is a label, there is technically no such need.
Finally, we would like to let the feature soak without unnecessary delay.
I have added a note along these lines. PTAL.
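For illustration, a minimal sketch of what that short-circuit could look like; the helper name and the reserved `job-controller` value are assumptions for the sketch, not the exact implementation:

```go
package jobcontroller

import (
	batchv1 "k8s.io/api/batch/v1"
)

// Assumed reserved value for the built-in controller; sketch only.
const builtInJobControllerName = "job-controller"

// managedByExternalController reports whether the Job opted out of the
// built-in controller via the managed-by label.
func managedByExternalController(job *batchv1.Job) bool {
	manager, ok := job.Labels["batch.kubernetes.io/managed-by"]
	// A missing label, or the reserved value, means the built-in controller owns the Job.
	return ok && manager != builtInJobControllerName
}

// Inside syncJob, before any pod management or status updates:
//
//	if managedByExternalController(job) {
//		return nil // an external controller owns this Job; skip synchronization
//	}
```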
Force-pushed from ab35059 to c24755c.
In the [Beta](#beta) iteration of the feature we skip synchronization of the
Jobs with the `managed-by` label, if it has any different value than `job-controller`.
We also default the label for all newly created jobs, to avoid implicit defaults.
I'm not sure about requiring the label on all jobs. I was expecting the normal default controller to keep working as-is for jobs without the label, and to switch to excluding jobs with the label, without needing to decorate or modify any existing jobs. That would mean that only exceptional jobs that wanted to be managed by a non-default controller would opt into the label.
We also default the label for all newly created jobs
Does this mean 1) we'd set a label on a job created by a user, or 2) our controllers that create jobs (e.g. the cronjob controller) would populate a label? If 1, is there precedent for that? What component would default it, and how would it know what controller name the "default" job controller was running as?
I'm not sure about requiring the label on all jobs. I was expecting the normal default controller to keep working as-is for jobs without the label, and to switch to excluding jobs with the label, without needing to decorate or modify any existing jobs. That would mean that only exceptional jobs that wanted to be managed by a non-default controller would opt into the label.
Makes sense to me. This requirement complicates defaulting and would prolong graduation, so let's see how it can be avoided.
One question is whether the built-in controller should handle jobs with `managed-by: job-controller`. I assume yes, by analogy to the prior work on the `managed-by` label.
If we go with simple `if`s to skip synchronization, then it is simple. If we want to use label selectors for jobs, then we can probably do it with two informers: one for no label, and one for the `job-controller` value.
I will adjust the KEP assuming we don't require the label on all jobs.
I don't think we want to change the fact that we're using shared informers. If that is true, then a simple "skip if" is the only path forward.
...
// LabelManagedBy is used to indicate the controller or entity that manages
// a Job.
LabelManagedBy = "batch.kubernetes.io/managed-by"
I like the simplicity of the surface area here. Some benefits of the label approach over a field:
- it generalizes to custom resources as well
- it could consistently live in the same place in multiple objects (all objects have metadata.labels)
The main downside I see is that there's no signal in openapi whether the cluster is new enough to be aware of the controller subdivision. That's largely a problem time would cure.
I wonder if it would make sense to teach all the core workload controllers a generic version of this (`kubernetes.io/managed-by`) on their parent objects. That would enable use of multiple controllers on the same type, with the default (no managed-by label) meaning "use the default controller for this type".
Thank you!
The main downside I see is that there's no signal in openapi whether the cluster is new enough to be aware of the controller subdivision. That's largely a problem time would cure.
Indeed, I added a note on the related risk in the "Two controllers running at the same time on old version" section.
I wonder if it would make sense to teach all the core workload controllers a generic version of this (`kubernetes.io/managed-by`) on their parent objects. That would enable use of multiple controllers on the same type, with the default (no managed-by label) meaning "use the default controller for this type".
@atiratree this ^ would probably be a good goal for sig-apps long term to address.
It is an interesting idea. I suppose we could consider it if there were clear benefits and a clearly defined process and contract for the ownership.
The downsides mentioned in this KEP are still quite significant when it comes to supporting this in all controllers. I think CRDs are still preferable for extending workloads like Deployments, StatefulSets, etc.
I know we have conformance tests, but that doesn't stop people from forking the controllers and implementing incompatible behavior, and API validation has its limits. This would be problematic for the consumers of these workloads: either they would be unprepared and unable to react in a safe and bug-free manner, or they would need extra logic to switch behavior according to the current controller. They would also run into a similar problem if they encountered a controller they do not support.
We already see this, for example, with StatefulSets and their need for migration, and I expect the same need to arise for these controllers. E.g. I want to migrate from controller A to controller B for some shiny new functionality. So sooner or later we might make the managed-by label/field mutable. That would probably require some kind of locking so that two controllers never operate on the same object, even mid-change, and we would need to consider rogue controllers.
The benefit for Jobs is that the Job API is used as a building block in many projects, and we do not expect people to fork the controller. I'm not sure this applies elsewhere.
@mimowo can we add the following to the GA graduation criteria?
Assess the fragmentation of the ecosystem. Look for other implementations of a job controller and assess their conformance with k8s.
Makes sense, added the point.
Regarding the generic `kubernetes.io/managed-by` label: I would like to defer it in this KEP.
There is currently no clear use case for supporting the label across the stack for APIs like StatefulSets, Deployments, or DaemonSets.
A generic name without support across all k8s APIs might be confusing to users, and supporting it for all k8s APIs would be a much bigger effort than is currently needed for the MultiKueue use case.
The "managed-by" label idea has significant risks, such as "ecosystem fragmentation due to forks", so it makes sense to start with a limited scope as a "pilot" and assess the impact.
I put the proposal, along with the reasons to defer it, in the Alternatives section.
Regarding the generic `kubernetes.io/managed-by` label: I would like to defer it in this KEP.

The generic label is definitely outside the scope of this KEP.
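For a rough picture of the intended surface area, here is how a Job could opt out of the built-in controller by setting the label at creation time; this is only a sketch, and the `external-controller` value is a placeholder rather than a name defined by the KEP:

```go
package main

import (
	"context"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "externally-managed",
			Namespace: "default",
			Labels: map[string]string{
				// Placeholder manager name; the built-in controller would only
				// check that the value differs from its own reserved value.
				"batch.kubernetes.io/managed-by": "external-controller",
			},
		},
		Spec: batchv1.JobSpec{
			Template: corev1.PodTemplateSpec{
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{
						{Name: "main", Image: "busybox", Command: []string{"true"}},
					},
				},
			},
		},
	}
	if _, err := client.BatchV1().Jobs("default").Create(context.TODO(), job, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```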
In the [Beta](#beta) iteration of the feature we skip synchronization of the
Jobs with the `managed-by` label, if it has any different value than `job-controller`.
One question is what happens if the label is mutated on an existing job? From the point of view of a controller filtering on a label selector, that will result in a "delete" event if the label mutates to no longer match its selector.
Given that all our controllers are using SharedInformers anyway, I don't think that setting a selector is even possible. So the only thing we can do is handle events where the label is set differently, which means we will effectively be handling an update in those cases.
I wasn't sure how label selectors would work, so I deferred them as a potential thing to reconsider for the second Beta.
However, since we use SharedInformers, we would keep all Job objects in memory anyway, which greatly undermines the (more complicated) idea of label selectors. I will update the KEP to just stay with the skip `if` within the `syncJob` function.
I gave up on the idea of using label selectors and added them to the Alternatives section, along with the reasons why. The main point is that we cannot really use them for memory gains due to the shared informers in use, and they are also much more complex conceptually.
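For completeness, the "delete event" behavior mentioned above can be illustrated with client-go's filtering handler: when an update makes an object stop matching the filter, the inner handler sees it as a delete. This is only a sketch of the rejected label-selector alternative, not code proposed by the KEP:

```go
package example

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	"k8s.io/client-go/tools/cache"
)

// newFilteredHandler admits only Jobs owned by the built-in controller
// (no managed-by label, or the reserved value). With this wiring, mutating
// the label on an existing Job to point at another manager is delivered to
// the inner handler as OnDelete rather than OnUpdate.
func newFilteredHandler() cache.ResourceEventHandler {
	return cache.FilteringResourceEventHandler{
		FilterFunc: func(obj interface{}) bool {
			job, ok := obj.(*batchv1.Job)
			if !ok {
				return false
			}
			manager, has := job.Labels["batch.kubernetes.io/managed-by"]
			return !has || manager == "job-controller"
		},
		Handler: cache.ResourceEventHandlerFuncs{
			AddFunc:    func(obj interface{}) { fmt.Println("add") },
			UpdateFunc: func(oldObj, newObj interface{}) { fmt.Println("update") },
			DeleteFunc: func(obj interface{}) { fmt.Println("delete") },
		},
	}
}
```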
Force-pushed from e82d5c1 to e221bcd.
Force-pushed from 0f68654 to b0477b3.
Force-pushed from d2a6141 to 495866f.
Force-pushed from 24bf924 to fe3922b.
/label tide/merge-method-squash
Some minor nits, but overall
/lgtm
/approve
In particular, the CronJob controller relies on the assumption that the `CompletionTime`
is only set for successful Jobs; that a Job does not flip its state from
Complete to Failed (or the other way round); and that a finished Job does not
flip back to non-finished.
Strong 👍, if we're planning to allow external controllers to update the status, we need to ensure that a job won't start flapping between states.
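A rough sketch of the kind of status-transition check this implies; the function names and exact rules are illustrative assumptions, not the KEP's final validation:

```go
package validation

import (
	"fmt"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
)

// terminalCondition returns the terminal condition type (Complete or Failed)
// that is set to True in the status, or "" if the Job is not finished.
func terminalCondition(status *batchv1.JobStatus) batchv1.JobConditionType {
	for _, c := range status.Conditions {
		if (c.Type == batchv1.JobComplete || c.Type == batchv1.JobFailed) && c.Status == corev1.ConditionTrue {
			return c.Type
		}
	}
	return ""
}

// validateStatusTransition rejects status updates that would make a Job flap:
// un-finishing a finished Job, or flipping between Complete and Failed.
// Illustrative only; real apiserver validation would return a field.ErrorList.
func validateStatusTransition(oldStatus, newStatus *batchv1.JobStatus) error {
	oldTerminal := terminalCondition(oldStatus)
	newTerminal := terminalCondition(newStatus)
	if oldTerminal != "" && newTerminal == "" {
		return fmt.Errorf("finished Job cannot transition back to a non-finished state")
	}
	if oldTerminal != "" && newTerminal != "" && oldTerminal != newTerminal {
		return fmt.Errorf("Job cannot flip from %s to %s", oldTerminal, newTerminal)
	}
	return nil
}
```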
…DME.md Co-authored-by: Maciej Szulik <soltysh@gmail.com>
/assign @wojtek-t
/lgtm
Just two very small comments - other than that LGTM.
/approve PRR [will lgtm after fixing the nit]
done, also updated the checkpoints, PTAL (09321d0)
/lgtm @soltysh - for SIG-level approval [you only lgtm-ed it]
That's weird, I did that in #4370 (review) 🤷‍♂️ oh well, let me do that again. /approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mimowo, soltysh, wojtek-t

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.