Consider exposing node labels via downward api #40610
There could be security implications for some types of clusters (exposing information the admin doesn't want the pod to have, like the exact core architecture). That said, for most deployments it doesn't seem completely unreasonable. A per-node setting might decide whether to expose the labels or to return an empty map. |
This was previously explored in #25957 |
/sig node |
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale |
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten |
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close |
/remove-lifecycle rotten My use case is setting labels for node group detection in the autoscaler. If I could use the downward API, I wouldn't have to hardcode (or template) the manifests. |
/reopen |
@Kargakis: you can't re-open an issue/PR unless you authored it or you are assigned to it. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/remove-lifecycle stale |
Is this still being considered? I'm currently trying to find a sensible way to retrieve the zone a pod is scheduled in for rack awareness, as per the original use case. |
We're looking at doing this for a per-zone load balancer, and per-region load balancer on a single deployment. Our use case is that customers must be able to bind at layer 4 to multiple failure domains. This could be achieved with 3x deployments (one per zone) but it would be much nicer to only have to worry about a single deployment. |
I think this should be brought up in a sig-node meeting, otherwise I doubt this will get attention and ultimately get closed off by @fejta-bot |
Note: I'm not saying I'll implement this. If the SIG agrees that this is a doable approach, I'll leave it up to others to write a KEP and implementation. |
Even this simple mode needs some thought. If a user sets a topology label
(they shouldn't, but we don't stop them) should we override it or leave it?
Do we copy all such labels or just the well-known ones?
Do we copy labels for pods that manually set nodeName (no scheduling)?
Do we copy labels onto mirror pods?
Will this cause problems for systems which enumerate pod labels to derive
identity (e.g. Cilium)?
Do we need to update those if the labels change on the node? Topology
should not change, but we don't prevent it...
Just off the top of my head :)
|
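For concreteness, the topology labels under discussion are the well-known node labels documented by Kubernetes; on a Node object they look roughly like this (the node name and label values here are illustrative):

```yaml
# Well-known topology labels as they appear on a Node object.
# The node name and values are illustrative.
apiVersion: v1
kind: Node
metadata:
  name: node-1
  labels:
    kubernetes.io/hostname: node-1
    topology.kubernetes.io/region: us-east-1
    topology.kubernetes.io/zone: us-east-1a
```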
We didn't have enough time to discuss during the meeting, but the SIG scheduling leads are aware of it. My take is that we only copy well-known labels, including:
The rest of the questions can probably be discussed in the KEP pull request. |
Automagically modifying pod labels is a bit scary to me, although it has the added benefit of free topology-based pod querying... How about adding something explicit instead, e.g.:
That way, everything is very explicit and follows well-established patterns. |
@thockin re: #40610 (comment) I'm not sure what the issue is here; we'd just be exposing node labels via the downward API, right? So who cares what they're set to? Just make them available. We already have a downward API for things like node name. These are only exposable as env vars or mounted files, so this doesn't change pod labels. https://kubernetes.io/docs/concepts/workloads/pods/downward-api/

I believe in the past week this conversation has gotten off track from the original request. I understand that every feature is extra complexity and extra maintenance burden, but in this case the turn in discussion has made this much more complex than necessary. The only thing raised in the past few posts that we actually need to worry about is what happens when someone updates a node label after a pod that is using the label via the downward API has already fetched it. However, I'm pretty sure there are things in the downward API that can change without the pod having to be recreated, and that's why they're not allowed to be environment variables. So wouldn't this fall under that case?

Anyway, to everyone here who's suggested a KEP, I want to say this, with the caveat that I'm not someone very important; I just want to give my opinion on a phenomenon I've witnessed as a user and small-time contributor over the past 5 years. Nothing I'm saying should be taken personally or is even written with a particular group of people in mind, since I've seen this come from more contributors than I can keep track of. The KEP process is so bureaucratic and tedious that it's no wonder people avoid engaging with it, especially for proposals that are small in scope.

KEPs make sense for big changes: adding or removing entire features, breaking changes to existing features, or other wide-in-scope efforts. But asking people to write one for every little thing makes it seem, to a lot of people requesting smaller changes (like this one), that KEPs exist primarily to prevent improvements from being made without sufficiently flagellating the people who dared to be so brazen as to ask for that improvement. A lot of issues I follow end up leading down the "make a KEP (talk to the hand)" path. Of course that is not their stated purpose, but I am not alone in perceiving this as their de facto purpose, or at least an intended side effect.

I sympathize with everyone's concern about scope creep making maintenance a nightmare, of course! But in this particular case I believe we are overthinking our way into reasons not to accept a valid feature request. Since it's possible to work around the lack of this feature, I agree it'd be low priority with only a moderate value add, so it's reasonable to say it should be left to a community contributor to implement when they feel they have the time. But invoking the KEP process before they're allowed to try an implementation would be (IMO) bikeshedding, wasting not only their time but also the time of everyone reviewing the KEP.

But that's just my 2 cents as a user and an extremely minor contributor who doesn't attend SIG meetings and doesn't know anyone here personally. Perhaps my opinion is born of ignorance of the inner workings of the core Kubernetes contributor community. Hopefully you can take it as just one perspective out of many. |
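For reference, the existing, documented downward API already exposes a few node-level fields such as spec.nodeName; a minimal sketch of that mechanism (pod and variable names are illustrative):

```yaml
# Documented downward API usage: expose the name of the node the pod
# was scheduled to, via an env var. spec.nodeName is an existing,
# supported fieldRef path; pod/container names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: downward-api-example
spec:
  containers:
    - name: app
      image: registry.k8s.io/busybox
      command: ["sh", "-c", "echo Running on $MY_NODE_NAME && sleep 3600"]
      env:
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
```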
How confident are you that every cluster operator in the world is OK with this - that none of them have information that isn't meant for pods to see?
...which are all present as fields on the Pod object. Node labels are fundamentally different, and warrant some caution.
Almost correct - KEPs exist primarily to prevent improvements from being made without thinking about some of the harder and less obvious issues that are easy to miss. I know it feels like a lot of BS, but the project is HUGE. Even I don't know where all the traps are any more. Everything in the KEP questionnaire is there because someone forgot to think about it and caused a problem. KEPs are how we communicate the details to each other across time and space.
You are free to try anything you like. Go implement it, and come back with evidence that it was easy, that all the tests Just Work, and use that to inform your KEP. 100%. Go forth and hack.
Look, I hear you. I myself have said "oh no, another KEP to write". But the reality is that the potential blast-radius of doing something dumb is just SO high, and it's so easy to make a mistake. Writing a KEP is not really that hard, but it does make you think about some things you might like to avoid thinking about until later. Bolting tests and metrics and scale on at the end just doesn't work. I encourage you to try it. It's not so bad. |
I'm interested in this issue; let me try to work it out, starting with a KEP or a design doc. |
Or could spec.nodeTopologyZone and spec.nodeTopologyRegion be added, populated from the node's topology.kubernetes.io well-known labels? Then pods could consume them via the downward API, like spec.nodeName today? |
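If such fields existed, consumption could mirror today's spec.nodeName pattern. A hypothetical sketch; spec.nodeTopologyZone is the field proposed in this comment, not a current Kubernetes API:

```yaml
# Hypothetical: assumes a proposed spec.nodeTopologyZone field existed
# on the Pod, populated from the node's topology.kubernetes.io/zone
# label. This fieldPath does NOT exist in Kubernetes today.
env:
  - name: NODE_ZONE
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeTopologyZone  # proposed, not a real field
```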
@kerthcet we need this feature. Is the KEP in progress? I want to join this feature's development. |
It's on the way; when completed, you can review it. |
@kerthcet Sorry to interrupt. How is the progress now? |
Just a wild idea: how about making this a Node Resource Interface (NRI) plugin, instead(?) That is, not make it part of Kubernetes but a runtime plugin that can be deployed by those who need it. (Note: NRI is quite new and isn't enabled by default in cri-o or containerd, yet). Refs: |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied. After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied. After 30d of inactivity since lifecycle/rotten was applied, the issue is closed.
You can:
Mark this issue as fresh with /remove-lifecycle stale. Close this issue with /close. Offer to help out with Issue Triage.
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
/remove-lifecycle stale |
If we agree on this approach as a community, I am more than happy to take/assign this myself and work to get it done (with help from @marquiz ) |
Would that make the solution less portable? How easy is it to install an NRI plugin, especially in a cloud environment? |
Anything that bypasses access-control should be a non-starter. Doing it as NRI or doing it in Kubelet or doing it in apiserver are all identical in this aspect. As said elsewhere, I think that copying topology labels from node->pod is totally reasonable, but has some corner cases to sort out. Anything more than that needs to answer access control. As a workaround: https://github.com/thockin/kubectl-sidecar |
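Absent first-class support, a common access-control-respecting workaround is to grant the pod read access to its own Node object and fetch the labels at startup, in the spirit of the kubectl-sidecar link above. A sketch under stated assumptions: the resource names (node-labels-reader, node-reader, zone-aware-app) are illustrative, and the ClusterRoleBinding tying the ServiceAccount to the role is omitted:

```yaml
# Workaround sketch: RBAC that lets pods read Node objects, plus an
# init container that fetches one topology label via kubectl.
# All names are illustrative; the ClusterRoleBinding is omitted.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-labels-reader
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get"]
---
apiVersion: v1
kind: Pod
metadata:
  name: zone-aware-app
spec:
  serviceAccountName: node-reader  # assumed bound to node-labels-reader
  initContainers:
    - name: fetch-node-labels
      image: bitnami/kubectl
      command:
        - sh
        - -c
        # Dots inside a label key are escaped with '\.' in jsonpath.
        - kubectl get node "$NODE_NAME" -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}' > /etc/podinfo/zone
      env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName  # existing downward API field
      volumeMounts:
        - name: podinfo
          mountPath: /etc/podinfo
  containers:
    - name: app
      image: registry.k8s.io/busybox
      command: ["sh", "-c", "echo zone=$(cat /etc/podinfo/zone); sleep 3600"]
      volumeMounts:
        - name: podinfo
          mountPath: /etc/podinfo
  volumes:
    - name: podinfo
      emptyDir: {}
```

The key property is that access goes through the apiserver and RBAC, so a cluster admin who does not want pods reading node labels simply does not grant the role.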
What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): downward api node labels
Is this a BUG REPORT or FEATURE REQUEST? (choose one): Feature request
I think the downward API should support exposing the labels of the node on which the pod is running. I understand that the Pod should not be coupled too tightly to the node it is scheduled to, but there are certain scenarios that require the pod to have more information about its host.
This helps cloud-native applications understand where they are located. While this could be achieved using API calls to the Node object, I think it should be supported by the downward API: a Node object would expose far more information than should be available to any pod. In OpenShift's default configuration, access to Node objects is prevented by default.
@mattbates @smarterclayton @bprashanth