Consider exposing node labels via downward api #40610
There could be security implications for some types of clusters (exposing information the admin doesn't want the pod to have, like the exact core architecture). That said, for most deployments it doesn't seem completely unreasonable. A per-node setting might decide whether to expose the labels or to return an empty map. |
This was previously explored in #25957 |
/sig node |
Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle stale |
Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /lifecycle rotten |
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen. Mark the issue as fresh with /remove-lifecycle rotten. Send feedback to sig-testing, kubernetes/test-infra and/or fejta. /close |
/remove-lifecycle rotten My use case is setting labels for node group detection in the autoscaler. If I could use the downward API, I wouldn't have to hardcode (or template) the manifests. |
/reopen |
@Kargakis: you can't re-open an issue/PR unless you authored it or you are assigned to it. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/remove-lifecycle stale |
Is this still being considered? I'm currently trying to find a sensible way to retrieve the zone a pod is scheduled in for rack awareness, as per the original use case. |
We're looking at doing this for a per-zone load balancer, and per-region load balancer on a single deployment. Our use case is that customers must be able to bind at layer 4 to multiple failure domains. This could be achieved with 3x deployments (one per zone) but it would be much nicer to only have to worry about a single deployment. |
I think this should be brought up in a sig-node meeting, otherwise I doubt this will get attention and ultimately get closed off by @fejta-bot |
Note: I'm not saying I'll implement this. If the SIG agrees that this is a doable approach, I'll leave it up to others to write a KEP and implementation. |
Even this simple mode needs some thought. If a user sets a topology label
(they shouldn't, but we don't stop them) should we override it or leave it?
Do we copy all such labels or just the well-known ones?
Do we copy labels for pods that manually set nodeName (no scheduling)?
Do we copy labels onto mirror pods?
Will this cause problems for systems which enumerate pod labels to derive
identity (e.g. Cilium)?
Do we need to update those if the labels change on the node? Topology
should not change, but we don't prevent it...
Just off the top of my head :)
|
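For concreteness, the topology labels under discussion are the well-known node labels documented by Kubernetes; on a Node object they look roughly like this (the node name and label values here are illustrative):

```yaml
# Well-known topology labels as they appear on a Node object.
# The node name and values are illustrative.
apiVersion: v1
kind: Node
metadata:
  name: node-1
  labels:
    kubernetes.io/hostname: node-1
    topology.kubernetes.io/region: us-east-1
    topology.kubernetes.io/zone: us-east-1a
```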
We didn't have enough time to discuss during the meeting, but the SIG scheduling leads are aware of it. My take is that we only copy well-known labels, including:
The rest of the questions can probably be discussed in the KEP pull request. |
Automagically modifying pod labels is a bit scary to me, although it has the added benefit of free topology-based pod querying... How about adding something explicit instead, e.g.:
That way, everything is very explicit and follows well-established patterns. |
@thockin re: #40610 (comment) I'm not sure what the issue is here; we'd just be exposing node labels via the downward API, right? So who cares what they're set to? Just make them available. We already have a downward API for things like node name. These are only exposable as env vars or mounted files, so this doesn't change pod labels. https://kubernetes.io/docs/concepts/workloads/pods/downward-api/

I believe in the past week this conversation has gotten off track from the original request. I understand that every feature is extra complexity and extra maintenance burden, but in this case the turn in discussion has made this much more complex than necessary. The only thing raised in the past few posts that we actually need to worry about is what happens when someone updates a node label after a pod that is using the label via the downward API has already fetched it. However, I'm pretty sure there are things in the downward API that can change without the pod having to be recreated, and that's why they're not allowed to be environment variables. So wouldn't this fall under that case?

Anyway, to everyone here who's suggested a KEP, I want to say this, with the caveat that I'm not someone very important; I just want to give my opinion on a phenomenon I've witnessed as a user and small-time contributor over the past 5 years. Nothing I'm saying should be taken personally or is even written with a particular group of people in mind, since I've seen this come from more contributors than I can keep track of. The KEP process is so bureaucratic and tedious that it's no wonder people avoid engaging with it, especially for proposals that are small in scope.

KEPs make sense for big changes: adding or removing entire features, breaking changes to existing features, or other wide-in-scope efforts. But asking people to write one for every little thing makes it seem, to a lot of people requesting smaller changes (like this one), that KEPs exist primarily to prevent improvements from being made without sufficiently flagellating the people who dared to be so brazen as to ask for that improvement. A lot of issues I follow end up leading down the "make a KEP (talk to the hand)" path. Of course that is not their stated purpose, but I am not alone in perceiving this as their de facto purpose, or at least an intended side effect.

I sympathize with everyone's concern about scope creep making maintenance a nightmare, of course! But in this particular case I believe we are overthinking our way into reasons not to accept a valid feature request. Since it's possible to work around the lack of this feature, I agree it'd be low priority with only a moderate value add, so it's reasonable to say it should be left to a community contributor to implement when they feel they have the time. But invoking the KEP process before they're allowed to try an implementation would be (IMO) bikeshedding, wasting not only their time but also the time of everyone reviewing the KEP.

But that's just my 2 cents as a user and an extremely minor contributor who doesn't attend SIG meetings and doesn't know anyone here personally. Perhaps my opinion is born of ignorance of the inner workings of the core Kubernetes contributor community. Hopefully you can take it as just one perspective out of many. |
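For reference, the existing, documented downward API already exposes a few node-level fields such as spec.nodeName; a minimal sketch of that mechanism (pod and variable names are illustrative):

```yaml
# Documented downward API usage: expose the name of the node the pod
# was scheduled to, via an env var. spec.nodeName is an existing,
# supported fieldRef path; pod/container names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: downward-api-example
spec:
  containers:
    - name: app
      image: registry.k8s.io/busybox
      command: ["sh", "-c", "echo Running on $MY_NODE_NAME && sleep 3600"]
      env:
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
```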
How confident are you that every cluster operator in the world is OK with this - that none of them have information that isn't meant for pods to see?
...which are all present as fields on the Pod object. Node labels are fundamentally different, and warrant some caution.
Almost correct - KEPs exist primarily to prevent improvements from being made without thinking about some of the harder and less obvious issues that are easy to miss. I know it feels like a lot of BS, but the project is HUGE. Even I don't know where all the traps are any more. Everything in the KEP questionnaire is there because someone forgot to think about it and caused a problem. KEPs are how we communicate the details to each other across time and space.
You are free to try anything you like. Go implement it, and come back with evidence that it was easy, that all the tests Just Work, and use that to inform your KEP. 100%. Go forth and hack.
Look, I hear you. I myself have said "oh no, another KEP to write". But the reality is that the potential blast-radius of doing something dumb is just SO high, and it's so easy to make a mistake. Writing a KEP is not really that hard, but it does make you think about some things you might like to avoid thinking about until later. Bolting tests and metrics and scale on at the end just doesn't work. I encourage you to try it. It's not so bad. |
I'm interested in this issue; let me try to work it out, starting with a KEP or a design doc. |
Or could spec.nodeTopologyZone and spec.nodeTopologyRegion be added, populated from the node's topology.kubernetes.io well-known labels? Then pods could consume them via the downward API, like spec.nodeName today? |
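If such fields existed, consumption could mirror today's spec.nodeName pattern. A hypothetical sketch; spec.nodeTopologyZone is the field proposed in this comment, not a current Kubernetes API:

```yaml
# Hypothetical: assumes a proposed spec.nodeTopologyZone field existed
# on the Pod, populated from the node's topology.kubernetes.io/zone
# label. This fieldPath does NOT exist in Kubernetes today.
env:
  - name: NODE_ZONE
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeTopologyZone  # proposed, not a real field
```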
@kerthcet we need this feature. Is the KEP in progress? I want to join this feature's development. |
It's on the way; when completed, you can review it. |
@kerthcet Sorry to interrupt. How is the progress now? |
Just a wild idea: how about making this a Node Resource Interface (NRI) plugin, instead(?) That is, not make it part of Kubernetes but a runtime plugin that can be deployed by those who need it. (Note: NRI is quite new and isn't enabled by default in cri-o or containerd, yet). Refs: |
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
After 90d of inactivity, lifecycle/stale is applied. After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied. After 30d of inactivity since lifecycle/rotten was applied, the issue is closed.
You can:
Mark this issue as fresh with /remove-lifecycle stale. Close this issue with /close. Offer to help out with Issue Triage.
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
/remove-lifecycle stale |
/remove-lifecycle stale |
If we agree on this approach as a community, I am more than happy to take/assign this myself and work to get it done (with help from @marquiz ) |
Would that make the solution less portable? How easy is it to install an NRI plugin, especially in a cloud environment? |
Anything that bypasses access-control should be a non-starter. Doing it as NRI or doing it in Kubelet or doing it in apiserver are all identical in this aspect. As said elsewhere, I think that copying topology labels from node->pod is totally reasonable, but has some corner cases to sort out. Anything more than that needs to answer access control. As a workaround: https://github.com/thockin/kubectl-sidecar |
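Absent first-class support, a common access-control-respecting workaround is to grant the pod read access to its own Node object and fetch the labels at startup, in the spirit of the kubectl-sidecar link above. A sketch under stated assumptions: the resource names (node-labels-reader, node-reader, zone-aware-app) are illustrative, and the ClusterRoleBinding tying the ServiceAccount to the role is omitted:

```yaml
# Workaround sketch: RBAC that lets pods read Node objects, plus an
# init container that fetches one topology label via kubectl.
# All names are illustrative; the ClusterRoleBinding is omitted.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-labels-reader
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get"]
---
apiVersion: v1
kind: Pod
metadata:
  name: zone-aware-app
spec:
  serviceAccountName: node-reader  # assumed bound to node-labels-reader
  initContainers:
    - name: fetch-node-labels
      image: bitnami/kubectl
      command:
        - sh
        - -c
        # Dots inside a label key are escaped with '\.' in jsonpath.
        - kubectl get node "$NODE_NAME" -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}' > /etc/podinfo/zone
      env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName  # existing downward API field
      volumeMounts:
        - name: podinfo
          mountPath: /etc/podinfo
  containers:
    - name: app
      image: registry.k8s.io/busybox
      command: ["sh", "-c", "echo zone=$(cat /etc/podinfo/zone); sleep 3600"]
      volumeMounts:
        - name: podinfo
          mountPath: /etc/podinfo
  volumes:
    - name: podinfo
      emptyDir: {}
```

The key property is that access goes through the apiserver and RBAC, so a cluster admin who does not want pods reading node labels simply does not grant the role.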
What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): downward api node labels
Is this a BUG REPORT or FEATURE REQUEST? (choose one): Feature request
I think the downward API should support exposing the labels of the node on which the pod is running. I understand that the Pod should not be coupled too tightly to the node it is scheduled to, but there are certain scenarios that require the pod to have more information about its host.
This helps cloud-native applications understand where they are located. While this could be achieved using API calls to the Node object, I think it should be supported by the downward API: a Node object would expose far more information than should be available to any pod. In OpenShift's default configuration, access to Node objects is prevented by default.
@mattbates @smarterclayton @bprashanth