
Consider exposing node labels via downward api #40610

Open
simonswine opened this issue Jan 27, 2017 · 111 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. wg/batch Categorizes an issue or PR as relevant to WG Batch.

Comments

@simonswine
Contributor

What keywords did you search in Kubernetes issues before filing this one? (If you have found any duplicates, you should instead reply there.): downward api node labels


Is this a BUG REPORT or FEATURE REQUEST? (choose one): Feature request

I think the downward API should support exposing the labels of the node a pod is running on. I understand that a Pod should not be coupled too tightly to the node it is scheduled on, but there are certain scenarios that require the pod to have more information about its host:

  • failure-zones like rack names on premise or availability zones in cloud
  • instance-types for adapting thread settings per instance type
  • other labels like local ssd storage available

This helps cloud-native applications understand where they are located. While this could be achieved with API calls against the Node object, I think it should be supported by the downward API: a Node object exposes far more information than should be available to every pod. OpenShift, for example, prevents pods from accessing Node objects in its default configuration.
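For illustration, a hypothetical manifest for this request might look like the following (no such node-scoped fieldPath exists today; the syntax is purely a sketch):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: zone-aware-app
spec:
  containers:
    - name: app
      image: busybox
      env:
        # Hypothetical: exposing a label of the node the pod landed on
        - name: NODE_ZONE
          valueFrom:
            fieldRef:
              fieldPath: node.metadata.labels['failure-domain.beta.kubernetes.io/zone']
```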

@mattbates @smarterclayton @bprashanth

@smarterclayton
Contributor

There could be security implications for some types of clusters (exposing information the admin doesn't want the pod to have, like the exact core architecture). That said, for most deployments it doesn't seem completely unreasonable. A per-node setting could decide whether to expose the labels or return an empty map.

@tpetracca

This was previously explored in #25957

@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 31, 2017
@0xmichalis
Contributor

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jun 21, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 21, 2017
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 29, 2017
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 28, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@discordianfish
Contributor

/remove-lifecycle rotten
Can somebody please re-open this issue? I think this is a valid concern. I don't think that node labels and annotations are sensitive but if people disagree, maybe whitelisting some labels would be an option.

My use case is setting labels for node group detection in the autoscaler. If I could use the downward API, I wouldn't have to hardcode (or template) the manifests.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Apr 6, 2018
@0xmichalis
Contributor

/reopen

@k8s-ci-robot
Contributor

@Kargakis: you can't re-open an issue/PR unless you authored it or you are assigned to it.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 5, 2018
@discordianfish
Contributor

/remove-lifecycle stale
/lifecycle freeze

@dwilliams782

Is this still being considered? I'm currently trying to find a sensible way to retrieve the zone a pod is scheduled in for rack awareness, as per the original use case.

@laurieodgers

laurieodgers commented Oct 9, 2018

We're looking at doing this for a per-zone load balancer, and per-region load balancer on a single deployment. Our use case is that customers must be able to bind at layer 4 to multiple failure domains. This could be achieved with 3x deployments (one per zone) but it would be much nicer to only have to worry about a single deployment.

@discordianfish
Contributor

I think this should be brought up in a sig-node meeting; otherwise I doubt this will get attention, and it will ultimately be closed off by @fejta-bot.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label May 4, 2023
@alculquicondor
Member

Note: I'm not saying I'll implement this. If the SIG agrees that this is a doable approach, I'll leave it up to others to write a KEP and implementation.

@thockin
Member

thockin commented May 4, 2023 via email

@alculquicondor
Member

We didn't have enough time to discuss during the meeting, but the SIG scheduling leads are aware of it.

My take is that we only copy well-known labels, including:

  • topology.kubernetes.io/region
  • topology.kubernetes.io/zone

The rest of the questions can probably be discussed in the KEP pull request.
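Assuming those well-known labels were copied onto the pod's own labels (an assumption about the mechanism; no such copying exists today), they would become consumable through the downward API exactly as pod labels are now:

```yaml
env:
  - name: TOPOLOGY_ZONE
    valueFrom:
      fieldRef:
        fieldPath: metadata.labels['topology.kubernetes.io/zone']
```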

@abursavich
Contributor

abursavich commented May 4, 2023

Automagically modifying pod labels is a bit scary to me, although it has the added benefit of free topology-based pod querying... How about adding a NodeFieldRef *ObjectFieldSelector field to EnvVarSource and DownwardAPIVolumeFile that is validated against an explicit allow-list of fields (which could grow in the future)?

e.g.:

  • metadata.name
  • metadata.labels['topology.kubernetes.io/region']
  • metadata.labels['topology.kubernetes.io/zone']

That way, everything is very explicit and follows well-established patterns.
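A sketch of what that could look like in a manifest (`nodeFieldRef` is the proposed field and does not exist in the current API):

```yaml
env:
  - name: NODE_REGION
    valueFrom:
      nodeFieldRef:  # proposed; validated against an allow-list of node fields
        fieldPath: metadata.labels['topology.kubernetes.io/region']
```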

@2rs2ts
Contributor

2rs2ts commented May 4, 2023

@thockin re: #40610 (comment) I'm not sure what the issue is here; we'd just be exposing node labels via the downward API, right? So who cares what they're set to? Just make them available. We already have a downward API for things like the node name. These are only exposable as env vars or mounted files, so this doesn't change pod labels. https://kubernetes.io/docs/concepts/workloads/pods/downward-api/
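For reference, the node name really is available today through the existing downward API:

```yaml
env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
```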

I believe in the past week this conversation has gotten off track from the original request. I understand that every feature is an extra complexity and extra maintenance burden, but in this case I believe that the turn in discussion has made this much more complex than necessary.

I believe the only thing discussed in the past few posts that we actually need to worry about is what happens when someone updates a node label after a pod that consumes the label via the downward API has already fetched it. However, I'm pretty sure there are things in the downward API that can change without the pod being recreated, and that's why they're not allowed to be environment variables. So wouldn't this fall under that case?


Anyway to everyone here who's suggested a KEP, I want to say this, with the caveat that I'm not someone very important, I just want to give my opinion on a phenomenon I've witnessed as a user and small-time contributor over the past 5 years; nothing I'm saying should be taken personally or is even written with a particular group of people in mind since I've seen this come from so many contributors that I can't keep track of them.

The KEP process is so overly bureaucratic and tedious that it's no wonder that people avoid engaging with it, especially for proposals that are pretty small in scope. KEPs make sense for big changes like adding or removing entire features, or breaking changes to existing features, or other efforts that would be wide-in-scope; however, asking people to make them for every little thing makes it seem–to a lot of people requesting smaller changes (like this one)–that KEPs exist primarily to prevent improvements from being made without sufficiently flagellating the people who dared to be so brazen as to ask for that improvement. I have noticed that a lot of issues I follow end up leading down the "make a KEP (talk to the hand)" path. Of course that is not their stated purpose but I am not alone in perceiving this as their de facto purpose, or at least, an intended side effect.

I sympathize with everyone's concerns for not wanting scope creep to end up making maintenance a nightmare, of course! I believe in this particular case we are overthinking ourselves into reasons why not to accept something as a valid feature request. Since it's possible to work around the lack of this feature, I agree it'd be a low priority feature and have only a moderate value add, so it's reasonable to say it should be left to a community contributor to implement when they feel like they have the time, but invoking the KEP process before they're allowed to try an implementation would be (IMO) bikeshedding, wasting not only their time but also the time of everyone reviewing the KEP.

But, that's just my 2 cents as a user and an extremely minor contributor who doesn't attend SIG meetings and doesn't know anyone here personally. Perhaps my opinion is born out of ignorance of the inner workings of the core kubernetes contributor community. Hopefully you can take it as just one perspective out of many.

@thockin
Member

thockin commented May 4, 2023

we'd just be exposing node labels via the downward API right

How confident are you that every cluster operator in the world is OK with this - that none of them have information that isn't meant for pods to see?

We already have a downward API for things like node name

...which are all present as fields on the Pod object. Node labels are fundamentally different and warrant some caution.

KEPs exist primarily to prevent improvements from being made

Almost correct - KEPs exist primarily to prevent improvements from being made without thinking about some of the harder and less obvious issues that are easy to miss. I know it feels like a lot of BS, but the project is HUGE. Even I don't know where all the traps are any more. Everything in the KEP questionnaire is there because someone forgot to think about it and caused a problem. KEPs are how we communicate the details to each other across time and space.

invoking the KEP process before they're allowed to try an implementation

You are free to try anything you like. Go implement it, and come back with evidence that it was easy, that all the tests Just Work, and use that to inform your KEP. 100%. Go forth and hack.

Hopefully you can take it as just one perspective out of many.

Look, I hear you. I myself have said "oh no, another KEP to write". But the reality is that the potential blast-radius of doing something dumb is just SO high, and it's so easy to make a mistake. Writing a KEP is not really that hard, but it does make you think about some things you might like to avoid thinking about until later. Bolting tests and metrics and scale on at the end just doesn't work.

I encourage you to try it. It's not so bad.

@kerthcet
Member

kerthcet commented May 5, 2023

I'm interested in this issue; let me try to work it out, starting with a KEP or a design doc.
/assign

@jimmyjones2
Contributor

Or could spec.nodeTopologyZone and spec.nodeTopologyRegion be added, populated from the node's topology.kubernetes.io well-known labels? Then pods could consume them using the downward API, like spec.nodeName today.
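In other words (a sketch only; `spec.nodeTopologyZone` does not exist in the current API):

```yaml
env:
  - name: NODE_ZONE
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeTopologyZone  # proposed field, mirroring spec.nodeName
```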

@xvdy

xvdy commented May 31, 2023

@kerthcet we need this feature. Is the KEP in progress? I'd like to join this feature's development.

@kerthcet
Member

It's on the way; once completed, you can review it.

@xvdy

xvdy commented Jul 17, 2023

@kerthcet Sorry to interrupt. How is the progress now?

@marquiz
Contributor

marquiz commented Jul 25, 2023

Just a wild idea: how about making this a Node Resource Interface (NRI) plugin instead? That is, not making it part of Kubernetes but a runtime plugin that can be deployed by those who need it. (Note: NRI is quite new and isn't enabled by default in CRI-O or containerd yet.)

Refs:

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 25, 2024
@mindw

mindw commented Jan 25, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 25, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 24, 2024
@runningman84

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 24, 2024
@manshshriva

/remove-lifecycle stale

@ArangoGutierrez
Contributor

Just a wild idea: how about making this a Node Resource Interface (NRI) plugin, instead(?) That is, not make it part of Kubernetes but a runtime plugin that can be deployed by those who need it. (Note: NRI is quite new and isn't enabled by default in cri-o or containerd, yet).

Refs:

If we agree on this approach as a community, I am more than happy to take/assign this myself and work to get it done (with help from @marquiz )

cc @simonswine @kerthcet

@alculquicondor
Member

Would that make the solution less portable? How easy is it to install an NRI plugin, especially in a cloud environment?

@thockin
Member

thockin commented Apr 25, 2024

Anything that bypasses access-control should be a non-starter. Doing it as NRI or doing it in Kubelet or doing it in apiserver are all identical in this aspect.

As said elsewhere, I think that copying topology labels from node->pod is totally reasonable, but has some corner cases to sort out. Anything more than that needs to answer access control.

As a workaround: https://github.com/thockin/kubectl-sidecar
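Independent of that particular tool, the general shape of this workaround is an init container that queries the node object through the API server (the image and mount path below are assumptions, and the pod's service account needs RBAC permission to get nodes):

```yaml
initContainers:
  - name: fetch-node-labels
    image: bitnami/kubectl  # assumption: any image that ships kubectl
    command:
      - sh
      - -c
      # Write the scheduled node's labels to a shared volume for the main container
      - kubectl get node "$NODE_NAME" -o jsonpath='{.metadata.labels}' > /podinfo/node-labels
    env:
      - name: NODE_NAME
        valueFrom:
          fieldRef:
            fieldPath: spec.nodeName
    volumeMounts:
      - name: podinfo
        mountPath: /podinfo
```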
