Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exclude kube-system namespace from seed VPA webhook #9727

Closed

Conversation

andrerun
Copy link
Contributor

@andrerun andrerun commented May 9, 2024

How to categorize this PR?
/area auto-scaling
/kind improvement

What this PR does / why we need it:
Before this PR, Gardener uses the same mutating webhook for VPA on shoots and seeds. In result, seed VPA performs admission control to the seed's kube-system namespace. This has no benefits, as seed VPA has no targets in that namespace, and can only cause problems. Specifically, it fails a sanity check currently implemented by GKE and leads to GKE issuing warning messages.

This PR excludes the kube-system namespace from the seed VPA webhook.

Release note:

The `kube-system` namespace is now excluded by the seed VPA webhook.

Copy link
Contributor

gardener-prow bot commented May 9, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@gardener-prow gardener-prow bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/auto-scaling Auto-scaling (CA/HPA/VPA/HVPA, predominantly control plane, but also otherwise) related labels May 9, 2024
Copy link
Contributor

gardener-prow bot commented May 9, 2024

@andrerun: The label(s) kind/improvement cannot be applied, because the repository doesn't have them.

In response to this:

How to categorize this PR?
/area auto-scaling
/kind improvement

What this PR does / why we need it:
Before this PR, Gardener uses the same mutating webhook for VPA on shoots and seeds. In result, seed VPA performs admission control to the seed's kube-system namespace. This has no benefits, as seed VPA has no targets in that namespace, and can only cause problems. Specifically, it fails a sanity check currently implemented by GKE and leads to GKE issuing warning messages.

This PR excludes the kube-system namespace from the seed VPA webhook.

Release note:

The `kube-system` namespace is now excluded by the seed VPA webhook.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@gardener-prow gardener-prow bot requested review from ary1992 and shafeeqes May 9, 2024 05:23
@gardener-prow gardener-prow bot added cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 9, 2024
@andrerun andrerun marked this pull request as ready for review May 9, 2024 06:36
@gardener-prow gardener-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 9, 2024
@ialidzhikov
Copy link
Member

/assign

if v.values.ClusterType == component.ClusterTypeSeed {
namespaceSelector = &metav1.LabelSelector{
MatchExpressions: []metav1.LabelSelectorRequirement{
{Key: "name", Operator: metav1.LabelSelectorOpNotIn, Values: []string{"kube-system"}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key should be kubernetes.io/metadata.name, and there are constants that can be used here.

Suggested change
{Key: "name", Operator: metav1.LabelSelectorOpNotIn, Values: []string{"kube-system"}},
{Key: corev1.LabelMetadataName, Operator: metav1.LabelSelectorOpNotIn, Values: []string{metav1.NamespaceSystem}},

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for pointing me to the specific constant definitions!

@ialidzhikov
Copy link
Member

/kind enhancement

@gardener-prow gardener-prow bot added kind/enhancement Enhancement, improvement, extension and removed do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. labels May 9, 2024
@vpnachev
Copy link
Member

vpnachev commented May 9, 2024

I am just wondering is this the right approach? If a GKE cluster has VPA enabled then isn't it better to not install VPA via Gardener, i.e. set seed.spec.verticalPodAutoscaler.enabled=false?

The feature to install VPA on seeds was implemented for k8s clusters that are not equipped with VPA by the cluster vendor. With this change, such clusters will get the VPA from Gardener, however any component in the kube-system namespace will not be able to use it.

Copy link
Contributor

gardener-prow bot commented May 10, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ialidzhikov. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ialidzhikov
Copy link
Member

I am just wondering is this the right approach? If a GKE cluster has VPA enabled then isn't it better to not install VPA via Gardener, i.e. set seed.spec.verticalPodAutoscaler.enabled=false?

Is this related to this PR? When VPA is enabled for a Seed gardenlet installs the VPA CRDs and I think we can assume that there are no other VPAs in the cluster except the ones Gardener creates.

The feature to install VPA on seeds was implemented for k8s clusters that are not equipped with VPA by the cluster vendor. With this change, such clusters will get the VPA from Gardener, however any component in the kube-system namespace will not be able to use it.

I am not sure this is something we have to consider or support. As I shared gardenlet is the component that installs the VPA CRDs on Seed bootstrap. So, the case you describe could only happen if GKE or any other K8s cluster provider for some reason initially does not create a VPA in kube-system, then gardenlet installs the CRDs and afterwards it decides to create a VPA in the kube-system namespace.

cc @voelzmo

@vpnachev
Copy link
Member

I am just wondering is this the right approach? If a GKE cluster has VPA enabled then isn't it better to not install VPA via Gardener, i.e. set seed.spec.verticalPodAutoscaler.enabled=false?

Is this related to this PR? When VPA is enabled for a Seed gardenlet installs the VPA CRDs and I think we can assume that there are no other VPAs in the cluster except the ones Gardener creates.

The questions should be "Does Gardener have to install VPA controllers, CRDs, admission webhooks, etc. in the seeds that have VPA installed and maintained by the vendor?" My understanding is "no" as two different instances of controllers are acting on the same set of resources which leads to conflicts. So, for GKE clusters the landscape administrator must instruct Gardener to not install the VPA controllers, CRDs, webhooks, etc. in favor of the ones managed by Google. Alternatively, it should be explored if GKE can be configured to not enable the VPA for the cluster and then Gardener can install its own version of the controller.

In both cases, I think the change in this PR is not needed.

@ialidzhikov
Copy link
Member

ialidzhikov commented May 13, 2024

We had a short sync with @vpnachev and we found killer arguments against this PR:

  • I wrongly assumed that we don't deploy VPAs in the kube-system Namespace of a Seed. We found that there is an internal component (image signatures' verification) that deploys a VPA in the kube-system Namespace of Seed. For such a component, this PR would cause troubles as vpa-recommender/vpa-updated will evict the Pod to have the recommendations applied but the webhook will never apply the recommendations.
  • According to @vpnachev the webhook is not really problematic:
- admissionReviewVersions:
  - v1
  clientConfig:
    caBundle: <omitted>
    service:
      name: vpa-webhook
      namespace: garden
      port: 443
  failurePolicy: Ignore
  matchPolicy: Exact
  name: vpa.k8s.io
  namespaceSelector: {}
  objectSelector: {}
  reinvocationPolicy: IfNeeded
  rules:
  - apiGroups:
    - ""
    apiVersions:
    - v1
    operations:
    - CREATE
    resources:
    - pods
    scope: '*'
  - apiGroups:
    - autoscaling.k8s.io
    apiVersions:
    - '*'
    operations:
    - CREATE
    - UPDATE
    resources:
    - verticalpodautoscalers
    scope: '*'
  sideEffects: None
  timeoutSeconds: 10

Due to failurePolicy: Ignore and the low timeout (10s) the Seed cluster's API server should not fail the request even if the webhook is unavailable.
GKE docs for this warning - they don't check the failurePolicy of the webhook.

/hold

Any other opinions? Otherwise I would suggest to close this PR. If the warning in GKE is problem, we should rather open a ticket to them.

@gardener-prow gardener-prow bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 13, 2024
@vpnachev
Copy link
Member

there is an internal component (image signatures' verification)

Actually, it is public component:

https://github.com/gardener/gardener-extension-shoot-lakom-service/blob/7bcc4d901171721712e2cb112c1d5a5d77e642ba/pkg/controller/seed/reconciler.go#L426-L429

@voelzmo
Copy link
Member

voelzmo commented May 15, 2024

If the warning in GKE is problem, we should rather open a ticket to them.

This was also discussed in the upstream VPA issue, with the same line of arguments: GKE shouldn't care about a webhook in kube-system, as long as failurePolicy: ignore is set.

And if we do put VPAs into kube-system, there is no way that we can make this change, as Ismail pointed out. So I guess let's close it for now and we can try to talk to our GKE contacts about this.

@ialidzhikov
Copy link
Member

/close

@gardener-prow gardener-prow bot closed this May 15, 2024
Copy link
Contributor

gardener-prow bot commented May 15, 2024

@ialidzhikov: Closed this PR.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/auto-scaling Auto-scaling (CA/HPA/VPA/HVPA, predominantly control plane, but also otherwise) related cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/enhancement Enhancement, improvement, extension size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants