-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(helm chart): Add metrics-server hardening options #1288
base: master
Are you sure you want to change the base?
feat(helm chart): Add metrics-server hardening options #1288
Conversation
|
Welcome @mkilchhofer! |
Hi @mkilchhofer. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
1a6b939
to
f0b5051
Compare
4ca13e0
to
167cf2d
Compare
167cf2d
to
7baff6e
Compare
/assign @stevehipwell |
7baff6e
to
25e9877
Compare
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mkilchhofer The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @mkilchhofer, I've got a couple of high level review point which it would be good if you could address before I review this in more details
- There is no need to run functions for the APIService if it's not enabled
- The
helm
mode would be a better default (see below) - You don't need to recreate the cert on each Helm install, you can look up the existing secret
- I don't think the chart should support creating a secret, the secret option should only be existing
- Please remove the
extraObjects
value, this isn't a "good" pattern
How do you mean that? The
Sure, will do
I think we should not rely on the lookup function. There are GitOps solutions in the wild which only use helm for rendering (helm template subcommand). The lookup function of helm returns nothing during template as it cannot talk to the kubeapi by design. Only upgrade or install talks to the apiserver.
Hmm why not? It is pretty common that helm charts support this.
How can I test the existing secret method then? I think it is an approved feature request, so contributors can file PRs for this, ref:
|
You moved the test for if the APIService was enabled below the new code.
In that case they can use one of the other patterns, GitOps shouldn't define architecture it should support best practice.
Because it's unnecessary and leaks data into the chart values.
We could add a test secret as part or the CI initialisation or it could be generated through the chart as a testing resource. |
Well, why is that an issue? This is how helm is designed for. These values are stored as a K8s secret. |
It's the unnecessary part which is the primary concern here; but the Helm implementation and increased surface area for secret information shouldn't be discounted as unimportant. But as I've said in my other replies you can easily extend this chart by creating a wrapper chart which depends on it and at which point you can make your own architectural decisions. |
cddec52
to
81f71cd
Compare
81f71cd
to
ee37851
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes @mkilchhofer, I've added a couple of structural change requests.
{{- $altNames = append $altNames (printf "%s.%s" $commonName $ns) }} | ||
{{- $altNames = append $altNames (printf "%s.%s.svc" $commonName $ns) }} | ||
{{- $altNames = append $altNames (printf "%s.%s.svc.%s" $commonName $ns .Values.tls.clusterDomain) }} | ||
{{- $certs = genSelfSignedCert $commonName nil $altNames 36500 }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please look up the existing secret instead of generating a new certificate each time the chart is deployed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can easily recreate the Secrets every helm upgrade. It's even more secure to change it on a regular basis IMHO.
Using the lookup
function isn't safe against helm tooling as already mentioned somewhere in this PR.
People who are using
- Terraform template datasource
- Argo CD
- Kustomize with helm support
- probably Helmfile (cannot verify my statement here as I don't use this)
- probably Tanka (cannot verify my statement here as I don't use this)
uses the helm chart only for templating. And template cannot use lookup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second point is moot as this is a Helm chart and optimised for idiomatic Helm use, alternative usage patterns can use this if it works but if it doesn't there are alternatives. Also this should behave exactly the same as if lookup wasn't used if called by helm template
.
For the first point have you tested the behaviour of changing the certificate during standard usage of MS for HPAs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not test the upgrade procedure. But it should be fine:
- We update both the Secret and the APIserver resource
- Kubernetes spin up a new Pod and performs a rolling update
- The existing Pod should have an updated Secret mounted as we don't use subPath mounting options. MS uses generic
k8s.io/apiserver/pkg/server
to implement the API, this one uses github.com/fsnotify/fsnotify to watch if the certificate file changes: - ...
Do you need some test results here? :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably need to test that MS is reloading the cert when it changes and then what the behaviour during the cert change is. If there is no interruption then I think we could not lookup the cert but add a comment in the code to say it could be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I now learned that Secrets/ConfigMaps are not immediately updated via kubelet. When there are no events happening on the Pod, it can take up to 1 minute.
Screen.Recording.2023-07-20.at.17.17.25_compressed.mov
I found the technical background inside a Kyverno policy 😄 :
Although ConfigMaps and Secrets mounted as volumes to a Pod, when the contents change, will eventually propagate to the Pods mounting them, this process may take between 60-90 seconds. In order to reduce that time, a modification made to downstream Pods will cause the changes to take effect almost instantly. This policy watches for changes to ConfigMaps which have been marked for this quick reloading process which contain the label `kyverno.io/watch=true` and will write an annotation to any Pods which mount them as volumes causing a fast refresh in their contents. See the related policy entitled "Refresh Environment Variables in Pods" for a similar reloading process when ConfigMaps and Secrets are consumed as environment variables instead. Use of this policy may require providing the Kyverno ServiceAccount with permission to update Pods.
Source: https://kyverno.io/policies/other/refresh-volumes-in-pods/refresh-volumes-in-pods/
Will investigate what I can do here..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has been my experience with optimising Fluent Bit for hot-reload.
If the Helm pattern used lookup
it could be documented as interruption safe which run directly from Helm, if interruptions are an issue and you're using an alternative apply method then use Cert Manager (or an external secret).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option would be to use an init container or a helm job hook and something like ingress-nginx from the kubernetes org:
https://github.com/kubernetes/ingress-nginx/tree/main/images/kube-webhook-certgen
Seems to be a fork of https://github.com/jet/kube-webhook-certgen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like overengineering to me. For the Helm implementation we should use lookup
and document why you might not want to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. Lookup is already implemented 👍
b0326dc
to
0d99387
Compare
2e903c2
to
93f1c41
Compare
/triage accepted |
@mkilchhofer could you please rebase and then I'll review again? |
93f1c41
to
0f4d76f
Compare
@stevehipwell rebase done |
# Use helm lookup function to reuse Secret created in previous helm install | ||
lookup: true | ||
# Cert validity duration in days | ||
certDurationDays: 90 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this expiry should probably be longer by default and we should provide a cert replacement mechanism. This could be automated based on an expiry field in the secret or based on a generation chart input to rotate the certificate. Either way the Helm TLS pattern should probably be called out as not suitable for production without additional engineering overhead.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@mkilchhofer could you rebase this PR? |
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
Will try to rebase the PR as soon as I have time |
/remove-lifecycle stale |
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
What this PR does / why we need it:
It adds options on howto secure the connection between metrics-server and the kube-apiserver.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):related to:
Resolves #1230
Additional info:
The accepted FR #1230 is implemented also as I need this to test the hardening feature.
/cc: @serathius