Configure Cache Prometheus to scrape kubelet and cadvisor metrics directly #9716
Conversation
Skipping CI for Draft Pull Request.
Draft, because in the local setup with KinD, the kubelet's certificate is not accepted by Prometheus.
and indeed:
whereas in the dev landscape:
I'm not sure how to configure the KinD setup https://github.com/gardener/gardener/tree/master/example/gardener-local/kind/cluster/templates in such a way that it generates a certificate for the Kubelet with an IP SAN. @rfranzke, @ScheererJ: maybe this rings a bell for you.
The kube-prometheus stack also skips the verification of the kubelet's certificate; maybe it is not straightforward to obtain the issuer of the kubelet's certificate and configure Prometheus to use it for verification.
I wanted to check whether there is an obvious approach to properly verify the kubelet's certificate, but there does not seem to be one.
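For reference, skipping the verification in a Prometheus scrape job looks roughly like this (a minimal sketch; the job name, service discovery role, and file paths are illustrative assumptions, not taken from this PR):

```yaml
# Minimal sketch of a kubelet scrape job that skips certificate verification,
# similar to what the kube-prometheus stack does.
scrape_configs:
  - job_name: kubelet                # hypothetical job name
    scheme: https
    kubernetes_sd_configs:
      - role: node
    authorization:
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      insecure_skip_verify: true     # accept the kubelet's self-signed serving certificate
```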
Force-pushed from 6e1dfe3 to 4fa086a
Current state:
To address this, we could either use the default CA for the Kubelet's certificates, or make the Kubelet CA available in the seed cluster; currently it is only available in the control plane of the seed cluster. Idea: fyi @rfranzke
Force-pushed from 4fa086a to 1f0b884
Once available, I think Cluster Trust Bundles (KEP-3257) could be a good approach to make the Kubelet CAs available for the cluster's workload, like Prometheus. For the time being, I think we could skip the verification of the Kubelet's certificate in managed seeds. For simplicity, I think skipping the verification with a single line for all scenarios is preferable, even though verification would work in the KinD-based local setup and in non-managed seeds. The commit "Enable server certificate bootstrap for the Kubelet in the KinD cluster" could still be kept, so that the KinD cluster is more similar to, e.g., a GKE cluster, where the kubelet's certificate is also signed by the default CA. If it is worth the effort, in a separate PR, we could try to make the Kubelet CA trust anchor available in the kube-system and garden namespaces as a result of a collaboration between the soil's gardenlet and the seed's gardenlet.
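For illustration, a Cluster Trust Bundle might look roughly like this once KEP-3257 is usable (a rough sketch of the alpha API; the group/version and field names may differ between Kubernetes versions, and the name and signer below are hypothetical):

```yaml
apiVersion: certificates.k8s.io/v1alpha1
kind: ClusterTrustBundle
metadata:
  name: kubelet-serving-ca                   # hypothetical name
spec:
  signerName: example.com/kubelet-serving    # hypothetical signer
  trustBundle: |
    -----BEGIN CERTIFICATE-----
    ...PEM-encoded CA certificate(s)...
    -----END CERTIFICATE-----
```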
Force-pushed from 1f0b884 to b9f0cfe
Anyway, in the meantime, all the tests passed, so the PR is ready to be reviewed.
EDIT: one of the tests fails. One caveat was that the domain mapping referenced in gardener/hack/test-e2e-local.sh (line 79 in c024f37) was not present on my local system; these entries are added only in the CI environment, so the tests failed here: gardener/test/utils/rotation/observability.go (lines 47 to 52 in b9f0cfe). They passed after I added the required domain mapping.
Note that these tests are kind of slow, even in a non-resource-constrained local environment, which might explain why they hit a timeout from time to time in the more resource-constrained Prow environment. Maybe a quick win could be to make sure that pods start and terminate in a timely fashion (shorter grace periods) in the local setup; a rough sketch follows at the end of this comment.
After preparing the local environment, executing a single test, gardener/test/e2e/gardener/shoot/create_rotate_delete.go (lines 172 to 173 in c024f37), took 1690s (~28min). While inspecting what is happening, I noticed that probably a significant portion of this time is spent in pods starting up (getting into the ready state) or terminating. I suspect that the underlying processes are faster, and maybe the probe configuration or termination period configuration could be tweaked to speed up these tests and reduce false positives. Maybe there is also a real bug / race condition that I could not reproduce. At least in one execution, for 10+ minutes the shoot was blocked with the unexpected condition:
On another occasion, the tests in Prow failed with:
and I'm not sure why the deployment could not be created in 7 minutes. When reviewing these insights together with @vicwicker, we concluded that the integration tests in Prow failed because the worker node did not become ready; the observability-component-related symptoms are just consequences. The node itself did not get ready because of some calico / container issues:
At least this is not related to this PR.
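Regarding the pod start/termination times mentioned above, a sketch of the kind of tuning that might help in the local setup (all values are illustrative assumptions, not taken from the Gardener charts):

```yaml
# Illustrative values only: a shorter termination grace period and a faster
# readiness probe could reduce the time spent waiting for pods to start and
# stop during the local e2e tests.
spec:
  terminationGracePeriodSeconds: 5   # default is 30
  containers:
    - name: example                  # hypothetical container
      image: example:latest
      readinessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 1
        periodSeconds: 2
        failureThreshold: 2
```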
/lgtm
LGTM label has been added. Git tree hash: dda0608142344ba2a2eded2708d743ddc56a0a60
According to https://kind.sigs.k8s.io/docs/user/configuration/#kubeadm-config-patches, it could be possible to provide a kubeadm config patch. @istvanballok WDYT?
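Based on the linked KinD documentation, such a patch might look roughly like this (a sketch of the idea, not the exact file from example/gardener-local/kind/cluster/templates); the resulting kubelet serving CSRs would still need to be approved after the cluster is created:

```yaml
# Patch the generated KubeletConfiguration so the kubelet requests its serving
# certificate from the certificates.k8s.io API instead of self-signing one
# without an IP SAN.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
  - |
    kind: KubeletConfiguration
    apiVersion: kubelet.config.k8s.io/v1beta1
    serverTLSBootstrap: true
```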
Thanks a lot for this nice improvement.
Now that I have looked through the changes, I see that you found an even better solution for the local setup than manually generating the certificates, as I had proposed.
I have only two minor questions. Apart from that, it looks very good.
pkg/component/observability/monitoring/prometheus/cache/networkpolicy.go (review thread resolved)
pkg/component/observability/monitoring/prometheus/cache/assets/scrapeconfigs/cadvisor.yaml (review thread resolved, outdated)
Force-pushed from 05d926c to 7481e63
/retest
/retest-required
This is no flake but a real issue :)
/lgtm
LGTM label has been added. Git tree hash: 79cd0f3f104fc86ef19e10e85fbb7b790dd6bf8a
Configure Cache Prometheus to scrape kubelet and cadvisor metrics directly

Previously, the Cache Prometheus scraped the kubelet and cadvisor metrics from the seed nodes via the Kubernetes API server's proxy feature. This is unnecessary, as the Cache Prometheus can directly access the cluster nodes.

This change enhances the Cache Prometheus configuration to scrape metrics directly from the cluster nodes, thereby eliminating the need for proxying through the Kubernetes API server, which results in Internet network traffic and associated costs.

It's important to note that the control plane Prometheus still requires the proxy feature for scraping metrics from shoot nodes, as it operates within the seed cluster and lacks direct network access to the shoot nodes.
Without this change, the Prometheus federation fails with the following error in the local setup:

  Get "https://172.18.0.2:10250/metrics": tls: failed to verify certificate: x509: cannot validate certificate for 172.18.0.2 because it doesn't contain any IP SANs

The reason is that the self-issued certificate of the Kubelet does not contain an IP SAN entry:

  openssl s_client -connect 172.18.0.3:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
    X509v3 Subject Alternative Name:
      DNS:gardener-local-ha-multi-zone-control-plane

unlike in a real cluster:

  openssl s_client -connect 10.254.74.24:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
    X509v3 Subject Alternative Name:
      DNS:ip-10-254-74-24.eu-west-1.compute.internal, IP Address:10.254.74.24

With this change, the PR works in the local setup, but the certificate is not verified. Although it would be fine to skip certificate verification only in the local setup (it is not straightforward to achieve this only for the local deployment), we should anyway try to verify the certificate in real clusters to prevent man-in-the-middle attacks. Note that the impact of such an attack is fairly small (kubelet metrics), but it is a valid (academic) goal to properly protect the network path of the Kubelet scrape job.
The Prometheus federation fails with the following error in the local setup:

  Get "https://172.18.0.2:10250/metrics": tls: failed to verify certificate: x509: cannot validate certificate for 172.18.0.2 because it doesn't contain any IP SANs

We could tweak the Prometheus configuration to scrape the nodes using the hostname and not the node's IP. That could be acceptable (although not idiomatic) in real clusters as well. This way Prometheus can validate the certificate, because it connects via the hostname, and the certificate contains a DNS SAN entry both in the local setup and in real clusters.

Local setup: only DNS

  openssl s_client -connect 172.18.0.3:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
    X509v3 Subject Alternative Name:
      DNS:gardener-local-ha-multi-zone-control-plane

Real setup: both DNS and IP

  openssl s_client -connect 10.254.74.24:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
    X509v3 Subject Alternative Name:
      DNS:ip-10-254-74-24.eu-west-1.compute.internal, IP Address:10.254.74.24

However, federation still fails in the local setup, because the issuer of the certificate is not trusted by Prometheus in the current configuration.
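A sketch of the hostname-based scraping idea described above (illustrative only, not the configuration this PR ended up with): rewrite the target address to the node name so that the certificate's DNS SAN matches the host Prometheus connects to.

```yaml
kubernetes_sd_configs:
  - role: node
relabel_configs:
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __address__
    replacement: "${1}:10250"
```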
> Instead of self signing a serving certificate, the Kubelet will request a
> certificate from the 'certificates.k8s.io' API. This requires an approver to
> approve the certificate signing requests (CSR).

https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/

The CSRs are approved after the KinD cluster is created. With this setting, the Kubelet's certificate is issued by the default CA (kube-root-ca.crt) that Prometheus trusts in the current configuration, and the certificate contains an IP SAN entry for the node IP, so Prometheus can successfully verify the certificate.

  openssl s_client -connect 172.18.0.2:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -text | grep -A 1 'Subject Alternative Name:'
    X509v3 Subject Alternative Name:
      DNS:gardener-local-ha-multi-zone-worker2, IP Address:172.18.0.2

So, at this state of the PR, the local setup works, and it would also work in GKE managed clusters, where the Kubelet's certificate is issued by kube-root-ca.crt. However, it would not work in seeds that are Gardener shoots, because currently the Kubelet's certificates are issued by a separate CA, ca-kubelet-..., which is available in the control plane of the seed, but not in the seed itself.

Maybe the Gardenlet in the soil could copy this CA to the kube-system namespace of its shoot, and the Gardenlet in the seed, if it finds a ca-kubelet secret in the kube-system namespace, could copy it to the garden namespace. A conditional Prometheus volume could be used to mount this ca-kubelet to Prometheus if it exists (managed seed) or mount kube-root-ca.crt otherwise. This condition handling has to happen in Gardener because Prometheus expects a single CA in its configuration: https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config
This is currently needed only for Gardener managed seeds, because there the Kubelet's certificate is not issued by the default CA. Maybe the Gardenlet in the soil could copy the Kubelet CA to the kube-system namespace of its child shoot, and the Gardenlet in the seed, if it finds a ca-kubelet secret in the kube-system namespace (i.e. a managed seed), could copy it to the garden namespace. A conditional Prometheus volume could be used to mount this ca-kubelet to Prometheus if it exists (managed seed) or mount kube-root-ca.crt otherwise. This condition handling has to happen in Gardener because Prometheus expects a single CA in its configuration.

> KEP-3257: Cluster Trust Bundles
> https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/3257-cluster-trust-bundles

could also be a good approach to make the Kubelet CAs available for the workload.

The cluster internal communication is still encrypted (protection against a passively eavesdropping entity), but the certificate is not verified, which could allow a man-in-the-middle attack by an active entity. For the cluster internal scraping of the kubelet metrics, this might be acceptable.

Both of the ideas above might not be feasible short term, so with this PR, the certificate verification is skipped.
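If the kubelet CA were made available as described, the scrape job could verify the certificate instead of skipping verification; a hypothetical sketch (the mount path below is made up):

```yaml
tls_config:
  ca_file: /etc/prometheus/ca-kubelet/bundle.crt   # hypothetical mount path
  # insecure_skip_verify: true                     # the fallback this PR uses for now
```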
…to the seed's node CIDR for the Kubelet network policy that was introduced in this PR, and also for the node exporter network policy, which served as a blueprint for the Kubelet network policy. With this change, a minor TODO of the node exporter network policy could be resolved.
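For illustration, a network policy permitting such egress might look roughly like this (the policy name, labels, and CIDR are assumptions, not the actual policy generated by Gardener):

```yaml
# Allows egress from the cache Prometheus to the kubelet port within the
# seed's node CIDR.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-from-prometheus-cache-to-kubelet   # hypothetical name
  namespace: garden
spec:
  podSelector:
    matchLabels:
      prometheus: cache                           # hypothetical label
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.250.0.0/16                   # seed node CIDR, illustrative value
      ports:
        - protocol: TCP
          port: 10250
```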
Use the garden client and try to look for a managed seed with the same name as the seed. If it exists, it is a managed seed; otherwise it isn't. The isManagedSeed flag is passed down to the AdditionalScrapeConfigs function, and Go text templating is used to set the insecure_skip_verify flag.

To test this in the KinD based local setup: instead of using the default local project, create the shoot in the garden project (namespace) and upgrade it to a managed seed with:

  apiVersion: seedmanagement.gardener.cloud/v1alpha1
  kind: ManagedSeed
  metadata:
    name: local
    namespace: garden
  spec:
    shoot:
      name: local
    gardenlet:
      bootstrap: BootstrapToken
      mergeWithParent: true
      config:
        seedConfig:
          spec:
            ingress:
              controller:
                kind: nginx

Then check the scrape config both in the KinD cluster and in the managed seed:

  k get secrets prometheus-cache-additional-scrape-configs -o json | jq -r '.data."prometheus.yaml" | @base64d' | grep skip

TLS verification is enabled in the KinD cluster but disabled in the managed seed.
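The templating idea might look roughly like this in the generated scrape config (the template variable name and field placement are assumptions, not the exact code of this PR):

```yaml
tls_config:
  ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  {{- if .IsManagedSeed }}
  # In managed seeds, the kubelet CA is not available to Prometheus, so
  # verification is skipped there.
  insecure_skip_verify: true
  {{- end }}
```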
Force-pushed from 2abd654 to def83d3
Force-pushed from def83d3 to 4ae8bfc
Thank you for the commits, @rfranzke.
It looks good to me.
/lgtm
LGTM label has been added. Git tree hash: 67d0556d518fe69acc37804581a3e9cf0d75007d
How to categorize this PR?
/area monitoring
/kind enhancement
What this PR does / why we need it:
Previously, the Cache Prometheus scraped the kubelet and cadvisor metrics from the seed nodes via the Kubernetes API server's proxy feature. This is unnecessary as the Cache Prometheus can directly access the cluster nodes.
This change enhances the Cache Prometheus configuration to scrape metrics directly from the cluster nodes, thereby eliminating the need for proxying through the Kubernetes API server, which results in Internet network traffic and associated costs.
It's important to note that the control plane Prometheus still requires the proxy feature for scraping metrics from shoot nodes, as it operates within the seed cluster and lacks direct network access to the shoot nodes.
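To illustrate the difference (simplified; the job name and exact configuration below are illustrative, not the literal scrape config of this PR):

```yaml
# Before: cadvisor metrics were fetched through the kube-apiserver proxy
# (metrics_path: /api/v1/nodes/<node>/proxy/metrics/cadvisor on the API
# server address). After: they are fetched directly from each node.
scrape_configs:
  - job_name: cadvisor               # hypothetical job name
    scheme: https
    metrics_path: /metrics/cadvisor
    kubernetes_sd_configs:
      - role: node                   # targets resolve to <node address>:10250
```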
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
/cc @rfranzke @ScheererJ @vicwicker @rickardsjp
Release note: