
Configure Cache Prometheus to scrape kubelet and cadvisor metrics directly #9716

Merged
merged 12 commits into gardener:master on May 22, 2024

Conversation

istvanballok
Contributor

How to categorize this PR?

/area monitoring
/kind enhancement

What this PR does / why we need it:

Previously, the Cache Prometheus scraped the kubelet and cadvisor metrics from the seed nodes via the Kubernetes API server's proxy feature. This is unnecessary as the Cache Prometheus can directly access the cluster nodes.

This change enhances the Cache Prometheus configuration to scrape metrics directly from the cluster nodes, thereby eliminating the proxying through the Kubernetes API server and the Internet network traffic and associated costs it causes.

It's important to note that the control plane Prometheus still requires the proxy feature for scraping metrics from shoot nodes, as it operates within the seed cluster and lacks direct network access to the shoot nodes.
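
For illustration only, a minimal sketch of what such a direct (non-proxied) kubelet and cAdvisor scrape job can look like; the job names and TLS details are simplified and not the exact configuration generated by Gardener. The proxied variant typically rewrites the target address to the API server and the metrics path to /api/v1/nodes/&lt;node&gt;/proxy/metrics[/cadvisor] instead.

    scrape_configs:
    # Kubelet metrics, scraped directly from <node IP>:10250.
    - job_name: cache-kubelet
      scheme: https
      kubernetes_sd_configs:
      - role: node                 # node discovery yields the kubelet address per node
      authorization:
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    # Same endpoint, container metrics from cAdvisor.
    - job_name: cache-cadvisor
      scheme: https
      metrics_path: /metrics/cadvisor
      kubernetes_sd_configs:
      - role: node
      authorization:
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt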

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

/cc @rfranzke @ScheererJ @vicwicker @rickardsjp

Release note:

The Cache Prometheus scrapes the kubelet and cadvisor metrics directly, without using the proxy feature of the API server.


gardener-prow bot commented May 7, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@gardener-prow gardener-prow bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/monitoring Monitoring (including availability monitoring and alerting) related kind/enhancement Enhancement, improvement, extension cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 7, 2024
@istvanballok
Contributor Author

Draft, because in the local setup with KinD, the kubelet's certificate is not accepted by Prometheus.

Get "https://172.18.0.2:10250/metrics": 
tls: failed to verify certificate: 
x509: cannot validate certificate for 172.18.0.2 
because it doesn't contain any IP SANs

and indeed:

openssl s_client -connect 172.18.0.3:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
            X509v3 Subject Alternative Name: 
                DNS:gardener-local-ha-multi-zone-control-plane

whereas in the dev landscape:

/ # openssl s_client -connect 10.254.74.24:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
            X509v3 Subject Alternative Name: 
                DNS:ip-10-254-74-24.eu-west-1.compute.internal, IP Address:10.254.74.24

I'm not sure how to configure the KinD setup https://github.com/gardener/gardener/tree/master/example/gardener-local/kind/cluster/templates in such a way that it generates a certificate for the Kubelet with an IP SAN.

@rfranzke, @ScheererJ: maybe this rings a bell for you.

@istvanballok
Contributor Author

The kube-prometheus stack also skips the verification of the kubelet's certificate - maybe it is not straightforward to obtain the issuer of the kubelet's certificate and configure Prometheus to use it for verification.

I wanted to check whether there is an obvious approach to properly verify the kubelet's certificate, but there does not seem to be one.

@istvanballok
Contributor Author

istvanballok commented May 10, 2024

Current state:

insecure_skip_verify would only be needed in managed seeds, because in Gardener shoots the Kubelet's certificate is not issued by the default CA.

To address this, we could either use the default CA for the Kubelet's certificates, or make the Kubelet CA available in the seed cluster: currently it is only available in the control plane of the seed cluster.

Idea: fyi @rfranzke

Maybe the Gardenlet in the soil could copy the Kubelet CA to the kube-system
namespace of its child shoot (the seed), and the Gardenlet in the seed, if it finds a
ca-kubelet secret in the kube-system namespace (i.e. a managed seed), could copy
it to the garden namespace. A conditional Prometheus volume could be used to
mount this ca-kubelet to Prometheus if it exists (managed seed) or mount
kube-root-ca.crt otherwise. This condition handling has to happen in Gardener
because Prometheus expects a single CA in its configuration.
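
For what it's worth, a purely hypothetical sketch of how that could look from Prometheus' point of view (none of this is implemented by this PR; the mount path is made up):

    # Scrape job side: always exactly one CA file, whatever Gardener mounted there.
    tls_config:
      ca_file: /etc/prometheus/ca-kubelet/bundle.crt

    # Pod side (decided by Gardener, not by Prometheus):
    # - managed seed: mount the copied ca-kubelet secret at that path
    # - otherwise:    mount the kube-root-ca.crt config map instead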


@istvanballok
Contributor Author

Once available, I think

KEP-3257: Cluster Trust Bundles
https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/3257-cluster-trust-bundles

could be a good approach to make the Kubelet CAs available to the cluster's workload, like Prometheus.

For the time being, I think we could skip the verification of the Kubelet's certificate in managed seeds. Although verification would work in the KinD-based local setup and in non-managed seeds, for simplicity I think skipping it with a single line for all scenarios is preferable.

The commit "Enable server certificate bootstrap for the Kubelet in the KinD cluster" could still be kept, so that the KinD cluster is more similar e.g. to a GKE cluster, where the kubelet's certificate is also signed by the default CA.

If it is worth the effort, we could try in a separate PR to make the Kubelet CA trust anchor available in the kube-system and garden namespaces through a collaboration between the soil's gardenlet and the seed's gardenlet.

@istvanballok istvanballok marked this pull request as ready for review May 13, 2024 11:16
@gardener-prow gardener-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 13, 2024
@gardener-prow gardener-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 13, 2024
@istvanballok
Contributor Author

istvanballok commented May 14, 2024

Anyway, in the meantime all the tests passed, so the PR is ready to be reviewed.

EDIT:
In a local setup I could not reproduce that

make ci-e2e-kind-ha-multi-zone

fails. One caveat was that the /etc/hosts entry here:

printf "\n$local_address gu-local--e2e-rotate.ingress.$seed_name.seed.local.gardener.cloud\n" >>/etc/hosts

was not present on my local system and these entries are added only in the CI environment, so the tests failed here:

By("Use old credentials to access observability endpoint")
Eventually(func(g Gomega) {
    response, err := accessEndpoint(ctx, v.observabilityEndpoint, v.oldKeypairData["username"], v.oldKeypairData["password"])
    g.Expect(err).NotTo(HaveOccurred())
    g.Expect(response.StatusCode).To(Equal(http.StatusOK))
}).Should(Succeed())

but they passed after I added the required domain mapping

echo "127.0.0.1 gu-local--e2e-rotate.ingress.local-ha-multi-zone.seed.local.gardener.cloud" >> /etc/hosts

Note that these tests are kind of slow even in a non-resource-constrained local environment, which might explain why they hit a timeout from time to time in the more resource-constrained Prow environment.

Maybe a quick win could be to make sure that pods start and terminate in a timely fashion (shorter grace periods) in the local setup.

After preparing the local environment with

make kind-ha-multi-zone-up
make gardener-ha-multi-zone-up

Executing a single test

SHOOT_FAILURE_TOLERANCE_TYPE=zone ginkgo run --v --show-node-events \
  --label-filter "basic && credentials-rotation" \
  test/e2e/gardener/...

Context("Shoot with workers", Label("basic"), func() {
    test(e2e.DefaultShoot("e2e-rotate"))

Shoot Tests Shoot with workers Create Shoot, Rotate Credentials and Delete Shoot [Shoot, default, basic, credentials-rotation]                                                                                                                                      
/root/go/src/github.com/gardener/gardener/test/e2e/gardener/shoot/create_rotate_delete.go:173

took 1690s (~28 min). While inspecting what was happening, I noticed that a significant portion of this time is probably spent in pods starting up (reaching the Ready state) or terminating. I suspect that the underlying processes are faster, and maybe the probe configuration or the termination grace period configuration could be tweaked to speed up these tests and reduce false positives.
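
As a rough illustration of the kind of tweak meant here (the values and the component name are made up and would only be appropriate for the local setup):

    # Pod spec fragment for a local-only component:
    spec:
      terminationGracePeriodSeconds: 5    # default is 30; long terminations add up in e2e runs
      containers:
      - name: some-local-component
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 2
          periodSeconds: 2                # probe more often so "Ready" is reported sooner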

Maybe there is also a real bug / race condition that I could not reproduce. At least in one execution, the shoot was blocked for 10+ minutes with the unexpected condition ConfigUnsupported on the prometheus-shoot VPA.

{
  "level": "info",
  "ts": "2024-05-14T12:15:27.318+0200",
  "logger": "shoot-test.test",
  "msg": "Shoot not yet created",
  "shoot": {
    "name": "e2e-rotate",
    "namespace": "garden-local"
  },
  "reason":"condition type ObservabilityComponentsHealthy is not true yet,
  had message VerticalPodAutoscaler \"shoot--local--e2e-rotate/prometheus-shoot\" is unhealthy: 
  condition \"ConfigUnsupported\" has invalid status True (expected False) due to : 
  Unknown error during checking if target is a topmost well-known or scalable controller: 
  %!s(<nil>) with  reason VerticalPodAutoscalerUnhealthy"}

On another occasion, the tests in Prow failed with

Missing required deployments: [kube-state-metrics] with reason DeploymentMissing

and I'm not sure why the deployment could not be created in 7 minutes.

When reviewing these insights together with @vicwicker, we concluded that the integration tests in Prow failed because the worker node did not become ready. Note that the observability component related symptoms are just consequences. The node itself did not become ready because of some calico / container issues:

an error occurred when try to find sandbox: not found. Sending the event with nil podSandboxStatus.
...

At least it is not related to this PR.

@vicwicker
Contributor

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 14, 2024

gardener-prow bot commented May 14, 2024

LGTM label has been added.

Git tree hash: dda0608142344ba2a2eded2708d743ddc56a0a60

@ScheererJ
Contributor

> Draft, because in the local setup with KinD, the kubelet's certificate is not accepted by Prometheus.
> [...]
> I'm not sure how to configure the KinD setup https://github.com/gardener/gardener/tree/master/example/gardener-local/kind/cluster/templates in such a way that it generates a certificate for the Kubelet with an IP SAN.
>
> @rfranzke, @ScheererJ: maybe this rings a bell for you.

According to https://kind.sigs.k8s.io/docs/user/configuration/#kubeadm-config-patches, it could be possible to provide a KubeletConfiguration to the kind nodes. KubeletConfiguration has the fields TLSCertFile and TLSPrivateKeyFile, which could allow specifying custom certificates for the nodes.
However, I have not tested this, and I am unsure whether it makes sense to add this complexity to the local setup if we need to skip TLS verification for the managed seed scenario anyway.
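
For reference, a rough sketch of that (untested) alternative; the file paths, the extraMounts, and where the pre-generated certificate would come from are all assumptions:

    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    nodes:
    - role: control-plane
      kubeadmConfigPatches:
      - |
        kind: KubeletConfiguration
        apiVersion: kubelet.config.k8s.io/v1beta1
        tlsCertFile: /etc/kubelet-pki/tls.crt        # pre-generated cert with an IP SAN
        tlsPrivateKeyFile: /etc/kubelet-pki/tls.key
      extraMounts:
      - hostPath: ./example/gardener-local/kubelet-pki   # assumed location of the generated files
        containerPath: /etc/kubelet-pki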

@istvanballok WDYT?

@ScheererJ left a comment

Thanks a lot for this nice improvement.
Now that I have looked through the changes, I see that you found an even better solution for the local setup than manually generating the certificates as I proposed.
I have only two minor questions. Apart from that, it looks very good.

@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label May 15, 2024
@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 21, 2024
@oliver-goetz
Member

/retest

@vicwicker
Contributor

/retest-required

@rfranzke
Member

@istvanballok: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-gardener-integration
Commit: 02495be
Required: true
Rerun command: /test pull-gardener-integration
Full PR test history. Your PR dashboard. Command help for this repository. Please help us cut down on flakes by linking this test failure to an open flake report or filing a new flake report if you can't find an existing one. Also see our testing guideline for how to avoid and hunt flakes.

This is no flake but a real issue :)

@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label May 22, 2024
@gardener-prow gardener-prow bot requested a review from ScheererJ May 22, 2024 08:33
@rfranzke
Member

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 22, 2024

gardener-prow bot commented May 22, 2024

LGTM label has been added.

Git tree hash: 79cd0f3f104fc86ef19e10e85fbb7b790dd6bf8a

Configure Cache Prometheus to scrape kubelet and cadvisor metrics directly

Previously, the Cache Prometheus scraped the kubelet and cadvisor metrics from
the seed nodes via the Kubernetes API server's proxy feature. This is
unnecessary as the Cache Prometheus can directly access the cluster nodes.

This change enhances the Cache Prometheus configuration to scrape metrics
directly from the cluster nodes, thereby eliminating the need for proxying
through the Kubernetes API server, which results in Internet network traffic and
associated costs.

It's important to note that the control plane Prometheus still requires the
proxy feature for scraping metrics from shoot nodes, as it operates within the
seed cluster and lacks direct network access to the shoot nodes.
Without this change, the Prometheus federation fails with the following
error in the local setup:

  Get "https://172.18.0.2:10250/metrics":
  tls: failed to verify certificate:
  x509: cannot validate certificate for 172.18.0.2
  because it doesn't contain any IP SANs

The reason is that the self-issued certificate of the Kubelet:

  openssl s_client -connect 172.18.0.3:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
    X509v3 Subject Alternative Name:
      DNS:gardener-local-ha-multi-zone-control-plane

does not contain a SAN IP entry, unlike in a real cluster:

  openssl s_client -connect 10.254.74.24:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
    X509v3 Subject Alternative Name:
      DNS:ip-10-254-74-24.eu-west-1.compute.internal, IP Address:10.254.74.24

With this change, the PR works in the local setup but the certificate is not
verified.

Although it would be fine to skip certificate verification only in the local
setup (it is not straightforward to achieve this only for the local deployment),
we should anyway try to verify the certificate in real clusters to prevent
man-in-the-middle attacks. Note that the impact of such an attack is fairly small
(kubelet metrics), but it is a valid (academic) goal to properly protect the
network path of the Kubelet scrape job.

The Prometheus federation fails with the following error
in the local setup:

      Get "https://172.18.0.2:10250/metrics":
      tls: failed to verify certificate:
      x509: cannot validate certificate for 172.18.0.2
      because it doesn't contain any IP SANs

We could tweak the Prometheus configuration to scrape the nodes
using the hostname and not the node's IP. That could be acceptable
(although not idiomatic) in real clusters as well.

This way Prometheus can validate the certificate, because it connects
via the hostname, and the certificate contains a DNS SAN entry
both in the local setup and in real clusters.

Local setup: only DNS

  openssl s_client -connect 172.18.0.3:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
    X509v3 Subject Alternative Name:
      DNS:gardener-local-ha-multi-zone-control-plane

Real setup: both DNS and IP

  openssl s_client -connect 10.254.74.24:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
    X509v3 Subject Alternative Name:
      DNS:ip-10-254-74-24.eu-west-1.compute.internal, IP Address:10.254.74.24

However, federation still fails in the local setup, because the issuer of the
certificate is not trusted by Prometheus in the current configuration.

> Instead of self signing a serving certificate, the Kubelet will request a
> certificate from the 'certificates.k8s.io' API. This requires an approver to
> approve the certificate signing requests (CSR).
https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/

The CSRs are approved after the KinD cluster is created.
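
A minimal sketch of the corresponding kind configuration (the exact shape and placement in the Gardener templates may differ); the pending kubelet-serving CSRs then have to be approved once, e.g. with kubectl certificate approve, after the cluster is up:

    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    kubeadmConfigPatches:
    - |
      kind: KubeletConfiguration
      apiVersion: kubelet.config.k8s.io/v1beta1
      serverTLSBootstrap: true   # request the serving certificate from the certificates.k8s.io API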

With this setting, the Kubelet's certificate is issued by the default CA
(kube-root-ca.crt) that Prometheus trusts in the current configuration, and the
certificate contains a SAN IP entry for the node IP so that Prometheus can
successfully verify the certificate.

openssl s_client -connect 172.18.0.2:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -text | grep -A 1 'Subject Alternative Name:'
  X509v3 Subject Alternative Name:
    DNS:gardener-local-ha-multi-zone-worker2, IP Address:172.18.0.2

So, at this stage of the PR, the local setup works, and it would also
work in GKE managed clusters, where the Kubelet's certificate is issued by
the default CA (kube-root-ca.crt).

However, it would not work in seeds that are Gardener shoots, because
currently the Kubelet's certificates are issued by a separate CA,
  ca-kubelet-...
which is available in the control plane of the seed, but not in the seed itself.
Maybe the Gardenlet in the soil could copy this CA to the kube-system namespace
of its shoot, and the Gardenlet in the seed, if it finds a ca-kubelet secret in
the kube-system namespace, could copy it to the garden namespace. A conditional
Prometheus volume could be used to mount this ca-kubelet to Prometheus if it
exists (managed seed) or mount kube-root-ca.crt otherwise. This condition
handling has to happen in Gardener because Prometheus expects a single CA in its
configuration:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config

This is currently needed only for Gardener managed seeds, because
the Kubelet's certificate is not issued by the default CA.

Maybe the Gardenlet in the soil could copy the Kubelet CA to the kube-system
namespace of its child shoot, and the Gardenlet in the seed, if it finds a
ca-kubelet secret in the kube-system namespace (i.e. a managed seed), could copy
it to the garden namespace. A conditional Prometheus volume could be used to
mount this ca-kubelet to Prometheus if it exists (managed seed) or mount
kube-root-ca.crt otherwise. This condition handling has to happen in Gardener
because Prometheus expects a single CA in its configuration.

> KEP-3257: Cluster Trust Bundles
> https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/3257-cluster-trust-bundles
could also be a good approach to make the Kubelet CAs available
for the workload.

The cluster internal communication is still encrypted (protection against
a passively eavesdropping entity), but the certificate is not verified,
which could allow for a man in the middle attack by an active entity.

For the cluster internal scraping of the kubelet metrics, this might
be acceptable.

Both of the ideas above might not be feasible in the short term, so with this PR,
the certificate verification is skipped.

[...] to the seed's node CIDR for the Kubelet network policy that was
introduced in this PR, and also for the node exporter network policy,
which served as a blueprint for the Kubelet network policy.
With this change, a minor TODO of the node exporter network policy
could be resolved.

Use the garden client to look for a ManagedSeed with the same name as the seed.
If it exists, the seed is a managed seed; otherwise it isn't.

The isManagedSeed flag is passed down to the AdditionalScrapeConfigs function,
and Go text templating is used to set the insecure_skip_verify flag.
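
Roughly like this (the template variable name and the surrounding job are illustrative, not the exact code):

    - job_name: cache-kubelet
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        {{- if .IsManagedSeed }}
        # the kubelet certificates in managed seeds are issued by ca-kubelet,
        # which is not available to this Prometheus
        insecure_skip_verify: true
        {{- end }}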

To test this in the KinD based local setup: instead of using the default local
project, create the shoot in the garden project (namespace) and upgrade it to a
managed seed with:

    apiVersion: seedmanagement.gardener.cloud/v1alpha1
    kind: ManagedSeed
    metadata:
      name: local
      namespace: garden
    spec:
      shoot:
        name: local
      gardenlet:
        bootstrap: BootstrapToken
        mergeWithParent: true
        config:
          seedConfig:
            spec:
              ingress:
                controller:
                  kind: nginx

Then check the scrape config both in the KinD cluster and in the managed seed:

    k get secrets prometheus-cache-additional-scrape-configs -o json | jq -r '
      .data."prometheus.yaml" | @base64d
    ' | grep skip

TLS verification is enabled in the KinD cluster but disabled in the managed
seed.
@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label May 22, 2024
@istvanballok
Contributor Author

> Fix failing integration test by adding missing scheme
> Do not panic

Thank you for the commits, @rfranzke.

> Prow: New changes are detected. LGTM label has been removed.

It looks good to me.

@rfranzke
Member

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 22, 2024

gardener-prow bot commented May 22, 2024

LGTM label has been added.

Git tree hash: 67d0556d518fe69acc37804581a3e9cf0d75007d

@gardener-prow gardener-prow bot merged commit d404d6e into gardener:master May 22, 2024
18 checks passed