
Configure Cache Prometheus to scrape kubelet and cadvisor metrics directly #9716

Merged
merged 12 commits into gardener:master on May 22, 2024

Conversation

istvanballok
Contributor

How to categorize this PR?

/area monitoring
/kind enhancement

What this PR does / why we need it:

Previously, the Cache Prometheus scraped the kubelet and cadvisor metrics from the seed nodes via the Kubernetes API server's proxy feature. This is unnecessary as the Cache Prometheus can directly access the cluster nodes.

This change enhances the Cache Prometheus configuration to scrape metrics directly from the cluster nodes, thereby eliminating the proxying through the Kubernetes API server and the Internet network traffic and associated costs it causes.

It's important to note that the control plane Prometheus still requires the proxy feature for scraping metrics from shoot nodes, as it operates within the seed cluster and lacks direct network access to the shoot nodes.
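
For illustration only, a minimal sketch of what such a direct (non-proxied) kubelet and cAdvisor scrape job can look like; the job names and TLS details are simplified and not the exact configuration generated by Gardener. The proxied variant typically rewrites the target address to the API server and the metrics path to /api/v1/nodes/&lt;node&gt;/proxy/metrics[/cadvisor] instead.

    scrape_configs:
    # Kubelet metrics, scraped directly from <node IP>:10250.
    - job_name: cache-kubelet
      scheme: https
      kubernetes_sd_configs:
      - role: node                 # node discovery yields the kubelet address per node
      authorization:
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    # Same endpoint, container metrics from cAdvisor.
    - job_name: cache-cadvisor
      scheme: https
      metrics_path: /metrics/cadvisor
      kubernetes_sd_configs:
      - role: node
      authorization:
        credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt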

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

/cc @rfranzke @ScheererJ @vicwicker @rickardsjp

Release note:

The Cache Prometheus scrapes the kubelet and cadvisor metrics directly, without using the proxy feature of the API server.


gardener-prow bot commented May 7, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@gardener-prow gardener-prow bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. area/monitoring Monitoring (including availability monitoring and alerting) related kind/enhancement Enhancement, improvement, extension cla: yes Indicates the PR's author has signed the cla-assistant.io CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 7, 2024
@istvanballok
Contributor Author

Draft, because in the local setup with KinD, the kubelet's certificate is not accepted by Prometheus.

Get "https://172.18.0.2:10250/metrics": 
tls: failed to verify certificate: 
x509: cannot validate certificate for 172.18.0.2 
because it doesn't contain any IP SANs

and indeed:

openssl s_client -connect 172.18.0.3:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
            X509v3 Subject Alternative Name: 
                DNS:gardener-local-ha-multi-zone-control-plane

whereas in the dev landscape:

/ # openssl s_client -connect 10.254.74.24:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
            X509v3 Subject Alternative Name: 
                DNS:ip-10-254-74-24.eu-west-1.compute.internal, IP Address:10.254.74.24

I'm not sure how to configure the KinD setup https://github.com/gardener/gardener/tree/master/example/gardener-local/kind/cluster/templates in such a way that it generates a certificate for the Kubelet with an IP SAN.

@rfranzke, @ScheererJ: maybe this rings a bell for you.

@istvanballok
Contributor Author

The kube-prometheus stack also skips the verification of the kubelet's certificate - maybe it is not straightforward to obtain the issuer of the kubelet's certificate and configure Prometheus to use it for verification.

I wanted to check whether there is an obvious approach to properly verify the kubelet's certificate, but there does not seem to be one.

@istvanballok
Contributor Author

istvanballok commented May 10, 2024

Current state:

insecure_skip_verify would only be needed in managed seeds, because in Gardener shoots the Kubelet's certificate is not issued by the default CA.

To address this, we could either use the default CA for the Kubelet's certificates, or make the Kubelet CA available in the seed cluster: currently it is only available in the control plane of the seed cluster.

Idea: fyi @rfranzke

Maybe the Gardenlet in the soil could copy the Kubelet CA to the kube-system
namespace of its child shoot (the seed), and the Gardenlet in the seed, if it finds a
ca-kubelet secret in the kube-system namespace (i.e. a managed seed), could copy
it to the garden namespace. A conditional Prometheus volume could be used to
mount this ca-kubelet to Prometheus if it exists (managed seed) or mount
kube-root-ca.crt otherwise. This condition handling has to happen in Gardener
because Prometheus expects a single CA in its configuration.
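
For what it's worth, a purely hypothetical sketch of how that could look from Prometheus' point of view (none of this is implemented by this PR; the mount path is made up):

    # Scrape job side: always exactly one CA file, whatever Gardener mounted there.
    tls_config:
      ca_file: /etc/prometheus/ca-kubelet/bundle.crt

    # Pod side (decided by Gardener, not by Prometheus):
    # - managed seed: mount the copied ca-kubelet secret at that path
    # - otherwise:    mount the kube-root-ca.crt config map instead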


@istvanballok
Contributor Author

Once available, I think

KEP-3257: Cluster Trust Bundles
https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/3257-cluster-trust-bundles

could be a good approach to make the Kubelet CAs available to the cluster's workload, like Prometheus.

For the time being, I think we could skip the verification of the Kubelet's certificate in managed seeds. Although verification would work in the KinD-based local setup and in non-managed seeds, for simplicity I think skipping it with a single line for all scenarios is preferable.

The commit "Enable server certificate bootstrap for the Kubelet in the KinD cluster" could still be kept, so that the KinD cluster is more similar e.g. to a GKE cluster, where the kubelet's certificate is also signed by the default CA.

If it is worth the effort, we could try in a separate PR to make the Kubelet CA trust anchor available in the kube-system and garden namespaces through a collaboration between the soil's gardenlet and the seed's gardenlet.

@istvanballok istvanballok marked this pull request as ready for review May 13, 2024 11:16
@gardener-prow gardener-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 13, 2024
@gardener-prow gardener-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 13, 2024
@istvanballok
Contributor Author

istvanballok commented May 14, 2024

Anyway, in the meantime all the tests passed, so the PR is ready to be reviewed.

EDIT:
In a local setup I could not reproduce that

make ci-e2e-kind-ha-multi-zone

fails. One caveat was that the /etc/hosts entry here:

printf "\n$local_address gu-local--e2e-rotate.ingress.$seed_name.seed.local.gardener.cloud\n" >>/etc/hosts

was not present on my local system and these entries are added only in the CI environment, so the tests failed here:

By("Use old credentials to access observability endpoint")
Eventually(func(g Gomega) {
    response, err := accessEndpoint(ctx, v.observabilityEndpoint, v.oldKeypairData["username"], v.oldKeypairData["password"])
    g.Expect(err).NotTo(HaveOccurred())
    g.Expect(response.StatusCode).To(Equal(http.StatusOK))
}).Should(Succeed())

but they passed after I added the required domain mapping

echo "127.0.0.1 gu-local--e2e-rotate.ingress.local-ha-multi-zone.seed.local.gardener.cloud" >> /etc/hosts

Note that these tests are kind of slow even in a non-resource-constrained local environment, which might explain why they hit a timeout from time to time in the more resource-constrained Prow environment.

Maybe a quick win could be to make sure that pods start and terminate in a timely fashion (shorter grace periods) in the local setup.

After preparing the local environment with

make kind-ha-multi-zone-up
make gardener-ha-multi-zone-up

Executing a single test

SHOOT_FAILURE_TOLERANCE_TYPE=zone ginkgo run --v --show-node-events \
  --label-filter "basic && credentials-rotation" \
  test/e2e/gardener/...

Context("Shoot with workers", Label("basic"), func() {
    test(e2e.DefaultShoot("e2e-rotate"))

Shoot Tests Shoot with workers Create Shoot, Rotate Credentials and Delete Shoot [Shoot, default, basic, credentials-rotation]                                                                                                                                      
/root/go/src/github.com/gardener/gardener/test/e2e/gardener/shoot/create_rotate_delete.go:173

took 1690s (~28 min). While inspecting what was happening, I noticed that a significant portion of this time is probably spent in pods starting up (reaching the Ready state) or terminating. I suspect that the underlying processes are faster, and maybe the probe configuration or the termination grace period configuration could be tweaked to speed up these tests and reduce false positives.
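
As a rough illustration of the kind of tweak meant here (the values and the component name are made up and would only be appropriate for the local setup):

    # Pod spec fragment for a local-only component:
    spec:
      terminationGracePeriodSeconds: 5    # default is 30; long terminations add up in e2e runs
      containers:
      - name: some-local-component
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 2
          periodSeconds: 2                # probe more often so "Ready" is reported sooner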

Maybe there is also a real bug / race condition that I could not reproduce. At least in one execution, the shoot was blocked for 10+ minutes with the unexpected condition ConfigUnsupported on the prometheus-shoot VPA.

{
  "level": "info",
  "ts": "2024-05-14T12:15:27.318+0200",
  "logger": "shoot-test.test",
  "msg": "Shoot not yet created",
  "shoot": {
    "name": "e2e-rotate",
    "namespace": "garden-local"
  },
  "reason":"condition type ObservabilityComponentsHealthy is not true yet,
  had message VerticalPodAutoscaler \"shoot--local--e2e-rotate/prometheus-shoot\" is unhealthy: 
  condition \"ConfigUnsupported\" has invalid status True (expected False) due to : 
  Unknown error during checking if target is a topmost well-known or scalable controller: 
  %!s(<nil>) with  reason VerticalPodAutoscalerUnhealthy"}

On another occasion, the tests in Prow failed with

Missing required deployments: [kube-state-metrics] with reason DeploymentMissing

and I'm not sure why the deployment could not be created in 7 minutes.

When reviewing these insights together with @vicwicker, we concluded that the integration tests in Prow failed because the worker node did not become ready. Note that the observability component related symptoms are just consequences. The node itself did not become ready because of some calico / container issues:

an error occurred when try to find sandbox: not found. Sending the event with nil podSandboxStatus.
...

At least it is not related to this PR.

@vicwicker
Contributor

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 14, 2024

gardener-prow bot commented May 14, 2024

LGTM label has been added.

Git tree hash: dda0608142344ba2a2eded2708d743ddc56a0a60

@ScheererJ
Contributor

> Draft, because in the local setup with KinD, the kubelet's certificate is not accepted by Prometheus.
> [...]
> I'm not sure how to configure the KinD setup https://github.com/gardener/gardener/tree/master/example/gardener-local/kind/cluster/templates in such a way that it generates a certificate for the Kubelet with an IP SAN.
>
> @rfranzke, @ScheererJ: maybe this rings a bell for you.

According to https://kind.sigs.k8s.io/docs/user/configuration/#kubeadm-config-patches, it could be possible to provide a KubeletConfiguration to the kind nodes. KubeletConfiguration has the fields TLSCertFile and TLSPrivateKeyFile, which could allow specifying custom certificates for the nodes.
However, I have not tested this, and I am unsure whether it makes sense to add this complexity to the local setup if we need to skip TLS verification for the managed seed scenario anyway.
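
For reference, a rough sketch of that (untested) alternative; the file paths, the extraMounts, and where the pre-generated certificate would come from are all assumptions:

    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    nodes:
    - role: control-plane
      kubeadmConfigPatches:
      - |
        kind: KubeletConfiguration
        apiVersion: kubelet.config.k8s.io/v1beta1
        tlsCertFile: /etc/kubelet-pki/tls.crt        # pre-generated cert with an IP SAN
        tlsPrivateKeyFile: /etc/kubelet-pki/tls.key
      extraMounts:
      - hostPath: ./example/gardener-local/kubelet-pki   # assumed location of the generated files
        containerPath: /etc/kubelet-pki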

@istvanballok WDYT?

@ScheererJ left a comment

Thanks a lot for this nice improvement.
Now that I have looked through the changes, I see that you found an even better solution for the local setup than manually generating the certificates as I proposed.
I have only two minor questions. Apart from that, it looks very good.

@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label May 15, 2024
@gardener-prow gardener-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 21, 2024
@oliver-goetz
Member

/retest

@vicwicker
Contributor

/retest-required

@rfranzke
Member

@istvanballok: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-gardener-integration
Commit: 02495be
Required: true
Rerun command: /test pull-gardener-integration
Full PR test history. Your PR dashboard. Command help for this repository. Please help us cut down on flakes by linking this test failure to an open flake report or filing a new flake report if you can't find an existing one. Also see our testing guideline for how to avoid and hunt flakes.

This is no flake but a real issue :)

@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label May 22, 2024
@gardener-prow gardener-prow bot requested a review from ScheererJ May 22, 2024 08:33
@rfranzke
Member

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 22, 2024

gardener-prow bot commented May 22, 2024

LGTM label has been added.

Git tree hash: 79cd0f3f104fc86ef19e10e85fbb7b790dd6bf8a

Configure Cache Prometheus to scrape kubelet and cadvisor metrics directly

Previously, the Cache Prometheus scraped the kubelet and cadvisor metrics from
the seed nodes via the Kubernetes API server's proxy feature. This is
unnecessary as the Cache Prometheus can directly access the cluster nodes.

This change enhances the Cache Prometheus configuration to scrape metrics
directly from the cluster nodes, thereby eliminating the need for proxying
through the Kubernetes API server, which results in Internet network traffic and
associated costs.

It's important to note that the control plane Prometheus still requires the
proxy feature for scraping metrics from shoot nodes, as it operates within the
seed cluster and lacks direct network access to the shoot nodes.
Without this change, the Prometheus federation fails with the following
error in the local setup:

  Get "https://172.18.0.2:10250/metrics":
  tls: failed to verify certificate:
  x509: cannot validate certificate for 172.18.0.2
  because it doesn't contain any IP SANs

The reason is that the self-issued certificate of the Kubelet:

  openssl s_client -connect 172.18.0.3:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
    X509v3 Subject Alternative Name:
      DNS:gardener-local-ha-multi-zone-control-plane

does not contain a SAN IP entry, unlike in a real cluster:

  openssl s_client -connect 10.254.74.24:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
    X509v3 Subject Alternative Name:
      DNS:ip-10-254-74-24.eu-west-1.compute.internal, IP Address:10.254.74.24

With this change, the PR works in the local setup but the certificate is not
verified.

Although it would be fine to skip certificate verification only in the local
setup (it is not straightforward to achieve this only for the local deployment),
we should anyway try to verify the certificate in real clusters to prevent
man-in-the-middle attacks. Note that the impact of such an attack is fairly small
(kubelet metrics), but it is a valid (academic) goal to properly protect the
network path of the Kubelet scrape job.

The Prometheus federation fails with the following error
in the local setup:

      Get "https://172.18.0.2:10250/metrics":
      tls: failed to verify certificate:
      x509: cannot validate certificate for 172.18.0.2
      because it doesn't contain any IP SANs

We could tweak the Prometheus configuration to scrape the nodes
using the hostname and not the node's IP. That could be acceptable
(although not idiomatic) in real clusters as well.

This way Prometheus can validate the certificate, because it connects
via the hostname, and the certificate contains a DNS SAN entry
both in the local setup and in real clusters.

Local setup: only DNS

  openssl s_client -connect 172.18.0.3:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
    X509v3 Subject Alternative Name:
      DNS:gardener-local-ha-multi-zone-control-plane

Real setup: both DNS and IP

  openssl s_client -connect 10.254.74.24:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -noout -text | grep "Subject Alternative Name" -A 1
    X509v3 Subject Alternative Name:
      DNS:ip-10-254-74-24.eu-west-1.compute.internal, IP Address:10.254.74.24

However, federation still fails in the local setup, because the issuer of the
certificate is not trusted by Prometheus in the current configuration.

> Instead of self signing a serving certificate, the Kubelet will request a
> certificate from the 'certificates.k8s.io' API. This requires an approver to
> approve the certificate signing requests (CSR).
https://kubernetes.io/docs/reference/config-api/kubelet-config.v1beta1/

The CSRs are approved after the KinD cluster is created.
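
A minimal sketch of the corresponding kind configuration (the exact shape and placement in the Gardener templates may differ); the pending kubelet-serving CSRs then have to be approved once, e.g. with kubectl certificate approve, after the cluster is up:

    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    kubeadmConfigPatches:
    - |
      kind: KubeletConfiguration
      apiVersion: kubelet.config.k8s.io/v1beta1
      serverTLSBootstrap: true   # request the serving certificate from the certificates.k8s.io API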

With this setting, the Kubelet's certificate is issued by the default CA
(kube-root-ca.crt) that Prometheus trusts in the current configuration, and the
certificate contains a SAN IP entry for the node IP so that Prometheus can
successfully verify the certificate.

openssl s_client -connect 172.18.0.2:10250 -showcerts </dev/null 2>/dev/null | openssl x509 -text | grep -A 1 'Subject Alternative Name:'
  X509v3 Subject Alternative Name:
    DNS:gardener-local-ha-multi-zone-worker2, IP Address:172.18.0.2

So, at this stage of the PR, the local setup works, and it would also
work in GKE managed clusters, where the Kubelet's certificate is issued by
the default CA (kube-root-ca.crt).

However, it would not work in seeds that are Gardener shoots, because
currently the Kubelet's certificates are issued by a separate CA,
  ca-kubelet-...
which is available in the control plane of the seed, but not in the seed itself.
Maybe the Gardenlet in the soil could copy this CA to the kube-system namespace
of its shoot, and the Gardenlet in the seed, if it finds a ca-kubelet secret in
the kube-system namespace, could copy it to the garden namespace. A conditional
Prometheus volume could be used to mount this ca-kubelet to Prometheus if it
exists (managed seed) or mount kube-root-ca.crt otherwise. This condition
handling has to happen in Gardener because Prometheus expects a single CA in its
configuration:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#tls_config

This is currently needed only for Gardener managed seeds, because
the Kubelet's certificate is not issued by the default CA.

Maybe the Gardenlet in the soil could copy the Kubelet CA to the kube-system
namespace of its child shoot, and the Gardenlet in the seed, if it finds a
ca-kubelet secret in the kube-system namespace (i.e. a managed seed), could copy
it to the garden namespace. A conditional Prometheus volume could be used to
mount this ca-kubelet to Prometheus if it exists (managed seed) or mount
kube-root-ca.crt otherwise. This condition handling has to happen in Gardener
because Prometheus expects a single CA in its configuration.

> KEP-3257: Cluster Trust Bundles
> https://github.com/kubernetes/enhancements/tree/master/keps/sig-auth/3257-cluster-trust-bundles
could also be a good approach to make the Kubelet CAs available
for the workload.

The cluster internal communication is still encrypted (protection against
a passively eavesdropping entity), but the certificate is not verified,
which could allow for a man in the middle attack by an active entity.

For the cluster internal scraping of the kubelet metrics, this might
be acceptable.

Both of the ideas above might not be feasible in the short term, so with this PR,
the certificate verification is skipped.

[...] to the seed's node CIDR for the Kubelet network policy that was
introduced in this PR, and also for the node exporter network policy,
which served as a blueprint for the Kubelet network policy.
With this change, a minor TODO of the node exporter network policy
could be resolved.

Use the garden client to look for a ManagedSeed with the same name as the seed.
If it exists, the seed is a managed seed; otherwise it isn't.

The isManagedSeed flag is passed down to the AdditionalScrapeConfigs function,
and Go text templating is used to set the insecure_skip_verify flag.
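
Roughly like this (the template variable name and the surrounding job are illustrative, not the exact code):

    - job_name: cache-kubelet
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        {{- if .IsManagedSeed }}
        # the kubelet certificates in managed seeds are issued by ca-kubelet,
        # which is not available to this Prometheus
        insecure_skip_verify: true
        {{- end }}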

To test this in the KinD based local setup: instead of using the default local
project, create the shoot in the garden project (namespace) and upgrade it to a
managed seed with:

    apiVersion: seedmanagement.gardener.cloud/v1alpha1
    kind: ManagedSeed
    metadata:
      name: local
      namespace: garden
    spec:
      shoot:
        name: local
      gardenlet:
        bootstrap: BootstrapToken
        mergeWithParent: true
        config:
          seedConfig:
            spec:
              ingress:
                controller:
                  kind: nginx

Then check the scrape config both in the KinD cluster and in the managed seed:

    k get secrets prometheus-cache-additional-scrape-configs -o json | jq -r '
      .data."prometheus.yaml" | @base64d
    ' | grep skip

TLS verification is enabled in the KinD cluster but disabled in the managed
seed.
@gardener-prow gardener-prow bot removed the lgtm Indicates that a PR is ready to be merged. label May 22, 2024
@istvanballok
Contributor Author

> Fix failing integration test by adding missing scheme
> Do not panic

Thank you for the commits, @rfranzke.

> Prow: New changes are detected. LGTM label has been removed.

It looks good to me.

@rfranzke
Member

/lgtm

@gardener-prow gardener-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 22, 2024

gardener-prow bot commented May 22, 2024

LGTM label has been added.

Git tree hash: 67d0556d518fe69acc37804581a3e9cf0d75007d

@gardener-prow gardener-prow bot merged commit d404d6e into gardener:master May 22, 2024
18 checks passed