Introduce Metrics Serving in EKS Anywhere #7875

Open
jiayiwang7 opened this issue Mar 20, 2024 · 1 comment

@jiayiwang7
Member

jiayiwang7 commented Mar 20, 2024

What would you like to be added:

Introduce options for securely serving metrics from the K8s system and EKS-A management components.

Why is this needed:

As an EKS Anywhere cluster administrator, I would like to scrape metrics from the K8s system and EKS-A management components in a simple but secure way. These metrics are useful for building dashboards and alerts and for monitoring the health of a cluster.

Currently in EKS-A, the metrics of some system components are already exposed by default (e.g. coredns, kube-apiserver). Other system and management components, such as kube-controller-manager, are configured with the default --bind-address=127.0.0.1 or equivalent, so these servers listen only on localhost. The goal is to expose those metrics securely so that external monitoring services such as Prometheus can consume them.
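
To make the localhost-only binding concrete, here is a minimal Python sketch that inspects a component's command line for the --bind-address flag. The helper name `binds_to_localhost_only` and the sample command lines are illustrative assumptions and are not taken from an actual EKS-A manifest.

```python
from typing import Optional, Sequence

LOOPBACK_ADDRESSES = {"127.0.0.1", "::1", "localhost"}


def binds_to_localhost_only(command: Sequence[str]) -> Optional[bool]:
    """Report whether --bind-address restricts the server to loopback.

    Only the `--bind-address=VALUE` form is handled. Returns None when the
    flag is absent, because this sketch does not assume what a component's
    built-in default is.
    """
    for arg in command:
        if arg.startswith("--bind-address="):
            return arg.split("=", 1)[1] in LOOPBACK_ADDRESSES
    return None


# Hypothetical command lines, for illustration only.
local_only = ["kube-controller-manager", "--bind-address=127.0.0.1"]
all_interfaces = ["kube-controller-manager", "--bind-address=0.0.0.0"]
```

A component for which this returns True cannot be scraped from another host or pod, which is the situation this issue wants to address.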

Details

There are three types of system/management components we would like to serve metrics from:

  1. K8s system components, such as kube-controller-manager, kube-scheduler, kube-proxy.
  2. EKS-A management components, such as eksa-cluster-controller, eks-anywhere-packages.
  3. CAPI components, such as capi-controller, capi-kubeadm-control-plane, capv-controller (provider specific), etcdadm-controller, etcdadm-bootstrap-provider.

For the first type in the list above, scraping metrics on the secure port of the K8s system components is already the default in Kubernetes, using the native K8s authentication and authorization workflow: kubernetes/kubernetes#72491. The controller-manager and scheduler secure metrics should therefore already be enabled by default through the --authentication-kubeconfig and --authorization-kubeconfig flags. How they emit metrics under RBAC needs more investigation: specifically, whether all of the core components above can be scraped at the /metrics endpoint via authentication (user/group/SA) and authorization (an RBAC rule with verb: get and nonResourceURLs: /metrics).
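
The RBAC rule mentioned above (verb: get, nonResourceURLs: /metrics) could look roughly like the following. This is a minimal Python sketch that builds a ClusterRole and a ClusterRoleBinding as JSON manifests. The names `metrics-reader`, `prometheus`, and `monitoring` are hypothetical placeholders, and whether this rule is sufficient for every component is exactly what still needs investigation.

```python
import json


def metrics_reader_role(name: str) -> dict:
    """ClusterRole that only allows GET on the non-resource URL /metrics."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [{"nonResourceURLs": ["/metrics"], "verbs": ["get"]}],
    }


def bind_role_to_service_account(role_name: str, sa_name: str, sa_namespace: str) -> dict:
    """ClusterRoleBinding that grants the role to one service account."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": f"{role_name}-{sa_name}"},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": role_name,
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": sa_name, "namespace": sa_namespace}
        ],
    }


# Hypothetical names: a "prometheus" service account in a "monitoring" namespace.
role = metrics_reader_role("metrics-reader")
binding = bind_role_to_service_account("metrics-reader", "prometheus", "monitoring")

# Wrap both objects in a v1 List so they can be applied as a single file.
manifest = json.dumps(
    {"apiVersion": "v1", "kind": "List", "items": [role, binding]}, indent=2
)
```

The resulting JSON could be written to a file and applied with kubectl apply -f.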

As for the CAPI components, all of them are built on controller-runtime, which in its v0.16.0 release added a secure metrics endpoint that uses HTTPS and provides authentication and authorization: kubernetes-sigs/controller-runtime#2407. The CAPI community adopted this feature in its core controllers in the v1.6.0 release: kubernetes-sigs/cluster-api#9264. Not all CAPI infrastructure providers have implemented the feature yet, but we expect it to be the pattern to follow. The external etcd components are maintained by the EKS Anywhere team, so we can follow the same pattern CAPI core used for secure diagnostics and implement it in etcdadm-controller-manager and etcdadm-bootstrap-provider.

The EKS-A management components are also built on controller-runtime. We can follow the same pattern the CAPI community used for secure diagnostics; this requires further changes in the EKS-A cluster-controller-manager and eks-anywhere-packages.

Once we figure out how each type of component can serve its metrics endpoint securely, we can decide how to make this configurable through EKS-A in a simple and secure way: either through the EKS-A cluster spec, or through documented recommendations using RBAC and a ClusterRole.
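
As a rough illustration of what authentication and authorization mean for a scraper, the Python sketch below builds (without sending) an HTTPS scrape request that carries a bearer token, plus the kind of SubjectAccessReview an endpoint could submit to the API server to check access to /metrics. It assumes the endpoint authenticates the token and then authorizes the caller for a GET on /metrics. The URL, token, and user name are hypothetical.

```python
import urllib.request


def build_scrape_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request that presents a service account token."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


def metrics_access_review(username: str) -> dict:
    """SubjectAccessReview asking: may this user GET the /metrics path?"""
    return {
        "apiVersion": "authorization.k8s.io/v1",
        "kind": "SubjectAccessReview",
        "spec": {
            "user": username,
            "nonResourceAttributes": {"path": "/metrics", "verb": "get"},
        },
    }


# Hypothetical endpoint, token, and identity, for illustration only.
request = build_scrape_request(
    "https://capi-controller.example:8443/metrics", "example-token"
)
review = metrics_access_review("system:serviceaccount:monitoring:prometheus")
```

In practice the scraper (e.g. Prometheus) would send the request with its own service account token and a trusted CA bundle. The sketch only shows the shape of the two messages.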

Planning

We want to prioritize the work of exposing the K8s system components first, based on request.

As explained above, the metrics authentication and authorization flows differ between the native K8s components and the rest, which are built on top of controller-runtime. We would therefore like to implement the feature in phases:

  1. A design doc for a solution covering all the system and management components. It needs to be generic enough to onboard, or be compatible with, the K8s / EKS-A / CAPI / etcd component metrics.
  2. Implementation of exposing K8s system component metrics based on the design.
  3. Introducing secure diagnostics in EKS-A management components, featuring controller-runtime authorization for the metrics endpoint.
  4. Introducing secure diagnostics in external etcd components, featuring controller-runtime authorization for the metrics endpoint.
  5. Pushing or contributing to CAPI to enable secure diagnostics features for all EKS-A supported CAPI providers.
  6. Implementation of exposing EKS-A and CAPI component metrics through the cluster spec.
@jiayiwang7 jiayiwang7 self-assigned this Mar 20, 2024
@sp1999 sp1999 self-assigned this Mar 26, 2024
@sp1999 sp1999 added this to the v0.20.0 milestone Apr 14, 2024
@sp1999
Member

sp1999 commented Apr 15, 2024
