
Limit impact on k8s apiserver in large clusters #824

Open
dashpole opened this issue May 8, 2024 · 1 comment

@dashpole
Contributor

dashpole commented May 8, 2024

What I would like to be able to do

I mentioned this briefly at the community meeting earlier today.

As a general best practice, DaemonSets should avoid watching a resource cluster-wide, such as watching all pods, all replicasets, all services, etc. Doing this can limit the maximum possible number of nodes in a cluster. It is acceptable to watch pods assigned to the same node as the DaemonSet pod. That actually generates less load on the kube-apiserver than a deployment with multiple replicas watching all pods, since the traffic is roughly O(pods * replicas) for the deployment. Ideally, I would like to be able to run Beyla with the following architecture:

  • The Beyla DaemonSet watches pods assigned to the same node as itself, and telemetry includes pod information.
  • A horizontally scaled deployment (e.g. the OpenTelemetry Collector) enriches telemetry with information about other k8s resources.

To do that, it would be nice to have more control over which k8s resources Beyla watches. This would typically be done using field selectors, similar to the Prometheus server's selectors config in kubernetes_sd_configs.
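For illustration, here is a minimal client-go sketch of the node-scoped watch I have in mind. It assumes NODE_NAME is injected via the downward API (fieldRef: spec.nodeName); the resync period and handler body are placeholders, not anything Beyla does today:

```go
package main

import (
	"os"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// NODE_NAME is assumed to be injected via the downward API (fieldRef: spec.nodeName).
	nodeName := os.Getenv("NODE_NAME")

	// The field selector restricts the LIST/WATCH to pods scheduled on this node,
	// so the apiserver only streams events for node-local pods.
	factory := informers.NewSharedInformerFactoryWithOptions(client, 30*time.Minute,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.FieldSelector = fields.OneTermEqualSelector("spec.nodeName", nodeName).String()
		}))

	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*corev1.Pod)
			_ = pod // decorate node-local telemetry with pod metadata here
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop
}
```

With that in place, the per-node load is proportional to the pods on that node rather than to all pods in the cluster.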

Alternatives considered

The above will work well for single-application metrics, like HTTP golden-signal metrics for a pod, since all relevant pod information is about pods running on the same node. However, if I want to build a service graph, that approach won't work: I would also need pod information for pods running on other nodes, which defeats the purpose of the improvement. I had considered doing all IP -> Pod mapping in a deployment to enable that use case (a rough sketch follows below).
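For illustration, a minimal client-go sketch of that deployment-side IP -> Pod mapping; the index name and lookup are illustrative assumptions, not something Beyla implements today:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

const byPodIP = "podIP" // illustrative index name

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Cluster-wide pod informer, running in a small deployment rather than the DaemonSet.
	factory := informers.NewSharedInformerFactory(client, 0)
	podInformer := factory.Core().V1().Pods().Informer()

	// Index pods by their IP so telemetry endpoints can be resolved to pod metadata.
	if err := podInformer.AddIndexers(cache.Indexers{
		byPodIP: func(obj interface{}) ([]string, error) {
			pod := obj.(*corev1.Pod)
			if pod.Status.PodIP == "" || pod.Spec.HostNetwork {
				return nil, nil // skip host-network pods: their IP is the node IP
			}
			return []string{pod.Status.PodIP}, nil
		},
	}); err != nil {
		panic(err)
	}

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Resolve an IP seen in telemetry to the pod behind it.
	objs, err := podInformer.GetIndexer().ByIndex(byPodIP, "10.0.0.42")
	if err == nil && len(objs) > 0 {
		pod := objs[0].(*corev1.Pod)
		fmt.Printf("%s/%s\n", pod.Namespace, pod.Name)
	}
}
```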

The issue I ran into is filtering. At least on GKE, there is a bunch of traffic to things I don't really care about (e.g. kubelet health checks). I would like to be able to filter out things that aren't a pod, and only collect telemetry for pods, but I couldn't figure out how to do that (and couldn't think of a good way to implement it, either).

@dimunech

To expand on this: the Kubernetes metadata decorator adds considerable load to the Kubernetes API servers. Here's a graph of master node memory usage before and after disabling the decorator (yellow annotation on the graph).
[Screenshot: master node memory usage before/after disabling the decorator, 2024-05-31]
