
Limit impact on k8s apiserver in large clusters #824

Open
dashpole opened this issue May 8, 2024 · 1 comment

@dashpole
Contributor

dashpole commented May 8, 2024

What I would like to be able to do

I mentioned this briefly at the community meeting earlier today.

As a general best practice, DaemonSets should avoid watching a resource cluster-wide, such as watching all pods, all replicasets, all services, etc. Doing this can limit the maximum possible number of nodes in a cluster. It is acceptable to watch pods assigned to the same node as the DaemonSet pod. That actually generates less load on the kube-apiserver than a deployment with multiple replicas watching all pods, since the traffic is roughly O(pods * replicas) for the deployment. Ideally, I would like to be able to run Beyla with the following architecture:

  • The Beyla DaemonSet watches pods assigned to the same node as itself, and telemetry includes pod information.
  • A horizontally scaled deployment (e.g. the OpenTelemetry Collector) enriches telemetry with information about other k8s resources.

To do that, it would be nice to have more control over which k8s resources Beyla watches. This would typically be done using field selectors, similar to the Prometheus server's selectors config in kubernetes_sd_configs.
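For illustration, here is a minimal client-go sketch of the node-scoped watch I have in mind. It assumes NODE_NAME is injected via the downward API (fieldRef: spec.nodeName); the resync period and handler body are placeholders, not anything Beyla does today:

```go
package main

import (
	"os"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// NODE_NAME is assumed to be injected via the downward API (fieldRef: spec.nodeName).
	nodeName := os.Getenv("NODE_NAME")

	// The field selector restricts the LIST/WATCH to pods scheduled on this node,
	// so the apiserver only streams events for node-local pods.
	factory := informers.NewSharedInformerFactoryWithOptions(client, 30*time.Minute,
		informers.WithTweakListOptions(func(opts *metav1.ListOptions) {
			opts.FieldSelector = fields.OneTermEqualSelector("spec.nodeName", nodeName).String()
		}))

	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod := obj.(*corev1.Pod)
			_ = pod // decorate node-local telemetry with pod metadata here
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop
}
```

With that in place, the per-node load is proportional to the pods on that node rather than to all pods in the cluster.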

Alternatives considered

The above will work well for single-application metrics, like HTTP golden-signal metrics for a pod, since all relevant pod information is about pods running on the same node. However, if I want to build a service graph, that approach won't work: I would also need pod information for pods running on other nodes, which defeats the purpose of the improvement. I had considered doing all IP -> Pod mapping in a deployment to enable that use case (a rough sketch follows below).
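For illustration, a minimal client-go sketch of that deployment-side IP -> Pod mapping; the index name and lookup are illustrative assumptions, not something Beyla implements today:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

const byPodIP = "podIP" // illustrative index name

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Cluster-wide pod informer, running in a small deployment rather than the DaemonSet.
	factory := informers.NewSharedInformerFactory(client, 0)
	podInformer := factory.Core().V1().Pods().Informer()

	// Index pods by their IP so telemetry endpoints can be resolved to pod metadata.
	if err := podInformer.AddIndexers(cache.Indexers{
		byPodIP: func(obj interface{}) ([]string, error) {
			pod := obj.(*corev1.Pod)
			if pod.Status.PodIP == "" || pod.Spec.HostNetwork {
				return nil, nil // skip host-network pods: their IP is the node IP
			}
			return []string{pod.Status.PodIP}, nil
		},
	}); err != nil {
		panic(err)
	}

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	// Resolve an IP seen in telemetry to the pod behind it.
	objs, err := podInformer.GetIndexer().ByIndex(byPodIP, "10.0.0.42")
	if err == nil && len(objs) > 0 {
		pod := objs[0].(*corev1.Pod)
		fmt.Printf("%s/%s\n", pod.Namespace, pod.Name)
	}
}
```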

The issue I ran into is filtering. At least on GKE, there is a bunch of traffic to things I don't really care about (e.g. kubelet health checks). I would like to be able to filter out things that aren't a pod, and only collect telemetry for pods, but I couldn't figure out how to do that (and couldn't think of a good way to implement it, either).

@dimunech

To expand on this: the Kubernetes metadata decorator adds considerable load to the Kubernetes API servers. Here's a graph of master node memory usage before and after disabling the decorator (yellow annotation on the graph).
[Screenshot: master node memory usage before/after disabling the decorator, 2024-05-31]
