
Delete metrics with policy label on policy delete #2256

Open
lambdanis opened this issue Mar 25, 2024 · 3 comments
Labels
area/metrics (Related to prometheus metrics), kind/enhancement (This improves or streamlines existing functionality)

Comments

@lambdanis
Contributor

The tetragon_policy_events_total metric has a policy label. When a TracingPolicy is deleted, the corresponding metrics are still exposed by Tetragon, leading to overhead (usually small, as policies are not churned frequently, but it might be significant in some cases).

There was a similar problem with metrics that have a pod label. Now when a pod is deleted, Tetragon deletes its metrics from the registry (see https://github.com/cilium/tetragon/blob/main/pkg/metrics/metricwithpod.go).

We should add a similar handler for policy delete events. One small gotcha is that at the moment metrics don't distinguish between clusterwide and namespaced policies, so we should first add another label to distinguish between different policy kinds.
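
Roughly, the policy cleanup could follow the same pattern as the pod one: keep track of all metric vectors carrying a policy label and drop the matching series when the policy goes away. A minimal sketch (not the actual code; the helper names RegisterPolicyMetric and DeleteMetricsForPolicy are made up) using client_golang's DeletePartialMatch:

```go
package policymetrics

import "github.com/prometheus/client_golang/prometheus"

// deletableVec is satisfied by *prometheus.CounterVec, *prometheus.GaugeVec,
// *prometheus.HistogramVec, etc., which all embed *prometheus.MetricVec.
type deletableVec interface {
	DeletePartialMatch(labels prometheus.Labels) int
}

// metricsWithPolicy tracks every metric vector that carries a "policy" label.
var metricsWithPolicy []deletableVec

// RegisterPolicyMetric remembers a metric vector so its series can be cleaned
// up when the policy they refer to is deleted.
func RegisterPolicyMetric(vec deletableVec) {
	metricsWithPolicy = append(metricsWithPolicy, vec)
}

// DeleteMetricsForPolicy drops every series whose "policy" label matches the
// deleted policy, across all registered metric vectors.
func DeleteMetricsForPolicy(policyName string) {
	for _, vec := range metricsWithPolicy {
		vec.DeletePartialMatch(prometheus.Labels{"policy": policyName})
	}
}
```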

@lambdanis added the kind/enhancement and area/metrics labels on Mar 25, 2024
@prateek041
Contributor

Hello @lambdanis
Just to make sure I understand the issue properly:

We are talking about the tetragon_policy_events_total metric, which does not have a handler like the pod delete handler registered in main, making sure that when a Tetragon policy is deleted, the corresponding metrics are also deleted.

So, to resolve this, the following things need to be done:

  • Add a new label that can tell if a policy is clusterwide or namespaced. (What do you suggest the name of that label be?)
  • Implement and register a handler that will make sure tracing policy metrics are also deleted when the policy itself is deleted.

If I got it right, what is the deadline for resolving this issue? I would like to work on it.

@lambdanis
Contributor Author

lambdanis commented Apr 23, 2024

Hi @prateek041

Add a new label that can tell if a policy is clusterwide or namespaced. (What do you suggest the name of that label be?)

Yes, maybe "policy_kind" and "policy_namespace" (empty for clusterwide) would be good.
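
For illustration, the metric definition could then look roughly like this (a sketch only; the exact label set and Help text are assumptions):

```go
package policymetrics

import "github.com/prometheus/client_golang/prometheus"

// Hedged sketch: one possible shape for tetragon_policy_events_total with the
// suggested policy_kind and policy_namespace labels (namespace left empty for
// clusterwide policies).
var policyEventsTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Namespace: "tetragon",
		Name:      "policy_events_total",
		Help:      "Number of policy events, per policy.",
	},
	[]string{"policy", "policy_kind", "policy_namespace"},
)

// Example usage for a namespaced policy:
//   policyEventsTotal.WithLabelValues("my-policy", "TracingPolicyNamespaced", "default").Inc()
// and for a clusterwide one:
//   policyEventsTotal.WithLabelValues("my-policy", "TracingPolicy", "").Inc()
```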

Implement and register a handler that will make sure tracing policy metrics are also deleted when the policy itself is deleted.

Something similar. I think the metrics deletion logic should be triggered from DeleteTracingPolicy in the sensors manager rather than from the k8s API watcher (WatchTracePolicy). That way stale metrics will also be cleaned up if policies are created via the Tetragon API rather than via a k8s CRD.
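
Roughly something like this (a simplified sketch, not the real sensors manager code; the struct and method signature are assumptions):

```go
package sensors

import "context"

// Hedged sketch: a simplified stand-in for the sensors manager, only to show
// where a metrics cleanup call could live so stale series are removed no
// matter how the policy was created (k8s CRD or Tetragon API).
type Manager struct {
	// policy bookkeeping elided
}

func (m *Manager) DeleteTracingPolicy(ctx context.Context, name string) error {
	// 1. Unload the policy's sensors and remove it from internal state (elided).

	// 2. Drop all metric series labeled with this policy, e.g. via the
	//    hypothetical helper sketched earlier in this thread:
	//    policymetrics.DeleteMetricsForPolicy(name)

	return nil
}
```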

If I got it right, what is the deadline for resolving this issue? I would like to work on it.

No deadline. I don't recall any reports of this being a big problem, so it's just a nice-to-have. Feel free to open a PR, or if anything is unclear then post here.

@prateek041
Contributor

Sure, thanks for the pointers. I have started working on it. Please assign it to me.
