
Delete metrics with policy label on policy delete #2256

Open
lambdanis opened this issue Mar 25, 2024 · 3 comments
Labels
area/metrics (Related to prometheus metrics), kind/enhancement (This improves or streamlines existing functionality)

Comments

@lambdanis
Contributor

The tetragon_policy_events_total metric has a policy label. When a TracingPolicy is deleted, the corresponding metrics are still exposed by Tetragon, leading to overhead (usually small, as policies are not churned frequently, but it might be significant in some cases).

There was a similar problem with metrics that have a pod label. Now when a pod is deleted, Tetragon deletes its metrics from the registry (see https://github.com/cilium/tetragon/blob/main/pkg/metrics/metricwithpod.go).

We should add a similar handler for policy delete events. One small gotcha is that at the moment metrics don't distinguish between clusterwide and namespaced policies, so we should first add another label to distinguish between different policy kinds.
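
Roughly, the policy cleanup could follow the same pattern as the pod one: keep track of all metric vectors carrying a policy label and drop the matching series when the policy goes away. A minimal sketch (not the actual code; the helper names RegisterPolicyMetric and DeleteMetricsForPolicy are made up) using client_golang's DeletePartialMatch:

```go
package policymetrics

import "github.com/prometheus/client_golang/prometheus"

// deletableVec is satisfied by *prometheus.CounterVec, *prometheus.GaugeVec,
// *prometheus.HistogramVec, etc., which all embed *prometheus.MetricVec.
type deletableVec interface {
	DeletePartialMatch(labels prometheus.Labels) int
}

// metricsWithPolicy tracks every metric vector that carries a "policy" label.
var metricsWithPolicy []deletableVec

// RegisterPolicyMetric remembers a metric vector so its series can be cleaned
// up when the policy they refer to is deleted.
func RegisterPolicyMetric(vec deletableVec) {
	metricsWithPolicy = append(metricsWithPolicy, vec)
}

// DeleteMetricsForPolicy drops every series whose "policy" label matches the
// deleted policy, across all registered metric vectors.
func DeleteMetricsForPolicy(policyName string) {
	for _, vec := range metricsWithPolicy {
		vec.DeletePartialMatch(prometheus.Labels{"policy": policyName})
	}
}
```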

@lambdanis added the kind/enhancement and area/metrics labels on Mar 25, 2024
@prateek041
Contributor

Hello @lambdanis
Just to make sure I understand the issue properly:

We are talking about the tetragon_policy_events_total metric, which does not have a handler like the pod delete handler registered in main, making sure that when a Tetragon policy is deleted, the corresponding metrics are also deleted.

So, to resolve this, the following things need to be done:

  • Add a new label that can tell if a policy is clusterwide or namespaced. (What do you suggest the name of that label be?)
  • Implement and register a handler that will make sure tracing policy metrics are also deleted when the policy itself is deleted.

If I got it right, what is the deadline for resolving this issue? I would like to work on it.

@lambdanis
Contributor Author

lambdanis commented Apr 23, 2024

Hi @prateek041

Add a new label that can tell if a policy is clusterwide or namespaced. (What do you suggest the name of that label be?)

Yes, maybe "policy_kind" and "policy_namespace" (empty for clusterwide) would be good.
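
For illustration, the metric definition could then look roughly like this (a sketch only; the exact label set and Help text are assumptions):

```go
package policymetrics

import "github.com/prometheus/client_golang/prometheus"

// Hedged sketch: one possible shape for tetragon_policy_events_total with the
// suggested policy_kind and policy_namespace labels (namespace left empty for
// clusterwide policies).
var policyEventsTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Namespace: "tetragon",
		Name:      "policy_events_total",
		Help:      "Number of policy events, per policy.",
	},
	[]string{"policy", "policy_kind", "policy_namespace"},
)

// Example usage for a namespaced policy:
//   policyEventsTotal.WithLabelValues("my-policy", "TracingPolicyNamespaced", "default").Inc()
// and for a clusterwide one:
//   policyEventsTotal.WithLabelValues("my-policy", "TracingPolicy", "").Inc()
```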

Implement and register a handler that will make sure tracing policy metrics are also deleted when the policy itself is deleted.

Something similar. I think the metrics deletion logic should be triggered from DeleteTracingPolicy in the sensors manager rather than from the k8s API watcher (WatchTracePolicy). That way stale metrics will also be cleaned up if policies are created via the Tetragon API rather than via a k8s CRD.
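
Roughly something like this (a simplified sketch, not the real sensors manager code; the struct and method signature are assumptions):

```go
package sensors

import "context"

// Hedged sketch: a simplified stand-in for the sensors manager, only to show
// where a metrics cleanup call could live so stale series are removed no
// matter how the policy was created (k8s CRD or Tetragon API).
type Manager struct {
	// policy bookkeeping elided
}

func (m *Manager) DeleteTracingPolicy(ctx context.Context, name string) error {
	// 1. Unload the policy's sensors and remove it from internal state (elided).

	// 2. Drop all metric series labeled with this policy, e.g. via the
	//    hypothetical helper sketched earlier in this thread:
	//    policymetrics.DeleteMetricsForPolicy(name)

	return nil
}
```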

If I got it right, what is the deadline for resolving this issue? I would like to work on it.

No deadline. I don't recall any reports of this being a big problem, so it's just a nice-to-have. Feel free to open a PR, or if anything is unclear then post here.

@prateek041
Contributor

Sure, thanks for the pointers. I have started working on it. Please assign it to me.
