Rate limit feature for on_kubernetes_warning_event #1173

Open
otherguy opened this issue Nov 17, 2023 · 5 comments
Labels
needs-triage This issue should be reviewed and tagged appropriately

Comments

@otherguy

otherguy commented Nov 17, 2023

We sometimes have GCP acting up and causing false positives in our alerts. This is our current configuration, but we still get false positives at times.

- triggers:
  - on_kubernetes_warning_event_create:
      include: [ "FailedGetPodsMetric", "FailedGetExternalMetric" ]
      exclude: [ "googleapi: Error 503", "googleapi: Error 429", "No recommendation" ]

Some time ago, @Avi-Robusta built a custom image to test a few additional options to the trigger:

  • rate_limit
  • min_count
  • delay_s

But I don't see those in the documentation, so they probably never made it into a release. It would be great if those could be included so that, for example, the trigger only fires once the Kubernetes Warning event has been firing for a certain length of time or has fired a certain number of times.
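
To make the requested behaviour concrete, here is a minimal Python sketch of the gating logic. It is not Robusta code and not the implementation from the custom image. The class `WarningEventGate`, the `forget_after_s` setting, the event key format, and the reading of `min_count` and `delay_s` below are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class WarningEventGate:
    """Hypothetical helper (not part of Robusta): suppresses short-lived warning events."""

    min_count: Optional[int] = None   # assumed meaning: fire once the same event was seen this many times
    delay_s: Optional[float] = None   # assumed meaning: fire once the same event has persisted this long
    forget_after_s: float = 600.0     # hypothetical: drop history if the event stays quiet this long
    # Maps event key -> (first_seen, last_seen, count)
    _seen: Dict[str, Tuple[float, float, int]] = field(default_factory=dict)

    def should_fire(self, key: str, now: float) -> bool:
        first, last, count = self._seen.get(key, (now, now, 0))
        if now - last > self.forget_after_s:
            # The event went quiet for a while; treat this occurrence as a fresh start.
            first, count = now, 0
        count += 1
        self._seen[key] = (first, now, count)

        if self.min_count is None and self.delay_s is None:
            return True  # no thresholds configured: pass everything through
        if self.min_count is not None and count >= self.min_count:
            return True
        return self.delay_s is not None and now - first >= self.delay_s


gate = WarningEventGate(min_count=3, delay_s=120)
key = "FailedGetExternalMetric/my-hpa"  # hypothetical key: event reason plus object name
results = [gate.should_fire(key, now=t) for t in (0, 30, 60)]
```

With these settings the first two occurrences are suppressed and the third one passes, so a warning that clears after a few seconds would never page anyone. Either threshold is enough to fire here; requiring both would be an equally reasonable design.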

pavangudiwada added the needs-triage label Nov 21, 2023
@aantn
Collaborator

aantn commented Nov 25, 2023

Hi @otherguy,
We have the rate_limit param included today.

The branch adding the other changes is found here but it's been delayed in merging for now. We plan to get back to it, but it will take a little time due to backlog on our end.

@aantn
Collaborator

aantn commented Nov 25, 2023

@pavangudiwada can you update the docs for on_kubernetes_warning_event_create (and any related triggers) to add rate_limit?

@otherguy
Author

@aantn @pavangudiwada any update on this? 😃 We're getting a lot of false positives with GKE Prometheus. They go away after a few seconds or minutes, and we get alerted for nothing. I've excluded this alert from our pages, but it would be great to have OpsGenie pages for real alerts.

[Screenshot: CleanShot 2023-12-12 at 12 39 06@2x]

@aantn
Collaborator

aantn commented Dec 17, 2023

@otherguy reaching out to you about this on Slack.

@otherguy
Author

otherguy commented Feb 9, 2024

I was wondering if there is any news about that :)
