Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alerting: added time based restrictions to alert rules #33075

Closed
wants to merge 3 commits into from

Conversation

bblazei
Copy link

@bblazei bblazei commented Apr 16, 2021

anyone else sick of these +1 emails that go on for years and years and years....??

well i had some extra time this week and felt like giving this a go ;)

here's what the alert tab looks like with my changes
grafana_alert_layout

features:

  • db-agnostic as long as regular alerting is supported
  • panel level time restrictions
  • condition level time restrictions
  • restrictions by day and/or time
  • backward compatible with existing dashboards (hopefully)

tried to minimize touch points as i'm pretty sure this entire alerting framework will change sooner than later. actually had a merge conflict even though i pulled a few days ago.

it feels like the alerts started as a super simple straightforward query check (1 condition). then as different needs came up, it slowly got augmented which all the other bells and whistles we see today, like thresholds, frequency, emails, dashboards, etc. however, under the covers, there is still a strong dependency to check specific 'conditions'. based on my interaction with it so far, it doesn't really seem like this is a sustainable design moving forward.

also see this ticket comment: #7832 (comment)

i really had to shoe-horn the idea of time to the backend evaluators, especially the idea of a 'parent' time for the whole panel. is it the cleanest code i've written? i'm sure there's a much cleaner way to incorporate with the existing AlertEvaluator's, but like i said, there's a lot of churn in these files and i'm trying to balance my time with minimizing touch points.

some gotchas:

  • dashboard must be saved in order for alerting to take place, backend db is what drives the overall framework
  • timezone is based on server where grafana backend is running
    -- there is a context level 'starttime' parameter that gets set when the alerting check is kicked off (driven by top-level 'evaluate every' input)
    -- that time parameter is what drives the alert range time and there is no guarantee when it actually runs, so be careful about super specific time requirements, probably not good enough for 12:00:01
  • no hard checks on min [0, 59] or hour [0-23]
  • on the dashboard, the 'Test Rule' doesn't evaluate the parent time restriction, only the condition level; so triple-check that's correct
  • did not test this super thoroughly for backwards compatability (did make some effort); the only additional json changes are labeled 'timeEvaluator' so make a new dashboard and look at it's json to get an idea if you run into trouble
  • no validation stop on parent level parameters so make sure those are accurate when saving, however will log any errors

this fork is off of latest master 4/15/21, but i'm thinking this could be backported to whatever version first had an alerting framework. the files in the framework have changed slightly so i don't think the patch will necessarily apply cleanly, however it should be obvious where the changes go.

i have a pretty good handle on the alerting framework now, so if you guys want anything else lemme know.
unless it's javascript... i battled that render maze long enough today.

ps grafana hmu, my fees are reasonable

@CLAassistant
Copy link

CLAassistant commented Apr 16, 2021

CLA assistant check
All committers have signed the CLA.

@bblazei
Copy link
Author

bblazei commented Apr 16, 2021

original ticket: #6592

@agido-malter
Copy link

It's sad to see that a long awaited feature isn't prioritized

@bblazei bblazei marked this pull request as ready for review September 22, 2021 15:08
@bblazei bblazei requested a review from a team as a code owner September 22, 2021 15:08
@bblazei bblazei requested review from a team, kminehart, ying-jeanne, jackw and ashharrison90 and removed request for a team September 22, 2021 15:08
@bblazei bblazei marked this pull request as draft September 22, 2021 15:10
@gotjosh
Copy link
Contributor

gotjosh commented Oct 11, 2021

Apologies for the delay in response to this, but sadly, we're trying not to accept any more legacy alerting changes.

With that said, If, for some reason, we're not ready with this feature for Grafana Alerts 8, I'm happy to revisit this topic and try to get this merged.

@grafanabot
Copy link
Contributor

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 2 weeks if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@grafanabot grafanabot added the stale Issue with no recent activity label Mar 23, 2022
@gotjosh
Copy link
Contributor

gotjosh commented Mar 23, 2022

We've introduce this feature within Grafana now with https://grafana.com/docs/grafana/next/alerting/unified-alerting/notifications/mute-timings/.

Thank you very much for your contribution!

@gotjosh gotjosh closed this Mar 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/backend area/frontend pr/external This PR is from external contributor stale Issue with no recent activity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants