ref(rules): Fire delayed rules #69830
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #69830 +/- ##
===========================================
+ Coverage 58.39% 79.85% +21.45%
===========================================
Files 6490 6504 +14
Lines 288802 289487 +685
Branches 49750 49854 +104
===========================================
+ Hits 168641 231158 +62517
+ Misses 119745 57918 -61827
+ Partials 416 411 -5
Force-pushed 98611c3 to e109f10 (compare)
notification_uuid = str(uuid.uuid4())
rule_fire_history = history.record(rule, group, event.event_id, notification_uuid)
activate_downstream_actions(
    rule, event.for_group(group), notification_uuid, rule_fire_history
We'll also need to get the occurrence for the error here if this is an issue platform event. We should have this available in post process, so maybe the best option is to include the occurrence id along with the event id, like <event_id>::<occurrence_id>, and then we just decode it.
The other option is to query snuba to get it, but we're probably better off just passing it along.
We could probably do the for_group stuff in get_group_id_to_event potentially, so that we just have a bunch of GroupEvent objects with/without occurrences ready to go.
Do error events have an occurrence id too, or would it just be the issue platform ones having it?
No, only issue platform events.
Starting to wonder if we should just be storing a json blob in the value. That way we can store event_id, occurrence_id, and any other things we might need in the future.
I like the idea of storing this in JSON in case we need to add more fields.
Is there anywhere I can dive into to figure out all the differences between events (Error, issue platform, etc)? Just the event code itself?
The main difference is that an error event has no occurrence associated with it, while issue platform events do. Occurrences are basically extra metadata that gives us info about the specific event type. So for an N+1 db query perf issue it'd have info about the query that was repeated.
https://develop.sentry.dev/issue-platform/
https://www.notion.so/sentry/Issue-Platform-Overview-b5799988da494f11a435479ffb539391
Force-pushed 9cc9050 to 6684dee (compare)
Force-pushed 41e72e1 to 067f155 (compare)
Just wanna double check this, but do we need to worry about snoozed rules at all? I realize that we only add to the buffer if the rule is not snoozed in the first place, and at worst we have a minute delay, so theoretically an action could fire slightly more than a minute after a snooze.
Is it worth considering doing another snooze check in delayed processing?
sentry/src/sentry/rules/processing/processor.py
Lines 340 to 346 in 1730d73
snoozed_rules = RuleSnooze.objects.filter(rule__in=rules, user_id=None).values_list(
    "rule", flat=True
)
rule_statuses = bulk_get_rule_status(rules, self.group)
for rule in rules:
    if rule.id not in snoozed_rules:
        self.apply_rule(rule, rule_statuses[rule.id])
Seems pretty minor if we fire a rule slightly after it's snoozed, but probably makes sense to recheck. I'd put that in a separate pr to keep this simple
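The recheck itself would be small: given the set of snoozed rule ids (from the same kind of RuleSnooze query shown above), filter before firing. A minimal sketch with a stand-in Rule model, since the real query needs the Django ORM:

```python
from dataclasses import dataclass


@dataclass
class Rule:  # minimal stand-in for the Django Rule model
    id: int


def filter_snoozed_rules(rules: list[Rule], snoozed_rule_ids: set[int]) -> list[Rule]:
    # Drop any rule snoozed since it was buffered (re-check right before firing).
    return [rule for rule in rules if rule.id not in snoozed_rule_ids]
```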
group_to_groupevent: dict[Group, GroupEvent] = {}
groups = Group.objects.filter(id__in=group_ids)
group_id_to_group = {group.id: group for group in groups}
for rule_group, instance_id in rulegroup_to_events.items():
Does it make sense to parse these in the function that fetches rulegroup_to_events, and have the type be dict[tuple[int, int], <json>]? That way you don't need to parse it all in here.
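Something along these lines (a hypothetical sketch; it assumes the hash keys look like "<rule_id>:<group_id>" and the values are JSON blobs, which is an assumption about the buffer format, not confirmed by the PR):

```python
import json


def parse_rulegroup_to_events(raw: dict[str, str]) -> dict[tuple[int, int], dict]:
    # Split each "<rule_id>:<group_id>" key into an int tuple and decode the value
    # up front, so downstream code never touches the raw strings.
    parsed: dict[tuple[int, int], dict] = {}
    for key, value in raw.items():
        rule_id, group_id = key.split(":")
        parsed[(int(rule_id), int(group_id))] = json.loads(value)
    return parsed
```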
Do you mean in get_rules_to_groups? There isn't a function besides buffer.get_hash that gets rulegroup_to_events.
Ahh I hadn't checked and assumed there was a function for this. It's nbd, but it might be better to transform it separately and just pass it in. Feel free to ignore though
I'd like to add some instrumentation around each step in this code, could you add that as a follow up? Just so we know how long things are taking, how many groups we're processing in total, etc etc.
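A minimal shape for that follow-up (hypothetical helper; in Sentry this would presumably report through the metrics backend rather than a module-level dict, which is used here only so the sketch is self-contained):

```python
import time
from contextlib import contextmanager

step_durations: dict[str, float] = {}


@contextmanager
def timed_step(step: str):
    # Record how long each delayed-processing step takes so we can see
    # where time goes and how many groups we handle per run.
    start = time.monotonic()
    try:
        yield
    finally:
        step_durations[step] = time.monotonic() - start
```

Usage would wrap each bulky call, e.g. `with timed_step("get_group_to_groupevent"): ...`.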
def parse_rulegroup_to_event_data(
    rulegroup_to_event_data: dict[str, str]
) -> dict[tuple[str, str], dict[str, str]]:
Might be nice to define a dict type here for the value, but could be a follow up.
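A TypedDict would pin down the value shape (a sketch of the suggestion; the class name and the `total=False` choice are assumptions, since occurrence_id is only present for issue platform events):

```python
import json
from typing import TypedDict


class EventData(TypedDict, total=False):
    event_id: str
    occurrence_id: str  # only present for issue platform events


def parse_event_data(value: str) -> EventData:
    # Decode one buffer value into the typed shape.
    data: EventData = json.loads(value)
    return data
```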
    parsed_rulegroup_to_event_data, project.id, group_ids
)
for group, groupevent in group_to_groupevent.items():
    rule_statuses = bulk_get_rule_status(alert_rules, group, project)
Should this loop be inverted, so that we process each group, fetch all the statuses for it, and then fire it for each rule?
I was going to say maybe it doesn't make sense to fetch all the rules at once in this loop, but it looks like we cache them, so this should be ok for now
I think if I did that I'd end up fetching events and occurrences for groups for rules that didn't necessarily fire.
Gonna merge for now cause I need to rebase my other PR against this, and I've made some follow-up tickets for the dangling stuff - if this needs to be added later too, I can.
Follow up to #69830 (comment) to check the `RuleSnooze` table before firing a delayed rule, on the off chance it got muted in the < 1 minute it took to process. Closes getsentry/team-core-product-foundations#307
Add instrumentation and logging to the delayed rule processor to measure how long the bulkier functions are taking and how many rules and groups we're processing. Closes https://getsentry.atlassian.net/browse/ALRT-19 and getsentry/team-core-product-foundations#308 (a dupe) as a follow up to #69830 (review)
Follow up to #69167 to actually fire the rules
Closes https://github.com/getsentry/team-core-product-foundations/issues/242