feat(similarity-embedding): Create backfill script for inserting records #68466
Bundle Report
Changes will increase total bundle size by 9.79kB ⬆️
and event.group.data["metadata"].get("has_embeddings_record_v1")
)
):
grouping_info = get_grouping_info( |
follow up with any ramifications if grouping config changes
Is the idea that this only needs to be run on prod? If so, it might make sense to move to getsentry
group_id_batch = list(group_id_message_batch.keys())
time_now = datetime.now()
events_entity = Entity("events", alias="events") |
follow up: should this be the errors entity?
redis_client = redis.redis_clusters.get(settings.SENTRY_MONITORS_REDIS_CLUSTER)
if last_processed_id is None:
last_processed_id = int(redis_client.get(LAST_PROCESSED_REDIS_KEY) or 0) |
follow up: use project_id in this redis key
follow up: should we set a TTL on this redis key?
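One possible shape for the two Redis follow-ups above (a per-project key and a TTL), as a minimal sketch rather than the PR's actual code. The key's base string, the one-week TTL, and the helper names `make_last_processed_key`, `load_last_processed_id`, and `save_last_processed_id` are all assumptions for illustration; the client only needs redis-py style `get` and `set(..., ex=...)` methods.

```python
from typing import Optional

# Hypothetical base name; the real value of LAST_PROCESSED_REDIS_KEY is not shown in the diff.
LAST_PROCESSED_REDIS_KEY = "grouping_record_backfill.last_processed_id"
# Assumed TTL of one week, so a stale cursor eventually expires on its own.
KEY_TTL_SECONDS = 7 * 24 * 60 * 60


def make_last_processed_key(project_id: int) -> str:
    """Scope the cursor key to one project so backfills do not share state."""
    return f"{LAST_PROCESSED_REDIS_KEY}.{project_id}"


def load_last_processed_id(
    redis_client, project_id: int, last_processed_id: Optional[int] = None
) -> int:
    """Return the explicit cursor if given, else the stored one, else 0."""
    if last_processed_id is None:
        stored = redis_client.get(make_last_processed_key(project_id))
        last_processed_id = int(stored or 0)
    return last_processed_id


def save_last_processed_id(redis_client, project_id: int, last_processed_id: int) -> None:
    """Persist the cursor with an expiry (redis-py: set(name, value, ex=seconds))."""
    redis_client.set(
        make_last_processed_key(project_id), last_processed_id, ex=KEY_TTL_SECONDS
    )
```

With this layout, a backfill for one project cannot resume from another project's cursor, and the TTL is refreshed on every write.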
src/sentry/features/temporary.py
Outdated
@@ -276,6 +276,7 @@ def register_temporary_features(manager: FeatureManager):
manager.add("projects:similarity-embeddings", ProjectFeature, FeatureHandlerStrategy.INTERNAL)
manager.add("projects:similarity-embeddings-grouping", ProjectFeature, FeatureHandlerStrategy.INTERNAL)
manager.add("projects:similarity-embeddings-metadata", ProjectFeature, FeatureHandlerStrategy.INTERNAL)
manager.add("projects:similarity-embeddings-backfill", ProjectFeature, FeatureHandlerStrategy.INTERNAL) |
Do we want to use FeatureHandlerStrategy.OPTIONS? Or what kind of feature handler strategy were you thinking of?
Ah yes, good catch. I changed it to OPTIONS.
) Create endpoint to call the backfill script [here](#68466)
Co-authored-by: Josh Ferge <josh.ferge@sentry.io>
Co-authored-by: getsantry[bot] <66042841+getsantry[bot]@users.noreply.github.com>
Co-authored-by: Katie Byers <katie.byers@sentry.io>
Create script to backfill seer grouping records
Create a function that calls the seer bulk insert endpoint (https://github.com/getsentry/seer/pull/480) in batches of 20 groups, with a 10-second timeout per call
Add seer bulk insert function
Modify get_grouping_info function to take in an optional event
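The batching described in the list above could look roughly like the following. This is a sketch under stated assumptions, not the code in this PR: `send_batch` is a hypothetical callable standing in for the seer bulk insert request, and stopping at the first failed batch is an assumed policy, chosen so that a later run can resume from the last successful point.

```python
from typing import Callable, Iterator, List, Sequence

BATCH_SIZE = 20        # groups per bulk insert call, per the PR description
TIMEOUT_SECONDS = 10   # per-call timeout, per the PR description


def chunked(items: Sequence[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def bulk_insert_in_batches(
    records: Sequence[dict],
    send_batch: Callable[[List[dict], float], bool],
) -> int:
    """Send records in batches; return how many were inserted successfully.

    `send_batch(batch, timeout)` is a hypothetical hook that performs the
    request and returns True on success. Processing stops at the first
    failure (an assumed policy) so the caller knows where to resume.
    """
    inserted = 0
    for batch in chunked(records, BATCH_SIZE):
        if not send_batch(batch, TIMEOUT_SECONDS):
            break
        inserted += len(batch)
    return inserted
```

Passing the sender in as a callable keeps the batching logic testable without a network call.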
The script will be triggered by an endpoint, which will be added in a follow-up PR. For now the script will only be run for the s4s project, so it only needs to run once.
closes https://github.com/getsentry/getsentry/issues/13420