feat(similarity-embedding): Create backfill script for inserting records #68466

Merged (33 commits) on May 14, 2024

Member

@jangjodi jangjodi commented Apr 8, 2024

- Create script to backfill seer grouping records
- Create function to call seer bulk insert endpoint (https://github.com/getsentry/seer/pull/480) in batches of 20 groups with a 10 second timeout
- Add seer bulk insert function
- Modify get_grouping_info function to take in an optional event

The script will be triggered by an endpoint, which will be added in a follow-up PR. For now the script will only be run for the s4s project, so we only need to run it once.
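
As a rough illustration of the batching described above (20 groups per request, 10 second timeout), here is a minimal sketch. It is not the code from this PR: `chunked`, `backfill_in_batches`, and the `send_batch` callback are hypothetical names, and the real call to the seer bulk insert endpoint is left to the caller.

```python
from collections.abc import Callable, Iterator, Sequence

BATCH_SIZE = 20       # groups per request, per the PR description
TIMEOUT_SECONDS = 10  # per-request timeout, per the PR description


def chunked(items: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


def backfill_in_batches(
    group_ids: Sequence[int],
    send_batch: Callable[[Sequence[int], int], bool],
) -> int:
    """Send group ids batch by batch and return how many groups were sent.

    `send_batch` is a hypothetical stand-in for the bulk insert call. It
    receives one batch plus the timeout and returns True on success.
    Processing stops at the first failed batch so a rerun can resume there.
    """
    sent = 0
    for batch in chunked(group_ids, BATCH_SIZE):
        if not send_batch(batch, TIMEOUT_SECONDS):
            break
        sent += len(batch)
    return sent
```

Passing the sender in as a callback keeps the batching logic testable without a running seer instance.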

closes https://github.com/getsentry/getsentry/issues/13420

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Apr 8, 2024
codecov bot commented Apr 8, 2024

Bundle Report

Changes will increase total bundle size by 9.79kB ⬆️

| Bundle name | Size | Change |
| --- | --- | --- |
| sentry-webpack-bundle-array-push | 26.3MB | 9.79kB ⬆️ |

src/sentry/seer/utils.py (outdated, resolved)
@jangjodi jangjodi marked this pull request as ready for review April 9, 2024 16:08
@jangjodi jangjodi requested review from a team, lobsterkatie and JoshFerge April 9, 2024 16:08
and event.group.data["metadata"].get("has_embeddings_record_v1")
)
):
grouping_info = get_grouping_info(
Member Author

follow up with any ramifications if grouping config changes

Member

@wedamija wedamija left a comment

Is the idea that this only needs to be run on prod? If so, it might make sense to move to getsentry

src/sentry/tasks/backfill_seer_grouping_records.py (outdated, resolved)
@jangjodi jangjodi changed the title feat(similarity-embedding): Create backfill script for inserting records ref(similarity-embedding): Add support for embeddings record backfill script Apr 10, 2024
@jangjodi jangjodi marked this pull request as draft April 10, 2024 18:53
@jangjodi jangjodi changed the title ref(similarity-embedding): Add support for embeddings record backfill script [WIP] ref(similarity-embedding): Add support for embeddings record backfill script Apr 10, 2024

group_id_batch = list(group_id_message_batch.keys())
time_now = datetime.now()
events_entity = Entity("events", alias="events")
Member

follow up: should this be the errors entity?


redis_client = redis.redis_clusters.get(settings.SENTRY_MONITORS_REDIS_CLUSTER)
if last_processed_id is None:
    last_processed_id = int(redis_client.get(LAST_PROCESSED_REDIS_KEY) or 0)
Member

follow up: use project_id in this redis key

Member

follow up: should we set a TTL on this redis key?
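
One way the two follow-ups above could look is sketched below: the key is scoped by `project_id` and written with an expiry. This is an assumption-laden sketch, not the merged code. The key format, the one-week TTL, and the helper names are all hypothetical; the client only needs `get(name)` and `set(name, value, ex=...)`, which matches the redis-py client interface.

```python
from typing import Any

# Hypothetical TTL: one week. The PR does not specify a value.
KEY_TTL_SECONDS = 7 * 24 * 60 * 60


def last_processed_key(project_id: int) -> str:
    """Build a per-project key (hypothetical format) so projects do not collide."""
    return f"grouping_record_backfill.last_processed_id.{project_id}"


def load_last_processed_id(client: Any, project_id: int) -> int:
    """Return the stored id, or 0 when the key is missing or has expired."""
    return int(client.get(last_processed_key(project_id)) or 0)


def store_last_processed_id(client: Any, project_id: int, group_id: int) -> None:
    """Persist progress with an expiry so stale keys clean themselves up."""
    client.set(last_processed_key(project_id), group_id, ex=KEY_TTL_SECONDS)
```

With a TTL in place, a backfill that is abandoned partway leaves no permanent key behind; the trade-off is that a rerun after expiry starts again from 0.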

@@ -276,6 +276,7 @@ def register_temporary_features(manager: FeatureManager):
manager.add("projects:similarity-embeddings", ProjectFeature, FeatureHandlerStrategy.INTERNAL)
manager.add("projects:similarity-embeddings-grouping", ProjectFeature, FeatureHandlerStrategy.INTERNAL)
manager.add("projects:similarity-embeddings-metadata", ProjectFeature, FeatureHandlerStrategy.INTERNAL)
manager.add("projects:similarity-embeddings-backfill", ProjectFeature, FeatureHandlerStrategy.INTERNAL)
Member

Do we want to use FeatureHandlerStrategy.OPTIONS? Or what kind of feature handler strategy were you thinking of?

Member Author

ah yes good catch, I changed it to options

@jangjodi jangjodi merged commit 33d4970 into master May 14, 2024
51 checks passed
@jangjodi jangjodi deleted the jodi/similarity-embeddings-backfill branch May 14, 2024 21:05
JoshFerge added a commit that referenced this pull request May 14, 2024
)

Create endpoint to call the backfill script
[here](#68466)

---------

Co-authored-by: Josh Ferge <josh.ferge@sentry.io>
Co-authored-by: getsantry[bot] <66042841+getsantry[bot]@users.noreply.github.com>
Co-authored-by: Katie Byers <katie.byers@sentry.io>
@github-actions github-actions bot locked and limited conversation to collaborators May 30, 2024

5 participants