New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(release-health): Sessions should not be default #67353
Conversation
Sessions does not even exist anymore, the metrics pipeline should be used for release health. Flip the default.
Can you add a scope to the pr title |
done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@@ -21,7 +21,7 @@ | |||
from sentry.testutils.skips import requires_snuba | |||
from sentry.utils.samples import load_data | |||
|
|||
pytestmark = [requires_snuba] | |||
pytestmark = [pytest.mark.sentry_metrics, requires_snuba] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any attempt to access metrics in tests seems to fail without it
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #67353 +/- ##
==========================================
- Coverage 79.43% 79.40% -0.04%
==========================================
Files 6372 6373 +1
Lines 282294 282347 +53
Branches 48667 48679 +12
==========================================
- Hits 224247 224202 -45
- Misses 57676 57774 +98
Partials 371 371
|
SENTRY_RELEASE_HEALTH = "sentry.release_health.sessions.SessionsReleaseHealthBackend" | ||
SENTRY_RELEASE_HEALTH = "sentry.release_health.metrics.MetricsReleaseHealthBackend" | ||
SENTRY_RELEASE_HEALTH_OPTIONS: dict[str, Any] = {} | ||
|
||
# Release Monitor | ||
SENTRY_RELEASE_MONITOR = ( | ||
"sentry.release_health.release_monitor.sessions.SessionReleaseMonitorBackend" | ||
) | ||
SENTRY_RELEASE_MONITOR = "sentry.release_health.release_monitor.metrics.MetricReleaseMonitorBackend" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this cause data loss in self-hosted? I don't remember if self-hosted ever had release health but if they did, they will lose access to that data now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm not sure we ever put a proper migration in place for self hosted. We should make sure that self hosted keeps using the old backend until we do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this won't, self-hosted has already cut over to use metrics release health.
https://github.com/getsentry/self-hosted/blob/b3d3ce06da1661eca62a5d1fd5112810d6bbd117/sentry/sentry.conf.example.py#L195
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we actually backfill release health from sessions over to metrics? lgtm if so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We implemented a period where we dual wrote to sessions/metrics datasets. Then, after 3 months(~90 days), we cut over to reading from the metrics dataset
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does that work in self hosted? What if they come from a version that had sessions, and come straight to latest release using metric. How do you enforce a 90 day dual write there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately we weren't able to provide a way to smoothly accommodate that scenario. We gave self-hosted users as many warnings as we could through documentation and release notes months in advance before cutting over the metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I don't think this is a good way to handle situations like this going forward. We could have provided a dual write backend that logs a date, dual writes, and starts reading from sessions going forward. Another option could have been to write a backfill that is only applied in self hosted. We really shouldn't be making self hosted a second class citizen like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree with you there. There are definitely ways that this could've gone better but it also feels like a part of the broader issue that teams are shipping features without keeping self-hosted in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems fine to me for self-hosted, we're already using the new default settings
@jjbayer @iker-barriocanal Do you want me to remove or fix these tests? I'm not sure they are actually relevant anymore since this code is all deprecated, and should actually be removed (not just made non-default) |
@lynnagara what tests exactly? After we get rid of the sentry/tests/snuba/sessions/test_sessions.py Lines 21 to 41 in 6030fda
|
@jjbayer are you sure that parametrize_backend is actually parametrizing anything? From what I can tell it doesn't seem to actually run any of the release health tests. When I turn them on they all fail. |
@@ -167,6 +168,7 @@ def test_release_list_order_by_sessions_empty(self): | |||
response, [release_5, release_4, release_3, release_2, release_1] | |||
) | |||
|
|||
@pytest.mark.xfail(reason="Does not work with the metrics release health backend") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wedamija fyi - is this actually expected to work with the release health backend? if not, i can remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know why it doesn't work? I think it's probably testing something useful here that should work across backends
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likely because store_session is writing to snuba's sessions backend. I'm not really sure how to change it though, and would need someone more familiar with that code to take it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same for the sessions api tests fyi
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a great situation:
- It seems like we don't really have much (any?) coverage on many parts of release health
- Meanwhile we are running tons of CI that isn't relevant at all and is frankly a total waste of CI hours since it was testing the old backend that just doesn't even exist anymore but we didn't get around to cleaning up from our codebase
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wedamija how do we get this fixed then?
There are serious risks associated with having the default backend be something that just doesn't run in production. We have to remember to override it in ops, self hosted and everywhere we deploy sentry or stuff will be broken. In the meantime, are we just pretending we have existing coverage and are "disabling" something here when we don't have any real coverage anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to see if i can switch out to the old backend and preserve the tests at least
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok that doesn't work. @iambriccardo do you have any idea how to do what dan suggests about migrating store_session
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi! Let me take a look. A teammate of mine did the conversion, I know partially about the whole domain but let me see if I can be of any help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lynnagara I fixed the problem and pushed the changes directly here. I hope that is fine by you.
Yeah,
|
@@ -85,7 +85,7 @@ def new_create_snql_in_snuba(subscription, snuba_query, snql_query, entity_subsc | |||
is_performance_metrics = False | |||
is_metrics = False | |||
if isinstance(query.match, Entity): | |||
is_performance_metrics = query.match.name.startswith("generic_metrics") | |||
is_performance_metrics = query.match.name.startswith("generic") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the best fix but it was the easiest for the scope of this PR. We might want in the future to improve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, how does this change fix the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The acceptance tests were running into some endpoints which were querying generic metrics but via the entity key generic_org_metrics_counters
which is still on generic metrics but failed the generic_metrics
check. For this reason I relaxed the condition but a better solution would be to have some sort of mapping or inferring the use case id from the query and marking it as performance if it's not sessions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @iambriccardo for the fix.. unfortunately there are 4 more modules/tests marked xfail in this PR. Could we apply similar there?
@@ -86,6 +86,7 @@ def adjust_end(end: datetime.datetime, interval: int) -> datetime.datetime: | |||
return end | |||
|
|||
|
|||
@pytest.mark.xfail(reason="Does not work with the metrics release health backend") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iambriccardo any thoughts on this one?
Bundle ReportChanges will increase total bundle size by 37.0kB ⬆️
|
@@ -311,7 +311,7 @@ def make_session(i): | |||
duration=None, | |||
errors=0, | |||
# The line below is crucial to spread sessions throughout the time period. | |||
started=received - i, | |||
started=received - i - 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had to do this since the new metrics based implementation has the end
interval as non inclusive so we want data to fit within the interval because we use interval: x
in tests which is going back x
seconds from the current time.
Sessions does not even exist anymore, the metrics pipeline should be used for release health. Flip the default as all environments have been cut over.
Sessions does not even exist anymore, the metrics pipeline should be used for release health. Flip the default as all environments have been cut over.