Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(release-health): Sessions should not be default #67353

Merged
merged 16 commits into from Apr 3, 2024

Conversation

lynnagara
Copy link
Member

@lynnagara lynnagara commented Mar 20, 2024

Sessions does not even exist anymore, the metrics pipeline should be used for release health. Flip the default as all environments have been cut over.

Sessions does not even exist anymore, the metrics pipeline should
be used for release health. Flip the default.
@lynnagara lynnagara requested review from a team and jjbayer March 20, 2024 17:21
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Mar 20, 2024
@lynnagara lynnagara requested review from a team and hubertdeng123 March 20, 2024 17:22
@evanpurkhiser
Copy link
Member

Can you add a scope to the pr title

@lynnagara lynnagara changed the title feat: Sessions should not be default feat(release-health): Sessions should not be default Mar 20, 2024
@lynnagara
Copy link
Member Author

Can you add a scope to the pr title

done

Copy link
Member

@jjbayer jjbayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@@ -21,7 +21,7 @@
from sentry.testutils.skips import requires_snuba
from sentry.utils.samples import load_data

pytestmark = [requires_snuba]
pytestmark = [pytest.mark.sentry_metrics, requires_snuba]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any attempt to access metrics in tests seems to fail without it

Copy link

codecov bot commented Mar 20, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 79.40%. Comparing base (7c89c55) to head (16142cf).
Report is 23 commits behind head on master.

❗ Current head 16142cf differs from pull request most recent head 55401ea. Consider uploading reports for the commit 55401ea to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #67353      +/-   ##
==========================================
- Coverage   79.43%   79.40%   -0.04%     
==========================================
  Files        6372     6373       +1     
  Lines      282294   282347      +53     
  Branches    48667    48679      +12     
==========================================
- Hits       224247   224202      -45     
- Misses      57676    57774      +98     
  Partials      371      371              
Files Coverage Δ
src/sentry/conf/server.py 89.63% <100.00%> (ø)
src/sentry/release_health/metrics.py 95.55% <100.00%> (ø)
src/sentry/testutils/pytest/metrics.py 92.40% <ø> (-2.60%) ⬇️

... and 23 files with indirect coverage changes

Comment on lines -2334 to +2338
SENTRY_RELEASE_HEALTH = "sentry.release_health.sessions.SessionsReleaseHealthBackend"
SENTRY_RELEASE_HEALTH = "sentry.release_health.metrics.MetricsReleaseHealthBackend"
SENTRY_RELEASE_HEALTH_OPTIONS: dict[str, Any] = {}

# Release Monitor
SENTRY_RELEASE_MONITOR = (
"sentry.release_health.release_monitor.sessions.SessionReleaseMonitorBackend"
)
SENTRY_RELEASE_MONITOR = "sentry.release_health.release_monitor.metrics.MetricReleaseMonitorBackend"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this cause data loss in self-hosted? I don't remember if self-hosted ever had release health but if they did, they will lose access to that data now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I'm not sure we ever put a proper migration in place for self hosted. We should make sure that self hosted keeps using the old backend until we do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this won't, self-hosted has already cut over to use metrics release health.
https://github.com/getsentry/self-hosted/blob/b3d3ce06da1661eca62a5d1fd5112810d6bbd117/sentry/sentry.conf.example.py#L195

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we actually backfill release health from sessions over to metrics? lgtm if so

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We implemented a period where we dual wrote to sessions/metrics datasets. Then, after 3 months(~90 days), we cut over to reading from the metrics dataset

Copy link
Member

@wedamija wedamija Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does that work in self hosted? What if they come from a version that had sessions, and come straight to latest release using metric. How do you enforce a 90 day dual write there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we weren't able to provide a way to smoothly accommodate that scenario. We gave self-hosted users as many warnings as we could through documentation and release notes months in advance before cutting over the metrics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I don't think this is a good way to handle situations like this going forward. We could have provided a dual write backend that logs a date, dual writes, and starts reading from sessions going forward. Another option could have been to write a backfill that is only applied in self hosted. We really shouldn't be making self hosted a second class citizen like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree with you there. There are definitely ways that this could've gone better but it also feels like a part of the broader issue that teams are shipping features without keeping self-hosted in mind.

cc @chadwhitacre

Copy link
Member

@hubertdeng123 hubertdeng123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems fine to me for self-hosted, we're already using the new default settings

@lynnagara
Copy link
Member Author

lynnagara commented Mar 20, 2024

@jjbayer @iker-barriocanal Do you want me to remove or fix these tests? I'm not sure they are actually relevant anymore since this code is all deprecated, and should actually be removed (not just made non-default)

@jjbayer
Copy link
Member

jjbayer commented Mar 21, 2024

@jjbayer @iker-barriocanal Do you want me to remove or fix these tests? I'm not sure they are actually relevant anymore since this code is all deprecated, and should actually be removed (not just made non-default)

@lynnagara what tests exactly? After we get rid of the SessionsReleaseHealthBackend, we can also remove the @parametrize_backend helper, and make all test classes that use it go to the metrics backend instead.

def parametrize_backend(cls):
"""
hack to parametrize test-classes by backend. Ideally we'd move
over to pytest-style tests so we can use `pytest.mark.parametrize`, but
hopefully we won't have more than one backend in the future.
"""
assert isinstance(cls.backend, SessionsReleaseHealthBackend)
newcls = type(
f"{cls.__name__}MetricsLayer",
(BaseMetricsTestCase, cls),
{
"__doc__": f"Repeat tests from {cls} with metrics layer",
"backend": MetricsReleaseHealthBackend(),
"adjust_interval": True, # HACK interval adjustment for new MetricsLayer implementation
},
)
globals()[newcls.__name__] = newcls
return cls

@lynnagara
Copy link
Member Author

@jjbayer are you sure that parametrize_backend is actually parametrizing anything? From what I can tell it doesn't seem to actually run any of the release health tests. When I turn them on they all fail.

@@ -167,6 +168,7 @@ def test_release_list_order_by_sessions_empty(self):
response, [release_5, release_4, release_3, release_2, release_1]
)

@pytest.mark.xfail(reason="Does not work with the metrics release health backend")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wedamija fyi - is this actually expected to work with the release health backend? if not, i can remove it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know why it doesn't work? I think it's probably testing something useful here that should work across backends

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely because store_session is writing to snuba's sessions backend. I'm not really sure how to change it though, and would need someone more familiar with that code to take it up.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same for the sessions api tests fyi

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a great situation:

  • It seems like we don't really have much (any?) coverage on many parts of release health
  • Meanwhile we are running tons of CI that isn't relevant at all and is frankly a total waste of CI hours since it was testing the old backend that just doesn't even exist anymore but we didn't get around to cleaning up from our codebase

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wedamija how do we get this fixed then?
There are serious risks associated with having the default backend be something that just doesn't run in production. We have to remember to override it in ops, self hosted and everywhere we deploy sentry or stuff will be broken. In the meantime, are we just pretending we have existing coverage and are "disabling" something here when we don't have any real coverage anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to see if i can switch out to the old backend and preserve the tests at least

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok that doesn't work. @iambriccardo do you have any idea how to do what dan suggests about migrating store_session?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi! Let me take a look. A teammate of mine did the conversion, I know partially about the whole domain but let me see if I can be of any help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lynnagara I fixed the problem and pushed the changes directly here. I hope that is fine by you.

@lynnagara lynnagara requested review from a team as code owners April 1, 2024 20:21
@jjbayer
Copy link
Member

jjbayer commented Apr 2, 2024

@jjbayer are you sure that parametrize_backend is actually parametrizing anything? From what I can tell it doesn't seem to actually run any of the release health tests. When I turn them on they all fail.

Yeah, @parametrize_backend inserts a "metrics" version of each test into the global scope, so when you run test_sessions.py you'll see

tests/snuba/sessions/test_sessions.py::SnubaSessionsTestMetricsLayer::test_basic_release_model_adoptions PASSED
tests/snuba/sessions/test_sessions.py::SnubaSessionsTest::test_basic_release_model_adoptions PASSED

@@ -85,7 +85,7 @@ def new_create_snql_in_snuba(subscription, snuba_query, snql_query, entity_subsc
is_performance_metrics = False
is_metrics = False
if isinstance(query.match, Entity):
is_performance_metrics = query.match.name.startswith("generic_metrics")
is_performance_metrics = query.match.name.startswith("generic")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the best fix but it was the easiest for the scope of this PR. We might want in the future to improve this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, how does this change fix the tests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The acceptance tests were running into some endpoints which were querying generic metrics but via the entity key generic_org_metrics_counters which is still on generic metrics but failed the generic_metrics check. For this reason I relaxed the condition but a better solution would be to have some sort of mapping or inferring the use case id from the query and marking it as performance if it's not sessions.

Copy link
Member Author

@lynnagara lynnagara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @iambriccardo for the fix.. unfortunately there are 4 more modules/tests marked xfail in this PR. Could we apply similar there?

@@ -86,6 +86,7 @@ def adjust_end(end: datetime.datetime, interval: int) -> datetime.datetime:
return end


@pytest.mark.xfail(reason="Does not work with the metrics release health backend")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iambriccardo any thoughts on this one?

Copy link

codecov bot commented Apr 3, 2024

Bundle Report

Changes will increase total bundle size by 37.0kB ⬆️

Bundle name Size Change
sentry-webpack-bundle-array-push 26.11MB 37.0kB ⬆️

@@ -311,7 +311,7 @@ def make_session(i):
duration=None,
errors=0,
# The line below is crucial to spread sessions throughout the time period.
started=received - i,
started=received - i - 1,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had to do this since the new metrics based implementation has the end interval as non inclusive so we want data to fit within the interval because we use interval: x in tests which is going back x seconds from the current time.

@lynnagara lynnagara merged commit c4349cf into master Apr 3, 2024
49 checks passed
@lynnagara lynnagara deleted the sessions-is-not-default branch April 3, 2024 21:55
shellmayr pushed a commit that referenced this pull request Apr 10, 2024
Sessions does not even exist anymore, the metrics pipeline should be
used for release health. Flip the default as all environments have been
cut over.
@github-actions github-actions bot locked and limited conversation to collaborators Apr 19, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants