Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add basic schema and examples for ingest-events [INC-660] #230

Merged
merged 6 commits into from Mar 11, 2024

Conversation

lynnagara
Copy link
Member

@lynnagara lynnagara commented Mar 6, 2024

Adding the schema for ingest-events so we can perform more validations against clients (producers + consumers) on this topic.

Specifically, the are 4 known immediate use cases for this:

  • Exceptions raised in the ingest consumer can be checked against this schema and DLQed if they do not conform
  • Relay CI can check that messages produced on this topic conform to this schema
  • Sentry devserver will validate all incoming messages against this schema
  • Sentry CI can launch all consumers and pass these messages in and assert that they do not crash

This is based on https://github.com/getsentry/relay/blob/19230c0deabd41096830e46fbec821999dd51ee3/relay-server/src/services/store.rs#L1246-L1261

Adding the schema for ingest-events so we can perform more validations
against clients (producers + consumers) on this topic.

Specifically, the are 4 known immediate use cases for this:
- Exceptions raised in the ingest consumer can be checked against this schema and DLQed if they do not conform
- Relay CI can check that messages produced on this topic conform to this schema
- Sentry devserver will validate all incoming messages against this schema
- Sentry CI can run these examples against consumer code
@lynnagara lynnagara requested review from a team as code owners March 6, 2024 04:10
@@ -0,0 +1,14 @@
{
Copy link
Member Author

@lynnagara lynnagara Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To double check: are there other message types sharing this topic right now? User feedback?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Until we have migrated relay, SDK, etc to ingest-feedback-events

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I compared this to the example you added in #227.

event_id and type is there, but you have no project_id or payload in your schema? Is that actually true? How do you know what project the user feedback belongs to?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is probably project_id in every event, and I can add that. But don't want to make it required, since I don't think it's required for feedback events. (In fact it turns out the feedback fields themselves are all optional 😅)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link you sent represents the payload that the user sends to Relay though, not the one received on this topic which happens after Relay processing. I assume the project ID is automatically added by Relay and is not optional, as a user feedback that doesn't belong to any project doesn't really make any sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lynnagara lynnagara requested a review from a team March 8, 2024 18:20
@lynnagara
Copy link
Member Author

@jjbayer @Dav1dde can you take a look from the ingest side?

"properties": {
"type": { "type": "string" },
"payload": {
"description": "bytes"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "description": "bytes" purely information for the human reader, or do we have a validator that special-cases this for msgpack messages?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only for humans, this is not doing any validation

Comment on lines +7 to +9
"payload": {
"description": "bytes"
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JSON schema allows additionalProperties by default, right? Because Relay does send more fields: https://github.com/getsentry/relay/blob/87b23e6953cd18a0d591c9d13e43baee0d0d4b95/relay-server/src/services/store.rs#L1268-L1271

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does -- the example included here also has those extra fields and is passing the tests

@lynnagara lynnagara merged commit 171532b into main Mar 11, 2024
14 checks passed
@lynnagara lynnagara deleted the ingest-events branch March 11, 2024 16:22
lynnagara added a commit to getsentry/sentry that referenced this pull request Mar 15, 2024
This PR depends on getsentry/sentry-kafka-schemas#230

It attempts to be somewhat cautious about DLQing and not DLQ
anything that is even potentially retriable. This could be tweaked
later.

If a non-retriable exception is raised, the consumer tries to further determine whether the message
is actually bad (or it's some transient retriable error) by validating the message against
the schema. The message will be DLQed only if that also fails.
JonasBa pushed a commit to getsentry/sentry that referenced this pull request Mar 17, 2024
This PR depends on getsentry/sentry-kafka-schemas#230

It attempts to be somewhat cautious about DLQing and not DLQ
anything that is even potentially retriable. This could be tweaked
later.

If a non-retriable exception is raised, the consumer tries to further determine whether the message
is actually bad (or it's some transient retriable error) by validating the message against
the schema. The message will be DLQed only if that also fails.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants