Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(metric-outcomes): Add initial topic and schema #231

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
10 changes: 10 additions & 0 deletions examples/outcomes-metrics/1/outcomes-metrics-full.json
@@ -0,0 +1,10 @@
{
"timestamp": 1709812800,
"org_id": 1,
"project_id": 1,
"mri": "d:custom/request_duration@seconds",
"outcome": 2,
"reason": "foobar",
"quantity": 42,
"cardinality": 1337
}
@@ -0,0 +1,8 @@
{
"timestamp": 1709812800,
"org_id": 1,
"project_id": 1,
"mri": "c:custom/clicks@none",
"outcome": 0,
"cardinality": 1337
}
@@ -0,0 +1,8 @@
{
"timestamp": 1709812800,
"org_id": 1,
"project_id": 1,
"mri": "c:custom/clicks@none",
"outcome": 0,
"quantity": 1337
}
49 changes: 49 additions & 0 deletions schemas/outcomes-metrics.v1.schema.json
@@ -0,0 +1,49 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To what extent is the outcomes-metrics schema actually different from outcomes or is it mostly the same?

In Snuba, all of outcomes, outcomes-billing and loadbalancer-outcomes all share the exact same consumer code, and we just re-use the same schema in those scenarios. In the past we generally focused on having schemas reflect the minimum set of requirements for a consumer to work correctly. If the consumer is the same one, it might be simpler to use the existing outcomes schema for the new topic rather than maintain a separate one. On the other hand if the consumer code is different, I think a different schema here makes sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest differences to the existing outcomes are:

  • Metric outcomes are tracked by MRI (consumer is expected to parse type and namespace from the MRI and materialize it in the database)
  • Metric outcomes have quantity and cardinality tracked, where latter is aggregated max by hour instead of sum

We considered trying to make this work with the current outcomes but came to the conclusion that the metric outcomes are too different and we'd rather have a separate concept just for metrics.

"$schema": "http://json-schema.org/draft-07/schema#",
"title": "metrics-outcome",
"$ref": "#/definitions/MetricsOutcome",
"definitions": {
"MetricsOutcome": {
"properties": {
"timestamp": {
"description": "The UNIX timestamp in seconds when the outcome was created, optionally truncated to the last hour.",
"$ref": "#/definitions/U64"
},
"org_id": {
"description": "The organization for which this outcome is being sent.",
"$ref": "#/definitions/U64"
},
"project_id": {
"description": "The project for which this outcome is being sent.",
"$ref": "#/definitions/U64"
},
"mri": {
"description": "Valid metric resource identifier `<type>:<namespace>/<name>[@<unit>]` for which the outcome is being sent.",
"type": "string"
},
"outcome": {
"description": "Outcome ID, metric outcomes share the same numeric outcome ID with regular outcomes.",
"$ref": "#/definitions/U64"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should limit this to existing outcome IDs, or deliberately leave it open. I tend towards leaving it open.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to follow the outcomes schema and leave it open.

},
"reason": {
"description": "Optional machine readable, free-form reason code.",
"type": ["string", "null"]
},
"quantity": {
"description": "Amount of metric buckets accepted by Relay (volume).",
"type": "integer"
},
"cardinality": {
"description": "Maximum observed cardinality of the metric.",
"type": "integer"
Dav1dde marked this conversation as resolved.
Show resolved Hide resolved
}
},
"required": ["timestamp", "org_id", "project_id", "mri", "outcome"]
},
"U64": {
"type": "integer",
"minimum": 0,
"maximum": 18446744073709551615
}
}
}
16 changes: 16 additions & 0 deletions topics/outcomes-metrics.yaml
@@ -0,0 +1,16 @@
topic: outcomes-metrics
pipeline: outcomes
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is used primarily to identify a pipeline logically. @lynnagara and @untitaker can provide more context.
But it should be correct as we call outcomes everything that populates the outcomes and billing data,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the metrics outcomes eventually makes it's way to the shared outcomes table in clickhouse with all the other outcomes? So this pipeline is probably fine

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be parallel to the current outcomes with their own schema and from the looks of it in a separate clickhouse cluster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If everything is fully separate, I think you can change this to outcomes-metrics or something like that

description: Outcomes for metrics sent to Sentry
services:
producers:
- getsentry/relay
consumers:
- getsentry/snuba
- getsentry/super-big-consumers
Comment on lines +5 to +9

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this pipeline meant to bypass the current metrics outcomes pipeline that takes data in after the indexer ? If yes can we get rid of the old one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with any (other) metrics outcomes pipeline, so I assume the answer is yes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

schemas:
- version: 1
compatibility_mode: none
type: json
resource: outcomes-metrics.v1.schema.json
examples:
- outcomes-metrics/1