Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(metric-outcomes): Add initial topic and schema #231

Closed
wants to merge 7 commits into from

Conversation

Dav1dde
Copy link
Member

@Dav1dde Dav1dde commented Mar 8, 2024

Preliminary schema for the outcomes-metrics topic, work described and tracked in getsentry/relay#3147.

The cardinality description is vague on purpose, custom metrics will most likely have cardinality tracked per hour, but this is subject to change and may be different for other metric types (or even completely missing).

@Dav1dde Dav1dde requested review from a team as code owners March 8, 2024 08:50
@@ -0,0 +1,16 @@
topic: outcomes-metrics
pipeline: outcomes
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this correct?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is used primarily to identify a pipeline logically. @lynnagara and @untitaker can provide more context.
But it should be correct as we call outcomes everything that populates the outcomes and billing data,

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the metrics outcomes eventually makes it's way to the shared outcomes table in clickhouse with all the other outcomes? So this pipeline is probably fine

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be parallel to the current outcomes with their own schema and from the looks of it in a separate clickhouse cluster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If everything is fully separate, I think you can change this to outcomes-metrics or something like that

@Dav1dde Dav1dde self-assigned this Mar 8, 2024
@Dav1dde Dav1dde requested a review from a team March 8, 2024 08:50
Copy link
Member

@iker-barriocanal iker-barriocanal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I don't have context on how these schemas are used so you may want to get another approval.

},
"outcome": {
"description": "Outcome ID, metric outcomes share the same numeric outcome ID with regular outcomes.",
"$ref": "#/definitions/U64"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should limit this to existing outcome IDs, or deliberately leave it open. I tend towards leaving it open.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I decided to follow the outcomes schema and leave it open.

schemas/outcomes-metrics.v1.schema.json Outdated Show resolved Hide resolved
@@ -0,0 +1,16 @@
topic: outcomes-metrics
pipeline: outcomes

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is used primarily to identify a pipeline logically. @lynnagara and @untitaker can provide more context.
But it should be correct as we call outcomes everything that populates the outcomes and billing data,

Comment on lines +5 to +9
producers:
- getsentry/relay
consumers:
- getsentry/snuba
- getsentry/super-big-consumers

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this pipeline meant to bypass the current metrics outcomes pipeline that takes data in after the indexer ? If yes can we get rid of the old one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not familiar with any (other) metrics outcomes pipeline, so I assume the answer is yes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 37 to 40
"cardinality": {
"description": "Maximum observed cardinality of the metric.",
"$ref": "#/definitions/U64"
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd assume for "max cardinality" we mean the number of distinct combinations of tags keys:values observed in the bucket ? OR over a period of time ?
This would be useful information in the schema for who decides to consume from this topic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is over time, current plan for custom metrics is over the span of an hour (because that's what people are interested and and what Relay will be enforcing).

I'll try to improve the description, but I was a bit vague on purpose because the time frame may change and may be different per usecase.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to improve the description, but I was a bit vague on purpose because the time frame may change and may be different per usecase.

I think this may highlight a concern. If the time frame changes, that likely concerns both the producer and the consumers which can easily end up in incidents where the billing code miscounts metrics for assuming a wrong time frame. max_cardinality over a day vs over a minute is likely going to mean different things for the billing code.

I would suggest an alternative approach to make this more robust: add a time_frame field in the message that indicates what the time frame is.
This puts the responsibility to manage the time frame on the consumer in an explicit way and it allows you to change the time frame in a safe and backwards compatible way as the consumer immediately knows how to interpret the message

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduced cardinality_window which is required when a cardinality is emitted.

Comment on lines +33 to +36
"quantity": {
"description": "Amount of metric buckets accepted by Relay (volume).",
"$ref": "#/definitions/U64"
},

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as below. What's the time range we are talking about here? Is it fixed or is it dependent on relay bucketing time ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is independent of time, this is the amount of statsd elements Relay receives and parses out of envelopes without any aggregation.

@@ -0,0 +1,50 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To what extent is the outcomes-metrics schema actually different from outcomes or is it mostly the same?

In Snuba, all of outcomes, outcomes-billing and loadbalancer-outcomes all share the exact same consumer code, and we just re-use the same schema in those scenarios. In the past we generally focused on having schemas reflect the minimum set of requirements for a consumer to work correctly. If the consumer is the same one, it might be simpler to use the existing outcomes schema for the new topic rather than maintain a separate one. On the other hand if the consumer code is different, I think a different schema here makes sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The biggest differences to the existing outcomes are:

  • Metric outcomes are tracked by MRI (consumer is expected to parse type and namespace from the MRI and materialize it in the database)
  • Metric outcomes have quantity and cardinality tracked, where latter is aggregated max by hour instead of sum

We considered trying to make this work with the current outcomes but came to the conclusion that the metric outcomes are too different and we'd rather have a separate concept just for metrics.

Copy link
Member

@lynnagara lynnagara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems good to me assuming that there will be a fully separate snuba consumer (and possibly clickhouse schema) and it's not shared with existing outcomes. If this ends up not being the case, I'd suggest later merging this together with outcomes so there would be one less schema to maintain.

@Dav1dde
Copy link
Member Author

Dav1dde commented Mar 20, 2024

We've decided to use generic metrics for now, I'll be cleaning up the Kafka topic.

@Dav1dde Dav1dde closed this Mar 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants