Add guidance for adding new metrics #5116
base: dev
@BrynCooke, please consider creating a changeset entry in |
CI performance tests
|
Co-authored-by: Jesse Rosenberger <git@jro.cc>
a lot of what is encoded in this document is unclear to me, I think we should discuss it a bit more
## Adding new metrics
There are different types of metrics.

* Static - Used by us to monitor feature usage.
- * Static - Used by us to monitor feature usage.
+ * Static - Used by Router developers to monitor feature usage.
let's assume router users will end up looking at the dev docs
> Why are static metrics no longer recommended for users to use directly?
>
> They can, but usually it'll be only a starting point for them. We can't predict the things that users will want to monitor, and if we tried we would blow up the cardinality of our metrics resulting in high costs for our users via their APMs.
that is not clear to me. What do we mean by "users using static metrics directly?" Is it when they would add that in their custom plugin? (which would not increase cardinality for all users) Or asking us to add a new metric to the router?
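To make the cardinality concern in the quoted passage concrete: a rough upper bound on the number of time series a single metric can produce is the product of the distinct values each of its attributes can take. The sketch below is illustrative only. The function name and the attribute counts are invented for the example and are not Router code.

```rust
/// Upper bound on the number of distinct time series one metric can produce:
/// the product of the number of distinct values each attribute can take.
/// Saturates instead of overflowing for very large inputs.
fn series_upper_bound(distinct_values_per_attribute: &[u64]) -> u64 {
    distinct_values_per_attribute
        .iter()
        .fold(1u64, |acc, &n| acc.saturating_mul(n))
}

/// Hypothetical attribute sets for a single counter.
/// Low cardinality: 20 subgraphs x 5 status classes.
fn low_cardinality_example() -> u64 {
    series_upper_bound(&[20, 5]) // 100
}

/// Same counter with an unbounded-looking attribute added,
/// assumed here to take 10_000 distinct values.
fn high_cardinality_example() -> u64 {
    series_upper_bound(&[20, 5, 10_000]) // 1_000_000
}
```

Adding one high-cardinality attribute multiplies the series count for every user of the metric, which is the cost the quoted paragraph is warning about.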

### Static metrics
When adding a new feature to the Router you must also add new static metrics to monitor the usage of that feature and users cannot turn them off.
These metrics must be low cardinality and not leak any sensitive information. Users cannot change these metrics and they are primarily for us to see how our features are used so that we can inform future development.
a lot of static metrics actually monitor standard router operations and are not for us to collect data, but for users to observe the router.
If we want this to be the defining point, let's maybe not call them `static` VS `dynamic` metrics, but `internal` VS `monitoring` or `user` metrics, something like that?
I'd prefer we keep the distinction between `static` metrics as defined directly with `tracing`, and `dynamic` metrics as the ones defined by custom instruments, that can be activated with runtime conditions, and have another clear separation between the metrics used for internal reporting (as with the `apollo.router.operations` and `apollo.router.config` prefixes) and the user facing ones.
* Look at the [OTel semantic conventions](https://opentelemetry.io/docs/specs/semconv/general/metrics/)
* Notify `#proj-router-analytics` channel in Slack.
* Add the metrics to the spreadsheet linked in the `#proj-router-analytics` channel in Slack.
code example of what is a static metric, to be sure which is which between static and dynamic?
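As a possible starting point for such an example, here is a minimal model of the distinction as it is described in this thread: a static metric has its name fixed in code and is always recorded, while a dynamic one gets its name and condition from runtime configuration. This is a sketch under stated assumptions, not the Router's implementation. `Registry`, `Request`, `DynamicInstrument`, and both metric names are hypothetical stand-ins, and the real static metrics are emitted through `tracing` rather than through a type like this.

```rust
use std::collections::HashMap;

/// Hypothetical request context, used only for this sketch.
struct Request {
    operation_kind: &'static str, // e.g. "query" or "mutation"
}

/// Tiny in-memory stand-in for a metrics backend (not a Router type).
#[derive(Default)]
struct Registry {
    counters: HashMap<String, u64>,
}

impl Registry {
    fn increment(&mut self, name: &str) {
        *self.counters.entry(name.to_string()).or_insert(0) += 1;
    }

    fn get(&self, name: &str) -> u64 {
        self.counters.get(name).copied().unwrap_or(0)
    }
}

/// "Static" in the sense used here: the name is fixed in code and the
/// metric is recorded unconditionally. The feature name is made up.
fn record_static(registry: &mut Registry) {
    registry.increment("apollo.router.operations.example_feature");
}

/// "Dynamic" in the sense used here: name and condition are supplied
/// at runtime, for instance from user configuration.
struct DynamicInstrument {
    name: String,
    condition: Box<dyn Fn(&Request) -> bool>,
}

fn record_dynamic(
    registry: &mut Registry,
    instruments: &[DynamicInstrument],
    request: &Request,
) {
    for instrument in instruments {
        // Only record when the runtime condition matches this request.
        if (instrument.condition)(request) {
            registry.increment(&instrument.name);
        }
    }
}
```

In this model a user who only cares about mutations would configure one `DynamicInstrument` with a matching condition, and the static counter would keep counting every request regardless.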

When defining new operation metrics use the following conventions:

**Name:** `apollo.router.operations.<feature>` - (counter)
some of the `apollo.router.operations` metrics are actually monitored by users. What is the strategy here? Do we keep them available for users?
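For reference, the `apollo.router.operations.<feature>` naming convention from the diff could be captured in a small helper like the one below. This is only a sketch: the helper name is hypothetical, and the character rule (lowercase ASCII, digits, underscores, dots) is an assumption for illustration rather than something the PR specifies.

```rust
/// Prefix taken from the convention quoted in the diff.
const OPERATIONS_PREFIX: &str = "apollo.router.operations";

/// Builds `apollo.router.operations.<feature>`.
/// Returns `None` when `feature` is empty, starts or ends with a dot,
/// or contains characters outside the assumed set
/// (lowercase ASCII letters, digits, `_`, `.`).
fn operation_metric_name(feature: &str) -> Option<String> {
    let allowed = |c: char| {
        c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_' || c == '.'
    };
    let valid = !feature.is_empty()
        && feature.chars().all(allowed)
        && !feature.starts_with('.')
        && !feature.ends_with('.');
    if valid {
        Some(format!("{}.{}", OPERATIONS_PREFIX, feature))
    } else {
        None
    }
}
```

A check of this kind could run in a unit test over the list of declared metric names, so that a misnamed metric fails CI instead of relying on review.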
Until now there has been little guidance on adding new metrics to the router. This PR expands the dev doc to include this.