
WIP: Improve streaming metrics #24886

Closed

Conversation

ThisIsMissEm (Contributor)

This branches off from #24702 (as the package.json has been split); only the last two commits are new.

This pull request:

  • Splits the internal API from the external API for the streaming server
  • Migrates from hardcoded metrics to prom-client (see the sketch after the lists below)
    • Adds collection of the default metrics, as recommended by Prometheus.
    • Adds labels to the 'connected_clients' metric to allow analysis by connection type (HTTP vs. WebSocket).
    • Adds a 'streaming_latency' metric, which indicates the lag between when an event was published to Redis and when the streaming server consumed it.
    • Corrects the measurement of the 'connected_channels' metric to report the number of topic subscriptions to Redis, not the number of handlers subscribed to those topics. We may wish to change this in the future to use labelling, so you can get a breakdown of connections per channel type, and potentially messages per channel type.

* Adds labels to the 'connected_clients' metric to allow analysis by connection type (HTTP vs. WebSocket).

* Adds Node.js performance metrics.

* Corrects a measurement error in the 'connected_channels' metric to report only the number of subscriptions to Redis, not the number of handlers subscribed.

* Adds a 'streaming_latency' metric, which indicates the lag between when an event was published to Redis and when the streaming server consumed it.
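A minimal sketch of what the prom-client setup could look like (hypothetical; the help texts, label values, and exact options are assumptions, not this PR's code):

```js
// Hypothetical sketch of the prom-client migration described above; metric
// names mirror the PR, but help texts and label values are assumptions.
const metrics = require('prom-client');

// Default Node.js/process metrics, as recommended by Prometheus.
metrics.collectDefaultMetrics();

// Connected clients, labelled so HTTP and WebSocket connections can be
// analysed separately.
const connectedClients = new metrics.Gauge({
  name: 'connected_clients',
  help: 'Number of clients connected to the streaming server',
  labelNames: ['type'],
});

// Lag between an event being published to Redis and being consumed here.
const streamingLatency = new metrics.Histogram({
  name: 'streaming_latency',
  help: 'Milliseconds between publishing an event to Redis and consuming it',
});

// Example usage:
connectedClients.labels('websocket').inc();
connectedClients.labels('http').inc();
streamingLatency.observe(12);
```

The diff excerpt below shows the new /metrics endpoint handler, which triggered a CodeQL warning.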
    res.set('Content-Type', metrics.register.contentType);
    res.end(await metrics.register.metrics());
  } catch (ex) {
    res.status(500).end(ex);

Check warning (Code scanning / CodeQL)

Information exposure through a stack trace (Medium): the information exposed to the user depends on stack trace information.
ThisIsMissEm (Contributor, Author)
We should probably fix this, but the /metrics endpoint should never be exposed to the public internet anyway.
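One possible way to address the CodeQL warning while keeping the endpoint simple (a hedged sketch; the route wiring and error message are assumptions, not this PR's code) is to log the error server-side and return only a generic message:

```js
// Hypothetical /metrics handler that avoids leaking stack traces to clients.
// Assumes an Express-style `app` and the prom-client `metrics` module are in scope.
app.get('/metrics', async (req, res) => {
  try {
    res.set('Content-Type', metrics.register.contentType);
    res.end(await metrics.register.metrics());
  } catch (ex) {
    // Keep the details server-side; the client only sees a generic 500.
    console.error(ex);
    res.status(500).end('Error collecting metrics');
  }
});
```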

@@ -621,6 +692,8 @@ const startServer = async () => {
const delta = now - queued_at;
const encodedPayload = typeof payload === 'object' ? JSON.stringify(payload) : payload;

streamingLatency.observe(delta);
ThisIsMissEm (Contributor, Author)
Apparently queued_at is often undefined here.

ThisIsMissEm (Contributor, Author)
Looks like a lot of the events don't actually carry the queued_at property, and as such lag cannot be calculated.

// FIXME: most stream events don't actually carry a queued_at timestamp
// const now = new Date().getTime();
// const delta = now - queued_at;
// streamingLatency.observe(delta);
ThisIsMissEm (Contributor, Author)
This code didn't actually work as expected, as it was usually doing delta = now - undefined because most events didn't carry the queued_at timestamp.
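A possible guard, sketched here as an assumption rather than this PR's code, would be to record latency only when the payload actually carries a numeric queued_at:

```js
// Hypothetical guard around the latency observation; assumes `payload` and
// `streamingLatency` are in scope as in the surrounding diff.
const queued_at = payload.queued_at;

if (typeof queued_at === 'number') {
  const delta = new Date().getTime() - queued_at;
  streamingLatency.observe(delta);
}
```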

@renchap added the 'streaming' (Streaming server) label on May 7, 2023
github-actions bot commented on May 9, 2023

This pull request has merge conflicts that must be resolved before it can be merged.

@ThisIsMissEm changed the title from "Improve streaming metrics" to "WIP: Improve streaming metrics" on May 11, 2023
});

const streamingLatency = new metrics.Histogram({
  name: 'streaming_latency',
ThisIsMissEm (Contributor, Author)
This is disabled at present, as not all redis.publish payloads carry the queued_at value.

@renchap @Gargron to solve this, we'd be best off creating a Streaming class/module that takes care of serialising all the events, instead of having that code and the publish calls distributed across the codebase.
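As a rough illustration of that idea (purely hypothetical; the class name, method shape, and Redis client API are assumptions), a single publisher could serialise every event and stamp queued_at so the latency metric always has data:

```js
// Hypothetical central publisher: every event goes through one place,
// so queued_at is always present for the streaming_latency metric.
class StreamingPublisher {
  constructor(redisClient) {
    // Assumes a Redis client exposing publish(channel, message).
    this.redis = redisClient;
  }

  publish(channel, event) {
    const payload = JSON.stringify({
      ...event,
      queued_at: Date.now(),
    });

    return this.redis.publish(channel, payload);
  }
}

// Usage sketch:
// const publisher = new StreamingPublisher(redis);
// publisher.publish('timeline:public', { event: 'update', payload: status });
```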

@ThisIsMissEm (Contributor, Author)

This pull request has been superseded by #26299

@ThisIsMissEm deleted the improve-streaming-metrics branch on August 2, 2023 at 18:58