Standardize traffic metrics across traffic sources #354

Open
Tracked by #965
tomkerkhove opened this issue Dec 16, 2021 · 6 comments
Labels
epic, help wanted (Extra attention is needed), stale-bot-ignore (All issues that should not be automatically closed by our stale bot), traffic-sources (All issues related to where HTTP traffic can come from)

Comments

@tomkerkhove
Member

As per #350, our goal is to support various traffic flavors in a neutral way.

The tricky thing here is that there is no standard way of exposing metrics for these traffic flows, so we have to provide our own, use metrics from ingress controllers, or rely on Service Mesh Interface's Traffic Metrics API.

We should move to a standardized approach so that:

  • We define and use a standard for traffic metrics to have a unified approach, regardless of the traffic type/source
    • At a later stage, we can propose this metrics standard to TAG Network as an open standard beyond the scope of KEDA
  • We unify all these metrics in a central place and use that as input for our external scaler
  • We change our interceptor so that it publishes the metrics it has today into the central place above
    • There is a chance that we could remove this from the interceptor, but most likely we will still need it for our service-to-service support

Proposal

OpenTelemetry Metrics is about to go stable and is an open standard for metrics in systems.

Standardizing on OpenTelemetry & its Collector

Our interceptor should be changed to publish its metrics to an OpenTelemetry Collector, so that we can route the metrics to where we need them and end-users can reuse them for their own purposes:
image

These metrics should comply with the HTTP semantics that OpenTelemetry defines, as per this doc.

Once the metrics are available, we can choose one of the existing exporters (full overview) to get them to our external scaler, either by pushing them to it directly (HTTP-based or gRPC-based, preferred approach) or by going through an external system such as Prometheus (less preferred).

When end-users install the HTTP add-on, we should automatically install a collector, unless they opt out and configure a different endpoint. However, ideally, we fully manage and configure the collector with all the bells and whistles that we need.

Bringing existing traffic metrics into our standardized metrics approach

To bring existing traffic metrics into our way of working, we will need two components:

  1. An adapter per traffic source to pull the metrics and make them available in the collector
  2. A custom processor to transform the source metrics format to our standardized metrics format (learn more)
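The transformation step in the second component could look roughly like the sketch below. It is plain Go rather than an actual collector processor, and every name in it (`sample`, `rule`, `normalize`, the source metric `example_ingress_requests_in_flight`, and the target names) is a hypothetical placeholder chosen for illustration.

```go
package main

import "fmt"

// sample is a hypothetical, simplified metric data point.
type sample struct {
	Name  string
	Attrs map[string]string
	Value float64
}

// rule describes how one source metric maps to the standardized format.
type rule struct {
	TargetName string
	AttrRename map[string]string // source attribute key -> standardized key
}

// normalize applies the rule registered for the sample's name. It returns
// false when the source metric is not known, so callers can drop or log it.
func normalize(in sample, rules map[string]rule) (sample, bool) {
	r, ok := rules[in.Name]
	if !ok {
		return sample{}, false
	}
	out := sample{
		Name:  r.TargetName,
		Attrs: make(map[string]string, len(in.Attrs)),
		Value: in.Value,
	}
	for k, v := range in.Attrs {
		if renamed, found := r.AttrRename[k]; found {
			k = renamed
		}
		out.Attrs[k] = v
	}
	return out, true
}

func main() {
	// Assumed mapping for a made-up ingress controller metric.
	rules := map[string]rule{
		"example_ingress_requests_in_flight": {
			TargetName: "http.server.active_requests",
			AttrRename: map[string]string{"host": "server.address"},
		},
	}

	in := sample{
		Name:  "example_ingress_requests_in_flight",
		Attrs: map[string]string{"host": "example.com"},
		Value: 3,
	}

	if out, ok := normalize(in, rules); ok {
		fmt.Println(out.Name, out.Attrs, out.Value)
	}
}
```

Keeping the mapping in a table rather than in code would let each traffic source ship its own rules without touching the transformation logic.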

Some traffic sources might already be supported through an existing receiver.

For example, it would make sense to have an SMI-receiver that we can rely on instead of rolling our own. (servicemeshinterface/smi-spec#199)

Traffic Metrics Spec

SMI has its Traffic Metrics spec and OpenTelemetry is defining semantics for HTTP metrics.

We should aim to use those before rolling our own standard.

@tomkerkhove added the epic and traffic-sources labels Dec 16, 2021
@arschles
Collaborator

@tomkerkhove I really like the idea of having a KEDA-wide standard for HTTP metrics based on the OTEL semantics, so +1 to that. One thing that we should consider having, though, is that the interceptors should be able to push some (primitive) metrics down to external scalers. Doing so would allow for a completely push-based notification system from interceptor -> external scaler -> KEDA itself, and gives us the capability of scaling from zero more quickly.

@tomkerkhove
Member Author

> One thing that we should consider having, though, is that the interceptors should be able to push some (primitive) metrics down to external scalers. Doing so would allow for a completely push-based notification system from interceptor -> external scaler -> KEDA itself, and gives us the capability of scaling from zero more quickly.

Can you elaborate a bit more on what you want to achieve here? I presume you mean the external scaler of the HTTP add-on?

@arschles
Collaborator

@tomkerkhove my basic ask is to reduce the latency between when a request comes into the cluster and when the external scaler (and thus, KEDA) knows about it. I'd love to see whether we can design something to push appropriate metrics from "edge" (ingress controllers and/or service meshes) to external scaler
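To make the push-based idea concrete, here is a minimal in-process sketch in which a Go channel stands in for whatever transport would actually connect the edge to the external scaler. The `update` and `scalerState` types and their methods are hypothetical and only illustrate the flow: the scaler learns about pending requests as soon as they are pushed, instead of waiting for its next poll.

```go
package main

import (
	"fmt"
	"sync"
)

// update is a hypothetical primitive metric pushed from the edge.
type update struct {
	Host    string
	Pending int
}

// scalerState keeps the latest pending count per host on the scaler side.
type scalerState struct {
	mu      sync.Mutex
	pending map[string]int
}

func newScalerState() *scalerState {
	return &scalerState{pending: make(map[string]int)}
}

// apply stores the most recent value pushed for a host.
func (s *scalerState) apply(u update) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.pending[u.Host] = u.Pending
}

// isActive reports whether a host currently has pending requests.
func (s *scalerState) isActive(host string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.pending[host] > 0
}

// consume applies pushed updates until the channel is closed.
func (s *scalerState) consume(updates <-chan update) {
	for u := range updates {
		s.apply(u)
	}
}

func main() {
	updates := make(chan update)
	s := newScalerState()

	done := make(chan struct{})
	go func() {
		s.consume(updates)
		close(done)
	}()

	// The edge pushes as soon as the first request arrives.
	updates <- update{Host: "example.com", Pending: 1}
	close(updates)
	<-done

	fmt.Println("example.com active:", s.isActive("example.com"))
}
```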

@stale

stale bot commented Mar 28, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale (bot) added the stale label Mar 28, 2022
@arschles added the stale-bot-ignore label Mar 29, 2022
@stale (bot) removed the stale label Mar 29, 2022
@JorTurFer
Member

Could this be related? #910

@tomkerkhove
Member Author

That one feels more about operating the HTTP add-on than about a source of scaling, no?

@JorTurFer added the help wanted label Apr 4, 2024
Projects
Status: To Do
Development

No branches or pull requests

3 participants