
Add a metrics label for HTTP "Referer" header or the full Request URL #669

Open
DimitriosNaikopoulos opened this issue Mar 6, 2024 · 3 comments

Comments

@DimitriosNaikopoulos

Hey all 👋

Feature / Idea

Could we have the HTTP "Referer" header or the full Request URL (including both the DNS and the path) as a label in the metrics that Beyla exports?

I was thinking that if this information were available (even as raw data), we could apply our own transformation rule in Prometheus to keep only the parts we want. Then we could keep just the DNS name, so we can identify which internal service name (or even external DNS name) the request used to reach a specific service.
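As a sketch of that transformation, assuming a hypothetical label such as `http_request_url` carried the raw URL (Beyla does not expose this today), a Prometheus `metric_relabel_configs` rule could keep only the DNS part and drop the rest to control cardinality:

```yaml
# Hypothetical: assumes Beyla exported the raw URL in a label named
# "http_request_url". Extracts the host part (DNS name + port) into a
# new label and drops the high-cardinality original.
metric_relabel_configs:
  - source_labels: [http_request_url]
    regex: '(?:https?://)?([^/]+)/.*'
    target_label: http_request_host
    replacement: '$1'
  - action: labeldrop
    regex: http_request_url
```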

This could be hidden behind a feature flag, since not everyone may want this functionality, especially because it could significantly increase metric cardinality in Prometheus.

This feature could be refined even further: if we do not want the whole request URL, we could capture just the DNS name, since there are already labels that report the request path.

Example

We have a request in a k8s cluster from serviceA in namespaceA to serviceB in namespaceB. The request URL of such a request will look something like serviceB.namespaceB:1234/path/to/action. Since the request was initiated from serviceA, Beyla already decorates the metrics with the appropriate k8s labels for that server request. By also having a label with the request URL (or even only the DNS name), we could create graphs that show service-to-service communication, not just general incoming/outgoing graphs for a single service, which give no way to check whether an outgoing request to serviceB is what is causing latency issues.
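For instance, assuming a hypothetical `http_request_host` label on Beyla's client-side metrics (both the label and the exact metric name below are assumptions, not Beyla's current output), a PromQL query could break outgoing traffic down per destination:

```promql
# Hypothetical: per-destination request rate from each source namespace.
sum by (k8s_namespace_name, http_request_host) (
  rate(http_client_request_duration_seconds_count[5m])
)
```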

This ticket is inspired by the metrics generated by various service meshes, which include information about the destination and source service as labels on the Prometheus metrics.

@grcevski
Contributor

grcevski commented Mar 6, 2024

Thanks @DimitriosNaikopoulos, this is a really good idea. I have a couple of thoughts and questions.

  1. The URL part is relatively easy, we would just need to increase the amount of memory we use on the eBPF side to capture the URL, to make sure we get it all, at least in most cases. At the moment, we limit it to 160 bytes.

  2. The 'Referrer' header is a bit more challenging. It would be easy for Go services, but for the other languages, where we use pure kernel network packet monitoring, we'll be restricted to kernels 5.17 and newer. Is this a problem? The main restriction is that we'll need to use the bpf_loop helper to walk the header buffers: since headers can be large, we can't assume the referrer will be within the first 256 bytes or so. In the past we tried copying the header buffers fully and passing them to userspace for extraction, because of the lack of loops in BPF, but that made the monitoring overhead significant.

  3. We do have a new feature, in the main branch, that captures network flows to be used for service graphs as you suggested. We do reverse DNS lookups on Kubernetes, so you'll be able to get the pod names, etc. The one current limitation is that we only support pushing metrics with OpenTelemetry; we haven't added Prometheus scrape support yet, but that can be worked around by deploying the OpenTelemetry Collector alongside Beyla and configuring Prometheus to scrape it. Would this work for your use case? We still need to document this feature, but it's there in the codebase at the moment, albeit unstable and still being worked on.
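To illustrate the restriction in point 2, here is a userspace sketch in Go (not eBPF, and `findReferer` is a hypothetical helper, not Beyla code) of why a fixed scan window can miss the "Referer" header: without bpf_loop, an eBPF program can only inspect a bounded prefix of the request buffer, and headers past that prefix are simply invisible.

```go
package main

import (
	"bytes"
	"fmt"
)

// findReferer scans at most `limit` bytes of a raw HTTP request buffer
// for a "Referer:" header line, mimicking the fixed-window restriction
// an eBPF program faces without bpf_loop. It returns "" when the header
// starts beyond the window.
func findReferer(buf []byte, limit int) string {
	if limit > len(buf) {
		limit = len(buf)
	}
	// The header must *start* inside the window for us to see it.
	idx := bytes.Index(buf[:limit], []byte("\r\nReferer: "))
	if idx < 0 {
		return ""
	}
	start := idx + len("\r\nReferer: ")
	end := bytes.Index(buf[start:], []byte("\r\n"))
	if end < 0 {
		return ""
	}
	return string(buf[start : start+end])
}

func main() {
	req := []byte("GET /path HTTP/1.1\r\nHost: serviceB.namespaceB:1234\r\n" +
		"User-Agent: test\r\nReferer: http://serviceA.namespaceA/page\r\n\r\n")
	fmt.Println(findReferer(req, 256)) // wide window: header found
	fmt.Println(findReferer(req, 64))  // narrow window: header missed, prints ""
}
```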

@DimitriosNaikopoulos
Author

Hey @grcevski 👋

  1. That could be great, although the initial thought/idea was to capture the DNS name. I only proposed capturing the whole request URL and exposing it as a label because it could require less work from Beyla and more work in Prometheus (to clean the label as we see fit). So I don't think increasing the limit alone will solve this use case, since we are mostly interested in the DNS name.

  2. We are currently running our clusters on AWS EKS, and at least the current AMIs we are using have kernel version 5.10 according to uname -rm, so I don't think this will work, at least for our use case.

  3. That could be great; we could give it a try in our pre-production clusters for testing. We currently do not have remote-write set up with our Prometheus, but if we could scrape the OpenTelemetry Collector, that would be an easy test. Please let me know if there is a feature flag we need to toggle to use that feature, or if we need to use a specific image (or build the image from the main branch). I will have a look over the weekend at the OTel exporter to check whether I can spot that functionality and how easy it would be to add it (if possible) to the Prometheus exporter.
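On point 1, trimming a full request URL down to its DNS part is straightforward once the raw value is available, whether done in Beyla or downstream. A minimal Go sketch (`hostOnly` is a hypothetical helper, not part of Beyla, and it assumes the URL carries a scheme):

```go
package main

import (
	"fmt"
	"net/url"
)

// hostOnly trims a full request URL down to its host component
// (DNS name plus port), the piece of interest for identifying
// which service a request was sent to.
func hostOnly(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.Host, nil
}

func main() {
	h, err := hostOnly("http://serviceB.namespaceB:1234/path/to/action")
	if err != nil {
		panic(err)
	}
	fmt.Println(h) // serviceB.namespaceB:1234
}
```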

@grcevski
Contributor

grcevski commented Mar 7, 2024

OK, great. There's a PR open for adding docs for the new feature #675. This is only with the main branch for now, but hopefully we'll make a release soon.
