proxy_latency_response_out is always equal to DURATION time of the request #9317

Closed
edubonifs opened this issue Apr 3, 2024 · 5 comments · Fixed by #9489
Labels
Area: Envoy (activities related to envoy proxy) · Committed: 1.17 · Prioritized (indicating issue prioritized to be worked on in RFE stream) · Type: Bug (something isn't working)

Comments


edubonifs commented Apr 3, 2024

Gloo Edge Product

Enterprise

Gloo Edge Version

1.15.x

Kubernetes Version

1.26

Describe the bug

While testing the proxyLatency response_out metadata, I observed that

proxy_latency_response_out: '%DYNAMIC_METADATA(io.solo.filters.http.proxy_latency:response_out)%'

is always equal to:

DURATION: '%DURATION%'

Expected Behavior

In my view, the response_out latency should only measure the latency of the response phase, not have the same value as the duration of the whole request.

Steps to reproduce the bug

Gateway object:

apiVersion: gateway.solo.io/v1
kind: Gateway
metadata:
  annotations:
  labels:
    app: gloo
  name: gateway-proxy
  namespace: gloo-system
spec:
  bindAddress: '::'
  bindPort: 8080
  httpGateway:
    options:
      proxyLatency:
        measureRequestInternally: true
        request: LAST_INCOMING_LAST_OUTGOING
        response: LAST_INCOMING_LAST_OUTGOING
  options:
    accessLoggingService:
      accessLog:
      - fileSink:
          jsonFormat:
            DURATION: '%DURATION%'
            proxy_latency_response_out: '%DYNAMIC_METADATA(io.solo.filters.http.proxy_latency:response_out)%'
          path: /dev/stdout
  proxyNames:
  - gateway-proxy
  ssl: false
  useProxyProto: false

Inject a delay on the request with fault injection:

apiVersion: gateway.solo.io/v1
kind: VirtualService
metadata:
  name: vs
  namespace: gloo-system
spec:
  virtualHost:
    domains:
    - '*'
    routes:
    - matchers:
      - prefix: /
      options:
        faults:
          delay:
            fixedDelay: 2s
            percentage: 100
      routeAction:
        single:
          upstream:
            name: httpbin-httpbin-8000
            namespace: gloo-system

As you can see, there is a 2s delay on the request. However, when I check my logs:

{"DURATION":2006,"proxy_latency_response_out":2006}

proxy_latency_response_out always reports the same value as the total request duration.

Additional Environment Detail

No response

Additional Context

No response


edubonifs added the Type: Bug label Apr 3, 2024
DuncanDoyle added the Prioritized, Area: Envoy, and Committed: 1.17 labels Apr 12, 2024
DuncanDoyle (Contributor):

Reproducer project: https://github.com/DuncanDoyle/ge-gloo-9317

nfuden (Contributor) commented Apr 24, 2024

Proxy latency definitely needs more docs.

However, this is what it does: it measures the total time spent in the proxy so that it can report the overhead that the proxy adds.

In this case the reproduction steps set a delay via the fault filter (https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/fault_filter), which can do things like simulate a delay on a request.

Unfortunately, in this case the fault filter technically runs inside Envoy, so its delay counts towards the proxy latency filter's time.

EDIT: removing a section previously here about nighthawk. While it had mostly correct info it actually had a piece that was not quite right. Clarifying comment below.

nfuden closed this as completed Apr 24, 2024
nfuden reopened this May 5, 2024
nfuden (Contributor) commented May 5, 2024

Reopened for two parts.

Issue as stated here:
How do DURATION and response_out differ? Today the response_out dynamic metadata is recorded when the first byte is sent to the downstream, while DURATION covers the whole request/response. This means the difference between the two is minimal.

Gap:
The proxy latency filter today emits two types of artifacts.

It emits 3 fields in DYNAMIC_METADATA, which are mainly for internal use:
1. response_out: the time when the first byte is sent downstream
2. request_out: the time when the first byte is sent upstream
3. request_out_internal: the time until the first byte of headers is processed

It also emits STATS: proxy_latency.rq_proxy_latency and proxy_latency.resp_proxy_latency. The exact value of these is tunable via configuration, as documented here: https://docs.solo.io/gloo-edge/latest/reference/api/github.com/solo-io/gloo/projects/gloo/api/external/envoy/extensions/proxylatency/proxylatency.proto.sk/#measurement
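
For reference, a minimal sketch of an access-log configuration, based on the fileSink from the reproduction steps above, that would surface all three metadata fields (the metadata keys come from the list above; the JSON key names on the left are just illustrative labels):

accessLoggingService:
  accessLog:
  - fileSink:
      jsonFormat:
        DURATION: '%DURATION%'
        # response_out: first byte sent downstream
        proxy_latency_response_out: '%DYNAMIC_METADATA(io.solo.filters.http.proxy_latency:response_out)%'
        # request_out: first byte sent upstream
        proxy_latency_request_out: '%DYNAMIC_METADATA(io.solo.filters.http.proxy_latency:request_out)%'
        # request_out_internal: first byte of headers processed
        proxy_latency_request_out_internal: '%DYNAMIC_METADATA(io.solo.filters.http.proxy_latency:request_out_internal)%'
      path: /dev/stdout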

Possible follow-ups:

  1. Improve documentation on this in main docs
  2. Expose the stats as metadata as well

nfuden (Contributor) commented May 6, 2024

After further investigation, it seems like we should mainly push users to recreate these stats, if needed, from metadata.
Most of the setups that are possible with stats can be derived from this filter's current metadata combined with Envoy's upstream access-log operators
REQUEST_TX_DURATION
RESPONSE_TX_DURATION
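
As an illustrative sketch (not taken from this issue), those two Envoy access-log operators could sit next to the existing fields in the reproducer's jsonFormat; the JSON key names here are arbitrary labels:

jsonFormat:
  DURATION: '%DURATION%'
  proxy_latency_response_out: '%DYNAMIC_METADATA(io.solo.filters.http.proxy_latency:response_out)%'
  # time from the last downstream request byte received to the last byte sent upstream
  request_tx_duration: '%REQUEST_TX_DURATION%'
  # time from the first upstream response byte received to the last byte sent downstream
  response_tx_duration: '%RESPONSE_TX_DURATION%'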

@nfuden
Copy link
Contributor

nfuden commented May 15, 2024

Put up a PR for this.

Includes some additional documentation around how this plays with common_duration: envoyproxy/envoy#33240
