Update OTEL trace_id value from a request header #5056

Open
Samjin opened this issue Apr 30, 2024 · 9 comments
Labels: component/open-telemetry, raised by user

Comments

Samjin commented Apr 30, 2024

Is your feature request related to a problem? Please describe.

There is existing infrastructure that uses a dedicated request header for the trace ID. In the Router, it would make sense to use that header's value as the OTEL trace_id for all logging, metrics, and span tags. This would avoid having multiple trace IDs labeled everywhere and would also help correlate request logs from apollo-router; otherwise users may have to look up two different trace IDs to find the logs for a single request.

For example:

{
   dd.trace_id: 7188631615460499999
   fetch_error: SubrequestHttpError { status_code: None, service: "xxx", reason: "error trying to connect: Connection reset by peer (os error 104)" }
   filename: /usr/local/cargo/git/checkouts/router-18de2f110854a520/155375b/apollo-router/src/services/subgraph_service.rs
   level: ERROR
   span_id: fc8f2f9c6b29999
   trace_id: 000000000000000063c3265e04aa1b0f
   ...
}

As you can see above, the trace_id is not the one our infrastructure uses; it was generated by the Apollo Router.

For example, we use trace_id for Splunk and Datadog in our current Node gateway, and we would like to keep the same name in the Router. If display_trace_id: true is enabled in the logging configuration, there may be two trace_ids that could overwrite each other if they end up at the same level (I have not checked whether this is the case today, but it seems possible).
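
For reference, a minimal sketch of where that logging option lives (assuming the telemetry.exporters.logging layout of recent Router versions; exact paths may differ):

    telemetry:
      exporters:
        logging:
          stdout:
            enabled: true
            format:
              json:
                # print the OTEL trace_id on every log line
                display_trace_id: true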

Describe the solution you'd like

Have an option to override the default trace_id value with a UUID-like value taken from a request header. This should replace the default trace_id value everywhere, including span attributes, logging, metrics, etc.

The trace ID header will be created in router_service if it doesn't exist, and this created header should be able to propagate as the trace_id as well.
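
For context, the Router already exposes a propagation option that extracts the trace ID from a custom request header; a minimal sketch of that option (the header name here is just an example):

    telemetry:
      exporters:
        tracing:
          propagation:
            request:
              # extract the incoming trace ID from this custom header
              header_name: trace-id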

Samjin changed the title from "Update OTEL trace_id value with a request header" to "Update OTEL trace_id value from a request header" on Apr 30, 2024
abernix added the component/open-telemetry label on May 6, 2024
abernix (Member) commented May 6, 2024

The router would only originate a trace ID when it doesn't get one from the headers. Do you have tracing propagation set up or any exporters enabled? Can you share more of your configuration?
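
(For readers: propagation lives under telemetry.exporters.tracing.propagation; a minimal sketch of a typical setup, shown purely as an illustration:)

    telemetry:
      exporters:
        tracing:
          propagation:
            # accept W3C traceparent headers from upstream callers
            trace_context: true
            # accept Datadog x-datadog-* headers
            datadog: true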

abernix (Member) commented May 6, 2024

Put another way, the thing you're asking for should definitely already work. The router is meant to play nicely with existing distributed tracing headers, including passing them along to subgraphs as appropriate.

@BrynCooke Anything to add here?

Samjin (Author) commented May 6, 2024

The router would only originate a trace ID when it doesn't get one from the headers. Do you have tracing propagation set up or any exporters enabled? Can you share more of your configuration?

I tried it and left a comment here: #4823 (comment)

          datadog:
            enabled: true
            enable_span_mapping: true
            batch_processor:
              ...
            endpoint: "xx"
          propagation:
            datadog: true
            request:
              header_name: trace-id

I think it trimmed the dashes off the UUID. Our downstream received a value like traceid=28e9fb547e465690f288cdb96194d119, which failed their UUID validation.

Update: I'll take another look tomorrow to observe the change.
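
An aside on the dashes: OTEL trace IDs are 128-bit values rendered as 32 hex characters, so a dashed UUID cannot round-trip through the trace context unchanged. If subgraphs need the original header value verbatim, one option is to forward it with the headers plugin instead of relying on trace propagation alone; a minimal sketch, assuming the header is literally named trace-id:

    headers:
      all:
        request:
          # forward the original header value, dashes included, to every subgraph
          - propagate:
              named: trace-id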

Samjin (Author) commented May 7, 2024

I still get a trace_id that is different from the trace-id header used above. The log below is from subgraph_service.

[screenshot]

Also, the subgraph receives this error (invalid UUID) from the trace-id header, so something is overwriting the trace-id header.
[screenshot]

Also, the following trace IDs are mutually exclusive in the router span attributes:

      router:
        attributes:
          trace_id: true
          trace-id:
            request_header: trace-id

Samjin (Author) commented May 7, 2024

In debug mode, I get the following error:

  "message": "cannot generate custom trace_id: invalid digit found in string",
  "target": "apollo_router::plugins::telemetry",

It looks like the trace_id is recreated because the router couldn't parse the UUID header.

abernix (Member) commented May 13, 2024

What is your subgraph server implemented in and what telemetry mechanism (SDK, format, etc.) are you using in it?

BrynCooke (Contributor) commented

@Samjin I think that this PR will likely fix things for you: #5071
We are targeting this for the next release.

Samjin (Author) commented May 14, 2024

[screenshots]

I'm using the latest dev build. Both custom and apollo-router events are logged with trace_id, which now uses the original UUID but without dashes; however, the ID is only added in deprecated spans mode. trace_id doesn't appear in spec_compliant logs.

That said, even if it is successfully added in spec_compliant mode, it's still a different trace ID than the UUID that upstream/downstream services use. So I'm not completely sure how removing the dashes from the UUID would solve this issue.

I think a temporary fix could simply be to let the user use a request header selector to create a custom attribute on all logs, and then disable the default trace_id. Doing it this way may require users to add their own trace ID to their custom events.
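
A rough sketch of that workaround, assuming the request_header span selector and the display_trace_id logging flag behave as documented (option names may vary by Router version):

    telemetry:
      instrumentation:
        spans:
          router:
            attributes:
              # custom attribute populated from the incoming header
              trace-id:
                request_header: trace-id
      exporters:
        logging:
          stdout:
            format:
              json:
                # hide the OTEL-generated trace_id field in log output
                display_trace_id: false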

Samjin (Author) commented May 14, 2024

What is your subgraph server implemented in and what telemetry mechanism (SDK, format, etc.) are you using in it?

They use OpenTracing in the SDK for Datadog. It looks like the trace context uses trace-id: https://github.com/opentracing/specification/blob/master/rfc/trace_identifiers.md#trace-context-http-headers

But I'm still not completely sure whether they are related; it'd be great if anyone knows how they work together.
