Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

W3C implementation generates a new parentID for each request #308

Open
braunsonm opened this issue Jan 18, 2023 · 11 comments
Open

W3C implementation generates a new parentID for each request #308

braunsonm opened this issue Jan 18, 2023 · 11 comments

Comments

@braunsonm
Copy link

braunsonm commented Jan 18, 2023

Issue

As part of this PR: cloudfoundry/gorouter#261

W3C support was added which is great! However it seems like the functionality was decided to make new parentIDs when you receive a new request. This doesn't seem like the correct behaviour for the gorouter since it is only proxying the request and is not a member of the trace. As far as I know, gorouter does not have support for reporting to OTEL or Zipkin tracers and only acts as a helpful proxy with this header support.

https://github.com/cloudfoundry/gorouter/pull/261/files#diff-4c0749740a74100927f204e1e56dfa5334d5a43ae2a57f3d6dfafccfa7c02897R135-R156

The effect of this is traces using W3C are always disconnected when routed through the gorouter because the next hop receives and records a child span that is not connected to any known parent spans.

Affected Versions

All since this PR was merged: cloudfoundry/gorouter#261

Traffic Diagram

HAProxy ---> Gorouter ---> App ---> Gorouter ---> App

Steps to Reproduce

  1. Have 2 endpoints with tracing
  2. Have app1 call app2 through the gorouter
  3. Notice in the trace report the spans are disconnected with the exception of being in the same trace

Expected result

ParentIDs should not be recreated, they should be transparently passed to the next hop.

Current result

ParentIDs are always recreated, disconnecting your spans from their children.

Possible Fix

See linked lines above. Next() should not be called or generate a new parentID. cc @tlwr not sure if you still help out with this project or if you could provide some rationale.

@braunsonm
Copy link
Author

braunsonm commented Jan 18, 2023

I think the rationale for this was an attempt to follow the W3C specification as described here: https://www.w3.org/TR/trace-context/#a-traceparent-is-received

However gorouter should be following this part of the spec: https://www.w3.org/TR/trace-context/#alternative-processing Since it is specifically acting as a proxy in this case and not reporting its own processing spans.

Proxies or messaging middleware MAY decide not to modify the traceparent headers but remove invalid headers or add additional information to tracestate.

This is the approach that should be followed to ensure end-to-end traces are propagated correctly. This is how it currently works in the B3 implementation in the gorouter.

@tlwr
Copy link
Contributor

tlwr commented Jan 25, 2023

I have not worked on anything CF related for some time, and have forgotten most of the context

If I remember correctly, my logic for this was twofold:

  • I wanted gorouter to have its own span, so I could explicitly evaluate the impact of Gorouter in the trace (rather than only implicitly)
  • something to do with route services ?

Depending on your use-case, the behaviour of gorouter should/could be different, and I have no strong opinion to changing behaviour

@ameowlia
Copy link
Member

👋 Hi @tlwr! It's good to see you around again.

Thank you for opening this @braunsonm, apologizes for the delayed response. Your explanations about why ParentIDs should not be recreated makes sense to me. But, as always, I am worried about breaking people who rely on previous functionality. Though perhaps it could be considered a bug?

@ebroberson has been investigating what it would take to introduce zipkin headers for component logs in CF.

@ebroberson - what is your thought on this? In your ideal situation do you agree that gorouter should not be recreating the ParentID?

@ebroberson
Copy link
Contributor

Hi @braunsonm! Thanks for bringing this up!

The current implementation is in line with the W3C spec, since "proxies MAY choose not to modify it". To me, this implies that the default behavior would be to modify the traceparent.

I can definitely see your use case, though, and support this being a configurable property. I'd love to see this get PR'd.

@braunsonm
Copy link
Author

braunsonm commented Jan 27, 2023

I disagree @ebroberson. The line MAY is contextually for cases where the proxy is also reporting its own processing, it is then optional for that component to modify the traceparent. Since the gorouter does not report spans, it should not be modifying traceparent.

If you want to read it that way, then the spec says:

Update parent-id: The value of property parent-id MUST be set to a value representing the ID of the current operation.

The "current operation" in this case is never reported to the tracer.. so I don't think this mode of operation is relevant.

I'd love to know the reason you think the gorouter should behave this way, it serves no purpose, and even worse it disconnects spans from any apps using W3C properly. This is definitely a bug.

@tlwr
Copy link
Contributor

tlwr commented Jan 28, 2023 via email

@ebroberson
Copy link
Contributor

References to upcoming work subject to change.

In some upcoming work for zipkin headers, gorouter will be assigning its own span-id, so the keeping current W3C implementation was for consistency purposes and to keep from potentially breaking projects that may be relying on this functionality. W3C headers unfortunately don't have the pair of span-id+parent-span-id in addition to the trace-id, which is why I suggest that the default functionality be preserved, but be configurable so that we can account for both use cases.

@braunsonm
Copy link
Author

braunsonm commented Jan 30, 2023

keep from potentially breaking projects that may be relying on this functionality

Again I challenge this @ebroberson.. What user is relying on the fact that gorouter is breaking the connection of spans? Can you actually point to any use case where this would be desired behaviour?

W3C headers unfortunately don't have the pair of span-id+parent-span-id in addition to the trace-id, which is why I suggest that the default functionality be preserved,

I'm not sure what your point is here... W3C does have the parent span ID in the traceparent. It wouldn't make sense to include the "current span-id", as that is implicitly the parent span ID.

I'm not intentionally being combative on this issue, but I cannot follow the logic that this would ever be desired behavior or adherent to the specification and this is not the same behaviour as the gorouters own B3 implementation. In the end though I suppose if it's an option that's fine, as long as it is added 😄

@ebroberson
Copy link
Contributor

ebroberson commented Jan 30, 2023

I'm not saying that it's desired behavior, just existing behavior that must be taken into account.

My point there was that zipkin headers do have a span-id and parent-span-id, which allows gorouter to set those headers and preserve the steps in the trace.

Upon further reading of the spec, it looks like the tracestate header is used like the span-id of the B3 header, and traceparent serves the same purpose of the parent-span-id. Does that match your understanding? If that is how the W3C headers are used, I'd be fine with changing the behavior to reflect that like you've suggested. @ameowlia do you have any further insight into this?

domdom82 pushed a commit to domdom82/routing-release that referenced this issue Jul 12, 2023
…ns (cloudfoundry#308)

Golang 1.17 will no longer accept them when url.Parse() is called.
Gorouter does not look at/modify the request params, however we're
adding warnings to the gorouter log + updating the X-Cf-RouterError
header so app devs + operators will be able to trace back requests
if they have an app performing strangely

Signed-off-by: Geoff Franks <gfranks@vmware.com>

Co-authored-by: Chris Selzo <cselzo@vmware.com>
mkocher added a commit to cloudfoundry/gorouter that referenced this issue Apr 4, 2024
Adds the span ID and trace ID for each request to the corresponding
HTTPStartStop envelopes as tags, if they're present.

Should help operators and developers to track their traces through
gorouter (see
cloudfoundry/routing-release#308).

Also aids corresponding work within the OTel Collector to export
HTTPStartStop envelopes as traces.

Signed-off-by: Carson Long <12767276+ctlong@users.noreply.github.com>
@mkocher
Copy link
Member

mkocher commented Apr 4, 2024

We're working on being able to export traces from gorouter using the otel-collector release. cloudfoundry/gorouter#407 is the first piece of this work.

@braunsonm If you're still interested we'd love it if you were able to deploy this after we've got the otel-collector changes done and see if the traces can show up in your tracing system without gaps

@braunsonm
Copy link
Author

braunsonm commented Apr 5, 2024

Would definitely be interested in trying it when it's released @mkocher

Do you have a roadmap? In order to use it on our side we would want a way to provide basic auth and a tenancy header for Tempo.

geofffranks pushed a commit to cloudfoundry/gorouter that referenced this issue Apr 5, 2024
Adds the span ID and trace ID for each request to the corresponding
HTTPStartStop envelopes as tags, if they're present.

Should help operators and developers to track their traces through
gorouter (see
cloudfoundry/routing-release#308).

Also aids corresponding work within the OTel Collector to export
HTTPStartStop envelopes as traces.

Signed-off-by: Carson Long <12767276+ctlong@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

5 participants