Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add otel tracing implementation support #2578

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

lucastt
Copy link
Contributor

@lucastt lucastt commented Sep 7, 2023

The idea of this PR is to add a new tracing implementation to skipper run options that would support OpenTelemetry (Otel), instead of the deprecated OpenTracing (Ot).

The initial plan is just to support Otel tracer creation and wrap Ot tracer into a custom wrapper that would implement Otel tracer and span interface but would call Ot under the hood.

With next PRs we would start to have Otel tracers as first class citizens in our structs and interfaces enabling a skipper user to use full Otel support in all functionalities.

On changing the tracer interface many modules were affected including hot path changes.

@lucastt lucastt self-assigned this Sep 7, 2023
@lucastt lucastt force-pushed the add-opentelemetry-support branch 4 times, most recently from fbf33d5 to 7b78485 Compare September 8, 2023 10:02
proxy/proxy.go Outdated Show resolved Hide resolved
proxy/proxy.go Outdated Show resolved Hide resolved
proxy/proxy.go Outdated Show resolved Hide resolved
proxy/tracing.go Outdated Show resolved Hide resolved
proxy/tracing.go Outdated Show resolved Hide resolved
proxy/tracing.go Outdated Show resolved Hide resolved
proxy/tracing.go Outdated Show resolved Hide resolved
proxy/tracing.go Outdated Show resolved Hide resolved
proxy/tracing.go Outdated Show resolved Hide resolved
tracing/spanWrapper.go Outdated Show resolved Hide resolved
tracing/tracers/otel/otel.go Outdated Show resolved Hide resolved
tracing/tracing.go Outdated Show resolved Hide resolved
tracing/tracing.go Outdated Show resolved Hide resolved
filters/filters.go Outdated Show resolved Hide resolved
go.mod Outdated Show resolved Hide resolved
proxy/tracing.go Outdated Show resolved Hide resolved
tracing/tracing.go Outdated Show resolved Hide resolved
tracing/tracers/otel/otel.go Show resolved Hide resolved
@lucastt lucastt force-pushed the add-opentelemetry-support branch 6 times, most recently from 5f873fa to c839ab5 Compare March 19, 2024 14:34
tracing/wrappers.go Outdated Show resolved Hide resolved
tracing/wrappers.go Outdated Show resolved Hide resolved
tracing/wrappers.go Outdated Show resolved Hide resolved
tracing/wrappers.go Outdated Show resolved Hide resolved
tracing/wrappers.go Outdated Show resolved Hide resolved
@lucastt lucastt force-pushed the add-opentelemetry-support branch 3 times, most recently from 6d84d8c to dd99682 Compare March 22, 2024 09:18
lucastt and others added 12 commits March 22, 2024 10:34
Signed-off-by: Lucas Thiesen <lucas.thiesen@zalando.de>
Signed-off-by: Lucas Thiesen <lucassthiesen@gmail.com>
Signed-off-by: Lucas Thiesen <lucas.thiesen@zalando.de>
Signed-off-by: Lucas Thiesen <lucas.thiesen@zalando.de>
Signed-off-by: Lucas Thiesen <lucas.thiesen@zalando.de>
Signed-off-by: Lucas Thiesen <lucas.thiesen@zalando.de>
Signed-off-by: Lucas Thiesen <lucas.thiesen@zalando.de>
Signed-off-by: Lucas Thiesen <lucassthiesen@gmail.com>
Signed-off-by: Lucas Thiesen <lucas.thiesen@zalando.de>
Signed-off-by: Lucas Thiesen <lucas.thiesen@zalando.de>
Signed-off-by: Lucas Thiesen <lucas.thiesen@zalando.de>
Signed-off-by: Lucas Thiesen <lucas.thiesen@zalando.de>
Signed-off-by: Lucas Thiesen <lucas.thiesen@zalando.de>
@lucastt
Copy link
Contributor Author

lucastt commented Mar 22, 2024

👍

// Tracer is the opentracing tracer to use in the client
Tracer opentracing.Tracer
// Tracer is the open telemetry tracer to use in the client
Tracer trace.Tracer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sometimes I see Tracer and others OtelTracer, is this intended?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, in some places I used OtelTracer to differentiate from Tracer and OtTracer, the former is only OT if we need to be backward compatible and the later is kept in places we need access to both tracers and I wanted to make it clear that it is one or the other. Marjority of cases it should be just tracer or Tracer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, when we discontinue OpenTracing, which will probably take a while, we can have all of them as just Tracer/tracer.

@lucastt
Copy link
Contributor Author

lucastt commented Mar 22, 2024

Regarding how I tested these changes locally, the strategy I used was unity tests + stdouttrace, which is a tracer that implements otel.Tracer interface and drops the span in JSON form in stdout or any other file.

I didn't test it directly with specific collectors or vendors like lightstep, I was planning to run skipper on a custom cluster and test it with lightstep collectors there before merging it.

stdouttrace implements all behaviors supported by Otel SDK (OTLP) what is not well tested is the config part, the arguments we would pass in skipper parameters that would configure the communication between skipper tracer and a tracer collector or tracing commercial solution.

cc: @MustafaSaber

@@ -83,7 +83,7 @@ func TestFifoChanges(t *testing.T) {
t.Helper()

client := net.NewClient(net.Options{
ResponseHeaderTimeout: 2 * time.Second,
ResponseHeaderTimeout: 3 * time.Second,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why was this changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was flaky and was failing most of the time in my local environment, increasing the timeout solved the problem.

}),
)
// tee filters override if we have a tracer
if len(o.OpenTracing) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does the override only apply for opentracing?

Copy link
Contributor Author

@lucastt lucastt Mar 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option has a very unfortunate name, but it contains both OpenTracing and OpenTelemetry tracers. I could not change the name to something with more meaning than Options.OpenTracing to not break backward compatibility with library users, which is not great but we don't have much choice. :/

}
globalTags[tag] = tagVal
}
case "otlp-exporter-endpoint":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From testing in the cluster: maybe an URL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'll try to test with a local OTLP collector to make sure no parameter is missing. But one of the things we need to change is this parameter should not be split by port and address only because we need the URL path to route it to the proper OTLP endpoint in most collectors.

@szuecs
Copy link
Member

szuecs commented Mar 22, 2024

Tested in the cluster: compatibility mode works great

image

otel does not work right now, because of otlp-exporter-endpoint (likely requires an URL).

Copy link
Member

@AlexanderYastrebov AlexanderYastrebov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this big effort. I did a first review pass and left my comments.

},
DNSDone: func(httptrace.DNSDoneInfo) {
span.LogKV("DNS", "end")
span.AddEvent("DNS", trace.WithAttributes(attribute.String("DNS", "stop")))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why end changed to stop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is a mistake. Good catch!

originalRequest.Body = body
cc.request = cr
return cc, nil
}

func (c *context) Loopback() {
loopSpan := c.Tracer().StartSpan(c.proxy.tracing.initialOperationName, opentracing.ChildOf(c.ParentSpan().Context()))
defer loopSpan.Finish()
_, loopSpan := c.Tracer().Start(c.request.Context(), c.proxy.tracing.initialOperationName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What return value is ignored here and why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an stdlib context. This context doesn't seem to be used in the function.

One thing that could be done is update the request context so it carries the span information forward if any other function or package wants to use it.

But in theory this was not done in the past either, Ot.ContextWithSpan(...) was never called here so the context was never updated, the only difference is that Otel.Tracer.Start(...) always returns a context, which was not true for OT implementation.

Comment on lines +124 to +129
// OtTracer holds the open telemetry tracer enabled for this proxy instance
// DEPRECATED, use TracerWrapper.
Tracer ot.Tracer

// OtelTracer holds the opentracing tracer enabled for this proxy instance
OtelTracer trace.Tracer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere we keep old and new tracer fields.
Why not use only OTEL tracer and provide users an adapter for opentracing tracer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we do this from skipper.Run() point of view @lucastt , so we likely can delete proxy.Tracer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is possible, I just considered this as something that people using skipper as a library could be using so I wanted to keep backward compatibility. Under the hood the old tracer is converted into a tracerWrapper instance that convert calls from OTEL to OT.

So yeah, if this part does not need to be backward compatible it would be much better to have a single tracer instance.

Comment on lines +849 to +852
type otelTransport struct {
tr *http.Transport
tracer trace.Tracer
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? Somehow we created proxy span without it before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an interesting question and I think it's how OTel is different from OT. Please confirm @lucastt

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, maybe this is not necessary. Because the goal of this struct is to inject the span into the request, but we already do this in net/httpclient.go:406 so the only case this struct would be necessary is if we make requests using this Roundtripper directly, without using skipper http client.

@@ -1173,14 +1211,14 @@ func (p *Proxy) do(ctx *context, parentSpan ot.Span) (err error) {
ctx.ensureDefaultResponse()
} else if ctx.route.BackendType == eskip.LoopBackend {
loopCTX := ctx.clone()
loopSpan := tracing.CreateSpan("loopback", ctx.request.Context(), p.tracing.tracer)
_, loopSpan := p.tracing.Start(ctx.request.Context(), "loopback")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What return value is ignored here and why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1310,7 +1354,7 @@ func (p *Proxy) serveResponse(ctx *context) {
p.tracing.setTag(ctx.proxySpan, ClientRequestStateTag, ClientRequestCanceled)
}

p.tracing.setTag(ctx.initialSpan, HTTPStatusCodeTag, uint16(ctx.response.StatusCode))
p.tracing.setTag(ctx.initialSpan, HTTPStatusCodeTag, fmt.Sprint(uint16(ctx.response.StatusCode)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used attributes elsewhere but here it is "tag", why is it so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of proxy it implements a tracing package, this package was refactored to use Otel.Tracer but I didn't change the names of the functions on this package, in fact I changed the minimum necessary to make it work with Otel, but if you think is more valuable here to change the nomenclature I can do that.

} else {
span.SetTag(key, value)
}
span.SetAttributes(attribute.String(key, value))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ensureUTF8 was lost with this change.

Comment on lines +22 to +24
// OtelTracer is an implementation of opentelemetry.OtelTracer for testing. It records
// the defined spans during a series of operations.
type OtelTracer struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if we should take a chance and get rid of tracing and tracing/tracingtest packages and just use otel directly (and https://pkg.go.dev/go.opentelemetry.io/otel/oteltest).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case why do we implement own test tracer instead of using https://pkg.go.dev/go.opentelemetry.io/otel/oteltest ?

Copy link
Contributor Author

@lucastt lucastt Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. This is a mistake, I didn't know oteltest package at the time. Oteltest seems to be a better solution for tracing tests for now on. But maybe we could address this in a different PR? It seems to me this will bring a lot of new changes to an already very big PR. WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major moderate risk, for example new API, small filter changes that have no risk like refactoring or logs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants