Turn off spans from a scope while retaining downstream spans and without breaking traces #3867

jack-berg · 2024-02-07T20:12:12Z

Suppose you have a span hierarchy of ({tracerName}:{spanKind}:{spanName}):

tracerA:server:parent -> tracerB:internal:child -> tracerC:client:grandchild

And you decide that internal spans from tracerB are too noisy. You want to turn them off while:

Not losing downstream spans. You want to keep tracerC:client:grandchild because it contains important information about calls to downstream systems.
Not producing broken traces. tracerC:client:grandchild's parent should be tracerA:server:parent, not tracerB:internal:child.

How can you accomplish this? Well, you pretty much can't. A couple of problems prevent it:

Samplers dont have access to scope, so there's no SDK configuration mechanism which can do this.
Even if Samplers could make decisions based on scope, the sampling decision options don't make the desired behavior easy.
- If a sampler returns DROP for tracerB and delegates to ParentBased{root=AlwaysOn} for other spans: Then all descendants of tracerB:internal:child will be dropped as well. Not what we want.
- If a sampler returns DROP for tracerB and delegates to ParentBased{root=AlwaysOn,localParentNotSampled=AlwaysOn} for other spans: Then we end up with a broken trace since tracerC:client:grandchild will assign its parent to be tracerB:internal:child, which is dropped. Not what we want.
- If a sampler returns DROP for tracerB and delegates to ParentBased{root=AlwaysOn,localParentNotSampled=AlwaysOn} and instrumentation is aware and only sets spans in context if Span.isRecording()=true: Then we achieve the desired affect. tracerB:internal:child is dropped and tracerC:client:grandchild will assign its parent to be tracerA:server:parent, so we have a complete trace.
Even if Samplers could make decisions based on scope, its cumbersome to configure a sampler to have special rules for a scope. It's simpler to think of a sampling policy as orthogonal to the decision of which scopes / instrumentation libraries should be enabled.

So there is no way to get the behavior we want today. And even if samplers were given access to scope, we'd still need: a complex / non-standard sampling policy and coordination with instrumentation (i.e. checking Span.isRecording()==true). Ouch.

This is related to #530 and #1588, but the problem framing here is a little different so I think it warrants a separate issue.

Originally opened as open-telemetry/opentelemetry-java#6197 but think we should provide a spec level solution for this.

The text was updated successfully, but these errors were encountered:

svrnm · 2024-02-07T20:23:39Z

@jack-berg, do you think this issue has some relationship with this one? #3205 -- both issues seem to look for a way to remove noise intermediate spans without loosing the span tree.

jack-berg · 2024-02-07T20:34:39Z

I propose we solve this problem by starting to more fully realize the potential of scope as a concept. Today, scope is just a piece of metadata attached to records because there is no SDK level configuration surface area involving scopes (besides views) which allow you to select or do anything special with a scope.

I propose we introduce the the notion of scope level configuration for all three signals:

Give each provider (TracerProvider, MeterProvider, LoggerProvider) the ability to register configuration which applies scope configuration to some selection of matching scopes. Let scopes be selected using a combination of scope name, version, schema, and attributes.
When evaluating the scope config for a Tracer / Meter / Logger, use the first matching scope config.
Define each signal with a unique definition of the types of things which are configurable at the scope level for the respective Tracer, Meter, Logger, but introduce uniformity where it makes sense. This reflects that there will be a mixture of overlapping and signal specific scope configuration options.
To start, give each signal's scope config the ability specify that the scope is disabled. In the future, we might consider introducing scope specific samplers for tracing, or scope specific log level thresholds for logs, etc.
When a scope is disabled, the tracer / meter / logger uses the behavior of the relevant noop tracer / meter / logger.
For tracing, the noop tracer returns Span.wrap(parentContext). This means that disabling a tracer is an effective solution to the broken trace problem discussed in this issue.

The more I think about #1588, the more I think that extending sampler to include access to resource and scope is a leaky abstraction. Sure a sampler could be used to change the behavior of scopes, but if you play that out, we'll end up with samplers with very complicated rules. It seems much cleaner to separate concerns, with a global sampler policy applicable to all scopes, and separate scope level configuration.

We really need to start to realize the potential of the scope concept. This proposal would lay the groundwork for that by introducing the simplest possible scope level configuration (enabled | disabled) while solving a very concrete use case.

jack-berg · 2024-02-07T20:39:14Z

@jack-berg, do you think this issue has some relationship with this one? #3205 -- both issues seem to look for a way to remove noise intermediate spans without loosing the span tree.

@svrnm yes I do. I think right now we only have a hammer (i.e. sampler) so everything looks like a nail. If we had a scope level configuration, it would be a no brainer to add a severity level threshold option for logs, and we probably wouldn't nearly as interested in log sampling.

svrnm · 2024-02-07T21:04:03Z

Thanks, @jack-berg! Utilizing scope to accomplish that kind of "span-level sampling" (for a lack of a better word) is really an interesting concept. It would solve the issue(s) discussed in #3205 and probably a few more.

I just try to wrap my head around with an example:

Service A uses HTTP library H to call service B.
In Service A there are 2 scopes, one for the "app" and one for the http library, in service B there is one scope.

So similar to your example above there are now five spans

Server Span 1 in scope A -> Internal Span 2 in scope A -> Internal Span 3 in scope H -> Client Span 4 in scope H -> Server Span 5 in scope B

I do not care about span 2 + 3 (which may be multiple internal spans)
If I follow what you have outlined above the current scoping A, H, B would not work so I would need to put those internal spans into their own scope (e.g. "internal A scope" and "internal H scope").

Is this a correct understanding?

jack-berg · 2024-02-07T21:44:06Z

Yeah you're correct. In my experience most instrumentation produces a single tier of spans. I know the idea of a single instrumentation scope producing both server + internal spans, or internal + client spans exists, but maybe those instrumentations would rethink that position if the SDK gave them tools that gave them an incentive to do so. Or maybe there is a need for additional scope level configuration beyond just the enabled or disabled.

I think adding the ability to enable / disable scopes is doing the simplest possible thing while addressing a glaring omission in SDK configuration. Its the low hanging fruit. After that, we can take stock and determine what else is missing.

svrnm · 2024-02-08T07:50:05Z

Or maybe there is a need for additional scope level configuration beyond just the enabled or disabled.

I think disable/enable is a great starting point, and I see another usecase: there are plenty of questions around disabling individual instrumentation libraries. Of course one could argue that not loading them in the first place is what you should be doing, but if you start with a Java Agent, or a Node JS metapackage that has ALL the instrumentation libraries in it, it can be easier to disable those you don't care about. Now, if an instrumentation library has a well-defined scope, you could simply turn that off, and you are done.

The screenshot in open-telemetry/opentelemetry-demo#659 is a good example of this use case. Of course boiling this down to the 3 spans you "really" need is a good solution, but what if I suspect an issue in Slim... or Psr... or I am a library developer and want to get that level of insight, if I can turn those scopes on/off, I have that level of flexibility without compromising on span creation.

pyohannes · 2024-02-08T08:50:11Z

It would be worth getting @lmolkova's input, she has been (and is) thinking about this problems. The problems she highlights are mostly focusing on nested client spans (a client span parenting another client span), but I think the solution proposed here could also apply to those cases.

Using the scope mechanism is a much more promising approach then inventing a new mechanism, as proposed in the OTEP linked above.

lmolkova · 2024-02-08T19:11:55Z

I'd like to add that .NET OTel has an existing mechanism for it.

Let's say instrumentation creates a scope (via ActivitySource aka tracer) called My.Source.
When you want to enable this instrumentation, you enable a source via traceProvider.AddSource("My.Source").

Let's say library also creates a scope called My.Source.LowLevelVerboseStuff. Users may (but don't need to) enable it as well, but separately.

If I want to enable all sources from a library, I can do it with traceProvider.AddSource("My.Source*") (which introduces some small but solvable problems like this one)

The same mechanism is used for meters.

Effectively source name acts like a logger name where you can control loggers by their namespaces.

The changes in the spec that I'd like to see to generalize this approach

OTel SDK should allow enabling/disabling specific instrumentation scopes (by name?) for traces, metrics, and logs
Instrumentation scope within one library/instrumentation should be granular enough to allow controlling scopes independently
- Instrumentations MUST document the scopes they emit
- Instrumentations SHOULD use the same scopes names across signals for the same instrumentation points (e.g. http spans and http metrics)
- [Update] It could be a good idea for semconv to recommend a specific set of scopes
If instrumentation scope is disabled
- OTel SDK MUST return non-recording span (null, or other language-specific convention can be used)
- MUST NOT create new span context
- MUST NOT change active/current context
OTel API SHOULD provide a way for instrumentations to check if scope is enabled before collecting pre-sampling attributes. This is necessary to allow writing verbose layers that also affect performance.

Nice-to-haves

I'd be great to introduce SamplingResult.Drop decision as well (plus pass instrumentation scope to samplers) - it should result in the same behavior as if the whole scope was disabled but enable more granular per-span suppression

Per-scope enablement vs verbosity

As I mentioned before per-scope enablement is like a logger configuration, but there are two modes: on and off.
I think we can add more modes later by introducing a new property on the source (verbosity or similar) with some default value and adding config options to enable all scopes >= x verbosity.
But this would be an incremental improvement.

In the meantime we can just ask instrumentations to document their stuff and users to read docs. Distros would also need to pick good defaults.

lmolkova · 2024-02-08T19:26:09Z

One more example where this could be extremely useful is messaging where we have

per-message span to inject unique context and make each message traceable independently
send span tracing send operation for a batch of messages.

This creates problems for some users:

Many applications always send one message in a batch even though internally SDK treats everything as a batch.
Or they don't really need per-message tracing (when doing stream processing/aggregation)
Or their batches are so big that per-message tracing becomes prohibitively expensive (perf and volume of spans)

The suppression mechanism could allow users to decide if they want per-message tracing and enable it then.
Or if they are fine if all messages share the same span context (in which they were sent).

This brings one more small requirement (which is probably already covered, but maybe worth mentioning anyway):

instrumentations SHOULD be able to access and inject the context (theirs or their parent's) even if the span they created is suppressed/disabled

svrnm · 2024-02-09T09:33:37Z

Thank you @lmolkova for this detailed plan on how to tackle this issue. A question, it looks like this supersedes #3205? For the sake of having less issues, I am happy to close it then.

LikeTheSalad · 2024-02-09T10:37:42Z

I think disable/enable is a great starting point, and I see another usecase: there are plenty of questions around disabling individual instrumentation libraries.

This is a use case I'm dealing with right now where it could be very useful to disable/enable a specific library instrumentation dynamically. My use case involves remote configuration for Android applications where users can decide what to disable/enable after they've deployed their app, and since library instrumentations are added at compile time in Android, unloading an instrumentation at runtime is not an option, so I think this proposal can help addressing that use case in a clean way too.

lmolkova · 2024-02-09T17:20:20Z

Thank you @lmolkova for this detailed plan on how to tackle this issue. A question, it looks like this supersedes #3205? For the sake of having less issues, I am happy to close it then.

@svrnm
I believe this issue only covers on/off, but not the formal definition of verbosity per se. If this one is resolved, #3205 would become much more straightforward, but still not covered.
So I'd rather keep #3205 open (unless you feel that on/off is good enough).

tsloughter · 2024-02-10T16:28:16Z

I think this is also related to some stuff I'd been kicking around for years now,

#2555
#3103
open-telemetry/oteps#186

I planned to attempt again to find a solution in a rebooted Convenience SIG/WG. At least I think it would fall under that SIG/WG.

Is there any interest for tackling this there?

jmacd · 2024-02-14T17:09:49Z

This approach sounds good to me. Note that this approach does not exclude Sampler-related mechanisms that might solve the problem of broken traces due to sampled-out intermediate spans.

jack-berg · 2024-02-14T23:06:25Z

@tsloughter my immediate interest is to add the simplest possible scope level config mechanism across signals, by just providing a way to enable / disable scopes. I think that once a scope level config mechanism exists, its more straight forward to pursue something like #2555, but also, the need diminishes because at least part of the motivation is already accommodated by disabling scopes.

I think that #3103 is somewhat related, but should be separately pursued.

As for open-telemetry/oteps#186, there's definitely overlapping motivation, but I question the value of having all of the tracerprovider configuration options configurable at the tracer level. I think it makes sense to consider each property on a case by case basis. This will be easier to do once a scope level config mechanism exists.

jack-berg · 2024-02-29T15:27:29Z

No sure if folks have seen this, but opened PR #3877 (with java prototype here) which largely reflects @lmolkova's proposal. Please review if you're interested in seeing this move forward. Thanks!

Fixes #3867. This introduces the concept of scope-level configuration for Tracer, Meter, and Logger. Notes: - TracerProvider, MeterProvider, and LoggerProvider have a scope config provider function, which accepts an `InstrumentationScope` and returns the relevant scope config. This function is invoked when a Tracer, Meter, Logger is created, and the resulting scope config dictates the behavior of resulting Tracers, Meters, Loggers. Computing the config on scope creation avoids doing extra work on the hot path. - If TracerProvider, MeterProvider, LoggerProvider supports updating configuration, the function can be updated, at which point the scope config of outstanding Tracer, Meter, Logger will be re-computed. This accommodates emerging dynamic config use cases. - A function as the mechanism to determine relevant scope config to maximize flexibility. However, implementations are encourages to provide syntactic sugar / helpers for accommodating popular use cases: select one or more scopes by name or with pattern matching, disable one or more selected scopes, disable all scopes by default and enable one or more selective scopes. - Scope config is simple at the moment, consisting of on the ability to enable / disable a particular Tracer, Meter, Logger. When a disabled, use the respective noop Tracer, Meter, Logger. - Scope config parameters are expected to evolve over time. When it makes sense, there should be consistency across signals. - Java prototype available for this proposal: open-telemetry/opentelemetry-java#6201 Leaving as a draft for now because its not ready to merge, but I am actively soliciting feedback. If / when we agree on direction, I'll go back and add the appropriate experimental indications.

jack-berg added the spec:trace Related to the specification/trace directory label Feb 7, 2024

github-actions bot assigned jmacd Feb 7, 2024

lmolkova mentioned this issue Feb 13, 2024

AddSource with predicate open-telemetry/opentelemetry-dotnet#5353

Closed

4 tasks

jack-berg added the triaged-accepted The issue is triaged and accepted by the OTel community, one can proceed with creating a PR proposal label Feb 14, 2024

jack-berg mentioned this issue Feb 15, 2024

Add simple scope configuration to Tracer, Meter, Logger #3877

Merged

CodeBlanch mentioned this issue Feb 20, 2024

[trace-sdk-sampler] Differentiate between root and child span creation in trace sampling spec #3888

Open

1 task

cijothomas mentioned this issue Feb 25, 2024

Add ActivitySource to SamplingParameters open-telemetry/opentelemetry-dotnet#5376

Closed

lmolkova mentioned this issue Mar 26, 2024

Tracing configuration similar to MetricsBuilder dotnet/runtime#100317

Open

jmacd mentioned this issue Mar 28, 2024

Context propagation to client instrumentation #3825

Draft

5 tasks

lmolkova mentioned this issue Apr 3, 2024

Should tracers/loggers/meters be enabled by default #3983

Open

adriangb mentioned this issue Apr 5, 2024

Give samplers the span ID #3991

Open

jack-berg closed this as completed in #3877 Apr 8, 2024

Gaganjuneja mentioned this issue Apr 12, 2024

Adding new inferred sampler opensearch-project/OpenSearch#12927

Open

8 tasks

arielvalentin mentioned this issue Apr 23, 2024

feat: support context propagation through links open-telemetry/opentelemetry-ruby-contrib#837

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Turn off spans from a scope while retaining downstream spans and without breaking traces #3867

Turn off spans from a scope while retaining downstream spans and without breaking traces #3867

jack-berg commented Feb 7, 2024

svrnm commented Feb 7, 2024

jack-berg commented Feb 7, 2024

jack-berg commented Feb 7, 2024

svrnm commented Feb 7, 2024

jack-berg commented Feb 7, 2024

svrnm commented Feb 8, 2024

pyohannes commented Feb 8, 2024 •

edited

lmolkova commented Feb 8, 2024 •

edited

lmolkova commented Feb 8, 2024 •

edited

svrnm commented Feb 9, 2024

LikeTheSalad commented Feb 9, 2024

lmolkova commented Feb 9, 2024

tsloughter commented Feb 10, 2024

jmacd commented Feb 14, 2024

jack-berg commented Feb 14, 2024

jack-berg commented Feb 29, 2024

Turn off spans from a scope while retaining downstream spans and without breaking traces #3867

Turn off spans from a scope while retaining downstream spans and without breaking traces #3867

Comments

jack-berg commented Feb 7, 2024

svrnm commented Feb 7, 2024

jack-berg commented Feb 7, 2024

jack-berg commented Feb 7, 2024

svrnm commented Feb 7, 2024

jack-berg commented Feb 7, 2024

svrnm commented Feb 8, 2024

pyohannes commented Feb 8, 2024 • edited

lmolkova commented Feb 8, 2024 • edited

The changes in the spec that I'd like to see to generalize this approach

Per-scope enablement vs verbosity

lmolkova commented Feb 8, 2024 • edited

svrnm commented Feb 9, 2024

LikeTheSalad commented Feb 9, 2024

lmolkova commented Feb 9, 2024

tsloughter commented Feb 10, 2024

jmacd commented Feb 14, 2024

jack-berg commented Feb 14, 2024

jack-berg commented Feb 29, 2024

pyohannes commented Feb 8, 2024 •

edited

lmolkova commented Feb 8, 2024 •

edited

lmolkova commented Feb 8, 2024 •

edited