Allow to register global profiler #1044

Open
SergeyKanzhelev opened this issue Jan 16, 2019 · 8 comments

@SergeyKanzhelev

SergeyKanzhelev commented Jan 16, 2019

Allowing a global profiler to be specified for all connections would help enable profiling auto-magically. Ideally, registering a global profiler should work retroactively for already-created connections, so there is no application-startup race between connection creation and profiler registration.

A minimum API may look like this:

public static void RegisterDefaultProfiler(Func<ProfilingSession> defaultProfilerSessionsFactory);

It will work; however, we want to discourage having to call connection.RegisterProfiler(() => null); on every new connection just to clear the default profiler. For this, some multiplexing of profilers needs to be implemented. One solution may be a static method that registers a factory for creating ProfilingSession factories: it would create a session factory based on the profiler registered for the specific connection via the RegisterProfiler method.

public static void RegisterDefaultProfiler(Func<Func<ProfilingSession>, Func<ProfilingSession>> profilingSessionProviderFactory);

Then there may be a FinishSession() callback or delegate that allows proxying the requested results from one profiler session to another.
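A rough sketch of how such a provider factory could compose with the per-connection registration (the type and member names below are hypothetical, purely to illustrate the proposal):

using System;
using StackExchange.Redis.Profiling;

public static class GlobalProfiling
{
    // Hypothetical holder for the globally registered provider factory.
    private static Func<Func<ProfilingSession>, Func<ProfilingSession>> providerFactory;

    public static void RegisterDefaultProfiler(
        Func<Func<ProfilingSession>, Func<ProfilingSession>> profilingSessionProviderFactory)
        => providerFactory = profilingSessionProviderFactory;

    // The library would call this per connection, passing whatever factory
    // (possibly null) was supplied via connection.RegisterProfiler.
    internal static Func<ProfilingSession> Resolve(Func<ProfilingSession> perConnectionFactory)
        => providerFactory is null ? perConnectionFactory : providerFactory(perConnectionFactory);
}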

Even the minimum implementation would be enough to make profiling of Redis easier.

@NickCraver
Collaborator

The problem with a global profiler for most applications is that it would be an unbounded memory leak by nature. I think referencing a default profiler as a wrapper is a better bet (leaving it to the user here). For example, if we set this up on Stack Overflow we'd probably OOM within the hour...that's a pretty dangerous and easy pit to fall into. People can already register a global profiler on the shared connection in a few lines of code, so I don't think we need an API addition here to support the use case, if you so choose to do something so global. Does that make sense?
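For illustration, a minimal sketch of that per-connection pattern (assuming a local Redis endpoint; the scoping of the session is the important part, since undrained sessions are exactly the leak described above):

using System;
using StackExchange.Redis;
using StackExchange.Redis.Profiling;

var muxer = await ConnectionMultiplexer.ConnectAsync("localhost:6379");
var session = new ProfilingSession();

// Only return a session while you actually want to capture commands.
muxer.RegisterProfiler(() => session);

await muxer.GetDatabase().StringSetAsync("key", "value");

// Drain the session; captured commands accumulate in memory until this is called.
foreach (var command in session.FinishProfiling())
{
    Console.WriteLine($"{command.Command}: {command.ElapsedTime}");
}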

@SergeyKanzhelev
Author

@NickCraver will it leak memory because of hanging sessions, or for some other reason?

Also, what is the default pattern for configuring the Redis connection with DI? That way we can recommend a best practice on how to configure the profiler in a single place.
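One commonly used pattern, sketched here for context, is to register the multiplexer as a singleton and attach the profiler in the same factory, so there is a single place to configure it:

using Microsoft.Extensions.DependencyInjection;
using StackExchange.Redis;

var services = new ServiceCollection();

services.AddSingleton<IConnectionMultiplexer>(_ =>
{
    var muxer = ConnectionMultiplexer.Connect("localhost:6379");
    // A profiler could be registered here, once, at creation time:
    // muxer.RegisterProfiler(...);
    return muxer;
});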

@NickCraver
Collaborator

@SergeyKanzhelev missed this (saw it in the other issue) - yes, a profiler that keeps running would be an unbounded memory leak. I'd question the goal of profiling everything an application ever does - that is not what the profiler was designed for. Our goal is to profile things in some context or some period of time (e.g. for a web request or a window). I want to sanity check goals here on what we're trying to do.

For context: I'd be profiling many billions of Redis hits a day, but I also get that that's not the normal use case. Scale slants my view toward "...that'd be crazy expensive", whereas I imagine you're viewing a more general use case with far less impact from scale. Thoughts? Am I way off there?

@SergeyKanzhelev
Author

I want to sanity check goals here on what we're trying to do.

The end goal is to enable continuous monitoring of Redis calls. For low-load apps, we can end up recording information on every call. For high-throughput apps, sampling will be applied per distributed trace (e.g. per web request), so many calls will not be recorded.

What is currently implemented is a profiler, registered per connection, that works around some limitations of the Profiling API (https://github.com/open-telemetry/opentelemetry-dotnet/blob/4729e8c15474cc3d8e807dfd7c911772b6454a9f/src/OpenTelemetry.Adapter.StackExchangeRedis/StackExchangeRedisCallsAdapter.cs#L74-L103) and reports Redis calls as Spans when sampled.

It would be ideal to start using DiagnosticSource-based instrumentation instead of trying to utilize Profiling API here. This is what #1448 is about.

In this issue the request was to allow registering a profiler globally, eliminating the need to call RegisterProfiler per connection. That way, simply importing the NuGet package and initializing data collection could subscribe to all connections.

@tbolon

tbolon commented Mar 16, 2022

I am trying to resurrect this issue, linking to my other opentelemetry issue.

The goals are similar and still the same: we want to register Redis instrumentation when configuring the OpenTelemetry SDK, without having to create the connection first.

We could create a custom solution when we own the ConnectionMultiplexer creation: register each connection with a shared, well-known instance that is itself registered against the OpenTelemetry TraceAdapter and creates a profiler to propagate traces.

It seems most of the issues linked here share the same base requirement: being able to monitor Redis connections when the instrumentation library is set up before any connection is created. It seems ActivitySource is the way to go: CreateActivity() is short-circuited when there is no listener, whereas Microsoft.SqlClient uses the older DiagnosticListener.
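A small sketch of that short-circuit (the source name is made up; this is not an existing ActivitySource in the library):

using System.Diagnostics;

internal static class RedisDiagnostics
{
    private static readonly ActivitySource Source = new("StackExchange.Redis.Example");

    // With no ActivityListener registered, StartActivity returns null almost
    // immediately, so the instrumentation is effectively free when disabled.
    public static Activity? StartCommand(string command) =>
        Source.StartActivity(command, ActivityKind.Client);
}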

Perhaps we should create a new issue asking only to implement ActivitySource for this library? If you are really conservative about performance impact, this feature could be opt-in?

@NickCraver
Collaborator

@tbolon I'm happy to discuss it, a few thoughts:

We could create a custom solution when we own the ConnectionMultiplexer creation....

I don't think this is tenable: you'd have to wrap every overload and maintain an upstream library, for which there can be no upstream library bifurcation either (unless others wrap you). But I don't think this is necessary either. Recently I added a DefaultOptionsProvider, which is a much better place for something like this to live if it's a config thing, e.g. some settings in there. It could also allow profiling before the multiplexer is created, rather than after the initial connection finishes.
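To make the DefaultOptionsProvider idea concrete, a rough sketch of overriding a default and registering the provider once at startup (the exact members shown are my assumption of the shape, not a confirmed API surface):

using StackExchange.Redis.Configuration;

public class MyOptionsProvider : DefaultOptionsProvider
{
    // Override whichever defaults apply; AbortOnConnectFail is assumed to be
    // one of the provider's virtual defaults here.
    public override bool AbortOnConnectFail => false;
}

// Registered once, before any multiplexer is created (assumed registration point):
// DefaultOptionsProvider.AddProvider(new MyOptionsProvider());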

Even in the proposals above (apologies @SergeyKanzhelev, I didn't see that reply), determining whether or not to profile is a cost if a session is registered, so sampling isn't free. Since this happens inside a lock, it's a non-trivial performance impact, especially at scale.

With respect to how others do it: the same approaches don't make sense here. For example, you open a SqlConnection, do some things, and close it - the use case is not multiplexed or generally long-lived. Connections (the hard, physical ones) are long-lived within a connection pool, but all of that isn't exposed to the activity bits and no profiling is active there - the profiled window is finite and rather short. For adapting, SqlClient is ultimately starting an Activity. This incurs quite a bit of overhead per command vs. our Message today.

In comparison to SQL we're infinitely long-lived, often for the life of an application. Profiling every command is scary at scale, with many apps doing billions of ops per day or more. That's a crazy amount of telemetry, and telemetry that generally isn't useful unless it's sent. I would never advise anyone to enable this by default at all if it's per-command. For things like connections, disconnects, reconnects, errors: yeah sure, that's a lot lower volume and reasonable - we're exploring that now with @philon-msft on best paths (are you already talking there?). To be clear: I'm all for adding whatever we can there with minimal dependencies involved around the meta information - per command is the sticking point. We get a lot of performance-related issues filed here by high-traffic users of the library, so any regression is very important to avoid.

@tbolon

tbolon commented Mar 17, 2022

Everything makes perfect sense.

There is also the case of metrics, which in my opinion make a lot more sense for commands; see https://docs.microsoft.com/en-us/dotnet/core/diagnostics/metrics-instrumentation.
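As a sketch of what command-level metrics could look like with System.Diagnostics.Metrics (the meter and instrument names are made up, not something the library exposes):

using System.Collections.Generic;
using System.Diagnostics.Metrics;

internal static class RedisMetrics
{
    private static readonly Meter AppMeter = new("StackExchange.Redis.Example");

    private static readonly Counter<long> Commands =
        AppMeter.CreateCounter<long>("redis.commands", unit: "{command}");

    public static void RecordCommand(string name) =>
        Commands.Add(1, new KeyValuePair<string, object?>("command", name));
}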

See also the OpenTelemetry receiver that exists for the collector to export metrics from Redis: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/redisreceiver

Per your comment, I think we need to think about what is really interesting in this library in terms of instrumentation: logging, tracing and/or metrics.

There are also multiple aspects of Redis that could have different scopes - Pub/Sub, for example, when Redis is used as a message bus; see https://www.aspecto.io/blog/how-to-achieve-end-to-end-visibility-into-redis-pub-sub-using-opentelemetry/

You are right that connections are often long-lived, and the same connection could be used for different things inside the same application - some you want traces for, others not. In the end I can understand that it's an application-level decision.

For example, in our application we have a cache service with a memory layer and a distributed cache layer where invalidation is handled via subscriptions. We are mostly interested in traces when the time spent in this cache is significant, often because creating the value is costly. We could definitely create those traces in our own code and not use Redis instrumentation at all.

Ultimately, we are simply interested in Redis traces when the time spent in the library exceeds a certain duration.

@tbolon

tbolon commented Mar 17, 2022

For adapting, SqlClient is ultimately starting an Activity. This incurs quite a bit of overhead per command vs. our Message today.

Regarding overhead, there is only one method call on a sealed class, which tests a list Count before returning null when there is no listener; see ActivitySource.CreateActivity. I know it's only the tip of the iceberg once an activity has been returned, as I see some complex interaction between the message and the profiled message in the current profiling system. Some significant code will be necessary if an activity must be involved in the message's lifespan.

Regarding our projects, I think I will abandon instrumenting at the library level for now and add activity tracing in our cache layer, because we are only using Redis as a distributed caching engine. But I think this feature will still be useful when the Redis usage is not managed by the developer (e.g. an external caching library).
