Exposing more data on IProfiledCommand to enrich traces #1448

CodeBlanch · 2020-04-28T00:00:41Z

Greetings!

I'm using Open Telemetry and its Redis adapater to instrument an application using a Redis cluster for caching.

There are a couple of issues that have been discussed in this area already. Ideally, we would use DiagnosticSource (#833) because the profiling interface is a bit awkward for the integration (#1044) but let's set that aside for a second.

The profiling mechanism works once you get it registered with the connection, but it's missing some key data, primarily the cache key and whether or not the retrieval was a hit or a miss. That stuff isn't exposed on the IProfiledCommand interface so there's no way for OT to write it on its spans. We can solve this by wrapping our calls into StackExchange.Redis with a class that spins up tracing around it, but it would be nice to have an OOB solution.

Are we open to exposing more data on IProfiledCommand? I'm happy to work on it, and throw up a proposal, I just thought it would be a good idea to ask first.

/cc @SergeyKanzhelev @MikeGoldsmith @cijothomas

The text was updated successfully, but these errors were encountered:

SergeyKanzhelev · 2020-05-04T21:31:03Z

CC: @davidfowl and @tarekgh

NickCraver · 2020-05-04T21:59:17Z

What would be the structure of the property on the command if we did this? A few things to keep in mind:

Not everything is a key, a command can be zero, one, or a thousand keys
Some commands are full scripts, etc.
We don't allocate the command today - it's bytes to shove to the wire so we don't necessarily every have a string version of it (which I'm guessing is the ask here). If we did so, there'd be a lot of allocation overhead to satisfy this profiling.

On the result end of things, the questions go towards: what is "success"? Hit or miss is called out here, but that's a very narrow view of Redis in general. We could be getting zero, one, or a thousand items. It could be a key or a set or nothing at all. We could be issuing a config set or a pub/sub to send to replication, etc. None of those abstract to a hit or a miss really - the concept itself is far too narrow for the usage. I think to have a chance at a facade here, we'd have to define "success", but I'm no sure that's any easier to do or less opinionated though.

The above is a quick brain dump to get the conversation going - thoughts on...stuff?

SergeyKanzhelev · 2020-05-06T06:20:43Z

@tarekgh this would be a great use case to try the improved DiagnosticSource design. Would you be available to make an example on how this instrumentation may look like so we can estimate the number of allocations needed?

SergeyKanzhelev · 2020-05-06T06:20:46Z

As of user scenarios and what is missing - the most critical piece of information that is missing (I heard of) is an information on failed calls. Like network issue or such.

After the basic details like Span with name and duration and failure indication, things like hit or miss is a nice addition for cases we can identify. Yes, those attributes would be useful for the narrow set of commands, but would work great for people using those. I like the suggestion to add number of bytes sent/received. If there is a way to obtain this information easily - it may be very informative.

CodeBlanch · 2020-05-08T00:23:34Z

I just opened draft PR showing what I was thinking. Basically add more of the raw data to IProfiledCommand. StackExchange.Redis doesn't really need to worry about what OT will do with it :) At the very least having Script when Command=EVAL will be helpful. But I think we can also roughly determine cache hit/miss? For example, if Script contains redis.call('HMGET' (that's what the dotnet lib does) we can check if the Keys requested have matching Values. Or am I being too optimistic?

As far as adding the key requested on the span, easy to do if one is present. If multiple, we could write how many were requested? Or the top X number? Or make it configurable?

CodeBlanch · 2020-05-09T00:51:44Z

I updated the PR to fix the failed calls not being captured (@SergeyKanzhelev brought that up). Couldn't find anything obvious for bytes sent/received though.

One thing...

The exception that gets exposed is very verbose:

if (command.IsFaulted)
{
	string ErrorMessage = command.Exception.Message;
	// "No connection is active/available to service this operation: EVAL; SocketClosed (ReadEndOfStream, last-recv: 0) on 127.0.0.1:6379/Subscription, Idle/MarkProcessed, last: SUBSCRIBE, origin: ReadFromPipe, outstanding: 0, last-read: 0s ago, last-write: 17s ago, keep-alive: 60s, state: ConnectedEstablished, mgr: 8 of 10 available, in: 0, in-pipe: 0, out-pipe: 0, last-heartbeat: 0s ago, last-mbeat: 0s ago, global: 0s ago, v: 2.1.41.3040, mc: 1/1/0, mgr: 10 of 10 available, clientName: DotNetHashMemberGetAsync, IOCP: (Busy=1,Free=999,Min=16,Max=1000), WORKER: (Busy=4,Free=2043,Min=16,Max=2047), v: 2.1.41.3040"
}

May or may not be a good thing?

SergeyKanzhelev mentioned this issue May 6, 2020

Allow to register global profiler #1044

Open

CodeBlanch linked a pull request May 8, 2020 that will close this issue

IProfiledCommand data expansion #1454

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exposing more data on IProfiledCommand to enrich traces #1448

Exposing more data on IProfiledCommand to enrich traces #1448

CodeBlanch commented Apr 28, 2020

SergeyKanzhelev commented May 4, 2020

NickCraver commented May 4, 2020

SergeyKanzhelev commented May 6, 2020

SergeyKanzhelev commented May 6, 2020

CodeBlanch commented May 8, 2020

CodeBlanch commented May 9, 2020 •

edited

Exposing more data on IProfiledCommand to enrich traces #1448

Exposing more data on IProfiledCommand to enrich traces #1448

Comments

CodeBlanch commented Apr 28, 2020

SergeyKanzhelev commented May 4, 2020

NickCraver commented May 4, 2020

SergeyKanzhelev commented May 6, 2020

SergeyKanzhelev commented May 6, 2020

CodeBlanch commented May 8, 2020

CodeBlanch commented May 9, 2020 • edited

CodeBlanch commented May 9, 2020 •

edited