Timeout exceptions #2681

Closed
mattchidley opened this issue Mar 19, 2024 · 8 comments

Comments

mattchidley commented Mar 19, 2024

Can someone help me understand these exceptions and maybe point to what could be going wrong for us?

Here is our exception:

Timeout awaiting response (outbound=234457KiB, inbound=35KiB, 5020ms elapsed, timeout is 5000ms), command=XADD, next: XREAD X:datachanged, inst: 0, qu: 0, qs: 0, aw: False, bw: SpinningDown, rs: TryParseResult, ws: Idle, in: 0, last-in: 15, cur-in: 240084771, sync-ops: 0, async-ops: 898910, serverEndpoint: X, conn-sec: 3254.58, aoc: 0, mc: 1/1/0, mgr: 9 of 10 available, clientName: X(SE.Redis-v2.7.27.49176), IOCP: (Busy=0,Free=2000,Min=2000,Max=2000), WORKER: (Busy=114,Free=32653,Min=2000,Max=32767), POOL: (Threads=114,QueuedItems=3,CompletedItems=68656823,Timers=162), v: 2.7.27.49176

What is unclear is: is outbound the total size queued to be sent to Redis, or is it just the size of this command?

We are using the ConnectionMultiplexer and I have read https://stackexchange.github.io/StackExchange.Redis/Timeouts, but some of these abbreviations are not covered in that guide.
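
For reference, here is a minimal sketch of the kind of setup we have (the endpoint and values are placeholders, not our real config); the "timeout is 5000ms" in the exception corresponds to the default SyncTimeout/AsyncTimeout:

```csharp
using StackExchange.Redis;

// Placeholder configuration - illustrative only, not our production settings.
var options = new ConfigurationOptions
{
    EndPoints = { "redis.example.local:6379" }, // hypothetical endpoint
    SyncTimeout = 5000,   // ms; 5000 is the library default, matching "timeout is 5000ms"
    AsyncTimeout = 5000,  // ms; applies to async calls such as StreamAddAsync
};

var muxer = await ConnectionMultiplexer.ConnectAsync(options);
var db = muxer.GetDatabase();
```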

If you could help shed some light on what could be going wrong for us, that'd be great. Essentially, what we're experiencing is threads piling up with these timeouts, and then we appear to lose the connection and stop communicating with Redis entirely (because we're constantly timing out?).

@mgravell
Collaborator

Unless my math is wrong, there's roughly 234 MiB of traffic down the pipe between when the message was enqueued and now; that's quite a lot of data (although data volume is subjective and contextual), so I'm guessing you're simply saturating bandwidth and/or server load - and of course that will also put a non-trivial strain on the client. Do those numbers sound right for your usage?

mattchidley commented Mar 19, 2024

We're a pretty large application with many threads writing to Redis at any given time, and we are using async. Our Redis server does not appear to be under load: CPU and memory look fine. I'll have to look into network throughput.

Can you confirm: this command, for example, was adding a new message to the stream (of unknown size in these logs), and it timed out because another 234 MiB was added to the stream while this operation was in flight?

Are there Redis server logs which might help us? What could we look for?

Just to add, on our Redis server we are seeing these logs:

34490:C 19 Mar 2024 12:15:48.806 * Fork CoW for RDB: current 2 MB, peak 2 MB, average 2 MB
8:M 19 Mar 2024 12:15:49.015 * Background saving terminated with success
8:M 19 Mar 2024 12:16:50.065 * 10000 changes in 60 seconds. Saving...
8:M 19 Mar 2024 12:16:50.243 * Background saving started by pid 34761
34761:C 19 Mar 2024 12:17:44.438 * DB saved on disk
34761:C 19 Mar 2024 12:17:44.640 * Fork CoW for RDB: current 2 MB, peak 2 MB, average 2 MB
8:M 19 Mar 2024 12:17:44.892 * Background saving terminated with success
8:M 19 Mar 2024 12:18:45.033 * 10000 changes in 60 seconds. Saving...
8:M 19 Mar 2024 12:18:45.213 * Background saving started by pid 35056
35056:C 19 Mar 2024 12:19:38.109 * DB saved on disk
35056:C 19 Mar 2024 12:19:38.307 * Fork CoW for RDB: current 660 MB, peak 660 MB, average 660 MB
8:M 19 Mar 2024 12:19:38.605 * Background saving terminated with success
8:M 19 Mar 2024 12:20:39.014 * 10000 changes in 60 seconds. Saving...
8:M 19 Mar 2024 12:20:39.198 * Background saving started by pid 35314
35314:C 19 Mar 2024 12:21:28.978 * DB saved on disk
35314:C 19 Mar 2024 12:21:29.177 * Fork CoW for RDB: current 14 MB, peak 14 MB, average 14 MB

@mgravell
Collaborator

The most useful server-side thing is always the slowlog; does that report any concerning operations? My comment on the numbers is based on the figures in the exception; if I had to guess, it's some fairly chunky BLOBs moving around rather than pure volume of commands. I have some work actively in progress to remove a bottleneck in the processing loop, which could also be a factor with these numbers. Unfortunately, timeouts are inherently a little tricky to opine on - from the client's perspective, all we know is "we didn't process the result of X in time"; we can't say exactly where it got stuck, or how long each step from server to client to your application took.
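
If it helps, something along these lines will pull the slowlog from the client side (a rough sketch assuming a single endpoint and an existing multiplexer named muxer; redis-cli's SLOWLOG GET 10 gives the same information):

```csharp
// Rough sketch: inspect the server's slowlog via IServer.SlowlogGet.
// Assumes "muxer" is your existing ConnectionMultiplexer with one endpoint.
var server = muxer.GetServer(muxer.GetEndPoints()[0]);

foreach (var trace in server.SlowlogGet(10)) // up to 10 recent slowlog entries
{
    Console.WriteLine(
        $"{trace.Time:u} took {trace.Duration.TotalMilliseconds}ms: " +
        $"{string.Join(" ", trace.Arguments)}");
}
```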

@mattchidley
Author

Thanks @mgravell - can you confirm whether outbound=234457KiB is data to be sent to Redis, or data sent from Redis?

One thing we just learned is that we are issuing XREAD commands without setting a COUNT.
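
A rough sketch of the change we're planning, so a single reply can't grow unbounded (stream name, payload, cursor, and count are illustrative; db and muxer as in the setup sketch above):

```csharp
// Illustrative values standing in for our real payload and stream cursor.
RedisValue lastReadId = "0-0";               // "0-0" = start of stream
RedisValue messageBody = "example payload";

// Writing side: XADD, appending one field/value pair to the stream.
await db.StreamAddAsync("X:datachanged", "payload", messageBody);

// Reading side: XREAD with an explicit count, so each call returns at most
// 100 entries instead of an unbounded batch (we previously passed no count).
StreamEntry[] entries = await db.StreamReadAsync("X:datachanged", lastReadId, count: 100);

foreach (var entry in entries)
{
    // ... process entry.Values ...
    lastReadId = entry.Id; // advance our cursor past the last processed entry
}
```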

@mgravell
Collaborator

I need to check (not at a PC), but I think that's the delta in total bytes sent between enqueue and timeout; however, the number would make more sense as the total bytes received, so please do let me double check that. Due to how the buffers currently work, it is possible that we've inverted them. I'll check.

@mattchidley
Author

@mgravell is there a doc somewhere that describes these debug logs?

@mgravell
Collaborator

There is a link in the exception. It is hard to be exhaustive on all interpretations of the data, though.

@NickCraver
Collaborator

Closing out to tidy up, but happy to re-open if there's a follow-up question.
