Timeout exceptions #2681

Closed
mattchidley opened this issue Mar 19, 2024 · 8 comments

Comments

mattchidley commented Mar 19, 2024

Can someone help me understand these exceptions and maybe point to what could be going wrong for us?

Here is our exception:

Timeout awaiting response (outbound=234457KiB, inbound=35KiB, 5020ms elapsed, timeout is 5000ms), command=XADD, next: XREAD X:datachanged, inst: 0, qu: 0, qs: 0, aw: False, bw: SpinningDown, rs: TryParseResult, ws: Idle, in: 0, last-in: 15, cur-in: 240084771, sync-ops: 0, async-ops: 898910, serverEndpoint: X, conn-sec: 3254.58, aoc: 0, mc: 1/1/0, mgr: 9 of 10 available, clientName: X(SE.Redis-v2.7.27.49176), IOCP: (Busy=0,Free=2000,Min=2000,Max=2000), WORKER: (Busy=114,Free=32653,Min=2000,Max=32767), POOL: (Threads=114,QueuedItems=3,CompletedItems=68656823,Timers=162), v: 2.7.27.49176

What is unclear is: is outbound the total size queued to be sent to Redis, or is it just the size of this command?

We are using the ConnectionMultiplexer and I have read https://stackexchange.github.io/StackExchange.Redis/Timeouts, but some of these abbreviations are not covered in that guide.
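
For reference, here is a minimal sketch of the kind of setup we have (the endpoint and values are placeholders, not our real config); the "timeout is 5000ms" in the exception corresponds to the default SyncTimeout/AsyncTimeout:

```csharp
using StackExchange.Redis;

// Placeholder configuration - illustrative only, not our production settings.
var options = new ConfigurationOptions
{
    EndPoints = { "redis.example.local:6379" }, // hypothetical endpoint
    SyncTimeout = 5000,   // ms; 5000 is the library default, matching "timeout is 5000ms"
    AsyncTimeout = 5000,  // ms; applies to async calls such as StreamAddAsync
};

var muxer = await ConnectionMultiplexer.ConnectAsync(options);
var db = muxer.GetDatabase();
```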

If you could help shed some light on what could be going wrong for us, that'd be great. Essentially, what we're experiencing is threads piling up with these timeouts, and then we appear to lose the connection and stop communicating with Redis entirely (because we're constantly timing out?).

@mgravell
Collaborator

Unless my math is wrong, there's roughly 234 MiB of traffic down the pipe between when the message was enqueued and now; that's quite a lot of data (although data volume is subjective and contextual), so I'm guessing you're simply saturating bandwidth and/or server load - and of course that will also put a non-trivial strain on the client. Do those numbers sound right for your usage?

mattchidley commented Mar 19, 2024

We're a pretty large application with many threads writing to Redis at any given time, and we are using async. Our Redis server does not appear to be under load: CPU and memory look fine. I'll have to look into network throughput.

Can you confirm: this command, for example, was adding a new message to the stream (of unknown size in these logs), and it timed out because another 234 MiB was added to the stream while this operation was in flight?

Are there Redis server logs which might help us? What could we look for?

Just to add, on our Redis server we are seeing these logs:

34490:C 19 Mar 2024 12:15:48.806 * Fork CoW for RDB: current 2 MB, peak 2 MB, average 2 MB
8:M 19 Mar 2024 12:15:49.015 * Background saving terminated with success
8:M 19 Mar 2024 12:16:50.065 * 10000 changes in 60 seconds. Saving...
8:M 19 Mar 2024 12:16:50.243 * Background saving started by pid 34761
34761:C 19 Mar 2024 12:17:44.438 * DB saved on disk
34761:C 19 Mar 2024 12:17:44.640 * Fork CoW for RDB: current 2 MB, peak 2 MB, average 2 MB
8:M 19 Mar 2024 12:17:44.892 * Background saving terminated with success
8:M 19 Mar 2024 12:18:45.033 * 10000 changes in 60 seconds. Saving...
8:M 19 Mar 2024 12:18:45.213 * Background saving started by pid 35056
35056:C 19 Mar 2024 12:19:38.109 * DB saved on disk
35056:C 19 Mar 2024 12:19:38.307 * Fork CoW for RDB: current 660 MB, peak 660 MB, average 660 MB
8:M 19 Mar 2024 12:19:38.605 * Background saving terminated with success
8:M 19 Mar 2024 12:20:39.014 * 10000 changes in 60 seconds. Saving...
8:M 19 Mar 2024 12:20:39.198 * Background saving started by pid 35314
35314:C 19 Mar 2024 12:21:28.978 * DB saved on disk
35314:C 19 Mar 2024 12:21:29.177 * Fork CoW for RDB: current 14 MB, peak 14 MB, average 14 MB

@mgravell
Collaborator

The most useful server-side thing is always the slowlog; does that report any concerning operations? My comment on the numbers is based on the figures in the exception; if I had to guess, it's some fairly chunky BLOBs moving around rather than pure volume of commands. I have some work actively in progress to remove a bottleneck in the processing loop, which could also be a factor with these numbers. Unfortunately, timeouts are inherently a little tricky to opine on - from the client's perspective, all we know is "we didn't process the result of X in time"; we can't say exactly where it got stuck, or how long each step from server to client to your application took.
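
If it helps, something along these lines will pull the slowlog from the client side (a rough sketch assuming a single endpoint and an existing multiplexer named muxer; redis-cli's SLOWLOG GET 10 gives the same information):

```csharp
// Rough sketch: inspect the server's slowlog via IServer.SlowlogGet.
// Assumes "muxer" is your existing ConnectionMultiplexer with one endpoint.
var server = muxer.GetServer(muxer.GetEndPoints()[0]);

foreach (var trace in server.SlowlogGet(10)) // up to 10 recent slowlog entries
{
    Console.WriteLine(
        $"{trace.Time:u} took {trace.Duration.TotalMilliseconds}ms: " +
        $"{string.Join(" ", trace.Arguments)}");
}
```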

@mattchidley
Author

Thanks @mgravell - can you confirm whether outbound=234457KiB is data to be sent to Redis, or data sent from Redis?

One thing we just learned is that we are issuing XREAD commands without setting a COUNT.
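
A rough sketch of the change we're planning, so a single reply can't grow unbounded (stream name, payload, cursor, and count are illustrative; db and muxer as in the setup sketch above):

```csharp
// Illustrative values standing in for our real payload and stream cursor.
RedisValue lastReadId = "0-0";               // "0-0" = start of stream
RedisValue messageBody = "example payload";

// Writing side: XADD, appending one field/value pair to the stream.
await db.StreamAddAsync("X:datachanged", "payload", messageBody);

// Reading side: XREAD with an explicit count, so each call returns at most
// 100 entries instead of an unbounded batch (we previously passed no count).
StreamEntry[] entries = await db.StreamReadAsync("X:datachanged", lastReadId, count: 100);

foreach (var entry in entries)
{
    // ... process entry.Values ...
    lastReadId = entry.Id; // advance our cursor past the last processed entry
}
```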

@mgravell
Collaborator

I need to check (not at a PC), but I think that's the delta in total bytes sent between enqueue and timeout; however, the number would make more sense as the total bytes received, so please do let me double check that. Due to how the buffers currently work, it is possible that we've inverted them. I'll check.

@mattchidley
Author

@mgravell is there a doc somewhere that describes these debug logs?

@mgravell
Collaborator

There is a link in the exception. It is hard to be exhaustive on all interpretations of the data, though.

@NickCraver
Collaborator

Closing out to tidy up, but happy to re-open if there's a follow-up question.
