Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The message timed out in the backlog attempting to send because no connection became available #2698

Open
cwhsu1984 opened this issue Apr 16, 2024 · 9 comments

Comments

@cwhsu1984
Copy link

Hi,
I'm getting lots of timeout like belows:

StackExchange.Redis.RedisTimeoutException: The message timed out in the backlog attempting to send because no connection became available, command=GET, timeout: 1000, outbound: 0KiB, inbound: 0KiB, inst: 0, qu: 12, qs: 0, aw: False, bw: CheckingForTimeout, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 1659, cur-in: 0, sync-ops: 1, async-ops: 3759721, serverEndpoint: x.x.x.x:6379, conn-sec: 55196.89, aoc: 0, mc: 1/1/0, mgr: 6 of 6 available, clientName: myservice-6cfccff48c-q65wd(SE.Redis-v2.7.33.41805), PerfCounterHelperkeyHashSlot: 11447, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=13,Free=32754,Min=500,Max=32767), POOL: (Threads=13,QueuedItems=123,CompletedItems=214865365,Timers=39), v: 2.7.33.41805 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

StackExchange.Redis.RedisTimeoutException: The message timed out in the backlog attempting to send because no connection became available, command=GET, timeout: 1000, outbound: 0KiB, inbound: 0KiB, inst: 0, qu: 15, qs: 0, aw: False, bw: CheckingForTimeout, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 1686, cur-in: 0, sync-ops: 1, async-ops: 3759721, serverEndpoint: x.x.x.x:6379, conn-sec: 55196.87, aoc: 0, mc: 1/1/0, mgr: 6 of 6 available, clientName: myservice-6cfccff48c-q65wd(SE.Redis-v2.7.33.41805), PerfCounterHelperkeyHashSlot: 9891, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=13,Free=32754,Min=500,Max=32767), POOL: (Threads=13,QueuedItems=74,CompletedItems=214865364,Timers=39), v: 2.7.33.41805 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

StackExchange.Redis.RedisTimeoutException: The message timed out in the backlog attempting to send because no connection became available, command=GET, timeout: 1000, outbound: 0KiB, inbound: 0KiB, inst: 0, qu: 17, qs: 0, aw: False, bw: CheckingForTimeout, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 1686, cur-in: 0, sync-ops: 1, async-ops: 3759721, serverEndpoint: x.x.x.x:6379, conn-sec: 55196.87, aoc: 0, mc: 1/1/0, mgr: 6 of 6 available, clientName: myservice-6cfccff48c-q65wd(SE.Redis-v2.7.33.41805), PerfCounterHelperkeyHashSlot: 9839, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=13,Free=32754,Min=500,Max=32767), POOL: (Threads=13,QueuedItems=72,CompletedItems=214865364,Timers=39), v: 2.7.33.41805 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

StackExchange.Redis.RedisTimeoutException: The message timed out in the backlog attempting to send because no connection became available, command=GET, timeout: 1000, outbound: 0KiB, inbound: 0KiB, inst: 0, qu: 22, qs: 0, aw: False, bw: CheckingForTimeout, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 1686, cur-in: 0, sync-ops: 1, async-ops: 3759721, serverEndpoint: x.x.x.x:6379, conn-sec: 55196.87, aoc: 0, mc: 1/1/0, mgr: 6 of 6 available, clientName: myservice-6cfccff48c-q65wd(SE.Redis-v2.7.33.41805), PerfCounterHelperkeyHashSlot: 9877, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=13,Free=32754,Min=500,Max=32767), POOL: (Threads=13,QueuedItems=67,CompletedItems=214865364,Timers=39), v: 2.7.33.41805 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

My service has over 20 pods and connecting to a redis cluster with 6 nodes.
I check the cpu and memory usage, my container cpu, memory is all below 30% when errors occur.
It happens over 200 times in a second, and it's gone. But I can see the same error from time to time.

When the errors occur, redis client connection is not high(below 200).
At the peak time, I can see over 600 connections without error, so it seems the error is not related to peak traffic.

I have configured the minThread already. Plus, I don't think it's a thread issue based on the worker pool details above.
And most operations are async-ops, so it doesn't seem related to sync or async issue.

What are the possible reasons for this timeout error?
Thanks.

@NickCraver
Copy link
Collaborator

Marc and I are trying to reason about this and our best guess is that the lowered timeout to 1000ms which is also the frequency of flushing of backlog items may be colliding when there's a pause of any duration and commands come in just at the heartbeat interval.

My recommendation here would be if your timeout is lower like 1000ms then you likely want to lower the HeartbeatInterval to something lower, like 500ms or 250ms - see docs here: https://stackexchange.github.io/StackExchange.Redis/Configuration. This will cause the routine actions like flushing any pending backlog after an interruption to happen more often and in scenarios that shift networks fast, likely eliminate the timeouts you're seeing.

@cwhsu1984
Copy link
Author

I'll try that to see if it reduces or eliminates the issue. Thanks!

@cwhsu1984
Copy link
Author

I haven't tried this.
But after tuning with my application, the error seems to be gone.
The tuning is that we cached the db access which reduce lots of taffic between the db.
I think that greatly reduce the thread usages, and since then, I don't see the errors so far.
I keep checking this for a few days and update here.

@cwhsu1984
Copy link
Author

cwhsu1984 commented Apr 22, 2024

@NickCraver
After my tuning to DB access to reduce thread for db usage, I still see a few errors.
So I follow your suggestions to tune the HeartbeatInterval to 250ms.
The error frequency is like about 3xx times in a minute just like previously.
But you can see the error details seems a bit different now.

Timeout awaiting response (outbound=0KiB, inbound=0KiB, 1178ms elapsed, timeout is 1000ms), command=GET, next: GET xxx:34169274, inst: 0, qu: 0, qs: 81, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 1659, cur-in: 0, sync-ops: 1, async-ops: 6650939, serverEndpoint: x.x.x.x:6379, conn-sec: 23737.05, aoc: 0, mc: 1/1/0, mgr: 6 of 6 available, clientName: myproject(SE.Redis-v2.7.33.41805), PerfCounterHelperkeyHashSlot: 7263, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=16,Free=32751,Min=500,Max=32767), POOL: (Threads=30,QueuedItems=12,CompletedItems=83240631,Timers=27), v: 2.7.33.41805 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

The message timed out in the backlog attempting to send because no connection became available, command=GET, timeout: 1000, outbound: 0KiB, inbound: 0KiB, inst: 0, qu: 28, qs: 0, aw: False, bw: CheckingForTimeout, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 1654, cur-in: 0, sync-ops: 1, async-ops: 58884859, serverEndpoint: x.x.x.x:6379, conn-sec: 241878.26, aoc: 0, mc: 1/1/0, mgr: 6 of 6 available, clientName: myproject(SE.Redis-v2.7.33.41805), PerfCounterHelperkeyHashSlot: 6881, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=6,Free=32761,Min=500,Max=32767), POOL: (Threads=11,QueuedItems=1,CompletedItems=741107568,Timers=27), v: 2.7.33.41805 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

The message timed out in the backlog attempting to send because no connection became available, command=GET, timeout: 1000, outbound: 0KiB, inbound: 0KiB, inst: 0, qu: 29, qs: 0, aw: False, bw: CheckingForTimeout, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 1654, cur-in: 0, sync-ops: 1, async-ops: 58884859, serverEndpoint: x.x.x.x:6379, conn-sec: 241878.26, aoc: 0, mc: 1/1/0, mgr: 6 of 6 available, clientName: myproject(SE.Redis-v2.7.33.41805), PerfCounterHelperkeyHashSlot: 9585, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=6,Free=32761,Min=500,Max=32767), POOL: (Threads=11,QueuedItems=0,CompletedItems=741107568,Timers=27), v: 2.7.33.41805 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

The message timed out in the backlog attempting to send because no connection became available, command=GET, timeout: 1000, outbound: 0KiB, inbound: 0KiB, inst: 0, qu: 32, qs: 0, aw: False, bw: CheckingForTimeout, rs: ReadAsync, ws: Idle, in: 0, in-pipe: 0, out-pipe: 0, last-in: 1654, cur-in: 0, sync-ops: 1, async-ops: 58884859, serverEndpoint: x.x.x.x:6379, conn-sec: 241878.26, aoc: 0, mc: 1/1/0, mgr: 6 of 6 available, clientName: myproject(SE.Redis-v2.7.33.41805), PerfCounterHelperkeyHashSlot: 6491, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=4,Free=32763,Min=500,Max=32767), POOL: (Threads=11,QueuedItems=0,CompletedItems=741107567,Timers=27), v: 2.7.33.41805 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)

@NickCraver
Copy link
Collaborator

@cwhsu1984 I'm not sure why you're touching the backlog, my guess would be bursty spikes to the outbound socket writer. Is your load slamming in concentrated spikes in general?

@cwhsu1984
Copy link
Author

@cwhsu1984 I'm not sure why you're touching the backlog, my guess would be bursty spikes to the outbound socket writer. Is your load slamming in concentrated spikes in general?

How do I know if this is the case? Is there anything I can do to debug on this?

What I can tell you is that our application is a read heavy one. The read operation is 40x volume to the write.
I have seen peak traffic sending 700 commands/second, but the error didn't occur at the peak traffic time.
I also checked the network I/O, and it also didn't happen at the peak network I/O.

@NickCraver
Copy link
Collaborator

I'd recommend logging the disconnect/reconnect events (see https://github.com/StackExchange/StackExchange.Redis/blob/main/src/StackExchange.Redis/ConnectionMultiplexer.Events.cs) - so we can take a look at when timings are happening. At a 1 sec timeout, it's probable that we're disconnecting, but not reconnecting and processing in time, perhaps the reconnect takes a bit in your environment. You can pass a text writer into the .Connect(Async) method you're using or set up the logger factory on connections to log details and see where connection time is spent, that's the next likely debugging step if wishing to keep timeouts super low.

@cwhsu1984
Copy link
Author

thanks, let me try it

@cwhsu1984
Copy link
Author

In case someone needs it, this is how I log the connection

public class TextWriterLogger : TextWriter
{
    private readonly ILogger _logger;
    private readonly StringBuilder _builder;

    public TextWriterLogger(ILogger logger)
    {
        _logger = logger;
        _builder = new StringBuilder();
    }

    public override void Write(string? value)
    {
        if (value == "\r\n" || value == "\n" || value == "\r")
        {
            _logger.LogDebug("[redis] " + _builder);
            _builder.Clear();
        }
        else
        {
            _builder.Append(value);
        }
    }

    public override Encoding Encoding => Encoding.UTF8;
}

And the usage is like this.

var redisConnection = ConnectionMultiplexer.Connect(redisConfiguration.ConfigurationOptions, new TextWriterLogger(loggerFactory.CreateLogger("RedisLogger")));

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants