Continuous RedisTimeoutException until app restart on Azure Cache for Redis patched server #568

Open
dstj opened this issue Jul 22, 2023 · 0 comments

(This may be a question about proper usage for connection resiliency on Azure; the Redis.Extensions documentation was unclear.)

My application, which uses an Azure Cache for Redis server, threw continuous RedisTimeoutException errors until I manually forced a restart of my container app. Then everything worked again.

The cause appears to be that the Redis server was automatically patched by Azure. The "Diagnose and solve problems" tab on the Azure Portal produced the following:

This cache was recently patched
Your cache MY CACHE SERVER was patched at 2023-07-21 22:19:00Z. The Azure Cache for Redis service periodically updates all caches with the latest platform features and improvements. To apply the updates, each node within each cache must be taken offline, which means closing its client connections.

Standard and Premium tier caches have internal redundancy to avoid prolonged downtime during all maintenance events. In these caches, the primary nodes failover to replicas when they need to be taken offline for maintenance. After a primary node closes its client connections, the replica takes over and is available for new connections within seconds. Client applications simply need to reconnect, and most Redis client libraries take care of this automatically.

If your Redis client is configured properly and implements connection resilience best practices, it will see only momentary errors as Redis connections failover during patching.

Recommendations

  • Implement a retry policy: Ensure that any commands that fail with a transient timeout or connection error are automatically retried. Consider using a library like Polly to add retries without complicating your code.
  • Upgrade to the latest version of your Redis client library: New versions often include improvements to maintain connection stability and restore lost connections faster.
  • Schedule your updates: Select a time during the week when patching will be less impactful to your application.
  • Detect when a connection fails to reconnect automatically, and replace it with new connections: For examples of how to implement this in .NET code with the StackExchange.Redis client library, see https://github.com/Azure-Samples/azure-cache-redis-samples.
  • Subscribe to maintenance notifications within your Redis client application: Your application can use the notifications to redirect requests away from an impacted cache, or take whatever action is appropriate.
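
For illustration, the first recommendation (retry transient failures) could be sketched roughly as follows. This is a dependency-free sketch, not Polly and not anything provided by StackExchange.Redis or Redis.Extensions: the `TransientRetry` class, its parameter names, and the default values are all hypothetical. The caller decides which exceptions count as transient through the `isTransient` predicate.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper for illustration only; not part of any Redis library.
public static class TransientRetry
{
    // Runs the operation, retrying with exponential backoff while the
    // exception is classified as transient and attempts remain.
    public static async Task<T> ExecuteAsync<T>(
        Func<Task<T>> operation,
        Func<Exception, bool> isTransient,
        int maxAttempts = 3,      // assumed default, tune for your workload
        int baseDelayMs = 200)    // assumed default, tune for your workload
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Backoff grows as baseDelayMs, 2x, 4x, ...
                await Task.Delay(baseDelayMs * (1 << (attempt - 1)));
            }
            // On the last attempt the filter is false, so the exception propagates.
        }
    }
}
```

In the caching service shown further down, a call might be wrapped as `TransientRetry.ExecuteAsync(() => _redisCacheClient.Db0.AddAsync(key, value, whenCondition), ex => ex is RedisTimeoutException || ex is RedisConnectionException)`. Note that a retry only helps with momentary errors; it would presumably not have cured the case reported here, where the timeouts continued until the app was restarted.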

The error message received over and over and over again until I forced a restart was:

StackExchange.Redis.RedisTimeoutException: Timeout awaiting response (outbound=0KiB, inbound=0KiB, 1516ms elapsed, timeout is 1000ms), 
command=SET, next: INFO, inst: 0, qu: 0, qs: 0, aw: False, bw: Inactive, rs: ReadAsync, ws: Idle, in: 0, last-in: 28, cur-in: 0, sync-ops: 0, async-ops: 2, 
serverEndpoint: ___my_azure_redis_server.redis.cache.windows.net:6380, 
conn-sec: 10453, aoc: 0, mc: 1/1/0, mgr: 4 of 4 available, 
clientName: ___my_app_container___(SE.Redis-v2.6.122.38350), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), 
WORKER: (Busy=2,Free=32765,Min=2,Max=32767), POOL: (Threads=4,QueuedItems=0,CompletedItems=568694,Timers=49), 
v: 2.6.122.38350 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
at MyApp.Caching.RedisCachingService.SetAsync[T](String key, T value, TimeSpan slidingExpiration, Boolean addOnly) in /src/MyApp/Caching/RedisCachingService.cs:line 26

Azure's recommendation is to implement a forced reconnect, as in this example.

My question is: how should this be done in the proper Redis.Extensions way?
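
The Azure sample itself is not reproduced here. As a rough illustration of the general idea only (replace a connection object that does not recover by itself, but throttle how often that happens), a minimal sketch might look like the following. The `ReconnectingHolder<T>` class and its members are hypothetical names and not an API of StackExchange.Redis or Redis.Extensions. In a real app, `T` would be whatever disposable object owns the connection, and the factory would create a fresh one.

```csharp
using System;

// Hypothetical wrapper for illustration only; not part of any Redis library.
public sealed class ReconnectingHolder<T> where T : class, IDisposable
{
    private readonly Func<T> _factory;
    private readonly TimeSpan _minInterval;
    private readonly object _lock = new object();
    private T _current;
    private DateTimeOffset _lastReconnect = DateTimeOffset.MinValue;

    public ReconnectingHolder(Func<T> factory, TimeSpan minInterval)
    {
        _factory = factory;
        _minInterval = minInterval;
        _current = factory();
    }

    // The instance callers should use for their next operation.
    public T Current
    {
        get { lock (_lock) { return _current; } }
    }

    // Call when errors persist. Returns true if a new instance was created,
    // false if the call was throttled by the minimum interval.
    public bool ForceReconnect()
    {
        lock (_lock)
        {
            var now = DateTimeOffset.UtcNow;
            if (now - _lastReconnect < _minInterval) return false;

            var old = _current;
            _current = _factory();
            _lastReconnect = now;

            // Best-effort cleanup of the instance being replaced.
            try { old.Dispose(); } catch { /* ignore errors from a broken instance */ }
            return true;
        }
    }
}
```

The throttle keeps a burst of failing requests from recreating the connection many times in a row. Disposing the old instance immediately can fail operations that are still in flight on it, so a production version would need to handle that case. Whether Redis.Extensions offers a built-in equivalent remains the open question of this issue.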

My singleton caching service was basically this:

private readonly IRedisClient _redisCacheClient;

public RedisCachingService(RedisConfiguration redisConfiguration, ILoggerFactory loggerFactory)
{
	var redisCacheConnectionPoolManager = new RedisConnectionPoolManager(redisConfiguration, loggerFactory?.CreateLogger<RedisConnectionPoolManager>());
	var msgPackObjectSerializer = new MsgPackObjectSerializer();
	_redisCacheClient = new RedisClient(redisCacheConnectionPoolManager, msgPackObjectSerializer, redisConfiguration);
}

public async Task SetAsync<T>(string key, T value, TimeSpan slidingExpiration, bool addOnly = false)
{
	var whenCondition = addOnly ? When.NotExists : When.Always;
	if (await _redisCacheClient.Db0.AddAsync(key, value, whenCondition)) return;

	if (!addOnly) {
		var existing = await _redisCacheClient.Db0.GetAsync<T>(key);
		throw new CachingServiceException($"Cache key '{key}' already exists in cache with value '{existing}'");
	}
}
...

(I'll now try moving to the RedisClientFactory)

My RedisConfiguration is:

	"AllowAdmin": false,
	"Ssl": true,
	"ConnectTimeout": 3000,
	"ConnectRetry": 2,
	"Database": 0,
	"Password": "set from environment var",
	"Hosts": [
		{
			"Host": "set from environment var",
			"Port": "set from environment var"
		}
	]

The Redis.Extensions documentation seems to say that the RedisClient can and should be a singleton, so I was under the impression that it would handle connection resiliency, but my repeated errors tell me this is not the case... :/
