Transient fault tolerance and implicit retry support discussion #399
Comments
There is a retry mechanism in SE.Redis; we can use the options below in the connection string:
Below is the sample connection string:
|
Using: StackExchange.Redis and ServiceStack.Redis. For documentation about the "connectRetry" parameter, I found this: => I have run into the same issue at times, but that was with the default value of "3000ms". This will not solve the problem described (it is not a transient-failure mechanism), but it could help a little? |
Also, you are using Azure Redis Cache, and the recommended client for it is StackExchange.Redis, but the documentation section you pasted above seems to be from ServiceStack.Redis. Can you please confirm which Redis client you are using? |
The original discussion was about retrying operations, not connections. It is my understanding that I can't find anything about |
Bump for this question. We are migrating from the ServiceStack client to the StackExchange client. The code we are replacing, which used ServiceStack, caught exceptions and retried operations after a short Thread.Sleep; on most occasions the retry worked. If a network issue causes a System.Net.SocketException such as "An established connection was aborted by the software in your host machine" or "An existing connection was forcibly closed by the remote host", does StackExchange.Redis automatically retry until the syncTimeout has elapsed? If not, are there any suggested steps that should happen between the initial failure and a retry in our code? Such as:
Just for clarification, I am talking about network issues when attempting StringSet or StringGet. Not when trying to initially connect to the Redis server. |
I know this is an old issue, and something we haven't gotten to, but @deepakverma is now working on approaches here. Expect some retry semantics to be configurable in an upcoming release. |
@NickCraver I believe the Azure Redis team have thought about retry handling and failover notification support for drivers before. Just want to mention it in case your team wanted to reach out to them to discuss ideas and come up with a nice convention others can follow too, so that planned failovers go more smoothly. |
@Plasma Deepak's on that team ;) We are indeed syncing with them weekly to get more quality of life things in. |
@NickCraver @deepakverma may we get an update of your progress on this issue? Thanks! |
@dariusonsched Marc and I have been slammed but are looking into 2 things here: 1) backlog/retry policy (see #1912), and 2) a thread stall issue related to that, which leads to us considering defaulting to the built-in thread pool for the socket manager in the 2.5x release on .NET 6.0+ environments (which have some of the sync-over-async protections that were part of what it was originally designed around). |
For anyone curious, this is happening in #1912 and will be available in the v2.5 release :) |
The Problem
We're a heavy user of Azure Redis Cache, and the platform will sometimes (e.g. once a month) reboot the underlying host OS for platform updates, causing our primary Redis cache instance to go down. The secondary instance that Azure runs takes over, but only after a brief window in which several commands fail.
When these events happen, sockets are disconnected, commands fail, and timeouts momentarily occur, and SE.Redis rightfully has to throw the corresponding exceptions.
Here's an example of the exceptions we may see during this time:
And:
And:
Solution proposal
Azure, Amazon and other cloud services all take the line that applications need to handle transient faults. SQL Azure drivers handle this with built-in retry support on the app-code side, but the recommended driver (SE.Redis) has no command retry support to shield app code from these transient faults.
It's not an easy problem to solve: not all commands are safe to retry, because some are not idempotent (e.g. INCR) and may or may not have succeeded before the failure.
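To make the idempotency concern concrete, here is a minimal, client-agnostic sketch in Python. It does not use SE.Redis or any real Redis client: `FakeServer`, `blind_retry`, and `demo` are hypothetical names invented for illustration. The fake server applies a command and then loses the reply, which is one way a connection failure can look from the client side.

```python
class FakeServer:
    """In-memory stand-in for a Redis server (hypothetical, illustration only)."""

    def __init__(self):
        self.data = {}
        self.drop_next_reply = False

    def execute(self, command, key, value=None):
        # The command is applied on the "server" before the reply is sent.
        if command == "SET":
            self.data[key] = value
            reply = "OK"
        elif command == "INCR":
            self.data[key] = self.data.get(key, 0) + 1
            reply = self.data[key]
        else:
            raise ValueError(f"unsupported command: {command}")

        # Simulate the connection dropping after the write but before the reply.
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise ConnectionError("connection closed before the reply was read")
        return reply


def blind_retry(server, command, key, value=None, retries=1):
    """Retry any command on ConnectionError, without checking idempotency."""
    for attempt in range(retries + 1):
        try:
            return server.execute(command, key, value)
        except ConnectionError:
            if attempt == retries:
                raise


def demo():
    server = FakeServer()

    server.drop_next_reply = True
    blind_retry(server, "SET", "greeting", "hello")  # applied twice, same end state

    server.drop_next_reply = True
    blind_retry(server, "INCR", "counter")  # applied twice, counter is now 2

    return server.data
```

In this sketch the retried SET leaves the key in the same state, while the retried INCR leaves the counter at 2 even though the application asked for a single increment.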
I am wondering what the thoughts are about how to best approach a solution to this problem.
Driver-level support in SE.Redis would be a pretty good initial solution: a "command retry" option that retries all commands blindly (or, optionally, only "idempotent commands" like SET or GET) on any form of connection or timeout failure, up to X retries.
Ideally, there would be a Redis-level "This server is shutting down" message that warns the driver to pause sending commands for a few moments while the underlying secondary takes over. That would be better, but it is a more coordinated solution that also involves the Redis team.
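A rough sketch of what such an option could look like, written in Python against a generic `send` callable rather than any real driver. Every name here (`execute_with_retry`, `TransientError`, `IDEMPOTENT_COMMANDS`, and the parameters) is an assumption made for illustration and is not part of SE.Redis; the allowlist contains only the two commands named above.

```python
import time

# Assumption: an illustrative allowlist, limited to the commands named in the proposal.
IDEMPOTENT_COMMANDS = frozenset({"GET", "SET"})


class TransientError(Exception):
    """Hypothetical stand-in for a connection or timeout failure."""


def execute_with_retry(send, command, *args, max_retries=3,
                       idempotent_only=True, delay_seconds=0.05,
                       sleep=time.sleep):
    """Call send(command, *args), retrying transient failures up to max_retries.

    With idempotent_only=True, commands outside the allowlist get a single
    attempt, so a failure of e.g. INCR is surfaced to the caller immediately.
    """
    retryable = (not idempotent_only) or command.upper() in IDEMPOTENT_COMMANDS
    attempts = max_retries + 1 if retryable else 1

    for attempt in range(1, attempts + 1):
        try:
            return send(command, *args)
        except TransientError:
            if attempt == attempts:
                raise  # out of attempts: let the application decide
            sleep(delay_seconds * attempt)  # simple linear pause between tries
```

The `sleep` parameter is injectable only so the pause can be skipped in tests. Setting `idempotent_only=False` corresponds to the "blindly retries all commands" variant, with the double-apply risk that implies for commands like INCR.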
Thoughts?