Add a new failover option prioritising nodes #1896

Open
wants to merge 2 commits into
base: develop

aolley
Contributor

@aolley aolley commented Dec 21, 2020

This makes it possible to define a list of nodes you'd prefer to communicate
with in cluster mode.

This behaves similarly to FAILOVER_DISTRIBUTE, where it randomises the list of
nodes before trying them in sequence - however it takes that random list and
sorts any preferred nodes to the top first. All candidate nodes are still in
the list, but the preferred ones get tried first.

This is extremely helpful for setups where you know which nodes are closer
(for example, nodes you can reach without crossing an AZ boundary).
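
As a rough illustration of the ordering described above (not the code in this PR, which lives in the extension itself), the shuffle-then-prioritise step could be sketched in Python as follows. The function name `order_candidates` and the `host:port` strings are assumptions made up for this example.

```python
import random


def order_candidates(nodes, preferred, rng=random):
    """Shuffle all candidate nodes, then move preferred ones to the front.

    Hypothetical helper for illustration only; names are not from the PR.
    """
    shuffled = list(nodes)
    rng.shuffle(shuffled)  # same randomisation idea as FAILOVER_DISTRIBUTE
    preferred_set = set(preferred)
    # sorted() is stable, so the random order survives inside each group:
    # preferred nodes (key False) come first, all others (key True) follow.
    return sorted(shuffled, key=lambda node: node not in preferred_set)
```

Every candidate stays in the returned list, so a non-preferred node is still tried if all preferred ones fail. With an empty `preferred` list the result is simply the shuffled list.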

@aolley
Contributor Author

aolley commented Dec 21, 2020

This is for issue #1194.

@aolley
Contributor Author

aolley commented Dec 21, 2020

I just noticed that running a command segfaults if you have distribute mode set but no preferred nodes set. (Fixed in subsequent commits.)

@aolley aolley force-pushed the feature/failover_preferred branch 3 times, most recently from 5f8117f to 84f44c6 Compare March 4, 2021 13:18
@michael-grunder michael-grunder self-assigned this Mar 26, 2021
Redis can be in a state where a replica responds MOVED for a slot even though
that replica should be able to serve that slot. The why is immaterial - the
Redis spec says you must perform your command on the host:port the MOVED
response specifies.

If we always sort our preferred nodes higher than the host:port the MOVED
response tells us to read from, we end up never reading from it. This
shortcuts that by disabling the FAILOVER mode entirely when this situation is
encountered, ensuring the command then goes to the primary.

This has a slight benefit over FAILOVER_DISTRIBUTE_SLAVES mode. That mode will
work, but it randomly picks a node out of the list (primary+replicas) each
time, so you only continue once you happen to land on the primary. By
switching to NONE, we immediately just talk to the primary.
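
The redirect handling described above might be sketched roughly like this in Python. It is only an illustration under stated assumptions, not the extension's actual code: `FAILOVER_PREFERRED`, `parse_moved` and `next_attempt` are hypothetical names, and the only thing taken from Redis itself is the `MOVED <slot> <host>:<port>` reply format.

```python
FAILOVER_NONE = "none"
FAILOVER_PREFERRED = "preferred"  # hypothetical name for the new mode


def parse_moved(error):
    """Parse 'MOVED <slot> <host>:<port>' into (slot, 'host:port')."""
    kind, slot, target = error.split()
    if kind != "MOVED":
        raise ValueError("not a MOVED redirection: " + error)
    return int(slot), target


def next_attempt(failover, candidates, reply):
    """Return (failover_mode, nodes_to_try) for the next attempt.

    On a MOVED reply, drop to FAILOVER_NONE and target only the node the
    reply names, so preferred nodes are not sorted ahead of it again.
    """
    if isinstance(reply, str) and reply.startswith("MOVED "):
        _slot, target = parse_moved(reply)
        return FAILOVER_NONE, [target]
    # Any other reply leaves the mode and candidate list untouched.
    return failover, candidates
```

For example, `next_attempt(FAILOVER_PREFERRED, nodes, "MOVED 3999 10.0.2.1:6379")` yields the NONE mode and a single-node list, so the retry goes straight to the host the redirect names instead of back to a preferred replica.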