Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make cluster replicas return ASK and TRYAGAIN #495

Merged

Conversation

zuiderkwast
Copy link
Contributor

@zuiderkwast zuiderkwast commented May 13, 2024

After READONLY, make a cluster replica behave as its primary regarding returning ASK redirects and TRYAGAIN.

Without this patch, a client reading from a replica cannot tell if a key doesn't exist or if it has already been migrated to another shard as part of an ongoing slot migration. Therefore, without an ASK redirect in this situation, offloading reads to cluster replicas wasn't reliable.

Note: The target of a redirect is always a primary. If a client wants to continue reading from a replica after following a redirect, it needs to figure out the replicas of that new primary using CLUSTER SHARDS or similar.

This is related to #21 and has been made possible by the introduction of Replication of Slot Migration States in #445.


Release notes:

During cluster slot migration, replicas are able to return -ASK redirects and -TRYAGAIN.

After READONLY, a replicas behaves as its primary regarding returning
ASK redirects and TRYAGAIN.

Without this patch, a client reading from replicas cannot tell if a
key doesn't exist or if it's being migrated to another slot. Therefore,
without an ASK redirect in this situation, reading from replicas wasn't
reliable.

The target of a redirect is always a primary. If a client wants to
continue reading from a replica after following a redirect, it needs to
figure out the replicas of that primary using CLUSTER SHARDS or similar.

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Copy link

codecov bot commented May 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.81%. Comparing base (a0aebb6) to head (689ee06).

Current head 689ee06 differs from pull request most recent head b1461ee

Please upload reports for the commit b1461ee to get more accurate results.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #495      +/-   ##
============================================
- Coverage     70.17%   69.81%   -0.37%     
============================================
  Files           109      109              
  Lines         59904    61802    +1898     
============================================
+ Hits          42039    43147    +1108     
- Misses        17865    18655     +790     
Files Coverage Δ
src/cluster.c 86.45% <100.00%> (+0.02%) ⬆️

... and 85 files with indirect coverage changes

@zuiderkwast zuiderkwast requested a review from PingXie May 13, 2024 23:36
@zuiderkwast
Copy link
Contributor Author

@supercaracal reported this for Redis in September 2022:

Hello,

I'm implementing a client for redis cluster in ruby.
https://github.com/redis-rb/redis-cluster-client

I'm trying to test the client under resharding and scale reading conditions. But it seems that replica nodes don't reply ask-redirection error. Clients receive nil from replica nodes while resharding. Is there a way to obtain values of keys in the middle of resharding from replica nodes correctly?

I think this is what you need.

@supercaracal
Copy link

Yes, it is. Thank you so much!

Copy link
Member

@enjoy-binbin enjoy-binbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems a good idea, LGTM.

@enjoy-binbin enjoy-binbin added the release-notes This issue should get a line item in the release notes label May 18, 2024
@zuiderkwast zuiderkwast merged commit d72ba06 into valkey-io:unstable May 24, 2024
15 checks passed
@zuiderkwast zuiderkwast deleted the replica-return-ask-redirect branch May 24, 2024 15:58
@enjoy-binbin
Copy link
Member

i guess we also need to update https://valkey.io/commands/readonly/

@enjoy-binbin enjoy-binbin added the needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. label May 27, 2024
@zuiderkwast
Copy link
Contributor Author

@enjoy-binbin What do you want to write for the READONLY docs? It already says that the replica can return redirects.

@enjoy-binbin
Copy link
Member

ohh, so in the docs, we already says that the replica can return redirects (before the changes)...

@zuiderkwast
Copy link
Contributor Author

Yes, but maybe we should add a note like "Before Valkey 8, it was not reliable during slot migrations bla bla bla....." WDYT?

@enjoy-binbin
Copy link
Member

Yes, but maybe we should add a note like "Before Valkey 8, it was not reliable during slot migrations bla bla bla....." WDYT?

yean, this is a sometime that we can mention.

The cluster was reconfigured (for example resharded) and the replica is no longer able to serve commands for a given hash slot.

btw, in the docs, we says this. This sentence says that "after resharded", the replica can return the redirect, right? and in this PR, replica will return redirect during the slot migrations, right?

@enjoy-binbin
Copy link
Member

@zuiderkwast ping, i need an ACK in case i misunderstood it

@zuiderkwast
Copy link
Contributor Author

@enjoy-binbin ACK

You're right. We should improve the docs. I think it's almost correct already but we can add point 3.

When the connection is in readonly mode, the cluster will send a redirection to the client only if the operation involves keys not served by the replica’s master node. This may happen because:

  1. The client sent a command about hash slots never served by the master of this replica.
  2. The cluster was reconfigured (for example resharded) and the replica is no longer able to serve commands for a given hash slot.

3. A slot migration is ongoing. In this case the replica can return an ASK redirect or a TRYAGAIN error reply.

OK?

zuiderkwast added a commit to valkey-io/valkey-doc that referenced this pull request Jun 2, 2024
Documentation for valkey-io/valkey#495

---------

Signed-off-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. release-notes This issue should get a line item in the release notes
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants