-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC-0078: Zone-Aware Replica Reads #136
base: master
Are you sure you want to change the base?
Conversation
4ffb771
to
bec99f8
Compare
rfc/0077-zone-aware-replica-reads.md
Outdated
This is current behaviour, where the SDK does not take into account server | ||
groups. | ||
|
||
![figure 1: No Prefrence](figures/0077-case-1-no-preference.svg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any way to get this rendering in the diff?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also noticed, it renders in the doc, but neither absolute, nor relative (like this) is not getting picked up diff. Maybe only if I will make it full URL starting with http://...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to move to a GDoc-less workflow, I think this support will be important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but with google docs still optional (#50 still is not merged) :)
And in a way google docs would require to perform conversion losing all discussions behind.
rfc/0077-zone-aware-replica-reads.md
Outdated
``` | ||
|
||
In all failure cases, when the SDK cannot handle operation, it must return `102 | ||
DocumentUnretrievable` with human-readable explanation whenever it is possible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One reason I was avoiding using the existing Get[Any|All]Replicas, is I consider their interfaces rather broken. One concrete example - I have no easy way of disambiguating programmatically between the document not existing, and a network error. There are other issues, including that GetAllReplicas swallows all substream errors, making it not a good building block for higher-level replica operations.
We need to consider that the feature needs to be added to transactions also - in fact this is the most important use-case since it is the one the requesting customer will use. And I'm somewhat reluctant to add a broken interface into transactions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On transactions - let's focus on non-transactions for now though. Once we nail that API it should be relatively straightforward to add it to transactions too. I'm only bringing up transactions as an argument for why I'd maybe prefer we avoided using getAnyReplica() for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
new options are essentially change behavior, maybe we can agree that with this new option it will distinguish
- network error
- document is not found
- case where the selected strategy does not allow us to fulfill request
- not enough local replicas
- no replicas configured
- even master is not local group
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that seems a viable solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may be changing my feeling on that.. I'm very reluctant to add a not-great interface into transactions.
It is an option to only include getAnyReplica() in transactions, and only allow LOCAL_ONLY mode (with its better behaviour) - but perhaps not a great one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe after all it makes sense to add getAllReplicas and getAnyReplica into transactions and let them behave like regular operations, although implemented using lookupInAllReplicas and lookupInAnyReplica internally.
rfc/0077-zone-aware-replica-reads.md
Outdated
only 7.6.1 announce this information in configuration. | ||
[MB-60835](https://issues.couchbase.com/browse/MB-60835) How the SDK should | ||
behave with older server? Do we need to fail fast or silently fall back to | ||
current behavior? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we treat the desire to write to a specific server group as a request not a command, it gives us a lot of lee-way to handle situations like this. If we don't have a config with server groups, we fallback to whatever the operation does normally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated response to this question, basically the behavior for old servers is not defined as they do not support Zone-Aware-Replica-Reads in the way we describe it (which requires serverGroup
markers in the configuration).
I think we should rather document this better to make it clear how to use it, and that whole cluster should have at least 7.6.2 on all nodes, and be balanced correctly.
rfc/0077-zone-aware-replica-reads.md
Outdated
``` | ||
class GetAllReplicasOptions { | ||
// ... | ||
ReadPreference readPreference { ReadPreference::NO_PREFERENCE }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this has any meaning in GetAllReplicas?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think getting all replicas from the local server group could be a valid use case (though that's more like GetSomeReplicas)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, also note that we always read from active, so if you have two replicas configured for the bucket, you might have two responses.
rfc/0077-zone-aware-replica-reads.md
Outdated
|
||
Do we want to update GetResult and LookupInResult with the property, which would | ||
tell the user the name of the availability zone, where serving node has been | ||
deployed upon generating response? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be a neat solution to a lot of the concerns above. If we tell the user where we got the replica from, they could add their own warning/metrics if it doesn't match the server group they expected, rather than us needing to spam.
rfc/0077-zone-aware-replica-reads.md
Outdated
This case is a compromise between cost and availability, which trades speed. The | ||
idea is that the SDK will partition set of `[active, replica1, ...]` nodes into | ||
two groups: local and non-local. Then it will perform requests to local only, | ||
and only when neither of nodes will return data, send requests to non-local |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The trouble with LOCAL_FIRST is that if the non-local node doesn't return anything, it's usually going to be due to a timeout - network blip or undiscovered node failover. So we only find out about it at the point the 2.5s KV user timeout has elapsed - and it's too late for us to try a fallback.
Instead of LOCAL_FIRST, we could support: The user can do an initial call with LOCAL_ONLY and a short timeout, and if that times out, try any fallback strategy they want.
This avoids all the complexity of the other potential solutions to this problem (like having two timeouts, or allowing the user to specify some sort of strategy thing).
63f2d41
to
f3c50be
Compare
f3c50be
to
7b9694f
Compare
@programmatix I've uploaded update for transactions API. It documents |
selected server group, but only use current topology to detect the Empty-Group | ||
case. | ||
|
||
![Selected Server Group or All Available](figures/0078-case-3-selected-server-group-or-all-available.svg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inability to view these significantly impacts ability to review
|
||
![Selected Server Group](figures/0078-case-2-local-only.svg) | ||
|
||
## Selected Server Group or All Available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's maybe worth considering as we think about any fallback options, that we perhaps want to recommend that the user always has a fallback strategy in place:
try {
return collection.getAnyReplica(docId, options().timeout("20ms").read_preference(SELECTED_SERVER_GROUP)
} catch DocumentUnretrievableException | DocumentNotFoundException {
return collection.get(docId)
// or collection.getAnyReplica() or collection.getAllReplicas() - as they wish
}
This saves them from a bunch of edge cases - like a new doc not having been replicated, mixed-mode clusters, and possibly some rebalance & failover scenarios. Working with replicas is inherently a bit flaky, and we should advise users how to protect themselves.
Plus they don't necessarily know if they are working with 7.6.2 or not.
So in that context, knowing the user is going to have a fallback strategy in place already - I do wonder if there's any benefit to having fallback modes like this?
|
||
## Selected Server Group | ||
|
||
In this case, all reads will be done only from the local availability zone. It |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/local availability zone/selected|preferred server group, I think?
read replica APIs: | ||
|
||
``` | ||
enum ReadPreference { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slight nit: SELECTED_SERVER_GROUP at least is a strict command rather than a preference. Should this be ReadRequirement?
I don't feel strongly on this though.
``` | ||
class TransactionAttemptContext { | ||
// ... | ||
public TransactionGetResult getReplica(Collection collection, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah so maybe this has got lost in translation but I think this should be getReplicaFromPreferred[Server]Group() or similar.
(Or more accurately: I wish I had the time and capability to design a true getReplica() that handles all single-replica-read concerns. But I cannot figure out what that should look like. So until either inspiration strikes or I get more cycles to spend on it, the thought is to have a single-purpose method instead that is focussed on only this use-case. That doesn't "salt the earth" from doing a true getReplica() at a later date.)
property in cluster configuration. | ||
|
||
``` | ||
class TransactionAttemptContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel a bit uneasy with transactions specs being outside of the hackmd docs where they are all grouped together.
Perhaps I can just copy this into the API doc once finalised? We need a better long-term solution though.
|
||
# Caveats | ||
|
||
The use must be aware of the following topics when dealing with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: user
The behavior of this feature with older servers (pre 7.6.2) is not defined. It | ||
is up to SDK to decide what to do (for example, it could return `102 | ||
DocumentIrretrievable` in this case, because the configuration does not tell us | ||
anything about server groups assigned to the nodes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we should nail this down.
What you've proposed makes sense to me, in the context of what I propose elsewhere about the user always have try-catch handling.
No description provided.