Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC-0078: Zone-Aware Replica Reads #136

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

avsej
Copy link
Member

@avsej avsej commented Apr 26, 2024

No description provided.

@avsej avsej changed the title initial draft Zone-Aware Replica Reads Apr 26, 2024
@avsej avsej marked this pull request as draft April 26, 2024 20:34
@avsej avsej force-pushed the 0077-zone-aware-replica-reads branch 3 times, most recently from 4ffb771 to bec99f8 Compare April 26, 2024 21:42
README.md Outdated Show resolved Hide resolved
rfc/0077-zone-aware-replica-reads.md Outdated Show resolved Hide resolved
This is current behaviour, where the SDK does not take into account server
groups.

![figure 1: No Prefrence](figures/0077-case-1-no-preference.svg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way to get this rendering in the diff?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also noticed, it renders in the doc, but neither absolute, nor relative (like this) is not getting picked up diff. Maybe only if I will make it full URL starting with http://...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to move to a GDoc-less workflow, I think this support will be important.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but with google docs still optional (#50 still is not merged) :)

And in a way google docs would require to perform conversion losing all discussions behind.

rfc/0077-zone-aware-replica-reads.md Outdated Show resolved Hide resolved
```

In all failure cases, when the SDK cannot handle operation, it must return `102
DocumentUnretrievable` with human-readable explanation whenever it is possible
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One reason I was avoiding using the existing Get[Any|All]Replicas, is I consider their interfaces rather broken. One concrete example - I have no easy way of disambiguating programmatically between the document not existing, and a network error. There are other issues, including that GetAllReplicas swallows all substream errors, making it not a good building block for higher-level replica operations.

We need to consider that the feature needs to be added to transactions also - in fact this is the most important use-case since it is the one the requesting customer will use. And I'm somewhat reluctant to add a broken interface into transactions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On transactions - let's focus on non-transactions for now though. Once we nail that API it should be relatively straightforward to add it to transactions too. I'm only bringing up transactions as an argument for why I'd maybe prefer we avoided using getAnyReplica() for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new options are essentially change behavior, maybe we can agree that with this new option it will distinguish

  • network error
  • document is not found
  • case where the selected strategy does not allow us to fulfill request
    • not enough local replicas
    • no replicas configured
    • even master is not local group

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that seems a viable solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be changing my feeling on that.. I'm very reluctant to add a not-great interface into transactions.

It is an option to only include getAnyReplica() in transactions, and only allow LOCAL_ONLY mode (with its better behaviour) - but perhaps not a great one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe after all it makes sense to add getAllReplicas and getAnyReplica into transactions and let them behave like regular operations, although implemented using lookupInAllReplicas and lookupInAnyReplica internally.

rfc/0077-zone-aware-replica-reads.md Outdated Show resolved Hide resolved
only 7.6.1 announce this information in configuration.
[MB-60835](https://issues.couchbase.com/browse/MB-60835) How the SDK should
behave with older server? Do we need to fail fast or silently fall back to
current behavior?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we treat the desire to write to a specific server group as a request not a command, it gives us a lot of lee-way to handle situations like this. If we don't have a config with server groups, we fallback to whatever the operation does normally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated response to this question, basically the behavior for old servers is not defined as they do not support Zone-Aware-Replica-Reads in the way we describe it (which requires serverGroup markers in the configuration).

I think we should rather document this better to make it clear how to use it, and that whole cluster should have at least 7.6.2 on all nodes, and be balanced correctly.

```
class GetAllReplicasOptions {
// ...
ReadPreference readPreference { ReadPreference::NO_PREFERENCE };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this has any meaning in GetAllReplicas?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think getting all replicas from the local server group could be a valid use case (though that's more like GetSomeReplicas)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, also note that we always read from active, so if you have two replicas configured for the bucket, you might have two responses.

rfc/0077-zone-aware-replica-reads.md Outdated Show resolved Hide resolved

Do we want to update GetResult and LookupInResult with the property, which would
tell the user the name of the availability zone, where serving node has been
deployed upon generating response?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a neat solution to a lot of the concerns above. If we tell the user where we got the replica from, they could add their own warning/metrics if it doesn't match the server group they expected, rather than us needing to spam.

This case is a compromise between cost and availability, which trades speed. The
idea is that the SDK will partition set of `[active, replica1, ...]` nodes into
two groups: local and non-local. Then it will perform requests to local only,
and only when neither of nodes will return data, send requests to non-local
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trouble with LOCAL_FIRST is that if the non-local node doesn't return anything, it's usually going to be due to a timeout - network blip or undiscovered node failover. So we only find out about it at the point the 2.5s KV user timeout has elapsed - and it's too late for us to try a fallback.

Instead of LOCAL_FIRST, we could support: The user can do an initial call with LOCAL_ONLY and a short timeout, and if that times out, try any fallback strategy they want.

This avoids all the complexity of the other potential solutions to this problem (like having two timeouts, or allowing the user to specify some sort of strategy thing).

@avsej avsej force-pushed the 0077-zone-aware-replica-reads branch 2 times, most recently from 63f2d41 to f3c50be Compare May 20, 2024 05:05
@avsej avsej force-pushed the 0077-zone-aware-replica-reads branch from f3c50be to 7b9694f Compare May 20, 2024 05:06
@avsej avsej marked this pull request as ready for review May 20, 2024 05:29
@avsej avsej changed the title Zone-Aware Replica Reads RFC-0078: Zone-Aware Replica Reads May 20, 2024
@avsej
Copy link
Member Author

avsej commented May 29, 2024

@programmatix I've uploaded update for transactions API. It documents DocumentNotFound exception.

selected server group, but only use current topology to detect the Empty-Group
case.

![Selected Server Group or All Available](figures/0078-case-3-selected-server-group-or-all-available.svg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inability to view these significantly impacts ability to review


![Selected Server Group](figures/0078-case-2-local-only.svg)

## Selected Server Group or All Available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's maybe worth considering as we think about any fallback options, that we perhaps want to recommend that the user always has a fallback strategy in place:

try {
   return collection.getAnyReplica(docId, options().timeout("20ms").read_preference(SELECTED_SERVER_GROUP)
} catch DocumentUnretrievableException | DocumentNotFoundException {
  return collection.get(docId)
  // or collection.getAnyReplica() or collection.getAllReplicas()  - as they wish
}

This saves them from a bunch of edge cases - like a new doc not having been replicated, mixed-mode clusters, and possibly some rebalance & failover scenarios. Working with replicas is inherently a bit flaky, and we should advise users how to protect themselves.

Plus they don't necessarily know if they are working with 7.6.2 or not.

So in that context, knowing the user is going to have a fallback strategy in place already - I do wonder if there's any benefit to having fallback modes like this?


## Selected Server Group

In this case, all reads will be done only from the local availability zone. It
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/local availability zone/selected|preferred server group, I think?

read replica APIs:

```
enum ReadPreference {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slight nit: SELECTED_SERVER_GROUP at least is a strict command rather than a preference. Should this be ReadRequirement?

I don't feel strongly on this though.

```
class TransactionAttemptContext {
// ...
public TransactionGetResult getReplica(Collection collection,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so maybe this has got lost in translation but I think this should be getReplicaFromPreferred[Server]Group() or similar.

(Or more accurately: I wish I had the time and capability to design a true getReplica() that handles all single-replica-read concerns. But I cannot figure out what that should look like. So until either inspiration strikes or I get more cycles to spend on it, the thought is to have a single-purpose method instead that is focussed on only this use-case. That doesn't "salt the earth" from doing a true getReplica() at a later date.)

property in cluster configuration.

```
class TransactionAttemptContext {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel a bit uneasy with transactions specs being outside of the hackmd docs where they are all grouped together.

Perhaps I can just copy this into the API doc once finalised? We need a better long-term solution though.


# Caveats

The use must be aware of the following topics when dealing with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: user

The behavior of this feature with older servers (pre 7.6.2) is not defined. It
is up to SDK to decide what to do (for example, it could return `102
DocumentIrretrievable` in this case, because the configuration does not tell us
anything about server groups assigned to the nodes)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel we should nail this down.

What you've proposed makes sense to me, in the context of what I propose elsewhere about the user always have try-catch handling.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants