More efficient support for linearizable reads #5636

heidihoward · 2023-09-08T10:17:25Z

I believe the only way to achieve linearizable reads at the moment is for the app logic to create a phantom write to the kv to ensure that the primary replicates and reaches consensus on the associated txn. Without this, the read might be stale, even if forwarding is set to always and the client waits until the read txn ID is committed. This is because the primary who served the read may no longer be the current primary.

This is related to but somewhat separate from the discussion around wait for commit.

Here's some early ideas about how to make linearizable reads more efficient:

Allow txns with empty write sets - this probably the simplest approach. The app logic would still need to create a transaction for the read but at least with no write set, the transaction will have few conflicts and be smaller. It's a cleaner solution but I'm not sure how much is gained by this approach
Return the last sequence number +1 to the client for linearizable reads. This means that the client will not consider the read committed unless the primary commits another transaction (in the same term). The key issue here is that if the previous txn is a signature and no other write txns arrive then the client could be waiting indefinitely. This could be worked around by forcing another signature or using (1)
Implement PreVote or some type of leader stickiness such that if we trust clocks then the primary can serve linearizable reads without a round trip to a majority
Implement wait for commit and then delay responses to linearizable reads until after the next successful AppendEntries. This is closest to existing systems like etcd

Aside: This could be an interesting to model in TLA+ using a separate high-level specification such as https://github.com/Azure/azure-cosmos-tla/blob/master/general-model/cosmos_client.tla

lemmy · 2023-09-08T16:39:11Z

Aside: This could be an interesting to model in TLA+ using a separate high-level specification such as https://github.com/Azure/azure-cosmos-tla/blob/master/general-model/cosmos_client.tla

The latest versions of the Cosmos specs are hosted at https://github.com/tlaplus/azure-cosmos-tla/ (including a completely new spec discussed in Understanding Inconsistency in Azure Cosmos DB with TLA+). Lorin Hochstein's Linearizability spec could also prove useful.

eddyashton · 2023-09-11T08:58:31Z

This is related to but somewhat separate from the discussion around wait for commit.

I think we'd need a phantom write and wait-for-commit behaviour to produce a linearizable read? If we only add a write to a read operation, then it could still be executed over stale state on an isolated (no longer current) primary. It would at least get a fresh transaction ID, that could later be recognised as invalid/uncommittable, but I don't see how that helps the initial response linearizability, since that read result may report stale state.

heidihoward · 2023-09-11T09:14:38Z

Agreed, the client must wait for commit for a linearizable read (or write for that matter). This comment was referring to the discussion around CCF withholding client responses until the txn has been committed.

achamayou · 2023-09-11T09:16:57Z

I don't think we need a phantom write to detect the next instance of a successful AppendEntries quorum. If we maintained in a counter next to the CheckQuorum response time for example, we could just check for that going up following commit?

heidihoward · 2023-09-11T11:51:06Z

Yes, a successful appendEntries to a majority, even if just a heartbeat, is sufficient for the leader to know that a read is linearizable

eddyashton · 2023-09-11T12:52:28Z

To use the AppendEntries roundtrip, do we need to include a new counter value in AEs (and responses)? I'm specifically worried about delayed arrival of ACKs. You can't currently distinguish a heartbeat-ACK emitted before a user's request arrived, from one emitted after. These are misleading but non-fatal for CheckQuorum, but if a node believes an ACK is fresh (since user request) when it is not, that could break linearizability?

heidihoward · 2023-10-02T15:40:20Z

To use the AppendEntries roundtrip, do we need to include a new counter value in AEs (and responses)? I'm specifically worried about delayed arrival of ACKs. You can't currently distinguish a heartbeat-ACK emitted before a user's request arrived, from one emitted after. These are misleading but non-fatal for CheckQuorum, but if a node believes an ACK is fresh (since user request) when it is not, that could break linearizability?

Yes you're right. The leader will need to ensure that the AE (heartbeat or otherwise) was requested after the read-only transaction was received

heidihoward · 2023-10-02T15:41:32Z

An example of the non-linearizability of read-only transactions is now provided in the TLA+ CI https://github.com/microsoft/CCF/actions/runs/6381743027/job/17318854604

heidihoward added the enhancement label Sep 8, 2023

heidihoward mentioned this issue Sep 29, 2023

TLA+ specification for client consistency #5699

Merged

lemmy mentioned this issue May 5, 2024

Document non-linearizability of read-only transactions in rare system conditions #6167

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More efficient support for linearizable reads #5636

More efficient support for linearizable reads #5636

heidihoward commented Sep 8, 2023

lemmy commented Sep 8, 2023

eddyashton commented Sep 11, 2023

heidihoward commented Sep 11, 2023

achamayou commented Sep 11, 2023

heidihoward commented Sep 11, 2023

eddyashton commented Sep 11, 2023

heidihoward commented Oct 2, 2023

heidihoward commented Oct 2, 2023

More efficient support for linearizable reads #5636

More efficient support for linearizable reads #5636

Comments

heidihoward commented Sep 8, 2023

lemmy commented Sep 8, 2023

eddyashton commented Sep 11, 2023

heidihoward commented Sep 11, 2023

achamayou commented Sep 11, 2023

heidihoward commented Sep 11, 2023

eddyashton commented Sep 11, 2023

heidihoward commented Oct 2, 2023

heidihoward commented Oct 2, 2023