concurrency: allow {exclusive,shared} repl lock acquisition/discovery #110145

arulajmani · 2023-09-06T22:18:33Z

Now that the lock table is able to track replicated locks with lock
strengths other than lock.Intent, this patch enables lock acquisition,
re-acquisition, and discovery of replicated {exclusive, shared} locks.

Most of this patch is about adding testing. One interesting behaviour
change worth calling out is that replicated exclusive locks do not block
non-locking readers, regardless of the reader's/lock holder's timestamp.
This is in contrast to unreplicated exclusive locks, which do.

This patch leaves things in a slightly transitional stage -- for now, we
forward a replicated lock's timestamp blindly on acquisition. In an
upcoming patch, we'll limit this to lock acquisition/discovery of just
intents. This is because the timestamp tracking is meaningless for other
lock strengths. Worse yet, the current behaviour opens us up to the
possibility of infinite lock discovery loops, where a request keeps
re-discovering a write intent but never blocking on it because of this
timestamp forwarding behaviour.

Informs #109673

Release note: None

This patch pulls timestamp assignment for replicated locks behind the call to `.acquire`. In a future patch, we'll only assign/update this field for intents. Doing this now means there'll be a single point in the code that'll need updating then. Epic: none Release note: None

cockroach-teamcity · 2023-09-06T22:18:42Z

This change is

nvanbenschoten

Reviewed 1 of 1 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani)

pkg/kv/kvserver/concurrency/lock_table.go line 1313 at r2 (raw file):

// blanket timestamp from this function is a pretty sharp edge. We should split
// this function out into two -- one that returns the holder's transaction, and
// another that fits the timestamp needs at the callers.

+1, I'm looking forward to seeing how this change looks. I suspect it will lead to some clarity, where the timestamp is only ever consulted on the non-locking read paths.

pkg/kv/kvserver/concurrency/lock_table.go line 1417 at r2 (raw file):

			// 2. We could only check the timestamp if the lock is being acquired with
			// strength lock.Intent, as we disregard the timestamp for all other lock
			// lock strengths when dealing with replicated locks.

"lock lock"

pkg/kv/kvserver/concurrency/lock_table.go line 2895 at r2 (raw file):

		// check that the lock acquisition is compatible.
		m := makeLockMode(acq.Strength, &acq.Txn, acq.Txn.WriteTimestamp)
		for e := kl.holders.Front(); e != nil; e = e.Next() {

With shared locks, there are now a number of places where we'll fall into quadratic behavior if we're not careful. This kind of verification is one of them. The verification is important to catch bugs, but do we want to run in production, or only in tests? How does that fit in with the more general test-only validation framework we were planning to add?

I think in general we do want to avoid quadratic behavior around shared lock acquisition and release, though we can probably start with a goal to avoid it at the storage layer (where costs are amplified) and then make improvements to the in-memory data structures after.

pkg/kv/kvserver/concurrency/lock_table.go line 2901 at r2 (raw file):

			if holderTxn.ID != acq.Txn.ID &&
				// ... which conflicts with the lock being acquired.
				lock.Conflicts(holder.getLockMode(), m, &st.SV) {

nit: pull out a holderMode variable to avoid calling holder.getLockMode() twice.

pkg/kv/kvserver/concurrency/lock_table.go line 2957 at r2 (raw file):

		// If lock(s) are already held on this key by other transactions, sanity
		// check that the discovered lock is compatible with them.
		m := makeLockMode(foundLock.Strength, &foundLock.Txn, foundLock.Txn.WriteTimestamp)

This is almost identical to the logic above. Should we extract it?

pkg/kv/kvserver/concurrency/lock_table_test.go line 471 at r2 (raw file):

				leaseSeq := roachpb.LeaseSequence(seq)
				str := lock.Intent // default replicated locks to write intents
				if d.HasArg("strength") {

Unrelated to this PR, but I noticed in concurrency_manager_test.go, we have the logic:

if d.HasArg("str") {  
    str = concurrency.ScanLockStrength(t, d)  
}

Is that broken? We're looking for "str" and then "strength". It looks like we never try to use it in tests.

Now that the lock table is able to track replicated locks with lock strengths other than lock.Intent, this patch enables lock acquisition, re-acquisition, and discovery of replicated {exclusive, shared} locks. Most of this patch is about adding testing. One interesting behaviour change worth calling out is that replicated exclusive locks do not block non-locking readers, regardless of the reader's/lock holder's timestamp. This is in contrast to unreplicated exclusive locks, which do. This patch leaves things in a slightly transitional stage -- for now, we forward a replicated lock's timestamp blindly on acquisition. In an upcoming patch, we'll limit this to lock acquisition/discovery of just intents. This is because the timestamp tracking is meaningless for other lock strengths. Worse yet, the current behaviour opens us up to the possibility of infinite lock discovery loops, where a request keeps re-discovering a write intent but never blocking on it because of this timestamp forwarding behaviour. Informs cockroachdb#109673 Release note: None

arulajmani

TFTR; addressed all comments -- should be good for another look!

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)

pkg/kv/kvserver/concurrency/lock_table.go line 1313 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

+1, I'm looking forward to seeing how this change looks. I suspect it will lead to some clarity, where the timestamp is only ever consulted on the non-locking read paths.

The non-locking read path is actually covered by the logic in getLockMode(), which reaches into the {,un}replicatedLockHolderInfo to do something smart.

The only place we use this, right now, is to check if the lock's timestamp has advanced during lock re-acquisition/lock update. We unblock non-locking readers that are lower than the (now new) lock timestamp in such cases.

pkg/kv/kvserver/concurrency/lock_table.go line 2895 at r2 (raw file):
Thanks for the nudge, I forgot our earlier conversation to pull these behind a testing build check (at the least).

How does that fit in with the more general test-only validation framework we were planning to add?

I think this will be a strict subset of the more general test-only validation described in #108843.

I've pulled this logic out into its own function and made it such that its only triggered from testing builds. I also changed the error's verbiage such that we can cargo cult this into the more general purpose verification framework once that's a thing. That way, none of the test cases we're adding here will have to change!

pkg/kv/kvserver/concurrency/lock_table.go line 2957 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This is almost identical to the logic above. Should we extract it?

Done!

pkg/kv/kvserver/concurrency/lock_table_test.go line 471 at r2 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Unrelated to this PR, but I noticed in concurrency_manager_test.go, we have the logic:
if d.HasArg("str") {  
    str = concurrency.ScanLockStrength(t, d)  
}
Is that broken? We're looking for "str" and then "strength". It looks like we never try to use it in tests.

Yeah, that looks busted. I'd filed #109634 to add concurrency manager tests for shared locks (currently, we're only testing latches at that level). That would have caught this.

nvanbenschoten

Reviewed 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)

arulajmani · 2023-09-07T01:14:23Z

Thanks for the quick reviews!

bors r=nvanbenschoten

craig · 2023-09-07T01:53:36Z

Build succeeded:

Bazel Essential CI (Cockroach)

arulajmani requested a review from nvanbenschoten September 6, 2023 22:18

arulajmani requested a review from a team as a code owner September 6, 2023 22:18

arulajmani force-pushed the rl-support-lock-strengths-v2 branch from 9bcc0ef to b239637 Compare September 6, 2023 22:18

nvanbenschoten reviewed Sep 6, 2023

View reviewed changes

arulajmani force-pushed the rl-support-lock-strengths-v2 branch from b239637 to 6cb9a3f Compare September 6, 2023 23:48

arulajmani commented Sep 6, 2023

View reviewed changes

arulajmani requested a review from nvanbenschoten September 6, 2023 23:49

nvanbenschoten approved these changes Sep 7, 2023

View reviewed changes

craig bot merged commit 4e79c65 into cockroachdb:master Sep 7, 2023
8 checks passed

miraradeva mentioned this pull request Oct 16, 2023

kv: a replayed write can cause an infinite lock discovery loop #112409

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

concurrency: allow {exclusive,shared} repl lock acquisition/discovery #110145

concurrency: allow {exclusive,shared} repl lock acquisition/discovery #110145

arulajmani commented Sep 6, 2023 •

edited

cockroach-teamcity commented Sep 6, 2023

nvanbenschoten left a comment

arulajmani left a comment

nvanbenschoten left a comment

arulajmani commented Sep 7, 2023

craig bot commented Sep 7, 2023

concurrency: allow {exclusive,shared} repl lock acquisition/discovery #110145

concurrency: allow {exclusive,shared} repl lock acquisition/discovery #110145

Conversation

arulajmani commented Sep 6, 2023 • edited

cockroach-teamcity commented Sep 6, 2023

nvanbenschoten left a comment

Choose a reason for hiding this comment

arulajmani left a comment

Choose a reason for hiding this comment

nvanbenschoten left a comment

Choose a reason for hiding this comment

arulajmani commented Sep 7, 2023

craig bot commented Sep 7, 2023

arulajmani commented Sep 6, 2023 •

edited