Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: deadlock between DDL in high-prio txns and descriptor leases #46414

Open
knz opened this issue Mar 23, 2020 · 8 comments
Open

sql: deadlock between DDL in high-prio txns and descriptor leases #46414

knz opened this issue Mar 23, 2020 · 8 comments
Labels
A-schema-changes A-sql-executor SQL txn logic C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) docs-done docs-known-limitation T-kv KV Team X-anchored-telemetry The issue number is anchored by telemetry references.
Projects

Comments

@knz
Copy link
Contributor

knz commented Mar 23, 2020

summary: ROLLBACK TO SAVEPOINT can cause a deadlock in a high-priority txn that contains DDL, so CockroachDB is set to disallow it altogether for the time being.

Ideally, a better solution would enable lease acquisition to operate at an ever higher priority.

Details

Prior to #46170, we had a deadlock when rolling back a txn for a restart / for a savepoint if the txn contained DDL, because subsequent lease acquisitions would wait on the main SQL txn which encloses it.

(This is because lease acquisition uses a separate kv txn.)

With #46170 committed, the lease acquisition occurs at a higher priority, which guarnatees deadlock avoidance in the common case where DDL is ran at normal priority.

However the deadlock still exists when the DDL is executed at high priority too.

Solution

We need two "high priority" levels, where lease acquisition is at a level higher than anything a SQL txn can be set to.

Jira issue: CRDB-5089

@knz knz added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) docs-known-limitation A-sql-executor SQL txn logic A-schema-changes labels Mar 23, 2020
@knz knz added this to Triage in SQL Foundations via automation Mar 23, 2020
knz added a commit to knz/cockroach that referenced this issue Mar 23, 2020
See issue cockroachdb#46414

Because lease acquisition must operate at a high prio
than schema changes, and SQL "HIGH PRIORITY" operates at the same
prio as lease acquisition, we can't let ROLLBACK TO SAVEPOINT
after schema change in HIGH PRIORITY txns.

Release justification: fixes for high-priority or high-severity bugs in existing functionality

Release note (sql change): ROLLBACK TO SAVEPOINT (for either regular
savepoint or "restart savepoints" defined with `cockroach_restart`)
causes a "feature not supported" error after a DDL statement in a HIGH
PRIORITY transaction, in order to avoid a transaction deadlock. See
issue cockroachdb#46414 for details.
knz added a commit to knz/cockroach that referenced this issue Mar 23, 2020
See issue cockroachdb#46414

Because lease acquisition must operate at a high prio
than schema changes, and SQL "HIGH PRIORITY" operates at the same
prio as lease acquisition, we can't let ROLLBACK TO SAVEPOINT
after schema change in HIGH PRIORITY txns.

Release justification: fixes for high-priority or high-severity bugs in existing functionality

Release note (sql change): ROLLBACK TO SAVEPOINT (for either regular
savepoint or "restart savepoints" defined with `cockroach_restart`)
causes a "feature not supported" error after a DDL statement in a HIGH
PRIORITY transaction, in order to avoid a transaction deadlock. See
issue cockroachdb#46414 for details.
@knz knz added the X-anchored-telemetry The issue number is anchored by telemetry references. label Mar 30, 2020
@jseldess
Copy link
Contributor

@knz or @andreimatei, can either of you please provide a realistic example of when this deadlock would occur and a recommended workaround?

@jseldess
Copy link
Contributor

Actually, I think I can use the example from one of the linked PRs. Does this look ok?

ROLLBACK TO SAVEPOINT in high-priority transactions containing DDL

Transactions with priority HIGH that contain DDL and ROLLBACK TO SAVEPOINT are not supported, as they could result in a deadlock. For example:

> BEGIN PRIORITY HIGH; SAVEPOINT s; CREATE TABLE t(x INT); ROLLBACK TO SAVEPOINT s;
ERROR: unimplemented: cannot use ROLLBACK TO SAVEPOINT in a HIGH PRIORITY transaction containing DDL
SQLSTATE: 0A000
HINT: You have attempted to use a feature that is not yet implemented.
See: https://github.com/cockroachdb/cockroach/issues/46414

Tracking Github Issue

@knz
Copy link
Contributor Author

knz commented Apr 13, 2020

@jseldess I changed the behavior in PR #46415 - the deadlock is now avoided and reported as "unsupported feature". The relevant release note is this:

Release note (sql change): ROLLBACK TO SAVEPOINT (for either regular
savepoint or "restart savepoints" defined with cockroach_restart)
causes a "feature not supported" error after a DDL statement in a HIGH
PRIORITY transaction, in order to avoid a transaction deadlock. See
issue #46414 for details.

@knz
Copy link
Contributor Author

knz commented Apr 13, 2020

Your text in #46414 (comment) is thus correct and adequate.

@jseldess
Copy link
Contributor

Thanks, @knz.

@thoszhang thoszhang added this to Incoming in KV via automation May 5, 2020
@thoszhang thoszhang removed this from Needs Triage (4/28/2020) in SQL Foundations May 5, 2020
@thoszhang
Copy link
Contributor

@knz as I understand it, this mostly requires adding a new priority level for transactions for lease acquisition to use, so I'm moving it from SQL Schema to KV; let me know if you disagree.

@knz
Copy link
Contributor Author

knz commented May 5, 2020

mostly requires adding a new priority level for transactions for lease acquisition to use

IT's both adding the new priority and teaching SQL how to use it. It's cross-team if you will. The question is "who should drive this". The current behavior is only visible to SQL users from the fact that certain operations will now be impossible (rejected) when issued within SQL transactions. Who's going to see that pain firsthand? I suppose it's going to be the SQL PM, not the KV PM.

Maybe you can assign this to the PM nearest to you to see what they want to do with it.

@andreimatei
Copy link
Contributor

For the sake of the recording, one thing we've floated around is for KV to recognize the notion of dependent transactions, and incorporate that knowledge in the pushing protocol: If txn A depends on B and now B is trying to wait on A, don't actually wait. This seemed like a useful concept in general because we have other examples in the system of dependent transactions, and we've hit similar deadlocking issues with them too (e.g. resolving ranges, or any sort of "semi-sync" work).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-schema-changes A-sql-executor SQL txn logic C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) docs-done docs-known-limitation T-kv KV Team X-anchored-telemetry The issue number is anchored by telemetry references.
Projects
KV
On Hold
Development

No branches or pull requests

5 participants