Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Locking for ResignLeadership #20837

Closed
wants to merge 6 commits into from

Conversation

maierlars
Copy link
Collaborator

Scope & Purpose

While streaming transactions and aql queries carefully lock multiple shards in a deterministic order, the maintenance did not care about such a order and just locked shards when resigning in any order. In fact the maintenance has no idea about distributeShardsLike and handles each shard of the shard group individually. A move shard requests locking of all shards in a shard group. This can lead to deadlocks with aql queries, since the locks are acquired in different orders.

@maierlars maierlars self-assigned this Apr 11, 2024
@cla-bot cla-bot bot added the cla-signed label Apr 11, 2024
// check is we have collected information about this prototype already
auto it = prototypeShards.find(prototype);
if (it == prototypeShards.end()) {
auto protoCol = collections.get(prototype);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I had a hard time read protoCol because I was intuitively thinking of protocol (as in HTTP, HTTPS, etc.). I get that it means protoCollection here, but naming it protoCol is slightly ambiguous, so I suggest using protoCollection instead.

auto col = vocbase->lookupCollection(s);
if (col == nullptr) {
std::stringstream error;
error << "Failed to lookup local collection " << collection
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we should include the shard name in the error message here as well, to have more detail should this error happen, e.g.
"Failed to lookup shard s of local collection collection"

@maierlars maierlars closed this May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants