
Leadership yielding is not synchronized with replicated log #463

Open
myrrc opened this issue Aug 24, 2023 · 1 comment
Comments

myrrc (Contributor) commented Aug 24, 2023

Situation:

  • We have a cluster {1, 2, 3}; 1 is the leader.
  • We process a command "add 4, remove 1". Imagine these are two calls in C++ code.
  • "add 4" is accepted but not yet committed.
  • We want to remove 1, but 1 is the leader. Our options are either yield_leadership (if the request landed on 1) or request_leadership (if we're on 2 or 3).
  • Suppose we call request_leadership. The request reaches 1, and 1 pauses writes.
  • "add 4" is never committed and is therefore lost.

Quite a synthetic example, but this is what we encountered in practice.

So, one fix option is to wait for the new config to get committed and execute new commands only after that, but I wonder whether there's a way to solve this at the library level.

I tried changing the "pause write from now" behavior so that, with an option toggled, the leader would commit all appended entries before pausing writes, but no luck -- it seems there are way too many invariants that get broken.

greensky00 (Contributor) commented Aug 27, 2023

Neither yield_leadership nor request_leadership can enforce adding/removing a member. This is because there is no guarantee that a membership change will eventually succeed and be committed. For example, 1 is the leader and receives the add-server request, but fails to replicate the message due to a network partition.

Also, membership changes should be done one at a time. The next membership change should start only after making sure that the previous change is committed. There is a known problem that multiple membership changes at once may result in an incorrect quorum and data inconsistency. The original Raft paper tried to resolve it with "joint consensus", but NuRaft does not implement that and instead enforces one member change at a time.

There was a similar thread: #177
