leader steps down when followers' disks are slow #202
See the bulleted list in "6.2 Routing requests to the leader" in my dissertation for an explanation of why it's there. In short, an isolated leader shouldn't hold up client requests forever.
Ok, makes sense. Will close this ticket.
One thing you could do to mitigate this issue is increase the timeout for when the leader steps down. It's currently set to ELECTION_TIMEOUT (not configurable). I wouldn't go much above 2 * ELECTION_TIMEOUT or clients would be delayed a long while, but maybe that'd help the issue? It's an easy change to try out, and if it turns out to be helpful, it wouldn't be difficult to make that configurable. I'm gonna rename this GitHub issue based on the symptom now.
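For illustration, the knob being discussed could look like the sketch below. This is an assumption, not LogCabin's actual API: the function name and the configurable multiplier are hypothetical, since today the step-down timeout is hard-coded to ELECTION_TIMEOUT (i.e., a multiplier of 1).

```cpp
#include <chrono>

// Hypothetical configuration knob: express the step-down timeout as a
// multiple of the election timeout. LogCabin currently hard-codes this
// to 1x; the suggestion above is to try up to roughly 2x.
std::chrono::milliseconds
stepDownTimeout(std::chrono::milliseconds electionTimeout,
                unsigned stepDownMultiplier)
{
    return electionTimeout * stepDownMultiplier;
}
```

With a 500 ms election timeout and a multiplier of 2, the leader would wait 1000 ms for follower responses before stepping down.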
Extending the step down timer is easy to try, and it makes sense that it should help. I'll do that now. Not sure if this is relevant to the general case, but under common failure scenarios in my particular setup, any client that could connect to a leader segmented from the network would itself be segmented from the network, so its progress would be inhibited regardless of whether the leader stepped down.
Ah, great point. I think if you were 100% confident in that statement, you could disable the step down thread entirely (or set a timeout of infinity). Though maybe a large timeout would be a wiser choice in case there's some unexpected wedging anywhere.
for reference, i have this set to 12x election timeout right now. seems to be helping with slow disks and failover time is within my acceptable range.
Hmm, I hadn't considered this when I wrote my earlier comment: what if machine1 can't talk to a majority of the cluster but can talk to machine2, and machine2 can talk to all of the others. And let's say machine1:server is a deposed leader and machine2:server is the current leader. In this case, machine1:client would get service if it talked to machine2:server, but without machine1:server stepping down, it could get stuck waiting on machine1:server. This is a bit contrived, and it seems fairly unlikely with a single switch between all the machines. So you might be ok with it, especially given the moderate outage (12x election timeout). Still, I thought I'd bring it up.
I think we can set a timeout in Peer::callRPC as RPC_FAILURE_BACKOFF.
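A minimal sketch of that suggestion: give each RPC a deadline so a slow or unreachable peer can't block the caller indefinitely. The constant name RPC_FAILURE_BACKOFF is taken from the comment above; everything else here (callWithTimeout, the std::async plumbing) is illustrative and is not LogCabin's actual Peer::callRPC implementation.

```cpp
#include <chrono>
#include <future>
#include <thread>

// Illustrative value; the real backoff would come from configuration.
const std::chrono::milliseconds RPC_FAILURE_BACKOFF(100);

enum class RPCStatus { OK, TIMEOUT };

// Run an RPC and give up waiting after 'timeout'. Note: because this
// sketch uses std::async, the future's destructor still joins the
// background call; a real implementation would cancel the in-flight RPC.
template <typename Fn>
RPCStatus callWithTimeout(Fn rpc, std::chrono::milliseconds timeout)
{
    std::future<void> f = std::async(std::launch::async, rpc);
    if (f.wait_for(timeout) == std::future_status::timeout)
        return RPCStatus::TIMEOUT;  // caller can back off and retry
    return RPCStatus::OK;
}
```

The point of the design is that a timed-out RPC is treated as a failure and retried after the backoff, rather than tying up the caller.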
this is related to #200. in my saturated disk testing, i was able to keep a leader around longer by setting the election timeout after writing to disk instead of before. that got around the issue of spurious leader elections being proposed by followers.
it also opened the door to a second situation: there is a stable leader, but sometimes no follower can respond to appendEntries quickly enough to keep the leader from stepping down in stepDownThreadMain. the assertion of this ticket is that a LogCabin leader should step down only upon discovering a new leader, not on a timeout.
i'm not sure the assertion of this ticket is correct, but i thought i'd file it to find out.
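To make the timeout-based behavior under debate concrete, here is a hedged sketch of the kind of quorum check a step-down thread performs: the leader tracks when it last heard from each follower, and abdicates if it cannot assemble a majority of responses within the step-down window. This mirrors the idea behind LogCabin's stepDownThreadMain but is not its actual code; the function and parameter names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the leader should step down: fewer than a majority of
// the cluster (the leader plus its followers) have acknowledged the
// leader since time 'epoch' (e.g., one step-down timeout ago).
bool shouldStepDown(const std::vector<uint64_t>& followerLastAckTimes,
                    uint64_t epoch)
{
    std::size_t acks = 1;  // the leader always counts itself
    for (uint64_t t : followerLastAckTimes)
        if (t >= epoch)
            ++acks;
    std::size_t clusterSize = followerLastAckTimes.size() + 1;
    return acks < clusterSize / 2 + 1;  // no quorum heard from recently
}
```

With slow disks, followers acknowledge late, so their last-ack times fall behind the epoch and the check trips even though the leader is otherwise healthy, which is exactly the symptom this ticket describes.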