
Some documentation of catch up process #852

Open
alievmirza opened this issue Jun 21, 2022 · 1 comment

Comments

alievmirza (Contributor) commented Jun 21, 2022

I've investigated the catch-up process in JRaft and wrote some documentation about it, which I've decided to share with the community. I hope it will be helpful.

Note that this is accurate for version 1.3.5; I haven't checked whether the algorithm has changed since then.

Catching up process

Entry point:

We have a closure, NodeImpl.OnCaughtUp, which is responsible for the catch-up process of every stale node/replicator on a leader.
This closure is created every time we call NodeImpl.ConfigurationCtx#addNewPeers, which happens on a Raft configuration change, for example, when we call NodeImpl#changePeers. In the NodeImpl.ConfigurationCtx#addNewPeers method we associate an OnCaughtUp closure with the corresponding replicator for a stale node. This is done inside ReplicatorGroupImpl#waitCaughtUp by calling Replicator#waitForCaughtUp. To be more precise, we save the closure in the Replicator#catchUpClosure field and also schedule a timer on the replicator to call Replicator#onCatchUpTimedOut (by default it fires after the election timeout).
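
Below is a minimal, self-contained Java sketch of this registration flow, assuming simplified stand-in types. The method and field names mirror the JRaft identifiers above, but the signatures and bodies are illustrative, not the actual implementation:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Simplified stand-in for the catch-up closure (NodeImpl.OnCaughtUp).
    interface CatchUpClosure {
        void run(boolean success, String error);
    }

    class ReplicatorSketch {
        private final ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();

        // Corresponds to the Replicator#catchUpClosure field.
        private volatile CatchUpClosure catchUpClosure;

        // Mirrors Replicator#waitForCaughtUp: save the closure and schedule a
        // timer (the election timeout by default) for onCatchUpTimedOut.
        void waitForCaughtUp(CatchUpClosure done, long timeoutMs) {
            this.catchUpClosure = done;
            timer.schedule(this::onCatchUpTimedOut, timeoutMs, TimeUnit.MILLISECONDS);
        }

        // Mirrors Replicator#onCatchUpTimedOut: the follower did not catch up
        // in time, so the saved closure is notified with a timeout error.
        private void onCatchUpTimedOut() {
            notifyOnCaughtUp(false, "ETIMEDOUT");
        }

        // Mirrors Replicator#notifyOnCaughtUp: fire the saved closure at most once.
        void notifyOnCaughtUp(boolean success, String error) {
            CatchUpClosure done = this.catchUpClosure;
            if (done == null) {
                return; // nobody is waiting for this replicator to catch up
            }
            this.catchUpClosure = null;
            done.run(success, error);
        }
    }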

Closure invocation

So, we have saved the closure; now let's discuss what happens when it is run. That happens in Replicator#notifyOnCaughtUp: we call this method with the success or failure status of the catch-up process and propagate it to the closure via Status.setError(int, java.lang.String, java.lang.Object...). When the closure runs, NodeImpl#onCaughtUp is called; this method checks the status of the process, and there are several possible outcomes (a sketch of this logic follows the list):

  • The status is OK: NodeImpl.ConfigurationCtx#onCaughtUp is called with the success flag equal to true, so we can move
    to NodeImpl.ConfigurationCtx.nextStage of the configuration change.
  • The status is an error and, more specifically, a timeout: we retry the catch-up process by creating a new NodeImpl.OnCaughtUp closure
    and calling the same Replicator#waitForCaughtUp described before.
  • The retry failed, or the status is a non-timeout error: we call NodeImpl.ConfigurationCtx#onCaughtUp with the success flag
    equal to false, so the whole configuration change is reset with RaftError.ECATCHUP.
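
A minimal sketch of this decision logic, assuming simplified stand-in types; the real NodeImpl#onCaughtUp carries more arguments and state:

    // Code, Status and ConfigurationCtx are simplified stand-ins for the
    // corresponding org.apache.ignite.raft.jraft types.
    class OnCaughtUpSketch {
        enum Code { OK, ETIMEDOUT, EPERM, ESTOP }

        record Status(Code code) {
            boolean isOk() { return code == Code.OK; }
        }

        interface ConfigurationCtx {
            // success == false resets the whole configuration change (ECATCHUP).
            void onCaughtUp(String peerId, boolean success);
        }

        // Mirrors the decision logic of NodeImpl#onCaughtUp.
        static void onCaughtUp(ConfigurationCtx ctx, String peerId, boolean canRetry,
                               Status st, Runnable retryWaitForCaughtUp) {
            if (st.isOk()) {
                ctx.onCaughtUp(peerId, true);  // outcome 1: move to the next stage
            } else if (st.code() == Code.ETIMEDOUT && canRetry) {
                retryWaitForCaughtUp.run();    // outcome 2: new OnCaughtUp closure,
                                               // same Replicator#waitForCaughtUp
            } else {
                ctx.onCaughtUp(peerId, false); // outcome 3: reset with ECATCHUP
            }
        }
    }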

Where the closure is invoked

Now let's discuss when this closure is run. As we said before, it happens in Replicator#notifyOnCaughtUp, so let's track who calls Replicator#notifyOnCaughtUp.

Calls with a successful status (a sketch of this path follows the list):

  • Replicator#onInstallSnapshotReturned -- called after a snapshot has been installed successfully
  • Replicator#onAppendEntriesReturned -- called after new entries from the leader have been appended successfully
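
As an illustration of the success path, here is a hedged sketch that assumes the leader simply checks whether the follower's last log index is within some catch-up margin; the margin field and the exact condition are assumptions, not the actual JRaft logic. It reuses ReplicatorSketch from the first sketch:

    // Hypothetical success-path check: after a successful response, decide
    // whether the follower has caught up with the leader's log.
    class CatchUpCheckSketch {
        long leaderLastLogIndex;   // leader's last log index
        long catchUpMargin;        // assumed margin; the name is illustrative
        ReplicatorSketch replicator = new ReplicatorSketch();

        void onAppendEntriesReturned(long followerLastLogIndex) {
            // The follower is close enough to the leader's log: report success.
            if (leaderLastLogIndex - followerLastLogIndex <= catchUpMargin) {
                replicator.notifyOnCaughtUp(true, null);
            }
        }
    }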

Calls with error statuses:

  • Replicator#onCatchUpTimedOut with RaftError.ETIMEDOUT -- called when the timer fires. As described before, this timer is started
    when we call Replicator#waitForCaughtUp.
  • Replicator#onHeartbeatReturned, Replicator#onAppendEntriesReturned, or Replicator#onTimeoutNowReturned with RaftError.EPERM --
    called when a follower returns a term higher than the leader's current term. This is the general check on RPC responses where we
    compare terms and decide whether the leader should step down (a sketch of this check follows the list).
  • Replicator#onTimeoutNowReturned with RaftError.ESTOP -- called when the stopAfterFinish flag passed to the method is true. This happens
    when a leader steps down and tries to wake up a potential candidate as an optimisation
    (see "Waking up optimisation" below), in which case
    Replicator#sendTimeoutNowAndStop(this.wakingCandidate, this.options.getElectionTimeoutMs()) is called on the leader. For more details see
    NodeImpl#stepDown.
  • Replicator#onError with RaftError.ESTOP -- the general case of a replicator being stopped. For example, this can happen when
    a leader steps down or when a node is shut down. Let's consider all the places where Replicator#onError with RaftError.ESTOP can happen; to
    do that we need to trace Replicator#stop:
    • NodeImpl#shutdown(org.apache.ignite.raft.jraft.Closure) -- the node shutdown case.
    • ReplicatorGroupImpl#stopAll -- happens when a leader steps down, which includes stopping all replicators. See NodeImpl#stepDown.
    • ReplicatorGroupImpl#stopReplicator -- happens when we call ConfigurationCtx#reset(org.apache.ignite.raft.jraft.Status) after a
      configuration change has finished, successfully or not, and replicators for the new set of peers have to be started or stopped.
    • ReplicatorGroupImpl#stopAllAndFindTheNextCandidate -- called when a leader steps down and we apply the
      waking up optimisation.
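
To make the EPERM case concrete, here is a hedged sketch of the general term check on RPC responses, again reusing ReplicatorSketch from the first sketch; the real response handlers do considerably more:

    // Sketch of the term check run on RPC responses (heartbeat, AppendEntries,
    // TimeoutNow): a higher term in the response means this leader is stale.
    class TermCheckSketch {
        long currentTerm;          // the leader's current term
        ReplicatorSketch replicator = new ReplicatorSketch();

        void onRpcReturned(long responseTerm) {
            if (responseTerm > currentTerm) {
                // Catch-up can no longer succeed here: notify the saved closure
                // with an error (EPERM in JRaft), then step down.
                replicator.notifyOnCaughtUp(false, "EPERM");
                stepDown(responseTerm);
            }
        }

        void stepDown(long newTerm) {
            currentTerm = newTerm; // greatly simplified; see NodeImpl#stepDown
        }
    }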

Waking up optimisation

When a leader runs into a problem and steps down, it applies an optimisation to start a new election with a new candidate immediately. In that case, instead of stopping all replicators as usual, it keeps one replicator aside and sends a TimeoutNowRequest to it. When that node receives the request, it elects itself and starts voting. The failing leader chooses this node by searching for the peer with the largest log id among the peers in the current configuration. For more details see ReplicatorGroupImpl#stopAllAndFindTheNextCandidate.
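
A minimal, self-contained sketch of the selection logic described above, with simplified stand-in types:

    import java.util.List;

    // Stop every replicator except the one whose follower has the largest log
    // index; that one becomes the waking candidate and later receives a
    // TimeoutNowRequest (Replicator#sendTimeoutNowAndStop). Types simplified.
    class WakeUpSketch {
        record Replica(String peerId, long lastLogIndex) {}

        static Replica stopAllAndFindTheNextCandidate(List<Replica> replicas) {
            Replica candidate = null;
            for (Replica r : replicas) {
                if (candidate == null || r.lastLogIndex() > candidate.lastLogIndex()) {
                    candidate = r;
                }
            }
            for (Replica r : replicas) {
                if (r != candidate) {
                    stop(r); // normal stop path, which ends in onError(ESTOP)
                }
            }
            return candidate; // null only if there are no replicas at all
        }

        static void stop(Replica r) { /* Replicator#stop */ }
    }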

killme2008 (Contributor) commented

Very good, thank you for sharing.
