Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seems that readIndex() may return inconsistent value (e.g. 0) after node restart, is this a known issue? #1049

Open
sanpwc opened this issue Dec 11, 2023 · 6 comments

Comments

@sanpwc
Copy link

sanpwc commented Dec 11, 2023

Within node restart lastCommittedIndex initialized as 0

public class BallotBox implements Lifecycle<BallotBoxOptions>, Describer {
    ...
    private long lastCommittedIndex = 0;
    ...
}

Taking into the consideration that local log application triggered by NodeImpl#init is asynchronous, it's possible to handle readIndex request before local log re-play is finished and thus see inconsistent readIndex value. In other words, seems that there's a race between readIndex evaluation and lastCommittedIndex restoration on node restart.

@killme2008
Copy link
Contributor

I think it's impossible, the handleReadIndexRequest will check the state of the node before reading the lastCommittedIndex, and if its state is not the leader or the current leader doesn't commit an entry before, the readIndex will fail.

if (this.logManager.getTerm(lastCommittedIndex) != this.currTerm) {

And when a node becomes a leader, it will update the lastCommittedIndex before releasing the write lock:

this.confCtx.flush(this.conf.getConf(), this.conf.getOldConf());

this.node.unsafeApplyConfiguration(conf, oldConf == null || oldConf.isEmpty() ? null : oldConf, true);

@sanpwc
Copy link
Author

sanpwc commented Dec 11, 2023

Hmm, seems that if (this.logManager.getTerm(lastCommittedIndex) != this.currTerm) will be skipped in case of quorum <= 1 and this is the case when the issue is reproduced.

@killme2008
Copy link
Contributor

Hmm, seems that if (this.logManager.getTerm(lastCommittedIndex) != this.currTerm) will be skipped in case of quorum <= 1 and this is the case when the issue is reproduced.

If so, that would be possible, good catch! But I really doubt if is this a common case(just one node) for the production usage of jraft.

We can fix it, but maybe it is not urgent.

@sanpwc
Copy link
Author

sanpwc commented Dec 11, 2023

We can fix it

Well, that'll be nice))

BTW, seems that it's not the difficult to fix it. In case of single noded cluster (and only single noded cluster). We may consider pendingIndex on restart as committed one, just because in case of single node no-one will discard log entries.
So basically, it should be possible to do following to solve the issue. In case of single noded clusters only.

    public boolean resetPendingIndex(final long newPendingIndex) {
            ...
            this.lastCommittedIndex = newPendingIndex - 1;
            ...
        }

What do you think?

@killme2008
Copy link
Contributor

We can fix it

Well, that'll be nice))

BTW, seems that it's not the difficult to fix it. In case of single noded cluster (and only single noded cluster). We may consider pendingIndex on restart as committed one, just because in case of single node no-one will discard log entries. So basically, it should be possible to do following to solve the issue. In case of single noded clusters only.

    public boolean resetPendingIndex(final long newPendingIndex) {
            ...
            this.lastCommittedIndex = newPendingIndex - 1;
            ...
        }

What do you think?

It seems like the fix will work as expected. Would you be able to create a pull request for this change?

@sanpwc
Copy link
Author

sanpwc commented Dec 12, 2023

Sure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants