Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialization with term=0 shouldn't be allowed #511

Open
GuyLewin opened this issue Jun 28, 2023 · 10 comments · May be fixed by #513
Open

Initialization with term=0 shouldn't be allowed #511

GuyLewin opened this issue Jun 28, 2023 · 10 comments · May be fixed by #513

Comments

@GuyLewin
Copy link

Describe the bug
Nodes often crashing (panic) during first few moments of Raft, when using distinct priority values per node together with pre_vote = true.
The least-prioritized node never crashes.

When I looked into it, I found that it's because during the pre-voting, we reject pre-voting requests from lower-priority nodes:
https://github.com/tikv/raft-rs/blob/82d704cdc3d93258be1f45efd715b95930764d7f/src/raft.rs#L1467C65-L1467C98

We then try to send the rejection back to the proposing node, including our term as the message's term:
https://github.com/tikv/raft-rs/blob/82d704cdc3d93258be1f45efd715b95930764d7f/src/raft.rs#L1494C21-L1498C58

But during that send, term is checked to be != 0, resulting in a panic instead of sending the proposal rejection:
https://github.com/tikv/raft-rs/blob/82d704cdc3d93258be1f45efd715b95930764d7f/src/raft.rs#L617C9-L640C14

Expected behavior
PreVoting rejection and not panic().

@BusyJay
Copy link
Member

BusyJay commented Jun 29, 2023

Then term should be assigned to vote response.

@GuyLewin
Copy link
Author

GuyLewin commented Jun 29, 2023

Then term should be assigned to vote response.

It is assigned, but the term is still 0 when the node initializes, unless I misunderstood what you were referring to.
I can attach the full backtrace if that helps, but the fatal is being called in this scenario pretty consistently.

@BusyJay
Copy link
Member

BusyJay commented Jun 29, 2023

I see. When the raft group is first initialized, it should not have term 0 (INVALID_TERM).

@GuyLewin
Copy link
Author

@BusyJay makes sense, thank you!
I will initialize all our nodes with term=1 in their Raft state.

@tisonkun
Copy link
Contributor

it should not have term 0

Perhaps we check it on initial state loaded? I encountered this issue days before also.

@BusyJay
Copy link
Member

BusyJay commented Jun 30, 2023

There are two cases: 1. constructing the raft node with existing data, so the node is already initialized and should have a valid term; 2. constructing the raft node without any data, so the node is not initialized and can have term 0.

Though adding a check that term must not be 0 if configuration is not empty should be OK.

@GuyLewin
Copy link
Author

Though adding a check that term must not be 0 if configuration is not empty should be OK.

I can work on a PR for that.
We were constructing a raft node without any data, but using pre-vote and priority so we ended up rejecting even before any data was sent, therefore getting this error during rejection message send.
But anyway, this check will help ensure this never happens.

@BusyJay
Copy link
Member

BusyJay commented Jul 1, 2023

Please send it when you are free, thanks!

@tisonkun
Copy link
Contributor

tisonkun commented Jul 2, 2023

Reopened

@tisonkun tisonkun reopened this Jul 2, 2023
@GuyLewin GuyLewin linked a pull request Jul 3, 2023 that will close this issue
@GuyLewin GuyLewin changed the title timing-based panic during prevote when using priority Initialization with term=0 shouldn't be allowed Jul 3, 2023
@GuyLewin
Copy link
Author

GuyLewin commented Jul 3, 2023

Created a PR - #513

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants