Low-spec machines: Leader's scheduled heartbeat submission throws RejectedException, Follower stuck running preVote on its own #1042

Open
CZJCC opened this issue Nov 21, 2023 · 1 comment


CZJCC commented Nov 21, 2023

Describe the bug

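A 3-node group cluster on 1c2g machines (1 CPU core, 2 GB RAM) had been running normally. Then, probably because of a brief network blip between the leader and a follower, an exception occurred and the cluster could not recover on its own.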

Logs observed on the Leader node (node 0-0):
[log screenshot]
After this, the leader kept rejecting preVote requests from follower 0-1:
[log screenshot]


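Logs observed on the faulty Follower (node 0-1):
[log screenshot]
You can see that 0-1 briefly lost its connection to leader 0-0 and then reconnected, but after that it got stuck in an endless preVote loop, even though 0-0 was still the leader at that point. Watching AppendEntries handling with Arthas on follower 0-1 (command below) showed that these requests were no longer arriving, and we confirmed on the leader that it was no longer sending AppendEntries to 0-1. As a result, 0-1 could never rejoin the group normally.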

watch com.alipay.sofa.jraft.core.NodeImpl handleAppendEntriesRequest 

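My guess is that this uncaught exception on the leader aborted the heartbeat scheduling task, and since scheduling the next heartbeat depends on com.alipay.sofa.jraft.core.Replicator#onHeartbeatReturned being called for that follower, heartbeats to 0-1 never resumed. @fengjiachun, do you have any suggestions? Any advice would be much appreciated.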
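To make the suspected failure mode concrete, here is a minimal, self-contained Java sketch. It is not SOFAJRaft code: `HeartbeatChainDemo`, `FlakyExecutor`, `onHeartbeatTimer`, `sendHeartbeat`, `run` and the `rearmOnRejection` flag are hypothetical names, `onHeartbeatReturned` only mirrors the role of `Replicator#onHeartbeatReturned`, and the timer is modeled as an in-memory queue so the result is deterministic. It assumes, as guessed above, that the next heartbeat is scheduled only when the previous one returns, so a single `RejectedExecutionException` escaping the timer task ends the chain.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;

/** Hypothetical model of a self-rescheduling heartbeat chain (not SOFAJRaft code). */
public class HeartbeatChainDemo {

    /** Rejects exactly one submission (the rejectAt-th, 1-based) and runs the rest inline. */
    static final class FlakyExecutor implements Executor {
        private final int rejectAt;
        private int submissions = 0;

        FlakyExecutor(int rejectAt) {
            this.rejectAt = rejectAt;
        }

        @Override
        public void execute(Runnable task) {
            submissions++;
            if (submissions == rejectAt) {
                throw new RejectedExecutionException("pool saturated");
            }
            task.run(); // inline, so the demo is deterministic
        }
    }

    private final Deque<Runnable> timerQueue = new ArrayDeque<>(); // simulated timer
    private final Executor rpcExecutor;
    private final boolean rearmOnRejection; // hypothetical defensive variant
    private int heartbeatsSent = 0;

    HeartbeatChainDemo(Executor rpcExecutor, boolean rearmOnRejection) {
        this.rpcExecutor = rpcExecutor;
        this.rearmOnRejection = rearmOnRejection;
    }

    /** Timer callback: hands the actual send off to the RPC executor. */
    private void onHeartbeatTimer() {
        try {
            rpcExecutor.execute(this::sendHeartbeat);
        } catch (RejectedExecutionException e) {
            if (!rearmOnRejection) {
                throw e; // escapes the timer task; nothing schedules the next tick
            }
            timerQueue.add(this::onHeartbeatTimer); // schedule the next tick anyway
        }
    }

    /** Sends one heartbeat; in this model the response arrives immediately. */
    private void sendHeartbeat() {
        heartbeatsSent++;
        onHeartbeatReturned();
    }

    /** Stand-in for the response callback: the only normal place the next tick is scheduled. */
    private void onHeartbeatReturned() {
        timerQueue.add(this::onHeartbeatTimer);
    }

    /** Runs at most maxTicks timer ticks and returns how many heartbeats were sent. */
    int run(int maxTicks) {
        timerQueue.add(this::onHeartbeatTimer);
        int ticks = 0;
        while (!timerQueue.isEmpty() && ticks < maxTicks) {
            Runnable tick = timerQueue.poll();
            ticks++;
            try {
                tick.run();
            } catch (RuntimeException e) {
                // The simulated timer only logs the exception and keeps going.
                System.out.println("uncaught in timer task: " + e);
            }
        }
        return heartbeatsSent;
    }

    public static void main(String[] args) {
        int broken = new HeartbeatChainDemo(new FlakyExecutor(3), false).run(10);
        int rearmed = new HeartbeatChainDemo(new FlakyExecutor(3), true).run(10);
        System.out.println("without re-arm: " + broken + " heartbeats");
        System.out.println("with re-arm: " + rearmed + " heartbeats");
    }
}
```

With the third submission rejected, the first variant sends 2 heartbeats and then stops for good even though the timer itself keeps running, while the re-arming variant sends 9 out of 10 possible. How the real library should handle a rejection (catch and re-arm, retry, or size the pool differently) is for the maintainers to decide; the sketch only shows why one uncaught rejection can silently stop heartbeats to a single follower.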

Expected behavior

Actual behavior

Steps to reproduce

Minimal yet complete reproducer code (or GitHub URL to code)

Environment

  • SOFAJRaft version: 1.3.12 (using rpc-grpc-impl of the same version rather than Bolt)
  • JVM version (e.g. java -version): 8
  • OS version (e.g. uname -a): Linux 4.19.91-24.1.al7.x86_64
  • Maven version:
  • IDE version:
@fengjiachun (Contributor)

kill -s SIGUSR2 pid

https://www.sofastack.tech/projects/sofa-jraft/jraft-user-guide/

See section 11 of the guide. Each node will produce three files: node_metrics, thread_pool_metrics and node_describe.

Please post these three files from every node, preferably as text rather than screenshots.
