nacos 1.4.2 / jraft: queue-full exception in the MpscSingleThreadExecutor thread pool #1001

Open
alexwylp opened this issue Jun 23, 2023 · 2 comments

@alexwylp

nacos 1.4.2 ships with jraft-core 1.3.5.
nacos 1.4.2, 5-node cluster. Recently nacos.log has shown the jraft error below a few times (client-facing services and the console stayed healthy). Each time it affects a single node, and a restart clears it. KomachiSion from the nacos community explained it as "requests coming in too fast, the thread pool queue is full, so new requests are rejected".

In practice the same problem comes back roughly five days after a restart. Following KomachiSion's explanation, the producers are far faster than the consumer, so the queue builds up at a fairly steady rate. I'd like to ask the sofa maintainers:
1) This is a 5-node cluster with fairly low load. What is the inter-node jraft synchronization actually doing?
2) Could the routing be uneven, so that on one particular node the queue's production rate is far higher than its consumption rate?
3) Is the default queue size of MpscSingleThreadExecutor around 30,000? If production far exceeds consumption, then even before the queue is full and RejectedExecutionHandlers throws, the queue is already backing up (slow consumption) and the timeliness of the synchronization is already degraded, so enlarging the initial queue size only treats the symptom rather than the cause?
4) Can the number of consumer threads be increased via a parameter? Given the name MpscSingleThreadExecutor, I'm not sure that is possible.
5) Is there a good tool for monitoring and observing the producers that submit tasks to this MpscSingleThreadExecutor?

Thanks!

2023-06-01 09:03:01,093 ERROR Exception while executing runnable io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed@58cd151b
java.util.concurrent.RejectedExecutionException: null
    at com.alipay.sofa.jraft.util.concurrent.RejectedExecutionHandlers.lambda$static$0(RejectedExecutionHandlers.java:32)
    at com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor.reject(MpscSingleThreadExecutor.java:288)
    at com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor.addTask(MpscSingleThreadExecutor.java:215)
    at com.alipay.sofa.jraft.util.concurrent.MpscSingleThreadExecutor.execute(MpscSingleThreadExecutor.java:140)
    at com.alipay.sofa.jraft.rpc.impl.GrpcServer.lambda$registerProcessor$2(GrpcServer.java:194)
    at io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172)
    at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
    at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
    at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
    at io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)
    at io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
    at io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
    at io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
    at io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)
    at io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
    at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
    at io.grpc.internal.SerializeReentrantCallsDirectExecutor.execute(SerializeReentrantCallsDirectExecutor.java:49)
    at io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener.halfClosed(ServerImpl.java:828)
    at io.grpc.internal.AbstractServerStream$TransportState.deframerClosed(AbstractServerStream.java:242)
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerStream$TransportState.deframerClosed(NettyServerStream.java:206)
    at io.grpc.internal.MessageDeframer.close(MessageDeframer.java:229)
    at io.grpc.internal.MessageDeframer.closeWhenComplete(MessageDeframer.java:191)
    at io.grpc.internal.AbstractStream$TransportState.closeDeframer(AbstractStream.java:183)
    at io.grpc.internal.AbstractServerStream$TransportState.inboundDataReceived(AbstractServerStream.java:269)
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerStream$TransportState.inboundDataReceived(NettyServerStream.java:252)
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.onDataRead(NettyServerHandler.java:478)
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler.access$800(NettyServerHandler.java:101)
    at io.grpc.netty.shaded.io.grpc.netty.NettyServerHandler$FrameListener.onDataRead(NettyServerHandler.java:787)
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onDataRead(DefaultHttp2ConnectionDecoder.java:292)
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onDataRead(Http2InboundFrameLogger.java:48)
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readDataFrame(DefaultHttp2FrameReader.java:422)
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:251)
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160)
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:174)
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:378)
    at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:438)
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:505)
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:444)
    at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:283)
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352)
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1421)
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374)
    at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360)
    at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930)
    at io.grpc.netty.shaded.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:794)
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:424)
    at io.grpc.netty.shaded.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:326)
    at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918)
    at io.grpc.netty.shaded.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

@fengjiachun
Contributor

Hi. The information you've provided doesn't give much to go on, so I can't tell exactly what happened, but I'll do my best to answer each of your questions.

1) This is a 5-node cluster with fairly low load. What is the inter-node jraft synchronization actually doing?

That question isn't really one for me; you should look at what the upper layer, nacos, is doing on top of jraft.

2) Could the routing be uneven, so that on one particular node the queue's production rate is far higher than its consumption rate?

raft is leader based: writes are handled by the leader node, so there is no routing involved. You may want to read up on how raft works first.

3) Is the default queue size of MpscSingleThreadExecutor around 30,000? If production far exceeds consumption, then even before the queue is full and RejectedExecutionHandlers throws, the queue is already backing up (slow consumption) and the timeliness of the synchronization is already degraded, so enlarging the initial queue size only treats the symptom rather than the cause?

Yes, the default queue size is 32768. It's generally not a good idea to enlarge it further, though you can certainly try and watch whether large numbers of requests start timing out.
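The back-pressure behaviour being discussed can be reproduced with a minimal JDK-only sketch. This is an analogy, not jraft's MpscSingleThreadExecutor implementation (which uses its own MPSC queue internally): a single consumer thread behind a bounded queue of 32768 tasks rejects further submissions with RejectedExecutionException once producers outpace it, which is the exception shown in the log above. The class name SingleConsumerBackPressureDemo and the 1 ms task duration are made up for the illustration.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// JDK analogy (not jraft code): one consumer thread fed through a bounded queue
// of 32768 tasks. When producers submit faster than the single thread can drain,
// the queue fills up and further submissions are rejected.
public class SingleConsumerBackPressureDemo {
    public static void main(String[] args) {
        ExecutorService consumer = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(32768),          // bounded queue, same size as the answer above
                new ThreadPoolExecutor.AbortPolicy());    // throw RejectedExecutionException when full

        AtomicLong rejected = new AtomicLong();
        for (int i = 0; i < 100_000; i++) {               // producers far outpace the consumer
            try {
                consumer.execute(() -> {
                    try {
                        Thread.sleep(1);                  // slow consumer
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected.incrementAndGet();
            }
        }
        System.out.println("rejected tasks: " + rejected.get());
        consumer.shutdownNow();
    }
}
```

As the sketch suggests, a larger queue only moves the rejection point further out: if the submission rate stays above the drain rate, the queue still grows and task latency climbs well before any rejection appears, which is exactly the concern in question 3.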

4) Can the number of consumer threads be increased via a parameter? Given the name MpscSingleThreadExecutor, I'm not sure that is possible.

It is a multi-producer, single-consumer executor, so the only knob is the queue size. As answered above, you can try enlarging it, but I'd still recommend tracking down the root cause of the queue saturation; if that many tasks genuinely are being submitted to jraft, then enlarging the queue size is fine.

5) Is there a good tool for monitoring and observing the producers that submit tasks to this MpscSingleThreadExecutor?

Two tools you can use:

  1. jraft's built-in metrics; see chapter 11 of the user guide: https://www.sofastack.tech/projects/sofa-jraft/jraft-user-guide/
  2. Arthas, to inspect the in-memory state of the key objects: https://arthas.aliyun.com/doc/
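Beyond those two tools, here is a rough sketch of the producer-side logging the reporter describes in the next comment: a delegating Executor that counts submissions and logs when the underlying executor rejects a task. ObservableExecutor is a hypothetical name, not a jraft class, and how to wire it in front of the executor used by jraft's GrpcServer processors depends on how nacos builds the RPC server, so the hook-up point is left as an assumption.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper for observing the producers of a single-consumer executor.
// It delegates to the real executor (e.g. an MpscSingleThreadExecutor instance)
// and keeps simple counters that can be polled, logged, or exposed as metrics.
public final class ObservableExecutor implements Executor {
    private final Executor delegate;
    private final AtomicLong submitted = new AtomicLong();
    private final AtomicLong rejected  = new AtomicLong();

    public ObservableExecutor(Executor delegate) {
        this.delegate = delegate;
    }

    @Override
    public void execute(Runnable task) {
        submitted.incrementAndGet();
        try {
            delegate.execute(task);
        } catch (RejectedExecutionException e) {
            rejected.incrementAndGet();
            // Log the submitting thread and the counters so the producer side
            // of the queue saturation becomes visible.
            System.err.printf("task rejected; producer=%s, submitted=%d, rejected=%d%n",
                    Thread.currentThread().getName(), submitted.get(), rejected.get());
            throw e;
        }
    }

    public long submittedCount() { return submitted.get(); }
    public long rejectedCount()  { return rejected.get(); }
}
```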

@alexwylp
Author

Got it, thanks for the careful answers. I'll add some monitoring on port-level access and log the endpoint information at the thread pool's offer entry point.
Thanks!
