EventBase::loop(): Failed to invoke event callback, breaking the loop. #111

Open
xuanyanwow opened this issue Sep 13, 2023 · 9 comments

Comments

@xuanyanwow
Contributor

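After running for a while, the BusinessWorker produces the error output below. We deploy on K8S with 3 pods. Only one pod shows the problem at a time; it never occurs on several pods simultaneously.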

------------------------------------------- WORKERMAN --------------------------------------------
Workerman version:4.1.11          PHP version:8.1.16           Event-Loop:\Workerman\Events\Event
-------------------------------------------- WORKERS ---------------------------------------------
proto   user            worker            listen          processes    status           
tcp     root            BusinessWorker    none            12            [OK]            
--------------------------------------------------------------------------------------------------
Press Ctrl+C to stop. Start success.
GatewayConnection Error : 2 ,client closed
Exception: connection close tcp://10.2.164.000:2301 in /home/htdocs/im/vendor/workerman/gateway-worker/src/Lib/Gateway.php:1182
Stack trace:
#0 /home/htdocs/im/vendor/workerman/gateway-worker/src/Lib/Gateway.php(1111): GatewayWorker\Lib\Gateway::sendAndRecv()
#1 /home/htdocs/im/vendor/workerman/gateway-worker/src/BusinessWorker.php(362): GatewayWorker\Lib\Gateway::getSession()
#2 /home/htdocs/im/vendor/workerman/workerman/Connection/TcpConnection.php(646): GatewayWorker\BusinessWorker->onGatewayMessage()
#3 [internal function]: Workerman\Connection\TcpConnection->baseRead()
#4 /home/htdocs/im/vendor/workerman/workerman/Events/Event.php(193): EventBase->loop()
#5 /home/htdocs/im/vendor/workerman/workerman/Worker.php(1629): Workerman\Events\Event->loop()
#6 /home/htdocs/im/vendor/workerman/workerman/Worker.php(1423): Workerman\Worker::forkOneWorkerForLinux()
#7 /home/htdocs/im/vendor/workerman/workerman/Worker.php(1397): Workerman\Worker::forkWorkersForLinux()
#8 /home/htdocs/im/vendor/workerman/workerman/Worker.php(560): Workerman\Worker::forkWorkers()
#9 /home/htdocs/im/src/Command/ServerCommand.php(129): Workerman\Worker::runAll()
#10 [internal function]: App\Command\ServerCommand->__invoke()
#11 /home/htdocs/im/vendor/minicli/minicli/src/App.php(239): call_user_func()
#12 /home/htdocs/im/vendor/minicli/minicli/src/App.php(218): Minicli\App->runSingle()
#13 /home/htdocs/im/app(27): Minicli\App->runCommand()
#14 {main}
PHP Warning:  EventBase::loop(): Failed to invoke event callback, breaking the loop. in /home/htdocs/im/vendor/workerman/workerman/Events/Event.php on line 193
worker[BusinessWorker:14] exit with status 64000
GatewayConnection Error : 2 ,client closed
GatewayConnection Error : 2 ,client closed
GatewayConnection Error : 2 ,client closed
PHP Warning:  stream_socket_client(): Unable to connect to tcp://10.2.164.000:2303 (Connection timed out) in /home/htdocs/im/vendor/workerman/gateway-worker/src/Lib/Gateway.php on line 1409
PHP Warning:  EventBase::loop(): Failed to invoke event callback, breaking the loop. in /home/htdocs/im/vendor/workerman/workerman/Events/Event.php on line 193
Exception: can not connect to tcp://10.2.164.000:2303 Connection timed out in /home/htdocs/im/vendor/workerman/gateway-worker/src/Lib/Gateway.php:1411
Stack trace:
#0 /home/htdocs/im/vendor/workerman/gateway-worker/src/Lib/Gateway.php(1165): GatewayWorker\Lib\Gateway::getGatewayConnection()
#1 /home/htdocs/im/vendor/workerman/gateway-worker/src/Lib/Gateway.php(1111): GatewayWorker\Lib\Gateway::sendAndRecv()
#2 /home/htdocs/im/vendor/workerman/gateway-worker/src/BusinessWorker.php(362): GatewayWorker\Lib\Gateway::getSession()
#3 /home/htdocs/im/vendor/workerman/workerman/Connection/TcpConnection.php(646): GatewayWorker\BusinessWorker->onGatewayMessage()
#4 [internal function]: Workerman\Connection\TcpConnection->baseRead()
#5 /home/htdocs/im/vendor/workerman/workerman/Events/Event.php(193): EventBase->loop()
#6 /home/htdocs/im/vendor/workerman/workerman/Worker.php(1629): Workerman\Events\Event->loop()
#7 /home/htdocs/im/vendor/workerman/workerman/Worker.php(1423): Workerman\Worker::forkOneWorkerForLinux()
#8 /home/htdocs/im/vendor/workerman/workerman/Worker.php(1397): Workerman\Worker::forkWorkersForLinux()
#9 /home/htdocs/im/vendor/workerman/workerman/Worker.php(560): Workerman\Worker::forkWorkers()
#10 /home/htdocs/im/src/Command/ServerCommand.php(129): Workerman\Worker::runAll()
#11 [internal function]: App\Command\ServerCommand->__invoke()
#12 /home/htdocs/im/vendor/minicli/minicli/src/App.php(239): call_user_func()
#13 /home/htdocs/im/vendor/minicli/minicli/src/App.php(218): Minicli\App->runSingle()
#14 /home/htdocs/im/app(27): Minicli\App->runCommand()
#15 {main}
worker[BusinessWorker:19] exit with status 64000

Querying status shows one process reporting N/A in every column and stuck in the busy state:

Workerman[/home/htdocs/im/src/Command/ServerCommand.php] status 
----------------------------------------------GLOBAL STATUS----------------------------------------------------
Workerman version:4.1.11          PHP version:8.1.16
start time:2023-09-13 11:09:39   run 0 days 9 hours   
load average: 15.7, 17.4, 17.13  event-loop:\Workerman\Events\Event
1 workers       12 processes
worker_name    exit_status      exit_count
BusinessWorker 64000            3
----------------------------------------------PROCESS STATUS---------------------------------------------------
pid     memory  listening    worker_name    connections send_fail timers  total_request qps    status
9       2.84M   none         BusinessWorker 13          0         56      1031441       0      [idle]
10      N/A     none         BusinessWorker N/A         N/A       N/A     N/A           N/A    [busy] 
11      2.81M   none         BusinessWorker 13          0         41      1017494       0      [idle]
12      2.81M   none         BusinessWorker 13          0         45      1008825       0      [idle]
13      2.82M   none         BusinessWorker 13          0         49      1006094       0      [idle]
16      2.84M   none         BusinessWorker 13          0         50      997295        0      [idle]
17      2.84M   none         BusinessWorker 13          0         54      1013818       0      [idle]
18      2.84M   none         BusinessWorker 13          0         58      986283        0      [idle]
21      2.86M   none         BusinessWorker 13          0         57      1016074       0      [idle]
2444    2.8M    none         BusinessWorker 13          0         50      1010678       0      [idle]
2989    2.75M   none         BusinessWorker 13          0         41      1003286       0      [idle]
6740    2.79M   none         BusinessWorker 13          0         56      36494         0      [idle]
----------------------------------------------PROCESS STATUS---------------------------------------------------
Summary 22M     -            -              143         0         557     10127782      0      [Summary] 
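For readers who want to spot such a process automatically, here is a minimal sketch that scans status output like the above and returns the PIDs whose last column is `[busy]`. It assumes the ten-column layout shown in this issue; the function name `busy_pids` and the parsing approach are illustrative and not part of Workerman.

```python
from typing import List


def busy_pids(status_text: str) -> List[int]:
    """Return PIDs of process rows whose status column is "[busy]".

    Assumes each process row has exactly ten whitespace-separated
    columns, as in the status output pasted in this issue.
    """
    pids = []
    for line in status_text.splitlines():
        fields = line.split()
        # Header and Summary rows also have ten fields, but their first
        # field is not a number, so they are skipped here.
        if len(fields) == 10 and fields[0].isdigit() and fields[-1] == "[busy]":
            pids.append(int(fields[0]))
    return pids


SAMPLE = """\
pid     memory  listening    worker_name    connections send_fail timers  total_request qps    status
9       2.84M   none         BusinessWorker 13          0         56      1031441       0      [idle]
10      N/A     none         BusinessWorker N/A         N/A       N/A     N/A           N/A    [busy]
Summary 22M     -            -              143         0         557     10127782      0      [Summary]
"""

print(busy_pids(SAMPLE))
```

A row that shows N/A everywhere did not report its statistics in time, which is why it is worth flagging separately from the idle rows.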

Could the author please advise how to troubleshoot this and locate the cause?

@walkor
Owner

walkor commented Sep 13, 2023 via email

@xuanyanwow
Contributor Author

I changed that by hand to hide the machine's IP when filing this issue. The original IP is correct; I have double-checked it.

@walkor
Owner

walkor commented Sep 13, 2023 via email

@xuanyanwow
Contributor Author

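  • Is the load figure taken from load average: 15.7, 17.4, 17.13? Above what value is the load generally considered too high? Does this value mean the same thing for all three process types (that is, is the reference value the same)?
  • Yes, it is a distributed deployment.
  • The gatewayWorker processes show the following output and status: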
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
Workerman[/home/htdocs/im/src/Command/ServerCommand.php] status 
----------------------------------------------GLOBAL STATUS----------------------------------------------------
Workerman version:4.1.11          PHP version:8.1.16
start time:2023-09-13 11:09:50   run 0 days 10 hours   
load average: 22.3, 20.97, 20.85 event-loop:\Workerman\Events\Event
1 workers       4 processes
worker_name  exit_status      exit_count
Gateway      0                0
----------------------------------------------PROCESS STATUS---------------------------------------------------
pid     memory  listening                worker_name  connections send_fail timers  total_request qps    status
9       7.85M   websocket://0.0.0.0:1216 Gateway      1154        4         3       25770080      0      [idle]
10      9.19M   websocket://0.0.0.0:1216 Gateway      1377        9         3       27351740      0      [idle]
11      8.04M   websocket://0.0.0.0:1216 Gateway      1190        5         3       25933475      0      [idle]
12      8.83M   websocket://0.0.0.0:1216 Gateway      1321        7         3       26545133      0      [idle]
----------------------------------------------PROCESS STATUS---------------------------------------------------
Summary 32M     -                        -            5042        25        12      105600428     0      [Summary] 

@walkor
Owner

walkor commented Sep 13, 2023

load average: 15.7, 17.4, 17.13
is the load. As a rule of thumb it should not exceed 70% of the number of CPU cores.
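The 70% rule of thumb can be written as a small check. This is a sketch only: the thread never states how many cores the machines have, so the core counts in the usage lines are hypothetical, and the helper names are made up for illustration.

```python
import os


def load_is_high(load_1min: float, cpu_cores: int, ratio: float = 0.7) -> bool:
    """Return True when the load average exceeds ratio * cpu_cores."""
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be positive")
    return load_1min > ratio * cpu_cores


def current_load_is_high(ratio: float = 0.7) -> bool:
    """Apply the same check to the machine this runs on (Unix only)."""
    one_min, _five_min, _fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1
    return load_is_high(one_min, cores, ratio)


# Hypothetical core counts, using the 1-minute load reported in this issue.
print(load_is_high(15.7, 16))  # threshold is 11.2
print(load_is_high(15.7, 32))  # threshold is 22.4
```

With the reported 1-minute load of 15.7, a 16-core machine would be over the threshold and a 32-core machine would not, so the answer depends on the actual core count of the node.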

@xuanyanwow
Contributor Author

OK, I will first try scaling out to bring the load down and then watch whether the loop problem still occurs. Thank you.

@walkor
Owner

walkor commented Sep 13, 2023

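Are you running a load test?
An internal gatewayWorker API call (for example Gateway::sendToAll()) generally communicates once with every gateway process. So the fewer gateway processes there are in the whole cluster, the more efficient the cluster is and the lower its load. If the high load is caused by frequent internal Gateway API calls, adding gateway servers will not reduce the load; it will make it higher.

If you make very frequent Gateway API calls, I recommend running only two gateway servers with only 2 processes each, which lowers the load of the whole cluster.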

@xuanyanwow
Contributor Author

xuanyanwow commented Sep 14, 2023

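It is not a load test; this is production traffic.
The current gatewayWorker process count is 3 nodes with 4 processes each.
We call Client::sendToUid() frequently.
We will try reducing the number of gatewayWorker processes and watch the load.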
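To make the owner's point about fan-out concrete, here is a back-of-the-envelope sketch. It assumes, as described above, that each internal API call contacts every gateway process once. The call rate of 1000 per second is hypothetical, and the function is an illustration rather than anything provided by GatewayWorker.

```python
def gateway_messages_per_second(
    api_calls_per_second: float, nodes: int, processes_per_node: int
) -> float:
    """Estimate internal messages per second sent to gateway processes.

    Assumes every API call communicates once with each gateway process
    in the cluster, so the cost scales with nodes * processes_per_node.
    """
    if nodes < 0 or processes_per_node < 0:
        raise ValueError("counts must be non-negative")
    return api_calls_per_second * nodes * processes_per_node


calls = 1000  # hypothetical rate of internal API calls per second

current = gateway_messages_per_second(calls, nodes=3, processes_per_node=4)
suggested = gateway_messages_per_second(calls, nodes=2, processes_per_node=2)

print(current, suggested, current / suggested)
```

Under that assumption, moving from 3 nodes with 4 processes each (12 gateway processes) to 2 nodes with 2 processes each (4 gateway processes) cuts the internal message volume to one third for the same call rate.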

@twomiao
Contributor

twomiao commented Dec 10, 2023

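It is not a load test; this is production traffic. The current gatewayWorker process count is 3 nodes with 4 processes each. We call Client::sendToUid() frequently. We will try reducing the number of gatewayWorker processes and watch the load.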

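How are things now?
Judging from the statistics in your status panel, you have few connections, but each connection carries heavy traffic (a large number of request packets in a short time).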

frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection
frame not masked so close the connection

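The "masked" message most likely means that a WebSocket binary frame sent from a client to the Gateway was invalid, so the Gateway treated it as an illegal connection and closed it.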
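For background, RFC 6455 requires every frame sent from a client to a server to be masked; the MASK flag is the top bit of the second header byte. The sketch below shows how that bit can be inspected and how a short masked frame is built. It assumes the log message refers to this bit. The helper names are made up for illustration and this is not Workerman's code.

```python
from typing import Optional


def frame_is_masked(frame: bytes) -> bool:
    """Return True when the MASK bit (0x80 of byte 1) is set."""
    if len(frame) < 2:
        raise ValueError("a WebSocket frame header needs at least 2 bytes")
    return bool(frame[1] & 0x80)


def mask_payload(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with the 4-byte masking key (RFC 6455, 5.3)."""
    if len(key) != 4:
        raise ValueError("masking key must be 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))


def build_text_frame(payload: bytes, key: Optional[bytes] = None) -> bytes:
    """Build a single-fragment text frame with a payload of at most 125 bytes."""
    if len(payload) > 125:
        raise ValueError("this sketch only handles payloads up to 125 bytes")
    first = 0x81  # FIN=1, opcode=0x1 (text)
    if key is None:
        # Unmasked: valid from server to client, invalid from client to server.
        return bytes([first, len(payload)]) + payload
    return bytes([first, 0x80 | len(payload)]) + key + mask_payload(payload, key)


masked = build_text_frame(b"Hello", bytes([0x37, 0xFA, 0x21, 0x3D]))
unmasked = build_text_frame(b"Hello")
print(frame_is_masked(masked), frame_is_masked(unmasked))
```

A client or proxy that sends frames like `unmasked` above would be rejected by a compliant server, so it is worth checking which client library or intermediary produces the offending connections.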
