VerneMQ connections not closing causing it to crash #2037

Open
mbitar-gm opened this issue Sep 23, 2022 · 4 comments

mbitar-gm commented Sep 23, 2022

Environment

  • VerneMQ Version: 1.11.0
  • VerneMQ configuration (vernemq.conf) or the changes from the default:
listener.http.metrics=0.0.0.0:8888
max_offline_messages=30000
max_online_messages=30000
listener.max_connections=25000

Actual behaviour

I have a VerneMQ broker running with metrics enabled, and I visualize the metrics on a Grafana dashboard.
More than 5200 devices are connected to that broker. These devices restart their connections approximately every 16 hours.

Usually everything goes smoothly: the devices restart their connections, reconnect normally, and the broker shows no issues.

However, sometimes around restarts, the following happens:

  • The Socket Open count rises to almost double the Socket Close count:
    image

  • Consequently, the number of connected clients keeps rising until the broker stops functioning properly:
    image
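
For anyone who wants to watch the same gap outside Grafana, here is a minimal Python sketch that computes the difference between opened and closed sockets from a Prometheus-style metrics dump. It assumes the counters are exposed as `socket_open` and `socket_close` in the text format; the function names are hypothetical and the sample values in the usage note are made up for illustration.

```python
def parse_counters(metrics_text, names=("socket_open", "socket_close")):
    """Sum all samples of the named counters found in Prometheus text output."""
    totals = {name: 0.0 for name in names}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Labelled sample: name{label="x"} value [timestamp]
            head, _, tail = line.rpartition("}")
            metric = head.split("{", 1)[0].strip()
            fields = tail.split()
        else:
            # Unlabelled sample: name value [timestamp]
            parts = line.split()
            metric, fields = parts[0], parts[1:]
        if metric in totals and fields:
            totals[metric] += float(fields[0])
    return totals


def unclosed_sockets(metrics_text):
    """Return socket_open minus socket_close (assumed counter names)."""
    totals = parse_counters(metrics_text)
    return totals["socket_open"] - totals["socket_close"]
```

With the listener from the configuration above, the input text could be fetched from port 8888 (for example with `urllib.request.urlopen`) and passed to `unclosed_sockets`. A value that keeps growing between scrapes would match the behaviour described here.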

Around that time, I can observe the following errors in the logs:

"22:46:24.232 [error] CRASH REPORT Process <0.3534.183> with 0 neighbours crashed with reason: no try clause matching {shutdown,{state,<<>>,vmq_mqtt_pre_init,terminated,mqttws,#Port<0.276347>,{{172,18,0,5},54939},{{1663,281924,69525},0},{{1663,281924,69525},0}}} in cowboy_websocket:handler_call/6 line 469\r"
"22:46:24.233 [error] Ranch listener {{0,0,0,0},8080} terminated with reason: no try clause matching {shutdown,{state,<<>>,vmq_mqtt_pre_init,terminated,mqttws,#Port<0.276347>,{{172,18,0,5},54939},{{1663,281924,69525},0},{{1663,281924,69525},0}}} in cowboy_websocket:handler_call/6 line 469\r"
"22:46:24.280 [warning] Subquery failed due to timeout\r"

After that, no client is able to connect to the broker.
Unfortunately, I have not been able to reproduce the issue yet, as it occurs sporadically.

Update:

I also found that the following happens once the issue starts:

  • Messages get queued up:
    image

  • Commands that would usually take 1 second to return start timing out after 1 minute.

Contributor

ioolkos commented Sep 24, 2022

Check what the upstream network components do. Do you have any idle timeouts configured in a firewall?
Keep us posted on your findings.


👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.

Author

mbitar-gm commented Sep 26, 2022

> Check what the upstream network components do. Do you have any idle timeouts configured in a firewall? Keep us posted on your findings.

I don't think this is a network issue.
We have a cronjob that queries the broker once an hour, and when the issue starts, its commands start timing out.

Updated the post.

@mbitar-gm
Author

Bump

Contributor

ioolkos commented Jan 16, 2023

Does this look like a reconnect overload? Do you actually see the CPU maxed out, maybe because the broker is running on a very small instance?
How long do the clients wait before they get a CONNACK answer (and before they would make another immediate connect attempt)?
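
If the clients do retry immediately, one common client-side mitigation is exponential backoff with jitter, so that thousands of devices do not reconnect in lockstep. The Python sketch below is only an illustration of that idea, not something taken from the devices in this issue; the function name `reconnect_delay` and the `base` and `cap` values are assumptions.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)].

    attempt is the number of failed connect attempts so far (0 for the first
    retry). base and cap are in seconds and are illustrative values only.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

A client would sleep for `reconnect_delay(n)` seconds after its n-th failed attempt and reset `n` to 0 once it receives a successful CONNACK. The random spread is what keeps a fleet of devices from hitting the broker at the same instant after a mass restart.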

