[Bug]: Log spammed with vmq_metrics:met2idx({mqtt5_disconnect_sent,disconnect_migration}) #2087
Comments
It's a little weird that this happens here at all. I can't remember seeing this logged often, but it seems to need more attention. 👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
The weird thing is that I can see the log constantly filling up with it, and this goes on for hours and hours. Even a restart did not help.
2023-02-16 15:15:32.777 [error] <0.22206.0>@vmq_queue:drain:{455,5} got unknown sync event in drain state {cleanup,session_taken_over}
So, I stopped the nodes, deleted the data directories, and rejoined them. One of the nodes keeps doing this, even after a reboot of the machine. This is getting really strange.
How long do your MQTT clients wait for the server CONNACK before they attempt a re-connect?
Meanwhile, I found the culprit. It goes in the direction you mentioned. So the bug is not that my log is spammed, but that it brings down one of my nodes (it was completely unavailable at one point, which triggered the warning in the first place).
For future readers, the trace above shows a ClientId ( I suppose we need more data on whether this can block a whole node over time. Verne does not do any sliding-window accounting for TCP connection setup, which would allow connection rate limiting per client ID or per client IP. Proxy components like HAProxy could do this. (HAProxy even supports MQTT ClientID since
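As a rough illustration of the sliding-window accounting mentioned above, here is a minimal Python sketch of per-key connection rate limiting, where the key could be a client ID or a client IP. This is not VerneMQ or HAProxy code; the class name `SlidingWindowLimiter` and its parameters are made up for the example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most `max_attempts` connection
    attempts per key within the last `window_s` seconds."""

    def __init__(self, max_attempts, window_s, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_s = window_s
        self.clock = clock  # injectable so the limiter can be tested
        self._attempts = defaultdict(deque)  # key -> timestamps of accepted attempts

    def allow(self, key):
        now = self.clock()
        attempts = self._attempts[key]
        # Drop timestamps that have fallen out of the window.
        while attempts and now - attempts[0] >= self.window_s:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False  # too many recent attempts for this key
        attempts.append(now)
        return True
```

A proxy or listener would call `allow(client_id)` before accepting the connection and refuse it when the result is `False`. Only accepted attempts are recorded here; whether rejected attempts should also count is a design choice this sketch leaves open.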
Environment
Current Behavior
After an update of one of our nodes the log is full of the following error messages:
with 0 neighbours crashed with reason: no function clause matching vmq_metrics:met2idx({mqtt5_disconnect_sent,disconnect_migration}) line 1950
First, I thought: Easy fix. This seems to have been forgotten in met2idx. After looking into the code, it was not so obvious anymore.
The origin seems to be:
https://github.com/vernemq/vernemq/blob/master/apps/vmq_server/src/vmq_mqtt5_fsm.erl
The code crashes in vmq_metrics, that is, before the frame is serialised (which would crash in rcn2rc anyway). At this point I realized that adding it to the metrics is most likely not the right solution. What is the purpose of disconnect_migration? Is it just an internal flag, or does it really have to be sent to the clients (which does not happen at the moment)? Should it be translated to an administrative disconnect, be consumed somewhere so that it never reaches gen_disconnect, or something completely different?
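To make the two options concrete, here is a small Python model of the failure and of the "translate to an administrative disconnect" idea. It is only a sketch: the table contents, index values, and function names are hypothetical and are not taken from the VerneMQ source.

```python
# Hypothetical metric table: only reason codes that can go on the wire have a slot.
METRIC_INDEX = {
    ("mqtt5_disconnect_sent", "administrative_action"): 0,
    ("mqtt5_disconnect_sent", "session_taken_over"): 1,
}

# Hypothetical mapping of internal-only reasons to an on-the-wire reason.
INTERNAL_TO_WIRE = {"disconnect_migration": "administrative_action"}


def metric_index(name, reason):
    # A missing key raises KeyError, analogous to the
    # "no function clause matching" crash in the log.
    return METRIC_INDEX[(name, reason)]


def normalise_reason(reason):
    # Translate internal flags before they reach metrics or serialisation;
    # reasons that are already wire-level pass through unchanged.
    return INTERNAL_TO_WIRE.get(reason, reason)
```

With this shape, `metric_index("mqtt5_disconnect_sent", "disconnect_migration")` fails, while normalising the reason first yields a valid slot. Whether the translation is the right behaviour is exactly the open question above.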
Expected behaviour
Clean log.
Configuration, logs, error output, etc.
No response