High CPU usage after restart #2139
@GoushikaaMoorthi What VerneMQ packages do you use? Self-built or EULA-based?
@GoushikaaMoorthi
Yes, we are setting
@GoushikaaMoorthi then you are using the packages in a non-compliant fashion. Please get in touch for clarification (info at vernemq.com).
We will look into it. But will this impact CPU usage?
Potential fix: #2162
@GoushikaaMoorthi any update?
Hi, I can confirm the issue is still happening on VerneMQ version 1.13.0.
@localbubble can you tell us what you did to see it? Here's a bit of context: a VerneMQ cluster needs node leaves/joins and restarts to be handled more carefully than might be expected, since it is not a stateless cluster. One point is that joining nodes should be empty, that is, not already loaded with traffic or history. Otherwise, the result can be "empty synchronization" attempts that lead to increased CPU.
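For anyone debugging this, the cluster view can be inspected with `vmq-admin`, and a node that is supposed to join empty can first be made to leave cleanly before its state is wiped. A minimal sketch (the node name is an example):

```
# On any running node: show cluster membership and node status
vmq-admin cluster show

# Let a node leave gracefully before wiping its state and re-joining empty
# (node name is an example; -k/--kill_sessions disconnects its clients)
vmq-admin cluster leave node=VerneMQ@10.0.0.2 -k
```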
Hi @ioolkos, we're using the VerneMQ Helm chart for deployment (version 1.9.0), and our deployment uses PVCs for the VerneMQ StatefulSets. We build our own VerneMQ Docker image for the deployment (version 1.13.0). The Docker image we produce is pretty much the same as the official VerneMQ Docker image, and if needed, I can provide more details about both the Helm chart and the Docker image. As a rule of thumb, we try our best to stick to the defaults, meaning we have no customizations for either the Docker image creation or the Helm chart usage; we're simply replicating the official way of building the Docker image. On top of that image, we have two custom plugins.
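For illustration, such an image might be layered as in the following sketch. The plugin names and paths are hypothetical, and the `DOCKER_VERNEMQ_*` variables rely on the official image's mapping of environment variables to `vernemq.conf` entries (prefix stripped, lowercased, double underscores becoming dots):

```dockerfile
# Sketch only: layer two hypothetical custom plugins on the official image
FROM vernemq/vernemq:1.13.0

# The official image requires explicit EULA acceptance to start
# (see the licensing discussion earlier in this thread)
ENV DOCKER_VERNEMQ_ACCEPT_EULA=yes

# Copy pre-built plugin releases into the image (paths are examples)
COPY --chown=vernemq:vernemq my_auth_plugin /vernemq/plugins/my_auth_plugin
COPY --chown=vernemq:vernemq my_hook_plugin /vernemq/plugins/my_hook_plugin

# Maps to: plugins.my_auth_plugin = on, plugins.my_auth_plugin.path = ..., etc.
ENV DOCKER_VERNEMQ_PLUGINS__MY_AUTH_PLUGIN=on \
    DOCKER_VERNEMQ_PLUGINS__MY_AUTH_PLUGIN__PATH=/vernemq/plugins/my_auth_plugin \
    DOCKER_VERNEMQ_PLUGINS__MY_HOOK_PLUGIN=on \
    DOCKER_VERNEMQ_PLUGINS__MY_HOOK_PLUGIN__PATH=/vernemq/plugins/my_hook_plugin
```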
This high CPU consumption issue does not happen on the initial deployment of our two clusters. It sometimes happens after we simply restart VerneMQ, without any changes, for a maintenance operation, and sometimes after we change some configuration on VerneMQ, like an environment variable for user/password, thereby causing a restart of the VerneMQ pods. Unfortunately, I haven't seen any unusual or weird logs once the issue is triggered; the only thing we see is that CPU consumption increases drastically, from 100m to 2000m-3000m, putting high pressure on the node the pod is scheduled on. The workaround we found is to simply restart the VerneMQ cluster until CPU consumption is back to normal (roughly as sketched below). Lastly, this issue happens on both of our clusters: in one we have around 20 clients connected in total, and in the other more than 4000 clients. Please let me know if I should provide more information. Best Regards!
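For reference, that restart workaround might look like this. A sketch assuming a typical Helm-chart install; the StatefulSet and namespace names are examples:

```
# Rolling restart of the VerneMQ StatefulSet (names are examples)
kubectl -n vernemq rollout restart statefulset vernemq

# Check pod CPU until consumption settles back to normal
kubectl -n vernemq top pods
```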
My recommendation would be to check which Erlang processes are causing the huge load, e.g. with recon & friends. That might give a hint for further investigation.
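Concretely, that could look like the following sketch (recon ships with the VerneMQ release; the node name and shell prompt are examples):

```
$ vernemq attach
(VerneMQ@node1)1> recon:scheduler_usage(1000).            % scheduler (CPU) utilisation, sampled over 1s
(VerneMQ@node1)2> recon:proc_window(reductions, 5, 5000). % 5 busiest processes over a 5s window
(VerneMQ@node1)3> recon:proc_count(message_queue_len, 5). % 5 largest process mailboxes
%% detach with Ctrl-D; do not run q(), which would stop the node
```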
We are running VerneMQ in an AWS EKS setup. Below are the setup details:
VerneMQ version: 1.12.3
No. of pods: 3
OS: Bottlerocket OS
EKS version: 1.24
Machine type: m5.2xlarge
After the OS update from Amazon Linux to Bottlerocket, a subsequent EKS version upgrade results in increased CPU usage.
Expectation:
CPU usage should remain the same as before the restart.
Actual behaviour:
After a restart, CPU usage keeps increasing. There is no increase in the number of MQTT connections.
CPU usage: [screenshot]
Connections: [screenshot]
We could also see a difference in the queue and memory allocation metrics:
Queue: [screenshot]
Memory allocation: [screenshot]
Could the increase in queue initialisation be the reason for the increase in CPU usage?
We also tried the options suggested in this issue, but the increase in CPU usage remains the same.
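To check whether queue churn lines up with the CPU increase, the built-in counters can be compared before and after a restart. A sketch (exact metric names may vary between VerneMQ versions):

```
# Snapshot queue-related counters; run again later and diff to spot churn
vmq-admin metrics show | grep -i queue

# Cross-check the number of live sessions/queues
vmq-admin session show
```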