Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High CPU usage after restart #2139

Open
GoushikaaMoorthi opened this issue Apr 11, 2023 · 12 comments
Open

High CPU usage after restart #2139

GoushikaaMoorthi opened this issue Apr 11, 2023 · 12 comments

Comments

@GoushikaaMoorthi
Copy link

We are running vernemq in AWS EKS setup. Below are the setup details
Vernemq version: 1.12.3
No of pods: 3
OS: Bottlerocket OS
EKS version: 1.24
Machine type: m5.2xlarge

After Os update from AWS linux to bottlerocket, subsequent EKS version upgrade results in increase in CPU usage

Expectation:
CPU usage should remain same as before after restart

*Actual Behaviour:
After restart CPU usage keeps on increasing. There is no increase in no of MQTT connections.
CPU Usage:
cpu
Connections:
open-connection

We could also see the difference in queue and memory allocation metrics
Queue:
queue
Memory Allocation:
memory

will the increase in queue initialisation be the reason for increase in CPU usage?

We also tried the options suggested in this issue, but the increase in CPU usage remains the same

  • We tried restarting broker
  • We also tried complete restart (By scaling down to 0 and then scaling up to 3)
@ioolkos
Copy link
Contributor

ioolkos commented Apr 11, 2023

@GoushikaaMoorthi What VerneMQ packages do you use? self-built or EULA-based?
So this issue showed up only after the OS Upgrade to Bottlerock?


👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.

@GoushikaaMoorthi
Copy link
Author

GoushikaaMoorthi commented Apr 11, 2023

What VerneMQ packages do you use? self-built or EULA-based?

  • can you please explain what you mean by self-built? We are building docker image on top of vernemq:1.12.3, as part of this we are updating only vm.args file

So this issue showed up only after the OS Upgrade to Bottlerock?

  • Yes, after upgrade, subsequent restarts results in increasing CPU usage. CPU usage is not getting reduced

@ioolkos
Copy link
Contributor

ioolkos commented Apr 11, 2023

@GoushikaaMoorthi
I ask about the package because the binary releases need a paid subscription for commercial use.
The easy test is this: if you need to set accept_eula=yes in the vernemq.conf file to make VerneMQ start, then you are using a package requiring a subscription.


👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.

@GoushikaaMoorthi
Copy link
Author

Yes, we are setting accept_eula=yes this variable

@ioolkos
Copy link
Contributor

ioolkos commented Apr 11, 2023

@GoushikaaMoorthi then you are using the packages in non-compliant fashion. Please get in touch for clarification (info at vernemq.com)


👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.

@GoushikaaMoorthi
Copy link
Author

We will check into it. But will this impact CPU usage?
We have been using the same image in all our environments, but we are facing connectivity issue in only one environment. Where we see increase in queue initialisation, message drop due to full queue and memory allocation metrics

@ioolkos
Copy link
Contributor

ioolkos commented Jun 3, 2023

potential fix: #2162


👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.

@ioolkos
Copy link
Contributor

ioolkos commented Jun 16, 2023

@GoushikaaMoorthi any update?


👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.

@localbubble
Copy link

Hi, I can confirm the issue is still happening on VerneMQ version 1.13.0

@ioolkos
Copy link
Contributor

ioolkos commented Nov 16, 2023

@localbubble can you tell us what you did to see it?

Here's a bit of context: a VerneMQ cluster needs handling of node leaves/joins and restarts in a more careful way than what might be expected, since it's not a stateless cluster. One of the points would be that joining nodes should be empty, that is not already loaded with traffic or history. If not, the result might be "empty synchronization" attempts leading to increased CPU.


👉 Thank you for supporting VerneMQ: https://github.com/sponsors/vernemq
👉 Using the binary VerneMQ packages commercially (.deb/.rpm/Docker) requires a paid subscription.

@localbubble
Copy link

Hi @ioolkos
,
We have VerneMQ deployed on Kubernetes, and it is clustered with 2 VerneMQ instances.

We're using the VerneMQ helm chart for deployment (version 1.9.0), and our deployment uses PVCs for the VerneMQ statefulsets.

We produce our own VerneMQ docker image for the deployment (version 1.13.0).

The docker image we're producing for VerneMQ is pretty much the same with the official VerneMQ docker image, and if needed, I can provide more details about both the helm chart and the docker image. But as a rule of thumb, we try our best to stick to the defaults, meaning we don't have any customizations neither for the docker image creation nor for the helm chart usage, and we're simply replicating the official way of creating the docker image.

On top of the docker image, we have the following two custom plugins.

  1. vmq_clienttopics
  2. vmq_topic_validator

This high CPU consumption issue is not happening on the initial deployment for our 2 clusters.

The issue sometimes happen after we simply restart VerneMQ without any changes for a maintenance operation and/or it sometimes happen after we change some configurations on VerneMQ, like an environment variable change for user/password, therefore causing a restart on the VerneMQ pods.

Unfortunately, I haven't seen any unusual/weird logs once this issue is triggered, and the only thing we see is that CPU consumption is increased drastically, from 100m to 2000m-3000m, and causing high pressure on the node that is deployed.

And the workaround we found is to simply restart the VerneMQ cluster until the CPU consumption is back to normal.

Lastly, this issue happens on our both clusters where, in one cluster we have around 20 clients connected in total and in the other cluster, there are more than 4000 clients connected.

Please let me know if I should provide more information,

Best Regards!

@mths1
Copy link
Contributor

mths1 commented Nov 18, 2023

My recommendation would be to check which erlang processes are causing the huge memory load, e.g. with recon & friend. That might give a hint for further investigation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants