I can now confirm it is a memory (RAM) issue. I was able to watch top during a crash. At some point the server starts swapping, and because it had little swap space, it ran out of memory. I have increased the swap space substantially and hope that prevents the crashes.
I am not sure whether this is a duplicate of getsentry/self-hosted#2700, or whether ClickHouse is causing the problem. At the point where swapping started, ClickHouse's memory usage had not grown much. With more swap available, the next such event may reveal more, since the growing process will have room to expand into swap. As RAM consumption is at about 95% all the time, it could be any service. I also cannot see a higher rate of incoming Sentry events or anything suspicious in the global throughput statistics just before an outage.
In any case, I think it should be investigated why ClickHouse doesn't respect MAX_MEMORY_USAGE_RATIO.
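For anyone who wants to check this on their own host, here is a minimal sketch that compares a process's resident memory with the cap implied by the ratio. It assumes that MAX_MEMORY_USAGE_RATIO is meant to limit ClickHouse to that fraction of total RAM (about 4.8 GB for 0.3 on a 16 GB machine), which this thread has not confirmed. The helper names and the way the PID is passed in are hypothetical; only the Linux /proc/meminfo and /proc/<pid>/status files are real.

```python
import re
import sys


def parse_kib(text, key):
    """Return the kB value for `key` from /proc/meminfo- or /proc/<pid>/status-style text."""
    match = re.search(rf"^{re.escape(key)}:\s+(\d+)\s+kB", text, re.MULTILINE)
    if match is None:
        raise KeyError(key)
    return int(match.group(1))


def expected_cap_kib(mem_total_kib, ratio):
    """Memory cap implied by the ratio (assumption: ratio * total RAM)."""
    return round(mem_total_kib * ratio)


def exceeds_cap(rss_kib, mem_total_kib, ratio):
    """True if the resident set size is above the implied cap."""
    return rss_kib > expected_cap_kib(mem_total_kib, ratio)


def check_process(pid, ratio=0.3):
    """Read live values from /proc (Linux only) and print a one-line verdict."""
    with open("/proc/meminfo") as f:
        total = parse_kib(f.read(), "MemTotal")
    with open(f"/proc/{pid}/status") as f:
        rss = parse_kib(f.read(), "VmRSS")
    cap = expected_cap_kib(total, ratio)
    verdict = "ABOVE" if rss > cap else "within"
    print(f"pid {pid}: RSS {rss} kB, implied cap {cap} kB ({verdict} cap)")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the host PID of the clickhouse-server process.
    check_process(int(sys.argv[1]))
```

Running this periodically with the PID of each suspect process (not only ClickHouse) would show which one is actually growing before the swapping starts. RSS is only one view of memory use, so the numbers will not match top's columns exactly.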
Self-Hosted Version
24.1.0.dev0
CPU Architecture
x86_64
Docker Version
25.0.1
Docker Compose Version
2.24.2
Steps to Reproduce
Expected Result
Sentry should stay healthy.
Actual Result
The virtual server (8 vCPU, 16 GB RAM) becomes unhealthy and unreachable from time to time. A first investigation shows that ClickHouse is not respecting the environment variable MAX_MEMORY_USAGE_RATIO = 0.3. See output of top:
Sometimes the server comes back after about 20 minutes, but often a hard reboot is necessary. This sounds like heavy swapping. I am now trying to capture logs of an outage event and will post more information as soon as it is available.
Event ID
No response