
Memory Leak? #190

Open
grudzien opened this issue Jul 10, 2018 · 5 comments

@grudzien

I have searched the issues of both slack-ruby-bot and celluloid for reports of a memory leak and haven't found anything. I first ran into this running a Slack bot in AWS, where the bot would eventually leak enough memory to be OOM-killed on Ubuntu 16.04. We increased the memory from 256M to 512M to 1026M to 2048M, and no matter how much we gave it, the bot would eventually consume all of the memory on the box. To simplify the issue, I took the standard Ubuntu 16.04 image from AWS, patched it, installed Ruby and the required gems, and ran the ping bot. In the last 24 hours it has grown from 54M of RAM to 102M. Here are the traits I have noticed:

  1. The web socket never reconnects.
  2. The Ruby heap, as reported by 'objspace', is not growing out of control (it's not growing at all).
  3. Running strace shows an mmap (allocation) of 524288 bytes every 2 minutes, for the Slack health check.
  4. munmap is called during GC to free those 524288-byte blocks, but memory is allocated far faster than it is freed.
  5. The bot is completely idle otherwise.
  6. The only thing in the debug log is a Celluloid read and write every two minutes.
  7. The bot leaks memory linearly for about 8 hours, then flatlines for 10-24 hours, then continues to leak.
  8. Happens both inside and outside of Docker.
  9. Tested on Ubuntu 16.04, Ubuntu 18.04, and Alpine Linux 2.
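Traits 2 and 4 suggest the growth is outside the Ruby object heap. A minimal sketch for checking that from inside the bot process, assuming Linux (it reads `VmRSS` from `/proc/self/status`): log the process RSS next to a few `GC.stat` counters. If RSS keeps climbing while `heap_live_slots` stays flat, the growth is in native allocations rather than in Ruby objects. `rss_kb`, `memory_sample` and `start_memory_logger` are hypothetical helper names, not part of slack-ruby-bot or Celluloid.

```ruby
# Minimal sketch (assumes Linux /proc): compare process RSS with Ruby heap stats.
# rss_kb, memory_sample and start_memory_logger are hypothetical helpers.

# Resident set size of the current process in KB, or nil if /proc is unavailable.
def rss_kb
  path = '/proc/self/status'
  return nil unless File.exist?(path)
  line = File.foreach(path).find { |l| l.start_with?('VmRSS:') }
  line && line.split[1].to_i # "VmRSS:   12345 kB" -> 12345
end

# One sample: RSS plus GC.stat counters describing the Ruby-managed heap.
def memory_sample
  stat = GC.stat
  {
    rss_kb: rss_kb,
    heap_live_slots: stat[:heap_live_slots],            # live Ruby objects
    heap_allocated_pages: stat[:heap_allocated_pages],  # pages in the object heap
    malloc_increase_bytes: stat[:malloc_increase_bytes] # bytes malloc'd since the last GC
  }
end

# Print one sample every `interval` seconds from a background thread.
def start_memory_logger(interval = 120)
  Thread.new do
    loop do
      $stdout.puts(memory_sample.inspect)
      sleep interval
    end
  end
end
```

Calling `start_memory_logger` once at startup would produce a time series that can be lined up against the strace output; the 120-second default simply matches the two-minute health-check interval described above.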

I am trying to avoid radical troubleshooting like switching to jemalloc or recompiling Ruby with extra debugging. If anyone has suggestions or experience with this, I would appreciate the help. I am about one or two more days away from ditching the project.

My current install (I have tried three different versions of Ruby):
OS: Ubuntu 16.04 4.4.0-1061-aws #70-Ubuntu
Ruby: ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu]
Gems:
activesupport (5.2.0)
aws-eventstream (1.0.1)
aws-partitions (1.94.0)
aws-sdk-core (3.22.0)
aws-sdk-dynamodb (1.8.0)
aws-sigv4 (1.0.2)
bigdecimal (1.2.8)
binary_struct (2.1.0)
bundler (1.11.2)
celluloid (0.17.3)
celluloid-essentials (0.20.5)
celluloid-extras (0.20.5)
celluloid-fsm (0.20.5)
celluloid-io (0.17.3)
celluloid-pool (0.20.5)
celluloid-supervision (0.20.6)
concurrent-ruby (1.0.5)
contracts (0.16.0)
did_you_mean (1.0.0)
dry-configurable (0.7.0)
dry-container (0.6.0)
dry-core (0.4.7)
dry-equalizer (0.2.1)
dry-inflector (0.1.2)
dry-logic (0.4.2)
dry-types (0.13.2)
dry-validation (0.12.0)
faraday (0.15.2)
faraday_middleware (0.12.2)
gli (2.17.1)
hashie (3.5.7)
heapy (0.1.3)
hitimes (1.3.0)
httpclient (2.8.3)
i18n (1.0.1)
io-console (0.4.5)
jmespath (1.4.0)
json (2.1.0, 1.8.3)
minitest (5.11.3, 5.8.4)
molinillo (0.4.3)
multipart-post (2.0.0)
net-http-persistent (2.9.4)
net-telnet (0.1.1)
nio4r (2.3.1)
power_assert (0.2.7)
psych (2.0.17)
rake (10.5.0)
rdoc (4.2.1)
slack-ruby-bot (0.10.5)
slack-ruby-client (0.11.1)
sysrandom (1.0.5)
test-unit (3.1.7)
thor (0.20.0, 0.19.1)
thread_safe (0.3.6)
timers (4.1.2)
tss (0.5.0)
tzinfo (1.2.5)
websocket-driver (0.7.0)
websocket-extensions (0.1.3)
ztimer (0.6.0)

@kstole
Collaborator

kstole commented Jul 10, 2018

This sounds very similar to some issues I've experienced, although I haven't looked very far into them. I have SlackRubyBot running in AWS as well, and every so often the websocket disconnects but the bot stays running. So far I've just solved it by restarting the bot, but I'd really like to get to the bottom of this. There was also one time when the websocket appeared to stay connected (according to Slack) but the bot wasn't responding to requests, and when I checked the Docker container, it was using 100% CPU (although maybe this was a one-off).

@dblock
Collaborator

dblock commented Jul 10, 2018

Likely related: slack-ruby/slack-ruby-client#208

@grudzien
Author

grudzien commented Jul 10, 2018

I guess I should clarify my post. I stated that the web socket is not reconnecting; what I meant was that the bot is NOT disconnecting. I had thought it was a disconnect/reconnect issue, but that does not appear to be happening. It's just a linear memory leak. I am still going through #208 to see if there are similarities.

Edit: I have been tracking the source port number for the last day and a half, and it hasn't changed.

@dblock
Collaborator

dblock commented Jul 11, 2018

Oh, so you have a bot that's online just fine but leaking memory? That's not good :) I would find a way to dump what's allocated at two points in time, diff the two, and see which objects are leaking (it could be something in your code too).
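One way to dump the difference, sketched under the assumption that the leak shows up in Ruby objects at all: count live objects per class after a full GC, let the bot run, count again, and print the classes whose counts grew. `class_counts` and `diff_counts` are hypothetical helpers built only on `GC.start` and `ObjectSpace.each_object`. If the counts barely move, that is consistent with trait 2 in the original report and points back at native memory.

```ruby
# Hypothetical helpers: per-class live object counts and their difference.

# Run a full GC, then count live objects grouped by class.
def class_counts
  GC.start
  counts = Hash.new(0)
  ObjectSpace.each_object(Object) { |obj| counts[obj.class] += 1 }
  counts
end

# Classes whose live count grew between two snapshots, largest growth first.
def diff_counts(before, after, top: 10)
  (before.keys | after.keys)
    .map { |klass| [klass, after[klass] - before[klass]] }
    .select { |_, delta| delta > 0 }
    .sort_by { |_, delta| -delta }
    .first(top)
end

# Usage:
#   before = class_counts
#   ... let the bot idle for a few hours ...
#   after = class_counts
#   diff_counts(before, after).each { |klass, delta| puts "#{klass}: +#{delta}" }
```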

@dblock dblock added the bug? label Jul 11, 2018
@dblock
Collaborator

dblock commented Jul 11, 2018

I think https://stackoverflow.com/questions/20385767/finding-the-cause-of-a-memory-leak-in-ruby has pretty good information overall. I would aggressively call GC.start somewhere in the code/library and start dumping what's allocated to see a pattern.
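A sketch of that approach using the `objspace` extension that ships with Ruby: turn on allocation tracing, force a full GC, then group the surviving objects by the file and line that allocated them. Tracing slows the process noticeably, so this is meant for a debugging run rather than production. `start_allocation_tracing`, `stop_allocation_tracing` and `top_allocation_sites` are hypothetical helper names. For heavier analysis, `ObjectSpace.dump_all` writes the whole heap to a JSON file, which is the format the `heapy` gem in the gem list above reads.

```ruby
require 'objspace'

# Hypothetical helpers for finding where surviving objects were allocated.

# Record the allocating file/line for every object created from now on.
def start_allocation_tracing
  ObjectSpace.trace_object_allocations_start
end

def stop_allocation_tracing
  ObjectSpace.trace_object_allocations_stop
end

# Run a full GC, then count surviving traced objects per allocation site and
# return the `top` busiest sites as [["file:line", count], ...].
def top_allocation_sites(top: 10)
  GC.start
  # Snapshot first, so objects allocated while counting are not visited.
  objects = ObjectSpace.each_object.to_a
  sites = Hash.new(0)
  objects.each do |obj|
    file = ObjectSpace.allocation_sourcefile(obj)
    next unless file # allocated before tracing started
    sites["#{file}:#{ObjectSpace.allocation_sourceline(obj)}"] += 1
  end
  sites.sort_by { |_, count| -count }.first(top)
end

# Usage (debug run only):
#   start_allocation_tracing
#   ... let the bot idle through several health checks ...
#   top_allocation_sites.each { |site, count| puts "#{count}\t#{site}" }
#   stop_allocation_tracing
```

Taking the `each_object.to_a` snapshot before counting is a deliberate choice: the counting loop itself allocates strings, and iterating the live heap directly would also visit those new objects.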
