Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak in 2.5.0 #218

Open
bklang opened this issue Mar 24, 2014 · 19 comments
Open

Memory leak in 2.5.0 #218

bklang opened this issue Mar 24, 2014 · 19 comments

Comments

@bklang
Copy link
Member

bklang commented Mar 24, 2014

This is a continuation of the issues described in #217 as that PR was closed when the branch was merged.

Ping @sfgeorge, @benlangfeld.

Last comment:

From @sfgeorge:

Yes, the situation is significantly better by more than an order of magnitude! Under JRuby, I was able to run almost 40,000 calls (as opposed to just 1,700). This is a huge leap, but I'm hoping that we can do even better.
Using afb3c6b, and the same test setup, I warmed up to 8cps and things stayed stable for over an hour.
At first Full GCs were reclaiming a lot of memory, occurring every ~70 seconds
But after almost 40,000 calls, Full GCs occurred at least once per second.
After ~40,000 call failures started to occur.
The complete gc log is here
I can bump Gemfile[.lock] on the repo to reflect that I've been testing Punchblock's new feature/dtmf_recognizer_leak, would that be helpful?

From @bklang:

No, that's not necessary. Just the ObjectSpace information and any results you find by bisecting between the working and broken version (with the backported fix for the big leak) are the most important things at this point.

Example app: https://github.com/sfgeorge/punchblock-issue-memleak

@benlangfeld
Copy link
Member

After running through 50k calls on Ruby 2.1.1 on both Asterisk and FreeSWITCH, I see memory usage remain fairly stable. @sfgeorge, please provide confirmation of your test results on both MRI and JRuby.

@sfgeorge
Copy link
Member

@benlangfeld Was that using afb3c6b? I definitely hit trouble after 40,000 calls on that w/ JRuby as mentioned here.
If you have a new version for me to test out, let me know.

@benlangfeld
Copy link
Member

That was using the HEAD of develop. My previous request stands :)

@benlangfeld
Copy link
Member

Basically what I'd like is you to update everything documented in your repo with HEAD of develop as of now, including all results, please :)

@benlangfeld
Copy link
Member

Also note @sfgeorge, that you might see performance improvements by upgrading to JRuby 1.7.11 which provides a thread pool for Fibers, drastically reducing Thread allocations. JRuby 1.7.4 is quite old now. Additionally, MRI 2.1.1 will provide a huge performance boost over 1.9.3. Don't be scared of interpreter upgrades! ;)

@sfgeorge
Copy link
Member

Running these tests takes a while, I don't know if it's worth it to re-run at this point. I say that because the only change to Punchblock's between afb3c6b..develop was to the Guardfile, which won't impact the results of this test.

Upgrading JRuby to 1.7.11 is a worthy change for performance reasons, but I do not believe that that is the cause of this memory leak. Do you agree?

@benlangfeld
Copy link
Member

I do not believe it is the cause of the memory leak.

I cannot commit time to chase a memory leak that I cannot reproduce easily as I stated above. I'm looking for you to produce results similar to the ones you provided initially, only after the fix to DTMF recognizers (be it ec30430 or develop), on both MRI and JRuby in order to either support or more fully refute my claim that there is no longer a leak.

In the absence of such suitable documentation, I'm going to have to close the issue as resolved based on the results of the testing I have been able to perform in the time I have available.

@sfgeorge
Copy link
Member

Ben, I don't understand. I mentioned before that I have already tested the DTMF recognizers fix in a version of Punchblock that is functionally equivalent to ec30430, and included a gc log that shows the slow but steady growth of memory during that test.

Can you clarify what the goal is of re-testing here?

  • A. There is something more recent than afb3c6b to test that may improve/change results of the test. If so, please let me know what that is.
  • B. You want more types of logs than I provided before. If so, please specify what those are.

Please don't close this issue. 😄

@benlangfeld
Copy link
Member

I tested only on CRuby (2.1.1, to be exact). I did not experience any noticeable memory growth over 50k calls, testing twice. My tests were not exhaustive, and I may have missed something, but they gave me reasonable confidence that there is not a serious problem running on CRuby. I would like you to test sufficiently that you are confident that I am right, or able to present a solid enough case that I am wrong that I can justify taking time to look into it again.

As for JRuby, I did not test. I would like us to come to some agreement on the presence of the problem on CRuby first, to rule out this being some issue in JRuby itself. I am not interested in JRuby until we can say we're leak-free on CRuby. If we can say that, but JRuby exhibits problems, I would only be interested in testing on the latest JRuby release.

@sfgeorge
Copy link
Member

Ok, cool. I'll give this a go on C-Ruby 2.1.1 and JRuby 1.7.11 with ec30430 and share results.

@sfgeorge
Copy link
Member

sfgeorge commented Apr 3, 2014

Ok, I gave Punchblock 2.5.1 a run several times on ruby-2.1.0 via rvm (had trouble compiling ruby-2.1.1). There's definitely slow, but steady memory growth on the resident image size.

Taking a gross measurement starting after the first 10,000 calls, it seems that there an average growth of ~270 Bytes per call.

Memory growth vs. calls on Punchblock 2.5.1

Here's graphs & stats I collected when running against Punchblock 2.5.1..

Since it was CRuby I didn't need to prime it, but drew up slowly anyway so I could more easily measure from the "baseline" footprint as a starting point. Here was the velocity of the test:

  • minute 0: 1 call per minute
  • minute 1: 12 calls per minute
  • minute 2-N: 8 calls per second

@benlangfeld
Copy link
Member

Thanks for that comprehensive detail, @sfgeorge. I'll review it as soon as I get a chance, probably next week.

@sfgeorge
Copy link
Member

sfgeorge commented Apr 3, 2014

Sounds good, thanks Ben.

@chewi
Copy link
Contributor

chewi commented Apr 17, 2014

I've seen growing RAM usage but I chalked it up to a leaky Ruby 2.1 rather than Adhearsion. Could be wrong. Or could be both?

@chewi
Copy link
Contributor

chewi commented Apr 17, 2014

You may have seen this, it's been widely talked about.

http://www.omniref.com/blog/blog/2014/03/27/ruby-garbage-collection-still-not-ready-for-production/

But I am not denying that there is an issue in Punchblock, just saying that Ruby 2.1 may not be helping either.

@Jared-Prime
Copy link
Contributor

I take back my comment, since @sfgeorge did test against 2.1. I'm not aware of any JRuby 1.7.11 performance numbers.

@sfgeorge
Copy link
Member

@Jared-Prime

I take back my comment, since @sfgeorge did test against 2.1. I'm not aware of any JRuby 1.7.11 performance numbers.

Thanks, you are correct.. did that test against 2.1.0.

Previous test using the same code on JRuby 1.7.4 shows results that are dramatically worse - dying before even 40,000 calls with a 256MB heap.

Stopped and did not run a test on JRuby 1.7.11, because Ben said he'd like to see results on CRuby first.

@chewi

You may have seen this, it's been widely talked about.

http://www.omniref.com/blog/blog/2014/03/27/ruby-garbage-collection-still-not-ready-for-production/

But I am not denying that there is an issue in Punchblock, just saying that Ruby 2.1 may not be helping either.

Thanks, James. Yes, that article from Tim Robertson been a helpful article. Reviewed that a couple weeks ago.

<TL;DR>

A few grievances I have with this article:

  • Despite the article being published on Mar 27th, 2014, he never cites which version of "Ruby 2.1" he is testing on. Odd, seeing as how Ruby 2.1.1 was released on Feb 24, 2014. He should have known that he was being ambiguous.
  • He is citing the talk that Koichi gave in December before the 2.1.0 release. And he repeatedly knocks on flaws in memory management that Koichi highlighted and stated are addressed in the initial release of Ruby 2.1.0. In more than one instance, out-of-context quotes are used to pin-up Koichi as if the flaws he's highlighting have not been addressed.

Mis-quote Number 1

...Run this for long enough, and you’ll see your Ruby process consume more and more memory, in ever-wider swings (I let it gobble up over 2GB on my system before I killed it). But don’t take my word for it: ask Koichi Sasada, the author of the generational GC in Ruby 2.1:

“Some ‘short-lived’ young objects will be promoted to ‘old-gen’ accidentally….if such ‘short-lived’ objects consume huge memory we need to free such objects.”

Reading the full talk actually reveals that Koichi introduced the following mechanism with 2.1.0 to address that very problem:

New generational GC algorithm allows mixing "Write-barrier protected objects" and "WB unprotected objects"

(from Page 30 of Koichi's talk).

Mis-quote Number 2

Tim also partially-quotes Koichi on this:

also:

“We cannot measure how oldgen objects consume [memory collectively]….a few oldgen objects can grab huge memory”

(from page 83 of Koichi's talk)

But Tim failed to reference Koichi's existing solution to the problem, mentioned on the next page:

"We implemented this feature and evaluating it now"

(from page 84 of Koichi's talk)

Furthermore, on page 85... the laundry list of remaining tasks that Koichi stated was needed "Future work For smarter object management" in Ruby beyond 2.1.0 shows that Koichi's remaining concerns around RGenGC revolve around performance, not leaks. That doesn't prove that there aren't leaks, of course, just saying that this talk isn't the "everything's borked" exposé that it's made out to be.

</TL;DR>

Sorry. 😊

@chewi
Copy link
Contributor

chewi commented Apr 17, 2014

I am aware that much of the article was debunked in this other article but this one did acknowledge that there is still an issue, which will be fixed in 2.1.2.

http://samsaffron.com/archive/2014/04/08/ruby-2-1-garbage-collection-ready-for-production

@sfgeorge
Copy link
Member

Thanks, I did not see that article. Good to know that there's the malloc_limit bug that is fixed in Ruby Head.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants