
Possible Memory Leak when Re-opening Notebooks #8267

Closed
Morgan243 opened this issue Apr 7, 2015 · 20 comments

@Morgan243

I've been having trouble keeping an IPython notebook server up and running: it eventually uses up most of the 4 GB of system memory, to the point where attempting to launch a notebook fails with 'OSError: [Errno 12] Cannot allocate memory'.

The notebook kernels themselves never use much memory, though I'm displaying some figures (using display in order to avoid %matplotlib inline because of #7270) and printing some output. I could understand the server caching this output in case I opened the notebook again, but it appears to allocate substantial amounts of memory on each launch of the notebook. The real problem, of course, is that the memory usage never appears to decrease: I can leave all notebooks off for days and it never goes down.

Below are some memory-usage samples I just took. Each entry, except for the initial one, was taken after starting and then stopping the notebook. I didn't run anything in the kernel after starting the notebooks.

The "Larger Notebook" has several images displayed along with the outputs of larger print statements. It's pretty easy to see a rapid increase in memory as I start and restart this notebook.

The "Smaller notebook" only has some output from relatively small print statments. There are no images and overall there aren't many cells. However, you can still see the memory usage increasing for the notebook server.

Larger Notebook
Size(KB),RSS(KB),PSS(KB)
8164,8140,8140
215812,215636,215636
383544,383368,383368
479368,479192,479192
652220,652048,652048

Smaller Notebook
Size(KB),RSS(KB),PSS(KB)
8620,8536,8536
10564,10328,10328
10564,10332,10332
10564,10512,10512
11384,11236,11236

I'm running this on RHEL 6 with the latest stable Python 3 (server) and Python 2 (kernel), both compiled from source. I tried the master branch of IPython last week, but I still had this issue.

Any insight on this? Is there something I should be doing differently in order to keep an IPython notebook server up long-term?

@filmor
Contributor

filmor commented Jun 17, 2015

I'm having similar problems with Python 3.4 in both kernel and server, IPython version 3.1, on Windows Server 2008. Here it's running on 64-bit, so it doesn't die at 4 GiB and is currently taking > 6 GiB of RAM. Is there any way to debug this?

Here it seems like every save operation consumes memory that is not released afterwards.

@Carreau added this to the 4.0 milestone Jun 17, 2015
@takluyver
Member

@minrk , you keep servers running for long periods, have you seen anything like this?

It sounds like something is keeping a reference to the notebook models that are being saved. I can't see anything in our own code that's doing that, but it's possible that something in the way we're using tornado's async magic is keeping references somehow.

@minrk
Member

minrk commented Jun 17, 2015

I haven't seen this, no. It's possible the leak would have been in pyzmq or libzmq. @filmor @Morgan243 what version of pyzmq/zeromq are you running?

@filmor
Contributor

filmor commented Jun 18, 2015

I'm running the version that's distributed by Anaconda, currently pyzmq 14.6.0. The libzmq version in use is 4.0.5. The revision log at https://raw.githubusercontent.com/zeromq/zeromq4-x/master/NEWS mentions a memory leak in PUB and PUSH sockets; are those used?

Interestingly, a few hours after I commented here, the memory usage sharply dropped from 7 to 1 GiB (which still seems like a lot for the notebook server, IMO).

@takluyver Yes, I looked into this in particular. From a bit of crude memory debugging with pympler, it looks like each save of a nearly empty notebook adds 6280 lists and the same number of strings, adding up to about 1 MiB of RAM usage even after garbage collection. I'm not sure whether I'm doing it right, though; I'll have a closer look.
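
A rough sketch of that kind of pympler measurement (using the tracker module's SummaryTracker; not necessarily the exact script used here) looks something like this:

```python
from pympler import tracker

tr = tracker.SummaryTracker()
tr.print_diff()   # establish a baseline summary of live objects

# ... trigger a notebook save through the UI or the contents API ...

tr.print_diff()   # prints object types allocated (and not freed) since the baseline
```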

@Morgan243
Author

I was running pyzmq 14.5.0 when I first opened this issue, but it looks like 14.6.0 is available. I'll upgrade today and see if I can reproduce the issue.

@takluyver
Member

Saving a nearly empty notebook adds 6k lists/strings? I was assuming it was keeping references to the notebook model somehow, but if that figure is accurate, something else must be going on, because a nearly empty notebook wouldn't have that many data structures.

We do use a PUB socket for publishing output data, but that's not involved in saving notebooks at all. And the PUB side is in the kernel, not the server.

If the memory use suddenly dropped by multiple GB without you doing anything specific, that sounds like at least part of the problem is related to reference cycles not getting garbage collected. If you can still reproduce the issue in a newly started server, try adding a post_save_hook function (config docs) that calls gc.collect(). If that makes a significant difference, it confirms that reference cycles not being collected is at least part of the issue.
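
A minimal sketch of such a hook, assuming the IPython 3.x FileContentsManager.post_save_hook config option (the file path and log message are illustrative):

```python
# in ipython_notebook_config.py
import gc

def post_save(model, os_path, contents_manager):
    """Run a full garbage collection pass after every notebook save."""
    unreachable = gc.collect()
    contents_manager.log.info("post-save gc.collect(): %d unreachable objects", unreachable)

c.FileContentsManager.post_save_hook = post_save
```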

But if it only went down to 1GB, that sounds like garbage collection is not the only problem involved.

@Morgan243
Author

I updated pyzmq with pip3 to 14.6.0 and started a couple of notebooks this morning, with a few restarts throughout the day and plenty of saves. Below are some samples. Like before, the memory usage keeps increasing. I'll shut off all notebooks and leave the server running to see if I get a drop like @filmor did.

Size(KB),RSS(KB),PSS(KB)
9452,9260,9260
11936,11852,11852
12200,12112,12112
29996,29756,29756
55964,55756,55756
86820,86728,86728
93216,93008,93008

@filmor
Contributor

filmor commented Jun 19, 2015

@takluyver I'll try that. I did consider that reference cycles were a problem, but there are not that many __del__s in the IPython codebase.

@Morgan243 To be clear, that drop happened during full production use. It could have been that people went to lunch, but they rarely shut down notebooks explicitly.

@Morgan243
Author

@filmor That's interesting. I just figured that if the notebook server's memory was ever going to drop, it would be after all notebooks were halted.

The server this morning, with no notebooks open, is now using 112240 KB total memory, up from the 93216 KB when I left it yesterday afternoon.

@takluyver I've added the post-save hook to perform gc.collect(). I'll run it like this today and see if anything changes.

@takluyver
Member

Reference cycles can occur without __del__ methods - the significance of __del__ is that before Python 3.4, reference cycles containing objects with __del__ could not be collected. There were some changes to garbage collection in 3.4 which mean that now those can be collected.

The sudden drop in use that you saw made me think that a big chunk of the problem is reference cycles which can be cleaned up, but for some reason are not being. Python runs garbage collection based on a count of (allocations - deallocations), not a timer, so I wonder if that counter is not ticking up quickly enough, and it's taking a long time to actually run gc.
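
To see where those counters stand on a running server, the gc module exposes them directly (a quick check, nothing IPython-specific):

```python
import gc

print(gc.get_threshold())  # collection thresholds per generation, e.g. (700, 10, 10)
print(gc.get_count())      # current allocations-minus-deallocations counts per generation
print(gc.collect())        # force a full collection; returns the number of unreachable objects found
```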

You both appear to be running the server on Python 3.4 (assuming @Morgan243's "latest stable Python3" is that). I suspect some of the changes in memory management it introduced may be involved. We had a problem before with reference cycles being created, but I forget the details. @minrk might remember better.

@Morgan243
Author

@takluyver You are correct, I'm currently running 3.4.3 from back in March. Let me know if there is a later version you would like me to try.

So far, the addition of gc.collect() as a post-save hook doesn't seem to have had any positive impact. I went ahead and logged the count of "unreachable" objects returned by collect(); it fluctuates, but it doesn't appear to be ever-increasing like the memory usage.

@takluyver
Member

OK, that suggests that my hypothesis that garbage collection was just getting delayed is wrong.

It may still be that there are reference cycles being created which can't be cleaned up by garbage collection. I think that happens if an object in the cycle has a tp_del slot, the C-level equivalent of the __del__ method. @minrk , didn't something like this come up with zmq sockets?

@minrk
Member

minrk commented Jun 19, 2015

Yes it did, but I'm struggling to remember the details. If this happens on open/save, it doesn't sound like zmq to me, though. We can verify that with extra rigor by just hitting the contents API directly, which won't trigger creating a kernel.
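
Something along these lines would exercise the contents API without any kernel involved (a sketch only; the URL, notebook name, and lack of authentication are assumptions about a default local IPython 3 setup):

```python
# Repeatedly fetch a notebook through the contents API and watch the
# server's RSS between requests; no kernel is started by these calls.
import requests

BASE = "http://localhost:8888"   # assumed: local server without a password
PATH = "Untitled.ipynb"          # assumed: an existing notebook at the server root

for i in range(100):
    r = requests.get("{0}/api/contents/{1}".format(BASE, PATH))
    r.raise_for_status()
```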

I'll see if I can reproduce any of these issues on my machines.

@minrk
Member

minrk commented Jun 29, 2015

I have been able to reproduce this, and it appears to be related to gc changes in Python 3. I see unbounded memory growth on Python 3.4 each time I open or save a notebook (no kernels running), but not on Python 2.7. I'll try to find out who's holding a reference to the notebook. What seems especially rough is that when I save a 50MB notebook, I see ~250MB of memory growth.

cc @epifanio, who I believe is experiencing the same issue.

@minrk
Member

minrk commented Jun 29, 2015

After poking around, it appears to be in the validation. Hopefully it's something simple that we are doing wrong, but it could be a problem in jsonschema.

@minrk
Member

minrk commented Jun 29, 2015

I've reproduced this outside of the notebook code and reported it upstream: python-jsonschema/jsonschema#237. Shortly after opening that issue, I discovered that my jsonschema wasn't the latest (it was 2.4.0). After updating to 2.5.1, the memory leak is gone.

@epifanio, @Morgan243, and @filmor: can you report your version of jsonschema? If it's older than 2.5.1, can you upgrade with pip and see if you still see the same issue? I believe 2.4 is the latest packaged in conda.
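
One quick way to check the installed version from the same environment the server runs in (then pip install --upgrade jsonschema if it's older than 2.5.1):

```python
import jsonschema
print(jsonschema.__version__)   # anything below 2.5.1 is affected, per the comments above
```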

@filmor
Contributor

filmor commented Jun 30, 2015

I'll update the package today and will let you know tomorrow, when the server has been in use for a while. Awesome catch :)

@filmor
Contributor

filmor commented Jul 1, 2015

That seems to have done the trick here. Memory usage is now stable at around 50 MiB. Thank you very much for tracking this down. What's the way forward here? A warning when using an older version of jsonschema?

By the way, I can reproduce this issue with Python 3.3, so it's not likely to have anything to do with the changes to garbage collection in Python 3.4.

@Morgan243
Author

I'm currently running jsonschema 2.4 as well; I'll upgrade now and get back with results later today or early tomorrow.

@minrk
Member

minrk commented Jul 1, 2015

@filmor I was actually able to reproduce it on 2.7, so you are right, it doesn't seem to have to do with gc changes.
