
Memory leak with %matplotlib inline #7270

Open
evyasonov opened this issue Dec 19, 2014 · 23 comments

Comments

@evyasonov

Hey everyone,

I've found a problem. Run the code below and watch the memory usage, then delete "%matplotlib inline" and run it again.

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker

%matplotlib inline

import os
import sys
import StringIO
import urllib, base64

from matplotlib import rcParams

rcParams['figure.figsize'] = (24, 6)
rcParams['figure.dpi'] = 150

OUTPUT_FILENAME = "Asd"

def printHTML(html):
    with open(OUTPUT_FILENAME, "a") as outputFile:
        outputFile.write(html if type(html) == str else html.encode('utf8'))

def friendlyPlot():

    figure = plt.Figure()  # note: unused; subplot2grid below draws on the current pyplot figure
    ax = plt.subplot2grid((1,2), (0,0))

    ax.plot( range(1000), range(1000) )


    #plt.show() 
    fig = plt.gcf()

    imgdata = StringIO.StringIO()
    fig.savefig(imgdata, format='png')
    imgdata.seek(0)  # rewind the data
    image = imgdata.getvalue().encode('base64').replace('\n', '')  # getvalue() is the public StringIO API (Python 2)
    printHTML('<img src="data:image/png;base64,{0}" /><br />'.format(image))
    plt.close('all')
    imgdata.close()

open(OUTPUT_FILENAME, 'w').close()

for i in range(500):
    friendlyPlot()
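For readers on Python 3, where the StringIO module and str.encode('base64') no longer exist, the in-memory embedding step would use io.BytesIO and base64.b64encode instead. A minimal sketch (png_bytes_to_data_uri is a hypothetical helper name; in the real script the bytes come from fig.savefig):

```python
import base64
import io

def png_bytes_to_data_uri(png_bytes):
    # Encode raw PNG bytes as a base64 data URI for inline HTML embedding.
    encoded = base64.b64encode(png_bytes).decode('ascii')
    return '<img src="data:image/png;base64,{0}" /><br />'.format(encoded)

# In the real script the bytes would come from the figure:
#   buf = io.BytesIO()
#   fig.savefig(buf, format='png')
#   html = png_bytes_to_data_uri(buf.getvalue())
```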
@evyasonov evyasonov changed the title Memory leak Memory leak with %matplotlib inline Dec 19, 2014
@ellisonbg ellisonbg added this to the 4.0 milestone Jan 11, 2015
@minrk minrk modified the milestones: 4.1, 4.0 Jul 11, 2015
@den-run-ai

I hit this bug as well. Is there any way to get inline plots without memory leaks? I don't want to launch a separate process for each plot, since the arrays are quite large.

@takluyver
Member

Can you check this when memory usage increases:

len(IPython.kernel.zmq.pylab.backend_inline.show._to_draw)

That's a list where figures are being stored. They should be there only temporarily, but maybe they're building up without getting cleared.

@den-run-ai

len(IPython.kernel.zmq.pylab.backend_inline.show._to_draw)=0

BTW, I'm plotting using .plot() method on pandas dataframes.

@takluyver
Member

OK, so much for that theory.

It's possible pandas also keeps some plot-related data alive internally. The original report doesn't involve pandas, though.

How much memory does each additional plot appear to add?

@den-run-ai

OK, this seems to be my case: I was using pandas 0.16.0, and the issue is fixed in master:

pandas-dev/pandas#9814

@takluyver
Member

Great, thanks. Leaving open since the original report didn't involve pandas.

@tacaswell
Contributor

This can be reproduced more simply:

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker

%matplotlib inline

import os
import sys
import StringIO
import urllib, base64

from matplotlib import rcParams

rcParams['figure.figsize'] = (24, 6)
rcParams['figure.dpi'] = 150



def friendlyPlot():
    fig, ax = plt.subplots()
    ax.plot(range(1000))
    fig.savefig('tmp.png')
    plt.close('all')


for i in range(500):
    friendlyPlot()

This second version, using the agg backend directly instead of %matplotlib inline, does not leak memory, so it is something on the IPython side, not the pyplot side (I think).

import matplotlib
matplotlib.use('agg')
import matplotlib.pyplot as plt
import matplotlib.ticker



import os
import sys
import StringIO
import urllib, base64

from matplotlib import rcParams

rcParams['figure.figsize'] = (24, 6)
rcParams['figure.dpi'] = 150



def friendlyPlot():
    fig, ax = plt.subplots()
    ax.plot(range(1000))
    fig.savefig('tmp.png')
    plt.close('all')


for i in range(500):
    friendlyPlot()

@asteppke
Contributor

@tacaswell With your test code, IPython on Windows 7 consumes approximately 1.7 GB here, which is not freed afterwards. Running with a slightly higher number of iterations leads to a memory error. So this is still an issue.

@tacaswell
Contributor

@asteppke The first or second block?

@asteppke
Contributor

@tacaswell With your first test code (%matplotlib inline), memory consumption goes up to 1.7 GB. In contrast, when using the second piece (matplotlib.use('agg')), memory usage only oscillates between 50 MB and 100 MB.

Both tests are executed with Python 3.4 and IPython notebook version 4.0.5.

@takluyver
Member

I've played with this a bit more. I notice that if I re-run the for loop in @tacaswell's example a few times, memory usage doesn't increase; it seems to be the number of figures you create in a single cell that matters. IPython certainly keeps a list of all the figures generated in a cell for the inline backend, but that list is definitely cleared after the cell runs, and clearing it doesn't make memory usage drop, even after calling gc.collect().

Could our code be interacting badly with something in matplotlib? I thought _pylab_helpers.Gcf looked likely, but it doesn't seem to be holding on to anything.

I tried grabbing a reference to one of the figures and calling gc.get_referrers() on it; apart from the reference I had in user_ns, all the others looked like mpl objects, and presumably many of them are in reference loops. Which object is something else most likely to be inappropriately holding a reference to?
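The gc.get_referrers() probe described above can be sketched on a plain object (Leaky and holder are assumed stand-ins for a Figure and a hidden container, not IPython internals):

```python
import gc

class Leaky:
    """Stand-in for a matplotlib Figure we suspect is being kept alive."""
    pass

holder = []          # simulates a hidden container holding a reference
obj = Leaky()
holder.append(obj)

# Every container still referencing obj shows up here; filtering to lists
# weeds out the interpreter frames doing the inspection itself.
referrers = [r for r in gc.get_referrers(obj) if isinstance(r, list)]
# holder appears in referrers, identifying what keeps obj alive.
```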

@takluyver
Member

I'm dropping this to milestone 'wishlist'. We want to fix it, but at the moment we're not sure how to make further progress in identifying the bug, and I don't think it's worth holding up releases for it.

Anyone who can make progress gets brownie points. Also cake.

@takluyver takluyver modified the milestones: 4.1, wishlist Jan 25, 2016
@lucasb-eyer

Not really progress, but the memory seems to be lost somewhere inside the kernel. Calling gc.collect() after or inside the loop doesn't help, summary.print_(summary.summarize(muppy.get_objects())) doesn't find any of the leaked memory, and setting all the _N and _iN output-cache variables to None doesn't help either. It's really mysterious.

@takluyver
Member

I also wondered if it was creating uncollectable objects, but those should end up in gc.garbage when there are no other references to them, and gc.garbage is still empty when I see the process using up loads of RAM.

I think someone who knows about these things is going to have to use C-level tools to track down what memory is not getting freed. There's no evidence of extra Python objects being kept alive anywhere we can find.
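For anyone wanting to rule out uncollectable cycles themselves, gc.DEBUG_SAVEALL makes the collector keep everything it would have freed in gc.garbage so it can be inspected afterwards (a generic sketch, not specific to the IPython internals):

```python
import gc

# With DEBUG_SAVEALL, every object found unreachable is saved in
# gc.garbage instead of being freed, so leaked cycles can be examined.
gc.set_debug(gc.DEBUG_SAVEALL)

class Node:
    pass

a, b = Node(), Node()
a.partner, b.partner = b, a   # create a reference cycle
del a, b
gc.collect()

cycle_objects = [o for o in gc.garbage if isinstance(o, Node)]

# Reset so normal collection resumes and saved objects can be freed.
gc.set_debug(0)
del gc.garbage[:]
```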

@thenomemac

I'll second that a fix on this issue would be appreciated.

@takluyver
Member

We know, but at present no one has worked out the cause of the bug.

@daidoji

daidoji commented Oct 24, 2016

+1

@akapocsi

+1

@den-run-ai

BTW, I'm still hitting this issue from time to time on the latest matplotlib, pandas, jupyter, and ipython. If anyone knows a debugger that can help troubleshoot this multi-process communication, please let me know.

@akapocsi

Could it perhaps have anything to do with the browser cache mechanism?

@takluyver
Member

Good thought, but I don't think so. It's IPython's process taking up extra memory, not the browser, and @tacaswell's reproduction doesn't involve sending plots to the browser.

@lucasb-eyer

Hi, I believe I have found part of the culprit and a way to significantly, but not completely, reduce this problem!

After scrolling through the ipykernel/pylab/backend_inline.py code, I got the hunch that interactive mode does a lot of storing of "plot-things", though I don't understand it completely, so I am not able to pinpoint the exact reason with certainty.

Here is the code to verify this (based on @tacaswell's snippet above), useful for anyone trying to implement a fix.

Initialization:

import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker

%matplotlib inline

matplotlib.rcParams['figure.figsize'] = (24, 6)
matplotlib.rcParams['figure.dpi'] = 150

from resource import getrusage
from resource import RUSAGE_SELF

def friendlyPlot():
    fig, ax = plt.subplots()
    ax.plot(range(1000))
    fig.savefig('tmp.png')
    plt.close('all')

Actual test:

print("before any:  {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))
friendlyPlot()
print("before loop: {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))
for i in range(50):
    friendlyPlot()
print("after loop:  {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))
import gc ; gc.collect(2)
print("after gc:    {:7d} kB".format(getrusage(RUSAGE_SELF).ru_maxrss))
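One portability caveat for the measurement above: ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, so the same snippet prints very different numbers across platforms. A normalising helper might look like this (max_rss_kb is an assumed name):

```python
import resource
import sys

def max_rss_kb():
    # ru_maxrss is kilobytes on Linux but bytes on macOS (Darwin);
    # normalise to kB so results are comparable across platforms.
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss // 1024 if sys.platform == 'darwin' else rss
```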

Running it for 50 iterations of the loop, I get:

before any:    87708 kB
before loop:  106772 kB
after loop:   786668 kB
after gc:     786668 kB

Running it for 200 iterations of the loop, I get:

before any:    87708 kB
before loop:  100492 kB
after loop:  2824316 kB
after gc:    2824540 kB

which shows the almost linear increase in memory with iterations.

Now to the fix/workaround: call matplotlib.interactive(False) before the test-snippet, and then run it.

With 50 iterations:

before any:    87048 kB
before loop:  104992 kB
after loop:   241604 kB
after gc:     241604 kB

And with 200 iterations:

before any:    87536 kB
before loop:  103104 kB
after loop:   239276 kB
after gc:     239276 kB

Which confirms that only a constant increase (independent of iterations) is left.

Using these numbers, I make a rough estimate of the leak size per iteration (in kB):

(786668-(241604 - 104992))/50   = 13001.12
(2824316-(241604 - 104992))/200 = 13438.52

And for a single iteration of the loop I get 13560 kB. Note that the leak per iteration (~13 MB) is roughly the size of one uncompressed RGBA buffer for these figures (24×6 inches at 150 dpi is 3600×900 pixels; at 4 bytes per pixel that is about 12.96 MB), and far larger than the png-compressed size (54 KB).

Also, strangely, running a small-scale test (only a few iterations) repeatedly in the same cell without restarting the kernel gives much less consistent numbers; I have not been able to explain this or find a pattern.

I hope someone with more knowledge of the internals can take it from here, as I lack the time and knowledge to dive deeper into it right now.
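The matplotlib.interactive(False) workaround can be packaged as a small context manager so it only applies around heavy plotting loops (a sketch; non_interactive is an assumed helper name, and Agg is forced here only so the snippet runs headless):

```python
import contextlib

import matplotlib
matplotlib.use('Agg')  # headless backend so this sketch runs anywhere

@contextlib.contextmanager
def non_interactive():
    # Temporarily disable interactive mode, restoring the prior state,
    # so figures generated inside the block are not retained for display.
    was_interactive = matplotlib.is_interactive()
    matplotlib.interactive(False)
    try:
        yield
    finally:
        matplotlib.interactive(was_interactive)

# Usage in a notebook cell that generates many figures:
# with non_interactive():
#     for i in range(500):
#         friendlyPlot()
```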

@fedral

fedral commented Aug 18, 2018

it works
