
Memory leak for vectors and matrices #112

Open
NathanDunfield opened this issue Jan 15, 2022 · 15 comments · May be fixed by #113

NathanDunfield commented Jan 15, 2022

With version 2.1.2, the following causes cypari2 to allocate memory at several GB per minute:

>>> import cypari2
>>> pari = cypari2.pari_instance.Pari()
>>> while True:
...     M = pari.matrix(10, 10, range(100))

Observed in Python 3.8 and 3.9 as part of SageMath 9.3-9.5 on both linux (gcc) and macOS (clang).

For what it's worth, the other cypari does not seem to have this problem.

Also filed at the SageMath trac.

videlec (Collaborator) commented Jan 15, 2022

Thanks for the report. I confirm the behaviour on my machine as well.

videlec added the bug label Jan 15, 2022
NathanDunfield (Author) commented

Interestingly, this leaks memory

v = pari.vector(10, range(10))

but this does not:

v = pari(range(10))

though they seem to result in the same PARI object.

videlec (Collaborator) commented Jan 16, 2022

Yes, these end up with the same PARI object, but the two functions allocate memory in different ways.

kliem (Collaborator) commented Jan 16, 2022

I'm a bit confused as to why ref_target is used here (for __setitem__ and in matrix).

We create the item, put it in the cache, increase the reference count, and then set the pari coefficient. But I don't know why we need to increase the reference count. The item is stored in the cache and will be around as long as the vector/matrix is around, or until the index gets set to something new. Either way it should be fine. The extra reference count added by ref_target seems to be exactly what creates the memory leak.
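The hypothesis above can be modeled in plain Python (a hypothetical sketch: Entry, Vector, _extra_refs, and setitem are illustrative names, not cypari2's actual internals; the point is only that one extra, never-released reference is enough to pin a cached item forever):

```python
import weakref

class Entry:
    """Stands in for a coefficient Gen (illustrative, not cypari2 code)."""

class Vector:
    def __init__(self):
        self.itemcache = {}        # index -> Entry; the cache described above
        self._extra_refs = []      # models an extra incref that is never undone

    def setitem(self, i, entry, extra_ref=False):
        self.itemcache[i] = entry  # the cache alone keeps the entry alive
        if extra_ref:
            self._extra_refs.append(entry)  # a reference nobody ever drops

v = Vector()
e = Entry()
alive = weakref.ref(e)
v.setitem(0, e)
del e
v.itemcache[0] = Entry()       # overwrite the slot: the old entry should die
print(alive() is None)         # True: collected once the cache lets go

e = Entry()
leaked = weakref.ref(e)
v.setitem(1, e, extra_ref=True)
del e
v.itemcache[1] = Entry()       # overwrite again...
print(leaked() is None)        # False: the extra reference pins it (the leak)
```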

videlec (Collaborator) commented Jan 16, 2022

Be careful that inside ref_target there is a call to fixGEN that moves the GEN outside of the pari stack (if it was still there).

kliem (Collaborator) commented Jan 16, 2022

Yes, instead of ref_target one should call fixGEN?

kliem linked a pull request Jan 21, 2022 that will close this issue
kliem (Collaborator) commented Jan 21, 2022

Ok, my previous idea was nonsense. However, the solution seems to be rather simple. The problem is that the matrix/vector is still on the stack when we do the item assignment. Assigning entries to a vector/matrix on the stack sounds like a bad idea.

Here is how I verified it (in Sage):

sage: cython('''
....: import cypari2
....: pari = cypari2.pari_instance.Pari()
....: from cypari2.gen cimport Gen
....: 
....: def foo():
....:     cdef Gen M = pari.matrix(10, 10)
....:     M.fixGEN()
....:     M[1,2] = 20
....: ''')

The above does not leak memory. However, when you comment out M.fixGEN() it leaks memory, because the matrix is still on the stack when you assign the integer.

tornaria (Contributor) commented

When I run make check I get

================================================================================
Testing cypari2.pari_instance
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 147988456 bytes on the PARI stack
  exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 238449424 bytes on the PARI stack
  exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 246420800 bytes on the PARI stack
  exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 251542152 bytes on the PARI stack
  exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 255781968 bytes on the PARI stack
  exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 197923384 bytes on the PARI stack
  exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 210963184 bytes on the PARI stack
  exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 215847936 bytes on the PARI stack
  exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 193350080 bytes on the PARI stack
  exec(compile(example.source, filename, "single",
/usr/lib/python3.10/doctest.py:1350: RuntimeWarning: cypari2 leaked 178484864 bytes on the PARI stack
  exec(compile(example.source, filename, "single",
================================================================================

Is this related?

culler commented Nov 12, 2022

I don't think this is a bug. It is not exactly a feature, but it is a consequence of something which is regarded as a feature. It is also not really a memory leak. If you watch the memory usage while this loop is running you will see it grow to about 75% of physical memory, then decay and stabilize at about 20% of physical memory. That is the python garbage collector at work.

After the last sync of cypari and cypari2, cypari2 changed its memory management scheme for Gen objects. In cypari 1.4.1 all GENs live on the pari stack. Pari has a notion of "cloning" GENs, which means copying a GEN to the heap, and "uncloning" heap-based GENs, which means freeing them. Cypari2 makes use of this and supports Gen objects whose GEN is either on the heap or on the stack. Intermediate values when evaluating an expression generally stay on the stack. But containers, like matrices, are moved to the heap as soon as an entry is assigned a value. And when the stack gets to be more than half full, all GENs on the stack are moved to the heap. This is a way of dealing with Pari's suggested strategy of cleaning, or "repiling", the stack during a long computation.

Essentially, cypari2 has delegated its management of GENs to the python garbage collector, at least for the heap based GENs. Since this loop involves creating lots of matrices which are immediately destroyed, it generates a very large number of heap based Gens whose GENs must be deallocated by python's garbage collector. The garbage collector allows the memory footprint to grow quite large before becoming more aggressive and taming the growth.

It is worth noting that the stack design provides much more efficient deallocation than could ever be possible with a reference counting garbage collection scheme. If you have a few million dead matrices on the stack you can deallocate all of them in one machine instruction just by setting the stack pointer before the first one. But a reference counting garbage collector will have to call free() millions of times in order to deallocate them. However, since every GEN is wrapped by a Gen object, which must be deallocated by the python garbage collector, it is not really possible for cypari(2) to take full advantage of this efficiency.
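The stack-pointer argument can be mimicked in plain Python (an analogy only: GenModel is a made-up class, and CPython still frees each object individually under the hood, but the bulk-drop semantics are the same):

```python
import weakref

class GenModel:
    """Stands in for a GEN living on the stack (made-up class, analogy only)."""

# A list models the PARI stack; in PARI the "deallocation" below would be a
# single stack-pointer assignment rather than per-object frees.
stack = [GenModel() for _ in range(1000)]
refs = [weakref.ref(g) for g in stack]

del stack[10:]   # "move the stack pointer" back past 990 dead objects at once

dead = sum(r() is None for r in refs)
print(dead)      # 990
```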

culler mentioned this issue Nov 13, 2022
culler commented Nov 14, 2022

I retract my statement that this is not a bug. By adding some print statements in the Gen __dealloc__ method I was able to demonstrate that there are lots of situations in which Gens do not get deallocated, even though there should be no object which is holding a reference. This is not limited to matrices. Here is a sample run demonstrating this:

Normal Python object behavior:

>>> class Dumb():
...     def __del__(self):
...         print("Destroying a dumb object.")
...
>>> x = Dumb()
>>> x = Dumb()
Destroying a dumb object.
>>> x = Dumb()
Destroying a dumb object.
>>> x = Dumb()
Destroying a dumb object.

Pari Gen object behavior:

>>> x = pari(1234567879012345678901234567890)
Destroying a dumb object.
>>> x = pari(1234567879012345678901234567890)
>>> x = None
Destroying gen on stack.
Destroying gen on stack.
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = None
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = None
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = pari(1234567879012345678901234567890)
>>> x = None
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.

For some reason (which is probably important to understand) the case of a pari Gen created from a python list is different. It behaves normally.

>>> from cypari import pari
>>> x = pari(range(3))
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
>>> x = pari(range(3))
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on heap.
>>> x = pari(range(3))
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on heap.
>>> x = pari(range(3))
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on stack.
Destroying gen on heap.

culler commented Nov 14, 2022

An addendum. While the refcount behavior above looks wrong for the pari int Gens, replacing

    M = pari.matrix(10, 10, range(100))

with

    x = pari(12345678901234567890)

in the original loop does not cause the same kind of memory issues.

culler commented Nov 16, 2022

If you think through what happens when you repeatedly call

    M = pari.matrix(10, 10, range(100))

the behavior we are seeing is exactly what you would expect.

At the beginning of each iteration the bottom of the pari stack is a matrix with a refcount of 1, held by the variable M. (Actually held by the namespace dict and assigned to the name M.) When the assignment expression is evaluated two things happen, presumably atomically.

First a new matrix Gen is created and assigned to stackbottom. The matrix Gen whose GEN is immediately above the bottom now has refcount 2. One reference is held by the variable M and one by the new matrix Gen on the bottom, whose next object is now set to the matrix immediately above the bottom.

Second, the matrix Gen on the bottom is assigned to the variable M. This sets its refcount to 1, and reduces the refcount of the matrix above from 2 to 1. The steady state is a long chain of matrix Gens whose GENs are on the pari stack. Each Gen in the chain has refcount 1, held by the Gen whose GEN is directly below its GEN on the pari stack. Eventually the GENs belonging to the Gens in this chain fill the pari stack to half of its maximum size. At that point all GENs on the pari stack are moved to the heap (i.e. cloned). When a Gen in the long chain is moved to the heap its refcount is reduced to 0, so it gets destroyed immediately after it is moved to the heap.
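The chain described above can be modeled in plain Python (a sketch with made-up names; this Gen class only mimics the .next link described in this thread, it is not cypari2's real class):

```python
import weakref

class Gen:
    """Minimal model of a stack-based Gen holding a .next link (made-up class)."""
    def __init__(self, next_gen=None):
        self.next = next_gen   # the Gen whose GEN sits directly above ours

stackbottom = None
refs = []
M = None
for _ in range(5):
    # Each new matrix lands at the bottom and points at the previous bottom.
    stackbottom = Gen(next_gen=stackbottom)
    refs.append(weakref.ref(stackbottom))
    M = stackbottom   # rebinding M drops one reference, but .next keeps the chain

alive = sum(r() is not None for r in refs)
print(alive)          # 5: every Gen is still pinned through the .next links

M = None
stackbottom = None    # releasing the bottom unwinds the whole chain
print(sum(r() is None for r in refs))   # 5
```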

The docstring for pari_instance.stacksizemax says:

    Return the maximum size of the PARI stack, which is determined
    at startup in terms of available memory.

If that is correct it also explains why memory grows under this loop to a (large) fixed fraction of total memory but never reaches OOM.

It is not clear to me how this should be fixed. One option might be to always create new Gens on the heap instead of on the stack. However, pari will always create a new GEN on the pari stack, so this would mean immediately cloning the GEN when a new Gen is created. That costs the time required for the copy. I assume that the tactic of leaving GENs on the stack as long as possible was intended to increase performance by avoiding these copies. That performance increase would be lost. On the other hand, in this particular context, the copy is happening anyway; it just gets delayed until the stack is half full.

Another option might be to do away with the heap altogether. That is probably not a good idea.

culler commented Nov 16, 2022

If what I said above is correct, it is definitely not the whole story since it does not account for the fact that many Gens are moved to the heap immediately.
Also, changing the threshold for moving all Gens to the heap has no effect on the memory usage.

culler commented Nov 16, 2022

I can now finish the story. The discussion above explains why the loop
>>> while True:
...     M = pari.matrix(10, 10, range(100))
generates many matrices on the pari stack. The other thing to explain is why having lots of matrices on the stack uses huge amounts of memory. The reason is that the pari.matrix method creates Gens for all of the matrix entries and puts them on the heap. It also caches the entries in its itemcache dict. That means that the entries cannot be destroyed until the matrix is destroyed. But a stack-based Gen, other than stackbottom, cannot be destroyed because it is referenced as Gen.next by some other Gen on the stack. So the matrices and their entries persist until the stack gets so large that the stack-based Gens get moved to the heap. Note also that there are, in this case, 100 times as many entries as matrices.
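That multiplication effect can be modeled in plain Python (made-up names again; EntryGen and MatrixGen only mimic the itemcache plus the .next chain described above, not cypari2's actual classes):

```python
import weakref

class EntryGen:
    """Stands in for a heap-based entry Gen (made-up class)."""

class MatrixGen:
    """Stands in for a matrix Gen with an itemcache of entries (made-up class)."""
    def __init__(self, n, next_gen=None):
        self.itemcache = {i: EntryGen() for i in range(n)}  # 1 entry per coefficient
        self.next = next_gen   # the chain that pins stack-based Gens

chain = None
entry_refs = []
for _ in range(10):
    chain = MatrixGen(100, next_gen=chain)   # like M = pari.matrix(10, 10, ...)
    entry_refs.extend(weakref.ref(e) for e in chain.itemcache.values())

pinned = sum(r() is not None for r in entry_refs)
print(pinned)    # 1000: 100 cached entries per matrix, 10 matrices

chain = None     # release the chain: matrices and their entries all go
print(sum(r() is None for r in entry_refs))   # 1000
```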

culler commented Nov 16, 2022

Oh, and the fix is to call A.fixGEN() as soon as a matrix A is created.


5 participants