
Memory leak when calling list(...) on OrderedDict #1718

Open
spkl opened this issue Jul 31, 2023 · 2 comments

@spkl

spkl commented Jul 31, 2023

Description

There is a memory leak when calling the list function with an OrderedDict. Even after the runtime has been shut down, objects of type PythonGetMemberBinder, PythonType, FunctionDefinition, FunctionCode etc. remain referenced and will not be garbage collected.

Steps to Reproduce

(Repro project available here: https://github.com/spkl/IronPythonMemoryRepro)

  1. In a .NET 6 program, create a ScriptEngine and ScriptScope.
  2. Create a ScriptSource from a file containing the following code:
import collections

def MyMethod(dummy):
    od = collections.OrderedDict()
    for i in range(100):
        od[i] = []
    
    for v in list(od.values()):
        continue

    return dummy
  3. Execute the ScriptSource.
  4. Call the MyMethod function.
  5. Shut down the engine.
  6. Repeat this 50 times (see the hosting sketch below).
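
For illustration, a minimal C# hosting loop along these lines might look as follows. This is only a sketch: the file name script.py and the dummy argument value are assumptions, and the exact code is in the linked repro project.

// Sketch of the repro loop; "script.py" and the argument passed to MyMethod
// are assumptions -- see the linked repro project for the exact code.
using System;
using IronPython.Hosting;
using Microsoft.Scripting.Hosting;

class Program
{
    static void Main()
    {
        for (int i = 0; i < 50; i++)
        {
            ScriptEngine engine = Python.CreateEngine();   // step 1
            ScriptScope scope = engine.CreateScope();

            ScriptSource source =
                engine.CreateScriptSourceFromFile("script.py"); // step 2
            source.Execute(scope);                         // step 3

            object myMethod = scope.GetVariable("MyMethod");
            engine.Operations.Invoke(myMethod, 0);         // step 4

            engine.Runtime.Shutdown();                     // step 5
        }

        // Even after forcing a collection, the footprint stays inflated.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Console.WriteLine($"Working set: {Environment.WorkingSet / (1024 * 1024)} MB");
    }
}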

Expected behavior: The memory footprint is stable. (For reference: after the first execution, it is ~50 MB).

Actual behavior: The memory footprint exceeds 300 MB after 50 repetitions. Forcing a garbage collection does not change it significantly. (With a larger, more complex example, we have observed ~600 MB of additional, uncollectable memory after 30 script executions.)

Workaround: Instead of for v in list(od.values()):, use for v in [x for x in od.values()]: or for v in od.values():.

Versions

The issue is reproducible with both 3.4.0 and 3.4.1.

@BCSharp
Member

BCSharp commented Aug 1, 2023

Thanks for the well-prepared report. I was able to reproduce the problem using the latest development version.

@BCSharp
Member

BCSharp commented Aug 11, 2023

The underlying cause of this problem is that the conversion of an object (here an OrderedDict) to an enumerable (here, in the list constructor) happens in the default code context. After a successful conversion (here: a resolved call to __iter__), the call site dispatch table gets a rule for that specific object type (OrderedDict), so that subsequent iterations over objects of the same type are resolved faster.

After the runtime shutdown, the dynamic type for OrderedDict no longer exists; however, the corresponding rule in the call site of the default context still references that type. A new runtime instance will create a new dynamic type for OrderedDict, which is a different type instance from the one created in the previous execution. The default context is static and lives until the end of the hosting process, so those zombie types from previous executions linger around.
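
To illustrate the mechanism, here is a toy C# model (all names are invented; this is not IronPython's actual call-site code): a process-lifetime rule cache keyed on per-runtime type objects keeps every such type alive.

// Toy model only -- invented names, not IronPython's real machinery. It shows
// the shape of the problem: a static (process-lifetime) cache of dispatch rules
// keyed on type objects that are recreated by every runtime instance.
using System;
using System.Collections.Generic;

class FakePythonType
{
    public string Name = "OrderedDict";
    public byte[] Ballast = new byte[1_000_000]; // stand-in for a large dynamic type
}

static class DefaultContextCallSite
{
    // Static rule cache, like call sites bound to the default context: it survives
    // every runtime shutdown for the lifetime of the hosting process.
    public static readonly Dictionary<FakePythonType, Func<object, object>> Rules = new();
}

class Demo
{
    static void Main()
    {
        for (int run = 0; run < 50; run++)
        {
            // Each "runtime" creates a fresh type object for OrderedDict...
            var orderedDict = new FakePythonType();
            // ...and the shared call site learns a rule that references it.
            DefaultContextCallSite.Rules[orderedDict] = o => o;
            // The runtime is then "shut down", but the rule still pins the type.
        }

        GC.Collect();
        Console.WriteLine($"Rules (and zombie types) retained: {DefaultContextCallSite.Rules.Count}");
    }
}

If the cache has a cap on the number of entries, old rules are eventually evicted, which matches the plateau described below.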

I have not seen the code responsible for the call site cache management, but from reading the DLR docs I vaguely remember that the call site cache is limited (in number of entries, not memory size). Therefore those stale dispatch rules will eventually get replaced by newer ones, so technically it is not a memory leak, more like memory swelling. Indeed, if I run the provided example, say, 500 times, the memory footprint reaches a sort of plateau around 2 GB.

Unfortunately, OrderedDict is a large class, so the effect of the swelling is very visible. Preferably, the default context would not be used for dispatching on dynamic types in any of the dynamic calls/conversions; conversion to an enumerable is just one example. However, using the actual context may not be the best solution either: each module has its own context, so in my assessment, executing everything in the actual context would increase memory usage and slow down execution, as modules could not share learned dispatch rules. So perhaps having zombie dynamic types is a reasonable price to pay for the performance gain, especially if those types are small. This may be a common case for most types implementing some part of the Python protocol (like a type with __index__), but enumerable types (i.e. types with __iter__ or __getitem__) tend to be large.

Ideally, those dynamic calls would be done in a runtime-scope context. Unfortunately, IronPython does not have a runtime-scope context; the choice is either the static global (default) context or the actual (module) context. Perhaps the context of the builtin module could serve as the runtime-scope context, but implementing this across the whole codebase would be a lot of effort.

BCSharp added this to the 3.4.2 milestone on Nov 28, 2023