
Memory leak when calling list(...) on OrderedDict #1718

Open
spkl opened this issue Jul 31, 2023 · 2 comments

@spkl

spkl commented Jul 31, 2023

Description

There is a memory leak when calling the list function with an OrderedDict. Even after the runtime has been shut down, objects of type PythonGetMemberBinder, PythonType, FunctionDefinition, FunctionCode etc. remain referenced and will not be garbage collected.

Steps to Reproduce

(Repro project available here: https://github.com/spkl/IronPythonMemoryRepro)

  1. In a .NET 6 program, create a ScriptEngine and ScriptScope.
  2. Create a ScriptSource from a file containing the following code:
import collections

def MyMethod(dummy):
    od = collections.OrderedDict()
    for i in range(100):
        od[i] = []
    
    for v in list(od.values()):
        continue

    return dummy
  3. Execute the ScriptSource.
  4. Call the MyMethod function.
  5. Shut down the engine.
  6. Repeat this 50 times (see the hosting sketch below).
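
For illustration, a minimal C# hosting loop along these lines might look as follows. This is only a sketch: the file name script.py and the dummy argument value are assumptions, and the exact code is in the linked repro project.

// Sketch of the repro loop; "script.py" and the argument passed to MyMethod
// are assumptions -- see the linked repro project for the exact code.
using System;
using IronPython.Hosting;
using Microsoft.Scripting.Hosting;

class Program
{
    static void Main()
    {
        for (int i = 0; i < 50; i++)
        {
            ScriptEngine engine = Python.CreateEngine();   // step 1
            ScriptScope scope = engine.CreateScope();

            ScriptSource source =
                engine.CreateScriptSourceFromFile("script.py"); // step 2
            source.Execute(scope);                         // step 3

            object myMethod = scope.GetVariable("MyMethod");
            engine.Operations.Invoke(myMethod, 0);         // step 4

            engine.Runtime.Shutdown();                     // step 5
        }

        // Even after forcing a collection, the footprint stays inflated.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        Console.WriteLine($"Working set: {Environment.WorkingSet / (1024 * 1024)} MB");
    }
}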

Expected behavior: The memory footprint is stable. (For reference: after the first execution, it is ~50 MB).

Actual behavior: The memory footprint exceeds 300 MB after 50 repetitions. Forcing a garbage collection does not change it significantly. (With a larger, more complex example, we have observed ~600 MB of additional, uncollectable memory after 30 script executions.)

Workaround: Instead of for v in list(od.values()):, use for v in [x for x in od.values()]: or for v in od.values():.

Versions

The issue is reproducible with both 3.4.0 and 3.4.1.

@BCSharp
Member

BCSharp commented Aug 1, 2023

Thanks for the well-prepared report. I was able to reproduce the problem using the latest development version.

@BCSharp
Member

BCSharp commented Aug 11, 2023

The underlying cause of this problem is that the conversion of an object (here an OrderedDict) to an enumerable (here, in the list constructor) happens in the default code context. After a successful conversion (here: a resolved call to __iter__), the call site dispatch table gets a rule for that specific object type (OrderedDict), so that subsequent iterations over objects of the same type are resolved faster.

After the runtime shutdown, the dynamic type for OrderedDict no longer exists; however, the corresponding rule in the call site of the default context still references that type. A new runtime instance will create a new dynamic type for OrderedDict, which is a different type instance from the one created in the previous execution. The default context is static and lives until the end of the hosting process, so those zombie types from previous executions linger around.
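
To illustrate the mechanism, here is a toy C# model (all names are invented; this is not IronPython's actual call-site code): a process-lifetime rule cache keyed on per-runtime type objects keeps every such type alive.

// Toy model only -- invented names, not IronPython's real machinery. It shows
// the shape of the problem: a static (process-lifetime) cache of dispatch rules
// keyed on type objects that are recreated by every runtime instance.
using System;
using System.Collections.Generic;

class FakePythonType
{
    public string Name = "OrderedDict";
    public byte[] Ballast = new byte[1_000_000]; // stand-in for a large dynamic type
}

static class DefaultContextCallSite
{
    // Static rule cache, like call sites bound to the default context: it survives
    // every runtime shutdown for the lifetime of the hosting process.
    public static readonly Dictionary<FakePythonType, Func<object, object>> Rules = new();
}

class Demo
{
    static void Main()
    {
        for (int run = 0; run < 50; run++)
        {
            // Each "runtime" creates a fresh type object for OrderedDict...
            var orderedDict = new FakePythonType();
            // ...and the shared call site learns a rule that references it.
            DefaultContextCallSite.Rules[orderedDict] = o => o;
            // The runtime is then "shut down", but the rule still pins the type.
        }

        GC.Collect();
        Console.WriteLine($"Rules (and zombie types) retained: {DefaultContextCallSite.Rules.Count}");
    }
}

If the cache has a cap on the number of entries, old rules are eventually evicted, which matches the plateau described below.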

I have not seen the code responsible for the call site cache management, but from reading the DLR docs I vaguely remember that the call site cache is limited (in number of entries, not memory size). Therefore those stale dispatch rules will eventually get replaced by newer ones, so technically it is not a memory leak, more like memory swelling. Indeed, if I run the provided example, say, 500 times, the memory footprint reaches a sort of plateau around 2 GB.

Unfortunately, OrderedDict is a large class, so the effect of the swelling is very visible. Preferably, the default context would not be used for dispatching on dynamic types in any of the dynamic calls/conversions; conversion to an enumerable is just one example. However, using the actual context may not be the best solution either: each module has its own context, so in my assessment, executing everything in the actual context would increase memory usage and slow down execution, as modules could not share learned dispatch rules. So perhaps having zombie dynamic types is a reasonable price to pay for the performance gain, especially if those types are small. This may be a common case for most types implementing some part of the Python protocol (like a type with __index__), but enumerable types (i.e. types with __iter__ or __getitem__) tend to be large.

Ideally, those dynamic calls would be done in a runtime-scope context. Unfortunately, IronPython does not have a runtime-scope context; the choice is either the static global (default) context or the actual (module) context. Perhaps the context of the builtin module could serve as the runtime-scope context, but implementing this across the whole codebase would be a lot of effort.

BCSharp added this to the 3.4.2 milestone on Nov 28, 2023