Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runaway memory usage on 0.15.x #1520

Closed
asottile opened this issue Apr 23, 2019 · 11 comments
Closed

runaway memory usage on 0.15.x #1520

asottile opened this issue Apr 23, 2019 · 11 comments
Milestone

Comments

@asottile
Copy link
Contributor

I've narrowed it down to this little script, haven't dug into it yet but the smoking gun is the new function compilation stuff

from werkzeug.routing import Map, Rule


def main():
    while True:
        Map([Rule('/a/<string:b>')])


if __name__ == '__main__':
    exit(main())
@davidism
Copy link
Member

@edk0

@davidism
Copy link
Member

Seems you're saying that rules aren't being GC'd correctly?

I'm not clear under what real circumstances this would happen. Typically you define a set of rules and then use them, they're not arbitrarily created and deleted.

@asottile
Copy link
Contributor Author

our usage here involves a set of redirects defined in a configuration file, this configuration file changes periodically as we add or remove vanity routes

instead of deploying the application each time we want to add a vanity route, we simply update the configuration file and the application rebuilds a route mapping (which is used by a flask app to serve redirects)

@davidism
Copy link
Member

Hmm, that's usually discouraged because changes to the map aren't synchronized in multiprocess workers. I'd probably implement it instead as an error handler for 404, to check if a redirect should be returned instead. Not saying it shouldn't be fixed, just that it's not a use case I'd heard of.

@asottile
Copy link
Contributor Author

each individual worker checks the configuration file periodically and reloads it -- it isn't being injected directly into the flask app as far as I can tell

either way, these objects probably shouldn't leak 😆 -- I'm looking into what could cause that -- I suspect either the functions being compiled have issue or the hashing of those (since the Map seems to also be involved in some way)

@asottile
Copy link
Contributor Author

Of note, this leak doesn't happen without the <string:b> portion

@asottile
Copy link
Contributor Author

here's the disassembly of the function objects it creates:

>>> x = Rule('/a/<string:b>')
>>> from werkzeug.routing import Map
>>> y = Map([x])
>>> x._build
<function <builder:'/a/<string:b>'> at 0x7f16d62d1730>
>>> import dis
>>> dis.dis(x._build)
  1           0 LOAD_CONST               0 ('')
              2 LOAD_CONST               1 ('/a/')
              4 LOAD_CONST               2 (<bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f16d2c9a0f0>>)
              6 LOAD_FAST                0 (b)
              8 CALL_FUNCTION            1
             10 BUILD_STRING             2
             12 BUILD_TUPLE              2
             14 RETURN_VALUE
>>> dis.dis(x._build_unknown)
  1           0 LOAD_CONST               0 ('')
              2 LOAD_CONST               1 ('/a/')
              4 LOAD_CONST               2 (<bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f16d2c9a0f0>>)
              6 LOAD_FAST                0 (b)
              8 CALL_FUNCTION            1
             10 LOAD_FAST                1 (.keyword_arguments)
             12 JUMP_IF_TRUE_OR_POP     20
             14 LOAD_CONST               0 ('')
             16 DUP_TOP
             18 JUMP_FORWARD            10 (to 30)
        >>   20 LOAD_CONST               3 (functools.partial(<function url_encode at 0x7f16d2d5d510>, charset='utf-8', sort=False, key=None))
             22 ROT_TWO
             24 CALL_FUNCTION            1
             26 LOAD_CONST               4 ('?')
             28 ROT_TWO
        >>   30 BUILD_STRING             4
             32 BUILD_TUPLE              2
             34 RETURN_VALUE

@asottile
Copy link
Contributor Author

Adjusting the script slightly:

import collections
import gc
import pprint
from werkzeug.routing import Map, Rule


def main():
    for _ in range(10000):
        Map([Rule('/a/<string:b>')])
    for _ in range(5):
        gc.collect()
    counts = collections.Counter(type(o) for o in gc.get_objects())
    pprint.pprint(counts.most_common(15))


if __name__ == '__main__':
    exit(main())

it looks like it is leaking (at the very least, probably more in the other common types above it as well) the Map, Rule, as well as a functools.partial and a UnicodeConverter per call:

$ ./venv/bin/python t.py
[(<class 'dict'>, 62085),
 (<class 'list'>, 50514),
 (<class 'function'>, 24085),
 (<class 'tuple'>, 21811),
 (<class 'method'>, 20032),
 (<class 'set'>, 10518),
 (<class 'functools.partial'>, 10002),
 (<class 'werkzeug.routing.UnicodeConverter'>, 10000),
 (<class 'werkzeug.routing.Map'>, 10000),
 (<class 'werkzeug.routing.Rule'>, 10000),
 (<class 'weakref'>, 1306),
 (<class 'wrapper_descriptor'>, 1131),
 (<class 'method_descriptor'>, 879),
 (<class 'builtin_function_or_method'>, 839),
 (<class 'getset_descriptor'>, 740)]

@asottile
Copy link
Contributor Author

here's some graphs of the things keeping this alive in gc:

def graph(obj, ids, *, seen=None, indent='', limit=10):
    if seen is None:
        seen = set()

    for referrer in gc.get_referrers(obj):
        # the main frame which has a hard reference to ths object
        if (
                type(referrer).__name__ == 'frame' and
                referrer.f_globals['__name__'] == '__main__'
        ):
            continue
        # objects only present due to traversal of gc referrers
        elif id(referrer) not in ids:
            continue
        elif id(referrer) in seen:
            print(f'{indent}(already seen) {id(referrer)}')
            continue

        seen.add(id(referrer))

        if indent == '':
            print('=' * 79)
        print(f'{indent}type: {type(referrer).__name__} ({id(referrer)})')
        fmted = repr(referrer)  #pprint.pformat(referrer)
        print(indent + fmted.replace('\n', f'\n{indent}'))

        if limit:
            graph(
                referrer, ids,
                seen=seen, indent='==' + indent, limit=limit - 1,
            )

...

    ids = {id(o) for o in gc.get_objects()}
    obj = next(iter(o for o in gc.get_objects() if isinstance(o, Map)))
    graph(obj, ids)
===============================================================================
type: dict (140272699432680)
{'map': Map([<Rule '/a/<b>' -> None>]), 'regex': '[^/]{1,}'}
==type: UnicodeConverter (140272699175656)
==<werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>
====type: method (140272750734216)
====<bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>>
======type: tuple (140272726348928)
======('', '/a/', <bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>>)
====type: method (140272749846664)
====<bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>>
======type: tuple (140272726372424)
======('', '/a/', <bound method BaseConverter.to_url of <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>>, functools.partial(<function url_encode at 0x7f93c877eae8>, charset='utf-8', sort=False, key=None), '?')
====type: dict (140272723298056)
===={'b': <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>}
======type: dict (140272749460432)
======{'rule': '/a/<string:b>', 'is_leaf': True, 'map': Map([<Rule '/a/<b>' -> None>]), 'strict_slashes': True, 'subdomain': '', 'host': None, 'defaults': None, 'build_only': False, 'alias': False, 'methods': None, 'endpoint': None, 'redirect_to': None, 'arguments': {'b'}, '_trace': [(False, '|'), (False, '/a/'), (True, 'b')], '_converters': {'b': <werkzeug.routing.UnicodeConverter object at 0x7f93c867f2e8>}, '_regex': re.compile('^\\|\\/a\\/(?P<b>[^/]{1,})$'), '_argument_weights': [100], '_static_weights': [(0, -1)], '_build': <function <builder:'/a/<string:b>'> at 0x7f93c9d880d0>, '_build_unknown': <function <builder:'/a/<string:b>'> at 0x7f93c9d88158>}
========type: Rule (140272749480984)
========<Rule '/a/<b>' -> None>
==========type: list (140272699123848)
==========[<Rule '/a/<b>' -> None>]
============type: dict (140272726280736)
============{'_rules': [<Rule '/a/<b>' -> None>], '_rules_by_endpoint': {None: [<Rule '/a/<b>' -> None>]}, '_remap': False, '_remap_lock': <unlocked _thread.lock object at 0x7f93cb6d9da0>, 'default_subdomain': '', 'charset': 'utf-8', 'encoding_errors': 'replace', 'strict_slashes': True, 'redirect_defaults': True, 'host_matching': False, 'converters': {'default': <class 'werkzeug.routing.UnicodeConverter'>, 'string': <class 'werkzeug.routing.UnicodeConverter'>, 'any': <class 'werkzeug.routing.AnyConverter'>, 'path': <class 'werkzeug.routing.PathConverter'>, 'int': <class 'werkzeug.routing.IntegerConverter'>, 'float': <class 'werkzeug.routing.FloatConverter'>, 'uuid': <class 'werkzeug.routing.UUIDConverter'>}, 'sort_parameters': False, 'sort_key': None}
==============type: Map (140272749480872)
==============Map([<Rule '/a/<b>' -> None>])
================(already seen) 140272699432680
================(already seen) 140272749460432
==========type: list (140272700110792)
==========[<Rule '/a/<b>' -> None>]
============type: dict (140272726280808)
============{None: [<Rule '/a/<b>' -> None>]}
==============(already seen) 140272726280736
(already seen) 140272749460432

@asottile
Copy link
Contributor Author

This is sufficient to break the cycle and eliminate the memory leak:

diff --git a/src/werkzeug/routing.py b/src/werkzeug/routing.py
index c7cff94d..8176ddfe 100644
--- a/src/werkzeug/routing.py
+++ b/src/werkzeug/routing.py
@@ -1254,13 +1254,13 @@ class BaseConverter(object):
     weight = 100
 
     def __init__(self, map):
-        self.map = map
+        self.charset = map.charset
 
     def to_python(self, value):
         return value
 
     def to_url(self, value):
-        return _fast_url_quote(text_type(value).encode(self.map.charset))
+        return _fast_url_quote(text_type(value).encode(self.charset))
 
 
 class UnicodeConverter(BaseConverter):

though I believe the actual cause of the cycle is that LOAD_CONST is being used in the code objects against objects which aren't themselves constants, I believe this confuses the gc (assuming that the referred code objects are live, when they are actually not)

@davidism davidism added this to the 0.15.3 milestone May 8, 2019
@asottile
Copy link
Contributor Author

asottile commented May 8, 2019

via #1524

@asottile asottile closed this as completed May 8, 2019
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants