Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ChainMap.__contains__ and .get performance improvement. #118932

Closed
dgrigonis opened this issue May 10, 2024 · 10 comments
Closed

ChainMap.__contains__ and .get performance improvement. #118932

dgrigonis opened this issue May 10, 2024 · 10 comments
Assignees
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir

Comments

@dgrigonis
Copy link
Contributor

dgrigonis commented May 10, 2024

Feature or enhancement

Proposal:

import collections as coll


class ChainMap2(coll.ChainMap):
    def __contains__(self, key):
        for mapping in self.maps:
            if key in mapping:
                return True
        return False

    def get(self, key, default=None):
        for mapping in self.maps:
            if key in mapping:
                return mapping[key]
        return default


maps = [dict(a=1), dict(a=2, b=2), dict(a=3, b=2, c=3)]
cm = coll.ChainMap(*maps)
cm2 = ChainMap2(*maps)


%timeit 'a' in cm               # 615 ns
%timeit 'c' in cm               # 752 ns
%timeit cm.get('a')             # 776 ns
%timeit cm.get('c')             # 1.46 µs

%timeit 'a' in cm2              # 140 ns
%timeit 'c' in cm2              # 198 ns
%timeit cm2.get('a')            # 147 ns
%timeit cm2.get('c')            # 208 ns

Has this already been discussed elsewhere?

I have already discussed this feature proposal on Discourse

Links to previous discussion of this feature:

https://discuss.python.org/t/collections-chainmap-get-performance/41925

Linked PRs

@dgrigonis dgrigonis added the type-feature A feature request or enhancement label May 10, 2024
@pochmann3
Copy link
Contributor

I'm curious: how long does cm2['c'] take?

@dgrigonis
Copy link
Contributor Author

621 ns
Can't think of a way to improve it as it needs to be sensitive to __missing__ implementations.

@rhettinger rhettinger added performance Performance or resource usage and removed type-feature A feature request or enhancement labels May 11, 2024
@rhettinger rhettinger self-assigned this May 11, 2024
@rhettinger
Copy link
Contributor

The change to __contains__ is a good idea. It's a bummer that any() with a generator expression does not perform well.

The get method should be left as-is. The edit alters the semantics so that a override of __contains__ in a ChainMap subclass would be bypassed.

@dgrigonis
Copy link
Contributor Author

Makes sense, can't think of any way to keep that and not to iterate twice.

At least updating __contains__ will boost performance of get too.

Should I issue PR or leave it to you?

@nineteendo
Copy link
Contributor

How much slower is this? https://stackoverflow.com/a/44803103

any(True for m in self.maps if key in m)

@dgrigonis
Copy link
Contributor Author

How much slower is this? https://stackoverflow.com/a/44803103

any(True for m in self.maps if key in m)

Slightly faster than the original:

 %timeit 'a' in cm2    # 589 ns
%timeit 'c' in cm2   # 671 ns

@nineteendo
Copy link
Contributor

... but slower than an actual for loop.

@dgrigonis
Copy link
Contributor Author

dgrigonis commented May 11, 2024

%timeit any(map(opr.call, map(opr.attrgetter(' __contains__'), cm2.maps), itl.cycle([key])))
%timeit any(map(opr.methodcaller('__contains__', key), cm2.maps))

786 ns & 500 ns

@pochmann3
Copy link
Contributor

pochmann3 commented May 11, 2024

any(map(opr.call, map(opr.attrgetter(' __contains__'), cm2.maps), itl.cycle([key])))

Looks like a complicated version of:

any(map(opr.contains, cm2.maps, itl.repeat(key)))

@dgrigonis
Copy link
Contributor Author

dgrigonis commented May 11, 2024

Looks like a complicated version of:

any(map(opr.contains, cm2.maps, itl.repeat(key)))

Which is the best I have seen without loop:
In [16]: %timeit 'a' in cm2 # 360 ns
In [17]: %timeit 'c' in cm2 # 424 ns

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage stdlib Python modules in the Lib dir
Projects
None yet
Development

No branches or pull requests

5 participants