-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use CPython hash algorithm for frozenset #5164
Conversation
The original hash algorithm just XOR'd all the hashes of the elements of the set, which is problematic. The CPython algorithm is required to pass the tests. - Replace `PyFrozenSet::hash` with CPython's algorithm - Remove unused `hash_iter_unorded` functions - Add `frozenset` benchmark - Enable tests - Lower performance expectations on effectiveness test - Adjust `slot::hash_wrapper` so that it doesn't rehash the computed hash value in the process of converting PyInt to PyHash.
Co-authored-by: fanninpm <luxverdans@sbcglobal.net>
# The original test has: | ||
# for n in range(18): | ||
# Due to general performance overhead, hashing a frozenset takes | ||
# about 50 times longer than in CPython. This test amplifies that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it take this long time even with release build?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, as I understand cargo bench
always uses a release build.
These are some benchmarks from the original hash function (the new algorithm slows things down 10%-20%):
CPython
microbenchmarks/cpython/frozenset.py
time: [464.42 ns 465.94 ns 467.92 ns]
thrpt: [213.71 Melem/s 214.62 Melem/s 215.32 Melem/s]
microbenchmarks/cpython/frozenset.py #2
time: [468.92 ns 470.99 ns 473.70 ns]
thrpt: [633.31 Melem/s 636.96 Melem/s 639.76 Melem/s]
microbenchmarks/cpython/frozenset.py #3
time: [1.5223 µs 1.5513 µs 1.5795 µs]
thrpt: [316.57 Melem/s 322.30 Melem/s 328.45 Melem/s]
microbenchmarks/cpython/frozenset.py #4
time: [1.7608 µs 1.7734 µs 1.7854 µs]
thrpt: [392.06 Melem/s 394.73 Melem/s 397.56 Melem/s]
microbenchmarks/cpython/frozenset.py #5
time: [2.0057 µs 2.0986 µs 2.1882 µs]
thrpt: [411.30 Melem/s 428.87 Melem/s 448.72 Melem/s]
RustPython
microbenchmarks/rustpython/frozenset.py
time: [4.6804 µs 4.6883 µs 4.6969 µs]
thrpt: [21.290 Melem/s 21.330 Melem/s 21.366 Melem/s]
microbenchmarks/rustpython/frozenset.py #2
time: [13.022 µs 13.064 µs 13.106 µs]
thrpt: [22.891 Melem/s 22.963 Melem/s 23.039 Melem/s]
microbenchmarks/rustpython/frozenset.py #3
time: [22.971 µs 23.087 µs 23.184 µs]
thrpt: [21.567 Melem/s 21.657 Melem/s 21.767 Melem/s]
microbenchmarks/rustpython/frozenset.py #4
time: [31.847 µs 31.905 µs 31.968 µs]
thrpt: [21.897 Melem/s 21.940 Melem/s 21.980 Melem/s]
microbenchmarks/rustpython/frozenset.py #5
time: [42.031 µs 42.118 µs 42.197 µs]
thrpt: [21.329 Melem/s 21.368 Melem/s 21.413 Melem/s]
(this is why I fixed the benchmarks first lol)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is weird that this test just doesn't finish though. Even two orders of magnitude slower should still be within the realm of possibility. This test does some truly bizarre things with power sets that I assume magnify the performance disparity exponentially.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! 👍
I wish we can find out what makes it that slow |
#5167 for posterity |
The original hash algorithm just XOR'd all the hashes of the elements of the set, which can be problematic. The CPython algorithm is required to pass the tests.
PyFrozenSet::hash
with CPython's algorithmhash_iter_unorded
functionsfrozenset
benchmarkslot::hash_wrapper
so that it doesn't rehash the computed hash value in the process of converting PyInt to PyHash