Use CPython hash algorithm for frozenset #5164

dchiquito · 2024-02-10T02:10:15Z

The original hash algorithm just XOR'd all the hashes of the elements of the set, which can be problematic. The CPython algorithm is required to pass the tests.

Replace PyFrozenSet::hash with CPython's algorithm
Remove unused hash_iter_unorded functions
Add frozenset benchmark
Enable tests
Lower performance expectations on effectiveness test
Adjust slot::hash_wrapper so that it doesn't rehash the computed hash value in the process of converting PyInt to PyHash

The original hash algorithm just XOR'd all the hashes of the elements of the set, which is problematic. The CPython algorithm is required to pass the tests. - Replace `PyFrozenSet::hash` with CPython's algorithm - Remove unused `hash_iter_unorded` functions - Add `frozenset` benchmark - Enable tests - Lower performance expectations on effectiveness test - Adjust `slot::hash_wrapper` so that it doesn't rehash the computed hash value in the process of converting PyInt to PyHash.

Lib/test/test_set.py

Co-authored-by: fanninpm <luxverdans@sbcglobal.net>

youknowone · 2024-02-10T02:51:40Z

Lib/test/test_set.py

+        # The original test has:
+        # for n in range(18):
+        # Due to general performance overhead, hashing a frozenset takes
+        # about 50 times longer than in CPython. This test amplifies that


Does it take this long time even with release build?

yep, as I understand cargo bench always uses a release build.

These are some benchmarks from the original hash function (the new algorithm slows things down 10%-20%):

CPython

microbenchmarks/cpython/frozenset.py time: [464.42 ns 465.94 ns 467.92 ns] thrpt: [213.71 Melem/s 214.62 Melem/s 215.32 Melem/s] microbenchmarks/cpython/frozenset.py #2 time: [468.92 ns 470.99 ns 473.70 ns] thrpt: [633.31 Melem/s 636.96 Melem/s 639.76 Melem/s] microbenchmarks/cpython/frozenset.py #3 time: [1.5223 µs 1.5513 µs 1.5795 µs] thrpt: [316.57 Melem/s 322.30 Melem/s 328.45 Melem/s] microbenchmarks/cpython/frozenset.py #4 time: [1.7608 µs 1.7734 µs 1.7854 µs] thrpt: [392.06 Melem/s 394.73 Melem/s 397.56 Melem/s] microbenchmarks/cpython/frozenset.py #5 time: [2.0057 µs 2.0986 µs 2.1882 µs] thrpt: [411.30 Melem/s 428.87 Melem/s 448.72 Melem/s]

RustPython

microbenchmarks/rustpython/frozenset.py time: [4.6804 µs 4.6883 µs 4.6969 µs] thrpt: [21.290 Melem/s 21.330 Melem/s 21.366 Melem/s] microbenchmarks/rustpython/frozenset.py #2 time: [13.022 µs 13.064 µs 13.106 µs] thrpt: [22.891 Melem/s 22.963 Melem/s 23.039 Melem/s] microbenchmarks/rustpython/frozenset.py #3 time: [22.971 µs 23.087 µs 23.184 µs] thrpt: [21.567 Melem/s 21.657 Melem/s 21.767 Melem/s] microbenchmarks/rustpython/frozenset.py #4 time: [31.847 µs 31.905 µs 31.968 µs] thrpt: [21.897 Melem/s 21.940 Melem/s 21.980 Melem/s] microbenchmarks/rustpython/frozenset.py #5 time: [42.031 µs 42.118 µs 42.197 µs] thrpt: [21.329 Melem/s 21.368 Melem/s 21.413 Melem/s]

(this is why I fixed the benchmarks first lol)

It is weird that this test just doesn't finish though. Even two orders of magnitude slower should still be within the realm of possibility. This test does some truly bizarre things with power sets that I assume magnify the performance disparity exponentially.

youknowone

Thank you! 👍

youknowone · 2024-02-10T05:55:36Z

I wish we can find out what makes it that slow

dchiquito · 2024-02-10T13:51:39Z

#5167 for posterity

fanninpm reviewed Feb 10, 2024

View reviewed changes

Lib/test/test_set.py Outdated Show resolved Hide resolved

Use correct TODO syntax

074d228

Co-authored-by: fanninpm <luxverdans@sbcglobal.net>

youknowone reviewed Feb 10, 2024

View reviewed changes

youknowone approved these changes Feb 10, 2024

View reviewed changes

youknowone merged commit 1c93c16 into RustPython:main Feb 10, 2024
9 of 11 checks passed

dchiquito deleted the frozenset-hash branch February 10, 2024 13:51

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use CPython hash algorithm for frozenset #5164

Use CPython hash algorithm for frozenset #5164

dchiquito commented Feb 10, 2024

youknowone Feb 10, 2024

dchiquito Feb 10, 2024

dchiquito Feb 10, 2024

youknowone left a comment

youknowone commented Feb 10, 2024

dchiquito commented Feb 10, 2024

Use CPython hash algorithm for frozenset #5164

Use CPython hash algorithm for frozenset #5164

Conversation

dchiquito commented Feb 10, 2024

youknowone Feb 10, 2024

Choose a reason for hiding this comment

dchiquito Feb 10, 2024

Choose a reason for hiding this comment

dchiquito Feb 10, 2024

Choose a reason for hiding this comment

youknowone left a comment

Choose a reason for hiding this comment

youknowone commented Feb 10, 2024

dchiquito commented Feb 10, 2024