simd: split cursor advancing from value matching #156

lucab · 2024-02-28T17:39:28Z

This refactors all SIMD modules in order to make the value-matching logic self-contained. Thus, all bytes-cursor manipulations are now grouped and performed once at the end, outside of SIMD logic.
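To illustrate the shape of the change, here is a minimal scalar sketch of the "match first, advance once" split. It is not the code from this PR: `Cursor`, `is_value_byte`, `match_value_len` and `skip_value` are hypothetical names, the byte classification is a simplified assumption, and an 8-byte chunked loop stands in for the real SIMD lanes.

```rust
/// Hypothetical byte cursor, used only for this sketch (not the crate's real type).
struct Cursor<'a> {
    buf: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn new(buf: &'a [u8]) -> Self {
        Cursor { buf, pos: 0 }
    }

    /// Bytes that have not been consumed yet.
    fn remaining(&self) -> &'a [u8] {
        &self.buf[self.pos..]
    }

    /// Move the cursor forward by `n` bytes, clamped to the end of the buffer.
    fn advance(&mut self, n: usize) {
        self.pos = (self.pos + n).min(self.buf.len());
    }
}

/// Simplified assumption of what counts as a header-value byte:
/// horizontal tab, or anything from 0x20 upwards except DEL.
fn is_value_byte(b: u8) -> bool {
    b == b'\t' || (b >= 0x20 && b != 0x7f)
}

/// Self-contained matcher: takes a plain slice and returns how many leading
/// bytes match. It never sees or mutates a cursor.
fn match_value_len(bytes: &[u8]) -> usize {
    let mut matched = 0;
    // Each 8-byte chunk plays the role of one SIMD register load.
    for chunk in bytes.chunks(8) {
        let n = chunk.iter().take_while(|&&b| is_value_byte(b)).count();
        matched += n;
        if n < chunk.len() {
            break; // first non-matching byte found inside this chunk
        }
    }
    matched
}

/// Caller side: run the matcher, then advance the cursor exactly once.
fn skip_value(cursor: &mut Cursor<'_>) -> usize {
    let n = match_value_len(cursor.remaining());
    cursor.advance(n);
    n
}
```

With this layout, calling `skip_value` on a cursor over `b"text/html\r\nHost: x"` leaves the cursor on the `\r`, and the matcher itself can be tested on plain slices with no cursor state involved.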

Performance impact on my Intel AVX2-capable workstation seems positive (differences below an arbitrary 20% threshold are treated as benchmark noise and filtered out).

lucab · 2024-03-06T08:36:02Z

@seanmonstar this is ready for a review pass, whenever you have time.

There is a minor cleanup bundled in this PR (marking several functions as pub(crate)), which I did in order to make sure I wasn't changing public APIs. I can split that into a dedicated PR if you prefer.

I'll be honest, I started doing this rework as part of hyperium/hyper#3574 before actually going for hyperium/hyper#3575, focused on memory usage/allocation patterns.
My goal was to (eventually) use the bytes crate in this library, but then I realized that the SIMD-related groundwork it required was actually providing performance improvements on its own. As such, I think it makes sense to land this already.

seanmonstar

Beautiful PR, and the speed boosts seem out of this world!

lucab · 2024-03-08T17:32:55Z

Thanks for merging this. Even though I recorded those perf numbers myself, I'm still somewhat puzzled and a bit skeptical about them.
I tried on a different machine (an AMD Ryzen 7) and this time I'm not seeing any measurable change (i.e. all differences are below 20%).
Both machines are workstation laptops, and I disabled every kind of turbo-boosting and CPU dynamic scaling I could think of.
So either the code change is reliably hitting something very specific on the Intel machine (better cache locality?), or this workstation is just not a very good environment for benchmark comparisons.

Overall, I think the new code is a useful refactor, but I personally can't guarantee that the pictured performance changes hold in all environments.

lucab force-pushed the ups/simd-remove-bytes-cursor branch from eea5c01 to 3aaac3a Compare February 28, 2024 17:41

lucab mentioned this pull request Feb 28, 2024

lib: fix import #157

Merged

lucab force-pushed the ups/simd-remove-bytes-cursor branch 5 times, most recently from 64b4de5 to 4ab2ffb Compare February 29, 2024 08:37

lucab marked this pull request as ready for review February 29, 2024 09:26

simd: split cursor advancing from value matching

a88052f

This refactors all SIMD modules in order to make the value-matching logic self-contained. Thus, all bytes-cursor manipulations are now grouped and performed once at the end, outside of SIMD logic.

lucab force-pushed the ups/simd-remove-bytes-cursor branch from 4ab2ffb to a88052f Compare March 5, 2024 07:24

seanmonstar approved these changes Mar 6, 2024

seanmonstar merged commit b2625f3 into seanmonstar:master Mar 6, 2024
34 checks passed

lucab deleted the ups/simd-remove-bytes-cursor branch March 8, 2024 17:33
