Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

simd: split cursor advancing from value matching #156

Merged
merged 1 commit into from Mar 6, 2024

Conversation

lucab
Copy link
Contributor

@lucab lucab commented Feb 28, 2024

This refactors all SIMD modules in order to make the value-matching logic self-contained. Thus, all bytes-cursor manipulations are now grouped and performed once at the end, outside of SIMD logic.

Performance impact on my Intel AVX2-capable workstation seems positive (arbitrary benchmark-noise filtering at >20%):

critcmp

@lucab lucab force-pushed the ups/simd-remove-bytes-cursor branch from eea5c01 to 3aaac3a Compare February 28, 2024 17:41
@lucab lucab mentioned this pull request Feb 28, 2024
@lucab lucab force-pushed the ups/simd-remove-bytes-cursor branch 5 times, most recently from 64b4de5 to 4ab2ffb Compare February 29, 2024 08:37
@lucab lucab marked this pull request as ready for review February 29, 2024 09:26
This refactors all SIMD modules in order to make the value-matching
logic self-contained. Thus, all bytes-cursor manipulations are now
grouped and performed once at the end, outside of SIMD logic.
@lucab lucab force-pushed the ups/simd-remove-bytes-cursor branch from 4ab2ffb to a88052f Compare March 5, 2024 07:24
@lucab
Copy link
Contributor Author

lucab commented Mar 6, 2024

@seanmonstar this is ready for a review pass, whenever you have time.

There is a minor cleanup bundled in this PR (marking several functions as pub(crate)) which I did in order to make sure I wasn't changing public APIs. I can split that to a dedicated PR if you prefer.

I'll be honest, I started doing this rework as part of hyperium/hyper#3574 before actually going for hyperium/hyper#3575, focused on memory usage/allocation patterns.
My goal was to (eventually) use the bytes crate in this library, but then I realized that this required SIMD-related groundwork was actually providing performance improvements on its own. As such, I think it makes sense to land this already.

Copy link
Owner

@seanmonstar seanmonstar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful PR, and the speed boosts seem out of this world!

@seanmonstar seanmonstar merged commit b2625f3 into seanmonstar:master Mar 6, 2024
34 checks passed
@lucab
Copy link
Contributor Author

lucab commented Mar 8, 2024

Thanks for merging this. Even if I recorded those perf numbers myself, I'm still somehow puzzled and a bit skeptical about them.
I tried on a different machine (an AMD Ryzen 7) and this time I'm not seeing any measurable change (i.e. all differences are below 20%).
Both cases are workstation laptops and I did disable all kind of turbo-boosting and CPU dynamic scaling I could think of.
So either the code change is reliably hitting something very specific on the Intel machine (better cache locality?), or this workstation is just a-not-very good environment for benchmark comparisons.

Overall, I think the new code is a useful refactor but I personally won't guarantee the pictured performance changes to be valid in all environments.

@lucab lucab deleted the ups/simd-remove-bytes-cursor branch March 8, 2024 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants