Add benchmarks to source code #54

Open
dralley opened this issue Jun 25, 2022 · 5 comments

@dralley

dralley commented Jun 25, 2022

The documentation shares some benchmarks, which is great. But for transparency, and also to make it easier for users to run said benchmarks on their machine and determine what works best for their hardware, it would also be useful to have the benchmarks available in this repo.

Additionally it would be great to test against crates such as memchr.

Another user had posted some code previously, but it is no longer available (#11).

@shepmaster
Owner

They exist:

jetscii/src/lib.rs

Lines 349 to 350 in 868b04c

#[cfg(all(test, feature = "benchmarks"))]
mod bench {

cargo +nightly bench --features benchmarks


Additionally it would be great to test against crates such as memchr.

Certainly! Feel free to add it as a dev-dependency and add it to the benchmarks.

@dralley
Author

dralley commented Jul 7, 2022

Ah, I was looking for a separate directory as is typically done and didn't see them. Sorry for the confusion.

Quick question though. I tried to use jetscii to accelerate an XML parsing library, in particular to do escaping of text, and the results were a little disappointing as it was only 50-75% faster in the ideal case and worse on short inputs. Is that typical?

I've read that pcmpestrm is slower than pcmpistrm and that hardware makers don't tend to prioritize either of them very much, which sounds kind of unfortunate if true.

tafia/quick-xml#408
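To put numbers on the short-input case, one option is to time a plain scalar search at several input lengths and compare it against the SIMD path on the same machine. The sketch below uses only the standard library. `find_escape_byte` and `throughput_bytes_per_sec` are hypothetical helper names, and the byte set (`<`, `>`, `&`, `'`, `"`) is an assumption about what a text escaper looks for; none of this is taken from jetscii or quick-xml.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Scalar baseline (hypothetical helper): index of the first byte that
/// XML text escaping would need to replace, or None if there is none.
fn find_escape_byte(haystack: &[u8]) -> Option<usize> {
    haystack
        .iter()
        .position(|&b| matches!(b, b'<' | b'>' | b'&' | b'\'' | b'"'))
}

/// Runs `search` over `input` `iters` times and returns bytes scanned per second.
/// `black_box` keeps the optimizer from discarding the repeated calls.
fn throughput_bytes_per_sec(
    input: &[u8],
    iters: u32,
    search: impl Fn(&[u8]) -> Option<usize>,
) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(search(black_box(input)));
    }
    let secs = start.elapsed().as_secs_f64();
    (input.len() as f64 * iters as f64) / secs
}

fn main() {
    // Short, medium, and long inputs; the only match is the final byte,
    // so every search scans the whole buffer.
    for len in [8usize, 64, 4096] {
        let mut input = vec![b'a'; len];
        *input.last_mut().unwrap() = b'&';
        let rate = throughput_bytes_per_sec(&input, 100_000, find_escape_byte);
        println!("len={len:>5}  scalar: {rate:.0} B/s");
    }
}
```

Passing a closure that wraps the SIMD search as the third argument would give a like-for-like number at each length, which shows where the crossover between the two approaches sits on a given CPU.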

@shepmaster
Owner

as is typically done

You'll note that this repo is old and predates a number of now-common patterns. 😉

I tried to use jetscii to accelerate an XML parsing library

That would be the reason that I created it. :-)

only 50-75% faster in the ideal case and worse on short inputs

I'm no hardware guru, but those numbers make sense to me. The SIMD parts of the processor are "big and heavy" and use a disproportionate amount of power. Some recent processors even stopped including some units like AVX-512 for related reasons.

(Side note: "X% faster" is not the clearest way of stating performance changes. Prefer "X% of previous speed" or, even better, show absolute before and after numbers. I parse "50% faster" as going from e.g. 100 B/sec to 150 B/sec.)
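The arithmetic behind that reading can be sketched as follows. `percent_faster` and `percent_of_previous` are hypothetical helper names, and the 100 B/sec and 150 B/sec figures are the illustrative ones from the note, not measurements.

```rust
/// "X% faster": relative increase in throughput over the old value.
fn percent_faster(before: f64, after: f64) -> f64 {
    (after / before - 1.0) * 100.0
}

/// "X% of previous speed": the new throughput as a percentage of the old.
fn percent_of_previous(before: f64, after: f64) -> f64 {
    after / before * 100.0
}

fn main() {
    // Illustrative throughputs in bytes per second.
    let (before, after) = (100.0_f64, 150.0_f64);
    println!(
        "{before} B/sec -> {after} B/sec: {:.0}% faster, {:.0}% of previous speed",
        percent_faster(before, after),
        percent_of_previous(before, after),
    );
}
```

Reporting both raw numbers alongside either percentage removes the ambiguity between the two phrasings.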

I've read that pcmpestrm is slower than pcmpistrm

I had not heard that; do you have any links to share?

hardware makers don't tend to prioritize either of them

That wouldn't surprise me with the whole power thing.

@dralley
Author

dralley commented Jul 7, 2022

I had not heard that; do you have any links to share?

Yeah. Unfortunately it seems to be true. The variants that are used with C strings got all the love : /

https://uops.info/table.html

https://stackoverflow.com/questions/20935769/sse42-sttni-pcmpestrm-is-twice-slower-than-pcmpistrm-is-it-true

https://stackoverflow.com/questions/46762813/how-much-faster-are-sse4-2-string-instructions-than-sse2-for-memcmp

The comment from burntsushi and the Intel guy here https://news.ycombinator.com/item?id=14422098

@Dr-Emann
Collaborator

This should probably be closed if #57 is merged, since it allows cargo bench to work directly and moves the benchmarks to a separate folder.
