Issues with Core 2 Duo #76

Open
easyaspi314 opened this issue Dec 3, 2018 · 1 comment


@easyaspi314
Contributor

First of all, Penryn lacks the `rdtscp` instruction; the benchmark could use `rdtsc` instead. As it stands, the benchmark dies with an illegal instruction error. Even with that worked around, the benchmark seems to be nonfunctional anyway. :(
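A minimal sketch of such a fallback, assuming GCC or Clang on x86 (`<cpuid.h>` for `__get_cpuid`, `<x86intrin.h>` for `__rdtsc`/`__rdtscp`). The names `cpu_has_rdtscp`, `read_cycle_counter`, and `counter_delta` are hypothetical and not taken from the HighwayHash benchmark; the idea is simply to probe CPUID leaf 0x80000001 (EDX bit 27 advertises RDTSCP) and use plain RDTSC when the bit is clear, as it is on Penryn.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime on the non-x86 path */
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>     /* GCC/Clang: __get_cpuid */
#include <x86intrin.h> /* GCC/Clang: __rdtsc, __rdtscp */

/* CPUID leaf 0x80000001, EDX bit 27 advertises RDTSCP (clear on Penryn). */
static int cpu_has_rdtscp(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return 0; /* extended leaf not supported */
    return (int)((edx >> 27) & 1u);
}

/* Hypothetical helper: read the time-stamp counter, using RDTSCP only when
   the CPU reports it. The CPUID probe runs once and is cached
   (not thread-safe; adequate for a single-threaded benchmark). */
static uint64_t read_cycle_counter(void) {
    static int has_rdtscp = -1;
    if (has_rdtscp < 0)
        has_rdtscp = cpu_has_rdtscp();
    if (has_rdtscp) {
        unsigned int aux; /* receives IA32_TSC_AUX, unused here */
        return __rdtscp(&aux);
    }
    /* Plain RDTSC is not ordered with surrounding instructions;
       add a fence around it if that matters for the measurement. */
    return __rdtsc();
}
#else
#include <time.h>
/* Non-x86 fallback: nanoseconds from a monotonic clock instead of cycles. */
static uint64_t read_cycle_counter(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Difference between two readings, assumed to be taken on the same core. */
static uint64_t counter_delta(uint64_t start, uint64_t end) {
    assert(end >= start);
    return end - start;
}
```

Probing once and caching keeps the serializing CPUID instruction out of the timed loop.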

In addition, HighwayHash64 seems excessively slow on my (admittedly old) chip compared to other hashes.

xxhsum benchmark (100 KB input)
Compiler: GCC 8.2.0 (`gcc-8 -O2 -march=native`)
Machine: MacBook (13-inch, Mid 2009), MacBook5,2
CPU: 2.13 GHz Intel Core 2 Duo P7450 (Penryn, SSE4.1)
OS: macOS 10.13.6 with High Sierra Patcher
RAM: 4 GB

| Hash | Aligned (MB/s) | Unaligned (MB/s) |
|---|---:|---:|
| XXH32 | 3912.6 | 2985.9 |
| XXH64 | 4004.1 | 2891.6 |
| XXH32a (two `vector_size(16)` lanes) | 4970.8 | 3144.7 |
| XXH64a (two `vector_size(16)` lanes) | 4935.6 | 3152.1 |
| FarmHash32 | 5654.1 | 3619.6 |
| FarmHash64 | 6092.9 | 4197.5 |
| HighwayHash64 (SSE4.1) | 2462.1 | 1998.7 |
| HighwayHash64 (Portable) | 290.4 | 289.2 |
| HighwayHash64 (C) | 451.4 | 435.6 |
| SpookyHash v2 | 6349.3 | 3720.1 |

Note that the Core 2 Duo has a slow multiplier, which takes twice as many cycles as on newer Intel CPUs. It is the main bottleneck for the xxHash family: replacing the multiplies with xors brings it up to the upper 5700s MB/s (though the result is useless as a hash). The chip also doesn't seem to have fast 2x64-bit vector operations; GCC appears to split them into operations on 32-bit lanes, which is another slowdown.
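To show where the multiplies sit, here is a sketch of the XXH32 accumulator round (constants as in the xxHash reference implementation) next to a hypothetical multiply-free variant. The variant is only a guess at the kind of xor substitution described above, not the exact change that was benchmarked; it illustrates why such a change is fast but ineffective: flipping one input bit flips exactly one output bit, so there is no avalanche.

```c
#include <assert.h>
#include <stdint.h>

#define PRIME32_1 2654435761u
#define PRIME32_2 2246822519u

/* Rotate left; r must be in 1..31. */
static uint32_t rotl32(uint32_t x, unsigned r) {
    return (x << r) | (x >> (32u - r));
}

/* XXH32-style round: two 32x32 multiplies per 4-byte input word. */
static uint32_t xxh32_round(uint32_t acc, uint32_t input) {
    acc += input * PRIME32_2;
    acc = rotl32(acc, 13);
    acc *= PRIME32_1;
    return acc;
}

/* Hypothetical variant with xors in place of the multiply steps.
   Cheap on a slow multiplier, but each input bit maps to exactly one
   output bit, so it does not mix and is not usable as a hash. */
static uint32_t xxh32_round_xor(uint32_t acc, uint32_t input) {
    acc ^= input ^ PRIME32_2;
    acc = rotl32(acc, 13);
    acc ^= PRIME32_1;
    return acc;
}
```

For example, `xxh32_round_xor(0, 0) ^ xxh32_round_xor(0, 1)` is `0x2000`: the single flipped input bit just moves 13 positions.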

I mostly want to bring this to attention, because I definitely was disappointed after the effort to make it compile.

@jan-wassenberg
Member

@easyaspi314 sorry to hear about the disappointing result. I'm surprised `pmuludq` was twice as slow on Conroe (assuming that is the Core 2 Duo in question?). According to uops.info, it has 3-cycle latency, as on Nehalem.
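For context on the 32-bit-lane codegen: `pmuludq` (`_mm_mul_epu32`) multiplies only the low 32 bits of each 64-bit lane into a full 64-bit product, and SSE2/SSE4.1 have no 64x64-bit vector multiply. A full 64-bit lane multiply therefore has to be assembled from three such products plus shifts and adds, which is typically what a compiler emits for a `vector_size(16)` `uint64_t` multiply. The scalar sketch below models one lane; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of PMULUDQ: low 32 bits of a times low 32 bits of b,
   giving the full 64-bit product. */
static uint64_t mul_epu32_lane(uint64_t a, uint64_t b) {
    return (uint64_t)(uint32_t)a * (uint32_t)b;
}

/* Low 64 bits of a 64x64 multiply built from three 32x32->64 products:
   a*b mod 2^64 = al*bl + ((al*bh + ah*bl) << 32). The ah*bh term only
   affects bits >= 64 and is dropped. */
static uint64_t mul64_from_epu32(uint64_t a, uint64_t b) {
    uint64_t lo_lo = mul_epu32_lane(a, b);
    uint64_t lo_hi = mul_epu32_lane(a, b >> 32);
    uint64_t hi_lo = mul_epu32_lane(a >> 32, b);
    return lo_lo + ((lo_hi + hi_lo) << 32);
}
```

So each 64-bit lane multiply costs three `pmuludq` instead of one, which compounds the multiplier latency on older cores.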

Compiler codegen is indeed a concern, we've seen Clang do a better job with intrinsics.

Note that HighwayHash is intended as a MAC (larger state and no trivially reversible operations), hence it is not comparable to other faster hashes.
