Update speed comparisons with crc32 #62
Problem is, Also, crc32 and crc32c are similar but different algorithms: you won't get the same results. The crc32 version benchmarked here is the one provided within the SMHasher test suite. If you can tolerate an Intel dependency for your application, and can guarantee all your client CPUs are recent enough (which is reasonable in 2016), you can then use hardware crc32c; it is indeed very fast. xxHash was created in a different context, using a CPU without this capability, and with the intended goal of maximum portability, well beyond Intel's realm (ARM, MIPS, POWER, etc.). Hence no reliance on brand-specific features.
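To illustrate the point that crc32 and crc32c give different results, here is a minimal Python sketch. It uses a slow bitwise CRC purely for demonstration (it is not the SMHasher code or a hardware path), and the helper name `crc_reflected` is made up for this example. The only difference between the two checksums is the polynomial.

```python
import zlib


def crc_reflected(data: bytes, poly: int) -> int:
    """Bitwise reflected 32-bit CRC with initial value and final XOR of 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; XOR in the polynomial if the dropped bit was 1.
            crc = (crc >> 1) ^ (poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


CRC32_POLY = 0xEDB88320   # reflected 0x04C11DB7, the polynomial used by zlib
CRC32C_POLY = 0x82F63B78  # reflected 0x1EDC6F41 (Castagnoli), used by the SSE 4.2 instruction

data = b"123456789"
crc32_value = crc_reflected(data, CRC32_POLY)
crc32c_value = crc_reflected(data, CRC32C_POLY)

# Sanity check against zlib's crc32 implementation.
assert crc32_value == zlib.crc32(data)

print(hex(crc32_value), hex(crc32c_value))
```

The two printed values differ, so a checksum produced with one variant cannot be verified with the other.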
@Cyan4973
Hardware However, multiple
CRC32 and CRC32C can be implemented very efficiently using Intel's pclmulqdq or ARMv8 CLMUL instructions. Some time ago I put together a couple of ARM implementations using the CRC32 and CLMUL instructions, and their speeds are around 4.1 GB/s on an RK3399. I have now compared them with xxh32 and xxh64 and got 3.5 GB/s and 2.5 GB/s respectively. Is it expected that xxh64 is slower than xxh32 on ARMv8, or is something wrong?
It's expected that For a faster 64-bit hash on ARM, you may be interested in trying the newer
Thanks!
If you use the crc32 instruction (available since Nehalem, SSE 4.2) properly, you can achieve a throughput of 1.17 cycles per 8 bytes, which gives a theoretical performance of 20.5 GB/s on a 3 GHz processor under ideal conditions. Source: http://www.drdobbs.com/parallel/fast-parallelized-crc-computation-using/229401411?pgno=2
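For anyone checking the arithmetic behind that figure, here is a small Python sketch. The function name `peak_throughput_gb_per_s` is just an illustration, and the result is an upper bound that ignores memory bandwidth and everything else a real benchmark would hit.

```python
def peak_throughput_gb_per_s(cycles_per_block: float, block_bytes: int, clock_hz: float) -> float:
    """Theoretical peak throughput in GB/s (1 GB = 1e9 bytes)."""
    blocks_per_second = clock_hz / cycles_per_block
    return blocks_per_second * block_bytes / 1e9


# 1.17 cycles per 8-byte block on a 3 GHz core.
estimate = peak_throughput_gb_per_s(1.17, 8, 3e9)
print(round(estimate, 1))  # 20.5
```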
Googling a little brings up this SO question, which quotes 20GB/s throughput, which matches up to the theoretical numbers very nicely: http://stackoverflow.com/questions/17645167/implementing-sse-4-2s-crc32c-in-software
Could you make a little note that hardware crc32 is actually ~3x faster than xxhash? That's not to say it's a more suitable hash algorithm, but I wasted considerable time considering a vectorized xxhash vs crc32 for checksum purposes, before I realized I couldn't come close to crc32 in performance.