Support using AVX and even AVX512 #11

upsuper · 2016-05-21T03:17:39Z

Nowadays, most mainstream x86 CPUs support AVX, and AVX2 extends it to integer computation on 256-bit vectors. I believe using them would further improve performance.

Intel may start shipping CPUs that support AVX512 in the coming year. AVX512, as its name indicates, supports computation on 512-bit vectors. This should probably be considered as well.
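
For anyone who wants to check what their own CPU advertises before comparing numbers, here is a minimal sketch using the standard library's `is_x86_feature_detected!` macro. It is a standalone illustration, not code from argon2rs; the function name `detected_features` is made up for this example.

```rust
// Report which SIMD extensions the current CPU advertises at runtime.
// Assumption: standalone illustration, not part of argon2rs.

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn detected_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    if is_x86_feature_detected!("ssse3") {
        found.push("ssse3");
    }
    if is_x86_feature_detected!("avx") {
        found.push("avx");
    }
    if is_x86_feature_detected!("avx2") {
        found.push("avx2");
    }
    if is_x86_feature_detected!("avx512f") {
        found.push("avx512f");
    }
    found
}

// On other architectures the macro is unavailable, so report nothing.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn detected_features() -> Vec<&'static str> {
    Vec::new()
}

fn main() {
    println!("detected SIMD features: {:?}", detected_features());
}
```

Runtime detection only says what the hardware can do; whether the compiler emits those instructions is a separate question.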

upsuper · 2016-05-21T03:21:14Z

Not sure whether the reference implementation already takes advantage of AVX, but on my machine it is much faster than this Rust implementation:

running 2 tests
test bench_argon2rs_i ... bench:  10,646,195 ns/iter (+/- 764,309)
test bench_cargon_i   ... bench:   7,329,629 ns/iter (+/- 132,919)

(This is a difference of more than 30%, while the difference shown in the README is less than 20%.)
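
To make the arithmetic behind that percentage explicit, here is a small sketch that computes the gap from the two ns/iter figures above. The helper names `time_saved` and `slowdown` are made up for this example; the inputs are the numbers from the bench output.

```rust
/// Fraction of time saved by the faster run, relative to the slower one.
fn time_saved(slower_ns: u64, faster_ns: u64) -> f64 {
    (slower_ns - faster_ns) as f64 / slower_ns as f64
}

/// How many times longer the slower run takes than the faster one.
fn slowdown(slower_ns: u64, faster_ns: u64) -> f64 {
    slower_ns as f64 / faster_ns as f64
}

fn main() {
    // ns/iter figures copied from the bench output above.
    let argon2rs_ns: u64 = 10_646_195;
    let cargon_ns: u64 = 7_329_629;

    println!("time saved: {:.1}%", 100.0 * time_saved(argon2rs_ns, cargon_ns));
    println!("slowdown:   {:.2}x", slowdown(argon2rs_ns, cargon_ns));
}
```

Measured against the slower run, cargon takes about 31% less time; measured against the faster run, argon2rs takes about 1.45 times as long. Which baseline is used matters when comparing against the README figure.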

upsuper · 2016-05-21T05:59:19Z

It seems the reference implementation doesn't take advantage of AVX either. Even if compiled with -march=nocona, it is still much faster.

bryant · 2016-05-21T23:03:52Z

So I discovered this afternoon that codegenning AVX is a simple matter of:

$ export RUSTFLAGS='-C target-feature=+avx'
$ cargo clean
$ cargo bench --features simd
$ objdump -d target/release/argon2rs-* | grep vpalignr | head -n 2  # confirm.
   1644d:       c4 e3 51 0f e7 08       vpalignr $0x8,%xmm7,%xmm5,%xmm4
   16453:       c4 e3 41 0f ed 08       vpalignr $0x8,%xmm5,%xmm7,%xmm5

which does wonders for the cross-swap operations in the Argon2 permutation function.
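
As a rough illustration of why that instruction helps, here is a scalar model of a byte-wise "align right by 8" on two 128-bit registers, each viewed as two 64-bit lanes. It assumes the cross-swap in question is a rotation of a four-word row split across two registers, as in BLAKE2b-style diagonalization; the function names are made up for this sketch and this is not the code argon2rs uses.

```rust
/// Scalar model of "align right by 8 bytes" on two 128-bit registers,
/// each stored as two 64-bit lanes with the low lane first.
/// Concatenate hi:lo, shift right by 8 bytes, keep the low 128 bits.
fn alignr8(hi: [u64; 2], lo: [u64; 2]) -> [u64; 2] {
    [lo[1], hi[0]]
}

/// Rotate a row of four 64-bit words left by one position, where the
/// row [w0, w1, w2, w3] is split as r0 = [w0, w1] and r1 = [w2, w3].
/// Two align operations with swapped operands do the whole rotation.
fn rotate_row_left(r0: [u64; 2], r1: [u64; 2]) -> ([u64; 2], [u64; 2]) {
    (alignr8(r1, r0), alignr8(r0, r1))
}

fn main() {
    let (a, b) = rotate_row_left([0, 1], [2, 3]);
    // Row [0, 1, 2, 3] becomes [1, 2, 3, 0].
    println!("{:?} {:?}", a, b);
}
```

The pair of calls with swapped operands mirrors the two vpalignr lines in the objdump output, which also use the same two source registers in opposite order.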

bryant · 2016-05-21T23:09:16Z

> It seems the reference implementation doesn't take advantage of AVX either. Even if compiled with -march=nocona, it is still much faster.

In fact, the ref-impl does try to exploit AVX. See, for instance, blamka.

The ifdef is keyed to SSSE3, but according to https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=80,2369,547,228&techs=MMX,SSE,SSE2,SSE3,SSSE3,SSE4_1,SSE4_2,AVX,AVX2&text=vpalign , vpalignr is AVX2. Go figure.

Unfortunately, cargon's build.rs was invoking $(CC) without -march=native, which would be necessary for the above ifdef to pass. This has been fixed in 045a54f of the go-faster branch.
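
For readers unfamiliar with the pattern, here is a minimal sketch of a build script that passes -march=native to the C compiler. It is not cargon's actual build.rs; the file names and the `cc_args` helper are hypothetical, and the command is only printed, not executed.

```rust
use std::process::Command;

/// Build the argument list for compiling one C file.
/// Assumption: the file names passed in are hypothetical placeholders.
fn cc_args(source: &str, object: &str) -> Vec<String> {
    vec![
        // Lets the compiler define the SIMD macros the C code tests with #ifdef.
        "-march=native".to_string(),
        "-c".to_string(),
        source.to_string(),
        "-o".to_string(),
        object.to_string(),
    ]
}

fn main() {
    // Honor $CC if set, otherwise fall back to the generic "cc" driver.
    let cc = std::env::var("CC").unwrap_or_else(|_| "cc".to_string());
    let args = cc_args("ref_impl.c", "ref_impl.o");

    let mut cmd = Command::new(&cc);
    cmd.args(&args);

    // A real build script would call cmd.status() here and check the result.
    println!("would run: {:?}", cmd);
}
```

Note that -march=native ties the resulting binary to the build machine's CPU, which is fine for a local benchmark but not for distributed builds.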

bryant · 2016-05-21T23:14:01Z

And on the topic of go-faster, would you mind re-running the benches with +avx and --features simd on that branch? The numbers I'm getting are quite favorable now:

23:11:48 ~/argon2rs/> RUSTFLAGS='-C target-feature=+avx' cargo bench --features simd
     Running target/release/versus_cargon-9211de8e436df972

running 3 tests
test ensure_identical_hashes ... ignored
test bench_argon2rs_i ... bench:   8,488,636 ns/iter (+/- 32,292)
test bench_cargon_i   ... bench:  10,011,768 ns/iter (+/- 491,314)

test result: ok. 0 passed; 0 failed; 1 ignored; 2 measured

upsuper · 2016-05-23T08:48:22Z

It seems that enabling AVX doesn't change much:

$ cargo bench --features=simd
test bench_argon2rs_i ... bench:   8,380,176 ns/iter (+/- 584,836)
test bench_cargon_i   ... bench:   7,336,689 ns/iter (+/- 264,857)
$ RUSTFLAGS="-C target-feature=+avx" cargo bench --features=simd
test bench_argon2rs_i ... bench:   8,356,828 ns/iter (+/- 784,442)
test bench_cargon_i   ... bench:   7,317,043 ns/iter (+/- 319,753)

Although it is still not as fast as the reference implementation here, it is indeed much faster than before. Good job!
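
When the flag appears to make no difference, one thing worth ruling out is that RUSTFLAGS never reached the compiler. A minimal sketch of a compile-time check using the `cfg!` macro follows; the function name is made up for this example.

```rust
/// True if this binary was compiled with the AVX target feature enabled,
/// for example via RUSTFLAGS='-C target-feature=+avx'.
fn avx_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "avx")
}

fn main() {
    if avx_enabled_at_compile_time() {
        println!("compiled with +avx");
    } else {
        println!("compiled without +avx");
    }
}
```

This only reports what the compiler was allowed to emit; confirming that AVX instructions actually appear in the hot loop still needs a disassembly check such as the objdump command earlier in the thread.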

upsuper · 2016-05-23T08:51:40Z

Looks like the only difference is not calling crossbeam when not needed :)
