Benchmarks #95

htot · 2022-06-22T20:38:37Z

@aklomp @mayeut
Again a draft. Please ignore the Benchmarks patch; I was too far along to drop it and rebase against HEAD.

The interesting one is "codec: add ssse3_atom".

My experience with CRC32C on Silvermont Atom (SLM) processors is that in 64-bit mode certain combinations of instructions incur a penalty (see the Intel manuals), which in some cases makes running in 64-bit mode a net loss. Later Atoms (Goldmont, Airmont) likely do not have this penalty, but I don't have the hardware to test. Running base64 on SLM shows strange performance regressions, while a Core i7 shows improvements.

So I revived the best-performing older SSSE3 codec as ssse3_atom and tested it on an Intel Edison (dual-core, 500MHz) in both 64-bit and 32-bit mode (because that is easy to do there), and on an Intel NUC with a Bay Trail Atom in 64-bit mode (to show the relevance for mainstream CPUs).

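Minimum speed (MB/s):

| Processor | Decode: plain | Decode: SSSE3 | Decode: SSSE3_ATOM | Encode: plain | Encode: SSSE3 | Encode: SSSE3_ATOM |
|---|---:|---:|---:|---:|---:|---:|
| Atom E3815 @ 1.46GHz (64b) | 326 | 449 | **565** | 441 | 569 | *556* |
| Edison @ 500MHz (32b) | 40 | 102 | 103 | 67 | 111 | 111 |
| Edison @ 500MHz (64b) | 119 | 164 | **206** | 162 | 209 | *204* |
| i7-10700 CPU @ 2.90GHz | 3997 | 9356 | *4685* | 4387 | 8823 | *7593* |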

Improvements from going back to the revived codec are in bold, degradations in italic.

We see that on the i7 the latest version is indeed the fastest, and on SLM in 32-bit mode there is no difference. But on SLM in 64-bit mode, SSSE3_ATOM decodes about 25% faster (while encoding slightly slower).
Now, having a fast algorithm has a much more noticeable effect on a slow Atom than on a fast i7... So what do you think: should we add a specialized SSSE3 codec for SLM?

Signed-off-by: Ferry Toth <ftoth@exalondelft.nl>

By benchmarking on Intel Edison (a Silvermont Atom CPU) in x86_64 mode, starting from v0.3.0, we find that SSSE3 performance had various ups and downs. Substantial changes since v0.3.0 were:

| Hash | SSSE3 decode (MB/s) | SSSE3 encode (MB/s) |
|---|---:|---:|
| e12e3cd | 165 | 210 |
| 3f3f31c | 206 | 150 |
| 67ee3fd | 205 | 205 |
| 0a69845 | 145 | 205 |
| a5b6739 | 145 | 218 |
| 6310c1f | 157 | 218 |
| 9a0d1b2 | 158 | 210 |
| 5874921 | 165 | 210 |

Best performance was from 67ee3fd until decode performance regressed from 205 to 145 MB/s with commit 0a69845. The commit before that (b6417f3) had the best decode performance with relatively good encode performance. Core (i7) processors do not show such large performance changes.

This patch adds the ssse3 codec from b6417f3 as ssse3_atom.

Signed-off-by: Ferry Toth <ftoth@exalondelft.nl>

htot · 2022-06-23T22:41:58Z

@aqrit?

aqrit · 2022-06-23T23:24:05Z

For dec_loop, #46 is probably faster, though it does trade readability for speed.

dec_reshuffle without _mm_madd_epi16 could look like this:

```c
#include <wasm_simd128.h>

// Pack 16 6-bit values into 12 bytes
// (wasm has no pmaddubsw, but does have a pmaddwd equivalent)
static inline v128_t dec_reshuffle(v128_t v)
{
	const v128_t shuf = wasm_i8x16_const(2, 1, 0, 6, 5, 4, 10, 9, 8, 14, 13, 12, -1, -1, -1, -1);
	v = wasm_v128_or(wasm_u16x8_shr(v, 6), wasm_i16x8_shl(v, 8));   // 00cccccc|dddddd00|00aaaaaa|bbbbbb00
	v = wasm_v128_or(wasm_u32x4_shr(v, 18), wasm_i32x4_shl(v, 10)); // dddd0000|aaaaaabb|bbbbcccc|ccdddddd
	return wasm_i8x16_swizzle(v, shuf);                             //       ..|ccdddddd|bbbbcccc|aaaaaabb
}
```

I don't know whether it has better latency, but it does use fewer instructions and constants (edit: compared to dec_reshuffle in this PR).

htot · 2022-06-24T07:04:50Z

Yeah, this draft PR just revives an older version of the codec, which showed better performance than the current one (on SLM). I didn't try to create my own improvement. PR #46 is a bit older; did you benchmark it on Atom at the time?

htot · 2022-06-24T22:01:31Z

@aqrit would you rebase #46 on master? I'd like to run benchmarks on Edison/Atom.

htot added 2 commits June 22, 2022 21:36

Benchmarks: fix detection of supported codecs on x86 (91dec51)

Signed-off-by: Ferry Toth <ftoth@exalondelft.nl>

htot marked this pull request as draft June 23, 2022 22:42

aklomp force-pushed the master branch from 4a87aba to cba709a on October 13, 2022 13:15

jirutka mentioned this pull request on Nov 19, 2023 in #124, "Codec detection doesn’t work in test_base64 on musl libc" (open).
