Optimization: Special handling for common shadow stack instruction sequences #920

Robbepop · 2024-01-31T11:23:48Z

Wasm producers such as LLVM often generate Wasm code with a so-called shadow stack in order to handle parameters and return values that did not fit the Wasm function parameter and result passing model.

Common sequences include:

global.get 0
i32.const $n
i32.sub
local.tee $v
global.set 0

and

global.get 0
i32.const -$n
i32.add
local.tee $v
global.set 0

as well as the counterpart

local.get $v
i32.const $n
i32.add
global.set 0

Where $n denotes a positive i32 integer literal and $v denotes a local variable index.
Currently Wasmi (register) translates this to roughly the following bytecode:

$r <- global.get 0
$v <- i32.sub_imm $r $n
global.set 0 $v

However, with special treatment in the Wasmi translator and introduction of new Wasmi bytecode instructions we could easily translate the entire sequences above to a single instruction.

$v <- global.0.i32.sub_assign_imm $n

We no longer state the global index (0) since Wasm producers implicitly always use the zero-th index and we can use this fact to further compress the special bytecode for these sequences.
Another optimization that we could apply is that the constant immediate value is always divisible by 4 (since this is about pointer arithmetic on 32-bit systems). So we could store the constants as divided by 4 to increase the effective value range.
Instead of supporting both global.0.i32.sub_assign_imm and global.0.i32.add_assign_imm we can use the fact that sequences with i32.add always have a negative constant immediate value and thus convert the i32.add into an i32.sub using a negated constant.

It is to be expected that this optimization yields good results because:

the most costly things in an efficient interpreter such as Wasmi is instruction dispatch and this technique results in fewer instructions for the same compute effect
past similar optimizations with op-code fusion yielded good results
when inspecting Wasm binaries such as spidermonkey.wat or westend.wat (Polkadot runtime) we observed thousands of occurrences of the above Wasm instruction sequences.

The text was updated successfully, but these errors were encountered:

Robbepop · 2024-02-04T09:37:30Z

Having #924 implemented makes this optimization simpler to implement since we no longer have to differentiate between i32.add r c and i32.sub r c variants of the above instruction sequences.

Robbepop mentioned this issue Jan 31, 2024

Big performance regression between 0.31 and 0.32 #914

Closed

Robbepop added the optimization An performance optimization issue. label Jan 31, 2024

Robbepop linked a pull request Feb 6, 2024 that will close this issue

Optimize shadow stack instruction sequences #928

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimization: Special handling for common shadow stack instruction sequences #920

Optimization: Special handling for common shadow stack instruction sequences #920

Robbepop commented Jan 31, 2024 •

edited

Robbepop commented Feb 4, 2024

Optimization: Special handling for common shadow stack instruction sequences #920

Optimization: Special handling for common shadow stack instruction sequences #920

Comments

Robbepop commented Jan 31, 2024 • edited

Robbepop commented Feb 4, 2024

Robbepop commented Jan 31, 2024 •

edited