Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimization: Special handling for common shadow stack instruction sequences #920

Open
Robbepop opened this issue Jan 31, 2024 · 1 comment · May be fixed by #928
Open

Optimization: Special handling for common shadow stack instruction sequences #920

Robbepop opened this issue Jan 31, 2024 · 1 comment · May be fixed by #928
Labels
optimization An performance optimization issue.

Comments

@Robbepop
Copy link
Member

Robbepop commented Jan 31, 2024

Wasm producers such as LLVM often generate Wasm code with a so-called shadow stack in order to handle parameters and return values that did not fit the Wasm function parameter and result passing model.

Common sequences include:

global.get 0
i32.const $n
i32.sub
local.tee $v
global.set 0

and

global.get 0
i32.const -$n
i32.add
local.tee $v
global.set 0

as well as the counterpart

local.get $v
i32.const $n
i32.add
global.set 0

Where $n denotes a positive i32 integer literal and $v denotes a local variable index.
Currently Wasmi (register) translates this to roughly the following bytecode:

$r <- global.get 0
$v <- i32.sub_imm $r $n
global.set 0 $v

However, with special treatment in the Wasmi translator and introduction of new Wasmi bytecode instructions we could easily translate the entire sequences above to a single instruction.

$v <- global.0.i32.sub_assign_imm $n
  • We no longer state the global index (0) since Wasm producers implicitly always use the zero-th index and we can use this fact to further compress the special bytecode for these sequences.
  • Another optimization that we could apply is that the constant immediate value is always divisible by 4 (since this is about pointer arithmetic on 32-bit systems). So we could store the constants as divided by 4 to increase the effective value range.
  • Instead of supporting both global.0.i32.sub_assign_imm and global.0.i32.add_assign_imm we can use the fact that sequences with i32.add always have a negative constant immediate value and thus convert the i32.add into an i32.sub using a negated constant.

It is to be expected that this optimization yields good results because:

  • the most costly things in an efficient interpreter such as Wasmi is instruction dispatch and this technique results in fewer instructions for the same compute effect
  • past similar optimizations with op-code fusion yielded good results
  • when inspecting Wasm binaries such as spidermonkey.wat or westend.wat (Polkadot runtime) we observed thousands of occurrences of the above Wasm instruction sequences.
@Robbepop Robbepop added the optimization An performance optimization issue. label Jan 31, 2024
@Robbepop
Copy link
Member Author

Robbepop commented Feb 4, 2024

Having #924 implemented makes this optimization simpler to implement since we no longer have to differentiate between i32.add r c and i32.sub r c variants of the above instruction sequences.

@Robbepop Robbepop linked a pull request Feb 6, 2024 that will close this issue
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimization An performance optimization issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant