Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redundant checks in floating point to integer casts on WASM #73591

Closed
CryZe opened this issue Jun 21, 2020 · 6 comments · Fixed by #74695
Closed

Redundant checks in floating point to integer casts on WASM #73591

CryZe opened this issue Jun 21, 2020 · 6 comments · Fixed by #74695
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. O-wasm Target: WASM (WebAssembly), http://webassembly.org/

Comments

@CryZe
Copy link
Contributor

CryZe commented Jun 21, 2020

If you compile the following Rust code to WASM:

pub unsafe fn cast(x: f64) -> u8 {
    x.to_int_unchecked()
}

it compiles to the following with Rust 1.45:

example::cast:
        block           
        local.get       0
        f64.const       0x1p32
        f64.lt  
        local.get       0
        f64.const       0x0p0
        f64.ge  
        i32.and 
        i32.eqz
        br_if           0
        local.get       0
        i32.trunc_f64_u
        return
        end_block
        i32.const       0
        end_function

The same happens in Rust <1.45 if you use as _. In both cases the following LLVM IR is emitted:

define hidden i8 @_ZN7example4cast17he66904755097d668E(double %x) unnamed_addr #0 !dbg !5 {
  %0 = fptoui double %x to i8, !dbg !9
  ret i8 %0, !dbg !22
}

fptoui is explicitly defined to be UB if an out of range value is provided.

If the value cannot fit in ty2, the result is a poison value.

However it seems like the WASM backend in LLVM ignores that and emits additional bounds checks.

Additionally this gets even worse in Rust 1.45 if you use as _ instead:

example::cast:
        local.get       0
        f64.const       0x1.fep7
        f64.gt  
        local.set       1
        block           
        block           
        local.get       0
        f64.const       0x0p0
        local.get       0
        f64.const       0x0p0
        f64.gt  
        f64.select
        local.tee       0
        f64.const       0x1p32
        f64.lt  
        local.get       0
        f64.const       0x0p0
        f64.ge  
        i32.and 
        i32.eqz
        br_if           0
        local.get       0
        i32.trunc_f64_u
        local.set       2
        br              1
        end_block
        i32.const       0
        local.set       2
        end_block
        i32.const       -1
        local.get       2
        local.get       1
        i32.select
        end_function

This means that in the end in the WASM VM will do 3 range checks (the one emitted by rust, the one emitted by the llvm backend, and the one it needs to do to possibly trap).

Additionally Rust's saturating checks don't play well with WASM's nontrapping-fptoint target-feature as there's still redundant checks:

example::cast:
        i64.const       9223372036854775807
        local.get       0
        f32.const       -0x1p63
        local.get       0
        f32.const       -0x1p63
        f32.gt  
        f32.select
        i64.trunc_sat_f32_s
        local.get       0
        f32.const       0x1.fffffep62
        f32.gt  
        i64.select
        i64.const       0
        local.get       0
        local.get       0
        f32.eq  
        i64.select
        end_function
@CryZe CryZe added the C-bug Category: This is a bug. label Jun 21, 2020
@jonas-schievink jonas-schievink added I-slow Issue: Problems and improvements with respect to performance of generated code. O-wasm Target: WASM (WebAssembly), http://webassembly.org/ A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. labels Jun 21, 2020
@CryZe
Copy link
Contributor Author

CryZe commented Jun 22, 2020

It seems to come down to this commit by @sunfishcode in LLVM llvm/llvm-project@cdd48b8

I fail to understand the justification here:

// Lower an fp-to-int conversion operator from the LLVM opcode, which has an
// undefined result on invalid/overflow, to the WebAssembly opcode, which
// traps on invalid/overflow.

If the LLVM opcode doesn't define the behavior for invalid results, then why is the WASM backend trying to paper over the UB and instead introduces checks to supposedly trap? (which I don't think it even does, it just returns dummy values)

@sunfishcode
Copy link
Member

In the context of that comment, "undefined result" doesn't mean "undefined behavior". It means the operation produces a return value, without trapping, but the specific value is undefined. Since's the wasm operator in question traps in some questions, extras checks are required to prevent trapping, to emulate the LLVM IR semantics.

#71269 has now landed, so Rust's semantics for fp-to-int conversion match those of wasm's nontrapping-fptoint feature, which is now part of wasm's core spec, though not all engines implement it yet. So for anyone interested in working on this, the path forward is to teach the Rust backend to take advantage of this wasm feature, when present, to avoid emitting the extra range checks.

@CryZe
Copy link
Contributor Author

CryZe commented Jun 22, 2020

So from what I'm understanding from more code, LLVM could be reordering fptoui / fptosi to be speculatively executed earlier, which would mean the code traps when it possibly shouldn't have, as that cast may have never been executed. Is that correct?

There also seem to be LLVM intrinsics in the WASM backend now such as i32 @llvm.wasm.trunc.signed.i32.f32(float). Can those be speculatively executed? If no, the code should probably be lowered like this:

-nontrapping-fptoint +nontrapping-fptoint
.to_int_unchecked() llvm.wasm.trunc.* llvm.wasm.trunc.*
as _ llvm.wasm.trunc.* + manual saturation checks llvm.wasm.trunc.saturate.*

If yes, it should probably be this:

-nontrapping-fptoint +nontrapping-fptoint
.to_int_unchecked() fptosi / fptoui llvm.wasm.trunc.saturate.*
as _ llvm.wasm.trunc.* + manual saturation checks
(or maybe that intrinsic is still dangerous in that case?)
llvm.wasm.trunc.saturate.*

@sunfishcode
Copy link
Member

Yes, LLVM can reorder and speculate fptosi/fptoui as it assumes they don't trap.

@llvm.wasm.trunc.signed.i32.f32 can trap, so it cannot be speculatively executed in general.

Manishearth added a commit to Manishearth/rust that referenced this issue Jul 1, 2020
…crichton

Use WASM's saturating casts if they are available

WebAssembly supports saturating floating point to integer casts behind a target feature. The feature is already available on many browsers. Beginning with 1.45 Rust will start defining the behavior of floating point to integer casts to be saturating as well. For this Rust constructs additional checks on top of the `fptoui` / `fptosi` instructions it emits. Here we introduce the possibility for the codegen backend to construct saturating casts itself and only fall back to constructing the checks ourselves if that is not possible.

Resolves part of rust-lang#73591
Manishearth added a commit to Manishearth/rust that referenced this issue Jul 2, 2020
…crichton

Use WASM's saturating casts if they are available

WebAssembly supports saturating floating point to integer casts behind a target feature. The feature is already available on many browsers. Beginning with 1.45 Rust will start defining the behavior of floating point to integer casts to be saturating as well. For this Rust constructs additional checks on top of the `fptoui` / `fptosi` instructions it emits. Here we introduce the possibility for the codegen backend to construct saturating casts itself and only fall back to constructing the checks ourselves if that is not possible.

Resolves part of rust-lang#73591
Manishearth added a commit to Manishearth/rust that referenced this issue Jul 2, 2020
…crichton

Use WASM's saturating casts if they are available

WebAssembly supports saturating floating point to integer casts behind a target feature. The feature is already available on many browsers. Beginning with 1.45 Rust will start defining the behavior of floating point to integer casts to be saturating as well. For this Rust constructs additional checks on top of the `fptoui` / `fptosi` instructions it emits. Here we introduce the possibility for the codegen backend to construct saturating casts itself and only fall back to constructing the checks ourselves if that is not possible.

Resolves part of rust-lang#73591
Manishearth added a commit to Manishearth/rust that referenced this issue Jul 2, 2020
…crichton

Use WASM's saturating casts if they are available

WebAssembly supports saturating floating point to integer casts behind a target feature. The feature is already available on many browsers. Beginning with 1.45 Rust will start defining the behavior of floating point to integer casts to be saturating as well. For this Rust constructs additional checks on top of the `fptoui` / `fptosi` instructions it emits. Here we introduce the possibility for the codegen backend to construct saturating casts itself and only fall back to constructing the checks ourselves if that is not possible.

Resolves part of rust-lang#73591
alexcrichton added a commit to alexcrichton/rust that referenced this issue Jul 22, 2020
This commit improves codegen for unchecked casts on WebAssembly targets
to use the singluar `iNN.trunc_fMM_{u,s}` instructions. Previously rustc
would codegen a bare `fptosi` and `fptoui` for float casts but for
WebAssembly targets the codegen for these instructions is quite large.
This large codegen is due to the fact that LLVM can speculate these
instructions so the trapping behavior of WebAssembly needs to be
protected against in case they're speculated.

The change here is to update the codegen for the unchecked cast
intrinsics to have a wasm-specific case where they call the appropriate
LLVM intrinsic to generate the right wasm instruction. The intrinsic is
explicitly opting-in to undefined behavior so a trap here for
out-of-bounds inputs on wasm should be acceptable.

cc rust-lang#73591
@alexcrichton
Copy link
Member

I've posted #74659 to improve the unchecked -nontrapping-fptoint case

Manishearth added a commit to Manishearth/rust that referenced this issue Jul 23, 2020
Improve codegen for unchecked float casts on wasm

This commit improves codegen for unchecked casts on WebAssembly targets
to use the singluar `iNN.trunc_fMM_{u,s}` instructions. Previously rustc
would codegen a bare `fptosi` and `fptoui` for float casts but for
WebAssembly targets the codegen for these instructions is quite large.
This large codegen is due to the fact that LLVM can speculate these
instructions so the trapping behavior of WebAssembly needs to be
protected against in case they're speculated.

The change here is to update the codegen for the unchecked cast
intrinsics to have a wasm-specific case where they call the appropriate
LLVM intrinsic to generate the right wasm instruction. The intrinsic is
explicitly opting-in to undefined behavior so a trap here for
out-of-bounds inputs on wasm should be acceptable.

cc rust-lang#73591
@alexcrichton
Copy link
Member

Ok, I've improved the last case at #74695 and I think that's good enough to close this out.

bors added a commit to rust-lang-ci/rust that referenced this issue Aug 3, 2020
…es, r=nagisa

rustc: Improving safe wasm float->int casts

This commit improves code generation for WebAssembly targets when
translating floating to integer casts. This improvement is only relevant
when the `nontrapping-fptoint` feature is not enabled, but the feature
is not enabled by default right now. Additionally this improvement only
affects safe casts since unchecked casts were improved in rust-lang#74659.

Some more background for this issue is present on rust-lang#73591, but the
general gist of the issue is that in LLVM the `fptosi` and `fptoui`
instructions are defined to return an `undef` value if they execute on
out-of-bounds values; they notably do not trap. To implement these
instructions for WebAssembly the LLVM backend must therefore generate
quite a few instructions before executing `i32.trunc_f32_s` (for
example) because this WebAssembly instruction traps on out-of-bounds
values. This codegen into wasm instructions happens very late in the
code generator, so what ends up happening is that rustc inserts its own
codegen to implement Rust's saturating semantics, and then LLVM also
inserts its own codegen to make sure that the `fptosi` instruction
doesn't trap. Overall this means that a function like this:

    #[no_mangle]
    pub unsafe extern "C" fn cast(x: f64) -> u32 {
        x as u32
    }

will generate this WebAssembly today:

    (func $cast (type 0) (param f64) (result i32)
      (local i32 i32)
      local.get 0
      f64.const 0x1.fffffffep+31 (;=4.29497e+09;)
      f64.gt
      local.set 1
      block  ;; label = @1
        block  ;; label = @2
          local.get 0
          f64.const 0x0p+0 (;=0;)
          local.get 0
          f64.const 0x0p+0 (;=0;)
          f64.gt
          select
          local.tee 0
          f64.const 0x1p+32 (;=4.29497e+09;)
          f64.lt
          local.get 0
          f64.const 0x0p+0 (;=0;)
          f64.ge
          i32.and
          i32.eqz
          br_if 0 (;@2;)
          local.get 0
          i32.trunc_f64_u
          local.set 2
          br 1 (;@1;)
        end
        i32.const 0
        local.set 2
      end
      i32.const -1
      local.get 2
      local.get 1
      select)

This PR improves the situation by updating the code generation for
float-to-int conversions in rustc, specifically only for WebAssembly
targets and only for some situations (float-to-u8 still has not great
codegen). The fix here is to use basic blocks and control flow to avoid
speculatively executing `fptosi`, and instead LLVM's raw intrinsic for
the WebAssembly instruction is used instead. This effectively extends
the support added in rust-lang#74659 to checked casts. After this commit the
codegen for the above Rust function looks like:

    (func $cast (type 0) (param f64) (result i32)
      (local i32)
      block  ;; label = @1
        local.get 0
        f64.const 0x0p+0 (;=0;)
        f64.ge
        local.tee 1
        i32.const 1
        i32.xor
        br_if 0 (;@1;)
        local.get 0
        f64.const 0x1.fffffffep+31 (;=4.29497e+09;)
        f64.le
        i32.eqz
        br_if 0 (;@1;)
        local.get 0
        i32.trunc_f64_u
        return
      end
      i32.const -1
      i32.const 0
      local.get 1
      select)

For reference, in Rust 1.44, which did not have saturating
float-to-integer casts, the codegen LLVM would emit is:

    (func $cast (type 0) (param f64) (result i32)
      block  ;; label = @1
        local.get 0
        f64.const 0x1p+32 (;=4.29497e+09;)
        f64.lt
        local.get 0
        f64.const 0x0p+0 (;=0;)
        f64.ge
        i32.and
        i32.eqz
        br_if 0 (;@1;)
        local.get 0
        i32.trunc_f64_u
        return
      end
      i32.const 0)

So we're relatively close to the original codegen, although it's
slightly different because the semantics of the function changed where
we're emulating the `i32.trunc_sat_f32_s` instruction rather than always
replacing out-of-bounds values with zero.

There is still work that could be done to improve casts such as `f32` to
`u8`. That form of cast still uses the `fptosi` instruction which
generates lots of branch-y code. This seems less important to tackle now
though. In the meantime this should take care of most use cases of
floating-point conversion and as a result I'm going to speculate that
this...

Closes rust-lang#73591
@bors bors closed this as completed in 2c1b046 Aug 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. O-wasm Target: WASM (WebAssembly), http://webassembly.org/
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants