i686 floating point behavior does not agree with unit tests in debug mode #73288

ecstatic-morse · 2020-06-12T17:10:26Z

Compiling and running the following code with -C opt-level=0 for 32-bit Linux on a recent nightly (2020-06-07) results in a failing assertion.

fn main() {
    let masked_nan2 = f32::NAN.to_bits() ^ 0x0055_5555;
    assert_eq!(f32::from_bits(masked_nan2).to_bits(), masked_nan2);
}

rustc +nightly --target=i686-unknown-linux-gnu test.rs && ./test

Output:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `2144687445`,
 right: `2140493141`', test.rs:3:5

However, this code is taken directly from a unit test for libstd added in #46012, so presumably it should succeed.

rust/src/libstd/f32.rs

Lines 1508 to 1514 in 5949391

    
           let masked_nan1 = f32::NAN.to_bits() ^ 0x002A_AAAA; 
        
           let masked_nan2 = f32::NAN.to_bits() ^ 0x0055_5555; 
        
           assert!(f32::from_bits(masked_nan1).is_nan()); 
        
           assert!(f32::from_bits(masked_nan2).is_nan()); 
        
           assert_eq!(f32::from_bits(masked_nan1).to_bits(), masked_nan1); 
        
           assert_eq!(f32::from_bits(masked_nan2).to_bits(), masked_nan2);

The text was updated successfully, but these errors were encountered:

RalfJung · 2020-06-12T17:15:59Z

This is probably caused by x87 instructions... I think we have an issue about that somewhere already, not sure where though.
Cc @hanna-kruppe

ecstatic-morse · 2020-06-12T18:22:17Z

Indeed. After the mask, the value in masked_nan2 is the following. On x86, this value represents a signalling NaN.

0 11111111 00101010101010101010101
    exp           significand

The call to f32::from_bits puts the same value onto the top of the x87 FPU stack with an FLD instruction before returning. After this is called, the #IA flag (invalid arithmetic operand exception) is set in the x87 status register, and the value on top of the stack is an 80-bit quiet NaN (0x7fffc000010000000000). According to the reference manual, this occurs only if the operand to FLD is not also a double extended-precision float.

When the return value is popped off the FPU stack in main and stored in memory with an FSTP instruction, the signalling/quiet bit (the most-significant bit of the significand) is set. On x86, this represents a quiet NaN.

0 11111111 10101010101010101010101
    exp           significand

hanna-kruppe · 2020-06-12T20:13:51Z

Independently of x87, the intent behind the test is on shaky ground. LLVM optimizations do not generally preserve NaN bit strings, see e.g. #55131.

ecstatic-morse · 2020-06-12T20:24:29Z

Seems like we should update the docs for {f32,f64}::from_bits then? Currently, they claim

This is currently identical to transmute::<u32, f32>(v) on all platforms

and

this implementation favors preserving the exact bits

hanna-kruppe · 2020-06-12T20:26:43Z

The first part isn't wrong, as far as I can tell. It literally is implemented as transmute, it's just that handling any float values in general does not always "preserves the exact bits".

ecstatic-morse · 2020-06-12T20:31:46Z

fn main() {
    let snan: u32 = 0xff << 23 | 1;

    let round_trip = unsafe {
        let float: f32 = std::mem::transmute(snan);
        std::mem::transmute(float)
    };

    assert_eq!(snan, round_trip);
    // assert_eq!(f32::from_bits(snan).to_bits(), snan);
}

@hanna-kruppe This succeeds in debug mode unless the last line is uncommented, so the docs are misleading at the very least.

hanna-kruppe · 2020-06-12T20:46:33Z

I believe that difference is due to the transmute getting const-eval'd, the LLVM IR just stores the same u32 bit pattern to the stack several times. While this does end up making a difference today:

Several changes to rustc could make the difference disappear again (enabling MIR inlining, not running const-prop in debug builds, forbidding const-prop from evaluating tranmsutes, etc.)
Conversely, all sorts of harmless changes to the version that uses transmute directly can break it. For example, anything that stops constant propagation. Or introducing your own function boundaries.

It's not that the functions in std do something different from transmute, it's that -- contrary to all usual intuition -- function boundaries and IR optimizations matter for this behavior. This whole situation is an incredibly mess and headache, and picking on this one sentence from the docs is not going to help clarify it. If we want to give users proper guidelines about this, we need to go back to the drawing board and figure out what the rules actually are and should be.

ecstatic-morse · 2020-06-12T21:26:41Z

I believe that difference is due to the transmute getting const-eval'd

If you mean MIR const propagated, it does not. Not sure about LLVM const propagation, but adding black_box and wrapping the round trip in a function call did not change behavior. I believe the fact that std::mem::transmute is direct re-export of the intrinsic accounts for the difference.

I'm not saying that this area isn't a mess, but the docs from_bits don't acknowledge this at all, and are trivially wrong in the case I've shown above. I'm going to update them to reflect this.

ecstatic-morse · 2020-06-12T21:30:56Z

@Mark-Simulacrum Besides the unresolved problems around NaN payloads, this raises another issue: Should we be running unit tests for the standard library in debug mode on CI?

As far as I know, we don't distribute a non-optimized libstd, but maybe we should be building one and running its tests as part of the nopt CI jobs?

Mark-Simulacrum · 2020-06-13T02:36:17Z

I am fairly certain (hopeful, I guess) that most of our builders do have debug assertions enabled... however, it sounds like what you actually want here is a debug libstd. I don't know how feasible that is from a performance perspective (I imagine running those tests is considerably slower?).

We do actually have a i686-gnu-nopt builder, and I think some other nopt builders, (https://github.com/rust-lang/rust/blob/master/src/ci/docker/i686-gnu-nopt/Dockerfile#L23). That's currently configured to not enable optimizations for tests, but that only affects compiletest-run tests. For tests like this which are heavily dependent on such, we can move them to run-pass tests to get benefits of debug testing. Note that debug assertions are currently disabled on that builder due to CI time concerns.

Mark-Simulacrum · 2020-06-13T02:37:49Z

Basically, as with all such questions, it comes down to someone sitting down and timing how long a debug build of std/std tests + running those tests would take, then timing some current task (e.g. release std tests) and scaling that to CI time. If that's sufficiently small in CI timescales, we can add it, if not we'll have to not do so.

RalfJung · 2020-06-13T08:53:11Z

In terms of the bug here, should we have a proper meta-bug (maybe even labelled I-unsound) about LLVM FP semantics around NaN payloads being a mess? This is not immediately actionable, but it is certainly something we should track in a single place. Or should we convert either this or #55131 into that meta-bug?

Cc @rust-lang/lang -- the short version is, looking at the bits of NaNs in LLVM is broken. :/

ecstatic-morse · 2020-06-13T20:07:44Z

@RalfJung I went ahead and opened #73328 to track documenting our guarantees around NaN. I think we can close this issue once we remove the tests that are relying on unspecified behavior. I can look into running libstd unit tests in debug mode on CI. Perhaps we just want to make the existing floating point unit tests into UI tests to see if there's any other issues like this?

ecstatic-morse · 2020-06-13T21:19:28Z

This is a duplicate of #46948.

RalfJung · 2020-06-14T07:20:34Z

I can look into running libstd unit tests in debug mode on CI.

Don't our existing debug CI builders already do that?
Maybe not for i686 though.

EDIT: Ah I see, the point is that we do it for ui tests but not unit tests.
FWIW, unit tests are in general preferable because we can more easily run them in Miri.

RalfJung · 2020-06-14T07:29:33Z

This is a duplicate of #46948.

So which of these would you prefer to close?

nikomatsakis · 2020-06-15T17:47:58Z

the short version is, looking at the bits of NaNs in LLVM is broken. :/

From reading the thread, it sounds like the bits of a NaN can basically change arbitrarily, as a result of LLVM optimizations that are not guaranteed to preserve them? (And other things, I suppose)

…r=Mark-Simulacrum Run standard library unit tests without optimizations in `nopt` CI jobs This was discussed in rust-lang#73288 as a way to catch similar issues in the future. This builds an unoptimized standard library with the bootstrap compiler and runs the unit tests. This takes about 2 minutes on my laptop. I confirmed that this method works locally, although there may be a better way of implementing it. It would be better to use the stage 2 compiler instead of the bootstrap one. Notably, there are currently four `libstd` unit tests that fail in debug mode on `i686-unkown-linux-gnu` (a tier one target): ``` failures: f32::tests::test_float_bits_conv f32::tests::test_total_cmp f64::tests::test_float_bits_conv f64::tests::test_total_cmp ``` These are the tests that prompted rust-lang#73288 as well as the ones added in rust-lang#72568, which is currently broken due to rust-lang#73328.

RalfJung · 2020-09-14T16:53:05Z

I always thought the x86-specific problems that arise due to x87 are side-stepped when SSE2 is enabled. But now @ecstatic-morse points out that this bug still remains. How is that possible, isn't SSE2 replacing the x87 mess?

RalfJung · 2020-09-14T17:09:00Z

@ecstatic-morse wrote

Yes. The x86 calling convention mandates that floating point values are returned on the FPU stack. Values on the FPU stack are extended-precision, so storing them into an 8-byte f64 involves truncation and thus is an "arithmetic operation", which canonicalizes NaNs, according to the x86 manual.

So I guess that also alleviates any question about whether LLVM could adjust codegen to not have this problem.

But this also points to a way out, doesn't it? If we adopt a model where we say something like "floating-point operations may change the NaN bits in a non-deterministic way" (which is what wasm does so arguably LLVM has to support it), then we "just" have to also say that as a special exception, on x86-32, returning floating-point values from a function also is considered a "floating-point operation". (Does this also affect arguments passed to a function?)

ecstatic-morse · 2020-09-14T17:16:51Z

The x87 stack is only used for the return value, not for passing arguments. That formulation would be correct on i686 AFAIK.

An alternative would be to use a different calling convention for extern "Rust" functions on 32-bit x86. I think this would break a lot of tooling, however, and we would still have this problem across an FFI boundary. It's also a lot more work.

programmerjake · 2020-09-14T18:11:41Z

If we adopt a model where we say something like "floating-point operations may change the NaN bits in a non-deterministic way"

the above is true with some important exceptions:
no operation (other than a bitcast from an integer or conversion from text to fp) produces a signaling NaN where the input did not have a signaling NaN, and all arithmetic operations (other than sign-bit manipulation) only produce quiet NaNs when they produce NaNs.

RalfJung · 2020-09-14T18:14:48Z

The wasm spec doesn't guarantee anything like that, does it? We cannot guarantee more than our compilation targets do.

Or maybe that's what the "arithmetic NaN" stuff is about, which I didn't really understand.

programmerjake · 2020-09-14T18:26:15Z

The wasm spec doesn't guarantee anything like that, does it? We cannot guarantee more than our compilation targets do.

Or maybe that's what the "arithmetic NaN" stuff is about, which I didn't really understand.

arithmetic NaN is another name for quiet NaN on all but a few unusual platforms: HPPA and MIPS pre-2008.

See the definitions for PlatformProperties in the simple-soft-float crate, which I wrote.

workingjubilee · 2021-12-05T20:54:02Z

Is it possible for us to adjust the Rust float calling convention for the pentium4-* ("i686") targets so that they always use the xmm registers for Rust code (essentially requiring they follow the x86-64 ABI conventions), never passing through the x87 stack-registers (unless a value is given to us from outside of Rust, which is of course not something we can meaningfully control)?

bjorn3 · 2021-12-05T20:57:56Z

It should be possible for the rust abi, though I don't know how to do this with LLVM. For the C abi I think the only option that doesn't break compatibility with existing C compilers would be to roundtrip between x87 and xmm registers to remove the excess precision.

workingjubilee · 2021-12-05T21:27:45Z

I think that would be fine. If we implemented that strategy, that would resolve this bug for Rust for the i686 target, and any other unusual behavior would be functionally Not A Bug: if you want to be sure you are actually moving 64 bits across an FFI barrier, then you can pass it as a u64 and remake it as a float on the other side, otherwise we should follow whatever idiosyncratic behavior a given ABI mandates.

About the previous #73288 (comment) in this thread, I am actually somewhat skeptical as to how much tooling it would break (not that I think it would be zero) as opposed to how many issues would be closed (which would be at least one). For instance, any debuggers should be looking at all the registers, not just the x87 stack, and MSVC actually sets the x87 stack to a different precision than other C compilers, apparently, so we should be avoiding it in general if we want portable results: https://randomascii.wordpress.com/2012/03/21/intermediate-floating-point-precision/

programmerjake · 2021-12-06T03:07:13Z

apparently the x87 stack can be avoided by using the fastcc calling convention or by marking the floating-point return value as inreg:
https://github.com/llvm/llvm-project/blob/e3c2694da98d9e6585b47cebfedce8473f679fff/llvm/lib/Target/X86/X86CallingConv.td#L267-L273

// X86-32 C return-value convention.
def RetCC_X86_32_C : CallingConv<[
  // The X86-32 calling convention returns FP values in FP0, unless marked
  // with "inreg" (used here to distinguish one kind of reg from another,
  // weirdly; this is really the sse-regparm calling convention) in which
  // case they use XMM0, otherwise it is the same as the common X86 calling
  // conv.

workingjubilee · 2022-02-18T00:19:59Z

Belatedly: if that is the case, then that answers why this changes! Our functions wind up annotated with fastcc in LLVM after we run optimizations, but not before, so we actually change our calling conventions when calling into Rust code. We can close this bug simply by tuning rustc to be more insistent and consistent in passing arguments.

RalfJung · 2023-06-28T11:20:31Z

I propose that we close this issue by documenting this as a known non-compliance on x86-32 targets: #113053.

RalfJung · 2023-08-04T18:54:51Z

Closing in favor of a dedicated tracking issue that summarizes what we know about the issue: #115567.

…bilee add notes about non-compliant FP behavior on 32bit x86 targets Based on ton of prior discussion (see all the issues linked from rust-lang/unsafe-code-guidelines#237), the consensus seems to be that these targets are simply cursed and we cannot implement the desired semantics for them. I hope I properly understood what exactly the extent of the curse is here, let's make sure people with more in-depth FP knowledge take a close look! In particular for the tier 3 targets I have no clue which target is affected by which particular variant of the x86_32 FP curse. I assumed that `i686` meant SSE is used so the "floating point return value" is the only problem, while everything lower (`i586`, `i386`) meant x87 is used. I opened rust-lang#114479 to concisely describe and track the issue. Cc `@workingjubilee` `@thomcc` `@chorman0773` `@rust-lang/opsem` Fixes rust-lang#73288 Fixes rust-lang#72327

Rollup merge of rust-lang#113053 - RalfJung:x86_32-float, r=workingjubilee add notes about non-compliant FP behavior on 32bit x86 targets Based on ton of prior discussion (see all the issues linked from rust-lang/unsafe-code-guidelines#237), the consensus seems to be that these targets are simply cursed and we cannot implement the desired semantics for them. I hope I properly understood what exactly the extent of the curse is here, let's make sure people with more in-depth FP knowledge take a close look! In particular for the tier 3 targets I have no clue which target is affected by which particular variant of the x86_32 FP curse. I assumed that `i686` meant SSE is used so the "floating point return value" is the only problem, while everything lower (`i586`, `i386`) meant x87 is used. I opened rust-lang#114479 to concisely describe and track the issue. Cc `@workingjubilee` `@thomcc` `@chorman0773` `@rust-lang/opsem` Fixes rust-lang#73288 Fixes rust-lang#72327

ecstatic-morse added C-bug Category: This is a bug. O-x86 labels Jun 12, 2020

ecstatic-morse changed the title ~~i686 floating point behavior does not agree with unit tests~~ i686 floating point behavior does not agree with unit tests in debug mode Jun 12, 2020

ecstatic-morse mentioned this issue Jun 12, 2020

Const floating point bitcasts and classification #72449

Merged

RalfJung mentioned this issue Jun 13, 2020

Tracking issue for #![feature(const_fn_floating_point_arithmetic)] #57241

Open

RalfJung added T-lang Relevant to the language team, which will review and decide on the PR/issue. A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. labels Jun 13, 2020

ecstatic-morse mentioned this issue Jun 13, 2020

Document guarantees (or lack thereof) regarding sign, quietness, and payload of NaNs #73328

Open

ecstatic-morse mentioned this issue Jun 14, 2020

Some num tests fail for {i586, i686, x86_64}-unknown-linux-gnu #46948

Closed

ecstatic-morse mentioned this issue Jun 15, 2020

Run standard library unit tests without optimizations in nopt CI jobs #73383

Merged

ecstatic-morse added the A-floating-point Area: Floating point numbers and arithmetic label Jun 15, 2020

This was referenced Aug 7, 2020

Our floating point semantics were a mess rust-lang/unsafe-code-guidelines#237

Open

Comparing to infinity is buggy on x87 #72327

Closed

RalfJung mentioned this issue Sep 9, 2020

Tracking Issue for total_cmp (on f32/f64) #72599

Closed

6 tasks

workingjubilee mentioned this issue Oct 14, 2020

Floats are problematic rust-lang/portable-simd#39

Open

4 tasks

bstrie mentioned this issue Feb 14, 2021

Tracking issue for #![feature(const_float_bits_conv)] #72447

Open

Amanieu mentioned this issue Jun 10, 2021

wasm: Mark simd intrinsics as stable rust-lang/stdarch#1179

Merged

SkiFire13 mentioned this issue Oct 30, 2021

Added next_up and next_down for f32/f64. #88728

Closed

andy-thomason mentioned this issue Nov 23, 2021

32-bit Windows cannot handle Rfloat::na() extendr/extendr#321

Closed

CAD97 mentioned this issue Jan 24, 2023

Rust does not comply with IEEE 754 floats: arithmetic can produce signaling NaN #107247

Closed

workingjubilee added the A-target-feature Area: Enabling/disabling target features like AVX, Neon, etc. label Mar 3, 2023

RalfJung mentioned this issue Jun 28, 2023

add notes about non-compliant FP behavior on 32bit x86 targets #113053

Merged

RalfJung mentioned this issue Aug 4, 2023

Tracking issue: 32bit x86 targets without SSE2 have unsound floating point behavior #114479

Open

RalfJung closed this as completed Aug 4, 2023

RalfJung mentioned this issue Sep 5, 2023

Tracking issue: 32bit x86 targets lose float NaN payload in return values #115567

Open

Nilstrieb added O-x86_32 Target: x86 processors, 32 bit (like i686-*) and removed O-x86-all labels Oct 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

i686 floating point behavior does not agree with unit tests in debug mode #73288

i686 floating point behavior does not agree with unit tests in debug mode #73288

ecstatic-morse commented Jun 12, 2020 •

edited

RalfJung commented Jun 12, 2020

ecstatic-morse commented Jun 12, 2020 •

edited

hanna-kruppe commented Jun 12, 2020

ecstatic-morse commented Jun 12, 2020 •

edited

hanna-kruppe commented Jun 12, 2020

ecstatic-morse commented Jun 12, 2020 •

edited

hanna-kruppe commented Jun 12, 2020

ecstatic-morse commented Jun 12, 2020 •

edited

ecstatic-morse commented Jun 12, 2020 •

edited

Mark-Simulacrum commented Jun 13, 2020

Mark-Simulacrum commented Jun 13, 2020

RalfJung commented Jun 13, 2020 •

edited

ecstatic-morse commented Jun 13, 2020

ecstatic-morse commented Jun 13, 2020

RalfJung commented Jun 14, 2020 •

edited

RalfJung commented Jun 14, 2020

nikomatsakis commented Jun 15, 2020

RalfJung commented Sep 14, 2020

RalfJung commented Sep 14, 2020

ecstatic-morse commented Sep 14, 2020 •

edited

programmerjake commented Sep 14, 2020

RalfJung commented Sep 14, 2020 •

edited

programmerjake commented Sep 14, 2020

workingjubilee commented Dec 5, 2021

bjorn3 commented Dec 5, 2021 •

edited

workingjubilee commented Dec 5, 2021

programmerjake commented Dec 6, 2021

workingjubilee commented Feb 18, 2022

RalfJung commented Jun 28, 2023

RalfJung commented Aug 4, 2023 •

edited

i686 floating point behavior does not agree with unit tests in debug mode #73288

i686 floating point behavior does not agree with unit tests in debug mode #73288

Comments

ecstatic-morse commented Jun 12, 2020 • edited

RalfJung commented Jun 12, 2020

ecstatic-morse commented Jun 12, 2020 • edited

hanna-kruppe commented Jun 12, 2020

ecstatic-morse commented Jun 12, 2020 • edited

hanna-kruppe commented Jun 12, 2020

ecstatic-morse commented Jun 12, 2020 • edited

hanna-kruppe commented Jun 12, 2020

ecstatic-morse commented Jun 12, 2020 • edited

ecstatic-morse commented Jun 12, 2020 • edited

Mark-Simulacrum commented Jun 13, 2020

Mark-Simulacrum commented Jun 13, 2020

RalfJung commented Jun 13, 2020 • edited

ecstatic-morse commented Jun 13, 2020

ecstatic-morse commented Jun 13, 2020

RalfJung commented Jun 14, 2020 • edited

RalfJung commented Jun 14, 2020

nikomatsakis commented Jun 15, 2020

RalfJung commented Sep 14, 2020

RalfJung commented Sep 14, 2020

ecstatic-morse commented Sep 14, 2020 • edited

programmerjake commented Sep 14, 2020

RalfJung commented Sep 14, 2020 • edited

programmerjake commented Sep 14, 2020

workingjubilee commented Dec 5, 2021

bjorn3 commented Dec 5, 2021 • edited

workingjubilee commented Dec 5, 2021

programmerjake commented Dec 6, 2021

workingjubilee commented Feb 18, 2022

RalfJung commented Jun 28, 2023

RalfJung commented Aug 4, 2023 • edited

ecstatic-morse commented Jun 12, 2020 •

edited

ecstatic-morse commented Jun 12, 2020 •

edited

ecstatic-morse commented Jun 12, 2020 •

edited

ecstatic-morse commented Jun 12, 2020 •

edited

ecstatic-morse commented Jun 12, 2020 •

edited

ecstatic-morse commented Jun 12, 2020 •

edited

RalfJung commented Jun 13, 2020 •

edited

RalfJung commented Jun 14, 2020 •

edited

ecstatic-morse commented Sep 14, 2020 •

edited

RalfJung commented Sep 14, 2020 •

edited

bjorn3 commented Dec 5, 2021 •

edited

RalfJung commented Aug 4, 2023 •

edited