Wrong signs on division producing NaN #55131

dtolnay · 2018-10-16T20:14:28Z

Noticed this while playing with #54235.

fn f(x: f64) -> f64 {
    0f64 / x
}

fn main() {
    println!("{:?}", (0f64 / 0f64).is_sign_negative());
    println!("{:?}", f(0f64).is_sign_negative());
}

As of rustc 1.31.0-nightly (46880f4 2018-10-15) on x86_64-unknown-linux-gnu, in debug mode this program prints false true and in release mode prints false false. Two of my expectations are violated:

The output should be consistent between debug mode and release mode.
The first and second println should print the same value.

(Happy to reconsider if these expectations are unfounded.)

dtolnay · 2018-10-16T20:16:46Z

Compilers 1.19 and older consistently print false false which aligns with my expectations; 1.20 and newer behave as above.

hanna-kruppe · 2018-10-16T23:23:47Z

Evidently LLVM does not guarantee the sign of NaNs, just as it does not guarantee the signaling bit or payload. I can't say I would have known that, but it doesn't surprise me either.

Two observations that explain these discrepancies:

(0f64 / 0f64) is constant folded even in debug mode (by IRBuilder), while f(0f64) obviously is only constant folded when inlined, i.e., in release mode.
When constant folding a floating point computation that results in a NaN, LLVM prefers 0x7FF8000000000000 (which has positive sign). Apparently your CPU differs and produces a negative NaN for the runtime division.
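
To see which NaN a given build actually produces, the bit pattern can be decoded by hand. The following is only a sketch: `NanFields` and `decode_nan` are hypothetical helpers (not part of std), and `std::hint::black_box` is used on the assumption that it keeps the optimizer from folding the second division, which it does only on a best-effort basis.

```rust
use std::hint::black_box;

/// The parts of an IEEE 754 binary64 bit pattern that distinguish one NaN from another.
/// (Hypothetical helper type for this sketch.)
#[derive(Debug, PartialEq)]
struct NanFields {
    negative: bool,
    quiet: bool,
    payload: u64,
}

/// Decode a raw bit pattern; returns None if it does not encode a NaN.
fn decode_nan(bits: u64) -> Option<NanFields> {
    const EXP_MASK: u64 = 0x7FF0_0000_0000_0000;
    const MANT_MASK: u64 = 0x000F_FFFF_FFFF_FFFF;
    const QUIET_BIT: u64 = 0x0008_0000_0000_0000;

    let mantissa = bits & MANT_MASK;
    // A NaN has an all-ones exponent and a non-zero mantissa.
    if bits & EXP_MASK != EXP_MASK || mantissa == 0 {
        return None;
    }
    Some(NanFields {
        negative: bits >> 63 == 1,
        quiet: mantissa & QUIET_BIT != 0,
        payload: mantissa & !QUIET_BIT,
    })
}

fn main() {
    // Likely constant folded, even in debug mode.
    let folded = 0f64 / 0f64;
    // Operands hidden from the optimizer, so the division likely happens at run time.
    let runtime = black_box(0f64) / black_box(0f64);

    // The two lines may differ in the `negative` field depending on target and build mode.
    println!("folded:  {:?}", decode_nan(folded.to_bits()));
    println!("runtime: {:?}", decode_nan(runtime.to_bits()));
}
```

With this decoding, 0x7FF8000000000000 is a positive quiet NaN with zero payload, and 0xFFF8000000000000 is the same NaN with the sign bit set.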

RalfJung · 2020-03-03T07:21:53Z

In other words, the semantics of floating point operations would be something like "if the result is a NaN, non-deterministically pick any legal NaN representation". This non-determinism explains why debug and release builds differ in behavior.

I wonder if we should make Miri pick a random NaN payload and sign and signalling bit, just to drive home this point...

RalfJung · 2020-04-24T14:24:31Z

@hanna-kruppe notes that "NaNs are unstable under copying" seems rather excessive and in fact people might rely on NaN payloads being preserved on copy.

A less drastic alternative is to say that every single FP operation (arithmetic and intrinsics and whatnot, but not copying), when it returns a NaN, non-deterministically picks any NaN representation.
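
Under that reading, code that compares or hashes float bits would have to normalize NaNs first. A minimal sketch of the idea, where `canonical_bits` and `CANONICAL_NAN_BITS` are hypothetical names and the choice of canonical pattern is an assumption made for illustration:

```rust
/// Bit pattern used for every NaN in this sketch (positive, quiet, zero payload).
/// The specific choice is an assumption; any fixed NaN pattern would do.
const CANONICAL_NAN_BITS: u64 = 0x7FF8_0000_0000_0000;

/// Map every NaN to one fixed bit pattern; leave all other values untouched.
fn canonical_bits(x: f64) -> u64 {
    if x.is_nan() {
        CANONICAL_NAN_BITS
    } else {
        x.to_bits()
    }
}

fn main() {
    let a = 0f64 / 0f64;
    let b = -a; // still a NaN, possibly with a different sign bit

    // Raw bits may or may not match; the canonical form always does.
    assert_eq!(canonical_bits(a), canonical_bits(b));

    // Non-NaN values keep their exact bits, including the sign.
    assert_ne!(canonical_bits(1.0), canonical_bits(-1.0));
}
```

This discards payloads on purpose, so it would not suit code that stores data in NaN payloads.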

hanna-kruppe · 2020-04-24T14:32:03Z

I believe it was @Lokathor who made this point, though I don't disagree.

However, I have doubts whether either option is enough to explain away the behavior LLVM can produce today. Don't have time to summarize but here's a link to the Zulip discussion for future reference: https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/floating.20point.20semantics

RalfJung · 2020-04-24T14:51:40Z

> However, I have doubts whether either option is enough to explain away the behavior LLVM can produce today.

I did not see anything in that discussion that makes it sound like either option wouldn't work -- my current impression is that both correctly describe LLVM behavior. What did I miss? (Not urgent, just respond when you have time again.)

hanna-kruppe · 2020-04-24T15:54:24Z

Specifically https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/floating.20point.20semantics/near/194786318 and the whole earlier discussion about how combinations of other optimizations can result in different uses of the same value (in Rust / the initial LLVM IR) observing different results. We talked about how floats should perhaps be "frozen" when moving into the integer domain, but this does not currently happen. As I said in the message linked above, LLVM can currently eliminate the float<->int bitcasts/transmutes/etc. that we do have (even if one might argue that it shouldn't).

RalfJung · 2020-04-24T17:22:51Z

Hm okay if LLVM will duplicate casts then that would indeed contradict a "typed copy messes up NaN" semantics.

For the "FP operations pick arbitrary NaN" semantics, I suppose LLVM will also happily duplicate floating point operations since it considers them deterministic?

But together with "NaNs are not preserved", that actually leads to a contradiction, and if we can make LLVM do the right optimizations in the right order we can likely show a miscompilation from this.

hanna-kruppe · 2020-04-24T17:28:15Z

Right, I believe there's potential miscompilations lurking there, but they're probably very difficult to tease out -- maybe even impossible today, if the stars don't align.

RalfJung · 2020-09-09T11:43:05Z

Would it be worth bringing this up with LLVM? Seems like either they should clarify that NaN payloads are not preserved by some of their FP operations, or else they should consider this a bug. The former might be a problem because people compile browsers with LLVM and those browsers' JS/wasm runtimes might want to actually carry data in NaN payloads...

programmerjake · 2020-09-09T17:25:05Z

> In other words, the semantics of floating point operations would be something like "if the result is a NaN, non-deterministically pick any legal NaN representation". This non-determinism explains why debug and release builds differ in behavior.
>
> I wonder if we should make Miri pick a random NaN payload and sign and signalling bit, just to drive home this point...

One note: the IEEE 754 fp standard requires the result of arithmetic operations to not be signaling NaNs.

workingjubilee · 2020-09-12T00:23:46Z

In practice, what the IEEE 754 FP standard says and what the implementation does are very different things, per #10186.

RalfJung · 2020-09-12T07:46:48Z

If we follow wasm, then Miri could pick any arithmetic NaN. Whether and how that aligns with being signalling or not, I do not know.

RalfJung · 2022-11-23T12:46:32Z

Based on this I am inclined to declare this not-a-bug: NaN-producing operations do not have a well-defined sign, so there cannot be a 'wrong' sign. This is the semantics both in LLVM and wasm. I think Rust should follow suit.

Muon · 2023-01-24T03:16:16Z

This is definitely permissible according to IEEE 754. The only guarantee is that the result of 0/0 is a quiet NaN. The sign bit is not required to be the same between two divisions. Although the target FPU usually produces only specific NaNs, Rust does not (presently) promise that it upholds the semantics of the target FPU.
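
For code that needs a predictable sign on a NaN, one option is to set it explicitly after the division. The sketch below assumes that `f64::abs` acts as a pure sign-bit operation on NaNs, as IEEE 754 specifies for abs; `sign_after_abs` is a hypothetical helper, not a std function.

```rust
use std::hint::black_box;

/// Report the sign bit after clearing it with `abs`.
/// Assumption: `abs` only clears the sign bit, for NaNs as for other values.
fn sign_after_abs(x: f64) -> bool {
    x.abs().is_sign_negative()
}

fn main() {
    let folded = 0f64 / 0f64;
    let runtime = black_box(0f64) / black_box(0f64);

    // The raw sign is unspecified and may differ between these two values.
    println!("{:?} {:?}", folded.is_sign_negative(), runtime.is_sign_negative());

    // After `abs`, both are expected to report a cleared sign bit.
    println!("{:?} {:?}", sign_after_abs(folded), sign_after_abs(runtime));
}
```

Tests that only care whether a result is a NaN can avoid the question entirely by asserting on `is_nan()` instead of on the sign.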

RalfJung · 2023-08-04T19:06:43Z

Closing in favor of #73328: we are not guaranteeing anything about the sign of a NaN produced by 0.0 / 0.0. (This matches, for instance, the WebAssembly specification.) Better documentation of all this is clearly required, that's what the other issue is about.

hanna-kruppe mentioned this issue Oct 16, 2018

Debug-print negative not-a-number as "-NaN" #54235

Closed

estebank added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Jan 19, 2019

hanna-kruppe mentioned this issue Mar 2, 2020

Miri floating point NaN conversion issue #69532

Closed

RalfJung added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Mar 3, 2020

RalfJung mentioned this issue Apr 20, 2020

powi: Different rounding behaviour between debug and release mode, and across platforms #71355

Closed

jyn514 mentioned this issue May 21, 2020

/ is not the same as f64::div for NaN #72411

Closed

hanna-kruppe mentioned this issue Jun 12, 2020

i686 floating point behavior does not agree with unit tests in debug mode #73288

Closed

RalfJung mentioned this issue Jun 13, 2020

Tracking issue for #![feature(const_fn_floating_point_arithmetic)] #57241

Open

This was referenced Jun 13, 2020

Document guarantees (or lack thereof) regarding sign, quietness, and payload of NaNs #73328

Open

Tracking Issue for total_cmp (on f32/f64) #72599

Closed

ecstatic-morse added the A-floating-point Area: Floating point numbers and arithmetic label Jun 15, 2020

RalfJung mentioned this issue Sep 9, 2020

Our floating point semantics were a mess rust-lang/unsafe-code-guidelines#237

Open

RalfJung mentioned this issue Feb 20, 2022

Incorrect f32::NAN as u64/u128 conversion rust-lang/rustc_codegen_gcc#75

Open

scottmcm mentioned this issue Sep 28, 2022

Sign of zero from differs between CTFE and normal execution (float % float) #102403

Closed

lukas-code mentioned this issue Sep 28, 2022

Value differs between debug and release: 1.0 / ((-0.0) * black_box_zero) #102402

Closed

RalfJung closed this as completed Aug 4, 2023
