
Our floating point semantics were a mess #237

Open
RalfJung opened this issue Jun 14, 2020 · 51 comments
Labels
A-floats Topic: concerns floating point operations/representations C-open-question Category: An open question that we should revisit

Comments

@RalfJung
Member

RalfJung commented Jun 14, 2020

Floating-point semantics are hard, in particular the NaN part, but this should describe them accurately -- except on x86-32, for more than one reason.

We still need to officially decide that those are our intended semantics though. rust-lang/rust#73328 tracks that on the rustc side.

Historic info

There are several ways in which our de-facto FP semantics currently are broken:

I'm adding this here because figuring out the semantics of Rust is part of the goal of the UCG (I think?) and we should have an overarching "FP semantics" tracker somewhere.

@workingjubilee

Worth noting: we can't expect floating point to be handled consistently on all architectures and over all datatypes that use floating points, yet it might be confusing if floating points have wildly different behavior based on the box they're in. rust-lang/rfcs#2977 (comment)

@scottmcm
Member

This also reminds me of rust-lang/rust#75786 -- We should probably have an explicit accuracy promise documented somewhere, including for library functions.

(Hopefully we can at least promise ±1 ULP everywhere, though some things really ought to be ±½ ULP.)
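For concreteness, "within N ULPs" can be checked by comparing bit patterns, since same-sign finite IEEE 754 floats are monotonically ordered as integers. A minimal sketch (the `ulp_distance` helper here is hypothetical, not part of `std`):

```rust
// Hypothetical helper: distance in ULPs between two finite f64s of the
// same sign, useful for checking a "±1 ULP" accuracy claim.
fn ulp_distance(a: f64, b: f64) -> u64 {
    assert!(a.is_finite() && b.is_finite());
    assert_eq!(a.is_sign_negative(), b.is_sign_negative());
    // Mask off the sign bit; for same-sign finite floats the remaining
    // bit patterns are monotonically ordered, so ULP distance is just
    // integer distance.
    let mask = !(1u64 << 63);
    (a.to_bits() & mask).abs_diff(b.to_bits() & mask)
}

fn main() {
    let x = 1.0f64;
    let next = f64::from_bits(x.to_bits() + 1); // next representable value above 1.0
    assert_eq!(ulp_distance(x, next), 1);
    assert_eq!(ulp_distance(x, x), 0);
}
```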

@RalfJung
Member Author

RalfJung commented Sep 12, 2020

Worth noting: we can't expect floating point to be handled consistently on all architectures and over all datatypes that use floating points

Usually, that's why specs can be under-specified -- they can leave room for implementations to differ, either via non-determinism (that's what wasm does for NaNs; it would be interesting to know what they do for denormals) or via introducing unspecified target-specific "canonicalization" or similar mechanisms.

This also reminds me of rust-lang/rust#75786 -- We should probably have an explicit accuracy promise documented somewhere, including for library functions.

That's just trig (and other) functions being platform-specific (the division/multiplication in the title is a red herring). So it's related but the issue here is about how operations provided by the core language behave. We can worry about trig functions once we have that sorted out. ;)

@workingjubilee

workingjubilee commented Sep 20, 2020

Adding some background data: I recently discovered that Intel recommends setting the denormals-are-zero and flush-to-zero flags on their C++ compiler unless floating point denormal accuracy is application critical: https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/floating-point-operations/understanding-floating-point-operations/setting-the-ftz-and-daz-flags.html

In addition, the fact that a floating point op can vary in speed by 2 orders of magnitude is a rich source of data for timing attacks: https://cseweb.ucsd.edu/~dkohlbre/papers/subnormal.pdf

Also, explicit fused multiply-add ops are part of the IEEE754-2008 standard and usually can lead to increased precision by reducing the number of roundings that occur... but that can also reduce the accuracy of later operations that encode assumptions like "this is commutative, right?"
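The single-rounding effect can be shown with `f64::mul_add`, which `std` documents as computing `a * b + c` with only one rounding error (a fused multiply-add), whereas the unfused form rounds twice:

```rust
fn main() {
    let (a, b, c) = (0.1f64, 10.0f64, -1.0f64);
    // Fused: the exact product a*b plus c is rounded once.
    let fused = a.mul_add(b, c);
    // Unfused: a*b is rounded first, then c is added and rounded again.
    let unfused = a * b + c;
    // 0.1 * 10.0 rounds to exactly 1.0 as a double, so the unfused result
    // is 0.0; the fused result retains the residual error of representing
    // 0.1 in binary.
    assert_eq!(unfused, 0.0);
    assert!(fused != 0.0);
    println!("fused = {fused:e}, unfused = {unfused:e}");
}
```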

@RalfJung
Member Author

RalfJung commented Sep 20, 2020

In addition, the fact that a floating point op can vary in speed by 2 orders of magnitude is a rich source of data for timing attacks: https://cseweb.ucsd.edu/~dkohlbre/papers/subnormal.pdf

Speed is not currently part of the Rust specification, so that aspect, while certainly relevant in general, is unrelated to specifying floating-point behavior in Rust.

Also, explicit fused multiply-add ops are part of the IEEE754-2008 standard and usually can lead to increased precision by reducing the number of roundings that occur... but that can also reduce the accuracy of later operations that encode assumptions like "this is commutative, right?"

AFAIK LLVM does not introduce FMAs unless we explicitly set a flag, so right now I think there is no problem here. Unless you want to suggest we should introduce FMAs; that could complicate our floating-point story even further, but in ways that are orthogonal to the problems listed here. There was at least one attempt at an RFC here (rust-lang/rfcs#2686), but since it is a hypothetical future (we don't introduce FMAs right now) I don't think it needs tracking here.

Adding some background data: I recently discovered that Intel recommends setting the denormals-are-zero and flush-to-zero flags on their C++ compiler unless floating point denormal accuracy is application critical: https://software.intel.com/content/www/us/en/develop/documentation/cpp-compiler-developer-guide-and-reference/top/compiler-reference/floating-point-operations/understanding-floating-point-operations/setting-the-ftz-and-daz-flags.html

What does this mean for Rust programs? This sounds a bit like a fast-math style flag? Similar to the optimizer introducing FMAs, to my knowledge the complications around that are orthogonal to figuring out what the Rust semantics for completely standard IEEE floating point operations are when NaNs or weird platforms (i686) are involved.

Fast-math is its own can of worms, and very little is known about how to specify it precisely. In contrast, the issue here is mostly about figuring out what LLVM (and, by extension, hardware) actually does; writing a reasonable spec is easy ("copy wasm"), but making sure LLVM conforms to that spec is hard because, as usual, LLVM IR behavior is scarcely documented.

Or is this related to the denormal problem around SSE that you raised earlier?

@Lokathor
Contributor

I think a separate issue is warranted for denormals and fast-math issues; as long as a fix to our NaN and x87 issues doesn't block potential avenues there, they're almost entirely separate.

bors added a commit to rust-lang-ci/rust that referenced this issue Jan 23, 2021
…-obk

avoid promoting division, modulo and indexing operations that could fail

For division, `x / y` will still be promoted if `y` is a non-zero integer literal; however, `1/(1+1)` will not be promoted any more.

While at it, also see if we can reject promoting floating-point arithmetic (which are [complicated](rust-lang/unsafe-code-guidelines#237) so maybe we should not promote them).

This will need a crater run to see if there's code out there that relies on these things being promoted.

If we can land this, promoteds in `fn`/`const fn` cannot fail to evaluate any more, which should let us do some simplifications in codegen/Miri!

Cc rust-lang/rfcs#3027
Fixes rust-lang#61821
r? `@oli-obk`
@antoyo

antoyo commented Feb 19, 2022

Not sure where to post this comment, but rustc_codegen_gcc produces a NaN with a different sign than Rust with the LLVM backend.
I'm wondering what to do about this, but I guess the solution has not been figured out.

@digama0

digama0 commented Feb 20, 2022

@thomcc wrote a lengthy post about this in https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/float.20total_cmp/near/270506799, and the surrounding conversation is also relevant.

@workingjubilee

I believe I have devised a path forward which may allow us to cut out the i686 issue, by simply having Rust functions use the calling convention that LLVM annotates them with after optimizations, and guaranteeing that SSE2 is enabled and x87 is disabled outside extern "C" and the like.

@workingjubilee

workingjubilee commented Feb 20, 2022

As far as the NaN thing:

Both GCC and LLVM incorrectly conflate "the NaN payload is implementation-defined" with "the sign is undef": you can simply issue copysign on NaNs all day, both to and from. That defines the sign, it can then be recovered, and doing so is not supposed to change the NaN value otherwise. So -NaN and +NaN are separate domains in the IEEE 754 "universe", both subdomains of NaN, and both compilers get this wrong. The choices they have made may be technically defensible under very strained interpretations of the standard, but they are not truly correct when used to justify optimizations.
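A quick sketch of the bit-level `copysign` behavior being described, using `f32::copysign` from `std` (which documents that a NaN input yields a NaN with the requested sign bit):

```rust
fn main() {
    // IEEE 754 copySign is a bit-level operation: it sets the sign bit
    // deterministically even when the operand is a NaN.
    let nan = f32::NAN;
    let neg = nan.copysign(-1.0);
    let pos = neg.copysign(1.0);
    // The sign is defined and can be recovered; the value stays a NaN.
    assert!(neg.is_nan() && neg.is_sign_negative());
    assert!(pos.is_nan() && pos.is_sign_positive());
}
```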

@RalfJung
Member Author

RalfJung commented Feb 20, 2022

Wait, does LLVM really treat any part of a NaN as undef? That would be a problem -- it would make f32::to_bits unsound since one could use it to safely produce a (partially) undef u32.

@workingjubilee

I think they treat producing a NaN value as logically originating from an undefined origin every time it is produced, and thus feel entitled to select an arbitrary value each time. So perhaps I should say they treat it as round-tripping through the whocares?(nan) -> nan function, which is, er, a non-injective function, to say the least. :^) I believe by the time they actually place the NaN in the operation they treat it as a defined value.

@RalfJung
Member Author

Ah okay. As long as whocares produces defined bits (not undef/poison), we are good. (For some values of "good"...)

@conrad-watt

Just popping in from the Wasm side to say that I'm happy to have conversations here. Apologies if I've missed this being brought up earlier in this issue, but one of the motivations for our current semantics was a divergence of NaN bit patterns between x86 and Arm - see this comment. Presumably LLVM has to worry about something similar if it wants to give a semantics that abstracts over both?

@workingjubilee

As I stated elsewhere, those are compile-time constants. Rust also happens to abstract over usize being different sizes on different targets: it's not actually that hard.

@workingjubilee

LLVM does not necessarily abstract over compilation targets as much as you might think it does. I have several times asked LLVM developers how LLVM would handle various somewhat unusual compilation requests, and been frequently told that the frontend should lower to operations the machine can specifically perform. There are some things that LLVM does abstract over, but fewer than you might imagine once you actually push on the edges of the LangRef, so it is a "target independent" code generator only with a huge series of asterisks attached, even before we introduce actual inline assembly to the mix.

@RalfJung
Member Author

RalfJung commented Dec 1, 2022 via email

@sunfishcode
Member

sunfishcode commented Dec 1, 2022

That's a fundamental difference between Rust on all platforms other than Wasm, and Wasm. Rust doesn't make any attempt to hide the target architecture; source code can explicitly ask what the architecture is. In Wasm, the target architecture is hidden; it's not known at all at source-code compile time, and at runtime it's only observable if you know what NaN bits to look for, or maybe if you look really closely at the behavior of memory shared between threads.

On everything except Wasm, Rust could be ok saying "the NaN generated by 0.0/0.0 is a target-specific constant". But Wasm fundamentally doesn't know what the target architecture is going to be until at least program startup time. So on Wasm, making it a compile-time constant would require taking a performance hit on at least one popular architecture. We don't have comprehensive data, but one benchmark suggests it could be in the 5%-15% range on floating-point code.

@RalfJung
Member Author

RalfJung commented Dec 1, 2022

Oh right, so this is not really like usize.

A "value picked non-deterministically once at program startup" would still work though, if wasm were willing to guarantee that. Though the wasm issue mentions migrating live code from x86 to ARM, so maybe wasm has to stay non-det... in which case the best Rust can do is say "compile-time constant on some targets, fully non-det on others".

@Muon

Muon commented Dec 27, 2022

I know that this involves fixing a lot of things with LLVM, just thought I'd put in my two cents before an RFC is written up (FWIW, my PhD topic is on floating-point arithmetic decision procedures). Happy to answer any questions.

  • Floating-point operations must follow IEEE 754, meaning that non-NaN results of operations other than mod are fully determined by the active rounding mode.
    • The default rounding mode is rounding to nearest, ties to even.
    • The rounding mode must be changeable in some way.
      • Perhaps something like #[rounding_mode(xyz)]? This affects the calling convention, and FPU state changes are very expensive. Caller probably sets?
      • Unless otherwise specified, functions are presumed to stipulate rounding to nearest, ties to even.
      • It must be possible to say that a function does not care about the current rounding mode.
        • We need this for fusedMultiplyAdd to inherit the active rounding mode, unless there's a version of it for every rounding mode.
  • Non-NaN results of mod are fully determined independent of rounding mode.
  • Optimizations must preserve the results of operations bit for bit unless expressly relaxed.
  • The sign and payload bits of a NaN result are unspecified.
    • Bit operations (neg, abs, copySign) may still fix the sign bit.
  • All floating-point operations must be const, even when NaNs may be produced.
    • Whether or not a NaN is produced is a deterministic property, it's just the exact bit pattern that's unspecified.
    • This probably requires some form of abstract interpretation.
    • Const evaluation must report failure if the result is not deterministic.
      • It may also report failure if we ran out of resources trying to compute a deterministic result.
        • This is likely to happen when totalOrder or transmutes are involved.
        • Users of const evaluation of NaNs may alleviate computational pains by checking for NaN results and replacing them with specific, known NaNs.
      • Even if the target restricts which NaNs it produces, this computation should not be target-dependent.
        • This should perhaps be controllable?

@RalfJung
Member Author

The rounding mode must be changeable in some way.

That is a non-goal for now, as far as I am concerned.

Const evaluation must report failure if the result is not deterministic.

I don't think we want this; also see rust-lang/rfcs#3352. I think we want to just say that const-eval produces some unspecified result; the result is deterministic for any given build of the compiler and target configuration but may otherwise vary.

@antoyo

antoyo commented Dec 27, 2022

  • The sign and payload bits of a NaN result are unspecified.

Does that mean that it's OK if the GCC and LLVM backends produce a different result for 0.0 / 0.0?

@digama0

digama0 commented Dec 27, 2022

Isn't that already the case? LLVM will const-eval floats IIRC, and it does not necessarily use the same sign propagation conventions as the target, which means that unless the optimizations exactly line up you can get different results on different compilers for that (not 0.0 / 0.0 literally but something equivalent to it after passing the values through variables and functions).
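A small sketch of how one might observe what the runtime actually produces for 0.0/0.0 (using `std::hint::black_box` to keep the compiler from constant-folding the division; which bit pattern you see is target-dependent, so the code only asserts that the result is some NaN):

```rust
use std::hint::black_box;

fn main() {
    // black_box hides the operands from the optimizer, so the division is
    // performed by the target's hardware rather than folded at compile time.
    let zero: f32 = black_box(0.0);
    let bits = (zero / zero).to_bits();
    // Rust only guarantees we get *some* NaN; the exact bits are unspecified.
    assert!(f32::from_bits(bits).is_nan());
    println!("runtime 0.0/0.0 -> {bits:#010x}");
}
```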

@Muon

Muon commented Dec 28, 2022

The rounding mode must be changeable in some way.

That is a non-goal for now, as far as I am concerned.

That's okay, although it would be preferable if you don't close that off. It's really painful to emulate other rounding modes.

Const evaluation must report failure if the result is not deterministic.

I don't think we want this; also see rust-lang/rfcs#3352. I think we want to just say that const-eval produces some unspecified result; the result is deterministic for any given build of the compiler and target configuration but may otherwise vary.

It's not true that we need to relax the restrictions on const evaluation in order to allow floating-point to be used in it. We're already assuming a fixed rounding mode (round to nearest, ties to even), so the results of all floating-point computations that don't depend on the unspecified bits of a NaN are already fully determined. Any inconsistency or apparent nondeterminism in NaN-free code is a bug. Among IEEE 754-conforming hardware implementations, the only differences may be in the production of NaNs.

Picking an arbitrary NaN among the possible NaNs allowed by IEEE 754 is incorrect, since the hardware will actually produce specific NaNs, we just don't know which. It is surprising if const evaluation produces results which are unobtainable in a runtime evaluation. This is why we must overapproximate. Note again that the only instances in which the result is not determined are those in which we directly or indirectly depend on the unspecified bits of a NaN.

  • The sign and payload bits of a NaN result are unspecified.

Does that mean that it's OK if the GCC and LLVM backends produce a different result for 0.0 / 0.0?

Yes. That expression does not produce a fully determined result. A NaN is encoded as having any sign bit, all exponent bits set, and at least one significand bit set. The result of that expression will be a quiet NaN, because all computations produce quiet NaNs. Those are the only restrictions on the result. However, IEEE 754 does not require particular encodings for quiet or signaling NaNs, and leaves it to the target. Furthermore, different ISAs just produce different NaNs. In that regard, it cannot be soundly constant-folded without knowing the details of the target (if the ISA even specifies which NaNs it produces in that instance).
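The encoding constraints described above (any sign bit, all exponent bits set, nonzero significand) can be checked directly on the bit pattern; a minimal sketch for `f32`:

```rust
fn main() {
    // An f32 is a NaN iff its exponent field is all ones and its
    // significand is nonzero; the sign bit may be anything.
    let bits = f32::NAN.to_bits();
    let exponent = (bits >> 23) & 0xFF;        // 8-bit exponent field
    let significand = bits & 0x007F_FFFF;      // 23-bit significand field
    assert_eq!(exponent, 0xFF);
    assert_ne!(significand, 0);
    // Which significand patterns count as quiet vs. signaling is left to
    // the target: x86 and ARM treat the top significand bit as the quiet
    // bit, while MIPS traditionally used the inverse convention.
}
```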

@RalfJung
Member Author

RalfJung commented Dec 28, 2022

It is surprising if const evaluation produces results which are unobtainable in a runtime evaluation.

That's exactly what is being discussed in rust-lang/rfcs#3352. I agree it is surprising but I think it is better than the alternatives. (Your statement boils down to "runtime behavior must be superset of compiletime behavior", which is being discussed in the RFC thread, though not yet in the RFC text.) So let's discuss that question there and not here.

Also note that all your statements assume that Rust float semantics are exactly IEEE float semantics, which is not a given. We eventually might adopt something like the wasm semantics which makes more guarantees than IEEE when it comes to NaNs.

In that regard, it cannot be soundly constant-folded without knowing the details of the target (if the ISA even specifies which NaNs it produces in that instance).

This statement is in fact wrong if Rust uses IEEE float semantics. Since under those semantics NaN bits are picked non-deterministically when the NaN is produced, Rust can constant-fold such operations to an arbitrary NaN.

IOW, Rust does not guarantee that NaNs behave the same way as they would if you were to write assembly by hand. It guarantees that they behave as in a valid IEEE implementation (and maybe even a particular variant of that, as in the wasm case), but it doesn't have to be the same IEEE implementation as what your hardware does.

@Muon

Muon commented Dec 28, 2022

Also note that all your statements assume that Rust float semantics are exactly IEEE float semantics, which is not a given. We eventually might adopt something like the wasm semantics which makes more guarantees than IEEE when it comes to NaNs.

I am indeed expecting that Rust uses IEEE 754 semantics. After all, it's what almost all hardware implements, and that's fast.

This statement is in fact wrong if Rust uses IEEE float semantics. Since under those semantics NaNs bits are picked non-deterministically when the NaN is produced, Rust can constant-fold such operations to an arbitrary NaN.

An arbitrary quiet NaN, specifically. However, if we're using the hardware FPU, we don't determine which NaNs are quiet, so the result is still target-dependent.

@RalfJung
Member Author

RalfJung commented Dec 28, 2022 via email

@Muon

Muon commented Dec 29, 2022

Ah, I see. So, exactly matching the spec is in a certain sense ill-posed. The spec requires that both quiet and signaling NaNs exist and that computational operations only return quiet NaNs. That is, there is more than just nondeterminism at play here. For a (nondeterministic) abstract machine to comply, it must still at minimum choose some NaNs to be quiet and some to be signaling. Notably, different machines can make different choices. However, if necessary, Rust can leave the exact bit patterns unspecified.

Implementation-wise, there are a few pitfalls with actually achieving compliance. Firstly, at some point, rustc has to output specific bits for NaNs, and there are real-world CPUs which disagree about signaling and quiet NaNs. Notoriously, MIPS does the opposite of x86 and ARM. Secondly, LLVM is still probably far away from spec compliance, even disregarding NaNs (llvm/llvm-project#44497, llvm/llvm-project#43070, llvm/llvm-project#24913, llvm/llvm-project#25233, llvm/llvm-project#18362).

@RalfJung
Member Author

RalfJung commented Jan 2, 2023

TBH my inclination is to entirely ignore everything related to signaling NaNs for now. AFAIK that is what LLVM does, so some groundwork will be needed elsewhere before we can even attempt to do any better.

@Muon

Muon commented Jan 3, 2023

That's somewhat tenable, I think? We don't currently have any way of distinguishing sNaN from qNaN. The ordering given by total_cmp (our version of the totalOrder predicate) is based on the standard's recommendation for what should be a signaling NaN, not the underlying implementation. The documentation may need to be adjusted. We'd also need to document that our operations are not strictly compliant and can in fact return any NaN pattern. That said, I'm not sure this is simpler than saying we return unspecified qNaNs and leaving unspecified which patterns are qNaNs/sNaNs.
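For reference, a sketch of how `f32::total_cmp` orders NaN classes relative to everything else under IEEE 754 totalOrder:

```rust
use std::cmp::Ordering;

fn main() {
    // totalOrder: -NaN < -inf < negative finites < -0.0 < +0.0
    //             < positive finites < +inf < +NaN
    assert_eq!((-f32::NAN).total_cmp(&f32::NEG_INFINITY), Ordering::Less);
    assert_eq!(f32::NAN.total_cmp(&f32::INFINITY), Ordering::Greater);
    // Unlike `==`, totalOrder distinguishes the two zeros.
    assert_eq!((-0.0f32).total_cmp(&0.0), Ordering::Less);
}
```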

@RalfJung
Member Author

I have written a pre-RFC to document our current floating-point semantics. Please comment on Zulip or hackmd!

@Muon

Muon commented Jul 18, 2023

I'll have a look through and comment on it when I get back either tomorrow or when I get back home in a few days.

@RalfJung
Member Author

RalfJung commented Aug 4, 2023

So... I think this is not so much of a mess any more?

For the x86-32-specific trouble, the FCP in rust-lang/rust#113053 went through. These are unambiguously platform-specific bugs but are hard to fix. We now have rust-lang/rust#114479 and rust-lang/rust#115567 tracking this.

Most of the rest boils down to "we don't guarantee much about the bits you get out of a NaN-producing operation". That is tracked in rust-lang/rust#73328. My Pre-RFC describes our current de-facto guarantees.

I assume that RFC will need approval by t-opsem and t-lang. Let's keep an issue open on our side as well until we have a team decision on this.

@RalfJung RalfJung changed the title Our floating point semantics are a mess Our floating point semantics were a mess Aug 4, 2023
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 3, 2023
…bilee

add notes about non-compliant FP behavior on 32bit x86 targets

Based on a ton of prior discussion (see all the issues linked from rust-lang/unsafe-code-guidelines#237), the consensus seems to be that these targets are simply cursed and we cannot implement the desired semantics for them. I hope I properly understood what exactly the extent of the curse is here, let's make sure people with more in-depth FP knowledge take a close look!

In particular for the tier 3 targets I have no clue which target is affected by which particular variant of the x86_32 FP curse. I assumed that `i686` meant SSE is used so the "floating point return value" is the only problem, while everything lower (`i586`, `i386`) meant x87 is used.

I opened rust-lang#114479 to concisely describe and track the issue.

Cc `@workingjubilee` `@thomcc` `@chorman0773`  `@rust-lang/opsem`
Fixes rust-lang#73288
Fixes rust-lang#72327
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Oct 3, 2023
Rollup merge of rust-lang#113053 - RalfJung:x86_32-float, r=workingjubilee
@RalfJung
Member Author

The RFC rust-lang/rfcs#3514 should resolve all questions for good. :)
