Document guarantees (or lack thereof) regarding sign, quietness, and payload of NaNs #73328

Open
Tracked by #72599
ecstatic-morse opened this issue Jun 13, 2020 · 44 comments
Labels
  • A-floating-point: Floating point numbers and arithmetic
  • A-LLVM: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
  • T-lang: Relevant to the language team, which will review and decide on the PR/issue.

Comments

@ecstatic-morse (Contributor) commented Jun 13, 2020

NaNs can behave in surprising ways. On top of that, a very common target is inherently buggy in more than one way. But on all other targets we actually follow fairly clear, if poorly documented, rules. See here for the current status.

Original issue

Several issues have been filed about surprising behavior of NaNs.

The root cause of these issues is that LLVM does not guarantee that NaN payload bits are preserved. Empirically, this applies to the signaling/quiet bit as well as (surprisingly) the sign bit. At least one LLVM developer seems open to changing this, although doing so may not be easy.

Unless we are prepared to guarantee more, we should do a better job of documenting that, besides having all 1s in the exponent and a non-zero significand, the bitwise value of a NaN is unspecified and may change at any point during program execution. In particular, the from_bits method on f32 and f64 types currently states:

This is currently identical to transmute::<u32, f32>(v) on all platforms.

and

this implementation favors preserving the exact bits. This means that any payloads encoded in NaNs will be preserved

These statements are misleading and should be changed.
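For concreteness, here is a minimal sketch (not taken from any of the linked issues) of the kind of round-trip the current wording implies is lossless; on the affected targets the two printed values can differ:

```rust
fn main() {
    // A signaling-NaN bit pattern for f32: all-ones exponent, quiet bit clear,
    // non-zero payload.
    let snan_bits: u32 = 0x7f80_0001;
    let roundtrip = f32::from_bits(snan_bits).to_bits();
    // On most targets both values are identical; on the problematic targets
    // (e.g. NaN-canonicalizing wasm hosts or x87 code paths) the quiet bit
    // and/or payload may come back changed.
    println!("{snan_bits:#010x} -> {roundtrip:#010x}");
}
```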

We may also want to add documentation to {f32,f64}::NAN to this effect, see #52897 (comment).

cc #10186?

@ecstatic-morse added the A-LLVM and T-lang labels on Jun 13, 2020
@ecstatic-morse changed the title from "Document guarantees (or lack thereof) regarding signedness, quietness, and payload of NaNs" to "Document guarantees (or lack thereof) regarding sign, quietness, and payload of NaNs" on Jun 13, 2020
@ecstatic-morse (Contributor Author)

This also affects the documentation for the methods in #72568.

@RalfJung (Member)

Related LLVM bug: https://bugs.llvm.org/show_bug.cgi?id=45152

bors added a commit to rust-lang-ci/rust that referenced this issue Aug 14, 2020
…r=Mark-Simulacrum

Run standard library unit tests without optimizations in `nopt` CI jobs

This was discussed in rust-lang#73288 as a way to catch similar issues in the future. This builds an unoptimized standard library with the bootstrap compiler and runs the unit tests. This takes about 2 minutes on my laptop.

I confirmed that this method works locally, although there may be a better way of implementing it. It would be better to use the stage 2 compiler instead of the bootstrap one.

Notably, there are currently four `libstd` unit tests that fail in debug mode on `i686-unknown-linux-gnu` (a tier one target):

```
failures:
    f32::tests::test_float_bits_conv
    f32::tests::test_total_cmp
    f64::tests::test_float_bits_conv
    f64::tests::test_total_cmp
```

These are the tests that prompted rust-lang#73288 as well as the ones added in rust-lang#72568, which is currently broken due to rust-lang#73328.
@thomcc (Member) commented Sep 9, 2020

Unless we are prepared to guarantee more, ... the bitwise value of a NaN is unspecified and may change at any point during program execution

This seems... way too conservative. I know it's trying to make the best of a bad situation, and I'm sympathetic here, but please realize how hard overly broad unspecified behavior like this makes it to write robust code (As a user of Rust who came to it from C, this feels like the same kind of undefined behavior you see in the C standard in cases where all supported platforms disagree).

So, my biggest concern is non-Wasm platforms. I think it would be a huge blow to working with floats in Rust to effectively provide zero guarantees around NaN. I don't really know a good solution here, but even just marking it as an LLVM bug on the problematic platforms (rather than deciding that this isn't a thing that Rust code gets to rely on ever) would be much better.

Just as an example, if NaN payload is totally unspecified and may change at any point, implementing any ordering stronger than PartialEq for floats is impossible (including #72599), as you cannot count on NaN bitwise values to be stable across two calls of to_bits() on the same float.

Same goes for things that stash an f32 in a u32 and then expect to get the same value back out (for example, I implemented an AtomicF32 at one point on top of AtomicU32 + from_bits/to_bits). If I can't rely on stable bit values through float => u32, things like compare_exchange loops are no longer guaranteed to ever terminate.
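For concreteness, something along these lines (names hypothetical, not the exact code I wrote) only works if to_bits keeps returning the same bits for the same float value:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Sketch of an atomic f32 built on AtomicU32 + to_bits/from_bits.
struct AtomicF32(AtomicU32);

impl AtomicF32 {
    fn new(v: f32) -> Self {
        AtomicF32(AtomicU32::new(v.to_bits()))
    }

    fn load(&self, order: Ordering) -> f32 {
        f32::from_bits(self.0.load(order))
    }

    // Spins on compare_exchange_weak: if a float's bit pattern could silently
    // change between two to_bits calls, `current` might never match the stored
    // value and the loop would not be guaranteed to terminate.
    fn fetch_add(&self, delta: f32, order: Ordering) -> f32 {
        let mut current = self.0.load(Ordering::Relaxed);
        loop {
            let new = (f32::from_bits(current) + delta).to_bits();
            match self.0.compare_exchange_weak(current, new, order, Ordering::Relaxed) {
                Ok(prev) => return f32::from_bits(prev),
                Err(actual) => current = actual,
            }
        }
    }
}

fn main() {
    let a = AtomicF32::new(1.5);
    a.fetch_add(2.0, Ordering::SeqCst);
    assert_eq!(a.load(Ordering::SeqCst), 3.5);
}
```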


That said, I also think "totally unspecified behavior" is too conservative on Wasm — I've done a bit of poking and it seems like the behavior is a lot saner than suggested, although it does violate IEEE 754 and is probably not 100% intentional.

Basically: LLVM's behavior here is inherited from the wasm/js runtime, which canonicalizes NaNs whenever going from bits => float, as it wants to be able to guarantee certain things about which bit patterns are possibly in the float — certain NaNs are off limits.

That means:

  • The bits=>float operation is the only time the NaN payload can change (explaining the mentioned f32::from_bits(x).to_bits() round-trip failure).
  • Float => bits should be totally stable and consistent
  • After a float => bits operation, those bits are guaranteed not to change when going back to a float.
    • There is, admittedly, some dodginess here since perhaps LLVM optimizes a bits => float => bits into a no-op. Perhaps that can be addressed directly and more easily though?

This is non-ideal but is still way easier to reason about and build on top of than arbitrary unspecified behavior.


Yeah, that's the basic gist of my thoughts. Changing the documented guarantees of from_bits/to_bits globally like that would totally neuter those APIs. I'm sympathetic to the position you're in and to not having great choices, but that kind of change feels very much like the wrong call, and making the call be this kind of unspecified behavior feels really bad on any platform...

P.S. I accidentally posted an incomplete version of this comment by hitting ctrl+enter in the github text box, sorry if you saw that — really should just do these in a text editor first.

@RalfJung (Member) commented Sep 9, 2020

I am open to better suggestions. I know hardly anything about floating point semantics, so "totally unspecified" is an easy and obviously "correct" choice for me to reach for. If someone with more in-depth knowledge can produce a spec that is consistent with LLVM behavior, I am sure this can be improved upon.

However, the core spec of Rust must be platform-independent, so unless we consider this a platform bug (which I think is what we do with the x87-induced issues on i686), whatever the spec is has to encompass all platforms.

In principle, certain platforms can decide to guarantee more than others, but that is a dangerous game as it risks code inadvertently becoming non-portable in the worst possible way -- usually "non-portable" means "fails to build on other platforms", now it would silently change behavior. Maybe we can handle this in a way similar to endianness, although the situation feels different.

And all of this is assuming that we can get LLVM to commit to preserving NaN payloads on these platforms. You are saying that this issue only affects wasm(-like) targets, but is there a document where LLVM otherwise makes stronger guarantees? The fact that issues have only been observed on these platforms does not help; we need an explicit statement by LLVM to establish and maintain this guarantee in the future.

Just as an example, if NaN payload is totally unspecified and may change at any point, implementing any ordering stronger than PartialEq for floats is impossible (including #72599), as you cannot count on NaN bitwise values to be stable across two calls of to_bits() on the same float.

So if I understand correctly, on wasm, the float => bit cast that is inherent in such a total order would canonicalize NaNs. This on its own is not a problem as this is a stable canonicalization, and that's why you think "unstable NaNs" are too broad. Is that accurate?

However, when you combine that with LLVM optimizing away "bit => float => bit" roundtrips (does it do that?), then this already brings us into an unstable situation. Some of the comparisons might have that optimization applied to them, and others not, so suddenly the same float (obtained via a bit => float cast) can compare in two different ways.

It is easy to make a target language spec such as wasm self-consistent, but to do the same on a heavily optimized IR like LLVM's or surface language like Rust is much harder.

@thomcc (Member) commented Sep 9, 2020

So if I understand correctly, on wasm, the float => bit cast that is inherent in such a total order would canonicalize NaNs.

No, float => bit should always* be stable; it's bit => float that canonicalizes. This means it's possible to implement a robust totalOrder without issues on Wasm (just not if all NaN payloads are unspecified values which may change at any time).
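For reference, a rough sketch of such an ordering built purely on to_bits (essentially the bit trick total_cmp uses); it assumes float => bits is stable and that from_bits preserves the example bit patterns:

```rust
// Map an f32 to an i32 key whose natural ordering matches IEEE 754 totalOrder:
// flip all non-sign bits of negative values so more-negative floats sort first.
fn total_order_key(x: f32) -> i32 {
    let bits = x.to_bits() as i32;
    bits ^ ((((bits >> 31) as u32) >> 1) as i32)
}

fn main() {
    let neg_nan = f32::from_bits(0xffc0_0000); // NaN with the sign bit set
    let pos_nan = f32::from_bits(0x7fc0_0000); // NaN with the sign bit clear
    let mut v = [1.0f32, -0.0, 0.0, pos_nan, neg_nan, -1.5];
    v.sort_by_key(|&x| total_order_key(x));
    // Negative NaN sorts first, positive NaN last; this only works if to_bits
    // keeps returning the same bits for the same float.
    assert!(v[0].is_nan() && v[0].is_sign_negative());
    assert!(v[5].is_nan() && v[5].is_sign_positive());
}
```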

My point with that paragraph was not that the LLVM behavior is bad (although I am not a fan), but that changing Rust's guarantees to: "the bitwise value of a NaN is unspecified and may change at any point during program execution" is both

  • Stronger than needed for Wasm
  • Makes it so that no matter which operations happen to canonicalize and which do not, it's not possible to write a totalOrder.

* (always... except for what I say in my next response)


However, when you combine that with LLVM optimizing away "bit => float => bit" round-trips (does it do that?)

I don't know if it does it on Wasm, but it's obviously free to do this on non-Wasm platforms (and I think I've seen it there, but it's hard to say and I don't have code I'm thinking of on hand).

I'd hope it wouldn't do this on Wasm, and would argue that if it does optimize that away it's an LLVM bug for that platform, but... yeah. Possible.


unless we consider this a platform bug (which I think is what we do with the x87-induced issues on i686)

Honestly that seems like the sanest decision to me, since the alternative is essentially saying that Rust code can't expect IEEE754-compliant floats anymore. And so, I think x87 is a good example because it's also an example of non-IEEE754 compliance, although probably a less annoying one in practice.

Concretely, I wouldn't have complained about this at all if it were listed as a platform bug.


Instead, my issue is entirely with all compliant Rust code losing the ability to reason about float binary layout, which has been extremely useful in stuff like scientific computing, game development, programming language runtimes, math libraries, ... all things Rust is well suited to do, by design.

This wouldn't cripple those by any means, but it would make things worse for several of them.

Admittedly, in practice, unless it's flat out UB, I suspect people will just code to their target and not to the spec, which isn't great either, but honestly to me it feels like it might be better than Rust genuinely inheriting this limitation from the web platform.

(Ironically, this would also prevent writing a runtime in Rust that does the optimization which is the reason Wasm and JS runtimes want to canonicalize their NaNs. Although that optimization was already fairly unportable anyway)

@RalfJung (Member) commented Sep 9, 2020

No, float => bit should always* be stable, it's bit => float that canonicalizes.

Oh I see... but that is not observable until you cast back? Or does wasm permit transmutation, like writing a float into memory and reading it back as an int without doing an explicit cast? (IIRC their memory is int-only so you'd have to cast before writing, but I might misremember.)

I don't know if it does it on Wasm, but it's obviously free to do this on non-Wasm platforms (and I think I've seen it there, but it's hard to say and I don't have code I'm thinking of on hand).

I'd hope it wouldn't do this on Wasm, and would argue that if it does optimize that away it's an LLVM bug for that platform, but... yeah. Possible.

Whether it can do that or not depends solely on the semantics of LLVM IR, which (as far as I know) are not affected by whether you are compiling to Wasm or not. That is the entire point of having a single uniform IR.

There is no good way to make optimizations in a highly optimized language like Rust or LLVM IR depend on target behavior -- given how they interact with all the other optimizations, that is basically guaranteed to introduce contradicting assumptions.

Also, I don't think there is much point in discussing what we wish LLVM would do. We first need to figure out what it is doing.

(Ironically, this would also prevent writing a runtime in Rust that does the optimization which is the reason Wasm and JS runtimes want to canonicalize their NaNs. Although that optimization was already fairly unportable anyway)

Ah, but this is getting to the heart of the problem -- what if you implement a wasm runtime in Rust which uses this optimization, and compile that to wasm? Clearly that cannot work as the host wasm is already "using those bits". So, it is fundamentally impossible to have a semantics that achieves all of

  • platform independence
  • supporting this optimization
  • correct compilation to wasm

Instead, my issue is entirely with all compliant Rust code loosing the ability to reason about float binary layout, which has been extremely useful in stuff like scientific computing, game development, programming language runtimes, math libraries, ... All things Rust is well suited to do, by design.

I do feel like it is slightly exaggerated to say that all these use cases rely on stable NaN payloads. That said, there seems to be a fundamental conflict here between having a good cross-platform story (consistent semantics everywhere) and supporting low-level floating point manipulation. FP behavior is just not consistent enough across platforms.

@RalfJung (Member) commented Sep 9, 2020

However, note that not just wasm has strange NaN behavior. We also have some bugs affecting x86_64: #55131, #69532. Both (I think) stem from the LLVM constant propagator (in one case its port to Rust) producing different NaN payloads than real CPUs. This means that if we guarantee stable NaN payloads on x86_64, we have to stop const-propagating unless all CPUs produce consistent NaN payloads (and then the const propagator needs to be fixed to match that).

So until LLVM commits to preserving NaN payloads on some targets, there is little we can do. It seems people already rely on that when compiling wasm runtimes in LLVM that use the NaN optimization, so maybe it would not be too hard to convince LLVM to commit to that?

@thomcc (Member) commented Sep 9, 2020

That is the entire point of having a single uniform IR.

This isn't really right though, is it? LLVM IR includes tons of platform-specific information. The fact that making LLVM IR cross-platform was non-viable was even part of the motivation behind Wasm's current design.


From the other issue:

A less drastic alternative is to say that every single FP operation (arithmetic and intrinsics and whatnot, but not copying), when it returns a NaN, non-deterministically picks any NaN representation.

This would be totally fine with me FWIW — as soon as you do arithmetic on NaN all portability is out the window in practice and in theory. My concern is largely with stuff like:

  • Stuff like https://searchfox.org/mozilla-central/source/js/rust/src/jsval.rs suddenly breaking; just a quick example I remember from my last job of code that depends on this.

  • APIs like https://doc.rust-lang.org/core/arch/x86_64/fn._mm_cmpeq_ps.html being in a limbo where nothing guarantees that it works... even though it obviously must work or is a compiler bug.

    For context here: this API is one of many SIMD intrinsic apis where you have shortlived NaNs in float vectors where the payload is very important.

    Specifically this function will return a float vector (yes, float — __m128i would be the type for an int vector) with an all-bits-set f32 for every slot where the comparison succeeded. One of the ways you're intended to use the result is as a bitmask, to find the elements where the comparison succeeded/failed.

    Since all-bits-set is a NaN with a specific payload, this requires that payload to be preserved here (a minimal sketch of this usage follows below).
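A rough sketch of that usage (my own illustration, assuming an x86_64 target where SSE is baseline); both uses of the comparison result below depend on the all-bits-set NaN lanes surviving untouched:

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse")]
unsafe fn demo() {
    let a = _mm_setr_ps(1.0, 2.0, 3.0, 4.0);
    let b = _mm_setr_ps(1.0, 0.0, 3.0, 0.0);

    // "True" lanes come back as all-bits-set f32s, i.e. NaNs with a specific
    // payload.
    let mask = _mm_cmpeq_ps(a, b);

    // 1. As a bitmask: keep `a` where equal, zero elsewhere.
    let selected = _mm_and_ps(mask, a);
    let mut out = [0.0f32; 4];
    _mm_storeu_ps(out.as_mut_ptr(), selected);
    assert_eq!(out, [1.0, 0.0, 3.0, 0.0]);

    // 2. As a lane mask: the sign bits of the all-ones lanes give 0b0101 here.
    assert_eq!(_mm_movemask_ps(mask), 0b0101);
}

#[cfg(target_arch = "x86_64")]
fn run() {
    unsafe { demo() }
}

#[cfg(not(target_arch = "x86_64"))]
fn run() {}

fn main() {
    run();
}
```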

So, while I just gave you two examples of very much non-portable code...

  • The jsval code is probably more portable than you might expect (actually I have no idea what you might expect, but I believe it should support anything Firefox supports, and probably a little more).
  • Every target with vector registers does the same "it's really just a bag of bits" stuff somewhere in its intrinsics API (and the solution here shouldn't be to declare core::arch broken — even if portable SIMD is on the way).

My big concern still comes back to the notion that these payloads are "unspecified values which may change at any time" according to Rust. The way I interpret that, and the general feeling of this conversation, is that there's no guarantee that target-specific things like these will work reliably even on the target in question.


I do feel like it is slightly exaggarated to say that all these usecases rely on stable NaN payloads

That's why I said "This wouldn't cripple those by any means", although honestly the SIMD stuff would be pretty bad if it were actually broken.

I also fully expect those cases to blindly continue doing things to NaN non-portably (and possibly non-deterministically).


This means that if we guarantee stable NaN payloads in x86_64, we have to stop const-propagating unless all CPUs have consistent NaN payload (and then the const propagator needs to be fixed to match that).

This is surprising, because I thought it was the whole point of LLVM's APFloat code (which even goes as far as to support like the horrible PowerPC long double type...). That said, it's not like I can argue with facts, if those bugs are happening, then they're happening... But are we sure those aren't just normal bugs in LLVM?

That said the only reason I wouldn't be willing to say "I don't care that much about what happens to NaN during const prop" is that you can't know when LLVM will happen to see enough to do more const prop.

That said, it seems totally unreasonable and very fragile to me to rely on things like:

  • A specific float expression (e.g. 0.0/0.0) producing a specific NaN.
  • Float numerical operations (arithmetic, math functions, etc) with NaN inputs doing anything beyond producing some arbitrary other NaN (except for sign manipulation — neg/abs/copysign and the like just toggle the sign bit).
  • ...

That stuff is totally nonportable (IEEE754 recommends but doesn't require any of it) and unreliable both at compile time and at runtime. Again, my concern is more unexpected fallout here in stuff that expects NaN to go through smoothly.


Just took a peek at https://webassembly.github.io/spec/core/exec/numerics.html (and elsewhere in the spec) and regret not doing so sooner. In particular, there's a lot of mention on when canonicalization can happen, but none of the places are on load/reinterpret.

And so what's in there is pretty close to the suggestion you had earlier (the "less drastic alternative")... and to what I suggested as the things that are totally nonportable.

And it also definitely contradicts what I said before about when canonicalization happens (which mirrored what happened in asm.js, what I seemed to see in my testing earlier, and would have explained from_bits(x).to_bits() not round-tripping... but maybe all of it was the "native doubles used in LLVM MC code" bug? Needs more investigation). That said, this would make things a lot more tractable, since it brings Wasm up to par as a compliant IEEE 754 implementation, and (if true) just points the blame at LLVM for messing up...

Which would also (maybe?) explain why the bugs happen on all platforms, maybe?

...

Ugh, this is still a bit jumbled, sorry; some of this needs to be unified and reordered, and the discrepancy needs more digging, but I have to run, unfortunately.

@RalfJung (Member)

This isn't really right though, is it? LLVM IR includes tons of platform-specific information. The fact that making LLVM IR cross-platform was non-viable was even part of the motivation behind Wasm's current design.

It makes many platform-specific things such as pointer sizes etc explicit. But that is very different from an implicit change in behavior.

Your proposal would basically require many optimizations to have code like if (wasm) { one_thing; } else { another_thing; }. I do not think such code is common in LLVM today, if it exists at all. It is also very fragile as it is easy to forget to add this in all the right places. In contrast, the explicit reification of layout everywhere is impossible to ignore.

And this would affect many optimizations, as it makes floating-point operations and/or casts non-deterministic, which is a side-effect! So everything that treats them as pure operations needs to be adjusted.

From the other issue:

There's like 5 other issues, which one do you mean?^^ You are quoting this comment I think.

This would be totally fine with me FWIW — as soon as you do arithmetic on NaN all portability is out the window in practice and in theory.

(This was for making FP operations pick arbitrary NaNs.)
The problem is that this makes them non-deterministic. So e.g. if you have code like

let f = f1 / f2;
function(f, f);

then you are no longer allowed to "inline" the definition of f in both places, as that would change the function arguments from two values with definitely the same NaN payload to potentially different NaN payloads.
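For concreteness, a sketch of the transformation (with a hypothetical function that observes the bits): under "every FP operation picks an arbitrary NaN", the second version is not equivalent to the first, because the two divisions may yield different NaN bit patterns:

```rust
// Hypothetical helper that can observe the bits of its arguments.
fn function(a: f64, b: f64) {
    // This always holds here today, because both arguments are copies of the
    // same value.
    debug_assert_eq!(a.to_bits(), b.to_bits());
}

fn original(f1: f64, f2: f64) {
    let f = f1 / f2;
    function(f, f);
}

fn inlined(f1: f64, f2: f64) {
    // Not a valid transformation if each division may pick its own NaN
    // payload: 0.0 / 0.0 could produce two different NaNs here.
    function(f1 / f2, f1 / f2);
}

fn main() {
    original(0.0, 0.0); // fine: both arguments are the same NaN value
    inlined(1.0, 2.0);  // fine: no NaN involved
}
```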

However, maybe we can make it deterministic but unspecified? As in, after each floating-point operation, if the result is NaN, something unspecified happens with the NaN bits, but given the same inputs there will definitely always be the same output?

The main issue with this is that it means that const-prop must exactly reproduce those NaN patterns (or refuse to const-prop if the result is a NaN).

My concern is largely with stuff like:

So is it the case that all that code would be okay with FP operations clobbering NaN bits?

My big concern still comes back to the notion that these payloads are "unspecified values which may change at any time" according to Rust.

Rust will probably just do whatever LLVM does, once they make up their mind and commit to a fixed and precise semantics. I think you are barking up the wrong tree here, I don't like unspecified values any more than you do. ;) I am just trying to come up with a consistent way to describe LLVM's behavior.

I'm a theoretical PL researcher, so that's something I have experience with that I am happy to lend here -- define a semantics that is consistent with optimizations and compilation to lower-level targets. However, not knowing much about floating-point makes this harder for me than it is for other topics. So I am relying on people like you to gather up the constraints to make sure the resulting semantics is not just consistent with LLVM but also useful. ;) It might turn out that that's impossible, in which case we can hopefully convince LLVM to change.

This is surprising, because I thought it was the whole point of LLVM's APFloat code (which even goes as far as to support like the horrible PowerPC long double type...). That said, it's not like I can argue with facts, if those bugs are happening, then they're happening... But are we sure those aren't just normal bugs in LLVM?

They might well be bugs! Since you seem to know a lot about floating-point, it would be great if you could help figure that out. :)

That said the only reason I wouldn't be willing to say "I don't care that much about what happens to NaN during const prop" is that you can't know when LLVM will happen to see enough to do more const prop.

Right, that's exactly the point -- const-prop must not change what the program does. So either it must produce the exact same results as hardware, or else we have to say that the involved operation is non-deterministic.

Just took a peek at https://webassembly.github.io/spec/core/exec/numerics.html (and elsewhere in the spec) and regret not doing so sooner. In particular, there's a lot of mention on when canonicalization can happen, but none of the places are on load/reinterpret.

So what is the executive summary?

A quick glance shows that these operations are definitely non-deterministic. So scratch all I said about this above, this basically forces LLVM to never ever duplicate floating-point instructions. Any proposals for (a) figuring out if they are doing this right and (b) documenting this in the LLVM LangRef to make sure they are aware of the problem?

@RalfJung (Member)

@ecstatic-morse you listed #73288 in the original issue here, but isn't that a different problem? Namely, this issue here is about NaN bits in general, whereas #73288 is specific to i686 and thus seems more related to #72327. (I don't think we have a meta-issue for "x87 floating point problems", but maybe we should.)

@ecstatic-morse (Contributor Author) commented Sep 14, 2020

#72327 affects only i586 targets (x86 without SSE2). This is a tier 2 platform, and the last x86 processor without SSE2 left the plant about 20 years ago, so I would have no problem exempting it from whatever guarantees around NaN payloads we wish to make. However, #73288 affects i686 (the latest 32-bit x86 target) as well, which is tier 1. Obviously, we could (and maybe should) exempt all 32-bit x86 targets from the NaN payload guarantees, but I consider #73288 to be of greater importance than issues only affecting i586.

As an aside, I will note that "Unless we are prepared to guarantee more" was doing a lot of work in the OP. I'd be very happy if we came up with a stricter set of semantics that we can support across tier 1 platforms (possibly exempting 32-bit x86) and implemented them. However, doing so will require a non-trivial amount of work, much of it on the LLVM side. I think that, in the meantime, we should explicitly state where we currently fall short in the documentation of affected APIs, similar to #10184. That's what this issue is about.

@ecstatic-morse (Contributor Author)

Also, look out for my latest crate, AtomicNanCanonicalizingF32, on crates.io.

@workingjubilee (Contributor)

Ah, indeed! Yes, that would Not be okay.

Insofar as the standard is concerned, to my reading and understanding:

  • If all inputs to an op are non-NaN, then there are only a few sets of input values which can yield a NaN float, which do include mul(NEG_INFINITY, 0.0).
  • A NaN float is a bitstring with some bits set and others in an undetermined state. Their state can be revealed, however, by:
    • operations that only examine a NaN float (e.g. partial_cmp) or interact with it solely as a bitstring (abs, neg, copysign, and Copy); these are deterministic (the sign-bit case is sketched below).
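A small sketch of the sign-bit case (my own illustration; the asserts may fail on the problematic targets discussed in this thread, e.g. x87 code paths or NaN-canonicalizing wasm hosts):

```rust
fn main() {
    // A quiet NaN with an arbitrary payload; the constant is just an example.
    let nan = f64::from_bits(0x7ff8_0000_dead_beef);

    // neg/abs/copysign only look at or flip the sign bit (bit 63 for f64) and
    // leave the payload alone under the reading above.
    let neg = -nan;
    assert_eq!(neg.to_bits(), nan.to_bits() ^ (1u64 << 63));
    assert_eq!(neg.abs().to_bits(), nan.to_bits());
    assert_eq!(nan.copysign(-1.0).to_bits(), neg.to_bits());

    // partial_cmp only examines the values: NaNs are unordered.
    assert!(neg.is_sign_negative() && neg.partial_cmp(&nan).is_none());
}
```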

Most of the LLVM value-changing optimizations are noted as permissible to some degree by the IEEE754-2019 standard if offered as opt-ins, except for the "no signed zeros" marker, which the standard does not recognize as a valid optimization.

@thomcc (Member) commented Mar 20, 2021

#81261 basically says that NEG_INFINITY * 0.0 is non-deterministic

That's not quite right. The issue is that when evaluated at compile time, it produces one result, and at runtime, another. Evaluating either at compile time or at runtime is fully deterministic (modulo wasm, where I guess it's explicitly nondeterministic).

@RalfJung (Member) commented Mar 20, 2021

That's not quite right. The issue is that when evaluated at compile time, it produces one result, and at runtime, another.

The only way this is not a bug is if evaluation is non-deterministic. Rust has the same evaluation rules for compile-time and run-time. Otherwise there'd be two Rust languages and we'd have a horrible mess...

Evaluating either at compile time or at runtime is fully deterministic (modulo wasm, where I guess it's explicitly nondeterministic).

Of course, the actual implementation is never non-deterministic. But the specification of Rust has to be non-deterministic here, or we have to change either compile-time or run-time behavior.

@thomcc (Member) commented Mar 20, 2021

The only way this is not a bug is if evaluation is non-deterministic

IMO it is a bug.

Of course, the actual implementation is never non-deterministic. But the specification of Rust has to be non-deterministic here, or we have to change either compile-time or run-time behavior.

I mean, it's really easy for me to argue that changing the compile-time behavior is right. Unfortunately, that's difficult because it requires changing how APFloat works in LLVM, and it's not a trivial change either.

That said, IMO the solution to hard, low-impact bugs shouldn't be to rework the language so that they're not bugs. Eventually they should be fixed, even if it's not a high priority.

Additionally, a different Rust compiler probably wouldn't have the same difficulty here.

@RalfJung (Member)

That's not quite right. The issue is that when evaluated at compile time, it produces one result, and at runtime, another.

Also, that's not even true. The original code sample in that issue shows two different behaviors at runtime:

use std::ops::Mul;

fn main() {
    assert_eq!(1.0f64.copysign(f64::NEG_INFINITY.mul(0.0)), -1.0f64);
    assert_eq!(1.0f64.copysign(f64::NEG_INFINITY * 0.0), -1.0f64);
}

@thomcc (Member) commented Mar 20, 2021

What is the consequence of being "non-reproducible"? This is possible in safe code so it cannot do anything funny in Rust. In particular it may not introduce "unstable values" due to inconsistently applied compiler transformations.

I've been meaning to say this, but the reproducibility rules are probably a bit of a red herring. They're only really meant to apply to programs that opt into a subset of floating point semantics.

Also, that's not even true. The original code sample in that issue shows two different behaviors at runtime:

I believe this is due to one of these being impacted by LLVM's constant propagation and the other not.

@RalfJung (Member) commented Mar 20, 2021

I believe this is due to one of these being impacted by LLVM's constant propagation and the other not.

Sure. But that doesn't change the fact that this is runtime code. And to my knowledge, LLVM doesn't consider this optimization a bug, since the result produced by LLVM is legal according to the IEEE floating-point spec. There isn't even an LLVM bug report for the f64::NEG_INFINITY * 0.0 case, is there?

That said, IMO the solution to hard, low-impact bugs shouldn't be to rework the language so that they're not bugs. Eventually they should be fixed, even if it's not a high priority.

It is my understanding that some aspects of the bitwise results of floating-point operations (in particular for NaNs) are inherently not defined in the LLVM IR semantics (or in the IEEE semantics, which LLVM [mostly?] follows). This is not a bug, it is part of their spec. So if we want to use LLVM as the backend, we have no choice but to also incorporate a similar kind of non-determinism into the Rust semantics (or lobby for LLVM to change their spec).

This is not reworking the language, it is properly understanding the consequences of what it means to say that Rust uses IEEE floating-point semantics. I agree that it would be nice to have deterministic floating-point operations, but that's just not realistic when LLVM (and WebAssembly) made a different choice.

@RalfJung (Member) commented Mar 20, 2021

Put differently: a bug usually means that something is not working according to spec. I don't see that happen here (but I keep getting lost in the details of FP semantics). My understanding is that this issue is about better documenting the Rust spec, not about changing the behavior of rustc.

One could argue that the spec has a bug due to being too liberal, but given that the spec we are talking about here is the LLVM IR spec and by extension the IEEE FP spec, that does not seem like a particularly useful or constructive approach. (Specs can certainly have bugs when they fail to be self-consistent or when they do not adequately reflect intended behavior, but that does not seem to be the case here.)

@workingjubilee (Contributor)

I do not think that lobbying LLVM for hardware-respecting behavior is all that unlikely to succeed. It may make some proofs regarding optimizations easier, for one.

@RalfJung (Member)

It may make some proofs regarding optimizations easier, for one.

I don't see how that would be the case.

I do not believe lobbying LLVM for a hardware-respecting behavior seems that unlikely.

Fair. But this is the wrong forum to do so. ;)

@DemiMarie (Contributor)

What about refusing to constant-evaluate any operation that is non-reproducible?

@RalfJung (Member)

By const-evaluate I assume you mean constant propagation / constant folding, i.e., the optimization pass that tries to avoid redundant computations at runtime? That is distinct from CTFE (compile-time function evaluation, also sometimes called const evaluation), which is about computations that the spec says happen at compile-time (such as the initial values of a const, array sizes, or enum discriminant values).
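To make the distinction concrete, a rough sketch (black_box is only there to keep LLVM from folding the runtime case):

```rust
// CTFE: the value of a `const` item is computed at compile time by rustc's
// const evaluator, as the language requires.
const FOLDED: f64 = f64::NEG_INFINITY * 0.0;

fn main() {
    // Constant propagation is different: LLVM *may* fold this multiplication
    // during optimization, or leave it to be computed on the hardware.
    let at_runtime = f64::NEG_INFINITY * std::hint::black_box(0.0);
    println!("{:#018x} vs {:#018x}", FOLDED.to_bits(), at_runtime.to_bits());
}
```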

We could do that in rustc, but can we convince LLVM to stop folding f64::NEG_INFINITY * 0.0?

@DemiMarie (Contributor) commented Oct 30, 2021

By const-evaluate I assume you mean constant propagation / constant folding, i.e., the optimization pass that tries to avoid redundant computations at runtime? That is distinct from CTFE (compile-time function evaluation, also sometimes called const evaluation), which is about computations that the spec says happen at compile-time (such as the initial values of a const, array sizes, or enum discriminant values).

We could do that in rustc, but can we convince LLVM to stop folding f64::NEG_INFINITY * 0.0?

File a bug against LLVM? I don’t know 🙂

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue May 9, 2022
…shtriplett

Improve floating point documentation

This is my attempt to improve/solve rust-lang#95468 and rust-lang#73328 .

Added/refined explanations:
- Refine the "NaN as a special value" top level explanation of f32
- Refine `const NAN` docstring: add an explanation about there being a multitude of NaN bit patterns and a disclaimer about the portability/stability guarantees.
- Refine `fn is_sign_positive` and `fn is_sign_negative` docstrings: add disclaimer about the sign bit of NaNs.
- Refine `fn min` and `fn max` docstrings: explain the semantics and their relationship to the standard and libm better.
- Refine `fn trunc` docstrings: explain the semantics slightly more.
- Refine `fn powi` docstrings: add disclaimer that the rounding behaviour might be different from `powf`.
- Refine `fn copysign` docstrings: add disclaimer about payloads of NaNs.
- Refine `minimum` and `maximum`: add disclaimer that "propagating NaN" doesn't mean that propagating the NaN bit patterns is guaranteed.
- Refine `max` and `min` docstrings: add "ignoring NaN" to bring the one-row explanation to parity with `minimum` and `maximum`.

Cosmetic changes:
- Reword `NaN` and `NAN` as plain "NaN", unless they refer to the specific `const NAN`.
- Reword "a number" to `self` in function docstrings to clarify.
- Remove "Returns NAN if the number is NAN" from `abs`, as this is told to be the default behavior in the top explanation.
workingjubilee pushed a commit to tcdi/postgrestd that referenced this issue Sep 15, 2022
Improve floating point documentation (same commit message as above)
@RalfJung (Member) commented Aug 4, 2023

I have written a Pre-RFC on our floating-point guarantees, which is almost exclusively about NaNs. That document describes what are currently the best possible guarantees we can provide, given LLVM's documentation. However, LLVM also seems to be open to providing stronger guarantees.

@the8472 (Member) commented Sep 5, 2023

and the last x86 processor without SSE2 left the plant about 20 years ago

To be pedantic, the Vortex86DX3 is still being made and only supports SSE, and they claim Linux support. Some poor soul out there may still be compiling x86-no-SSE2 code for Linux shipped on "new" hardware. That said, I'm not aware of any instances of this actually happening; I'm just raising the possibility.

Edit: #35045 (comment) mentioned in 2016 that he's using a Vortex86

@RalfJung (Member) commented Sep 5, 2023

I'm more concerned about someone using -C target-cpu=pentium on one of our tier 1 i686 targets and expecting that to work properly. Maybe we should just forbid disabling SSE2 support...

@RalfJung (Member)

The RFC rust-lang/rfcs#3514 makes a concrete proposal for our guarantees for the bits of NaNs.
