Document guarantees (or lack thereof) regarding sign, quietness, and payload of NaNs #73328

Open
Tracked by #72599
ecstatic-morse opened this issue Jun 13, 2020 · 44 comments
Labels
  • A-floating-point: Floating point numbers and arithmetic
  • A-LLVM: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
  • T-lang: Relevant to the language team, which will review and decide on the PR/issue.

Comments

@ecstatic-morse (Contributor) commented Jun 13, 2020

NaNs can behave in surprising ways. On top of that, a very common target is inherently buggy in more than one way. But on all other targets we actually follow fairly clear, if poorly documented, rules. See here for the current status.

Original issue

Several issues have been filed about surprising behavior of NaNs.

The root cause of these issues is that LLVM does not guarantee that NaN payload bits are preserved. Empirically, this applies to the signaling/quiet bit as well as (surprisingly) the sign bit. At least one LLVM developer seems open to changing this, although doing so may not be easy.

Unless we are prepared to guarantee more, we should do a better job of documenting that, besides having all 1s in the exponent and a non-zero significand, the bitwise value of a NaN is unspecified and may change at any point during program execution. In particular, the from_bits method on f32 and f64 types currently states:

This is currently identical to transmute::<u32, f32>(v) on all platforms.

and

this implementation favors preserving the exact bits. This means that any payloads encoded in NaNs will be preserved

These statements are misleading and should be changed.
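For concreteness, here is a minimal sketch (not taken from any of the linked issues) of the kind of round-trip the current wording implies is lossless; on the affected targets the two printed values can differ:

```rust
fn main() {
    // A signaling-NaN bit pattern for f32: all-ones exponent, quiet bit clear,
    // non-zero payload.
    let snan_bits: u32 = 0x7f80_0001;
    let roundtrip = f32::from_bits(snan_bits).to_bits();
    // On most targets both values are identical; on the problematic targets
    // (e.g. NaN-canonicalizing wasm hosts or x87 code paths) the quiet bit
    // and/or payload may come back changed.
    println!("{snan_bits:#010x} -> {roundtrip:#010x}");
}
```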

We may also want to add documentation to {f32,f64}::NAN to this effect, see #52897 (comment).

cc #10186?

@ecstatic-morse added the A-LLVM and T-lang labels on Jun 13, 2020
@ecstatic-morse changed the title from "Document guarantees (or lack thereof) regarding signedness, quietness, and payload of NaNs" to "Document guarantees (or lack thereof) regarding sign, quietness, and payload of NaNs" on Jun 13, 2020
@ecstatic-morse (Contributor Author)

This also affects the documentation for the methods in #72568.

@RalfJung (Member)

Related LLVM bug: https://bugs.llvm.org/show_bug.cgi?id=45152

bors added a commit to rust-lang-ci/rust that referenced this issue Aug 14, 2020
…r=Mark-Simulacrum

Run standard library unit tests without optimizations in `nopt` CI jobs

This was discussed in rust-lang#73288 as a way to catch similar issues in the future. This builds an unoptimized standard library with the bootstrap compiler and runs the unit tests. This takes about 2 minutes on my laptop.

I confirmed that this method works locally, although there may be a better way of implementing it. It would be better to use the stage 2 compiler instead of the bootstrap one.

Notably, there are currently four `libstd` unit tests that fail in debug mode on `i686-unknown-linux-gnu` (a tier one target):

```
failures:
    f32::tests::test_float_bits_conv
    f32::tests::test_total_cmp
    f64::tests::test_float_bits_conv
    f64::tests::test_total_cmp
```

These are the tests that prompted rust-lang#73288 as well as the ones added in rust-lang#72568, which is currently broken due to rust-lang#73328.
@thomcc (Member) commented Sep 9, 2020

Unless we are prepared to guarantee more, ... the bitwise value of a NaN is unspecified and may change at any point during program execution

This seems... way too conservative. I know it's trying to make the best of a bad situation, and I'm sympathetic here, but please realize how hard overly broad unspecified behavior like this makes it to write robust code (As a user of Rust who came to it from C, this feels like the same kind of undefined behavior you see in the C standard in cases where all supported platforms disagree).

So, my biggest concern is non-Wasm platforms. I think it would be a huge blow to working with floats in Rust to effectively provide zero guarantees around NaN. I don't really know a good solution here, but even just marking it as an LLVM bug on the problematic platforms (rather than deciding that this isn't a thing that Rust code gets to rely on ever) would be much better.

Just as an example, if NaN payload is totally unspecified and may change at any point, implementing any ordering stronger than PartialEq for floats is impossible (including #72599), as you cannot count on NaN bitwise values to be stable across two calls of to_bits() on the same float.

Same goes for things that stash an f32 in a u32 and then expect to get the same value back out (for example, I implemented an AtomicF32 at one point on top of AtomicU32 + from_bits/to_bits). If I can't rely on stable bit values through float => u32, things like compare_exchange loops are no longer guaranteed to ever terminate.
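For concreteness, something along these lines (names hypothetical, not the exact code I wrote) only works if to_bits keeps returning the same bits for the same float value:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Sketch of an atomic f32 built on AtomicU32 + to_bits/from_bits.
struct AtomicF32(AtomicU32);

impl AtomicF32 {
    fn new(v: f32) -> Self {
        AtomicF32(AtomicU32::new(v.to_bits()))
    }

    fn load(&self, order: Ordering) -> f32 {
        f32::from_bits(self.0.load(order))
    }

    // Spins on compare_exchange_weak: if a float's bit pattern could silently
    // change between two to_bits calls, `current` might never match the stored
    // value and the loop would not be guaranteed to terminate.
    fn fetch_add(&self, delta: f32, order: Ordering) -> f32 {
        let mut current = self.0.load(Ordering::Relaxed);
        loop {
            let new = (f32::from_bits(current) + delta).to_bits();
            match self.0.compare_exchange_weak(current, new, order, Ordering::Relaxed) {
                Ok(prev) => return f32::from_bits(prev),
                Err(actual) => current = actual,
            }
        }
    }
}

fn main() {
    let a = AtomicF32::new(1.5);
    a.fetch_add(2.0, Ordering::SeqCst);
    assert_eq!(a.load(Ordering::SeqCst), 3.5);
}
```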


That said, I also think "totally unspecified behavior" is too conservative on Wasm — I've done a bit of poking and it seems like the behavior is a lot saner than suggested, although it does violate IEEE 754 and is probably not 100% intentional.

Basically: LLVM's behavior here is inherited from the wasm/js runtime, which canonicalizes NaNs whenever going from bits => float, as it wants to be able to guarantee certain things about which bit patterns are possibly in the float — certain NaNs are off limits.

That means:

  • The bits=>float operation is the only time the NaN payload can change (explaining the mentioned f32::from_bits(x).to_bits() round-trip failure).
  • Float => bits should be totally stable and consistent
  • After a float => bits operation, those bits are guaranteed not to change when going back to a float.
    • There is, admittedly, some dodginess here since perhaps LLVM optimizes a bits => float => bits into a no-op. Perhaps that can be addressed directly and more easily though?

This is non-ideal but is still way easier to reason about and build on top of than arbitrary unspecified behavior.


Yeah, that's the basic gist of my thoughts. Changing the documented guarantees of from_bits/to_bits globally like that would totally neuter those APIs. I'm sympathetic to the position you're in and to not having great choices, but that kind of change feels very much like the wrong call, and making the call be this kind of unspecified behavior feels really bad on any platform...

P.S. I accidentally posted an incomplete version of this comment by hitting ctrl+enter in the github text box, sorry if you saw that — really should just do these in a text editor first.

@RalfJung (Member) commented Sep 9, 2020

I am open to better suggestions. I know hardly anything about floating point semantics, so "totally unspecified" is an easy and obviously "correct" choice for me to reach for. If someone with more in-depth knowledge can produce a spec that is consistent with LLVM behavior, I am sure this can be improved upon.

However, the core spec of Rust must be platform-independent, so unless we consider this a platform bug (which I think is what we do with the x87-induced issues on i686), whatever the spec is has to encompass all platforms.

In principle, certain platforms can decide to guarantee more than others, but that is a dangerous game as it risks code inadvertently becoming non-portable in the worst possible way -- usually "non-portable" means "fails to build on other platforms", now it would silently change behavior. Maybe we can handle this in a way similar to endianness, although the situation feels different.

And all of this is assuming that we can get LLVM to commit to preserving NaN payloads on these platforms. You are saying that this issue only affects wasm(-like) targets, but is there a document where LLVM otherwise makes stronger guarantees? The fact that issues have only been observed on these platforms does not help; we need an explicit statement by LLVM to establish and maintain this guarantee in the future.

Just as an example, if NaN payload is totally unspecified and may change at any point, implementing any ordering stronger than PartialEq for floats is impossible (including #72599), as you cannot count on NaN bitwise values to be stable across two calls of to_bits() on the same float.

So if I understand correctly, on wasm, the float => bit cast that is inherent in such a total order would canonicalize NaNs. This on its own is not a problem as this is a stable canonicalization, and that's why you think "unstable NaNs" are too broad. Is that accurate?

However, when you combine that with LLVM optimizing away "bit => float => bit" roundtrips (does it do that?), then this already brings us into an unstable situation. Some of the comparisons might have that optimization applied to them, and others not, so suddenly the same float (obtained via a bit => float cast) can compare in two different ways.

It is easy to make a target language spec such as wasm self-consistent, but to do the same on a heavily optimized IR like LLVM's or surface language like Rust is much harder.

@thomcc (Member) commented Sep 9, 2020

So if I understand correctly, on wasm, the float => bit cast that is inherent in such a total order would canonicalize NaNs.

No, float => bit should always* be stable; it's bit => float that canonicalizes. This means it's possible to implement a robust totalOrder without issues on Wasm (just not if all NaN payloads are unspecified values which may change at any time).
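For reference, a rough sketch of such an ordering built purely on to_bits (essentially the bit trick total_cmp uses); it assumes float => bits is stable and that from_bits preserves the example bit patterns:

```rust
// Map an f32 to an i32 key whose natural ordering matches IEEE 754 totalOrder:
// flip all non-sign bits of negative values so more-negative floats sort first.
fn total_order_key(x: f32) -> i32 {
    let bits = x.to_bits() as i32;
    bits ^ ((((bits >> 31) as u32) >> 1) as i32)
}

fn main() {
    let neg_nan = f32::from_bits(0xffc0_0000); // NaN with the sign bit set
    let pos_nan = f32::from_bits(0x7fc0_0000); // NaN with the sign bit clear
    let mut v = [1.0f32, -0.0, 0.0, pos_nan, neg_nan, -1.5];
    v.sort_by_key(|&x| total_order_key(x));
    // Negative NaN sorts first, positive NaN last; this only works if to_bits
    // keeps returning the same bits for the same float.
    assert!(v[0].is_nan() && v[0].is_sign_negative());
    assert!(v[5].is_nan() && v[5].is_sign_positive());
}
```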

My point with that paragraph was not that the LLVM behavior is bad (although I am not a fan), but that changing Rust's guarantees to: "the bitwise value of a NaN is unspecified and may change at any point during program execution" is both

  • Stronger than needed for Wasm
  • Makes it so that no matter which operations happen to canonicalize and which do not, it's not possible to write a totalOrder.

* (always... except for what I say in my next response)


However, when you combine that with LLVM optimizing away "bit => float => bit" round-trips (does it do that?)

I don't know if it does it on Wasm, but it's obviously free to do this on non-Wasm platforms (and I think I've seen it there, but it's hard to say and I don't have code I'm thinking of on hand).

I'd hope it wouldn't do this on Wasm, and would argue that if it does optimize that away it's an LLVM bug for that platform, but... yeah. Possible.


unless we consider this a platform bug (which I think is what we do with the x87-induced issues on i686)

Honestly that seems like the sanest decision to me, since the alternative is essentially saying that Rust code can't expect IEEE754-compliant floats anymore. And so, I think x87 is a good example because it's also an example of non-IEEE754 compliance, although probably a less annoying one in practice.

Concretely, I wouldn't have complained about this at all if it were listed as a platform bug.


Instead, my issue is entirely with all compliant Rust code losing the ability to reason about float binary layout, which has been extremely useful in stuff like scientific computing, game development, programming language runtimes, math libraries, ... all things Rust is well suited to do, by design.

This wouldn't cripple those by any means, but it would make things worse for several of them.

Admittedly, in practice, unless it's flat out UB, I suspect people will just code to their target and not to the spec, which isn't great either, but honestly to me it feels like it might be better than Rust genuinely inheriting this limitation from the web platform.

(Ironically, this would also prevent writing a runtime in Rust that does the optimization which is the reason Wasm and JS runtimes want to canonicalize their NaNs. Although that optimization was already fairly unportable anyway)

@RalfJung (Member) commented Sep 9, 2020

No, float => bit should always* be stable, it's bit => float that canonicalizes.

Oh I see... but that is not observable until you cast back? Or does wasm permit transmutation, like writing a float into memory and reading it back as an int without doing an explicit cast? (IIRC their memory is int-only so you'd have to cast before writing, but I might misremember.)

I don't know if it does it on Wasm, but it's obviously free to do this on non-Wasm platforms (and I think I've seen it there, but it's hard to say and I don't have code I'm thinking of on hand).

I'd hope it wouldn't do this on Wasm, and would argue that if it does optimize that away it's an LLVM bug for that platform, but... yeah. Possible.

Whether it can do that or not depends solely on the semantics of LLVM IR, which (as far as I know) are not affected by whether you are compiling to Wasm or not. That is the entire point of having a single uniform IR.

There is no good way to make optimizations in a highly optimized language like Rust or LLVM IR depend on target behavior -- given how they interact with all the other optimizations, that is basically guaranteed to introduce contradicting assumptions.

Also, I don't think there is much point in discussing what we wish LLVM would do. We first need to figure out what it is doing.

(Ironically, this would also prevent writing a runtime in Rust that does the optimization which is the reason Wasm and JS runtimes want to canonicalize their NaNs. Although that optimization was already fairly unportable anyway)

Ah, but this is getting to the heart of the problem -- what if you implement a wasm runtime in Rust which uses this optimization, and compile that to wasm? Clearly that cannot work as the host wasm is already "using those bits". So, it is fundamentally impossible to have a semantics that achieves all of

  • platform independence
  • supporting this optimization
  • correct compilation to wasm

Instead, my issue is entirely with all compliant Rust code loosing the ability to reason about float binary layout, which has been extremely useful in stuff like scientific computing, game development, programming language runtimes, math libraries, ... All things Rust is well suited to do, by design.

I do feel like it is slightly exaggerated to say that all these use cases rely on stable NaN payloads. That said, there seems to be a fundamental conflict here between having a good cross-platform story (consistent semantics everywhere) and supporting low-level floating point manipulation. FP behavior is just not consistent enough across platforms.

@RalfJung (Member) commented Sep 9, 2020

However, note that not just wasm has strange NaN behavior. We also have some bugs affecting x86_64: #55131, #69532. Both (I think) stem from the LLVM constant propagator (in one case its port to Rust) producing different NaN payloads than real CPUs. This means that if we guarantee stable NaN payloads on x86_64, we have to stop const-propagating unless all CPUs produce consistent NaN payloads (and then the const propagator needs to be fixed to match that).

So until LLVM commits to preserving NaN payloads on some targets, there is little we can do. It seems people already rely on that when compiling wasm runtimes in LLVM that use the NaN optimization, so maybe it would not be too hard to convince LLVM to commit to that?

@thomcc (Member) commented Sep 9, 2020

That is the entire point of having a single uniform IR.

This isn't really right though, is it? LLVM IR includes tons of platform-specific information. The fact that making LLVM IR cross-platform was non-viable was even part of the motivation behind Wasm's current design.


From the other issue:

A less drastic alternative is to say that every single FP operation (arithmetic and intrinsics and whatnot, but not copying), when it returns a NaN, non-deterministically picks any NaN representation.

This would be totally fine with me FWIW — as soon as you do arithmetic on NaN all portability is out the window in practice and in theory. My concern is largely with stuff like:

  • Stuff like https://searchfox.org/mozilla-central/source/js/rust/src/jsval.rs suddenly breaking; just a quick example I remember from my last job of code that depends on this.

  • APIs like https://doc.rust-lang.org/core/arch/x86_64/fn._mm_cmpeq_ps.html being in a limbo where nothing guarantees that it works... even though it obviously must work or is a compiler bug.

    For context here: this API is one of many SIMD intrinsic apis where you have shortlived NaNs in float vectors where the payload is very important.

    Specifically this function will return a float vector (yes, float — __m128i would be the type for an int vector) with an all-bits-set f32 for every slot where the comparison succeeded. One of the ways you're intended to use the result is as a bitmask, to find the elements where the comparison succeeded/failed.

    Since all-bits-set is a NaN with a specific payload, this requires that payload to be preserved here (a minimal sketch of this usage follows below).
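A rough sketch of that usage (my own illustration, assuming an x86_64 target where SSE is baseline); both uses of the comparison result below depend on the all-bits-set NaN lanes surviving untouched:

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse")]
unsafe fn demo() {
    let a = _mm_setr_ps(1.0, 2.0, 3.0, 4.0);
    let b = _mm_setr_ps(1.0, 0.0, 3.0, 0.0);

    // "True" lanes come back as all-bits-set f32s, i.e. NaNs with a specific
    // payload.
    let mask = _mm_cmpeq_ps(a, b);

    // 1. As a bitmask: keep `a` where equal, zero elsewhere.
    let selected = _mm_and_ps(mask, a);
    let mut out = [0.0f32; 4];
    _mm_storeu_ps(out.as_mut_ptr(), selected);
    assert_eq!(out, [1.0, 0.0, 3.0, 0.0]);

    // 2. As a lane mask: the sign bits of the all-ones lanes give 0b0101 here.
    assert_eq!(_mm_movemask_ps(mask), 0b0101);
}

#[cfg(target_arch = "x86_64")]
fn run() {
    unsafe { demo() }
}

#[cfg(not(target_arch = "x86_64"))]
fn run() {}

fn main() {
    run();
}
```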

So, while I just gave you two examples of very much non-portable code...

  • The jsval code is probably more portable than you might expect (actually I have no idea what you might expect, but I believe it should support anything Firefox supports, and probably a little more).
  • Every target with vector registers does the same "it's really just a bag of bits" stuff somewhere in its intrinsics API (and the solution here shouldn't be to declare core::arch broken — even if portable SIMD is on the way).

My big concern still comes back to the notion that these payloads are "unspecified values which may change at any time" according to Rust. The way I interpret that, and the general feeling of this conversation, is that there's no guarantee that target-specific things like these will work reliably even on the target in question.


I do feel like it is slightly exaggarated to say that all these usecases rely on stable NaN payloads

That's why I said "This wouldn't cripple those by any means", although honestly the SIMD stuff would be pretty bad if it were actually broken.

I also fully expect those cases to blindly continue doing things to NaN non-portably (and possibly non-deterministically).


This means that if we guarantee stable NaN payloads in x86_64, we have to stop const-propagating unless all CPUs have consistent NaN payload (and then the const propagator needs to be fixed to match that).

This is surprising, because I thought it was the whole point of LLVM's APFloat code (which even goes as far as to support like the horrible PowerPC long double type...). That said, it's not like I can argue with facts, if those bugs are happening, then they're happening... But are we sure those aren't just normal bugs in LLVM?

That said the only reason I wouldn't be willing to say "I don't care that much about what happens to NaN during const prop" is that you can't know when LLVM will happen to see enough to do more const prop.

That said, it seems totally unreasonable and very fragile to me to rely on things like:

  • A specific float expression (e.g. 0.0/0.0) producing a specific NaN.
  • Float numerical operations (arithmetic, math functions, etc) with NaN inputs doing anything beyond producing some arbitrary other NaN (except for sign manipulation — neg/abs/copysign and the like just toggle the sign bit).
  • ...

That stuff is totally nonportable (IEEE754 recommends but doesn't require any of it) and unreliable both at compile time and at runtime. Again, my concern is more unexpected fallout here in stuff that expects NaN to go through smoothly.


Just took a peek at https://webassembly.github.io/spec/core/exec/numerics.html (and elsewhere in the spec) and regret not doing so sooner. In particular, there's a lot of mention on when canonicalization can happen, but none of the places are on load/reinterpret.

And so what's in there is pretty close to the suggestion you had earlier (the "less drastic alternative")... and to what I suggested as the things that are totally nonportable.

And it also definitely contradicts what I said before about when canonicalization happens (which mirrored what happened in asm.js, what I seemed to see in my testing earlier, and would have explained from_bits(x).to_bits() not round-tripping... but maybe all of it was the "native doubles used in LLVM MC code" bug? Needs more investigation). That said, this would make things a lot more tractable, since it brings Wasm up to par as a compliant IEEE 754 implementation, and (if true) just points the blame at LLVM for messing up...

Which would also (maybe?) explain why the bugs happen on all platforms, maybe?

...

Ugh, this is still a bit jumbled, sorry; some of this needs to be unified and reordered, and the discrepancy needs more digging, but I have to run, unfortunately.

@RalfJung (Member)

This isn't really right though, is it? LLVM IR includes tons of platform-specific information. The fact that making LLVM IR cross-platform was non-viable was even part of the motivation behind Wasm's current design.

It makes many platform-specific things such as pointer sizes etc explicit. But that is very different from an implicit change in behavior.

Your proposal would basically require many optimizations to have code like if (wasm) { one_thing; } else { another_thing; }. I do not think such code is common in LLVM today, if it exists at all. It is also very fragile as it is easy to forget to add this in all the right places. In contrast, the explicit reification of layout everywhere is impossible to ignore.

And this would affect many optimizations, as it makes floating-point operations and/or casts non-deterministic, which is a side-effect! So everything that treats them as pure operations needs to be adjusted.

From the other issue:

There's like 5 other issues, which one do you mean?^^ You are quoting this comment I think.

This would be totally fine with me FWIW — as soon as you do arithmetic on NaN all portability is out the window in practice and in theory.

(This was for making FP operations pick arbitrary NaNs.)
The problem is that this makes them non-deterministic. So e.g. if you have code like

let f = f1 / f2;
function(f, f);

then you are no longer allowed to "inline" the definition of f in both places, as that would change the function arguments from two values with definitely the same NaN payload to potentially different NaN payloads.
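For concreteness, a sketch of the transformation (with a hypothetical function that observes the bits): under "every FP operation picks an arbitrary NaN", the second version is not equivalent to the first, because the two divisions may yield different NaN bit patterns:

```rust
// Hypothetical helper that can observe the bits of its arguments.
fn function(a: f64, b: f64) {
    // This always holds here today, because both arguments are copies of the
    // same value.
    debug_assert_eq!(a.to_bits(), b.to_bits());
}

fn original(f1: f64, f2: f64) {
    let f = f1 / f2;
    function(f, f);
}

fn inlined(f1: f64, f2: f64) {
    // Not a valid transformation if each division may pick its own NaN
    // payload: 0.0 / 0.0 could produce two different NaNs here.
    function(f1 / f2, f1 / f2);
}

fn main() {
    original(0.0, 0.0); // fine: both arguments are the same NaN value
    inlined(1.0, 2.0);  // fine: no NaN involved
}
```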

However, maybe we can make it deterministic but unspecified? As in, after each floating-point operation, if the result is NaN, something unspecified happens with the NaN bits, but given the same inputs there will definitely always be the same output?

The main issue with this is that it means that const-prop must exactly reproduce those NaN patterns (or refuse to const-prop if the result is a NaN).

My concern is largely with stuff like:

So is it the case that all that code would be okay with FP operations clobbering NaN bits?

My big concern still comes back to the notion that these payloads are "unspecified values which may change at any time" according to Rust.

Rust will probably just do whatever LLVM does, once they make up their mind and commit to a fixed and precise semantics. I think you are barking up the wrong tree here, I don't like unspecified values any more than you do. ;) I am just trying to come up with a consistent way to describe LLVM's behavior.

I'm a theoretical PL researcher, so that's something I have experience with that I am happy to lend here -- define a semantics that is consistent with optimizations and compilation to lower-level targets. However, not knowing much about floating-point makes this harder for me than it is for other topics. So I am relying on people like you to gather up the constraints to make sure the resulting semantics is not just consistent with LLVM but also useful. ;) It might turn out that that's impossible, in which case we can hopefully convince LLVM to change.

This is surprising, because I thought it was the whole point of LLVM's APFloat code (which even goes as far as to support like the horrible PowerPC long double type...). That said, it's not like I can argue with facts, if those bugs are happening, then they're happening... But are we sure those aren't just normal bugs in LLVM?

They might well be bugs! Since you seem to know a lot about floating-point, it would be great if you could help figure that out. :)

That said the only reason I wouldn't be willing to say "I don't care that much about what happens to NaN during const prop" is that you can't know when LLVM will happen to see enough to do more const prop.

Right, that's exactly the point -- const-prop must not change what the program does. So either it must produce the exact same results as hardware, or else we have to say that the involved operation is non-deterministic.

Just took a peek at https://webassembly.github.io/spec/core/exec/numerics.html (and elsewhere in the spec) and regret not doing so sooner. In particular, there's a lot of mention on when canonicalization can happen, but none of the places are on load/reinterpret.

So what is the executive summary?

A quick glance shows that these operations are definitely non-deterministic. So scratch all I said about this above, this basically forces LLVM to never ever duplicate floating-point instructions. Any proposals for (a) figuring out if they are doing this right and (b) documenting this in the LLVM LangRef to make sure they are aware of the problem?

@RalfJung (Member)

@ecstatic-morse you listed #73288 in the original issue here, but isn't that a different problem? Namely, this issue here is about NaN bits in general, whereas #73288 is specific to i686 and thus seems more related to #72327. (I don't think we have a meta-issue for "x87 floating point problems", but maybe we should.)

@ecstatic-morse (Contributor Author) commented Sep 14, 2020

#72327 affects only i586 targets (x86 without SSE2). This is a tier 2 platform, and the last x86 processor without SSE2 left the plant about 20 years ago, so I would have no problem exempting it from whatever guarantees around NaN payloads we wish to make. However, #73288 affects i686 (the latest 32-bit x86 target) as well, which is tier 1. Obviously, we could (and maybe should) exempt all 32-bit x86 targets from the NaN payload guarantees, but I consider #73288 to be of greater importance than issues only affecting i586.

As an aside, I will note that "Unless we are prepared to guarantee more" was doing a lot of work in the OP. I'd be very happy if we came up with a stricter set of semantics that we can support across tier 1 platforms (possibly exempting 32-bit x86) and implemented them. However, doing so will require a non-trivial amount of work, much of it on the LLVM side. I think that, in the meantime, we should explicitly state where we currently fall short in the documentation of affected APIs, similar to #10184. That's what this issue is about.

@ecstatic-morse (Contributor Author)

Also, look out for my latest crate, AtomicNanCanonicalizingF32, on crates.io.

@workingjubilee (Contributor)

Ah, indeed! Yes, that would Not be okay.

Insofar as the standard is concerned, to my reading and understanding:

  • If all inputs to an op are non-NaN, then there are only a few sets of input values which can yield a NaN float, which do include mul(NEG_INFINITY, 0.0).
  • A NaN float is a bitstring with some bits set and others in an undetermined state. Their state can be revealed, however, by:
    • operations that only examine a NaN float (e.g. partial_cmp) or interact with it solely as a bitstring (abs, neg, copysign, and Copy); these are deterministic (the sign-bit case is sketched below).
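A small sketch of the sign-bit case (my own illustration; the asserts may fail on the problematic targets discussed in this thread, e.g. x87 code paths or NaN-canonicalizing wasm hosts):

```rust
fn main() {
    // A quiet NaN with an arbitrary payload; the constant is just an example.
    let nan = f64::from_bits(0x7ff8_0000_dead_beef);

    // neg/abs/copysign only look at or flip the sign bit (bit 63 for f64) and
    // leave the payload alone under the reading above.
    let neg = -nan;
    assert_eq!(neg.to_bits(), nan.to_bits() ^ (1u64 << 63));
    assert_eq!(neg.abs().to_bits(), nan.to_bits());
    assert_eq!(nan.copysign(-1.0).to_bits(), neg.to_bits());

    // partial_cmp only examines the values: NaNs are unordered.
    assert!(neg.is_sign_negative() && neg.partial_cmp(&nan).is_none());
}
```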

Most of the LLVM value-changing optimizations are noted as permissible to some degree by the IEEE754-2019 standard if offered as opt-ins, except for the "no signed zeros" marker, which the standard does not recognize as a valid optimization.

@thomcc (Member) commented Mar 20, 2021

#81261 basically says that NEG_INFINITY * 0.0 is non-deterministic

That's not quite right. The issue is that when evaluated at compile time, it produces one result, and at runtime, another. Evaluating either at compile time or at runtime is fully deterministic (modulo wasm, where I guess it's explicitly nondeterministic).

@RalfJung (Member) commented Mar 20, 2021

That's not quite right. The issue is that when evaluated at compile time, it produces one result, and at runtime, another.

The only way this is not a bug is if evaluation is non-deterministic. Rust has the same evaluation rules for compile-time and run-time. Otherwise there'd be two Rust languages and we'd have a horrible mess...

Evaluating either at compile time or at runtime is fully deterministic (modulo wasm, where I guess it's explicitly nondeterministic).

Of course, the actual implementation is never non-deterministic. But the specification of Rust has to be non-deterministic here, or we have to change either compile-time or run-time behavior.

@thomcc (Member) commented Mar 20, 2021

The only way this is not a bug is if evaluation is non-deterministic

IMO it is a bug.

Of course, the actual implementation is never non-deterministic. But the specification of Rust has to be non-deterministic here, or we have to change either compile-time or run-time behavior.

I mean, it's really easy for me to argue that changing the compile-time behavior is right. Unfortunately, that's difficult because it requires changing how APFloat works in LLVM, and it's not a trivial change either.

That said, IMO the solution to hard, low-impact bugs shouldn't be to rework the language so that they're not bugs. Eventually they should be fixed, even if it's not a high priority.

Additionally, a different Rust compiler probably wouldn't have the same difficulty here.

@RalfJung (Member)

That's not quite right. The issue is that when evaluated at compile time, it produces one result, and at runtime, another.

Also, that's not even true. The original code sample in that issue shows two different behaviors at runtime:

use std::ops::Mul;

fn main() {
    assert_eq!(1.0f64.copysign(f64::NEG_INFINITY.mul(0.0)), -1.0f64);
    assert_eq!(1.0f64.copysign(f64::NEG_INFINITY * 0.0), -1.0f64);
}

@thomcc (Member) commented Mar 20, 2021

What is the consequence of being "non-reproducible"? This is possible in safe code so it cannot do anything funny in Rust. In particular it may not introduce "unstable values" due to inconsistently applied compiler transformations.

I've been meaning to say this, but the reproducibility rules are probably a bit of a red herring. They're only really meant to apply to programs that opt into a subset of floating point semantics.

Also, that's not even true. The original code sample in that issue shows two different behaviors at runtime:

I believe this is due to one of these being impacted by LLVM's constant propagation and the other not.

@RalfJung (Member) commented Mar 20, 2021

I believe this is due to one of these being impacted by LLVM's constant propagation and the other not.

Sure. But that doesn't change the fact that this is runtime code. And to my knowledge, LLVM doesn't consider this optimization a bug, since the result produced by LLVM is legal according to the IEEE floating-point spec. There isn't even an LLVM bug report for the f64::NEG_INFINITY * 0.0 case, is there?

That said, IMO the solution to hard, low-impact bugs shouldn't be to rework the language so that they're not bugs. Eventually they should be fixed, even if it's not a high priority.

It is my understanding that some aspects of the bitwise results of floating-point operations (in particular for NaNs) are inherently not defined in the LLVM IR semantics (or in the IEEE semantics, which LLVM [mostly?] follows). This is not a bug, it is part of their spec. So if we want to use LLVM as the backend, we have no choice but to also incorporate a similar kind of non-determinism into the Rust semantics (or lobby for LLVM to change their spec).

This is not reworking the language, it is properly understanding the consequences of what it means to say that Rust uses IEEE floating-point semantics. I agree that it would be nice to have deterministic floating-point operations, but that's just not realistic when LLVM (and WebAssembly) made a different choice.

@RalfJung (Member) commented Mar 20, 2021

Put differently: a bug usually means that something is not working according to spec. I don't see that happen here (but I keep getting lost in the details of FP semantics). My understanding is that this issue is about better documenting the Rust spec, not about changing the behavior of rustc.

One could argue that the spec has a bug due to being too liberal, but given that the spec we are talking about here is the LLVM IR spec and by extension the IEEE FP spec, that does not seem like a particularly useful or constructive approach. (Specs can certainly have bugs when they fail to be self-consistent or when they do not adequately reflect intended behavior, but that does not seem to be the case here.)

@workingjubilee (Contributor)

I do not think that lobbying LLVM for hardware-respecting behavior is all that unlikely to succeed. It may make some proofs regarding optimizations easier, for one.

@RalfJung (Member)

It may make some proofs regarding optimizations easier, for one.

I don't see how that would be the case.

I do not believe lobbying LLVM for a hardware-respecting behavior seems that unlikely.

Fair. But this is the wrong forum to do so. ;)

@DemiMarie (Contributor)

What about refusing to constant-evaluate any operation that is non-reproducible?

@RalfJung (Member)

By const-evaluate I assume you mean constant propagation / constant folding, i.e., the optimization pass that tries to avoid redundant computations at runtime? That is distinct from CTFE (compile-time function evaluation, also sometimes called const evaluation), which is about computations that the spec says happen at compile-time (such as the initial values of a const, array sizes, or enum discriminant values).
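To make the distinction concrete, a rough sketch (black_box is only there to keep LLVM from folding the runtime case):

```rust
// CTFE: the value of a `const` item is computed at compile time by rustc's
// const evaluator, as the language requires.
const FOLDED: f64 = f64::NEG_INFINITY * 0.0;

fn main() {
    // Constant propagation is different: LLVM *may* fold this multiplication
    // during optimization, or leave it to be computed on the hardware.
    let at_runtime = f64::NEG_INFINITY * std::hint::black_box(0.0);
    println!("{:#018x} vs {:#018x}", FOLDED.to_bits(), at_runtime.to_bits());
}
```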

We could do that in rustc, but can we convince LLVM to stop folding f64::NEG_INFINITY * 0.0?

@DemiMarie (Contributor) commented Oct 30, 2021

By const-evaluate I assume you mean constant propagation / constant folding, i.e., the optimization pass that tries to avoid redundant computations at runtime? That is distinct from CTFE (compile-time function evaluation, also sometimes called const evaluation), which is about computations that the spec says happen at compile-time (such as the initial values of a const, array sizes, or enum discriminant values).

We could do that in rustc, but can we convince LLVM to stop folding f64::NEG_INFINITY * 0.0?

File a bug against LLVM? I don’t know 🙂

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue May 9, 2022
…shtriplett

Improve floating point documentation

This is my attempt to improve/solve rust-lang#95468 and rust-lang#73328 .

Added/refined explanations:
- Refine the "NaN as a special value" top level explanation of f32
- Refine `const NAN` docstring: add an explanation about there being a multitude of NaN bit patterns and a disclaimer about the portability/stability guarantees.
- Refine `fn is_sign_positive` and `fn is_sign_negative` docstrings: add disclaimer about the sign bit of NaNs.
- Refine `fn min` and `fn max` docstrings: explain the semantics and their relationship to the standard and libm better.
- Refine `fn trunc` docstrings: explain the semantics slightly more.
- Refine `fn powi` docstrings: add disclaimer that the rounding behaviour might be different from `powf`.
- Refine `fn copysign` docstrings: add disclaimer about payloads of NaNs.
- Refine `minimum` and `maximum`: add disclaimer that "propagating NaN" doesn't mean that propagating the NaN bit patterns is guaranteed.
- Refine `max` and `min` docstrings: add "ignoring NaN" to bring the one-row explanation to parity with `minimum` and `maximum`.

Cosmetic changes:
- Reword `NaN` and `NAN` as plain "NaN", unless they refer to the specific `const NAN`.
- Reword "a number" to `self` in function docstrings to clarify.
- Remove "Returns NAN if the number is NAN" from `abs`, as this is told to be the default behavior in the top explanation.
workingjubilee pushed a commit to tcdi/postgrestd that referenced this issue Sep 15, 2022
Improve floating point documentation (same commit message as above)
@RalfJung (Member) commented Aug 4, 2023

I have written a Pre-RFC on our floating-point guarantees, which is almost exclusively about NaNs. That document describes what are currently the best possible guarantees we can provide, given LLVM's documentation. However, LLVM also seems to be open to providing stronger guarantees.

@the8472 (Member) commented Sep 5, 2023

and the last x86 processor without SSE2 left the plant about 20 years ago

To be pedantic, the Vortex86DX3 is still being made and only supports SSE, and they claim Linux support. Some poor soul out there may still be compiling x86-no-SSE2 code for Linux shipped on "new" hardware. That said, I'm not aware of any instances of this actually happening; I'm just raising the possibility.

Edit: #35045 (comment) mentioned in 2016 that he's using a Vortex86

@RalfJung (Member) commented Sep 5, 2023

I'm more concerned about someone using -C target-cpu=pentium on one of our tier 1 i686 targets and expecting that to work properly. Maybe we should just forbid disabling SSE2 support...

@RalfJung (Member)

The RFC rust-lang/rfcs#3514 makes a concrete proposal for our guarantees for the bits of NaNs.
