Support unsafe float-to-int casts #1264

RalfJung · 2020-03-26T15:02:46Z

With rust-lang/rust#66841, we have some unsafe intrinsics to float-to-int casts. We should implement those in Miri, with appropriate UB checking of course. So we need to figure out how to check, with apfloat, if the cast would go wrong.

We also should check that the float-to-int "as" casts in Miri match the planned safe behavior in Rust. I am not sure what those plans currently are, probably whatever -Z saturating-float-casts does. @eddyb the from_i128/from_u128 methods in apfloat, are they saturating? (rust-lang/rust#70437, #1265)

The text was updated successfully, but these errors were encountered:

RalfJung · 2020-03-26T15:12:45Z

The docs for the apfloat cast method says

In case of an invalid operation exception, deterministic values are returned, namely zero for NaNs and the minimal or maximal value respectively for underflow or overflow.

So, it seems like our "as" implementation matches -Z saturating-float-casts?

eddyb · 2020-03-26T16:00:58Z

@hanna-kruppe is the expert here, but that sounds right to me.

hanna-kruppe · 2020-03-26T16:24:19Z

Yes, the behavior in that quote matches the runtime behavior with -Z saturating-float-casts, assuming underflow means "negative value outside the range of destination type" (like overflow, just with opposite sign) rather than the condition called underflow in IEEE 754 (magnitude too close to zero, regardless of sign).

RalfJung · 2020-03-26T16:28:13Z

assuming underflow means "negative value outside the range of destination type" (like overflow, just with opposite sign) rather than the condition called underflow in IEEE 754 (magnitude too close to zero, regardless of sign)

Hm... sounds like a reasonable interpretation, but it's a bit worrying that terminology doesn't match up.

@hanna-kruppe

Miri float->int casts: be explicit that this is saturating r? @hanna-kruppe Cc rust-lang/miri#1264

RalfJung · 2020-03-28T14:36:13Z

Okay, with #1265 it seems like our cast story is correct (except for the NaN issue tracked at rust-lang/rust#69532). I confirmed locally that this matches behavior of rustc with -Zmir-opt-level=0 -Zsaturating-float-casts.

What remains is the unsafe intrinsic, where the hard part is the UB checking. How do we detect if casting the float to the target int type overflowed or not?
We could rather crudely compare the result to the max/min value of the target type, but I assume that's wrong -- we could hit that value regularly, not as result of saturation.

The safety docs for these methods say the input must "be representable in the return type Int, after truncating off its fractional part". Doesn't that mean that if u64::MAX as f64 rounds up, the result may not be used with these methods? ~~(It seems to round down, but that might not be the case for all possible casts.)~~
Cc @SimonSapin

We could also compare the input with u64::MAX as f64 (if we are currently doing f64 -> u64 casting), and reject it if it is bigger. But I assume that is incorrect, because u64::MAX as f64 is 18446744073709551615, and 18446744073709551616 is still smaller than u64::MAX and thus a legal input. OTOH, that number is not representable I guess, otherwise the cast would not have rounded down quite so far... but for smaller types this certainly doesn't work; 255.5f64 as u8 is legal according to what the docs say, but it is bigger than u8::MAX as f64.

hanna-kruppe · 2020-03-28T15:27:58Z

I believe the simplest way (for miri) is:

Perform the "drop the fractional bits" part first. This should be rustc_apfloat's round_to_integral with rounding to zero.
Cast the truncated float to u128 and compare that against the target type's range. You'll still need to check that the float value wasn't out of range for u128, but since the fractional bits were already dropped in the first step, the is_exact output parameter will tell you if the "integer part" fit into u128.

The conversion to u128 has no equivalent for signed types but waves hand splitting the float into (sign, absolute value) should solve that. Should not be a conceptual challenge, this kind of trick just often leads to extra bookkeeping that can be annoying to spell out (which is why I hand-wave it here).

hanna-kruppe · 2020-03-28T15:33:08Z

Cast the truncated float to u128 and compare that against the target type's range. You'll still need to check that the float value wasn't out of range for u128, but since the fractional bits were already dropped in the first step, the is_exact output parameter will tell you if the "integer part" fit into u128.

Actually, just casting to the right bit width to begin with should work just fine (for unsigned types; again signed types need extra care). The cast needs to perform the integer range check anyway, and for reasons described above is_exact should tell you precisely if it was out of range.

RalfJung · 2020-03-28T15:37:17Z

For signed casts, why would to_i128 not work just as well as to_u128 works for unsigned?

For is_exact, there actually seem to be two ways to determine this? There's the &mut bool: is_exact in the _r variants of the cast methods, but the returned Status also has an INEXACT flag.

RalfJung · 2020-03-28T15:40:56Z

Cast the truncated float to u128

So there will never be a problem where truncation to an integer leads to more rounding loss, or so? The truncation operation is always exactly representable?

hanna-kruppe · 2020-03-28T15:54:20Z

For signed casts, why would to_i128 not work just as well as to_u128 works for unsigned?

Oops, I did not see those on a glance because they're provided methods. Yes, either to_i128 or to_i128_r should work depending on whether the is_exact / INEXACT distinction matters.

For is_exact, there actually seem to be two ways to determine this? There's the &mut bool: is_exact in the _r variants of the cast methods, but the returned Status also has an INEXACT flag.

According to the code, the only difference should be that casting -0.0 to an integer returns Status::OK but sets is_exact = false. Is negative zero UB with this intrinsic or does it map to integer zero?

So there will never be a problem where truncation to an integer leads to more rounding loss, or so? The truncation operation is always exactly representable?

Right, this operation is exact. Conceptually, it just clears any bits of the significant that have "weight" smaller than 1. It's not quite so simple on the level of bit strings (because of the implicit leading 1 in normal numbers, the exponent sometimes has to change), but still you'll never get in the situation where you have to round off non-zero digits.

eddyb · 2020-03-28T21:36:47Z

Looking at the implementation, I'd say INEXACT is for losing fraction bits and is actually safe to ignore, it's rather any other flag (i.e. INVALID_OP) that might be considered UB:
https://github.com/rust-lang/rust/blob/c52cee172fcd2e223100d8bdd5e105dc37aaca23/src/librustc_apfloat/ieee.rs#L1239-L1241
https://github.com/rust-lang/rust/blob/c52cee172fcd2e223100d8bdd5e105dc37aaca23/src/librustc_apfloat/ieee.rs#L1288-L1291
https://github.com/rust-lang/rust/blob/c52cee172fcd2e223100d8bdd5e105dc37aaca23/src/librustc_apfloat/ieee.rs#L1295-L1298

(argh GitHub, this is what we get for using separate repos, I guess 😠)

SimonSapin · 2020-03-29T06:04:37Z

The safety docs for these methods say the input must "be representable in the return type Int, after truncating off its fractional part". Doesn't that mean that if u64::MAX as f64 rounds up, the result may not be used with these methods?

As far as I can tell, that’s correct.

(It seems to round down, but that might not be the case for all possible casts.)

Uh, it seems to round up. The output for me is:

18446744073709551615
18446744073709552000

It’s more obvious in hexadecimal, and when also finding the previous representable f64:

fn main() {
    println!("{:20x}", u64::MAX);
    println!("{:20x}", u64::MAX as f64 as u128);
    println!(
        "{:20x}",
        (0..u64::MAX)
            .rev()
            .find(|&n| n as f64 != u64::MAX as f64)
            .unwrap()
    );
}

(Playground)

Output:

    ffffffffffffffff
   10000000000000000
    fffffffffffffbff

https://doc.rust-lang.org/nightly/reference/expressions/operator-expr.html#type-cast-expressions documents:

Casting from an integer to float will produce the closest possible float *

if necessary, rounding is according to roundTiesToEven mode ***

*** as defined in IEEE 754-2008 §4.3.1: pick the nearest floating point number, preferring the one with an even least significant digit if exactly halfway between two floating point numbers.

Rounding up causes an error of 1, but rounding down would cause an error of 1024 so the former is closest.

SimonSapin · 2020-03-29T06:31:36Z

As to the behavior of {f32,f64}::to_int_unchecked, it’s basically defined as same as LLVM’s fptoui and fptosi instructions. Their docs include “if the value fits in [the target type]” but unfortunately are not more precise than that. However these instructions are presumably designed for clang to implement C and C++ casts, and we find:

C11 standard https://port70.net/~nsz/c/c11/n1570.html#6.3.1.4

When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.

C++17 standard http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf#section.7.10

A prvalue of a floating-point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.

It looks like truncation towards zero matches the existing f32::trunc and f64::trunc methods, so that part is easy enough.

Correctly implementing a check that a (truncated) float value can be represented can be tricky however, especially for large integer types. I’d recommend looking at the implementation of -Z saturating-float-casts:

https://github.com/rust-lang/rust/blob/150322f86d441752874a8bed603d71119f190b8b/src/librustc_codegen_ssa/mir/rvalue.rs#L775-L892

RalfJung · 2020-03-29T08:16:01Z

@eddyb

Looking at the implementation, I'd say INEXACT is for losing fraction bits and is actually safe to ignore, it's rather any other flag (i.e. INVALID_OP) that might be considered UB

Oh so by looking at these flags I can actually skip the first truncation step? Or, maybe playing it safer, I could truncate and then make sure there are no flags at all?

However, when we are casting to u128, how can this line possibly detect overflow?
https://github.com/rust-lang/rust/blob/c52cee172fcd2e223100d8bdd5e105dc37aaca23/src/librustc_apfloat/ieee.rs#L1296
r > overflow is just always false when overflow is u128::MAX.

@SimonSapin

Uh, it seems to round up.

Ouch, clearly I cannot read. Thanks for correcting me.
So, (u64::MAX as f64).approx_unchecked_to::<u64>() is UB. Interesting.

presumably

Yeah, that's the usual problem with the LLVM docs, you have to do a lot of guesswork if you want to know the UB conditions. :/ That's why I went for your Rust-side docs instead. Thanks for giving the sources for those!

Correctly implementing a check that a (truncated) float value can be represented can be tricky however, especially for large integer types.

That's what I was worried about, but apfloat's to_[iu]128_r methods have an is_exact output parameter that should tell me if any further rounding happened when turning the truncated part into an integer of appropriate size.

Open questions:

Is casting NaN UB? I would assume so.
Is casting negative 0 UB? From your C/C++ quotes, I would assume not, but apfloat sets is_exact to false for negative 0, so we have to be careful here.
Are there any other "magic" float values that I missed? The infinities quite clearly are too big/small for the target type, and hence UB, I would argue.

I’d recommend looking at the implementation of -Z saturating-float-casts:

I don't think that actually helps for UB detection: that code saturates when the float exceeds Int::MAX as Float. For example, the saturation branch is entered when doing 255.5f64 as u8, as 255.5 is bigger than 255. And the result for that is correct (255u8), but this is not a sufficient condition for have UB.
Am I missing something?

eddyb · 2020-03-29T09:15:53Z

@RalfJung That part is handled by https://github.com/rust-lang/rust/blob/c52cee172fcd2e223100d8bdd5e105dc37aaca23/src/librustc_apfloat/ieee.rs#L1264-L1267.

hanna-kruppe · 2020-03-29T09:35:30Z

@eddyb

Looking at the implementation, I'd say INEXACT is for losing fraction bits and is actually safe to ignore, it's rather any other flag (i.e. INVALID_OP) that might be considered UB:

After looking over it for a few minutes, this claim seems plausible but I'm not 100% sure about it. To me, removing fractional bits first seems easier to have confidence in, and performance is not really a concern here. There's also the open question of -0.0.

@RalfJung

I’d recommend looking at the implementation of -Z saturating-float-casts:

I don't think that actually helps for UB detection: that code saturates when the float exceeds Int::MAX as Float. For example, the saturation branch is entered when doing 255.5f64 as u8, as 255.5 is bigger than 255. And the result for that is correct (255u8), but this is not a sufficient condition for have UB.
Am I missing something?

No, you're right. The [f_min, f_max] range computed there is an under-approximation of the range where fpto[su]i is defined. The precise range where the unchecked conversion is well-defined could be computed similarly with more effort, but the whole approach is rather fiddly and only chosen to minimize performance overhead by avoiding soft-float computations at runtime. For miri there's no such constraint (all FP math is soft-float anyway), so explicitly computing the safe range is unnecessary in that context.

SimonSapin · 2020-03-29T19:18:05Z

Sorry I don’t have a precise source for each of these but my understanding/recollection is:

Is casting NaN UB? I would assume so.

Yes.

Is casting negative 0 UB?

No. The integral part of negative zero can be represented as integer zero.

Are there any other "magic" float values that I missed?

No.

The infinities quite clearly are too big/small for the target type, and hence UB, I would argue.

Right, in an algorithm based on comparing they’ll behave the same as out-of-range finite values.

For example, the saturation branch is entered when doing 255.5f64 as u8,

Oh you’re right, sorry. Scratch that recommendation :)

RalfJung · 2020-04-10T22:24:37Z

There are a bunch of testcases for this at https://github.com/WebAssembly/testsuite/blob/master/conversions.wast (WebAssembly semantics seems to match ours).

RalfJung · 2020-04-11T07:39:47Z

Hm, some of these use hexadecimal floating-point notation, such as 0x1p-149, which Rust cannot parse it seems. :/

RalfJung · 2020-04-11T07:50:08Z

Ah, I can convert them using godbolt ;) (and here's the f64 variant

hanna-kruppe · 2020-04-11T08:50:49Z

You could also use https://crates.io/crates/hexf

eddyb · 2020-04-11T13:35:44Z

Hm, some of these use hexadecimal floating-point notation, such as 0x1p-149, which Rust cannot parse it seems. :/

rustc_apfloat has hexfloat support, let me know if it doesn't work as expected.

RalfJung · 2020-04-11T13:40:37Z

Well that doesn't help, I need to input them in the .rs file. And I can't really add external dependencies there, so for now I just converted them to bit patterns externally and then used from_bits. See #1321.

eddyb · 2020-04-11T13:52:16Z

Ah you're not writing an unit test, my bad.

RalfJung · 2020-04-12T10:06:16Z

I implemented the unchecked intrinsic in #1325. Reviews would be welcome, in particular for the testcases. :)

RalfJung added A-shims Area: This affects the external function shims C-enhancement Category: a PR with an enhancement or an issue tracking an accepted enhancement labels Mar 26, 2020

RalfJung mentioned this issue Mar 26, 2020

Miri float->int casts: be explicit that this is saturating rust-lang/rust#70437

Merged

RalfJung changed the title ~~Support/adjust float-to-int casts~~ Support unsafe float-to-int casts Mar 26, 2020

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Mar 26, 2020

Rollup merge of rust-lang#70437 - RalfJung:miri-saturate, r=hanna-kruppe

195147c

Miri float->int casts: be explicit that this is saturating r? @hanna-kruppe Cc rust-lang/miri#1264

RalfJung added the E-medium label Mar 28, 2020

RalfJung mentioned this issue Apr 12, 2020

implement float_to_int_unchecked #1325

Merged

bors closed this as completed in 45113eb Apr 18, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support unsafe float-to-int casts #1264

Support unsafe float-to-int casts #1264

RalfJung commented Mar 26, 2020 •

edited

RalfJung commented Mar 26, 2020

eddyb commented Mar 26, 2020

hanna-kruppe commented Mar 26, 2020 •

edited

RalfJung commented Mar 26, 2020

RalfJung commented Mar 28, 2020 •

edited

hanna-kruppe commented Mar 28, 2020 •

edited

hanna-kruppe commented Mar 28, 2020

RalfJung commented Mar 28, 2020

RalfJung commented Mar 28, 2020

hanna-kruppe commented Mar 28, 2020 •

edited

eddyb commented Mar 28, 2020

SimonSapin commented Mar 29, 2020

SimonSapin commented Mar 29, 2020

RalfJung commented Mar 29, 2020 •

edited

eddyb commented Mar 29, 2020

hanna-kruppe commented Mar 29, 2020 •

edited

SimonSapin commented Mar 29, 2020

RalfJung commented Apr 10, 2020

RalfJung commented Apr 11, 2020

RalfJung commented Apr 11, 2020 •

edited

hanna-kruppe commented Apr 11, 2020

eddyb commented Apr 11, 2020

RalfJung commented Apr 11, 2020

eddyb commented Apr 11, 2020

RalfJung commented Apr 12, 2020

Support unsafe float-to-int casts #1264

Support unsafe float-to-int casts #1264

Comments

RalfJung commented Mar 26, 2020 • edited

RalfJung commented Mar 26, 2020

eddyb commented Mar 26, 2020

hanna-kruppe commented Mar 26, 2020 • edited

RalfJung commented Mar 26, 2020

RalfJung commented Mar 28, 2020 • edited

hanna-kruppe commented Mar 28, 2020 • edited

hanna-kruppe commented Mar 28, 2020

RalfJung commented Mar 28, 2020

RalfJung commented Mar 28, 2020

hanna-kruppe commented Mar 28, 2020 • edited

eddyb commented Mar 28, 2020

SimonSapin commented Mar 29, 2020

SimonSapin commented Mar 29, 2020

RalfJung commented Mar 29, 2020 • edited

eddyb commented Mar 29, 2020

hanna-kruppe commented Mar 29, 2020 • edited

SimonSapin commented Mar 29, 2020

RalfJung commented Apr 10, 2020

RalfJung commented Apr 11, 2020

RalfJung commented Apr 11, 2020 • edited

hanna-kruppe commented Apr 11, 2020

eddyb commented Apr 11, 2020

RalfJung commented Apr 11, 2020

eddyb commented Apr 11, 2020

RalfJung commented Apr 12, 2020

RalfJung commented Mar 26, 2020 •

edited

hanna-kruppe commented Mar 26, 2020 •

edited

RalfJung commented Mar 28, 2020 •

edited

hanna-kruppe commented Mar 28, 2020 •

edited

hanna-kruppe commented Mar 28, 2020 •

edited

RalfJung commented Mar 29, 2020 •

edited

hanna-kruppe commented Mar 29, 2020 •

edited

RalfJung commented Apr 11, 2020 •

edited