Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

32-bit Windows cannot handle Rfloat::na() #321

Closed
yutannihilation opened this issue Nov 10, 2021 · 16 comments · Fixed by #328
Closed

32-bit Windows cannot handle Rfloat::na() #321

yutannihilation opened this issue Nov 10, 2021 · 16 comments · Fixed by #328
Labels
arch-x86 x86-related issues bug Something isn't working os-Windows Windows-specific problems

Comments

@yutannihilation
Copy link
Contributor

c.f. #318 (comment)

failures:

---- src\scalar\rfloat.rs - scalar::rfloat::Rfloat (line 35) stdout ----
Test executable failed (exit code 101).

stderr:
thread 'main' panicked at 'assertion failed: (<Rfloat>::na()).is_na()', src\scalar\rfloat.rs:6:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


---- src\wrapper\doubles.rs - wrapper::doubles::Doubles::elt (line 25) stdout ----
Test executable failed (exit code 101).

stderr:
thread 'main' panicked at 'assertion failed: vec.elt(10).is_na()', src\wrapper\doubles.rs:9:4
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

For this reason, one ALTREP-related test is disabled. Let's not forget to re-enable this as well.

473ac05

@Ilia-Kosenkov
Copy link
Member

I will try to take a look at this.

@andy-thomason
Copy link
Contributor

andy-thomason commented Nov 11, 2021 via email

@Ilia-Kosenkov
Copy link
Member

So the issue is rather weird. Here is what I found:
To test for NA, we use a libR_sys constant libR_sys::R_NaReal, which is then compared to f64 bitwise (because it is a NaN, which is not comparable to any other NaN or itself).

So in extendr tests I retrieved Rfloat::na().0 and libR_sys::R_NaReal and printed the output as i64 using .to_bits().

  • x64
9218868437227407266  # libR_sys::R_NaReal
9218868437227407266  # Rfloat::na()
  • x86
9221120237041092514 # libR_sys::R_NaReal
9218868437227407266 # Rfloat::na()

To me it seems that somehow when compiling for i686, extendr-api references 64-bit bindings, yet the linked bindings are 32-bit. Hence, libR_sys values changes while Rfloat::na() does not.

@Ilia-Kosenkov Ilia-Kosenkov added arch-x86 x86-related issues bug Something isn't working os-Windows Windows-specific problems labels Nov 11, 2021
@Ilia-Kosenkov
Copy link
Member

x64

 & "$env:R_HOME\bin\x64\Rscript.exe" -e "rextendr::rust_eval('rprintln!(\`"{:?}\`", unsafe { libR_sys::R_NaReal.to_bits()});', dependencies = list(``libR-sys`` = '0.2.2'), extendr_deps = list(``extendr-api`` = list(git = 'https://github.com/extendr/extendr', branch = 'master')))"

9218868437227407266

x86

 & "$env:R_HOME\bin\i386\Rscript.exe" -e "rextendr::rust_eval('rprintln!(\`"{:?}\`", unsafe { libR_sys::R_NaReal.to_bits()});', dependencies = list(``libR-sys`` = '0.2.2'), extendr_deps = list(``extendr-api`` = list(git = 'https://github.com/extendr/extendr', branch = 'master')))"

9218868437227407266

Extremely strange.

@Ilia-Kosenkov
Copy link
Member

fn main() {
    extendr_engine::start_r();
    println!("{:?}", unsafe { libR_sys::R_NaReal.to_bits() });
    extendr_engine::end_r();
}

This thing works correctly for both architectures and yields 9218868437227407266.

@Ilia-Kosenkov
Copy link
Member

{cpp11} for both architectures correctly retrieves value of NA:

 writable::integers get_na() {
    auto na = R_NaReal;
    auto na_as_ulong = *((unsigned long long*)&na);
    writable::integers out(2);
    out[0] = (int)(na_as_ulong >> 32);
    out[1] = (int)(na_as_ulong);
    return out;
 }
> get_na()
[1] 2146435072       1954

NA is essentially 0x7ff00000 << 32 | 1954.

@Ilia-Kosenkov
Copy link
Member

Ilia-Kosenkov commented Nov 11, 2021

Soooooo, it seems there is some internal magic in Rust about NaN handling.
The issue seems to be resolved if #[repr(C)]...

And I cannot reproduce this behaviour under different conditions.
This turned out to be incorrect

@Ilia-Kosenkov
Copy link
Member

The difference between 32- and 64- bit NAs is in the high DWORD.
The low part (DWORD) is 1954u32 in both cases, but the high part differs:

  • x64: 0x7ff00000u64 << 32 | 1954 = 9218868437227407266
  • x86: 0x7ff80000u64 << 32 | 1954 = 9221120237041092514
    I have absolutely no idea why this happens. The difference is one bit (0 vs 8 in hex notation).

UPD: I think I found an explanation but not yet the solution...

@andy-thomason
Copy link
Contributor

You may have found a rust bug! Ross Ihaka's birthday breaks the build.

https://en.wikipedia.org/wiki/Ross_Ihaka

@andy-thomason
Copy link
Contributor

Could you formulate this into a 3 line example without dependencies so that we can submit a bug report?

@Ilia-Kosenkov
Copy link
Member

Ilia-Kosenkov commented Nov 23, 2021

@andy-thomason, this is not a bug. Check out the explanation I linked above. In simple terms, when we compile for x86 we observe FPU/CPU being 'smart' and setting NAN silent bit when reinterpreting u64 to f64 and back again. This 'feature' was found by the devs working on wasm compiler wasmi experimental interpreter -- they got the same problem of roundtripping i32 to f32 and back.
One of the devs claims that this allows to convert floats to integers and back without acquiring an extra silent NaN bit.

So far it seems that it happens during compilation of extendr-api, when we access NA literal from libR-sys. I used .to_bits() to obtain the representation and it adds the 'silent' NaN bit.

I tried creating a tiny Rust library referencing libR-sys and extendr-engine, and accessing NA constant after compiling for x86_64 and i686 -- in both cases I got correct values.

Anyway, I will dig deeper when I have time, but at least now I have some understanding of what is going on (I was afraid we are having some sort of memory corruption or non-atomic access to 64-bit value when running 32-bit).

See also Wiki

@Ilia-Kosenkov
Copy link
Member

@andy-thomason, you were right -- there is a four-line reproducible example, unbelievable.
I need to point out that to compile this we use MSYS2, so it may be related to the libraries/compilers provided by mingw32. Same seems to happen if I use mingw32 toolchain from rtools40.

fn main() {
    let expected_na_val = 0x7ff00000u64 << 32 | 1954;
    assert_eq!(expected_na_val, f64::from_bits(expected_na_val).to_bits());
}
cargo run --target i686-pc-windows-gnu
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `C:\Users\[redacted]\AppData\Local\Temp\cargo_temp\i686-pc-windows-gnu\debug\na_test.exe`
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `9218868437227407266`,
 right: `9221120237041092514`', src\main.rs:3:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: process didn't exit successfully: `C:\Users\[redacted]\AppData\Local\Temp\cargo_temp\i686-pc-windows-gnu\debug\na_test.exe` (exit code: 101)

UPD: I also tested using different toolchains and no cross-compilation, same result

Toolchain Target
stable-i686-pc-windows-gnu i686-pc-windows-gnu
stable-x86_64-pc-windows-gnu i686-pc-windows-gnu
nightly-x86_64-pc-windows-msvc i686-pc-windows-gnu
nightly-x86_64-pc-windows-msvc i686-pc-windows-msvc
stable-i686-pc-windows-msvc i686-pc-windows-msvc

@Ilia-Kosenkov
Copy link
Member

rust-lang/rust#73328

@andy-thomason
Copy link
Contributor

Or

rust-lang/rust#73288

Which seems to exactly this thing!

As I suspect, there is an 8087 register somewhere in the mix.

If CRAN drop x86 support, this should go away with luck.

@Ilia-Kosenkov
Copy link
Member

Yeah, it definitely will go away, as well as a ton of other issues, as we will likely no longer cross-compile. Yet for now, I wonder if I can make it work correctly.

@Ilia-Kosenkov
Copy link
Member

So my survey of the topic revealed the following:
According to Wikipedia,

signalling NaN is represented as

0 11111111111 0000000000000000000000000000000000000000000000000001 ≙ 7FF0 0000 0000 0001 ≙ NaN (sNaN on most processors, such as x86 and ARM)

quiet NaN is represented as

0 11111111111 1000000000000000000000000000000000000000000000000001 ≙ 7FF8 0000 0000 0001 ≙ NaN (qNaN on most processors, such as x86 and ARM)

The payload (mantissa) should be non-zero, so in R it is the birth year of the author (1954).
What happens is R returns a signalling NaN (highest mantissa bit set to 0), but our bit manipulations turn it into quiet NaN (highest bit set to 1).
Because NA - NaN is 'unique (uses special payload of 1954), we can easily distinguish between NA and NaN in Rust, but I am concerned about the returning value -- we need to make sure we return bitwise-correct NA to R.

As soon as we finish with TryFrom, we need to update extendrtests and verify various conversions, including generating and processing NA_real_ in Rust and returning it to R to verify its bit representation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arch-x86 x86-related issues bug Something isn't working os-Windows Windows-specific problems
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants