Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

convert(Int16, ::Float64) does not throw inexact exception when it should #14549

Closed
yuyichao opened this issue Jan 3, 2016 · 24 comments
Closed
Labels
system:arm ARMv7 and AArch64 system:powerpc PowerPC

Comments

@yuyichao
Copy link
Contributor

yuyichao commented Jan 3, 2016

julia> convert(Int16, 88776.0)
23240

LLVM IR (wrapped in another function)

define i16 @julia_i16_convert_22530(double) #0 {
top:
  %1 = fptosi double %0 to i16
  %2 = sitofp i16 %1 to double
  %3 = fcmp oeq double %2, %0
  br i1 %3, label %pass, label %fail

fail:                                             ; preds = %top
  %4 = load %jl_value_t*, %jl_value_t** @jl_inexact_exception, align 4
  call void @jl_throw(%jl_value_t* %4)
  unreachable

pass:                                             ; preds = %top
  ret i16 %1
}

ASM

Dump of assembler code for function julia_i16_convert_22473:
   0xcce3b000 <+0>:     push    {r11, lr}
   0xcce3b004 <+4>:     mov     r11, sp
   0xcce3b008 <+8>:     vcvt.s32.f64    s2, d0
   0xcce3b00c <+12>:    vmov    r0, s2
   0xcce3b010 <+16>:    vcvt.f64.s32    d16, s2
   0xcce3b014 <+20>:    vcmpe.f64       d16, d0
   0xcce3b018 <+24>:    vmrs    APSR_nzcv, fpscr
   0xcce3b01c <+28>:    popeq   {r11, pc}
   0xcce3b020 <+32>:    movw    r0, #2920       ; 0xb68
   0xcce3b024 <+36>:    movt    r0, #63345      ; 0xf771
   0xcce3b028 <+40>:    ldr     r0, [r0]
   0xcce3b02c <+44>:    bl      0xcce3b050
End of assembler dump.

(interestingly this doesn't happen on AArch64)

@yuyichao yuyichao added the system:arm ARMv7 and AArch64 label Jan 3, 2016
@yuyichao
Copy link
Contributor Author

yuyichao commented Jan 3, 2016

ASM is from gcc disassembly due to #14550
LLVM 3.7.0. Hardware is a Cortex-A57 with a chroot ArchLinuxARM armv7. JULIA_CPU_TARGET is cortex-a7.

@Keno
Copy link
Member

Keno commented Jan 3, 2016

Isn't this the same issue as #10124? fptosi is undefined for out-of-range floating point values.

@yuyichao
Copy link
Contributor Author

yuyichao commented Jan 3, 2016

Yeah, seems like it #10124 (comment). I wasn't really sure what exactly are the undefined behaviors.

The PR that closes that issue doesn't seem to cover this one though.

@Keno
Copy link
Member

Keno commented Jan 3, 2016

Yeah, we need to audit all uses of unsafe_trunc, not just the one in ==.

@simonbyrne
Copy link
Contributor

Ah, that is interesting. It seems that undefined behaviour actually allows values outside of the range of the destination type (which in hindsight makes sense, since Int16 isn't a native type).

I guess this means we really need to do the checks before calling unsafe_trunc?

@simonbyrne
Copy link
Contributor

So the problem is that adding range checks pre-fptosi adds overhead (the range boundaries need to be loaded from memory to the registers, then 2 cmps and an and) for what is actually a common operation (the nature of being a dynamic technical language is that code will frequently convert between numeric types).

Ideally this would be handled at the LLVM level: either define a checked version of fptosi, or make the undefined behaviour a little less undefined (i.e. that it returns an arbitrary value of the particular type).

c.f. related rust issue: rust-lang/rust#10184

@simonbyrne
Copy link
Contributor

It's a bit of a hack, but what if we were to define:

function convert(::Type{Int16},x::Float64)
    u = unsafe_trunc(Int32,x) % Int16
    convert(Float64,u) == x || throw(InexactError())
    u
end

It seems to give the same (valid) instructions on x86, does it fix the issue on ARM?

We're still technically playing with undefined behaviour here, so it would be good to get this clarified upstream.

@yuyichao
Copy link
Contributor Author

It does seem to fix the issue on ARM and is still working on all platforms I have tested. There seems to be a ~20% performance regression on both x64 and aarch64 though.

@simonbyrne
Copy link
Contributor

Any idea why it's slower? The code_llvm and code_native output looks basically the same.

@yuyichao
Copy link
Contributor Author

You are right. I was hitting johnmyleswhite/Benchmarks.jl#36 and didn't look at allocation count. There's also a small difference due to inlining (the julia version is harder to inline, which shouldn't be an issue if we simply fix the intrinsics). There's no measurable performance difference once these two issues are fixed.

There's a small difference on x64 with (patched) llvm 3.7.1 in the branch instructions and where the error branch is and IMHO the new version generates slightly better code since the error branch is at the end of the function.

@simonbyrne
Copy link
Contributor

Ah good. I was thinking we could just implement these in Julia, since there's no real reason they need to be intrinsics. What if I just wrap them in an @inline?

@yuyichao
Copy link
Contributor Author

What if I just wrap them in an @inline?

It will still make the function that calls them harder to inline. Not sure how big an effect it is though.

@simonbyrne
Copy link
Contributor

Does convert(Int8,::Float64) have the same issue?

@yuyichao
Copy link
Contributor Author

No because it's defined as an intrinsic?

nvm, I see you are refering to the original issue. .........................

@simonbyrne
Copy link
Contributor

I meant, the same issue as above, i.e. does convert(Int8, 88776.0) throw an error?

@yuyichao
Copy link
Contributor Author

Yeah realized that and edited my comment above...
Yes, it seems that most of the conversion has this issue.

julia> convert(Int8, 200000.0)
64

julia> convert(Int8, 200000f0)
64

julia> convert(UInt8, 200000.0)
0x40

julia> convert(UInt8, 200000f0)
0x40

julia> convert(Int16, 200000.0)
3392

julia> convert(Int16, 200000f0)
3392

julia> convert(UInt16, 200000.0)
0x0d40

julia> convert(UInt16, 200000f0)
0x0d40

(U)Int32 and up seems fine (on arm at least not sure about UB).

@simonbyrne
Copy link
Contributor

I posted a note on llvm-dev here:
http://lists.llvm.org/pipermail/llvm-dev/2016-January/094405.html

@simonbyrne
Copy link
Contributor

I don't know much about Swift, but it seems that they manually check every conversion:
https://github.com/apple/swift/blob/bf969a385f06e9731e4642ace08f42efdf4d6dd8/stdlib/public/core/FloatingPoint.swift.gyb#L657-L690

@simonbyrne
Copy link
Contributor

Ah, but it seems that preconditions are removed on -Ounchecked builds.

@tkelman
Copy link
Contributor

tkelman commented Jan 27, 2016

I had seen somewhere that different levels of -O flags make a surprisingly large difference to swift's performance, this kind of thing is probably part of the reason.

@simonbyrne
Copy link
Contributor

Since we still haven't fixed this, here is a recap:

The problem here is that if the input is out of range in Float -> Integer conversion the behaviour is undefined, and LLVM considers returning a 32bit integer instead of a 16bit one acceptable undefined behaviour, which breaks our assumptions in the convert logic (which converts to integer, then back to float and compares this with the original result).

Unless we can convince LLVM to change this, the options here are:

  1. change all float -> integer conversions to include a manual bounds check (which according to LLVM is the correct way). This seems to be about 50% slower, which is a shame as this is not an uncommon operation.
  2. Just fix the breaking cases on ARM only, either by using a manual check, or doing convert(Int16,convert(Int32, x))

@toivoh
Copy link
Contributor

toivoh commented Jun 28, 2016

If we know that we are going to get a 32 bit result, can we just mask it down to 16 bits before converting back to float? That should be pretty cheap.

@simonbyrne
Copy link
Contributor

Actually, I must have been doing something wrong: now it seems to be about the same (or possibly even faster) to do the range check. So maybe we should change this.

@yuyichao
Copy link
Contributor Author

Seems that LLVM implements more optimizations on aarch64 and now (LLVM-svn) this is a problem there too.

simonbyrne added a commit that referenced this issue Sep 30, 2016
Replaces checked_fptosi/checked_fptoui intrinsics with Julia implementations. Fixes #14549. Explain logic behind float->integer conversion checking
tkelman pushed a commit that referenced this issue Feb 22, 2017
Replaces checked_fptosi/checked_fptoui intrinsics with Julia implementations. Fixes #14549. Explain logic behind float->integer conversion checking

(cherry picked from commit f935a50)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
system:arm ARMv7 and AArch64 system:powerpc PowerPC
Projects
None yet
Development

No branches or pull requests

6 participants