Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove setrounding from functions implementation #442

Closed
lucaferranti opened this issue Mar 6, 2021 · 32 comments · May be fixed by #443
Closed

remove setrounding from functions implementation #442

lucaferranti opened this issue Mar 6, 2021 · 32 comments · May be fixed by #443

Comments

@lucaferranti
Copy link
Member

Some functions do not work on 1.6 on windows. Particularly, I found the functions abs, mig, fma, sqrt . The core problem seems that they use setrounding. abs calls mig and mig and fma use directly setrounding (source code here ). sqrt I am not sure, this code does use set rounding, but the other functions defined there (inv, tanh etc.) do not fail. Here's the full stacktrace for reference

julia> sqrt(1..2)
ERROR: could not load symbol "fesetround":
The specified procedure could not be found.
Stacktrace:
 [1] setrounding_raw
   @ ~\.julia\packages\SetRounding\MjtxK\src\SetRounding.jl:45 [inlined]
 [2] setrounding(#unused#::Type{Float64}, r::RoundingMode{:Down})
   @ SetRounding ~\.julia\packages\SetRounding\MjtxK\src\SetRounding.jl:47
 [3] setrounding(f::IntervalArithmetic.var"#45#46"{Float64}, #unused#::Type{Float64}, rounding::RoundingMode{:Down})
   @ Base.Rounding .\rounding.jl:174
 [4] sqrt
   @ ~\.julia\packages\IntervalArithmetic\odPjL\src\intervals\rounding.jl:220 [inlined]
 [5] sqrt
   @ ~\.julia\packages\IntervalArithmetic\odPjL\src\intervals\rounding.jl:232 [inlined]
 [6] sqrt
   @ ~\.julia\packages\IntervalArithmetic\odPjL\src\intervals\rounding.jl:284 [inlined]
 [7] sqrt(a::Interval{Float64})
   @ IntervalArithmetic ~\.julia\packages\IntervalArithmetic\odPjL\src\intervals\functions.jl:233
 [8] top-level scope
   @ REPL[29]:1

julia> mig(1..2)
ERROR: could not load symbol "fesetround":
The specified procedure could not be found.
Stacktrace:
 [1] setrounding_raw
   @ ~\.julia\packages\SetRounding\MjtxK\src\SetRounding.jl:45 [inlined]
 [2] setrounding(#unused#::Type{Float64}, r::RoundingMode{:Down})
   @ SetRounding ~\.julia\packages\SetRounding\MjtxK\src\SetRounding.jl:47
 [3] setrounding(f::IntervalArithmetic.var"#127#128"{Interval{Float64}}, #unused#::Type{Float64}, rounding::RoundingMode{:Down})
   @ Base.Rounding .\rounding.jl:174
 [4] mig(a::Interval{Float64})
   @ IntervalArithmetic ~\.julia\packages\IntervalArithmetic\odPjL\src\intervals\arithmetic.jl:297
 [5] top-level scope
   @ REPL[30]:1

julia> fma(1..2, 1..2, 1..2)
ERROR: could not load symbol "fesetround":
The specified procedure could not be found.
Stacktrace:
 [1] setrounding_raw
   @ ~\.julia\packages\SetRounding\MjtxK\src\SetRounding.jl:45 [inlined]
 [2] setrounding(#unused#::Type{Float64}, r::RoundingMode{:Down})
   @ SetRounding ~\.julia\packages\SetRounding\MjtxK\src\SetRounding.jl:47
 [3] setrounding(f::IntervalArithmetic.var"#123#125"{Interval{Float64}, Interval{Float64}, Interval{Float64}}, #unused#::Type{Float64}, rounding::RoundingMode{:Down})
   @ Base.Rounding .\rounding.jl:174
 [4] fma(a::Interval{Float64}, b::Interval{Float64}, c::Interval{Float64})
   @ IntervalArithmetic ~\.julia\packages\IntervalArithmetic\odPjL\src\intervals\arithmetic.jl:264
 [5] top-level scope
   @ REPL[31]:1

julia> abs(1..2)
ERROR: could not load symbol "fesetround":
The specified procedure could not be found.
Stacktrace:
 [1] setrounding_raw
   @ ~\.julia\packages\SetRounding\MjtxK\src\SetRounding.jl:45 [inlined]
 [2] setrounding(#unused#::Type{Float64}, r::RoundingMode{:Down})
   @ SetRounding ~\.julia\packages\SetRounding\MjtxK\src\SetRounding.jl:47
 [3] setrounding(f::IntervalArithmetic.var"#127#128"{Interval{Float64}}, #unused#::Type{Float64}, rounding::RoundingMode{:Down})
   @ Base.Rounding .\rounding.jl:174
 [4] mig
   @ ~\.julia\packages\IntervalArithmetic\odPjL\src\intervals\arithmetic.jl:297 [inlined]
 [5] abs(a::Interval{Float64})
   @ IntervalArithmetic ~\.julia\packages\IntervalArithmetic\odPjL\src\intervals\arithmetic.jl:314
 [6] top-level scope
   @ REPL[32]:1

julia> tanh(1..2)
[0.761594, 0.964028]

julia> inv(1..2)
[0.5, 1]

@lucaferranti
Copy link
Member Author

lucaferranti commented Mar 7, 2021

At least for mig I don't see why setrounding is needed. The absolute value simply changes the sign bit, so no rounding error should be introduced?

julia> a = -sqrt(2)
-1.4142135623730951

julia> bitstring(a)
"1011111111110110101000001001111001100110011111110011101111001101"

julia> bitstring(abs(a))
"0011111111110110101000001001111001100110011111110011101111001101"

I think it could just be

function mig(a::Interval{T}) where T<:Real
    isempty(a) && return convert(eltype(a), NaN)
    zero(a.lo) ∈ a && return zero(a.lo)
    min( abs(a.lo), abs(a.hi) )
end

or is there something I am missing?

@lucaferranti
Copy link
Member Author

lucaferranti commented Mar 9, 2021

for fma I think we do need direct rounding, however I am not sure what would be the best way to approach it. As far as I can tell, none of the macros such as @rounding, @rounding_down would work, since they seem to accept only unary and binary functions. Also I think those macros are only for functions supported by CRlibm and fma isn't? Maybe carrying out the scalar fma computations with BigFloat? This seems to introduce some computational overhead though

@lucaferranti
Copy link
Member Author

Also, looking at the current implementation of fma, I don't quite get why min_exclude_nans is used. The only case in which we can have NaNs is if one of the input intervals is [NaN, NaN] , but in that case all scalar fma computations will return NaN and hence e.g. min_exclude_nans will throw an error. Here's a demonstration in julia 1.5

julia> a = Interval(NaN, NaN)
[NaN, NaN]

julia> b = 1.1..2.2
[1.09999, 2.20001]

julia> fma(a, b, b)
ERROR: MethodError: no method matching min()
Closest candidates are:
  min(::Missing, ::Missing) at missing.jl:124
  min(::Missing, ::Any) at missing.jl:125
  min(::BigFloat, ::BigFloat) at mpfr.jl:685
  ...
Stacktrace:
 [1] min_ignore_nans at C:\Users\lucaa\.julia\packages\IntervalArithmetic\odPjL\src\intervals\arithmetic.jl:239 [inlined]
 [2] #123 at C:\Users\lucaa\.julia\packages\IntervalArithmetic\odPjL\src\intervals\arithmetic.jl:269 [inlined]
 [3] setrounding(::IntervalArithmetic.var"#123#125"{Interval{Float64},Interval{Float64},Interval{Float64}}, ::Type{Float64}, ::RoundingMode{:Down}) at .\rounding.jl:176
 [4] fma(::Interval{Float64}, ::Interval{Float64}, ::Interval{Float64}) at C:\Users\lucaa\.julia\packages\IntervalArithmetic\odPjL\src\intervals\arithmetic.jl:264
 [5] top-level scope at REPL[35]:1

I think if any of the inputs is NaI, then also the final output should be NaI?

@dpsanders
Copy link
Member

@JeffreySarnoff: Do you have any thoughts about directed rounding for fma?

@JeffreySarnoff
Copy link
Contributor

I like it.

@JeffreySarnoff
Copy link
Contributor

I need to play with fma and see if/how there is helpful, guidable meshing with [one of] our approaches to rounding.
(tonight)

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Mar 10, 2021

step 0: we need Julia to intervene, restoring our ability to choose, to set, to get IEEE floating point rounding modes
(that is available only with BigFloats cannot continue forward .. so much of our numerical table settings
that provide places for work and relaxation, and services that other persons use to great advantage ..
is in part reliant upon effective, efficient, performant, functions of floating point environment direction and
at computation time, least significant bits' rounding control that do the same in the same way cross-platform.

so please alert those who may think this is has been resolved -- let them know what you know of things now in the way

You can obtain for fma(x, y, z) the result rounded to nearest with nearest = fma(x, y, z) as RoundNearest is the intialization setting (though we really need to have the capability to ensure it is RoundNearest or other Rounding modes).
With the ability to wrap setrounding(Type, ..) as a block covering a calculation is probably the fastest way [or one of two fastest]
to obtain the corresponding roundup = fma(x,y,z) and rounddown = fma(x,y,z).

Absent that, one may use

function interval_fma(a::T, b::T, c::T) where {T<:Base.IEEEFloat}
  hi, lo = two_fma(a, b, c)
  if signbit(lo)
    lo = prevfloat(hi)
  elseif !zero(lo)
    lo = hi
    hi = nextfloat(hi)
  else 
    lo = hi
  end
  return hi, lo
end

function two_fma(a::T, b::T, c::T) where {T<:Base.IEEEFloat}
    hi = fma(a, b, c)
    hi0, lo0 = two_prod(a, b)
    hi1, lo1 = two_sum(c, lo0)
    hi2, lo2 = two_sum(hi0, hi1)
    lo = ((hi2 - hi) + lo2) + lo1
    return hi, lo
end

"""
    two_sum(a, b)

Computes `hi = fl(a+b)` and `lo = err(a+b)`.
"""
@inline function two_sum(a::T, b::T) where {T<:Base.IEEEFloat}
    hi = a + b
    v  = hi - a
    lo = (a - (hi - v)) + (b - v)
    return hi, lo
end

"""
    two_prod(a, b)

Computes `hi = fl(a*b)` and `lo = fl(err(a*b))`.
"""
@inline function two_prod(a::T, b::T) where {T<:Base.IEEEFloat}
    hi = a * b
    lo = fma(a, b, -hi)
    hi, lo
end

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Mar 10, 2021

(which approach is more performant depends on how performant or otherwise setting and resetting rounding happens to be,
and specifics of the processor [can we run rounding control outside of the fpu and run the fmas in SIMD registers] )

@lucaferranti
Copy link
Member Author

lucaferranti commented Mar 10, 2021

The issue with fma, mig and sqrt seems to be the function setrounding_raw defined in SetRounding.jl, I think the problem is the same causing this issue. I don't know to what platform that issue refers, but based on the results of my latest PR here, it seems that the problem is from Julia 1.6 onwards and on windows only.

On the bright side, I tried the alternative proposed by JeffreySarnoff and it seems to work 🎉 . I also did a small benchmarking with 1.5, where the original version with setrounding works, and they seem to be equivalent, Jeffrey's version is maybe a little faster. In the following snippet, fma is the original one and fma1 is Jeffrey's variant.

 julia> a, b, c = 1.1..2.1, 1.2..2.2, 1.3..2.3
([1.09999, 2.10001], [1.19999, 2.20001], [1.29999, 2.30001])

julia> fma(a, b, c) === fma1(a, b, c)
true

julia> @btime fma($a, $b, $c)
  557.297 ns (20 allocations: 512 bytes)
[2.61999, 6.92001]

julia> @btime fma1($a, $b, $c)
  522.917 ns (20 allocations: 512 bytes)
[2.61999, 6.92001]

I don't have the necessary skills or knowledge to comment about pros and cons of using setrounding, but I follow the discussion with interest.

@lucaferranti
Copy link
Member Author

after the PR in SetRounding.jl this seems to be fixed now. I think fma shouldn't use min_ignore_nan (at least not in its current form) because 1) the current implementation returns an error if all its arguments are NaN 2) the current implementation seems to be type unstable on julia 1.6. However I guess this is out of the original scope of this issue.

@JeffreySarnoff
Copy link
Contributor

Where is the current implementation type unstable?

@lucaferranti
Copy link
Member Author

the function min_ignore_nans (sorry I misspelled this in all previous comments) seems to be type unstable

julia> @code_warntype IntervalArithmetic.min_ignore_nans(1.1, 2.2, 3.3)
Variables
  #self#::Core.Const(IntervalArithmetic.min_ignore_nans)
  args::Tuple{Float64, Float64, Float64}
  #119::IntervalArithmetic.var"#119#120"

Body::Any
1 ─ %1 = Base.getproperty(IntervalArithmetic.Iterators, :filter)::Core.Const(Base.Iterators.filter)
│        (#119 = %new(IntervalArithmetic.:(var"#119#120")))
│   %3 = #119::Core.Const(IntervalArithmetic.var"#119#120"())
│   %4 = (%1)(%3, args)::Base.Iterators.Filter{IntervalArithmetic.var"#119#120", Tuple{Float64, Float64, Float64}}
│   %5 = Core._apply_iterate(Base.iterate, IntervalArithmetic.min, %4)::Any
└──      return %5

while on v1.5

julia> @code_warntype IntervalArithmetic.min_ignore_nans(1.1, 2.2, 3.3)
Variables
  #self#::Core.Compiler.Const(IntervalArithmetic.min_ignore_nans, false)
  args::Tuple{Float64,Float64,Float64}
  #119::IntervalArithmetic.var"#119#120"

Body::Float64
1 ─ %1 = Base.getproperty(IntervalArithmetic.Iterators, :filter)::Core.Compiler.Const(Base.Iterators.filter, false)
│        (#119 = %new(IntervalArithmetic.:(var"#119#120")))
│   %3 = #119::Core.Compiler.Const(IntervalArithmetic.var"#119#120"(), false)
│   %4 = (%1)(%3, args)::Base.Iterators.Filter{IntervalArithmetic.var"#119#120",Tuple{Float64,Float64,Float64}}
│   %5 = Core._apply_iterate(Base.iterate, IntervalArithmetic.min, %4)::Float64
└──      return %5

I think the instability is caused by the splatting inside min because the following function is identical to the current implementation, except that it uses minimum instead of min and seems to be type stable

julia> f(args...) = minimum(Iterators.filter(!isnan, args))
f (generic function with 2 methods)

julia> @code_warntype f(1.1, 2.2, 3.3)
Variables
  #self#::Core.Const(f)
  args::Tuple{Float64, Float64, Float64}

Body::Float64
1 ─ %1 = Base.Iterators.filter::Core.Const(Base.Iterators.filter)
│   %2 = !Main.isnan::Core.Const(Base.var"#78#79"{typeof(isnan)}(isnan))
│   %3 = (%1)(%2, args)::Base.Iterators.Filter{Base.var"#78#79"{typeof(isnan)}, Tuple{Float64, Float64, Float64}}
│   %4 = Main.minimum(%3)::Float64
└──      return %4

@JeffreySarnoff
Copy link
Contributor

as a start, these more structured variants are type stable

function max_ignore_nans(args::Vector{T}) where {T<:AbstractFloat}
    max(filter(x->!isnan(x), args)...)
end

function max_ignore_nans22(args::Vararg{T}) where {T}
    max(filter(x->!isnan(x), args)...)
end

@lucaferranti
Copy link
Member Author

Great to have a fix for the type instability!
I still don't get why those functions are needed in the first place though. The only case in which one of the scalar fma can return a NaN is if one of more input intervals are [NaN, NaN], in which case all 4 scalars will be NaN and thus the min_ignore_nans will return an error, because trying to compute min with zero arguments.

julia> a = Interval(NaN, NaN)
[NaN, NaN]

julia> b = 1..2
[1, 2]

julia> fma(a, b, b)
ERROR: MethodError: no method matching min()
Closest candidates are:
  min(::Dates.AbstractTime) at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.6\Dates\src\types.jl:438
  min(::DecoratedInterval{T}, ::DecoratedInterval{T}) where T at C:\Users\lucaa\.julia\packages\IntervalArithmetic\odPjL\src\decorations\functions.jl:216
  min(::CartesianIndex{N}, ::CartesianIndex{N}) where N at multidimensional.jl:118
  ...
Stacktrace:
 [1] min_ignore_nans
   @ ~\.julia\packages\IntervalArithmetic\odPjL\src\intervals\arithmetic.jl:239 [inlined]
 [2] #123
   @ ~\.julia\packages\IntervalArithmetic\odPjL\src\intervals\arithmetic.jl:269 [inlined]
 [3] setrounding(f::IntervalArithmetic.var"#123#125"{Interval{Float64}, Interval{Float64}, Interval{Float64}}, #unused#::Type{Float64}, rounding::RoundingMode{:Down})
   @ Base.Rounding .\rounding.jl:176
 [4] fma(a::Interval{Float64}, b::Interval{Float64}, c::Interval{Float64})
   @ IntervalArithmetic ~\.julia\packages\IntervalArithmetic\odPjL\src\intervals\arithmetic.jl:264
 [5] top-level scope
   @ REPL[62]:1

Could we simply check if some of the inputs are NaN at the beginning, return [NaN, NaN] if they are and use normal min and max in the computations?

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Mar 19, 2021

if any is NaN the result is (NaN, NaN)
a function guard is in order

@inline function nan_guard(a::T, b::T, c::T)
    isnan(a+b+c) 
end

function interval_fma(a::T, b::T, c::T)
    nan_guard(a, b, c) && return (T(NaN), T(NaN))
    # as before without the filtering of NaNs
end

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Mar 27, 2021

@inline nan_guard(a::T, b::T, c::T) where {T} = isnan(a+b+c) 

function interval_fma(a::T, b::T, c::T) where {T<:Base.IEEEFloat}
  nan_guard(a, b, c) && return(T(NaN), T(NaN))

  hi, lo = two_fma(a, b, c)
  if signbit(lo)
    lo = prevfloat(hi)
  elseif !zero(lo)
    lo = hi
    hi = nextfloat(hi)
  else 
    lo = hi
  end
  return hi, lo
end

function two_fma(a::T, b::T, c::T) where {T<:Base.IEEEFloat}
    hi = fma(a, b, c)
    hi0, lo0 = two_prod(a, b)
    hi1, lo1 = two_sum(c, lo0)
    hi2, lo2 = two_sum(hi0, hi1)
    lo = ((hi2 - hi) + lo2) + lo1
    return hi, lo
end

"""
    two_sum(a, b)

Computes `hi = fl(a+b)` and `lo = err(a+b)`.
"""
@inline function two_sum(a::T, b::T) where {T<:Base.IEEEFloat}
    hi = a + b
    v  = hi - a
    lo = (a - (hi - v)) + (b - v)
    return hi, lo
end

"""
    two_prod(a, b)

Computes `hi = fl(a*b)` and `lo = fl(err(a*b))`.
"""
@inline function two_prod(a::T, b::T) where {T<:Base.IEEEFloat}
    hi = a * b
    lo = fma(a, b, -hi)
    hi, lo
end

@lucaferranti
Copy link
Member Author

Now that SetRounding.jl is fixed, do we still need the manual directed rounding or can we simply use something like:

function fma(a::Interval{T}, b::Interval{T}, c::Interval{T}) where T
    #T = promote_type(eltype(a), eltype(b), eltype(c))

    (isempty(a) || isempty(b) || isempty(c)) && return emptyinterval(T)
    (isnan(a) || isnan(b) || isnan(c)) && return a+b+c

    if isentire(a)
        b == zero(b) && return c
        return entireinterval(T)

    elseif isentire(b)
        a == zero(a) && return c
        return entireinterval(T)

    end

    lo = setrounding(T, RoundDown) do
        lo1 = fma(a.lo, b.lo, c.lo)
        lo2 = fma(a.lo, b.hi, c.lo)
        lo3 = fma(a.hi, b.lo, c.lo)
        lo4 = fma(a.hi, b.hi, c.lo)
        min(lo1, lo2, lo3, lo4)
    end

    hi = setrounding(T, RoundUp) do
        hi1 = fma(a.lo, b.lo, c.hi)
        hi2 = fma(a.lo, b.hi, c.hi)
        hi3 = fma(a.hi, b.lo, c.hi)
        hi4 = fma(a.hi, b.hi, c.hi)
        max(hi1, hi2, hi3, hi4)
    end

    Interval(lo, hi)
end

this is a slightly modified version of the current one, using normal min and max and checking for NaI at the beginning of the function.¨

Not sure how this compares performancewise to the manual directed rounding, I can try to craft a benchmark on my machine.

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Mar 27, 2021

I am putting together some comparisons too.
What is happening here? What is the correct intuition?

julia> a=Interval(sin(0.5))
[0.479425, 0.479426]

julia> a.lo==a.hi
true

I had expected



julia> using SetRounding

julia> setrounding(Float64, RoundDown); lo=sin(0.5); setrounding(Float64, RoundUp); hi=sin(0.5);

julia> lo,hi
(0.47942553860420295, 0.479425538604203)

julia> a=Interval(lo,hi)
[0.479425, 0.479426]

julia> a.lo == a.hi
false

@lucaferranti
Copy link
Member Author

The Interval constructor and the interval method do not perform directed rounding, they merely assign the given input numbers to the attributes of the interval. The "smart" interval which performs directed rounding is obtained with the @interval macro or the .. method

julia> a = @interval sin(0.5)
[0.479425, 0.479426]

julia> a.lo == a.hi
false

julia> using SetRounding

julia> setrounding(Float64, RoundDown); lo = sin(0.5); setrounding(Float64, RoundUp); hi = sin(0.5); setrounding(Float64, RoundNearest)
0

julia> lo == a.lo
true

julia> hi == a.hi
true

@JeffreySarnoff
Copy link
Contributor

why does

julia> a=Interval(sin(0.5))
[0.479425, 0.479426]

julia> a.lo==a.hi
true

show [0.479425, 0.479426] rather than [0.479426, 0.479426]
as a.lo==a.hi==0.479_425_538_604_203

@lucaferranti
Copy link
Member Author

It seems the display function automatically rounds down the lower bound and rounds up the upper bound to the chosen number of digits (6 by default), regardless of whether the lower and upper bound are the same https://github.com/JuliaIntervals/IntervalArithmetic.jl/blob/master/src/display.jl#L190-L191
If only @interval or .. or were used to construct intervals, then I think this wouldn't be an issue. If the interval is created with interval or Interval then the printing can be misleading in some situations I guess. cc @dpsanders

@dpsanders
Copy link
Member

I think we should avoid setrounding completely.

Yes we should probably fix the display to it doesn't round if that's not necessary, if it's possible to do so.

@JeffreySarnoff
Copy link
Contributor

There is code in display.jl that rounds the value after converting it to a string. That is a very powerful approach, and essential to the philosophical integrity of ArbFloats.jl (to show the value that is least misleading, the most veridical representation).

The code here is not trying to do that, so it should not round strings. Using round(x, digits=ndigits, base=2) or ..base=10) will serve better, and when both lo and hi are equal, just round one of them and use that for both in the display). Sometimes the string that results runs _9999999996 or other oddity, so the time convert to a string and round the last digit to be shown (rippling up as necessary) is after the numeric, digit bounded rounding.

@lucaferranti lucaferranti changed the title functions using setrounding do not work with 1.6 on windows remove setrounding from functions implementation Mar 27, 2021
@lucaferranti
Copy link
Member Author

I set up a small benchmark candidates are

fma with setrounding

function fma(a::Interval{T}, b::Interval{T}, c::Interval{T}) where T
    #T = promote_type(eltype(a), eltype(b), eltype(c))

    (isempty(a) || isempty(b) || isempty(c)) && return emptyinterval(T)
    isnan(a+b+c) && return a + b + c

    if isentire(a)
        b == zero(b) && return c
        return entireinterval(T)

    elseif isentire(b)
        a == zero(a) && return c
        return entireinterval(T)

    end

    lo = setrounding(T, RoundDown) do
        lo1 = fma(a.lo, b.lo, c.lo)
        lo2 = fma(a.lo, b.hi, c.lo)
        lo3 = fma(a.hi, b.lo, c.lo)
        lo4 = fma(a.hi, b.hi, c.lo)
        min(lo1, lo2, lo3, lo4)
    end

    hi = setrounding(T, RoundUp) do
        hi1 = fma(a.lo, b.lo, c.hi)
        hi2 = fma(a.lo, b.hi, c.hi)
        hi3 = fma(a.hi, b.lo, c.hi)
        hi4 = fma(a.hi, b.hi, c.hi)
        max(hi1, hi2, hi3, hi4)
    end

    Interval(lo, hi)
end

and fma1 using manual directed rounding and the helping functions as defined above

function fma1(a::Interval{T}, b::Interval{T}, c::Interval{T}) where T

    (isempty(a) || isempty(b) || isempty(c)) && return emptyinterval(T)
    isnan(a+b+c) && return a + b + c

    if isentire(a)
        b == zero(b) && return c
        return entireinterval(T)

    elseif isentire(b)
        a == zero(a) && return c
        return entireinterval(T)
    end

    _, lo1 = fma_interval(a.lo, b.lo, c.lo)
    _, lo2 = fma_interval(a.lo, b.hi, c.lo)
    _, lo3 = fma_interval(a.hi, b.lo, c.lo)
    _, lo4 = fma_interval(a.hi, b.hi, c.lo)
    lo = min(lo1, lo2, lo3, lo4)

    hi1, _ = fma_interval(a.lo, b.lo, c.hi)
    hi2, _ = fma_interval(a.lo, b.hi, c.hi)
    hi3, _ = fma_interval(a.hi, b.lo, c.hi)
    hi4, _ = fma_interval(a.hi, b.hi, c.hi)
    hi = max(hi1, hi2, hi3, hi4)

    return Interval(lo, hi)
end

and the benchmarking

julia> a = b = c = 1.1..1.2
[1.09999, 1.20001]

julia> @btime fma($(Ref(a))[], $(Ref(b))[], $(Ref(c))[])
  1.018 μs (0 allocations: 0 bytes)
[2.30999, 2.64001]

julia> @btime fma1($(Ref(a))[], $(Ref(b))[], $(Ref(c))[])
  61.071 ns (0 allocations: 0 bytes)
[2.30999, 2.64001]

so fma1 is also faster in addition to not using setrounding

@dpsanders
Copy link
Member

Wow, that's really much faster. That's why pretty convincing!

@JeffreySarnoff
Copy link
Contributor

15x is 10 times faster than 1.5x, which is my waterline for "worth the effort"
I suppose fma1 is well worth the, apparently 1/10th effort, we expended. 😃

@lucaferranti
Copy link
Member Author

The version with manual directed rounded apparently fails for "half unbounded intervals", e.g.

julia> a = 1..2
[1, 2]

julia> b = -∞..3
[-∞, 3]

julia> c = 5..∞
[5, ∞]

julia> fma(a, b, c)
[-∞, NaN]

Incidentally, now filtering out the NaN before taking the maximum/minimum fixes the problem (in the sense that all tests pass) 😃 . I replaced lo = min(lo1, lo2, lo3, lo4) with minimum(filter(!isnan, [lo1, lo2, lo3, lo4])) and max similarly and the new benchmark is

julia> a = b = c = 1.1..1.2
[1.09999, 1.20001]

julia> @btime fma($(Ref(a))[], $(Ref(b))[], $(Ref(c))[])
  228.507 ns (4 allocations: 448 bytes)
[2.30999, 2.64001]

so still faster than the original one but quite a loss in efficiency. I think the problem is that Inf - Inf = NaN and thus the help functions (like two_sum, two_fma ) can return a NaN for infinite input. I wonder whether these edge cases can be taken into account without sacrificing efficiency 🤔

@JeffreySarnoff
Copy link
Contributor

JeffreySarnoff commented Mar 28, 2021

Why not do something similar to

function FMA(a::T, b::T, c::T) where T
   fma_hi = fma(a, b, c)
   isnan(fma_hi) && return (T(NaN), T(NaN))
   isinf(fma_hi) && return (fma_hi, fma_hi)
   ....
end

When I have used this pattern in the past, I wrote a fast binary-based isnotfinite(fma_hi) and only checked
isnan(fma_hi), isinf(fma_hi) when it had been determined that isnotfinite(fma_hi) (which happens rarely,
so having the one conditional is much better than always running through two conditionals).
Julia's !isfinite(x) has improved since then. It is faster to test !isfinite than to test isfinite.

@lucaferranti
Copy link
Member Author

lucaferranti commented Mar 28, 2021

Ok I think I figured out how to avoid filtering NaNs. Here is a running example to explain what I mean. The problem is that if we want to compute e.g. fma([1, 2], [-∞, 3], [2, ∞]) then the four candidates for the upper bound are

h1 = fma(1, -∞, ∞) = NaN
h2 = fma(2, -∞, ∞) = NaN
h3 = fma(1, 3, ∞) = ∞
h4 = fma(2, 3, ∞) = ∞

and the first two poison the computation of the maximum and hence the need for max/min_ignore_nans. Followinf Jeffrey's suggestions I modified the function hi, lo = interval_fma(a, b, c), which returns the rounded up (hi) and rounded down (lo) versions of fma as follows:

function interval_fma(a::T, b::T, c::T) where {T}

    hi = fma(a, b, c)
    isnan(hi) && return convert(T, -Inf), convert(T, Inf) # first is hi, second is lo
    !isfinite(hi) && return hi, hi

    hi, lo = two_fma(a, b, c)
    # and then like before

i.e. if the result of the scalar fma is NaN, it returns -Inf as upper bound and Inf as lower bound. This way those are automatically filtered out when taking the maximum of upper bounds and the minimum of lower bounds, because the only case in which min can return Inf is if all its inputs are Inf, which would mean that all 4 candidates would have returned NaN, which can happen only if some of the original input intervals were NaI or empty intervals or entire intervals, but these cases are already checked at the beginning of fma(a::Interval{T}, b::Interval{T}, c::Interval{T}).

Summa summarum, now all tests pass (the ITF1788 test suite has at least 108 tests checking those half unbounded intervals cases, so I guess they are quite exhaustive), no need for filtering NaN out and the benchmark is

julia> a = b = c = 1.1..1.2
[1.09999, 1.20001]

julia> @btime fma($(Ref(a))[], $(Ref(b))[], $(Ref(c))[])
  46.255 ns (0 allocations: 0 bytes)
[2.30999, 2.64001]

as a final small side note, at the beginning of fma(a::Interval{T}, b::Interval{T}, c::Interval{T}) I changed

isnan(a+b+c) && return a + b + c

to

(isnan(a) || isnan(b) || isnan(c)) && return a + b + c

squeezing a little more efficiency, now the benchmark is

julia> @btime fma($(Ref(a))[], $(Ref(b))[], $(Ref(c))[])
  35.549 ns (0 allocations: 0 bytes)
[2.30999, 2.64001]

I will soon polish the code and push it to the PR of this issue ( #443 ), comments, feedback and improvement suggestions are always welcome of course.

@JeffreySarnoff
Copy link
Contributor

With the benchmarking I did this morning

(isnan(a) || isnan(b) || isnan(c)) && return a + b + c

is sometimes less performant than each of these

isnan(a+b+c) && return a+b+c
isnan(fma(a,b,c)) && return fma(a,b,c)

(it goes so quickly that when none are NaN, the one-off timings are indistinguishable)
Where code use brings issues of overlap and latency to the fore, the use of || is not as performant because

I displayed @code_native for each of the three, and then removed the instructions they all shared
The version with '||' is the only one that has an internal jump (better avoided on general principles).
Are your benchmarks using real-world code or just comparing these routines?
If the latter, I am less persuaded. If the former, disregard this note.

@lucaferranti
Copy link
Member Author

cool, I did not think it that way. Thanks for the insight! I just benchmarked the routines, I do not have any benchmarks where those routines are part of a bigger real-world code. I switched back to isnan(a+b+c) && return a+b+c . The code for fma is now updated in the PR #443

@lucaferranti lucaferranti linked a pull request Apr 20, 2021 that will close this issue
3 tasks
@OlivierHnt
Copy link
Member

Fixed by PR #593.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants