DoubleDoubles.jl

(This is a fork of DoubleDouble.jl. It was renamed DoubleDoubles because it provides the type DoubleDouble.)

This is a work in progress. A lot of the code is completely untested. If you need double-double arithmetic right now, I'd suggest using either the TwicePrecision type that is included in the Base module, or the HigherPrecision package, which also supports trigonometric functions.

DoubleDoubles.jl is a Julia package for performing extended-precision arithmetic using pairs of floating-point numbers. This is commonly known as "double-double" arithmetic, as the most common format is a pair of C doubles (Float64 in Julia), although Double works with any floating-point type, including another Double, which gives quad-double precision and beyond. Its aim is to provide accurate results without the overhead of BigFloat types.
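
For example, nesting the type once gives quad-double precision. Judging by the field access in the examples below, the QuadDouble alias is essentially a Double whose components are themselves DoubleDoubles; the alias name used here is only for exposition:

# Hypothetical alias, for illustration only -- the package exports QuadDouble itself.
const NestedQuad = Double{Double{Float64}}
x = NestedQuad(pi)
(x.hi, x.lo)   # each component is itself a Double{Float64}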

The core routines are based on the ideas and algorithms of Dekker (1971).
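
To give a flavour of these algorithms (a standalone sketch, not the package's actual code), the classic error-free addition recovers the rounding error of a floating-point sum exactly; double-double arithmetic is built from building blocks of this kind:

# Error-free transformation of a sum (Knuth's branch-free variant of the
# idea used by Dekker): s is the rounded sum, e is the exact rounding error,
# so that a + b == s + e holds exactly.
function two_sum(a::Float64, b::Float64)
    s = a + b
    bv = s - a
    e = (a - (s - bv)) + (b - bv)
    return s, e
end

two_sum(1.0, 1e-17)   # (1.0, 1.0e-17) -- the small term is recovered, not lost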

Interface

The main type is Double, with two floating-point fields: hi, storing the leading bits, and lo storing the remainder. hi is stored to full precision and rounded to nearest. When creating a Double directly using the type constructor, the user is responsible for ensuring that abs(x.lo) <= 0.5 * eps(x.hi). However, Doubles are typically created by conversion from other numeric types.

julia> using DoubleDoubles

julia> x = DoubleDouble(pi)
3.14159265358979323846264338327953

julia> (x.hi, x.lo)
(3.141592653589793,1.2246467991473532e-16)

julia> eps(x.hi)
4.440892098500626e-16

julia> xx = QuadDouble(pi)
3.1415926535897932384626433832795028841971693993751058209749445924

julia> (xx.hi.hi, xx.hi.lo, xx.lo.hi, xx.lo.lo)
(3.141592653589793,1.2246467991473532e-16,-2.764667309787496e-32,1.4797009536535408e-48)
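
The invariant stated above can be checked directly on these components:

julia> abs(x.lo) <= 0.5 * eps(x.hi)
true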

The other type defined is Single, which is a Double where the lo field is guaranteed to be zero. (Attempting a conversion to Single will throw an InexactError if this is not the case.) Operations on Single typically return Double and will often be faster than the corresponding operations on Double.
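
A small sketch of the expected behaviour (illustrative code, not output copied from the package):

Single(0.1)               # fine: a lone Float64 always has lo == 0
Single(DoubleDouble(pi))  # should throw InexactError, since this Double has a nonzero lo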

Note

Constructs like Double(pi) that used to default to Double{Float64}(pi) are deprecated. Either write out Double{Float64} explicitly, or use the type alias DoubleDouble.

Double{BigFloat} is not allowed. The base type must be immutable. (Use setprecision to get higher-precision BigFloats.)
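
For reference, the BigFloat route mentioned in the parenthesis uses only standard Julia:

setprecision(BigFloat, 256) do
    big(pi)   # a 256-bit approximation of pi, no Double types involved
end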

Examples

Exact products and remainders

Because the product of two Singles can be represented exactly as a Double, we can compute exact products of floating-point numbers.

julia> u, v = 64 * rand(), 64 * rand()
(44.438125149181445, 46.96067529434315)

julia> w = Single(u) * Single(v)
2.08684436582009398756273247966684e+03

julia> (w.hi, w.lo)
(2086.844365820094, -3.871317795744025e-14)

Note that the product of two Singles is a Double: the hi element of this Double is equal to the usual rounded product, and the lo element contains the difference between the exact value and the rounded product.
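
The exactness can be confirmed against BigFloat (using the u, v and w from above; converting a Float64 to BigFloat is exact, so the comparison is meaningful):

julia> big(w.hi) + big(w.lo) == big(u) * big(v)
true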

This can be used to compute an accurate remainder:

julia> r = rem(w, 1.0)
8.44365820093987562732479666836822e-01

julia> Float64(r)
0.8443658200939875

This is much more accurate than taking ordinary products, and gives the same answer as using BigFloats:

julia> rem(u*v, 1.0)
0.8443658200940263

julia> Float64(rem(big(u) * big(v), 1.0))
0.8443658200939875

However, since the DoubleDouble version is carried out using ordinary floating-point operations, it is of the order of 1000x faster than the BigFloat version.
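
This claim can be checked on your own machine, for instance with the BenchmarkTools package (a rough sketch; the exact speedup will vary with hardware and BigFloat precision):

using BenchmarkTools
@btime rem(Single($u) * Single($v), 1.0)      # double-double remainder
@btime Float64(rem(big($u) * big($v), 1.0))   # BigFloat remainder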

Correct rounding with non-exact floats

If a number cannot be represented exactly as a floating-point value, computations that use its rounded representation may themselves be incorrectly rounded, e.g.

julia> pi * 0.1
0.3141592653589793

julia> Float64(big(pi) * 0.1)
0.31415926535897937

We can also do this computation using Doubles (note that the promotion rules mean that only one operand needs to be specified as a Double):

julia> Float64(DoubleDouble(pi) * 0.1)
0.31415926535897937

julia> Float64(pi * Single(0.1))
0.31415926535897937

Emulated FMA

The fused multiply-add (FMA) operation is an intrinsic floating-point operation that allows the evaluation of a * b + c, with rounding occurring only at the last step. This operation is unavailable on 32-bit x86 architecture, and available only on the most recent x86_64 chips, but can be emulated via double-double arithmetic:

julia> 0.1 * 0.1 + 0.1
0.11000000000000001

julia> Float64(big(0.1) * 0.1 + 0.1)
0.11

julia> Base.fma(a::Float64, b::Float64, c::Float64) = Float64(Single(a) * Single(b) + Single(c))
fma (generic function with 1 method)

julia> fma(0.1, 0.1, 0.1)
0.11
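
Note that redefining Base.fma for three Float64 arguments as above replaces a method that Base already provides, so outside a quick experiment a separately named helper is safer (dd_fma is a hypothetical name, not part of the package):

julia> dd_fma(a::Float64, b::Float64, c::Float64) = Float64(Single(a) * Single(b) + Single(c))
dd_fma (generic function with 1 method)

julia> dd_fma(0.1, 0.1, 0.1)
0.11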
