
Poor result in log1p(complex) #790

Open
tim-one opened this issue Apr 18, 2024 · 46 comments · May be fixed by #803
Labels: bug (an unexpected problem or unintended behavior)
Milestone: 1.4

Comments


tim-one commented Apr 18, 2024

A case where log1p of a complex number gets a poor real part:

>>> import mpmath
>>> mpmath.__version__
'1.3.0'
>>> from mpmath import mp, mpc, log, log1p, workprec
>>> c = 1.8370676479640493e-39 - 4.6885882517313053e-20j
>>> complex(log1p(c))
(1.0991429897136409e-39-4.6885882517313053e-20j)
>>> with workprec(500): complex(log1p(c))
(2.93621063767769e-39-4.6885882517313053e-20j)

With the higher precision, the real part of the result is entirely different. I don't understand the code:

>>> mp.mag(c), mp.prec
(-63, 53)

Because -63 < -53, I expected the code to take the branch that uses a 2-term expansion, which works fine:

>>> c - 0.5*c**2 # matches the high-precision result
(2.93621063767769e-39-4.6885882517313053e-20j)

But I must not have found the correct source code, since editing it myself has no visible effect.

The other branch in the code calls log() instead after adding 1 to c with double precision:

>>> complex(log(mp.fadd(c, 1, prec=106)))
(1.0991429897136409e-39-4.6885882517313053e-20j) # reproduces the "bad" result

It's not enough extra precision to help. In this case,

>>> mp.mag((2*c.real + c.real**2 + c.imag**2))
-127
>>> complex(log(mp.fadd(c, 1, prec=127+53))) # so this is enough
(2.93621063767769e-39-4.6885882517313053e-20j)
>>> complex(log(mp.fadd(c, 1, prec=127+53-2))) # but 2 bits less not quite enough
(2.9362106376776874e-39-4.6885882517313053e-20j)
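
Roughly speaking, the real part of log(1+c) is 0.5*log(1 + (2*c.real + c.real**2 + c.imag**2)). That correction term has mag -127 here, so for its leading 53 bits to survive, the add of 1 and c has to carry roughly 127 + 53 bits - which is just what the experiment above shows.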

To be clear, "the code" I'm talking about is this, from functions/functions.py:

@defun_wrapped
def log1p(ctx, x):
    if not x:
        return ctx.zero
    if ctx.mag(x) < -ctx.prec:
        return x - 0.5*x**2
    return ctx.log(ctx.fadd(1, x, prec=2*ctx.prec))

LATER: found the right source code. ctx.mag(x) and -ctx.prec in the above both return -63 in this case, so that's why the "wrong" branch is taken. I have no idea how ctx.prec ended up at 63. As shown above, I never changed it from the default 53.

skirpichev added the bug label Apr 19, 2024
skirpichev added this to the 1.4 milestone Apr 19, 2024
skirpichev (Collaborator) commented:

Thanks for the bug report.

I have no idea how ctx.prec ended up at 63.

It was temporarily increased in the _wrap_specfun() helper:

mpmath/mpmath/ctx_mp_python.py, lines 1017 to 1026 at b5c0450:

def f_wrapped(ctx, *args, **kwargs):
    convert = ctx.convert
    args = [convert(a) for a in args]
    prec = ctx.prec
    try:
        ctx.prec += 10
        retval = f(ctx, *args, **kwargs)
    finally:
        ctx.prec = prec
    return +retval

BTW, with

diff --git a/mpmath/functions/functions.py b/mpmath/functions/functions.py
index 2bbb63d..178b375 100644
--- a/mpmath/functions/functions.py
+++ b/mpmath/functions/functions.py
@@ -168,7 +168,8 @@ def log1p(ctx, x):
         return ctx.zero
     if ctx.mag(x) < -ctx.prec:
         return x - 0.5*x**2
-    return ctx.log(ctx.fadd(1, x, prec=2*ctx.prec))
+    u = ctx.fadd(1, x, prec=2*ctx.prec)
+    return ctx.log(u)*(x/(u-1))
 
 @defun_wrapped
 def powm1(ctx, x, y):

I got:

>>> complex(log1p(c))
(2.93621063767769e-39-4.6885882517313053e-20j)

PS: There is another issue that could be fixed more trivially. The function code above is used in the fp context as well. We could switch to math.log1p() in the real case (BTW, why is there no cmath.log1p()? E.g. there is a log10(), which is missing in complex.h.).
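
A minimal sketch of that idea for a plain-Python fp context (my sketch, not mpmath's actual fp-context code; the function name is made up):

import cmath
import math

def fp_log1p(x):
    # Real inputs: the C library's log1p already handles small |x| accurately.
    if not isinstance(x, complex):
        return math.log1p(x)
    # Complex inputs: there is no cmath.log1p(), so fall back to log(1 + x).
    return cmath.log(1 + x)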


tim-one commented Apr 19, 2024

Thanks! It's at least cute that a well-intentioned small boost in precision to absorb rounding errors unintentionally convinced later code to do an entirely wrong thing 😉.

Can't make time for it now, but suggest caution about:

u = ctx.fadd(1, x, prec=2*ctx.prec)
return ctx.log(u)*(x/(u-1))

That trick works great for real-valued log1p(), but complex numbers are more complicated beasts. For example, there are no particular numeric challenges in computing the imaginary part of the result, but dividing through by u-1 may well introduce multiple ULPs of needless error in the imaginary part (complex divide in Cartesian coordinates sucks in ... what? ... 6 multiplies, 2 divides, and 3 additions, 2 of which are subject to potential massive cancellation - but in the real-valued case, it's just the single division it "looks like" it is).
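
For reference, the naive Cartesian division being described looks like this (an illustrative sketch, not mpmath's implementation):

def naive_complex_div(a, b, c, d):
    # (a + b*j) / (c + d*j), the textbook way.
    denom = c*c + d*d           # 2 multiplies, 1 add
    re = (a*c + b*d) / denom    # 2 multiplies, 1 add (may cancel badly), 1 divide
    im = (b*c - a*d) / denom    # 2 multiplies, 1 subtract (may cancel badly), 1 divide
    return re, im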

For example, this code isn't exactly the same, but does something similar in Python:

def test(c):
    c2 = mp.fadd(c, 1.0, prec=2*mp.prec)
    L = mpmath.log(c2)
    return c * (L / (c2 - 1.0))

and then

>>> c = 7.395821761854832e+86-3.4983116269888314e+21j
>>> complex(mpmath.log1p(c))
(200.02323321145997-4.730118896363822e-66j)
>>> with workprec(500): complex(mpmath.log1p(c)) # that was _already_ as good as it gets
(200.02323321145997-4.730118896363822e-66j)       # same result
>>> complex(test(c))  # but the imaginary part is damaged by the "-1 trick"
(200.02323321146-4.7301188963640004e-66j)
>>> import math  # by about 339 ULP :-(
>>> (4.7301188963640004e-66 - 4.730118896363822e-66)/math.ulp(4.730118896363822e-66)
339.0

Note here that 1 is insignificantly tiny compared to either component of c. log1p(c) and log(c) should be the same (1 + c.real loses the 1 to rounding even in 106-bit arithmetic):

>>> c = 7.395821761854832e+86-3.4983116269888314e+21j
>>> r = mpmath.log1p(c)
>>> complex(r)
(200.02323321145997-4.730118896363822e-66j)
>>> c2 = mp.fadd(c, 1, prec=2*mp.prec)
>>> c == c2  # adding 1 is irrelevant for this `c`
True
>>> complex(c/c*r)
(200.02323321145997-4.730118896363822e-66j)  # exactly the same
>>> _ == r
True
>>> complex(c*(r/c)) # but just reordering the operations introduces the 339 ULP error
(200.02323321146-4.7301188963640004e-66j)


tim-one commented Apr 21, 2024

The surprisingly high ULP error in the last example seems primarily due to the Python code rounding r/c back to 53 bits before the multiply by c is done. It turns out r/c is correctly rounded to 53 bits, but for these specific values the bits beyond the 53rd make a real difference to the result. In the context of an mpmath implementation, though, you're working with 10 "extra" bits throughout, which would act to hide that.

FYI, the following function takes a very different approach. Across about 10 million randomized inputs so far, it hasn't yet reached 2 1/2 total ULP error (summing the absolute values of the ULP errors in the .real and .imag parts), and total ULP error is < 1 over 99.2% of the time. I took the result of mpmath.log1p(c) with prec=500 as being "correct". However, the error stats appear to look better if I raise that to 1000 bits, so "correct" looks arguable.

def test(c):
    if mp.mag(c) >= 3:
        return mpmath.log(1.0 + c)
    R = mpf(c.real)
    C = mpf(c.imag)
    imag_part = mpmath.atan2(C, R + 1.0)
    # compute res = (R+1)**2 + C**2 - 1 exactly,
    # = R**2 + 2*R + C**2.
    Rsq = mp.fmul(R, R, exact=True)
    twoR = mp.ldexp(R, 1)
    res = mp.fadd(Rsq, twoR, exact=True)
    Csq = mp.fmul(C, C, exact=True)
    res = mp.fadd(res, Csq, exact=True)
    # Now the real part is log(sqrt(1 + res)) = 0.5 * log(1 + res).
    real_part = mpmath.log1p(res)
    return mpc(real_part * 0.5, imag_part)
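
For reference, the "total ULP error" metric described above can be computed along these lines (a sketch, not the actual test harness):

import math
from mpmath import workprec

def total_ulp_error(got, right):
    # Sum, over the real and imag parts, of |rounded-to-float result - reference|
    # measured in ULPs of the rounded-to-float component.
    total = 0.0
    for g, r in ((got.real, right.real), (got.imag, right.imag)):
        gf = float(g)
        with workprec(200):  # keep the tiny difference from being lost
            d = float(abs(gf - r))
        total += d / math.ulp(abs(gf))
    return total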

skirpichev (Collaborator) commented:

Maybe it's safe to use just

diff --git a/mpmath/functions/functions.py b/mpmath/functions/functions.py
index 2bbb63d..b770e05 100644
--- a/mpmath/functions/functions.py
+++ b/mpmath/functions/functions.py
@@ -168,7 +168,7 @@ def log1p(ctx, x):
         return ctx.zero
     if ctx.mag(x) < -ctx.prec:
         return x - 0.5*x**2
-    return ctx.log(ctx.fadd(1, x, prec=2*ctx.prec))
+    return ctx.log(ctx.fadd(1, x, exact=True))
 
 @defun_wrapped
 def powm1(ctx, x, y):

One case where the mantissa of 1+x could blow up is handled right above. The other is when |x| >> 1.


tim-one commented Apr 21, 2024

I think you're on to something there. I'm seeing excellent numerical results from the following Python function, which intends to do much the same. Haven't seen a total 2 ULP error yet, and 99.99% of the time it's under 1 total, but that's after only 7 million trials. Alas, it seems to run significantly slower overall.

Two tweaks:

  • The range over which the cheap series expansion works fine is widened.

  • Doing exact addition is skipped if adding 1 to the real part is lost in working precision. So the imaginary part is irrelevant to this decision, and exact addition is also skipped if |c.real| >> 1.

def test(c):
    c = mpc(c)
    with extraprec(10):
        if mp.mag(c) * 1.5 < -mp.prec:
            result = c - 0.5 * c * c
        else:
            arg = c
            if c.real + 1.0 != c.real:
                arg = mp.fadd(1.0, c, exact=True)
            result = mp.log(arg)
    return result

Surprise! At about the 11 millionth try, it hit a case pushing 4 ulp error:

>>> c = mpc(-2.0476815825463086e-80 - 2.0235857941734692e-40j)
>>> with workprec(1000): right = mp.log1p(c)
>>> got = test(c)
>>> complex(right)
(-2.3184935597344513e-84-2.0235857941734692e-40j)
>>> complex(got) # small difference in real part
(-2.318493559734453e-84-2.0235857941734692e-40j)
>>> import math # compute ULP difference
>>> rgot = float(got.real)
>>> d = float(abs(rgot - right.real))
>>> d / math.ulp(abs(rgot))
3.7405120134353638

Don't know why. This would have taken the cheap 2-term series branch, and mp.mag(c) is -130 so the tweak I made to its input range didn't matter to that.

WOW! It's due to working precision being 1 bit too small for this case - sheesh.

>>> with workprec(63): complex(c - c*c/2) # what we got
(-2.318493559734453e-84-2.0235857941734692e-40j)
>>> with workprec(64): complex(c - c*c/2) # increase by 1 bit and it's "perfect"
(-2.3184935597344513e-84-2.0235857941734692e-40j)


tim-one commented Apr 22, 2024

And here's another stab, which has so far delivered a maximum total ULP error of 1.000254. Those cases are rare enough to check by eyeball, and they're "balanced", meaning that there's an error very close to 0.5 ULP in each of the components. Deeper digging on half a dozen showed they were all cases where the infinitely precise result was very close to being halfway between representable floats.

The twists here:

  • Widened the "fast path" (series expansion) input range again.

  • But ask for another 5 bits in the fast path to combat the 1-in-10-million (or so) cases that were getting total ULP errors over 5 before. Haven't seen another of those since.

  • Took a trick from my long-winded code to vastly cut the number of cases where an exact add of 1.0 is used. Now that happens only when mp.mag(c) is between -mp.prec/2 and 3.

I suppose in the context of an mpmath implementation, where this code's extraprec(10) is already supplied by magic, the calls to mp.log() would go on to add another 10 bits of their own. And I suppose this code does too. I don't know whether that's important to getting such good results.

My latest run just crossed 24 million cases. No change to mention: max total ULP error still 1.000254, and the number of cases with total error < 1 ULP is well over 99.99%; the most recent case reaching 1 ULP was over 7 million cases ago.

def test(c):
    c = mpc(c)
    cmag = mp.mag(c)
    with extraprec(10):
        if cmag >= 3:
            result = mp.log(1.0 + c)
        elif cmag * 2 < -mp.prec:
            with extraprec(5):
                result = c - c * c * 0.5
        else:
            arg = mp.fadd(1.0, c, exact=True)
            result = mp.log(arg)
    return result

Regarding my earlier "Alas, it seems to run significantly slower overall":

That was mostly an illusion. I had forgotten I boosted the reference calculation to use 1000 bits instead of 500. That made it significantly slower.


tim-one commented Apr 22, 2024

Ack. After about 71 million trials, a nearly 2-ULP error popped up, at c=-4.382549590908223e-54-2.9606011943930447e-27j. Another "fast path" case that's always been there (mag is -87). And the error would go away if I used extraprec(6) instead of extraprec(5) - another case where just 1 bit makes all the difference.

I seem lost as to how to out-think this. I always figured that since c - c**2/2 has root 2, and mag < -prec ruled out anything near that large, catastrophic cancellation couldn't occur. But I was wrong about that: it also has root 0, and bad cancellation remains a potential problem for very tiny inputs:

>>> c = mpc(-4.382549590908223e-54-2.9606011943930447e-27j)
>>> complex(c)
(-4.382549590908223e-54-2.9606011943930447e-27j)
>>> complex(c**2/2)
(-4.3825797161207615e-54+1.2974981553329635e-80j)

So the leading 5+ decimal digits of the real part cancel out in the subtraction. That's about 15 bits, which is the total extra precision (10 + 5) I'm using.

How bad can it get? Don't know. But if I ask for more & more precision on each ever-rarer bad case at a time, it won't be long before the name "fast path" will be a joke 😉.

BTW, I don't believe anything analogous can go wrong with this series for real-valued inputs. Despite that these have very different norms:

>>> abs(c)
mpf('2.9606011943930447e-27')
>>> abs(c*c/2)
mpf('4.3825797161207615e-54')

their real components are nevertheless nearly equal. That can't happen with abs(real). In the complex case at hand, it's c.imag that dominates the norm of c, but the norm of c*c/2 is dominated by its real part. That the norms are very different doesn't imply that the components are both far apart ☹️

skirpichev (Collaborator) commented:

Wouldn't it be better to use a Horner scheme, c*(1. - 0.5*c)?


tim-one commented Apr 22, 2024

In this specific case, yes. But cancellation brings it very close to where the 3rd-order term can matter, and there's no reason at all to hope that worse cancellation can't occur. So what I'm testing now does away with the extraprec(5) and uses a 3rd-order "Hornerized" series instead:

            elif cmag * 2 < -mp.prec:
                result = c*c * (c/3.0 - 0.5) + c

EDIT: which works better in "fully Hornerized" form:

                result = c*(c*(c/3.0 - 0.5) + 1.0)


tim-one commented Apr 22, 2024

No surprises.

After 300 million trials, max total ULP error seen is 1.0009, reached at 2.3352642762712193e+104-0.0377801491250023j. All but 47 values had total error less than 1 ULP.

All inputs were built from native Python floats with input context prec 53. The real and imag components were generated in the same way: pick one of the 2**52 representable floats in [1.0, 2.0) uniformly at random, apply ldexp with a shift picked from [-400, 400] at random, and randomly negate it half the time.
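
A sketch of that generator, reconstructed from the description (not the actual test harness):

from math import ldexp
from random import getrandbits, randrange

def random_component():
    f = 1.0 + ldexp(getrandbits(52), -52)  # uniform over the floats in [1.0, 2.0)
    f = ldexp(f, randrange(-400, 401))     # random shift in [-400, 400]
    return -f if getrandbits(1) else f     # negate half the time

def random_input():
    return complex(random_component(), random_component())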


tim-one commented Apr 23, 2024

Continued testing keeps delivering similarly stellar results. Now that I'm (more than just) happy with accuracy, I can think about speed. For a start, the 2-term Taylor series is obviously much better-behaved in this context via Hornerizing, but I haven't tested that. So that's next.


tim-one commented Apr 23, 2024

LOL! Testing with the 2-term Horner series quickly delivered the worst case of the last few days: total ULP error of 1.00224, at -3.8445809036335997e-16+1.754102014431826e-63j (mag -50, so it takes the fast path due to my widening of it).

However, it delivers exactly the same result (after rounding back to Python's 53-bit floats) with the 3-term series.

To speed this up, I fiddled the random generator to heavily favor producing smaller outputs, to make it much more likely that the fast path will be taken.


tim-one commented Apr 23, 2024

Damn. I added a long-ish comment on the actual source of errors here, but GitHub appears to have lost it.

Short course: "Horner's method" is mostly a red herring here. It's just as prone to cancellation disasters.
However, they're much harder to provoke, because mpc.__mul__ appears to do its adds/subtracts on exact products, while the c - c*c*0.5 spelling does the cancelling subtract after its input products have been rounded back to 63 bits.

Random testing appears too feeble to catch the disasters in reasonable time - didn't see one over 1.002 total ULP error after 730 million tries. But now that I finally understand a cause, it's easy to construct randomish inputs a+bj with small negative a, and b "close" to sqrt(a*(a - 2)) (the most obvious way to provoke cancellation problems).

>>> c = mpc(-6.4922176418510124e-21+1.1394926627101214e-10j)
>>> with workprec(10000): complex(mp.log1p(c)) # presumed correct
(-1.4201199664289643e-37+1.1394926627101214e-10j)
>>> complex(test(c))  # 2-term series only gets 4 correct digits
(-1.42033071087851e-37+1.1394926627101214e-10j)
# And cancellation is so bad the 3-term series isn't better
>>> with extraprec(10): complex(c*(c*(c/3.0 - 0.5) + 1.0))
(-1.42033071087851e-37+1.1394926627101214e-10j)

No matter how it's spelled, the real part of the result is the mathematical a - (a*a - b*b) / 2, and the two terms are identical in the first 16 significant digits:

>>> a = c.real
>>> b = c.imag
>>> with extraprec(100): a
mpf('-6.4922176418510124160942132081389312851943535838e-21')
>>> with extraprec(100): (a**2 - b**2)/2
mpf('-6.49221764185101227404006767533335278133713234072e-21')

I note that mag(c) is -32, the largest it can be and still take the "fast path", which I widened. Widening it may have been a fundamental error - but, if so, for reasons I don't yet grasp. After restoring the original condition (mag < -prec), this new "bad case" random generator hasn't found an actual bad case yet (with the widened test, it finds many almost instantly, on every run).


tim-one commented Apr 23, 2024

A bit of better 😉 news. The imaginary part of the 2-term series is, mathematically, b - a*b. That can't get anywhere near any cancellation unless a is close to 1. But mag(c) is far too small in the fast path for that to happen.

So disaster analysis can focus exclusively on what happens to the real part.


tim-one commented Apr 24, 2024

Thinking out loud here: if you want to move to different log1p() implementations for the real and complex cases, I can think of two ways code could benefit:

  • Cancellation is a non-issue in the real case. Widening the "fast path" is fine there.

  • In the complex case, the fast path is doing much more work under the covers than currently appears needed. There is only one mpf multiplication that really needs very fat precision, and then no mpc operations are needed at all. Like so:

elif cmag < -mp.prec:
    # Series expansion, c - c**2/2.
    a, b = c.real, c.imag
    # Compute (a**2 - b**2) / 2 very accurately.
    a2b2 = mp.ldexp(mp.fmul(a+b, a-b, exact=True), -1)
    # a-a2b2 can lose almost all the bits of `a` to
    # cancellation. That's why we want lots of extra
    # precision in a2b2.
    result = mpc(a - a2b2, b - a*b)

Then again, I found it very elegant that the same code was used for both.


tim-one commented Apr 24, 2024

FYI, about this earlier report of a bad fast-path case under the widened test (mag * 2 < -prec):

>>> c = mpc(-6.4922176418510124e-21+1.1394926627101214e-10j)
>>> with workprec(10000): complex(mp.log1p(c)) # presumed correct
(-1.4201199664289643e-37+1.1394926627101214e-10j)
>>> complex(test(c))  # 2-term series only gets 4 correct digits
(-1.42033071087851e-37+1.1394926627101214e-10j)
# And cancellation is so bad the 3-term series isn't better
>>> with extraprec(10): complex(c*(c*(c/3.0 - 0.5) + 1.0))
(-1.42033071087851e-37+1.1394926627101214e-10j)
  • That wasn't the worst case I found - just the first. I didn't save them all. At least one case got only the first digit right.

  • If you use lots of precision, the Taylor series does converge to the right 53-bit result. But it takes 4 terms! 3 isn't enough no matter how many bits you use.

  • Despite trying several times, I still don't have a good answer to how the norm of the input relates to the worst-case accuracy for this series with N terms. This appears to be approximately infinitely harder than out-thinking the real-valued case 😦

Empirically, as I mentioned before, I have a generator now that produces nothing but cases where cancellation can matter. Using the original mag < -prec cutoff, that's gone through over 125 million cases with max total ULP error 0.5005 (*) (as indirectly noted earlier, the imaginary component of the result is going to be extremely close to the original c.imag in these cases, so that component essentially contributes 0 ULP error every time).

But I've so far failed to find a provable bound on how bad it can get. Indeed, the more I've thought about it, the more confused I've gotten about which approaches might work for that 😆.

(*) This run is using the "only one fat mpf multiply, mpc-free" code I showed in my last comment.


skirpichev commented Apr 24, 2024

Using the original mag < -prec cutoff, that's gone through over 125 million cases with max total ULP error 0.5005. This run is using the "only one fat mpf multiply, mpc-free" code I showed in my last comment.

Hmm, does cancellation ever occur in this branch with the original code?

Edit: it seems flint uses the same cutoff; not sure whether this was justified somehow.


tim-one commented Apr 24, 2024

Hmm, does cancellation ever occur in this branch with original code?

Absolutely. I covered this in some detail in the comment GitHub lost 😢. Here's a pretty dramatic example:

>>> c = mpc(-1.430796815051627e-72+1.691624553529315e-36j)
>>> mp.mag(c)
-117

# That's way less than -63, so the original will take its fast path.

>>> with workprec(10000): complex(mp.log1p(c))
(4.6709293580298264e-91+1.691624553529315e-36j)

# I presume that the 10-thousand bit result is correct.
# But the original function just manages to get the first digit right:

>>> complex(mp.log1p(c))
(4.909093465297727e-91+1.691624553529315e-36j)

# The Horner spelling repairs that, though:
>>> with extraprec(10): complex(c * (1.0 - c * 0.5))
(4.6709293580298264e-91+1.691624553529315e-36j)

It's not that Horner is somehow magical, it's that the cancellation occurs inside the implementation of mpc.__mul__, where products are computed exactly. The exact products feed into the cancelling add/subtract, allowing the low-order surviving bits to "shift up" (and closer analysis convinced me that only one "fat multiply" total is really needed in this case, which the code I showed implements).

The original spelling fails because in x - 0.5*x**2, both terms are rounded back to 63 (= 53 + 10) bits before the massively cancelling subtraction is done. Not enough bits survive from the exact products.


tim-one commented Apr 24, 2024

Edit: it seems, flint uses same cutoff, not sure if this was justified somehow, but.

I think it's easy to show that mag < -prec is adequate for the real-valued case. I think it's also possible to complete a proof that the much looser mag * 2 < -prec is also adequate for the real-valued case.

But I'm starting to suspect nobody actually thought about it for the complex case 😜


skirpichev commented Apr 24, 2024

I know of a few libraries that provide a complex variant of log1p. IIRC, Julia just uses Kahan's trick. MPC has no log1p (it's in the TODO).


tim-one commented Apr 24, 2024

Ya, IMO log1p(complex) is a pretty silly idea 😉. The real numerical problem for log(c) is when the mathematical abs(c) is close to 1, because the real part of the mathematical result is log(abs(c)).

>>> c = mpc(0.5, mp.sqrt(.75000000001))
>>> c
mpc(real='0.5', imag='0.8660254037902122')
>>> abs(c)
mpf('1.000000000005')

# So numerically challenging despite that c.real is far from 0.

# How does mpmath do on it?
>>> mp.log(c)
mpc(real='5.0000441239368832e-12', imag='1.0471975511994844')
>>> mp.log(abs(c))
mpf('5.0000004136893552e-12')

# Oops? The real part materially differs from mp.log(|c|).
# But that's because |c| was computed with inadequate
# precision in the last line.

>>> with extraprec(100): absofc = abs(c)
>>> mp.log(absofc)
mpf('5.0000441239368832e-12')

# And that matches mp.log(c).real.

numpy supports log1p(complex) too. That's how I got sucked into this, replying to complaints about poor results there. No surprise: numpy's log1p(c) just does log(c+1). The numpy docs don't point this out, but only claim better accuracy for the real-valued version (which uses a different algorithm).


tim-one commented Apr 24, 2024

I'm going to share my "bad case" generator here, so there's a record of it on the web. It required some effort to come up with, and it's been extremely effective at "breaking" half a dozen log1p(complex) implementations very quickly (including different approaches than mentioned here - there's apparently something inherently "very hard" about these).

from math import sqrt, ldexp, ulp
from random import random, randrange
# Generate c such that there's significant cancellation
# in computing the real part of c - c*c/2.
def getc():
    a = random() * 0.5 + 0.5
    a = ldexp(a, randrange(-500, -126))
    a = -a
    b = sqrt(a * (a - 2.0))
    u = ulp(b)
    u = ldexp(u, randrange(40))
    if random() < 0.5:
        u = -u
    u *= random()
    b += u
    return complex(a, b)

The last code I talked about hasn't hit a total ULP error over 0.5005 after 650 million of these so far.


tim-one commented Apr 25, 2024

I think I may 😉 understand this better now. Where the last code had:

            a2b2 = mp.ldexp(mp.fmul(a+b, a-b, exact=True), -1)
            result = mpc(a - a2b2, b - a*b)

I'm hopeful that this will do almost as well (and saves two mpf additions):

            b2 = mp.ldexp(mp.fmul(b, b, exact=True), -1)
            result = mpc(a + b2, b - a*b)

Unless I'm wrong, the high half of the full b*b product can cancel out all of a, leaving the low half of the full product as "the answer" - the last bit of which may be affected by the full a*a product via rounding. But I don't believe the last can actually happen unless the input actually has mp.prec significant bits. In the context of this testing, it cannot: the input components have at most 53 significant bits, and mp.prec is 63. If the inputs actually have 63 (or more) bits ... I'm not entirely sure, but offhand doubt it matters much.

So a*a is irrelevantly small in cancelling cases. In non-cancelling cases, whichever of |a| and |b2| is larger dominates the a + b2 sum. But won't it hurt to ignore a*a if a*a > b*b? Perhaps, but not by much. If the leading bit of a has weight 2^L, the leading bit of a*a has weight 2^(2*L) or 2^(2*L+1). The least important bit of a has weight 2^(L - prec + 1). So for the leading bit of a*a to directly matter, it must be that 2*L+1 >= L - prec + 1, or L >= -prec. But mag(c) < -prec in this branch, so L <= mag(a) <= mag(c) < -prec too. Therefore it's not possible that L >= -prec: the leading bit of a*a is somewhere "to the right" of a's least important bit, so at worst it can affect the final result indirectly via rounding.

This is fuzzy, though, because mag() may be too large (the docs say by at most 2). But if mag(c) < -prec is true, and mag(c) is too large, the true magnitude is "even more < -prec". So it doesn't change that L >= -prec can't hold.
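
For a concrete instance (numbers picked just to illustrate the inequality): with prec = 63 and L = -70, a's least significant bit has weight 2^(-70 - 63 + 1) = 2^-132, while the leading bit of a*a has weight at most 2^(2*(-70) + 1) = 2^-139, already 7 positions below a's last bit.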

Preliminary testing is going fine, but it's only tried 10 million cases. Since this was derived by analyzing the snot 😉 out of cancelling cases, I widened the generator to probe many more non-cancelling fast-path cases too. The reasoning for always ignoring a*a is the more subtle part, so it sorely needs testing.


tim-one commented Apr 25, 2024

BTW, a similar but simpler argument shows why widening the fast path is fine for the mpf case. How can we ensure that the + x^3/3 term can't affect the last bit directly (so that there's scant point to using a 3rd series term)? The last important bit of x again has weight 2^(L - prec + 1). The leading bit of x^3 has weight with exponent 3L, 3L+1, or 3L+2. So the largest it can get is 3L+2. But dividing by 3 leaves it with a worst-case maximum of 3L+1 instead.

So we want 3*L + 1 < L - prec + 1, which simplifies to 2*L < -prec. Replace L by mag(x), and we're done 😄.


tim-one commented Apr 25, 2024

And another: in the mpf case, the original mag < -prec condition is so strong that we could almost just as well return x - subtracting 0.5*x*x makes scant difference.

Same kind of argument: the leading bit of x^2 has weight 2L or 2L+1. But dividing by 2 leaves 2L as the worst case. So we want 2L < L - prec + 1, which simplifies to L < -prec + 1. Replace L by mag(x) and we're almost done. Testing for L < -prec is more restrictive than testing for L < -prec + 1, and the latter is all we really need to ensure that the second series term doesn't overlap with x's leading prec significant bits.

And all these arguments are pessimistic. Supposing mag() were exact (returning the mathematical ceiling(log(abs(x), 2))), what I've been calling L equals mag(x) only when x is 2^i for some integer i. Else mag(x) is one larger than L. The actual mp.mag(x) may be as much as 3 larger than L, and never smaller than L.

Alas, this approach appears to be of no help in the complex case. Arguments based on mag carry over to that case, but the L concept is confined to a single float. mag(c) seems pretty much useless for predicting when cancellation may occur in the computation of f(c)'s (for some function f()) .real and/or .imag components.


tim-one commented Apr 25, 2024

mag(c) seems pretty much useless for predicting when cancellation may occur in the computation of f(c)'s (for some function f()) .real and/or .imag components.

Let me make that concrete via an earlier example:

>>> c = mpc(-1.430796815051627e-72+1.691624553529315e-36j)
>>> mp.mag(c)
-117
>>> mp.mag(c*c/2)
-237

# In the real-valued case, such a large difference in mag guarantees that
# adding/subtracting the two will leave `c` unchanged to 53 sig bits.

# But not so in the complex case:

>>> with extraprec(100): complex(c*c/2)
(-1.430796815051627e-72-2.4203710234528746e-108j)

# The .real part of the result is almost exactly c.real.
# Massive cancellation during `-` despite huge mag difference.

Which, yes, contradicts my:

Arguments based on mag carry over to [the complex case]

That was careless. I had in mind arguments about, say, whether a series converges. So some arguments carry over, but not others - couldn't be clearer 😉 .


tim-one commented Apr 26, 2024

Observation: in the latest code, which continues to deliver excellent test results after over 900 million trials:

            b2 = mp.ldexp(mp.fmul(b, b, exact=True), -1)
            result = mpc(a + b2, b - a*b)

consider what happens when b is 0. Then it's just the real-valued function in disguise. b2 is 0 too, and the result is mpc(a, 0). So the current run is also indirectly testing that when mag < -prec, in the real-valued case only the first term of the series actually matters.

BTW, while the mag < -prec condition was used in my argument for why a*a is irrelevantly small in the complex case, I'm still not clear on "the real reason" for why the complex case needs more series terms than the real case.


tim-one commented Apr 26, 2024

Ah! The light dawns!! In the complex version, there are really two series: one for the real part, and another for the imag part. They both have to converge, but may converge at different rates. Looking at the imag series alone, by minor variation of arguments already given, only one term should suffice: with mag < -prec, the -a*b in b-a*b can also be ignored. That series only needs one term. It's the real part that needs two terms, and because of possible heavy cancellation.

So on to testing this code, which loses another multiply and add. We're down to just one multiply and one add now:

            b2 = mp.ldexp(mp.fmul(b, b, exact=True), -1)
            result = mpc(a + b2, b)

The last run (using b-a*b) completed a billion tries with worst case total ULP error 0.50048, reached at 3.834704113872437e-131 - 6.775600261387217e-51j.


tim-one commented Apr 26, 2024

And one more twist: cancellation can't occur in a + b*b/2 unless a is negative, so it's only when a<0 that the expense of an exact product may be needed. So this suffices:

            b2 = mp.ldexp(mp.fmul(b, b, exact=a<0), -1)

The bound can be loosened to mag * 2 < -prec too, but at significant extra expense. While in the real case the 2nd-order term doesn't really matter under that condition, in the complex case the full 2nd-order terms in both the real and imag series matter. That is, under the looser condition, a**2/2 can matter in a-(a+b)*(a-b)/2, and a*b can matter in b-a*b. Those both have very little effect (at worst indirectly via final rounding) under the stronger mag < -prec condition.


tim-one commented Apr 26, 2024

For clarity, I should have spelled this out earlier: what I called L for a given float x is mathematically floor(log(abs(x), 2)). mpmath.mag(x) returns an upper bound on the same expression, but with ceiling replacing floor. That's why L is usually one smaller, but the same (if mpmath.mag(x) were always exact) when x is 2 raised to an integer power.

2^L(x) gives the weight of x's most-significant bit. When doing careful error analysis involving elementary arithmetic, I just find that easiest to reason about. For example, L(x*y) is either L(x) + L(y) or one more than that. This is easy to prove. If x>0, x = f*2^L(x) for some real f satisfying 1 <= f < 2. So x*y is f*2^L(x) * g*2^L(y) for some f and g both in [1, 2), which is f*g * 2^(L(x)+L(y)) with 1 <= f*g < 4. So if f*g < 2, then L(x*y) is L(x)+L(y). Else f*g can be divided by 2 to bring it back into the [1, 2) interval, and the exponent of 2 has to be increased by 1 to preserve equality.
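
That claim is easy to spot-check with plain Python floats (a quick sketch, where L(x) is computed as math.frexp(x)[1] - 1):

from math import frexp, ldexp
from random import randrange, uniform

def L(x):
    # floor(log2(x)) for a positive float x
    return frexp(x)[1] - 1

for _ in range(100_000):
    x = ldexp(uniform(1.0, 2.0), randrange(-50, 50))
    y = ldexp(uniform(1.0, 2.0), randrange(-50, 50))
    assert L(x*y) in (L(x) + L(y), L(x) + L(y) + 1)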


tim-one commented Apr 27, 2024

I think I'm about done here.

One more loose end: justifying the log(1.0 + c) path for larger abs(c). The numeric challenge for log1p(c) is when abs(c+1) is close to 1, because the real part is log(abs(c+1)), and log(1+e) basically is e for tiny |e|.

By the reverse triangle inequality, abs(c+1) >= abs(c) - 1. So if abs(c) >= 3, abs(c+1) >= 2, nowhere near being a problem.

If abs(c) >= 2, abs(c+1) >= 1 which suggests there might be a problem. But there actually isn't. For abs(c+1) to be less than abs(c), adding 1 has to decrease the absolute value of c.real. For worst case, suppose abs(c) = 2. Then abs(c.real) <= 2 too, and for adding 1 to decrease its absolute value it must be that -2 <= c.real <= -1. If c.real is -2, then adding 1 gives exactly 1. Else c.real is a binary float of the form -1.bbbbbbbb..., and adding 1 in fp arithmetic gives exactly -0.bbbbbbbb.... That is, it's a cancelling add, and those are exact on their own. So c+1 is computed exactly in either case. Since no information was lost, log(c+1) is fine.

Otherwise abs(c+1) >= 2 too, which is nowhere near being a numeric problem. c+1 may not be exact, but if not it's off by 1/2 ulp rounding error in c.real, which has scant effect on abs(c+1). For example, if c.real is 1e-30, adding 1 loses all of it, but since abs(c+1) >= 2 the imaginary part must be >= 2 in this case. Losing 1e-30 from the real part is numerically irrelevant to the sum of the squares of the components then.
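
The "cancelling adds are exact" point is easy to spot-check with plain Python floats, using fractions as an exact reference (a quick sketch):

from fractions import Fraction
from random import uniform

for _ in range(100_000):
    a = -uniform(1.0, 2.0)                         # a float in [-2.0, -1.0]
    assert Fraction(1.0 + a) == 1 + Fraction(a)    # the add introduced no rounding error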

To ensure abs(c) >= 2, if mp.mag(c) were always exact it would suffice to check that mp.mag(c) >= 2 (mag = 2 would guarantee abs(c) > 2^1 = 2, while mag = 1 would only guarantee abs(c) > 1; so mag >= 2 is the smallest value "that works" for this purpose). Because I had observed that mag was sometimes 1 too large, in the original code I changed that 2 to 3. But since then I found the docs that said mag may be 2 too large, so it should really be 4 instead. Which the "final" version of the code below uses.

Some billions of randomized test cases have been run by now, with total ULP error no worse than 1.001, and in "fast path" cases no more than 0.501. These include a billion randomized fast-path cases contrived to suffer significant cancellation in a+b*b/2.

It's been fun 😉.

def test(c):
    c = mpc(c)
    cmag = mp.mag(c)
    with extraprec(10):
        if cmag >= -mp.prec:
            arg = mp.fadd(1.0, c, exact=cmag < 4)
            result = mp.log(arg)
        else:
            a, b = c.real, c.imag
            b2 = mp.ldexp(mp.fmul(b, b, exact=a < 0.0), -1)
            result = mpc(a + b2, b)
    return result


tim-one commented Apr 28, 2024

EDIT: OUCH! This code is no good. The regular tests still haven't found anything interesting, but my "bad case" generator broke it instantly. Quite some time ago I noted that cancellation was so bad in one case that it almost needed a third series term. Ha! This one needs four(!) terms.

>>> c = mpc(-3.1061140011623543e-21+7.881768838480807e-11j)
>>> mp.mag(c)
-32
>>> with extraprec(10000): complex(mp.log1p(c)) # presumed correct
(4.3173401185459216e-38+7.881768838480807e-11j)
>>> with extraprec(1000): complex(c - c**2/2) # only 3+ digits right
(4.3163753241271e-38+7.881768838480807e-11j)
>>> with extraprec(1000): complex(c - c**2/2 + c**3/3) # not much better!
(4.3183049129647434e-38+7.881768838480807e-11j)
>>> with extraprec(1000): complex(c - c**2/2 + c**3/3 - c**4/4) # bingo
(4.3173401185459216e-38+7.881768838480807e-11j)

So I can't loosen the complex fast-path test to 2 * mag < -prec after all, without these rare bad cases popping up, or greatly complicating the code. No, thank you 😄

BTW, this isn't a problem for the code posted before this message, because the mag here is far too large to take that code's fast path.

Original msg, which should be ignored:

For completeness, another version that widens the fast-path range to 2 * mag < -prec. As noted before, this requires using more of the 2-term series expansion. It requires another regular (not exact) mpf multiply, and 3 more regular mpf adds. Which triples(!) the total mpf operation count.

If you like this better, tell me and I'll run tests on it for a lot longer. No surprises after a few dozen million trials (worst error below 0.501 ULP in each component):

def test(c):
    c = mpc(c)
    cmag = mp.mag(c)
    with extraprec(10):
        if 2 * cmag >= -mp.prec:
            arg = mp.fadd(1.0, c, exact=cmag < 4)
            result = mp.log(arg)
        else:
            a, b = c.real, c.imag
            if cmag < -mp.prec:
                b2 = mp.fmul(b, b, exact=a < 0.0)
            else:
                b2 = mp.fmul(b+a, b-a, exact=a < 0.0)
                b -= a*b
            result = mpc(a + mp.ldexp(b2, -1), b)
    return result

skirpichev (Collaborator) commented:

@tim-one, would you like to create a PR? If you want to put in randomized tests, maybe we should depend on the hypothesis package.


tim-one commented Apr 28, 2024

Sorry, no, I don't want to try to create a PR. I already spent far more time on this than I planned for, and all my work has been at the Python level, outside mpmath's implementation. The internals of mpmath seem partly a different world, with lots of internal functions, data representations, and conventions I know nothing about.

So I expect it would go far easier, better, and faster if, e.g., you 😄 translated the few lines of code to internal "mpmath-speak".

There are also decisions for mpmath to make, which I don't have strong opinions about. Like, do you want different real and complex implementations of log1p()? Do you want to widen the fast path for the real-valued version? Do you just want to change a common fast path to the Horner spelling (return c*(1.0 - c*0.5)) and leave it as an inside joke 😉 that it avoids cancellation catastrophes largely by implementation accident? That would be a minimal code change, but substantially slower than the 1 mpf add + 1 mpf multiply that's possible.

Those are all project-level decisions, though, and I'm not a project dev here. I'm just a happy mpmath user who wants to see better results for this function.


tim-one commented Apr 28, 2024

I truly can't make more time for this now, but there's something else I think needs to be addressed to get production-quality code. This line:

            arg = mp.fadd(1.0, c, exact=cmag < 4)

can make the code arbitrarily expensive the smaller |c.real| gets, regardless of the current precision (the expense isn't just from the add, but also from the potentially giant mantissa passed to log() to chew on). The current implementation does the add with double the current precision, which is sufficient for the real function, but not the complex one (a cause of the original bug report here).

But the smaller |c.real| is, the less difference it can make to the infinitely precise result (mag >= -prec in this branch, so |c.imag| can't also become arbitrarily tiny, so will increasingly dominate the result as c.real's magnitude shrinks).

Empirically, "just" tripling the precision appears so far to work:

            arg = mp.fadd(1.0, c,
                          prec=(mp.prec if cmag >= 4 else 3 * mp.prec))

but 29 * prec // 10 (i.e., an int approximation to multiplying by 2.9) does not. So there appears to be something critical about 3.

But I don't have a handle on how to prove that, nor the time to dig into it now. It's an interesting puzzle, though 😄.


tim-one commented Apr 28, 2024

No, this doesn't count as "digging" 😉. Trying out intuitions in Python goes very fast. This appears to work, and I think the comments are qualitatively right, but there's no real justification for why "3" is sufficient when mag(a) <= mag(b). Nailing that would be "digging" (require real thought & effort, and so also significant time).

def test(c):
    c = mpc(c)
    cmag = mp.mag(c)
    a, b = c.real, c.imag
    with extraprec(10):
        wp = mp.prec
        if cmag >= -wp:
            if cmag < 4:
                if mp.mag(a) > mp.mag(b):
                    # `a` already contributes the most to c's norm.
                    # After adding 1, it will utterly dominate it.
                    # We only need enough extra precision to avoid
                    # losing any of a's bits when adding 1.
                    wp *= 2
                else:
                    # `b` contributes the most to c's norm now. After
                    # adding 1 to `a`, in (a+1)**2 + b**2 =
                    # 1 + 2*a + a**2 + b**2, `b` may be so much larger
                    # than `a` now that b**2 is close to }2*a|, or
                    # even cancel 2*a out (in which case the high-
                    # order bits of the teensy a**2 become important).
                    wp *= 3
            arg = mp.fadd(1.0, c, prec=wp)
            result = mp.log(arg)
        else:
            b2 = mp.ldexp(mp.fmul(b, b, exact=a < 0.0), -1)
            result = mpc(a + b2, b)
    return result


tim-one commented Apr 29, 2024

WRT this comment:

                    # `b` contributes the most to c's norm now. After
                    # adding 1 to `a`, in (a+1)**2 + b**2 =
                    # 1 + 2*a + a**2 + b**2, `b` may be so much larger
                    # than `a` now that b**2 is close to |2*a|, or

that's exactly what happens in the original case that was the subject of this report, c = 1.8370676479640493e-39 - 4.6885882517313053e-20j. b**2 is in the same ballpark as 2a, but adding 1 to a with prec=2*63=126 loses all the bits of a (with mag -128) to rounding. The computed result is exactly 1.

It doesn't matter to this that the log() path doesn't compute the norm. What matters is the demonstration that, despite that a is very small compared to b, it nevertheless has a significant effect on the result. Any computational method that loses a can't get the right answer.


tim-one commented Apr 29, 2024

So I think this is "the answer" (it just came to me while walking): in the hard cases, the magnitude of b squared is about |2a|. Very roughly, mag(b)*2 == mag(a). Since we don't want to lose a to rounding when adding 1, we need to use enough bits to absorb (viewing a as fixed-point) its leading -mag(a) = -mag(b)*2 zero bits and then its prec most significant bits. Since b has larger magnitude than a in this case, mag(c) is close to mag(b). Finally, mag(c) >= -prec in this case too, so -mag(c) <= prec, so -mag(b) <= prec, so -mag(b)*2 <= 2*prec, so -mag(b)*2 + prec <= 3*prec, so -mag(a) + prec <= 3*prec. So using 3*prec in the add is enough to preserve the leading prec significant bits of a.

To get the basic idea across more intuitively, the worst cases are when c is as small as it can get in this branch: mag(c) = -prec. Then since b is the dominant component in this branch, as a fixed-point number b has about prec leading 0 bits (after the radix point). Then since a is about the square of b in the hard cases, a has about 2*prec leading 0 bits. So to preserve a's prec most significant bits after adding 1, we need to use about 3*prec bits - 2*prec of which are to preserve the leading 0 bits.
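
Plugging in the original report's numbers (wp = 63 inside the wrapper): mag(b) is about -64 and mag(a) is -128, so preserving a's leading 53-or-so bits after adding 1 needs roughly 128 + 53 = 181 bits. 3*wp = 189 comfortably covers that, while 2*wp = 126 loses a entirely - exactly the failure shown at the top of this issue.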


tim-one commented Apr 29, 2024

Dang. There's an error in the other path of the code (using the series expansion), at

mpc(-1.999999873062092e-40+1.999999936531045e-20j)

Random testing would probably never find one like this. It was contrived so that a == -b**2/2 exactly in binary fp arithmetic. So the partial series expansion returns exactly 0 for the real part. It should be 1.9999997461241924e-80 (nowhere near a ULP in a, but essentially -a**2/2).

Computing the full 2nd-order real series, a + (b*b - a*a)/2, rounds to what it should be, but not before boosting extraprec to 133(!). Computing the products exactly is essential, but not enough.

The "doesn't try to take any shortcuts" Horner trick, evaluating c*(1.0 - c*0.5), also returns a result with 0 real part.

Unclear what to do. Using exact adds too is extremely unattractive 😢 .


tim-one commented Apr 29, 2024

OK! Not so bad 😉. The code gets more long-winded, though:

EXTRAPREC = 10
EXTRAPREC_MAG = mp.mag(EXTRAPREC)
def test(c):
    c = mpc(c)
    cmag = mp.mag(c)
    a, b = c.real, c.imag
    with extraprec(EXTRAPREC):
        wp = mp.prec
        if cmag >= -wp:
            if cmag < 4:
                if mp.mag(a) > mp.mag(b):
                    # `a` already contributes the most to c's norm.
                    # After adding 1, it will utterly dominate it.
                    # We only need enough extra precision to avoid
                    # losing any of a's significant bits when adding 1.
                    wp *= 2
                else:
                    # `b` contributes the most to c's norm now. After
                    # adding 1 to `a`, in (a+1)**2 + b**2 =
                    # 1 + 2*a + a**2 + b**2, `b` may be so much larger
                    # than `a` now that b**2 is close to |2*a|, or
                    # even cancel 2*a out (in which case the high-
                    # order bits of the teensy a**2 become important).
                    wp *= 3
            arg = mp.fadd(1.0, c, prec=wp)
            result = mp.log(arg)
        else:
            might_cancel = a < 0.0
            b2 = mp.ldexp(mp.fmul(b, b, exact=might_cancel), -1)
            apb2 = a + b2
            if might_cancel:
                if mp.mag(apb2) <= mp.mag(a) - EXTRAPREC_MAG:
                    # The guard bits were lost to cancellation. Rare.
                    # At the contrived
                    # -1.999999873062092e-40+1.999999936531045e-20j
                    # _all_ bits cancel out. Complete the full
                    # 2nd-order real series, exactly.
                    a2 = mp.ldexp(mp.fmul(a, a, exact=True), -1)
                    apb2 = a + mp.fsub(b2, a2, exact=True)
                    #print("X", mp.mag(apb2), mp.mag(a))
            result = mpc(apb2, b)
    return result


tim-one commented Apr 29, 2024

Last for now: rearranging the code for clarity and adding some comments. The comment justifying the use of 3*prec is poor, but I can't do more for now. Ah, I also skipped the exact multiply in the possibly-cancelling part. It isn't needed unless there is heavy cancellation. So now the usual path through that code is one plain mpf multiply and one plain mpf add.

LOG1P_EXTRAPREC = 10
LOG1P_EXTRAPREC_MAG = mp.mag(LOG1P_EXTRAPREC)
def test(c):
    c = mpc(c)
    cmag = mp.mag(c)
    a, b = c.real, c.imag
    with extraprec(LOG1P_EXTRAPREC):
        wp = mp.prec
        if cmag >= -wp:
            if cmag < 4:
                if mp.mag(a) > mp.mag(b):
                    # `a` already contributes the most to c's norm.
                    # After adding 1, it will utterly dominate it.
                    # We only need enough extra precision to avoid
                    # losing any of a's significant bits when adding 1.
                    wp *= 2
                else:
                    # `b` contributes the most to c's norm now. After
                    # adding 1 to `a`, in (a+1)**2 + b**2 =
                    # 1 + 2*a + a**2 + b**2, `b` may be so much larger
                    # than `a` now that b**2 is close to |2*a|, or
                    # even cancel 2*a out (in which case the high-
                    # order bits of the teensy a**2 become important).
                    wp *= 3
            arg = mp.fadd(1.0, c, prec=wp)
            result = mp.log(arg)
        # Else c is "very small", and we use a series expansion.
        # c - c**2/2
        # The real part of that is a+(b*b-a*a)/2, and the
        # imag part b-a*b.
        # Given that cmag < -prec, it can be shown that the "a*b"
        # is numerically insignificant in the imag part, and
        # _usually_ the "a*a/2" in the real part. What remains
        # is cheap to compute.
        # In the real part, though, if `a` is negative, the
        # remaining a+b**2/2 can suffer massive cancellation -
        # even total.
        elif a < 0.0: # cancellation is a potential problem
            #b2 = mp.ldexp(mp.fmul(b, b, exact=True), -1)
            apb2 = a + mp.ldexp(b*b, -1)
            if mp.mag(apb2) <= mp.mag(a) - LOG1P_EXTRAPREC_MAG:
                # The guard bits were lost to cancellation. Rare.
                # At the contrived
                # -1.999999873062092e-40+1.999999936531045e-20j
                # _all_ bits cancel out. Complete the full
                # 2nd-order real series, exactly.
                a2 = mp.ldexp(mp.fmul(a, a, exact=True), -1)
                b2 = mp.ldexp(mp.fmul(b, b, exact=True), -1)
                apb2 = a + mp.fsub(b2, a2, exact=True)
            result = mpc(apb2, b)
        else: # cancellation not possible
            result = mpc(a + mp.ldexp(b*b, -1), b)
    return result


tim-one commented Apr 29, 2024

Ya, I dug into it 😢. There isn't a real need for exact operations of any kind in the worst series-expansion cases. a*a/2 can be done in working precision, b*b/2 in twice that (although if b has no more than prec significant bits, that's the same as exact). And the add only needs 3 times prec.

I'm uncomfortable with such large comment blocks, but don't want to write a book either 😉. I'm also uncomfortable with the relative lack of justifying comments in mpmath's internals. So how to deal with that is another mpmath-internal project decision.

LOG1P_EXTRAPREC = 10
LOG1P_EXTRAPREC_MAG = mp.mag(LOG1P_EXTRAPREC)
def test(c):
    c = mpc(c)
    cmag = mp.mag(c)
    a, b = c.real, c.imag
    with extraprec(LOG1P_EXTRAPREC):
        wp = mp.prec
        if cmag >= -wp:
            if cmag < 4:
                if mp.mag(a) > mp.mag(b):
                    # `a` already contributes the most to c's norm.
                    # After adding 1, it will utterly dominate it.
                    # We only need enough extra precision to avoid
                    # losing any of a's significant bits when adding 1.
                    wp *= 2
                else:
                    # `b` contributes the most to c's norm now. After
                    # adding 1 to `a`, in (a+1)**2 + b**2 =
                    # 1 + 2*a + a**2 + b**2, `b` may be so much larger
                    # than `a` now that b**2 is close to |2*a|, or
                    # even cancel 2*a out (in which case the high-
                    # order bits of the teensy a**2 become important).
                    wp *= 3
            arg = mp.fadd(1.0, c, prec=wp)
            result = mp.log(arg)
        # Else c is "very small", and we use a series expansion.
        # c - c**2/2
        # The real part of that is a+(b*b-a*a)/2, and the
        # imag part b-a*b.
        # Given that cmag < -prec, it can be shown that the "a*b"
        # is numerically insignificant in the imag part, and
        # _usually_ the "a*a/2" in the real part. What remains
        # is cheap to compute.
        # In the real part, though, if `a` is negative, the
        # remaining a+b**2/2 can suffer massive cancellation -
        # even total.
        elif a < 0.0: # cancellation is a potential problem
            apb2 = a + mp.ldexp(b*b, -1)
            if mp.mag(apb2) <= mp.mag(a) - LOG1P_EXTRAPREC_MAG:
                # The guard bits were lost to cancellation. Rare.
                # At the contrived
                # -1.999999873062092e-40+1.999999936531045e-20j
                # _all_ bits cancel out.
                # Since a ~= -b*b/2 in this case, and |b| is at
                # largest (worst case) about 2**-prec, |a| is
                # about 2**(-2*prec), and the true result may
                # be as small as a**2/2, which is about
                # 2**(-4*prec), of which we want the leading
                # prec bits. To get the leading prec bits
                # starting at 2**(-4*prec) from addends
                # starting at 2**(-2*prec), we need the
                # addition to handle 3*prec bits (the first
                # 2*prec of which may cancel to exactly 0).
                a2 = a*a*0.5 # only need at worst prec bits
                b2 = mp.ldexp(mp.fmul(b, b, prec=2*wp), -1)
                apb2 = a + mp.fsub(b2, a2, prec=3*wp)
            result = mpc(apb2, b)
        else: # cancellation not possible
            result = mpc(a + mp.ldexp(b*b, -1), b)
    return result


tim-one commented Apr 29, 2024

Sorry about this ☹️. The test for whether the guard bits were lost to cancellation let through far more cases than I intended. It's a reduction in mag of at least LOG1P_EXTRAPREC that's relevant, not the coded (and smaller) mp.mag(LOG1P_EXTRAPREC). After repairing that, about 1 in 2 million randomized inputs trigger it. I should probably write a different "bad case" generator that produces a much higher proportion of these cancelling cases ... OK, done, and so far no surprises.

LOG1P_EXTRAPREC = 10
def test(c):
    c = mpc(c)
    cmag = mp.mag(c)
    a, b = c.real, c.imag
    with extraprec(LOG1P_EXTRAPREC):
        wp = mp.prec
        if cmag >= -wp:
            if cmag < 4:
                if mp.mag(a) > mp.mag(b):
                    # `a` already contributes the most to c's norm.
                    # After adding 1, it will utterly dominate it.
                    # We only need enough extra precision to avoid
                    # losing any of a's significant bits when adding 1.
                    wp *= 2
                else:
                    # `b` contributes the most to c's norm now. After
                    # adding 1 to `a`, in (a+1)**2 + b**2 =
                    # 1 + 2*a + a**2 + b**2, `b` may be so much larger
                    # than `a` now that b**2 is close to |2*a|, or
                    # even cancel 2*a out (in which case the high-
                    # order bits of the teensy a**2 become important).
                    wp *= 3
            arg = mp.fadd(1.0, c, prec=wp)
            result = mp.log(arg)
        # Else c is "very small", and we use a series expansion.
        # c - c**2/2
        # The real part of that is a+(b*b-a*a)/2, and the
        # imag part b-a*b.
        # Given that cmag < -prec, it can be shown that the "a*b"
        # is numerically insignificant in the imag part, and
        # _usually_ the "a*a/2" in the real part. What remains
        # is cheap to compute.
        # In the real part, though, if `a` is negative, the
        # remaining a+b**2/2 can suffer massive cancellation -
        # even total.
        elif a < 0.0: # cancellation is a potential problem
            apb2 = a + b*b*0.5
            if mp.mag(apb2) <= mp.mag(a) - LOG1P_EXTRAPREC:
                # The guard bits were lost to cancellation. Rare.
                # At the contrived
                # -1.999999873062092e-40+1.999999936531045e-20j
                # _all_ bits cancel out.
                # Since a ~= -b*b/2 in this case, and |b| is at
                # largest (worst case) about 2**-prec, |a| is
                # about 2**(-2*prec), and the true result may
                # be as small as a**2/2, which is about
                # 2**(-4*prec), of which we want the leading
                # prec bits. To get the leading prec bits
                # starting at 2**(-4*prec) from addends
                # starting at 2**-(2*prec), we need the
                # addition to handle 3*prec bits (the first
                # 2*prec of which may cancel to exactly 0).
                a2 = a*a*0.5 # only need at worst prec bits
                b2 = mp.ldexp(mp.fmul(b, b, prec=2*wp), -1)
                apb2 = a + mp.fsub(b2, a2, prec=3*wp)
            result = mpc(apb2, b)
        else: # cancellation not possible
            result = mpc(a + b*b*0.5, b)
    return result
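
For the record, the "bad case" generator mentioned above isn't shown in the thread. A minimal sketch of the idea (my own illustration; the function name and the exact constants are made up, only LOG1P_EXTRAPREC matches the code above) is to pick a tiny random b and then set a close to -b*b*0.5, so that the real-part sum a + b*b*0.5 in the series branch loses most or all of its leading bits:

import random
from mpmath import mp, mpf, mpc

LOG1P_EXTRAPREC = 10  # same value as in the test() code above

def cancelling_case():
    # Choose |b| small enough that the series branch is taken even
    # after the extra LOG1P_EXTRAPREC bits, then make a ~= -b*b/2 so
    # the real-part sum a + b*b/2 cancels heavily.
    e = -(mp.prec + LOG1P_EXTRAPREC + random.randint(3, 12))
    b = mp.ldexp(mpf(random.random()) + 0.5, e)
    a = -b*b*0.5
    # Nudge a's low-order bits so the cancellation is usually
    # near-total rather than always exact.
    a *= mpf(1) + random.choice([-1, 1]) * mp.ldexp(1, -random.randint(mp.prec - 8, mp.prec))
    return mpc(a, b)

Feeding many of these through test() and comparing against a carefully computed log(1+c) reference is one way to exercise the rare cancellation path far more often than uniform random inputs do.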

@tim-one
Author

tim-one commented Apr 30, 2024

And another "final" version, fleshing out comments and rearranging the code more for clarity.

I apologize for all the noise. I had no idea at the start how many subtle cancellation surprises were hiding in the details. If I had even an inkling of that, I would have set up a repository instead. I figured it would take a few hours at most - hahahaha 😝

If I ever need to work on a complex function again, at least there are days of "learning curve" I won't have to climb again (e.g., that rapid convergence of the norm says very little about the convergence rates of the derived real and imag series; and that component cancellation can be massive in a complex series even though the same series never gets anywhere near cancellation when applied to reals).

LOG1P_EXTRAPREC = 10
def test(c):
    # Note that all cases could be handled by log(1+c) provided the
    # add is done exactly. Our aim here is to be much faster than that,
    # especially when |c| is small.
    c = mpc(c)
    cmag = mp.mag(c)
    a, b = c.real, c.imag
    with extraprec(LOG1P_EXTRAPREC):
        wp = mp.prec
        if cmag >= -wp:
            # |c| isn't very small. We call log(1+c) instead, but
            # are careful about the precision used by the add. The
            # real part of the result is log(|c+1|). That's
            # determined by 1 + 2*a + a**2 + b**2, and the add has
            # to preserve enough info so that no important bits of
            # that sum are lost. It doesn't matter to this that 2*a,
            # a**2, etc, are not computed explicitly here: we're
            # deducing how many bits have to be present in the sum
            # for log() to "reverse engineer" the value of 2*a +
            # a**2 + b**2 to `prec` good bits.
            if cmag < 4:
                # |c| isn't very small, or large.
                if mp.mag(a) > mp.mag(b):
                    # `a` already contributes the most to c's norm.
                    # After adding 1, it will utterly dominate it.
                    # We only need enough extra precision to avoid
                    # losing any of a's `prec` most significant bits
                    # when adding; `b**2` is too small to matter.
                    wp *= 2
                else:
                    # b**2 is the larger of the square terms. The
                    # smallest b can be is about 2**-prec, so the
                    # smallest b**2 can be is about 2**(-2*prec). So
                    # for a bit to matter compared to b**2, it has
                    # to be at least about 2**(-3*prec). Bits of 2*a
                    # (if any) >= 2**(-3*prec) will be preserved if
                    # we use 3*prec bits for the add.
                    wp *= 3
            # Else (cmag >= 4), |c+1| >= |c| - 1 is so large that
            # working precision is fine (although that takes some
            # careful analysis for cmag=4, given that .mag() _may_
            # return a result too large by 2). So leave wp alone.
            arg = mp.fadd(1.0, c, prec=wp)
            result = mp.log(arg)
        else:
            # Else c is "very small", and we use a series expansion,
            # c - c**2/2. The real part of that is a+(b*b-a*a)/2,
            # and the imag part b-a*b. Given that cmag < -prec, it
            # can be shown that "a*b" is numerically insignificant in
            # the imag part, and _usually_ the "a*a/2" in the real
            # part. What remains is cheap to compute. In the real
            # part, though, if `a` is negative, the remaining
            # a+b**2/2 can suffer massive cancellation - even total.
            real = a + b*b*0.5 # usually the real part of the result
            if (a < 0.0
                and mp.mag(real) <= mp.mag(a) - LOG1P_EXTRAPREC):
                # The guard bits were lost to cancellation. Rare. At
                # the contrived
                # -1.999999873062092e-40+1.999999936531045e-20j
                # _all_ bits cancel out. Since a ~= -b*b/2 in this
                # case, and |b| is at largest (worst case) about
                # 2**-prec, |a| is about 2**(-2*prec), and the true
                # result may be as small as a**2/2, which is about
                # 2**(-4*prec), of which we want the leading prec
                # bits. To get the leading prec bits starting at
                # 2**(-4*prec) from addends starting at
                # 2**-(2*prec), we need the subtraction to handle
                # 3*prec bits (the first 2*prec of which may cancel
                # to exactly 0).
                a2 = a*a # only need at worst prec bits
                b2 = mp.fmul(b, b, prec=2*wp)
                diff = mp.fsub(b2, a2, prec=3*wp)
                real = a + mp.ldexp(diff, -1)
            result = mpc(real, b)
    return result
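
As a quick sanity check on the sketch above (my own usage example, with the test() just shown in scope; not part of any proposed patch), the contrived total-cancellation input from the comments can be compared against a reference computed the slow way: an exact add followed by log at generously raised precision.

from mpmath import mp, mpc, extraprec

c = mpc(-1.999999873062092e-40, 1.999999936531045e-20)
got = test(c)
# Reference: do the add exactly, then take log with far more working
# precision than the tiny real part of the result needs.
with extraprec(10 * mp.prec):
    ref = mp.log(mp.fadd(1, c, exact=True))
print(complex(got))
print(complex(ref))

The two real parts should agree essentially to working precision; the wasteful exact-add reference is only there to check the cheap path.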

@tim-one
Author

tim-one commented Apr 30, 2024

Offline, it was suggested that the newer code to limit the precision used (instead of using exact=True) when adding 1 was gross overkill, because mag(c) >= -prec in this branch already ruled out very small arguments.

But, alas, that reasoning only works for the real-valued function. For complex c, mag(c) only gives an upper bound on the larger of its components' mags. It reveals almost nothing about the mag of the component with the smaller mag.

For example, in c = mpc(1e-300, 1e-10), the tiny real part has almost no effect on the complex mag. mag(c) is -32, but mag(c.real) is -996. Adding 1 to 1e-300 exactly yields a mantissa with about a thousand bits.

Given that cmag >= -prec, the newer code deduces that no more than the leading 3*prec bits of 1 + c.real can make a real difference to the outcome, so that's all the precision the newer code asks of the sum.

While I'm doing my testing on native machine doubles, of course mpmath supports values far, far smaller than 1e-300. A thousand needless bits may be bearable, but a billion perhaps not so much 😉.
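
To make that concrete, here is a small check (my own illustration of the numbers quoted above):

from mpmath import mp, mpc

mp.prec = 53
c = mpc(1e-300, 1e-10)
print(mp.mag(c))       # about -32: dominated by the imaginary part
print(mp.mag(c.real))  # about -996: the real part is near 2**-997
# An exact 1 + c.real would need a mantissa of roughly
# 1 - (mag(c.real) - mp.prec) ~ 1050 bits, while the code above only
# ever asks fadd for the leading 3*prec bits of the sum.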

skirpichev added a commit to skirpichev/mpmath that referenced this issue May 5, 2024
Patch, ported from Tim Peters version at:
mpmath#790 (comment)

And a few tests from the issue thread

Closes mpmath#790
@skirpichev skirpichev linked a pull request May 5, 2024 that will close this issue
@skirpichev
Collaborator

#803
