Strange compile behavior for @turbo #498

alex-s-gardner · 2023-06-07T19:38:40Z

I have a very long sequence of scalar mathematical operations [example of code shown at bottom of thread]. The below can be compiled but if I add one additional line:

b = sqrt(taupa^2+1)

I get a: LoadError: "Reduction not found." error

Now if I change this line to the equivalent:

b = (taupa^2+1)^0.5

then I do not get an error?

Any thoughts on what might be going on?

P.S. Thanks for the absolutely amazing package

 @turbo for i = eachindex(y)
            y[i] = y[i] + y0
            xi = y[i] / a1
            eta = x[i] / a1
            xisign = sign(xi)
            etasign = sign(eta)
            xi = xi * xisign
            eta = eta * etasign
            backside = xi > p2

            xi = (p - xi) * backside + (xi * !backside)

            c0 = cos(twoT * xi)
            ch0 = cosh(twoT * eta)
            s0 = sin(twoT * xi)
            sh0 = sinh(twoT * eta)

            ar = twoT * c0 * ch0
            ai = twoT * -s0 * sh0

            # --- j = 6 ------
            y1r = - bet[6] 
            y1i = zeroT

            z1r =  -twoT * 6 * bet[6] 
            z1i = zeroT

            y0r = ar * y1r - ai * y1i - bet[6-1] 
            y0i = ar * y1i + ai * y1r

            z0r = ar * z1r - ai * z1i - twoT * (6 - 1) * bet[6-1]
            z0i = ar * z1i + ai * z1r

            # --- j = 4 ------
            y1r = ar * y0r - ai * y0i - y1r - bet[4] 
            y1i = ar * y0i + ai * y0r - y1i

            z1r = ar * z0r - ai * z0i - z1r - twoT * 4 * bet[4] 
            z1i = ar * z0i + ai * z0r - z1i

            y0r = ar * y1r - ai * y1i - y0r - bet[4-1] 
            y0i = ar * y1i + ai * y1r - y0i

            z0r = ar * z1r - ai * z1i - z0r - twoT * (4 - 1) * bet[4-1]
            z0i = ar * z1i + ai * z1r - z0i

            # --- j = 2 ------
            y1r = ar * y0r - ai * y0i - y1r - bet[2] 
            y1i = ar * y0i + ai * y0r - y1i

            z1r = ar * z0r - ai * z0i - z1r - twoT * 2 * bet[2] 
            z1i = ar * z0i + ai * z0r - z1i

            y0r = ar * y1r - ai * y1i - y0r - bet[2-1] 
            y0i = ar * y1i + ai * y1r - y0i

            z0r = ar * z1r - ai * z1i - z0r - twoT * (2 - 1) * bet[2-1]
            z0i = ar * z1i + ai * z1r - z0i

            # ---
            z1r = oneT - z1r + z0r * ar / twoT - z0i * ai / twoT
            z1i= -z1i+ z0r* ai / twoT + z0i* ar / twoT
            
            xip = xi + y0r * s0 * ch0 - y0i * c0 * sh0
            etap = eta + y0r * c0 * sh0 + y0i * s0 * ch0

            gam[i] = atan(z1i, z1r) / d2r
            k[i] = b1 / sqrt(z1r^2 + z1i^2)

            # ----
            s = sinh(etap)
            c = max(zeroT, cos(xip))
            r = sqrt(s^2 + c^2)
            lon[i] = atan(s, c) / d2r

            sxip = sin(xip)
            sr = sxip / r

            gam[i] += atan(sxip * tanh(etap), c) / d2r

            tau = sr / e2m
           

            #numit = 5, iter = 1
            tau1 = sqrt(tau^2 + oneT)
            b = atanh(e * tau / tau1)
            sig = sinh(e * b)
            taupa = sqrt(sig^2 + oneT) * tau - sig * tau1        
            
            ...
end

chriselrod · 2023-06-07T22:13:26Z

Apparently, the code path for ^ dodges the check for reductions, which is why it doesn't hit the "reduction not found" error.

Note that it will turn x ^ 0.5 gets turned into sqrt(x) -- it is the code that checks if it can optimize x ^ y that doesn't check for reductions.

The reduction check looks for this kind of pattern:

for i in I
  b = something
  for j in J
    b = op(b, foo(...)) 
  end
  # do something with b
end

where it wants to know what op is in case it needs to expand and reduce.
It needs to expand and reduce iff it SIMDs the j loop*, but more often than not it will want to SIMD the i loop instead, which makes the expand and reduce unnecessary.
Yet, it will still throw an error... If someone wants to fix that bug, that'd be a neat project.

*That is, if it SIMDs a loop that one instance of b depends on, but that the other does not.

Of course, in your example where you redefine b, neither instance of b depends on some loop the other doesn't; they both only depend on i.
So it will never expand&reduce.
Thus there is absolutely no reason for it to be able to know how to expand/reduce.

If all you want is a workaround, just use a unique variable name for each assignment.

If you want to try fixing LV, you're welcome to do so and I can answer questions.
The basic idea is to

not requiring being able to expand/reduce.
filter our vectorizable candidates, e.g. in the for i in I; for j in J example above, if it doesn't know how to expand&reduce op, simply remove j from the vectorization candidates. Code generation would also need updates to make sure it handles it correctly when unrolling.

Long term, I am working on replacing LV; the replacement will not have problems like this.
So I'd suggest just working around the problem via using unique names, and wait for the replacement.

alex-s-gardner · 2023-06-07T22:52:08Z

Thanks for the detailed explanation... I'll implement the work around which I can update whenever LV get's replaced. Did you want me to leave the issue as open?

chengxo mentioned this issue Jan 21, 2024

Reduction not found #528

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strange compile behavior for @turbo #498

Strange compile behavior for @turbo #498

alex-s-gardner commented Jun 7, 2023

chriselrod commented Jun 7, 2023

alex-s-gardner commented Jun 7, 2023

Strange compile behavior for @turbo #498

Strange compile behavior for @turbo #498

Comments

alex-s-gardner commented Jun 7, 2023

chriselrod commented Jun 7, 2023

alex-s-gardner commented Jun 7, 2023