Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange compile behavior for @turbo #498

Open
alex-s-gardner opened this issue Jun 7, 2023 · 2 comments
Open

Strange compile behavior for @turbo #498

alex-s-gardner opened this issue Jun 7, 2023 · 2 comments

Comments

@alex-s-gardner
Copy link
Contributor

I have a very long sequence of scalar mathematical operations [example of code shown at bottom of thread]. The below can be compiled but if I add one additional line:

b = sqrt(taupa^2+1)

I get a: LoadError: "Reduction not found." error

Now if I change this line to the equivalent:

b = (taupa^2+1)^0.5

then I do not get an error?

Any thoughts on what might be going on?

P.S. Thanks for the absolutely amazing package

 @turbo for i = eachindex(y)
            y[i] = y[i] + y0
            xi = y[i] / a1
            eta = x[i] / a1
            xisign = sign(xi)
            etasign = sign(eta)
            xi = xi * xisign
            eta = eta * etasign
            backside = xi > p2

            xi = (p - xi) * backside + (xi * !backside)

            c0 = cos(twoT * xi)
            ch0 = cosh(twoT * eta)
            s0 = sin(twoT * xi)
            sh0 = sinh(twoT * eta)

            ar = twoT * c0 * ch0
            ai = twoT * -s0 * sh0

            # --- j = 6 ------
            y1r = - bet[6] 
            y1i = zeroT

            z1r =  -twoT * 6 * bet[6] 
            z1i = zeroT

            y0r = ar * y1r - ai * y1i - bet[6-1] 
            y0i = ar * y1i + ai * y1r

            z0r = ar * z1r - ai * z1i - twoT * (6 - 1) * bet[6-1]
            z0i = ar * z1i + ai * z1r

            # --- j = 4 ------
            y1r = ar * y0r - ai * y0i - y1r - bet[4] 
            y1i = ar * y0i + ai * y0r - y1i

            z1r = ar * z0r - ai * z0i - z1r - twoT * 4 * bet[4] 
            z1i = ar * z0i + ai * z0r - z1i

            y0r = ar * y1r - ai * y1i - y0r - bet[4-1] 
            y0i = ar * y1i + ai * y1r - y0i

            z0r = ar * z1r - ai * z1i - z0r - twoT * (4 - 1) * bet[4-1]
            z0i = ar * z1i + ai * z1r - z0i

            # --- j = 2 ------
            y1r = ar * y0r - ai * y0i - y1r - bet[2] 
            y1i = ar * y0i + ai * y0r - y1i

            z1r = ar * z0r - ai * z0i - z1r - twoT * 2 * bet[2] 
            z1i = ar * z0i + ai * z0r - z1i

            y0r = ar * y1r - ai * y1i - y0r - bet[2-1] 
            y0i = ar * y1i + ai * y1r - y0i

            z0r = ar * z1r - ai * z1i - z0r - twoT * (2 - 1) * bet[2-1]
            z0i = ar * z1i + ai * z1r - z0i

            # ---
            z1r = oneT - z1r + z0r * ar / twoT - z0i * ai / twoT
            z1i= -z1i+ z0r* ai / twoT + z0i* ar / twoT
            
            xip = xi + y0r * s0 * ch0 - y0i * c0 * sh0
            etap = eta + y0r * c0 * sh0 + y0i * s0 * ch0

            gam[i] = atan(z1i, z1r) / d2r
            k[i] = b1 / sqrt(z1r^2 + z1i^2)

            # ----
            s = sinh(etap)
            c = max(zeroT, cos(xip))
            r = sqrt(s^2 + c^2)
            lon[i] = atan(s, c) / d2r

            sxip = sin(xip)
            sr = sxip / r

            gam[i] += atan(sxip * tanh(etap), c) / d2r

            tau = sr / e2m
           

            #numit = 5, iter = 1
            tau1 = sqrt(tau^2 + oneT)
            b = atanh(e * tau / tau1)
            sig = sinh(e * b)
            taupa = sqrt(sig^2 + oneT) * tau - sig * tau1        
            
            ...
end    
@chriselrod
Copy link
Member

Apparently, the code path for ^ dodges the check for reductions, which is why it doesn't hit the "reduction not found" error.

Note that it will turn x ^ 0.5 gets turned into sqrt(x) -- it is the code that checks if it can optimize x ^ y that doesn't check for reductions.

The reduction check looks for this kind of pattern:

for i in I
  b = something
  for j in J
    b = op(b, foo(...)) 
  end
  # do something with b
end

where it wants to know what op is in case it needs to expand and reduce.
It needs to expand and reduce iff it SIMDs the j loop*, but more often than not it will want to SIMD the i loop instead, which makes the expand and reduce unnecessary.
Yet, it will still throw an error... If someone wants to fix that bug, that'd be a neat project.

*That is, if it SIMDs a loop that one instance of b depends on, but that the other does not.

Of course, in your example where you redefine b, neither instance of b depends on some loop the other doesn't; they both only depend on i.
So it will never expand&reduce.
Thus there is absolutely no reason for it to be able to know how to expand/reduce.

If all you want is a workaround, just use a unique variable name for each assignment.

If you want to try fixing LV, you're welcome to do so and I can answer questions.
The basic idea is to

  1. not requiring being able to expand/reduce.
  2. filter our vectorizable candidates, e.g. in the for i in I; for j in J example above, if it doesn't know how to expand&reduce op, simply remove j from the vectorization candidates. Code generation would also need updates to make sure it handles it correctly when unrolling.

Long term, I am working on replacing LV; the replacement will not have problems like this.
So I'd suggest just working around the problem via using unique names, and wait for the replacement.

@alex-s-gardner
Copy link
Contributor Author

Thanks for the detailed explanation... I'll implement the work around which I can update whenever LV get's replaced. Did you want me to leave the issue as open?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants