Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent results w/ and w/o @turbo #507

Open
astrozot opened this issue Aug 29, 2023 · 6 comments
Open

Inconsistent results w/ and w/o @turbo #507

astrozot opened this issue Aug 29, 2023 · 6 comments

Comments

@astrozot
Copy link

Consider the following code

using LoopVectorization, OffsetArrays

function morph_dilate(A::AbstractArray{T,N}, kernel::AbstractArray{S,N}) where {T<:Integer,S<:Integer,N}
    out = similar(A, tuple((first(axes(A, n)) + first(axes(kernel, n)):
                            last(axes(A, n)) + last(axes(kernel, n)) for n  1:N)...)...)
    out .= zero(T)
    Ks = CartesianIndices(kernel)
    @turbo for I in CartesianIndices(A)
        for K in Ks
            out[I+K] |= A[I] & kernel[K]
        end
    end
    out
end

[The function essentially performs a convolution with a given kernel of an input array A, using the integer and (&) and the or (|) instead of the product and sum in the convolution expression. It also extends the original input array A.]

When I use this function on a test input I obtain an unexpected result:

julia> morph_dilate([1 2 3; 4 5 6], OffsetArrays.centered([0 7 0; 7 7 7; 0 7 0]))
4×5 OffsetArray(::Matrix{Int64}, 0:3, 0:4) with eltype Int64 with indices 0:3×0:4:
 0  1  0  3  0
 1  5  3  7  3
 4  5  6  7  6
 0  4  0  6  0

Note that this result is wrong: the correct result, which can be obtained removing the @turbo macro in the definition above, is

julia> morph_dilate([1 2 3; 4 5 6], OffsetArrays.centered([0 7 0; 7 7 7; 0 7 0]))
4×5 OffsetArray(::Matrix{Int64}, 0:3, 0:4) with eltype Int64 with indices 0:3×0:4:
 0  1  2  3  0
 1  7  7  7  3
 4  5  7  7  6
 0  4  5  6  0

Am I using @turbo in an wrong way, or is that a bug?

@chriselrod
Copy link
Member

Am I using @turbo in an wrong way, or is that a bug?

Bug/known limitation. It is fixed in LoopModels, the successor of LoopVectorization.
LoopModels doesn't actually work yet, though, so it'll be a while before you can switch.

This blog post discusses how to fix it automatically (i.e. similar to what LoopModels does), but you can also do this manually, like SimpleChains.jl:
https://github.com/PumasAI/SimpleChains.jl/blob/f028d69679d47f11d35e7f311abdf0d1d3bfab9c/src/conv.jl#L561-L581
The manual fix is awkward, because LoopVectorization also only supports rectangular loop nests (another limitation LoopModels fixes), so the workaround for LoopVectorization has to calculate the extrema, and then mask-off the parts that are out of bounds.

@ctkelley
Copy link

Any chance LoopModels will support Float16?

@chriselrod
Copy link
Member

Any chance LoopModels will support Float16?

It is written as an LLVM pass, so it should more naturally support types supported by Julia and LLVM.

@astrozot
Copy link
Author

@chriselrod thank you very much for your prompt reply. I understand this is a known bug/limitation, but from a user perspective I have not seen it mentioned anywhere in the package documentation. Before your kind message, I did not even know what a tiling was (my ignorance, but I guess that not all users of LoopVectorization are experts in this topic). So I only now I realize the difficulty with the rewrite of this specific kind of loop operation.

In any case, since (correct me if I am wrong) there is no mention of this issue in the documentation, I feel that LoopVectorization should raise an error saying that this particular loop cannot be vectorized (as it does in other situations), rather than running without any apparent problem and producing incorrect results.

@chriselrod
Copy link
Member

Well, actually, it could vectorize it in a less fancy way.
So the check should probably just restrict it to the less fancy case.

but I guess that not all users of LoopVectorization are experts in this topic

Yeah, the goal of the repo (and LoopModels) is that users do not have to know anything.

@astrozot
Copy link
Author

OK, I am trying to fix manually the issue using this code

function morph_dilate2(A::AbstractArray{T,N}, kernel::AbstractArray{S,N}) where {T<:Integer,S<:Integer,N}
    out = similar(A, tuple((first(axes(A, n))+first(axes(kernel, n)):
                            last(axes(A, n))+last(axes(kernel, n)) for n  1:N)...)...)
    Ks = CartesianIndices(kernel)
    Is = CartesianIndices(A)
    Ifirst, Ilast = first(Is), last(Is)
    Js = CartesianIndices(out)
    @turbo for J  Js
        tmp = zero(T)
        for K  Ks
            Aᵢ = (Ifirst <= J - K <= Ilast) ? A[J-K] : zero(T)
            tmp |= Aᵢ & kernel[K]
        end
        out[J] = tmp
    end
    out
end

However, @turbo fails with an error message "Reduction not found." (which surprises me since the reduction is actually there). The issue seems to be with the use of CartesianIndex in the ? ... : operator, but if possible I would like to avoid unrolling the entire loops and rather work with a CartesianIndex.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants