Variable scoping issue leading to unexpected UndefVarError on CPU #413

aaustin141 opened this issue Aug 2, 2023 · 6 comments

aaustin141 commented Aug 2, 2023

I have encountered what I think is a variable scoping issue that causes one of my KernelAbstractions kernels to fail when executing on the CPU. (GPU execution is fine.) I'm using KernelAbstractions v0.9.6 in Julia 1.9.2. Here's a minimal example that triggers the problem:

using KernelAbstractions

@kernel function mykernel(x)
    i = @index(Global, Linear)
    _, Nblocks = @ndrange()

    @inbounds begin
        id = Nblocks

        @synchronize

        x[i] = 1.0
    end
end

x = ones(256, 1)
backend = get_backend(x)
kernel! = mykernel(backend, (256,))
kernel!(x, ndrange = (256, 1))

When I run this code, it fails with:

ERROR: LoadError: UndefVarError: `Nblocks` not defined
Stacktrace:
 [1] cpu_mykernel
   @ ~/.julia/packages/KernelAbstractions/lhhMo/src/macros.jl:276 [inlined]
 [2] cpu_mykernel(__ctx__::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.NoDynamicCheck, CartesianIndex{2}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(256, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}}, x::Matrix{Float64})
   @ Main ./none:0
 [3] __thread_run(tid::Int64, len::Int64, rem::Int64, obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(256,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_mykernel)}, ndrange::Tuple{Int64, Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(256, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}, args::Tuple{Matrix{Float64}}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck)
   @ KernelAbstractions ~/.julia/packages/KernelAbstractions/lhhMo/src/cpu.jl:115
 [4] __run(obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(256,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_mykernel)}, ndrange::Tuple{Int64, Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(256, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}, args::Tuple{Matrix{Float64}}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck, static_threads::Bool)
   @ KernelAbstractions ~/.julia/packages/KernelAbstractions/lhhMo/src/cpu.jl:82
 [5] (::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(256,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_mykernel)})(args::Matrix{Float64}; ndrange::Tuple{Int64, Int64}, workgroupsize::Nothing)
   @ KernelAbstractions ~/.julia/packages/KernelAbstractions/lhhMo/src/cpu.jl:44
 [6] top-level scope
   @ ~/debug.jl:19
 [7] include(fname::String)
   @ Base.MainInclude ./client.jl:478
 [8] top-level scope
   @ REPL[1]:1

The compiler thinks that the variable Nblocks in the id = Nblocks line is not defined, even though it clearly is defined via the call to @ndrange. When I inspect the generated kernel code with code_lowered(), I see:

julia> code_lowered(kernel!.f)
1-element Vector{Core.CodeInfo}:
 CodeInfo(
[...]
5 ──       i@_14 = KernelAbstractions.__index_Global_Linear(__ctx__, I#301)
│    %25 = (KernelAbstractions.ndrange)(__ctx__)
│    %26 = (size)(%25)
│    %27 = Base.indexed_iterate(%26, 1)
│          Core.getfield(%27, 1)
│          @_11 = Core.getfield(%27, 2)
│    %30 = Base.indexed_iterate(%26, 2, @_11)
└───       Nblocks = Core.getfield(%30, 1)
[...]
12 ─       i@_17 = KernelAbstractions.__index_Global_Linear(__ctx__, I#303)
└───       id = Main.Nblocks
[...]
)

The code in block 5 shows that Nblocks is getting set OK, but the code in block 12 shows that when the id = Nblocks line gets translated, the compiler looks for a definition of Nblocks in the Main module, where it does not exist. (I redacted this listing for readability. I'm happy to provide the full listing if that would be helpful.)

The issue disappears if I remove the call to @synchronize.

Any thoughts here?

EDIT: This is probably related to (maybe even a duplicate of) #274. Also, another way I can get the issue to disappear is to move the call to @ndrange that defines Nblocks inside the @inbounds begin ... end block.
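
For concreteness, that second workaround is just the same kernel with the @ndrange call moved inside the block:

@kernel function mykernel(x)
    i = @index(Global, Linear)

    @inbounds begin
        _, Nblocks = @ndrange()  # moved inside @inbounds; the error goes away
        id = Nblocks

        @synchronize

        x[i] = 1.0
    end
end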

vchuravy commented Aug 2, 2023

Yeah this is expected and the reason why the @uniform macro is needed.

https://juliagpu.github.io/KernelAbstractions.jl/api/#KernelAbstractions.@uniform
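
Applied to the MWE above, the fix looks like this (a sketch; the tuple destructuring is replaced with direct indexing so that @uniform wraps a single assignment):

using KernelAbstractions

@kernel function mykernel(x)
    i = @index(Global, Linear)
    # Workgroup-uniform value; the CPU lowering keeps it defined
    # across @synchronize.
    @uniform Nblocks = @ndrange()[2]

    @inbounds begin
        id = Nblocks

        @synchronize

        x[i] = 1.0
    end
end

x = ones(256, 1)
kernel! = mykernel(get_backend(x), (256,))
kernel!(x, ndrange = (256, 1))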

aaustin141 commented:

OK, yeah, using @uniform fixes it. I guess what confuses me here is that I didn't need to do that for the variables declared with @index. (Indeed, putting indices inside a @uniform block triggers a different error.) But I can work with that. Thanks!

vchuravy commented Aug 2, 2023

Yeah, the CPU lowering is a bit tricky and doesn't produce the best error messages.

ManuelCostanzo commented Feb 29, 2024

Hey, I need help here, please!

No matter whether I add or remove @uniform or @private on the index, I can't run the code.

With

	index = @index(Global)
	@uniform tid = index - 1

I get:

LoadError: UndefVarError: index not defined

With

	index = @uniform @index(Global)
	@uniform tid = index - 1

and likewise with

	@uniform index = @index(Global)
	@uniform tid = index - 1

I get:

ERROR: LoadError: MethodError: no method matching __index_Global_Linear(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.NoDynamicCheck, CartesianIndex{2}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}})

Any ideas? Thanks!

vchuravy commented:

@ManuelCostanzo please make it easier to help you by formatting your post.

I am unsure what you want to achieve. By definition, @uniform and @index are incompatible.
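
To make the incompatibility concrete, a minimal sketch (the kernel name and array here are illustrative only):

using KernelAbstractions

@kernel function illustrate(y)
    # Workgroup-uniform value: identical for every work-item in the
    # group, so declaring it @uniform is valid.
    @uniform n = @ndrange()[1]

    # Per-work-item value: different for every work-item, so it can
    # never be uniform; @index stays outside @uniform.
    i = @index(Global, Linear)

    @inbounds y[i] = n
end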

ManuelCostanzo commented Mar 2, 2024

Hi @vchuravy, I just need to access the thread_id and block_id values after @synchronize. On the CPU, the only way to do that (as far as I know) is to create the variables with @uniform or @private. But I can't get it to work: with @uniform I get the error "id is not defined", and if I don't add @uniform, I get "varr is not defined". Here is an example:

using KernelAbstractions, CUDA

const backend = CPU()
const BLOCK_SIZE = 4

@kernel function kk(A, B, C)
	id = @index(Global) # I tried using @uniform and @private
	@uniform varr = id  # I tried using @private too
	for i in 1:varr
		for j in 1:varr
			@synchronize()
			C[i, j] = 0
			for k in 1:varr
				C[i, j] += A[i, k] * B[k, j]
			end
		end
	end
end

function run_gpu()
	m = 10
	n = 20

	# Initialize the matrices on the backend
	A = KernelAbstractions.zeros(backend, Int, m, n)
	B = KernelAbstractions.zeros(backend, Int, m, n)
	C = KernelAbstractions.zeros(backend, Int, m, n)

	# Compute the block size
	block_size = BLOCK_SIZE
	mn = max(m, n)
	if mn < BLOCK_SIZE
		block_size = mn
	end

	# Compute the number of blocks
	total_blocks = (mn + block_size - 1) ÷ block_size

	# Anti-diagonal loop
	@time @inbounds for diag in 0:(2*total_blocks-1)
		# Number of blocks to launch on the anti-diagonal
		num_blocks_diagonal = min(diag + 1, 2 * total_blocks - diag - 1)
		kernel! = kk(backend)
		kernel!(A, B, C, ndrange = (block_size * block_size, num_blocks_diagonal), workgroupsize = block_size * block_size)
		KernelAbstractions.synchronize(backend)
	end
end

run_gpu()
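
For what it's worth, the KernelAbstractions docs describe @private storage as the way to keep a per-work-item value alive across @synchronize on the CPU: per-item private memory can be used safely across barriers. A minimal sketch of that pattern (the kernel below is illustrative, not code from this thread):

using KernelAbstractions

@kernel function keep_index(y)
    # Per-work-item private storage; unlike a plain local variable,
    # its contents survive @synchronize in the CPU lowering.
    idx = @private Int (1,)
    @inbounds idx[1] = @index(Global, Linear)

    @synchronize

    # The saved per-work-item index is still available after the barrier.
    @inbounds y[idx[1]] = idx[1]
end

y = zeros(Int, 16)
keep_index(CPU(), 4)(y, ndrange = length(y))
KernelAbstractions.synchronize(CPU())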
