Variable scoping issue leading to unexpected UndefVarError on CPU #413

aaustin141 opened this issue Aug 2, 2023 · 6 comments

aaustin141 commented Aug 2, 2023

I have encountered what I think is a variable scoping issue that causes one of my KernelAbstractions kernels to fail when executing on the CPU. (GPU execution is fine.) I'm using KernelAbstractions v0.9.6 in Julia 1.9.2. Here's a minimal example that triggers the problem:

using KernelAbstractions

@kernel function mykernel(x)
    i = @index(Global, Linear)
    _, Nblocks = @ndrange()

    @inbounds begin
        id = Nblocks

        @synchronize

        x[i] = 1.0
    end
end

x = ones(256, 1)
backend = get_backend(x)
kernel! = mykernel(backend, (256,))
kernel!(x, ndrange = (256, 1))

When I run this code, it fails with:

ERROR: LoadError: UndefVarError: `Nblocks` not defined
Stacktrace:
 [1] cpu_mykernel
   @ ~/.julia/packages/KernelAbstractions/lhhMo/src/macros.jl:276 [inlined]
 [2] cpu_mykernel(__ctx__::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.NoDynamicCheck, CartesianIndex{2}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(256, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}}, x::Matrix{Float64})
   @ Main ./none:0
 [3] __thread_run(tid::Int64, len::Int64, rem::Int64, obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(256,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_mykernel)}, ndrange::Tuple{Int64, Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(256, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}, args::Tuple{Matrix{Float64}}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck)
   @ KernelAbstractions ~/.julia/packages/KernelAbstractions/lhhMo/src/cpu.jl:115
 [4] __run(obj::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(256,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_mykernel)}, ndrange::Tuple{Int64, Int64}, iterspace::KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.StaticSize{(256, 1)}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, Nothing}, args::Tuple{Matrix{Float64}}, dynamic::KernelAbstractions.NDIteration.NoDynamicCheck, static_threads::Bool)
   @ KernelAbstractions ~/.julia/packages/KernelAbstractions/lhhMo/src/cpu.jl:82
 [5] (::KernelAbstractions.Kernel{CPU, KernelAbstractions.NDIteration.StaticSize{(256,)}, KernelAbstractions.NDIteration.DynamicSize, typeof(cpu_mykernel)})(args::Matrix{Float64}; ndrange::Tuple{Int64, Int64}, workgroupsize::Nothing)
   @ KernelAbstractions ~/.julia/packages/KernelAbstractions/lhhMo/src/cpu.jl:44
 [6] top-level scope
   @ ~/debug.jl:19
 [7] include(fname::String)
   @ Base.MainInclude ./client.jl:478
 [8] top-level scope
   @ REPL[1]:1

The compiler thinks that the variable Nblocks in the id = Nblocks line is not defined, even though it clearly is defined via the call to @ndrange. When I inspect the generated kernel code with code_lowered(), I see:

julia> code_lowered(kernel!.f)
1-element Vector{Core.CodeInfo}:
 CodeInfo(
[...]
5 ──       i@_14 = KernelAbstractions.__index_Global_Linear(__ctx__, I#301)
│    %25 = (KernelAbstractions.ndrange)(__ctx__)
│    %26 = (size)(%25)
│    %27 = Base.indexed_iterate(%26, 1)
│          Core.getfield(%27, 1)
│          @_11 = Core.getfield(%27, 2)
│    %30 = Base.indexed_iterate(%26, 2, @_11)
└───       Nblocks = Core.getfield(%30, 1)
[...]
12 ─       i@_17 = KernelAbstractions.__index_Global_Linear(__ctx__, I#303)
└───       id = Main.Nblocks
[...]
)

The code in block 5 shows that Nblocks is getting set OK, but the code in block 12 shows that when the id = Nblocks line gets translated, the compiler looks for a definition of Nblocks in the Main module, where it does not exist. (I redacted this listing for readability. I'm happy to provide the full listing if that would be helpful.)

The issue disappears if I remove the call to @synchronize.

Any thoughts here?

EDIT: This is probably related to (maybe even a duplicate of) #274. Also, another way I can get the issue to disappear is to move the call to @ndrange that defines Nblocks inside the @inbounds begin ... end block.
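
For concreteness, that second workaround is just the same kernel with the @ndrange call moved inside the block:

@kernel function mykernel(x)
    i = @index(Global, Linear)

    @inbounds begin
        _, Nblocks = @ndrange()  # moved inside @inbounds; the error goes away
        id = Nblocks

        @synchronize

        x[i] = 1.0
    end
end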

vchuravy commented Aug 2, 2023

Yeah this is expected and the reason why the @uniform macro is needed.

https://juliagpu.github.io/KernelAbstractions.jl/api/#KernelAbstractions.@uniform
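
Applied to the MWE above, the fix looks like this (a sketch; the tuple destructuring is replaced with direct indexing so that @uniform wraps a single assignment):

using KernelAbstractions

@kernel function mykernel(x)
    i = @index(Global, Linear)
    # Workgroup-uniform value; the CPU lowering keeps it defined
    # across @synchronize.
    @uniform Nblocks = @ndrange()[2]

    @inbounds begin
        id = Nblocks

        @synchronize

        x[i] = 1.0
    end
end

x = ones(256, 1)
kernel! = mykernel(get_backend(x), (256,))
kernel!(x, ndrange = (256, 1))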

aaustin141 commented:

OK, yeah, using @uniform fixes it. I guess what confuses me here is that I didn't need to do that for the variables declared with @index. (Indeed, putting indices inside a @uniform block triggers a different error.) But I can work with that. Thanks!

vchuravy commented Aug 2, 2023

Yeah, the CPU lowering is a bit tricky and doesn't produce the best error messages.

ManuelCostanzo commented Feb 29, 2024

Hey, I need help here, please!

No matter whether I add or remove @uniform or @private on the index, I can't run the code.

With

	index = @index(Global)
	@uniform tid = index - 1

I get:

LoadError: UndefVarError: index not defined

With

	index = @uniform @index(Global)
	@uniform tid = index - 1

and likewise with

	@uniform index = @index(Global)
	@uniform tid = index - 1

I get:

ERROR: LoadError: MethodError: no method matching __index_Global_Linear(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.NoDynamicCheck, CartesianIndex{2}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{2, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}, CartesianIndices{2, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}})

Any ideas? Thanks!

vchuravy commented:

@ManuelCostanzo please make it easier to help you by formatting your post.

I am unsure what you want to achieve. By definition, @uniform and @index are incompatible.
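
To make the incompatibility concrete, a minimal sketch (the kernel name and array here are illustrative only):

using KernelAbstractions

@kernel function illustrate(y)
    # Workgroup-uniform value: identical for every work-item in the
    # group, so declaring it @uniform is valid.
    @uniform n = @ndrange()[1]

    # Per-work-item value: different for every work-item, so it can
    # never be uniform; @index stays outside @uniform.
    i = @index(Global, Linear)

    @inbounds y[i] = n
end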

ManuelCostanzo commented Mar 2, 2024

Hi @vchuravy, I just need to access the thread_id and block_id values after @synchronize. On the CPU, the only way to do that (as far as I know) is to create the variables with @uniform or @private. But I can't get it to work: with @uniform I get the error "id is not defined", and if I don't add @uniform, I get "varr is not defined". Here is an example:

using KernelAbstractions, CUDA

const backend = CPU()
const BLOCK_SIZE = 4

@kernel function kk(A, B, C)
	id = @index(Global) # I tried using @uniform and @private
	@uniform varr = id  # I tried using @private too
	for i in 1:varr
		for j in 1:varr
			@synchronize()
			C[i, j] = 0
			for k in 1:varr
				C[i, j] += A[i, k] * B[k, j]
			end
		end
	end
end

function run_gpu()
	m = 10
	n = 20

	# Initialize the matrices on the backend
	A = KernelAbstractions.zeros(backend, Int, m, n)
	B = KernelAbstractions.zeros(backend, Int, m, n)
	C = KernelAbstractions.zeros(backend, Int, m, n)

	# Compute the block size
	block_size = BLOCK_SIZE
	mn = max(m, n)
	if mn < BLOCK_SIZE
		block_size = mn
	end

	# Compute the number of blocks
	total_blocks = (mn + block_size - 1) ÷ block_size

	# Anti-diagonal loop
	@time @inbounds for diag in 0:(2*total_blocks-1)
		# Number of blocks to launch on the anti-diagonal
		num_blocks_diagonal = min(diag + 1, 2 * total_blocks - diag - 1)
		kernel! = kk(backend)
		kernel!(A, B, C, ndrange = (block_size * block_size, num_blocks_diagonal), workgroupsize = block_size * block_size)
		KernelAbstractions.synchronize(backend)
	end
end

run_gpu()
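
For what it's worth, the KernelAbstractions docs describe @private storage as the way to keep a per-work-item value alive across @synchronize on the CPU: per-item private memory can be used safely across barriers. A minimal sketch of that pattern (the kernel below is illustrative, not code from this thread):

using KernelAbstractions

@kernel function keep_index(y)
    # Per-work-item private storage; unlike a plain local variable,
    # its contents survive @synchronize in the CPU lowering.
    idx = @private Int (1,)
    @inbounds idx[1] = @index(Global, Linear)

    @synchronize

    # The saved per-work-item index is still available after the barrier.
    @inbounds y[idx[1]] = idx[1]
end

y = zeros(Int, 16)
keep_index(CPU(), 4)(y, ndrange = length(y))
KernelAbstractions.synchronize(CPU())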
