
Dynamic parallelism #442

Open

simone-silvestri opened this issue Dec 13, 2023 · 1 comment

@simone-silvestri
I am trying to set up a dynamic kernel in which a KA kernel launches a CUDA kernel. The final objective would be to have dynamic parallelism using only KernelAbstractions. This is an MWE comparing launching the parent kernel with CUDA versus with KA.

The child kernel:

using CUDA, KernelAbstractions

# Child kernel: each device thread writes its thread index into a
function child!(a)
    i = threadIdx().x
    @inbounds a[i] = i
    return nothing
end

CUDA implementation (runs):

# Parent kernel: launches the child from the device (dynamic parallelism)
function parent!(a)
    @cuda dynamic=true threads=10 blocks=1 child!(a)
    return nothing
end

a = CuArray(zeros(10))

# Compile the parent without launching, then launch it with a single thread
kernel! = @cuda launch=false maxthreads=10 always_inline=true parent!(a)

kernel!(a; threads=1, blocks=1)
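
For completeness, a quick sanity check of the result I expect here (this assertion is my addition, assuming the child filled all 10 entries):

# Expected result after the dynamic launch: a == [1.0, 2.0, ..., 10.0]
@assert Array(a) == collect(1.0:10.0)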

KA implementation:

# The same parent, now written as a KA kernel
@kernel function parent!(a)
    @cuda dynamic=true threads=10 blocks=1 child!(a)
end

a = CuArray(zeros(10))

# Instantiate on the CUDA backend with workgroupsize = 1 and ndrange = 1
kernel! = parent!(CUDA.CUDABackend(), 1, 1)

kernel!(a)

which fails with:

JIT session error: Symbols not found: [ cudaGetErrorString ]
JIT session error: Failed to materialize symbols: { (JuliaOJIT, { julia_throw_device_cuerror_3299 }) }
JIT session error: Failed to materialize symbols: { (JuliaOJIT, { julia_#_#14_3295 }) }
JIT session error: Symbols not found: [ cudaGetErrorString ]
JIT session error: Failed to materialize symbols: { (JuliaOJIT, { julia_throw_device_cuerror_3306 }) }
ERROR: a CUDA error was thrown during kernel execution: invalid configuration argument (code 9, cudaErrorInvalidConfiguration)
ERROR: a exception was thrown during kernel execution.
Stacktrace:
 [1] throw_device_cuerror at /home/ssilvest/.julia/packages/CUDA/35NC6/src/device/intrinsics/dynamic_parallelism.jl:20
 [2] #launch#950 at /home/ssilvest/.julia/packages/CUDA/35NC6/src/device/intrinsics/dynamic_parallelism.jl:27
 [3] launch at /home/ssilvest/.julia/packages/CUDA/35NC6/src/device/intrinsics/dynamic_parallelism.jl:65
 [4] #868 at /home/ssilvest/.julia/packages/CUDA/35NC6/lib/cudadrv/execution.jl:136
 [5] macro expansion at /home/ssilvest/.julia/packages/CUDA/35NC6/lib/cudadrv/execution.jl:95
 [6] macro expansion at ./none:0
 [7] convert_arguments at ./none:0
 [8] #cudacall#867 at /home/ssilvest/.julia/packages/CUDA/35NC6/lib/cudadrv/execution.jl:135
 [9] cudacall at /home/ssilvest/.julia/packages/CUDA/35NC6/lib/cudadrv/execution.jl:134
 [10] macro expansion at /home/ssilvest/.julia/packages/CUDA/35NC6/src/compiler/execution.jl:219
 [11] macro expansion at ./none:0
 [12] #call#1045 at ./none:0
 [13] call at ./none:0
 [14] #_#1061 at /home/ssilvest/.julia/packages/CUDA/35NC6/src/compiler/execution.jl:371
 [15] DeviceKernel at /home/ssilvest/.julia/packages/CUDA/35NC6/src/compiler/execution.jl:371
 [16] macro expansion at /home/ssilvest/.julia/packages/CUDA/35NC6/src/compiler/execution.jl:88
 [17] macro expansion at /home/ssilvest/test.jl:46
 [18] gpu_parent! at /home/ssilvest/.julia/packages/KernelAbstractions/WoCk1/src/macros.jl:90
 [19] gpu_parent! at ./none:0

Is this expected?
I guess it might be a problem with KA setting maxthreads=1 in the kernel call.
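
If that guess is right, a rough way to check it (just a sketch on my side, not verified) would be to compile the plain-CUDA parent above with maxthreads=1 and see whether the same cudaErrorInvalidConfiguration shows up:

# Hypothesis check (sketch): compile the plain-CUDA parent with maxthreads=1;
# if the compile-time thread limit is the culprit, this should fail the same way
a = CuArray(zeros(10))

kernel! = @cuda launch=false maxthreads=1 always_inline=true parent!(a)

kernel!(a; threads=1, blocks=1)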

@vchuravy
Member

Slightly confusing, so not expected.

In my experience, dynamic parallelism doesn't have the best performance, and of course we will need to figure out what it means for at least one other backend.
