On CPU always use NoDynamicCheck(), just finish the last partial workgroup with DynamicCheck() #449

Open
rafaqz opened this issue Jan 7, 2024 · 1 comment

Comments


rafaqz commented Jan 7, 2024

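Given that DynamicCheck() breaks SIMD, always using NoDynamicCheck() on CPU (and handling only the last, partial workgroup with DynamicCheck()) could be an order of magnitude faster for some inexpensive tasks.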

I'll write up a better MWE, but this is the scale of it: a single-threaded Game of Life in DynamicGrids.jl (basically summing a 3x3 window over Bool) is roughly 2x faster than an 8-core KernelAbstractions.jl simulation, and the difference seems to come almost entirely from DynamicCheck():

julia> using DynamicGrids, BenchmarkTools

julia> init = rand(Bool, 1000, 1000);

julia> output = ResultOutput(init; tspan=1:200);

julia> @btime sim!($output, Life(); proc=SingleCPU());
  338.058 ms (6459 allocations: 3.25 MiB)

julia> @btime sim!($output, Life(); proc=CPUGPU());
  652.198 ms (18401 allocations: 4.63 MiB)
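The idea in the issue title (run every full workgroup without a per-index check, the analogue of NoDynamicCheck(), and only the last, partial workgroup with a check, the analogue of DynamicCheck()) can be sketched in plain Julia without KernelAbstractions.jl. This is a hypothetical illustration, not KernelAbstractions.jl's API or implementation: `split_workgroups` and `doubled!` are made-up names, a 1-D range with a fixed workgroup size `wg` is assumed, and doubling each element stands in for a real kernel body.

```julia
# Hypothetical sketch, not KernelAbstractions.jl code.
# Returns the number of complete workgroups and the size of the partial tail.
function split_workgroups(n::Int, wg::Int)
    nfull = n ÷ wg            # workgroups that lie entirely inside 1:n
    tail = n - nfull * wg     # items in the final partial workgroup (0 if none)
    return nfull, tail
end

# Stand-in "kernel": out[i] = 2 * x[i], processed workgroup by workgroup.
function doubled!(out::Vector, x::Vector, wg::Int)
    n = length(x)
    length(out) == n || throw(DimensionMismatch("out and x must have the same length"))
    nfull, tail = split_workgroups(n, wg)
    # Full workgroups: every index is known to be in range, so there is no
    # per-index check (like NoDynamicCheck()) and the loop can use SIMD.
    for g in 0:nfull-1
        base = g * wg
        @inbounds @simd for j in 1:wg
            out[base + j] = 2 * x[base + j]
        end
    end
    # Last partial workgroup: walk a full workgroup's worth of indices but
    # guard each one (like DynamicCheck()), so only this tail pays for it.
    if tail > 0
        base = nfull * wg
        for j in 1:wg
            i = base + j
            i <= n || continue
            out[i] = 2 * x[i]
        end
    end
    return out
end
```

With `n = 10` and `wg = 4`, the two full workgroups cover indices 1 to 8 without checks, and only the tail workgroup (indices 9 to 12, of which 11 and 12 are skipped) pays for the guard.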

rafaqz commented Jan 8, 2024

It seems DynamicCheck() is only half the problem. Removing it helps a lot, but something else is also preventing the compiler from constant-propagating the size information from the type (it's like a sized array) through the KernelAbstractions kernel, which it can do in the single-threaded version.

I will have to fix it to find out exactly what the problem is, so I will probably submit a PR at some point.

Labels: None yet
Projects: None yet
Development: No branches or pull requests
1 participant