Attempt at including offsets in kernel launch #399

simone-silvestri · 2023-06-12T15:25:47Z

This PR adds an offset argument to kernel launches, so that the global indices returned by
@index(Global, NTuple) and @index(Global, Linear) are shifted by the given offsets.

Example:

julia> @kernel function show_index()
             i, j = @index(Global, NTuple)
             @show i, j
       end
show_index (generic function with 6 methods)

julia> show_index(CPU(), (2, 2), (3, 3), (-1, -2))()
(i, j) = (0, -1)
(i, j) = (1, -1)
(i, j) = (0, 0)
(i, j) = (1, 0)
(i, j) = (2, -1)
(i, j) = (2, 0)
(i, j) = (0, 1)
(i, j) = (1, 1)
(i, j) = (2, 1)

where the last argument, (-1, -2), gives the offsets applied to the global indices.
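The index arithmetic behind that output can be sketched in plain Julia, without KernelAbstractions. The helper name `offset_indices` is hypothetical and not part of this PR; it only illustrates that each 1-based global index over the ndrange (3, 3) is shifted elementwise by the offsets. The order differs from the threaded CPU output above, which is not deterministic.

```julia
# Hypothetical helper (not part of the PR): enumerate the 1-based global
# indices of an ndrange and shift each one elementwise by `offsets`.
offset_indices(ndrange, offsets) =
    vec([Tuple(I) .+ offsets for I in CartesianIndices(ndrange)])

shifted = offset_indices((3, 3), (-1, -2))

# i runs over 0:2 and j over -1:1, matching the set of pairs printed above.
for (i, j) in shifted
    println("(i, j) = ", (i, j))
end
```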

This PR restricts the offsetting of global indices to kernels whose size is specified statically at launch.

@vchuravy I found it a bit difficult to implement arbitrary indices because of the division in blocks, which would have to be rethought. In other words, this is the easiest (though probably not the most general) implementation of offsets. Let me know if you would rather have it implemented another way.

vchuravy · 2023-06-14T01:31:28Z

Thanks for the initial implementation. I will have to think about this a bit.
I still feel like this may be better expressed as a projection f(Idx) -> Idx.

abstract type Projection end
struct Identity <: Projection end
(::Identity)(Idx) = Idx

struct Offset{Offsets} <: Projection end
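One way this projection idea could be fleshed out is sketched below. It repeats the types above and adds a call method for `Offset`. That call method, and the choice to carry the offsets as a tuple in the type parameter, are assumptions made for illustration and not the design of this PR or of KernelAbstractions.

```julia
abstract type Projection end

# Identity leaves the index untouched.
struct Identity <: Projection end
(::Identity)(idx) = idx

# Assumption: the offsets are a tuple of Ints stored in the type parameter,
# so they are known at compile time.
struct Offset{Offsets} <: Projection end
(::Offset{Offsets})(idx::CartesianIndex) where {Offsets} =
    idx + CartesianIndex(Offsets)

shift = Offset{(-1, -2)}()
projected = shift(CartesianIndex(1, 1))  # first global index of the launch
unchanged = Identity()(CartesianIndex(1, 1))
```

With this shape, a kernel would apply the projection to the index it gets from the launch, and `Identity` would reproduce the current behaviour.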

vchuravy · 2023-06-14T01:33:39Z

@timholy might also be able to offer advice. IIUC, you are trying to implement an exterior/interior iteration split like EdgeIterator from https://github.com/JuliaArrays/TiledIteration.jl?

luraess · 2023-06-14T06:49:48Z

@vchuravy I'm following this, as the ability to handle ranges passed to kernels is also a feature we would need (for FD MPI code) to allow overlapping communication and computation (in a similar way to what @simone-silvestri pointed out).
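To make the use case concrete, here is a minimal plain-Julia sketch of how index ranges could be turned into (ndrange, offsets) pairs for an interior/boundary split. The helpers `region` and `region_indices`, the grid size, and the halo width of one cell are all assumptions for illustration; none of this comes from an existing package.

```julia
# Hypothetical helper: describe a rectangular region given as unit ranges
# by the launch size (ndrange) and the offsets that map 1-based launch
# indices onto that region.
region(rs::UnitRange{Int}...) =
    (ndrange = map(length, rs), offsets = map(r -> first(r) - 1, rs))

# Expand a region back into the array indices it covers.
region_indices(r) =
    vec([Tuple(I) .+ r.offsets for I in CartesianIndices(r.ndrange)])

N, M = 6, 5
interior = region(2:N-1, 2:M-1)            # can run while halos are exchanged
edges = [region(1:1, 1:M), region(N:N, 1:M),
         region(2:N-1, 1:1), region(2:N-1, M:M)]  # run after communication

# Count how often each cell is visited; every cell should be hit exactly once.
visited = zeros(Int, N, M)
for r in (interior, edges...), (i, j) in region_indices(r)
    visited[i, j] += 1
end
```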

vchuravy · 2023-06-14T11:08:24Z

You can do this right now as you would do with CUDA.jl/AMDGPU.jl by projecting a smaller ndrange to your custom index space. This is more about whether we can do something like that automatically.

I think @lcw had some code that does this for his DG code.
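A minimal sketch of that manual approach, assuming a recent KernelAbstractions release (0.9-style `CPU()` backend and `synchronize`): the kernel is launched over a small ndrange and the offsets are added by hand inside the kernel. The kernel name `fill_region!` and the array sizes are made up for illustration.

```julia
using KernelAbstractions

# The launch covers only the target region; the offsets are passed as an
# ordinary argument and map the 1-based launch indices into the array.
@kernel function fill_region!(A, offsets)
    i, j = @index(Global, NTuple)
    ii = i + offsets[1]
    jj = j + offsets[2]
    @inbounds A[ii, jj] = 1
end

A = zeros(Int, 5, 5)
backend = CPU()

# Fill A[2:4, 2:4]: a 3x3 ndrange shifted by (1, 1), workgroup size (2, 2).
fill_region!(backend, (2, 2))(A, (1, 1); ndrange = (3, 3))
KernelAbstractions.synchronize(backend)
```

The same pattern should carry over to GPU backends by swapping the backend and array type. The cost is that every kernel has to thread the offsets through its arguments, which is what an automatic mechanism would avoid.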

luraess · 2023-06-14T11:49:08Z

Yeah, having something more automated would be nice. @utkinis may have a small MWE of what we did recently, which would be handy to have in KA as well (similar to what is proposed here).


simone-silvestri added 8 commits June 11, 2023 16:45

first commit

e69f2fd

tests

d1ed6b7

separate offsets prior to blocks()

4557353

fix tests

8accb00

handle nothing

d7c7592

fix test

c5d969c

remove manifest

4692df9

remove StepRange

13e6de9

simone-silvestri mentioned this pull request Jun 12, 2023

(0.88.0) MPI communication and computation overlap in the HydrostaticFreeSurfaceModel and NonhydrostaticModel CliMA/Oceananigans.jl#3125

Merged

5 tasks

fixed test

08c0c02

vchuravy mentioned this pull request Jun 23, 2023

Add projection mechanism #403

Draft

Merge pull request #1 from JuliaGPU/main

bbc258f

update
