There have been various situations where we would like to insert no-ops into Exo programs. GPU synchronization #547 is one example. There are some examples on CPUs (prefetching, barriers, pause, etc.). Other examples are architecture independent (inserting debug information into programs).
I think there are a few questions to answer with regard to this feature:
What do we want out of them?
How should it be exposed in the IR?
How should codegen treat them?
How should we be able to insert it into programs?
How should other rewrites interact with them?
What do no-ops imply from an equivalence/effects perspective?
What do no-ops imply from a safety perspective?
I think it is useful to discuss what we might want from no-ops from different perspectives, so that we arrive at a general design that is not overfit to one particular use case.
prefetching:
Here are the answers to the questions above driven by ideas I have had to support prefetching on CPUs and some of the issues that show up:
I would like to be able to insert prefetch instructions at arbitrary points in the code with a memory operand and other arguments (e.g. locality hints). I would want future rewrites to continuously update the memory operand. I would want the rewrites to ensure that the memory operand always references a buffer that is live. In addition, I want to be able to perform out-of-bounds accesses to the buffer.
My idea is to have a reserved keyword in the compiler, e.g. no-op, that can be used as the name of a proc call. This proc can accept an arbitrary number of arguments.
Unreplaced no-op calls are simply no-operations and the arguments are expressions in Exo which have no side-effects. So, we can simply skip them at codegen.
There should be a rewrite insert_no-op(Proc, Gap, *NewExprs) which inserts a no-op call at the gap cursor with the arguments being the provided list of new expressions.
Rewrites should continuously update the arguments of this proc call, just like any other proc call.
No-ops should have no effects. What about effects of other parts of the program on the arguments of the no-op? That probably shouldn't matter since the proc itself is a no-op. So, I think effect analysis can generally skip no-op calls.
What safety guarantees should be respected? Here are the safety restrictions that I currently know of in Exo:
Bounds check: We want to be able to specify out-of-bounds memory operands for prefetching. But before that, what is this memory operand going to be? Is it a read expression, a window expression, or something else? It is really neither, but we currently have no other way of specifying a location with respect to a buffer.
Aliasing: should arguments to no-ops be allowed to alias? I don't have a good example of why this should be relaxed here. So, maybe they shouldn't.
Mixing data/control values: should this be allowed? I don't have a good example here either.
Precision/memory backend checks: Should these be enforced on no-op calls that don't get replaced? Ultimately, no-ops will be replaced by some instruction, and so those instructions will be checked. It doesn't matter what the backend checks do for the arguments of an unreplaced no-op call, since the no-op call won't be generated either way. In any case, we don't really know the type and memory of the no-op proc.
Live variables: variables referenced in the arguments of a no-op should always be live.
Any other safety checks I am missing ...
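To make the answers above concrete, here is a minimal sketch in plain Python of how the pieces could fit together. It is a toy model, not Exo's IR or API: `Assign`, `NoOp`, `insert_noop`, `codegen`, and `check_noop_args` are all hypothetical names, the gap is modeled as a plain list index, and "live" is approximated as "assigned earlier in the same straight-line body".

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Assign:
    """Toy statement `target = expr`; expr is kept as a source string."""
    target: str
    expr: str


@dataclass
class NoOp:
    """Toy no-op call; its arguments are variable names with no side effects."""
    args: Tuple[str, ...] = ()


Stmt = Union[Assign, NoOp]


def insert_noop(body: List[Stmt], gap: int, *new_exprs: str) -> List[Stmt]:
    """Return a new body with a no-op inserted before statement index `gap`."""
    if not 0 <= gap <= len(body):
        raise IndexError(f"gap {gap} is outside the body")
    return body[:gap] + [NoOp(tuple(new_exprs))] + body[gap:]


def codegen(body: List[Stmt]) -> List[str]:
    """Emit C-like lines; unreplaced no-op calls are simply skipped."""
    return [f"{s.target} = {s.expr};" for s in body if isinstance(s, Assign)]


def check_noop_args(body: List[Stmt]) -> None:
    """Reject no-ops whose arguments name a variable not assigned earlier."""
    defined = set()
    for s in body:
        if isinstance(s, NoOp):
            missing = [a for a in s.args if a not in defined]
            if missing:
                raise NameError(f"no-op references undefined names: {missing}")
        else:
            defined.add(s.target)


body = [Assign("x", "1"), Assign("y", "x + 1")]
with_noop = insert_noop(body, 1, "x")   # no-op placed between the two assignments
check_noop_args(with_noop)              # passes: x is assigned before the no-op
generated = codegen(with_noop)          # identical to codegen(body)
```

In this model the no-op changes nothing in the generated code, which matches the idea that effect analysis can skip it, while the argument check still keeps its operands tied to names that exist at that point.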
Here is how I envision being able to insert prefetches:
This gets at another problem with the memory operand issue I mentioned earlier. What should the precision of this operand be? It is not really any of the types we have. This definition will throw an error on another precision. Should I implement an instruction for each precision? Should there be a way to talk about memory addresses in Exo?
You can also add some nice helpers in the stdlib on top of the mechanism above:
Users should be able to insert sync instructions at arbitrary gaps in the code. These syncs also need additional arguments: either a range or a predicate indicating which blocks/threads should sync.
I think the no-op you're proposing is essentially an extension of pass with arguments.
It might be nice to raise an error on unreplaced no-ops (other than pass). An unreplaced sync no-op could lead to incorrect code when translating from the sequential semantics to parallel semantics.
Like 2, I think this can just be an extension of insert_pass.
Agreed
(and 7.) From the GPU perspective, existing analyses enforce the sequential semantics of a program, and the backend enforces the equivalence of the sequential and parallel semantics. So I think it's also safe for the analyses in scheduling operations to skip no-ops.
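The stricter codegen behavior suggested in this reply could look roughly like the sketch below. It is again a toy in plain Python rather than Exo code: `Call`, `RESERVED_NOOP`, `UnreplacedNoOpError`, `codegen_strict`, and the `sync_threads` instruction name are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

RESERVED_NOOP = "noop"  # hypothetical reserved name for no-op calls


@dataclass
class Call:
    """Toy proc call: a name plus argument expressions kept as strings."""
    name: str
    args: Tuple[str, ...] = ()


class UnreplacedNoOpError(Exception):
    """Raised when a no-op reaches codegen without being replaced."""


def codegen_strict(body: List[Call]) -> List[str]:
    """Emit C-like calls; `pass` is dropped, any other unreplaced no-op is an error."""
    lines = []
    for call in body:
        if call.name == "pass":
            continue  # plain pass is always safe to drop
        if call.name == RESERVED_NOOP:
            raise UnreplacedNoOpError(
                f"no-op with arguments {call.args} was never replaced"
            )
        lines.append(f"{call.name}({', '.join(call.args)});")
    return lines


# A body whose no-op was already replaced by a (hypothetical) sync instruction.
replaced = [Call("pass"), Call("sync_threads", ("0", "32"))]
emitted = codegen_strict(replaced)
```

The trade-off against silently skipping no-ops is that a forgotten sync placeholder fails loudly at codegen instead of producing parallel code that is missing a synchronization point.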
Just a general comment on my original post: I suggested above that the reserved name should be no-op, which is what people often call them. However, dashes are not accepted in Python names. Other alternatives are no_op, noop, or nop.
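The naming constraint can be checked directly with the standard library. This small sketch only tests whether each candidate is a legal, non-keyword Python identifier; `usable_as_name` is a helper name introduced here, and the check says nothing about which names Exo itself would accept or reserve.

```python
import keyword

candidates = ["no-op", "no_op", "noop", "nop"]


def usable_as_name(name: str) -> bool:
    """True if `name` is a valid Python identifier and not a Python keyword."""
    return name.isidentifier() and not keyword.iskeyword(name)


results = {name: usable_as_name(name) for name in candidates}
# "no-op" is rejected because of the dash; the other three are accepted.
```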