No-op Intrinsic in Exo #565

SamirDroubi · 2024-01-31T03:32:27Z

There have been various situations where we would like to insert no-ops into Exo programs. GPU synchronization (#547) is one example. There are also examples on CPUs (prefetching, barriers, pause, etc.). Other examples are architecture-independent (e.g. inserting debug information into programs).

I think there are a few questions to answer with regard to this feature:

1. What do we want out of them?
2. How should it be exposed in the IR?
3. How should codegen treat them?
4. How should we be able to insert it into programs?
5. How should other rewrites interact with them?
6. What do no-ops imply from an equivalence/effects perspective?
7. What do no-ops imply from a safety perspective?

I think it is useful to have a discussion on what various things we might want from them and from different perspectives so that we can get a general design that is not overfit to one particular use case.

prefetching:
Here are the answers to the questions above driven by ideas I have had to support prefetching on CPUs and some of the issues that show up:

1. I would like to be able to insert prefetch instructions at arbitrary points in the code with a memory operand and other arguments (e.g. locality hints). I would want future rewrites to continuously update the memory operand. I would want the rewrites to ensure that the memory operand always references a buffer that is live. In addition, I want to be able to perform out-of-bounds accesses to the buffer.
2. My idea is to have a reserved keyword in the compiler, e.g. no-op, that can be used as the name of a proc call. This proc can accept an arbitrary number of arguments.
3. Unreplaced no-op calls are simply no-operations, and their arguments are Exo expressions, which have no side effects. So, we can simply skip them at codegen.
4. There should be a rewrite insert_no-op(Proc, Gap, *NewExprs) which inserts a no-op call at the gap cursor, with the provided list of new expressions as its arguments.
5. Rewrites should continuously update the arguments of this proc call, just like any other proc call.
6. No-ops should have no effects. What about effects of other parts of the program on the arguments of the no-op? That probably shouldn't matter since the proc itself is a no-op. So, I think effect analysis can generally skip no-op calls.
7. What safety guarantees should be respected? Here are the safety restrictions that I currently know of in Exo:
- Bounds check: We want to be able to specify out-of-bounds memory operands for prefetching. But before that, what is this memory operand going to be? Is it a read expression, a window expression, or something else? It is really neither, but we currently have no other way of specifying a location with respect to a buffer.
- Aliasing: should arguments to no-ops be allowed to alias? I don't have a good example of why this should be relaxed here. So, maybe they shouldn't.
- Mixing data/control values: should this be allowed? Also, don't have a good example.
- Precision/memory backend checks: Should these be enforced on no-op calls that don't get replaced? Ultimately, no-ops will be replaced by some instruction, and so those instructions will be checked. It doesn't matter what the backend checks do for the arguments of an unreplaced no-op call, since the no-op call won't be generated either way. In any case, we don't really know the type and memory of the no-op proc.
- Live variables: variables referenced in the arguments of a no-op should always be live.
- Any other safety checks I am missing ...
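To make points 3 and the live-variable restriction concrete, here is a minimal sketch in plain Python. It is a toy model, not Exo's IR: the `Assign` and `NoOp` classes, `codegen`, and `check_noop_liveness` are all hypothetical names, no-op arguments are reduced to the variable names they mention, and "live" is simplified to "defined earlier in the statement list".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assign:
    target: str  # variable written by this statement
    value: str   # source text of the right-hand side


@dataclass(frozen=True)
class NoOp:
    args: tuple  # names of the variables mentioned in the no-op arguments


def codegen(stmts):
    """Emit C-like lines, skipping unreplaced no-op calls entirely."""
    return [f"{s.target} = {s.value};" for s in stmts if not isinstance(s, NoOp)]


def check_noop_liveness(stmts, live_at_entry):
    """Reject a no-op whose arguments mention a variable that is not live yet."""
    live = set(live_at_entry)
    for s in stmts:
        if isinstance(s, NoOp):
            dead = [a for a in s.args if a not in live]
            if dead:
                raise ValueError(f"no-op references non-live variables: {dead}")
        else:
            live.add(s.target)


body = [Assign("x", "A[i]"), NoOp(("A", "i")), Assign("y", "x + 1")]
check_noop_liveness(body, live_at_entry={"A", "i"})
lines = codegen(body)  # the no-op leaves no trace in the generated code
```

In this sketch the bounds, aliasing, and precision questions are deliberately left out; only the "skip at codegen" and "arguments must be live" rules are modeled.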

Here is how I envision being able to insert prefetches:

insert_no-op(proc, gap, "A[i + 5]", 0)

@instr("...")
def prefetch(A: R @ DRAM, locality_hint: size):
    no-op(A, locality_hint)

replace(proc, "no-op(_)", prefetch)

This gets at another problem with the memory operand issue I mentioned earlier. What should the precision of this operand be? It is not really any of the types we have. This definition will throw an error on another precision. Should I implement an instruction for each precision? Should there be a way to talk about memory addresses in Exo?

You can also add some nice helpers in the stdlib on top of this mechanism above:

def insert_prefetch(proc, gap, mem_operand, locality_hint, prefetch_instr):
    proc, (call,) = insert_no-op(proc, gap, mem_operand, locality_hint, rc=True)
    return replace(proc, call, prefetch_instr)

def prefetch_offset_from_access0(proc, gap, access, dims_offset, locality_hint, prefetch_instr):
    # mem_operand = access.idx() + dims_offset
    return insert_prefetch(proc, gap, mem_operand, locality_hint, prefetch_instr)

def prefetch_offset_from_access1(proc, access, dims_offset, locality_hint, prefetch_instr):
    # mem_operand = access.idx() + dims_offset
    # gap = gap before statement containing the `access`
    return insert_prefetch(proc, gap, mem_operand, locality_hint, prefetch_instr)
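The insert-then-replace pattern behind insert_prefetch can be sketched in plain Python. This is a toy model under stated assumptions, not Exo's scheduling API: a proc is a plain list of `Call` records, a gap is a list index, and `insert_no_op` and `replace_call` are hypothetical stand-ins for the proposed rewrites (spelled with an underscore so the code parses).

```python
from dataclasses import dataclass, replace as dc_replace


@dataclass(frozen=True)
class Call:
    name: str
    args: tuple


def insert_no_op(stmts, gap, *args):
    """Return a new statement list with a no-op call at index `gap`, plus that index."""
    new = list(stmts[:gap]) + [Call("no_op", tuple(args))] + list(stmts[gap:])
    return new, gap


def replace_call(stmts, idx, instr_name):
    """Swap the callee of the call at `idx`, keeping its arguments unchanged."""
    new = list(stmts)
    new[idx] = dc_replace(new[idx], name=instr_name)
    return new


def insert_prefetch(stmts, gap, mem_operand, locality_hint, instr_name):
    """Insert a no-op carrying the prefetch arguments, then replace it with the instruction."""
    stmts, idx = insert_no_op(stmts, gap, mem_operand, locality_hint)
    return replace_call(stmts, idx, instr_name)


body = [Call("load", ("A[i]",)), Call("compute", ("A[i]",))]
result = insert_prefetch(body, 1, "A[i + 5]", 0, "prefetcht0")
```

Both helpers return new lists rather than mutating their input, mirroring the way rewrites produce a new proc instead of editing the old one in place.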


skeqiqevian · 2024-02-01T14:53:36Z

From the GPU side:

1. Users should be able to insert sync instructions at arbitrary gaps in the code. These syncs also need additional arguments: either a range or a predicate indicating which blocks/threads should sync.
2. I think the no-op you're proposing is essentially an extension of pass with arguments.
3. It might be nice to raise an error on no-ops (other than pass). An unreplaced sync no-op could lead to incorrect code when translating from the sequential semantics to parallel semantics.
4. Like 2, I think this can just be an extension of insert_pass.
5. Agreed.
6. (and 7.) From the GPU perspective, existing analyses are for enforcing the sequential semantics of a program, and the backend enforces the equality of sequential -> parallel semantics. So I think it's also safe for scheduling operation analyses to skip no-ops.
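The stricter codegen policy suggested above (error on any leftover no-op except an argument-free one, which behaves like pass) could look roughly like this. It is a toy sketch in plain Python, not Exo's backend: `Call`, `UnreplacedNoOpError`, and `codegen_strict` are hypothetical names, and the no-op is spelled `no_op`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    name: str
    args: tuple = ()


class UnreplacedNoOpError(Exception):
    """Raised when a no-op that carries arguments survives until codegen."""


def codegen_strict(stmts):
    """Emit C-like call lines; reject leftover no-ops that carry arguments."""
    out = []
    for s in stmts:
        if s.name == "no_op":
            if s.args:
                raise UnreplacedNoOpError(f"unreplaced no-op with arguments {s.args}")
            continue  # an argument-free no-op is treated like `pass`
        out.append(f"{s.name}({', '.join(map(str, s.args))});")
    return out


ok_lines = codegen_strict([Call("no_op"), Call("sync_threads", (0, 32))])
```

Under this policy a forgotten sync placeholder fails loudly at codegen instead of silently disappearing from the parallel program.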

SamirDroubi · 2024-02-01T15:02:07Z

I think it is possible that pass is just deprecated if this is introduced since it is equivalent to just a no-op() with no arguments.

SamirDroubi · 2024-02-01T15:05:29Z

Just a general comment on my original post: I suggested above that the reserved name should be no-op, which is what people often call them. However, dashes are not valid in Python identifiers, so that name would not be accepted. Other alternatives are no_op, noop, or nop.
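A quick check with the Python standard library illustrates the naming problem. It uses only `str.isidentifier` and the `ast` module; the candidate list is taken from the comment above, and the call `no-op(A, 0)` is just an example string.

```python
import ast

# Which candidate names are valid Python identifiers?
candidates = ["no-op", "no_op", "noop", "nop"]
valid = {name: name.isidentifier() for name in candidates}

# `no-op(A, 0)` still parses, but as the subtraction `no - op(A, 0)`,
# not as a call to something named "no-op".
tree = ast.parse("no-op(A, 0)", mode="eval")
parsed_as_subtraction = isinstance(tree.body, ast.BinOp) and isinstance(tree.body.op, ast.Sub)
callee_name = tree.body.right.func.id  # the function actually being called
```

So a dashed name would not raise a syntax error by itself; it would be silently read as a different expression, which is arguably worse.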

skeqiqevian self-assigned this May 23, 2024
