No-op Intrinsic in Exo #565

Open
SamirDroubi opened this issue Jan 31, 2024 · 3 comments

SamirDroubi commented Jan 31, 2024

There have been various situations in which we would like to insert no-ops into Exo programs. GPU synchronization (#547) is one example. CPUs offer others (prefetching, barriers, pause, etc.), and some are architecture independent (e.g. inserting debug information into programs).

I think there are a few questions to answer with regard to this feature:

  1. What do we want out of them?
  2. How should it be exposed in the IR?
  3. How should codegen treat them?
  4. How should we be able to insert it into programs?
  5. How should other rewrites interact with them?
  6. What do no-ops imply from an equivalence/effects perspective?
  7. What do no-ops imply from a safety perspective?

I think it is useful to discuss what we might want from no-ops from different perspectives, so that we arrive at a general design that is not overfit to one particular use case.


prefetching:
Here are my answers to the questions above, driven by ideas I have had for supporting prefetching on CPUs and some of the issues that show up:

  1. I would like to be able to insert prefetch instructions at arbitrary points in the code, with a memory operand and other arguments (e.g. locality hints). I would want future rewrites to keep the memory operand up to date and to ensure that it always references a buffer that is live. In addition, I want to be able to perform out-of-bounds accesses to the buffer.
  2. My idea is to have a reserved keyword in the compiler, e.g. no-op, that can be used as the name of a proc call. This proc can accept an arbitrary number of arguments.
  3. Unreplaced no-op calls are simply no-operations and the arguments are expressions in Exo which have no side-effects. So, we can simply skip them at codegen.
  4. There should be a rewrite insert_no-op(Proc, Gap, *NewExprs) which inserts a no-op call at the gap cursor with the arguments being the provided list of new expressions.
  5. Rewrites should continuously update the arguments of this proc call. Just like any other proc call.
  6. No-ops should have no effects. What about effects of other parts of the program on the arguments of the no-op? That probably shouldn't matter since the proc itself is a no-op. So, I think effect analysis can generally skip no-op calls.
  7. What safety guarantees should be respected? Here are the safety restrictions that I currently know of in Exo:
    • Bounds check: We want to be able to specify out-of-bounds memory operands for prefetching. But before that, what is this memory operand going to be? Is it a read expression, a window expression, or something else? It is really neither, but we currently have no other way of specifying a location with respect to a buffer.
    • Aliasing: should arguments to no-ops be allowed to alias? I don't have a good example of why this should be relaxed here. So, maybe they shouldn't.
    • Mixing data/control values: should this be allowed? I don't have a good example here either.
    • Precision/memory backend checks: should these be enforced on no-op calls that don't get replaced? Ultimately, no-ops will be replaced by some instruction, and that instruction will be checked. It doesn't matter what the backend checks do with the arguments of an unreplaced no-op call, since the call won't be generated either way. In any case, we don't really know the type and memory of the no-op proc.
    • Live variables: variables referenced in the arguments of a no-op should always be live.
    • Any other safety checks I am missing ...
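
To make answers 3 and 7 concrete, here is a minimal sketch of a toy statement list (not Exo's actual IR) in which codegen skips unreplaced no-ops and a separate check rejects no-op arguments that name a buffer that has not been allocated. All class and function names (`Alloc`, `Assign`, `NoOp`, `check_noop_args_live`, `codegen`) are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Alloc:
    name: str
    size: int


@dataclass
class Assign:
    target: str
    value: str


@dataclass
class NoOp:
    # Names of the buffers/variables referenced by the no-op's arguments.
    arg_names: tuple


def check_noop_args_live(stmts):
    """Raise ValueError if a no-op references a name that is not yet allocated."""
    live = set()
    for s in stmts:
        if isinstance(s, Alloc):
            live.add(s.name)
        elif isinstance(s, NoOp):
            dead = [n for n in s.arg_names if n not in live]
            if dead:
                raise ValueError(f"no-op references non-live names: {dead}")


def codegen(stmts):
    """Emit C-like lines; unreplaced no-ops deliberately emit nothing."""
    lines = []
    for s in stmts:
        if isinstance(s, Alloc):
            lines.append(f"float {s.name}[{s.size}];")
        elif isinstance(s, Assign):
            lines.append(f"{s.target} = {s.value};")
        # NoOp: skipped, since its arguments have no side effects.
    return lines
```

In this toy model the liveness check is the only safety check that looks at no-op arguments; bounds and precision checks would simply never visit `NoOp` nodes.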

Here is how I envision being able to insert prefetches:

proc = insert_no-op(proc, gap, "A[i + 5]", 0)

@instr("...")
def prefetcht0(A: R @ DRAM, locality_hint: size):
    no-op(A, locality_hint)

proc = replace(proc, "no-op(_)", prefetcht0)
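
The two-step flow above (insert a placeholder call, then replace it with an instruction) can be sketched on a plain list of calls. This is a toy model under stated assumptions, not Exo's scheduling API: `Call`, `insert_no_op`, and `replace_no_ops` are hypothetical names, a gap is modeled as a list index, and the placeholder is spelled `no_op` because Python identifiers cannot contain dashes.

```python
from dataclasses import dataclass

NO_OP = "no_op"  # hypothetical reserved name for the placeholder call


@dataclass(frozen=True)
class Call:
    name: str
    args: tuple


def insert_no_op(stmts, gap, *args):
    """Return a new statement list with a no-op call inserted at index `gap`."""
    new_stmts = list(stmts)
    new_stmts.insert(gap, Call(NO_OP, tuple(args)))
    return new_stmts


def replace_no_ops(stmts, instr_name, arity):
    """Replace every no-op call with `arity` arguments by a call to `instr_name`."""
    out = []
    for s in stmts:
        if s.name == NO_OP and len(s.args) == arity:
            out.append(Call(instr_name, s.args))
        else:
            out.append(s)
    return out


body = [Call("compute", ("A[i]",))]
body = insert_no_op(body, 0, "A[i + 5]", 0)
body = replace_no_ops(body, "prefetcht0", arity=2)
```

Matching on arity here stands in for unifying the no-op call against the body of the instruction; a real implementation would presumably also have to match argument types.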

This gets at another problem with the memory operand issue I mentioned earlier. What should the precision of this operand be? It is not really any of the types we have. This definition will throw an error on another precision. Should I implement an instruction for each precision? Should there be a way to talk about memory addresses in Exo?
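
One way to look at the precision question: a prefetch only consumes an address, so the element type matters only when an index is scaled into a byte offset. The sketch below illustrates that with Exo-style precision names mapped to C element sizes through `ctypes`. The mapping table and the helper `prefetch_byte_offset` are assumptions made for illustration and are not part of Exo.

```python
import ctypes

# Assumed mapping from precision names to C element sizes in bytes.
ELEM_SIZE = {
    "f32": ctypes.sizeof(ctypes.c_float),
    "f64": ctypes.sizeof(ctypes.c_double),
    "i8": ctypes.sizeof(ctypes.c_int8),
}


def prefetch_byte_offset(index, precision):
    """Byte offset from the buffer base for element `index` of the given precision."""
    return index * ELEM_SIZE[precision]
```

Under this view a single address-taking instruction could serve every precision, with only the offset computation depending on the type.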

You can also add some nice helpers in the stdlib on top of this mechanism above:

def insert_prefetch(proc, gap, mem_operand, locality_hint, prefetch_instr):
    proc, (call,) = insert_no-op(proc, gap, mem_operand, locality_hint, rc=True)
    return replace(proc, call, prefetch_instr)

def prefetch_offset_from_access0(proc, gap, access, dims_offset, locality_hint, prefetch_instr):
    # mem_operand = access.idx() + dims_offset
    return insert_prefetch(proc, gap, mem_operand, locality_hint, prefetch_instr)

def prefetch_offset_from_access1(proc, access, dims_offset, locality_hint, prefetch_instr):
    # mem_operand = access.idx() + dims_offset
    # gap = gap before statement containing the `access`
    return insert_prefetch(proc, gap, mem_operand, locality_hint, prefetch_instr)
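
The `mem_operand = access.idx() + dims_offset` step in the helpers above can be sketched as a per-dimension offset applied to the index expressions of an access. This is a string-level illustration only; `offset_access` is a hypothetical helper, and real code would operate on Exo expressions rather than strings.

```python
def offset_access(buf, idx_exprs, dims_offset):
    """Build an access string such as "A[i + 5, j]" from indices and per-dimension offsets."""
    if len(idx_exprs) != len(dims_offset):
        raise ValueError("need exactly one offset per dimension")
    parts = []
    for expr, off in zip(idx_exprs, dims_offset):
        if off == 0:
            parts.append(expr)
        elif off > 0:
            parts.append(f"{expr} + {off}")
        else:
            parts.append(f"{expr} - {-off}")
    return f"{buf}[{', '.join(parts)}]"
```

For example, `offset_access("A", ["i", "j"], [5, 0])` yields the operand string `"A[i + 5, j]"`.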

skeqiqevian commented Feb 1, 2024

From the GPU side:

  1. Users should be able to insert sync instructions at arbitrary gaps in the code. These syncs also need additional arguments: either a range or a predicate indicating which blocks/threads should sync.
  2. I think the no-op you're proposing is essentially an extension of pass with arguments.
  3. It might be nice to raise an error on unreplaced no-ops (other than pass). An unreplaced sync no-op could lead to incorrect code when translating from the sequential semantics to parallel semantics.
  4. Like 2, I think this can just be an extension of insert_pass.
  5. Agreed
  6. (and 7.) From the GPU perspective, existing analyses enforce the sequential semantics of a program, and the backend enforces the equivalence of the sequential and parallel semantics. So I think it's also safe for the analyses in scheduling operations to skip no-ops.
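
Point 3 above suggests a stricter codegen than silently skipping. A minimal sketch of that policy, assuming statements are modeled as `(name, args)` tuples: `pass` emits nothing, while any leftover placeholder call is an error. The names `codegen_strict`, `UnreplacedNoOpError`, `no_op`, and `sync_threads` are hypothetical.

```python
class UnreplacedNoOpError(Exception):
    """Raised when a placeholder no-op with arguments reaches codegen."""


def codegen_strict(stmts):
    """Emit one C-like call per statement, rejecting unreplaced no-ops."""
    lines = []
    for name, args in stmts:
        if name == "pass":
            continue  # a true no-op: nothing to emit
        if name == "no_op":
            raise UnreplacedNoOpError(f"unreplaced no-op with arguments {args}")
        lines.append(f"{name}({', '.join(str(a) for a in args)});")
    return lines
```

With this policy a forgotten sync placeholder fails loudly at codegen instead of producing parallel code that is missing a barrier.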

SamirDroubi commented

I think pass could simply be deprecated if this is introduced, since it is equivalent to a no-op() with no arguments.

SamirDroubi commented

Just a general comment on my original post: I suggested above that the reserved name should be no-op, which is what people usually call them. However, dashes are not accepted in Python names. Other alternatives are no_op, noop, or nop.
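
The naming constraint can be checked directly in Python with `str.isidentifier` and the standard `keyword` module. The candidate list below is just the set of names mentioned in this thread.

```python
import keyword

candidates = ["no-op", "no_op", "noop", "nop"]

# A usable name must be a valid identifier and must not be a Python keyword.
usable = {
    name: name.isidentifier() and not keyword.iskeyword(name)
    for name in candidates
}
```

Here `usable["no-op"]` is `False` because of the dash, while the other three candidates are valid identifiers.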

skeqiqevian self-assigned this May 23, 2024