AD Cleanup: Avoid dealing with invalid IR by representing out-of-scope accesses through Push and Pop instructions
#4182
Labels
goal:forward looking
kind:cleanup
tech debt and rough edges
priority:medium
nice to have in next milestone
The unzipping step creates two copies of the control-flow graph: one holding the primal instructions and the other holding the differential instructions. The differential blocks run only after all the primal blocks, so with any non-trivial branching, the differential blocks end up accessing instructions that were generated inside conditional regions and are out of scope at the point of use.
Example:
Consider a function f with control-flow:
This will result in the following code after the unzipping step:
This contains invalid out-of-scope accesses.
For now, we allow this invalid IR to persist until the last step, where the checkpointing pass legalizes all these accesses by inserting loads and stores as necessary.
This can be cleaned up in a more principled way by using ideas from "Tape-based AD". This approach assumes a dynamic, unbounded stack (the tape) that the primal code can push intermediate values to. The differential blocks then pop these values off the tape and use them.
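To make the push/pop idea concrete, here is a minimal sketch in Python rather than in the compiler's IR. The functions `f_primal` and `f_reverse` and the use of a plain list as the tape are hypothetical and only illustrate the pattern: the primal pass pushes a branch-local value and the branch decision, and the reverse pass pops them instead of referring to a value that is out of scope.

```python
# Hypothetical illustration of tape-based AD; none of these names exist in the compiler.

def f_primal(x, tape):
    """Primal pass: compute f(x) and push the values the reverse pass will need."""
    if x > 0.0:
        a = x * x           # defined only inside this branch
        tape.append(a)      # push the branch-local intermediate
        tape.append(True)   # push which branch was taken
        return a * x        # f(x) = x^3
    else:
        tape.append(False)
        return 2.0 * x      # f(x) = 2x


def f_reverse(x, d_out, tape):
    """Reverse pass: pop values in the opposite order of the pushes."""
    took_branch = tape.pop()
    if took_branch:
        a = tape.pop()      # recovered from the tape, not referenced out of scope
        d_a = d_out * x     # y = a * x  =>  dy/da = x
        d_x = d_out * a     # y = a * x  =>  dy/dx (direct) = a
        d_x += d_a * 2.0 * x  # a = x * x  =>  da/dx = 2x
        return d_x
    return d_out * 2.0


tape = []
y = f_primal(3.0, tape)
dx = f_reverse(3.0, 1.0, tape)
print(y, dx)
```

Because every push in the primal pass is matched by exactly one pop in the reverse pass, the tape is empty again once the reverse pass finishes.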
We can implement this approach by introducing instructions like IRTapePush and IRTapePop.
This allows us to have different implementations depending on how we lower these instructions. The current approach would simply be the static method, where we introduce a struct with fields for all unique IRTapePush instructions (and arrays wherever IRTapePush occurs within a loop region).
This could also enable user-written derivative code to make use of the intermediate context. Right now, reverse-mode user-written derivatives need to use workarounds such as non-differentiable auxiliary parameters, or write a fully self-contained derivative method.
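The static lowering described above (one struct field per unique push, and an array for a push inside a loop) might look roughly like the following Python sketch. `GContext`, `g_primal`, `g_reverse`, and the fixed bound `MAX_ITERS` are all assumptions made for illustration; the actual lowering in the compiler may differ.

```python
from dataclasses import dataclass, field
from typing import List

MAX_ITERS = 4  # assumption: a static upper bound on the loop trip count is known


@dataclass
class GContext:
    """Hypothetical intermediate context: one slot per unique push site."""
    s: float = 0.0                       # push site outside the loop: a single field
    acc: List[float] = field(            # push site inside the loop: one entry per iteration
        default_factory=lambda: [0.0] * MAX_ITERS)
    trip_count: int = 0                  # how many iterations actually ran


def g_primal(x: float, n: int, ctx: GContext) -> float:
    """Compute g(x) = (x * x) ** n, storing intermediates into ctx."""
    assert n <= MAX_ITERS
    s = x * x
    ctx.s = s                            # static replacement for a push
    acc = 1.0
    for i in range(n):
        ctx.acc[i] = acc                 # per-iteration store replaces a push in the loop
        acc = acc * s
    ctx.trip_count = n
    return acc


def g_reverse(x: float, d_out: float, ctx: GContext) -> float:
    """Walk the loop backwards, loading intermediates from ctx instead of popping."""
    d_acc = d_out
    d_s = 0.0
    for i in reversed(range(ctx.trip_count)):
        d_s += d_acc * ctx.acc[i]        # acc_next = acc * s  =>  d/ds = acc
        d_acc = d_acc * ctx.s            # acc_next = acc * s  =>  d/dacc = s
    return d_s * 2.0 * x                 # s = x * x  =>  ds/dx = 2x


ctx = GContext()
y = g_primal(3.0, 2, ctx)
dx = g_reverse(3.0, 1.0, ctx)
print(y, dx)
```

A user-written derivative could, in principle, receive such a context object directly instead of threading the same values through auxiliary parameters.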