All GPU stages typically care about minimizing live state (aka "register pressure"), but this is especially acute for ray tracing pipelines, where thread state may need to be evicted/stored as part of scheduling or rebalancing work. At points where a thread might be suspended, the state that needs to be stored out includes all of the live variables.
Downstream compilers can infer the liveness of variables by looking at loads and stores, but there can be cases where the Slang compiler has more complete information about live ranges that a downstream compiler cannot infer.
A concrete example is a variable that is declared inside a loop and only conditionally assigned on each iteration:
    for(...)
    {
        SomeType tmp;
        ...
        // code that may or may not fully initialize `tmp`
        ...
        someFunctionThatUses( tmp );
        ...
        someFunctionThatMaySuspendThisThread();
    }
Because of constraints on the SPIR-V and DXIL representations, compilation for those targets will effectively move the declaration of tmp outside of the loop:
    SomeType tmp;
    for(...)
    { ... }
As a result, it can be difficult or impossible for a downstream compiler to know that the value of tmp from one loop iteration cannot be observed by the next iteration. The downstream compiler may have to be conservative and save/restore the state of tmp whenever the thread suspends in the loop.
While there are narrower ways to inform a downstream compiler about liveness, and we should support those in Slang, there is also a relatively simple fix that can help in many scenarios: guarantee that all variables are fully initialized at their point of declaration.
Effectively, that means that in the scenario above, even though tmp will be hoisted out of the loop, we would emit a complete initialization of it on each loop iteration:
    SomeType tmp;
    for(...)
    {
        tmp = { 0, ... }; // however many `0`s are needed
        ...
    }
With that representation, a downstream compiler can easily see that the assignment at the top of each loop iteration effectively "kills" the value of tmp from the previous iteration, so that it is no longer live at the potential thread suspend point.
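To make the liveness argument concrete, here is a minimal sketch of a textbook backward liveness analysis run over a toy control-flow graph of the loop. This is not how Slang or any downstream compiler is implemented; the block names (`head`, `assign`, `use`, `suspend`, `exit`) and both helper functions are hypothetical and exist only for illustration. The graph models the hoisted form: `tmp` is declared outside the loop, `assign` is the conditional assignment, and `suspend` is the potential suspend point.

```python
def compute_live_in(blocks):
    """Iterate the standard backward dataflow equations to a fixed point.

    blocks maps a block name to (uses, defs, successors), where uses are
    variables read before any write in the block.
    live_in[b] = uses[b] | (live_out[b] - defs[b])
    """
    live_in = {name: set() for name in blocks}
    changed = True
    while changed:
        changed = False
        for name, (uses, defs, succs) in blocks.items():
            live_out = set().union(*(live_in[s] for s in succs))
            new_in = set(uses) | (live_out - set(defs))
            if new_in != live_in[name]:
                live_in[name] = new_in
                changed = True
    return live_in


def live_across_suspend(init_at_loop_top):
    """Return the variables that are live after the suspend point."""
    # With the proposed fix, the top of every iteration fully defines `tmp`.
    head_defs = {"tmp"} if init_at_loop_top else set()
    blocks = {
        "head": (set(), head_defs, ["assign", "use"]),  # branch: assign or skip
        "assign": (set(), {"tmp"}, ["use"]),            # conditional assignment
        "use": ({"tmp"}, set(), ["suspend"]),           # someFunctionThatUses(tmp)
        "suspend": (set(), set(), ["head", "exit"]),    # may suspend, then loop or exit
        "exit": (set(), set(), []),
    }
    live_in = compute_live_in(blocks)
    _, _, succs = blocks["suspend"]
    return set().union(*(live_in[s] for s in succs))


print(live_across_suspend(init_at_loop_top=False))  # {'tmp'}
print(live_across_suspend(init_at_loop_top=True))   # set()
```

Without the per-iteration initialization there is a path from the suspend point back through `head` to `use` that never writes `tmp`, so the analysis must treat `tmp` as live across the suspend. With the initialization in `head`, every such path is cut by a definition and `tmp` drops out of the live set.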
Because this feature might impact the performance of generated code, it should probably be enabled under a switch at first.
When implementing this, we need to be careful about the case where SomeType has resource type fields. Perhaps we should implement it as an IR pass after type legalization, instead of during the initial IR lowering pass.