Option to zero-initialize all variables #3516

Closed
tangent-vector opened this issue Jan 25, 2024 · 4 comments · Fixed by #3987
Labels
goal:client support Needed to support a slang use case

Comments

@tangent-vector
Contributor

All GPU stages typically care about minimizing live state (aka "register pressure"), but this is especially acute for ray tracing pipelines, where thread state may need to be evicted/stored as part of scheduling or rebalancing work. At points where a thread might be suspended, the state that needs to be stored out includes all of the live variables.

Downstream compilers can infer the liveness of variables by looking at loads and stores, but there can be cases where the Slang compiler has more complete information about live ranges that a downstream compiler cannot infer.

A concrete example is a variable declared inside a loop that is conditionally assigned on each iteration:

for(...)
{
    SomeType tmp;
    ...
    // code that may or may not fully initialize `tmp`
    ...
    someFunctionThatUses( tmp );
    ...
    someFunctionThatMaySuspendThisThread();
}

Because of constraints on the SPIR-V and DXIL representations, compilation for those targets will effectively move the declaration of tmp outside of the loop:

SomeType tmp;
for(...)
{ ... }

As a result, it can be difficult or impossible for a downstream compiler to know that the value of tmp from one loop iteration cannot be observed by the next iteration. The downstream compiler may have to be conservative and save/restore the state of tmp whenever the thread suspends in the loop.
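To make the hazard concrete, here is a minimal C++ sketch (not Slang, and not compiler output) of the hoisted form. The function name `observeHoisted`, the two-field layout of `SomeType`, the `-1` sentinel, and the "assign only on even iterations" condition are all assumptions made up for illustration. The sentinel exists only so the sketch never reads an indeterminate value.

```cpp
#include <vector>

// Hypothetical stand-in for the issue's `SomeType`.
struct SomeType { int a; int b; };

// Hoisted form: `tmp` is declared outside the loop, so any field an
// iteration does not write still holds what an earlier iteration stored.
std::vector<int> observeHoisted(int iterations)
{
    std::vector<int> seen;
    SomeType tmp = { -1, -1 };   // sentinel; assumption, avoids undefined reads
    for (int i = 0; i < iterations; ++i)
    {
        if (i % 2 == 0)          // "may or may not fully initialize tmp"
        {
            tmp.a = i;
            tmp.b = i;
        }
        seen.push_back(tmp.a);   // stands in for someFunctionThatUses(tmp)
    }
    return seen;
}
```

On odd iterations the recorded value is the one written by the previous iteration. A source program that declared `tmp` inside the loop could never legitimately rely on that, but a compiler that only sees the hoisted form has no way to rule it out.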

While there are narrower ways to inform a downstream compiler about liveness, and we can and should support those in Slang, there is also a relatively simple fix that helps in many scenarios: guarantee that every variable is fully initialized at its point of declaration.

Effectively, that means that in the scenario above, even though tmp will be hoisted out of the loop, we would emit a complete initialization of it on each loop iteration:

SomeType tmp;
for(...)
{
    tmp = { 0, ... }; // however many `0`s are needed
    ...
}

With that representation, a downstream compiler can easily see that the assignment at the top of each loop iteration effectively "kills" the value of tmp from the previous iteration, so that it is no longer live at the potential thread suspend point.
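The "kill" argument can be checked with a toy backward liveness analysis. The C++ sketch below is only an illustration of the reasoning, not how any real downstream compiler is implemented. `Instr`, `loopBody`, and `liveAcrossSuspend` are hypothetical names, the loop body is modeled as a straight-line sequence with a single back edge, and `tmp` is assumed to be unused after the loop exits.

```cpp
#include <cstddef>
#include <vector>

// One instruction of a toy loop body, described only by how it touches `tmp`.
struct Instr
{
    bool mayRead;      // may read some part of tmp
    bool fullyWrites;  // definitely overwrites every field of tmp
    bool isSuspend;    // point where the thread may be suspended
};

// Model of the loop from the issue, with or without a full reset at the top.
std::vector<Instr> loopBody(bool zeroInitAtTop)
{
    std::vector<Instr> body;
    if (zeroInitAtTop)
        body.push_back({ false, true, false });  // tmp = { 0, ... };
    body.push_back({ false, false, false });     // may only partially write tmp
    body.push_back({ true, false, false });      // someFunctionThatUses(tmp)
    body.push_back({ false, false, true });      // possible suspend point
    return body;
}

// True if tmp is live immediately after some suspend point.
// Transfer function: liveBefore = mayRead || (liveAfter && !fullyWrites).
bool liveAcrossSuspend(const std::vector<Instr>& body)
{
    std::vector<bool> liveAfter(body.size(), false);
    bool liveAtLoopTop = false;  // liveness carried around the back edge
    bool changed = true;
    while (changed)              // iterate to a fixed point
    {
        changed = false;
        bool live = liveAtLoopTop;  // loop exit contributes "not live"
        for (std::size_t i = body.size(); i-- > 0; )
        {
            bool old = liveAfter[i];
            if (old != live) { liveAfter[i] = live; changed = true; }
            live = body[i].mayRead || (live && !body[i].fullyWrites);
        }
        if (live != liveAtLoopTop) { liveAtLoopTop = live; changed = true; }
    }
    for (std::size_t i = 0; i < body.size(); ++i)
        if (body[i].isSuspend && liveAfter[i])
            return true;
    return false;
}
```

With `loopBody(false)` the partial write cannot stop the next iteration's read from reaching back across the back edge, so `tmp` is live at the suspend point. With `loopBody(true)` the full write at the top ends that chain, and `tmp` no longer has to be saved there.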

Because this feature might impact the performance of generated code, it should probably be enabled under a switch at first.

@csyonghe
Collaborator

Is there a workload where we can verify the potential gain of this change?

@csyonghe
Collaborator

I doubt this matters to downstream compilers today, since there is pervasive inlining and SROA.

@nsubtil

nsubtil commented Jan 25, 2024

RTX Remix with SER disabled shows strong benefits from a change like this.

@csyonghe
Collaborator

csyonghe commented Jan 26, 2024

That's good to know.

When implementing this, we need to be careful about the case where SomeType has fields of resource type. Perhaps we should implement it as an IR pass that runs after type legalization, instead of during the initial IR lowering pass.


6 participants