Draft: Remove more unnecessary conditional run layer calls #1713

chellmuth · 2023-08-18T18:29:46Z

This PR is a work-in-progress that I've hit a (hopefully) minor stumbling block on.

Overview

@tgrant-nv was looking at some of our shader ptx awhile ago and pointed out that a surprisingly high amount of the total instructions were just conditional layer call checks (due to all the stores and loads surrounding calls). The concern was that this was slowing down OptiX module compilation.

OSL has some basic checks in place to remove unnecessary conditional layer calls, but this patch takes things further. The high level strategy is:

Mark all our conditional layer calls during llvm generation
Run a custom llvm optimization pass that associates calls with their basic blocks
For each call, walk up llvm's dominator tree to find basic blocks guaranteed to have already run
If a guaranteed basic block already contains the same call as the current one, remove the current one

It doesn't help the compilation times for every shader, but we've seen speedups as high as 5x on some of our most expensive shaders.

Work in progress

The problem I've run into is that the optimization pass introduces quite a bit of overhead to the ptx generation. Most assets take 10-20% longer to generate ptx with this patch. For us, it's probably still a good trade to turn on for GPU (module compilation is more expensive than ptx generation), but I was hoping some more eyes could help improve things. Especially since I've never written a LLVM pass before, maybe I'm doing things particularly inefficiently.

Any feedback or ideas would be appreciated!

Tests

Checklist:

I have read the contribution guidelines.
I have previously submitted a Contributor License Agreement.
I have updated the documentation, if applicable.
I have ensured that the change is tested somewhere in the testsuite (adding new test cases if necessary).
My code follows the prevailing code style of this project.

… calls

linux-foundation-easycla · 2023-08-18T18:29:49Z

❌ The email address for the commit (af31c7b) is not linked to the GitHub account, preventing the EasyCLA check. Consult this Help Article and GitHub Help to resolve. (To view the commit's email address, add .patch at the end of this PR page's URL.) For further assistance with EasyCLA, please submit a support request ticket.

chellmuth · 2023-08-18T18:36:53Z

src/liboslexec/llvm_util.cpp

+        // Delete all the unnecessary stubs identified above
+
+        // WIP: This appears to be the cause of the performance regression
+        // (Not the dominator tree analysis above)
+        llvm::Value* fake = llvm::ConstantInt::get(F.getContext(), llvm::APInt(32, 1));
+        for (llvm::CallInst* inst : delete_queue) {
+            llvm::BasicBlock::iterator iterator(inst);
+            llvm::ReplaceInstWithValue(inst->getParent()->getInstList(), iterator, fake);
+            m_calls_removed++;
+        }


This appears to be where the performance hit originates from. It seems like the replacement and subsequent code-folding is pretty simple, so I'm confused why it is taking so much time compared to everything else that happens while codegen'ing a shader (+10-20%).

Maybe the replacement causes llvm to recook something. Does the 20% vanish if you if (always_false_global) line 1833?

tgrant-nv · 2023-08-21T15:52:19Z

This sounds very promising. A 10-20% increase in JIT time isn't so bad, if it increases the shader performance or even reduces the PTX compile time. We'll take a closer look and see what recommendations we can make.

tgrant-nv · 2023-08-21T16:26:42Z

src/liboslexec/llvm_util.cpp

@@ -1767,6 +1860,8 @@ LLVM_Util::setup_optimization_passes(int optlevel, bool target_host)

    m_llvm_module_passes           = new llvm::legacy::PassManager;
    llvm::legacy::PassManager& mpm = (*m_llvm_module_passes);
+    // TODO: Add based on optlevel
+    mpm.add(new CheckLayerRemovalPass());


Since the new pass is being run on the entire module, I think that means that it's running on all of the shadeops functions. Would it be possible to detect that you are processing a library function (by function name, or perhaps a function attribute), and early-exit from CheckLayerRemovalPass::runOnFunction() in that case?

That might not buy much since module pruning should remove unused library functions, but maybe it will help in some cases.

That seems like a responsible safe-guard to put in, I'll add that.

chellmuth · 2023-08-22T05:14:24Z

This sounds very promising. A 10-20% increase in JIT time isn't so bad, if it increases the shader performance or even reduces the PTX compile time. We'll take a closer look and see what recommendations we can make.

Thanks! Looking forward to your findings.

lgritz · 2023-09-14T18:28:02Z

Is this ready to be a non-draft? Looks like it might need a rebase as well.

chellmuth · 2023-09-14T20:47:26Z

Is this ready to be a non-draft?

It's close. It turned out the speed hit we've been seeing is because removing so many run-layer checks made the remaining ones more palatable for inlining. So LLVM's inlining pass times are growing quite a bit.

Now that it's been merged, I want to test this patch internally on #1680, since that also has inlining ramifications and I want to make sure everything interacts okay.

AlexMWells · 2023-09-18T16:38:04Z

@chellmuth , as an experiment could you add a no-inline attribute to the layer functions to prevent them from being inlined? Not sure you really want to do that, but could be an interesting data point.

Use llvm's dominator tree to remove unnecessary conditional run layer…

af31c7b

… calls

chellmuth commented Aug 18, 2023

View reviewed changes

tgrant-nv reviewed Aug 21, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Draft: Remove more unnecessary conditional run layer calls #1713

Draft: Remove more unnecessary conditional run layer calls #1713

chellmuth commented Aug 18, 2023 •

edited

linux-foundation-easycla bot commented Aug 18, 2023

chellmuth Aug 18, 2023

aconty Aug 21, 2023

tgrant-nv commented Aug 21, 2023

tgrant-nv Aug 21, 2023

chellmuth Aug 22, 2023

chellmuth commented Aug 22, 2023

lgritz commented Sep 14, 2023

chellmuth commented Sep 14, 2023

AlexMWells commented Sep 18, 2023

Draft: Remove more unnecessary conditional run layer calls #1713

Are you sure you want to change the base?

Draft: Remove more unnecessary conditional run layer calls #1713

Conversation

chellmuth commented Aug 18, 2023 • edited

Overview

Work in progress

Tests

Checklist:

linux-foundation-easycla bot commented Aug 18, 2023

chellmuth Aug 18, 2023

Choose a reason for hiding this comment

aconty Aug 21, 2023

Choose a reason for hiding this comment

tgrant-nv commented Aug 21, 2023

tgrant-nv Aug 21, 2023

Choose a reason for hiding this comment

chellmuth Aug 22, 2023

Choose a reason for hiding this comment

chellmuth commented Aug 22, 2023

lgritz commented Sep 14, 2023

chellmuth commented Sep 14, 2023

AlexMWells commented Sep 18, 2023

chellmuth commented Aug 18, 2023 •

edited