Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[InstCombine] May make task closures large by sinking instructions from parent to child tasks #80

Open
VictorYing opened this issue Nov 29, 2018 · 2 comments

Comments

@VictorYing
Copy link
Contributor

VictorYing commented Nov 29, 2018

InstructionCombining sinks any instruction it can if that instruction is only used in a single successor of it's current basic block:

// If the user is one of our immediate successors, and if that successor
// only has us as a predecessors (we'd have to split the critical edge
// otherwise), we can keep going. Don't do this if the successor
// follows through a sync instruction, because that's a pessimization.
if (UserIsSuccessor && UserParent->getUniquePredecessor() &&
!isa<SyncInst>(BB->getTerminator())) {
// Okay, the CFG is simple enough, try to sink this instruction.
if (TryToSinkInstruction(I, UserParent)) {
DEBUG(dbgs() << "IC: Sink: " << *I << '\n');
MadeIRChange = true;

This presents significant problems for me, when compiling Swarm tasks. Consider a task that starts out like

foo = a + b * c + d * e + f * g + ....;
spawn {
  some_var = foo;
}

and assume all those variables a, b, c, d, etc. are local values held in registers, so in the Tapir SSA form, there are no memory operations here other than the store inside the task to the variable some_var, which is the only variable here that is not promoted to registers. This task will have a very tiny closure, containing the value tmp computed in the parent. This is very good for targets such as Swarm that efficiently support tasks with very small numbers of arguments.

But InstCombine sees all the arithmetic instructions are used only inside the detached block and sinks the computation there, producing a task like

spawn {
  some_var = a + b * c + d * e + f * g + ....;
}

Now this is a task with many inputs that need to be captured from its parent task and passed to the child task, in order for the child task to perform the arithmetic. This significantly increases the task overhead.

On the other hand, there are certainly cases where sinking instructions from a parent task to a child task is a good idea that shortens the critical path, so this seems like a problem where you'd want to come up with better heuristics to determine what instructions to sink (which, on the Swarm side of things, we've started to think about: https://github.mit.edu/swarm/Swarm-IR/blob/77c721b9f4bb279331daa4fd9ba12868fec75af3/lib/Transforms/Utils/SwarmUtils.cpp#L565-L586)

This is probably a much less important optimization issue for Tapir's compilation of Cilk tasks, which are at least usually a little coarser grained, compared to my compilation of Swarm tasks, which tend to be very fine grained and I legitimately frequently have tasks that store to a single location, so moving extra instructions around really affects task overheads which in turn can easily cause large changes in work efficiency. Nonetheless, I think this problem isn't fundamentally Swarm-specific. You can see my patch where I simply killed off instruction sinking through detaches here: https://github.mit.edu/swarm/Swarm-IR/commit/809027c0477795f1fe26f0b159426a8648321151

@neboat
Copy link
Collaborator

neboat commented Nov 29, 2018

Thanks for pointing this out.

I don't think Tapir should disable instruction sinking across detaches in general. Without taking task-invocation overhead into account, this form of instruction sinking on its own is a win in terms of parallelism: it does not increase the asymptotic work, and it can only reduce the critical-path length (span). In addition, instruction sinking often facilitates other optimizations, such as other instruction combining or simplification, which can reduce the work of the computation.

Task-invocation overhead does mess with this analysis. Although instruction sinking alone shouldn't affect asymptotic work (and can only reduce the span), more instructions might be needed in practice to pass the function arguments. Hence some form of improved cost analysis would be desirable, as you mentioned. The ability for instruction sinking to enable other compiler optimizations seems to complicate the cost analysis.

It's probably worth considering adding cost analysis based on the target parallel-runtime library, which should probably be encoded in TargetLibraryInfo. For runtimes such as Swarm, instruction sinking might be harder to justify in terms of cost. For other runtimes it might not matter as much. In fact, for some runtimes, it may be an optimization to serialize tasks as small as those in your example, due to the task-invocation overhead.

@VictorYing
Copy link
Contributor Author

VictorYing commented Dec 4, 2018

I agree it's an interesting trade-off. One thing to consider is that we're not talking about sinking memory operations or any operation with side effects: TryToSinkInstruction already refuses to move those across task boundaries. We're talking therefore mainly about sinking fairly cheap arithmetic instructions or things like address computations. At least in Swarm, and I imagine in other parallel systems as well, computation is cheap, memory accesses and communication/synchronization are expensive, and saving on task communication bandwidth tends to be a win.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants