Early-building some parts of graph #606

dmezh · 2024-03-16T16:57:53Z

Thanks for the incredible work. I have a question about action build order that I'm looking for some insight on:

Say I have a build graph that looks like:

actions 0-100 --------------------> \
``````````````````````````````````` -----> cli //:target I asked for
action 101 --> actions 102-200 ---> /

Where all of these actions are CPU-bound but where action 101 can't saturate the available CPU / is not perfectly parallel. In my case this is a bunch of C objects that can be compiled right away, plus a bunch of C objects that depend on the output of a code generator first, and then combining those into a binary I ask for on the command line with buck2 build.

Today, I notice buck2 does not look like it builds action 101 with some kind of priority given that it's blocking off a large part of the action graph. If action 101 takes a while, we sit there waiting for it somewhere in the "middle" of the build, whereas we could have frontloaded it by building it earlier in the build. I can do this: (//:long_step builds action 101):

buck2 build //:long_step //:target

And buck2 does start running action 101 earlier and the build finishes faster overall because less time is spent with <100% resource utilization waiting for a blocking action to complete and unblock the rest of the build.

Does buck2 know how to execute the graph in this kind of order where we prioritize steps that block larger numbers of other steps? This makes a material impact to build times; 24 -> 19 seconds in an actual project. I can make a reproducer if you'd like.

The text was updated successfully, but these errors were encountered:

JakobDegen · 2024-03-16T20:53:34Z

Does buck2 know how to execute the graph in this kind of order where we prioritize steps that block larger numbers of other steps?

No, it does not. If I remember correctly, local action execution ordering is currently just a super naive FIFO queue.

For a long time, this wasn't a problem that we cared about for Meta-internal use cases because RE gives you unbounded parallelism. However, we have recently run into cases where we know this causes a slowdown for us too, and so there's definite interest in being smarter.

It's not something that'd really be hard to change implementation wise, the difficult part is just figuring out what a better alternative actually would be. I think the currently favored suggestion was to try round-robin by action category

zjturner · 2024-03-17T01:14:40Z

Why not a priority queue by number of nodes waiting (directly or indirectly) on the node?

JakobDegen · 2024-03-17T07:41:11Z

There's a couple problems with implementing that:

Dynamic outputs means you can't even know that number. You could assume that each dynamic output is just one action, but that's really really wrong in some cases.
Dataflow doesn't really work that way in buck, action execution (and other dice nodes) can't do arbitrary inspection of their rdeps.
It's not obvious how you can implement such a thing in a non-quadratic way, keeping in mind that because of concurrent commands new rdeps can appear or existing ones can disappear at any time

Even if you ignore that though, it's not clear to me that it'd be better. If you have one part of the graph which is basically just a path, and another part which is fairly flat but has a large total number of actions, it's pretty easy to continue making the same mistake

dmezh · 2024-03-21T18:59:33Z

@JakobDegen yeah I figured there could be a complexity constraint on doing this. Could there be a parameter on a rule definition or invocation that just compels buck2 to prioritize building it?

Wrt a priority queue - is it maybe possible to do this on a best-effort basis? Sorry, I'm not very familiar with buck2 internals. Is there some kind of limited window of context we can use to inform ordering of action execution without necessarily doing really deep inspection?

For some context, we are currently using local execution but are planning on using RE. We are going to do a kind of "tiered RE" where individual devs will run a RE server on their local machines that can then delegate to a much larger remote one when it is available. We need to be able to build with no network connection regularly. So we're sometimes running with, sometimes without ~unbounded parallelism available.

cjhopman · 2024-03-29T20:34:28Z

I think we're going to try to improve this in the next several weeks. It's likely that even something fairly naive is going to be able to get most of the potential improvement here, but it's possible that's wrong. I'd say let's wait for those improvements and then check if the problem is resolved for you or if there's remaining opportunity here.

JakobDegen · 2024-03-30T05:13:35Z

Could there be a parameter on a rule definition or invocation that just compels buck2 to prioritize building it?

If you had such code written already, I wouldn't personally mind accepting a PR. That being said, I don't think that implementing that would be easier (and actually probably harder) than some of the other approaches that don't require by-hand maintenance. So I agree with Chris, we should try the simpler thing first and then see where to go from there.

dmezh · 2024-04-02T02:26:05Z

Got it, thanks guys. I don't have it written already -- if you're interested and short on time, we can try to pick this up on our end and give a shot at a PR.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Early-building some parts of graph #606

Early-building some parts of graph #606

dmezh commented Mar 16, 2024

JakobDegen commented Mar 16, 2024

zjturner commented Mar 17, 2024

JakobDegen commented Mar 17, 2024

dmezh commented Mar 21, 2024

cjhopman commented Mar 29, 2024

JakobDegen commented Mar 30, 2024

dmezh commented Apr 2, 2024

Early-building some parts of graph #606

Early-building some parts of graph #606

Comments

dmezh commented Mar 16, 2024

JakobDegen commented Mar 16, 2024

zjturner commented Mar 17, 2024

JakobDegen commented Mar 17, 2024

dmezh commented Mar 21, 2024

cjhopman commented Mar 29, 2024

JakobDegen commented Mar 30, 2024

dmezh commented Apr 2, 2024