Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Early-building some parts of graph #606

Open
dmezh opened this issue Mar 16, 2024 · 7 comments
Open

Early-building some parts of graph #606

dmezh opened this issue Mar 16, 2024 · 7 comments

Comments

@dmezh
Copy link

dmezh commented Mar 16, 2024

Thanks for the incredible work. I have a question about action build order that I'm looking for some insight on:

Say I have a build graph that looks like:

actions 0-100 --------------------> \
``````````````````````````````````` -----> cli //:target I asked for
action 101 --> actions 102-200 ---> /

Where all of these actions are CPU-bound but where action 101 can't saturate the available CPU / is not perfectly parallel. In my case this is a bunch of C objects that can be compiled right away, plus a bunch of C objects that depend on the output of a code generator first, and then combining those into a binary I ask for on the command line with buck2 build.

Today, I notice buck2 does not look like it builds action 101 with some kind of priority given that it's blocking off a large part of the action graph. If action 101 takes a while, we sit there waiting for it somewhere in the "middle" of the build, whereas we could have frontloaded it by building it earlier in the build. I can do this: (//:long_step builds action 101):

buck2 build //:long_step //:target

And buck2 does start running action 101 earlier and the build finishes faster overall because less time is spent with <100% resource utilization waiting for a blocking action to complete and unblock the rest of the build.

Does buck2 know how to execute the graph in this kind of order where we prioritize steps that block larger numbers of other steps? This makes a material impact to build times; 24 -> 19 seconds in an actual project. I can make a reproducer if you'd like.

@JakobDegen
Copy link
Contributor

Does buck2 know how to execute the graph in this kind of order where we prioritize steps that block larger numbers of other steps?

No, it does not. If I remember correctly, local action execution ordering is currently just a super naive FIFO queue.

For a long time, this wasn't a problem that we cared about for Meta-internal use cases because RE gives you unbounded parallelism. However, we have recently run into cases where we know this causes a slowdown for us too, and so there's definite interest in being smarter.

It's not something that'd really be hard to change implementation wise, the difficult part is just figuring out what a better alternative actually would be. I think the currently favored suggestion was to try round-robin by action category

@zjturner
Copy link
Contributor

Why not a priority queue by number of nodes waiting (directly or indirectly) on the node?

@JakobDegen
Copy link
Contributor

There's a couple problems with implementing that:

  1. Dynamic outputs means you can't even know that number. You could assume that each dynamic output is just one action, but that's really really wrong in some cases.
  2. Dataflow doesn't really work that way in buck, action execution (and other dice nodes) can't do arbitrary inspection of their rdeps.
  3. It's not obvious how you can implement such a thing in a non-quadratic way, keeping in mind that because of concurrent commands new rdeps can appear or existing ones can disappear at any time

Even if you ignore that though, it's not clear to me that it'd be better. If you have one part of the graph which is basically just a path, and another part which is fairly flat but has a large total number of actions, it's pretty easy to continue making the same mistake

@dmezh
Copy link
Author

dmezh commented Mar 21, 2024

@JakobDegen yeah I figured there could be a complexity constraint on doing this. Could there be a parameter on a rule definition or invocation that just compels buck2 to prioritize building it?

Wrt a priority queue - is it maybe possible to do this on a best-effort basis? Sorry, I'm not very familiar with buck2 internals. Is there some kind of limited window of context we can use to inform ordering of action execution without necessarily doing really deep inspection?

For some context, we are currently using local execution but are planning on using RE. We are going to do a kind of "tiered RE" where individual devs will run a RE server on their local machines that can then delegate to a much larger remote one when it is available. We need to be able to build with no network connection regularly. So we're sometimes running with, sometimes without ~unbounded parallelism available.

@cjhopman
Copy link
Contributor

I think we're going to try to improve this in the next several weeks. It's likely that even something fairly naive is going to be able to get most of the potential improvement here, but it's possible that's wrong. I'd say let's wait for those improvements and then check if the problem is resolved for you or if there's remaining opportunity here.

@JakobDegen
Copy link
Contributor

Could there be a parameter on a rule definition or invocation that just compels buck2 to prioritize building it?

If you had such code written already, I wouldn't personally mind accepting a PR. That being said, I don't think that implementing that would be easier (and actually probably harder) than some of the other approaches that don't require by-hand maintenance. So I agree with Chris, we should try the simpler thing first and then see where to go from there.

@dmezh
Copy link
Author

dmezh commented Apr 2, 2024

Got it, thanks guys. I don't have it written already -- if you're interested and short on time, we can try to pick this up on our end and give a shot at a PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants