Share work across configurations by setting the working directory for actions, then canonicalizing that working directory in RE #611

thetimmorland · 2024-03-28T14:22:24Z

Buck2 struggles to share work between configurations because the configuration hash appears in output artifact paths. Because an output artifact path always appears on the command line, two configurations cannot share an action even if that action's inputs and other command-line arguments are identical. The action digests sent to remote execution (RE) will always differ, sometimes solely because of the output artifact path.

To increase sharing across configurations, one approach would be to:

1. Set up each action with its current working directory set to buck-out/v2/gen/root/$HASH, and relativize all artifact paths accordingly.
2. Enable canonicalization of the working directory in RE (e.g. buck-out/v2/gen/root/$HASH becomes buck-out/v2/gen/root/00000000000000 regardless of the configuration).

To make things more concrete, the action

    PWD=. cc main.c -o buck-out/v2/gen/root/200212f73efcd57d/__main__/main

would become

    PWD=buck-out/v2/gen/root/200212f73efcd57d cc ../../../../../main.c -o __main__/main

but under RE it would actually run as

    PWD=buck-out/v2/gen/root/00000000000000 cc ../../../../../main.c -o __main__/main
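A rough Python sketch of the two steps, for illustration only. The `Action` record, `relativize`, `canonicalize`, and the explicit `path_args` index set are all hypothetical simplifications (real buck2 actions track artifacts structurally rather than as plain strings). It uses `posixpath.relpath` to rewrite project-root-relative paths against the configuration directory, then swaps the trailing hash component of the working directory for the placeholder.

```python
import posixpath
from dataclasses import dataclass

# Placeholder the proposal substitutes for the configuration hash under RE.
CANONICAL_HASH = "00000000000000"


@dataclass(frozen=True)
class Action:
    """Hypothetical, simplified action: a working directory plus argv.

    Paths are relative to the project root; real buck2 actions track
    artifacts structurally instead of as plain strings.
    """
    cwd: str
    argv: tuple


def relativize(action: Action, config_dir: str, path_args: set) -> Action:
    """Step 1: run from config_dir and rewrite the argv entries at path_args."""
    argv = tuple(
        posixpath.relpath(posixpath.join(action.cwd, arg), config_dir)
        if i in path_args
        else arg
        for i, arg in enumerate(action.argv)
    )
    return Action(cwd=config_dir, argv=argv)


def canonicalize(action: Action) -> Action:
    """Step 2: what RE would see, with the trailing hash component replaced."""
    parent = posixpath.dirname(action.cwd)
    return Action(cwd=posixpath.join(parent, CANONICAL_HASH), argv=action.argv)


config_dir = "buck-out/v2/gen/root/200212f73efcd57d"
original = Action(".", ("cc", "main.c", "-o", config_dir + "/__main__/main"))
local = relativize(original, config_dir, path_args={1, 3})
remote = canonicalize(local)
print(local.cwd, " ".join(local.argv))
print(remote.cwd, " ".join(remote.argv))
```

The two printed lines correspond to the second and third commands above. Only the working directory differs between them, which is the part the proposal would have RE canonicalize.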

With these changes, the configuration hash should not affect RE action digests, so if two configurations happen to perform the same action, the duplicate work is avoided. Since the compilation used relative paths, it should be safe to materialize the output into a different directory than the one RE wrote to, without breaking debug info or other path information that may leak into the output artifact.
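A toy Python model of the digest argument, for illustration only. `action_digest` is a hypothetical stand-in: real REAPI digests hash serialized Command/Action protobufs, not JSON. The second configuration hash is made up. The sketch shows that today's command lines give different keys for two configurations, while the relativized command with a canonicalized working directory gives the same key.

```python
import hashlib
import json
import posixpath

CANONICAL_HASH = "00000000000000"  # placeholder for the configuration hash


def action_digest(cwd: str, argv: list, inputs: dict) -> str:
    """Toy stand-in for an RE action digest: SHA-256 of a canonical JSON encoding.

    Real REAPI digests hash serialized Command/Action protobufs; this models
    only the property that every byte of cwd and argv changes the cache key.
    """
    payload = json.dumps({"cwd": cwd, "argv": argv, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def canonical_cwd(cwd: str) -> str:
    # Replace the trailing configuration-hash component with the placeholder.
    return posixpath.join(posixpath.dirname(cwd), CANONICAL_HASH)


inputs = {"main.c": "digest-of-main.c"}  # identical sources in both configurations
hashes = ["200212f73efcd57d", "5e0c9a1d2b3f4a67"]  # second hash is made up

# Today: the output path on the command line embeds the configuration hash.
before = [
    action_digest(".", ["cc", "main.c", "-o", f"buck-out/v2/gen/root/{h}/__main__/main"], inputs)
    for h in hashes
]

# Proposed: relative paths plus a canonicalized working directory.
after = [
    action_digest(
        canonical_cwd(f"buck-out/v2/gen/root/{h}"),
        ["cc", "../../../../../main.c", "-o", "__main__/main"],
        inputs,
    )
    for h in hashes
]

print("digests differ today:", before[0] != before[1])
print("digests match with canonicalization:", after[0] == after[1])
```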

The inspiration for this approach comes from https://github.com/bazelbuild/reclient, which supports a -canonicalize_working_dir flag.

I just wanted to share the idea. If it's a smallish change, maybe I could take a stab at it.

Thanks so much for all your hard work open sourcing buck2!


cjhopman · 2024-03-29T19:32:13Z

I think a challenge here is that basically every action has inputs that are produced by other actions. You would need to find a way to canonicalize those as well, and that's really difficult. Bazel was experimenting with an interesting approach, described here: https://docs.google.com/document/d/17snvmic26-QdGuwVw55Gl0oOufw9sCVuOAvHqGZJFr4/edit#heading=h.5mcn15i0e1ch. I'm not sure what its status is today.

We're planning on experimenting with another approach for this sometime this year. There are a lot of complications (the Bazel doc discusses a few and has comments on some others).

thetimmorland · 2024-03-29T20:08:24Z

> I think a challenge here is that basically every action has inputs that are produced by other actions. You would need to find a way to canonicalize those as well, and that's really difficult.

Assuming the build does not use transitions, wouldn't generated inputs already be canonicalized?

    PWD=buck-out/v2/gen/root/00000000000000 ../../../../../gen_src.py -o __gen_src__/generated.c
    PWD=buck-out/v2/gen/root/00000000000000 cc __gen_src__/generated.c -o __main__/main
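A quick Python sketch (illustration only) of how inputs relativize against the consuming action's working directory, using `posixpath.relpath`. The hash `5e0c9a1d2b3f4a67` is made up and stands for an input produced under a different, transitioned configuration. Under the no-transitions assumption, a generated input from the same configuration comes out hash-free, while an input from another configuration still carries its hash.

```python
import posixpath

# Working directory of the consuming action (the compile step above).
consumer_cwd = "buck-out/v2/gen/root/200212f73efcd57d"
known_hashes = ["200212f73efcd57d", "5e0c9a1d2b3f4a67"]  # second one is made up

# Artifacts given as project-root-relative paths.
artifacts = {
    "source script": "gen_src.py",
    "same-config generated input": f"{consumer_cwd}/__gen_src__/generated.c",
    "transitioned generated input": "buck-out/v2/gen/root/5e0c9a1d2b3f4a67/__gen_src__/generated.c",
}

relative = {name: posixpath.relpath(path, consumer_cwd) for name, path in artifacts.items()}

for name, rel in relative.items():
    # Check whether any configuration hash survives in the rewritten path.
    leaks = any(h in rel for h in known_hashes)
    print(f"{name}: {rel} (leaks a config hash: {leaks})")
```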

cjhopman · 2024-03-29T20:21:11Z

I had assumed you meant a model where the action sees its output at the 00000000000 path but then it gets rewritten to the real path, because keeping the 00000000 path just doesn't work at all. Different actions need to produce different output paths, and that needs to be a deterministic mapping in the context of any possible build.

zjturner · 2024-03-30T21:31:54Z

This sounds very similar to the problem I’ve discussed a bunch of times in the past, where it’s very difficult to apply different configurations per target, so you’re forced into a global configuration that affects every target's hash even when it’s unnecessary.

I've had some success with transition rules that strip out unnecessary constraints, and they’ve discussed implementing a feature called “configuration trimming” to strip unnecessary constraints automatically, but there's no guidance yet on whether or when that will actually happen.

thetimmorland · 2024-04-01T15:58:29Z

> I had assumed you meant a model where the action sees its output at the 00000000000 path but then it gets rewritten to the real path, because keeping the 00000000 path just doesn't work at all. Different actions need to produce different output paths, and that needs to be a deterministic mapping in the context of any possible build.

From buck2's perspective, each action still has a unique output path. However, by applying the transformation described above before sending an action to RE and reversing it when you receive your response (there is some bookkeeping required for this), you are able to share cache hits between configurations by hiding the configuration hash from RE.
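A minimal Python sketch of the kind of bookkeeping this might need, assuming RE reports outputs relative to the working directory. `CanonicalizationLedger`, `submit`, `restore`, and the request ids are hypothetical names for illustration. Entries are keyed per request rather than by action digest, because two configurations now share the same digest.

```python
import posixpath
from dataclasses import dataclass, field

CANONICAL_HASH = "00000000000000"  # placeholder for the configuration hash


@dataclass
class CanonicalizationLedger:
    """Hypothetical client-side bookkeeping for the canonicalized working dir.

    Entries are keyed by a per-request id rather than by action digest,
    because two configurations now share the same digest.
    """
    real_cwd: dict = field(default_factory=dict)

    def submit(self, request_id: str, real_cwd: str) -> str:
        """Remember the real working directory; return the one RE should see."""
        self.real_cwd[request_id] = real_cwd
        return posixpath.join(posixpath.dirname(real_cwd), CANONICAL_HASH)

    def restore(self, request_id: str, outputs: dict) -> dict:
        """Map outputs reported relative to the working dir back to real paths."""
        base = self.real_cwd.pop(request_id)
        return {posixpath.join(base, rel): digest for rel, digest in outputs.items()}


ledger = CanonicalizationLedger()
cwd_a = ledger.submit("req-a", "buck-out/v2/gen/root/200212f73efcd57d")
cwd_b = ledger.submit("req-b", "buck-out/v2/gen/root/5e0c9a1d2b3f4a67")  # made-up hash
assert cwd_a == cwd_b  # RE sees one working directory for both configurations

# Suppose both requests are answered by the same cached result.
cached_outputs = {"__main__/main": "digest-of-main"}
restored_a = ledger.restore("req-a", cached_outputs)
restored_b = ledger.restore("req-b", cached_outputs)
print(restored_a)
print(restored_b)
```

Each configuration gets the shared output materialized under its own buck-out path, so from buck2's side the output paths stay unique.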

> I've had some success with transition rules that strip out unnecessary constraints, and they’ve discussed implementing a feature called “configuration trimming” to strip unnecessary constraints automatically, but there's no guidance yet on whether or when that will actually happen.

Yes, this definitely achieves a similar purpose to configuration trimming, either manual or automatic. I've read your previous threads and they've been very helpful :)
