Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement missing parts of OSS RE caching #477

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

cormacrelf
Copy link
Contributor

This enables action caching on OSS builds. There were two missing pieces:

  • write_action_result, for writing to the action cache. This was the major piece; without it, all you have is a bunch of uploaded CAS blobs for various things, but no way to find them again from the action digest. Now that the action results are uploaded as well, buck can find them when it's querying for cached actions based on the calculated digest alone.
  • upload_blob was a less-exercised piece of code, it was only used for uploading stdout and stderr, and even then, only if they were more than 50KiB. I have added a test for it, but not sure if internal is the right one for it to be in, or when these tests get run.

I've successfully used this implementation with BuildBuddy, both from Linux (using remote_enabled = True), and from macOS, with remote_enabled = False but caching still enabled. I have only tested it with RBE v2.3+ output_paths mode.

As an implementation note, I found it very difficult to even figure out that these parts weren't implemented. Buck2 was just saying "0% cache hits mate" and reporting that every re_action_cache (= attempts to fetch a cached action, not attempts to cache an executed action!) had failed. None of these failures were logged anywhere. I don't know what the appropriate mechanism to log this stuff is, but I whacked a tracing::warn for failed action result cache writes in there.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 5, 2023
@cormacrelf
Copy link
Contributor Author

The TODOs around symlinking are related to #222

Copy link
Contributor

@ndmitchell ndmitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for this, and the great testing you have done3. Two things I'm still figuring out:

  1. Why didn't you get a warn - I've noted where I expected that to pop out. Is that line not being hit for you?
  2. Still figuring out if there are internal implications for upload blob taking a digest, or if it should have done so all along.

is_topologically_sorted: false,
})
})?;
anyhow::Ok(ActionResult {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to not just Ok?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just splatter from copy pasting the function below and inverting every operation, instead of coding it from scratch. I'll tidy this up, a few of these don't even need to be Result.

|| !action_result.output_file_symlinks.is_empty()
|| !action_result.output_directory_symlinks.is_empty()
{
anyhow::bail!(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tend to avoid bail because it can be hard to reason about - easier to see the explicit return

@@ -91,6 +91,9 @@ pub struct TSubsysPerfCount {
pub struct TActionResult2 {
pub output_files: Vec<TFile>,
pub output_directories: Vec<TDirectory2>,
// TODO: output_symlinks (use in preference when output_paths mode is used the execution side)
// TODO: output_file_symlinks (deprecated)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why mark this as a todo? Is it something we should add? Or is this just for when servers return deprecated stuff?

@@ -1141,6 +1143,9 @@ impl RemoteExecutionClientImpl {
..Default::default()
},
)
.inspect_err(|err| {
tracing::warn!("write_action_result failed: {err}");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That line may have run, but I definitely did not see it in the terminal. Thanks for the pointer, I will have a look at why.

@ndmitchell
Copy link
Contributor

Having upload blob take a digest shouldn't be a big deal since I think the amount of data flowing through this API isn't big enough to matter. I can see why this is more convenient and matches the upload API better.

However, once we are going through that path, maybe we should just delete upload_blob entirely and reuse upload? That already takes a InlinedBlobWithDigest so then we could have fewer functions?

@cormacrelf
Copy link
Contributor Author

However, once we are going through that path, maybe we should just delete upload_blob entirely and reuse upload? That already takes a InlinedBlobWithDigest so then we could have fewer functions?

There is a lot of "code overhead" for each piece of GRPC client API you want to expose. There are about 5 layers of nesting that, for fn upload_blob, add exactly nothing at each layer. Each layer basically does "return this.field.upload_blob(...)". So it would certainly be economical to delete one of those function stacks. However I imagine there are opportunities to collapse the nesting and make it less costly to maintain in the first place. I might report back on that.

@ndmitchell
Copy link
Contributor

Some of these layers of forwarding are so we can inject the internal closed source RE client (closed source because it uses a lot of systems that aren't open source, rather than because it has magical properties). They might need to stay (or might not).

@benbrittain
Copy link
Contributor

Is there anything I can do here to help push this to being landed? I'm happy to finish the PR if you'd like @cormacrelf

@cormacrelf
Copy link
Contributor Author

Hi @benbrittain, I haven't had any time to work on this recently. I've given you push access to my fork cormacrelf/buck2, so you can keep working on the rbe-writes branch.

@miaoyipu
Copy link
Contributor

miaoyipu commented May 1, 2024

Is there anything I can do here to help push this to being landed? I'm happy to finish the PR if you'd like @cormacrelf

@cormacrelf @benbrittain any update on this PR? This might help our RE execution efficiency, and I can help if no one is working on it.

@benbrittain
Copy link
Contributor

Hey @miaoyipu! It's on my near term TODO but I'm certainly not against you picking it up. If you do so though, let me know :)

@avdv
Copy link
Contributor

avdv commented May 10, 2024

Hi @cormacrelf

I've successfully used this implementation with BuildBuddy, both from Linux (using remote_enabled = True), and from macOS, with remote_enabled = False but caching still enabled. I have only tested it with RBE v2.3+ output_paths mode.

I am also trying to use this with Buildbuddy and disabled remote_enabled on Linux but I cannot get it to work. Could you share your config so I can compare that to mine? Thank you!

\edit: Never mind, it's working now. Thank you for the implementation @cormacrelf!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants