Add support for more PGO modes #33

Open
zamazan4ik opened this issue Apr 1, 2023 · 6 comments

@zamazan4ik

Hi!

Do you plan to add support to cargo-pgo for additional PGO modes?

  • AutoFDO (PGO via sampling instead of instrumentation): https://github.com/google/autofdo . rustc already supports it: rust-lang/rust@a17193d. AutoFDO could be useful for users who want to gather profiles directly from a production environment, without building slow instrumentation-only binaries.
  • BOLT in perf (sampling) mode instead of instrumentation, for the same reasons as AutoFDO. Yes, I understand that it's a Linux-only feature, but still, we have a lot of Linux users.
  • Propeller (an alternative approach to BOLT, from Google). It is now a part of AutoFDO. However, I haven't tested it with Rust binaries yet.
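
For context, one sampling-based (AutoFDO) cycle could look roughly like the sketch below. It only prints the commands instead of running them. It assumes a nightly toolchain with the unstable `-Zdebug-info-for-profiling` and `-Zprofile-sample-use` flags, Linux `perf` with branch sampling, and `create_llvm_prof` from the AutoFDO repository. The helper name `autofdo_plan`, the binary path and the profile file names are made up for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands of one AutoFDO cycle instead of running them.
# Assumes nightly rustc (unstable -Z flags), Linux perf, and create_llvm_prof.
# autofdo_plan and all paths are hypothetical.
autofdo_plan() {
    bin=$1    # binary to profile, e.g. target/release/app
    prof=$2   # sample profile to produce, e.g. app.prof

    # 1. Build with extra debug info so samples map back to source locations.
    echo "RUSTFLAGS='-Zdebug-info-for-profiling' cargo +nightly build --release"
    # 2. Sample the running binary; -b asks perf to record taken branches.
    echo "perf record -b -o perf.data -- $bin"
    # 3. Convert the perf samples into an LLVM sample profile.
    echo "create_llvm_prof --binary=$bin --profile=perf.data --out=$prof"
    # 4. Rebuild, this time feeding the sample profile to the compiler.
    echo "RUSTFLAGS='-Zdebug-info-for-profiling -Zprofile-sample-use=$prof' cargo +nightly build --release"
}

autofdo_plan target/release/app app.prof
```

The point of the sketch is that no instrumented build appears anywhere in the cycle: the binary that is sampled in step 2 is an ordinary release build.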

And thank you again for cargo-pgo - it's much easier to apply PGO to Rust-based binaries with it :)

@Kobzol
Owner

Kobzol commented Apr 1, 2023

Hi, I don't have any experience with AutoFDO, but if it's integrated at least partly into rustc, I'll try to take a look at how hard it would be to support it.

Regarding BOLT, it's currently Linux-only for instrumentation as well, so that's not a problem for me :) However, I have an AMD CPU that doesn't support LBR profiling with perf for BOLT, so I couldn't test this locally :/ I'll try to use some older PC for that.

As for Propeller, I consider it to be deprecated in favour of BOLT.

@zamazan4ik
Author

> Hi, I don't have any experience with AutoFDO, but if it's integrated at least partly into rustc, I'll try to take a look at how hard it would be to support it.

Would be great!

> However, I have an AMD CPU that doesn't support LBR profiling with perf for BOLT, so I couldn't test this locally :/ I'll try to use some older PC for that.

Well, you can profile even without LBR support: BOLT will still accept the recording as valid input (e.g. I did the same thing here - ydb-platform/ydb#140 (comment) - on my AMD Ryzen 9 5900X and Fedora 37 setup). I think the optimization results in this case are worse than with LBR support, but... I have no desire to change my CPU just for that :)
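
A minimal sketch of the two sampling variants, printed as a dry run rather than executed. The `perf`, `perf2bolt` and `llvm-bolt` options follow the usage shown in the BOLT README as far as I know (in particular `-nl` for profiles recorded without LBR); the helper `bolt_perf_plan` and the file names are hypothetical, and function reordering assumes the binary was linked with relocations preserved.

```shell
#!/bin/sh
# Dry-run sketch: prints the perf + BOLT commands for sampling mode.
# bolt_perf_plan is a hypothetical helper; pass "lbr" or "nolbr" as the mode.
bolt_perf_plan() {
    bin=$1
    mode=$2

    if [ "$mode" = "lbr" ]; then
        # With LBR: record branch stacks (-j any,u) along with the samples.
        echo "perf record -e cycles:u -j any,u -o perf.data -- $bin"
        echo "perf2bolt -p perf.data -o perf.fdata $bin"
    else
        # Without LBR: plain cycle samples, and -nl tells perf2bolt so.
        echo "perf record -e cycles:u -o perf.data -- $bin"
        echo "perf2bolt -p perf.data -o perf.fdata -nl $bin"
    fi
    # Optimize the binary using the converted profile.
    echo "llvm-bolt $bin -o $bin.bolt -data=perf.fdata -reorder-blocks=ext-tsp -reorder-functions=hfsort -split-functions -dyno-stats"
}

bolt_perf_plan ./app nolbr
```

The only difference between the two paths is how the profile is recorded and converted; the final `llvm-bolt` invocation is the same.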

> As for Propeller, I consider it to be deprecated in favour of BOLT.

Nope :) This tool is not deprecated, at least from its developers' point of view. E.g. you could check the most recent results here - this paper is from 2023. And the Propeller developers are planning to integrate the tool into LLVM, as has already been done for BOLT. From my understanding, maybe one day BOLT and Propeller will somehow be merged into one tool (hopefully somewhere directly in the LLVM linkers and/or PGO infra), but when we will get it... I cannot predict.

@Kobzol
Owner

Kobzol commented Apr 1, 2023

> Nope :) This tool is not deprecated, at least from its developers' point of view. E.g. you could check the most recent results here - this paper is from 2023. And the Propeller developers are planning to integrate the tool into LLVM, as has already been done for BOLT. From my understanding, maybe one day BOLT and Propeller will somehow be merged into one tool (hopefully somewhere directly in the LLVM linkers and/or PGO infra), but when we will get it... I cannot predict.

Oh, I didn't know that, maybe I mistook it for another tool. Well, if they have an easy way of profiling/instrumenting and optimizing binaries, a reasonable deployment mechanism and some documentation, I'm not opposed :) But I definitely do not plan to do shenanigans in this tool to build and support it if its code and usage are in the typical software research open-source state 😅 I wonder if I could somehow generalize the BOLT support in cargo-pgo so that users could "plug in" their own instrumenter and optimizer, for any tool they want 🤔.

@zamazan4ik
Author

> But I definitely do not plan to do shenanigans in this tool to build and support it if its code and usage are in the typical software research open-source state

Agreed :) However, Propeller is the default post-link optimization tool at Google right now (integrated into their build pipelines for a bunch of their services, etc.), so I think it's quite usable in real life, not just a "usual research tool" :) Hopefully, this repo can clarify what needs to be done to apply Propeller to a real application (Clang, in the provided case).

> I wonder if I could somehow generalize the BOLT support in cargo-pgo so that users could "plug in" their own instrumenter and optimizer, for any tool they want

That's a good question to think about. IMHO it would be quite difficult to provide a stable enough generalization over these tools, since they are evolving and quite unstable (at least from a public interface point of view). E.g. the BOLT team is now working on a new approach called "Lightning BOLT", which would probably change interfaces or add one more mode (in addition to VESPA). Propeller is going to be merged into LLVM in some form, which could also change how we should use it.

Propeller has one advantage over BOLT right now: a much smaller memory usage spike. Even on my machine with 32 GiB of RAM I am not always able to BOLTify my app due to OOM. Another pain point: BOLT has weak support for architectures other than x86-64 (but they are working on it).

@zamazan4ik
Author

By the way, I am working on gathering all available information regarding PGO in one place - https://github.com/ZaMaZaN4iK/awesome-pgo . Maybe some of the links would be interesting reading for you (although I think you have already read almost all of them :)

@Kobzol
Owner

Kobzol commented Apr 6, 2023

This is what I currently do for BOLT instrumentation:

  1. Set specific linker flags
  2. Clear profile directory
  3. Run a command for each built binary to instrument it

And for BOLT optimization:

  1. Set specific linker flags
  2. Merge profiles
  3. Run a command for each built binary to optimize it
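
The two step lists above can be sketched as a dry run that prints one command per step. This is an illustration under assumptions, not cargo-pgo's actual implementation: the helper names and the profile directory layout are hypothetical, `-Wl,-q` is assumed to be the relocation-preserving linker flag in question, and the `llvm-bolt` and `merge-fdata` options are the ones I know from the BOLT documentation.

```shell
#!/bin/sh
# Dry-run sketch of the two BOLT phases; commands are printed, not run.
# bolt_instrument_plan / bolt_optimize_plan and the paths are hypothetical.
bolt_instrument_plan() {
    profile_dir=$1; shift
    # Step 1: linker flag so the binary keeps relocations for BOLT.
    echo "RUSTFLAGS='-C link-args=-Wl,-q' cargo build --release"
    # Step 2: start from an empty profile directory.
    echo "rm -rf $profile_dir && mkdir -p $profile_dir"
    # Step 3: instrument every built binary.
    for bin in "$@"; do
        echo "llvm-bolt $bin -instrument -instrumentation-file=$profile_dir/$(basename "$bin").fdata -o $bin-instrumented"
    done
}

bolt_optimize_plan() {
    profile_dir=$1; shift
    # Step 1: same linker flag as in the instrumentation phase.
    echo "RUSTFLAGS='-C link-args=-Wl,-q' cargo build --release"
    # Step 2: merge all gathered profiles into one file.
    echo "merge-fdata $profile_dir/*.fdata > $profile_dir/merged.fdata"
    # Step 3: optimize every built binary with the merged profile.
    for bin in "$@"; do
        echo "llvm-bolt $bin -data=$profile_dir/merged.fdata -o $bin-optimized"
    done
}

bolt_instrument_plan target/bolt-profiles target/release/app
bolt_optimize_plan target/bolt-profiles target/release/app
```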

In theory, we could add something like this:

$ cargo pgo custom instrument -- ./instrument.sh

# instrument.sh
propeller instrument $INPUT_BINARY

$ cargo pgo custom optimize -- ./optimize.sh

# optimize.sh
propeller optimize $INPUT_BINARY

If specific compiler/linker flags are needed, they can be passed with RUSTFLAGS=... cargo pgo custom optimize.

Of course, with this generic approach, the user would be responsible for gathering and managing the profiles and writing the instrumentation and optimization scripts. Basically the only added value of cargo pgo would be to serve as a wrapper over cargo. It would pass all the binaries that should be instrumented/optimized to the custom optimization tool. I'm not sure if that is a useful enough feature to add support for custom optimization backends though.
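
As a rough sketch of that wrapper role: the function below calls a user-supplied step once per built binary and passes the path through `INPUT_BINARY`, as in the scripts above. `run_custom_step` is hypothetical and not an existing cargo-pgo command; finding the binaries and managing profiles is left out on purpose.

```shell
#!/bin/sh
# Sketch of a generic "custom" step: call a user-supplied command once per
# built binary, passing the path via INPUT_BINARY.
# run_custom_step is hypothetical; it is not an existing cargo-pgo command.
run_custom_step() {
    step=$1; shift
    for bin in "$@"; do
        # Stop at the first failing binary so errors are not silently skipped.
        INPUT_BINARY=$bin "$step" || return 1
    done
}

# Demo step that only reports what it would process. A script path such as
# ./optimize.sh could be passed in the same way.
demo_step() {
    echo "would optimize $INPUT_BINARY"
}

run_custom_step demo_step target/release/a target/release/b
```

Everything tool-specific stays inside the user's step; the wrapper only decides which binaries are passed to it and stops on the first failure.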
