
Evaluate Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) #1039

zamazan4ik opened this issue on Nov 20, 2023

Hi!

Recently I checked Profile-Guided Optimization (PGO) improvements on multiple projects. The results are here. According to those tests, PGO can help achieve better performance. That's why I think it would be worth trying to optimize tokei with PGO.

I already did some benchmarks and want to share my results.

Test environment

  • Fedora 39
  • Linux kernel 6.5.11
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler: Rustc 1.74
  • tokei version: the latest from the master branch, at commit c8e4d0703252c87b1df45382b365c6bb00769dbe
  • Turbo Boost disabled

Benchmark

For the benchmark, I run tokei on a full checkout of the LLVM project with the command tokei llvm-project. For PGO optimization I use the cargo-pgo tool. The same workload was used for the PGO training phase, running tokei built with cargo pgo build; the PGO-optimized results were obtained with tokei built with cargo pgo optimize build.
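
For reference, a minimal sketch of that cargo-pgo workflow (the training-run binary name mirrors the one used in the benchmarks below; the exact output paths depend on the cargo-pgo setup):

cargo pgo build                                        # build an instrumented tokei
./tokei_instrumented ../../llvm-project > /dev/null    # training run on the same workload
cargo pgo optimize build                               # rebuild tokei using the collected profiles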

Results

I got the following results:

hyperfine --warmup 10 --min-runs 50 './tokei_release ../../llvm-project >> /dev/null' './tokei_optimized ../../llvm-project >> /dev/null'
Benchmark 1: ./tokei_release ../../llvm-project >> /dev/null
  Time (mean ± σ):     630.2 ms ±  15.5 ms    [User: 4380.7 ms, System: 1760.9 ms]
  Range (min … max):   582.3 ms … 666.2 ms    50 runs

Benchmark 2: ./tokei_optimized ../../llvm-project >> /dev/null
  Time (mean ± σ):     576.7 ms ±  16.5 ms    [User: 3227.9 ms, System: 1820.6 ms]
  Range (min … max):   521.0 ms … 608.9 ms    50 runs

Summary
  ./tokei_optimized ../../llvm-project >> /dev/null ran
    1.09 ± 0.04 times faster than ./tokei_release ../../llvm-project >> /dev/null

Just for reference, the timings for tokei in instrumented mode:

hyperfine --warmup 1 --min-runs 1 './tokei_instrumented ../../llvm-project >> /dev/null'
Benchmark 1: ./tokei_instrumented ../../llvm-project >> /dev/null
  Time (abs ≡):        27.329 s               [User: 623.284 s, System: 1.771 s]

At least in the scenario above, PGO helps improve tokei's performance.

Further steps

I can suggest the following action points:

  • Perform more PGO benchmarks on tokei. If they show improvements, add a note to the documentation about the possible performance gains from building tokei with PGO.
  • Provide an easier way (e.g. a build option) to build tokei with PGO. This would be helpful for end users and maintainers, since they could optimize tokei for their own workloads (see the sketch after this list).
  • Optimize the pre-built binaries with PGO.
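
As a hedged sketch, a manual PGO build with plain rustc flags could look roughly like this (the profile directory is illustrative, and cargo-pgo automates the same sequence):

# 1. Build with instrumentation
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release
# 2. Run a representative workload to collect profiles
./target/release/tokei ../../llvm-project > /dev/null
# 3. Merge the raw profiles (llvm-profdata ships with LLVM, or via the llvm-tools-preview rustup component)
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
# 4. Rebuild using the merged profile
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release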

Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.
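
Purely as an illustration, an instrumentation-based BOLT pass could look roughly like this (flag names and the default profile path are assumptions based on current LLVM BOLT documentation; the binary must be linked with relocations preserved, e.g. via -Clink-args=-Wl,--emit-relocs):

# Instrument the release binary
llvm-bolt ./tokei_release -instrument -o ./tokei_bolt_instrumented
# Collect a profile on the same workload (written to /tmp/prof.fdata by default)
./tokei_bolt_instrumented ../../llvm-project > /dev/null
# Apply the profile
llvm-bolt ./tokei_release -o ./tokei_bolt -data=/tmp/prof.fdata -reorder-blocks=ext-tsp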

There are also existing examples of how PGO optimization is integrated into other projects that can serve as a reference.
