
Evaluate Profile-Guided Optimization (PGO) and Post-Link Optimization (PLO) #1039

zamazan4ik opened this issue on Nov 20, 2023

Hi!

Recently I checked Profile-Guided Optimization (PGO) improvements on multiple projects. The results are here. According to those tests, PGO can help achieve better performance. That's why I think it would be worth trying to optimize tokei with PGO.

I already did some benchmarks and want to share my results.

Test environment

  • Fedora 39
  • Linux kernel 6.5.11
  • AMD Ryzen 9 5900X
  • 48 GiB RAM
  • SSD Samsung 980 Pro 2 TiB
  • Compiler: Rustc 1.74
  • tokei version: the latest from the master branch, at commit c8e4d0703252c87b1df45382b365c6bb00769dbe
  • Turbo Boost disabled

Benchmark

For the benchmark, I run tokei on a full checkout of the LLVM project with the command tokei llvm-project. For PGO optimization I use the cargo-pgo tool. The same workload was used for the PGO training phase, running tokei built with cargo pgo build; the PGO-optimized results were obtained with tokei built with cargo pgo optimize build.
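
For reference, a minimal sketch of that cargo-pgo workflow (the training-run binary name mirrors the one used in the benchmarks below; the exact output paths depend on the cargo-pgo setup):

cargo pgo build                                        # build an instrumented tokei
./tokei_instrumented ../../llvm-project > /dev/null    # training run on the same workload
cargo pgo optimize build                               # rebuild tokei using the collected profiles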

Results

I got the following results:

hyperfine --warmup 10 --min-runs 50 './tokei_release ../../llvm-project >> /dev/null' './tokei_optimized ../../llvm-project >> /dev/null'
Benchmark 1: ./tokei_release ../../llvm-project >> /dev/null
  Time (mean ± σ):     630.2 ms ±  15.5 ms    [User: 4380.7 ms, System: 1760.9 ms]
  Range (min … max):   582.3 ms … 666.2 ms    50 runs

Benchmark 2: ./tokei_optimized ../../llvm-project >> /dev/null
  Time (mean ± σ):     576.7 ms ±  16.5 ms    [User: 3227.9 ms, System: 1820.6 ms]
  Range (min … max):   521.0 ms … 608.9 ms    50 runs

Summary
  ./tokei_optimized ../../llvm-project >> /dev/null ran
    1.09 ± 0.04 times faster than ./tokei_release ../../llvm-project >> /dev/null

Just for reference, the timings for tokei in instrumented mode:

hyperfine --warmup 1 --min-runs 1 './tokei_instrumented ../../llvm-project >> /dev/null'
Benchmark 1: ./tokei_instrumented ../../llvm-project >> /dev/null
  Time (abs ≡):        27.329 s               [User: 623.284 s, System: 1.771 s]

At least in the scenario above, PGO helps improve tokei's performance.

Further steps

I can suggest the following action points:

  • Perform more PGO benchmarks on tokei. If they show improvements, add a note to the documentation about the possible performance gains from building tokei with PGO.
  • Provide an easier way (e.g. a build option) to build tokei with PGO. This would be helpful for end users and maintainers, since they could optimize tokei for their own workloads (see the sketch after this list).
  • Optimize the pre-built binaries with PGO.
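
As a hedged sketch, a manual PGO build with plain rustc flags could look roughly like this (the profile directory is illustrative, and cargo-pgo automates the same sequence):

# 1. Build with instrumentation
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo build --release
# 2. Run a representative workload to collect profiles
./target/release/tokei ../../llvm-project > /dev/null
# 3. Merge the raw profiles (llvm-profdata ships with LLVM, or via the llvm-tools-preview rustup component)
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data
# 4. Rebuild using the merged profile
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo build --release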

Testing Post-Link Optimization techniques (like LLVM BOLT) would be interesting too (Clang and Rustc already use BOLT in addition to PGO), but I recommend starting with regular PGO.
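
Purely as an illustration, an instrumentation-based BOLT pass could look roughly like this (flag names and the default profile path are assumptions based on current LLVM BOLT documentation; the binary must be linked with relocations preserved, e.g. via -Clink-args=-Wl,--emit-relocs):

# Instrument the release binary
llvm-bolt ./tokei_release -instrument -o ./tokei_bolt_instrumented
# Collect a profile on the same workload (written to /tmp/prof.fdata by default)
./tokei_bolt_instrumented ../../llvm-project > /dev/null
# Apply the profile
llvm-bolt ./tokei_release -o ./tokei_bolt -data=/tmp/prof.fdata -reorder-blocks=ext-tsp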

There are also existing examples of how PGO optimization is integrated into other projects that can serve as a reference.
