Raphael Carvalho edited this page Jul 30, 2021 · 7 revisions

Quick n' dirty

Record profiling data:

$ perf record --call-graph=dwarf ./build/release/scylla # Profile entire run
$ perf record --call-graph dwarf -p $(pgrep scylla) # Attach to running process

Analyze the recorded data:

$ perf script -i perf.data > out.perf
$ stackcollapse-perf.pl out.perf > out.folded # This might take a long time
$ flamegraph.pl out.folded > out.svg

The tools stackcollapse-perf.pl and flamegraph.pl are from the FlameGraph project.
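
The analysis steps above can be wrapped in a small shell function. This is only a sketch: make_flamegraph is a hypothetical name, not part of perf or FlameGraph, and it assumes perf, stackcollapse-perf.pl and flamegraph.pl are all on PATH. It pipes perf script directly into the collapser, which skips the intermediate out.perf file.

```shell
# Hypothetical helper (not part of perf or FlameGraph): turn a perf
# recording into an SVG flamegraph.
# Assumes perf, stackcollapse-perf.pl and flamegraph.pl are on PATH.
make_flamegraph() {
    input="${1:-perf.data}"          # recording written by perf record
    output="${2:-out.svg}"           # SVG file to create
    folded="${output%.svg}.folded"   # folded stacks, kept for reuse

    # Fail early with a clear message if a tool is missing.
    for tool in perf stackcollapse-perf.pl flamegraph.pl; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing tool: $tool" >&2
            return 1
        }
    done

    # Stream perf script output into the collapser; this step can be slow.
    perf script -i "$input" | stackcollapse-perf.pl > "$folded" || return 1

    # Render the folded stacks as an SVG.
    flamegraph.pl "$folded" > "$output"
}
```

For example, make_flamegraph perf.data scylla.svg would leave scylla.folded and scylla.svg in the current directory. Keeping the folded file makes it cheap to re-render the SVG with different flamegraph.pl options.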

Note: with --call-graph dwarf, perf copies only the top 8 kB of the user stack per sample by default. Deeper stacks are truncated, which can make the resulting flamegraphs incomplete and mostly useless. Raising the limit may help, e.g.: perf record --call-graph dwarf,65000 <path to scylla>.
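
A small helper can pick a valid stack dump size for the dwarf mode. This sketch assumes perf rounds the size up to a multiple of 8 bytes and rejects anything above 65528 bytes; those limits come from reading the perf sources, so check them against your perf version. dwarf_stack_size is a hypothetical name.

```shell
# Hypothetical helper: print a stack dump size suitable for
# "perf record --call-graph dwarf,<size>".
# Assumption: perf wants a multiple of 8 bytes, at most 65528.
dwarf_stack_size() {
    want="$1"                        # desired size in bytes
    max=65528                        # assumed upper limit
    size=$(( (want + 7) / 8 * 8 ))   # round up to a multiple of 8
    if [ "$size" -gt "$max" ]; then
        size="$max"                  # clamp to the assumed maximum
    fi
    echo "$size"
}
```

It could be used as: perf record --call-graph dwarf,$(dwarf_stack_size 65000) -p $(pgrep scylla). Larger dumps make perf.data grow faster, so the maximum is not always the best choice.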

Other tools

  • Hotspot provides a GUI for analysis of perf data.
  • pmu-tools contains, among other things:
    • ocperf.py, which simplifies access to CPU-model-specific counters
    • toplev.py, for top-down analysis (more here)
  • Intel VTune provides functionality similar to the perf + Hotspot pair and has a free community licence that permits commercial use.
  • Tomek's scripts provide tools for analyzing scheduling issues.
