Perf
Raphael Carvalho edited this page Jul 30, 2021 · 7 revisions
Record profiling data:
$ perf record --call-graph=dwarf ./build/release/scylla # Profile entire run
$ perf record --call-graph dwarf -p $(pgrep scylla) # Attach to running process
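When attaching to a running process, recording continues until perf is interrupted, and DWARF call graphs can make perf.data grow quickly. A common idiom is to append `-- sleep N` so that recording stops after N seconds. The sketch below wraps that idiom; the function name `profile_pid` and the `DRY_RUN` variable are made up for illustration and are not part of perf.

```shell
# Hypothetical helper (not part of perf): attach to a running process
# and record for a fixed number of seconds.
# Usage: profile_pid PID SECONDS
profile_pid() {
    pp_pid=$1
    pp_secs=$2
    # "-- sleep N" makes perf record stop when the sleep command exits.
    # If DRY_RUN is set and non-empty, the command is printed instead of run.
    ${DRY_RUN:+echo} perf record --call-graph dwarf -p "$pp_pid" -- sleep "$pp_secs"
}
```

For example, `profile_pid "$(pgrep -o scylla)" 30` would record the oldest matching scylla process for about 30 seconds, assuming a single process of interest.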
Analyze the recorded data:
$ perf script -i perf.data > out.perf # -i names the input file; perf.data is the default
$ stackcollapse-perf.pl out.perf > out.folded # This might take a long time
$ flamegraph.pl out.folded > out.svg
The tools stackcollapse-perf.pl and flamegraph.pl are from the FlameGraph project.
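The folded file is plain text with one line per unique stack: frames joined by `;`, followed by a space and the sample count. That makes it easy to inspect with standard tools before rendering. The following is a minimal sketch, assuming that format; the function name `samples_for_frame` is hypothetical, and the match is a plain substring match, so a short name may also match longer frame names.

```shell
# Hypothetical helper: total samples of all folded stacks that mention a frame.
# Each folded line looks like "frame_a;frame_b;frame_c 42".
# Usage: samples_for_frame FRAME FOLDED_FILE
samples_for_frame() {
    sf_frame=$1
    sf_file=$2
    # grep -F treats the frame as a fixed string, not a regex.
    # The sample count is always the last whitespace-separated field.
    grep -F -- "$sf_frame" "$sf_file" | awk '{ total += $NF } END { print total + 0 }'
}
```

The same filter can feed flamegraph.pl directly to render only the matching stacks, e.g. `grep -F 'some_function' out.folded | flamegraph.pl > some_function.svg`, where `some_function` stands for whatever frame is of interest.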
Note: by default, --call-graph dwarf captures only 8 kB of stack per sample. Deeper stacks are truncated, which may result in incomplete and mostly useless flamegraphs in some cases. Increasing that limit may help, e.g.: perf record --call-graph dwarf,65000 <path to scylla>
- Hotspot provides a GUI for analysis of perf data.
- pmu-tools contain, among other things:
  - ocperf.py that simplifies accessing CPU-model specific counters
  - toplev.py for doing top-down analysis (more here)
- Intel VTune provides similar functionality to perf+hotspot pair and has a free community licence that permits commercial use.
- Tomek's scripts provide tools for analyzing scheduling issues