Perf

Quick n' dirty

Record profiling data:

$ perf record --call-graph dwarf ./build/release/scylla # Profile entire run
$ perf record --call-graph dwarf -p $(pgrep scylla) # Attach to running process

Analyze the recorded data:

$ perf script -i perf.data > out.perf
$ stackcollapse-perf.pl out.perf > out.folded # This might take a long time
$ flamegraph.pl out.folded > out.svg

The tools stackcollapse-perf.pl and flamegraph.pl are from the FlameGraph project.
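
The three analysis steps can also be chained into one pipeline, which avoids the intermediate out.perf and out.folded files. The following is a minimal sketch, not part of the original instructions: the function name perf_to_svg is hypothetical, and it assumes perf, stackcollapse-perf.pl and flamegraph.pl are all on PATH.

```shell
# Hypothetical helper: turn a perf.data file into a flame graph SVG.
# Assumes perf and the FlameGraph scripts are on PATH.
perf_to_svg() {
    data="${1:-perf.data}"   # input file written by perf record
    svg="${2:-out.svg}"      # output flame graph

    # Fail early if any tool in the pipeline is missing.
    for tool in perf stackcollapse-perf.pl flamegraph.pl; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            return 1
        fi
    done

    # Fail early if the recorded data cannot be read.
    if [ ! -r "$data" ]; then
        echo "cannot read $data" >&2
        return 1
    fi

    # Same steps as above, streamed instead of written to disk.
    perf script -i "$data" | stackcollapse-perf.pl | flamegraph.pl > "$svg"
}
```

Usage: perf_to_svg perf.data out.svg. Keeping the intermediate files, as in the step-by-step version, is still useful when the collapse step is slow and you want to regenerate the SVG with different flamegraph.pl options.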

Note: by default, --call-graph dwarf copies at most 8 kB of stack per sample. Call chains deeper than that are truncated, which in some cases produces incomplete and mostly useless flamegraphs. Raising the limit may help, e.g.: perf record --call-graph dwarf,65000 <path to scylla>.
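
A larger stack dump makes every sample bigger, so perf.data grows quickly. One way to compensate is to lower the sampling frequency and bound the recording time. The sketch below only prints such a command line so it can be inspected before running. The function name perf_record_cmd and the defaults (99 Hz, 30 seconds) are assumptions for illustration, not recommendations from this page.

```shell
# Hypothetical helper: print a perf record command that attaches to a
# running process with a larger DWARF stack dump.
perf_record_cmd() {
    pid="$1"                     # PID of the process to profile
    stack_bytes="${2:-65000}"    # per-sample stack dump size in bytes
    seconds="${3:-30}"           # recording duration

    # -F 99 samples at 99 Hz; "-- sleep N" stops recording after N seconds.
    echo "perf record -F 99 --call-graph dwarf,${stack_bytes} -p ${pid} -- sleep ${seconds}"
}
```

For example, perf_record_cmd 1234 prints a command that profiles PID 1234 for 30 seconds. Pass a second argument to try a different stack size, and a third to change the duration.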

Other tools

Hotspot provides a GUI for analysis of perf data.
pmu-tools contains, among other things:
- ocperf.py that simplifies accessing CPU-model specific counters
- toplev.py for doing top-down analysis (more here)
Intel VTune provides functionality similar to the perf + Hotspot pair and has a free community licence that permits commercial use.
Tomek's scripts provide tools for analyzing scheduling issues.
