spektom/spark-flamegraph

spark-flamegraph

Easy CPU Profiling for Apache Spark applications.

The spark-submit-flamegraph script is a wrapper around the standard spark-submit command that generates a flame graph.

Supported Systems

  • Amazon EMR
  • Most Linux distributions
  • Mac (with Homebrew installed)

Prerequisites

The script works out of the box on Amazon EMR. On other systems, the following utilities must be present:

  • perl
  • python2.7 (or set the PYTHON environment variable to the Python executable)
  • pip (or set the PIP environment variable to the pip utility)
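Before running on a non-EMR system, you can check that these utilities are available. The following is a minimal sketch and is not part of spark-submit-flamegraph. The function name check_prereqs is hypothetical. It assumes the same fallbacks as the Configuration section: python2.7 when PYTHON is unset and pip when PIP is unset.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of spark-submit-flamegraph):
# verifies that the utilities listed under Prerequisites are on PATH.
# PYTHON and PIP are honored, falling back to python2.7 and pip.
check_prereqs() {
  local missing=0
  local tool
  for tool in perl "${PYTHON:-python2.7}" "${PIP:-pip}"; do
    # command -v succeeds only if the name resolves to something runnable
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing prerequisite: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_prereqs && spark-submit-flamegraph followed by your usual spark-submit arguments.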

Running

wget -O /usr/local/bin/spark-submit-flamegraph \
  https://raw.githubusercontent.com/spektom/spark-flamegraph/master/spark-submit-flamegraph

chmod +x /usr/local/bin/spark-submit-flamegraph

Then use spark-submit-flamegraph in place of the spark-submit command, passing the same arguments.

Configuration

The script is configured through the following environment variables:

Environment variable  Description                    Default value
SPARK_CMD             Spark command to run           spark-submit
PYTHON                Path to the Python executable  python2.7
PIP                   Path to the pip utility        pip

For example, to profile a Spark shell session, set the SPARK_CMD environment variable:

SPARK_CMD=spark-shell /usr/local/bin/spark-submit-flamegraph
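The sketch below shows how a wrapper can honor SPARK_CMD while forwarding all of its arguments unchanged. This is only an illustration under that assumption, not the actual source of spark-submit-flamegraph, and the function name run_spark_cmd is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical illustration of SPARK_CMD handling; not the real script.
# Runs the configured Spark command (default: spark-submit) and forwards
# every argument unchanged, preserving each argument's quoting.
run_spark_cmd() {
  local cmd="${SPARK_CMD:-spark-submit}"
  "$cmd" "$@"
}
```

Because "$cmd" is quoted, SPARK_CMD in this sketch must name a single executable such as spark-shell, not a command plus flags. Setting SPARK_CMD=echo is a cheap way to see which arguments would be forwarded.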

Details

To make profiling Spark applications as easy as possible, the script performs the following steps:

  • Downloads InfluxDB and starts it on a random port.
  • Starts the Spark application using the original spark-submit command, with the StatsD profiler JAR on its classpath and with configuration that makes it report statistics back to the InfluxDB instance.
  • After the Spark application finishes, queries all reported metrics from the InfluxDB instance.
  • Runs a script that generates the flame graph as an .SVG file.
  • Stops the InfluxDB instance.
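The order of these steps can be sketched as a small shell skeleton. Every step is a stub that only prints what it stands for. The function names (pick_port, start_influxdb, query_metrics, render_svg, stop_influxdb, profile_run) are hypothetical and are not taken from the real script. The sketch also assumes, without confirmation from the script itself, that InfluxDB should be stopped even when the application fails and that the application's exit code should be preserved.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the workflow above; NOT the real script.
# Each step is a stub that only prints what the real tool would do.

# Picks a port in 20000-52767 (bash's RANDOM ranges 0-32767).
# This sketch does not check that the port is actually free.
pick_port() { echo $(( 20000 + RANDOM )); }

start_influxdb() { echo "start influxdb on port $1"; }
query_metrics()  { echo "query metrics from port $1"; }
render_svg()     { echo "render flamegraph.svg"; }
stop_influxdb()  { echo "stop influxdb"; }

# Runs "$@" (the Spark command line) between database start and stop.
# InfluxDB is stopped even if the application fails, and the
# application's exit code is returned to the caller.
profile_run() {
  local port app_rc
  port=$(pick_port)
  start_influxdb "$port" || return 1
  "$@"
  app_rc=$?
  query_metrics "$port" && render_svg
  stop_influxdb
  return "$app_rc"
}
```

For example, profile_run spark-submit --version would print the four stub steps around Spark's own output.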