
Profiling Spark Using YourKit

Reynold Xin edited this page Oct 4, 2013 · 3 revisions

This page has been moved to the Apache Spark confluence wiki: https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage

Here are instructions on profiling Spark applications using the YourKit Java Profiler.

On Spark EC2 images

  1. After logging into the master node, download the YourKit Java Profiler for Linux from the YourKit downloads page. At the time of writing, the latest version is yjp-12.0.5-linux.tar.bz2; if you use a newer version, substitute its paths in the steps below. The file is large (~100 MB) and the YourKit download site can be slow, so consider mirroring the file or including it on a custom AMI.

  2. Extract the archive somewhere (/root in this example): tar xvjf yjp-12.0.5-linux.tar.bz2

  3. Copy the expanded YourKit files to each node using copy-dir: ~/spark-ec2/copy-dir /root/yjp-12.0.5

  4. Configure the Spark JVMs to use the YourKit profiling agent by editing ~/spark/conf/spark-env.sh and adding the following lines:

    SPARK_DAEMON_JAVA_OPTS+=" -agentpath:/root/yjp-12.0.5/bin/linux-x86-64/libyjpagent.so=sampling"
    export SPARK_DAEMON_JAVA_OPTS
    SPARK_JAVA_OPTS+=" -agentpath:/root/yjp-12.0.5/bin/linux-x86-64/libyjpagent.so=sampling"
    export SPARK_JAVA_OPTS
    
  5. Copy the updated configuration to each node: ~/spark-ec2/copy-dir ~/spark/conf/spark-env.sh

  6. Restart your Spark cluster:

    ~/spark/bin/stop-all.sh
    ~/spark/bin/start-all.sh
    
  7. By default, the YourKit profiler agents use ports 10001-10010. To connect the YourKit desktop application to the remote profiler agents, you'll have to open these ports in the cluster's EC2 security groups.

    To do this, sign into the AWS Management Console. Go to the EC2 section and select Security Groups from the Network & Security section on the left side of the page. Find the security groups corresponding to your cluster; if you launched a cluster named test_cluster, then you will want to modify the settings for the test_cluster-slaves and test_cluster-master security groups. For each group, select it from the list, click the Inbound tab, and create a new Custom TCP Rule opening the port range 10001-10010. Finally, click Apply Rule Changes. Make sure to do this for both security groups.

    Note: by default, spark-ec2 re-uses security groups, so if you stop this cluster and later launch another cluster with the same name, these security group settings will still apply.

  8. Launch the YourKit profiler on your desktop.

  9. Select "Connect to remote application..." from the welcome screen and enter the address of your Spark master or worker machine, e.g. ec2-*-*-*-*.compute-1.amazonaws.com

  10. YourKit should now be connected to the remote profiling agent. It may take a few moments for profiling information to appear.
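
The edit in step 4 can be scripted so that it is safe to rerun. The function below is a minimal sketch, not part of spark-ec2; its name enable_yourkit_agent is hypothetical, and it assumes YourKit was unpacked to /root/yjp-12.0.5 as above. It appends the same four lines shown in step 4, but leaves the file alone if a libyjpagent.so entry is already present, so running it twice does not add a second -agentpath flag to the JVM options.

```shell
# Hypothetical helper (not part of spark-ec2): appends the YourKit agent
# settings from step 4 to a spark-env.sh file.
# Assumes YourKit lives in /root/yjp-12.0.5 unless a directory is passed.
enable_yourkit_agent() {
  local env_file="$1"
  local yjp_home="${2:-/root/yjp-12.0.5}"
  local agent="-agentpath:${yjp_home}/bin/linux-x86-64/libyjpagent.so=sampling"

  # Skip files that already load the agent, so reruns don't duplicate it.
  if [ -f "$env_file" ] && grep -qF 'libyjpagent.so' "$env_file"; then
    echo "YourKit agent already configured in $env_file" >&2
    return 0
  fi

  # Unquoted EOF: ${agent} is expanded, the double quotes are written as-is.
  cat >> "$env_file" <<EOF
SPARK_DAEMON_JAVA_OPTS+=" ${agent}"
export SPARK_DAEMON_JAVA_OPTS
SPARK_JAVA_OPTS+=" ${agent}"
export SPARK_JAVA_OPTS
EOF
}
```

On the master you would run enable_yourkit_agent ~/spark/conf/spark-env.sh, then distribute the file with copy-dir and restart the cluster as in steps 5 and 6.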

See the YourKit documentation for the full list of profiler agent startup options.
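
Startup options follow the = in the -agentpath value as a comma-separated list; the configuration above uses only sampling. As a minimal sketch, assuming spark-env.sh already contains the lines from step 4, the hypothetical helper set_yourkit_options below rewrites whatever options follow libyjpagent.so= (for example, switching to tracing, YourKit's CPU tracing mode). The JVMs read these options only at launch, so the file still has to be copied to every node and the cluster restarted (steps 5 and 6).

```shell
# Hypothetical helper (not part of spark-ec2 or YourKit): replaces the
# options after "libyjpagent.so=" on every agent line of a spark-env.sh file.
# Assumes the new options contain no '|', '&' or '"' characters, which sed
# or the surrounding quotes would otherwise interpret.
set_yourkit_options() {
  local env_file="$1"
  local opts="$2"
  local tmp_file="${env_file}.tmp"

  if ! grep -qF 'libyjpagent.so=' "$env_file"; then
    echo "no YourKit agent configured in $env_file" >&2
    return 1
  fi

  # Write to a temporary file and move it back instead of using the
  # non-portable 'sed -i'.
  sed "s|libyjpagent\.so=[^\"]*|libyjpagent.so=${opts}|g" "$env_file" > "$tmp_file" &&
    mv "$tmp_file" "$env_file"
}
```

For example, set_yourkit_options ~/spark/conf/spark-env.sh tracing would turn each ...libyjpagent.so=sampling entry into ...libyjpagent.so=tracing.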