# SciSparkPerf

Join the chat at https://gitter.im/scientificspark

# Introduction

SciSparkPerf is a framework used to benchmark SciSpark and its applications. SciSpark is a scalable scientific processing platform that makes interactive computation and exploration possible.

SciSparkPerf uses sbt-jmh as its benchmarking framework.

  • We assume you have followed the instructions to install and get SciSpark running. Otherwise you can find the instructions here.
  • Join the SciSpark gitter chat if you have general questions about setting up SciSparkPerf or SciSpark.
  • IMPORTANT: If you want to benchmark your application against SciSpark for research purposes, we encourage you to reach out to us. This benchmark suite is a nice way of obtaining performance results and doing comparisons. We would love to have your contributions as well!

# Installation - standalone mode

The following instructions detail how to run the benchmarking suite as-is. If you want to run the suite in cluster mode with data on HDFS, see the cluster mode installation instructions below.

  1. After you have installed SciSpark, go to the SciSpark directory.

  2. Have sbt publish the project to the local ivy repository so we can use it in the benchmark.

    sbt publishLocal
    
  3. Clone this repository

    git clone https://github.com/rahulpalamuttam/SciSparkPerf
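
     Since run.sh in the next step is invoked with a relative path, change into the cloned directory first:

    cd SciSparkPerf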
    
  4. To run the suite, execute

    ./run.sh
    
  5. The results for each of the benchmarked methods can be found in a CSV file named results.csv.
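
     One convenient way to skim the results is a generic CSV one-liner like the following (any CSV viewer works):

    # align the comma-separated columns for easier reading
    column -s, -t < results.csv | less -S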

# Installation - cluster mode

The following instructions detail how to run the benchmarking suite in cluster mode.

  1. Follow steps 1 - 5 in the standalone mode instructions.

  2. If you haven't already built your jar, now is the time to do so. Within your SciSpark folder, run

    sbt clean assembly

  3. Set the fs.base.path property to point to your HDFS URI. An example HDFS URI: hdfs://HOSTNAME:8090/data/resources
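
     A minimal sketch of the relevant line, assuming fs.base.path lives in the same properties file as the spark.master setting from step 6 (hostname, port, and path are placeholders):

    # point the benchmark input path at your HDFS namenode
    fs.base.path=hdfs://HOSTNAME:8090/data/resources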

  4. Note that the datasets used in the tests are located in src/jmh/resources. Each folder's dataset size is denoted by its name: 100mb, 1gb, 10gb, 100gb. In standalone mode each folder just contains the same 4 files. You can put this on HDFS by executing

    hadoop fs -put src/jmh/resources /
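
     Since the command above copies the resources directory into the HDFS root, a quick sanity check (not part of the suite itself) is:

    # confirm the dataset folders landed on HDFS
    hadoop fs -ls -R /resources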
    
  5. For testing purposes, upload your own data into these directories. Try to put enough data to meet the sizes indicated by the folder names.

  6. Set the Spark master URI by setting the spark.master property in the properties file. An example URI: spark://HOSTNAME:7077. An example Mesos URI: mesos://HOSTNAME:5050
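
     A sketch of the corresponding line in the properties file (host and port are placeholders):

    # standalone Spark master; use a mesos:// URI instead if running on Mesos
    spark.master=spark://HOSTNAME:7077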

  7. The test effectively talks to Spark as a client. To use it in this mode, we need to set the executor URI to point to the spark-version.tgz file. You can either set it to an HTTP URI (where you downloaded your Spark binary from), or put the file on HDFS and set the property to an HDFS URI. An example HTTP URL is http://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz. An example HDFS URI is hdfs://HOSTNAME:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz
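
     Assuming the executor URI is configured through Spark's standard spark.executor.uri property, the properties-file line would look like:

    # where executors fetch the Spark distribution from (an http:// URL also works)
    spark.executor.uri=hdfs://HOSTNAME:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz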

  8. If using Mesos, you also need to set the MESOS_NATIVE_JAVA_LIBRARY export variable as well as the SPARK_EXECUTOR_URI export variable in the run.sh bash script. MESOS_NATIVE_JAVA_LIBRARY needs to point to where the libmesos.so file is located. If Mesos is installed correctly, this can usually be found at /usr/local/lib/libmesos.so; otherwise it can be found in /MESOS_HOME/build/src/.libs/libmesos.so.
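
     A sketch of the corresponding lines in run.sh (the paths are examples; adjust them to your installation):

    # location of the Mesos native library
    export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
    # Spark distribution the Mesos executors should fetch
    export SPARK_EXECUTOR_URI=hdfs://HOSTNAME:8090/spark/spark-1.6.0-bin-hadoop2.4.tgz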

  9. To run the suite, execute

    ./run.sh
    

# Logging

For now we have directed all logging to log.out. What you should see on screen is just the JMH benchmark output.
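
To watch a run's progress, you can follow the log from a second terminal:

    tail -f log.out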

# Mesos Client Mode

To learn more about the Mesos client mode, visit the documentation here.

# API

## Project Status

## Want to Use or Contribute to this Project?

Contact us at scispark-team@jpl.nasa.gov

## Technology

This project utilizes the open-source technologies described below, including Apache Spark, and is based on the NASA AIST14 project led by the PI Chris Mattmann. The SciSpark team is at JPL and is actively working on the project.

### Apache Spark

Apache Spark is a framework for distributed in-memory data processing. It runs locally or on an in-house or commercial cloud environment.

### sbt-jmh

sbt-jmh is an SBT plugin for running OpenJDK JMH benchmarks. JMH is a Java harness for building, running, and analysing nano/micro/milli/macro benchmarks written in Java and other languages targeting the JVM.
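
For reference, sbt-jmh also lets you invoke the benchmarks directly rather than through run.sh; a typical invocation (the iteration counts here are illustrative, not the suite's defaults) looks like:

    # 10 warmup iterations, 10 measurement iterations, 1 fork, all benchmarks
    sbt "jmh:run -i 10 -wi 10 -f 1 .*"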
