# 2. Installation
- Install Scala 2.11.2 (http://www.scala-lang.org/download/) and be sure to add the scala/bin folder to your PATH.
- Install Maven 2.0.x+ from https://maven.apache.org/download.cgi and be sure to add Maven to your PATH as well.
- Download SBT v0.13.5+ from http://www.scala-sbt.org/download.html.
- Install Spark 2.0.0, which depends on Scala 2.11.2, from http://spark.apache.org/downloads.html.
- Add the installation folder to your environment: SPARK_HOME=/path/to/installation
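The environment steps above can be collected in your shell profile. A sketch, assuming a bash-style shell; the install paths below are placeholders for your actual locations:

```shell
# Example ~/.bash_profile entries (placeholder paths -- adjust to your installs)
export SCALA_HOME=/path/to/scala         # Scala install folder
export SPARK_HOME=/path/to/installation  # Spark install folder from the step above
# Put the scala and spark binaries on the PATH
export PATH="$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH"
```

Open a new terminal (or `source` the profile) so the variables take effect.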
- Download the latest version of SciSpark from https://github.com/SciSpark/SciSpark.
- Within your SciSpark folder, run:
  sbt clean assembly
- Locate your SciSpark jar (or similarly named) file; its path will look like /path_to_SciSpark/target/scala-2.11/SciSpark.jar. To build SciSpark for different Spark and Scala version combinations, see the NOTE at the bottom.
- Download and untar Zeppelin 0.5.6 from https://zeppelin.incubator.apache.org/download.html.
- In Zeppelin's conf folder, create zeppelin-env.sh from the provided template:
  cp zeppelin-env.sh.template zeppelin-env.sh
- Point your configuration to your SciSpark jar file by adding the following lines to zeppelin-env.sh:
  export ZEPPELIN_JAVA_OPTS="-Dspark.jars=/path/to/SciSpark.jar"
  export SPARK_SUBMIT_OPTIONS="--jars /path/to/SciSpark.jar"
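Before starting Zeppelin, it can save debugging time to confirm the jar path you wrote into zeppelin-env.sh actually exists. A small sketch; the jar path is a placeholder for your actual assembly jar:

```shell
# Sanity check (sketch): the path must match what you put in zeppelin-env.sh
SCISPARK_JAR=/path/to/SciSpark.jar
if [ -f "$SCISPARK_JAR" ]; then
  echo "jar found: $SCISPARK_JAR"
else
  echo "jar missing: $SCISPARK_JAR -- fix zeppelin-env.sh before starting Zeppelin"
fi
```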
- Start Zeppelin:
  bin/zeppelin-daemon.sh start
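The same daemon script manages the running instance. A quick reference, assuming the standard zeppelin-daemon.sh subcommands:

```shell
bin/zeppelin-daemon.sh status   # check whether the daemon is running
bin/zeppelin-daemon.sh restart  # restart, e.g. to pick up zeppelin-env.sh changes
bin/zeppelin-daemon.sh stop     # shut Zeppelin down
```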
- Open Zeppelin in your browser (http://localhost:8080) and create a new note. Paste the following into the first cell:
  // SciSpark imports
  import org.dia.Parsers
  import org.dia.core.{SciSparkContext, SciTensor}
  import org.dia.algorithms.mcs.MCSOps
  import org.dia.urlgenerators.RandomDatesGenerator
- Run the note. If it succeeds, your configuration is set up correctly.
- Optionally, change the skin of the notebook to the SciSpark theme. Download a zip of the Zeppelin web repo at https://github.com/SciSpark/scispark_zeppelin_web, then go to your Zeppelin installation and replace each folder under webapps/webapp/ with the folder of the same name under the downloaded repo's src folder.
Possible pitfalls:
Your browser may cache some of your web files, resulting in a page that does not display the SciSpark skin correctly. If you suspect this is the case, force-reload to reset the cache with Command + Shift + R (on Mac).
NOTE: SciSpark can be built for multiple Scala and Spark versions. Currently the following combinations have been tested and are working:
spark=1.6.0 scala=2.10.6
spark=2.0.0 scala=2.10.6
spark=2.0.0 scala=2.11.2
By default, sbt clean assembly builds SciSpark for Spark 2.0.0 and Scala 2.11.2 (the latest tested versions). If you need to build the SciSpark jar for older versions, specify them as system properties:
sbt -Dspark.version=1.6.0 -Dscala.version=2.10.6 clean assembly
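If you rebuild for several combinations, the invocation can be scripted. A small sketch; the helper name build_cmd is made up for illustration, and the version pairs are the tested combinations listed above:

```shell
# Sketch: print the sbt invocation for a chosen Spark/Scala combination
build_cmd() {
  spark="$1"
  scala="$2"
  echo "sbt -Dspark.version=${spark} -Dscala.version=${scala} clean assembly"
}

# e.g. the older tested combination
build_cmd 1.6.0 2.10.6
# prints: sbt -Dspark.version=1.6.0 -Dscala.version=2.10.6 clean assembly
```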
NB: Sometimes the build fails because of stale files in ~/.ivy2/cache. The usual culprit of such an error is nd4j. An example of such an error in the log is:
(*:update) sbt.ResolveException: download failed: org.nd4j#nd4j-native;0.5.0!nd4j-native.jar
If this happens, remove the cached nd4j artifacts:
cd ~/.ivy2/cache
rm -r org.nd4j/
then retry the build. See the SBT documentation for more on this.
If you need to build Apache Spark for Scala 2.10, run the following commands from your Spark folder:
./dev/change-scala-version.sh 2.10
mvn -Pyarn -Phadoop-2.4 -Dscala-2.10 -DskipTests clean package