neversleepz/gradle-spark

Intro to Spark Workshop

Melbourne Apache Spark + Hadoop Users Group - Apr 20 2015

About

The goal of this workshop is to introduce attendees to Apache Spark.

We’ll cover

  1. Intro to Spark

  2. Intro to the REPL and executing Spark commands

  3. Intro to writing Spark Applications in your IDE

About this project

We’ve prepared this project as the go-to point for all resources needed for the workshop.

It is designed to run without an internet connection, but we make no guarantees.

You can either clone this project from GitHub and run a task to download everything (see Get what you need), or grab a USB stick from one of the organisers and copy everything to your hard drive.

This project uses the Gradle build tool. You don’t need to know Gradle or the Groovy language to use Spark. However, Gradle’s expressive, declarative builds let us orchestrate everything you need to get Spark up and running on your local machine, and Gradle is also a useful tool for compiling, testing and deploying your Spark apps.

We’ve prepared this build file so that you can quickly reach the Spark web pages and the supporting IDEs, and so that you can start the command-line REPL for the workshop.

The intent is that Gradle will download the required parts for you. The build is platform agnostic, meaning the same build file will work on Mac, Linux and even Windows.

Requirements

You need to install Java 8 for your platform.

Spark depends on Scala, but the Gradle build file will pull this down for you.

You’ll also need a Java IDE with some Scala plugins. There are Gradle tasks to open up the download pages in your browser for both IntelliJ and Eclipse.

IDEs

Both IDEs this workshop recommends are free:

  • IntelliJ Community Edition is a free IDE for Java, Scala and a host of other languages. If you haven’t used a Java IDE before, we recommend you try this one.

  • ScalaIDE is an Eclipse distribution with the Scala plugin already installed. Those used to Eclipse can keep using it for their Spark endeavours.

If you already have Eclipse or IDEA installed, you just need to get the Scala plugin. We’ve tested with the latest versions available. If you are following along during the workshop, it may be best to use one of the IDEs provided.

Get what you need

Java

Install Java 8 from the link in the previous section. Note: One of the organisers will have a USB stick with versions of Java for all the relevant platforms.
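Once Java is installed, it can be worth confirming which version is on your PATH before running any Gradle tasks. The following is a minimal sketch, not part of this project: the function names `java_major` and `check_java` are hypothetical helpers. It relies on Java 8 reporting itself as `1.8.0_x`, while later releases report the major version first.

```shell
#!/bin/sh
# Extract the major Java version from a version string.
# Legacy scheme: "1.8.0_45" is major version 8.
# Newer scheme:  "11.0.2" is major version 11.
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;
    *)   echo "$1" | cut -d. -f1 ;;
  esac
}

# Report whether a Java runtime is on the PATH and which major version it is.
check_java() {
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found on PATH"
    return 1
  fi
  # `java -version` prints to stderr; the version is the quoted token on line 1.
  version=$(java -version 2>&1 | head -n 1 | cut -d'"' -f2)
  echo "Found Java major version $(java_major "$version")"
}
```

After sourcing the snippet, run `check_java`. For this workshop the reported major version should be 8.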

Gradle

At a command prompt type:

./gradlew tasks

This will download Gradle 2.3 if you don’t already have it and show you a list of tasks you’ll find useful for this workshop. There are also some 'Documentation' tasks with quick links to the Spark home page, Scala IDEs, etc.

An IDE

If you don’t have an IDE installed, you can use Gradle to download one. Look for IDE Download Tasks in the Gradle task list. For example:

./gradlew downloadEclipseForMacOSXCocoa64bit

If using IntelliJ, don’t forget to get the Scala plugin using:

./gradlew downloadIdeaForSparkPlugin

Spark & Data

The next step is to run

./gradlew prepareUsb

This will download the Spark distribution, and the Spark example data we are going to use in the workshop.
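The Gradle steps so far can be chained into one script. This is only a sketch under stated assumptions: the task names `tasks` and `prepareUsb` come from this README, while `run_step`, `setup_workshop` and the `DRY_RUN` variable are hypothetical names introduced here for illustration. Setting `DRY_RUN=1` prints the commands without running them, which is handy for checking the order first.

```shell
#!/bin/sh
# Print each command; only execute it when DRY_RUN is not set to 1.
run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Run the workshop setup tasks in order, stopping at the first failure.
# Must be called from the project directory, where ./gradlew lives.
setup_workshop() {
  run_step ./gradlew tasks &&
  run_step ./gradlew prepareUsb
}
```

After sourcing the snippet, call `setup_workshop` from the project directory. If you use IntelliJ, the `downloadIdeaForSparkPlugin` task could be added as a further `run_step` line.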

Note
Locations

Spark and Gradle are downloaded to the Tools directory

Data is put in the Data directory
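To confirm that the download worked, you could check that both directories exist. The sketch below is an illustration only: `check_workshop_dirs` is a hypothetical helper, and it matches the directory names case-insensitively because this README writes both "Tools" and "tools". It uses the `-mindepth`, `-maxdepth` and `-iname` options available in GNU and BSD `find`.

```shell
#!/bin/sh
# Check that the tools and data directories exist directly under a base directory.
# Usage: check_workshop_dirs [base_dir]   (defaults to the current directory)
check_workshop_dirs() {
  base="${1:-.}"
  missing=0
  for name in tools data; do
    # Case-insensitive match on immediate subdirectories only.
    if [ -z "$(find "$base" -mindepth 1 -maxdepth 1 -type d -iname "$name")" ]; then
      echo "missing: $name"
      missing=1
    else
      echo "found: $name"
    fi
  done
  return "$missing"
}
```

The function returns a non-zero status if either directory is missing, so it can be used in an `if` or chained with `&&`.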

Get Started

Starting the REPL

Open Projects in your IDE

Eclipse

Generate Eclipse Project

Run ./gradlew eclipse to generate the Eclipse project files with the jars already preconfigured.

In Eclipse go to File > Import Project… and select Existing Projects, then click Next. Select this project directory, where you should see the spark project, and click Finish.

Tip
Optional Step to set correct Scala version

This Scala project works with Scala 2.10. ScalaIDE may default to Scala 2.11 though.

If you see an error in Eclipse saying that the Scala version used doesn’t match that of your libraries, right-click on the project and select Properties…

In the properties window that appears, select Scala Compiler in the left-hand pane, then check Use Project Settings. The Scala Installation dropdown will be enabled, allowing you to select "Latest 2.10 bundle (dynamic)".

While you are there, set your target to jvm-1.8, then click OK.

IntelliJ IDEA

Install the IDEA plugin
from this USB

If you ran ./gradlew downloadIdeaForSparkPlugin earlier (or got the USB), you will have the Scala plugin ready to go.

  1. Open IDEA preferences.

  2. Press Install Plugin from disk

  3. Select where the plugin was downloaded: tools\ide\Idea\scalaPlugin\scala-intellij-bin-1.4.15.zip

  4. Press OK.

  5. Once the plugin is installed, you will be prompted to restart.

  6. Restart IDEA

via Internet

If you have an internet connection you can download the plugin manually.

  1. Open IDEA preferences.

  2. Press Install JetBrains Plugin

  3. Type Scala in the window that pops up

  4. Right click the Scala listing and select 'Download and Install'

  5. Press OK in both windows.

  6. Once the plugin is installed, you will be prompted to restart.

  7. Restart IDEA

Open the Project

If you already have a project open, select File > Open… and choose the build.gradle file from this directory. This will configure Scala for you.

Note
You can also use ./gradlew idea to generate IDEA project files (IDEA v12 and 13).
