greyamigo/spark-plug

Spark Plug

An example Spark application with SBT configuration and unit tests.

  • Implements the obligatory "Word Count" example in Spark.
  • Documentation on Spark-related topics is included under /Docs.

Overview

  • /build.sbt - SBT configuration for the project. Includes spark-core and scalatest as dependencies. Sets fork := true to work around a class-loader issue when running or testing Spark applications in SBT.
  • /project/build.properties - Sets the SBT version to use.
  • /src/main/resources/log4j.properties - Sets logging level to WARN, to minimise log output when running or testing Spark applications.
  • /src/main/scala/SparkFixtures.scala - Fixtures for creating a SparkContext, passing it into a closure, and then stopping the SparkContext once the closure has finished. Re-used in application code and tests.
  • /src/main/scala/WordCount.scala - Pure function on RDDs for counting the occurrences of words in a multi-line input.
  • /src/main/scala/WordCountApp.scala - Application that accepts a file name as a command-line argument, starts a SparkContext, reads the file, calculates the word count, and prints the results to stdout. It can run in local mode with hard-coded configuration (master = "local[*]"), or remote mode with configuration supplied from the environment or spark-submit.
  • /src/test/scala/WordCountSpec.scala - Tests the pure code in WordCount using ScalaTest and the fixtures in SparkFixtures.
  • /Docs/*.md - Markdown files on topics relating to Spark.
  • /Pics/* - Images used in the documentation.
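SparkFixtures follows the loan pattern: acquire the SparkContext, lend it to a closure, and stop it once the closure finishes, even on failure. A minimal generic sketch of that pattern in plain Scala (the names here are illustrative, not taken from the repository):

```scala
// Generic loan pattern: acquire a resource, lend it to a closure,
// and release it even if the closure throws. SparkFixtures would use
// the same shape with acquire = new SparkContext(conf) and release = _.stop().
object Loan {
  def withResource[R, T](acquire: => R)(release: R => Unit)(body: R => T): T = {
    val resource = acquire
    try body(resource)
    finally release(resource)
  }
}
```

Because the release step sits in a `finally`, every code path, including a failing test body, shuts the resource down, which is exactly what a shared SparkContext fixture needs.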
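WordCount.scala is described as a pure function over RDDs. Scala collections share the same flatMap/filter vocabulary, so the core logic can be sketched without a SparkContext. This is an illustrative mirror, not the repository's code; over an RDD the tallying step would typically be `map(word => (word, 1)).reduceByKey(_ + _)` instead of `groupBy`:

```scala
object WordCountSketch {
  // Split each line on whitespace, drop empty tokens, and tally occurrences.
  // The repository's WordCount does the equivalent over an RDD[String].
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("""\s+"""))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }
}
```

Keeping this transformation free of SparkContext construction is what lets WordCountSpec exercise it directly through the fixtures.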

Usage

To compile:

sbt compile

To run tests:

sbt test

To run locally:

sbt 'run --local FILE'
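The --local flag selects the hard-coded local[*] master, while --remote defers to spark-submit or the environment. A hedged sketch of how that dispatch might look inside WordCountApp (the names and shape here are assumptions, not the repository's actual code):

```scala
// Hypothetical argument handling for WordCountApp: the flag picks the mode,
// and only local mode hard-codes a master URL. Names are illustrative.
sealed trait Mode
case object Local extends Mode   // master = "local[*]" set in code
case object Remote extends Mode  // master supplied by spark-submit or the environment

object Args {
  def parse(args: Array[String]): Either[String, (Mode, String)] =
    args.toList match {
      case "--local" :: file :: Nil  => Right((Local, file))
      case "--remote" :: file :: Nil => Right((Remote, file))
      case _                         => Left("usage: [--local|--remote] FILE")
    }
}
```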

To run remotely:

sbt package
spark-submit target/scala-2.11/spark-plug_2.11-1.0.jar --remote README.md

To run on Docker:

sbt docker:publishLocal
docker run -it spark-plug:1.0 --local README.md
