EdGENetworks/spark-scala-maven-boilerplate-project

 
 

Instructions:

Follow this article to find more detailed instructions.

Write your Spark code in the class "MainExample.scala", then compile the project with the command:

mvn clean package
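As a starting point, MainExample might look like the sketch below. It is not the project's actual code: it assumes the Spark 1.3+ RDD API, reads the two arguments passed on the spark-submit command line (input path, then output path), and uses a word count as a placeholder job. The helper tokenize and the tab-separated output format are illustrative choices.

```scala
package com.examples

import org.apache.spark.{SparkConf, SparkContext}

object MainExample {

  // Illustrative helper (not part of the project): split a line into non-empty words.
  def tokenize(line: String): Seq[String] =
    line.split("\\s+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("Usage: MainExample <inputhdfspath> <outputhdfspath>")
      sys.exit(1)
    }
    val inputPath = args(0)
    val outputPath = args(1)

    // The master is left unset here on purpose: spark-submit supplies it via --master.
    val conf = new SparkConf().setAppName("MainExample")
    val sc = new SparkContext(conf)
    try {
      // Placeholder job: count word occurrences in the input files.
      val counts = sc.textFile(inputPath)
        .flatMap(line => tokenize(line))
        .map(word => (word, 1))
        .reduceByKey(_ + _)

      // Write one "word<TAB>count" line per word; the output directory must not already exist.
      counts
        .map { case (word, count) => s"$word\t$count" }
        .saveAsTextFile(outputPath)
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the per-line logic in a plain function such as tokenize makes it possible to unit-test it without starting a Spark context.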

Inside the /target folder you will find the resulting fat jar, named spark-scala-maven-project-0.0.1-SNAPSHOT-jar-with-dependencies.jar. To launch the Spark job, run this command in a shell with a configured Spark environment:

spark-submit --class com.examples.MainExample \
  --master yarn-cluster \
  spark-scala-maven-project-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
  inputhdfspath \
  outputhdfspath

The parameters inputhdfspath and outputhdfspath do not need the form hdfs://path/to/your/file. A plain path such as /path/to/your/files/ is enough, because the default file system for a submitted job is HDFS. To retrieve the result locally:

hadoop fs -getmerge outputhdfspath resultSavedLocally
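The path rule described above (a plain path is resolved against the default file system) can be illustrated with a small sketch. This is not Hadoop's implementation; it only mimics the idea with java.net.URI. The object PathQualifier, the function qualify, and the address hdfs://namenode:8020 are hypothetical; on a real cluster the default file system comes from the Hadoop configuration. The sketch handles absolute paths only.

```scala
import java.net.URI

object PathQualifier {

  // Hypothetical default file system, used only for illustration.
  val defaultFs: URI = URI.create("hdfs://namenode:8020")

  // Return the path unchanged if it already names a scheme;
  // otherwise attach the scheme and authority of the default file system.
  def qualify(path: String, fs: URI = defaultFs): URI = {
    val uri = URI.create(path)
    if (uri.getScheme != null) uri
    else {
      require(uri.getPath.startsWith("/"), "this sketch only handles absolute paths")
      new URI(fs.getScheme, fs.getAuthority, uri.getPath, null, null)
    }
  }
}
```

Under these assumptions, PathQualifier.qualify("/path/to/your/files/") yields hdfs://namenode:8020/path/to/your/files/, while a path that already carries a scheme is returned as given.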

About

This is a skeleton Scala project, built with Maven, for getting started with Spark.
