EdGENetworks/spark-scala-maven-boilerplate-project: a skeleton Scala project built with Maven to start using Spark

Instructions:

Follow this article to find more detailed instructions.

Write your Spark code in "MainExample.scala", then compile the project with the command:

mvn clean package
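
As a starting point for MainExample.scala, here is a minimal sketch. The package and object names match the --class argument used when submitting the job; the word-count logic and the tokenize helper are placeholders chosen for illustration, not something this project prescribes. The sketch assumes Spark 1.3 or later, where the pair-RDD operations such as reduceByKey are available without extra imports.

```scala
package com.examples

import org.apache.spark.{SparkConf, SparkContext}

object MainExample {

  // Hypothetical helper: split a line on whitespace and drop empty tokens.
  def tokenize(line: String): Seq[String] =
    line.split("\\s+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    require(args.length == 2, "Usage: MainExample <inputhdfspath> <outputhdfspath>")
    val inputPath = args(0)
    val outputPath = args(1)

    // The master is not set here, so the value passed to spark-submit is used.
    val conf = new SparkConf().setAppName("MainExample")
    val sc = new SparkContext(conf)
    try {
      sc.textFile(inputPath)                       // read the input lines
        .flatMap(line => tokenize(line))           // one record per word
        .map(word => (word, 1))
        .reduceByKey(_ + _)                        // count occurrences per word
        .map { case (word, count) => s"$word\t$count" }
        .saveAsTextFile(outputPath)                // fails if the output path already exists
    } finally {
      sc.stop()
    }
  }
}
```

The two positional arguments correspond to the input and output paths passed on the spark-submit command line.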

The /target folder will then contain the resulting fat jar, spark-scala-maven-project-0.0.1-SNAPSHOT-jar-with-dependencies.jar. To launch the Spark job, run this command in a shell with a configured Spark environment:

spark-submit --class com.examples.MainExample \
  --master yarn-cluster \
  spark-scala-maven-project-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
  inputhdfspath \
  outputhdfspath

The parameters inputhdfspath and outputhdfspath do not need the full form hdfs://path/to/your/file; a plain path such as /path/to/your/files/ is enough, because HDFS is the default file system when a job is submitted. To retrieve the result locally:

hadoop fs -getmerge outputhdfspath resultSavedLocally
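
The note on path forms can be illustrated with a small standalone sketch. It is only an analogy built on java.net.URI, not the code Hadoop itself runs, and the address hdfs://namenode:8020/ is a hypothetical stand-in for whatever default file system the cluster is configured with.

```scala
import java.net.URI

object PathForms {
  // Hypothetical default file system; a real cluster defines its own.
  val defaultFs: URI = new URI("hdfs://namenode:8020/")

  // A scheme-less path is interpreted relative to the default file system;
  // a path that already carries a scheme is returned unchanged.
  def qualify(path: String): URI = defaultFs.resolve(path)

  def main(args: Array[String]): Unit = {
    println(qualify("/path/to/your/files/"))
    // prints hdfs://namenode:8020/path/to/your/files/
    println(qualify("hdfs://namenode:8020/path/to/your/files/"))
    // prints hdfs://namenode:8020/path/to/your/files/
  }
}
```

Both forms end up pointing at the same location, which is why the shorter one is enough on the command line.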
