sbordya/json_reader_bordea

json_reader_bordea

This program reads a JSON file in parallel with Spark and prints the parsed objects to the command line. It was generated from the MrPowers/spark-sbt.g8 template project, which you can create with the following command:

$ sbt new MrPowers/spark-sbt.g8
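The core of such a program can be sketched as follows. This is not the repository's actual source. It assumes the input is in JSON Lines format (one object per line), and the `Record` case class with its `id` and `name` fields and the `parseLine` helper are hypothetical placeholders for whatever the real objects contain. Spark splits the file into partitions that `local[*]` processes on all available cores, and json4s (which Spark 2.4 already pulls in as a dependency) parses each line.

```scala
package com.example

import org.apache.spark.sql.SparkSession
import org.json4s._
import org.json4s.jackson.JsonMethods._

object JsonReader {
  // Hypothetical record shape: Option fields become None when a key is missing.
  case class Record(id: Option[Int], name: Option[String])

  // Parse one line of JSON into a Record using json4s extraction.
  def parseLine(line: String): Record = {
    implicit val formats: Formats = DefaultFormats
    parse(line).extract[Record]
  }

  def main(args: Array[String]): Unit = {
    require(args.nonEmpty, "usage: JsonReader <path_to_json_file>")

    val spark = SparkSession.builder().appName("json_reader_bordea").getOrCreate()

    // textFile splits the input into partitions that are parsed in parallel;
    // blank lines (e.g. a trailing newline) are skipped before parsing.
    val records = spark.sparkContext
      .textFile(args(0))
      .filter(_.trim.nonEmpty)
      .map(parseLine)

    // With --master local[*] the executors run inside the driver JVM, so the
    // println output appears on the console; on a cluster it goes to executor logs.
    records.foreach(println)

    spark.stop()
  }
}
```

Keeping the parsing in a plain function such as `parseLine` makes it testable without starting a SparkSession.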

How to install

First install sbt and download Spark 2.4.x built for Scala 2.11. You will also need to download the JSON file containing the objects to be parsed. Then clone the repository:

$ git clone git@github.com:sbordya/json_reader_bordea.git

Navigate to the project folder and build a fat jar that bundles the dependencies (it will appear at <path_to_project>/target/scala-2.11/json_reader_bordea-assembly-0.0.1.jar):

$ sbt assembly
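`sbt assembly` relies on a build definition roughly like the one below. This is an illustrative `build.sbt` fragment, not the repository's file: the Spark (`2.4.8`) and Scala (`2.11.12`) patch versions are assumptions. The `name` and `version` settings are what produce the default assembly jar name `json_reader_bordea-assembly-0.0.1.jar`, and marking Spark as `provided` keeps it out of the fat jar because `spark-submit` already puts Spark on the classpath.

```scala
// build.sbt (illustrative sketch; versions are assumptions)
name := "json_reader_bordea"
version := "0.0.1"
scalaVersion := "2.11.12"

// Spark is supplied by spark-submit at runtime, so exclude it from the assembly jar.
// Spark 2.4.x also brings in json4s-jackson transitively.
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.8" % "provided"
```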

Now you are ready to run the program:

$ <path_to_spark>/bin/spark-submit --master local[*] --class com.example.JsonReader <path_to_project>/target/scala-2.11/json_reader_bordea-assembly-0.0.1.jar <path_to_json_file>

To view the dependency graph in a browser, run:

$ sbt dependencyBrowseGraph
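On sbt versions before 1.4, neither `assembly` nor `dependencyBrowseGraph` is built in; both come from plugins declared in `project/plugins.sbt`. A sketch of that file, assuming these plugin versions (the repository's actual versions may differ):

```scala
// project/plugins.sbt (illustrative; plugin versions are assumptions)

// Provides the `assembly` task that builds the fat jar.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.10")

// Provides dependencyTree, dependencyBrowseGraph and related tasks.
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.9.2")
```

On sbt 1.4 and later, the dependency tasks ship with sbt itself and can be enabled with `addDependencyTreePlugin` in `project/plugins.sbt` instead of the separate plugin.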

Project Goals

This program was created as part of the data engineer course on otus.ru.

About

Parallel json reader implemented in Spark and json4s
