This program computes for each district in Boston following statistics:
crimes_total (total amount of crimes), crimes_monthly (a median value of
crimes per month), frequent_crime_types (3 most frequent types of crime),
lat (average latitude of crimes) and lng (average longitude of crimes).
As a template was used MrPowers/spark-sbt.g8
project. You can get it with
the following command:
$ sbt new MrPowers/spark-sbt.g8
First install sbt and download Spark (version 2.4.x, scala 2.11). You will also need to download the crime.csv and offense_codes.csv. Then clone the repository:
$ git clone git@github.com:sbordya/boston_crimes_map_bordea.git
Navigate to the project folder and prepare a jar file with dependencies
(you will be able to see it under the path
<path_to_project>/target/scala-2.11/boston_crimes_map_bordea-assembly-0.0.1.jar
):
$ sbt assembly
Now you are ready to run the program:
$ <path_to_spark>/bin/spark-submit --master local[*] --class com.example.BostonCrimesMap <path_to_project>/target/scala-2.11/boston_crimes_map_bordea-assembly-0.0.1.jar <path_to_crime.csv> <path_to_offense_codes.csv> <path_to_output_folder>
If you want to see the graph of dependencies, you can run:
$ sbt dependencyBrowseGraph
This program was created during the data engineer course on otus.ru.