# Rail_Wreck_Data_Analysis

Rail Wreck Data Analysis: a CSV file contains the last 10 years of rail wreck data for the USA. Using Spark, Python, MongoDB, AngularJS, HTML5, CSS, Flask, and a RESTful API, a distributed application was developed that analyses this file to extract useful information.

# The following concepts are used (a combined sketch follows the list):

  1. RDD
  2. DataFrame
  3. Map-Reduce
  4. Filter, regular expressions
  5. Tuple, list, dictionary
  6. MongoDB collections, documents, data insertion, data retrieval, data modification
  7. AngularJS: controller, redirection, ng-repeat, watch, service, JSON, HTTP communication
  8. Flask: web application resources, static file rendering
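
A minimal sketch of how several of these concepts fit together, assuming a local MongoDB instance and made-up column positions, database, and collection names (they are not the project's actual identifiers):

      import re

      from pymongo import MongoClient
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("RailWreckConceptSketch").getOrCreate()

      # Read the CSV as an RDD of lines, drop the header, and split each line into fields.
      lines = spark.sparkContext.textFile("data/train_wreck.csv")
      header = lines.first()
      rows = lines.filter(lambda line: line != header).map(lambda line: line.split(","))

      # Filter with a regular expression (assuming the first field holds a four-digit year),
      # map each row to a (year, 1) tuple, and reduce by key to count accidents per year.
      year_re = re.compile(r"^\d{4}$")
      per_year = (rows.filter(lambda fields: year_re.match(fields[0]) is not None)
                      .map(lambda fields: (fields[0], 1))
                      .reduceByKey(lambda a, b: a + b))

      # Turn the (year, count) tuples into dictionaries and insert them as MongoDB documents.
      docs = [{"year": year, "accidents": count} for year, count in per_year.collect()]
      client = MongoClient("mongodb://localhost:27017")
      if docs:
          client["rail_wreck"]["yearly_accidents"].insert_many(docs)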

# Architecture:

  1. The CSV file is read as a DataFrame.
  2. Different filtering and map-reduce functions are applied to this DataFrame to extract important information.
  3. This information is stored in the database (cloud MongoDB); a sketch of steps 1-3 follows this list.
  4. The CSV file is also read as an RDD.
  5. Different filtering and map-reduce functions are applied to this RDD to extract important information.
  6. This information is stored in the database (cloud MongoDB).
  7. A Flask (Python) web application defines web route resources that retrieve this information from the database.
  8. A single-page website built with AngularJS, HTML5, and CSS connects to this web application and renders the information to the user.
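
A minimal sketch of the DataFrame path (steps 1-3); the column name, MongoDB URI, and collection name are assumptions, not the project's actual identifiers:

      from pymongo import MongoClient
      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("RailWreckDataFrameSketch").getOrCreate()

      # 1. Read the CSV file as a DataFrame, letting Spark infer the schema from the header.
      df = spark.read.csv("data/train_wreck.csv", header=True, inferSchema=True)

      # 2. Apply filtering and aggregation to extract useful information, e.g. the number
      #    of accidents recorded per state ("STATE" is an assumed column name).
      per_state = (df.filter(F.col("STATE").isNotNull())
                     .groupBy("STATE")
                     .count()
                     .orderBy(F.col("count").desc()))

      # 3. Store the result as documents in the (cloud) MongoDB database.
      client = MongoClient("mongodb://localhost:27017")  # replace with the cloud MongoDB URI
      docs = [row.asDict() for row in per_state.collect()]
      if docs:
          client["rail_wreck"]["state_accidents"].insert_many(docs)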

# Steps to run this in a local development environment:

  1. Download this project. In my setup it is located at "C:\spark-2.1.0-bin-hadoop2.7\bin\project\Rail_Wreck_Data_Analysis".

  2. File "train_wreck.csv" is in data folder.

  3. Run python_MongoDB.py as shown below. This script analyses "train_wreck.csv" and inserts the important information into the database.

             C:\spark-2.1.0-bin-hadoop2.7\bin>spark-submit project/Rail_Wreck_Data_Analysis/python_MongoDB.py
             [Stage 2:>                                                          (0 + 2) / 2]
             Accident Yearly Data Updated. Total Time taken (3.77 Sec)
             [Stage 4:>                                                          (0 + 2) / 2]
             [Stage 4:=============================>                             (1 + 1) / 2]
             Monthly Accident Data Updated. Total Time taken (3.03 Sec)
             [Stage 6:>                                                          (0 + 2) / 2]
             [Stage 6:=============================>                             (1 + 1) / 2]
             [Stage 7:>                                                          (0 + 2) / 2]
             Date Accident Data Updated. Total Time taken (2.86 Sec)
             [Stage 11:=====================================================>(197 + 3) / 200]
             [Stage 12:>                                                        (0 + 4) / 27]
             [Stage 12:======>                                                  (3 + 4) / 27]
             [Stage 12:==============>                                          (7 + 4) / 27]
             [Stage 12:======================>                                 (11 + 4) / 27]
             [Stage 12:===============================>                        (15 + 4) / 27]
             [Stage 12:=======================================>                (19 + 4) / 27]
             [Stage 12:===============================================>        (23 + 4) / 27]
             [Stage 12:=================================================>      (24 + 3) / 27]
             [Stage 13:>                                                        (0 + 4) / 27]
             [Stage 13:======>                                                  (3 + 4) / 27]
             [Stage 13:========>                                                (4 + 4) / 27]
             [Stage 13:============>                                            (6 + 4) / 27]
             [Stage 13:==============>                                          (7 + 4) / 27]
             [Stage 13:===================>                                     (9 + 4) / 27]
             [Stage 13:====================>                                   (10 + 4) / 27]
             [Stage 13:========================>                               (12 + 4) / 27]
             [Stage 13:==========================>                             (13 + 4) / 27]
             [Stage 13:===============================>                        (15 + 4) / 27]
             [Stage 13:=================================>                      (16 + 4) / 27]
             [Stage 13:=====================================>                  (18 + 4) / 27]
             [Stage 13:=======================================>                (19 + 4) / 27]
             [Stage 13:===========================================>            (21 + 4) / 27]
             [Stage 13:=============================================>          (22 + 4) / 27]
             [Stage 13:=================================================>      (24 + 3) / 27]
             [Stage 13:===================================================>    (25 + 2) / 27]
             Each State accident count Data Updated. Total Time taken (32.21 Sec)
             Each Hourly accident count Data Updated. Total Time taken (5.15 Sec)
             Each Railroad accident count Data Updated. Total Time taken (22.33 Sec)
    
  4. The MongoDB collections created by python_MongoDB.py are shown in Data/Database_collection.jpg; they can also be checked from a Python shell, as sketched below.
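
     A quick way to verify the collections (the connection string and database name are assumptions; use the same ones as python_MongoDB.py):

          from pymongo import MongoClient

          client = MongoClient("mongodb://localhost:27017")
          db = client["rail_wreck"]

          # List the collections created by the analysis script and print one sample document from each.
          for name in db.list_collection_names():
              print(name, db[name].find_one())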

  5. Now run the web application as shown below. This starts the web application at http://127.0.0.1:5000; a sketch of what such a Flask application might look like follows the command.

    C:\spark-2.1.0-bin-hadoop2.7\bin\project\Rail_Wreck_Data_Analysis\Server>python Web_Application.py
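
     For reference, a minimal sketch of a Flask application like Web_Application.py; the route, database, and collection names and the static file layout are assumptions, not the project's actual code:

          from flask import Flask, jsonify
          from pymongo import MongoClient

          app = Flask(__name__)
          client = MongoClient("mongodb://localhost:27017")  # replace with the cloud MongoDB URI
          db = client["rail_wreck"]

          @app.route("/")
          def index():
              # Serve the single-page AngularJS site from the Flask static folder.
              return app.send_static_file("index.html")

          @app.route("/api/yearly_accidents")
          def yearly_accidents():
              # Web route resource that returns the pre-computed yearly accident counts as JSON.
              docs = list(db["yearly_accidents"].find({}, {"_id": 0}))
              return jsonify(docs)

          if __name__ == "__main__":
              app.run(host="127.0.0.1", port=5000)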

  6. Enter http://127.0.0.1:5000 in a web browser to browse the rail wreck analytics; the JSON routes can also be queried directly, as sketched below.
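
     A minimal example of querying one of the routes directly (the endpoint name is the hypothetical one from the step 5 sketch; substitute a route actually defined in Web_Application.py):

          import requests

          # Fetch the yearly accident counts as JSON and print each record.
          response = requests.get("http://127.0.0.1:5000/api/yearly_accidents")
          response.raise_for_status()
          for record in response.json():
              print(record)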