Hadoop project for the Middleware Technologies for Distributed Systems course @ Politecnico di Milano
The project consisted of analyzing the NYPD Motor Vehicle Collisions dataset (available at https://data.cityofnewyork.us/Public-Safety/NYPD-Motor-Vehicle-Collisions/h9gi-nx95) using Apache Hadoop as the framework. The analysis answers three main queries:
- number of lethal accidents per week throughout the entire dataset
- number of accidents and percentage of deaths per contributing factor in the dataset
- number of accidents and average number of lethal accidents per week per borough
In addition to the Hadoop jobs, Python notebooks were used to verify and visualize the results.
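
As an illustration of the first query, here is a minimal plain-Python sketch of the map and reduce steps. It is not the Hadoop code of this project, only a small stand-in that could be used to cross-check results. It assumes the CSV has columns named `DATE` (formatted as MM/DD/YYYY) and `NUMBER OF PERSONS KILLED`, that a "lethal accident" is a row with at least one person killed, and that weeks are ISO weeks; the function names and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Assumed column names and date format; adjust them to the actual CSV export.
DATE_COL = "DATE"
KILLED_COL = "NUMBER OF PERSONS KILLED"


def map_row(row):
    """Map step: emit ((iso_year, iso_week), 1) for each lethal accident."""
    killed = (row.get(KILLED_COL) or "").strip()
    if not killed.isdigit() or int(killed) == 0:
        return  # skip non-lethal accidents and rows with a missing count
    day = datetime.strptime(row[DATE_COL], "%m/%d/%Y").date()
    iso_year, iso_week, _ = day.isocalendar()
    yield (iso_year, iso_week), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def lethal_accidents_per_week(csv_file):
    """Run the map step on every row, then reduce the emitted pairs."""
    reader = csv.DictReader(csv_file)
    return reduce_counts(pair for row in reader for pair in map_row(row))


# Tiny made-up sample, only to exercise the functions (not real data).
SAMPLE = """DATE,NUMBER OF PERSONS KILLED
01/02/2017,1
01/03/2017,0
01/05/2017,2
01/10/2017,1
01/11/2017,
"""

result = lethal_accidents_per_week(io.StringIO(SAMPLE))
print(result)
```

Keying on the ISO year together with the ISO week keeps weeks that straddle a year boundary from being merged with the same week number of another year. The other two queries would follow the same pattern with a different key (contributing factor, or borough plus week).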