mattavallone/Big-Data-Project

Big Data Project

Team Members:

Matthew Avallone, Garima Chaudhary, Dinesh Sreekanthan

Objective:

The goal of the project was to gain hands-on experience with multiple steps of the data lifecycle that benefit from big data infrastructure. The project is broken down into two tasks: Data Cleaning/Profiling and Semantic Profiling. All of the datasets used come from the NYC Open Data initiative (https://opendata.cityofnewyork.us/).

The code was run on the NYU Hadoop cluster using Python 3.6.5 and Spark 2.4.0.

About

Data cleaning and profiling of NYC Open Data
