mathias-mike/Project-Sparkify

Project Sparkify

Sparkify is a music streaming startup with impressive user base growth (their marketing team must be doing one hell of a job), and it is looking to move its processes and data to the cloud. Its data resides in S3: one directory holds JSON logs of user activity on the app, and another holds JSON metadata on the songs in the app.
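
To make the shape of the activity data concrete, here is a minimal sketch of reading such logs with the Python standard library. It assumes the logs are newline-delimited JSON, and the field names (`userId`, `page`, `song`) and sample records are hypothetical, chosen only for illustration; the real schema lives in the S3 files themselves.

```python
import json
from collections import Counter

# Hypothetical sample of newline-delimited JSON activity logs.
# The field names ("userId", "page", "song") are assumptions for illustration.
SAMPLE_LOG = """\
{"userId": "8", "page": "NextSong", "song": "Song A"}
{"userId": "8", "page": "Home", "song": null}
{"userId": "15", "page": "NextSong", "song": "Song B"}
"""


def count_song_plays(log_text):
    """Count song-play events per user in newline-delimited JSON text."""
    plays = Counter()
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        event = json.loads(line)
        # Only events on the (assumed) "NextSong" page count as plays.
        if event.get("page") == "NextSong":
            plays[event["userId"]] += 1
    return dict(plays)


print(count_song_plays(SAMPLE_LOG))
```

The same filter-then-aggregate step is what the warehouse and lake solutions would perform at scale, with the parsing delegated to the platform.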

I am tasked with applying my engineering skills to build solutions that serve data to business users, making their jobs a lot easier.

Initially, I built a data warehouse for their analytics team, but as their user base grew, their data needed to be moved into a data lake.

Data Warehouse

This folder contains my implementation of a data warehouse solution for Sparkify.

The warehouse is hosted on Amazon Redshift.
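
As a rough illustration of how JSON files in S3 can reach Redshift, the sketch below builds a Redshift `COPY` statement as a string. This README does not say how the folder actually loads data, so treat this as an assumption. The function name `build_copy_statement`, the table name, the bucket path, the IAM role ARN, and the region are all hypothetical placeholders.

```python
def build_copy_statement(table, s3_path, iam_role_arn, region="us-west-2"):
    """Return a Redshift COPY statement that loads JSON files from S3.

    All arguments are placeholders supplied by the caller; nothing here
    connects to AWS or executes SQL.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"REGION '{region}'\n"
        # 'auto' asks Redshift to match JSON keys to column names.
        f"FORMAT AS JSON 'auto';"
    )


# Hypothetical values, for illustration only.
statement = build_copy_statement(
    table="staging_events",
    s3_path="s3://example-bucket/log_data",
    iam_role_arn="arn:aws:iam::123456789012:role/example-redshift-role",
)
print(statement)
```

Loading into a staging table first, then inserting into the analytics tables with SQL, is a common pattern because `COPY` reads from S3 in parallel.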

Data Lake

This folder contains my implementation of a data lake solution for Sparkify.

Pipeline

This folder contains my implementation of a data pipeline with Apache Airflow for Sparkify.
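
Airflow models a pipeline as a directed acyclic graph of tasks and runs each task only after its upstream tasks finish. The sketch below is not Airflow code: it uses the standard-library `graphlib` module (Python 3.9+) to show the same dependency ordering idea. The task names are hypothetical and are not taken from the folder.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each key maps to the set of tasks it depends on.
TASKS = {
    "stage_events": set(),
    "stage_songs": set(),
    "load_songplays_fact": {"stage_events", "stage_songs"},
    "load_dimensions": {"load_songplays_fact"},
    "run_quality_checks": {"load_dimensions"},
}


def execution_order(tasks):
    """Return one valid run order in which every task follows its dependencies."""
    return list(TopologicalSorter(tasks).static_order())


order = execution_order(TASKS)
print(order)
```

The two staging tasks have no dependency on each other, so a scheduler is free to run them in parallel; only their relative position before the fact load is guaranteed.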
