Sparkify ETL

Introduction

A startup called Sparkify wants to analyze the data they've been collecting on songs and user activity on their new music streaming app. The analysis team is particularly interested in understanding what songs users are listening to.

We want to answer those 3 queries

Give me the artist, song title and song's length in the music app history that was heard during sessionId = 338, and itemInSession = 4
Give me only the following: name of artist, song (sorted by itemInSession) and **user (first and last name)** for userid = 10, sessionid = 182
Give me every user name (first and last) in my music app history who listened to the song 'All Hands Against His Own'

The ETL extracts the data from event files and but it in a new file called event_datafile_new then get the required data for creating Cassandra tables depending on the query.

The database used in the project is Apache Cassandra

Tools

Usage

To run the ETL open ETL.ipynb and execute the code sections one by one to create the keyspace and the table and executing the queries

Keyspace Design

The keyspace contains 3 tables (one table according to each query)

Project Files

ETL.ipynb :

the file contains the process of to collect the data from all files to one new CSV file
Connect to cassandra with a cluster then creating a session
Creating the keyspace for the tables and setting the replication configurations
Creating the tables for every query and test the select statement to ensure that result is right
Dropping the tables and closing the connection

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
event_data		event_data
images		images
ETL.ipynb		ETL.ipynb
README.md		README.md
event_datafile_new.csv		event_datafile_new.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

event_data

event_data

images

images

ETL.ipynb

ETL.ipynb

README.md

README.md

event_datafile_new.csv

event_datafile_new.csv

Repository files navigation

Sparkify ETL

Table of contents

Introduction

Tools

Usage

Keyspace Design

Project Files

About

Releases

Packages

Languages

amrelauoty/Sparkify-ETL-Cassandra

Folders and files

Latest commit

History

Repository files navigation

Sparkify ETL

Table of contents

Introduction

Tools

Usage

Keyspace Design

Project Files

About

Topics

Resources

Stars

Watchers

Forks

Languages