Skip to content

Sparkify-ETL is a project for data modeling in Apache Cassandra

Notifications You must be signed in to change notification settings

amrelauoty/Sparkify-ETL-Cassandra

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Sparkify ETL

Table of contents

Introduction

A startup called Sparkify wants to analyze the data they've been collecting on songs and user activity on their new music streaming app. The analysis team is particularly interested in understanding what songs users are listening to.

We want to answer those 3 queries

  • Give me the artist, song title and song's length in the music app history that was heard during sessionId = 338, and itemInSession = 4
  • Give me only the following: name of artist, song (sorted by itemInSession) and **user (first and last name)** for userid = 10, sessionid = 182
  • Give me every user name (first and last) in my music app history who listened to the song 'All Hands Against His Own'

The ETL extracts the data from event files and but it in a new file called event_datafile_new then get the required data for creating Cassandra tables depending on the query.

The database used in the project is Apache Cassandra

Tools

pythoncassandra

Usage

To run the ETL open ETL.ipynb and execute the code sections one by one to create the keyspace and the table and executing the queries

Keyspace Design

The keyspace contains 3 tables (one table according to each query)

Project Files

ETL.ipynb :

  • the file contains the process of to collect the data from all files to one new CSV file
  • Connect to cassandra with a cluster then creating a session
  • Creating the keyspace for the tables and setting the replication configurations
  • Creating the tables for every query and test the select statement to ensure that result is right
  • Dropping the tables and closing the connection

About

Sparkify-ETL is a project for data modeling in Apache Cassandra

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published