taranahassan/Movies-ETL

Movies ETL

Description

Amazing Prime is hosting a hackathon in which teams of analysts work collaboratively on projects that use data to solve problems.
Raw data was gathered from Wikipedia and Kaggle, then extracted, transformed, and loaded into SQL. Four deliverables were completed to automate the ETL process. Automation saves time and removes the need to manually edit the code each time the data is updated.
The process was automated by refactoring the existing code into a single function.

Resource files:
movies_metadata.csv from Kaggle
wikipedia-movies.json from Wikipedia
ratings.csv from MovieLens

ETL function defined to read all 3 data files.
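
The extract step described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the repository's actual code; the function name `extract_files` and its parameters are hypothetical, and the project itself may rely on a different library to read the files.

```python
import csv
import json


def extract_files(wiki_path, kaggle_path, ratings_path):
    """Read the three source files and return their contents as lists of dicts."""
    # Assumption: the Wikipedia file is a JSON array of movie objects.
    with open(wiki_path, mode="r", encoding="utf-8") as f:
        wiki_movies = json.load(f)

    # The two CSV files are read row by row; every value stays a string here.
    with open(kaggle_path, mode="r", encoding="utf-8", newline="") as f:
        kaggle_metadata = list(csv.DictReader(f))
    with open(ratings_path, mode="r", encoding="utf-8", newline="") as f:
        ratings = list(csv.DictReader(f))

    return wiki_movies, kaggle_metadata, ratings
```

Passing the three paths as arguments, rather than hard-coding them, is what lets the same function be rerun when the source files are refreshed.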

Wikipedia movies data extracted and transformed with nested functions to clean the data.
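
As an illustration of cleaning with a nested function, the sketch below defines a helper inside the per-movie cleaner so it can modify the record directly. The key names (`Directed by`, `Director`, `Length`, `Running time`, `imdb_link`) and the filter rule are assumptions chosen for the example, not details confirmed by this repository.

```python
def clean_movie(movie):
    """Return a cleaned copy of one raw Wikipedia movie record."""
    movie = dict(movie)  # shallow copy so the caller's record is not modified

    def change_column_name(old_name, new_name):
        # Nested helper: move a value to a consistent key if the old key exists.
        if old_name in movie:
            movie[new_name] = movie.pop(old_name)

    # Hypothetical examples of inconsistent keys; the real list depends on the data.
    change_column_name("Directed by", "Director")
    change_column_name("Length", "Running time")
    return movie


def clean_wiki_movies(wiki_movies):
    # Assumed rule: keep only records that name a director and carry an IMDb link.
    return [
        clean_movie(m)
        for m in wiki_movies
        if ("Director" in m or "Directed by" in m) and "imdb_link" in m
    ]
```

Because `change_column_name` closes over `movie`, each rename is a one-line call, which keeps the list of column fixes easy to extend.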

Kaggle data extracted and transformed by cleaning data within the existing function.
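
A possible shape for the Kaggle cleaning step is sketched below. It assumes the rows arrive as dictionaries of strings and that the file has `adult`, `id`, and `budget` columns; the specific rules (dropping adult titles, converting numeric columns) are illustrative assumptions rather than the repository's confirmed logic.

```python
def clean_kaggle_metadata(rows):
    """Filter and type-convert Kaggle metadata rows (each row a dict of strings)."""
    cleaned = []
    for row in rows:
        # Assumed rule: keep only rows whose `adult` flag is the string "False".
        if row.get("adult") != "False":
            continue
        try:
            new_row = dict(row)
            new_row["id"] = int(row["id"])
            new_row["budget"] = int(row["budget"])
        except (KeyError, ValueError):
            # Skip rows with missing or non-numeric values instead of failing.
            continue
        del new_row["adult"]  # constant after filtering, so no longer informative
        cleaned.append(new_row)
    return cleaned
```

Skipping malformed rows instead of raising keeps an automated run from stopping on a single bad record, at the cost of silently dropping data, so a real pipeline might also count or log the skipped rows.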

Movies database created in SQL, with the elapsed load time reported, all within the existing function.
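
The load step with elapsed-time reporting might look like the sketch below. It uses an SQLite connection from the standard library as a stand-in for whichever SQL database the project targets; the function name `load_ratings`, the table layout, and the column names `userId`, `movieId`, and `rating` are assumptions made for the example.

```python
import sqlite3
import time


def load_ratings(conn, ratings, chunk_size=1000):
    """Insert rating rows in chunks and report the elapsed time after each chunk."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ratings (userId INTEGER, movieId INTEGER, rating REAL)"
    )
    start_time = time.time()
    rows_imported = 0
    for start in range(0, len(ratings), chunk_size):
        chunk = ratings[start:start + chunk_size]
        conn.executemany(
            "INSERT INTO ratings VALUES (?, ?, ?)",
            [(int(r["userId"]), int(r["movieId"]), float(r["rating"])) for r in chunk],
        )
        conn.commit()
        rows_imported += len(chunk)
        elapsed = time.time() - start_time
        print(f"imported {rows_imported} rows, {elapsed:.2f} seconds elapsed")
    return rows_imported
```

For a quick trial, `conn = sqlite3.connect(":memory:")` creates a throwaway database. Loading in chunks keeps each transaction small and makes the progress of a long import visible.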

About

Extracting and transforming movie data, automating the process by refactoring code, and loading the result into SQL.
