This project uses Pandas to create DataFrames from CSV files containing Disneyland reviews and chocolate bar reviews, cleans those DataFrames, then loads them into PostgreSQL to build a relational database that joins everything together.

Elliott-dev/ETL-with-Pandas-Project

ETL-with-Pandas (Disneyland, Chocolate Bars)

Extract: There are two datasets, Disneyland Review Ratings and Chocolate Bar Ratings, limited to the years 2010-2019. Both come from the public platform Kaggle, where they were published by Arush Chillar and Rachael Tatman, respectively. The raw data can be found in the Resources folder.

The datasets used for this project provided information on:

Disneyland Review Ratings & Chocolate Bar Ratings

Each CSV was made into a pandas DataFrame.
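The extract step can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the sample rows and column names are assumptions, and in-memory text stands in for the CSV files in the Resources folder so the snippet runs on its own.

```python
import io

import pandas as pd

# Tiny in-memory stand-ins for the raw CSV files.
# Column names and rows are assumptions made for illustration only.
disney_csv = io.StringIO(
    "Review_ID,Rating,Year_Month,Reviewer_Location,Branch\n"
    "1,4,2019-4,Australia,Disneyland_HongKong\n"
    "2,5,missing,Philippines,Disneyland_HongKong\n"
)
cacao_csv = io.StringIO(
    "Company,Review Date,Cocoa Percent,Rating\n"
    "A. Morin,2016,63%,3.75\n"
    "A. Morin,2015,70%,2.75\n"
)

# pd.read_csv accepts a file path or any file-like object,
# so a real path inside Resources would work the same way.
disney_df = pd.read_csv(disney_csv)
cacao_df = pd.read_csv(cacao_csv)

print(disney_df.shape, cacao_df.shape)
```

With the real files, each `io.StringIO(...)` argument would be replaced by the path to the corresponding CSV.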

Transform:

For both tables, copied only the needed columns into a new DataFrame.

Renamed the column headers in both tables.

Added two columns to the Disneyland table by splitting the timestamp into a year column and a month column.

Removed rows with NULL values in the year column of the Disneyland table.

The Cacao table needed no further transformation (passthrough).
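The Disneyland transform steps above might look roughly like this. It is a sketch under assumptions: the column names, the `YYYY-M` timestamp format, and the `missing` placeholder are hypothetical and may not match the notebook.

```python
import pandas as pd

# Hypothetical raw Disneyland data; names and values are assumptions.
raw_df = pd.DataFrame(
    {
        "Review_ID": [1, 2, 3],
        "Rating": [4, 5, 3],
        "Year_Month": ["2019-4", "missing", "2012-11"],
        "Review_Text": ["Great", "Fun", "Crowded"],
    }
)

# Copy only the needed columns into a new DataFrame.
disney_df = raw_df[["Review_ID", "Rating", "Year_Month"]].copy()

# Rename the column headers to database-friendly names.
disney_df = disney_df.rename(
    columns={"Review_ID": "review_id", "Rating": "rating", "Year_Month": "year_month"}
)

# Split the timestamp into year and month; anything unparseable becomes NaN.
parts = disney_df["year_month"].str.split("-", expand=True)
disney_df["year"] = pd.to_numeric(parts[0], errors="coerce")
disney_df["month"] = pd.to_numeric(parts[1], errors="coerce")

# Drop rows with no usable year, then store year and month as integers.
disney_df = disney_df.dropna(subset=["year"]).astype({"year": int, "month": int})

print(disney_df)
```

Coercing unparseable values to NaN before calling `dropna` keeps the cleaning rule in one place, whatever placeholder the raw data uses for a missing timestamp.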

Load: Created a connection to the PostgreSQL database.

Checked for a successful connection to the database and confirmed that the tables had been created.

Appended the DataFrames to the tables.

Confirmed a successful load by querying the database.
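The load steps can be illustrated with the sketch below. To keep it runnable without a database server, it uses an in-memory SQLite connection as a stand-in for PostgreSQL; the table name and schema are assumptions. Against PostgreSQL, a SQLAlchemy engine would typically take the place of `conn`.

```python
import sqlite3

import pandas as pd

# Cleaned DataFrame to load; column names are assumptions for illustration.
disney_df = pd.DataFrame(
    {"review_id": [1, 3], "rating": [4, 3], "year": [2019, 2012], "month": [4, 11]}
)

# In-memory SQLite stands in for the PostgreSQL connection here.
conn = sqlite3.connect(":memory:")

# The target table is created up front, mirroring a pre-built schema.
conn.execute(
    "CREATE TABLE disneyland ("
    "review_id INTEGER PRIMARY KEY, rating INTEGER, year INTEGER, month INTEGER)"
)

# Confirm that the table exists before loading.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
print(tables)

# Append the DataFrame rows to the existing table.
disney_df.to_sql("disneyland", conn, if_exists="append", index=False)

# Confirm the load by querying the database.
loaded = pd.read_sql_query("SELECT * FROM disneyland ORDER BY review_id", conn)
print(loaded)
```

Using `if_exists="append"` keeps the schema defined in the database rather than letting pandas infer and recreate the table.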

Documentation:

Documentation Containing Business Rules: ETL Mapping Documentation

Instructions: Running the program:

Open Disneyland_and_chocolate_ETL.ipynb

Run all cells and, when prompted, enter your pgAdmin password.
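One way a notebook can prompt for the password without echoing it is sketched below. The function name `build_postgres_url`, the default host and port, and the database name used in the example are all hypothetical; the notebook may handle this differently.

```python
from getpass import getpass
from urllib.parse import quote


def build_postgres_url(user, database, host="localhost", port=5432, prompt=getpass):
    """Ask for the password and return a PostgreSQL connection URL.

    `prompt` defaults to getpass, which hides the typed input; it can be
    swapped out for a plain function when no terminal is available.
    """
    password = prompt("Enter PostgreSQL password: ")
    # Percent-encode the credentials so characters like '@' or '/' stay valid.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Non-interactive example with a fixed, made-up password.
example_url = build_postgres_url("postgres", "etl_db", prompt=lambda _: "p@ss/word")
print(example_url)
```

Prompting at run time keeps the password out of the notebook file and the repository history.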
