Skip to content

A simple data pipeline using dbt, pandas, postgresql to transform data using dbt as a transformation tool and postgres as the warehouse

Notifications You must be signed in to change notification settings

jbassie/ETL-PROJECT

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

49 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ETL PROJECT


OVERVIEW

  • This is a simple ETL project. Firstly I scrapped a liquor store website to extract data about its various category of wines then converted it to a csv file. Using the faker nd names library from python, I created different customer names, employee names, city, country etc.
  • Using the data generated I created a local postgres database to store the data using pgcli in a star schema format to hold the data. After loading the data into postgres, I used dbt to transform the data into the solve I wanted to solve with data, using postgres connection from dbt and visual studio code as the IDE for the transformation.
  • With the transformed data data, I created a connection with from Power BI with Postgres as my Data Source to extract only transformed table for data analysis and visualization.

USAGE

  • Python: Python was used to for webscraping the data and creating the fake data from faker and names
    Libraries
    • Pandas
    • Beautiful Soup
    • Faker
    • pandasql
    • Names
  • Postgres: Used PostgreSQL as the Database for the data
  • Power Bi
  • dbt Used dbt as a transformation tool for the data and to determine some data quality
    • dbt init
    • dbt source freshness
    • dbt run
    • dbt debug
    • dbt test
    • dbt docs generate


PROCESS

  • EXTRACT
    • The Whiskey Exchange The whiskey exchange is online retail store that specializes in the sales of whiskey drink. For this project, we will be scrapping different whiskey name, price, alcohol amount, alcohol percent as well as category of the whiskey. Using the faker and names libraries I will create a fake customer name, employee name, country, city, email, CARD ISSUER and saving the result into different dataframe
    • customer_df
    • countries_df
    • departments_df
    • employees_df
    • payments_df
    • products_df This is all carried in a jupyter notebok file here

  • LOAD
    • The dataframe are converted into different csv files in the same folder> Using Postgres PSQL COMMAND LINE PROMPT, I created the different tables to a whiskey_exchange database according the name of the of the exported csv file. Added foreign and Unique keys to the tables and formed a star schema. With this the Database is Loaded into my local psotgres. Here is ERD representation of the database schema

  • TRANSFORM

    • Using the termnal prompt from VS CODE I initialze dbt into my project folder and created a connection profile into my local database as follows :

    Postgres2:
    - target: dev
    - outputs:
    * dev:
    - type: postgres
    - host: localhost
    - user: postgres
    - password: '############'
    - port: 5432
    - dbname: whiskey_exchange
    - schema: public
    - threads: 10
    - keepalives_idle: 0 # default 0, indicating the system default. See below
    - connect_timeout: 10 # default 10 seconds "

    The transformation process is represented in the lineage graph below:
    

    Lineage_graph Graph

    • VISUALIZATION After the transforamtion step, the data is now ready for analysis. For the analysis,I represented the data in a simple Microsoft Power BI Dashboard to schowcase findings from the data. To achieve this I created a connection with Postgresql from Power BI and pulled the processed data.

    Power BI Connection PostgresCOnnection

    After creating the connection I represented the data in a visual Visual

    • Some Insights some the Visual
      • 2015 was had the best sales of $710,893
      • Scotch Whiskey was the most Sold Whiskey Type
      • JCB 16 Digit was the most Card type used for Purchase

About

A simple data pipeline using dbt, pandas, postgresql to transform data using dbt as a transformation tool and postgres as the warehouse

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published