
ETL Data Pipelines

  • Author: Amelia Tang

This is the GitHub repository to document various ETL data pipelines I designed for different projects.

What is ETL?

Extract, Transform and Load (ETL) is a fundamental framework for streamlining data processing workflows. ETL pipelines facilitate the efficient extraction of data from diverse sources, its transformation into a usable format, and its loading into designated destinations for analysis.

Extract

This step involves gathering data from various sources such as databases, APIs, or files.
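As a minimal sketch of the extract step, the snippet below reads rows from a CSV-formatted source into Python dictionaries. The sample data and field names are hypothetical; in a real pipeline the source could be a database query, an API response, or a file in object storage.

```python
import csv
import io

# Hypothetical raw export; in practice this could come from a database,
# an API, or a file on disk (the field names here are illustrative).
raw_export = """order_id,amount,country
1001,25.50,US
1002,13.00,ca
1003,,US
"""

def extract(source: str) -> list[dict]:
    """Read raw CSV-formatted text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(source)))

rows = extract(raw_export)
print(rows[0])  # {'order_id': '1001', 'amount': '25.50', 'country': 'US'}
```

Note that at this stage everything is still a raw string; cleaning and type conversion are deferred to the transform step.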

Transform

In this step, the extracted data is cleaned, validated, and transformed into a consistent format suitable for analysis.
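A sketch of the transform step, continuing the hypothetical order data from above: rows missing a required field are dropped, numeric fields are parsed, and country codes are normalized to a consistent format.

```python
def transform(rows: list[dict]) -> list[dict]:
    """Clean and validate extracted rows: drop incomplete records,
    parse numeric fields, and normalize country codes."""
    cleaned = []
    for row in rows:
        if not row.get("amount"):  # validation: skip rows missing a required field
            continue
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "country": row["country"].strip().upper(),  # consistent format
        })
    return cleaned

sample = [
    {"order_id": "1001", "amount": "25.50", "country": "US"},
    {"order_id": "1002", "amount": "13.00", "country": "ca"},
    {"order_id": "1003", "amount": "", "country": "US"},  # incomplete row
]
result = transform(sample)
print(result)  # two cleaned rows; the incomplete one is dropped
```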

Load

The transformed data is loaded into a target database or data warehouse, where it can be stored and accessed for further analysis or reporting.
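To illustrate the load step, the sketch below inserts transformed rows into an in-memory SQLite table. SQLite stands in here for whatever the real target is (a production database or a data warehouse); the table schema is hypothetical.

```python
import sqlite3

def load(rows: list[dict], conn: sqlite3.Connection) -> None:
    """Load transformed rows into a target table for later querying."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount REAL, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (:order_id, :amount, :country)", rows
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse connection
load(
    [{"order_id": 1001, "amount": 25.5, "country": "US"},
     {"order_id": 1002, "amount": 13.0, "country": "CA"}],
    conn,
)
count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(count)  # 2
```

Once loaded, the data can be queried with plain SQL for analysis or reporting.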

Project 1 (Python BeautifulSoup, AWS EC2, S3, Glue and Athena)

ETL Diagrams

Extracted public listing data from eBay.com using a Python script, transformed the data in AWS Glue, and loaded the transformed data into AWS Athena for further analysis.
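The extraction side of this pipeline can be sketched with BeautifulSoup as below. The HTML and CSS class names are hypothetical, not eBay's actual markup; a real scraper would fetch pages over HTTP and should respect the site's rate limits and robots.txt.

```python
from bs4 import BeautifulSoup

# Hypothetical listing markup; real eBay pages use different class names.
html = """
<ul>
  <li class="listing"><span class="title">Vintage camera</span>
      <span class="price">$45.00</span></li>
  <li class="listing"><span class="title">Film roll</span>
      <span class="price">$8.50</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
# Pull each listing's title and price into a flat record,
# ready to be written out (e.g. to S3) for downstream processing.
listings = [
    {"title": li.select_one(".title").get_text(strip=True),
     "price": li.select_one(".price").get_text(strip=True)}
    for li in soup.select("li.listing")
]
print(listings[0])  # {'title': 'Vintage camera', 'price': '$45.00'}
```

In the pipeline described above, records like these would be written to S3, crawled and transformed by AWS Glue, and queried in Athena.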

ETL Data Pipeline Implementation on AWS

I documented the implementation of these ETL data pipelines on AWS in blog posts on Medium.com.

Project 2 (Apache Airflow, Python and PostgreSQL)

TBU

Project 3 (Databricks)

TBU
