Personal Data Engineering Projects
Updated Feb 8, 2023 - Jupyter Notebook
Scans databases and data warehouses for PII and tags the affected tables and columns in data catalogs such as Amundsen and DataHub.
Redshift Python Connector, supporting the Python Database API Specification v2.0 (PEP 249).
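Because the connector follows DB-API v2.0, it exposes the familiar connect() → cursor() → execute() → fetch*() pattern. A minimal sketch of that pattern, shown with the stdlib sqlite3 driver since a live Redshift cluster isn't assumed here; with the Redshift connector you would pass host, database, user, and password to connect() instead of a file path:

```python
# DB-API v2.0 (PEP 249) usage pattern: connect -> cursor -> execute -> fetch.
# sqlite3 stands in for the Redshift connector; table and values are
# illustrative placeholders, not from the original project.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE rides (id INTEGER, fare REAL)")
cur.executemany("INSERT INTO rides VALUES (?, ?)", [(1, 12.5), (2, 7.0)])
cur.execute("SELECT COUNT(*), SUM(fare) FROM rides")
count, total = cur.fetchone()
conn.close()
```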
Udacity Data Engineering Nanodegree Program
Developed a data pipeline to automate data warehouse ETL by building custom Airflow operators that handle the extraction, transformation, validation, and loading of data from S3 -> Redshift -> S3.
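Custom operators for S3-to-Redshift loads typically render and execute Redshift's COPY command against the cluster. A minimal sketch of the SQL such an operator might build; the schema, table, bucket, and IAM role names below are hypothetical placeholders, not from the original project:

```python
# Sketch of the COPY statement a custom S3-to-Redshift operator might
# render; all identifiers (schema, table, bucket, IAM role) are
# hypothetical placeholders.
def build_copy_sql(schema: str, table: str, s3_path: str, iam_role: str) -> str:
    return (
        f"COPY {schema}.{table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "FORMAT AS JSON 'auto';"
    )

sql = build_copy_sql(
    "staging",
    "events",
    "s3://example-bucket/events/2023/02/",
    "arn:aws:iam::123456789012:role/redshift-load",  # hypothetical role ARN
)
```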
Tracks Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift, and Power BI.
A data pipeline performing ETL into AWS Redshift using Spark, orchestrated with Apache Airflow.
Build clickstream analytics on AWS for your mobile and web applications
An example system that captures a large stream of product usage data (events) and provides both real-time data visualization and SQL-based data analytics.
Project 3 - Data Engineering Nanodegree
Sample Spring Boot Data JPA integration with AWS Redshift.
Example project consuming an AWS Kinesis stream and saving the data to Amazon Redshift using Apache Spark.
A batch-processing data pipeline using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance (currently powered down).
A simple command-line tool to copy tables from Amazon Redshift to Amazon RDS (PostgreSQL).
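One way such a copy tool can work is to stream rows between two DB-API connections in batches; production tools often UNLOAD to S3 and bulk-load into Postgres instead. A sketch of the row-streaming approach, using two in-memory sqlite3 databases as stand-ins for Redshift and RDS since neither service is assumed here:

```python
# Stream rows from one DB-API connection to another in batches.
# sqlite3 stands in for both Redshift (source) and RDS Postgres (target);
# the table name and data are illustrative placeholders.
import sqlite3

def copy_table(src, dst, table: str, batch_size: int = 1000) -> int:
    src_cur = src.cursor()
    src_cur.execute(f"SELECT * FROM {table}")
    placeholders = ", ".join("?" * len(src_cur.description))
    copied = 0
    while True:
        rows = src_cur.fetchmany(batch_size)
        if not rows:
            break
        dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
        copied += len(rows)
    dst.commit()
    return copied

src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
src.executemany("INSERT INTO users VALUES (?, ?)", [(1, "a"), (2, "b")])
n = copy_table(src, dst, "users")
```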
Project 5 - Data Engineering Nanodegree
A project motivated by an interest in data engineering and ETL pipelines, and a good opportunity to develop skills with a range of tools. As such, it is more complex than strictly required, utilising dbt, Airflow, Docker, and cloud-based storage.
Completed Udacity's data engineering nanodegree: a series of exercises and projects to learn and practice popular big data management tools.
Configuring Redshift cluster using Terraform.
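A minimal sketch of how such a cluster might be declared with the Terraform AWS provider's aws_redshift_cluster resource; every identifier and value below is a placeholder assumption, not taken from the original repo:

```hcl
# Placeholder Redshift cluster definition; names, sizes, and credentials
# are illustrative, not from the original project.
resource "aws_redshift_cluster" "example" {
  cluster_identifier = "example-cluster"
  database_name      = "analytics"
  master_username    = "adminuser"
  master_password    = var.redshift_password # supplied securely, e.g. via TF_VAR
  node_type          = "dc2.large"
  cluster_type       = "single-node"
}

variable "redshift_password" {
  type      = string
  sensitive = true
}
```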
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via GitHub Actions.