
Udacity Data Engineer Nanodegree

This coursework was completed as part of Udacity's Data Engineer Nanodegree. I obtained my certification in 2021, and this repository is a collection of the projects I undertook during the program.

The projects cover a range of skills: designing data models, building data warehouses and data lakes, automating data pipelines, and working with large datasets.

1. Data Modelling with PostgreSQL

This project explores fundamental concepts of Data Modelling using PostgreSQL. We design and create a database schema, then populate the database using optimized queries for a fictitious music streaming app, Sparkify.
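As a minimal sketch of the idea, the snippet below creates the fact table of a star schema and inserts a single row with psycopg2. The database name, table, and columns are illustrative and simplified from the actual project, which defines several dimension tables alongside this fact table.

```python
# Illustrative sketch: create and populate the fact table of a star schema.
# Database credentials and column names are placeholders, not the project's exact setup.
import psycopg2

conn = psycopg2.connect("host=127.0.0.1 dbname=sparkifydb user=student password=student")
cur = conn.cursor()

# Fact table at the centre of the star schema
cur.execute("""
    CREATE TABLE IF NOT EXISTS songplays (
        songplay_id SERIAL PRIMARY KEY,
        start_time  TIMESTAMP NOT NULL,
        user_id     INT NOT NULL,
        song_id     VARCHAR,
        artist_id   VARCHAR,
        level       VARCHAR,
        session_id  INT,
        location    VARCHAR,
        user_agent  VARCHAR
    );
""")

# Parameterised insert of the kind an ETL script would run per log record
cur.execute(
    "INSERT INTO songplays (start_time, user_id, song_id, artist_id, level, session_id, location, user_agent) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)",
    ("2018-11-01 21:01:46", 8, None, None, "free", 139, "Phoenix-Mesa-Scottsdale, AZ", "Mozilla/5.0"),
)

conn.commit()
conn.close()
```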

2. ETL in Cloud Data Warehouses

In this project, we build an ETL pipeline that extracts data from S3, stages it in Redshift, and transforms it into a set of dimensional tables for Sparkify's analytics team. The project provides hands-on experience implementing a cloud data warehouse.
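A rough sketch of the staging-and-transform step is shown below, assuming a reachable Redshift cluster and an IAM role with S3 read access. The connection string, bucket paths, and table definitions are illustrative rather than the project's exact configuration.

```python
# Illustrative sketch: stage raw JSON events from S3 into Redshift, then
# transform them into a dimensional table. All identifiers are placeholders.
import psycopg2

conn = psycopg2.connect(
    "host=my-cluster.example.redshift.amazonaws.com port=5439 "
    "dbname=dev user=awsuser password=REPLACE_ME"
)
cur = conn.cursor()

# Stage raw event logs from S3 into a staging table
cur.execute("""
    COPY staging_events
    FROM 's3://my-source-bucket/log_data'
    IAM_ROLE 'arn:aws:iam::123456789012:role/myRedshiftRole'
    FORMAT AS JSON 's3://my-source-bucket/log_json_path.json';
""")

# Transform staged rows into the users dimension table
cur.execute("""
    INSERT INTO users (user_id, first_name, last_name, gender, level)
    SELECT DISTINCT userId, firstName, lastName, gender, level
    FROM staging_events
    WHERE userId IS NOT NULL;
""")

conn.commit()
conn.close()
```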

3. Data Lakes with Spark

This project focuses on the construction of data lakes using Apache Spark. We build an ETL pipeline that extracts data from S3, processes it using Spark, and loads the processed data back into S3. This project highlights working with big data from different sources and in different formats.
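The sketch below illustrates the extract-transform-load flow with PySpark, assuming Spark is configured with S3 credentials; bucket paths and column names are illustrative.

```python
# Illustrative sketch of the data-lake ETL: read JSON from S3, shape a
# dimension table, and write it back to S3 as partitioned Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sparkify-data-lake").getOrCreate()

# Extract: raw song metadata stored as JSON in S3 (path is a placeholder)
songs = spark.read.json("s3a://my-source-bucket/song_data/*/*/*/*.json")

# Transform: keep the columns for the songs dimension and drop duplicates
songs_table = (
    songs.select("song_id", "title", "artist_id", "year", "duration")
         .dropDuplicates(["song_id"])
)

# Load: write back to S3 as Parquet, partitioned for efficient reads
songs_table.write.mode("overwrite") \
    .partitionBy("year", "artist_id") \
    .parquet("s3a://my-output-bucket/songs/")

spark.stop()
```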

4. Data Pipelines with Airflow

This project introduces automated data pipelines with Apache Airflow. By scheduling and monitoring the pipelines, we ensure high data quality and consistent data availability for analytics. The pipeline also loads source data from S3 into Redshift.
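Below is a simplified sketch of such a DAG, assuming Airflow 2.x. The project itself uses custom operators for staging, loading, and data-quality checks, so the PythonOperator callables here are placeholders.

```python
# Illustrative sketch of an hourly Airflow DAG: stage events, then run checks.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def stage_events_to_redshift():
    """Placeholder callable that would issue a COPY from S3 into Redshift."""
    print("COPY staging_events FROM 's3://my-source-bucket/log_data' ...")


def run_quality_checks():
    """Placeholder callable that would verify row counts and null constraints."""
    print("Running data quality checks")


with DAG(
    dag_id="sparkify_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    stage_events = PythonOperator(task_id="stage_events", python_callable=stage_events_to_redshift)
    quality_checks = PythonOperator(task_id="quality_checks", python_callable=run_quality_checks)

    # Quality checks run only after staging succeeds
    stage_events >> quality_checks
```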

5. Final Capstone Project: US Immigration Data ETL Pipeline with Spark

The Capstone project integrates the skills learned throughout the nanodegree. We construct an ETL pipeline to analyze US immigration data. We use Apache Spark to handle large datasets, enabling comprehensive analysis of migration patterns.
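As a hedged illustration of the kind of analysis the pipeline enables, the snippet below aggregates arrivals by port and month with PySpark, assuming the immigration records have already been landed as Parquet; paths and column names are hypothetical.

```python
# Illustrative sketch: summarise migration patterns from pre-processed
# immigration records. Paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("us-immigration-etl").getOrCreate()

# Read the pre-processed immigration records (hypothetical location)
immigration = spark.read.parquet("s3a://my-capstone-bucket/immigration/")

# Aggregate arrivals by port of entry and month
arrivals_by_port = (
    immigration
    .withColumn("arrival_month", F.month("arrival_date"))
    .groupBy("port_of_entry", "arrival_month")
    .agg(F.count("*").alias("num_arrivals"))
    .orderBy(F.desc("num_arrivals"))
)

# Persist the summary table for downstream analytics
arrivals_by_port.write.mode("overwrite") \
    .parquet("s3a://my-capstone-bucket/analytics/arrivals_by_port/")

spark.stop()
```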

Closing Remarks

Feel free to explore the repository, clone projects, and get hands-on experience with real-world Data Engineering scenarios. Your feedback is always welcome.
