This repository contains the projects I completed in the Udacity Data Engineering Nanodegree.
In this project, we'll create a database schema and build an ETL pipeline using Python and SQL. The ETL pipeline will transfer data from JSON logs in local directories into Postgres tables.
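The core of that pipeline is parsing each JSON log line and shaping it into a row for insertion. A minimal sketch of that step, assuming a Sparkify-style event log (field names like `ts`, `userId`, and `sessionId` are illustrative and may differ from the actual schema):

```python
import json

def extract_songplay(line):
    """Parse one JSON log line and return the columns for a songplays row,
    or None if the event is not a song play."""
    record = json.loads(line)
    if record.get("page") != "NextSong":
        return None
    return (
        record["ts"],          # start_time (epoch milliseconds)
        record["userId"],      # user_id
        record["level"],       # level (free / paid)
        record["sessionId"],   # session_id
        record["location"],    # location
        record["userAgent"],   # user_agent
    )

# Insertion into Postgres would then use psycopg2, roughly:
# cur.execute("INSERT INTO songplays (start_time, user_id, level, session_id, "
#             "location, user_agent) VALUES (%s, %s, %s, %s, %s, %s)", row)
```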
In this project, we'll model data using Apache Cassandra and build an ETL pipeline using Python. The ETL pipeline will transfer data from a set of CSV files within a directory into Apache Cassandra tables.
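Because Cassandra tables are modelled around queries, the CSV rows are reshaped into tuples matching each table's primary key. A sketch of that parsing step, assuming hypothetical column names (`sessionId`, `itemInSession`, etc.) that may differ from the real event files:

```python
import csv
import io

def rows_for_session_table(csv_text):
    """Yield (session_id, item_in_session, artist, song, length) tuples
    ready to bind to a prepared Cassandra INSERT."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield (
            int(row["sessionId"]),
            int(row["itemInSession"]),
            row["artist"],
            row["song"],
            float(row["length"]),
        )

# With the DataStax Python driver, the insert would look roughly like:
# session.execute(
#     "INSERT INTO song_by_session (session_id, item_in_session, artist, song, length) "
#     "VALUES (%s, %s, %s, %s, %s)", values)
```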
In this project, we'll be building an ETL pipeline to load data from S3 to staging tables on Amazon Redshift. We'll also execute SQL statements that create the analytics tables from these staging tables.
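Loading from S3 into Redshift staging tables is done with the `COPY` command. A small helper that renders such a statement; the table name, bucket path, and IAM role ARN below are placeholders:

```python
def build_copy_sql(table, s3_path, iam_role, json_option="auto"):
    """Render a Redshift COPY statement that loads JSON data from S3
    into a staging table."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS JSON '{json_option}';"
    )
```

The analytics tables are then built with plain `INSERT INTO ... SELECT ...` statements against these staging tables.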
In this project, we'll build an ETL pipeline for a data lake hosted on S3. The ETL pipeline will load data from S3, process the data into analytics tables using Spark, and write them back to S3. This Spark process will be deployed on a cluster using AWS.
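When Spark writes the analytics tables back to S3, it typically partitions them, e.g. `df.write.partitionBy("year", "artist_id").parquet(path)`. A plain-Python sketch of the resulting directory layout (bucket name and field names are illustrative):

```python
from collections import defaultdict

def partition_songs(records, output_prefix="s3a://my-bucket/songs"):
    """Group song records into Spark-style partition paths
    (year=<year>/artist_id=<id>), mirroring what
    df.write.partitionBy("year", "artist_id").parquet(output_prefix)
    would produce on S3."""
    partitions = defaultdict(list)
    for rec in records:
        path = f"{output_prefix}/year={rec['year']}/artist_id={rec['artist_id']}"
        partitions[path].append(rec["song_id"])
    return dict(partitions)
```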
In this project, we'll build data pipelines using Apache Airflow to automate the data warehouse ETL process. For the ETL process, the source data resides in S3, and is transferred into a data warehouse hosted on Amazon Redshift.
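In Airflow, such a pipeline is declared as a DAG of tasks with explicit dependencies. A minimal configuration sketch; the DAG id, schedule, and task names are placeholders, and the real project would replace the `DummyOperator`s with custom staging and loading operators:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.dummy import DummyOperator

with DAG(
    "sparkify_etl",                  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    start = DummyOperator(task_id="begin_execution")
    stage_events = DummyOperator(task_id="stage_events")      # would stage S3 data into Redshift
    load_fact = DummyOperator(task_id="load_songplays_fact")  # would build the fact table
    end = DummyOperator(task_id="stop_execution")

    # Dependency graph: staging runs before the fact-table load.
    start >> stage_events >> load_fact >> end
```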
In this project, we'll enhance the I94 immigration data with external data sources, such as world temperature and US city demographic data. This project will provide the foundation for future analysis of possible relationships between a country's immigration and arrival statistics and its temperature and population demographics.
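Conceptually, the enrichment is a left join of arrival records against the external datasets. A toy sketch of that join in plain Python (the field names and join key are illustrative; the real datasets use their own codes and would be joined with Spark or pandas):

```python
def enrich_arrivals(arrivals, demographics):
    """Left-join I94-style arrival counts with city demographics on city name.
    Arrivals with no matching city keep a None population."""
    demo_by_city = {d["city"]: d for d in demographics}
    enriched = []
    for a in arrivals:
        demo = demo_by_city.get(a["city"], {})
        enriched.append({**a, "population": demo.get("population")})
    return enriched
```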