Gianatmaja/Udacity-Data-Engineering-Nanodegree

Udacity Data Engineering Nanodegree

This repository contains the projects I completed in the Udacity Data Engineering Nanodegree.

Project 1: Data Modelling with Postgres

In this project, we'll create a database schema and build an ETL pipeline using Python and SQL. The ETL pipeline will then load data from JSON logs in local directories into Postgres tables.
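The extract and transform steps described above can be sketched with the standard library alone. This is a minimal sketch under assumptions, not the project's actual code: the one-JSON-object-per-line file layout, the field names `userId`, `song`, and `ts`, and the `songplays` table are all hypothetical. The `%s` placeholders follow the psycopg2 parameter style.

```python
import json
from pathlib import Path

# Hypothetical target table and columns; the real schema lives in the project repo.
INSERT_SONGPLAY = (
    "INSERT INTO songplays (user_id, song, start_time) VALUES (%s, %s, %s)"
)


def iter_log_records(log_dir):
    """Yield one dict per non-empty line from every .json file under log_dir."""
    for path in sorted(Path(log_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:
                    yield json.loads(line)


def to_songplay_row(record):
    """Map a raw log record to a tuple matching INSERT_SONGPLAY's placeholders."""
    # Field names are assumptions about the log format.
    return (int(record["userId"]), record["song"], int(record["ts"]))


def build_rows(log_dir):
    """Extract and transform all log records into insertable rows."""
    return [to_songplay_row(record) for record in iter_log_records(log_dir)]
```

Given an open psycopg2 connection `conn`, the load step could then be `conn.cursor().executemany(INSERT_SONGPLAY, build_rows(log_dir))` followed by `conn.commit()`.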

Go to repo

Project 2: Data Modelling with Apache Cassandra

In this project, we'll model data using Apache Cassandra and build an ETL pipeline using Python. The ETL pipeline will load data from a set of CSV files in a directory into Apache Cassandra tables.
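Cassandra tables are usually designed around the queries they serve, so a sketch of this pipeline pairs a query-specific table with a CSV reader. Everything named here is an assumption for illustration: the `songs_by_session` table, its columns, and the CSV header names `artist`, `song`, `sessionId`, and `itemInSession`.

```python
import csv
from pathlib import Path

# Hypothetical query to serve: "which song was played at a given position
# in a given session?" session_id is the partition key and item_in_session
# is a clustering column.
CREATE_SONGS_BY_SESSION = """
CREATE TABLE IF NOT EXISTS songs_by_session (
    session_id int,
    item_in_session int,
    artist text,
    song text,
    PRIMARY KEY (session_id, item_in_session)
)
"""

INSERT_SONGS_BY_SESSION = (
    "INSERT INTO songs_by_session (session_id, item_in_session, artist, song) "
    "VALUES (%s, %s, %s, %s)"
)


def read_event_rows(csv_dir):
    """Collect typed rows from every .csv file in csv_dir.

    Rows with an empty artist field are skipped.
    """
    rows = []
    for path in sorted(Path(csv_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            for record in csv.DictReader(handle):
                if not record["artist"]:
                    continue
                rows.append((
                    int(record["sessionId"]),
                    int(record["itemInSession"]),
                    record["artist"],
                    record["song"],
                ))
    return rows
```

With the DataStax Python driver, each row could then be passed to `session.execute(INSERT_SONGS_BY_SESSION, row)` on a connected session.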

Go to repo

Project 3: Implementing a Cloud Data Warehouse

In this project, we'll build an ETL pipeline that loads data from S3 into staging tables on Amazon Redshift. We'll then execute SQL statements that create the analytics tables from these staging tables.
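The two stages, staging loads followed by analytics inserts, can be sketched as an ordered list of SQL statements. This is an illustration under assumptions: the bucket path, the table and column names, and the default region are hypothetical, and the source data is assumed to be JSON.

```python
def build_copy_statement(table, s3_path, iam_role_arn, region="us-west-2"):
    """Return a Redshift COPY statement that loads JSON files from S3 into a table.

    Values are interpolated directly, so they must come from trusted
    configuration, never from user input.
    """
    return (
        f"COPY {table} FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"FORMAT AS JSON 'auto' REGION '{region}';"
    )


# Hypothetical analytics table populated from a staging table.
INSERT_USERS = """
INSERT INTO users (user_id, first_name, last_name)
SELECT DISTINCT user_id, first_name, last_name
FROM staging_events
WHERE user_id IS NOT NULL;
"""


def etl_statements(iam_role_arn):
    """Return statements in run order: staging loads first, then analytics inserts."""
    return [
        build_copy_statement(
            "staging_events", "s3://example-bucket/log_data", iam_role_arn
        ),
        INSERT_USERS.strip(),
    ]
```

Each statement would be run in order through a database cursor connected to the Redshift cluster.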

Go to repo

Project 4: Data Lake on AWS

In this project, we'll build an ETL pipeline for a data lake hosted on S3. The ETL pipeline will load data from S3, process the data into analytics tables using Spark, and load them back into S3. This Spark process will be deployed on a cluster using AWS.

Go to repo

Project 5: Data Pipelines using Apache Airflow

In this project, we'll build data pipelines using Apache Airflow to automate the data warehouse ETL process. The source data resides in S3 and is loaded into a data warehouse hosted on Amazon Redshift.
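Airflow models a pipeline as a directed acyclic graph (DAG) of tasks. The sketch below does not use Airflow itself. It uses the standard library's `graphlib` (Python 3.9+) to show how task dependencies determine a valid run order. The task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
# Task names are hypothetical, not taken from the project.
DEPENDENCIES = {
    "stage_events": {"begin"},
    "stage_songs": {"begin"},
    "load_fact_table": {"stage_events", "stage_songs"},
    "load_dimension_tables": {"load_fact_table"},
    "run_quality_checks": {"load_dimension_tables"},
}


def execution_order(dependencies):
    """Return one valid run order in which every task follows its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())
```

In an Airflow DAG file, the same ordering is declared between task objects with the `>>` operator, and the scheduler decides when each task runs.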

Go to repo

Project 6: Enhancing the I94 Immigration Data with External Data Sources

In this project, we'll enhance the I94 immigration data with external data, such as world temperature and US city demographic data. This project lays the foundation for future analysis of possible relationships between a country's immigration and arrival statistics and its temperature and population demographics.

Go to repo

Certificate of Completion

certificate
