Streaming data pipeline in AWS
Updated May 2, 2022 - Python
A data pipeline that runs ETL processes into AWS Redshift, using Spark for processing and Apache Airflow for orchestration.
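A pipeline of this shape chains three stages (extract, a Spark-based transform, and a Redshift load), with Airflow enforcing the ordering. A minimal plain-Python sketch of that flow, under stated assumptions: the stage names, the sample field `amount`, and the in-memory "warehouse" are illustrative stand-ins, not the project's actual code; in a real deployment each function would back an Airflow operator (e.g. a `PythonOperator` for extract, a `SparkSubmitOperator` for the transform, and a `COPY`-based load into Redshift).

```python
# Sketch of the extract -> transform -> load flow described above.
# All names are illustrative stand-ins for the real project's tasks.

def extract(source_rows):
    """Pull raw records from the source (stand-in for an API or S3 read)."""
    return list(source_rows)

def transform(rows):
    """Clean and reshape records (stand-in for the Spark job)."""
    return [{**r, "amount": float(r["amount"])} for r in rows]

def load(rows, warehouse):
    """Append transformed rows to the target table (stand-in for Redshift)."""
    warehouse.setdefault("fact_sales", []).extend(rows)
    return len(rows)

def run_pipeline(source_rows, warehouse):
    """Run the stages in the order Airflow would enforce."""
    return load(transform(extract(source_rows)), warehouse)
```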
Platzi - School of Amazon Web Services: Redshift for Big Data management.
The goal of this project is to build a data pipeline that gathers real-time carpark-lot availability and weather datasets from Data.gov.sg. The data are extracted via API and stored in an S3 bucket before being ingested into the data warehouse.
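The extract step here can be as small as an HTTP GET followed by an S3 put. A sketch of flattening the carpark-availability payload into rows before upload; note the field names follow my understanding of the public Data.gov.sg response shape and should be checked against the live API.

```python
# Flatten a Data.gov.sg carpark-availability response into one row per
# carpark and lot type, ready to be serialised and written to S3.
# Field names are assumptions based on the public API's documented shape.

def flatten_carpark_payload(payload):
    rows = []
    for item in payload.get("items", []):
        for carpark in item.get("carpark_data", []):
            for info in carpark.get("carpark_info", []):
                rows.append({
                    "carpark_number": carpark["carpark_number"],
                    "update_datetime": carpark["update_datetime"],
                    "lot_type": info["lot_type"],
                    "total_lots": int(info["total_lots"]),
                    "lots_available": int(info["lots_available"]),
                })
    return rows
```

In the pipeline these rows would then be serialised (e.g. as newline-delimited JSON) and uploaded with boto3's `put_object` before the Redshift ingest.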
Load data from the Million Song Dataset into AWS Redshift.
Data Warehousing in AWS with Redshift
An implementation of a data warehouse leveraging AWS Redshift. This project builds an ETL pipeline for a database hosted on AWS Redshift that extracts data from multiple JSON files residing in S3 buckets, stages it in Redshift, and transforms it into a set of dimensional tables for the analytics team to continue finding insights in…
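Staging JSON files from S3 into Redshift is typically done with the `COPY` command. A small helper that builds such a statement; the table name, S3 path, and IAM role ARN used in the example are placeholders, not values from any of these projects.

```python
def build_copy_sql(table, s3_path, iam_role, jsonpaths="auto", region="us-west-2"):
    """Build a Redshift COPY statement for JSON data staged in S3.

    `jsonpaths` may be 'auto' or the S3 URI of a JSONPaths file, per the
    Redshift COPY documentation. All argument values are caller-supplied
    placeholders here.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_path}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"FORMAT AS JSON '{jsonpaths}'\n"
        f"REGION '{region}';"
    )
```

The resulting SQL would be executed against the cluster with any Postgres-compatible driver (e.g. `psycopg2`), once per staging table.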
A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance at this address (powered down):
ETL pipeline with AWS Redshift orchestrated with Airflow
Udacity Data Engineering Nanodegree Program - My Submission of Project: Data Pipelines
Used AWS Glue to perform ETL operations and load the resulting data into AWS Redshift. In the second phase, used AWS CloudWatch rules and AWS Lambda to run the Glue jobs automatically.
AWS Redshift Serverless cluster
TypeScript library for Redshift-specific tasks
Data pipelines created and monitored using Airflow to feed data into Redshift