This repo contains my experimental projects on Data Engineering.
Updated Mar 6, 2023 - Python
Automate Apache Spark in Hadoop with Airflow in Cloud
Project files from my 2023 Data Engineering Nanodegree.
We build an ETL pipeline using Airflow that does the following: downloads data from an AWS S3 bucket, runs a Spark/Spark SQL job on the downloaded data to produce a cleaned-up dataset of orders that missed their delivery deadline, and then uploads the cleaned-up dataset back to the same S3 bucket in a folder primed for higher-level analytics.
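The three steps in this description (download, clean, upload) can be sketched roughly as follows. This is a dependency-free illustration only, not the repository's code: an in-memory dict stands in for the S3 bucket, a plain list filter stands in for the Spark job, and the keys (`raw/orders`, `analytics/missed_deadlines`) and column names (`promised`, `delivered`) are all assumptions made for the example.

```python
from datetime import date

# In-memory stand-in for the S3 bucket: key -> list of order rows.
# Keys and column names are hypothetical, chosen only for illustration.
bucket = {
    "raw/orders": [
        {"order_id": 1, "promised": date(2023, 3, 1), "delivered": date(2023, 3, 1)},
        {"order_id": 2, "promised": date(2023, 3, 1), "delivered": date(2023, 3, 4)},
        {"order_id": 3, "promised": date(2023, 3, 2), "delivered": None},
    ]
}


def download(bucket, key):
    """Step 1: fetch the raw rows (a real pipeline would read from S3)."""
    return list(bucket[key])


def find_missed_deadlines(orders):
    """Step 2: keep delivered orders that arrived after the promised date.

    A real pipeline would express this as a Spark or Spark SQL filter.
    Orders with no delivery date are dropped as incomplete records.
    """
    return [
        o for o in orders
        if o["delivered"] is not None and o["delivered"] > o["promised"]
    ]


def upload(bucket, key, rows):
    """Step 3: write the cleaned rows back under a separate key."""
    bucket[key] = rows


def run_pipeline(bucket, src_key="raw/orders", dst_key="analytics/missed_deadlines"):
    """Run the three steps in order, as an orchestrator would schedule them."""
    orders = download(bucket, src_key)
    late = find_missed_deadlines(orders)
    upload(bucket, dst_key, late)
    return late


late_orders = run_pipeline(bucket)
```

In an Airflow deployment each function would typically become its own task, so that a failed step can be retried without repeating the others.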
Keywords: Python, Airflow, AWS, S3, Redshift, ETL
Copying tables between Postgres databases (for analytics purposes)
Leo CDP - Customer Data Platform for Smart Business
Kaggle's 'Bike Sharing Demand' competition
PySpark Analysis from log files
Repository contains data science projects.
Constructing a protein fragment database in the context of Lyme disease.
A curated list of awesome data engineering resources using Python
IEEE AIKE 2018 Conference Website
CS7IS1 - Access Galway - Knowledge and Data Engineering