data-engineering

We build an ETL pipeline using Airflow that accomplishes the following: downloads data from an AWS S3 bucket; runs a Spark/Spark SQL job on the downloaded data, producing a cleaned-up dataset of orders that missed their delivery deadline; and uploads the cleaned-up dataset back to the same S3 bucket, in a folder primed for higher-level analytics.

python aws airflow scala spark apache-spark etl s3 s3-bucket aws-emr pyspark data-engineering

Updated Feb 25, 2023
Python

mukmookk / streamDAQ

Real-time NASDAQ data pipeline

python data-engineering webcrawling

Updated Aug 15, 2023
Python

leonardohss0 / etl-sql-s3-redshift

Keywords: Python, Airflow, AWS, S3, Redshift, ETL

airflow etl data-engineering

Updated Apr 29, 2023
Python

lucasbalponti / Apache-Airflow---Pipeline-de-dados

pipeline data-engineering dag apache-airflow vitrinedev

Updated Apr 26, 2023
Python

juliaobenauer / Data-Pipelines-with-Airflow

Udacity project within the Data Engineer Nanodegree

python airflow sql etl data-engineering

Updated Nov 26, 2022
Python

alpreu / ddm-handson-akka

Akka hands-on for the Distributed Data Management course at the Hasso-Plattner-Institute

distributed-systems akka data-engineering hands-on distributed-data-management

Updated Nov 29, 2018
Java

uche-madu / deb-infrastructure

This repository contains infrastructure code for the Wizeline Data Engineering Bootcamp (DEB) 2023. It is one of two repositories for the DEB. The other (deb-application) houses the application code.

Updated Nov 11, 2023
HCL

yennanliu / data_infra_repo

Collections of POC/dev data infrastructure. | #SE

Updated May 1, 2023
Python

AbdElrhman-m / Flight-Cancellations

data-science data-visualization data-engineering data-analysis

Updated Feb 7, 2019

PetraLee2019 / A-Mystery-in-Two-Parts

Project Performing Data Modeling, Data Engineering and Data Analysis on Employees of a Corporation

sql postgresql pandas python3 data-engineering data-analysis matplotlib data-modeling sql-queries sqlachemy entity-relationship-diagram

Updated Oct 18, 2019
Jupyter Notebook

hackersandslackers / hackers-jupyter-posts

🔴 📕 Our repository of Jupyter notebooks that serve as blog posts.

python blog data jupyter jupyter-notebook python3 data-engineering gatsbyjs

Updated Jan 29, 2020
Jupyter Notebook

keithhorbin / sql-challenge

Design Employee Database using SQL

sql database analysis postgresql data-engineering database-management data-analysis erd data-modeling pgadmin4

Updated Jun 9, 2020
Jupyter Notebook

iCode13 / sql-challenge

SQL analyses of a corporation's employee database. UT Austin Bootcamp homework assignment.

sql database postgresql pandas data-engineering data-analysis data-modeling

Updated Mar 2, 2021
Jupyter Notebook

shipyardapp / googlecloudstorage-blueprints

Simplified blueprints for building data pipelines with Google Cloud Storage (GCS).

cli data-science etl google-cloud-storage google-cloud gcs data-engineering data-analysis elt data-pipeline gcs-bucket

Updated Nov 4, 2022
Python

epj-alter / lol_scouter

Uses supervised machine learning to analyze key performance indicators of a player's strengths and weaknesses. The process involved gathering data from an API, cleaning the data, storing it in SQL and CSV files, and training multiple machine learning models such as Random Forest and logistic linear regression classifiers.

machine-learning data-engineering etl-pipeline

Updated Jul 31, 2020
Jupyter Notebook

josecsotomorales / dbt

Repository for testing data build tool (dbt)

data data-transformation data-engineering business-intelligence dbt dbt-packages

Updated Sep 9, 2021
PLSQL

Here are 3,116 public repositories matching this topic...

jijo-james / data-engineering-pet-projects

leonidee / spark-hadoop-automation-in-cloud

LoveNui / EMR-AWS-APACHE-SPARK

horony / udacity-nanodegree-data-engineering

khushal2405 / ETL-pipeline-using-Airflow-and-AWS-EMR
