data-engineering

Here are 3,195 public repositories matching this topic...

jijo-james / data-engineering-pet-projects

This repo contains my experimental data engineering projects.

python airflow sql etl data-engineering

Updated Mar 6, 2023
Python

leonidee / spark-hadoop-automation-in-cloud

Automate Apache Spark on Hadoop with Airflow in the cloud

airflow apache-spark hadoop data-engineering

Updated Jul 16, 2023
Python

LoveNui / EMR-AWS-APACHE-SPARK

aws airflow big-data spark data-engineering data-analysis

Updated Jul 15, 2023
Python

horony / udacity-nanodegree-data-engineering

Project files from my 2023 Data Engineering Nanodegree.

udacity spark python3 data-engineering udacity-nanodegree

Updated Feb 10, 2023
Jupyter Notebook

khushal2405 / ETL-pipeline-using-Airflow-and-AWS-EMR

An ETL pipeline built with Airflow that downloads data from an AWS S3 bucket, runs a Spark/Spark SQL job on the downloaded data to produce a cleaned-up dataset of orders that missed their delivery deadline, and then uploads the cleaned-up dataset back to the same S3 bucket in a folder prepared for higher-level analytics.

python aws airflow scala spark apache-spark etl s3 s3-bucket aws-emr pyspark data-engineering

Updated Feb 25, 2023
Python
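
The download, transform, and upload steps named in the description above can be sketched in plain Python. This is only an illustration under loud assumptions, not the repository's code: a local directory stands in for the S3 bucket, a CSV filter stands in for the Spark job, and every name here (`download`, `find_late_orders`, `upload`, `run_pipeline`, the `deadline` and `delivered_on` columns) is hypothetical.

```python
import csv
import shutil
import tempfile
from datetime import date
from pathlib import Path


def download(bucket: Path, key: str, workdir: Path) -> Path:
    """Stand-in for an S3 download: copy the object into a working directory."""
    target = workdir / Path(key).name
    shutil.copy(bucket / key, target)
    return target


def find_late_orders(raw_csv: Path, cleaned_csv: Path) -> int:
    """Stand-in for the Spark job: keep only orders delivered after the deadline.

    Assumes hypothetical ISO-date columns named 'deadline' and 'delivered_on'.
    Returns the number of late orders written.
    """
    late = 0
    with raw_csv.open(newline="") as src, cleaned_csv.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            delivered = date.fromisoformat(row["delivered_on"])
            deadline = date.fromisoformat(row["deadline"])
            if delivered > deadline:
                writer.writerow(row)
                late += 1
    return late


def upload(local_file: Path, bucket: Path, prefix: str) -> Path:
    """Stand-in for an S3 upload: copy the file under a prefix in the same bucket."""
    target = bucket / prefix / local_file.name
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy(local_file, target)
    return target


def run_pipeline(bucket: Path, key: str, prefix: str) -> Path:
    """Run the three steps in order, as an orchestrator would schedule them."""
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        raw = download(bucket, key, workdir)
        cleaned = workdir / "late_orders.csv"
        find_late_orders(raw, cleaned)
        return upload(cleaned, bucket, prefix)
```

In a real deployment each function would presumably become its own Airflow task, so that a failed upload can be retried without re-running the transform.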

mukmookk / streamDAQ

Real-time NASDAQ data pipeline

python data-engineering webcrawling

Updated Aug 15, 2023
Python

lucasbalponti / Apache-Airflow---Pipeline-de-dados

pipeline data-engineering dag apache-airflow vitrinedev

Updated Apr 26, 2023
Python

leonardohss0 / etl-sql-s3-redshift

Keywords: Python, Airflow, AWS, S3, Redshift, ETL

airflow etl data-engineering

Updated Apr 29, 2023
Python

holistics / pgcp

Copying tables between Postgres databases (for analytics purposes)

tools data-engineering

Updated Mar 16, 2023
Ruby

trieu / leo-cdp

Leo CDP - Customer Data Platform for Smart Business

bigdata data-engineering data-analytics cloud-computing cdp customer-data-platform

Updated Sep 18, 2020
JavaScript

hirenshah7390 / Spark_MLlib

machine-learning scala apache-spark data-engineering

Updated Sep 6, 2017
XSLT

omerv / Bike-Sharing-Demand

Kaggle's 'Bike Sharing Demand' competition

data-science numpy pandas-dataframe pandas data-visualization data-engineering kaggle-competition data-extraction matplotlib data-exploration matplotlib-figures bike-sharing-demand

Updated Jan 21, 2018
Jupyter Notebook

rparrapy / irs-revenue

Playing around with Spark for dataset aggregation

spark data-engineering

Updated Mar 2, 2017
Scala

imbrito / pyspark-calculates-session

PySpark analysis of log files

python data-structure spark bigdata pyspark data-engineering data-analytics

Updated Nov 11, 2022
Python

liber1320 / data_science

Repository contains data science projects.

python data-science machine-learning deep-learning data-engineering

Updated Oct 4, 2021
Jupyter Notebook

pavel-filatov / yelp-challenge

Project to demonstrate basic data engineering skills

data-engineering

Updated Jul 17, 2019
Scala

splovyt / LymeDatabase

Constructing a protein fragment database in the context of Lyme disease.

bioinformatics pipeline data-engineering healthcare webapp

Updated Dec 26, 2018
Python

willyhakim / awesome-python-data-engineering

A curated list of awesome data engineering resources using Python

python data-engineering data-management

Updated Feb 8, 2019

atzori / 2018

IEEE AIKE 2018 Conference Website

artificial-intelligence data-engineering conference-site

Updated Mar 24, 2018
HTML

stmunees / Access-Galway-KDE

CS7IS1: Access Galway (Knowledge and Data Engineering)

linked-data sparql rdf data-engineering owl-ontology knowledge-engineering cs7is1

Updated Jan 7, 2023
CSS
