etl-pipeline

Here are 1,350 public repositories matching this topic...

orchest / orchest

Build data pipelines, the easy way 🛠️

python docker kubernetes data-science machine-learning airflow cloud deployment jupyter etl ide pipelines self-hosted jupyterlab notebooks data-pipelines dag etl-pipeline orchest

Updated Jun 6, 2023
TypeScript

apache / incubator-streampark

Make stream processing easier! Easy-to-use streaming application development framework and operation platform.

streaming apache easy-to-use etl-pipeline development-framework streampark operation-platform

Updated May 1, 2024
Java

AlexIoannides / pyspark-example-project

Implementing best practices for PySpark ETL jobs and applications.

python data-science spark etl pyspark data-engineering etl-pipeline etl-job

Updated Jan 1, 2023
Python

DAGWorks-Inc / hamilton

Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage and metadata. Runs and scales everywhere Python does.

Updated May 3, 2024
Jupyter Notebook

san089 / Udacity-Data-Engineering-Projects

A few projects related to Data Engineering, including Data Modeling, Infrastructure setup on cloud, Data Warehousing, and Data Lake development.

Updated Aug 26, 2022
Python

san089 / goodreads_etl_pipeline

An end-to-end GoodReads data pipeline for building a data lake, data warehouse, and analytics platform.

Updated Mar 9, 2020
Python

stitchfix / hamilton

A scalable general purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton

python data-science machine-learning etl numpy pandas data-engineering data-platform software-engineering feature-engineering dataframe dag hamiltonian etl-framework hamilton featurization etl-pipeline stitch-fix

Updated Jul 3, 2023
Python

techascent / tech.ml.dataset

A Clojure high performance data processing system

java machine-learning clojure csv xlsx datascience dataset dataframe etl-pipeline

Updated Mar 12, 2024
Clojure

YotpoLtd / metorikku

A simplified, lightweight ETL Framework based on Apache Spark

scala sql big-data spark etl distributed-computing etl-framework etl-pipeline

Updated Jan 24, 2024
Scala

flow-php / flow

Flow PHP - data processing framework

etl etl-framework etl-pipeline

Updated May 3, 2024
PHP

SETL-Framework / setl

A simple Spark-powered ETL framework that just works 🍺

data-science machine-learning framework scala big-data spark pipeline etl data-transformation data-engineering dataset data-analysis modularization setl etl-pipeline

Updated Dec 7, 2023
Scala

Indexical-Metrics-Measure-Advisory / watchmen-matryoshka-doll

Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, and quality management.

visualization charts pipeline data-visualization data-pipeline etl-pipeline data-quality-monitoring

Updated Apr 28, 2022
Python

data-engineering-community / data-engineering-project-template

This is a template you can use for your next data engineering portfolio project.

python data sql etl data-warehouse data-engineering etl-pipeline

Updated Sep 10, 2021

jitsucom / bulker

Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)

pipeline etl data-engineering ingestion datawarehouse etl-pipeline

Updated May 3, 2024
Go

Zipstack / unstract

No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents

unstructured-data etl-pipeline llm-platform

Updated May 3, 2024
Python

patterns-app / patterns-devkit

Data pipelines from re-usable components

data-science sql etl pipelines immutability data-engineering functional-reactive-programming data-analysis data-pipelines data-pipeline etl-framework etl-pipeline etl-pipelines

Updated Mar 30, 2023
Python

airscholar / e2e-data-engineering

An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.

docker big-data cassandra apache-spark data-storage postgresql data-engineering apache-kafka data-processing data-pipeline real-time-analytics containerization apache-zookeeper apache-airflow etl-pipeline

Updated Oct 5, 2023
Python

usc-isi-i2 / dig-etl-engine

Download DIG to run on your laptop or server.

search-engine crawling information-extraction information-visualization etl-framework etl-pipeline

Updated Jan 9, 2019

Wittline / uber-expenses-tracking

The goal of this project is to track Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift, and Power BI.

python aws uber power-bi data-engineering data-modeling aws-redshift airflow-docker uber-data apache-airflow etl-pipeline uber-eats expenses-dashboard expenses-tracker

Updated Jun 29, 2022
Jupyter Notebook

imsanjoykb / Data-Science-Regular-Bootcamp

Regular practice in Data Science, Machine Learning, and Deep Learning, solving ML project problems and analytical issues to keep building my knowledge. The goal is to help learners with learning resources in the Data Science field.

Updated Jan 29, 2023
Jupyter Notebook
