data-pipeline

Here are 618 public repositories matching this topic...

airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Updated May 14, 2024
Python

apache / flink-cdc

Flink CDC is a streaming data integration tool

mysql real-time kafka etl postgresql distributed batch data-integration schema-evolution elt flink cdc data-pipeline change-data-capture paimon

Updated May 14, 2024
Java

snowplow / snowplow

The enterprise-grade behavioral data engine (web, mobile, server-side, webhooks), running cloud-natively on AWS and GCP

data analytics snowplow data-collection data-pipeline product-analytics marketing-analytics snowplow-pipeline snowplow-events

Updated Mar 26, 2024
Scala

GoogleCloudPlatform / data-science-on-gcp

Source code accompanying the book Data Science on the Google Cloud Platform by Valliappa Lakshmanan (O'Reilly, 2017)

data-science machine-learning data-visualization data-engineering cloud-computing data-analysis data-processing data-pipeline

Updated May 1, 2024
Jupyter Notebook

adilkhash / Data-Engineering-HowTo

A list of useful resources to learn Data Engineering from scratch

distributed-systems scala cloud-providers data-engineering data-pipeline

Updated Mar 22, 2024

kestra-io / kestra

Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.

workflow data pipeline etl workflow-engine scheduler orchestration data-engineering data-integration elt data-pipeline data-quality low-code data-orchestration data-orchestrator reverse-etl

Updated May 14, 2024
Java

BitSail is a distributed high-performance data integration engine which supports batch, streaming and incremental scenarios. BitSail is widely used to synchronize hundreds of trillions of data every day.

real-time big-data high-performance data-lake data-integration flink data-synchronization data-pipeline

Updated Jan 1, 2024
Java

rudderlabs / rudder-server

Privacy- and security-focused Segment alternative, written in Golang and React

Updated May 14, 2024
Go

superstreamlabs / memphis

Memphis.dev is a highly scalable and effortless data streaming platform

kubernetes golang data enrichment microservices schema-registry message-bus message-queue data-engineering data-pipeline message-broker data-streaming data-stream-processing messaging-queue

Updated May 6, 2024
Go

damklis / DataEngineeringProject

Example end-to-end data engineering project.

python redis elasticsearch airflow kafka big-data mongodb scraping django-rest-framework s3 data-engineering minio kafka-connect hacktoberfest data-pipeline debezium

Updated Dec 8, 2022
Python

apache / seatunnel-web

SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).

real-time offline high-performance apache data-integration sql-engine data-pipeline etl-framework seatunnel

Updated May 8, 2024
Java

pydoit / doit

Task management & automation tool

python workflow data-science build-automation task-runner build-tool build-system workflow-management hacktoberfest data-pipeline workflow-automation

Updated Oct 21, 2023
Python

infoslack / awesome-kafka

A list of resources about Apache Kafka

infrastructure kafka apache-spark stream-processing apache-kafka kafka-streams data-processing data-pipeline streaming-data

Updated Feb 9, 2024

reugn / go-streams

A lightweight stream processing library for Go

Updated May 14, 2024
Go

elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

bigquery snowflake data-warehouse dataops data-analysis redshift dbt data-pipelines data-pipeline lineage data-governance data-lineage analytics-engineer dbt-packages data-observability data-reliability dbt-artifacts

Updated May 12, 2024
HTML

whylabs / whylogs

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈