Bruin is a data pipeline tool that is designed to be easy-to-use. It allows building data pipelines using SQL and Python, and has built-in data quality checks.
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Apache DolphinScheduler is a modern data orchestration platform, making it easy to create high-performance workflows with low code.
Apache Airflow - a platform to programmatically author, schedule, and monitor workflows.
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
An orchestration platform for the development, production, and observation of data assets.
A framework for fast development and deployment of RAG systems.
A lean and mean distributed stream processing system written in Rust and WebAssembly.
Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.
Framework for standardizing, transforming, and applying quality checks to time series data.
Building data processing pipelines for document processing with NLP, using Apache NiFi and related services.
One framework to develop, deploy, and operate data workflows with Python and SQL.
dbt package that is part of Elementary, the dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
Move your data with ease.
Everything about designing, installing, and implementing data pipelines, including Kafka, ZooKeeper, and Hadoop. If you enjoy my content, please consider supporting what I do. Thank you.
A real-time seismic logging and alerts service with live monitoring and email alerts, built on Kafka data pipelines, fully Dockerized and deployment-ready!
Main repo, including the core data model, data marts, reference data, terminology, and the clinical concept library.