# data-pipelines

Here are 201 public repositories matching this topic...

dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.

python metadata workflow data-science etl analytics scheduler orchestration data-engineering data-integration data-pipelines workflow-automation mlops dagster data-orchestrator

Updated Jun 12, 2024
Python

pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

python rust streaming real-time kafka etl machine-learning-algorithms stream-processing data-analytics dataflow data-processing data-pipelines batch-processing pathway iot-analytics etl-framework time-series-analysis

Updated Jun 12, 2024
Python

apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Updated Jun 12, 2024
Python

Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Updated Jun 12, 2024
HTML

infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

nlp machine-learning information-retrieval ocr deep-learning chatbot orchestration preprocessing pdf-to-text data-pipelines document-parser rag document-understanding table-structure-recognition llm llmops retrieval-augmented-generation

Updated Jun 12, 2024
Python

mage-ai / mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

python data-science data machine-learning sql spark pipeline etl pipelines orchestration artificial-intelligence data-engineering data-integration dbt elt transformation data-pipelines reverse-etl

Updated Jun 12, 2024
Python

apache / dolphinscheduler

Apache DolphinScheduler is a modern data orchestration platform that makes it quick to create high-performance workflows with low code.

workflow airflow job-scheduler orchestration cloud-native task-scheduler data-pipelines azkaban workflow-orchestration workflow-schedule powerful-data-pipelines

Updated Jun 12, 2024
Java

SciPhi-AI / R2R

The ultimate open-source RAG framework

search pdf machine-learning ocr deep-learning retrieval chatbot artificial-intelligence question-answering data-pipelines retrieval-systems large-language-models llm langchain llama-index retrieval-augmented-generation

Updated Jun 12, 2024
HTML

artie-labs / transfer

Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.

golang bigquery database kafka snowflake data-integration redshift apache-kafka elt data-pipelines cdc change-data-capture debezium

Updated Jun 12, 2024
Go

meltano / meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

Updated Jun 11, 2024
Python

tuva-health / tuva

Main repo including core data model, data marts, reference data, terminology, and the clinical concept library

open-source bigquery sql snowflake data-warehouse healthcare data-analytics redshift terminology dbt data-pipelines data-governance data-lineage healthcare-analysis healthcare-data analytics-engineering dbt-packages

Updated Jun 11, 2024

unicef / magasin

Cloud-native, open-source, end-to-end data / AI / ML platform

kubernetes data-science data cloud data-visualization helm-charts data-pipelines magasin dagster

Updated Jun 11, 2024
Mustache

mycelial / mycelial

Move your data with ease.

rust etl data-pipelines edge-computing etl-pipeline

Updated Jun 11, 2024
Rust

opendatadiscovery / odd-platform

The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

Updated Jun 11, 2024
Java

elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

bigquery snowflake data-warehouse dataops data-analysis redshift dbt data-pipelines data-pipeline lineage data-governance data-lineage analytics-engineer dbt-packages data-observability data-reliability dbt-artifacts

Updated Jun 11, 2024
HTML

smart-data-lake / smart-data-lake

Smart automation tool for building modern data lakes and data pipelines

scala spark hive hadoop transform-data data-lake data-pipelines deltalake smart-data-lake

Updated Jun 11, 2024
Scala

bruin-data / bruin

Bruin is a data pipeline tool designed to be easy to use. It lets you build data pipelines using SQL and Python and has built-in data quality checks.

python bigquery sql analytics data-transformation snowflake data-analysis data-pipelines data-modeling

Updated Jun 11, 2024
Python

infinyon / fluvio

Lean and mean distributed stream processing system written in Rust and WebAssembly.

rust distributed-systems streaming real-time serverless webassembly data-flow stream-processing data-integration cloud-native data-pipelines stateful streaming-data stream-processing-engine event-driven-architecture streaming-data-processing streaming-data-pipelines

Updated Jun 10, 2024
Rust

kestra-io / examples

Best practices for data workflows, integrations with the Modern Data Stack (MDS), Infrastructure as Code (IaC), and cloud provider services

automation orchestration data-engineering data-pipelines data-orchestration analytics-engineering data-workflows

Updated Jun 10, 2024
Dockerfile

amphi-ai / amphi-etl

Low-code ETL for structured and unstructured data. Generates Python code you can deploy anywhere.

data etl structured-data data-pipelines unstructured-data rag-pipeline

Updated Jun 10, 2024
TypeScript
