# data-pipelines

Here are 201 public repositories matching this topic...

apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Updated Jun 10, 2024
Python

pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

python rust streaming real-time kafka etl machine-learning-algorithms stream-processing data-analytics dataflow data-processing data-pipelines batch-processing pathway iot-analytics etl-framework time-series-analysis

Updated Jun 10, 2024
Python

mage-ai / mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

python data-science data machine-learning sql spark pipeline etl pipelines orchestration artificial-intelligence data-engineering data-integration dbt elt transformation data-pipelines reverse-etl

Updated Jun 10, 2024
Python

dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.

python metadata workflow data-science etl analytics scheduler orchestration data-engineering data-integration data-pipelines workflow-automation mlops dagster data-orchestrator

Updated Jun 10, 2024
Python

infiniflow / ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

nlp machine-learning information-retrieval ocr deep-learning chatbot orchestration preprocessing pdf-to-text data-pipelines document-parser rag document-understanding table-structure-recognition llm llmops retrieval-augmented-generation

Updated Jun 10, 2024
Python

apache / dolphinscheduler

Apache DolphinScheduler is a modern data orchestration platform for agile, low-code creation of high-performance workflows.

workflow airflow job-scheduler orchestration cloud-native task-scheduler data-pipelines azkaban workflow-orchestration workflow-schedule powerful-data-pipelines

Updated Jun 10, 2024
Java

SciPhi-AI / R2R

Build and deploy a fully-featured, observable, user-facing RAG backend in minutes.

search pdf machine-learning ocr deep-learning retrieval chatbot artificial-intelligence question-answering data-pipelines retrieval-systems large-language-models llm langchain llama-index retrieval-augmented-generation

Updated Jun 9, 2024
HTML

mycelial / mycelial

Move your data with ease.

rust etl data-pipelines edge-computing etl-pipeline

Updated Jun 9, 2024
Rust

nbigot / ministream

Ministream is a small, stand-alone, real-time event messaging streaming server

go golang json server nosql messaging eventing cloud-native webapi data-pipelines streaming-data real-time-processing event-streaming-database ministream

Updated Jun 9, 2024
Go

infinyon / fluvio

Lean and mean distributed stream processing system written in Rust and WebAssembly.

rust distributed-systems streaming real-time serverless webassembly data-flow stream-processing data-integration cloud-native data-pipelines stateful streaming-data stream-processing-engine event-driven-architecture streaming-data-processing streaming-data-pipelines

Updated Jun 10, 2024
Rust

Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Updated Jun 8, 2024
HTML

artie-labs / transfer

Database replication platform that leverages change data capture. Stream production data from databases to your data warehouse (Snowflake, BigQuery, Redshift) in real-time.

golang bigquery database kafka snowflake data-integration redshift apache-kafka elt data-pipelines cdc change-data-capture debezium

Updated Jun 9, 2024
Go

tuva-health / tuva

Main repo including core data model, data marts, reference data, terminology, and the clinical concept library

open-source bigquery sql snowflake data-warehouse healthcare data-analytics redshift terminology dbt data-pipelines data-governance data-lineage healthcare-analysis healthcare-data analytics-engineering dbt-packages

Updated Jun 7, 2024

bruin-data / bruin

Bruin is a data pipeline tool that is designed to be easy-to-use. It allows building data pipelines using SQL and Python, and has built-in data quality checks.

python bigquery sql analytics data-transformation snowflake data-analysis data-pipelines data-modeling

Updated Jun 7, 2024
Python

tsdat / tsdat

Framework for standardizing, transforming, and applying quality checks to time series data.

python data-analysis data-pipelines tsdat

Updated Jun 7, 2024
Python

elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

bigquery snowflake data-warehouse dataops data-analysis redshift dbt data-pipelines data-pipeline lineage data-governance data-lineage analytics-engineer dbt-packages data-observability data-reliability dbt-artifacts

Updated Jun 6, 2024
HTML

goto / optimus

Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.

golang bigquery airflow automation etl analytics data-transformation data-warehouse business-intelligence dataops elt workflows data-pipelines data-modelling analytics-engineering

Updated Jun 10, 2024
Go

opendatadiscovery / odd-platform

The first open-source data discovery and observability platform. We make life easy for data practitioners so you can focus on your business.

Updated Jun 6, 2024
Java

mackelab / epiphyte

Python toolkit for working with high-dimensional neural data recorded during naturalistic, continuous stimuli @a-darcher @rachrapp

python docker movies database toolbox data-analysis data-pipelines datajoint meta-data epiphyte computational-neurosicence

Updated Jun 5, 2024
Jupyter Notebook

amphi-ai / amphi-etl

Low-code ETL for structured and unstructured data. Generates Python code you can deploy anywhere.

data etl structured-data data-pipelines unstructured-data rag-pipeline

Updated Jun 4, 2024
TypeScript
