My personal project for data engineering zoomcamp
Updated May 13, 2024 - Python
MLRun is an open source MLOps platform for quickly building and managing continuous ML applications across their lifecycle. MLRun integrates into your development and CI/CD environment and automates the delivery of production data, ML pipelines, and online applications.
The open source high performance ELT framework powered by Apache Arrow
Turns Data and AI algorithms into production-ready web applications in no time.
One framework to develop, deploy and operate data workflows with Python and SQL.
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Compute over Data framework for public, transparent, and optionally verifiable computation
SQL stream processing, analytics, and management. We decouple storage and compute to offer speedy bootstrapping, dynamic scaling, time-travel queries, and efficient joins.
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
lakeFS - Data version control for your data lake | Git for data
Compilation of high-profile real-world examples of failed machine learning projects
💜🌈 A data engineering project that implements an ETL data pipeline using Dagster, Apache Spark, Streamlit, MinIO, Apache Superset, and dbt 🌺
Infinitely scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.
Apache Superset is a Data Visualization and Data Exploration Platform
A REST API developed in Python (Flask), with analytics using SQLAlchemy ORM queries, Pandas, and Matplotlib for Hawaii climate data.
Clean APIs for data cleaning. Python implementation of the R package janitor
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack. Leading Reverse ETL and Customer Data Platform (CDP) for Data Teams.
DDE IO Utility Objects
Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.
Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines