An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, as well as APIs
Updated May 23, 2024 - Scala
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries. Winner of InfoWorld’s 2023 BOSSIE Award for best open source software.
DataPulse is a platform for developers to build, schedule and monitor data pipelines.
This construct builds the elements you need to quickly launch an EMR Serverless application. After submitting the EMR Serverless job, you can also launch an EMR notebook via a cluster template to check the outcome of the EMR Serverless application.
A native Rust library for Delta Lake, with bindings into Python
Amazon SageMaker Local Mode Examples
An open protocol for secure data sharing
Example of a local PySpark setup, including Delta Lake, for unit testing
The goal of this project is to provide documentation for the Lakehouse Engine framework.
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows and utilities for Data Products.
Analytical database for data-driven Web applications 🪶
Create full-fledged APIs for slowly moving datasets without writing a single line of code.
A minimalistic Rust implementation of a Delta Sharing server.
Data Streaming with Debezium, Kafka, Spark Streaming, Delta Lake, and MinIO