#

data-lakehouse

Here are 12 public repositories matching this topic...

Data-Kube / tst-datalakehouse-hudi

#Test - Create a Data Lakehouse in Kubernetes

kubernetes minio flink strimzi hudi data-lakehouse

Updated May 25, 2024

firelink-data / evolution

🦖 Efficiently evolve your old fixed-length data files into more modern file formats, fully parallelized!

rust cli converter data-science arrow ipc data-engineering data-lake data-generation flight delta iceberg apache-parquet parallel-programming data-mocking apache-arrow delta-lake data-lakehouse arrow-rs

Updated May 27, 2024
Rust

huwngnosleep / complete_lakehouse_techstack

This project implements an end-to-end techstack for a data platform, can be used on production.

kafka spark hadoop etl bigdata data-warehouse data-platform lambda-architecture data-lakehouse

Updated May 21, 2024
Python

sudohainguyen / mini-lakehouse

Data lakehouse at home with docker compose

hive iceberg trino data-lakehouse

Updated May 20, 2023
Jupyter Notebook

eavilaes / qbeast-spark

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

scala big-data spark sampling datasource hacktoberfest spark-sql data-lakehouse

Updated May 1, 2023
Scala

prneidhardt / AWS-Data-Lakehouse

STEDI project

aws apache-spark data-manipulation data-lakehouse data-definition-language

Updated Jul 9, 2023
Python

mahmoudparsian / data-warehousing

This repository is a place for the Data Warehousing course at the Information Systems & Analytics department, Santa Clara University.

data-mining database etl data-visualization data-lake business-intelligence data-analytics elt data-modeling data-warehousing star-schema dimensional-modeling data-lakehouse

Updated Apr 2, 2024
HTML

gupta-aayushkr / F1-Racing

The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.

sql azure databricks pyspark-notebook data-factory data-lakehouse

Updated Jan 10, 2024
Python

aabouzaid / modern-data-platform-poc

My M.Sc. dissertation: Modern Data Platform using DataOps, Kubernetes, and Cloud-Native ecosystem to build a resilient Big Data platform based on Data Lakehouse architecture which is the base for Machine Learning (MLOps) and Artificial Intelligence (AIOps).

kubernetes big-data data-engineering dataops data-platform cloud-native msc msc-project edinburgh-napier cloud-agnostic data-lakehouse

Updated May 12, 2024
Jupyter Notebook

dominikhei / Local-Data-LakeHouse

Sample Data Lakehouse deployed in Docker containers using Apache Iceberg, Minio, Trino and a Hive Metastore. Can be used for local testing.

data-lake minio trino hive-metastore apache-iceberg lakehouse data-lakehouse

Updated Sep 2, 2023
Dockerfile

pracdata / awesome-open-source-data-engineering

A curated list of open source tools used in analytical stacks and data engineering ecosystem

Updated May 7, 2024

qbeast-spark

Qbeast-io / qbeast-spark

Qbeast-spark: DataSource enabling multi-dimensional indexing and efficient data sampling. Big Data, free from the unnecessary!

scala big-data spark sampling datasource spark-sql data-lakehouse

Updated May 24, 2024
Scala

Improve this page

Add a description, image, and links to the data-lakehouse topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the data-lakehouse topic, visit your repo's landing page and select "manage topics."