DataHut-DuckHouse is a lightweight, open-source hybrid analytics platform. It combines the simplicity of DuckDB, the scalability of Apache Iceberg, the speed of Arrow Flight, the orchestration power of Xorq, and the modularity of dbt into a modern data stack that runs locally or in the cloud.
+------------------------+
|   CSV / Local Files    |
+-----------+------------+
            |
            v
+------------------------+
|  Arrow Flight Client   |
|  (ingest_flight.py)    |
+-----------+------------+
            |
            v
+--------------------------------------+
|         Arrow Flight Server          |
|           (Xorq + app.py)            |
|  - hybrid backend: Iceberg + DuckDB  |
|  - snapshots, synchronized views     |
+------------------+-------------------+
                   |
        +----------+----------+
        |                     |
  +-----------+        +-------------+
  |  DuckDB   |        |   Iceberg   |
  +-----------+        +-------------+
        |                     |
        v                     v
  +-----------+        +-------------+
  |    dbt    |        |    Trino    |
  +-----------+        +-------------+
        |                     |
        v                     v
  SQL models per       BI Tools
  tenant               (Metabase, Tableau)
- 🔗 Fast ingestion via Arrow Flight
- 🐤 Hybrid storage: local DuckDB & Iceberg (MinIO)
- 🧠 Orchestration with Xorq (Flight + multi-backend support)
- 🔄 Auto-synchronization with Trino (catalogs)
- 📊 Declarative SQL transformations using dbt
- 📦 Multi-tenancy: dynamic tenant creation/deletion
- ☁️ S3 integration via MinIO
git clone https://github.com/Geobatpo07/datahut-duckhouse.git
cd datahut-duckhouse
poetry install
docker-compose up --build
poetry run python scripts/create_tenant.py --id tenant_acme
Place a CSV file at ingestion/data/data.csv, then run:
poetry run python scripts/ingest_flight.py
datahut-duckhouse/
├── flight_server/ # Arrow Flight Server + HybridBackend
│ ├── app/
│ ├── app.py
│ ├── app_xorq.py
│ ├── xorq_config.py
│ ├── utils.py
│ └── backends/hybrid_backend.py
├── ingestion/data/ # Source data
├── scripts/ # Ingestion, queries, tenant management
│ ├── ingest_flight.py
│ ├── query_duckdb.py
│ ├── create_tenant.py
│ └── delete_tenant.py
├── transform/dbt_project/ # dbt models
├── config/ # Trino, dbt, tenants, users
│ ├── trino/etc/
│ ├── tenants/
│   └── users/users.yaml
├── .env # Environment variables
├── docker-compose.yml
└── pyproject.toml
export DBT_PROFILES_DIR=transform/dbt_project/config
cd transform/dbt_project
poetry run dbt run
poetry run python scripts/query_duckdb.py
poetry run python scripts/delete_tenant.py --id tenant_acme
Access Trino at http://localhost:8080 and use tenant_acme as the Trino catalog in Superset or Metabase.
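For reference, a per-tenant Trino catalog for an Iceberg warehouse on MinIO is typically a properties file under config/trino/etc/catalog/. The exact property names depend on your Trino version and metastore setup, so treat this as a sketch, not the project's actual configuration:

```properties
# config/trino/etc/catalog/tenant_acme.properties (illustrative)
connector.name=iceberg
iceberg.catalog.type=hive_metastore
hive.metastore.uri=thrift://metastore:9083
fs.native-s3.enabled=true
s3.endpoint=http://minio:9000
s3.path-style-access=true
```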
- ✅ Multi-tenant Iceberg + DuckDB
- ✅ Dynamic registration with Xorq + Trino
- 🔜 Flask/React management interface
- 🔜 User authentication + role management
- 🔜 Integration with Metabase or Superset
- 🔜 SaaS deployment on public cloud
This project is released under the MIT License.
Geovany Batista Polo LAGUERRE – lgeobatpo98@gmail.com | Data Science & Analytics Engineer