This project deploys a complete data platform on Kubernetes. Many services are available to build end-to-end data engineering projects, from ingestion to visualization.
- docker
- kubernetes (minikube cluster for local development)
- kubectl
- helm
- Data ingestion
  - NiFi
- Data integration
  - Airbyte
- Message queue
  - Kafka
  - RabbitMQ
- Change Data Capture
  - Debezium
- Database
  - Cassandra
  - Druid
  - MongoDB
  - MySQL/phpMyAdmin
  - PostgreSQL/pgAdmin
- Data warehouse
  - ClickHouse
- Data lake
  - MinIO
- Data transformation
  - dbt
  - Flink
  - Spark
- Data quality
  - Great Expectations
- Distributed SQL query engine
  - Trino
- Visualization
  - Metabase
  - Superset
- Machine learning
  - Kubeflow
- Orchestration
  - Airflow
  - Argo Workflows
- Monitoring
  - Grafana/Prometheus
- Notebook
  - JupyterHub
- Delta Lake
- Apache Iceberg (soon)
Before deploying to the cluster, choose the services you want to start in the `.config` file (`y`|`n`).
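The README does not show the `.config` format in detail, so here is a hypothetical sketch of what the `y`/`n` flags might look like and how a start script could act on them; the real keys and `start.sh` logic may differ.

```shell
# Hypothetical .config format: one "<service>=<y|n>" flag per line.
cat > /tmp/sample.config <<'EOF'
kafka=y
superset=n
EOF

# A start script could then deploy only the enabled services:
while IFS='=' read -r service enabled; do
  if [ "$enabled" = "y" ]; then
    echo "enable: $service"
    # e.g. helm install "$service" ... (actual deployment command)
  fi
done < /tmp/sample.config
# prints "enable: kafka"
```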
Deploy the data platform:

```shell
./start.sh
```
You may need to wait a few minutes for all services to start. You can check pod status with:

```shell
kubectl get all -A
```
Turn off the data platform:

```shell
./stop.sh
```
Some services are accessible through a URL, for example:

```
http://dataplatform.<service-name>.io/
```
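The hostname follows the pattern above; `metabase` below is just an example service name, and the hosts-file step is an assumption about how such hostnames are usually resolved with a local minikube cluster.

```shell
# Build the ingress URL for a service ("metabase" is an example name).
service=metabase
url="http://dataplatform.${service}.io/"
echo "$url"
# prints "http://dataplatform.metabase.io/"

# If the hostname does not resolve, one common option is a hosts entry
# pointing at the minikube IP (requires sudo):
#   echo "$(minikube ip) dataplatform.${service}.io" | sudo tee -a /etc/hosts
```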
To access a service from inside the cluster, use its internal DNS name:

```
<service-name>.<namespace>.svc.cluster.local:<service-port>
```
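As a concrete illustration of the pattern above, the snippet below composes an in-cluster DNS name; the service, namespace, and port are example values, not the platform's actual deployment names.

```shell
# Compose the in-cluster DNS name (example values, not actual deployment names).
service=postgresql
namespace=database
port=5432
host="${service}.${namespace}.svc.cluster.local"
echo "${host}:${port}"
# prints "postgresql.database.svc.cluster.local:5432"

# A client running inside the cluster would then connect with, e.g.:
#   psql -h "$host" -p "$port" -U postgres
```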
Get a chart's default Helm values:

```shell
helm show values <repo/chart> > values.yaml
```
Config file: set the `.config` file to choose which services to enable/disable.
Enable the minikube ingress addon:

```shell
minikube addons enable ingress
```
Open the Kubernetes dashboard:

```shell
minikube dashboard --url
```