Data platform on Kubernetes

This project deploys a complete data platform on Kubernetes. Many services are available to build end-to-end data engineering projects, from ingestion to visualization.

Prerequisites

  • Docker
  • Kubernetes (Minikube cluster for local development)
  • kubectl
  • Helm
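
A quick sanity check that the prerequisites are installed, plus a local cluster start (the Minikube resource sizes below are only a suggestion, not a project requirement):
docker version
kubectl version --client
helm version
minikube start --cpus=4 --memory=8192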

Available services

  • Data ingestion
    • NiFi
  • Data integration
    • Airbyte
  • Message queue
    • Kafka
    • RabbitMQ
  • Change Data Capture
    • Debezium
  • Database
    • Cassandra
    • Druid
    • MongoDB
    • MySQL/phpMyAdmin
    • PostgreSQL/pgAdmin
  • Data warehouse
    • ClickHouse
  • Datalake
    • MinIO
  • Data transformation
    • dbt
    • Flink
    • Spark
  • Data quality
    • Great Expectations
  • Distributed SQL query engine
    • Trino
  • Visualization
    • Metabase
    • Superset
  • Machine learning
    • Kubeflow
  • Orchestration
    • Airflow
    • Argo Workflows
  • Monitoring
    • Grafana/Prometheus
  • Notebook
    • JupyterHub

Data formats

  • Delta Lake
  • Apache Iceberg (coming soon)

How to deploy the data platform on Kubernetes

Before deploying to the cluster, choose the services you want to start in the .config file (y|n).
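A sketch of what that selection might look like (these service keys are placeholders; the authoritative key names live in the repository's .config file):
# enable (y) or disable (n) each service before running start.sh
airflow=y
kafka=y
superset=n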

Deploy the data platform
./start.sh

You may need to wait a few minutes for all services to start. You can check pod status with the following command: kubectl get all -A
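To block until everything is ready instead of polling by hand, kubectl wait can help (the 10-minute timeout is an arbitrary choice, and pods belonging to completed jobs may never become Ready):
kubectl wait --for=condition=Ready pods --all --all-namespaces --timeout=600s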

Turn off the data platform
./stop.sh

Helpful:

Some services are accessible through a URL, for example:
http://dataplatform.<service-name>.io/
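For these hostnames to resolve locally with Minikube, a common approach is to point them at the cluster IP in /etc/hosts (the Superset hostname below is just one instance of the pattern above):
echo "$(minikube ip) dataplatform.superset.io" | sudo tee -a /etc/hosts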

Access another service from inside the cluster:
<service-name>.<namespace>.svc.cluster.local:<service-port>
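For example, to probe a service on that address from a throwaway pod (the service name, namespace, and port here are assumptions; substitute your own):
kubectl run dns-test --rm -it --image=busybox --restart=Never -- nc -zv postgresql.postgresql.svc.cluster.local 5432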

Get the default values of a Helm chart:
helm show values <repo/chart> > values.yaml
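For example, to dump the defaults of the Bitnami PostgreSQL chart before overriding them (just an illustration; any repo/chart works the same way):
helm repo add bitnami https://charts.bitnami.com/bitnami
helm show values bitnami/postgresql > values.yaml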

Config file
Edit the .config file to choose which services to enable or disable.

Enable the Minikube ingress addon:
minikube addons enable ingress
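To confirm the ingress controller is running (on recent Minikube versions it lives in the ingress-nginx namespace; older versions used kube-system):
kubectl get pods -n ingress-nginx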

Open the Kubernetes dashboard:
minikube dashboard --url