Complete data engineering pipeline running on Minikube Kubernetes, Argo CD, Spark, Trino, S3, Delta Lake, Postgres + Debezium CDC, MySQL, Airflow, Kafka Strimzi, Datahub, OpenMetadata, Zeppelin, Jupyter, JFrog Container Registry

rogeriomm/labtools-k8s

This is a work in progress...

Pipeline architecture

flowchart TD
    Postgres(Postgres Database) -->|CDC| Kafka(Kafka Strimzi)
    SQLServer(SQL Server Database) -->|CDC| Kafka
    Kafka -->|AVRO Data Stream| ConsumerMinio(Minio S3)
    ConsumerMinio -->|AVRO Data Stream| ConsumerSpark(Apache Spark)
    ConsumerSpark --> |CDC Replication using Scala Engine - TODO| ConsumerDelta(Delta Lake)
    ConsumerSpark --> |Data catalog, lineage| ConsumerDatahub(Datahub)
    ConsumerSpark --> HiveMetastore(Hive metastore)
    Kafka -->|Schema Management| SchemaRegistry(Confluent Schema Registry)
    Kafka --> RedpandaConsole(Redpanda Console)
    SchemaRegistry -->|Schema Use - API| ConsumerSpark
    ConsumerDelta -->|Data Query| Trino(Trino)
    click ConsumerDelta href "https://github.com/rogeriomm/debezium-cdc-replication-delta" "Visit GitHub repository"
    Airflow(Apache Airflow) -->|Orchestrate| ConsumerSpark
    Trino --> Zeppelin(Zeppelin)
    Trino --> Jupyter(Jupyter)
    Trino --> Metabase(Metabase)
    
    class Postgres,SQLServer database;
    class Kafka,SchemaRegistry kafka;
    class ConsumerMinio,ConsumerSpark,ConsumerDelta consumers;
    class ConsumerDatahub datahub;
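
To make the data flow in the flowchart above easier to trace, here is a minimal Python sketch that encodes the same edges as an adjacency map and walks them with a breadth-first search. The node names are shortened versions of the diagram labels, and `PIPELINE` and `shortest_path` are hypothetical names used only for illustration; nothing here is part of the repository.

```python
from collections import deque

# Edges copied from the flowchart (edge labels omitted, node names shortened).
PIPELINE = {
    "Postgres": ["Kafka"],
    "SQLServer": ["Kafka"],
    "Kafka": ["Minio", "SchemaRegistry", "RedpandaConsole"],
    "Minio": ["Spark"],
    "SchemaRegistry": ["Spark"],
    "Airflow": ["Spark"],
    "Spark": ["DeltaLake", "Datahub", "HiveMetastore"],
    "DeltaLake": ["Trino"],
    "Trino": ["Zeppelin", "Jupyter", "Metabase"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return the node list from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    # Trace how a row in Postgres ends up on a Metabase dashboard.
    print(" -> ".join(shortest_path(PIPELINE, "Postgres", "Metabase")))
```

The same helper returns `None` for pairs with no directed path, such as Trino back to Kafka, which is a quick way to check that the diagram has no unintended back edges.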

Kafka Strimzi, Debezium CDC AVRO, Confluent Schema Registry, Postgres/SQL Server
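
The CDC edges in the pipeline carry Debezium change events, whose envelope includes an `op` code (`c` create, `u` update, `d` delete, `r` snapshot read) plus `before` and `after` row images. The following is a minimal sketch, in Python rather than the Scala engine the diagram marks as TODO, of how such events could be replayed onto a keyed replica. The `apply_change` helper, the `id` key column, and the sample rows are all assumptions made for illustration, and the events are simplified to the three envelope fields.

```python
def apply_change(table, event, key="id"):
    """Apply one simplified Debezium-style change event to an in-memory replica.

    `table` is a dict keyed by primary key; `key` names the (assumed) key column.
    """
    op = event["op"]
    if op in ("c", "r", "u"):
        # Create, snapshot read, and update all upsert the new row image.
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        # Deletes carry the old row image in `before`.
        table.pop(event["before"][key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")
    return table


# Hypothetical sample events, not taken from the repository.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "name": "alice"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "bob"}},
    {"op": "u", "before": {"id": 1, "name": "alice"},
     "after": {"id": 1, "name": "alicia"}},
    {"op": "d", "before": {"id": 2, "name": "bob"}, "after": None},
]

replica = {}
for event in events:
    apply_change(replica, event)

print(replica)  # {1: {'id': 1, 'name': 'alicia'}}
```

A Delta Lake implementation would typically express the same upsert and delete logic as a merge keyed on the primary key; the dictionary here only stands in for the target table.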

Postgres

[Two images: Postgres]

Microsoft SQL Server CDC

Zeppelin/Jupyter

[Two images: Zeppelin/Jupyter]

Spark

[Image: Spark]

Metabase

[Image: Metabase]

Datahub

[Image: Datahub]

OpenMetadata

[Image: OpenMetadata]

Airflow

[Image: Airflow]

Minio

[Two images: Minio]

Argo CD

[Image: Argo CD]

Kubernetes

[Image: Kubernetes]

Web local

| Local URL | Description | User | Password |
|---|---|---|---|
| https://dashboard.worldl.xpt/ | K8S dashboard | | |
| https://argocd.worldl.xpt | ArgoCD | admin | Notebook |
| https://zeppelin.worldl.xpt | Zeppelin | | |
| https://jupyter.worldl.xpt/jupyter | Jupyter notebook: Python, Scala, RUST | | |
| https://jupyter-commander.worldl.xpt/jupyter | Jupyter notebook: Python, Scala, RUST - K8S Admin Service Account | | |
| https://minio-console.worldl.xpt | MINIO operator instance minio-tenant-1 | minio | awesomes3 |
| https://console.minio-operator.svc.cluster2.xpt:9090 | MINIO operator | | |
| https://airflow.worldl.xpt/flower/ | Airflow flower | admin | admin |
| https://airflow.worldl.xpt/airflow | Airflow | | |
| https://jupyter-glue2.worldl.xpt/ | AWS Glue version 2.0 - Jupyter | | |
| https://webui-glue2.worldl.xpt/ | AWS Glue version 2.0 - WebUI | | |
| https://history-glue2.worldl.xpt/ | AWS Glue version 2.0 - History | | |
| https://jupyter-glue3.worldl.xpt/ | AWS Glue version 3.0 - Jupyter | | |
| https://webui-glue3.worldl.xpt/ | AWS Glue version 3.0 - WebUI | | |
| https://history-glue3.worldl.xpt/ | AWS Glue version 3.0 - History | | |
| https://jupyter-glue4.worldl.xpt/ | AWS Glue version 4.0 - Jupyter | | |
| https://webui-glue4.worldl.xpt/ | AWS Glue version 4.0 - WebUI | | |
| https://history-glue4.worldl.xpt/ | AWS Glue version 4.0 - History | | |
| http://datahub.worldl.xpt/ | Datahub | datahub | manualPassword |
| https://openmetadata.worldl.xpt/ | OpenMetadata | admin | admin |
| https://kafkaui.worldl.xpt/ | Kafka UI | | |
| https://redpanda-console.worldl.xpt/ | Redpanda Console | | |
| https://metabase.worldl.xpt/ | Metabase | | |
| http://trino.trino.svc:8080 | Trino | | |
| https://jfrog.worldl.xpt | Jfrog | admin | password |
| https://harbor.worldl.xpt | Harbor | admin | notebook |
| https://nexus.worldl.xpt/ | Nexus Free trial | admin | admin123 |
| https://nexus.admin.worldl.xpt/ | Nexus Free trial | | |
| https://keycloack.worldl.xpt | Keycloak | user | notebook |
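
The README does not say how the `*.worldl.xpt` names resolve on the workstation. One possibility, assumed here purely for illustration, is a set of hosts-file entries pointing at the cluster ingress. The sketch below derives those entries from a list of the URLs above; `hosts_lines` is a hypothetical helper and the IP address is a placeholder (on a real setup it might come from `minikube ip`).

```python
from urllib.parse import urlparse


def hosts_lines(urls, ip):
    """Return one hosts-file line per distinct hostname found in `urls`."""
    hostnames = sorted({urlparse(url).hostname for url in urls})
    return [f"{ip}\t{name}" for name in hostnames]


# A few of the local URLs listed above.
urls = [
    "https://argocd.worldl.xpt",
    "https://airflow.worldl.xpt/flower/",
    "https://airflow.worldl.xpt/airflow",
]

# Placeholder address: substitute the ingress IP of your own cluster.
for line in hosts_lines(urls, "192.168.49.2"):
    print(line)
```

Duplicate hostnames (the two Airflow paths) collapse into a single entry because the helper collects hostnames into a set before formatting.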

Internet Web (Protected by Firewall)

| Public URL | Description |
|---|---|
| https://world-zeppelin.duckdns.org | Zeppelin |
| https://world-jupyter.duckdns.org/jupyter | Jupyter notebook: Python, Scala, RUST |
