
# data_infra_repo


As the data-infrastructure part of the "DaaS (Data as a Service)" repo, this project shows how to build DS/DE environments with Docker from scratch. It focuses on: 1) system design through practical use cases, 2) setting up Docker, package, and library environments, and 3) developing test, staging, and production deploy workflows (CI/CD style).

## File Structure

```
# main projects
├── airflow_in_docker_compose
├── celery_redis_flower_infra
├── deploy_dockerhub.sh
├── hadoop_yarn_spark
├── kafka-zookeeper
├── kafka_zookeeper_redis_infra
├── mysql-master-slave
```

## TODO

- Hadoop
  - hadoop_yarn_spark (batch)
  - hadoop_yarn_spark (stream)
  - Hadoop NameNode, DataNode
  - hadoop_yarn_flink
- Kafka
  - Kafka producer, consumer, ZooKeeper
  - Kafka mirror
  - Kafka-ELK-DB
- Airflow
  - Airflow app in docker-compose
- DB
  - DB sharding (partition)
  - DB replica
  - DB master-follower
  - DB master-master
  - DB binlog stream (via Kafka) to BigQuery/DW
  - DB binlog stream to ELK
- Microservice
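The DB sharding item above can be illustrated with a tiny hash-based shard router: hash the record key with a stable hash and take it modulo the number of shards. This is a generic sketch of the technique; the shard names are hypothetical and not taken from this repo.

```python
import hashlib


class ShardRouter:
    """Route a record key to one of N database shards via a stable hash.

    Generic illustration of hash (modulo) sharding; the shard names
    passed in below are made up for the example.
    """

    def __init__(self, shards):
        self.shards = list(shards)

    def shard_for(self, key: str) -> str:
        # md5 gives a hash that is stable across processes and restarts,
        # unlike Python's built-in hash(), which is salted per process.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]


router = ShardRouter(["mysql_shard_0", "mysql_shard_1", "mysql_shard_2"])
print(router.shard_for("user_42"))  # same key always lands on the same shard
```

A real deployment would add re-sharding (e.g. consistent hashing) so that adding a shard does not remap most keys, but modulo routing is the simplest starting point.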

## Test

## Ref