AVUBDI

GitHub repository for A Versatile Usable Big Data Infrastructure (AVUBDI) in Docker.

Development Environment

Dell XPS 7590
Intel Core i7-9750H (6 Cores)
64 GB DDR4-2666 SODIMM Memory
2TB NVMe PCIe M.2 SSD

Docker Host Environment

VMware Workstation 15 Player
CentOS 8 with Docker Engine and Docker Compose installed
50 GB Memory
4 Cores

Big Data Components

The big data components are split into three groups to make the architecture easier to follow.

Master Stack / Head Stack / Coordination Stack

This group consists of technologies responsible for data ingestion, distribution, validation, management and coordination.

Component	Description	Docker Image
Kafka	Distributed and scalable streaming platform that supports real-time and batch processing with high throughput.	confluentinc/cp-kafka:5.5.0
Kafka Connect	Kafka Connect is a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems.	confluentinc/cp-kafka-connect:5.5.0
Kafka Rest Proxy	The Kafka REST Proxy provides a RESTful interface to a Kafka cluster. Examples of use cases include reporting data to Kafka from any frontend app built in any language, ingesting messages into a stream processing framework that doesn’t yet support Kafka, and scripting administrative actions.	confluentinc/cp-kafka-rest:5.5.0
Schema Registry	Schema Registry provides a serving layer for the metadata. It provides a RESTful interface for storing and retrieving your Avro®, JSON Schema, and Protobuf schemas. Combined with Kafka, it lets us keep the whole infrastructure in a schema-consistent state.	confluentinc/cp-schema-registry:5.5.0
Zookeeper	ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications.	confluentinc/cp-zookeeper:5.5.0

Slave Stack / Worker Stack / Analytical Stack

This group consists of technologies responsible for complex data analytics and visualization on stream and batch data.

Component	Description	Docker Image
Spark-Master	Apache Spark is a unified analytics engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. Any Spark job can be deployed to the master.	bde2020/spark-master
Spark-Worker(x2)	Apache Spark is a unified analytics engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.	bde2020/spark-worker
InfluxDB	InfluxDB is an open source time series database for monitoring metrics and events and providing real-time visibility into stacks, sensors, and systems.	influxdb:1.8.0
Chronograf	Chronograf is a visualization tool for time series data in InfluxDB.	chronograf:1.8.4

Monitoring Stack / Management Stack

Component	Description	Docker Image
Kafka Connect UI	Kafka Connect UI is a web tool for Kafka Connect for setting up and managing connectors for multiple connect clusters.	landoop/kafka-connect-ui
Kafka Cluster UI	Kafdrop is a UI for monitoring Apache Kafka clusters. The tool displays information such as brokers, topics, partitions, and even lets you view messages.	obsidiandynamics/kafdrop
Schema Registry UI	The Schema Registry UI is a fully-featured tool for your underlying schema registry that allows visualization and exploration of registered schemas.	landoop/schema-registry-ui
Docker Container Management UI	Portainer is a lightweight management UI which allows easy management of the Docker host or Swarm cluster.	portainer/portainer
Grafana	Grafana is an open source analytics and monitoring solution that supports many databases (in our case InfluxDB).	grafana/grafana:7.0.6

Docker

What is Docker Engine

Docker Engine is an open source containerization technology for building and containerizing your applications. Docker Engine acts as a client-server application with a server running the long-running daemon process dockerd, APIs that specify the interfaces programs can use to talk to and instruct the Docker daemon, and the docker command line interface (CLI) client.


What is Docker Compose

Docker Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application's services.
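As a rough illustration of what such a YAML file looks like, the sketch below writes a hypothetical two-service Compose file to a temporary directory. It is not the repository's actual docker-compose.yaml: the service names, the listener address and the reduced set of environment variables are assumptions chosen to keep the example short. Only the image tags come from the component table.

```shell
# Write a minimal, hypothetical Compose file (NOT the repository's real one).
write_minimal_compose() {
  # $1: target path for the generated file
  cat > "$1" <<'EOF'
version: "3"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:5.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:5.5.0
    depends_on:
      - zookeeper
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
EOF
}

# Generate the file in a throwaway directory so nothing in the repo is touched.
compose_file="$(mktemp -d)/docker-compose.yaml"
write_minimal_compose "$compose_file"
echo "wrote $compose_file"
```

Once Docker Compose is installed, running docker-compose -f "$compose_file" config should parse the file and print the resolved configuration without starting any containers.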


Installation of Docker Engine

CentOS

Install the yum-utils package (which provides the yum-config-manager utility) and set up the stable repository.

sudo yum install -y yum-utils

sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

Install the latest version of Docker Engine and containerd.

sudo yum install docker-ce docker-ce-cli containerd.io

Start Docker

sudo systemctl start docker
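Before continuing, it can help to confirm that the tools installed so far are actually on the PATH. The following is a small sketch of such a preflight check; the helper name missing_commands is hypothetical and not part of this repository.

```shell
# Print the names of the given commands that cannot be found on PATH.
missing_commands() {
  missing=""
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
  done
  # Strip the leading space; the output is empty when nothing is missing.
  echo "${missing# }"
}

result="$(missing_commands docker systemctl)"
if [ -n "$result" ]; then
  echo "missing: $result"
else
  echo "docker and systemctl found"
fi
```

If both commands are found, sudo systemctl enable docker additionally starts the daemon on every boot, and sudo docker run hello-world is a common way to confirm that the daemon can pull and run an image.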

Install Docker Compose

sudo curl -L "https://github.com/docker/compose/releases/download/1.26.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

Make the Docker Compose binary executable

sudo chmod +x /usr/local/bin/docker-compose
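The download URL above hard-codes the release number. A minimal sketch of how the version could be parameterized is shown below; the function name compose_url and the COMPOSE_VERSION variable are assumptions introduced for illustration, and the URL pattern is the one used in the curl command above.

```shell
# Build the release download URL for a given Docker Compose version.
compose_url() {
  # $1: Docker Compose release number, e.g. 1.26.2
  echo "https://github.com/docker/compose/releases/download/$1/docker-compose-$(uname -s)-$(uname -m)"
}

# Default to the version used in this README unless the caller overrides it.
COMPOSE_VERSION="${COMPOSE_VERSION:-1.26.2}"
compose_url "$COMPOSE_VERSION"
```

The result can be passed to the same download command, for example sudo curl -L "$(compose_url "$COMPOSE_VERSION")" -o /usr/local/bin/docker-compose, and docker-compose --version then reports the installed release.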

Verify that Docker Engine and Docker Compose are installed correctly by running the cogniplant docker-compose.yml file.

sudo docker-compose up -d --build

The output should look like the following:

[mmayr@localhost Cogniplant]$ docker-compose up -d
Creating spark-master            ... done
Creating zookeeper-1             ... done
Creating influxdb                ... done
Creating portainer               ... done
Creating cogniplant_chronograf_1 ... done
Creating cogniplant_grafana_1    ... done
Creating kafka-1                 ... done
Creating spark-worker-2          ... done
Creating spark-worker-1          ... done
Creating kafka-schema-registry   ... done
Creating kafdrop                 ... done
Creating schema-registry-ui      ... done
Creating kafka-rest-proxy        ... done
Creating kafka-connect           ... done
Creating kafka-connect-ui        ... done

Dashboard UIs

Preliminary

Use the IP address of the virtualization host to connect to the different UIs. This IP address and the ports can be configured in the .env file.
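The sketch below illustrates how values from such a file translate into UI addresses. The variable names HOST_IP, GRAFANA_PORT and CHRONOGRAF_PORT and the IP address are assumptions made up for this example; the repository's .env file may use different keys. The ports 3000 and 8888 are the upstream defaults of Grafana and Chronograf.

```shell
# Hypothetical .env content; the key names and the IP are illustrative only.
env_file="$(mktemp)"
cat > "$env_file" <<'EOF'
HOST_IP=192.168.0.10
GRAFANA_PORT=3000
CHRONOGRAF_PORT=8888
EOF

# Load the KEY=VALUE pairs into the current shell.
. "$env_file"

# Build the address of a UI from the host IP and the given port.
ui_url() {
  # $1: port on which the UI is published on the virtualization host
  echo "http://${HOST_IP}:$1"
}

echo "Grafana:    $(ui_url "$GRAFANA_PORT")"
echo "Chronograf: $(ui_url "$CHRONOGRAF_PORT")"
```

Docker Compose reads a .env file from the project directory automatically and substitutes its values into the Compose file, so changing a value there and recreating the containers is normally enough to move a UI to another port.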

Kafka Connect UI

Schema Registry UI

Grafana

Chronograf

software-competence-center-hagenberg/AVUBDI
