This is a standalone cluster that includes the big data tools required by BNDF. The cluster is built and configured with Docker. Extending and scaling up to a multi-node cluster can easily be done with Docker Swarm or another container orchestration tool such as Kubernetes.
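As a rough sketch of how such a scale-out could look (the stack name and the use of the repository's compose file are assumptions, not a documented workflow):

$ sudo docker swarm init                               # turn the current host into a Swarm manager
$ sudo docker stack deploy -c docker-compose.yml bndf  # deploy the compose services as a Swarm stack

Additional worker nodes could then join the cluster with the token printed by docker swarm init.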
The tools configured in this cluster are summarized in the following table.
Tool | Dependencies | Dependency Versions | Version |
---|---|---|---|
Apache Hadoop | Java | 8, 11 | 2.7.7, 3.2.1 |
Apache Hive | Apache Hadoop, PostgreSQL | 2.7.7, 12 | 2.3.7 |
Apache Spark | Apache Hive, Apache Hadoop | 2.3.7, 3.2.1 | 3.0.0 |
Apache Zeppelin | Apache Spark | 3.0.0 | 0.9.0 |
MongoDB | - | - | 4.2.6 |
Netdata | - | - | latest |
Service | Port |
---|---|
Spark Master | 8080 |
Spark Job WebUi | 4042 |
HDFS Namenode | 9874 |
HDFS Datanode | 9864 |
Zeppelin | 8085 |
Zeppelin Jobs WebUi | 4040 |
MongoDB | 27017 |
Netdata | 19999 |
Hive | 10000 |
Each service WebUi is accessible at http://MACHINE_IP:PORT, where MACHINE_IP is either localhost or the remote server's IPv4 address.
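For example, assuming a local deployment, the Spark Master UI is reachable at http://localhost:8080, and a quick availability check can be done with curl:

$ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080   # should print 200 once the Spark Master UI is up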
Docker and Docker Compose should be installed in order to create the cluster. Docker is generally supported on all major operating systems.
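If Docker is not already installed, one option is the official convenience script (shown below as a sketch; a package-manager installation may be preferable depending on the operating system):

$ curl -fsSL https://get.docker.com | sh   # install the Docker engine via the official convenience script
$ sudo pip install docker-compose          # one way to install Docker Compose; distribution packages also exist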
$ git clone https://github.com/M0h3eN/bndfcluster.git
$ cd bndfcluster
Directories are configurable, and their paths can be changed by the user.
- The volumes directory includes the configs and data of the services; it should be placed on a disk with abundant capacity.
- The sample-data directory corresponds to the input data directory.
- The jars directory includes extra jar files that the user needs.
- The appJars directory includes the BNDF jar file.
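Relative to the repository root, the default layout therefore looks roughly as follows (a sketch; the actual contents depend on the services and data used):

bndfcluster/
├── volumes/      # service configs and data (place on a large disk)
├── sample-data/  # input data
├── jars/         # extra user jar files
└── appJars/      # BNDF jar file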
The cluster can be created with the create-hdfs-spark-cluster.sh
script. This script takes two parameters, VOLUMES_PATH and DATA_PATH respectively, which correspond to the volumes and sample-data directories.
$ sudo ./create-hdfs-spark-cluster.sh ./volumes ./sample-data
This will create the cluster with the default paths. The first run could take some time, since all required docker images are pulled from Docker Hub. The cluster status can be checked by running
$ sudo docker ps
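For a more compact view of the running containers, the standard --format flag of docker ps can be used (a sketch; the container names depend on the compose configuration):

$ sudo docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"   # show only names, status and exposed ports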
Sample data to run BNDF can be fetched with the get-data.sh
script. It takes the DATA_PATH parameter.
$ ./get-data.sh ./sample-data
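Once the script finishes, the downloaded files should be visible in the data directory (the file names depend on the dataset and are not listed here):

$ ls -lh ./sample-data   # verify the sample data was downloaded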
The BNDF jar file can be fetched into the appJars directory with the get-bndf-jar.sh script.
$ ./get-bndf-jar.sh ./appJars
The RecordingDataLoader module can be run with the run-recording-data-loader.sh
script. It takes five parameters in the following order:
- VOLUMES_PATH
- DATA_PATH
- SPARK_EXECUTOR_MEMORY
- SPARK_EXECUTOR_CORES
- SPARK_DRIVER_MEMORY
$ sudo ./run-recording-data-loader.sh ./volumes ./sample-data 35 18 10
This runs the RecordingDataLoader module with the default path configuration, 35 GB of Spark executor memory, 18 Spark executor cores, and 10 GB of Spark driver memory.
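While the job runs, its progress can be followed through the Spark Job WebUi on port 4042 (see the port table above) or by tailing the container logs. The container name below is an assumption; check sudo docker ps for the actual name:

$ sudo docker logs -f spark-master   # container name is an assumption; replace with the name shown by docker ps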
The Sorting module can be run with the run-sorting.sh
script. It takes six parameters in the following order:
- VOLUMES_PATH
- DATA_PATH
- SPARK_EXECUTOR_MEMORY
- SPARK_EXECUTOR_CORES
- SPARK_DRIVER_MEMORY
- Experiment/Session Name
The Experiment/Session list can be obtained after running run-recording-data-loader.sh
. It is accessible either in the Meta Data Database or by running the get-sessionOrExperiment-list.sh
script on the cluster.
$ sudo ./get-sessionOrExperiment-list.sh
Experiment_Kopo_2018-04-25_J9_8600
Experiment_Kopo_2018-04-25_J9_8900
For example, to sort the Experiment_Kopo_2018-04-25_J9_8600
experiment:
$ sudo ./run-sorting.sh ./volumes ./sample-data 35 18 10 Experiment_Kopo_2018-04-25_J9_8600
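After sorting finishes, the results referenced in the Meta Data Database can be inspected through MongoDB on port 27017. The exact database and collection names are not documented here, so the command below only lists the available databases (a sketch):

$ mongo --port 27017 --eval "db.adminCommand('listDatabases')"   # list databases; drill into the relevant collections from there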