Scribengin Quickstart

#Contents#

Overview
Build NeverwinterDP
Automation Prerequisites
Setup a cluster automatically in Docker
Setup a cluster automatically in Digital Ocean
Setup a cluster manually in AWS
Setup a cluster automatically somewhere else
Launching Scribengin manually in an already setup YARN cluster
Launching a Dataflow from a preconfigured test
Monitoring Scribengin
Using the Scribengin Shell

#General Steps To Setup#

Check out the NeverwinterDP code and build it
- Check out NeverwinterDP from https://github.com/Nventdata/NeverwinterDP
- Build Scribengin with gradle
Set up the Scribengin cluster using Docker, Digital Ocean, or any VM provider
- Install Java and the other requirements on the VMs
- Update /etc/hosts so the VMs know about each other
- Run Zookeeper, Hadoop, and YARN
- Optionally run Kafka, Elasticsearch...
Launch the VM Master (Scribengin's YARN Application Master)
Submit your dataflow

#Build NeverwinterDP#

Checkout NeverwinterDP

git clone https://github.com/Nventdata/NeverwinterDP
cd NeverwinterDP

To work with the latest code, switch to the dev/master branch

git checkout dev/master

Pull the latest code

git pull origin dev/master

Build and release the NeverwinterDP code

gradle clean build install release -x test

You will find the release, binaries, and shell scripts in NeverwinterDP/release/build/release/neverwinterdp

Optionally, set the NEVERWINTERDP_HOME environment variable so that the other cluster scripts can build and deploy Scribengin automatically

export NEVERWINTERDP_HOME=/your/path/to/NeverwinterDP
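
Before running the cluster scripts, it can help to confirm that the variable points at a built checkout. The following is a minimal sketch, not part of NeverwinterDP: the function name `check_neverwinterdp_home` is hypothetical, and the only path it relies on is the release directory named above.

```shell
# Hypothetical helper (not shipped with NeverwinterDP): succeeds only if the
# given directory contains the release tree produced by the gradle build.
check_neverwinterdp_home() {
  home_dir="${1:-${NEVERWINTERDP_HOME:-}}"
  if [ -z "$home_dir" ]; then
    echo "NEVERWINTERDP_HOME is not set" >&2
    return 1
  fi
  if [ ! -d "$home_dir/release/build/release/neverwinterdp" ]; then
    echo "No release found under $home_dir; run the gradle build first" >&2
    return 1
  fi
  echo "Release found in $home_dir/release/build/release/neverwinterdp"
}
```

Calling `check_neverwinterdp_home` with no argument falls back to the value of NEVERWINTERDP_HOME.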

#Automation Prerequisites#

Install Ansible
Install and configure Docker
Install Gradle
Install Java 7
Install Python 2.7
Make sure the user you are running as has write permissions for /etc/hosts
- Setup scripts will update your /etc/hosts file, but will not remove any entries that are already there
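
A quick way to see which prerequisites are still missing is to probe the PATH. This is a sketch only: `missing_commands` is a hypothetical helper, and the executable names in the usage comment (`ansible`, `docker`, `gradle`, `java`, `python`) are assumed to be what the tools listed above install.

```shell
# Print the name of every command that cannot be found on the PATH.
# Returns non-zero if at least one is missing.
missing_commands() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      rc=1
    fi
  done
  return $rc
}

# Assumed executable names for the prerequisites listed above:
#   missing_commands ansible docker gradle java python
# Check the /etc/hosts requirement for the current user:
#   [ -w /etc/hosts ] || echo "/etc/hosts is not writable"
```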

Setup your SSH config

   echo -e "Host *\n  StrictHostKeyChecking no" >> ~/.ssh/config
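
The command above appends the stanza every time it is run. If you expect to repeat the setup, a guarded variant such as the sketch below avoids duplicates. The helper name `ensure_ssh_stanza` is hypothetical, and the check is a plain text match rather than a real parse of the SSH config.

```shell
# Append the "Host *" stanza only if the file does not already contain
# "StrictHostKeyChecking no", so repeated runs do not duplicate it.
ensure_ssh_stanza() {
  config_file="${1:-$HOME/.ssh/config}"
  mkdir -p "$(dirname "$config_file")"
  touch "$config_file"
  if ! grep -q 'StrictHostKeyChecking no' "$config_file"; then
    printf 'Host *\n  StrictHostKeyChecking no\n' >> "$config_file"
  fi
}
```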

If you want to work with S3, set up your credentials file in this format

user@machine $ cat ~/.aws/credentials
[default]
aws_access_key_id=XXXXX
aws_secret_access_key=YYYYYY
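
To confirm the file has the shape shown above without printing the secrets, a check along these lines may be enough. It is a sketch: `has_aws_keys` is a hypothetical helper, and it only looks for the section header and the two key names, not for valid values.

```shell
# Succeeds if the file is readable and contains a [default] header plus
# both key names. It does not verify that the keys sit under [default].
has_aws_keys() {
  cred_file="${1:-$HOME/.aws/credentials}"
  [ -r "$cred_file" ] &&
    grep -q '^\[default\]' "$cred_file" &&
    grep -q '^aws_access_key_id *=' "$cred_file" &&
    grep -q '^aws_secret_access_key *=' "$cred_file"
}
```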

#Docker Setup#

This will require access to Nvent's private repos. Continue on to Launching Scribengin cluster manually if you do not have access.

The following steps will deploy all the necessary components to run Scribengin locally by using Docker.

Clone deployments and tools repo

 git clone https://<bitbucket_user>@bitbucket.org/nventdata/neverwinterdp-deployments.git

Set up for neverwinter tools

 #Run the setup script for tools (only necessary ONCE)
 sudo ./neverwinterdp-deployments/tools/cluster/setup.sh

Build docker image with scribengin in one step

 #Build images, launch containers, run ansible
 ./neverwinterdp-deployments/docker/scribengin/docker.sh  cluster --launch

 #If you decided not to set NEVERWINTERDP_HOME, then you can pass it in manually here
 ./neverwinterdp-deployments/docker/scribengin/docker.sh  cluster --launch --neverwinterdp-home=/your/path/to/NeverwinterDP

If you wish to DESTROY your cluster (clean images and containers)

 ./neverwinterdp-deployments/docker/scribengin/docker.sh cluster --clean-containers --clean-image

#Digital Ocean Setup#

This will require access to Nvent's private repos. Continue on to Launching Scribengin cluster manually if you do not have access.

The following steps will deploy all the necessary components to run Scribengin in the cloud via Digital Ocean. You'll also need a Digital Ocean account and a Digital Ocean token (see "Set up your Digital Ocean token" below).

Clone deployments and tools repo

 git clone https://<bitbucket_user>@bitbucket.org/nventdata/neverwinterdp-deployments.git

Set up for neverwinter tools

 #Run the setup script for tools (only necessary ONCE)
 sudo ./neverwinterdp-deployments/tools/cluster/setup.sh

Set up your Digital Ocean token

 #To get a token visit - 
 #  https://cloud.digitalocean.com/settings/applications#access-tokens
 echo "TOKENGOESHERE" > ~/.digitaloceantoken

Run the Digital Ocean automation

 cd ./neverwinterdp-deployments/tools/

 ./cluster/clusterCommander.py \
   digitalocean \
   --launch --neverwinterdp-home $NEVERWINTERDP_HOME \
   --ansible-inventory \
   --create-containers $ROOT/ansible/profile/stability.yml \
   --subdomain $SUBDOMAIN --region nyc3
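
The command above expands three shell variables. `$ROOT` and `$SUBDOMAIN` are not defined elsewhere in this guide, so their intended values are not stated here. A guard like the following sketch fails early when any of them is empty; `require_vars` is a hypothetical helper, not part of the deployment tools.

```shell
# Report every named variable that is unset or empty.
# Returns non-zero if any of them is.
require_vars() {
  rc=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX sh compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Variable $name is not set" >&2
      rc=1
    fi
  done
  return $rc
}

# Example guard before calling clusterCommander.py:
#   require_vars NEVERWINTERDP_HOME ROOT SUBDOMAIN || exit 1
```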

Install Scribengin and necessary cluster services

 ./serviceCommander/serviceCommander.py \
   --cluster --install --configure --profile-type stability

#AWS Setup#

1.Log in to the Amazon console

2.Select region

3.Click on the 'Launch Instance' button on the screen

4.Click on 'AWS Marketplace' to select the software from AWS Marketplace

5.Select 'CentOS 7 (x86_64) with Updates HVM'

6.Select Instance Type

7.Configure Instance.

Make sure that you have entered the number of instances you want to launch in the 'Number of instances' field.
Set the 'Shutdown behavior' field to 'Terminate' so that the instance is terminated when an OS-level shutdown is performed.

8.Add Storage

By default you have 'Root'. Leave that as it is.
Add a new storage volume of type 'EBS' by clicking 'Add New Volume' below the table.
Set the size of the volume in the 'Size (GiB)' field.
Set the 'Volume Type' to 'Provisioned IOPS SSD (IO1)'. This lets you choose the IOPS value.
Set the IOPS you want in the 'IOPS' field.
Make sure that 'Delete on Termination' is checked (recommended for testing, not for production). EBS volumes persist independently of the running life of EC2 instances, so with this option checked the EBS volume associated with the EC2 instance is deleted when the instance is terminated.

9.Tag Instance (Optional). You can skip this for now.

10.Configure Security Group

For now, select an existing security group that was already created for testing purposes. Select two security groups, as seen in the above picture.

11.Click 'Review and Launch'. On the next screen, review all the configuration you set, then click the 'Launch' button to launch the instances.

12.When launching instances, you will be prompted to select a 'Key Pair', which is used to communicate with the EC2 instances securely. You can create a new key pair or select an existing one. If you select an existing key pair, make sure you have its key file with the extension .pem. Then click 'Launch Instances'.

13.You can see all the instances in the Instance table.

You can name the instances as you want for identification purpose.

14.To get the public and private IPs, select an instance from the instance list; both IPs are shown on the 'Description' tab below the table.

15.Edit /etc/hosts on your local machine to add the instances' public IPs.

Example:

##SCRIBENGIN CLUSTER START##
52.1.1.1 hadoop-master
52.1.1.2  hadoop-worker-1
52.1.1.3  hadoop-worker-3
52.1.1.4  monitoring-1
52.1.1.5  kafka-4
52.1.1.6  kafka-5
52.1.1.7  hadoop-worker-2
52.1.1.8  zookeeper-1
52.1.1.9  elasticsearch-1
52.1.1.10  kafka-3
52.1.1.11  kafka-1
52.1.1.12  kafka-2
##SCRIBENGIN CLUSTER END##

You can now communicate with the EC2 instances by hostname.
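
If the instances are relaunched and their public IPs change, the marked block has to be replaced rather than appended again. The following sketch does that for any hosts-style file. The function name `replace_cluster_block` is hypothetical; it relies only on the two marker lines shown in the example and assumes the file ends with a newline.

```shell
# Replace the marked Scribengin block in a hosts file with the lines read
# from standard input. Everything outside the markers is kept as is.
replace_cluster_block() {
  hosts_file="$1"
  tmp_file="$(mktemp)"
  # Drop any previous block, markers included.
  sed '/^##SCRIBENGIN CLUSTER START##$/,/^##SCRIBENGIN CLUSTER END##$/d' \
    "$hosts_file" > "$tmp_file"
  {
    echo '##SCRIBENGIN CLUSTER START##'
    cat
    echo '##SCRIBENGIN CLUSTER END##'
  } >> "$tmp_file"
  # Write back in place so the file keeps its owner and permissions.
  cat "$tmp_file" > "$hosts_file"
  rm -f "$tmp_file"
}

# Example (needs write access to /etc/hosts):
#   printf '52.1.1.1 hadoop-master\n' | replace_cluster_block /etc/hosts
```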

16.SSH into an instance. Initially you can log in to the instance as the user 'centos'. To log in over SSH you need the key pair .pem file.

ssh -i /path/to/test.pem centos@monitoring-1

17.You are now logged in to the EC2 instance. Initially, the EBS volume is not mounted on the instance. To make it available, format the volume with an appropriate file system and then mount it. Run the commands below on an EC2 CentOS instance.

sudo mkfs -t xfs /dev/xvdb
sudo mount /dev/xvdb /opt

If the above does not work, or to learn more about making EBS volumes available, read "Making an Amazon EBS Volume Available for Use" in the Amazon user guide.

18.Add the neverwinterdp user. To add a user and edit /etc/sudoers you need to be logged in as root. To do so, after logging in as the 'centos' user, type sudo -i and press Enter. Then run the commands below

useradd -m -d /home/neverwinterdp -s /bin/bash -c "neverwinterdp user" -p $(openssl passwd -1 neverwinterdp)  neverwinterdp
echo "neverwinterdp ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
echo "root ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers
chown -R neverwinterdp:neverwinterdp /opt
cp -R /root/.ssh/ /home/neverwinterdp/
chown -R neverwinterdp:neverwinterdp /home/neverwinterdp/.ssh
echo "REPLACE HERE WITH YOUR PUBLIC KEY" >> /home/neverwinterdp/.ssh/authorized_keys

19.Update the /etc/hosts file on all instances with the private and public IPs of all instances. See Machine Naming Conventions below.

Example

##SCRIBENGIN CLUSTER START##
172.0.0.13  hadoop-master hadoop-master.stability.kafka hadoop-master.stability.kafka.private
52.0.0.226  hadoop-master.stability.kafka.public
172.0.0.12 hadoop-worker-1 hadoop-worker-1.stability.kafka hadoop-worker-1.stability.kafka.private
52.0.0.46  hadoop-worker-1.stability.kafka.public
172.0.0.14 hadoop-worker-3 hadoop-worker-3.stability.kafka hadoop-worker-3.stability.kafka.private
52.0.0.233  hadoop-worker-3.stability.kafka.public
172.0.0.227 monitoring-1 monitoring-1.stability.kafka monitoring-1.stability.kafka.private
52.0.0.151  monitoring-1.stability.kafka.public
172.0.0.227 kafka-4 kafka-4.stability.kafka kafka-4.stability.kafka.private
52.0.0.59  kafka-4.stability.kafka.public
172.0.0.228 kafka-5 kafka-5.stability.kafka kafka-5.stability.kafka.private
52.0.0.37  kafka-5.stability.kafka.public
172.0.0.15 hadoop-worker-2 hadoop-worker-2.stability.kafka hadoop-worker-2.stability.kafka.private
52.0.0.84  hadoop-worker-2.stability.kafka.public
172.0.0.102 zookeeper-1 zookeeper-1.stability.kafka zookeeper-1.stability.kafka.private
52.0.0.247  zookeeper-1.stability.kafka.public
172.0.0.100 elasticsearch-1 elasticsearch-1.stability.kafka elasticsearch-1.stability.kafka.private
52.0.0.238  elasticsearch-1.stability.kafka.public
172.0.0.224 kafka-3 kafka-3.stability.kafka kafka-3.stability.kafka.private
52.0.0.71  kafka-3.stability.kafka.public
172.0.0.226 kafka-1 kafka-1.stability.kafka kafka-1.stability.kafka.private
52.0.0.150  kafka-1.stability.kafka.public
172.0.0.225 kafka-2 kafka-2.stability.kafka kafka-2.stability.kafka.private
52.0.0.110  kafka-2.stability.kafka.public
##SCRIBENGIN CLUSTER END##
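
Each machine in the example contributes two lines that follow a fixed pattern. A small generator like the sketch below can produce them from a name and its two addresses. `hosts_entry` is a hypothetical helper, and the default suffix `stability.kafka` is simply the one used in the example, not a required value.

```shell
# Print the two /etc/hosts lines for one machine, following the pattern of
# the example above: the short name and private aliases on the private IP,
# and the ".public" alias on the public IP.
hosts_entry() {
  name="$1"; private_ip="$2"; public_ip="$3"; domain="${4:-stability.kafka}"
  printf '%s  %s %s.%s %s.%s.private\n' \
    "$private_ip" "$name" "$name" "$domain" "$name" "$domain"
  printf '%s  %s.%s.public\n' "$public_ip" "$name" "$domain"
}

# Example:
#   hosts_entry hadoop-master 172.0.0.13 52.0.0.226
```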

20.Repeat steps 16, 17, 18 and 19 on all instances.

21.Install Scribengin and necessary cluster services

./serviceCommander/serviceCommander.py --cluster --install --configure --profile-type stability

The AWS instances are now ready to run Scribengin.

#Launching Demo (odyssey-scribengin) Cluster Setup in AWS#

1.Follow the AWS instance launching steps as before.

2.It is important to add tags while launching AWS instances. Three mandatory key-value tags must be added when setting up AWS instances. They are:

Name
- Follow the Machine Naming Conventions for the Scribengin cluster; there are no naming conventions for the Odyssey cluster
Groups
- Only the groups mentioned below are allowed.
- You can add multiple groups as a comma-separated value
- Available groups for scribengin are
  - hadoop-worker
  - hadoop-master
  - elasticsearch
  - monitoring
  - zookeeper
  - kafka
- Available groups for odyssey are
  - kafka-zookeeper
  - storm-zookeeper
  - storm-nimbus
  - storm-supervisor
  - odyssey-monitoring
  - kafka-broker
  - load-balancer
Identifier
- Only two identifier values are available:
  - odyssey
  - neverwinterdp
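
Because only the listed group names are accepted, it can be worth validating a Groups value before tagging an instance. The sketch below checks a comma-separated value against the Scribengin list above. `valid_scribengin_groups` and `SCRIBENGIN_GROUPS` are hypothetical names, and the check is strict: spaces around the commas are rejected.

```shell
# Group names allowed for Scribengin machines, copied from the list above.
SCRIBENGIN_GROUPS="hadoop-worker hadoop-master elasticsearch monitoring zookeeper kafka"

# Succeeds if every entry of a comma-separated Groups value is allowed.
valid_scribengin_groups() {
  old_ifs="$IFS"; IFS=','
  set -- $1            # split the value on commas into $1, $2, ...
  IFS="$old_ifs"
  [ "$#" -gt 0 ] || return 1
  for group in "$@"; do
    case " $SCRIBENGIN_GROUPS " in
      *" $group "*) ;;
      *) echo "Unknown group: $group" >&2; return 1 ;;
    esac
  done
}

# Example:
#   valid_scribengin_groups "kafka,zookeeper" && echo "ok"
```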

3.Deploy neverwinterdp-deployments on the Scribengin monitoring instance.

4.Run the command below with the required arguments. This updates the host machine with the keys and credentials required for passwordless access to the other instances in the cluster.

./tests/demo_deployments_script/update_ssh_config.sh --local-aws-pem=/path/to/aws.pem

5.Log in to the host machine (usually the monitoring machine) as the neverwinterdp user and cd to the neverwinterdp-deployments path. Make sure that neverwinterdp-deployments is up to date with the master branch; if not, pull the latest from master.

cd ./neverwinterdp-deployments

6.Run the command below from neverwinterdp-deployments to create the Ansible inventory file

./tools/awsHelper/awsHelper.py ansibleinventory --identifier odyssey,neverwinterdp

7.Update hosts file on all instances

./tools/awsHelper/awsHelper.py updateremotehostfile -i odyssey,neverwinterdp -k /home/neverwinterdp/test.pem

8.Run the commands below to install and start the necessary services in the cluster.

./tools/serviceCommander/serviceCommander.py -e "kafka,zookeeper" --configure --start --clean
./tools/serviceCommander/serviceCommander.py -e "gripper" --install --configure --start
./tools/serviceCommander/serviceCommander.py -e "load_balancer" --install --configure --start 
./tools/serviceCommander/serviceCommander.py -e "odyssey_elasticsearch" --install --configure --start

./tools/serviceCommander/serviceCommander.py -e "storm_zookeeper" --install --configure --start
./tools/serviceCommander/serviceCommander.py -e "storm_nimbus" --install --configure --start
./tools/serviceCommander/serviceCommander.py -e "storm_supervisor" --install --configure --start
./tools/serviceCommander/serviceCommander.py -e "storm_code" --install --configure --start
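
Most of the commands above differ only in the service name, so they can be driven from a loop that stops at the first failure. This is a sketch under assumptions: `start_services` and the `SERVICE_COMMANDER` variable are hypothetical, the flags are exactly those shown above, and it is assumed that serviceCommander.py exits non-zero when a step fails. The first command (kafka,zookeeper) uses different flags and is not covered.

```shell
# Path to the tool, overridable so the loop can be tried with a stub.
SERVICE_COMMANDER="${SERVICE_COMMANDER:-./tools/serviceCommander/serviceCommander.py}"

# Run install/configure/start for each named service, stopping at the
# first failure instead of continuing with a half-built cluster.
start_services() {
  for service in "$@"; do
    $SERVICE_COMMANDER -e "$service" --install --configure --start || return 1
  done
}

# Example:
#   start_services storm_zookeeper storm_nimbus storm_supervisor storm_code
```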

9.Optional step: to skip steps 5 to 8, run the command below instead

./tests/demo_deployments_script/deploy_odyssey_scribengin_demo.sh

Note: The above steps will launch a cluster with Scribengin's Kafka and Zookeeper, plus the other Odyssey services that communicate with Scribengin's Kafka and Zookeeper.

#Arbitrary Cluster Setup#

Follow the steps in this guide for information on how to use Nvent's private automation to launch in any arbitrary cluster. These steps require access to Nvent's private repos.

If you do not have access to these private repos, please continue on to Manually Launching.

#Manually Launching#

These steps are necessary if you do not have access to Nvent's private automation repos.

###Prerequisites

Set up at least one Hadoop node
Set up YARN on Hadoop
Set up at least one Zookeeper node
Set up at least one ElasticSearch node
Set up any machines for sources/sinks (e.g. Kafka)
Create a user on all the machines (These tweaks make things run smoothly without interruption)

username: neverwinterdp
ssh keys set up in ~/.ssh
has passwordless sudo

Set up the /etc/hosts file on all nodes. See Machine Naming Conventions below.

 ##SCRIBENGIN CLUSTER START##
 10.0.0.1 elasticsearch-1 
 10.0.0.2 hadoop-master
 10.0.0.3 hadoop-worker-1
 10.0.0.4 hadoop-worker-2
 10.0.0.5 hadoop-worker-3
 10.0.0.6 zookeeper-1 
 ##SCRIBENGIN CLUSTER END##
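
Once the file is in place on a node, you can check that every expected name has an entry. The sketch below scans a hosts-style file for whole-word matches. `missing_hosts` is a hypothetical helper, and the names in the usage comment are the ones from the example block.

```shell
# Print each expected hostname that has no entry in the given hosts file.
# Returns non-zero if at least one name is missing.
missing_hosts() {
  hosts_file="$1"; shift
  rc=0
  for host in "$@"; do
    # Match the name as a whole field after the address column,
    # skipping comment and marker lines that start with "#".
    if ! awk -v h="$host" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit !found }' "$hosts_file"; then
      echo "$host"
      rc=1
    fi
  done
  return $rc
}

# Example:
#   missing_hosts /etc/hosts elasticsearch-1 hadoop-master zookeeper-1
```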

###Machine Naming conventions

We strongly suggest aptly naming the nodes in your cluster.

hadoop-master
- Only one node of this is required
- Runs Hadoop and YARN master processes
- Needs to run
  - SecondaryNameNode
  - ResourceManager
  - NameNode
- Since there is only ONE master, no sequential naming
hadoop-worker-*
- Runs Hadoop and YARN slave processes
- Named sequentially, e.g.
  - hadoop-worker-1
  - hadoop-worker-2
  - etc...
- Needs to run
  - DataNode
  - NodeManager
elasticsearch-*
- Runs Elasticsearch
- Named sequentially
- Handles receiving logs and metric information
zookeeper-*
- Runs zookeeper quorum
- Named sequentially
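
Scripts that act on the cluster can derive a node's role from its name when the conventions above are followed. The following sketch shows one way to do that. `node_role` is a hypothetical helper and covers only the four name patterns listed in this section.

```shell
# Map a node name to its role by stripping the trailing "-<number>".
# hadoop-master has no sequence number and is returned unchanged.
node_role() {
  case "$1" in
    hadoop-master) echo "hadoop-master" ;;
    hadoop-worker-[0-9]*|elasticsearch-[0-9]*|zookeeper-[0-9]*)
      echo "${1%-*}" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# Example:
#   node_role hadoop-worker-2
```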

###Launching Scribengin

Make sure you have the JAVA_HOME environment variable correctly set
Build NeverwinterDP

Make sure the Scribengin shell script is set correctly

#After building, if you didn't edit your /etc/hosts file, you'll need to edit the file:
#NeverwinterDP/release/build/release/neverwinterdp/scribengin/bin/shell.sh
#   -Dshell.zk-connect    - [hostname]:[port] of your Zookeeper server
#   -Dshell.hadoop-master - [hostname] of your master Hadoop node

APP_OPT="$APP_OPT -Dshell.zk-connect=zookeeper-1:2181 -Dshell.hadoop-master=hadoop-master"
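
If your Zookeeper or Hadoop master uses different host names, the two options can be assembled from arguments instead of being hard-coded. This is a sketch only: `shell_connect_opts` is a hypothetical helper, its defaults are the host names used in this guide, and the host names in the usage comment are placeholders.

```shell
# Print the two -D options for shell.sh, defaulting to the host names
# used throughout this guide.
shell_connect_opts() {
  zk_connect="${1:-zookeeper-1:2181}"
  hadoop_master="${2:-hadoop-master}"
  printf '%s\n' "-Dshell.zk-connect=$zk_connect -Dshell.hadoop-master=$hadoop_master"
}

# Example with placeholder host names:
#   APP_OPT="$APP_OPT $(shell_connect_opts zk.example.internal:2181 nn.example.internal)"
```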

Launch the VM Master in YARN

#Change to the Scribengin bin directory of the release
cd NeverwinterDP/release/build/release/neverwinterdp/scribengin/bin/
  
#To run the vm-master on top of hadoop yarn
./shell.sh vm start


#To check the scribengin status
./shell.sh vm info
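
The VM Master may take a moment to come up after `vm start`, so a status check can be wrapped in a bounded retry. The sketch below is generic: `retry` is a hypothetical helper, and the usage comment assumes that `./shell.sh vm info` exits non-zero while the VM Master is unreachable, which this guide does not confirm.

```shell
# Run a command until it succeeds, giving up after a maximum number of
# attempts and sleeping a fixed number of seconds between tries.
retry() {
  attempts="$1"; delay="$2"; shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Assumes "vm info" fails until the VM Master is reachable:
#   retry 12 5 ./shell.sh vm info
```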

Use the Scribengin API to upload your Dataflow.

#Launching a dataflow from a preconfigured test#

This test can be launched manually from the public NeverwinterDP repo

./NeverwinterDP/release/build/release/neverwinterdp/dataflow/tracking-sample/bin/run-tracking.sh

These tests are in Nvent's private automation repo

#The kafka test is a simple, quick test 
./neverwinterdp-deployments/tests/scribengin/tracking/integration/kafka-run-test.sh
    
#The kafka stability test is a more complicated, longer running test
./neverwinterdp-deployments/tests/scribengin/tracking/stability/stability-kafka-test.sh

#Monitoring Scribengin#

###Navigate to Kibana to view real time metrics###

Point your browser to http://monitoring-1:5601
You can change the interval at which Kibana refreshes itself in the top panel, or manually refresh the page

###SSH onto a cluster node###

#neverwinterdp user has sudo permissions
ssh neverwinterdp@[node-name]

###Getting status of a running dataflow

#After you launch a dataflow on the command line, it will give you a command
# you can run to monitor the dataflow.  It will look similar to this.
./scribengin/bin/shell.sh plugin com.neverwinterdp.scribengin.dataflow.tool.tracking.TrackingMonitor \
   --dataflow-id [DATAFLOW NAME] \
   --report-path /applications/tracking-sample/reports \
   --max-runtime 0 \
   --print-period 15000 \
   --show-history-workers

###Using the Scribengin Shell

Refer to our Scribengin Shell Commands Guide for how to interface with the Scribengin shell. The Scribengin shell allows you to issue commands to Scribengin and get info on running dataflows and VMs.
