
Setting up Druid for CI on EC2

Cam Saul edited this page Feb 7, 2018 · 35 revisions

Unfortunately, loading test data onto an external server for CI purposes is a little involved. Follow the steps in each section below to get Druid up and running.

Launch an EC2 instance

The instance should be at least a large (8 GB of RAM and 2 vCPUs), since those are Druid's minimum requirements. Configure the instance to expose ports 8082 (Druid broker), 8090 (Druid overlord console), and 22 (SSH). Amazon Linux is recommended; other options can be left at their defaults. After the instance launches, SSH into it and run sudo yum update to get the latest security patches.

Make a new unprivileged user

Create a new unprivileged user to run all the Druid processes, then switch to that account.

sudo useradd druid
sudo passwd druid # prompts for the password; enter <some-complicated-random-password>
su druid
cd ~

Upgrade to Java 8 (If Needed)

Check your Java version with java -version. If it's not 8, install Java 8 as follows (run these as ec2-user, since the unprivileged druid user cannot use sudo):

sudo yum install java-1.8.0-openjdk.x86_64 # or higher; search with yum search java
sudo yum remove java-1.7.0-openjdk

After that, check java -version again to make sure Java 8 is now installed.
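
The version check above could be scripted along these lines. This is only a sketch: java_major_version and installed_java_version are hypothetical helper names, and it assumes the version string is the quoted text on the first line of the java -version output.

```shell
# Hypothetical helper: extract the major Java version from a version string
# such as "1.8.0_151" (legacy 1.x scheme) or "11.0.2" (newer scheme).
java_major_version() {
  jv="$1"
  case "$jv" in
    1.*) jv="${jv#1.}"; echo "${jv%%.*}" ;;   # 1.8.0_151 -> 8
    *)   echo "${jv%%[.+-]*}" ;;              # 11.0.2    -> 11
  esac
}

# Hypothetical helper: `java -version` prints to stderr, and the version is
# assumed to be the quoted string on the first line.
installed_java_version() {
  java -version 2>&1 | head -n 1 | sed -E 's/.*"([^"]+)".*/\1/'
}

# Example:
# [ "$(java_major_version "$(installed_java_version)")" -ge 8 ] || echo "Java 8 or higher is required"
```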

Set Up Zookeeper

  1. Find the relevant link for the archive for the latest stable edition of Zookeeper at http://zookeeper.apache.org/releases.html#download.
  2. Download & extract the archive:
    wget <archive-link>
    tar -zxvf zookeeper-<version>.tar.gz
    mv zookeeper-<version> zookeeper
    rm zookeeper-<version>.tar.gz
  3. Create zookeeper/conf/zoo.cfg with the following contents (nano works fine for this); see the Zookeeper Getting Started Guide for more details:
    tickTime=2000
    dataDir=/home/druid/zookeeper/data
    clientPort=2181
  4. Start Zookeeper:
    zookeeper/bin/zkServer.sh start 
    zookeeper/bin/zkServer.sh status # to make sure it started correctly. Should say something like "Mode: standalone"
    If Zookeeper fails to start for some reason, debug it by starting with:
    zookeeper/bin/zkServer.sh start-foreground
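
If you would rather not edit the file by hand, step 3 could be scripted roughly as follows. write_zoo_cfg is a hypothetical helper name, the default install path is assumed from the dataDir setting above, and creating the data directory up front is an extra precaution rather than something the steps above require.

```shell
# Hypothetical helper: write the minimal standalone zoo.cfg from step 3.
# $1 is the Zookeeper install directory (assumed to be /home/druid/zookeeper).
write_zoo_cfg() {
  zk_home="${1:-/home/druid/zookeeper}"
  # Make sure the config and data directories exist before writing.
  mkdir -p "$zk_home/conf" "$zk_home/data"
  cat > "$zk_home/conf/zoo.cfg" <<EOF
tickTime=2000
dataDir=$zk_home/data
clientPort=2181
EOF
}

# Example:
# write_zoo_cfg /home/druid/zookeeper
```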

Set Up Druid

You can also refer to the Druid Quickstart Guide for more details.

  1. Find the link to download the latest stable version at http://druid.io/downloads.html and copy the link

  2. Download & Extract Druid

    wget <archive-link>
    tar -zxvf druid-<version>-bin.tar.gz 
    mv druid-<version> druid
    rm druid-<version>-bin.tar.gz 
  3. Create a temp directory for Druid

    mkdir -p druid/var/tmp
  4. Edit the historical node config file so it will run with enough memory.

    Edit druid/conf-quickstart/druid/historical/jvm.config and bump the -Xmx setting to 3g or so

  5. Enable JavaScript

    Edit druid/conf-quickstart/druid/_common/common.runtime.properties and add a druid.javascript.enabled=true line at the end of the file. (This will no longer be needed once #6864 is merged.)

  6. Launch Druid Processes

    cd druid
    java `cat conf-quickstart/druid/historical/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/historical:lib/*" io.druid.cli.Main server historical > /dev/null &
    java `cat conf-quickstart/druid/broker/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/broker:lib/*" io.druid.cli.Main server broker > /dev/null &
    java `cat conf-quickstart/druid/coordinator/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/coordinator:lib/*" io.druid.cli.Main server coordinator > /dev/null &
    java `cat conf-quickstart/druid/overlord/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/overlord:lib/*" io.druid.cli.Main server overlord > /dev/null &
    java `cat conf-quickstart/druid/middleManager/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/middleManager:lib/*" io.druid.cli.Main server middleManager > /dev/null &

    You can check that the processes are running with

    ps aux | grep java
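
The five launch commands in step 6 differ only in the node name, so they could be folded into a loop. The sketch below assumes it is run from the druid directory; druid_classpath and start_druid_node are hypothetical helper names. It uses an unquoted $(cat ...) in place of the cat | xargs idiom, which should behave the same as long as jvm.config contains only plain whitespace-separated flags.

```shell
# Hypothetical helper: build the classpath used in step 6 for a node type.
druid_classpath() {
  echo "conf-quickstart/druid/_common:conf-quickstart/druid/$1:lib/*"
}

# Hypothetical helper: start one Druid node in the background, discarding
# its output as the commands above do. Run from the druid directory.
start_druid_node() {
  # $(cat ...) is deliberately unquoted so each JVM flag becomes its own argument.
  java $(cat "conf-quickstart/druid/$1/jvm.config") \
    -cp "$(druid_classpath "$1")" \
    io.druid.cli.Main server "$1" > /dev/null &
}

# Example: start all five nodes.
# for node in historical broker coordinator overlord middleManager; do
#   start_druid_node "$node"
# done
```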

Load the Data

  1. Load the metabase.test.data.druid namespace in the REPL to generate a flattened test data file. You can optionally pick a <filename>:

    (generate-json-for-batch-ingestion <filename>)
  2. Upload the data to the EC2 instance with scp

    scp -i druid-creds.pem checkins.json ec2-user@ec2-123-1-2-3.compute-1.amazonaws.com:checkins.json

    This will copy the file to the ec2-user home directory (/home/ec2-user). You'll need to copy it to the druid user's home directory (/home/druid); the easiest way to do this is to SSH in as ec2-user in a separate terminal and copy it with sudo.

  3. Launch Druid Indexing Task

    (run-indexing-task <remote-host> 
      :base-dir <dir-where-you-uploaded-file>
      :filename <file>)

    e.g.

    (run-indexing-task "http://ec2-52-90-109-199.compute-1.amazonaws.com"
      :base-dir "/home/druid"
      :filename "checkins.json")

    The task will keep you apprised of its progress until it completes (this takes 1-2 minutes). You can also keep an eye on things at <host>:8090/console.html

  4. Keep an eye on <host>:8082/druid/v2/datasources. (e.g. http://ec2-52-90-109-199.compute-1.amazonaws.com:8082/druid/v2/datasources) This endpoint will return an empty array until the broker knows about the newly ingested segments. When it returns an array with the string "checkins" you're ready to run the tests.

  5. Kill the overlord and middleManager processes once the data has finished loading, e.g.

    ps aux | grep overlord # find the overlord PID...
    kill <pid>              # then kill the process; repeat for middleManager

    Afterwards, check that the other 3 nodes (historical, broker, and coordinator) are still running 😻
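
Instead of refreshing the datasources endpoint from step 4 by hand, you could poll it from a terminal. The following is a rough sketch: has_datasource and wait_for_datasource are hypothetical helper names, the 10-second interval is arbitrary, and the check is a plain string match on the response rather than real JSON parsing.

```shell
# Hypothetical helper: succeeds if the JSON array on stdin contains the
# given datasource name as a quoted string. Not a real JSON parser.
has_datasource() {
  grep -q "\"$1\""
}

# Hypothetical helper: poll the broker until the datasource shows up.
# $1 is the broker base URL (e.g. http://<host>:8082), $2 the datasource name.
wait_for_datasource() {
  broker_url="$1"
  ds_name="$2"
  until curl -s "$broker_url/druid/v2/datasources" | has_datasource "$ds_name"; do
    sleep 10
  done
  echo "$ds_name is loaded"
}

# Example:
# wait_for_datasource "http://<host>:8082" checkins
```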

Running Tests

You can run the tests like this:

ENGINES=h2,druid \
  MB_DRUID_PORT=8082 \
  MB_DRUID_HOST=http://ec2-52-90-109-199.compute-1.amazonaws.com \
  lein test

Restarting Druid

If Amazon or somebody else restarts the instance, you'll need to bring Zookeeper and the Druid processes back up. Here's how to do it:

# SSH in
ssh -i /path/to/pemfile.pem ec2-user@ec2-52-90-109-199.compute-1.amazonaws.com

# Switch to the druid user, whose home directory holds Zookeeper and Druid
su druid
cd ~

# Restart Zookeeper
zookeeper/bin/zkServer.sh start

# Restart Druid processes
cd druid
java `cat conf-quickstart/druid/historical/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/historical:lib/*" io.druid.cli.Main server historical > /dev/null &
java `cat conf-quickstart/druid/broker/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/broker:lib/*" io.druid.cli.Main server broker > /dev/null &
java `cat conf-quickstart/druid/coordinator/jvm.config | xargs` -cp "conf-quickstart/druid/_common:conf-quickstart/druid/coordinator:lib/*" io.druid.cli.Main server coordinator > /dev/null &
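
After a restart it can help to confirm that all three long-running nodes came back. The sketch below reads ps output and reports any node type without a matching process; missing_druid_nodes is a hypothetical helper name, and it assumes each process command line contains "io.druid.cli.Main server <node>" as in the commands above.

```shell
# Hypothetical helper: read `ps` output on stdin and print each expected
# node type that has no "io.druid.cli.Main server <node>" process.
missing_druid_nodes() {
  ps_output="$(cat)"
  for node in historical broker coordinator; do
    if ! printf '%s\n' "$ps_output" | grep -q "io.druid.cli.Main server $node"; then
      echo "$node"
    fi
  done
}

# Example (ww asks ps for untruncated command lines); prints nothing when
# all three nodes are running:
# ps auxww | missing_druid_nodes
```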

CI Upgrade History

  • February 6, 2018 - Upgraded to 0.11.0.
  • April 7, 2017 - Upgraded to 0.9.2.