Amylase is a Ruby on Rails application that we use at GreatVines to manage, schedule, and execute tasks. In its present state, these tasks only include certain Birst Web Services calls that we needed to automate the update of our many Birst spaces. In the future, we plan on using this application to manage the automation of all aspects of data integration here at GreatVines.
This project is still in the very early stages of development. There is currently no method for users to log in to the application. It is open to anyone who can access the IP address. We're currently running it in AWS Elastic Beanstalk with a whitelist for the IP addresses that need to connect.
Job Summary
TODOC:
- Job Summary Description
- Job Specifications
- Data Sources
- Schedules
- Clients
- Birst Spaces
- Birst Process Groups
We currently have Amylase running within AWS Elastic Beanstalk on a "64bit Amazon Linux 2014.03 v1.0.9 running Ruby 2.1 (Passenger Standalone)" type instance with a Postgres RDS database.
There are a number of environment variables that must be set for the application to run properly, detailed below
AMYLASE_LOGGING_S3_BUCKET=# name of an S3 bucket to store logs, e.g., amylase-logs
AMYLASE_REDSHIFT_S3_HOST=# hostname of the Redshift instance holding source data
AMYLASE_REDSHIFT_S3_PASSWORD=# 'envcrypted' Redshift password
AMYLASE_REDSHIFT_S3_PORT=5439
AMYLASE_REDSHIFT_S3_SSLMODE=require
AMYLASE_REDSHIFT_S3_STAGING_PATH=# S3 path to store temporary files needed to unload Redshift data, e.g., s3://amylase-staging/amylase
AMYLASE_REDSHIFT_S3_USER=# name of Redshift user
AMYLASE_REDSHIFT_S3_DBNAME=# name of Redshift dataabase
BIRST_PWD=# 'envcrypted Birst login password
BIRST_USER=# Birst username
BUNDLE_DISABLE_SHARED_GEMS=1
BUNDLE_PATH=vendor
BUNDLE_WITHOUT=test:development
ENVCRYPT_KEY=# envcrypt decryption key (you will need to escape the '$' with '\$')
RACK_ENV=production
RAILS_SKIP_ASSET_COMPILATION=false
RAILS_SKIP_MIGRATIONS=false
SECRET_KEY_BASE=# Rails secret key
AWS_ACCESS_KEY_ID=# AWS access key for communicating with AWS S3
AWS_SECRET_KEY=# AWS secret key for communicating with AWS S3
Some of the functionality of Amylase requires the use of passwords to access 3rd
party applications (such as BIRST_PWD
and AMYLASE_REDSHIFT_S3_PASSWORD
). To
protect those passwords, we use a tool called
envcrypt so that we never have to store
the unencrypted password. This requires generating a decryption key
(ENVCRYPT_KEY) and then encrypting the password using that key. Encrypted
passwords are then safe to store in files or databases, as long as the
ENVCRYPT_KEY isn't also stored. The only place the ENVCRYPT_KEY should be
stored is in an environment variable.
This application has been developed using Ruby 2.x, rbenv, Bundler, and Pow. Note, rbenv may not install correctly if rvm is also installed. Follow Pow install instructions for Rack app.
We've spent a lot of effort developing a good set of RSpec tests. So prior to deploying any code changes, a test spec should be defined and the full suite tested via
rspec spec
Some of the test suite performs live tests against Birst Web Services and AWS.
For those tests to work, the development environment will need to be able to
connect to valid Birst and AWS accounts. This could be done by setting the
environment variables similar to how they would set up in production as
described above. However, it may be more convenient to copy the relevant
sections of the config/settings.yml
file to a config/settings.local.yml
and
set them. This file overrides will override anything in config/settings.yml
.
It should never be committed to the source repository (and it is .gitignored by
default).
Additionally, some of the live tests require that an object already exists
in S3 or Redshift. These can be configured under the test:
namespace in
config/settings.local.yml
On AWS console, go to Amylase on Elastic Beanstalk. If a UAT server does not exist, clone the production server. Make sure RDS is also set up, but the UAT version can live on a much smaller instance class.
To run Amylase locally, make sure postgresql is installed and initialized with user and 2 amylase databases.
To create database: psql create database "amylase-dev"; create database "amylase-test";
Run bundle install to make sure all package dependencies are installed.
In Amylase directory, set the database variables. rake db:schema:load RAILS_ENV=development rake db:schema:load RAILS_ENV=test
On local development machine, get the latest Amylase code base from git. Create branch for code changes.
Run below command line to restart web engine. touch tmp/restart.txt
Pow should be running. Make a symbolic link to your application: cd ~/.pow ln -s /path/to/myapp Example, ln -s /Users/nancywong/Documents/GreatVines/opt/amylase Application should be running at http://amylase.dev/
Install EB Command Line Interface (CLI) version 3 (http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/eb-cli3-install.html) command line, pip install awsebcli
eb --version
Run initialization elastic beanstalk configuration on Amylase development directory. eb init Default region is us-west-2 : US West (Oregon).
Deploy to default environment or specify environment for deployment. eb deploy eb deploy amylase-uat-1 This wraps up the local Amylase repository into a zip file and moves it to the AWS elastic beanstalk specified deploy environment.
Note, when moving to production, stop scheduler to make sure no jobs are running. Also, make sure changes are committed to git in order to deploy to AWS elastic beanstalk. After deployment, start scheduler.
- User logins
- Scaling - separate the scheduler, web server, and workers
- Launch generic worker instances (e.g., ones that can run Kettle jobs)
- Birst model upgrade and deployment job templates
- PaperTrail
- Mechanism for configuring job dependencies (e.g., job B only runs if job A runs successfully)
######Why the name?
Amylase is an enzyme used in the brewing process that converts long-chained starch molecules found in barley to simpler fermentable sugars. Amylase is also the name of this Rails project, which is designed to take a mix of sequences of job tasks and split them up into smaller work units. Ok, so it's really just a job scheduler, but it's a job scheduler with a cool name, right?