etleap/terraform-aws-etleap-vpc

Contains templates for Etleap VPC deployments.

Creating a new deployment

Below is the minimal module instantiation to run Etleap inside your own VPC. This will create a new VPC and deploy Etleap and its associated resources inside it.

New VPC deployment

module "etleap" {
  source  = "etleap/etleap-vpc/aws"
  version = "1.8.10"

  region           = "us-east-1"
  deployment_id    = "deployment" # This will be provided by Etleap
  vpc_cidr_block_1 = 172
  vpc_cidr_block_2 = 22
  vpc_cidr_block_3 = 3
  key_name         = aws_key_pair.ssh.key_name
  first_name       = "John"
  last_name        = "Smith"
  email            = "john.smith@example.com"
}

output "app-hostname" {
  value = module.etleap.app_public_address
}

output "setup-password" {
  sensitive = true
  value     = module.etleap.setup_password
}
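
The example above references an aws_key_pair.ssh resource for SSH access. A minimal sketch of that resource (the key pair name and public key path are placeholders to adapt):

resource "aws_key_pair" "ssh" {
  key_name   = "etleap-ssh"              # placeholder key pair name
  public_key = file("~/.ssh/etleap.pub") # path to your SSH public key
}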

Existing VPC deployment

To deploy Etleap in an existing VPC, replace the vpc_cidr_block_* variables with:

vpc_id           = "vpc-id"
public_subnets   = ["subnet-public-1-id", "subnet-public-2-id", "subnet-public-3-id"]
private_subnets  = ["subnet-private-1-id", "subnet-private-2-id", "subnet-private-3-id"]
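
Putting it together, a sketch of an existing-VPC instantiation (the VPC and subnet IDs are placeholders):

module "etleap" {
  source  = "etleap/etleap-vpc/aws"
  version = "1.8.10"

  region          = "us-east-1"
  deployment_id   = "deployment" # This will be provided by Etleap
  vpc_id          = "vpc-0123456789abcdef0"
  public_subnets  = ["subnet-public-1-id", "subnet-public-2-id", "subnet-public-3-id"]
  private_subnets = ["subnet-private-1-id", "subnet-private-2-id", "subnet-private-3-id"]
  key_name        = aws_key_pair.ssh.key_name
  first_name      = "John"
  last_name       = "Smith"
  email           = "john.smith@example.com"
}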

Inputs

The following options are available when deploying Etleap.

Note: either vpc_cidr_block_1, vpc_cidr_block_2, and vpc_cidr_block_3, or vpc_id, public_subnets, and private_subnets must be specified.

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| region | The region Etleap is deployed in. | string | n/a | yes |
| deployment_id | The Deployment ID for this deployment. If you don't have one, please contact Etleap Support. | string | n/a | yes |
| vpc_cidr_block_1 | The first octet of the CIDR block of the desired VPC's address space. | int | n/a | no |
| vpc_cidr_block_2 | The second octet of the CIDR block of the desired VPC's address space. | int | n/a | no |
| vpc_cidr_block_3 | The third octet of the CIDR block of the desired VPC's address space. | int | n/a | no |
| key_name | The AWS Key Pair to use for SSH access into the EC2 instances. | string | n/a | yes |
| first_name | The first name to use when creating the first Etleap user account. | string | n/a | yes |
| last_name | The last name to use when creating the first Etleap user account. | string | n/a | yes |
| email | The email to use when creating the first Etleap user account. | string | n/a | yes |
| vpc_id | The existing VPC to deploy Etleap in. | string | n/a | no |
| public_subnets | Existing public subnets to deploy Etleap in. | list(string) | n/a | no |
| private_subnets | Existing private subnets to deploy Etleap in. | list(string) | n/a | no |
| extra_security_groups | Grants access to the DB, EC2 instances, and EMR cluster to the specified Security Groups. | list(string) | [] | no |
| app_hostname | The hostname where Etleap will be accessible from. If left empty, the default Load Balancer DNS name will be used. | string | null | no |
| app_available | Only use this if instructed by Etleap Support. Enable or disable to start or destroy the app instance. | boolean | true | no |
| ha_mode | Enables High Availability mode. This will run two redundant Etleap instances in 2 availability zones, and set the RDS instance to "multi-az" mode. | boolean | false | no |
| app_private_ip | The private IP for the main application instance. Use if you want to set it to a predetermined value. By default, the instance will be assigned a random IP. | string | null | no |
| secondary_private_ip | The private IP for the secondary application instance. Use if you want to set it to a predetermined value. By default, the instance will be assigned a random IP. | string | null | no |
| nat_private_ip | The private IP for the NAT instance. Use if you want to set it to a predetermined value. By default, the instance will be assigned a random IP. | string | null | no |
| non_critical_cloudwatch_alarm_sns_topics | A list of SNS topics to notify when non-critical alarms are triggered. For the list of non-critical alarms, see CloudWatch Alarms under Monitoring and operation. | list(string) | [] | no |
| critical_cloudwatch_alarm_sns_topics | A list of SNS topics to notify when critical alarms are triggered. For the list of critical alarms, see CloudWatch Alarms under Monitoring and operation. | list(string) | [] | no |
| app_instance_type | The instance type for the main app node(s). We do not recommend using a smaller instance type. | string | t3.xlarge | no |
| nat_instance_type | The instance type for the NAT instance. | string | m5n.large | no |
| rds_instance_type | The instance type for the RDS instance. We do not recommend using a smaller instance type. | string | db.m5.large | no |
| dms_instance_type | The instance type for the DMS instance. Not used if disable_cdc_support is set to true. | string | dms.t2.small | no |
| disable_cdc_support | Set to true if this deployment will not use CDC pipelines. This will cause the DMS Replication Instance and associated resources not to be created. | boolean | false | no |
| dms_roles_to_be_created | Set to true if this template should create the roles required by DMS, dms-vpc-role and dms-cloudwatch-logs-role. Set to false if you are already using DMS in the account where you deploy Etleap. | boolean | true | no |
| unique_resource_names | If set to true, a suffix is appended to resource names to make them unique per deployment. We recommend leaving this as true except in the case of migrations from earlier versions. | boolean | true | no |
| s3_input_buckets | The names of the S3 buckets which will be used with "S3 Input" connections. The module will create an IAM role to be specified with the "S3 Input" connections, together with a bucket policy that needs to be applied to the bucket. | list(string) | [] | no |
| s3_data_lake_account_ids | The 12-digit IDs of the AWS accounts containing the roles specified with "S3 Data Lake" connections. IAM roles in these accounts are given read access to the intermediate data S3 bucket. | list(string) | [] | no |
| github_username | The GitHub username to use when accessing custom transforms. | string | null | no |
| github_access_token_arn | The ARN of the secret containing the GitHub access token. | string | null | no |
| connection_secrets | A map between environment variables and Secrets Manager secret ARNs for secrets to be injected into the application. This is only used for enabling certain integrations. | map(string) | {} | no |
| resource_tags | Resource tags to be applied to all resources created by this template. | map(string) | {} | no |
| app_access_cidr_blocks | CIDR ranges that have access to the application (port 443). Defaults to allowing all IP addresses. | list(string) | ["0.0.0.0/0"] | no |
| ssh_access_cidr_blocks | CIDR ranges that have SSH access to the application instance(s) (port 22). Defaults to allowing all IP addresses. | list(string) | ["0.0.0.0/0"] | no |
| roles_allowed_to_be_assumed | A list of external roles that can be assumed by the app. When not specified, it defaults to all roles (*). | list(string) | [] | no |
| enable_public_access | Enable public access to the Etleap deployment. This will create an Internet-facing ALB. | boolean | true | no |
| acm_certificate_arn | The ARN of the certificate to use for SSL connections to the Etleap UI. If the certificate is specified, it must use either RSA_1024 or RSA_2048. See https://docs.aws.amazon.com/acm/latest/userguide/import-certificate-api-cli.html for more details. If no certificate is specified, the deployment will use a default one bundled with the template. | string | null | no |
| rds_backup_retention_period | The number of days to retain the automated database snapshots. | int | 7 | no |
| rds_allow_major_version_upgrade | Only use this if instructed by Etleap Support. Indicates that major version upgrades are allowed. | boolean | false | no |
| rds_apply_immediately | If any RDS modifications are required, they will be applied immediately instead of during the next maintenance window. It is recommended to set this back to false once the change has been applied. | boolean | false | no |
| emr_core_node_count | The number of EMR core nodes in the EMR cluster. | int | 1 | no |
| allow_iam_devops_role | Enable access to the deployment for Etleap by creating an IAM role that Etleap's ops team can assume. | boolean | false | no |
| allow_iam_support_role | Enable access to the support role for Etleap by creating an IAM role that Etleap's support team can assume. | boolean | true | no |
| enable_streaming_ingestion | Enable support and required infrastructure for streaming ingestion sources. Currently only supported in the us-east-1 and eu-west-3 regions. | boolean | false | no |
| streaming_endpoint_hostname | The hostname the streaming ingestion webhook will be accessible from. Only has an effect if enable_streaming_ingestion is set to true. If left empty, the default Load Balancer DNS name will be used. | string | null | no |
| streaming_endpoint_acm_certificate_arn | The ARN of the certificate to use for SSL connections to the streaming ingestion webhook. If the certificate is specified, it must use either RSA_1024 or RSA_2048. See https://docs.aws.amazon.com/acm/latest/userguide/import-certificate-api-cli.html for more details. If no certificate is specified, the deployment will use a default one bundled with the template. | string | null | no |
| streaming_endpoint_access_cidr_blocks | CIDR ranges that have access to the streaming ingestion webhook (both HTTP and HTTPS). Defaults to allowing all IP addresses. | list(string) | ["0.0.0.0/0"] | no |
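
For example, a sketch of a new-VPC deployment that also enables High Availability mode, restricts network access, and tags resources (the CIDR ranges and tags are placeholders):

module "etleap" {
  source  = "etleap/etleap-vpc/aws"
  version = "1.8.10"

  region           = "us-east-1"
  deployment_id    = "deployment" # This will be provided by Etleap
  vpc_cidr_block_1 = 172
  vpc_cidr_block_2 = 22
  vpc_cidr_block_3 = 3
  key_name         = aws_key_pair.ssh.key_name
  first_name       = "John"
  last_name        = "Smith"
  email            = "john.smith@example.com"

  # Optional hardening and bookkeeping (placeholder values):
  ha_mode                = true
  app_access_cidr_blocks = ["203.0.113.0/24"] # e.g. your office range
  ssh_access_cidr_blocks = ["203.0.113.0/24"]
  resource_tags = {
    team = "data-eng"
  }
}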

Outputs

| Name | Description |
|------|-------------|
| app_public_address | The DNS address of the ALB that serves the Etleap Web UI. |
| streaming_endpoint_public_address | The DNS address of the ALB that serves the streaming ingestion webhook. |
| s3_input_role_arn | The role to use when setting up "S3 Input" connections with a bucket from a different AWS account. |
| s3_input_bucket_policy | Policies that need to be applied to the S3 buckets specified by s3_input_buckets so Etleap's role can read from them. |
| setup_password | The password to log into Etleap for the first time. You'll be prompted to change it on first login. |
| vpc_id | The VPC ID where Etleap is deployed. |
| public_subnet_a | The first public subnet for Etleap's VPC. |
| public_subnet_b | The second public subnet for Etleap's VPC. |
| private_subnet_a | The first private subnet for Etleap's VPC. |
| private_subnet_b | The second private subnet for Etleap's VPC. |
| public_route_table_id | The public subnets' route table, if managed by the module. |
| private_route_table_id | The private subnets' route table, if managed by the module. |
| emr_cluster_id | The ID of Etleap's EMR cluster. |
| intermediate_bucket_id | The ID of Etleap's intermediate bucket. |
| deployment_id | The Deployment ID. |
| main_app_instance_id | The instance ID of the main application instance. |
| secondary_app_instance_id | The instance ID of the secondary application instance. |
| kms_policy | Statement to add to the KMS key policy if using a customer-managed SSE-KMS key for encrypting S3 data. |
| nat_ami | Status of the NAT AMI (if created). |
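
The operational scripts below read emr_cluster_id, intermediate_bucket_id, and deployment_id with terraform output -raw. If you call the module from a root configuration, a sketch of re-exporting them so those commands work:

output "emr_cluster_id" {
  value = module.etleap.emr_cluster_id
}

output "intermediate_bucket_id" {
  value = module.etleap.intermediate_bucket_id
}

output "deployment_id" {
  value = module.etleap.deployment_id
}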

Connecting to the Etleap deployment

After Terraform has finished applying the changes, it may take up to 30 minutes for the application to become available. This time is required to configure the EC2 instances, the database, and the EMR cluster.

Go to the URL in the app-hostname output and use the email provided in the template to log in. A temporary password was created as part of the deployment; its value is the output of terraform output setup-password.
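
For example, assuming the output names from the instantiation example above:

APP_HOSTNAME=$(terraform output -raw app-hostname)
SETUP_PASSWORD=$(terraform output -raw setup-password)
echo "Log in at https://$APP_HOSTNAME with the temporary password: $SETUP_PASSWORD"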

Once logged in, you'll be prompted to create a new password.

Monitoring and operation

CloudWatch Alarms

This module defines a number of CloudWatch alarms that can be used to alert your infrastructure operations team when the deployment is in a bad state. The table below describes the alarms that are defined, together with the action recommended to remedy them. Critical alarms are for conditions that cause pipelines to stop.

| Alarm | Critical | Cause | Resolution |
|-------|----------|-------|------------|
| EMR Cluster Running | Yes | EMR cluster is not running | See the section on Reprovisioning a new EMR cluster |
| 60% Disk EMR HDFS | No | Not enough core nodes for the workload | Increase the number of core nodes via the Terraform variable emr_core_node_count |
| EMR Unhealthy Nodes | No | EMR cluster is in a bad state | Taint the cluster and see the section on Reprovisioning a new EMR cluster |
| EMR Missing Blocks | No | Missing HDFS blocks mean we lost one or more core nodes | Taint the cluster and see the section on Reprovisioning a new EMR cluster |
| 80% Disk EMR NameNode | Yes | The disk is filling up on the NameNode | Taint the cluster and see the section on Reprovisioning a new EMR cluster |
| RDS CPU 90% | No | RDS instance is saturating CPU | Increase the RDS instance size |
| RDS Disk Space | Yes | RDS is running out of disk space | Increase the allocated_storage via Terraform, or via the console |
| RDS Freeable Memory | No | RDS is running out of memory | Increase the RDS instance size |
| * Node 80% CPU | No | CPU usage is high on the specified instance | Upgrade the instance type to the next larger size within the same instance family. If you wish to upgrade from t3.2xlarge, which is the largest t3 instance available, switch to the c6a family. |
| * 90% Disk * | Yes | Disk is getting full for one of the instances | Increase the EBS size of the attached volumes; contact Etleap Support to diagnose the root cause |
| App is running | Yes | The main web application is down and not accepting requests | If in single-availability mode, reprovision the instance. If in High Availability mode, reprovision both instances, and contact Etleap Support to determine the cause of the outage |
| Job is running | Yes | The data processing application is down | If in single-availability mode, reprovision the instance. If in High Availability mode, reprovision both instances, and contact Etleap Support to determine the cause of the outage |
| DMS Disk Space 30GB Remaining | Yes | DMS replication instance is running out of disk space | Contact Etleap Support |
| DMS Available Memory <= 10% | No | DMS replication instance is running out of memory | Upgrade the DMS replication instance |
| Elva Healthy Host Count | Yes | The number of streaming ingestion nodes is too low | Contact Etleap Support |
| Zookeeper Unhealthy Nodes | Yes | The Zookeeper cluster has unhealthy nodes | Contact Etleap Support |
| * App Kinesis logger agent is running | Yes | A Kinesis logger agent is not running | Contact Etleap Support |
| High Job GC Activity | Yes | The data processing application is spending significant time doing garbage collection | If the monitored metric has been steadily increasing over time, upgrade the app_instance_type to one that has more memory. Contact Etleap Support if this alarm is caused by a sudden spike in the metric. |
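
To route these alarms to your team, pass SNS topics to the module via critical_cloudwatch_alarm_sns_topics and non_critical_cloudwatch_alarm_sns_topics. A sketch, assuming the lists take topic ARNs (the topic name is a placeholder):

resource "aws_sns_topic" "etleap_alarms" {
  name = "etleap-alarms" # placeholder topic name
}

module "etleap" {
  # ... plus the inputs shown in the examples above ...
  critical_cloudwatch_alarm_sns_topics     = [aws_sns_topic.etleap_alarms.arn]
  non_critical_cloudwatch_alarm_sns_topics = [aws_sns_topic.etleap_alarms.arn]
}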

Reprovisioning a new EMR cluster

If the EMR Cluster Running, EMR Unhealthy Nodes, or EMR Missing Blocks alarm has triggered, you'll need to start a new EMR cluster. Before running Terraform, run the following script to send any relevant logs and metrics to Etleap for analysis (if you have the option enabled for your deployment).

CLUSTER_ID=$(terraform output -raw emr_cluster_id)
INTERMEDIATE_BUCKET=$(terraform output -raw intermediate_bucket_id)
DEPLOYMENT_ID=$(terraform output -raw deployment_id)
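# Copy this cluster's EMR logs to Etleap's log bucket for analysis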
aws s3 cp s3://$INTERMEDIATE_BUCKET/emr-logs/$CLUSTER_ID/ s3://etleap-vpc-emr-logs/$DEPLOYMENT_ID/$CLUSTER_ID/ --acl bucket-owner-full-control --recursive

Once this is done, you can run terraform apply to recreate or replace the cluster, as needed.

Security upgrades

This section describes how to run security upgrades for the deployment.

EC2 instances that are part of this deployment are designed to apply any available updates when they first start up. We do not support patching existing instances, so the following instructions will guide you through replacing the instances while minimizing downtime.

Upgrading the Application Instances

Expected Downtime:

  • API and Web UI:
    • HA Mode: none
    • Regular Mode: 10-15 minutes
  • Pipelines: 10-15 minutes

Note: if you plan on upgrading the EMR cluster as well, perform that upgrade first, as it will require replacing the application instances as part of the upgrade.

Step 1: Regular and HA Mode

  1. Run terraform to replace the main application instance: terraform apply -replace 'module.etleap.module.main_app[0].aws_instance.app';

  2. Once the apply finishes, check if the application is online:

    a. In the AWS EC2 Console, go to "Target Groups"

    b. Select the "Etleap*" Target group. To get the exact name run: terraform state show module.etleap.aws_lb_target_group.app.

    c. Under the "Targets" tab, check that all instances are "Healthy".

  3. Once all instances are healthy, you can continue with the next step.
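
As an alternative to the console check in step 2, you can query target health from the AWS CLI. A sketch, assuming (per step 2b) that the target group name starts with "Etleap":

TG_ARN=$(aws elbv2 describe-target-groups \
  --query "TargetGroups[?starts_with(TargetGroupName, 'Etleap')].TargetGroupArn | [0]" \
  --output text)
aws elbv2 describe-target-health --target-group-arn "$TG_ARN" \
  --query "TargetHealthDescriptions[].TargetHealth.State"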

Step 2: HA Mode only

  1. Run terraform to replace the secondary instance: terraform apply -replace 'module.etleap.module.secondary_app[0].aws_instance.app';

Upgrading the Zookeeper Cluster

Downtime: none

Warning: To ensure zero downtime, the upgrade must be performed one instance at a time. Make sure that all 3 Zookeeper nodes are healthy before moving to the next one.

  1. Check that the maximum of the Etleap/Zookeeper Ruok metric is 1 for all 3 instances. If this is not the case, contact support@etleap.com before proceeding.

  2. Taint the Zookeeper instance: terraform apply -replace 'module.etleap.aws_instance.zookeeper["1"]'

  3. Run terraform apply;

  4. Wait for at least 10 minutes, and monitor until the Etleap/Zookeeper Ruok metric is 1 for the instance that was replaced. If the metric doesn't recover after 20 minutes, contact support@etleap.com before proceeding further.

  5. Repeat steps 1-4 for the remaining 2 instances: 'module.etleap.aws_instance.zookeeper["2"]' and 'module.etleap.aws_instance.zookeeper["3"]'.
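
For the metric checks in steps 1 and 4, the Etleap/Zookeeper Ruok metric can also be read from the AWS CLI. A sketch using GNU date; depending on how the metric is published, you may need to add --dimensions to scope it to a single instance:

aws cloudwatch get-metric-statistics \
  --namespace "Etleap/Zookeeper" \
  --metric-name Ruok \
  --statistics Maximum \
  --period 300 \
  --start-time "$(date -u -d '30 minutes ago' '+%Y-%m-%dT%H:%M:%SZ')" \
  --end-time "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"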

Upgrading the EMR Cluster

Downtime:

  • API and Web UI: none
  • Pipelines: 10-15 minutes

  1. Remove the old cluster from the state: terraform state rm module.etleap.aws_emr_cluster.emr and terraform state rm module.etleap.aws_emr_instance_group.task_spot;

  2. Run terraform apply -target module.etleap.aws_emr_cluster.emr -target module.etleap.aws_emr_instance_group.task_spot to create a new cluster;

  3. Once the apply completes, replace the main application instance: terraform apply -target module.etleap.module.main_app[0].aws_instance.app -target module.etleap.aws_lb_target_group_attachment.main_app[0];

  4. Monitor that the instance comes online:

    a. In the AWS EC2 Console, go to "Target Groups"

    b. Select the "Etleap*" Target group. To get the exact name run: terraform state show module.etleap.aws_lb_target_group.app.

    c. Under the "Targets" tab, check that all instances are "Healthy".

  5. Once the main instance is online, apply the remaining changes with terraform apply. If HA Mode is enabled, this will also replace the secondary application instance.

  6. Manually terminate the old cluster from the AWS Console or the CLI.
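
For step 6, a sketch of terminating the old cluster from the CLI. Note that the old cluster ID must be captured before step 1, since terraform output reports the new cluster once it has been created:

# Run before step 1:
OLD_CLUSTER_ID=$(terraform output -raw emr_cluster_id)

# Run as step 6, after the new cluster is up:
aws emr terminate-clusters --cluster-ids "$OLD_CLUSTER_ID"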