
A Terraform module for deploying the Datafold infrastructure on Google Cloud.


datafold/terraform-google-datafold


Datafold Google module

This repository provisions resources on Google Cloud, preparing them for a deployment of the application on a GKE cluster.

About this module

Prerequisites

  • A Google Cloud account, preferably a new, isolated one.
  • Terraform >= 1.4.6
  • A customer contract with Datafold
    • The application does not work without credentials supplied by Datafold sales
  • Access to our public helm-charts repository

This deployment will create the following resources:

  • Google VPC
  • Google subnet
  • Google Cloud Storage (GCS) bucket for ClickHouse backups
  • Google external application load balancer
  • Google HTTPS certificate, unless preregistered and provided
  • Three persistent disk volumes for local data storage
  • A Cloud SQL for PostgreSQL database instance
  • A GKE cluster
  • Service accounts for the GKE cluster to perform actions outside of its cluster boundary:
    • Provisioning persistent disk volumes
    • Updating the Network Endpoint Group (NEG) to route traffic directly to pods
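
The HTTPS certificate can come from one of three places, controlled by the create_ssl_cert, ssl_cert_name, ssl_cert_path and ssl_private_key_path inputs. The fragment below is a sketch of those alternatives only: the module source, domain, certificate name and file paths are hypothetical placeholders, and the other required inputs are omitted.

```hcl
module "datafold" {
  # Assumed Git source address; pin a tag or commit with ?ref=... in real use.
  source = "github.com/datafold/terraform-google-datafold"

  domain_name = "datafold.example.com" # hypothetical domain

  # Option 1: let the module create the HTTPS certificate.
  create_ssl_cert = true

  # Option 2: reuse a certificate already registered in GCP.
  # create_ssl_cert = false
  # ssl_cert_name   = "datafold-cert" # hypothetical certificate name

  # Option 3: provide the certificate and private key from local files.
  # create_ssl_cert      = false
  # ssl_cert_path        = "certs/datafold.crt" # hypothetical path
  # ssl_private_key_path = "certs/datafold.key" # hypothetical path

  # ... remaining required inputs (see the Inputs table) ...
}
```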

Negative scope

  • This module will not provision DNS names in your zone.
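
Because DNS is out of scope, you create the record yourself. A minimal sketch, assuming the zone is hosted in Cloud DNS and the module block is named datafold; example-zone and my-gcp-project are hypothetical names. It points an A record for the module's domain_name output at the lb_external_ip output:

```hcl
resource "google_dns_record_set" "datafold" {
  project      = "my-gcp-project" # hypothetical project ID
  managed_zone = "example-zone"   # hypothetical Cloud DNS managed zone

  # Cloud DNS record names are fully qualified and end with a dot.
  name    = "${module.datafold.domain_name}."
  type    = "A"
  ttl     = 300
  rrdatas = [module.datafold.lb_external_ip]
}
```

If you manage DNS outside Google Cloud, create the equivalent A record with your provider instead.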

How to use this module

  • See the example for a possible setup; it depends on our helm-charts
  • Create the secret files (such as secrets.yaml) with the variables we provide

Examples

  • Copy the example from this repository
  • Adjust the settings for your environment
  • Run terraform init
  • Run terraform apply
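
A minimal root module might look like the following. It is a sketch, not the example shipped in this repository: the Git source address, project ID, region, zones, domain, tags and CIDRs are all hypothetical placeholders. It sets only the inputs that have no default, including lb_app_rules, which is required even when it is an empty list.

```hcl
provider "google" {
  project = "my-gcp-project" # hypothetical project ID
  region  = "us-east1"       # hypothetical region
}

module "datafold" {
  # Assumed Git source address; pin a tag or commit with ?ref=... in real use.
  source = "github.com/datafold/terraform-google-datafold"

  # Required inputs (no defaults). All values are placeholders.
  project_id      = "my-gcp-project"
  provider_region = "us-east1"
  provider_azs    = ["us-east1-b", "us-east1-c", "us-east1-d"]
  deployment_name = "datafold"
  environment     = "production"
  domain_name     = "datafold.example.com"

  common_tags = {
    team = "data-platform"
  }

  # Who may reach the load balancer, and where the application may connect to.
  whitelisted_ingress_cidrs = ["203.0.113.0/24"]
  whitelisted_egress_cidrs  = ["198.51.100.10/32"]

  # Required even when no extra load balancer rules are needed.
  lb_app_rules = []
}
```

Running terraform init and then terraform apply in this directory provisions the resources listed above.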

Requirements

| Name | Version |
|------|---------|
| dns | 3.2.1 |
| google | >= 4.80.0 |
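
A root module that uses this one can pin the same constraints, together with the Terraform version from the prerequisites. This sketch assumes both providers come from the hashicorp namespace on the public Terraform Registry:

```hcl
terraform {
  required_version = ">= 1.4.6"

  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 4.80.0"
    }
    dns = {
      source  = "hashicorp/dns"
      version = "3.2.1" # exact pin, matching the table above
    }
  }
}
```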

Providers

| Name | Version |
|------|---------|
| google | >= 4.80.0 |
| random | n/a |

Modules

| Name | Source | Version |
|------|--------|---------|
| clickhouse_backup | ./modules/clickhouse_backup | n/a |
| database | ./modules/database | n/a |
| gke | ./modules/gke | n/a |
| load_balancer | ./modules/load_balancer | n/a |
| networking | ./modules/networking | n/a |
| project-iam-bindings | terraform-google-modules/iam/google//modules/projects_iam | n/a |
| project_factory_project_services | terraform-google-modules/project-factory/google//modules/project_services | ~> 14.4.0 |

Resources

Name Type

Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| add_onprem_support_group | Flag to add the on-prem support group for datafold-onprem-support@datafold.com | bool | true | no |
| clickhouse_backup_sa_key | Service account (SA) key from secrets | string | "" | no |
| clickhouse_data_disk_size | Data volume size for ClickHouse | number | 40 | no |
| clickhouse_db | Database for ClickHouse | string | "clickhouse" | no |
| clickhouse_gcs_bucket | GCS bucket for ClickHouse backups | string | "clickhouse-backups-abcguo23" | no |
| clickhouse_get_backup_sa_from_secrets_yaml | Flag to get the ClickHouse backup SA from secrets.yaml instead of creating a new one | bool | false | no |
| clickhouse_username | Username for ClickHouse | string | "clickhouse" | no |
| common_tags | Common tags to apply to every resource | map(string) | n/a | yes |
| create_ssl_cert | True to create the SSL certificate, false otherwise | bool | false | no |
| database_name | Name of the database | string | "datafold" | no |
| database_version | Version of the database | string | "POSTGRES_15" | no |
| datafold_intercom_app_id | App ID for Intercom. Any value other than "" enables this feature. Only used if the customer doesn't use Slack. | string | "" | no |
| db_deletion_protection | Enables deletion protection (enforced by Terraform only, not in the cloud) | bool | true | no |
| default_node_disk_size | Disk size for a node | number | 40 | no |
| deploy_neg_backend | Set to true to connect the backend service to the NEG that the GKE cluster creates | bool | true | no |
| deploy_vpc_flow_logs | Whether to deploy VPC flow logs | bool | false | no |
| deployment_name | Name of the current deployment | string | n/a | yes |
| domain_name | A valid domain name (used to set the host in GCP) | string | n/a | yes |
| environment | Global environment tag applied to all Datadog logs, metrics, etc. | string | n/a | yes |
| gcs_path | Path to the backups in the GCS bucket | string | "backups" | no |
| github_endpoint | URL of the GitHub endpoint to connect to. Useful for GitHub Enterprise. | string | "" | no |
| gitlab_endpoint | URL of the GitLab endpoint to connect to. Useful for self-managed GitLab. | string | "" | no |
| host_override | A valid domain name, if the customer provisions their own DNS / routing | string | "" | no |
| lb_app_rules | Extra rules to apply to the application load balancer for additional filtering. match_type is either "src_ip_ranges" or "expr"; versioned_expr and src_ip_ranges are only used when match_type is "src_ip_ranges", and expr only when it is "expr". | list(object({ action = string, priority = number, description = string, match_type = string, versioned_expr = string, src_ip_ranges = list(string), expr = string })) | n/a | yes |
| lb_layer_7_ddos_defence | Flag to toggle layer 7 DDoS defence | bool | false | no |
| legacy_naming | Flag to toggle legacy behavior, such as resource naming | bool | true | no |
| mig_disk_type | Disk type; see https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance_template#disk_type | string | "pd-balanced" | no |
| postgres_allocated_storage | Amount of allocated storage for the Postgres database | number | 20 | no |
| postgres_instance | GCP instance type for the PostgreSQL database | string | "db-custom-2-7680" | no |
| postgres_ro_username | Postgres read-only username | string | "datafold_ro" | no |
| postgres_username | Username for the Postgres Cloud SQL database | string | "datafold" | no |
| project_id | The project to deploy to. If not set, the default provider project is used. | string | n/a | yes |
| provider_azs | Provider AZs list; if empty, AZs are fetched dynamically | list(string) | n/a | yes |
| provider_region | Region for the deployment in GCP | string | n/a | yes |
| redis_data_size | Redis volume size | number | 10 | no |
| remote_storage | Type of remote storage for ClickHouse backups | string | "gcs" | no |
| restricted_roles | Flag to stop certain IAM-related resources from being updated/changed | bool | false | no |
| restricted_viewer_role | Flag to stop certain IAM-related resources from being updated/changed | bool | false | no |
| ssl_cert_name | Name of a valid SSL certificate in GCP; alternatively, set ssl_private_key_path and ssl_cert_path | string | "" | no |
| ssl_cert_path | SSL certificate path | string | "" | no |
| ssl_private_key_path | Private SSL key path | string | "" | no |
| vpc_cidr | Network CIDR for the VPC | string | "10.0.0.0/16" | no |
| vpc_flow_logs_interval | Interval for VPC flow logs | string | "INTERVAL_5_SEC" | no |
| vpc_flow_logs_sampling | Sampling rate for VPC flow logs | string | "0.5" | no |
| vpc_id | ID of an existing VPC, to skip creating a new one | string | "" | no |
| vpc_master_cidr_block | CIDR block for the Kubernetes master; must be a /28 block | string | "192.168.0.0/28" | no |
| vpc_secondary_cidr_pods | Network CIDR for VPC secondary subnet 1 (pods) | string | "/17" | no |
| vpc_secondary_cidr_services | Network CIDR for VPC secondary subnet 2 (services) | string | "/17" | no |
| whitelist_all_ingress_cidrs_lb | Normally filtering happens on the load balancer, but some customers prefer to filter at the SG/firewall. This flag whitelists 0.0.0.0/0 on the load balancer. | bool | false | no |
| whitelisted_egress_cidrs | List of Internet addresses the application can access | list(string) | n/a | yes |
| whitelisted_ingress_cidrs | List of CIDRs that can access the application over HTTP/HTTPS | list(string) | n/a | yes |
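
The lb_app_rules type does not mark any attribute as optional(), so each rule object should set every attribute; this sketch passes null for the attributes its match_type does not use, assuming the module tolerates null there. It also assumes the rules are forwarded to a Cloud Armor security policy, where "SRC_IPS_V1" is the versioned expression for IP-range matching and expr holds a Cloud Armor rules-language expression. The actions, priorities, IP range and path are hypothetical.

```hcl
module "datafold" {
  # ... source and other required inputs omitted ...

  lb_app_rules = [
    {
      # Match on source IP ranges (hypothetical office network).
      action         = "allow"
      priority       = 1000
      description    = "Allow office network"
      match_type     = "src_ip_ranges"
      versioned_expr = "SRC_IPS_V1"
      src_ip_ranges  = ["203.0.113.0/24"]
      expr           = null # only used when match_type is "expr"
    },
    {
      # Match on a custom expression (hypothetical internal path).
      action         = "deny(403)"
      priority       = 1100
      description    = "Block /internal paths"
      match_type     = "expr"
      versioned_expr = null # only used when match_type is "src_ip_ranges"
      src_ip_ranges  = null # only used when match_type is "src_ip_ranges"
      expr           = "request.path.matches('/internal/.*')"
    },
  ]
}
```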

Outputs

| Name | Description |
|------|-------------|
| clickhouse_backup_sa | Name of the ClickHouse backup service account |
| clickhouse_data_size | Size in GB of the ClickHouse data volume |
| clickhouse_data_volume_id | Volume ID of the ClickHouse data PD volume |
| clickhouse_gcs_bucket | Name of the GCS bucket for the ClickHouse backups |
| clickhouse_logs_size | Size in GB of the ClickHouse logs volume |
| clickhouse_logs_volume_id | Volume ID of the ClickHouse logs PD volume |
| clickhouse_password | Password to use for ClickHouse |
| cloud_provider | The cloud provider creating all the resources |
| cluster_name | The name of the GKE cluster that was created |
| db_instance_id | The database instance ID |
| deployment_name | The name of the deployment |
| domain_name | The domain name on the HTTPS certificate |
| lb_external_ip | The load balancer IP, once provisioned |
| neg_name | The name of the Network Endpoint Group where pods need to be registered from Kubernetes |
| postgres_database_name | The name of the Postgres database |
| postgres_host | The hostname of the Postgres database |
| postgres_password | The Postgres password |
| postgres_port | The port of the Postgres database |
| postgres_username | The Postgres username |
| redis_data_size | The size in GB of the Redis data volume |
| redis_data_volume_id | The volume ID of the Redis PD data volume |
| redis_password | The Redis password |
| vpc_cidr | The CIDR range of the VPC |
| vpc_id | The ID of the Google VPC the cluster runs in |
| vpc_subnetwork | The subnet in which the cluster is created |
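
To pass these values on to the Helm deployment, a root module can re-export them. A sketch, assuming a module block named datafold (hypothetical); the password output is marked sensitive so Terraform redacts it in plan and apply output, although the value is still present in the state file:

```hcl
output "lb_external_ip" {
  description = "IP address the Datafold DNS record should point at"
  value       = module.datafold.lb_external_ip
}

output "postgres_host" {
  value = module.datafold.postgres_host
}

output "postgres_password" {
  value     = module.datafold.postgres_password
  sensitive = true # redacted in CLI output; still stored in state
}
```

A single value can then be read back with terraform output -raw postgres_password.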
