
Google Data Workshop Resources

This repository contains resources and instructions for exercises using Terraform, DBT and Composer to build and operate a Google data environment.

Setup

Clone this repository to your Cloud Shell environment; you will then use the Cloud Shell Editor to make changes to files locally.
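For example:

git clone https://github.com/avensolutions/google-data-workshop.git
cd google-data-workshop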

Environment Deployment and Configuration using Terraform

Each student will create two BigQuery datasets (representing a conformed and a presentation tier), encrypted with a unique KMS key using CMEK.

As all resources will be created in the same project, a prefix (as seen in nonprod.tfvars) will be prepended to resource names, along with the student's surname. For instance, if the student's surname is Smith, the datasets would be named nonprod_smith_presentation and so on (BigQuery dataset IDs allow letters, numbers and underscores only, so underscores are used in place of hyphens).

Modify the files created in the terraform directory to perform the steps below.

Step 1 - Create a KMS Key Ring and Key for the nonprod environment

Use the information in google_kms_key_ring and google_kms_crypto_key to create a KMS Key Ring and Key for the nonprod environment.

Include your surname and the environment name in the key ring name and key name.
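A minimal sketch of what this could look like, assuming variables such as var.prefix and var.surname are defined in nonprod.tfvars (the exact variable names in the workshop files may differ):

# KMS key ring for the nonprod environment
resource "google_kms_key_ring" "nonprod" {
  name     = "${var.prefix}-${var.surname}-keyring"   # e.g. nonprod-smith-keyring
  location = "us"
}

# CMEK key used to encrypt the BigQuery datasets created in Step 2
resource "google_kms_crypto_key" "nonprod" {
  name     = "${var.prefix}-${var.surname}-key"       # e.g. nonprod-smith-key
  key_ring = google_kms_key_ring.nonprod.id
}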

Generate a plan using terraform plan and execute it using terraform apply.

Step 2 - Create datasets for the conformed and presentation tiers in the nonprod environment

Use the information in google_bigquery_dataset to create datasets for the conformed and presentation tiers in the nonprod environment.

The datasets should be located in the US and encrypted with the KMS key created in Step 1. All other settings should be left at their defaults.
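A minimal sketch of the two datasets, assuming the key resource from the Step 1 sketch above; note that the dataset IDs use underscores, as BigQuery requires:

# Conformed tier dataset, encrypted with the CMEK key from Step 1
resource "google_bigquery_dataset" "conformed" {
  dataset_id = "${var.prefix}_${var.surname}_conformed"
  location   = "US"
  default_encryption_configuration {
    kms_key_name = google_kms_crypto_key.nonprod.id
  }
}

# Presentation tier dataset, encrypted with the same key
resource "google_bigquery_dataset" "presentation" {
  dataset_id = "${var.prefix}_${var.surname}_presentation"
  location   = "US"
  default_encryption_configuration {
    kms_key_name = google_kms_crypto_key.nonprod.id
  }
}

Before the apply succeeds, the project's BigQuery service account needs the roles/cloudkms.cryptoKeyEncrypterDecrypter role on the key; depending on how the workshop project was prepared, this may already be in place.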

Generate a plan using terraform plan and execute it using terraform apply.

Step 3 - Create an IAM binding for the presentation tier in the nonprod environment

Use the information in google_bigquery_dataset_iam_binding to create an IAM binding for the presentation tier in the nonprod environment, allowing the user javen@avensolutions.com to access the dataset with the roles/bigquery.dataViewer role.
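A minimal sketch, assuming the presentation dataset resource from the Step 2 sketch above:

# Grant javen@avensolutions.com read access to the presentation dataset
resource "google_bigquery_dataset_iam_binding" "presentation_viewer" {
  dataset_id = google_bigquery_dataset.presentation.dataset_id
  role       = "roles/bigquery.dataViewer"
  members    = ["user:javen@avensolutions.com"]
}

As before, generate a plan using terraform plan and execute it using terraform apply.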

DBT Challenge

An example dbt project is included in the dbt_example directory for your reference.

In your Cloud Shell, install dbt using the commands below:

# remove pre-installed packages that can conflict with dbt's dependency versions
# (-y skips the confirmation prompt)
sudo pip3 uninstall -y flask
sudo pip3 uninstall -y Jinja2
# install dbt core along with the BigQuery adapter
sudo pip3 install \
  dbt-core \
  dbt-bigquery

Use dbt init to create an empty dbt project.
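For example (the project name here is illustrative):

dbt init station_status   # scaffolds an empty project and prompts for connection details
cd station_status
dbt debug                 # verifies the profile and the BigQuery connection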

Create a target table in the conformed tier dataset in the nonprod environment by joining the following tables:

bigquery-public-data.san_francisco.bikeshare_stations
bigquery-public-data.san_francisco.bikeshare_status

The required schema for the target table (named station_bikes_status) is listed below, followed by a sketch of a possible model:

station_id
station_name
latitude
longitude
dockcount
landmark
installation_date
bikes_available
docks_available
time
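A sketch of one possible model, using fully qualified public table names for brevity (defining dbt sources would be the more idiomatic route); it assumes your profiles.yml targets the conformed dataset, and the column names should be verified against the public tables' schemas:

-- models/station_bikes_status.sql
{{ config(materialized='table') }}

select
    stations.station_id,
    stations.name as station_name,
    stations.latitude,
    stations.longitude,
    stations.dockcount,
    stations.landmark,
    stations.installation_date,
    status.bikes_available,
    status.docks_available,
    status.time
from `bigquery-public-data.san_francisco.bikeshare_stations` as stations
join `bigquery-public-data.san_francisco.bikeshare_status` as status
    on stations.station_id = status.station_id

Run the model with dbt run --select station_bikes_status.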

Challenge:

Create a target table in the presentation tier (called station_bikes_status_by_month) which shows averages of the bikes_available and docks_available fields for each month in every year available in the source tables. Return all other fields (except bikes_available, docks_available and time) as well.
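A sketch of one possible approach, building on the station_bikes_status model above; BigQuery permits grouping by the year and month aliases:

-- models/station_bikes_status_by_month.sql
{{ config(materialized='table') }}

select
    station_id,
    station_name,
    latitude,
    longitude,
    dockcount,
    landmark,
    installation_date,
    extract(year from time) as year,
    extract(month from time) as month,
    avg(bikes_available) as avg_bikes_available,
    avg(docks_available) as avg_docks_available
from {{ ref('station_bikes_status') }}
group by
    station_id, station_name, latitude, longitude,
    dockcount, landmark, installation_date, year, month

Pointing this model at the presentation dataset (for example with a custom schema in dbt_project.yml) is part of the challenge.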

Composer Challenge

Look at the example DAG in composer_dag_example. Rename this DAG using your surname, save the file, and copy it to the DAGs directory in GCS. Run the job using the Airflow UI.
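For example, assuming the renamed file is smith_example_dag.py (illustrative) and substituting your Composer environment's name, location and bucket:

# look up the environment's DAG folder in GCS
gcloud composer environments describe my-environment \
  --location us-central1 \
  --format="value(config.dagGcsPrefix)"
# copy the renamed DAG into that folder
gsutil cp smith_example_dag.py gs://<your-composer-bucket>/dags/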

Challenge:

How would you restructure this DAG?
