This is a python script to demo how to use BigQuery and Data Catalog client to automate tagging of table metadata.
-
The script accepts the following input parameters:
- GCP Project ID, Tag Template Name, Tag Template Region, BQ Dataset and Table
-
Create a sample tag template with the following fields:
- count
- last_modified
-
Execute BQ SQL to retrieve the row count and last modified date of the BQ table
-
Create or update the tag for the BQ table in Data Catalog
- Python 3
- GCP Service Account with the following roles
- BigQuery Data Viewer
- BigQuery Job User
- Data Catalog Tag Editor
- Data Catalog TagTemplate Creator
Create Python virtual environment and install dependencies:
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
Setup authentication using service account key:
export GOOGLE_APPLICATION_CREDENTIALS="path_to_service_account_credential"
> python bq_data_catalog_sample.py --help
usage: bq_data_catalog_sample.py [-h] --project PROJECT --region REGION --template TEMPLATE --dataset DATASET --table TABLE
Sample program to populate BQ table tags in Data Catalog.
optional arguments:
-h, --help show this help message and exit
--project PROJECT GCP Project ID
--region REGION GCP Region for the Data Catalog Tag Template (e.g., us-central1)
--template TEMPLATE Data Catalog Tag Template Name
--dataset DATASET BigQuery Dataset Name
--table TABLE BigQuery Table Name