feat: Onboard Google Diversity Annual Report 2021 dataset #111

Merged
merged 22 commits into from Jun 28, 2021
Changes from 14 commits
Commits
1546e82
jinja template for backend.tf
adlersantos May 26, 2021
c316d30
template path for backend.tf
adlersantos May 26, 2021
cac78fe
add backend.tf file generation in generate_terraform.py script
adlersantos May 26, 2021
15ad8f5
refactor create file function to be selective of dir tree
adlersantos May 26, 2021
d62196b
fixed backend.tf template with missing quotes
adlersantos May 26, 2021
6c2b341
revised generate_terraform tests to account for remote state file
adlersantos May 26, 2021
92054af
rerun generate terraform files for ml_datasets
adlersantos May 26, 2021
f7c8af7
updated README for TF remote state usage
adlersantos May 26, 2021
ef81b79
Merge branch 'main' into dar-dataset
adlersantos Jun 3, 2021
0c1f5e3
feat: Onboard Google's Diversity Annual Report dataset
adlersantos Jun 3, 2021
ca79b58
Merge branch 'main' into dar-dataset
adlersantos Jun 18, 2021
b98afaf
feat: column descriptions
adlersantos Jun 21, 2021
7b3ef0d
BQ table descriptions
adlersantos Jun 28, 2021
f9f4c55
regenerate BQ table descriptions on .tf files
adlersantos Jun 28, 2021
09dd3e0
Update datasets/google_dei/diversity_annual_report/pipeline.yaml
adlersantos Jun 28, 2021
f335113
Update datasets/google_dei/diversity_annual_report/pipeline.yaml
adlersantos Jun 28, 2021
f62590a
use self_id instead of selfid
adlersantos Jun 28, 2021
11806e8
Merge branch 'dar-dataset' of github.com:GoogleCloudPlatform/public-d…
adlersantos Jun 28, 2021
91a5886
regenerated the DAG
adlersantos Jun 28, 2021
4ec522f
Update datasets/google_dei/diversity_annual_report/pipeline.yaml
adlersantos Jun 28, 2021
f96be72
Update datasets/google_dei/diversity_annual_report/pipeline.yaml
adlersantos Jun 28, 2021
0ea9f65
regenerated the DAG
adlersantos Jun 28, 2021
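
Several of the commits above add a generated `backend.tf` so that Terraform state can be kept remotely. The template itself is not among the files shown in this view, so the following is only a rough sketch of what such a file might contain, assuming a GCS backend. The bucket name and prefix are hypothetical placeholders, not values taken from this PR.

```hcl
# Hypothetical backend.tf sketch; the real generated file may differ.
terraform {
  backend "gcs" {
    # Assumed: a pre-existing bucket that holds Terraform state.
    bucket = "my-terraform-state-bucket"
    # Assumed: a per-dataset prefix so state files do not collide.
    prefix = "datasets/google_dei"
  }
}
```

With a backend block like this in place, `terraform init` must be run again so that Terraform can configure the remote state location.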
216 changes: 216 additions & 0 deletions datasets/google_dei/_terraform/diversity_annual_report_pipeline.tf
@@ -0,0 +1,216 @@
/**
* Copyright 2021 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/


resource "google_bigquery_table" "dar_intersectional_attrition" {
  project    = var.project_id
  dataset_id = "google_dei"
  table_id   = "dar_intersectional_attrition"

  description = "This table contains the attrition index score of Googlers in the U.S. cut by race and gender combined. Some data may be intentionally redacted due to security and privacy restrictions regarding smaller n-counts. In those cases, the data is displayed as a null value."

  depends_on = [
    google_bigquery_dataset.google_dei
  ]
}

output "bigquery_table-dar_intersectional_attrition-table_id" {
  value = google_bigquery_table.dar_intersectional_attrition.table_id
}

output "bigquery_table-dar_intersectional_attrition-id" {
  value = google_bigquery_table.dar_intersectional_attrition.id
}

resource "google_bigquery_table" "dar_intersectional_hiring" {
  project    = var.project_id
  dataset_id = "google_dei"
  table_id   = "dar_intersectional_hiring"

  description = "This table contains the hiring breakdown of Googlers in the U.S. cut by race and gender combined. Some data may be intentionally redacted due to security and privacy restrictions regarding smaller n-counts. In those cases, the data is displayed as a null value."

  depends_on = [
    google_bigquery_dataset.google_dei
  ]
}

output "bigquery_table-dar_intersectional_hiring-table_id" {
  value = google_bigquery_table.dar_intersectional_hiring.table_id
}

output "bigquery_table-dar_intersectional_hiring-id" {
  value = google_bigquery_table.dar_intersectional_hiring.id
}

resource "google_bigquery_table" "dar_intersectional_representation" {
  project    = var.project_id
  dataset_id = "google_dei"
  table_id   = "dar_intersectional_representation"

  description = "This table contains the representation of Googlers in the U.S. cut by race and gender combined. Some data may be intentionally redacted due to security and privacy restrictions regarding smaller n-counts. In those cases, the data is displayed as a null value."

  depends_on = [
    google_bigquery_dataset.google_dei
  ]
}

output "bigquery_table-dar_intersectional_representation-table_id" {
  value = google_bigquery_table.dar_intersectional_representation.table_id
}

output "bigquery_table-dar_intersectional_representation-id" {
  value = google_bigquery_table.dar_intersectional_representation.id
}

resource "google_bigquery_table" "dar_non_intersectional_representation" {
  project    = var.project_id
  dataset_id = "google_dei"
  table_id   = "dar_non_intersectional_representation"

  description = "This table contains the representation of Googlers in the U.S. cut by race and gender separately and the representation of global Googlers cut by gender. Some data may be intentionally redacted due to security and privacy restrictions regarding smaller n-counts. In those cases, the data is displayed as a null value."

  depends_on = [
    google_bigquery_dataset.google_dei
  ]
}

output "bigquery_table-dar_non_intersectional_representation-table_id" {
  value = google_bigquery_table.dar_non_intersectional_representation.table_id
}

output "bigquery_table-dar_non_intersectional_representation-id" {
  value = google_bigquery_table.dar_non_intersectional_representation.id
}

resource "google_bigquery_table" "dar_non_intersectional_attrition" {
  project    = var.project_id
  dataset_id = "google_dei"
  table_id   = "dar_non_intersectional_attrition"

  description = "This table contains the attrition index score of Googlers in the U.S. cut by race and gender separately and the attrition index score of global Googlers cut by gender. Some data may be intentionally redacted due to security and privacy restrictions regarding smaller n-counts. In those cases, the data is displayed as a null value."

  depends_on = [
    google_bigquery_dataset.google_dei
  ]
}

output "bigquery_table-dar_non_intersectional_attrition-table_id" {
  value = google_bigquery_table.dar_non_intersectional_attrition.table_id
}

output "bigquery_table-dar_non_intersectional_attrition-id" {
  value = google_bigquery_table.dar_non_intersectional_attrition.id
}

resource "google_bigquery_table" "dar_non_intersectional_hiring" {
  project    = var.project_id
  dataset_id = "google_dei"
  table_id   = "dar_non_intersectional_hiring"

  description = "This table contains the hiring breakdown of Googlers in the U.S. cut by race and gender separately and the hiring breakdown of global Googlers cut by gender. Some data may be intentionally redacted due to security and privacy restrictions regarding smaller n-counts. In those cases, the data is displayed as a null value."

  depends_on = [
    google_bigquery_dataset.google_dei
  ]
}

output "bigquery_table-dar_non_intersectional_hiring-table_id" {
  value = google_bigquery_table.dar_non_intersectional_hiring.table_id
}

output "bigquery_table-dar_non_intersectional_hiring-id" {
  value = google_bigquery_table.dar_non_intersectional_hiring.id
}

resource "google_bigquery_table" "dar_region_non_intersectional_attrition" {
  project    = var.project_id
  dataset_id = "google_dei"
  table_id   = "dar_region_non_intersectional_attrition"

  description = "This table contains the attrition index score of Googlers in the regions (EMEA, APAC, and the Americas) cut by gender. “Americas” includes all countries in North and South America in which we operate, excluding the U.S. Some data may be intentionally redacted due to security and privacy restrictions regarding smaller n-counts. In those cases, the data is displayed as a null value."

  depends_on = [
    google_bigquery_dataset.google_dei
  ]
}

output "bigquery_table-dar_region_non_intersectional_attrition-table_id" {
  value = google_bigquery_table.dar_region_non_intersectional_attrition.table_id
}

output "bigquery_table-dar_region_non_intersectional_attrition-id" {
  value = google_bigquery_table.dar_region_non_intersectional_attrition.id
}

resource "google_bigquery_table" "dar_region_non_intersectional_hiring" {
  project    = var.project_id
  dataset_id = "google_dei"
  table_id   = "dar_region_non_intersectional_hiring"

  description = "This table contains the hiring breakdown of Googlers in the regions (EMEA, APAC, and the Americas) cut by gender. “Americas” includes all countries in North and South America in which we operate, excluding the U.S. Some data may be intentionally redacted due to security and privacy restrictions regarding smaller n-counts. In those cases, the data is displayed as a null value."

  depends_on = [
    google_bigquery_dataset.google_dei
  ]
}

output "bigquery_table-dar_region_non_intersectional_hiring-table_id" {
  value = google_bigquery_table.dar_region_non_intersectional_hiring.table_id
}

output "bigquery_table-dar_region_non_intersectional_hiring-id" {
  value = google_bigquery_table.dar_region_non_intersectional_hiring.id
}

resource "google_bigquery_table" "dar_region_non_intersectional_representation" {
  project    = var.project_id
  dataset_id = "google_dei"
  table_id   = "dar_region_non_intersectional_representation"

  description = "This table contains the representation of Googlers in the regions (EMEA, APAC, and the Americas) cut by race and gender. “Americas” includes all countries in North and South America in which we operate, excluding the U.S. Some data may be intentionally redacted due to security and privacy restrictions regarding smaller n-counts. In those cases, the data is displayed as a null value."

  depends_on = [
    google_bigquery_dataset.google_dei
  ]
}

output "bigquery_table-dar_region_non_intersectional_representation-table_id" {
  value = google_bigquery_table.dar_region_non_intersectional_representation.table_id
}

output "bigquery_table-dar_region_non_intersectional_representation-id" {
  value = google_bigquery_table.dar_region_non_intersectional_representation.id
}

resource "google_bigquery_table" "dar_selfid" {
  project    = var.project_id
  dataset_id = "google_dei"
  table_id   = "dar_selfid"

  description = "This table contains the representation of Googlers globally who identify as LGBTQ+, members of the military or veterans, people with disabilities, or non-binary genders. Some data may be intentionally redacted due to security and privacy restrictions regarding smaller n-counts. In those cases, the data is displayed as a null value."

  depends_on = [
    google_bigquery_dataset.google_dei
  ]
}

output "bigquery_table-dar_selfid-table_id" {
  value = google_bigquery_table.dar_selfid.table_id
}

output "bigquery_table-dar_selfid-id" {
  value = google_bigquery_table.dar_selfid.id
}
36 changes: 36 additions & 0 deletions datasets/google_dei/_terraform/google_dei_dataset.tf
@@ -0,0 +1,36 @@
/**
* Copyright 2021 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/


resource "google_bigquery_dataset" "google_dei" {
  dataset_id  = "google_dei"
  project     = var.project_id
  description = "The diversity annual report from Google provides data on the representation, hiring, and retention of employees in our company including race and gender demographics."
}

output "bigquery_dataset-google_dei-dataset_id" {
  value = google_bigquery_dataset.google_dei.dataset_id
}

resource "google_storage_bucket" "ggl-dei" {
  name                        = "${var.bucket_name_prefix}-ggl-dei"
  force_destroy               = true
  uniform_bucket_level_access = true
}

output "storage_bucket-ggl-dei-name" {
  value = google_storage_bucket.ggl-dei.name
}
25 changes: 25 additions & 0 deletions datasets/google_dei/_terraform/google_diversity_dataset.tf
@@ -0,0 +1,25 @@
/**
* Copyright 2021 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/


resource "google_bigquery_dataset" "google_diversity" {
  dataset_id = "google_diversity"
  project    = var.project_id
}

output "bigquery_dataset-google_diversity-dataset_id" {
  value = google_bigquery_dataset.google_diversity.dataset_id
}
28 changes: 28 additions & 0 deletions datasets/google_dei/_terraform/provider.tf
@@ -0,0 +1,28 @@
/**
* Copyright 2021 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/


provider "google" {
  project                     = var.project_id
  impersonate_service_account = var.impersonating_acct
  region                      = var.region
}

data "google_client_openid_userinfo" "me" {}

output "impersonating-account" {
  value = data.google_client_openid_userinfo.me.email
}
23 changes: 23 additions & 0 deletions datasets/google_dei/_terraform/variables.tf
@@ -0,0 +1,23 @@
/**
* Copyright 2021 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/


variable "project_id" {}
variable "bucket_name_prefix" {}
variable "impersonating_acct" {}
variable "region" {}
variable "env" {}
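
None of the five variables above declares a default, so each needs a value at plan or apply time. As a minimal sketch, a `terraform.tfvars` file placed next to these `.tf` files could supply them. Every value below is a hypothetical placeholder and not taken from this PR.

```hcl
# terraform.tfvars (hypothetical values, for illustration only)
project_id         = "my-gcp-project"
bucket_name_prefix = "my-prefix"

# Service account that the google provider impersonates (see provider.tf).
impersonating_acct = "terraform-sa@my-gcp-project.iam.gserviceaccount.com"

region = "us-central1"
env    = "dev"
```

With these placeholder values, the bucket defined in `google_dei_dataset.tf` would be named `my-prefix-ggl-dei`, since its name is built from `var.bucket_name_prefix`.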

30 changes: 30 additions & 0 deletions datasets/google_dei/dataset.yaml
@@ -0,0 +1,30 @@
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

dataset:
  name: google_dei

  friendly_name: ~
  description: "Google Diversity, Equity, and Inclusion (DEI)"
  dataset_sources: ~
  terms_of_use: ~

resources:
  - type: bigquery_dataset
    dataset_id: google_dei
    description: "The diversity annual report from Google provides data on the representation, hiring, and retention of employees in our company including race and gender demographics."

  - type: storage_bucket
    name: "ggl-dei"
    uniform_bucket_level_access: True