adding code for bigquery policy tag extractor #398

Open
wants to merge 6 commits into
base: master

Conversation


@karcot1 karcot1 commented Mar 25, 2024

This PR contains two files:

  1. policy_tag_extractor.sh - a Bash script that takes a BigQuery dataset and outputs a CSV listing every object in it, along with the specific columns that carry policy tags.
  2. README.md - a readme that describes the script and gives instructions for its suggested use.
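
To make the description concrete, here is a minimal sketch of how such an extractor might be structured. It is not the script from this PR: the function name `export_policy_tags`, the CSV column order, and the `--max_results` value are assumptions for illustration, and it relies on the `bq` CLI and `jq` being installed. Like the snippet under review further down, it only looks at top-level columns.

```shell
#!/usr/bin/env bash
# Sketch only: names and CSV layout are illustrative assumptions,
# not the contents of the script submitted in this PR.
export_policy_tags() {
  local dataset="$1" table
  echo "table,column,policy_tag"
  # List the table IDs in the dataset. bq ls limits how many rows it
  # returns by default, so the limit is raised explicitly here.
  bq ls --format=json --max_results=1000 "${dataset}" \
    | jq -r '.[].tableReference.tableId' \
    | while read -r table; do
        # One CSV row per (top-level column, policy tag) pair.
        # stdin is redirected so bq cannot consume the table list.
        bq show --format=json "${dataset}.${table}" </dev/null \
          | jq -r --arg t "${table}" '
              (.schema.fields // [])[]
              | select((.policyTags.names // []) | length > 0)
              | .name as $c
              | .policyTags.names[]
              | [$t, $c, .] | @csv'
      done
}
```

With a hypothetical dataset name, this could be run as `export_policy_tags my_dataset > policy_tags.csv`.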

Collaborator

@danieldeleo danieldeleo left a comment

Thanks for this contribution, @karcot1! Can you please move this into the scripts/ directory?

Author

karcot1 commented Mar 27, 2024

@danieldeleo done! Please review when you get the chance. Thanks!

Co-authored-by: Daniel De Leo <danieldeleo@users.noreply.github.com>
Author

karcot1 commented Mar 29, 2024

@danieldeleo thanks for the suggested changes! Commits are done and ready for review.

scripts/policy_tag_extractor/README.md (outdated, resolved)
scripts/policy_tag_extractor/README.md (outdated, resolved)
scripts/policy_tag_extractor/policy_tag_export.sh (outdated, resolved)

if [ "${TAG_COUNT}" -ge 1 ]
then
    COLUMN_AND_TAG=`bq show --format=prettyjson ${DATASET}.${TABLE} | jq '.schema.fields[] | select(.policyTags | length>=1)'`
Collaborator

This doesn't handle RECORD type columns with nested policy tags. Can you either handle it in code or make an explicit callout in the README that this script only handles simple column types?

Author

@danieldeleo I added a line to the Considerations section of the README calling this out. I will work on updating the code to handle nested tags in the future.
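
One way the nested case could be handled is a recursive jq function that descends into each RECORD column's `fields` and builds a dotted column path. This is a sketch under assumptions, not code from the PR: the names `tagged_columns` and `tagged` and the dotted-path output format are made up for illustration, and it expects the JSON printed by `bq show` on stdin.

```shell
#!/usr/bin/env bash
# Sketch only: prints one CSV row per (column path, policy tag) pair,
# including columns nested inside RECORD columns.
# Possible use: bq show --format=prettyjson "${DATASET}.${TABLE}" | tagged_columns
tagged_columns() {
  jq -r '
    # Walk an array of schema fields, prefixing nested names with the parent path.
    def tagged($prefix):
      .[]
      | ($prefix + .name) as $path
      | (if (.policyTags.names // []) | length > 0
         then {column: $path, tags: .policyTags.names}
         else empty end),
        # Recurse into child fields; non-RECORD columns have none.
        ((.fields // []) | tagged($path + "."));
    (.schema.fields // []) | tagged("") | .tags[] as $t | [.column, $t] | @csv
  '
}
```

A tagged field `ssn` inside a RECORD column `customer` would be reported as `customer.ssn`, while top-level columns keep their plain names.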

karcot1 and others added 3 commits April 18, 2024 18:44
Co-authored-by: Daniel De Leo <danieldeleo@users.noreply.github.com>
…oogleCloudPlatform#399)

* adding anti pattern recognition step to optimization scripts

* using viewable_queries_grouped_by_hash for anti pattern processing

* moving anti pattern recognition tool steps to separate script

* fixing bug in column names

* fixing bug in column names

* adding anti pattern script, accounting for null hash

* adding anti pattern script, supporting multiple executions

* adding anti pattern script, addressing duplicate hashes

* adding anti pattern script, addressing duplicate hashes

* making anti pattern optimization script generic for any input table, removing query column from queries_grouped_by_hash at org level, adding a project level version of queries_grouped_by_hash

* updating readme with examples on how to execute the anti pattern optimization script

* updating readme with examples on how to execute the anti pattern optimization script and removing test file

* changing location of the anti pattern optimization script for clarity

* enhancing readme with instructions to run the anti pattern optimization scripts
3 participants