fix: avoid policy tags 403 error in `load_table_from_dataframe` #557

tswast · 2021-03-17T21:19:21Z

In internal issue 182204971, a customer is encountering a 403 error for missing permissions to set policy tags on a table when trying to append a DataFrame to a table with `load_table_from_dataframe`. This happens because we get the schema from the table and then pass it back to the API, policy tags included. In this case we only need to set the field names (and maybe type + mode).

Fixes internal bug 182204971 🦕

tswast · 2021-03-18T14:57:09Z

google/cloud/bigquery/schema.py

+        }
+        if mode is not None:
+            self._properties["mode"] = mode.upper()
+        if description is not _DEFAULT_VALUE:


@shollyman This is one of the key changes: we no longer set the resource value for "description" if it's not explicitly set.

We already omit policy_tags from the resource if it is None (though arguably it should get the same treatment, so that someone can unset policy tags from Python).
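The sentinel approach described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: `FieldSketch` is a hypothetical stand-in for `SchemaField`, and only the `_DEFAULT_VALUE` check and the `mode` handling mirror the diff excerpt.

```python
# Sentinel meaning "the caller did not pass this argument at all".
# A unique object is used so that an explicit None stays distinguishable.
_DEFAULT_VALUE = object()


class FieldSketch:
    """Hypothetical stand-in for SchemaField, for illustration only."""

    def __init__(self, name, field_type, mode="NULLABLE", description=_DEFAULT_VALUE):
        self._properties = {"name": name, "type": field_type}
        if mode is not None:
            self._properties["mode"] = mode.upper()
        # Only write "description" into the resource when it was explicitly set.
        if description is not _DEFAULT_VALUE:
            self._properties["description"] = description

    def to_api_repr(self):
        # Return a copy so callers cannot mutate internal state.
        return dict(self._properties)


unset = FieldSketch("id", "INTEGER").to_api_repr()
explicit_none = FieldSketch("id", "INTEGER", description=None).to_api_repr()
```

Here `unset` carries no `"description"` key, while `explicit_none` carries `"description": None`, which is the distinction needed to tell "leave it alone" apart from "clear it".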

shollyman replied: My default inclination would be for special handling for None values to happen at the places where it's significant, like when calling tables.update. It's also the case that schema fields can't be manipulated individually, so perhaps I'm simply not thinking this through properly.

I called that out as a possibility in #558, but that'd require updating our field mask logic to support sub-fields, which gets into some hairy string parsing (perhaps not all that hairy, as it could be as simple as split on '.', but is definitely a departure from what we've been doing).

Also, it might mean that we'd have to introduce a field mask to our load job methods. Based on the error message we're seeing, it sounds like it's possible to make updates to fields like policy tags from a load job.
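To make the "split on '.'" idea concrete, here is a rough sketch of how a field mask with sub-fields could select part of a nested resource. `build_partial_resource` is a hypothetical helper, not an existing client method, and it assumes every path segment refers to a nested dict. Repeated fields such as schema fields are a list, which is where the parsing would get hairier.

```python
def build_partial_resource(resource, field_paths):
    """Copy only the values named by dotted paths such as "a.b" from resource."""
    partial = {}
    for path in field_paths:
        keys = path.split(".")
        source = resource
        target = partial
        # Walk down to the parent of the final key, creating dicts as needed.
        for key in keys[:-1]:
            source = source[key]
            target = target.setdefault(key, {})
        target[keys[-1]] = source[keys[-1]]
    return partial


# Illustrative resource; the keys are placeholders chosen for the example.
resource = {
    "description": "sales data",
    "friendlyName": "Sales",
    "outer": {"inner": "keep me", "other": "drop me"},
}
partial = build_partial_resource(resource, ["description", "outer.inner"])
```

With top-level paths only, this reduces to the existing behaviour of copying whole properties; the dotted form is the departure the comment refers to.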

tswast · 2021-03-18T15:02:06Z

google/cloud/bigquery/client.py

-                    field for field in table.schema if field.name in columns_and_indexes
+                    # Field description and policy tags are not needed to
+                    # serialize a data frame.
+                    SchemaField(


This is the actual bug fix. Rather than populate all properties of each schema field from the table schema, populate just the minimum we need to convert to Parquet/CSV and then upload.

shollyman replied: We'll need to revisit this for parameterization constraints, but that's a problem for future Tim.

Also, check that sent schema matches DataFrame order, not table order
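A simplified sketch of the idea: keep only the properties needed for serialization and drop the description and policy tags. `TableField` and `minimal_load_schema` are hypothetical names used for illustration, and emitting the fields in DataFrame column order is an assumption based on the commit note above, not a statement of what the library does internally.

```python
from collections import namedtuple

# Hypothetical stand-in for a field read back from the destination table.
TableField = namedtuple(
    "TableField", ["name", "field_type", "mode", "description", "policy_tags"]
)


def minimal_load_schema(table_schema, dataframe_columns):
    """Return name/type/mode for each DataFrame column found in the table."""
    by_name = {field.name: field for field in table_schema}
    return [
        {
            "name": by_name[column].name,
            "type": by_name[column].field_type,
            "mode": by_name[column].mode,
        }
        for column in dataframe_columns  # DataFrame order, not table order
        if column in by_name
    ]


table_schema = [
    TableField("id", "INTEGER", "REQUIRED", "Primary key", ["example-policy-tag"]),
    TableField("name", "STRING", "NULLABLE", None, None),
    TableField("unused", "STRING", "NULLABLE", None, None),
]
load_schema = minimal_load_schema(table_schema, ["name", "id"])
```

Because the resulting dicts never contain policy tags, a load job built from them has nothing that would require taxonomy permissions.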

tswast · 2021-04-26T14:46:12Z

For posterity, here is the error you get if you include policyTags in the schema, even if they aren't actually changed:

[1] Reason: 403 POST
https://bigquery.googleapis.com/upload/bigquery/v2/projects/YOUR-PROJECT-ID/jobs?uploadType=resumable:
Access Denied: Taxonomy projects/REDACTED/locations/eu/taxonomies/REDACTED: 
User does not have permission to get taxonomy projects/REDACTED/locations/eu/taxonomies/REDACTED\n'

WIP: fix: don't set policy tags in load job from dataframe

1f6e6d8

product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Mar 17, 2021

google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Mar 17, 2021

tswast changed the title ~~WIP: fix: don't set policy tags in load job from dataframe~~ fix: avoid policy tags 403 error in load_table_from_dataframe Mar 17, 2021

tswast added 2 commits March 17, 2021 17:13

copy fields parameter for struct support

d4b6d32

update tests to allow missing description property

3cdbcc7

tswast marked this pull request as ready for review March 18, 2021 14:34

tswast requested review from a team, stephaniewang526, shollyman and plamut and removed request for a team March 18, 2021 14:34

tswast commented Mar 18, 2021

fix load from dataframe test on python 3.6

fe1029d

Also, check that sent schema matches DataFrame order, not table order

tswast mentioned this pull request Mar 18, 2021

add ability to unset policy tags (and description?) in schema fields #558

Closed

shollyman approved these changes Mar 18, 2021

tswast merged commit 84e646e into googleapis:master Mar 19, 2021

tswast deleted the b182204971-dataframe-policy-tags branch March 19, 2021 17:54

tswast mentioned this pull request Mar 29, 2021

fix: avoid 403 from to_gbq when table has policyTags googleapis/python-bigquery-pandas#356

Merged

tswast mentioned this pull request Apr 23, 2021

Implement BigQuery Table Schema Update Operator apache/airflow#15367

Merged

This was referenced Sep 22, 2021

feat: enable unsetting policy tags on schema fields #703

Merged

disambiguate missing policy tags from explicitly unset policy tags #981

Closed

release-please bot mentioned this pull request Jan 4, 2022

chore(main): release python-bigquery 1.27.1 #1097

Closed
