Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

disambiguate missing policy tags from explicitly unset policy tags #981

Closed
3 tasks done
tswast opened this issue Sep 22, 2021 · 5 comments · Fixed by #983
Closed
3 tasks done

disambiguate missing policy tags from explicitly unset policy tags #981

tswast opened this issue Sep 22, 2021 · 5 comments · Fixed by #983
Assignees
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@tswast
Copy link
Contributor

tswast commented Sep 22, 2021

As seen in internal bug 182204971, #557, if the policyTags key is included in a load job, the user can get 403 POST ... User does not have permission to get taxonomy ... errors when uploading data, even if they have write access to the table.

Based on the behavior seen in googleapis/python-bigquery-pandas#387, I believe this fix has accidentally been reverted in #703 (comment).

TODO:

  • Manually test to see if the error from 182204971 is reproduced
  • Add a system test to avoid future regressions
  • Update SchemaField to allow disambiguating unset vs explicitly set to empty list / none.
@tswast tswast self-assigned this Sep 22, 2021
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the googleapis/python-bigquery API. label Sep 22, 2021
@tswast tswast added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Sep 22, 2021
@tswast
Copy link
Contributor Author

tswast commented Sep 22, 2021

I manually tested, and it appears that empty policy tags are OK.

from google.cloud import bigquery
client = bigquery.Client()
load_job_config = bigquery.LoadJobConfig()
load_job_config.schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("age", "INTEGER"),
]
load_job = client.load_table_from_json([
    {"name": "Adam", "age": 46}
], "swast-scratch.my_dataset.names_with_policy_tags",
job_config=load_job_config)
load_job.result()

Trying to set the policy tags to exactly the value they already are still fails, though:

from google.cloud import bigquery
client = bigquery.Client()
load_job_config = bigquery.LoadJobConfig()
load_job_config.schema = [
    bigquery.SchemaField.from_api_repr({
            "name": "name",
            "type": "STRING",
            "mode": "NULLABLE",
            "policyTags": {
              "names": [
                "projects/swast-scratch/locations/us/taxonomies/5039374947791552873/policyTags/5641276241825860924"
              ]
            }
          }),
    bigquery.SchemaField("age", "INTEGER"),
]
load_job = client.load_table_from_json([
    {"name": "Adam", "age": 46}
], "swast-scratch.my_dataset.names_with_policy_tags",
job_config=load_job_config)
load_job.result()

Error:

$ python load.py 
Traceback (most recent call last):
  File "/home/swast/venv/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 2440, in load_table_from_file
    response = self._do_multipart_upload(
  File "/home/swast/venv/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 2959, in _do_multipart_upload
    response = upload.transmit(
  File "/home/swast/venv/lib/python3.8/site-packages/google/resumable_media/requests/upload.py", line 154, in transmit
    return _helpers.wait_and_retry(
  File "/home/swast/venv/lib/python3.8/site-packages/google/resumable_media/_helpers.py", line 182, in wait_and_retry
    response = func()
  File "/home/swast/venv/lib/python3.8/site-packages/google/resumable_media/requests/upload.py", line 150, in retriable_request
    self._process_response(result)
  File "/home/swast/venv/lib/python3.8/site-packages/google/resumable_media/_upload.py", line 113, in _process_response
    _helpers.require_status_code(response, (http.client.OK,), self._get_status_code)
  File "/home/swast/venv/lib/python3.8/site-packages/google/resumable_media/_helpers.py", line 100, in require_status_code
    raise common.InvalidResponse(
google.resumable_media.common.InvalidResponse: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "load.py", line 19, in <module>
    load_job = client.load_table_from_json([
  File "/home/swast/venv/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 2781, in load_table_from_json
    return self.load_table_from_file(
  File "/home/swast/venv/lib/python3.8/site-packages/google/cloud/bigquery/client.py", line 2444, in load_table_from_file
    raise exceptions.from_http_response(exc.response)
google.api_core.exceptions.Forbidden: 403 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/swast-scratch/jobs?uploadType=multipart: Acce
ss Denied: Table swast-scratch:my_dataset.names_with_policy_tags: Permission bigquery.tables.setCategory denied on table swast-scratch:my_dataset.names_
with_policy_tags (or it may not exist).

I'd still be more comfortable if we omitted the policyTags key altogether if it wasn't explicitly set, though.

@tswast
Copy link
Contributor Author

tswast commented Sep 22, 2021

Uh oh, I can confirm that this code sample accidentally unsets the policy tags, which is very bad.

from google.cloud import bigquery
client = bigquery.Client()
table = bigquery.Table("swast-scratch.my_dataset.names_with_policy_tags")
table.schema = [
    bigquery.SchemaField("name", "STRING"),
    bigquery.SchemaField("age", "INTEGER"),
]
client.update_table(table, ["schema"])

@tswast
Copy link
Contributor Author

tswast commented Sep 22, 2021

Compare with

from google.cloud import bigquery
client = bigquery.Client()
table = bigquery.Table("swast-scratch.my_dataset.names_with_policy_tags")
table._properties["schema"] = {
    "fields": [
        {"name": "name", "type": "STRING"},
        {"name": "age", "type": "INTEGER"},
    ]
}
client.update_table(table, ["schema"])

which does not reset policyTags, as expected. (though it does still reset description, confusingly).

@tswast
Copy link
Contributor Author

tswast commented Sep 22, 2021

From some more manual testing with #983, for table update there isn't a difference between explicit None and omitting the key from the resource. To explicitly unset policy tags, the user will have to set the policy tags to an empty PolicyTagList.

That said, I think it still makes sense to disambiguate explicit None from missing in the client side to allow the developer more control over what is sent over the wire.

@tswast
Copy link
Contributor Author

tswast commented Sep 22, 2021

Re: "Add a system test to avoid future regressions" this is a bit difficult, as it requires some external "taxonomy" resources in Data Catalog. Instead, I've added comments to the relevant unit tests to describe the expected behavior and hopefully avoid regressions.

gcf-merge-on-green bot pushed a commit that referenced this issue Sep 24, 2021
…gs (#983)

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- [ ] Make sure to open an issue as a [bug/issue](https://github.com/googleapis/python-bigquery/issues/new/choose) before writing your code!  That way we can discuss the change, evaluate designs, and agree on the general idea
- [ ] Ensure the tests and linter pass
- [ ] Code coverage does not decrease (if any source code was changed)
- [ ] Appropriate docs were updated (if necessary)

Fixes #981 
Fixes #982
Towards googleapis/python-bigquery-pandas#387
🦕
judahrand pushed a commit to judahrand/python-bigquery that referenced this issue Sep 27, 2021
…gs (googleapis#983)

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- [ ] Make sure to open an issue as a [bug/issue](https://github.com/googleapis/python-bigquery/issues/new/choose) before writing your code!  That way we can discuss the change, evaluate designs, and agree on the general idea
- [ ] Ensure the tests and linter pass
- [ ] Code coverage does not decrease (if any source code was changed)
- [ ] Appropriate docs were updated (if necessary)

Fixes googleapis#981 
Fixes googleapis#982
Towards googleapis/python-bigquery-pandas#387
🦕
abdelmegahedgoogle pushed a commit to abdelmegahedgoogle/python-bigquery that referenced this issue Apr 17, 2023
…gs (googleapis#983)

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- [ ] Make sure to open an issue as a [bug/issue](https://github.com/googleapis/python-bigquery/issues/new/choose) before writing your code!  That way we can discuss the change, evaluate designs, and agree on the general idea
- [ ] Ensure the tests and linter pass
- [ ] Code coverage does not decrease (if any source code was changed)
- [ ] Appropriate docs were updated (if necessary)

Fixes googleapis#981 
Fixes googleapis#982
Towards googleapis/python-bigquery-pandas#387
🦕
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/python-bigquery API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant