blob.download_as_text followed by blob.upload_from_string raises a CRC32C validation error #487

Closed
crwilcox opened this issue Jul 3, 2021 · 2 comments · Fixed by #523
Labels: api: storage, priority: p2, type: bug

Comments

crwilcox commented Jul 3, 2021

from google.cloud import storage

BUCKET_NAME = "crwilcox"
BLOB_NAME = "blob.txt"
storage_client = storage.Client()
bucket = storage_client.bucket(BUCKET_NAME)
blob = bucket.blob(BLOB_NAME)
# Download the current contents of the object.
blob_text = blob.download_as_text()

# Edit the text and upload it through the same Blob object; this upload fails.
blob_text += "edit"
blob.upload_from_string(blob_text)

Traceback (most recent call last):
  File "/home/crwilcox/workspace/python-playground/venv39/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 2488, in upload_from_file
    created_json = self._do_upload(
  File "/home/crwilcox/workspace/python-playground/venv39/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 2290, in _do_upload
    response = self._do_multipart_upload(
  File "/home/crwilcox/workspace/python-playground/venv39/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 1823, in _do_multipart_upload
    response = upload.transmit(
  File "/home/crwilcox/workspace/python-playground/venv39/lib/python3.9/site-packages/google/resumable_media/requests/upload.py", line 149, in transmit
    self._process_response(response)
  File "/home/crwilcox/workspace/python-playground/venv39/lib/python3.9/site-packages/google/resumable_media/_upload.py", line 116, in _process_response
    _helpers.require_status_code(response, (http_client.OK,), self._get_status_code)
  File "/home/crwilcox/workspace/python-playground/venv39/lib/python3.9/site-packages/google/resumable_media/_helpers.py", line 99, in require_status_code
    raise common.InvalidResponse(
google.resumable_media.common.InvalidResponse: ('Request failed with status code', 400, 'Expected one of', <HTTPStatus.OK: 200>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/crwilcox/workspace/python-playground/bucket_blob_test.py", line 11, in <module>
    blob.upload_from_string(blob_text)
  File "/home/crwilcox/workspace/python-playground/venv39/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 2761, in upload_from_string
    self.upload_from_file(
  File "/home/crwilcox/workspace/python-playground/venv39/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 2505, in upload_from_file
    _raise_from_invalid_response(exc)
  File "/home/crwilcox/workspace/python-playground/venv39/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 4271, in _raise_from_invalid_response
    raise exceptions.from_http_status(response.status_code, message, response=response)
google.api_core.exceptions.BadRequest: 400 POST https://storage.googleapis.com/upload/storage/v1/b/crwilcox/o?uploadType=multipart: {
  "error": {
    "code": 400,
    "message": "Provided CRC32C \"9+rL+w==\" doesn't match calculated CRC32C \"ueIORg==\".",
    "errors": [
      {
        "message": "Provided CRC32C \"9+rL+w==\" doesn't match calculated CRC32C \"ueIORg==\".",
        "domain": "global",
        "reason": "invalid"
      }
    ]
  }
}
: ('Request failed with status code', 400, 'Expected one of', <HTTPStatus.OK: 200>)

Possibly helpful: recreating the blob object after the download and before the upload resolves the issue:

blob = bucket.blob(BLOB_NAME)
blob_text = blob.download_as_text()
# Recreate the Blob object before uploading.
blob = bucket.blob(BLOB_NAME)
blob_text += "edit"
blob.upload_from_string(blob_text)

Versions:

google-api-core==1.30.0
google-auth==1.32.1
google-cloud-core==1.7.1
google-cloud-storage==1.40.0
google-crc32c==1.1.2
google-resumable-media==1.3.1
googleapis-common-protos==1.53.0

crwilcox commented Jul 7, 2021

Here is what I suspect is happening:

On download, the crc32c metadata is stored on the blob. When 'upload_from_string' is then called with new content, that metadata has not been updated, so the upload sends the old checksum with the new content and the server reports a content/checksum mismatch.
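
To illustrate the suspected mismatch without touching the network, here is a minimal sketch. It assumes the checksum is CRC32C encoded as base64 over the 4 big-endian bytes, which is the form shown in the error message. The helper names `crc32c` and `gcs_crc32c` and the sample content are made up for this example; the pure-Python CRC is only for illustration and is not what the library uses internally.

```python
import base64
import struct


def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def gcs_crc32c(data: bytes) -> str:
    """Encode the checksum as base64 of the big-endian 32-bit value."""
    return base64.b64encode(struct.pack(">I", crc32c(data))).decode("ascii")


# Hypothetical object content; the real blob.txt content is not known.
original = b"hello"
edited = original + b"edit"

stale = gcs_crc32c(original)  # checksum remembered from the download
fresh = gcs_crc32c(edited)    # checksum the server would calculate on upload

# Sending the stale checksum with the edited content cannot validate.
checksum_matches = stale == fresh
print(stale, fresh, checksum_matches)
```

Any change to the content changes the checksum, so a checksum carried over from the download will be rejected when it accompanies different bytes.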

andrewsg self-assigned this Jul 7, 2021

andrewsg commented

I traced this to a flaw in the update-from-header strategy added in #204: properties are being set through the front door, which updates Blob._changes[], rather than being set directly in Blob._properties[]. I think modifying that part of the code to set the property directly will fix this issue, and it should also resolve many confusing customer complaints we've received over the past year. Thanks so much for your detailed report @crwilcox, couldn't have done it without you. PR coming soon.
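
The difference between the two ways of setting a property can be sketched with a toy class. This is not the real `Blob` implementation: `ToyBlob`, both `update_from_headers_*` methods, and `metadata_for_upload` are hypothetical names, and the only assumption carried over from the diagnosis is that fields recorded in `_changes` are treated as pending user edits and sent with the next upload.

```python
class ToyBlob:
    """Hypothetical stand-in for a blob; not the google-cloud-storage class."""

    def __init__(self):
        self._properties = {}  # last known state of the object
        self._changes = set()  # names of fields modified locally

    @property
    def crc32c(self):
        return self._properties.get("crc32c")

    @crc32c.setter
    def crc32c(self, value):
        # "Front door": the public setter also records a pending change.
        self._properties["crc32c"] = value
        self._changes.add("crc32c")

    def update_from_headers_front_door(self, crc32c):
        # Buggy pattern: a downloaded value looks like a user edit.
        self.crc32c = crc32c

    def update_from_headers_direct(self, crc32c):
        # Proposed pattern: store the value without marking it as changed.
        self._properties["crc32c"] = crc32c

    def metadata_for_upload(self):
        # Assumption: only fields in _changes accompany the next upload.
        return {name: self._properties[name] for name in sorted(self._changes)}


buggy = ToyBlob()
buggy.update_from_headers_front_door("9+rL+w==")

fixed = ToyBlob()
fixed.update_from_headers_direct("9+rL+w==")

buggy_metadata = buggy.metadata_for_upload()
fixed_metadata = fixed.metadata_for_upload()
print(buggy_metadata, fixed_metadata)
```

In the toy model, the front-door path leaves the downloaded checksum queued for the next upload, while the direct path keeps it readable on the object without sending it.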
