Blob.download_as_text does not decode properly #319

Closed
kornholi opened this issue Nov 18, 2020 · 1 comment · Fixed by #326
Labels
api: storage Issues related to the googleapis/python-storage API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. 🚨 This issue needs some love. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@kornholi

Blob.download_as_text tries to use the content-encoding header to decode the bytes. In most cases that value is gzip, even though the bytes have already been decompressed at that point. In other cases, e.g. text/plain; charset=utf-8, the value is not an encoding name that Python's bytes.decode understands.

  File "/storage/bazel-cache/_bazel_kornholi/9f066b43468ef9bfd3c6a621a4515622/execroot/__main__/bazel-out/k8-opt/bin/foo.runfiles/pypi__google_cloud_storage_1_33_0/google/cloud/storage/blob.py", line 1424, in download_as_text
    return data.decode(self.content_encoding)
LookupError: unknown encoding: gzip

I don't think we can be smarter here than passing through the encoding kwarg which defaults to utf-8.
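
For illustration, the failure can be reproduced without the storage client at all. This is a minimal sketch, assuming the downloaded bytes are already-decompressed UTF-8 text; `try_decode` is a hypothetical helper and not part of google-cloud-storage:

```python
def try_decode(data, encoding):
    """Decode `data` with `encoding`; return (text, None) or (None, error message)."""
    try:
        return data.decode(encoding), None
    except LookupError as exc:
        # Raised when `encoding` is not the name of a Python text codec.
        return None, str(exc)


# Stand-in for the bytes returned by a download (assumed already decompressed).
payload = "héllo".encode("utf-8")

# Header-style values such as these are not codec names, so decoding fails.
print(try_decode(payload, "gzip"))
print(try_decode(payload, "text/plain; charset=utf-8"))

# Passing the encoding kwarg straight through works.
print(try_decode(payload, "utf-8"))
```

Only the last call succeeds. The first two values are not names of Python text codecs, so `bytes.decode` rejects them with `LookupError`.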

@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/python-storage API. label Nov 18, 2020
@yoshi-automation yoshi-automation added triage me I really want to be triaged. 🚨 This issue needs some love. labels Nov 19, 2020
@tseaver tseaver added priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Nov 24, 2020
@tseaver tseaver self-assigned this Nov 24, 2020
@tseaver
Contributor

tseaver commented Nov 24, 2020

@kornholi Thanks for the report! I agree with your assessment that the content_encoding value is not appropriate. It might be possible to use the charset portion of the content_type value as a default, if no explicit encoding argument is passed.
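
One way to read this suggestion, sketched with the standard library only. The helper names `charset_from_content_type` and `decode_text` are hypothetical, and this is not necessarily how the library implements it. It only shows the precedence: an explicit `encoding` wins, then the `charset` parameter of the content type, then UTF-8.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset, or None if absent.
    return msg.get_content_charset()


def decode_text(data, content_type=None, encoding=None):
    """Decode bytes: explicit encoding first, then charset, then UTF-8."""
    if encoding is None:
        encoding = charset_from_content_type(content_type) or "utf-8"
    return data.decode(encoding)


# The charset from the content type is used when no encoding is passed.
latin = "café".encode("latin-1")
print(decode_text(latin, content_type="text/plain; charset=ISO-8859-1"))

# No charset parameter: fall back to UTF-8.
print(decode_text(b"plain ascii", content_type="text/plain"))

# An explicit encoding overrides whatever the content type says.
utf16 = "café".encode("utf-16")
print(decode_text(utf16, content_type="text/plain; charset=latin-1", encoding="utf-16"))
```

Note that a compression value such as gzip never enters this decision; it describes the transfer, not the character set.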

tseaver added a commit that referenced this issue Nov 24, 2020
Explicit 'encoding' overrides the fallback.

Use the 'charset' param of 'content_type', rather than 'content_encoding',
which isn't going to be a Unicode -> bytes encoding.

Closes #319.
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Nov 25, 2020
tseaver added a commit that referenced this issue Nov 30, 2020
…326)

Explicit 'encoding' overrides the fallback.

Use the 'charset' param of 'content_type', rather than 'content_encoding',
which isn't going to be a Unicode -> bytes encoding.

Closes #319.

Also, rewrap long param descriptions for in-source readability.
cojenco pushed a commit to cojenco/python-storage that referenced this issue Oct 13, 2021
…oogleapis#326)

Explicit 'encoding' overrides the fallback.

Use the 'charset' param of 'content_type', rather than 'content_encoding',
which isn't going to be a Unicode -> bytes encoding.

Closes googleapis#319.

Also, rewrap long param descriptions for in-source readability.

3 participants