Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storage: Capture relevant headers to blob properties during download #24

Closed
tseaver opened this issue Nov 13, 2019 · 8 comments · Fixed by #204
Closed

Storage: Capture relevant headers to blob properties during download #24

tseaver opened this issue Nov 13, 2019 · 8 comments · Fixed by #204
Assignees
Labels
api: storage Issues related to the googleapis/python-storage API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.

Comments

@tseaver
Copy link
Contributor

tseaver commented Nov 13, 2019

Residual from googleapis/google-cloud-python#9003.

@william-silversmith notes that even with raw_download enabled, he is unable to detect the content_type of a downloaded blob without performing an additional reload request, which is prohibitive for his usecase at scale. E.g.:

blob = bucket.blob( key )
binary = blob.download_as_string(raw_download=True)
if blob.content_encoding == 'gzip':
    return gunzip(binary)
elif blob.content_encoding == 'br':
    return brotli.decompress(binary)
else:
    return binary

Potentially even...

if blob.content_type == 'application/json':
    return json.loads(binary.decode('utf8'))
@william-silversmith
Copy link
Contributor

Thank you Tres!

@tseaver
Copy link
Contributor Author

tseaver commented Nov 13, 2019

@william-silversmith The headers needed to do this task are buried several layers deep. I'm wondering if a better solution than trying to guess which ones to map onto the blob's properties, we might be better off designing for your usecase by supplying a response_callback argument to the Blob.download_* methods:

  • The callback would take the blob instance and a requests.Response object as its arguments.
  • For non-chunked downloads, the callback would be called once; chunked downloads would call it once per chunk.

An example callback implementation for your usecase might be something like:

def _update_blob_content_encoding(blob, response):
    if blob.content_encoding is None:  # latch
        blob.content_encoding = response.headers.get('Content-Encoding')

@tseaver
Copy link
Contributor Author

tseaver commented Nov 13, 2019

If we don't use a callback, then the relevant headers to start with would seem to be:

  • Cache-Control -> Blob.cache_control
  • Content-Disposition -> Blob.content_disposition
  • Content-Encoding -> Blob.content_encoding
  • Content-Length -> Blob.size
  • Content-Type -> Blob.content_type
  • Etag -> Blob.etag
  • X-Goog-Generation: Blob.generation
  • X-Goog-Hash: Blob.crc32c
  • X-Goog-Metageneration: Blob.metageneration
  • X-Goog-Storage-Class: Blob.storage_class

@william-silversmith
Copy link
Contributor

william-silversmith commented Nov 14, 2019

I almost never need more exotic headers than the ones you listed. If it's easier to only expose those headers, that's fine with me. If you want to go for a more general solution, I am fine with a (synchronous) callback. Thanks again!

@william-silversmith
Copy link
Contributor

Any word on when this might drop? Thanks for your help!

@crwilcox crwilcox transferred this issue from googleapis/google-cloud-python Jan 31, 2020
@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/python-storage API. label Jan 31, 2020
@yoshi-automation yoshi-automation added 🚨 This issue needs some love. triage me I really want to be triaged. labels Feb 3, 2020
@frankyn frankyn added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. and removed 🚨 This issue needs some love. triage me I really want to be triaged. labels Feb 4, 2020
@william-silversmith
Copy link
Contributor

Hi Tres, I've made some changes to my own branch of this repo and they seem to work. I tried making a PR but the button was greyed out. Here's a link to the changes in case y'all are interested. In the mean time, I'll direct my repos to use the forked version.

master...seung-lab:master

@william-silversmith
Copy link
Contributor

Unfortunately, it seems PyPI won't let me use a direct reference to github. It would be extremely convenient if some version of this could get merged.

@william-silversmith
Copy link
Contributor

Figured out the PR submission issue.

gcf-merge-on-green bot pushed a commit that referenced this issue Jul 23, 2020
…ds (#204)

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- [x] Make sure to open an issue as a [bug/issue](https://github.com/googleapis/python-storage/issues/new/choose) before writing your code!  That way we can discuss the change, evaluate designs, and agree on the general idea
- [x] Ensure the tests and linter pass
- [x] Code coverage does not decrease (if any source code was changed)
- [x] Appropriate docs were updated (if necessary)

Fixes #24  🦕

This PR autopopulates the following fields for non-chunked downloads based on the server header response:
```
blob.content_encoding
blob.content_type
blob.cache_control
blob.storage_class
blob.content_language
blob.md5_hash
blob.crc32c
```
cojenco pushed a commit to cojenco/python-storage that referenced this issue Oct 13, 2021
…ds (googleapis#204)

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- [x] Make sure to open an issue as a [bug/issue](https://github.com/googleapis/python-storage/issues/new/choose) before writing your code!  That way we can discuss the change, evaluate designs, and agree on the general idea
- [x] Ensure the tests and linter pass
- [x] Code coverage does not decrease (if any source code was changed)
- [x] Appropriate docs were updated (if necessary)

Fixes googleapis#24  🦕

This PR autopopulates the following fields for non-chunked downloads based on the server header response:
```
blob.content_encoding
blob.content_type
blob.cache_control
blob.storage_class
blob.content_language
blob.md5_hash
blob.crc32c
```
cojenco pushed a commit to cojenco/python-storage that referenced this issue Oct 13, 2021
…ds (googleapis#204)

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
- [x] Make sure to open an issue as a [bug/issue](https://github.com/googleapis/python-storage/issues/new/choose) before writing your code!  That way we can discuss the change, evaluate designs, and agree on the general idea
- [x] Ensure the tests and linter pass
- [x] Code coverage does not decrease (if any source code was changed)
- [x] Appropriate docs were updated (if necessary)

Fixes googleapis#24  🦕

This PR autopopulates the following fields for non-chunked downloads based on the server header response:
```
blob.content_encoding
blob.content_type
blob.cache_control
blob.storage_class
blob.content_language
blob.md5_hash
blob.crc32c
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the googleapis/python-storage API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants