
max_results argument of Client.list_blobs does not behave correctly #425

Closed
olfek opened this issue Apr 29, 2021 · 3 comments · Fixed by #520
Labels: api: storage, priority: p2, type: bug

Comments


olfek commented Apr 29, 2021

According to this, maxResults should be:

Maximum combined number of entries in items[] and prefixes[] to return in a single page of responses. Because duplicate entries in prefixes[] are omitted, fewer total results may be returned than requested. The service uses this parameter or 1,000 items, whichever is smaller.

When I do this:

from google.cloud import storage

client = storage.Client()
bucket = storage.Bucket(client, "<bucket_name>")
blobs_iterator = client.list_blobs(bucket, max_results=10)
for page in blobs_iterator.pages:
    print(page.num_items)

Output:

10

Only one iteration of the loop completes even though there are 100 blobs in the bucket. It seems the max_results argument limits the total results rather than the per-page results.

Related to #19
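
For contrast, here is a minimal sketch of the behaviour the documented semantics imply, assuming a bucket holding 100 blobs (the bucket name is a placeholder): every page should carry up to 10 entries, and iteration should continue until all 100 blobs have been listed.

from google.cloud import storage

client = storage.Client()
# Hypothetical bucket with 100 blobs; "<bucket_name>" is a placeholder.
blobs_iterator = client.list_blobs("<bucket_name>", max_results=10)

total = 0
for page in blobs_iterator.pages:
    print(page.num_items)  # documented semantics suggest 10, ten times over
    total += page.num_items
print(total)  # expected 100 under the documented semantics; the library stops at 10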


frankyn commented Apr 29, 2021

Thanks for filing this issue @olfek.

I'll leave this open to address the usability issue.

In the meantime, the existing workaround is:

from google.cloud import storage

storage_client = storage.Client()
bucket_name = "bucket-name"
max_results_per_page = 3

# Keep re-listing with the returned page token until the service stops
# handing one back. next_page_token is populated once the current
# iterator has been consumed.
blobs = storage_client.list_blobs(bucket_name, max_results=max_results_per_page)
while True:
    for blob in blobs:
        print(blob.name)
    if not blobs.next_page_token:
        break
    blobs = storage_client.list_blobs(
        bucket_name,
        max_results=max_results_per_page,
        page_token=blobs.next_page_token,
    )
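
Each pass of the loop issues one objects.list request. Because next_page_token is only populated after the current iterator has been consumed, it is checked after the inner loop; the service stops returning a token once the final page has been served, which is what terminates the loop.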


olfek commented Apr 30, 2021

Note: The original documentation, changed in #43, was actually correct; it is the code that is incorrect.

ryanyuan (Contributor) commented

Hi, I reckon we could introduce page_size as one of the parameters here, so that users can combine max_results (the total number of returned elements) with page_size (the number of elements in each page). It would be much the same as https://github.com/googleapis/python-bigquery/blob/master/google/cloud/bigquery/client.py#L362.
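
For illustration, here is a rough sketch of how the combination could look, assuming the proposal lands with these semantics (page_size is the proposed parameter, not yet in the released API):

from google.cloud import storage

client = storage.Client()
# Proposed semantics (assumption, pending the PR):
#   max_results - cap on the total number of blobs returned
#   page_size   - cap on the number of blobs in each page
blobs_iterator = client.list_blobs("<bucket_name>", max_results=10, page_size=3)
for page in blobs_iterator.pages:
    print(page.num_items)  # would print: 3, 3, 3, 1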

I've tested the function locally and it worked. Can I raise a PR for this? I still need to finish all the tests and the docs.
