'offset' (for queries) not implemented correctly #20

Closed
crwilcox opened this issue Apr 6, 2020 · 2 comments
Labels
api: datastore Issues related to the googleapis/python-datastore API. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release.

Comments

crwilcox commented Apr 6, 2020

When a query runs against a large collection of large entities and uses a large offset, the requested limit and offset aren't respected: the query returns the wrong number of entities.

The below sample shows how this can happen.

b/146579743


# Imports the Google Cloud client library
from google.cloud import datastore
import string


MAX_STRING = (string.ascii_lowercase * 58)[:1500]
TOTAL_OBJECTS = 1500

# Instantiates a client
datastore_client = datastore.Client()
KIND = "LargeKind"
def put_objects(count):
    # The name/ID for the new entity
    for i in range(count):
        name = f'sampletask{i:05d}'
        # The Cloud Datastore key for the new entity
        task_key = datastore_client.key(KIND, name)

        # Prepares the new entity
        task = datastore.Entity(key=task_key)
        task['name'] = f"{i:05d}"
        task['family'] = 'Stark'
        task['alive'] = False

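        # Pad each entity with 26 large string properties (1500 chars each).
        # A separate loop variable avoids shadowing the outer index `i`.
        for letter in string.ascii_lowercase:
            task[f'space-{letter}'] = MAX_STRING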
            
        # Saves the entity
        datastore_client.put(task)


def query_objects(total_entities):
    page_query = datastore_client.query(kind=KIND)
    page_query.add_filter("family", "=", "Stark")
    page_query.add_filter("alive",  "=", False)
    # page_query.order = "name"
    
    def verify(limit, offset, expected):
        iterator = page_query.fetch(limit=limit, offset=offset)
        entities = [e for e in iterator]
        if len(entities) != expected:
            print(f"{limit}, {offset}, {expected}. Returned: {len(entities)}")
            #breakpoint()

    print("Verify that with no offset there are the correct # of results")
    verify(limit=None, offset=None, expected=TOTAL_OBJECTS)
    
    print("Verify that with no limit there are results (offset provided)")
    verify(limit=None, offset=900, expected=TOTAL_OBJECTS-900)    

    print("Offset within range: verify that 200 items are found")
    verify(limit=200, offset=1100, expected=200)

    print("Offset within range, expect 50 despite larger limit")
    verify(limit=100, offset=TOTAL_OBJECTS-50, expected=50)

    print("Offset beyond the number of items: verify that no items are found")
    verify(limit=200, offset=TOTAL_OBJECTS+1000, expected=0)

# put_objects(TOTAL_OBJECTS)
query_objects(TOTAL_OBJECTS)
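
The `expected` values passed to `verify` in the sample follow one rule, which can be written down as a small helper. This is only a sketch for reasoning about the test cases; `expected_count` is a hypothetical name and is not part of the Datastore client.

```python
def expected_count(total, limit=None, offset=None):
    """Number of entities a correct fetch(limit=..., offset=...) should return.

    Hypothetical helper for illustration; not part of google-cloud-datastore.
    """
    # Entities left after skipping `offset` of them (never negative).
    remaining = max(0, total - (offset or 0))
    # No limit means everything that remains; otherwise cap at the limit.
    return remaining if limit is None else min(limit, remaining)
```

For example, with 1500 entities, `limit=100` and `offset=1450` leave only 50 entities to return, which is the case that exposes the bug.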
@product-auto-label product-auto-label bot added the api: datastore Issues related to the googleapis/python-datastore API. label Apr 6, 2020
@crwilcox crwilcox self-assigned this Apr 6, 2020
crwilcox commented Apr 6, 2020

Investigation shows that, despite a requested offset of 1450, the query only skipped 1000 results.

skipped_results: 1000 more_results: 1 end_cursor: None
skipped_results: 0 more_results: 1 end_cursor: None
skipped_results: 0 more_results: 2 end_cursor: None

At this point we immediately start reading results. Before fetching entities, we need to ensure the offset is fully satisfied.
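
The idea of satisfying the offset before reading results can be sketched with a toy model. Everything here is an assumption made for illustration: `run_query` is a hypothetical stand-in for one backend request, the 1000-skip cap is inferred from the log above rather than taken from documentation, and this is not the code from the actual fix.

```python
from dataclasses import dataclass

# Assumed from the log above (skipped_results: 1000); not a documented constant.
MAX_SKIP_PER_REQUEST = 1000


@dataclass
class Batch:
    skipped_results: int  # how many results this request skipped
    entities: list        # entities returned by this request
    end_cursor: int       # position reached after skipping


def run_query(data, start, offset, limit):
    """Hypothetical backend call: skips at most MAX_SKIP_PER_REQUEST results."""
    available = len(data) - start
    skipped = min(offset, MAX_SKIP_PER_REQUEST, available)
    position = start + skipped
    if skipped < offset:
        # Offset not yet satisfied: report progress but return no entities.
        return Batch(skipped, [], position)
    end = len(data) if limit is None else min(len(data), position + limit)
    return Batch(skipped, data[position:end], position)


def fetch(data, limit=None, offset=0):
    """Client loop: re-issue the query until the whole offset is consumed."""
    cursor, remaining = 0, offset
    while True:
        batch = run_query(data, cursor, remaining, limit)
        remaining -= batch.skipped_results
        cursor = batch.end_cursor
        if remaining == 0:
            return batch.entities  # offset satisfied, safe to read results
        if batch.skipped_results == 0:
            return []  # data ran out before the offset was satisfied
```

In this model, `fetch(list(range(1500)), limit=100, offset=1450)` issues two requests (skipping 1000, then 450) and only then reads the 50 remaining entities.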

@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Apr 7, 2020
@crwilcox crwilcox added the priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. label Apr 7, 2020
crwilcox commented Apr 7, 2020

Fixed by #18

@crwilcox crwilcox closed this as completed Apr 7, 2020
@yoshi-automation yoshi-automation removed the triage me I really want to be triaged. label Apr 7, 2020