Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ui_firestore): fixed pagination issue #117

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

a-s-k-u
Copy link

@a-s-k-u a-s-k-u commented Sep 26, 2023

Description

Current Behavior: Excessive Document Querying on Page Change
Upon each page transition, the package is querying not only for the documents on the current page but also for all preceding documents. This leads to an unnecessary increase in total read operations from Firebase.

The Solution: This pull request implements the usage of Firebase Firestore query cursors to exclusively retrieve data pertaining to the current page, optimizing data retrieval. When the page size changes, the package no longer re-fetches from first page.

Related Issues

This PR will resolve Pagination Issue.

Checklist

Before you create this PR confirm that it meets all requirements listed below by checking the relevant checkboxes ([x]).
This will ensure a smooth and quick review process. Updating the pubspec.yaml and changelogs is not required.

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • My PR includes unit or integration tests for all changed/updated/fixed behaviors (See Contributor Guide).
  • All existing and new tests are passing.
  • I updated/added relevant documentation (doc comments with ///).
  • The analyzer (melos run analyze) does not report any problems on my PR.
  • All unit tests pass (melos run test:unit:all doesn't fail).
  • I read and followed the Flutter Style Guide.
  • I signed the CLA.
  • I am willing to follow-up on review comments in a timely manner.

Breaking Change

Does your PR require plugin users to manually update their apps to accommodate your change?

  • Yes, this is a breaking change.
  • No, this is not a breaking change.

final query = widget.query.limit(expectedDocsCount);
final query = (_isInitialized)
? widget.query
.startAfterDocument(_lastQueriedDocument)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That won't work, as startAfter does not guarantee consistency if the documents around the last one from the previous page have similar content.

startAfterDocument is more of a "start after a value matching that document" not "start after the document with that ID".
Meaning that the next page may show items from the previous page

Copy link
Author

@a-s-k-u a-s-k-u Oct 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review. From my understanding, the issue described appears to be relevant when passing data values directly within the query. In this case, we are explicitly utilizing the last queried document, thus mitigating this particular concern.

This same concern is discussed here
https://youtu.be/poqTHxtDXwU?t=276 - at 4:32, where it's emphasised that cloud firestore leverages documentID when the Document object is passed instead of individual data values.

Having said that, I'm not sure if you are referring to this bug, which was only resolved a few years back.
firebase/flutterfire#2936

However, to validate this scenario, I have set up a sample Firestore test database where I loaded 50 duplicate users, followed by 10 distinct users, utilizing a page size of 10.
a-s-k-u@fa21915
There was an edge scenario I noted when end of documents is reached, that can be fixed with this missing check

Other than that, the pagination worked as expected; and all 60 items were loaded up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rrousselGit any feedback on that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. It appears that the original concern was fixed.
I'm fine with this change then

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering whether the data gets updated if the _lastQueriedDocument is deleted by a user...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a-s-k-u so, could u find any time to test this scenario out?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ShahoodulHassan - I did. If a user tries to delete or modify any document that is in the last queried page(where an active listener is in place), then, it is resulting in duplicate documents getting populated to the list. The other aspects of pagination does not have any impact and works as expected.
Thanks for raising it. I'll update the PR shortly so that the list is populated only for 'added' document change event.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@a-s-k-u If my understanding is correct here, with this change what we are effectively doing is fetching documents after the '_lastQueriedDocument'. Also, our listener is now listening to changes to these set of documents only. My understanding is that - any updates to earlier fetched documents(earlier page docs) will not trigger our listener and therefore will not update UI. What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right, @vaibhav891. It's a crucial trade off that we can't overlook. If it were not for _querySubscription?.cancel(), we could still keep an eye on all pages. However, in my opinion, having an excessive number of listeners just to achieve real-time functionality seems unnecessary for the vast majority of use cases. I'm more inclined to the idea of maintaining at least one listener for the topmost page as a practical solution to update the entire pagination through a callback input, in case of changes. We can clarify in the documentation that only the first page will offer real-time updates, allowing end users to adjust their queries accordingly.
I'm looking forward to reaching a consensus on how to proceed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think having a single listener for the current page is ideal. At least not by default.
Otherwise folks will wonder why when their users scoll up in a ListView, items are no-longer updated.

We should listen to all items, but optimize reads. And maybe have a flag to listen to only a few items

@rrousselGit
Copy link
Contributor

The proposal sounds good, but this needs testing 🙏

@rorystephenson
Copy link

rorystephenson commented Oct 12, 2023

If I understand correctly this would be a major breaking change as the list would no longer update for changes in previously loaded docs?

According to this StackOverflow response if "you have local caching enabled and/or attach the new snapshot listener before removing the old one" the previously fetched documents would be pulled from the cache rather than being fetched from Firestore again. So if we are careful with when we attach the new listener this package can keep listening to previously fetched documents and avoid extra reads when fetching new ones.

@lesnitsky
Copy link
Member

@rrousselGit is any action required on this PR from @a-s-k-u?

@rorystephenson
Copy link

I want to re-iterate that this would be, if I have understood the PR correctly, a major breaking change which is not necessary (previously fetched documents would no longer stay updated).

@rrousselGit
Copy link
Contributor

@rrousselGit is any action required on this PR from @a-s-k-u?

@lesnitsky Well tests are failing, so those need to be fixed.

@a-s-k-u
Copy link
Author

a-s-k-u commented Oct 21, 2023

@rrousselGit , @lesnitsky - I'll fix that.
@rorystephenson - by local caching, do you mean offline capability through PersistenceEnabled property ? https://firebase.google.com/docs/firestore/manage-data/enable-offline. My understanding is that even if we have enabled Offline capability, for a new query, it'd still try to read from server than cache unless app is offline. I'll try to confirm that by enabling PersistenceEnabled for Web and inspecting the network tab in chrome. Please let me know if there is any other way to confirm.

@rrousselGit
Copy link
Contributor

Another thing to be careful of is, previously fetched items will no-longer be available with this PR.

Meaning a ListView cannot use QueryBuilder as it is, as to show the latest page, it needs all items up to the current one.

@a-s-k-u
Copy link
Author

a-s-k-u commented Oct 22, 2023

So, with the current implementation, I enabled offline cache (through this commit ) and below is how the pagination looks like.
image
It's still making cumulative reads on the server, even with local cache enabled.
And, now coming back to this PR ( tested with this commit), pagination looks like below
image
It's performing incremental reads from server as expected.
From the above tests, I'm inclined to believe that just enabling local cache will not reduce server reads; and underlines the need of employing query cursors to achieve effective pagination.

@rorystephenson
Copy link

It would be great if the Firebase team could weigh in on this, the aforementioned StackOverflow answer from a Google Cloud employee indicates that it is possible to subscribe to the same query with a larger limit and receive cached data for the already-fetched items. This would be the optimal solution.

@rrousselGit
Copy link
Contributor

Correct. That is the original reasoning for using this approach of increasing the limit instead of using startAt

@a-s-k-u
Copy link
Author

a-s-k-u commented Oct 28, 2023

I've created a DarkPad, so that it's easy to test out the above mentioned claim.
https://dartpad.dev/?id=ec5d9b700a73f9b67f14073e7f6057bd
However, I'm unable to find any real benefit from keeping the connection open and subscribing to another one with a larger, overlapping set. Observing the results in chrome network tab, I can see that the first page is getting queried multiple times even though it's part of overlapping data. Maybe there is a better way to quantify server reads ? Please feel free to edit or suggest modifications to the above dumbed down version of FirebaseListView.
Just jotting down a few additional references I could find in StackOverflow regarding the above claim.
https://stackoverflow.com/questions/73637044/does-firestore-have-an-in-memory-cache-for-optimistic-updates-separate-from-inde
https://stackoverflow.com/questions/38423277/does-firebase-cache-the-data/38423694#38423694
However, I'm afraid these pertain to reuse of same query; and not cumulative queries.

The official documentation does recommend to use query cursors or page tokens for large result set.
image

Maybe it was thing with Real Time Database and not CloudFireStore anymore ?

@rrousselGit
Copy link
Contributor

If multiple reads are performed, we should change the logic, yes.

But no matter what we do, the CI needs to be fixed :)

@lesnitsky
Copy link
Member

As long as tests are green locally, I'm good. CI setup is currently very fragile, I'm doing some work to make it more stable.

@a-s-k-u
Copy link
Author

a-s-k-u commented Oct 30, 2023

@rrousselGit , @lesnitsky - I'll fix those unit tests.
Additionally, I've discovered a simpler and more reliable method for measuring server reads using firestore metrics exposed by google cloud api..
To assess server reads, I conducted the following tests with a Firestore collection containing just 100 documents while enabling offline caching for all three scenarios. I left the page size as 10 and scrolled down to the bottom to ensure all documents are read:

a.) Current Implementation - I measured 559 reads. 10 + 20 + 30 + 40 + 50 + 60 + 70 + 80 + 90 + 100 + 9 additional reads ( due to our n+1 document query approach)
b.) Current Implementation with querySubscription.cancel() commented out to keep listeners open - I observed the same 559 server reads.
c.) This PR - Precisely 100 Reads

Below is the image from Google Cloud API
image
So, it seems that going with the third approach using query cursors is the way to go.

Comment on lines +107 to +108
bool _isInitialized = false;
late DocumentSnapshot<Document> _lastQueriedDocument;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make _lastQueriedDocument nullable instead and remove that bool

_isInitialized = true;
if (event.docs.isNotEmpty) {
_lastQueriedDocument = event.docs.last;
_snapshot.docs.addAll(event.docs.toList());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutating snapshot.docs is not reasonable. That list should be immutable

@rrousselGit
Copy link
Contributor

We would need new tests for this.

In particular, we need an e2e test to verify that adding/removing items in previous "pages" correctly update the current page.

@a-s-k-u
Copy link
Author

a-s-k-u commented Oct 31, 2023

Well, that would be the tricky part as explained here. Given that it'll listen only to the "current" page, any updates on the "current" page can be handled smoothly. However, it won't detect updates on previous pages. And, having multiple listeners for each page sounds cumbersome.
Alternatively, we could let the end user decide how they want to handle this situation.
For instance, consider a project where posts are sorted by 'lastModifiedOn.' If any updates are made to the collection, the end user can effortlessly capture them by listening to the most recent document. In the event that such updates occur, they can trigger a refresh on FirebaseListView.

@rrousselGit
Copy link
Contributor

It is something supported today though.

@rorystephenson
Copy link

@rrousselGit if I'm understanding right you've confirmed that it should be possible to avoid repeat reads when increasing the limit on the query however the examples that @a-s-k-u has provided seem to show that it is not working. In order to avoid a breaking change we should keep listening to all documents but it sounds like some input is needed from the firestore team on how exactly (ideally with a demonstration) we can increase the limit and avoid re-fetching all previously fetched documents.

@rrousselGit
Copy link
Contributor

Rather than increasing the limit, we could simply have a List of queries. One per "page".

And then find a way to combine in all in a single big "snapshot".

@rorystephenson
Copy link

rorystephenson commented Nov 16, 2023

Rather than increasing the limit, we could simply have a List of queries. One per "page".

And then find a way to combine in all in a single big "snapshot".

@rrousselGit Is this approach robust in the case of an insertion at the page boundary? If there is a page size of 3 and we load the initial page:

  • A, order 1
  • B, order 2
  • C, order 3

Then another page is loaded:

  • A, order 1
  • B, order 2
  • C, order 3
  • D, order 4
  • E, order 5
  • F, order 6

So now we have two listeners, one for the first page (currently returning A B C) and one for the second page (D E F).
Finally a new document, C', is inserted with an order value of 3. If the second page snapshot uses startAfter with the last item in the first page then we could potentially miss C':

First page, start at beginning limit 3:

  • A, order 1
  • B, order 2
  • C, order 3

Second page, start after order 3 limit 3:

  • D, order 4
  • E, order 5
  • F, order 6

@lesnitsky
Copy link
Member

@rrousselGit any feedback on this?

@rrousselGit
Copy link
Contributor

Is this approach robust in the case of an insertion at the page boundary? If there is a page size of 3 and we load the initial page:

That's indeed a possible concern. Although I'm not sure we can find a solution that satisfies all problems at once here.
It'd be great if firestore offered a "startAtIndex" instead of starting at values/documents

@a-s-k-u
Copy link
Author

a-s-k-u commented Dec 3, 2023

If page boundary is the concern, maybe we can introduce a single document overlap to mitigate that. Say, in the above example, for a page size of 3, each listener should listen to 4 docs with one doc overlap - and track docs at both left and right edges for each page.

say,
1st listener listens for 4 docs from A -> A,B,C,D
2nd listener listens for 4 docs from D -> D,E,F,G
3rd listener listens for 4 docs from G -> G,H,I,J

So, just by cross checking the left edge and right edge item of the page every time a page event happens, it'd be possible to tell if the page has expanded or compressed. For eg: if C' gets added at position 4, right edge doc on page01 is no longer D, so, the page size can be adjusted/incremented by 1 until D is reached.

Having said that, this will introduce additional layers of complexity if we are to cover all possible scenarios - say if D gets deleted as well. Then, edges needs to be verified and possibly realigned for every page change.

Maybe, a simpler approach could be just tracking whether the page has changed and exposing a callback for users to handle that change. This callback could trigger actions like scrolling to the top and refreshing the content as needed.

@rrousselGit
Copy link
Contributor

If page boundary is the concern, maybe we can introduce a single document overlap to mitigate that

I think that overlap kinda exist when using startAfterDocument where the document is the end of the previous page. I'm not sure a content overlap is necessary

Maybe, a simpler approach could be just tracking whether the page has changed and exposing a callback for users to handle that change. This callback could trigger actions like scrolling to the top and refreshing the content as needed.

I'm not sure. That would certainly decrease the developer experience and default behaviour. I think it'd be valuable to have the most optimal experience out of the box.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FirestoreListView Pagination does not work
6 participants