fix(ui_firestore): fixed pagination issue #117

a-s-k-u · 2023-09-26T15:54:54Z

Description

Current Behavior: Excessive Document Querying on Page Change
Upon each page transition, the package is querying not only for the documents on the current page but also for all preceding documents. This leads to an unnecessary increase in total read operations from Firebase.

The Solution: This pull request uses Firebase Firestore query cursors to retrieve only the data for the current page. When the page size changes, the package no longer re-fetches from the first page.

Related Issues

This PR will resolve Pagination Issue.

Checklist

Before you create this PR confirm that it meets all requirements listed below by checking the relevant checkboxes ([x]).
This will ensure a smooth and quick review process. Updating the pubspec.yaml and changelogs is not required.

I read the Contributor Guide and followed the process outlined there for submitting PRs.
My PR includes unit or integration tests for all changed/updated/fixed behaviors (See Contributor Guide).
All existing and new tests are passing.
I updated/added relevant documentation (doc comments with ///).
The analyzer (melos run analyze) does not report any problems on my PR.
All unit tests pass (melos run test:unit:all doesn't fail).
I read and followed the Flutter Style Guide.
I signed the CLA.
I am willing to follow-up on review comments in a timely manner.

Breaking Change

Does your PR require plugin users to manually update their apps to accommodate your change?

Yes, this is a breaking change.
No, this is not a breaking change.


rrousselGit · 2023-10-05T10:07:06Z

packages/firebase_ui_firestore/lib/src/query_builder.dart

-    final query = widget.query.limit(expectedDocsCount);
+    final query = (_isInitialized)
+        ? widget.query
+            .startAfterDocument(_lastQueriedDocument)


That won't work, as startAfter does not guarantee consistency if the documents around the last one from the previous page have similar content.

startAfterDocument is more of a "start after a value matching that document", not "start after the document with that ID".
Meaning that the next page may show items from the previous page.

Thank you for your review. From my understanding, the issue described appears to be relevant when passing data values directly within the query. In this case, we are explicitly utilizing the last queried document, thus mitigating this particular concern.

This same concern is discussed here
https://youtu.be/poqTHxtDXwU?t=276 - at 4:32, where it's emphasised that Cloud Firestore leverages the document ID when the Document object is passed instead of individual data values.

Having said that, I'm not sure if you are referring to this bug, which was only resolved a few years back.
firebase/flutterfire#2936

However, to validate this scenario, I have set up a sample Firestore test database where I loaded 50 duplicate users, followed by 10 distinct users, utilizing a page size of 10.
a-s-k-u@fa21915
There was an edge scenario I noted when the end of the documents is reached, which can be fixed with this missing check.

Other than that, the pagination worked as expected, and all 60 items were loaded.

@rrousselGit any feedback on that?

Sounds good to me. It appears that the original concern was fixed.
I'm fine with this change then

I'm wondering whether the data gets updated if the _lastQueriedDocument is deleted by a user...

@a-s-k-u so, could u find any time to test this scenario out?

Hi @ShahoodulHassan - I did. If a user deletes or modifies any document in the last queried page (where an active listener is in place), duplicate documents get populated into the list. The other aspects of pagination are not affected and work as expected.
Thanks for raising it. I'll update the PR shortly so that the list is populated only for the 'added' document change event.

@a-s-k-u If my understanding is correct here, with this change what we are effectively doing is fetching documents after the '_lastQueriedDocument'. Also, our listener is now listening to changes to this set of documents only. My understanding is that any updates to earlier fetched documents (earlier page docs) will not trigger our listener and therefore will not update the UI. What do you think?

You're absolutely right, @vaibhav891. It's a crucial trade off that we can't overlook. If it were not for _querySubscription?.cancel(), we could still keep an eye on all pages. However, in my opinion, having an excessive number of listeners just to achieve real-time functionality seems unnecessary for the vast majority of use cases. I'm more inclined to the idea of maintaining at least one listener for the topmost page as a practical solution to update the entire pagination through a callback input, in case of changes. We can clarify in the documentation that only the first page will offer real-time updates, allowing end users to adjust their queries accordingly.
I'm looking forward to reaching a consensus on how to proceed.

I don't think having a single listener for the current page is ideal. At least not by default.
Otherwise folks will wonder why, when their users scroll up in a ListView, items are no longer updated.

We should listen to all items, but optimize reads. And maybe have a flag to listen to only a few items


rrousselGit · 2023-10-09T16:40:42Z

The proposal sounds good, but this needs testing 🙏

rorystephenson · 2023-10-12T07:51:52Z

If I understand correctly this would be a major breaking change as the list would no longer update for changes in previously loaded docs?

According to this StackOverflow response if "you have local caching enabled and/or attach the new snapshot listener before removing the old one" the previously fetched documents would be pulled from the cache rather than being fetched from Firestore again. So if we are careful with when we attach the new listener this package can keep listening to previously fetched documents and avoid extra reads when fetching new ones.

lesnitsky · 2023-10-19T11:34:15Z

@rrousselGit is any action required on this PR from @a-s-k-u?

rorystephenson · 2023-10-19T13:33:09Z

I want to re-iterate that this would be, if I have understood the PR correctly, a major breaking change which is not necessary (previously fetched documents would no longer stay updated).

rrousselGit · 2023-10-20T08:59:27Z

> @rrousselGit is any action required on this PR from @a-s-k-u?

@lesnitsky Well tests are failing, so those need to be fixed.

a-s-k-u · 2023-10-21T02:36:04Z

@rrousselGit , @lesnitsky - I'll fix that.
@rorystephenson - by local caching, do you mean offline capability through the PersistenceEnabled property? https://firebase.google.com/docs/firestore/manage-data/enable-offline. My understanding is that even with offline capability enabled, a new query would still try to read from the server rather than the cache unless the app is offline. I'll try to confirm that by enabling PersistenceEnabled for Web and inspecting the network tab in Chrome. Please let me know if there is any other way to confirm.

rrousselGit · 2023-10-21T17:11:51Z

Another thing to be careful of is that previously fetched items will no longer be available with this PR.

Meaning a ListView cannot use QueryBuilder as it is, as to show the latest page, it needs all items up to the current one.

a-s-k-u · 2023-10-22T01:41:17Z

So, with the current implementation, I enabled offline cache (through this commit) and below is what the pagination looks like.

It's still making cumulative reads on the server, even with local cache enabled.
And, now coming back to this PR (tested with this commit), pagination looks like below.

It's performing incremental reads from the server as expected.
From the above tests, I'm inclined to believe that just enabling local cache will not reduce server reads, which underlines the need to employ query cursors to achieve effective pagination.

rorystephenson · 2023-10-25T13:29:47Z

It would be great if the Firebase team could weigh in on this, the aforementioned StackOverflow answer from a Google Cloud employee indicates that it is possible to subscribe to the same query with a larger limit and receive cached data for the already-fetched items. This would be the optimal solution.

rrousselGit · 2023-10-26T10:53:46Z

Correct. That is the original reasoning for using this approach of increasing the limit instead of using startAt

a-s-k-u · 2023-10-28T03:12:50Z

I've created a DartPad, so that it's easy to test out the above-mentioned claim.
https://dartpad.dev/?id=ec5d9b700a73f9b67f14073e7f6057bd
However, I'm unable to find any real benefit from keeping the connection open and subscribing to another one with a larger, overlapping set. Observing the results in the Chrome network tab, I can see that the first page is getting queried multiple times even though it's part of the overlapping data. Maybe there is a better way to quantify server reads? Please feel free to edit or suggest modifications to the above dumbed-down version of FirebaseListView.
Just jotting down a few additional references I could find on StackOverflow regarding the above claim.
https://stackoverflow.com/questions/73637044/does-firestore-have-an-in-memory-cache-for-optimistic-updates-separate-from-inde
https://stackoverflow.com/questions/38423277/does-firebase-cache-the-data/38423694#38423694
However, I'm afraid these pertain to reuse of the same query, not cumulative queries.

The official documentation does recommend using query cursors or page tokens for large result sets.

Maybe it was a thing with Realtime Database and no longer applies to Cloud Firestore?

rrousselGit · 2023-10-30T11:48:26Z

If multiple reads are performed, we should change the logic, yes.

But no matter what we do, the CI needs to be fixed :)

lesnitsky · 2023-10-30T12:23:51Z

As long as tests are green locally, I'm good. CI setup is currently very fragile, I'm doing some work to make it more stable.

a-s-k-u · 2023-10-30T13:30:07Z

@rrousselGit , @lesnitsky - I'll fix those unit tests.
Additionally, I've discovered a simpler and more reliable method for measuring server reads, using the Firestore metrics exposed by the Google Cloud API.
To assess server reads, I conducted the following tests with a Firestore collection containing just 100 documents, with offline caching enabled for all three scenarios. I left the page size at 10 and scrolled down to the bottom to ensure all documents were read:

a.) Current Implementation - I measured 559 reads: 10 + 20 + 30 + 40 + 50 + 60 + 70 + 80 + 90 + 100 = 550, plus 9 additional reads (due to our n+1 document query approach).
b.) Current Implementation with querySubscription.cancel() commented out to keep listeners open - I observed the same 559 server reads.
c.) This PR - Precisely 100 reads.
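The 559 and 100 figures can be reproduced with plain arithmetic. The following is a minimal sketch in Dart that does not talk to Firestore. It assumes that the current implementation re-runs the query with a limit of `pageSize * page + 1` on every page load (the extra document being the n+1 probe) and that the cursor approach reads only the documents after the cursor. Both function names are hypothetical.

```dart
import 'dart:math' as math;

/// Reads incurred when every page load re-runs the query with a larger limit.
/// Assumes the limit is `pageSize * page + 1`; the extra document is the
/// "n+1" probe used to detect whether another page exists.
int cumulativeLimitReads({required int totalDocs, required int pageSize}) {
  final pages = (totalDocs / pageSize).ceil();
  var reads = 0;
  for (var page = 1; page <= pages; page++) {
    // A query can never return more documents than the collection holds.
    reads += math.min(pageSize * page + 1, totalDocs);
  }
  return reads;
}

/// Reads incurred when each page load fetches only the documents that
/// follow the cursor.
int cursorReads({required int totalDocs, required int pageSize}) {
  var reads = 0;
  for (var fetched = 0; fetched < totalDocs; fetched += pageSize) {
    reads += math.min(pageSize, totalDocs - fetched);
  }
  return reads;
}
```

With `totalDocs: 100` and `pageSize: 10`, `cumulativeLimitReads` returns 559 and `cursorReads` returns 100, matching scenarios (a) and (c).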

Below is the image from Google Cloud API

So, it seems that going with the third approach using query cursors is the way to go.

rrousselGit · 2023-10-30T13:46:11Z

packages/firebase_ui_firestore/lib/src/query_builder.dart

+  bool _isInitialized = false;
+  late DocumentSnapshot<Document> _lastQueriedDocument;


Make _lastQueriedDocument nullable instead and remove that bool
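One way to read this suggestion is that a null cursor already means "no page loaded yet", so the separate flag carries no extra information. The sketch below illustrates that idea in plain Dart with generic placeholder types. `PageCursor`, `nextQuery` and the `startAfter` callback are hypothetical names and are not part of the package or of cloud_firestore.

```dart
/// Hypothetical stand-in for the paging state kept by the widget's State.
class PageCursor<D> {
  /// Null until the first non-empty page has been received.
  D? lastQueriedDocument;

  /// Returns [base] for the first page, otherwise the result of applying
  /// [startAfter] to [base] and the stored cursor document.
  Q nextQuery<Q>(Q base, Q Function(Q base, D after) startAfter) {
    // Copy to a local so the null check promotes the type.
    final last = lastQueriedDocument;
    return last == null ? base : startAfter(base, last);
  }
}
```

In the reviewed code the callback would presumably wrap `startAfterDocument`, but that wiring is left out here because it depends on the rest of the change.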

rrousselGit · 2023-10-30T13:46:51Z

packages/firebase_ui_firestore/lib/src/query_builder.dart

+          _isInitialized = true;
+          if (event.docs.isNotEmpty) {
+            _lastQueriedDocument = event.docs.last;
+            _snapshot.docs.addAll(event.docs.toList());


Mutating snapshot.docs is not reasonable. That list should be immutable
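A minimal sketch of the alternative, in plain Dart: build a fresh, unmodifiable list from the previous documents and the incoming page instead of calling `addAll` on the existing one. `appendPage` is a hypothetical helper name, and how the result is stored back into the snapshot is not shown.

```dart
/// Returns a new unmodifiable list holding [previous] followed by [incoming].
/// Neither input list is modified.
List<T> appendPage<T>(List<T> previous, List<T> incoming) {
  return List<T>.unmodifiable(<T>[...previous, ...incoming]);
}
```

Any later attempt to add to or remove from the returned list throws an `UnsupportedError`, so accidental mutation is caught early.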

rrousselGit · 2023-10-30T13:48:06Z

We would need new tests for this.

In particular, we need an e2e test to verify that adding/removing items in previous "pages" correctly update the current page.

a-s-k-u · 2023-10-31T10:09:11Z

Well, that would be the tricky part as explained here. Given that it'll listen only to the "current" page, any updates on the "current" page can be handled smoothly. However, it won't detect updates on previous pages. And, having multiple listeners for each page sounds cumbersome.
Alternatively, we could let the end user decide how they want to handle this situation.
For instance, consider a project where posts are sorted by 'lastModifiedOn.' If any updates are made to the collection, the end user can effortlessly capture them by listening to the most recent document. In the event that such updates occur, they can trigger a refresh on FirebaseListView.

rrousselGit · 2023-10-31T10:59:35Z

It is something supported today though.

rorystephenson · 2023-10-31T11:05:48Z

@rrousselGit if I'm understanding right, you've confirmed that it should be possible to avoid repeat reads when increasing the limit on the query. However, the examples that @a-s-k-u has provided seem to show that it is not working. In order to avoid a breaking change we should keep listening to all documents, but it sounds like some input is needed from the Firestore team on how exactly (ideally with a demonstration) we can increase the limit and avoid re-fetching all previously fetched documents.

rrousselGit · 2023-10-31T13:02:28Z

Rather than increasing the limit, we could simply have a List of queries. One per "page".

And then find a way to combine them all into a single big "snapshot".
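As a rough illustration of the combining step only, here is a sketch in plain Dart. It assumes each per-page listener exposes its latest result as a list, and it drops any document id that an earlier page already supplied. `Doc` and `combinePages` are hypothetical names, and nothing here subscribes to Firestore.

```dart
/// Hypothetical minimal document: an id plus the field the query orders by.
class Doc {
  const Doc(this.id, this.order);
  final String id;
  final int order;
}

/// Flattens the latest result of each page, in page order, skipping any
/// document id that an earlier page already contributed.
List<Doc> combinePages(List<List<Doc>> pages) {
  final seen = <String>{};
  final combined = <Doc>[];
  for (final page in pages) {
    for (final doc in page) {
      // Set.add returns false when the id was already present.
      if (seen.add(doc.id)) combined.add(doc);
    }
  }
  return List<Doc>.unmodifiable(combined);
}
```

Deduplicating by id is a design choice for this sketch; it hides overlap between neighbouring pages but does not by itself detect documents that fall between two pages.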

rorystephenson · 2023-11-16T14:21:32Z

> Rather than increasing the limit, we could simply have a List of queries. One per "page".
>
> And then find a way to combine them all into a single big "snapshot".

@rrousselGit Is this approach robust in the case of an insertion at the page boundary? If there is a page size of 3 and we load the initial page:

A, order 1
B, order 2
C, order 3

Then another page is loaded:

A, order 1
B, order 2
C, order 3
D, order 4
E, order 5
F, order 6

So now we have two listeners, one for the first page (currently returning A B C) and one for the second page (D E F).
Finally a new document, C', is inserted with an order value of 3. If the second page snapshot uses startAfter with the last item in the first page then we could potentially miss C':

First page, start at beginning limit 3:

A, order 1
B, order 2
C, order 3

Second page, start after order 3 limit 3:

D, order 4
E, order 5
F, order 6
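The scenario can be replayed with an in-memory list. This is only a simulation in plain Dart: it models the second listener as a cursor on the order value alone, exactly as written in the example, and breaks ties by id as an assumption. A cursor built from a document snapshot may behave differently. All names are hypothetical.

```dart
/// Hypothetical minimal document: an id plus the field the query orders by.
class Doc {
  const Doc(this.id, this.order);
  final String id;
  final int order;
}

/// Sorts by order value, breaking ties by id (an assumption of this sketch).
List<Doc> _sorted(List<Doc> docs) {
  return [...docs]..sort((a, b) {
      final byOrder = a.order.compareTo(b.order);
      return byOrder != 0 ? byOrder : a.id.compareTo(b.id);
    });
}

/// First listener: the first [limit] documents.
List<Doc> firstPage(List<Doc> all, int limit) =>
    _sorted(all).take(limit).toList();

/// Second listener: up to [limit] documents whose order value is strictly
/// greater than [afterOrder] (a cursor on the value, not on a document).
List<Doc> pageAfterOrder(List<Doc> all, int afterOrder, int limit) =>
    _sorted(all).where((d) => d.order > afterOrder).take(limit).toList();
```

After adding `Doc("C'", 3)` to the six documents above, `firstPage(all, 3)` still yields A, B, C and `pageAfterOrder(all, 3, 3)` still yields D, E, F, so C' appears in neither page.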

lesnitsky · 2023-11-29T18:41:04Z

@rrousselGit any feedback on this?

rrousselGit · 2023-11-30T14:40:53Z

> Is this approach robust in the case of an insertion at the page boundary? If there is a page size of 3 and we load the initial page:

That's indeed a possible concern. Although I'm not sure we can find a solution that satisfies all problems at once here.
It'd be great if Firestore offered a "startAtIndex" instead of starting at values/documents.

a-s-k-u · 2023-12-03T14:08:50Z

If page boundary is the concern, maybe we can introduce a single document overlap to mitigate that. Say, in the above example, for a page size of 3, each listener should listen to 4 docs with one doc overlap - and track docs at both left and right edges for each page.

say,
1st listener listens for 4 docs from A -> A,B,C,D
2nd listener listens for 4 docs from D -> D,E,F,G
3rd listener listens for 4 docs from G -> G,H,I,J

So, just by cross-checking the left-edge and right-edge items of the page every time a page event happens, it'd be possible to tell whether the page has expanded or compressed. For example, if C' gets added at position 4, the right-edge doc on page 1 is no longer D, so the page size can be incremented by 1 until D is reached.
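The edge check itself is small. A minimal sketch in plain Dart, assuming each listener's current result is available as a list of document ids; `boundaryIntact` is a hypothetical helper name.

```dart
/// True when the overlap document still joins the two pages, i.e. the last
/// id seen by one listener is the first id seen by the next listener.
bool boundaryIntact(List<String> page, List<String> nextPage) {
  if (page.isEmpty || nextPage.isEmpty) return false;
  return page.last == nextPage.first;
}
```

For the pages A,B,C,D and D,E,F,G the check passes. Once C' is inserted and the first listener reports A,B,C,C', it fails, which is the signal to widen the first page until D is reached again.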

Having said that, this will introduce additional layers of complexity if we are to cover all possible scenarios, say if D gets deleted as well. Then the edges need to be verified and possibly realigned on every page change.

Maybe, a simpler approach could be just tracking whether the page has changed and exposing a callback for users to handle that change. This callback could trigger actions like scrolling to the top and refreshing the content as needed.

rrousselGit · 2023-12-04T11:16:04Z

> If page boundary is the concern, maybe we can introduce a single document overlap to mitigate that

I think that overlap kinda exists when using startAfterDocument where the document is the end of the previous page. I'm not sure a content overlap is necessary.

> Maybe, a simpler approach could be just tracking whether the page has changed and exposing a callback for users to handle that change. This callback could trigger actions like scrolling to the top and refreshing the content as needed.

I'm not sure. That would certainly decrease the developer experience and default behaviour. I think it'd be valuable to have the most optimal experience out of the box.

a-s-k-u added 2 commits September 27, 2023 00:13

Implemented pagination optimization through firebase query cursors. This fix prevents redundant data downloads from Firebase for previously accessed pages.

3c4af9e

removed unused variable as suggested by melos run analyse command

9cf89ae

lesnitsky requested a review from rrousselGit October 3, 2023 15:21

rrousselGit reviewed Oct 5, 2023

Fixed the issue of unhandled exceptions when end of collection is reached

3a5daf8

rrousselGit reviewed Oct 30, 2023
