
Very poor performance in Firestore reads due to proto-plus sequence unmarshaling #224

Closed
GabrieleMazzola opened this issue Apr 10, 2021 · 4 comments · Fixed by #226
Labels: P2 (a nice-to-fix bug) · priority: p2 (moderately-important) · 🚨 (this issue needs some love) · type: bug

Comments

@GabrieleMazzola
Contributor

Environment details

  • OS type and version: Windows 10 Pro
  • Python version: 3.9
  • pip version: 20.2.2
  • google-cloud-firestore: 2.1.0

Problem

We have been using Firestore with the Python Admin SDK for some time now, and in some cases the latency of fetching data was far higher than expected. For example, reading 500 (very small) documents from a collection could take up to 30 seconds, depending on which fields the documents had. After some experiments, we found that this poor performance was directly related to the presence of empty arrays in the documents.

Seemingly related issues

Steps to reproduce

  • Create two collections of 500 documents each: one whose documents contain an empty array, and one whose documents contain a non-empty array.
  • Retrieve all documents from each collection and compare execution times, using the script below.
from google.cloud import firestore
import time
import os

N_DOCS = 500

if __name__ == '__main__':

    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "./gcp_sa_creds-dev.json"
    firestore_client = firestore.Client()

    for coll_name in ["performance_test_empty_array", "performance_test_not_empty_array"]:
        coll_ref = firestore_client.collection(coll_name)

        print(f"\n{coll_name}")
        
        # Just uncomment this if you need to clean up the previous test
        # batch = firestore_client.batch()
        # for doc in coll_ref.stream():
        #     batch.delete(doc.reference)
        # batch.commit()

        # Create data
        print("Creating data")
        start = time.time()
        batch = firestore_client.batch()
        for _ in range(N_DOCS):
            batch.set(
                coll_ref.document(),
                {
                    "field": [] if coll_name == "performance_test_empty_array" else ["aa"],
                }
            )
        batch.commit()
        end = time.time()
        elapsed = round(end-start, 2)
        print(f"{elapsed} seconds to create {N_DOCS} documents")

        # Read data
        print("Fetching data")
        start = time.time()
        docs = coll_ref.get()
        end = time.time()
        elapsed = round(end-start, 2)
        print(f"{elapsed} seconds to fetch {N_DOCS} documents")

Running the script produces the following results:

performance_test_empty_array
Creating data
0.67 seconds to create 500 documents
Fetching data
10.9 seconds to fetch 500 documents

performance_test_not_empty_array
Creating data
0.65 seconds to create 500 documents
Fetching data
0.28 seconds to fetch 500 documents

Issue analysis

To demonstrate the issue with empty arrays, we created a collection of 500 documents, each with a single empty-array field:

[Screenshot: Firestore console showing documents with a single empty-array field]

We then fetched the data with the following code:

from google.cloud import firestore
import time
import os

N_DOCS = 500

if __name__ == '__main__':

    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "./gcp_sa_creds-dev.json"
    firestore_client = firestore.Client()

    coll_ref = firestore_client.collection("performance_test_empty_array")

    # Read data
    print("Fetching data")
    start = time.time()
    docs = coll_ref.get()
    end = time.time()
    elapsed = round(end-start, 2)
    print(f"{elapsed} seconds to fetch {N_DOCS} documents")

This required 9.76 seconds. By profiling the Python code we found that the slowdown was caused by an extremely high number of calls (15,295,500) to the deepcopy function from the Python standard library, as can be seen in the following profiling output:

[Screenshot: profiler output for fetching 500 documents with empty arrays]

[Screenshot: call graph showing deepcopy dominating the runtime]
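
For reference, here is a minimal sketch of how such a profile can be captured with the standard library's cProfile (the collection name matches the reproduction script above):

import cProfile
import pstats

from google.cloud import firestore

client = firestore.Client()
coll_ref = client.collection("performance_test_empty_array")

# Profile only the fetch, which is where the slowdown shows up.
with cProfile.Profile() as profiler:
    coll_ref.get()

# Restrict the report to deepcopy to surface the hot spot.
stats = pstats.Stats(profiler).sort_stats("cumulative")
stats.print_stats("deepcopy")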

The call graph showed that most of the deepcopy calls originated from the _pb_type cached property defined on the RepeatedComposite class (in "proto.marshal.collections.repeated"), which is used to infer the protocol buffer type of a sequence. The culprit is the following snippet from _pb_type:

proto.marshal.collections.repeated:RepeatedComposite._pb_type

[...]
# We have no members in the list.
# In order to get the type, we create a throw-away copy and add a
# blank member to it.
canary = copy.deepcopy(self.pb).add()
return type(canary)

In other words, it deep-copies the entire underlying protocol buffer container just to append a blank member and read its type.
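
To see the two approaches side by side, here is a small illustrative sketch on a raw Firestore ArrayValue protobuf. It relies on the same protected protobuf internals as the proposal below, so treat it as a demonstration rather than supported API:

import copy

from google.cloud.firestore_v1.types import document

# An empty repeated field of Value messages, as produced by an empty array.
pb = document.ArrayValue()._pb.values

# Current approach: deep-copy the container and add a blank member.
canary = copy.deepcopy(pb).add()
print(type(canary))

# Descriptor lookup: the same class, with no copying.
print(pb._message_descriptor._concrete_class)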

Proposed solution

Replacing the snippet above with the following bypasses the deepcopy entirely:

[...]
# We have no members in the list.
return self.pb._message_descriptor._concrete_class

That is, we read the protocol buffer type directly from the container's protected attributes. With this change, fetching the 500 documents took 0.24 seconds instead of 9.76 seconds. Profiling the code with the change in place shows the number of calls to the deepcopy function dropping from 15,295,500 to 1,500.
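
A rough micro-benchmark sketch of the per-call cost difference (absolute numbers will vary by machine and protobuf runtime, and the gap is larger still in the real workload, where the containers are attached to full document protos):

import copy
import timeit

from google.cloud.firestore_v1.types import document

pb = document.ArrayValue()._pb.values  # empty repeated Value container

deepcopy_path = timeit.timeit(lambda: type(copy.deepcopy(pb).add()), number=10_000)
descriptor_path = timeit.timeit(lambda: pb._message_descriptor._concrete_class, number=10_000)

print(f"deepcopy path:   {deepcopy_path:.3f}s")
print(f"descriptor path: {descriptor_path:.3f}s")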

Tests

All tests pass with the proposed solution in place.

[Screenshot: full test suite passing]

Next steps

Once the maintainers have validated (or improved upon) the proposed solution, we are ready to submit a PR addressing this issue.

@zoncrd @lucastanzani

@yoshi-automation yoshi-automation added the 'triage me' label Apr 12, 2021
@software-dov
Contributor

Very well researched. Yes, this looks both safe and correct. The entire cached-property logic can be removed and replaced with the _concrete_class lookup, provided it passes all the tests.
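
A hypothetical illustration of that simplification as a runtime patch (attribute names follow the snippet proposed above; the actual fix landed separately in #226):

from proto.marshal.collections import repeated

def _pb_type(self):
    # The concrete message class is available on the field descriptor even
    # when the sequence is empty, so no throw-away deepcopy is needed.
    return self.pb._message_descriptor._concrete_class

# Replace the cached property wholesale with a plain property.
repeated.RepeatedComposite._pb_type = property(_pb_type)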

@software-dov software-dov added the 'P2', 'priority: p2', and 'type: bug' labels and removed the 'triage me' label Apr 12, 2021
@yoshi-automation yoshi-automation added and then removed the '🚨 This issue needs some love.' label Jul 13, 2021
@tseaver
Contributor

tseaver commented Sep 24, 2021

I addressed the two-orders-of-magnitude slowdown in Firestore yesterday: googleapis/python-firestore#458

@yoshi-automation yoshi-automation added the '🚨 This issue needs some love.' label Nov 3, 2021
@hkdevandla
Member

@software-dov, is there anything to be done here in proto-plus besides the fix that was submitted to address the perf issue? If not, can we close this bug?

@hkdevandla
Member

@vam-google (for visibility)

@vchudnov-g vchudnov-g assigned atulep and unassigned software-dov Mar 15, 2022
@parthea parthea assigned parthea and unassigned atulep May 1, 2022