Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

huge performance problem when using append() on a repeated field #246

Closed
tswast opened this issue Sep 17, 2021 · 5 comments
Closed

huge performance problem when using append() on a repeated field #246

tswast opened this issue Sep 17, 2021 · 5 comments
Labels
priority: p3 Desirable enhancement or fix. May not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@tswast
Copy link

tswast commented Sep 17, 2021

Consider the following code sample:

from google.cloud.bigquery_storage_v1beta2 import types


def dense_row_generator(proto_rows):
    """Generates rows up to a predetermined size limit."""
    available = 8 * 1024 * 1024
    rows = []
    while True:
        row_bytes = b"hello world" * 20

        if available - len(row_bytes) < 0:
            break
        available = available - len(row_bytes)

        # (1) Slow! 174.54s user 34.81s system 99% cpu 3:29.59 total
        # proto_rows.serialized_rows.append(row_bytes)

        # (2) OK: 1.68s user 0.46s system 77% cpu 2.768 total
        # proto_rows._pb.serialized_rows.append(row_bytes)

        # (3)
        rows.append(row_bytes)

    # (3) OK: 1.66s user 0.43s system 90% cpu 2.313 total
    proto_rows.serialized_rows = rows


proto_rows = types.ProtoRows()
dense_row_generator(proto_rows)

Since I have used the plain protobuf library before, (1) was my first intuition. This turned out to be more than 50x slower than the same code but accessing the plain protobuf directly!

Perhaps append shouldn't be allowed at all in proto-plus if it can introduce such a huge slowdown?

@tswast tswast added type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. priority: p3 Desirable enhancement or fix. May not be included in next release. and removed type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. priority: p3 Desirable enhancement or fix. May not be included in next release. labels Sep 17, 2021
@software-dov
Copy link
Contributor

We can't remove it without causing a breaking change; for anything that's not a hot inner loop it probably doesn't matter too much.
This is hopefully going to be improved when the protobuf team adds additional APIs for manipulating fields so we don't have to rely on hacks.

@software-dov software-dov added P2 A nice-to-fix bug priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. labels Oct 27, 2021
@y4n9squared
Copy link

Encountered a similar problem in a different context. Accessing the underlying _pb member to append directly improves performance significantly. This workaround exists in a lot of Google Python client libraries (eg. googleapis/python-bigtable#382).

Not sure if there is a better solution until issue is solved upstream.

@software-dov
Copy link
Contributor

There may eventually be a much better solution, and in the medium term there will probably be a slightly better solution. Referencing the underlying _pb member is a valid workaround; the bad ergonomics of using vanilla protos is definitely worth putting up with if performance is extremely important.

@yoshi-automation yoshi-automation added 🚨 This issue needs some love. and removed 🚨 This issue needs some love. labels Feb 2, 2022
@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Mar 16, 2022
@vam-google
Copy link
Contributor

Lowing this down to its original priority (p3). There are lot of changes are planned for proto-plus vs native protobuf stubs in the future, and most likely this issue will be addressed indirectly as part of that work.

@vam-google vam-google added priority: p3 Desirable enhancement or fix. May not be included in next release. and removed P2 A nice-to-fix bug priority: p2 Moderately-important priority. Fix may not be included in next release. 🚨 This issue needs some love. labels Mar 23, 2022
@parthea
Copy link
Contributor

parthea commented Nov 21, 2023

Closing as a response was provided in #246 (comment) which includes a workaround for performance sensitive applications. (Googlers consider checking with the protobuf team to determine how best to integrate directly with the protobuf API, rather than via proto-plus. Ref b/259448364)

@parthea parthea closed this as completed Nov 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority: p3 Desirable enhancement or fix. May not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns.
Projects
None yet
Development

No branches or pull requests

6 participants