Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: further avoid using proto-plus wrapper when unmarshalling entities #190

Merged
merged 7 commits into from Jul 9, 2021

Conversation

tseaver
Copy link
Contributor

@tseaver tseaver commented Jul 9, 2021

Follow-on to @craiglabenz's work in PR #155. These tweaks combine to get performance on my comparison with 1.5.3 script to within a few percentage points of the 1.5.3 implementation.

$ /tmp/datastore-1.15.3/bin/python /tmp/compare_perf_issue_1_15_3.py 
Time: 0.9033858776092529
$ /tmp/datastore-HEAD/bin/python /tmp/compare_perf_issue_1_15_3.py 
Time: 0.9162840843200684

Closes #150.

This is actually a big win:  in my comparison, we go from 3x to 2x slower
than the older/faster version.
Always take the raw protobuf message, rather than the proto-plus wrapper.

This is another big win:  we are back to within a few percent of the
older/faster version on my comparison test.
@tseaver tseaver requested review from crwilcox, craiglabenz and a team July 9, 2021 17:30
@tseaver tseaver requested a review from a team as a code owner July 9, 2021 17:30
@product-auto-label product-auto-label bot added the api: datastore Issues related to the googleapis/python-datastore API. label Jul 9, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jul 9, 2021
Copy link
Contributor

@craiglabenz craiglabenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! After a few read-throughs of the PR, I think I have pieced together how/why this improves performance. This LGTM pending your confirmation of my understanding (which is only important in that, if I am still misunderstanding, then I have not actually reviewed this PR as much as I think 😆).


return meaning
return None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I'm following, do the edits to _get_meaning() impact performance, or is this just for readability?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big hit: the three-item getattr can be much faster than a try: ... except AttributeError:, but likely not the conditional assignment. OTOH, since in later commits we no longer risk handing the proto-plus wrapper into _get_meaning, we could likely pull that getattr out. The other changes to _get_meaning are micro-optimizaitons, which neither helped nor hurt on my comparison test.

Comment on lines 53 to 54
values = (
value_pb._pb.array_value.values
if hasattr(value_pb, "_pb")
else value_pb.array_value.values
)
# Use 'raw' protobuf message if possible
value_pb = getattr(value_pb, "_pb", value_pb)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes like this (there are a few in the PR) seem be a pure refactor of the old implementation, unless I am missing something that further helps performance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three-arg getattr does the test in C, versus in Python's own branch machinery. At least in some cases, it can be a speedup.

@@ -124,21 +108,18 @@ def entity_from_protobuf(pb):
:rtype: :class:`google.cloud.datastore.entity.Entity`
:returns: The entity derived from the protobuf.
"""
if not isinstance(pb, entity_pb2.Entity):
proto_pb = entity_pb2.Entity.wrap(pb)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the removal of this line and focusing entirely on raw protobuf objects what delivers the final performance gains?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, indeed. The proto-plus wrappers are an enormous perf-suck. Most of the win in this PR is avoiding touching them anywhere except at the outermost entry point, and then unwrapping there ASAP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had not pieced together that we could avoid .wrap() altogether. The last time I was in here, I'd mistakenly assumed calling that (which is at least the least-slow proto-plus entrypoint) was still required.

Very nice!

Copy link
Contributor

@craiglabenz craiglabenz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tseaver tseaver merged commit d0481bf into master Jul 9, 2021
@tseaver tseaver deleted the 150-perf-tweaks branch July 9, 2021 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: datastore Issues related to the googleapis/python-datastore API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Too slow query fetch
2 participants