Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add AppendRowsStream helper to append rows with a BigQueryWriteClient #284

Merged
merged 48 commits into from Sep 10, 2021

Conversation

tswast
Copy link
Contributor

@tswast tswast commented Aug 26, 2021

Follow-up to #278

TODO:

  • Fail fast with bad initial request (add close callback?)
  • System test for bad initial request
  • Unit test for timeout behavior
  • Unit/system test that checks that RPC is actually open after open
  • How long does it take for the RPC to open? (on my laptop: between 0.25 and 0.4 seconds) Add some sleep calls to busy loop?

Closes #285

@product-auto-label product-auto-label bot added the api: bigquerystorage Issues related to the googleapis/python-bigquery-storage API. label Aug 26, 2021
@snippet-bot
Copy link

snippet-bot bot commented Aug 26, 2021

Here is the summary of changes.

You are about to add 2 region tags.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.
To update this comment, add snippet-bot:force-run label or use the checkbox below:

  • Refresh this comment

@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Aug 26, 2021
@tswast tswast marked this pull request as ready for review August 26, 2021 20:23
@tswast tswast requested a review from a team August 26, 2021 20:23
@tswast
Copy link
Contributor Author

tswast commented Aug 31, 2021

Uh oh. The BackgroundConsumer eats errors if I send an invalid stream name: https://github.com/googleapis/python-api-core/blob/dcb6ebd9994fddcb1729150df1675ebf8c503a73/google/api_core/bidi.py#L659-L667

It doesn't even send a "done" notification... Hmmm...

@tswast
Copy link
Contributor Author

tswast commented Sep 3, 2021

Samples tests are timing out, stuck waiting for the futures to be done. Possible I messed something up when making the done() call work when called directly.

@tswast tswast added owlbot:run Add this label to trigger the Owlbot post processor. and removed do not merge Indicates a pull request not ready for merge, due to either quality or timing. labels Sep 8, 2021
@gcf-owl-bot gcf-owl-bot bot removed the owlbot:run Add this label to trigger the Owlbot post processor. label Sep 8, 2021
@tswast tswast changed the title feat: add BigQueryWriteClient where append_rows returns a helper for writing row feat: add BigQueryWriteClient where append_rows returns a helper for writing rows Sep 8, 2021
def append_rows(
self,
initial_request: types.AppendRowsRequest,
# TODO: add retry argument. Blocked by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want to keep arg order stable, should you stub something here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I've added a * to force timeout and metadata to be keyword arguments only. This is the approach the generated clients have been taking to avoid argument order being a part of the contract.

metadata.

Returns:
A tuple containing a stream and a future. Use the stream to send
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This contract feels odd to me at first blush. You seem to have two different ways of appending, depending on whether it's the first append (client append_rows) or a subsequent append(send on the returned stream). I expect this to get odd with retries.

Alternatively, perhaps you just have an open_stream method or similar in the write client, which can be opened with the necessary requisite data (stream id, schema, etc), and the caller doesn't have to be responsible for dealing with the special logic for the first append?

self.__cancelled = False
self._is_done = False

def cancel(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cancel() on an individual future kills all the futures on the same manager? This feels like something that maybe should live on a connection abstraction. Or will it be impossible for python to end up in the state where you have one stream that's draining and one that's accepting new appends?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does seem odd to cancel everything sent later, though that's the semantics I copied from Pub/Sub. From what I can tell, _reopen isn't called until the old RPC has completely shut down.

https://github.com/googleapis/python-api-core/blob/8b65c93aef908c99784200b73ad270a0591481a8/google/api_core/bidi.py#L444-L451

Now that I look more closely, stream resumption only allows one request for the "initial request".

https://github.com/googleapis/python-api-core/blob/8b65c93aef908c99784200b73ad270a0591481a8/google/api_core/bidi.py#L101-L105

That'll probably have to be refactored if/when we do stream resumption so that we can re-send any requests where we didn't get a response yet.

proto_data.rows = proto_rows

# Generate a protocol buffer representation of your message descriptor. You
# must inlcude this information in the first request of an append_rows
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/inlcude/include/

Copy link
Contributor

@shollyman shollyman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems much more aligned as a connection pattern, thanks for all the work getting it into shape!

@tswast tswast merged commit 2461f63 into googleapis:main Sep 10, 2021
@tswast tswast deleted the b195450856-write-client-BiDi branch September 10, 2021 22:07
@tswast tswast changed the title feat: add BigQueryWriteClient where append_rows returns a helper for writing rows feat: add AppendRowsStream helper to append rows with a BigQueryWriteClient Sep 10, 2021
gcf-merge-on-green bot pushed a commit that referenced this pull request Sep 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquerystorage Issues related to the googleapis/python-bigquery-storage API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants