Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(bigquery/storage/managedwriter): add append stream plumbing #4452

Merged

Conversation

shollyman
Copy link
Contributor

@shollyman shollyman commented Jul 16, 2021

This PR adds enough of the wiring to the client to being testing via integration tests. It adapts a similar pattern to the pullstream in pubsub, in that it abstracts individual calls from stream state management.

There's two significant units of future work that may yield changes here:

  • For traffic efficiency sake, we only want to add things like the stream ID, schema, and trace ID to the first append on any stream.

  • For stream connection retry, we may want to re-send writes that were sent but we didn't get an acknowledgement back. For default/committed streams, this behavior may yield additional writes (at least once semantics). For buffered/pending streams, it means either the library or user should know to expect "data already present" for these resent-writes.

Towards #4366

@shollyman shollyman requested a review from a team as a code owner July 16, 2021 17:16
@product-auto-label product-auto-label bot added the api: bigquery Issues related to the BigQuery API. label Jul 16, 2021
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Jul 16, 2021
@shollyman shollyman requested a review from codyoss July 16, 2021 17:19
Copy link
Member

@codyoss codyoss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skipped reviewing the test for now, will look at those with the next pass incase refactors are needed. Just a handful of comments.

bigquery/storage/managedwriter/client.go Outdated Show resolved Hide resolved

ms := &ManagedStream{
streamSettings: defaultStreamSettings(),
c: c,
ctx: ctx,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to Cody: check if we need this. Could it be passed in later? At the very least we might want to document how context is used for ManagedStream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be a bit wonky here due to the BQ client here having a bunch of retained context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to stick with retained context for now and document. We could potentially get clever using the contexts passed in to AppendRows, but then the lifetime of the contexts become even more ambiguous (e.g. a single append context gets used for the receive processor, etc).

bigquery/storage/managedwriter/integration_test.go Outdated Show resolved Hide resolved
bigquery/storage/managedwriter/managed_stream.go Outdated Show resolved Hide resolved
}

// Always return the retained ARC if the arg differs.
if arc != ms.arc {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will differ when ms.arc is nil, does this need an extra check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only time ms.arc will be nil is when arc is nil (we have never opened a connection), so we'll skip this conditional.


func (ms *ManagedStream) append(pw *pendingWrite) error {
return ms.call(func(arc storagepb.BigQueryWrite_AppendRowsClient, ch chan *pendingWrite) error {
// TODO: we should only send stream ID and schema for the first message in a new stream, but
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oooh, good point. Will think about that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plumbed this in. I ended up removing the intermediate call() abstraction, as we're only using it for sending appends and for issuing the CloseSend() and making those two methods call getStream directly. This ensures the sync.Once fires only for appends.

bigquery/storage/managedwriter/managed_stream.go Outdated Show resolved Hide resolved
bigquery/storage/managedwriter/managed_stream.go Outdated Show resolved Hide resolved
Copy link
Member

@codyoss codyoss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@shollyman shollyman added the automerge Merge the pull request once unit tests and other checks pass. label Jul 26, 2021
@gcf-merge-on-green gcf-merge-on-green bot merged commit b085384 into googleapis:master Jul 26, 2021
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Jul 26, 2021
@shollyman shollyman deleted the fr-managedwriter-streamwiring branch July 26, 2021 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants