
spanner-client: Retry PDML on "Received unexpected EOS on DATA frame from server" #5209

Closed
thiagotnunes opened this issue Jul 29, 2020 · 4 comments · Fixed by #5238
Labels
api: spanner Issues related to the Spanner API. type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design.


@thiagotnunes

This bug is related to the Spanner client library.

For long-lived transactions (>= 30 minutes) that apply large PDML changes, the gRPC connection may be terminated with the error "Received unexpected EOS on DATA frame from server".

In this case, we need to retry the transaction, either from the resume token received while reading the stream or from scratch. This ensures that the PDML transaction keeps executing until it succeeds or a hard timeout is reached.
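The retry behaviour described above can be sketched as a loop. This is a minimal, language-agnostic illustration written in Python, not the code from the eventual fix or from the Java PR; `PartialResult`, `run_pdml_with_retry`, the `execute` callable, and detecting the error by matching its message text are all hypothetical assumptions of this sketch.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterator, Optional

# Error text quoted in this issue; matching on it is an assumption of this sketch.
EOS_MESSAGE = "Received unexpected EOS on DATA frame from server"


@dataclass
class PartialResult:
    """Hypothetical stand-in for one streamed chunk of a PDML result."""
    resume_token: bytes               # empty when the server sent no token
    row_count: Optional[int] = None   # set only on the final chunk


def is_retryable(error: Exception) -> bool:
    """True if the error is the mid-stream EOS described in the issue."""
    return EOS_MESSAGE in str(error)


def run_pdml_with_retry(
    execute: Callable[[Optional[bytes]], Iterator[PartialResult]],
    timeout_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Runs a PDML statement, retrying on the EOS error until success or deadline.

    `execute(resume_token)` is a hypothetical callable that starts (or resumes)
    the streaming call; passing None starts the statement from scratch.
    """
    deadline = clock() + timeout_seconds
    resume_token: Optional[bytes] = None
    while True:
        try:
            row_count = 0
            for chunk in execute(resume_token):
                if chunk.resume_token:
                    # Remember the latest token so a retry can pick up from here.
                    resume_token = chunk.resume_token
                if chunk.row_count is not None:
                    row_count = chunk.row_count
            return row_count
        except Exception as error:
            # Give up on any other error, or once the hard timeout has passed.
            if not is_retryable(error) or clock() >= deadline:
                raise
            # Otherwise loop: resume from the last token, or restart if none was seen.
```

Keeping only the newest non-empty token means a retry resumes where the stream broke off; if no token was received, `execute(None)` starts the statement again from scratch, which covers both cases mentioned in the issue.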

We have already implemented this change in the Java client library; for more information, see this PR: googleapis/java-spanner#360.

To test the fix, we can use a large Spanner database. Please speak to @thiagotnunes for more details.

@thiagotnunes thiagotnunes added the api: spanner Issues related to the Spanner API. label Jul 29, 2020
@jskeet jskeet added the type: feature request ‘Nice-to-have’ improvement, new feature or different behavior or design. label Jul 29, 2020
@jskeet
Collaborator

jskeet commented Jul 29, 2020

(Not sure whether to count this as a bug or a feature request, but it probably doesn't matter much.) @skuruppu I've assigned this to you to start with, given that you've already looked at the Java - but I'm happy to take it on if you'd prefer. (I wouldn't be able to get to it for a few weeks though.)

skuruppu added a commit to skuruppu/google-cloud-dotnet that referenced this issue Aug 7, 2020
For long-running PDML queries (>= 30mins), there's a possibility that
the gRPC connection is terminated with an error "Received unexpected EOS
on DATA frame from server".

We now retry the same transaction on this error.

Fixes googleapis#5209
skuruppu added a commit to skuruppu/google-cloud-dotnet that referenced this issue Sep 2, 2020
@skuruppu
Contributor

skuruppu commented Sep 3, 2020

I should keep the discussion in the issue instead of the PR.

@thiagotnunes let me know whether you want to test this PR against your test dataset or whether I should do it. I won't be able to do it this week, but I'm happy to try next week.

In case you want to do the test, you can use the Docker container by following the instructions here.

@thiagotnunes
Author

@skuruppu I can do the testing for this. Will let you know once it completes.

@skuruppu skuruppu assigned thiagotnunes and unassigned skuruppu Sep 3, 2020
@thiagotnunes
Author

@skuruppu the test completed successfully. Thanks for the fix.

skuruppu added a commit that referenced this issue Sep 7, 2020
* fix: retry PDML on EOS on DATA error

For long-running PDML queries (>= 30mins), there's a possibility that
the gRPC connection is terminated with an error "Received unexpected EOS
on DATA frame from server".

We now retry the same transaction on this error.

Fixes #5209

* test: added unit test to verify retry on error