spanner-client: Retry PDML on "Received unexpected EOS on DATA frame from server" #7104
Comments
@jiren hey 👋 , could you help us debug this problem? Here you can see the example I am trying to run and the error I am seeing: https://gist.github.com/thiagotnunes/1e18081d589ad690ebc962b2ed780244 I have tried tweaking the |
@thiagotnunes, looking into it. |
@thiagotnunes I ran the PDML statement from the provided gist script 20 times against 5 million records, but did not get any error. |
@jiren thanks for trying this out.
|
@thiagotnunes Thanks for info. One thing more |
@thiagotnunes Please ignore previous message. I have access to |
@jiren no worries, happy to help with anything! |
@jiren while investigating this problem in the PHP client library, it seemed like we could not reproduce the problem. Let me know if this occurs here as well. |
@thiagotnunes I am not able to reproduce the exact error above, but after a few minutes I get the error below. #<GRPC::Internal: 13:Received RST_STREAM with error code 2. debug_error_string:{"created":"@1597912702.090361000","description":"Error received from peer ipv6:[2404:6800:4009:805::200a]:443","file":"src/core/lib/surface/call.cc","file_line":1062,"grpc_message":"Received RST_STREAM with error code 2","grpc_status":13}> After this error, the client library tries to resume. I ran the tests a few times, but none of them succeeded; every run ended with a timeout error. Issue location: google-cloud-ruby/google-cloud-spanner/lib/google/cloud/spanner/results.rb Lines 143 to 148 in 0db8a98
|
Thanks for the feedback @jiren. I am testing a small fix on my side, based on your findings, to see if it solves the problem at hand. If it does, I will add you to the PR. |
This bug is related to the Spanner client library.
For long-lived transactions (>= 30 minutes) that apply large PDML changes, the gRPC connection may be terminated with the error "Received unexpected EOS on DATA frame from server".
In this case, we need to retry the transaction, either from the resume token received while reading the stream or from scratch. This ensures that the PDML transaction keeps executing until it succeeds or a hard timeout is reached.
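The retry behaviour described above could look roughly like the sketch below. It is not the client library's implementation: `StreamInterrupted`, `execute_with_resume`, and the block contract (the block takes a resume token, where `nil` means "from scratch", and returns an enumerable of chunks that respond to `resume_token`) are hypothetical names chosen for illustration. In the real client, the rescued error would presumably be the gRPC error seen in this thread.

```ruby
# Hypothetical stand-in for the transport error (e.g. "Received unexpected
# EOS on DATA frame from server"); not a class from the Spanner client.
class StreamInterrupted < StandardError; end

# Reads a stream to completion, restarting it when it is interrupted.
# The block is called with the last resume token seen (nil on the first
# attempt, meaning "start from scratch") and must return an enumerable of
# chunks that respond to #resume_token.
# Returns the total number of chunks read across all attempts.
def execute_with_resume(deadline_seconds:)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + deadline_seconds
  resume_token = nil
  chunks_seen = 0
  begin
    stream = yield(resume_token)
    stream.each do |chunk|
      chunks_seen += 1
      token = chunk.resume_token
      # Only remember non-empty tokens; an empty token cannot be resumed from.
      resume_token = token unless token.nil? || token.empty?
    end
    chunks_seen
  rescue StreamInterrupted
    # Hard timeout: give up and surface the error once the deadline passes.
    raise if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    retry # re-runs the begin block with the latest resume token
  end
end
```

Because `resume_token` lives outside the `begin` block, each retry continues from the last token observed, and falls back to starting from scratch only when no token was ever received.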
We have already implemented this change in the Java client library; for more information, see this PR: googleapis/java-spanner#360.
To test the fix, we can use a large Spanner database. Please speak to @thiagotnunes for more details.