Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(bigtable): Retry on RST_STREAM error #9673

Merged
merged 5 commits into from
May 16, 2024
Merged

Conversation

bhshkh
Copy link
Contributor

@bhshkh bhshkh commented Mar 31, 2024

Fixes: #6476

Issue: When operation fails with error "stream terminated by RST_STREAM with error code: INTERNAL_ERROR", it is not retried even though spanner client retries on same error.

Fix: Currently, gax.Invoke is retried only on DeadlineExceeded, Unavailable and Aborted. This PR modifies this behavior to retry on Internal error if error message contains "stream terminated by RST_STREAM"

@product-auto-label product-auto-label bot added the api: bigtable Issues related to the Bigtable API. label Mar 31, 2024
@bhshkh bhshkh marked this pull request as ready for review April 1, 2024 14:39
@bhshkh bhshkh requested review from a team as code owners April 1, 2024 14:39
@bhshkh bhshkh added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Apr 2, 2024
@bhshkh
Copy link
Contributor Author

bhshkh commented Apr 2, 2024

Do not merge until release freeze ends mid April

@bhshkh bhshkh removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Apr 15, 2024
@bhshkh bhshkh enabled auto-merge (squash) April 15, 2024 22:43
@bhshkh bhshkh disabled auto-merge April 16, 2024 00:10

func (r *bigtableRetryer) Retry(err error) (time.Duration, bool) {
if status.Code(err) == codes.Internal &&
strings.Contains(err.Error(), "stream terminated by RST_STREAM") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a comment with context for why this is a special case, with a link back to this bug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added


func (r *bigtableRetryer) Retry(err error) (time.Duration, bool) {
if status.Code(err) == codes.Internal &&
strings.Contains(err.Error(), "stream terminated by RST_STREAM") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a way to make this more flexible? Like take in a list and/or regex string describing the retryable error messages?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

}

func (r *bigtableRetryer) Retry(err error) (time.Duration, bool) {
if status.Code(err) == codes.Internal &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume it's not feasible to retry all InternalErrors?

Have you discussed this with Igor or Mattie? They might have thoughts about what can be retried safely. I wonder if we should be considering this for other clients

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@igorbernstein2 , this should be done for all other clients, right?
Also, should all internal errors be retried ?

Copy link
Contributor Author

@bhshkh bhshkh May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checked with Igor regarding which errors need to be retried. Shared the document with the team. Will update the PR

Copy link
Contributor Author

@bhshkh bhshkh May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created a separate issue to track work on rest of the errors: #10207

@bhshkh bhshkh requested a review from a team as a code owner May 7, 2024 06:35
@bhshkh bhshkh enabled auto-merge (squash) May 16, 2024 21:08
@bhshkh bhshkh merged commit d4da4a5 into googleapis:main May 16, 2024
8 of 10 checks passed
@bhshkh bhshkh deleted the fix/issue-6476 branch May 16, 2024 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigtable Issues related to the Bigtable API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bigtable: stream terminated by RST_STREAM with error code: INTERNAL_ERROR is not retried by client
3 participants