fix: retry Get for 500 and 503 error from GCE metadata server #984

Open
wants to merge 2 commits into
base: main
from

baeminbo

Currently, metadata requests are retried only on transport errors, not on retryable HTTP status codes.

The GCE metadata documentation suggests retrying on 503. In addition, the GCE metadata server also returns 500 during intermittent unavailability.

If this happens during a token refresh, the intermittent 500 or 503 error is propagated as a RefreshError. RefreshError is not retryable in the python-api-core library, so a single intermittent, retryable error from the GCE metadata server leads to a GCP API call failure.

To mitigate this, I asked for RefreshError to be retried in [1], but the team suggested adding the retry at the auth layer instead [2].

[1] googleapis/python-api-core#312
[2] googleapis/python-api-core#313 (comment)
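
For illustration, the behavior described above (retrying on 500 and 503 as well as on transport errors) could be sketched roughly as follows. This is not the code in this PR: `get_with_retry`, `TransportError`, `retry_count`, and `backoff_seconds` are hypothetical names, and `request` is assumed to be any callable that returns an object with `status` and `data` attributes.

```python
import http.client
import time

# Status codes treated as transient in this sketch (500 and 503).
RETRYABLE_STATUS_CODES = (
    http.client.INTERNAL_SERVER_ERROR,
    http.client.SERVICE_UNAVAILABLE,
)


class TransportError(Exception):
    """Stand-in for a transport-level failure (hypothetical, for illustration)."""


def get_with_retry(request, url, retry_count=5, backoff_seconds=0.0):
    """Call request(url) until it returns a response with a non-retryable status.

    Both transport errors and responses with a retryable status count as a
    failed attempt. After retry_count failed attempts, TransportError is raised.
    """
    last_error = None
    for attempt in range(retry_count):
        try:
            response = request(url)
        except TransportError as exc:
            last_error = exc
        else:
            if response.status not in RETRYABLE_STATUS_CODES:
                return response
            last_error = TransportError(
                f"retryable status {response.status} from {url}"
            )
        # Exponential backoff between attempts, skipped after the last one.
        if attempt < retry_count - 1:
            time.sleep(backoff_seconds * (2 ** attempt))
    raise TransportError(
        f"giving up on {url} after {retry_count} attempts"
    ) from last_error
```

With this shape, a single intermittent 500 or 503 costs one extra request instead of surfacing as a failed refresh.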

@arithmetic1728
Contributor

Linked with issue #980.

@TimurSadykov
Member

TimurSadykov commented Mar 4, 2022

#980

@arithmetic1728 #980 is about the token endpoint, while this change addresses retries to the metadata endpoint.

@TimurSadykov
Member

TimurSadykov commented Mar 4, 2022

@arithmetic1728 I think we first need to address #980 and add a Retryable interface; then we can leverage it here to address metadata retries. Most likely we will opt for passing retryable errors to the client instead of performing the retries in the library.
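
A rough sketch of what "retryable errors passed to the client" could look like, assuming the error simply carries a flag that the caller inspects. The class and function below are hypothetical stand-ins for illustration only; the interface that comes out of #980 may look different.

```python
class RefreshError(Exception):
    """Stand-in for an auth refresh failure (not the real google-auth class)."""

    def __init__(self, message, retryable=False):
        super().__init__(message)
        # The library only marks the error; it does not retry by itself.
        self.retryable = retryable


def call_with_client_retry(refresh, max_attempts=3):
    """Client-side loop that retries only errors marked as retryable."""
    for attempt in range(1, max_attempts + 1):
        try:
            return refresh()
        except RefreshError as exc:
            # Re-raise immediately for non-retryable errors, or once the
            # attempt budget is used up.
            if not exc.retryable or attempt == max_attempts:
                raise
```

In this arrangement the retry policy (attempt count, backoff) stays with the client, and the auth layer only classifies the failure.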

@TimurSadykov
Member

@baeminbo Hi, could you please provide any stats on the metadata service errors that you are trying to mitigate?

3 participants