Improve Credential Resolution on GCE #1261

Open
ijrsvt opened this issue Mar 28, 2023 · 3 comments
Assignees
Labels
status: investigating The issue is under investigation, which is determined to be non-trivial.

Comments

@ijrsvt
Contributor

ijrsvt commented Mar 28, 2023

Is your feature request related to a problem? Please describe.
We have intermittent failures when trying to run google.auth.default() on GKE. We get the following error, even though the metadata service will eventually come up:

google.auth.exceptions.DefaultCredentialsError: Could not automatically determine credentials. Please set GOOGLE_APPLICATION_CREDENTIALS or explicitly create credentials and re-run the application. For more information, please see https://cloud.google.com/docs/authentication/getting-started

Describe the solution you'd like
The Golang client checks for the presence of GCE environment variables (i.e. GCE_METADATA_HOST) before trying to communicate with the service (see https://code-review.googlesource.com/c/gocloud/+/5200). This would be preferable because we already know that we are on GCE.
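
A minimal sketch of the kind of check being requested, in Python. The helper name `gce_env_hint` is hypothetical and is not part of google-auth; the only thing taken from this issue is the variable name `GCE_METADATA_HOST`. It only helps if that variable is actually set in the workload's environment.

```python
import os


def gce_env_hint(environ=None):
    """Return True if the environment explicitly names a metadata host.

    Hypothetical helper: it inspects GCE_METADATA_HOST only and makes no
    network calls, so it cannot fail because the metadata server is slow
    to start.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("GCE_METADATA_HOST"))
```

With a check like this, credential resolution could treat a `True` result as "we are on GCE" and keep waiting for the metadata server instead of giving up after a failed ping.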

Describe alternatives you've considered

  • Increase metadata ping timeout
@wangyutongg
Contributor

In the failure cases, how long does it normally take for the metadata server to respond to the ping? The default timeout is 3 seconds, but in Python it can be overridden with the environment variable GCE_METADATA_TIMEOUT. Could you give that a try?
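
For anyone trying this, a minimal sketch of setting the variable from Python. The value `10` is only an example, and setting it before importing `google.auth` is a precaution based on the assumption that the library may read the variable at import time. The google-auth calls are left as comments so the snippet runs without the library installed.

```python
import os

# Example value in seconds; pick whatever suits your startup behaviour.
# Set this before google.auth is imported (assumption: the library may
# read the variable once, at import time).
os.environ["GCE_METADATA_TIMEOUT"] = "10"

# import google.auth
# credentials, project_id = google.auth.default()
```

The same effect can be had by setting the variable in the container or pod spec, which avoids any import-order concerns.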

@ijrsvt
Contributor Author

ijrsvt commented Mar 28, 2023

@wangyutongg
We aren't 100% sure how long it takes for the metadata server to respond to the ping, but we're pretty sure that the GKE metadata server is not ready & the connections are being refused. We think this because we've tried bumping the GCE_METADATA_TIMEOUT to something like 10 seconds, and still see the issue.

If we could configure the number of retries to be >3, that could also help, but that's just adding another knob to tune.
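
Until retries are configurable in the library, an application-side workaround is possible. This is a minimal sketch, not google-auth's own retry logic: `call_with_retries` and its parameters are hypothetical names, and the backoff values are arbitrary.

```python
import time


def call_with_retries(fn, attempts=5, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn() and retry on the given exceptions with exponential backoff.

    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            # Wait 1x, 2x, 4x, ... base_delay between attempts.
            sleep(base_delay * 2 ** attempt)
```

Usage would look something like `call_with_retries(google.auth.default, retry_on=(google.auth.exceptions.DefaultCredentialsError,))`, which gives the metadata server time to come up without touching the library.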

@wangyutongg wangyutongg self-assigned this Mar 28, 2023
@wangyutongg wangyutongg added the status: investigating The issue is under investigation, which is determined to be non-trivial. label Mar 28, 2023
@wangyutongg
Contributor

@ijrsvt This doesn't look like a timeout issue: when your workload starts to run, the metadata server is not ready yet. Increasing or configuring the number of retries does not sound like a good option. Let me do some investigation.


2 participants