"Cloud SQL IAM service account authentication failed for user ..." intermittent errors when connecting to Postgres #2212
Comments
Hi @otto-nordander-yubico thanks for raising an issue on the Cloud SQL Proxy! 😄 I will happily take a look into this, a couple clarifying questions:
I'll look at reproducing this on my end.
Let me know if there's anything else I can help with.
One thing you could try is enabling debug logging, which would help us understand whether the background refresh is behaving as expected. It's possible the token we're sending to the server has somehow expired. The other possibility is that the backend IAM check is failing for some other reason.
Yes that's a good point, here I ran with
After the proxy has reached this broken state, every connection attempt logs the following (the proxy must be restarted for it to work again):
Edit: perhaps some helpful information: when this happens I can see that the server asks the client for the password and it's sent. When it's working as expected, the password is never sent.
That's interesting. The way Auto IAM AuthN works is that the Proxy transmits the OAuth2 token to the server through the client certificate. So Postgres shouldn't be asking for a password. The fact that it is implies the token isn't being properly received by the server.
Perhaps another piece of the puzzle:
Okay, I think I've managed to pin it down. It's within What I see is that there's a race condition when starting the proxy and initiating the first connection. The Edit: with that in mind, it has nothing to do with having two connections; it's simply that the first connection gets a valid cert, and then the initial refresh overwrites the cert, so the next connection fails.
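To make the ordering problem described above concrete, here is a minimal Go sketch. It is not the connector's actual code: the `cert` and `cache` types, the `seq` field, and both `store` functions are hypothetical names invented for illustration. It also assumes, purely for the sake of the example, that the initial refresh yields a cert without the token. The sketch only shows how a refresh that starts first but finishes last can clobber a newer, valid result, and one possible way to guard against that.

```go
package main

import (
	"fmt"
	"sync"
)

// cert is a hypothetical stand-in for a client certificate that may or may
// not carry the OAuth2 token.
type cert struct {
	hasToken bool
	seq      int // order in which the refresh that produced it was started
}

// cache is a hypothetical certificate cache, not a type from the connector.
type cache struct {
	mu  sync.Mutex
	cur cert
}

// storeNaive keeps whichever refresh result arrives last.
func (c *cache) storeNaive(n cert) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.cur = n
}

// storeGuarded drops results from refreshes started before the cached one.
func (c *cache) storeGuarded(n cert) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if n.seq >= c.cur.seq {
		c.cur = n
	}
}

// get returns the cert the next connection would use.
func (c *cache) get() cert {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.cur
}

// simulate replays the ordering from the report: the refresh triggered by the
// first connection finishes first, then the initial refresh finishes late.
// It reports whether the next connection would see a cert with a token.
func simulate(store func(*cache, cert)) bool {
	c := &cache{}
	initial := cert{hasToken: false, seq: 1}  // assumption: lacks the token
	onConnect := cert{hasToken: true, seq: 2} // valid cert for first connection
	store(c, onConnect)
	store(c, initial)
	return c.get().hasToken
}

func main() {
	fmt.Println("naive: next connection has token =", simulate((*cache).storeNaive))
	fmt.Println("guarded: next connection has token =", simulate((*cache).storeGuarded))
}
```

With the naive store the late initial refresh wins and the next connection would present a cert without the token, which matches the symptom of the server asking for a password. Whether the real fix uses a sequence check like this is not stated in this thread.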
Nice find. I suspect this is due to GoogleCloudPlatform/cloud-sql-go-connector#771. I'll get this fixed. Thanks for your careful debugging.
Yes, that would explain why the certs were different in the first place. Anyway, let me know if you have a patch that you want help testing. Edit: I'm currently testing
This also manifests as #2224. This is definitely caused by GoogleCloudPlatform/cloud-sql-go-connector#771. I'll have a fix ready to try today or tomorrow.
PR to fix this in cloudsqlconn is here: GoogleCloudPlatform/cloud-sql-go-connector#806. We'll get that out in a release soon.
The latest release of the Go Connector has a fix: https://github.com/GoogleCloudPlatform/cloud-sql-go-connector/releases/tag/v1.10.1.
Nice, I can pull that in and run my tests to verify tomorrow!
Tested with
@otto-nordander-yubico Glad your testing went well 😄 Thanks so much for raising this issue, we greatly appreciate it! I will close this out; if the issue resurfaces, feel free to re-open it. 👏 👏 @enocom
Actually, I will wait until we release this officially before closing it out.
Version v2.11.3 has been released with the new Go Connector version and should now resolve this issue 🚀
Bug Description
Running cloud-sql-proxy, it sometimes gets stuck in a state where we can't connect, returning "Cloud SQL IAM service account authentication failed for user" errors.
When this happens the proxy container must restart before it works again, i.e. retrying doesn't help.
It seems to happen when we connect multiple times in a row (e.g. start the proxy, connect successfully, disconnect, connect again, and it hangs), but I'm not sure.
I've managed to reproduce it locally with the same error.
Example code (or command)
I've managed to reproduce with the following (note it's rather tricky to reproduce, and might require multiple tries):
Stacktrace
Steps to reproduce?
See example code.
Environment
cloud-sql-proxy --port 5432 --auto-iam-authn --impersonate-service-account=<SA> <postgres instance>
Additional Details
I swear I saw another issue on this that was resolved in this repo, but I can't find it any longer.