-
Notifications
You must be signed in to change notification settings - Fork 344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Connecting to cloud-sql using private-ip sometimes fails with a TLS handshake timeout #2208
Comments
Thanks @akshetpandey. The root problem seems to be this:
The SQL Admin API call isn't responding and the Proxy dies. We recently added retry support for 50x responses here: GoogleCloudPlatform/cloud-sql-go-connector#781. I wonder if we should extend that to include more generic TLS errors. |
I ended up changing my script to check if the pid is still alive and to restart it if it is not. An internal retry will definitely address the issue too. I do want to add that something seems fishy here. Not sure if its the container, dataflow, sql admin, network routing, dns, or something else, but the error happens way too frequently. I don't have concrete data but the failure rate I am seeing implies that Do note that this isn't the first request made in the flow. My script successfully hits the metadata server first and then this fails. |
What kind of CPU usage do you have on this instance? Wondering if this is a client error. |
n1-highmem-4, cpu usage at that point is pretty low. |
How many instances are you connecting to in your script? |
Just the 1 |
Bug Description
I am running v2.9.0/cloud-sql-proxy.linux.amd64 on a GCE
n1-highmem-4
instance that is started through dataflow. The binary runs inside a u22 base image container.As part of the container entry point, I run the following script:
Occasionally, I will get the following error:
And then the script gets stuck in an infinite loop, because cloud-sql-proxy quits instead of trying to connect again on the next attempt. Some of it is my fault, I should be using a process manager, but the timeout is unexpected.
The gce instance should not be throttling, and it runs in the same region as the cloud-sql instance. I do not know how to check if the other side of the auth-proxy is having issues. A lot of other instances are also connecting to is (mostly GAE), and I also see these similar issues on them occasionally.
PS: This is a new issue as a follow up for this comment I posted in a different issue: #2081 (comment)
Steps to reproduce?
No easy reproduction steps, since this happens occasionally (~a few times a week).
Environment
./cloud-sql-proxy --version
):v2.9.0
./cloud-sql-proxy --port 5432 INSTANCE_CONNECTION_NAME
):/bin/cloud_sql_proxy --private-ip -u /cloudsql $INSTANCE &
The text was updated successfully, but these errors were encountered: