Helm Installs Don't Honor Timeout #2025
I wonder if the port forwarder also needs to be given the timeout window. @adamreese do tunnels close if idle for some amount of time?
@technosophos Tunnels will remain open until closed.
No, I am not seeing the pods restart.
I have the same problem without using helm at all, just issuing this to connect to OpenShift:
So this may be an upstream issue rather than a Helm-specific one. Can we get a list of Kubernetes versions/distributions that this is showing up in?
Having the same problem.
Helm version: 2.5.1
As @bacongobbler suggested, I'm summarizing my experience with the same problem here. Helm 2.6.1 client and server. When I run
Yet the delete seems to have succeeded. All the k8s resources defined by the chart are cleaned up as expected. The tiller log looks like this:
That's the end of the log. It contains no errors or exceptions, and k8s reports no restarts of the tiller pod. The elapsed time, from outside and from the log, is about 2m 30s, well under the default timeout value for the delete operation. Also, on #2983 @bacongobbler asked "Can you check on the load balancer fronting your kubernetes master API ...?" To my knowledge there is no LB in front of the master in this dev cluster, but I'll double-check; if there is, is there anything specific I should be looking for?
Same issue on k8s 1.6.7 when installing coreos/prometheus-operator. Command is: Error is:
Per @technosophos' request:
Helm 2.7.2, but I don't think that matters. The relevant line in Kubernetes is here: https://github.com/kubernetes/kubernetes/blob/v1.8.3/staging/src/k8s.io/client-go/tools/portforward/portforward.go#L177-L178 I guess we need to figure out what's triggering that close.
I may be way off, but ultimately the connection in question is created here: https://github.com/kubernetes/client-go/blob/master/tools/portforward/portforward.go#L138 It is an httpstream.Connection. That interface specifies a SetIdleTimeout method, which is probably implemented here: https://github.com/kubernetes/apimachinery/blob/master/pkg/util/httpstream/spdy/connection.go#L141-L145 I'm wondering if maybe the genuine bug we're seeing here is that the idle timeout is left unset, and so somewhere buried in Docker's spdystream code the connection is being closed as idle. At the moment there's no way to put an idle timeout on that connection between its creation and its use. Am I on the right track? This issue will prevent helm chart hierarchies above a certain size from being installed, I think.
I think it's inevitable to have this kind of problem when doing long-running HTTP queries. It's normal for load balancers to kill idle connections. For example, if kube-apiserver is sitting behind an ELB, consider increasing its idle timeout. I think a proper solution for this problem is to switch to polling, i.e. wait a few minutes for resources to be ready, close the connection, and set up a timer to poll tiller periodically. The disadvantage of this approach is that helm will have to establish a new connection to tiller every few seconds, but that's certainly better than timing out.
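A rough shell approximation of that polling idea, for anyone who needs a workaround today (the release name, deployment name, poll interval, and attempt count are all illustrative, not part of helm):

```sh
# Skip helm's --wait (which holds one long-lived connection open) and instead
# poll readiness with short-lived requests that no LB idle timeout can kill.
helm upgrade --install myrelease ./mychart

for attempt in $(seq 1 60); do
  ready=$(kubectl get deploy myapp -o jsonpath='{.status.readyReplicas}')
  desired=$(kubectl get deploy myapp -o jsonpath='{.spec.replicas}')
  if [ "${ready:-0}" -eq "${desired:-1}" ]; then
    echo "myrelease is ready"
    break
  fi
  sleep 10
done
```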
This was fixed in 2.8.0, and now 2.8.2 is coming out with an improved version of that (the transport closing fixes).
It might've been fixed by proxy, but I don't think this was directly fixed. Same error, different area of the code.
@bacongobbler oh hmm, which part? When I was running into transport closed issues it was with portforward as well, since AWS NAT Gateways were closing it (#3182); that's why I am curious which area it is.
We fixed
|
They're sorta confusing, but essentially:
|
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
I'm having all kinds of timeouts, both on tiller and on install (transport is closing as well), using helm 2.9.0. It's really annoying; there's an Azure LB involved, but I already set the idle timeout to 10 minutes. Even using
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Still encountering the same problem with install --timeout (however, the timeout seems much shorter than 5 minutes; it appears to be a non-configurable 1 or 2 minutes).
/remove-lifecycle rotten
As far as I understand, nobody in the community is currently looking into this particular issue. If you determine what the underlying issue is, we'd appreciate a patch!
From what I can tell, the timeout is only not honored when attempting to do things remotely. If I log into one of my Kubernetes nodes and issue the commands from there, the operation sits there until it completes fully, whereas if I try to do it remotely it times out fairly quickly. This lends credence to the theory that the issue lies with whatever load balancer is in front of Kubernetes.
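(A hypothetical illustration of that comparison; the hostname, release name, and chart path are placeholders:)

```sh
# From a remote workstation, going through the LB: times out early.
helm upgrade myrelease ./mychart --wait --timeout 600

# From a cluster node, bypassing the LB: runs until the operation completes.
ssh user@k8s-node-1 'helm upgrade myrelease ./mychart --wait --timeout 600'
```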
I hit this as well; it's kinda annoying since we have a big deployment.
Still getting this error when trying to upgrade a helm release with the --wait option.
The only workaround that I found is to add your tiller service address as the --host option:
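(A hypothetical reconstruction of that workaround; it assumes the default tiller-deploy service in kube-system on port 44134, and that the cluster IP is reachable from where helm runs, e.g. from inside the cluster or over a VPN:)

```sh
# Point the helm client straight at the tiller service, bypassing the
# kubectl port-forward tunnel that keeps getting closed.
TILLER_HOST="$(kubectl -n kube-system get svc tiller-deploy \
  -o jsonpath='{.spec.clusterIP}'):44134"
helm upgrade myrelease ./mychart --wait --host "$TILLER_HOST"
```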
|
We are also seeing a similar issue when we do helm install/upgrade/delete/ls. Any thoughts? For example:
|
@gvenka008c have you had a look at #2025 (comment)?
@bacongobbler Yes, checked that. We have one node where helm is installed, and tiller runs on our k8s cluster nodes (3 nodes). We do all helm installs from this server, which talks to the k8s cluster. Not sure if our proxy is blocking the traffic or timing out. Let me check on that. Thanks.
Any progress?
What is the update?
Try with helm 3.0 ;-)
I wouldn't count on upgrading to 3.0 being the fix. If @kotamahesh and @torubylist are experiencing the same issues as @badloop described back in 2018 (#2025 (comment)), then the issue isn't with Helm; it's with the load balancer fronting the Kubernetes API server, which is closing the long-running connection too early. It's worth giving a shot, at the very least.
@torubylist and @kotamahesh, if you wouldn't mind sharing your experiences, that would be most helpful. That way we can help try to diagnose the issue you are seeing and direct you towards a potential solution.
@bacongobbler, thanks for the response.
Hi @bacongobbler, after the network issues were resolved, I am unable to reproduce the issue.
@bacongobbler Hello, we had the same issue. We are using OpenStack on-premises. I had this problem only on a Kubernetes multi-master deployment, where kube-apiserver is deployed behind an LB (Octavia). Our resolution was to increase the idle timeout on the LB.
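(For anyone on a similar setup, a hypothetical illustration of that kind of change; the listener name and the 10-minute value are placeholders:)

```sh
# Raise the data-plane idle timeouts (in milliseconds) on the Octavia
# listener fronting kube-apiserver, so long-running watches survive.
openstack loadbalancer listener set \
  --timeout-client-data 600000 \
  --timeout-member-data 600000 \
  kube-apiserver-listener
```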
|
Running into a similar issue with AWS EKS and a helm (2.16.1) upgrade with
With the following accompanying tiller log:
As per the hook configuration, the job isn't deleted by Helm, and in Kubernetes it actually continues to run and completes in a couple of hours. I tend to think this is caused by what @andreychernih mentioned above in #2025 (comment), but how could I verify that it is actually the AWS EKS API server that terminates the "watch" operation for the job and fails the helm deployment?
You could start a pod inside of your cluster, install the Helm client there, and run the deployment entirely inside of the cluster. That would only rule in/out some things (like whether a load balancer in the middle was terminating the connection), but it is at least a good debugging step that should provide some useful information.
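For example, something along these lines (the image, helm version, and service address are illustrative assumptions, not a verified recipe):

```sh
# Start a throwaway pod inside the cluster.
kubectl run helm-debug --rm -it --image=alpine:3.12 -- sh

# Inside the pod: fetch a helm client and talk to tiller directly over the
# cluster network. If this succeeds where the remote client times out, the
# load balancer in front of the API server becomes the prime suspect.
apk add --no-cache curl
curl -fsSL https://get.helm.sh/helm-v2.16.1-linux-amd64.tar.gz | tar xz
./linux-amd64/helm ls --host tiller-deploy.kube-system:44134
```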
@andrvin Any clue why specifying --host works around the problem?
This issue might be similar to what is reported in kubernetes/kubernetes#67817. We were running into this issue as well, and fixed it with this PR: proofpoint#16. I would be more than happy to submit the PR here if that looks OK. Thanks.
This issue should (finally) be fixed with #8507, which will become available in Helm 3.4.0. Let us know if that does not fix the issue present here. Thanks!
I have tried to install Spinnaker via Helm many times. Most fail, a lot of them with this error:
Note that they fail in just a couple of minutes, not the 25 minutes specified. Shouldn't they be honoring the specified timeout?