Helm upgrade with --wait is waiting even when all resources are ready #2426
Comments
Seeing this also, k8s 1.6.2, helm 2.4.1
@eduardobaitello Thank you for all of the details. I'll look to see what is happening.
Ok, I duplicated this and am working on a fix.
The current methodology generated its own RS slice instead of using a helper method that uses a `clientset`. This caused some issues where `FindNewReplicaSet` always returned `nil`. This switches the method and removes some unneeded API calls and code. Closes helm#2426
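To make the fix concrete, here is a minimal Go sketch of the clientset-backed approach. This is an illustration written against modern client-go types, not the actual code from PR #2430 (which calls the upstream `FindNewReplicaSet` helper); here the "new" ReplicaSet is approximated as the most recently created one controlled by the Deployment:

```go
// Hypothetical sketch, not Helm's code: list ReplicaSets through the
// clientset and pick the one controlled by the Deployment, instead of
// assembling an RS slice by hand (which left FindNewReplicaSet seeing
// no match and returning nil).
package wait

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func findNewReplicaSet(ctx context.Context, c kubernetes.Interface, d *appsv1.Deployment) (*appsv1.ReplicaSet, error) {
	rsList, err := c.AppsV1().ReplicaSets(d.Namespace).List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	var newest *appsv1.ReplicaSet
	for i := range rsList.Items {
		rs := &rsList.Items[i]
		if !metav1.IsControlledBy(rs, d) {
			continue // skip ReplicaSets owned by other controllers
		}
		if newest == nil || newest.CreationTimestamp.Before(&rs.CreationTimestamp) {
			newest = rs
		}
	}
	return newest, nil // nil if the Deployment has no owned ReplicaSet yet
}
```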
@eduardobaitello Can you give #2430 a try if you have some time and see if it solves the issue for you?
@thomastaylor312, thanks for the quick help!
I'm still having problems in 2.4.2 with this. Even if I've changed nothing in my release, tiller ends up waiting the full timeout. No pods are unready and nothing has changed, so everything is already ready, but the wait doesn't seem to detect this.
Can confirm that after upgrading to 2.4.2 I am also having a similar problem (but on a Deployment with
In my case downgrading to 2.4.1 fixes my issue, so #2430 is directly implicated.
Admittedly my issue may be for want of kubernetes/kubernetes#41740 (unlike the others in this thread, I am not on 1.6 yet).
I only tried 2.3.2 (works) and 2.4.2 (doesn't work), so I'll also give 2.4.1 a shot to help confirm it's something in 2.4.2 specifically (rather than the much larger 2.3.2 -> 2.4.2 set of changes).
I am going to reopen this one since multiple people are having issues. If you have a chart you can share (or one of the main stable charts) that exhibits this issue, please let me know, because I tried several charts and had trouble reproducing it.
FWIW, I thought I'd hit this bug, but upon inspection it was because I was using an external service. I proposed a fix for external services here: #2497. Mentioning it in case anyone viewing this is using external services too.
I tested with 2.4.1 and it has the same issue as 2.4.2, so it's definitely something in 2.4.x, but I'm not sure where.
I'd also like to note I'm not using externalServices; my deployments are all at their desired replica count, no pods are unready, etc.
@chancez can you please post the
Tiller logs are here: https://gist.github.com/chancez/2d632496799632298efa0ccf9fa70f9d, but I don't have the output of the resources from that moment. I can assure you they're in an unchanged state, since I'm testing the helm upgrade without having made any changes, as the logs indicate. I'll try another helm run with 2.4.2 and capture the kubectl output as well.
FWIW, I'm seeing this with
Ok, I narrowed down where this is happening. I could not duplicate this on 1.6.4 (on Minikube), but I could duplicate the issue with 1.5.3 (on Minikube). The problem is that the new ReplicaSet is returning `nil`. @adamreese, I went as far down the stack as I understand; do you have any idea what could be causing this?
Thanks @thomastaylor312! Hopefully you're on the trail. Did that code differ between helm 2.3.x and 2.4.x? We didn't see this issue with 2.3.x with everything else being identical. In case it is relevant, I am also using CoreOS clusters (kube-aws) like @baracoder, and the cluster it failed in was k8s 1.5.x.
@whereisaaron We made a change to be smarter about finding the new ReplicaSet for a Deployment. If the ReplicaSet either lacks an owner reference or has it set improperly, that's the place to look for telltale signs of failure. Is there something with how that works in k8s 1.5?
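To illustrate the owner-reference angle, here is a hedged Go sketch (assumed modern client-go types; clusters before k8s 1.6 often did not set controller references on ReplicaSets at all). If the controller reference is missing or points elsewhere, any helper that matches ReplicaSets to their Deployment this way comes up empty:

```go
// Hedged illustration only: check whether a ReplicaSet carries a controller
// reference back to the given Deployment. A nil result from GetControllerOf
// (common on pre-1.6 clusters) is one way "find the new ReplicaSet" logic
// can end up with nothing to return.
package wait

import (
	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func ownedByDeployment(rs *appsv1.ReplicaSet, d *appsv1.Deployment) bool {
	ref := metav1.GetControllerOf(rs)
	if ref == nil {
		return false // no controller reference set at all
	}
	return ref.Kind == "Deployment" && ref.Name == d.Name && ref.UID == d.UID
}
```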
According to the VCS, that area of the code has seen some action over the last eight months. Issue kubernetes/kubernetes#33845 and PR kubernetes/kubernetes#35676 sound relevant, and the latter looks like it didn't make it in until version 1.6.
Hi @thomastaylor312, I just upgraded to helm 2.5.1 for a k8s 1.5 cluster and can confirm this problem still exists.
helm 2.3 was the last version of helm where `--wait` worked for us.
Upgrading to a 1.7 k8s cluster resolved this issue for me!
I'm seeing this with helm 2.7.2 on k8s 1.8.3-gke.0. In my dev env with
The only other thing to note is this chart runs a batch job as a post-install task. Even when that job finishes, helm still hangs. Perhaps the problem is that the job has a status of
@boosh Do both of your pods show up as ready while it is hanging?
@thomastaylor312 Yes
Hi, is there a solution to this in a 1.5.* cluster?
@alexppg No, and even on the latest version I find it problematic/unreliable. I just don't use the `--wait` option.
Shouldn't there then be an open issue about it? This one or another. PS: thanks, I'll do that, @whereisaaron.
We are also getting this issue with helm v2.10.0 on Kubernetes v1.8.2.
We have an internal chart, which we intend to upgrade via the CI/CD pipeline using the following command, and the upgrade process keeps waiting.
We then have to kill the process, and the chart release history shows `PENDING_UPGRADE` for some unknown amount of time before either it says `DEPLOYED` (during some builds) or fails with `Upgrade "<chart_release>" failed: timed out waiting for the condition`. Is this solved, or are we missing something?
I have the same issue with install/upgrade with `--wait`.
Same here. But all resources were created.
Looks like the workaround suggested by @tcolgate also works for this issue. I just fixed a `FAILED` release using
Looks like in my case there was a real problem: one of the pods was stuck in `Pending`. I added one more node and now it works! Why did it look like this bug when it wasn't? Because by the time I checked the CI result, k8s had already deployed the pod that was in `Pending`.
DaemonSets have the same problem.
Use `--debug` by itself (without `--wait`) to see which resource is pending; it may be different from what the chart defines, and that can cause the hang.
Hey guys, still having the issue, any ideas? Currently using the helm 3.2.0 image.
Same issue. All pods are running, but `helm upgrade` is still waiting. `version.BuildInfo{Version:"v3.3.4", GitCommit:"a61ce5633af99708171414353ed49547cf05013d", GitTreeState:"clean", GoVersion:"go1.14.9"}`
Same problem here.
Hi, I got the same problem with minikube.
I just installed a new release, then I upgraded the `image` tag for one of my Deployments inside a chart. After that, I executed `helm upgrade mychart myrelease --wait --timeout 9999`. Tiller logs only detected changes in the Deployment described above, creating a new pod and ReplicaSet with the new image. Then the upgrade process got stuck in "waiting for resources", even when all my pods were ready and running:

2017/05/10 18:24:01 wait.go:47: beginning wait for resources with timeout of 2h46m39s

So I figured out that the problem is that a new ReplicaSet was created, but helm seems to also be waiting for the old one to get all resources ready. I manually deleted the old ReplicaSet (`old-replica-set-320589364`), and the upgrade completed.

I'm using Kubernetes `v1.6.0` with Helm `v2.4.1`.
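The report above points at the core of the bug: the wait loop should judge a Deployment ready by its new ReplicaSet alone, so a lingering old ReplicaSet cannot stall it. A rough Go sketch of that idea (illustrative only; Helm's real check also accounts for rolling-update settings such as `maxUnavailable`):

```go
// Illustrative sketch: consider a Deployment ready when its *new* ReplicaSet
// reports the desired number of ready pods, ignoring old ReplicaSets that
// are still being scaled down.
package wait

import appsv1 "k8s.io/api/apps/v1"

func deploymentReady(d *appsv1.Deployment, newRS *appsv1.ReplicaSet) bool {
	if newRS == nil || d.Spec.Replicas == nil {
		return false // no new ReplicaSet found yet, or replica count unset
	}
	return newRS.Status.ReadyReplicas >= *d.Spec.Replicas
}
```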