
werf helm upgrade with atomic flag and timeout #5550

Open · 1 task done

AdamMachera opened this issue Apr 21, 2023 · 1 comment

Before proceeding

  • I didn't find a similar issue

Version

v1.2.214+fix3

How to reproduce

Have a pod that is part of a Helm chart and takes 10 minutes to start.
Run werf helm upgrade --install with --timeout 15m and --atomic set.
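For illustration only, a pod like that can be modeled with a deployment whose readiness probe fails for the first ~10 minutes (a hypothetical template; the name, image, and probe timings are placeholders, not taken from the actual chart):

```yaml
# Hypothetical reproduction template: the container starts, but its
# readiness probe keeps failing for roughly the first 10 minutes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slow-start
spec:
  replicas: 1
  selector:
    matchLabels:
      app: slow-start
  template:
    metadata:
      labels:
        app: slow-start
    spec:
      containers:
        - name: slow-start
          image: example/slow-start:latest  # placeholder image
          ports:
            - containerPort: 8088
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8088
            periodSeconds: 10
            # The app needs ~10 minutes before /healthz responds, so
            # early probes fail with "connection refused", as in the
            # log below.
            failureThreshold: 70
```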

Result

I wanted to add some insights into my werf helm upgrade.
I have --atomic and --timeout 15m set.
If I run the same upgrade without werf, it finishes in around 10 minutes.

Operation start:

2023-04-21T11:03:24.5894899Z werf helm upgrade --namespace somens --create-namespace --install --atomic --timeout 15m --wait -f /home/vsts/work/r1/a/Backend/drop/charts/some-image-values.yaml --set container.image.repository=***/some-image --set container.image.tag=2023.4.21-8052-f2d1839 --set ingress.hosts[0].host=somename.somedomain.com --set serviceAccount.enabled=true --set serviceAccount.workloaduserAssignedIdentityID=00439251-6e2c-4959-8391-40108fe3e782 --set serviceAccount.accountName=someaccount --set secretProvider.userAssignedIdentityID=1238ab04-4bd0-4610-a851-2df69b505c36 --set secretProvider.keyvaultName=kv-somevault --set secretProvider.resourceGroup=some-rg --set secretProvider.subscriptionId=*** --set replicaCount=1 --set autoscaling.enabled=true --set autoscaling.minReplicas=1 --set autoscaling.maxReplicas=3 --set autoscaling.targetCPUUtilizationPercentage=85 --set autoscaling.targetMemoryUtilizationPercentage=85 --set resources.requests.cpu=50m --set resources.requests.memory=256Mi --set resources.limits.cpu=150m --set resources.limits.memory=384Mi --set nodeSelector.pool=*** --version 2023.4.21-8052-f2d1839 some-image ./some-image-2023.4.21-8052-f2d1839.tgz

Operation end:

2023-04-21T11:11:24.9373951Z Error: UPGRADE FAILED: release some-image failed, and has been rolled back due to atomic being set: error processing rollout phase stage: error tracking resources: deploy/some-image failed: po/some-image-64d8fdc484-rxr6g container/some-image: Unhealthy: Readiness probe failed: Get "http://10.5.0.125:8088/healthz": dial tcp 10.5.0.125:8088: connect: connection refused

The process ran for only 8 minutes. Any idea why?

At the end, just before the rollback, I see this:

2023-04-21T11:11:24.6268959Z │ ┌ Status progress
2023-04-21T11:11:24.6274161Z │ │ DEPLOYMENT                       REPLICAS      AVAILABLE      UP-TO-DATE
2023-04-21T11:11:24.6289280Z │ │ events                           4->3/3        3              3
2023-04-21T11:11:24.6290215Z │ │ │   POD                          READY      RESTARTS      STATUS
2023-04-21T11:11:24.6290747Z │ │ ├── events-64d8fdc484-7g8t2      1/1        2             Running->Terminating
2023-04-21T11:11:24.6291983Z │ │ ├── events-64d8fdc484-pzfj4      0/0        0             -
2023-04-21T11:11:24.6292464Z │ │ ├── events-64d8fdc484-rxr6g      0/0        0             -
2023-04-21T11:11:24.6292953Z │ │ ├── events-847dc7c578-rxlwb      1/1        1             Running
2023-04-21T11:11:24.6293446Z │ │ ├── events-847dc7c578-w8gcz      1/1        1             Running
2023-04-21T11:11:24.6293938Z │ │ └── events-847dc7c578-zddjw      1/1        0             Running
2023-04-21T11:11:24.6294264Z │ └ Status progress
2023-04-21T11:11:24.6294698Z └ Waiting for resources to become ready (164.60 seconds)

Expected result

werf takes the --timeout flag into account when working with Helm.

Additional information

No response

distorhead (Member) commented Jul 5, 2023

@AdamMachera Hi!

It looks like in your case werf's default failure detector fires on a readiness probe error:

error tracking resources: deploy/some-image failed: po/some-image-64d8fdc484-rxr6g container/some-image: Unhealthy: Readiness probe failed: Get "http://10.5.0.125:8088/healthz": dial tcp 10.5.0.125:8088: connect: connection refused

— and thus fails the deploy process before the timeout is reached.

To alter the default failure detector behaviour, there are annotations that can be set on the target resource:
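For example, a sketch based on werf's documented resource tracking annotations (the values here are illustrative; check the werf documentation for the exact annotations and defaults supported by your version):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: some-image
  annotations:
    # Allow more failures per replica before werf declares the
    # resource failed (the default is 1).
    werf.io/failures-allowed-per-replica: "10"
    # Ignore readiness probe failures of the named container for a
    # fixed duration, so a slow-starting pod is not treated as failed.
    werf.io/ignore-readiness-probe-fails-for-some-image: "12m"
    # Or postpone failure handling until the end of the deploy process
    # instead of failing on the first detected error.
    werf.io/fail-mode: HopeUntilEndOfDeployProcess
```

With annotations like these, the tracker should tolerate the slow readiness phase, and the --timeout 15m value would become the effective limit.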
