Rollout status is jumping from paused to progressing/failed in certain timeframe. #3534

Open
2 tasks done
wangli1030 opened this issue Apr 19, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@wangli1030

Checklist:

  • I've included steps to reproduce the bug.
  • I've included the version of argo rollouts.

Describe the bug

When a rollout stays in the blue-green (b/g) paused state for a long time, its status periodically jumps to progressing and then failed before returning to b/g paused/suspended (in my case, every 15 minutes). If any resource has a prune-last hook, this causes the ArgoCD sync to fail.

To Reproduce

Deploy a rollout using the b/g strategy, then deploy a new version without promoting it. In the rollout controller log or the rollout status, you will see the status jumping. The update and transition times of the progressing condition under the rollout status also keep refreshing, which is another indicator that the status is jumping.

Expected behavior

The status should stay in b/g paused/suspended.

Screenshots

v1.6.5

Logs

# Paste the logs from the rollout controller

# Logs for the entire controller:
kubectl logs -n argo-rollouts deployment/argo-rollouts

# Logs for a specific rollout:
kubectl logs -n argo-rollouts deployment/argo-rollouts | grep rollout=<ROLLOUTNAME>

Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

@wangli1030 wangli1030 added the bug Something isn't working label Apr 19, 2024
@wangli1030
Author

wangli1030 commented May 29, 2024

The docs say:

ProgressDeadlineExceeded reason will be surfaced in the rollout status.
Note that progress will not be estimated during the time a rollout is paused.
Defaults to 600s
progressDeadlineSeconds: 600

However, in the code (this line), the check only exempts the aborted status, not paused rollouts. As a result, execution falls through to this line and the rollout returns:

"message":"ProgressDeadlineExceeded: ReplicaSet \"example-rollout-765fcddb5f\" has timed out progressing.","phase":"Degraded"}}".

This happens every time progressDeadlineSeconds elapses. Should the RolloutTimedOut function also exempt the paused condition, as it already does for abort?
