
Flagger and Flux gitops workflow in the case of cluster rebuilds #1577

Open
spandan541 opened this issue Jan 9, 2024 · 3 comments
@spandan541

Hi Team,

I looked for an existing issue addressing this problem but could not find one.

Describe the bug

My team deploys all manifests with Flux and we prefer not to apply them through kubectl. We are keen on using Flagger for our canary releases alongside Flux, but we have hit a peculiar challenge because our clusters need to be rebuilt fairly often.

Steps

  1. The Deployment in the GitOps repo initially has image tag v1
  2. A Canary CR is initialised in the cluster with manual gating enabled (a sketch of such a Canary is shown after this list)
  3. The Deployment in the GitOps repo is updated to image tag v2
  4. Flux reconciles and updates the image tag v1 -> v2
  5. Flagger starts the promotion process; with manual gating enabled, say the promotion is paused at a canary weight of 20
  6. At this point, for some reason, the cluster needs to be rebuilt
  7. When the cluster is rebuilt, Flux applies the Deployment with image tag v2 even though the promotion was never completed (this is the problem!)
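
For reference, a minimal sketch of the setup in step 2 might look like the Canary below. The app name, namespace, port and the flagger-loadtester gate URL are illustrative, not taken from our actual manifests:

```yaml
# Minimal sketch of a Canary with a manual gate (illustrative names and values)
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
  namespace: test
spec:
  # the Deployment whose image tag Flux bumps from v1 to v2
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  service:
    port: 8080
  analysis:
    interval: 1m
    maxWeight: 50
    stepWeight: 10
    webhooks:
      # manual gate: Flagger holds the current weight (e.g. 20) while the gate is closed
      - name: traffic-gate
        type: confirm-traffic-increase
        url: http://flagger-loadtester.test/gate/check
```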

Expected behavior

The limitation is that neither Flagger nor Flux knows the promotion process was interrupted by the cluster rebuild, i.e. no state of the promotion is saved. Ideally, the rebuilt cluster would not send full traffic to v2 until the interrupted promotion has been completed.

Possible solutions:

  • Provide two target deployments (primary and canary) under Canary.spec, so that no change to the image tag of a single Deployment is necessary (see the hypothetical sketch below)
  • Flagger somehow persists the state of the promotion across rebuilds, so that even if Flux recreates the Deployment with v2, zero traffic is sent to it until the promotion completes
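
To make the first idea concrete, a purely hypothetical Canary spec could look like the sketch below. The canaryRef field does not exist in Flagger today and the names are made up; it is only meant to show the shape of what we are asking for:

```yaml
# Hypothetical only: Flagger's Canary spec currently accepts a single targetRef
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: myapp
spec:
  targetRef:            # primary Deployment, pinned to v1 in Git
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-primary
  canaryRef:            # hypothetical field: canary Deployment, bumped to v2 in Git
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-canary
```

With something like this, a cluster rebuild would recreate both Deployments exactly as they are in Git, and Flagger would not have to infer the canary from an in-place image change.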

Please do let me know if there are better ways to solve this corner case. Any help would be greatly appreciated!
Thanks in advance.

Additional context

  • Flagger version: 1.31.0
  • Kubernetes version: 1.27
  • Service Mesh provider: Istio
@spandan541
Author

Any ideas anyone?
@stefanprodan @aryan9600

@LiZhenCheng9527
Contributor

You could try Flagger version 1.36.1 and see if it solves this problem.

@spandan541
Author

Unfortunately, that doesn't solve the problem.
