This repository has been archived by the owner on Feb 7, 2024. It is now read-only.

Addition of new trigger: on-health-healthy #341

Open
pentago opened this issue Sep 23, 2021 · 9 comments
Labels
enhancement New feature or request

Comments

@pentago

pentago commented Sep 23, 2021

Summary

Introducing new trigger: on-health-healthy

Use Cases

When an app that uses an HPA is deployed, there is a short period before the HPA has collected metrics, during which the app is not yet fully healthy (green).

During that window, notifications about degraded health are sent to the notification channels, but no new notification follows once the HPA is healthy and the whole application is green. This leaves whoever watches the channel wondering whether the app has been degraded the entire time, when in reality the degraded state lasted about 10 seconds.

This new trigger should accompany the on-health-degraded trigger and better represent the actual situation in clusters.
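
For illustration, the requested trigger might look like the following in the argocd-notifications-cm ConfigMap. This is only a sketch under assumptions: on-health-healthy is the name proposed in this issue, the app-health-healthy template name and its message text are hypothetical, and the `when` expression simply mirrors the existing on-health-degraded condition.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-notifications-cm
data:
  # Hypothetical trigger: fires when the application's health becomes Healthy.
  trigger.on-health-healthy: |
    - when: app.status.health.status == 'Healthy'
      send: [app-health-healthy]
  # Hypothetical template referenced by the trigger above.
  template.app-health-healthy: |
    message: |
      Application {{.app.metadata.name}} is healthy again.
```

An application would then presumably subscribe to it the same way as to other triggers, for example with an annotation such as `notifications.argoproj.io/subscribe.on-health-healthy.slack: <channel>`, where the channel is a placeholder.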


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@pentago pentago added the enhancement New feature or request label Sep 23, 2021
@Zava2012

Zava2012 commented Sep 24, 2021

Hello!
I see similar behavior: approximately 10 seconds after the application reaches Healthy status, I get a notification that it changed from Healthy -> Degraded, and then no notification when the application becomes Healthy again.

In the ArgoCD Application Controller debug logs I found the root cause:
The HPA was unable to compute the replica count: did not receive metrics for any ready pods.

This is logical behavior for the HPA. But I think ArgoCD Notifications does not handle it, which is why we get redundant notifications in this order:

  • Progressing -> Healthy - the first notification
  • Healthy -> Degraded (approximately after 10 seconds) - the second notification

I'm using these triggers to get notifications:

triggers:
  trigger.on-deployed: |
    - when: app.status.operationState.phase in ['Succeeded'] and app.status.health.status == 'Healthy'
      oncePer: app.status.operationState.syncResult.revision
      send: [app-deployed]
  trigger.on-health-degraded: |
    - when: app.status.health.status == 'Degraded'
      send: [app-health-degraded]

I also checked the behavior on ArgoCD Notifications v1.1.0 and v1.1.1; it is the same in both.

My expectation: when the application uses an HPA, ArgoCD Notifications should wait until the HPA controller has received the necessary metrics from the Metrics API (from the Metrics Server component) and only then send a notification with the Healthy status. But mostly, I think these changes should be made on the ArgoCD side, not in ArgoCD Notifications.
Here is a related issue from ArgoCD with a workaround: argoproj/argo-cd#6287

If my expectations are incorrect, please let me know :)
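
As a rough idea of what a fix on the ArgoCD side could look like, the sketch below overrides the HPA health check in the argocd-cm ConfigMap with a custom Lua script. It is not necessarily the workaround from the linked issue. It assumes an Argo CD version that supports per-resource `resource.customizations.health.<group>_<kind>` keys and an HPA that exposes `status.conditions` (autoscaling/v2 style); the messages are made up for illustration.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
data:
  # Assumption: replaces the built-in HPA health check with a custom Lua script.
  resource.customizations.health.autoscaling_HorizontalPodAutoscaler: |
    hs = {}
    -- Default to Progressing so an HPA without metrics is not reported as Degraded.
    hs.status = "Progressing"
    hs.message = "Waiting for the HPA to receive metrics"
    if obj.status ~= nil and obj.status.conditions ~= nil then
      for _, condition in ipairs(obj.status.conditions) do
        -- ScalingActive=True means the HPA can compute a replica count.
        if condition.type == "ScalingActive" and condition.status == "True" then
          hs.status = "Healthy"
          hs.message = condition.message
        end
      end
    end
    return hs
```

The trade-off is that with this override an HPA never reports Degraded, so a permanently broken metrics pipeline would show up as Progressing instead.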

@pentago
Author

pentago commented Sep 24, 2021

I'm fine with degraded notifications being sent, but I would want a healthy-status notification as soon as the app is healthy again, however soon that happens. If it's after 10 seconds, that's fine with me.

BUT, if ArgoCD can somehow poll the HPA status and wait until the app is healthy, that would be even better.

@pentago
Author

pentago commented Sep 24, 2021

@Zava2012 wouldn't hurt adding a 👍🏼 on the issue to move it up in prioritization a bit. :)

@mubarak-j

Would this trigger condition prevent the false alert? That is, wait 2 minutes before sending degraded notifications:

  trigger.on-health-degraded: |
    - description: Application has degraded
      oncePer: app.status.sync.revision
      send:
      - app-health-degraded
      when: app.status.health.status == 'Degraded' and time.Now().Sub(time.Parse(app.status.operationState.startedAt)).Minutes() >= 2

@Zava2012

> @Zava2012 wouldn't hurt adding a 👍🏼 on the issue to move it up in prioritization a bit. :)

Did it :)
But I am more and more sure that the changes must be made on the ArgoCD side.

> would this trigger condition prevent sending the false alert? i.e wait 2 mins before sending degraded notifications
>
>   trigger.on-health-degraded: |
>     - description: Application has degraded
>       oncePer: app.status.sync.revision
>       send:
>       - app-health-degraded
>       when: app.status.health.status == 'Degraded' and time.Now().Sub(time.Parse(app.status.operationState.startedAt)).Minutes() >= 2

It looks like a soft workaround that can be used. A delayed notification is better than an incorrect one, I think.
I'll try this expression and report back on how it works.
Thanks!

@olvesh

olvesh commented Nov 26, 2021

This event/trigger would also be nice for the case where an application has been unhealthy for a while and then recovers. Or should that be a separate trigger (e.g. on-recovered)?

We don't want to spam the developers with "everything is ok" slack messages, but if we only notify on failures it would be nice to notify when an app recovers as well.

This is similar to, for example, AlertManager, which I am also familiar with and which can send notifications on recovery.

@pentago
Author

pentago commented Dec 8, 2021

To me personally, sending "deployment successful" messages to Slack is useful because that's how the team knows that their deployment went through successfully instead of them having to watch every single step of the pipeline individually.

@ilacorda

Hi there, I would like to know whether this issue is planned for a future release and, in general, what its priority is (I noticed that the latest comment is from December 2021). We have recently been discussing this very situation at my company, and it is exactly the sort of scenario in which we would welcome a resolved/"now healthy" message coming back to us. In a nutshell, when the app recovers, it would be nice to be notified. Thanks

@dexterlakin-bdm

I also have the same requirement as @ilacorda and would like to be notified when an Application changes from Degraded to Healthy.


6 participants