Add cluster crash recovery itest #55

Open · sed-i opened this issue Dec 12, 2022 · 1 comment

sed-i (Contributor) commented Dec 12, 2022

Enhancement Proposal

It would be handy to have a cluster crash recovery itest, for when, for example, daemon-kubelite fails to (re)start (see the sketch after the status output below).

After manually restarting daemon-kubelite, the deployment remained stuck on unknown/lost:

Model               Controller  Cloud/Region        Version  SLA          Timestamp
cos-lite-load-test  uk8s        microk8s/localhost  2.9.34   unsupported  19:43:43Z

App            Version  Status   Scale  Charm                         Channel  Rev  Address         Exposed  Message
alertmanager   0.23.0   active       1  alertmanager-k8s              edge      37  10.152.183.156  no       
catalogue               active       1  catalogue-k8s                 edge       5  10.152.183.232  no       
cos-config     3.5.0    active       1  cos-configuration-k8s         edge      14  10.152.183.20   no       
grafana        9.2.1    waiting    0/1  grafana-k8s                   edge      55  10.152.183.105  no       waiting for units to settle down
loki           2.4.1    waiting    0/1  loki-k8s                      edge      47  10.152.183.135  no       waiting for units to settle down
prometheus     2.33.5   waiting    0/1  prometheus-k8s                edge      87  10.152.183.141  no       waiting for units to settle down
scrape-config  n/a      active       1  prometheus-scrape-config-k8s  edge      38  10.152.183.231  no       
scrape-target  n/a      active       1  prometheus-scrape-target-k8s  edge      23  10.152.183.233  no       
traefik                 waiting    0/1  traefik-k8s                   edge      95  10.128.0.6      no       waiting for units to settle down

Unit              Workload  Agent  Address      Ports  Message
alertmanager/0*   active    idle   10.1.79.238         
catalogue/0*      active    idle   10.1.79.215         
cos-config/0*     active    idle   10.1.79.197         
grafana/0         unknown   lost                       agent lost, see 'juju show-status-log grafana/0'
loki/0            unknown   lost                       agent lost, see 'juju show-status-log loki/0'
prometheus/0      unknown   lost                       agent lost, see 'juju show-status-log prometheus/0'
scrape-config/0*  active    idle   10.1.79.204         
scrape-target/0*  active    idle   10.1.79.213         
traefik/0         unknown   lost                       agent lost, see 'juju show-status-log traefik/0'
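
A minimal sketch of what such an itest could look like, assuming a pytest-operator based integration test suite with the bundle already deployed. The restart command and timeouts are illustrative assumptions for a single-node microk8s host, not something prescribed by this repository:

```python
import asyncio
import subprocess

import pytest


@pytest.mark.abort_on_fail
async def test_deployment_recovers_from_kubelite_crash(ops_test):
    # Assumes the bundle under test is already deployed and settled.
    await ops_test.model.wait_for_idle(status="active", timeout=1000)

    # Simulate a cluster crash by restarting the kubelite daemon on the
    # (single-node) microk8s host; a hypothetical way to trigger the
    # failure mode described above.
    subprocess.run(
        ["sudo", "snap", "restart", "microk8s.daemon-kubelite"],
        check=True,
    )

    # Give the cluster a moment to come back up before polling Juju.
    await asyncio.sleep(120)

    # The deployment should eventually return to active/idle rather than
    # staying stuck on unknown/lost.
    await ops_test.model.wait_for_idle(status="active", timeout=1800)
```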
simskij (Member) commented Mar 12, 2024

duplicate
