
show pods in dashboard for easier debugging of reconciliation errors #4068

Open
schdief opened this issue Oct 5, 2023 · 5 comments

@schdief

schdief commented Oct 5, 2023

Problem
When a reconciliation fails, e.g. due to an ImagePullBackOff caused by a wrong image tag or a missing pull secret, the dashboard is of little use, as it only shows the Deployment and not the pods (old and new).

Solution
The dashboard should show all the pods, not just the Deployment. For each pod it should show the full YAML so that all the events are visible.

Additional context
For a test deployment I have deliberately broken the image tag reference to get an ImagePullBackOff, so the reconciliation fails and the old pod stays active.
Unfortunately Weave GitOps doesn’t tell me that story. It only tells me that reconciliation is in progress and something fails the health check, but in order to see the problem I need to connect to the cluster and use kubectl:

NAME READY STATUS RESTARTS AGE
pod/release-name-nodebrady-5978488bb8-m62gd 1/1 Running 0 11m
pod/release-name-nodebrady-c9897f486-6rmgn 0/1 ImagePullBackOff 0 5m4s

The graph view should also show the pods, because then I would see the old pod still running and the new pod failing to start due to ImagePullBackOff.
Clicking on the failing pod, I would then also see the reason for the ImagePullBackOff:

Warning Failed 17m (x4 over 18m) kubelet Failed to pull image "peter:pan": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/peter:pan": failed to resolve reference "docker.io/library/peter:pan": failed to do request: Head https://registry-1.docker.io/v2/library/peter/manifests/pan: x509: certificate signed by unknown authority
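
Both outputs above can be reproduced with standard kubectl commands; a minimal sketch, with the namespace and pod name as placeholders:

# list the pods of the workload, showing the old (Running) and the new (ImagePullBackOff) pod
kubectl get pods -n <namespace>

# show the events of the failing pod, including the Warning/Failed message above
kubectl describe pod <failing-pod-name> -n <namespace>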

@bigkevmcd
Contributor

@schdief I'm not sure you want to see all the pods by default; if you have a lot of replicas, that's a lot of screen real estate for a lot of the same thing.

But it does feel like we could do a better job of exposing errors; we'll discuss this and see what we can do.

@schdief
Author

schdief commented Oct 5, 2023

@schdief I'm not sure you want to see all the pods by default; if you have a lot of replicas, that's a lot of screen real estate for a lot of the same thing.

But it does feel like we could do a better job of exposing errors; we'll discuss this and see what we can do.

I agree that for many pods this is a bad idea. Maybe you could add a button to see all pods of a deployment, while the default view only shows the number and maybe the failed ones. But if I really want to, I would still like to see all of them, even if there are 100 :)

Thanks for looking into it!

@foot
Contributor

foot commented Oct 10, 2023

Hi @schdief

The graph view should also show the pods, because then I would see the old pod still running and the new pod failing to start due to ImagePullBackOff.

Ah, you do not see the pods in the graph view?

Is this from a kustomization or a helmrelease?

@schdief
Author

schdief commented Oct 10, 2023

Is this from a kustomization or a helmrelease?

Kustomization (using Weave GitOps 0.33)

[screenshot]

This is the YAML of the Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "2"
  creationTimestamp: 2023-10-06T11:11:28Z
  generation: 2
  labels:
    app.kubernetes.io/name: nodebrady
    helm.sh/chart: nodebrady-v0.3.0
    kustomize.toolkit.fluxcd.io/name: nodebrady-master
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  managedFields:
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:labels:
            f:app.kubernetes.io/name: {}
            f:helm.sh/chart: {}
            f:kustomize.toolkit.fluxcd.io/name: {}
            f:kustomize.toolkit.fluxcd.io/namespace: {}
        f:spec:
          f:replicas: {}
          f:selector: {}
          f:strategy: {}
          f:template:
            f:metadata:
              f:creationTimestamp: {}
              f:labels:
                f:app.kubernetes.io/name: {}
            f:spec:
              f:containers:
                k:{"name":"nodebrady"}:
                  .: {}
                  f:image: {}
                  f:imagePullPolicy: {}
                  f:name: {}
                  f:ports:
                    k:{"containerPort":3000,"protocol":"TCP"}:
                      .: {}
                      f:containerPort: {}
                      f:protocol: {}
                  f:resources: {}
              f:imagePullSecrets:
                k:{"name":"css-qhcr-sdm-dockerconfig"}: {}
                k:{"name":"css-thcr-sdm-dockerconfig"}: {}
      manager: kustomize-controller
      operation: Apply
      time: 2023-10-10T14:56:31Z
    - apiVersion: apps/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:deployment.kubernetes.io/revision: {}
        f:status:
          f:availableReplicas: {}
          f:conditions:
            .: {}
            k:{"type":"Available"}:
              .: {}
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
            k:{"type":"Progressing"}:
              .: {}
              f:lastTransitionTime: {}
              f:lastUpdateTime: {}
              f:message: {}
              f:reason: {}
              f:status: {}
              f:type: {}
          f:observedGeneration: {}
          f:readyReplicas: {}
          f:replicas: {}
          f:updatedReplicas: {}
      manager: kube-controller-manager
      operation: Update
      subresource: status
      time: 2023-10-06T12:28:05Z
  name: nodebrady
  namespace: phippyandfriends-master
  resourceVersion: "217768850"
  uid: 66a1966e-3c03-41f8-84b4-2bfc2a8549cd
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/name: nodebrady
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app.kubernetes.io/name: nodebrady
    spec:
      containers:
        - image: xxx/nodebrady:20231006.1426.8-master
          imagePullPolicy: Always
          name: nodebrady
          ports:
            - containerPort: 3000
              protocol: TCP
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      imagePullSecrets:
        - name: css-qhcr-sdm-dockerconfig
        - name: css-thcr-sdm-dockerconfig
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
    - lastTransitionTime: 2023-10-06T11:11:32Z
      lastUpdateTime: 2023-10-06T11:11:32Z
      message: Deployment has minimum availability.
      reason: MinimumReplicasAvailable
      status: "True"
      type: Available
    - lastTransitionTime: 2023-10-06T11:11:28Z
      lastUpdateTime: 2023-10-06T12:28:05Z
      message: ReplicaSet "nodebrady-6cc7f5bbbc" has successfully progressed.
      reason: NewReplicaSetAvailable
      status: "True"
      type: Progressing
  observedGeneration: 2
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

@foot
Contributor

foot commented Oct 17, 2023

Gotcha! So there is a bug here where we don't show the pods in the graph if their namespace differs from the Kustomization's.

As for the other point of showing the pods in the table, we have all the data available; we just have to figure out a design.
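
For context on the namespace mismatch: the kustomize.toolkit.fluxcd.io labels in the Deployment YAML above imply a Flux Kustomization roughly like the following sketch. Only the name and namespace are taken from those labels; the apiVersion and all spec fields are assumptions for illustration.

apiVersion: kustomize.toolkit.fluxcd.io/v1  # could also be v1beta2, depending on the Flux version
kind: Kustomization
metadata:
  name: nodebrady-master   # from the kustomize.toolkit.fluxcd.io/name label
  namespace: flux-system   # from the kustomize.toolkit.fluxcd.io/namespace label
spec:
  interval: 5m             # assumed
  path: ./deploy           # assumed path in the repository
  prune: true              # assumed
  sourceRef:
    kind: GitRepository    # assumed source kind and name
    name: flux-system

The manifests applied by this Kustomization create the Deployment in the phippyandfriends-master namespace, i.e. a namespace different from the Kustomization's own (flux-system), which is the situation the graph-view bug described above refers to.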
