Hubble should indicate that a service cannot be reached due to non-running pods #1299

Open
MartinKolbAtWork opened this issue Nov 20, 2023 · 0 comments

MartinKolbAtWork commented Nov 20, 2023

Not sure if this is an enhancement, a feature request, or maybe a bug.

I naively assumed that an observability platform for networking on K8s would give me some kind of explicit indication when a K8s service cannot be reached because the pods backing the service are not available/running.

However, I only see the usual DNS flows that also occur when the pods are running. There is no indication whatsoever that the service cannot be used because the pods have crashed:

Nov 20 12:53:48.042: pod-test/client-0 (ID:23235) <> kube-system/kube-dns:53 (world) pre-xlate-fwd TRACED (UDP)
Nov 20 12:53:48.042: pod-test/client-0 (ID:23235) <> kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) post-xlate-fwd TRANSLATED (UDP)
Nov 20 12:53:48.042: pod-test/client-0:41813 (ID:23235) -> kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) to-endpoint FORWARDED (UDP)
Nov 20 12:53:48.042: pod-test/client-0:41813 (ID:23235) <> kube-system/coredns-5dd5756b68-cnszf (ID:31735) pre-xlate-rev TRACED (UDP)
Nov 20 12:53:48.042: pod-test/client-0:41813 (ID:23235) <> kube-system/coredns-5dd5756b68-cnszf (ID:31735) pre-xlate-rev TRACED (UDP)
Nov 20 12:53:48.043: pod-test/client-0:41813 (ID:23235) <- kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) to-endpoint FORWARDED (UDP)
Nov 20 12:53:48.043: kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) <> pod-test/client-0 (ID:23235) pre-xlate-rev TRACED (UDP)
Nov 20 12:53:48.043: kube-system/kube-dns:53 (world) <> pod-test/client-0 (ID:23235) post-xlate-rev TRANSLATED (UDP)
Nov 20 12:53:48.043: kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) <> pod-test/client-0 (ID:23235) pre-xlate-rev TRACED (UDP)
Nov 20 12:53:48.043: kube-system/kube-dns:53 (world) <> pod-test/client-0 (ID:23235) post-xlate-rev TRANSLATED (UDP)

On the client side I simply ran curl against the service and got the expected error message, because the server pods were not running:

root@client-0:/# curl http://pod-service
curl: (7) Couldn't connect to server

When the pods are running, the flows look like this (note that the first 10 lines match the 10 lines of the error case above, apart from the timestamps and the client's source port):

Nov 20 12:59:27.931: pod-test/client-0 (ID:23235) <> kube-system/kube-dns:53 (world) pre-xlate-fwd TRACED (UDP)
Nov 20 12:59:27.931: pod-test/client-0 (ID:23235) <> kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) post-xlate-fwd TRANSLATED (UDP)
Nov 20 12:59:27.931: pod-test/client-0:50175 (ID:23235) -> kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) to-endpoint FORWARDED (UDP)
Nov 20 12:59:27.931: pod-test/client-0:50175 (ID:23235) <> kube-system/coredns-5dd5756b68-cnszf (ID:31735) pre-xlate-rev TRACED (UDP)
Nov 20 12:59:27.931: pod-test/client-0:50175 (ID:23235) <> kube-system/coredns-5dd5756b68-cnszf (ID:31735) pre-xlate-rev TRACED (UDP)
Nov 20 12:59:27.931: pod-test/client-0:50175 (ID:23235) <- kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) to-endpoint FORWARDED (UDP)
Nov 20 12:59:27.931: kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) <> pod-test/client-0 (ID:23235) pre-xlate-rev TRACED (UDP)
Nov 20 12:59:27.931: kube-system/kube-dns:53 (world) <> pod-test/client-0 (ID:23235) post-xlate-rev TRANSLATED (UDP)
Nov 20 12:59:27.932: kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) <> pod-test/client-0 (ID:23235) pre-xlate-rev TRACED (UDP)
Nov 20 12:59:27.932: kube-system/kube-dns:53 (world) <> pod-test/client-0 (ID:23235) post-xlate-rev TRANSLATED (UDP)
Nov 20 12:59:27.932: pod-test/client-0 (ID:23235) <> pod-test/pod-service:80 (world) pre-xlate-fwd TRACED (TCP)
Nov 20 12:59:27.932: pod-test/client-0 (ID:23235) <> pod-test/server-0:80 (ID:15840) post-xlate-fwd TRANSLATED (TCP)
Nov 20 12:59:27.932: pod-test/client-0:50570 (ID:23235) -> pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: SYN)
Nov 20 12:59:27.932: pod-test/client-0:50570 (ID:23235) <- pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: SYN, ACK)
Nov 20 12:59:27.932: pod-test/client-0:50570 (ID:23235) -> pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: ACK)
Nov 20 12:59:27.932: pod-test/client-0:50570 (ID:23235) <> pod-test/server-0 (ID:15840) pre-xlate-rev TRACED (TCP)
Nov 20 12:59:27.932: pod-test/server-0:80 (ID:15840) <> pod-test/client-0 (ID:23235) pre-xlate-rev TRACED (TCP)
Nov 20 12:59:27.932: pod-test/pod-service:80 (world) <> pod-test/client-0 (ID:23235) post-xlate-rev TRANSLATED (TCP)
Nov 20 12:59:27.932: pod-test/client-0:50570 (ID:23235) -> pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Nov 20 12:59:27.932: pod-test/client-0:50570 (ID:23235) <- pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: ACK, PSH)
Nov 20 12:59:27.932: pod-test/client-0:50570 (ID:23235) -> pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: ACK, FIN)
Nov 20 12:59:27.933: pod-test/client-0:50570 (ID:23235) <- pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: ACK, FIN)
Nov 20 12:59:27.933: pod-test/client-0:50570 (ID:23235) -> pod-test/server-0:80 (ID:15840) to-endpoint FORWARDED (TCP Flags: ACK)
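
The difference between the two captures can also be checked mechanically. The following is a minimal sketch, not part of Hubble: it parses flow lines in the layout shown above and lists the verdicts of all flows that name a given service. The regular expression is inferred only from the lines in this issue, and `FLOW_RE`, `endpoint_name` and `flows_mentioning` are hypothetical helper names. An empty result only means that no flow toward the service was captured; it is not by itself proof that the backing pods are down.

```python
import re

# Layout inferred from the captures in this issue (an assumption, not a
# documented Hubble output contract):
#   <timestamp>: <left> (<id>) <dir> <right> (<id>) <obs-point> <verdict> (<l4>)
FLOW_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:.]+): "
    r"(?P<left>\S+) \((?P<left_id>[^)]*)\) "
    r"(?P<dir><>|->|<-) "
    r"(?P<right>\S+) \((?P<right_id>[^)]*)\) "
    r"(?P<point>\S+) (?P<verdict>\S+) \((?P<l4>[^)]*)\)$"
)


def endpoint_name(endpoint):
    """Strip a trailing ':<port>' so 'ns/name:80' and 'ns/name' compare equal."""
    name, sep, port = endpoint.rpartition(":")
    return name if sep and port.isdigit() else endpoint


def flows_mentioning(lines, workload):
    """Return the verdicts of all parsed flows that have `workload` on either side."""
    verdicts = []
    for line in lines:
        match = FLOW_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a flow line
        sides = (endpoint_name(match.group("left")),
                 endpoint_name(match.group("right")))
        if workload in sides:
            verdicts.append(match.group("verdict"))
    return verdicts


SERVICE = "pod-test/pod-service"

# Two lines taken from the failing capture (DNS only).
FAILING = [
    "Nov 20 12:53:48.042: pod-test/client-0 (ID:23235) <> kube-system/kube-dns:53 (world) pre-xlate-fwd TRACED (UDP)",
    "Nov 20 12:53:48.042: pod-test/client-0:41813 (ID:23235) -> kube-system/coredns-5dd5756b68-cnszf:53 (ID:31735) to-endpoint FORWARDED (UDP)",
]

# Two lines taken from the working capture (service translation is visible).
WORKING = [
    "Nov 20 12:59:27.932: pod-test/client-0 (ID:23235) <> pod-test/pod-service:80 (world) pre-xlate-fwd TRACED (TCP)",
    "Nov 20 12:59:27.932: pod-test/pod-service:80 (world) <> pod-test/client-0 (ID:23235) post-xlate-rev TRANSLATED (TCP)",
]

print(flows_mentioning(FAILING, SERVICE))
print(flows_mentioning(WORKING, SERVICE))
```

For the failing capture the list is empty, which is exactly the gap described here: the flow log contains nothing about `pod-service` at all, rather than an explicit "no backends" event.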

To reproduce the scenario, here’s a simple YAML manifest with a client that can execute curl requests (e.g. curl http://pod-service). The error situation is provoked by setting the "nodeName" of the server pod to a node that does not exist.
Commenting out the “nodeName” line in the StatefulSet “server” switches the scenario to a state where the pod runs successfully and can serve as an endpoint for the service.

apiVersion: v1
kind: Namespace
metadata:
  name: pod-test
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: server
  namespace: pod-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: server
  template:
    metadata:
      labels:
        app: server
    spec:
      nodeName: kind-worker-non-existing
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: client
  namespace: pod-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: client
  template:
    metadata:
      labels:
        app: client
    spec:
      containers:
      - image: ubuntu
        command: ['sh', '-c', 'apt update && apt install curl -y && sleep 7d']
        imagePullPolicy: Always
        name: ubuntu
---
apiVersion: v1
kind: Service
metadata:
  name: pod-service
  namespace: pod-test
spec:
  selector:
    app: server
  ports:
    - protocol: TCP
      port: 80
      targetPort: 80
