Endpoint selected by multiple headless services only has 1 dns hostname #124207

Closed · alexeldeib opened this issue Apr 6, 2024 · 18 comments
Labels: kind/bug, sig/network, triage/accepted

@alexeldeib (Member) commented Apr 6, 2024

What happened?

A pod that is in the endpoint slices of multiple headless services only resolves the hostname for one of those services.

Per the spec:

There must be an A record for each ready endpoint of the headless Service with IPv4 address <endpoint-ip> as shown below.

So I feel like I'm missing something?

What did you expect to happen?

Both names resolve (maybe?)

How can we reproduce it (as minimally and precisely as possible)?

  • create 2x StatefulSets
  • add a shared selector label to both
  • add a unique selector label to each
  • create 3x headless Services: one for each selector (two unique, one shared)

now things get interesting:

  • the shared service will show the endpoints for both pods, and the endpoint slices will show them as ready and serving
  • DNS will only resolve for whatever you set the StatefulSet serviceName to (perhaps deliberately?)
  • the StatefulSet pods' subdomain is overwritten to match serviceName if one is specified?

end result is only second-pod.second-service-name.ns.svc.cluster.local resolves to the pod IP, even though second-pod.shared-service.ns.svc.cluster.local should too?

my read is that this is due to the logic here (roughly, the endpoint hostname is only set when the pod's spec.subdomain matches the Service name):
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/endpointslice/util/controller_utils.go#L113-L117
https://github.com/coredns/coredns/blob/e3f83cb1fabb9b1cbaffb9df3c4b65476e92c39b/plugin/kubernetes/kubernetes.go#L476-L480

so who is wrong?

Anything else we need to know?

$ kubectl apply -f repro.yaml

# when using serviceName: shared
$ k exec -it first-0 -- dig second-0.shared.default.svc.cluster.local

; <<>> DiG 9.18.21 <<>> second-0.shared.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10827
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 5b180353a44f56c4 (echoed)
;; QUESTION SECTION:
;second-0.shared.default.svc.cluster.local. IN A

;; ANSWER SECTION:
second-0.shared.default.svc.cluster.local. 5 IN	A 10.7.225.77

;; Query time: 0 msec
;; SERVER: 10.16.0.10#53(10.16.0.10) (UDP)
;; WHEN: Sat Apr 06 15:01:31 UTC 2024
;; MSG SIZE  rcvd: 139
# same. just highlighting the 2nd subdomain doesn't resolve. perhaps expected
$ k exec -it first-0 -- dig second-0.second.default.svc.cluster.local

; <<>> DiG 9.18.21 <<>> second-0.second.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 39439
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: d8f070b19ee2c55e (echoed)
;; QUESTION SECTION:
;second-0.second.default.svc.cluster.local. IN A

;; AUTHORITY SECTION:
cluster.local.		5	IN	SOA	ns.dns.cluster.local. hostmaster.cluster.local. 1712413987 7200 1800 86400 5

;; Query time: 0 msec
;; SERVER: 10.16.0.10#53(10.16.0.10) (UDP)
;; WHEN: Sat Apr 06 15:01:27 UTC 2024
;; MSG SIZE  rcvd: 175
# fine
$ k exec -it first-0 -- dig first-0.first.default.svc.cluster.local

; <<>> DiG 9.18.21 <<>> first-0.first.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28555
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 474b9b921327388d (echoed)
;; QUESTION SECTION:
;first-0.first.default.svc.cluster.local. IN A

;; ANSWER SECTION:
first-0.first.default.svc.cluster.local. 5 IN A	10.7.225.73

;; Query time: 0 msec
;; SERVER: 10.16.0.10#53(10.16.0.10) (UDP)
;; WHEN: Sat Apr 06 15:01:39 UTC 2024
;; MSG SIZE  rcvd: 135

Interestingly, for the endpoints with the issue, the EndpointSlice YAML shows they don't have a hostname set, due to the k8s code above.

long kubectl output: full Endpoints and EndpointSlice YAML
$ k get ep -o yaml first second shared
apiVersion: v1
items:
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2024-04-06T14:31:49Z"
    creationTimestamp: "2024-04-06T14:27:11Z"
    labels:
      service.kubernetes.io/headless: ""
    name: first
    namespace: default
    resourceVersion: "4546341"
    uid: 884b42c2-1a3c-4f90-95e4-f82a8383c8f9
  subsets:
  - addresses:
    - hostname: first-0
      ip: 10.7.225.73
      nodeName: g1cb6d2
      targetRef:
        kind: Pod
        name: first-0
        namespace: default
        uid: 59d87cd3-5594-4ad8-b273-0969cd6c8d14
    ports:
    - name: rest
      port: 8501
      protocol: TCP
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2024-04-06T14:33:06Z"
    creationTimestamp: "2024-04-06T14:30:43Z"
    labels:
      service.kubernetes.io/headless: ""
    name: second
    namespace: default
    resourceVersion: "4546684"
    uid: 314a8f21-9133-4b6d-8f5e-5b25907cbf1a
  subsets:
  - addresses:
    - ip: 10.7.225.77
      nodeName: g1cb6d2
      targetRef:
        kind: Pod
        name: second-0
        namespace: default
        uid: 801102ab-874f-4ceb-ba5a-e49e34274c4d
    ports:
    - name: rest
      port: 8501
      protocol: TCP
- apiVersion: v1
  kind: Endpoints
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2024-04-06T14:33:06Z"
    creationTimestamp: "2024-04-06T14:27:11Z"
    labels:
      service.kubernetes.io/headless: ""
    name: shared
    namespace: default
    resourceVersion: "4546685"
    uid: 099d86cb-4418-44e7-b7d9-8fdb1ef0b3fb
  subsets:
  - addresses:
    - ip: 10.7.225.73
      nodeName: g1cb6d2
      targetRef:
        kind: Pod
        name: first-0
        namespace: default
        uid: 59d87cd3-5594-4ad8-b273-0969cd6c8d14
    - hostname: second-0
      ip: 10.7.225.77
      nodeName: g1cb6d2
      targetRef:
        kind: Pod
        name: second-0
        namespace: default
        uid: 801102ab-874f-4ceb-ba5a-e49e34274c4d
    ports:
    - name: rest
      port: 8080
      protocol: TCP
kind: List
metadata:
  resourceVersion: ""

$ k get endpointslice | rg '(first|second|shared)'
first-lr9rb                       IPv4          8501      10.7.225.73               21m
second-fhvbk                      IPv4          8501      10.7.225.77               17m
shared-4ph6n                      IPv4          8080      10.7.225.73,10.7.225.77   21m

$ k get endpointslice -o yaml first-lr9rb second-fhvbk shared-4ph6n
apiVersion: v1
items:
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1
  endpoints:
  - addresses:
    - 10.7.225.73
    conditions:
      ready: true
      serving: true
      terminating: false
    hostname: first-0
    nodeName: g1cb6d2
    targetRef:
      kind: Pod
      name: first-0
      namespace: default
      uid: 59d87cd3-5594-4ad8-b273-0969cd6c8d14
    zone: "208"
  kind: EndpointSlice
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2024-04-06T14:31:49Z"
    creationTimestamp: "2024-04-06T14:27:11Z"
    generateName: first-
    generation: 6
    labels:
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      kubernetes.io/service-name: first
      service.kubernetes.io/headless: ""
    name: first-lr9rb
    namespace: default
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: first
      uid: 7d50bcea-2070-4484-a049-b63f8b67ff7f
    resourceVersion: "4546340"
    uid: b2da77c4-fa28-4be2-a562-e9d19cc0d849
  ports:
  - name: rest
    port: 8501
    protocol: TCP
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1
  endpoints:
  - addresses:
    - 10.7.225.77
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: g1cb6d2
    targetRef:
      kind: Pod
      name: second-0
      namespace: default
      uid: 801102ab-874f-4ceb-ba5a-e49e34274c4d
    zone: "208"
  kind: EndpointSlice
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2024-04-06T14:33:06Z"
    creationTimestamp: "2024-04-06T14:30:43Z"
    generateName: second-
    generation: 16
    labels:
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      kubernetes.io/service-name: second
      service.kubernetes.io/headless: ""
    name: second-fhvbk
    namespace: default
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: second
      uid: 28d521bc-1468-4e6a-a156-6f927494aead
    resourceVersion: "4546686"
    uid: 483b27b1-ba59-4ac2-9940-a465f43b8fc2
  ports:
  - name: rest
    port: 8501
    protocol: TCP
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1
  endpoints:
  - addresses:
    - 10.7.225.73
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: g1cb6d2
    targetRef:
      kind: Pod
      name: first-0
      namespace: default
      uid: 59d87cd3-5594-4ad8-b273-0969cd6c8d14
    zone: "208"
  - addresses:
    - 10.7.225.77
    conditions:
      ready: true
      serving: true
      terminating: false
    hostname: second-0
    nodeName: g1cb6d2
    targetRef:
      kind: Pod
      name: second-0
      namespace: default
      uid: 801102ab-874f-4ceb-ba5a-e49e34274c4d
    zone: "208"
  kind: EndpointSlice
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2024-04-06T14:33:06Z"
    creationTimestamp: "2024-04-06T14:27:11Z"
    generateName: shared-
    generation: 23
    labels:
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      kubernetes.io/service-name: shared
      service.kubernetes.io/headless: ""
    name: shared-4ph6n
    namespace: default
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: shared
      uid: a3c5d040-1905-447c-a457-6b72ba26f0b7
    resourceVersion: "4546683"
    uid: 0110bd21-b87c-4380-83bb-0c89f13098c2
  ports:
  - name: rest
    port: 8080
    protocol: TCP
kind: List
metadata:
  resourceVersion: ""
repro.yaml manifests
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    unique: &name first
    shared: shared
  name: *name
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  podManagementPolicy: Parallel
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      unique: *name
  serviceName: *name
  template:
    metadata:
      labels:
        unique: *name
        shared: shared
    spec:
      containers:
      - command: ["sleep", "infinity"]
        image: docker.io/nicolaka/netshoot@sha256:b569665f0c32391b93f4de344f07bf6353ddff9d8c801ac3318d996db848a64c
        imagePullPolicy: IfNotPresent
        name: server
        ports:
        - containerPort: 8080
          name: rest
          protocol: TCP
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      subdomain: *name
      terminationGracePeriodSeconds: 1
      tolerations:
      - effect: NoSchedule
        key: is_cpu_compute
        operator: Exists
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
---
apiVersion: v1
kind: Service
metadata:
  name: &name first
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: rest
    port: 8501
    protocol: TCP
    targetPort: 8501
  selector:
    unique: *name
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    unique: &name second
    shared: shared
  name: *name
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain
    whenScaled: Retain
  podManagementPolicy: Parallel
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      unique: *name
  serviceName: shared
  template:
    metadata:
      labels:
        unique: *name
        shared: shared
    spec:
      containers:
      - command: ["sleep", "infinity"]
        image: docker.io/nicolaka/netshoot@sha256:b569665f0c32391b93f4de344f07bf6353ddff9d8c801ac3318d996db848a64c
        imagePullPolicy: IfNotPresent
        name: server
        ports:
        - containerPort: 8080
          name: rest
          protocol: TCP
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      subdomain: shared # <--- important to repro
      terminationGracePeriodSeconds: 1
      tolerations:
      - effect: NoSchedule
        key: is_cpu_compute
        operator: Exists
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
---
apiVersion: v1
kind: Service
metadata:
  name: &name second
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: rest
    port: 8501
    protocol: TCP
    targetPort: 8501
  selector:
    unique: *name
  sessionAffinity: None
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: &name shared # <-- this one will have issues
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: rest
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    shared: *name
  sessionAffinity: None
  type: ClusterIP
---

Kubernetes version

$ kubectl version
Client Version: v1.29.3
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.26.13
WARNING: version difference between client (1.29) and server (1.26) exceeds the supported minor version skew of +/-1

Cloud provider

tested across 3x providers so far with same behavior, multiple k8s versions but not latest master

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, ...) and versions (if applicable)

...coredns?
@alexeldeib added the kind/bug label Apr 6, 2024
@k8s-ci-robot added the needs-sig label Apr 6, 2024
@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-triage label Apr 6, 2024
@alexeldeib (Member, Author)

/sig network

@k8s-ci-robot added the sig/network label and removed the needs-sig label Apr 6, 2024
@alexeldeib (Member, Author)

I'll just tack on -- this could totally be expected from statefulsets. It feels like the intersection of a lot of different pieces. If it's expected, I'd say the DNS spec I linked doesn't reflect reality and is overly strict.

@danwinship (Contributor)

/assign @adrianmoisey

@k8s-ci-robot (Contributor)

@danwinship: GitHub didn't allow me to assign the following users: adrianmoisey.

Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @adrianmoisey

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@adrianmoisey

/assign

@adrianmoisey

Hello,

So it seems that the issue you're facing here is that the per-pod DNS records for the StatefulSets are only created for the Service set in spec.serviceName of the StatefulSet.
It's possible to simplify your example to a single StatefulSet and 2 Services, with spec.serviceName in the StatefulSet pointing at one of them. You'll notice that you only get per-pod host DNS records for the Service defined in spec.serviceName (see the sketch below).
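
To make that concrete, here is a minimal sketch of such a simplified setup (the names web, svc-a, and svc-b are hypothetical, not taken from the repro above): one StatefulSet governed by svc-a, plus a second headless Service svc-b selecting the same pods. If the DNS behaviour described above holds, web-0.svc-a.default.svc.cluster.local should get a per-pod A record, while web-0.svc-b.default.svc.cluster.local should not, because the pods' subdomain is svc-a.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  replicas: 1
  serviceName: svc-a   # governing Service; pods get subdomain "svc-a"
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: server
        image: registry.k8s.io/pause:3.9
---
apiVersion: v1
kind: Service
metadata:
  name: svc-a   # per-pod records: web-0.svc-a.default.svc.cluster.local should resolve
spec:
  clusterIP: None
  selector:
    app: web
  ports:
  - name: rest
    port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: svc-b   # selects the same pods, but gets no per-pod A records
spec:
  clusterIP: None
  selector:
    app: web
  ports:
  - name: rest
    port: 8080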

This behaviour is documented here already: https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#stable-network-id, specifically:

As each Pod is created, it gets a matching DNS subdomain, taking the form: $(podname).$(governing service domain), where the governing service is defined by the serviceName field on the StatefulSet.

This behaviour is consistent with the code that you pointed out (https://github.com/kubernetes/kubernetes/blob/v1.29.3/staging/src/k8s.io/endpointslice/util/controller_utils.go#L113-L117) in the issue description.

Regarding a path forward, is this a case where the DNS spec needs to be updated (https://github.com/kubernetes/dns/blob/master/docs/specification.md#24---records-for-a-headless-service) or is there a use case where the behaviour needs to change?

@alexeldeib (Member, Author) commented Apr 18, 2024

I'd probably say updating the spec is the best path forward.

is there a use case where the behaviour needs to change

need is a strong word 🙂 but yes, there is a scenario where this is useful for us in practice.

Admiralty + Linkerd multicluster provide a really easy and lightweight multicluster service topology.

Linkerd multicluster specifically supports per-endpoint routing for StatefulSets by hostname. We depend on this today, as we require per-endpoint tracking across clusters (we use endpoints, not k8s services, and this is a workload requirement today for resource allocation/tracking reasons).

Sometimes we want to do something like canary rollouts, or test a minority of endpoints on a new version. Deploying two StatefulSets with matching selectors works perfectly for this use case, except that we cannot track the hostnames correctly for both StatefulSets across clusters due to the DNS issue.

This is already a fairly niche case and can likely be handled by something like SMI/traffic splitting further up the stack (well, that doesn't solve endpoint tracking, but we can deal with that I guess), but given it is a technical spec violation (even if it's longstanding behavior and the spec came later?), I figured I'd raise the issue. Other approaches that can deliver the same functionality impose much heavier requirements (flat disjoint network across clusters, identical CNI configuration -- Cilium is a good example) or I have yet to find them (a global overlay secondary VPN with minimal complexity: something like Kilo is close, we found Liqo awfully heavy, and Submariner is also somewhat close conceptually).

I imagine changing that behavior would be far more disruptive today.

@thockin (Member) commented Apr 18, 2024

Part of the problem is that there are a few things conflated here.

The Endpoint hostname field is used both for the A/AAAA records and for the PTR record generation. It's strongly discouraged for PTR records to have more than one value, so we use the pod's own subdomain field to designate the intention. If we set hostname in multiple Endpoints for the same IP, we don't know which one is the canonical PTR.

In hindsight, this is a sloppy design, but it comes from a different era, and is hard to undo.

@alexeldeib (Member, Author)

Turns out there are several more things here!

  • we return multiple PTRs in some cases as discovered in Multiple PTR records being returned for a Pod backed by multiple Services #124418
  • these should likely match pod subdomain, which should match service name (in the case of headless service DNS)
  • if you force this to be true when using SS + Deploy (where deployment pod template has pod hostname + subdomain explicitly set, and the subdomain matches the SS serviceName/subdomain), and ensure the pod is an endpoint selected by the headless service, all pod FQDNs resolve correctly, but...
    • it is possible to create arbitrarily many duplicate IPs behind the same FQDN
    • you will only get one response for the A record for that FQDN
    • all pod IPs will have the PTR record for that FQDN
      e.g.

multiple PTRs for overlapping service with non-unique subdomains (probably, subdomain should be the determining thing here)

[cw-rno2|default] ace@Ace-Eldeib slurm % k exec -it first-dep-f458d798b-vqbkj -- dig +short first-0.first.default.svc.cluster.local
10.7.225.73
[cw-rno2|default] ace@Ace-Eldeib slurm % k exec -it first-dep-f458d798b-vqbkj -- dig +short -x 10.7.225.73
first-0.first.default.svc.cluster.local.
10-7-225-73.shared-dep.default.svc.cluster.local.
10-7-225-73.shared.default.svc.cluster.local.

colliding PTR

[dev|default] ace@Ace-Eldeid % k exec -it first-0 -- dig +short first-0.first.default.svc.cluster.local
10.76.79.82
[dev|default] ace@Ace-Eldeib % k exec -it first-0 -- dig +short -x 10.76.79.82
first-0.first.default.svc.cluster.local.
[dev|default] ace@Ace-Eldeib % k exec -it first-0 -- dig +short -x 10.76.79.80
first-0.first.default.svc.cluster.local.

this seems to intersect with #60789

@alexeldeib (Member, Author)

additionally:

  • StatefulSet allows setting the pod subdomain in the template but overrides it to serviceName
  • a pod which specifies hostname and subdomain will not have a resolvable FQDN unless it is also selected by a headless Service whose name matches the subdomain (doc clarification, but probably correct behavior? see the sketch below)
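
For the second point, a minimal sketch (hypothetical names, loosely following the pattern in the Kubernetes docs): a bare Pod with hostname and subdomain set, plus a headless Service whose name matches the subdomain. Only while that matching Service exists should busybox-1.sub-a.default.svc.cluster.local resolve; rename or delete the Service and the per-pod FQDN should stop resolving.

apiVersion: v1
kind: Pod
metadata:
  name: standalone
  labels:
    app: standalone
spec:
  hostname: busybox-1
  subdomain: sub-a   # must match the headless Service name below
  containers:
  - name: main
    image: busybox:1.36
    command: ["sleep", "3600"]
---
apiVersion: v1
kind: Service
metadata:
  name: sub-a   # same name as the Pod's subdomain
spec:
  clusterIP: None
  selector:
    app: standalone
  ports:
  - name: placeholder
    port: 1234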

@thockin (Member) commented Apr 24, 2024

StatefulSet allows setting the pod subdomain in the template but overrides it to serviceName

I can't speak to intention here, but it seems correct to me? Embedding a PodSpec is all-or-nothing, sadly.

a pod which specifies hostname and subdomain will not have a resolvable FQDN unless it is also selected by a headless Service whose name matches the subdomain

Yeah this should be updated in the spec.

@thockin added the triage/accepted label and removed the needs-triage label Apr 25, 2024
@adrianmoisey

I had a look at making a PR to update the DNS spec, but as I was digging into it, I realised that the spec seems fine.

https://github.com/kubernetes/dns/blob/master/docs/specification.md#24---records-for-a-headless-service

There must also be an A record of the following form for each ready endpoint with hostname of <hostname> and IPv4 address <endpoint-ip>.

This part of the spec seems to be talking about Endpoints, not Pods or StatefulSets.
If a Pod has subdomain and hostname set, then the Endpoints object whose name matches the subdomain gets an entry with that hostname, which in turn causes a DNS A record for that Pod to be created.

I also see that the StatefulSet docs describe this DNS-creation behaviour.

@thockin (Member) commented May 3, 2024

You're right that this is speaking in terms of endpoints. If you think the spec is clear enough, we should close this. But fresh eyes are worth a fortune, so if you think it could be clarified, I am open to it.

@adrianmoisey

so if you think it could be clarified, I am open to it.

Overall I think it's clear enough.

If it started getting into high-level concepts, such as relationships between Deployments and Pods and Endpoints, I think it would be a bit too much.

My only suggestion here is to capitalise the references to Endpoint, making it clear that the spec is talking about Endpoint objects. But I'm not sure if that is just an implementation detail that may change in the future.

@thockin (Member) commented May 3, 2024

The use of "endpoint" here is a little vague on purpose. It used to mean "records in the Endpoints object". Then we added EndpointSlice and it means that, too. In theory there are other ways to do endpoint discovery. This spec needs to stay just a LITTLE abstract to decouple from today's specific choices which might change tomorrow :)

@adrianmoisey

Yup, that's what I assumed.
I can't see any useful changes to make it clearer. I think this is good to close.

@thockin (Member) commented May 3, 2024

OK, thanks!

@thockin closed this as completed May 3, 2024