Readiness Gate Injection Breaks Kubernetes 1.29 Sidecar Pods #3649

Closed
jlrgraham23 opened this issue Apr 16, 2024 · 4 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@jlrgraham23

jlrgraham23 commented Apr 16, 2024

Describe the bug

When the pod readiness gate feature is enabled on a namespace in a Kubernetes 1.29 cluster, it strips the restartPolicy: Always value from containers in the initContainers block. This effectively breaks the new sidecar containers feature in 1.29.

This looks to be due to use of an older version of the k8s.io/api library here:
https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/main/go.mod#L22C2-L22C20
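
To illustrate the suspected mechanism, here is a minimal sketch using only the Go standard library. It assumes the field is lost when the pod is decoded into a type that does not declare restartPolicy and is then re-encoded. The oldContainer type and both helper functions are hypothetical stand-ins, not the real k8s.io/api types or the controller's code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// oldContainer is a hypothetical stand-in for a container type compiled
// against an API library version that predates the restartPolicy field.
type oldContainer struct {
	Name    string   `json:"name"`
	Image   string   `json:"image"`
	Command []string `json:"command,omitempty"`
}

// roundTrip decodes raw JSON into the old type and encodes it again.
// encoding/json ignores unknown keys, so undeclared fields are dropped.
func roundTrip(raw []byte) ([]byte, error) {
	var c oldContainer
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil, err
	}
	return json.Marshal(c)
}

// droppedFields lists top-level keys present in before but absent in after.
func droppedFields(before, after []byte) ([]string, error) {
	var b, a map[string]json.RawMessage
	if err := json.Unmarshal(before, &b); err != nil {
		return nil, err
	}
	if err := json.Unmarshal(after, &a); err != nil {
		return nil, err
	}
	dropped := []string{}
	for k := range b {
		if _, ok := a[k]; !ok {
			dropped = append(dropped, k)
		}
	}
	sort.Strings(dropped)
	return dropped, nil
}

func main() {
	raw := []byte(`{"name":"sidecar","image":"debian","command":["sleep","60"],"restartPolicy":"Always"}`)
	out, err := roundTrip(raw)
	if err != nil {
		panic(err)
	}
	dropped, err := droppedFields(raw, out)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	fmt.Println(dropped) // [restartPolicy]
}
```

A patch computed as the difference between the submitted object and such a re-encoded copy would contain a remove operation for every field the older type does not know about.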

Same issue here with the EKS specific Pod Identity Webhook.

Upstream docs: https://kubernetes.io/blog/2023/08/25/native-sidecar-containers/

Steps to reproduce

  • Start up a Kubernetes 1.29 cluster.
  • Create a namespace with the elbv2.k8s.aws/pod-readiness-gate-inject=enabled label.
  • Deploy a workload into this namespace that uses restartPolicy: Always on an init container in a pod.
  • Observe that the pod hangs indefinitely: the sidecar containers' special status is removed, so the pod waits on them during startup.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: foobar
spec:
  replicas: 2
  selector:
    matchLabels:
      app: foobar
  template:
    metadata:
      labels:
        app: foobar
    spec:
      initContainers:
        - name: sidecar
          image: debian
          command: ["sleep", "60"]
          restartPolicy: Always
      containers:
        - name: main
          image: debian
          command: ["sleep", "60"]

Inspecting the resulting pods from this Deployment shows that the restartPolicy attribute has been removed in flight.
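
That inspection can be made mechanical. The sketch below assumes the pod JSON was obtained with something like kubectl get pod <name> -o json and reports which init containers still carry restartPolicy: Always. The podView type and sidecarNames helper are hypothetical names introduced for illustration, and the two inline documents are trimmed examples rather than real cluster output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podView declares only the fields this check needs; everything else in
// the pod JSON is ignored by encoding/json.
type podView struct {
	Spec struct {
		InitContainers []struct {
			Name          string `json:"name"`
			RestartPolicy string `json:"restartPolicy"`
		} `json:"initContainers"`
	} `json:"spec"`
}

// sidecarNames returns the names of init containers whose restartPolicy
// is "Always", i.e. the ones Kubernetes 1.29 treats as sidecars.
func sidecarNames(podJSON []byte) ([]string, error) {
	var p podView
	if err := json.Unmarshal(podJSON, &p); err != nil {
		return nil, err
	}
	names := []string{}
	for _, c := range p.Spec.InitContainers {
		if c.RestartPolicy == "Always" {
			names = append(names, c.Name)
		}
	}
	return names, nil
}

func main() {
	// Trimmed pod documents: as submitted, and as seen after mutation.
	submitted := []byte(`{"spec":{"initContainers":[{"name":"sidecar","restartPolicy":"Always"}]}}`)
	mutated := []byte(`{"spec":{"initContainers":[{"name":"sidecar"}]}}`)

	for _, doc := range [][]byte{submitted, mutated} {
		names, err := sidecarNames(doc)
		if err != nil {
			panic(err)
		}
		fmt.Printf("sidecars: %v\n", names)
	}
}
```

An empty result for a pod whose manifest declared a sidecar indicates the attribute was removed somewhere between submission and admission.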

Expected outcome

The readiness gate should not remove the restartPolicy attribute from containers in the initContainers spec.

Environment

  • AWS Load Balancer controller version: v2.7.2
  • Kubernetes version: 1.29.1
  • Using EKS (yes/no), if so version? Yes, 1.29.
@kakarotbyte

I was able to reproduce this using the steps above.

AWS Load Balancer controller version: v2.8.0
EKS version: 1.29

@oliviassss
Collaborator

Thanks for reporting this. We are working on upgrading controller-runtime to the latest version and the k8s dependencies to v0.30.0, which should fix this issue: #3707

@kakarotbyte

Yes, the elbv2.k8s.aws/pod-readiness-gate-inject=enabled label needed to be added to our namespace.

With debug logging enabled, I see the following in the AWS LBC pod logs:

{"level":"debug","ts":"2024-05-21T18:34:55Z","logger":"controller-runtime.webhook.webhooks","msg":"received request","webhook":"/mutate-v1-pod","UID":"05be1caa-511d-4b1c-bdbc-c04de7f57745","kind":"/v1, Kind=Pod","resource":{"group":"","version":"v1","resource":"pods"}}
{"level":"debug","ts":"2024-05-21T18:34:55Z","logger":"mutating_handler","msg":"mutating webhook request","request":{"uid":"05be1caa-511d-4b1c-bdbc-c04de7f57745","kind":{"group":"","version":"v1","kind":"Pod"},"resource":{"group":"","version":"v1","resource":"pods"},"requestKind":{"group":"","version":"v1","kind":"Pod"},"requestResource":{"group":"","version":"v1","resource":"pods"},"namespace":"cat","operation":"CREATE","userInfo":{"username":"system:serviceaccount:kube-system:replicaset-controller","uid":"bbaef543-82f7-437e-a064-4fab5b55d934","groups":["system:serviceaccounts","system:serviceaccounts:kube-system","system:authenticated"]},"object":{"kind":"Pod","apiVersion":"v1","metadata":{"generateName":"foobar-6d5cd67844-","namespace":"cat","creationTimestamp":null,"labels":{"app":"foobar","eks.amazonaws.com/fargate-profile":"cat","pod-template-hash":"6d5cd67844"},"ownerReferences":[{"apiVersion":"apps/v1","kind":"ReplicaSet","name":"foobar-6d5cd67844","uid":"74e35ed7-9718-43ab-8ed3-26b07cef730c","controller":true,"blockOwnerDeletion":true}],"managedFields":[{"manager":"kube-controller-manager","operation":"Update","apiVersion":"v1","time":"2024-05-21T18:34:55Z","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:generateName":{},"f:labels":{".":{},"f:app":{},"f:pod-template-hash":{}},"f:ownerReferences":{".":{},"k:{\"uid\":\"74e35ed7-9718-43ab-8ed3-26b07cef730c\"}":{}}},"f:spec":{"f:containers":{"k:{\"name\":\"main\"}":{".":{},"f:command":{},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:dnsPolicy":{},"f:enableServiceLinks":{},"f:initContainers":{".":{},"k:{\"name\":\"sidecar\"}":{".":{},"f:command":{},"f:image":{},"f:imagePullPolicy":{},"f:name":{},"f:resources":{},"f:restartPolicy":{},"f:terminationMessagePath":{},"f:terminationMessagePolicy":{}}},"f:restartPolicy":{},"f:schedulerName":{},"f:securityContext":{},"f:terminationGracePeriodSeconds":{}}}}]},"spec":{"volumes":[{"name":"kube-api-access-vbfzw","projected":{"sources":[{"serviceAccountToken":{"expirationSeconds":3607,"path":"token"}},{"configMap":{"name":"kube-root-ca.crt","items":[{"key":"ca.crt","path":"ca.crt"}]}},{"downwardAPI":{"items":[{"path":"namespace","fieldRef":{"apiVersion":"v1","fieldPath":"metadata.namespace"}}]}}],"defaultMode":420}}],"initContainers":[{"name":"sidecar","image":"debian","command":["sleep","60"],"resources":{},"restartPolicy":"Always","volumeMounts":[{"name":"kube-api-access-vbfzw","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"Always"}],"containers":[{"name":"main","image":"debian","command":["sleep","60"],"resources":{},"volumeMounts":[{"name":"kube-api-access-vbfzw","readOnly":true,"mountPath":"/var/run/secrets/kubernetes.io/serviceaccount"}],"terminationMessagePath":"/dev/termination-log","terminationMessagePolicy":"File","imagePullPolicy":"Always"}],"restartPolicy":"Always","terminationGracePeriodSeconds":30,"dnsPolicy":"ClusterFirst","serviceAccountName":"default","serviceAccount":"default","securityContext":{},"schedulerName":"fargate-scheduler","tolerations":[{"key":"node.kubernetes.io/not-ready","operator":"Exists","effect":"NoExecute","tolerationSeconds":300},{"key":"node.kubernetes.io/unreachable","operator":"Exists","effect":"NoExecute","tolerationSeconds":300}],"priorityClassName":"system-node-critical","priority":2000001000,"enableServiceLinks":true,"preemptionPolicy":"PreemptLowerPriority"},"status":{}},"oldObject":null,"dryRun":false,"options":{"kind":"CreateOptions","apiVersion":"meta.k8s.io/v1"}}}
{"level":"debug","ts":"2024-05-21T18:34:55Z","logger":"mutating_handler","msg":"mutating webhook response","response":{"Patches":[{"op":"remove","path":"/spec/initContainers/0/restartPolicy"}],"uid":"","allowed":true,"patchType":"JSONPatch"}}
{"level":"debug","ts":"2024-05-21T18:34:55Z","logger":"controller-runtime.webhook.webhooks","msg":"wrote response","webhook":"/mutate-v1-pod","code":200,"reason":"","UID":"05be1caa-511d-4b1c-bdbc-c04de7f57745","allowed":true}

@oliviassss
Collaborator

@jlrgraham23, @kakarotbyte thanks for your patience.
The fix has shipped in the v2.8.1 release, so I'm closing this issue.
https://github.com/kubernetes-sigs/aws-load-balancer-controller/releases/tag/v2.8.1
