Restarting a namespace with 30+ deployments causes errors in istio-csr which tend to resolve after a while. #217

kuberkaul · 2023-08-19T16:20:03Z

I have cert-manager, istio-csr, and Istio running with AWS PCA. It all works fine except when I restart a whole namespace, with 30+ deployments starting at the same time.

The errors below cause pods to fail while coming up, but they tend to go away after 5-10 minutes and the pods eventually come up. Is this just istio-csr being slow, with requests hitting a timeout because there is a barrage of them? Are there any workarounds for this?

I have two istio-csr instances running with a dedicated 1 CPU and 2 Gi of memory. It seems to come up with 1 worker.

2023-08-19T16:10:25.644755Z	error	klog	cert-manager "msg"="got unexpected object response from watcher" "error"=null "identity"="spiffe://cluster.local/ns/djin-content/sa/contentmgmt" "name"="istio-csr-s7l9h" "namespace"="istio-system" "object"={"metadata":{},"status":"Failure","message":"an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding","reason":"InternalError","details":{"causes":[{"reason":"UnexpectedServerResponse","message":"unable to decode an event from the watch stream: context canceled"},{"reason":"ClientWatchDecoding","message":"unable to decode an event from the watch stream: context canceled"}]},"code":500}

2023-08-19T16:10:25.644795Z	error	klog	grpc-server "msg"="failed to sign incoming client certificate signing request" "error"="failed to wait for CertificateRequest istio-system/istio-csr-s7l9h to be signed: watcher channel closed" "identities"="spiffe://cluster.local/ns/djin-content/sa/contentmgmt" "serving-addr"="0.0.0.0:6443"

2023-08-19T16:10:25.651776Z	info	klog	cert-manager "msg"="deleted CertificateRequest" "identity"="spiffe://cluster.local/ns/djin-content/sa/contentmgmt" "name"="istio-csr-s7l9h" "namespace"="istio-system"

2023-08-19T16:10:26.434228Z	error	klog	cert-manager "msg"="got unexpected object response from watcher" "error"=null "identity"="spiffe://cluster.local/ns/djin-content/sa/contentmgmt" "name"="istio-csr-4ph8z" "namespace"="istio-system" "object"={"metadata":{},"status":"Failure","message":"an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding","reason":"InternalError","details":{"causes":[{"reason":"UnexpectedServerResponse","message":"unable to decode an event from the watch stream: context canceled"},{"reason":"ClientWatchDecoding","message":"unable to decode an event from the watch stream: context canceled"}]},"code":500}

2023-08-19T16:10:26.434273Z	error	klog	grpc-server "msg"="failed to sign incoming client certificate signing request" "error"="failed to wait for CertificateRequest istio-system/istio-csr-4ph8z to be signed: watcher channel closed" "identities"="spiffe://cluster.local/ns/djin-content/sa/contentmgmt" "serving-addr"="0.0.0.0:6443"

2023-08-19T16:10:26.442670Z	info	klog	cert-manager "msg"="deleted CertificateRequest" "identity"="spiffe://cluster.local/ns/djin-content/sa/contentmgmt" "name"="istio-csr-4ph8z" "namespace"="istio-system"


nitishkrishna · 2023-11-29T19:04:15Z

I think it's because you can't set QPS or Burst in istio-csr, unlike in cert-manager itself:
#144
So these pods are probably getting client-side rate-limited when talking to the Kubernetes API server.
