Restarting a namespace with 30+ deployments causes errors in istio-csr, which tend to resolve after a while. #217

Open
kuberkaul opened this issue Aug 19, 2023 · 1 comment

kuberkaul commented Aug 19, 2023

I have cert-manager, istio-csr, and Istio running with AWS PCA. Everything works fine except when I restart a whole namespace, with 30+ deployments starting at the same time.

The errors below cause pods to fail while coming up, but they tend to go away after 5-10 minutes and the pods eventually come up. Is this just istio-csr being slow, with requests timing out because there is a barrage of them? Are there any workarounds?

I have two istio-csr instances running with a dedicated 1 CPU and 2 Gi of memory. It seems to come up with 1 worker.

2023-08-19T16:10:25.644755Z	error	klog	cert-manager "msg"="got unexpected object response from watcher" "error"=null "identity"="spiffe://cluster.local/ns/djin-content/sa/contentmgmt" "name"="istio-csr-s7l9h" "namespace"="istio-system" "object"={"metadata":{},"status":"Failure","message":"an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding","reason":"InternalError","details":{"causes":[{"reason":"UnexpectedServerResponse","message":"unable to decode an event from the watch stream: context canceled"},{"reason":"ClientWatchDecoding","message":"unable to decode an event from the watch stream: context canceled"}]},"code":500}

2023-08-19T16:10:25.644795Z	error	klog	grpc-server "msg"="failed to sign incoming client certificate signing request" "error"="failed to wait for CertificateRequest istio-system/istio-csr-s7l9h to be signed: watcher channel closed" "identities"="spiffe://cluster.local/ns/djin-content/sa/contentmgmt" "serving-addr"="0.0.0.0:6443"

2023-08-19T16:10:25.651776Z	info	klog	cert-manager "msg"="deleted CertificateRequest" "identity"="spiffe://cluster.local/ns/djin-content/sa/contentmgmt" "name"="istio-csr-s7l9h" "namespace"="istio-system"

2023-08-19T16:10:26.434228Z	error	klog	cert-manager "msg"="got unexpected object response from watcher" "error"=null "identity"="spiffe://cluster.local/ns/djin-content/sa/contentmgmt" "name"="istio-csr-4ph8z" "namespace"="istio-system" "object"={"metadata":{},"status":"Failure","message":"an error on the server (\"unable to decode an event from the watch stream: context canceled\") has prevented the request from succeeding","reason":"InternalError","details":{"causes":[{"reason":"UnexpectedServerResponse","message":"unable to decode an event from the watch stream: context canceled"},{"reason":"ClientWatchDecoding","message":"unable to decode an event from the watch stream: context canceled"}]},"code":500}

2023-08-19T16:10:26.434273Z	error	klog	grpc-server "msg"="failed to sign incoming client certificate signing request" "error"="failed to wait for CertificateRequest istio-system/istio-csr-4ph8z to be signed: watcher channel closed" "identities"="spiffe://cluster.local/ns/djin-content/sa/contentmgmt" "serving-addr"="0.0.0.0:6443"

2023-08-19T16:10:26.442670Z	info	klog	cert-manager "msg"="deleted CertificateRequest" "identity"="spiffe://cluster.local/ns/djin-content/sa/contentmgmt" "name"="istio-csr-4ph8z" "namespace"="istio-system"
@nitishkrishna
Contributor

I think it's because you can't set QPS or Burst in istio-csr, unlike in cert-manager itself:
#144
So these pods are probably getting client-side rate-limited when talking to the Kubernetes API server.
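
To make the rate-limiting idea concrete, here is a minimal sketch of a token-bucket limiter of the kind a Kubernetes client applies on its own side. It is a simulation only, not istio-csr code. The function name `release_times` is made up for illustration, and the values QPS 5 and burst 10 are assumed here because they are the client-go defaults; whether istio-csr actually runs with them is not confirmed in this thread.

```python
def release_times(n_requests: int, qps: float, burst: int) -> list[float]:
    """Simulate a token bucket for requests that all arrive at t=0.

    The bucket starts full with `burst` tokens and refills at `qps` tokens
    per second. Returns the simulated time (seconds) at which each request
    is allowed to go out to the API server.
    """
    tokens = float(burst)  # bucket starts full
    now = 0.0              # simulated clock; only advances while waiting
    times = []
    for _ in range(n_requests):
        if tokens < 1.0:
            # Not enough tokens: wait until one full token has been refilled.
            now += (1.0 - tokens) / qps
            tokens = 1.0
        tokens -= 1.0
        times.append(now)
    return times


if __name__ == "__main__":
    # Hypothetical load: 30 pods, each needing 3 API calls (an assumed number).
    times = release_times(n_requests=90, qps=5.0, burst=10)
    print(f"first delayed call released at ~{times[10]:.1f}s")
    print(f"last call released at ~{times[-1]:.1f}s")
```

Under these assumed numbers, the first 10 calls go out immediately and every later call waits another 1/QPS seconds, so the queue drains linearly with the number of pending calls. If the callers have their own timeouts and retry, the backlog can grow further, which would be consistent with errors that clear up on their own once the burst of restarts is over.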
