istio-csr doesn't retry upon failed certificate requests #138

Open
SachinHg opened this issue Mar 7, 2022 · 2 comments
SachinHg commented Mar 7, 2022

Hi all,
I am using istio-csr with cert-manager, with Vault as the CA for my Istio mesh, and I am running into the issue below.

When I try to deploy a workload pod on the Istio mesh, I see this error in the namespace events:

Error creating: Internal error occurred: failed calling webhook "namespace.sidecar-injector.istio.io": Post "https://istiod.istio-system.svc:443/inject?timeout=10s": x509: certificate has expired or is not yet valid: current time 2022-03-04T09:11:29Z is after 2022-03-04T02:12:13Z

Looking at the details of the Certificate deployed by istio-csr in the istio-system namespace, I see that the last transition happened at around 01:42:13:
lastTransitionTime: "2022-03-04T01:42:13Z"
message: Renewing certificate as renewal was scheduled at 2022-03-04 01:42:13

This shows that a renewal of the istiod certificate was attempted at the scheduled time, but it does not tell us whether the renewal succeeded or failed.
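To put the timestamps quoted above side by side, here is a small sketch that computes how close to expiry the renewal was scheduled and how long the certificate had already been expired when the webhook call failed. The variable names are illustrative only; the three timestamps are copied from the error message and the Certificate status in this report.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    # The timestamps in this report are RFC 3339 values in UTC ("Z" suffix).
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Values copied from the report; the names are illustrative.
renewal_scheduled = parse_utc("2022-03-04T01:42:13Z")  # lastTransitionTime on the Certificate
not_after = parse_utc("2022-03-04T02:12:13Z")          # expiry quoted in the webhook error
webhook_call = parse_utc("2022-03-04T09:11:29Z")       # "current time" quoted in the webhook error

renewal_window = not_after - renewal_scheduled  # time left to renew before expiry
expired_for = webhook_call - not_after          # how long the cert had been expired

print(renewal_window)  # 0:30:00
print(expired_for)     # 6:59:16
```

So the renewal was scheduled 30 minutes before expiry, and the certificate had been expired for almost seven hours by the time the pod creation was rejected.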
So I then looked at the logs of the istio-csr pod running in the cert-manager namespace, and I see errors from around the same time saying that the Vault server was not available:

"msg"="failed to sign incoming client certificate signing request" "error"="failed to wait for CertificateRequest istio-system/istio-csr-d22hp to be signed: created CertificateRequest has failed: [{Approved True 2022-03-04 01:49:51 +0000 UTC cert-manager.io Certificate request has been approved by cert-manager.io} {Ready False 2022-03-04 01:49:55 +0000 UTC Failed Vault failed to sign certificate: failed to sign certificate by vault: Post no such host

I know that Vault was brought down for maintenance at that time and was eventually brought back up. What I noticed is that even after Vault was back up, no new pods were created successfully, and every pod creation failed with the error about the certificate being expired.

I want to know why istio-csr did not try to renew the certificate once Vault was back up. Is there a workaround for this problem?

Thanks

@nitishkrishna
Contributor

+1. @JoshVanL, can you please look into this? It's a critical issue for us, as Vault can be temporarily unavailable for maintenance.

@nitishkrishna
Contributor

nitishkrishna commented Nov 5, 2022

During a recent Vault outage, certs were not auto-renewed on many of our clusters, with the error below:

status:
  conditions:
  - lastTransitionTime:
    message: Certificate request has been approved by cert-manager.io
    reason: cert-manager.io
    status: "True"
    type: Approved
  - lastTransitionTime:
    message: 'Failed to initialise vault client for signing: error reading Kubernetes
      service account token from vault-istio-issuer-token: error calling Vault server:
      Post "<vault>/login":
      dial tcp: lookup vaut.xyz on xx: server misbehaving'
    reason: Pending
    status: "False"
    type: Ready

The VaultIssuer recovered, but a new CSR was never created, and this one was never signed.
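For illustration only, here is a rough sketch of the behaviour being asked for: when a request ends up with `Ready` not `True`, submit a fresh one with a growing delay until one is signed. This is not istio-csr or cert-manager code. `ready_condition`, `is_signed`, `request_until_signed` and the `submit` callback are hypothetical names, and `failed_status` simply mirrors the conditions pasted above with the message omitted.

```python
import time
from typing import Callable, Optional


def ready_condition(status: dict) -> Optional[dict]:
    """Return the condition of type Ready from a status mapping, if present."""
    for cond in status.get("conditions", []):
        if cond.get("type") == "Ready":
            return cond
    return None


def is_signed(status: dict) -> bool:
    """A request counts as signed only when its Ready condition is "True"."""
    cond = ready_condition(status)
    return cond is not None and cond.get("status") == "True"


def request_until_signed(
    submit: Callable[[], dict],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call submit() until it returns a signed status, backing off exponentially.

    submit is a hypothetical callback that creates a NEW request each time
    and returns that request's final status mapping.
    """
    for attempt in range(max_attempts):
        status = submit()
        if is_signed(status):
            return status
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x, ... the base delay
    raise RuntimeError(f"request not signed after {max_attempts} attempts")


# Mirrors the conditions in the status above (message omitted).
failed_status = {
    "conditions": [
        {"type": "Approved", "status": "True", "reason": "cert-manager.io"},
        {"type": "Ready", "status": "False", "reason": "Pending"},
    ]
}

print(is_signed(failed_status))  # False
```

With a loop like this, a temporary issuer outage would delay signing rather than leave the certificate expired after the issuer recovers.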
