[BUG] race condition in longhorn-manager certificate renewal #8433
Labels
kind/bug
require/backport
Require backport. Only used when the specific versions to backport have not been defined.
require/qa-review-coverage
Require QA to review coverage
Describe the bug
We are observing what looks like a race condition between longhorn-manager pods. All longhorn-manager pods keep trying to update longhorn-webhook-tls tens to hundreds of times per second, which fills the logs with errors.
What we have observed (by running kubectl watch on the secret and decoding the certificates) is that the updates flip between two certificates (always the same two) that differ only in their serial numbers.
This might be related to the renewal 90 days before expiry.
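The flip-flop between two certificates can be confirmed from the watch output by tallying the serial numbers seen on each update. The following is only a minimal sketch of that check: the function name `summarize_serials` and the serial values are hypothetical, and it assumes the serials have already been extracted from the decoded certificates in chronological order.

```python
from collections import Counter


def summarize_serials(serials):
    """Summarize a chronological list of certificate serial numbers."""
    counts = Counter(serials)
    # A "flip" is any update whose serial differs from the previous one.
    flips = sum(1 for prev, cur in zip(serials, serials[1:]) if prev != cur)
    return {"distinct": len(counts), "flips": flips, "counts": dict(counts)}


# Hypothetical serials, one per observed update of the secret.
observed = ["0A1B", "7F3C", "0A1B", "7F3C", "0A1B"]
summary = summarize_serials(observed)
print(summary)
```

Exactly two distinct serials with a flip count close to the number of updates would match the behavior described here. A single renewal would instead show one flip in total.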
To Reproduce
Not sure how to reproduce, as we are not 100% sure of the cause.
Expected behavior
The longhorn-manager pod properly renews the certificate.
Support bundle for troubleshooting
Environment
Additional context
Related to #5571
slack thread: just my notes without reply