replicate cache hanging on configmaps #142
Comments
Throttling is not the issue; I was able to replicate this on a local kind cluster without throttling issues.
We also have the same issue; the container just hung for a few days, and we are on the latest version.
Is this using the wrong informer? https://github.com/open-policy-agent/kube-mgmt/blob/7.1.1/pkg/data/generic.go#L126
The comment specifically says to only use this for Get/List, while NewSharedIndexInformer says it is specifically for a ListWatcher.
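For reference, the client-go pattern being discussed pairs NewSharedIndexInformer with a ListWatch, so the store is filled by an initial LIST and then kept current by WATCH. This is a minimal sketch only, assuming a recent client-go; newConfigMapInformer and the 30-second resync period are illustrative, not kube-mgmt's actual wiring:

```go
package main

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// newConfigMapInformer builds a shared informer backed by a real ListWatch,
// so the store is populated by LIST first and then kept up to date by WATCH.
func newConfigMapInformer(clientset *kubernetes.Clientset, namespace string) cache.SharedIndexInformer {
	// NewListWatchFromClient wires up both the List and Watch calls for the resource.
	lw := cache.NewListWatchFromClient(
		clientset.CoreV1().RESTClient(),
		"configmaps",
		namespace,
		fields.Everything(),
	)
	// The informer's store is only safe to read (Get/List) once the initial LIST has synced.
	return cache.NewSharedIndexInformer(lw, &corev1.ConfigMap{}, 30*time.Second, cache.Indexers{})
}
```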
Here is a cluster with 2000 configmaps, and it only ever makes it through 27 of them to my raw HTTP receiver.
I think this is what the authors upstream meant by faulty. The cache state is changing too rapidly and resetting the list?
I made a PR based on the client-go comment, but I think the whole diagnosis may have been wrong. It looks like perhaps we were just attacking ourselves with leader-election configmaps, which may have in some way overloaded OPA itself. As to why we never see all configmaps load... I suspect that the LIST function is never initialized, like this: https://github.com/open-policy-agent/kube-mgmt/pull/183/files#diff-10432976b832c2bf005c1a2b317bff34f985023e0e2e43650c2cfd1fbcb1f7faR151-R155 I'm leaving the PR in draft because I think this may well be a non-issue with kube-mgmt, given the very high volume of uploads to OPA, which should be eliminated first.
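For illustration, one way to drop leader-election configmaps before they ever reach OPA is to filter on the standard lock annotation. A sketch only: isLeaderElectionConfigMap is a hypothetical helper, not something in kube-mgmt, and it assumes the locks use client-go's control-plane.alpha.kubernetes.io/leader annotation key:

```go
package main

import corev1 "k8s.io/api/core/v1"

// Leader-election ConfigMap locks are rewritten every few seconds, so replicating
// them floods OPA with updates. The standard lock annotation is used to spot them.
const leaderElectionAnnotation = "control-plane.alpha.kubernetes.io/leader"

// isLeaderElectionConfigMap reports whether a ConfigMap looks like a leader-election lock.
func isLeaderElectionConfigMap(cm *corev1.ConfigMap) bool {
	if cm == nil {
		return false
	}
	_, found := cm.Annotations[leaderElectionAnnotation]
	return found
}
```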
I have debug logging on my branch.
This is with the loader fixed to ignore leader elections, plus different namespaces that may not have leader elects labeled correctly. You can see here it's taking ~2 seconds per request, so if the reloads are NOT bypassed, the queue never empties and kube-mgmt falls so far behind that it can never finish the queue. I'm going to link this to #186, as I'm pretty certain this was of our own doing, and I will work to tidy up whatever I can to make this mergeable.
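For a rough sense of scale: combining that ~2 seconds per request with the 2000-configmap cluster mentioned above, a single full pass is roughly 2000 × 2 s ≈ 67 minutes, so any sustained rate of more than about one configmap update every 2 seconds means the queue grows faster than it drains and the initial load never finishes.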
I am using a fairly large cluster with some configmaps that are constantly updating. When I add the --replicate=v1/configmaps flag to kube-mgmt, the watch seems to hang for hours (I am seeing around 3+ hours). The only conclusion I can come to is that, because of the constantly updating configmaps, the ResultChan is not ever returning?
kube-mgmt/pkg/data/generic.go, lines 185 to 223 at 6da0e7e
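For context on where such a hang can come from, a raw watch loop over ResultChan generally looks like the sketch below. This is not the kube-mgmt code from those lines, just a generic illustration assuming a recent client-go; watchConfigMaps, resyncPeriod, and the event handling are placeholders:

```go
package main

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// watchConfigMaps is a generic sketch of a raw watch loop over ResultChan.
// If the API server never closes the watch and no resync timeout is enforced,
// the select blocks on ResultChan indefinitely, which looks like a hang.
func watchConfigMaps(ctx context.Context, clientset *kubernetes.Clientset, namespace string, resyncPeriod time.Duration) error {
	w, err := clientset.CoreV1().ConfigMaps(namespace).Watch(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	defer w.Stop()
	for {
		select {
		case event, ok := <-w.ResultChan():
			if !ok {
				// The server closed the watch; the caller should re-list and re-watch.
				return nil
			}
			// Handle the event (placeholder for whatever processing is needed).
			_ = event
		case <-time.After(resyncPeriod):
			// Force a periodic resync so a silent watch cannot hang forever.
			return nil
		case <-ctx.Done():
			return ctx.Err()
		}
	}
}
```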
The only other thing I can think of is that, because of the throttling issue we have on the cluster, the channel is failing out.
Any help here would be appreciated, thanks.