Sporadic "MountVolume.SetUp failed for volume ... not registered" failures seen in 1.22.0-1.22.8, 1.23.0-1.23.5 #105204
/sig scheduling
Maybe you can also get this warning log in a namespace whose name doesn't contain `-`.
It's not visible in namespaces that don't have `-`.
@zetaab , I executed the steps you provided. I agree with @cmssczy, this failure message is not related to namespace naming.
@zetaab
What version was your cluster previously? I think this could somehow be related to the ServiceAccount admission controller and the Bound Service Account Token Volume. There was a feature gate that graduated in 1.21 that is supposed to copy the kube-root-ca.crt ConfigMap to every namespace:
I wonder if your namespace existed prior to 1.21, that maybe it never got the "kube-root-ca.crt" ConfigMap. Do you have that ConfigMap in your gha-devcloud namespace?
If not, you might be able to trigger the rootcacertpublisher by updating your namespace, perhaps by adding an annotation.
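The suggested check and workaround could look like the following sketch; the annotation key/value are arbitrary placeholders whose only purpose is to force an update event on the namespace (`gha-devcloud` is the namespace mentioned above):

```shell
# Check whether the ConfigMap exists in the affected namespace
kubectl get configmap kube-root-ca.crt -n gha-devcloud

# If it is missing, touch the namespace so the rootcacertpublisher
# controller re-syncs it; the annotation key/value are arbitrary.
kubectl annotate namespace gha-devcloud touched-by=issue-105204
```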
I'm seeing this too: k8s 1.22.2 set up with kubeadm, no updates, and the kube-root-ca.crt is available in the namespace.
@zetaab @sdlarsen I circled back to look at this again and I do in fact see the same problem now. I don't know why I didn't see it before. Now I am doubting myself, but I am almost certain the same manifest worked when I tried it a couple of weeks ago while trying to reproduce the issue. Also, I see the problem regardless of whether the namespace contains a dash.
I really don't know what to make of this at this point. There must be some other factors involved. Are you sure it only happens when the namespace contains `-`?
@zetaab As a workaround, if you don't need the service account token, you can disable it.
Also, I think the error message in the original issue description is only a warning, so the jobs still run; you just get this warning every time.
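The elided setting is presumably `automountServiceAccountToken`, which suppresses the kube-api-access projected volume entirely. A sketch applying it to a CronJob (all names here are hypothetical, not from the original report):

```shell
# Disable the service account token mount for the pod template
# (it can also be set on the ServiceAccount to cover all its pods).
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cron
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          automountServiceAccountToken: false   # no kube-api-access volume
          restartPolicy: Never
          containers:
          - name: hello
            image: busybox
            command: ["echo", "hello"]
EOF
```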
Same error. A CronJob cannot get the kube-api-access volume.
Check whether the issue is happening for other pods in the same namespace. In my case the problem was a bad pod; the container was exiting for other reasons.
I'm experiencing the same issue sometimes. My cron-job runs a shell script to update the DigitalOcean DNS record for my domain. It appeared after I upgraded my bare-metal k3s cluster to v1.22.3.
We experience the same issue on OpenShift 4.9.7, which is based on k8s 1.22.2. We are running a cronjob.
So I think it's not only related to kube-root-ca.crt. The job runs fine, but we see the messages above in the events every time the job is triggered.
searching the k8s 1.22.2 source code for the message above lists two locations:
I'm a noob with regard to the k8s source code, but it seems this is a caching issue.
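The search the commenter describes can be reproduced along these lines; the path is assumed from the `pkg/kubelet/util/manager` package cited later in the thread:

```shell
# From the root of a kubernetes/kubernetes checkout at the v1.22.2 tag
grep -rn "not registered" pkg/kubelet/util/manager/
```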
Got the same error in Kubernetes 1.22.4 when I created the subPathExpr example pod from the page https://kubernetes.io/docs/concepts/storage/volumes/ .
The pod's state was Error. If this bug is related to caching, how do I flush the cache?
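For context, the subPathExpr example on that docs page is along these lines (reproduced from memory, so it may differ in detail from the current page; the point is that the mount path is expanded per-pod from an environment variable):

```shell
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pod1
spec:
  restartPolicy: Never
  containers:
  - name: container1
    image: busybox
    command: ["sh", "-c", "while true; do echo hello; sleep 10; done"]
    env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    volumeMounts:
    - name: workdir1
      mountPath: /logs
      subPathExpr: $(POD_NAME)   # expands to the pod's own name
  volumes:
  - name: workdir1
    hostPath:
      path: /var/log/pods
EOF
```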
I'm seeing the same behavior as @tosmi... it doesn't appear to be strictly related to kube-root-ca.crt, and the log lines are marked as warnings.
I am also seeing these MountVolume.SetUp failed errors with a freshly deployed OpenShift 4.9.5 cluster when attempting to create a cronjob to sync LDAP groups to OpenShift. The errors look very similar to those posted by @tosmi. We just upgraded the cluster to 4.9.11 and will retest tomorrow.
I also see this same error with a Job on a fresh install of MicroK8s (v1.22.4), in a newly created namespace.
We see the same behaviour with custom ConfigMaps and Secrets. Any news on this?
The kubelet doesn't treat pods coming from a CronJob or Job differently than pods from any other source. The "not registered" error means a call to the kubelet's Secret or ConfigMap manager to get a Secret or ConfigMap happened when the manager had no record of the kubelet handling a pod referencing that Secret or ConfigMap. That should not happen. I also don't see any logic in that manager that would be impacted in any way by the presence or absence of a `-` in the namespace name.
/assign @wojtek-t
https://github.com/kubernetes/kubernetes/commits/master/pkg/kubelet/util/manager is the relevant package. |
Also seeing "not registered" messages in kubelet logs in current e2e runs (though it's unclear whether those are coming from the kubelet trying to do volume-related things after a pod has already been torn down). Opened #107739 to add some more logging around the refcount increment/decrement cases and the "not registered" instances to see if that is related.
@kneemaa The "volume not registered" issue discussed in this thread has appeared to be a warning for us that doesn't really break anything. If you have a job you need to troubleshoot, try making it sleep, then shell into it and check whether you can access the content in the mounted directory and whether that content is corrupted; hope this helps. As for the issue in this thread, we upgraded AKS from 1.22.4 to 1.24.3 and haven't seen it since. It's good to stay updated.
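The "sleep it and shell in" suggestion can be done roughly like this; the job name, namespace, and image are placeholders, and you would normally substitute the failing job's own image and service account:

```shell
# Run a throwaway job that just sleeps, then exec in and inspect the mounts.
kubectl create job debug-sleep --image=busybox -n my-namespace -- sleep 3600
kubectl wait --for=condition=Ready pod -l job-name=debug-sleep -n my-namespace

# Exec into the job's pod (resolved by the job-name label)
POD=$(kubectl get pod -n my-namespace -l job-name=debug-sleep -o name | head -n1)
kubectl exec -it -n my-namespace "$POD" -- sh
# inside the pod, check the projected token volume:
#   ls -l /var/run/secrets/kubernetes.io/serviceaccount/
#   head /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
```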
I found what the cause was here. It appears it was unrelated to the original issue already solved in the releases I tried. |
Is this issue clearly resolved in the latest version? |
I think it's a concurrency problem:
When the kubelet sees a new pod, it first adds the pod to the pod manager and then runs the syncPod function (HandlePodAdditions). These two actions are concurrent, so the volume manager might see the new pod before the kubelet has registered it with the ConfigMap/Secret manager, and then we see the "not registered" events.
It worked for me after giving the volume in the CronJob a different name from the volume I have in the Deployment; in the Deployment and in the CronJob I had to use distinct volume names.
Is it possible that a resource cannot be reused between a Deployment and a CronJob? I am on version 1.22.13.
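What the commenter describes, sketched with hypothetical names: the CronJob declares its own volume name (distinct from the Deployment's), even though both volumes point at the same ConfigMap:

```shell
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cron
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: task
            image: busybox
            command: ["sh", "-c", "ls /config"]
            volumeMounts:
            - name: app-config-cron   # distinct from the Deployment's
              mountPath: /config      # volume name, e.g. "app-config"
          volumes:
          - name: app-config-cron
            configMap:
              name: app-config        # the shared ConfigMap itself
EOF
```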
While there was a distinct reproducible issue found and fixed already in v1.22.9+, v1.23.6+, and 1.24+, it seems like there is still another issue remaining that produces a similar symptom. The comment at #105204 (comment) hints that this message might be a symptom of a pod that is failing to start for some other reason. It would be helpful if someone still observing this symptom in a reproducible way on a version >= 1.22.9, 1.23.6, or 1.24 could open a new issue to track that with the following info:
That would help in gathering info related to any remaining issues that surface this message. |
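The requested list of details was lost in this scrape; the usual data to attach to such a report can be gathered along these lines (namespace and pod names are placeholders):

```shell
# Versions and events around the failing pod
kubectl version
kubectl -n my-namespace get events --sort-by=.lastTimestamp
kubectl -n my-namespace describe pod my-failing-pod

# Kubelet logs from the node the pod was scheduled to (systemd hosts)
journalctl -u kubelet --since "1 hour ago" | grep -i "not registered"
```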
I've retitled this issue to bind it to the versions containing the bug fixed in #107831. If we can get an issue report with reproducible details on versions newer than that, it would help steer investigation / fixes.
Still happening on v1.24.9+k3s2.
I would say that for me it is not actually a failure but only a warning: I am using Karpenter with EKS, and this warning happens before the node is actually ready, while the pod is already assigned to the node.
Secrets should be registered by syncPod, and there is separate work to address components looking at data in podManager when they should be looking at podWorkers (#115342). But as Jordan says, a reproducer will help. |
Having the same issue on 1.27 |
We have the same problem from time to time with two different k8s versions. kubelet logs:
Hello together, I have a K3S cluster v1.28.2+k3s1 with 1 master and 3 worker nodes, and I recently started getting this error whenever I try to deploy new pods. I also deleted some of my other applications to see whether some kind of limit had been reached, but I still couldn't deploy new applications. When I then redeployed the applications I had deleted for testing purposes, they deployed just fine. It is only new applications, ones not previously deployed in the cluster, that fail. Can you please investigate? I cannot deploy any applications anymore!
Same error for me. Thanks, now it's up and running; I just added annotations.
Annotations didn't work for me.
Using:
The pod fails to start in any namespace that is not the default namespace, with the same error as in this issue. All namespaces contain the root CA cert.
I found the cause of my issue: I am running on a bare-metal server and the admission controller plugin was not enabled. Once I enabled it and restarted the api-server, it started working.
@sslgeorge, could you please provide details on exactly which plugin was not enabled? On my v1.22 cluster, where I am experiencing the issue with cronjobs, the api-server already runs with admission plugins enabled. Moreover, I see the issue on my other k8s clusters of different kinds; it starts right after a routine helm uninstall/install operation (the way we upgrade our releases). Usually the warnings disappear after a couple of hours, but sometimes they are persistent. The pods start and complete fine, apart from the warnings.
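One way to check which admission plugins the api-server runs with; the paths assume a kubeadm-style static-pod setup, which may not match every cluster in this thread:

```shell
# On a control-plane node with a static-pod api-server:
grep enable-admission-plugins /etc/kubernetes/manifests/kube-apiserver.yaml

# Or inspect the live process flags:
ps aux | grep -o 'kube-apiserver[^ ]*-admission-plugins[^ ]*'
```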
Any new progress on this issue?
Seen this on v1.24.4.
Issue reproducible on
What happened:
We updated our cluster to 1.22.2. We are trying to create the following cronjob:
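The manifest itself did not survive this scrape; a representative CronJob in a namespace containing `-`, matching the reproduction notes below, would look like this (all names are hypothetical, and the namespace must already exist):

```shell
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: example-cron
  namespace: team-a          # note the "-" in the namespace name
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: hello
            image: busybox
            command: ["echo", "hello"]
EOF
```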
However, it creates the `job` like it should, and also the pod. When I check the events for the pod, I see the following:
If I take a copy of that `pod` manifest and remove `restartPolicy` from it, the pod starts correctly and is executed.
What you expected to happen:
I expect the cronjob to be created and also executed correctly.
How to reproduce it (as minimally and precisely as possible):
The issue appears with a namespace containing `-` in its name; without `-`, everything works.
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): 1.22.2
- OS (e.g. from `cat /etc/os-release`): Debian Buster
- Kernel (e.g. `uname -a`):