Sporadic "MountVolume.SetUp failed for volume ... not registered" failures seen in 1.22.0-1.22.8, 1.23.0-1.23.5 #105204

Closed
zetaab opened this issue Sep 23, 2021 · 129 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@zetaab
Member

zetaab commented Sep 23, 2021

What happened:

We updated our cluster to 1.22.2. We are trying to create the following CronJob:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: example
  namespace: gha-devcloud
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      activeDeadlineSeconds: 600
      backoffLimit: 3
      template:
        spec:
          containers:
          - image: busybox
            imagePullPolicy: Always
            name: guillotine
          dnsPolicy: ClusterFirst
          restartPolicy: OnFailure
      ttlSecondsAfterFinished: 180
  schedule: '*/1 * * * *'
  successfulJobsHistoryLimit: 3
  suspend: false

The CronJob creates the Job as it should, and also the Pod.

However, when I check the events for the Pod I see the following:

Events:
  Type     Reason       Age                    From               Message
  ----     ------       ----                   ----               -------
  Normal   Scheduled    2m25s                  default-scheduler  Successfully assigned gha-devcloud/example-27206544--1-2gvfg to nodes-esptnl-eprwhk
  Normal   Pulling      2m25s                  kubelet            Pulling image "busybox"
  Normal   Pulled       2m24s                  kubelet            Successfully pulled image "busybox" in 1.049753158s
  Normal   Created      2m24s                  kubelet            Created container guillotine
  Normal   Started      2m23s                  kubelet            Started container guillotine
  Warning  FailedMount  2m22s (x2 over 2m23s)  kubelet            MountVolume.SetUp failed for volume "kube-api-access-wxtss" : object "gha-devcloud"/"kube-root-ca.crt" not registered

If I take a copy of that Pod manifest and remove restartPolicy, the Pod starts correctly and runs to completion.

What you expected to happen:

I expect the CronJob to be created and its Pods to run correctly.

How to reproduce it (as minimally and precisely as possible):

  1. Create a new CronJob using the manifest provided above.
  2. Check whether the resulting Pod started or not.
  3. NOTE: the namespace must have a - in its name; without a - everything works.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.22.2
  • Cloud provider or hardware configuration: openstack
  • OS (e.g: cat /etc/os-release): debian buster
  • Kernel (e.g. uname -a):
  • Install tools: kops
  • Network plugin and version (if this is a network-related bug):
  • Others:
@zetaab zetaab added the kind/bug Categorizes issue or PR as related to a bug. label Sep 23, 2021
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 23, 2021
@zetaab
Member Author

zetaab commented Sep 23, 2021

/sig scheduling
/sig apps

@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Sep 23, 2021
@zetaab zetaab changed the title creating new CronJobs does not work after 1.22.2 update creating new CronJobs does not work if namespace name contains - Sep 23, 2021
@nayihz
Contributor

nayihz commented Sep 24, 2021

Maybe you can also get this warning in a namespace whose name doesn't contain a - (e.g. the default namespace):

 Normal   Started      60s                kubelet            Started container guillotine
 Warning  FailedMount  58s (x3 over 60s)  kubelet            MountVolume.SetUp failed for volume "kube-api-access-s4s5k" : object "test"/"kube-root-ca.crt" not registered

@zetaab
Member Author

zetaab commented Sep 28, 2021

It's not visible in namespaces that don't have a -.

@ardaguclu
Member

@zetaab, I executed the steps you provided. I agree with @cmssczy; this failure message is not related to namespace naming.
I also got the same mount error when I removed the - from the namespace:

Warning  FailedMount  37s (x3 over 38s)  kubelet            MountVolume.SetUp failed for volume "kube-api-access-dcxzs" : object "ghadevcloud"/"kube-root-ca.crt" not registered

@brianpursley
Member

brianpursley commented Oct 2, 2021

@zetaab Your manifest works fine for me on my 1.23.0-alpha cluster, even with a - in the namespace name, like you have. (EDIT: I tried this several days later and now I AM able to reproduce the error even when my namespace doesn't have a dash. See my more recent comment).

We updated our cluster to 1.22.2

What version was your cluster previously?

I think this could somehow be related to the ServiceAccount admission controller and the Bound Service Account Token Volume.

There was a feature gate that graduated in 1.21 that is supposed to copy the kube-root-ca.crt ConfigMap to every namespace:

A ConfigMap containing a CA bundle used for verifying connections to the kube-apiserver. This feature depends on the RootCAConfigMap feature gate, which publishes a "kube-root-ca.crt" ConfigMap to every namespace. RootCAConfigMap feature gate is graduated to GA in 1.21 and default to true. (This flag will be removed from --feature-gate arg in 1.22)

I wonder if your namespace existed prior to 1.21 and maybe never got the "kube-root-ca.crt" ConfigMap.

Do you have that ConfigMap in your gha-devcloud namespace?

$ kubectl get cm -A | grep kube-root-ca.crt
dash-test         kube-root-ca.crt                     1      128m
default           kube-root-ca.crt                     1      14d
gha-devcloud      kube-root-ca.crt                     1      121m
kube-node-lease   kube-root-ca.crt                     1      14d
kube-public       kube-root-ca.crt                     1      14d
kube-system       kube-root-ca.crt                     1      14d

If not, you might be able to trigger the rootcacertpublisher by updating your namespace, perhaps by adding an annotation (e.g. kubectl annotate ns gha-devcloud foo=bar or something like that). I'm not sure if that will trigger the copy or not. If not, then maybe you can copy it there yourself from the default namespace.
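A minimal sketch of that check / nudge / copy sequence, assuming the gha-devcloud namespace from this issue (the annotation key and the temporary file path are arbitrary placeholders):

# Does the ConfigMap exist in the affected namespace?
kubectl get configmap kube-root-ca.crt -n gha-devcloud

# Nudge the root CA cert publisher by updating the namespace (may or may not trigger a re-sync)
kubectl annotate namespace gha-devcloud troubleshooting=retry --overwrite

# Last resort: copy the CA bundle over from the default namespace
kubectl get configmap kube-root-ca.crt -n default -o jsonpath='{.data.ca\.crt}' > /tmp/ca.crt
kubectl create configmap kube-root-ca.crt -n gha-devcloud --from-file=ca.crt=/tmp/ca.crt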

@sdlarsen

I'm seeing this too. k8s 1.22.2 setup with kubeadm, no updates and the kube-root-ca.crt is available in the namespace.

@brianpursley
Member

brianpursley commented Oct 19, 2021

@zetaab @sdlarsen I circled back to look at this again and I do in fact see the same problem now. I don't know why I didn't see it before. Now I am doubting myself, but I am almost certain the same manifest worked when I tried it a couple weeks ago trying to reproduce the issue.

Also, I see the problem regardless of whether the namespace contains a dash. So for example, just devcloud:

  Type     Reason       Age                From               Message
  ----     ------       ----               ----               -------
  Normal   Scheduled    19s                default-scheduler  Successfully assigned devcloud/example-27244787--1-g82cp to k8s-worker-2
  Normal   Pulling      19s                kubelet            Pulling image "busybox"
  Normal   Pulled       18s                kubelet            Successfully pulled image "busybox" in 274.147021ms
  Normal   Created      18s                kubelet            Created container guillotine
  Normal   Started      18s                kubelet            Started container guillotine
  Warning  FailedMount  16s (x3 over 18s)  kubelet            MountVolume.SetUp failed for volume "kube-api-access-87ssw" : object "devcloud"/"kube-root-ca.crt" not registered

I really don't know what to make of this at this point. There must be some other factors involved. Are you sure it only happens when the namespace contains - for you?

@brianpursley
Member

brianpursley commented Oct 20, 2021

@zetaab As a workaround, if you don't need the service account token, then you can disable it using automountServiceAccountToken: false like this:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: example
  namespace: gha-devcloud
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      activeDeadlineSeconds: 600
      backoffLimit: 3
      template:
        spec:
          containers:
          - image: busybox
            imagePullPolicy: Always
            name: guillotine
          dnsPolicy: ClusterFirst
          restartPolicy: OnFailure
          automountServiceAccountToken: false
      ttlSecondsAfterFinished: 180
  schedule: '*/1 * * * *'
  successfulJobsHistoryLimit: 3
  suspend: false

Also, I think the error message in the original issue description is only a warning, so the jobs still run; you just get this warning every time.

@hxsf

hxsf commented Nov 4, 2021

same error.

Cronjob cannot get the kube-api-access volume.

@klinakuf

Check whether the issue is happening for other pods in the same namespace. In my case the problem was a bad pod; the container was exiting for other reasons.
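A quick way to do that check, sketched with the namespace from this issue (substitute your own):

# Any pods in the namespace that are not running?
kubectl -n gha-devcloud get pods --field-selector=status.phase!=Running

# Recent FailedMount events across the namespace
kubectl -n gha-devcloud get events --sort-by=.lastTimestamp | grep -i failedmount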

@untcha

untcha commented Nov 29, 2021

I'm experiencing the same issue: sometimes for kube-root-ca.crt, and always when the CronJob or the corresponding Pod mounts a ConfigMap with a shell script in it. The warning appears on every run, but the job itself runs successfully.

My CronJob runs a shell script to update the DigitalOcean DNS record for my domain.

This started appearing after I upgraded my bare-metal k3s cluster to v1.22.3.

@tosmi

tosmi commented Nov 29, 2021

We experience the same issue on OpenShift 4.9.7, which is based on k8s 1.22.2. We are running a CronJob that mounts custom ConfigMaps and Secrets as volumes:

30m         Warning   FailedMount        pod/test-job--1-xmvbp                           MountVolume.SetUp failed for volume "ldap-bind-password" : object "ldap-group-syncer"/"ad-bind-password" not registered
30m         Warning   FailedMount        pod/test-job--1-xmvbp                           MountVolume.SetUp failed for volume "ldap-ca" : object "ldap-group-syncer"/"ldap-ca" not registered
30m         Warning   FailedMount        pod/test-job--1-xmvbp                           MountVolume.SetUp failed for volume "kube-api-access-rrdjh" : [object "ldap-group-syncer"/"kube-root-ca.crt" not registered, object "wuero-ldap-group-syncer"/"openshift-service-ca.crt" not registered]
30m         Warning   FailedMount        pod/test-job--1-xmvbp                           MountVolume.SetUp failed for volume "ldap-sync-volume" : object "ldap-group-syncer"/"ldap-group-syncer" not registered

So I think it's not only related to kube-root-ca.crt.

The job runs fine, but we see the messages above in the events every time the job is triggered.

@tosmi

tosmi commented Nov 30, 2021

Searching the k8s 1.22.2 source code for the message above turns up two locations:

  • pkg/kubelet/util/manager/cache_based_manager.go and
  • pkg/kubelet/util/manager/watch_based_manager.go

I'm a noob when it comes to the k8s source code, but it looks like this is a caching issue.

@coachafei

Got the same error in Kubernetes 1.22.4 when I created the subPathExpr example pod from the page https://kubernetes.io/docs/concepts/storage/volumes/. The command kubectl describe pods pod1 shows the log:

Warning  FailedMount  15m (x3 over 15m)  kubelet            MountVolume.SetUp failed for volume "kube-api-access-wr5kz" : object "default"/"kube-root-ca.crt" not registered

The pod's state was Error. If this bug is related to caching, how can I flush the cache?

@slapula

slapula commented Dec 12, 2021

I'm seeing the same behavior as @tosmi... it doesn't appear to be strictly related to kube-root-ca.crt. I originally noticed that this was causing my Kaniko jobs to fail (which I'm running in a new namespace). Both kube-root-ca.crt and the secret I'm mounting failed with this error. I took the findings in this thread and switched these Kaniko jobs to the default namespace and sure enough it could find kube-root-ca.crt just fine but my secret mount still failed.

The log lines are marked as Warning but the behavior presents as a blocking error on my end. These errors are happening on init containers and preventing the jobs from running successfully. Granted, this is on my personal k0s cluster but I can see how this could affect production workloads.

@hightechrdn

I am also seeing these MountVolume.SetUp failed errors with a freshly deployed OpenShift 4.9.5 cluster when attempting to create a cronjob to sync LDAP groups to OpenShift. Errors look very similar to those posted by @tosmi

We just upgraded the cluster to 4.9.11 and will retest tomorrow.

@tlitke5

tlitke5 commented Dec 17, 2021

I also see this same error with a Job on fresh install of MicroK8s (v1.22.4).

Newly created namespace called pgo. (The Job is also in this namespace.)

MountVolume.SetUp failed for volume "deployer-conf" : object "pgo"/"pgo-deployer-cm" not registered
MountVolume.SetUp failed for volume "kube-api-access-lc499" : object "pgo"/"kube-root-ca.crt" not registered

kube-root-ca.crt is automatically created; pgo-deployer-cm was created by me.

kubectl get configmap -n pgo
NAME               DATA   AGE
kube-root-ca.crt   1      15m
pgo-deployer-cm    1      15m

@robbertvdg

We see the same behaviour with custom configMaps and secrets. Any news on this?
I think it might be related to the CronJobs v2 controller which is the default in 1.22.

@liggitt
Member

liggitt commented Jan 24, 2022

The kubelet doesn't treat pods coming from a cronjob or jobs differently than pods from any other source. The 'not registered' error means a call to the kubelet's secret or configmap manager to get a secret or configmap happened when the manager had no record of the kubelet handling a pod referencing that secret or configmap. That should not happen. I also don't see any logic in that manager that would be impacted in any way by the presence or absence of a - in the namespace name.

/assign @wojtek-t

@liggitt liggitt changed the title creating new CronJobs does not work if namespace name contains - sporadic "MountVolume.SetUp failed for volume ... not registered" failures prevent running pods Jan 24, 2022
@liggitt
Member

liggitt commented Jan 24, 2022

@liggitt
Member

liggitt commented Jan 24, 2022

Also seeing "not registered" messages in kubelet logs in current e2e runs (though it's unclear whether those are coming from the kubelet trying to do volume-related things after a pod has already been torn down). Opened #107739 to add some more logging around the refcount increment/decrement cases and the "not registered" instances to see if that is related.

@JiayangZhou

@kneemaa, the 'volume not registered' issue discussed in this thread has turned out to be only a warning for us and doesn't really break anything. If you have a job you need to troubleshoot, try making it sleep, then shell into it and check whether you can access the content in the mounted directory and whether that content is corrupted; hope this helps. As for the issue in this thread, we upgraded AKS from 1.22.4 to 1.24.3 and haven't seen it since. It's good to stay updated.
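One way to sketch that debugging flow with the CronJob from this issue, assuming its container is overridden to run something long-lived (e.g. sleep 3600) and using a made-up job name example-debug:

# Spawn a one-off Job from the CronJob
kubectl -n gha-devcloud create job example-debug --from=cronjob/example

# Find its pod and inspect the projected service account volume
POD=$(kubectl -n gha-devcloud get pods -l job-name=example-debug -o jsonpath='{.items[0].metadata.name}')
kubectl -n gha-devcloud exec "$POD" -- ls -l /var/run/secrets/kubernetes.io/serviceaccount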

@sandori01

I found what the cause was in my case. It appears to be unrelated to the original issue, which was already solved in the releases I tried.
The pod specification I used contained
annotations:
container.apparmor.security.beta.kubernetes.io/sftp: runtime/default
and my Linux installation didn't have AppArmor enabled. Checking the kubelet log revealed this; after that it was a piece of cake to point out the culprit annotation in my deployment YAML.
Everything works now.
I'm not sure why the failure to set up AppArmor caused the ConfigMap definitions not to be recognized, but for now I'm satisfied that it works.
@kneemaa, try reading the kubelet logs.
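If the kubelet runs as a systemd unit (an assumption; k3s, for example, logs differently), a quick scan for AppArmor or "not registered" messages might look like:

# Run on the node that hosted the failing pod; adjust the time window as needed
journalctl -u kubelet --since "1 hour ago" | grep -Ei "apparmor|not registered"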

@moonek
Contributor

moonek commented Oct 25, 2022

Is this issue clearly resolved in the latest version?
There are many comments saying the problem still occurs; is closing it correct?
And is this only happening with CronJobs?

@nightmeng

nightmeng commented Nov 2, 2022

I think it's a concurrency problem:

  • The volume manager finds the new pod via the pod manager.
  • The kubelet registers the new pod with the configMap/secret manager in the syncPod function.

When the kubelet sees a new pod, it first adds it to the pod manager and then runs the syncPod function (HandlePodAdditions). These two actions run concurrently, so the volume manager might see the new pod before the kubelet has registered it with the configMap/secret manager, and then we see the "not registered" events.

@DiV666

DiV666 commented Nov 2, 2022

It worked for me when I gave the volume in the CronJob a different name from the volume I have in the Deployment.

In the deployment:

volumes:
- configMap:
    items:
    - key: config.js
      path: config.js
    name: config-api
  name: config-volume

In the cronjob I had to put:

volumes:
- configMap:
    items:
    - key: config.js
      path: config.js
    name: config-job
  name: config-volume

Is it possible that a resource cannot be reused between the deployment and the cronjob?

I am on version 1.22.13

@liggitt
Member

liggitt commented Nov 2, 2022

While there was a distinct reproducible issue found and fixed already in v1.22.9+, v1.23.6+, and 1.24+, it seems like there is still another issue remaining that produces a similar symptom.

The comment at #105204 (comment) hints that this message might be a symptom of a pod that is failing to start for some other reason.

It would be helpful if someone still observing this symptom in a reproducible way on a version >= 1.22.9, 1.23.6, or 1.24 could open a new issue to track that with the following info:

  1. the version you are on
  2. the scenario that reproduces the message
  3. the impact (is it just an event/message? does it prevent pod startup? does it prevent pod teardown?)

That would help in gathering info related to any remaining issues that surface this message.

@liggitt liggitt changed the title sporadic "MountVolume.SetUp failed for volume ... not registered" failures seen Sporadic "MountVolume.SetUp failed for volume ... not registered" failures seen in 1.22.0-1.22.8, 1.23.0-1.23.5 Nov 2, 2022
@liggitt
Member

liggitt commented Nov 2, 2022

I've retitled this issue to bound it to the versions containing the bug fixed in #107831.

If we can get an issue report with reproducible details on versions newer than that, that would help steer investigation / fixes

@gocpplua

Can anyone confirm or deny that this bug has been fixed in 1.23.8?

I can confirm. It is not seen on v1.23.8+k3s1.

It still exists on v1.24.9+k3s2.

@gorkemgoknar

While there was a distinct reproducible issue found and fixed already in v1.22.9+, v1.23.6+, and 1.24+, it seems like there is still another issue remaining that produces a similar symptom.

The comment at #105204 (comment) hints that this message might be a symptom of a pod that is failing to start for some other reason.

It would be helpful if someone still observing this symptom in a reproducible way on a version >= 1.22.9, 1.23.6, or 1.24 could open a new issue to track that with the following info:

  1. the version you are on
  2. the scenario that reproduces the message
  3. the impact (is it just an event/message? does it prevent pod startup? does it prevent pod teardown?)

That would help in gathering info related to any remaining issues that surface this message.

I would say that for me it is not actually a failure but only a warning. I am using Karpenter with EKS, and this warning happens before the node is actually ready but after the pod has been assigned to the node.
I did start to see this after 1.22, though; on 1.21 I did not have this warning.

Server EKS 1.22 -> v1.22.17-eks
Server Version: version.Info{Major:"1", Minor:"22+", GitVersion:"v1.22.17-eks-XXXXX", GitCommit:"XXXXXXXXXXXX", GitTreeState:"clean", BuildDate:"2023-01-24T09:34:06Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

  Type     Reason             Age                    From                Message
  ----     ------             ----                   ----                -------
  Warning  FailedScheduling   2m30s                  default-scheduler   0/[REDACTED] nodes are available: 12 node(s) had taint {eks.amazonaws.com/compute-type: fargate}, that the pod didn't tolerate, [REDACTED] node(s) didn't match Pod's node affinity/selector, [REDACTED] Insufficient cpu.
  Normal   NotTriggerScaleUp  2m27s                  cluster-autoscaler  pod didn't trigger scale-up: 3 node(s) didn't match Pod's node affinity/selector
  Normal   Nominate           2m18s (x2 over 2m24s)  karpenter           Pod should schedule on ip-XXXXXXX.ec2.internal
  Normal   Scheduled          103s                   default-scheduler   Successfully assigned applications/[REDACTED] to ip-XXXXXXX.ec2.internal
  Warning  FailedMount        86s (x6 over 102s)     kubelet             MountVolume.SetUp failed for volume "kube-api-access-YYYY" : object "applications"/"kube-root-ca.crt" not registered
  Warning  NetworkNotReady    85s (x10 over 102s)    kubelet             network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
  Normal   Pulling            70s                    kubelet             Pulling image "[REDACTED]"
  Normal   Created            106s                    kubelet             Created container XXX-gpu-prod
  Normal   Started            106s                    kubelet             Started container XXX-gpu-prod

@smarterclayton
Contributor

When the kubelet sees a new pod, it first adds it to the pod manager and then runs the syncPod function (HandlePodAdditions). These two actions run concurrently, so the volume manager might see the new pod before the kubelet has registered it with the configMap/secret manager, and then we see the "not registered" events.

Secrets should be registered by syncPod, and there is separate work to address components looking at data in podManager when they should be looking at podWorkers (#115342).

But as Jordan says, a reproducer will help.

@wiseelf

wiseelf commented Jul 17, 2023

Having the same issue on 1.27:
Warning FailedMount 3s (x2 over 4s) kubelet MountVolume.SetUp failed for volume "kube-api-access-l58lf" : object "iris-trybe"/"kube-root-ca.crt" not registered
After a few errors the pod starts fine.
Was this fixed?

@ugur99

ugur99 commented Sep 1, 2023

We have the same problem from time to time with two different k8s versions, v1.24.10 and v1.23.7, usually during the rollout process; old instances/pods get stuck in the termination phase with the following errors. Are there any updates on this issue?

kubelet logs:

E0828 00:58:36.944965 3293071 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/projected/3fdf6bd3-78d8-4ce6-a0ed-e97005a8d06b-kube-api-access-qlshz podName:3fdf6bd3-78d8-4ce6-a0ed-e97005a8d06b nodeName:}" failed. No retries permitted until 2023-08-28 01:00:38.94493703 +0000 UTC m=+218616.925334872 (durationBeforeRetry 2m2s). Error: MountVolume.SetUp failed for volume "kube-api-access-qlshz" (UniqueName: "kubernetes.io/projected/3fdf6bd3-78d8-4ce6-a0ed-e97005a8d06b-kube-api-access-qlshz") pod "pod-name-79687c7975-hrfmv" (UID: "3fdf6bd3-78d8-4ce6-a0ed-e97005a8d06b") : object "test-namespace"/"kube-root-ca.crt" not registered
2023-08-28 03:00:39.014	
I0828 01:00:39.014791 3293071 reconciler.go:258] "operationExecutor.MountVolume started for volume \"kube-api-access-qlshz\" (UniqueName: \"kubernetes.io/projected/3fdf6bd3-78d8-4ce6-a0ed-e97005a8d06b-kube-api-access-qlshz\") pod \"pod-name-79687c7975-hrfmv\" (UID: \"3fdf6bd3-78d8-4ce6-a0ed-e97005a8d06b\") " pod="test-namespace/pod-name-79687c7975-hrfmv"
2023-08-28 03:00:39.015	
E0828 01:00:39.014988 3293071 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/projected/3fdf6bd3-78d8-4ce6-a0ed-e97005a8d06b-kube-api-access-qlshz podName:3fdf6bd3-78d8-4ce6-a0ed-e97005a8d06b nodeName:}" failed. No retries permitted until 2023-08-28 01:02:41.014973186 +0000 UTC m=+218738.995371018 (durationBeforeRetry 2m2s). Error: MountVolume.SetUp failed for volume "kube-api-access-qlshz" (UniqueName: "kubernetes.io/projected/3fdf6bd3-78d8-4ce6-a0ed-e97005a8d06b-kube-api-access-qlshz") pod "pod-name-79687c7975-hrfmv" (UID: "3fdf6bd3-78d8-4ce6-a0ed-e97005a8d06b") : object "test-namespace"/"kube-root-ca.crt" not registered
2023-08-28 03:02:41.035	
I0828 01:02:41.034974 3293071 reconciler.go:258] "operationExecutor.MountVolume started for volume \"kube-api-access-qlshz\" (UniqueName: \"kubernetes.io/projected/3fdf6bd3-78d8-4ce6-a0ed-e97005a8d06b-kube-api-access-qlshz\") pod \"pod-name-79687c7975-hrfmv\" (UID: \"3fdf6bd3-78d8-4ce6-a0ed-e97005a8d06b\") " pod="test-namespace/pod-name-79687c7975-hrfmv"

kubectl describe pod pod-name-79687c7975-hrfmv:

Name:                      pod-name-79687c7975-hrfmv
Namespace:                 test-namespace
Node:                      worker02/ip
Status:                    Terminating (lasts 132m)
Termination Grace Period:  30s
Containers:
  pod-name:
    State:          Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Fri, 25 Aug 2023 13:17:59 +0200
      Finished:     Fri, 25 Aug 2023 14:19:12 +0200
    Ready:          False
    Restart Count:  0
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  kube-api-access-qlshz:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
Events:
  Type     Reason       Age                  From     Message
  ----     ------       ----                 ----     -------
  Warning  FailedMount  86s (x70 over 131m)  kubelet  MountVolume.SetUp failed for volume "kube-api-access-qlshz" : object "test-namespace"/"kube-root-ca.crt" not registered

@amrap030

amrap030 commented Oct 13, 2023

Hello everyone, I have a K3s cluster v1.28.2+k3s1 with 1 master and 3 worker nodes, and I recently started getting the following error whenever I try to deploy new pods: MountVolume.SetUp failed for volume "kube-api-access-bww6r" : configmap "kube-root-ca.crt" not found. All other previously deployed pods are working fine. No matter what application I try to deploy now, I get the same error every time.

In addition, when I run kubectl get cm -A | grep kube-root-ca.crt every namespace contains the configmap just fine.

I also deleted some of my other applications to see if there was some kind of limit reached, but still, I couldn't deploy new applications. When I then redeployed the applications I deleted for testing purposes, those applications again did deploy just fine. It is only for new applications that were not deployed in the cluster before.

Can you please investigate? I cannot deploy any applications anymore!

@pyotrantropov

I found what the cause was in my case. It appears to be unrelated to the original issue, which was already solved in the releases I tried. The pod specification I used contained the annotation container.apparmor.security.beta.kubernetes.io/sftp: runtime/default and my Linux installation didn't have AppArmor enabled. Checking the kubelet log revealed this; after that it was a piece of cake to point out the culprit annotation in my deployment YAML. Everything works now. I'm not sure why the failure to set up AppArmor caused the ConfigMap definitions not to be recognized, but for now I'm satisfied that it works. @kneemaa, try reading the kubelet logs.

Same error for me. Thanks, now it's up and running. I just commented out the annotations (# annotations).

@shkpk

shkpk commented Dec 14, 2023

The annotations fix didn't work for me; I get:

is invalid: spec.jobTemplate.spec.template.annotations[container.apparmor.security.beta.kubernetes.io/sftp]: Invalid value: "sftp": container not found

Any other solution?

@sslgeorge

sslgeorge commented Jan 21, 2024

Using:

 K8s Rev: v1.28.5+k3s1

Pods fail to start in any namespace other than the default namespace, with the same error as in the title.

All namespaces contain the kube-root-ca.crt ConfigMap:

kubectl get cm -A | grep kube-root-ca.crt
default           kube-root-ca.crt                                       1      13d
kube-node-lease   kube-root-ca.crt                                       1      13d
kube-public       kube-root-ca.crt                                       1      13d
kube-system       kube-root-ca.crt                                       1      13d
metallb-system    kube-root-ca.crt                                       1      13d
ingress-nginx     kube-root-ca.crt                                       1      6d7h
monitoring        kube-root-ca.crt                                       1      3h17m

@sslgeorge

I found the cause of mine: I am running on a bare-metal server and the admission controller plugin was not enabled. Once I enabled it and restarted the api-server, it started working.

@maximfox

admission controller plugin was not enabled

@sslgeorge, could you please provide details on which plugin exactly was not enabled? On my v1.22 cluster, where I am experiencing the issue with CronJobs, the api-server runs with --enable-admission-plugins=NodeRestriction. The secret involved is not kube-root-ca.crt though; it is a custom secret created by us.

Moreover, I see the issue with my other k8s clusters of different kinds; it starts right after a routine helm uninstall/install operation (the way we upgrade our releases)... Usually the warnings disappear after a couple of hours, but sometimes they are persistent. The pods start and complete fine, except for the warnings.

@jinxycandotailwhip

jinxycandotailwhip commented Mar 14, 2024

Any new progress on this issue?

  1. ConfigMaps and Secrets support hot update, so they are remounted on every reconcile.
  2. The ConfigMap and Secret local caches are maintained by the podManager; when a pod is terminated, its entries are removed from the local cache, and at that point a get request causes a "not registered" error.
  3. Since deleting volumes from desiredStateOfWorld happens in an independent goroutine, ConfigMaps and Secrets are removed from desiredStateOfWorld with some delay, so a terminated pod will still request to mount its ConfigMaps and Secrets (we cannot forbid this operation for other reasons).
  4. By the time that happens the pod is terminated and its ConfigMaps and Secrets have already been removed from the local cache, so the mount request leads to the "not registered" error.

@yifeng-cerebras

Seen this on v1.24.4.
After a node restart, we see these errors, but they seem to be only warnings for pods that were already terminated.
The kubelet keeps retrying in its logs with "not registered" for all the ConfigMaps in the namespace, and this clears after restarting the kubelet.

Mar 14 21:38:28 cs301-wse002-mx-sr08 kubelet[2788]: I0314 21:38:28.072221 2788 reconciler.go:342] "operationExecutor.VerifyControllerAttachedVolume started for volume "cluster-details-config-volume" (UniqueName: "kubernetes.io/configmap/b2b46050-1b49-4de5-a12a-9ab30739d1f6-cluster-details-config-volume") pod "wsjob-vdq34zcuetvvbkvkw5umou-chief-0" (UID: "b2b46050-1b49-4de5-a12a-9ab30739d1f6") " pod="cs3-1x-s8/wsjob-vdq34zcuetvvbkvkw5umou-chief-0"
Mar 14 21:38:28 cs301-wse002-mx-sr08 kubelet[2788]: E0314 21:38:28.667398 2788 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/configmap/b2b46050-1b49-4de5-a12a-9ab30739d1f6-cluster-details-config-volume podName:b2b46050-1b49-4de5-a12a-9ab30739d1f6 nodeName:}" failed. No retries permitted until 2024-03-14 21:38:29.167242501 +0000 UTC m=+2.996782020 (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "cluster-details-config-volume" (UniqueName: "kubernetes.io/configmap/b2b46050-1b49-4de5-a12a-9ab30739d1f6-cluster-details-config-volume") pod "wsjob-vdq34zcuetvvbkvkw5umou-chief-0" (UID: "b2b46050-1b49-4de5-a12a-9ab30739d1f6") : object "cs3-1x-s8"/"cluster-details-config-wsjob-vdq34zcuetvvbkvkw5umou" not registered
Mar 14 21:38:28 cs301-wse002-mx-sr08 kubelet[2788]: E0314 21:38:28.667396 2788 configmap.go:193] Couldn't get configMap cs3-1x-s8/cluster-details-config-wsjob-bqq9j3zsyc4ncpw5ghidbm: object "cs3-1x-s8"/"cluster-details-config-wsjob-bqq9j3zsyc4ncpw5ghidbm" not registered
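A minimal sketch of the mitigation described in the comment above (restarting the kubelet clears the stale cache entries), assuming the kubelet runs as a systemd unit on the affected node:

# Count stale "not registered" messages, then restart the kubelet to clear them
journalctl -u kubelet --since "30 min ago" | grep -c "not registered"
sudo systemctl restart kubelet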

@dminca

dminca commented Apr 11, 2024

Issue reproducible on v1.25.9 as well

Operation for "{volumeName:kubernetes.io/projected/1b21223f-eb68-473b-a143-aaf2b9658e6b-kube-api-access-q682j podName:1b21223f-eb68-473b-a143-aaf2b9658e6b nodeName:}" failed. No retries permitted until 2024-04-11 12:02:12.418388815 +0000 UTC m=+2449.479261782 (durationBeforeRetry 2m2s). Error: MountVolume.SetUp failed for volume "kube-api-access-q682j" (UniqueName: "kubernetes.io/projected/1b21223f-eb68-473b-a143-aaf2b9658e6b-kube-api-access-q682j") pod "blinky-28507605-d69gx" (UID: "1b21223f-eb68-473b-a143-aaf2b9658e6b") : object "default"/"kube-root-ca.crt" not registered
