kubelet failed to create containerd task if cgroupRoot defined cpuset and CPU Manager configured with static policy #124440

Open
t33m opened this issue Apr 22, 2024 · 7 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/node Categorizes an issue or PR as relevant to SIG Node. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@t33m

t33m commented Apr 22, 2024

What happened?

If the kubelet is configured with cgroupRoot and cpuManagerPolicy: static, and the cpuset cgroup under that root is restricted to a specific vCPU range, the kubelet fails to start containerd tasks or update container resources:

```
E0422 11:37:18.746817  109321 remote_runtime.go:343] "StartContainer from runtime service failed" err="rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: unable to apply cgroup configuration: failed to write \"0-39\": write /sys/fs/cgroup/cpuset/pods.slice/pods-kubepods.slice/pods-kubepods-burstable.slice/pods-kubepods-burstable-pode444cc90_8458_4d84_8319_a443fe6e975a.slice/cri-containerd-8e99221bf7eb3049f7afa35b3719f7870088c5191508f2bdf47959fe2a677385.scope/cpuset.cpus: permission denied: unknown" containerID="8e99221bf7eb3049f7afa35b3719f7870088c5191508f2bdf47959fe2a677385"
```

What did you expect to happen?

The CPU Manager should respect the cpuset of the cgroup root (cgroupRoot) and use its value as the defaultCpuSet.

How can we reproduce it (as minimally and precisely as possible)?

Use the following /var/lib/kubelet/config.yaml:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- fd00:10:245::a
clusterDomain: cluster.local
containerRuntimeEndpoint: ""
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMaximumGCAge: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
cpuManagerPolicy: static
reservedSystemCPUs: 0-1,20-21
cgroupRoot: /pods
```

Create the cgroup and restrict its cpuset:

```shell
for DIR in hugetlb cpuset cpu,cpuacct memory systemd pids; do /bin/mkdir -p /sys/fs/cgroup/$DIR/pods.slice; done
echo 0-1 > /sys/fs/cgroup/cpuset/pods.slice/cpuset.mems
echo 0-1,6-39 > /sys/fs/cgroup/cpuset/pods.slice/cpuset.cpus
```
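With this cgroup and the reservedSystemCPUs value from the config, the behavior asked for under "What did you expect to happen?" can be written as plain set arithmetic. This is a hypothetical model of the expectation, not of what the kubelet currently does: the shared default pool would mirror the cgroup root's cpuset.cpus, and, as the static policy already does with reserved CPUs, exclusive allocations would come from that pool minus reservedSystemCPUs. The helpers `parse_cpu_list` and `format_cpu_list` are hypothetical, written only for this sketch.

```python
def parse_cpu_list(text: str) -> set[int]:
    """Parse a Linux cpuset list such as "0-1,6-39" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in filter(None, text.strip().split(",")):
        lo, _, hi = part.partition("-")  # "6-39" -> ("6", "-", "39"); "5" -> ("5", "", "")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def format_cpu_list(cpus: set[int]) -> str:
    """Format a set of CPU ids back into the compact list form, e.g. "6-19,22-39"."""
    ids = sorted(cpus)
    ranges: list[str] = []
    i = 0
    while i < len(ids):
        start = ids[i]
        # Extend the run while the next id is consecutive.
        while i + 1 < len(ids) and ids[i + 1] == ids[i] + 1:
            i += 1
        end = ids[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


root_cpus = parse_cpu_list("0-1,6-39")  # cpuset.cpus written to pods.slice above
reserved = parse_cpu_list("0-1,20-21")  # reservedSystemCPUs from config.yaml

# Hypothetical expected behavior: the default (shared) pool mirrors the cgroup root.
expected_default = set(root_cpus)
# Exclusive CPU allocations would be drawn from that pool, excluding reserved CPUs.
exclusive_pool = expected_default - reserved

print("reserved CPUs inside cgroup root:", reserved <= root_cpus)
print("expected defaultCpuSet:", format_cpu_list(expected_default))
print("exclusive pool:", format_cpu_list(exclusive_pool))
```

Under these assumptions the expected defaultCpuSet is 0-1,6-39 and the exclusive pool is 6-19,22-39, so no pod would ever be placed on CPUs 2-5.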

Restart the kubelet:

```shell
systemctl stop kubelet
rm /var/lib/kubelet/cpu_manager_state
systemctl start kubelet
cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0-39","checksum":421241391}
```
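The state file points at the cause: defaultCpuSet is 0-39, while pods.slice only allows 0-1,6-39, so runc is asked to write CPUs 2-5 into a descendant of a cgroup that does not contain them. On the cgroup v1 hierarchy the kernel rejects a cpuset.cpus value that is not a subset of the parent's (with EACCES), which appears to be where the "permission denied" in the log comes from. The Python sketch below is an illustrative check only, not kubelet code; the helpers are hypothetical and the inputs are literals copied from the reproduction rather than read from the node.

```python
import json


def parse_cpu_list(text: str) -> set[int]:
    """Parse the Linux cpuset list format, e.g. "0-1,6-39", into a set of CPU ids."""
    cpus: set[int] = set()
    for part in filter(None, text.strip().split(",")):
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def format_cpu_list(cpus: set[int]) -> str:
    """Format a set of CPU ids back into the compact list form, e.g. "2-5"."""
    ids = sorted(cpus)
    ranges: list[str] = []
    i = 0
    while i < len(ids):
        start = ids[i]
        while i + 1 < len(ids) and ids[i + 1] == ids[i] + 1:
            i += 1
        end = ids[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


# Literals copied from the reproduction: the kubelet state file and pods.slice cpuset.cpus.
state = json.loads('{"policyName":"static","defaultCpuSet":"0-39","checksum":421241391}')
root_cpus = parse_cpu_list("0-1,6-39")

default_cpus = parse_cpu_list(state["defaultCpuSet"])
# CPUs the kubelet assigns to shared-pool containers that the parent cgroup does not allow.
outside = default_cpus - root_cpus
print("defaultCpuSet is a subset of the cgroup root:", default_cpus <= root_cpus)
print("CPUs outside the cgroup root:", format_cpu_list(outside))
```

On an affected node, the two literals could be replaced with the contents of /var/lib/kubelet/cpu_manager_state and /sys/fs/cgroup/cpuset/pods.slice/cpuset.cpus.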

Check the logs:

```shell
journalctl -u kubelet -f
```

Anything else we need to know?

No response

Kubernetes version

```
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:20:54Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.1", GitCommit:"bc401b91f2782410b3fb3f9acf43a995c4de90d2", GitTreeState:"clean", BuildDate:"2024-01-17T15:41:12Z", GoVersion:"go1.21.6", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.27) and server (1.29) exceeds the supported minor version skew of +/-1
```

Cloud provider

OS version

No response

Install tools

No response

Container runtime (CRI) and version (if applicable)

```
containerd --version
containerd github.com/containerd/containerd 1.7.2
```

Related plugins (CNI, CSI, ...) and versions (if applicable)

No response

@t33m t33m added the kind/bug Categorizes issue or PR as related to a bug. label Apr 22, 2024
@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 22, 2024
@HirazawaUi
Contributor

/sig node

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Apr 22, 2024
@ffromani
Contributor

/cc

@swatisehgal
Contributor

/cc

@SergeyKanzhelev SergeyKanzhelev added this to Triage in SIG Node Bugs Apr 24, 2024
@AnishShah
Contributor

AnishShah commented Apr 24, 2024

@t33m, what is the motivation for defining a cpuset? What behavior do you expect for CPU allocation? And how critical is this for you, so we can assess the priority of this bug?

@AnishShah
Contributor

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 24, 2024
@AnishShah AnishShah moved this from Triage to Triaged in SIG Node Bugs Apr 24, 2024
@t33m
Author

t33m commented Apr 25, 2024

@AnishShah, to limit the cores available to pod workloads.

For example, I have my own services that are started by systemd, and I can define a CPU set for them via the CPUAffinity option in the /etc/systemd/system.conf file. But I also want to be sure that the same cores will never be used by pods.
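The isolation described here comes down to two CPU sets that must not intersect: the cores given to host services through CPUAffinity and the cores pods may use. The Python sketch below illustrates the check; the CPUAffinity value 2-5 is a hypothetical example chosen to match the gap left in the reproduction's pods.slice cpuset, and `parse_cpu_list` is a hypothetical helper. systemd's CPUAffinity accepts CPU indices or ranges separated by whitespace or commas, so the parser accepts both.

```python
def parse_cpu_list(text: str) -> set[int]:
    """Parse a CPU list such as "0-1,6-39"; whitespace is also accepted as a
    separator, as in systemd's CPUAffinity= setting."""
    cpus: set[int] = set()
    for part in text.replace(",", " ").split():
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


host_affinity = parse_cpu_list("2-5")       # hypothetical CPUAffinity= value
pods_root = parse_cpu_list("0-1,6-39")      # pods.slice cpuset.cpus from the reproduction
observed_default = parse_cpu_list("0-39")   # defaultCpuSet the kubelet actually wrote

# Intended isolation: host services and the pod cgroup share no cores.
print("overlap with pods.slice:", sorted(host_affinity & pods_root))
# Current behavior: the kubelet's default pool still covers the host-service cores.
print("overlap with defaultCpuSet:", sorted(host_affinity & observed_default))
```

Under these assumptions the first overlap is empty while the second is [2, 3, 4, 5], which is the gap between the intended isolation and the observed defaultCpuSet.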

@ffromani
Contributor

related: #118021 and #123979

Development

No branches or pull requests

6 participants