Resource consumption is not limited or OOM killed #10371

Closed
cedricmoulard opened this issue May 3, 2024 · 5 comments
Labels: type: bug (Something isn't working)

@cedricmoulard

Description

I want to sandbox a pod with gVisor and limit its resource consumption (CPU and memory).

I am using containerd as the container runtime.

I noticed that the pod consumes more memory and CPU than it should. I have tried many configurations, but it seems that gVisor is not able to enforce these limits yet.

Steps to reproduce

Configuration

Runsc

File /etc/containerd/runsc.toml

log_path = "/var/log/runsc/%ID%/shim.log"
log_level = "debug"
[runsc_config]
debug = "true"
debug-log = "/var/log/runsc/%ID%/gvisor.%COMMAND%.log.json"
debug-log-format = "json"

Containerd

~# ctr --version
ctr github.com/containerd/containerd v1.7.13

File /etc/containerd/config.toml

version = 2

root = "/var/lib/containerd"
state = "/run/containerd"

[plugins."io.containerd.grpc.v1.cri".containerd]
  no_pivot = false
  default_runtime_name = "runc"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
      runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
     runtime_type = "io.containerd.runsc.v1"
     [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc.options]
       TypeUrl = "io.containerd.runsc.v1.options"
       BinaryName = "/usr/local/bin/runsc"
       ConfigPath = "/etc/containerd/runsc.toml"

Execute

Kubernetes resources

I am using stress-ng to generate roughly 2048Mi of memory pressure and about 9 vCPUs of load: -c 2 starts 2 CPU workers and -m 8 starts 8 VM workers (256MiB each by stress-ng's default, i.e. 2048MiB total).
I am setting the container resource limits to 1024Mi and 1 vCPU.

---
apiVersion: v1
kind: Namespace
metadata:
  name: test-gvisor
---
apiVersion: v1
kind: Pod
metadata:
  name: memory-test-sandboxed
  namespace: test-gvisor
spec:
  runtimeClassName: gvisor
  containers:
  - args:
      - -c 2
      - -t 600s
      - -m 8
      - -M
    image: polinux/stress-ng
    name: memory-test-sandboxed
    resources:
      limits:
        cpu: "1"
        memory: 1024Mi
      requests:
        cpu: "1"
        memory: 1024Mi
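
To reproduce, apply the manifest and watch usage (the manifest file name here is assumed):

kubectl apply -f memory-test-sandboxed.yaml
kubectl -n test-gvisor top pod memory-test-sandboxed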

Get pod and container IDs/UID

export POD_ID=$(crictl pods --name memory-test-sandboxed -v -o json | jq -r ".items[0].id")
export POD_UID=$(crictl pods --name memory-test-sandboxed -v -o json | jq -r ".items[0].metadata.uid")
export POD_UID_UNDERSCORED=$(echo "$POD_UID" | tr '-' '_')
echo "POD_ID: ${POD_ID}"
echo "POD_UID: ${POD_UID}"
echo "POD_UID_UNDERSCORED: ${POD_UID_UNDERSCORED}"
export CONTAINER_ID=$(crictl ps -v -o json --pod $POD_ID | jq -r ".containers[0].id")
echo "CONTAINER_ID: ${CONTAINER_ID}"

Inspect Pod and Container

crictl inspect $CONTAINER_ID > /var/log/runsc/${CONTAINER_ID}/config.json
crictl inspectp $POD_ID > /var/log/runsc/${POD_ID}/config.json
crictl stats $CONTAINER_ID

List Logs

ls -ll /var/log/runsc/${POD_ID}
ls -ll /var/log/runsc/${CONTAINER_ID}

Get cgroup information

ls -ll /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${POD_ID}
ls -ll /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${CONTAINER_ID}

Check memory

CGROUP_EXPORT_FILE=/var/log/runsc/${POD_ID}/cgroup.txt
touch $CGROUP_EXPORT_FILE
echo "================================= SYSTEMD CGROUP ${POD_ID}\n" >> $CGROUP_EXPORT_FILE
echo "================================= K8s POD CRI CONTAINER ${POD_ID}" >> $CGROUP_EXPORT_FILE
echo "memory.max:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${POD_ID}/memory.max >> $CGROUP_EXPORT_FILE
echo "memory.current:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${POD_ID}/memory.current >> $CGROUP_EXPORT_FILE
echo "================================= k8s CONTAINER CRI CONTAINER ${CONTAINER_ID}" >> $CGROUP_EXPORT_FILE
echo "memory.max:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${CONTAINER_ID}/memory.max >> $CGROUP_EXPORT_FILE
echo "memory.current:" >> /var/log/runsc/${CONTAINER_ID}/cgroup.txt
cat /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${CONTAINER_ID}/memory.current >> $CGROUP_EXPORT_FILE
echo "================================= KUBEPODS CGROUP ${POD_ID}" >> $CGROUP_EXPORT_FILE
echo "memory.max:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/kubepods.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice/memory.max >> $CGROUP_EXPORT_FILE
echo "memory.current:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/kubepods.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice/memory.current >> $CGROUP_EXPORT_FILE
echo "================================= STATS CONTAINER ${CONTAINER_ID}" >> $CGROUP_EXPORT_FILE
echo "memory usage in bytes:" >> $CGROUP_EXPORT_FILE
crictl stats -o json $CONTAINER_ID | jq -r ".stats[0].memory.usageBytes.value" >> $CGROUP_EXPORT_FILE
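
The same dump can be produced with less repetition; a sketch of an equivalent loop over the same paths:

for CG in \
  /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${POD_ID} \
  /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${CONTAINER_ID} \
  /sys/fs/cgroup/kubepods.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice; do
  echo "================================= $CG" >> $CGROUP_EXPORT_FILE
  for F in memory.max memory.current; do
    echo "$F: $(cat "$CG/$F")" >> $CGROUP_EXPORT_FILE
  done
done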

Results

All logs are available here: https://github.com/cedricmoulard/gvisor-ressources-issue

Pod on cluster

I expect the pod to be OOM-killed, or to use less than 1Gi of memory and 1 vCPU.

kubectl top po
NAME                    CPU(cores)   MEMORY(bytes)
memory-test-sandboxed   9074m        2093Mi

Cgroups

cat $CGROUP_EXPORT_FILE

================================= SYSTEMD CGROUP 02521dbbb0016b638eccb79d4362ff927dca72a9ebb4f6830781e82fcbc920af
================================= K8s POD CRI CONTAINER 02521dbbb0016b638eccb79d4362ff927dca72a9ebb4f6830781e82fcbc920af
memory.max:
max
memory.current:
2117292032
================================= k8s CONTAINER CRI CONTAINER 1d5c25b3695fc85879172bfac423f4417e0fdaeb29e4a27cb99c6db2712eed99
memory.max:
max
memory.current:
0
================================= KUBEPODS CGROUP 02521dbbb0016b638eccb79d4362ff927dca72a9ebb4f6830781e82fcbc920af
memory.max:
1073741824
memory.current:
0
================================= STATS CONTAINER 1d5c25b3695fc85879172bfac423f4417e0fdaeb29e4a27cb99c6db2712eed99
memory usage in bytes:
2208296960

(So the 1Gi limit lives on the kubepods.slice path, whose memory.current stays at 0, while the sandbox's ~2Gi of usage is charged to the systemd-style cri-containerd scope, whose memory.max is max, i.e. unlimited.)

runsc version

runsc version release-20240422.0
spec: 1.1.0-rc.1

docker version (if using docker)

No response

uname

Linux k8s-test-gvisor-kosmos-node01 5.15.0-102-generic #112-Ubuntu SMP Tue Mar 5 16:50:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

All logs are available here: https://github.com/cedricmoulard/gvisor-ressources-issue
@cedricmoulard cedricmoulard added the type: bug Something isn't working label May 3, 2024
@EtiennePerot EtiennePerot self-assigned this May 8, 2024
@manninglucas
Contributor

From the logs you shared, it looks like you/containerd are specifying a systemd cgroup path (format slice:cri-containerd:uid) but not setting systemd-cgroup = "true" in [runsc_config]. Can you try adding that flag and seeing if you get the same behavior?
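
For reference, that would mean extending the /etc/containerd/runsc.toml shown above along these lines (the shim passes [runsc_config] entries to runsc as command-line flags):

log_path = "/var/log/runsc/%ID%/shim.log"
log_level = "debug"
[runsc_config]
debug = "true"
debug-log = "/var/log/runsc/%ID%/gvisor.%COMMAND%.log.json"
debug-log-format = "json"
systemd-cgroup = "true"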

@cedricmoulard
Author

From the logs you shared, it looks like you/containerd are specifying a systemd cgroup path (format slice:cri-containerd:uid) but not setting systemd-cgroup = "true" in [runsc_config]. Can you try adding that flag and seeing if you get the same behavior?

Yes, it's working, thank you

@cedricmoulard cedricmoulard closed this as not planned on May 13, 2024
@EtiennePerot
Contributor

@manninglucas Can we autodetect whether or not systemd-based cgroup control should be enabled?

@manninglucas
Contributor

@EtiennePerot Maybe, but I think we should always try to stay in line with what runc does: it doesn't attempt to auto-detect systemd-based configuration; it just reads whatever the user sets for the --systemd-cgroup flag (default: false) [1], same as runsc. I can add a short README to the runsc systemd folder clarifying how this works, to help future users avoid this confusion.

[1] https://github.com/opencontainers/runc/blob/e8bec1ba40039a004d57ddc0a9afec9a8364172b/docs/systemd.md
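
For illustration, the command-line form of that flag (runsc global flags precede the subcommand; the container ID is a placeholder):

runsc --systemd-cgroup run <container-id>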

@EtiennePerot
Contributor

Fair enough, but perhaps also a warning log message in the runsc logs if this is detected?
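
One possible detection heuristic, sketched with the crictl/jq tooling used earlier in this issue (the *:*:* pattern matches the systemd slice:prefix:name convention; the jq path assumes containerd's inspect layout):

CGPATH=$(crictl inspect $CONTAINER_ID | jq -r '.info.runtimeSpec.linux.cgroupsPath')
case "$CGPATH" in
  *:*:*) echo "warning: systemd-style cgroups path ${CGPATH}, but systemd-cgroup is not enabled" ;;
esac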
