Resource consumption is not limited or OOM killed #10371

Closed
cedricmoulard opened this issue May 3, 2024 · 5 comments
Labels: type: bug (Something isn't working)

@cedricmoulard

Description

I want to sandbox a pod with gVisor and limit its resource consumption (CPU and memory).

I am using containerd as the container runtime.

I noticed that the pod consumes more memory and CPU than it should. I have tried many configurations, but it seems that gVisor is not able to enforce these limits yet.

Steps to reproduce

Configuration

Runsc

File /etc/containerd/runsc.toml

log_path = "/var/log/runsc/%ID%/shim.log"
log_level = "debug"
[runsc_config]
debug = "true"
debug-log = "/var/log/runsc/%ID%/gvisor.%COMMAND%.log.json"
debug-log-format = "json"

Containerd

~# ctr --version
ctr github.com/containerd/containerd v1.7.13

File /etc/containerd/config.toml

version = 2

root = "/var/lib/containerd"
state = "/run/containerd"

[plugins."io.containerd.grpc.v1.cri".containerd]
  no_pivot = false
  default_runtime_name = "runc"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
      runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc]
     runtime_type = "io.containerd.runsc.v1"
     [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runsc.options]
       TypeUrl = "io.containerd.runsc.v1.options"
       BinaryName = "/usr/local/bin/runsc"
       ConfigPath = "/etc/containerd/runsc.toml"

Execute

Kubernetes resources

I am using stress-ng to generate roughly 2048Mi of memory pressure and about 9 vCPUs of load: -c 2 starts 2 CPU workers and -m 8 starts 8 VM workers (256MiB each by stress-ng's default, i.e. 2048MiB total).
I am setting the container resource limits to 1024Mi and 1 vCPU.

---
apiVersion: v1
kind: Namespace
metadata:
  name: test-gvisor
---
apiVersion: v1
kind: Pod
metadata:
  name: memory-test-sandboxed
  namespace: test-gvisor
spec:
  runtimeClassName: gvisor
  containers:
  - args:
      - -c 2
      - -t 600s
      - -m 8
      - -M
    image: polinux/stress-ng
    name: memory-test-sandboxed
    resources:
      limits:
        cpu: "1"
        memory: 1024Mi
      requests:
        cpu: "1"
        memory: 1024Mi
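
To reproduce, apply the manifest and watch usage (the manifest file name here is assumed):

kubectl apply -f memory-test-sandboxed.yaml
kubectl -n test-gvisor top pod memory-test-sandboxed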

Get pod and container IDs/UID

export POD_ID=$(crictl pods --name memory-test-sandboxed -v -o json | jq -r ".items[0].id")
export POD_UID=$(crictl pods --name memory-test-sandboxed -v -o json | jq -r ".items[0].metadata.uid")
export POD_UID_UNDERSCORED=$(echo "$POD_UID" | tr '-' '_')
echo "POD_ID: ${POD_ID}"
echo "POD_UID: ${POD_UID}"
echo "POD_UID_UNDERSCORED: ${POD_UID_UNDERSCORED}"
export CONTAINER_ID=$(crictl ps -v -o json --pod $POD_ID | jq -r ".containers[0].id")
echo "CONTAINER_ID: ${CONTAINER_ID}"

Inspect Pod and Container

crictl inspect $CONTAINER_ID > /var/log/runsc/${CONTAINER_ID}/config.json
crictl inspectp $POD_ID > /var/log/runsc/${POD_ID}/config.json
crictl stats $CONTAINER_ID

List Logs

ls -ll /var/log/runsc/${POD_ID}
ls -ll /var/log/runsc/${CONTAINER_ID}

Get cgroup information

ls -ll /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${POD_ID}
ls -ll /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${CONTAINER_ID}

Check memory

CGROUP_EXPORT_FILE=/var/log/runsc/${POD_ID}/cgroup.txt
touch $CGROUP_EXPORT_FILE
echo "================================= SYSTEMD CGROUP ${POD_ID}\n" >> $CGROUP_EXPORT_FILE
echo "================================= K8s POD CRI CONTAINER ${POD_ID}" >> $CGROUP_EXPORT_FILE
echo "memory.max:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${POD_ID}/memory.max >> $CGROUP_EXPORT_FILE
echo "memory.current:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${POD_ID}/memory.current >> $CGROUP_EXPORT_FILE
echo "================================= k8s CONTAINER CRI CONTAINER ${CONTAINER_ID}" >> $CGROUP_EXPORT_FILE
echo "memory.max:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${CONTAINER_ID}/memory.max >> $CGROUP_EXPORT_FILE
echo "memory.current:" >> /var/log/runsc/${CONTAINER_ID}/cgroup.txt
cat /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${CONTAINER_ID}/memory.current >> $CGROUP_EXPORT_FILE
echo "================================= KUBEPODS CGROUP ${POD_ID}" >> $CGROUP_EXPORT_FILE
echo "memory.max:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/kubepods.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice/memory.max >> $CGROUP_EXPORT_FILE
echo "memory.current:" >> $CGROUP_EXPORT_FILE
cat /sys/fs/cgroup/kubepods.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice/memory.current >> $CGROUP_EXPORT_FILE
echo "================================= STATS CONTAINER ${CONTAINER_ID}" >> $CGROUP_EXPORT_FILE
echo "memory usage in bytes:" >> $CGROUP_EXPORT_FILE
crictl stats -o json $CONTAINER_ID | jq -r ".stats[0].memory.usageBytes.value" >> $CGROUP_EXPORT_FILE
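
The same dump can be produced with less repetition; a sketch of an equivalent loop over the same paths:

for CG in \
  /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${POD_ID} \
  /sys/fs/cgroup/system.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice:cri-containerd:${CONTAINER_ID} \
  /sys/fs/cgroup/kubepods.slice/kubepods-pod${POD_UID_UNDERSCORED}.slice; do
  echo "================================= $CG" >> $CGROUP_EXPORT_FILE
  for F in memory.max memory.current; do
    echo "$F: $(cat "$CG/$F")" >> $CGROUP_EXPORT_FILE
  done
done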

Results

All logs are available here: https://github.com/cedricmoulard/gvisor-ressources-issue

Pod on cluster

I expect the pod to be OOM-killed, or to use less than 1Gi of memory and 1 vCPU.

kubectl top po
NAME                    CPU(cores)   MEMORY(bytes)
memory-test-sandboxed   9074m        2093Mi

Cgroups

cat $CGROUP_EXPORT_FILE

================================= SYSTEMD CGROUP 02521dbbb0016b638eccb79d4362ff927dca72a9ebb4f6830781e82fcbc920af
================================= K8s POD CRI CONTAINER 02521dbbb0016b638eccb79d4362ff927dca72a9ebb4f6830781e82fcbc920af
memory.max:
max
memory.current:
2117292032
================================= k8s CONTAINER CRI CONTAINER 1d5c25b3695fc85879172bfac423f4417e0fdaeb29e4a27cb99c6db2712eed99
memory.max:
max
memory.current:
0
================================= KUBEPODS CGROUP 02521dbbb0016b638eccb79d4362ff927dca72a9ebb4f6830781e82fcbc920af
memory.max:
1073741824
memory.current:
0
================================= STATS CONTAINER 1d5c25b3695fc85879172bfac423f4417e0fdaeb29e4a27cb99c6db2712eed99
memory usage in bytes:
2208296960

(So the 1Gi limit lives on the kubepods.slice path, whose memory.current stays at 0, while the sandbox's ~2Gi of usage is charged to the systemd-style cri-containerd scope, whose memory.max is max, i.e. unlimited.)

runsc version

runsc version release-20240422.0
spec: 1.1.0-rc.1

docker version (if using docker)

No response

uname

Linux k8s-test-gvisor-kosmos-node01 5.15.0-102-generic #112-Ubuntu SMP Tue Mar 5 16:50:32 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

All logs are available here: https://github.com/cedricmoulard/gvisor-ressources-issue
@cedricmoulard cedricmoulard added the type: bug Something isn't working label May 3, 2024
@EtiennePerot EtiennePerot self-assigned this May 8, 2024
@manninglucas
Contributor

From the logs you shared, it looks like you/containerd are specifying a systemd cgroup path (format slice:cri-containerd:uid) but not setting systemd-cgroup = "true" in [runsc_config]. Can you try adding that flag and seeing if you get the same behavior?
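
For reference, that would mean extending the /etc/containerd/runsc.toml shown above along these lines (the shim passes [runsc_config] entries to runsc as command-line flags):

log_path = "/var/log/runsc/%ID%/shim.log"
log_level = "debug"
[runsc_config]
debug = "true"
debug-log = "/var/log/runsc/%ID%/gvisor.%COMMAND%.log.json"
debug-log-format = "json"
systemd-cgroup = "true"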

@cedricmoulard
Author

From the logs you shared, it looks like you/containerd are specifying a systemd cgroup path (format slice:cri-containerd:uid) but not setting systemd-cgroup = "true" in [runsc_config]. Can you try adding that flag and seeing if you get the same behavior?

Yes, it's working, thank you

@cedricmoulard cedricmoulard closed this as not planned on May 13, 2024
@EtiennePerot
Contributor

@manninglucas Can we autodetect whether or not systemd-based cgroup control should be enabled?

@manninglucas
Contributor

@EtiennePerot Maybe, but I think we should always try to stay in line with what runc does: it doesn't attempt to auto-detect systemd-based configuration; it just reads whatever the user sets for the --systemd-cgroup flag (default: false) [1], same as runsc. I can add a short README to the runsc systemd folder clarifying how this works, to help future users avoid this confusion.

[1] https://github.com/opencontainers/runc/blob/e8bec1ba40039a004d57ddc0a9afec9a8364172b/docs/systemd.md
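
For illustration, the command-line form of that flag (runsc global flags precede the subcommand; the container ID is a placeholder):

runsc --systemd-cgroup run <container-id>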

@EtiennePerot
Contributor

Fair enough, but perhaps also a warning log message in the runsc logs if this is detected?
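
One possible detection heuristic, sketched with the crictl/jq tooling used earlier in this issue (the *:*:* pattern matches the systemd slice:prefix:name convention; the jq path assumes containerd's inspect layout):

CGPATH=$(crictl inspect $CONTAINER_ID | jq -r '.info.runtimeSpec.linux.cgroupsPath')
case "$CGPATH" in
  *:*:*) echo "warning: systemd-style cgroups path ${CGPATH}, but systemd-cgroup is not enabled" ;;
esac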
