Resource consumption by Python is not limited #10264

Open
chivalryq opened this issue Apr 9, 2024 · 3 comments
Assignees
Labels
type: bug Something isn't working

Comments


chivalryq commented Apr 9, 2024

Description

I'm building a sandbox service with gVisor. Python seems to be able to allocate unlimited memory, while a bash script that tries to allocate unlimited memory is marked Error in the Pod status.

Steps to reproduce

  1. Set up a Kubernetes cluster with a gVisor RuntimeClass.
  2. Apply the Deployment below. It will try to allocate ~100GB of memory.
cat << 'EOF' | kubectl apply -f -                                                                              
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: memory-eater-python
  name: memory-eater-python
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: memory-eater-python
  template:
    metadata:
      labels:
        app: memory-eater-python
    spec:
      containers:
      - command:
        - python
        args: ["-c", "import sys; big_list = []; print('Attempting to allocate 100GB of memory...'); [big_list.append(' ' * 10**6) for _ in range(100000)]"]
        image: python
        name: ubuntu
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 999
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
          requests:
            cpu: 200m
            ephemeral-storage: 200M
            memory: "214748364"
      dnsPolicy: Default
      hostNetwork: true
      restartPolicy: Always
      runtimeClassName: gvisor
EOF
  3. After a while, run kubectl top:
kubectl top pod -n default <pod-name>

I got the result below. The memory usage is ~62GiB at that point; the pod eventually allocates ~100GiB, which is what makes our machine go OOM and is why I'm investigating.

NAME                                  CPU(cores)   MEMORY(bytes)
memory-eater-python-887b744f9-2snvs   984m         62654Mi
  4. As a negative case, the bash script below is limited and the pod fails.
cat << 'EOF' | kubectl apply -f -                                                                              
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: memory-eater-bash
  name: memory-eater-bash
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: memory-eater-bash
  template:
    metadata:
      labels:
        app: memory-eater-bash
    spec:
      containers:
      - command:
        - bash
        - -c
        - big_var=data; while true; do big_var="$big_var$big_var"; done
        image: python
        name: ubuntu
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 999
        resources:
          limits:
            cpu: "1"
            memory: 512Mi
          requests:
            cpu: 200m
            ephemeral-storage: 200M
            memory: "214748364"
      dnsPolicy: Default
      hostNetwork: true
      restartPolicy: Always
      runtimeClassName: gvisor
EOF

runsc version

runsc version release-20231009.0
spec: 1.1.0-rc.1

docker version (if using docker)

No response

uname

Linux 3090-k8s-node029 5.15.0-69-generic #76-Ubuntu SMP Fri Mar 17 17:19:29 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

kubectl (if using Kubernetes)

Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.4", GitCommit:"fa3d7990104d7c1f16943a67f11b154b71f6a132", GitTreeState:"clean", BuildDate:"2023-07-19T12:20:54Z", GoVersion:"go1.20.6", Compiler:"gc", Platform:"darwin/arm64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.2", GitCommit:"fc04e732bb3e7198d2fa44efa5457c7c6f8c0f5b", GitTreeState:"clean", BuildDate:"2023-02-22T13:32:22Z", GoVersion:"go1.19.6", Compiler:"gc", Platform:"linux/amd64"}

repo state (if built from source)

No response

runsc debug logs (if available)

Haven't collected them in the cluster.
@chivalryq chivalryq added the type: bug Something isn't working label Apr 9, 2024
@EtiennePerot EtiennePerot self-assigned this Apr 11, 2024

EtiennePerot commented May 9, 2024

Hi; I can't seem to reproduce this, at least on GKE.

gVisor doesn't do memory limiting by itself; instead, it relies on the host Linux kernel. The limit is set up here as part of container startup, which eventually ends up here to control memory. This way, a single limit covers the combined memory usage of the gVisor kernel and the processes inside the sandbox. If that usage goes over the limit, the sandbox should be killed by the Linux OOM killer, and this should be visible in dmesg on the machine.
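On the node running the sandbox, both halves of this can be checked along these lines (a sketch; the kubepods path is an assumption and depends on the kubelet's cgroup driver):

# Was anything OOM-killed by the host kernel?
dmesg | grep -i -E 'out of memory|oom-kill|killed process'

# What memory limit was actually set for the pod's cgroup?
# cgroup v2:
find /sys/fs/cgroup -path '*kubepods*' -name memory.max -exec grep -H . {} + 2>/dev/null
# cgroup v1:
find /sys/fs/cgroup/memory -path '*kubepods*' -name memory.limit_in_bytes -exec grep -H . {} + 2>/dev/null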

The enforcement mechanism depends on many moving parts, so I suggest checking all of them.

  • The OOM killer must be enabled on the host Linux kernel.
  • cgroupfs must be mounted on the host (typically at /sys/fs/cgroup).
  • Note that cgroupfs comes in two versions (v1 and v2), which changes things quite a bit.
  • Make sure runsc's --ignore-cgroups flag is not specified.
  • If you use runsc's --systemd-cgroup, make sure you have systemd >= v244.
  • The Linux.CgroupsPath may need to be set properly in the OCI spec. It is probably incorrect (but debug logs are needed to check).
  • The gVisor shim can set the dev.gvisor.spec.cgroup-parent annotation to set the cgroups path as well (this would show up in debug logs).

If all of this is in place, please provide runsc debug logs, details on how you installed gVisor within the Kubernetes cluster (runsc flags etc.), the systemd version (systemd --version), the cgroup version (output of cat /proc/mounts), and which cgroup controllers are enabled (cat /sys/fs/cgroup/cgroup.controllers).
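Roughly, that information can be collected on the affected node like this (a sketch; adjust for your setup):

systemd --version                      # systemd version
grep cgroup /proc/mounts               # cgroup v1 vs v2 mounts
cat /sys/fs/cgroup/cgroup.controllers  # enabled controllers (cgroup v2 only)
runsc --version                        # runsc release in use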

Also please check #10371 which was filed recently after this issue and looks quite similar.


chivalryq commented May 12, 2024

@EtiennePerot Thanks for replying! We have found the problem thanks to @charlie0129.

It turns out we hadn't configured gVisor to use systemd-cgroup, which is our cgroup manager in the cluster. After adding systemd-cgroup and upgrading gVisor to the latest version, the OOM pod is properly killed by Linux. If I understand correctly, the default is to use cgroupfs, which is not the mainstream choice. Would it be better to move to systemd-cgroup as the default?

But I can't seem to find any related documentation or FAQ about the cgroup manager. Forgive me if I missed it; if there really isn't any, it would be good to mention it somewhere in the documentation.

@EtiennePerot

> Would it be better to move to systemd-cgroup as a default?

See discussion on #10371 on this. Apparently runc's default behavior is also systemd-cgroup=false, and runsc needs to match runc's behavior in order to remain a drop-in replacement for it. But +1 on the need for documentation.
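For reference, a minimal sketch of opting in to systemd-cgroup explicitly, assuming containerd with the gVisor shim and a runsc.toml referenced via the runtime's ConfigPath (the file path and section name below are assumptions; merge with any existing runsc.toml settings rather than overwriting):

# Pass --systemd-cgroup to runsc through the shim's config file (assumed path).
cat << 'EOF' | sudo tee /etc/containerd/runsc.toml
[runsc_config]
  systemd-cgroup = "true"
EOF
sudo systemctl restart containerd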
