======== OOM Adjust script ========== #3931

Open
jbaldwinroberts opened this issue Mar 21, 2024 · 0 comments
Labels
type:bug Something isn't working

jbaldwinroberts commented Mar 21, 2024

What went wrong?

================================= System Info ==================================
version: v0.8.4
build-sha: c22fa520401cf274bd92151442ea0d9c353173fa
platform: darwin/arm64; macOS 14.4
================================ Docker Version ================================
&containerutil.FrontendInfo{ClientVersion:"24.0.2", ClientAPIVersion:"1.43", ClientPlatform:"darwin/arm64", ServerVersion:"24.0.2", ServerAPIVersion:"1.43", ServerPlatform:"linux/arm64", ServerAddress:"/var/run/docker.sock"}
================================ Buildkit Logs =================================
starting earthly-buildkit with EARTHLY_GIT_HASH=c22fa520401cf274bd92151442ea0d9c353173fa BUILDKIT_BASE_IMAGE=github.com/earthly/buildkit:fbd249f838cc215eef3f5c900884ae0779ff4e50+build
detected cgroups v2; buildkit/entrypoint.sh running under pid=1 with controllers "cpuset cpu io memory hugetlb pids rdma" in group 0::/
Autodetecting iptables
Could not find an ip_tables module; falling back to heuristics.
Detected iptables-legacy by output length (4 >= 4)
BUILDKIT_ROOT_DIR=/tmp/earthly/buildkit
CACHE_SIZE_MB=83261
BUILDKIT_MAX_PARALLELISM=20
BUILDKIT_LOCAL_REGISTRY_LISTEN_PORT=8371
EARTHLY_ADDITIONAL_BUILDKIT_CONFIG=
CNI_MTU=65535
OOM_SCORE_ADJ=0
======== CNI config ==========
{
    "cniVersion": "0.3.0",
    "name": "buildkitbuild",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "mtu": 65535,
    "ipam": {
        "type": "host-local",
        "subnet": "172.30.0.0/16",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ]
    }
}
======== End CNI config ==========
======== Buildkitd config ==========
debug = false
root = "/tmp/earthly/buildkit"
insecure-entitlements = [ "security.insecure" ]
[worker.oci]
  enabled = true
  snapshotter = "auto"
  max-parallelism = 20
  gc = true
  networkMode = "cni"
  cniBinaryPath = "/usr/libexec/cni"
  cniConfigPath = "/etc/cni/cni-conf.json"
    # Please note the required indentation to fit in buildkit.toml.template accordingly.
  # gckeepstorage sets storage limit for default gc profile, in MB.
  gckeepstorage = 83261
  [[worker.oci.gcpolicy]]
    keepBytes = 43652743168
    filters = [ "type==source.local", "type==source.git.checkout"]
  [[worker.oci.gcpolicy]]
    all = true
    keepBytes = 87305486336
======== End buildkitd config ==========
======== OOM Adjust script ==========
#! /bin/sh
set -e
OOM_ADJ="0"
DEBUG="false"
INVOCATION=$(tr -dc A-Za-z0-9 </dev/urandom | head -c 6 ; echo '')
log() {
    ! "$DEBUG" || echo "$(date) [$INVOCATION] | $1" >> /var/log/oom_adj
}
adjust_oom() {
  echo "$1" > /proc/"$2"/oom_score_adj || true # It is ok if the OOM score fails - the PID may have exited, so it no longer matters anyways.
}
if [ "$OOM_ADJ" -eq "0" ]; then
    exit 0
fi
for PID in $(pidof buildkit-runc)
do
    PID_NAME=$(cat /proc/"$PID"/cmdline || echo "unknown")
    case "$PID_NAME" in
        # This is the POSIX way to do a string-starts-with, and accomodates the one prefix we do not want. Order here is important.
        "buildkit-runcinit"*) log "$PID was buildkit-runcinit, ignoring"; continue ;;
        "buildkit-runc"*)     log "$PID is runc parent($PID_NAME), proceeding" ;;
        *)                    log "$PID was $PID_NAME, ignoring"; continue ;;
    esac
    for CHILD_PID in $(pgrep -P "$PID")
    do
        CHILD_PID_NAME=$(cat /proc/"$CHILD_PID"/cmdline || echo "unknown")
        CHILD_OOM_ADJ=$(cat /proc/"$CHILD_PID"/oom_score_adj || echo "unknown")
        case "$CHILD_OOM_ADJ" in
            "unknown"*) log "$PID has child: $CHILD_PID($CHILD_PID_NAME), which was missing, ignoring"; continue ;;
            "0")        log "$PID has child: $CHILD_PID($CHILD_PID_NAME) with oom_score_adj: $CHILD_OOM_ADJ"; continue ;;
            *)          log "$PID has child: $CHILD_PID($CHILD_PID_NAME), oom_score_adj was set to 0: $CHILD_OOM_ADJ" ;;
        esac
        # The child may have started _after_ this script ran on the parent (or other invocation of) buildkit-runc.
        # This undoes the inherited adjustment to make sure we behave properly at OOM time.
        adjust_oom "0" "$CHILD_PID"
    done
    PID_NAME=$(cat /proc/"$PID"/cmdline || echo "unknown")
    case "$PID_NAME" in
        # Just in case the process exec-ed another program between the initial filter and now.
        "buildkit-runcinit"*) log "$PID was buildkit-runcinit, no longer a candidate for OOM adjustment, ignoring"; continue ;;
        "buildkit-runc"*)     log "$PID oom_score_adj was set to 0" ;;
        *)                    log "$PID was $PID_NAME, no longer a candidate for OOM adjustment, ignoring"; continue ;;
    esac
    adjust_oom "$OOM_ADJ" "$PID"
done
======== OOM Adjust script ==========
Detected container architecture is aarch64

This happens occasionally when running an Earthly target that does a lot of work. It normally works fine on re-run.
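
For anyone reading the OOM adjust script in the log above: its comments say the `case` pattern order matters, because `"buildkit-runc"*` would also match `buildkit-runcinit` if it came first. Below is a minimal sketch of that prefix matching in isolation. The `classify` function and its return strings are hypothetical names for illustration only and are not part of Earthly or Buildkit. Note also that the log shows `OOM_ADJ="0"`, so in this run the script appears to exit before reaching any of this logic.

```shell
#!/bin/sh
# Hypothetical helper (not from Earthly): mirrors the prefix checks that the
# logged script performs on a process's command line.
classify() {
    case "$1" in
        # The more specific prefix must be listed first; otherwise the
        # "buildkit-runc"* pattern below would match it as well.
        "buildkit-runcinit"*) echo "ignore" ;;
        "buildkit-runc"*)     echo "adjust" ;;
        *)                    echo "ignore-other" ;;
    esac
}

classify "buildkit-runcinit"   # prints: ignore
classify "buildkit-runc"       # prints: adjust
classify "sh"                  # prints: ignore-other
```

Swapping the first two patterns would make every `buildkit-runcinit` process classify as `adjust`, which is the case the script's comment warns about.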

What should have happened?

Not this

Other Helpful Information

@jbaldwinroberts jbaldwinroberts added the type:bug Something isn't working label Mar 21, 2024