
High memory usage. #2637

Open
r2k1 opened this issue Mar 14, 2024 · 10 comments
Labels
E3 (Estimated level of Effort, 1 is easiest, 4 is hardest), kubecost (Relevant to Kubecost's downstream project), needs-follow-up, needs-triage, opencost (OpenCost issues vs. external/downstream), P2 (Estimated Priority, P0 is highest, P4 is lowest)

Comments

@r2k1 (Contributor) commented Mar 14, 2024

I'm focused on reducing the memory use of OpenCost+Prometheus.

My setup for OpenCost and Prometheus is fine-tuned: it keeps data for a short time, scrapes only the essential metrics and labels needed by OpenCost, and runs on many different clusters. I only query the past 24 hours of data and don't use caching.

I found that this optimized setup of OpenCost+Prometheus uses about 200 MB of memory, plus roughly an extra 0.5 MB for each container.

I did a test on a cluster with 100 nodes and 25,000 pods, with some pods being replaced. Here are my results, along with some links to the pprof data:

[four profiling screenshots attached: img, img_1, img_2, img_3]

pprof links:

I found the OpenCost heap profile showing "Memory In-Use Bytes" to be the most insightful. It's tricky to catch the heap at peak consumption, and "Allocated Bytes Total" can be noisy and misleading, though I may also be reading it incorrectly.
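
To make short-lived peaks easier to catch between snapshots, one option (an illustrative sketch only, not something OpenCost ships; shown as a standalone program, but in practice it would be a goroutine inside the profiled process) is to log the Go runtime's heap statistics on a timer:

package main

import (
    "log"
    "runtime"
    "time"
)

func main() {
    var m runtime.MemStats
    for {
        // HeapInuse roughly matches the in-use numbers pprof reports;
        // Sys is the total memory the runtime has obtained from the OS.
        runtime.ReadMemStats(&m)
        log.Printf("heap in-use: %d MiB, sys: %d MiB", m.HeapInuse>>20, m.Sys>>20)
        time.Sleep(5 * time.Second)
    }
}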

In this test, memory use was about 7 GB, with peaks up to 11 GB, mostly when fetching data from OpenCost. It was split roughly half-and-half between OpenCost and Prometheus.

My observations:

  • It's hard to tell from the Prometheus TSDB report, but I think most of the memory usage comes from storing every pod label. Different label values are aggregated into different buckets, so they're usually invisible in the top 10. This also influences OpenCost's memory usage when querying.
  • OpenCost seems to keep almost every Kubernetes object fully in memory, even though only a few fields are used. The watcher and cache took up 56% of the memory, while the metrics emitter used only 4% (see the sketch below).
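
One direction that could follow from this (a hypothetical sketch, not OpenCost's current code, and assuming the watchers are built on client-go shared informers; the package and function names are illustrative): register a transform on the informer so that only the fields the cost model reads are kept before each object enters the cache.

package clustercache

import (
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/cache"
)

// newSlimPodInformer builds a pod informer whose cache stores trimmed-down
// copies of each Pod instead of the full objects.
func newSlimPodInformer(client kubernetes.Interface) (cache.SharedIndexInformer, error) {
    factory := informers.NewSharedInformerFactory(client, 0)
    informer := factory.Core().V1().Pods().Informer()

    // SetTransform must be called before the informer starts; it rewrites every
    // object on its way into the store, dropping managed fields, env vars,
    // volumes, and everything else the cost model never reads.
    err := informer.SetTransform(func(obj interface{}) (interface{}, error) {
        pod, ok := obj.(*corev1.Pod)
        if !ok {
            return obj, nil // e.g. DeletedFinalStateUnknown tombstones pass through
        }
        slim := &corev1.Pod{
            ObjectMeta: metav1.ObjectMeta{
                Name:      pod.Name,
                Namespace: pod.Namespace,
                UID:       pod.UID,
                Labels:    pod.Labels,
            },
            Spec:   corev1.PodSpec{NodeName: pod.Spec.NodeName},
            Status: corev1.PodStatus{Phase: pod.Status.Phase},
        }
        for _, c := range pod.Spec.Containers {
            slim.Spec.Containers = append(slim.Spec.Containers, corev1.Container{
                Name:      c.Name,
                Resources: c.Resources,
            })
        }
        return slim, nil
    })
    return informer, err
}

The same idea would apply to the other watched resource types (nodes, deployments, and so on).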

I'd appreciate any ideas to lower the memory usage.

@mattray (Collaborator) commented Mar 14, 2024

Great data! Tagging the @opencost/opencost-helm-chart-maintainers to see if anyone wants to add in anything.

@mattray added the opencost, P2, kubecost, and E3 labels on Mar 14, 2024
@AjayTripathy (Contributor)

Artur, thank you so much for putting this together.

> OpenCost seems to keep almost every Kubernetes object fully in memory, even though only a few fields are used. The watcher and cache took up 56% of the memory, while the metrics emitter used only 4%.

This seems like the simplest thing to carry forward. These paths should be well tested, so if we can find a way to avoid storing all of this data in the watcher, that would be ideal.

@AjayTripathy (Contributor)

#2641
#2642

These track the two major insights and the work we'd consider doing to reduce the memory profile based on these findings.

@AjayTripathy (Contributor)

Also, @r2k1, is there any way we could open-source how to spin up your memory testing framework, so we can test any PRs for #2641 and #2642 against a consistent benchmark?

@r2k1 (Contributor, Author) commented Mar 15, 2024

It would be nice to have a more automated way, but here is what I've done.

From what I've seen, the usage is more or less linear with cluster size: a 100-node cluster full of pods consumed roughly 10 times more than a 10-node cluster.

az aks create --name artur-perf-test --resource-group artur --node-vm-size Standard_E2ps_v5 --node-count 10 --enable-managed-identity --tier standard --enable-cost-analysis  --max-pods 250 
az aks get-credentials --resource-group artur --name artur-perf-test --overwrite-existing

I think this cluster costs roughly $1/hour.

I used kube-burner to generate some load.
There is another tool to generate load, clusterloader2, but I found it more difficult to use.

Here is the kube-burner config. It fills the cluster with identical pods and churns a portion of them. You can also run it multiple times.

cat << EOF > kubelet-density.yaml
# Config for a 10-node cluster (250 pods); proportionally adjust "jobIterations" or "replicas" for other cluster sizes
---
jobs:
  - name: churning
    preLoadImages: false
    jobIterations: 50
    namespacedIterations: true
    namespace: churning
    waitWhenFinished: true
    podWait: false

    churn: true
    churnPercent: 50
    churnDuration: 1m

    objects:
      - objectTemplate: deployment.yaml
        replicas: 20
        inputVars:
          containerImage: registry.k8s.io/pause:3.1
EOF

cat <<EOF > deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubelet-density-deployment-v3-{{.Iteration}}-{{.Replica}}
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kubelet-density-{{.Iteration}}-{{.Replica}}
  template:
    metadata:
      labels:
        app: kubelet-density-{{.Iteration}}-{{.Replica}}
        label1: label-{{.Iteration}}-{{.Replica}}
        label2: label-{{.Iteration}}-{{.Replica}}
        label3: label-{{.Iteration}}-{{.Replica}}
        label4: label-{{.Iteration}}-{{.Replica}}
        label5: label-{{.Iteration}}-{{.Replica}}
        label6: label-{{.Iteration}}-{{.Replica}}
        label7: label-{{.Iteration}}-{{.Replica}}
        label8: label-{{.Iteration}}-{{.Replica}}
        label9: label-{{.Iteration}}-{{.Replica}}
        label10: label-{{.Iteration}}-{{.Replica}}
        label11: label-{{.Iteration}}-{{.Replica}}
        label12: label-{{.Iteration}}-{{.Replica}}
        label13: label-{{.Iteration}}-{{.Replica}}
        label14: label-{{.Iteration}}-{{.Replica}}
        label15: label-{{.Iteration}}-{{.Replica}}
        label16: label-{{.Iteration}}-{{.Replica}}
        label17: label-{{.Iteration}}-{{.Replica}}
        label18: label-{{.Iteration}}-{{.Replica}}
        label19: label-{{.Iteration}}-{{.Replica}}
        label20: label-{{.Iteration}}-{{.Replica}}
        label21: label-{{.Iteration}}-{{.Replica}}
        label22: label-{{.Iteration}}-{{.Replica}}
        label23: label-{{.Iteration}}-{{.Replica}}
        label24: label-{{.Iteration}}-{{.Replica}}
        label25: label-{{.Iteration}}-{{.Replica}}
        label26: label-{{.Iteration}}-{{.Replica}}
        label27: label-{{.Iteration}}-{{.Replica}}
        label28: label-{{.Iteration}}-{{.Replica}}
        label29: label-{{.Iteration}}-{{.Replica}}
        label30: label-{{.Iteration}}-{{.Replica}}
    spec:
      containers:
      - name: kubelet-density-1
        image: {{.containerImage}}
        ports:
        - containerPort: 8080
          protocol: TCP
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: false
        resources:
          requests:
            cpu: "1m"
            memory: "1Ki"
      - name: kubelet-density-2
        image: {{.containerImage}}
        ports:
          - containerPort: 8080
            protocol: TCP
        imagePullPolicy: IfNotPresent
        securityContext:
          privileged: false
        resources:
          requests:
            cpu: "1m"
            memory: "1Ki"

EOF
kube-burner init -c kubelet-density.yaml

Analyse the load in Prometheus:

kubectl port-forward -n kube-system deployment/cost-analysis-agent 9092 9094
open http://localhost:9092/

Here are some useful Prometheus queries:

# Memory usage
container_memory_working_set_bytes{pod=~"cost-analysis-agent-.*"}

# Current container count
count(container_cpu_usage_seconds_total)

# Total container count over the last day
count(present_over_time(container_cpu_usage_seconds_total[1d]))

# Max concurrent container count over the last day
max_over_time(count(container_cpu_usage_seconds_total)[1d:10m])
# CPU Usage
rate(container_cpu_usage_seconds_total{pod=~"cost-analysis-agent-.*"}[5m])

Generate some load (note: OpenCost is queried indirectly):

set -e

# Current time in the RFC 3339 format the API expects
current_time=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

# Get the time 24 hours ago in the same format (BSD/macOS date syntax; with GNU date use: date -u -d '24 hours ago' +"%Y-%m-%dT%H:%M:%SZ")
time_24_hours_ago=$(date -u -v-24H +"%Y-%m-%dT%H:%M:%SZ")

# Use these times in the curl command, then capture heap profiles and the TSDB status report
curl "http://localhost:9094/resources/v1?from=$time_24_hours_ago&to=$current_time" > /dev/null
curl "http://localhost:9003/debug/pprof/heap" > heap-opencost.pprof
curl "http://localhost:9092/debug/pprof/heap" > heap-prometheus.pprof
curl "http://localhost:9090/api/v1/status/tsdb?limit=100" > tsdb-status.json

@r2k1 (Contributor, Author) commented Mar 15, 2024

If you want to test a change and iterate on it, here is a hackish solution I used:

https://github.com/opencost/opencost/compare/develop..memtest

I usually just run the memory profiler for a single test in GoLand:
[screenshot of the GoLand memory profiler]
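
For anyone without GoLand, a rough equivalent is to write a heap profile from inside the test itself and open it with go tool pprof. This is only an illustrative sketch; the package name, test name, and the exercised code path are placeholders:

// heap_profile_test.go (placeholder package and test names)
package memtest

import (
    "os"
    "runtime"
    "runtime/pprof"
    "testing"
)

func TestHeapSnapshot(t *testing.T) {
    // ... exercise the code path under investigation here ...

    f, err := os.Create("heap-test.pprof")
    if err != nil {
        t.Fatal(err)
    }
    defer f.Close()

    runtime.GC() // collect garbage first so the in-use numbers are not inflated
    if err := pprof.WriteHeapProfile(f); err != nil {
        t.Fatal(err)
    }
}

Inspect the result with "go tool pprof heap-test.pprof"; running "go test -run TestHeapSnapshot -memprofile mem.out" gives a similar profile without any code changes.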

@AjayTripathy (Contributor)

Thank you for the detailed post!

cc @jessegoodier and @thomasvn -- perhaps this can form the basis for automated scale testing?

@thomasvn (Contributor)

Yep, agree. Thanks @r2k1 for the detailed writeup here! Very helpful.

You mention that resource usage grows linearly with the number of nodes/pods. When you ran your experiment, did you also notice whether resource usage grew linearly with an increased number of labels per pod, or did it grow faster than linearly?

@r2k1 (Contributor, Author) commented Mar 19, 2024

Sorry, I didn't properly measure how the label count affects it.

@thomasvn (Contributor)

No problem!
