High memory usage. #2637
Great data! Tagging the @opencost/opencost-helm-chart-maintainers to see if anyone wants to add in anything.
Artur, thank you so much for putting this together.
This seems like the simplest thing to carry forward; these paths should be well-tested, so if we can find a way to not store all this data in the watcher, that would be ideal.
It would be nice to have a more automated way, but here is what I've done. From what I've seen, the usage is more or less linear with cluster size: a 100-node cluster full of pods consumed roughly 10 times more than a 10-node cluster.
I think this cluster costs roughly $1/hour. I used kube-burner to generate some load. Here is the kube-burner script; it just fills the cluster with identical pods and churns a portion of them. You may also run it multiple times.
cat << EOF > kubelet-density.yaml
# Config for 10 nodes clusters (250 pods), proportionally adjust "jobIterations" or "replicas" for other cluster sizes
---
jobs:
  - name: churning
    preLoadImages: false
    jobIterations: 50
    namespacedIterations: true
    namespace: churning
    waitWhenFinished: true
    podWait: false
    churn: true
    churnPercent: 50
    churnDuration: 1m
    objects:
      - objectTemplate: deployment.yaml
        replicas: 20
        inputVars:
          containerImage: registry.k8s.io/pause:3.1
EOF
cat <<EOF > deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kubelet-density-deployment-v3-{{.Iteration}}-{{.Replica}}
spec:
  replicas: 3
  selector:
    matchLabels:
      app: kubelet-density-{{.Iteration}}-{{.Replica}}
  template:
    metadata:
      labels:
        app: kubelet-density-{{.Iteration}}-{{.Replica}}
        label1: label-{{.Iteration}}-{{.Replica}}
        label2: label-{{.Iteration}}-{{.Replica}}
        label3: label-{{.Iteration}}-{{.Replica}}
        label4: label-{{.Iteration}}-{{.Replica}}
        label5: label-{{.Iteration}}-{{.Replica}}
        label6: label-{{.Iteration}}-{{.Replica}}
        label7: label-{{.Iteration}}-{{.Replica}}
        label8: label-{{.Iteration}}-{{.Replica}}
        label9: label-{{.Iteration}}-{{.Replica}}
        label10: label-{{.Iteration}}-{{.Replica}}
        label11: label-{{.Iteration}}-{{.Replica}}
        label12: label-{{.Iteration}}-{{.Replica}}
        label13: label-{{.Iteration}}-{{.Replica}}
        label14: label-{{.Iteration}}-{{.Replica}}
        label15: label-{{.Iteration}}-{{.Replica}}
        label16: label-{{.Iteration}}-{{.Replica}}
        label17: label-{{.Iteration}}-{{.Replica}}
        label18: label-{{.Iteration}}-{{.Replica}}
        label19: label-{{.Iteration}}-{{.Replica}}
        label20: label-{{.Iteration}}-{{.Replica}}
        label21: label-{{.Iteration}}-{{.Replica}}
        label22: label-{{.Iteration}}-{{.Replica}}
        label23: label-{{.Iteration}}-{{.Replica}}
        label24: label-{{.Iteration}}-{{.Replica}}
        label25: label-{{.Iteration}}-{{.Replica}}
        label26: label-{{.Iteration}}-{{.Replica}}
        label27: label-{{.Iteration}}-{{.Replica}}
        label28: label-{{.Iteration}}-{{.Replica}}
        label29: label-{{.Iteration}}-{{.Replica}}
        label30: label-{{.Iteration}}-{{.Replica}}
    spec:
      containers:
        - name: kubelet-density-1
          image: {{.containerImage}}
          ports:
            - containerPort: 8080
              protocol: TCP
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: false
          resources:
            requests:
              cpu: "1m"
              memory: "1Ki"
        - name: kubelet-density-2
          image: {{.containerImage}}
          ports:
            - containerPort: 8080
              protocol: TCP
          imagePullPolicy: IfNotPresent
          securityContext:
            privileged: false
          resources:
            requests:
              cpu: "1m"
              memory: "1Ki"
EOF
kube-burner init -c kubelet-density.yaml
Analyse the load in Prometheus:
kubectl port-forward -n kube-system deployment/cost-analysis-agent 9092 9094
open http://localhost:9092/
Here are some useful Prometheus queries.
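For example, queries along these lines show where the memory goes; the job and container label values are assumptions about this particular setup and may need adjusting:
process_resident_memory_bytes{job="opencost"}
process_resident_memory_bytes{job="prometheus"}
container_memory_working_set_bytes{namespace="kube-system", container=~"cost-model|prometheus-server"}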
Generate some load. (Note: OpenCost is requested indirectly.)
set -e
current_time=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
# Get the time 24 hours ago in the required format
time_24_hours_ago=$(date -u -v-24H +"%Y-%m-%dT%H:%M:%SZ")
# Use these times in the curl command
curl "http://localhost:9094/resources/v1?from=$time_24_hours_ago&to=$current_time" > /dev/null
curl "http://localhost:9003/debug/pprof/heap" > heap-opencost.pprof
curl "http://localhost:9092/debug/pprof/heap" > heap-prometheus.pprof
curl "http://localhost:9090/api/v1/status/tsdb?limit=100" > tsdb-status.json
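The collected profiles can then be inspected with the standard Go tooling, for example:
go tool pprof -http=:8081 heap-opencost.pprof     # browse the heap profile (flame graph, top, graph views)
go tool pprof -top heap-prometheus.pprof          # quick text summary of the biggest in-use allocators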
If you want to test a change and iterate on it, here is a hackish solution I used: https://github.com/opencost/opencost/compare/develop..memtest
I usually just run the memory profiler for a single test in GoLand.
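Outside GoLand, roughly the same thing can be done with the standard go test profiling flags; the test name and package path below are placeholders:
go test -run TestComputeAllocation -memprofile mem.pprof ./pkg/costmodel/   # placeholder test name and package
go tool pprof -http=:8082 mem.pprof                                         # open the resulting profile in the browser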
Thank you for the detailed post! cc @jessegoodier and @thomasvn -- perhaps this can form the basis for automated scale testing?
Yep, agree. Thanks @r2k1 for the detailed writeup here! Very helpful. You mention that resource usage grows linearly with the number of nodes/pods. When you ran your experiment, did you also notice that resource usage grew linearly with the number of labels per pod? Or did it grow faster than linearly?
Sorry, I didn't properly measure how label count affects it.
No problem!
I'm focused on reducing the memory use of OpenCost + Prometheus.
My setup for OpenCost and Prometheus is fine-tuned: it keeps data for only a short time, scrapes only the essential metrics and labels needed by OpenCost, and runs on many different clusters. I only ask for the past 24 hours of data and don't use caching.
I found that this optimized setup of OpenCost + Prometheus uses about 200MB of memory, plus an extra 0.5MB for each container.
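For reference, the kind of tuning described above looks roughly like the following sketch; the retention window and the metric allowlist are assumed values, the real ones depend on the deployment:
# Prometheus launch flag: keep only a short retention window
--storage.tsdb.retention.time=2d
# In each scrape job, keep only the metric names OpenCost actually queries (regex is an assumed allowlist)
metric_relabel_configs:
  - source_labels: [__name__]
    regex: 'container_cpu_usage_seconds_total|container_memory_working_set_bytes|kube_pod_.*|kube_node_.*'
    action: keep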
I did a test on a cluster with 100 nodes and 25,000 pods, with some pods being replaced. Here are my results, along with some links to the pprof data:
pprof links:
I found the OpenCost heap profile showing "Memory In-Use Bytes" the most insightful. It's tricky to catch the heap at peak consumption, and "Allocated Bytes Total" can be noisy and misleading. But I may also be reading it incorrectly.
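In pprof terms this is the difference between the inuse_space and alloc_space sample indexes, e.g.:
go tool pprof -sample_index=inuse_space heap-opencost.pprof   # bytes currently held on the heap
go tool pprof -sample_index=alloc_space heap-opencost.pprof   # cumulative bytes allocated since process start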
In this test, memory use was about 7GB, with peaks up to 11GB, mostly when pulling data from OpenCost. It was split roughly half and half between OpenCost and Prometheus.
My observations:
I'd appreciate any ideas to lower the memory usage.