
Added memory requests to be more realistic. #14

Open: keskival wants to merge 2 commits into main from janitorial/updated-memory-requests-to-be-more-realistic

Conversation

@keskival commented Dec 10, 2022

This makes Kubernetes make better choices about where
to schedule the pods, and communicates to the administrators
about the minimum sensible resource requirements.

On a single-user Mastodon instance on a three-node Kubernetes cluster, after a week or so of use, we see the following memory usage per pod:

```
tero@arcones:~$ kubectl top pods -n mastodon
NAME                                           CPU(cores)   MEMORY(bytes)
mastodon-elasticsearch-coordinating-0          6m           403Mi
mastodon-elasticsearch-coordinating-1          28m          189Mi
mastodon-elasticsearch-data-0                  10m          1432Mi
mastodon-elasticsearch-data-1                  5m           1513Mi
mastodon-elasticsearch-ingest-0                6m           418Mi
mastodon-elasticsearch-ingest-1                6m           396Mi
mastodon-elasticsearch-master-0                24m          466Mi
mastodon-elasticsearch-master-1                10m          221Mi
mastodon-postgresql-0                          12m          276Mi
mastodon-redis-master-0                        16m          37Mi
mastodon-redis-replicas-0                      7m           34Mi
mastodon-sidekiq-all-queues-549b4bb7b4-zvj2m   266m         499Mi
mastodon-streaming-78465f778d-6xfg2            1m           96Mi
mastodon-web-774c5c94f9-f5bhz                  22m          418Mi
```

Hence we make the following adjustments to Bitnami defaults:

- `mastodon-elasticsearch-coordinating`: `256Mi->512Mi`
- `mastodon-elasticsearch-data`: the default `2048Mi` is ok.
- `mastodon-elasticsearch-master`: `256Mi->512Mi`
- `mastodon-redis-master`: `0->56Mi`
- `mastodon-redis-replicas`: `0->56Mi`
- `mastodon-postgresql`: `256Mi->384Mi`

And for Mastodon defaults:

- `mastodon-sidekiq-all-queues`: `0->512Mi`
- `mastodon-streaming`: `0->128Mi`
- `mastodon-web`: `0->512Mi`

The original idea of keeping these requests at zero is a good default when
the minimal requirements are unknown. However, a single-user instance gives
us those minimal requirements, and leaving the requests at zero only leads
to trouble for people.
Of course the system requirements will change over time, but they
are chiefly expected to go upwards.
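
As an illustration, the requests above end up as the usual Kubernetes `resources.requests` fields in `values.yaml`. A minimal sketch only; the key paths here are assumptions for illustration, not necessarily the chart's actual structure:

```yaml
# Sketch only: key paths are illustrative, check the chart's values.yaml for the real ones.
elasticsearch:
  coordinating:
    resources:
      requests:
        memory: 512Mi
  master:
    resources:
      requests:
        memory: 512Mi
redis:
  master:
    resources:
      requests:
        memory: 56Mi
postgresql:
  resources:
    requests:
      memory: 384Mi
mastodon:
  web:
    resources:
      requests:
        memory: 512Mi
  sidekiq:
    resources:
      requests:
        memory: 512Mi
  streaming:
    resources:
      requests:
        memory: 128Mi
```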

@keskival
Author

I realize this is an opinionated change, so feel free to close it if you disagree with the rationale.
I am in the process of testing this; putting it here already for transparency.

@keskival
Author

keskival commented Dec 10, 2022

This has been tested as far as template generation goes. The template is generated as one would expect.

Confirmed working in a cluster as well.

@keskival force-pushed the janitorial/updated-memory-requests-to-be-more-realistic branch 2 times, most recently from bcd7bca to a089c04 on December 11, 2022 00:01
@deepy
Contributor

deepy commented Dec 12, 2022

Single-user instance here as well, but I'm seeing 80Mi for redis, and my sidekiq usage sits right around 700MiB.

I think adding these to values.yaml as documentation (commented-out) is a good idea though

@keskival
Author

@deepy, thanks for the added validation! Commented-out suggestions would be fine as well. However, in my opinion these would be better as default values than the zeros they are now.

Requests denote the minimum required for a pod to be scheduled on a node, so the values in this PR would be better than having no requests set, for any instance, even yours with slightly higher usage. Of course you might still want to tune them up a bit in your cluster.

Limits are not set, so the pods can take as much memory as they want. The suggested requests don't affect that.
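
To make that distinction concrete, here is roughly what a container's resources stanza looks like with a request but no limit; this is a generic Kubernetes example, not the chart's exact rendered output:

```yaml
resources:
  requests:
    memory: 512Mi   # used only for scheduling: the node must have this much allocatable memory free
  # no "limits" key, so the container may use more memory than it requested,
  # bounded only by what the node can provide
```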

@keskival force-pushed the janitorial/updated-memory-requests-to-be-more-realistic branch from 3647e87 to f43f1ca on December 17, 2022 12:08
@renchap
Sponsor Member

renchap commented Feb 17, 2023

Thanks for this, better defaults would definitely be a good idea.

FYI, here is our current usage for mastodon.online:

mastodon-web: 1300M
mastodon-sidekiq-default-push-ingress: 900M
mastodon-sidekiq-pull: 1000M
mastodon-streaming: 200M

Could you rebase your PR to latest main?

Also, do you have an opinion about adding default limits as well, at least for Mastodon processes?

Something like 300M for streaming, 2000M for web, 1500M for Sidekiq? I feel like it's a good idea to have some limits to avoid long-running memory leaks.
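
As a sketch of how those suggested limits would sit next to the requests from this PR (the numbers are the ones proposed above for the web process; the stanza is a generic resources block, not the chart's exact keys):

```yaml
# Web container example; streaming and sidekiq would follow the same pattern
# (e.g. 128Mi request / 300Mi limit for streaming, 512Mi / 1500Mi for sidekiq).
resources:
  requests:
    memory: 512Mi    # request proposed in this PR
  limits:
    memory: 2000Mi   # suggested cap; the container is OOM-killed and restarted if it grows past this
```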

@keskival force-pushed the janitorial/updated-memory-requests-to-be-more-realistic branch from f43f1ca to a7419b5 on February 23, 2023 18:34
@keskival
Author

keskival commented Feb 23, 2023

> Also, do you have an opinion about adding default limits as well, at least for Mastodon processes?

Rebased!

I would be hesitant to add limits unless they are really high; they could cause sudden problems for people. For example, my current numbers are:

```
tero@betanzos:~$ kubectl top pods -n mastodon
NAME                                           CPU(cores)   MEMORY(bytes)
mastodon-elasticsearch-coordinating-0          9m           376Mi
mastodon-elasticsearch-coordinating-1          6m           217Mi
mastodon-elasticsearch-coordinating-2          10m          185Mi
mastodon-elasticsearch-data-0                  7m           1302Mi
mastodon-elasticsearch-data-1                  12m          733Mi
mastodon-elasticsearch-data-2                  9m           1000Mi
mastodon-elasticsearch-ingest-0                6m           357Mi
mastodon-elasticsearch-ingest-1                10m          244Mi
mastodon-elasticsearch-ingest-2                15m          190Mi
mastodon-elasticsearch-master-0                12m          223Mi
mastodon-elasticsearch-master-1                61m          436Mi
mastodon-elasticsearch-master-2                16m          280Mi
mastodon-postgresql-0                          26m          1551Mi
mastodon-redis-master-0                        26m          128Mi
mastodon-redis-replicas-0                      17m          129Mi
mastodon-redis-replicas-1                      16m          135Mi
mastodon-redis-replicas-2                      14m          129Mi
mastodon-sidekiq-all-queues-7cdbd75cdd-99mp7   545m         2487Mi
mastodon-streaming-58f74f74c4-vwldv            1m           82Mi
mastodon-web-948bd9cc-xxr6h                    51m          4045Mi
```

The instance is rukii.net and has 6 users now.
The current web pod memory use is probably high because I'm running all sorts of scheduled tootctl scripts there now. It's also possible I have a long-running memory leak there.

@renchap
Sponsor Member

renchap commented Feb 23, 2023

We know that we have a memory leak in the ingress queue at the moment, at least; this is one of the reasons I am suggesting having memory limits in place by default, to correctly restart pods with abnormal memory usage.

I should be able to give you more data from mastodon.social soon. I guess if we base the limits on that instance's usage, then everybody else should be fine :)

@keskival
Author

Can we add the limits in a separate PR? I think they require a separate discussion, and that way they can be closed or reverted in isolation.

@keskival
Author

keskival commented Feb 25, 2023

I just tested setting the memory limit on the web and sidekiq pods to 2048MB. It seems to work so far, but it might cause problems when running heavy tootctl commands like refresh on those same pods.
Newly started pods take much less memory than ones that have been running for weeks: they are under 512MB now, where they were many gigabytes after running for weeks.

@abbottmg
Contributor

abbottmg commented Oct 7, 2023

I agree that setting limits in a separate PR is worthwhile. I strongly second @renchap's opinion that limits should skew low as a counter to memory leaks within both web and sidekiq. The ingress leak was a big motivating factor in my shift from a minimal docker-compose stack to a fuller k8s deployment. I also don't think 4Gi is a reasonable memory footprint for a server of 6 users, regardless of uptime. It's been rare for my ~15-user instance to top 700Mi at peak hours, even when we participated in a couple of large relays. That points to some potential leaks IMHO, but that's neither here nor there.

To @keskival's point, I think that just goes to show there's more discussion to be had there. I also think a separate issue/PR will give scope to add HPAs to our toolbox here. With a memoryUtilization target in a sweet spot between request and limit, the autoscaler can spin up a fresh instance in time to create a rolling replacement with no downtime.
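
For reference, a memory-based HPA along those lines could look like the sketch below. It uses the standard autoscaling/v2 API; the Deployment name, replica bounds, and target percentage are placeholder assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mastodon-sidekiq            # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mastodon-sidekiq          # placeholder: the sidekiq Deployment
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80    # percent of the memory request, chosen to sit between request and limit
```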

The sidekiq pods obviously lend themselves well to autoscaling, but I've also been running two web pods in parallel without issue. I know the admin at infosec.exchange needed to implement sessionAffinity because their S3 upload was really slow and multipart uploads were getting split between web pods. I haven't run into that problem, but it appears to be a minor hurdle anyway.
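
For anyone who does hit that multipart-upload issue, the workaround mentioned is plain Kubernetes Service session affinity; a minimal sketch with placeholder names, selector, and ports:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mastodon-web                  # placeholder
spec:
  selector:
    app.kubernetes.io/component: web  # placeholder selector
  sessionAffinity: ClientIP           # pin each client IP to a single web pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800           # affinity window (Kubernetes default is 3 hours)
  ports:
    - port: 3000
      targetPort: 3000
```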
