
Added memory requests to be more realistic. #14

Open: keskival wants to merge 2 commits into main from janitorial/updated-memory-requests-to-be-more-realistic

Conversation

@keskival commented Dec 10, 2022

This makes Kubernetes make better choices about where
to schedule the pods, and communicates to the administrators
about the minimum sensible resource requirements.

On a single-user Mastodon instance on a three-node Kubernetes cluster, after a week or so of use, we see the following memory usage per pod:

```
tero@arcones:~$ kubectl top pods -n mastodon
NAME                                           CPU(cores)   MEMORY(bytes)
mastodon-elasticsearch-coordinating-0          6m           403Mi
mastodon-elasticsearch-coordinating-1          28m          189Mi
mastodon-elasticsearch-data-0                  10m          1432Mi
mastodon-elasticsearch-data-1                  5m           1513Mi
mastodon-elasticsearch-ingest-0                6m           418Mi
mastodon-elasticsearch-ingest-1                6m           396Mi
mastodon-elasticsearch-master-0                24m          466Mi
mastodon-elasticsearch-master-1                10m          221Mi
mastodon-postgresql-0                          12m          276Mi
mastodon-redis-master-0                        16m          37Mi
mastodon-redis-replicas-0                      7m           34Mi
mastodon-sidekiq-all-queues-549b4bb7b4-zvj2m   266m         499Mi
mastodon-streaming-78465f778d-6xfg2            1m           96Mi
mastodon-web-774c5c94f9-f5bhz                  22m          418Mi
```

Hence we make the following adjustments to Bitnami defaults:

- `mastodon-elasticsearch-coordinating`: `256Mi->512Mi`
- `mastodon-elasticsearch-data`: the default `2048Mi` is ok.
- `mastodon-elasticsearch-master`: `256Mi->512Mi`
- `mastodon-redis-master`: `0->56Mi`
- `mastodon-redis-replicas`: `0->56Mi`
- `mastodon-postgresql`: `256Mi->384Mi`

And for Mastodon defaults:

- `mastodon-sidekiq-all-queues`: `0->512Mi`
- `mastodon-streaming`: `0->128Mi`
- `mastodon-web`: `0->512Mi`

The original idea of keeping these requests at zero is a good default when
the minimal requirements are unknown. However, a single-user instance gives
us those minimal requirements, and leaving the requests at zero only leads
to trouble for people.
Of course the system requirements will change over time, but they
are chiefly expected to go upwards.
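
As an illustration, the requests above end up as the usual Kubernetes `resources.requests` fields in `values.yaml`. A minimal sketch only; the key paths here are assumptions for illustration, not necessarily the chart's actual structure:

```yaml
# Sketch only: key paths are illustrative, check the chart's values.yaml for the real ones.
elasticsearch:
  coordinating:
    resources:
      requests:
        memory: 512Mi
  master:
    resources:
      requests:
        memory: 512Mi
redis:
  master:
    resources:
      requests:
        memory: 56Mi
postgresql:
  resources:
    requests:
      memory: 384Mi
mastodon:
  web:
    resources:
      requests:
        memory: 512Mi
  sidekiq:
    resources:
      requests:
        memory: 512Mi
  streaming:
    resources:
      requests:
        memory: 128Mi
```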

@keskival
Author

I realize this is an opinionated change, so feel free to close it if you disagree with the rationale.
I am in the process of testing this; putting it here already for transparency.

@keskival
Author

keskival commented Dec 10, 2022

This has been tested as far as template generation goes. The template is generated as one would expect.

Confirmed working in a cluster as well.

@keskival force-pushed the janitorial/updated-memory-requests-to-be-more-realistic branch 2 times, most recently from bcd7bca to a089c04 on December 11, 2022 00:01
@deepy
Contributor

deepy commented Dec 12, 2022

Single-user instance here as well, but I'm seeing 80Mi for redis, and my sidekiq usage sits right around 700MiB.

I think adding these to values.yaml as documentation (commented-out) is a good idea though

@keskival
Author

@deepy, thanks for the added validation! Commented-out suggestions would be fine as well. However, in my opinion these would be better as default values than the zeros they are now.

Requests denote the minimum required for a pod to be scheduled on a node, so the values in this PR would be better than having no requests set, for any instance, even yours with slightly higher usage. Of course you might still want to tune them up a bit in your cluster.

Limits are not set, so the pods can take as much memory as they want. The suggested requests don't affect that.
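
To make that distinction concrete, here is roughly what a container's resources stanza looks like with a request but no limit; this is a generic Kubernetes example, not the chart's exact rendered output:

```yaml
resources:
  requests:
    memory: 512Mi   # used only for scheduling: the node must have this much allocatable memory free
  # no "limits" key, so the container may use more memory than it requested,
  # bounded only by what the node can provide
```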

@keskival force-pushed the janitorial/updated-memory-requests-to-be-more-realistic branch from 3647e87 to f43f1ca on December 17, 2022 12:08
@renchap
Sponsor Member

renchap commented Feb 17, 2023

Thanks for this, better defaults would definitely be a good idea.

FYI, here is our current usage for mastodon.online:

mastodon-web: 1300M
mastodon-sidekiq-default-push-ingress: 900M
mastodon-sidekiq-pull: 1000M
mastodon-streaming: 200M

Could you rebase your PR to latest main?

Also, do you have an opinion about adding default limits as well, at least for Mastodon processes?

Something like 300M for streaming, 2000M for web, 1500M for Sidekiq? I feel like it's a good idea to have some limits to avoid long-running memory leaks.
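
As a sketch of how those suggested limits would sit next to the requests from this PR (the numbers are the ones proposed above for the web process; the stanza is a generic resources block, not the chart's exact keys):

```yaml
# Web container example; streaming and sidekiq would follow the same pattern
# (e.g. 128Mi request / 300Mi limit for streaming, 512Mi / 1500Mi for sidekiq).
resources:
  requests:
    memory: 512Mi    # request proposed in this PR
  limits:
    memory: 2000Mi   # suggested cap; the container is OOM-killed and restarted if it grows past this
```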

@keskival force-pushed the janitorial/updated-memory-requests-to-be-more-realistic branch from f43f1ca to a7419b5 on February 23, 2023 18:34
@keskival
Author

keskival commented Feb 23, 2023

> Also, do you have an opinion about adding default limits as well, at least for Mastodon processes?

Rebased!

I would be hesitant to add limits unless they are really high; they could cause sudden problems for people. For example, my current numbers are:

```
tero@betanzos:~$ kubectl top pods -n mastodon
NAME                                           CPU(cores)   MEMORY(bytes)
mastodon-elasticsearch-coordinating-0          9m           376Mi
mastodon-elasticsearch-coordinating-1          6m           217Mi
mastodon-elasticsearch-coordinating-2          10m          185Mi
mastodon-elasticsearch-data-0                  7m           1302Mi
mastodon-elasticsearch-data-1                  12m          733Mi
mastodon-elasticsearch-data-2                  9m           1000Mi
mastodon-elasticsearch-ingest-0                6m           357Mi
mastodon-elasticsearch-ingest-1                10m          244Mi
mastodon-elasticsearch-ingest-2                15m          190Mi
mastodon-elasticsearch-master-0                12m          223Mi
mastodon-elasticsearch-master-1                61m          436Mi
mastodon-elasticsearch-master-2                16m          280Mi
mastodon-postgresql-0                          26m          1551Mi
mastodon-redis-master-0                        26m          128Mi
mastodon-redis-replicas-0                      17m          129Mi
mastodon-redis-replicas-1                      16m          135Mi
mastodon-redis-replicas-2                      14m          129Mi
mastodon-sidekiq-all-queues-7cdbd75cdd-99mp7   545m         2487Mi
mastodon-streaming-58f74f74c4-vwldv            1m           82Mi
mastodon-web-948bd9cc-xxr6h                    51m          4045Mi
```

The instance is rukii.net and has 6 users now.
The current web pod memory use is probably high because I'm running all sorts of scheduled tootctl scripts there now. It's also possible I have a long-running memory leak there.

@renchap
Sponsor Member

renchap commented Feb 23, 2023

We know that we have a memory leak in the ingress queue at the moment, at least; this is one of the reasons I am suggesting having memory limits in place by default, to correctly restart pods with abnormal memory usage.

I should be able to give you more data from mastodon.social soon. I guess if we base the limits on that instance's usage, then everybody else should be fine :)

@keskival
Author

Can we add the limits in a separate PR? I think they require a separate discussion, and that way they can be closed or reverted in isolation.

@keskival
Author

keskival commented Feb 25, 2023

I just tested setting the memory limit on the web and sidekiq pods to 2048MB. It seems to work so far, but it might cause problems when running heavy tootctl commands like refresh on those same pods.
Newly started pods take much less memory than ones that have been running for weeks: they are under 512MB now, where they were many gigabytes after running for weeks.

@abbottmg
Contributor

abbottmg commented Oct 7, 2023

I agree that setting limits in a separate PR is worthwhile. I strongly second @renchap's opinion that limits should skew low as a counter to memory leaks within both web and sidekiq. The ingress leak was a big motivating factor in my shift from a minimal docker-compose stack to a fuller k8s deployment. I also don't think 4Gi is a reasonable memory footprint for a server of 6 users, regardless of uptime. It's been rare for my ~15-user instance to top 700Mi at peak hours, even when we participated in a couple of large relays. That points to some potential leaks IMHO, but that's neither here nor there.

To @keskival's point, I think that just goes to show there's more discussion to be had there. I also think a separate issue/PR will give scope to add HPAs to our toolbox here. With a memoryUtilization target in a sweet spot between request and limit, the autoscaler can spin up a fresh instance in time to create a rolling replacement with no downtime.
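
For reference, a memory-based HPA along those lines could look like the sketch below. It uses the standard autoscaling/v2 API; the Deployment name, replica bounds, and target percentage are placeholder assumptions:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mastodon-sidekiq            # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mastodon-sidekiq          # placeholder: the sidekiq Deployment
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80    # percent of the memory request, chosen to sit between request and limit
```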

The sidekiq pods obviously lend themselves well to autoscaling, but I've also been running two web pods in parallel without issue. I know the admin at infosec.exchange needed to implement sessionAffinity because their S3 upload was really slow and multipart uploads were getting split between web pods. I haven't run into that problem, but it appears to be a minor hurdle anyway.
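
For anyone who does hit that multipart-upload issue, the workaround mentioned is plain Kubernetes Service session affinity; a minimal sketch with placeholder names, selector, and ports:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mastodon-web                  # placeholder
spec:
  selector:
    app.kubernetes.io/component: web  # placeholder selector
  sessionAffinity: ClientIP           # pin each client IP to a single web pod
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800           # affinity window (Kubernetes default is 3 hours)
  ports:
    - port: 3000
      targetPort: 3000
```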
