On multi-node Kubernetes, the default settings on ReadWriteOnce without pod affinity are non-functional #17

keskival · 2022-12-06T13:34:14Z

Steps to reproduce the problem

Install Mastodon from the Helm chart to a multi-node Kubernetes cluster with an NFS storage class.
If the mastodon-web and mastodon-sidekiq-all-queues pods end up on different nodes, some of them will hang indefinitely in "ContainerCreating".

They are waiting to mount the persistent volumes system and assets. With the default ReadWriteOnce access mode, these can only be mounted on a single node at a time.

Expected behaviour

Everything should work on roughly default settings.

Actual behaviour

The pods hang in the ContainerCreating state, and the cause is difficult to work out.

Detailed description

The default settings are non-functional on multi-node clusters. Either there needs to be a clearer comment warning users to set pod affinities, the default access mode should be ReadWriteMany, or a pod affinity should be defined by default that places these two kinds of pods on the same node.

Specifications

Mastodon: edge
OS: Ubuntu
Kubernetes: MicroK8S
Nodes: 2+

keskival · 2022-12-07T23:34:30Z

The same problem also extends to the Job mastodon-db-migrate, for which there doesn't seem to be a separate place to set nodeAffinity in values.yaml.

However, for that Job the Helm chart includes logic to set podAffinity so that it is co-located with app.kubernetes.io/part-of=rails:
https://github.com/mastodon/mastodon/blob/ed07f10ca8d4e65ec58958f300a8bb7c762ccbbd/chart/templates/job-db-migrate.yaml#L22-L35

A similar setting should be added to the sidekiq and mastodon-web deployments as well, so that they co-locate with each other if ReadWriteOnce is set.
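
For illustration, a pod affinity of the kind described above could look roughly like the following fragment of a Deployment's pod spec. This is only a sketch using the standard Kubernetes affinity fields. It assumes the web and sidekiq pods carry the app.kubernetes.io/part-of=rails label mentioned above, and it is not taken from the chart's templates.

```yaml
# Sketch only: goes under spec.template.spec of a Deployment.
# Assumes the target pods are labeled app.kubernetes.io/part-of=rails.
affinity:
  podAffinity:
    # Hard requirement: schedule only onto a node that already runs a matching pod.
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app.kubernetes.io/part-of
              operator: In
              values:
                - rails
        # Co-location is evaluated per node (by hostname).
        topologyKey: kubernetes.io/hostname
```

With a required rule like this, all matching pods end up on one node, so the ReadWriteOnce volumes only ever need to be attached to that node.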

keskival · 2022-12-10T16:46:50Z

Added an in-progress PR here: #13

WilyWildWilly · 2023-12-22T07:10:43Z

Hi, have you tried setting the persistence access mode to ReadWriteMany? I ask because I'm setting up a single-node cluster for now but will move to multi-node later, and I'd like to avoid running into this pitfall. I also don't know whether ReadWriteMany works when multiple Sidekiq and Rails pods end up on different nodes, as happened to you.

keskival · 2024-01-16T13:17:07Z

ReadWriteMany works, but of course the storage class has to support it. Alternatively you can force the pods to co-locate, which kind of defeats the point of having a multi-node cluster in the first place.
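
As a rough sketch, switching both volumes to ReadWriteMany in values.yaml might look like this. The key paths below are an assumption about the chart's values layout and should be checked against the values.yaml of the chart version in use. The storage class backing the volumes must support ReadWriteMany.

```yaml
# Sketch only: key names are assumed, verify against the chart's values.yaml.
mastodon:
  persistence:
    assets:
      accessMode: ReadWriteMany
    system:
      accessMode: ReadWriteMany
```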

keskival added the bug label Dec 6, 2022

keskival mentioned this issue Dec 10, 2022

Added pod affinity if the default ReadWriteOnce is used. #13

Open

ineffyble transferred this issue from mastodon/mastodon Dec 13, 2022
