
On multi-node Kubernetes, the default settings on ReadWriteOnce without pod affinity are non-functional #17

Open
keskival opened this issue Dec 6, 2022 · 4 comments
Labels
bug Something isn't working


keskival commented Dec 6, 2022

Steps to reproduce the problem

  1. Install Mastodon from the Helm chart to a multi-node Kubernetes cluster with an NFS storage class.
  2. If the mastodon-web and mastodon-sidekiq-all-queues pods are scheduled onto different nodes, some of them hang indefinitely in "ContainerCreating".

They are waiting to mount the persistent volumes "system" and "assets". With the default ReadWriteOnce access mode, these can only be mounted on a single node at a time.

Expected behaviour

Everything should work with roughly default settings.

Actual behaviour

The pods hang in the ContainerCreating state, and the cause is difficult to diagnose.

Detailed description

The default settings are non-functional on multi-node clusters. One of three things is needed: a clearer comment warning users to set pod affinities, a default access mode of ReadWriteMany, or a default pod affinity that places these two kinds of pods on the same node.
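
As a rough sketch of the ReadWriteMany option, a values override along these lines would request that access mode for both volumes. The key layout (mastodon.persistence.assets.accessMode and mastodon.persistence.system.accessMode) is an assumption about the chart's values.yaml and should be checked against the chart version in use. The storage class must also support ReadWriteMany.

```yaml
# Hypothetical values override; the key names are an assumption about the chart.
mastodon:
  persistence:
    assets:
      # Allow the volume to be mounted read-write from several nodes at once.
      accessMode: ReadWriteMany
    system:
      accessMode: ReadWriteMany
```

An existing PersistentVolumeClaim generally cannot have its access mode changed in place, so on an already-installed release the claims would likely need to be recreated.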

Specifications

Mastodon: edge
OS: Ubuntu
Kubernetes: MicroK8S
Nodes: 2+

keskival added the bug label Dec 6, 2022

keskival commented Dec 7, 2022

The same problem also affects the Job mastodon-db-migrate, for which there doesn't seem to be a separate place to set nodeAffinity in values.yaml.

However, for that Job the Helm chart already includes a template that sets podAffinity to co-locate it with pods labelled app.kubernetes.io/part-of=rails:
https://github.com/mastodon/mastodon/blob/ed07f10ca8d4e65ec58958f300a8bb7c762ccbbd/chart/templates/job-db-migrate.yaml#L22-L35

A similar setting should be added to the sidekiq and mastodon-web deployments as well, so that they co-locate with each other when ReadWriteOnce is set.
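
For illustration, a pod-spec fragment of roughly this shape would express that co-location using the standard Kubernetes podAffinity fields. The label key and value come from the Job template linked above. Where the fragment would sit in the deployment templates, and the choice of kubernetes.io/hostname as the topology key, are assumptions rather than what the chart actually does.

```yaml
# Sketch of a pod spec fragment; placement within the chart templates is assumed.
affinity:
  podAffinity:
    # Hard requirement: schedule only onto a node that already runs a matching pod.
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
            - key: app.kubernetes.io/part-of
              operator: In
              values:
                - rails
        # "Same node" is expressed by using the hostname as the topology domain.
        topologyKey: kubernetes.io/hostname
```

A required rule like this matches other pods, not the pod itself, so the first pod to be scheduled can fail to find a match on an empty cluster. Using preferredDuringSchedulingIgnoredDuringExecution instead would avoid that, at the cost of no longer guaranteeing co-location.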

@keskival

Added an in-progress PR here: #13

ineffyble transferred this issue from mastodon/mastodon Dec 13, 2022
@WilyWildWilly

Hi, have you tried setting the persistence access mode to ReadWriteMany? I ask because I'm setting up a single-node cluster for now but will move to multi-node later, and I'd like to avoid this pitfall. I don't know whether ReadWriteMany would allow the Sidekiq and Rails pods to run on different nodes, as happened in your case.


keskival commented Jan 16, 2024

ReadWriteMany works, but of course it requires support from the storage class. Alternatively, you can force the pods to co-locate, which largely defeats the point of having a multi-node cluster in the first place.
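
One way to probe whether a storage class accepts ReadWriteMany before changing the chart settings is a small throwaway claim like the sketch below. The claim name rwx-probe and the storage class name nfs-client are hypothetical placeholders, not names from this thread.

```yaml
# Throwaway claim for checking ReadWriteMany support; delete it after the check.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rwx-probe                # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-client   # hypothetical; replace with your storage class
  resources:
    requests:
      storage: 1Gi
```

If the claim stays Pending with a provisioning error, the storage class probably does not support ReadWriteMany. A Bound claim is a good sign but not proof, and a class using the WaitForFirstConsumer binding mode will keep the claim Pending until a pod mounts it.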
