Upgrade flow for replicated setups #2630
Replies: 2 comments 3 replies
-
At the moment, nodes simply refuse to replicate between disparate versions, so that's fairly easy. @Firstyear was talking about a future path, however, with out-of-sync version upgrades, which is partially helped by the upgrade-check framework we're currently putting in place.
-
These steps are all fine. The big point here is that currently, once you restart the primary, it will refuse to sync with the secondary until you also upgrade the secondary. Then they will re-sync and continue. As @yaleman said, for the moment we require identical versions on all nodes. Replication will automatically tell you if things get out of sync; Kanidm has internal checks for this.

In the future the process would be similar, but version N could sync with N minus 1, while only operating at a functional level (aka domain level) of N minus 1. Once the upgrades are complete, you would signal the server to raise the domain level, which would then "replicate out" to all nodes that they can raise their behaviour to version N. This way you can do a rollout over multiple nodes and, once complete, flag that the newer version's features can be used, so an older node never receives changes it can't understand.

However, achieving that requires a lot more testing. While we have some of that in place now, and while all the frameworks and tools are there, we don't want to support this yet because it adds extra risk for a still relatively young feature.
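The gating described above can be sketched roughly as follows. This is a minimal model for illustration only, not Kanidm's actual code; `Version`, `Node`, and the method names are all hypothetical:

```rust
// Minimal model of version-gated replication as described above.
// All names here are hypothetical illustrations, not Kanidm's real API.

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version(u32);

struct Node {
    binary_version: Version, // version of the installed server binary
    domain_level: Version,   // functional level the domain operates at
}

impl Node {
    /// Current rule: peers must run identical versions to replicate.
    fn can_replicate_current(&self, peer: &Node) -> bool {
        self.binary_version == peer.binary_version
    }

    /// Possible future rule: version N may sync with N minus 1, while
    /// the domain keeps operating at the lower functional level.
    fn can_replicate_future(&self, peer: &Node) -> bool {
        self.binary_version.0.abs_diff(peer.binary_version.0) <= 1
    }

    /// Raising the domain level is only safe once the rollout is done,
    /// so the level is capped at the lowest binary version present.
    fn raise_domain_level(nodes: &mut [Node]) {
        if let Some(min) = nodes.iter().map(|n| n.binary_version).min() {
            for n in nodes.iter_mut() {
                n.domain_level = min;
            }
        }
    }
}

fn main() {
    let old = Node { binary_version: Version(9), domain_level: Version(9) };
    let upgraded = Node { binary_version: Version(10), domain_level: Version(9) };
    // Today: identical versions are required, so this pair refuses to sync.
    println!("current rule allows sync: {}", old.can_replicate_current(&upgraded));
    // Future: N could sync with N minus 1, still at domain level N minus 1.
    println!("future rule allows sync:  {}", old.can_replicate_future(&upgraded));
}
```

The key design point mirrors the comment above: an admin explicitly signals the level raise after all binaries are upgraded, rather than any single node raising it unilaterally.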
-
I'm about to introduce replication to my production setup and have played with it in a staging environment first. While everything seems to work fine, I was wondering how I would execute future upgrades. The docs currently don't mention that part, unless I missed something.
I have the primary-secondary setup (active-passive) as described in the docs, so my primary node is always right in case of conflicts, and the secondary only takes over if the primary goes down. In front of them I have HAProxy managing the failover.
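For reference, an active-passive setup like this can be expressed in HAProxy with a `backup` server line. This is only a sketch with placeholder hostnames and ports, not the actual config from this setup:

```
# Sketch of an active-passive HAProxy TCP frontend for two Kanidm nodes.
# Hostnames, ports, and check settings are placeholders.
frontend kanidm_in
    bind *:8443
    mode tcp
    default_backend kanidm_nodes

backend kanidm_nodes
    mode tcp
    option tcp-check
    # Primary takes all traffic; the secondary only serves
    # while the primary's health check is failing.
    server primary   idm1.example.com:8443 check
    server secondary idm2.example.com:8443 check backup
```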
My first thought was that I could just nuke my secondary and act like there's no replication, to keep things simple for the upgrade. However, one point of replication is higher availability, so while this works (and might be sufficient for anyone who can just schedule a maintenance window), it's not as nice as a rolling upgrade.
Now let's assume I want to be as available as possible. What would be the recommended flow/order to upgrade the cluster? I could imagine it's something like:
Looking forward to getting some insights. :)