Cluster upgrade not working as expected #10786
-
**Describe the bug**

During the rolling upgrade of the nodes of our cluster, some operations did not work as expected.
Next, we used the management console to check the state of the queues, and most of them were in an inconsistent state: some appeared as not running when viewed from the initial cluster nodes (A, B, C), while the other nodes (D, E, F) saw the same queues as running. When we analyzed these queues with `rabbitmq-queues quorum_status`, we found that some of them had three quorum nodes that were all followers, with no leader! Other queues reported a `noproc` Raft state on the old nodes (A, B, C), in a non-deterministic way.

After we brought the cluster into a consistent state, where every queue had the new nodes in its quorum, we tried to forget nodes A, B and C. First we shrank and forgot node A, but the shrink gave timeout errors on a few queues, and manual `delete_member` calls did not help either; the cluster state was unknown to us. Once we manually adjusted the queue membership, we were able to forget the node without errors. When we shrank the last node (again getting errors on some queues), we at first had a "clean" state, where all queues had the new nodes (D, E, F) as members. But when we forgot the last node, almost every queue went down: every queue still had node C as a quorum member, the node we had asked to forget earlier (and for which we had received an OK response). At this point we had the major service disruption.

All of these errors left us with a totally inconsistent cluster state and most of the queues down, with no idea why, and we had to terminate the cluster in a drastic way, losing a lot of (important) data. It is important for us to understand why this happened and how to keep it from happening again; any tips, information or suggestions would be very valuable, and I will reply in this issue with any information requested. I am not sure whether this was caused by some sort of bug or by something else, but I am sure that the cluster should perform some kind of check to avoid entering this type of state.

**Reproduction steps**

These are the operations done on node D, the node joining the cluster:
**Expected behavior**

We should have a safer rolling upgrade procedure, to reduce the probability of an inconsistent cluster state.

**Additional context**

No response
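For reference, here is a minimal sketch of the inspection and membership commands mentioned in the report above; the queue name, vhost and node names are placeholders, not the reporter's actual values.

```bash
# Inspect the Raft membership and state of a quorum queue
# (this is what revealed the leaderless / noproc replicas above).
rabbitmq-queues quorum_status --vhost "/" "my-queue"

# Manually remove a replica from a queue's membership, then remove
# the node from the cluster (run on a remaining cluster member).
rabbitmq-queues delete_member --vhost "/" "my-queue" rabbit@A
rabbitmqctl forget_cluster_node rabbit@A
```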
-
Resetting a node was never recommended as an upgrade step. Performing it during an upgrade is wrong and specifically counterproductive with quorum queues, streams and Khepri (all Raft-based features). Try finding a single recommendation to use `rabbitmqctl reset`.

If you have reasons to "upgrade" by throwing away data, you can use a greatly simplified variation of the Blue/Green deployment where all you do is export and re-import definitions, and otherwise form a brand new cluster.

The only place where …

The claim that you did not want to rebuild the cluster and lose any data absolutely does not add up with the aforementioned upgrade procedure that explicitly resets nodes.
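A minimal sketch of that definitions-only variant, assuming `rabbitmqctl` is available and using a placeholder file path:

```bash
# On the old cluster: export all definitions (users, vhosts, queues,
# exchanges, bindings, policies) to a JSON file.
rabbitmqctl export_definitions /tmp/definitions.json

# Form a brand new cluster on the new version, then on one of its nodes:
rabbitmqctl import_definitions /tmp/definitions.json
```

Note that definitions cover topology and metadata only; messages are not exported, which is why this variant only applies when throwing away data is acceptable.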
-
There are a couple more things that perhaps I should clarify:
-
Another important note: as of 3.13.0, …
-
Thank you Michael for your reply, it's very important for us.
-
Our team is grateful for the upgrade strategy advice, but we would like to know if there is a known method to repair the current situation, which is the following:
-
Our Upgrade guide now has a new section dedicated to this upgrade strategy. For lack of a better-established term, I've named it "grow-then-shrink". Some tools use the term "surge upgrades" but I find that less descriptive.
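As a rough illustration of the strategy (not the guide's exact steps; node names are placeholders, and `rabbit@A` stands for any existing cluster member):

```bash
# Grow: add a new, blank node to the cluster...
rabbitmqctl -n rabbit@D stop_app
rabbitmqctl -n rabbit@D join_cluster rabbit@A
rabbitmqctl -n rabbit@D start_app

# ...and place quorum queue replicas on it.
rabbitmq-queues grow rabbit@D all

# Shrink: once replicas are in sync, move membership off an old node
# first, and only then remove it from the cluster.
rabbitmq-queues shrink rabbit@A
rabbitmqctl forget_cluster_node rabbit@A
```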
We cannot suggest much without having logs from all nodes, but there is one known scenario: if `rabbitmq-diagnostics check_if_node_is_quorum_critical` is ignored in the process, you can end up with fewer than a majority of replicas online. A booting node is technically online according to some metrics but it does not necessarily have sta…
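One way to guard against that scenario, sketched under the assumption that your rolling-restart tooling can run a shell command between nodes (node names and the restart step are placeholders):

```bash
# Gate each node's restart on the quorum-safety check: it exits
# non-zero if stopping the target node would leave any quorum queue
# without an online majority.
for node in rabbit@A rabbit@B rabbit@C; do
  rabbitmq-diagnostics -n "$node" check_if_node_is_quorum_critical || exit 1
  # ... restart "$node" here, then wait for it to fully boot and for
  # its replicas to catch up before moving on to the next node ...
done
```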