You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We noticed that a graceful-takeover command can fail sometimes, not promoting a new master and giving the following message: "GracefulMasterTakeover: unexpected error: recovery not attempted. This should not happen"
If we turn off RecoverNonWriteableMaster flag, it works.
If we disable global-recoveries before the graceful-takeover, it works too.
It does not fail all the time, it's unpredictable.
It looks like we have a race condition when the NoWriteableMasterStructureWarning is triggered, because GracefulMasterTakeover start by setting the master as read_only.
Aug 17 14:18:15 db2 orchestrator: [martini] Started GET /api/graceful-master-takeover-auto/slvcluster for 10.1.1.16:63558
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO Started replication on db2:3306
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO GracefulMasterTakeover: designated master deduced to be db2:3306
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO GracefulMasterTakeover: Will let db2:3306 take over its siblings
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO moveReplicasViaGTID: Will move 1 replicas below db2:3306 via GTID, max concurrency: 5
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO Will move db3:3306 below db2:3306 via GTID
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO auditType:begin-maintenance instance:db3:3306 cluster:db1:3306 message:maintenanceToken: 10, owner: db2, reason: move below db2:3306
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO Stopped replication on db3:3306, Self:3-mysql-bin.000309:8972015, Exec:1-mysql-bin.000634:9446952
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO ChangeMasterTo: Changed master on db3:3306 to: db2:3306, 2-mysql-bin.000310:9349152. GTID: true
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO Started replication on db3:3306
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO auditType:move-below-gtid instance:db3:3306 cluster:db1:3306 message:moved db3:3306 below db2:3306
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO auditType:end-maintenance instance:db3:3306 cluster:db1:3306 message:maintenanceToken: 10
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO auditType:move-replicas-gtid instance:db2:3306 cluster:db1:3306 message:moved 1/1 replicas below db2:3306 via GTID
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO auditType:relocate-replicas instance:db1:3306 cluster:db1:3306 message:relocated 1 replicas of db1:3306 below db2:3306
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO GracefulMasterTakeover: Will demote db1:3306 and promote db2:3306 instead
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO topology_recovery: Running 1 PreGracefulTakeoverProcesses hooks
Aug 17 14:18:15 db2 orchestrator: 2022-08-17 14:18:15 INFO topology_recovery: Running PreGracefulTakeoverProcesses hook 1 of 1: /usr/ssn/orchestrator/bin/pregracefultakeover.sh
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO CommandRun(/usr/ssn/orchestrator/bin/pregracefultakeover.sh,[])
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-736578168
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO CommandRun: Success! Data written to: mysql/pregracefultakeover/oldmaster
Aug 17 14:18:16 db2 orchestrator: Success! Data written to: mysql/pregracefultakeover/datetime
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO CommandRun successful. exit status 0
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO topology_recovery: Completed PreGracefulTakeoverProcesses hook 1 of 1 in 336.940276ms
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO topology_recovery: done running PreGracefulTakeoverProcesses hooks
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO GracefulMasterTakeover: Will set db1:3306 as read_only
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO instance db1:3306 read_only: true
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO auditType:read-only instance:db1:3306 cluster:db1:3306 message:set as true
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO GracefulMasterTakeover: Will wait for db2:3306 to reach master coordinates 1-mysql-bin.000634:9446952
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO GracefulMasterTakeover: attempting recovery
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO executeCheckAndRecoverFunction: proceeding with DeadMaster detection on db1:3306; isActionable?: true; skipProcesses: false
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO topology_recovery: detected DeadMaster failure on db1:3306
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO topology_recovery: Running 1 OnFailureDetectionProcesses hooks
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO topology_recovery: Running OnFailureDetectionProcesses hook 1 of 1: /usr/ssn/orchestrator/bin/on_failure_detection.sh
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO CommandRun(/usr/ssn/orchestrator/bin/on_failure_detection.sh,[])
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-374276983
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO CommandRun:
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO CommandRun successful. exit status 0
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO topology_recovery: Completed OnFailureDetectionProcesses hook 1 of 1 in 10.930676ms
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO topology_recovery: done running OnFailureDetectionProcesses hooks
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO executeCheckAndRecoverFunction: proceeding with NoWriteableMasterStructureWarning detection on db1:3306; isActionable?: true; skipProcesses: false
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO checkAndExecuteFailureDetectionProcesses: could not register NoWriteableMasterStructureWarning detection on db1:3306
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO executeCheckAndRecoverFunction: proceeding with NoWriteableMasterStructureWarning recovery on db1:3306; isRecoverable?: true; skipProcesses: false
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO executeCheckAndRecoverFunction: proceeding with DeadMaster recovery on db1:3306; isRecoverable?: true; skipProcesses: false
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO auditType:recover-non-writeable-master instance:db1:3306 cluster:db1:3306 message:problem found; will recover
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO topology_recovery: Running 1 PreFailoverProcesses hooks
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO topology_recovery: found an active or recent recovery on db1:3306. Will not issue another RecoverDeadMaster.
Aug 17 14:18:16 db2 orchestrator: [martini] Completed 500 Internal Server Error in 1.235572118s
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO topology_recovery: Running PreFailoverProcesses hook 1 of 1: /usr/ssn/orchestrator/bin/prefailover.sh
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO CommandRun(/usr/ssn/orchestrator/bin/prefailover.sh,[])
Aug 17 14:18:16 db2 orchestrator: 2022-08-17 14:18:16 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-647313514
Aug 17 14:18:17 db2 orchestrator: 2022-08-17 14:18:17 INFO CommandRun: Success! Data written to: mysql/prefailover/oldmaster
Aug 17 14:18:17 db2 orchestrator: Success! Data written to: mysql/prefailover/datetime
Aug 17 14:18:17 db2 orchestrator: 2022-08-17 14:18:17 INFO CommandRun successful. exit status 0
Aug 17 14:18:17 db2 orchestrator: 2022-08-17 14:18:17 INFO topology_recovery: Completed PreFailoverProcesses hook 1 of 1 in 276.993065ms
Aug 17 14:18:17 db2 orchestrator: 2022-08-17 14:18:17 INFO topology_recovery: done running PreFailoverProcesses hooks
Aug 17 14:18:17 db2 orchestrator: 2022-08-17 14:18:17 INFO instance db1:3306 read_only: false
Aug 17 14:18:17 db2 orchestrator: 2022-08-17 14:18:17 INFO auditType:read-only instance:db1:3306 cluster:db1:3306 message:set as false
Aug 17 14:18:17 db2 orchestrator: 2022-08-17 14:18:17 INFO Topology recovery: {"Id":4,"UID":"1660745896695721901:a284b6a49b66050e99af55fc1241f6357e6d92ad7bcb87fef196d518779e7ee6","AnalysisEntry":{"AnalyzedInstanceKey":{"Hostname":"db1","Port":3306},"AnalyzedInstanceMasterKey":{"Hostname":"","Port":0},"ClusterDetails":{"ClusterName":"db1:3306","ClusterAlias":"slvcluster","ClusterDomain":"db1:3306","CountInstances":0,"HeuristicLag":0,"HasAutomatedMasterRecovery":true,"HasAutomatedIntermediateMasterRecovery":true},"AnalyzedInstanceDataCenter":"","AnalyzedInstanceRegion":"","AnalyzedInstancePhysicalEnvironment":"","AnalyzedInstanceBinlogCoordinates":{"LogFile":"1-mysql-bin.000634","LogPos":9446952,"Type":0},"IsMaster":true,"IsReplicationGroupMember":false,"IsCoMaster":false,"LastCheckValid":true,"LastCheckPartialSuccess":true,"CountReplicas":1,"CountValidReplicas":1,"CountValidReplicatingReplicas":1,"CountReplicasFailingToConnectToMaster":0,"CountDowntimedReplicas":0,"ReplicationDepth":0,"Replicas":[{"Hostname":"db2","Port":3306}],"SlaveHosts":[{"Hostname":"db2","Port":3306}],"IsFailingToConnectToMaster":false,"Analysis":"NoWriteableMasterStructureWarning","Description":"Master with replicas is read_only","StructureAnalysis":["NoWriteableMasterStructureWarning"],"IsDowntimed":false,"IsReplicasDowntimed":false,"DowntimeEndTimestamp":"","DowntimeRemainingSeconds":0,"IsBinlogServer":false,"PseudoGTIDImmediateTopology":false,"OracleGTIDImmediateTopology":false,"MariaDBGTIDImmediateTopology":true,"BinlogServerImmediateTopology":false,"SemiSyncMasterEnabled":false,"SemiSyncMasterStatus":false,"SemiSyncMasterWaitForReplicaCount":0,"SemiSyncMasterClients":0,"CountSemiSyncReplicasEnabled":0,"CountLoggingReplicas":1,"CountStatementBasedLoggingReplicas":0,"CountMixedBasedLoggingReplicas":0,"CountRowBasedLoggingReplicas":1,"CountDistinctMajorVersionsLoggingReplicas":1,"CountDelayedReplicas":0,"CountLaggingReplicas":0,"IsActionableRecovery":true,"ProcessingNodeHostname":"db2","ProcessingNodeToken":"1e47c6548fae26d642aca84fc145641c8ec21ef55358dea8e316a4950d7f0709","CountAdditionalAgreeingNodes":0,"StartActivePeriod":"","SkippableDueToDowntime":false,"GTIDMode":"","MinReplicaGTIDMode":"","MaxReplicaGTIDMode":"","MaxReplicaGTIDErrant":"","CommandHint":"","IsReadOnly":true},"SuccessorKey":{"Hostname":"db1","Port":3306},"SuccessorAlias":"","SuccessorBinlogCoordinates":{"LogFile":"1-mysql-bin.000634","LogPos":9446952,"Type":0},"IsActive":false,"IsSuccessful":true,"LostReplicas":[],"ParticipatingInstanceKeys":[],"AllErrors":[],"RecoveryStartTimestamp":"","RecoveryEndTimestamp":"","ProcessingNodeHostname":"","ProcessingNodeToken":"","Acknowledged":false,"AcknowledgedAt":"","AcknowledgedBy":"","AcknowledgedComment":"","LastDetectionId":0,"RelatedRecoveryId":0,"Type":"","RecoveryType":"NotMasterRecovery"}
Aug 17 14:18:17 db2 orchestrator: 2022-08-17 14:18:17 INFO topology_recovery: No PostFailoverProcesses hooks to run
Aug 17 14:18:17 db2 orchestrator: 2022-08-17 14:18:17 INFO topology_recovery: Waiting for 0 postponed functions
Aug 17 14:18:17 db2 orchestrator: 2022-08-17 14:18:17 INFO topology_recovery: Executed 0 postponed functions
No Problem when global-recoveries are disabled
Aug 17 15:48:33 db3 orchestrator: [martini] Started GET /api/graceful-master-takeover-auto/slvcluster for 10.1.1.16:8672
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO GracefulMasterTakeover: Will demote db1:3306 and promote db2:3306 instead
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO topology_recovery: Running 1 PreGracefulTakeoverProcesses hooks
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO topology_recovery: Running PreGracefulTakeoverProcesses hook 1 of 1: /usr/ssn/orchestrator/bin/pregracefultakeover.sh
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO CommandRun(/usr/ssn/orchestrator/bin/pregracefultakeover.sh,[])
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-505102700
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO CommandRun: Success! Data written to: mysql/pregracefultakeover/oldmaster
Aug 17 15:48:33 db3 orchestrator: Success! Data written to: mysql/pregracefultakeover/datetime
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO CommandRun successful. exit status 0
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO topology_recovery: Completed PreGracefulTakeoverProcesses hook 1 of 1 in 360.585339ms
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO topology_recovery: done running PreGracefulTakeoverProcesses hooks
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO GracefulMasterTakeover: Will set db1:3306 as read_only
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO instance db1:3306 read_only: true
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO auditType:read-only instance:db1:3306 cluster:db1:3306 message:set as true
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO GracefulMasterTakeover: Will wait for db2:3306 to reach master coordinates 1-mysql-bin.000635:499931
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO GracefulMasterTakeover: attempting recovery
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO executeCheckAndRecoverFunction: proceeding with DeadMaster detection on db1:3306; isActionable?: true; skipProcesses: false
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO checkAndExecuteFailureDetectionProcesses: could not register DeadMaster detection on db1:3306
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO CheckAndRecover: Analysis: DeadMaster, InstanceKey: db1:3306, candidateInstanceKey: db2:3306, skipProcesses: false: recoveries disabled globally but forcing this recovery
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO executeCheckAndRecoverFunction: proceeding with DeadMaster recovery on db1:3306; isRecoverable?: true; skipProcesses: false
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO topology_recovery: will handle DeadMaster event on db1:3306
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO auditType:recover-dead-master instance:db1:3306 cluster:db1:3306 message:problem found; will recover
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO topology_recovery: Running 1 PreFailoverProcesses hooks
Aug 17 15:48:33 db3 orchestrator: 2022-08-17 15:48:33 INFO topology_recovery: Running PreFailoverProcesses hook 1 of 1: /usr/ssn/orchestrator/bin/prefailover.sh
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO CommandRun(/usr/ssn/orchestrator/bin/prefailover.sh,[])
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-892061147
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO CommandRun: Success! Data written to: mysql/prefailover/oldmaster
Aug 17 15:48:34 db3 orchestrator: Success! Data written to: mysql/prefailover/datetime
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO CommandRun successful. exit status 0
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO topology_recovery: Completed PreFailoverProcesses hook 1 of 1 in 355.117528ms
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO topology_recovery: done running PreFailoverProcesses hooks
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO topology_recovery: RecoverDeadMaster: will recover db1:3306
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO topology_recovery: RecoverDeadMaster: masterRecoveryType=MasterRecoveryGTID
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO topology_recovery: RecoverDeadMaster: regrouping replicas via GTID
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO topology_recovery: RecoverDeadMaster: promotedReplicaIsIdeal(db2:3306)
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO executeCheckAndRecoverFunction: proceeding with NoWriteableMasterStructureWarning detection on db1:3306; isActionable?: true; skipProcesses: false
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO CheckAndRecover: Analysis: NoWriteableMasterStructureWarning, InstanceKey: db1:3306, candidateInstanceKey: <nil>, skipProcesses: false: NOT Recovering host (disabled globally)
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO auditType:regroup-replicas-gtid instance:db1:3306 cluster:db1:3306 message:regrouped replicas of db1:3306 via GTID; promoted db2:3306
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO auditType:begin-downtime instance:db1:3306 cluster:db1:3306 message:owner: db3, reason: lost-in-recovery
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO topology_recovery: RecoverDeadMaster: 1 postponed functions
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO topology_recovery: checking if should replace promoted replica with a better candidate
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO topology_recovery: + sanity check: found our very own server to promote; doing nothing
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO topology_recovery: replace-promoted-replica-with-candidate: promoted instance db2:3306 requires no further action
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO topology_recovery: promoted replica: db2:3306
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO auditType:recover-dead-master instance:db1:3306 cluster:db1:3306 message:promoted replica: db2:3306
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO topology_recovery: RecoverDeadMaster: promoted replica lag seconds: 0
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO topology_recovery: RecoverDeadMaster: promoted replica sql thread up-to-date: true
Aug 17 15:48:34 db3 orchestrator: 2022-08-17 15:48:34 INFO topology_recovery: RecoverDeadMaster: found no reason to override promotion of db2:3306
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO topology_recovery: RecoverDeadMaster: successfully promoted db2:3306
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO topology_recovery: - RecoverDeadMaster: promoted server coordinates: 2-mysql-bin.000310:10554579
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO topology_recovery: - RecoverDeadMaster: will apply MySQL changes to promoted master
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO Will reset replica on db2:3306
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO auditType:begin-maintenance instance:db2:3306 cluster:db1:3306 message:maintenanceToken: 20, owner: db3, reason: reset replica
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO Stopped replication on db2:3306, Self:2-mysql-bin.000310:10554579, Exec:1-mysql-bin.000635:499931
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO Reset replication db2:3306
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO auditType:reset-slave instance:db2:3306 cluster:db1:3306 message:db2:3306 replication reset
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO auditType:end-maintenance instance:db2:3306 cluster:db1:3306 message:maintenanceToken: 20
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO topology_recovery: - RecoverDeadMaster: applying RESET SLAVE ALL on promoted master: success=true
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO instance db2:3306 read_only: false
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO auditType:read-only instance:db2:3306 cluster:db1:3306 message:set as false
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO topology_recovery: - RecoverDeadMaster: applying read-only=0 on promoted master: success=true
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO topology_recovery: Writing KV [mysql/master/slvcluster:db2:3306 mysql/master/slvcluster/hostname:db2 mysql/master/slvcluster/port:3306 mysql/master/slvcluster/ipv4:10.1.1.15 mysql/master/slvcluster/ipv6:]
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO instance db1:3306 read_only: true
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO auditType:read-only instance:db1:3306 cluster:db1:3306 message:set as true
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO topology_recovery: - RecoverDeadMaster: applying read-only=1 on demoted master: success=true
Aug 17 15:48:35 db3 orchestrator: 2022-08-17 15:48:35 INFO topology_recovery: Distributing KV [mysql/master/slvcluster:db2:3306 mysql/master/slvcluster/hostname:db2 mysql/master/slvcluster/port:3306 mysql/master/slvcluster/ipv4:10.1.1.15 mysql/master/slvcluster/ipv6:]
Aug 17 15:48:36 db3 orchestrator: 2022-08-17 15:48:36 INFO topology_recovery: - RecoverDeadMaster: updating cluster_alias: db1:3306 -> db2:3306
Aug 17 15:48:36 db3 orchestrator: 2022-08-17 15:48:36 INFO topology_recovery: Running 1 PostMasterFailoverProcesses hooks
Aug 17 15:48:36 db3 orchestrator: 2022-08-17 15:48:36 INFO topology_recovery: Running PostMasterFailoverProcesses hook 1 of 1: /usr/ssn/orchestrator/bin/postfailover.sh
Aug 17 15:48:36 db3 orchestrator: 2022-08-17 15:48:36 INFO CommandRun(/usr/ssn/orchestrator/bin/postfailover.sh,[])
Aug 17 15:48:36 db3 orchestrator: 2022-08-17 15:48:36 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-540574590
Aug 17 15:48:36 db3 orchestrator: 2022-08-17 15:48:36 INFO CommandRun: Success! Data written to: mysql/postfailover/oldmaster
Aug 17 15:48:36 db3 orchestrator: Success! Data written to: mysql/postfailover/newmaster
Aug 17 15:48:36 db3 orchestrator: Success! Data written to: mysql/postfailover/datetime
Aug 17 15:48:36 db3 orchestrator: 2022-08-17 15:48:36 INFO CommandRun successful. exit status 0
Aug 17 15:48:36 db3 orchestrator: 2022-08-17 15:48:36 INFO topology_recovery: Completed PostMasterFailoverProcesses hook 1 of 1 in 563.624921ms
Aug 17 15:48:36 db3 orchestrator: 2022-08-17 15:48:36 INFO topology_recovery: done running PostMasterFailoverProcesses hooks
Aug 17 15:48:36 db3 orchestrator: 2022-08-17 15:48:36 INFO Topology recovery: {"Id":20,"UID":"1660751313841861017:6e47241e772d13607c420ed1b2f94dda4207207222206b99e05a86c8cab13604","AnalysisEntry":{"AnalyzedInstanceKey":{"Hostname":"db1","Port":3306},"AnalyzedInstanceMasterKey":{"Hostname":"","Port":0},"ClusterDetails":{"ClusterName":"db1:3306","ClusterAlias":"slvcluster","ClusterDomain":"","CountInstances":3,"HeuristicLag":0,"HasAutomatedMasterRecovery":true,"HasAutomatedIntermediateMasterRecovery":true},"AnalyzedInstanceDataCenter":"","AnalyzedInstanceRegion":"","AnalyzedInstancePhysicalEnvironment":"","AnalyzedInstanceBinlogCoordinates":{"LogFile":"1-mysql-bin.000635","LogPos":499931,"Type":0},"IsMaster":true,"IsReplicationGroupMember":false,"IsCoMaster":false,"LastCheckValid":true,"LastCheckPartialSuccess":true,"CountReplicas":1,"CountValidReplicas":1,"CountValidReplicatingReplicas":1,"CountReplicasFailingToConnectToMaster":0,"CountDowntimedReplicas":0,"ReplicationDepth":0,"Replicas":[{"Hostname":"db2","Port":3306}],"SlaveHosts":[{"Hostname":"db2","Port":3306}],"IsFailingToConnectToMaster":false,"Analysis":"DeadMaster","Description":"","StructureAnalysis":null,"IsDowntimed":false,"IsReplicasDowntimed":false,"DowntimeEndTimestamp":"","DowntimeRemainingSeconds":0,"IsBinlogServer":false,"PseudoGTIDImmediateTopology":false,"OracleGTIDImmediateTopology":false,"MariaDBGTIDImmediateTopology":true,"BinlogServerImmediateTopology":false,"SemiSyncMasterEnabled":false,"SemiSyncMasterStatus":false,"SemiSyncMasterWaitForReplicaCount":0,"SemiSyncMasterClients":0,"CountSemiSyncReplicasEnabled":0,"CountLoggingReplicas":1,"CountStatementBasedLoggingReplicas":0,"CountMixedBasedLoggingReplicas":0,"CountRowBasedLoggingReplicas":1,"CountDistinctMajorVersionsLoggingReplicas":1,"CountDelayedReplicas":0,"CountLaggingReplicas":0,"IsActionableRecovery":true,"ProcessingNodeHostname":"db3","ProcessingNodeToken":"9e9a3f83db96ed7deb24254186ec6a23716622d8dd10768f4b7abd88bad99278","CountAdditionalAgreeingNodes":0,"StartActivePeriod":"","SkippableDueToDowntime":false,"GTIDMode":"","MinReplicaGTIDMode":"","MaxReplicaGTIDMode":"","MaxReplicaGTIDErrant":"","CommandHint":"graceful-master-takeover","IsReadOnly":false},"SuccessorKey":{"Hostname":"db2","Port":3306},"SuccessorAlias":"","SuccessorBinlogCoordinates":{"LogFile":"2-mysql-bin.000310","LogPos":10554579,"Type":0},"IsActive":false,"IsSuccessful":true,"LostReplicas":[],"ParticipatingInstanceKeys":[],"AllErrors":[],"RecoveryStartTimestamp":"","RecoveryEndTimestamp":"","ProcessingNodeHostname":"","ProcessingNodeToken":"","Acknowledged":false,"AcknowledgedAt":"","AcknowledgedBy":"","AcknowledgedComment":"","LastDetectionId":0,"RelatedRecoveryId":0,"Type":"MasterRecovery","RecoveryType":"MasterRecoveryGTID"}
Aug 17 15:48:36 db3 orchestrator: 2022-08-17 15:48:36 INFO topology_recovery: No PostFailoverProcesses hooks to run
Aug 17 15:48:36 db3 orchestrator: 2022-08-17 15:48:36 INFO topology_recovery: Waiting for 1 postponed functions
Aug 17 15:48:36 db3 orchestrator: 2022-08-17 15:48:36 INFO topology_recovery: Executed 1 postponed functions
Aug 17 15:48:36 db3 orchestrator: 2022-08-17 15:48:36 INFO topology_recovery: Executed postponed functions: regroup-replicas-gtid db2:3306
Aug 17 15:48:37 db3 orchestrator: 2022-08-17 15:48:37 INFO ChangeMasterTo: Changed master on db1:3306 to: db2:3306, 2-mysql-bin.000310:10554579. GTID: true
Aug 17 15:48:37 db3 orchestrator: 2022-08-17 15:48:37 INFO ChangeMasterTo: Changed master credentials on db1:3306
Aug 17 15:48:37 db3 orchestrator: 2022-08-17 15:48:37 INFO Started replication on db1:3306
Aug 17 15:48:37 db3 orchestrator: 2022-08-17 15:48:37 INFO topology_recovery: Running 1 PostGracefulTakeoverProcesses hooks
Aug 17 15:48:37 db3 orchestrator: 2022-08-17 15:48:37 INFO topology_recovery: Running PostGracefulTakeoverProcesses hook 1 of 1: /usr/ssn/orchestrator/bin/postgracefultakeover.sh
Aug 17 15:48:37 db3 orchestrator: 2022-08-17 15:48:37 INFO CommandRun(/usr/ssn/orchestrator/bin/postgracefultakeover.sh,[])
Aug 17 15:48:37 db3 orchestrator: 2022-08-17 15:48:37 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-989844677
Aug 17 15:48:37 db3 orchestrator: 2022-08-17 15:48:37 INFO CommandRun: Success! Data written to: mysql/postgracefultakeover/oldmaster
Aug 17 15:48:37 db3 orchestrator: Success! Data written to: mysql/postgracefultakeover/newmaster
Aug 17 15:48:37 db3 orchestrator: Success! Data written to: mysql/postgracefultakeover/datetime
Aug 17 15:48:37 db3 orchestrator: 2022-08-17 15:48:37 INFO CommandRun successful. exit status 0
Aug 17 15:48:37 db3 orchestrator: 2022-08-17 15:48:37 INFO topology_recovery: Completed PostGracefulTakeoverProcesses hook 1 of 1 in 487.976128ms
Aug 17 15:48:37 db3 orchestrator: 2022-08-17 15:48:37 INFO topology_recovery: done running PostGracefulTakeoverProcesses hooks
The text was updated successfully, but these errors were encountered:
We noticed that a graceful-takeover command can fail sometimes, not promoting a new master and giving the following message:
"GracefulMasterTakeover: unexpected error: recovery not attempted. This should not happen"
If we turn off
RecoverNonWriteableMaster
flag, it works.If we disable global-recoveries before the graceful-takeover, it works too.
It does not fail all the time, it's unpredictable.
It looks like we have a race condition when the
NoWriteableMasterStructureWarning
is triggered, becauseGracefulMasterTakeover
start by setting the master as read_only.initial situation
final situation
Please see some logs:
The text was updated successfully, but these errors were encountered: