[BUG]sentinel failed to failover while master was down.[4.0.14] #13239

luijianfie · 2024-04-30T09:18:15Z

Describe the bug

topology:
10.250.17.68:6379 master
10.250.17.80:6379 slave
10.250.17.68:6380 sentinel
10.250.17.80:6380 sentinel
10.250.17.32:6380 sentinel

All hosts are virtual machines. Host 10.250.17.68 needed to be shut down for maintenance.

master 10.250.17.68:6379 exited
[10.250.17.68-redis-6379]3894:signal-handler (1713937702) Received SIGTERM scheduling shutdown...
[10.250.17.68-redis-6379]3894:M 24 Apr 13:48:22.373 # User requested shutdown...
[10.250.17.68-redis-6379]3894:M 24 Apr 13:48:22.373 * Calling fsync() on the AOF file.
[10.250.17.68-redis-6379]3894:M 24 Apr 13:48:22.373 # Redis is now ready to exit, bye bye...

sentinel 10.250.17.68:6380 exited
[10.250.17.68-redis-seintiel-6380]3980:signal-handler (1713937702) Received SIGTERM scheduling shutdown...
[10.250.17.68-redis-seintiel-6380]3980:X 24 Apr 13:48:22.379 # User requested shutdown...
[10.250.17.68-redis-seintiel-6380]3980:X 24 Apr 13:48:22.379 # Sentinel is now ready to exit, bye bye...

sentinel 10.250.17.32:6380 marked master as sdown at 13:48:42.518
[10.250.17.32-redis-sentinel]3812:X 24 Apr 13:48:42.518 # +sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.32-redis-sentinel]3812:X 24 Apr 13:48:42.518 # +sdown sentinel 723cccee8b0adf35a3669c17f698ab8e4968c46a 10.250.17.68 6380 @ sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.32-redis-sentinel]
[10.250.17.32-redis-sentinel]
[10.250.17.32-redis-sentinel]3812:X 24 Apr 13:53:42.571 # +new-epoch 1
[10.250.17.32-redis-sentinel]3812:X 24 Apr 13:53:42.572 # +vote-for-leader 7c484ccb1655219c36b98d86887fcbdf29ede55f 1
[10.250.17.32-redis-sentinel]3812:X 24 Apr 13:53:43.513 # +odown master sentinel-10.250.17.68-6379 10.250.17.68 6379 #quorum 2/2

sentinel 10.250.17.80:6380 repeatedly marked master +sdown and -sdown, and marked master as odown at 13:53:42.567
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:22.533 # +sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:23.469 # -sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:23.527 # +sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:24.500 # -sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:42.504 # +sdown sentinel 723cccee8b0adf35a3669c17f698ab8e4968c46a 10.250.17.68 6380 @ sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:53:42.505 # +sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:53:42.567 # +odown master sentinel-10.250.17.68-6379 10.250.17.68 6379 #quorum 2/2
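
To line up the two sentinels' timelines, the master state transitions can be pulled out of the logs with a small script. This is only a sketch: the regular expression is an assumption based on the line format shown above (including the bracketed host tag at the start of each line), and `master_events` and `flap_durations` are hypothetical helper names, not part of Redis.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the excerpts above:
# [tag]PID:X DD Mon HH:MM:SS.mmm # +event subject ...
EVENT_RE = re.compile(
    r"^\[[^\]]+\]\d+:X \d+ \w+ (?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) # "
    r"(?P<event>[+-][a-z-]+) (?P<rest>.*)$"
)

def master_events(lines):
    """Return (time, event) pairs for events whose subject is the master."""
    events = []
    for line in lines:
        m = EVENT_RE.match(line.strip())
        # Skip non-event lines and events about other sentinels.
        if m is None or not m.group("rest").startswith("master "):
            continue
        when = datetime.strptime(m.group("time"), "%H:%M:%S.%f")
        events.append((when, m.group("event")))
    return events

def flap_durations(events):
    """Seconds between each +sdown and the -sdown that follows it."""
    durations = []
    down_since = None
    for when, event in events:
        if event == "+sdown":
            down_since = when
        elif event == "-sdown" and down_since is not None:
            durations.append((when - down_since).total_seconds())
            down_since = None
    return durations

LOG_80 = """\
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:22.533 # +sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:23.469 # -sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:23.527 # +sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:24.500 # -sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:48:42.504 # +sdown sentinel 723cccee8b0adf35a3669c17f698ab8e4968c46a 10.250.17.68 6380 @ sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:53:42.505 # +sdown master sentinel-10.250.17.68-6379 10.250.17.68 6379
[10.250.17.80-redis-sentinel]3860:X 24 Apr 13:53:42.567 # +odown master sentinel-10.250.17.68-6379 10.250.17.68 6379 #quorum 2/2
""".splitlines()

events = master_events(LOG_80)
for when, event in events:
    print(when.strftime("%H:%M:%S.%f")[:-3], event)
print("flap durations (s):", flap_durations(events))
```

On the 10.250.17.80 excerpt this reports two short flaps of just under one second each, followed by a gap of about five minutes before the final +sdown and +odown.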

Questions:
1. Why did 10.250.17.80:6380 first mark the master +sdown at 13:48:22.533 when down-after-milliseconds is 20000? The master exited at 13:48:22.373, so the earliest time 10.250.17.80:6380 should have marked it as down is 13:48:42.373.

2. Why did 10.250.17.80:6380 repeatedly mark the master +sdown and -sdown?

3. Why did 10.250.17.80:6380 only confirm the master as odown at 13:53:42.567? By then, more than 5 minutes had passed since the master went down. Host 10.250.17.68 completed its restart at 13:53:42.
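
The arithmetic behind questions 1 and 3 can be checked with a short script. It is a rough sketch that treats the master's exit time as the start of the down-after-milliseconds window, which is a simplification (Sentinel measures from the last valid reply it received, so real values may be slightly earlier). The helper names are made up for illustration; the timestamps are copied from the logs above.

```python
from datetime import datetime, timedelta

def ts(text):
    """Parse a HH:MM:SS.mmm timestamp as it appears in the logs."""
    return datetime.strptime(text, "%H:%M:%S.%f")

DOWN_AFTER = timedelta(milliseconds=20000)   # down-after-milliseconds
MASTER_EXIT = ts("13:48:22.373")             # "Redis is now ready to exit"
EARLIEST_SDOWN = MASTER_EXIT + DOWN_AFTER    # simplification: window starts at exit

# First +sdown for the master as seen by each surviving sentinel.
FIRST_SDOWN = {
    "10.250.17.80:6380": ts("13:48:22.533"),
    "10.250.17.32:6380": ts("13:48:42.518"),
}
ODOWN_80 = ts("13:53:42.567")

def offset_from_earliest(when):
    """Seconds relative to exit + down-after; negative means 'too early'."""
    return (when - EARLIEST_SDOWN).total_seconds()

for name, when in FIRST_SDOWN.items():
    print(f"{name}: first +sdown at {offset_from_earliest(when):+.3f} s "
          f"relative to exit + 20 s")

delay = (ODOWN_80 - MASTER_EXIT).total_seconds()
print(f"10.250.17.80:6380: +odown {delay:.3f} s after the master exited")
```

Under that assumption, 10.250.17.32:6380 is about 0.145 s past the expected earliest time, while 10.250.17.80:6380 is almost 20 s early, and the +odown on 10.250.17.80:6380 arrives roughly 320 s after the master exited.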

To reproduce

I have not been able to reproduce the problem.

Expected behavior

Sentinel should fail over successfully once the master is down.
