[CRASH] Redis Instance is down while master sync #13217

BuseSolmaz · 2024-04-17T08:36:32Z

Description of the Problem

We have a total of 72 Redis Cluster instances (36 masters and 36 slaves), and each instance holds approximately 68 million entries (29 GB of data).
To check whether a Redis instance comes up properly from its dump file, we kill the Redis process on one of the slave instances and then start it again. However, the instance goes down during the "MASTER <-> REPLICA sync: Flushing old data" step. When we remove the dump file and start the instance again, the problem does not occur. The dump file is 13 GB.
We tried increasing client-output-buffer-limit on this slave node and on its master (we tried both 1 GB/256 MB and 3 GB/1 GB), but it did not help. This is the command we used:
CONFIG set client-output-buffer-limit "normal 0 0 0 slave 1073741824 268435456 60 pubsub 33554432 8388608 60"
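
As a quick sanity check on the numbers in that command, here is a minimal Python sketch that splits the value into its (class, hard limit, soft limit, soft seconds) groups and prints the sizes in readable units. The helper names `parse_limits` and `human` are made up for illustration; only the config string itself comes from the command above.

```python
# Value copied from the CONFIG SET command above.
CONFIG_VALUE = (
    "normal 0 0 0 slave 1073741824 268435456 60 pubsub 33554432 8388608 60"
)


def parse_limits(value):
    """Split the value into groups of four: class, hard, soft, soft seconds."""
    tokens = value.split()
    if len(tokens) % 4 != 0:
        raise ValueError("expected groups of four tokens")
    limits = {}
    for i in range(0, len(tokens), 4):
        name, hard, soft, seconds = tokens[i:i + 4]
        limits[name] = {
            "hard_bytes": int(hard),
            "soft_bytes": int(soft),
            "soft_seconds": int(seconds),
        }
    return limits


def human(n):
    """Render a byte count in binary units; 0 means the limit is disabled."""
    if n == 0:
        return "disabled"
    for unit, size in (("GB", 1 << 30), ("MB", 1 << 20), ("KB", 1 << 10)):
        if n % size == 0:
            return f"{n // size} {unit}"
    return f"{n} bytes"


if __name__ == "__main__":
    for name, lim in parse_limits(CONFIG_VALUE).items():
        print(name, human(lim["hard_bytes"]), human(lim["soft_bytes"]),
              f'{lim["soft_seconds"]}s')
```

The slave class here works out to a 1 GB hard limit and a 256 MB soft limit over 60 seconds, which matches the first of the two configurations mentioned above.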

Do you have any solutions for this problem?

Crash Report
We redacted the Redis host and port in the log below because they are corporate information.

25084:C 17 Apr 2024 01:30:24.038 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
25084:C 17 Apr 2024 01:30:24.039 # Redis version=6.0.5, bits=64, commit=00000000, modified=0, pid=25084, just started
25084:C 17 Apr 2024 01:30:24.040 # Configuration loaded
25084:M 17 Apr 2024 01:30:24.041 * Increased maximum number of open files to 10032 (it was originally set to 1024).
25084:M 17 Apr 2024 01:30:24.043 * Node configuration loaded, I'm 39170ea79f2889e3012e860fbad701fb70f80364
25084:M 17 Apr 2024 01:30:24.045 * Running mode=cluster, port=XXX.
25084:M 17 Apr 2024 01:30:24.046 # Server initialized
25084:M 17 Apr 2024 01:30:24.047 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
25084:M 17 Apr 2024 01:30:24.048 * Loading RDB produced by version 6.0.5
25084:M 17 Apr 2024 01:30:24.048 * RDB age 1282 seconds
25084:M 17 Apr 2024 01:30:24.049 * RDB memory usage when created 30022.11 Mb
25084:M 17 Apr 2024 01:38:50.610 * DB loaded from disk: 506.562 seconds
25084:M 17 Apr 2024 01:38:50.611 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
25084:M 17 Apr 2024 01:38:50.611 * Ready to accept connections
25084:S 17 Apr 2024 01:38:50.631 * Discarding previously cached master state.
25084:S 17 Apr 2024 01:38:50.632 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
25084:S 17 Apr 2024 01:38:50.633 # Cluster state changed: ok
25084:S 17 Apr 2024 01:38:51.639 * Connecting to MASTER XX.XX.XX.XX:XXXX
25084:S 17 Apr 2024 01:38:51.640 * MASTER <-> REPLICA sync started
25084:S 17 Apr 2024 01:38:51.641 * Non blocking connect for SYNC fired the event.
25084:S 17 Apr 2024 01:38:51.642 * Master replied to PING, replication can continue...
25084:S 17 Apr 2024 01:38:51.643 * Trying a partial resynchronization (request 81d2e9518bbd165853d8c2464ac2a8b2da03503f:2141620117578).
25084:S 17 Apr 2024 01:38:52.683 * Full resync from master: 81d2e9518bbd165853d8c2464ac2a8b2da03503f:2141678892713
25084:S 17 Apr 2024 01:38:52.684 * Discarding previously cached master state.
25084:S 17 Apr 2024 01:44:32.483 * MASTER <-> REPLICA sync: receiving 13043613357 bytes from master to disk
25084:S 17 Apr 2024 01:45:00.170 * MASTER <-> REPLICA sync: Flushing old data
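
To see how long each phase took before the log stops, the timestamps can be subtracted from one another. The following is a minimal Python sketch, assuming the timestamp layout seen in the log above (`pid:role DD Mon YYYY HH:MM:SS.mmm level message`); the function names are illustrative, and the sample lines are copied from the log.

```python
from datetime import datetime

# Milestone lines copied verbatim from the log above.
LOG_LINES = [
    "25084:M 17 Apr 2024 01:30:24.048 * Loading RDB produced by version 6.0.5",
    "25084:M 17 Apr 2024 01:38:50.610 * DB loaded from disk: 506.562 seconds",
    "25084:S 17 Apr 2024 01:38:52.683 * Full resync from master: "
    "81d2e9518bbd165853d8c2464ac2a8b2da03503f:2141678892713",
    "25084:S 17 Apr 2024 01:44:32.483 * MASTER <-> REPLICA sync: "
    "receiving 13043613357 bytes from master to disk",
    "25084:S 17 Apr 2024 01:45:00.170 * MASTER <-> REPLICA sync: "
    "Flushing old data",
]


def parse_line(line):
    """Return (timestamp, message) for one log line."""
    parts = line.split(" ", 5)
    stamp = datetime.strptime(" ".join(parts[1:5]), "%d %b %Y %H:%M:%S.%f")
    # parts[5] starts with the level character ('*' or '#') and a space.
    return stamp, parts[5][2:]


def phase_gaps(lines):
    """Return (message, seconds elapsed since the previous line) pairs."""
    parsed = [parse_line(line) for line in lines]
    return [
        (msg, (stamp - prev).total_seconds())
        for (prev, _), (stamp, msg) in zip(parsed, parsed[1:])
    ]


if __name__ == "__main__":
    for msg, seconds in phase_gaps(LOG_LINES):
        print(f"{seconds:9.3f}s  {msg}")
```

On these lines the gaps come out to roughly 506.6 s for loading the local dump, 339.8 s between the full resync reply and the start of the transfer, and 27.7 s until "Flushing old data", which is the last line written before the instance goes down.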

sundb · 2024-04-17T08:41:59Z

@BuseSolmaz What do you mean by "the Redis instance is down"? Did the replication crash?
If so, can you provide the full log, including the crash log?

BuseSolmaz · 2024-04-17T13:59:07Z

Sorry for the misunderstanding. I mean that the Redis instance restarts during the master-replica sync process. The Redis log above is all we have for this problem. How can we get a detailed crash log? @sundb

sundb · 2024-04-19T11:27:45Z

@BuseSolmaz Do you mean the replication was closed right after the last log line, 25084:S 17 Apr 2024 01:45:00.170 * MASTER <-> REPLICA sync: Flushing old data?
