[CRASH] Redis Instance is down while master sync #13217

Open
BuseSolmaz opened this issue Apr 17, 2024 · 3 comments

@BuseSolmaz

Description of the Problem

  • We have a total of 72 Redis Cluster instances (36 masters and 36 slaves), and each instance holds approximately 68 million items (29 GB) of data.
  • To check whether a Redis instance comes up properly from its dump file, we kill the Redis process on one of the slave instances and then start it again. However, the instance goes down during the "MASTER <-> REPLICA sync: Flushing old data" step. When we remove the dump file before starting the instance again, we do not encounter this problem. The dump file is 13 GB.
  • We tried increasing client-output-buffer-limit on this slave node and on its master (we tried both 1GB/256MB and 3GB/1GB), but it did not help.
    CONFIG set client-output-buffer-limit "normal 0 0 0 slave 1073741824 268435456 60 pubsub 33554432 8388608 60"
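
For readers checking the numbers in the command above, here is a minimal Python sketch that splits the value into per-class limits. It assumes the usual layout of four fields per client class (class name, hard limit in bytes, soft limit in bytes, soft-limit seconds), where 0 disables a limit. The function and variable names are illustrative and are not part of Redis.

```python
def parse_output_buffer_limits(value: str) -> dict:
    """Split a client-output-buffer-limit value into per-class limits."""
    tokens = value.split()
    if not tokens or len(tokens) % 4 != 0:
        raise ValueError("expected groups of: <class> <hard> <soft> <seconds>")
    limits = {}
    for i in range(0, len(tokens), 4):
        client_class, hard, soft, seconds = tokens[i:i + 4]
        limits[client_class] = {
            "hard_bytes": int(hard),
            "soft_bytes": int(soft),
            "soft_seconds": int(seconds),
        }
    return limits


def mib(num_bytes: int) -> float:
    """Convert a byte count to MiB for easier reading."""
    return num_bytes / (1024 * 1024)


# The value used in the CONFIG SET command from this issue.
ISSUE_VALUE = "normal 0 0 0 slave 1073741824 268435456 60 pubsub 33554432 8388608 60"

if __name__ == "__main__":
    for name, limit in parse_output_buffer_limits(ISSUE_VALUE).items():
        print(
            f"{name}: hard={mib(limit['hard_bytes']):.0f} MiB, "
            f"soft={mib(limit['soft_bytes']):.0f} MiB "
            f"for {limit['soft_seconds']} s"
        )
```

For the slave class this decodes to the 1GB/256MB pair mentioned in the bullet above, with a 60-second soft-limit window.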

Do you have any solutions for this problem?

Crash Report
We have hidden the Redis host and port in the log below because they are corporate information.

25084:C 17 Apr 2024 01:30:24.038 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
25084:C 17 Apr 2024 01:30:24.039 # Redis version=6.0.5, bits=64, commit=00000000, modified=0, pid=25084, just started
25084:C 17 Apr 2024 01:30:24.040 # Configuration loaded
25084:M 17 Apr 2024 01:30:24.041 * Increased maximum number of open files to 10032 (it was originally set to 1024).
25084:M 17 Apr 2024 01:30:24.043 * Node configuration loaded, I'm 39170ea79f2889e3012e860fbad701fb70f80364
25084:M 17 Apr 2024 01:30:24.045 * Running mode=cluster, port=XXX.
25084:M 17 Apr 2024 01:30:24.046 # Server initialized
25084:M 17 Apr 2024 01:30:24.047 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
25084:M 17 Apr 2024 01:30:24.048 * Loading RDB produced by version 6.0.5
25084:M 17 Apr 2024 01:30:24.048 * RDB age 1282 seconds
25084:M 17 Apr 2024 01:30:24.049 * RDB memory usage when created 30022.11 Mb
25084:M 17 Apr 2024 01:38:50.610 * DB loaded from disk: 506.562 seconds
25084:M 17 Apr 2024 01:38:50.611 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
25084:M 17 Apr 2024 01:38:50.611 * Ready to accept connections
25084:S 17 Apr 2024 01:38:50.631 * Discarding previously cached master state.
25084:S 17 Apr 2024 01:38:50.632 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
25084:S 17 Apr 2024 01:38:50.633 # Cluster state changed: ok
25084:S 17 Apr 2024 01:38:51.639 * Connecting to MASTER XX.XX.XX.XX:XXXX
25084:S 17 Apr 2024 01:38:51.640 * MASTER <-> REPLICA sync started
25084:S 17 Apr 2024 01:38:51.641 * Non blocking connect for SYNC fired the event.
25084:S 17 Apr 2024 01:38:51.642 * Master replied to PING, replication can continue...
25084:S 17 Apr 2024 01:38:51.643 * Trying a partial resynchronization (request 81d2e9518bbd165853d8c2464ac2a8b2da03503f:2141620117578).
25084:S 17 Apr 2024 01:38:52.683 * Full resync from master: 81d2e9518bbd165853d8c2464ac2a8b2da03503f:2141678892713
25084:S 17 Apr 2024 01:38:52.684 * Discarding previously cached master state.
25084:S 17 Apr 2024 01:44:32.483 * MASTER <-> REPLICA sync: receiving 13043613357 bytes from master to disk
25084:S 17 Apr 2024 01:45:00.170 * MASTER <-> REPLICA sync: Flushing old data
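
To see where the time goes before the log stops, the timestamps above can be compared line by line. The following Python sketch is one way to do that, assuming every line follows the `pid:role DD Mon YYYY HH:MM:SS.mmm level message` layout seen in this log and that month names are in English. `SAMPLE` holds a subset of the lines above; the function names are illustrative.

```python
import re
from datetime import datetime

# Matches lines such as:
# 25084:S 17 Apr 2024 01:45:00.170 * MASTER <-> REPLICA sync: Flushing old data
LINE_RE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[A-Z]) "
    r"(?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>\S) (?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, message) for a log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    ts = datetime.strptime(match.group("ts"), "%d %b %Y %H:%M:%S.%f")
    return ts, match.group("msg")


def long_gaps(lines, min_seconds=10.0):
    """List (seconds, message_before, message_after) for gaps of at least min_seconds."""
    parsed = [p for p in map(parse_line, lines) if p is not None]
    gaps = []
    for (t0, msg0), (t1, msg1) in zip(parsed, parsed[1:]):
        delta = (t1 - t0).total_seconds()
        if delta >= min_seconds:
            gaps.append((delta, msg0, msg1))
    return gaps


# A subset of the log lines from this report.
SAMPLE = [
    "25084:M 17 Apr 2024 01:30:24.049 * RDB memory usage when created 30022.11 Mb",
    "25084:M 17 Apr 2024 01:38:50.610 * DB loaded from disk: 506.562 seconds",
    "25084:S 17 Apr 2024 01:38:52.684 * Discarding previously cached master state.",
    "25084:S 17 Apr 2024 01:44:32.483 * MASTER <-> REPLICA sync: receiving 13043613357 bytes from master to disk",
    "25084:S 17 Apr 2024 01:45:00.170 * MASTER <-> REPLICA sync: Flushing old data",
]

if __name__ == "__main__":
    for seconds, before, after in long_gaps(SAMPLE):
        print(f"{seconds:8.3f} s between '{before}' and '{after}'")
```

The sketch only measures the time between consecutive lines; it does not explain why nothing is logged after "Flushing old data".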

@sundb
Collaborator

sundb commented Apr 17, 2024

@BuseSolmaz what do you mean by "the Redis instance is down"? Did the replication crash?
If so, can you provide the full log, including the crash log?

@BuseSolmaz
Author

BuseSolmaz commented Apr 17, 2024

Sorry for the misunderstanding. I mean that the Redis instance restarts during the master-replica sync process. The Redis log above is all we have for this problem. How can we get a detailed crash log? @sundb
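
One possible first check is to search the Redis log file for a crash report. The Python sketch below assumes that the report Redis writes on a fatal signal begins with a line containing `REDIS BUG REPORT START`. If the process was ended with SIGKILL, which cannot be caught, no such block is written and the system logs would be the place to look instead. The function name is illustrative, and the log path is whatever the `logfile` setting points to.

```python
import sys

# Text assumed to appear on the first line of a Redis crash report.
CRASH_MARKER = "REDIS BUG REPORT START"


def find_crash_reports(log_lines):
    """Return the 1-based line numbers where a crash report appears to start."""
    return [
        number
        for number, line in enumerate(log_lines, start=1)
        if CRASH_MARKER in line
    ]


if __name__ == "__main__":
    # Usage: python find_crash.py <path to the Redis log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        hits = find_crash_reports(handle)
    if hits:
        print("Crash report(s) start at line(s):", hits)
    else:
        print("No crash report marker found in this log.")
```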

@sundb
Collaborator

sundb commented Apr 19, 2024

@BuseSolmaz do you mean that the replication was closed after the last log line, 25084:S 17 Apr 2024 01:45:00.170 * MASTER <-> REPLICA sync: Flushing old data?
