Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] - Server Keeps crashing randomly #13213

Open
more456 opened this issue Apr 15, 2024 · 1 comment
Open

[BUG] - Server Keeps crashing randomly #13213

more456 opened this issue Apr 15, 2024 · 1 comment

Comments

@more456
Copy link

more456 commented Apr 15, 2024

Redis replica in a simple Master-Slave Configuration keeps crashing randomly.
Ubuntu 22.04 LTS

=== REDIS BUG REPORT START: Cut & paste starting from here ===
32214:S 15 Apr 2024 15:08:17.765 # ------------------------------------------------
32214:S 15 Apr 2024 15:08:17.765 # !!! Software Failure. Press left mouse button to continue
32214:S 15 Apr 2024 15:08:17.765 # Guru Meditation: Replica was unable to write command to disk. #server.c:4029

------ STACK TRACE ------

Backtrace:
/opt/redis-stack/bin/redis-server *:6379(+0x9b96d)[0x5bcd753ab96d]
/opt/redis-stack/bin/redis-server *:6379(processInputBuffer+0x107)[0x5bcd753d0387]
/opt/redis-stack/bin/redis-server *:6379(readQueryFromClient+0x368)[0x5bcd753d0908]
/opt/redis-stack/bin/redis-server *:6379(+0x1c9b9c)[0x5bcd754d9b9c]
/opt/redis-stack/bin/redis-server *:6379(aeMain+0xf9)[0x5bcd7539fdd9]
/opt/redis-stack/bin/redis-server *:6379(main+0x3df)[0x5bcd7539463f]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7d7613c29d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7d7613c29e40]
/opt/redis-stack/bin/redis-server *:6379(_start+0x25)[0x5bcd75394e05]

------ INFO OUTPUT ------

Server

redis_version:7.2.4
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:3a23a5ac67fe7008
redis_mode:standalone
os:Linux 6.5.0-1018-azure x86_64
arch_bits:64
monotonic_clock:POSIX clock_gettime
multiplexing_api:epoll
atomicvar_api:c11-builtin
gcc_version:11.4.0
process_id:32214
process_supervised:no
run_id:18ec072b3d5b65ad8c0872f2aded3988402233b8
tcp_port:6379
server_time_usec:1713193697765465
uptime_in_seconds:38887
uptime_in_days:0
hz:10
configured_hz:10
lru_clock:1917665
executable:/opt/redis-stack/bin/redis-server
config_file:/etc/redis/redis-stack-6379.conf
io_threads_active:0
listener0:name=tcp,bind=,bind=-::,port=6379

Clients

connected_clients:2
cluster_connections:0
maxclients:10000
client_recent_max_input_buffer:20480
client_recent_max_output_buffer:0
blocked_clients:0
tracking_clients:0
clients_in_timeout_table:0
total_blocking_keys:0
total_blocking_keys_on_nokey:0

Memory

used_memory:12190828672
used_memory_human:11.35G
used_memory_rss:12255432704
used_memory_rss_human:11.41G
used_memory_peak:13477301528
used_memory_peak_human:12.55G
used_memory_peak_perc:90.45%
used_memory_overhead:161838492
used_memory_startup:1137208
used_memory_dataset:12028990180
used_memory_dataset_perc:98.68%
allocator_allocated:12194414656
allocator_active:12205555712
allocator_resident:12293505024
total_system_memory:135005425664
total_system_memory_human:125.73G
used_memory_lua:31744
used_memory_vm_eval:31744
used_memory_lua_human:31.00K
used_memory_scripts_eval:0
number_of_cached_scripts:0
number_of_functions:0
number_of_libraries:0
used_memory_vm_functions:32768
used_memory_vm_total:64512
used_memory_vm_total_human:63.00K
used_memory_functions:184
used_memory_scripts:184
used_memory_scripts_human:184B
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
allocator_frag_ratio:1.00
allocator_frag_bytes:11141056
allocator_rss_ratio:1.01
allocator_rss_bytes:87949312
rss_overhead_ratio:1.00
rss_overhead_bytes:-38072320
mem_fragmentation_ratio:1.01
mem_fragmentation_bytes:62722616
mem_not_counted_for_evict:8
mem_replication_backlog:164036
mem_total_replication_buffers:164032
mem_clients_slaves:0
mem_clients_normal:55048
mem_cluster_links:0
mem_aof_buffer:8
mem_allocator:jemalloc-5.3.0
active_defrag_running:0
lazyfree_pending_objects:0
lazyfreed_objects:5211

Persistence

loading:0
async_loading:0
current_cow_peak:0
current_cow_size:0
current_cow_size_age:0
current_fork_perc:0.00
current_save_keys_processed:0
current_save_keys_total:3171596
rdb_changes_since_last_save:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1713186112
rdb_last_bgsave_status:err
rdb_last_bgsave_time_sec:32
rdb_current_bgsave_time_sec:-1
rdb_saves:12
rdb_last_cow_size:9142272
rdb_last_load_keys_expired:0
rdb_last_load_keys_loaded:3171596
aof_enabled:1
aof_rewrite_in_progress:1
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:52
aof_current_rewrite_time_sec:0
aof_last_bgrewrite_status:err
aof_rewrites:15
aof_rewrites_consecutive_failures:1
aof_last_write_status:ok
aof_last_cow_size:9662464
module_fork_in_progress:0
module_fork_last_cow_size:0
aof_current_size:4554760198
aof_base_size:4554760198
aof_pending_rewrite:0
aof_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0

Stats

total_connections_received:23970
total_commands_processed:27994205
instantaneous_ops_per_sec:0
total_net_input_bytes:203878587529
total_net_output_bytes:3754294534
total_net_repl_input_bytes:198573434482
total_net_repl_output_bytes:0
instantaneous_input_kbps:28145.24
instantaneous_output_kbps:0.04
instantaneous_input_repl_kbps:28145.24
instantaneous_output_repl_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:208
expired_stale_perc:0.00
expired_time_cap_reached_count:0
expire_cycle_cpu_milliseconds:4520
evicted_keys:0
evicted_clients:0
total_eviction_exceeded_time:0
current_eviction_exceeded_time:0
keyspace_hits:42417236
keyspace_misses:127
pubsub_channels:0
pubsub_patterns:0
pubsubshard_channels:0
latest_fork_usec:272648
total_forks:29
migrate_cached_sockets:0
slave_expires_tracked_keys:0
active_defrag_hits:0
active_defrag_misses:0
active_defrag_key_hits:0
active_defrag_key_misses:0
total_active_defrag_time:0
current_active_defrag_time:0
tracking_total_keys:0
tracking_total_items:0
tracking_total_prefixes:0
unexpected_error_replies:0
total_error_replies:84626
dump_payload_sanitizations:0
total_reads_processed:4239795
total_writes_processed:10007911
io_threaded_reads_processed:0
io_threaded_writes_processed:0
reply_buffer_shrinks:16904
reply_buffer_expands:28292
eventloop_cycles:26297715
eventloop_duration_sum:2227529806
eventloop_duration_cmd_sum:51513414
instantaneous_eventloop_cycles_per_sec:3214
instantaneous_eventloop_duration_usec:32
acl_access_denied_auth:0
acl_access_denied_cmd:0
acl_access_denied_key:0
acl_access_denied_channel:0

Replication

role:slave
master_host:x.x.x.x
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_read_repl_offset:45067404516
slave_repl_offset:45067384753
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:PQRS
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:45067384753
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:104857600
repl_backlog_first_byte_offset:45067226380
repl_backlog_histlen:158374

CPU

used_cpu_sys:957.055096
used_cpu_user:1849.819429
used_cpu_sys_children:216.995087
used_cpu_user_children:2547.199367
used_cpu_sys_main_thread:932.174448
used_cpu_user_main_thread:1814.793141

Modules

module:name=bf,ver=20612,api=1,filters=0,usedby=[],using=[],options=[]
module:name=search,ver=20813,api=1,filters=0,usedby=[],using=[ReJSON],options=[handle-io-errors]
module:name=ReJSON,ver=20610,api=1,filters=0,usedby=[search],using=[],options=[handle-io-errors]
module:name=timeseries,ver=11012,api=1,filters=0,usedby=[],using=[],options=[]

Commandstats

cmdstat_scan:calls=10,usec=24723,usec_per_call=2472.30,rejected_calls=0,failed_calls=0
cmdstat_FT._LIST:calls=1,usec=119,usec_per_call=119.00,rejected_calls=0,failed_calls=0
cmdstat_del:calls=0,usec=0,usec_per_call=0.00,rejected_calls=1,failed_calls=0
cmdstat_publish:calls=27832351,usec=47708746,usec_per_call=1.71,rejected_calls=1,failed_calls=0
cmdstat_bgrewriteaof:calls=11,usec=15,usec_per_call=1.36,rejected_calls=0,failed_calls=0
cmdstat_module|list:calls=5,usec=29,usec_per_call=5.80,rejected_calls=0,failed_calls=0
cmdstat_info:calls=70725,usec=808012,usec_per_call=11.42,rejected_calls=0,failed_calls=0
cmdstat_replicaof:calls=2,usec=573,usec_per_call=286.50,rejected_calls=8,failed_calls=0
cmdstat_dbsize:calls=3,usec=3,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_client|setname:calls=21556,usec=34498,usec_per_call=1.60,rejected_calls=0,failed_calls=0
cmdstat_client|list:calls=5,usec=557,usec_per_call=111.40,rejected_calls=0,failed_calls=0
cmdstat_get:calls=128,usec=330,usec_per_call=2.58,rejected_calls=21043,failed_calls=0
cmdstat_psubscribe:calls=394,usec=1415,usec_per_call=3.59,rejected_calls=0,failed_calls=0
cmdstat_punsubscribe:calls=2,usec=35,usec_per_call=17.50,rejected_calls=0,failed_calls=0
cmdstat_command|docs:calls=16,usec=23126,usec_per_call=1445.38,rejected_calls=0,failed_calls=0
cmdstat_quit:calls=11,usec=11,usec_per_call=1.00,rejected_calls=0,failed_calls=0
cmdstat_select:calls=3,usec=4,usec_per_call=1.33,rejected_calls=0,failed_calls=0
cmdstat_config|get:calls=42359,usec=91124,usec_per_call=2.15,rejected_calls=0,failed_calls=0
cmdstat_type:calls=12,usec=16,usec_per_call=1.33,rejected_calls=0,failed_calls=0
cmdstat_cluster|nodes:calls=127,usec=124,usec_per_call=0.98,rejected_calls=21043,failed_calls=127
cmdstat_echo:calls=21526,usec=8250,usec_per_call=0.38,rejected_calls=0,failed_calls=0
cmdstat_bgsave:calls=11,usec=2800561,usec_per_call=254596.45,rejected_calls=0,failed_calls=0
cmdstat_ping:calls=4462,usec=9952,usec_per_call=2.23,rejected_calls=21231,failed_calls=0
cmdstat_memory|usage:calls=12,usec=104,usec_per_call=8.67,rejected_calls=0,failed_calls=0
cmdstat_ttl:calls=12,usec=8,usec_per_call=0.67,rejected_calls=0,failed_calls=0
cmdstat_subscribe:calls=383,usec=1124,usec_per_call=2.93,rejected_calls=0,failed_calls=0

Errorstats

errorstat_ERR:count=21300
errorstat_LOADING:count=63325
errorstat_READONLY:count=1

Latencystats

latency_percentiles_usec_scan:p50=2179.071,p99=4390.911,p99.9=4390.911
latency_percentiles_usec_FT._LIST:p50=119.295,p99=119.295,p99.9=119.295
latency_percentiles_usec_publish:p50=1.003,p99=18.047,p99.9=31.103
latency_percentiles_usec_bgrewriteaof:p50=1.003,p99=2.007,p99.9=2.007
latency_percentiles_usec_module|list:p50=5.023,p99=12.031,p99.9=12.031
latency_percentiles_usec_info:p50=5.023,p99=76.287,p99.9=173.055
latency_percentiles_usec_replicaof:p50=255.999,p99=319.487,p99.9=319.487
latency_percentiles_usec_dbsize:p50=1.003,p99=1.003,p99.9=1.003
latency_percentiles_usec_client|setname:p50=1.003,p99=20.095,p99.9=74.239
latency_percentiles_usec_client|list:p50=67.071,p99=214.015,p99.9=214.015
latency_percentiles_usec_get:p50=2.007,p99=19.071,p99.9=20.095
latency_percentiles_usec_psubscribe:p50=2.007,p99=26.111,p99.9=35.071
latency_percentiles_usec_punsubscribe:p50=4.015,p99=31.103,p99.9=31.103
latency_percentiles_usec_command|docs:p50=1392.639,p99=1826.815,p99.9=1826.815
latency_percentiles_usec_quit:p50=1.003,p99=2.007,p99.9=2.007
latency_percentiles_usec_select:p50=1.003,p99=2.007,p99.9=2.007
latency_percentiles_usec_config|get:p50=1.003,p99=22.015,p99.9=81.407
latency_percentiles_usec_type:p50=1.003,p99=2.007,p99.9=2.007
latency_percentiles_usec_cluster|nodes:p50=1.003,p99=2.007,p99.9=2.007
latency_percentiles_usec_echo:p50=0.001,p99=1.003,p99.9=21.119
latency_percentiles_usec_bgsave:p50=253755.391,p99=265289.727,p99.9=265289.727
latency_percentiles_usec_ping:p50=1.003,p99=22.015,p99.9=41.215
latency_percentiles_usec_memory|usage:p50=5.023,p99=36.095,p99.9=36.095
latency_percentiles_usec_ttl:p50=0.001,p99=2.007,p99.9=2.007
latency_percentiles_usec_subscribe:p50=2.007,p99=24.063,p99.9=70.143

Cluster

cluster_enabled:0

Keyspace

db0:keys=3171596,expires=1973,avg_ttl=0

------ CLIENT LIST OUTPUT ------

------ CURRENT CLIENT INFO ------
id=155259 addr=x.x.x.x:6379 laddr=10.0.5.8:59762 fd=12 name= age=0 idle=0 flags=M db=0 sub=0 psub=0 ssub=0 multi=-1 qbuf=40954 qbuf-free=0 argv-mem=20 multi-mem=0 rbs=16384 rbp=45 obl=0 oll=0 omem=0 tot-mem=58276 events=r cmd=del user=(superuser) redir=-1 resp=2 lib-name= lib-ver=
argc: '2'
argv[0]: '"DEL"'
argv[1]: '"rl:PQRS"'
32214:S 15 Apr 2024 15:08:17.767 # key 'rl:PQRS' found in DB containing the following object:
32214:S 15 Apr 2024 15:08:17.767 # Object type: 0
32214:S 15 Apr 2024 15:08:17.767 # Object encoding: 1
32214:S 15 Apr 2024 15:08:17.767 # Object refcount: 2147483647

------ MODULES INFO OUTPUT ------

search_version

search_version:2.8.13
search_redis_version:7.2.4 - oss

search_index

search_number_of_indexes:153

search_fields_statistics

search_fields_text:Text=7
search_fields_numeric:Numeric=772,Sortable=310
search_fields_tag:Tag=1585,Sortable=5

search_dialect_statistics

search_dialect_1:0
search_dialect_2:0
search_dialect_3:0
search_dialect_4:0

search_runtime_configurations

search_concurrent_mode:OFF
search_enableGC:ON
search_minimal_term_prefix:2
search_maximal_prefix_expansions:200
search_query_timeout_ms:500
search_timeout_policy:return
search_cursor_read_size:1000
search_cursor_max_idle_time:300000
search_max_doc_table_size:1000000
search_max_search_results:1000000
search_max_aggregate_results:-1
search_search_pool_size:20
search_index_pool_size:8
search_gc_scan_size:100
search_min_phonetic_term_length:3

ReJSON_trace

ReJSON_backtrace: 0: redis_module::basic_info_command_handler
1: rejson::__info_func
2: modulesCollectInfo
at /__w/redis-stack/redis-stack/redis/src/module.c:10293:9
3: logModulesInfo
at /__w/redis-stack/redis-stack/redis/src/debug.c:1900:22
printCrashReport
at /__w/redis-stack/redis-stack/redis/src/debug.c:2183:5
4: _serverPanic
at /__w/redis-stack/redis-stack/redis/src/debug.c:1158:9
5: processCommand
at /__w/redis-stack/redis-stack/redis/src/server.c:4029:17
6: processCommandAndResetClient
at /__w/redis-stack/redis-stack/redis/src/networking.c:2466:9
processInputBuffer
at /__w/redis-stack/redis-stack/redis/src/networking.c:2574:17
7: readQueryFromClient
at /__w/redis-stack/redis-stack/redis/src/networking.c:2713:9
8: callHandler
at /__w/redis-stack/redis-stack/redis/src/connhelpers.h:79:18
connSocketEventHandler
at /__w/redis-stack/redis-stack/redis/src/socket.c:298:14
9: aeProcessEvents
at /__w/redis-stack/redis-stack/redis/src/ae.c:436:17
aeMain
at /__w/redis-stack/redis-stack/redis/src/ae.c:496:9
10: main
at /__w/redis-stack/redis-stack/redis/src/server.c:7360:5
11:
12: __libc_start_main
13: _start

------ CONFIG DEBUG OUTPUT ------
lazyfree-lazy-eviction no
slave-read-only yes
proto-max-bulk-len 512mb
io-threads-do-reads no
lazyfree-lazy-server-del no
activedefrag no
lazyfree-lazy-expire no
lazyfree-lazy-user-del no
replica-read-only yes
io-threads 1
lazyfree-lazy-user-flush no
client-query-buffer-limit 1gb
sanitize-dump-payload no
repl-diskless-load disabled
list-compress-depth 0
repl-diskless-sync yes

------ FAST MEMORY TEST ------
32214:S 15 Apr 2024 15:08:17.933 # Bio worker thread #0 terminated
32214:S 15 Apr 2024 15:08:17.933 # Bio worker thread #1 terminated
32214:S 15 Apr 2024 15:08:17.934 # Bio worker thread #2 terminated
*** Preparing to test memory region 5bcd75694000 (704512 bytes)
*** Preparing to test memory region 5bcd76348000 (454656 bytes)
*** Preparing to test memory region 7d72a4000000 (135168 bytes)
*** Preparing to test memory region 7d72ac000000 (135168 bytes)
*** Preparing to test memory region 7d72b0600000 (4194304 bytes)
*** Preparing to test memory region 7d72b0c00000 (8388608 bytes)
*** Preparing to test memory region 7d72b1400000 (2097152 bytes)
*** Preparing to test memory region 7d72b17f0000 (2097152 bytes)
*** Preparing to test memory region 7d72b1a00000 (8388608 bytes)
*** Preparing to test memory region 7d72b2200000 (2097152 bytes)
*** Preparing to test memory region 7d72b24f0000 (9437184 bytes)
*** Preparing to test memory region 7d72b2e00000 (8388608 bytes)
*** Preparing to test memory region 7d72b3600000 (13960740864 bytes)
*** Preparing to test memory region 7d75f3900000 (433061888 bytes)
*** Preparing to test memory region 7d760d6f0000 (15728640 bytes)
*** Preparing to test memory region 7d760e600000 (8388608 bytes)
*** Preparing to test memory region 7d760ee00000 (4194304 bytes)
*** Preparing to test memory region 7d760f3fe000 (8388608 bytes)
*** Preparing to test memory region 7d760fbff000 (8388608 bytes)
*** Preparing to test memory region 7d7610400000 (8388608 bytes)
*** Preparing to test memory region 7d7611229000 (12288 bytes)
*** Preparing to test memory region 7d76113ff000 (8388608 bytes)
*** Preparing to test memory region 7d7611c00000 (8388608 bytes)
*** Preparing to test memory region 7d7612a27000 (28672 bytes)
*** Preparing to test memory region 7d7612f00000 (13631488 bytes)
*** Preparing to test memory region 7d7613e1c000 (53248 bytes)
*** Preparing to test memory region 7d7614441000 (12288 bytes)
*** Preparing to test memory region 7d761451e000 (4096 bytes)
*** Preparing to test memory region 7d7614546000 (20480 bytes)
*** Preparing to test memory region 7d76146db000 (8192 bytes)
.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O.O
Fast memory test PASSED, however your memory can still be broken. Please run a memory test for several hours if possible.

@sundb
Copy link
Collaborator

sundb commented Apr 16, 2024

you might gotta check why rdb saving and aof rewrite are failing,
if we don't let it panic, it could lead to inconsistencies between master and replication.
btw, if you realy wanna ignore this panic, you can disable config replica-ignore-disk-write-errors,

rdb_last_bgsave_status:err
aof_last_bgrewrite_status:err

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants