Release Redis 7.2.5 #13272

Merged
merged 14 commits into redis:7.2 from redis-7.2.5 on May 19, 2024
Conversation

YaacovHazan
Collaborator

Upgrade urgency MODERATE: Program an upgrade of the server, but it's not urgent.

Bug fixes

Bug fixes in CLI tools

enjoy-binbin and others added 14 commits May 16, 2024 09:48
…edis#12927)

The test was introduced in redis#10745, but we forgot to add it to
test_helper.tcl, so our CI did not actually run it. This PR adds it and
ensures it passes CI tests.

(cherry picked from commit b351a04)
… FAIL instead of PFAIL (redis#12824)

Fixes an issue where a single primary cannot mark a replica as failed in
a single-shard cluster.

(cherry picked from commit b3aaa0a)
It seems that we forgot to update the array in redis-check-rdb.

(cherry picked from commit f9a0eb6)
…edis#13004)

In redis#11012, we reprocess the command when a client is unblocked on
keys. In some blocking commands, for example the XREADGROUP BLOCK
scenario, re-processing the command recalculates the block timeout,
causing the blocking time to be reset.

This commit adds a new CLIENT_REPROCESSING_COMMAND client flag that
explicitly lets the command know it is being re-processed, so that
later in blockForKeys we do not reset the timeout.

Affected BLOCK cases:
- list / zset / stream, added test cases for each.

Unaffected cases:
- module (never re-process the commands).
- WAIT / WAITAOF (never re-process the commands).

Fixes redis#12998.
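A minimal sketch of the idea, using simplified stand-in types (only the
CLIENT_REPROCESSING_COMMAND flag and the overall flow come from this
commit; the rest is illustrative):

```c
#include <stdint.h>

#define CLIENT_REPROCESSING_COMMAND (1ULL << 0) /* illustrative bit value */

typedef int64_t mstime_t;
typedef struct { uint64_t flags; mstime_t block_timeout; } client;

extern mstime_t mstime(void); /* current time in milliseconds */

/* Only set the blocking deadline on the first execution; a command
 * that is being re-processed keeps its original deadline, so the
 * total blocking time is not reset. */
static void block_for_keys_sketch(client *c, mstime_t timeout_ms) {
    if (!(c->flags & CLIENT_REPROCESSING_COMMAND))
        c->block_timeout = mstime() + timeout_ms;
}
```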

(cherry picked from commit 492021d)
This was introduced in redis#13004, which missed this assignment.
It caused the timeout to be a random value (possibly earlier than now);
then, in the `Unblock by timer` test, the client is unblocked and calls
timeout_callback. Since the callback is NULL, the server crashes.

The crash stack is:
```
beforesleep
handleBlockedClientsTimeout
checkBlockedClientTimeout
unblockClientOnTimeout
replyToBlockedClientTimedOut
moduleBlockedClientTimedOut
-- the timeout_callback is NULL, invalidFunctionWasCalled
bc->timeout_callback(&ctx,(void**)c->argv,c->argc);
```

(cherry picked from commit 45a35a7)
… 4GB (redis#12955)

Fix redis#12864

The main reason for this crash is that when replacing an element of a
quicklist packed node with the lpReplace() method, if the final size is
larger than 4GB, lpReplace() fails and returns NULL, causing
`node->entry` to be incorrectly set to NULL.

Since the inserted data is not a large element, we can't just replace it
the way we replace a large element (first quicklistInsertAfter() and
then quicklistDelIndex()), because the current node may be merged and
invalidated inside quicklistInsertAfter().

The solution in this PR:
When replacing a node fails (the listpack would exceed 4GB), split the
current node, create a new node to put in the middle, and try to merge
them. This is the same as inserting a large element.
In the worst case, its size will not exceed 4GB.
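A hedged sketch of that fallback path; the helper names below are
hypothetical stand-ins for the real quicklist internals:

```c
#include <stddef.h>

typedef struct quicklist quicklist;
typedef struct quicklistNode quicklistNode;

/* Hypothetical helpers standing in for the real internals.
 * splitNodeAt() detaches and returns the tail half of the node. */
extern int replaceWouldExceed4GB(quicklistNode *node, size_t new_sz);
extern void replaceInPlace(quicklistNode *node, int idx, const void *data, size_t sz);
extern quicklistNode *splitNodeAt(quicklist *ql, quicklistNode *node, int idx);
extern quicklistNode *newNodeWithEntry(const void *data, size_t sz);
extern void insertNodeAfter(quicklist *ql, quicklistNode *prev, quicklistNode *n);
extern void tryMergeWithNeighbors(quicklist *ql, quicklistNode *n);

/* If replacing in place would push the listpack past 4GB, split the
 * node at the target index, put the new entry in a fresh node between
 * the two halves, and try to merge; each piece stays under 4GB. */
static void replaceEntrySafely(quicklist *ql, quicklistNode *node,
                               int idx, const void *data, size_t sz) {
    if (!replaceWouldExceed4GB(node, sz)) {
        replaceInPlace(node, idx, data, sz);
        return;
    }
    quicklistNode *rest = splitNodeAt(ql, node, idx); /* tail half */
    quicklistNode *mid = newNodeWithEntry(data, sz);
    insertNodeAfter(ql, node, mid);
    insertNodeAfter(ql, mid, rest);
    tryMergeWithNeighbors(ql, mid);
}
```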

(cherry picked from commit 1f00c95)
…edis#13040)

Fix two crashes introduced by redis#12955.

When a quicklist node can't be inserted into and is split, we eventually
merge the current node with its neighboring nodes after inserting, and
compress the current node and its siblings.

1. When the current node is merged with another node, the current node
may become invalid and can no longer be used.

   Solution: let `_quicklistMergeNodes()` return the merged nodes.

2. If the current node is an LZF quicklist node, its recompress will be
1. If the split node can be merged with a sibling node to become head or
tail, recompress may cause the head and tail to be compressed, which is
not allowed.

    Solution: always set recompress to 0 after merging.
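A hedged sketch of how the two fixes compose (signatures approximate;
`_quicklistMergeNodes()` returning the merged node is the change
described above):

```c
typedef struct quicklist quicklist;
typedef struct quicklistNode { int recompress; } quicklistNode;

/* Fix 1: the merge now returns the surviving node, because the node
 * passed in may be freed during the merge. */
extern quicklistNode *_quicklistMergeNodes(quicklist *ql, quicklistNode *center);
extern void quicklistCompress(quicklist *ql, quicklistNode *node);

static void mergeAndRecompress(quicklist *ql, quicklistNode *node) {
    node = _quicklistMergeNodes(ql, node); /* use the returned node */
    /* Fix 2: clear recompress after merging, so a node that became
     * head or tail is never wrongly re-compressed. */
    node->recompress = 0;
    quicklistCompress(ql, node);
}
```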

(cherry picked from commit 1e8dc1d)
…ues (redis#13053)

These tests have all failed in daily CI:
```
*** [err]: Blocking XREADGROUP for stream key that has clients blocked on stream - reprocessing command in tests/unit/type/stream-cgroups.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BLPOP unblock but the key is expired and then block again - reprocessing command in tests/unit/type/list.tcl
Expected '1101' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)

*** [err]: BZPOPMIN unblock but the key is expired and then block again - reprocessing command in tests/unit/type/zset.tcl
Expected '1103' to be between to '1000' and '1100' (context: type eval line 23 cmd {assert_range [expr $end-$start] 1000 1100} proc ::test)
```

Increase the range to avoid failures, and improve the comment to be
clearer. The tests were introduced in redis#13004.

(cherry picked from commit 32f44da)
… --pattern was also used (redis#13092)

The --count option for redis-cli was released in Redis 7.2 (redis#12042).
But I found in the code that some logic was missing for this 'count'
option:

```
static redisReply *sendScan(unsigned long long *it) {
    redisReply *reply;

    if (config.pattern)
        reply = redisCommand(context, "SCAN %llu MATCH %b COUNT %d",
            *it, config.pattern, sdslen(config.pattern), config.count);
    else
        /* BUG: config.count is ignored when no --pattern is given. */
        reply = redisCommand(context,"SCAN %llu",*it);
```

The intention was to be able to use the SCAN count, but in this case
--count was only applied when --pattern was declared.
So I fixed it simply, so that it works properly even if the --pattern
option is not used, as sketched below.
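Given the snippet above, the likely shape of the fix is simply adding
COUNT to the no-pattern branch (a sketch, not necessarily the exact
committed diff):

```
static redisReply *sendScan(unsigned long long *it) {
    redisReply *reply;

    if (config.pattern)
        reply = redisCommand(context, "SCAN %llu MATCH %b COUNT %d",
            *it, config.pattern, sdslen(config.pattern), config.count);
    else
        reply = redisCommand(context, "SCAN %llu COUNT %d",
            *it, config.count); /* COUNT now applied without MATCH too */
```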

I tested it simply with the time command several times, and I could see
it works as intended with this commit.
Example test results are below:
```
# unstable build

time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)

real    0m1.287s
user    0m0.011s
sys     0m0.022s

# count is not applied
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)

real    0m1.117s
user    0m0.011s
sys     0m0.020s

# count is applied with --pattern

time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)

real    0m0.045s
user    0m0.002s
sys     0m0.002s
```

```
# fix-redis-cli-scan-count build
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan >/dev/null 2>/dev/null)

real    0m1.084s
user    0m0.008s
sys     0m0.024s

# count is applied even if --pattern is not declared
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 >/dev/null 2>/dev/null)

real    0m0.043s
user    0m0.000s
sys     0m0.004s

# count is of course also applied together with --pattern
time(./redis-cli -a $AUTH -p $PORT -h $HOST --scan --count 1000 --pattern "hash:*" >/dev/null 2>/dev/null)

real    0m0.031s
user    0m0.002s
sys     0m0.002s
```

Thanks a lot.

(cherry picked from commit 763827c)
…edis#13111)

The `CONFIG SET oom-score-adj handles configuration failures` test
failed in some CI jobs today.
Failed CI: https://github.com/redis/redis/actions/runs/8152519326

Not sure why the GitHub Actions docker image permissions have changed,
but the issue is similar to redis#12887,
where we can't assume the range of oom_score_adj that a user can change.

## Solution:
Modify the way we determine whether the current user has no privileges,
instead of relying on whether the user id is 0 or not.

(cherry picked from commit 9738ba9)
Since lua_Number is not explicitly an integer or a double, we need to
make an effort to convert it to an integer when that's possible, since
the string could later be used in a context that doesn't support
scientific notation (e.g. 1e9 instead of 1000000000).

Since fpconv_dtoa converts numbers with the equivalent of `%f` or `%e`,
whichever is shorter, this would break if we try to pass a long integer
number to a command that takes an integer: we'd get an implicit
conversion to string in Lua, and then the parsing in
getLongLongFromObjectOrReply would fail.

```
> eval "redis.call('hincrby', 'key', 'field', '1000000000')" 0
(nil)
> eval "redis.call('hincrby', 'key', 'field', tonumber('1000000000'))" 0
(error) ERR value is not an integer or out of range script: ac99c32e4daf7e300d593085b611de261954a946, on @user_script:1.
```

Switch to using ll2string if the number can be safely represented as a
long long.
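A minimal sketch of that rule (ll2string/fpconv_dtoa are the Redis
helpers named above; here replaced with stdio equivalents for a
self-contained example):

```c
#include <limits.h>
#include <stdio.h>

/* If the Lua number is integral and fits in a long long, format it as
 * an integer (ll2string in Redis); otherwise fall back to double
 * formatting, which may produce e-notation (fpconv_dtoa in Redis). */
static void luaNumberToString(double num, char *buf, size_t buflen) {
    if (num >= (double)LLONG_MIN && num < (double)LLONG_MAX &&
        num == (double)(long long)num) {
        snprintf(buf, buflen, "%lld", (long long)num); /* "1000000000" */
    } else {
        snprintf(buf, buflen, "%.17g", num); /* e.g. "1.5" or "1e+100" */
    }
}
```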

The problem was introduced in redis#10587 (Redis 7.2).
Closes redis#13113.

---------

Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 5fdaa53)
…s MP-AOF (redis#12958)

The check in fileIsManifest misjudged the manifest file. For example,
if a RESP AOF contains "file", it will be considered a manifest file and
the check will fail:
```
*3
$3
set
$4
file
$4
file
```

In redis#12951, if the preamble AOF also contains it, it will also fail.
Fixes redis#12951.

The bug happened if the word "file" was mentioned in the first 1024
lines of the AOF. Now, as soon as the check finds a non-comment line, it
breaks out, whether or not that line contains "file".
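A simplified sketch of the corrected check (the real function works on
the AOF manifest format, so treat the details here as illustrative):

```c
#include <stdio.h>
#include <string.h>

/* Scan at most the first 1024 lines; skip comment lines, and let the
 * FIRST non-comment line decide whether this looks like a manifest,
 * instead of matching "file" anywhere in the header. */
static int fileIsManifestSketch(FILE *fp) {
    char buf[1024];
    int is_manifest = 0;
    for (int i = 0; i < 1024 && fgets(buf, sizeof(buf), fp); i++) {
        if (buf[0] == '#') continue;                /* manifest comment */
        is_manifest = strncmp(buf, "file", 4) == 0; /* first real line */
        break;
    }
    return is_manifest;
}
```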

(cherry picked from commit da727ad)
In `beginResultEmission`, -1 means the result length is not known in
advance. But after redis#12185, if we pass -1 to `zrangeResultBeginStore`, it
will convert to SIZE_MAX in `zsetTypeCreate` and try to `dictExpand`.
Although `dictExpand` won't succeed because the size overflows, I think
we'd better avoid this wrong conversion.

This bug can be triggered when the source of `zrangestore` doesn't exist
or when the `zrangestore` command is used with `byscore` or `bylex`.
The impact is that dst keys will be converted to use skiplist instead of
listpack.
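A hedged sketch of the guard (signature simplified; `zsetTypeCreate`
below is a stand-in declaration, not the real prototype):

```c
#include <stddef.h>

typedef struct robj robj;
extern robj *zsetTypeCreate(size_t size_hint); /* simplified stand-in */

/* -1 means "length unknown in advance"; pass 0 as the presize hint
 * instead of letting the negative value wrap to SIZE_MAX and reach
 * dictExpand. */
static robj *createDestZset(long length) {
    size_t hint = (length >= 0) ? (size_t)length : 0;
    return zsetTypeCreate(hint);
}
```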

(cherry picked from commit bad33f8)
@YaacovHazan merged commit f60370c into redis:7.2 on May 19, 2024
39 of 42 checks passed
@YaacovHazan deleted the redis-7.2.5 branch on May 19, 2024 at 06:12