
Is RMap aware of cluster restart? #5812

Open
lukasz-weareuncapped opened this issue Apr 25, 2024 · 12 comments

@lukasz-weareuncapped

I have a Spring app up and running with an RMap created from RedissonClient.getMap and cached.
Once something happens to a few Redis nodes (we had a forced restart), that cached map no longer works.
What is the recommended approach in that case? Calling redissonClient.getMap each time? As I saw, a new RedissonMap is created under the hood, so that could be time- and resource-consuming. Please advise. Thank you.

@mrniko
Member

mrniko commented Apr 25, 2024

> that cached map does not work anymore

Can you clarify here?

@lukasz-weareuncapped
Author

While reading/writing to the map I got an exception. Unfortunately I have no clue what the exact exception was, due to a lack of logs, but we do read from and write to it.

@mrniko
Member

mrniko commented Apr 25, 2024

Sorry, without details I'm unable to help. Redisson handles failover, and you don't need to call redissonClient.getMap() again after it.

@lukasz-weareuncapped
Author

lukasz-weareuncapped commented Apr 26, 2024

Now I know much more. I've turned on debug logs and can see that Redisson keeps trying to reconnect (I scaled the Redis cluster down to 0 and back up to reproduce the issue). Unfortunately it keeps trying to reconnect to an IP that is no longer active. After the reboot the IP changed, but it looks like the old one is somehow cached, and the client is not able to connect to the newly started Redis nodes. It keeps trying forever with no luck.
Is there any caching mechanism?

```
2024-04-26 11:03:16.519 CEST Connection created [addr=redis://locking-redis-node-0.locking-redis-headless.default.svc.cluster.local:26379]
2024-04-26 11:03:16.618 CEST reconnecting RedisPubSubConnection@395557197 [redisClient=[addr=redis://10.1.69.64:6379], channel=[id: 0x3d2eea36, L:/10.1.75.4:52138 ! R:10.1.69.64/10.1.69.64:6379], currentCommand=null, usage=0] to 10.1.69.64/10.1.69.64:6379
2024-04-26 11:03:16.618 CEST Connection created [addr=redis://10.1.69.64:6379]
2024-04-26 11:03:17.818 CEST connection released for command (INFO REPLICATION) and params [] from slot NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null] using connection RedisConnection@1350729690 [redisClient=[addr=redis://10.1.69.64:6379], channel=[id: 0xc72cb405, L:/10.1.75.4:52132 ! R:10.1.69.64/10.1.69.64:6379], currentCommand=null, usage=0]
2024-04-26 11:03:17.818 CEST attempt 3 for command (INFO REPLICATION) and params [] to NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null]
```

@mrniko
Member

mrniko commented Apr 26, 2024

Which Redisson version?

@lukasz-weareuncapped
Author

lukasz-weareuncapped commented Apr 26, 2024

Redisson client 3.26.1.
Our configuration:

```
"SPRING_DATA_REDIS_PASSWORD": "123",
"SPRING_DATA_REDIS_SENTINEL_MASTER": "mymaster",
"SPRING_DATA_REDIS_SENTINEL_NODES": "locking-redis-headless.default.svc.cluster.local:26379",
```

on Redis from the bitnami chart 17.13.2

@mrniko
Member

mrniko commented Apr 26, 2024

Can you share your Redisson config?

Can you set the trace logging level for the org.redisson package and share the output?
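(For a Spring Boot app like the one described above, the requested trace level can be enabled via standard Spring Boot logging properties; a minimal sketch, an equivalent logger entry in logback.xml works just as well:)

```yaml
# application.yml: raise the log level for the Redisson packages only
logging:
  level:
    org.redisson: TRACE
```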

@lukasz-weareuncapped
Author

lukasz-weareuncapped commented Apr 26, 2024

The Redisson configuration is injected via redisson-spring-boot-starter:

```yaml
spring:
  data:
    redis:
      password: mypass
      sentinel:
        master: mymaster
        nodes: locking-redis-headless.default.svc.cluster.local:26379
```

Traces are on the way...

@lukasz-weareuncapped
Author

lukasz-weareuncapped commented Apr 26, 2024

Explore-logs-2024-04-26 20_28_22.txt -> logs from 15 minutes of the running application while it was trying to reconnect (after restarting all slaves)

and logs from when the Redis slaves were working fine ->
Explore-logs-2024-04-26 20_34_00.txt

@lukasz-weareuncapped
Author

lukasz-weareuncapped commented Apr 29, 2024

I found one exception related to DNS in Netty, and found an issue solved by the Netty team that is related to DNS caching:
netty/netty#11885, which is fixed in 4.1.109 or 4.1.110 (also related to netty/netty#12842), while the latest Redisson, 3.29.0, uses 4.1.104 under the hood.

One more exception I'm facing on a daily basis:

```
ERROR 2024-04-28T02:10:10.190425043Z Can't execute SENTINEL commands on 10.236.135.253/10.236.135.253:26379
java.util.concurrent.CompletionException: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (SENTINEL GET-MASTER-ADDR-BY-NAME), params: [mymaster], Redis client: [addr=redis://xx:26379]
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
	at org.redisson.client.RedisConnection.lambda$async$0(RedisConnection.java:254)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
	at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (SENTINEL GET-MASTER-ADDR-BY-NAME), params: [mymaster], Redis client: [addr=redis://xx:26379]
	at org.redisson.client.RedisConnection.lambda$async$0(RedisConnection.java:253)
	... 7 common frames omitted
```

and

```
org.redisson.client.RedisTimeoutException: Command execution timeout for command: (AUTH), params: (password masked), Redis client: [addr=redis://locking-redis-node-2.locking-redis-headless.default.svc.cluster.local:26379]
java.util.concurrent.CompletionException: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (AUTH), params: (password masked), Redis client: [addr=redis://locking-redis-node-2.locking-redis-headless.default.svc.cluster.local:26379]
	at java.base/java.util.concurrent.CompletableFuture.encodeRelay(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.completeRelay(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
	at org.redisson.client.RedisConnection.lambda$async$0(RedisConnection.java:254)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
	at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (AUTH), params: (password masked), Redis client: [addr=redis://locking-redis-node-2.locking-redis-headless.default.svc.cluster.local:26379]
	at org.redisson.client.RedisConnection.lambda$async$0(RedisConnection.java:253)
	... 7 common frames omitted
```

@mrniko
Member

mrniko commented Apr 29, 2024

> while the latest 3.29.0 redisson uses 4.1.104 under the hood

It uses 4.1.109: https://repo1.maven.org/maven2/org/redisson/redisson-parent/3.29.0/redisson-parent-3.29.0.pom

As for the timeouts, you need to read this: https://github.com/redisson/redisson/wiki/16.-FAQ#q-what-is-the-cause-of-redistimeoutexception. I'm sure it's a network issue.
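(For reference, the timeout behaviour behind RedisTimeoutException is tunable in the Redisson config; a hedged sketch in Redisson's YAML format for a sentinel setup, with illustrative values rather than recommendations:)

```yaml
# redisson.yaml fragment (sentinel mode)
sentinelServersConfig:
  masterName: "mymaster"
  sentinelAddresses:
    - "redis://locking-redis-headless.default.svc.cluster.local:26379"
  # Response timeout for a single Redis command, in milliseconds
  timeout: 3000
  # How many times a failed command is retried before RedisTimeoutException is thrown
  retryAttempts: 3
  # Delay between retries, in milliseconds
  retryInterval: 1500
```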

@mrniko mrniko closed this as completed Apr 29, 2024
@lukasz-weareuncapped
Author

lukasz-weareuncapped commented Apr 30, 2024

Hi @mrniko
Thank you for your advice. We went through the whole network but have not found any problem.

I ran several tests and just made a discovery confirming that it is definitely not a problem with our network.
I set up a scheduled test (placed within the app) that, in a single thread, resolves all my hosts via InetAddress every 5 minutes. Each time we got the right IP; even after restarting all slaves I got the right IP address (while at the same time Redisson was not able to reconnect).
Based on that I came up with the idea that something changed in Redisson recently, and I tried downgrading to 3.26.0, and boom: it started working again as it should.
So basically on version 3.29.0 all our applications have the same issue: they are unable to reconnect and keep constantly trying to make a connection. Could it be that existing connections in the connection pool are reused to "reconnect" instead of new ones being created? (That is the suspicion I started forming when seeing the same IP in the logs after restarts.)
The logs below appear after each slave restart only on 3.26.0, not on 3.29.0.
Now, after a reconnect on 3.26.0 I see in the logs:

```
1 connections initialized for 10.1.65.24/10.1.65.24:6379
master 10.1.65.24/10.1.65.24:6379 used as slave
24 connections initialized for 10.1.65.24/10.1.65.24:6379
sentinel: redis://10.1.65.23:26379 is down
sentinel: redis://10.1.69.26:26379 added
1 connections initialized for locking-redis-node-1.locking-redis-headless.default.svc.cluster.local/10.1.69.26:6379
slave: locking-redis-node-1.locking-redis-headless.default.svc.cluster.local/10.1.69.26:6379 added
24 connections initialized for locking-redis-node-1.locking-redis-headless.default.svc.cluster.local/10.1.69.26:6379
```

What I've spotted on 3.26.0 is that the "connections initialized for" line is printed each time we restart the nodes, i.e. a reconnect actually happens.
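(The periodic DNS check described in this thread can be sketched roughly as follows; the hostname is a placeholder taken from the logs above, and the 5-minute interval mirrors the test the reporter describes:)

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DnsCheck {

    // Resolve each hostname to its current IP, or an "UNRESOLVED" marker on failure.
    static Map<String, String> resolveAll(String... hosts) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String host : hosts) {
            try {
                result.put(host, InetAddress.getByName(host).getHostAddress());
            } catch (UnknownHostException e) {
                result.put(host, "UNRESOLVED");
            }
        }
        return result;
    }

    // Schedule the check every 5 minutes on a single thread, as in the test
    // described in the thread, so DNS answers can be compared against the IPs
    // the client is actually reconnecting to.
    static ScheduledExecutorService startPeriodicCheck(String... hosts) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> resolveAll(hosts).forEach((host, ip) -> System.out.println(host + " -> " + ip)),
                0, 5, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) {
        // One-off resolution; replace with the real node hostnames.
        resolveAll("localhost").forEach((host, ip) -> System.out.println(host + " -> " + ip));
    }
}
```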

@mrniko mrniko reopened this May 1, 2024