
Is RMap aware of cluster restart? #5812

Open
lukasz-weareuncapped opened this issue Apr 25, 2024 · 12 comments

@lukasz-weareuncapped

I have a Spring app up and running with an RMap created from RedissonClient.getMap and cached.
Once something happens to a few Redis nodes (we had a forced restart), that cached map no longer works.
What is the recommended approach in that case? Calling redissonClient.getMap each time? As I saw, a new RedissonMap is created under the hood, so that could be time- and resource-consuming. Please advise. Thank you.

@mrniko
Member

mrniko commented Apr 25, 2024

> that cached map does not work anymore

Can you clarify here?

@lukasz-weareuncapped
Author

While reading/writing to the map I got an exception. Unfortunately I have no clue what the exact exception was, due to a lack of logs, but we do read from and write to it.

@mrniko
Member

mrniko commented Apr 25, 2024

Sorry, without details I'm unable to help. Redisson handles failover, and you don't need to call redissonClient.getMap() again after it.

@lukasz-weareuncapped
Author

lukasz-weareuncapped commented Apr 26, 2024

Now I know much more. I've turned on debug logs and can see that Redisson keeps trying to reconnect (I scaled the Redis cluster down to 0 and back up to reproduce the issue). Unfortunately it keeps trying to reconnect to an IP that is no longer active. After the reboot the IP changed, but it looks like the old one is somehow cached, and the client is not able to connect to the newly started Redis nodes. It keeps trying forever with no luck.
Is there any caching mechanism?

```
2024-04-26 11:03:16.519 CEST Connection created [addr=redis://locking-redis-node-0.locking-redis-headless.default.svc.cluster.local:26379]
2024-04-26 11:03:16.618 CEST reconnecting RedisPubSubConnection@395557197 [redisClient=[addr=redis://10.1.69.64:6379], channel=[id: 0x3d2eea36, L:/10.1.75.4:52138 ! R:10.1.69.64/10.1.69.64:6379], currentCommand=null, usage=0] to 10.1.69.64/10.1.69.64:6379
2024-04-26 11:03:16.618 CEST Connection created [addr=redis://10.1.69.64:6379]
2024-04-26 11:03:17.818 CEST connection released for command (INFO REPLICATION) and params [] from slot NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null] using connection RedisConnection@1350729690 [redisClient=[addr=redis://10.1.69.64:6379], channel=[id: 0xc72cb405, L:/10.1.75.4:52132 ! R:10.1.69.64/10.1.69.64:6379], currentCommand=null, usage=0]
2024-04-26 11:03:17.818 CEST attempt 3 for command (INFO REPLICATION) and params [] to NodeSource [slot=0, addr=null, redisClient=null, redirect=null, entry=null]
```

@mrniko
Member

mrniko commented Apr 26, 2024

Which Redisson version?

@lukasz-weareuncapped
Author

lukasz-weareuncapped commented Apr 26, 2024

Redisson client 3.26.1.
Our configuration:

```
"SPRING_DATA_REDIS_PASSWORD": "123",
"SPRING_DATA_REDIS_SENTINEL_MASTER": "mymaster",
"SPRING_DATA_REDIS_SENTINEL_NODES": "locking-redis-headless.default.svc.cluster.local:26379",
```

on Redis from the bitnami chart 17.13.2

@mrniko
Member

mrniko commented Apr 26, 2024

Can you share your Redisson config?

Can you set the trace logging level for the org.redisson package and share the output?
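(For a Spring Boot app like the one described above, the requested trace level can be enabled via standard Spring Boot logging properties; a minimal sketch, an equivalent logger entry in logback.xml works just as well:)

```yaml
# application.yml: raise the log level for the Redisson packages only
logging:
  level:
    org.redisson: TRACE
```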

@lukasz-weareuncapped
Author

lukasz-weareuncapped commented Apr 26, 2024

The Redisson configuration is injected via redisson-spring-boot-starter:

```yaml
spring:
  data:
    redis:
      password: mypass
      sentinel:
        master: mymaster
        nodes: locking-redis-headless.default.svc.cluster.local:26379
```

Traces are on the way...

@lukasz-weareuncapped
Author

lukasz-weareuncapped commented Apr 26, 2024

Explore-logs-2024-04-26 20_28_22.txt -> logs from 15 minutes of the running application while it was trying to reconnect (after restarting all slaves)

and logs from when the Redis slaves were working fine ->
Explore-logs-2024-04-26 20_34_00.txt

@lukasz-weareuncapped
Author

lukasz-weareuncapped commented Apr 29, 2024

I found one exception related to DNS in Netty, and found an issue solved by the Netty team that is related to DNS caching:
netty/netty#11885, which is fixed in 4.1.109 or 4.1.110 (also related to netty/netty#12842), while the latest Redisson, 3.29.0, uses 4.1.104 under the hood.

One more exception I'm facing on a daily basis:

```
ERROR 2024-04-28T02:10:10.190425043Z Can't execute SENTINEL commands on 10.236.135.253/10.236.135.253:26379
java.util.concurrent.CompletionException: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (SENTINEL GET-MASTER-ADDR-BY-NAME), params: [mymaster], Redis client: [addr=redis://xx:26379]
	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture$UniCompose.tryFire(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
	at org.redisson.client.RedisConnection.lambda$async$0(RedisConnection.java:254)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
	at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (SENTINEL GET-MASTER-ADDR-BY-NAME), params: [mymaster], Redis client: [addr=redis://xx:26379]
	at org.redisson.client.RedisConnection.lambda$async$0(RedisConnection.java:253)
	... 7 common frames omitted
```

and

```
org.redisson.client.RedisTimeoutException: Command execution timeout for command: (AUTH), params: (password masked), Redis client: [addr=redis://locking-redis-node-2.locking-redis-headless.default.svc.cluster.local:26379]
java.util.concurrent.CompletionException: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (AUTH), params: (password masked), Redis client: [addr=redis://locking-redis-node-2.locking-redis-headless.default.svc.cluster.local:26379]
	at java.base/java.util.concurrent.CompletableFuture.encodeRelay(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.completeRelay(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture$UniRelay.tryFire(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(Unknown Source)
	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source)
	at org.redisson.client.RedisConnection.lambda$async$0(RedisConnection.java:254)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
	at io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: org.redisson.client.RedisTimeoutException: Command execution timeout for command: (AUTH), params: (password masked), Redis client: [addr=redis://locking-redis-node-2.locking-redis-headless.default.svc.cluster.local:26379]
	at org.redisson.client.RedisConnection.lambda$async$0(RedisConnection.java:253)
	... 7 common frames omitted
```

@mrniko
Member

mrniko commented Apr 29, 2024

> while the latest 3.29.0 redisson uses 4.1.104 under the hood

It uses 4.1.109: https://repo1.maven.org/maven2/org/redisson/redisson-parent/3.29.0/redisson-parent-3.29.0.pom

As for the timeouts, you need to read this: https://github.com/redisson/redisson/wiki/16.-FAQ#q-what-is-the-cause-of-redistimeoutexception. I'm sure it's a network issue.
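(For reference, the timeout behaviour behind RedisTimeoutException is tunable in the Redisson config; a hedged sketch in Redisson's YAML format for a sentinel setup, with illustrative values rather than recommendations:)

```yaml
# redisson.yaml fragment (sentinel mode)
sentinelServersConfig:
  masterName: "mymaster"
  sentinelAddresses:
    - "redis://locking-redis-headless.default.svc.cluster.local:26379"
  # Response timeout for a single Redis command, in milliseconds
  timeout: 3000
  # How many times a failed command is retried before RedisTimeoutException is thrown
  retryAttempts: 3
  # Delay between retries, in milliseconds
  retryInterval: 1500
```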

@mrniko mrniko closed this as completed Apr 29, 2024
@lukasz-weareuncapped
Author

lukasz-weareuncapped commented Apr 30, 2024

Hi @mrniko
Thank you for your advice. We went through the whole network but have not found any problem.

I ran several tests and just made a discovery confirming that it is definitely not a problem with our network.
I set up a scheduled test (placed within the app) that, in a single thread, resolves all my hosts via InetAddress every 5 minutes. Each time we got the right IP; even after restarting all slaves I got the right IP address (while at the same time Redisson was not able to reconnect).
Based on that I came up with the idea that something changed in Redisson recently, and I tried downgrading to 3.26.0, and boom: it started working again as it should.
So basically on version 3.29.0 all our applications have the same issue: they are unable to reconnect and keep constantly trying to make a connection. Could it be that existing connections in the connection pool are reused to "reconnect" instead of new ones being created? (That is the suspicion I started forming when seeing the same IP in the logs after restarts.)
The logs below appear after each slave restart only on 3.26.0, not on 3.29.0.
Now, after a reconnect on 3.26.0 I see in the logs:

```
1 connections initialized for 10.1.65.24/10.1.65.24:6379
master 10.1.65.24/10.1.65.24:6379 used as slave
24 connections initialized for 10.1.65.24/10.1.65.24:6379
sentinel: redis://10.1.65.23:26379 is down
sentinel: redis://10.1.69.26:26379 added
1 connections initialized for locking-redis-node-1.locking-redis-headless.default.svc.cluster.local/10.1.69.26:6379
slave: locking-redis-node-1.locking-redis-headless.default.svc.cluster.local/10.1.69.26:6379 added
24 connections initialized for locking-redis-node-1.locking-redis-headless.default.svc.cluster.local/10.1.69.26:6379
```

What I've spotted on 3.26.0 is that the "connections initialized for" line is printed each time we restart the nodes, i.e. a reconnect actually happens.
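(The periodic DNS check described in this thread can be sketched roughly as follows; the hostname is a placeholder taken from the logs above, and the 5-minute interval mirrors the test the reporter describes:)

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DnsCheck {

    // Resolve each hostname to its current IP, or an "UNRESOLVED" marker on failure.
    static Map<String, String> resolveAll(String... hosts) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String host : hosts) {
            try {
                result.put(host, InetAddress.getByName(host).getHostAddress());
            } catch (UnknownHostException e) {
                result.put(host, "UNRESOLVED");
            }
        }
        return result;
    }

    // Schedule the check every 5 minutes on a single thread, as in the test
    // described in the thread, so DNS answers can be compared against the IPs
    // the client is actually reconnecting to.
    static ScheduledExecutorService startPeriodicCheck(String... hosts) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> resolveAll(hosts).forEach((host, ip) -> System.out.println(host + " -> " + ip)),
                0, 5, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) {
        // One-off resolution; replace with the real node hostnames.
        resolveAll("localhost").forEach((host, ip) -> System.out.println(host + " -> " + ip));
    }
}
```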

@mrniko mrniko reopened this May 1, 2024