Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Particular configuration can cause mismatched responses #1113

Closed
patrickwhite256 opened this issue Aug 1, 2019 · 7 comments · Fixed by #1116
Closed

Particular configuration can cause mismatched responses #1113

patrickwhite256 opened this issue Aug 1, 2019 · 7 comments · Fixed by #1116

Comments

@patrickwhite256
Copy link
Contributor

Using the ClusterClient with ReadOnly set and MaxRetries > 0 can lead to serious bugged client state.

In short, if a timeout or other retryable error occurs, the connection can buffer responses incorrectly, so you can get into a state where your code is:

msg, err := client.Echo("message").Result()
log.Println("msg:", msg) // should print "msg: message"
log.Println("err:", err) // should print: "err: <nil>"

can in fact print "msg: OK" and others.

This can lead to states where the client is unknowingly returning totally incorrect, but valid-looking responses with no error:

Within Redis server: <AAAA: "hello", BBBB: "world">
msg, err = client.Get("AAAA").Result() // "OK", <nil>
msg, err = client.Get("BBBB").Result() // "hello", <nil>

I've created https://github.com/patrickwhite256/goredis-bug-demo which contains a fake redis "server" and a script that can trip the bug. Because it's not a real Redis server, it's not proof positive of the bug, so I have also included a detailed explanation of what exactly needs to happen to trip the bug all the way down the stack.

I've opened #1112 to address this.

@vmihailenco
Copy link
Collaborator

vmihailenco commented Aug 2, 2019

Hi,

I've not done as thorough check as you so please forgive me if I miss something.

But SingleConnPool is only created in single place to init the connection. If there are any errors the connection is removed from the pool. After net.Conn is initialized the SingleConnPool is discarded and is not used anywhere else.

So you probably confuse newConn and baseClient.newConn.

I also briefly looked at the program and there is no way the program that executes single command and exits can reproduce any bugs in buffering / connection pool. There should not be any buffers / connections next time you run it.

Does that make sense?

@patrickwhite256
Copy link
Contributor Author

patrickwhite256 commented Aug 2, 2019

Have you run the demo, or just looked? If you haven't tried it, I strongly encourage doing so.

That single place where SingleConnPool is used, while initializing the connection, is where the bug occurs - this is why the bug only happens when ReadOnly is enabled - the initialization is skipped otherwise. From the perspective of _getConn, there are no errors, so the connection is not removed from the pool. The README in the repository I linked has a nearly line-by-line explanation.

@vmihailenco
Copy link
Collaborator

Sorry, I've only looked at the program / description. BTW thanks - you've done big work - I appreciate it.

I've slightly changed your fix by providing more robust SingleConnPool implementation from https://github.com/go-pg/pg/blob/master/internal/pool/pool_single.go. Now your example reliably fails but ideally we should also retry by opening new connection. I will see what I can do later...

@patrickwhite256
Copy link
Contributor Author

Thanks for working with me on this. It would be helpful if you could release a new version / tag.

@vmihailenco
Copy link
Collaborator

Okay, I've tagged v7.0.0-beta that you should be able to use with Go Modules and semantic version importing import "github.com/go-redis/redis/v7". See https://github.com/golang/go/wiki/Modules if you are not familiar with Go Modules.

@rcurrier666
Copy link

Any chance we can get this fix back ported to V6? Since we are seeing the problem in production we can't deploy a beta version of the package. Plus we are in the early days of converting to go-modules.

@vmihailenco
Copy link
Collaborator

Also tagged v6.15.4 with the fix for this issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants