Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serious keepalive issue #67

Open
netvipe opened this issue Feb 20, 2020 · 6 comments
Open

Serious keepalive issue #67

netvipe opened this issue Feb 20, 2020 · 6 comments

Comments

@netvipe
Copy link

netvipe commented Feb 20, 2020

I've migrated my project from Yamux to Smux a couple of weeks ago. Most reasionly due to the memory alloc behavior of Yamux
After facing some issue, I thought everything is smooth and stable.
But the included keepalive functions seems to be far from that.

A debugging revealed that the notifyBucket/check dataReady approch is not reliable under heavy load and with higher RTT.
The async notification works, but dataReady is sometimes not correctly set for the timeout checks.
It is also very important to mention that this only happens when testing under real life conditions.
My local test-cases are all fine, but testing between remote devices reveals the mentioned issue.

One mitigation would be removing the s.Close() right after the dataReady checks.
Another would be replacing the check by a more reliable approach.
There would also be the option to disable keepalives and implement a real Ping() function.

I've chosen the last option.

Could you please check if a patch integration makes sense or alternatively provide a fix for the keepalive.

Thanks!

Here is my ping patch:
ping_patch.txt

@xtaci
Copy link
Owner

xtaci commented Feb 20, 2020

394 if !atomic.CompareAndSwapInt32(&s.dataReady, 1, 0) {

so if this line fails, there is no incoming data, OR(for extreme case), the goroutine for recvLoop() hasn't been scheduled to ?

@xtaci
Copy link
Owner

xtaci commented Feb 20, 2020

for one special case:

the caller of smux don't read the incoming data, then , it blocks on :
309 for atomic.LoadInt32(&s.bucket) <= 0 && !s.IsClosed() {

@xtaci
Copy link
Owner

xtaci commented Feb 20, 2020

80072d9

@netvipe
Copy link
Author

netvipe commented Feb 20, 2020

Your patch loooks like a good option for approach 2!

Any comments on the Ping()?

@xtaci
Copy link
Owner

xtaci commented Feb 20, 2020

Pong is not required, IMHO, 'cause we don't need RTT.

@netvipe
Copy link
Author

netvipe commented Feb 20, 2020

You may don't need it, but I'm using it and there have been plenty of other requests.
In addition, it's nice to have a connection control apart from the non-reliable IsClosed() to managed adjacent processes.
But okay. So I'll have to managed my own branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants