Add a rate-limiter to dial() function? #115

Open
bboreham opened this issue Feb 24, 2020 · 2 comments

@bboreham

Every few days, one of my servers issues a kernel log message:

TCP: request_sock_TCP: Possible SYN flooding on port 11211. Sending cookies.  Check SNMP counters.

Mostly processing continues after this, but sometimes the entire server is unresponsive for minutes.

In this environment we have 10 Go programs using gomemcache hitting one memcached server, and each Go program has 60 goroutines that will call through this library. So I expected a maximum of 600 connections at a time.

I have seen the SYN flooding message at the default memcached connection backlog of 1024, and also after I raised it to 4096.

From inspecting logs, packet traces, etc., my impression is that some glitch in processing or in the network causes timeout errors (at the default timeout of 100ms), which in turn cause gomemcache to dial new connections. With 60 goroutines each waiting 100ms per dial attempt, that comes to 600 new connections dialed each second, per process.

If the dial attempts are not being discarded on the other end of the wire, then I think it can quickly go over the backlog limit.

I wondered if gomemcache should have a rate-limiter on dial()? I would prefer gomemcache to fail quickly rather than raising the timeout to slow it down.
Any other insight would be valued.

The only related issue I could find here is #108; interestingly, we are both running the same system.

@bboreham changed the title from "Possible SYN flooding" to "Add a rate-limiter to dial() function?" on Aug 14, 2020
@bboreham

#86 could be used to add a rate-limiter from the outside. Or a circuit-breaker, which is perhaps an even better idea.

@bboreham

Update: I added a circuit-breaker using the code from #86, and the symptoms went away.
I would like to see #86 merged.
