Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional feature to set the SO_REUSEADDR option before binding th… #638

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

fabiorush
Copy link

@fabiorush fabiorush commented Jul 25, 2023

Background

Add an optional parameter to set the SO_REUSEADDR option on the TCP socket before binding
it (default false). Can reduce the amount of "bind: address already in use" errors when doing many
connections with the same address and port combination, like the ones that happened on #583.

Checklist

  • Git commit messages conform to community standards.
  • Each Git commit represents meaningful milestones or atomic units of work.
  • Changed or added code is covered by appropriate tests.

…e socket

Signed-off-by: Fábio Urquiza <fabiorush@gmail.com>
@fabiorush fabiorush requested a review from tsenart as a code owner July 25, 2023 15:54
@tsenart
Copy link
Owner

tsenart commented Jul 25, 2023

Hey, thanks for the patch. I have couple question.

  1. Does this work on every OS/architecture pair we ship releases to? If not, that's fine. We just need to document it properly and make sure this only runs on the supported ones. See the latest release for the supported targets: https://github.com/tsenart/vegeta/releases/tag/v12.11.0
  2. Are there any risks / trade-offs involved with setting this on outgoing connections? I've used this when binding to a port to receive incoming requests but never to outgoing connections.

@tsenart
Copy link
Owner

tsenart commented Jul 25, 2023

Here's what GPT4 had to say about this: https://chat.openai.com/share/b515e3dc-6193-4550-8d5d-3ecaa8088755

@fabiorush
Copy link
Author

fabiorush commented Jul 25, 2023

Yeah, I'm aware of the risks and the follow article explain it really well:

https://hea-www.harvard.edu/~fine/Tech/addrinuse.html

I think the main risk is to have a socket A receiving packets that should be received by socket B, but I think it doesn't matter much for the load test itself (it would matter for the report in the end, but I think we could add a note about that).

@fabiorush
Copy link
Author

And this post on stackoverflow talks about the different implementation of the REUSEADDR and REUSEPORT flags on many architectures. In resume the REUSEADDR flag stays the same, while we have some minor variations on the REUSEPORT flag.

https://stackoverflow.com/questions/14388706/how-do-so-reuseaddr-and-so-reuseport-differ

@fabiorush
Copy link
Author

About the suggestions made by GPT4, something we cannot or are not allowed to modify sys net ipv4 parameters. For example, decreasing the tcp_fin_timeout will also affect other services, which may not be desirable.

@tsenart
Copy link
Owner

tsenart commented Jul 30, 2023

I need to convince myself that this is safe, and so far I haven't been able to. Maybe you can help me get there. If someone runs into "bind: address already in use" even given the fact that we're using HTTP keep alive underneath and re-using TCP connections, it like means the server is very slow to respond, or the network really faulty, or a mix. In either case, the TCP connection would still be open, and the client reading a response, so re-using the local address seems unsafe?

Or is it only applicable when the underlying TCP connection is in TIME_WAIT state? In any case, I'd like to write some sort of integration test that exercises this and gives us some more confidence.

@fabiorush
Copy link
Author

Exactly. On our use case we were executing a load test to a load balancer composed of 8 front end servers. Those servers queries a route table located on another server and do a proxy_pass to one internal server out of 20. We were trying to find out what happens when the communication to the route table service becomes unresponsible. Because how the retries were implemented a request that took milliseconds to be responded went to up 5 seconds. We wanted to know if the front end servers would crash or also become unresponsible but we started to runs into "bind: address already in use". The setting of the reuse option fixed this on our case. It only allows the reuse of TIME_WAIT sockets when you are binding to different combination of remote IP address and port. So if you have a socket in TIME_WAIT with the combination LOCAL_ADDR_A:PORT_A <=> REMOTE_ADDR_A:PORT_X and try the same combination again it will not allow, but if use a combination with another remote IP address or port like LOCAL_ADDR_A:PORT_A <=> REMOTE_ADDR_B:PORT_X it will reuse the LOCAL_ADDR_A:PORT_A when binding.

@peterbourgon
Copy link
Contributor

peterbourgon commented Aug 4, 2023

If you're hitting "bind: address in use" issues in your load tests, it typically means you're exhausting the capabilities of the network stack on the host. SO_REUSEADDR and SO_REUSEPORT won't meaningfully impact those constraints. And, in any case, it's a situation that you would — should! — never encounter in a normal deployment, so it doesn't make sense to try to accommodate in a vegeta load test.

edit: vegeta acts as a client, not a server, and SO_REUSEADDR/PORT on the client side is far more restrictive than on the server side. Specifically, it doesn't allow a single physical connection from a given client addr:port to a given server addr:port to mux arbitrary logical connections.

Instead, you want to make sure that vegeta creates no more than a reasonable number of connections to a given target host. The best way to do that is by setting -max-connections to a value like 8 or 16 or 32. And if that means you can't do as many RPS as you want, then you need to run vegeta on multiple hosts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants