local ports exhaust quickly due to TCP TIME_WAIT when reconnect_interval is small #232

Open
minhuw opened this issue Jul 26, 2023 · 1 comment

minhuw commented Jul 26, 2023

When --reconnect-interval is small, local ports are exhausted before the experiment completes, as the log below shows.

$ memtier_benchmark -s 192.168.1.2 -t 1 -p 7777 -c 128 -n 10000 --json-out-file experiment.json --reconnect-interval 1
Json file experiment.json created...
Writing results to stdout
[RUN #1] Preparing benchmark client...
[RUN #1] Launching threads now...
[RUN #1 1%,   0 secs]  1 threads:       14335 ops,   14340 (avg:   14340) ops/sec, 611.74KB/sec (avg: 611.74KB/sec),  5.19 (avg:  5.19) msec latency
<some logs omitted>
[RUN #1 2%,  20 secs]  1 threads:       27477 ops,     692 (avg:    1373) ops/sec, 28.74KB/sec (avg: 58.36KB/sec), 47.35 (avg: 28.27) msec latency
connect failed, error = Cannot assign requested address
memtier_benchmark: shard_connection.cpp:470: void shard_connection::process_response(): Assertion `ret == 0' failed.

I found that SO_LINGER is not enabled, so closed TCP connections go into the TIME_WAIT state instead of releasing their local ports immediately.

struct linger ling = {0, 0};
int flags = 1;
int error = setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, (void *) &flags, sizeof(flags));
assert(error == 0);
error = setsockopt(sockfd, SOL_SOCKET, SO_LINGER, (void *) &ling, sizeof(ling));
assert(error == 0);
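The port budget behind this failure can be estimated with a rough calculation. The sketch below is not taken from memtier_benchmark. It assumes typical Linux defaults (an ephemeral port range of 32768 to 60999 and a TIME_WAIT duration of 60 seconds) and assumes that --reconnect-interval 1 opens a new connection for every request. The function names are hypothetical.

```c
#include <stdio.h>

/* Assumed typical Linux defaults; check your own host before relying on them. */
#define EPHEMERAL_LOW   32768   /* net.ipv4.ip_local_port_range, lower bound */
#define EPHEMERAL_HIGH  60999   /* net.ipv4.ip_local_port_range, upper bound */
#define TIME_WAIT_SECS  60      /* assumed TIME_WAIT duration in seconds */

/* Local ports available for outgoing connections to a single remote ip:port. */
int ephemeral_port_count(int low, int high)
{
    return high - low + 1;
}

/* Highest steady rate of new connections per second before ports run out,
 * if every closed connection holds its port for time_wait_secs. */
double max_reconnects_per_sec(int ports, int time_wait_secs)
{
    return (double) ports / time_wait_secs;
}

/* Print the budget for the assumed defaults above. */
void print_port_budget(void)
{
    int ports = ephemeral_port_count(EPHEMERAL_LOW, EPHEMERAL_HIGH);
    printf("ports available: %d\n", ports);
    printf("sustainable reconnects/sec: %.1f\n",
           max_reconnects_per_sec(ports, TIME_WAIT_SECS));
}
```

Under these assumptions there are 28232 usable ports, which is close to the 27477 operations reported just before the connect failure in the log, and the sustainable rate is about 470 new connections per second.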

It works if I enable SO_LINGER with a zero timeout as follows. The connection is then aborted with an RST as soon as it is closed, so the socket skips TIME_WAIT.

-        struct linger ling = {0, 0};
+        struct linger ling = {1, 0};
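As a standalone illustration of the change in the diff, here is a minimal sketch of setting and checking the option on an arbitrary socket. The helper names `enable_abortive_close` and `abortive_close_enabled` are hypothetical and do not exist in memtier_benchmark. Only the `setsockopt`/`getsockopt` calls and `struct linger` are standard.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: make close() abort the connection (RST) instead of
 * performing the normal FIN handshake. Returns 0 on success, -1 on error. */
int enable_abortive_close(int sockfd)
{
    struct linger ling;
    ling.l_onoff = 1;   /* linger enabled */
    ling.l_linger = 0;  /* zero timeout: discard unsent data and reset */
    return setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &ling, sizeof(ling));
}

/* Hypothetical helper: returns 1 if the socket is set up for abortive close,
 * 0 if it is not, and -1 if the option could not be read. */
int abortive_close_enabled(int sockfd)
{
    struct linger ling;
    socklen_t len = sizeof(ling);
    if (getsockopt(sockfd, SOL_SOCKET, SO_LINGER, &ling, &len) != 0)
        return -1;
    return ling.l_onoff != 0 && ling.l_linger == 0;
}
```

An abortive close discards any data still queued for sending, and the server sees a connection reset rather than an orderly shutdown, so it may change what the benchmark measures on the server side.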

Is there a reason SO_LINGER is not enabled? Is there a workaround that would let me test the scenario where reconnect_interval is very small?

filipecosta90 (Collaborator) commented

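@minhuw I believe tuning tcp_fin_timeout together with tcp_tw_reuse / tcp_tw_recycle will help with reusing TIME_WAIT connections and will also reduce the total number of connections in TIME_WAIT. Note that tcp_tw_recycle was removed in Linux 4.12, so it is only available on older kernels.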

However, it's essential to carefully test and evaluate the impact of enabling these parameters in your specific environment, as their behavior can vary depending on the network configuration and application requirements.
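Before changing any of these settings, it can help to see what the host currently uses. The sketch below reads the values from `/proc/sys`, which assumes a Linux host. The function names are hypothetical, and the code only inspects settings; it does not modify them.

```c
#include <stdio.h>

/* Parse the "low<whitespace>high" text used by ip_local_port_range.
 * Returns 0 on success, -1 if the text is malformed. */
int parse_port_range(const char *text, int *low, int *high)
{
    if (sscanf(text, "%d %d", low, high) != 2 || *low > *high)
        return -1;
    return 0;
}

/* Read a single-integer setting such as
 * /proc/sys/net/ipv4/tcp_tw_reuse or /proc/sys/net/ipv4/tcp_fin_timeout.
 * Returns 0 on success, -1 if the file is missing or unreadable. */
int read_int_sysctl(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%d", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Read the current ephemeral port range. Returns 0 on success, -1 on error. */
int read_port_range(int *low, int *high)
{
    char buf[64];
    FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof(buf), f);
    fclose(f);
    return line != NULL ? parse_port_range(buf, low, high) : -1;
}
```

Calling `read_int_sysctl("/proc/sys/net/ipv4/tcp_tw_reuse", &v)` and `read_port_range(&low, &high)` before and after a tuning change gives a quick record of what was actually in effect during a benchmark run.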
