
server hangs on abrupt disconnect #225

Closed
haukened opened this issue Dec 9, 2014 · 17 comments

@haukened

haukened commented Dec 9, 2014

When the client abruptly disconnects during a test, the server stays busy and never times out to accept a new connection. Currently the server instance has to be reset manually.

@bmah888 bmah888 self-assigned this Dec 16, 2014
@bmah888
Contributor

bmah888 commented Dec 16, 2014

What version of iperf are you running? What command line arguments are needed to make this happen?

@mkall

mkall commented Jan 12, 2015

Hello,

We have noticed this same issue. In our case we use iperf3 for doing throughput surveys of WiFi networks. This is often done on unreliable networks with lots of roaming and connection drops.

I have analyzed the issue further. I think the problem is that when a client's network connection drops suddenly, TCP connections can be left hanging in the ESTABLISHED state on the server side. The OS never closes those connections (or at least not for a very long time). This is what happens to the server side of iperf3's TCP control channel connection. As a result the test keeps running indefinitely, which in turn means that the server refuses any further connections until it is restarted.

The simplest fix I could think of is to have a timeout on the server side that closes any test that has been running for longer than the "duration" requested by the client. I have a patch for iperf_server_api.c revision 8d3ed69 that fixes this issue for me by closing tests that linger for longer than "duration" + 5 seconds. I can send it to you if you like. Where shall I send it?

For the future perhaps a more robust method would be to have a periodic keep-alive message on the control channel that checks the connection state. That might very slightly affect the test results though.

Let me know if you need any more information.

Best Regards, Mikael

@mkall

mkall commented Jan 12, 2015

Hi again,

I forked the source and put the change here:
mkall@aaeeaf6

I forgot to post steps to reproduce the issue. Just run any UDP test, and while the test is ongoing disconnect the client's network abruptly. For me 100% of the time the server will keep running the test forever and must be restarted.

We hope that this fix (or some other fix that resolves the issue) can be included in master branch asap. Until then we will have to use a custom binary built from this source.

Best regards, Mikael

@haukened
Author

Mikael,
That is exactly the same use scenario in which I got it to hang as well. Thank you so much for the patch; I will get it applied to our testing server and see if I can confirm the fix on my side as well. I really appreciate your help on this. I'll send you my email address when I get to the office!

David


@mkall

mkall commented Jan 19, 2015

Hi David.

Did you ever get a chance to try out my patch? I'm curious to know if your issue was resolved. Especially since that might increase the chance of eventually getting the fix to the main branch.

Mikael

@haukened
Author

Mikael,
I tested this yesterday on both TCP and UDP. The change seems to have solved the problem completely! I did notice one strange thing: the patch causes the process to terminate with a "bad file descriptor" error. I don't know if there is a way to change this message to "test timed out" or something similar, but functionally it works wonderfully!!

David


@mkall

mkall commented Jan 20, 2015

Ok, thanks for that! However, what do you mean by "causes process to terminate"? Does the whole iperf server process actually terminate? I'm guessing you don't mean that or else you would probably not be very happy with the fix. :) It should be just the ongoing test that terminates.

Bruce, I'm not sure if you are following this discussion. If I were to clean up the fix a little, would you consider merging it to the main branch? Or do you prefer some other type of fix? By cleaning up the code I mean adding comments, adding a better error message, and perhaps making the number of seconds after which to time out a #define value.

Mikael

@haukened
Author

Bruce,
You are correct, the entire iperf process does not terminate, only the currently running test! I'm actually very happy with the fix, and glad for both of your involvement! Thanks to each of you for being part of what makes Linux great!

David


@bmah888
Contributor

bmah888 commented Jan 20, 2015

@mkall: I've been peripherally following this...I'm working on several almost completely unrelated projects. I hope to find time to look at your patch soon.

@haukened: Linux isn't the only open-source project, or indeed the only open-source operating system. From a former release engineer for FreeBSD. :-)

@Crazy-Hopper

Hello, I have to admit that this problem is haunting my server instance as well. (Built from latest git.)
Bruce, could you please take a look into this?

@TheRealDJ
Contributor

I am running iperf 3.0.11 using TCP under CentOS 6.5 and I am seeing a few issues.

If I disconnect the ethernet cable from the client, the server remains busy until I kill and restart the iperf3 server process.

I am also seeing cases where iperf3 doesn't respond to a client at all. The server indicates that it is listening, but client connections hang and then fail with "iperf3: error - unable to receive control message: Connection reset by peer". Here is the lsof output for an iperf3 server port in this state (I know that it SAYS that it's listening):

```
lsof | grep 55503
iperf3 28785 root 3u IPv6 2703599 0t0 TCP *:55503 (LISTEN)
iperf3 28785 root 4u IPv6 2705589 0t0 TCP xxxx:55503->xxxx:48790 (ESTABLISHED)
iperf3 28785 root 5u IPv6 2707433 0t0 TCP xxxx:55503->xxxx:48795 (CLOSE_WAIT)
iperf3 28785 root 7u IPv6 2755607 0t0 TCP xxxx:55503->xxxx:48962 (ESTABLISHED)
```

I have multiple 10G servers that are having this issue. I would be more than happy to assist with troubleshooting this issue.

@dmdailey
Contributor

dmdailey commented Aug 5, 2016

I noticed that the stream sockets could be left hanging as well, so I added code to mkall's fine patch to close those too, and submitted a pull request.

dmdailey@f7fd67d

@bmah888
Contributor

bmah888 commented Apr 20, 2017

This was solved by pull request #446. Keeping this issue open to note that that code change needs to be merged.

@bmah888
Contributor

bmah888 commented Apr 21, 2017

#446 merged to 3.1-STABLE. Closing.

@bmah888 bmah888 closed this as completed Apr 21, 2017
@bmah888 bmah888 removed the merge label Apr 21, 2017
@ghost

ghost commented Feb 12, 2020

I have installed version 3.7 on both the client and server machines and the problem persists, with both TCP and UDP connections. The server just won't stop and/or restart when the client disconnects. I'm breaking the connection before the test duration ends, and I'm using an Alpine machine (client) and an OpenWrt machine (server). I have not found any information about a possible server timeout in the iperf documentation. Am I missing something, or do other people have the same problem?

@ryanwwest

I've noticed the same thing on 3.13.

@bmah888
Contributor

bmah888 commented May 30, 2023

> I've noticed the same thing on 3.13.

Try the `--rcv-timeout` option on the server side. It takes one argument, the number of milliseconds of idleness before the server aborts the connection, for example `iperf3 --server --rcv-timeout=5000`.

8 participants