Resolve server/client stuck at the test end before results-exchange #1527

Open

davidBar-On wants to merge 2 commits into master

Conversation

@davidBar-On (Contributor) commented Jun 3, 2023

A suggested fix for the issue where the server and client get stuck at the end of the test because the client does not receive the EXCHANGE_RESULTS state, based on @ffolkes1911's test results and the discussion starting at this comment.

The issue occurred on a cellular network, so the fix seems to be important for any iperf3 use over cellular networks. As the changes are for the end of the test, they are probably also relevant to the multi-thread version.

The root cause of the issue was that the server (in reverse mode) sent only about 60KB of each 128KB TCP packet. Therefore, the last read by the client did not receive a full 128KB packet. Since the read was in blocking mode, the client got stuck and was unable to respond to the server, so the server also got stuck waiting for the client's reply. It is not clear whether the EXCHANGE_RESULTS message was lost or just delayed, but the fix handles both cases.
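
For context, the kind of blocking read that gets stuck here follows the usual read-exactly-N-bytes pattern (similar in spirit to iperf3's Nread() helper; read_exactly() below is only an illustration, not the PR's code): if the peer delivers only about 60KB of an expected 128KB and then goes quiet, the loop never completes and the caller never returns.

    #include <errno.h>
    #include <stddef.h>
    #include <unistd.h>

    /* Read exactly count bytes from a blocking socket (illustrative helper).
     * If the peer sends fewer bytes and then stays silent, read() blocks and
     * this function never returns -- which is how the client got stuck. */
    static ssize_t
    read_exactly(int fd, char *buf, size_t count)
    {
        size_t nleft = count;

        while (nleft > 0) {
            ssize_t r = read(fd, buf, nleft);   /* blocks until data or EOF */
            if (r < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            if (r == 0)
                break;                          /* connection closed */
            nleft -= (size_t) r;
            buf += r;
        }
        return (ssize_t)(count - nleft);
    }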

The main changes done:

  1. The client does not set the TCP streams to blocking mode at TEST_END, so that it can still receive non-full late packets (see the sketch after this list).
  2. Allow --rcv-timeout in the client in sending mode; it is used at the end of the test as a timeout for reading the exchange-results messages.
  3. The server does not close the test streams at TEST_END. This is because select() returns immediately when monitoring closed sockets, so the client's select() timeout would not be effective in that case.
  4. The client does not monitor "write sockets" at TEST_END. Otherwise, even when no input is received from the server, the client's select() for late packets would return immediately (as the write sockets are available), which would defeat the --rcv-timeout.
  5. When the client sends IPERF_DONE or CLIENT_TERMINATE, it sends 3 additional null bytes. This makes sure that if the server is waiting for the exchanged-results JSON length (4 bytes), it does not get stuck in the read (the server will then fail because it does not receive valid JSON).
  6. Cancel the client timers at TEST_END, as they are redundant at that point and may interfere with --rcv-timeout, because select() may return after a timer expires and before the receive timeout expires.
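
A minimal sketch of the idea behind items 1, 3 and 4, assuming plain POSIX sockets: the client leaves its stream sockets non-blocking, monitors only the read set in select(), and bounds the wait with the --rcv-timeout value. The drain_late_data() helper and all names below are illustrative, not the actual iperf3 code.

    /* Illustrative only -- not the actual iperf3 implementation. */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/select.h>
    #include <sys/time.h>
    #include <unistd.h>

    /* Drain late, possibly partial data from non-blocking stream sockets,
     * waiting at most rcv_timeout_us microseconds for new data.
     * Entries in fds[] are set to -1 once the peer closes them. */
    static int
    drain_late_data(int *fds, int nfds, long rcv_timeout_us)
    {
        char buf[128 * 1024];
        int open_fds = nfds;

        /* Item 1: keep the stream sockets non-blocking at TEST_END. */
        for (int i = 0; i < nfds; i++)
            fcntl(fds[i], F_SETFL, fcntl(fds[i], F_GETFL, 0) | O_NONBLOCK);

        while (open_fds > 0) {
            fd_set read_set;    /* Item 4: no write sockets are monitored. */
            int maxfd = -1;

            FD_ZERO(&read_set);
            for (int i = 0; i < nfds; i++) {
                if (fds[i] < 0)
                    continue;
                FD_SET(fds[i], &read_set);
                if (fds[i] > maxfd)
                    maxfd = fds[i];
            }

            struct timeval tv = {
                .tv_sec  = rcv_timeout_us / 1000000,
                .tv_usec = rcv_timeout_us % 1000000,
            };

            int result = select(maxfd + 1, &read_set, NULL, NULL, &tv);
            if (result < 0 && errno != EINTR)
                return -1;      /* select() failed */
            if (result == 0)
                return 0;       /* --rcv-timeout expired: stop waiting */

            for (int i = 0; i < nfds; i++) {
                if (fds[i] < 0 || !FD_ISSET(fds[i], &read_set))
                    continue;
                ssize_t n = read(fds[i], buf, sizeof(buf));
                if (n > 0) {
                    printf("drained %zd late bytes from fd %d\n", n, fds[i]);
                } else if (n == 0) {
                    /* Item 3: the sockets themselves stay open; here we only
                     * stop monitoring a stream the peer has closed. */
                    fds[i] = -1;
                    open_fds--;
                }
                /* n < 0 with EAGAIN just means no data right now. */
            }
        }
        return 0;
    }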

@davidBar-On (Contributor Author) commented:

Re-implementation for the multi-thread iperf3. I am not sure whether the original problem still happens with multi-thread. Some of the changes may be worth implementing in any case, so if needed I can submit a PR with only these changes:

  • Removed write_set as it became redundant with multi-thread.
  • Added a timer-canceling function, instead of canceling the timers one by one in several places.
  • Cancel the client timers at TEST_END, as they are redundant at that point and may interfere with --rcv-timeout, because select() may return after a timer expires and before the receive timeout expires.
  • Added sp->socket = -1 in two places, which makes the code cleaner.
  • Added some debug messages that may be useful in general.

The PR also includes the code changes to resolve the original issue:

  • Allow --rcv-timeout in the client in sending mode; it is used at the end of the test as a timeout for reading the exchange-results messages.
  • When the client sends IPERF_DONE or CLIENT_TERMINATE, it sends 3 additional null bytes. This makes sure that if the server is waiting for the exchanged-results JSON length (4 bytes), it does not get stuck in the read (the server will then fail because it does not receive valid JSON). See the sketch below.
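
A rough sketch of the idea in the last bullet, assuming a plain POSIX control socket. The send_final_state_padded() helper and the numeric state values are placeholders for this example (the real state definitions live in iperf3's iperf_api.h); this is not the code in the PR.

    #include <sys/socket.h>
    #include <sys/types.h>

    /* Placeholder state values for the sketch; see iperf_api.h for the real ones. */
    #define IPERF_DONE_STATE        16
    #define CLIENT_TERMINATE_STATE  12

    /* Send the final 1-byte state followed by 3 null bytes.  If the server is
     * already blocked reading the 4-byte length of the results JSON, these
     * bytes complete that read, so it fails cleanly on an invalid length/JSON
     * instead of hanging forever. */
    static int
    send_final_state_padded(int ctrl_sck, char state)
    {
        char msg[4] = { state, 0, 0, 0 };   /* state byte + 3 padding null bytes */

        if (send(ctrl_sck, msg, sizeof(msg), 0) != (ssize_t) sizeof(msg))
            return -1;
        return 0;
    }

Usage would be along the lines of send_final_state_padded(ctrl_sck, IPERF_DONE_STATE) just before the client closes the control connection.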

@@ -562,7 +545,6 @@ iperf_run_server(struct iperf_test *test)
}

memcpy(&read_set, &test->read_set, sizeof(fd_set));
memcpy(&write_set, &test->write_set, sizeof(fd_set));
Contributor commented:

Removing this means that write_set may have uninitialized values on line 581. Should you just remove the variable entirely, since it is unused?

@davidBar-On (Contributor Author) replied:

Thanks for the comment. Forgot to remove all occurrences of write_set... Now removed (with rebase).

@davidBar-On force-pushed the issue-819-server-stuck-after-printing-results branch from dbe8d68 to fcf1c46 on May 10, 2024 06:50
@@ -580,8 +583,9 @@ iperf_run_client(struct iperf_test * test)

/* Begin calculating CPU utilization */
cpu_util(NULL);
rcv_timeout_value_in_us = (test->settings->rcv_timeout.secs * SEC_TO_US) + test->settings->rcv_timeout.usecs;
Contributor commented:

This change creates a scenario with an early timeout.
Given:

  • test->mode == SENDER
  • rcv_timeout_value_in_us > 0
  • test->state == TEST_END (or EXCHANGE_RESULTS or DISPLAY_RESULTS)

This implies that rcv_timeout_us = 0. Then on line 642, the if statement effectively evaluates to if (t_usecs > 0), which is almost always true since t_usecs will practically always be greater than 0.

It might make more sense to split the blocks so the correct timeout is used:
(I've also renamed rcv_timeout_value_in_us -> end_rcv_timeout and rcv_timeout_us -> running_rcv_timeout to hopefully make their use clearer)

    if (result < 0 && errno != EINTR) {
        i_errno = IESELECT;
        goto cleanup_and_fail;
    } else if (result == 0 && (running_rcv_timeout > 0 && test->state == TEST_RUNNING)) {
        /*
         * If nothing was received in non-reverse running state
         * then probably something got stuck - either client,
         * server or network, and the test should be terminated.
         */
        iperf_time_now(&now);
        if (iperf_time_diff(&now, &last_receive_time, &diff_time) == 0) {
            t_usecs = iperf_time_in_usecs(&diff_time);
            if (t_usecs > running_rcv_timeout) {
                /* Idle timeout if no new blocks received */
                if (test->blocks_received == last_receive_blocks) {
                    i_errno = IENOMSG;
                    goto cleanup_and_fail;
                }
            }
        }
    } else if (result == 0 && (end_rcv_timeout > 0 && (test->state == TEST_END
                                                       || test->state == EXCHANGE_RESULTS
                                                       || test->state == DISPLAY_RESULTS))) {
        iperf_time_now(&now);
        if (iperf_time_diff(&now, &last_receive_time, &diff_time) == 0) {
            t_usecs = iperf_time_in_usecs(&diff_time);
            if (t_usecs > end_rcv_timeout) {
                /* Idle timeout if no new blocks received */
                if (test->blocks_received == last_receive_blocks) {
                    i_errno = IENOMSG;
                    goto cleanup_and_fail;
                }
            }
        }
    }

@davidBar-On (Contributor Author) replied:

Only the rcv_timeout_us value is used for the timeout (in the select statement). The value of rcv_timeout_value_in_us is only used to initialize rcv_timeout_us and to indicate, for the ending states, whether a receive timeout was requested (as there is no separate "timeout requested" setting). Therefore, the current code is correct in that respect.

I agree that rcv_timeout_value_in_us could probably have a better name, like rcv_timeout_setting_us, but I won't make a change just for that.
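
Put differently, the split of roles described above could be sketched roughly as follows (simplified, with hypothetical helper names and types; not the actual iperf_run_client() code):

    #include <stdbool.h>
    #include <stdint.h>

    #define SEC_TO_US 1000000LL

    /* Simplified stand-in for the --rcv-timeout setting (illustrative). */
    struct rcv_timeout_setting { int64_t secs; int64_t usecs; };

    /* rcv_timeout_value_in_us: the fixed --rcv-timeout setting.  A value > 0
     * also serves as "the user requested a receive timeout at all". */
    static int64_t
    rcv_timeout_setting_us(const struct rcv_timeout_setting *s)
    {
        return s->secs * SEC_TO_US + s->usecs;
    }

    /* rcv_timeout_us: the timeout actually passed to select().  It takes the
     * setting's value when the client is receiving during TEST_RUNNING or,
     * with this PR, also for a sending client in the ending states. */
    static int64_t
    select_timeout_us(int64_t setting_us, bool sender, bool in_ending_state)
    {
        if (setting_us <= 0)
            return -1;      /* no receive timeout requested: select() gets NULL */
        if (!sender || in_ending_state)
            return setting_us;
        return -1;          /* sender while TEST_RUNNING: no receive timeout */
    }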
