Increase TCP bandwidth for small messages #1691

Open
wants to merge 2 commits into
base: master

davidBar-On
Contributor

@davidBar-On davidBar-On commented Apr 30, 2024

Suggested enhancement to resolve the iperf3 TCP low bandwidth with small message sizes, compared to iperf2 and netperf. The change receives all the messages sent in a burst with a single read. In my environment, with -l1500, throughput increases by about 35% for a single stream and by more than 50% for multi-stream tests.

The main reason iperf2 and netperf achieve higher bandwidth with small messages seems to be that iperf3 sends and receives with the same message size, while in iperf2, and probably netperf, the two sizes differ. For example, the iperf2 default TCP receive message size is 128K.
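The effect of a receive size that differs from the send size can be sketched outside iperf3. The following is a minimal illustration, not code from this PR: it uses a local `socket.socketpair()` rather than a real TCP connection, and the helper name `count_recv_calls` and the burst value of 10 are assumptions made for the example. It counts how many `recv()` calls are needed to consume one burst of 1500-byte messages.

```python
import socket
import threading

MSG_SIZE = 1500  # sender message size, as with -l1500
BURST = 10       # messages per burst (illustrative value)


def count_recv_calls(recv_size, msg_size=MSG_SIZE, burst=BURST):
    """Send `burst` messages of `msg_size` bytes over a local stream socket.

    Returns (recv_calls, total_bytes) for a receiver that asks for at most
    `recv_size` bytes per recv() call.
    """
    # Connected stream sockets; a stand-in for a TCP connection.
    sender, receiver = socket.socketpair()
    expected = msg_size * burst

    def send_burst():
        payload = b"x" * msg_size
        for _ in range(burst):
            sender.sendall(payload)  # one write per message

    worker = threading.Thread(target=send_burst)
    worker.start()
    calls = 0
    total = 0
    try:
        while total < expected:
            chunk = receiver.recv(recv_size)
            if not chunk:  # peer closed early
                break
            calls += 1
            total += len(chunk)
    finally:
        worker.join()
        sender.close()
        receiver.close()
    return calls, total


if __name__ == "__main__":
    print("recv size = message size:", count_recv_calls(MSG_SIZE))
    print("recv size = burst size:  ", count_recv_calls(MSG_SIZE * BURST))
```

With the receive size equal to the message size, at least one `recv()` call per message is needed. With the larger receive size, a single call can return several messages at once, so fewer system calls are typically made for the same number of bytes.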

Notes:

  1. Receiving all the burst messages as one message is assumed to be o.k. (and not "cheating"), based on iperf2 (and netperf) behavior.
  2. Since read waits for full messages (with a timeout), when the test is limited by byte count, block count or file size (-n, -k or --file), read may wait because the number of bytes sent is not a multiple of the receive message size. Therefore, when these parameters are set, only the -l value is read. A future enhancement may be for read not to wait for the full message (as in iperf2), and to count received blocks based on the number of bytes received.
  3. The TCP receive message size is extended (by multiplying blksize by the burst size) up to a maximum of MAX_BLOCKSIZE (1MB). If this is too large, it may be limited to MAX_TCP_BUFFER (512KB) or DEFAULT_TCP_BLKSIZE (128KB).
  4. One advantage of using the sender's burst size for the receive message size is that sending and receiving stay in sync. However, other approaches may be used, e.g. adding a second value to -l for the receiver message length, with the first value being the sender size (and the receiver default). This would be similar to the iperf2 approach, but I believe that using burst is better.
  5. There is an issue with using the burst size, which already exists in iperf3 before this PR. When the test bandwidth is set using -b, the default burst size is set to 1. Assume that the network bandwidth is 1Gbps. For a test that does not set -b the burst size is 10, and for a test with -b10G the burst size is 1. Therefore, the first test will have higher bandwidth, although in practice the second test did not put a real limit on the bandwidth. A workaround is to also set the burst size with -b10G/10, but users may still not understand this burst size difference. (By the way, with multi-threading the sending burst may be redundant, so it may be possible to remove the burst loop when sending. With this approach, burst size would only apply to TCP receives, perhaps making it unnecessary to change the default burst to 1 when -b is set.)
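The sizing rules in notes 2 and 3 can be written down as a small calculation. This is a sketch under stated assumptions, not the PR's implementation: the constant names and values are the ones quoted in note 3, while the functions `tcp_receive_size` and `read_would_stall`, and the plain clamp without rounding to a multiple of blksize, are hypothetical choices made for illustration.

```python
# Values as quoted in note 3 of the PR description.
MAX_BLOCKSIZE = 1024 * 1024        # 1MB
MAX_TCP_BUFFER = 512 * 1024        # 512KB
DEFAULT_TCP_BLKSIZE = 128 * 1024   # 128KB


def tcp_receive_size(blksize, burst, limited_by_count=False, cap=MAX_BLOCKSIZE):
    """Hypothetical helper: size of one TCP read on the receiving side.

    blksize          -- the -l value (sender message size)
    burst            -- number of messages the sender writes per burst
    limited_by_count -- True when -n, -k or --file is set (note 2)
    cap              -- upper bound for the extended read size (note 3)
    """
    if limited_by_count or burst <= 1:
        # Fall back to reading one message at a time.
        return blksize
    # Extend to a whole burst, clamped to the cap but never below blksize.
    return max(blksize, min(blksize * burst, cap))


def read_would_stall(total_bytes_sent, read_size):
    """True if a reader that waits for full `read_size` reads would be left
    waiting on a partial final read (the situation described in note 2)."""
    return total_bytes_sent % read_size != 0


if __name__ == "__main__":
    print(tcp_receive_size(1500, 10))                         # whole burst
    print(tcp_receive_size(1500, 10, limited_by_count=True))  # -l value only
    print(tcp_receive_size(DEFAULT_TCP_BLKSIZE, 10))          # clamped to cap
    # 7 blocks of 1500 bytes do not fill a 15000-byte read.
    print(read_would_stall(7 * 1500, 15000))
```

Passing `cap=MAX_TCP_BUFFER` or `cap=DEFAULT_TCP_BLKSIZE` models the smaller limits that note 3 mentions as alternatives.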

@swlars
Contributor

swlars commented May 24, 2024

Thank you for the pull request! We've been looking into this, and we have some questions about the methodology. When we're taking performance measurements for different message sizes, it's important to consider if we're pushing bytes as quickly as possible or measuring the performance of the entire system with sending and receiving these differently-sized messages. We tend to prefer the latter. 

On the other hand, if iperf3 is limiting the performance due to implementation inefficiencies, that is something we would like to address. For example, we removed most of the select calls from the send and receiving loops, since they were interfering with the measurement.

We're not sure if increasing the size of these messages by receiving and reading them as larger blocks reflects the measurements we would like to take. 

@davidBar-On
Contributor Author

davidBar-On commented May 25, 2024

A few comments that may help the evaluation:

> it's important to consider if we're pushing bytes as quickly as possible or measuring the performance of the entire system with sending and receiving these differently-sized messages. We tend to prefer the latter.

The suggested change is only on the receiving side. There is no change to the way the bytes are pushed. If I understood the iperf2 code correctly, this is how it works, i.e. sending each message separately but receiving bytes as they arrive. The approach is also similar to the SKIP-RX-COPY approach, in that the overhead of getting the messages on the receiving side may be ignored.
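"Receiving bytes as they arrive" can be illustrated with a receiver that accepts whatever each read returns and derives the block count from the byte total afterwards. This is a minimal sketch, not iperf2 or iperf3 code: the function names are hypothetical, a local `socket.socketpair()` stands in for a TCP connection, and the 4000-byte payload is chosen only because it is not a multiple of 1500.

```python
import socket
import threading


def receive_as_arrived(sock, bufsize):
    """Read until EOF, accepting however many bytes each recv() returns."""
    total = 0
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # peer closed the connection
            return total
        total += len(chunk)


def blocks_from_bytes(total_bytes, blksize):
    """Return (full_blocks, leftover_bytes) derived from the byte count."""
    return divmod(total_bytes, blksize)


def demo(sent_bytes=4000, blksize=1500, bufsize=128 * 1024):
    """Send `sent_bytes` over a local stream socket and count blocks by bytes."""
    sender, receiver = socket.socketpair()

    def send_all():
        sender.sendall(b"x" * sent_bytes)
        sender.close()  # signals EOF to the receiver

    worker = threading.Thread(target=send_all)
    worker.start()
    try:
        total = receive_as_arrived(receiver, bufsize)
    finally:
        worker.join()
        receiver.close()
    return total, blocks_from_bytes(total, blksize)


if __name__ == "__main__":
    total, (full_blocks, leftover) = demo()
    print(total, full_blocks, leftover)
```

Because the receiver never waits for a full message, a byte total that is not a multiple of the message size does not leave it blocked on a partial final read.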

> For example, we removed most of the select calls from the send and receiving loops, since they were interfering with the measurement.

As I mentioned in the description, because of this change the burst size has practically no meaning. Therefore, it seems that the default burst size can be changed to 1 with no impact on iperf3 performance. If the default burst is changed to 1, this PR will not change the default iperf3 behavior (because the change is to read "burst" number of messages). The burst value could then take on the meaning it has in this PR. If this approach is desired I can enhance the PR code accordingly.

Development

Successfully merging this pull request may close these issues.

iperf3 single-stream low bandwidth with small message sizes (1KB, 1500B, 2000B, etc.)
2 participants