
Multithreaded iperf3 #289

Closed
blochl opened this issue Aug 4, 2015 · 21 comments

@blochl

blochl commented Aug 4, 2015

Hi,

I've been using iperf3 (latest code from GitHub) on a machine with 4 cores (8 threads) running Fedora 21.
I have noticed that even when testing with multiple streams (the -P option), just two threads are used. The same basic multi-stream test with iperf2 utilizes all the cores.
Am I missing something? Any thoughts on why it may behave this way?

Regards,
Leonid.

@bmah888
Contributor

bmah888 commented Aug 4, 2015

iperf3 is not multi-threaded, by design. Do you have any indication that your tests are CPU-bound?

@bmah888 bmah888 self-assigned this Aug 4, 2015
@wangyoucao577

iperf3 is not multi-threaded even if the -P option is set?

@blochl
Author

blochl commented Aug 5, 2015

This probably explains it.
+1 for wangyoucao577's question.
And I am curious: why is that so? iperf2 is multi-threaded, after all.

@bmah888
Contributor

bmah888 commented Aug 6, 2015

@wangyoucao577 , @blochl : iperf3 is a complete rewrite and shares very little code in common with iperf2. The intended use case was testing high-rate single-stream performance, typical of science workflows on R&E networks. That use case doesn't require multiple threads for parallel streams.

@blochl : You didn't answer my question yet about whether the single-threaded design of iperf3 actually created a problem for you.

@blochl
Author

blochl commented Aug 6, 2015

In our test scenario the CPU load is also measured. With iperf3, contrary to iperf2, the CPU load is constantly ~100%, but on a single core, which does not give much indication of the CPU load as a function of, e.g., buffer size. This is the issue.

@bmah888
Contributor

bmah888 commented Aug 6, 2015

OK. I understand what you're seeing. I believe that multi-threading iperf3 would be a non-trivial amount of work (the design predates my involvement with this project), although I admit I haven't really thought too much about it.

@bmah888 bmah888 changed the title Iperf3 CPU usage Multithreaded iperf3 Aug 6, 2015
@wangyoucao577

I've also tried this case. If I use iperf with the '-P' option, I can see more than one thread. But if I use iperf3 with the '-P' option, there is only one thread for the client. So the behavior of '-P' also differs between iperf and iperf3. Can I ask why iperf3 gave up the multi-threaded implementation? Is there any difference between using multiple threads and just multiple sockets here?

@bmah888
Contributor

bmah888 commented Aug 10, 2015

Folks, the single-threaded behavior of iperf3 is not a mystery, and it doesn't require any investigation on anyone's part. It wasn't designed to be multi-threaded, and the implementation reflects that. I'm not sure exactly why it was designed this way. Maybe someone more closely involved with the initial iperf3 work can shed some light on this (I'll ask around a little more actively when I get back from vacation).

@wangyoucao577

OK, thanks. I just want to know whether multi-threading is necessary in some situations, or maybe it's just not important.

@blochl
Author

blochl commented Aug 10, 2015

@wangyoucao577 : Well, it is important if you would like to measure CPU performance while the traffic is being generated. Maybe there are other cases.
And I wonder: if this single thread is saturated (~100%), does that mean it might be a bottleneck?

@wangyoucao577

But if a single thread hits the (~100%) bottleneck, I think multiple threads will also hit it, won't they?

@blochl
Author

blochl commented Aug 10, 2015

@wangyoucao577 : Well, no. Why would it be that way? If a certain load causes a single thread to use 100%, one can spread it across several threads, and each one will take less, imho.
Besides, experimentally, with the same parameters, iperf2 takes 20-60% on each thread and uses all of them, while iperf3 takes close to 100% on a single one.

@wangyoucao577

Why? I can't understand it. Doesn't multi-threading mean more CPU cost, switching from one thread to another? Why would a single thread take more CPU? In my understanding, in the iperf2 case each thread takes 20-60%, but the sum over all these threads will be over 100%, won't it?

@blochl :
Or maybe you mean that on a multi-core PC, multiple threads can use multiple CPU cores for the test, while a single thread can only use one core, so it may not be enough for a high network performance test?

@blochl
Author

blochl commented Aug 11, 2015

@wangyoucao577 : I mean that ~100% of a single core is used. Multiple threads could each have used less CPU time, but on multiple cores. The bottleneck issue is only speculation for now, but for testing CPU and network performance in parallel this is undoubtedly important.

@joachimtingvold

joachimtingvold commented Apr 26, 2016

The CPU-usage seems to be a bottleneck, yes.

root@foobar:~# iperf3 -c ::1 -i1 -t10 -w32M -P8
[SUM]   0.00-10.00  sec  30.9 GBytes  26.5 Gbits/sec                  receiver

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
14278 root       20   0  8948  3892  1208 R 101.  0.1  0:05.44 iperf3 -s
14279 root       20   0  8948  3868  1184 R 81.6  0.1  0:04.67 iperf3 -c ::1 -i1 -t10 -w32M -P8

If I fire up another set of iperf3 (different port) at the same time, we clearly see that the CPU is causing a bottleneck (and not the network stack, since we get double the bandwidth running two in parallel);

root@foobar:~# iperf3 -c ::1 -i1 -t10 -w32M -P8 -p5202
[SUM]   0.00-10.00  sec  31.3 GBytes  26.9 Gbits/sec                  receiver

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
14270 root       20   0  7412  2368  1216 R 101.  0.1  0:16.32 iperf3 -s -p 5202
14283 root       20   0  7412  2392  1220 R 101.  0.1  0:49.77 iperf3 -s
14287 root       20   0  7412  2396  1252 R 81.3  0.1  0:09.96 iperf3 -c ::1 -i1 -t10 -w32M -P8 -p5202
14286 root       20   0  7412  2308  1160 R 78.9  0.1  0:15.12 iperf3 -c ::1 -i1 -t10 -w32M -P8

@spsholleman

The other problem with this is that iperf2 used to spin up a connection for each thread, meaning different ephemeral ports. This causes hashing algorithms used in any network functionality to see many connections instead of a few, which can help or hurt performance depending on the implementation. Looks like I can go up to 8 — but not more right now.

@bms

bms commented Jan 6, 2017

I definitely agree that the option to span multiple cores -- which was present in iperf2 -- is useful in soak testing. One should note, however, that iperf2 achieved this only by using multiple client-server connections, with a thread being affine to each socket.

However, there was some absolutely dog ugly logic in iperf2's implementation. Basically, it wrapped UDP sockets in its C++ implementation to work a little along the lines of how TCP accept() creates a new socket for an inbound flow.

It could be done better, but it does strike me as a significant bit of work for iperf3 in its current incarnation.

@bms

bms commented Jan 6, 2017

Performance issues with the multiple flows used by iperf2 might have been related to the lack of bottom-up affinity, or possibly even cache line effects. iperf2 wasn't aware of RSS or other mechanisms, and as far as I know, the only really portable way to pin socket workloads is by learning about the core/thread topology using something like hwloc, and then appropriate setsockopt()/platform CPU pinning APIs.

The PCB hash[es] in the TCP/IP stacks themselves are usually pretty performant. Unless you were cycling new connections, the hash management itself might not be a bump.

I've whined about UDP in iperf2, but one advantage of its approach was that the socket each thread (per sub-flow in the measurement session) was using could explicitly bind() and 'listen()' (wrapped) on each ephemeral port, instead of using recvfrom() directly. That might have a modest cache benefit.

@dchard

dchard commented Feb 25, 2017

The single-threaded design is also a massive problem on embedded systems, like routers. I have a dual-core MIPS-based router which can run 4 threads (DIR-860L), and I see that iperf3 is maxing out a single core and not reaching gigabit speed. It is definitely limited by the single-threaded design.

@bltierney
Contributor

I suggest you use iperf2 instead.

@RongDongsheng

iperf(2) creates a thread per stream; iperf3 has only one thread sending multiple streams.
So in htop you can see multiple threads with iperf -P x, but only one thread with iperf3 -P x.
Both iperf and iperf3 use multiple send ports.
