Slower bandwidth compared to sys-firewall #130

Open · grote opened this issue Dec 4, 2020 · 13 comments

grote commented Dec 4, 2020

I am debugging why I don't get my full 1 Gbps bandwidth on Qubes OS, which I (almost) get when booting Ubuntu from a USB flash drive. While doing so, I noticed that mirage-firewall performs worse than Qubes' default firewall.

when using mirage-firewall:

(screenshot: Speedtest by Ookla result)

when using sys-firewall:

(screenshot: Speedtest by Ookla result)

Could it be that mirage-firewall has bandwidth limitations?


Jiw0cha commented Dec 5, 2020

What about the bandwidth on sys-net? It would be better to measure with iperf3.


grote commented Dec 5, 2020

Alright, so I set up iperf3 in server mode on a machine in the local network, connected via 1 Gbps Ethernet. Then I ran three client tests against it; the results below show the same pattern.
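(The invocations were roughly the following; a sketch, since the exact flags and the server address were not recorded here:)

$ iperf3 -s                # on the LAN machine, server mode
$ iperf3 -c <server-ip>    # from a VM behind each firewall under test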

sys-net:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   963 MBytes   808 Mbits/sec    0             sender
[  5]   0.00-10.04  sec   961 MBytes   804 Mbits/sec                  receiver

sys-firewall:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   709 MBytes   595 Mbits/sec    5             sender
[  5]   0.00-10.04  sec   706 MBytes   590 Mbits/sec                  receiver

mirage-firewall:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   207 MBytes   174 Mbits/sec    0             sender
[  5]   0.00-10.04  sec   205 MBytes   172 Mbits/sec                  receiver

It is a shame when you have a 1Gbps fiber link and can't fully utilize it :(


talex5 commented Dec 5, 2020

Possibly relevant:


grote commented Dec 5, 2020

If I turn on Scatter Gather as suggested in QubesOS/qubes-issues#3510, I get the results below.
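(For reference, scatter-gather is typically toggled per interface with ethtool; a sketch, assuming the relevant interface is eth0:)

$ sudo ethtool -k eth0 | grep scatter-gather   # check the current state
$ sudo ethtool -K eth0 sg on                   # enable scatter-gather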

with sys-firewall:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   997 MBytes   836 Mbits/sec  253             sender
[  5]   0.00-10.04  sec   993 MBytes   830 Mbits/sec                  receiver

with mirage-firewall:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   203 MBytes   171 Mbits/sec    0             sender
[  5]   0.00-10.04  sec   201 MBytes   168 Mbits/sec                  receiver

So this seems to fix the issue for sys-firewall (though note the number of retries), while mirage-firewall actually performs slightly worse.


hannesm commented Dec 5, 2020

hey, so my experience with MirageOS unikernels is that the OCaml optimizer "flambda" helps to a large degree. I have not tested this with the QubesOS firewall, but would you mind either

  • manually creating a fresh opam switch (opam switch create 4.11.1+flambda) and compiling the firewall there, or
  • if using Docker, using the ocaml/opam:debian-10-ocaml-4.11-flambda container (or, instead of debian-10, whichever distribution you prefer)?

The resulting unikernel should be semantically equivalent, but allocate much less memory and thus perform better.

Another worthwhile optimization is the best-fit allocation policy, enabled by passing --allocation-policy=best-fit to the unikernel -- either at the configuration stage (mirage configure -t xen --allocation-policy=best-fit) or at runtime as boot arguments (qubes... kernelopts '--allocation-policy=best-fit').

I'd be very interested to see the matrix of numbers for: baseline (mirage-qubes-firewall, as above), best-fit, flambda, and flambda + best-fit.
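(Putting these together, a build sketch, assuming a qubes-mirage-firewall checkout and standard mirage/opam/Qubes tooling; the VM name is hypothetical:)

$ opam switch create 4.11.1+flambda    # fresh switch with the flambda compiler
$ eval $(opam env)
$ mirage configure -t xen --allocation-policy=best-fit   # best-fit at configuration time
$ make depend
$ make
# alternatively, set the policy at runtime via the VM's kernel options (in dom0):
$ qvm-prefs mirage-firewall kernelopts '--allocation-policy=best-fit'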

Thanks for your report, including figures for comparison.


hannesm commented Dec 5, 2020

NB: I scheduled (and finished) the builds above. Please try https://data.robur.coop/qubes-firewall-flambda/2020-12-05/, which I expect to be the fast one (the best-fit allocation policy was already enabled at configuration time),

and https://data.robur.coop/qubes-firewall/2020-12-05/ for a unikernel built with the "standard" OCaml compiler, but with best-fit enabled at configuration time.

Especially the first one (with flambda) would be interesting; I'd like to see its performance numbers on your hardware.
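(To try a prebuilt unikernel, deployment follows the usual qubes-mirage-firewall README steps; a sketch, with directory and VM names as assumptions:)

# in dom0: install the downloaded vmlinuz as a VM kernel
$ mkdir -p /var/lib/qubes/vm-kernels/mirage-firewall-test
$ cp vmlinuz /var/lib/qubes/vm-kernels/mirage-firewall-test/
$ gzip -n9 < /dev/null > /var/lib/qubes/vm-kernels/mirage-firewall-test/initramfs
$ qvm-prefs mirage-firewall kernel mirage-firewall-test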

Thanks, hannes


grote commented Dec 27, 2020

last release:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   237 MBytes   199 Mbits/sec   20             sender
[  5]   0.00-10.04  sec   236 MBytes   197 Mbits/sec                  receiver

best-fit enabled:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   523 MBytes   438 Mbits/sec   99             sender
[  5]   0.00-10.04  sec   520 MBytes   435 Mbits/sec                  receiver

flambda:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   531 MBytes   446 Mbits/sec  151             sender
[  5]   0.00-10.04  sec   529 MBytes   443 Mbits/sec                  receiver

For comparison, here is what sys-firewall gives me:

[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   854 MBytes   716 Mbits/sec  196             sender
[  5]   0.00-10.04  sec   851 MBytes   711 Mbits/sec                  receiver


hannesm commented Jan 4, 2021

@grote thanks for the reported numbers. This month I plan to further analyze the bottlenecks of qubes-mirage-firewall and will report back in this issue with some graphs and more binaries to test. :) I'm glad that a factor of 2.5 is easily achieved by modern compiler features (which we should enable in future releases of qubes-mirage-firewall) :)


hannesm commented Oct 16, 2022

from #151 @palainp (with release 0.8.2):

The two PRs together are ready to merge according to my iperf3 tests. I now have the following figures: TCP for 1 minute, with the mirage firewall CPU at 100% and the Linux CPU (sys-net) at around 70% (so there is plenty of room for improvement there), plus UDP for 1 minute, with the mirage CPU at 100% and the Linux CPU at around 90%. The Linux firewall baseline is the same as in #130; I just noticed more dropped packets for UDP with Linux than with mirage:

[user@fedora qubes-mirage-firewall]$ iperf3 -c 10.137.0.4 -p 5201 -b 0 -t 60
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-60.00  sec  3.57 GBytes   510 Mbits/sec  529             sender
[  5]   0.00-60.00  sec  3.56 GBytes   510 Mbits/sec                  receiver

[user@fedora qubes-mirage-firewall]$ iperf3 -c 10.137.0.4 -p 5201 -b 0 -u -t 60
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Jitter    Lost/Total Datagrams
[  5]   0.00-60.00  sec  4.61 GBytes   660 Mbits/sec  0.000 ms  0/3389750 (0%)  sender
[  5]   0.00-60.00  sec  4.61 GBytes   660 Mbits/sec  0.018 ms  785/3389697 (0.023%)  receiver

from IRC (also with 0.8.2):
a minor difference between the 20220527 no-flambda build (699 Mbps) and the 20221014 with-flambda build (729 Mbps)

so, we're on a good track - but of course there's still room for improvement :)

ihateprogramming88 commented:

Bump! Has this issue been resolved? Has anybody found any workarounds?


hannesm commented Jan 24, 2023

Dear @ihateprogramming88, we're actively looking into this issue, now that qubes-mirage-firewall has stabilized. It will take some more time and testing to figure out how to improve the performance. :) If you are interested in contributing, let us know.

ihateprogramming88 commented:

Dear @hannesm, thanks for your response! I am happy to help :)


palainp commented Jun 1, 2023

I tried to compare the differences between linux and qubes-mirage-fw. What surprised me most is that linux shows gigantic bandwidth with TCP and only slightly better bandwidth with UDP.

If I turn off TCP Segmentation Offload, the sys-firewall AppVM has bandwidth of the same order of magnitude as with UDP, or as TCP with mirage.
TSO is useful for bandwidth tests like these, where you want to send a huge amount of data between two hosts.
I cannot estimate the amount of work needed to implement TSO in the mirage stack, but I think it would help with this issue :)

The following runs were done on the same laptop (AppVM <-> fw <-> AppVM) with 1 core for the fw, so they measure internal Xen bandwidth (both with a linux sys-fw between the VMs; the first run leaves TSO untouched, while the second disables TSO on one VM):

$ sudo ethtool -K eth0 tso on
$ iperf3 -c 10.137.0.4 -p 5201 -b 0 -t 10
Connecting to host 10.137.0.4, port 5201
[  5] local 10.137.0.21 port 35308 connected to 10.137.0.4 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   368 MBytes  3.08 Gbits/sec    0   1.90 MBytes       
[  5]   1.00-2.00   sec   380 MBytes  3.19 Gbits/sec    0   1.90 MBytes       
[  5]   2.00-3.00   sec   375 MBytes  3.15 Gbits/sec    0   1.90 MBytes       
[  5]   3.00-4.00   sec   369 MBytes  3.09 Gbits/sec    0   1.90 MBytes       
[  5]   4.00-5.00   sec   370 MBytes  3.10 Gbits/sec    0   1.90 MBytes       
[  5]   5.00-6.00   sec   369 MBytes  3.09 Gbits/sec    0   1.90 MBytes       
[  5]   6.00-7.00   sec   374 MBytes  3.14 Gbits/sec    0   1.90 MBytes       
[  5]   7.00-8.00   sec   372 MBytes  3.12 Gbits/sec    0   1.90 MBytes       
[  5]   8.00-9.00   sec   369 MBytes  3.09 Gbits/sec    0   1.90 MBytes       
[  5]   9.00-10.00  sec   372 MBytes  3.12 Gbits/sec    0   1.90 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  3.63 GBytes  3.12 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  3.63 GBytes  3.12 Gbits/sec                  receiver

iperf Done.
$ sudo ethtool -K eth0 tso off
$ iperf3 -c 10.137.0.4 -p 5201 -b 0 -t 10
Connecting to host 10.137.0.4, port 5201
[  5] local 10.137.0.21 port 33160 connected to 10.137.0.4 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec  85.6 MBytes   718 Mbits/sec    0   1.09 MBytes       
[  5]   1.00-2.00   sec  85.0 MBytes   713 Mbits/sec    0   1.09 MBytes       
[  5]   2.00-3.00   sec  86.2 MBytes   723 Mbits/sec    0   1.15 MBytes       
[  5]   3.00-4.00   sec  85.0 MBytes   713 Mbits/sec    0   1.15 MBytes       
[  5]   4.00-5.00   sec  83.8 MBytes   703 Mbits/sec    0   1.15 MBytes       
[  5]   5.00-6.00   sec  85.0 MBytes   713 Mbits/sec    0   1.21 MBytes       
[  5]   6.00-7.00   sec  83.8 MBytes   703 Mbits/sec    0   1.21 MBytes       
[  5]   7.00-8.00   sec  83.8 MBytes   703 Mbits/sec    0   1.21 MBytes       
[  5]   8.00-9.00   sec  85.0 MBytes   713 Mbits/sec    0   1.21 MBytes       
[  5]   9.00-10.00  sec  85.0 MBytes   713 Mbits/sec    0   1.21 MBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   848 MBytes   711 Mbits/sec    0             sender
[  5]   0.00-10.01  sec   846 MBytes   709 Mbits/sec                  receiver

iperf Done.

palainp mentioned this issue May 23, 2024