Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

virtio_netmap: Use half the vring's size as netmap slots #726

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

awelzel
Copy link
Contributor

@awelzel awelzel commented Sep 13, 2020

For every slot in a netmap ring, two descriptors in the virtqueue's
vring are used. Therefore, we can only use half the vring's size as
slots for netmap rings. Otherwise, the vring and the netmap ring are
out of sync. This currently corrupts proper operation and produces
bad data (see below).

This patch also sets an explicit nm_config to ensure we keep the
values for num_rx_desc/num_tx_desc from attach time and marks the
VQ_FULL case in virtio_netmap_init_buffers() as an issue that should
never be triggered.

Background: Experiment / Observation

Running a qemu-guest with a virtio_net NIC (rx_queue_size=256) attached
to a bridge on the host. Sending known/predictable packets in a loop
from the host via scapy. On the guest side, running tcpdump with native
netmap support and inspecting the packet data:

tcpdump -i netmap:enp8s0 -Xn#

The first 128 packets look as expected, but thereafter come 128 packets
of "null data":

  128  15:59:46.592378 IP 10.0.0.128.128 > 127.0.0.1.256: UDP, length 141
        0x0000:  4500 00a9 0001 0000 4011 f0c2 0a00 0080  E.......@.......
        0x0010:  7f00 0001 0080 0100 0095 7f07 3030 3132  ............0012
        0x0020:  3820 3031 3330 2046 4646 4646 4646 4646  8.0130.FFFFFFFFF
        0x0030:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
        0x0040:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
        0x0050:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
        0x0060:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
        0x0070:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
        0x0080:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
        0x0090:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
        0x00a0:  4646 4646 4646 4646 46                   FFFFFFFFF
  129  15:59:46.712936 00:00:00:00:00:00 > 00:00:00:00:00:00 Null Information, send seq 0, rcv seq 0, Flags [Command], length 121
        0x0000:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0040:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0050:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0060:  0000 0000 0000 0000 0000 0000 0000 0000  ................
        0x0070:  0000 0000 00                             .....

At packet 256 tcpdump receives non-null data, but the payloads represents
what should've been received for packet 129 and later. Because the length
is taken from the vring, some of the slots also expose stale/corrupt data:

  281  16:03:51.846490 IP 10.0.0.153.153 > 127.0.0.1.306: UDP, length 74
        0x0000:  4500 0066 0001 0000 4011 f0ec 0a00 0099  E..f....@.......
        0x0010:  7f00 0001 0099 0132 0052 d633 3030 3135  .......2.R.30015
        0x0020:  3320 3030 3633 2046 4646 4646 4646 4646  3.0063.FFFFFFFFF
        0x0030:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
        0x0040:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
        0x0050:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
        0x0060:  4646 4646 4646 5757 5757 5757 5757 5757  FFFFFFWWWWWWWWWW
        0x0070:  5757 5757 5757 5757 5757 5757 5757 5757  WWWWWWWWWWWWWWWW
        0x0080:  5757 5757 5757 5757 5757 5757 5757 5757  WWWWWWWWWWWWWWWW
        0x0090:  5757 5757 5757 5757 5757 5757 0000 0000  WWWWWWWWWWWW....
        0x00a0:  0000 0000 0000 0000                      ........

This seems like a biggie and I wonder why this hasn't come up before, maybe there's something off in the analysis?

Scapy code to produce packets

import argparse
import random
import string
import time

from scapy.all import Ether, IP, UDP, sendp


def extra_payload(i):
    length = int(random.random() * 256)
    return random.choice(string.ascii_uppercase) * length


def send_udp_packets(iface, *, pps):
    i = 0
    while True:
        i += 1
        ip = IP(src=f"10.0.{i // 256}.{i % 256}")
        u = UDP(sport=i % 2**16, dport=(2 * i) % 2 ** 16)

        extra = extra_payload(i)
        u.add_payload(f"{i:05} {len(extra):04} {extra}".encode("utf-8"))

        sendp(Ether() / ip / u, iface=iface)
        time.sleep(1 / pps)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("iface")
    parser.add_argument("--pps", type=int, default=10)
    args = parser.parse_args()
    send_udp_packets(args.iface, pps=args.pps)


if __name__ == "__main__":
    main()

For every slot in a netmap ring, two descriptors in the virtqueue's
vring are used. Therefore, we can only use half the vring's size as
slots for netmap rings. Otherwise, the vring and the netmap ring are
out of sync. This currently corrupts proper operation and produces
bad data (see below).

This patch also sets an explicit nm_config to ensure we keep the
values for num_rx_desc/num_tx_desc from attach time and marks the
VQ_FULL case in virtio_netmap_init_buffers() as an issue that should
never be triggered.

Background: Experiment / Observation

Running a qemu-guest with a virtio_net NIC (rx_queue_size=256) attached
to a bridge on the host. Sending known/predictable packets in a loop
from the host via scapy. On the guest side, running tcpdump with native
netmap support and inspecting the packet data:

    tcpdump -i netmap:enp8s0 -Xn#

The first 128 packets look as expected, but thereafter come 128 packets
of "null data":

      128  15:59:46.592378 IP 10.0.0.128.128 > 127.0.0.1.256: UDP, length 141
            0x0000:  4500 00a9 0001 0000 4011 f0c2 0a00 0080  E.......@.......
            0x0010:  7f00 0001 0080 0100 0095 7f07 3030 3132  ............0012
            0x0020:  3820 3031 3330 2046 4646 4646 4646 4646  8.0130.FFFFFFFFF
            0x0030:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
            0x0040:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
            0x0050:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
            0x0060:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
            0x0070:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
            0x0080:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
            0x0090:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
            0x00a0:  4646 4646 4646 4646 46                   FFFFFFFFF
      129  15:59:46.712936 00:00:00:00:00:00 > 00:00:00:00:00:00 Null Information, send seq 0, rcv seq 0, Flags [Command], length 121
            0x0000:  0000 0000 0000 0000 0000 0000 0000 0000  ................
            0x0010:  0000 0000 0000 0000 0000 0000 0000 0000  ................
            0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
            0x0030:  0000 0000 0000 0000 0000 0000 0000 0000  ................
            0x0040:  0000 0000 0000 0000 0000 0000 0000 0000  ................
            0x0050:  0000 0000 0000 0000 0000 0000 0000 0000  ................
            0x0060:  0000 0000 0000 0000 0000 0000 0000 0000  ................
            0x0070:  0000 0000 00                             .....

At packet 256 tcpdump receives non-null data, but the payloads represents
what should've been received for packet 129 and later. Because the length
is taken from the vring, some of the slots also expose stale/corrupt data:

      281  16:03:51.846490 IP 10.0.0.153.153 > 127.0.0.1.306: UDP, length 74
            0x0000:  4500 0066 0001 0000 4011 f0ec 0a00 0099  E..f....@.......
            0x0010:  7f00 0001 0099 0132 0052 d633 3030 3135  .......2.R.30015
            0x0020:  3320 3030 3633 2046 4646 4646 4646 4646  3.0063.FFFFFFFFF
            0x0030:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
            0x0040:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
            0x0050:  4646 4646 4646 4646 4646 4646 4646 4646  FFFFFFFFFFFFFFFF
            0x0060:  4646 4646 4646 5757 5757 5757 5757 5757  FFFFFFWWWWWWWWWW
            0x0070:  5757 5757 5757 5757 5757 5757 5757 5757  WWWWWWWWWWWWWWWW
            0x0080:  5757 5757 5757 5757 5757 5757 5757 5757  WWWWWWWWWWWWWWWW
            0x0090:  5757 5757 5757 5757 5757 5757 0000 0000  WWWWWWWWWWWW....
            0x00a0:  0000 0000 0000 0000                      ........
@awelzel
Copy link
Contributor Author

awelzel commented Sep 14, 2020

As an additional comment, FreeBSD uses only half the slots, too, unless indirect descriptors are in use. Linux virtio_net doesn't seem to use indirect descriptors (checked 4.4, 4,19), so it seems reasonable to do the same.

https://github.com/freebsd/freebsd/blob/master/sys/dev/netmap/if_vtnet_netmap.h#L386

* num_rx_desc descriptors and so the queue is not
* allowed to become full.
*/
if (vq->num_free == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the VQ_FULL macro for portability

@@ -677,6 +685,20 @@ virtio_netmap_intr(struct netmap_adapter *na, int onoff)
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining this function is unnecessary, because nothing is going to change the values listed in struct nm_config_info...
Here you are not updating them indeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants