Implementation and evaluation of hypervisor to guest offloading for high throughput Virtual Machine traffic classification

=== Structure of this repository ===
The driver/ directory contains the bpfhv guest driver for
Linux kernels (>= 4.18):
    - bpfhv.c: driver source code

The proxy/ directory contains the implementation of an external
backend process associated with the QEMU bpfhv-proxy network backend
(-netdev bpfhv-proxy).
The external backend provides the queue processing functionality
for a single bpfhv device belonging to a QEMU VM. In other words,
QEMU only implements the control functionality of the device, while
the RX and TX queues are processed by the external process. Similarly
to vhost-user, QEMU and the external backend process use a dedicated
control channel to exchange the information needed for the packet
processing tasks (e.g. guest memory map, addresses of each TX and RX
queue, file descriptors for notifications, etc.).
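
For illustration only, a message on that control channel could look
like the sketch below; the message types, struct name and fields are
assumptions made for this example, not the actual bpfhv-proxy
protocol definitions.

    /* Illustrative sketch of a QEMU <-> backend control message.
     * Names and layout are assumptions, not the real protocol. */
    #include <stdint.h>

    enum example_ctrl_type {
        EXAMPLE_SET_MEM_TABLE,    /* describe the guest memory map       */
        EXAMPLE_SET_QUEUE_ADDR,   /* guest physical address of a queue   */
        EXAMPLE_SET_QUEUE_KICKFD, /* eventfd used by the guest to notify */
        EXAMPLE_SET_QUEUE_IRQFD,  /* eventfd used to interrupt the guest */
    };

    struct example_ctrl_msg {
        uint32_t type;       /* one of enum example_ctrl_type            */
        uint32_t queue_idx;  /* which TX/RX queue the message refers to  */
        uint64_t addr;       /* guest physical address, when relevant    */
        uint64_t size;       /* region or queue size, when relevant      */
        /* file descriptors travel as SCM_RIGHTS ancillary data */
    };
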
Files:
    - backend.c: main file, implements the control protocol and
                 two packet processing loops (the first one is
                 a poll() event loop, whereas the second one uses
                 busy-waiting);
    - sring.[ch]: hypervisor-side implementation of a device which uses
                  a minimal descriptor format, with no support for
                  offloads (and reduced per-packet overhead); a rough
                  sketch of such a descriptor layout is shown after
                  this list;
    - sring_progs.c: eBPF programs for the sring device;
    - sring_gso.[ch]: hypervisor-side implementation of a device which
                      uses an extended descriptor format, supporting
                      checksum offloads and TCP/UDP segmentation
                      offloads;
    - sring_gso_progs.c: eBPF programs for the sring_gso device;
    - vring_packed.[ch]: hypervisor-side implementation of the packed
                         virtqueue defined in the VirtIO 1.1
                         specification;
    - vring_packed_progs.c: eBPF programs for the vring_packed device;
    - start-qemu.sh: an example script to start a QEMU VM with a
                     bpfhv device peered with a bpfhv-proxy network
                     backend;
    - start-proxy.sh: an example script to start the external backend
                      process and configure the backend network device
                      (e.g. a TAP interface or a netmap port);
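
As a rough idea of what "minimal descriptor format" means for the
sring device, the sketch below shows one plausible layout; the field
names and sizes are assumptions made for illustration, see
proxy/sring.h for the actual definitions.

    /* Illustrative minimal ring descriptor, in the spirit of the sring
     * device. Field names and sizes are assumptions; the real layout is
     * defined in proxy/sring.h. */
    #include <stdint.h>

    struct example_sring_desc {
        uint64_t paddr;    /* guest physical address of the buffer       */
        uint32_t len;      /* buffer length in bytes                     */
        uint16_t flags;    /* e.g. end-of-packet for multi-buffer frames */
        uint16_t cookie;   /* opaque id used to match completions        */
    };

    struct example_sring {
        uint32_t prod;     /* producer index, written by the driver      */
        uint32_t cons;     /* consumer index, written by the backend     */
        struct example_sring_desc desc[]; /* power-of-two sized array    */
    };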


=== Some advantages of bpfhv ===
    - Have doorbells on separate pages (configurable stride)
    - Provider can evolve the metadata header (e.g., virtio-net)
      to balance between the needs of FreeBSD and Linux
      (virtio-net is good for Linux, but not for FreeBSD).
    - VirtIO 1.1 vs 1.0 (while 0.95 is still around): a sign that
      the interface needs to keep evolving and that compatibility
      problems exist.
    - You can define a metadata format (e.g. virtio-net header)
      that fits the specific hardware NIC features used by the
      cloud provider.
    - Let the provider inject code to encrypt/decrypt the payload,
      together with the hardcoded key. The encrypt/decrypt routines
      can be helper functions that take as arguments the OS packet
      pointer and the key (see the sketch after this list).
    - Simplification of device paravirtualization. A fixed datapath
      ABI means that you need to stay backward compatible. Look at
      the virtio implementation in Linux 4.20: it needs to support
      both the split and the packed ring --> complex, error prone,
      less efficient.
    - Change virtual switch and backend under the hood (tap,
      netmap, other).
    - Adapt to changing workloads.
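
The payload encryption point above could, for instance, rely on
helper functions along the following lines; the names and prototypes
are assumptions made for illustration and do not exist in this
repository.

    /* Hypothetical prototypes for provider-injected encrypt/decrypt
     * helpers. As described above, they take the OS packet pointer and
     * the (hardcoded) key, and transform the payload in place. */
    #include <stddef.h>
    #include <stdint.h>

    int example_payload_encrypt(void *pkt, size_t len,
                                const uint8_t *key, size_t key_len);
    int example_payload_decrypt(void *pkt, size_t len,
                                const uint8_t *key, size_t key_len);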


=== TODOs (driver) ===
    - Let BPFHV_MAX_TX_BUFS and BPFHV_MAX_RX_BUFS be variable.
      This requires of course reshaping the layout of the
      context data structures.
    - Try to replace dma_map_single() with dma_map_page() on
      the RX datapath? Not sure this is relevant.
    - What if the eBPF program needs to modify the SG layout,
      e.g., for encapsulation or encryption? This would require
      changing the paddr/vaddr/len in the buffer descriptors,
      and DMA mapping and unmapping... So maybe we should ask
      the eBPF program to do the DMA map/unmap itself, so that
      it can do that after encapsulation or encryption (i.e. once
      the SG layout is stable); see the hypothetical helper
      prototypes sketched after this list.
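
One way to read the last item above: expose DMA mapping to the
injected program through helpers, so that mapping happens only once
the SG layout is final. The prototypes below are purely hypothetical
and are not implemented in the driver.

    /* Hypothetical helpers letting the eBPF TX program perform DMA
     * map/unmap after it has rewritten the SG layout (e.g. after
     * encapsulation or encryption). Not implemented. */
    #include <stdint.h>

    struct bpfhv_tx_context;  /* per-queue TX context passed to the programs */

    /* Map 'len' bytes at driver virtual address 'vaddr' and return the
     * DMA address to store in the buffer descriptor (0 on failure).    */
    uint64_t example_dma_map(struct bpfhv_tx_context *ctx,
                             void *vaddr, uint32_t len);

    /* Release a mapping previously obtained with example_dma_map().    */
    void example_dma_unmap(struct bpfhv_tx_context *ctx,
                           uint64_t dma_addr, uint32_t len);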


=== TODOs (qemu) ===
    - Replace cpu_physical_memory_[un]map() with dma_memory_[un]map()
      and the MemoryRegionCache library. This should only be necessary
      if the guest platform has an IOMMU.
      Code in virtqueue_pop() and virtqueue_push(). A rough sketch of
      the intended change is shown after this list.

    - let backend.ops.init fail (vring packed < 2^15)
    - move the vring_packed generic code to the top of the files
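
To make the first qemu TODO above concrete, the fragment below
sketches the intended direction: map guest buffers through the
device's DMA address space so that a guest IOMMU is honoured. The
QEMU function signatures shown are roughly those of QEMU 4.x and
vary across versions; treat this as an assumption-laden sketch, not
the final code.

    /* Sketch of the change intended for virtqueue_pop()/virtqueue_push():
     * go through dma_memory_map() instead of cpu_physical_memory_map().
     * Signatures are approximate and differ across QEMU versions. */
    #include "sysemu/dma.h"

    static void *example_map_guest_buf(AddressSpace *dma_as, dma_addr_t pa,
                                       dma_addr_t *len, bool is_write)
    {
        /* Old code: return cpu_physical_memory_map(pa, len, is_write); */
        return dma_memory_map(dma_as, pa, len,
                              is_write ? DMA_DIRECTION_FROM_DEVICE
                                       : DMA_DIRECTION_TO_DEVICE);
    }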
