
Generic SR-IOV support #3049

Open
DolceTriade opened this issue Feb 18, 2023 · 4 comments

@DolceTriade
Contributor

Use case

EVE supports SR-IOV for network cards; however, other PCI devices like GPUs and accelerators also support SR-IOV. There are edge use cases that involve passing through GPUs and accelerator cards (see Intel QAT, Intel N300, Intel ACC100, etc.) to virtualized workloads. This is especially true for telco and connectivity use cases, where offloading crypto and FEC operations is important to achieve maximum performance within limited power and CPU budgets.

Describe the solution you'd like

I propose adding new enum values IoGenericPF and IoGenericVF to the list of available PhysicalIO types. Then, in domainmgr, in the same place where we handle ioEthPF, we would also create VFs for devices of type IoGenericPF and automatically populate those VFs into the available hardware.

It would be the responsibility of the EVE image builder to ensure that the required driver and firmware are included in the EVE image (or to perform any required initialization...).
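Roughly, the flow could look like the sketch below. This is a minimal illustration only: the IoGenericPF/IoGenericVF values and the createVFs helper are hypothetical names, not EVE's actual API; the sysfs path is the standard Linux SR-IOV interface.

```go
// Hypothetical sketch of the proposed generic-PF handling, mirroring the
// existing Ethernet-PF path in domainmgr. All identifiers here are
// illustrative, not EVE's actual API.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Hypothetical additions to the PhysicalIO type enum.
const (
	IoGenericPF = iota + 100 // SR-IOV-capable PCI function that is not a NIC
	IoGenericVF              // VF created from an IoGenericPF
)

// createVFs asks the kernel to create numVFs virtual functions on the
// PF at the given PCI address, via the standard sriov_numvfs knob.
func createVFs(pciAddr string, numVFs int) error {
	path := filepath.Join("/sys/bus/pci/devices", pciAddr, "sriov_numvfs")
	return os.WriteFile(path, []byte(fmt.Sprint(numVFs)), 0200)
}

func main() {
	// Example: create 4 VFs on an accelerator at a made-up PCI address.
	if err := createVFs("0000:3d:00.0", 4); err != nil {
		fmt.Fprintln(os.Stderr, "createVFs:", err)
		os.Exit(1)
	}
}
```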

@uncleDecart
Contributor

Hey @DolceTriade,

Thanks for creating this issue! To the best of my knowledge, there is unfortunately no way to generically create VFs for GPUs, or even for Mellanox NICs. Enabling SR-IOV requires a driver (as you mentioned) and the correct way of using it; in the case of Nvidia and Mellanox, the driver API differs from what we already have. Moreover, introducing new drivers would increase the EVE image size, which is not desirable.

There is one approach which, in my opinion, fits this specific task best: a Device Driver Domain. In a nutshell, it is a way of spinning up a VM that runs a specific driver and attaches it, over a common virtio interface, to the VM that needs it. This way we would not need to add anything to the EVE image (it would stay generic), and we would be able to support any SR-IOV driver we want (and any other third-party driver). Of course, this approach needs careful performance evaluation (though in theory the overhead should be no more than SPDK's). I'd be glad to share my findings once they are in a more readable format.

@DolceTriade
Contributor Author

I think that while GPUs may be out of the question (certainly, Nvidia needs a lot of dependencies), accelerator cards do generally support SR-IOV without additional drivers.

While I see that nested passthrough is supported in QEMU (https://wiki.qemu.org/Features/VT-d#Use_Case_3:_Nested_Guest_Device_Assignment), I wonder what the performance implications will be and whether those costs would be acceptable.

The other workaround would be to statically configure SR-IOV devices by adding a separate system-level service to EVE specifically for that class of device, but that makes dynamic configuration hard.

I think the bare minimum of reusing the same SR-IOV code as for network cards (just doing the basic dance of setting num_vfs and binding the created VFs to vfio-pci) and being able to allocate these VFs dynamically would be a big step forward and would cover a surprising number of use cases (like DPDK applications using network accelerators, or even Intel QAT).
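To make that "dance" concrete, here is a minimal sketch. It assumes only the standard Linux sysfs SR-IOV and driver_override interfaces; the function name and PCI addresses are made up, and EVE's real code path would differ.

```go
// Sketch of the generic SR-IOV dance: set the VF count on the PF, then
// bind each newly created VF to vfio-pci via driver_override.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func setupGenericVFs(pfAddr string, numVFs int) error {
	pfDir := filepath.Join("/sys/bus/pci/devices", pfAddr)

	// Step 1: create the VFs by writing the count to sriov_numvfs.
	if err := os.WriteFile(filepath.Join(pfDir, "sriov_numvfs"),
		[]byte(fmt.Sprint(numVFs)), 0200); err != nil {
		return err
	}

	// Step 2: for each VF, follow the virtfnN symlink to its PCI
	// address and steer the device to vfio-pci.
	for i := 0; i < numVFs; i++ {
		link, err := filepath.EvalSymlinks(filepath.Join(pfDir, fmt.Sprintf("virtfn%d", i)))
		if err != nil {
			return err
		}
		vfAddr := filepath.Base(link)
		vfDir := filepath.Join("/sys/bus/pci/devices", vfAddr)

		// driver_override makes vfio-pci claim the device on the next probe.
		if err := os.WriteFile(filepath.Join(vfDir, "driver_override"),
			[]byte("vfio-pci"), 0200); err != nil {
			return err
		}
		// Unbind any default driver that grabbed the VF; ignore the
		// error if nothing is bound yet.
		_ = os.WriteFile(filepath.Join(vfDir, "driver", "unbind"),
			[]byte(vfAddr), 0200)
		// Re-probe so vfio-pci picks the device up.
		if err := os.WriteFile("/sys/bus/pci/drivers_probe",
			[]byte(vfAddr), 0200); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Example: 2 VFs on a made-up accelerator PF.
	if err := setupGenericVFs("0000:3d:00.0", 2); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```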

@uncleDecart
Contributor

> I think the bare minimum of reusing the same SR-IOV code as for network cards (just doing the basic dance of setting num_vfs and binding the created VFs to vfio-pci) and being able to allocate these VFs dynamically would be a big step forward and would cover a surprising number of use cases (like DPDK applications using network accelerators, or even Intel QAT).

Dynamic allocation of VFs can be very tricky. For instance, the existing VFs have to be removed first in order to resize the VF count. And if you have already passed some of them through to VMs, you would be removing those devices, which is not desirable behaviour. Of course, we could create some kind of stub to manage that, but again, I'm not sure there's a design that can guarantee the availability and performance of such a stub while we re-create the VFs.
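For illustration, the constraint looks roughly like this sketch (assuming standard sriov_numvfs semantics; resizeVFs is a made-up helper): the kernel refuses a nonzero write while VFs already exist, so any resize implies first destroying every current VF, including ones already passed through to VMs.

```go
// Sketch of why resizing is destructive: sriov_numvfs rejects a new
// nonzero count while VFs exist, so the count must be zeroed first,
// which destroys every current VF (even ones assigned to workloads).
package main

import (
	"fmt"
	"os"
)

func resizeVFs(numvfsPath string, newCount int) error {
	// A direct write fails (typically EBUSY) if VFs are already allocated...
	if err := os.WriteFile(numvfsPath, []byte(fmt.Sprint(newCount)), 0200); err == nil {
		return nil
	}
	// ...so all existing VFs must be torn down first.
	if err := os.WriteFile(numvfsPath, []byte("0"), 0200); err != nil {
		return err
	}
	return os.WriteFile(numvfsPath, []byte(fmt.Sprint(newCount)), 0200)
}

func main() {
	// Example with a made-up PF address.
	err := resizeVFs("/sys/bus/pci/devices/0000:3d:00.0/sriov_numvfs", 8)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```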

@DolceTriade
Contributor Author

DolceTriade commented Feb 21, 2023

Indeed. I believe that's why EVE only applies SR-IOV settings at first boot, which would be true for this use case as well?

By "allocate dynamically", I mean being able to assign the VFs to workloads dynamically, not repeatedly changing the number of VFs.
