Generic SR-IOV support #3049
Comments
Hey @DolceTriade, thanks for creating this issue! To the best of my knowledge, there is unfortunately no generic way to create VFs for GPUs, or even for Mellanox NICs. Enabling SR-IOV requires a driver (as you mentioned) and the correct way of using it; in the case of Nvidia and Mellanox, the driver API differs from what we already have. Moreover, introducing new drivers would increase the EVE image size, which is not desirable.
I think that while GPUs may be out of the question (certainly, Nvidia needs a lot of dependencies), accelerator cards do generally support SR-IOV without additional drivers. While I see that nested passthrough is supported in QEMU (https://wiki.qemu.org/Features/VT-d#Use_Case_3:_Nested_Guest_Device_Assignment), I wonder what the performance implications would be and whether those costs are acceptable. The other workaround would be to statically configure SR-IOV devices by adding a separate system-level service to EVE specifically for that class of device, but that makes dynamic configuration hard. I think the bare minimum of reusing the same SR-IOV code path as for network cards (the basic dance of setting num_vfs and binding the created VFs to vfio-pci) and being able to allocate these VFs dynamically would be a big step forward, and would actually cover a surprising number of use cases (such as DPDK applications using network accelerators, or even Intel QAT).
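For reference, the "basic dance" above maps onto the standard Linux sysfs interface: write the VF count to the PF's `sriov_numvfs`, then bind each resulting VF to `vfio-pci` via `driver_override` and `drivers_probe`. A minimal Go sketch of the writes involved (PCI addresses are illustrative, and real code in domainmgr would discover the VF addresses from sysfs rather than take them as a parameter):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// sysfsWrite describes one "echo value > path" style sysfs operation.
type sysfsWrite struct {
	Path  string
	Value string
}

// sriovSetup returns, in order, the sysfs writes needed to create count VFs
// on the PF at pciAddr and bind each listed VF to vfio-pci. It only computes
// the plan; nothing is written, so this is safe to run anywhere.
func sriovSetup(pciAddr string, count int, vfAddrs []string) []sysfsWrite {
	dev := filepath.Join("/sys/bus/pci/devices", pciAddr)
	writes := []sysfsWrite{
		// Step 1: ask the kernel to create the VFs on the PF.
		{filepath.Join(dev, "sriov_numvfs"), fmt.Sprintf("%d", count)},
	}
	for _, vf := range vfAddrs {
		writes = append(writes,
			// Step 2: force vfio-pci as the driver for this VF...
			sysfsWrite{filepath.Join("/sys/bus/pci/devices", vf, "driver_override"), "vfio-pci"},
			// ...and trigger a (re)probe so the binding takes effect.
			sysfsWrite{Path: "/sys/bus/pci/drivers_probe", Value: vf},
		)
	}
	return writes
}

func main() {
	for _, w := range sriovSetup("0000:3b:00.0", 2, []string{"0000:3b:00.1", "0000:3b:00.2"}) {
		fmt.Printf("echo %s > %s\n", w.Value, w.Path)
	}
}
```

Since none of these steps depend on a device-class-specific driver API, the same plan works for NICs and for accelerators whose PF driver implements the SR-IOV callbacks.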
Indeed. I believe that's why EVE only applies SR-IOV settings at first boot, which would hold for this use case as well. By "allocate dynamically", I mean being able to assign the VFs to workloads dynamically, not changing the number of VFs on the fly.
Use case
EVE supports SR-IOV for network cards; however, other PCI devices like GPUs and accelerators also support SR-IOV. There are edge use cases where passing through GPUs and accelerator cards (see Intel QAT, Intel N300, Intel ACC100, etc.) to virtualized workloads is valuable. This is especially true for telco and connectivity use cases, where offloading crypto and FEC operations is important to achieve maximum performance within limited power and CPU budgets.
Describe the solution you'd like
I propose adding new enum values, IoGenericPF and IoGenericVF, to the list of available PhysicalIO types. Then in domainmgr, in the same place where we handle ioEthPF, we would also create VFs for devices of type IoGenericPF and automatically populate the resulting VFs into the available hardware.
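As a sketch of what the API change might look like, the proposal amounts to two new entries in EVE's PhyIoType enum. The names and field numbers below are illustrative only (existing values are elided, and the actual numbering would be chosen to avoid collisions in the real proto definition):

```protobuf
// Hypothetical additions to EVE's PhyIoType enum; not part of the current API.
enum PhyIoType {
  // ... existing values (PhyIoNetEth, PhyIoNetEthPF, PhyIoNetEthVF, ...) ...
  PhyIoGenericPF = 100;  // proposed: SR-IOV-capable non-NIC physical function
  PhyIoGenericVF = 101;  // proposed: virtual function created from a generic PF
}
```

With that in place, domainmgr's existing PF handling could treat IoGenericPF like ioEthPF for VF creation, minus the NIC-specific steps (no MAC/VLAN programming), since the sysfs num_vfs and vfio-pci binding steps are device-class agnostic.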
It would be the responsibility of the EVE image builder to ensure that the required driver and firmware are included in the EVE image (or to perform any required initialization...)