Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Use pcie-pci-bridge for the same iommu group #3369

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

mikem-zed
Copy link
Contributor

@mikem-zed mikem-zed commented Jul 23, 2023

Rebase original PR #2544

Here is the latest comment from @eriknordmark in the original PR

@mikem-zed is this sharing of address space something we can detect (from spec.sh or otherwise?)
Our fallback position is that the hardware model needs to be tested and if it turns out that some devices must be assigned together then the user can edit the model file manually. (but if we have a way to detect it even better.)

We can detect address space overlapping by looking at BAR registers. It can be done at runtime. But this is exactly why Linux kernel put these devices into a single IOMMU group even with ACS override patch. Basically this check was already done by linux kernel. Are there any other reasons for putting >1 device in IOMMU group with ACS patch override? Good question. We can add a logic to check for address space overlaping and ad pci-pci bridge only in this case but the rule remains the same: the entier iommu group must be assigned to an app with or without extra pci-pci bridge

Rebase original PR lf-edge#2544

Signed-off-by: Petr Fedchenkov <giggsoff@gmail.com>
Signed-off-by: Mikhail Malyshev <mikem@zededa.com>
@mikem-zed mikem-zed requested a review from giggsoff July 23, 2023 17:24
@mikem-zed mikem-zed changed the title Use pcie-pci-bridge for the same iommu group [WIP] Use pcie-pci-bridge for the same iommu group Jul 23, 2023
for _, t := range list {
if t.pciLong == tap.pciLong {
return list
func addNoDuplicatePCI(list map[string][]typeAndPCI, tap typeAndPCI, group string) map[string][]typeAndPCI {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need to change it from an ordered list to a (intentionally randomized by golang) map?

I suspect that the order matters e.g., if the DomainConfig has both eth0 with PCI-id x:y and eth1 with PCI-id w:z, then we need to pass them to kvm/xen in the order they were listed in DomainConfig, since I suspect a Linux guest VM will enumerate them based on the order they where in the kvm/xen/qemu config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, seems I did not pay attention to it during original PR.It looks like we need more complex structure to store pciAssignments in CreateDomConfig (not so pretty, but something like list of list of typeAndPCI + list of groups will work). Also, seems test should be modified to check the order as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I want to test it as-is on NUC to see if it solves current problem and then I'll fix it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually this approach is not correct IMO. By adding pcie-pci bridge we start a legacy PCI topology which may be not correct for some PCIe devices. Besides some multi function devices must remain multi-function in VM e.g. GPU+Sound or Thunderbolt controllers. I have a POC code but I'll test the approach on real HW first

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @mikem-zed , also, according to the description in the original PR, it seems to me that the mentioned error it's being caused due to the ACS patch, am I right?

Copy link
Collaborator

@rene rene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikem-zed , please, add some overview explanation to the commit's message. The old PR number information should be on the PR description at GH, where it can be traceable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants