{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":102116596,"defaultBranch":"master","name":"linux","ownerLogin":"jpemartins","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2017-09-01T13:38:57.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/279189?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1698155680.0","currentOid":""},"activityList":{"items":[{"before":"71675750263deeaa7ca7cc1509354d19270eb961","after":"058219f0cee29c651cf60c6745cffbf94b259627","ref":"refs/heads/iommufd-v6","pushedAt":"2023-10-24T14:04:27.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"jpemartins","name":"João Martins","path":"/jpemartins","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/279189?s=80&v=4"},"commit":{"message":"iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR flag\n\nChange test_mock_dirty_bitmaps() to pass a flag where it specifies the flag\nunder test. The test does the same thing as the GET_DIRTY_BITMAP regular\ntest. Except that it tests whether the dirtied bits are fetched all the\nsame a second time, as opposed to observing them cleared.\n\nSigned-off-by: Joao Martins ","shortMessageHtmlLink":"iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR flag"}},{"before":null,"after":"71675750263deeaa7ca7cc1509354d19270eb961","ref":"refs/heads/iommufd-v6","pushedAt":"2023-10-24T13:54:40.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"jpemartins","name":"João Martins","path":"/jpemartins","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/279189?s=80&v=4"},"commit":{"message":"iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR flag\n\nChange test_mock_dirty_bitmaps() to pass a flag where it specifies the flag\nunder test. The test does the same thing as the GET_DIRTY_BITMAP regular\ntest. 
Except that it tests whether the dirtied bits are fetched all the\nsame a second time, as opposed to observing them cleared.\n\nSigned-off-by: Joao Martins ","shortMessageHtmlLink":"iommufd/selftest: Test IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR flag"}},{"before":null,"after":"149979bd9ceed627616388de4f1872a75915d8c6","ref":"refs/heads/smmu-iommufd-v3","pushedAt":"2023-09-23T01:16:03.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"jpemartins","name":"João Martins","path":"/jpemartins","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/279189?s=80&v=4"},"commit":{"message":"iommu/arm-smmu-v3: Enforce dirty tracking in domain attach/alloc\n\nSMMUv3 implements all requirements of revoking device\nattachment if smmu does not support dirty tracking.\n\nFinally handle the IOMMU_CAP_DIRTY in iommu_capable for\nIOMMUFD_DEVICE_GET_HW_INFO.\n\nSigned-off-by: Joao Martins ","shortMessageHtmlLink":"iommu/arm-smmu-v3: Enforce dirty tracking in domain attach/alloc"}},{"before":null,"after":"f725900c29ee1ed8e3afc169d27bcf1ea2359564","ref":"refs/heads/iommufd-v3","pushedAt":"2023-09-23T01:13:37.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"jpemartins","name":"João Martins","path":"/jpemartins","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/279189?s=80&v=4"},"commit":{"message":"cover-letter: IOMMUFD Dirty Tracking\n\nPresented herewith is a series that extends IOMMUFD to have IOMMU\nhardware support for dirty bit in the IOPTEs.\n\nToday, AMD Milan (or more recent) supports it while ARM SMMUv3.2\nalongside VT-D rev3.x also do support. One intended use-case (but not\nrestricted!) is to support Live Migration with SR-IOV, specially useful\nfor live migrateable PCI devices that can't supply its own dirty\ntracking hardware blocks amongst others.\n\nAt a quick glance, IOMMUFD lets the userspace create the IOAS with a\nset of a IOVA ranges mapped to some physical memory composing an IO\npagetable. 
This is then created via HWPT_ALLOC or attached to a\nparticular device/hwpt, consequently creating the IOMMU domain and sharing\na common IO page table representing the endpoint DMA-addressable guest\naddress space. In IOMMUFD Dirty tracking (since v2 of the series) it\nis supported via the HWPT_ALLOC model only, as opposed to the simpler\nautodomains model.\n\nThe result is a hw_pagetable which represents the\niommu_domain which will be directly manipulated. The IOMMUFD UAPI,\nand the iommu/iommufd kAPI are then extended to provide:\n\n1) Enforcement that only devices with dirty tracking support are attached\nto an IOMMU domain, to cover the case where this isn't all homogeneous in\nthe platform. Initially this is aimed more at the possibly heterogeneous nature\nof ARM, while x86 gets future-proofed should any such occasion occur.\n\n[ Special Note: In this version I did it differently with the domain dirty\nops. Given that dirty tracking ops are now supposed to be set or not\ndynamically at domain allocation it means I can't rely on\ndefault_domain_ops to be set. So I added a new set of ops just for\ndirty tracking which is NULL by default. It looked simpler to me, and I\nwas concerned that dirty tracking would be the only one having 'dynamic'\nops, so chose the one with less churn. The alternatives are a) to either\nreplicate iommu_domain_ops with/without dirty tracking and have that set\nin domain_alloc_user() but it sounded wrong to just replicate the same\n.map and .unmap and other stuff that was unrelated or b) always allocate one\nin the driver, copied from the default_domain_ops value each driver\npasses, and change it accordingly or c) have a structure for dynamic ops\n(this series). Open to alternatives or knowing the preference of folks. ]\n\nThe device dirty tracking enforcement on attach_dev is made whether the\ndirty_ops are set or not. 
Given that attach always checks for dirty\nops and IOMMU_CAP_DIRTY, while writing this I was almost tempted to\nmove this to the upper layer, but semantically the iommu driver should do the\nchecking.\n\n2) Toggling of Dirty Tracking on the iommu_domain. We model it as the most\ncommon case of changing hardware translation control structures dynamically\n(x86) while making it easier to have an always-enabled mode. In the\nRFCv1, the ARM specific case is suggested to be always enabled instead of\nhaving to enable the per-PTE DBM control bit (what I previously called\n\"range tracking\"). Here, setting/clearing tracking means just clearing the\ndirty bits at start. The 'real' tracking of whether dirty\ntracking is enabled is stored in the IOMMU driver, hence no new\nfields are added to iommufd pagetable structures, except for the\niommu_domain dirty ops part via adding a dirty_ops field to\niommu_domain. We use that too for IOMMUFD to know if dirty tracking\nis supported and toggleable without having iommu drivers replicate said\nchecks.\n\n3) Add a capability probing for dirty tracking, leveraging the\nper-device iommu_capable() and adding an IOMMU_CAP_DIRTY. It extends\nthe GET_HW_INFO ioctl which takes a device ID to return some generic\ncapabilities *in addition*. Possible values enumerated by `enum\niommufd_hw_capabilities`.\n\n4) Read the I/O PTEs and marshal their dirtiness into a bitmap. The bitmap\nindexes on a page_size basis the IOVAs that got written by the device.\nWhile performing the marshalling, drivers also need to clear the dirty bits\nfrom the IOPTEs and allow the kAPI caller to batch the much needed IOTLB flush.\nThere's no copy of bitmaps to userspace backed memory, all is zerocopy\nbased to not add more cost to the iommu driver IOPT walker. This shares\nfunctionality with VFIO device dirty tracking via the IOVA bitmap APIs. So\nfar this is a test-and-clear kind of interface given that the IOPT walk is\ngoing to be expensive. 
In addition this also adds the ability to read dirty\nbit info without clearing the PTE info. This is meant to cover the\nunmap-and-read-dirty use-case, and avoid the second IOTLB flush.\n\nNote: I've kept the name read_and_clear_dirty() as in RFCv2 but this might not\nmake sense given the name of the flags; open to suggestions.\n\nThe only dependency is:\n* Have domain_alloc_user() API with flags [2] which is also used\nin the nesting work.\n\nThe series is organized as follows:\n\n* Patches 1-4: Take care of the iommu domain operations to be added.\nThe idea is to abstract iommu drivers from any idea of how bitmaps are\nstored or propagated back to the caller, as well as allowing\ncontrol/batching over IOTLB flush. So there's a data structure and a\nhelper that only tells the upper layer that an IOVA range got dirty.\nThis logic is shared with VFIO and it's meant to walk the bitmap\nuser memory, kmap-ing plus setting bits as needed. The IOMMU driver\njust has an idea of a 'dirty bitmap state' and records an IOVA as\ndirty.\n\n* Patches 5-15: Add the UAPIs for IOMMUFD, and selftests. The selftests\ncover some corner cases on boundary handling of the bitmap and various\nbitmap sizes that exercise it. I haven't included huge IOVA ranges to avoid\nrisking the selftests failing to execute due to OOM issues of mmaping big\nbuffers.\n\nSo the next half of the series presents x86 implementations for IOMMUs:\n\n* Patches 16-18: AMD IOMMU implementation, particularly on those having\nHDSup support. Tested with a Qemu amd-iommu with HDSup emulated[0]. And\ntested with live migration with VFs (but with IOMMU dirty tracking).\n\n* Patch 19: Intel IOMMU rev3.x+ implementation. Tested with a Qemu\nbased intel-iommu vIOMMU with SSADS emulation support[0].\n\nFor ARM-SMMU-v3 I've made adjustments from the RFCv2 but staged this\ninto a branch[6] with all the changes, but didn't include it here as I can't\ntest this besides compilation. 
Shameer, if you can pick this up as we chatted\nsome time ago it would be great, as you have the hardware. Note that\nit depends on some patches from Nicolin for hw_info() and\ndomain_alloc_user() base support coming from his nesting work.\n\nOn AMD I have tested this with emulation and then live migration; and\nwhile I haven't tested on supporting VT-d hardware, so far emulation has\nproven a reliable indication that it is functional, thus I kept the VT-d bits\nin v3.\n\nThe qemu iommu emulation bits are to increase coverage of this code and\nhopefully make this more broadly available for fellow\ncontributors/devs, old version[1]; it uses Yi's 2 commits to have\nhw_info() supported (still needs a bit of cleanup) on top of Zhenzhong's\nlatest IOMMUFD QEMU bringup work: see here[0]. It includes IOMMUFD dirty\ntracking for Live Migration, and live migration was tested. I won't be\nexactly following up with a v2 of the QEMU patches given that IOMMUFD support\nneeds to be supported by Qemu first.\n\nThis should be a better direction, switching everything to be\ndomain_alloc_user() based. 
The only possible remaining wrinkle might be\nthe dynamic IOMMU dirty ops, if this proposal doesn't satisfy.\n\nThis series is also hosted here[3] and sits on top of the branch behind[2].\n\nFeedback or any comments are very much appreciated.\n\nThanks!\n\tJoao\n\n[0] https://github.com/jpemartins/qemu/commits/iommufd-v3\n[1] https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/\n[2] https://lore.kernel.org/linux-iommu/20230919092523.39286-1-yi.l.liu@intel.com/\n[3] https://github.com/jpemartins/linux/commits/iommufd-v3\n[4] https://lore.kernel.org/linux-iommu/20230518204650.14541-1-joao.m.martins@oracle.com/\n[5] https://lore.kernel.org/kvm/20220428210933.3583-1-joao.m.martins@oracle.com/\n[6] https://github.com/jpemartins/linux/commits/smmu-iommufd-v3\n\nChanges since RFCv2[4]:\n* Test with Live Migration thus dropped RFC\n* General commit message improvements\n* Remove spurious headers in selftests\n* Exported some symbols to actually allow things to build when IOMMUFD\nis built as a module. 
(Alex Williamson)\n* Switch the enforcing to be done on IOMMU domain allocation via\ndomain_alloc_user (Jason, Robin, Lu Baolu)\n* Removed RCU series from Lu Baolu (Jason)\n* Switch set_dirty/read_dirty/clear_dirty to down_read() (Jason)\n* Make sure it checks for area::pages (Jason)\n* Move clearing dirties before set_dirty into a helper (Jason)\n* Avoid breaking IOMMUFD selftests UAPI (Jason)\n* General improvements to testing\n* Add coverage to new out_capabilities support in HW_INFO.\n* Address Shameer/Robin comments in smmu-v3 (code is on a branch[6])\n - Properly check for FEAT_HD together with COHERENCY\n - Remove the pgsize_bitmap check\n - Limit the quirk set to s1 pgtbl_cfg.\n - Fix commit message on dubious sentence on DBM usecase\n\nChanges since RFCv1[5]:\nToo many changes but the major items were:\n* Majority of the changes from Jason/Kevin/Baolu/Suravee:\n- Improve structure and rework most commit messages\n- Drop all of the VFIO-compat series\n- Drop the unmap-get-dirty API\n- Tie this to HWPT only, no more autodomains handling;\n- Rework the UAPI widely by:\n - Having an IOMMU_DEVICE_GET_CAPS which allows fetching the capabilities\n of devices, specifically to test dirty tracking support for an individual\n device\n - Add an enforce-dirty flag to the IOMMU domain via HWPT_ALLOC\n - SET_DIRTY now clears dirty tracking before asking iommu driver to do so;\n - New GET_DIRTY_IOVA flag that does not clear dirty bits\n - Add coverage for all added APIs\n - Expand GET_DIRTY_IOVA tests to cover IOVA bitmap corner case tests\n that I had separately; I only excluded the Terabyte IOVA range\n usecases (which test bitmaps 2M+) because those will most likely fail\n to be run as selftests (not sure yet how I can include those). I am\n not exactly sure how I can cover those, unless I do 'fake IOVA maps'\n *somehow* which do not necessarily require real buffers.\n- Handle most comments in intel-iommu. 
The only remaining one for v3 is the\n PTE walker which will be done better.\n- Handle all comments in amd-iommu, most of which regarding locking.\n Only one remaining for v3, same as Intel;\n- Reflect the UAPI changes into iommu driver implementations, including\npersisting dirty tracking enabling in new attach_dev calls, as well as\nensuring attach_dev enforces the requested domain flags;\n* Comments from Yi Sun in making sure that dirty tracking isn't\nrestricted to SS only, so relax the check for FL support because it's\nalways enabled. (Yi Sun)\n* Most of the code that was in v1 for dirty bitmaps got rewritten and\nrepurposed to also cover the VFIO case; so reuse this infra here too for both.\n(Jason)\n* Take Robin's suggestion of always enabling dirty tracking and set_dirty\njust clearing bits on 'activation', and make that a generic property to\nensure we always get accurate results between starting and stopping\ntracking. (Robin Murphy)\n* Address all SMMUv3 comments on how we enable/test the DBM, or the\nbits in the context descriptor with io-pgtable::quirks, etc\n(Robin, Shameerali)","shortMessageHtmlLink":"cover-letter: IOMMUFD Dirty Tracking"}},{"before":"13bba5487bed4e925a4880699a79f9c28d01fb91","after":"eff23626467dcbc07144def4a6261ce9df0454a0","ref":"refs/heads/iommufd-v2","pushedAt":"2023-05-18T20:35:07.758Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"jpemartins","name":"João Martins","path":"/jpemartins","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/279189?s=80&v=4"},"commit":{"message":"cover-letter: IOMMUFD Dirty Tracking\n\nPresented herewith is a series that extends IOMMUFD to have IOMMU\nhardware support for the dirty bit in the IOPTEs.\n\nToday, AMD Milan (or more recent) supports it while ARM SMMUv3.2\nalongside VT-D rev3.x are expected to eventually come along. One\nintended use-case (but not restricted!) 
is to support Live Migration with\nSR-IOV, especially useful for live migrateable PCI devices that can't\nsupply their own dirty tracking hardware blocks amongst others.\n\nAt a quick glance, IOMMUFD lets the userspace create the IOAS with a\nset of IOVA ranges mapped to some physical memory composing an IO\npagetable. This is then created via HWPT_ALLOC or attached to a\nparticular device/hwpt, consequently creating the IOMMU domain and sharing\na common IO page table representing the endpoint DMA-addressable guest\naddress space. In IOMMUFD Dirty tracking (from v1 of the series) we will\nbe supporting the HWPT_ALLOC model only, as opposed to the simpler\nautodomains model.\n\nThe result is a hw_pagetable which represents the\niommu_domain which will be directly manipulated. The IOMMUFD UAPI,\nand the iommu core kAPI are then extended to provide:\n\n1) Enforce that only devices with dirty tracking support are attached\nto an IOMMU domain, to cover the case where this isn't all homogeneous in\nthe platform. The enforcing being enabled or not is tracked by the iommu\ndomain op *caller* not the iommu driver implementation, to avoid redundantly\nchecking this in IOMMU ops.\n\n2) Toggling of Dirty Tracking on the iommu_domain. We model it as the most\ncommon case of changing hardware translation control structures dynamically\n(x86) while making it easier to have an always-enabled mode. In the\nRFCv1, the ARM specific case is suggested to be always enabled instead of\nhaving to enable the per-PTE DBM control bit (what I previously called\n\"range tracking\"). Here, setting/clearing tracking means just clearing the\ndirty bits at start. IOMMUFD-wise, the 'real' tracking of whether dirty\ntracking is enabled is stored in the IOMMU driver, hence no new\nfields are added to iommufd pagetable structures, except for the\niommu_domain enforcement part.\n\nNote: I haven't included a GET_DIRTY ioctl but I do have it implemented.\nBut I am not sure this is exactly needed. 
I find it good to have a getter\nsupplied with a setter in general, but looking at how other parts were\ndeveloped in the past, the getter doesn't have a usage...\n\n3) Add a capability probing for dirty tracking, leveraging the per-device\niommu_capable() and adding an IOMMU_CAP_DIRTY. In IOMMUFD we add a\nDEVICE_GET_CAPS ioctl which takes a device ID and returns some\ncapabilities. Similarly to 1) it might make sense to move the drivers'\n.attach_device validation to the caller; for now I have this in iommu\ndrivers;\n\n4) Read the I/O PTEs and marshal their dirtiness into a bitmap. The bitmap\nindexes on a page_size basis the IOVAs that got written by the device.\nWhile performing the marshalling, drivers also need to clear the dirty bits\nfrom the IOPTEs and allow the kAPI caller to batch the much needed IOTLB flush.\nThere's no copy of bitmaps to userspace backed memory, all is zerocopy\nbased to not add more cost to the iommu driver IOPT walker. This shares\nfunctionality with VFIO device dirty tracking via the IOVA bitmap APIs. So\nfar this is a test-and-clear kind of interface given that the IOPT walk is\ngoing to be expensive. In addition this also adds the ability to read dirty\nbit info without clearing the PTE info. This is meant to cover the\nunmap-and-read-dirty use-case, and avoid the second IOTLB flush.\n\nNote: I've kept the name read_and_clear_dirty() in v2 but this might not\nmake sense given the name of the flags; open to suggestions.\n\n5) I've pulled Baolu Lu's patches[0] that make the pagetables page free\npath RCU-protected, which will both fix the use-after-free scenario\nmentioned there, and also let us have an RCU-based page table walker for\nthe read_and_clear_dirty() iommu op, as opposed to taking the same locks on\nmap/unmap. 
These are taken exactly as they were posted.\n\nThe additional dependency is:\n* HWPT_ALLOC which allows creating/manipulating iommu_domain creation[4]\n\nWhile needing this to make it useful with VFIO (and consequently to VMMs):\n* VFIO cdev support and to use iommufd with VFIO [3]\n* VFIO PCI hot reset\n\nHence, I have these two dependencies applied first on top of this series.\nThis whole thing as posted is also here[6].\n\nThe series is organized as follows:\n\n* Patches 1-4: Take care of the iommu domain operations to be added.\nThe idea is to abstract iommu drivers from any idea of how bitmaps are\nstored or propagated back to the caller, as well as allowing\ncontrol/batching over IOTLB flush. So there's a data structure and a\nhelper that only tells the upper layer that an IOVA range got dirty.\nThis logic is shared with VFIO and it's meant to walk the bitmap\nuser memory, kmap-ing plus setting bits as needed. The IOMMU driver\njust has an idea of a 'dirty bitmap state' and records an IOVA as\ndirty. It also pulls Baolu Lu's patches for RCU-safe pagetable free.\n\n* Patches 5-16: Add the UAPIs for IOMMUFD, and selftests. The selftests\ncover some corner cases on boundary handling of the bitmap and various\nbitmap sizes that exercise it. I haven't included huge IOVA ranges to avoid\nrisking the selftests failing to execute due to OOM issues of mmaping big\nbuffers.\n\nFor completeness and most importantly to make sure the new IOMMU core\nops capture the hardware blocks, I've implemented it for the x86 IOMMUs that\nhave/eventually-have IOMMU A/D support. So the next half of the series\npresents said implementations for IOMMUs:\n\n* Patches 17-18: AMD IOMMU implementation, particularly on those having\nHDSup support. Tested with a Qemu amd-iommu with HDSup emulated[1].\n\n* Patch 19: Intel IOMMU rev3.x+ implementation. Tested with a Qemu\nbased intel-iommu vIOMMU with SSADS emulation support[1].\n\n* Patches 20-24: ARM SMMU v3 implementation. 
A lot simpler than the v1\nposting. Most of the adjustments were because of the new UAPI while taking\nthe comments I got in v1 from everyone. Only compile tested. Shameerali\nwill be taking over the ARM SMMUv3 support;\n\nTo help testing/prototyping, I also wrote qemu iommu emulation bits\nto increase coverage of this code and hopefully make this more broadly\navailable for fellow contributors/devs[1]; it is stored here[2] and\nlargely based on Nicolin, Yi and Eric's IOMMUFD bringup work (thanks a\nton!). It also includes the IOMMUFD dirty tracking support for Qemu that got\nposted in the past. I won't be exactly following up with a v2 there given that\nIOMMUFD support needs to be supported by Qemu first.\n\nWe have live migrateable VFs in VMMs these days (e.g. Qemu 8.0) so we can\nnow test everything in tandem, but I haven't had my hardware setup *yet*\norganized in such a manner that allows me to test everything, hence I am\nstill marking this as an RFC with intent to drop it in v3. But most\nimportantly, this version is for making sure that iommu/iommufd kAPIs/UAPI\nare solid;\n\nSorry for such a late posting since v1; hopefully this is in a better\ndirection.\n\nFeedback or any comments are very much appreciated.\n\nThanks!\n\tJoao\n\nTODOs for v3:\n- Testing with a live migrateable VF;\n- Improve the dirty PTE walking in Intel/AMD iommu drivers;\n\nChanges since RFCv1[5]:\nToo many changes but the major items were:\n* Majority of the changes from Jason/Kevin/Baolu/Suravee:\n- Improve structure and rework most commit messages\n- Drop all of the VFIO-compat series\n- Drop the unmap-get-dirty API\n- Tie this to HWPT only, no more autodomains handling;\n- Rework the UAPI widely by:\n - Having an IOMMU_DEVICE_GET_CAPS which allows fetching the capabilities\n of devices, specifically to test dirty tracking support for an individual\n device\n - Add an enforce-dirty flag to the IOMMU domain via HWPT_ALLOC\n - SET_DIRTY now clears dirty tracking before asking iommu driver to 
do so;\n - New GET_DIRTY_IOVA flag that does not clear dirty bits\n - Add coverage for all added APIs\n - Expand GET_DIRTY_IOVA tests to cover IOVA bitmap corner case tests\n that I had separately; I only excluded the Terabyte IOVA range\n usecases (which test bitmaps 2M+) because those will most likely fail\n to be run as selftests (not sure yet how I can include those). I am\n not exactly sure how I can cover those, unless I do 'fake IOVA maps'\n *somehow* which do not necessarily require real buffers.\n- Handle most comments in intel-iommu. The only remaining one for v3 is the\n PTE walker which will be done better.\n- Handle all comments in amd-iommu, most of which regarding locking.\n Only one remaining for v3, same as Intel;\n- Reflect the UAPI changes into iommu driver implementations, including\npersisting dirty tracking enabling in new attach_dev calls, as well as\nensuring attach_dev enforces the requested domain flags;\n* Comments from Yi Sun in making sure that dirty tracking isn't\nrestricted to SS only, so relax the check for FL support because it's\nalways enabled. (Yi Sun)\n* Most of the code that was in v1 for dirty bitmaps got rewritten and\nrepurposed to also cover the VFIO case; so reuse this infra here too for both.\n(Jason)\n* Take Robin's suggestion of always enabling dirty tracking and set_dirty\njust clearing bits on 'activation', and make that a generic property to\nensure we always get accurate results between starting and stopping\ntracking. 
(Robin Murphy)\n* Address all SMMUv3 comments on how we enable/test the DBM, or the\nbits in the context descriptor with io-pgtable::quirks, etc\n(Robin, Shameerali)\n\n[0] https://lore.kernel.org/linux-iommu/20220609070811.902868-1-baolu.lu@linux.intel.com/\n[1] https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/\n[2] https://github.com/jpemartins/qemu/commits/iommufd\n[3] https://lore.kernel.org/kvm/20230426150321.454465-1-yi.l.liu@intel.com/\n[4] https://lore.kernel.org/kvm/0-v7-6c0fd698eda2+5e3-iommufd_alloc_jgg@nvidia.com/\n[5] https://lore.kernel.org/kvm/20220428210933.3583-1-joao.m.martins@oracle.com/\n[6] https://github.com/jpemartins/linux/commits/iommufd-v2","shortMessageHtmlLink":"cover-letter: IOMMUFD Dirty Tracking"}},{"before":"4f7ac94ae8c0ea82d67ec66f3ad61ae2e04cda63","after":"13bba5487bed4e925a4880699a79f9c28d01fb91","ref":"refs/heads/iommufd-v2","pushedAt":"2023-05-18T18:56:02.528Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"jpemartins","name":"João Martins","path":"/jpemartins","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/279189?s=80&v=4"},"commit":{"message":"cover-letter: IOMMUFD Dirty Tracking\n\nPresented herewith is a series that extends IOMMUFD to have IOMMU\nhardware support for the dirty bit in the IOPTEs.\n\nToday, AMD Milan (or more recent) supports it while ARM SMMUv3.2\nalongside VT-D rev3.x are expected to eventually come along. One\nintended use-case (but not restricted!) is to support Live Migration with\nSR-IOV, especially useful for live migrateable PCI devices that can't\nsupply their own dirty tracking hardware blocks amongst others.\n\nAt a quick glance, IOMMUFD lets the userspace create the IOAS with a\nset of IOVA ranges mapped to some physical memory composing an IO\npagetable. 
This is then created via HWPT_ALLOC or attached to a\nparticular device/hwpt, consequently creating the IOMMU domain and sharing\na common IO page table representing the endpoint DMA-addressable guest\naddress space. In IOMMUFD Dirty tracking (from v1 of the series) we will\nbe supporting the HWPT_ALLOC model only, as opposed to the simpler\nautodomains model.\n\nThe result is a hw_pagetable which represents the\niommu_domain which will be directly manipulated. The IOMMUFD UAPI,\nand the iommu core kAPI are then extended to provide:\n\n1) Enforce that only devices with dirty tracking support are attached\nto an IOMMU domain, to cover the case where this isn't all homogeneous in\nthe platform. The enforcing being enabled or not is tracked by the iommu\ndomain op *caller* not the iommu driver implementation, to avoid redundantly\nchecking this in IOMMU ops.\n\n2) Toggling of Dirty Tracking on the iommu_domain. We model it as the most\ncommon case of changing hardware translation control structures dynamically\n(x86) while making it easier to have an always-enabled mode. In the\nRFCv1, the ARM specific case is suggested to be always enabled instead of\nhaving to enable the per-PTE DBM control bit (what I previously called\n\"range tracking\"). Here, setting/clearing tracking means just clearing the\ndirty bits at start. IOMMUFD-wise, the 'real' tracking of whether dirty\ntracking is enabled is stored in the IOMMU driver, hence no new\nfields are added to iommufd pagetable structures, except for the\niommu_domain enforcement part.\n\nNote: I haven't included a GET_DIRTY ioctl but I do have it implemented.\nBut I am not sure this is exactly needed. I find it good to have a getter\nsupplied with a setter in general, but looking at how other parts were\ndeveloped in the past, the getter doesn't have a usage...\n\n3) Add a capability probing for dirty tracking, leveraging the per-device\niommu_capable() and adding an IOMMU_CAP_DIRTY. 
In IOMMUFD we add a\nDEVICE_GET_CAPS ioctl which takes a device ID and returns some\ncapabilities. Similarly to 1) it might make sense to move the drivers'\n.attach_device validation to the caller; for now I have this in iommu\ndrivers;\n\n4) Read the I/O PTEs and marshal their dirtiness into a bitmap. The bitmap\nindexes on a page_size basis the IOVAs that got written by the device.\nWhile performing the marshalling, drivers also need to clear the dirty bits\nfrom the IOPTEs and allow the kAPI caller to batch the much needed IOTLB flush.\nThere's no copy of bitmaps to userspace backed memory, all is zerocopy\nbased to not add more cost to the iommu driver IOPT walker. This shares\nfunctionality with VFIO device dirty tracking via the IOVA bitmap APIs. So\nfar this is a test-and-clear kind of interface given that the IOPT walk is\ngoing to be expensive. In addition this also adds the ability to read dirty\nbit info without clearing the PTE info. This is meant to cover the\nunmap-and-read-dirty use-case, and avoid the second IOTLB flush.\n\nNote: I've kept the name read_and_clear_dirty() in v2 but this might not\nmake sense given the name of the flags; open to suggestions.\n\n5) I've pulled Baolu Lu's patches[0] that make the pagetables page free\npath RCU-protected, which will both fix the use-after-free scenario\nmentioned there, and also let us have an RCU-based page table walker for\nthe read_and_clear_dirty() iommu op, as opposed to taking the same locks on\nmap/unmap. 
These are taken exactly as they were posted.\n\nThe additional dependency is:\n* HWPT_ALLOC which allows creating/manipulating iommu_domain creation[4]\n\nWhile needing this to make it useful with VFIO (and consequently to VMMs):\n* VFIO cdev support and to use iommufd with VFIO [3]\n* VFIO PCI hot reset\n\nHence, I have these two dependencies applied first on top of this series.\nThis whole thing as posted is also here[6].\n\nThe series is organized as follows:\n\n* Patches 1-4: Take care of the iommu domain operations to be added.\nThe idea is to abstract iommu drivers from any idea of how bitmaps are\nstored or propagated back to the caller, as well as allowing\ncontrol/batching over IOTLB flush. So there's a data structure and a\nhelper that only tells the upper layer that an IOVA range got dirty.\nThis logic is shared with VFIO and it's meant to walk the bitmap\nuser memory, kmap-ing plus setting bits as needed. The IOMMU driver\njust has an idea of a 'dirty bitmap state' and records an IOVA as\ndirty. It also pulls Baolu Lu's patches for RCU-safe pagetable free.\n\n* Patches 5-16: Add the UAPIs for IOMMUFD, and selftests. The selftests\ncover some corner cases on boundary handling of the bitmap and various\nbitmap sizes that exercise it. I haven't included huge IOVA ranges to avoid\nrisking the selftests failing to execute due to OOM issues of mmaping big\nbuffers.\n\nFor completeness and most importantly to make sure the new IOMMU core\nops capture the hardware blocks, I've implemented it for the x86 IOMMUs that\nhave/eventually-have IOMMU A/D support. So the next half of the series\npresents said implementations for IOMMUs:\n\n* Patches 17-18: AMD IOMMU implementation, particularly on those having\nHDSup support. Tested with a Qemu amd-iommu with HDSup emulated[1].\n\n* Patch 19: Intel IOMMU rev3.x+ implementation. Tested with a Qemu\nbased intel-iommu vIOMMU with SSADS emulation support[1].\n\n* Patches 20-24: ARM SMMU v3 implementation. 
A lot simpler than the v1\nposting. Most of the adjustments were because of the new UAPI while taking\nthe comments I got in v1 from everyone. Only compile tested. Shameerali\nwill be taking over the ARM SMMUv3 support;\n\nTo help testing/prototyping, I also wrote qemu iommu emulation bits\nto increase coverage of this code and hopefully make this more broadly\navailable for fellow contributors/devs[1]; it is stored here[2] and\nlargely based on Nicolin, Yi and Eric's IOMMUFD bringup work (thanks a\nton!). It also includes the IOMMUFD dirty tracking support for Qemu that got\nposted in the past. I won't be exactly following up with a v2 there given that\nIOMMUFD support needs to land in Qemu first.\n\nWe have live migrateable VFs in VMMs these days (e.g. Qemu 8.0) so we can\nnow test everything in tandem, but I haven't *yet* got my hardware setup\norganized in such a manner that allows me to test everything, hence I am\nstill marking this as an RFC with the intent to drop that in v3. But most\nimportantly, this version is for making sure that the iommu/iommufd kAPIs/UAPI\nare solid;\n\nSorry for such a late posting since v1; hopefully these are in a better\ndirection.\n\nFeedback or any comments are very much appreciated.\n\nThanks!\n\tJoao\n\nTODOs for v3:\n- Testing with a live migrateable VF;\n- Improve the dirty PTE walking in Intel/AMD iommu drivers;\n\nChanges since RFCv1[5]:\nToo many changes but the major items were:\n* Majority of the changes from Jason/Kevin/Baolu/Suravee:\n- Improve structure and rework most commit messages\n- Drop all of the VFIO-compat series\n- Drop the unmap-get-dirty API\n- Tie this to HWPT only, no more autodomains handling;\n- Rework the UAPI widely by:\n - Having an IOMMU_DEVICE_GET_CAPS which allows fetching capabilities\n of devices, specifically testing dirty tracking support for an individual\n device\n - Add an enforce-dirty flag to the IOMMU domain via HWPT_ALLOC\n - SET_DIRTY now clears dirty tracking before asking the iommu driver to 
do so;\n - New GET_DIRTY_IOVA flag that does not clear dirty bits\n - Add coverage for all added APIs\n - Expand GET_DIRTY_IOVA tests to cover the IOVA bitmap corner case tests\n that I had separately; I only excluded the Terabyte IOVA range\n use-cases (which test bitmaps 2M+) because those will most likely fail\n to be run as selftests (not sure yet how I can include those). I am\n not exactly sure how I can cover those, unless I do 'fake IOVA maps'\n *somehow* which do not necessarily require real buffers.\n- Handle most comments in intel-iommu. The only one remaining for v3 is the\n PTE walker, which will be done better.\n- Handle all comments in amd-iommu, most of which regarded locking.\n The only one remaining for v3 is the same as Intel's;\n- Reflect the UAPI changes into iommu driver implementations, including\npersisting the dirty tracking enablement across new attach_dev calls, as well as\nhaving attach_dev enforce the requested domain flags;\n* Comments from Yi Sun on making sure that dirty tracking isn't\nrestricted to SS only, so relax the check for FL support because it's\nalways enabled. (Yi Sun)\n* Most of the code that was in v1 for dirty bitmaps got rewritten and\nrepurposed to also cover the VFIO case; so reuse this infra here too for both.\n(Jason)\n* Take Robin's suggestion of always enabling dirty tracking and set_dirty\njust clearing bits on 'activation', and make that a generic property to\nensure we always get accurate results between starting and stopping\ntracking. 
(Robin Murphy)\n* Address all comments on SMMUv3 regarding how we enable/test the DBM, or the\nbits in the context descriptor with io-pgtable::quirks, etc\n(Robin, Shameerali)\n\n[0] https://lore.kernel.org/linux-iommu/20220609070811.902868-1-baolu.lu@linux.intel.com/\n[1] https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/\n[2] https://github.com/jpemartins/qemu/commits/iommufd\n[3] https://lore.kernel.org/kvm/20230426150321.454465-1-yi.l.liu@intel.com/\n[4] https://lore.kernel.org/kvm/0-v7-6c0fd698eda2+5e3-iommufd_alloc_jgg@nvidia.com/\n[5] https://lore.kernel.org/kvm/20220428210933.3583-1-joao.m.martins@oracle.com/\n[6] https://github.com/jpemartins/linux/commits/iommufd-v2","shortMessageHtmlLink":"cover-letter: IOMMUFD Dirty Tracking"}},{"before":"87ee1a898a1a6ebbdde6991dfc4209004b2b82bc","after":"4f7ac94ae8c0ea82d67ec66f3ad61ae2e04cda63","ref":"refs/heads/iommufd-v2","pushedAt":"2023-05-16T20:08:51.128Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"jpemartins","name":"João Martins","path":"/jpemartins","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/279189?s=80&v=4"},"commit":{"message":"cover-letter: IOMMUFD Dirty Tracking\n\nPresented herewith is a series that extends IOMMUFD to have IOMMU\nhardware support for dirty bit in the IOPTEs.\n\nToday, AMD Milan (which has been out for a year now) supports it while ARM\nSMMUv3.2 alongside VT-D rev3.x are expected to eventually come along.\nThe intended use-case is to support Live Migration with SR-IOV, without\nPCI devices having to supply their own IOVA dirty tracking hardware\nblocks. And, of course hopefully[*] longterm it becomes how\nIOVA dirty tracking is performed, relieving vendors to only care\nabout DMA quiescing/resume and device-state parts of VFIO migration.\n\n[*] i.e. 
should it not greatly/drastically affect DMA performance and whatnot\n\nAt a quick glance, IOMMUFD lets the userspace create the IOAS with a\nset of IOVA ranges mapped to some physical memory composing an IO\npagetable. This is then attached to a particular device, consequently\ncreating the protection domain to share a common IO page table\nrepresenting the endpoint DMA-addressable guest address space.\n(Hopefully I am not twisting the terminology here) The resultant object\nis an hw_pagetable object which represents the iommu_domain\nobject that will be directly manipulated. For more background on\nIOMMUFD have a look at these two series[0][1] on the kernel and qemu\nconsumption respectively. The IOMMUFD UAPI, kAPI and the iommu core\nkAPI are then extended to provide:\n\n1) Status and Toggling of Dirty Tracking on the iommu_domain. We model it\nas the most common case of changing hardware protection domain control\nbits. In the ARM specific case, the per-PTE DBM control\nbit will be always enabled, and setting/clearing tracking means just\nclearing the dirty bits at start and at stop. The 'real' tracking of\nwhether dirty tracking is enabled or not is stored in the IOMMU driver,\nhence no new fields are added to iommufd pagetable structures.\n\n2) Read the I/O PTEs and marshal their dirtiness into a bitmap. The bitmap\nthus describes the IOVAs that got written by the device. While performing\nthe marshalling, drivers also need to clear the dirty bits from IOPTEs and\nallow the kAPI caller to batch the much needed IOTLB flush.\nThere's no copy of bitmaps to userspace backed memory, all is zerocopy\nbased. So far this is a test-and-clear kind of interface given that the\nIOPT walk is going to be expensive. 
It occurred to me to separate\nthe readout of dirty bits from the clearing of dirty bits in IOPTEs.\nI haven't opted for that one, given that it would mean two lengthy IOPTE\nwalks and felt counter-performant.\n\nThe series is organized as follows:\n\n* Patches 1-4: Takes care of the iommu domain operations to be added and\nextends iommufd io-pagetable to set/clear dirty tracking, as well as\nreading the dirty bits from the IOMMU pagetables. The idea is to abstract\niommu drivers from any idea of how bitmaps are stored or propagated back to\nthe caller, as well as allowing control/batching over IOTLB flush. So\nthere's a data structure and a helper that only tells the upper layer that\nan IOVA range got dirty. This logic is shared with VFIO[2] and it's meant\nto pin pages, walking the bitmap user memory, and kmap-ing them as\nneeded. The IOMMU driver just has an idea of a 'dirty bitmap state' and\nrecording an IOVA as dirty.\n\n* Patches 5-7: Adds the UAPIs for IOMMUFD, and selftests.\nThe selftests test mainly the principal workflow; more corner cases\nstill need to be added.\n\nFor completeness and most importantly to make sure the new IOMMU core\nops capture the hardware blocks, I've implemented it for the IOMMUs that\nwill eventually get IOMMU A/D support. So the next half of the series\npresents *proof of concept* implementations for IOMMUs:\n\n* Patches 8-9: AMD IOMMU implementation, particularly on those having\nHDSup support. Tested with a Qemu amd-iommu with HDSup emulated,\nand also on an AMD Milan server IOMMU.\n\n* Patch 10: Intel IOMMU rev3.x implementation. Tested with a Qemu\nbased intel-iommu vIOMMU with SSADS/SLADS emulation support.\n\nTo help testing/prototyping, I also wrote qemu iommu emulation bits\nto increase coverage of this code and hopefully make this more broadly\navailable for fellow contributors/devs. 
A separate series is submitted right\nafter this covering the IOMMUFD extensions for dirty tracking, alongside\nx86 iommu device-models with A/D tracking support.\n\nNotable Remarks:\n\n* The UAPI/kAPI could be generalized over the next iteration to also cover\nthe Access bit (or Intel's Extended Access bit that tracks non-CPU usage).\nIt wasn't done, as I was not aware of a use-case. I am wondering\nif the access-bits could be used to do some form of zero page detection\n(to just send the pages that got touched), although dirty-bits could be\nused just the same way. Happy to adjust for RFCv2. The algorithms, IOPTE\nwalk and marshalling into bitmaps as well as the necessary IOTLB flush\nbatching are all the same. The focus is on the dirty bit given that the\ndirtiness IOVA feedback is used to select the pages that need to be transferred\nto the destination while migration is happening.\nSidebar: Sadly, there are a lot fewer clever tricks that can be\ndone (compared to the CPU/KVM) without having the PCI device cooperate (like\nuserfaultfd, wrprotect, etc as those would turn into nefarious IOMMU\nperm faults and devices with DMA target aborts).\nIf folks think the UAPI/iommu-kAPI should be agnostic to any PTE A/D\nbits, we can instead have the ioctls be named after\nHWPT_SET_TRACKING() and add another argument which asks which bits to\nenable tracking for (IOMMUFD_ACCESS/IOMMUFD_DIRTY/IOMMUFD_ACCESS_NONCPU).\nLikewise for the read_and_clear() as all PTE bits follow the same logic\nas dirty.\n\n* Dirty bit tracking is not enough. Large IO pages tend to be the norm when\nDMA mapping large ranges of IOVA space, when really the VMM wants the smallest\ngranularity possible (i.e. base pages). 
A separate bit of work will\nneed to take care of demoting IOPTE page sizes at guest-runtime to\nincrease/decrease the dirty tracking granularity, likely in the form of\nan IOAS demote/promote page-size within a previously mapped IOVA range.\n\nFeedback is very much appreciated!\n\n[0] https://lore.kernel.org/kvm/0-v1-e79cd8d168e8+6-iommufd_jgg@nvidia.com/\n[1] https://lore.kernel.org/kvm/20220414104710.28534-1-yi.l.liu@intel.com/\n[2] https://lore.kernel.org/kvm/20220705102740.29337-1-yishaih@nvidia.com/\n\nThanks,\n\tJoao\n\nTODOs:\n- Testing with a live migrateable VF;","shortMessageHtmlLink":"cover-letter: IOMMUFD Dirty Tracking"}},{"before":null,"after":"87ee1a898a1a6ebbdde6991dfc4209004b2b82bc","ref":"refs/heads/iommufd-v2","pushedAt":"2023-05-15T21:07:09.614Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"jpemartins","name":"João Martins","path":"/jpemartins","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/279189?s=80&v=4"},"commit":{"message":"cover-letter: IOMMUFD Dirty Tracking\n\nPresented herewith is a series that extends IOMMUFD to have IOMMU\nhardware support for dirty bit in the IOPTEs.\n\nToday, AMD Milan (which has been out for a year now) supports it while ARM\nSMMUv3.2 alongside VT-D rev3.x are expected to eventually come along.\nThe intended use-case is to support Live Migration with SR-IOV, without\nPCI devices having to supply their own IOVA dirty tracking hardware\nblocks. And, of course hopefully[*] longterm it becomes how\nIOVA dirty tracking is performed, relieving vendors to only care\nabout DMA quiescing/resume and device-state parts of VFIO migration.\n\n[*] i.e. should it not greatly/drastically affect DMA performance and whatnot\n\nAt a quick glance, IOMMUFD lets the userspace create the IOAS with a\nset of IOVA ranges mapped to some physical memory composing an IO\npagetable. 
This is then attached to a particular device, consequently\ncreating the protection domain to share a common IO page table\nrepresenting the endpoint DMA-addressable guest address space.\n(Hopefully I am not twisting the terminology here) The resultant object\nis an hw_pagetable object which represents the iommu_domain\nobject that will be directly manipulated. For more background on\nIOMMUFD have a look at these two series[0][1] on the kernel and qemu\nconsumption respectively. The IOMMUFD UAPI, kAPI and the iommu core\nkAPI are then extended to provide:\n\n1) Status and Toggling of Dirty Tracking on the iommu_domain. We model it\nas the most common case of changing hardware protection domain control\nbits. In the ARM specific case, the per-PTE DBM control\nbit will be always enabled, and setting/clearing tracking means just\nclearing the dirty bits at start and at stop. The 'real' tracking of\nwhether dirty tracking is enabled or not is stored in the IOMMU driver,\nhence no new fields are added to iommufd pagetable structures.\n\n2) Read the I/O PTEs and marshal their dirtiness into a bitmap. The bitmap\nthus describes the IOVAs that got written by the device. While performing\nthe marshalling, drivers also need to clear the dirty bits from IOPTEs and\nallow the kAPI caller to batch the much needed IOTLB flush.\nThere's no copy of bitmaps to userspace backed memory, all is zerocopy\nbased. So far this is a test-and-clear kind of interface given that the\nIOPT walk is going to be expensive. It occurred to me to separate\nthe readout of dirty bits from the clearing of dirty bits in IOPTEs.\nI haven't opted for that one, given that it would mean two lengthy IOPTE\nwalks and felt counter-performant.\n\nThe series is organized as follows:\n\n* Patches 1-4: Takes care of the iommu domain operations to be added and\nextends iommufd io-pagetable to set/clear dirty tracking, as well as\nreading the dirty bits from the IOMMU pagetables. 
The idea is to abstract\niommu drivers from any idea of how bitmaps are stored or propagated back to\nthe caller, as well as allowing control/batching over IOTLB flush. So\nthere's a data structure and a helper that only tells the upper layer that\nan IOVA range got dirty. This logic is shared with VFIO[2] and it's meant\nto pin pages, walking the bitmap user memory, and kmap-ing them as\nneeded. The IOMMU driver just has an idea of a 'dirty bitmap state' and\nrecording an IOVA as dirty.\n\n* Patches 5-7: Adds the UAPIs for IOMMUFD, and selftests.\nThe selftests test mainly the principal workflow; more corner cases\nstill need to be added.\n\nFor completeness and most importantly to make sure the new IOMMU core\nops capture the hardware blocks, I've implemented it for the IOMMUs that\nwill eventually get IOMMU A/D support. So the next half of the series\npresents *proof of concept* implementations for IOMMUs:\n\n* Patches 8-9: AMD IOMMU implementation, particularly on those having\nHDSup support. Tested with a Qemu amd-iommu with HDSup emulated,\nand also on an AMD Milan server IOMMU.\n\n* Patch 10: Intel IOMMU rev3.x implementation. Tested with a Qemu\nbased intel-iommu vIOMMU with SSADS/SLADS emulation support.\n\nTo help testing/prototyping, I also wrote qemu iommu emulation bits\nto increase coverage of this code and hopefully make this more broadly\navailable for fellow contributors/devs. A separate series is submitted right\nafter this covering the IOMMUFD extensions for dirty tracking, alongside\nx86 iommu device-models with A/D tracking support.\n\nNotable Remarks:\n\n* The UAPI/kAPI could be generalized over the next iteration to also cover\nthe Access bit (or Intel's Extended Access bit that tracks non-CPU usage).\nIt wasn't done, as I was not aware of a use-case. I am wondering\nif the access-bits could be used to do some form of zero page detection\n(to just send the pages that got touched), although dirty-bits could be\nused just the same way. 
Happy to adjust for RFCv2. The algorithms, IOPTE\nwalk and marshalling into bitmaps as well as the necessary IOTLB flush\nbatching are all the same. The focus is on the dirty bit given that the\ndirtiness IOVA feedback is used to select the pages that need to be transferred\nto the destination while migration is happening.\nSidebar: Sadly, there are a lot fewer clever tricks that can be\ndone (compared to the CPU/KVM) without having the PCI device cooperate (like\nuserfaultfd, wrprotect, etc as those would turn into nefarious IOMMU\nperm faults and devices with DMA target aborts).\nIf folks think the UAPI/iommu-kAPI should be agnostic to any PTE A/D\nbits, we can instead have the ioctls be named after\nHWPT_SET_TRACKING() and add another argument which asks which bits to\nenable tracking for (IOMMUFD_ACCESS/IOMMUFD_DIRTY/IOMMUFD_ACCESS_NONCPU).\nLikewise for the read_and_clear() as all PTE bits follow the same logic\nas dirty.\n\n* Dirty bit tracking is not enough. Large IO pages tend to be the norm when\nDMA mapping large ranges of IOVA space, when really the VMM wants the smallest\ngranularity possible (i.e. base pages). 
A separate bit of work will\nneed to take care of demoting IOPTE page sizes at guest-runtime to\nincrease/decrease the dirty tracking granularity, likely in the form of\nan IOAS demote/promote page-size within a previously mapped IOVA range.\n\nFeedback is very much appreciated!\n\n[0] https://lore.kernel.org/kvm/0-v1-e79cd8d168e8+6-iommufd_jgg@nvidia.com/\n[1] https://lore.kernel.org/kvm/20220414104710.28534-1-yi.l.liu@intel.com/\n[2] https://lore.kernel.org/kvm/20220705102740.29337-1-yishaih@nvidia.com/\n\nThanks,\n\tJoao\n\nTODOs:\n- More selftests for large/small iopte sizes;\n- Performance efficiency of GET_DIRTY_IOVA in various workloads;\n- Testing with a live migrateable VF;","shortMessageHtmlLink":"cover-letter: IOMMUFD Dirty Tracking"}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAADnpFkpgA","startCursor":null,"endCursor":null}},"title":"Activity · jpemartins/linux"}