summary refs log tree commit diff stats
path: root/hw/vfio (follow)
Commit message (Collapse)AuthorAgeFilesLines
...
* | vfio/igd: add pci id for Coffee LakeCorvin Köhne2024-11-181-0/+3
|/ | | | | | | I've tested and verified that Coffee Lake devices are working properly. Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
* vfio/migration: Add vfio_save_block_precopy_empty_hit trace eventMaciej S. Szmigiero2024-11-052-0/+9
| | | | | | | This way it is clearly known when there's no more data to send for that device. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
* vfio/migration: Add save_{iterate, complete_precopy}_start trace eventsMaciej S. Szmigiero2024-11-052-0/+11
| | | | | | | This way both the start and end points of migrating a particular VFIO device are known. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
* migration: Drop migration_is_setup_or_active()Peter Xu2024-10-311-1/+1
| | | | | | | | | | | | | | | | | This helper is mostly the same as migration_is_running(), except that one has COLO reported as true, the other has CANCELLING reported as true. Per my past years experience on the state changes, none of them should matter. To make it slightly safer, report both COLO || CANCELLING to be true in migration_is_running(), then drop the other one. We kept the 1st only because the name is simpler, and clear enough. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* vfio/helpers: Align mmapsAlex Williamson2024-10-231-2/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | Thanks to work by Peter Xu, support is introduced in Linux v6.12 to allow pfnmap insertions at PMD and PUD levels of the page table. This means that provided a properly aligned mmap, the vfio driver is able to map MMIO at significantly larger intervals than PAGE_SIZE. For example on x86_64 (the only architecture currently supporting huge pfnmaps for PUD), rather than 4KiB mappings, we can map device MMIO using 2MiB and even 1GiB page table entries. Typically mmap will already provide PMD aligned mappings, so devices with moderately sized MMIO ranges, even GPUs with standard 256MiB BARs, will already take advantage of this support. However in order to better support devices exposing multi-GiB MMIO, such as 3D accelerators or GPUs with resizable BARs enabled, we need to manually align the mmap. There doesn't seem to be a way for userspace to easily learn about PMD and PUD mapping level sizes, therefore this takes the simple approach to align the mapping to the power-of-two size of the region, up to 1GiB, which is currently the maximum alignment we care about. Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com>
* vfio/helpers: Refactor vfio_region_mmap() error handlingAlex Williamson2024-10-231-17/+17
| | | | | | | | | | Move error handling code to the end of the function so that it can more easily be shared by new mmap failure conditions. No functional change intended. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com>
* vfio/migration: Change trace formats from hex to decimalAvihai Horon2024-10-231-5/+5
| | | | | | | | | | | | Data sizes in VFIO migration trace events are printed in hex format while in migration core trace events they are printed in decimal format. This inconsistency makes it less readable when using both trace event types. Hence, change the data sizes print format to decimal in VFIO migration trace events. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com>
* vfio/migration: Report only stop-copy size in vfio_state_pending_exact()Avihai Horon2024-10-231-3/+0
| | | | | | | | | | | | | | | | | | | | | | vfio_state_pending_exact() is used to update migration core how much device data is left for the device migration. Currently, the sum of pre-copy and stop-copy sizes of the VFIO device are reported. The pre-copy size is obtained via the VFIO_MIG_GET_PRECOPY_INFO ioctl, which returns the amount of device data available to be transferred while the device is in the PRE_COPY states. The stop-copy size is obtained via the VFIO_DEVICE_FEATURE_MIG_DATA_SIZE ioctl, which returns the total amount of device data left to be transferred in order to complete the device migration. According to the above, current implementation is wrong -- it reports extra overlapping data because pre-copy size is already contained in stop-copy size. Fix it by reporting only stop-copy size. Fixes: eda7362af959 ("vfio/migration: Add VFIO migration pre-copy support") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com>
* vfio/igd: correctly calculate stolen memory size for gen 9 and laterCorvin Köhne2024-09-171-4/+11
| | | | | | | | | | | We have to update the calculation of the stolen memory size because we've seen devices using values of 0xf0 and above for the graphics mode select field. The new calculation was taken from the linux kernel [1]. [1] https://github.com/torvalds/linux/blob/7c626ce4bae1ac14f60076d00eafe71af30450ba/arch/x86/kernel/early-quirks.c#L455-L460 Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
* vfio/igd: don't set stolen memory size to zeroCorvin Köhne2024-09-171-17/+18
| | | | | | | | | | | | | | The stolen memory is required for the GOP (EFI) driver and the Windows driver. While the GOP driver seems to work with any stolen memory size, the Windows driver will crash if the size doesn't match the size allocated by the host BIOS. For that reason, it doesn't make sense to overwrite the stolen memory size. It's true that this wastes some VM memory. In the worst case, the stolen memory can take up more than a GB. However, that's uncommon. Additionally, it's likely that a bunch of RAM is assigned to VMs making use of GPU passthrough. Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
* vfio/igd: add ID's for ElkhartLake and TigerLakeCorvin Köhne2024-09-171-0/+6
| | | | | | | | | | ElkhartLake and TigerLake devices were tested in legacy mode with Linux and Windows VMs. Both are working properly. It's likely that other Intel GPUs of gen 11 and 12 like IceLake device are working too. However, we're only adding known good devices for now. Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
* vfio/igd: add new bar0 quirk to emulate BDSM mirrorCorvin Köhne2024-09-173-0/+100
| | | | | | | | | | | | | | | | | | | The BDSM register is mirrored into MMIO space at least for gen 11 and later devices. Unfortunately, the Windows driver reads the register value from MMIO space instead of PCI config space for those devices [1]. Therefore, we either have to keep a 1:1 mapping for the host and guest address or we have to emulate the MMIO register too. Using the igd in legacy mode is already hard due to it's many constraints. Keeping a 1:1 mapping may not work in all cases and makes it even harder to use. An MMIO emulation has to trap the whole MMIO page. This makes accesses to this page slower compared to using second level address translation. Nevertheless, it doesn't have any constraints and I haven't noticed any performance degradation yet making it a better solution. [1] https://github.com/projectacrn/acrn-hypervisor/blob/5c351bee0f6ae46250eefc07f44b4a31e770f3cf/devicemodel/hw/pci/passthrough.c#L650-L653 Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
* vfio/igd: use new BDSM register location and size for gen 11 and laterCorvin Köhne2024-09-171-7/+24
| | | | | | | | | Intel changed the location and size of the BDSM register for gen 11 devices and later. We have to adjust our emulation for these devices to properly support them. Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
* vfio/igd: support legacy mode for all known generationsCorvin Köhne2024-09-171-1/+1
| | | | | | | | | | | | | | | We're soon going to add support for legacy mode to ElkhartLake and TigerLake devices. Those are gen 11 and 12 devices. At the moment, all devices identified by our igd_gen function do support legacy mode. This won't change when adding our new devices of gen 11 and 12. Therefore, it makes more sense to accept legacy mode for all known devices instead of maintaining a long list of known good generations. If we add a new generation to igd_gen which doesn't support legacy mode for some reason, it'll be easy to advance the check to reject legacy mode for this specific generation. Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
* vfio/igd: return an invalid generation for unknown devicesCorvin Köhne2024-09-171-1/+5
| | | | | | | | | | | | | Intel changes it's specification quite often e.g. the location and size of the BDSM register has change for gen 11 devices and later. This causes our emulation to fail on those devices. So, it's impossible for us to use a suitable default value for unknown devices. Instead of returning a random generation value and hoping that everthing works fine, we should verify that different devices are working and add them to our list of known devices. Signed-off-by: Corvin Köhne <c.koehne@beckhoff.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
* hw/vfio/pci.c: Use correct type in trace_vfio_msix_early_setup()Peter Maydell2024-09-171-1/+1
| | | | | | | | | | | | | | | | | | The tracepoint trace_vfio_msix_early_setup() uses "int" for the type of the table_bar argument, but we use this to print a uint32_t. Coverity warns that this means that we could end up treating it as a negative number. We only use this in printing the value in the tracepoint, so mishandling it as a negative number would be harmless, but it's better to use the right type in the tracepoint. Use uint64_t to match how we print the table_offset in the vfio_msix_relo() tracepoint. Resolves: Coverity CID 1547690 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com>
* hw: Use device_class_set_legacy_reset() instead of opencodingPeter Maydell2024-09-133-3/+3
| | | | | | | | | | | | | Use device_class_set_legacy_reset() instead of opencoding an assignment to DeviceClass::reset. This change was produced with: spatch --macro-file scripts/cocci-macro-file.h \ --sp-file scripts/coccinelle/device-reset.cocci \ --keep-comments --smpl-spacing --in-place --dir hw Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240830145812.1967042-8-peter.maydell@linaro.org
* qapi/vfio: Rename VfioMigrationState to Qapi*, and drop prefixMarkus Armbruster2024-09-101-1/+1
| | | | | | | | | | | | | | | | | | | QAPI's 'prefix' feature can make the connection between enumeration type and its constants less than obvious. It's best used with restraint. VfioMigrationState has a 'prefix' that overrides the generated enumeration constants' prefix to QAPI_VFIO_MIGRATION_STATE. We could simply drop 'prefix', but then the enumeration constants would look as if they came from kernel header linux/vfio.h. Rename the type to QapiVfioMigrationState instead, so that 'prefix' is not needed. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240904111836.3273842-20-armbru@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com>
* qapi/common: Drop temporary 'prefix'Markus Armbruster2024-09-101-5/+5
| | | | | | | | | | | | | Recent commit "qapi: Smarter camel_to_upper() to reduce need for 'prefix'" added a temporary 'prefix' to delay changing the generated code. Revert it. This improves OffAutoPCIBAR's generated enumeration constant prefix from OFF_AUTOPCIBAR to OFF_AUTO_PCIBAR. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Message-ID: <20240904111836.3273842-5-armbru@redhat.com>
* Merge tag 'pull-vfio-20240723-1' of https://github.com/legoater/qemu into ↵Richard Henderson2024-07-248-35/+257
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | staging vfio queue: * IOMMUFD Dirty Tracking support * Fix for a possible SEGV in IOMMU type1 container * Dropped initialization of host IOMMU device with mdev devices # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEEoPZlSPBIlev+awtgUaNDx8/77KEFAmafyVUACgkQUaNDx8/7 # 7KGebRAAzEYxvstDxSPNF+1xx937TKbRpiKYtspTfEgu4Ht50MwO2ZqnVWzTBSwa # qcjhDf2avMBpBvkp4O9fR7nXR0HRN2KvYrBSThZ3Qpqu4KjxCAGcHI5uYmgfizYh # BBLrw3eWME5Ry220TinQF5KFl50vGq7Z/mku5N5Tgj2qfTfCXYK1Kc19SyAga49n # LSokTIjZAGJa4vxrE7THawaEUjFRjfCJey64JUs/TPJaGr4R1snJcWgETww6juUE # 9OSw/xl0AoQhaN/ZTRC1qCsBLUI2MVPsC+x+vqVK62HlTjCx+uDRVQ8KzfDzjCeH # gaLkMjxJSuJZMpm4UU7DBzDGEGcEBCGeNyFt37BSqqPPpX55CcFhj++d8vqTiwpF # YzmTNd/znxcZTw6OJN9sQZohh+NeS86CVZ3x31HD3dXifhRf17jbh7NoIyi+0ZCb # N+mytOH5BXsD+ddwbk+yMaxXV43Fgz7ThG5tB1tjhhNtLZHDA5ezFvGZ5F/FJrqE # xAbjOhz5MC+RcOVNSzQJCULNqFpfE6Gqeys6btEDm/ltf4LpAe6W1HYuv8BJc19T # UsqGK2yKAuQX8GErYxJ1zqZCttVrgpsmXFYTC5iGbxC84mvsF0Iti96IdXz9gfzN # Vlb2OxoefcOwVqIhbkvTZW0ZwYGGDDPAYhLMfr5lSuRqj123OOo= # =cViP # -----END PGP SIGNATURE----- # gpg: Signature made Wed 24 Jul 2024 01:16:37 AM AEST # gpg: using RSA key A0F66548F04895EBFE6B0B6051A343C7CFFBECA1 # gpg: Good signature from "Cédric Le Goater <clg@kaod.org>" [undefined] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: A0F6 6548 F048 95EB FE6B 0B60 51A3 43C7 CFFB ECA1 * tag 'pull-vfio-20240723-1' of https://github.com/legoater/qemu: vfio/common: Allow disabling device dirty page tracking vfio/migration: Don't block migration device dirty tracking is unsupported vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support vfio/iommufd: Probe and request hwpt dirty tracking capability vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during attach_device() vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCaps vfio/{iommufd,container}: Remove caps::aw_bits vfio/iommufd: Introduce auto domain creation vfio/ccw: Don't initialize HOST_IOMMU_DEVICE with mdev vfio/ap: Don't initialize HOST_IOMMU_DEVICE with mdev vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev vfio/pci: Extract mdev check into an helper hw/vfio/container: Fix SIGSEV on vfio_container_instance_finalize() Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
| * vfio/common: Allow disabling device dirty page trackingJoao Martins2024-07-233-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The property 'x-pre-copy-dirty-page-tracking' allows disabling the whole tracking of VF pre-copy phase of dirty page tracking, though it means that it will only be used at the start of the switchover phase. Add an option that disables the VF dirty page tracking, and fall back into container-based dirty page tracking. This also allows to use IOMMU dirty tracking even on VFs with their own dirty tracker scheme. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
| * vfio/migration: Don't block migration device dirty tracking is unsupportedJoao Martins2024-07-231-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By default VFIO migration is set to auto, which will support live migration if the migration capability is set *and* also dirty page tracking is supported. For testing purposes one can force enable without dirty page tracking via enable-migration=on, but that option is generally left for testing purposes. So starting with IOMMU dirty tracking it can use to accommodate the lack of VF dirty page tracking allowing us to minimize the VF requirements for migration and thus enabling migration by default for those too. While at it change the error messages to mention IOMMU dirty tracking as well. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> [ clg: - spelling in commit log ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
| * vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap supportJoao Martins2024-07-231-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ioctl(iommufd, IOMMU_HWPT_GET_DIRTY_BITMAP, arg) is the UAPI that fetches the bitmap that tells what was dirty in an IOVA range. A single bitmap is allocated and used across all the hwpts sharing an IOAS which is then used in log_sync() to set Qemu global bitmaps. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
| * vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking supportJoao Martins2024-07-231-0/+32
| | | | | | | | | | | | | | | | | | | | | | ioctl(iommufd, IOMMU_HWPT_SET_DIRTY_TRACKING, arg) is the UAPI that enables or disables dirty page tracking. The ioctl is used if the hwpt has been created with dirty tracking supported domain (stored in hwpt::flags) and it is called on the whole list of iommu domains. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * vfio/iommufd: Probe and request hwpt dirty tracking capabilityJoao Martins2024-07-231-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation to using the dirty tracking UAPI, probe whether the IOMMU supports dirty tracking. This is done via the data stored in hiod::caps::hw_caps initialized from GET_HW_INFO. Qemu doesn't know if VF dirty tracking is supported when allocating hardware pagetable in iommufd_cdev_autodomains_get(). This is because VFIODevice migration state hasn't been initialized *yet* hence it can't pick between VF dirty tracking vs IOMMU dirty tracking. So, if IOMMU supports dirty tracking it always creates HWPTs with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if later on VFIOMigration decides to use VF dirty tracking instead. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> [ clg: - Fixed vbasedev->iommu_dirty_tracking assignment in iommufd_cdev_autodomains_get() - Added warning for heterogeneous dirty page tracking support in iommufd_cdev_autodomains_get() ] Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
| * vfio/{iommufd, container}: Invoke HostIOMMUDevice::realize() during ↵Joao Martins2024-07-234-10/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | attach_device() Move the HostIOMMUDevice::realize() to be invoked during the attach of the device before we allocate IOMMUFD hardware pagetable objects (HWPT). This allows the use of the hw_caps obtained by IOMMU_GET_HW_INFO that essentially tell if the IOMMU behind the device supports dirty tracking. Note: The HostIOMMUDevice data from legacy backend is static and doesn't need any information from the (type1-iommu) backend to be initialized. In contrast however, the IOMMUFD HostIOMMUDevice data requires the iommufd FD to be connected and having a devid to be able to successfully GET_HW_INFO. This means vfio_device_hiod_realize() is called in different places within the backend .attach_device() implementation. Suggested-by: Cédric Le Goater <clg@redhat.cm> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> [ clg: Fixed error handling in iommufd_cdev_attach() ] Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * vfio/iommufd: Add hw_caps field to HostIOMMUDeviceCapsJoao Martins2024-07-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Store the value of @caps returned by iommufd_backend_get_device_info() in a new field HostIOMMUDeviceCaps::hw_caps. Right now the only value is whether device IOMMU supports dirty tracking (IOMMU_HW_CAP_DIRTY_TRACKING). This is in preparation for HostIOMMUDevice::realize() being called early during attach_device(). Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * vfio/{iommufd,container}: Remove caps::aw_bitsJoao Martins2024-07-232-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove caps::aw_bits which requires the bcontainer::iova_ranges being initialized after device is actually attached. Instead defer that to .get_cap() and call vfio_device_get_aw_bits() directly. This is in preparation for HostIOMMUDevice::realize() being called early during attach_device(). Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * vfio/iommufd: Introduce auto domain creationJoao Martins2024-07-231-0/+85
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's generally two modes of operation for IOMMUFD: 1) The simple user API which intends to perform relatively simple things with IOMMUs e.g. DPDK. The process generally creates an IOAS and attaches to VFIO and mainly performs IOAS_MAP and UNMAP. 2) The native IOMMUFD API where you have fine grained control of the IOMMU domain and model it accordingly. This is where most new feature are being steered to. For dirty tracking 2) is required, as it needs to ensure that the stage-2/parent IOMMU domain will only attach devices that support dirty tracking (so far it is all homogeneous in x86, likely not the case for smmuv3). Such invariant on dirty tracking provides a useful guarantee to VMMs that will refuse incompatible device attachments for IOMMU domains. Dirty tracking insurance is enforced via HWPT_ALLOC, which is responsible for creating an IOMMU domain. This is contrast to the 'simple API' where the IOMMU domain is created by IOMMUFD automatically when it attaches to VFIO (usually referred as autodomains) but it has the needed handling for mdevs. To support dirty tracking with the advanced IOMMUFD API, it needs similar logic, where IOMMU domains are created and devices attached to compatible domains. Essentially mimicking kernel iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain it falls back to IOAS attach. The auto domain logic allows different IOMMU domains to be created when DMA dirty tracking is not desired (and VF can provide it), and others where it is. Here it is not used in this way given how VFIODevice migration state is initialized after the device attachment. But such mixed mode of IOMMU dirty tracking + device dirty tracking is an improvement that can be added on. Keep the 'all of nothing' of type1 approach that we have been using so far between container vs device dirty tracking. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> [ clg: Added ERRP_GUARD() in iommufd_cdev_autodomains_get() ] Signed-off-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * vfio/ccw: Don't initialize HOST_IOMMU_DEVICE with mdevZhenzhong Duan2024-07-231-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | mdevs aren't "physical" devices and when asking for backing IOMMU info, it fails the entire provisioning of the guest. Fix that by setting vbasedev->mdev true so skipping HostIOMMUDevice initialization in the presence of mdevs. Fixes: 930589520128 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * vfio/ap: Don't initialize HOST_IOMMU_DEVICE with mdevZhenzhong Duan2024-07-231-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | mdevs aren't "physical" devices and when asking for backing IOMMU info, it fails the entire provisioning of the guest. Fix that by setting vbasedev->mdev true so skipping HostIOMMUDevice initialization in the presence of mdevs. Fixes: 930589520128 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()Joao Martins2024-07-231-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | In preparation to implement auto domains have the attach function return the errno it got during domain attach instead of a bool. -EINVAL is tracked to track domain incompatibilities, and decide whether to create a new IOMMU domain. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
| * backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW ↵Joao Martins2024-07-231-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | capabilities The helper will be able to fetch vendor agnostic IOMMU capabilities supported both by hardware and software. Right now it is only iommu dirty tracking. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdevJoao Martins2024-07-232-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | mdevs aren't "physical" devices and when asking for backing IOMMU info, it fails the entire provisioning of the guest. Fix that by skipping HostIOMMUDevice initialization in the presence of mdevs, and skip setting an iommu device when it is known to be an mdev. Cc: Zhenzhong Duan <zhenzhong.duan@intel.com> Fixes: 930589520128 ("vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler") Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
| * vfio/pci: Extract mdev check into an helperJoao Martins2024-07-232-9/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | In preparation to skip initialization of the HostIOMMUDevice for mdev, extract the checks that validate if a device is an mdev into helpers. A vfio_device_is_mdev() is created, and subsystems consult VFIODevice::mdev to check if it's mdev or not. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com>
| * hw/vfio/container: Fix SIGSEV on vfio_container_instance_finalize()Eric Auger2024-07-231-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In vfio_connect_container's error path, the base container is removed twice form the VFIOAddressSpace QLIST: first on the listener_release_exit label and second, on free_container_exit label, through object_unref(container), which calls vfio_container_instance_finalize(). Let's remove the first instance. Fixes: 938026053f4 ("vfio/container: Switch to QOM") Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
* | hw/vfio/common: Add vfio_listener_region_del_iommu trace eventEric Auger2024-07-222-2/+4
|/ | | | | | | | | | | | Trace when VFIO gets notified about the deletion of an IOMMU MR. Also trace the name of the region in the add_iommu trace message. Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20240716094619.1713905-6-eric.auger@redhat.com> Tested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* hw/vfio/container: Get rid of qemu_open_old()Zhao Liu2024-07-171-4/+2
| | | | | | | | | | | | | | | | | For qemu_open_old(), osdep.h said: > Don't introduce new usage of this function, prefer the following > qemu_open/qemu_create that take an "Error **errp". So replace qemu_open_old() with qemu_open(). Cc: Alex Williamson <alex.williamson@redhat.com> Cc: "Cédric Le Goater" <clg@redhat.com> Signed-off-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
* vfio/display: Fix vfio_display_edid_init() error pathZhenzhong Duan2024-07-091-6/+7
| | | | | | | | | | | | | | | | vfio_display_edid_init() can fail for many reasons and return silently. It would be good to report the error. Old mdev driver may not support vfio edid region and we allow to go through in this case. vfio_display_edid_update() isn't changed because it can be called at runtime when UI changes (i.e. window resize). Fixes: 08479114b0de ("vfio/display: add edid support.") Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
* vfio/display: Fix potential memleak of edid infoZhenzhong Duan2024-07-091-0/+3
| | | | | | | | | | EDID related device region info is leaked in vfio_display_edid_init() error path and VFIODisplay destroying path. Fixes: 08479114b0de ("vfio/display: add edid support.") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
* memory: remove IOMMU MR iommu_set_page_size_mask() callbackEric Auger2024-07-091-8/+0
| | | | | | | | | | | | | | | Everything is now in place to use the Host IOMMU Device callbacks to retrieve the page size mask usable with a given assigned device. This new method brings the advantage to pass the info much earlier to the virtual IOMMU and before the IOMMU MR gets enabled. So let's remove the call to memory_region_iommu_set_page_size_mask in vfio common.c and remove the single implementation of the IOMMU MR callback in the virtio-iommu.c Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
* HostIOMMUDevice: Introduce get_page_size_mask() callbackEric Auger2024-07-092-0/+21
| | | | | | | | | | This callback will be used to retrieve the page size mask supported along a given Host IOMMU device. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
* HostIOMMUDevice : remove Error handle from get_iova_ranges callbackEric Auger2024-07-092-2/+2
| | | | | | | | | The error handle argument is not used anywhere. let's remove it. Signed-off-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
* vfio-container-base: Introduce vfio_container_get_iova_ranges() helperEric Auger2024-07-093-14/+17
| | | | | | | | | | | | | | | | | | Introduce vfio_container_get_iova_ranges() to retrieve the usable IOVA regions of the base container and use it in the Host IOMMU device implementations of get_iova_ranges() callback. We also fix a UAF bug as the list was shallow copied while g_list_free_full() was used both on the single call site, in virtio_iommu_set_iommu_device() but also in vfio_container_instance_finalize(). Instead use g_list_copy_deep. Fixes: cf2647a76e ("virtio-iommu: Compute host reserved regions") Signed-off-by: Eric Auger <eric.auger@redhat.com> Suggested-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
* Remove inclusion of hw/hw.h from files that don't need itThomas Huth2024-07-021-1/+0
| | | | | | | | | | | | hw/hw.h only contains the prototype of hw_error() nowadays, so files that don't use this function don't need to include this header. Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Message-ID: <20240701132649.58345-1-thuth@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
* vfio/container: Move vfio_container_destroy() to an instance_finalize() handlerCédric Le Goater2024-06-243-4/+3
| | | | | | | | | | | | | vfio_container_destroy() clears the resources allocated VFIOContainerBase object. Now that VFIOContainerBase is a QOM object, add an instance_finalize() handler to do the cleanup. It will be called through object_unref(). Suggested-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/container: Introduce vfio_iommu_legacy_instance_init()Cédric Le Goater2024-06-241-1/+8
| | | | | | | | | | | Just as we did for the VFIOContainerBase object, introduce an instance_init() handler for the legacy VFIOContainer object and do the specific initialization there. Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/container: Remove vfio_container_init()Cédric Le Goater2024-06-243-9/+0
| | | | | | | | | It's now empty. Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/container: Remove VFIOContainerBase::opsCédric Le Goater2024-06-245-24/+38
| | | | | | | | | Instead, use VFIO_IOMMU_GET_CLASS() to get the class pointer. Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/container: Introduce an instance_init() handlerCédric Le Goater2024-06-241-6/+13
| | | | | | | | | | This allows us to move the initialization code from vfio_container_init(), which we will soon remove. Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>