path: root/hw/i386/amd_iommu.c
Commit message | Author | Age | Files | Lines
* amd_iommu: HATDis/HATS=11 support | Joao Martins | 2025-10-05 | 1 | -0/+19
  Add a way to disable DMA translation support in the AMD IOMMU by allowing IVHD HATDis to be set to 1 and by exposing HATS (Host Address Translation Size) as the Reserved value.

  Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-23-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Refactor amdvi_page_walk() to use common code for page walk | Alejandro Jimenez | 2025-10-05 | 1 | -50/+27
  Simplify amdvi_page_walk() by making it call the fetch_pte() helper that is already in use by the shadow page synchronization code. This ensures that all code uses the same page table walking algorithm.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-21-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Do not assume passthrough translation when DTE[TV]=0 | Alejandro Jimenez | 2025-10-05 | 1 | -39/+48
  The AMD I/O Virtualization Technology (IOMMU) Specification (see Table 8: V, TV, and GV Fields in Device Table Entry) specifies that a DTE with V=1, TV=0 does not contain valid address translation information. If a request requires a table walk, the walk is terminated when this condition is encountered.

  Do not assume that addresses for a device with DTE[TV]=0 are passed through (i.e. not remapped); instead, terminate the page table walk early.

  Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-20-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
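To make the rule concrete, here is a minimal C sketch that classifies the first DTE quadword before a walk. It assumes, per the DTE layout in the AMD IOMMU specification, that V is bit 0 and TV is bit 1; the enum and function names are hypothetical and not QEMU's. The V=0 case is outside the scope of this commit and is simply reported separately.

```c
#include <stdint.h>

#define DTE_V_BIT  (1ULL << 0) /* assumed: DTE bit 0 = V */
#define DTE_TV_BIT (1ULL << 1) /* assumed: DTE bit 1 = TV */

/* Hypothetical outcome of inspecting a DTE before a page table walk. */
enum dte_walk_decision {
    DTE_INVALID,        /* V=0: not covered by this sketch */
    DTE_NO_TRANSLATION, /* V=1, TV=0: no valid translation info, stop */
    DTE_WALK            /* V=1, TV=1: translation fields valid, walk */
};

static enum dte_walk_decision dte_classify(uint64_t dte0)
{
    if (!(dte0 & DTE_V_BIT)) {
        return DTE_INVALID;
    }
    if (!(dte0 & DTE_TV_BIT)) {
        /* Do not treat this as passthrough: terminate the walk early. */
        return DTE_NO_TRANSLATION;
    }
    return DTE_WALK;
}
```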
* amd_iommu: Toggle address translation mode on devtab entry invalidation | Alejandro Jimenez | 2025-10-05 | 1 | -2/+120
  A guest must issue an INVALIDATE_DEVTAB_ENTRY command after changing a Device Table entry (DTE), e.g. after attaching a device and setting up its DTE. When intercepting this event, determine whether the DTE has been configured for paging, and toggle the appropriate memory regions to allow DMA address translation for the address space if needed. This requires dma-remap=on.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-19-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Add dma-remap property to AMD vIOMMU device | Alejandro Jimenez | 2025-10-05 | 1 | -7/+17
  In order to enable device assignment with IOMMU protection and guest DMA address translation, IOMMU MAP notifier support is necessary to allow users like VFIO to synchronize the shadow page tables, i.e. to receive notifications when the guest updates its I/O page tables and to replay the mappings onto the host I/O page tables.

  Provide a new dma-remap property to govern the ability to register for MAP notifications, effectively providing global control over the DMA address translation functionality that was implemented in previous changes.

  Note that DMA remapping support also requires that the vIOMMU be configured with the NpCache capability, so that a guest driver issues IOMMU invalidations for both map() and unmap() operations. This capability is already set by default and written to the configuration in amdvi_pci_realize() as part of AMDVI_CAPAB_FEATURES.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-18-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Set all address spaces to use passthrough mode on reset | Alejandro Jimenez | 2025-10-05 | 1 | -0/+30
  On reset, restore the default address translation mode (passthrough) for all the address spaces managed by the vIOMMU.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-17-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Toggle memory regions based on address translation mode | Alejandro Jimenez | 2025-10-05 | 1 | -2/+21
  Enable the appropriate memory region for an address space depending on the address translation mode selected for it. This is currently based on a generic x86 IOMMU property, and is only done during address space initialization. Extract the code into a helper and toggle the regions based on whether the specific address space is using address translation (via the newly introduced addr_translation field). Later, region activation will also be controlled by the availability of the DMA remapping capability (via the dma-remap property to be introduced in follow-up changes).

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-16-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Invalidate address translations on INVALIDATE_IOMMU_ALL | Alejandro Jimenez | 2025-10-05 | 1 | -0/+48
  When the kernel IOMMU driver issues an INVALIDATE_IOMMU_ALL command, the address translation and interrupt remapping information must be cleared for all Device IDs and all domains. Introduce a helper to sync the shadow page table for all the address spaces with registered notifiers, which replays both MAP and UNMAP events.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-15-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Add replay callback | Alejandro Jimenez | 2025-10-05 | 1 | -0/+24
  A replay() method is necessary to efficiently synchronize the host page tables after VFIO registers a notifier for IOMMU events. It is called to ensure that existing mappings from an IOMMU memory region are "replayed" to a specified notifier, initializing or updating the shadow page tables on the host.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-14-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Unmap all address spaces under the AMD IOMMU on reset | Alejandro Jimenez | 2025-10-05 | 1 | -0/+74
  Support dropping all existing mappings on reset. When the guest kernel reboots, it will create new ones, but other components that run before the kernel (e.g. OVMF) should not be able to use existing mappings from the previous boot.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-13-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Use iova_tree records to determine large page size on UNMAP | Alejandro Jimenez | 2025-10-05 | 1 | -6/+89
  Keep a record of mapped IOVA ranges per address space, using the iova_tree implementation. Besides enabling optimizations like avoiding unnecessary notifications, a record of existing <IOVA, size> mappings makes it possible to determine whether a specific IOVA is mapped by the guest using a large page, and to adjust the size when notifying UNMAP events.

  When unmapping a large page, the information in the guest PTE encoding the page size is lost, since the guest clears the PTE before issuing the invalidation command to the IOMMU. In such a case, the size of the original mapping can be retrieved from the iova_tree and used to issue the UNMAP notification. Using the correct size is essential, since the VFIO IOMMU Type1v2 driver in the host kernel will reject unmap requests that do not fully cover previous mappings.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-12-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
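A minimal sketch of the idea, using a small fixed-size array in place of QEMU's iova_tree (the structure and function names are hypothetical). MAP events record <IOVA, size>; on UNMAP, the record covering the invalidated IOVA supplies the original base and size, since the cleared guest PTE no longer encodes them.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RECORDS 16
#define PAGE_4K     0x1000ULL

/* Hypothetical flat record of guest mappings; QEMU uses an iova_tree. */
struct mapping { uint64_t iova; uint64_t size; };

struct mapping_log {
    struct mapping rec[MAX_RECORDS];
    size_t n;
};

/* Record an <IOVA, size> mapping when a MAP notification is sent. */
static int log_map(struct mapping_log *ml, uint64_t iova, uint64_t size)
{
    if (ml->n == MAX_RECORDS) {
        return -1;
    }
    ml->rec[ml->n].iova = iova;
    ml->rec[ml->n].size = size;
    ml->n++;
    return 0;
}

/*
 * Find the recorded mapping containing iova, drop it from the log and
 * return its size (its base goes to *base) so the UNMAP covers it fully.
 * Falls back to the enclosing 4 KiB page when no record matches.
 */
static uint64_t log_unmap_size(struct mapping_log *ml, uint64_t iova,
                               uint64_t *base)
{
    for (size_t i = 0; i < ml->n; i++) {
        struct mapping *m = &ml->rec[i];
        if (iova >= m->iova && iova - m->iova < m->size) {
            uint64_t size = m->size;
            *base = m->iova;
            ml->rec[i] = ml->rec[ml->n - 1]; /* remove by swapping in last */
            ml->n--;
            return size;
        }
    }
    *base = iova & ~(PAGE_4K - 1);
    return PAGE_4K;
}
```

For example, after recording a 2 MiB mapping at 0x200000, an invalidation of 0x201000 yields an UNMAP of the whole 2 MiB range starting at 0x200000.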
* amd_iommu: Sync shadow page tables on page invalidation | Alejandro Jimenez | 2025-10-05 | 1 | -8/+74
  When the guest issues an INVALIDATE_IOMMU_PAGES command, decode the address and size of the invalidation and sync the guest page table state with the host. This requires walking the guest page table and calling notifiers registered for address spaces matching the domain ID encoded in the command.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-11-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Add basic structure to support IOMMU notifier updates | Alejandro Jimenez | 2025-10-05 | 1 | -0/+20
  Add the minimal data structures required to maintain a list of address spaces (i.e. devices) with registered notifiers, and to update the type of events that require notifications.

  Note that the ability to register for MAP notifications is not yet available. It will be unblocked by subsequent changes that enable the synchronization of guest I/O page tables with host IOMMU state, at which point an amd-iommu device property will be introduced to control this capability.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-10-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Add a page walker to sync shadow page tables on invalidation | Alejandro Jimenez | 2025-10-05 | 1 | -0/+80
  For the specified address range, walk the page table identifying regions as mapped or unmapped and invoke registered notifiers with the corresponding event type.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-9-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
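A minimal sketch of the walk-and-notify pattern, under simplifying assumptions: a flat array of per-page present flags stands in for the guest page table, only the IOVA side is tracked (a real MAP event also needs the translated address), and contiguous pages in the same state are coalesced into one event. All names are hypothetical; instead of calling notifiers, the sketch writes the events it would deliver into an output array.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_4K 0x1000ULL

enum walk_event_type { EVT_UNMAP, EVT_MAP };

struct walk_event {
    enum walk_event_type type;
    uint64_t iova;
    uint64_t size;
};

/*
 * present[i] stands in for "the guest PTE for page i is present".
 * Each maximal run of pages with the same state becomes one event.
 * Returns the number of events written to out (at most max_out).
 */
static size_t walk_range(const unsigned char *present, size_t npages,
                         uint64_t base, struct walk_event *out,
                         size_t max_out)
{
    size_t n = 0;
    size_t i = 0;

    while (i < npages && n < max_out) {
        size_t start = i;
        while (i < npages && !!present[i] == !!present[start]) {
            i++;
        }
        out[n].type = present[start] ? EVT_MAP : EVT_UNMAP;
        out[n].iova = base + start * PAGE_4K;
        out[n].size = (i - start) * PAGE_4K;
        n++;
    }
    return n;
}
```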
* amd_iommu: Add helpers to walk AMD v1 Page Table format | Alejandro Jimenez | 2025-10-05 | 1 | -0/+123
  The current amdvi_page_walk() is designed to be called by the replay() method. Rather than drastically altering it, introduce helpers to fetch guest PTEs that will be used by a page walker implementation.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-8-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
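For reference, the AMD v1 I/O page table format resolves 9 IOVA bits per level above a 4 KiB page offset (512 eight-byte entries per 4 KiB table). The hypothetical helpers below sketch the index and coverage arithmetic that a PTE fetch helper needs; they are not the QEMU functions.

```c
#include <stdint.h>

#define PT_LEVEL_BITS 9  /* 512 entries per table level */
#define PAGE_SHIFT_4K 12 /* 4 KiB page offset */

/* Index of the PTE to read in the table at the given level (1-based). */
static unsigned pte_index(uint64_t iova, unsigned level)
{
    unsigned shift = PAGE_SHIFT_4K + PT_LEVEL_BITS * (level - 1);
    return (unsigned)((iova >> shift) & ((1u << PT_LEVEL_BITS) - 1));
}

/* Bytes covered by one PTE at the given level: 4 KiB, 2 MiB, 1 GiB, ... */
static uint64_t pte_cover_size(unsigned level)
{
    return 1ULL << (PAGE_SHIFT_4K + PT_LEVEL_BITS * (level - 1));
}
```

For IOVA 0x12345000, the level 1 index is 0x145 and the level 2 index is 0x91.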
* amd_iommu: Return an error when unable to read PTE from guest memory | Alejandro Jimenez | 2025-10-05 | 1 | -2/+2
  Make amdvi_get_pte_entry() return an error value (-1) when the memory read fails, instead of the current return value of 0. The reason is that 0 is also a valid value for a PTE stored in guest memory, i.e. the guest does not have a mapping.

  Before this change, amdvi_get_pte_entry() returned 0 for both errors and empty PTEs, but the page walker implementation that will be introduced in upcoming changes needs a way to differentiate between the two scenarios.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-7-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
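A minimal sketch of the convention described above, assuming an all-ones sentinel for (uint64_t)-1 and bit 0 as the PTE present (PR) bit. A bounds-checked array stands in for guest memory so that a failed read can be simulated; all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define PTE_READ_ERROR ((uint64_t)-1) /* returned when the read fails */
#define PTE_PR_BIT     (1ULL << 0)    /* assumed: present bit */

enum pte_state { PTE_ERR, PTE_NOT_PRESENT, PTE_PRESENT };

/* Hypothetical stand-in for a guest memory read that may fail. */
static uint64_t get_pte(const uint64_t *guest_mem, size_t n_entries,
                        size_t index)
{
    if (index >= n_entries) {
        return PTE_READ_ERROR; /* distinct from 0, which means "no mapping" */
    }
    return guest_mem[index];
}

static enum pte_state classify_pte(uint64_t pte)
{
    if (pte == PTE_READ_ERROR) {
        return PTE_ERR;
    }
    return (pte & PTE_PR_BIT) ? PTE_PRESENT : PTE_NOT_PRESENT;
}
```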
* amd_iommu: Add helper function to extract the DTE | Alejandro Jimenez | 2025-10-05 | 1 | -6/+42
  Extracting the DTE for a given AMDVIAddressSpace structure pointer is a common operation required for syncing the shadow page tables. Implement a helper to do it and to check for common error conditions.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-6-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Helper to decode size of page invalidation command | Alejandro Jimenez | 2025-10-05 | 1 | -0/+34
  The size of the region to invalidate depends on the S bit and address encoded in the command. Add a helper to extract this information, which will be used to sync shadow page tables in upcoming changes.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-5-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
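A minimal sketch of the decoding, as we read the INVALIDATE_IOMMU_PAGES description in the AMD IOMMU specification: with S=0 a single 4 KiB page is invalidated; with S=1 the size is given by the first zero bit of the address at or above bit 12 (bit 12 clear means 8 KiB, bits 13:12 = 01b mean 16 KiB, and so on). The function name and the INVAL_ALL marker are hypothetical, and __builtin_ctzll assumes GCC or Clang.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_4K   0x1000ULL
#define INVAL_ALL 0ULL /* hypothetical marker: the entire address space */

/* Size in bytes covered by an INVALIDATE_IOMMU_PAGES command (sketch). */
static uint64_t decode_inval_size(uint64_t addr, bool s_bit)
{
    if (!s_bit) {
        return PAGE_4K; /* S=0: a single 4 KiB page */
    }
    /* Force bits 11:0 to one so the search for a zero starts at bit 12. */
    uint64_t v = addr | (PAGE_4K - 1);
    if (v == UINT64_MAX) {
        return INVAL_ALL; /* no zero bit at all */
    }
    unsigned first_zero = (unsigned)__builtin_ctzll(~v);
    if (first_zero >= 63) {
        return INVAL_ALL; /* 2^64 bytes does not fit in a uint64_t */
    }
    return 1ULL << (first_zero + 1);
}
```

With S=1, an address of 0x3000 (bits 13:12 set, bit 14 clear) decodes to a 32 KiB range, and 0x7FFFFFFFFFFFF000 decodes to the whole address space.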
* amd_iommu: Reorder device and page table helpers | Alejandro Jimenez | 2025-10-05 | 1 | -86/+86
  Move code related to the Device Table and Page Table to an earlier location in the file, where it does not require forward declarations to be used by the various invalidation functions that will need to query the DTE and walk the page table in upcoming changes.

  This change consists of code movement only; no functional change intended.

  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Message-ID: <20250919213515.917111-4-alejandro.j.jimenez@oracle.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* hw/i386/amd_iommu: Fix event log generation | Sairaj Kodilkar | 2025-08-01 | 1 | -9/+35
  The current event logging code is broken because of the following issues:

  1. The code uses '|' instead of '&' to test the bit field, which causes the vIOMMU to generate an overflow interrupt for every log entry.
  2. The code does not update the event log tail MMIO register after adding an entry to the buffer, so the guest cannot process new entries (head == tail means the buffer is empty).
  3. It compares the event log tail (a byte offset into the buffer) to the event log length (the maximum number of entries in the buffer). This causes the vIOMMU to generate only a fixed number of event log entries, after which it keeps generating overflow interrupts without actually resetting the log buffer.
  4. It updates ComWaitInt instead of the EventLogInt bit field in the Status register. The guest checks this field to see whether there are new event log entries in the buffer.
  5. It does not reset the event log head and tail pointers when the guest writes to the event log base register.

  Fix the above issues so that the guest can process event log entries.

  Fixes: d29a09ca68428 ("hw/i386: Introduce AMD IOMMU")
  Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
  Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
  Message-Id: <20250801060507.3382-7-sarunkod@amd.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
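The fixed behavior follows the usual ring-buffer convention, which the sketch below illustrates under stated assumptions: 16-byte entries, head and tail kept as byte offsets, capacity kept in entries, and EventOverflow and EventLogInt at Status bits 0 and 1. The structure and function names are hypothetical, and the actual write of the entry to guest memory is elided.

```c
#include <stdbool.h>
#include <stdint.h>

#define EVT_ENTRY_SIZE      16ULL       /* assumed: 128-bit entries */
#define STATUS_EVT_OVERFLOW (1ULL << 0) /* assumed Status bit positions */
#define STATUS_EVT_LOG_INT  (1ULL << 1)

struct evt_log {
    uint64_t head;        /* byte offset, advanced by the guest */
    uint64_t tail;        /* byte offset, advanced by the IOMMU */
    uint32_t len_entries; /* capacity in entries (assumed > 0), not bytes */
    uint64_t status;
};

/* A guest write to the event log base register restarts the ring. */
static void evt_log_reset(struct evt_log *el)
{
    el->head = 0;
    el->tail = 0;
}

/* Test a Status bit with '&'; 'status | bit' is non-zero for any bit. */
static bool status_has(const struct evt_log *el, uint64_t bit)
{
    return (el->status & bit) != 0;
}

static bool evt_log_append(struct evt_log *el)
{
    /* Compare byte offsets against a size in bytes, not an entry count. */
    uint64_t size_bytes = (uint64_t)el->len_entries * EVT_ENTRY_SIZE;
    uint64_t next = (el->tail + EVT_ENTRY_SIZE) % size_bytes;

    if (next == el->head) {
        /* Full: moving tail onto head would read back as "empty". */
        el->status |= STATUS_EVT_OVERFLOW;
        return false;
    }
    /* The entry itself would be written to guest memory at 'tail' here. */
    el->tail = next;                  /* publish the new tail to the guest */
    el->status |= STATUS_EVT_LOG_INT; /* EventLogInt, not ComWaitInt */
    return true;
}
```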
* hw/i386/amd_iommu: Support MMIO writes to the status register | Sairaj Kodilkar | 2025-08-01 | 1 | -0/+3
  Support writes to the Status register so that the guest can reset the EventOverflow, EventLogInt, ComWaitIntr, etc. bits after servicing the respective interrupt.

  Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
  Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
  Message-Id: <20250801060507.3382-6-sarunkod@amd.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* hw/i386/amd_iommu: Fix amdvi_write*() | Sairaj Kodilkar | 2025-08-01 | 1 | -3/+18
  The amdvi_write*() functions do not preserve the old values of W1C (write-1-to-clear) bits in the MMIO register. As a result, all W1C bits are set to 0 when the guest tries to reset a single bit by writing 1 to it. Fix this by preserving the W1C bits from the old value of the MMIO register.

  Fixes: d29a09ca68428 ("hw/i386: Introduce AMD IOMMU")
  Suggested-by: Ethan MILON <ethan.milon@eviden.com>
  Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
  Message-Id: <20250801060507.3382-5-sarunkod@amd.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
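The fix amounts to combining three classes of bits. A minimal sketch, assuming disjoint per-bit masks for read-only and W1C bits (the function name is hypothetical): read-only bits keep the old value, W1C bits are cleared where the guest writes 1 and kept where it writes 0, and the remaining bits take the written value. All arithmetic is 64-bit so the upper half of a quadword register is not truncated.

```c
#include <stdint.h>

/*
 * New register value for a guest write. romask marks read-only bits,
 * w1cmask marks write-1-to-clear bits; the masks are assumed disjoint.
 */
static uint64_t mmio_write_value(uint64_t old, uint64_t val,
                                 uint64_t romask, uint64_t w1cmask)
{
    uint64_t ro_bits  = old & romask;             /* never changed by writes */
    uint64_t w1c_bits = old & w1cmask & ~val;     /* writing 1 clears the bit */
    uint64_t rw_bits  = val & ~romask & ~w1cmask; /* plain read/write bits */
    return ro_bits | w1c_bits | rw_bits;
}
```

For example, with W1C bits 2:1 both set, writing 0x2 clears only bit 1 and leaves bit 2 pending.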
* hw/i386/amd_iommu: Move IOAPIC memory region initialization to the end | Sairaj Kodilkar | 2025-08-01 | 1 | -3/+3
  Setting up the IOAPIC memory region requires mr_sys and mr_ir. Currently, these two memory regions are set up after the IOAPIC memory region is initialized, which causes `amdvi_host_dma_iommu()` to use uninitialized mr_sys and mr_ir.

  Move the IOAPIC memory region initialization to the end so that the mr_sys and mr_ir regions are used only after they are fully initialized.

  Fixes: 577c470f4326 ("x86_iommu/amd: Prepare for interrupt remap support")
  Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
  Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
  Message-Id: <20250801060507.3382-4-sarunkod@amd.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* hw/i386/amd_iommu: Remove unused and wrongly set ats_enabled field | Sairaj Kodilkar | 2025-08-01 | 1 | -3/+2
  The ats_enabled field is set using HTTUNEN, which is wrong. Fix this by removing the field, as it is never used.

  MST: includes a tweak suggested by Philippe

  Fixes: d29a09ca68428 ("hw/i386: Introduce AMD IOMMU")
  Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
  Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
  Message-Id: <20250801060507.3382-3-sarunkod@amd.com>
  Message-ID: <948a6ac3-ded9-475b-8c45-9d36220b442b@linaro.org>
  Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* hw/i386/amd_iommu: Fix MMIO register write tracing | Sairaj Kodilkar | 2025-08-01 | 1 | -5/+18
  Define separate functions to trace MMIO write accesses instead of using `trace_amdvi_mmio_read()` for both reads and writes.

  Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
  Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
  Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
  Message-Id: <20250801060507.3382-2-sarunkod@amd.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Fix truncation of oldval in amdvi_writeq | Ethan Milon | 2025-07-14 | 1 | -1/+1
  The variable `oldval` was incorrectly declared as a 32-bit `uint32_t`. This could lead to truncation and incorrect behavior where the upper read-only 32 bits are significant. Fix the type of `oldval` to match the return type of `ldq_le_p()`.

  Cc: qemu-stable@nongnu.org
  Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
  Signed-off-by: Ethan Milon <ethan.milon@eviden.com>
  Message-Id: <20250617150427.20585-9-alejandro.j.jimenez@oracle.com>
  Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Fix the calculation for Device Table size | Alejandro Jimenez | 2025-07-14 | 1 | -2/+2
  Correctly calculate the Device Table size using the format encoded in the Device Table Base Address Register (MMIO Offset 0000h).

  Cc: qemu-stable@nongnu.org
  Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
  Message-Id: <20250617150427.20585-7-alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
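A minimal sketch of the size computation, assuming the Size field occupies bits 8:0 of the Device Table Base Address Register and encodes the table size in 4 KiB units minus 1, with 32-byte (256-bit) entries. The helper names are hypothetical.

```c
#include <stdint.h>

#define DEVTAB_SIZE_MASK 0x1FFULL /* assumed: Size field in bits 8:0 */
#define DEVTAB_UNIT      4096ULL  /* table size in 4 KiB units, minus 1 */
#define DTE_BYTES        32ULL    /* each DTE is 256 bits */

static uint64_t devtab_bytes(uint64_t base_reg)
{
    return ((base_reg & DEVTAB_SIZE_MASK) + 1) * DEVTAB_UNIT;
}

static uint64_t devtab_entries(uint64_t base_reg)
{
    return devtab_bytes(base_reg) / DTE_BYTES;
}
```

With Size = 1FFh the table is 2 MiB, which holds 65536 entries, one per 16-bit Device ID.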
* amd_iommu: Update bitmasks representing DTE reserved fields | Alejandro Jimenez | 2025-07-14 | 1 | -3/+4
  The DTE validation method verifies that all bits in reserved DTE fields are unset. Update the reserved masks according to the latest definition in the AMD I/O Virtualization Technology (IOMMU) Specification, Section 2.2.2.1 Device Table Entry Format. Remove the magic numbers and use a macro helper to generate bitmasks covering the specified ranges, for better legibility.

  Note that some reserved fields specify that events are generated when they contain non-zero bits, or that checks are skipped under certain configurations. This change only updates the reserved masks; checks for these special conditions are not yet implemented.

  Cc: qemu-stable@nongnu.org
  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
  Message-Id: <20250617150427.20585-4-alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
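A GENMASK-style helper of the kind described above might look like this. The macro name BIT_RANGE_MASK and the check function are hypothetical, and the formulation avoids shifting by 64, which is undefined behavior in C.

```c
#include <stdint.h>

/* Mask with bits hi..lo (inclusive) set; requires 0 <= lo <= hi <= 63. */
#define BIT_RANGE_MASK(hi, lo) \
    ((~0ULL >> (63 - (hi))) & (~0ULL << (lo)))

/* Non-zero if any bit inside a reserved range of the quadword is set. */
static int reserved_bits_set(uint64_t qword, uint64_t reserved_mask)
{
    return (qword & reserved_mask) != 0;
}
```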
* amd_iommu: Fix Device ID decoding for INVALIDATE_IOTLB_PAGES command | Alejandro Jimenez | 2025-07-14 | 1 | -2/+2
  The DeviceID bits are extracted using an incorrect offset in the call to amdvi_iotlb_remove_page(). This field is read (correctly) earlier, so use the value already retrieved for devid.

  Cc: qemu-stable@nongnu.org
  Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU")
  Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
  Message-Id: <20250617150427.20585-3-alejandro.j.jimenez@oracle.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* hw/i386/amd_iommu: Fix xtsup when vcpus < 255 | Vasant Hegde | 2025-06-01 | 1 | -0/+8
  If there are more than 255 vCPUs, the x86 common code (x86_cpus_init()) calls kvm_enable_x2apic(). But if there are 255 or fewer vCPUs, the common code does not call kvm_enable_x2apic(). Commit 8c6619f3e692 ("hw/i386/amd_iommu: Simplify non-KVM checks on XTSup feature") removed the call to kvm_enable_x2apic() when xtsup is "on", which breaks things when the guest is booted in x2apic mode with 255 or fewer vCPUs.

  Fix this by adding back the kvm_enable_x2apic() call when xtsup=on.

  Fixes: 8c6619f3e692 ("hw/i386/amd_iommu: Simplify non-KVM checks on XTSup feature")
  Reported-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Tested-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
  Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
  Cc: Joao Martins <joao.m.martins@oracle.com>
  Signed-off-by: Vasant Hegde <vasant.hegde@amd.com>
  Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
  Message-Id: <20250516100535.4980-3-sarunkod@amd.com>
* hw/i386/amd_iommu: Fix device setup failure when PT is on. | Sairaj Kodilkar | 2025-06-01 | 1 | -10/+2
  Commit c1f46999ef506 ("amd_iommu: Add support for pass though mode") introduced support for the "pt" flag by enabling the nodma memory region when "pt=off". This allowed VFIO devices to register notifiers successfully by using the nodma region. However, it also broke things when the guest is booted with iommu=nopt, because devices bypass the IOMMU and use untranslated addresses (IOVA) to perform DMA reads/writes to the nodma memory region, ultimately causing device setup in the guest to fail.

  Fix the above issue by always enabling the amdvi_dev_as->iommu memory region. Note that this will once again cause VFIO devices to fail when registering notifiers with the AMD IOMMU memory region.

  Fixes: c1f46999ef506 ("amd_iommu: Add support for pass though mode")
  Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
  Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
  Message-Id: <20250516100535.4980-2-sarunkod@amd.com>
* hw/i386/amd_iommu: Allow migration when explicitly create the AMDVI-PCI device | Suravee Suthikulpanit | 2025-05-14 | 1 | -0/+48
  Add migration support for the AMD IOMMU model by saving the necessary AMDVIState parameters for the MMIO registers, device table, command buffer, and event buffers.

  Also change the devtab_len type from size_t to uint64_t to avoid a 32-bit build issue.

  Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
  Message-Id: <20250504170405.12623-3-suravee.suthikulpanit@amd.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* hw/i386/amd_iommu: Isolate AMDVI-PCI from amd-iommu device to allow full control over the PCI device creation | Suravee Suthikulpanit | 2025-05-14 | 1 | -21/+32
  The current amd-iommu model internally creates an AMDVI-PCI device. Here is a snippet from info qtree:

    bus: main-system-bus
      type System
      dev: amd-iommu, id ""
        xtsup = false
        pci-id = ""
        intremap = "on"
        device-iotlb = false
        pt = true
      ...
      dev: q35-pcihost, id ""
        MCFG = -1 (0xffffffffffffffff)
        pci-hole64-size = 34359738368 (32 GiB)
        below-4g-mem-size = 134217728 (128 MiB)
        above-4g-mem-size = 0 (0 B)
        smm-ranges = true
        x-pci-hole64-fix = true
        x-config-reg-migration-enabled = true
        bypass-iommu = false
        bus: pcie.0
          type PCIE
          dev: AMDVI-PCI, id ""
            addr = 01.0
            romfile = ""
            romsize = 4294967295 (0xffffffff)
            rombar = -1 (0xffffffffffffffff)
            multifunction = false
            x-pcie-lnksta-dllla = true
            x-pcie-extcap-init = true
            failover_pair_id = ""
            acpi-index = 0 (0x0)
            x-pcie-err-unc-mask = true
            x-pcie-ari-nextfn-1 = false
            x-max-bounce-buffer-size = 4096 (4 KiB)
            x-pcie-ext-tag = true
            busnr = 0 (0x0)
            class Class 0806, addr 00:01.0, pci id 1022:0000 (sub 1af4:1100)
    ...

  This prevents users from specifying the PCI topology for the amd-iommu device, which becomes a problem when trying to support VM migration, since it does not guarantee the same enumeration of the AMD IOMMU device.

  Therefore, allow the 'AMDVI-PCI' device to optionally be pre-created and associated with an 'amd-iommu' device via a new 'pci-id' parameter on the latter. For example:

    -device AMDVI-PCI,id=iommupci0,bus=pcie.0,addr=0x05 \
    -device amd-iommu,intremap=on,pt=on,xtsup=on,pci-id=iommupci0 \

  For backward compatibility, internally create the AMDVI-PCI device if it is not specified on the CLI.

  Co-developed-by: Daniel P. Berrangé <berrange@redhat.com>
  Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
  Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
  Message-Id: <20250504170405.12623-2-suravee.suthikulpanit@amd.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* qom: Make InterfaceInfo[] uses const | Philippe Mathieu-Daudé | 2025-04-25 | 1 | -1/+1
  Mechanical change using:

    $ sed -i -E 's/\(InterfaceInfo.?\[/\(const InterfaceInfo\[/g' \
        $(git grep -lE '\(InterfaceInfo.?\[\]\)')

  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
  Message-Id: <20250424194905.82506-7-philmd@linaro.org>
* qom: Have class_init() take a const data argument | Philippe Mathieu-Daudé | 2025-04-25 | 1 | -3/+4
  Mechanical change using gsed, with style then adapted manually to pass the checkpatch.pl script.

  Suggested-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Message-Id: <20250424194905.82506-4-philmd@linaro.org>
* hw/i386/amd_iommu: Assign pci-id 0x1419 for the AMD IOMMU device | Suravee Suthikulpanit | 2025-04-02 | 1 | -0/+1
  Currently, the QEMU-emulated AMD IOMMU device uses PCI vendor ID 0x1022 (AMD) with device ID zero (undefined). Even though this does not cause any functional issue for the AMD IOMMU driver, since it normally uses information in the ACPI IVRS table to probe and initialize the device per the recommendation in the AMD IOMMU specification, device ID zero causes the Windows Device Manager utility to show the device as an unknown device.

  Since Windows only recognizes the AMD IOMMU device with device ID 0x1419, as listed in the machine.inf file, modify the QEMU AMD IOMMU model to use ID 0x1419 to avoid the issue. This advertises the IOMMU as the AMD IOMMU device for Family 15h (Models 10h-1fh).

  Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
  Message-Id: <20250325021140.5676-1-suravee.suthikulpanit@amd.com>
  Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
  Reviewed-by: Yan Vugenfirer <yvugenfi@redhat.com>
  Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging | Stefan Hajnoczi | 2025-02-22 | 1 | -5/+5
  virtio,pc,pci: features, fixes, cleanups

  Features:
  SR-IOV emulation for pci
  virtio-mem-pci support for s390
  interleave support for cxl
  big endian support for vdpa svq
  new QAPI events for vhost-user

  Also vIOMMU reset order fixups are in.
  Fixes, cleanups all over the place.

  Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

  # -----BEGIN PGP SIGNATURE-----
  #
  # iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAme4b8sPHG1zdEByZWRo
  # YXQuY29tAAoJECgfDbjSjVRpHKcIAKPJsVqPdda2dJ7b7FdyRT0Q+uwezXqaGHd4
  # 7Lzih1wsxYNkwIAyPtEb76/21qiS7BluqlUCfCB66R9xWjP5/KfvAFj4/r4AEduE
  # fxAgYzotNpv55zcRbcflMyvQ42WGiZZHC+o5Lp7vDXUP3pIyHrl0Ydh5WmcD+hwS
  # BjXvda58TirQpPJ7rUL+sSfLih17zQkkDcfv5/AgorDy1wK09RBKwMx/gq7wG8yJ
  # twy8eBY2CmfmFD7eTM+EKqBD2T0kwLEeLfS/F/tl5Fyg6lAiYgYtCbGLpAmWErsg
  # XZvfZmwqL7CNzWexGvPFnnLyqwC33WUP0k0kT88Y5wh3/h98blw=
  # =tej8
  # -----END PGP SIGNATURE-----
  # gpg: Signature made Fri 21 Feb 2025 20:21:31 HKT
  # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
  # gpg: issuer "mst@redhat.com"
  # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
  # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
  # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
  # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469

    * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (41 commits)
    docs/devel/reset: Document reset expectations for DMA and IOMMU
    hw/vfio/common: Add a trace point in vfio_reset_handler
    hw/arm/smmuv3: Move reset to exit phase
    hw/i386/intel-iommu: Migrate to 3-phase reset
    hw/virtio/virtio-iommu: Migrate to 3-phase reset
    vhost-user-snd: correct the calculation of config_size
    net: vhost-user: add QAPI events to report connection state
    hw/virtio/virtio-nsm: Respond with correct length
    vdpa: Fix endian bugs in shadow virtqueue
    MAINTAINERS: add more files to `vhost`
    cryptodev/vhost: allocate CryptoDevBackendVhost using g_mem0()
    vhost-iova-tree: Update documentation
    vhost-iova-tree, svq: Implement GPA->IOVA & partial IOVA->HVA trees
    vhost-iova-tree: Implement an IOVA-only tree
    amd_iommu: Use correct bitmask to set capability BAR
    amd_iommu: Use correct DTE field for interrupt passthrough
    hw/virtio: reset virtio balloon stats on machine reset
    mem/cxl_type3: support 3, 6, 12 and 16 interleave ways
    hw/mem/cxl_type3: Ensure errp is set on realization failure
    hw/mem/cxl_type3: Fix special_ops memory leak on msix_init_exclusive_bar() failure
    ...

  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
| * amd_iommu: Use correct bitmask to set capability BARSairaj Kodilkar2025-02-211-2/+2
AMD IOMMU provides the base address of its control registers through the IVRS table and the PCI capability. Since this base address is 64 bits wide, use a 32-bit mask (instead of a 16-bit one) to set BAR low and high.

Fixes: d29a09ca68 ("hw/i386: Introduce AMD IOMMU")
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250207045354.27329-3-sarunkod@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
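The commit above is about how a 64-bit base address is split across the BAR low and high registers. A minimal C sketch of that split follows, assuming hypothetical helpers `bar_low()`, `bar_high()` and `bar_join()` that are not QEMU functions; it ignores any flag bits a real capability BAR register carries, and 0xfed80000 is used only as an example address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: low half of a 64-bit base address, bits 31..0. */
static uint32_t bar_low(uint64_t base)
{
    return (uint32_t)(base & 0xffffffffULL);
}

/* Hypothetical helper: high half of a 64-bit base address, bits 63..32. */
static uint32_t bar_high(uint64_t base)
{
    return (uint32_t)(base >> 32);
}

/* Counter-example: a 16-bit mask keeps only bits 15..0, so the upper
 * 16 bits of the low half are silently dropped. */
static uint32_t bar_low_16bit_mask(uint64_t base)
{
    return (uint32_t)(base & 0xffffULL);
}

/* Reassemble the address from the two 32-bit register values. */
static uint64_t bar_join(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}
```

With 32-bit masks the two halves always reassemble to the original address; with the 16-bit mask, an address such as 0xfed80000 loses its entire low-register value.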
| * amd_iommu: Use correct DTE field for interrupt passthrough (Sairaj Kodilkar, 2025-02-21, 1 file, -3/+3)
Interrupt passthrough is determined by bits 191, 190 and 187-184. These bits are part of the 3rd quad word (i.e. index 2) of the DTE. Hence replace dte[3] with dte[2].

Fixes: b44159fe0 ("x86_iommu/amd: Add interrupt remap support when VAPIC is not enabled")
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20250207045354.27329-2-sarunkod@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
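Why these bits live in dte[2]: treating the 256-bit DTE as four 64-bit words, bit n sits in word n / 64 at position n % 64, so bit 191 is word 2, position 63, and bit 184 is word 2, position 56. A small sketch with hypothetical helper names (not QEMU's); it only does the index arithmetic and attaches no meaning to the individual bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Which 64-bit word of a 256-bit DTE holds a given bit. */
static unsigned dte_word(unsigned bit)
{
    assert(bit < 256);
    return bit / 64;
}

/* Position of that bit inside its 64-bit word. */
static unsigned dte_shift(unsigned bit)
{
    assert(bit < 256);
    return bit % 64;
}

/* Test a single DTE bit by its absolute bit number. */
static bool dte_test_bit(const uint64_t dte[4], unsigned bit)
{
    return (dte[dte_word(bit)] >> dte_shift(bit)) & 1;
}

/* Extract the 4-bit field at absolute DTE bits 187..184. */
static unsigned dte_bits_187_184(const uint64_t dte[4])
{
    return (unsigned)((dte[dte_word(184)] >> dte_shift(184)) & 0xf);
}
```

Every bit in the 191..184 range maps to word index 2, which is why reading dte[3] looked at the wrong quad word.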
* | hw/i386: Have X86_IOMMU devices inherit from DYNAMIC_SYS_BUS_DEVICE (Philippe Mathieu-Daudé, 2025-02-16, 1 file, -2/+0)
|/
Rather than explaining why _X86_IOMMU devices are user_creatable, have them inherit TYPE_DYNAMIC_SYS_BUS_DEVICE to make explicit that they can optionally be plugged on TYPE_PLATFORM_BUS_DEVICE.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alexander Graf <graf@amazon.com>
Reviewed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Message-Id: <20250125181343.59151-7-philmd@linaro.org>
* hw/i386/amd_iommu: Simplify non-KVM checks on XTSup feature (Philippe Mathieu-Daudé, 2024-12-31, 1 file, -9/+2)
Generic code that wants to access KVM-specific methods should do so under the protection of the 'kvm_enabled()' helper. Doing so avoids link failures when optimization is disabled (using --enable-debug); see for example commits c04cfb4596a ("hw/i386: fix short-circuit logic with non-optimizing builds") and 0266aef8cd6 ("amd_iommu: Fix kvm_enable_x2apic link error with clang in non-KVM builds").

The XTSup feature depends on KVM, so protect the whole block checking the XTSup feature with a check on whether KVM is enabled.

Since x86_cpus_init() already checks that APIC IDs > 255 imply kernel support for irqchip and X2APIC, remove the confusing and unlikely-to-be-reached "AMD IOMMU xtsup=on requires support on the KVM side" message.

Fix a typo in "configuration" in the error message.

Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Message-Id: <20241129155802.35534-1-philmd@linaro.org>
* include/hw/qdev-properties: Remove DEFINE_PROP_END_OF_LIST (Richard Henderson, 2024-12-19, 1 file, -1/+0)
Now that all of the Property arrays are counted, we can remove the terminator object from each array. Update the assertions in device_class_set_props to match.

With struct Property being 88 bytes, this was a rather large form of terminator. Saves 30k from qemu-system-aarch64.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Link: https://lore.kernel.org/r/20241218134251.4724-21-richard.henderson@linaro.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
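A sketch of the difference between a terminator-based array and a counted one. It uses a hypothetical `Prop` struct (much smaller than the 88-byte `Property` mentioned above) and a hypothetical `class_set_props()` macro; it is not the qdev API, only an illustration of the pattern the commit describes.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical, simplified stand-in for a device property. */
typedef struct Prop {
    const char *name;
    int defval;
} Prop;

/* Old style: walk the array until a sentinel entry whose name is NULL.
 * The sentinel costs one full element per array. */
static size_t count_until_sentinel(const Prop *props)
{
    size_t n = 0;
    while (props[n].name != NULL) {
        n++;
    }
    return n;
}

/* New style: store the element count alongside the pointer. */
typedef struct PropClass {
    const Prop *props;
    size_t nprops;
} PropClass;

static void class_set_props_n(PropClass *c, const Prop *props, size_t n)
{
    assert(props != NULL && n > 0);
    c->props = props;
    c->nprops = n;
}

/* Only valid on a real array (not a pointer), so the count is computed
 * at compile time and no terminator entry is needed. */
#define class_set_props(c, arr) class_set_props_n((c), (arr), ARRAY_SIZE(arr))
```

The design trade-off is that the counted form must be given a true array so `ARRAY_SIZE()` works, in exchange for dropping one element of padding from every property list.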
* hw/i386: Constify all Property (Richard Henderson, 2024-12-15, 1 file, -1/+1)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* amd_iommu: Fix kvm_enable_x2apic link error with clang in non-KVM builds (Sairaj Kodilkar, 2024-11-28, 1 file, -3/+5)
Commit b12cb3819 (amd_iommu: Check APIC ID > 255 for XTSup) causes a link error for `kvm_enable_x2apic` when KVM is disabled and Clang is used for compilation.

This issue comes up because Clang does not remove the function call site (kvm_enable_x2apic in this case) during optimization when the if condition contains a variable. The Intel IOMMU driver solves this issue by checking variables in a separate if condition, which lets the call site be optimized away because `kvm_irqchip_is_split()` is defined as 0.

Implement the same solution for the AMD driver.

Fixes: b12cb3819baf (amd_iommu: Check APIC ID > 255 for XTSup)
Signed-off-by: Sairaj Kodilkar <sarunkod@amd.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Tested-by: Phil Dennis-Jordan <phil@philjordan.eu>
Link: https://lore.kernel.org/r/20241114114509.15350-1-sarunkod@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
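The two condition shapes the commit contrasts can be sketched as follows. Here `irqchip_is_split()` is a hypothetical stand-in for `kvm_irqchip_is_split()` being defined as 0, and `enable_x2apic()` is a hypothetical stand-in for `kvm_enable_x2apic()`. It is defined so the sketch links and runs anywhere, which means it shows the structure of the fix rather than reproducing the link error itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: in a non-KVM build the predicate is a constant. */
#define irqchip_is_split() 0

/* Hypothetical stand-in for the KVM-only call. In a real non-KVM build no
 * definition would exist, so any surviving call site fails to link. */
static int x2apic_calls;
static bool enable_x2apic(void)
{
    x2apic_calls++;
    return true;
}

/* Fragile shape: a runtime variable and the constant share one condition,
 * so the condition as a whole is not a compile-time constant and a
 * non-optimizing build may still emit the call. */
static bool setup_combined(bool xtsup)
{
    if (xtsup && irqchip_is_split() && enable_x2apic()) {
        return true;
    }
    return false;
}

/* Shape the commit describes: the constant gets an if of its own, so the
 * block containing the call is guarded only by the constant 0. */
static bool setup_separate(bool xtsup)
{
    if (irqchip_is_split()) {
        if (xtsup && enable_x2apic()) {
            return true;
        }
    }
    return false;
}
```

Both functions behave identically at run time; the difference only matters for whether the compiler can discard the call site without optimization, which is what the commit relies on.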
* amd_iommu: Check APIC ID > 255 for XTSup (Suravee Suthikulpanit, 2024-11-04, 1 file, -0/+11)
The XTSup mode enables x2APIC support for AMD IOMMU, which is needed to support vCPUs with APIC IDs > 255.

Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Message-Id: <20240927172913.121477-6-santosh.shukla@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
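The 255 limit comes from the xAPIC destination format, which holds an 8-bit APIC ID. A minimal sketch of the kind of check this commit adds, using a hypothetical `check_apic_id_limit()` that does not mirror QEMU's actual code or error reporting:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest APIC ID representable in an 8-bit xAPIC destination. */
#define XAPIC_MAX_ID 255u

/* Hypothetical check: APIC IDs above 255 require the IOMMU's x2APIC mode
 * (XTSup). Returns 0 if the configuration is usable, -1 otherwise. */
static int check_apic_id_limit(uint32_t max_apic_id, bool xtsup)
{
    if (max_apic_id > XAPIC_MAX_ID && !xtsup) {
        return -1;
    }
    return 0;
}

/* What happens if a larger ID is squeezed into 8 bits anyway:
 * the value wraps modulo 256 and targets the wrong vCPU. */
static uint8_t to_xapic_dest(uint32_t apic_id)
{
    return (uint8_t)apic_id;
}
```

For example, APIC ID 300 truncated to 8 bits becomes 44, which is why such configurations are rejected unless XTSup is enabled.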
* amd_iommu: Send notification when invalidate interrupt entry cache (Suravee Suthikulpanit, 2024-11-04, 1 file, -0/+12)
In order to support AMD IOMMU interrupt remapping emulation with PCI pass-through devices, QEMU needs to notify VFIO when the guest IOMMU driver updates and invalidates the guest interrupt remapping table (IRT), and to communicate information so that the host IOMMU driver can update the shadowed interrupt remapping table in the host IOMMU.

Therefore, send a notification when the guest IOMMU emulates the IRT invalidation commands.

Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Message-Id: <20240927172913.121477-5-santosh.shukla@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Use shared memory region for Interrupt Remapping (Suravee Suthikulpanit, 2024-11-04, 1 file, -8/+14)
Use a shared memory region for interrupt remapping, which can be aliased by all devices.

Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Message-Id: <20240927172913.121477-4-santosh.shukla@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Add support for pass though mode (Suravee Suthikulpanit, 2024-11-04, 1 file, -9/+40)
Introduce a 'nodma' shared memory region to support PT (pass-through) mode, so that for each device we only create an alias to the shared memory region when DMA remapping is disabled.

Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Message-Id: <20240927172913.121477-3-santosh.shukla@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* amd_iommu: Rename variable mmio to mr_mmio (Suravee Suthikulpanit, 2024-11-04, 1 file, -3/+3)
Rename the MMIO memory region variable 'mmio' to 'mr_mmio' so that its name correctly aligns with its type in struct AMDVIState.

No functional change intended.

Reviewed-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Santosh Shukla <santosh.shukla@amd.com>
Message-Id: <20240927172913.121477-2-santosh.shukla@amd.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* hw: Use device_class_set_legacy_reset() instead of opencoding (Peter Maydell, 2024-09-13, 1 file, -1/+1)
Use device_class_set_legacy_reset() instead of opencoding an assignment to DeviceClass::reset.

This change was produced with:
 spatch --macro-file scripts/cocci-macro-file.h \
    --sp-file scripts/coccinelle/device-reset.cocci \
    --keep-comments --smpl-spacing --in-place --dir hw

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20240830145812.1967042-8-peter.maydell@linaro.org