focaccia-qemu - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	intel_iommu: Simplify caching mode check with VFIO device	Zhenzhong Duan	2025-10-05	2	-54/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In early days, we had different tricks to ensure caching-mode=on with VFIO device: 28cf553afe ("intel_iommu: Sanity check vfio-pci config on machine init done") c6cbc29d36 ("pc/q35: Disallow vfio-pci hotplug without VT-d caching mode") There is also a patch with the same purpose but for VDPA device: b8d78277c0 ("intel-iommu: fail MAP notifier without caching mode") Because without caching mode, MAP notifier won't work correctly since guest won't send IOTLB update event when it establishes new mappings in the I/O page tables. Now with host IOMMU device interface between VFIO and vIOMMU, we can simplify first two commits above with a small check in set_iommu_device(). This also works for future IOMMUFD backed VDPA implementation which may also need caching mode on. But for legacy VDPA we still need commit b8d78277c0 as it doesn't use the host IOMMU device interface. For coldplug VFIO device: qemu-system-x86_64: -device vfio-pci,host=0000:3b:00.0,id=hostdev3,bus=root0,iommufd=iommufd0: vfio 0000:3b:00.0: Failed to set vIOMMU: Device assignment is not allowed without enabling caching-mode=on for Intel IOMMU. For hotplug VFIO device: if "iommu=off" is configured in guest, Error: vfio 0000:3b:00.0: Failed to set vIOMMU: Device assignment is not allowed without enabling caching-mode=on for Intel IOMMU. else Error: vfio 0000:3b:00.0: memory listener initialization failed: Region vtd-00.0-dmar: device 01.00.0 requires caching mode: Operation not supported The specialty for hotplug is due to the check in commit b8d78277c0 happen before the check in set_iommu_device. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250929034206.439266-3-zhenzhong.duan@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	intel_iommu: Enable Enhanced Set Root Table Pointer Support (ESRTPS)	Zhenzhong Duan	2025-10-05	2	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	According to VTD spec rev 4.1 section 6.6: "For implementations reporting the Enhanced Set Root Table Pointer Support (ESRTPS) field as Clear, on a 'Set Root Table Pointer' operation, software must perform a global invalidate of the context cache, PASID-cache (if applicable), and IOTLB, in that order. This is required to ensure hardware references only the remapping structures referenced by the new root table pointer and not stale cached entries. For implementations reporting the Enhanced Set Root Table Pointer Support (ESRTPS) field as Set, as part of 'Set Root Table Pointer' operation, hardware performs global invalidation on all DMA remapping translation caches and hence software is not required to perform additional invalidations" We already implemented ESRTPS capability in vtd_handle_gcmd_srtp() by calling vtd_reset_caches(), just set ESRTPS in DMAR_CAP_REG to avoid unnecessary global invalidation requests of context, PASID-cache and IOTLB from guest. This change doesn't impact migration as the content of DMAR_CAP_REG is migrated too. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250929034206.439266-2-zhenzhong.duan@intel.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: HATDis/HATS=11 support	Joao Martins	2025-10-05	3	-1/+25
\| \| \| \| \| \| \| \| \| \| \| \|	Add a way to disable DMA translation support in AMD IOMMU by allowing to set IVHD HATDis to 1, and exposing HATS (Host Address Translation Size) as Reserved value. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-23-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	intel-iommu: Move dma_translation to x86-iommu	Joao Martins	2025-10-05	2	-3/+3
\| \| \| \| \| \| \| \| \| \|	To be later reused by AMD, now that it shares similar property. Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-22-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Refactor amdvi_page_walk() to use common code for page walk	Alejandro Jimenez	2025-10-05	1	-50/+27
\| \| \| \| \| \| \| \| \| \| \|	Simplify amdvi_page_walk() by making it call the fetch_pte() helper that is already in use by the shadow page synchronization code. Ensures all code uses the same page table walking algorithm. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-21-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Do not assume passthrough translation when DTE[TV]=0	Alejandro Jimenez	2025-10-05	1	-39/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The AMD I/O Virtualization Technology (IOMMU) Specification (see Table 8: V, TV, and GV Fields in Device Table Entry), specifies that a DTE with V=1, TV=0 does not contain a valid address translation information. If a request requires a table walk, the walk is terminated when this condition is encountered. Do not assume that addresses for a device with DTE[TV]=0 are passed through (i.e. not remapped) and instead terminate the page table walk early. Fixes: d29a09ca6842 ("hw/i386: Introduce AMD IOMMU") Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-20-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Toggle address translation mode on devtab entry invalidation	Alejandro Jimenez	2025-10-05	1	-2/+120
\| \| \| \| \| \| \| \| \| \| \| \| \|	A guest must issue an INVALIDATE_DEVTAB_ENTRY command after changing a Device Table entry (DTE) e.g. after attaching a device and setting up its DTE. When intercepting this event, determine if the DTE has been configured for paging or not, and toggle the appropriate memory regions to allow DMA address translation for the address space if needed. Requires dma-remap=on. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-19-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Add dma-remap property to AMD vIOMMU device	Alejandro Jimenez	2025-10-05	2	-7/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In order to enable device assignment with IOMMU protection and guest DMA address translation, IOMMU MAP notifier support is necessary to allow users like VFIO to synchronize the shadow page tables i.e. to receive notifications when the guest updates its I/O page tables and replay the mappings onto host I/O page tables. Provide a new dma-remap property to govern the ability to register for MAP notifications, effectively providing global control over the DMA address translation functionality that was implemented in previous changes. Note that DMA remapping support also requires the vIOMMU is configured with the NpCache capability, so a guest driver issues IOMMU invalidations for both map() and unmap() operations. This capability is already set by default and written to the configuration in amdvi_pci_realize() as part of AMDVI_CAPAB_FEATURES. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-18-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Set all address spaces to use passthrough mode on reset	Alejandro Jimenez	2025-10-05	1	-0/+30
\| \| \| \| \| \| \| \| \| \|	On reset, restore the default address translation mode (passthrough) for all the address spaces managed by the vIOMMU. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-17-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Toggle memory regions based on address translation mode	Alejandro Jimenez	2025-10-05	1	-2/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enable the appropriate memory region for an address space depending on the address translation mode selected for it. This is currently based on a generic x86 IOMMU property, and only done during the address space initialization. Extract the code into a helper and toggle the regions based on whether the specific address space is using address translation (via the newly introduced addr_translation field). Later, region activation will also be controlled by availability of DMA remapping capability (via dma-remap property to be introduced in follow up changes). Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-16-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Invalidate address translations on INVALIDATE_IOMMU_ALL	Alejandro Jimenez	2025-10-05	1	-0/+48
\| \| \| \| \| \| \| \| \| \| \| \| \|	When the kernel IOMMU driver issues an INVALIDATE_IOMMU_ALL, the address translation and interrupt remapping information must be cleared for all Device IDs and all domains. Introduce a helper to sync the shadow page table for all the address spaces with registered notifiers, which replays both MAP and UNMAP events. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-15-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Add replay callback	Alejandro Jimenez	2025-10-05	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \|	A replay() method is necessary to efficiently synchronize the host page tables after VFIO registers a notifier for IOMMU events. It is called to ensure that existing mappings from an IOMMU memory region are "replayed" to a specified notifier, initializing or updating the shadow page tables on the host. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-14-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Unmap all address spaces under the AMD IOMMU on reset	Alejandro Jimenez	2025-10-05	1	-0/+74
\| \| \| \| \| \| \| \| \| \| \| \|	Support dropping all existing mappings on reset. When the guest kernel reboots it will create new ones, but other components that run before the kernel (e.g. OVMF) should not be able to use existing mappings from the previous boot. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-13-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Use iova_tree records to determine large page size on UNMAP	Alejandro Jimenez	2025-10-05	1	-6/+89
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Keep a record of mapped IOVA ranges per address space, using the iova_tree implementation. Besides enabling optimizations like avoiding unnecessary notifications, a record of existing <IOVA, size> mappings makes it possible to determine if a specific IOVA is mapped by the guest using a large page, and adjust the size when notifying UNMAP events. When unmapping a large page, the information in the guest PTE encoding the page size is lost, since the guest clears the PTE before issuing the invalidation command to the IOMMU. In such case, the size of the original mapping can be retrieved from the iova_tree and used to issue the UNMAP notification. Using the correct size is essential since the VFIO IOMMU Type1v2 driver in the host kernel will reject unmap requests that do not fully cover previous mappings. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-12-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Sync shadow page tables on page invalidation	Alejandro Jimenez	2025-10-05	1	-8/+74
\| \| \| \| \| \| \| \| \| \| \| \|	When the guest issues an INVALIDATE_IOMMU_PAGES command, decode the address and size of the invalidation and sync the guest page table state with the host. This requires walking the guest page table and calling notifiers registered for address spaces matching the domain ID encoded in the command. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-11-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Add basic structure to support IOMMU notifier updates	Alejandro Jimenez	2025-10-05	2	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add the minimal data structures required to maintain a list of address spaces (i.e. devices) with registered notifiers, and to update the type of events that require notifications. Note that the ability to register for MAP notifications is not available. It will be unblocked by following changes that enable the synchronization of guest I/O page tables with host IOMMU state, at which point an amd-iommu device property will be introduced to control this capability. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-10-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Add a page walker to sync shadow page tables on invalidation	Alejandro Jimenez	2025-10-05	1	-0/+80
\| \| \| \| \| \| \| \| \| \| \|	For the specified address range, walk the page table identifying regions as mapped or unmapped and invoke registered notifiers with the corresponding event type. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-9-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Add helpers to walk AMD v1 Page Table format	Alejandro Jimenez	2025-10-05	2	-0/+163
\| \| \| \| \| \| \| \| \| \| \|	The current amdvi_page_walk() is designed to be called by the replay() method. Rather than drastically altering it, introduce helpers to fetch guest PTEs that will be used by a page walker implementation. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-8-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Return an error when unable to read PTE from guest memory	Alejandro Jimenez	2025-10-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make amdvi_get_pte_entry() return an error value (-1) in cases where the memory read fails, versus the current return of 0 to indicate failure. The reason is that 0 is also a valid value to have stored in the PTE in guest memory i.e. the guest does not have a mapping. Before this change, amdvi_get_pte_entry() returned 0 for both an error and for empty PTEs, but the page walker implementation that will be introduced in upcoming changes needs a method to differentiate between the two scenarios. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-7-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Add helper function to extract the DTE	Alejandro Jimenez	2025-10-05	1	-6/+42
\| \| \| \| \| \| \| \| \| \| \|	Extracting the DTE from a given AMDVIAddressSpace pointer structure is a common operation required for syncing the shadow page tables. Implement a helper to do it and check for common error conditions. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-6-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Helper to decode size of page invalidation command	Alejandro Jimenez	2025-10-05	2	-0/+38
\| \| \| \| \| \| \| \| \| \| \|	The size of the region to invalidate depends on the S bit and address encoded in the command. Add a helper to extract this information, which will be used to sync shadow page tables in upcoming changes. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-5-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	amd_iommu: Reorder device and page table helpers	Alejandro Jimenez	2025-10-05	1	-86/+86
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move code related to Device Table and Page Table to an earlier location in the file, where it does not require forward declarations to be used by the various invalidation functions that will need to query the DTE and walk the page table in upcoming changes. This change consist of code movement only, no functional change intended. Signed-off-by: Alejandro Jimenez <alejandro.j.jimenez@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250919213515.917111-4-alejandro.j.jimenez@oracle.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	intel_iommu: Add PRI operations support	CLEMENT MATHIEU--DRIF	2025-10-05	2	-0/+276
\| \| \| \| \| \| \| \| \|	Implement the PRI callbacks in vtd_iommu_ops. Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250901111630.1018573-6-clement.mathieu--drif@eviden.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	intel_iommu: Declare registers for PRI	CLEMENT MATHIEU--DRIF	2025-10-05	1	-0/+55
\| \| \| \| \| \| \|	Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250901111630.1018573-5-clement.mathieu--drif@eviden.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	intel_iommu: Declare PRI constants and structures	CLEMENT MATHIEU--DRIF	2025-10-05	1	-0/+49
\| \| \| \| \| \| \|	Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250901111630.1018573-4-clement.mathieu--drif@eviden.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	intel_iommu: Bypass barrier wait descriptor	CLEMENT MATHIEU--DRIF	2025-10-05	1	-1/+7
\| \| \| \| \| \| \| \| \| \| \|	wait_desc with SW=0,IF=0,FN=1 must not be considered as an invalid descriptor as it is used to implement section 7.10 of the VT-d spec. Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250901111630.1018573-3-clement.mathieu--drif@eviden.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	smbios: cap DIMM size to 2Tb as workaround for broken Windows	Igor Mammedov	2025-10-05	2	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With current limit set to match max spec size (2PTb), Windows fails to parse type 17 records when DIMM size reaches 4Tb+. Failure happens in GetPhysicallyInstalledSystemMemory() function, and fails "Check SMBIOS System Memory Tables" SVVP test. Though not fatal, it might cause issues for userspace apps, something like [1]. Lets cap default DIMM size to 2Tb for now, until MS fixes it. 1) https://issues.redhat.com/browse/RHEL-81999?focusedId=27731200&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-27731200 PS: It's obvious 32 int overflow math somewhere in Windows, MS admitted that it's Windows bug and in a process of fixing it. However it's unclear if W10 and earlier would get the fix. So however I dislike changing defaults, we heed to work around the issue (it looks like QEMU regression while not being it). Hopefully 2Tb/DIMM split will last longer until VM memory size will become large enough to cause to many type 17 records issue again. PPS: Alternatively, instead of messing with defaults, we can create a dedicated knob to ask for desired DIMM size cap explicitly on CLI. That will let users to enable workaround when they hit this corner case. Downside is that knob has to be propagated up all mgmt stack, which might be not desirable. PPPS: Yet alternatively, users can configure initial RAM to be less than 4Tb and all additional RAM add as DIMMs on QEMU CLI. (however it's the job to be done by mgmt which could know Windows version and total amount of RAM) Signed-off-by: Igor Mammedov <imammedo@redhat.com> Fixes: 62f182c97b ("smbios: make memory device size configurable per Machine") Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250901084915.2607632-1-imammedo@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	hw/i386/pc: Avoid overlap between CXL window and PCI 64bit BARs in QEMU	peng guo	2025-10-05	1	-9/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When using a CXL Type 3 device together with a virtio 9p device in QEMU on a physical server, the 9p device fails to initialize properly. The kernel reports the following error: virtio: device uses modern interface but does not have VIRTIO_F_VERSION_1 9pnet_virtio virtio0: probe with driver 9pnet_virtio failed with error -22 Further investigation revealed that the 64-bit BAR space assigned to the 9pnet device was overlapped by the memory window allocated for the CXL devices. As a result, the kernel could not correctly access the BAR region, causing the virtio device to malfunction. An excerpt from /proc/iomem shows: 480010000-cffffffff : CXL Window 0 480010000-4bfffffff : PCI Bus 0000:00 4c0000000-4c01fffff : PCI Bus 0000:0c 4c0000000-4c01fffff : PCI Bus 0000:0d 4c0200000-cffffffff : PCI Bus 0000:00 4c0200000-4c0203fff : 0000:00:03.0 4c0200000-4c0203fff : virtio-pci-modern To address this issue, this patch adds the reserved memory end calculation for cxl devices to reserve sufficient address space and ensure that CXL memory windows are allocated beyond all PCI 64-bit BARs. This prevents overlap with 64-bit BARs regions such as those used by virtio or other pcie devices, resolving the conflict. QEMU Build Configuration: ./configure --prefix=/home/work/qemu_master/build/ \ --target-list=x86_64-softmmu \ --enable-kvm \ --enable-virtfs QEMU Boot Command: sudo /home/work/qemu_master/qemu/build/qemu-system-x86_64 \ -nographic -machine q35,cxl=on -enable-kvm -m 16G -smp 8 \ -hda /home/work/gp_qemu/rootfs.img \ -virtfs local,path=/home/work/gp_qemu/share,mount_tag=host0,security_model=passthrough,id=host0 \ -kernel /home/work/linux_output/arch/x86/boot/bzImage \ --append "console=ttyS0 crashkernel=256M root=/dev/sda rootfstype=ext4 rw loglevel=8" \ -object memory-backend-ram,id=vmem0,share=on,size=4096M \ -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ -device cxl-rp,port=0,bus=cxl.1,id=root_port13,chassis=0,slot=2 \ -device cxl-type3,bus=root_port13,volatile-memdev=vmem0,id=cxl-vmem0,sn=0x123456789 \ -M cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G Fixes: 03b39fcf64bc ("hw/cxl: Make the CXL fixed memory window setup a machine parameter") Signed-off-by: peng guo <engguopeng@buaa.edu.cn> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250805142300.15226-1-engguopeng@buaa.edu.cn> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
*	hw/i386/pc_piix.c: remove unnecessary if() from pc_init1()	Mark Cave-Ayland	2025-09-02	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that the isapc logic has been split out of pc_piix.c, the PCI Host Bridge (phb) object is now always set in pc_init1(). Since phb is now guaranteed not to be NULL, Coverity reports that the if() statement surrounding ioapic_init_gsi() is now unnecessary and can be removed along with the phb NULL initialiser. Coverity: CID 1620557 Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Fixes: 99d0630a45 ("hw/i386/pc_piix.c: assume pcmc->pci_enabled is always true in pc_init1()") Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20250901203409.1196620-1-mark.caveayland@nutanix.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
*	hw/i386/isapc.c: replace rom_memory with system_memory	Mark Cave-Ayland	2025-08-29	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Now that we can guarantee the isapc machine will never have a PCI bus, any instances of rom_memory can be replaced by system_memory and rom_memory removed completely. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250828111057.468712-20-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: replace rom_memory with pci_memory	Mark Cave-Ayland	2025-08-29	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \|	Now that we can guarantee the i440fx-pc machine will always have a PCI bus, any instances of rom_memory can be replaced by pci_memory and rom_memory removed completely. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250828111057.468712-19-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: remove unused headers after isapc machine split	Mark Cave-Ayland	2025-08-29	1	-3/+0
\| \| \| \| \| \| \| \| \| \| \|	The headers for isapc-only devices can be removed from pc_piix.c since they are no longer used by the i440fx-pc machine. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250828111057.468712-18-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386: move isapc machine to separate isapc.c file	Mark Cave-Ayland	2025-08-29	4	-175/+191
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that pc_init_isa() is independent of any PCI initialisation, move it into a separate isapc.c file including the ISA IDE variables which are now no longer needed for the pc-i440fx machine. This enables us to finally fix the dependency of ISAPC on I440FX in hw/i386/Kconfig. Note that as part of the move to a separate file we can see that the licence text is a verbatim copy of the MIT licence. The text originates from commit 1df912cf9e ("VL license of the day is MIT/BSD") so we can be sure that this was the original intent. As a consequence we can update the file header to use a SPDX tag as per the current project contribution guidelines. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250828111057.468712-17-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: assume pcmc->pci_enabled is always true in pc_init1()	Mark Cave-Ayland	2025-08-29	1	-117/+77
\| \| \| \| \| \| \| \| \| \| \| \|	PCI is always enabled on the pc-i440fx machine so hardcode the relevant logic in pc_init1(). Add an assert() to ensure that this is always the case at runtime as already done in pc_q35_init(). Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250828111057.468712-16-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: always initialise ISA IDE drives in pc_init_isa()	Mark Cave-Ayland	2025-08-29	1	-20/+15
\| \| \| \| \| \| \| \| \| \| \| \|	By definition an isapc machine must always use ISA IDE drives so ensure that they are always enabled. At the same time also remove the surrounding CONFIG_IDE_ISA define since it will be enabled via the ISAPC Kconfig. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250828111057.468712-15-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: remove pc_system_flash_cleanup_unused() from pc_init_isa()	Mark Cave-Ayland	2025-08-29	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	This function contains 'assert(PC_MACHINE_GET_CLASS(pcms)->pci_enabled)' and so we can safely assume that it should never be used for the isapc machine. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250828111057.468712-14-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: hardcode hole64_size to 0 in pc_init_isa()	Mark Cave-Ayland	2025-08-29	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \|	All isapc machines must have 32-bit CPUs and have no PCI 64-bit hole so it can be hardcoded to 0. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250828111057.468712-13-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: simplify RAM size logic in pc_init_isa()	Mark Cave-Ayland	2025-08-29	1	-54/+4
\| \| \| \| \| \| \| \| \| \|	All isapc machines must have 32-bit CPUs and so the RAM split logic can be hardcoded accordingly. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250828111057.468712-12-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: remove nvdimm initialisation from pc_init_isa()	Mark Cave-Ayland	2025-08-29	1	-6/+0
\| \| \| \| \| \| \| \| \| \|	NVDIMMs cannot be used by PCs from a pre-PCI era. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250828111057.468712-11-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: remove SGX initialisation from pc_init_isa()	Mark Cave-Ayland	2025-08-29	1	-1/+0
\| \| \| \| \| \| \| \| \| \|	The Intel SGX instructions only exist on recent CPUs and so would never be available on a CPU from the pre-PCI era. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250828111057.468712-10-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: remove SMI and piix4_pm initialisation from pc_init_isa()	Mark Cave-Ayland	2025-08-29	1	-19/+0
\| \| \| \| \| \| \| \| \| \|	These are based upon the PIIX4 PCI chipset and so can never be used on an isapc machine. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250828111057.468712-9-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: remove igvm initialisation from pc_init_isa()	Mark Cave-Ayland	2025-08-29	1	-10/+0
\| \| \| \| \| \| \| \| \|	According to the QEMU documentation igvm is only supported for the pc and q35 machines so remove igvm support from the isapc machine. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Link: https://lore.kernel.org/r/20250828111057.468712-8-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: remove pcmc->pci_enabled dependent initialisation from ↵	Mark Cave-Ayland	2025-08-29	1	-105/+15
\| \| \| \| \| \| \| \| \| \| \| \|	pc_init_isa() PCI code will never be used for an isapc machine. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250828111057.468712-7-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: duplicate pc_init1() into pc_isa_init()	Mark Cave-Ayland	2025-08-29	1	-1/+274
\| \| \| \| \| \| \| \| \|	This is to prepare for splitting the isapc machine into its own separate file. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250828111057.468712-6-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: inline pc_xen_hvm_init_pci() into pc_xen_hvm_init()	Mark Cave-Ayland	2025-08-29	1	-9/+4
\| \| \| \| \| \| \| \| \| \|	This helps to simplify the initialisation of the Xen hvm machine. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250828111057.468712-5-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: remove include for loader.h	Mark Cave-Ayland	2025-08-29	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \|	This header is not required since the loader functionality is handled separately by pc_memory_init() in pc.c. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250828111057.468712-4-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: restrict isapc machine to 3.5G memory	Mark Cave-Ayland	2025-08-29	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \|	Since the isapc machine is now limited to using 32-bit CPUs, add a hard restriction so that the machine cannot be started with more than 3.5G memory. This matches the default value for max_ram_below_4g if not specified and provides consistent behaviour betweem TCG and KVM accelerators. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Link: https://lore.kernel.org/r/20250828111057.468712-3-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	hw/i386/pc_piix.c: restrict isapc machine to 32-bit CPUs	Mark Cave-Ayland	2025-08-29	1	-0/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The isapc machine represents a legacy ISA PC with a 486 CPU. Whilst it is possible to specify any CPU via -cpu on the command line, it makes no sense to allow modern 64-bit CPUs to be used. Restrict the isapc machine to the available 32-bit CPUs, taking care to handle the case where if a user inadvertently uses either -cpu max or -cpu host then the "best" 32-bit CPU is used (in this case the pentium3). Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/r/20250828111057.468712-2-mark.caveayland@nutanix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
*	Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging	Richard Henderson	2025-08-28	1	-1/+1
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* rust: declare self as qemu_api for proc-macros * rust/qemu-api-macros: make derive(Object) friendly when missing parent * x86/loader: Don't update kernel header for CoCo VMs * target/i386: Add support for save/load of exception error code * i386/tcg/svm: fix incorrect canonicalization * scripts/minikconf.py: small fixes # -----BEGIN PGP SIGNATURE----- # # iQFIBAABCgAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmivPVYUHHBib256aW5p # QHJlZGhhdC5jb20ACgkQv/vSX3jHroNi/wf/VvAfmXDNgiffoXl91cF8kx2zSs8L # D+pd/ufVEkFYsU1EnHUsGJKK0XrjHp/beCGkWZr9nTP448n1t5MiTYgI9z5Lkult # hwBQMZsxbOLw4BItbh9obWC5HrfHqgpy88hsfy+RfiSU31ae4drzottDm3/VbaFY # 2d0x9ai8lvaTk+GqBV8EeeCT210tS/Cb/8HC22o+vC2O2/cztnuCj6wtD43ocDEk # lhT00edP8jUX4EoPAx18Qkv/zzPL/p9jWVAFCcE/IZ/e4LSrgA61aUyoP9vvrjWh # U+f8C4MV2o8oZ1lM9FC5hJ0LdQbeq1kxqqukQIKYlRiFXjD3LZ+3wJueHQ== # =XEsN # -----END PGP SIGNATURE----- # gpg: Signature made Thu 28 Aug 2025 03:16:06 AM AEST # gpg: using RSA key F13338574B662389866C7682BFFBD25F78C7AE83 # gpg: issuer "pbonzini@redhat.com" # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [unknown] # gpg: aka "Paolo Bonzini <pbonzini@redhat.com>" [unknown] # gpg: WARNING: The key's User ID is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4 E2F7 7E15 100C CD36 69B1 # Subkey fingerprint: F133 3857 4B66 2389 866C 7682 BFFB D25F 78C7 AE83 * tag 'for-upstream' of https://gitlab.com/bonzini/qemu: rust: move dependencies to rust/Cargo.toml rust: declare self as qemu_api for proc-macros rust/qemu-api-macros: make derive(Object) friendly when missing parent subprojects: update proc-macro2 and syn rust: qemu-api-macros: support matching more than one error rust: disable borrow_as_ptr warning kvm/kvm-all: make kvm_park/unpark_vcpu local to kvm-all.c i386/tcg/svm: fix incorrect canonicalization x86/loader: Don't update kernel header for CoCo VMs MAINTAINERS: add a few more files to "Top Level Makefile and configure" python: mkvenv: fix messages printed by mkvenv scripts/minikconf.py: s/Error/KconfigParserError scripts/minikconf.py: fix invalid attribute access target/i386: Add support for save/load of exception error code Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
\| *	x86/loader: Don't update kernel header for CoCo VMs	Xiaoyao Li	2025-08-27	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Update the header makes it different from the original kernel that user provides via "-kernel", which leads to a different hash and breaks the attestation, e.g., for TDX. We already skip it for SEV VMs. Instead of adding another check of is_tdx_vm() to cover the TDX case, check machine->cgs to cover all the confidential computing case for x86. Reported-by: Vikrant Garg <vikrant1garg@gmail.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20250814092111.2353598-1-xiaoyao.li@intel.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>