path: root/hw/i386/intel_iommu_internal.h

Commit log (commit message, author, date, files changed, lines changed):

* intel_iommu: Enable Enhanced Set Root Table Pointer Support (ESRTPS)  [Zhenzhong Duan, 2025-10-05, 1 file, -0/+1]

According to VTD spec rev 4.1 section 6.6:

"For implementations reporting the Enhanced Set Root Table Pointer Support
(ESRTPS) field as Clear, on a 'Set Root Table Pointer' operation, software
must perform a global invalidate of the context cache, PASID-cache (if
applicable), and IOTLB, in that order. This is required to ensure hardware
references only the remapping structures referenced by the new root table
pointer and not stale cached entries.

For implementations reporting the Enhanced Set Root Table Pointer Support
(ESRTPS) field as Set, as part of 'Set Root Table Pointer' operation,
hardware performs global invalidation on all DMA remapping translation
caches and hence software is not required to perform additional
invalidations"

We already implemented the ESRTPS capability in vtd_handle_gcmd_srtp() by
calling vtd_reset_caches(), so just set ESRTPS in DMAR_CAP_REG to avoid
unnecessary global invalidation requests of context, PASID-cache and IOTLB
from the guest.

This change doesn't impact migration as the content of DMAR_CAP_REG is
migrated too.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20250929034206.439266-2-zhenzhong.duan@intel.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Add PRI operations support  [CLEMENT MATHIEU--DRIF, 2025-10-05, 1 file, -0/+2]

Implement the PRI callbacks in vtd_iommu_ops.

Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20250901111630.1018573-6-clement.mathieu--drif@eviden.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Declare PRI constants and structures  [CLEMENT MATHIEU--DRIF, 2025-10-05, 1 file, -0/+49]

Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Message-ID: <20250901111630.1018573-4-clement.mathieu--drif@eviden.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Declare supported PASID size  [CLEMENT MATHIEU--DRIF, 2025-07-15, 1 file, -0/+1]

The PSS field of the extended capabilities stores the supported PASID size
minus 1. This commit adds support for 8-bit PASIDs (limited by
MemTxAttrs::pid).

Signed-off-by: Clement Mathieu--Drif <clement.mathieu--drif@eviden.com>
Message-Id: <20250628180226.133285-6-clement.mathieu--drif@eviden.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

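For reference, the PSS encoding is simply the supported width minus one; a
one-line illustration (the macro names here are made up, not the QEMU
definitions):

    /* ECAP.PSS holds (maximum PASID width - 1); an 8-bit PASID limit is
     * therefore advertised as 7. */
    #define EXAMPLE_PASID_WIDTH  8
    #define EXAMPLE_ECAP_PSS     (EXAMPLE_PASID_WIDTH - 1)   /* == 7 */
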
* intel_iommu: Introduce a property x-flts for stage-1 translation  [Zhenzhong Duan, 2025-01-15, 1 file, -0/+2]

Intel VT-d 3.0 introduces scalable mode, which has a bunch of capabilities
related to scalable mode translation, so there are multiple combinations.
This vIOMMU implementation wants to simplify it with a new property
"x-flts". When turned on in scalable mode, stage-1 translation is
supported. When turned on in legacy mode, an error is reported.

With stage-1 translation support exposed to the user, also make the pasid
entry check in vtd_pe_type_check() more accurate.

Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Message-Id: <20241212083757.605022-19-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Add support for PASID-based device IOTLB invalidation  [Clément Mathieu--Drif, 2025-01-15, 1 file, -0/+11]

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20241212083757.605022-14-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Process PASID-based iotlb invalidation  [Zhenzhong Duan, 2025-01-15, 1 file, -0/+3]

PASID-based iotlb (piotlb) is used when walking the Intel VT-d stage-1
page table. This emulates the stage-1 page table iotlb invalidation
requested by a PASID-based IOTLB Invalidate Descriptor (P_IOTLB).

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Message-Id: <20241212083757.605022-12-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Set accessed and dirty bits during stage-1 translation  [Clément Mathieu--Drif, 2025-01-15, 1 file, -0/+3]

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20241212083757.605022-10-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Check if the input address is canonical  [Clément Mathieu--Drif, 2025-01-15, 1 file, -0/+1]

Stage-1 translation must fail if the address to translate is not
canonical.

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Message-Id: <20241212083757.605022-8-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

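A canonicality test is usually written as a sign-extension check; the sketch
below is illustrative only (not the QEMU helper) and assumes aw is the
virtual address width of the stage-1 page table, with 0 < aw < 64:

    #include <stdbool.h>
    #include <stdint.h>

    /* An address is canonical when bits [63:aw] are copies of bit aw-1,
     * i.e. the value survives sign-extension from aw bits. */
    static bool addr_is_canonical(uint64_t addr, unsigned aw)
    {
        int64_t sext = (int64_t)(addr << (64 - aw)) >> (64 - aw);

        return (uint64_t)sext == addr;
    }
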
* intel_iommu: Implement stage-1 translation  [Yi Liu, 2025-01-15, 1 file, -0/+34]

This adds stage-1 page table walking to support stage-1 only translation
in scalable mode.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Co-developed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20241212083757.605022-7-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Rename slpte to pte  [Yi Liu, 2025-01-15, 1 file, -12/+12]

Because we will support both FST (a.k.a. FLT) and SST (a.k.a. SLT)
translation, rename variables and functions from slpte to pte wherever
possible. Those that are SST only are renamed with an sl_ prefix.

Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Co-developed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Message-Id: <20241212083757.605022-6-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Flush stage-2 cache in PASID-selective PASID-based iotlb invalidation  [Zhenzhong Duan, 2025-01-15, 1 file, -5/+9]

Per VT-d spec 4.1, 6.5.2.4, "Table 21. PASID-based-IOTLB Invalidation",
PASID-selective PASID-based iotlb invalidation will flush stage-2 iotlb
entries with matching domain id and pasid.

With stage-1 translation introduced, the guest could send PASID-selective
PASID-based iotlb invalidation to flush either stage-1 or stage-2 entries.

By this chance, remove old IOTLB related definitions which were unused.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20241212083757.605022-5-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Use the latest fault reasons defined by spec  [Yu Zhang, 2025-01-15, 1 file, -1/+8]

Spec revision 3.0 or above defines more detailed fault reasons for
scalable mode, so introduce them into the emulation code; see spec section
7.1.2 for details.

Note that the spec revision has no relation to the VERSION register. The
guest kernel should not use that register to judge what features are
supported; instead, the cap/ecap bits should be checked.

Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20241212083757.605022-2-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Add missed reserved bit check for IEC descriptor  [Zhenzhong Duan, 2024-11-04, 1 file, -0/+3]

The IEC descriptor is a 128-bit invalidation descriptor; it must be padded
with 128 bits of 0s in the upper bytes to create a 256-bit descriptor when
the invalidation queue is configured for 256-bit descriptors
(IQA_REG.DW=1).

Fixes: 02a2cbc872df ("x86-iommu: introduce IEC notifiers")
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20241104125536.1236118-4-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Add missed sanity check for 256-bit invalidation queue  [Zhenzhong Duan, 2024-11-04, 1 file, -0/+1]

According to the VTD spec, a 256-bit descriptor will result in an invalid
descriptor error if submitted in an IQ that is set up to provide hardware
with 128-bit descriptors (IQA_REG.DW=0).

Meanwhile, there are old invalidation descriptor types (e.g.
iotlb_inv_desc) that can be either 128 bits or 256 bits. A 128-bit version
of such a descriptor submitted into an IQ that is set up to provide
hardware with 256-bit descriptors will also result in an invalid
descriptor error. The second case is captured by the tail register update,
so we only need to focus on the first.

Because the reserved bit checks for the different types of invalidation
descriptors are common, introduce a common function
vtd_inv_desc_reserved_check() to do all the checks and pass the
differences as parameters.

With this change, we need to replace the error_report_once() call with
error_report() to catch different call sites. This isn't an issue, as
error_report_once() here is mainly used to help debug guest errors, but it
only dumps once in the qemu life cycle and doesn't help much; we need
error_report() instead.

Fixes: c0c1d351849b ("intel_iommu: add 256 bits qi_desc support")
Suggested-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20241104125536.1236118-3-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

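The shape of such a shared check might look roughly like the sketch below;
the signature and struct are illustrative assumptions, not the actual
vtd_inv_desc_reserved_check() definition:

    #include <stdbool.h>
    #include <stdint.h>

    struct example_inv_desc {        /* a 256-bit descriptor as four qwords */
        uint64_t val[4];
    };

    /* One helper validates the reserved bits of any descriptor type; the
     * per-type masks and the 128- vs 256-bit queue width are parameters. */
    static bool example_inv_desc_reserved_ok(const struct example_inv_desc *d,
                                             const uint64_t mask[4],
                                             bool dw_256bit)
    {
        if ((d->val[0] & mask[0]) || (d->val[1] & mask[1])) {
            return false;
        }
        if (dw_256bit && ((d->val[2] & mask[2]) || (d->val[3] & mask[3]))) {
            return false;
        }
        return true;
    }
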
* intel_iommu: Introduce property "stale-tm" to control Transient Mapping (TM) field  [Zhenzhong Duan, 2024-11-04, 1 file, -6/+6]

The VT-d spec removed the Transient Mapping (TM) field from second-level
page-tables and treats the field as Reserved(0) since revision 3.2.

Changing the field to reserved(0) would break backward compatibility, so
introduce a property "stale-tm" to allow the user to control the setting.

Use pc_compat_9_1 to handle the compatibility for machines before 9.2,
which allow the guest to set the field. Starting from 9.2, this field is
reserved(0) by default to match the spec. Of course, the user can force it
on the command line.

This doesn't impact the function of the vIOMMU, as there was no logic to
emulate Transient Mapping.

Suggested-by: Yi Liu <yi.l.liu@intel.com>
Suggested-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Message-Id: <20241028022514.806657-1-zhenzhong.duan@intel.com>
Reviewed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Fix invalidation descriptor type field  [Zhenzhong Duan, 2024-09-11, 1 file, -5/+6]

According to the spec, the invalidation descriptor type is 7 bits, which
is the concatenation of bits[11:9] and bits[3:0] of the invalidation
descriptor. Currently we only pick bits[3:0] as the invalidation type and
treat bits[11:9] as reserved zero. This is not a problem for now, as
bits[11:9] are zero for all current invalidation types, but it would break
if a newer type occupied bits[11:9].

Fix it by taking bits[11:9] into the type and making the reserved bits
check accurate.

Suggested-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Message-Id: <20240814071321.2621384-2-zhenzhong.duan@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

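A sketch of the extraction this describes, assuming lo holds the low 64 bits
of the descriptor (an illustrative helper, not the QEMU macro):

    #include <stdint.h>

    /* The 7-bit type is bits[11:9] of the descriptor placed above bits[3:0]. */
    static inline unsigned inv_desc_type(uint64_t lo)
    {
        return (unsigned)((((lo >> 9) & 0x7) << 4) | (lo & 0xf));
    }
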
* intel_iommu: fix type of the mask field in VTDIOTLBPageInvInfo  [Clément Mathieu--Drif, 2024-07-21, 1 file, -1/+1]

The mask can overflow, as am can be larger than 8 according to chapter
6.5.2.3 (IOTLB Invalidate). Use uint64_t to avoid overflows.

Fixes: b5a280c00840 ("intel-iommu: add IOTLB using hash table")
Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Reviewed-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Message-Id: <20240709142557.317271-4-clement.mathieu--drif@eviden.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

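To see why a wider type is needed: the invalidation covers 2^am pages, so the
page-number mask needs more than 8 bits once am exceeds 8. A small
illustration (not the QEMU code):

    #include <stdint.h>

    /* Mask that clears the low 'am' bits of a page number; with am > 8 the
     * value no longer fits in a uint8_t, so a narrow field would silently
     * truncate it. */
    static uint64_t page_number_mask(unsigned am)
    {
        return ~((1ULL << am) - 1);
    }
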
* intel_iommu: move VTD_FRCD_PV and VTD_FRCD_PP declarations  [Clément Mathieu--Drif, 2024-07-21, 1 file, -2/+2]

These 2 macros are for the high 64 bits of the FRCD registers, so their
declarations have to be moved accordingly.

Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Reviewed-by: Minwoo Im <minwoo.im@samsung.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Message-Id: <20240709142557.317271-3-clement.mathieu--drif@eviden.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: fix FRCD construction macro  [Clément Mathieu--Drif, 2024-07-21, 1 file, -1/+1]

The constant must be unsigned, otherwise the two's complement overrides
the other fields when a PASID is present.

Fixes: 1b2b12376c8a ("intel-iommu: PASID support")
Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com>
Reviewed-by: Yi Liu <yi.l.liu@intel.com>
Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Minwoo Im <minwoo.im@samsung.com>
Message-Id: <20240709142557.317271-2-clement.mathieu--drif@eviden.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

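The underlying C pitfall, shown generically (this illustrates the
sign-extension effect, it is not the actual macro):

    #include <assert.h>
    #include <stdint.h>

    int main(void)
    {
        int32_t  pp_signed = (int32_t)0x80000000u;         /* bit 31 set, signed type */
        uint64_t bad       = (uint64_t)(int64_t)pp_signed; /* 0xffffffff80000000: upper fields clobbered */
        uint64_t good      = 0x80000000ULL;                /* 0x0000000080000000: other fields intact */

        assert(bad != good);
        return 0;
    }
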
* intel-iommu: Report interrupt remapping faults, fix return value  [David Woodhouse, 2023-10-22, 1 file, -0/+1]

A generic X86IOMMUClass->int_remap function should not return VT-d
specific values; fix it to return 0 if the interrupt was successfully
translated or -EINVAL if not.

The VTD_FR_IR_xxx values are supposed to be used to actually raise faults
through the fault reporting mechanism, so do that instead for the case
where the IRQ is actually being injected.

There is more work to be done here, as pretranslations for the KVM IRQ
routing table can't fault; an untranslatable IRQ should be handled in
userspace and the fault raised only when the IRQ actually happens (if
indeed the IRTE is still not valid at that time). But we can work on that
later; we can at least raise faults for the direct case.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <31bbfc9041690449d3ac891f4431ec82174ee1b4.camel@infradead.org>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* hw/i386/intel_iommu: Fix struct VTDInvDescIEC on big endian hosts  [Thomas Huth, 2023-08-03, 1 file, -0/+9]

On big endian hosts, we need to reverse the bitfield order in the struct
VTDInvDescIEC, just like it is already done for the other bitfields in the
various structs of the intel-iommu device.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Message-Id: <20230802135723.178083-4-thuth@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>

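The usual pattern for such fixes is to declare the bitfields in both orders
behind an endianness check; the sketch below uses made-up field widths and a
stand-in macro rather than the real VTDInvDescIEC layout:

    #include <stdint.h>

    /* C bitfield allocation order depends on the host ABI, so structs that
     * overlay hardware descriptors declare their fields in reverse order on
     * big endian hosts. */
    #if defined(EXAMPLE_HOST_BIG_ENDIAN)
    struct example_iec_word0 {
        uint32_t index      : 16,
                 index_mask : 5,
                 reserved   : 6,
                 granularity: 1,
                 type       : 4;     /* declaration order reversed */
    };
    #else
    struct example_iec_word0 {
        uint32_t type       : 4,
                 granularity: 1,
                 reserved   : 6,
                 index_mask : 5,
                 index      : 16;
    };
    #endif
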
* intel_iommu: refine iotlb hash calculation  [Jason Wang, 2023-04-24, 1 file, -3/+3]

Commit 1b2b12376c8 ("intel-iommu: PASID support") takes PASID into account
when calculating the iotlb hash like:

    static guint vtd_iotlb_hash(gconstpointer v)
    {
        const struct vtd_iotlb_key *key = v;

        return key->gfn | ((key->sid) << VTD_IOTLB_SID_SHIFT) |
               (key->level) << VTD_IOTLB_LVL_SHIFT |
               (key->pasid) << VTD_IOTLB_PASID_SHIFT;
    }

This turns out to be problematic since:

- the shift will lose bits if not converting to uint64_t
- level should be off by one in order to fit into 2 bits
- VTD_IOTLB_PASID_SHIFT is 30 but PASID is 20 bits which will waste some
  bits
- the hash result is uint64_t so we will lose bits when converting to
  guint

So this patch fixes them by

- converting the keys into uint64_t before doing the shift
- off level by one to make it fit into two bits
- change the sid, lvl and pasid shift to 26, 42 and 44 in order to take
  the full width of uint64_t
- perform an XOR to the top 32bit with the bottom 32bit for the final
  result to fit guint

Fixes: Coverity CID 1508100
Fixes: 1b2b12376c8 ("intel-iommu: PASID support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20230412073510.7158-1-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>

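Putting those four fixes together, the corrected hash has roughly the
following shape (a sketch using the shifts quoted above; the struct layout
and names are illustrative, not the exact QEMU code):

    #include <stdint.h>

    struct example_iotlb_key {
        uint64_t gfn;
        uint32_t pasid;       /* 20 bits used */
        uint16_t sid;
        uint8_t  level;       /* stored off by one so it fits in 2 bits */
    };

    static uint32_t example_iotlb_hash(const struct example_iotlb_key *key)
    {
        /* Promote every field to uint64_t before shifting, spread the
         * fields across the full 64-bit width (sid at 26, level-1 at 42,
         * pasid at 44), then fold the top 32 bits into the bottom 32 bits
         * so the result fits a guint. */
        uint64_t hash64 = key->gfn |
                          ((uint64_t)key->sid << 26) |
                          ((uint64_t)(key->level - 1) << 42) |
                          ((uint64_t)key->pasid << 44);

        return (uint32_t)(hash64 ^ (hash64 >> 32));
    }
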
* intel-iommu: PASID support  [Jason Wang, 2022-11-07, 1 file, -2/+14]

This patch introduces ECAP_PASID via "x-pasid-mode".

Based on the existing support for scalable mode, we need to implement the
following missing parts:

1) tag VTDAddressSpace with PASID and support IOMMU/DMA translation with
   PASID
2) tag IOTLB with PASID
3) PASID cache and its flush
4) PASID based IOTLB invalidation

For simplicity the PASID cache is not implemented, so we can simply
implement the PASID cache flush as a no-op and leave it to be implemented
in the future. For PASID based IOTLB invalidation, since we haven't had L1
stage support, the PASID based IOTLB invalidation is not implemented yet.
For PASID based device IOTLB invalidation, it requires support from vhost,
so we forbid enabling device IOTLB when PASID is enabled for now. That
work could be done in the future.

Note that though PASID based IOMMU translation is ready, no device can
issue PASID DMA right now. In this case, PCI_NO_PASID is used as the PASID
to identify the address space without PASID. vtd_find_add_as() has been
extended to provision an address space with PASID, which could be utilized
by a future extension of the PCI core to allow device models to use PASID
based DMA translation.

This feature would be useful for:

1) prototyping PASID support for devices like virtio
2) future vPASID work
3) future PRS and vSVA work

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20221028061436.30093-5-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel-iommu: block output address in interrupt address range  [Jason Wang, 2022-05-16, 1 file, -0/+4]

According to vtd spec v3.3 3.14:

"""
Software must not program paging-structure entries to remap any address to
the interrupt address range. Untranslated requests and translation
requests that result in an address in the interrupt range will be blocked
with condition code LGN.4 or SGN.8.
"""

This patch blocks requests that result in an address in the interrupt
address range.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220210092815.45174-2-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>

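The check this implies is a range-overlap test against the architectural x86
interrupt address window 0xFEE00000-0xFEEFFFFF; a sketch (function and macro
names are illustrative, and size is assumed to be at least 1):

    #include <stdbool.h>
    #include <stdint.h>

    #define EXAMPLE_INTR_RANGE_FIRST 0xfee00000ULL
    #define EXAMPLE_INTR_RANGE_LAST  0xfeefffffULL

    /* True if any byte of [addr, addr + size) falls in the interrupt range,
     * in which case the translated request must be blocked, not remapped. */
    static bool output_in_interrupt_range(uint64_t addr, uint64_t size)
    {
        return addr <= EXAMPLE_INTR_RANGE_LAST &&
               addr + size - 1 >= EXAMPLE_INTR_RANGE_FIRST;
    }
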
* intel-iommu: remove VTD_FR_RESERVED_ERR  [Jason Wang, 2022-05-16, 1 file, -5/+0]

This fault reason is not used and duplicates the SPT.2 condition code, so
let's remove it.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220210092815.45174-1-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>

* intel_iommu: support snoop control  [Jason Wang, 2022-03-06, 1 file, -0/+1]

SC is required for some kernel features like vhost-vDPA, so this patch
implements the basic SC feature.

The idea is pretty simple: software emulated DMA is always coherent, so in
that case we can simply advertise the ECAP_SC bit. For VFIO and vhost,
things are much more complicated, so this patch simply fails the IOMMU
notifier registration.

In the future, we may want to have a dedicated notifier flag or similar
mechanism to demonstrate the coherency, so VFIO could advertise that if it
has VFIO_DMA_CC_IOMMU; for the vhost kernel backend we don't need that
since it's a software backend.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220214060346.72455-1-jasowang@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel-iommu: ignore leaf SNP bit in scalable mode  [Jason Wang, 2021-11-29, 1 file, -0/+2]

When booting with scalable mode, I hit this error:

qemu-system-x86_64: vtd_iova_to_slpte: detected splte reserve non-zero iova=0xfffff002, level=0x1slpte=0x102681803)
qemu-system-x86_64: vtd_iommu_translate: detected translation failure (dev=01:00:00, iova=0xfffff002)
qemu-system-x86_64: New fault is not recorded due to compression of faults

This is because the SNP bit is set for the second level page table since
Linux kernel commit 6c00612d0cba1 ("iommu/vt-d: Report right snoop
capability when using FL for IOVA") even if SC is not supported by the
hardware.

To unbreak the guest, ignore the leaf SNP bit for scalable mode first. In
the future we may consider adding SC support.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20211129033618.3857-1-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>

* intel_iommu: Use correct shift for 256 bits qi descriptor  [Liu Yi L, 2020-07-22, 1 file, -1/+2]

In chapter 10.4.23 of VT-d spec 3.0, the Descriptor Width bit was
introduced in VTD_IQA_REG. Software can set this bit to tell VT-d that the
QI descriptors from software will be 256 bits. Accordingly, the
VTD_IQH_QH_SHIFT should be 5 when the descriptor size is 256 bits. This
patch adds the DW bit check when deciding the shift used to update
VTD_IQH_REG.

Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Message-Id: <1593850035-35483-1-git-send-email-yi.l.liu@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

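The arithmetic behind the two shift values, as a quick illustration (not the
QEMU code):

    #include <stdbool.h>
    #include <stdint.h>

    /* The queue head register advances in units of one descriptor:
     * 128-bit descriptors are 16 bytes (shift 4), 256-bit descriptors are
     * 32 bytes (shift 5). */
    static unsigned iqh_shift(bool dw_256bit)
    {
        return dw_256bit ? 5 : 4;
    }

    static uint64_t iqh_byte_offset(uint64_t head_index, bool dw_256bit)
    {
        return head_index << iqh_shift(dw_256bit);
    }
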
* intel_iommu: add present bit check for pasid table entries  [Liu Yi L, 2020-01-06, 1 file, -0/+1]

The present bit check for the pasid entry (pe) and pasid directory entry
(pdire) was missed in previous commits, as the fpd bit check doesn't
require the present bit to be "Set". This patch adds the present bit check
for callers which want to get a valid pe/pdire.

Cc: qemu-stable@nongnu.org
Cc: Kevin Tian <kevin.tian@intel.com>
Cc: Jacob Pan <jacob.jun.pan@linux.intel.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Liu Yi L <yi.l.liu@intel.com>
Message-Id: <1578058086-4288-3-git-send-email-yi.l.liu@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: TM field should not be in reserved bits  [Qi, Yadong, 2019-11-25, 1 file, -3/+10]

When dt is supported, the TM field should not be Reserved(0). Refer to
VT-d Spec 9.8.

Signed-off-by: Zhang, Qi <qi1.zhang@intel.com>
Signed-off-by: Qi, Yadong <yadong.qi@intel.com>
Message-Id: <20191125003321.5669-3-yadong.qi@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: refine SL-PEs reserved fields checking  [Qi, Yadong, 2019-11-25, 1 file, -4/+1]

1. Split the reserved fields arrays into two.
2. Large pages only take effect for L2 (2M) and L3 (1G), so remove the
   large-page checking of L1 and L4.

Signed-off-by: Zhang, Qi <qi1.zhang@intel.com>
Signed-off-by: Qi, Yadong <yadong.qi@intel.com>
Message-Id: <20191125003321.5669-2-yadong.qi@intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Drop extended root field  [Peter Xu, 2019-04-02, 1 file, -1/+0]

VTD_RTADDR_RTT is dropped even by the VT-d spec, so QEMU should probably
do the same thing (after all we never really implemented it). Since we've
had a field for that in the migration stream, to keep compatibility we
need to fill the hole up.

Please refer to VT-d spec 10.4.6.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20190329061422.7926-3-peterx@redhat.com>
Reviewed-by: Liu, Yi L <yi.l.liu@intel.com>
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: add scalable-mode option to make scalable mode work  [Yi Sun, 2019-03-12, 1 file, -0/+4]

This patch adds an option to provide flexibility for the user to expose
Scalable Mode to the guest. The user can expose Scalable Mode to the guest
with a config like below:

    -device intel-iommu,caching-mode=on,scalable-mode=on

The Linux iommu driver has supported scalable mode. Please refer to the
patch set below:

    https://www.spinics.net/lists/kernel/msg2985279.html

Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Message-Id: <1551753295-30167-4-git-send-email-yi.y.sun@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: add 256 bits qi_desc support  [Liu, Yi L, 2019-03-12, 1 file, -1/+8]

Per Intel(R) VT-d 3.0, the qi_desc is 256 bits in Scalable Mode. This
patch adds emulation of the 256-bit qi_desc.

Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
[Yi Sun is co-developer to rebase and refine the patch.]
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Message-Id: <1551753295-30167-3-git-send-email-yi.y.sun@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: scalable mode emulation  [Liu, Yi L, 2019-03-12, 1 file, -2/+39]

The Intel(R) VT-d 3.0 spec introduces scalable mode address translation to
replace extended context mode. This patch extends the current emulator to
support Scalable Mode, which includes the root table, context table and
new pasid table format changes. Now intel_iommu emulates both legacy mode
and scalable mode (with the legacy-equivalent capability set).

The key points are below:

1. Extend root table operations to support both legacy mode and scalable
   mode.
2. Extend context table operations to support both legacy mode and
   scalable mode.
3. Add pasid table operations to support scalable mode.

Signed-off-by: Liu, Yi L <yi.l.liu@intel.com>
[Yi Sun is co-developer to contribute much to refine the whole commit.]
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Message-Id: <1551753295-30167-2-git-send-email-yi.y.sun@linux.intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>

* intel_iommu: dma read/write draining support  [Peter Xu, 2018-12-19, 1 file, -0/+3]

Supporting DMA read/write draining is easy for the existing VT-d
emulation, since the emulation itself does not have any request queue, so
we don't need to do anything to flush an uncommitted queue. What we need
to do is to declare the support.

These capabilities are required to pass the Windows SVVP test program. It
is verified that with the parameters "x-aw-bits=48,caching-mode=off" we
can pass the Windows SVVP test with this patch applied. Otherwise we fail
with:

IOMMU[0] - DWD (DMA write draining) not supported
IOMMU[0] - DWD (DMA read draining) not supported
Segment 0 has no DMA remapping capable IOMMU units

However, since these bits are not declared as supported for QEMU<=3.1, we
need a compatibility bit for it, and we turn this on by default only for
QEMU>=4.0.

Please refer to VT-d spec 6.5.4 for more information.

CC: Yu Wang <wyu@redhat.com>
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1654550
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel-iommu: Extend address width to 48 bits  [Prasad Singamsetty, 2018-01-18, 1 file, -6/+3]

The current implementation of the Intel IOMMU code only supports a 39-bit
iova address width. This patch provides a new parameter (x-aw-bits) for
intel-iommu to extend its address width to 48 bits, but keeps the default
the same (39 bits). The reason for not changing the default is to avoid
potential compatibility problems with live migration of intel-iommu
enabled QEMU guests. The only valid values for the 'x-aw-bits' parameter
are 39 and 48.

After enabling the larger address width (48), we should be able to map
larger iova addresses in the guest, for example in a QEMU guest that is
configured with large memory (>=1TB). To check whether 48-bit aw is
enabled, we can grep in the guest dmesg output for the line:
"DMAR: Host address width 48".

Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel-iommu: Redefine macros to enable supporting 48 bit address width  [Prasad Singamsetty, 2018-01-18, 1 file, -8/+26]

The current implementation of the Intel IOMMU code only supports a 39-bit
host/iova address width, so a number of macros use hard-coded values based
on that. This patch redefines them so they can be used with variable
address widths. This patch doesn't add any new functionality but enables
adding support for a 48-bit address width.

Signed-off-by: Prasad Singamsetty <prasad.singamsety@oracle.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: fix iova for pt  [Peter Xu, 2017-08-02, 1 file, -1/+0]

IOMMUTLBEntry.iova is returned incorrectly on one PT path (though mostly
we cannot really trigger this path, and even if we do, we are mostly
discarding this value, so it didn't break anything).

Fix it by converting the VTD_PAGE_MASK into the correct definition
VTD_PAGE_MASK_4K, then remove VTD_PAGE_MASK.

Fixes: b93130 ("intel_iommu: cleanup vtd_{do_}iommu_translate()")
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: cleanup vtd_{do_}iommu_translate()  [Peter Xu, 2017-06-16, 1 file, -0/+1]

First, let vtd_do_iommu_translate() return a status, so that we explicitly
know whether an error occurred. Meanwhile, we make sure that the
IOMMUTLBEntry is filled in there.

Then, clean up vtd_iommu_translate a bit, so even with PT we'll get a log
now. Also, remove useless assignments.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: support passthrough (PT)  [Peter Xu, 2017-05-25, 1 file, -0/+1]

Hardware support for VT-d device passthrough. Although current Linux can
live with iommu=pt even without this, this is faster than using software
passthrough.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Liu, Yi L <yi.l.liu@linux.intel.com>
Reviewed-by: Jason Wang <jasowang@redhat.com>

* intel_iommu: enable remote IOTLB  [Peter Xu, 2017-04-20, 1 file, -0/+1]

This patch is based on Aviv Ben-David (<bd.aviv@gmail.com>)'s patch
upstream:

"IOMMU: enable intel_iommu map and unmap notifiers"
https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg01453.html

However I removed/fixed some content, and added my own code.

Instead of translate()-ing every page for iotlb invalidations (which is
slower), we walk the pages when needed and notify in a hook function.

This patch enables vfio devices for VT-d emulation. And, since we already
have vhost DMAR support via device-iotlb, a natural benefit that this
patch brings is that vt-d enabled vhost can live even without ATS
capability now. Though more tests are needed.

Signed-off-by: Aviv Ben-David <bdaviv@cs.technion.ac.il>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: "Michael S. Tsirkin" <mst@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1491562755-23867-10-git-send-email-peterx@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

* intel_iommu: add "caching-mode" option  [Aviv Ben-David, 2017-02-17, 1 file, -0/+1]

This capability asks the guest to invalidate cache before each map
operation. We can use this invalidation to trap map operations in the
hypervisor.

Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
[peterx: using "caching-mode" instead of "cache-mode" to align with spec]
[peterx: re-write the subject to make it short and clear]
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Aviv Ben-David <bd.aviv@gmail.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: support device iotlb descriptor  [Jason Wang, 2017-01-10, 1 file, -2/+11]

This patch enables device IOTLB support for intel iommu. The major work
is to implement QI device IOTLB descriptor processing and notify the
device through the iommu notifier.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>

* intel_iommu: fixing source id during IOTLB hash key calculation  [Jason Wang, 2016-11-15, 1 file, -1/+1]

Using uint8_t for the source id will lose the bus number and get a
wrong/invalid IOTLB entry. Fix this by using uint16_t instead and
enlarging the level shift.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Eduardo Habkost <ehabkost@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

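As a worked illustration of the truncation (a sketch, not the QEMU hashing
code): a PCI source id is the 8-bit bus number concatenated with the 8-bit
devfn, so storing it in a uint8_t drops the bus number and lets devices on
different buses collide on the same IOTLB key:

    #include <stdint.h>
    #include <stdio.h>

    static uint16_t make_sid(uint8_t bus, uint8_t devfn)
    {
        return (uint16_t)((bus << 8) | devfn);   /* bus[15:8] | devfn[7:0] */
    }

    int main(void)
    {
        uint16_t sid       = make_sid(0x01, 0x10);
        uint8_t  truncated = (uint8_t)sid;       /* the old, too-narrow type */

        printf("sid=0x%04x truncated=0x%02x\n", sid, truncated); /* bus lost */
        return 0;
    }
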
* intel_iommu: support all masks in interrupt entry cache invalidation  [Radim Krčmář, 2016-07-21, 1 file, -0/+1]

Linux guests do not gracefully handle cases where the invalidation mask
they wanted is not supported, probably because real hardware always
allowed all of them.

We can just say that all 16 masks are supported, because both
ioapic_iec_notifier and kvm_update_msi_routes_all invalidate all caches.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Add support for Extended Interrupt Mode  [Jan Kiszka, 2016-07-21, 1 file, -0/+2]

As neither QEMU nor KVM supports more than 255 CPUs so far, this is
simple: we only need to switch the destination ID translation in
vtd_remap_irq_get if EIME is set.

Once CFI support is there, it will have to take EIM into account as well.
So far, nothing to do for this.

This patch allows using x2APIC in split irqchip mode of KVM.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
[use le32_to_cpu() to retrieve dest_id]
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* x86-iommu: introduce IEC notifiers  [Peter Xu, 2016-07-21, 1 file, -4/+20]

This patch introduces the x86 IOMMU IEC (Interrupt Entry Cache)
invalidation notifier list. When the vIOMMU receives an IEC invalidate
request, all the registered units will be notified with specific
invalidation requests. The Intel IOMMU is the first provider that
generates such an event.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

* intel_iommu: Add support for PCI MSI remap  [Peter Xu, 2016-07-20, 1 file, -0/+2]

This patch enables interrupt remapping for PCI devices. To play the trick,
one memory region "iommu_ir" is added as a child region of the original
iommu memory region, covering the range 0xfeeXXXXX (which is the address
range for the APIC). All writes to this range will be taken as MSI, and
translation is carried out only when IR is enabled.

Idea suggested by Paolo Bonzini.

Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>