summary refs log tree commit diff stats
path: root/hw/i386/intel_iommu.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* hw/i386: Constify all PropertyRichard Henderson2024-12-151-1/+1
| | | | | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* intel_iommu: Add missed reserved bit check for IEC descriptorZhenzhong Duan2024-11-041-0/+8
| | | | | | | | | | | | IEC descriptor is 128-bit invalidation descriptor, must be padded with 128-bits of 0s in the upper bytes to create a 256-bit descriptor when the invalidation queue is configured for 256-bit descriptors (IQA_REG.DW=1). Fixes: 02a2cbc872df ("x86-iommu: introduce IEC notifiers") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20241104125536.1236118-4-zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* intel_iommu: Add missed sanity check for 256-bit invalidation queueZhenzhong Duan2024-11-041-22/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to VTD spec, a 256-bit descriptor will result in an invalid descriptor error if submitted in an IQ that is setup to provide hardware with 128-bit descriptors (IQA_REG.DW=0). Meanwhile, there are old inv desc types (e.g. iotlb_inv_desc) that can be either 128bits or 256bits. If a 128-bit version of this descriptor is submitted into an IQ that is setup to provide hardware with 256-bit descriptors will also result in an invalid descriptor error. The 2nd will be captured by the tail register update. So we only need to focus on the 1st. Because the reserved bit check between different types of invalidation desc are common, so introduce a common function vtd_inv_desc_reserved_check() to do all the checks and pass the differences as parameters. With this change, need to replace error_report_once() call with error_report() to catch different call sites. This isn't an issue as error_report_once() here is mainly used to help debug guest error, but it only dumps once in qemu life cycle and doesn't help much, we need error_report() instead. Fixes: c0c1d351849b ("intel_iommu: add 256 bits qi_desc support") Suggested-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20241104125536.1236118-3-zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* intel_iommu: Send IQE event when setting reserved bit in IQT_TAILZhenzhong Duan2024-11-041-0/+1
| | | | | | | | | | | | | | | | According to VTD spec, Figure 11-22, Invalidation Queue Tail Register, "When Descriptor Width (DW) field in Invalidation Queue Address Register (IQA_REG) is Set (256-bit descriptors), hardware treats bit-4 as reserved and a value of 1 in the bit will result in invalidation queue error." Current code missed to send IQE event to guest, fix it. Fixes: c0c1d351849b ("intel_iommu: add 256 bits qi_desc support") Suggested-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20241104125536.1236118-2-zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* intel_iommu: Introduce property "stale-tm" to control Transient Mapping (TM) ↵Zhenzhong Duan2024-11-041-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | field VT-d spec removed Transient Mapping (TM) field from second-level page-tables and treat the field as Reserved(0) since revision 3.2. Changing the field as reserved(0) will break backward compatibility, so introduce a property "stale-tm" to allow user to control the setting. Use pc_compat_9_1 to handle the compatibility for machines before 9.2 which allow guest to set the field. Starting from 9.2, this field is reserved(0) by default to match spec. Of course, user can force it on command line. This doesn't impact function of vIOMMU as there was no logic to emulate Transient Mapping. Suggested-by: Yi Liu <yi.l.liu@intel.com> Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Message-Id: <20241028022514.806657-1-zhenzhong.duan@intel.com> Reviewed-by: Clément Mathieu--Drif<clement.mathieu--drif@eviden.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* hw: Use device_class_set_legacy_reset() instead of opencodingPeter Maydell2024-09-131-1/+1
| | | | | | | | | | | | | Use device_class_set_legacy_reset() instead of opencoding an assignment to DeviceClass::reset. This change was produced with: spatch --macro-file scripts/cocci-macro-file.h \ --sp-file scripts/coccinelle/device-reset.cocci \ --keep-comments --smpl-spacing --in-place --dir hw Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20240830145812.1967042-8-peter.maydell@linaro.org
* intel_iommu: Make PASID-cache and PIOTLB type invalid in legacy modeZhenzhong Duan2024-09-111-11/+11
| | | | | | | | | | | | | | | In vtd_process_inv_desc(), VTD_INV_DESC_PC and VTD_INV_DESC_PIOTLB are bypassed without scalable mode check. These two types are not valid in legacy mode and we should report error. Fixes: 4a4f219e8a10 ("intel_iommu: add scalable-mode option to make scalable mode work") Suggested-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Clément Mathieu--Drif<clement.mathieu--drif@eviden.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Message-Id: <20240814071321.2621384-3-zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* intel_iommu: Fix invalidation descriptor type fieldZhenzhong Duan2024-09-111-1/+1
| | | | | | | | | | | | | | | | | | | | According to spec, invalidation descriptor type is 7bits which is concatenation of bits[11:9] and bits[3:0] of invalidation descriptor. Currently we only pick bits[3:0] as the invalidation type and treat bits[11:9] as reserved zero. This is not a problem for now as bits[11:9] is zero for all current invalidation types. But it will break if newer type occupies bits[11:9]. Fix it by taking bits[11:9] into type and make reserved bits check accurate. Suggested-by: Clément Mathieu--Drif<clement.mathieu--drif@eviden.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Clément Mathieu--Drif<clement.mathieu--drif@eviden.com> Message-Id: <20240814071321.2621384-2-zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* intel_iommu: Fix for IQA reg read dropped DW fieldyeeli2024-08-011-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If VT-D hardware supports scalable mode, Linux will set the IQA DW field (bit11). In qemu, the vtd_mem_write and vtd_update_iq_dw set DW field well. However, vtd_mem_read the DW field wrong because "& VTD_IQA_QS" dropped the value of DW. Replace "&VTD_IQA_QS" with "& (VTD_IQA_QS | VTD_IQA_DW_MASK)" could save the DW field. Test patch as below: config the "x-scalable-mode" option: "-device intel-iommu,caching-mode=on,x-scalable-mode=on,aw-bits=48" After Linux OS boot, check the IQA_REG DW Field by usage 1 or 2: 1. IOMMU_DEBUGFS: Before fix: cat /sys/kernel/debug/iommu/intel/iommu_regset |grep IQA IQA 0x90 0x00000001001da001 After fix: cat /sys/kernel/debug/iommu/intel/iommu_regset |grep IQA IQA 0x90 0x00000001001da801 Check DW field(bit11) is 1. 2. devmem2 read the IQA_REG (offset 0x90): Before fix: devmem2 0xfed90090 /dev/mem opened. Memory mapped at address 0x7f72c795b000. Value at address 0xFED90090 (0x7f72c795b090): 0x1DA001 After fix: devmem2 0xfed90090 /dev/mem opened. Memory mapped at address 0x7fc95281c000. Value at address 0xFED90090 (0x7fc95281c090): 0x1DA801 Check DW field(bit11) is 1. Signed-off-by: yeeli <seven.yi.lee@gmail.com> Message-Id: <20240725031858.1529902-1-seven.yi.lee@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* Merge tag 'hw-misc-20240723' of https://github.com/philmd/qemu into stagingRichard Henderson2024-07-241-24/+33
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Misc HW patch queue - Restrict probe_access*() functions to TCG (Phil) - Extract do_invalidate_device_tlb from vtd_process_device_iotlb_desc (Clément) - Fixes in Loongson IPI model (Bibo & Phil) - Make docs/interop/firmware.json compatible with qapi-gen.py script (Thomas) - Correct MPC I2C MMIO region size (Zoltan) - Remove useless cast in Loongson3 Virt machine (Yao) - Various uses of range overlap API (Yao) - Use ERRP_GUARD macro in nubus_virtio_mmio_realize (Zhao) - Use DMA memory API in Goldfish UART model (Phil) - Expose fifo8_pop_buf and introduce fifo8_drop (Phil) - MAINTAINERS updates (Zhao, Phil) # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmagFF8ACgkQ4+MsLN6t # wN5bKg//f5TwUhsy2ff0FJpHheDOj/9Gc2nZ1U/Fp0E5N3sz3A7MGp91wye6Xwi3 # XG34YN9LK1AVzuCdrEEs5Uaxs1ZS1R2mV+fZaGHwYYxPDdnXxGyp/2Q0eyRxzbcN # zxE2hWscYSZbPVEru4HvZJKfp4XnE1cqA78fJKMAdtq0IPq38tmQNRlJ+gWD9dC6 # ZUHXPFf3DnucvVuwqb0JYO/E+uJpcTtgR6pc09Xtv/HFgMiS0vKZ1I/6LChqAUw9 # eLMpD/5V2naemVadJe98/dL7gIUnhB8GTjsb4ioblG59AO/uojutwjBSQvFxBUUw # U5lX9OSn20ouwcGiqimsz+5ziwhCG0R6r1zeQJFqUxrpZSscq7NQp9ygbvirm+wS # edLc8yTPf4MtYOihzPP9jLPcXPZjEV64gSnJISDDFYWANCrysX3suaFEOuVYPl+s # ZgQYRVSSYOYHgNqBSRkPKKVUxskSQiqLY3SfGJG4EA9Ktt5lD1cLCXQxhdsqphFm # Ws3zkrVVL0EKl4v/4MtCgITIIctN1ZJE9u3oPJjASqSvK6EebFqAJkc2SidzKHz0 # F3iYX2AheWNHCQ3HFu023EvFryjlxYk95fs2f6Uj2a9yVbi813qsvd3gcZ8t0kTT # +dmQwpu1MxjzZnA6838R6OCMnC+UpMPqQh3dPkU/5AF2fc3NnN8= # =J/I2 # -----END PGP SIGNATURE----- # gpg: Signature made Wed 24 Jul 2024 06:36:47 AM AEST # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full] * tag 'hw-misc-20240723' of https://github.com/philmd/qemu: (28 commits) MAINTAINERS: Add myself as a reviewer of machine core MAINTAINERS: Cover guest-agent in QAPI schema util/fifo8: Introduce fifo8_drop() util/fifo8: Expose fifo8_pop_buf() util/fifo8: Rename fifo8_pop_buf() -> fifo8_pop_bufptr() util/fifo8: Rename fifo8_peek_buf() -> fifo8_peek_bufptr() util/fifo8: Use fifo8_reset() in fifo8_create() util/fifo8: Fix style chardev/char-fe: Document returned value on error hw/char/goldfish: Use DMA memory API hw/nubus/virtio-mmio: Fix missing ERRP_GUARD() in realize handler dump: make range overlap check more readable crypto/block-luks: make range overlap check more readable system/memory_mapping: make range overlap check more readable sparc/ldst_helper: make range overlap check more readable cxl/mailbox: make range overlap check more readable util/range: Make ranges_overlap() return bool hw/mips/loongson3_virt: remove useless type cast hw/i2c/mpc_i2c: Fix mmio region size docs/interop/firmware.json: convert "Example" section ... Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
| * hw/i386/intel_iommu: Extract device IOTLB invalidation logicClément Mathieu--Drif2024-07-231-24/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | This piece of code can be shared by both IOTLB invalidation and PASID-based IOTLB invalidation No functional changes intended. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-ID: <20240718081636.879544-12-zhenzhong.duan@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
* | intel_iommu: make type matchClément Mathieu--Drif2024-07-211-1/+1
|/ | | | | | | | | | | | | The 'level' field in vtd_iotlb_key is an unsigned integer. We don't need to store level as an int in vtd_lookup_iotlb. This is not an issue by itself, but using unsigned here seems cleaner. Signed-off-by: Clément Mathieu--Drif <clement.mathieu--drif@eviden.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Message-Id: <20240709142557.317271-5-clement.mathieu--drif@eviden.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* intel_iommu: Check compatibility with host IOMMU capabilitiesZhenzhong Duan2024-06-241-0/+29
| | | | | | | | | | | | | | If check fails, host device (either VFIO or VDPA device) is not compatible with current vIOMMU config and should not be passed to guest. Only aw_bits is checked for now, we don't care about other caps before scalable modern mode is introduced. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
* intel_iommu: Implement [set|unset]_iommu_device() callbacksYi Liu2024-06-241-0/+81
| | | | | | | | | | | | | | | | | Implement [set|unset]_iommu_device() callbacks in Intel vIOMMU. In set call, we take a reference of HostIOMMUDevice and store it in hash table indexed by PCI BDF. Note this BDF index is device's real BDF not the aliased one which is different from the index of VTDAddressSpace. There can be multiple assigned devices under same virtual iommu group and share same VTDAddressSpace, but each has its own HostIOMMUDevice. Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
* intel_iommu: Extract out vtd_cap_init() to initialize cap/ecapZhenzhong Duan2024-06-241-42/+51
| | | | | | | | | | | Extract cap/ecap initialization in vtd_cap_init() to make code cleaner. No functional change intended. Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
* hw/i386/iommu: Constify IOMMUTLBEvent in vtd_page_walk_hook prototypePhilippe Mathieu-Daudé2024-06-191-4/+4
| | | | | | | | @event access is read-only. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20240612132532.85928-4-philmd@linaro.org>
* hw/i386/pc: Rename "bus" attribute to "pcibus"Bernhard Beschow2024-02-271-1/+1
| | | | | | | | | | | | The attribute is of type PCIBus; reflect that in the name. It will also make the next change more intuitive. Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Bernhard Beschow <shentey@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-ID: <20240224135851.100361-3-shentey@gmail.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
* intel_iommu: allow Extended Interrupt Mode when using userspace APICBui Quang Minh2024-02-141-5/+1
| | | | | | | | | | | | As userspace APIC now supports x2APIC, intel interrupt remapping hardware can be set to EIM mode when userspace local APIC is used. Suggested-by: Joao Martins <joao.m.martins@oracle.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Message-Id: <20240111154404.5333-5-minhquangbui99@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* system/cpus: rename qemu_mutex_lock_iothread() to bql_lock()Stefan Hajnoczi2024-01-081-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Big QEMU Lock (BQL) has many names and they are confusing. The actual QemuMutex variable is called qemu_global_mutex but it's commonly referred to as the BQL in discussions and some code comments. The locking APIs, however, are called qemu_mutex_lock_iothread() and qemu_mutex_unlock_iothread(). The "iothread" name is historic and comes from when the main thread was split into into KVM vcpu threads and the "iothread" (now called the main loop thread). I have contributed to the confusion myself by introducing a separate --object iothread, a separate concept unrelated to the BQL. The "iothread" name is no longer appropriate for the BQL. Rename the locking APIs to: - void bql_lock(void) - void bql_unlock(void) - bool bql_locked(void) There are more APIs with "iothread" in their names. Subsequent patches will rename them. There are also comments and documentation that will be updated in later patches. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Paul Durrant <paul@xen.org> Acked-by: Fabiano Rosas <farosas@suse.de> Acked-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Cédric Le Goater <clg@kaod.org> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Acked-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-id: 20240102153529.486531-2-stefanha@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* hw/i386: Constify VMStateRichard Henderson2023-12-291-1/+1
| | | | | | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20231221031652.119827-32-richard.henderson@linaro.org>
* Merge tag 'pull-target-arm-20231106' of ↵Stefan Hajnoczi2023-11-071-3/+20
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://git.linaro.org/people/pmaydell/qemu-arm into staging target-arm queue: * hw/arm/virt: fix PMU IRQ registration * hw/arm/virt: Report correct register sizes in ACPI DBG2/SPCR tables * hw/i386/intel_iommu: vtd_slpte_nonzero_rsvd(): assert no overflow * util/filemonitor-inotify: qemu_file_monitor_watch(): assert no overflow * mc146818rtc: rtc_set_time(): initialize tm to zeroes * block/nvme: nvme_process_completion() fix bound for cid * hw/core/loader: gunzip(): initialize z_stream * io/channel-socket: qio_channel_socket_flush(): improve msg validation * hw/arm/vexpress-a9: Remove useless mapping of RAM at address 0 * target/arm: Fix A64 LDRA immediate decode # -----BEGIN PGP SIGNATURE----- # # iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmVJBtUZHHBldGVyLm1h # eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3qYTEACYqLV57JezgRFXzMEwKX3l # 9IYbFje+lGemobdJOEHhRvXjCNb+5TwhEfQasri0FBzokw16S3WOOF7roGb6YOU1 # od1SGiS2AbrmiazlBpamVO8z0WAEgbnXIoQa/3xKAGPJXszD2zK+06KnXS5xuCuD # nHojzIx7Gv4HEIs4huY39/YL2HMaxrqvXC8IAu51eqY+TPnETT+WI3HxlZ2OMIsn # 1Jnn+FeZfA1bhKx4JsD9MyHM1ovbjOwYkHOlzjU6fmTFFPGKRy0nxnjMNCBcXHQ+ # unemc/9BhEFup76tkX+JIlSBrPre5Mnh93DsGKSapwKPKq+fQhUDmzXY2r3OvQZX # ryxO4PJkCNTM1wZU6GeEDPWVfhgBKHUMv+tr9Mf9iBlyXRsmXLSEl7AFUUaFlgAL # dSMyiAaUlfvGa7Gtta9eFAJ/GeaiuJu2CYq6lvtRrNIHflLm3gVCef8gmwM5Eqxm # 3PNzEoabKyQQfz69j9RCLpoutMBq1sg2IzxW8UjAFupugcIABjLf0Sl11qA0/B89 # YX67B0ynQD9ajI2GS8ULid/tvEiJVgdZ2Ua3U3xpG54vKG1/54EUiCP8TtoIuoMy # bKg8AU9EIPN962PxoAwS+bSSdCu7/zBjVpg4T/zIzWRdgSjRsE21Swu5Ca934ng5 # VpVUuiwtI/zvHgqaiORu+w== # =UbqJ # -----END PGP SIGNATURE----- # gpg: Signature made Mon 06 Nov 2023 23:31:33 HKT # gpg: using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE # gpg: issuer "peter.maydell@linaro.org" # gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [full] # gpg: aka "Peter Maydell <pmaydell@gmail.com>" [full] # gpg: aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [full] # gpg: aka "Peter Maydell <peter@archaic.org.uk>" [unknown] # Primary key fingerprint: E1A5 C593 CD41 9DE2 8E83 15CF 3C25 25ED 1436 0CDE * tag 'pull-target-arm-20231106' of https://git.linaro.org/people/pmaydell/qemu-arm: target/arm: Fix A64 LDRA immediate decode hw/arm/vexpress-a9: Remove useless mapping of RAM at address 0 io/channel-socket: qio_channel_socket_flush(): improve msg validation hw/core/loader: gunzip(): initialize z_stream block/nvme: nvme_process_completion() fix bound for cid mc146818rtc: rtc_set_time(): initialize tm to zeroes util/filemonitor-inotify: qemu_file_monitor_watch(): assert no overflow hw/i386/intel_iommu: vtd_slpte_nonzero_rsvd(): assert no overflow tests/qtest/bios-tables-test: Update virt SPCR and DBG2 golden references hw/arm/virt: Report correct register sizes in ACPI DBG2/SPCR tables. tests/qtest/bios-tables-test: Allow changes to virt SPCR and DBG2 hw/arm/virt: fix PMU IRQ registration Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
| * hw/i386/intel_iommu: vtd_slpte_nonzero_rsvd(): assert no overflowVladimir Sementsov-Ogievskiy2023-11-061-3/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We support only 3- and 4-level page-tables, which is firstly checked in vtd_decide_config(), then setup in vtd_init(). Than level fields are checked by vtd_is_level_supported(). So here we can't have level out from 1..4 inclusive range. Let's assert it. That also explains Coverity that we are not going to overflow the array. CID: 1487158, 1487186 Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru> Message-id: 20231017125941.810461-2-vsementsov@yandex-team.ru Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* | hw/pci: modify pci_setup_iommu() to set PCIIOMMUOpsYi Liu2023-11-031-1/+5
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch modifies pci_setup_iommu() to set PCIIOMMUOps instead of setting PCIIOMMUFunc. PCIIOMMUFunc is used to get an address space for a PCI device in vendor specific way. The PCIIOMMUOps still offers this functionality. But using PCIIOMMUOps leaves space to add more iommu related vendor specific operations. Cc: Kevin Tian <kevin.tian@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Peter Xu <peterx@redhat.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Yi Sun <yi.y.sun@linux.intel.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Xu <peterx@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Andrey Smirnov <andrew.smirnov@gmail.com> Cc: Helge Deller <deller@gmx.de> Cc: Hervé Poussineau <hpoussin@reactos.org> Cc: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Cc: BALATON Zoltan <balaton@eik.bme.hu> Cc: Elena Ufimtseva <elena.ufimtseva@oracle.com> Cc: Jagannathan Raman <jag.raman@oracle.com> Cc: Matthew Rosato <mjrosato@linux.ibm.com> Cc: Eric Farman <farman@linux.ibm.com> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Thomas Huth <thuth@redhat.com> Cc: Helge Deller <deller@gmx.de> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> [ clg: - refreshed on latest QEMU - included hw/remote/iommu.c - documentation update - asserts in pci_setup_iommu() - removed checks on iommu_bus->iommu_ops->get_address_space - included Elroy PCI host (PA-RISC) ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
* Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Stefan Hajnoczi2023-10-231-48/+102
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging virtio,pc,pci: features, cleanups infrastructure for vhost-vdpa shadow work piix south bridge rework reconnect for vhost-user-scsi dummy ACPI QTG DSM for cxl tests, cleanups, fixes all over the place Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmU06PMPHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpNIsH/0DlKti86VZLJ6PbNqsnKxoK2gg05TbEhPZU # pQ+RPDaCHpFBsLC5qsoMJwvaEQFe0e49ZFemw7bXRzBxgmbbNnZ9ArCIPqT+rvQd # 7UBmyC+kacVyybZatq69aK2BHKFtiIRlT78d9Izgtjmp8V7oyKoz14Esh8wkE+FT # ypHUa70Addi6alNm6BVkm7bxZxi0Wrmf3THqF8ViYvufzHKl7JR5e17fKWEG0BqV # 9W7AeHMnzJ7jkTvBGUw7g5EbzFn7hPLTbO4G/VW97k0puS4WRX5aIMkVhUazsRIa # zDOuXCCskUWuRapiCwY0E4g7cCaT8/JR6JjjBaTgkjJgvo5Y8Eg= # =ILek # -----END PGP SIGNATURE----- # gpg: Signature made Sun 22 Oct 2023 02:18:43 PDT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (62 commits) intel-iommu: Report interrupt remapping faults, fix return value MAINTAINERS: Add include/hw/intc/i8259.h to the PC chip section vhost-user: Fix protocol feature bit conflict tests/acpi: Update DSDT.cxl with QTG DSM hw/cxl: Add QTG _DSM support for ACPI0017 device tests/acpi: Allow update of DSDT.cxl hw/i386/cxl: ensure maxram is greater than ram size for calculating cxl range vhost-user: fix lost reconnect vhost-user-scsi: start vhost when guest kicks vhost-user-scsi: support reconnect to backend vhost: move and rename the conn retry times vhost-user-common: send get_inflight_fd once hw/i386/pc_piix: Make PIIX4 south bridge usable in PC machine hw/isa/piix: Implement multi-process QEMU support also for PIIX4 hw/isa/piix: Resolve duplicate code regarding PCI interrupt wiring hw/isa/piix: Reuse PIIX3's PCI interrupt triggering in PIIX4 hw/isa/piix: Rename functions to be shared for PCI interrupt triggering hw/isa/piix: Reuse PIIX3 base class' realize method in PIIX4 hw/isa/piix: Share PIIX3's base class with PIIX4 hw/isa/piix: Harmonize names of reset control memory regions ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
| * intel-iommu: Report interrupt remapping faults, fix return valueDavid Woodhouse2023-10-221-48/+102
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A generic X86IOMMUClass->int_remap function should not return VT-d specific values; fix it to return 0 if the interrupt was successfully translated or -EINVAL if not. The VTD_FR_IR_xxx values are supposed to be used to actually raise faults through the fault reporting mechanism, so do that instead for the case where the IRQ is actually being injected. There is more work to be done here, as pretranslations for the KVM IRQ routing table can't fault; an untranslatable IRQ should be handled in userspace and the fault raised only when the IRQ actually happens (if indeed the IRTE is still not valid at that time). But we can work on that later; we can at least raise faults for the direct case. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Message-Id: <31bbfc9041690449d3ac891f4431ec82174ee1b4.camel@infradead.org> Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | hw/i386/intel_iommu: Do not use SysBus API to map local MMIO regionPhilippe Mathieu-Daudé2023-10-191-3/+2
|/ | | | | | | | | | | | | | | | | | | | | | | | There is no point in exposing an internal MMIO region via SysBus and directly mapping it in the very same device. Just map it without using the SysBus API. Transformation done using the following coccinelle script: @@ expression sbdev; expression index; expression addr; expression subregion; @@ - sysbus_init_mmio(sbdev, subregion); ... when != sbdev - sysbus_mmio_map(sbdev, index, addr); + memory_region_add_subregion(get_system_memory(), addr, subregion); Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20231018141151.87466-3-philmd@linaro.org>
* intel_iommu: Fix shadow local variables on "size"Peter Xu2023-09-291-4/+4
| | | | | | | | | | | | | | | | | | | | | | This patch fixes the warning of shadowed local variable: ../hw/i386/intel_iommu.c: In function ‘vtd_address_space_unmap’: ../hw/i386/intel_iommu.c:3773:18: warning: declaration of ‘size’ shadows a previous local [-Wshadow=compatible-local] 3773 | uint64_t size = mask + 1; | ^~~~ ../hw/i386/intel_iommu.c:3747:12: note: shadowed declaration is here 3747 | hwaddr size, remain; | ^~~~ Cc: Jason Wang <jasowang@redhat.com> Cc: Eric Auger <eric.auger@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-ID: <20230922160410.138786-1-peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
* i386: spelling fixesMichael Tokarev2023-09-201-2/+2
| | | | | Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
* target/i386: Allow elision of kvm_enable_x2apic()Philippe Mathieu-Daudé2023-09-071-1/+1
| | | | | | | | | | | Call kvm_enabled() before kvm_enable_x2apic() to let the compiler elide its call. Cleanup the code by simplifying "!xen_enabled() && kvm_enabled()" to just "kvm_enabled()". Suggested-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20230904124325.79040-8-philmd@linaro.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* hw/i386/intel_iommu: Fix index calculation in vtd_interrupt_remap_msi()Thomas Huth2023-08-031-1/+1
| | | | | | | | | | | | The values in "addr" are populated locally in this function in host endian byte order, so we must not swap the index_l field here. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230802135723.178083-5-thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com>
* hw/i386/intel_iommu: Fix endianness problems related to VTD_IR_TableEntryThomas Huth2023-08-031-8/+8
| | | | | | | | | | | | | | | | | | | | | | | The code already tries to do some endianness handling here, but currently fails badly: - While it already swaps the data when logging errors / tracing, it fails to byteswap the value before e.g. accessing entry->irte.present - entry->irte.source_id is swapped with le32_to_cpu(), though this is a 16-bit value - The whole union is apparently supposed to be swapped via the 64-bit data[2] array, but the struct is a mixture between 32 bit values (the first 8 bytes) and 64 bit values (the second 8 bytes), so this cannot work as expected. Fix it by converting the struct to two proper 64-bit bitfields, and by swapping the values only once for everybody right after reading the data from memory. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230802135723.178083-3-thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
* hw/i386/intel_iommu: Fix trivial endianness problemsThomas Huth2023-08-031-0/+5
| | | | | | | | | | | | | After reading the guest memory with dma_memory_read(), we have to make sure that we byteswap the little endian data to the host's byte order. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20230802135723.178083-2-thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com>
* intel_iommu: Fix address space unmapZhenzhong Duan2023-06-261-1/+1
| | | | | | | | | | | | | | | | During address space unmap, corresponding IOVA tree entries are also removed. But DMAMap is set beyond notifier's scope by 1, so in theory there is possibility to remove a continuous entry above the notifier's scope but falling in adjacent notifier's scope. There is no issue currently as no use cases allocate notifiers continuously, but let's be robust. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20230615032626.314476-4-zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* intel_iommu: Fix flag check in replayZhenzhong Duan2023-06-261-1/+1
| | | | | | | | | | | | | | | Replay doesn't notify registered notifiers but the one passed to it. So it's meaningless to check the registered notifier's synthetic flag. There is no issue currently as all replay use cases have MAP flag set, but let's be robust. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20230615032626.314476-3-zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* intel_iommu: Fix a potential issue in VFIO dirty page syncZhenzhong Duan2023-06-261-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | Peter Xu found a potential issue: "The other thing is when I am looking at the new code I found that we actually extended the replay() to be used also in dirty tracking of vfio, in vfio_sync_dirty_bitmap(). For that maybe it's already broken if unmap_all() because afaiu log_sync() can be called in migration thread anytime during DMA so I think it means the device is prone to DMA with the IOMMU pgtable quickly erased and rebuilt here, which means the DMA could fail unexpectedly. Copy Alex, Kirti and Neo." Fix it by replacing the unmap_all() to only evacuate the iova tree (keeping all host mappings untouched, IOW, don't notify UNMAP), and do a full resync in page walk which will notify all existing mappings as MAP. This way we don't interrupt with any existing mapping if there is (e.g. for the dirty sync case), meanwhile we keep sync too to latest (for moving a vfio device into an existing iommu group). Suggested-by: Peter Xu <peterx@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Peter Xu <peterx@redhat.com> Message-Id: <20230615032626.314476-2-zhenzhong.duan@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* intel_iommu: refine iotlb hash calculationJason Wang2023-04-241-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1b2b12376c8 ("intel-iommu: PASID support") takes PASID into account when calculating iotlb hash like: static guint vtd_iotlb_hash(gconstpointer v) { const struct vtd_iotlb_key *key = v; return key->gfn | ((key->sid) << VTD_IOTLB_SID_SHIFT) | (key->level) << VTD_IOTLB_LVL_SHIFT | (key->pasid) << VTD_IOTLB_PASID_SHIFT; } This turns out to be problematic since: - the shift will lose bits if not converting to uint64_t - level should be off by one in order to fit into 2 bits - VTD_IOTLB_PASID_SHIFT is 30 but PASID is 20 bits which will waste some bits - the hash result is uint64_t so we will lose bits when converting to guint So this patch fixes them by - converting the keys into uint64_t before doing the shift - off level by one to make it fit into two bits - change the sid, lvl and pasid shift to 26, 42 and 44 in order to take the full width of uint64_t - perform an XOR to the top 32bit with the bottom 32bit for the final result to fit guint Fixes: Coverity CID 1508100 Fixes: 1b2b12376c8 ("intel-iommu: PASID support") Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230412073510.7158-1-jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
* Revert "memory: Optimize replay of guest mapping"Peter Maydell2023-04-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 6da24341866fa940fd7d575788a2319514941c77 ("memory: Optimize replay of guest mapping"). This change breaks the mps3-an547 board under TCG (and probably other TCG boards using an IOMMU), which now assert: $ ./build/x86/qemu-system-arm --machine mps3-an547 -serial stdio -kernel /tmp/an547-mwe/build/test.elf qemu-system-arm: ../../softmmu/memory.c:1903: memory_region_register_iommu_notifier: Assertion `n->end <= memory_region_size(mr)' failed. This is because tcg_register_iommu_notifier() registers an IOMMU notifier which covers the entire address space, so the assertion added in this commit is not correct. For the 8.0 release, just revert this commit as it is only an optimization. Fixes: 6da24341866f ("memory: Optimize replay of guest mapping") Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 917c1c552b2d1b732f9a86c6a90684c3a5e4cada.1680640587.git.mst@redhat.com
* intel-iommu: send UNMAP notifications for domain or global inv descPeter Xu2023-03-021-5/+9
| | | | | | | | | | | | | | | We don't send UNMAP notification upon domain or global invalidation which will lead the notifier can't work correctly. One example is to use vhost remote IOTLB without enabling device IOTLB. Fixing this by sending UNMAP notification. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230223065924.42503-6-jasowang@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* intel-iommu: fail DEVIOTLB_UNMAP without dt modeJason Wang2023-03-021-0/+8
| | | | | | | | | | | | | | | | Without dt mode, device IOTLB notifier won't work since guest won't send device IOTLB invalidation descriptor in this case. Let's fail early instead of misbehaving silently. Reviewed-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Laurent Vivier <lvivier@redhat.com> Tested-by: Viktor Prutyanov <viktor@daynix.com> Buglink: https://bugzilla.redhat.com/2156876 Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230223065924.42503-3-jasowang@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* intel-iommu: fail MAP notifier without caching modeJason Wang2023-03-021-0/+7
| | | | | | | | | | | | | | | Without caching mode, MAP notifier won't work correctly since guest won't send IOTLB update event when it establishes new mappings in the I/O page tables. Let's fail the IOMMU notifiers early instead of misbehaving silently. Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Viktor Prutyanov <viktor@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230223065924.42503-2-jasowang@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* memory: Optimize replay of guest mappingZhenzhong Duan2023-03-021-1/+1
| | | | | | | | | | | | | | | | | | | On x86, there are two notifiers registered due to vtd-ir memory region splitting the whole address space. During replay of the address space for each notifier, the whole address space is scanned which is unnecessory. We only need to scan the space belong to notifier montiored space. Assert when notifier is used to monitor beyond iommu memory region's address space. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Message-Id: <20230215065238.713041-1-zhenzhong.duan@intel.com> Acked-by: Peter Xu <peterx@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* KVM: remove support for kernel-irqchip=offPaolo Bonzini2023-01-061-2/+2
| | | | | | | -machine kernel-irqchip=off is broken for many guest OSes; kernel-irqchip=split is the replacement that works, so remove the deprecated support for the former. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* intel-iommu: PASID supportJason Wang2022-11-071-101/+317
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduce ECAP_PASID via "x-pasid-mode". Based on the existing support for scalable mode, we need to implement the following missing parts: 1) tag VTDAddressSpace with PASID and support IOMMU/DMA translation with PASID 2) tag IOTLB with PASID 3) PASID cache and its flush 4) PASID based IOTLB invalidation For simplicity PASID cache is not implemented so we can simply implement the PASID cache flush as a no and leave it to be implemented in the future. For PASID based IOTLB invalidation, since we haven't had L1 stage support, the PASID based IOTLB invalidation is not implemented yet. For PASID based device IOTLB invalidation, it requires the support for vhost so we forbid enabling device IOTLB when PASID is enabled now. Those work could be done in the future. Note that though PASID based IOMMU translation is ready but no device can issue PASID DMA right now. In this case, PCI_NO_PASID is used as PASID to identify the address without PASID. vtd_find_add_as() has been extended to provision address space with PASID which could be utilized by the future extension of PCI core to allow device model to use PASID based DMA translation. This feature would be useful for: 1) prototyping PASID support for devices like virtio 2) future vPASID work 3) future PRS and vSVA work Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20221028061436.30093-5-jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* intel-iommu: convert VTD_PE_GET_FPD_ERR() to be a functionJason Wang2022-11-071-14/+28
| | | | | | | | | | | | | | | We used to have a macro for VTD_PE_GET_FPD_ERR() but it has an internal goto which prevents it from being reused. This patch convert that macro to a dedicated function and let the caller to decide what to do (e.g using goto or not). This makes sure it can be re-used for other function that requires fault reporting. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20221028061436.30093-4-jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com>
* intel-iommu: drop VTDBusJason Wang2022-11-071-118/+116
| | | | | | | | | | | | | | | | | | | | | We introduce VTDBus structure as an intermediate step for searching the address space. This works well with SID based matching/lookup. But when we want to support SID plus PASID based address space lookup, this intermediate steps turns out to be a burden. So the patch simply drops the VTDBus structure and use the PCIBus and devfn as the key for the g_hash_table(). This simplifies the codes and the future PASID extension. To prevent being slower for past vtd_find_as_from_bus_num() callers, a vtd_as cache indexed by the bus number is introduced to store the last recent search result of a vtd_as belongs to a specific bus. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20221028061436.30093-3-jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com>
* intel-iommu: don't warn guest errors when getting rid2pasid entryJason Wang2022-11-071-6/+6
| | | | | | | | | | | | | We use to warn on wrong rid2pasid entry. But this error could be triggered by the guest and could happens during initialization. So let's don't warn in this case. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20221028061436.30093-2-jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com>
* Revert "intel_iommu: Fix irqchip / X2APIC configuration checks"Peter Xu2022-10-091-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's true that when vcpus<=255 we don't require the length of 32bit APIC IDs. However here since we already have EIM=ON it means the hypervisor will declare the VM as x2apic supported (e.g. VT-d ECAP register will have EIM bit 4 set), so the guest should assume the APIC IDs are 32bits width even if vcpus<=255. In short, commit 77250171bdc breaks any simple cmdline that wants to boot a VM with >=9 but <=255 vcpus with: -device intel-iommu,intremap=on For anyone who does not want to enable x2apic, we can use eim=off in the intel-iommu parameters to skip enabling KVM x2apic. This partly reverts commit 77250171bdc02aee106083fd2a068147befa1a38, while keeping the valid bit on checking split irqchip, but revert the other change. One thing to mention is that this patch may break migration compatibility of such VM, however that's probably the best thing we can do, because the old behavior was simply wrong and not working for >8 vcpus. For <=8 vcpus, there could be a light guest ABI change (by enabling KVM x2apic after this patch), but logically it shouldn't affect the migration from working. Also, this is not the 1st commit to change x2apic behavior. Igor provided a full history of how this evolved for the past few years: https://lore.kernel.org/qemu-devel/20220922154617.57d1a1fb@redhat.com/ Relevant commits for reference: fb506e701e ("intel_iommu: reject broken EIM", 2016-10-17) c1bb5418e3 ("target/i386: Support up to 32768 CPUs without IRQ remapping", 2020-12-10) 77250171bd ("intel_iommu: Fix irqchip / X2APIC configuration checks", 2022-05-16) dc89f32d92 ("target/i386: Fix sanity check on max APIC ID / X2APIC enablement", 2022-05-16) We may want to have this for stable too (mostly for 7.1.0 only). Adding a fixes tag. Cc: David Woodhouse <dwmw2@infradead.org> Cc: Claudio Fontana <cfontana@suse.de> Cc: Igor Mammedov <imammedo@redhat.com> Fixes: 77250171bd ("intel_iommu: Fix irqchip / X2APIC configuration checks") Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20220926153206.10881-1-peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>
* util: accept iova_tree_remove_parameter by valueEugenio Pérez2022-09-021-3/+3
| | | | | | | | | | | | | | It's convenient to call iova_tree_remove from a map returned from iova_tree_find or iova_tree_find_iova. With the current code this is not possible, since we will free it, and then we will try to search for it again. Fix it making accepting the map by value, forcing a copy of the argument. Not applying a fixes tag, since there is no use like that at the moment. Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* intel-iommu: update iq_dw during post loadJason Wang2022-05-161-6/+15
| | | | | | | | | | | | | We need to update iq_dw according to the DMA_IRQ_REG during post load. Otherwise we may get wrong IOTLB invalidation descriptor after migration. Fixes: fb43cf739e ("intel_iommu: scalable mode emulation") Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220317080522.14621-2-jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>
* intel-iommu: update root_scalable before switching as during post_loadJason Wang2022-05-161-7/+7
| | | | | | | | | | | | | | | | | We need check whether passthrough is enabled during vtd_switch_address_space() by checking the context entries. This requires the root_scalable to be set correctly otherwise we may try to check legacy rsvd bits instead of scalable ones. Fixing this by updating root_scalable before switching the address spaces during post_load. Fixes: fb43cf739e ("intel_iommu: scalable mode emulation") Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20220317080522.14621-1-jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com>