summary refs log tree commit diff stats
path: root/hw/vfio/spapr.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* hw: Remove unnecessary 'system/ram_addr.h' headerPhilippe Mathieu-Daudé2025-10-071-1/+0
| | | | | | | | | | | | | | None of these files require definition exposed by "system/ram_addr.h", remove its inclusion. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Jagannathan Raman <jag.raman@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20251001175448.18933-7-philmd@linaro.org>
* vfio/spapr.c: rename VFIOContainer bcontainer field to parent_objMark Cave-Ayland2025-09-251-3/+4
| | | | | | | | | | Now that nothing accesses the bcontainer field directly, rename bcontainer to parent_obj as per our current coding guidelines. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-12-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/spapr.c: use QOM casts where appropriateMark Cave-Ayland2025-09-251-12/+7
| | | | | | | | | | Use QOM casts to convert between VFIOSpaprContainer and VFIOLegacyContainer instead of accessing bcontainer directly. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-11-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* include/hw/vfio/vfio-container.h: rename file to vfio-container-legacy.hMark Cave-Ayland2025-09-251-1/+1
| | | | | | | | | | | | With the rename of VFIOContainer to VFIOLegacyContainer, the vfio-container.h header file containing the struct definition is misleading. Rename it from vfio-container.h to vfio-container-legacy.h accordingly, fixing up the name of the include guard at the same time. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-4-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* include/hw/vfio/vfio-container-base.h: rename VFIOContainerBase to VFIOContainerMark Cave-Ayland2025-09-251-6/+6
| | | | | | | | | | | Now that the VFIOContainer struct name is available, rename VFIOContainerBase to VFIOContainer to better indicate that it is the superclass of other VFIOFooContainer structs. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-3-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* include/hw/vfio/vfio-container.h: rename VFIOContainer to VFIOLegacyContainerMark Cave-Ayland2025-09-251-9/+9
| | | | | | | | | | | | The VFIOContainer struct represents the legacy VFIO container even though the name suggests it may be the common superclass of all VFIO containers. Rename it to VFIOLegacyContainer to make this clearer, which is also a better match for its VFIO_IOMMU_LEGACY QOM type name. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-2-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/spapr.c: use QOM casts where appropriateMark Cave-Ayland2025-09-081-10/+6
| | | | | | | | | | | Use QOM casts to convert between VFIOContainer and VFIOContainerBase instead of accessing bcontainer directly. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Link: https://lore.kernel.org/qemu-devel/20250715093110.107317-7-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* Merge tag 'single-binary-20250425' of https://github.com/philmd/qemu into ↵Stefan Hajnoczi2025-04-271-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | staging Various patches loosely related to single binary work: - Replace cpu_list() definition by CPUClass::list_cpus() callback - Remove few MO_TE definitions on Hexagon / X86 targets - Remove target_ulong uses in ARMMMUFaultInfo and ARM CPUWatchpoint - Remove DEVICE_HOST_ENDIAN definition - Evaluate TARGET_BIG_ENDIAN at compile time and use target_needs_bswap() more - Rename target_words_bigendian() as target_big_endian() - Convert target_name() and target_cpu_type() to TargetInfo API - Constify QOM TypeInfo class_data/interfaces fields - Get default_cpu_type calling machine_class_default_cpu_type() - Correct various uses of GLibCompareDataFunc prototype - Simplify ARM/Aarch64 gdb_get_core_xml_file() handling a bit - Move device tree files in their own pc-bios/dtb/ subdir - Correctly check strchrnul() symbol availability on macOS SDK - Move target-agnostic methods out of cpu-target.c and accel-target.c - Unmap canceled USB XHCI packet - Use deposit/extract API in designware model - Fix MIPS16e translation - Few missing header fixes # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmgLqb8ACgkQ4+MsLN6t # wN6nCQ//cmv1M+NsndhO5TAK8T1eUSXKlTZh932uro6ZgxKwN4p+j1Qo7bq3O9gu # qUMHNbcfQl8sHSytiXBoxCjLMCXC3u38iyz75WGXuPay06rs4wqmahqxL4tyno3l # 1RviFts9xlLn+tJqqrAR6+pRdALld0TY+yXUjXgr4aK5pIRpLz9U/sIEoh7qbA5U # x0MTaceDG3A91OYo0TgrNbcMe1b9GqQZ+a4tbaP+oE37wbiKdyQ68LjrEbV08Y1O # qrFF4oxquV31QJcUiuII1W7hC6psGrMsUA1f1qDu7QvmybAZWNZNsR9T66X9jH5J # wXMShJmmXwxugohmuPPFnDshzJy90aFL6Jy2shrfqcG2v0W66ARY1ZnbJLCcfczt # 073bnE2dnOVhd/ny37RrIJNJLLmYM0yFDeKuYtNNAzpK9fpA7Q2PI8QiqNacQ3Pa # TdEYrGlMk7OeNck8xJmJMY5rATthi1D4dIBv3rjQbUolQvPJe2Y9or0R2WL1jK5v # hhr6DY01iSPES3CravmUs/aB1HRMPi/nX45OmFR6frAB7xqWMreh81heBVuoTTK8 # PuXtRQgRMRKwDeTxlc6p+zba4mIEYG8rqJtPFRgViNCJ1KsgSIowup3BNU05YuFn # NoPoRayMDVMgejVgJin3Mg2DCYvt/+MBmO4IoggWlFsXj59uUgA= # =DXnZ # -----END PGP SIGNATURE----- # gpg: Signature made Fri 25 Apr 2025 11:26:55 EDT # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full] # Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE * tag 'single-binary-20250425' of https://github.com/philmd/qemu: (58 commits) qemu: Convert target_name() to TargetInfo API accel: Move target-agnostic code from accel-target.c -> accel-common.c accel: Make AccelCPUClass structure target-agnostic accel: Include missing 'qemu/accel.h' header in accel-internal.h accel: Implement accel_init_ops_interfaces() for both system/user mode cpus: Move target-agnostic methods out of cpu-target.c cpus: Replace CPU_RESOLVING_TYPE -> target_cpu_type() qemu: Introduce target_cpu_type() qapi: Rename TargetInfo structure as QemuTargetInfo hw/microblaze: Evaluate TARGET_BIG_ENDIAN at compile time hw/mips: Evaluate TARGET_BIG_ENDIAN at compile time target/xtensa: Evaluate TARGET_BIG_ENDIAN at compile time target/mips: Check CPU endianness at runtime using env_is_bigendian() accel/kvm: Use target_needs_bswap() linux-user/elfload: Use target_needs_bswap() target/hexagon: Include missing 'accel/tcg/getpc.h' accel/tcg: Correct list of included headers in tcg-stub.c system/kvm: make functions accessible from common code meson: Use osdep_prefix for strchrnul() meson: Share common C source prefixes ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
| * qom: Have class_init() take a const data argumentPhilippe Mathieu-Daudé2025-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | Mechanical change using gsed, then style manually adapted to pass checkpatch.pl script. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250424194905.82506-4-philmd@linaro.org>
* | vfio: Move vfio_kvm_device_fd() into helpers.cCédric Le Goater2025-04-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | The vfio_kvm_device_add/del_fd() routines opening the VFIO pseudo device are defined in "helpers.c". Move 'vfio_kvm_device_fd' definition there and its declaration into "vfio-helpers.h" to reduce exposure of VFIO internals in "hw/vfio/vfio-common.h". Reviewed-by: John Levon <john.levon@nutanix.com> Link: https://lore.kernel.org/qemu-devel/20250318095415.670319-22-clg@redhat.com Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-23-clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* | vfio: Introduce a new header file for VFIOcontainer declarationsCédric Le Goater2025-04-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Gather all VFIOcontainer related declarations into "hw/vfio/vfio-container.h" to reduce exposure of VFIO internals in "hw/vfio/vfio-common.h". These declarations were initially introduced in commit 65501a745dba ("vfio: vfio-pci device assignment driver"). They are made available externally for PPC and s390x. Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: John Levon <john.levon@nutanix.com> Link: https://lore.kernel.org/qemu-devel/20250318095415.670319-12-clg@redhat.com Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-13-clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* | vfio: Move VFIOHostDMAWindow definition into spapr.cCédric Le Goater2025-04-251-0/+7
| | | | | | | | | | | | | | | | | | | | VFIOHostDMAWindow is only used in file "spapr.c". Move it there. Reviewed-by: John Levon <john.levon@nutanix.com> Link: https://lore.kernel.org/qemu-devel/20250318095415.670319-9-clg@redhat.com Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/qemu-devel/20250326075122.1299361-10-clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* | vfio/spapr: Fix L2 crash with PCI device passthrough and memory > 128GAmit Machhiwal2025-04-251-9/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An L2 KVM guest fails to boot inside a pSeries LPAR when booted with a memory more than 128 GB and PCI device passthrough. The L2 guest also crashes when it is booted with a memory greater than 128 GB and a PCI device is hotplugged later. The issue arises from a conditional check for `levels > 1` in `spapr_tce_create_table()` within L1 KVM. This check is meant to prevent multi-level TCEs, which are not supported by the PowerVM hypervisor. As a result, when QEMU makes a `VFIO_IOMMU_SPAPR_TCE_CREATE` ioctl call with `levels > 1`, it triggers the conditional check and returns `EINVAL`, causing the guest to crash with the following errors: 2025-03-04T06:36:36.133117Z qemu-system-ppc64: Failed to create a window, ret = -1 (Invalid argument) 2025-03-04T06:36:36.133176Z qemu-system-ppc64: Failed to create SPAPR window: Invalid argument qemu: hardware error: vfio: DMA mapping failed, unable to continue Fix this by checking the supported DDW "levels" returned by the VFIO_IOMMU_SPAPR_TCE_GET_INFO ioctl before attempting the TCE create ioctl in KVM. The patch has been tested on KVM guests with memory configurations of up to 390GB, and 450GB on PowerVM and bare-metal environments respectively. Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250408124042.2695955-3-amachhiw@linux.ibm.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* | vfio/spapr: Enhance error handling in vfio_spapr_create_window()Amit Machhiwal2025-04-251-17/+16
|/ | | | | | | | | | | | Introduce an Error ** parameter to vfio_spapr_create_window() to enable structured error reporting. This allows the function to propagate detailed errors back to callers. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250408124042.2695955-2-amachhiw@linux.ibm.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* include/system: Move exec/ram_addr.h to system/ram_addr.hRichard Henderson2025-04-231-1/+1
| | | | | | | | Convert the existing includes with sed. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* include/system: Move exec/address-spaces.h to system/address-spaces.hRichard Henderson2025-04-231-1/+1
| | | | | | | | Convert the existing includes with sed. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* hw/vfio/spapr: Do not include <linux/kvm.h>Philippe Mathieu-Daudé2025-03-111-3/+0
| | | | | | | | | | | | | <linux/kvm.h> is already included by "system/kvm.h" in the next line. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20250307180337.14811-3-philmd@linaro.org> Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-3-philmd@linaro.org Signed-off-by: Cédric Le Goater <clg@redhat.com>
* system: Declare qemu_[min/max]rampagesize() in 'system/hostmem.h'Philippe Mathieu-Daudé2025-03-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Both qemu_minrampagesize() and qemu_maxrampagesize() are related to host memory backends, having the following call stack: qemu_minrampagesize() -> find_min_backend_pagesize() -> object_dynamic_cast(obj, TYPE_MEMORY_BACKEND) qemu_maxrampagesize() -> find_max_backend_pagesize() -> object_dynamic_cast(obj, TYPE_MEMORY_BACKEND) Having TYPE_MEMORY_BACKEND defined in "system/hostmem.h": include/system/hostmem.h:23:#define TYPE_MEMORY_BACKEND "memory-backend" Move their prototype declaration to "system/hostmem.h". Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Eric Auger <eric.auger@redhat.com> Message-Id: <20250308230917.18907-7-philmd@linaro.org> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250311085743.21724-2-philmd@linaro.org Signed-off-by: Cédric Le Goater <clg@redhat.com>
* include: Rename sysemu/ -> system/Philippe Mathieu-Daudé2024-12-201-1/+1
| | | | | | | | | | | | | Headers in include/sysemu/ are not only related to system *emulation*, they are also used by virtualization. Rename as system/ which is clearer. Files renamed manually then mechanical change using sed tool. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Message-Id: <20241203172445.28576-1-philmd@linaro.org>
* vfio/container: Change VFIOContainerBase to use QOMCédric Le Goater2024-06-241-0/+3
| | | | | | | | | | | | | | VFIOContainerBase was made a QOM interface because we believed that a QOM object would expose all the IOMMU backends to the QEMU machine and human interface. This only applies to user creatable devices or objects. Change the VFIOContainerBase nature from interface to object and make the necessary adjustments in the VFIO_IOMMU hierarchy. Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Eric Auger <eric.auger@redhat.com> Tested-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio: Make VFIOIOMMUClass::add_window() and its wrapper return boolZhenzhong Duan2024-05-161-8/+8
| | | | | | | | | | | | | Make VFIOIOMMUClass::add_window() and its wrapper function vfio_container_add_section_window() return bool. This is to follow the coding standand to return bool if 'Error **' is used to pass error. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio: Make VFIOIOMMUClass::setup() return boolZhenzhong Duan2024-05-161-7/+5
| | | | | | | | | | This is to follow the coding standand to return bool if 'Error **' is used to pass error. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/spapr: Introduce a sPAPR VFIOIOMMU QOM interfaceCédric Le Goater2024-01-051-15/+24
| | | | | | | | | | | | | | | | Move vfio_spapr_container_setup() to a VFIOIOMMUClass::setup handler and convert the sPAPR VFIOIOMMUOps struct to a QOM interface. The sPAPR QOM interface inherits from the legacy QOM interface because because both have the same basic needs. The sPAPR interface is then extended with the handlers specific to the sPAPR IOMMU. This allows reuse and provides better abstraction of the backends. It will be useful to avoid compiling the sPAPR IOMMU backend on targets not supporting it. Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/spapr: Extend VFIOIOMMUOps with a release handlerCédric Le Goater2024-01-051-16/+19
| | | | | | | | | This allows to abstract a bit more the sPAPR IOMMU support in the legacy IOMMU backend. Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Tested-by: Eric Farman <farman@linux.ibm.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/spapr: Move hostwin_list into spapr containerZhenzhong Duan2023-12-191-16/+20
| | | | | | | | No functional changes intended. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/spapr: Move prereg_listener into spapr containerZhenzhong Duan2023-12-191-8/+16
| | | | | | | | No functional changes intended. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/spapr: switch to spapr IOMMU BE add/del_section_windowZhenzhong Duan2023-12-191-5/+14
| | | | | | | | No functional change intended. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/spapr: Introduce spapr backend and target interfaceZhenzhong Duan2023-12-191-0/+14
| | | | | | | | | | | | | Introduce an empty spapr backend which will hold spapr specific content, currently only prereg_listener and hostwin_list. Also introduce two spapr specific callbacks add/del_window into VFIOIOMMUOps. Instantiate a spapr ops with a helper setup_spapr_ops and assign it to bcontainer->ops. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/container: Move listener to base containerEric Auger2023-12-191-5/+6
| | | | | | | | | | | | | | Move listener to base container. Also error and initialized fields are moved at the same time. No functional change intended. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/container: Move pgsizes and dma_max_mappings to base containerEric Auger2023-12-191-4/+6
| | | | | | | | | | | No functional change intended. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/common: Move vfio_host_win_add/del into spapr.cZhenzhong Duan2023-11-061-0/+83
| | | | | | | | | | | | | | | | | | | | | Only spapr supports a customed host window list, other vfio driver assume 64bit host window. So remove the check in listener callback and move vfio_host_win_add/del into spapr.c and make it static. With the check removed, we still need to do the same check for VFIO_SPAPR_TCE_IOMMU which allows a single host window range [dma32_window_start, dma32_window_size). Move vfio_find_hostwin into spapr.c and do same check in vfio_container_add_section_window instead. When mapping a ram device section, if it's unaligned with hostwin->iova_pgsizes, this mapping is bypassed. With hostwin moved into spapr, we changed to check container->pgsizes. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/spapr: Make vfio_spapr_create/remove_window staticZhenzhong Duan2023-11-061-24/+24
| | | | | | | | | | | | vfio_spapr_create_window calls vfio_spapr_remove_window, With reoder of definition of the two, we can make vfio_spapr_create/remove_window static. No functional changes intended. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/container: Move spapr specific init/deinit into spapr.cZhenzhong Duan2023-11-061-1/+80
| | | | | | | | | | | | | | | | | Move spapr specific init/deinit code into spapr.c and wrap them with vfio_spapr_container_init/deinit, this way footprint of spapr is further reduced, vfio_prereg_listener could also be made static. vfio_listener_release is unnecessary when prereg_listener is moved out, so have it removed. No functional changes intended. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/container: Move vfio_container_add/del_section_window into spapr.cZhenzhong Duan2023-11-061-0/+90
| | | | | | | | | | | | vfio_container_add/del_section_window are spapr specific functions, so move them into spapr.c to make container.c cleaner. No functional changes intended. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* Replace qemu_real_host_page variables with inlined functionsMarc-André Lureau2022-04-061-4/+4
| | | | | | | | | | | | Replace the global variables with inlined helper functions. getpagesize() is very likely annotated with a "const" function attribute (at least with glibc), and thus optimization should apply even better. This avoids the need for a constructor initialization too. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Message-Id: <20220323155743.1585078-12-marcandre.lureau@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* memory: Name all the memory listenersPeter Xu2021-09-301-0/+1
| | | | | | | | | | Provide a name field for all the memory listeners. It can be used to identify which memory listener is which. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-Id: <20210817013553.30584-2-peterx@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Do not include cpu.h if it's not really necessaryThomas Huth2021-05-021-1/+0
| | | | | | | | Stop including cpu.h in files that don't need it. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-Id: <20210416171314.2074665-4-thuth@redhat.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
* vfio/spapr: Fix page size calculationAlexey Kardashevskiy2020-04-071-3/+3
| | | | | | | | | | | | | | | | | Coverity detected an issue (CID 1421903) with potential call of clz64(0) which returns 64 which make it do "<<" with a negative number. This checks the mask and avoids undefined behaviour. In practice pgsizes and memory_region_iommu_get_min_page_size() always have some common page sizes and even if they did not, the resulting page size would be 0x8000.0000.0000.0000 (gcc 9.2) and ioctl(VFIO_IOMMU_SPAPR_TCE_CREATE) would fail anyway. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20200324063912.25063-1-aik@ozlabs.ru> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* core: replace getpagesize() with qemu_real_host_page_sizeWei Yang2019-10-261-3/+4
| | | | | | | | | | | | | | | | | | | | | There are three page size in qemu: real host page size host page size target page size All of them have dedicate variable to represent. For the last two, we use the same form in the whole qemu project, while for the first one we use two forms: qemu_real_host_page_size and getpagesize(). qemu_real_host_page_size is defined to be a replacement of getpagesize(), so let it serve the role. [Note] Not fully tested for some arch or device. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> Message-Id: <20191013021145.16011-3-richardw.yang@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* vfio: Turn the container error into an Error handleEric Auger2019-10-041-1/+3
| | | | | | | | | | | | | The container error integer field is currently used to store the first error potentially encountered during any vfio_listener_region_add() call. However this fails to propagate detailed error messages up to the vfio_connect_container caller. Instead of using an integer, let's use an Error handle. Messages are slightly reworded to accomodate the propagation. Signed-off-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* exec: Introduce qemu_maxrampagesize() and rename qemu_getrampagesize()David Hildenbrand2019-04-251-1/+1
| | | | | | | | | | | | | | | | | | Rename qemu_getrampagesize() to qemu_minrampagesize(). While at it, properly rename find_max_supported_pagesize() to find_min_backend_pagesize(). s390x is actually interested into the maximum ram pagesize, so introduce and use qemu_maxrampagesize(). Add a TODO, indicating that looking at any mapped memory backends is not 100% correct in some cases. Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20190417113143.5551-3-david@redhat.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com>
* vfio/spapr: Rename local systempagesize variableAlexey Kardashevskiy2019-03-121-3/+3
| | | | | | | | | | The "systempagesize" name suggests that it is the host system page size while it is the smallest page size of memory backing the guest RAM so let's rename it to stop confusion. This should cause no behavioral change. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20190227085149.38596-4-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* vfio/spapr: Fix indirect levels calculationAlexey Kardashevskiy2019-03-121-10/+33
| | | | | | | | | | | | | | | | | | | | The current code assumes that we can address more bits on a PCI bus for DMA than we really can but there is no way knowing the actual limit. This makes a better guess for the number of levels and if the kernel fails to allocate that, this increases the level numbers till succeeded or reached the 64bit limit. This adds levels to the trace point. This may cause the kernel to warn about failed allocation: [65122.837458] Failed to allocate a TCE memory, level shift=28 which might happen if MAX_ORDER is not large enough as it can vary: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/powerpc/Kconfig?h=v5.0-rc2#n727 Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Message-Id: <20190227085149.38596-3-aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* vfio/spapr: Allow backing bigger guest IOMMU pages with smaller physical pagesAlexey Kardashevskiy2018-08-211-1/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | At the moment the PPC64/pseries guest only supports 4K/64K/16M IOMMU pages and POWER8 CPU supports the exact same set of page size so so far things worked fine. However POWER9 supports different set of sizes - 4K/64K/2M/1G and the last two - 2M and 1G - are not even allowed in the paravirt interface (RTAS DDW) so we always end up using 64K IOMMU pages, although we could back guest's 16MB IOMMU pages with 2MB pages on the host. This stores the supported host IOMMU page sizes in VFIOContainer and uses this later when creating a new DMA window. This uses the system page size (64k normally, 2M/16M/1G if hugepages used) as the upper limit of the IOMMU pagesize. This changes the type of @pagesize to uint64_t as this is what memory_region_iommu_get_min_page_size() returns and clz64() takes. There should be no behavioral changes on platforms other than pseries. The guest will keep using the IOMMU page size selected by the PHB pagesize property as this only changes the underlying hardware TCE table granularity. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* vfio, spapr: Fix levels calculationAlexey Kardashevskiy2017-09-151-1/+1
| | | | | | | | | | | | | | The existing tries to round up the number of pages but @pages is always calculated as the rounded up value minus one which makes ctz64() always return 0 and have create.levels always set 1. This removes wrong "-1" and allows having more than 1 levels. This becomes handy for >128GB guests with standard 64K pages as this requires blocks with zone order 9 and the popular limit of CONFIG_FORCE_MAX_ZONEORDER=9 means that only blocks up to order 8 are allowed. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* memory/iommu: QOM'fy IOMMU MemoryRegionAlexey Kardashevskiy2017-07-141-1/+2
| | | | | | | | | | | | | | | | | | | | | This defines new QOM object - IOMMUMemoryRegion - with MemoryRegion as a parent. This moves IOMMU-related fields from MR to IOMMU MR. However to avoid dymanic QOM casting in fast path (address_space_translate, etc), this adds an @is_iommu boolean flag to MR and provides new helper to do simple cast to IOMMU MR - memory_region_get_iommu. The flag is set in the instance init callback. This defines memory_region_is_iommu as memory_region_get_iommu()!=NULL. This switches MemoryRegion to IOMMUMemoryRegion in most places except the ones where MemoryRegion may be an alias. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Message-Id: <20170711035620.4232-2-aik@ozlabs.ru> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* memory: Replace skip_dump flag with "ram_device"Alex Williamson2016-10-311-1/+1
| | | | | | | | | | | | | | | Setting skip_dump on a MemoryRegion allows us to modify one specific code path, but the restriction we're trying to address encompasses more than that. If we have a RAM MemoryRegion backed by a physical device, it not only restricts our ability to dump that region, but also affects how we should manipulate it. Here we recognize that MemoryRegions do not change to sometimes allow dumps and other times not, so we replace setting the skip_dump flag with a new initializer so that we know exactly the type of region to which we're applying this behavior. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
* vfio/spapr: Remove stale ioctl() callDavid Gibson2016-07-181-1/+0
| | | | | | | | | | | | | | | This ioctl() call to VFIO_IOMMU_SPAPR_TCE_REMOVE was left over from an earlier version of the code and has since been folded into vfio_spapr_remove_window(). It wasn't caught because although the argument structure has been removed, the libc function remove() means this didn't trigger a compile failure. The ioctl() was also almost certain to fail silently and harmlessly with the bogus argument, so this wasn't caught in testing. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
* vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)Alexey Kardashevskiy2016-07-051-0/+71
| | | | | | | | | | | | | | | | | | | | | | New VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management. This adds ability to VFIO common code to dynamically allocate/remove DMA windows in the host kernel when new VFIO container is added/removed. This adds a helper to vfio_listener_region_add which makes VFIO_IOMMU_SPAPR_TCE_CREATE ioctl and adds just created IOMMU into the host IOMMU list; the opposite action is taken in vfio_listener_region_del. When creating a new window, this uses heuristic to decide on the TCE table levels number. This should cause no guest visible change in behavior. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> [dwg: Added some casts to prevent printf() warnings on certain targets where the kernel headers' __u64 doesn't match uint64_t or PRIx64] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
* vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2)Alexey Kardashevskiy2016-07-051-0/+139
This makes use of the new "memory registering" feature. The idea is to provide the userspace ability to notify the host kernel about pages which are going to be used for DMA. Having this information, the host kernel can pin them all once per user process, do locked pages accounting (once) and not spent time on doing that in real time with possible failures which cannot be handled nicely in some cases. This adds a prereg memory listener which listens on address_space_memory and notifies a VFIO container about memory which needs to be pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped. The feature is only enabled for SPAPR IOMMU v2. The host kernel changes are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does not call it when v2 is detected and enabled. This enforces guest RAM blocks to be host page size aligned; however this is not new as KVM already requires memory slots to be host page size aligned. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [dwg: Fix compile error on 32-bit host] Signed-off-by: David Gibson <david@gibson.dropbear.id.au>