path: root/hw/vfio/pci.c
Commit message (Author, Age, Files, Lines)
* migration: Remove error variant of vmstate_save_state() function (Arun Menon, 2025-10-03, 1 file, -2/+2)
    This commit removes the redundant vmstate_save_state_with_err() function.

    Previously, commit 969298f9d7 introduced vmstate_save_state_with_err() to handle error propagation, while vmstate_save_state() existed for non-error scenarios. This is because there were code paths where vmstate_save_state_v() (called internally by vmstate_save_state) did not explicitly set errors on failure.

    This change unifies error handling by
    - updating vmstate_save_state() to accept an Error **errp argument.
    - vmstate_save_state_v() ensures errors are set directly within the errp object, eliminating the need for two separate functions.

    All calls to vmstate_save_state_with_err() are replaced with vmstate_save_state(). This simplifies the API and improves code maintainability.

    vmstate_save_state() that only calls vmstate_save_state_v(), by inference, also has errors set in errp in case of failure.

    The errors are reported using error_report_err(). If we want the function to exit on error, then &error_fatal is passed.

    Reviewed-by: Fabiano Rosas <farosas@suse.de>
    Signed-off-by: Arun Menon <armenon@redhat.com>
    Tested-by: Fabiano Rosas <farosas@suse.de>
    Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
    Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-24-36f11a6fb9d3@redhat.com
    Signed-off-by: Peter Xu <peterx@redhat.com>
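The convention this commit describes (one save function that takes an error out-parameter, with the caller either reporting or propagating the error) can be sketched in plain C. This is only an illustration: the `Error` struct, `set_error()`, `save_state()` and `save_and_report()` below are hypothetical stand-ins and not QEMU's real `Error` API or vmstate functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an error object; not QEMU's Error type. */
typedef struct Error {
    const char *msg;
} Error;

/* Hypothetical helper: store a failure in *errp if the caller wants one. */
static void set_error(Error **errp, const char *msg)
{
    Error *err;

    if (errp == NULL) {
        return;                 /* caller is not interested in details */
    }
    assert(*errp == NULL);      /* an existing error must not be overwritten */
    err = malloc(sizeof(*err));
    if (err == NULL) {
        abort();
    }
    err->msg = msg;
    *errp = err;
}

/* Single save function: 0 on success, -1 on failure with *errp set. */
static int save_state(int field_ok, Error **errp)
{
    if (!field_ok) {
        set_error(errp, "failed to save field");
        return -1;
    }
    return 0;
}

/* Caller that reports and frees the error instead of propagating it. */
static int save_and_report(int field_ok)
{
    Error *local_err = NULL;
    int ret = save_state(field_ok, &local_err);

    if (ret < 0) {
        fprintf(stderr, "%s\n", local_err->msg);
        free(local_err);
    }
    return ret;
}
```

With a single function, a caller that wants to propagate the failure would pass its own `errp` through instead of a local variable.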
* migration: push Error **errp into vmstate_load_state() (Arun Menon, 2025-10-03, 1 file, -1/+4)
| | | | | | | | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that vmstate_load_state() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series, when we are actually able to propagate the error to the calling function using errp. Whereas, if we want the function to exit on error, then error_fatal is passed. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-2-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* vfio/pci.c: rename vfio_pci_nohotplug_dev_info to vfio_pci_nohotplug_info (Mark Cave-Ayland, 2025-09-25, 1 file, -2/+2)
| | | | | | | | | This changes the prefix to match the name of the QOM type. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-23-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci.c: rename vfio_pci_nohotplug_dev_class_init() to vfio_pci_nohotplug_class_init() (Mark Cave-Ayland, 2025-09-25, 1 file, -2/+2)
    This changes the function prefix to match the name of the QOM type.

    Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
    Reviewed-by: Cédric Le Goater <clg@redhat.com>
    Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-22-mark.caveayland@nutanix.com
    Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci.c: rename vfio_pci_dev_nohotplug_properties[] to vfio_pci_nohotplug_properties[] (Mark Cave-Ayland, 2025-09-25, 1 file, -2/+2)
    This changes the prefix to match the name of the QOM type.

    Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
    Reviewed-by: Cédric Le Goater <clg@redhat.com>
    Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-21-mark.caveayland@nutanix.com
    Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci.c: rename vfio_pci_dev_properties[] to vfio_pci_properties[] (Mark Cave-Ayland, 2025-09-25, 1 file, -2/+2)
| | | | | | | | | This changes the prefix to match the name of the QOM type. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-20-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci.c: rename vfio_pci_base_dev_info to vfio_pci_device_info (Mark Cave-Ayland, 2025-09-25, 1 file, -2/+2)
| | | | | | | | | This changes the prefix to match the name of the QOM type. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-19-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci.c: rename vfio_pci_base_dev_class_init() to vfio_pci_device_class_init() (Mark Cave-Ayland, 2025-09-25, 1 file, -2/+2)
    This changes the function prefix to match the name of the QOM type.

    Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com>
    Reviewed-by: Cédric Le Goater <clg@redhat.com>
    Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-18-mark.caveayland@nutanix.com
    Signed-off-by: Cédric Le Goater <clg@redhat.com>
* hw/vfio/types.h: rename TYPE_VFIO_PCI_BASE to TYPE_VFIO_PCI_DEVICE (Mark Cave-Ayland, 2025-09-25, 1 file, -14/+14)
| | | | | | | | | This brings the QOM type name in line with the underlying VFIOPCIDevice structure. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-17-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci.c: rename vfio_pci_dev_info to vfio_pci_info (Mark Cave-Ayland, 2025-09-25, 1 file, -2/+2)
| | | | | | | | | This changes the prefix to match the name of the QOM type. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-16-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci.c: rename vfio_pci_dev_class_init() to vfio_pci_class_init() (Mark Cave-Ayland, 2025-09-25, 1 file, -2/+2)
| | | | | | | | | This changes the function prefix to match the name of the QOM type. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-15-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci.c: rename vfio_instance_finalize() to vfio_pci_finalize() (Mark Cave-Ayland, 2025-09-25, 1 file, -2/+2)
| | | | | | | | | | This is the more typical naming convention for QOM finalize() functions, in particular it changes the prefix to match the name of the QOM type. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-14-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci.c: rename vfio_instance_init() to vfio_pci_init() (Mark Cave-Ayland, 2025-09-25, 1 file, -2/+2)
| | | | | | | | | | This is the more typical naming convention for QOM init() functions, in particular it changes the prefix to match the name of the QOM type. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250925113159.1760317-13-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci: Do not unparent in instance_finalize() (Akihiko Odaki, 2025-09-24, 1 file, -4/+0)
    Children are automatically unparented so manually unparenting is unnecessary.

    Worse, automatic unparenting happens before the instance_finalize() callback of the parent gets called, so object_unparent() calls in the callback will refer to objects that are already unparented, which is semantically incorrect.

    Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
    Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
    Link: https://lore.kernel.org/r/20250924-use-v4-2-07c6c598f53d@rsg.ci.i.u-tokyo.ac.jp
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* vfio/pci.h: rename VFIOPCIDevice pdev field to parent_obj (Mark Cave-Ayland, 2025-09-08, 1 file, -2/+2)
| | | | | | | | | | | | Now that nothing accesses the pdev field directly, rename pdev to parent_obj as per our current coding guidelines. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/qemu-devel/20250715093110.107317-23-mark.caveayland@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci.c: use QOM casts where appropriate (Mark Cave-Ayland, 2025-09-08, 1 file, -83/+121)
| | | | | | | | | | | Use QOM casts to convert between VFIOPCIDevice and PCIDevice instead of accessing pdev directly. Signed-off-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250715093110.107317-17-mark.caveayland@nutanix.com [ clg: Updated vfio_sub_page_bar_update_mappings() ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio: Introduce helper vfio_pci_from_vfio_device() (Zhenzhong Duan, 2025-09-08, 1 file, -0/+9)
| | | | | | | | | | | | Introduce helper vfio_pci_from_vfio_device() to transform from VFIODevice to VFIOPCIDevice, also to hide low level VFIO_DEVICE_TYPE_PCI type check. Suggested-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250822064101.123526-5-zhenzhong.duan@intel.com [ clg: Added documentation ] Signed-off-by: Cédric Le Goater <clg@redhat.com>
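A helper of the kind this commit describes (convert a generic device pointer to its enclosing PCI device, hiding the type check) might look roughly like the following. The struct layouts, the `DEVICE_TYPE_*` constants and the name `pci_from_vfio_device()` are simplified assumptions made for this sketch; they are not copied from QEMU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device kinds; the real code checks VFIO_DEVICE_TYPE_PCI. */
enum { DEVICE_TYPE_PCI = 0, DEVICE_TYPE_PLATFORM = 1 };

/* Simplified stand-in for the generic VFIO device. */
typedef struct VFIODevice {
    int type;
} VFIODevice;

/* Simplified stand-in for the PCI device that embeds the generic one. */
typedef struct VFIOPCIDevice {
    int some_pci_state;
    VFIODevice vbasedev;
} VFIOPCIDevice;

/* Return the enclosing PCI device, or NULL if this is not a PCI device. */
static VFIOPCIDevice *pci_from_vfio_device(VFIODevice *vbasedev)
{
    VFIOPCIDevice *vdev;

    if (vbasedev == NULL || vbasedev->type != DEVICE_TYPE_PCI) {
        return NULL;
    }
    /* Step back from the embedded member to the start of the container. */
    vdev = (VFIOPCIDevice *)((char *)vbasedev -
                             offsetof(VFIOPCIDevice, vbasedev));
    assert(&vdev->vbasedev == vbasedev);
    return vdev;
}
```

Callers then test the result for NULL instead of repeating the type comparison at every call site.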
* vfio: Document 'use-legacy-x86-rom' property (Cédric Le Goater, 2025-08-09, 1 file, -0/+3)
| | | | | | | | | | | Commit 350785d41d8b ("ramfb: Add property to control if load the romfile") introduced the `use-legacy-x86-rom` property for the `vfio-pci-nohotplug` device. Add documentation for the property. Fixes: d5fcf0d960d8 ("hw/i386: Add the ramfb romfile compatibility") Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Link: https://lore.kernel.org/qemu-devel/20250805065543.120091-1-clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci: preserve pending interrupts (Steve Sistare, 2025-08-09, 1 file, -0/+2)
| | | | | | | | | | | | | | | | | | | | | | | | | | | cpr-transfer may lose a VFIO interrupt because the KVM instance is destroyed and recreated. If an interrupt arrives in the middle, it is dropped. To fix, stop pending new interrupts during cpr save, and pick up the pieces. In more detail: Stop the VCPUs. Call kvm_irqchip_remove_irqfd_notifier_gsi --> KVM_IRQFD to deassign the irqfd gsi that routes interrupts directly to the VCPU and KVM. After this call, interrupts fall back to the kernel vfio_msihandler, which writes to QEMU's kvm_interrupt eventfd. CPR already preserves that eventfd. When the route is re-established in new QEMU, the kernel tests the eventfd and injects an interrupt to KVM if necessary. Deassign INTx in a similar manner. For both MSI and INTx, remove the eventfd handler so old QEMU does not consume an event. If an interrupt was already pended to KVM prior to the completion of kvm_irqchip_remove_irqfd_notifier_gsi, it will be recovered by the subsequent call to cpu_synchronize_all_states, which pulls KVM interrupt state to userland prior to saving it in vmstate. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/qemu-devel/1752689169-233452-3-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci: augment set_handler (Steve Sistare, 2025-08-09, 1 file, -2/+11)
| | | | | | | | | | Extend vfio_pci_msi_set_handler() so it can set or clear the handler. Add a similar accessor for INTx. No functional change. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/1752689169-233452-2-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/igd: Fix VGA regions are not exposed in legacy mode (Tomita Moeko, 2025-07-28, 1 file, -3/+10)
    In commit a59d06305fff ("vfio/pci: Introduce x-pci-class-code option"), pci_register_vga() has been moved outside of vfio_populate_vga(). As a result, IGD VGA ranges are no longer properly exposed to guest.

    To fix this, call pci_register_vga() after vfio_populate_vga() in legacy mode. A wrapper function vfio_pci_config_register_vga() is introduced to handle it.

    Fixes: a59d06305fff ("vfio/pci: Introduce x-pci-class-code option")
    Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com>
    Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
    Link: https://lore.kernel.org/qemu-devel/20250723160906.44941-3-tomitamoeko@gmail.com
    Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio: fix sub-page bar after cpr (Steve Sistare, 2025-07-28, 1 file, -0/+14)
    Regions for sub-page BARs are normally mapped here, in response to the guest writing to PCI config space:

      vfio_pci_write_config()
        pci_default_write_config()
          pci_update_mappings()
            memory_region_add_subregion()
        vfio_sub_page_bar_update_mapping()
          ... vfio_dma_map()

    However, after CPR, the guest does not reconfigure the device and the code path above is not taken.

    To fix, in vfio_cpr_pci_post_load, call vfio_sub_page_bar_update_mapping for each sub-page BAR with a valid address.

    Fixes: 7e9f21411302 ("vfio/container: restore DMA vaddr")
    Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
    Reviewed-by: Cédric Le Goater <clg@redhat.com>
    Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
    Link: https://lore.kernel.org/qemu-devel/1752520890-223356-1-git-send-email-steven.sistare@oracle.com
    Signed-off-by: Cédric Le Goater <clg@redhat.com>
* hw/i386: Fix 'use-legacy-x86-rom' property compatibility (Cédric Le Goater, 2025-07-28, 1 file, -2/+0)
| | | | | | | | | | | | | | | | | | | | Commit 350785d41d8b ("ramfb: Add property to control if load the romfile") introduced the `use-legacy-x86-rom` property for the `vfio-pci-nohotplug` device, allowing control over VGA BIOS ROM loading. However, the property compatibility setting was incorrectly applied to the `vfio-pci` device instead, which causes all `vfio-pci` devices to fail to load. This change fixes the issue by ensuring the property is set on the correct device. Fixes: d5fcf0d960d8 ("hw/i386: Add the ramfb romfile compatibility") Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Shaoqin Huang <shahuang@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250723062714.1245826-1-clg@redhat.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* Merge tag 'display-20250718-pull-request' of https://gitlab.com/kraxel/qemu into staging (Stefan Hajnoczi, 2025-07-21, 1 file, -0/+2)
|\
    Load ramfb vgabios on x86 only.

    # -----BEGIN PGP SIGNATURE-----
    #
    # iQIzBAABCgAdFiEEoDKM/7k6F6eZAf59TLbY7tPocTgFAmh6o80ACgkQTLbY7tPo
    # cTjxPBAAktTXxFK6loSMSWC1ul8RCl/4F7G84J4eT+Ui8/KIG8do5KcebTnXb9zo
    # keOG7n9HPk4fROWiAFgGnuBfw41DWmLDS34iuENrG3X26TQgSSgBveuwas67Pzqu
    # HpaFSxjh7BRLlkUWaNoll57cDM3kKLmx+Onw6m/7kbcVXAsy1N4wxfCT1faUU7ID
    # R1ggULG1WhB8q+YtQjac6EfOpdHe1BTBGLuxSwE3mNkce9ZP7C8uxZTCR5PXggZi
    # IXzJzGpFRDCHqrilWksiE62yF20Kem4ZcpO/GgLWmF+X+DYBDEWcajihvF20TGUL
    # n6dyT7MBxuvqFy0OtBPHNcnq2PZzOIKyxyMvBg9402xeD6goNbFKloAYeae4C9u0
    # QuqQUpb8D3lVagVu55N5XfpdMHR0P8yefPAjaFL4o3rf2JSjyI6MRX/+2eA7aXcX
    # xiwHSx3iavEeNQNsPZsS3JhH5bKy/zkWRiBd+msGVAYMZGzhdEtLg/w8yUd6dQ5p
    # /3Y3F4fL6T6QSwhsiihcbdPtjhfVCP09MYK/P4cIFbWOzjfbndt1/UIXHQ54s8Jo
    # PShcE7QH7ttT2gK5nFPG5yeTqF70kKpSyhwF2pukf2fAgcU+0SNoj2zZNtHAvKeh
    # 8EHqAy8m1J4AlQeO5nT9tJj/v1CM0q6cljzIfV8hWWgM/hL/vLc=
    # =76m5
    # -----END PGP SIGNATURE-----
    # gpg: Signature made Fri 18 Jul 2025 15:43:09 EDT
    # gpg: using RSA key A0328CFFB93A17A79901FE7D4CB6D8EED3E87138
    # gpg: Good signature from "Gerd Hoffmann (work) <kraxel@redhat.com>" [full]
    # gpg: aka "Gerd Hoffmann <gerd@kraxel.org>" [full]
    # gpg: aka "Gerd Hoffmann (private) <kraxel@gmail.com>" [full]
    # Primary key fingerprint: A032 8CFF B93A 17A7 9901 FE7D 4CB6 D8EE D3E8 7138

    * tag 'display-20250718-pull-request' of https://gitlab.com/kraxel/qemu:
      hw/i386: Add the ramfb romfile compatibility
      vfio: Move the TYPE_* to hw/vfio/types.h
      ramfb: Add property to control if load the romfile

    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

    Conflicts:
      hw/core/machine.c
      Context conflict because the vfio-pci "x-migration-load-config-after-iter" was added recently.
| * hw/i386: Add the ramfb romfile compatibility (Shaoqin Huang, 2025-07-18, 1 file, -1/+1)
|     ramfb is a sysbus device so it can only be used for machine types where it is explicitly enabled:
|
|       # git grep machine_class_allow_dynamic_sysbus_dev.*TYPE_RAMFB_DEVICE
|       hw/arm/virt.c: machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
|       hw/i386/microvm.c: machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
|       hw/i386/pc_piix.c: machine_class_allow_dynamic_sysbus_dev(m, TYPE_RAMFB_DEVICE);
|       hw/i386/pc_q35.c: machine_class_allow_dynamic_sysbus_dev(m, TYPE_RAMFB_DEVICE);
|       hw/loongarch/virt.c: machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
|       hw/riscv/virt.c: machine_class_allow_dynamic_sysbus_dev(mc, TYPE_RAMFB_DEVICE);
|
|     So these six are the only machine types we have to worry about. The three x86 machine types (pc, q35, microvm) will actually use the rom (when booting with seabios). For arm/riscv/loongarch virt we want to disable the rom.
|
|     This patch sets the ramfb romfile option to false by default, except for x86 machine types (pc, q35, microvm), which need the rom file when booting with seabios, and machine types <= 10.0 (handling the case of arm virt, for compat reasons).
|
|     At the same time, set the "use-legacy-x86-rom" property to true on those historical versioned machine types in order to avoid the memory layout being changed.
|
|     Acked-by: Michael S. Tsirkin <mst@redhat.com>
|     Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
|     Reviewed-by: Eric Auger <eric.auger@redhat.com>
|     Signed-off-by: Shaoqin Huang <shahuang@redhat.com>
|     Message-ID: <20250717100941.2230408-4-shahuang@redhat.com>
|     Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
| * ramfb: Add property to control if load the romfile (Shaoqin Huang, 2025-07-17, 1 file, -0/+2)
|     Currently the ramfb device loads vgabios-ramfb.bin unconditionally, but only x86 needs it. As a result, when the release package is used on arm64, it cannot find vgabios-ramfb.bin.
|
|     Only seabios uses vgabios-ramfb.bin, so the rom-loading logic is x86-specific. For other (non-x86) platforms, edk2 ships an EFI driver for ramfb, so they do not need to load the romfile.
|
|     So add a new property use-legacy-x86-rom to both the ramfb and vfio_pci devices, because the vfio display also uses ramfb_setup() to load the vgabios-ramfb.bin file.
|
|     With this property, the machine type can set the compatibility to not load vgabios-ramfb.bin if the arch does not need it.
|
|     For now the default value is true, but it will be turned off by default in a subsequent patch once compats are properly handled.
|
|     Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
|     Reviewed-by: Eric Auger <eric.auger@redhat.com>
|     Signed-off-by: Shaoqin Huang <shahuang@redhat.com>
|     Message-ID: <20250717100941.2230408-2-shahuang@redhat.com>
|     Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
* | vfio/migration: Max in-flight VFIO device state buffers size limit (Maciej S. Szmigiero, 2025-07-15, 1 file, -0/+9)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow capping the maximum total size of in-flight VFIO device state buffers queued at the destination, otherwise a malicious QEMU source could theoretically cause the target QEMU to allocate unlimited amounts of memory for buffers-in-flight. Since this is not expected to be a realistic threat in most of VFIO live migration use cases and the right value depends on the particular setup disable this limit by default by setting it to UINT64_MAX. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/4f7cad490988288f58e36b162d7a888ed7e7fd17.1752589295.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
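The cap described in this commit amounts to byte accounting with an upper bound, where `UINT64_MAX` acts as "no limit". A minimal sketch follows, assuming a hypothetical `InFlight` structure and helper names that are not taken from QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for buffers queued at the destination. */
typedef struct InFlight {
    uint64_t queued_bytes;  /* total size currently queued */
    uint64_t max_bytes;     /* UINT64_MAX effectively disables the limit */
} InFlight;

/* Account for a new buffer; refuse it if the cap would be exceeded. */
static bool inflight_try_add(InFlight *s, uint64_t len)
{
    assert(s->queued_bytes <= s->max_bytes);
    /* Written as a subtraction so the comparison cannot overflow. */
    if (len > s->max_bytes - s->queued_bytes) {
        return false;
    }
    s->queued_bytes += len;
    return true;
}

/* Release the accounting once a buffer has been consumed. */
static void inflight_remove(InFlight *s, uint64_t len)
{
    assert(len <= s->queued_bytes);
    s->queued_bytes -= len;
}
```

What a receiver does when `inflight_try_add()` returns false (wait, or fail the migration) is a policy choice this sketch leaves open.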
* | vfio/migration: Add x-migration-load-config-after-iter VFIO property (Maciej S. Szmigiero, 2025-07-15, 1 file, -0/+10)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This property allows configuring whether to start the config load only after all iterables were loaded, during non-iterables loading phase. Such interlocking is required for ARM64 due to this platform VFIO dependency on interrupt controller being loaded first. The property defaults to AUTO, which means ON for ARM, OFF for other platforms. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/0e03c60dbc91f9a9ba2516929574df605b7dfcb4.1752589295.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
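The AUTO default described above resolves to a concrete value depending on the platform. A small sketch of that resolution, using a hypothetical `TriState` enum and function name rather than QEMU's own types:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state, modelled on an on/off/auto property. */
typedef enum { TRI_AUTO, TRI_ON, TRI_OFF } TriState;

/* Resolve AUTO to a concrete value; an explicit ON or OFF always wins. */
static bool load_config_after_iter(TriState prop, bool target_is_arm)
{
    switch (prop) {
    case TRI_ON:
        return true;
    case TRI_OFF:
        return false;
    case TRI_AUTO:
        return target_is_arm;   /* ON for ARM, OFF elsewhere */
    }
    assert(!"invalid tri-state value");
    return false;
}
```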
* | vfio/pci: Introduce x-pci-class-code option (Tomita Moeko, 2025-07-15, 1 file, -4/+25)
|/ | | | | | | | | | | | | | | | | | | | | Introduce x-pci-class-code option to allow users to override PCI class code of a device, similar to the existing x-pci-vendor-id option. Only the lower 24 bits of this option are used, though a uint32 is used here for determining whether the value is valid and set by user. Additionally, to ensure VGA ranges are only exposed on VGA devices, pci_register_vga() is now called in vfio_pci_config_setup(), after the class code override is completed. This is mainly intended for IGD devices that expose themselves either as VGA controller (primary display) or Display controller (non-primary display). The UEFI GOP driver depends on the device reporting a VGA controller class code (0x030000). Signed-off-by: Tomita Moeko <tomitamoeko@gmail.com> Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250708145211.6179-1-tomitamoeko@gmail.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/iommufd: add vfio_device_free_name (Steve Sistare, 2025-07-03, 1 file, -1/+1)
| | | | | | | | | | | | Define vfio_device_free_name to free the name created by vfio_device_get_name. A subsequent patch will do more there. No functional change. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/qemu-devel/1751493538-202042-11-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio-pci: preserve INTx (Steve Sistare, 2025-07-03, 1 file, -2/+53)
    Preserve vfio INTx state across cpr-transfer. Preserve VFIOINTx fields as follows:
      pin : Recover this from the vfio config in kernel space
      interrupt : Preserve its eventfd descriptor across exec.
      unmask : Ditto
      route.irq : This could perhaps be recovered in vfio_pci_post_load by calling pci_device_route_intx_to_irq(pin), whose implementation reads config space for a bridge device such as ich9. However, there is no guarantee that the bridge vmstate is read before vfio vmstate. Rather than fiddling with MigrationPriority for vmstate handlers, explicitly save route.irq in vfio vmstate.
      pending : save in vfio vmstate.
      mmap_timeout, mmap_timer : Re-initialize
      bool kvm_accel : Re-initialize

    In vfio_realize, defer calling vfio_intx_enable until the vmstate is available, in vfio_pci_post_load. Modify vfio_intx_enable and vfio_intx_kvm_enable to skip vfio initialization, but still perform kvm initialization.

    Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
    Reviewed-by: Cédric Le Goater <clg@redhat.com>
    Link: https://lore.kernel.org/qemu-devel/1751493538-202042-3-git-send-email-steven.sistare@oracle.com
    Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio-pci: preserve MSI (Steve Sistare, 2025-07-03, 1 file, -2/+50)
| | | | | | | | | | | | | Save the MSI message area as part of vfio-pci vmstate, and preserve the interrupt and notifier eventfd's. migrate_incoming loads the MSI data, then the vfio-pci post_load handler finds the eventfds in CPR state, rebuilds vector data structures, and attaches the interrupts to the new KVM instance. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/1751493538-202042-2-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci: export MSI functions (Steve Sistare, 2025-06-11, 1 file, -12/+17)
| | | | | | | | | | Export various MSI functions, renamed with a vfio_pci prefix, for use by CPR in subsequent patches. No functional change. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/1749569991-25171-18-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci: vfio_notifier_cleanup (Steve Sistare, 2025-06-11, 1 file, -11/+17)
| | | | | | | | | | | Move event_notifier_cleanup calls to a helper vfio_notifier_cleanup. This version is trivial, and does not yet use the vdev and nr parameters. No functional change. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/1749569991-25171-17-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci: vfio_notifier_init cpr parameters (Steve Sistare, 2025-06-11, 1 file, -12/+19)
| | | | | | | | | | Pass vdev and nr to vfio_notifier_init, for use by CPR in a subsequent patch. No functional change. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/1749569991-25171-16-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci: pass vector to virq functions (Steve Sistare, 2025-06-11, 1 file, -6/+7)
| | | | | | | | | | | Pass the vector number to vfio_connect_kvm_msi_virq and vfio_remove_kvm_msi_virq, so it can be passed to their subroutines in a subsequent patch. No functional change. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/1749569991-25171-15-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci: vfio_notifier_init (Steve Sistare, 2025-06-11, 1 file, -15/+25)
| | | | | | | | | | | Move event_notifier_init calls to a helper vfio_notifier_init. This version is trivial, but it will be expanded to support CPR in subsequent patches. No functional change. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/1749569991-25171-14-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci: vfio_pci_vector_init (Steve Sistare, 2025-06-11, 1 file, -7/+17)
| | | | | | | | | Extract a subroutine vfio_pci_vector_init. No functional change. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/1749569991-25171-13-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio-pci: skip reset during cpr (Steve Sistare, 2025-06-11, 1 file, -0/+7)
| | | | | | | | | | | Do not reset a vfio-pci device during CPR, and do not complain if the kernel's PCI config space changes for non-emulated bits between the vmstate save and load, which can happen due to ongoing interrupt activity. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/1749569991-25171-12-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* pci: skip reset during cpr (Steve Sistare, 2025-06-11, 1 file, -0/+7)
| | | | | | | | | Do not reset a vfio-pci device during CPR. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/qemu-devel/1749576403-25355-1-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio: mark posted writes in region write callbacks (John Levon, 2025-06-11, 1 file, -1/+4)
| | | | | | | | | | For vfio-user, the region write implementation needs to know if the write is posted; add the necessary plumbing to support this. Signed-off-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250607001056.335310-5-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio: export PCI helpers needed for vfio-user (John Levon, 2025-06-11, 1 file, -24/+24)
| | | | | | | | | | | | The vfio-user code will need to re-use various parts of the vfio PCI code. Export them in hw/vfio/pci.h, and rename them to the vfio_pci_* namespace. Signed-off-by: John Levon <john.levon@nutanix.com> Reviewed-by: Mark Cave-Ayland <mark.caveayland@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250607001056.335310-2-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio/pci: Fix instance_size of VFIO_PCI_BASE (Zhenzhong Duan, 2025-06-11, 1 file, -2/+1)
| | | | | | | | | | | | | | | | Currently the final instance_size of VFIO_PCI_BASE is sizeof(PCIDevice). It should be sizeof(VFIOPCIDevice), VFIO_PCI uses same structure as base class VFIO_PCI_BASE, so no need to set its instance_size explicitly. This isn't catastrophic only because VFIO_PCI_BASE is an abstract class. Fixes: d4e392d0a99b ("vfio: add vfio-pci-base class") Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Link: https://lore.kernel.org/qemu-devel/20250611024228.423666-1-zhenzhong.duan@intel.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio: refactor out IRQ signalling setup (John Levon, 2025-06-05, 1 file, -15/+20)
| | | | | | | | | | This makes for a slightly more readable vfio_msix_vector_do_use() implementation, and we will rely on this shortly. Signed-off-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250520150419.2172078-5-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio: move config space read into vfio_pci_config_setup() (John Levon, 2025-06-05, 1 file, -15/+16)
| | | | | | | | | | | Small cleanup that reduces duplicate code for vfio-user and reduces the size of vfio_realize(); while we're here, correct that name to vfio_pci_realize(). Signed-off-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250520150419.2172078-4-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio: move more cleanup into vfio_pci_put_device() (John Levon, 2025-06-05, 1 file, -11/+12)
| | | | | | | | | | All of the cleanup can be done in the same place, and vfio-user will want to do the same. Signed-off-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250520150419.2172078-3-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio: add vfio-pci-base class (John Levon, 2025-05-09, 1 file, -22/+40)
| | | | | | | | | | | | | | | | | | Split out parts of TYPE_VFIO_PCI into a base TYPE_VFIO_PCI_BASE, although we have not yet introduced another subclass, so all the properties have remained in TYPE_VFIO_PCI. Note that currently there is no need for additional data for TYPE_VFIO_PCI, so it shares the same C struct type as TYPE_VFIO_PCI_BASE, VFIOPCIDevice. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250507152020.1254632-14-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio: add read/write to device IO ops vector (John Levon, 2025-05-09, 1 file, -14/+14)
| | | | | | | | | Now we have the region info cache, add ->region_read/write device I/O operations instead of explicit pread()/pwrite() system calls. Signed-off-by: John Levon <john.levon@nutanix.com> Link: https://lore.kernel.org/qemu-devel/20250507152020.1254632-13-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* vfio: add region info cache (John Levon, 2025-05-09, 1 file, -3/+3)
| | | | | | | | | | | | | | | | | | | | Instead of requesting region information on demand with VFIO_DEVICE_GET_REGION_INFO, maintain a cache: this will become necessary for performance for vfio-user, where this call becomes a message over the control socket, so is of higher overhead than the traditional path. We will also need it to generalize region accesses, as that means we can't use ->config_offset for configuration space accesses, but must look up the region offset (if relevant) each time. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250507152020.1254632-12-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
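The caching idea in this commit (query region information once, then serve later lookups from memory) can be sketched as below. The `Device` and `RegionInfo` types, the fixed `NUM_REGIONS`, and `fetch_region_info()` are hypothetical; the fetch function merely stands in for the expensive query (an ioctl, or a socket round trip for vfio-user).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_REGIONS 4   /* hypothetical fixed region count for the sketch */

typedef struct RegionInfo {
    uint64_t offset;
    uint64_t size;
} RegionInfo;

typedef struct Device {
    RegionInfo cache[NUM_REGIONS];
    int cached[NUM_REGIONS];   /* nonzero once cache[i] is valid */
    unsigned fetch_count;      /* how many times the slow path ran */
} Device;

/* Stand-in for the expensive query; fills in made-up values. */
static int fetch_region_info(Device *dev, unsigned index, RegionInfo *info)
{
    dev->fetch_count++;
    info->offset = (uint64_t)index << 40;
    info->size = 4096;
    return 0;
}

/* Return cached info for a region, fetching it once on first use. */
static const RegionInfo *get_region_info(Device *dev, unsigned index)
{
    assert(dev != NULL);
    if (index >= NUM_REGIONS) {
        return NULL;
    }
    if (!dev->cached[index]) {
        if (fetch_region_info(dev, index, &dev->cache[index]) < 0) {
            return NULL;
        }
        dev->cached[index] = 1;
    }
    return &dev->cache[index];
}
```

A real cache also needs an invalidation path for cases where the region layout can change; that is omitted here.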
* vfio: add device IO ops vector (John Levon, 2025-05-09, 1 file, -6/+4)
| | | | | | | | | | | | | | | | For vfio-user, device operations such as IRQ handling and region read/writes are implemented in userspace over the control socket, not ioctl() to the vfio kernel driver; add an ops vector to generalize this, and implement vfio_device_io_ops_ioctl for interacting with the kernel vfio driver. Originally-by: John Johnson <john.g.johnson@oracle.com> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com> Signed-off-by: John Levon <john.levon@nutanix.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/20250507152020.1254632-11-john.levon@nutanix.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
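An ops vector of the kind described here is a table of function pointers that generic code calls through, so the transport (kernel ioctl or user-space socket) can be swapped per device. The sketch below uses an in-memory backend purely for illustration; `DeviceIOOps`, `Device` and every function name in it are hypothetical and do not reproduce QEMU's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct Device Device;

/* Hypothetical ops vector: one entry per backend-specific operation. */
typedef struct DeviceIOOps {
    int (*region_read)(Device *dev, uint64_t off, void *buf, size_t len);
    int (*region_write)(Device *dev, uint64_t off, const void *buf,
                        size_t len);
} DeviceIOOps;

struct Device {
    const DeviceIOOps *io_ops;
    uint8_t backing[64];   /* fake region storage for the sketch */
};

/* In-memory backend standing in for a kernel or socket transport. */
static int mem_read(Device *dev, uint64_t off, void *buf, size_t len)
{
    if (off > sizeof(dev->backing) || len > sizeof(dev->backing) - off) {
        return -1;
    }
    memcpy(buf, dev->backing + off, len);
    return (int)len;
}

static int mem_write(Device *dev, uint64_t off, const void *buf, size_t len)
{
    if (off > sizeof(dev->backing) || len > sizeof(dev->backing) - off) {
        return -1;
    }
    memcpy(dev->backing + off, buf, len);
    return (int)len;
}

static const DeviceIOOps mem_io_ops = {
    .region_read = mem_read,
    .region_write = mem_write,
};

/* Generic code calls through the vector and never names the transport. */
static int device_region_read(Device *dev, uint64_t off, void *buf,
                              size_t len)
{
    assert(dev->io_ops && dev->io_ops->region_read);
    return dev->io_ops->region_read(dev, off, buf, len);
}

static int device_region_write(Device *dev, uint64_t off, const void *buf,
                               size_t len)
{
    assert(dev->io_ops && dev->io_ops->region_write);
    return dev->io_ops->region_write(dev, off, buf, len);
}
```

Adding a second transport then means supplying another `DeviceIOOps` instance, with no change to the callers of `device_region_read()` and `device_region_write()`.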