summary refs log tree commit diff stats
path: root/hw/virtio (follow)
Commit message (Collapse)AuthorAgeFilesLines
* hw: Remove unnecessary 'system/ram_addr.h' headerPhilippe Mathieu-Daudé2025-10-071-1/+0
| | | | | | | | | | | | | | None of these files require definition exposed by "system/ram_addr.h", remove its inclusion. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Jagannathan Raman <jag.raman@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Eric Farman <farman@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20251001175448.18933-7-philmd@linaro.org>
* hw/virtio/virtio: Replace legacy cpu_physical_memory_map() callPhilippe Mathieu-Daudé2025-10-071-4/+6
| | | | | | | | | | Propagate VirtIODevice::dma_as to virtqueue_undo_map_desc() in order to replace the legacy cpu_physical_memory_unmap() call by address_space_unmap(). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20251002084203.63899-18-philmd@linaro.org>
* hw/virtio/vhost: Replace legacy cpu_physical_memory_*map() callsPhilippe Mathieu-Daudé2025-10-071-2/+5
| | | | | | | | | Use VirtIODevice::dma_as address space to convert the legacy cpu_physical_memory_[un]map() calls to address_space_[un]map(). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20251002084203.63899-17-philmd@linaro.org>
* system/ramblock: Move ram_block_discard_*_range() declarationsPhilippe Mathieu-Daudé2025-10-072-0/+2
| | | | | | | | | Keep RAM blocks API in the same header: "system/ramblock.h". Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Acked-by: Peter Xu <peterx@redhat.com> Message-Id: <20251002032812.26069-4-philmd@linaro.org>
* Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Richard Henderson2025-10-0614-148/+371
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging virtio,pci,pc: features, fixes users can now control VM bit in smbios. vhost-user-device is now user-createable. intel_iommu now supports PRI virtio-net now supports GSO over UDP tunnel ghes now supports error injection amd iommu now supports dma remapping for vfio better error messages for virtio small fixes all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmji0s0PHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpuH4H/09h70IqAWZGHIWKGmmGGtdKOj3g54KuI0Ss # mGECEsHvvBexOy670Qy8jdgXfaW4UuNui8BiOnJnGsBX8Y0dy+/yZori3KhkXkaY # D57Ap9agkpHem7Vw0zgNsAF2bzDdlzTiQ6ns5oDnSq8yt82onCb5WGkWTGkPs/jL # Gf8Jv+Ddcpt5SU4/hHPYC8pUhl7z4xPOOyl0Qp1GG21Pxf5v4sGFcWuGGB7UEPSQ # MjZeoM0rSnLDtNg18sGwD5RPLQs13TbtgsVwijI79c3w3rcSpPNhGR5OWkdRCIYF # 8A0Nhq0Yfo0ogTht7yt1QNPf/ktJkuoBuGVirvpDaix2tCBECes= # =Zvq/ # -----END PGP SIGNATURE----- # gpg: Signature made Sun 05 Oct 2025 01:19:25 PM PDT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [unknown] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [unknown] # gpg: WARNING: The key's User ID is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (75 commits) virtio: improve virtqueue mapping error messages pci: Fix wrong parameter passing to pci_device_get_iommu_bus_devfn() intel_iommu: Simplify caching mode check with VFIO device intel_iommu: Enable Enhanced Set Root Table Pointer Support (ESRTPS) vdpa-dev: add get_vhost() callback for vhost-vdpa device amd_iommu: HATDis/HATS=11 support intel-iommu: Move dma_translation to x86-iommu amd_iommu: Refactor amdvi_page_walk() to use common code for page walk amd_iommu: Do not assume passthrough translation when DTE[TV]=0 amd_iommu: Toggle address translation mode on devtab entry invalidation amd_iommu: Add dma-remap property to AMD vIOMMU device amd_iommu: Set all address spaces to use passthrough mode on reset amd_iommu: Toggle memory regions based on address translation mode amd_iommu: Invalidate address translations on INVALIDATE_IOMMU_ALL amd_iommu: Add replay callback amd_iommu: Unmap all address spaces under the AMD IOMMU on reset amd_iommu: Use iova_tree records to determine large page size on UNMAP amd_iommu: Sync shadow page tables on page invalidation amd_iommu: Add basic structure to support IOMMU notifier updates amd_iommu: Add a page walker to sync shadow page tables on invalidation ... Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
| * virtio: improve virtqueue mapping error messagesAlessandro Ratti2025-10-051-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve error reporting when virtqueue ring mapping fails by including a device identifier in the error message. Introduce a helper qdev_get_printable_name() in qdev-core, which returns either: - the device ID, if explicitly provided (e.g. -device ...,id=foo) - the QOM path from qdev_get_dev_path(dev) otherwise - "<unknown device>" as a fallback when no identifier is present This makes it easier to identify which device triggered the error in multi-device setups or when debugging complex guest configurations. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/230 Buglink: https://bugs.launchpad.net/qemu/+bug/1919021 Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alessandro Ratti <alessandro@0x65c.net> Message-Id: <20250924093138.559872-2-alessandro@0x65c.net> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vdpa-dev: add get_vhost() callback for vhost-vdpa deviceLi Zhaoxin2025-10-051-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit c255488d67 "virtio: add vhost support for virtio devices" added the get_vhost() function, but it did not include vhost-vdpa devices. So when I use the vdpa device and query the status of the vdpa device with the x-query-virtio-status qmp command, since vdpa does not implement vhost_get, it will cause qemu to crash. Therefore, in order to obtain the status of the virtio device under vhost-vdpa, we need to add a vhost_get implement for the vdpa device. Co-developed-by: Miao Kezhan <miaokezhan@baidu.com> Signed-off-by: Miao Kezhan <miaokezhan@baidu.com> Signed-off-by: Li Zhaoxin <lizhaoxin04@baidu.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <2778f817cb6740a15ecb37927804a67288b062d1.1758860411.git.lizhaoxin04@baidu.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio: support irqfd in virtio_notify_config()Stefan Hajnoczi2025-10-051-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtio_error() calls virtio_notify_config() to inject a VIRTIO Configuration Change Notification. This doesn't work from IOThreads because the BQL is not held and the interrupt code path requires the BQL. Follow the same approach as virtio_notify() and use ->config_notifier (an irqfd) when called from the IOThread. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250922220149.498967-4-stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio: unify virtio_notify_irqfd() and virtio_notify()Stefan Hajnoczi2025-10-052-16/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The difference between these two functions: - virtio_notify() uses the interrupt code path (MSI or classic IRQs) - virtio_notify_irqfd() uses guest notifiers (irqfds) virtio_notify() can only be called with the BQL held because the interrupt code path requires the BQL. Device models use virtio_notify_irqfd() from IOThreads since the BQL is not held. The two functions can be unified by pushing down the if (qemu_in_iothread()) check from virtio-blk and virtio-scsi into core virtio code. This is in preparation for the next commit that will add irqfd support to virtio_notify_config() and where it's unattractive to introduce another irqfd-only API for device model callers. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250922220149.498967-3-stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost: use virtio_config_get_guest_notifier()Stefan Hajnoczi2025-10-051-4/+7
| | | | | | | | | | | | | | | | | | | | | | There is a getter function so avoid accessing the ->config_notifier field directly. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250922220149.498967-2-stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * hw/virtio: rename vhost-user-device and make user creatableAlex Bennée2025-10-054-18/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We didn't make the device user creatable in the first place because we were worried users might get confused. Rename the device to make its nature as a test device even more explicit. While we are at it add a Kconfig variable so it can be skipped for those that want to thin out their build configuration even further. Acked-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-ID: <20250820195632.1956795-1-alex.bennee@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250901105948.982583-1-alex.bennee@linaro.org> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost-backend: implement extended features supportPaolo Abeni2025-10-041-11/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Leverage the kernel extended features manipulation ioctls(), if available, and fallback to old ops otherwise. Error out when setting extended features but kernel support is not available. Note that extended support for get/set backend features is not needed, as the only feature that can be changed belongs to the 64 bit range. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <150daade3d59e77629276920e014ee8e5fc12121.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * qmp: update virtio features map to support extended featuresPaolo Abeni2025-10-043-30/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the VirtioDeviceFeatures struct with an additional u64 to track unknown features in the 64-127 bit range and decode the full virtio features spaces for vhost and virtio devices. Also add entries for the soon-to-be-supported virtio net GSO over UDP features. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <e51969f94d89045b333f1bc5ef5fca9e12fc371a.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost: add support for negotiating extended featuresPaolo Abeni2025-10-041-20/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Similar to virtio infra, vhost core maintains the features status in the full extended format and allows the devices to implement extended version of the getter/setter. Note that 'protocol_features' are not extended: they are only used by vhost-user, and the latter device is not going to implement extended features soon. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <a0062c3b1847fb2baedd6cd8f6ef13b051d6beb2.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio-pci: implement support for extended featuresPaolo Abeni2025-10-041-9/+67
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extend the features configuration space to 128 bits. If the virtio device supports any extended features, allow the common read/write operation to access all of it, otherwise keep exposing only the lower 64 bits. On migration, save the 128 bit version of the features only if the upper bits are non zero. Relay on reset to clear all the feature space before load. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <c0b81601f65b41ca8310eba8f05e2dcf3702de89.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio: add support for negotiating extended featuresPaolo Abeni2025-10-042-6/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The virtio specifications allows for a device features space up to 128 bits and more. Soon we are going to use some of the 'extended' bits features for the virtio net driver. Add support to allow extended features negotiation on a per devices basis. Devices willing to negotiated extended features need to implemented a new pair of features getter/setter, the core will conditionally use them instead of the basic one. Note that 'bad_features' don't need to be extended, as they are bound to the 64 bits limit. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <9bb29d70adc3f2b8c7756d4e3cd076cffee87826.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio: serialize extended features statePaolo Abeni2025-10-041-31/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the driver uses any of the extended features (i.e. 64 or above), store the extended features range (64-127 bits). At load time, let legacy features initialize the full features range and pass it to the set helper; sub-states loading will have filled-up the extended part as needed. This is one of the few spots that need explicitly to know and set in stone the extended features array size; add a build bug to prevent breaking the migration should such size change again in the future: more serialization plumbing will be needed. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <d5d9d398675bee6c4c7d7308c5d3d5d3c6d17d87.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | migration: Remove error variant of vmstate_save_state() functionArun Menon2025-10-033-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit removes the redundant vmstate_save_state_with_err() function. Previously, commit 969298f9d7 introduced vmstate_save_state_with_err() to handle error propagation, while vmstate_save_state() existed for non-error scenarios. This is because there were code paths where vmstate_save_state_v() (called internally by vmstate_save_state) did not explicitly set errors on failure. This change unifies error handling by - updating vmstate_save_state() to accept an Error **errp argument. - vmstate_save_state_v() ensures errors are set directly within the errp object, eliminating the need for two separate functions. All calls to vmstate_save_state_with_err() are replaced with vmstate_save_state(). This simplifies the API and improves code maintainability. vmstate_save_state() that only calls vmstate_save_state_v(), by inference, also has errors set in errp in case of failure. The errors are reported using error_report_err(). If we want the function to exit on error, then &error_fatal is passed. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-24-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* | migration: push Error **errp into vmstate_load_state()Arun Menon2025-10-033-4/+8
|/ | | | | | | | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that vmstate_load_state() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series, when we are actually able to propagate the error to the calling function using errp. Whereas, if we want the function to exit on error, then error_fatal is passed. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-2-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* treewide: use qemu_set_blocking instead of g_unix_set_fd_nonblockingVladimir Sementsov-Ogievskiy2025-09-191-6/+2
| | | | | | | | | | Instead of open-coded g_unix_set_fd_nonblocking() calls, use QEMU wrapper qemu_set_blocking(). Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> [DB: fix missing closing ) in tap-bsd.c, remove now unused GError var] Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
* util: drop qemu_socket_set_nonblock()Vladimir Sementsov-Ogievskiy2025-09-191-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use common qemu_set_blocking() instead. Note that pre-patch the behavior of Win32 and Linux realizations are inconsistent: we ignore failure for Win32, and assert success for Linux. How do we convert the callers? 1. Most of callers call qemu_socket_set_nonblock() on a freshly created socket fd, in conditions when we may simply report an error. Seems correct switching to error handling both for Windows (pre-patch error is ignored) and Linux (pre-patch we assert success). Anyway, we normally don't expect errors in these cases. Still in tests let's use &error_abort for simplicity. What are exclusions? 2. hw/virtio/vhost-user.c - we are inside #ifdef CONFIG_LINUX, so no damage in switching to error handling from assertion. 3. io/channel-socket.c: here we convert both old calls to qemu_socket_set_nonblock() and qemu_socket_set_block() to one new call. Pre-patch we assert success for Linux in qemu_socket_set_nonblock(), and ignore all other errors here. So, for Windows switch is a bit dangerous: we may get new errors or crashes(when error_abort is passed) in cases where we have silently ignored the error before (was it correct in all such cases, if they were?) Still, there is no other way to stricter API than take this risk. 4. util/vhost-user-server - compiled only for Linux (see util/meson.build), so we are safe, switching from assertion to &error_abort. Note: In qga/channel-posix.c we use g_warning(), where g_printerr() would actually be a better choice. Still let's for now follow common style of qga, where g_warning() is commonly used to print such messages, and no call to g_printerr(). Converting everything to use g_printerr() should better be another series. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
* vhost: Do not abort on log-stop errorHanna Czenczek2025-08-011-1/+2
| | | | | | | | | | | | | | Failing to stop logging in a vhost device is not exactly fatal. We can log such an error, but there is no need to abort the whole qemu process because of it. Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-Id: <20250724125928.61045-3-hreitz@redhat.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost: Do not abort on log-start errorHanna Czenczek2025-08-011-1/+2
| | | | | | | | | | | | | | | | | | | | | Commit 3688fec8923 ("memory: Add Error** argument to .log_global_start() handler") enabled vhost_log_global_start() to return a proper error, but did not change it to do so; instead, it still aborts the whole process on error. This crash can be reproduced by e.g. killing a virtiofsd daemon before initiating migration. In such a case, qemu should not crash, but just make the attempted migration fail. Buglink: https://issues.redhat.com/browse/RHEL-94534 Reported-by: Tingting Mao <timao@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-Id: <20250724125928.61045-2-hreitz@redhat.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio: fix off-by-one and invalid access in virtqueue_ordered_fillJonah Palmer2025-08-011-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b44135daa372 introduced virtqueue_ordered_fill for VIRTIO_F_IN_ORDER support but had a few issues: * Conditional while loop used 'steps <= max_steps' but should've been 'steps < max_steps' since reaching steps == max_steps would indicate that we didn't find an element, which is an error. Without this change, the code would attempt to read invalid data at an index outside of our search range. * Incremented 'steps' using the next chain's ndescs instead of the current one. This patch corrects the loop bounds and synchronizes 'steps' and index increments. We also add a defensive sanity check against malicious or invalid descriptor counts to avoid a potential infinite loop and DoS. Fixes: b44135daa372 ("virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support") Reported-by: terrynini <terrynini38514@gmail.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20250721150208.2409779-1-jonah.palmer@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Stefan Hajnoczi2025-07-165-51/+62
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging virtio,pci,pc: features, fixes, tests SPCR acpi table can now be disabled vhost-vdpa can now report hashing capability to guest PPTT acpi table now tells guest vCPUs are identical vost-user-blk now shuts down faster loongarch64 now supports bios-tables-test intel_iommu now supports ATS cxl now supports DCD Fabric Management Command Set arm now supports acpi pci hotplug fixes, cleanups Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmh1+7APHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpcZ8H/2udpCZ49vjPB8IwQAGdFTw2TWVdxUQFHexQ # pOsCGyFBNAXqD1bmb8lwWyYVJ08WELyL6xWsQ5tfVPiXpKYYHPHl4rNr/SPoyNcv # joY++tagudmOki2DU7nfJ+rPIIuigOTUHbv4TZciwcHle6f65s0iKXhR1sL0cj4i # TS6iJlApSuJInrBBUxuxSUomXk79mFTNKRiXj1k58LRw6JOUEgYvtIW8i+mOUcTg # h1dZphxEQr/oG+a2pM8GOVJ1AFaBPSfgEnRM4kTX9QuTIDCeMAKUBo/mwOk6PV7z # ZhSrDPLrea27XKGL++EJm0fFJ/AsHF1dTks2+c0rDrSK+UV87Zc= # =sktm # -----END PGP SIGNATURE----- # gpg: Signature made Tue 15 Jul 2025 02:56:48 EDT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (97 commits) hw/cxl: mailbox-utils: 0x5605 - FMAPI Initiate DC Release hw/cxl: mailbox-utils: 0x5604 - FMAPI Initiate DC Add hw/cxl: Create helper function to create DC Event Records from extents hw/cxl: mailbox-utils: 0x5603 - FMAPI Get DC Region Extent Lists hw/cxl: mailbox-utils: 0x5602 - FMAPI Set DC Region Config hw/mem: cxl_type3: Add DC Region bitmap lock hw/cxl: Move definition for dynamic_capacity_uuid and enum for DC event types to header hw/cxl: mailbox-utils: 0x5601 - FMAPI Get Host Region Config hw/mem: cxl_type3: Add dsmas_flags to CXLDCRegion struct hw/cxl: mailbox-utils: 0x5600 - FMAPI Get DCD Info hw/cxl: fix DC extent capacity tracking tests: virt: Update expected ACPI tables for virt test hw/acpi/aml-build: Build a root node in the PPTT table hw/acpi/aml-build: Set identical implementation flag for PPTT processor nodes tests: virt: Allow changes to PPTT test table qtest/bios-tables-test: Generate reference blob for DSDT.acpipcihp qtest/bios-tables-test: Generate reference blob for DSDT.hpoffacpiindex tests/qtest/bios-tables-test: Add aarch64 ACPI PCI hotplug test tests/qtest/bios-tables-test: Prepare for addition of acpi pci hp tests hw/arm/virt: Let virt support pci hotplug/unplug GED event ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Conflicts: net/vhost-vdpa.c vhost_vdpa_set_steering_ebpf() was removed, resolve the context conflict.
| * hw/virtio: Build various files oncePhilippe Mathieu-Daudé2025-07-152-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | Now that various VirtIO files don't use target specific API anymore, we can move them to the system_ss[] source set to build them once. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-Id: <20250708215320.70426-9-philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * qemu: Declare all load/store helper in 'qemu/bswap.h'Philippe Mathieu-Daudé2025-07-151-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | Restrict "exec/tswap.h" to the tswap*() methods, move the load/store helpers with the other ones declared in "qemu/bswap.h". Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-Id: <20250708215320.70426-8-philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * qemu: Convert target_words_bigendian() to TargetInfo APIPhilippe Mathieu-Daudé2025-07-151-1/+1
| | | | | | | | | | | | | | | | | | Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250708215320.70426-6-philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost: add a helper for force stopping a deviceDaniil Tatianin2025-07-141-13/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds an ability to skip GET_VRING_BASE during device stop entirely, and thus the expensive drain operation that this call entails as well, which may be useful during a non-graceful shutdown in case the guest operating system hangs or refuses to react to a previously requested ACPI shutdown for whatever reason. Signed-off-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Message-Id: <20250609212547.2859224-3-d-tatianin@yandex-team.ru> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost: Fix used memslot tracking when destroying a vhost deviceDavid Hildenbrand2025-07-141-27/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we unplug a vhost device, we end up calling vhost_dev_cleanup() where we do a memory_listener_unregister(). This memory_listener_unregister() call will end up disconnecting the listener from the address space through listener_del_address_space(). In that process, we effectively communicate the removal of all memory regions from that listener, resulting in region_del() + commit() callbacks getting triggered. So in case of vhost, we end up calling vhost_commit() with no remaining memory slots (0). In vhost_commit() we end up overwriting the global variables used_memslots / used_shared_memslots, used for detecting the number of free memslots. With used_memslots / used_shared_memslots set to 0 by vhost_commit() during device removal, we'll later assume that the other vhost devices still have plenty of memslots left when calling vhost_get_free_memslots(). Let's fix it by simply removing the global variables and depending only on the actual per-device count. Easy to reproduce by adding two vhost-user devices to a VM and then hot-unplugging one of them. While at it, detect unexpected underflows in vhost_get_free_memslots() and issue a warning. Reported-by: yuanminghao <yuanmh12@chinatelecom.cn> Link: https://lore.kernel.org/qemu-devel/20241121060755.164310-1-yuanmh12@chinatelecom.cn/ Fixes: 2ce68e4cf5be ("vhost: add vhost_has_free_slot() interface") Cc: Igor Mammedov <imammedo@redhat.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com> Message-Id: <20250603111336.1858888-1-david@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | virtio-net: Add queues for RSS during migrationAkihiko Odaki2025-07-141-7/+7
|/ | | | | | | | | | | | | | | | | | | | | | | | | virtio_net_pre_load_queues() inspects vdev->guest_features to tell if VIRTIO_NET_F_RSS or VIRTIO_NET_F_MQ is enabled to infer the required number of queues. This works for VIRTIO_NET_F_MQ but it doesn't for VIRTIO_NET_F_RSS because only the lowest 32 bits of vdev->guest_features is set at the point and VIRTIO_NET_F_RSS uses bit 60 while VIRTIO_NET_F_MQ uses bit 22. Instead of inferring the required number of queues from vdev->guest_features, use the number loaded from the vm state. This change also has a nice side effect to remove a duplicate peer queue pair change by circumventing virtio_net_set_multiqueue(). Also update the comment in include/hw/virtio/virtio.h to prevent an implementation of pre_load_queues() from refering to any fields being loaded during migration by accident in the future. Fixes: 8c49756825da ("virtio-net: Add only one queue pair when realizing") Tested-by: Lei Yang <leiyang@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* memory: Unify the definiton of ReplayRamPopulate() and ReplayRamDiscard()Chenyi Qiang2025-06-231-11/+10
| | | | | | | | | | | | | | | Update ReplayRamDiscard() function to return the result and unify the ReplayRamPopulate() and ReplayRamDiscard() to ReplayRamDiscardState() at the same time due to their identical definitions. This unification simplifies related structures, such as VirtIOMEMReplayData, which makes it cleaner. Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Link: https://lore.kernel.org/r/20250612082747.51539-4-chenyi.qiang@intel.com Signed-off-by: Peter Xu <peterx@redhat.com>
* memory: Change memory_region_set_ram_discard_manager() to return the resultChenyi Qiang2025-06-231-13/+17
| | | | | | | | | | | | | | | | | | | | | | | | Modify memory_region_set_ram_discard_manager() to return -EBUSY if a RamDiscardManager is already set in the MemoryRegion. The caller must handle this failure, such as having virtio-mem undo its actions and fail the realize() process. Opportunistically move the call earlier to avoid complex error handling. This change is beneficial when introducing a new RamDiscardManager instance besides virtio-mem. After ram_block_coordinated_discard_require(true) unlocks all RamDiscardManager instances, only one instance is allowed to be set for one MemoryRegion at present. Suggested-by: David Hildenbrand <david@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Tested-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Link: https://lore.kernel.org/r/20250612082747.51539-3-chenyi.qiang@intel.com Signed-off-by: Peter Xu <peterx@redhat.com>
* memory: Export a helper to get intersection of a MemoryRegionSection with a ↵Chenyi Qiang2025-06-231-27/+5
| | | | | | | | | | | | | | | | | | | given range Rename the helper to memory_region_section_intersect_range() to make it more generic. Meanwhile, define the @end as Int128 and replace the related operations with Int128_* format since the helper is exported as a wider API. Suggested-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com> Link: https://lore.kernel.org/r/20250612082747.51539-2-chenyi.qiang@intel.com Signed-off-by: Peter Xu <peterx@redhat.com>
* hw/virtio/virtio: avoid cost of -ftrivial-auto-var-init in hot pathStefan Hajnoczi2025-06-121-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 7ff9ff039380 ("meson: mitigate against use of uninitialize stack for exploits") the -ftrivial-auto-var-init=zero compiler option is used to zero local variables. While this reduces security risks associated with uninitialized stack data, it introduced a measurable bottleneck in the virtqueue_split_pop() and virtqueue_packed_pop() functions. These virtqueue functions are in the hot path. They are called for each element (request) that is popped from a VIRTIO device's virtqueue. Using __attribute__((uninitialized)) on large stack variables in these functions improves fio randread bs=4k iodepth=64 performance from 304k to 332k IOPS (+9%). This issue was found using perf-top(1). virtqueue_split_pop() was one of the top CPU consumers and the "annotate" feature showed that the memory zeroing instructions at the beginning of the functions were hot. Fixes: 7ff9ff039380 ("meson: mitigate against use of uninitialize stack for exploits") Cc: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20250610123709.835102-3-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* vfio: return mr from vfio_get_xlat_addrSteve Sistare2025-06-051-2/+7
| | | | | | | | | | | | | | | | | | | | | Modify memory_get_xlat_addr and vfio_get_xlat_addr to return the memory region that the translated address is found in. This will be needed by CPR in a subsequent patch to map blocks using IOMMU_IOAS_MAP_FILE. Also return the xlat offset, so we can simplify the interface by removing the out parameters that can be trivially derived from mr and xlat. Lastly, rename the functions to to memory_translate_iotlb() and vfio_translate_iotlb(). Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: John Levon <john.levon@nutanix.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/qemu-devel/1747661203-136490-1-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Stefan Hajnoczi2025-06-023-39/+86
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging virtio,pci,pc: features, fixes, tests vhost will now no longer set a call notifier if unused some work towards loongarch testing based on bios-tables-test some core pci work for SVM support in vtd vhost vdpa init has been optimized for response time to QMP A couple more fixes Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmg97ZUPHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpRBsH/0Fx4NNMaynXmVOgV1rMFirTydhQG5NSdeJv # i1RHd25Rne/RXH0CL71UPuOPADWh6bv9iZTg6RU6g7TwI8K9v3M0R71RlPLh1Lh1 # x7fifWNSNXVi18fM9/j+mIg7I2Ye0AaqveezRJWGzqoOxQKKlVI2xspKZBCCkygd # i2tgtR1ORB6+ji6wVoTDPlL42X5Jef5MUT3XOcRR5biHm0JfqxxQKVM83mD+5yMI # 0YqjT2BVRzo5rGN7mSuf7tQ50xI6I0wI1+eoWeKHRbg08f709M8TZRDKuVh24Evg # 9WnIhKLTzRVdCNLNbw9h9EhxoANpWCyvmnn6GCfkJui40necFHY= # =0lO6 # -----END PGP SIGNATURE----- # gpg: Signature made Mon 02 Jun 2025 14:29:41 EDT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (26 commits) hw/i386/pc_piix: Fix RTC ISA IRQ wiring of isapc machine vdpa: move memory listener register to vhost_vdpa_init vdpa: move iova_tree allocation to net_vhost_vdpa_init vdpa: reorder listener assignment vdpa: add listener_registered vdpa: set backend capabilities at vhost_vdpa_init vdpa: reorder vhost_vdpa_set_backend_cap vdpa: check for iova tree initialized at net_client_start vhost: Don't set vring call if guest notifier is unused tests/qtest/bios-tables-test: Use MiB macro rather hardcode value tests/data/uefi-boot-images: Add ISO image for LoongArch system uefi-test-tools:: Add LoongArch64 support pci: Add a PCI-level API for PRI pci: Add a pci-level API for ATS pci: Add a pci-level initialization function for IOMMU notifiers memory: Store user data pointer in the IOMMU notifiers pci: Add an API to get IOMMU's min page size and virtual address width pci: Cache the bus mastering status in the device pcie: Helper functions to check to check if PRI is enabled pcie: Add a helper to declare the PRI capability for a pcie device ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
| * vdpa: move memory listener register to vhost_vdpa_initEugenio Pérez2025-06-021-7/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current memory operations like pinning may take a lot of time at the destination. Currently they are done after the source of the migration is stopped, and before the workload is resumed at the destination. This is a period where neigher traffic can flow, nor the VM workload can continue (downtime). We can do better as we know the memory layout of the guest RAM at the destination from the moment that all devices are initializaed. So moving that operation allows QEMU to communicate the kernel the maps while the workload is still running in the source, so Linux can start mapping them. As a small drawback, there is a time in the initialization where QEMU cannot respond to QMP etc. By some testing, this time is about 0.2seconds. This may be further reduced (or increased) depending on the vdpa driver and the platform hardware, and it is dominated by the cost of memory pinning. This matches the time that we move out of the called downtime window. The downtime is measured as the elapsed trace time between the last vhost_vdpa_suspend on the source and the last vhost_vdpa_set_vring_enable_one on the destination. In other words, from "guest CPUs freeze" to the instant the final Rx/Tx queue-pair is able to start moving data. Using ConnectX-6 Dx (MLX5) NICs in vhost-vDPA mode with 8 queue-pairs, the series reduces guest-visible downtime during back-to-back live migrations by more than half: - 39G VM: 4.72s -> 2.09s (-2.63s, ~56% improvement) - 128G VM: 14.72s -> 5.83s (-8.89s, ~60% improvement) Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20250522145839.59974-8-jonah.palmer@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vdpa: reorder listener assignmentEugenio Pérez2025-06-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit f6fe3e333f ("vdpa: move memory listener to vhost_vdpa_shared") this piece of code repeatedly assign shared->listener members. This was not a problem as it was not used until device start. However next patches move the listener registration to this vhost_vdpa_init function. When the listener is registered it is added to an embedded linked list, so setting its members again will cause memory corruption to the linked list node. Do the right thing and only set it in the first vdpa device. Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20250522145839.59974-6-jonah.palmer@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vdpa: add listener_registeredEugenio Pérez2025-06-021-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Check if the listener has been registered or not, so it needs to be registered again at start. Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20250522145839.59974-5-jonah.palmer@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vdpa: set backend capabilities at vhost_vdpa_initEugenio Pérez2025-06-021-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The backend does not reset them until the vdpa file descriptor is closed so there is no harm in doing it only once. This allows the destination of a live migration to premap memory in batches, using VHOST_BACKEND_F_IOTLB_BATCH. Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20250522145839.59974-4-jonah.palmer@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vdpa: reorder vhost_vdpa_set_backend_capEugenio Pérez2025-06-021-30/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | It will be used directly by vhost_vdpa_init. Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20250522145839.59974-3-jonah.palmer@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost: Don't set vring call if guest notifier is unusedHuaitong Han2025-06-021-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The vring call fd is set even when the guest does not use MSI-X (e.g., in the case of virtio PMD), leading to unnecessary CPU overhead for processing interrupts. The commit 96a3d98d2c("vhost: don't set vring call if no vector") optimized the case where MSI-X is enabled but the queue vector is unset. However, there's an additional case where the guest uses INTx and the INTx_DISABLED bit in the PCI config is set, meaning that no interrupt notifier will actually be used. In such cases, the vring call fd should also be cleared to avoid redundant interrupt handling. Fixes: 96a3d98d2c("vhost: don't set vring call if no vector") Reported-by: Zhiyuan Yuan <yuanzhiyuan@chinatelecom.cn> Signed-off-by: Jidong Xia <xiajd@chinatelecom.cn> Signed-off-by: Huaitong Han <hanht2@chinatelecom.cn> Message-Id: <20250522100548.212740-1-hanht2@chinatelecom.cn> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio: check for validity of indirect descriptorsYuri Benditovich2025-06-011-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtio processes indirect descriptors even if the respected feature VIRTIO_RING_F_INDIRECT_DESC was not negotiated. If qemu is used with reduced set of features to emulate the hardware device that does not support indirect descriptors, the will probably trigger problematic flows on the hardware setup but do not reveal the mistake on qemu. Add LOG_GUEST_ERROR for such case. This will issue logs with '-d guest_errors' in the command line Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Message-Id: <20250515063237.808293-1-yuri.benditovich@daynix.com> Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
* | hw/virtio/virtio-pci: Remove VIRTIO_PCI_FLAG_DISABLE_PCIE definitionPhilippe Mathieu-Daudé2025-05-301-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VIRTIO_PCI_FLAG_DISABLE_PCIE was only used by the hw_compat_2_4[] array, via the 'x-disable-pcie=false' property. We removed all machines using that array, lets remove all the code around VIRTIO_PCI_FLAG_DISABLE_PCIE (see commit 9a4c0e220d8 for similar VIRTIO_PCI_FLAG_* enum removal). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20250512083948.39294-9-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
* | hw/virtio/virtio-pci: Remove VIRTIO_PCI_FLAG_MIGRATE_EXTRA definitionPhilippe Mathieu-Daudé2025-05-301-5/+1
|/ | | | | | | | | | | | | | VIRTIO_PCI_FLAG_MIGRATE_EXTRA was only used by the hw_compat_2_4[] array, via the 'migrate-extra=true' property. We removed all machines using that array, lets remove all the code around VIRTIO_PCI_FLAG_MIGRATE_EXTRA. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-ID: <20250512083948.39294-8-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
* virtio: Move virtio_reset()Akihiko Odaki2025-05-141-45/+43
| | | | | | | | | | | Move virtio_reset() to a later part of the file to remove the forward declaration of virtio_set_features_nocheck() and to prepare the situation that we need more operations to perform during reset. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20250421-reset-v2-2-e4c1ead88ea1@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio: Call set_features during resetAkihiko Odaki2025-05-141-1/+3
| | | | | | | | | | | | | | | | | | | | | virtio-net expects set_features() will be called when the feature set used by the guest changes to update the number of virtqueues but it is not called during reset, which will clear all features, leaving the queues added for VIRTIO_NET_F_MQ or VIRTIO_NET_F_RSS. Not only these extra queues are visible to the guest, they will cause segmentation fault during migration. Call set_features() during reset to remove those queues for virtio-net as we call set_status(). It will also prevent similar bugs for virtio-net and other devices in the future. Fixes: f9d6dbf0bf6e ("virtio-net: remove virtio queues if the guest doesn't support multiqueue") Buglink: https://issues.redhat.com/browse/RHEL-73842 Cc: qemu-stable@nongnu.org Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20250421-reset-v2-1-e4c1ead88ea1@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost-user: return failure if backend crash when live migrationHaoqian He2025-05-1412-59/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Live migration should be terminated if the vhost-user backend crashes before the migration completes. Specifically, since the vhost device will be stopped when VM is stopped before the end of the live migration, in current implementation if the backend crashes, vhost-user device set_status() won't return failure, live migration won't perceive the disconnection between QEMU and the backend. When the VM is migrated to the destination, the inflight IO will be resubmitted, and if the IO was completed out of order before, it will cause IO error. To fix this issue: 1. Add the return value to set_status() for VirtioDeviceClass. a. For the vhost-user device, return failure when the backend crashes. b. For other virtio devices, always return 0. 2. Return failure if vhost_dev_stop() failed for vhost-user device. If QEMU loses connection with the vhost-user backend, virtio set_status() can return failure to the upper layer, migration_completion() can handle the error, terminate the live migration, and restore the VM, so that inflight IO can be completed normally. Signed-off-by: Haoqian He <haoqian.he@smartx.com> Message-Id: <20250416024729.3289157-4-haoqian.he@smartx.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost: return failure if stop virtqueue failed in vhost_dev_stopHaoqian He2025-05-141-10/+13
| | | | | | | | | | | | | | | | | | | | | | | This patch captures the error of vhost_virtqueue_stop() in vhost_dev_stop() and returns the error upward. Specifically, if QEMU is disconnected from the vhost backend, some actions in vhost_dev_stop() will fail, such as sending vhost-user messages to the backend (GET_VRING_BASE, SET_VRING_ENABLE) and vhost_reset_status. Considering that both set_vring_enable and vhost_reset_status require setting the specific virtio feature bit, we can capture vhost_virtqueue_stop()'s error to indicate that QEMU has lost connection with the backend. This patch is the pre patch for 'vhost-user: return failure if backend crashes when live migration', which makes the live migration aware of the loss of connection with the vhost-user backend and aborts the live migration. Signed-off-by: Haoqian He <haoqian.he@smartx.com> Message-Id: <20250416024729.3289157-3-haoqian.he@smartx.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>