summary refs log tree commit diff stats
path: root/hw/virtio/virtio.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* hw/virtio/virtio: Replace legacy cpu_physical_memory_map() callPhilippe Mathieu-Daudé2025-10-071-4/+6
| | | | | | | | | | Propagate VirtIODevice::dma_as to virtqueue_undo_map_desc() in order to replace the legacy cpu_physical_memory_unmap() call by address_space_unmap(). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20251002084203.63899-18-philmd@linaro.org>
* Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Richard Henderson2025-10-061-52/+98
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging virtio,pci,pc: features, fixes users can now control VM bit in smbios. vhost-user-device is now user-createable. intel_iommu now supports PRI virtio-net now supports GSO over UDP tunnel ghes now supports error injection amd iommu now supports dma remapping for vfio better error messages for virtio small fixes all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmji0s0PHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpuH4H/09h70IqAWZGHIWKGmmGGtdKOj3g54KuI0Ss # mGECEsHvvBexOy670Qy8jdgXfaW4UuNui8BiOnJnGsBX8Y0dy+/yZori3KhkXkaY # D57Ap9agkpHem7Vw0zgNsAF2bzDdlzTiQ6ns5oDnSq8yt82onCb5WGkWTGkPs/jL # Gf8Jv+Ddcpt5SU4/hHPYC8pUhl7z4xPOOyl0Qp1GG21Pxf5v4sGFcWuGGB7UEPSQ # MjZeoM0rSnLDtNg18sGwD5RPLQs13TbtgsVwijI79c3w3rcSpPNhGR5OWkdRCIYF # 8A0Nhq0Yfo0ogTht7yt1QNPf/ktJkuoBuGVirvpDaix2tCBECes= # =Zvq/ # -----END PGP SIGNATURE----- # gpg: Signature made Sun 05 Oct 2025 01:19:25 PM PDT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [unknown] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [unknown] # gpg: WARNING: The key's User ID is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (75 commits) virtio: improve virtqueue mapping error messages pci: Fix wrong parameter passing to pci_device_get_iommu_bus_devfn() intel_iommu: Simplify caching mode check with VFIO device intel_iommu: Enable Enhanced Set Root Table Pointer Support (ESRTPS) vdpa-dev: add get_vhost() callback for vhost-vdpa device amd_iommu: HATDis/HATS=11 support intel-iommu: Move dma_translation to x86-iommu amd_iommu: Refactor amdvi_page_walk() to use common code for page walk amd_iommu: Do not assume passthrough translation when DTE[TV]=0 amd_iommu: Toggle address translation mode on devtab entry invalidation amd_iommu: Add dma-remap property to AMD vIOMMU device amd_iommu: Set all address spaces to use passthrough mode on reset amd_iommu: Toggle memory regions based on address translation mode amd_iommu: Invalidate address translations on INVALIDATE_IOMMU_ALL amd_iommu: Add replay callback amd_iommu: Unmap all address spaces under the AMD IOMMU on reset amd_iommu: Use iova_tree records to determine large page size on UNMAP amd_iommu: Sync shadow page tables on page invalidation amd_iommu: Add basic structure to support IOMMU notifier updates amd_iommu: Add a page walker to sync shadow page tables on invalidation ... Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
| * virtio: improve virtqueue mapping error messagesAlessandro Ratti2025-10-051-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve error reporting when virtqueue ring mapping fails by including a device identifier in the error message. Introduce a helper qdev_get_printable_name() in qdev-core, which returns either: - the device ID, if explicitly provided (e.g. -device ...,id=foo) - the QOM path from qdev_get_dev_path(dev) otherwise - "<unknown device>" as a fallback when no identifier is present This makes it easier to identify which device triggered the error in multi-device setups or when debugging complex guest configurations. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/230 Buglink: https://bugs.launchpad.net/qemu/+bug/1919021 Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Alessandro Ratti <alessandro@0x65c.net> Message-Id: <20250924093138.559872-2-alessandro@0x65c.net> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio: support irqfd in virtio_notify_config()Stefan Hajnoczi2025-10-051-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtio_error() calls virtio_notify_config() to inject a VIRTIO Configuration Change Notification. This doesn't work from IOThreads because the BQL is not held and the interrupt code path requires the BQL. Follow the same approach as virtio_notify() and use ->config_notifier (an irqfd) when called from the IOThread. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250922220149.498967-4-stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio: unify virtio_notify_irqfd() and virtio_notify()Stefan Hajnoczi2025-10-051-15/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The difference between these two functions: - virtio_notify() uses the interrupt code path (MSI or classic IRQs) - virtio_notify_irqfd() uses guest notifiers (irqfds) virtio_notify() can only be called with the BQL held because the interrupt code path requires the BQL. Device models use virtio_notify_irqfd() from IOThreads since the BQL is not held. The two functions can be unified by pushing down the if (qemu_in_iothread()) check from virtio-blk and virtio-scsi into core virtio code. This is in preparation for the next commit that will add irqfd support to virtio_notify_config() and where it's unattractive to introduce another irqfd-only API for device model callers. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <20250922220149.498967-3-stefanha@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio: add support for negotiating extended featuresPaolo Abeni2025-10-041-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The virtio specifications allows for a device features space up to 128 bits and more. Soon we are going to use some of the 'extended' bits features for the virtio net driver. Add support to allow extended features negotiation on a per devices basis. Devices willing to negotiated extended features need to implemented a new pair of features getter/setter, the core will conditionally use them instead of the basic one. Note that 'bad_features' don't need to be extended, as they are bound to the 64 bits limit. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <9bb29d70adc3f2b8c7756d4e3cd076cffee87826.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * virtio: serialize extended features statePaolo Abeni2025-10-041-31/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the driver uses any of the extended features (i.e. 64 or above), store the extended features range (64-127 bits). At load time, let legacy features initialize the full features range and pass it to the set helper; sub-states loading will have filled-up the extended part as needed. This is one of the few spots that need explicitly to know and set in stone the extended features array size; add a build bug to prevent breaking the migration should such size change again in the future: more serialization plumbing will be needed. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <d5d9d398675bee6c4c7d7308c5d3d5d3c6d17d87.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | migration: Remove error variant of vmstate_save_state() functionArun Menon2025-10-031-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit removes the redundant vmstate_save_state_with_err() function. Previously, commit 969298f9d7 introduced vmstate_save_state_with_err() to handle error propagation, while vmstate_save_state() existed for non-error scenarios. This is because there were code paths where vmstate_save_state_v() (called internally by vmstate_save_state) did not explicitly set errors on failure. This change unifies error handling by - updating vmstate_save_state() to accept an Error **errp argument. - vmstate_save_state_v() ensures errors are set directly within the errp object, eliminating the need for two separate functions. All calls to vmstate_save_state_with_err() are replaced with vmstate_save_state(). This simplifies the API and improves code maintainability. vmstate_save_state() that only calls vmstate_save_state_v(), by inference, also has errors set in errp in case of failure. The errors are reported using error_report_err(). If we want the function to exit on error, then &error_fatal is passed. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-24-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* | migration: push Error **errp into vmstate_load_state()Arun Menon2025-10-031-2/+5
|/ | | | | | | | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that vmstate_load_state() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series, when we are actually able to propagate the error to the calling function using errp. Whereas, if we want the function to exit on error, then error_fatal is passed. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-2-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* virtio: fix off-by-one and invalid access in virtqueue_ordered_fillJonah Palmer2025-08-011-6/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit b44135daa372 introduced virtqueue_ordered_fill for VIRTIO_F_IN_ORDER support but had a few issues: * Conditional while loop used 'steps <= max_steps' but should've been 'steps < max_steps' since reaching steps == max_steps would indicate that we didn't find an element, which is an error. Without this change, the code would attempt to read invalid data at an index outside of our search range. * Incremented 'steps' using the next chain's ndescs instead of the current one. This patch corrects the loop bounds and synchronizes 'steps' and index increments. We also add a defensive sanity check against malicious or invalid descriptor counts to avoid a potential infinite loop and DoS. Fixes: b44135daa372 ("virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER support") Reported-by: terrynini <terrynini38514@gmail.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20250721150208.2409779-1-jonah.palmer@oracle.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu ↵Stefan Hajnoczi2025-07-161-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging virtio,pci,pc: features, fixes, tests SPCR acpi table can now be disabled vhost-vdpa can now report hashing capability to guest PPTT acpi table now tells guest vCPUs are identical vost-user-blk now shuts down faster loongarch64 now supports bios-tables-test intel_iommu now supports ATS cxl now supports DCD Fabric Management Command Set arm now supports acpi pci hotplug fixes, cleanups Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # -----BEGIN PGP SIGNATURE----- # # iQFDBAABCgAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmh1+7APHG1zdEByZWRo # YXQuY29tAAoJECgfDbjSjVRpcZ8H/2udpCZ49vjPB8IwQAGdFTw2TWVdxUQFHexQ # pOsCGyFBNAXqD1bmb8lwWyYVJ08WELyL6xWsQ5tfVPiXpKYYHPHl4rNr/SPoyNcv # joY++tagudmOki2DU7nfJ+rPIIuigOTUHbv4TZciwcHle6f65s0iKXhR1sL0cj4i # TS6iJlApSuJInrBBUxuxSUomXk79mFTNKRiXj1k58LRw6JOUEgYvtIW8i+mOUcTg # h1dZphxEQr/oG+a2pM8GOVJ1AFaBPSfgEnRM4kTX9QuTIDCeMAKUBo/mwOk6PV7z # ZhSrDPLrea27XKGL++EJm0fFJ/AsHF1dTks2+c0rDrSK+UV87Zc= # =sktm # -----END PGP SIGNATURE----- # gpg: Signature made Tue 15 Jul 2025 02:56:48 EDT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (97 commits) hw/cxl: mailbox-utils: 0x5605 - FMAPI Initiate DC Release hw/cxl: mailbox-utils: 0x5604 - FMAPI Initiate DC Add hw/cxl: Create helper function to create DC Event Records from extents hw/cxl: mailbox-utils: 0x5603 - FMAPI Get DC Region Extent Lists hw/cxl: mailbox-utils: 0x5602 - FMAPI Set DC Region Config hw/mem: cxl_type3: Add DC Region bitmap lock hw/cxl: Move definition for dynamic_capacity_uuid and enum for DC event types to header hw/cxl: mailbox-utils: 0x5601 - FMAPI Get Host Region Config hw/mem: cxl_type3: Add dsmas_flags to CXLDCRegion struct hw/cxl: mailbox-utils: 0x5600 - FMAPI Get DCD Info hw/cxl: fix DC extent capacity tracking tests: virt: Update expected ACPI tables for virt test hw/acpi/aml-build: Build a root node in the PPTT table hw/acpi/aml-build: Set identical implementation flag for PPTT processor nodes tests: virt: Allow changes to PPTT test table qtest/bios-tables-test: Generate reference blob for DSDT.acpipcihp qtest/bios-tables-test: Generate reference blob for DSDT.hpoffacpiindex tests/qtest/bios-tables-test: Add aarch64 ACPI PCI hotplug test tests/qtest/bios-tables-test: Prepare for addition of acpi pci hp tests hw/arm/virt: Let virt support pci hotplug/unplug GED event ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Conflicts: net/vhost-vdpa.c vhost_vdpa_set_steering_ebpf() was removed, resolve the context conflict.
| * qemu: Convert target_words_bigendian() to TargetInfo APIPhilippe Mathieu-Daudé2025-07-151-1/+1
| | | | | | | | | | | | | | | | | | Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250708215320.70426-6-philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | virtio-net: Add queues for RSS during migrationAkihiko Odaki2025-07-141-7/+7
|/ | | | | | | | | | | | | | | | | | | | | | | | | virtio_net_pre_load_queues() inspects vdev->guest_features to tell if VIRTIO_NET_F_RSS or VIRTIO_NET_F_MQ is enabled to infer the required number of queues. This works for VIRTIO_NET_F_MQ but it doesn't for VIRTIO_NET_F_RSS because only the lowest 32 bits of vdev->guest_features is set at the point and VIRTIO_NET_F_RSS uses bit 60 while VIRTIO_NET_F_MQ uses bit 22. Instead of inferring the required number of queues from vdev->guest_features, use the number loaded from the vm state. This change also has a nice side effect to remove a duplicate peer queue pair change by circumventing virtio_net_set_multiqueue(). Also update the comment in include/hw/virtio/virtio.h to prevent an implementation of pre_load_queues() from refering to any fields being loaded during migration by accident in the future. Fixes: 8c49756825da ("virtio-net: Add only one queue pair when realizing") Tested-by: Lei Yang <leiyang@redhat.com> Cc: qemu-stable@nongnu.org Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* hw/virtio/virtio: avoid cost of -ftrivial-auto-var-init in hot pathStefan Hajnoczi2025-06-121-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 7ff9ff039380 ("meson: mitigate against use of uninitialize stack for exploits") the -ftrivial-auto-var-init=zero compiler option is used to zero local variables. While this reduces security risks associated with uninitialized stack data, it introduced a measurable bottleneck in the virtqueue_split_pop() and virtqueue_packed_pop() functions. These virtqueue functions are in the hot path. They are called for each element (request) that is popped from a VIRTIO device's virtqueue. Using __attribute__((uninitialized)) on large stack variables in these functions improves fio randread bs=4k iodepth=64 performance from 304k to 332k IOPS (+9%). This issue was found using perf-top(1). virtqueue_split_pop() was one of the top CPU consumers and the "annotate" feature showed that the memory zeroing instructions at the beginning of the functions were hot. Fixes: 7ff9ff039380 ("meson: mitigate against use of uninitialize stack for exploits") Cc: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 20250610123709.835102-3-berrange@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* virtio: check for validity of indirect descriptorsYuri Benditovich2025-06-011-0/+11
| | | | | | | | | | | | | | | virtio processes indirect descriptors even if the respected feature VIRTIO_RING_F_INDIRECT_DESC was not negotiated. If qemu is used with reduced set of features to emulate the hardware device that does not support indirect descriptors, the will probably trigger problematic flows on the hardware setup but do not reveal the mistake on qemu. Add LOG_GUEST_ERROR for such case. This will issue logs with '-d guest_errors' in the command line Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com> Message-Id: <20250515063237.808293-1-yuri.benditovich@daynix.com> Signed-off-by: Yuri Benditovich <yuri.benditovich@daynix.com>
* virtio: Move virtio_reset()Akihiko Odaki2025-05-141-45/+43
| | | | | | | | | | | Move virtio_reset() to a later part of the file to remove the forward declaration of virtio_set_features_nocheck() and to prepare the situation that we need more operations to perform during reset. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20250421-reset-v2-2-e4c1ead88ea1@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio: Call set_features during resetAkihiko Odaki2025-05-141-1/+3
| | | | | | | | | | | | | | | | | | | | | virtio-net expects set_features() will be called when the feature set used by the guest changes to update the number of virtqueues but it is not called during reset, which will clear all features, leaving the queues added for VIRTIO_NET_F_MQ or VIRTIO_NET_F_RSS. Not only these extra queues are visible to the guest, they will cause segmentation fault during migration. Call set_features() during reset to remove those queues for virtio-net as we call set_status(). It will also prevent similar bugs for virtio-net and other devices in the future. Fixes: f9d6dbf0bf6e ("virtio-net: remove virtio queues if the guest doesn't support multiqueue") Buglink: https://issues.redhat.com/browse/RHEL-73842 Cc: qemu-stable@nongnu.org Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Message-Id: <20250421-reset-v2-1-e4c1ead88ea1@daynix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* vhost-user: return failure if backend crash when live migrationHaoqian He2025-05-141-7/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Live migration should be terminated if the vhost-user backend crashes before the migration completes. Specifically, since the vhost device will be stopped when VM is stopped before the end of the live migration, in current implementation if the backend crashes, vhost-user device set_status() won't return failure, live migration won't perceive the disconnection between QEMU and the backend. When the VM is migrated to the destination, the inflight IO will be resubmitted, and if the IO was completed out of order before, it will cause IO error. To fix this issue: 1. Add the return value to set_status() for VirtioDeviceClass. a. For the vhost-user device, return failure when the backend crashes. b. For other virtio devices, always return 0. 2. Return failure if vhost_dev_stop() failed for vhost-user device. If QEMU loses connection with the vhost-user backend, virtio set_status() can return failure to the upper layer, migration_completion() can handle the error, terminate the live migration, and restore the VM, so that inflight IO can be completed normally. Signed-off-by: Haoqian He <haoqian.he@smartx.com> Message-Id: <20250416024729.3289157-4-haoqian.he@smartx.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* system/runstate: add VM state change cb with return valueHaoqian He2025-05-141-1/+1
| | | | | | | | | | | | | | | | | | | | | This patch adds the new VM state change cb type `VMChangeStateHandlerWithRet`, which has return value for `VMChangeStateEntry`. Thus, we can register a new VM state change cb with return value for device. Note that `VMChangeStateHandler` and `VMChangeStateHandlerWithRet` are mutually exclusive and cannot be provided at the same time. This patch is the pre patch for 'vhost-user: return failure if backend crashes when live migration', which makes the live migration aware of the loss of connection with the vhost-user backend and aborts the live migration. Virtio device will use VMChangeStateHandlerWithRet. Signed-off-by: Haoqian He <haoqian.he@smartx.com> Message-Id: <20250416024729.3289157-2-haoqian.he@smartx.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* exec: Rename target_words_bigendian() -> target_big_endian()Philippe Mathieu-Daudé2025-04-251-1/+1
| | | | | | | | | | | | | | | | | In commit 98ed8ecfc9d ("exec: introduce target_words_bigendian() helper") target_words_bigendian() was matching the definition it was depending on (TARGET_WORDS_BIGENDIAN). Later in commit ee3eb3a7ce7 ("Replace TARGET_WORDS_BIGENDIAN") the definition was renamed as TARGET_BIG_ENDIAN but we didn't update the helper. Do it now mechanically using: $ sed -i -e s/target_words_bigendian/target_big_endian/g \ $(git grep -wl target_words_bigendian) Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-Id: <20250417210025.68322-1-philmd@linaro.org>
* qom: Have class_init() take a const data argumentPhilippe Mathieu-Daudé2025-04-251-1/+1
| | | | | | | | | | Mechanical change using gsed, then style manually adapted to pass checkpatch.pl script. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250424194905.82506-4-philmd@linaro.org>
* cleanup: Drop pointless return at end of functionMarkus Armbruster2025-04-241-2/+0
| | | | | | | | | | | A few functions now end with a label. The next commit will clean them up. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20250407082643.2310002-3-armbru@redhat.com> [Straightforward conflict with commit 988ad4ccebb6 (hw/loongarch/virt: Fix cpuslot::cpu set at last in virt_cpu_plug()) resolved]
* Merge tag 'exec-20241220' of https://github.com/philmd/qemu into stagingStefan Hajnoczi2024-12-211-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Accel & Exec patch queue - Ignore writes to CNTP_CTL_EL0 on HVF ARM (Alexander) - Add '-d invalid_mem' logging option (Zoltan) - Create QOM containers explicitly (Peter) - Rename sysemu/ -> system/ (Philippe) - Re-orderning of include/exec/ headers (Philippe) Move a lot of declarations from these legacy mixed bag headers: . "exec/cpu-all.h" . "exec/cpu-common.h" . "exec/cpu-defs.h" . "exec/exec-all.h" . "exec/translate-all" to these more specific ones: . "exec/page-protection.h" . "exec/translation-block.h" . "user/cpu_loop.h" . "user/guest-host.h" . "user/page-protection.h" # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmdlnyAACgkQ4+MsLN6t # wN6mBw//QFWi7CrU+bb8KMM53kOU9C507tjn99LLGFb5or73/umDsw6eo/b8DHBt # KIwGLgATel42oojKfNKavtAzLK5rOrywpboPDpa3SNeF1onW+99NGJ52LQUqIX6K # A6bS0fPdGG9ZzEuPpbjDXlp++0yhDcdSgZsS42fEsT7Dyj5gzJYlqpqhiXGqpsn8 # 4Y0UMxSL21K3HEexlzw2hsoOBFA3tUm2ujNDhNkt8QASr85yQVLCypABJnuoe/// # 5Ojl5wTBeDwhANET0rhwHK8eIYaNboiM9fHopJYhvyw1bz6yAu9jQwzF/MrL3s/r # xa4OBHBy5mq2hQV9Shcl3UfCQdk/vDaYaWpgzJGX8stgMGYfnfej1SIl8haJIfcl # VMX8/jEFdYbjhO4AeGRYcBzWjEJymkDJZoiSWp2NuEDi6jqIW+7yW1q0Rnlg9lay # ShAqLK5Pv4zUw3t0Jy3qv9KSW8sbs6PQxtzXjk8p97rTf76BJ2pF8sv1tVzmsidP # 9L92Hv5O34IqzBu2oATOUZYJk89YGmTIUSLkpT7asJZpBLwNM2qLp5jO00WVU0Sd # +kAn324guYPkko/TVnjC/AY7CMu55EOtD9NU35k3mUAnxXT9oDUeL4NlYtfgrJx6 # x1Nzr2FkS68+wlPAFKNSSU5lTjsjNaFM0bIJ4LCNtenJVP+SnRo= # =cjz8 # -----END PGP SIGNATURE----- # gpg: Signature made Fri 20 Dec 2024 11:45:20 EST # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE * tag 'exec-20241220' of https://github.com/philmd/qemu: (59 commits) util/qemu-timer: fix indentation meson: Do not define CONFIG_DEVICES on user emulation system/accel-ops: Remove unnecessary 'exec/cpu-common.h' header system/numa: Remove unnecessary 'exec/cpu-common.h' header hw/xen: Remove unnecessary 'exec/cpu-common.h' header target/mips: Drop left-over comment about Jazz machine target/mips: Remove tswap() calls in semihosting uhi_fstat_cb() target/xtensa: Remove tswap() calls in semihosting simcall() helper accel/tcg: Un-inline translator_is_same_page() accel/tcg: Include missing 'exec/translation-block.h' header accel/tcg: Move tcg_cflags_has/set() to 'exec/translation-block.h' accel/tcg: Restrict curr_cflags() declaration to 'internal-common.h' qemu/coroutine: Include missing 'qemu/atomic.h' header exec/translation-block: Include missing 'qemu/atomic.h' header accel/tcg: Declare cpu_loop_exit_requested() in 'exec/cpu-common.h' exec/cpu-all: Include 'cpu.h' earlier so MMU_USER_IDX is always defined target/sparc: Move sparc_restore_state_to_opc() to cpu.c target/sparc: Uninline cpu_get_tb_cpu_state() target/loongarch: Declare loongarch_cpu_dump_state() locally user: Move various declarations out of 'exec/exec-all.h' ... Conflicts: hw/char/riscv_htif.c hw/intc/riscv_aplic.c target/s390x/cpu.c Apply sysemu header path changes to not in the pull request. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
| * include: Rename sysemu/ -> system/Philippe Mathieu-Daudé2024-12-201-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | Headers in include/sysemu/ are not only related to system *emulation*, they are also used by virtualization. Rename as system/ which is clearer. Files renamed manually then mechanical change using sed tool. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Message-Id: <20241203172445.28576-1-philmd@linaro.org>
* | include/hw/qdev-properties: Remove DEFINE_PROP_END_OF_LISTRichard Henderson2024-12-191-1/+0
|/ | | | | | | | | | | | | | Now that all of the Property arrays are counted, we can remove the terminator object from each array. Update the assertions in device_class_set_props to match. With struct Property being 88 bytes, this was a rather large form of terminator. Saves 30k from qemu-system-aarch64. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://lore.kernel.org/r/20241218134251.4724-21-richard.henderson@linaro.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* hw/virtio: Constify all PropertyRichard Henderson2024-12-151-1/+1
| | | | | Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* virtio-net: Add queues before loading themAkihiko Odaki2024-11-261-0/+7
| | | | | | | | | | | | | Call virtio_net_set_multiqueue() to add queues before loading their states. Otherwise the loaded queues will not have handlers and elements in them will not be processed. Cc: qemu-stable@nongnu.org Fixes: 8c49756825da ("virtio-net: Add only one queue pair when realizing") Reported-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* virtio: rename virtio_split_packed_update_used_idxWenyu Huang2024-09-111-2/+2
| | | | | | | | | | | virtio_split_packed_update_used_idx should be virtio_queue_split_update_used_idx like virtio_split_packed_update_used_idx. Signed-off-by: Wenyu Huang <huangwenyuu@outlook.com> Message-Id: <TYBP286MB036536B9015994AA5F3E4495ACB22@TYBP286MB0365.JPNP286.PROD.OUTLOOK.COM> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio: Always reset vhost devicesHanna Czenczek2024-09-101-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Requiring `vhost_started` to be true for resetting vhost devices in `virtio_reset()` seems like the wrong condition: Most importantly, the preceding `virtio_set_status(vdev, 0)` call will (for vhost devices) end up in `vhost_dev_stop()` (through vhost devices' `.set_status` implementations), setting `vdev->vhost_started = false`. Therefore, the gated `vhost_reset_device()` call is unreachable. `vhost_started` is not documented, so it is hard to say what exactly it is supposed to mean, but judging from the fact that `vhost_dev_start()` sets it and `vhost_dev_stop()` clears it, it seems like it indicates whether there is a vhost back-end, and whether that back-end is currently running and processing virtio requests. Making a reset conditional on whether the vhost back-end is processing virtio requests seems wrong; in fact, it is probably better to reset it only when it is not currently processing requests, which is exactly the current order of operations in `virtio_reset()`: First, the back-end is stopped through `virtio_set_status(vdev, 0)`, then we want to send a reset. Therefore, we should drop the `vhost_started` condition, but in its stead we then have to verify that we can indeed send a reset to this vhost device, by not just checking `k->get_vhost != NULL` (introduced by commit 95e1019a4a9), but also that the vhost back-end is connected (`hdev = k->get_vhost(); hdev != NULL && hdev->vhost_ops != NULL`). Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-Id: <20240723163941.48775-3-hreitz@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio-net: Fix network stall at the host side waiting for kickthomas2024-08-021-4/+60
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch 06b12970174 ("virtio-net: fix network stall under load") added double-check to test whether the available buffer size can satisfy the request or not, in case the guest has added some buffers to the avail ring simultaneously after the first check. It will be lucky if the available buffer size becomes okay after the double-check, then the host can send the packet to the guest. If the buffer size still can't satisfy the request, even if the guest has added some buffers, viritio-net would stall at the host side forever. The patch enables notification and checks whether the guest has added some buffers since last check of available buffers when the available buffers are insufficient. If no buffer is added, return false, else recheck the available buffers in the loop. If the available buffers are sufficient, disable notification and return true. Changes: 1. Change the return type of virtqueue_get_avail_bytes() from void to int, it returns an opaque that represents the shadow_avail_idx of the virtqueue on success, else -1 on error. 2. Add a new API: virtio_queue_enable_notification_and_check(), it takes an opaque as input arg which is returned from virtqueue_get_avail_bytes(). It enables notification firstly, then checks whether the guest has added some buffers since last check of available buffers or not by virtio_queue_poll(), return ture if yes. The patch also reverts patch "06b12970174". The case below can reproduce the stall. Guest 0 +--------+ | iperf | ---------------> | server | Host | +--------+ +--------+ | ... | iperf |---- | client |---- Guest n +--------+ | +--------+ | | iperf | ---------------> | server | +--------+ Boot many guests from qemu with virtio network: qemu ... -netdev tap,id=net_x \ -device virtio-net-pci-non-transitional,\ iommu_platform=on,mac=xx:xx:xx:xx:xx:xx,netdev=net_x Each guest acts as iperf server with commands below: iperf3 -s -D -i 10 -p 8001 iperf3 -s -D -i 10 -p 8002 The host as iperf client: iperf3 -c guest_IP -p 8001 -i 30 -w 256k -P 20 -t 40000 iperf3 -c guest_IP -p 8002 -i 30 -w 256k -P 20 -t 40000 After some time, the host loses connection to the guest, the guest can send packet to the host, but can't receive packet from the host. It's more likely to happen if SWIOTLB is enabled in the guest, allocating and freeing bounce buffer takes some CPU ticks, copying from/to bounce buffer takes more CPU ticks, compared with that there is no bounce buffer in the guest. Once the rate of producing packets from the host approximates the rate of receiveing packets in the guest, the guest would loop in NAPI. receive packets --- | | v | free buf virtnet_poll | | v | add buf to avail ring --- | | need kick the host? | NAPI continues v receive packets --- | | v | free buf virtnet_poll | | v | add buf to avail ring --- | v ... ... On the other hand, the host fetches free buf from avail ring, if the buf in the avail ring is not enough, the host notifies the guest the event by writing the avail idx read from avail ring to the event idx of used ring, then the host goes to sleep, waiting for the kick signal from the guest. Once the guest finds the host is waiting for kick singal (in virtqueue_kick_prepare_split()), it kicks the host. The host may stall forever at the sequences below: Host Guest ------------ ----------- fetch buf, send packet receive packet --- ... ... | fetch buf, send packet add buf | ... add buf virtnet_poll buf not enough avail idx-> add buf | read avail idx add buf | add buf --- receive packet --- write event idx ... | wait for kick add buf virtnet_poll ... | --- no more packet, exit NAPI In the first loop of NAPI above, indicated in the range of virtnet_poll above, the host is sending packets while the guest is receiving packets and adding buffers. step 1: The buf is not enough, for example, a big packet needs 5 buf, but the available buf count is 3. The host read current avail idx. step 2: The guest adds some buf, then checks whether the host is waiting for kick signal, not at this time. The used ring is not empty, the guest continues the second loop of NAPI. step 3: The host writes the avail idx read from avail ring to used ring as event idx via virtio_queue_set_notification(q->rx_vq, 1). step 4: At the end of the second loop of NAPI, recheck whether kick is needed, as the event idx in the used ring written by the host is beyound the range of kick condition, the guest will not send kick signal to the host. Fixes: 06b12970174 ("virtio-net: fix network stall under load") Cc: qemu-stable@nongnu.org Signed-off-by: Wencheng Yang <east.moutain.yang@gmail.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
* virtio: virtqueue_ordered_flush - VIRTIO_F_IN_ORDER supportJonah Palmer2024-07-211-1/+70
| | | | | | | | | | | | | | | | | | | | | | | | Add VIRTIO_F_IN_ORDER feature support for the virtqueue_flush operation. The goal of the virtqueue_ordered_flush operation when the VIRTIO_F_IN_ORDER feature has been negotiated is to write elements to the used/descriptor ring in-order and then update used_idx. The function iterates through the VirtQueueElement used_elems array in-order starting at vq->used_idx. If the element is valid (filled), the element is written to the used/descriptor ring. This process continues until we find an invalid (not filled) element. For packed VQs, the first entry (at vq->used_idx) is written to the descriptor ring last so the guest doesn't see any invalid descriptors. If any elements were written, the used_idx is updated. Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20240710125522.4168043-5-jonah.palmer@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Eugenio Pérez <eperezma@redhat.com>
* virtio: virtqueue_ordered_fill - VIRTIO_F_IN_ORDER supportJonah Palmer2024-07-211-1/+43
| | | | | | | | | | | | | | | | | | | Add VIRTIO_F_IN_ORDER feature support for the virtqueue_fill operation. The goal of the virtqueue_ordered_fill operation when the VIRTIO_F_IN_ORDER feature has been negotiated is to search for this now-used element, set its length, and mark the element as filled in the VirtQueue's used_elems array. By marking the element as filled, it will indicate that this element has been processed and is ready to be flushed, so long as the element is in-order. Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20240710125522.4168043-4-jonah.palmer@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio: virtqueue_pop - VIRTIO_F_IN_ORDER supportJonah Palmer2024-07-211-1/+15
| | | | | | | | | | | | | | | | | | | Add VIRTIO_F_IN_ORDER feature support in virtqueue_split_pop and virtqueue_packed_pop. VirtQueueElements popped from the available/descritpor ring are added to the VirtQueue's used_elems array in-order and in the same fashion as they would be added the used and descriptor rings, respectively. This will allow us to keep track of the current order, what elements have been written, as well as an element's essential data after being processed. Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20240710125522.4168043-3-jonah.palmer@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio: remove virtio_tswap16s() call in vring_packed_event_read()Stefano Garzarella2024-07-031-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | Commit d152cdd6f6 ("virtio: use virtio accessor to access packed event") switched using of address_space_read_cached() to virito_lduw_phys_cached() to access packed descriptor event. When we used address_space_read_cached(), we needed to call virtio_tswap16s() to handle the endianess of the field, but virito_lduw_phys_cached() already handles it internally, so we no longer need to call virtio_tswap16s() (as the commit had done for `off_wrap`, but forgot for `flags`). Fixes: d152cdd6f6 ("virtio: use virtio accessor to access packed event") Cc: jasowang@redhat.com Cc: qemu-stable@nongnu.org Reported-by: Xoykie <xoykie@gmail.com> Link: https://lore.kernel.org/qemu-devel/CAFU8RB_pjr77zMLsM0Unf9xPNxfr_--Tjr49F_eX32ZBc5o2zQ@mail.gmail.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20240701075208.19634-1-sgarzare@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* hw/virtio: Fix obtain the buffer id from the last descriptorWafer2024-07-011-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | The virtio-1.3 specification <https://docs.oasis-open.org/virtio/virtio/v1.3/virtio-v1.3.html> writes: 2.8.6 Next Flag: Descriptor Chaining Buffer ID is included in the last descriptor in the list. If the feature (_F_INDIRECT_DESC) has been negotiated, install only one descriptor in the virtqueue. Therefor the buffer id should be obtained from the first descriptor. In descriptor chaining scenarios, the buffer id should be obtained from the last descriptor. Fixes: 86044b24e8 ("virtio: basic packed virtqueue support") Signed-off-by: Wafer <wafer@jaguarmicro.com> Reviewed-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20240510072753.26158-2-wafer@jaguarmicro.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio: Prevent creation of device using notification-data with ioeventfdJonah Palmer2024-07-011-0/+22
| | | | | | | | | | | | | | | | | | | | | | Prevent the realization of a virtio device that attempts to use the VIRTIO_F_NOTIFICATION_DATA transport feature without disabling ioeventfd. Due to ioeventfd not being able to carry the extra data associated with this feature, having both enabled is a functional mismatch and therefore Qemu should not continue the device's realization process. Although the device does not yet know if the feature will be successfully negotiated, many devices using this feature wont actually work without this extra data and would fail FEATURES_OK anyway. If ioeventfd is able to work with the extra notification data in the future, this compatibility check can be removed. Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20240315165557.26942-3-jonah.palmer@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio/virtio-pci: Handle extra notification dataJonah Palmer2024-07-011-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | Add support to virtio-pci devices for handling the extra data sent from the driver to the device when the VIRTIO_F_NOTIFICATION_DATA transport feature has been negotiated. The extra data that's passed to the virtio-pci device when this feature is enabled varies depending on the device's virtqueue layout. In a split virtqueue layout, this data includes: - upper 16 bits: shadow_avail_idx - lower 16 bits: virtqueue index In a packed virtqueue layout, this data includes: - upper 16 bits: 1-bit wrap counter & 15-bit shadow_avail_idx - lower 16 bits: virtqueue index Signed-off-by: Jonah Palmer <jonah.palmer@oracle.com> Message-Id: <20240315165557.26942-2-jonah.palmer@oracle.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* exec: Declare target_words_bigendian() in 'exec/tswap.h'Philippe Mathieu-Daudé2024-04-261-0/+1
| | | | | | | | | | | | | | We usually check target endianess before swapping values, so target_words_bigendian() declaration makes sense in "exec/tswap.h" with the target swapping helpers. Remove "hw/core/cpu.h" when it was only included to get the target_words_bigendian() declaration. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Anton Johansson <anjo@rev.ng> Message-Id: <20231212123401.37493-16-philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
* hw/virtio: Introduce virtio_bh_new_guarded() helperPhilippe Mathieu-Daudé2024-04-101-0/+10
| | | | | | | | | | | | | Introduce virtio_bh_new_guarded(), similar to qemu_bh_new_guarded() but using the transport memory guard, instead of the device one (there can only be one virtio device per virtio bus). Inspired-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-Id: <20240409105537.18308-2-philmd@linaro.org>
* hw/virtio: Fix packed virtqueue flush used_idxWafer2024-04-091-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In the event of writing many chains of descriptors, the device must write just the id of the last buffer in the descriptor chain, skip forward the number of descriptors in the chain, and then repeat the operations for the rest of chains. Current QEMU code writes all the buffer ids consecutively, and then skips all the buffers altogether. This is a bug, and can be reproduced with a VirtIONet device with _F_MRG_RXBUB and without _F_INDIRECT_DESC: If a virtio-net device has the VIRTIO_NET_F_MRG_RXBUF feature but not the VIRTIO_RING_F_INDIRECT_DESC feature, 'VirtIONetQueue->rx_vq' will use the merge feature to store data in multiple 'elems'. The 'num_buffers' in the virtio header indicates how many elements are merged. If the value of 'num_buffers' is greater than 1, all the merged elements will be filled into the descriptor ring. The 'idx' of the elements should be the value of 'vq->used_idx' plus 'ndescs'. Fixes: 86044b24e8 ("virtio: basic packed virtqueue support") Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Wafer <wafer@jaguarmicro.com> Message-Id: <20240407015451.5228-2-wafer@jaguarmicro.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* Revert "hw/virtio: Add support for VDPA network simulation devices"Michael S. Tsirkin2024-04-091-39/+0
| | | | | | | | | | | | | | | This reverts commit cd341fd1ffded978b2aa0b5309b00be7c42e347c. The patch adds non-upstream code in include/standard-headers/linux/virtio_pci.h which would make maintainance harder. Revert for now. Suggested-by: Jason Wang <jasowang@redhat.com> Message-Id: <df6b6b465753e754a19459e8cd61416548f89a42.1712569644.git.mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* hw/virtio: Add support for VDPA network simulation devicesHao Chen2024-03-121-0/+39
| | | | | | | | | | | | | This patch adds support for VDPA network simulation devices. The device is developed based on virtio-net and tap backend, and supports hardware live migration function. For more details, please refer to "docs/system/devices/vdpa-net.rst" Signed-off-by: Hao Chen <chenh@yusur.tech> Message-Id: <20240221073802.2888022-1-chenh@yusur.tech> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio: Re-enable notifications after drainHanna Czenczek2024-02-071-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | During drain, we do not care about virtqueue notifications, which is why we remove the handlers on it. When removing those handlers, whether vq notifications are enabled or not depends on whether we were in polling mode or not; if not, they are enabled (by default); if so, they have been disabled by the io_poll_start callback. Because we do not care about those notifications after removing the handlers, this is fine. However, we have to explicitly ensure they are enabled when re-attaching the handlers, so we will resume receiving notifications. We do this in virtio_queue_aio_attach_host_notifier*(). If such a function is called while we are in a polling section, attaching the notifiers will then invoke the io_poll_start callback, re-disabling notifications. Because we will always miss virtqueue updates in the drained section, we also need to poll the virtqueue once after attaching the notifiers. Buglink: https://issues.redhat.com/browse/RHEL-3934 Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-ID: <20240202153158.788922-3-hreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* hw/virtio: Constify VMStateRichard Henderson2023-12-301-14/+14
| | | | | Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20231221031652.119827-61-richard.henderson@linaro.org>
* vhost-user-scsi: free the inflight area when resetLi Feng2023-12-021-1/+1
| | | | | | | | | | | Keep it the same to vhost-user-blk. At the same time, fix the vhost_reset_device. Signed-off-by: Li Feng <fengli@smartx.com> Message-Id: <20231123055431.217792-3-fengli@smartx.com> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio: use defer_call() in virtio_irqfd_notify()Stefan Hajnoczi2023-10-311-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | virtio-blk and virtio-scsi invoke virtio_irqfd_notify() to send Used Buffer Notifications from an IOThread. This involves an eventfd write(2) syscall. Calling this repeatedly when completing multiple I/O requests in a row is wasteful. Use the defer_call() API to batch together virtio_irqfd_notify() calls made during thread pool (aio=threads), Linux AIO (aio=native), and io_uring (aio=io_uring) completion processing. Behavior is unchanged for emulated devices that do not use defer_call_begin()/defer_call_end() since defer_call() immediately invokes the callback when called outside a defer_call_begin()/defer_call_end() region. fio rw=randread bs=4k iodepth=64 numjobs=8 IOPS increases by ~9% with a single IOThread and 8 vCPUs. iodepth=1 decreases by ~1% but this could be noise. Detailed performance data and configuration specifics are available here: https://gitlab.com/stefanha/virt-playbooks/-/tree/blk_io_plug-irqfd This duplicates the BH that virtio-blk uses for batching. The next commit will remove it. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20230913200045.1024233-4-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* virtio: call ->vhost_reset_device() during resetStefan Hajnoczi2023-10-221-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vhost-user-scsi has a VirtioDeviceClass->reset() function that calls ->vhost_reset_device(). The other vhost devices don't notify the vhost device upon reset. Stateful vhost devices may need to handle device reset in order to free resources or prevent stale device state from interfering after reset. Call ->vhost_device_reset() from virtio_reset() so that that vhost devices are notified of device reset. This patch affects behavior as follows: - vhost-kernel: No change in behavior since ->vhost_reset_device() is not implemented. - vhost-user: back-ends that negotiate VHOST_USER_PROTOCOL_F_RESET_DEVICE now receive a VHOST_USER_DEVICE_RESET message upon device reset. Otherwise there is no change in behavior. DPDK, SPDK, libvhost-user, and the vhost-user-backend crate do not negotiate VHOST_USER_PROTOCOL_F_RESET_DEVICE automatically. - vhost-vdpa: an extra SET_STATUS 0 call is made during device reset. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20231004014532.1228637-4-stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Raphael Norwitz <raphael.norwitz@nutanix.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
* virtio: remove unused next argument from virtqueue_split_read_next_desc()Ilya Maximets2023-10-041-10/+8
| | | | | | | | | | | | | | | | | | | | | | The 'next' was converted from a local variable to an output parameter in commit: 412e0e81b174 ("virtio: handle virtqueue_read_next_desc() errors") But all the actual uses of the 'i/next' as an output were removed a few months prior in commit: aa570d6fb6bd ("virtio: combine the read of a descriptor") Remove the unused argument to simplify the code. Also, adding a comment to the function to describe what it is actually doing, as it is not obvious that the 'desc' is both an input and an output argument. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Message-Id: <20230927140016.2317404-3-i.maximets@ovn.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio: remove unnecessary thread fence while reading next descriptorIlya Maximets2023-10-041-2/+0
| | | | | | | | | | | | | | | | | | | It was supposed to be a compiler barrier and it was a compiler barrier initially called 'wmb' when virtio core support was introduced. Later all the instances of 'wmb' were switched to smp_wmb to fix memory ordering issues on non-x86 platforms. However, this one doesn't need to be an actual barrier, as its only purpose was to ensure that the value is not read twice. And since commit aa570d6fb6bd ("virtio: combine the read of a descriptor") there is no need for a barrier at all, since we're no longer reading guest memory here, but accessing a local structure. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Message-Id: <20230927140016.2317404-2-i.maximets@ovn.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio: use shadow_avail_idx while checking number of headsIlya Maximets2023-10-041-3/+15
| | | | | | | | | | | | | | | | | We do not need the most up to date number of heads, we only want to know if there is at least one. Use shadow variable as long as it is not equal to the last available index checked. This avoids expensive qatomic dereference of the RCU-protected memory region cache as well as the memory access itself. The change improves performance of the af-xdp network backend by 2-3%. Signed-off-by: Ilya Maximets <i.maximets@ovn.org> Message-Id: <20230927135157.2316982-1-i.maximets@ovn.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>