summary refs log tree commit diff stats
Commit message (Collapse)AuthorAgeFilesLines
...
| * | s390x/pci: Honor DMA limits set by vfioMatthew Rosato2020-11-016-11/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an s390 guest is using lazy unmapping, it can result in a very large number of oustanding DMA requests, far beyond the default limit configured for vfio. Let's track DMA usage similar to vfio in the host, and trigger the guest to flush their DMA mappings before vfio runs out. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [aw: non-Linux build fixes] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | s390x/pci: Add routine to get the vfio dma available countMatthew Rosato2020-11-013-0/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create new files for separating out vfio-specific work for s390 pci. Add the first such routine, which issues VFIO_IOMMU_GET_INFO ioctl to collect the current dma available count. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [aw: Fix non-Linux build with CONFIG_LINUX] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Find DMA available capabilityMatthew Rosato2020-11-012-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The underlying host may be limiting the number of outstanding DMA requests for type 1 IOMMU. Add helper functions to check for the DMA available capability and retrieve the current number of DMA mappings allowed. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [aw: vfio_get_info_dma_avail moved inside CONFIG_LINUX] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Create shared routine for scanning info capabilitiesMatthew Rosato2020-11-011-8/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than duplicating the same loop in multiple locations, create a static function to do the work. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | s390x/pci: Move header files to include/hw/s390xMatthew Rosato2020-11-016-5/+6
| | | | | | | | | | | | | | | | | | | | | | | | Seems a more appropriate location for them. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | linux-headers: update against 5.10-rc1Matthew Rosato2020-11-0128-16/+294
| | | | | | | | | | | | | | | | | | | | | | | | commit 3650b228f83adda7e5ee532e2b90429c03f7b9ec Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> [aw: drop pvrdma_ring.h changes to avoid revert of d73415a31547 ("qemu/atomic.h: rename atomic_ to qatomic_")] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | update-linux-headers: Add vfio_zdev.hMatthew Rosato2020-11-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | vfio_zdev.h is used by s390x zPCI support to pass device-specific CLP information between host and userspace. Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> Acked-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | qapi: Add VFIO devices migration stats in Migration statsKirti Wankhede2020-11-016-0/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | Added amount of bytes transferred to the VM at destination by all VFIO devices Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Make vfio-pci device migration capableKirti Wankhede2020-11-012-21/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the device is not a failover primary device, call vfio_migration_probe() and vfio_migration_finalize() to enable migration support for those devices that support it respectively to tear it down again. Removed migration blocker from VFIO PCI device specific structure and use migration blocker from generic structure of VFIO device. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Add ioctl to get dirty pages bitmap during dma unmapKirti Wankhede2020-11-011-4/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With vIOMMU, IO virtual address range can get unmapped while in pre-copy phase of migration. In that case, unmap ioctl should return pages pinned in that range and QEMU should find its correcponding guest physical addresses and report those dirty. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> [aw: fix error_report types, fix cpu_physical_memory_set_dirty_lebitmap() cast] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Dirty page tracking when vIOMMU is enabledKirti Wankhede2020-11-012-6/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When vIOMMU is enabled, register MAP notifier from log_sync when all devices in container are in stop and copy phase of migration. Call replay and get dirty pages from notifier callback. Suggested-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Add vfio_listener_log_sync to mark dirty pagesKirti Wankhede2020-11-012-0/+117
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vfio_listener_log_sync gets list of dirty pages from container using VFIO_IOMMU_GET_DIRTY_BITMAP ioctl and mark those pages dirty when all devices are stopped and saving state. Return early for the RAM block section of mapped MMIO region. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> [aw: fix error_report types, fix cpu_physical_memory_set_dirty_lebitmap() cast] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Add function to start and stop dirty pages trackingKirti Wankhede2020-11-011-0/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | Call VFIO_IOMMU_DIRTY_PAGES ioctl to start and stop dirty pages tracking for VFIO devices. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Get migration capability flags for containerKirti Wankhede2020-11-013-9/+91
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added helper functions to get IOMMU info capability chain. Added function to get migration capability information from that capability chain for IOMMU container. Similar change was proposed earlier: https://lists.gnu.org/archive/html/qemu-devel/2018-05/msg03759.html Disable migration for devices if IOMMU module doesn't support migration capability. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Cc: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com> Cc: Eric Auger <eric.auger@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | memory: Set DIRTY_MEMORY_MIGRATION when IOMMU is enabledKirti Wankhede2020-11-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mr->ram_block is NULL when mr->is_iommu is true, then fr.dirty_log_mask wasn't set correctly due to which memory listener's log_sync doesn't get called. This patch returns log_mask with DIRTY_MEMORY_MIGRATION set when IOMMU is enabled. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Add load state functions to SaveVMHandlersKirti Wankhede2020-11-012-0/+199
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sequence during _RESUMING device state: While data for this device is available, repeat below steps: a. read data_offset from where user application should write data. b. write data of data_size to migration region from data_offset. c. write data_size which indicates vendor driver that data is written in staging buffer. For user, data is opaque. User should write data in the same order as received. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Add save state functions to SaveVMHandlersKirti Wankhede2020-11-013-0/+283
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added .save_live_pending, .save_live_iterate and .save_live_complete_precopy functions. These functions handles pre-copy and stop-and-copy phase. In _SAVING|_RUNNING device state or pre-copy phase: - read pending_bytes. If pending_bytes > 0, go through below steps. - read data_offset - indicates kernel driver to write data to staging buffer. - read data_size - amount of data in bytes written by vendor driver in migration region. - read data_size bytes of data from data_offset in the migration region. - Write data packet to file stream as below: {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data, VFIO_MIG_FLAG_END_OF_STATE } In _SAVING device state or stop-and-copy phase a. read config space of device and save to migration file stream. This doesn't need to be from vendor driver. Any other special config state from driver can be saved as data in following iteration. b. read pending_bytes. If pending_bytes > 0, go through below steps. c. read data_offset - indicates kernel driver to write data to staging buffer. d. read data_size - amount of data in bytes written by vendor driver in migration region. e. read data_size bytes of data from data_offset in the migration region. f. Write data packet as below: {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data} g. iterate through steps b to f while (pending_bytes > 0) h. Write {VFIO_MIG_FLAG_END_OF_STATE} When data region is mapped, its user's responsibility to read data from data_offset of data_size before moving to next steps. Added fix suggested by Artem Polyakov to reset pending_bytes in vfio_save_iterate(). Added fix suggested by Zhi Wang to add 0 as data size in migration stream and add END_OF_STATE delimiter to indicate phase complete. Suggested-by: Artem Polyakov <artemp@nvidia.com> Suggested-by: Zhi Wang <zhi.wang.linux@gmail.com> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Register SaveVMHandlers for VFIO deviceKirti Wankhede2020-11-012-0/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define flags to be used as delimiter in migration stream for VFIO devices. Added .save_setup and .save_cleanup functions. Map & unmap migration region from these functions at source during saving or pre-copy phase. Set VFIO device state depending on VM's state. During live migration, VM is running when .save_setup is called, _SAVING | _RUNNING state is set for VFIO device. During save-restore, VM is paused, _SAVING state is set for VFIO device. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Add migration state change notifierKirti Wankhede2020-11-013-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Added migration state change notifier to get notification on migration state change. These states are translated to VFIO device state and conveyed to vendor driver. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Add VM state change handler to know state of VMKirti Wankhede2020-11-013-0/+166
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | VM state change handler is called on change in VM's state. Based on VM state, VFIO device state should be changed. Added read/write helper functions for migration region. Added function to set device_state. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> [aw: lx -> HWADDR_PRIx, remove redundant parens] Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Add migration region initialization and finalize functionKirti Wankhede2020-11-014-0/+135
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Whether the VFIO device supports migration or not is decided based of migration region query. If migration region query is successful and migration region initialization is successful then migration is supported else migration is blocked. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Add save and load functions for VFIO PCI devicesKirti Wankhede2020-11-012-0/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | Added functions to save and restore PCI device specific data, specifically config space of PCI device. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Add vfio_get_object callback to VFIODeviceOpsKirti Wankhede2020-11-012-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hook vfio_get_object callback for PCI devices. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Suggested-by: Cornelia Huck <cohuck@redhat.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
| * | vfio: Add function to unmap VFIO regionKirti Wankhede2020-11-013-4/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This function will be used for migration region. Migration region is mmaped when migration starts and will be unmapped when migration is complete. Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com> Reviewed-by: Neo Jia <cjia@nvidia.com> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
* | | Merge remote-tracking branch 'remotes/ericb/tags/pull-nbd-2020-10-27-v2' ↵Peter Maydell2020-11-0122-152/+446
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | into staging nbd patches for 2020-10-27 - Tweak the new block-export-add QMP command - Allow multiple -B options for qemu-nbd - Add qemu:allocation-depth metadata context as qemu-nbd -A - Improve iotest use of NBD # gpg: Signature made Fri 30 Oct 2020 20:22:42 GMT # gpg: using RSA key 71C2CC22B1C4602927D2F3AAA7A16B4A2527436A # gpg: Good signature from "Eric Blake <eblake@redhat.com>" [full] # gpg: aka "Eric Blake (Free Software Programmer) <ebb9@byu.net>" [full] # gpg: aka "[jpeg image of size 6874]" [full] # Primary key fingerprint: 71C2 CC22 B1C4 6029 27D2 F3AA A7A1 6B4A 2527 436A * remotes/ericb/tags/pull-nbd-2020-10-27-v2: nbd: Add 'qemu-nbd -A' to expose allocation depth nbd: Add new qemu:allocation-depth metadata context block: Return depth level during bdrv_is_allocated_above nbd: Allow export of multiple bitmaps for one device nbd: Refactor counting of metadata contexts nbd: Simplify qemu bitmap context name nbd: Update qapi to support exporting multiple bitmaps nbd: Utilize QAPI_CLONE for type conversion qapi: Add QAPI_LIST_PREPEND() macro block: Simplify QAPI_LIST_ADD iotests/291: Stop NBD server iotests/291: Filter irrelevant parts of img-info Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * | nbd: Add 'qemu-nbd -A' to expose allocation depthEric Blake2020-10-309-11/+151
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow the server to expose an additional metacontext to be requested by savvy clients. qemu-nbd adds a new option -A to expose the qemu:allocation-depth metacontext through NBD_CMD_BLOCK_STATUS; this can also be set via QMP when using block-export-add. qemu as client is hacked into viewing the key aspects of this new context by abusing the already-experimental x-dirty-bitmap option to collapse all depths greater than 2, which results in a tri-state value visible in the output of 'qemu-img map --output=json' (yes, that means x-dirty-bitmap is now a bit of a misnomer, but I didn't feel like renaming it as it would introduce a needless break of back-compat, even though we make no compat guarantees with x- members): unallocated (depth 0) => "zero":false, "data":true local (depth 1) => "zero":false, "data":false backing (depth 2+) => "zero":true, "data":true libnbd as client is probably a nicer way to get at the information without having to decipher such hacks in qemu as client. ;) Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201027050556.269064-11-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
| * | nbd: Add new qemu:allocation-depth metadata contextEric Blake2020-10-303-12/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'qemu-img map' provides a way to determine which extents of an image come from the top layer vs. inherited from a backing chain. This is useful information worth exposing over NBD. There is a proposal to add a QMP command block-dirty-bitmap-populate which can create a dirty bitmap that reflects allocation information, at which point the qemu:dirty-bitmap:NAME metadata context can expose that information via the creation of a temporary bitmap, but we can shorten the effort by adding a new qemu:allocation-depth metadata context that does the same thing without an intermediate bitmap (this patch does not eliminate the need for that proposal, as it will have other uses as well). While documenting things, remember that although the NBD protocol has NBD_OPT_SET_META_CONTEXT, the rest of its documentation refers to 'metadata context', which is a more apt description of what is actually being used by NBD_CMD_BLOCK_STATUS: the user is requesting metadata by passing one or more context names. So I also touched up some existing wording to prefer the term 'metadata context' where it makes sense. Note that this patch does not actually enable any way to request a server to enable this context; that will come in the next patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201027050556.269064-10-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
| * | block: Return depth level during bdrv_is_allocated_aboveEric Blake2020-10-305-13/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When checking for allocation across a chain, it's already easy to count the depth within the chain at which the allocation is found. Instead of throwing that information away, return it to the caller. Existing callers only cared about allocated/non-allocated, but having a depth available will be used by NBD in the next patch. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201027050556.269064-9-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> [eblake: rebase to master] Signed-off-by: Eric Blake <eblake@redhat.com>
| * | nbd: Allow export of multiple bitmaps for one deviceEric Blake2020-10-302-34/+72
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With this, 'qemu-nbd -B b0 -B b1 -f qcow2 img.qcow2' can let you sniff out multiple bitmaps from one server. qemu-img as client can still only read one bitmap per client connection, but other NBD clients (hello libnbd) can now read multiple bitmaps in a single pass. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201027050556.269064-8-eblake@redhat.com>
| * | nbd: Refactor counting of metadata contextsEric Blake2020-10-301-11/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than open-code the count of negotiated contexts at several sites, embed it directly into the struct. This will make it easier for upcoming commits to support even more simultaneous contexts. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201027050556.269064-7-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
| * | nbd: Simplify qemu bitmap context nameEric Blake2020-10-301-10/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Each dirty bitmap already knows its name; by reducing the scope of the places where we construct "qemu:dirty-bitmap:NAME" strings, tracking the name is more localized, and there are fewer per-export fields to worry about. This in turn will make it easier for an upcoming patch to export more than one bitmap at once. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20201027050556.269064-6-eblake@redhat.com>
| * | nbd: Update qapi to support exporting multiple bitmapsEric Blake2020-10-305-29/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since 'block-export-add' is new to 5.2, we can still tweak the interface; there, allowing 'bitmaps':['str'] is nicer than 'bitmap':'str'. This wires up the qapi and qemu-nbd changes to permit passing multiple bitmaps as distinct metadata contexts that the NBD client may request, but the actual support for more than one will require a further patch to the server. Note that there are no changes made to the existing deprecated 'nbd-server-add' command; this required splitting the QAPI type BlockExportOptionsNbd, which fortunately does not affect QMP introspection. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201027050556.269064-5-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Peter Krempa <pkrempa@redhat.com>
| * | nbd: Utilize QAPI_CLONE for type conversionEric Blake2020-10-301-9/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than open-coding the translation from the deprecated NbdServerAddOptions type to the preferred BlockExportOptionsNbd, it's better to utilize QAPI_CLONE_MEMBERS. This solves a couple of issues: first, if we do any more refactoring of the base type (which an upcoming patch plans to do), we don't have to revisit the open-coding. Second, our assignment to arg->name is fishy: the generated QAPI code for qapi_free_NbdServerAddOptions does not visit arg->name if arg->has_name is false, but if it DID visit it, we would have introduced a double-free situation when arg is finally freed. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201027050556.269064-4-eblake@redhat.com>
| * | qapi: Add QAPI_LIST_PREPEND() macroEric Blake2020-10-302-16/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | block.c has a useful macro QAPI_LIST_ADD() for inserting at the front of any QAPI-generated list; move it from block.c to qapi/util.h so more places can use it, including one earlier place in block.c, and rename it to something more obvious (since we also have a lot of places that append, rather than prepend, to a list). There are many more places in the codebase that can benefit from using the macro, but converting them will be left to later patches. In theory, all QAPI list types are child classes of GenericList; but in practice, that relationship is not explicitly spelled out in the C type declarations (rather, it is something that happens implicitly due to C compatible layouts), and the macro does not actually depend on the GenericList type. We considered moving GenericList from visitor.h into util.h to group related code; however, such a move would be awkward if we do not also move GenericAlternate. Unfortunately, moving GenericAlternate would introduce its own problems of declaration circularity (qapi-builtin-types.h needs a complete definition of QEnumLookup from util.h, but GenericAlternate needs a complete definition of QType from qapi-builtin-types.h). Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20201027050556.269064-3-eblake@redhat.com> [eblake: s/ADD/PREPEND/ per suggestion by Markus]
| * | block: Simplify QAPI_LIST_ADDEric Blake2020-10-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to rely on the verbosity of the gcc/clang compiler extension of g_new(typeof(X), 1) when we can instead use the standard g_malloc(sizeof(X)). In general, we like g_new over g_malloc for returning type X rather than void* to let the compiler catch more potential typing mistakes, but in this particular macro, our other use of typeof on the same line already ensures we are getting correct results. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <20201027050556.269064-2-eblake@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Markus Armbruster <armbru@redhat.com>
| * | iotests/291: Stop NBD serverMax Reitz2020-10-301-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nbd_server_start_unix_socket() includes an implicit nbd_server_stop(), but we still need an explicit one at the end of the test (where there follows no next nbd_server_start_unix_socket()), or qemu-nbd will linger until the test exits. This will become important when enabling this test to run on FUSE exports, because then the export (which is the image used by qemu-nbd) will go away before qemu-nbd exits, which will lead to qemu-nbd complaining that it cannot flush the bitmaps in the image. Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027164416.144115-3-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
| * | iotests/291: Filter irrelevant parts of img-infoMax Reitz2020-10-302-23/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to let _img_info emit the format-specific information so we get the list of bitmaps we want, but we do not need anything but the bitmaps. So filter out everything that is irrelevant to us. (Ideally, this would be a generalized function in common.filters that takes a list of things to keep, but that would require implementing an anti-bitmap filter, which would be hard, and which we do not need here. So that is why this function is just a local hack.) This lets 291 pass with qcow2 options like refcount_bits or data_file again. Fixes: 14f16bf9474c860ecc127a66a86961942319f7af ("qemu-img: Support bitmap --merge into backing image") Signed-off-by: Max Reitz <mreitz@redhat.com> Message-Id: <20201027164416.144115-2-mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Eric Blake <eblake@redhat.com>
* | | Merge remote-tracking branch 'remotes/mst/tags/for_upstream' into stagingPeter Maydell2020-11-0116-56/+122
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pc,pci,vhost,virtio: misc fixes Just a bunch of bugfixes all over the place. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> # gpg: Signature made Fri 30 Oct 2020 12:44:31 GMT # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469 # gpg: issuer "mst@redhat.com" # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full] # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full] # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67 # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469 * remotes/mst/tags/for_upstream: intel_iommu: Fix two misuse of "0x%u" prints virtio: skip guest index check on device load vhost-blk: set features before setting inflight feature pci: Disallow improper BAR registration for type 1 pci: Change error_report to assert(3) pci: advertise a page aligned ATS pc: Implement -no-hpet as sugar for -machine hpet=on vhost: Don't special case vq->used_phys in vhost_get_log_size() pci: Assert irqnum is between 0 and bus->nirqs in pci_bus_change_irq_level hw/pci: Extract pci_bus_change_irq_level() from pci_change_irq_level() hw/virtio/vhost-vdpa: Fix Coverity CID 1432864 acpi/crs: Support ranges > 32b for hosts acpi/crs: Prevent bad ranges for host bridges vhost-vsock: set vhostfd to non-blocking mode vhost-vdpa: negotiate VIRTIO_NET_F_STATUS with driver Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * | intel_iommu: Fix two misuse of "0x%u" printsPeter Xu2020-10-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dave magically found this. Fix them with "0x%x". Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20201019173922.100270-1-peterx@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | virtio: skip guest index check on device loadFelipe Franciosi2020-10-301-6/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | QEMU must be careful when loading device state off migration streams to prevent a malicious source from exploiting the emulator. Overdoing these checks has the side effect of allowing a guest to "pin itself" in cloud environments by messing with state which is entirely in its control. Similarly to what f3081539 achieved in usb_device_post_load(), this commit removes such a check from virtio_load(). Worth noting, the result of a load without this check is the same as if a guest enables a VQ with invalid indexes to begin with. That is, the virtual device is set in a broken state (by the datapath handler) and must be reset. Signed-off-by: Felipe Franciosi <felipe@nutanix.com> Message-Id: <20201028134643.110698-1-felipe@nutanix.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | vhost-blk: set features before setting inflight featureJin Yu2020-10-303-0/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Virtqueue has split and packed, so before setting inflight, you need to inform the back-end virtqueue format. Signed-off-by: Jin Yu <jin.yu@intel.com> Message-Id: <20200910134851.7817-1-jin.yu@intel.com> Acked-by: Raphael Norwitz <raphael.norwitz@nutanix.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | pci: Disallow improper BAR registration for type 1Ben Widawsky2020-10-301-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prevent future developers working on root complexes, root ports, or bridges that also wish to implement a BAR for those, from shooting themselves in the foot. PCI type 1 headers only support 2 base address registers. It is incorrect and difficult to figure out what is wrong with the device when this mistake is made. With this, it is immediate and obvious what has gone wrong. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Message-Id: <20201015181411.89104-2-ben.widawsky@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | pci: Change error_report to assert(3)Ben Widawsky2020-10-301-5/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Asserts are used for developer bugs. As registering a bar of the wrong size is not something that should be possible for a user to achieve, this is a developer bug. While here, use the more obvious helper function. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Message-Id: <20201015181411.89104-1-ben.widawsky@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
| * | pci: advertise a page aligned ATSJason Wang2020-10-301-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After Linux kernel commit 61363c1474b1 ("iommu/vt-d: Enable ATS only if the device uses page aligned address."), ATS will be only enabled if device advertises a page aligned request. Unfortunately, vhost-net is the only user and we don't advertise the aligned request capability in the past since both vhost IOTLB and address_space_get_iotlb_entry() can support non page aligned request. Though it's not clear that if the above kernel commit makes sense. Let's advertise a page aligned ATS here to make vhost device IOTLB work with Intel IOMMU again. Note that in the future we may extend pcie_ats_init() to accept parameters like queue depth and page alignment. Cc: qemu-stable@nongnu.org Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20200909081731.24688-1-jasowang@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | pc: Implement -no-hpet as sugar for -machine hpet=onEduardo Habkost2020-10-305-25/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get rid of yet another global variable. The default will be hpet=on only if CONFIG_HPET=y. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Message-Id: <20201021144716.1536388-1-ehabkost@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | vhost: Don't special case vq->used_phys in vhost_get_log_size()Greg Kurz2020-10-301-10/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The first loop in vhost_get_log_size() computes the size of the dirty log bitmap so that it allows to track changes in the entire guest memory, in terms of GPA. When not using a vIOMMU, the address of the vring's used structure, vq->used_phys, is a GPA. It is thus already covered by the first loop. When using a vIOMMU, vq->used_phys is a GIOVA that will be translated to an HVA when the vhost backend needs to update the used structure. It will log the corresponding GPAs into the bitmap but it certainly won't log the GIOVA. So in any case, vq->used_phys shouldn't be explicitly used to size the bitmap. Drop the second loop. This fixes a crash of the source when migrating a guest using in-kernel vhost-net and iommu_platform=on on POWER, because DMA regions are put over 0x800000000000000ULL. The resulting insanely huge log size causes g_malloc0() to abort. BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1879349 Signed-off-by: Greg Kurz <groug@kaod.org> Message-Id: <160208823418.29027.15172801181796272300.stgit@bahia.lan> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | pci: Assert irqnum is between 0 and bus->nirqs in pci_bus_change_irq_levelMark Cave-Ayland2020-10-301-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These assertions similar to those in the adjacent pci_bus_get_irq_level() function ensure that irqnum lies within the valid PCI bus IRQ range. Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Message-Id: <20201011082022.3016-1-mark.cave-ayland@ilande.co.uk> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20201024203900.3619498-3-f4bug@amsat.org> Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | hw/pci: Extract pci_bus_change_irq_level() from pci_change_irq_level()Philippe Mathieu-Daudé2020-10-301-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extract pci_bus_change_irq_level() from pci_change_irq_level() to make it clearer it operates on the bus. Reported-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20201024203900.3619498-2-f4bug@amsat.org> Reviewed-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | hw/virtio/vhost-vdpa: Fix Coverity CID 1432864Philippe Mathieu-Daudé2020-10-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix uninitialized value issues reported by Coverity: Field 'msg.reserved' is uninitialized when calling write(). Fixes: a5bd05800f8 ("vhost-vdpa: batch updating IOTLB mappings") Reported-by: Coverity (CID 1432864: UNINIT) Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-Id: <20201028154004.776760-1-philmd@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * | acpi/crs: Support ranges > 32b for hostsBen Widawsky2020-10-301-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to PCIe spec 5.0 Type 1 header space Base Address Registers are defined by 7.5.1.2.1 Base Address Registers (same as Type 0). The _CRS region should allow for the same range (up to 64b). Prior to this change, any host bridge utilizing more than 32b for the BAR would have the address truncated and likely lead to conflicts when the operating systems reads the _CRS object. Signed-off-by: Ben Widawsky <ben.widawsky@intel.com> Message-Id: <20201026193924.985014-2-ben.widawsky@intel.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com>