Date | Commit message | Author | Files changed, lines -/+
2024-02-12 | qapi: Indent tagged doc comment sections properly | Markus Armbruster | 5 files changed, -40/+42
docs/devel/qapi-code-gen demands that the second and subsequent lines of sections other than "Example"/"Examples" be indented. Commit a937b6aa739 (qapi: Reformat doc comments to conform to current conventions) missed a few instances, and messed up a few others. Clean that up. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240205074709.3613229-5-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-02-12 | qapi/block-core: Fix BlockLatencyHistogramInfo doc markup | Markus Armbruster | 1 file changed, -3/+1
The description of @bins ends with a literal block:

    # @bins: list of io request counts corresponding to histogram
    #     intervals, one more element than @boundaries has.  For the
    #     example above, @bins may be something like [3, 1, 5, 2], and
    #     corresponding histogram looks like:
    #
    # ::
    #
    #        5|           *

Except it actually ends *before* the block: the unindented '::' line starts a new section. Makes no sense. We could fix this by indenting the '::' line. Instead, double the colon at the end of the preceding paragraph, and drop the '::' line. This shifts the box for the literal block right in generated documentation, so it lines up with the description. Fixes: commit a0fcff383b34 (qapi: Use rST markup for literal blocks) Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240205074709.3613229-4-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-02-12 | docs/devel/qapi-code-gen: Tweak doc comment whitespace | Markus Armbruster | 1 file changed, -2/+3
Missed in commit a937b6aa739 (qapi: Reformat doc comments to conform to current conventions). Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240205074709.3613229-3-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-02-12 | docs/devel/qapi-code-gen: Normalize version refs x.y.0 to just x.y | Markus Armbruster | 1 file changed, -2/+2
Missed in commit 9bc6e893b72 (qapi: Normalize version references x.y.0 to just x.y). Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240205074709.3613229-2-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
2024-02-08 | virtio-blk: avoid using ioeventfd state in irqfd conditional | Stefan Hajnoczi | 1 file changed, -1/+1
Requests that complete in an IOThread use irqfd to notify the guest while requests that complete in the main loop thread use the traditional qdev irq code path. The reason for this conditional is that the irq code path requires the BQL:

    if (s->ioeventfd_started && !s->ioeventfd_disabled) {
        virtio_notify_irqfd(vdev, req->vq);
    } else {
        virtio_notify(vdev, req->vq);
    }

There is a corner case where the conditional invokes the irq code path instead of the irqfd code path:

    static void virtio_blk_stop_ioeventfd(VirtIODevice *vdev)
    {
        ...
        /*
         * Set ->ioeventfd_started to false before draining so that host notifiers
         * are not detached/attached anymore.
         */
        s->ioeventfd_started = false;

        /* Wait for virtio_blk_dma_restart_bh() and in flight I/O to complete */
        blk_drain(s->conf.conf.blk);

During blk_drain() the conditional produces the wrong result because ioeventfd_started is false. Use qemu_in_iothread() instead of checking the ioeventfd state. Cc: qemu-stable@nongnu.org Buglink: https://issues.redhat.com/browse/RHEL-15394 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20240122172625.415386-1-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
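A minimal sketch of the resulting conditional, assuming the surrounding virtio-blk code quoted above (qemu_in_iothread() is the helper named in this commit, not an invention here):

    /* Choose the notification path by the thread we are completing in; this
     * stays correct even while blk_drain() runs with ioeventfd_started
     * already cleared. */
    if (qemu_in_iothread()) {
        virtio_notify_irqfd(vdev, req->vq);
    } else {
        virtio_notify(vdev, req->vq);
    }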
2024-02-07 | virtio-blk: Use ioeventfd_attach in start_ioeventfd | Hanna Czenczek | 1 file changed, -11/+10
Commit d3f6f294aeadd5f88caf0155e4360808c95b3146 ("virtio-blk: always set ioeventfd during startup") has made virtio_blk_start_ioeventfd() always kick the virtqueue (set the ioeventfd), regardless of whether the BB is drained. That is no longer necessary, because attaching the host notifier will now set the ioeventfd, too; this happens either immediately right here in virtio_blk_start_ioeventfd(), or later when the drain ends, in virtio_blk_ioeventfd_attach(). With event_notifier_set() removed, the code becomes the same as the one in virtio_blk_ioeventfd_attach(), so we can reuse that function. Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-ID: <20240202153158.788922-4-hreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 | virtio: Re-enable notifications after drain | Hanna Czenczek | 2 files changed, -1/+48
During drain, we do not care about virtqueue notifications, which is why we remove the handlers on it. When removing those handlers, whether vq notifications are enabled or not depends on whether we were in polling mode or not; if not, they are enabled (by default); if so, they have been disabled by the io_poll_start callback. Because we do not care about those notifications after removing the handlers, this is fine. However, we have to explicitly ensure they are enabled when re-attaching the handlers, so we will resume receiving notifications. We do this in virtio_queue_aio_attach_host_notifier*(). If such a function is called while we are in a polling section, attaching the notifiers will then invoke the io_poll_start callback, re-disabling notifications. Because we will always miss virtqueue updates in the drained section, we also need to poll the virtqueue once after attaching the notifiers. Buglink: https://issues.redhat.com/browse/RHEL-3934 Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-ID: <20240202153158.788922-3-hreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 | virtio-scsi: Attach event vq notifier with no_poll | Hanna Czenczek | 1 file changed, -1/+6
As of commit 38738f7dbbda90fbc161757b7f4be35b52205552 ("virtio-scsi: don't waste CPU polling the event virtqueue"), we only attach an io_read notifier for the virtio-scsi event virtqueue instead, and no polling notifiers. During operation, the event virtqueue is typically non-empty, but none of the buffers are intended to be used immediately. Instead, they only get used when certain events occur. Therefore, it makes no sense to continuously poll it when non-empty, because it is supposed to be and stay non-empty. We do this by using virtio_queue_aio_attach_host_notifier_no_poll() instead of virtio_queue_aio_attach_host_notifier() for the event virtqueue. Commit 766aa2de0f29b657148e04599320d771c36fd126 ("virtio-scsi: implement BlockDevOps->drained_begin()") however has virtio_scsi_drained_end() use virtio_queue_aio_attach_host_notifier() for all virtqueues, including the event virtqueue. This can lead to it being polled again, undoing the benefit of commit 38738f7dbbda90fbc161757b7f4be35b52205552. Fix it by using virtio_queue_aio_attach_host_notifier_no_poll() for the event virtqueue. Reported-by: Fiona Ebner <f.ebner@proxmox.com> Fixes: 766aa2de0f29b657148e04599320d771c36fd126 ("virtio-scsi: implement BlockDevOps->drained_begin()") Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Fiona Ebner <f.ebner@proxmox.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-ID: <20240202153158.788922-2-hreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 | blkio: Respect memory-alignment for bounce buffer allocations | Kevin Wolf | 1 file changed, -0/+3
blkio_alloc_mem_region() requires that the requested buffer size is a multiple of the memory-alignment property. If it isn't, the allocation fails with a return value of -EINVAL. Fix the call in blkio_resize_bounce_pool() to make sure the requested size is properly aligned. I observed this problem with vhost-vdpa, which requires page aligned memory. As the virtio-blk device behind it still had 512 byte blocks, we got bs->bl.request_alignment = 512, but actually any request that needed a bounce buffer and was not aligned to 4k would fail without this fix. Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20240131173140.42398-1-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
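A rough sketch of the fix in blkio_resize_bounce_pool(); the state field name s->mem_region_alignment and the destination region are assumptions, QEMU_ALIGN_UP is the generic rounding macro:

    /* Round the requested pool size up so blkio_alloc_mem_region() accepts it
     * (it returns -EINVAL for sizes that are not alignment multiples). */
    bytes = QEMU_ALIGN_UP(bytes, s->mem_region_alignment);
    ret = blkio_alloc_mem_region(s->blkio, &s->bounce_pool, bytes);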
2024-02-07 | scsi: Don't ignore most usb-storage properties | Kevin Wolf | 3 files changed, -28/+15
usb-storage is for the most part just a wrapper around an internally created scsi-disk device. It uses DEFINE_BLOCK_PROPERTIES() to offer all of the usual block device properties to the user, but then only forwards a few select properties to the internal device while the rest is silently ignored. This changes scsi_bus_legacy_add_drive() to accept a whole BlockConf instead of some individual values inside of it so that usb-storage can now pass the whole configuration to the internal scsi-disk. This enables the remaining block device properties, e.g. logical/physical_block_size or discard_granularity. Buglink: https://issues.redhat.com/browse/RHEL-22375 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20240131130607.24117-1-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 | virtio-blk: do not use C99 mixed declarations | Stefan Hajnoczi | 1 file changed, -7/+10
QEMU's coding style generally forbids C99 mixed declarations. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20240206140410.65650-1-stefanha@redhat.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
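For illustration, the kind of change this implies; the function and type names below are hypothetical, not taken from the patch:

    /* Discouraged: a declaration mixed in after statements */
    static void process(Request *req)
    {
        prepare(req);
        int ret = submit(req);      /* C99 mixed declaration */
        finish(req, ret);
    }

    /* Preferred QEMU style: declarations at the top of the block */
    static void process(Request *req)
    {
        int ret;

        prepare(req);
        ret = submit(req);
        finish(req, ret);
    }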
2024-02-07 | iotests: give tempdir an identifying name | Daniel P. Berrangé | 1 file changed, -1/+1
If something goes wrong causing the iotests not to clean up their temporary directory, it is useful if the dir has an identifying name to show what is to blame. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20240205155158.1843304-1-berrange@redhat.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 | iotests: fix leak of tmpdir in dry-run mode | Daniel P. Berrangé | 1 file changed, -1/+2
Creating an instance of the 'TestEnv' class will create a temporary directory. This dir is only deleted, however, in the __exit__ handler invoked by a context manager. In dry-run mode, we don't use the TestEnv via a context manager, so we were leaking the temporary directory. Since meson invokes 'check' 5 times on each configure run, developers' /tmp was filling up with empty temporary directories. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20240205154019.1841037-1-berrange@redhat.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 | scsi: Await request purging | Hanna Czenczek | 1 file changed, -9/+21
scsi_device_for_each_req_async() currently does not provide any way to be awaited. One of its callers is scsi_device_purge_requests(), which therefore currently does not guarantee that all requests are fully settled when it returns. We want all requests to be settled, because scsi_device_purge_requests() is called through the unrealize path, including the one invoked by virtio_scsi_hotunplug() through qdev_simple_device_unplug_cb(), which most likely assumes that all SCSI requests are done then. In fact, scsi_device_purge_requests() already contains a blk_drain(), but this will not fully await scsi_device_for_each_req_async(), only the I/O requests it potentially cancels (not the non-I/O requests). However, we can have scsi_device_for_each_req_async() increment the BB in-flight counter, and have scsi_device_for_each_req_async_bh() decrement it when it is done. This way, the blk_drain() will fully await all SCSI requests to be purged. This also removes the need for scsi_device_for_each_req_async_bh() to double-check the current context and potentially re-schedule itself, should it now differ from the BB's context: Changing a BB's AioContext with a root node is done through bdrv_try_change_aio_context(), which creates a drained section. With this patch, we keep the BB in-flight counter elevated throughout, so we know the BB's context cannot change. Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-ID: <20240202144755.671354-3-hreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
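A rough sketch of the in-flight counter pairing described above, heavily simplified; the exact data passed to the bottom half is assumed, while blk_inc_in_flight()/blk_dec_in_flight() and aio_bh_schedule_oneshot() are existing QEMU APIs:

    /* scsi_device_for_each_req_async(): keep blk_drain() waiting for the BH */
    blk_inc_in_flight(sdev->conf.blk);
    aio_bh_schedule_oneshot(blk_get_aio_context(sdev->conf.blk),
                            scsi_device_for_each_req_async_bh, data);

    /* scsi_device_for_each_req_async_bh(): after iterating over the requests */
    blk_dec_in_flight(sdev->conf.blk);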
2024-02-07 | block-backend: Allow concurrent context changes | Hanna Czenczek | 1 file changed, -11/+11
Since AioContext locks have been removed, a BlockBackend's AioContext may really change at any time (only exception is that it is often confined to a drained section, as noted in this patch). Therefore, blk_get_aio_context() cannot rely on its root node's context always matching that of the BlockBackend. In practice, whether they match does not matter anymore anyway: Requests can be sent to BDSs from any context, so anyone who requests the BB's context should have no reason to require the root node to have the same context. Therefore, we can and should remove the assertion to that effect. In addition, because the context can be set and queried from different threads concurrently, it has to be accessed with atomic operations. Buglink: https://issues.redhat.com/browse/RHEL-19381 Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-ID: <20240202144755.671354-2-hreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
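As a small illustration of the atomic access pattern, a sketch only; the field name blk->ctx is an assumption about block/block-backend.c:

    /* Readers may run in any thread, so avoid torn reads of the pointer: */
    AioContext *blk_get_aio_context(BlockBackend *blk)
    {
        return qatomic_read(&blk->ctx);
    }

    /* ...and the writer publishes the new context the same way: */
    qatomic_set(&blk->ctx, new_context);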
2024-02-07 | monitor: use aio_co_reschedule_self() | Stefan Hajnoczi | 1 file changed, -5/+2
The aio_co_reschedule_self() API is designed to avoid the race condition between scheduling the coroutine in another AioContext and yielding. The QMP dispatch code uses the open-coded version that appears susceptible to the race condition at first glance:

    aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
    qemu_coroutine_yield();

The code is actually safe because the iohandler and qemu_aio_context AioContext run under the Big QEMU Lock. Nevertheless, set a good example and use aio_co_reschedule_self() so it's obvious that there is no race. Suggested-by: Hanna Reitz <hreitz@redhat.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20240206190610.107963-6-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
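The replacement amounts to a one-line substitution of the API named above (sketch):

    /* Schedules into the target context and yields in one step, no race window: */
    aio_co_reschedule_self(qemu_get_aio_context());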
2024-02-07 | virtio-blk: declare VirtIOBlock::rq with a type | Stefan Hajnoczi | 1 file changed, -1/+1
The VirtIOBlock::rq field has had the type void * since its introduction in commit 869a5c6df19a ("Stop VM on error in virtio-blk. (Gleb Natapov)"). Perhaps this was done to avoid the forward declaration of VirtIOBlockReq. Hanna Czenczek <hreitz@redhat.com> pointed out the missing type. Specify the actual type because there is no need to use void * here. Suggested-by: Hanna Czenczek <hreitz@redhat.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20240206190610.107963-5-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 | virtio-blk: add vq_rq[] bounds check in virtio_blk_dma_restart_cb() | Stefan Hajnoczi | 1 file changed, -0/+2
Hanna Czenczek <hreitz@redhat.com> noted that the array index in virtio_blk_dma_restart_cb() is not bounds-checked:

    g_autofree VirtIOBlockReq **vq_rq = g_new0(VirtIOBlockReq *, num_queues);
    ...
    while (rq) {
        VirtIOBlockReq *next = rq->next;
        uint16_t idx = virtio_get_queue_index(rq->vq);

        rq->next = vq_rq[idx];
                   ^^^^^^^^^^

The code is correct because both rq->vq and vq_rq[] depend on num_queues, but this is indirect and not 100% obvious. Add an assertion. Suggested-by: Hanna Czenczek <hreitz@redhat.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20240206190610.107963-4-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
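The added assertion amounts to the following (a sketch, inserted just before the dereference quoted above):

    uint16_t idx = virtio_get_queue_index(rq->vq);

    /* Make the indirect dependency on num_queues explicit: */
    assert(idx < num_queues);
    rq->next = vq_rq[idx];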
2024-02-07 | virtio-blk: clarify that there is at least 1 virtqueue | Stefan Hajnoczi | 1 file changed, -0/+1
It is not possible to instantiate a virtio-blk device with 0 virtqueues. The following check is located in ->realize():

    if (!conf->num_queues) {
        error_setg(errp, "num-queues property must be larger than 0");
        return;
    }

Later on we access s->vq_aio_context[0] under the assumption that there is at least one virtqueue. Hanna Czenczek <hreitz@redhat.com> noted that it would help to show that the array index is already valid. Add an assertion to document that s->vq_aio_context[0] is always safe...and catch future code changes that break this assumption. Suggested-by: Hanna Czenczek <hreitz@redhat.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20240206190610.107963-3-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 | virtio-blk: enforce iothread-vq-mapping validation | Stefan Hajnoczi | 1 file changed, -81/+102
Hanna Czenczek <hreitz@redhat.com> noticed that the safety of `vq_aio_context[vq->value] = ctx;` with user-defined vq->value inputs is not obvious. The code is structured in validate() + apply() steps so input validation is there, but it happens way earlier and there is nothing that guarantees apply() can only be called with validated inputs. This patch moves the validate() call inside the apply() function so validation is guaranteed. I also added the bounds checking assertion that Hanna suggested. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Message-ID: <20240206190610.107963-2-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2024-02-07 | ci: Update comment for migration-compat-aarch64 | Peter Xu | 1 file changed, -3/+4
It turns out that we may not be able to enable this test even for the upcoming v9.0. Document what we're still missing. Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Link: https://lore.kernel.org/r/20240207005403.242235-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 | ci: Remove tag dependency for build-previous-qemu | Peter Xu | 1 file changed, -0/+2
The new build-previous-qemu job relies on QEMU release tag being present, while that may not be always true for personal git repositories since by default tag is not pushed. The job can fail on those CI kicks, as reported by Peter Maydell. Fix it by fetching the tags remotely from the official repository, as suggested by Dan. [1] https://lore.kernel.org/r/ZcC9ScKJ7VvqektA@redhat.com Reported-by: Peter Maydell <peter.maydell@linaro.org> Suggested-by: "Daniel P. Berrangé" <berrange@redhat.com> Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Link: https://lore.kernel.org/r/20240207005403.242235-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 | tests/migration-test: Stick with gicv3 in aarch64 test | Peter Xu | 1 file changed, -1/+1
Recently we introduced the cross-binary migration test. We always want migration-test to use a stable guest ABI for both QEMU binaries in this case, so that both QEMU binaries will be compatible on the migration stream with the cmdline specified. Switch to a static gic version "3" rather than using version "max", so that GIC should be stable now across any future QEMU binaries for migration-test. Here the version can actually be anything as long as the ABI is stable. We choose "3" because it's the majority of what we already use in QEMU while still new enough: "git grep gic-version=3" shows 6 hits, while version 4 has no direct user yet besides "max". Note that even with this change, aarch64 won't be able to work yet with the migration cross-binary test, but then the only missing piece will be the stable CPU model. Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Link: https://lore.kernel.org/r/20240207005403.242235-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 | migration/multifd: Add a synchronization point for channel creation | Fabiano Rosas | 1 file changed, -6/+26
It is possible that one of the multifd channels fails to be created at multifd_new_send_channel_async() while the rest of the channel creation tasks are still in flight. This could lead to multifd_save_cleanup() executing the qemu_thread_join() loop too early and not waiting for the threads which haven't been created yet, leading to the freeing of resources that the newly created threads will try to access and crash. Add a synchronization point after which there will be no attempts at thread creation and therefore calling multifd_save_cleanup() past that point will ensure it properly waits for the threads. A note about performance: Prior to this patch, if a channel took too long to be established, other channels could finish connecting first and already start taking load. Now we're bounded by the slowest-connecting channel. Reported-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-7-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 | migration/multifd: Unify multifd and TLS connection paths | Fabiano Rosas | 1 file changed, -43/+40
During multifd channel creation (multifd_send_new_channel_async) when TLS is enabled, the multifd_channel_connect function is called twice, once to create the TLS handshake thread and another time after the asynchronous TLS handshake has finished. This creates a slightly confusing call stack where multifd_channel_connect() is called more times than the number of channels. It also splits error handling between the two callers of multifd_channel_connect() causing some code duplication. Lastly, it gets in the way of having a single point to determine whether all channel creation tasks have been initiated. Refactor the code to move the reentrancy one level up at the multifd_new_send_channel_async() level, de-duplicating the error handling and allowing for the next patch to introduce a synchronization point common to all the multifd channel creation, regardless of TLS. Note that the previous code would never fail once p->c had been set. This patch changes this assumption, which affects refcounting, so add comments around object_unref to explain the situation. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-6-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 | migration/multifd: Move multifd_send_setup into migration thread | Fabiano Rosas | 1 file changed, -5/+5
We currently have an unfavorable situation around multifd channels creation and the migration thread execution. We create the multifd channels with qio_channel_socket_connect_async -> qio_task_run_in_thread, but only connect them at the multifd_new_send_channel_async callback, called from qio_task_complete, which is registered as a glib event. So at multifd_send_setup() we create the channels, but they will only be actually usable after the whole multifd_send_setup() calling stack returns back to the main loop. Which means that the migration thread is already up and running without any possibility for the multifd channels to be ready on time. We currently rely on the channels-ready semaphore blocking multifd_send_sync_main() until channels start to come up and release it. However there have been bugs recently found when a channel's creation fails and multifd_send_cleanup() is allowed to run while other channels are still being created. Let's start to organize this situation by moving the multifd_send_setup() call into the migration thread. That way we unblock the main-loop to dispatch the completion callbacks and actually have a chance of getting the multifd channels ready for when the migration thread needs them. The next patches will deal with the synchronization aspects. Note that this takes multifd_send_setup() out of the BQL. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-5-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 | migration/multifd: Move multifd_send_setup error handling in to the function | Fabiano Rosas | 3 files changed, -13/+19
Hide the error handling inside multifd_send_setup to make it cleaner for the next patch to move the function around. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-4-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 | migration/multifd: Remove p->running | Fabiano Rosas | 2 files changed, -20/+14
We currently only need p->running to avoid calling qemu_thread_join() on a non existent thread if the thread has never been created. However, there are at least two bugs in this logic:

1) On the sending side, p->running is set too early and qemu_thread_create() can be skipped due to an error during TLS handshake, leaving the flag set and leading to a crash when multifd_send_cleanup() calls qemu_thread_join().

2) During exit, the multifd thread clears the flag while holding the channel lock. The counterpart at multifd_send_cleanup() reads the flag outside of the lock and might free the mutex while the multifd thread still has it locked.

Fix the first issue by setting the flag right before creating the thread. Rename it from p->running to p->thread_created to clarify its usage. Fix the second issue by not clearing the flag at the multifd thread exit. We don't have any use for that. Note that these bugs are straight-forward logic issues and not race conditions. There is still a gap for races to affect this code due to multifd_send_cleanup() being allowed to run concurrently with the thread creation loop. This issue is solved in the next patches. Cc: qemu-stable <qemu-stable@nongnu.org> Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake") Reported-by: Avihai Horon <avihaih@nvidia.com> Reported-by: chenyuhui5@huawei.com Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-3-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
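A sketch of the first fix, with the flag name taken from the commit and the surrounding code simplified:

    /* Set the flag right before qemu_thread_create(), not earlier: */
    p->thread_created = true;
    qemu_thread_create(&p->thread, p->name, multifd_send_thread, p,
                       QEMU_THREAD_JOINABLE);

    /* multifd_send_cleanup() then joins only threads that really exist: */
    if (p->thread_created) {
        qemu_thread_join(&p->thread);
    }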
2024-02-07 | migration/multifd: Join the TLS thread | Fabiano Rosas | 2 files changed, -1/+9
We're currently leaking the resources of the TLS thread by not joining it and also overwriting the p->thread pointer altogether. Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake") Cc: qemu-stable <qemu-stable@nongnu.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-2-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-07 | migration: Fix logic of channels and transport compatibility check | Avihai Horon | 1 file changed, -6/+11
The commit in the fixes line mistakenly modified the channels and transport compatibility check logic so it now checks multi-channel support only for socket transport type. Thus, running multifd migration using a transport other than socket that is incompatible with multi-channels (such as "exec") would lead to a segmentation fault instead of an error message. For example:

    (qemu) migrate_set_capability multifd on
    (qemu) migrate -d "exec:cat > /tmp/vm_state"
    Segmentation fault (core dumped)

Fix it by checking multi-channel compatibility for all transport types. Cc: qemu-stable <qemu-stable@nongnu.org> Fixes: d95533e1cdcc ("migration: modify migration_channels_and_uri_compatible() for new QAPI syntax") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240125162528.7552-2-avihaih@nvidia.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-06 | meson: Link with libinotify on FreeBSD | Ilya Leoshkevich | 2 files changed, -5/+24
make vm-build-freebsd fails with:

    ld: error: undefined symbol: inotify_init1
    >>> referenced by filemonitor-inotify.c:183 (../src/util/filemonitor-inotify.c:183)
    >>>               util_filemonitor-inotify.c.o:(qemu_file_monitor_new) in archive libqemuutil.a

On FreeBSD the inotify functions are defined in libinotify.so. Add it to the dependencies. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-ID: <20240206002344.12372-5-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 | test-util-filemonitor: Adapt to the FreeBSD inotify rename semantics | Ilya Leoshkevich | 1 file changed, -0/+8
Unlike on Linux, on FreeBSD renaming a file when the destination already exists results in an IN_DELETE event for that existing file:

    $ FILEMONITOR_DEBUG=1 build/tests/unit/test-util-filemonitor
    Rename /tmp/test-util-filemonitor-K13LI2/fish/one.txt -> /tmp/test-util-filemonitor-K13LI2/two.txt
    Event id=200000000 event=2 file=one.txt
    Queue event id 200000000 event 2 file one.txt
    Queue event id 100000000 event 2 file two.txt
    Queue event id 100000002 event 2 file two.txt
    Queue event id 100000000 event 0 file two.txt
    Queue event id 100000002 event 0 file two.txt
    Event id=100000000 event=0 file=two.txt
    Expected event 0 but got 2

This difference in behavior is not expected to break the real users, so teach the test to accept it. Suggested-by: "Daniel P. Berrange" <berrange@redhat.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20240206002344.12372-4-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 | tests/vm/freebsd: Reload the sshd configuration | Ilya Leoshkevich | 1 file changed, -0/+1
After console_sshd_config(), the SSH server needs to be nudged to pick up the new configs. The scripts for the other BSD flavors already do this with a reboot, but a simple reload is sufficient. Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20240206002344.12372-3-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 | tests/vm: Set UseDNS=no in the sshd configuration | Ilya Leoshkevich | 1 file changed, -0/+2
make vm-build-freebsd sometimes fails with "Connection timed out during banner exchange". The client strace shows:

    13:59:30 write(3, "SSH-2.0-OpenSSH_9.3\r\n", 21) = 21
    13:59:30 getpid() = 252655
    13:59:30 poll([{fd=3, events=POLLIN}], 1, 5000) = 1 ([{fd=3, revents=POLLIN}])
    13:59:32 read(3, "S", 1) = 1
    13:59:32 poll([{fd=3, events=POLLIN}], 1, 3625) = 1 ([{fd=3, revents=POLLIN}])
    13:59:32 read(3, "S", 1) = 1
    13:59:32 poll([{fd=3, events=POLLIN}], 1, 3625) = 1 ([{fd=3, revents=POLLIN}])
    13:59:32 read(3, "H", 1) = 1

There is a 2s delay during connection, and ConnectTimeout is set to 1. Raising it makes the issue go away, but we can do better. The server truss shows:

    888: 27.811414714 socket(PF_INET,SOCK_DGRAM|SOCK_CLOEXEC,0) = 5 (0x5)
    888: 27.811765030 connect(5,{ AF_INET 10.0.2.3:53 },16) = 0 (0x0)
    888: 27.812166941 sendto(5,"\^Z/\^A\0\0\^A\0\0\0\0\0\0\^A2"...,39,0,NULL,0) = 39 (0x27)
    888: 29.363970743 poll({ 5/POLLRDNORM },1,5000) = 1 (0x1)

So the delay is due to a DNS query. Disable DNS queries in the server config. Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20240206002344.12372-2-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 | target/s390x: Prefer fast cpu_env() over slower CPU QOM cast macro | Philippe Mathieu-Daudé | 7 files changed, -25/+11
Mechanical patch produced running the command documented in scripts/coccinelle/cpu_env.cocci_template header. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-ID: <20240129164514.73104-25-philmd@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
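A representative before/after hunk might look like this; it is illustrative only, the exact sites are whatever the Coccinelle script matched:

    /* before: two steps through the QOM cast macro */
    S390CPU *cpu = S390_CPU(cs);
    CPUS390XState *env = &cpu->env;

    /* after: direct access via the generic helper */
    CPUS390XState *env = cpu_env(cs);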
2024-02-06 | tests/tcg/s390x: Test CONVERT TO BINARY | Ilya Leoshkevich | 2 files changed, -0/+103
Check the CVB's, CVBY's, and CVBG's corner cases. Co-developed-by: Pavel Zbitskiy <pavel.zbitskiy@gmail.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Tested-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20240205205830.6425-5-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 | tests/tcg/s390x: Test CONVERT TO DECIMAL | Ilya Leoshkevich | 2 files changed, -0/+64
Check the CVD's, CVDY's, and CVDG's corner cases. Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20240205205830.6425-4-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 | target/s390x: Emulate CVB, CVBY and CVBG | Ilya Leoshkevich | 4 files changed, -0/+98
Convert to Binary - counterparts of the already implemented Convert to Decimal (CVD*) instructions. Example from the Principles of Operation: 25594C becomes 63FA. Co-developed-by: Pavel Zbitskiy <pavel.zbitskiy@gmail.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Message-ID: <20240205205830.6425-3-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
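To make the Principles of Operation example concrete, here is a small standalone C snippet, illustrative only and not the QEMU helper, that converts the packed decimal operand 25594C to binary:

    #include <stdint.h>
    #include <assert.h>

    int main(void)
    {
        uint32_t packed = 0x25594C;     /* digits 2 5 5 9 4, sign nibble 0xC (plus) */
        uint32_t val = 0;

        /* Walk the digit nibbles from most to least significant, skipping the sign. */
        for (int shift = 20; shift >= 4; shift -= 4) {
            val = val * 10 + ((packed >> shift) & 0xf);
        }
        assert(val == 25594);           /* 25594 decimal == 0x63FA */
        return 0;
    }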
2024-02-06 | target/s390x: Emulate CVDG | Ilya Leoshkevich | 4 files changed, -0/+31
CVDG is the same as CVD, except that it converts 64 bits into 128, rather than 32 into 64. Create a new helper, which uses Int128 wrappers. Reported-by: Ido Plat <Ido.Plat@ibm.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Message-ID: <20240205205830.6425-2-iii@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-06 | oslib-posix: initialize backend memory objects in parallel | Mark Kanda | 7 files changed, -37/+145
QEMU initializes preallocated backend memory as the objects are parsed from the command line. This is not optimal in some cases (e.g. memory spanning multiple NUMA nodes) because the memory objects are initialized in series. Allow the initialization to occur in parallel (asynchronously). In order to ensure optimal thread placement, asynchronous initialization requires prealloc context threads to be in use. Signed-off-by: Mark Kanda <mark.kanda@oracle.com> Message-ID: <20240131165327.3154970-2-mark.kanda@oracle.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2024-02-06 | memory-device: reintroduce memory region size check | David Hildenbrand | 1 file changed, -0/+14
We used to check that the memory region size is a multiple of the overall requested address alignment for the device memory address. We removed that check, because there are cases (i.e., hv-balloon) where devices unconditionally request an address alignment that has a very large alignment (i.e., 32 GiB), but the actual memory device size might not be a multiple of that alignment. However, this change:

(a) allows for some practically impossible DIMM sizes, like "1GB+1 byte".
(b) allows for DIMMs that partially cover hugetlb pages, previously reported in [1].

Both scenarios don't make any sense: we might even waste memory. So let's reintroduce that check, but only check that the memory region size is a multiple of the memory region alignment (i.e., page size, huge page size), but not any additional memory device requirements communicated using md->get_min_alignment(). The following examples now fail again as expected:

(a) 1M with 2M THP

    qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \
        -object memory-backend-ram,id=mem1,size=1M \
        -device pc-dimm,id=dimm1,memdev=mem1
    -> backend memory size must be multiple of 0x200000

(b) 1G+1byte

    qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \
        -object memory-backend-ram,id=mem1,size=1073741825B \
        -device pc-dimm,id=dimm1,memdev=mem1
    -> backend memory size must be multiple of 0x200000

(c) Unaligned hugetlb size (2M)

    qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \
        -object memory-backend-file,id=mem1,mem-path=/dev/hugepages/tmp,size=511M \
        -device pc-dimm,id=dimm1,memdev=mem1
    backend memory size must be multiple of 0x200000

(d) Unaligned hugetlb size (1G)

    qemu-system-x86_64 -m 4g,maxmem=16g,slots=1 -S -nodefaults -nographic \
        -object memory-backend-file,id=mem1,mem-path=/dev/hugepages1G/tmp,size=2047M \
        -device pc-dimm,id=dimm1,memdev=mem1
    -> backend memory size must be multiple of 0x40000000

Note that this fix depends on a hv-balloon change to communicate its additional alignment requirements using get_min_alignment() instead of through the memory region. [1] https://lkml.kernel.org/r/f77d641d500324525ac036fe1827b3070de75fc1.1701088320.git.mprivozn@redhat.com Message-ID: <20240117135554.787344-3-david@redhat.com> Reported-by: Zhenyu Zhang <zhenyzha@redhat.com> Reported-by: Michal Privoznik <mprivozn@redhat.com> Fixes: eb1b7c4bd413 ("memory-device: Drop size alignment check") Tested-by: Zhenyu Zhang <zhenyzha@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Signed-off-by: David Hildenbrand <david@redhat.com>
2024-02-06 | migration/multifd: Optimize sender side to be lockless | Peter Xu | 2 files changed, -27/+26
When reviewing my attempt to refactor send_prepare(), Fabiano suggested we try out with dropping the mutex in multifd code [1]. I thought about that before but I never tried to change the code. Now maybe it's time to give it a stab. This only optimizes the sender side. The trick here is multifd has a clear provider/consumer model, that the migration main thread publishes requests (either pending_job/pending_sync), while the multifd sender threads are consumers. Here we don't have a lot of complicated data sharing, and the jobs can logically be submitted lockless. Arm the code with atomic weapons. Two things worth mentioning:

- For multifd_send_pages(): we can use qatomic_load_acquire() when trying to find a free channel, but that's expensive if we attach one ACQUIRE per channel. Instead, keep the qatomic_read() on reading the pending_job flag as we do already, meanwhile use one smp_mb_acquire() after the loop to guarantee the memory ordering.

- For pending_sync: it doesn't have any extra data required since now p->flags are never touched, it should be safe to not use memory barrier. That's different from pending_job.

Provide rich comments for all the lockless operations to state how they are paired. With that, we can remove the mutex. [1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de Suggested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-24-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
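A generic sketch of the publish/consume pairing described above; this is not the exact multifd code, and the field names are assumptions:

    /* Producer (migration thread): fill the job, then publish it. */
    p->pages = pages;                               /* job payload first       */
    qatomic_store_release(&p->pending_job, true);   /* then the flag (release) */
    qemu_sem_post(&p->sem);

    /* Consumer (multifd sender thread): */
    qemu_sem_wait(&p->sem);
    if (qatomic_load_acquire(&p->pending_job)) {    /* pairs with the release  */
        multifd_send_fill_packet(p);
        /* send the payload, then mark the channel free again */
        qatomic_store_release(&p->pending_job, false);
    }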
2024-02-05 | tcg/tci: Support TCG_COND_TST{EQ,NE} | Richard Henderson | 2 files changed, -1/+15
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-05 | tcg/s390x: Support TCG_COND_TST{EQ,NE} | Richard Henderson | 2 files changed, -44/+97
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2024-02-05 | docs/about: Deprecate the old "power5+" and "power7+" CPU names | Thomas Huth | 1 file changed, -0/+9
For consistency we should drop the names with a "+" in it in the long run. Message-ID: <20240117141054.73841-3-thuth@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-05 | target/ppc/cpu-models: Rename power5+ and power7+ for new QOM naming rules | Thomas Huth | 3 files changed, -10/+8
The character "+" is now forbidden in QOM device names (see commit b447378e1217 - "Limit type names to alphanumerical and some few special characters"). For the "power5+" and "power7+" CPU names, there is currently a hack in type_name_is_valid() to still allow them for compatibility reasons. However, there is a much nicer solution for this: Simply use aliases! This way we can still support the old names without the need for the ugly hack in type_name_is_valid(). Message-ID: <20240117141054.73841-2-thuth@redhat.com> Reviewed-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Harsh Prateek Bora <harshpb@linux.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2024-02-05 | hw/scsi/lsi53c895a: add missing decrement of reentrancy counter | Sven Schnelle | 1 file changed, -0/+1
When the maximum count of SCRIPTS instructions is reached, the code stops execution and returns, but fails to decrement the reentrancy counter. This effectively renders the SCSI controller unusable because on next entry the reentrancy counter is still above the limit. This bug was seen on HP-UX 10.20 which seems to trigger SCRIPTS loops. Fixes: b987718bbb ("hw/scsi/lsi53c895a: Fix reentrancy issues in the LSI controller (CVE-2023-0330)") Signed-off-by: Sven Schnelle <svens@stackframe.org> Message-ID: <20240128202214.2644768-1-svens@stackframe.org> Reviewed-by: Thomas Huth <thuth@redhat.com> Tested-by: Helge Deller <deller@gmx.de> Signed-off-by: Thomas Huth <thuth@redhat.com>
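Conceptually the fix is a one-liner on the early-return path; the sketch below assumes field and limit names, which are not taken from the actual lsi53c895a code:

    if (insn_processed > LSI_MAX_INSN) {
        /* stop SCRIPTS execution for now ... */
        s->reentrancy_level--;    /* the previously missing decrement */
        return;
    }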
2024-02-05 | migration/multifd: Fix MultiFDSendParams.packet_num race | Peter Xu | 2 files changed, -24/+34
As reported correctly by Fabiano [1] (while per Fabiano, it sourced back to Elena's initial report in Oct 2023), the way MultiFDSendParams.packet_num is assigned and stored is buggy. Consider two consecutive operations: (1) queue a job into multifd send thread X, then (2) queue another sync request to the same send thread X. Then MultiFDSendParams.packet_num will be assigned twice, and the first assignment can get lost already. To avoid that, we move the packet_num assignment from p->packet_num into where the thread will fill in the packet. Use atomic operations to protect the field, making sure there's no race. Note that atomic fetch_add() may not be good for scaling purposes, however multifd should be fine as the number of threads should normally not go beyond 16. Let's leave that concern for later but fix the issue first. There's also a trick on how to make it always work even on 32 bit hosts for the uint64_t packet number. Switching to uintptr_t as of now to simplify the code. It will cause the packet number to overflow more easily on 32 bit, but that shouldn't be a major concern for now as 32 bit systems are not the major audience for any performance concerns like what multifd wants to address. We also need to move the multifd_send_state definition up, so that multifd_send_fill_packet() can reference it. [1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de Reported-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-23-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
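A sketch of where the assignment moves to, simplified; the exact field layout of multifd_send_state is assumed:

    static void multifd_send_fill_packet(MultiFDSendParams *p)
    {
        /* Assign the packet number only when the packet is actually built,
         * so a later sync request on the same channel cannot overwrite it.
         * The counter is a uintptr_t per the commit, so the atomic also
         * works on 32 bit hosts. */
        uint64_t packet_num = qatomic_fetch_inc(&multifd_send_state->packet_num);

        p->packet->packet_num = cpu_to_be64(packet_num);
        /* fill the rest of the packet as before */
    }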
2024-02-05 | migration/multifd: Stick with send/recv on function names | Peter Xu | 3 files changed, -16/+16
Most of the multifd code uses send/recv to represent the two sides, but some rare cases use save/load. Since send/recv is the majority, replacing the save/load use cases to use send/recv globally. Now we reach a consensus on the naming. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-22-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2024-02-05 | migration/multifd: Cleanup multifd_load_cleanup() | Peter Xu | 1 file changed, -22/+30
Use similar logic to cleanup the recv side. Note that multifd_recv_terminate_threads() may need some similar rework like the sender side, but let's leave that for later. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-21-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>