path: root/migration/migration.h
Commit message (Author, Date, Files, Lines changed)
* migration/postcopy: Push blocktime start/end into page req mutex
  Peter Xu, 2025-07-11, 1 file changed, -1/+1

  The postcopy blocktime feature was tricky in that it used quite a few
  atomic operations over several arrays and variables, without explaining
  how that would be thread safe. The thread safety here is about
  concurrency between the fault thread and the fault resolution threads,
  which can access the same chunks of data. All these atomic ops can also
  be expensive before knowing clearly how it works.

  OTOH, postcopy has one page_request_mutex used to serialize the
  received bitmap updates. So far it's ok - we don't yet have a lot of
  threads contending the lock. It might change after multifd will be
  supported, but that's a separate story. What is important is, with that
  mutex, it's pretty lightweight to move all the blocktime maintenance
  into the mutex critical section. That's because the blocktime layer is
  lightweight: almost "remember which vcpu faulted on which address", and
  "ok we got some fault resolved, calculate how long it took". It's also
  an optional feature for now (though I have thought of changing that,
  maybe in the future).

  Let's push the blocktime layer into the mutex, so that it's always
  thread-safe even without any atomic ops. To achieve that, add a tid
  parameter on the fault path so that the faulted thread ID is passed
  deeper into the stack, but not too deep.

  While at it, add a comment for the shared fault handler (for example,
  vhost-user devices running with postcopy), to mention a TODO. One
  reason it might not be trivial is that vhost-user's userfaultfds should
  be opened by the vhost-user process, so it's pretty hard to make sure
  the TID feature will be around. It wasn't supported before, so keep it
  like that for now.

  Now we can be at ease when everything is protected by a mutex that we
  always take anyway.

  One side effect: we can finally remove one ramblock_recv_bitmap_test()
  in mark_postcopy_blocktime_begin(), which was pretty weird and which
  also included a weird (but maybe necessary.. but maybe not?) operation
  to inject a blocktime entry then quickly erase it. With the mutex held,
  and making sure it's invoked after checking the receive bitmap, it's
  not needed anymore. Instead, we assert.

  As another side effect, this paves the way for removing all atomic ops
  in all the mem accesses in the blocktime layer.

  Note that we need a stub for mark_postcopy_blocktime_begin() for
  Windows builds.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20250613141217.474825-3-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
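  A minimal sketch of the locking discipline described above; apart from
  page_request_mutex, the function and field names are illustrative, not
  the exact QEMU symbols:

    /* Sketch: fault-resolution path. Blocktime bookkeeping moves inside
     * the page_request_mutex critical section that already serializes
     * receive-bitmap updates, so no atomics are needed. */
    static void page_received(MigrationIncomingState *mis, RAMBlock *rb,
                              uint64_t offset, uint32_t faulted_tid)
    {
        qemu_mutex_lock(&mis->page_request_mutex);
        recv_bitmap_set_locked(rb, offset);              /* illustrative */
        /* The mutex serializes the fault thread against the fault
         * resolution threads, so plain loads/stores are safe here. */
        mark_postcopy_blocktime_end_locked(mis, offset, faulted_tid);
        qemu_mutex_unlock(&mis->page_request_mutex);
    }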
* migration/postcopy: Replace QemuSemaphore with QemuEvent
  Akihiko Odaki, 2025-06-06, 1 file changed, -2/+2

  thread_sync_sem is a one-shot event, so it can be converted into a
  QemuEvent, which is more lightweight.

  Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20250529-event-v5-9-53b285203794@daynix.com
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
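  For reference, the one-shot pattern with the QemuEvent API looks
  roughly like this; the surrounding code is illustrative, only the
  qemu_event_*() calls are the real API:

    #include "qemu/osdep.h"
    #include "qemu/thread.h"

    static QemuEvent thread_sync_event;

    /* One-shot signaling: the spawning thread waits once, the new
     * thread sets the event once it has finished initializing. */
    static void *worker_thread(void *opaque)
    {
        /* ... per-thread setup ... */
        qemu_event_set(&thread_sync_event);   /* wake the waiter */
        /* ... main loop ... */
        return NULL;
    }

    static void start_worker(void)
    {
        qemu_event_init(&thread_sync_event, false);  /* initially clear */
        /* spawn worker_thread with qemu_thread_create(), then: */
        qemu_event_wait(&thread_sync_event);  /* block until set */
        qemu_event_destroy(&thread_sync_event);
    }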
* migration/colo: Replace QemuSemaphore with QemuEvent
  Akihiko Odaki, 2025-06-06, 1 file changed, -3/+3

  colo_exit_sem and colo_incoming_sem represent one-shot events, so they
  can be converted into QemuEvent, which is more lightweight.

  Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
  Link: https://lore.kernel.org/r/20250529-event-v5-8-53b285203794@daynix.com
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* migration: Replace QemuSemaphore with QemuEvent
  Akihiko Odaki, 2025-06-06, 1 file changed, -1/+1

  pause_event can utilize qemu_event_reset() to discard events.

  Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
  Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
  Link: https://lore.kernel.org/r/20250529-event-v5-7-53b285203794@daynix.com
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* migration: Add thread pool of optional load threads
  Maciej S. Szmigiero, 2025-03-06, 1 file changed, -0/+5

  Some drivers might want to make use of auxiliary helper threads during
  VM state loading, for example to make sure that their blocking (sync)
  I/O operations don't block the rest of the migration process.

  Add a migration core managed thread pool to facilitate this use case.
  The migration core will wait for these threads to finish before
  (re)starting the VM at destination.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
  Link: https://lore.kernel.org/qemu-devel/b09fd70369b6159c75847e69f235cb908b02570c.1741124640.git.maciej.szmigiero@oracle.com
  Signed-off-by: Cédric Le Goater <clg@redhat.com>
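  A sketch of how a device might use such a pool. The entry-point name
  qemu_loadvm_start_load_thread and its callback signature are
  assumptions based on the description above, not verified against the
  final API; the my_dev_* names are hypothetical:

    /* Hypothetical load-thread body: returns true on success, and must
     * bail out when the migration core asks it to quit. */
    static bool my_dev_load_bg(void *opaque, bool *should_quit, Error **errp)
    {
        MyDevState *s = opaque;

        while (!qatomic_read(should_quit)) {
            int ret = my_dev_load_chunk_sync(s, errp);  /* blocking I/O */
            if (ret < 0) {
                return false;           /* error already set in errp */
            }
            if (ret == 0) {
                return true;            /* all device state loaded */
            }
        }
        return true;                    /* asked to quit early */
    }

    /* In the device's load path: hand blocking I/O to the pool; the
     * migration core waits for it before (re)starting the VM. */
    qemu_loadvm_start_load_thread(my_dev_load_bg, s);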
* migration: Add MIG_CMD_SWITCHOVER_START and its load handler
  Maciej S. Szmigiero, 2025-03-06, 1 file changed, -0/+2

  This QEMU_VM_COMMAND sub-command and its switchover_start SaveVMHandler
  are used to mark the switchover point in the main migration stream.

  It can be used to inform the destination that all pre-switchover main
  migration stream data has been sent/received, so it can start to
  process post-switchover data that it might have received via other
  migration channels, like the multifd ones.

  Also add the relevant MigrationState bit stream compatibility property
  and its hw_compat entry.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Zhang Chen <zhangckid@gmail.com> # for the COLO part
  Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
  Link: https://lore.kernel.org/qemu-devel/311be6da85fc7e49a7598684d80aa631778dcbce.1741124640.git.maciej.szmigiero@oracle.com
  Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: Change migrate_fd_ to migration_
  Fabiano Rosas, 2025-02-14, 1 file changed, -1/+1

  Remove all instances of _fd_ from the migration generic code. These
  functions have grown over time and the _fd_ part is now just confusing.

  migration_fd_error() -> migration_error() makes it a little vague.
  Since it's only used for migration_connect() failures, change it to
  migration_connect_set_error().

  Reviewed-by: Peter Xu <peterx@redhat.com>
  Message-ID: <20250213175927.19642-4-farosas@suse.de>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Set migration error outside of migrate_cancel
  Fabiano Rosas, 2025-02-14, 1 file changed, -1/+1

  There's no point passing the error into migration cancel only for it
  to call migrate_set_error().

  Reviewed-by: Peter Xu <peterx@redhat.com>
  Message-ID: <20250213175927.19642-2-farosas@suse.de>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Add a compat property for TLS termination
  Fabiano Rosas, 2025-02-14, 1 file changed, -0/+33

  We're currently changing the way the source multifd migration handles
  the shutdown of the multifd channels when TLS is in use, to perform a
  clean termination by calling gnutls_bye().

  Older src QEMUs will always close the channel without terminating the
  TLS session. New dst QEMUs treat an unclean termination as an error.

  Add multifd_clean_tls_termination (default true) that can be switched
  on the destination whenever a src QEMU <= 9.2 is in use.

  (Note that the compat property is only strictly necessary for src
  QEMUs older than 9.1. Due to synchronization coincidences, src QEMUs
  9.1 and 9.2 can put the destination in a condition where it doesn't
  see the unclean termination. Still, make the property more inclusive
  to facilitate potential backports.)

  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
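  Assuming the property is exposed like other migration compat
  properties, a destination talking to an older source might be started
  along these lines. Only the property name comes from the commit; the
  -global spelling and the rest of the command line are assumptions:

    # Destination side, when the source is QEMU <= 9.2:
    qemu-system-x86_64 \
        -global migration.multifd-clean-tls-termination=false \
        -incoming tcp:0:4444 ...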
* Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging
  Stefan Hajnoczi, 2025-02-10, 1 file changed, -3/+0

  Block layer patches

  - Managing inactive nodes (enables QSD migration with shared storage)
  - Fix swapped values for BLOCK_IO_ERROR 'device' and 'qom-path'
  - vpc: Read images exported from Azure correctly
  - scripts/qemu-gdb: Support coroutine dumps in coredumps
  - Minor cleanups

  * tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (25 commits)
    block: remove unused BLOCK_OP_TYPE_DATAPLANE
    iotests: Add (NBD-based) tests for inactive nodes
    iotests: Add qsd-migrate case
    iotests: Add filter_qtest()
    nbd/server: Support inactive nodes
    block/export: Add option to allow export of inactive nodes
    block: Drain nodes before inactivating them
    block/export: Don't ignore image activation error in blk_exp_add()
    block: Support inactive nodes in blk_insert_bs()
    block: Add blockdev-set-active QMP command
    block: Add option to create inactive nodes
    block: Fix crash on block_resize on inactive node
    block: Don't attach inactive child to active node
    migration/block-active: Remove global active flag
    block: Inactivate external snapshot overlays when necessary
    block: Allow inactivating already inactive nodes
    block: Add 'active' field to BlockDeviceInfo
    block-backend: Fix argument order when calling 'qapi_event_send_block_io_error()'
    scripts/qemu-gdb: Support coroutine dumps in coredumps
    scripts/qemu-gdb: Simplify fs_base fetching for coroutines
    ...

  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  * migration/block-active: Remove global active flag
    Kevin Wolf, 2025-02-06, 1 file changed, -3/+0

    Block devices have an individual active state; a single global flag
    can't cover this correctly. This becomes more important as we allow
    users to manually manage which nodes are active or inactive.

    Now that it's allowed to call bdrv_inactivate_all() even when some
    nodes are already inactive, we can remove the flag and just
    unconditionally call bdrv_inactivate_all() and, more importantly,
    bdrv_activate_all() before we make use of the nodes.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Acked-by: Fabiano Rosas <farosas@suse.de>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Message-ID: <20250204211407.381505-5-kwolf@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qapi: Move include/qapi/qmp/ to include/qobject/
  Daniel P. Berrangé, 2025-02-10, 1 file changed, -1/+1

  The general expectation is that header files should follow the same
  file/path naming scheme as the corresponding source file. There are
  various historical exceptions to this practice in QEMU, with one of
  the most notable being the include/qapi/qmp/ directory. Most of the
  headers there correspond to source files in qobject/.

  This patch corrects most of that inconsistency by creating
  include/qobject/ and moving the headers for qobject/ there.

  This also fixes MAINTAINERS for include/qapi/qmp/dispatch.h:
  scripts/get_maintainer.pl now reports "QAPI" instead of "No
  maintainers found".

  Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
  Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
  Acked-by: Halil Pasic <pasic@linux.ibm.com> #s390x
  Signed-off-by: Markus Armbruster <armbru@redhat.com>
  Message-ID: <20241118151235.2665921-2-armbru@redhat.com>
  [Rebased]
* migration: Do not construct JSON description if suppressed
  Peter Xu, 2025-01-29, 1 file changed, -0/+1

  QEMU machines have a property "suppress-vmdesc". When it is enabled,
  QEMU will stop attaching the JSON VM description at the end of the
  precopy migration stream (postcopy is never affected because postcopy
  never attaches it).

  However, even if it's suppressed by the user, the source QEMU will
  still construct the JSON description, which is a complete waste of CPU
  and memory resources. To avoid that, only create the JSON writer
  object if suppress-vmdesc is not specified.

  Luckily, vmstate_save() already supports vmdesc==NULL, so only a few
  remaining spots need to be prepared for vmdesc possibly being NULL now.

  While at it, move the init / destroy of the JSON writer object to the
  start / end of the migration - the JSON writer object is a sub-struct
  of the migration state, and it looks like the only object that was
  dynamically allocated / destroyed within the migration process. Make
  it the same as the rest of the objects that migration uses.

  Signed-off-by: Peter Xu <peterx@redhat.com>
  Tested-by: Jiri Denemark <jdenemar@redhat.com>
  Reviewed-by: Juraj Marcin <jmarcin@redhat.com>
  Link: https://lore.kernel.org/r/20250114230746.3268797-3-peterx@redhat.com
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
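  The gist in code form; a sketch, not the literal diff, assuming a
  helper like should_send_vmdesc() that checks the machine's
  "suppress-vmdesc" property:

    static void migration_vmdesc_init(MigrationState *s)
    {
        /* Only pay for the JSON writer when a vmdesc will be sent;
         * vmstate_save() already tolerates vmdesc == NULL. */
        if (should_send_vmdesc()) {
            s->vmdesc = json_writer_new(false);
        }
    }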
* migration: cpr-transfer mode
  Steve Sistare, 2025-01-29, 1 file changed, -0/+2

  Add the cpr-transfer migration mode, which allows the user to transfer
  a guest to a new QEMU instance on the same host with minimal guest
  pause time, by preserving guest RAM in place, albeit with new virtual
  addresses in new QEMU, and by preserving device file descriptors.
  Pages that were locked in memory for DMA in old QEMU remain locked in
  new QEMU, because the descriptor of the device that locked them
  remains open.

  cpr-transfer preserves memory and device descriptors by sending them
  to new QEMU over a unix domain socket using SCM_RIGHTS. Such CPR state
  cannot be sent over the normal migration channel, because devices and
  backends are created prior to reading the channel, so this mode sends
  CPR state over a second "cpr" migration channel. New QEMU reads the
  cpr channel prior to creating devices or backends. The user specifies
  the cpr channel in the channel arguments on the outgoing side, and in
  a second -incoming command-line parameter on the incoming side.

  The user must start old QEMU with the '-machine aux-ram-share=on'
  option, which allows anonymous memory to be transferred in place to
  the new process by transferring a memory descriptor for each ram
  block. Memory-backend objects must have the share=on attribute, but
  memory-backend-epc is not supported.

  The user starts new QEMU on the same host as old QEMU, with
  command-line arguments to create the same machine, plus the -incoming
  option for the main migration channel, like normal live migration. In
  addition, the user adds a second -incoming option with channel type
  "cpr". This CPR channel must support file descriptor transfer with
  SCM_RIGHTS, i.e. it must be a UNIX domain socket.

  To initiate CPR, the user issues a migrate command to old QEMU, adding
  a second migration channel of type "cpr" in the channels argument.
  Old QEMU stops the VM, saves state to the migration channels, and
  enters the postmigrate state. New QEMU mmap's memory descriptors, and
  execution resumes.

  The implementation splits qmp_migrate into start and finish functions.
  Start sends CPR state to new QEMU, which responds by closing the CPR
  channel. Old QEMU detects the HUP and then calls finish, which
  connects the main migration channel.

  In summary, the usage is:

    qemu-system-$arch -machine aux-ram-share=on ...

    start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"

    Issue commands to old QEMU:
      migrate_set_parameter mode cpr-transfer

      {"execute": "migrate", ...
          {"channel-type": "main"...},
          {"channel-type": "cpr"...} ... }

  Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Acked-by: Markus Armbruster <armbru@redhat.com>
  Link: https://lore.kernel.org/r/1736967650-129648-17-git-send-email-steven.sistare@oracle.com
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: enhance migrate_uri_parse
  Steve Sistare, 2025-01-29, 1 file changed, -2/+0

  Export migrate_uri_parse for use outside migration internals, and
  define a method migrate_is_uri that indicates when migrate_uri_parse
  should be used.

  Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Link: https://lore.kernel.org/r/1736967650-129648-12-git-send-email-steven.sistare@oracle.com
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/block: Rewrite disk activation
  Peter Xu, 2025-01-09, 1 file changed, -3/+3

  This patch proposes a flag to maintain disk activation status
  globally. It mostly rewrites disk activation management for QEMU,
  including COLO and the QMP command xen_save_devices_state.

  Backgrounds
  ===========

  We have two problems with disk activations, one resolved, one not.

  Problem 1: disk activation recovery (for switchover interruptions)
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  When migration is either cancelled or failed during switchover,
  especially after the disks are inactivated, QEMU needs to remember to
  re-activate the disks before the VM starts. It used to be done
  separately in two paths: one in qmp_migrate_cancel(), the other one in
  the failure path of migration_completion().

  It used to be fixed in different commits, all over the place in QEMU.
  These are the relevant changes I saw; I'm not sure if it's the
  complete list:

  - In 2016, commit fe904ea824 ("migration: regain control of images
    when migration fails to complete")
  - In 2017, commit 1d2acc3162 ("migration: re-active images while
    migration been canceled after inactive them")
  - In 2023, commit 6dab4c93ec ("migration: Attempt disk reactivation
    in more failure scenarios")

  Now that we have a slightly better picture, maybe we can unify the
  reactivation in a single path. One side benefit of doing so is that we
  can move the disk operation outside the QMP command "migrate_cancel".
  It's possible that in the future we may want to make "migrate_cancel"
  OOB-compatible, which requires the command not to need the BQL in the
  first place. This already does that and makes the migrate_cancel
  command lightweight.

  Problem 2: disk invalidation on top of invalidated disks
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  This is an unresolved bug in current QEMU. See the link in "Resolves:"
  at the end.

  It turns out that besides the src switchover phase (problem 1 above),
  QEMU also needs to remember block activation on the destination.
  Consider two consecutive migrations in a row, where the VM was always
  paused. In that scenario, the disks are not activated even when
  migration completes in the 1st round. When the 2nd round starts, if
  QEMU doesn't know the status of the disks, it needs to try to
  inactivate the disks again.

  Here the issue is that the block layer API bdrv_inactivate_all() will
  crash QEMU if invoked on already inactive disks for the 2nd migration.
  For detail, see the bug link at the end.

  Implementation
  ==============

  This patch proposes to maintain disk activation with a global flag, so
  we know:

  - If we used to inactivate disks for migration, but migration got
    cancelled, or failed, QEMU will know it should reactivate the disks.

  - On the incoming side, if the disks are never activated but then
    another migration is triggered, QEMU should be able to tell that
    inactivation is not needed for the 2nd migration.

  We used to have disk_inactive, but it only solves the 1st issue, not
  the 2nd. Also, it's done in completely separate paths, so it's
  extremely hard to follow either how the flag changes, or the duration
  that the flag is valid, and when we will reactivate the disks.

  Convert the existing disk_inactive flag into that global flag (also
  invert its naming), and maintain the disk activation status for the
  whole lifecycle of QEMU. That includes the incoming QEMU.

  Put both of the error cases of source migration (failure, cancelled)
  together into migration_iteration_finish(), which will be invoked for
  either of the scenarios. So from that part QEMU should behave the same
  as before. However, with such global maintenance of disk activation
  status, we not only clean up quite a few temporary paths that used to
  maintain the disk activation status (e.g. in postcopy code), it also
  fixes the crash of problem 2 in one shot.

  For freshly started QEMU, the flag is initialized to TRUE, showing
  that the QEMU owns the disks by default.

  For incoming migrated QEMU, the flag will be initialized to FALSE once
  and for all, showing that the dest QEMU doesn't own the disks until
  switchover. That is guaranteed by the "once" variable.

  Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2395
  Signed-off-by: Peter Xu <peterx@redhat.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Message-Id: <20241206230838.1111496-7-peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
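  A sketch of the global-flag discipline described above; the helper
  names are assumptions and error handling is trimmed:

    static bool disk_active;  /* single source of truth for activation */

    bool migration_block_activate(Error **errp)
    {
        if (disk_active) {
            return true;                  /* we already own the disks */
        }
        bdrv_activate_all(errp);          /* sketch: check errp in real code */
        disk_active = true;
        return true;
    }

    bool migration_block_inactivate(void)
    {
        if (!disk_active) {
            return true;    /* never inactivate twice (problem 2 above) */
        }
        disk_active = false;
        return bdrv_inactivate_all() == 0;
    }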
* include: Rename sysemu/ -> system/
  Philippe Mathieu-Daudé, 2024-12-20, 1 file changed, -1/+1

  Headers in include/sysemu/ are not only related to system *emulation*,
  they are also used by virtualization. Rename as system/ which is
  clearer.

  Files renamed manually, then mechanical change using the sed tool.

  Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Tested-by: Lei Yang <leiyang@redhat.com>
  Message-Id: <20241203172445.28576-1-philmd@linaro.org>
* migration: correct multifd receive thread name
  Prasad Pandit, 2024-12-02, 1 file changed, -1/+1

  Multifd receive threads run on the destination side. Correct the
  thread name macro to indicate the same.

  Fixes: e620b1e4770b ("migration: Put thread names together with macros")
  Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20241127111528.167330-1-ppandit@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Unexport dirty_bitmap_mig_init()
  Peter Xu, 2024-10-31, 1 file changed, -0/+4

  It's only used within migration/, so it shouldn't be exported.

  Reviewed-by: Cédric Le Goater <clg@redhat.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20241024213056.1395400-3-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Support periodic RAMBlock dirty bitmap sync
  Hyman Huang, 2024-10-31, 1 file changed, -0/+1

  When a VM is configured with huge memory, the current throttle logic
  doesn't seem to scale, because migration_trigger_throttle() is only
  called once per iteration, so it won't be invoked for a long time if
  one iteration takes a long time.

  The periodic dirty sync aims to fix the above issue by synchronizing
  the ramblock from the remote dirty bitmap and, when necessary,
  triggering the CPU throttle multiple times during a long iteration.

  This is a trade-off between synchronization overhead and CPU throttle
  impact.

  Signed-off-by: Hyman Huang <yong.huang@smartx.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/f61f1b3653f2acf026901103e1c73d157d38b08f.1729146786.git.yong.huang@smartx.com
  [peterx: make prev_cnt global, and reset for each migration]
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Put thread names together with macros
  Peter Xu, 2024-10-31, 1 file changed, -0/+14

  Keep migration thread names together, so it's easier to see a list of
  all possible migration threads.

  Still, there are two functional changes below besides the macro
  definitions:

  - There's one dirty rate thread that we overlooked before; now we add
    that too and name it "mig/dirtyrate", following the old rules.

  - The old name "mig/src/rp-thr" has "-thr", but that may not be useful
    in a thread name anyway, while "rp" can be slightly hard to read.
    Take this chance to rename it to "mig/src/return", hopefully a
    better name.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Acked-by: Hyman Huang <yong.huang@smartx.com>
  Reviewed-by: Zhang Chen <chen.zhang@intel.com>
  Link: https://lore.kernel.org/r/20241011153652.517440-1-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
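  Illustrative shape of the macros; "mig/dirtyrate" and "mig/src/return"
  are quoted from the message above, the remaining name is a guess:

    /* migration.h sketch: one place listing every migration thread name. */
    #define MIGRATION_THREAD_DIRTY_RATE  "mig/dirtyrate"
    #define MIGRATION_THREAD_SRC_RETURN  "mig/src/return"  /* was "mig/src/rp-thr" */
    #define MIGRATION_THREAD_DST_LISTEN  "mig/dst/listen"  /* guess */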
* migration: Use MigrationStatus instead of int
  Peter Xu, 2024-06-21, 1 file changed, -4/+5

  QEMU uses "int" in most cases even if it stores MigrationStatus. I
  don't know why, so let's try to do that right and see what blows up..

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Signed-off-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Remove non-multifd compression
  Fabiano Rosas, 2024-05-08, 1 file changed, -7/+0

  The 'compress' migration capability enables the old compression code,
  which has shown issues over the years and is thought to be less stable
  and tested than the more recent multifd-based compression. The old
  compression code has been deprecated in 8.2 and now it is time to
  remove it.

  Deprecation commit 864128df46 ("migration: Deprecate old compression
  method").

  Acked-by: Markus Armbruster <armbru@redhat.com>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Remove 'blk/-b' option from migrate commands
  Fabiano Rosas, 2024-05-08, 1 file changed, -4/+0

  Block migration is considered obsolete and has been deprecated in 8.2.
  Remove the migrate command option that enables it. This only affects
  the QMP and HMP commands; the feature can still be accessed by setting
  the migration 'block' capability. The whole feature will be removed in
  a future patch.

  Deprecation commit 8846b5bfca ("migration: migrate 'blk' command
  option is deprecated.").

  Reviewed-by: Markus Armbruster <armbru@redhat.com>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* qapi: introduce exit-on-error parameter for migrate-incoming
  Vladimir Sementsov-Ogievskiy, 2024-05-08, 1 file changed, -0/+3

  Now we do set the MIGRATION_FAILED state, but don't give the
  orchestrator a chance to query the migration state and get the error.
  Let's provide a possibility for QMP-based orchestrators to get the
  error like with outgoing migration.

  For hmp_migrate_incoming(), let's enable the new behavior: HMP is not
  an ABI, it's mostly intended for use by developers, and it makes sense
  not to stop the process.

  For x-exit-preconfig, let's keep the old behavior:

  - it's called from init(), so here we want to keep the current
    behavior by default
  - it exits on error by itself as well

  So, if we want to change the behavior of x-exit-preconfig, it should
  be another patch.

  Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
  Acked-by: Markus Armbruster <armbru@redhat.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
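  A QMP usage sketch; the argument layout is assumed from the parameter
  name above, and the uri value is just an example:

    {"execute": "migrate-incoming",
     "arguments": {"uri": "tcp:0:4444", "exit-on-error": false}}

  On failure, QEMU keeps running and the orchestrator can inspect the
  error via {"execute": "query-migrate"}.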
* migration: do not include coroutine_int.h
  Paolo Bonzini, 2024-05-03, 1 file changed, -1/+1

  Migration code needs no private fields of the coroutine backend.
  Include the "regular" coroutine.h header.

  Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
  Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* migration: purge MigrationState from public interface
  Steve Sistare, 2024-03-11, 1 file changed, -0/+6

  Move remaining MigrationState references from the public file misc.h
  to the private file migration.h.

  Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
  Link: https://lore.kernel.org/r/1710179338-294359-12-git-send-email-steven.sistare@oracle.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: export migration_is_running
  Steve Sistare, 2024-03-11, 1 file changed, -2/+0

  Delete the MigrationState parameter from migration_is_running and move
  it to the public API in misc.h.

  Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
  Link: https://lore.kernel.org/r/1710179338-294359-5-git-send-email-steven.sistare@oracle.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: export migration_is_setup_or_active
  Steve Sistare, 2024-03-11, 1 file changed, -1/+0

  Delete the MigrationState parameter from migration_is_setup_or_active
  and move it to the public API in misc.h.

  Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
  Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
  Link: https://lore.kernel.org/r/1710179338-294359-3-git-send-email-steven.sistare@oracle.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: stop vm for cpr
  Steve Sistare, 2024-02-28, 1 file changed, -2/+0

  When migration for cpr is initiated, stop the VM and set state
  RUN_STATE_FINISH_MIGRATE before RAM is saved. This eliminates the
  possibility of RAM and device state being out of sync, and guarantees
  that a guest in the suspended state remains suspended, because
  qmp_cont rejects a cont command in the RUN_STATE_FINISH_MIGRATE state.

  Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Link: https://lore.kernel.org/r/1708622920-68779-11-git-send-email-steven.sistare@oracle.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: remove postcopy_after_devices
  Steve Sistare, 2024-02-28, 1 file changed, -2/+0

  postcopy_after_devices and migration_in_postcopy_after_devices are no
  longer used, so delete them.

  Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Link: https://lore.kernel.org/r/1708622920-68779-6-git-send-email-steven.sistare@oracle.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Centralize BH creation and dispatch
  Fabiano Rosas, 2024-01-29, 1 file changed, -4/+1

  Now that the migration state reference counting is correct, further
  wrap the bottom half dispatch process to avoid future issues.

  Move BH creation and scheduling together and wrap the dispatch with an
  intermediary function that will ensure we always keep the ref/unref
  balanced. Also move the responsibility of deleting the BH into the
  wrapper and remove the now unnecessary pointers.

  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240119233922.32588-6-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Make threshold_size an uint64_t
  Peter Xu, 2024-01-29, 1 file changed, -1/+1

  It's always used to compare against another uint64_t. Make it always
  clear that it's never negative.

  Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240117075848.139045-2-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Remove errp parameter in migration_fd_process_incoming()
  Avihai Horon, 2024-01-04, 1 file changed, -1/+1

  The errp parameter in migration_fd_process_incoming() is unused.
  Remove it.

  Signed-off-by: Avihai Horon <avihaih@nvidia.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20231231093016.14204-5-avihaih@nvidia.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Remove migrate_max_downtime() declaration
  Avihai Horon, 2024-01-04, 1 file changed, -2/+0

  migrate_max_downtime() was removed long ago, but its declaration was
  mistakenly left behind. Remove it.

  Signed-off-by: Avihai Horon <avihaih@nvidia.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20231231093016.14204-2-avihaih@nvidia.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Implement MigrateChannelList to hmp migration flow.
  Het Gala, 2023-11-02, 1 file changed, -1/+2

  Integrate MigrateChannelList with all transport backends (socket, exec
  and rdma) for both src and dest migration endpoints for hmp migration.

  Suggested-by: Aravind Retnakaran <aravind.retnakaran@nutanix.com>
  Signed-off-by: Het Gala <het.gala@nutanix.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Message-ID: <20231023182053.8711-14-farosas@suse.de>
  Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Allow network to fail even during recovery
  Peter Xu, 2023-11-02, 1 file changed, -2/+6

  Normally the postcopy recover phase should only exist for a super
  short period: the duration when QEMU is trying to recover from an
  interrupted postcopy migration, during which a handshake is carried
  out to continue the procedure, with state changes from PAUSED ->
  RECOVER -> POSTCOPY_ACTIVE again.

  The RECOVER phase should be super small; it happens right after the
  admin specified a new but working network link for QEMU to reconnect
  to the dest QEMU.

  However, there can still be cases where the channel is broken within
  this small RECOVER window. If that happens, with the current code
  there's no way the src QEMU can get kicked out of the RECOVER stage,
  and no way either to retry the recovery on another channel once
  established.

  This patch allows the RECOVER phase to fail itself too - we're mostly
  ready, just some small things missing, e.g. properly kicking the main
  migration thread out when it is sleeping on rp_sem and we find that
  we're at the RECOVER stage. When this happens, it fails the RECOVER
  itself, and rolls back to the PAUSED stage. Then the user can retry
  another round of recovery.

  To make it even stronger, teach the QMP command migrate-pause to
  explicitly kick src/dst QEMU out when needed, so even if for some
  reason the migration thread didn't get kicked out already by a failing
  return-path thread, the admin can also kick it out. This will be a
  super, super corner case, but still try to cover it.

  One can try to test this with two proxy channels for migration:

    (a) socat unix-listen:/tmp/src.sock,reuseaddr,fork tcp:localhost:10000
    (b) socat tcp-listen:10000,reuseaddr,fork unix:/tmp/dst.sock

  So the migration channel will be:

          (a)               (b)
    src -> /tmp/src.sock -> tcp:10000 -> /tmp/dst.sock -> dst

  Then to make QEMU hang at the RECOVER stage, one can do the following:

    (1) stop the postcopy using QMP command postcopy-pause
    (2) kill the 2nd proxy (b)
    (3) try to recover the postcopy using /tmp/src.sock on src
    (4) src QEMU will go into RECOVER stage but won't be able to
        continue from there, because the channel is actually broken
        at (b)

  Before this patch, step (4) will leave the src QEMU stuck in the
  RECOVER stage, without a way to kick the QEMU out or continue the
  postcopy again. After this patch, (4) will quickly fail and bounce
  back to the PAUSED stage.

  The admin can also kick QEMU from (4) into PAUSED using migrate-pause
  when needed. After bouncing back to the PAUSED stage, one can recover
  again.

  Reported-by: Xiaohui Li <xiaohli@redhat.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2111332
  Reviewed-by: Juan Quintela <quintela@redhat.com>
  Signed-off-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Juan Quintela <quintela@redhat.com>
  Message-ID: <20231017202633.296756-3-peterx@redhat.com>
* migration: Refactor error handling in source return path
  Peter Xu, 2023-11-02, 1 file changed, -1/+0

  rp_state.error was a boolean used to show that an error happened in
  the return path thread. That's not only duplicating error reporting
  (migrate_set_error), but also not good enough in that we only do
  error_report() and set it to true; we can never keep a history of the
  exact error and show it in query-migrate.

  To make this better, a few things are done:

  - Use error_setg() rather than error_report() across the whole
    lifecycle of the return path thread, keeping the error in an Error*.

  - With the above, there's no need for mark_source_rp_bad(); remove it,
    alongside rp_state.error itself.

  - Use migrate_set_error() to apply that captured error to the global
    migration object when an error occurred in this thread.

  - Do the same when a qemufile error is detected in the source return
    path.

  We need to re-export qemu_file_get_error_obj() to do the last one.

  Signed-off-by: Peter Xu <peterx@redhat.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Juan Quintela <quintela@redhat.com>
  Signed-off-by: Juan Quintela <quintela@redhat.com>
  Message-ID: <20231017202633.296756-2-peterx@redhat.com>
* migration: migration_stop_vm() helper
  Peter Xu, 2023-11-01, 1 file changed, -0/+2

  Provide a helper for the non-COLO use case of migration to stop a VM.
  This prepares for adding some downtime-relevant tracepoints to
  migration, where they may or may not apply to COLO.

  Signed-off-by: Peter Xu <peterx@redhat.com>
  Reviewed-by: Juan Quintela <quintela@redhat.com>
  Signed-off-by: Juan Quintela <quintela@redhat.com>
  Message-ID: <20231030163346.765724-5-peterx@redhat.com>
* migration: Create migrate_rdma()
  Juan Quintela, 2023-10-17, 1 file changed, -0/+2

  Helper to say if we are doing a migration over rdma.

  Reviewed-by: Peter Xu <peterx@redhat.com>
  Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
  Signed-off-by: Juan Quintela <quintela@redhat.com>
  Message-ID: <20231011203527.9061-2-quintela@redhat.com>
* migration: Allow user to specify available switchover bandwidth
  Peter Xu, 2023-10-17, 1 file changed, -1/+1

  Migration bandwidth is a very important value for live migration: it's
  one of the major factors in the decision of when to switch over to the
  destination in a precopy process.

  This value is currently estimated by QEMU during the whole live
  migration process by monitoring how fast we were sending the data.
  This would be the most accurate bandwidth in an ideal world, where
  we're always feeding unlimited data to the migration channel and are
  then limited only by the bandwidth that is available.

  However, in reality it may be very different. E.g., over a 10Gbps
  network we can see query-migrate showing a migration bandwidth of only
  a few tens of MB/s, just because there are plenty of other things the
  migration thread might be doing. For example, the migration thread can
  be busy scanning zero pages, or it can be fetching the dirty bitmap
  from other external dirty sources (like vhost or KVM). It means we may
  not be pushing data as fast as possible into the migration channel, so
  the bandwidth estimated from "how much data we sent in the channel"
  can sometimes be dramatically inaccurate.

  With that, the decision to switch over will be affected, by assuming
  that we may not be able to switch over at all with such a low
  bandwidth, while in reality we can. With that wrong estimation of
  bandwidth, the migration may not even converge at all with the
  specified downtime, keeping iterations running forever with a low
  bandwidth estimate.

  The issue is that QEMU itself may not be able to avoid those
  uncertainties in measuring the real "available migration bandwidth".
  At least it's not something I can think of so far.

  One way to fix this is, when the user is fully aware of the available
  bandwidth, to allow the user to help by providing an accurate value.
  For example, if the user has a dedicated channel of 10Gbps for
  migration for this specific VM, the user can specify this bandwidth so
  QEMU can always do the calculation based on this fact, trusting the
  user as long as it is specified. It may not be the exact bandwidth
  when switching over (in which case QEMU will push migration data as
  fast as possible), but it is much better than QEMU trying to wildly
  guess, especially when it guesses very wrong.

  A new parameter "avail-switchover-bandwidth" is introduced just for
  this. When the user specifies this parameter, instead of trusting the
  value estimated by QEMU itself (based on the QEMUFile send speed), it
  trusts the user more, by using this value to decide when to switch
  over, assuming that we'll have such bandwidth available then.

  Note that specifying this value will not throttle the bandwidth for
  switchover yet, so QEMU will always use the full bandwidth possible
  for sending switchover data, assuming that should always be the most
  important way to use the network at that time.

  This can resolve issues like "unconvergent migration", which is caused
  by a hilariously low "migration bandwidth" being detected, for
  whatever reason.

  Reported-by: Zhiyi Guo <zhguo@redhat.com>
  Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
  Reviewed-by: Juan Quintela <quintela@redhat.com>
  Signed-off-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Juan Quintela <quintela@redhat.com>
  Message-ID: <20231010221922.40638-1-peterx@redhat.com>
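  A usage sketch, e.g. for a dedicated 10Gbps (~1.25GB/s) migration
  link. Only the parameter name comes from the commit; the HMP form and
  the bytes-per-second unit are assumptions:

    migrate_set_parameter avail-switchover-bandwidth 1250000000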
* migration: Add migration_rp_wait|kick()
  Peter Xu, 2023-10-11, 1 file changed, -0/+15

  These are just simple wrappers for rp_sem, on either wait() or kick(),
  making it even clearer how it is used. Prepared to be used for other
  things too.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Signed-off-by: Peter Xu <peterx@redhat.com>
  Message-ID: <20231004220240.167175-8-peterx@redhat.com>
  Signed-off-by: Juan Quintela <quintela@redhat.com>
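  Given the description, the wrappers are essentially (a sketch based on
  the message above, not the literal diff):

    /* Self-describing call sites for waiting on / kicking the source
     * return path semaphore. */
    void migration_rp_wait(MigrationState *s)
    {
        qemu_sem_wait(&s->rp_state.rp_sem);
    }

    void migration_rp_kick(MigrationState *s)
    {
        qemu_sem_post(&s->rp_state.rp_sem);
    }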
* migration: Introduce migrate_has_error()
  Peter Xu, 2023-10-11, 1 file changed, -0/+1

  Introduce a helper to detect whether MigrationState.error is set, for
  whatever reason.

  This is preparation work for any thread (e.g. the source return path
  thread) to set up errors in a unified way on MigrationState, rather
  than relying on its own way to set errors (mark_source_rp_bad()).

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Juan Quintela <quintela@redhat.com>
  Signed-off-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Juan Quintela <quintela@redhat.com>
  Message-ID: <20231004220240.167175-3-peterx@redhat.com>
* migration: Replace the return path retry logic
  Fabiano Rosas, 2023-09-27, 1 file changed, -1/+0

  Replace the return path retry logic with finishing and restarting the
  thread. This fixes a race when resuming the migration that leads to a
  segfault.

  Currently when doing postcopy we consider that an IO error on the
  return path file could be due to a network intermittency. We then keep
  the thread alive but have it do cleanup of the 'from_dst_file' and
  wait on the 'postcopy_pause_rp' semaphore. When the user issues a
  migrate resume, a new return path is opened and the thread is allowed
  to continue.

  There's a race condition in the above mechanism. It is possible for
  the new return path file to be setup *before* the cleanup code in the
  return path thread has had a chance to run, leading to the *new* file
  being closed and the pointer set to NULL. When the thread is released
  after the resume, it tries to dereference 'from_dst_file' and crashes:

    Thread 7 "return path" received signal SIGSEGV, Segmentation fault.
    [Switching to Thread 0x7fffd1dbf700 (LWP 9611)]
    0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
    154         return f->last_error;

    (gdb) bt
     #0  0x00005555560e4893 in qemu_file_get_error_obj (f=0x0, errp=0x0) at ../migration/qemu-file.c:154
     #1  0x00005555560e4983 in qemu_file_get_error (f=0x0) at ../migration/qemu-file.c:206
     #2  0x0000555555b9a1df in source_return_path_thread (opaque=0x555556e06000) at ../migration/migration.c:1876
     #3  0x000055555602e14f in qemu_thread_start (args=0x55555782e780) at ../util/qemu-thread-posix.c:541
     #4  0x00007ffff38d76ea in start_thread (arg=0x7fffd1dbf700) at pthread_create.c:477
     #5  0x00007ffff35efa6f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

  Here's the race (the important bit is open_return_path happening
  before migration_release_dst_files):

    migration                 | qmp                          | return path
    --------------------------+------------------------------+---------------------------------
                                qmp_migrate_pause()
                                 shutdown(ms->to_dst_file)
                                 f->last_error = -EIO
    migrate_detect_error()
    postcopy_pause()
     set_state(PAUSED)
     wait(postcopy_pause_sem)
                                qmp_migrate(resume)
                                migrate_fd_connect()
                                 resume = state == PAUSED
                                 open_return_path <-- TOO SOON!
                                 set_state(RECOVER)
                                 post(postcopy_pause_sem)
                                                              (incoming closes to_src_file)
                                                              res = qemu_file_get_error(rp)
                                                              migration_release_dst_files()
                                                              ms->rp_state.from_dst_file = NULL
                                                              post(postcopy_pause_rp_sem)
    postcopy_pause_return_path_thread()
      wait(postcopy_pause_rp_sem)
    rp = ms->rp_state.from_dst_file
    goto retry
    qemu_file_get_error(rp)
    SIGSEGV
    -------------------------------------------------------------------------------------------

  We can keep the retry logic without having the thread alive and
  waiting. The only piece of data used by it is the 'from_dst_file' and
  it is only allowed to proceed after a migrate resume is issued and the
  semaphore released at migrate_fd_connect().

  Move the retry logic to outside the thread, by waiting for the thread
  to finish before pausing the migration.

  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  Message-ID: <20230918172822.19052-8-farosas@suse.de>
* migration: Fix race that dest preempt thread close too early
  Peter Xu, 2023-09-27, 1 file changed, -1/+12

  We hit an intermittent CI issue where migration-test fails at the unit
  test preempt/plain:

    qemu-system-x86_64: Unable to read from socket: Connection reset by peer
    Memory content inconsistency at 5b43000 first_byte = bd last_byte = bc current = 4f hit_edge = 1
    ** ERROR:../tests/qtest/migration-test.c:300:check_guests_ram: assertion failed: (bad == 0)
    (test program exited with status code -6)

  Fabiano debugged it and found that the preempt thread can quit even
  without receiving all the pages, which can cause the guest not to
  receive all the pages and corrupt guest memory.

  To make sure the preempt thread finishes receiving all the pages, we
  can rely on page_requested_count being zero, because the preempt
  channel will only receive requested page faults. Note that not all
  faulted pages are required to be sent via the preempt channel/thread;
  imagine the case when a requested page is just queued into the
  background main channel for migration: the src qemu will still send it
  via the background channel.

  Here, instead of spinning over reading the count, we add a condvar so
  the main thread can wait on it if that unusual case happens, without
  burning cpu for no good reason, even if the duration is short.
  Spinning in this rare case would probably be fine too; it's just
  better not to.

  The condvar is only used when that special case is triggered. Some
  memory ordering trickery is needed to make it work correctly (against
  the preempt thread status field), so that the main thread will always
  get a kick when the case triggers.

  Closes: https://gitlab.com/qemu-project/qemu/-/issues/1886
  Debugged-by: Fabiano Rosas <farosas@suse.de>
  Signed-off-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  Message-ID: <20230918172822.19052-2-farosas@suse.de>
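  A sketch of the wait described above. The field names are modeled on
  the message (page_requested_count is quoted from it, the condvar name
  is illustrative), and the memory-ordering details are elided:

    /* Preempt thread exit path (sketch): wait until every requested
     * page has been received before letting the thread quit. */
    qemu_mutex_lock(&mis->page_request_mutex);
    while (qatomic_read(&mis->page_requested_count)) {
        qemu_cond_wait(&mis->page_request_cond, &mis->page_request_mutex);
    }
    qemu_mutex_unlock(&mis->page_request_mutex);

    /* Fault-resolution side (sketch): signal when the count hits zero. */
    if (qatomic_dec_fetch(&mis->page_requested_count) == 0) {
        qemu_cond_signal(&mis->page_request_cond);
    }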
* migration: Add .save_prepare() handler to struct SaveVMHandlers
  Avihai Horon, 2023-09-11, 1 file changed, -1/+1

  Add a new .save_prepare() handler to struct SaveVMHandlers. This
  handler is called early, even before migration starts, and can be used
  by devices to perform early checks.

  Refactor migrate_init() to be able to return errors and call
  .save_prepare() from there.

  Suggested-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Avihai Horon <avihaih@nvidia.com>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Reviewed-by: Cédric Le Goater <clg@redhat.com>
  Signed-off-by: Cédric Le Goater <clg@redhat.com>
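  A sketch of a device hooking the new handler; the handler signature is
  assumed from the description, and the my_dev_* names are hypothetical:

    /* Sketch: early per-device check, run before migration starts. */
    static int my_dev_save_prepare(void *opaque, Error **errp)
    {
        MyDevState *s = opaque;

        if (!my_dev_is_migratable(s)) {     /* hypothetical check */
            error_setg(errp, "my-dev: device state not migratable");
            return -EINVAL;
        }
        return 0;
    }

    static const SaveVMHandlers my_dev_savevm_handlers = {
        .save_prepare = my_dev_save_prepare,
        /* ... the rest of the handlers ... */
    };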
* migration: Add migration prefix to functions in target.c
  Avihai Horon, 2023-09-11, 1 file changed, -2/+2

  The functions in target.c are not static, yet they don't have a proper
  migration prefix. Add such a prefix.

  Signed-off-by: Avihai Horon <avihaih@nvidia.com>
  Reviewed-by: Cédric Le Goater <clg@redhat.com>
  Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: spelling fixes
  Michael Tokarev, 2023-07-25, 1 file changed, -2/+2

  Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
* migration: unexport migrate_fd_error()
  Laszlo Ersek, 2023-07-08, 1 file changed, -1/+0

  The only migrate_fd_error() call sites are in "migration/migration.c",
  which is also where we define migrate_fd_error(). Make the function
  static, and remove its declaration from "migration/migration.h".

  Cc: Juan Quintela <quintela@redhat.com> (maintainer:Migration)
  Cc: Leonardo Bras <leobras@redhat.com> (reviewer:Migration)
  Cc: Peter Xu <peterx@redhat.com> (reviewer:Migration)
  Cc: qemu-trivial@nongnu.org
  Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2018404
  Signed-off-by: Laszlo Ersek <lersek@redhat.com>
  Reviewed-by: Juan Quintela <quintela@redhat.com>
  Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
  Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
* vfio/migration: Reset bytes_transferred properly
  Avihai Horon, 2023-06-30, 1 file changed, -0/+1

  Currently, VFIO bytes_transferred is not reset properly:

  1. bytes_transferred is not reset after a VM snapshot (so a migration
     following a snapshot will report an incorrect value).
  2. bytes_transferred is a single counter for all VFIO devices;
     however, upon migration failure it is reset multiple times, by each
     VFIO device.

  Fix it by introducing a new function vfio_reset_bytes_transferred()
  and calling it during migration and snapshot start. Remove the
  existing bytes_transferred reset in the VFIO migration state notifier,
  which is not needed anymore.

  Fixes: 3710586caa5d ("qapi: Add VFIO devices migration stats in Migration stats")
  Signed-off-by: Avihai Horon <avihaih@nvidia.com>
  Reviewed-by: Cédric Le Goater <clg@redhat.com>
  Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
  Signed-off-by: Cédric Le Goater <clg@redhat.com>