summary refs log tree commit diff stats
path: root/include/migration (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'staging-pull-request' of https://gitlab.com/peterx/qemu into stagingRichard Henderson2025-10-044-4/+39
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Migration/Memory Pull for 10.2 - PeterX's fix on tls warning for preempt channel when migratino completes - Arun's series to enhance error reporting for vTPM and migration framework - PeterX's patch to cleanup multifd send TLS BYE messages - Juraj's fix on postcopy start state transition when switchover failed - Yanfei's fix to migrate APIC before VFIO-PCI to avoid irq fallbacks - Dan's cleanup to simplify error reporting in qemu_fill_buffer() - PeterM's fix on address space leak when cpu hot plug / unplug - Steve's cpr-exec wholeset # -----BEGIN PGP SIGNATURE----- # # iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCaN/uIhIccGV0ZXJ4QHJl # ZGhhdC5jb20ACgkQO1/MzfOr1wZ+mAEA1l2RS9sZS1W3vXQMCNb+Nu8Uo2p+e5Qj # Uu6J0WVV+XsBANtzGZk2UM/frqlABywW3/ozJ4qBvIPKo758Mr6/lqUH # =asUv # -----END PGP SIGNATURE----- # gpg: Signature made Fri 03 Oct 2025 08:39:14 AM PDT # gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706 # gpg: issuer "peterx@redhat.com" # gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [unknown] # gpg: aka "Peter Xu <peterx@redhat.com>" [unknown] # gpg: WARNING: The key's User ID is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706 * tag 'staging-pull-request' of https://gitlab.com/peterx/qemu: (45 commits) migration-test: test cpr-exec vfio: cpr-exec mode migration: cpr-exec docs migration: cpr-exec mode migration: cpr-exec save and load migration: cpr-exec-command parameter oslib: qemu_clear_cloexec migration: add cpr_walk_fd migration: multi-mode notifier migration: simplify error reporting after channel read physmem: Destroy all CPU AddressSpaces on unrealize memory: New AS helper to serialize destroy+free include/system/memory.h: Clarify address_space_destroy() behaviour migration: ensure APIC is loaded prior to VFIO PCI devices migration: Fix state transition in postcopy_start() error handling migration/multifd/tls: Cleanup BYE message processing on sender side migration: HMP: Adjust the order of output fields migration: Make migration_has_failed() work even for CANCELLING io/crypto: Move tls premature termination handling into QIO layer backends/tpm: Propagate vTPM error on migration failure ... Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
| * migration: cpr-exec modeSteve Sistare2025-10-031-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the cpr-exec migration mode. Usage: qemu-system-$arch -machine aux-ram-share=on ... migrate_set_parameter mode cpr-exec migrate_set_parameter cpr-exec-command \ <arg1> <arg2> ... -incoming <uri-1> \ migrate -d <uri-1> The migrate command stops the VM, saves state to uri-1, directly exec's a new version of QEMU on the same host, replacing the original process while retaining its PID, and loads state from uri-1. Guest RAM is preserved in place, albeit with new virtual addresses. The new QEMU process is started by exec'ing the command specified by the @cpr-exec-command parameter. The first word of the command is the binary, and the remaining words are its arguments. The command may be a direct invocation of new QEMU, or may be a non-QEMU command that exec's the new QEMU binary. This mode creates a second migration channel that is not visible to the user. At the start of migration, old QEMU saves CPR state to the second channel, and at the end of migration, it tells the main loop to call cpr_exec. New QEMU loads CPR state early, before objects are created. Because old QEMU terminates when new QEMU starts, one cannot stream data between the two, so uri-1 must be a type, such as a file, that accepts all data before old QEMU exits. Otherwise, old QEMU may quietly block writing to the channel. Memory-backend objects must have the share=on attribute, but memory-backend-epc is not supported. The VM must be started with the '-machine aux-ram-share=on' option, which allows anonymous memory to be transferred in place to the new process. The memfds are kept open across exec by clearing the close-on-exec flag, their values are saved in CPR state, and they are mmap'd in new QEMU. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: Markus Armbruster <armbru@redhat.com> Link: https://lore.kernel.org/r/1759332851-370353-7-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
| * migration: cpr-exec save and loadSteve Sistare2025-10-031-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | To preserve CPR state across exec, create a QEMUFile based on a memfd, and keep the memfd open across exec. Save the value of the memfd in an environment variable so post-exec QEMU can find it. These new functions are called in a subsequent patch. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/r/1759332851-370353-6-git-send-email-steven.sistare@oracle.com [peterx: fix build for Windows] Signed-off-by: Peter Xu <peterx@redhat.com>
| * migration: add cpr_walk_fdSteve Sistare2025-10-031-0/+3
| | | | | | | | | | | | | | | | | | Add a helper to walk all CPR fd's and run a callback for each. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1759332851-370353-3-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
| * migration: multi-mode notifierSteve Sistare2025-10-031-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | Allow a notifier to be added for multiple migration modes. To allow a notifier to appear on multiple per-node lists, use a generic list type. We can no longer use NotifierWithReturnList, because it shoe horns the notifier onto a single list. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/1759332851-370353-2-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
| * migration: ensure APIC is loaded prior to VFIO PCI devicesYanfei Xu2025-10-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The load procedure of VFIO PCI devices involves setting up IRT for each VFIO PCI devices. This requires determining whether an interrupt is single-destination interrupt to decide between Posted Interrupt(PI) or remapping mode for the IRTE. However, determining this may require accessing the VM's APIC registers. For example: ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irqs) ... kvm_arch_irq_bypass_add_producer kvm_x86_call(pi_update_irte) vmx_pi_update_irte kvm_intr_is_single_vcpu If the LAPIC has not been loaded yet, interrupts will use remapping mode. To prevent the fallback of interrupt mode, keep APIC is always loaded prior to VFIO PCI devices. Signed-off-by: Yicong Shen <shenyicong.1023@bytedance.com> Signed-off-by: Yanfei Xu <yanfei.xu@bytedance.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20250818131127.1021648-1-yanfei.xu@bytedance.com Signed-off-by: Peter Xu <peterx@redhat.com>
| * migration: Add error-parameterized function variants in VMSD structArun Menon2025-10-031-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - We need to have good error reporting in the callbacks in VMStateDescription struct. Specifically pre_save, pre_load and post_load callbacks. - It is not possible to change these functions everywhere in one patch, therefore, we introduce a duplicate set of callbacks with Error object passed to them. - So, in this commit, we implement 'errp' variants of these callbacks, introducing an explicit Error object parameter. - This is a functional step towards transitioning the entire codebase to the new error-parameterized functions. - Deliberately called in mutual exclusion from their counterparts, to prevent conflicts during the transition. - New impls should preferentally use 'errp' variants of these methods, and existing impls incrementally converted. The variants without 'errp' are intended to be removed once all usage is converted. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-26-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
| * migration: Remove error variant of vmstate_save_state() functionArun Menon2025-10-031-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit removes the redundant vmstate_save_state_with_err() function. Previously, commit 969298f9d7 introduced vmstate_save_state_with_err() to handle error propagation, while vmstate_save_state() existed for non-error scenarios. This is because there were code paths where vmstate_save_state_v() (called internally by vmstate_save_state) did not explicitly set errors on failure. This change unifies error handling by - updating vmstate_save_state() to accept an Error **errp argument. - vmstate_save_state_v() ensures errors are set directly within the errp object, eliminating the need for two separate functions. All calls to vmstate_save_state_with_err() are replaced with vmstate_save_state(). This simplifies the API and improves code maintainability. vmstate_save_state() that only calls vmstate_save_state_v(), by inference, also has errors set in errp in case of failure. The errors are reported using error_report_err(). If we want the function to exit on error, then &error_fatal is passed. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-24-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
| * migration: push Error **errp into loadvm_process_enable_colo()Arun Menon2025-10-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_process_enable_colo() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-21-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
| * migration: push Error **errp into vmstate_load_state()Arun Menon2025-10-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that vmstate_load_state() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series, when we are actually able to propagate the error to the calling function using errp. Whereas, if we want the function to exit on error, then error_fatal is passed. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-2-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* | migration: Add support for a variable-length array of UINT32 pointersTANG Tiancheng2025-10-031-0/+10
|/ | | | | | | | | | | | | | Add support for defining a vmstate field which is a variable-length array of pointers, and use this to define a VMSTATE_TIMER_PTR_VARRAY() which allows a variable-length array of QEMUTimer* to be used by devices. Message-id: 20250909-timers-v1-0-7ee18a9d8f4b@linux.alibaba.com Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: TANG Tiancheng <lyndra@linux.alibaba.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20250911-timers-v3-2-60508f640050@linux.alibaba.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
* migration: Rename save_live_complete_precopy_thread to ↵Juraj Marcin2025-07-112-7/+7
| | | | | | | | | | | | | | | | | | | | | | | save_complete_precopy_thread Recent patch [1] renames the save_live_complete_precopy handler to save_complete, as the machine is not live in most cases when this handler is executed. The same is true also for save_live_complete_precopy_thread, therefore this patch removes the "live" keyword from the handler itself and related types to keep the naming unified. In contrast to save_complete, this handler is only executed at the end of precopy, therefore the "precopy" keyword is retained. [1]: https://lore.kernel.org/all/20250613140801.474264-7-peterx@redhat.com/ Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cédric Le Goater <clg@redhat.com> Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250626085235.294690-1-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Rename save_live_complete_precopy to save_completePeter Xu2025-07-111-3/+3
| | | | | | | | | | | | | | | | | | Now after merging the precopy and postcopy version of complete() hook, rename the precopy version from save_live_complete_precopy() to save_complete(). Dropping the "live" when at it, because it's in most cases not live when happening (in precopy). No functional change intended. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-7-peterx@redhat.com [peterx: squash the fixup that covers a few more doc spots, per Juraj] Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Drop save_live_complete_postcopy hookPeter Xu2025-07-111-16/+8
| | | | | | | | | | | | | | The hook is only defined in two vmstate users ("ram" and "block dirty bitmap"), meanwhile both of them define the hook exactly the same as the precopy version. Hence, this postcopy version isn't needed. No functional change intended. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: vfio cpr state hookSteve Sistare2025-07-031-0/+12
| | | | | | | | | | | | | | | Define a list of vfio devices in CPR state, in a subsection so that older QEMU can be live updated to this version. However, new QEMU will not be live updateable to old QEMU. This is acceptable because CPR is not yet commonly used, and updates to older versions are unusual. The contents of each device object will be defined by the vfio subsystem in a subsequent patch. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Link: https://lore.kernel.org/qemu-devel/1751493538-202042-14-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: cpr_get_fd_param helperSteve Sistare2025-07-031-0/+2
| | | | | | | | | | | | | | | | | | | | | | | Add the helper function cpr_get_fd_param, to use when preserving a file descriptor that is opened externally and passed to QEMU. cpr_get_fd_param returns a descriptor number either from a QEMU command-line parameter, from a getfd command, or from CPR state. When a descriptor is passed to new QEMU via SCM_RIGHTS, its number changes. Hence, during CPR, the command-line parameter is ignored in new QEMU, and over-ridden by the value found in CPR state. Similarly, if the descriptor was originally specified by a getfd command in old QEMU, the fd number is not known outside of QEMU, and it changes when sent to new QEMU via SCM_RIGHTS. Hence the user cannot send getfd to new QEMU, but when the user sends a hotplug command that references the fd, cpr_get_fd_param finds its value in CPR state. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/qemu-devel/1751493538-202042-5-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: lower handler prioritySteve Sistare2025-06-111-1/+5
| | | | | | | | | | | | | | | Define a vmstate priority that is lower than the default, so its handlers run after all default priority handlers. Since 0 is no longer the default priority, translate an uninitialized priority of 0 to MIG_PRI_DEFAULT. CPR for vfio will use this to install handlers for containers that run after handlers for the devices that they contain. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/qemu-devel/1749569991-25171-3-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: cpr helpersSteve Sistare2025-06-111-0/+5
| | | | | | | | | | Add the cpr_incoming_needed, cpr_open_fd, and cpr_resave_fd helpers, for use when adding cpr support for vfio and iommufd. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/qemu-devel/1749569991-25171-2-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: Add save_postcopy_prepare() savevm handlerPeter Xu2025-05-021-0/+15
| | | | | | | | | | | | | | | | | | | | | Add a savevm handler for a module to opt-in sending extra sections right before postcopy starts, and before VM is stopped. RAM will start to use this new savevm handler in the next patch to do flush and sync for multifd pages. Note that we choose to do it before VM stopped because the current only potential user is not sensitive to VM status, so doing it before VM is stopped is preferred to enlarge any postcopy downtime. It is still a bit unfortunate that we need to introduce such a new savevm handler just for the only use case, however it's so far the cleanest. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-ID: <20250411114534.3370816-4-ppandit@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: cpr_is_incomingSteve Sistare2025-03-141-0/+1
| | | | | | | | | | Define the cpr_is_incoming helper, to be used in several cpr fix patches. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-ID: <1741380954-341079-2-git-send-email-steven.sistare@oracle.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Add save_live_complete_precopy_thread handlerMaciej S. Szmigiero2025-03-062-0/+36
| | | | | | | | | | | | | | | | | This SaveVMHandler helps device provide its own asynchronous transmission of the remaining data at the end of a precopy phase via multifd channels, in parallel with the transfer done by save_live_complete_precopy handlers. These threads are launched only when multifd device state transfer is supported. Management of these threads in done in the multifd migration code, wrapping them in the generic thread pool. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/qemu-devel/eac74a4ca7edd8968bbf72aa07b9041c76364a16.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration/multifd: Add multifd_device_state_supported()Maciej S. Szmigiero2025-03-061-0/+1
| | | | | | | | | | | | | Since device state transfer via multifd channels requires multifd channels with packets and is currently not compatible with multifd compression add an appropriate query function so device can learn whether it can actually make use of it. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/1ff0d98b85f470e5a33687406e877583b8fab74e.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration/multifd: Device state transfer support - send sideMaciej S. Szmigiero2025-03-061-0/+4
| | | | | | | | | | A new function multifd_queue_device_state() is provided for device to queue its state for transmission via a multifd channel. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/ebd55768d3e5fecb5eb3f197bad9c0c07e5bc084.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: Add thread pool of optional load threadsMaciej S. Szmigiero2025-03-061-0/+3
| | | | | | | | | | | | | | | | Some drivers might want to make use of auxiliary helper threads during VM state loading, for example to make sure that their blocking (sync) I/O operations don't block the rest of the migration process. Add a migration core managed thread pool to facilitate this use case. The migration core will wait for these threads to finish before (re)starting the VM at destination. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/b09fd70369b6159c75847e69f235cb908b02570c.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: Add qemu_loadvm_load_state_buffer() and its handlerMaciej S. Szmigiero2025-03-061-0/+15
| | | | | | | | | | | | qemu_loadvm_load_state_buffer() and its load_state_buffer SaveVMHandler allow providing device state buffer to explicitly specified device via its idstr and instance id. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/71ca753286b87831ced4afd422e2e2bed071af25.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: Add MIG_CMD_SWITCHOVER_START and its load handlerMaciej S. Szmigiero2025-03-062-0/+16
| | | | | | | | | | | | | | | | | | | This QEMU_VM_COMMAND sub-command and its switchover_start SaveVMHandler is used to mark the switchover point in main migration stream. It can be used to inform the destination that all pre-switchover main migration stream data has been sent/received so it can start to process post-switchover data that it might have received via other migration channels like the multifd ones. Add also the relevant MigrationState bit stream compatibility property and its hw_compat entry. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Zhang Chen <zhangckid@gmail.com> # for the COLO part Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/311be6da85fc7e49a7598684d80aa631778dcbce.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: Clarify that {load, save}_cleanup handlers can run without setupMaciej S. Szmigiero2025-03-061-1/+5
| | | | | | | | | | | | | | | | | | | | It's possible for {load,save}_cleanup SaveVMHandlers to get called without the corresponding {load,save}_setup handler being called first. One such example is if {load,save}_setup handler of a proceeding device returns error. In this case the migration core cleanup code will call all corresponding cleanup handlers, even for these devices which haven't had its setup handler called. Since this behavior can generate some surprises let's clearly document it in these SaveVMHandlers description. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Cédric Le Goater <clg@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/991636623fb780350f493b5f045cb17e13ce4c0f.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: cpr-transfer modeSteve Sistare2025-01-291-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the cpr-transfer migration mode, which allows the user to transfer a guest to a new QEMU instance on the same host with minimal guest pause time, by preserving guest RAM in place, albeit with new virtual addresses in new QEMU, and by preserving device file descriptors. Pages that were locked in memory for DMA in old QEMU remain locked in new QEMU, because the descriptor of the device that locked them remains open. cpr-transfer preserves memory and devices descriptors by sending them to new QEMU over a unix domain socket using SCM_RIGHTS. Such CPR state cannot be sent over the normal migration channel, because devices and backends are created prior to reading the channel, so this mode sends CPR state over a second "cpr" migration channel. New QEMU reads the cpr channel prior to creating devices or backends. The user specifies the cpr channel in the channel arguments on the outgoing side, and in a second -incoming command-line parameter on the incoming side. The user must start old QEMU with the the '-machine aux-ram-share=on' option, which allows anonymous memory to be transferred in place to the new process by transferring a memory descriptor for each ram block. Memory-backend objects must have the share=on attribute, but memory-backend-epc is not supported. The user starts new QEMU on the same host as old QEMU, with command-line arguments to create the same machine, plus the -incoming option for the main migration channel, like normal live migration. In addition, the user adds a second -incoming option with channel type "cpr". This CPR channel must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a UNIX domain socket. To initiate CPR, the user issues a migrate command to old QEMU, adding a second migration channel of type "cpr" in the channels argument. Old QEMU stops the VM, saves state to the migration channels, and enters the postmigrate state. New QEMU mmap's memory descriptors, and execution resumes. The implementation splits qmp_migrate into start and finish functions. Start sends CPR state to new QEMU, which responds by closing the CPR channel. Old QEMU detects the HUP then calls finish, which connects the main migration channel. In summary, the usage is: qemu-system-$arch -machine aux-ram-share=on ... start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>" Issue commands to old QEMU: migrate_set_parameter mode cpr-transfer {"execute": "migrate", ... {"channel-type": "main"...}, {"channel-type": "cpr"...} ... } Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Link: https://lore.kernel.org/r/1736967650-129648-17-git-send-email-steven.sistare@oracle.com Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: cpr-transfer save and loadSteve Sistare2025-01-291-0/+3
| | | | | | | | | | Add functions to create a QEMUFile based on a unix URI, for saving or loading, for use by cpr-transfer mode to preserve CPR state. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1736967650-129648-16-git-send-email-steven.sistare@oracle.com Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: VMSTATE_FDSteve Sistare2025-01-291-0/+9
| | | | | | | | | | Define VMSTATE_FD for declaring a file descriptor field in a VMStateDescription. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1736967650-129648-15-git-send-email-steven.sistare@oracle.com Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: enhance migrate_uri_parseSteve Sistare2025-01-291-0/+7
| | | | | | | | | | | Export migrate_uri_parse for use outside migration internals, and define a method migrate_is_uri that indicates when migrate_uri_parse should be used. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1736967650-129648-12-git-send-email-steven.sistare@oracle.com Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: cpr-stateSteve Sistare2025-01-291-0/+25
| | | | | | | | | | | | | | | | | | | | CPR must save state that is needed after QEMU is restarted, when devices are realized. Thus the extra state cannot be saved in the migration channel, as objects must already exist before that channel can be loaded. Instead, define auxilliary state structures and vmstate descriptions, not associated with any registered object, and serialize the aux state to a cpr-specific channel in cpr_state_save. Deserialize in cpr_state_load after QEMU restarts, before devices are realized. Provide accessors for clients to register file descriptors for saving. The mechanism for passing the fd's to the new process will be specific to each migration mode, and added in subsequent patches. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1736967650-129648-8-git-send-email-steven.sistare@oracle.com Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/block: Rewrite disk activationPeter Xu2025-01-091-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch proposes a flag to maintain disk activation status globally. It mostly rewrites disk activation mgmt for QEMU, including COLO and QMP command xen_save_devices_state. Backgrounds =========== We have two problems on disk activations, one resolved, one not. Problem 1: disk activation recover (for switchover interruptions) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When migration is either cancelled or failed during switchover, especially when after the disks are inactivated, QEMU needs to remember re-activate the disks again before vm starts. It used to be done separately in two paths: one in qmp_migrate_cancel(), the other one in the failure path of migration_completion(). It used to be fixed in different commits, all over the places in QEMU. So these are the relevant changes I saw, I'm not sure if it's complete list: - In 2016, commit fe904ea824 ("migration: regain control of images when migration fails to complete") - In 2017, commit 1d2acc3162 ("migration: re-active images while migration been canceled after inactive them") - In 2023, commit 6dab4c93ec ("migration: Attempt disk reactivation in more failure scenarios") Now since we have a slightly better picture maybe we can unify the reactivation in a single path. One side benefit of doing so is, we can move the disk operation outside QMP command "migrate_cancel". It's possible that in the future we may want to make "migrate_cancel" be OOB-compatible, while that requires the command doesn't need BQL in the first place. This will already do that and make migrate_cancel command lightweight. Problem 2: disk invalidation on top of invalidated disks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is an unresolved bug for current QEMU. Link in "Resolves:" at the end. It turns out besides the src switchover phase (problem 1 above), QEMU also needs to remember block activation on destination. Consider two continuous migration in a row, where the VM was always paused. In that scenario, the disks are not activated even until migration completed in the 1st round. When the 2nd round starts, if QEMU doesn't know the status of the disks, it needs to try inactivate the disk again. Here the issue is the block layer API bdrv_inactivate_all() will crash a QEMU if invoked on already inactive disks for the 2nd migration. For detail, see the bug link at the end. Implementation ============== This patch proposes to maintain disk activation with a global flag, so we know: - If we used to inactivate disks for migration, but migration got cancelled, or failed, QEMU will know it should reactivate the disks. - On incoming side, if the disks are never activated but then another migration is triggered, QEMU should be able to tell that inactivate is not needed for the 2nd migration. We used to have disk_inactive, but it only solves the 1st issue, not the 2nd. Also, it's done in completely separate paths so it's extremely hard to follow either how the flag changes, or the duration that the flag is valid, and when we will reactivate the disks. Convert the existing disk_inactive flag into that global flag (also invert its naming), and maintain the disk activation status for the whole lifecycle of qemu. That includes the incoming QEMU. Put both of the error cases of source migration (failure, cancelled) together into migration_iteration_finish(), which will be invoked for either of the scenario. So from that part QEMU should behave the same as before. However with such global maintenance on disk activation status, we not only cleanup quite a few temporary paths that we try to maintain the disk activation status (e.g. in postcopy code), meanwhile it fixes the crash for problem 2 in one shot. For freshly started QEMU, the flag is initialized to TRUE showing that the QEMU owns the disks by default. For incoming migrated QEMU, the flag will be initialized to FALSE once and for all showing that the dest QEMU doesn't own the disks until switchover. That is guaranteed by the "once" variable. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2395 Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241206230838.1111496-7-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Unexport migration_is_active()Avihai Horon2024-12-261-1/+0
| | | | | | | | | | | | | | | After being removed from VFIO and dirty limit, migration_is_active() no longer has any users outside the migration subsystem, and in fact, it's only used in migration.c. Unexport it and also relocate it so it can be made static. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Tested-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20241218134022.21264-8-avihaih@nvidia.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: Drop migration_is_device()Avihai Horon2024-12-261-1/+0
| | | | | | | | | | | | After being removed from VFIO, migration_is_device() no longer has any users. Drop it. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Cédric Le Goater <clg@redhat.com> Acked-by: Peter Xu <peterx@redhat.com> Tested-by: Joao Martins <joao.m.martins@oracle.com> Link: https://lore.kernel.org/r/20241218134022.21264-7-avihaih@nvidia.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: Drop migration_is_idle()Peter Xu2024-10-311-1/+0
| | | | | | | | | | | | | | | | Now with the current migration_is_running(), it will report exactly the opposite of what will be reported by migration_is_idle(). Drop migration_is_idle(), instead use "!migration_is_running()" which should be identical on functionality. In reality, most of the idle check is inverted, so it's even easier to write with "migrate_is_running()" check. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Drop migration_is_setup_or_active()Peter Xu2024-10-311-2/+2
| | | | | | | | | | | | | | | | | This helper is mostly the same as migration_is_running(), except that one has COLO reported as true, the other has CANCELLING reported as true. Per my past years experience on the state changes, none of them should matter. To make it slightly safer, report both COLO || CANCELLING to be true in migration_is_running(), then drop the other one. We kept the 1st only because the name is simpler, and clear enough. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Unexport ram_mig_init()Peter Xu2024-10-311-1/+0
| | | | | | | | | It's only used within migration/. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Unexport dirty_bitmap_mig_init()Peter Xu2024-10-311-3/+0
| | | | | | | | | It's only used within migration/, so it shouldn't be exported. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20241024213056.1395400-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Remove unused VMSTATE_ARRAY_TEST() macroPhilippe Mathieu-Daudé2024-06-211-10/+0
| | | | | | | | | | | Last use of VMSTATE_ARRAY_TEST() was removed in commit 46baa9007f ("migration/i386: Remove old non-softfloat 64bit FP support"), we can safely get rid of it. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* Merge tag 'migration-20240522-pull-request' of ↵Richard Henderson2024-05-221-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://gitlab.com/farosas/qemu into staging Migration pull request - Li Zhijian's COLO minor fixes - Marc-André's virtio-gpu fix - Fiona's virtio-net USO fix - A couple of migration-test fixes from Thomas # -----BEGIN PGP SIGNATURE----- # # iQJEBAABCAAuFiEEqhtIsKIjJqWkw2TPx5jcdBvsMZ0FAmZObggQHGZhcm9zYXNA # c3VzZS5kZQAKCRDHmNx0G+wxnWE8D/49RGE+g29qyk9aKx3lU8mSq+ZzmX5GncBt # 5+Mx5qoHDsBCQTE+dQpEVIoeMJ2HIbgbOML4qsnp6Hw/4/TWkfwC/R6+ZmHBevRk # fVLkVh2JMHVg8Tq+0FO1X1QnMU03uJ7EAuWdDa8HqlJ5dQY/K3gDaku8oQBXk96X # 13pChSbMob76tdb+wiwbdEakabigH7XfrPdI6lzI8MCGTIcPKc/UKTFYuoj/OsNx # raqy+uBtvKtfHxiaYnIgHIPNAF/1f4tP3iAOcPoZWIMXWxFkE8+ANDJAbWo6xIcL # DGg/wEzZO/OnXLjOhjvLBUHK/fx4wQ5bsqA09BVxoRyBGblkXr+bcwBLYjgiEqzT # aniPiAx5W/Db+T7HqZPIWesFYj3cmcwvYUTrx/RPMdC0epG+ZczDMtescHdZbxvt # Pjs3nFeCLhyYcVhlTI72eXRCxdd/26+r6/OmrBC2+GaZrybM61TvNo+3XvO0Pfhi # UmwF2EN27XmSMelLvH/MnflUVgBHKDs3CCQzDlxreHq2jMVR0SL7LU5wMJJ58Iok # M3u74izQM25bwYxiASH+4iRn0puH1mOwgOx28W0uiQfZY/678/lCnwa1Tul15BRE # fIQZJhyIGzhSpwLqEXmdXdlLQs1isqIgpd/mzKgZ285nLr7kz+4gxCUqiXgVbrl7 # P45Dym1u4g== # =DDrh # -----END PGP SIGNATURE----- # gpg: Signature made Wed 22 May 2024 03:13:28 PM PDT # gpg: using RSA key AA1B48B0A22326A5A4C364CFC798DC741BEC319D # gpg: issuer "farosas@suse.de" # gpg: Good signature from "Fabiano Rosas <farosas@suse.de>" [unknown] # gpg: aka "Fabiano Almeida Rosas <fabiano.rosas@suse.com>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: AA1B 48B0 A223 26A5 A4C3 64CF C798 DC74 1BEC 319D * tag 'migration-20240522-pull-request' of https://gitlab.com/farosas/qemu: tests/qtest/migration-test: Fix the check for a successful run of analyze-migration.py tests/qtest/migration-test: Run some basic tests on s390x and ppc64 with TCG, too hw/core/machine: move compatibility flags for VirtIO-net USO to machine 8.1 virtio-gpu: fix v2 migration migration: fix a typo migration: add "exists" info to load-state-field trace migration/colo: Tidy up bql_unlock() around bdrv_activate_all() migration/colo: make colo_incoming_co() return void migration/colo: Minor fix for colo error message Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
| * migration/colo: make colo_incoming_co() return voidLi Zhijian2024-05-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, it always returns 0, no need to check the return value at all. In addition, enter colo coroutine only if migration_incoming_colo_enabled() is true. Once the destination side enters the COLO* state, the COLO process will take over the remaining processes until COLO exits. Cc: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> [fixed mangled author email address] Signed-off-by: Fabiano Rosas <farosas@suse.de>
* | migration: Extend migration_file_set_error() with Error* argumentCédric Le Goater2024-05-161-1/+1
|/ | | | | | | | | | | Use it to update the current error of the migration stream if available and if not, simply print out the error. Next changes will update with an error to report. Reviewed-by: Avihai Horon <avihaih@nvidia.com> Acked-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Eric Auger <eric.auger@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: Remove block migrationFabiano Rosas2024-05-081-6/+0
| | | | | | | | | | | | | | The block migration has been considered obsolete since QEMU 8.2 in favor of the more flexible storage migration provided by the blockdev-mirror driver. Two releases have passed so now it's time to remove it. Deprecation commit 66db46ca83 ("migration: Deprecate block migration"). Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Add Error** argument to .load_setup() handlerCédric Le Goater2024-04-231-1/+2
| | | | | | | | | | | | This will be useful to report errors at a higher level, mostly in VFIO today. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20240320064911.545001-9-clg@redhat.com [peterx: drop comment for ERRP_GUARD, per Markus] Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Add Error** argument to .save_setup() handlerCédric Le Goater2024-04-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | The purpose is to record a potential error in the migration stream if qemu_savevm_state_setup() fails. Most of the current .save_setup() handlers can be modified to use the Error argument instead of managing their own and calling locally error_report(). Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Harsh Prateek Bora <harshpb@linux.ibm.com> Cc: Halil Pasic <pasic@linux.ibm.com> Cc: Thomas Huth <thuth@redhat.com> Cc: Eric Blake <eblake@redhat.com> Cc: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Cc: John Snow <jsnow@redhat.com> Cc: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/20240320064911.545001-8-clg@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: purge MigrationState from public interfaceSteve Sistare2024-03-111-4/+2
| | | | | | | | | Move remaining MigrationState references from the public file misc.h to the private file migration.h. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/r/1710179338-294359-12-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: delete unused accessorsSteve Sistare2024-03-111-3/+0
| | | | | | Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/r/1710179338-294359-11-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: migration_file_set_errorSteve Sistare2024-03-111-0/+2
| | | | | | | | | Define and export migration_file_set_error to eliminate a dependency on MigrationState. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/r/1710179338-294359-9-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: migration_is_deviceSteve Sistare2024-03-111-0/+1
| | | | | | | | | Define and export migration_is_device to eliminate a dependency on MigrationState. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/r/1710179338-294359-8-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>