summary refs log tree commit diff stats
path: root/migration/savevm.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* migration: Remove error variant of vmstate_save_state() functionArun Menon2025-10-031-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit removes the redundant vmstate_save_state_with_err() function. Previously, commit 969298f9d7 introduced vmstate_save_state_with_err() to handle error propagation, while vmstate_save_state() existed for non-error scenarios. This is because there were code paths where vmstate_save_state_v() (called internally by vmstate_save_state) did not explicitly set errors on failure. This change unifies error handling by - updating vmstate_save_state() to accept an Error **errp argument. - vmstate_save_state_v() ensures errors are set directly within the errp object, eliminating the need for two separate functions. All calls to vmstate_save_state_with_err() are replaced with vmstate_save_state(). This simplifies the API and improves code maintainability. vmstate_save_state() that only calls vmstate_save_state_v(), by inference, also has errors set in errp in case of failure. The errors are reported using error_report_err(). If we want the function to exit on error, then &error_fatal is passed. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-24-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Capture error in postcopy_ram_listen_thread()Arun Menon2025-10-031-2/+6
| | | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. postcopy_ram_listen_thread() calls qemu_loadvm_state_main() to load the vm, and in case of a failure, it should set the error in the migration object. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-23-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into loadvm_postcopy_handle_switchover_start()Arun Menon2025-10-031-6/+3
| | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_handle_switchover_start() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-22-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into loadvm_process_enable_colo()Arun Menon2025-10-031-12/+14
| | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_process_enable_colo() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-21-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into loadvm_handle_recv_bitmap()Arun Menon2025-10-031-11/+10
| | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_handle_recv_bitmap() must report an error in errp, in case of failure. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-19-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into loadvm_postcopy_ram_handle_discard()Arun Menon2025-10-031-13/+13
| | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_ram_handle_discard() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-18-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into loadvm_postcopy_handle_run()Arun Menon2025-10-031-7/+3
| | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_handle_run() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-17-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into loadvm_postcopy_handle_listen()Arun Menon2025-10-031-10/+7
| | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_handle_listen() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-16-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into loadvm_postcopy_handle_advise()Arun Menon2025-10-031-21/+19
| | | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_handle_advise() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-15-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into ram_postcopy_incoming_init()Arun Menon2025-10-031-1/+1
| | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that ram_postcopy_incoming_init() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-14-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: make loadvm_postcopy_handle_resume() voidArun Menon2025-10-031-6/+5
| | | | | | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. Use warn_report() instead of error_report(); it ensures that a resume command received while the migration is not in postcopy recover state is not fatal. It only informs that the command received is unusual, and therefore we should not set errp with the error string. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-13-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Update qemu_file_get_return_path() docs and remove dead checksArun Menon2025-10-031-4/+0
| | | | | | | | | | | | | | | | | | | | The documentation of qemu_file_get_return_path() states that it can return NULL on failure. However, a review of the current implementation reveals that it is guaranteed that it will always succeed and will never return NULL. As a result, the NULL checks post calling the function become redundant. This commit updates the documentation for the function and removes all NULL checks throughout the migration code. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-12-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into qemu_loadvm_section_part_end()Arun Menon2025-10-031-11/+7
| | | | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_section_part_end() must report an error in errp, in case of failure. This patch also removes the setting of errp when errp is NULL in the out section as it is no longer required in the series. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-11-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into qemu_loadvm_section_start_full()Arun Menon2025-10-031-18/+19
| | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_section_start_full() must report an error in errp, in case of failure. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-10-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into qemu_loadvm_state_main()Arun Menon2025-10-031-19/+17
| | | | | | | | | | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_state_main() must report an error in errp, in case of failure. Set errp explicitly if it is NULL in case of failure in the out section. This will be removed in the subsequent patch when all of the calls are converted to passing errp. The error message in the default case of qemu_loadvm_state_main() has the word "savevm". This is removed because it can confuse the user while reading destination side error logs. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-9-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into qemu_load_device_state()Arun Menon2025-10-031-2/+2
| | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_load_device_state() must report an error in errp, in case of failure. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-8-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into qemu_loadvm_state()Arun Menon2025-10-031-12/+18
| | | | | | | | | | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_state() must report an error in errp, in case of failure. When postcopy live migration runs, the device states are loaded by both the qemu coroutine process_incoming_migration_co() and the postcopy_ram_listen_thread(). Therefore, it is important that the coroutine also reports the error in case of failure, with error_report_err(). Otherwise, the source qemu will not display any errors before going into the postcopy pause state. Suggested-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-7-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into loadvm_handle_cmd_packaged()Arun Menon2025-10-031-9/+8
| | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_handle_cmd_packaged() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-6-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into loadvm_process_command()Arun Menon2025-10-031-23/+63
| | | | | | | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_process_command() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series when we are actually able to propagate the error to the calling function. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-5-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into vmstate_load()Arun Menon2025-10-031-7/+15
| | | | | | | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that vmstate_load() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series when we are actually able to propagate the error to the calling function. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-4-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into qemu_loadvm_state_header()Arun Menon2025-10-031-11/+17
| | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_state_header() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-3-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: push Error **errp into vmstate_load_state()Arun Menon2025-10-031-2/+6
| | | | | | | | | | | | | | | | | | | | | | This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that vmstate_load_state() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series, when we are actually able to propagate the error to the calling function using errp. Whereas, if we want the function to exit on error, then error_fatal is passed. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-2-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: qemu_file_set_blocking(): add errp parameterVladimir Sementsov-Ogievskiy2025-09-191-2/+2
| | | | | | | | | | | | | | | | | | | | qemu_file_set_blocking() is a wrapper on qio_channel_set_blocking(), so let's passthrough the errp. Note the migration should not be using &error_abort in these calls, however, this is done to expedite the API conversion. The original code would have eventually ended up calling either qemu_socket_set_nonblock which would asset on Linux, or g_unix_set_fd_nonblocking which would propagate errors. We never saw asserts in practice, and conceptually they should not happen, but ideally this code will be later adapted to remove use of &error_abort. Acked-by: Peter Xu <peterx@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
* migration: Rename save_live_complete_precopy_thread to ↵Juraj Marcin2025-07-111-3/+3
| | | | | | | | | | | | | | | | | | | | | | | save_complete_precopy_thread Recent patch [1] renames the save_live_complete_precopy handler to save_complete, as the machine is not live in most cases when this handler is executed. The same is true also for save_live_complete_precopy_thread, therefore this patch removes the "live" keyword from the handler itself and related types to keep the naming unified. In contrast to save_complete, this handler is only executed at the end of precopy, therefore the "precopy" keyword is retained. [1]: https://lore.kernel.org/all/20250613140801.474264-7-peterx@redhat.com/ Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cédric Le Goater <clg@redhat.com> Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250626085235.294690-1-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: qemu_savevm_complete*() helpersPeter Xu2025-07-111-34/+44
| | | | | | | | | | | Since we use the same save_complete() hook for both precopy and postcopy, add a set of helpers to invoke the hook() to dedup the code. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-8-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Rename save_live_complete_precopy to save_completePeter Xu2025-07-111-4/+4
| | | | | | | | | | | | | | | | | | Now after merging the precopy and postcopy version of complete() hook, rename the precopy version from save_live_complete_precopy() to save_complete(). Dropping the "live" when at it, because it's in most cases not live when happening (in precopy). No functional change intended. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-7-peterx@redhat.com [peterx: squash the fixup that covers a few more doc spots, per Juraj] Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Drop save_live_complete_postcopy hookPeter Xu2025-07-111-5/+4
| | | | | | | | | | | | | | The hook is only defined in two vmstate users ("ram" and "block dirty bitmap"), meanwhile both of them define the hook exactly the same as the precopy version. Hence, this postcopy version isn't needed. No functional change intended. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: lower handler prioritySteve Sistare2025-06-111-2/+2
| | | | | | | | | | | | | | | Define a vmstate priority that is lower than the default, so its handlers run after all default priority handlers. Since 0 is no longer the default priority, translate an uninitialized priority of 0 to MIG_PRI_DEFAULT. CPR for vfio will use this to install handlers for containers that run after handlers for the devices that they contain. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/qemu-devel/1749569991-25171-3-git-send-email-steven.sistare@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration/postcopy: Replace QemuSemaphore with QemuEventAkihiko Odaki2025-06-061-1/+1
| | | | | | | | | | thread_sync_sem is an one-shot event so it can be converted into QemuEvent, which is more lightweight. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250529-event-v5-9-53b285203794@daynix.com Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* migration: Add save_postcopy_prepare() savevm handlerPeter Xu2025-05-021-0/+33
| | | | | | | | | | | | | | | | | | | | | Add a savevm handler for a module to opt-in sending extra sections right before postcopy starts, and before VM is stopped. RAM will start to use this new savevm handler in the next patch to do flush and sync for multifd pages. Note that we choose to do it before VM stopped because the current only potential user is not sensitive to VM status, so doing it before VM is stopped is preferred to enlarge any postcopy downtime. It is still a bit unfortunate that we need to introduce such a new savevm handler just for the only use case, however it's so far the cleanest. Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-ID: <20250411114534.3370816-4-ppandit@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* page-vary: Move and rename qemu_target_page_bits_minRichard Henderson2025-04-231-3/+3
| | | | | | | | | | Rename to migration_legacy_page_bits, to make it clear that we cannot change the value without causing a migration break. Move to page-vary.h and page-vary-target.c. Define via TARGET_PAGE_BITS if not TARGET_PAGE_BITS_VARY. Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* include/system: Move exec/memory.h to system/memory.hRichard Henderson2025-04-231-1/+1
| | | | | | | | | | | | Convert the existing includes with sed -i ,exec/memory.h,system/memory.h,g Move the include within cpu-all.h into a !CONFIG_USER_ONLY block. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* migration: ram block cpr blockersSteve Sistare2025-03-101-0/+2
| | | | | | | | | | | | | | | | | Unlike cpr-reboot mode, cpr-transfer mode cannot save volatile ram blocks in the migration stream file and recreate them later, because the physical memory for the blocks is pinned and registered for vfio. Add a blocker for volatile ram blocks. Also add a blocker for RAM_GUEST_MEMFD. Preserving guest_memfd may be sufficient for CPR, but it has not been tested yet. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: David Hildenbrand <david@redhat.com> Message-ID: <1740667681-257312-1-git-send-email-steven.sistare@oracle.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Add save_live_complete_precopy_thread handlerMaciej S. Szmigiero2025-03-061-1/+39
| | | | | | | | | | | | | | | | | This SaveVMHandler helps device provide its own asynchronous transmission of the remaining data at the end of a precopy phase via multifd channels, in parallel with the transfer done by save_live_complete_precopy handlers. These threads are launched only when multifd device state transfer is supported. Management of these threads in done in the multifd migration code, wrapping them in the generic thread pool. Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/qemu-devel/eac74a4ca7edd8968bbf72aa07b9041c76364a16.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: Add thread pool of optional load threadsMaciej S. Szmigiero2025-03-061-2/+93
| | | | | | | | | | | | | | | | Some drivers might want to make use of auxiliary helper threads during VM state loading, for example to make sure that their blocking (sync) I/O operations don't block the rest of the migration process. Add a migration core managed thread pool to facilitate this use case. The migration core will wait for these threads to finish before (re)starting the VM at destination. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/b09fd70369b6159c75847e69f235cb908b02570c.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: Always take BQL for migration_incoming_state_destroy()Maciej S. Szmigiero2025-03-061-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All callers to migration_incoming_state_destroy() other than postcopy_ram_listen_thread() do this call with BQL held. Since migration_incoming_state_destroy() ultimately calls "load_cleanup" SaveVMHandlers and it will soon call BQL-sensitive code it makes sense to always call that function under BQL rather than to have it deal with both cases (with BQL and without BQL). Add the necessary bql_lock() and bql_unlock() to postcopy_ram_listen_thread(). qemu_loadvm_state_main() in postcopy_ram_listen_thread() could call "load_state" SaveVMHandlers that are expecting BQL to be held. In principle, the only devices that should be arriving on migration channel serviced by postcopy_ram_listen_thread() are those that are postcopiable and whose load handlers are safe to be called without BQL being held. But nothing currently prevents the source from sending data for "unsafe" devices which would cause trouble there. Add a TODO comment there so it's clear that it would be good to improve handling of such (erroneous) case in the future. Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/21bb5ca337b1d5a802e697f553f37faf296b5ff4.1741193259.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: Add qemu_loadvm_load_state_buffer() and its handlerMaciej S. Szmigiero2025-03-061-0/+23
| | | | | | | | | | | | qemu_loadvm_load_state_buffer() and its load_state_buffer SaveVMHandler allow providing device state buffer to explicitly specified device via its idstr and instance id. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/71ca753286b87831ced4afd422e2e2bed071af25.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: Add MIG_CMD_SWITCHOVER_START and its load handlerMaciej S. Szmigiero2025-03-061-0/+39
| | | | | | | | | | | | | | | | | | | This QEMU_VM_COMMAND sub-command and its switchover_start SaveVMHandler is used to mark the switchover point in main migration stream. It can be used to inform the destination that all pre-switchover main migration stream data has been sent/received so it can start to process post-switchover data that it might have received via other migration channels like the multifd ones. Add also the relevant MigrationState bit stream compatibility property and its hw_compat entry. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Zhang Chen <zhangckid@gmail.com> # for the COLO part Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/311be6da85fc7e49a7598684d80aa631778dcbce.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: Check migration error after loadvmFabiano Rosas2025-02-141-1/+5
| | | | | | | | | | | | | | | | | | We're currently only checking the QEMUFile error after qemu_loadvm_state(). This was causing a TLS termination error from multifd recv threads to be ignored. Start checking the migration error as well to avoid missing further errors. Regarding compatibility concerning the TLS termination error that was being ignored, for QEMUs <= 9.2 - if the old QEMU is being used as migration source - the recently added migration property multifd-tls-clean-termination needs to be set to OFF in the *destination* machine. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Cleanup qemu_savevm_state_complete_precopy()Peter Xu2025-01-291-13/+7
| | | | | | | | | | | Now qemu_savevm_state_complete_precopy() is never used in postcopy, clean it up as in_postcopy==false now unconditionally. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-14-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Unwrap qemu_savevm_state_complete_precopy() in postcopyPeter Xu2025-01-291-1/+0
| | | | | | | | | | | | | | | | | | | | Postcopy invokes qemu_savevm_state_complete_precopy() twice for a long time, and that caused way too much confusions. Let's clean this up and make postcopy easier to read. It's actually fairly straightforward: postcopy starts with saving non-postcopiable iterables, then later it saves again with non-iterable only. Move these two calls out makes everything much easier to follow. Otherwise it's very unclear what qemu_savevm_state_complete_precopy() did in either of the calls. No functional change intended. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-13-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Notify COMPLETE once for postcopyPeter Xu2025-01-291-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | Postcopy invokes qemu_savevm_state_complete_precopy() twice, that means it'll invoke COMPLETE notify twice.. also twice the tracepoints that marking precopy complete. Move that notification (along with the tracepoint) out to the caller, so that postcopy will only notify once right at the start of switchover phase from precopy. When at it, rename it to suite the file now it locates. For precopy, there should have no functional change except the tracepoint has a name change. For the other two users of qemu_savevm_state_complete_precopy(), namely: qemu_savevm_state() and qemu_savevm_live_state(): the notifier shouldn't matter because they're not precopy at all. Now in these two contexts (aka, "savevm", and "colo") sometimes the precopy notifiers will still be invoked, but that's outside the scope of this patch. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-12-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Synchronize all CPU states only for non-iterable dumpPeter Xu2025-01-291-2/+3
| | | | | | | | | | | | | | | | | | | Do one shot cpu sync at qemu_savevm_state_complete_precopy_non_iterable(), instead of coding it separately in two places. Note that in the context of qemu_savevm_state_complete_precopy(), this patch is also an optimization for postcopy path, in that we can avoid sync cpu twice during switchover: before this patch, postcopy_start() invokes twice on qemu_savevm_state_complete_precopy(), each of them will try to sync CPU info. In reality, only one of them would be enough. For background snapshot, there's no intended functional change. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-7-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Drop inactivate_disk param in qemu_savevm_state_complete*Peter Xu2025-01-291-22/+5
| | | | | | | | | | | | | | | | | | | | | | This parameter is only used by one caller, which is the genuine precopy complete path (migration_completion_precopy). The parameter was introduced in a1fbe750fd ("migration: Fix race of image locking between src and dst") to make sure the inactivate will happen before EOF to make sure dest will always be able to activate the disk properly. However there's no limitation on how early we inactivate the disk. For precopy completion path, we can always do that as long as VM is stopped. Move the disk inactivate there, then we can remove this inactivate_disk parameter in the whole call stack, because all the rest users pass in false always. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-6-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Do not construct JSON description if suppressedPeter Xu2025-01-291-23/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | QEMU machine has a property "suppress-vmdesc". When it is enabled, QEMU will stop attaching JSON VM description at the end of the precopy migration stream (postcopy is never affected because postcopy never attach that). However even if it's suppressed by the user, the source QEMU will still construct the JSON descriptions, which is a complete waste of CPU and memory resources. To avoid it, only create the JSON writer object if suppress-vmdesc is not specified. Luckily, vmstate_save() already supports vmdesc==NULL, so only a few spots that are left to be prepared that vmdesc can be NULL now. When at it, move the init / destroy of the JSON writer object to start / end of the migration - the JSON writer object is a sub-struct of migration state, and that looks like the only object that was dynamically allocated / destroyed within migration process. Make it the same as the rest objects that migration uses. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-3-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Remove postcopy implications in should_send_vmdesc()Peter Xu2025-01-291-10/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | should_send_vmdesc() has a hack inside (which was not reflected in the function name) in that it tries to detect global postcopy state and that will affect the value to be returned. It's easier to keep the helper simple by only check the suppress-vmdesc property. Then: - On the sender side of its usage, there's already in_postcopy variable that we can use: postcopy doesn't send vmdesc at all, so directly skip everything for postcopy. - On the recv side, when reaching vmdesc processing it must be precopy code already, hence that hack check never used to work anyway. No functional change intended, except a trivial side effect that QEMU source will start to avoid running some JSON helper in postcopy path, but that would only reduce the postcopy blackout window a bit, rather than any other bad side effect. Signed-off-by: Peter Xu <peterx@redhat.com> Tested-by: Jiri Denemark <jdenemar@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250114230746.3268797-2-peterx@redhat.com Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: fix -Werror=maybe-uninitializedMarc-André Lureau2025-01-291-1/+1
| | | | | | | | | | | | | | | ../migration/savevm.c: In function ‘qemu_savevm_state_complete_precopy_non_iterable’: ../migration/savevm.c:1560:20: error: ‘ret’ may be used uninitialized [-Werror=maybe-uninitialized] 1560 | return ret; | ^~~ Cc: Peter Xu <peterx@redhat.com> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20250114104811.2612846-1-marcandre.lureau@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/block: Rewrite disk activationPeter Xu2025-01-091-20/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch proposes a flag to maintain disk activation status globally. It mostly rewrites disk activation mgmt for QEMU, including COLO and QMP command xen_save_devices_state. Backgrounds =========== We have two problems on disk activations, one resolved, one not. Problem 1: disk activation recover (for switchover interruptions) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ When migration is either cancelled or failed during switchover, especially when after the disks are inactivated, QEMU needs to remember re-activate the disks again before vm starts. It used to be done separately in two paths: one in qmp_migrate_cancel(), the other one in the failure path of migration_completion(). It used to be fixed in different commits, all over the places in QEMU. So these are the relevant changes I saw, I'm not sure if it's complete list: - In 2016, commit fe904ea824 ("migration: regain control of images when migration fails to complete") - In 2017, commit 1d2acc3162 ("migration: re-active images while migration been canceled after inactive them") - In 2023, commit 6dab4c93ec ("migration: Attempt disk reactivation in more failure scenarios") Now since we have a slightly better picture maybe we can unify the reactivation in a single path. One side benefit of doing so is, we can move the disk operation outside QMP command "migrate_cancel". It's possible that in the future we may want to make "migrate_cancel" be OOB-compatible, while that requires the command doesn't need BQL in the first place. This will already do that and make migrate_cancel command lightweight. Problem 2: disk invalidation on top of invalidated disks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is an unresolved bug for current QEMU. Link in "Resolves:" at the end. It turns out besides the src switchover phase (problem 1 above), QEMU also needs to remember block activation on destination. Consider two continuous migration in a row, where the VM was always paused. In that scenario, the disks are not activated even until migration completed in the 1st round. When the 2nd round starts, if QEMU doesn't know the status of the disks, it needs to try inactivate the disk again. Here the issue is the block layer API bdrv_inactivate_all() will crash a QEMU if invoked on already inactive disks for the 2nd migration. For detail, see the bug link at the end. Implementation ============== This patch proposes to maintain disk activation with a global flag, so we know: - If we used to inactivate disks for migration, but migration got cancelled, or failed, QEMU will know it should reactivate the disks. - On incoming side, if the disks are never activated but then another migration is triggered, QEMU should be able to tell that inactivate is not needed for the 2nd migration. We used to have disk_inactive, but it only solves the 1st issue, not the 2nd. Also, it's done in completely separate paths so it's extremely hard to follow either how the flag changes, or the duration that the flag is valid, and when we will reactivate the disks. Convert the existing disk_inactive flag into that global flag (also invert its naming), and maintain the disk activation status for the whole lifecycle of qemu. That includes the incoming QEMU. Put both of the error cases of source migration (failure, cancelled) together into migration_iteration_finish(), which will be invoked for either of the scenario. So from that part QEMU should behave the same as before. However with such global maintenance on disk activation status, we not only cleanup quite a few temporary paths that we try to maintain the disk activation status (e.g. in postcopy code), meanwhile it fixes the crash for problem 2 in one shot. For freshly started QEMU, the flag is initialized to TRUE showing that the QEMU owns the disks by default. For incoming migrated QEMU, the flag will be initialized to FALSE once and for all showing that the dest QEMU doesn't own the disks until switchover. That is guaranteed by the "once" variable. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2395 Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241206230838.1111496-7-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/block: Fix possible race with block_inactivePeter Xu2025-01-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | Src QEMU sets block_inactive=true very early before the invalidation takes place. It means if something wrong happened during setting the flag but before reaching qemu_savevm_state_complete_precopy_non_iterable() where it did the invalidation work, it'll make block_inactive flag inconsistent. For example, think about when qemu_savevm_state_complete_precopy_iterable() can fail: it will have block_inactive set to true even if all block drives are active. Fix that by only update the flag after the invalidation is done. No Fixes for any commit, because it's not an issue if bdrv_activate_all() is re-entrant upon all-active disks - false positive block_inactive can bring nothing more than "trying to active the blocks but they're already active". However let's still do it right to avoid the inconsistent flag v.s. reality. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241206230838.1111496-6-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/block: Apply late-block-active behavior to postcopyPeter Xu2025-01-091-13/+12
| | | | | | | | | | | | | | Postcopy never cared about late-block-active. However there's no mention in the capability that it doesn't apply to postcopy. Considering that we _assumed_ late activation is always good, do that too for postcopy unconditionally, just like precopy. After this patch, we should have unified the behavior across all. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241206230838.1111496-5-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>