summary refs log tree commit diff stats
path: root/hw/net/virtio-net.c (unfollow)
Commit message (Collapse)AuthorFilesLines
2025-10-04virtio: serialize extended features statePaolo Abeni1-31/+57
If the driver uses any of the extended features (i.e. 64 or above), store the extended features range (64-127 bits). At load time, let legacy features initialize the full features range and pass it to the set helper; sub-states loading will have filled-up the extended part as needed. This is one of the few spots that need explicitly to know and set in stone the extended features array size; add a build bug to prevent breaking the migration should such size change again in the future: more serialization plumbing will be needed. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <d5d9d398675bee6c4c7d7308c5d3d5d3c6d17d87.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-10-04virtio: introduce extended features typePaolo Abeni2-3/+130
The virtio specifications allows for up to 128 bits for the device features. Soon we are going to use some of the 'extended' bits features (bit 64 and above) for the virtio net driver. Represent the virtio features bitmask with a fixed size array, and introduce a few helpers to help manipulate them. Most drivers will keep using only 64 bits features space: use union to allow them access the lower part of the extended space without any per driver change. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <6a9bbb5eb33830f20afbcb7e64d300af4126dd98.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-10-04linux-headers: Update to Linux v6.17-rc1Paolo Abeni29-27/+352
Update headers to include the virtio GSO over UDP tunnel features Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <0b1f3c011f90583ab52aa4fef04df6db35cc4a69.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-10-04linux-headers: deal with counted_by annotationPaolo Abeni1-0/+1
Such annotation is present into the kernel uAPI headers since v6.7, and will be used soon by the vhost_type.h. Deal with it just stripping it. Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <a1430f43cc954d2a931fa60581bda6d6af4bc771.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-10-04net: bundle all offloads in a single structPaolo Abeni13-49/+59
The set_offload() argument list is already pretty long and we are going to introduce soon a bunch of additional offloads. Replace the offload arguments with a single struct and update all the relevant call-sites. No functional changes intended. Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Message-ID: <a9d4dd043b8c71b791e9ff05e17ef06072d9714e.1758549625.git.pabeni@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2025-10-03migration-test: test cpr-execSteve Sistare1-0/+133
Add a test for the cpr-exec migration mode. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/r/1759332851-370353-20-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03vfio: cpr-exec modeSteve Sistare4-12/+16
All blockers and notifiers for cpr-transfer mode also apply to cpr-exec. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: Cédric Le Goater <clg@redhat.com> Link: https://lore.kernel.org/r/30750362-d4a1-4392-8dd6-016624d01be1@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: cpr-exec docsSteve Sistare1-1/+111
Update developer documentation for cpr-exec mode. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/1759332851-370353-8-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: cpr-exec modeSteve Sistare9-5/+164
Add the cpr-exec migration mode. Usage: qemu-system-$arch -machine aux-ram-share=on ... migrate_set_parameter mode cpr-exec migrate_set_parameter cpr-exec-command \ <arg1> <arg2> ... -incoming <uri-1> \ migrate -d <uri-1> The migrate command stops the VM, saves state to uri-1, directly exec's a new version of QEMU on the same host, replacing the original process while retaining its PID, and loads state from uri-1. Guest RAM is preserved in place, albeit with new virtual addresses. The new QEMU process is started by exec'ing the command specified by the @cpr-exec-command parameter. The first word of the command is the binary, and the remaining words are its arguments. The command may be a direct invocation of new QEMU, or may be a non-QEMU command that exec's the new QEMU binary. This mode creates a second migration channel that is not visible to the user. At the start of migration, old QEMU saves CPR state to the second channel, and at the end of migration, it tells the main loop to call cpr_exec. New QEMU loads CPR state early, before objects are created. Because old QEMU terminates when new QEMU starts, one cannot stream data between the two, so uri-1 must be a type, such as a file, that accepts all data before old QEMU exits. Otherwise, old QEMU may quietly block writing to the channel. Memory-backend objects must have the share=on attribute, but memory-backend-epc is not supported. The VM must be started with the '-machine aux-ram-share=on' option, which allows anonymous memory to be transferred in place to the new process. The memfds are kept open across exec by clearing the close-on-exec flag, their values are saved in CPR state, and they are mmap'd in new QEMU. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: Markus Armbruster <armbru@redhat.com> Link: https://lore.kernel.org/r/1759332851-370353-7-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: cpr-exec save and loadSteve Sistare3-0/+105
To preserve CPR state across exec, create a QEMUFile based on a memfd, and keep the memfd open across exec. Save the value of the memfd in an environment variable so post-exec QEMU can find it. These new functions are called in a subsequent patch. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Link: https://lore.kernel.org/r/1759332851-370353-6-git-send-email-steven.sistare@oracle.com [peterx: fix build for Windows] Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: cpr-exec-command parameterSteve Sistare4-4/+63
Create the cpr-exec-command migration parameter, defined as a list of strings. It will be used for cpr-exec migration mode in a subsequent patch, and contains forward references to cpr-exec mode in the qapi doc. No functional change, except that cpr-exec-command is shown by the 'info migrate' command. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Acked-by: Markus Armbruster <armbru@redhat.com> Link: https://lore.kernel.org/r/1759332851-370353-5-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03oslib: qemu_clear_cloexecSteve Sistare3-0/+22
Define qemu_clear_cloexec, analogous to qemu_set_cloexec. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/1759332851-370353-4-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: add cpr_walk_fdSteve Sistare2-0/+16
Add a helper to walk all CPR fd's and run a callback for each. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/1759332851-370353-3-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: multi-mode notifierSteve Sistare2-13/+59
Allow a notifier to be added for multiple migration modes. To allow a notifier to appear on multiple per-node lists, use a generic list type. We can no longer use NotifierWithReturnList, because it shoe horns the notifier onto a single list. Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/1759332851-370353-2-git-send-email-steven.sistare@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: simplify error reporting after channel readDaniel P. Berrangé1-5/+1
The code handling the return value of qio_channel_read proceses len == 0 (EOF) separately from len < 1 (error), but in both cases ends up calling qemu_file_set_error_obj() with -EIO as the errno. This logic can be merged into one codepath to simplify it. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Prasad Pandit <pjp@fedoraproject.org> Link: https://lore.kernel.org/r/20250801170212.54409-2-berrange@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03physmem: Destroy all CPU AddressSpaces on unrealizePeter Maydell6-23/+37
When we unrealize a CPU object (which happens on vCPU hot-unplug), we should destroy all the AddressSpace objects we created via calls to cpu_address_space_init() when the CPU was realized. Commit 24bec42f3d6eae added a function to do this for a specific AddressSpace, but did not add any places where the function was called. Since we always want to destroy all the AddressSpaces on unrealize, regardless of the target architecture, we don't need to try to keep track of how many are still undestroyed, or make the target architecture code manually call a destroy function for each AS it created. Instead we can adjust the function to always completely destroy the whole cpu->ases array, and arrange for it to be called during CPU unrealize as part of the common code. Without this fix, AddressSanitizer will report a leak like this from a run where we hot-plugged and then hot-unplugged an x86 KVM vCPU: Direct leak of 416 byte(s) in 1 object(s) allocated from: #0 0x5b638565053d in calloc (/data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/qemu-system-x86_64+0x1ee153d) (BuildId: c1cd6022b195142106e1bffeca23498c2b752bca) #1 0x7c28083f77b1 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x637b1) (BuildId: 1eb6131419edb83b2178b682829a6913cf682d75) #2 0x5b6386999c7c in cpu_address_space_init /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../system/physmem.c:797:25 #3 0x5b638727f049 in kvm_cpu_realizefn /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../target/i386/kvm/kvm-cpu.c:102:5 #4 0x5b6385745f40 in accel_cpu_common_realize /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../accel/accel-common.c:101:13 #5 0x5b638568fe3c in cpu_exec_realizefn /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../hw/core/cpu-common.c:232:10 #6 0x5b63874a2cd5 in x86_cpu_realizefn /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../target/i386/cpu.c:9321:5 #7 0x5b6387a0469a in device_set_realized /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../hw/core/qdev.c:494:13 #8 0x5b6387a27d9e in property_set_bool /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../qom/object.c:2375:5 #9 0x5b6387a2090b in object_property_set /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../qom/object.c:1450:5 #10 0x5b6387a35b05 in object_property_set_qobject /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../qom/qom-qobject.c:28:10 #11 0x5b6387a21739 in object_property_set_bool /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../qom/object.c:1520:15 #12 0x5b63879fe510 in qdev_realize /data_nvme1n1/linaro/qemu-from-laptop/qemu/build/x86-tgts-asan/../../hw/core/qdev.c:276:12 Cc: qemu-stable@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2517 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20250929144228.1994037-4-peter.maydell@linaro.org Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03memory: New AS helper to serialize destroy+freePeter Xu2-1/+32
If an AddressSpace has been created in its own allocated memory, cleaning it up requires first destroying the AS and then freeing the memory. Doing this doesn't work: address_space_destroy(as); g_free_rcu(as, rcu); because both address_space_destroy() and g_free_rcu() try to use the same 'rcu' node in the AddressSpace struct and the address_space_destroy hook gets overwritten. Provide a new address_space_destroy_free() function which will destroy the AS and then free the memory it uses, all in one RCU callback. (CC to stable because the next commit needs this function.) Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20250929144228.1994037-3-peter.maydell@linaro.org Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03include/system/memory.h: Clarify address_space_destroy() behaviourPeter Maydell1-3/+8
address_space_destroy() doesn't actually immediately destroy the AS; it queues it to be destroyed via RCU. This means you can't g_free() the memory the AS struct is in until that has happened. Clarify this in the documentation. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20250929144228.1994037-2-peter.maydell@linaro.org Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: ensure APIC is loaded prior to VFIO PCI devicesYanfei Xu2-0/+2
The load procedure of VFIO PCI devices involves setting up IRT for each VFIO PCI devices. This requires determining whether an interrupt is single-destination interrupt to decide between Posted Interrupt(PI) or remapping mode for the IRTE. However, determining this may require accessing the VM's APIC registers. For example: ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irqs) ... kvm_arch_irq_bypass_add_producer kvm_x86_call(pi_update_irte) vmx_pi_update_irte kvm_intr_is_single_vcpu If the LAPIC has not been loaded yet, interrupts will use remapping mode. To prevent the fallback of interrupt mode, keep APIC is always loaded prior to VFIO PCI devices. Signed-off-by: Yicong Shen <shenyicong.1023@bytedance.com> Signed-off-by: Yanfei Xu <yanfei.xu@bytedance.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20250818131127.1021648-1-yanfei.xu@bytedance.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: Fix state transition in postcopy_start() error handlingJuraj Marcin1-2/+3
Commit 48814111366b ("migration: Always set DEVICE state") introduced DEVICE state to postcopy, which moved the actual state transition that leads to POSTCOPY_ACTIVE. However, the error handling part of the postcopy_start() function still expects the state POSTCOPY_ACTIVE, but depending on where an error happens, now the state can be either ACTIVE, DEVICE or CANCELLING, but never POSTCOPY_ACTIVE, as this transition now happens just before a successful return from the function. Instead, accept any state except CANCELLING when transitioning to FAILED state. Cc: qemu-stable@nongnu.org Fixes: 48814111366b ("migration: Always set DEVICE state") Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250826115145.871272-1-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration/multifd/tls: Cleanup BYE message processing on sender sidePeter Xu1-31/+34
This patch is a trivial cleanup to the BYE messages on the multifd sender side. It could also be a fix, but since we do not have a solid clue, taking this as a cleanup only. One trivial concern is, migration_tls_channel_end() might be unsafe to be invoked in the migration thread if migration is not successful, because when failed / cancelled we do not know whether the multifd sender threads can be writting to the channels, while GnuTLS library (when it's a TLS channel) logically doesn't support concurrent writes. When at it, cleanup on a few things. What changed: - Introduce a helper to do graceful shutdowns with rich comment, hiding the details - Only send bye() iff migration succeeded, skip if it failed / cancelled - Detect TLS channel using channel type rather than thread created flags - Move the loop into the existing one that will close the channels, but do graceful shutdowns before channel shutdowns - local_err seems to have been leaked if set, fix it along the way Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250925201601.290546-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: HMP: Adjust the order of output fieldsBin Guo1-6/+8
Adjust the positions of 'tls-authz' and 'max-postcopy-bandwidth' in the fields output by the 'info migrate_parameters' command so that related fields are next to each other. For clarity only, no functional changes. Sample output after this commit: (qemu) info migrate_parameters ... max-cpu-throttle: 99 tls-creds: '' tls-hostname: '' tls-authz: '' max-bandwidth: 134217728 bytes/second avail-switchover-bandwidth: 0 bytes/second max-postcopy-bandwidth: 0 bytes/second downtime-limit: 300 ms ... Cc: Dr. David Alan Gilbert <dave@treblig.org> Signed-off-by: Bin Guo <guobin@linux.alibaba.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20250929021213.28369-1-guobin@linux.alibaba.com [peterx: move postcopy-bw before avail-switchover-bw] Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: Make migration_has_failed() work even for CANCELLINGPeter Xu1-1/+2
No issue I hit, the change is only from code observation when I am looking at a TLS premature termination issue. We set CANCELLED very late, it means migration_has_failed() may not work correctly if it's invoked before updating CANCELLING to CANCELLED. Allow that state will make migration_has_failed() working as expected even if it's invoked slightly earlier. One current user is the multifd code for the TLS graceful termination, where it's before updating to CANCELLED. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250918203937.200833-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03io/crypto: Move tls premature termination handling into QIO layerPeter Xu3-14/+24
QCryptoTLSSession allows TLS premature termination in two cases, one of the case is when the channel shutdown() is invoked on READ side. It's possible the shutdown() happened after the read thread blocked at gnutls_record_recv(). In this case, we should allow the premature termination to happen. The problem is by the time qcrypto_tls_session_read() was invoked, tioc->shutdown may not have been set, so this may instead be treated as an error if there is concurrent shutdown() calls. To allow the flag to reflect the latest status of tioc->shutdown, move the check upper into the QIOChannel level, so as to read the flag only after QEMU gets an GNUTLS_E_PREMATURE_TERMINATION. When at it, introduce qio_channel_tls_allow_premature_termination() helper to make the condition checks easier to read. When doing so, change the qatomic_load_acquire() to qatomic_read(): here we don't need any ordering of memory accesses, but reading a flag. qatomic_read() would suffice because it guarantees fetching from memory. Nothing else we should need to order on memory access. This patch will fix a qemu qtest warning when running the preempt tls test, reporting premature termination: QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test --full -r /x86_64/migration/postcopy/preempt/tls/psk ... qemu-kvm: Cannot read from TLS channel: The TLS connection was non-properly terminated. ... In this specific case, the error was set by postcopy_preempt_thread, which normally will be concurrently shutdown()ed by the main thread. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250918203937.200833-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03backends/tpm: Propagate vTPM error on migration failureArun Menon1-19/+21
- When migration of a VM with encrypted vTPM fails on the destination host, (e.g., due to a mismatch in secret values), the error message displayed on the source host is generic and unhelpful. - For example, a typical error looks like this: "operation failed: job 'migration out' failed: Sibling indicated error 1. operation failed: job 'migration in' failed: load of migration failed: Input/output error" - Such generic errors are logged using error_report(), which prints to the console/monitor but does not make the detailed error accessible via the QMP query-migrate command. - This change, along with the set of changes of passing errp Error object to the VM state loading functions, help in addressing the issue. We use the post_load_errp hook of VMStateDescription to propagate errors by setting Error **errp objects in case of failure in the TPM backend. - It can then be retrieved using QMP command: {"execute" : "query-migrate"} Buglink: https://issues.redhat.com/browse/RHEL-82826 Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-27-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: Add error-parameterized function variants in VMSD structArun Menon3-3/+61
- We need to have good error reporting in the callbacks in VMStateDescription struct. Specifically pre_save, pre_load and post_load callbacks. - It is not possible to change these functions everywhere in one patch, therefore, we introduce a duplicate set of callbacks with Error object passed to them. - So, in this commit, we implement 'errp' variants of these callbacks, introducing an explicit Error object parameter. - This is a functional step towards transitioning the entire codebase to the new error-parameterized functions. - Deliberately called in mutual exclusion from their counterparts, to prevent conflicts during the transition. - New impls should preferentally use 'errp' variants of these methods, and existing impls incrementally converted. The variants without 'errp' are intended to be removed once all usage is converted. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-26-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: Remove error variant of vmstate_save_state() functionArun Menon15-36/+61
This commit removes the redundant vmstate_save_state_with_err() function. Previously, commit 969298f9d7 introduced vmstate_save_state_with_err() to handle error propagation, while vmstate_save_state() existed for non-error scenarios. This is because there were code paths where vmstate_save_state_v() (called internally by vmstate_save_state) did not explicitly set errors on failure. This change unifies error handling by - updating vmstate_save_state() to accept an Error **errp argument. - vmstate_save_state_v() ensures errors are set directly within the errp object, eliminating the need for two separate functions. All calls to vmstate_save_state_with_err() are replaced with vmstate_save_state(). This simplifies the API and improves code maintainability. vmstate_save_state() that only calls vmstate_save_state_v(), by inference, also has errors set in errp in case of failure. The errors are reported using error_report_err(). If we want the function to exit on error, then &error_fatal is passed. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-24-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: Capture error in postcopy_ram_listen_thread()Arun Menon1-2/+6
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. postcopy_ram_listen_thread() calls qemu_loadvm_state_main() to load the vm, and in case of a failure, it should set the error in the migration object. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-23-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into loadvm_postcopy_handle_switchover_start()Arun Menon1-6/+3
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_handle_switchover_start() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-22-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into loadvm_process_enable_colo()Arun Menon5-24/+26
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_process_enable_colo() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-21-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: Return -1 on memory allocation failure in ram.cArun Menon1-1/+3
The function colo_init_ram_cache() currently returns -errno if qemu_anon_ram_alloc() fails. However, the subsequent cleanup loop that calls qemu_anon_ram_free() could potentially alter the value of errno. This would cause the function to return a value that does not accurately represent the original allocation failure. This commit changes the return value to -1 on memory allocation failure. This ensures that the return value is consistent and is not affected by any errno changes that may occur during the free process. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-20-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into loadvm_handle_recv_bitmap()Arun Menon1-11/+10
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_handle_recv_bitmap() must report an error in errp, in case of failure. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-19-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into loadvm_postcopy_ram_handle_discard()Arun Menon1-13/+13
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_ram_handle_discard() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-18-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into loadvm_postcopy_handle_run()Arun Menon1-7/+3
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_handle_run() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-17-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into loadvm_postcopy_handle_listen()Arun Menon1-10/+7
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_handle_listen() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-16-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into loadvm_postcopy_handle_advise()Arun Menon1-21/+19
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_postcopy_handle_advise() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-15-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into ram_postcopy_incoming_init()Arun Menon5-8/+11
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that ram_postcopy_incoming_init() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-14-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: make loadvm_postcopy_handle_resume() voidArun Menon1-6/+5
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. Use warn_report() instead of error_report(); it ensures that a resume command received while the migration is not in postcopy recover state is not fatal. It only informs that the command received is unusual, and therefore we should not set errp with the error string. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-13-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: Update qemu_file_get_return_path() docs and remove dead checksArun Menon4-19/+2
The documentation of qemu_file_get_return_path() states that it can return NULL on failure. However, a review of the current implementation reveals that it is guaranteed that it will always succeed and will never return NULL. As a result, the NULL checks post calling the function become redundant. This commit updates the documentation for the function and removes all NULL checks throughout the migration code. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-12-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into qemu_loadvm_section_part_end()Arun Menon1-11/+7
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_section_part_end() must report an error in errp, in case of failure. This patch also removes the setting of errp when errp is NULL in the out section as it is no longer required in the series. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-11-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into qemu_loadvm_section_start_full()Arun Menon1-18/+19
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_section_start_full() must report an error in errp, in case of failure. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-10-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into qemu_loadvm_state_main()Arun Menon3-22/+20
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_state_main() must report an error in errp, in case of failure. Set errp explicitly if it is NULL in case of failure in the out section. This will be removed in the subsequent patch when all of the calls are converted to passing errp. The error message in the default case of qemu_loadvm_state_main() has the word "savevm". This is removed because it can confuse the user while reading destination side error logs. Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-9-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into qemu_load_device_state()Arun Menon3-5/+4
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_load_device_state() must report an error in errp, in case of failure. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-8-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into qemu_loadvm_state()Arun Menon3-15/+31
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_state() must report an error in errp, in case of failure. When postcopy live migration runs, the device states are loaded by both the qemu coroutine process_incoming_migration_co() and the postcopy_ram_listen_thread(). Therefore, it is important that the coroutine also reports the error in case of failure, with error_report_err(). Otherwise, the source qemu will not display any errors before going into the postcopy pause state. Suggested-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-7-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into loadvm_handle_cmd_packaged()Arun Menon1-9/+8
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_handle_cmd_packaged() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-6-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into loadvm_process_command()Arun Menon1-23/+63
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that loadvm_process_command() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series when we are actually able to propagate the error to the calling function. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-5-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into vmstate_load()Arun Menon1-7/+15
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that vmstate_load() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series when we are actually able to propagate the error to the calling function. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-4-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into qemu_loadvm_state_header()Arun Menon1-11/+17
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that qemu_loadvm_state_header() must report an error in errp, in case of failure. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-3-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into vmstate_load_state()Arun Menon15-55/+143
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that vmstate_load_state() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series, when we are actually able to propagate the error to the calling function using errp. Whereas, if we want the function to exit on error, then error_fatal is passed. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-2-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
2025-10-03migration: push Error **errp into vmstate_subsection_load()Arun Menon1-3/+10
This is an incremental step in converting vmstate loading code to report error via Error objects instead of directly printing it to console/monitor. It is ensured that vmstate_subsection_load() must report an error in errp, in case of failure. The errors are temporarily reported using error_report_err(). This is removed in the subsequent patches in this series, when we are actually able to propagate the error to the calling function using errp. Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Arun Menon <armenon@redhat.com> Tested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp> Link: https://lore.kernel.org/r/20250918-propagate_tpm_error-v14-1-36f11a6fb9d3@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>