summary refs log tree commit diff stats
path: root/migration/multifd.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* migration/multifd/tls: Cleanup BYE message processing on sender sidePeter Xu2025-10-031-31/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is a trivial cleanup to the BYE messages on the multifd sender side. It could also be a fix, but since we do not have a solid clue, taking this as a cleanup only. One trivial concern is, migration_tls_channel_end() might be unsafe to be invoked in the migration thread if migration is not successful, because when failed / cancelled we do not know whether the multifd sender threads can be writting to the channels, while GnuTLS library (when it's a TLS channel) logically doesn't support concurrent writes. When at it, cleanup on a few things. What changed: - Introduce a helper to do graceful shutdowns with rich comment, hiding the details - Only send bye() iff migration succeeded, skip if it failed / cancelled - Detect TLS channel using channel type rather than thread created flags - Move the loop into the existing one that will close the channels, but do graceful shutdowns before channel shutdowns - local_err seems to have been leaked if set, fix it along the way Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250925201601.290546-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Don't send device state packets with zerocopy flagMaciej S. Szmigiero2025-05-201-1/+6
| | | | | | | | | | | | | | | | | | If zerocopy is enabled for multifd then QIO_CHANNEL_WRITE_FLAG_ZERO_COPY flag is forced into all multifd channel write calls via p->write_flags that was setup in multifd_nocomp_send_setup(). However, device state packets aren't compatible with zerocopy - the data buffer isn't getting kept pinned until multifd channel flush. Make sure to mask that QIO_CHANNEL_WRITE_FLAG_ZERO_COPY flag in a multifd send thread if the data being sent is device state. Fixes: 0525b91a0b99 ("migration/multifd: Device state transfer support - send side") Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/3bd5f48578e29f3a78f41b1e4fbea3d4b2d9b136.1747403393.git.maciej.szmigiero@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: enable multifd and postcopy togetherPrasad Pandit2025-05-201-0/+7
| | | | | | | | | | | | | | | | | | Enable Multifd and Postcopy migration together. The migration_ioc_process_incoming() routine checks magic value sent on each channel and helps to properly setup multifd and postcopy channels. The Precopy and Multifd threads work during the initial guest RAM transfer. When migration moves to the Postcopy phase, the multifd threads cease to send data on multifd channels and Postcopy threads on the destination request/pull data from the source side. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Link: https://lore.kernel.org/r/20250512125124.147064-3-ppandit@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: move macros to multifd headerPrasad Pandit2025-05-021-5/+0
| | | | | | | | | | | Move MULTIFD_ macros to the header file so that they are accessible from other source files. Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Prasad Pandit <pjp@fedoraproject.org> Reviewed-by: Peter Xu <peterx@redhat.com> Message-ID: <20250411114534.3370816-2-ppandit@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* include/system: Move exec/ramblock.h to system/ramblock.hRichard Henderson2025-04-231-1/+1
| | | | | | | | Convert the existing includes with sed. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* migration/multifd: Make MultiFDSendData a structPeter Xu2025-03-061-18/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The newly introduced device state buffer can be used for either storing VFIO's read() raw data, but already also possible to store generic device states. After noticing that device states may not easily provide a max buffer size (also the fact that RAM MultiFDPages_t after all also want to have flexibility on managing offset[] array), it may not be a good idea to stick with union on MultiFDSendData.. as it won't play well with such flexibility. Switch MultiFDSendData to a struct. It won't consume a lot more space in reality, after all the real buffers were already dynamically allocated, so it's so far only about the two structs (pages, device_state) that will be duplicated, but they're small. With this, we can remove the pretty hard to understand alloc size logic. Because now we can allocate offset[] together with the SendData, and properly free it when the SendData is freed. [MSS: Make sure to clear possible device state payload before freeing MultiFDSendData, remove placeholders for other patches not included] Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Acked-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/qemu-devel/7b02baba8e6ddb23ef7c349d312b9b631db09d7e.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration/multifd: Device state transfer support - send sideMaciej S. Szmigiero2025-03-061-6/+36
| | | | | | | | | | A new function multifd_queue_device_state() is provided for device to queue its state for transmission via a multifd channel. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/ebd55768d3e5fecb5eb3f197bad9c0c07e5bc084.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration/multifd: Add an explicit MultiFDSendData destructorMaciej S. Szmigiero2025-03-061-3/+28
| | | | | | | | | | | | | | | This way if there are fields there that needs explicit disposal (like, for example, some attached buffers) they will be handled appropriately. Add a related assert to multifd_set_payload_type() in order to make sure that this function is only used to fill a previously empty MultiFDSendData with some payload, not the other way around. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/6755205f2b95abbed251f87061feee1c0e410836.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration/multifd: Make multifd_send() thread safeMaciej S. Szmigiero2025-03-061-0/+8
| | | | | | | | | | | | | multifd_send() function is currently not thread safe, make it thread safe by holding a lock during its execution. This way it will be possible to safely call it concurrently from multiple threads. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/dd0f3bcc02ca96a7d523ca58ea69e495a33b453b.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration/multifd: Device state transfer support - receive sideMaciej S. Szmigiero2025-03-061-11/+90
| | | | | | | | | | | | | | | | | | Add a basic support for receiving device state via multifd channels - channels that are shared with RAM transfers. Depending whether MULTIFD_FLAG_DEVICE_STATE flag is present or not in the packet header either device state (MultiFDPacketDeviceState_t) or RAM data (existing MultiFDPacket_t) is read. The received device state data is provided to qemu_loadvm_load_state_buffer() function for processing in the device's load_state_buffer handler. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/9b86f806c134e7815ecce0eee84f0e0e34aa0146.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration/multifd: Split packet into header and RAM dataMaciej S. Szmigiero2025-03-061-11/+44
| | | | | | | | | | | | | | | Read packet header first so in the future we will be able to differentiate between a RAM multifd packet and a device state multifd packet. Since these two are of different size we can't read the packet body until we know which packet type it is. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/qemu-devel/832ad055fe447561ac1ad565d61658660cb3f63f.1741124640.git.maciej.szmigiero@oracle.com Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration: Change migrate_fd_ to migration_Fabiano Rosas2025-02-141-1/+1
| | | | | | | | | | | | | | Remove all instances of _fd_ from the migration generic code. These functions have grown over time and the _fd_ part is now just confusing. migration_fd_error() -> migration_error() makes it a little vague. Since it's only used for migration_connect() failures, change it to migration_connect_set_error(). Reviewed-by: Peter Xu <peterx@redhat.com> Message-ID: <20250213175927.19642-4-farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Add a compat property for TLS terminationFabiano Rosas2025-02-141-2/+12
| | | | | | | | | | | | | | | | | | | | | We're currently changing the way the source multifd migration handles the shutdown of the multifd channels when TLS is in use to perform a clean termination by calling gnutls_bye(). Older src QEMUs will always close the channel without terminating the TLS session. New dst QEMUs treat an unclean termination as an error. Add multifd_clean_tls_termination (default true) that can be switched on the destination whenever a src QEMU <= 9.2 is in use. (Note that the compat property is only strictly necessary for src QEMUs older than 9.1. Due to synchronization coincidences, src QEMUs 9.1 and 9.2 can put the destination in a condition where it doesn't see the unclean termination. Still, make the property more inclusive to facilitate potential backports.) Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Terminate the TLS connectionFabiano Rosas2025-02-141-1/+37
| | | | | | | | | | | | | | | | | | | | | | The multifd recv side has been getting a TLS error of GNUTLS_E_PREMATURE_TERMINATION at the end of migration when the send side closes the sockets without ending the TLS session. This has been masked by the code not checking the migration error after loadvm. Start ending the TLS session at multifd_send_shutdown() so the recv side always sees a clean termination (EOF) and we can start to differentiate that from an actual premature termination that might possibly happen in the middle of the migration. There's nothing to be done if a previous migration error has already broken the connection, so add a comment explaining it and ignore any errors coming from gnutls_bye(). This doesn't break compat with older recv-side QEMUs because EOF has always caused the recv thread to exit cleanly. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Fix compat with QEMU < 9.0Fabiano Rosas2025-01-091-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f5f48a7891 ("migration/multifd: Separate SYNC request with normal jobs") changed the multifd source side to stop sending data along with the MULTIFD_FLAG_SYNC, effectively introducing the concept of a SYNC-only packet. Relying on that, commit d7e58f412c ("migration/multifd: Don't send ram data during SYNC") later came along and skipped reading data from SYNC packets. In a versions timeline like this: 8.2 f5f48a7 9.0 9.1 d7e58f41 9.2 The issue arises that QEMUs < 9.0 still send data along with SYNC, but QEMUs > 9.1 don't gather that data anymore. This leads to various kinds of migration failures due to desync/missing data. Stop checking for a SYNC packet on the destination and unconditionally unfill the packet. >From now on: old -> new: the source sends data + sync, destination reads normally new -> new: source sends only sync, destination reads zeros new -> old: source sends only sync, destination reads zeros CC: qemu-stable@nongnu.org Fixes: d7e58f412c ("migration/multifd: Don't send ram data during SYNC") Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2720 Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241213160120.23880-2-farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Allow to sync with sender threads onlyPeter Xu2025-01-091-6/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Teach multifd_send_sync_main() to sync with threads only. We already have such requests, which is when mapped-ram is enabled with multifd. In that case, no SYNC messages will be pushed to the stream when multifd syncs the sender threads because there's no destination threads waiting for that. The whole point of the sync is to make sure all threads finished their jobs. So fundamentally we have a request to do the sync in different ways: - Either to sync the threads only, - Or to sync the threads but also with the destination side. Mapped-ram did it already because of the use_packet check in the sync handler of the sender thread. It works. However it may stop working when e.g. VFIO may start to reuse multifd channels to push device states. In that case VFIO has similar request on "thread-only sync" however we can't check a flag because such sync request can still come from RAM which needs the on-wire notifications. Paving way for that by allowing the multifd_send_sync_main() to specify what kind of sync the caller needs. We can use it for mapped-ram already. No functional change intended. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Message-Id: <20241206224755.1108686-3-peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* include: Rename sysemu/ -> system/Philippe Mathieu-Daudé2024-12-201-1/+1
| | | | | | | | | | | | | Headers in include/sysemu/ are not only related to system *emulation*, they are also used by virtualization. Rename as system/ which is clearer. Files renamed manually then mechanical change using sed tool. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Tested-by: Lei Yang <leiyang@redhat.com> Message-Id: <20241203172445.28576-1-philmd@linaro.org>
* migration: fix-possible-int-overflowDmitry Frolov2024-11-131-1/+1
| | | | | | | | | | | | stat64_add() takes uint64_t as 2nd argument, but both "p->next_packet_size" and "p->packet_len" are uint32_t. Thus, theyr sum may overflow uint32_t. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Dmitry Frolov <frolov@swemel.ru> Link: https://lore.kernel.org/r/20241113140509.325732-2-frolov@swemel.ru Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Zero p->flags before starting filling a packetMaciej S. Szmigiero2024-10-311-1/+1
| | | | | | | | | | | | | This way there aren't stale flags there. p->flags can't contain SYNC to be sent at the next RAM packet since syncs are now handled separately in multifd_send_thread. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Link: https://lore.kernel.org/r/1c96b6cdb797e6f035eb1a4ad9bfc24f4c7f5df8.1730203967.git.maciej.szmigiero@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Put thread names together with macrosPeter Xu2024-10-311-3/+3
| | | | | | | | | | | | | | | | | | | | | Keep migration thread names together, so it's easier to see a list of all possible migration threads. Still two functional changes below besides the macro defintions: - There's one dirty rate thread that we overlooked before, now we add that too and name it as "mig/dirtyrate" following the old rules. - The old name "mig/src/rp-thr" has "-thr" but it may not be useful if it's a thread name anyway, while "rp" can be slightly hard to read. Taking this chance to rename it to "mig/src/return", hopefully a better name. Reviewed-by: Fabiano Rosas <farosas@suse.de> Acked-by: Hyman Huang <yong.huang@smartx.com> Reviewed-by: Zhang Chen <chen.zhang@intel.com> Link: https://lore.kernel.org/r/20241011153652.517440-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Add a couple of asserts for p->iovFabiano Rosas2024-09-031-0/+2
| | | | | | | | | Check that p->iov is indeed always allocated and freed by the MultiFDMethods hooks. Suggested-by: Peter Xu <peterx@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Stop changing the packet on recv sideFabiano Rosas2024-09-031-11/+9
| | | | | | | | | | | | | | | | | | As observed by Philippe, the multifd_ram_unfill_packet() function currently leaves the MultiFDPacket structure with mixed endianness. This is harmless, but ultimately not very clean. Stop touching the received packet and do the necessary work using stack variables instead. While here tweak the error strings and fix the space before semicolons. Also remove the "100 times bigger" comment because it's just one possible explanation for a size mismatch and it doesn't even match the code. CC: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Make MultiFDMethods constFabiano Rosas2024-09-031-4/+4
| | | | | | | | | The methods are defined at module_init time and don't ever change. Make them const. Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Move nocomp code into multifd-nocomp.cFabiano Rosas2024-09-031-375/+2
| | | | | | | | | | | | | | In preparation for adding new payload types to multifd, move most of the no-compression code into multifd-nocomp.c. Let's try to keep a semblance of layering by not mixing general multifd control flow with the details of transmitting pages of ram. There are still some pieces leftover, namely the p->normal, p->zero, etc variables that we use for zero page tracking and the packet allocation which is heavily dependent on the ram code. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Register nocomp ops dynamicallyFabiano Rosas2024-09-031-13/+19
| | | | | | | | | | | | | Prior to moving the ram code into multifd-nocomp.c, change the code to register the nocomp ops dynamically so we don't need to have the ops structure defined in multifd.c. While here, move the ops struct initialization to the end of the file to make the next diff cleaner. Reviewed-by: Prasad Pandit <pjp@fedoraproject.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Standardize on multifd ops namesFabiano Rosas2024-09-031-66/+12
| | | | | | | | Add the multifd_ prefix to all functions and remove the useless docstrings. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Allow multifd sync without flushFabiano Rosas2024-09-031-4/+9
| | | | | | | | | | | | | | | | | | | | | | | Separate the multifd sync from flushing the client data to the channels. These two operations are closely related but not strictly necessary to be executed together. The multifd sync is intrinsic to how multifd works. The multiple channels operate independently and may finish IO out of order in relation to each other. This applies also between the source and destination QEMU. Flushing the data that is left in the client-owned data structures (e.g. MultiFDPages_t) prior to sync is usually the right thing to do, but that is particular to how the ram migration is implemented with several passes over dirty data. Make these two routines separate, allowing future code to call the sync by itself if needed. This also allows the usage of multifd_ram_send to be isolated to ram code. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Replace multifd_send_state->pages with client dataFabiano Rosas2024-09-031-34/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | Multifd currently has a simple scheduling mechanism that distributes work to the various channels by keeping storage space within each channel and an extra space that is given to the client. Each time the client fills the space with data and calls into multifd, that space is given to the next idle channel and a free storage space is taken from the channel and given to client for the next iteration. This means we always need (#multifd_channels + 1) memory slots to operate multifd. This is fine, except that the presence of this one extra memory slot doesn't allow different types of payloads to be processed at the same time in different channels, i.e. the data type of multifd_send_state->pages needs to be the same as p->pages. For each new data type different from MultiFDPage_t that is to be handled, this logic would need to be duplicated by adding new fields to multifd_send_state, to the channels and to multifd_send_pages(). Fix this situation by moving the extra slot into the client and using only the generic type MultiFDSendData in the multifd core. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Don't send ram data during SYNCFabiano Rosas2024-09-031-3/+10
| | | | | | | | | | | | | | | | | | | | Skip saving and loading any ram data in the packet in the case of a SYNC. This fixes a shortcoming of the current code which requires a reset of the MultiFDPages_t fields right after the previous pending_job finishes, otherwise the very next job might be a SYNC and multifd_send_fill_packet() will put the stale values in the packet. By not calling multifd_ram_fill_packet(), we can stop resetting MultiFDPages_t in the multifd core and leave that to the client code. Actually moving the reset function is not yet done because pages->num==0 is used by the client code to determine whether the MultiFDPages_t needs to be flushed. The subsequent patches will replace that with a generic flag that is not dependent on MultiFDPages_t. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Isolate ram pages packet dataFabiano Rosas2024-09-031-39/+60
| | | | | | | | | | While we cannot yet disentangle the multifd packet from page data, we can make the code a bit cleaner by setting the page-related fields in a separate function. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Remove total pages tracingFabiano Rosas2024-09-031-10/+2
| | | | | | | | | | | The total_normal_pages and total_zero_pages elements are used only for the end tracepoints of the multifd threads. These are not super useful since they record per-channel numbers and are just the sum of all the pages that are transmitted per-packet, for which we already have tracepoints. Remove the totals from the tracing. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Move pages accounting into multifd_send_zero_page_detect()Fabiano Rosas2024-09-031-2/+0
| | | | | | | | | | | | | | All references to pages are being removed from the multifd worker threads in order to allow multifd to deal with different payload types. multifd_send_zero_page_detect() is called by all multifd migration paths that deal with pages and is the last spot where zero pages and normal page amounts are adjusted. Move the pages accounting into that function. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Replace p->pages with an union pointerFabiano Rosas2024-09-031-34/+49
| | | | | | | | | | | | | | | We want multifd to be able to handle more types of data than just ram pages. To start decoupling multifd from pages, replace p->pages (MultiFDPages_t) with the new type MultiFDSendData that hides the client payload inside an union. The general idea here is to isolate functions that *need* to handle MultiFDPages_t and move them in the future to multifd-ram.c, while multifd.c will stay with only the core functions that handle MultiFDSendData/MultiFDRecvData. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Make MultiFDPages_t:offset a flexible array memberFabiano Rosas2024-09-031-7/+12
| | | | | | | | | | | | | | | | | We're about to use MultiFDPages_t from inside the MultiFDSendData payload union, which means we cannot have pointers to allocated data inside the pages structure, otherwise we'd lose the reference to that memory once another payload type touches the union. Move the offset array into the end of the structure and turn it into a flexible array member, so it is allocated along with the rest of MultiFDSendData in the next patches. Note that other pointers, such as the ramblock pointer are still fine as long as the storage for them is not owned by the migration code and can be correctly released at some point. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Pass in MultiFDPages_t to file_write_ramblock_iovFabiano Rosas2024-09-031-1/+1
| | | | | | | | We want to stop dereferencing 'pages' so it can be replaced by an opaque pointer in the next patches. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Remove pages->allocatedFabiano Rosas2024-09-031-4/+2
| | | | | | | | | This value never changes and is always the same as page_count. We don't need a copy of it per-channel plus one in the extra slot. Remove it. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Inline page_size and page_countFabiano Rosas2024-09-031-16/+17
| | | | | | | | | | The MultiFD*Params structures are for per-channel data. Constant values should not be there because that needlessly wastes cycles and storage. The page_size and page_count fall into this category so move them inline in multifd.h. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Reduce access to p->pagesFabiano Rosas2024-09-031-6/+7
| | | | | | | | | I'm about to replace the p->pages pointer with an opaque pointer, so do a cleanup now to reduce direct accesses to p->page, which makes the next diffs cleaner. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Free MultiFDRecvParams::dataPeter Maydell2024-08-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In multifd_recv_setup() we allocate (among other things) * a MultiFDRecvData struct to multifd_recv_state::data * a MultiFDRecvData struct to each multfd_recv_state->params[i].data (Then during execution we might swap these pointers around.) But in multifd_recv_cleanup() we free multifd_recv_state->data in multifd_recv_cleanup_state() but we don't ever free the multifd_recv_state->params[i].data. This results in a memory leak reported by LeakSanitizer: (cd build/asan && \ ASAN_OPTIONS="fast_unwind_on_malloc=0:strip_path_prefix=/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/../../" \ QTEST_QEMU_BINARY=./qemu-system-x86_64 \ ./tests/qtest/migration-test --tap -k -p /x86_64/migration/multifd/file/mapped-ram ) [...] Direct leak of 72 byte(s) in 3 object(s) allocated from: #0 0x561cc0afcfd8 in __interceptor_calloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-x86_64+0x218efd8) (BuildId: be72e086d4e47b172b0a72779972213fd9916466) #1 0x7f89d37acc50 in g_malloc0 debian/build/deb/../../../glib/gmem.c:161:13 #2 0x561cc1e9c83c in multifd_recv_setup migration/multifd.c:1606:19 #3 0x561cc1e68618 in migration_ioc_process_incoming migration/migration.c:972:9 #4 0x561cc1e3ac59 in migration_channel_process_incoming migration/channel.c:45:9 #5 0x561cc1e4fa0b in file_accept_incoming_migration migration/file.c:132:5 #6 0x561cc30f2c0c in qio_channel_fd_source_dispatch io/channel-watch.c:84:12 #7 0x7f89d37a3c43 in g_main_dispatch debian/build/deb/../../../glib/gmain.c:3419:28 #8 0x7f89d37a3c43 in g_main_context_dispatch debian/build/deb/../../../glib/gmain.c:4137:7 #9 0x561cc3b21659 in glib_pollfds_poll util/main-loop.c:287:9 #10 0x561cc3b1ff93 in os_host_main_loop_wait util/main-loop.c:310:5 #11 0x561cc3b1fb5c in main_loop_wait util/main-loop.c:589:11 #12 0x561cc1da2917 in qemu_main_loop system/runstate.c:801:9 #13 0x561cc3796c1c in qemu_default_main system/main.c:37:14 #14 0x561cc3796c67 in main system/main.c:48:12 #15 0x7f89d163bd8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 #16 0x7f89d163be3f in __libc_start_main csu/../csu/libc-start.c:392:3 #17 0x561cc0a79fa4 in _start (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-x86_64+0x210bfa4) (BuildId: be72e086d4e47b172b0a72779972213fd9916466) Direct leak of 24 byte(s) in 1 object(s) allocated from: #0 0x561cc0afcfd8 in __interceptor_calloc (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-x86_64+0x218efd8) (BuildId: be72e086d4e47b172b0a72779972213fd9916466) #1 0x7f89d37acc50 in g_malloc0 debian/build/deb/../../../glib/gmem.c:161:13 #2 0x561cc1e9bed9 in multifd_recv_setup migration/multifd.c:1588:32 #3 0x561cc1e68618 in migration_ioc_process_incoming migration/migration.c:972:9 #4 0x561cc1e3ac59 in migration_channel_process_incoming migration/channel.c:45:9 #5 0x561cc1e4fa0b in file_accept_incoming_migration migration/file.c:132:5 #6 0x561cc30f2c0c in qio_channel_fd_source_dispatch io/channel-watch.c:84:12 #7 0x7f89d37a3c43 in g_main_dispatch debian/build/deb/../../../glib/gmain.c:3419:28 #8 0x7f89d37a3c43 in g_main_context_dispatch debian/build/deb/../../../glib/gmain.c:4137:7 #9 0x561cc3b21659 in glib_pollfds_poll util/main-loop.c:287:9 #10 0x561cc3b1ff93 in os_host_main_loop_wait util/main-loop.c:310:5 #11 0x561cc3b1fb5c in main_loop_wait util/main-loop.c:589:11 #12 0x561cc1da2917 in qemu_main_loop system/runstate.c:801:9 #13 0x561cc3796c1c in qemu_default_main system/main.c:37:14 #14 0x561cc3796c67 in main system/main.c:48:12 #15 0x7f89d163bd8f in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16 #16 0x7f89d163be3f in __libc_start_main csu/../csu/libc-start.c:392:3 #17 0x561cc0a79fa4 in _start (/mnt/nvmedisk/linaro/qemu-from-laptop/qemu/build/asan/qemu-system-x86_64+0x210bfa4) (BuildId: be72e086d4e47b172b0a72779972213fd9916466) SUMMARY: AddressSanitizer: 96 byte(s) leaked in 4 allocation(s). Free the params[i].data too. Cc: qemu-stable@nongnu.org Fixes: d117ed0699d41 ("migration/multifd: Allow receiving pages without packets") Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Fix multifd_send_setup cleanup when channel creation failsFabiano Rosas2024-08-021-11/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a channel fails to create, the code currently just returns. This is wrong for two reasons: 1) Channel n+1 will not get to initialize it's semaphores, leading to an assert when terminate_threads tries to post to it: qemu-system-x86_64: ../util/qemu-thread-posix.c:92: qemu_mutex_lock_impl: Assertion `mutex->initialized' failed. 2) (theoretical) If channel n-1 already started creation it will defeat the purpose of the channels_created logic which is in place to avoid migrate_fd_cleanup() to run while channels are still being created. This cannot really happen today because the current failure cases for multifd_new_send_channel_create() are all synchronous, resulting from qio_channel_file_new_path() getting a bad filename. This would hit all channels equally. But I don't want to set a trap for future people, so have all channels try to create (even if failing), and only fail after the channels_created semaphore has been posted. While here, remove the error_report_err call. There's one already at migrate_fd_cleanup later on. Cc: qemu-stable@nongnu.org Reported-by: Jim Fehlig <jfehlig@suse.com> Fixes: b7b03eb614 ("migration/multifd: Add outgoing QIOChannelFile support") Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration: Rename thread debug namesPeter Xu2024-06-211-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | The postcopy thread names on dest QEMU are slightly confusing, partly I'll need to blame myself on 36f62f11e4 ("migration: Postcopy preemption preparation on channel creation"). E.g., "fault-fast" reads like a fast version of "fault-default", but it's actually the fast version of "postcopy/listen". Taking this chance, rename all the migration threads with proper rules. Considering we only have 15 chars usable, prefix all threads with "mig/", meanwhile identify src/dst threads properly this time. So now most thread names will look like "mig/DIR/xxx", where DIR will be "src"/"dst", except the bg-snapshot thread which doesn't have a direction. For multifd threads, making them "mig/{src|dst}/{send|recv}_%d". We used to have "live_migration" thread for a very long time, now it's called "mig/src/main". We may hope to have "mig/dst/main" soon but not yet. Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Zhijian Li (Fujitsu) <lizhijian@fujitsu.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: put IOV initialization into compression methodYuan Liu2024-06-141-10/+12
| | | | | | | | | | | | | Different compression methods may require different numbers of IOVs. Based on streaming compression of zlib and zstd, all pages will be compressed to a data block, so two IOVs are needed for packet header and compressed data block. Signed-off-by: Yuan Liu <yuan1.liu@intel.com> Reviewed-by: Nanhai Zou <nanhai.zou@intel.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: solve zero page causing multiple page faultsYuan Liu2024-04-231-0/+1
| | | | | | | | | | | | | | | | | Implemented recvbitmap tracking of received pages in multifd. If the zero page appears for the first time in the recvbitmap, this page is not checked and set. If the zero page has already appeared in the recvbitmap, there is no need to check the data but directly set the data to 0, because it is unlikely that the zero page will be migrated multiple times. Signed-off-by: Yuan Liu <yuan1.liu@intel.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240401154110.2028453-2-yuan1.liu@intel.com [peterx: touch up the comment, as the bitmap is used outside postcopy now] Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Fix clearing of mapped-ram zero pagesFabiano Rosas2024-03-221-2/+1
| | | | | | | | | | | | | | | | | When the zero page detection is done in the multifd threads, we need to iterate the second part of the pages->offset array and clear the file bitmap for each zero page. The piece of code we merged to do that is wrong. The reason this has passed all the tests is because the bitmap is initialized with zeroes already, so clearing the bits only really has an effect during live migration and when a data page goes from having data to no data. Fixes: 303e6f54f9 ("migration/multifd: Implement zero page transmission on the multifd thread.") Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240321201242.6009-1-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Revert mapped-ram multifd support to fd: URIFabiano Rosas2024-03-221-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit decdc76772c453ff1444612e910caa0d45cd8eac in full and also the relevant migration-tests from 7a09f092834641b7a793d50a3a261073bbb404a6. After the addition of the new QAPI-based migration address API in 8.2 we've been converting an "fd:" URI into a SocketAddress, missing the fact that the "fd:" syntax could also be used for a plain file instead of a socket. This is a problem because the SocketAddress is part of the API, so we're effectively asking users to create a "socket" channel to pass in a plain file. The easiest way to fix this situation is to deprecate the usage of both SocketAddress and "fd:" when used with a plain file for migration. Since this has been possible since 8.2, we can wait until 9.1 to deprecate it. For 9.0, however, we should avoid adding further support to migration to a plain file using the old "fd:" syntax or the new SocketAddress API, and instead require the usage of either the old-style "file:" URI or the FileMigrationArgs::filename field of the new API with the "/dev/fdset/NN" syntax, both of which are already supported. Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240319210941.1907-1-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Implement zero page transmission on the multifd thread.Hao Xiang2024-03-111-14/+76
| | | | | | | | | | | | | | | 1. Add zero_pages field in MultiFDPacket_t. 2. Implements the zero page detection and handling on the multifd threads for non-compression, zlib and zstd compression backends. 3. Added a new value 'multifd' in ZeroPageDetection enumeration. 4. Adds zero page counters and updates multifd send/receive tracing format to track the newly added counters. Signed-off-by: Hao Xiang <hao.xiang@bytedance.com> Acked-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240311180015.3359271-5-hao.xiang@linux.dev Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Allow clearing of the file_bmap from multifdFabiano Rosas2024-03-111-1/+1
| | | | | | | | | | | | | We currently only need to clear the mapped-ram file bitmap from the migration thread during save_zero_page. We're about to add support for zero page detection on the multifd thread, so allow ramblock_set_file_bmap_atomic() to also clear the bits. Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240311180015.3359271-3-hao.xiang@linux.dev Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Don't fsync when closing QIOChannelFileFabiano Rosas2024-03-111-9/+19
| | | | | | | | | | | | | | | | | | | | | | | | | Commit bc38feddeb ("io: fsync before closing a file channel") added a fsync/fdatasync at the closing point of the QIOChannelFile to ensure integrity of the migration stream in case of QEMU crash. The decision to do the sync at qio_channel_close() was not the best since that function runs in the main thread and the fsync can cause QEMU to hang for several minutes, depending on the migration size and disk speed. To fix the hang, remove the fsync from qio_channel_file_close(). At this moment, the migration code is the only user of the fsync and we're taking the tradeoff of not having a sync at all, leaving the responsibility to the upper layers. Fixes: bc38feddeb ("io: fsync before closing a file channel") Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240305195629.9922-1-farosas@suse.de Link: https://lore.kernel.org/r/20240305174332.2553-1-farosas@suse.de [peterx: add more comment to the qio_channel_close()] Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Document two places for mapped-ramPeter Xu2024-03-041-0/+12
| | | | | | | | | | | Add two documentations for mapped-ram migration on two spots that may not be extremely clear. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240301091524.39900-1-peterx@redhat.com Cc: Prasad Pandit <ppandit@redhat.com> [peterx: fix two English errors per Prasad] Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Add mapped-ram support to fd: URIFabiano Rosas2024-03-011-0/+2
| | | | | | | | | | | | | | | | | | If we receive a file descriptor that points to a regular file, there's nothing stopping us from doing multifd migration with mapped-ram to that file. Enable the fd: URI to work with multifd + mapped-ram. Note that the fds passed into multifd are duplicated because we want to avoid cross-thread effects when doing cleanup (i.e. close(fd)). The original fd doesn't need to be duplicated because monitor_get_fd() transfers ownership to the caller. Signed-off-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240229153017.2221-23-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>