path: root/migration/multifd.h
* migration/multifd: move macros to multifd header (Prasad Pandit, 2025-05-02, 1 file, -0/+5)
  Move MULTIFD_ macros to the header file so that they are accessible from other source files.
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Signed-off-by: Prasad Pandit <pjp@fedoraproject.org>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Message-ID: <20250411114534.3370816-2-ppandit@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Make MultiFDSendData a struct (Peter Xu, 2025-03-06, 1 file, -6/+9)
  The newly introduced device state buffer can be used for storing VFIO's raw read() data, but it can also store generic device states.
  Since device states may not easily provide a maximum buffer size (and the RAM MultiFDPages_t also wants flexibility in managing its offset[] array), sticking with a union for MultiFDSendData is not a good idea, as it won't play well with such flexibility.
  Switch MultiFDSendData to a struct. It won't consume much more space in practice: the real buffers are already dynamically allocated, so only the two structs (pages, device_state) are duplicated, and they're small.
  With this, we can remove the hard-to-understand allocation-size logic, because the offset[] array can now be allocated together with the SendData and properly freed when the SendData is freed.
  [MSS: Make sure to clear possible device state payload before freeing MultiFDSendData, remove placeholders for other patches not included]
  Signed-off-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
  Acked-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/qemu-devel/7b02baba8e6ddb23ef7c349d312b9b631db09d7e.1741124640.git.maciej.szmigiero@oracle.com
  Signed-off-by: Cédric Le Goater <clg@redhat.com>
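A minimal, self-contained sketch of the layout this commit describes; names and field types here are simplified stand-ins, not the QEMU definitions:

    /* Simplified sketch; not the QEMU definitions. */
    #include <stdint.h>
    #include <stdlib.h>

    typedef enum {
        PAYLOAD_NONE,
        PAYLOAD_RAM,
        PAYLOAD_DEVICE_STATE,
    } PayloadType;

    typedef struct {
        void *buf;              /* buffer allocated and owned elsewhere */
        size_t len;
    } DeviceStatePayload;

    /*
     * A struct (not a union) lets both small payload headers coexist;
     * only the real buffers are large, and those stay dynamically
     * allocated. The RAM payload sits at the end because its offset[]
     * array is a flexible array member sized at allocation time
     * (QEMU keeps it in a nested MultiFDPages_t; flattened here).
     */
    typedef struct {
        PayloadType type;
        DeviceStatePayload device_state;
        uint32_t num_pages;
        uint64_t offset[];      /* freed together with the container */
    } SendData;

    static SendData *send_data_alloc(uint32_t page_count)
    {
        return calloc(1, sizeof(SendData) + page_count * sizeof(uint64_t));
    }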
* migration/multifd: Device state transfer support - send side (Maciej S. Szmigiero, 2025-03-06, 1 file, -8/+26)
  A new function multifd_queue_device_state() is provided for a device to queue its state for transmission via a multifd channel.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
  Link: https://lore.kernel.org/qemu-devel/ebd55768d3e5fecb5eb3f197bad9c0c07e5bc084.1741124640.git.maciej.szmigiero@oracle.com
  Signed-off-by: Cédric Le Goater <clg@redhat.com>
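A self-contained stand-in that only models the call shape the commit describes; the function name comes from the commit, but the signature and behaviour below are assumptions:

    #include <stdint.h>
    #include <stdio.h>

    /*
     * Stand-in for the queue helper: hand one opaque chunk of device state
     * to an idle multifd sender channel. The real function lives in
     * migration/multifd.c; this stub only prints what it would queue.
     */
    static int multifd_queue_device_state_sketch(const char *idstr,
                                                 uint32_t instance_id,
                                                 const char *data, size_t len)
    {
        printf("queue %zu bytes of state for %s/%u\n", len, idstr, instance_id);
        return 0;               /* 0 on success, negative on error */
    }

    int main(void)
    {
        const char chunk[] = "opaque device state chunk";

        /* A device's save path can push chunks as they become ready. */
        return multifd_queue_device_state_sketch("vfio-device-0", 0,
                                                 chunk, sizeof(chunk));
    }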
* migration/multifd: Add an explicit MultiFDSendData destructor (Maciej S. Szmigiero, 2025-03-06, 1 file, -0/+5)
  This way, if there are fields that need explicit disposal (for example, attached buffers), they will be handled appropriately.
  Add a related assert to multifd_set_payload_type() in order to make sure that this function is only used to fill a previously empty MultiFDSendData with some payload, not the other way around.
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
  Link: https://lore.kernel.org/qemu-devel/6755205f2b95abbed251f87061feee1c0e410836.1741124640.git.maciej.szmigiero@oracle.com
  Signed-off-by: Cédric Le Goater <clg@redhat.com>
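A small sketch of the two pieces this commit describes, an explicit destructor plus the fill-only assertion, using simplified stand-in types rather than the QEMU ones:

    #include <assert.h>
    #include <stdlib.h>

    typedef enum { PAYLOAD_NONE, PAYLOAD_RAM, PAYLOAD_DEVICE_STATE } PayloadType;

    typedef struct {
        PayloadType type;
        void *device_state_buf;   /* attached buffer needing explicit disposal */
    } SendData;

    /* Only ever go from "empty" to "has a payload", never switch types. */
    static void set_payload_type(SendData *d, PayloadType type)
    {
        assert(d->type == PAYLOAD_NONE && type != PAYLOAD_NONE);
        d->type = type;
    }

    /* Explicit destructor: dispose attached buffers before the container. */
    static void send_data_free(SendData *d)
    {
        if (!d) {
            return;
        }
        free(d->device_state_buf);
        free(d);
    }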
* migration/multifd: Device state transfer support - receive side (Maciej S. Szmigiero, 2025-03-06, 1 file, -1/+18)
  Add basic support for receiving device state via multifd channels, which are shared with RAM transfers.
  Depending on whether the MULTIFD_FLAG_DEVICE_STATE flag is present in the packet header, either device state (MultiFDPacketDeviceState_t) or RAM data (the existing MultiFDPacket_t) is read.
  The received device state data is passed to qemu_loadvm_load_state_buffer() for processing by the device's load_state_buffer handler.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
  Link: https://lore.kernel.org/qemu-devel/9b86f806c134e7815ecce0eee84f0e0e34aa0146.1741124640.git.maciej.szmigiero@oracle.com
  Signed-off-by: Cédric Le Goater <clg@redhat.com>
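A sketch of the receive-side dispatch the commit describes; the flag value and helper names below are illustrative stand-ins:

    #include <stdint.h>

    /* Illustrative flag value; the real bit assignment lives in multifd.h. */
    #define FLAG_DEVICE_STATE (1u << 4)

    typedef struct { uint32_t flags; /* magic, version, ... */ } PacketHeader;

    static int recv_ram_packet(void)
    {
        /* read the MultiFDPacket_t body (page offsets, then page data) */
        return 0;
    }

    static int recv_device_state_packet(void)
    {
        /*
         * read the device-state packet body, then hand the buffer to the
         * device's load_state_buffer handler (qemu_loadvm_load_state_buffer
         * in QEMU)
         */
        return 0;
    }

    /* One iteration of the receiving thread, after the header has been read. */
    static int dispatch_packet(const PacketHeader *hdr)
    {
        if (hdr->flags & FLAG_DEVICE_STATE) {
            return recv_device_state_packet();
        }
        return recv_ram_packet();
    }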
* migration/multifd: Split packet into header and RAM data (Maciej S. Szmigiero, 2025-03-06, 1 file, -0/+5)
  Read the packet header first so that in the future we will be able to differentiate between a RAM multifd packet and a device state multifd packet. Since these two have different sizes, we can't read the packet body until we know which packet type it is.
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
  Link: https://lore.kernel.org/qemu-devel/832ad055fe447561ac1ad565d61658660cb3f63f.1741124640.git.maciej.szmigiero@oracle.com
  Signed-off-by: Cédric Le Goater <clg@redhat.com>
* migration/multifd: Add a compat property for TLS termination (Fabiano Rosas, 2025-02-14, 1 file, -0/+2)
  We're currently changing the way the source multifd migration handles the shutdown of the multifd channels when TLS is in use, to perform a clean termination by calling gnutls_bye(). Older src QEMUs will always close the channel without terminating the TLS session. New dst QEMUs treat an unclean termination as an error.
  Add multifd_clean_tls_termination (default true) that can be switched on the destination whenever a src QEMU <= 9.2 is in use.
  (Note that the compat property is only strictly necessary for src QEMUs older than 9.1. Due to synchronization coincidences, src QEMUs 9.1 and 9.2 can put the destination in a condition where it doesn't see the unclean termination. Still, make the property more inclusive to facilitate potential backports.)
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Cleanup src flushes on condition check (Peter Xu, 2025-01-09, 1 file, -0/+2)
  The src flush condition check is overcomplicated, and it gets further out of control once postcopy is involved.
  In general, we have two modes of doing the sync: legacy and modern. Legacy uses a per-section flush, modern a per-round flush. Mapped-ram always uses the modern, per-round mode.
  Introduce two helpers, which can greatly simplify the code and hopefully make it readable again.
  Signed-off-by: Peter Xu <peterx@redhat.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Message-Id: <20241206224755.1108686-7-peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
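A sketch of what such helpers could look like; the predicate names and the exact conditions below are reconstructed from the description (legacy = per-section, modern = per-round, mapped-ram always modern) and are not copied from QEMU:

    #include <stdbool.h>

    /* Stand-ins for the real capability checks. */
    static bool flush_after_each_section(void) { return false; }
    static bool mapped_ram_enabled(void)       { return false; }

    /* Legacy mode: emit a flush/sync at the end of every RAM section. */
    static bool sync_needed_per_section(void)
    {
        return !mapped_ram_enabled() && flush_after_each_section();
    }

    /* Modern mode: flush/sync once per pass over the dirty bitmap. */
    static bool sync_needed_per_round(void)
    {
        return mapped_ram_enabled() || !flush_after_each_section();
    }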
* migration/multifd: Unify RAM_SAVE_FLAG_MULTIFD_FLUSH messages (Peter Xu, 2025-01-09, 1 file, -1/+1)
  The RAM_SAVE_FLAG_MULTIFD_FLUSH message should always be correlated with a sync request on the src. Unify the emission of this message in one place, and send it only when necessary.
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Signed-off-by: Peter Xu <peterx@redhat.com>
  Message-Id: <20241206224755.1108686-5-peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Allow to sync with sender threads only (Peter Xu, 2025-01-09, 1 file, -3/+20)
  Teach multifd_send_sync_main() to sync with threads only.
  We already have such requests: when mapped-ram is enabled with multifd, no SYNC messages are pushed to the stream when multifd syncs the sender threads, because there are no destination threads waiting for them. The whole point of the sync is to make sure all threads finished their jobs.
  So fundamentally there are two ways the sync can be requested:
  - sync the threads only, or
  - sync the threads and also the destination side.
  Mapped-ram already gets the first behaviour thanks to the use_packet check in the sender thread's sync handler. That works, but it may stop working once e.g. VFIO starts to reuse multifd channels to push device states. VFIO has a similar need for a thread-only sync, yet we can't key it off a flag, because such a sync request can still come from RAM, which needs the on-wire notifications.
  Pave the way for that by allowing multifd_send_sync_main() callers to specify what kind of sync they need. We can use it for mapped-ram already.
  No functional change intended.
  Signed-off-by: Peter Xu <peterx@redhat.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Message-Id: <20241206224755.1108686-3-peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
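A sketch of the caller-specified sync request; the enum and helper names are illustrative, only the idea (threads-only vs. threads plus destination) comes from the commit:

    #include <stdio.h>

    /* Illustrative request type passed by the caller. */
    typedef enum {
        SYNC_LOCAL,   /* wait for sender threads to drain their queues */
        SYNC_ALL,     /* additionally emit on-wire SYNC packets for the dest */
    } SyncReq;

    static void sync_sender_threads(void)    { puts("wait for sender threads"); }
    static void send_wire_sync_packets(void) { puts("emit MULTIFD_FLAG_SYNC"); }

    static void send_sync_main_sketch(SyncReq req)
    {
        sync_sender_threads();
        if (req == SYNC_ALL) {
            send_wire_sync_packets();
        }
    }

    int main(void)
    {
        send_sync_main_sketch(SYNC_LOCAL);  /* e.g. mapped-ram, device state */
        send_sync_main_sketch(SYNC_ALL);    /* e.g. RAM with packets on the wire */
        return 0;
    }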
* migration: Introduce 'qatzip' compression method (Bryan Zhang, 2024-09-09, 1 file, -2/+3)
  Add support for 'qatzip' as an option for the multifd compression method parameter, and implement QAT-based compression and decompression for it.
  Acked-by: Markus Armbruster <armbru@redhat.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Prasad Pandit <pjp@fedoraproject.org>
  Signed-off-by: Bryan Zhang <bryan.zhang@bytedance.com>
  Signed-off-by: Hao Xiang <hao.xiang@linux.dev>
  Signed-off-by: Yichen Wang <yichen.wang@bytedance.com>
  Link: https://lore.kernel.org/r/20240830232722.58272-5-yichen.wang@bytedance.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Add documentation for multifd methods (Fabiano Rosas, 2024-09-03, 1 file, -6/+70)
  Add documentation clarifying the usage of the multifd methods. The general idea is that the client code calls into multifd to trigger send/recv of data, and multifd then calls these hooks back from the worker threads at opportune moments so the client can process a portion of the data.
  Suggested-by: Peter Xu <peterx@redhat.com>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
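An illustrative hook table modelled on that description; the hook names approximate QEMU's MultiFDMethods, but the signatures are simplified stand-ins:

    typedef struct Error Error;   /* opaque stand-in for QEMU's Error */

    typedef struct {
        /* per-channel resource setup/teardown, run from the worker thread */
        int  (*send_setup)(void *channel, Error **errp);
        void (*send_cleanup)(void *channel, Error **errp);
        /*
         * Called when a queued payload must be turned into iovecs ready to
         * be written to the channel (compressing it, detecting zero pages,
         * filling the packet, ...).
         */
        int  (*send_prepare)(void *channel, Error **errp);
        /* receive-side counterparts */
        int  (*recv_setup)(void *channel, Error **errp);
        void (*recv_cleanup)(void *channel);
        int  (*recv)(void *channel, Error **errp);
    } SendRecvHooks;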
* migration/multifd: Make MultiFDMethods const (Fabiano Rosas, 2024-09-03, 1 file, -1/+1)
  The methods are defined at module_init time and don't ever change. Make them const.
  Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Move nocomp code into multifd-nocomp.c (Fabiano Rosas, 2024-09-03, 1 file, -0/+5)
  In preparation for adding new payload types to multifd, move most of the no-compression code into multifd-nocomp.c. Let's try to keep a semblance of layering by not mixing general multifd control flow with the details of transmitting pages of RAM.
  There are still some pieces left over, namely the p->normal, p->zero, etc. variables that we use for zero page tracking, and the packet allocation, which is heavily dependent on the RAM code.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Allow multifd sync without flush (Fabiano Rosas, 2024-09-03, 1 file, -0/+1)
  Separate the multifd sync from flushing the client data to the channels. These two operations are closely related but not strictly necessary to be executed together.
  The multifd sync is intrinsic to how multifd works. The multiple channels operate independently and may finish IO out of order in relation to each other. This applies also between the source and destination QEMU.
  Flushing the data that is left in the client-owned data structures (e.g. MultiFDPages_t) prior to sync is usually the right thing to do, but that is particular to how the ram migration is implemented with several passes over dirty data.
  Make these two routines separate, allowing future code to call the sync by itself if needed. This also allows the usage of multifd_ram_send to be isolated to ram code.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Replace multifd_send_state->pages with client data (Fabiano Rosas, 2024-09-03, 1 file, -0/+3)
  Multifd currently has a simple scheduling mechanism that distributes work to the various channels by keeping storage space within each channel and an extra space that is given to the client. Each time the client fills the space with data and calls into multifd, that space is given to the next idle channel and a free storage space is taken from the channel and given to the client for the next iteration.
  This means we always need (#multifd_channels + 1) memory slots to operate multifd.
  This is fine, except that the presence of this one extra memory slot doesn't allow different types of payloads to be processed at the same time in different channels, i.e. the data type of multifd_send_state->pages needs to be the same as p->pages. For each new data type different from MultiFDPage_t that is to be handled, this logic would need to be duplicated by adding new fields to multifd_send_state, to the channels and to multifd_send_pages().
  Fix this situation by moving the extra slot into the client and using only the generic type MultiFDSendData in the multifd core.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
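A sketch of the pointer-swap scheduling described above, with stand-in types; the real code adds locking/atomics and retries when no channel is idle:

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct { int type; /* payload fields ... */ } SendData;

    typedef struct {
        SendData *data;       /* channel-owned slot */
        bool pending_job;
    } Channel;

    /*
     * Pointer-swap scheduling: the caller hands in its filled slot and gets
     * the channel's free slot back, so exactly (#channels + 1) slots exist
     * and no payload is ever copied.
     */
    static bool multifd_send_sketch(Channel *channels, int n, SendData **client_slot)
    {
        for (int i = 0; i < n; i++) {
            Channel *c = &channels[i];

            if (!c->pending_job) {
                SendData *tmp = c->data;

                c->data = *client_slot;   /* channel now owns the filled slot */
                *client_slot = tmp;       /* client gets an empty one back */
                c->pending_job = true;    /* wake the sender thread (simplified) */
                return true;
            }
        }
        return false; /* real code waits until a channel becomes idle */
    }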
* migration/multifd: Remove total pages tracing (Fabiano Rosas, 2024-09-03, 1 file, -8/+0)
  The total_normal_pages and total_zero_pages elements are used only for the end tracepoints of the multifd threads. These are not super useful since they record per-channel numbers and are just the sum of all the pages that are transmitted per-packet, for which we already have tracepoints. Remove the totals from the tracing.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Replace p->pages with an union pointer (Fabiano Rosas, 2024-09-03, 1 file, -6/+1)
  We want multifd to be able to handle more types of data than just ram pages. To start decoupling multifd from pages, replace p->pages (MultiFDPages_t) with the new type MultiFDSendData that hides the client payload inside a union.
  The general idea here is to isolate functions that *need* to handle MultiFDPages_t and move them in the future to multifd-ram.c, while multifd.c will stay with only the core functions that handle MultiFDSendData/MultiFDRecvData.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Make MultiFDPages_t:offset a flexible array member (Fabiano Rosas, 2024-09-03, 1 file, -2/+2)
  We're about to use MultiFDPages_t from inside the MultiFDSendData payload union, which means we cannot have pointers to allocated data inside the pages structure, otherwise we'd lose the reference to that memory once another payload type touches the union.
  Move the offset array to the end of the structure and turn it into a flexible array member, so it is allocated along with the rest of MultiFDSendData in the next patches.
  Note that other pointers, such as the ramblock pointer, are still fine as long as the storage for them is not owned by the migration code and can be correctly released at some point.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
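A sketch of the before/after shapes, with stand-in types, showing why a flexible array member removes the dangling-allocation hazard:

    #include <stdint.h>
    #include <stdlib.h>

    /*
     * Before: a pointer member. If this struct lives inside a union and the
     * other union member is written, the only reference to the allocation
     * is lost (leaked).
     */
    typedef struct {
        uint32_t num;
        uint64_t *offset;       /* separately allocated */
    } PagesWithPointer;

    /*
     * After: a flexible array member. The storage is part of the containing
     * allocation, so whoever frees the container frees the offsets too.
     */
    typedef struct {
        uint32_t num;
        uint64_t offset[];
    } PagesWithFAM;

    static PagesWithFAM *pages_alloc(uint32_t page_count)
    {
        return calloc(1, sizeof(PagesWithFAM) + page_count * sizeof(uint64_t));
    }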
* migration/multifd: Introduce MultiFDSendData (Fabiano Rosas, 2024-09-03, 1 file, -0/+26)
  Add a new data structure to replace p->pages in the multifd channel. This new structure will hide the multifd payload type behind a union, so we don't need to add a new field to the channel each time we want to handle a different data type.
  This also allows us to keep multifd_send_pages() as is, without needing to complicate the pointer switching.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Remove pages->allocated (Fabiano Rosas, 2024-09-03, 1 file, -2/+0)
  This value never changes and is always the same as page_count. We don't need a copy of it per-channel plus one in the extra slot. Remove it.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Inline page_size and page_count (Fabiano Rosas, 2024-09-03, 1 file, -8/+10)
  The MultiFD*Params structures are for per-channel data. Constant values should not be there because that needlessly wastes cycles and storage. The page_size and page_count fall into this category, so move them inline in multifd.h.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: add uadk compression framework (Shameer Kolothum, 2024-06-14, 1 file, -2/+3)
  Add the skeleton to support the uadk compression method. Complete functionality will be added in subsequent patches.
  Acked-by: Markus Armbruster <armbru@redhat.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
  Reviewed-by: Zhangfei Gao <zhangfei.gao@linaro.org>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: add qpl compression method (Yuan Liu, 2024-06-14, 1 file, -0/+1)
  Add the Query Processing Library (QPL) compression method.
  Introduce qpl as a new multifd migration compression method; it can use the In-Memory Analytics Accelerator (IAA) to accelerate compression and decompression, which not only reduces the network bandwidth requirement but also reduces host compression and decompression CPU overhead.
  How to enable qpl compression during migration:
    migrate_set_parameter multifd-compression qpl
  No qpl compression level parameter is added since it only supports level one; users do not need to specify a compression level.
  Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
  Reviewed-by: Nanhai Zou <nanhai.zou@intel.com>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  [fixed docs spacing in migration.json]
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
* migration/multifd: Implement zero page transmission on the multifd thread. (Hao Xiang, 2024-03-11, 1 file, -1/+22)
  1. Add a zero_pages field to MultiFDPacket_t.
  2. Implement zero page detection and handling on the multifd threads for the non-compression, zlib and zstd compression backends.
  3. Add a new value 'multifd' to the ZeroPageDetection enumeration.
  4. Add zero page counters and update the multifd send/receive tracing format to track the newly added counters.
  Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
  Acked-by: Markus Armbruster <armbru@redhat.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240311180015.3359271-5-hao.xiang@linux.dev
  Signed-off-by: Peter Xu <peterx@redhat.com>
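A sketch of per-packet zero page detection on a sender thread; buffer_is_zero_sketch() stands in for QEMU's buffer_is_zero(), and the partitioning logic is illustrative:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Stand-in for QEMU's buffer_is_zero(). */
    static bool buffer_is_zero_sketch(const uint8_t *buf, size_t len)
    {
        for (size_t i = 0; i < len; i++) {
            if (buf[i]) {
                return false;
            }
        }
        return true;
    }

    /*
     * Per-packet detection done on the sender thread: pages found to be zero
     * are only recorded (by offset) in the packet, not transmitted; the
     * receiver clears them locally. The two counters feed the tracepoints.
     */
    static void partition_pages(const uint8_t *block, const uint64_t *offset,
                                uint32_t num, size_t page_size,
                                uint32_t *normal_num, uint32_t *zero_num)
    {
        *normal_num = 0;
        *zero_num = 0;
        for (uint32_t i = 0; i < num; i++) {
            if (buffer_is_zero_sketch(block + offset[i], page_size)) {
                (*zero_num)++;
            } else {
                (*normal_num)++;
            }
        }
    }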
* migration/multifd: Support incoming mapped-ram stream format (Fabiano Rosas, 2024-03-01, 1 file, -0/+2)
  For the incoming mapped-ram migration we need to read the ramblock headers, get the pages bitmap and send the host address of each non-zero page to the multifd channel thread for writing.
  Usage on HMP is:
    (qemu) migrate_set_capability multifd on
    (qemu) migrate_set_capability mapped-ram on
    (qemu) migrate_incoming file:migfile
  (The ram.h include needs to move because we've previously been relying on it being included from migration.c. Now file.h will start including multifd.h before migration.o is processed.)
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240229153017.2221-22-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Add outgoing QIOChannelFile support (Fabiano Rosas, 2024-03-01, 1 file, -0/+1)
  Allow multifd to open file-backed channels. This will be used when enabling the mapped-ram migration stream format, which expects a seekable transport.
  The QIOChannel read and write methods will use the preadv/pwritev versions, which don't update the file offset at each call, so we can reuse the fd without re-opening for every channel.
  Contrary to the socket migration, the file migration doesn't need an asynchronous channel creation process, so expose multifd_channel_connect() and call it directly.
  Note that this is just setup code and multifd cannot yet make use of the file channels.
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Link: https://lore.kernel.org/r/20240229153017.2221-18-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Add a wrapper for channels_created (Fabiano Rosas, 2024-03-01, 1 file, -0/+1)
  We'll need to access multifd_send_state->channels_created from outside multifd.c, so introduce a helper for that.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240229153017.2221-17-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Allow receiving pages without packets (Fabiano Rosas, 2024-03-01, 1 file, -0/+15)
  Currently multifd does not need to have knowledge of pages on the receiving side because all the information needed is within the packets that come in the stream.
  We're about to add support for mapped-ram migration, which cannot use packets because it expects the ramblock section in the migration file to contain only the guest pages data.
  Add a data structure to transfer pages between the ram migration code and the multifd receiving threads.
  We don't want to reuse MultiFDPages_t for two reasons:
  a) multifd threads don't really need to know about the data they're receiving.
  b) the receiving side has to be stopped to load the pages, which means we can experiment with larger granularities than page size when transferring data.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240229153017.2221-16-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
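An illustrative version of such a hand-off structure; the field names are approximations of the ones the commit adds, not copied from QEMU:

    #include <stddef.h>
    #include <sys/types.h>

    /*
     * Hand-off between the ram code and a receiving thread when no packets
     * are on the wire (mapped-ram): the thread only needs "where to put the
     * data and how much", not page-level knowledge.
     */
    typedef struct {
        void  *opaque;       /* destination buffer (e.g. a host address range) */
        size_t size;         /* how many bytes to read into it */
        off_t  file_offset;  /* where in the migration file the data lives */
    } RecvData;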
* migration/multifd: Decouple recv method from pages (Fabiano Rosas, 2024-03-01, 1 file, -2/+2)
  The next patches will abstract the type of data being received by the channels, so do some cleanup now to remove references to pages and the dependency on 'normal_num'.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240229153017.2221-14-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Rename MultiFDSend|RecvParams::data to compress_data (Fabiano Rosas, 2024-03-01, 1 file, -2/+2)
  Use a more specific name for the compression data so we can use the generic name for the multifd core code.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240229153017.2221-13-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Drop registered_yank (Peter Xu, 2024-02-28, 1 file, -2/+0)
  With a clear definition of the p->c protocol, where we only set it up once the channel is fully established (TLS or non-TLS), the registered_yank boolean has the same meaning as "p->c != NULL".
  Drop registered_yank and check p->c instead.
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240222095301.171137-3-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Move multifd_send_setup error handling in to the function (Fabiano Rosas, 2024-02-07, 1 file, -1/+1)
  Hide the error handling inside multifd_send_setup to make it cleaner for the next patch to move the function around.
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240206215118.6171-4-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Remove p->running (Fabiano Rosas, 2024-02-07, 1 file, -5/+2)
  We currently only need p->running to avoid calling qemu_thread_join() on a non-existent thread if the thread has never been created.
  However, there are at least two bugs in this logic:
  1) On the sending side, p->running is set too early and qemu_thread_create() can be skipped due to an error during the TLS handshake, leaving the flag set and leading to a crash when multifd_send_cleanup() calls qemu_thread_join().
  2) During exit, the multifd thread clears the flag while holding the channel lock. The counterpart at multifd_send_cleanup() reads the flag outside of the lock and might free the mutex while the multifd thread still has it locked.
  Fix the first issue by setting the flag right before creating the thread. Rename it from p->running to p->thread_created to clarify its usage.
  Fix the second issue by not clearing the flag at the multifd thread exit. We don't have any use for that.
  Note that these bugs are straightforward logic issues and not race conditions. There is still a gap for races to affect this code due to multifd_send_cleanup() being allowed to run concurrently with the thread creation loop. This issue is solved in the next patches.
  Cc: qemu-stable <qemu-stable@nongnu.org>
  Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
  Reported-by: Avihai Horon <avihaih@nvidia.com>
  Reported-by: chenyuhui5@huawei.com
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240206215118.6171-3-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Join the TLS thread (Fabiano Rosas, 2024-02-07, 1 file, -0/+2)
  We're currently leaking the resources of the TLS thread by not joining it and also overwriting the p->thread pointer altogether.
  Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake")
  Cc: qemu-stable <qemu-stable@nongnu.org>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240206215118.6171-2-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Optimize sender side to be lockless (Peter Xu, 2024-02-06, 1 file, -2/+0)
  When reviewing my attempt to refactor send_prepare(), Fabiano suggested we try dropping the mutex in multifd code [1]. I thought about that before but never tried to change the code. Now maybe it's time to give it a stab. This only optimizes the sender side.
  The trick here is that multifd has a clear provider/consumer model: the migration main thread publishes requests (either pending_job or pending_sync), while the multifd sender threads are the consumers. We don't have a lot of complicated data sharing, and the jobs can logically be submitted locklessly.
  Arm the code with atomic weapons. Two things worth mentioning:
  - For multifd_send_pages(): we could use qatomic_load_acquire() when trying to find a free channel, but that's expensive if we attach one ACQUIRE per channel. Instead, keep the qatomic_read() on the pending_job flag as we do already, and use one smp_mb_acquire() after the loop to guarantee the memory ordering.
  - For pending_sync: no extra data is required, and since p->flags is never touched anymore, it should be safe not to use a memory barrier. That's different from pending_job.
  Provide rich comments for all the lockless operations to state how they are paired. With that, we can remove the mutex.
  [1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de
  Suggested-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-24-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
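A sketch of the lockless channel search using C11 atomics in place of QEMU's qatomic_*()/smp_mb_acquire() helpers; the structure is illustrative:

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct {
        atomic_bool pending_job;
        /* job payload ... */
    } Channel;

    static int pick_idle_channel(Channel *channels, int n)
    {
        int i;

        for (i = 0; i < n; i++) {
            /* a relaxed read per channel keeps the scan cheap */
            if (!atomic_load_explicit(&channels[i].pending_job,
                                      memory_order_relaxed)) {
                break;
            }
        }
        /*
         * One acquire fence after the loop pairs with the sender thread's
         * release when it clears pending_job, instead of paying for an
         * acquire on every channel probed.
         */
        atomic_thread_fence(memory_order_acquire);
        return i < n ? i : -1;   /* real code waits and retries on -1 */
    }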
* migration/multifd: Fix MultiFDSendParams.packet_num race (Peter Xu, 2024-02-05, 1 file, -2/+0)
  As reported correctly by Fabiano [1] (which, per Fabiano, traces back to Elena's initial report in Oct 2023), the way MultiFDSendParams.packet_num is assigned and stored is buggy.
  Consider two consecutive operations: (1) queue a job into multifd send thread X, then (2) queue another sync request to the same send thread X. The MultiFDSendParams.packet_num will be assigned twice, and the first assignment can already get lost.
  To avoid that, move the packet_num assignment from p->packet_num into where the thread will fill in the packet. Use atomic operations to protect the field, making sure there's no race.
  Note that atomic fetch_add() may not be good for scaling purposes; however, multifd should be fine, as the number of threads should normally not go beyond 16. Let's leave that concern for later but fix the issue first.
  There's also a trick on how to make it always work even on 32 bit hosts for the uint64_t packet number. Switching to uintptr_t as of now to simplify the case. It will cause the packet number to overflow more easily on 32 bit, but that shouldn't be a major concern for now, as 32 bit systems are not the major audience for any performance concerns like what multifd wants to address.
  We also need to move the multifd_send_state definition up, so that multifd_send_fill_packet() can reference it.
  [1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de
  Reported-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-23-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Stick with send/recv on function names (Peter Xu, 2024-02-05, 1 file, -5/+5)
  Most of the multifd code uses send/recv to represent the two sides, but some rare cases use save/load. Since send/recv is the majority, replace the save/load use cases with send/recv globally, so we reach a consensus on the naming.
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-22-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Change retval of multifd_queue_page() (Peter Xu, 2024-02-05, 1 file, -1/+1)
  Using int is overkill when there are only two options. Change it to a boolean.
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-17-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Move header prepare/fill into send_prepare() (Peter Xu, 2024-02-05, 1 file, -0/+1)
  This patch redefines the interfacing of ->send_prepare(). It further simplifies multifd_send_thread(), especially on zero copy.
  Now with the new interface, we require the hook to do all the work of preparing the IOVs to send. After it's completed, the IOVs should be ready to be dumped into the specific multifd QIOChannel later.
  So now the API looks like:
    p->pages -----------> send_prepare() -------------> IOVs
  This also prepares for the case where the input may not be p->pages at all. But that's for later.
  This patch achieves a similar goal to what Fabiano proposed here:
  https://lore.kernel.org/r/20240126221943.26628-1-farosas@suse.de
  However the send() interface may not be necessary. I'm boldly attaching a "Co-developed-by" for Fabiano.
  Co-developed-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-14-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: multifd_send_prepare_header() (Peter Xu, 2024-02-05, 1 file, -0/+8)
  Introduce a helper multifd_send_prepare_header() to set up the header packet for the multifd sender.
  It's fine to set up IOV[0] _before_ send_prepare() because the packet buffer is already ready, even if the content is yet to be filled in.
  With this helper, we can already slightly clean up the zero copy path.
  Note that I explicitly put it into multifd.h, because I want it inlined directly into multifd*.c where necessary later.
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-13-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
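A sketch of reserving iov[0] for the packet header before send_prepare() runs; the types and sizes below are stand-ins:

    #include <stddef.h>
    #include <sys/uio.h>

    typedef struct { unsigned char raw[64]; /* wire header fields ... */ } Packet;

    typedef struct {
        Packet packet;          /* buffer address is stable; contents filled later */
        struct iovec iov[16];
        unsigned int iovs_num;
    } SendChannel;

    /*
     * Reserve iov[0] for the packet header. Safe to do before send_prepare()
     * because only the buffer address and length are needed here; the
     * contents are filled in just before the write. (Inlined via the header
     * in QEMU so the per-packet cost stays negligible.)
     */
    static inline void send_prepare_header_sketch(SendChannel *p)
    {
        p->iov[0].iov_base = &p->packet;
        p->iov[0].iov_len = sizeof(p->packet);
        p->iovs_num = 1;
    }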
* migration/multifd: Rename p->num_packets and clean it up (Peter Xu, 2024-02-05, 1 file, -3/+3)
  This field, no matter whether on src or dest, is only used for debugging purposes. It could even be removed already, except that it still more or less provides some accounting on "how many packets are sent/recved for this thread". The other, more important one is called packet_num, which is embedded in the multifd packet headers (MultiFDPacket_t).
  So let's keep them for now, but make them much easier to understand, by doing the below:
  - Rename both of them to packets_sent / packets_recved; the old name (num_packets) is way too confusing when we already have MultiFDPacket_t.packets_num.
  - Avoid worrying about the "initial packet": we know we will send it, and that's good enough. Whether the accounting starts with 0 or 1 doesn't matter a great deal.
  - Move them to where we send/recv the packets. They're:
    - multifd_send_fill_packet() for senders.
    - multifd_recv_unfill_packet() for receivers.
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-10-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Separate SYNC request with normal jobs (Peter Xu, 2024-02-05, 1 file, -2/+11)
  Multifd provides a threaded model for processing jobs. On the sender side, there can be two kinds of job: (1) a list of pages to send, or (2) a sync request.
  The sync request is a very special kind of job. It never contains a page array, but only a multifd packet telling the dest side to synchronize with sent pages.
  Before this patch, both requests use the pending_job field: no matter what the request is, it bumps pending_job, and the multifd sender thread decrements it after it finishes one job.
  However, this is racy, because SYNC is special in that it needs to set MULTIFD_FLAG_SYNC in p->flags to show that this is a sync request. Consider a sequence of operations where:
  - the migration thread enqueues a job to send some pages, pending_job++ (0->1)
  - [...before the selected multifd sender thread wakes up...]
  - the migration thread enqueues another job to sync, pending_job++ (1->2), and sets p->flags=MULTIFD_FLAG_SYNC
  - the multifd sender thread wakes up and finds pending_job==2
  - it sends the 1st packet with MULTIFD_FLAG_SYNC and the list of pages
  - it sends the 2nd packet with flags==0 and no pages
  This is not expected, because MULTIFD_FLAG_SYNC should hopefully be done after all the pages are received. Meanwhile, the 2nd packet is completely useless, containing zero information.
  I didn't verify the above, but I think this issue is still benign, in that at least on the recv side we always receive pages before handling MULTIFD_FLAG_SYNC. However, that's not always guaranteed and is just tricky.
  Another reason to separate it is that using p->flags to communicate between the two threads is not clearly defined; it's very hard to read and understand why accessing p->flags is always safe (see the current implementation of multifd_send_thread(), where we tried to cache only p->flags). It doesn't need to be that complicated.
  This patch introduces pending_sync, a separate flag just to show that the requester needs a sync. Alongside, we remove the tricky caching of p->flags, because after this patch p->flags is only used by the multifd sender thread, which makes it crystal clear that accessing it is always thread safe.
  With that, we can also safely convert pending_job into a boolean, because we don't support more than one pending job anyway.
  Always use atomic ops to access both flags to make sure there is no cache effect. While at it, drop the initial setting of "pending_job = 0" because the structure is always allocated using g_new0().
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-7-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
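A sketch of the consumer side with the two request flags kept separate, using C11 atomics and stand-in helpers; the real sender thread does much more (packet fill, iovec write, wakeups):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
        atomic_bool pending_job;    /* "a payload is queued, send it" */
        atomic_bool pending_sync;   /* "emit a MULTIFD_FLAG_SYNC packet" */
    } Channel;

    static void send_payload_packet(void) { puts("send data packet"); }
    static void send_sync_packet(void)    { puts("send empty SYNC packet"); }

    static void sender_thread_iteration(Channel *c)
    {
        if (atomic_load(&c->pending_job)) {
            send_payload_packet();                 /* never carries SYNC */
            atomic_store(&c->pending_job, false);
        } else if (atomic_load(&c->pending_sync)) {
            send_sync_packet();                    /* never carries pages */
            atomic_store(&c->pending_sync, false);
        }
    }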
* migration/multifd: Drop MultiFDSendParams.normal[] array (Peter Xu, 2024-02-05, 1 file, -4/+0)
  This array is redundant when p->pages exists. Now that we extended the life of p->pages to the whole period where pending_job is set, it should be safe to always use p->pages->offset[] rather than p->normal[]. Drop the array.
  Alongside, normal_num is also redundant, being the same as p->pages->num.
  This doesn't apply to the recv side, because there's no extra buffering there, so the p->normal[] array is still needed.
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-6-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths (Peter Xu, 2024-02-05, 1 file, -2/+0)
  The multifd send side has two fields to indicate error quits:
  - MultiFDSendParams.quit
  - &multifd_send_state->exiting
  Merge them into the global one. The replacement is done by changing all p->quit checks into the global var check. The global check doesn't need any lock.
  A few more things done on top of this altogether:
  - multifd_send_terminate_threads(): move the xchg() of &multifd_send_state->exiting earlier, so as to cover the tracepoint, migrate_set_error() and migrate_set_state().
  - multifd_send_sync_main(): in the 2nd loop, add one more check over the global var to make sure we don't keep looping if QEMU has already decided to quit.
  - multifd_tls_outgoing_handshake(): use multifd_send_terminate_threads() to set the error state. That has the benefit of updating MigrationState.error with that error too, so we can persist the first error we hit in that specific channel.
  - multifd_new_send_channel_async(): take a similar approach to the above; drop the migrate_set_error() because multifd_send_terminate_threads() already covers it. Unwrap the helper multifd_new_send_channel_cleanup() along the way; it is not really needed.
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-4-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Remove QEMUFile from where it is not needed (Fabiano Rosas, 2024-01-16, 1 file, -2/+2)
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Link: https://lore.kernel.org/r/20240104142144.9680-3-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Remove MultiFDPages_t::packet_num (Fabiano Rosas, 2024-01-16, 1 file, -2/+0)
  This was introduced by commit 34c55a94b1 ("migration: Create multipage support") and never used.
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Link: https://lore.kernel.org/r/20240104142144.9680-2-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* multifd: Add the ramblock to MultiFDRecvParams (Lukas Straub, 2023-05-10, 1 file, -0/+2)
  This will be used in the next commits to add colo support to multifd.
  Signed-off-by: Lukas Straub <lukasstraub2@web.de>
  Reviewed-by: Juan Quintela <quintela@redhat.com>
  Message-Id: <88135197411df1a71d7832962b39abf60faf0021.1683572883.git.lukasstraub2@web.de>
  Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/multifd: Move load_cleanup inside incoming_state_destroy (Leonardo Bras, 2023-02-13, 1 file, -0/+1)
  Currently, running migration_incoming_state_destroy() without first running multifd_load_cleanup() will cause a yank error:
    qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. (core dumped)
  The above error happens on the target host when multifd is being used for precopy, then postcopy is triggered and the migration finishes. This will crash the VM on the target host.
  To avoid that, move multifd_load_cleanup() inside migration_incoming_state_destroy(), so that the load cleanup becomes part of the incoming state destroying process.
  Running multifd_load_cleanup() twice could become an issue, but the only scenario where it could be run twice is in process_incoming_migration_bh(), so removing that extra call is necessary.
  On the other hand, this multifd_load_cleanup() call happens way before the migration_incoming_state_destroy(), and having it happen before dirty_bitmap_mig_before_vm_start() and vm_start() may be needed. So introduce a new function multifd_load_shutdown() that will mainly stop all multifd threads and close their QIOChannels. Then use this function instead of multifd_load_cleanup() to make sure nothing else is received before dirty_bitmap_mig_before_vm_start().
  Fixes: b5eea99ec2 ("migration: Add yank feature")
  Reported-by: Li Xiaohui <xiaohli@redhat.com>
  Signed-off-by: Leonardo Bras <leobras@redhat.com>
  Reviewed-by: Juan Quintela <quintela@redhat.com>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/multifd: Change multifd_load_cleanup() signature and usage (Leonardo Bras, 2023-02-13, 1 file, -1/+1)
  Since its introduction in commit f986c3d256 ("migration: Create multifd migration threads"), multifd_load_cleanup() has never returned any value other than 0, nor set any error on errp.
  Even so, in process_incoming_migration_bh() an if clause uses its return value to decide on setting autostart = false, which will never happen.
  In order to simplify the codebase, change the multifd_load_cleanup() signature to 'void multifd_load_cleanup(void)', and for every usage remove the error handling or decisions based on a return value != 0.
  Fixes: b5eea99ec2 ("migration: Add yank feature")
  Reported-by: Li Xiaohui <xiaohli@redhat.com>
  Signed-off-by: Leonardo Bras <leobras@redhat.com>
  Reviewed-by: Juan Quintela <quintela@redhat.com>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Juan Quintela <quintela@redhat.com>