path: root/migration/multifd.c
* migration/multifd: solve zero page causing multiple page faults (Yuan Liu, 2024-04-23; 1 file, -0/+1)

  Implemented recvbitmap tracking of received pages in multifd.

  If the zero page appears for the first time in the recvbitmap, this page is not checked and set. If the zero page has already appeared in the recvbitmap, there is no need to check the data but directly set the data to 0, because it is unlikely that the zero page will be migrated multiple times.

  Signed-off-by: Yuan Liu <yuan1.liu@intel.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240401154110.2028453-2-yuan1.liu@intel.com
  [peterx: touch up the comment, as the bitmap is used outside postcopy now]
  Signed-off-by: Peter Xu <peterx@redhat.com>
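  The idea above, as a minimal standalone C sketch (illustrative names only, not the QEMU code): only memset() a received zero page the second time it shows up in the bitmap; the first time, the destination RAM is assumed to still be zero-filled, so no page fault is triggered.

      #include <stdbool.h>
      #include <string.h>
      #include <limits.h>

      #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

      static bool test_and_set_page(unsigned long *bitmap, unsigned long page)
      {
          unsigned long mask = 1UL << (page % BITS_PER_LONG);
          bool was_set = bitmap[page / BITS_PER_LONG] & mask;

          bitmap[page / BITS_PER_LONG] |= mask;
          return was_set;
      }

      static void recv_zero_page(unsigned long *recv_bitmap, void *host_page,
                                 size_t page_size, unsigned long page)
      {
          if (test_and_set_page(recv_bitmap, page)) {
              /* Page was received before and may hold stale data: clear it. */
              memset(host_page, 0, page_size);
          }
          /* First occurrence: skip the memset entirely. */
      }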
* migration/multifd: Fix clearing of mapped-ram zero pages (Fabiano Rosas, 2024-03-22; 1 file, -2/+1)

  When the zero page detection is done in the multifd threads, we need to iterate the second part of the pages->offset array and clear the file bitmap for each zero page. The piece of code we merged to do that is wrong.

  The reason this has passed all the tests is because the bitmap is initialized with zeroes already, so clearing the bits only really has an effect during live migration and when a data page goes from having data to no data.

  Fixes: 303e6f54f9 ("migration/multifd: Implement zero page transmission on the multifd thread.")
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240321201242.6009-1-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration: Revert mapped-ram multifd support to fd: URI (Fabiano Rosas, 2024-03-22; 1 file, -2/+0)

  This reverts commit decdc76772c453ff1444612e910caa0d45cd8eac in full and also the relevant migration-tests from 7a09f092834641b7a793d50a3a261073bbb404a6.

  After the addition of the new QAPI-based migration address API in 8.2 we've been converting an "fd:" URI into a SocketAddress, missing the fact that the "fd:" syntax could also be used for a plain file instead of a socket. This is a problem because the SocketAddress is part of the API, so we're effectively asking users to create a "socket" channel to pass in a plain file.

  The easiest way to fix this situation is to deprecate the usage of both SocketAddress and "fd:" when used with a plain file for migration. Since this has been possible since 8.2, we can wait until 9.1 to deprecate it.

  For 9.0, however, we should avoid adding further support for migration to a plain file using the old "fd:" syntax or the new SocketAddress API, and instead require the usage of either the old-style "file:" URI or the FileMigrationArgs::filename field of the new API with the "/dev/fdset/NN" syntax, both of which are already supported.

  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240319210941.1907-1-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Implement zero page transmission on the multifd thread. (Hao Xiang, 2024-03-11; 1 file, -14/+76)

  1. Add zero_pages field in MultiFDPacket_t.
  2. Implements the zero page detection and handling on the multifd threads for non-compression, zlib and zstd compression backends.
  3. Added a new value 'multifd' in ZeroPageDetection enumeration.
  4. Adds zero page counters and updates multifd send/receive tracing format to track the newly added counters.

  Signed-off-by: Hao Xiang <hao.xiang@bytedance.com>
  Acked-by: Markus Armbruster <armbru@redhat.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240311180015.3359271-5-hao.xiang@linux.dev
  Signed-off-by: Peter Xu <peterx@redhat.com>
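  A rough standalone C sketch of the per-thread detection step described above (illustrative names; buffer_is_zero() here is a naive stand-in for an optimized zero check): the queued offsets are partitioned so that pages with data come first and zero pages last, before the packet is built.

      #include <stdbool.h>
      #include <stddef.h>
      #include <stdint.h>

      static bool buffer_is_zero(const void *buf, size_t len)
      {
          const uint8_t *p = buf;

          for (size_t i = 0; i < len; i++) {
              if (p[i]) {
                  return false;
              }
          }
          return true;
      }

      /* Reorder offsets[] so the first *normal_num entries reference pages
       * with data and the remaining *zero_num entries are zero pages. */
      static void detect_zero_pages(uint8_t *block_host, size_t page_size,
                                    uint64_t *offsets, uint32_t num,
                                    uint32_t *normal_num, uint32_t *zero_num)
      {
          uint32_t i = 0, j = num;

          while (i < j) {
              if (buffer_is_zero(block_host + offsets[i], page_size)) {
                  uint64_t tmp = offsets[i];

                  offsets[i] = offsets[--j];
                  offsets[j] = tmp;
              } else {
                  i++;
              }
          }
          *normal_num = i;
          *zero_num = num - i;
      }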
* migration/multifd: Allow clearing of the file_bmap from multifd (Fabiano Rosas, 2024-03-11; 1 file, -1/+1)

  We currently only need to clear the mapped-ram file bitmap from the migration thread during save_zero_page.

  We're about to add support for zero page detection on the multifd thread, so allow ramblock_set_file_bmap_atomic() to also clear the bits.

  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240311180015.3359271-3-hao.xiang@linux.dev
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Don't fsync when closing QIOChannelFile (Fabiano Rosas, 2024-03-11; 1 file, -9/+19)

  Commit bc38feddeb ("io: fsync before closing a file channel") added an fsync/fdatasync at the closing point of the QIOChannelFile to ensure integrity of the migration stream in case of QEMU crash.

  The decision to do the sync at qio_channel_close() was not the best since that function runs in the main thread and the fsync can cause QEMU to hang for several minutes, depending on the migration size and disk speed.

  To fix the hang, remove the fsync from qio_channel_file_close(). At this moment, the migration code is the only user of the fsync and we're taking the tradeoff of not having a sync at all, leaving the responsibility to the upper layers.

  Fixes: bc38feddeb ("io: fsync before closing a file channel")
  Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240305195629.9922-1-farosas@suse.de
  Link: https://lore.kernel.org/r/20240305174332.2553-1-farosas@suse.de
  [peterx: add more comment to the qio_channel_close()]
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Document two places for mapped-ram (Peter Xu, 2024-03-04; 1 file, -0/+12)

  Add documentation for mapped-ram migration at two spots that may not be entirely clear.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240301091524.39900-1-peterx@redhat.com
  Cc: Prasad Pandit <ppandit@redhat.com>
  [peterx: fix two English errors per Prasad]
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Add mapped-ram support to fd: URI (Fabiano Rosas, 2024-03-01; 1 file, -0/+2)

  If we receive a file descriptor that points to a regular file, there's nothing stopping us from doing multifd migration with mapped-ram to that file.

  Enable the fd: URI to work with multifd + mapped-ram.

  Note that the fds passed into multifd are duplicated because we want to avoid cross-thread effects when doing cleanup (i.e. close(fd)). The original fd doesn't need to be duplicated because monitor_get_fd() transfers ownership to the caller.

  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Link: https://lore.kernel.org/r/20240229153017.2221-23-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
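  A minimal sketch of the duplication described above (illustrative, not QEMU code): each channel works on its own copy of the incoming fd, so a close() on one channel's descriptor during cleanup cannot pull the fd out from under another thread.

      #include <unistd.h>

      /* Give every multifd channel a private copy of the incoming fd;
       * each channel later closes only its own copy during cleanup. */
      static int channel_fd_for_thread(int incoming_fd)
      {
          return dup(incoming_fd);
      }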
* migration/multifd: Support incoming mapped-ram stream format (Fabiano Rosas, 2024-03-01; 1 file, -3/+28)

  For the incoming mapped-ram migration we need to read the ramblock headers, get the pages bitmap and send the host address of each non-zero page to the multifd channel thread for writing.

  Usage on HMP is:

    (qemu) migrate_set_capability multifd on
    (qemu) migrate_set_capability mapped-ram on
    (qemu) migrate_incoming file:migfile

  (the ram.h include needs to move because we've been previously relying on it being included from migration.c. Now file.h will start including multifd.h before migration.o is processed)

  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240229153017.2221-22-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
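  The bitmap walk described above, as a standalone C sketch (illustrative names, not the QEMU code): scan the per-ramblock file bitmap and hand the host addresses of populated pages to a recv thread, which then fills them from the file at the corresponding offsets.

      #include <stddef.h>
      #include <stdint.h>
      #include <limits.h>

      #define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

      static int test_bit(const unsigned long *map, unsigned long nr)
      {
          return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1;
      }

      /* Collect up to 'max' populated pages starting at *start; returns how
       * many were queued and advances *start for the next batch. */
      static size_t collect_pages(const unsigned long *file_bmap,
                                  unsigned long nbits, unsigned long *start,
                                  uint8_t *host_base, size_t page_size,
                                  void **host_pages, size_t max)
      {
          size_t n = 0;

          for (unsigned long bit = *start; bit < nbits && n < max; bit++) {
              if (test_bit(file_bmap, bit)) {
                  host_pages[n++] = host_base + bit * page_size;
              }
              *start = bit + 1;
          }
          return n;
      }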
* migration/multifd: Support outgoing mapped-ram stream format (Fabiano Rosas, 2024-03-01; 1 file, -2/+22)

  The new mapped-ram stream format uses a file transport and puts ram pages in the migration file at their respective offsets and can be done in parallel by using the pwritev system call which takes iovecs and an offset.

  Add support for enabling the new format along with multifd to make use of the threading and page handling already in place.

  This requires multifd to stop sending headers and leave the stream format to the mapped-ram code. When it comes time to write the data, we need to call a version of qio_channel_write that can take an offset.

  Usage on HMP is:

    (qemu) stop
    (qemu) migrate_set_capability multifd on
    (qemu) migrate_set_capability mapped-ram on
    (qemu) migrate_set_parameter max-bandwidth 0
    (qemu) migrate_set_parameter multifd-channels 8
    (qemu) migrate file:migfile

  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240229153017.2221-21-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
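  The positioned-write idea behind mapped-ram, as an illustrative C sketch (not QEMU code): each page goes to its fixed offset in the file, so channels can write in parallel with pwritev() and never touch a shared file offset.

      #define _GNU_SOURCE
      #include <sys/types.h>
      #include <sys/uio.h>
      #include <stddef.h>
      #include <stdint.h>
      #include <errno.h>

      static int write_pages_at(int fd, void **pages, uint64_t *file_offsets,
                                size_t num, size_t page_size)
      {
          for (size_t i = 0; i < num; i++) {
              struct iovec iov = { .iov_base = pages[i], .iov_len = page_size };
              ssize_t ret = pwritev(fd, &iov, 1, (off_t)file_offsets[i]);

              if (ret < 0) {
                  return -errno;
              }
          }
          return 0;
      }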
* migration/multifd: Add incoming QIOChannelFile support (Fabiano Rosas, 2024-03-01; 1 file, -2/+1)

  On the receiving side we don't need to differentiate between main channel and threads, so whichever channel is defined first gets to be the main one. And since there are no packets, use the atomic channel count to index into the params array.

  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240229153017.2221-19-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Add outgoing QIOChannelFile support (Fabiano Rosas, 2024-03-01; 1 file, -3/+15)

  Allow multifd to open file-backed channels. This will be used when enabling the mapped-ram migration stream format which expects a seekable transport.

  The QIOChannel read and write methods will use the preadv/pwritev versions which don't update the file offset at each call so we can reuse the fd without re-opening for every channel.

  Contrary to the socket migration, the file migration doesn't need an asynchronous channel creation process, so expose multifd_channel_connect() and call it directly.

  Note that this is just setup code and multifd cannot yet make use of the file channels.

  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Link: https://lore.kernel.org/r/20240229153017.2221-18-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Add a wrapper for channels_created (Fabiano Rosas, 2024-03-01; 1 file, -1/+6)

  We'll need to access multifd_send_state->channels_created from outside multifd.c, so introduce a helper for that.

  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240229153017.2221-17-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Allow receiving pages without packets (Fabiano Rosas, 2024-03-01; 1 file, -7/+122)

  Currently multifd does not need to have knowledge of pages on the receiving side because all the information needed is within the packets that come in the stream.

  We're about to add support for mapped-ram migration, which cannot use packets because it expects the ramblock section in the migration file to contain only the guest pages data.

  Add a data structure to transfer pages between the ram migration code and the multifd receiving threads.

  We don't want to reuse MultiFDPages_t for two reasons:

  a) multifd threads don't really need to know about the data they're receiving.
  b) the receiving side has to be stopped to load the pages, which means we can experiment with larger granularities than page size when transferring data.

  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240229153017.2221-16-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
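  The hand-off structure the commit describes is deliberately minimal. A rough, illustrative sketch of its shape (not the exact QEMU definition):

      #include <stddef.h>

      /* The ram code fills this in; a recv thread consumes it. Keeping the
       * payload opaque means multifd needs no knowledge of what it loads. */
      typedef struct {
          void   *opaque;   /* e.g. an array of host pages to populate */
          size_t  size;     /* total bytes represented by this batch   */
      } MultiFDRecvDataSketch;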
* migration/multifd: Allow multifd without packets (Fabiano Rosas, 2024-03-01; 1 file, -61/+114)

  For the upcoming support for the new 'mapped-ram' migration stream format, we cannot use multifd packets because each write into the ramblock section in the migration file is expected to contain only the guest pages. They are written at their respective offsets relative to the ramblock section header. There is no space for the packet information and the expected gains from the new approach come partly from being able to write the pages sequentially without extraneous data in between.

  The new format also simply doesn't need the packets and all necessary information can be taken from the standard migration headers with some (future) changes to multifd code.

  Use the presence of the mapped-ram capability to decide whether to send packets.

  This only moves code under multifd_use_packets(), it has no effect for now as mapped-ram cannot yet be enabled with multifd.

  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240229153017.2221-15-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
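  The gate itself is tiny; a standalone sketch of what such a multifd_use_packets() helper boils down to (the capability flag below is a stand-in for QEMU's capability query, not the real API):

      #include <stdbool.h>

      static bool mapped_ram_enabled;   /* stand-in for the capability query */

      /* Packets are used for every transport except the mapped-ram format. */
      static bool use_packets(void)
      {
          return !mapped_ram_enabled;
      }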
* migration/multifd: Decouple recv method from pages (Fabiano Rosas, 2024-03-01; 1 file, -5/+8)

  Next patches will abstract the type of data being received by the channels, so do some cleanup now to remove references to pages and dependency on 'normal_num'.

  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240229153017.2221-14-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Cleanup multifd_recv_sync_main (Fabiano Rosas, 2024-03-01; 1 file, -4/+13)

  Some minor cleanups and documentation for multifd_recv_sync_main.

  Use thread_count as done in other parts of the code. Remove p->id from the multifd_recv_state sync, since that is global and not tied to a channel. Add documentation for the sync steps.

  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240229153017.2221-2-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Drop unnecessary helper to destroy IOC (Peter Xu, 2024-02-28; 1 file, -6/+1)

  Both socket_send_channel_destroy() and multifd_send_channel_destroy() are unnecessary wrappers to destroy an IOC, as the only thing to do is to release the final IOC reference. We have plenty of code that destroys an IOC using direct unref() already; keep that style.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240222095301.171137-6-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Cleanup outgoing_args in state destroy (Peter Xu, 2024-02-28; 1 file, -0/+1)

  outgoing_args is a global cache of the socket address to be reused in multifd. Freeing the cache in the per-channel destructor is more or less a hack. Move it to multifd_send_cleanup_state() so it only gets checked once. Use a small helper to do so because it's internal to socket.c.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240222095301.171137-5-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Make multifd_channel_connect() return void (Peter Xu, 2024-02-28; 1 file, -5/+3)

  It never fails, drop the retval and also the Error**.

  Suggested-by: Avihai Horon <avihaih@nvidia.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240222095301.171137-4-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Drop registered_yank (Peter Xu, 2024-02-28; 1 file, -4/+3)

  With a clear definition of the p->c protocol, where we only set it up if the channel is fully established (TLS or non-TLS), the registered_yank boolean becomes equivalent to "p->c != NULL".

  Drop registered_yank by checking p->c instead.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240222095301.171137-3-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Cleanup TLS iochannel referencing (Peter Xu, 2024-02-28; 1 file, -14/+23)

  Commit a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake") introduced a thread for TLS channels, which resolves the issue of blocking the main thread. However in the same commit p->c is slightly abused just to be able to pass over the pointer "p" into the thread.

  That's the major reason we'll need to conditionally free the io channel in the fault paths.

  To clean it up, use a separate structure to pass over both "p" and "tioc" in the tls handshake thread. Then we can make it a rule that p->c will never be set until the channel is completely set up. With that, we can drop the tricky conditional unref of the io channel in the error path.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240222095301.171137-2-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Release recv sem_sync earlier (Fabiano Rosas, 2024-02-28; 1 file, -6/+6)

  Now that multifd_recv_terminate_threads() is called only once, release the recv side sem_sync earlier like we do for the send side.

  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Link: https://lore.kernel.org/r/20240220224138.24759-6-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Remove p->quit from recv side (Fabiano Rosas, 2024-02-28; 1 file, -17/+24)

  Like we did on the sending side, replace the p->quit per-channel flag with a global atomic 'exiting' flag.

  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Link: https://lore.kernel.org/r/20240220224138.24759-5-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Add a synchronization point for channel creation (Fabiano Rosas, 2024-02-07; 1 file, -6/+26)

  It is possible that one of the multifd channels fails to be created at multifd_new_send_channel_async() while the rest of the channel creation tasks are still in flight.

  This could lead to multifd_save_cleanup() executing the qemu_thread_join() loop too early and not waiting for the threads which haven't been created yet, leading to the freeing of resources that the newly created threads will try to access and crash.

  Add a synchronization point after which there will be no attempts at thread creation and therefore calling multifd_save_cleanup() past that point will ensure it properly waits for the threads.

  A note about performance: Prior to this patch, if a channel took too long to be established, other channels could finish connecting first and already start taking load. Now we're bounded by the slowest-connecting channel.

  Reported-by: Avihai Horon <avihaih@nvidia.com>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240206215118.6171-7-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
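  The synchronization point described above, sketched with plain POSIX primitives (illustrative names; QEMU uses its own semaphore wrappers): every creation attempt, successful or not, posts the semaphore once, and cleanup waits for exactly one post per channel before joining threads.

      #include <semaphore.h>

      static sem_t channels_created;  /* sem_init(&channels_created, 0, 0) at setup */

      /* Called at the end of every channel-creation attempt, on both the
       * success and the failure path. */
      static void creation_attempt_done(void)
      {
          sem_post(&channels_created);
      }

      /* Cleanup: wait for every attempt to have finished before joining. */
      static void wait_all_creation_attempts(int nchannels)
      {
          for (int i = 0; i < nchannels; i++) {
              sem_wait(&channels_created);
          }
          /* From here on no thread creation can still be in flight, so the
           * join loop cannot race with a late-started thread. */
      }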
* migration/multifd: Unify multifd and TLS connection paths (Fabiano Rosas, 2024-02-07; 1 file, -43/+40)

  During multifd channel creation (multifd_new_send_channel_async) when TLS is enabled, the multifd_channel_connect function is called twice, once to create the TLS handshake thread and another time after the asynchronous TLS handshake has finished.

  This creates a slightly confusing call stack where multifd_channel_connect() is called more times than the number of channels. It also splits error handling between the two callers of multifd_channel_connect() causing some code duplication. Lastly, it gets in the way of having a single point to determine whether all channel creation tasks have been initiated.

  Refactor the code to move the reentrancy one level up at the multifd_new_send_channel_async() level, de-duplicating the error handling and allowing for the next patch to introduce a synchronization point common to all the multifd channel creation, regardless of TLS.

  Note that the previous code would never fail once p->c had been set. This patch changes this assumption, which affects refcounting, so add comments around object_unref to explain the situation.

  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240206215118.6171-6-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Move multifd_send_setup error handling in to the function (Fabiano Rosas, 2024-02-07; 1 file, -7/+17)

  Hide the error handling inside multifd_send_setup to make it cleaner for the next patch to move the function around.

  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240206215118.6171-4-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Remove p->running (Fabiano Rosas, 2024-02-07; 1 file, -15/+12)

  We currently only need p->running to avoid calling qemu_thread_join() on a non-existent thread if the thread has never been created.

  However, there are at least two bugs in this logic:

  1) On the sending side, p->running is set too early and qemu_thread_create() can be skipped due to an error during TLS handshake, leaving the flag set and leading to a crash when multifd_send_cleanup() calls qemu_thread_join().

  2) During exit, the multifd thread clears the flag while holding the channel lock. The counterpart at multifd_send_cleanup() reads the flag outside of the lock and might free the mutex while the multifd thread still has it locked.

  Fix the first issue by setting the flag right before creating the thread. Rename it from p->running to p->thread_created to clarify its usage.

  Fix the second issue by not clearing the flag at the multifd thread exit. We don't have any use for that.

  Note that these bugs are straightforward logic issues and not race conditions. There is still a gap for races to affect this code due to multifd_send_cleanup() being allowed to run concurrently with the thread creation loop. This issue is solved in the next patches.

  Cc: qemu-stable <qemu-stable@nongnu.org>
  Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake")
  Reported-by: Avihai Horon <avihaih@nvidia.com>
  Reported-by: chenyuhui5@huawei.com
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240206215118.6171-3-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
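  A minimal pthread sketch of the corrected ordering (illustrative names and types, not the QEMU code): the flag tracks only whether a join is needed, it is set once thread creation succeeds and is never cleared by the worker thread itself.

      #include <pthread.h>
      #include <stdbool.h>

      typedef struct {
          pthread_t thread;
          bool thread_created;
      } SendParamsSketch;

      static int start_channel(SendParamsSketch *p, void *(*fn)(void *))
      {
          /* Do fallible setup (e.g. TLS handshake) before touching the flag. */
          if (pthread_create(&p->thread, NULL, fn, p)) {
              return -1;              /* flag stays false: no join later */
          }
          p->thread_created = true;   /* set at creation time, never cleared */
          return 0;
      }

      static void cleanup_channel(SendParamsSketch *p)
      {
          if (p->thread_created) {
              pthread_join(p->thread, NULL);
          }
      }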
* migration/multifd: Join the TLS thread (Fabiano Rosas, 2024-02-07; 1 file, -1/+7)

  We're currently leaking the resources of the TLS thread by not joining it and also overwriting the p->thread pointer altogether.

  Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake")
  Cc: qemu-stable <qemu-stable@nongnu.org>
  Reviewed-by: Peter Xu <peterx@redhat.com>
  Signed-off-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240206215118.6171-2-farosas@suse.de
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Optimize sender side to be lockless (Peter Xu, 2024-02-06; 1 file, -25/+26)

  When reviewing my attempt to refactor send_prepare(), Fabiano suggested we try dropping the mutex in multifd code [1]. I thought about that before but I never tried to change the code. Now maybe it's time to give it a stab. This only optimizes the sender side.

  The trick here is multifd has a clear provider/consumer model, in which the migration main thread publishes requests (either pending_job/pending_sync), while the multifd sender threads are consumers. Here we don't have a lot of complicated data sharing, and the jobs can logically be submitted lockless.

  Arm the code with atomic weapons. Two things worth mentioning:

  - For multifd_send_pages(): we can use qatomic_load_acquire() when trying to find a free channel, but that's expensive if we attach one ACQUIRE per channel. Instead, keep the qatomic_read() on reading the pending_job flag as we do already, meanwhile use one smp_mb_acquire() after the loop to guarantee the memory ordering.

  - For pending_sync: it doesn't require any extra data since p->flags is now never touched, so it should be safe not to use a memory barrier. That's different from pending_job.

  Provide rich comments for all the lockless operations to state how they are paired. With that, we can remove the mutex.

  [1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de

  Suggested-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-24-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
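  The single-producer/single-consumer hand-off described above, sketched with C11 atomics (illustrative; the comment notes where QEMU's variant differs by using relaxed per-channel reads plus one acquire barrier after the loop):

      #include <stdatomic.h>
      #include <stdbool.h>

      typedef struct {
          atomic_bool pending_job;  /* true: job data belongs to the sender */
          /* ... job data, owned by whichever side the flag says ... */
      } ChannelSketch;

      /* Requester (single producer): find an idle channel and publish a job.
       * The acquire load pairs with the sender's release store below, so the
       * sender's reads of the previous job finish before we overwrite it. */
      static int queue_job(ChannelSketch *c, int nchans)
      {
          for (int i = 0; i < nchans; i++) {
              if (!atomic_load_explicit(&c[i].pending_job,
                                        memory_order_acquire)) {
                  /* ... fill in the job data for channel i ... */
                  atomic_store_explicit(&c[i].pending_job, true,
                                        memory_order_release);
                  return i;
              }
          }
          return -1;   /* all channels busy */
      }

      /* Sender thread (single consumer of its own channel). */
      static void sender_iteration(ChannelSketch *c)
      {
          if (atomic_load_explicit(&c->pending_job, memory_order_acquire)) {
              /* ... send the queued pages ... */
              atomic_store_explicit(&c->pending_job, false,
                                    memory_order_release);
          }
      }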
* migration/multifd: Fix MultiFDSendParams.packet_num race (Peter Xu, 2024-02-05; 1 file, -22/+34)

  As reported correctly by Fabiano [1] (while per Fabiano, it sourced back to Elena's initial report in Oct 2023), MultiFDSendParams.packet_num is assigned and stored in a buggy way. Consider two consecutive operations: (1) queue a job into multifd send thread X, then (2) queue another sync request to the same send thread X. Then MultiFDSendParams.packet_num will be assigned twice, and the first assignment can get lost already.

  To avoid that, we move the packet_num assignment from p->packet_num into where the thread will fill in the packet. Use atomic operations to protect the field, making sure there's no race.

  Note that atomic fetch_add() may not be good for scaling purposes, however multifd should be fine as the number of threads should normally not go beyond 16. Let's leave that concern for later but fix the issue first.

  There's also a trick on how to make it always work even on 32 bit hosts for the uint64_t packet number. Switching to uintptr_t as of now to simplify the case. It will cause the packet number to overflow more easily on 32 bit, but that shouldn't be a major concern for now as 32 bit systems are not the major audience for any performance concerns like what multifd wants to address.

  We also need to move the multifd_send_state definition up, so that multifd_send_fill_packet() can reference it.

  [1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de

  Reported-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-23-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
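  The counter idea in a tiny C11 sketch (illustrative stand-in, not the QEMU code): a single global atomic counter assigned at packet-fill time, so two back-to-back jobs for the same channel can no longer clobber each other's number.

      #include <stdatomic.h>
      #include <stdint.h>

      /* uintptr_t keeps the atomic cheap even on 32-bit hosts. */
      static atomic_uintptr_t packet_counter;

      static uint64_t next_packet_num(void)
      {
          return (uint64_t)atomic_fetch_add_explicit(&packet_counter, 1,
                                                     memory_order_relaxed);
      }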
* migration/multifd: Stick with send/recv on function names (Peter Xu, 2024-02-05; 1 file, -5/+5)

  Most of the multifd code uses send/recv to represent the two sides, but some rare cases use save/load. Since send/recv is the majority, replace the save/load use cases with send/recv globally. Now we reach a consensus on the naming.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-22-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Cleanup multifd_load_cleanup() (Peter Xu, 2024-02-05; 1 file, -22/+30)

  Use similar logic to clean up the recv side.

  Note that multifd_recv_terminate_threads() may need some similar rework like the sender side, but let's leave that for later.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-21-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Cleanup multifd_save_cleanup() (Peter Xu, 2024-02-05; 1 file, -32/+59)

  Shrink the function by moving relevant work into helpers: move the thread join()s into multifd_send_terminate_threads(), then create two more helpers to cover channel/state cleanups.

  Add a TODO entry for the thread terminate process because p->running is still buggy. We need to fix it at some point, but that's not covered yet.

  Suggested-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-20-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Rewrite multifd_queue_page() (Peter Xu, 2024-02-05; 1 file, -19/+37)

  The current multifd_queue_page() is not easy to read and follow. It is not good for a few reasons:

  - There is no helper at all to show what exactly a condition means; in short, readability is low.
  - It relies on pages->ramblock being cleared to detect an empty queue. That slightly overloads the ramblock pointer, per Fabiano [1], which I also agree with.
  - It contains a self recursion, even if not necessary.

  Rewrite this function. We add some comments to make it even clearer on what it does.

  [1] https://lore.kernel.org/r/87wmrpjzew.fsf@suse.de

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-19-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
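  A standalone C sketch of the flatter queueing logic with small helpers (illustrative names and types, not the QEMU code): flush when the queue is full or the ramblock changes, then enqueue the new page, with no recursion.

      #include <stdbool.h>
      #include <stdint.h>

      typedef struct {
          const void *ramblock;
          uint32_t num, allocated;
          uint64_t offset[512];
      } PagesSketch;

      static bool queue_empty(PagesSketch *p) { return p->num == 0; }
      static bool queue_full(PagesSketch *p)  { return p->num == p->allocated; }

      static void enqueue(PagesSketch *p, const void *block, uint64_t off)
      {
          if (queue_empty(p)) {
              p->ramblock = block;     /* first page decides the block */
          }
          p->offset[p->num++] = off;
      }

      /* Returns false only if the queue had to be flushed and that failed.
       * flush() is expected to send the batch and reset num to 0. */
      static bool queue_page(PagesSketch *p, const void *block, uint64_t off,
                             bool (*flush)(PagesSketch *))
      {
          if (!queue_empty(p) && (queue_full(p) || p->ramblock != block)) {
              if (!flush(p)) {
                  return false;
              }
          }
          enqueue(p, block, off);
          return true;
      }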
* migration/multifd: Change retval of multifd_send_pages() (Peter Xu, 2024-02-05; 1 file, -7/+8)

  Using int is an overkill when there're only two options. Change it to a boolean.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-18-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Change retval of multifd_queue_page() (Peter Xu, 2024-02-05; 1 file, -4/+5)

  Using int is an overkill when there're only two options. Change it to a boolean.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-17-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Split multifd_send_terminate_threads() (Peter Xu, 2024-02-05; 1 file, -9/+18)

  Split multifd_send_terminate_threads() into two functions:

  - multifd_send_set_error(): used when an error happened on the sender side, set error and quit state only
  - multifd_send_terminate_threads(): used only by the main thread to kick all multifd send threads out of sleep, for the last recycling.

  Use multifd_send_set_error() in the three old call sites where only the error will be set. Use multifd_send_terminate_threads() in the last one where the main thread will kick the multifd threads at last in multifd_save_cleanup().

  Both helpers will need to set quitting=1.

  Suggested-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-16-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Forbid spurious wakeups (Peter Xu, 2024-02-05; 1 file, -4/+3)

  Now multifd's logic is designed to have no spurious wakeup. I still remember a talk with Juan and he seemed to agree we should drop it now; if my memory is right, it was there because multifd used to hit that when still debugging.

  Let's drop it and see what can explode; as long as it's not reaching soft-freeze.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-15-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Move header prepare/fill into send_prepare() (Peter Xu, 2024-02-05; 1 file, -33/+28)

  This patch redefines the interfacing of ->send_prepare(). It further simplifies multifd_send_thread() especially on zero copy.

  Now with the new interface, we require the hook to do all the work for preparing the IOVs to send. After it's completed, the IOVs should be ready to be dumped into the specific multifd QIOChannel later.

  So now the API looks like:

    p->pages -----------> send_prepare() -------------> IOVs

  This also prepares for the case where the input can be extended beyond p->pages. But that's for later.

  This patch achieves a similar goal to what Fabiano used to propose here:

    https://lore.kernel.org/r/20240126221943.26628-1-farosas@suse.de

  However the send() interface may not be necessary. I'm boldly attaching a "Co-developed-by" for Fabiano.

  Co-developed-by: Fabiano Rosas <farosas@suse.de>
  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-14-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: multifd_send_prepare_header() (Peter Xu, 2024-02-05; 1 file, -8/+8)

  Introduce a helper multifd_send_prepare_header() to set up the header packet for the multifd sender.

  It's fine to set up IOV[0] _before_ send_prepare() because the packet buffer is already ready, even if the content is to be filled in.

  With this helper, we can already slightly clean up the zero copy path.

  Note that I explicitly put it into multifd.h, because I want it inlined directly into multifd*.c where necessary later.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-13-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
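  The helper is essentially a three-liner that reserves IOV[0] for the packet header; a sketch with illustrative types (the real MultiFDSendParams layout differs):

      #include <sys/uio.h>
      #include <stdint.h>

      typedef struct {
          struct iovec iov[64];
          unsigned int iovs_num;
          void *packet;
          uint32_t packet_len;
      } SendParamsSketch;

      static inline void send_prepare_header(SendParamsSketch *p)
      {
          p->iov[0].iov_base = p->packet;
          p->iov[0].iov_len = p->packet_len;
          p->iovs_num = 1;            /* data IOVs start at index 1 */
      }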
* migration/multifd: Move trace_multifd_send|recv() (Peter Xu, 2024-02-05; 1 file, -5/+6)

  Move them into fill/unfill of packets. With that, we can further cleanup the send/recv thread procedure, and remove one more temp var.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-12-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Move total_normal_pages accounting (Peter Xu, 2024-02-05; 1 file, -2/+2)

  Just like the previous patch, move the accounting for total_normal_pages on both src/dst sides into the packet fill/unfill procedures.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-11-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Rename p->num_packets and clean it up (Peter Xu, 2024-02-05; 1 file, -8/+5)

  This field, no matter whether on src or dest, is only used for debugging purposes. It could even be removed already, except that it still more or less provides some accounting on "how many packets are sent/recved for this thread". The other more important one is called packet_num, which is embedded in the multifd packet headers (MultiFDPacket_t).

  So let's keep them for now, but make them much easier to understand, by doing the below:

  - Rename both of them to packets_sent / packets_recved; the old name (num_packets) is way too confusing when we already have MultiFDPacket_t.packets_num.

  - Avoid worrying about the "initial packet": we know we will send it, that's good enough. The accounting won't matter a great deal whether it starts with 0 or with 1.

  - Move them to where we send/recv the packets. They're:
    - multifd_send_fill_packet() for senders.
    - multifd_recv_unfill_packet() for receivers.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-10-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Drop pages->num check in sender thread (Peter Xu, 2024-02-05; 1 file, -6/+7)

  Now with a split SYNC handler, we always have pages->num set for pending_job==true. Assert it instead.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-9-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Simplify locking in sender thread (Peter Xu, 2024-02-05; 1 file, -7/+16)

  The sender thread will yield the p->mutex before IO starts, trying not to block the requester thread. This may be an unnecessary lock optimization, because the requester can already read pending_job safely even without the lock, since the requester is currently the only one who can assign a task.

  Drop that lock complication on both sides:

  (1) in the sender thread, always take the mutex until the job is done
  (2) in the requester thread, check whether pending_job is clear without taking the lock

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-8-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Separate SYNC request with normal jobs (Peter Xu, 2024-02-05; 1 file, -14/+25)

  Multifd provides a threaded model for processing jobs. On the sender side, there can be two kinds of job: (1) a list of pages to send, or (2) a sync request.

  The sync request is a very special kind of job. It never contains a page array, but only a multifd packet telling the dest side to synchronize with sent pages.

  Before this patch, both requests use the pending_job field: no matter what the request is, it will boost pending_job, while the multifd sender thread will decrement it after it finishes one job.

  However this should be racy, because SYNC is special in that it needs to set p->flags with MULTIFD_FLAG_SYNC, showing that this is a sync request. Consider a sequence of operations where:

  - migration thread enqueues a job to send some pages, pending_job++ (0->1)
  - [...before the selected multifd sender thread wakes up...]
  - migration thread enqueues another job to sync, pending_job++ (1->2), sets up p->flags=MULTIFD_FLAG_SYNC
  - multifd sender thread wakes up, finds pending_job==2
  - sends the 1st packet with MULTIFD_FLAG_SYNC and the list of pages
  - sends the 2nd packet with flags==0 and no pages

  This is not expected, because MULTIFD_FLAG_SYNC should hopefully be done after all the pages are received. Meanwhile, the 2nd packet will be completely useless, containing zero information.

  I didn't verify the above, but I think this issue is still benign in that at least on the recv side we always receive pages before handling MULTIFD_FLAG_SYNC. However that's not always guaranteed and just tricky.

  One other reason I want to separate it is that using p->flags to communicate between the two threads is also not clearly defined; it's very hard to read and understand why accessing p->flags is always safe; see the current impl of multifd_send_thread() where we tried to cache only p->flags. It doesn't need to be that complicated.

  This patch introduces pending_sync, a separate flag just to show that the requester needs a sync. Alongside, we remove the tricky caching of p->flags now because after this patch p->flags should only be used by the multifd sender thread, which will be crystal clear. So it is always thread safe to access p->flags.

  With that, we can also safely convert pending_job into a boolean, because we don't support >1 pending jobs anyway.

  Always use atomic ops to access both flags to make sure there is no cache effect. While at it, drop the initial setting of "pending_job = 0" because it's always allocated using g_new0().

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-7-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
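  The split request flags in a small C11 sketch (illustrative, not the QEMU code): a page job and a sync request are published independently, so a sync can never be folded into, or detached from, a page packet by accident.

      #include <stdatomic.h>
      #include <stdbool.h>

      typedef struct {
          atomic_bool pending_job;    /* a page array is queued          */
          atomic_bool pending_sync;   /* a MULTIFD_FLAG_SYNC packet only */
      } ChannelFlagsSketch;

      static void sender_loop_body(ChannelFlagsSketch *c)
      {
          if (atomic_load_explicit(&c->pending_job, memory_order_acquire)) {
              /* ... build and send the page packet ... */
              atomic_store_explicit(&c->pending_job, false,
                                    memory_order_release);
          } else if (atomic_load_explicit(&c->pending_sync,
                                          memory_order_acquire)) {
              /* ... send an empty packet with only the SYNC flag set ... */
              atomic_store_explicit(&c->pending_sync, false,
                                    memory_order_release);
          }
      }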
* migration/multifd: Drop MultiFDSendParams.normal[] array (Peter Xu, 2024-02-05; 1 file, -20/+13)

  This array is redundant when p->pages exists. Now that we have extended the life of p->pages to the whole period where pending_job is set, it should be safe to always use p->pages->offset[] rather than p->normal[]. Drop the array.

  Alongside, normal_num is also redundant, as it is the same as p->pages->num.

  This doesn't apply to the recv side, because there's no extra buffering on the recv side, so the p->normal[] array is still needed.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-6-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Postpone reset of MultiFDPages_t (Peter Xu, 2024-02-05; 1 file, -4/+14)

  Now we reset the MultiFDPages_t object in the multifd sender thread in the middle of the sending job. That's not necessary, because the "*pages" struct will not be reused anyway until pending_job is cleared.

  Move that to the end after the job is completed, and provide a helper to reset a "*pages" object. Use that same helper when freeing the object too.

  This prepares us to keep using p->pages in the follow up patches, where we may drop p->normal[].

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-5-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>
* migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths (Peter Xu, 2024-02-05; 1 file, -52/+33)

  Multifd send side has two fields to indicate error quits:

  - MultiFDSendParams.quit
  - &multifd_send_state->exiting

  Merge them into the global one. The replacement is done by changing all p->quit checks into the global var check. The global check doesn't need any lock.

  A few more things done on top of this altogether:

  - multifd_send_terminate_threads(): Move the xchg() of &multifd_send_state->exiting up, so as to cover the tracepoint, migrate_set_error() and migrate_set_state().

  - multifd_send_sync_main(): In the 2nd loop, add one more check over the global var to make sure we don't keep looping if QEMU has already decided to quit.

  - multifd_tls_outgoing_handshake(): Use multifd_send_terminate_threads() to set the error state. That has the benefit of updating MigrationState.error to that error too, so we can persist the 1st error we hit in that specific channel.

  - multifd_new_send_channel_async(): Take a similar approach as above, drop the migrate_set_error() because multifd_send_terminate_threads() already covers that. Unwrap the helper multifd_new_send_channel_cleanup() along the way; not really needed.

  Reviewed-by: Fabiano Rosas <farosas@suse.de>
  Link: https://lore.kernel.org/r/20240202102857.110210-4-peterx@redhat.com
  Signed-off-by: Peter Xu <peterx@redhat.com>