summary refs log tree commit diff stats
path: root/migration (follow)
Commit message (Collapse)AuthorAgeFilesLines
...
* migration: Make multifd_bytes atomicJuan Quintela2023-04-243-5/+5
| | | | | | | | | | | | | | In the spirit of: commit 394d323bc3451e4d07f13341cb8817fac8dfbadd Author: Peter Xu <peterx@redhat.com> Date: Tue Oct 11 17:55:51 2022 -0400 migration: Use atomic ops properly for page accountings Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Update atomic stats out of the mutexJuan Quintela2023-04-241-2/+2
| | | | | | Reviewed-by: David Edmondson <david.edmondson@oracle.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Merge ram_counters and ram_atomic_countersJuan Quintela2023-04-244-42/+37
| | | | | | | | | | | | | | | | | Using MgrationStats as type for ram_counters mean that we didn't have to re-declare each value in another struct. The need of atomic counters have make us to create MigrationAtomicStats for this atomic counters. Create RAMStats type which is a merge of MigrationStats and MigrationAtomicStats removing unused members. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> --- Fix typos found by David Edmondson
* migration: remove extra whitespace character for code style李皆俊2023-04-241-1/+1
| | | | | | | | Fix code style. Signed-off-by: 李皆俊 <a_lijiejun@163.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* postcopy-ram: do not use qatomic_mb_readPaolo Bonzini2023-04-201-1/+1
| | | | | | | It does not even pair with a qatomic_mb_set(), so it is clearer to use load-acquire in this case; they are synonyms. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* migration: mark mixed functions that can suspendPaolo Bonzini2023-04-202-10/+10
| | | | | | | | | | | | | | | | There should be no paths from a coroutine_fn to aio_poll, however in practice coroutine_mixed_fn will call aio_poll in the !qemu_in_coroutine() path. By marking mixed functions, we can track accurately the call paths that execute entirely in coroutine context, and find more missing coroutine_fn markers. This results in more accurate checks that coroutine code does not end up blocking. If the marking were extended transitively to all functions that call these ones, static analysis could be done much more efficiently. However, this is a start and makes it possible to use vrc's path-based searches to find potential bugs where coroutine_fns call blocking functions. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* migration: fix ram_state_pending_exact()Juan Quintela2023-04-121-1/+2
| | | | | | | | | | | | | | I removed that bit on commit: commit c8df4a7aeffcb46020f610526eea621fa5b0cd47 Author: Juan Quintela <quintela@redhat.com> Date: Mon Oct 3 02:00:03 2022 +0200 migration: Split save_live_pending() into state_pending_* Fixes: c8df4a7aeffcb46020f610526eea621fa5b0cd47 Suggested-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/ram.c: Fix migration with compress enabledLukas Straub2023-04-121-13/+11
| | | | | | | | | | | | | Since ec6f3ab9, migration with compress enabled was broken, because the compress threads use a dummy QEMUFile which just acts as a buffer and that commit accidentally changed it to use the outgoing migration channel instead. Fix this by using the dummy file again in the compress threads. Signed-off-by: Lukas Straub <lukasstraub2@web.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Recover behavior of preempt channel creation for pre-7.2Peter Xu2023-04-123-2/+24
| | | | | | | | | | | | | | | | | | | In 8.0 devel window we reworked preempt channel creation, so that there'll be no race condition when the migration channel and preempt channel got established in the wrong order in commit 5655aab079. However no one noticed that the change will also be not compatible with older qemus, majorly 7.1/7.2 versions where preempt mode started to be supported. Leverage the same pre-7.2 flag introduced in the previous patch to recover the behavior hopefully before 8.0 releases, so we don't break migration when we migrate from 8.0 to older qemu binaries. Fixes: 5655aab079 ("migration: Postpone postcopy preempt channel to be after main") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Fix potential race on postcopy_qemufile_srcPeter Xu2023-04-123-8/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | postcopy_qemufile_src object should be owned by one thread, either the main thread (e.g. when at the beginning, or at the end of migration), or by the return path thread (when during a preempt enabled postcopy migration). If that's not the case the access to the object might be racy. postcopy_preempt_shutdown_file() can be potentially racy, because it's called at the end phase of migration on the main thread, however during which the return path thread hasn't yet been recycled; the recycle happens in await_return_path_close_on_source() which is after this point. It means, logically it's posslbe the main thread and the return path thread are both operating on the same qemufile. While I don't think qemufile is thread safe at all. postcopy_preempt_shutdown_file() used to be needed because that's where we send EOS to dest so that dest can safely shutdown the preempt thread. To avoid the possible race, remove this only place that a race can happen. Instead we figure out another way to safely close the preempt thread on dest. The core idea during postcopy on deciding "when to stop" is that dest will send a postcopy SHUT message to src, telling src that all data is there. Hence to shut the dest preempt thread maybe better to do it directly on dest node. This patch proposed such a way that we change postcopy_prio_thread_created into PreemptThreadStatus, so that we kick the preempt thread on dest qemu by a sequence of: mis->preempt_thread_status = PREEMPT_THREAD_QUIT; qemu_file_shutdown(mis->postcopy_qemufile_dst); While here shutdown() is probably so far the easiest way to kick preempt thread from a blocked qemu_get_be64(). Then it reads preempt_thread_status to make sure it's not a network failure but a willingness to quit the thread. We could have avoided that extra status but just rely on migration status. The problem is postcopy_ram_incoming_cleanup() is just called early enough so we're still during POSTCOPY_ACTIVE no matter what.. So just make it simple to have the status introduced. One flag x-preempt-pre-7-2 is added to keep old pre-7.2 behaviors of postcopy preempt. Fixes: 9358982744 ("migration: Send requested page directly in rp-return thread") Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/block: replace uses of blk_nb_sectors that do not check resultPaolo Bonzini2023-04-111-3/+2
| | | | | | | | | | | | | Uses of blk_nb_sectors must check whether the result is negative. Otherwise, underflow can happen. Fortunately, alloc_aio_bitmap() and bmds_aio_inflight() both have an alternative way to retrieve the number of sectors in the file. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20230407153303.391121-6-pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* *: Add missing includes of qemu/error-report.hRichard Henderson2023-03-222-0/+2
| | | | | | | | | | | | | This had been pulled in via qemu/plugin.h from hw/core/cpu.h, but that will be removed. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20230310195252.210956-5-richard.henderson@linaro.org> [AJB: add various additional cases shown by CI] Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20230315174331.2959-15-alex.bennee@linaro.org> Reviewed-by: Emilio Cota <cota@braap.org>
* migration: fix populate_vfio_infoSteve Sistare2023-03-161-1/+1
| | | | | | | | | | | | | | Include CONFIG_DEVICES so that populate_vfio_info is instantiated for CONFIG_VFIO. Without it, the 'info migrate' command never returns info about vfio. Fixes: 43bd0bf30f ("migration: Move populate_vfio_info() into a separate file") Signed-off-by: Steve Sistare <steven.sistare@oracle.com> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/multifd: correct multifd_send_thread to trace the flagsWei Wang2023-03-161-1/+2
| | | | | | | | | | | | | The p->flags could be updated via the send_prepare callback, e.g. OR-ed with MULTIFD_FLAG_ZLIB via zlib_send_prepare. Assign p->flags to the local "flags" before the send_prepare callback could only get partial of p->flags. Fix it by moving the assignment of p->flags to the local flags after the callback, so that the correct flags can be traced. Fixes: ab7cbb0b9a3b ("multifd: Make no compression operations into its own structure") Signed-off-by: Wei Wang <wei.w.wang@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/rdma: Remove deprecated variable rdma_return_pathLi Zhijian2023-03-161-2/+1
| | | | | | | | | It's no longer needed since commit 44bcfd45e98 ("migration/rdma: destination: create the return patch after the first accept") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/xbzrle: fix out-of-bounds write with axv512Matheus Tavares Bernardino2023-03-161-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xbzrle_encode_buffer_avx512() checks for overflows too scarcely in its outer loop, causing out-of-bounds writes: $ ../configure --target-list=aarch64-softmmu --enable-sanitizers --enable-avx512bw $ make tests/unit/test-xbzrle && ./tests/unit/test-xbzrle ==5518==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x62100000b100 at pc 0x561109a7714d bp 0x7ffed712a440 sp 0x7ffed712a430 WRITE of size 1 at 0x62100000b100 thread T0 #0 0x561109a7714c in uleb128_encode_small ../util/cutils.c:831 #1 0x561109b67f6a in xbzrle_encode_buffer_avx512 ../migration/xbzrle.c:275 #2 0x5611099a7428 in test_encode_decode_overflow ../tests/unit/test-xbzrle.c:153 #3 0x7fb2fb65a58d (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a58d) #4 0x7fb2fb65a333 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7a333) #5 0x7fb2fb65aa79 in g_test_run_suite (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7aa79) #6 0x7fb2fb65aa94 in g_test_run (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x7aa94) #7 0x5611099a3a23 in main ../tests/unit/test-xbzrle.c:218 #8 0x7fb2fa78c082 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x24082) #9 0x5611099a608d in _start (/qemu/build/tests/unit/test-xbzrle+0x28408d) 0x62100000b100 is located 0 bytes to the right of 4096-byte region [0x62100000a100,0x62100000b100) allocated by thread T0 here: #0 0x7fb2fb823a06 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cc:153 #1 0x7fb2fb637ef0 in g_malloc0 (/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x57ef0) Fix that by performing the overflow check in the inner loop, instead. Signed-off-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/xbzrle: use ctz64 to avoid undefined resultMatheus Tavares Bernardino2023-03-161-2/+3
| | | | | | | | | | | | | | | | __builtin_ctzll() produces undefined results when the argument is 0. This can be seen through test-xbzrle, which produces the following warning: ../migration/xbzrle.c:265: runtime error: passing zero to ctz(), which is not a valid argument Replace __builtin_ctzll() with our ctz64() wrapper which properly handles 0. Signed-off-by: Matheus Tavares Bernardino <quic_mathbern@quicinc.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/rdma: Fix return-path caseDr. David Alan Gilbert2023-03-161-3/+5
| | | | | | | | | | | | | | | | | The RDMA code has return-path handling code, but it's only enabled if postcopy is enabled; if the 'return-path' migration capability is enabled, the return path is NOT setup but the core migration code still tries to use it and breaks. Enable the RDMA return path if either postcopy or the return-path capability is enabled. bz: https://bugzilla.redhat.com/show_bug.cgi?id=2063615 Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Wait on preempt channel in preempt threadPeter Xu2023-03-161-5/+6
| | | | | | | | | | | | | | | | | | | | | | QEMU main thread will wait until dest preempt channel established during processing the LISTEN command (within the whole postcopy PACKAGED data), by waiting on the semaphore postcopy_qemufile_dst_done. That's racy, because it's possible that the dest QEMU main thread hasn't yet accept()ed the new connection when processing the LISTEN event. The sem_wait() will yield the main thread without being able to run anything else including the accept() of the new socket, which can cause deadlock within the main thread. To avoid the race, move the "wait channel" from main thread to the preempt thread right at the start. Reported-by: Peter Maydell <peter.maydell@linaro.org> Fixes: 5655aab079 ("migration: Postpone postcopy preempt channel to be after main") Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* Fix exec migration on Windows (w32+w64).John Berberian, Jr2023-03-021-0/+24
| | | | | | | | | | | | | | * Use cmd instead of /bin/sh on Windows. * Try to auto-detect cmd.exe's path, but default to a hard-coded path. Note that this will require that gspawn-win[32|64]-helper.exe and gspawn-win[32|64]-helper-console.exe are included in the Windows binary distributions (cc: Stefan Weil). Signed-off-by: "John Berberian, Jr" <jeb.study@gmail.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/colo: Improve an x-colo-lost-heartbeat error messageMarkus Armbruster2023-02-231-2/+1
| | | | | | | | | | | | | | | | | | | | The QERR_ macros are leftovers from the days of "rich" error objects. We've been trying to reduce their remaining use. Get rid of a use of QERR_FEATURE_DISABLED, and improve the somewhat imprecise error message (qemu) x_colo_lost_heartbeat Error: The feature 'colo' is not enabled to Error: VM is not in COLO mode Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230207075115.1525-12-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com>
* error: Drop superfluous #include "qapi/qmp/qerror.h"Markus Armbruster2023-02-232-2/+0
| | | | | | | | Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-Id: <20230207075115.1525-2-armbru@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Konstantin Kostiuk <kkostiuk@redhat.com>
* migration: Rename res_{postcopy,precopy}_onlyJuan Quintela2023-02-156-44/+37
| | | | | | | | | Once that res_compatible is removed, they don't make sense anymore. We remove the _only preffix. And to make things clearer we rename them to must_precopy and can_postcopy. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Remove unused res_compatibleJuan Quintela2023-02-157-24/+12
| | | | | | | Nothing assigns to it after previous commit. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: In case of postcopy, the memory ends in res_postcopy_onlyJuan Quintela2023-02-151-1/+1
| | | | | | | So remove last assignation of res_compatible. Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/block: Convert remaining DPRINTF() debug macro to trace eventsPhilippe Mathieu-Daudé2023-02-152-11/+2
| | | | | | | | | Finish the conversion from commit fe80c0241d ("migration: using trace_ to replace DPRINTF"). Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/qemu-file: Add qemu_file_get_to_fd()Avihai Horon2023-02-152-0/+35
| | | | | | | | | | | | | Add new function qemu_file_get_to_fd() that allows reading data from QEMUFile and writing it straight into a given fd. This will be used later in VFIO migration code. Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* ram: Document migration ram flagsJuan Quintela2023-02-131-6/+10
| | | | | | | | | 0x80 is RAM_SAVE_FLAG_HOOK, it is in qemu-file now. Bigger usable flag is 0x200, noticing that. We can reuse RAM_SAVe_FLAG_FULL. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/multifd: Move load_cleanup inside incoming_state_destroyLeonardo Bras2023-02-133-1/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently running migration_incoming_state_destroy() without first running multifd_load_cleanup() will cause a yank error: qemu-system-x86_64: ../util/yank.c:107: yank_unregister_instance: Assertion `QLIST_EMPTY(&entry->yankfns)' failed. (core dumped) The above error happens in the target host, when multifd is being used for precopy, and then postcopy is triggered and the migration finishes. This will crash the VM in the target host. To avoid that, move multifd_load_cleanup() inside migration_incoming_state_destroy(), so that the load cleanup becomes part of the incoming state destroying process. Running multifd_load_cleanup() twice can become an issue, though, but the only scenario it could be ran twice is on process_incoming_migration_bh(). So removing this extra call is necessary. On the other hand, this multifd_load_cleanup() call happens way before the migration_incoming_state_destroy() and having this happening before dirty_bitmap_mig_before_vm_start() and vm_start() may be a need. So introduce a new function multifd_load_shutdown() that will mainly stop all multifd threads and close their QIOChannels. Then use this function instead of multifd_load_cleanup() to make sure nothing else is received before dirty_bitmap_mig_before_vm_start(). Fixes: b5eea99ec2 ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/multifd: Join all multifd threads in order to avoid leaksLeonardo Bras2023-02-131-1/+2
| | | | | | | | | | | | | | | | | | | | | Current approach will only join threads that are still running. For the threads not joined, resources or private memory are always kept in the process space and never reclaimed before process end, and this risks serious memory leaks. This should usually not represent a big problem, since multifd migration is usually just ran at most a few times, and after it succeeds there is not much to be done before exiting the process. Yet still, it should not hurt performance to join all of them. Fixes: b5eea99ec2 ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/multifd: Remove unnecessary assignment on multifd_load_cleanup()Leonardo Bras2023-02-131-1/+0
| | | | | | | | | | | | | | | Before assigning "p->quit = true" for every multifd channel, multifd_load_cleanup() will call multifd_recv_terminate_threads() which already does the same assignment, while protected by a mutex. So there is no point doing the same assignment again. Fixes: b5eea99ec2 ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration/multifd: Change multifd_load_cleanup() signature and usageLeonardo Bras2023-02-133-15/+7
| | | | | | | | | | | | | | | | | | | | | Since it's introduction in commit f986c3d256 ("migration: Create multifd migration threads"), multifd_load_cleanup() never returned any value different than 0, neither set up any error on errp. Even though, on process_incoming_migration_bh() an if clause uses it's return value to decide on setting autostart = false, which will never happen. In order to simplify the codebase, change multifd_load_cleanup() signature to 'void multifd_load_cleanup(void)', and for every usage remove error handling or decision made based on return value != 0. Fixes: b5eea99ec2 ("migration: Add yank feature") Reported-by: Li Xiaohui <xiaohli@redhat.com> Signed-off-by: Leonardo Bras <leobras@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Postpone postcopy preempt channel to be after mainPeter Xu2023-02-115-21/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | Postcopy with preempt-mode enabled needs two channels to communicate. The order of channel establishment is not guaranteed. It can happen that the dest QEMU got the preempt channel connection request before the main channel is established, then the migration may make no progress even during precopy due to the wrong order. To fix it, create the preempt channel only if we know the main channel is established. For a general postcopy migration, we delay it until postcopy_start(), that's where we already went through some part of precopy on the main channel. To make sure dest QEMU has already established the channel, we wait until we got the first PONG received. That's something we do at the start of precopy when postcopy enabled so it's guaranteed to happen sooner or later. For a postcopy recovery, we delay it to qemu_savevm_state_resume_prepare() where we'll have round trips of data on bitmap synchronizations, which means the main channel must have been established. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Add a semaphore to count PONGsPeter Xu2023-02-112-0/+9
| | | | | | | | | This is mostly useless, but useful for us to know whether the main channel is correctly established without changing the migration protocol. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Cleanup postcopy_preempt_setup()Peter Xu2023-02-113-14/+4
| | | | | | | | | | | Since we just dropped the only case where postcopy_preempt_setup() can return an error, it doesn't need a retval anymore because it never fails. Move the preempt check to the caller, preparing it to be used elsewhere to do nothing but as simple as kicking the async connection. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Rework multi-channel checks on URIPeter Xu2023-02-114-42/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The whole idea of multi-channel checks was not properly done, IMHO. Currently we check multi-channel in a lot of places, but actually that's not needed because we only need to check it right after we get the URI and that should be it. If the URI check succeeded, we should never need to check it again because we must have it. If it check fails, we should fail immediately on either the qmp_migrate or qmp_migrate_incoming, instead of failingg it later after the connection established. Neither should we fail any set capabiliities like what we used to do here: 5ad15e8614 ("migration: allow enabling mutilfd for specific protocol only", 2021-10-19) Because logically the URI will only be set later after the capability is set, so it doesn't make a lot of sense to check the URI type when setting the capability, because we're checking the cap with an old URI passed in, and that may not even be the URI we're going to use later. This patch mostly reverted all such checks for before, dropping the variable migrate_allow_multi_channels and helpers. Instead, add a common helper to check URI for multi-channels for either qmp_migrate and qmp_migrate_incoming and that should do all the proper checks. The failure will only trigger with the "migrate" or "migrate_incoming" command, or when user specified "-incoming xxx" where "xxx" is not "defer". Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* AVX512 support for xbzrle_encode_bufferling xu2023-02-113-3/+159
| | | | | | | | | | | | | | | | This commit is the same with [PATCH v6 1/2], and provides avx512 support for xbzrle_encode_buffer function to accelerate xbzrle encoding speed. Runtime check of avx512 support and benchmark for this feature are added. Compared with C version of xbzrle_encode_buffer function, avx512 version can achieve 50%-70% performance improvement on benchmarking. In addition, if dirty data is randomly located in 4K page, the avx512 version can achieve almost 140% performance gain. Signed-off-by: ling xu <ling1.xu@intel.com> Co-authored-by: Zhou Zhao <zhou.zhao@intel.com> Co-authored-by: Jun Jin <jun.i.jin@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: I messed state_pending_exact/estimateJuan Quintela2023-02-111-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I called the helper function from the wrong top level function. This code was introduced in: commit c8df4a7aeffcb46020f610526eea621fa5b0cd47 Author: Juan Quintela <quintela@redhat.com> Date: Mon Oct 3 02:00:03 2022 +0200 migration: Split save_live_pending() into state_pending_* We split the function into to: - state_pending_estimate: We estimate the remaining state size without stopping the machine. - state pending_exact: We calculate the exact amount of remaining state. Thanks to Avihai Horon <avihaih@nvidia.com> for finding it. Fixes:c8df4a7aeffcb46020f610526eea621fa5b0cd47 When we introduced that patch, we enden calling state_pending_estimate() helper from qemu_savevm_statepending_exact() and state_pending_exact() helper from qemu_savevm_statepending_estimate() This patch fixes it. Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Make ram_save_target_page() a pointerJuan Quintela2023-02-111-4/+15
| | | | | | | We are going to create a new function for multifd latest in the series. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
* migration: Calculate ram size onceJuan Quintela2023-02-111-2/+5
| | | | | | | | We are recalculating ram size continously, when we know that it don't change during migration. Create a field in RAMState to track it. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
* migration: Split ram_bytes_total_common() in two functionsJuan Quintela2023-02-111-11/+14
| | | | | | | | | | | | It is just a big if in the middle of the function, and we need two functions anways. Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Juan Quintela <quintela@redhat.com> --- Reindent to make Phillipe happy (and CODING_STYLE)
* migration: Make find_dirty_block() return a single parameterJuan Quintela2023-02-111-15/+22
| | | | | | | | | | | | We used to return two bools, just return a single int with the following meaning: old return / again / new return false false PAGE_ALL_CLEAN false true PAGE_TRY_AGAIN true true PAGE_DIRTY_FOUND /* We don't care about again at all */ Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Simplify ram_find_and_save_block()Juan Quintela2023-02-111-11/+9
| | | | | | | | | We will need later that find_dirty_block() return errors, so simplify the loop. Signed-off-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* multifd: Remove some redundant codeLi Zhang2023-02-111-11/+4
| | | | | | | Clean up some unnecessary code Signed-off-by: Li Zhang <lizhang@suse.de> Signed-off-by: Juan Quintela <quintela@redhat.com>
* multifd: cleanup the function multifd_channel_connectLi Zhang2023-02-111-22/+21
| | | | | | | | Cleanup multifd_channel_connect Signed-off-by: Li Zhang <lizhang@suse.de> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Remove spurious filesJuan Quintela2023-02-111-1274/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I introduced spurious files on my tree during a rebase: commit ebfc57871506b3fe36cc41f69ee3ad31a34afd63 Author: Zhenzhong Duan <zhenzhong.duan@intel.com> Date: Mon Oct 17 15:53:51 2022 +0800 multifd: Fix flush of zero copy page send request Make IO channel flush call after the inflight request has been drained in multifd thread, or else we may missed to flush the inflight request. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com> To make things worse, it appears like Zhenzhong is the one to blame. for(int i=0; i < 1000000; i++) { printf("I will not do rebases when I am tired\n"); } Sorry, Juan. Reviewed-by: Cédric Le Goater <clg@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Juan Quintela <quintela@redhat.com>
* Drop duplicate #includeMarkus Armbruster2023-02-081-2/+0
| | | | | | | | | | | Tracked down with the help of scripts/clean-includes. Signed-off-by: Markus Armbruster <armbru@redhat.com> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Greg Kurz <groug@kaod.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Message-Id: <20230202133830.2152150-21-armbru@redhat.com>
* migration: save/delete migration thread infoJiang Jiacheng2023-02-062-0/+10
| | | | | | | | | To support query migration thread infomation, save and delete thread(live_migration and multifdsend) information at thread creation and finish. Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* migration: Introduce interface query-migrationthreadsJiang Jiacheng2023-02-063-0/+80
| | | | | | | | | | | Introduce interface query-migrationthreads. The interface is used to query information about migration threads and returns with migration thread's name and its id. Introduce threadinfo.c to manage threads with migration. Signed-off-by: Jiang Jiacheng <jiangjiacheng@huawei.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>
* multifd: Fix flush of zero copy page send requestZhenzhong Duan2023-02-062-4/+1278
| | | | | | | | | Make IO channel flush call after the inflight request has been drained in multifd thread, or else we may missed to flush the inflight request. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com> Reviewed-by: Juan Quintela <quintela@redhat.com> Signed-off-by: Juan Quintela <quintela@redhat.com>