focaccia-qemu - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	system/runstate: Document qemu_add_vm_change_state_handler_prio* in hdr	Philippe Mathieu-Daudé	2025-07-15	2	-30/+30
\| \| \| \| \| \| \| \| \| \| \| \|	Generally APIs to the rest of QEMU should be documented in the headers. Comments on individual functions or internal details are fine to live in the C files. Make qemu_add_vm_change_state_handler_prio[_full]() docstrings consistent by moving them from source to header. Suggested-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-Id: <20250715171920.89670-1-philmd@linaro.org>
*	system/runstate: Document qemu_add_vm_change_state_handler()	Philippe Mathieu-Daudé	2025-07-15	1	-0/+10
\| \| \| \| \| \| \| \|	Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20250703173248.44995-4-philmd@linaro.org>
*	accel/hvf: Implement AccelClass::get_vcpu_stats() handler	Philippe Mathieu-Daudé	2025-07-15	1	-0/+25
\| \| \| \| \| \| \| \|	Co-developed-by: Mads Ynddal <mads@ynddal.dk> Signed-off-by: Mads Ynddal <mads@ynddal.dk> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250715104015.72663-8-philmd@linaro.org>
*	accel/tcg: Implement AccelClass::get_stats() handler	Philippe Mathieu-Daudé	2025-07-15	3	-2/+10
\| \| \| \| \| \| \| \| \| \|	Factor tcg_get_stats() out of tcg_dump_stats(), passing the current accelerator argument to match the AccelClass::get_stats() prototype. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250715140048.84942-7-philmd@linaro.org>
*	accel/tcg: Propagate AccelState to dump_accel_info()	Philippe Mathieu-Daudé	2025-07-15	4	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \|	Declare tcg_dump_stats() in "tcg/tcg.h" so it can be used out of accel/tcg/, like by {bsd,linux}-user. Next commit will register the TCG AccelClass::get_stats handler, which expects a AccelState, so propagate it to dump_accel_info(). Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20250715140048.84942-6-philmd@linaro.org>
*	accel/system: Add 'info accel' on human monitor	Philippe Mathieu-Daudé	2025-07-15	2	-0/+20
\| \| \| \| \| \| \| \| \| \|	'info accel' dispatches to the AccelOpsClass::get_stats() and get_vcpu_stats() handlers. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Acked-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-Id: <20250715140048.84942-5-philmd@linaro.org>
*	accel/system: Introduce @x-accel-stats QMP command	Philippe Mathieu-Daudé	2025-07-15	6	-1/+59
\| \| \| \| \| \| \| \| \| \| \|	Unstable QMP 'x-accel-stats' dispatches to the AccelOpsClass::get_stats() and get_vcpu_stats() handlers. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20250715140048.84942-4-philmd@linaro.org>
*	accel/tcg: Extract statistic related code to tcg-stats.c	Philippe Mathieu-Daudé	2025-07-15	3	-201/+216
\| \| \| \| \| \| \| \| \| \| \| \| \|	Statistic code is not specific to system emulation (except cross-page checks) and can be used to analyze user-mode binaries. Extract statistic related code to its own file: tcg-stats.c, keeping the original LGPL-2.1-or-later license tag. Note, this code is not yet reachable by user-mode. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250715140048.84942-3-philmd@linaro.org>
*	Revert "accel/tcg: Unregister the RCU before exiting RR thread"	Philippe Mathieu-Daudé	2025-07-15	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit bc93332fe460211c2d2f4ff50e1a0e030c7b5159, which was merged prematurely, re-introducing Coverity CID 1547782 (unreachable code). Reported-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250715104015.72663-2-philmd@linaro.org>
*	accel: Extract AccelClass definition to 'accel/accel-ops.h'	Philippe Mathieu-Daudé	2025-07-15	19	-39/+71
\| \| \| \| \| \| \| \| \| \| \|	Only accelerator implementations (and the common accelator code) need to know about AccelClass internals. Move the definition out but forward declare AccelState and AccelClass. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250703173248.44995-39-philmd@linaro.org>
*	accel: Rename 'system/accel-ops.h' -> 'accel/accel-cpu-ops.h'	Philippe Mathieu-Daudé	2025-07-15	12	-15/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Unfortunately "system/accel-ops.h" handlers are not only system-specific. For example, the cpu_reset_hold() hook is part of the vCPU creation, after it is realized. Mechanical rename to drop 'system' using: $ sed -i -e s_system/accel-ops.h_accel/accel-cpu-ops.h_g \ $(git grep -l system/accel-ops.h) Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20250703173248.44995-38-philmd@linaro.org>
*	accel/tcg: Do not dump NaN statistics	Philippe Mathieu-Daudé	2025-07-15	1	-7/+15
\| \| \| \| \| \|	Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Message-Id: <20250710111303.8917-1-philmd@linaro.org>
*	hw/core/machine: Display CPU model name in 'info cpus' command	Philippe Mathieu-Daudé	2025-07-15	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Display the CPU model in 'info cpus'. Example before: $ qemu-system-aarch64 -M xlnx-versal-virt -S -monitor stdio QEMU 10.0.0 monitor - type 'help' for more information (qemu) info cpus * CPU #0: thread_id=42924 CPU #1: thread_id=42924 CPU #2: thread_id=42924 CPU #3: thread_id=42924 (qemu) q and after: $ qemu-system-aarch64 -M xlnx-versal-virt -S -monitor stdio QEMU 10.0.50 monitor - type 'help' for more information (qemu) info cpus * CPU #0: thread_id=42916 model=cortex-a72 CPU #1: thread_id=42916 model=cortex-a72 CPU #2: thread_id=42916 model=cortex-r5f CPU #3: thread_id=42916 model=cortex-r5f (qemu) Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Tested-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20250715090624.52377-3-philmd@linaro.org>
*	qapi/machine: Add @qom-type field to CpuInfoFast structure	Philippe Mathieu-Daudé	2025-07-15	2	-0/+4
\| \| \| \| \| \| \| \| \| \| \|	Knowing the QOM type name of a CPU can be useful, in particular to infer its model name. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20250715090624.52377-2-philmd@linaro.org>
*	qapi/accel: Move definitions related to accelerators in their own file	Philippe Mathieu-Daudé	2025-07-15	7	-29/+44
\| \| \| \| \| \| \| \| \|	Extract KVM definitions from machine.json to accelerator.json. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Zhao Liu <zhao1.liu@intel.com> Message-Id: <20250703105540.67664-29-philmd@linaro.org>
*	hw/arm/xen-pvh: Remove unnecessary 'hw/xen/arch_hvm.h' header	Philippe Mathieu-Daudé	2025-07-15	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \|	"hw/xen/arch_hvm.h" only declares the arch_handle_ioreq() and arch_xen_set_memory() prototypes, which are not used by xen-pvh.c. Remove the unnecessary header inclusion. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> Message-Id: <20250715071528.46196-1-philmd@linaro.org>
*	hw/xen/arch_hvm: Unify x86 and ARM variants	Philippe Mathieu-Daudé	2025-07-15	3	-24/+10
\| \| \| \| \| \| \| \| \| \|	As each target declares the same prototypes, we can use a single header, removing the TARGET_XXX uses. Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Message-Id: <20250513171737.74386-1-philmd@linaro.org>
*	Merge tag 'pull-tcg-20250711' of https://gitlab.com/rth7680/qemu into staging	Stefan Hajnoczi	2025-07-13	12	-33/+137
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fpu: Process float_muladd_negate_result after rounding tcg: Use uintptr_t in tcg_malloc implementation linux-user: Hold the fd-trans lock across fork linux-user: Implement fchmodat2 syscall linux-user: Check for EFAULT failure in nanosleep linux-user: Use qemu_set_cloexec() to mark pidfd as FD_CLOEXEC linux-user/gen-vdso: Handle fseek() failure linux-user/gen-vdso: Don't read off the end of buf[] # -----BEGIN PGP SIGNATURE----- # # iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAmhxSAkdHHJpY2hhcmQu # aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV9wiQf+PrXwKj+FusE0YU1y # Lnx6+S0M/lDRCNhbgBrw7JK5WUwIfnZQuepf0vjuhoHH1rUdT1EUYdJ7Quwj9fgG # 0YcKRD8OAVKNU8I3ydtzSaJ3TZ02nbbDbwGMoD/eNXGKx0Gt5907vD4PrjT+mByG # 6QTLwuql3ahkl/Tiskk2LwbmHRe0CXiezVuzgprbNiyxrgDT8ArqCq+VJzv/wb2O # 4t6BqRDvBzRe7MUUs2B2W+hs0HW4Rfqcye/3rRnYe7HA4CTiVNqY9rwgrQqGEO0P # 3Cf+VaF6CaLz+HuHfM8rz+xBhfo+UpZYOVMXk/7VEAG6geMKTcQG1tCJYhL+xklJ # 9r4ABw== # =rD+6 # -----END PGP SIGNATURE----- # gpg: Signature made Fri 11 Jul 2025 13:21:13 EDT # gpg: using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F # gpg: issuer "richard.henderson@linaro.org" # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full] # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F * tag 'pull-tcg-20250711' of https://gitlab.com/rth7680/qemu: linux-user: Use qemu_set_cloexec() to mark pidfd as FD_CLOEXEC tcg: Use uintptr_t in tcg_malloc implementation linux-user: Hold the fd-trans lock across fork linux-user/mips/o32: Drop sa_restorer functionality linux-user/gen-vdso: Don't read off the end of buf[] linux-user/gen-vdso: Handle fseek() failure linux-user: Check for EFAULT failure in nanosleep linux-user: Implement fchmodat2 syscall fpu: Process float_muladd_negate_result after rounding Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
\| *	linux-user: Use qemu_set_cloexec() to mark pidfd as FD_CLOEXEC	Peter Maydell	2025-07-11	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the linux-user do_fork() function we try to set the FD_CLOEXEC flag on a pidfd like this: fcntl(pid_fd, F_SETFD, fcntl(pid_fd, F_GETFL) \| FD_CLOEXEC); This has two problems: (1) it doesn't check errors, which Coverity complains about (2) we use F_GETFL when we mean F_GETFD Deal with both of these problems by using qemu_set_cloexec() instead. That function will assert() if the fcntls fail, which is fine (we are inside fork_start()/fork_end() so we know nothing can mess around with our file descriptors here, and we just got this one from pidfd_open()). (As we are touching the if() statement here, we correct the indentation.) Coverity: CID 1508111 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20250711141217.1429412-1-peter.maydell@linaro.org>
\| *	tcg: Use uintptr_t in tcg_malloc implementation	Richard Henderson	2025-07-11	2	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid ubsan failure with clang-20, tcg.h:715:19: runtime error: applying non-zero offset 64 to null pointer by not using pointers. Acked-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
\| *	linux-user: Hold the fd-trans lock across fork	Geoffrey Thomas	2025-07-10	2	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If another thread is holding target_fd_trans_lock during a fork, then the lock becomes permanently locked in the child and the emulator deadlocks at the next interaction with the fd-trans table. As with other locks, acquire the lock in fork_start() and release it in fork_end(). Cc: qemu-stable@nongnu.org Signed-off-by: Geoffrey Thomas <geofft@ldpreload.com> Fixes: c093364f4d91 "fd-trans: Fix race condition on reallocation of the translation table." Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2846 Buglink: https://github.com/astral-sh/uv/issues/6105 Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20250314124742.4965-1-geofft@ldpreload.com>
\| *	linux-user/mips/o32: Drop sa_restorer functionality	Thomas Weißschuh	2025-07-10	2	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The Linux kernel dropped support for sa_restorer on O32 MIPS in the release 2.5.48 because it was unused. See the comment in arch/mips/include/uapi/asm/signal.h. Applications using the kernels UAPI headers will not reserve enough space for qemu-user to copy the sigaction.sa_restorer field to. Unrelated data may be overwritten. Align qemu-user with the kernel by also dropping sa_restorer support. Signed-off-by: Thomas Weißschuh <thomas@t-8ch.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20250709-mips-sa-restorer-v1-1-fc17120e4afe@t-8ch.de>
\| *	linux-user/gen-vdso: Don't read off the end of buf[]	Peter Maydell	2025-07-10	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In gen-vdso we load in a file and assume it's a valid ELF file. In particular we assume it's big enough to be able to read the ELF information in e_ident in the ELF header. Add a check that the total file length is at least big enough for all the e_ident bytes, which is good enough for the code in gen-vdso.c. This will catch the most obvious possible bad input file (truncated) and allow us to run the sanity checks like "not actually an ELF file" without potentially crashing. The code in elf32_process() and elf64_process() still makes assumptions about the file being well-formed, but this is OK because we only run it on the vdso binaries that we create ourselves in the build process by running the compiler. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20250710170707.1299926-3-peter.maydell@linaro.org>
\| *	linux-user/gen-vdso: Handle fseek() failure	Peter Maydell	2025-07-10	1	-2/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Coverity points out that we don't check for fseek() failure in gen-vdso.c, and so we might pass -1 to malloc(). Add the error checking. (This is a standalone executable that doesn't link against glib, so we can't do the easy thing and use g_file_get_contents().) Coverity: CID 1523742 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20250710170707.1299926-2-peter.maydell@linaro.org>
\| *	linux-user: Check for EFAULT failure in nanosleep	Peter Maydell	2025-07-10	1	-2/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	target_to_host_timespec() returns an error if the memory the guest passed us isn't actually readable. We check for this everywhere except the callsite in the TARGET_NR_nanosleep case, so this mistake was caught by a Coverity heuristic. Add the missing error checks to the calls that convert between the host and target timespec structs. Coverity: CID 1507104 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20250710164355.1296648-1-peter.maydell@linaro.org>
\| *	linux-user: Implement fchmodat2 syscall	Peter Maydell	2025-07-10	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The fchmodat2 syscall is new from Linux 6.6; it is like the existing fchmodat syscall except that it takes a flags parameter. Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3019 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20250710113123.1109461-1-peter.maydell@linaro.org>
\| *	fpu: Process float_muladd_negate_result after rounding	Richard Henderson	2025-07-10	4	-14/+82
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Changing the sign before rounding affects the correctness of the asymmetric rouding modes: float_round_up and float_round_down. Reported-by: WANG Rui <wangrui@loongson.cn> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* \|	Merge tag 'migration-20250711-pull-request' of ↵	Stefan Hajnoczi	2025-07-13	24	-321/+798
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	https://gitlab.com/farosas/qemu into staging Migration pull request - General cleanups around: postcopy, bg-snapshot, migration hooks, migration completion and formatting of 'info migrate'. - Overhaul of postcopy blocktime tracking. # -----BEGIN PGP SIGNATURE----- # # iQJEBAABCAAuFiEEqhtIsKIjJqWkw2TPx5jcdBvsMZ0FAmhxGdgQHGZhcm9zYXNA # c3VzZS5kZQAKCRDHmNx0G+wxnahoD/9uNXirlmRk3tDnhiJsiYx+HnXYPFEORSZq # zlpUyqvhQ1POp3Fa5pRf+bJ5mmPw8h8PdOR2StMpnW2Xa1OatAZj5m1uityAVWOl # EkVfZLl0j6j9HCCmE3c4dztOGIBsd9YY0GWizL05XHYZPrdX4zOpolMN4m53RwQY # HUVD6T2y9eFDnCO6MsoA9EfmkFYCRvqlS0VzTcYzQFN4H+QHlcpDfweqJpTLPa+1 # trahAN9PBuMjoewjDqwkNkf0CLaCXHszAfj6yv62Vi8Cbp9DDPywIYJKFnxspElW # Fjg1b4MdsbYZNmeKgIawzgTOL1RrojvKkoi7KWp3D7M+/ZZl9kBwQuUcBXKI7N0R # Y0GNfkkTycn18nM0JU/6QWSuVeiPbLArxQUGP1cLgvcHSSNgD9JxWbNBu5+1fFOG # Gg3qnyYatJ6xJDiCrdKqV8fwozNlm/G6b9BiCDeVq+4nA2OKQ0shiNA1GZHvVSQL # X4uAPexETdHfA/LeA2w5sgVBEw7BewBdjLntZDIFsyBnLrvqrDcU5Aav0wiHoI8U # QBC2aIpJfMLHiIQ93mVX96NltXC7KvJTIZVl3iwfiYEYCvQtTYgdJ09ELXFJYxFX # XpTTazqpmPSfuZpPRgx9YbDP/kS8Fg/PTOlPeD0T/frFgd1S6Thh6OW455PavMp8 # ht2lE4sxjA== # =vtRD # -----END PGP SIGNATURE----- # gpg: Signature made Fri 11 Jul 2025 10:04:08 EDT # gpg: using RSA key AA1B48B0A22326A5A4C364CFC798DC741BEC319D # gpg: issuer "farosas@suse.de" # gpg: Good signature from "Fabiano Rosas <farosas@suse.de>" [unknown] # gpg: aka "Fabiano Almeida Rosas <fabiano.rosas@suse.com>" [unknown] # gpg: WARNING: The key's User ID is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: AA1B 48B0 A223 26A5 A4C3 64CF C798 DC74 1BEC 319D * tag 'migration-20250711-pull-request' of https://gitlab.com/farosas/qemu: (26 commits) migration: Rename save_live_complete_precopy_thread to save_complete_precopy_thread migration/postcopy: Add latency distribution report for blocktime migration/postcopy: blocktime allows track / report non-vCPU faults migration/postcopy: Optimize blocktime fault tracking with hashtable migration/postcopy: Cleanup the total blocktime accounting migration/postcopy: Cache the tid->vcpu mapping for blocktime migration/postcopy: Initialize blocktime context only until listen migration/postcopy: Report fault latencies in blocktime migration/postcopy: Add blocktime fault counts per-vcpu migration/postcopy: Bring blocktime layer to ns level migration/postcopy: Drop PostcopyBlocktimeContext.start_time migration/postcopy: Make all blocktime vars 64bits migration/postcopy: Drop all atomic ops in blocktime feature migration/postcopy: Push blocktime start/end into page req mutex migration: Add option to set postcopy-blocktime migration/postcopy: Avoid clearing dirty bitmap for postcopy too migration: Rewrite the migration complete detect logic migration/ram: Add tracepoints for ram_save_complete() migration/ram: One less indent for ram_find_and_save_block() migration: qemu_savevm_complete*() helpers ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
\| * \|	migration: Rename save_live_complete_precopy_thread to ↵	Juraj Marcin	2025-07-11	9	-24/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	save_complete_precopy_thread Recent patch [1] renames the save_live_complete_precopy handler to save_complete, as the machine is not live in most cases when this handler is executed. The same is true also for save_live_complete_precopy_thread, therefore this patch removes the "live" keyword from the handler itself and related types to keep the naming unified. In contrast to save_complete, this handler is only executed at the end of precopy, therefore the "precopy" keyword is retained. [1]: https://lore.kernel.org/all/20250613140801.474264-7-peterx@redhat.com/ Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Cédric Le Goater <clg@redhat.com> Signed-off-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250626085235.294690-1-jmarcin@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/postcopy: Add latency distribution report for blocktime	Peter Xu	2025-07-11	4	-1/+90
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add the latency distribution too for blocktime, using order-of-two buckets. It accounts for all the faults, from either vCPU or non-vCPU threads. With prior rework, it's very easy to achieve by adding an array to account for faults in each buckets. Sample output for HMP (while for QMP it's simply an array): Postcopy Latency Distribution: [ 1 us - 2 us ]: 0 [ 2 us - 4 us ]: 0 [ 4 us - 8 us ]: 1 [ 8 us - 16 us ]: 2 [ 16 us - 32 us ]: 2 [ 32 us - 64 us ]: 3 [ 64 us - 128 us ]: 10169 [ 128 us - 256 us ]: 50151 [ 256 us - 512 us ]: 12876 [ 512 us - 1 ms ]: 97 [ 1 ms - 2 ms ]: 42 [ 2 ms - 4 ms ]: 44 [ 4 ms - 8 ms ]: 93 [ 8 ms - 16 ms ]: 138 [ 16 ms - 32 ms ]: 0 [ 32 ms - 65 ms ]: 0 [ 65 ms - 131 ms ]: 0 [ 131 ms - 262 ms ]: 0 [ 262 ms - 524 ms ]: 0 [ 524 ms - 1 sec ]: 0 [ 1 sec - 2 sec ]: 0 [ 2 sec - 4 sec ]: 0 [ 4 sec - 8 sec ]: 0 [ 8 sec - 16 sec ]: 0 Cc: Markus Armbruster <armbru@redhat.com> Acked-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-15-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/postcopy: blocktime allows track / report non-vCPU faults	Peter Xu	2025-07-11	5	-17/+67
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When used to report page fault latencies, the blocktime feature can be almost useless when KVM async page fault is enabled, because in most cases such remote fault will kickoff async page faults, then it's not trackable from blocktime layer. After all these recent rewrites to blocktime layer, it's finally so easy to also support tracking non-vCPU faults. It'll be even faster if we could always index fault records with TIDs, unfortunately we need to maintain the blocktime API which report things in vCPU indexes. Of course this can work not only for kworkers, but also any guest accesses that may reach a missing page, for example, very likely when in the QEMU main thread too (and all other threads whenever applicable). In this case, we don't care about "how long the threads are blocked", but we only care about "how long the fault will be resolved". Cc: Markus Armbruster <armbru@redhat.com> Cc: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Tested-by: Mario Casquero <mcasquer@redhat.com> Link: https://lore.kernel.org/r/20250613141217.474825-14-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/postcopy: Optimize blocktime fault tracking with hashtable	Peter Xu	2025-07-11	2	-48/+216
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, the postcopy blocktime feature maintains vCPU fault information using an array (vcpu_addr[]). It has two issues. Issue 1: Performance Concern ============================ The old algorithm was almost OK and fast on inserts, except that the lookup is slow and won't scale if there are a lot of vCPUs: when a page is copied during postcopy, mark_postcopy_blocktime_end() will walk the whole array trying to find which vCPUs are blocked by the address. So it needs constant O(N) walk for each page resolution. Alexey (the author of postcopy blocktime) mentioned the perf issue and how to optimize it in a piece of comment in the page resolution path. The comment was (interestingly..) not complete, but it's relatively clear what he wanted to say about this perf issue. Issue 2: Wrong Accounting on re-entrancies ========================================== People might think that each vCPU should only and always get one fault at a time, so that when the blocktime layer captured one fault on one vCPU, we should never see another fault message on this vCPU. It's almost correct, except in some extreme rare cases. Case 1: it's possible the fault thread processes the userfaultfd messages too fast so it can see >1 messages on one vCPU before the previous one was resolved. Case 2: it's theoretically also possible one vCPU can get even more than one message on the same fault address if a fault is retried by the kernel (e.g., handle_userfault() got interrupted before page resolution). As this info might be important, instead of using commit message, I put more details into the code as comment, when introducing an array maintaining concurrent faults on one vCPU. Please refer to the comments for details on both cases, especially case 1 which can be tricky. Case 1 sounds rare, but it can be easily reproduced locally for me when we run blocktime together with the migration-test on the vanilla postcopy. New Design ========== This patch should do almost what Alexey mentioned, but slightly differently: instead of having an array to maintain vCPU fault addresses, for each of the fault message we push a message into a hash, indexed by the fault address. With the hash, it can replace the old two structs: both the vcpu_addr[] array, and also the array to store the start time of the fault. However due to above we need one more counter array to account concurrent faults on the same vCPU - that should even be needed in the old code, it's just that the old code was buggy and it will blindly overwrite an existing entry.. now we'll start to really track everything. The hash structure might be more efficient than tree to maintain such addr->(cpu, fault_time) information, so that the insert() and lookup() paths should ideally both be ~O(1). After all, we do not need to sort. Here we need to do one remove() though after the lookup(). It could be slow but only if many vCPUs faulted exactly on the same address (so when the list of cpu entries is long), which should be unlikely. Even with that, it's still a worst case O(N) (consider 400 vCPUs faulted on the same address and how likely is it..) rather than a constant O(N) complexity. When at it, touch up the tracepoints to make them slightly more useful. One tracepoint is added when walking all the fault entries. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-13-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/postcopy: Cleanup the total blocktime accounting	Peter Xu	2025-07-11	1	-9/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The variable vcpu_total_blocktime isn't easy to follow. In reality, it wants to capture the case where all vCPUs are stopped, and now there will be some vCPUs starts running. The name now starts to conflict with vcpu_blocktime_total[], meanwhile it's actually not necessary to have the variable at all: since nobody is touching smp_cpus_down except ourselves, we can safely do the calculation at the end before decrementing smp_cpus_down. Hopefully this makes the logic easier to read, side benefit is we drop one temp var. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-12-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/postcopy: Cache the tid->vcpu mapping for blocktime	Peter Xu	2025-07-11	2	-12/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Looking up the vCPU index for each fault can be expensive when there're hundreds of vCPUs. Provide a cache for tid->vcpu instead with a hash table, then lookup from there. When at it, add another counter to record how many non-vCPU faults it gets. For example, the main thread can also access a guest page that was missing. These kind of faults are not accounted by blocktime so far. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-11-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/postcopy: Initialize blocktime context only until listen	Peter Xu	2025-07-11	1	-5/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before this patch, the blocktime context can be created very early, because postcopy_ram_supported_by_host() <- migrate_caps_check() can happen during migration object init. The trick here is the blocktime context needs system vCPU information, which seems to be possible to change after that point. I didn't verify it, but it doesn't sound right. Now move it out and initialize the context only when postcopy listen starts. That is already during a migration so it should be guaranteed the vCPU topology can never change on both sides. While at it, assert that the ctx isn't created instead this time; the old "if" trick isn't needed when we're sure it will only happen once now. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-10-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/postcopy: Report fault latencies in blocktime	Peter Xu	2025-07-11	4	-37/+102
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Blocktime so far only cares about the time one vcpu (or the whole system) got blocked. It would be also be helpful if it can also report the latency of page requests, which could be very sensitive during postcopy. Blocktime itself is sometimes not very important, especially when one thinks about KVM async PF support, which means vCPUs are literally almost not blocked at all because the guest OS is smart enough to switch to another task when a remote fault is needed. However, latency is still sensitive and important because even if the guest vCPU is running on threads that do not need a remote fault, the workload that accesses some missing page is still affected. Add two entries to the report, showing how long it takes to resolve a remote fault. Mention in the QAPI doc that this is not the real average fault latency, but only the ones that was requested for a remote fault. Unwrap get_vcpu_blocktime_list() so we don't need to walk the list twice, meanwhile add the entry checks in qtests for all postcopy tests. Cc: Markus Armbruster <armbru@redhat.com> Cc: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Tested-by: Mario Casquero <mcasquer@redhat.com> Link: https://lore.kernel.org/r/20250613141217.474825-9-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/postcopy: Add blocktime fault counts per-vcpu	Peter Xu	2025-07-11	1	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a field to count how many remote faults one vCPU has taken. So far it's still not used, but will be soon. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-8-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/postcopy: Bring blocktime layer to ns level	Peter Xu	2025-07-11	1	-12/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With 64-bit fields, it is trivial. The caution is when exposing any values in QMP, it was still declared with milliseconds (ms). Hence it's needed to do the convertion when exporting the values to existing QMP queries. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-7-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/postcopy: Drop PostcopyBlocktimeContext.start_time	Peter Xu	2025-07-11	1	-6/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now with 64bits, the offseting using start_time is not needed anymore, because the array can always remember the whole timestamp. Then drop the unused parameter in get_low_time_offset() altogether. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/postcopy: Make all blocktime vars 64bits	Peter Xu	2025-07-11	2	-27/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I am guessing it was used to be 32bits because of the atomic ops. Now all the atomic ops are gone and we're protected by a mutex instead, it's ok we can switch to 64 bits. Reasons to move over: - Allow further patches to change the unit from ms to us: with postcopy preempt mode, we're really into hundreds of microseconds level on blocktime. We'd better be able to trap those. - This also paves way for some other tricks that the original version used to avoid overflows, e.g., start_time was almost only useful before to make sure the sampled timestamp won't overflow a 32-bit field. - This prepares further reports on top of existing data collected, e.g. average page fault latencies. When average operation is taken into account, milliseconds are simply too coarse grained. When at it: - Rename page_fault_vcpu_time to vcpu_blocktime_start. - Rename vcpu_blocktime to vcpu_blocktime_total. - Touch up the trace-events to not dump blocktime ctx pointer Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/postcopy: Drop all atomic ops in blocktime feature	Peter Xu	2025-07-11	1	-13/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now with the mutex protection it's not needed anymore. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/postcopy: Push blocktime start/end into page req mutex	Peter Xu	2025-07-11	5	-39/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The postcopy blocktime feature was tricky that it used quite some atomic operations over quite a few arrays and vars, without explaining how that would be thread safe. The thread safety here is about concurrency between the fault thread and the fault resolution threads, possible to access the same chunk of data. All these atomic ops can be expensive too before knowing clearly how it works. OTOH, postcopy has one page_request_mutex used to serialize the received bitmap updates. So far it's ok - we don't yet have a lot of threads contending the lock. It might change after multifd will be supported, but that's a separate story. What is important is, with that mutex, it's pretty lightweight to move all the blocktime maintenance into the mutex critical section. It's because the blocktime layer is lightweighted: almost "remember which vcpu faulted on which address", and "ok we get some fault resolved, calculate how long it takes". It's also an optional feature for now (but I have thought of changing that, maybe in the future). Let's push the blocktime layer into the mutex, so that it's always thread-safe even without any atomic ops. To achieve that, I'll need to add a tid parameter on fault path so that it'll start to pass the faulted thread ID into deeper the stack, but not too deep. When at it, add a comment for the shared fault handler (for example, vhost-user devices running with postcopy), to mention a TODO. One reason it might not be trivial is that vhost-user's userfaultfds should be opened by vhost-user process, so it's pretty hard to control making sure the TID feature will be around. It wasn't supported before, so keep it like that for now. Now we should be as ease when everything is protected by a mutex that we always take anyway. One side effect: we can finally remove one ramblock_recv_bitmap_test() in mark_postcopy_blocktime_begin(), which was pretty weird and which also includes a weird (but maybe necessary.. but maybe not?) operation to inject a blocktime entry then quickly erase it.. When we're with the mutex, and when we make sure it's invoked after checking the receive bitmap, it's not needed anymore. Instead, we assert. As another side effect, this paves way for removing all atomic ops in all the mem accesses in blocktime layer. Note that we need a stub for mark_postcopy_blocktime_begin() for Windows builds. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration: Add option to set postcopy-blocktime	Peter Xu	2025-07-11	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a global property to allow enabling postcopy-blocktime feature. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/postcopy: Avoid clearing dirty bitmap for postcopy too	Peter Xu	2025-07-11	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is a follow up on the other commit "migration/ram: avoid to do log clear in the last round" but for postcopy. https://lore.kernel.org/r/20250514115827.3216082-1-yanfei.xu@bytedance.com I can observe more than 10% reduction of average page fault latency during postcopy phase with this optimization: Before: 268.00us (+-1.87%) After: 232.67us (+-2.01%) The test was done with a 16GB VM with 80 vCPUs, running a workload that busy random writes to 13GB memory. Cc: Yanfei Xu <yanfei.xu@bytedance.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-12-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration: Rewrite the migration complete detect logic	Peter Xu	2025-07-11	1	-15/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There're a few things off here in that logic, rewrite it. When at it, add rich comment to explain each of the decisions. Since this is very sensitive path for migration, below are the list of things changed with their reasonings. (1) Exact pending size is only needed for precopy not postcopy Fundamentally it's because "exact" version only does one more deep sync to fetch the pending results, while in postcopy's case it's never going to sync anything more than estimate as the VM on source is stopped. (2) Do _not_ rely on threshold_size anymore to decide whether postcopy should complete threshold_size was calculated from the expected downtime and bandwidth only during precopy as an efficient way to decide when to switchover. It's not sensible to rely on threshold_size in postcopy. For precopy, if switchover is decided, the migration will complete soon. It's not true for postcopy. Logically speaking, postcopy should only complete the migration if all pending data is flushed. Here it used to work because save_complete() used to implicitly contain save_live_iterate() when there's pending size. Even if that looks benign, having RAMs to be migrated in postcopy's save_complete() has other bad side effects: (a) Since save_complete() needs to be run once at a time, it means when moving RAM there's no way moving other things (rather than round-robin iterating the vmstate handlers like what we do with ITERABLE phase). Not an immediate concern, but it may stop working in the future when there're more than one iterables (e.g. vfio postcopy). (b) postcopy recovery, unfortunately, only works during ITERABLE phase. IOW, if src QEMU moves RAM during postcopy's save_complete() and network failed, then it'll crash both QEMUs... OTOH if it failed during iteration it'll still be recoverable. IOW, this change should further reduce the window QEMU split brain and crash in extreme cases. If we enable the ram_save_complete() tracepoints, we'll see this before this patch: 1267959@1748381938.294066:ram_save_complete dirty=9627, done=0 1267959@1748381938.308884:ram_save_complete dirty=0, done=1 It means in this migration there're 9627 pages migrated at complete() of postcopy phase. After this change, all the postcopy RAM should be migrated in iterable phase, rather than save_complete(): 1267959@1748381938.294066:ram_save_complete dirty=0, done=0 1267959@1748381938.308884:ram_save_complete dirty=0, done=1 (3) Adjust when to decide to switch to postcopy This shouldn't be super important, the movement makes sure there's only one in_postcopy check, then we are clear on what we do with the two completely differnt use cases (precopy v.s. postcopy). (4) Trivial touch up on threshold_size comparision Which changes: "(!pending_size \|\| pending_size < s->threshold_size)" into: "(pending_size <= s->threshold_size)" Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-11-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/ram: Add tracepoints for ram_save_complete()	Peter Xu	2025-07-11	2	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Take notes on start/end state of dirty pages for the whole system. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250613140801.474264-10-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration/ram: One less indent for ram_find_and_save_block()	Peter Xu	2025-07-11	1	-9/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The check over PAGE_DIRTY_FOUND isn't necessary. We could indent one less and assert that instead. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250613140801.474264-9-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration: qemu_savevm_complete*() helpers	Peter Xu	2025-07-11	1	-34/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since we use the same save_complete() hook for both precopy and postcopy, add a set of helpers to invoke the hook() to dedup the code. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-8-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration: Rename save_live_complete_precopy to save_complete	Peter Xu	2025-07-11	9	-20/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now after merging the precopy and postcopy version of complete() hook, rename the precopy version from save_live_complete_precopy() to save_complete(). Dropping the "live" when at it, because it's in most cases not live when happening (in precopy). No functional change intended. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-7-peterx@redhat.com [peterx: squash the fixup that covers a few more doc spots, per Juraj] Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
\| * \|	migration: Drop save_live_complete_postcopy hook	Peter Xu	2025-07-11	4	-23/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The hook is only defined in two vmstate users ("ram" and "block dirty bitmap"), meanwhile both of them define the hook exactly the same as the precopy version. Hence, this postcopy version isn't needed. No functional change intended. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>