focaccia-qemu - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Files	Lines
2025-05-19	target/riscv: Fix vslidedown with rvv_ta_all_1s	Anton Blanchard	1	-2/+4
	vslidedown always zeroes elements past vl, where it should use the tail policy. Signed-off-by: Anton Blanchard <antonb@tenstorrent.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20250414213006.3509058-1-antonb@tenstorrent.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	target/riscv: Fix the rvv reserved encoding of unmasked instructions	Max Chou	1	-9/+9
	According to the v spec, the encodings of vcomoress.vm and vector mask-register logical instructions with vm=0 are reserved. Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: Max Chou <max.chou@sifive.com> Message-ID: <20250408103938.3623486-11-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	target/riscv: rvv: Apply vext_check_input_eew to vector indexed load/store ↵	Max Chou	1	-2/+4
	instructions Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard <antonb@tenstorrent.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: Max Chou <max.chou@sifive.com> Message-ID: <20250408103938.3623486-10-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	target/riscv: rvv: Apply vext_check_input_eew to vector narrow/widen ↵	Max Chou	2	-18/+68
	instructions Handle the overlap of source registers with different EEWs. The vd of vector widening mul-add instructions is one of the input operands. Co-authored-by: Anton Blanchard <antonb@tenstorrent.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: Max Chou <max.chou@sifive.com> Message-ID: <20250408103938.3623486-9-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	target/riscv: rvv: Apply vext_check_input_eew to vector integer extension ↵	Max Chou	1	-1/+3
	instructions(OPMVV) Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard <antonb@tenstorrent.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: Max Chou <max.chou@sifive.com> Message-ID: <20250408103938.3623486-8-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	target/riscv: rvv: Apply vext_check_input_eew to vector slide ↵	Max Chou	1	-1/+3
	instructions(OPIVI/OPIVX) Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard <antonb@tenstorrent.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: Max Chou <max.chou@sifive.com> Message-ID: <20250408103938.3623486-7-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	target/riscv: rvv: Apply vext_check_input_eew to OPIVV/OPFVV(vext_check_sss) ↵	Max Chou	1	-0/+1
	instructions Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard <antonb@tenstorrent.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: Max Chou <max.chou@sifive.com> Message-ID: <20250408103938.3623486-6-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	target/riscv: rvv: Apply vext_check_input_eew to ↵	Max Chou	1	-1/+2
	OPIVI/OPIVX/OPFVF(vext_check_ss) instructions Handle the overlap of source registers with different EEWs. Co-authored-by: Anton Blanchard <antonb@tenstorrent.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: Max Chou <max.chou@sifive.com> Message-ID: <20250408103938.3623486-5-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	target/riscv: rvv: Apply vext_check_input_eew to vrgather instructions to ↵	Max Chou	1	-0/+32
	check mismatched input EEWs encoding constraint According to the v spec, a vector register cannot be used to provide source operands with more than one EEW for a single instruction. The vs1 EEW of vrgatherei16.vv is 16. Co-authored-by: Anton Blanchard <antonb@tenstorrent.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Signed-off-by: Max Chou <max.chou@sifive.com> Message-ID: <20250408103938.3623486-4-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	target/riscv: rvv: Add CHECK arg to GEN_OPFVF_WIDEN_TRANS	Anton Blanchard	1	-9/+9
	Signed-off-by: Anton Blanchard <antonb@tenstorrent.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Max Chou <max.chou@sifive.com> Signed-off-by: Max Chou <max.chou@sifive.com> Message-ID: <20250408103938.3623486-3-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	target/riscv: rvv: Source vector registers cannot overlap mask register	Anton Blanchard	1	-3/+26
	Add the relevant ISA paragraphs explaining why source (and destination) registers cannot overlap the mask register. Signed-off-by: Anton Blanchard <antonb@tenstorrent.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Max Chou <max.chou@sifive.com> Signed-off-by: Max Chou <max.chou@sifive.com> Message-ID: <20250408103938.3623486-2-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	common-user/host/riscv: use tail pseudoinstruction for calling tail	Icenowy Zheng	1	-2/+2
	The j pseudoinstruction maps to a JAL instruction, which can only handle a jump to somewhere with a signed 20-bit destination. In case of static linking and LTO'ing this easily leads to "relocation truncated to fit" error. Switch to use tail pseudoinstruction, which is the standard way to tail-call a function in medium code model (emits AUIPC+JALR). Signed-off-by: Icenowy Zheng <uwu@icenowy.me> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20250417072206.364008-1-uwu@icenowy.me> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	target/riscv: fix endless translation loop on big endian systems	Ziqiao Kong	1	-2/+4
	On big endian systems, pte and updated_pte hold big endian host data while pte_pa points to little endian target data. This means the branch at cpu_helper.c:1669 will be always satisfied and restart translation, causing an endless translation loop. The correctness of this patch can be deduced by: old_pte will hold value either from cpu_to_le32/64(pte) or cpu_to_le32/64(updated_pte), both of wich is litte endian. After that, an in-place conversion by le32/64_to_cpu(old_pte) ensures that old_pte now is in native endian, same with pte. Therefore, the endianness of the both side of if (old_pte != pte) is correct. Signed-off-by: Ziqiao Kong <ziqiaokong@gmail.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20250415080254.3667878-2-ziqiaokong@gmail.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	hw/riscv: Fix type conflict of GLib function pointers	Paolo Bonzini	1	-2/+5
	qtest_set_command_cb passed to g_once should match GThreadFunc, which it does not. But using g_once is actually unnecessary, because the function is called by riscv_harts_realize() under the Big QEMU Lock. Reported-by: Kohei Tokunaga <ktokunaga.mail@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Kohei Tokunaga <ktokunaga.mail@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20250410161722.595634-1-pbonzini@redhat.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	Expand the probe_pages helper function to handle probe flags.	Paolo Savini	1	-20/+37
	This commit expands the probe_pages helper function in target/riscv/vector_helper.c to handle also the cases in which we need access to the flags raised while probing the memory and the host address. This is done in order to provide a unified interface to probe_access and probe_access_flags. The new version of probe_pages can now act as a regular call to probe_access as before and as a call to probe_access_flags. In the latter case the user need to pass pointers to flags and host address and a boolean value for nonfault. The flags and host address will be set and made available as for a direct call to probe_access_flags. Signed-off-by: Paolo Savini <paolo.savini@embecosm.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-ID: <20250313123926.374878-2-paolo.savini@embecosm.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-05-19	target/riscv: use tcg ops generation to emulate whole reg rvv loads/stores.	Paolo Savini	1	-47/+108
	This patch replaces the use of a helper function with direct tcg ops generation in order to emulate whole register loads and stores. This is done in order to improve the performance of QEMU. We still use the helper function when vstart is not 0 at the beginning of the emulation of the whole register load or store or when we would end up generating partial loads or stores of vector elements (e.g. emulating 64 bits element loads with pairs of 32 bits loads on hosts with 32 bits registers). The latter condition ensures that we are not surprised by a trap in mid-element and consecutively that we can update vstart correctly. We also use the helper function when it performs better than tcg for specific combinations of vector length, number of fields and element size. Signed-off-by: Paolo Savini <paolo.savini@embecosm.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Richard Handerson <richard.henderson@linaro.org> Reviewed-by: Max Chou <max.chou@sifive.com> Reviewed-by: "Alex Bennée" <alex.bennee@linaro.org> Message-ID: <20250313152330.398396-2-paolo.savini@embecosm.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-05-19	hw/riscv: microchip_pfsoc: Rework documentation	Sebastian Huber	1	-81/+43
	Mention that running the HSS no longer works. Document the changed boot options. Reorder documentation blocks. Update URLs. Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20250319061342.26435-7-sebastian.huber@embedded-brains.de> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-05-19	hw/riscv: Configurable MPFS CLINT timebase freq	Sebastian Huber	2	-4/+46
	This property enables the setting of the CLINT timebase frequency through the command line, for example: -machine microchip-icicle-kit,clint-timebase-frequency=10000000 Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20250319061342.26435-6-sebastian.huber@embedded-brains.de> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-05-19	hw/riscv: Allow direct start of kernel for MPFS	Sebastian Huber	1	-17/+42
	Further customize the -bios and -kernel options behaviour for the microchip-icicle-kit machine. If "-bios none -kernel filename" is specified, then do not load a firmware and instead only load and start the kernel image. For test runs, use an approach similar to riscv_find_and_load_firmware(). Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20250319061342.26435-5-sebastian.huber@embedded-brains.de> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-05-19	hw/riscv: Make FDT optional for MPFS	Sebastian Huber	1	-28/+28
	Real-time kernels such as RTEMS or Zephyr may use a static device tree built into the kernel image. Do not require to use the -dtb option if -kernel is used for the microchip-icicle-kit machine. Issue a warning if no device tree is provided by the user since the machine does not generate one. Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20250319061342.26435-4-sebastian.huber@embedded-brains.de> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-05-19	hw/riscv: More flexible FDT placement for MPFS	Sebastian Huber	1	-2/+9
	If the kernel entry is in the high DRAM area, place the FDT into this area. Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20250319061342.26435-3-sebastian.huber@embedded-brains.de> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-05-19	hw/misc: Add MPFS system reset support	Sebastian Huber	1	-0/+7
	Signed-off-by: Sebastian Huber <sebastian.huber@embedded-brains.de> Acked-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20250319061342.26435-2-sebastian.huber@embedded-brains.de> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-05-19	Generate strided vector loads/stores with tcg nodes.	Paolo Savini	1	-50/+273
	This commit improves the performance of QEMU when emulating strided vector loads and stores by substituting the call for the helper function with the generation of equivalent TCG operations. Signed-off-by: Paolo Savini <paolo.savini@embecosm.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-ID: <20250312155547.289642-2-paolo.savini@embecosm.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-05-19	target/riscv: pmp: remove redundant check in pmp_is_locked	Loïc Lefort	1	-5/+0
	Remove useless check in pmp_is_locked, the function will return 0 in either case. Signed-off-by: Loïc Lefort <loic@rivosinc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Message-ID: <20250313193011.720075-6-loic@rivosinc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-05-19	target/riscv: pmp: exit csr writes early if value was not changed	Loïc Lefort	1	-7/+15
	Signed-off-by: Loïc Lefort <loic@rivosinc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Message-ID: <20250313193011.720075-5-loic@rivosinc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-05-19	target/riscv: pmp: fix checks on writes to pmpcfg in Smepmp MML mode	Loïc Lefort	1	-36/+43
	With Machine Mode Lockdown (mseccfg.MML) set and RLB not set, checks on pmpcfg writes would match the wrong cases of Smepmp truth table. The existing code allows writes for the following cases: - L=1, X=0: cases 8, 10, 12, 14 - L=0, RWX!=WX: cases 0-2, 4-6 This leaves cases 3, 7, 9, 11, 13, 15 for which writes are ignored. From the Smepmp specification: "Adding a rule with executable privileges that either is M-mode-only or a locked Shared-Region is not possible (...)" This description matches cases 9-11, 13 of the truth table. This commit implements an explicit check for these cases by using pmp_get_epmp_operation to convert between PMP configuration and Smepmp truth table cases. Signed-off-by: Loïc Lefort <loic@rivosinc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Message-ID: <20250313193011.720075-4-loic@rivosinc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	target/riscv: pmp: move Smepmp operation conversion into a function	Loïc Lefort	1	-10/+12
	Signed-off-by: Loïc Lefort <loic@rivosinc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Message-ID: <20250313193011.720075-3-loic@rivosinc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	target/riscv: pmp: don't allow RLB to bypass rule privileges	Loïc Lefort	1	-20/+23
	When Smepmp is supported, mseccfg.RLB allows bypassing locks when writing CSRs but should not affect interpretation of actual PMP rules. This is not the case with the current implementation where pmp_hart_has_privs calls pmp_is_locked which implements mseccfg.RLB bypass. This commit implements the correct behavior by removing mseccfg.RLB bypass from pmp_is_locked. RLB bypass when writing CSRs is implemented by adding a new pmp_is_readonly function that calls pmp_is_locked and check mseccfg.RLB. pmp_write_cfg and pmpaddr_csr_write are changed to use this new function. Signed-off-by: Loïc Lefort <loic@rivosinc.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Message-ID: <20250313193011.720075-2-loic@rivosinc.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com> Cc: qemu-stable@nongnu.org
2025-05-19	hw/riscv/virt-acpi-build: Add support for RIMT	Sunil V L	1	-0/+215
	RISC-V IO Mapping Table (RIMT) is a new static ACPI table used to communicate IOMMU information to the OS. Add support for creating this table when the IOMMU is present. The specification is frozen and available at [1]. [1] - https://github.com/riscv-non-isa/riscv-acpi-rimt/releases/download/v0.99/rimt-spec.pdf Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-ID: <20250322043139.2003479-3-sunilvl@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-05-19	hw/riscv/virt: Add the BDF of IOMMU to RISCVVirtState structure	Sunil V L	2	-0/+2
	When the IOMMU is implemented as a PCI device, its BDF is created locally in virt.c. However, the same BDF is also required in virt-acpi-build.c to support ACPI. Therefore, make this information part of the global RISCVVirtState structure so that it can be accessed outside of virt.c as well. Signed-off-by: Sunil V L <sunilvl@ventanamicro.com> Reviewed-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Message-ID: <20250322043139.2003479-2-sunilvl@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
2025-05-15	hw/nvme: fix nvme hotplugging	Klaus Jensen	1	-1/+0
	Commit cd59f50ab017 caused a regression on nvme hotplugging for devices with an implicit nvm subsystem. The nvme-subsys device was incorrectly left with being marked as non-hotpluggable. Fix this. Cc: qemu-stable@nongnu.org Reported-by: Stéphane Graber <stgraber@stgraber.org> Tested-by: Stéphane Graber <stgraber@stgraber.org> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2950 Fixes: cd59f50ab017 ("hw/nvme: always initialize a subsystem") Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
2025-05-14	mirror: Reduce I/O when destination is detect-zeroes:unmap	Eric Blake	1	-4/+9
	If we are going to punch holes in the mirror destination even for the portions where the source image is unallocated, it is nicer to treat the entire image as dirty and punch as we go, rather than pre-zeroing the entire image just to re-do I/O to the allocated portions of the image. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20250513220142.535200-2-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-05-14	tests: Add iotest mirror-sparse for recent patches	Eric Blake	2	-0/+490
	Prove that blockdev-mirror can now result in sparse raw destination files, regardless of whether the source is raw or qcow2. By making this a separate test, it was possible to test effects of individual patches for the various pieces that all have to work together for a sparse mirror to be successful. Note that ./check -file produces different job lengths than ./check -qcow2 (the test uses a filter to normalize); that's because when deciding how much of the image to be mirrored, the code looks at how much of the source image was allocated (for qcow2, this is only the written clusters; for raw, it is the entire file). But the important part is that the destination file ends up smaller than 3M, rather than the 20M it used to be before this patch series. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20250509204341.3553601-28-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-05-14	iotests/common.rc: add disk_usage function	Andrey Drobyshev	2	-5/+6
	Move the definition from iotests/250 to common.rc. This is used to detect real disk usage of sparse files. In particular, we want to use it for checking subclusters-based discards. Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-ID: <20240913163942.423050-6-andrey.drobyshev@virtuozzo.com> Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20250509204341.3553601-27-eblake@redhat.com>
2025-05-14	mirror: Skip writing zeroes when target is already zero	Eric Blake	1	-14/+93
	When mirroring, the goal is to ensure that the destination reads the same as the source; this goal is met whether the destination is sparse or fully-allocated (except when explicitly punching holes, then merely reading zero is not enough to know if it is sparse, so we still want to punch the hole). Avoiding a redundant write to zero (whether in the background because the zero cluster was marked in the dirty bitmap, or in the foreground because the guest is writing zeroes) when the destination already reads as zero makes mirroring faster, and avoids allocating the destination merely because the source reports as allocated. The effect is especially pronounced when the source is a raw file. That's because when the source is a qcow2 file, the dirty bitmap only visits the portions of the source that are allocated, which tend to be non-zero. But when the source is a raw file, bdrv_co_is_allocated_above() reports the entire file as allocated so mirror_dirty_init sets the entire dirty bitmap, and it is only later during mirror_iteration that we change to consulting the more precise bdrv_co_block_status_above() to learn where the source reads as zero. Remember that since a mirror operation can write a cluster more than once (every time the guest changes the source, the destination is also changed to keep up), and the guest can change whether a given cluster reads as zero, is discarded, or has non-zero data over the course of the mirror operation, we can't take the shortcut of relying on s->target_is_zero (which is static for the life of the job) in mirror_co_zero() to see if the destination is already zero, because that information may be stale. Any solution we use must be dynamic in the face of the guest writing or discarding a cluster while the mirror has been ongoing. We could just teach mirror_co_zero() to do a block_status() probe of the destination, and skip the zeroes if the destination already reads as zero, but we know from past experience that extra block_status() calls are not always cheap (tmpfs, anyone?), especially when they are random access rather than linear. Use of block_status() of the source by the background task in a linear fashion is not our bottleneck (it's a background task, after all); but since mirroring can be done while the source is actively being changed, we don't want a slow block_status() of the destination to occur on the hot path of the guest trying to do random-access writes to the source. So this patch takes a slightly different approach: any time we have to track dirty clusters, we can also track which clusters are known to read as zero. For sync=TOP or when we are punching holes from "detect-zeroes":"unmap", the zero bitmap starts out empty, but prevents a second write zero to a cluster that was already zero by an earlier pass; for sync=FULL when we are not punching holes, the zero bitmap starts out full if the destination reads as zero during initialization. Either way, I/O to the destination can now avoid redundant write zero to a cluster that already reads as zero, all without having to do a block_status() per write on the destination. With this patch, if I create a raw sparse destination file, connect it with QMP 'blockdev-add' while leaving it at the default "discard": "ignore", then run QMP 'blockdev-mirror' with "sync": "full", the destination remains sparse rather than fully allocated. Meanwhile, a destination image that is already fully allocated remains so unless it was opened with "detect-zeroes": "unmap". And any time writing zeroes is skipped, the job counters are not incremented. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20250509204341.3553601-26-eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-05-14	mirror: Skip pre-zeroing destination if it is already zero	Eric Blake	4	-13/+33
	When doing a sync=full mirroring, we can skip pre-zeroing the destination if it already reads as zeroes and we are not also trying to punch holes due to detect-zeroes. With this patch, there are fewer scenarios that have to pass in an explicit target-is-zero, while still resulting in a sparse destination remaining sparse. A later patch will then further improve things to skip writing to the destination for parts of the image where the source is zero; but even with just this patch, it is possible to see a difference for any source that does not report itself as fully allocated, coupled with a destination BDS that can quickly report that it already reads as zero. (For a source that reports as fully allocated, such as a file, the rest of mirror_dirty_init() still sets the entire dirty bitmap to true, so even though we avoided the pre-zeroing, we are not yet avoiding all redundant I/O). Iotest 194 detects the difference made by this patch: for a file source (where block status reports the entire image as allocated, and therefore we end up writing zeroes everywhere in the destination anyways), the job length remains the same. But for a qcow2 source and a destination that reads as all zeroes, the dirty bitmap changes to just tracking the allocated portions of the source, which results in faster completion and smaller job statistics. For the test to pass with both ./check -file and -qcow2, a new python filter is needed to mask out the now-varying job amounts (this matches the shell filters _filter_block_job_{offset,len} in common.filter). A later test will also be added which further validates expected sparseness, so it does not matter that 194 is no longer explicitly looking at how many bytes were copied. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20250509204341.3553601-25-eblake@redhat.com> Reviewed-by: Sunny Zhu <sunnyzhyy@qq.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-05-14	mirror: Drop redundant zero_target parameter	Eric Blake	4	-26/+11
	The two callers to a mirror job (drive-mirror and blockdev-mirror) set zero_target precisely when sync mode == FULL, with the one exception that drive-mirror skips zeroing the target if it was newly created and reads as zero. But given the previous patch, that exception is equally captured by target_is_zero. Meanwhile, there is another slight wrinkle, fortunately caught by iotest 185: if the caller uses "sync":"top" but the source has no backing file, the code in blockdev.c was changing sync to be FULL, but only after it had set zero_target=false. In mirror.c, prior to recent patches, this didn't matter: the only places that inspected sync were setting is_none_mode (both TOP and FULL had set that to false), and mirror_start() setting base = mode == MIRROR_SYNC_MODE_TOP ? bdrv_backing_chain_next(bs) : NULL. But now that we are passing sync around, the slammed sync mode would result in a new pre-zeroing pass even when the user had passed "sync":"top" in an effort to skip pre-zeroing. Fortunately, the assignment of base when bs has no backing chain still works out to NULL if we don't slam things. So with the forced change of sync ripped out of blockdev.c, the sync mode is passed through the full callstack unmolested, and we can now reliably reconstruct the same settings as what used to be passed in by zero_target=false, without the redundant parameter. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20250509204341.3553601-24-eblake@redhat.com> Reviewed-by: Sunny Zhu <sunnyzhyy@qq.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> [eblake: Fix regression in iotest 185] Signed-off-by: Eric Blake <eblake@redhat.com>
2025-05-14	mirror: Allow QMP override to declare target already zero	Eric Blake	5	-14/+44
	QEMU has an optimization for a just-created drive-mirror destination that is not possible for blockdev-mirror (which can't create the destination) - any time we know the destination starts life as all zeroes, we can skip a pre-zeroing pass on the destination. Recent patches have added an improved heuristic for detecting if a file contains all zeroes, and we plan to use that heuristic in upcoming patches. But since a heuristic cannot quickly detect all scenarios, and there may be cases where the caller is aware of information that QEMU cannot learn quickly, it makes sense to have a way to tell QEMU to assume facts about the destination that can make the mirror operation faster. Given our existing example of "qemu-img convert --target-is-zero", it is time to expose this override in QMP for blockdev-mirror as well. This patch results in some slight redundancy between the older s->zero_target (set any time mode==FULL and the destination image was not just created - ie. clear if drive-mirror is asking to skip the pre-zero pass) and the newly-introduced s->target_is_zero (in addition to the QMP override, it is set when drive-mirror creates the destination image); this will be cleaned up in the next patch. There is also a subtlety that we must consider. When drive-mirror is passing target_is_zero on behalf of a just-created image, we know the image is sparse (skipping the pre-zeroing keeps it that way), so it doesn't matter whether the destination also has "discard":"unmap" and "detect-zeroes":"unmap". But now that we are letting the user set the knob for target-is-zero, if the user passes a pre-existing file that is fully allocated, it is fine to leave the file fully allocated under "detect-zeroes":"on", but if the file is open with "detect-zeroes":"unmap", we should really be trying harder to punch holes in the destination for every region of zeroes copied from the source. The easiest way to do this is to still run the pre-zeroing pass (turning the entire destination file sparse before populating just the allocated portions of the source), even though that currently results in double I/O to the portions of the file that are allocated. A later patch will add further optimizations to reduce redundant zeroing I/O during the mirror operation. Since "target-is-zero":true is designed for optimizations, it is okay to silently ignore the parameter rather than erroring if the user ever sets the parameter in a scenario where the mirror job can't exploit it (for example, when doing "sync":"top" instead of "sync":"full", we can't pre-zero, so setting the parameter won't make a speed difference). Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20250509204341.3553601-23-eblake@redhat.com> Reviewed-by: Sunny Zhu <sunnyzhyy@qq.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-05-14	mirror: Pass full sync mode rather than bool to internals	Eric Blake	1	-12/+12
	Out of the five possible values for MirrorSyncMode, INCREMENTAL and BITMAP are already rejected up front in mirror_start, leaving NONE, TOP, and FULL as the remaining values that the code was collapsing into a single bool is_none_mode. Furthermore, mirror_dirty_init() is only reachable for modes TOP and FULL, as further guided by s->zero_target. However, upcoming patches want to further optimize the pre-zeroing pass of a sync=full mirror in mirror_dirty_init(), while avoiding that pass on a sync=top action. Instead of throwing away context by collapsing these two values into s->is_none_mode=false, it is better to pass s->sync_mode throughout the entire operation. For active commit, the desired semantics match sync mode TOP. Signed-off-by: Eric Blake <eblake@redhat.com> Message-ID: <20250509204341.3553601-22-eblake@redhat.com> Reviewed-by: Sunny Zhu <sunnyzhyy@qq.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2025-05-14	mirror: Minor refactoring	Eric Blake	1	-11/+11
	Commit 5791ba52 (v9.2) pre-initialized ret in mirror_dirty_init to silence a false positive compiler warning, even though in all code paths where ret is used, it was guaranteed to be reassigned beforehand. But since the function returns -errno, and -1 is not always the right errno, it's better to initialize to -EIO. An upcoming patch wants to track two bitmaps in do_sync_target_write(); this will be easier if the current variables related to the dirty bitmap are renamed. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20250509204341.3553601-21-eblake@redhat.com>
2025-05-14	iotests: Improve iotest 194 to mirror data	Eric Blake	1	-0/+1
	Mirroring a completely sparse image to a sparse destination should be practically instantaneous. It isn't yet, but the test will be more realistic if it has some non-zero to mirror as well as the holes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20250509204341.3553601-20-eblake@redhat.com>
2025-05-14	block: Add new bdrv_co_is_all_zeroes() function	Eric Blake	2	-0/+64
	There are some optimizations that require knowing if an image starts out as reading all zeroes, such as making blockdev-mirror faster by skipping the copying of source zeroes to the destination. The existing bdrv_co_is_zero_fast() is a good building block for answering this question, but it tends to give an answer of 0 for a file we just created via QMP 'blockdev-create' or similar (such as 'qemu-img create -f raw'). Why? Because file-posix.c insists on allocating a tiny header to any file rather than leaving it 100% sparse, due to some filesystems that are unable to answer alignment probes on a hole. But teaching file-posix.c to read the tiny header doesn't scale - the problem of a small header is also visible when libvirt sets up an NBD client to a just-created file on a migration destination host. So, we need a wrapper function that handles a bit more complexity in a common manner for all block devices - when the BDS is mostly a hole, but has a small non-hole header, it is still worth the time to read that header and check if it reads as all zeroes before giving up and returning a pessimistic answer. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20250509204341.3553601-19-eblake@redhat.com>
2025-05-14	block: Let bdrv_co_is_zero_fast consolidate adjacent extents	Eric Blake	1	-12/+15
	Some BDS drivers have a cap on how much block status they can supply in one query (for example, NBD talking to an older server cannot inspect more than 4G per query; and qcow2 tends to cap its answers rather than cross a cluster boundary of an L1 table). Although the existing callers of bdrv_co_is_zero_fast are not passing in that large of a 'bytes' parameter, an upcoming caller wants to query the entire image at once, and will thus benefit from being able to treat adjacent zero regions in a coalesced manner, rather than claiming the region is non-zero merely because pnum was truncated and didn't match the incoming bytes. While refactoring this into a loop, note that there is no need to assign pnum prior to calling bdrv_co_common_block_status_above() (it is guaranteed to be assigned deeper in the callstack). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20250509204341.3553601-18-eblake@redhat.com>
2025-05-14	file-posix, gluster: Handle zero block status hint better	Eric Blake	2	-2/+3
	Although the previous patch to change 'bool want_zero' into a bitmask made no semantic change, it is now time to differentiate. When the caller specifically wants to know what parts of the file read as zero, we need to use lseek and actually reporting holes, rather than short-circuiting and advertising full allocation. This change will be utilized in later patches to let mirroring optimize for the case when the destination already reads as zeroes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20250509204341.3553601-17-eblake@redhat.com>
2025-05-14	block: Expand block status mode from bool to flags	Eric Blake	25	-86/+99
	This patch is purely mechanical, changing bool want_zero into an unsigned int for bitwise-or of flags. As of this patch, all implementations are unchanged (the old want_zero==true is now mode==BDRV_WANT_PRECISE which is a superset of BDRV_WANT_ZERO); but the callers in io.c that used to pass want_zero==false are now prepared for future driver changes that can now distinguish bewteen BDRV_WANT_ZERO vs. BDRV_WANT_ALLOCATED. The next patch will actually change the file-posix driver along those lines, now that we have more-specific hints. As for the background why this patch is useful: right now, the file-posix driver recognizes that if allocation is being queried, the entire image can be reported as allocated (there is no backing file to refer to) - but this throws away information on whether the entire image reads as zero (trivially true if lseek(SEEK_HOLE) at offset 0 returns -ENXIO, a bit more complicated to prove if the raw file was created with 'qemu-img create' since we intentionally allocate a small chunk of all-zero data to help with alignment probing). Later patches will add a generic algorithm for seeing if an entire file reads as zeroes. Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20250509204341.3553601-16-eblake@redhat.com>
2025-05-14	target/arm/tcg/vfp_helper: compile file twice (system, user)	Pierrick Bouvier	2	-2/+5
	Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-id: 20250512180502.2395029-49-pierrick.bouvier@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-05-14	target/arm/tcg/arith_helper: compile file once	Pierrick Bouvier	2	-3/+4
	Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-id: 20250512180502.2395029-48-pierrick.bouvier@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-05-14	target/arm/tcg/tlb-insns: compile file once (system)	Pierrick Bouvier	2	-8/+1
	aarch64 specific code is guarded by cpu_isar_feature(aa64*), so it's safe to expose it. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-id: 20250512180502.2395029-47-pierrick.bouvier@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-05-14	target/arm/helper: restrict define_tlb_insn_regs to system target	Pierrick Bouvier	1	-0/+2
	Allows to include target/arm/tcg/tlb-insns.c only for system targets. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-id: 20250512180502.2395029-46-pierrick.bouvier@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-05-14	target/arm/tcg/tlb_helper: compile file twice (system, user)	Pierrick Bouvier	2	-2/+4
	Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org> Message-id: 20250512180502.2395029-45-pierrick.bouvier@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>