| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
Very few files use the Physical Memory API. Declare its
methods in their own header: "system/physmem.h".
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Message-Id: <20251001175448.18933-19-philmd@linaro.org>
|
| |
|
|
|
|
|
|
|
|
|
| |
The functions related to the Physical Memory API declared
in "system/ram_addr.h" do not operate on vCPU. Remove the
'cpu_' prefix.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Message-Id: <20251001175448.18933-18-philmd@linaro.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The "exec/target_page.h" header is indirectly pulled from
"system/ram_addr.h". Include it explicitly, in order to
avoid unrelated issues when refactoring "system/ram_addr.h":
accel/kvm/kvm-all.c: In function ‘kvm_init’:
accel/kvm/kvm-all.c:2636:12: error: ‘TARGET_PAGE_SIZE’ undeclared (first use in this function); did you mean ‘TARGET_PAGE_BITS’?
2636 | assert(TARGET_PAGE_SIZE <= qemu_real_host_page_size());
| ^~~~~~~~~~~~~~~~
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20251001175448.18933-3-philmd@linaro.org>
|
| |
|
|
|
|
|
|
|
| |
Keep RAM blocks API in the same header: "system/ramblock.h".
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Peter Xu <peterx@redhat.com>
Message-Id: <20251002032812.26069-4-philmd@linaro.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
valid
Current QEMU unconditionally sets the guest_memfd_offset of KVMSlot in
kvm_set_phys_mem(), which leads to the trace of kvm_set_user_memory looks:
kvm_set_user_memory AddrSpace#0 Slot#4 flags=0x2 gpa=0xe0000 size=0x20000 ua=0x7f5840de0000 guest_memfd=-1 guest_memfd_offset=0x3e0000 ret=0
It's confusing that the guest_memfd_offset has a non-zero value while
the guest_memfd is invalid (-1).
Change to only set guest_memfd_offset when guest_memfd is valid and
leave it as 0 when no valid guest_memfd.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250728115707.1374614-4-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Zero out the entire mem explicitly before it's used, to ensure the unused
feilds (pad1, pad2) are all zeros. Otherwise, it might cause problem when
the pad fields are extended by future KVM.
Fixes: ce5a983233b4 ("kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot")
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Link: https://lore.kernel.org/r/20250728115707.1374614-3-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
| |
It returns more accruate result on checking KVM_CAP_GUEST_MEMFD and
KVM_CAP_USER_MEMORY2 on VM instance instead of on KVM platform.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Link: https://lore.kernel.org/r/20250728115707.1374614-2-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
There is no reason for some accelerators to use qemu_process_cpu_events_common
(which is separated from qemu_process_cpu_events() specifically for round
robin TCG). They can also check for events directly on the first pass through
the loop, instead of setting cpu->exit_request to true.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make the code common to all accelerators: after seeing cpu->exit_request
set to true, accelerator code needs to reach qemu_process_cpu_events_common().
So for the common cases where they use qemu_process_cpu_events(), go ahead and
clear it in there. Note that the cheap qatomic_set() is enough because
at this point the thread has taken the BQL; qatomic_set_mb() is not needed.
In particular, this is the ordering of the communication between
I/O and vCPU threads is always the same.
In the I/O thread:
(a) store other memory locations that will be checked if cpu->exit_request
or cpu->interrupt_request is 1 (for example cpu->stop or cpu->work_list
for cpu->exit_request)
(b) cpu_exit(): store-release cpu->exit_request, or
(b) cpu_interrupt(): store-release cpu->interrupt_request
>>> at this point, cpu->halt_cond is broadcast and the BQL released
(c) do the accelerator-specific kick (e.g. write icount_decr for TCG,
pthread_kill for KVM, etc.)
In the vCPU thread instead the opposite order is respected:
(c) the accelerator's execution loop exits thanks to the kick
(b) then the inner execution loop checks cpu->interrupt_request
and cpu->exit_request. If needed cpu->interrupt_request is
converted into cpu->exit_request when work is needed outside
the execution loop.
(a) then the other memory locations are checked. Some may need to
be read under the BQL, but the vCPU thread may also take other
locks (e.g. for queued work items) or none at all.
qatomic_set_mb() would only be needed if the halt sleep was done
outside the BQL (though in that case, cpu->exit_request probably
would be replaced by a QemuEvent or something like that).
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
| |
Do so before extending it to the user-mode emulators, where there is no
such thing as an "I/O thread".
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CPU threads write exit_request as a "note to self" that they need to
go out to a slow path. This write happens out of the BQL and can be
a data race with another threads' cpu_exit(); use atomic accesses
consistently.
While at it, change the source argument from int ("1") to bool ("true").
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reads and writes cpu->exit_request do not use a load-acquire/store-release
pair right now, but this means that cpu_exit() may not write cpu->exit_request
after any flags that are read by the vCPU thread.
Probably everything is protected one way or the other by the BQL, because
cpu->exit_request leads to the slow path, where the CPU thread often takes
the BQL (for example, to go to sleep by waiting on the BQL-protected
cpu->halt_cond); but it's not clear, so use load-acquire/store-release
consistently.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
kvm_park_vcpu() and kvm_unpark_vcpu() is only used in kvm-all.c. Declare it
static, remove it from common header file and make it local to kvm-all.c
Signed-off-by: Ani Sinha <anisinha@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250815065445.8978-1-anisinha@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Accelerators patches
- Unify x86/arm hw/xen/arch_hvm.h header
- Move non-system-specific 'accel/accel-ops.h' and 'accel-cpu-ops.h' to accel/
- Move KVM definitions qapi/accelerator.json
- Add @qom-type field to CpuInfoFast QAPI structure
- Display CPU model name in 'info cpus' HMP command
- Introduce @x-accel-stats QMP command
- Add 'info accel' on HMP
- Improve qemu_add_vm_change_state_handler*() docstring
- Extract TCG statistic related code to tcg-stats.c
- Implement AccelClass::get_[vcpu]_stats() handlers for TCG and HVF
- Do not dump NaN in TCG statistics
- Revert incomplete "accel/tcg: Unregister the RCU before exiting RR thread"
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmh2r4UACgkQ4+MsLN6t
# wN5i6xAAkOvwFh1GmsPUdz5RxzsWoIUDvyENg6E8Axwe5tSEMRFiPjabbTQJomQg
# GZt75XIS24LZFZ+hvqrLSA+dFgXTgWv08ZE81EjwjmAMBlLCOPhCgeN6C1p8100Y
# scSvRJbP9k9lpA5K7et/1X4AkK2cZyh+LGJgCjr2Al2mbERpPueDF8fxqeohFvXQ
# nTSks4XlA0yQ06+9r49aQAiuXvgg9lDT1wIglD2HEV7vOVs/ud+yyL8+z5YMeFzx
# pSIc6wDu4PqdA46w4MZs90uTy7S/PMvBiYDEiV3tKzg0MLttvFGlT58/YjVtguTP
# mNkfwIEwQtDQzoxsFIJO7yBTlTRBs95V4aIVk3pB+Gb/bideRPIkeVQvgMSEBKj7
# N0pEXWOxfB9iIWO6b1utYpQ4uxeDOU/8DPUCit1IBbNgKTaJkJb77fboYk7NaB0K
# KEtObAk6jMatB/xr+vUFWc4sMk9wlm72w8wcQzgKZ0xV2U3d1/Y/9nS4GvI510ev
# TRQ3mKj7N319uCeId1czF6W8rillCJ2u8ZK53u+Nfp7R3PbsRSMc6IDJ1UdDUlyR
# HFcWHxbcbEGhe8SnFGab4Qd6fWChcn2EaEoAJJz+Rqv0k3zcwqccNM5waCABAjTE
# 0S22JIHePJKcpkMLGq3EOUAQuu+8Zsol7gPCLxSAMclVqPTl9ck=
# =rAav
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue 15 Jul 2025 15:44:05 EDT
# gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE
* tag 'accel-20250715' of https://github.com/philmd/qemu:
system/runstate: Document qemu_add_vm_change_state_handler_prio* in hdr
system/runstate: Document qemu_add_vm_change_state_handler()
accel/hvf: Implement AccelClass::get_vcpu_stats() handler
accel/tcg: Implement AccelClass::get_stats() handler
accel/tcg: Propagate AccelState to dump_accel_info()
accel/system: Add 'info accel' on human monitor
accel/system: Introduce @x-accel-stats QMP command
accel/tcg: Extract statistic related code to tcg-stats.c
Revert "accel/tcg: Unregister the RCU before exiting RR thread"
accel: Extract AccelClass definition to 'accel/accel-ops.h'
accel: Rename 'system/accel-ops.h' -> 'accel/accel-cpu-ops.h'
accel/tcg: Do not dump NaN statistics
hw/core/machine: Display CPU model name in 'info cpus' command
qapi/machine: Add @qom-type field to CpuInfoFast structure
qapi/accel: Move definitions related to accelerators in their own file
hw/arm/xen-pvh: Remove unnecessary 'hw/xen/arch_hvm.h' header
hw/xen/arch_hvm: Unify x86 and ARM variants
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Conflicts:
qapi/machine.json
Commit 0462da9d6b19 ("qapi: remove trivial "Returns:" sections")
removed trivial "Returns:". This caused a conflict with the move from
machine.json to accelerator.json.
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Only accelerator implementations (and the common accelator
code) need to know about AccelClass internals. Move the
definition out but forward declare AccelState and AccelClass.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250703173248.44995-39-philmd@linaro.org>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Unfortunately "system/accel-ops.h" handlers are not only
system-specific. For example, the cpu_reset_hold() hook
is part of the vCPU creation, after it is realized.
Mechanical rename to drop 'system' using:
$ sed -i -e s_system/accel-ops.h_accel/accel-cpu-ops.h_g \
$(git grep -l system/accel-ops.h)
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250703173248.44995-38-philmd@linaro.org>
|
| |/
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since commit 126e7f78036 ("kvm: require KVM_CAP_IOEVENTFD and
KVM_CAP_IOEVENTFD_ANY_LENGTH") we require at least kernel 4.5 to
be able to use KVM. Adjust the upgrade_note accordingly.
While we're at it, remove the text about kvm-kmod and the
SourceForge URL since this is not actively maintained anymore.
Fixes: 126e7f78036 ("kvm: require KVM_CAP_IOEVENTFD and KVM_CAP_IOEVENTFD_ANY_LENGTH")
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
|
| |\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Accelerators patches
- Generic API consolidation, cleanups (dead code removal, documentation added)
- Remove monitor TCG 'info opcount' and @x-query-opcount
- Have HVF / NVMM / WHPX use generic CPUState::vcpu_dirty field
- Expose nvmm_enabled() and whpx_enabled() to common code
- Have hmp_info_registers() dump vector registers
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmhnql4ACgkQ4+MsLN6t
# wN6Lfg//R4h6dyAg02hyopwb/DSI97hAsD9kap15ro1qszYrIOkJcEPoE37HDi6d
# O0Ls+8NPpJcnMwdghHvVaRGoIH2OY5ogXKo6UK1BbOn8iAGxRrT/IPVCyFbPmQoe
# Bk78Z/wne/YgCXiW4HGHSJO5sL04AQqcFYnwjisHHf3Ox8RR85LbhWqthZluta4i
# a/Y8W5UO7jfwhAl1/Zb2cU+Rv75I6xcaLQAfmbt4j+wHP52I2cjLpIYo4sCn+ULJ
# AVX4q4MKrkDrr6CYPXxdGJzYEzVn9evynVcQoRzL6bLZFMpa284AzVd3kQg9NWAb
# p1hvKJTA57q4XDoD50qVGLhP207VVSUcdm0r2ZJA2jag5ddoT+x2talz8/f6In1b
# 7BrSM/pla8x9KvTne/ko0wSL0o2dOWyig8mBxARLZWPxk+LBVs1PBZfvn+3j1pYA
# rWV25Ht4QJlUYMbe3NvEIomsVThKg8Fh3b4mEuyPM+LZ1brgmhrzJG1SF+G4fH8A
# aig/RVqgNHtajSnG4A723k2/QzlvnAiT7E3dKB5FogjTcVzFRaWFKsUb4ORqsCAz
# c/AheCJY4PP3pAnb0ODISSVviXwAXqCLbtZhDGhHNYl3C69EyGPPMiVxCaIxKDxU
# bF7AIYhRTTMyNSbnkcRS3UDO/gZS7x5/K+/YAM9akQEYADIodYM=
# =Vb39
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 04 Jul 2025 06:18:06 EDT
# gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE
* tag 'accel-20250704' of https://github.com/philmd/qemu: (31 commits)
MAINTAINERS: Add me as reviewer of overall accelerators section
monitor/hmp-cmds-target: add CPU_DUMP_VPU in hmp_info_registers()
accel: Pass AccelState argument to gdbstub_supported_sstep_flags()
accel: Remove unused MachineState argument of AccelClass::setup_post()
accel: Directly pass AccelState argument to AccelClass::has_memory()
accel/kvm: Directly pass KVMState argument to do_kvm_create_vm()
accel/kvm: Prefer local AccelState over global MachineState::accel
accel/tcg: Prefer local AccelState over global current_accel()
accel: Propagate AccelState to AccelClass::init_machine()
accel: Keep reference to AccelOpsClass in AccelClass
accel: Expose and register generic_handle_interrupt()
accel/dummy: Extract 'dummy-cpus.h' header from 'system/cpus.h'
accel/whpx: Expose whpx_enabled() to common code
accel/nvmm: Expose nvmm_enabled() to common code
accel/system: Document cpu_synchronize_state_post_init/reset()
accel/system: Document cpu_synchronize_state()
accel/kvm: Remove kvm_cpu_synchronize_state() stub
accel/whpx: Replace @dirty field by generic CPUState::vcpu_dirty field
accel/nvmm: Replace @dirty field by generic CPUState::vcpu_dirty field
accel/hvf: Replace @dirty field by generic CPUState::vcpu_dirty field
...
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| | |
In order to have AccelClass methods instrospect their state,
we need to pass AccelState by argument.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250703173248.44995-37-philmd@linaro.org>
|
| | |
| |
| |
| |
| |
| | |
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20250703173248.44995-34-philmd@linaro.org>
|
| | |
| |
| |
| |
| |
| | |
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250703173248.44995-35-philmd@linaro.org>
|
| | |
| |
| |
| |
| |
| | |
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250703173248.44995-32-philmd@linaro.org>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In order to avoid init_machine() to call current_accel(),
pass AccelState along.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20250703173248.44995-31-philmd@linaro.org>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In order to dispatch over AccelOpsClass::handle_interrupt(),
we need it always defined, not calling a hidden handler under
the hood. Make AccelOpsClass::handle_interrupt() mandatory.
Expose generic_handle_interrupt() prototype and register it
for each accelerator.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20250703173248.44995-29-philmd@linaro.org>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
kvm_create_vcpu() is only used within the same file unit.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Message-Id: <20250703173248.44995-7-philmd@linaro.org>
|
| |/
|
|
|
|
|
|
|
|
|
|
|
|
| |
cpr-transfer breaks vfio network connectivity to and from the guest, and
the host system log shows:
irq bypass consumer (token 00000000a03c32e5) registration fails: -16
which is EBUSY. This occurs because KVM descriptors are still open in
the old QEMU process. Close them.
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Link: https://lore.kernel.org/qemu-devel/1751493538-202042-4-git-send-email-steven.sistare@oracle.com
Signed-off-by: Cédric Le Goater <clg@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A new field, attributes, was introduced in RAMBlock to link to a
RamBlockAttributes object, which centralizes all guest_memfd related
information (such as fd and status bitmap) within a RAMBlock.
Create and initialize the RamBlockAttributes object upon ram_block_add().
Meanwhile, register the object in the target RAMBlock's MemoryRegion.
After that, guest_memfd-backed RAMBlock is associated with the
RamDiscardManager interface, and the users can execute RamDiscardManager
specific handling. For example, VFIO will register the
RamDiscardListener and get notifications when the state_change() helper
invokes.
As coordinate discarding of RAM with guest_memfd is now supported, only
block uncoordinated discard.
Tested-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Alexey Kardashevskiy <aik@amd.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
Link: https://lore.kernel.org/r/20250612082747.51539-6-chenyi.qiang@intel.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
A page state change is typically followed by an access of the page(s) and
results in another VMEXIT in order to map the page into the nested page
table. Depending on the size of page state change request, this can
generate a number of additional VMEXITs. For example, under SNP, when
Linux is utilizing lazy memory acceptance, memory is typically accepted in
4M chunks. A page state change request is submitted to mark the pages as
private, followed by validation of the memory. Since the guest_memfd
currently only supports 4K pages, each page validation will result in
VMEXIT to map the page, resulting in 1024 additional exits.
When performing a page state change, invoke KVM_PRE_FAULT_MEMORY for the
size of the page state change in order to pre-map the pages and avoid the
additional VMEXITs. This helps speed up boot times.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Link: https://lore.kernel.org/r/f5411c42340bd2f5c14972551edb4e959995e42b.1743193824.git.thomas.lendacky@amd.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
QEMU calls kvm_arch_put_registers() when vcpu_dirty is true in
kvm_vcpu_exec(). However, for confidential guest, like TDX, putting
registers is disallowed due to guest state is protected.
Only set vcpu_dirty to true with guest state is not protected when
creating the vcpu.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-43-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
KVM with TDX support starts to report different KVM_CAP_MAX_VCPUS per
different VM types. So switch to check the KVM_CAP_MAX_VCPUS at vm level.
KVM still returns the global KVM_CAP_MAX_VCPUS when the KVM is old that
doesn't report different value at vm level.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-31-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce kvm_arch_pre_create_vcpu(), to perform arch-dependent
work prior to create any vcpu. This is for i386 TDX because it needs
call TDX_INIT_VM before creating any vcpu.
The specific implementation for i386 will be added in the future patch.
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
Link: https://lore.kernel.org/r/20250508150002.689633-8-xiaoyao.li@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
| |
Check whether we need to swap at runtime using
target_needs_bswap().
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250417131004.47205-3-philmd@linaro.org>
|
| |
|
|
|
|
|
|
|
|
| |
Mechanical change using gsed, then style manually adapted
to pass checkpatch.pl script.
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250424194905.82506-4-philmd@linaro.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This define is used only in accel/kvm/kvm-all.c, so we push directly the
definition there. Add more visibility to kvm_arch_on_sigbus_vcpu() to
allow removing this define from any header.
The architectures defining KVM_HAVE_MCE_INJECTION are i386, x86_64 and
aarch64.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-ID: <20250325045915.994760-18-pierrick.bouvier@linaro.org>
|
| |
|
|
|
|
|
|
| |
Convert the existing includes with sed.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Convert the existing includes with
sed -i ,exec/memory.h,system/memory.h,g
Move the include within cpu-all.h into a !CONFIG_USER_ONLY block.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Commit 3f2a05b31ee9 ("target/i386: Reset TSCs of parked vCPUs too on VM
reset") introduced a way to reset TSCs of parked vCPUs during VM reset to
prevent them getting desynchronized with the online vCPUs and therefore
causing the KVM PV clock to lose PVCLOCK_TSC_STABLE_BIT.
The way this was done was by registering a parked vCPU-specific QEMU reset
callback via qemu_register_reset().
However, it turns out that on particularly device-rich VMs QEMU reset
callbacks can take a long time to execute (which isn't surprising,
considering that they involve resetting all of VM devices).
In particular, their total runtime can exceed the 1-second TSC
synchronization window introduced in KVM commit 5d3cb0f6a8e3 ("KVM:
Improve TSC offset matching").
Since the TSCs of online vCPUs are only reset from "synchronize_post_reset"
AccelOps handler (which runs after all qemu_register_reset() handlers) this
essentially makes that fix ineffective on these VMs.
The easiest way to guarantee that these parked vCPUs are reset at the same
time as the online ones (regardless how long it takes for VM devices to
reset) is to piggyback on post-reset vCPU synchronization handler for one
of online vCPUs - as there is no generic post-reset AccelOps handler that
isn't per-vCPU.
The first online vCPU was selected for that since it is easily available
under "first_cpu" define.
This does not create an ordering issue since the order of vCPU TSC resets
does not matter.
Fixes: 3f2a05b31ee9 ("target/i386: Reset TSCs of parked vCPUs too on VM reset")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/r/e8b85a5915f79aa177ca49eccf0e9b534470c1cd.1743099810.git.maciej.szmigiero@oracle.com
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
| |
Missed in commit b86f59c7155 ("accel: replace struct CpusAccel
with AccelOpsClass") which removed the single CpusAccel use.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250123234415.59850-7-philmd@linaro.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The heavily imported "system/cpus.h" header includes "accel-ops.h"
to get AccelOpsClass type declaration. Reduce headers pressure by
forward declaring it in "qemu/typedefs.h", where we already
declare the AccelCPUState type.
Reduce "system/cpus.h" inclusions by only including
"system/accel-ops.h" when necessary.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20250123234415.59850-14-philmd@linaro.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The list of hwpoison pages used to remap the memory on reset
is based on the backend real page size.
To correctly handle hugetlb, we must mmap(MAP_FIXED) a complete
hugetlb page; hugetlb pages cannot be partially mapped.
Signed-off-by: William Roche <william.roche@oracle.com>
Co-developed-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20250211212707.302391-2-william.roche@oracle.com
Signed-off-by: Peter Xu <peterx@redhat.com>
|
| |\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Accel & Exec patch queue
- Ignore writes to CNTP_CTL_EL0 on HVF ARM (Alexander)
- Add '-d invalid_mem' logging option (Zoltan)
- Create QOM containers explicitly (Peter)
- Rename sysemu/ -> system/ (Philippe)
- Re-orderning of include/exec/ headers (Philippe)
Move a lot of declarations from these legacy mixed bag headers:
. "exec/cpu-all.h"
. "exec/cpu-common.h"
. "exec/cpu-defs.h"
. "exec/exec-all.h"
. "exec/translate-all"
to these more specific ones:
. "exec/page-protection.h"
. "exec/translation-block.h"
. "user/cpu_loop.h"
. "user/guest-host.h"
. "user/page-protection.h"
# -----BEGIN PGP SIGNATURE-----
#
# iQIzBAABCAAdFiEE+qvnXhKRciHc/Wuy4+MsLN6twN4FAmdlnyAACgkQ4+MsLN6t
# wN6mBw//QFWi7CrU+bb8KMM53kOU9C507tjn99LLGFb5or73/umDsw6eo/b8DHBt
# KIwGLgATel42oojKfNKavtAzLK5rOrywpboPDpa3SNeF1onW+99NGJ52LQUqIX6K
# A6bS0fPdGG9ZzEuPpbjDXlp++0yhDcdSgZsS42fEsT7Dyj5gzJYlqpqhiXGqpsn8
# 4Y0UMxSL21K3HEexlzw2hsoOBFA3tUm2ujNDhNkt8QASr85yQVLCypABJnuoe///
# 5Ojl5wTBeDwhANET0rhwHK8eIYaNboiM9fHopJYhvyw1bz6yAu9jQwzF/MrL3s/r
# xa4OBHBy5mq2hQV9Shcl3UfCQdk/vDaYaWpgzJGX8stgMGYfnfej1SIl8haJIfcl
# VMX8/jEFdYbjhO4AeGRYcBzWjEJymkDJZoiSWp2NuEDi6jqIW+7yW1q0Rnlg9lay
# ShAqLK5Pv4zUw3t0Jy3qv9KSW8sbs6PQxtzXjk8p97rTf76BJ2pF8sv1tVzmsidP
# 9L92Hv5O34IqzBu2oATOUZYJk89YGmTIUSLkpT7asJZpBLwNM2qLp5jO00WVU0Sd
# +kAn324guYPkko/TVnjC/AY7CMu55EOtD9NU35k3mUAnxXT9oDUeL4NlYtfgrJx6
# x1Nzr2FkS68+wlPAFKNSSU5lTjsjNaFM0bIJ4LCNtenJVP+SnRo=
# =cjz8
# -----END PGP SIGNATURE-----
# gpg: Signature made Fri 20 Dec 2024 11:45:20 EST
# gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
# gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [unknown]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg: There is no indication that the signature belongs to the owner.
# Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE
* tag 'exec-20241220' of https://github.com/philmd/qemu: (59 commits)
util/qemu-timer: fix indentation
meson: Do not define CONFIG_DEVICES on user emulation
system/accel-ops: Remove unnecessary 'exec/cpu-common.h' header
system/numa: Remove unnecessary 'exec/cpu-common.h' header
hw/xen: Remove unnecessary 'exec/cpu-common.h' header
target/mips: Drop left-over comment about Jazz machine
target/mips: Remove tswap() calls in semihosting uhi_fstat_cb()
target/xtensa: Remove tswap() calls in semihosting simcall() helper
accel/tcg: Un-inline translator_is_same_page()
accel/tcg: Include missing 'exec/translation-block.h' header
accel/tcg: Move tcg_cflags_has/set() to 'exec/translation-block.h'
accel/tcg: Restrict curr_cflags() declaration to 'internal-common.h'
qemu/coroutine: Include missing 'qemu/atomic.h' header
exec/translation-block: Include missing 'qemu/atomic.h' header
accel/tcg: Declare cpu_loop_exit_requested() in 'exec/cpu-common.h'
exec/cpu-all: Include 'cpu.h' earlier so MMU_USER_IDX is always defined
target/sparc: Move sparc_restore_state_to_opc() to cpu.c
target/sparc: Uninline cpu_get_tb_cpu_state()
target/loongarch: Declare loongarch_cpu_dump_state() locally
user: Move various declarations out of 'exec/exec-all.h'
...
Conflicts:
hw/char/riscv_htif.c
hw/intc/riscv_aplic.c
target/s390x/cpu.c
Apply sysemu header path changes to not in the pull request.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Headers in include/sysemu/ are not only related to system
*emulation*, they are also used by virtualization. Rename
as system/ which is clearer.
Files renamed manually then mechanical change using sed tool.
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Tested-by: Lei Yang <leiyang@redhat.com>
Message-Id: <20241203172445.28576-1-philmd@linaro.org>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Since commit 5286c3662294 ("target/i386: properly reset TSC on reset")
QEMU writes the special value of "1" to each online vCPU TSC on VM reset
to reset it.
However parked vCPUs don't get that handling and due to that their TSCs
get desynchronized when the VM gets reset.
This in turn causes KVM to turn off PVCLOCK_TSC_STABLE_BIT in its exported
PV clock.
Note that KVM has no understanding of vCPU being currently parked.
Without PVCLOCK_TSC_STABLE_BIT the sched clock is marked unstable in
the guest's kvm_sched_clock_init().
This causes a performance regressions to show in some tests.
Fix this issue by writing the special value of "1" also to TSCs of parked
vCPUs on VM reset.
Reproducing the issue:
1) Boot a VM with "-smp 2,maxcpus=3" or similar
2) device_add host-x86_64-cpu,id=vcpu,node-id=0,socket-id=0,core-id=2,thread-id=0
3) Wait a few seconds
4) device_del vcpu
5) Inside the VM run:
# echo "t" >/proc/sysrq-trigger; dmesg | grep sched_clock_stable
Observe the sched_clock_stable() value is 1.
6) Reboot the VM
7) Once the VM boots once again run inside it:
# echo "t" >/proc/sysrq-trigger; dmesg | grep sched_clock_stable
Observe the sched_clock_stable() value is now 0.
Fixes: 5286c3662294 ("target/i386: properly reset TSC on reset")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/r/5a605a88e9a231386dc803c60f5fed9b48108139.1734014926.git.maciej.szmigiero@oracle.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |/
|
|
|
|
|
|
|
| |
In case of incorrect parameters, kvm_convert_memory() was returning
-1 instead of -EINVAL. The guest won't notice because it will move
anyway to RUN_STATE_INTERNAL_ERROR, but fix this for consistency and
clarity.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
The exact set of available memory attributes can vary by VM. In the
future it might vary depending on enabled capabilities, too. Query the
extension on the VM level instead of on the KVM level, and only after
architecture-specific initialization.
Inspired by an analogous patch by Tom Dohrmann.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
KVM_CAP_MULTI_ADDRESS_SPACE used to be a global capability, but with the
introduction of AMD SEV-SNP confidential VMs, the number of address spaces
can vary by VM type.
Query the extension on the VM level instead of on the KVM level.
Inspired by an analogous patch by Tom Dohrmann.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
KVM_CAP_READONLY_MEM used to be a global capability, but with the
introduction of AMD SEV-SNP confidential VMs, this extension is not
always available on all VM types [1,2].
Query the extension on the VM level instead of on the KVM level.
[1] https://patchwork.kernel.org/project/kvm/patch/20240809190319.1710470-2-seanjc@google.com/
[2] https://patchwork.kernel.org/project/kvm/patch/20240902144219.3716974-1-erbse.13@gmx.de/
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Tom Dohrmann <erbse.13@gmx.de>
Link: https://lore.kernel.org/r/20240903062953.3926498-1-erbse.13@gmx.de
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
|
| |
This value used to reflect the maximum supported memslots from KVM kernel.
Rename it to be clearer.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240917163835.194664-5-peterx@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
| |
This will make all nr_slots counters to be named in the same manner.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240917163835.194664-4-peterx@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
| |
|
|
|
|
|
|
|
|
| |
Make the default max nr_slots a macro, it's only used when KVM reports
nothing.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20240917163835.194664-3-peterx@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|