path: root/target/i386
Commit message (Author, Age, Files, Lines)
* target/i386/nvmm: Inline cpu_physical_memory_rw() in nvmm_mem_callback (Philippe Mathieu-Daudé, 2025-10-07, 1 file, -1/+4)
    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-Id: <20251002084203.63899-12-philmd@linaro.org>
* target/i386/kvm: Replace legacy cpu_physical_memory_rw() call (Philippe Mathieu-Daudé, 2025-10-07, 1 file, -1/+3)
    Get the vCPU address space and replace the legacy cpu_physical_memory_rw() call with address_space_rw().

    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-Id: <20251002084203.63899-11-philmd@linaro.org>
* target/i386/whpx: Replace legacy cpu_physical_memory_rw() call (Philippe Mathieu-Daudé, 2025-10-07, 1 file, -2/+5)
    Get the vCPU address space and replace the legacy cpu_physical_memory_rw() call with address_space_rw().

    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-Id: <20251002084203.63899-10-philmd@linaro.org>
* target/i386/arch_memory_mapping: Use address_space_memory_is_io() (Philippe Mathieu-Daudé, 2025-10-07, 1 file, -5/+5)
    Since all functions have an address space argument, it is trivial to replace cpu_physical_memory_is_io() with address_space_memory_is_io().

    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-Id: <20251002084203.63899-4-philmd@linaro.org>
* i386/kvm: Drop KVM_CAP_X86_SMM check in kvm_arch_init() (Xiaoyao Li, 2025-09-17, 1 file, -2/+1)
    x86_machine_is_smm_enabled() already checks KVM_CAP_X86_SMM in the KVM case, so there is no need to check it again in kvm_arch_init(). Drop the check to simplify the code.

    Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
    Link: https://lore.kernel.org/r/20250729062014.1669578-3-xiaoyao.li@intel.com
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: Define enum X86ASIdx for x86's address spaces (Xiaoyao Li, 2025-09-17, 4 files, -5/+10)
    Define X86ASIdx as enum, like ARM's ARMASIdx, so that it's clear index 0 is for memory and index 1 is for SMM.

    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
    Tested-By: Kirill Martynov <stdcalllevi@yandex-team.ru>
    Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
    Link: https://lore.kernel.org/r/20250730095253.1833411-3-xiaoyao.li@intel.com
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
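The indexing described in this commit can be sketched as follows. This is only an illustration: the enumerator spellings and the helper `x86_asidx_for_access()` are assumptions made for the example, not taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Enumerator names are assumed here, modelled on the ARMASIdx style. */
typedef enum X86ASIdx {
    X86ASIdx_MEM = 0,   /* index 0: normal memory address space */
    X86ASIdx_SMM = 1,   /* index 1: address space used in SMM */
} X86ASIdx;

/* Hypothetical helper: choose the address space index for an access. */
static X86ASIdx x86_asidx_for_access(bool in_smm, bool smm_as_enabled)
{
    if (in_smm) {
        /* Index 1 is only meaningful if the SMM address space was set up. */
        assert(smm_as_enabled);
        return X86ASIdx_SMM;
    }
    return X86ASIdx_MEM;
}
```

Naming the indices makes a missing SMM address space easier to spot than a bare `1` would.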
* i386/cpu: Enable SMM cpu address space under KVM (Xiaoyao Li, 2025-09-17, 2 files, -0/+15)
    Kirill Martynov reported an assertion in cpu_asidx_from_attrs() being hit when x86_cpu_dump_state() is called to dump the CPU state[*]. It happens when the CPU is in SMM and KVM emulation fails due to a misbehaving guest.

    The root cause is that QEMU i386 never enables the SMM address space for the CPU since KVM SMM support has been added.

    Enable the SMM cpu address space under KVM when SMM is enabled for the x86 machine.

    [*] https://lore.kernel.org/qemu-devel/20250523154431.506993-1-stdcalllevi@yandex-team.ru/

    Reported-by: Kirill Martynov <stdcalllevi@yandex-team.ru>
    Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
    Tested-by: Kirill Martynov <stdcalllevi@yandex-team.ru>
    Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
    Link: https://lore.kernel.org/r/20250730095253.1833411-2-xiaoyao.li@intel.com
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* accel: make all calls to qemu_process_cpu_events look the same (Paolo Bonzini, 2025-09-17, 2 files, -8/+4)
    There is no reason for some accelerators to use qemu_process_cpu_events_common (which is separated from qemu_process_cpu_events() specifically for round robin TCG). They can also check for events directly on the first pass through the loop, instead of setting cpu->exit_request to true.

    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* cpus: clear exit_request in qemu_process_cpu_events (Paolo Bonzini, 2025-09-17, 2 files, -4/+0)
    Make the code common to all accelerators: after seeing cpu->exit_request set to true, accelerator code needs to reach qemu_process_cpu_events_common().

    So for the common cases where they use qemu_process_cpu_events(), go ahead and clear it in there. Note that the cheap qatomic_set() is enough because at this point the thread has taken the BQL; qatomic_set_mb() is not needed. In particular, the ordering of the communication between I/O and vCPU threads is always the same.

    In the I/O thread:

    (a) store other memory locations that will be checked if cpu->exit_request or cpu->interrupt_request is 1 (for example cpu->stop or cpu->work_list for cpu->exit_request)
    (b) cpu_exit(): store-release cpu->exit_request, or
    (b) cpu_interrupt(): store-release cpu->interrupt_request
    >>> at this point, cpu->halt_cond is broadcast and the BQL released
    (c) do the accelerator-specific kick (e.g. write icount_decr for TCG, pthread_kill for KVM, etc.)

    In the vCPU thread instead the opposite order is respected:

    (c) the accelerator's execution loop exits thanks to the kick
    (b) then the inner execution loop checks cpu->interrupt_request and cpu->exit_request. If needed cpu->interrupt_request is converted into cpu->exit_request when work is needed outside the execution loop.
    (a) then the other memory locations are checked. Some may need to be read under the BQL, but the vCPU thread may also take other locks (e.g. for queued work items) or none at all.

    qatomic_set_mb() would only be needed if the halt sleep was done outside the BQL (though in that case, cpu->exit_request probably would be replaced by a QemuEvent or something like that).

    Reviewed-by: Igor Mammedov <imammedo@redhat.com>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
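The (a)/(b)/(c) ordering above can be modelled with plain C11 atomics. This is a minimal sketch, not QEMU code: `DemoCPU`, `demo_request_stop()` and `demo_vcpu_poll()` are hypothetical names, and C11 `<stdatomic.h>` stands in for QEMU's qatomic helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant CPUState fields. */
typedef struct DemoCPU {
    bool stop;                 /* (a) payload written before the flag */
    atomic_bool exit_request;  /* (b) flag published with store-release */
} DemoCPU;

/* I/O thread side: (a) write the payload, (b) store-release the flag. */
static void demo_request_stop(DemoCPU *cpu)
{
    assert(cpu);
    cpu->stop = true;                                      /* (a) */
    atomic_store_explicit(&cpu->exit_request, true,
                          memory_order_release);           /* (b) */
    /* (c) the accelerator-specific kick would happen here */
}

/* vCPU thread side: (b) load-acquire the flag, then (a) read the payload. */
static bool demo_vcpu_poll(DemoCPU *cpu)
{
    assert(cpu);
    if (!atomic_load_explicit(&cpu->exit_request, memory_order_acquire)) {
        return false;
    }
    /* A relaxed store models the "cheap" clear done while holding the BQL. */
    atomic_store_explicit(&cpu->exit_request, false, memory_order_relaxed);
    return cpu->stop;                                      /* (a) */
}
```

The acquire load pairs with the release store, so a vCPU thread that observes the flag also observes the payload written before it.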
* treewide: rename qemu_wait_io_event/qemu_wait_io_event_common (Paolo Bonzini, 2025-09-17, 2 files, -2/+2)
    Do so before extending it to the user-mode emulators, where there is no such thing as an "I/O thread".

    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* cpus: properly kick CPUs out of inner execution loop (Paolo Bonzini, 2025-09-17, 1 file, -1/+0)
    Now that cpu_exit() actually kicks all accelerators, use it whenever the message to another thread is processed in qemu_wait_io_event().

    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* accel: use atomic accesses for exit_request (Paolo Bonzini, 2025-09-17, 4 files, -8/+8)
    CPU threads write exit_request as a "note to self" that they need to go out to a slow path. This write happens outside the BQL and can be a data race with another thread's cpu_exit(); use atomic accesses consistently.

    While at it, change the source argument from int ("1") to bool ("true").

    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: Igor Mammedov <imammedo@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* accel: use store_release/load_acquire for cross-thread exit_request (Paolo Bonzini, 2025-09-17, 2 files, -4/+4)
    Reads and writes of cpu->exit_request do not use a load-acquire/store-release pair right now, but this means that cpu_exit() may not write cpu->exit_request after any flags that are read by the vCPU thread.

    Probably everything is protected one way or the other by the BQL, because cpu->exit_request leads to the slow path, where the CPU thread often takes the BQL (for example, to go to sleep by waiting on the BQL-protected cpu->halt_cond); but it's not clear, so use load-acquire/store-release consistently.

    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: Igor Mammedov <imammedo@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* treewide: clear bits of cs->interrupt_request with cpu_reset_interrupt() (Paolo Bonzini, 2025-09-17, 6 files, -30/+29)
    Open coding cpu_reset_interrupt() can cause bugs if the BQL is not taken, for example i386 has the call chain kvm_cpu_exec() -> kvm_put_vcpu_events() -> kvm_arch_put_registers().

    Reviewed-by: Igor Mammedov <imammedo@redhat.com>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: limit a20 to system emulation (Paolo Bonzini, 2025-09-17, 1 file, -0/+2)
    It is not used by user-mode emulation and is the only caller of cpu_interrupt() in qemu-i386 and qemu-x86_64.

    Reviewed-by: Igor Mammedov <imammedo@redhat.com>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* i386/kvm/vmsr_energy: Plug memory leak on failure to connect socket (Markus Armbruster, 2025-09-01, 1 file, -5/+1)
    vmsr_open_socket() leaks the Error set by qio_channel_socket_connect_sync(). Plug the leak by not creating the Error.

    Fixes: 0418f90809ae (Add support for RAPL MSRs in KVM/Qemu)
    Signed-off-by: Markus Armbruster <armbru@redhat.com>
    Message-ID: <20250723133257.1497640-2-armbru@redhat.com>
    Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
* kvm: i386: irqchip: take BQL only if there is an interrupt (Igor Mammedov, 2025-08-29, 1 file, -7/+5)
    When kernel-irqchip=split is used, QEMU still hits a BQL contention issue when reading ACPI PM/HPET timers (despite timer access being lock-less). So Windows with more than 255 CPUs is still not able to boot (since it requires iommu -> split irqchip).

    The problematic path is in kvm_arch_pre_run(), where the BQL is taken unconditionally when split irqchip is in use.

    There are a few parts that the BQL protects there:

    1. interrupt check and injection: however, we do not take the BQL when checking for a pending interrupt (even within the same function), so the patch takes the same approach for cpu->interrupt_request checks and takes the BQL only if there is a job to do.
    2. request_interrupt_window access: CPUState::kvm_run::request_interrupt_window doesn't need the BQL as it's accessed by its own vCPU thread.
    3. cr8/cpu_get_apic_tpr access: the same (as #2) applies to CPUState::kvm_run::cr8, and APIC registers are also cached/synced (get/put) within the vCPU thread they belong to.

    Taking the BQL only when necessary eliminates the BQL bottleneck on the IO/MMIO-only exit path, improving latency by 80% on an HPET micro benchmark.

    This lets Windows boot successfully (in case hv-time isn't used) when more than 255 vCPUs are in use.

    Signed-off-by: Igor Mammedov <imammedo@redhat.com>
    Reviewed-by: Peter Xu <peterx@redhat.com>
    Link: https://lore.kernel.org/r/20250814160600.2327672-8-imammedo@redhat.com
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
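The "check first, lock only if there is work" pattern from this commit can be sketched in isolation. Everything below is hypothetical: `DemoLock` merely counts acquisitions in place of the BQL, and `demo_pre_run()` is not the real kvm_arch_pre_run().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the BQL that counts how often it is taken. */
typedef struct DemoLock {
    int held;
    int acquisitions;
} DemoLock;

static void demo_lock(DemoLock *l)
{
    assert(!l->held);
    l->held = 1;
    l->acquisitions++;
}

static void demo_unlock(DemoLock *l)
{
    assert(l->held);
    l->held = 0;
}

typedef struct DemoVCPU {
    atomic_uint interrupt_request;  /* pending interrupt bits */
    unsigned injections;            /* how many times we injected */
} DemoVCPU;

/* Take the lock only when the lock-free check sees pending work. */
static void demo_pre_run(DemoVCPU *cpu, DemoLock *bql)
{
    if (atomic_load_explicit(&cpu->interrupt_request,
                             memory_order_acquire) == 0) {
        return;  /* fast path: nothing pending, no lock taken */
    }
    demo_lock(bql);
    if (atomic_exchange(&cpu->interrupt_request, 0) != 0) {
        cpu->injections++;  /* models injecting the interrupt */
    }
    demo_unlock(bql);
}
```

On the fast path the function never touches the lock, which is the property the commit relies on for exits that only do IO/MMIO.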
* add cpu_test_interrupt()/cpu_set_interrupt() helpers and use them tree wide (Igor Mammedov, 2025-08-29, 7 files, -61/+60)
    The helpers form a load-acquire/store-release pair and ensure that appropriate barriers are in place in case checks happen outside of the BQL.

    Use them to replace open-coded checkers/setters across the code, to make sure that barriers are not missed. Helpers also make code a bit more readable.

    Signed-off-by: Igor Mammedov <imammedo@redhat.com>
    Reviewed-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: Jason J. Herne <jjherne@linux.ibm.com>
    Link: https://lore.kernel.org/r/20250821155603.2422553-1-imammedo@redhat.com
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* i386/tcg/svm: fix incorrect canonicalization (Zero Tang, 2025-08-27, 1 file, -1/+1)
    For all 32-bit systems and 64-bit Windows systems, "long" is 4 bytes long. Due to using "long" for a linear address, svm_canonicalization would set all high bits to 1 when (assuming 48-bit linear address) the segment base is bigger than 0x7FFF.

    This fixes booting guests under TCG when the guest IDT and GDT bases are above 0x7FFF, which previously resulted in incorrect bases. When an interrupt arrives, it would trigger a #PF exception; the #PF would trigger again, resulting in a #DF exception; the #PF would trigger for the third time, resulting in a triple fault, which eventually causes a shutdown VM-Exit to the hypervisor right after guest boot.

    Cc: qemu-stable@nongnu.org
    Signed-off-by: Zero Tang <zero.tangptr@gmail.com>
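The effect of the narrow type can be reproduced with fixed-width integers. The sketch below is an illustration under assumptions: both function names are made up, `int32_t` stands in for a 4-byte "long", and the arithmetic right shift of negative values is relied on as mainstream compilers implement it.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend from addr_width bits using a 64-bit intermediate. */
static uint64_t canonicalize_64bit(uint64_t base, int addr_width)
{
    assert(addr_width > 0 && addr_width <= 64);
    int shift = 64 - addr_width;
    return (uint64_t)((int64_t)(base << shift) >> shift);
}

/* Same shifts, but through a 32-bit intermediate (models a 4-byte long). */
static uint64_t canonicalize_32bit_long(uint64_t base, int addr_width)
{
    int shift = 64 - addr_width;
    assert(shift >= 0 && shift < 32);
    int32_t narrow = (int32_t)((uint32_t)base << shift);
    /* Bit 31 of the narrow value is mistaken for the sign bit. */
    return (uint64_t)(int64_t)(narrow >> shift);
}
```

With a 48-bit address width the shift is 16, so in the narrow variant any base above 0x7FFF lands in bit 31 and is sign-extended, e.g. 0x8000 becomes 0xFFFFFFFFFFFF8000.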
* target/i386: Add support for save/load of exception error code (Xin Wang, 2025-08-20, 1 file, -0/+19)
    For now, QEMU saves/loads CPU exception info (such as exception_nr and has_error_code), while the exception error_code is ignored. This will cause the dest hypervisor to reinject a vCPU exception with error_code(0), potentially causing a guest kernel panic.

    For instance, if the src VM stopped with a user-mode write #PF (error_code 6), the dest hypervisor will reinject a #PF with error_code(0) when the vCPU resumes, and the guest kernel then panics as:

      BUG: unable to handle page fault for address: 00007f80319cb010
      #PF: supervisor read access in user mode
      #PF: error_code(0x0000) - not-present page
      RIP: 0033:0x40115d

    To fix it, support save/load of the exception error_code.

    Signed-off-by: Xin Wang <wangxinxin.wang@huawei.com>
    Link: https://lore.kernel.org/r/20250819145834.3998-1-wangxinxin.wang@huawei.com
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
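Why error_code 6 turning into 0 matters can be seen by decoding the low #PF error code bits (bit 0 present, bit 1 write, bit 2 user). The sketch below is hypothetical: `ExcState`, `restore_exception()` and `decode_pf()` are illustrative names and do not mirror QEMU's migration structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PF_ERR_PRESENT (1u << 0)  /* 0: not-present page, 1: protection fault */
#define PF_ERR_WRITE   (1u << 1)  /* 0: read access, 1: write access */
#define PF_ERR_USER    (1u << 2)  /* 0: supervisor access, 1: user access */

/* Hypothetical view of the pending-exception state being migrated. */
typedef struct ExcState {
    int exception_nr;
    bool has_error_code;
    uint32_t error_code;
} ExcState;

typedef struct PFInfo {
    bool present;
    bool write;
    bool user;
} PFInfo;

static PFInfo decode_pf(uint32_t error_code)
{
    PFInfo info = {
        .present = (error_code & PF_ERR_PRESENT) != 0,
        .write   = (error_code & PF_ERR_WRITE) != 0,
        .user    = (error_code & PF_ERR_USER) != 0,
    };
    return info;
}

/* Models the destination side with and without the error_code field. */
static ExcState restore_exception(const ExcState *saved, bool has_ec_field)
{
    assert(saved);
    ExcState dst = *saved;
    if (!has_ec_field) {
        dst.error_code = 0;  /* old behaviour: the code is lost in transit */
    }
    return dst;
}
```

Without the field, a user-mode write fault (6) is reinjected as 0, which decodes as a supervisor read of a not-present page, matching the panic text quoted in the commit message.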
* Revert "i386/cpu: Warn about why CPUID_EXT_PDCM is not available" (Paolo Bonzini, 2025-08-19, 1 file, -3/+0)
    This reverts commit 00268e00027459abede448662f8794d78eb4b0a4. (The only conflict is in the !is_tdx_vm() part of the condition, which is safe to keep).

    mark_unavailable_features() actively blocks usage of the feature, so it is a functional change, not merely emitting a warning. The commit was intended to merely warn if PDCM was enabled when the performance counters are not, so revert it.

    Reported-by: Christian A. Ehrhardt <christian.ehrhardt@canonical.com>
    Analyzed-by: Daniel P. Berrangé <berrange@redhat.com>
    Analyzed-by: Xiaoyao Li <xiaoyao.li@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
    Message-ID: <20250819150235.785559-1-pbonzini@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* target/i386/cpu: Move addressable ID encoding out of compat property in CPUID[0x1] (Zhao Liu, 2025-08-05, 1 file, -2/+1)
    Currently, the addressable ID encoding for CPUID[0x1].EBX[bits 16-23] (Maximum number of addressable IDs for logical processors in this physical package) is covered by the vendor_cpuid_only_v2 compat property. The previous consideration was to avoid breaking migration, and this compat property makes it unfriendly to backport the commit f985a1195ba2 ("i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX [23:16]").

    However, NetBSD booting is broken since the commit 88dd4ca06c83 ("i386/cpu: Use APIC ID info to encode cache topo in CPUID[4]"), because NetBSD calculates SMT information via `lp_max` / `core_max` for legacy Intel CPUs which don't support the 0xb leaf, where `lp_max` is from CPUID[0x1].EBX.bits[16-23] and `core_max` is from CPUID[0x4].0x0.bits[26-31].

    The commit 88dd4ca0 changed the encoding rule of `core_max` but didn't update `lp_max`, so that NetBSD would get the wrong SMT information, which leads to the module loading failure.

    Luckily, the commit f985a1195ba2 ("i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX[23:16]") updated the encoding rule for `lp_max` and accidentally fixed the NetBSD issue too. This also shows that using CPUID[0x1] and CPUID[0x4].0x0 to calculate HT/SMT information is a common practice to detect CPU topology on legacy Intel CPUs.

    Therefore, it's necessary to backport the commit f985a1195ba2 to previous stable QEMU to help address similar issues as well. Then the compat property is not needed any more since all stable QEMUs will follow the same encoding.

    So, in CPUID[0x1], move addressable ID encoding out of compat property.

    Reported-by: Michael Tokarev <mjt@tls.msk.ru>
    Inspired-by: Chuang Xu <xuchuangxclwt@bytedance.com>
    Fixes: commit f985a1195ba2 ("i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX[23:16]")
    Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3061
    Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
    Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
    Tested-by: Michael Tokarev <mjt@tls.msk.ru>
    Message-ID: <20250804053548.1808629-1-zhao1.liu@intel.com>
    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
* target/i386: fix width of third operand of VINSERTx128 (Paolo Bonzini, 2025-07-25, 1 file, -2/+2)
    Table A-5 of the Intel manual incorrectly lists the third operand of VINSERTx128 as Wqq, but it is actually a 128-bit value. This is visible when W is a memory operand close to the end of the page.

    Fixes the recently-added poly1305_kunit test in linux-next.

    (No testcase yet, but I plan to modify test-avx2 to use memory close to the end of the page. This would work because the test vectors correctly have the memory operand as xmm2/m128).

    Reported-by: Eric Biggers <ebiggers@kernel.org>
    Tested-by: Eric Biggers <ebiggers@kernel.org>
    Cc: Ard Biesheuvel <ardb@kernel.org>
    Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
    Cc: Guenter Roeck <linux@roeck-us.net>
    Cc: qemu-stable@nongnu.org
    Fixes: 79068477686 ("target/i386: reimplement 0x0f 0x3a, add AVX", 2022-10-18)
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* i386/tdx: Remove the redundant qemu_mutex_init(&tdx->lock) (Xiaoyao Li, 2025-07-17, 1 file, -2/+0)
    Commit 40da501d8989 ("i386/tdx: handle TDG.VP.VMCALL<GetQuote>") added redundant qemu_mutex_init(&tdx->lock) in tdx_guest_init by mistake. Fix it by removing the redundant one.

    Fixes: 40da501d8989 ("i386/tdx: handle TDG.VP.VMCALL<GetQuote>")
    Reported-by: Peter Maydell <peter.maydell@linaro.org>
    Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
    Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
    Link: https://lore.kernel.org/r/20250717103707.688929-1-xiaoyao.li@intel.com
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* i386/cpu: Cleanup host_cpu_max_instance_init() (Xiaoyao Li, 2025-07-17, 1 file, -1/+0)
    The implementation of host_cpu_max_instance_init() was merged into host_cpu_instance_init() by commit 29f1ba338baf ("target/i386: merge host_cpu_instance_init() and host_cpu_max_instance_init()"), while the declaration of it remains in host-cpu.h. Clean it up.

    Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
    Link: https://lore.kernel.org/r/20250716063117.602050-1-xiaoyao.li@intel.com
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: tdx: fix locking for interrupt injection (Paolo Bonzini, 2025-07-17, 1 file, -3/+7)
    Take tdx_guest->lock when injecting the event notification interrupt into the guest. Fixes CID 1612364.

    Reported-by: Peter Maydell <peter.maydell@linaro.org>
    Cc: Xiaoyao Li <xiaoyao.li@intel.com>
    Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* i386/cpu: Move x86_ext_save_areas[] initialization to .instance_init (Zhao Liu, 2025-07-17, 1 file, -7/+15)
    In x86_cpu_post_initfn(), the initialization of x86_ext_save_areas[] marks the unsupported xsave areas based on Host support.

    This step must be done before accel_cpu_instance_init(), otherwise, KVM's assertion on host xsave support would fail:

      qemu-system-x86_64: ../target/i386/kvm/kvm-cpu.c:149: kvm_cpu_xsave_init: Assertion `esa->size == eax' failed.

    (on AMD EPYC 7302 16-Core Processor)

    Move x86_ext_save_areas[] initialization to .instance_init and place it before accel_cpu_instance_init().

    Fixes: commit 5f158abef44c ("target/i386: move accel_cpu_instance_init to .instance_init")
    Reported-by: Paolo Abeni <pabeni@redhat.com>
    Tested-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
    Link: https://lore.kernel.org/r/20250717023933.2502109-1-zhao1.liu@intel.com
    Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* target/i386: do not expose ARCH_CAPABILITIES on AMD CPU (Paolo Bonzini, 2025-07-17, 1 file, -1/+5)
    KVM emulates the ARCH_CAPABILITIES on x86 for both Intel and AMD cpus, although the IA32_ARCH_CAPABILITIES MSR is an Intel-specific MSR and it makes no sense to emulate it on AMD.

    As a consequence, VMs created on AMD with qemu -cpu host and using KVM will advertise the ARCH_CAPABILITIES feature and provide the IA32_ARCH_CAPABILITIES MSR. This can cause issues (like Windows BSOD) as the guest OS might not expect this MSR to exist on such cpus (the AMD documentation specifies that ARCH_CAPABILITIES feature and MSR are not defined on the AMD architecture).

    A fix was proposed in KVM code, however KVM maintainers don't want to change this behavior that exists for 6+ years and suggest changes to be done in QEMU instead. Therefore, hide the bit from "-cpu host": migration of -cpu host guests is only possible between identical host kernel and QEMU versions, therefore this is not a problematic breakage.

    If a future AMD machine does include the MSR, that would re-expose the Windows guest bug; but it would not be KVM/QEMU's problem at that point, as we'd be following a genuine physical CPU impl.

    Reported-by: Alexandre Chartre <alexandre.chartre@oracle.com>
    Suggested-by: Daniel P. Berrangé <berrange@redhat.com>
    Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'accel-20250715' of https://github.com/philmd/qemu into staging (Stefan Hajnoczi, 2025-07-16, 4 files, -2/+4)
    Accelerators patches

    - Unify x86/arm hw/xen/arch_hvm.h header
    - Move non-system-specific 'accel/accel-ops.h' and 'accel-cpu-ops.h' to accel/
    - Move KVM definitions qapi/accelerator.json
    - Add @qom-type field to CpuInfoFast QAPI structure
    - Display CPU model name in 'info cpus' HMP command
    - Introduce @x-accel-stats QMP command
    - Add 'info accel' on HMP
    - Improve qemu_add_vm_change_state_handler*() docstring
    - Extract TCG statistic related code to tcg-stats.c
    - Implement AccelClass::get_[vcpu]_stats() handlers for TCG and HVF
    - Do not dump NaN in TCG statistics
    - Revert incomplete "accel/tcg: Unregister the RCU before exiting RR thread"

    # gpg: Signature made Tue 15 Jul 2025 15:44:05 EDT
    # gpg: using RSA key FAABE75E12917221DCFD6BB2E3E32C2CDEADC0DE
    # gpg: Good signature from "Philippe Mathieu-Daudé (F4BUG) <f4bug@amsat.org>" [full]
    # Primary key fingerprint: FAAB E75E 1291 7221 DCFD 6BB2 E3E3 2C2C DEAD C0DE

    * tag 'accel-20250715' of https://github.com/philmd/qemu:
      system/runstate: Document qemu_add_vm_change_state_handler_prio* in hdr
      system/runstate: Document qemu_add_vm_change_state_handler()
      accel/hvf: Implement AccelClass::get_vcpu_stats() handler
      accel/tcg: Implement AccelClass::get_stats() handler
      accel/tcg: Propagate AccelState to dump_accel_info()
      accel/system: Add 'info accel' on human monitor
      accel/system: Introduce @x-accel-stats QMP command
      accel/tcg: Extract statistic related code to tcg-stats.c
      Revert "accel/tcg: Unregister the RCU before exiting RR thread"
      accel: Extract AccelClass definition to 'accel/accel-ops.h'
      accel: Rename 'system/accel-ops.h' -> 'accel/accel-cpu-ops.h'
      accel/tcg: Do not dump NaN statistics
      hw/core/machine: Display CPU model name in 'info cpus' command
      qapi/machine: Add @qom-type field to CpuInfoFast structure
      qapi/accel: Move definitions related to accelerators in their own file
      hw/arm/xen-pvh: Remove unnecessary 'hw/xen/arch_hvm.h' header
      hw/xen/arch_hvm: Unify x86 and ARM variants

    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

    Conflicts:
      qapi/machine.json
      Commit 0462da9d6b19 ("qapi: remove trivial "Returns:" sections") removed trivial "Returns:". This caused a conflict with the move from machine.json to accelerator.json.
| * accel: Extract AccelClass definition to 'accel/accel-ops.h' (Philippe Mathieu-Daudé, 2025-07-15, 2 files, -0/+2)
    Only accelerator implementations (and the common accelerator code) need to know about AccelClass internals. Move the definition out but forward declare AccelState and AccelClass.

    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-Id: <20250703173248.44995-39-philmd@linaro.org>
| * accel: Rename 'system/accel-ops.h' -> 'accel/accel-cpu-ops.h' (Philippe Mathieu-Daudé, 2025-07-15, 2 files, -2/+2)
    Unfortunately "system/accel-ops.h" handlers are not only system-specific. For example, the cpu_reset_hold() hook is part of the vCPU creation, after it is realized.

    Mechanical rename to drop 'system' using:

      $ sed -i -e s_system/accel-ops.h_accel/accel-cpu-ops.h_g \
                  $(git grep -l system/accel-ops.h)

    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-Id: <20250703173248.44995-38-philmd@linaro.org>
* | Merge tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu into staging (Stefan Hajnoczi, 2025-07-16, 2 files, -2/+0)
    virtio,pci,pc: features, fixes, tests

    SPCR acpi table can now be disabled
    vhost-vdpa can now report hashing capability to guest
    PPTT acpi table now tells guest vCPUs are identical
    vhost-user-blk now shuts down faster
    loongarch64 now supports bios-tables-test
    intel_iommu now supports ATS
    cxl now supports DCD Fabric Management Command Set
    arm now supports acpi pci hotplug

    fixes, cleanups

    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

    # gpg: Signature made Tue 15 Jul 2025 02:56:48 EDT
    # gpg: using RSA key 5D09FD0871C8F85B94CA8A0D281F0DB8D28D5469
    # gpg: issuer "mst@redhat.com"
    # gpg: Good signature from "Michael S. Tsirkin <mst@kernel.org>" [full]
    # gpg: aka "Michael S. Tsirkin <mst@redhat.com>" [full]
    # Primary key fingerprint: 0270 606B 6F3C DF3D 0B17 0970 C350 3912 AFBE 8E67
    # Subkey fingerprint: 5D09 FD08 71C8 F85B 94CA 8A0D 281F 0DB8 D28D 5469

    * tag 'for_upstream' of https://git.kernel.org/pub/scm/virt/kvm/mst/qemu: (97 commits)
      hw/cxl: mailbox-utils: 0x5605 - FMAPI Initiate DC Release
      hw/cxl: mailbox-utils: 0x5604 - FMAPI Initiate DC Add
      hw/cxl: Create helper function to create DC Event Records from extents
      hw/cxl: mailbox-utils: 0x5603 - FMAPI Get DC Region Extent Lists
      hw/cxl: mailbox-utils: 0x5602 - FMAPI Set DC Region Config
      hw/mem: cxl_type3: Add DC Region bitmap lock
      hw/cxl: Move definition for dynamic_capacity_uuid and enum for DC event types to header
      hw/cxl: mailbox-utils: 0x5601 - FMAPI Get Host Region Config
      hw/mem: cxl_type3: Add dsmas_flags to CXLDCRegion struct
      hw/cxl: mailbox-utils: 0x5600 - FMAPI Get DCD Info
      hw/cxl: fix DC extent capacity tracking
      tests: virt: Update expected ACPI tables for virt test
      hw/acpi/aml-build: Build a root node in the PPTT table
      hw/acpi/aml-build: Set identical implementation flag for PPTT processor nodes
      tests: virt: Allow changes to PPTT test table
      qtest/bios-tables-test: Generate reference blob for DSDT.acpipcihp
      qtest/bios-tables-test: Generate reference blob for DSDT.hpoffacpiindex
      tests/qtest/bios-tables-test: Add aarch64 ACPI PCI hotplug test
      tests/qtest/bios-tables-test: Prepare for addition of acpi pci hp tests
      hw/arm/virt: Let virt support pci hotplug/unplug GED event
      ...

    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

    Conflicts:
      net/vhost-vdpa.c
      vhost_vdpa_set_steering_ebpf() was removed, resolve the context conflict.
| * | qemu: Declare all load/store helper in 'qemu/bswap.h' (Philippe Mathieu-Daudé, 2025-07-15, 2 files, -2/+0)
    Restrict "exec/tswap.h" to the tswap*() methods, move the load/store helpers with the other ones declared in "qemu/bswap.h".

    Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
    Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
    Message-Id: <20250708215320.70426-8-philmd@linaro.org>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* | i386/cpu: Honor maximum value for CPUID.8000001DH.EAX[25:14] (Zhao Liu, 2025-07-14, 1 file, -1/+2)
    CPUID.8000001DH:EAX[25:14] is "NumSharingCache", and the number of logical processors sharing this cache is the value of this field incremented by 1. Because of its width limitation, the maximum value currently supported is 4095.

    Though at present Q35 supports up to 4096 CPUs, by constructing a specific topology, the width of the APIC ID can be extended beyond 12 bits. For example, using `-smp threads=33,cores=9,modules=9` results in a die level offset of 6 + 4 + 4 = 14 bits, which can also cause overflow. Check and honor the maximum value as CPUID.04H did.

    Cc: Babu Moger <babu.moger@amd.com>
    Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
    Link: https://lore.kernel.org/r/20250714080859.1960104-8-zhao1.liu@intel.com
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
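The 6 + 4 + 4 = 14 arithmetic and the 4095 clamp can be checked with a small sketch. The helper names are hypothetical, and the per-level width is assumed to be the number of bits needed to address `count` IDs (rounding up to a power of two).

```c
#include <assert.h>
#include <stdint.h>

/* Bits needed to address `count` distinct IDs, i.e. ceil(log2(count)). */
static unsigned bits_for_count(uint32_t count)
{
    assert(count >= 1 && count <= (1u << 31));
    unsigned bits = 0;
    while ((1u << bits) < count) {
        bits++;
    }
    return bits;
}

/* Offset of the die level: thread, core and module widths summed. */
static unsigned die_offset(uint32_t threads, uint32_t cores, uint32_t modules)
{
    return bits_for_count(threads) + bits_for_count(cores) +
           bits_for_count(modules);
}

/* Encode EAX[25:14] as (sharing count - 1), clamped to the 12-bit maximum. */
static uint32_t encode_num_sharing_cache(uint32_t num_sharing)
{
    assert(num_sharing >= 1);
    uint32_t field = num_sharing - 1;
    if (field > 0xFFF) {
        field = 0xFFF;  /* 4095 is the largest value the field can hold */
    }
    return field << 14;
}
```

A 14-bit offset means up to 16384 addressable IDs below the die level, so the unclamped value 16383 would spill past bit 25 into the neighbouring bits.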
* | i386/cpu: Fix overflow of cache topology fields in CPUID.04H (Qian Wen, 2025-07-14, 1 file, -5/+11)
    According to the SDM, CPUID.0x4:EAX[31:26] indicates the Maximum number of addressable IDs for processor cores in the physical package. If we launch a VM with over 64 cores, the 6-bit field will overflow, and the wrong core_id number will be reported.

    Since the HW reports 0x3f when the Intel processor has over 64 cores, limit the max value written to EAX[31:26] to 63, so max num_cores should be 64.

    For EAX[25:14], though at present Q35 supports up to 4096 CPUs, by constructing a specific topology, the width of the APIC ID can be extended beyond 12 bits. For example, using `-smp threads=33,cores=9,modules=9` results in a die level offset of 6 + 4 + 4 = 14 bits, which can also cause overflow. Check and honor the maximum value for EAX[25:14] as well.

    In addition, for the host-cache-info case, also apply the same checks and fixes.

    Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
    Signed-off-by: Qian Wen <qian.wen@intel.com>
    Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
    Link: https://lore.kernel.org/r/20250714080859.1960104-7-zhao1.liu@intel.com
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
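The 6-bit clamp for EAX[31:26] might look like the sketch below. `encode_cpuid4_eax_cores()` is a hypothetical helper written for this illustration, not the function the patch touches.

```c
#include <assert.h>
#include <stdint.h>

/* Write (num_cores - 1) into EAX[31:26], saturating at the 6-bit maximum. */
static uint32_t encode_cpuid4_eax_cores(uint32_t eax, uint32_t num_cores)
{
    assert(num_cores >= 1);
    uint32_t field = num_cores - 1;
    if (field > 0x3F) {
        field = 0x3F;  /* report 63, i.e. at most 64 addressable core IDs */
    }
    /* Clear the old field, keep the lower 26 bits, then insert the new value. */
    return (eax & ~(0x3Fu << 26)) | (field << 26);
}
```

Saturating keeps the lower bits of EAX intact, whereas an unclamped value such as 128 - 1 = 127 would lose its top bit when shifted into the 6-bit field.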
* | i386/cpu: Fix cpu number overflow in CPUID.01H.EBX[23:16]  Qian Wen  2025-07-14  1  -2/+7
| | The legacy topology enumerated by CPUID.1.EBX[23:16] is defined in SDM
| | Vol2:
| |
| | Bits 23-16: Maximum number of addressable IDs for logical processors in
| | this physical package.
| |
| | When threads_per_socket > 255, it will 1) overwrite bits[31:24], which
| | hold the apic_id, and 2) truncate bits[23:16].
| |
| | Specifically, if launching the VM with -smp 256, the value written to
| | EBX[23:16] is 0 because of data overflow. If the guest only supports
| | legacy topology, without V2 Extended Topology enumerated by CPUID.0x1f
| | or Extended Topology enumerated by CPUID.0x0b to support over 255 CPUs,
| | cpu_smt_allowed() returns false in the guest kernel and the APs
| | (application processors) fail to come up. Then only CPU 0 is online,
| | and the others are offline.
| |
| | For example, launch VM via:
| | qemu-system-x86_64 -M q35,accel=kvm,kernel-irqchip=split \
| |     -cpu qemu64,cpuid-0xb=off -smp 256 -m 32G \
| |     -drive file=guest.img,if=none,id=virtio-disk0,format=raw \
| |     -device virtio-blk-pci,drive=virtio-disk0,bootindex=1 --nographic
| |
| | The guest shows:
| |     CPU(s):               256
| |     On-line CPU(s) list:  0
| |     Off-line CPU(s) list: 1-255
| |
| | To avoid this issue caused by overflow, limit the max value written to
| | EBX[23:16] to 255 as the HW does.
| |
| | Cc: qemu-stable@nongnu.org
| | Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
| | Signed-off-by: Qian Wen <qian.wen@intel.com>
| | Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
| | Link: https://lore.kernel.org/r/20250714080859.1960104-6-zhao1.liu@intel.com
| | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
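The two failure modes described above (spilling into bits[31:24] and truncating bits[23:16]) can be reproduced with a simplified model of the register. `cpuid1_ebx` is a hypothetical helper that only models the two fields discussed here and ignores the other EBX bits.

```python
def cpuid1_ebx(apic_id: int, threads_per_pkg: int, clamp: bool = True) -> int:
    """Simplified CPUID.01H:EBX with only the APIC ID and EBX[23:16]."""
    count = min(threads_per_pkg, 255) if clamp else threads_per_pkg
    return ((apic_id << 24) | (count << 16)) & 0xFFFFFFFF

# -smp 256 on CPU 0 (APIC ID 0)
broken = cpuid1_ebx(0, 256, clamp=False)
fixed = cpuid1_ebx(0, 256)

broken_count = (broken >> 16) & 0xFF  # the count field reads 0
broken_apic_id = broken >> 24         # bit 24 was set by the spilled count
fixed_count = (fixed >> 16) & 0xFF    # saturated at 255
```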
* | i386/cpu: Fix number of addressable IDs field for CPUID.01H.EBX[23:16]  Chuang Xu  2025-07-14  1  -1/+11
| | When QEMU is started with:
| | -cpu host,migratable=on,host-cache-info=on,l3-cache=off
| | -smp 180,sockets=2,dies=1,cores=45,threads=2
| |
| | On Intel platform:
| | CPUID.01H.EBX[23:16] is defined as "max number of addressable IDs for
| | logical processors in the physical package".
| |
| | When executing "cpuid -1 -l 1 -r" in the guest, we obtain a value of 90
| | for CPUID.01H.EBX[23:16], whereas the expected value is 128.
| | Additionally, executing "cpuid -1 -l 4 -r" in the guest yields a value
| | of 63 for CPUID.04H.EAX[31:26], which matches the expected result.
| |
| | As (1+CPUID.04H.EAX[31:26]) rounds up to the nearest power-of-2 integer,
| | it's necessary to round up CPUID.01H.EBX[23:16] to the nearest
| | power-of-2 integer too. Otherwise there would be unexpected results in
| | guest with older kernel.
| |
| | For example, when QEMU is started with CLI above and xtopology is
| | disabled, guest kernel 5.15.120 uses
| | CPUID.01H.EBX[23:16]/(1+CPUID.04H.EAX[31:26]) to calculate
| | threads-per-core in detect_ht(). Then guest will get "90/(1+63)=1" as
| | the result, even though threads-per-core should actually be 2.
| |
| | And on AMD platform:
| | CPUID.01H.EBX[23:16] is defined as "Logical processor count". Current
| | result meets our expectation.
| |
| | So round up CPUID.01H.EBX[23:16] to the nearest power-of-2 integer only
| | for Intel platform to solve the unexpected result.
| |
| | Use the "x-vendor-cpuid-only-v2" compat option to fix this issue.
| |
| | Reviewed-by: Zhao Liu <zhao1.liu@intel.com>
| | Signed-off-by: Guixiong Wei <weiguixiong@bytedance.com>
| | Signed-off-by: Yipeng Yin <yinyipeng@bytedance.com>
| | Signed-off-by: Chuang Xu <xuchuangxclwt@bytedance.com>
| | Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
| | Link: https://lore.kernel.org/r/20250714080859.1960104-5-zhao1.liu@intel.com
| | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
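The numbers in this message follow from a power-of-2 round-up. The sketch below uses a hypothetical `pow2ceil` helper written for this example and mirrors the division the message attributes to detect_ht(); it is not the kernel or QEMU code itself.

```python
def pow2ceil(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()

# -smp 180,sockets=2,dies=1,cores=45,threads=2
threads_per_pkg = 180 // 2         # 90 logical processors per package
core_ids = 1 + 63                  # 1 + CPUID.04H:EAX[31:26]

before = threads_per_pkg // core_ids           # threads-per-core seen today
after = pow2ceil(threads_per_pkg) // core_ids  # with EBX[23:16] rounded up
```

Here `before` evaluates to 1 and `after` to 2, matching the "90/(1+63)=1" example and the expected threads-per-core of 2.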
* | i386/cpu: Reorder CPUID leaves in cpu_x86_cpuid()  Zhao Liu  2025-07-14  1  -30/+30
| | Sort the CPUID leaves strictly by index to facilitate checking and
| | changing.
| |
| | Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
| | Reviewed-by: Tao Su <tao1.su@linux.intel.com>
| | Link: https://lore.kernel.org/r/20250627035129.2755537-5-zhao1.liu@intel.com
| | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | i386/cpu: Mark CPUID 0x80000008 ECX bits[0:7] & [12:15] as reserved for Intel/Zhaoxin  Zhao Liu  2025-07-14  1  -0/+11
| | Per SDM,
| |
| | 80000008H EAX Linear/Physical Address size.
| |               Bits 07-00: #Physical Address Bits*.
| |               Bits 15-08: #Linear Address Bits.
| |               Bits 31-16: Reserved = 0.
| |           EBX Bits 08-00: Reserved = 0.
| |               Bit 09: WBNOINVD is available if 1.
| |               Bits 31-10: Reserved = 0.
| |           ECX Reserved = 0.
| |           EDX Reserved = 0.
| |
| | ECX/EDX in CPUID 0x80000008 leaf are reserved.
| |
| | Currently, in QEMU, only ECX bits[0:7] and ECX bits[12:15] are encoded,
| | and both are emulated in QEMU.
| |
| | Considering that Intel and Zhaoxin are already using the 0x1f leaf to
| | describe CPU topology, which includes similar information, Intel and
| | Zhaoxin will not implement ECX bits[0:7] and bits[12:15] of 0x80000008.
| |
| | Therefore, mark these two fields as reserved and clear them for Intel
| | and Zhaoxin guests.
| |
| | Reviewed-by: Tao Su <tao1.su@linux.intel.com>
| | Tested-by: Yi Lai <yi1.lai@intel.com>
| | Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
| | Link: https://lore.kernel.org/r/20250714080859.1960104-3-zhao1.liu@intel.com
| | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
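Clearing the two ECX fields amounts to a single mask. The following is a sketch only: `fixup_ecx_80000008` and its boolean `reserved_for_vendor` argument are illustrative stand-ins for whatever vendor check the real code performs.

```python
# ECX bits[7:0] and bits[15:12] of CPUID 0x80000008
RESERVED_MASK = 0xFF | (0xF << 12)  # 0xF0FF

def fixup_ecx_80000008(ecx: int, reserved_for_vendor: bool) -> int:
    """Clear the two topology fields when the vendor treats them as reserved."""
    if reserved_for_vendor:
        return ecx & ~RESERVED_MASK & 0xFFFFFFFF
    return ecx

intel_like = fixup_ecx_80000008(0x0000FFFF, reserved_for_vendor=True)
other = fixup_ecx_80000008(0x0000FFFF, reserved_for_vendor=False)
```

Only bits[11:8] survive in the first case (0x0F00); the second case leaves the register untouched.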
* | i386/cpu: Mark CPUID 0x80000007[EBX] as reserved for Intel  Zhao Liu  2025-07-14  1  -1/+5
| | Per SDM,
| |
| | 80000007H EAX Reserved = 0.
| |           EBX Reserved = 0.
| |           ECX Reserved = 0.
| |           EDX Bits 07-00: Reserved = 0.
| |               Bit 08: Invariant TSC available if 1.
| |               Bits 31-09: Reserved = 0.
| |
| | EAX/EBX/ECX in CPUID 0x80000007 leaf are reserved for Intel.
| |
| | At present, EAX is reserved for AMD, too. And AMD hasn't used ECX in
| | QEMU. So these 2 registers are both left as 0.
| |
| | Therefore, only fix the EBX and encode it as 0 for Intel.
| |
| | Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
| | Reviewed-by: Tao Su <tao1.su@linux.intel.com>
| | Link: https://lore.kernel.org/r/20250627035129.2755537-3-zhao1.liu@intel.com
| | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | i386/cpu: Mark EBX/ECX/EDX in CPUID 0x80000000 leaf as reserved for Intel  Zhao Liu  2025-07-14  1  -3/+9
| | Per SDM,
| |
| | 80000000H EAX Maximum Input Value for Extended Function CPUID Information.
| |           EBX Reserved.
| |           ECX Reserved.
| |           EDX Reserved.
| |
| | EBX/ECX/EDX in CPUID 0x80000000 leaf are reserved. Intel is using 0x0
| | leaf to encode vendor.
| |
| | Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
| | Reviewed-by: Tao Su <tao1.su@linux.intel.com>
| | Link: https://lore.kernel.org/r/20250627035129.2755537-2-zhao1.liu@intel.com
| | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | i386/cpu: Enable 0x1f leaf for YongFeng by default  Zhao Liu  2025-07-12  1  -1/+5
| | Host YongFeng CPU has 0x1f leaf by default, so enable it for Guest CPU
| | by default as well.
| |
| | Suggested-by: Ewan Hai <ewanhai-oc@zhaoxin.com>
| | Tested-by: Yi Lai <yi1.lai@intel.com>
| | Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
| | Link: https://lore.kernel.org/r/20250711104603.1634832-10-zhao1.liu@intel.com
| | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | i386/cpu: Enable 0x1f leaf for SapphireRapids by default  Zhao Liu  2025-07-12  1  -1/+5
| | Host SapphireRapids CPU has 0x1f leaf by default, so enable it for
| | Guest CPU by default as well.
| |
| | Suggested-by: Igor Mammedov <imammedo@redhat.com>
| | Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
| | Tested-by: Yi Lai <yi1.lai@intel.com>
| | Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
| | Link: https://lore.kernel.org/r/20250711104603.1634832-9-zhao1.liu@intel.com
| | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | i386/cpu: Enable 0x1f leaf for GraniteRapids by default  Zhao Liu  2025-07-12  1  -1/+5
| | Host GraniteRapids CPU has 0x1f leaf by default, so enable it for
| | Guest CPU by default as well.
| |
| | Suggested-by: Igor Mammedov <imammedo@redhat.com>
| | Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
| | Tested-by: Yi Lai <yi1.lai@intel.com>
| | Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
| | Link: https://lore.kernel.org/r/20250711104603.1634832-8-zhao1.liu@intel.com
| | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | i386/cpu: Enable 0x1f leaf for SierraForest by default  Zhao Liu  2025-07-12  1  -0/+1
| | Host SierraForest CPU has 0x1f leaf by default, so enable it for
| | Guest CPU by default as well.
| |
| | Suggested-by: Igor Mammedov <imammedo@redhat.com>
| | Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
| | Tested-by: Yi Lai <yi1.lai@intel.com>
| | Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
| | Link: https://lore.kernel.org/r/20250711104603.1634832-7-zhao1.liu@intel.com
| | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | i386/cpu: Enable 0x1f leaf for SierraForest by default  Zhao Liu  2025-07-12  1  -1/+4
| | Host SierraForest CPU has 0x1f leaf by default, so enable it for
| | Guest CPU by default as well.
| |
| | Suggested-by: Igor Mammedov <imammedo@redhat.com>
| | Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
| | Tested-by: Yi Lai <yi1.lai@intel.com>
| | Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
| | Link: https://lore.kernel.org/r/20250711104603.1634832-7-zhao1.liu@intel.com
| | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | i386/cpu: Add a "x-force-cpuid-0x1f" property  Manish Mishra  2025-07-12  1  -0/+1
| | Add a "x-force-cpuid-0x1f" property so that CPU models can enable it
| | and have the 0x1f CPUID leaf naturally, as the Host CPU does.
| |
| | The advantage is that when the CPU model's cache model is already
| | consistent with the Host CPU, for example, SRF defaults to l2 per
| | module & l3 per package, 0x1f can better help users identify the
| | topology in the VM.
| |
| | Adding 0x1f for specific CPU models should not cause any trouble in
| | principle. This property is only enabled for CPU models that already
| | have 0x1f leaf on the Host, so software that originally runs normally
| | on the Host won't encounter issues in the Guest with corresponding CPU
| | model. Conversely, some software that relies on checking 0x1f might
| | have problems in the Guest due to the lack of 0x1f [*].
| |
| | In summary, adding 0x1f is also intended to further emulate the Host
| | CPU environment.
| |
| | [*]: https://lore.kernel.org/qemu-devel/PH0PR02MB738410511BF51B12DB09BE6CF6AC2@PH0PR02MB7384.namprd02.prod.outlook.com/
| |
| | Signed-off-by: Manish Mishra <manish.mishra@nutanix.com>
| | Co-authored-by: Xiaoyao Li <xiaoyao.li@intel.com>
| | Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
| | [Integrated and rebased 2 previous patches (ordered by post time)]
| | Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
| | Tested-by: Yi Lai <yi1.lai@intel.com>
| | Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
| | Link: https://lore.kernel.org/r/20250711104603.1634832-6-zhao1.liu@intel.com
| | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | i386/cpu: Introduce cache model for YongFeng  Ewan Hai  2025-07-12  1  -0/+104
| | Add the cache model to YongFeng (v3) to better emulate its
| | environment.
| |
| | Note, although YongFeng v2 was added after v10.0, it was also back
| | ported to v10.0.2. Therefore, the new version (v3) is needed to avoid
| | conflict.
| |
| | The cache model is as follows:
| |
| | --- cache 0 ---
| | cache type                         = data cache (1)
| | cache level                        = 0x1 (1)
| | self-initializing cache level      = true
| | fully associative cache            = false
| | maximum IDs for CPUs sharing cache = 0x0 (0)
| | maximum IDs for cores in pkg       = 0x0 (0)
| | system coherency line size         = 0x40 (64)
| | physical line partitions           = 0x1 (1)
| | ways of associativity              = 0x8 (8)
| | number of sets                     = 0x40 (64)
| | WBINVD/INVD acts on lower caches   = false
| | inclusive to lower caches          = false
| | complex cache indexing             = false
| | number of sets (s)                 = 64
| | (size synth)                       = 32768 (32 KB)
| | --- cache 1 ---
| | cache type                         = instruction cache (2)
| | cache level                        = 0x1 (1)
| | self-initializing cache level      = true
| | fully associative cache            = false
| | maximum IDs for CPUs sharing cache = 0x0 (0)
| | maximum IDs for cores in pkg       = 0x0 (0)
| | system coherency line size         = 0x40 (64)
| | physical line partitions           = 0x1 (1)
| | ways of associativity              = 0x10 (16)
| | number of sets                     = 0x40 (64)
| | WBINVD/INVD acts on lower caches   = false
| | inclusive to lower caches          = false
| | complex cache indexing             = false
| | number of sets (s)                 = 64
| | (size synth)                       = 65536 (64 KB)
| | --- cache 2 ---
| | cache type                         = unified cache (3)
| | cache level                        = 0x2 (2)
| | self-initializing cache level      = true
| | fully associative cache            = false
| | maximum IDs for CPUs sharing cache = 0x0 (0)
| | maximum IDs for cores in pkg       = 0x0 (0)
| | system coherency line size         = 0x40 (64)
| | physical line partitions           = 0x1 (1)
| | ways of associativity              = 0x8 (8)
| | number of sets                     = 0x200 (512)
| | WBINVD/INVD acts on lower caches   = false
| | inclusive to lower caches          = true
| | complex cache indexing             = false
| | number of sets (s)                 = 512
| | (size synth)                       = 262144 (256 KB)
| | --- cache 3 ---
| | cache type                         = unified cache (3)
| | cache level                        = 0x3 (3)
| | self-initializing cache level      = true
| | fully associative cache            = false
| | maximum IDs for CPUs sharing cache = 0x0 (0)
| | maximum IDs for cores in pkg       = 0x0 (0)
| | system coherency line size         = 0x40 (64)
| | physical line partitions           = 0x1 (1)
| | ways of associativity              = 0x10 (16)
| | number of sets                     = 0x2000 (8192)
| | WBINVD/INVD acts on lower caches   = true
| | inclusive to lower caches          = true
| | complex cache indexing             = false
| | number of sets (s)                 = 8192
| | (size synth)                       = 8388608 (8 MB)
| | --- cache 4 ---
| | cache type                         = no more caches (0)
| |
| | Tested-by: Yi Lai <yi1.lai@intel.com>
| | Signed-off-by: Ewan Hai <ewanhai-oc@zhaoxin.com>
| | Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
| | Link: https://lore.kernel.org/r/20250711104603.1634832-5-zhao1.liu@intel.com
| | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
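The "(size synth)" values in the dump are the product of ways, partitions, line size and sets. A short sketch that recomputes them for the YongFeng model; the `Cache` tuple and the dictionary keys are names chosen for this example, not QEMU structures.

```python
from typing import NamedTuple

class Cache(NamedTuple):
    ways: int
    partitions: int
    line_size: int
    sets: int

    def size(self) -> int:
        """Cache size in bytes: ways * partitions * line size * sets."""
        return self.ways * self.partitions * self.line_size * self.sets

# Values copied from the YongFeng dump above.
yongfeng = {
    "L1d": Cache(ways=8, partitions=1, line_size=64, sets=64),
    "L1i": Cache(ways=16, partitions=1, line_size=64, sets=64),
    "L2": Cache(ways=8, partitions=1, line_size=64, sets=512),
    "L3": Cache(ways=16, partitions=1, line_size=64, sets=8192),
}
sizes = {name: cache.size() for name, cache in yongfeng.items()}
```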
* | i386/cpu: Introduce cache model for SapphireRapids  Zhao Liu  2025-07-12  1  -0/+96
| | Add the cache model to SapphireRapids (v4) to better emulate its
| | environment.
| |
| | The cache model is based on SapphireRapids-SP (Scalable Performance):
| |
| | --- cache 0 ---
| | cache type                         = data cache (1)
| | cache level                        = 0x1 (1)
| | self-initializing cache level      = true
| | fully associative cache            = false
| | maximum IDs for CPUs sharing cache = 0x1 (1)
| | maximum IDs for cores in pkg       = 0x3f (63)
| | system coherency line size         = 0x40 (64)
| | physical line partitions           = 0x1 (1)
| | ways of associativity              = 0xc (12)
| | number of sets                     = 0x40 (64)
| | WBINVD/INVD acts on lower caches   = false
| | inclusive to lower caches          = false
| | complex cache indexing             = false
| | number of sets (s)                 = 64
| | (size synth)                       = 49152 (48 KB)
| | --- cache 1 ---
| | cache type                         = instruction cache (2)
| | cache level                        = 0x1 (1)
| | self-initializing cache level      = true
| | fully associative cache            = false
| | maximum IDs for CPUs sharing cache = 0x1 (1)
| | maximum IDs for cores in pkg       = 0x3f (63)
| | system coherency line size         = 0x40 (64)
| | physical line partitions           = 0x1 (1)
| | ways of associativity              = 0x8 (8)
| | number of sets                     = 0x40 (64)
| | WBINVD/INVD acts on lower caches   = false
| | inclusive to lower caches          = false
| | complex cache indexing             = false
| | number of sets (s)                 = 64
| | (size synth)                       = 32768 (32 KB)
| | --- cache 2 ---
| | cache type                         = unified cache (3)
| | cache level                        = 0x2 (2)
| | self-initializing cache level      = true
| | fully associative cache            = false
| | maximum IDs for CPUs sharing cache = 0x1 (1)
| | maximum IDs for cores in pkg       = 0x3f (63)
| | system coherency line size         = 0x40 (64)
| | physical line partitions           = 0x1 (1)
| | ways of associativity              = 0x10 (16)
| | number of sets                     = 0x800 (2048)
| | WBINVD/INVD acts on lower caches   = false
| | inclusive to lower caches          = false
| | complex cache indexing             = false
| | number of sets (s)                 = 2048
| | (size synth)                       = 2097152 (2 MB)
| | --- cache 3 ---
| | cache type                         = unified cache (3)
| | cache level                        = 0x3 (3)
| | self-initializing cache level      = true
| | fully associative cache            = false
| | maximum IDs for CPUs sharing cache = 0x7f (127)
| | maximum IDs for cores in pkg       = 0x3f (63)
| | system coherency line size         = 0x40 (64)
| | physical line partitions           = 0x1 (1)
| | ways of associativity              = 0xf (15)
| | number of sets                     = 0x10000 (65536)
| | WBINVD/INVD acts on lower caches   = false
| | inclusive to lower caches          = false
| | complex cache indexing             = true
| | number of sets (s)                 = 65536
| | (size synth)                       = 62914560 (60 MB)
| | --- cache 4 ---
| | cache type                         = no more caches (0)
| |
| | Suggested-by: Tejus GK <tejus.gk@nutanix.com>
| | Suggested-by: Jason Zeng <jason.zeng@intel.com>
| | Suggested-by: "Daniel P . Berrangé" <berrange@redhat.com>
| | Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
| | Reviewed-by: Tao Su <tao1.su@linux.intel.com>
| | Tested-by: Yi Lai <yi1.lai@intel.com>
| | Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
| | Link: https://lore.kernel.org/r/20250711104603.1634832-4-zhao1.liu@intel.com
| | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
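For readers matching this dump against raw register values, the sketch below packs the SapphireRapids L3 parameters into CPUID.04H EBX/ECX using the SDM layout (EBX[31:22] = ways - 1, EBX[21:12] = partitions - 1, EBX[11:0] = line size - 1, ECX = sets - 1) and decodes them again. `encode_leaf4` and `decode_size` are illustrative names, not QEMU functions.

```python
def encode_leaf4(ways: int, partitions: int, line_size: int, sets: int):
    """Pack cache geometry into CPUID.04H EBX and ECX (each field minus 1)."""
    ebx = ((ways - 1) << 22) | ((partitions - 1) << 12) | (line_size - 1)
    ecx = sets - 1
    return ebx, ecx

def decode_size(ebx: int, ecx: int) -> int:
    """Recover the cache size in bytes from EBX/ECX."""
    ways = (ebx >> 22) + 1
    partitions = ((ebx >> 12) & 0x3FF) + 1
    line_size = (ebx & 0xFFF) + 1
    return ways * partitions * line_size * (ecx + 1)

# SapphireRapids L3 from the dump above: 15 ways, 1 partition, 64 B, 65536 sets
ebx, ecx = encode_leaf4(15, 1, 64, 65536)
l3_bytes = decode_size(ebx, ecx)
```

Decoding gives back 62914560 bytes, the 60 MB reported as "(size synth)" for cache 3.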
* | i386/cpu: Introduce cache model for GraniteRapids  Zhao Liu  2025-07-12  1  -0/+96
| | Add the cache model to GraniteRapids (v3) to better emulate its
| | environment.
| |
| | The cache model is based on GraniteRapids-SP (Scalable Performance):
| |
| | --- cache 0 ---
| | cache type                         = data cache (1)
| | cache level                        = 0x1 (1)
| | self-initializing cache level      = true
| | fully associative cache            = false
| | maximum IDs for CPUs sharing cache = 0x1 (1)
| | maximum IDs for cores in pkg       = 0x3f (63)
| | system coherency line size         = 0x40 (64)
| | physical line partitions           = 0x1 (1)
| | ways of associativity              = 0xc (12)
| | number of sets                     = 0x40 (64)
| | WBINVD/INVD acts on lower caches   = false
| | inclusive to lower caches          = false
| | complex cache indexing             = false
| | number of sets (s)                 = 64
| | (size synth)                       = 49152 (48 KB)
| | --- cache 1 ---
| | cache type                         = instruction cache (2)
| | cache level                        = 0x1 (1)
| | self-initializing cache level      = true
| | fully associative cache            = false
| | maximum IDs for CPUs sharing cache = 0x1 (1)
| | maximum IDs for cores in pkg       = 0x3f (63)
| | system coherency line size         = 0x40 (64)
| | physical line partitions           = 0x1 (1)
| | ways of associativity              = 0x10 (16)
| | number of sets                     = 0x40 (64)
| | WBINVD/INVD acts on lower caches   = false
| | inclusive to lower caches          = false
| | complex cache indexing             = false
| | number of sets (s)                 = 64
| | (size synth)                       = 65536 (64 KB)
| | --- cache 2 ---
| | cache type                         = unified cache (3)
| | cache level                        = 0x2 (2)
| | self-initializing cache level      = true
| | fully associative cache            = false
| | maximum IDs for CPUs sharing cache = 0x1 (1)
| | maximum IDs for cores in pkg       = 0x3f (63)
| | system coherency line size         = 0x40 (64)
| | physical line partitions           = 0x1 (1)
| | ways of associativity              = 0x10 (16)
| | number of sets                     = 0x800 (2048)
| | WBINVD/INVD acts on lower caches   = false
| | inclusive to lower caches          = false
| | complex cache indexing             = false
| | number of sets (s)                 = 2048
| | (size synth)                       = 2097152 (2 MB)
| | --- cache 3 ---
| | cache type                         = unified cache (3)
| | cache level                        = 0x3 (3)
| | self-initializing cache level      = true
| | fully associative cache            = false
| | maximum IDs for CPUs sharing cache = 0xff (255)
| | maximum IDs for cores in pkg       = 0x3f (63)
| | system coherency line size         = 0x40 (64)
| | physical line partitions           = 0x1 (1)
| | ways of associativity              = 0x10 (16)
| | number of sets                     = 0x48000 (294912)
| | WBINVD/INVD acts on lower caches   = false
| | inclusive to lower caches          = false
| | complex cache indexing             = true
| | number of sets (s)                 = 294912
| | (size synth)                       = 301989888 (288 MB)
| | --- cache 4 ---
| | cache type                         = no more caches (0)
| |
| | Suggested-by: Tejus GK <tejus.gk@nutanix.com>
| | Suggested-by: Jason Zeng <jason.zeng@intel.com>
| | Suggested-by: "Daniel P . Berrangé" <berrange@redhat.com>
| | Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
| | Reviewed-by: Tao Su <tao1.su@linux.intel.com>
| | Tested-by: Yi Lai <yi1.lai@intel.com>
| | Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
| | Link: https://lore.kernel.org/r/20250711104603.1634832-3-zhao1.liu@intel.com
| | Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>