path: root/accel/tcg/cpu-exec.c
...
* accel/tcg: drop the use of CF_HASH_MASK and rename params (Alex Bennée, 2021-03-06; 1 file, -8/+8)

    We don't really deal in cf_mask most of the time. The one time it's relevant is when we want to remove an invalidated TB from the QHT lookup. Everywhere else we should be looking up things without CF_INVALID set.

    Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    Message-Id: <20210224165811.11567-4-alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
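A minimal illustration of the invariant described above; hedged and simplified, not the patch itself (tb_htable_lookup() and curr_cflags() are existing helpers in this area of the tree):

    /* sketch: lookups always pass cflags with CF_INVALID clear */
    uint32_t cflags = curr_cflags(cpu);
    TranslationBlock *tb = tb_htable_lookup(cpu, pc, cs_base, flags, cflags);
    /* invalidation sets CF_INVALID on the TB, so later lookups miss it */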
* accel/tcg: move CF_CLUSTER calculation to curr_cflags (Alex Bennée, 2021-03-06; 1 file, -5/+4)

    There is nothing special about this compile flag; we can simply compute it in curr_cflags(), which we should be using whenever we build a new set of flags.

    Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    Message-Id: <20210224165811.11567-3-alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
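A sketch of the shape curr_cflags() plausibly takes after this change; hedged, reconstructed from the flags named elsewhere in this log rather than copied from the patch:

    /* sketch: compute the base cflags for a fresh TB lookup/compile */
    static inline uint32_t curr_cflags(CPUState *cpu)
    {
        return (cpu->cluster_index << CF_CLUSTER_SHIFT)
               | (parallel_cpus ? CF_PARALLEL : 0)
               | (icount_enabled() ? CF_USE_ICOUNT : 0);
    }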
* accel/tcg: rename tb_lookup__cpu_state and hoist state extraction (Alex Bennée, 2021-03-06; 1 file, -2/+8)

    Having a function return both a valid TB and some system state seems excessive. It will make the subsequent refactoring easier if we look up the current state at the point of use.

    Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    Message-Id: <20210224165811.11567-2-alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
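A sketch of the resulting caller pattern; hedged (cpu_get_tb_cpu_state() is the existing per-target state extractor, tb_lookup() the renamed lookup):

    /* sketch: extract the CPU state at the call site, then look up */
    target_ulong pc, cs_base;
    uint32_t flags;

    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
    tb = tb_lookup(cpu, pc, cs_base, flags, cflags);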
* accel/tcg: cache single instruction TB on pending replay exception (Alex Bennée, 2021-02-18; 1 file, -40/+4)

    Again there is no reason to jump through the nocache hoops to execute a single instruction block. We do have to add an additional wrinkle to the cpu_handle_interrupt case to ensure we let through a TB where we have specifically disabled icount for the block. As the last user of cpu_exec_nocache we can now remove the function. Further clean-up will follow in subsequent patches.

    Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-Id: <20210213130325.14781-18-alex.bennee@linaro.org>
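A sketch of the mechanism; hedged. The low CF_COUNT_MASK bits of cflags carry the instruction budget, so a one-instruction cached TB is requested by setting that field to 1 for the next lookup:

    /* sketch: make the next tb_lookup fetch/compile a 1-insn TB */
    cpu->cflags_next_tb = curr_cflags(cpu) | CF_LAST_IO | 1;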
* accel/tcg: actually cache our partial icount TB (Alex Bennée, 2021-02-18; 1 file, -8/+9)

    When we exit a block under icount with instructions left to execute, we might need a shorter-than-normal block to take us to the next deterministic event. Instead of creating a throwaway block on demand, we use the existing compile flags mechanism to ensure we fetch (or compile and fetch) a block with exactly the number of instructions we need.

    Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-Id: <20210213130325.14781-17-alex.bennee@linaro.org>
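A sketch of the idea; hedged and simplified from the patch. The exact remaining instruction count is encoded into the cflags used for the next lookup, so the shortened block is itself cached and reusable:

    /* sketch: fetch (or compile) a TB with exactly insns_left insns */
    if (insns_left > 0 && insns_left < tb->icount) {
        cpu->cflags_next_tb = (tb->cflags & ~CF_COUNT_MASK) | insns_left;
    }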
* accel/tcg: Add URL of clang bug to comment about our workaround (Peter Maydell, 2021-02-11; 1 file, -6/+19)

    In cpu_exec() we have a longstanding workaround for compilers which do not correctly implement the part of the sigsetjmp()/siglongjmp() spec which requires that local variables which are not changed between the setjmp and the longjmp retain their value.

    I recently ran across the upstream clang bug report for this; add a link to it to the comment describing the workaround, and generally expand the comment, so that we have a reasonable chance in future of understanding why it's there and determining when we can remove it, assuming clang eventually fixes the bug.

    Remove the /* buggy compiler */ comments on the #else and #endif: they don't add anything to understanding and are somewhat misleading, since they're sandwiching the code path for *non*-buggy compilers.

    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Message-id: 20210129130330.30820-1-peter.maydell@linaro.org
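A condensed sketch of the workaround the comment documents; hedged, not the verbatim cpu_exec() code:

    /* sketch: reload locals that a buggy compiler may have kept in
     * registers clobbered across the siglongjmp */
    if (sigsetjmp(cpu->jmp_env, 0) != 0) {
    #if defined(__clang__)
        cpu = current_cpu;
        cc = CPU_GET_CLASS(cpu);
    #endif
    }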
* cpu: tcg_ops: move to tcg-cpu-ops.h, keep a pointer in CPUClass (Claudio Fontana, 2021-02-05; 1 file, -13/+14)

    We cannot, in principle, make the TCG operations field definitions conditional on CONFIG_TCG in code that is included by both common_ss and specific_ss modules. What we can safely do to restrict the TCG fields to TCG-only builds is move all TCG cpu operations into a separate header file, which is only included by TCG, target-specific code. This leaves just a NULL pointer in cpu.h for the non-TCG builds.

    This also tidies up the code in all targets a bit, with all TCG cpu operations now neatly contained in a dedicated data struct.

    Signed-off-by: Claudio Fontana <cfontana@suse.de>
    Message-Id: <20210204163931.7358-16-cfontana@suse.de>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
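A sketch of the resulting struct; hedged, a subset reconstructed from the hooks that the commits below move into it:

    /* sketch: TCG-only CPU hooks gathered in one struct */
    struct TCGCPUOps {
        void (*initialize)(void);
        void (*synchronize_from_tb)(CPUState *cpu, const TranslationBlock *tb);
        void (*cpu_exec_enter)(CPUState *cpu);
        void (*cpu_exec_exit)(CPUState *cpu);
        bool (*cpu_exec_interrupt)(CPUState *cpu, int interrupt_request);
        void (*do_interrupt)(CPUState *cpu);
        void (*debug_excp_handler)(CPUState *cpu);
    };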
* cpu: move cc->do_interrupt to tcg_ops (Claudio Fontana, 2021-02-05; 1 file, -2/+2)

    Signed-off-by: Claudio Fontana <cfontana@suse.de>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-Id: <20210204163931.7358-10-cfontana@suse.de>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* cpu: Move debug_excp_handler to tcg_ops (Eduardo Habkost, 2021-02-05; 1 file, -2/+2)

    Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
    Signed-off-by: Claudio Fontana <cfontana@suse.de>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-Id: <20210204163931.7358-8-cfontana@suse.de>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* cpu: Move cpu_exec_* to tcg_ops (Eduardo Habkost, 2021-02-05; 1 file, -6/+6)

    Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
    [claudio: wrapped target code in CONFIG_TCG]
    Signed-off-by: Claudio Fontana <cfontana@suse.de>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-Id: <20210204163931.7358-6-cfontana@suse.de>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* cpu: Move synchronize_from_tb() to tcg_ops (Eduardo Habkost, 2021-02-05; 1 file, -2/+2)

    Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
    [claudio: wrapped target code in CONFIG_TCG, reworded comments]
    Signed-off-by: Claudio Fontana <cfontana@suse.de>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Message-Id: <20210204163931.7358-5-cfontana@suse.de>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* accel/tcg: split TCG-only code from cpu_exec_realizefn (Claudio Fontana, 2021-02-05; 1 file, -0/+28)

    Move the TCG-only code out of cpu_exec_realizefn() and compile it only when TCG is enabled.

    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    [claudio: moved the prototypes from hw/core/cpu.h to exec/cpu-all.h]
    Signed-off-by: Claudio Fontana <cfontana@suse.de>
    Message-Id: <20210204163931.7358-4-cfontana@suse.de>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Toggle page execution for Apple Silicon (Roman Bolshakov, 2021-01-23; 1 file, -0/+2)

    Pages can't be both writable and executable at the same time on Apple Silicon. macOS provides a public API to switch write protection [1] for JIT applications, like TCG.

    1. https://developer.apple.com/documentation/apple_silicon/porting_just-in-time_compilers_to_apple_silicon

    Tested-by: Alexander Graf <agraf@csgraf.de>
    Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com>
    Message-Id: <20210113032806.18220-1-r.bolshakov@yadro.com>
    [rth: Inline the qemu_thread_jit_* functions; drop the MAP_JIT change for a follow-on patch.]
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
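A sketch of the toggling helpers; hedged (the macOS call is the public pthread_jit_write_protect_np() API from [1]; the exact version guard in the tree may differ):

    #include <pthread.h>

    /* sketch: make the JIT region writable (not executable) */
    static inline void qemu_thread_jit_write(void)
    {
    #if defined(__APPLE__) && defined(MAC_OS_VERSION_11_0)
        pthread_jit_write_protect_np(false);
    #endif
    }

    /* sketch: make the JIT region executable (not writable) */
    static inline void qemu_thread_jit_execute(void)
    {
    #if defined(__APPLE__) && defined(MAC_OS_VERSION_11_0)
        pthread_jit_write_protect_np(true);
    #endif
    }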
* accel/tcg: Restrict tb_gen_code() from other accelerators (Philippe Mathieu-Daudé, 2021-01-23; 1 file, -0/+1)

    tb_gen_code() is only called within the TCG accelerator; declare it locally.

    Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
    Message-Id: <20210117164813.4101761-4-f4bug@amsat.org>
    [rth: Adjust vs changed tb_flush_jmp_cache patch.]
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: update the cpu running flag in cpu_exec_step_atomic (Douglas Crosher, 2021-01-22; 1 file, -0/+4)

    The cpu_exec_step_atomic() function is called with cpu->running clear and proceeds to run target code without setting this flag. If that target code generates an exception, then handle_cpu_signal() will unnecessarily abort; for example, if the atomic code generates a memory protection fault. This patch at least sets and clears the running flag, and adds some assertions to help detect other cases.

    Signed-off-by: Douglas Crosher <dtc-ubuntu@scieneer.com>
    Message-Id: <a272c656-f7c5-019d-1cc0-499b8f80f2fc@scieneer.com>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
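A condensed sketch of the fix; hedged (error paths omitted):

    /* sketch: mark the vCPU as running around the atomic step so
     * handle_cpu_signal() does not abort on a guest fault */
    if (sigsetjmp(cpu->jmp_env, 0) == 0) {
        start_exclusive();
        g_assert(cpu == current_cpu);
        g_assert(!cpu->running);
        cpu->running = true;
        tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
        cpu_tb_exec(cpu, tb);
        cpu->running = false;
        end_exclusive();
    }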
* tcg: Return the TB pointer from the rx region from exit_tb (Richard Henderson, 2021-01-07; 1 file, -14/+21)

    This produces a small pc-relative displacement within the generated code to the TB structure that precedes it.

    Reviewed-by: Joelle van Dyne <j@getutm.app>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
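A sketch of the caller-side decoding; hedged (tcg_splitwx_to_rw() is the translation helper introduced lower in this log):

    /* sketch: low bits carry the exit flag; the rest is the rx-side
     * TB pointer, translated back to the rw view for the C code */
    uintptr_t ret = tcg_qemu_tb_exec(env, tb_ptr);
    TranslationBlock *last_tb = tcg_splitwx_to_rw((void *)(ret & ~TB_EXIT_MASK));
    int tb_exit = ret & TB_EXIT_MASK;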
* tcg: Adjust tb_target_set_jmp_target for split-wx (Richard Henderson, 2021-01-07; 1 file, -1/+3)

    Pass both rx and rw addresses to tb_target_set_jmp_target.

    Reviewed-by: Joelle van Dyne <j@getutm.app>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* tcg: Introduce tcg_splitwx_to_{rx,rw} (Richard Henderson, 2021-01-07; 1 file, -1/+1)

    Add two helper functions, using a global variable to hold the displacement. The displacement is currently always 0, so no change in behaviour. Begin using the functions in tcg common code only.

    Reviewed-by: Joelle van Dyne <j@getutm.app>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
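A sketch matching the description; hedged (names from the commit subject, bodies reconstructed rather than copied):

    /* sketch: global displacement between the rw and rx mappings */
    extern uintptr_t tcg_splitwx_diff;

    static inline const void *tcg_splitwx_to_rx(void *rw)
    {
        return (const void *)((uintptr_t)rw + tcg_splitwx_diff);
    }

    static inline void *tcg_splitwx_to_rw(const void *rx)
    {
        return (void *)((uintptr_t)rx - tcg_splitwx_diff);
    }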
* cfi: Initial support for cfi-icall in QEMU (Daniele Buono, 2021-01-02; 1 file, -0/+11)

    LLVM/Clang supports runtime checks for forward-edge Control-Flow Integrity (CFI). CFI on indirect function calls (cfi-icall) ensures that, in indirect function calls, the function called is of the right signature for the pointer type defined at compile time.

    For this check to work, the code must always respect the function signature when using function pointers, the functions must be defined at compile time, and the code must be compiled with link-time optimization. This rules out, for example, shared libraries that are dynamically loaded (given that the functions are not known at compile time), and code that is dynamically generated at run time.

    This patch:
    1) introduces the CONFIG_CFI flag to support CFI in QEMU;
    2) introduces a decorator to allow the definition of "sensitive" functions, where a non-instrumented function may be called at runtime through a pointer. The decorator will take care of disabling cfi-icall checks on such functions, when CFI is enabled (see the sketch below);
    3) marks the functions currently in QEMU that exhibit such behavior, in particular:
       - the function in TCG that calls pre-compiled TBs
       - the function in TCI that interprets instructions
       - functions in the plugin infrastructure that jump to callbacks
       - functions in util that directly call a signal handler

    Signed-off-by: Daniele Buono <dbuono@linux.vnet.ibm.com>
    Acked-by: Alex Bennée <alex.bennee@linaro.org>
    Message-Id: <20201204230615.2392-3-dbuono@linux.vnet.ibm.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
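A sketch of the decorator from point 2; hedged (the attribute is Clang's documented no_sanitize("cfi-icall")):

    /* sketch: opt a "sensitive" function out of cfi-icall checking */
    #ifdef CONFIG_CFI
    #define QEMU_DISABLE_CFI __attribute__((no_sanitize("cfi-icall")))
    #else
    #define QEMU_DISABLE_CFI
    #endif

Applied, for example, to the TB dispatcher, whose indirect jump into freshly generated code cannot be known to the checker at link time.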
* tcg: Make CPUClass.debug_excp_handler optional (Eduardo Habkost, 2020-12-16; 1 file, -1/+3)

    Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
    Signed-off-by: Claudio Fontana <cfontana@suse.de>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Message-Id: <20201212155530.23098-12-cfontana@suse.de>
    Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
* tcg: make CPUClass.cpu_exec_* optional (Eduardo Habkost, 2020-12-16; 1 file, -3/+8)

    This will let us simplify the code that initializes CPU class methods, when we move cpu_exec_*() to a separate struct.

    Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
    Signed-off-by: Claudio Fontana <cfontana@suse.de>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Message-Id: <20201212155530.23098-11-cfontana@suse.de>
    Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
* tcg: cpu_exec_{enter,exit} helpers (Eduardo Habkost, 2020-12-16; 1 file, -5/+18)

    Move invocation of CPUClass.cpu_exec_*() to separate helpers, to make it easier to refactor that code later.

    Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
    Signed-off-by: Claudio Fontana <cfontana@suse.de>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-Id: <20201212155530.23098-10-cfontana@suse.de>
    Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
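A sketch of one such helper, combined with the NULL check that the "make optional" commit above enables; hedged:

    /* sketch: invoke the hook only if the CPU class provides one */
    static void cpu_exec_enter(CPUState *cpu)
    {
        CPUClass *cc = CPU_GET_CLASS(cpu);

        if (cc->cpu_exec_enter) {
            cc->cpu_exec_enter(cpu);
        }
    }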
* accel/tcg: Remove special case for GCC < 4.6 (Philippe Mathieu-Daudé, 2020-12-15; 1 file, -1/+1)

    Since commit efc6c070aca ("configure: Add a test for the minimum compiler version") the minimum compiler version required for GCC is 4.8. We can safely remove the special case for GCC 4.6 introduced in commit 0448f5f8b81 ("cpu-exec: Fix compiler warning (-Werror=clobbered)"). Clang is left unchanged, as we don't know which versions are affected.

    Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
    Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
    Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
    Message-Id: <20201210134752.780923-3-marcandre.lureau@redhat.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* icount: improve exec nocache usage (Pavel Dovgalyuk, 2020-12-15; 1 file, -1/+1)

    cpu-exec tries to execute a TB without caching when the current icount budget is exhausted. But sometimes the refilled budget is big enough to run the cached block after all. This patch checks that the instruction budget is big enough for the next block before falling back to cpu_exec_nocache. It halves the number of cpu_exec_nocache calls during the tested OS boot scenario.

    Signed-off-by: Pavel Dovgalyuk <pavel.dovgalyuk@ispras.ru>
    Message-Id: <160741865825.348476.7169239332367828943.stgit@pasha-ThinkPad-X280>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
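A sketch of the tightened condition; hedged (cpu_exec_nocache() still exists at this point in the log; a later commit above removes it):

    /* sketch: only go uncached when the refilled budget really is
     * too small for the cached TB */
    if (insns_left > 0 && insns_left < tb->icount) {
        cpu_exec_nocache(cpu, insns_left, tb, false);
    }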
* replay: don't record interrupt poll (Pavel Dovgalyuk, 2020-10-06; 1 file, -3/+18)

    Interrupt poll is not a real interrupt event; it is needed only for thread safety. This interrupt is used for i386 and is converted to a hardware interrupt by the cpu_handle_interrupt function. Therefore it does not need to be recorded, because the hardware interrupt will be recorded after the conversion.

    Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgalyuk@ispras.ru>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

    --
    v4 changes:
    - Condition check refactoring (suggested by Alex Bennée)

    Message-Id: <160174517124.12451.12983410242461131737.stgit@pasha-ThinkPad-X280>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
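A sketch of the resulting predicate; hedged, reconstructed from the description (i386 being the only user of the poll pseudo-interrupt):

    /* sketch: the poll pseudo-interrupt is not recorded for replay */
    static inline bool need_replay_interrupt(int interrupt_request)
    {
    #if defined(TARGET_I386)
        return !(interrupt_request & CPU_INTERRUPT_POLL);
    #else
        return true;
    #endif
    }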
* icount: rename functions to be consistent with the module name (Claudio Fontana, 2020-10-05; 1 file, -3/+3)

    Signed-off-by: Claudio Fontana <cfontana@suse.de>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* cpu-timers, icount: new modules (Claudio Fontana, 2020-10-05; 1 file, -5/+34)

    Refactoring of cpus.c continues with cpu timer state extraction.

    cpu-timers: responsible for the softmmu cpu timers state, including cpu clocks and ticks.

    icount: counts the TCG instructions executed. As such it is specific to the TCG accelerator, and is therefore built only under CONFIG_TCG.

    One complication is due to qtest, which uses an icount field to warp time as part of qtest (qtest_clock_warp). To solve this, provide a separate counter for qtest. This requires fixing assumptions scattered in the code that qtest_enabled() implies icount_enabled(), checking each specific case.

    Signed-off-by: Claudio Fontana <cfontana@suse.de>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    [remove redundant initialization with qemu_spice_init]
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    [fix lingering calls to icount_get]
    Signed-off-by: Claudio Fontana <cfontana@suse.de>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* qemu/atomic.h: rename atomic_ to qatomic_ (Stefan Hajnoczi, 2020-09-23; 1 file, -7/+8)

    clang's C11 atomic_fetch_*() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file:

        $ CC=clang CXX=clang++ ./configure ... && make
        ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid)

    Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none.

    This patch was generated using:

        $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \
          sort -u >/tmp/changed_identifiers
        $ for identifier in $(</tmp/changed_identifiers); do
              sed -i "s%\<$identifier\>%q$identifier%g" \
                  $(git grep -I -l "\<$identifier\>")
          done

    I manually fixed line-wrap issues and misaligned rST tables.

    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
    Acked-by: Paolo Bonzini <pbonzini@redhat.com>
    Message-Id: <20200923105646.47864-1-stefanha@redhat.com>
* tcg/cpu-exec: precise single-stepping after an interrupt (Richard Henderson, 2020-07-17; 1 file, -1/+7)

    When single-stepping with a debugger attached to QEMU, and when an interrupt is raised, the debugger misses the first instruction after the interrupt.

    Tested-by: Luc Michel <luc.michel@greensocs.com>
    Reviewed-by: Luc Michel <luc.michel@greensocs.com>
    Buglink: https://bugs.launchpad.net/qemu/+bug/757702
    Message-Id: <20200717163029.2737546-1-richard.henderson@linaro.org>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
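A condensed sketch of the fix; hedged (in the tree this sits in cpu_handle_interrupt(), after the interrupt has been taken; BQL handling omitted):

    /* sketch: stop before the handler's first insn when single-stepping */
    if (unlikely(cpu->singlestep_enabled)) {
        cpu->exception_index = EXCP_DEBUG;
        cpu_loop_exit(cpu);
    }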
* tcg/cpu-exec: precise single-stepping after an exception (Luc Michel, 2020-07-16; 1 file, -0/+11)

    When single-stepping with a debugger attached to QEMU, and when an exception is raised, the debugger misses the first instruction after the exception:

        $ qemu-system-aarch64 -M virt -display none -cpu cortex-a53 -s -S
        $ aarch64-linux-gnu-gdb
        GNU gdb (GDB) 9.2
        [...]
        (gdb) tar rem :1234
        Remote debugging using :1234
        warning: No executable has been specified and target does not support
        determining executable automatically.  Try using the "file" command.
        0x0000000000000000 in ?? ()
        (gdb) # writing nop insns to 0x200 and 0x204
        (gdb) set *0x200 = 0xd503201f
        (gdb) set *0x204 = 0xd503201f
        (gdb) # 0x0 address contains 0 which is an invalid opcode.
        (gdb) # The CPU should raise an exception and jump to 0x200
        (gdb) si
        0x0000000000000204 in ?? ()

    With this commit, the same run steps correctly on the first instruction of the exception vector:

        (gdb) si
        0x0000000000000200 in ?? ()

    Buglink: https://bugs.launchpad.net/qemu/+bug/757702
    Signed-off-by: Luc Michel <luc.michel@greensocs.com>
    Message-Id: <20200716193947.3058389-1-luc.michel@greensocs.com>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* accel/tcg: fix race in cpu_exec_step_atomic (bug 1863025) (Alex Bennée, 2020-02-28; 1 file, -10/+11)

    The bug describes a race whereby cpu_exec_step_atomic can acquire a TB which is invalidated by a tb_flush before we execute it. This doesn't affect the other cpu_exec modes, as a tb_flush by its nature can only occur on a quiescent system. The race was described as:

        B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
        B3. tcg_tb_alloc obtains a new TB
        C3. TB obtained with tb_lookup__cpu_state or tb_gen_code (same TB as B2)
        A3. start_exclusive critical section entered
        A4. do_tb_flush is called, TB memory freed/re-allocated
        A5. end_exclusive exits critical section
        B2. tcg_cpu_exec => cpu_exec => tb_find => tb_gen_code
        B3. tcg_tb_alloc reallocates TB from B2
        C4. start_exclusive critical section entered
        C5. cpu_tb_exec executes the TB code that was freed in A4

    The simplest fix is to widen the exclusive period to include the TB lookup. As a result we can drop the complication of checking we are in the exclusive region before we end it.

    Cc: Yifan <me@yifanlu.com>
    Buglink: https://bugs.launchpad.net/qemu/+bug/1863025
    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    Message-Id: <20200214144952.15502-1-alex.bennee@linaro.org>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
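A sketch of the widened critical section; hedged, condensed from the fixed cpu_exec_step_atomic():

    /* sketch: lookup/generation and execution now share one exclusive
     * region, so a tb_flush cannot slip in between them */
    start_exclusive();
    tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags, cf_mask);
    if (tb == NULL) {
        mmap_lock();
        tb = tb_gen_code(cpu, pc, cs_base, flags, cflags);
        mmap_unlock();
    }
    cpu_tb_exec(cpu, tb);
    end_exclusive();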
* tcg: Search includes from the project root source directory (Philippe Mathieu-Daudé, 2020-01-15; 1 file, -1/+1)

    We currently search both the root and the tcg/ directories for tcg files:

        $ git grep '#include "tcg/' | wc -l
        28

        $ git grep '#include "tcg[^/]' | wc -l
        94

    To simplify the preprocessor search path, unify by making the tcg/ directory explicit. Patch created mechanically by running:

        $ for x in \
            tcg.h tcg-mo.h tcg-op.h tcg-opc.h \
            tcg-op-gvec.h tcg-gvec-desc.h; do \
            sed -i "s,#include \"$x\",#include \"tcg/$x\"," \
                $(git grep -l "#include \"$x\""); \
          done

    Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
    Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
    Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
    Reviewed-by: Stefan Weil <sw@weilnetz.de>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
    Message-Id: <20200101112303.20724-2-philmd@redhat.com>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* qemu_log_lock/unlock now preserves the qemu_logfile handle (Robert Foley, 2019-12-18; 1 file, -2/+2)

    qemu_log_lock() now returns a handle and qemu_log_unlock() receives a handle to unlock. This allows for changing the handle during logging and ensures the lock() and unlock() are for the same file.

    Also, in target/tilegx/translate.c, remove the qemu_log_lock()/unlock() calls (and the log("\n")), since the translator can longjmp out of the loop if it attempts to translate an instruction in an inaccessible page.

    Signed-off-by: Robert Foley <robert.foley@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    Message-Id: <20191118211528.3221-5-robert.foley@linaro.org>
* tcg: let plugins instrument virtual memory accesses (Emilio G. Cota, 2019-10-28; 1 file, -0/+3)

    To capture all memory accesses we need to hook into all the various helper functions that are involved in memory operations, as well as the injected inline helper calls. A later commit will allow us to resolve the actual guest HW addresses by replaying the lookup.

    Signed-off-by: Emilio G. Cota <cota@braap.org>
    [AJB: drop haddr handling, just deal in vaddr]
    Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
* cpu: introduce cpu_in_exclusive_context() (Emilio G. Cota, 2019-10-28; 1 file, -4/+1)

    Suggested-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    [AJB: moved inside start/end_exclusive fns + cleanup]
    Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
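A sketch of the predicate; hedged (the flag is set in start_exclusive() and cleared in end_exclusive(), per the bracketed note above):

    /* sketch: true while this vCPU runs with all others stopped */
    static inline bool cpu_in_exclusive_context(const CPUState *cpu)
    {
        return cpu->in_exclusive_context;
    }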
* icount: clean up cpu_can_io at the entry to the block (Pavel Dovgalyuk, 2019-08-20; 1 file, -1/+0)

    Most I/O instructions can be executed only at the end of the block in icount mode. Therefore the translator can set the cpu_can_io flag when translating the last instruction. But when blocks are chained, the flag is not reset and may remain set at the beginning of the next block. This patch resets the flag at the entry of any translation block, making I/O operations impossible by default.

    Signed-off-by: Pavel Dovgalyuk <Pavel.Dovgaluk@ispras.ru>

    --
    v2 changes:
    - reset can_do_io at the start of every TB (suggested by Paolo Bonzini)

    Message-Id: <156404428943.18669.15747009371169578935.stgit@pasha-Precision-3630-Tower>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Include qemu-common.h exactly where needed (Markus Armbruster, 2019-06-12; 1 file, -0/+2)

    No header includes qemu-common.h after this commit, as prescribed by qemu-common.h's file comment.

    Signed-off-by: Markus Armbruster <armbru@redhat.com>
    Message-Id: <20190523143508.25387-5-armbru@redhat.com>
    [Rebased with conflicts resolved automatically, except for include/hw/arm/xlnx-zynqmp.h hw/arm/nrf51_soc.c hw/arm/msf2-soc.c block/qcow2-refcount.c block/qcow2-cluster.c block/qcow2-cache.c target/arm/cpu.h target/lm32/cpu.h target/m68k/cpu.h target/mips/cpu.h target/moxie/cpu.h target/nios2/cpu.h target/openrisc/cpu.h target/riscv/cpu.h target/tilegx/cpu.h target/tricore/cpu.h target/unicore32/cpu.h target/xtensa/cpu.h; bsd-user/main.c and net/tap-bsd.c fixed up]
* cpu: Move icount_decr to CPUNegativeOffsetState (Richard Henderson, 2019-06-10; 1 file, -11/+12)

    Amusingly, we had already ignored the comment to keep this value at the end of CPUState. This restores the minimum negative offset from TCG_AREG0 for code generation.

    For the couple of uses within qom/cpu.c, without NEED_CPU_H, add a pointer from the CPUState object to the IcountDecr object within CPUNegativeOffsetState.

    Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
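A sketch of the container; hedged, reconstructed from the description (icount_decr sits nearest env to keep its negative offset minimal):

    /* sketch: state addressed at negative offsets from TCG's env pointer */
    typedef struct CPUNegativeOffsetState {
        CPUTLB tlb;
        IcountDecr icount_decr;
    } CPUNegativeOffsetState;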
* Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20190206' into staging (Peter Maydell, 2019-02-07; 1 file, -3/+0)

    Queued accel/tcg patches

    # gpg: Signature made Wed 06 Feb 2019 03:42:52 GMT
    # gpg: using RSA key 64DF38E8AF7E215F
    # gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
    # Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A 05C0 64DF 38E8 AF7E 215F

    * remotes/rth/tags/pull-tcg-20190206:
      accel/tcg: Consider cluster index in tb_lookup__cpu_state()
      tcg: add early clobber modifier in atomic16_cmpxchg on aarch64

    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
* accel/tcg: Consider cluster index in tb_lookup__cpu_state() (Peter Maydell, 2019-02-06; 1 file, -3/+0)

    In commit f7b78602fdc6c6e4be we added the CPU cluster number to the cflags field of the TB hash; this included adding it to the value kept in tb->cflags, since we pass that field directly into the hash calculation in some places. Unfortunately we forgot to check whether other parts of the code were doing comparisons against tb->cflags that would need to be updated.

    It turns out that there is exactly one such place: the tb_lookup__cpu_state() function checks whether the TB it has found in the tb_jmp_cache has a tb->cflags matching the cf_mask that is passed in. The tb->cflags has the cluster_index in it but the cf_mask does not.

    Hoist the "add cluster index to the cf_mask" code up from tb_htable_lookup() to tb_lookup__cpu_state() so it can be considered in the "did this TB match in the jmp cache" condition, as well as when we do the full hash lookup by physical PC, flags, etc. (tb_htable_lookup() is only called from tb_lookup__cpu_state(), so this change doesn't require any further knock-on changes.)

    Fixes: f7b78602fdc6c6e4be ("accel/tcg: Add cluster number to TCG TB hash")
    Tested-by: Cleber Rosa <crosa@redhat.com>
    Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
    Reported-by: Howard Spoelstra <hsp.cat7@gmail.com>
    Reported-by: Cleber Rosa <crosa@redhat.com>
    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Message-Id: <20190205151810.571-1-peter.maydell@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* cpu-exec: reset BQL after longjmp in cpu_exec_step_atomic (Emilio G. Cota, 2019-02-05; 1 file, -0/+3)

    Just like we do in cpu_exec().

    Reported-by: Max Filippov <jcmvbkbc@gmail.com>
    Tested-by: Max Filippov <jcmvbkbc@gmail.com>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* cpu-exec: add assert_no_pages_locked() after longjmp (Emilio G. Cota, 2019-02-05; 1 file, -0/+1)

    We forgot to add this check in faa9372c07 ("translate-all: introduce assert_no_pages_locked", 2018-06-15); we only added it after returning from a longjmp in cpu_exec_step_atomic. Fix it.

    Signed-off-by: Emilio G. Cota <cota@braap.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* tcg: Fix LGPL version number (Thomas Huth, 2019-01-30; 1 file, -1/+1)

    It's either "GNU *Library* General Public version 2" or "GNU Lesser General Public version *2.1*", but there was no "version 2.0" of the "Lesser" library. So assume that version 2.1 is meant here.

    Cc: Richard Henderson <rth@twiddle.net>
    Signed-off-by: Thomas Huth <thuth@redhat.com>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Message-Id: <1548252536-6242-5-git-send-email-thuth@redhat.com>
    Signed-off-by: Laurent Vivier <laurent@vivier.eu>
* accel/tcg: Add cluster number to TCG TB hash (Peter Maydell, 2019-01-29; 1 file, -0/+3)

    Include the cluster number in the hash we use to look up TBs. This is important because a TB that is valid for one cluster at a given physical address and set of CPU flags is not necessarily valid for another: the two clusters may have different views of physical memory, or may have different CPU features (eg FPU present or absent).

    We put the cluster number in the high 8 bits of the TB cflags. This gives us up to 256 clusters, which should be enough for anybody. If we ever need more, or need more bits in cflags for other purposes, we could make tb_hash_func() take more data (and expand qemu_xxhash7() to qemu_xxhash8()).

    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
    Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
    Message-id: 20190121152218.9592-4-peter.maydell@linaro.org
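A sketch of the layout; hedged (the masks follow directly from "high 8 bits"; the extra hash inputs mirror the existing tb_hash_func() parameters):

    /* sketch: the top byte of cflags carries the cluster index */
    #define CF_CLUSTER_SHIFT 24
    #define CF_CLUSTER_MASK  0xff000000

    cflags |= cpu->cluster_index << CF_CLUSTER_SHIFT;
    hash = tb_hash_func(phys_pc, pc, flags, cflags, trace_vcpu_dstate);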
* tcg: Implement CPU_LOG_TB_NOCHAIN during expansion (Richard Henderson, 2018-10-18; 1 file, -1/+1)

    Rather than test NOCHAIN before linking, do not emit the goto_tb opcode at all. We already do this for goto_ptr.

    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* accel/tcg: Handle get_page_addr_code() returning -1 in hashtable lookups (Peter Maydell, 2018-08-14; 1 file, -0/+3)

    When we support execution from non-RAM MMIO regions, get_page_addr_code() will return -1 to indicate that there is no RAM at the requested address. Handle this in the cpu-exec TB hashtable lookup code, treating it as "no match found".

    Note that the call to get_page_addr_code() in tb_lookup_cmp() needs no changes -- a return of -1 will already correctly result in the function returning false.

    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: Emilio G. Cota <cota@braap.org>
    Tested-by: Cédric Le Goater <clg@kaod.org>
    Message-id: 20180710160013.26559-3-peter.maydell@linaro.org
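A sketch of the handling described; hedged (the shape of the check in tb_htable_lookup()):

    /* sketch: no RAM at this address means no cached TB can match */
    phys_pc = get_page_addr_code(desc.env, pc);
    if (phys_pc == -1) {
        return NULL;
    }
    desc.phys_page1 = phys_pc & TARGET_PAGE_MASK;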
* tcg: remove tb_lock (Emilio G. Cota, 2018-06-15; 1 file, -26/+8)

    Use mmap_lock in user-mode to protect TCG state and the page descriptors. In !user-mode, each vCPU has its own TCG state, so no locks needed. Per-page locks are used to protect the page descriptors. Per-TB locks are used in both modes to protect TB jumps.

    Some notes:

    - tb_lock is removed from notdirty_mem_write by passing a locked page_collection to tb_invalidate_phys_page_fast.

    - tcg_tb_lookup/remove/insert/etc have their own internal lock(s), so there is no need to further serialize access to them.

    - do_tb_flush is run in a safe async context, meaning no other vCPU threads are running. Therefore acquiring mmap_lock there is just to please tools such as thread sanitizer.

    - Not visible in the diff, but tb_invalidate_phys_page already has an assert_memory_lock.

    - cpu_io_recompile is !user-only, so no mmap_lock there.

    - Added mmap_unlock()'s before all siglongjmp's that could be called in user-mode while mmap_lock is held.
      + Added an assert for !have_mmap_lock() after returning from the longjmp in cpu_exec, just like we do in cpu_exec_step_atomic.

    Performance numbers before/after:

    Host: AMD Opteron(tm) Processor 6376

    [ASCII plot: "ubuntu 17.04 ppc64 bootup+shutdown time" (seconds) vs. guest CPUs (1 to 64), comparing "before" and "tb lock removal"; png: https://imgur.com/HwmBHXe]

    [ASCII plot: "debian jessie aarch64 bootup+shutdown time" (seconds) vs. guest CPUs (1 to 64), comparing "before" and "tb lock removal"; png: https://imgur.com/iGpGFtv]

    The gains are high for 4-8 CPUs. Beyond that point, however, unrelated lock contention significantly hurts scalability.

    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
* translate-all: protect TB jumps with a per-destination-TB lock (Emilio G. Cota, 2018-06-15; 1 file, -14/+27)

    This applies to both user-mode and !user-mode emulation.

    Instead of relying on a global lock, protect the list of incoming jumps with tb->jmp_lock. This lock also protects tb->cflags, so update all tb->cflags readers outside tb->jmp_lock to use atomic reads via tb_cflags().

    In order to find the destination TB (and therefore its jmp_lock) from the origin TB, we introduce tb->jmp_dest[].

    I considered not using a linked list of jumps, which simplifies code and makes the struct smaller. However, it unnecessarily increases memory usage, which results in a performance decrease. See for instance these numbers booting+shutting down debian-arm:

                             Time (s)  Rel err (%)  Abs err (s)  Rel slowdown (%)
        ------------------------------------------------------------------------
        before               20.88     0.74         0.154512      0.
        after                20.81     0.38         0.079078     -0.33524904
        GTree                21.02     0.28         0.058856      0.67049808
        GHashTable + xxhash  21.63     1.08         0.233604      3.5919540

    Using a hash table or a binary tree to keep track of the jumps doesn't really pay off, not only due to the increased memory usage, but also because most TBs have only 0 or 1 jumps to them. The maximum number of jumps when booting debian-arm that I measured is 35, but as we can see in the histogram below a TB with that many incoming jumps is extremely rare; the average TB has 0.80 incoming jumps.

        n_jumps: 379208; avg jumps/tb: 0.801099
        dist: [0.0,1.0)|▄█▁▁▁▁▁▁▁▁▁▁▁ ▁▁▁▁▁▁ ▁▁▁ ▁▁▁ ▁|[34.0,35.0]

    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
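A sketch of the atomic accessor mentioned above; hedged, shown with the qatomic_ spelling the tree adopts in a later commit higher up this log (it was plain atomic_read() at the time):

    /* sketch: read cflags safely while concurrent jmp patching runs */
    static inline uint32_t tb_cflags(const TranslationBlock *tb)
    {
        return qatomic_read(&tb->cflags);
    }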
* translate-all: discard TB when tb_link_page returns an existing matching TB (Emilio G. Cota, 2018-06-15; 1 file, -12/+2)

    Use the recently-gained QHT feature of returning the matching TB if it already exists. This allows us to get rid of the lookup we perform right after acquiring tb_lock.

    Suggested-by: Richard Henderson <richard.henderson@linaro.org>
    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
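A sketch of the pattern this enables; hedged (qht_insert() gained an "existing" out-parameter for exactly this case):

    /* sketch: on a duplicate insert, adopt the winner and drop ours */
    void *existing = NULL;
    if (!qht_insert(&tb_ctx.htable, tb, h, &existing)) {
        tb = existing;   /* another thread linked an identical TB first */
    }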
* translate-all: introduce assert_no_pages_locked (Emilio G. Cota, 2018-06-15; 1 file, -0/+1)

    The appended patch adds assertions to make sure we do not longjmp with page locks held. Note that user-mode has nothing to check, since page locks are !user-mode only.

    Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
    Signed-off-by: Emilio G. Cota <cota@braap.org>
    Signed-off-by: Richard Henderson <richard.henderson@linaro.org>