path: root/rust/qemu-api/src
Commit message | Author | Files changed (-/+ lines)
2025-07-11  migration/postcopy: Cache the tid->vcpu mapping for blocktime  (Peter Xu, 2 files, -12/+59)
Looking up the vCPU index for each fault can be expensive when there are hundreds of vCPUs. Provide a tid->vcpu cache backed by a hash table instead, and look it up from there. While at it, add another counter to record how many non-vCPU faults are taken; for example, the main thread can also access a guest page that was missing. This kind of fault has not been accounted by blocktime so far. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-11-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
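A minimal sketch of the caching idea described above, using a GLib hash table; the context layout and names (BlocktimeCtx, tid_to_vcpu, non_vcpu_faults) are assumptions for illustration, not the actual QEMU identifiers:

```c
#include <glib.h>
#include <stdint.h>

typedef struct {
    GHashTable *tid_to_vcpu;   /* key: thread id, value: vcpu index + 1 */
    uint64_t non_vcpu_faults;  /* faults taken by non-vCPU threads, e.g. main */
} BlocktimeCtx;

static BlocktimeCtx *blocktime_ctx_new(void)
{
    BlocktimeCtx *ctx = g_new0(BlocktimeCtx, 1);
    ctx->tid_to_vcpu = g_hash_table_new(g_direct_hash, g_direct_equal);
    return ctx;
}

/* Fill the cache once, when the vCPU threads are known. */
static void blocktime_cache_vcpu(BlocktimeCtx *ctx, uint32_t tid, int vcpu)
{
    /* store vcpu + 1 so a missing entry (NULL) stays distinguishable */
    g_hash_table_insert(ctx->tid_to_vcpu, GUINT_TO_POINTER(tid),
                        GINT_TO_POINTER(vcpu + 1));
}

/* Return the vCPU index for a faulting tid, or -1 for a non-vCPU thread. */
static int blocktime_lookup_vcpu(BlocktimeCtx *ctx, uint32_t tid)
{
    gpointer val = g_hash_table_lookup(ctx->tid_to_vcpu, GUINT_TO_POINTER(tid));

    if (!val) {
        ctx->non_vcpu_faults++;
        return -1;
    }
    return GPOINTER_TO_INT(val) - 1;
}
```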
2025-07-11  migration/postcopy: Initialize blocktime context only until listen  (Peter Xu, 1 file, -5/+10)
Before this patch, the blocktime context could be created very early, because postcopy_ram_supported_by_host() <- migrate_caps_check() can happen during migration object init. The problem is that the blocktime context needs system vCPU information, which may still change after that point. I didn't verify it, but it doesn't sound right. Now move it out and initialize the context only when postcopy listen starts. That is already during a migration, so it should be guaranteed the vCPU topology can never change on both sides. While at it, assert that the ctx isn't created yet instead; the old "if" trick isn't needed now that we're sure it will only happen once. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-10-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-07-11  migration/postcopy: Report fault latencies in blocktime  (Peter Xu, 4 files, -37/+102)
Blocktime so far only cares about the time one vCPU (or the whole system) was blocked. It would also be helpful if it could report the latency of page requests, which can be very sensitive during postcopy. Blocktime itself is sometimes not very important, especially when one thinks about KVM async PF support, where vCPUs are almost never blocked because the guest OS is smart enough to switch to another task when a remote fault is needed. However, latency is still sensitive and important, because even if the guest vCPU is running on threads that do not need a remote fault, the workload that accesses a missing page is still affected. Add two entries to the report, showing how long it takes to resolve a remote fault. Mention in the QAPI doc that this is not the real average fault latency, but only over the faults that required a remote page. Unwrap get_vcpu_blocktime_list() so we don't need to walk the list twice; meanwhile add the entry checks in qtests for all postcopy tests. Cc: Markus Armbruster <armbru@redhat.com> Cc: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Fabiano Rosas <farosas@suse.de> Tested-by: Mario Casquero <mcasquer@redhat.com> Link: https://lore.kernel.org/r/20250613141217.474825-9-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
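The latency accounting could look roughly like the sketch below; the struct and function names and the microsecond reporting unit are assumptions here, the point is only that the average covers faults that needed a remote page, not every guest fault:

```c
#include <stdint.h>

typedef struct {
    uint64_t latency_total_ns;   /* sum of (resolved - faulted) times */
    uint64_t faults_resolved;    /* remote faults that have been resolved */
} FaultLatency;

static void fault_resolved(FaultLatency *fl, uint64_t fault_ns, uint64_t now_ns)
{
    fl->latency_total_ns += now_ns - fault_ns;
    fl->faults_resolved++;
}

/* Average latency of remote faults only, reported in microseconds. */
static double fault_latency_avg_us(const FaultLatency *fl)
{
    if (!fl->faults_resolved) {
        return 0.0;
    }
    return (double)fl->latency_total_ns / fl->faults_resolved / 1000.0;
}
```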
2025-07-11  migration/postcopy: Add blocktime fault counts per-vcpu  (Peter Xu, 1 file, -0/+5)
Add a field to count how many remote faults one vCPU has taken. So far it's still not used, but will be soon. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-8-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-07-11  migration/postcopy: Bring blocktime layer to ns level  (Peter Xu, 1 file, -12/+16)
With 64-bit fields, this is trivial. The caveat is that when exposing any values in QMP, they are still declared in milliseconds (ms), hence the conversion is needed when exporting the values to existing QMP queries. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-7-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-07-11  migration/postcopy: Drop PostcopyBlocktimeContext.start_time  (Peter Xu, 1 file, -6/+4)
Now with 64 bits, the offsetting using start_time is not needed anymore, because the array can always hold the whole timestamp. Then drop the unused parameter in get_low_time_offset() altogether. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-07-11  migration/postcopy: Make all blocktime vars 64bits  (Peter Xu, 2 files, -27/+27)
I am guessing these used to be 32 bits because of the atomic ops. Now that all the atomic ops are gone and we're protected by a mutex instead, it's ok to switch to 64 bits. Reasons to move over:

- Allow further patches to change the unit from ms to us: with postcopy preempt mode, blocktime is really down to the hundreds-of-microseconds level. We'd better be able to trap those.
- This also paves the way for dropping some other tricks the original version used to avoid overflows, e.g., start_time was almost only useful before to make sure the sampled timestamp won't overflow a 32-bit field.
- This prepares further reports on top of the existing data collected, e.g. average page fault latencies. When an average is taken into account, milliseconds are simply too coarse-grained.

While at it:

- Rename page_fault_vcpu_time to vcpu_blocktime_start.
- Rename vcpu_blocktime to vcpu_blocktime_total.
- Touch up the trace-events to not dump the blocktime ctx pointer.

Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
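Taken together with the ns-level change listed above, the reshaped per-vCPU state could look like this sketch: all fields 64-bit and in nanoseconds, with the ms conversion done only when filling the existing QMP reply. The struct layout is illustrative; only the renamed field names come from the commit message:

```c
#include <stdint.h>

typedef struct {
    uint64_t *vcpu_blocktime_start;  /* ns timestamp when each vCPU faulted */
    uint64_t *vcpu_blocktime_total;  /* accumulated blocked time per vCPU, ns */
    uint64_t total_blocktime;        /* whole-system blocked time, ns */
    int smp_cpus;
} BlocktimeState;

/* QMP still declares the values in milliseconds, so convert at export time. */
static inline uint64_t blocktime_ns_to_ms(uint64_t ns)
{
    return ns / 1000000ULL;
}
```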
2025-07-11  migration/postcopy: Drop all atomic ops in blocktime feature  (Peter Xu, 1 file, -13/+10)
Now with the mutex protection it's not needed anymore. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-07-11  migration/postcopy: Push blocktime start/end into page req mutex  (Peter Xu, 5 files, -39/+48)
The postcopy blocktime feature was tricky in that it used quite a few atomic operations over quite a few arrays and vars, without explaining how that would be thread safe. The thread safety here is about concurrency between the fault thread and the fault resolution threads, which can access the same chunk of data. All these atomic ops can also be expensive before knowing clearly how they work.

OTOH, postcopy has one page_request_mutex used to serialize the received bitmap updates. So far it's ok - we don't yet have a lot of threads contending the lock. That might change after multifd is supported, but that's a separate story. What is important is that, with that mutex, it's pretty lightweight to move all the blocktime maintenance into the mutex critical section. That's because the blocktime layer is lightweight: almost "remember which vcpu faulted on which address" and "ok we got some fault resolved, calculate how long it took". It's also an optional feature for now (though I have thought of changing that, maybe in the future).

Let's push the blocktime layer into the mutex, so that it's always thread-safe even without any atomic ops. To achieve that, add a tid parameter on the fault path so the faulted thread ID is passed deeper into the stack, but not too deep. While at it, add a comment for the shared fault handler (for example, vhost-user devices running with postcopy), to mention a TODO. One reason it might not be trivial is that vhost-user's userfaultfds should be opened by the vhost-user process, so it's pretty hard to make sure the TID feature will be around. It wasn't supported before, so keep it like that for now.

Now we should be at ease when everything is protected by a mutex that we always take anyway. One side effect: we can finally remove one ramblock_recv_bitmap_test() in mark_postcopy_blocktime_begin(), which was pretty weird and also included a weird (but maybe necessary, but maybe not?) operation to inject a blocktime entry and then quickly erase it. When we're under the mutex, and when we make sure it's invoked after checking the receive bitmap, it's not needed anymore. Instead, we assert. As another side effect, this paves the way for removing all atomic ops in all the mem accesses in the blocktime layer.

Note that we need a stub for mark_postcopy_blocktime_begin() for Windows builds.

Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
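A rough sketch of the locking shape described above; PostcopyState is a stand-in structure and GMutex stands in for QEMU's own mutex type, while mark_postcopy_blocktime_begin()/end() are the names used in the commit message:

```c
#include <glib.h>
#include <stdint.h>

typedef struct {
    GMutex page_request_mutex;  /* g_mutex_init() at setup; already serializes
                                 * received-page bookkeeping */
    /* ... receive bitmap, blocktime arrays ... */
} PostcopyState;

/* Fault thread: record which thread faulted on which address. */
static void fault_thread_on_fault(PostcopyState *ps, uint32_t tid, uint64_t addr)
{
    g_mutex_lock(&ps->page_request_mutex);
    /* assert the page is still missing, then note (tid, addr, now), i.e.
     * what mark_postcopy_blocktime_begin() does */
    g_mutex_unlock(&ps->page_request_mutex);
}

/* Fault resolution path: close the blocktime entry under the same lock. */
static void fault_thread_page_received(PostcopyState *ps, uint64_t addr)
{
    g_mutex_lock(&ps->page_request_mutex);
    /* update the received bitmap, then mark_postcopy_blocktime_end() */
    g_mutex_unlock(&ps->page_request_mutex);
}
```

With both paths inside the same critical section, plain non-atomic updates to the blocktime arrays are safe, which is what lets the atomic ops be dropped in the patch listed just above.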
2025-07-11  migration: Add option to set postcopy-blocktime  (Peter Xu, 1 file, -0/+2)
Add a global property to allow enabling postcopy-blocktime feature. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613141217.474825-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-07-11  migration/postcopy: Avoid clearing dirty bitmap for postcopy too  (Peter Xu, 1 file, -1/+3)
This is a follow up on the other commit "migration/ram: avoid to do log clear in the last round" but for postcopy. https://lore.kernel.org/r/20250514115827.3216082-1-yanfei.xu@bytedance.com I can observe more than 10% reduction of average page fault latency during the postcopy phase with this optimization: Before: 268.00us (+-1.87%), After: 232.67us (+-2.01%). The test was done with a 16GB VM with 80 vCPUs, running a workload that does busy random writes to 13GB of memory. Cc: Yanfei Xu <yanfei.xu@bytedance.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-12-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-07-11  migration: Rewrite the migration complete detect logic  (Peter Xu, 1 file, -15/+42)
There are a few things off in that logic, so rewrite it. While at it, add rich comments to explain each of the decisions. Since this is a very sensitive path for migration, below is the list of changes with their reasoning.

(1) Exact pending size is only needed for precopy, not postcopy. Fundamentally this is because the "exact" version only does one more deep sync to fetch the pending results, while in postcopy's case it's never going to sync anything more than the estimate, as the VM on the source is stopped.

(2) Do _not_ rely on threshold_size anymore to decide whether postcopy should complete. threshold_size was calculated from the expected downtime and bandwidth only during precopy as an efficient way to decide when to switch over. It's not sensible to rely on threshold_size in postcopy. For precopy, if switchover is decided, the migration will complete soon. That's not true for postcopy. Logically speaking, postcopy should only complete the migration if all pending data is flushed. Here it used to work because save_complete() used to implicitly contain save_live_iterate() when there was pending size. Even if that looks benign, having RAM still to be migrated in postcopy's save_complete() has other bad side effects:

  (a) Since save_complete() needs to be run once at a time, it means when moving RAM there's no way to move other things (rather than round-robin iterating the vmstate handlers like we do in the ITERABLE phase). Not an immediate concern, but it may stop working in the future when there's more than one iterable (e.g. vfio postcopy).

  (b) Postcopy recovery, unfortunately, only works during the ITERABLE phase. IOW, if the src QEMU moves RAM during postcopy's save_complete() and the network fails, it'll crash both QEMUs... OTOH if it fails during iteration it's still recoverable. IOW, this change should further reduce the window where QEMU can split-brain and crash in extreme cases.

If we enable the ram_save_complete() tracepoints, we'll see this before this patch:

  1267959@1748381938.294066:ram_save_complete dirty=9627, done=0
  1267959@1748381938.308884:ram_save_complete dirty=0, done=1

It means in this migration there were 9627 pages migrated at complete() of the postcopy phase. After this change, all the postcopy RAM should be migrated in the iterable phase, rather than in save_complete():

  1267959@1748381938.294066:ram_save_complete dirty=0, done=0
  1267959@1748381938.308884:ram_save_complete dirty=0, done=1

(3) Adjust when to decide to switch to postcopy. This shouldn't be super important; the movement makes sure there's only one in_postcopy check, so we are clear on what we do with the two completely different use cases (precopy vs. postcopy).

(4) Trivial touch-up on the threshold_size comparison, which changes "(!pending_size || pending_size < s->threshold_size)" into "(pending_size <= s->threshold_size)".

Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-11-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
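A simplified sketch of the resulting decision, mirroring points (1), (2) and (4); the helper name and parameters are made up for illustration and do not match the real migration code:

```c
#include <stdbool.h>
#include <stdint.h>

static bool migration_should_complete(bool in_postcopy,
                                      uint64_t pending_estimate,
                                      uint64_t pending_exact,
                                      uint64_t threshold_size)
{
    if (in_postcopy) {
        /*
         * (1)+(2): the source VM is stopped, so the estimate is already
         * accurate and threshold_size is meaningless; only complete when
         * nothing is pending at all.
         */
        return pending_estimate == 0;
    }

    /* Precopy: cheap estimate first, deep sync only when it may matter. */
    if (pending_estimate > threshold_size) {
        return false;
    }
    /* (4): "<=" rather than the old "!pending || pending < threshold". */
    return pending_exact <= threshold_size;
}
```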
2025-07-11  migration/ram: Add tracepoints for ram_save_complete()  (Peter Xu, 2 files, -0/+6)
Take notes on start/end state of dirty pages for the whole system. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250613140801.474264-10-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-07-11  migration/ram: One less indent for ram_find_and_save_block()  (Peter Xu, 1 file, -9/+11)
The check over PAGE_DIRTY_FOUND isn't necessary. We could indent one less and assert that instead. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250613140801.474264-9-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
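The shape of the change, illustratively; the enum values other than PAGE_DIRTY_FOUND and the helper name are stand-ins, not the real ram.c code:

```c
#include <assert.h>

enum PageResult { PAGE_ALL_CLEAN, PAGE_TRY_AGAIN, PAGE_DIRTY_FOUND };

/* Earlier branches in the loop already handled the other outcomes, so
 * instead of re-checking with an if (and indenting the body under it),
 * assert the invariant and keep the saving code one level shallower. */
static int save_found_block(enum PageResult res)
{
    assert(res == PAGE_DIRTY_FOUND);
    /* ... pick the block/page and save it ... */
    return 1;
}
```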
2025-07-11  migration: qemu_savevm_complete*() helpers  (Peter Xu, 1 file, -34/+44)
Since we use the same save_complete() hook for both precopy and postcopy, add a set of helpers to invoke the hook() to dedup the code. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-8-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
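A sketch of what such a helper might look like; SaveStateEntry here and the hook signature are simplified stand-ins for the real savevm structures:

```c
#include <stddef.h>

typedef struct QEMUFile QEMUFile;        /* opaque in this sketch */

typedef struct SaveStateEntry {
    const char *idstr;
    int (*save_complete)(QEMUFile *f, void *opaque);
    void *opaque;
} SaveStateEntry;

/* Shared by the precopy and postcopy completion paths. */
static int savevm_complete_one(QEMUFile *f, SaveStateEntry *se)
{
    if (!se->save_complete) {
        return 0;
    }
    /* section start/end framing and tracing would surround this call */
    return se->save_complete(f, se->opaque);
}
```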
2025-07-11  migration: Rename save_live_complete_precopy to save_complete  (Peter Xu, 9 files, -20/+20)
Now, after merging the precopy and postcopy versions of the complete() hook, rename the precopy version from save_live_complete_precopy() to save_complete(). Drop the "live" while at it, because in most cases it is not live when it happens (in precopy). No functional change intended. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-7-peterx@redhat.com [peterx: squash the fixup that covers a few more doc spots, per Juraj] Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-07-11  migration: Drop save_live_complete_postcopy hook  (Peter Xu, 4 files, -23/+12)
The hook is only defined in two vmstate users ("ram" and "block dirty bitmap"), meanwhile both of them define the hook exactly the same as the precopy version. Hence, this postcopy version isn't needed. No functional change intended. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-07-11  migration/bg-snapshot: Do not check for SKIP in iterator  (Peter Xu, 1 file, -3/+2)
It cannot happen in the bg-snapshot case. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250613140801.474264-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-07-11  migration/docs: Move docs for postcopy blocktime feature  (Peter Xu, 1 file, -19/+17)
Move it out of the vanilla postcopy section and make it a standalone feature instead. While at it, remove the NOTE, because it's incorrect now after the introduction of max-postcopy-bandwidth, which can control the throughput even for the postcopy phase. Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Link: https://lore.kernel.org/r/20250613140801.474264-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-07-11  migration/hmp: Fix postcopy-blocktime per-vCPU results  (Peter Xu, 1 file, -9/+13)
Unfortunately, it was never shown correctly. This was only found when I started to look into making the blocktime feature more useful (so as to avoid using bpftrace, even though I'm not sure which one will be harder to use..).

The old dump would look like this:

  Postcopy vCPU Blocktime: 0-1,4,10,21,33,46,48,59

even though there are actually 40 vCPUs: the string merged equal elements and also sorted them. To fix it, simply loop over the uint32List manually. Now it looks like:

  Postcopy vCPU Blocktime (ms): [15, 0, 0, 43, 29, 34, 36, 29, 37, 41, 33, 37, 45, 52, 50, 38, 40, 37, 40, 49, 40, 35, 35, 35, 81, 19, 18, 19, 18, 30, 22, 3, 0, 0, 0, 0, 0, 0, 0, 0]

Cc: Dr. David Alan Gilbert <dave@treblig.org> Cc: Alexey Perevalov <a.perevalov@samsung.com> Cc: Markus Armbruster <armbru@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20250613140801.474264-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-07-11  migration/hmp: Reorg "info migrate" once more  (Peter Xu, 1 file, -28/+31)
Dave suggested the HMP output for "info migrate" could make better use of the lines and of grouping: https://lore.kernel.org/r/aC4_-nMc7FwsMf9p@gallifrey I followed Dave's suggestion, with some more modifications on top:

- Added all elements into the picture.
- Use size_to_str() and drop most of the units: the benefit is being friendlier to most human eyes; the bad side effect is loss of detail, but that should be a corner case per my uses, and one can still use the QMP interface when necessary.
- Sub-grouping for "Transfers" ("Channels" and "Page Types").
- Better indentation.

Sample output:

  (qemu) info migrate
  Status: postcopy-active
  Time (ms): total=47317, setup=5, down=8
  RAM info:
    Throughput (Mbps): 1342.83
    Sizes: pagesize=4 KiB, total=4.02 GiB
    Transfers: transferred=1.41 GiB, remain=2.46 GiB
      Channels: precopy=15.2 MiB, multifd=0 B, postcopy=1.39 GiB
      Page Types: normal=367713, zero=41195
    Page Rates (pps): transfer=40900, dirty=4
    Others: dirty_syncs=2, postcopy_req=57503

Suggested-by: Dr. David Alan Gilbert <dave@treblig.org> Tested-by: Li Zhijian <lizhijian@fujitsu.com> Reviewed-by: Li Zhijian <lizhijian@fujitsu.com> Acked-by: Dr. David Alan Gilbert <dave@treblig.org> Reviewed-by: Juraj Marcin <jmarcin@redhat.com> Tested-by: Mario Casquero <mcasquer@redhat.com> Link: https://lore.kernel.org/r/20250613140801.474264-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de>
2025-07-04  linux-user/aarch64: Set hwcap bits for SME2p1/SVE2p1  (Richard Henderson, 1 file, -0/+8)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-108-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Enable FEAT_SME2p1 on -cpu max  (Richard Henderson, 2 files, -2/+14)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-107-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement SME2 BFMOPA (non-widening)  (Peter Maydell, 4 files, -0/+64)
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-106-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement FMOPA (non-widening) for fp16  (Peter Maydell, 4 files, -0/+63)
Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-105-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Support FPCR.AH in SME FMOPS, BFMOPS  (Richard Henderson, 4 files, -33/+161)
For non-widening, we can use float_muladd_negate_product. For widening, which uses dot-product, we need to handle the negation explicitly. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-104-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
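Conceptually (plain C floats rather than QEMU's softfloat, so only the placement of the negation is meant to carry over): the non-widening case folds the negation into the fused multiply-add, while the widening dot-product negates each product term explicitly:

```c
#include <math.h>

/* Non-widening FMOPS step: acc + (-(a * b)) in one fused operation, the
 * same arithmetic effect that float_muladd_negate_product selects. */
static float fmops_step(float acc, float a, float b)
{
    return fmaf(-a, b, acc);
}

/* Widening BFMOPS via a 2-element dot product: negate each term by hand. */
static float bfmops_dot2(float acc, float a0, float b0, float a1, float b1)
{
    return acc - (a0 * b0) - (a1 * b1);
}
```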
2025-07-04  target/arm: Rename BFMOPA to BFMOPA_w  (Peter Maydell, 4 files, -5/+5)
Our current BFMOPA opcode pattern is the widening version of the insn. Rename it to BFMOPA_w, to make way for the non-widening version added in SME2. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-103-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Rename FMOPA_h to FMOPA_w_h  (Peter Maydell, 4 files, -6/+6)
The pattern we currently have as FMOPA_h is the "widening" insn that takes fp16 inputs and produces single-precision outputs. This is unlike FMOPA_s and FMOPA_d, which are non-widening and produce outputs the same size as their inputs. SME2 introduces a non-widening fp16 FMOPA operation; rename FMOPA_h to FMOPA_w_h (for 'widening'), so we can use FMOPA_h for the non-widening version, giving it a name in line with the other non-widening ops FMOPA_s and FMOPA_d. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-102-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement LUTI2, LUTI4 for SME2/SME2p1  (Richard Henderson, 4 files, -0/+210)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-101-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement MOVAZ for SME2p1  (Richard Henderson, 4 files, -11/+137)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-100-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement LD1Q, ST1Q for SVE2p1  (Richard Henderson, 4 files, -2/+62)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-99-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement {LD, ST}[234]Q for SME2p1/SVE2p1  (Richard Henderson, 4 files, -31/+156)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-98-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Move ld1qq and st1qq primitives to sve_ldst_internal.h  (Richard Henderson, 2 files, -38/+69)
Move from sme_helper.c to the shared header. Add a comment noting the lack of atomicity. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-97-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement {LD1, ST1}{W, D} (128-bit element) for SVE2p1  (Richard Henderson, 5 files, -27/+183)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-96-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Split the ST_zpri and ST_zprr patterns  (Richard Henderson, 1 file, -8/+18)
The msz > esz encodings are reserved, and some of them are about to be reused. Split these patterns so that the new insns do not overlap. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-95-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement SME2 counted predicate register load/store  (Richard Henderson, 4 files, -0/+662)
Implement the SVE2p1 consecutive register LD1/ST1, and the SME2 strided register LD1/ST1. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-94-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement TBLQ, TBXQ for SME2p1/SVE2p1  (Richard Henderson, 4 files, -0/+37)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-93-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement ZIPQ, UZPQ for SME2p1/SVE2p1  (Richard Henderson, 4 files, -1/+63)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-92-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement PMOV for SME2p1/SVE2p1  (Richard Henderson, 5 files, -0/+207)
Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-91-richard.henderson@linaro.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement EXTQ for SME2p1/SVE2p1  (Richard Henderson, 2 files, -0/+51)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-90-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement DUPQ for SME2p1/SVE2p1  (Richard Henderson, 2 files, -0/+27)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-89-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement CNTP (predicate as counter) for SME2/SVE2p1  (Richard Henderson, 4 files, -1/+54)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-88-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement BFMLSLB{L, T} for SME2/SVE2p1  (Richard Henderson, 2 files, -0/+36)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-87-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement FADDQV, F{MIN, MAX}{NM}QV for SVE2p1  (Richard Henderson, 4 files, -27/+148)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-86-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement ANDQV, ORQV, EORQV for SVE2p1  (Richard Henderson, 4 files, -0/+65)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-85-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement SME2 SEL  (Richard Henderson, 4 files, -0/+362)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-84-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement SVE2p1 PEXT  (Richard Henderson, 5 files, -0/+146)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-83-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement {ADD, SMIN, SMAX, UMIN, UMAX}QV for SVE2p1  (Richard Henderson, 4 files, -0/+113)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-82-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement SVE2p1 PTRUE (predicate as counter)  (Richard Henderson, 2 files, -0/+17)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-81-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2025-07-04  target/arm: Implement SVE2p1 WHILE (predicate as counter)  (Richard Henderson, 4 files, -5/+84)
Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Message-id: 20250704142112.1018902-80-richard.henderson@linaro.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>