summary refs log tree commit diff stats
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | test-bdrv-drain: Don't call bdrv_graph_wrlock() in coroutine contextKevin Wolf2023-10-121-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AIO callbacks are effectively coroutine_mixed_fn. If AIO requests don't return immediately, their callback is called from the request coroutine. This means that in AIO callbacks, we can't call no_coroutine_fns such as bdrv_graph_wrlock(). Unfortunately test-bdrv-drain does so. Change the test to use a BH to drop out of coroutine context, and add coroutine_mixed_fn and no_coroutine_fn markers to clarify the context each function runs in. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20230929145157.45443-2-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| * | | block: convert more bdrv_is_allocated* and bdrv_block_status* calls to ↵Paolo Bonzini2023-10-128-32/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | coroutine versions Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230904100306.156197-5-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| * | | block: switch to co_wrapper for bdrv_is_allocated_*Paolo Bonzini2023-10-122-51/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230904100306.156197-4-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| * | | block: complete public block status APIPaolo Bonzini2023-10-122-19/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Include both coroutine and non-coroutine versions, the latter being co_wrapper_mixed_bdrv_rdlock of the former. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230904100306.156197-3-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| * | | block: rename the bdrv_co_block_status static functionPaolo Bonzini2023-10-121-10/+11
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | bdrv_block_status exists as a wrapper for bdrv_block_status_above, but the name of the (hypothetical) coroutine version, bdrv_co_block_status, is squatted by a random static function. Rename it to bdrv_co_do_block_status. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20230904100306.156197-2-pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | | Merge tag 'pull-shadow-2023-10-12' of https://repo.or.cz/qemu/armbru into ↵Stefan Hajnoczi2023-10-166-19/+19
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | staging -Wshadow=local patches for 2023-10-12 # -----BEGIN PGP SIGNATURE----- # # iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmUoCNsSHGFybWJydUBy # ZWRoYXQuY29tAAoJEDhwtADrkYZTTocP/iQ6RggqcHrBxwZZtyydvpWCFrqfuBTk # 6GQtKGm51UcQ9kmAIsoV90pOzdUdjwrpXzKKJwsLzMcVcp1NDPsQIL54wdsRmZfH # E9mxI7UlZf/KWzrfP1nFLcU8T5+cuXosDgjx55Y1Kq+ZRn+7x0DInBGdRryokWTG # zcKh9T3n9KWKscLL7hvxLZS5054V9HBDYIpBBEyV2GtRrCLL0Y+9aaKkBrejHMgY # oKrLKHz1cOGOTzQ7AbhA+Wv3eN+GYVyjnCSUXK/270jbU8Xg4m1vSbrPq2PWy5kV # IGGKZtZsrSq0VBoTi+i9++vP5djKVUYQLqx10L+NYCp25wBnTgXKSDtdAqI68aev # TYrOlQ1ldKXJT4ghPqoWCjRKkryV6/Gj9fHbbvsHJ7SB84VO8G/kpn5zXvN/BosG # 8vxLEL0xc1Q3Sxi91DCjVsP7UebjBt1j/JugU9zVr8OFJWriFmllYB67AOOo3gS2 # c+FNVPLle3udw5EHClMapcGSzTun4iHeEsiJMOOgGOHC09Bi+Om6LlneFWljmvQp # a6ma+bebxCjzuO6heey2Q/1JjltR8Ex0bnbWIoNsysA6OnDtTlbxDqZEca1h6As+ # Rm9XFKf7nVQIHFKW3sjbx6MgqAL6sBakfeJah5Pj5iIKtLaZR591RyAfvfB2sBlS # ZYtp95GIKWXZ # =AArx # -----END PGP SIGNATURE----- # gpg: Signature made Thu 12 Oct 2023 10:55:23 EDT # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653 # gpg: issuer "armbru@redhat.com" # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full] # gpg: aka "Markus Armbruster <armbru@pond.sub.org>" [full] # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653 * tag 'pull-shadow-2023-10-12' of https://repo.or.cz/qemu/armbru: target/i386: fix shadowed variable pasto contrib/vhost-user-gpu: Fix compiler warning when compiling with -Wshadow hw/virtio/virtio-gpu: Fix compiler warning when compiling with -Wshadow libvhost-user: Fix compiler warning with -Wshadow=local libvduse: Fix compiler warning with -Wshadow=local Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
| * | | target/i386: fix shadowed variable pastoPaolo Bonzini2023-10-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a908985971a ("target/i386/seg_helper: introduce tss_set_busy", 2023-09-26) failed to use the tss_selector argument of the new function, which was therefore unused. This shows up as a #GP fault when booting old versions of 32-bit Linux. Fixes: a908985971a ("target/i386/seg_helper: introduce tss_set_busy", 2023-09-26) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20231011135350.438492-1-pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Markus Armbruster <armbru@redhat.com>
| * | | contrib/vhost-user-gpu: Fix compiler warning when compiling with -WshadowThomas Huth2023-10-122-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename some variables to avoid compiler warnings when compiling with -Wshadow=local. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-ID: <20231009083726.30301-1-thuth@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
| * | | hw/virtio/virtio-gpu: Fix compiler warning when compiling with -WshadowThomas Huth2023-10-121-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid using trivial variable names in macros, otherwise we get the following compiler warning when compiling with -Wshadow=local: In file included from ../../qemu/hw/display/virtio-gpu-virgl.c:19: ../../home/thuth/devel/qemu/hw/display/virtio-gpu-virgl.c: In function ‘virgl_cmd_submit_3d’: ../../qemu/include/hw/virtio/virtio-gpu.h:228:16: error: declaration of ‘s’ shadows a previous local [-Werror=shadow=compatible-local] 228 | size_t s; | ^ ../../qemu/hw/display/virtio-gpu-virgl.c:215:5: note: in expansion of macro ‘VIRTIO_GPU_FILL_CMD’ 215 | VIRTIO_GPU_FILL_CMD(cs); | ^~~~~~~~~~~~~~~~~~~ ../../qemu/hw/display/virtio-gpu-virgl.c:213:12: note: shadowed declaration is here 213 | size_t s; | ^ cc1: all warnings being treated as errors Signed-off-by: Thomas Huth <thuth@redhat.com> Message-ID: <20231009084559.41427-1-thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
| * | | libvhost-user: Fix compiler warning with -Wshadow=localThomas Huth2023-10-121-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rename shadowing variables to make this code compilable with -Wshadow=local. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-ID: <20231006121129.487251-1-thuth@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
| * | | libvduse: Fix compiler warning with -Wshadow=localThomas Huth2023-10-121-1/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No need to declare a new variable with the same name here, we can simple re-use the one from the top of the function. With this change, the file now compiles fine with -Wshadow=local. Signed-off-by: Thomas Huth <thuth@redhat.com> Message-ID: <20231006120819.480792-1-thuth@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
* | | Merge tag 'mem-2023-10-12' of https://github.com/davidhildenbrand/qemu into ↵Stefan Hajnoczi2023-10-1623-113/+839
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | staging Hi, "Host Memory Backends" and "Memory devices" queue ("mem"): - Support memory devices with multiple memslots - Support memory devices that dynamically consume memslots - Support memory devices that can automatically decide on the number of memslots to use - virtio-mem support for exposing memory dynamically via multiple memslots - Some required cleanups/refactorings # -----BEGIN PGP SIGNATURE----- # # iQJFBAABCAAvFiEEG9nKrXNcTDpGDfzKTd4Q9wD/g1oFAmUn+XMRHGRhdmlkQHJl # ZGhhdC5jb20ACgkQTd4Q9wD/g1qDHA//T01suTa+uzrcoJHoMWN11S47WnAmbuTo # vVakucLBPMJAa9xZeCy3OavXaVGpHkw+t6g3OFknof0LfQ5/j9iE3Q1PxURN7g5j # SJ2WJXCoceM6T4TMhPvVvgEaYjFmESqZB5FZgedMT0QRyhAxMuF9pCkWhk1O3OAV # JqQKqLFiGcv60AEuBYGZGzgiOUv8EJ5gKwRF4VOdyHIxqZDw1aZXzlcd4TzFZBQ7 # rwW/3ef+sFmUJdmfrSrqcIlQSRrqZ2w95xATDzLTIEEUT3SWqh/E95EZWIz1M0oQ # NgWgFiLCR1KOj7bWFhLXT7IfyLh0mEysD+P/hY6QwQ4RewWG7EW5UK+JFswssdcZ # rEj5XpHZzev/wx7hM4bWsoQ+VIvrH7j3uYGyWkcgYRbdDEkWDv2rsT23lwGYNhht # oBsrdEBELRw6v4C8doq/+sCmHmuxUMqTGwbArCQVnB1XnLxOEkuqlnfq5MORkzNF # fxbIRx+LRluOllC0HVaDQd8qxRq1+UC5WIpAcDcrouy4HGgi1onWKrXpgjIAbVyH # M6cENkK7rnRk96gpeXdmrf0h9HqRciAOY8oUsFsvLyKBOCPBWDrLyOQEY5UoSdtD # m4QpEVgywCy2z1uU/UObeT/UxJy/9EL/Zb+DHoEK06iEhwONoUJjEBYMJD38RMkk # mwPTB4UAk9g= # =s69t # -----END PGP SIGNATURE----- # gpg: Signature made Thu 12 Oct 2023 09:49:39 EDT # gpg: using RSA key 1BD9CAAD735C4C3A460DFCCA4DDE10F700FF835A # gpg: issuer "david@redhat.com" # gpg: Good signature from "David Hildenbrand <david@redhat.com>" [unknown] # gpg: aka "David Hildenbrand <davidhildenbrand@gmail.com>" [full] # gpg: aka "David Hildenbrand <hildenbr@in.tum.de>" [unknown] # gpg: WARNING: The key's User ID is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 1BD9 CAAD 735C 4C3A 460D FCCA 4DDE 10F7 00FF 835A * tag 'mem-2023-10-12' of https://github.com/davidhildenbrand/qemu: virtio-mem: Mark memslot alias memory regions unmergeable memory,vhost: Allow for marking memory device memory regions unmergeable virtio-mem: Expose device memory dynamically via multiple memslots if enabled virtio-mem: Update state to match bitmap as soon as it's been migrated virtio-mem: Pass non-const VirtIOMEM via virtio_mem_range_cb memory: Clarify mapping requirements for RamDiscardManager memory-device,vhost: Support automatic decision on the number of memslots vhost: Add vhost_get_max_memslots() kvm: Add stub for kvm_get_max_memslots() memory-device,vhost: Support memory devices that dynamically consume memslots memory-device: Track required and actually used memslots in DeviceMemoryState stubs: Rename qmp_memory_device.c to memory_device.c memory-device: Support memory devices with multiple memslots vhost: Return number of free memslots kvm: Return number of free memslots softmmu/physmem: Fixup qemu_ram_block_from_host() documentation vhost: Remove vhost_backend_can_merge() callback vhost: Rework memslot filtering and fix "used_memslot" tracking Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
| * | virtio-mem: Mark memslot alias memory regions unmergeableDavid Hildenbrand2023-10-121-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's mark the memslot alias memory regions as unmergable, such that flatview and vhost won't merge adjacent memory region aliases and we can atomically map/unmap individual aliases without affecting adjacent alias memory regions. This handles vhost and vfio in multiple-memslot mode correctly (which do not support atomic memslot updates) and avoids the temporary removal of large memslots, which can be an expensive operation. For example, vfio might have to unpin + repin a lot of memory, which is undesired. Message-ID: <20230926185738.277351-19-david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | memory,vhost: Allow for marking memory device memory regions unmergeableDavid Hildenbrand2023-10-123-8/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's allow for marking memory regions unmergeable, to teach flatview code and vhost to not merge adjacent aliases to the same memory region into a larger memory section; instead, we want separate aliases to stay separate such that we can atomically map/unmap aliases without affecting other aliases. This is desired for virtio-mem mapping device memory located on a RAM memory region via multiple aliases into a memory region container, resulting in separate memslots that can get (un)mapped atomically. As an example with virtio-mem, the layout would look something like this: [...] 0000000240000000-00000020bfffffff (prio 0, i/o): device-memory 0000000240000000-000000043fffffff (prio 0, i/o): virtio-mem 0000000240000000-000000027fffffff (prio 0, ram): alias memslot-0 @mem2 0000000000000000-000000003fffffff 0000000280000000-00000002bfffffff (prio 0, ram): alias memslot-1 @mem2 0000000040000000-000000007fffffff 00000002c0000000-00000002ffffffff (prio 0, ram): alias memslot-2 @mem2 0000000080000000-00000000bfffffff [...] Without unmergable memory regions, all three memslots would get merged into a single memory section. For example, when mapping another alias (e.g., virtio-mem-memslot-3) or when unmapping any of the mapped aliases, memory listeners will first get notified about the removal of the big memory section to then get notified about re-adding of the new (differently merged) memory section(s). In an ideal world, memory listeners would be able to deal with that atomically, like KVM nowadays does. However, (a) supporting this for other memory listeners (vhost-user, vfio) is fairly hard: temporary removal can result in all kinds of issues on concurrent access to guest memory; and (b) this handling is undesired, because temporarily removing+readding can consume quite some time on bigger memslots and is not efficient (e.g., vfio unpinning and repinning pages ...). Let's allow for marking a memory region unmergeable, such that we can atomically (un)map aliases to the same memory region, similar to (un)mapping individual DIMMs. Similarly, teach vhost code to not redo what flatview core stopped doing: don't merge such sections. Merging in vhost code is really only relevant for handling random holes in boot memory where; without this merging, the vhost-user backend wouldn't be able to mmap() some boot memory backed on hugetlb. We'll use this for virtio-mem next. Message-ID: <20230926185738.277351-18-david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | virtio-mem: Expose device memory dynamically via multiple memslots if enabledDavid Hildenbrand2023-10-123-1/+340
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Having large virtio-mem devices that only expose little memory to a VM is currently a problem: we map the whole sparse memory region into the guest using a single memslot, resulting in one gigantic memslot in KVM. KVM allocates metadata for the whole memslot, which can result in quite some memory waste. Assuming we have a 1 TiB virtio-mem device and only expose little (e.g., 1 GiB) memory, we would create a single 1 TiB memslot and KVM has to allocate metadata for that 1 TiB memslot: on x86, this implies allocating a significant amount of memory for metadata: (1) RMAP: 8 bytes per 4 KiB, 8 bytes per 2 MiB, 8 bytes per 1 GiB -> For 1 TiB: 2147483648 + 4194304 + 8192 = ~ 2 GiB (0.2 %) With the TDP MMU (cat /sys/module/kvm/parameters/tdp_mmu) this gets allocated lazily when required for nested VMs (2) gfn_track: 2 bytes per 4 KiB -> For 1 TiB: 536870912 = ~512 MiB (0.05 %) (3) lpage_info: 4 bytes per 2 MiB, 4 bytes per 1 GiB -> For 1 TiB: 2097152 + 4096 = ~2 MiB (0.0002 %) (4) 2x dirty bitmaps for tracking: 2x 1 bit per 4 KiB page -> For 1 TiB: 536870912 = 64 MiB (0.006 %) So we primarily care about (1) and (2). The bad thing is, that the memory consumption *doubles* once SMM is enabled, because we create the memslot once for !SMM and once for SMM. Having a 1 TiB memslot without the TDP MMU consumes around: * With SMM: 5 GiB * Without SMM: 2.5 GiB Having a 1 TiB memslot with the TDP MMU consumes around: * With SMM: 1 GiB * Without SMM: 512 MiB ... and that's really something we want to optimize, to be able to just start a VM with small boot memory (e.g., 4 GiB) and a virtio-mem device that can grow very large (e.g., 1 TiB). Consequently, using multiple memslots and only mapping the memslots we really need can significantly reduce memory waste and speed up memslot-related operations. Let's expose the sparse RAM memory region using multiple memslots, mapping only the memslots we currently need into our device memory region container. The feature can be enabled using "dynamic-memslots=on" and requires "unplugged-inaccessible=on", which is nowadays the default. Once enabled, we'll auto-detect the number of memslots to use based on the memslot limit provided by the core. We'll use at most 1 memslot per gigabyte. Note that our global limit of memslots accross all memory devices is currently set to 256: even with multiple large virtio-mem devices, we'd still have a sane limit on the number of memslots used. The default is to not dynamically map memslot for now ("dynamic-memslots=off"). The optimization must be enabled manually, because some vhost setups (e.g., hotplug of vhost-user devices) might be problematic until we support more memslots especially in vhost-user backends. Note that "dynamic-memslots=on" is just a hint that multiple memslots *may* be used for internal optimizations, not that multiple memslots *must* be used. The actual number of memslots that are used is an internal detail: for example, once memslot metadata is no longer an issue, we could simply stop optimizing for that. Migration source and destination can differ on the setting of "dynamic-memslots". Message-ID: <20230926185738.277351-17-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | virtio-mem: Update state to match bitmap as soon as it's been migratedDavid Hildenbrand2023-10-121-5/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's cleaner and future-proof to just have other state that depends on the bitmap state to be updated as soon as possible when restoring the bitmap. So factor out informing RamDiscardListener into a functon and call it in case of early migration right after we restored the bitmap. Message-ID: <20230926185738.277351-16-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | virtio-mem: Pass non-const VirtIOMEM via virtio_mem_range_cbDavid Hildenbrand2023-10-121-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | Let's prepare for a user that has to modify the VirtIOMEM device state. Message-ID: <20230926185738.277351-15-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | memory: Clarify mapping requirements for RamDiscardManagerDavid Hildenbrand2023-10-122-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We really only care about the RAM memory region not being mapped into an address space yet as long as we're still setting up the RamDiscardManager. Once mapped into an address space, memory notifiers would get notified about such a region and any attempts to modify the RamDiscardManager would be wrong. While "mapped into an address space" is easy to check for RAM regions that are mapped directly (following the ->container links), it's harder to check when such regions are mapped indirectly via aliases. For now, we can only detect that a region is mapped through an alias (->mapped_via_alias), but we don't have a handle on these aliases to follow all their ->container links to test if they are eventually mapped into an address space. So relax the assertion in memory_region_set_ram_discard_manager(), remove the check in memory_region_get_ram_discard_manager() and clarify the doc. Message-ID: <20230926185738.277351-14-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | memory-device,vhost: Support automatic decision on the number of memslotsDavid Hildenbrand2023-10-125-4/+147
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to support memory devices that can automatically decide how many memslots they will use. In the worst case, they have to use a single memslot. The target use cases are virtio-mem and the hyper-v balloon. Let's calculate a reasonable limit such a memory device may use, and instruct the device to make a decision based on that limit. Use a simple heuristic that considers: * A memslot soft-limit for all memory devices of 256; also, to not consume too many memslots -- which could harm performance. * Actually still free and unreserved memslots * The percentage of the remaining device memory region that memory device will occupy. Further, while we properly check before plugging a memory device whether there still is are free memslots, we have other memslot consumers (such as boot memory, PCI BARs) that don't perform any checks and might dynamically consume memslots without any prior reservation. So we might succeed in plugging a memory device, but once we dynamically map a PCI BAR we would be in trouble. Doing accounting / reservation / checks for all such users is problematic (e.g., sometimes we might temporarily split boot memory into two memslots, triggered by the BIOS). We use the historic magic memslot number of 509 as orientation to when supporting 256 memory devices -> memslots (leaving 253 for boot memory and other devices) has been proven to work reliable. We'll fallback to suggesting a single memslot if we don't have at least 509 total memslots. Plugging vhost devices with less than 509 memslots available while we have memory devices plugged that consume multiple memslots due to automatic decisions can be problematic. Most configurations might just fail due to "limit < used + reserved", however, it can also happen that these memory devices would suddenly consume memslots that would actually be required by other memslot consumers (boot, PCI BARs) later. Note that this has always been sketchy with vhost devices that support only a small number of memslots; but we don't want to make it any worse.So let's keep it simple and simply reject plugging such vhost devices in such a configuration. Eventually, all vhost devices that want to be fully compatible with such memory devices should support a decent number of memslots (>= 509). Message-ID: <20230926185738.277351-13-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | vhost: Add vhost_get_max_memslots()David Hildenbrand2023-10-123-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | Let's add vhost_get_max_memslots(). Message-ID: <20230926185738.277351-12-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | kvm: Add stub for kvm_get_max_memslots()David Hildenbrand2023-10-123-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We'll need the stub soon from memory device context. While at it, use "unsigned int" as return value and place the declaration next to kvm_get_free_memslots(). Message-ID: <20230926185738.277351-11-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | memory-device,vhost: Support memory devices that dynamically consume memslotsDavid Hildenbrand2023-10-124-6/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to support memory devices that have a dynamically managed memory region container as device memory region. This device memory region maps multiple RAM memory subregions (e.g., aliases to the same RAM memory region), whereby these subregions can be (un)mapped on demand. Each RAM subregion will consume a memslot in KVM and vhost, resulting in such a new device consuming memslots dynamically, and initially usually 0. We already track the number of used vs. required memslots for all memslots. From that, we can derive the number of reserved memslots that must not be used otherwise. The target use case is virtio-mem and the hyper-v balloon, which will dynamically map aliases to RAM memory region into their device memory region container. Properly document what's supported and what's not and extend the vhost memslot check accordingly. Message-ID: <20230926185738.277351-10-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | memory-device: Track required and actually used memslots in DeviceMemoryStateDavid Hildenbrand2023-10-122-1/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's track how many memslots are required by plugged memory devices and how many are currently actually getting used by plugged memory devices. "required - used" is the number of reserved memslots. For now, the number of used and required memslots is always equal, and there are no reservations. This is a preparation for memory devices that want to dynamically consume memslots after initially specifying how many they require -- where we'll end up with reserved memslots. To track the number of used memslots, create a new address space for our device memory and register a memory listener (add/remove) for that address space. Message-ID: <20230926185738.277351-9-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | stubs: Rename qmp_memory_device.c to memory_device.cDavid Hildenbrand2023-10-123-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to place non-qmp stubs in there, so let's rename it. While at it, put it into the MAINTAINERS file under "Memory devices". Message-ID: <20230926185738.277351-8-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | memory-device: Support memory devices with multiple memslotsDavid Hildenbrand2023-10-122-8/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to support memory devices that have a memory region container as device memory region that maps multiple RAM memory regions. Let's start by supporting memory devices that statically map multiple RAM memory regions and, thereby, consume multiple memslots. We already have one device that uses a container as device memory region: NVDIMMs. However, a NVDIMM always ends up consuming exactly one memslot. Let's add support for that by asking the memory device via a new callback how many memslots it requires. Message-ID: <20230926185738.277351-7-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | vhost: Return number of free memslotsDavid Hildenbrand2023-10-124-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's return the number of free slots instead of only checking if there is a free slot. Required to support memory devices that consume multiple memslots. This is a preparation for memory devices that consume multiple memslots. Message-ID: <20230926185738.277351-6-david@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | kvm: Return number of free memslotsDavid Hildenbrand2023-10-125-17/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's return the number of free slots instead of only checking if there is a free slot. While at it, check all address spaces, which will also consider SMM under x86 correctly. This is a preparation for memory devices that consume multiple memslots. Message-ID: <20230926185738.277351-5-david@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | softmmu/physmem: Fixup qemu_ram_block_from_host() documentationDavid Hildenbrand2023-10-122-17/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's fixup the documentation (e.g., removing traces of the ram_addr parameter that no longer exists) and move it to the header file while at it. Message-ID: <20230926185738.277351-4-david@redhat.com> Suggested-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | vhost: Remove vhost_backend_can_merge() callbackDavid Hildenbrand2023-10-124-24/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Checking whether the memory regions are equal is sufficient: if they are equal, then most certainly the contained fd is equal. The whole vhost-user memslot handling is suboptimal and overly complicated. We shouldn't have to lookup a RAM memory regions we got notified about in vhost_user_get_mr_data() using a host pointer. But that requires a bigger rework -- especially an alternative vhost_set_mem_table() backend call that simply consumes MemoryRegionSections. For now, let's just drop vhost_backend_can_merge(). Message-ID: <20230926185738.277351-3-david@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
| * | vhost: Rework memslot filtering and fix "used_memslot" trackingDavid Hildenbrand2023-10-123-16/+52
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Having multiple vhost devices, some filtering out fd-less memslots and some not, can mess up the "used_memslot" accounting. Consequently our "free memslot" checks become unreliable and we might run out of free memslots at runtime later. An example sequence which can trigger a potential issue that involves different vhost backends (vhost-kernel and vhost-user) and hotplugged memory devices can be found at [1]. Let's make the filtering mechanism less generic and distinguish between backends that support private memslots (without a fd) and ones that only support shared memslots (with a fd). Track the used_memslots for both cases separately and use the corresponding value when required. Note: Most probably we should filter out MAP_PRIVATE fd-based RAM regions (for example, via memory-backend-memfd,...,shared=off or as default with memory-backend-file) as well. When not using MAP_SHARED, it might not work as expected. Add a TODO for now. [1] https://lkml.kernel.org/r/fad9136f-08d3-3fd9-71a1-502069c000cf@redhat.com Message-ID: <20230926185738.277351-2-david@redhat.com> Fixes: 988a27754bbb ("vhost: allow backends to filter memory sections") Cc: Tiwei Bie <tiwei.bie@intel.com> Acked-by: Igor Mammedov <imammedo@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
* | Merge tag 'pull-riscv-to-apply-20231012-1' of ↵Stefan Hajnoczi2023-10-1229-891/+1604
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://github.com/alistair23/qemu into staging Second RISC-V PR for 8.2 * Add support for the max CPU * Detect user choice in TCG * Clear CSR values at reset and sync MPSTATE with host * Fix the typo of inverted order of pmpaddr13 and pmpaddr14 * Split TCG/KVM accelerators from cpu.c * Add extension properties for all cpus * Replace GDB exit calls with proper shutdown * Support KVM_GET_REG_LIST * Remove RVG warning * Use env_archcpu for better performance * Deprecate capital 'Z' CPU properties * Fix vfwmaccbf16.vf # -----BEGIN PGP SIGNATURE----- # # iQIzBAABCAAdFiEEaukCtqfKh31tZZKWr3yVEwxTgBMFAmUncYAACgkQr3yVEwxT # gBPQ3g/9Fi4uYRK7dymHHAQbOO9NPlmVPPSxmQ8fNUhoZUkbHfm56JEl42Xr02rA # Lg2ORRQxJhAinANV8CotnbyLRHNCAvouCMCQEjHo1YEHzdXc0tQzp+rIOHT7v9rH # 6OQpI6RuCjO+0LQPMgzJx8yokMw/9b0uma3+RkNKod1XsSySo6JvDkMZGGZZWuVX # Que3TMHzc4513PWEwRS9NaAHqRdy/ax0aPu9khswTYBxeJ/mBTLvGj4wBq5wnS7+ # JPvq0M5ScUMl4K5o884wsAzOdxRk8QZOMx3duMCbqXw0xFmYZj/EzcIeHdnXwuDB # lcANd6LcESMNUb8iDBaFRjLnZ/gNiu20/P/LPWyTirfoZXzZ+h6WPnSeli36xtzO # KKWtvS1YggCjsDvh9/PLYAvUGBcS/kUhIynN10YKnoKB+wSDxxyvBS1GU6c8czgc # WDf3V4P3Z8oPKDA/24Qd9Uiho1Gq9FED4eBQPb9PuvkfboKE/g7lUp708XXDFVld # hkJMsYROSRvk54RHITrD9Z+XFQ2TfC8wHLH0IwlyynQnc1sKvXaR6U1hZTAVtE4f # yley/xCQ7OUV+hrx1sQLURcN6A+SPummOY5jdHiD29QcJnOZnkSy5j2KOlnHSa5i # 6v/6EFCgxwr69N6Q6X34VDv6+DZqLO2dNncQCInYFfupRhQ7t1E= # =SUon # -----END PGP SIGNATURE----- # gpg: Signature made Thu 12 Oct 2023 00:09:36 EDT # gpg: using RSA key 6AE902B6A7CA877D6D659296AF7C95130C538013 # gpg: Good signature from "Alistair Francis <alistair@alistair23.me>" [unknown] # gpg: WARNING: This key is not certified with a trusted signature! # gpg: There is no indication that the signature belongs to the owner. # Primary key fingerprint: 6AE9 02B6 A7CA 877D 6D65 9296 AF7C 9513 0C53 8013 * tag 'pull-riscv-to-apply-20231012-1' of https://github.com/alistair23/qemu: (54 commits) target/riscv: Fix vfwmaccbf16.vf target/riscv: deprecate capital 'Z' CPU properties target/riscv: Use env_archcpu for better performance target/riscv/tcg: remove RVG warning target/riscv/kvm: support KVM_GET_REG_LIST target/riscv/kvm: improve 'init_multiext_cfg' error msg gdbstub: replace exit calls with proper shutdown for softmmu hw/char: riscv_htif: replace exit calls with proper shutdown hw/misc/sifive_test.c: replace exit calls with proper shutdown softmmu: pass the main loop status to gdb "Wxx" packet softmmu: add means to pass an exit code when requesting a shutdown target/riscv/tcg-cpu.c: add extension properties for all cpus target/riscv: add riscv_cpu_get_name() target/riscv/cpu: move priv spec functions to tcg-cpu.c target/riscv/cpu.c: export isa_edata_arr[] target/riscv/tcg: move riscv_cpu_add_misa_properties() to tcg-cpu.c target/riscv/cpu.c: make misa_ext_cfgs[] 'const' target/riscv/tcg: introduce tcg_cpu_instance_init() target/riscv/cpu.c: export set_misa() target/riscv/kvm: do not use riscv_cpu_add_misa_properties() ... Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
| * | target/riscv: Fix vfwmaccbf16.vfMax Chou2023-10-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The operator (fwmacc16) of vfwmaccbf16.vf helper function should be replaced by fwmaccbf16. Fixes: adf772b0f7 ("target/riscv: Add support for Zvfbfwma extension") Signed-off-by: Max Chou <max.chou@sifive.com> Reviewed-by: LIU Zhiwei <zhiwei_liu@linux.alibaba.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20231005095734.567575-1-max.chou@sifive.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | target/riscv: deprecate capital 'Z' CPU propertiesDaniel Henrique Barboza2023-10-124-12/+82
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At this moment there are eleven CPU extension properties that starts with capital 'Z': Zifencei, Zicsr, Zihintntl, Zihintpause, Zawrs, Zfa, Zfh, Zfhmin, Zve32f, Zve64f and Zve64d. All other extensions are named with lower-case letters. We want all properties to be named with lower-case letters since it's consistent with the riscv-isa string that we create in the FDT. Having these 11 properties to be exceptions can be confusing. Deprecate all of them. Create their lower-case counterpart to be used as maintained CPU properties. When trying to use any deprecated property a warning message will be displayed, recommending users to switch to the lower-case variant: ./build/qemu-system-riscv64 -M virt -cpu rv64,Zifencei=true --nographic qemu-system-riscv64: warning: CPU property 'Zifencei' is deprecated. Please use 'zifencei' instead This will give users some time to change their scripts before we remove the capital 'Z' properties entirely. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Message-ID: <20231009112817.8896-2-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | target/riscv: Use env_archcpu for better performanceRichard W.M. Jones2023-10-121-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RISCV_CPU(cs) uses a checked cast. When QOM cast debugging is enabled this adds about 5% total overhead when emulating RV64 on x86-64 host. Using a RISC-V guest with 16 vCPUs, 16 GB of guest RAM, virtio-blk disk. The guest has a copy of the qemu source tree. The test involves compiling the qemu source tree with 'make clean; time make -j16'. Before making this change the compile step took 449 & 447 seconds over two consecutive runs. After making this change: 428 & 421 seconds. The saving is over 5%. Thanks: Paolo Bonzini Thanks: Philippe Mathieu-Daudé Signed-off-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-ID: <20231009124859.3373696-2-rjones@redhat.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | target/riscv/tcg: remove RVG warningDaniel Henrique Barboza2023-10-121-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Vendor CPUs that set RVG are displaying user warnings about other extensions that RVG must enable, one warning per CPU. E.g.: $ ./build/qemu-system-riscv64 -smp 8 -M virt -cpu veyron-v1 -nographic qemu-system-riscv64: warning: Setting G will also set IMAFD_Zicsr_Zifencei qemu-system-riscv64: warning: Setting G will also set IMAFD_Zicsr_Zifencei qemu-system-riscv64: warning: Setting G will also set IMAFD_Zicsr_Zifencei qemu-system-riscv64: warning: Setting G will also set IMAFD_Zicsr_Zifencei qemu-system-riscv64: warning: Setting G will also set IMAFD_Zicsr_Zifencei qemu-system-riscv64: warning: Setting G will also set IMAFD_Zicsr_Zifencei qemu-system-riscv64: warning: Setting G will also set IMAFD_Zicsr_Zifencei qemu-system-riscv64: warning: Setting G will also set IMAFD_Zicsr_Zifencei This happens because we decided a while ago that, for simplicity, vendor CPUs could set RVG instead of setting each G extension individually in their cpu_init(). Our warning isn't taking that into account, and we're bugging users with a warning that we're causing ourselves. In a closer look we conclude that this warning is not warranted in any other circumstance since we're just following the ISA [1], which states in chapter 24: "One goal of the RISC-V project is that it be used as a stable software development target. For this purpose, we define a combination of a base ISA (RV32I or RV64I) plus selected standard extensions (IMAFD, Zicsr, Zifencei) as a 'general-purpose' ISA, and we use the abbreviation G for the IMAFDZicsr Zifencei combination of instruction-set extensions." With this in mind, enabling IMAFD_Zicsr_Zifencei if the user explicitly enables 'G' is an expected behavior and the warning is unneeded. Any user caught by surprise should refer to the ISA. Remove the warning when handling RVG. [1] https://github.com/riscv/riscv-isa-manual/releases/download/Ratified-IMAFDQC/riscv-spec-20191213.pdf Reported-by: Paul A. Clarke <pclarke@ventanamicro.com> Suggested-by: Andrew Jones <ajones@ventanamicro.com> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Message-ID: <20231003122539.775932-1-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | target/riscv/kvm: support KVM_GET_REG_LISTDaniel Henrique Barboza2023-10-121-1/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KVM for RISC-V started supporting KVM_GET_REG_LIST in Linux 6.6. It consists of a KVM ioctl() that retrieves a list of all available regs for get_one_reg/set_one_reg. Regs that aren't present in the list aren't supported in the host. This simplifies our lives when initing the KVM regs since we don't have to always attempt a KVM_GET_ONE_REG for all regs QEMU knows. We'll only attempt a get_one_reg() if we're sure the reg is supported, i.e. it was retrieved by KVM_GET_REG_LIST. Any error in get_one_reg() will then always considered fatal, instead of having to handle special error codes that might indicate a non-fatal failure. Start by moving the current kvm_riscv_init_multiext_cfg() logic into a new kvm_riscv_read_multiext_legacy() helper. We'll prioritize using KVM_GET_REG_LIST, so check if we have it available and, in case we don't, use the legacy() logic. Otherwise, retrieve the available reg list and use it to check if the host supports our known KVM regs, doing the usual get_one_reg() for the supported regs and setting cpu->cfg accordingly. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Acked-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Message-ID: <20231003132148.797921-3-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | target/riscv/kvm: improve 'init_multiext_cfg' error msgDaniel Henrique Barboza2023-10-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our error message is returning the value of 'ret', which will be always -1 in case of error, and will not be that useful: qemu-system-riscv64: Unable to read ISA_EXT KVM register ssaia, error -1 Improve the error message by outputting 'errno' instead of 'ret'. Use strerrorname_np() to output the error name instead of the error code. This will give us what we need to know right away: qemu-system-riscv64: Unable to read ISA_EXT KVM register ssaia, error code: ENOENT Given that we're going to exit(1) in this condition instead of attempting to recover, remove the 'kvm_riscv_destroy_scratch_vcpu()' call. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20231003132148.797921-2-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | gdbstub: replace exit calls with proper shutdown for softmmuClément Chigot2023-10-124-2/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces the exit calls by shutdown requests, ensuring a proper cleanup of Qemu. Features like net/vhost-vdpa.c are expecting qemu_cleanup to be called to remove their last residuals. Signed-off-by: Clément Chigot <chigot@adacore.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20231003071427.188697-6-chigot@adacore.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | hw/char: riscv_htif: replace exit calls with proper shutdownClément Chigot2023-10-121-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces the exit calls by shutdown requests, ensuring a proper cleanup of Qemu. Otherwise, some connections like gdb could be broken before its final packet ("Wxx") is being sent. This part, being done inside qemu_cleanup function, can be reached only when the main loop exits after a shutdown request. Signed-off-by: Clément Chigot <chigot@adacore.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20231003071427.188697-5-chigot@adacore.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | hw/misc/sifive_test.c: replace exit calls with proper shutdownClément Chigot2023-10-121-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces the exit calls by shutdown requests, ensuring a proper cleanup of Qemu. Otherwise, some connections like gdb could be broken before its final packet ("Wxx") is being sent. This part, being done inside qemu_cleanup function, can be reached only when the main loop exits after a shutdown request. Signed-off-by: Clément Chigot <chigot@adacore.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20231003071427.188697-4-chigot@adacore.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | softmmu: pass the main loop status to gdb "Wxx" packetClément Chigot2023-10-123-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gdb_exit function aims to close gdb sessions and sends the exit code of the current execution. It's being called by qemu_cleanup once the main loop is over. Until now, the exit code sent was always 0. Now that hardware can shutdown this main loop with custom exit codes, these codes must be transfered to gdb as well. Signed-off-by: Clément Chigot <chigot@adacore.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20231003071427.188697-3-chigot@adacore.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | softmmu: add means to pass an exit code when requesting a shutdownClément Chigot2023-10-122-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of now, the exit code was either EXIT_FAILURE when a panic shutdown was requested or EXIT_SUCCESS otherwise. However, some hardware could want to pass more complex exit codes. Thus, introduce a new shutdown request function allowing that. Signed-off-by: Clément Chigot <chigot@adacore.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20231003071427.188697-2-chigot@adacore.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | target/riscv/tcg-cpu.c: add extension properties for all cpusDaniel Henrique Barboza2023-10-121-14/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At this moment we do not expose extension properties for vendor CPUs because that would allow users to change them via command line. The drawback is that if we were to add an API that shows all CPU properties, e.g. qmp-query-cpu-model-expansion, we won't be able to show extensions state of vendor CPUs. We have the required machinery to create extension properties for vendor CPUs while not allowing users to enable extensions. Disabling existing extensions is allowed since it can be useful for debugging. Change the set() callback cpu_set_multi_ext_cfg() to allow enabling extensions only for generic CPUs. In cpu_add_multi_ext_prop() let's not set the default values for the properties if we're not dealing with generic CPUs, otherwise the values set in cpu_init() of vendor CPUs will be overwritten. And finally, in tcg_cpu_instance_init(), add cpu user properties for all CPUs. For the veyron-v1 CPU, we're now able to disable existing extensions like smstateen: $ ./build/qemu-system-riscv64 --nographic -M virt \ -cpu veyron-v1,smstateen=false But setting extensions that the CPU didn't set during cpu_init(), like V, is not allowed: $ ./build/qemu-system-riscv64 --nographic -M virt \ -cpu veyron-v1,v=true qemu-system-riscv64: can't apply global veyron-v1-riscv-cpu.v=true: 'veyron-v1' CPU does not allow enabling extensions Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20230926183109.165878-3-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | target/riscv: add riscv_cpu_get_name()Daniel Henrique Barboza2023-10-123-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We'll introduce generic errors that will output a CPU type name via its RISCVCPU pointer. Create a helper for that. Use the helper in tcg_cpu_realizefn() instead of hardcoding the 'host' CPU name. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20230926183109.165878-2-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | target/riscv/cpu: move priv spec functions to tcg-cpu.cDaniel Henrique Barboza2023-10-123-40/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Priv spec validation is TCG specific. Move it to the TCG accel class. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-ID: <20230925175709.35696-20-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | target/riscv/cpu.c: export isa_edata_arr[]Daniel Henrique Barboza2023-10-122-26/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This array will be read by the TCG accel class, allowing it to handle priv spec verifications on its own. The array will remain here in cpu.c because it's also used by the riscv,isa string function. To export it we'll finish it with an empty element since ARRAY_SIZE() won't work outside of cpu.c. Get rid of its ARRAY_SIZE() usage now to alleviate the changes for the next patch. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20230925175709.35696-19-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | target/riscv/tcg: move riscv_cpu_add_misa_properties() to tcg-cpu.cDaniel Henrique Barboza2023-10-123-91/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All code related to MISA TCG properties is also moved. At this point, all TCG properties handling is done in tcg-cpu.c, all KVM properties handling is done in kvm-cpu.c. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20230925175709.35696-18-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | target/riscv/cpu.c: make misa_ext_cfgs[] 'const'Daniel Henrique Barboza2023-10-121-13/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The array isn't marked as 'const' because we're initializing their elements in riscv_cpu_add_misa_properties(), 'name' and 'description' fields. In a closer look we can see that we're not using these 2 fields after creating the MISA properties. And we can create the properties by using riscv_get_misa_ext_name() and riscv_get_misa_ext_description() directly. Remove the 'name' and 'description' fields from RISCVCPUMisaExtConfig and make misa_ext_cfgs[] a const array. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20230925175709.35696-17-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | target/riscv/tcg: introduce tcg_cpu_instance_init()Daniel Henrique Barboza2023-10-123-151/+149
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tcg_cpu_instance_init() will be the 'cpu_instance_init' impl for the TCG accelerator. It'll be called from within riscv_cpu_post_init(), via accel_cpu_instance_init(), similar to what happens with KVM. In fact, to preserve behavior, the implementation will be similar to what riscv_cpu_post_init() already does. In this patch we'll move riscv_cpu_add_user_properties() and riscv_init_max_cpu_extensions() and all their dependencies to tcg-cpu.c. All multi-extension properties code was moved. The 'multi_ext_user_opts' hash table was also moved to tcg-cpu.c since it's a TCG only structure, meaning that we won't have to worry about initializing a TCG hash table when running a KVM CPU anymore. riscv_cpu_add_user_properties() will remain in cpu.c for now due to how much code it requires to be moved at the same time. We'll do that in the next patch. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20230925175709.35696-16-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
| * | target/riscv/cpu.c: export set_misa()Daniel Henrique Barboza2023-10-122-16/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We'll move riscv_init_max_cpu_extensions() to tcg-cpu.c in the next patch and set_misa() needs to be usable from there. Rename it to riscv_cpu_set_misa() and make it public. Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Andrew Jones <ajones@ventanamicro.com> Reviewed-by: Alistair Francis <alistair.francis@wdc.com> Message-ID: <20230925175709.35696-15-dbarboza@ventanamicro.com> Signed-off-by: Alistair Francis <alistair.francis@wdc.com>