Commit message | Author | Age | Files | Lines
...
* Merge tag 'pull-request-2023-10-12' of https://gitlab.com/thuth/qemu into staging | Stefan Hajnoczi | 2023-10-16 | 8 | -3/+46

    * Fix CVE-2023-1544
    * Deprecate the rdma code
    * Fix flaky npcm7xx_timer test
    * i2c-echo license statement and Kconfig switch
    * Disable the failing riscv64-debian-cross CI job by default

    * tag 'pull-request-2023-10-12' of https://gitlab.com/thuth/qemu:
      gitlab-ci: Disable the riscv64-debian-cross-container by default
      MAINTAINERS: Add include/sysemu/qtest.h to the qtest section
      hw/misc/Kconfig: add switch for i2c-echo
      hw/misc/i2c-echo: add copyright/license note
      tests/qtest: Fix npcm7xx_timer-test.c flaky test
      hw/rdma: Deprecate the pvrdma device and the rdma subsystem
      hw/pvrdma: Protect against buggy or malicious guest driver

    Conflicts:
      docs/about/deprecated.rst
      Context conflict between RISC-V and RDMA deprecation.

    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  * gitlab-ci: Disable the riscv64-debian-cross-container by default | Thomas Huth | 2023-10-12 | 1 | -0/+1

      This job has been failing for weeks. Let's mark it as manual until it
      gets fixed.

      Message-Id: <82aa015a-ca94-49ce-beec-679cc175b726@redhat.com>
      Acked-by: Michael Tokarev <mjt@tls.msk.ru>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
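      For reference, marking a GitLab CI job as manual uses the `when:`
      keyword; a minimal .gitlab-ci.yml sketch (the template name and
      variables are assumptions, not QEMU's exact setup):

          riscv64-debian-cross-container:
            extends: .container_job_template
            # Never start automatically; a maintainer must trigger it.
            when: manual
            # A failed manual run must not fail the whole pipeline.
            allow_failure: true
            variables:
              NAME: debian-riscv64-cross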
  * MAINTAINERS: Add include/sysemu/qtest.h to the qtest section | Thomas Huth | 2023-10-12 | 1 | -0/+1

      We already list system/qtest.c in the qtest section, so the
      corresponding header file should be listed here, too.

      Message-ID: <20231012111401.871711-1-thuth@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
  * hw/misc/Kconfig: add switch for i2c-echo | Klaus Jensen | 2023-10-12 | 2 | -1/+6

      Associate i2c-echo with TEST_DEVICES and add a dependency on I2C.

      Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Message-ID: <20230823-i2c-echo-fixes-v1-2-ccc05a6028f0@samsung.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
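      In QEMU's Kconfig dialect that association usually looks like the
      following hw/misc/Kconfig fragment (the symbol name I2C_ECHO is an
      assumption based on the device name):

          config I2C_ECHO
              bool
              default y if TEST_DEVICES
              depends on I2C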
  * hw/misc/i2c-echo: add copyright/license note | Klaus Jensen | 2023-10-12 | 1 | -0/+10

      Add missing copyright and license notice. Also add a short description
      of the device.

      Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
      Message-ID: <20230823-i2c-echo-fixes-v1-1-ccc05a6028f0@samsung.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
  * tests/qtest: Fix npcm7xx_timer-test.c flaky test | Chris Rauer | 2023-10-12 | 1 | -0/+1

      npcm7xx_timer-test occasionally fails due to the state of the timers
      from the previous test iteration. Advancing the clock step after the
      reset resolves this issue.

      Fixes: https://gitlab.com/qemu-project/qemu/-/issues/1897
      Signed-off-by: Chris Rauer <crauer@google.com>
      Message-ID: <20230929000831.691559-1-crauer@google.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
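      Conceptually, the one-line fix amounts to a libqtest clock step right
      after the reset; a sketch (the helper name and step size are
      illustrative, not the literal patch):

          #include "libqtest.h"

          static void flush_stale_timer_state(QTestState *qts)
          {
              /* Advance the virtual clock after reset so timer state left
               * over from the previous test iteration cannot fire into
               * this one. */
              qtest_clock_step(qts, 1000000); /* 1 ms, in ns */
          }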
  * hw/rdma: Deprecate the pvrdma device and the rdma subsystem | Thomas Huth | 2023-10-12 | 3 | -1/+11

      This subsystem is said to be in a bad shape (see e.g. [1], [2] and
      [3]), and nobody seems to feel responsible for picking up patches for
      it and sending them via a pull request. For example, there is a patch
      for a CVE-worthy bug posted more than half a year ago [4] which has
      never been merged. Thus let's mark it as deprecated and finally remove
      it, unless somebody steps up, improves the code quality and adds
      proper regression tests.

      [1] https://lore.kernel.org/qemu-devel/20230918144206.560120-1-armbru@redhat.com/
      [2] https://lore.kernel.org/qemu-devel/ZQnojJOqoFu73995@redhat.com/
      [3] https://lore.kernel.org/qemu-devel/1054981c-e8ae-c676-3b04-eeb030e11f65@tls.msk.ru/
      [4] https://lore.kernel.org/qemu-devel/20230301142926.18686-1-yuval.shaia.ml@gmail.com/

      Message-ID: <20230927133019.228495-1-thuth@redhat.com>
      Acked-by: Juan Quintela <quintela@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
  * hw/pvrdma: Protect against buggy or malicious guest driver | Yuval Shaia | 2023-10-12 | 1 | -1/+15

      The guest driver allocates and initializes page tables to be used as a
      ring of descriptors for CQ and async events. The page table that
      represents the ring, along with the number of pages in the page table,
      is passed to the device. Currently our device supports only one page
      table per ring.

      Let's make sure that the number of page table entries the driver
      reports does not exceed the size of one page table.

      Reported-by: Soul Chen <soulchen8650@gmail.com>
      Signed-off-by: Yuval Shaia <yuval.shaia.ml@gmail.com>
      Fixes: CVE-2023-1544
      Message-ID: <20230301142926.18686-1-yuval.shaia.ml@gmail.com>
      Signed-off-by: Thomas Huth <thuth@redhat.com>
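      The guard is conceptually a bounds check like the sketch below (the
      helper name, limit expression and error handling are assumptions; the
      real change lives in hw/rdma/vmw/):

          /* One page of guest memory holds the ring's page directory. */
          #define MAX_RING_PAGES (TARGET_PAGE_SIZE / sizeof(uint64_t))

          static int validate_ring_pages(uint32_t num_pages)
          {
              /* A buggy or malicious driver could report more pages than
               * fit in the single page table the device supports, making
               * the device read past the end of the table. */
              if (num_pages > MAX_RING_PAGES) {
                  return -EINVAL;
              }
              return 0;
          }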
* Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging | Stefan Hajnoczi | 2023-10-16 | 60 | -499/+843

    Block layer patches

    - Clean up coroutine versions of bdrv_{is_allocated,block_status}*
    - Graph locking part 5 (protect children/parent links)

    # gpg: Signature made Thu 12 Oct 2023 12:20:15 EDT
    # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
    # gpg: issuer "kwolf@redhat.com"
    # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]
    # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6

    * tag 'for-upstream' of https://repo.or.cz/qemu/kevin: (26 commits)
      block: Add assertion for bdrv_graph_wrlock()
      block: Protect bs->children with graph_lock
      block: Protect bs->parents with graph_lock
      block: Mark bdrv_get_specific_info() and callers GRAPH_RDLOCK
      block: Mark bdrv_apply_auto_read_only() and callers GRAPH_RDLOCK
      block: Mark bdrv_op_is_blocked() and callers GRAPH_RDLOCK
      qcow2: Mark check_constraints_on_bitmap() GRAPH_RDLOCK
      qcow2: Mark qcow2_inactivate() and callers GRAPH_RDLOCK
      qcow2: Mark qcow2_signal_corruption() and callers GRAPH_RDLOCK
      block: Mark bdrv_amend_options() and callers GRAPH_RDLOCK
      block: Mark bdrv_get_parent_name() and callers GRAPH_RDLOCK
      block: Mark bdrv_primary_child() and callers GRAPH_RDLOCK
      block: Mark bdrv_refresh_filename() and callers GRAPH_RDLOCK
      block: Mark bdrv_get_xdbg_block_graph() and callers GRAPH_RDLOCK
      block: Take graph rdlock in parts of reopen
      block: Mark bdrv_snapshot_fallback() and callers GRAPH_RDLOCK
      block: Mark bdrv_parent_cb_resize() and callers GRAPH_RDLOCK
      block: Mark drain related functions GRAPH_RDLOCK
      block: Mark bdrv_first_blk() and bdrv_is_root_node() GRAPH_RDLOCK
      block: Take graph rdlock in bdrv_inactivate_all()
      ...

    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  * block: Add assertion for bdrv_graph_wrlock() | Kevin Wolf | 2023-10-12 | 2 | -2/+4

      bdrv_graph_wrlock() can't run in a coroutine (because it polls) and
      requires holding the BQL. We already have GLOBAL_STATE_CODE() to
      assert the latter. Assert the former as well and add a no_coroutine_fn
      marker.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-23-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
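      In sketch form (simplified; the real function also performs the
      actual write-lock/poll logic):

          void no_coroutine_fn bdrv_graph_wrlock(void)
          {
              GLOBAL_STATE_CODE();          /* must run under the BQL */
              assert(!qemu_in_coroutine()); /* polling is not coroutine-safe */
              /* ... take the writer lock, polling until readers are gone ... */
          }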
  * block: Protect bs->children with graph_lock | Kevin Wolf | 2023-10-12 | 4 | -3/+10

      Almost all functions that access the child links already take the
      graph lock now. Add locking to the remaining users and finally
      annotate the struct field itself as protected by the graph lock.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-22-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: Protect bs->parents with graph_lock | Kevin Wolf | 2023-10-12 | 2 | -2/+7

      Almost all functions that access the parent link already take the
      graph lock now. Add locking to the remaining user in a test case and
      finally annotate the struct field itself as protected by the graph
      lock.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-21-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: Mark bdrv_get_specific_info() and callers GRAPH_RDLOCK | Kevin Wolf | 2023-10-12 | 4 | -8/+10

      This adds GRAPH_RDLOCK annotations to declare that callers of
      bdrv_get_specific_info() need to hold a reader lock for the graph.
      This removes an assume_graph_lock() call in vmdk's implementation.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-20-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
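      The GRAPH_RDLOCK commits in this series all follow the same pattern:
      the clang thread-safety annotation is added to the prototype, and any
      caller that does not already hold the graph reader lock takes it
      explicitly. Roughly (hypothetical function name; see
      include/block/graph-lock.h for the real macros):

          /* Callers must hold the graph reader lock. */
          int GRAPH_RDLOCK bdrv_example_query(BlockDriverState *bs);

          void example_caller(BlockDriverState *bs)
          {
              GRAPH_RDLOCK_GUARD_MAINLOOP(); /* scoped reader lock */
              bdrv_example_query(bs);
          }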
  * block: Mark bdrv_apply_auto_read_only() and callers GRAPH_RDLOCK | Kevin Wolf | 2023-10-12 | 10 | -5/+22

      This adds GRAPH_RDLOCK annotations to declare that callers of
      bdrv_apply_auto_read_only() need to hold a reader lock for the graph
      because it calls bdrv_can_set_read_only(), which indirectly accesses
      the parents list of a node.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-19-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: Mark bdrv_op_is_blocked() and callers GRAPH_RDLOCK | Kevin Wolf | 2023-10-12 | 7 | -5/+30

      This adds GRAPH_RDLOCK annotations to declare that callers of
      bdrv_op_is_blocked() need to hold a reader lock for the graph because
      it calls bdrv_get_device_or_node_name(), which accesses the parents
      list of a node.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-18-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * qcow2: Mark check_constraints_on_bitmap() GRAPH_RDLOCK | Kevin Wolf | 2023-10-12 | 1 | -6/+3

      It still has an assume_graph_lock() call, but all of its callers are
      now properly annotated to hold the graph lock. Update the function to
      be GRAPH_RDLOCK as well and remove the assume_graph_lock().

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-17-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * qcow2: Mark qcow2_inactivate() and callers GRAPH_RDLOCK | Kevin Wolf | 2023-10-12 | 1 | -5/+10

      This adds GRAPH_RDLOCK annotations to declare that callers of
      qcow2_inactivate() need to hold a reader lock for the graph because it
      calls bdrv_get_device_or_node_name(), which accesses the parents list
      of a node.

      qcow2_do_close() is a bit strange because it is called from different
      contexts. If close_data_file = true, we know that we were called from
      non-coroutine main loop context (more specifically, we're coming from
      qcow2_close()) and can safely drop the reader lock temporarily with
      bdrv_graph_rdunlock_main_loop() and acquire the writer lock.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-16-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * qcow2: Mark qcow2_signal_corruption() and callers GRAPH_RDLOCK | Kevin Wolf | 2023-10-12 | 11 | -192/+250

      This adds GRAPH_RDLOCK annotations to declare that callers of
      qcow2_signal_corruption() need to hold a reader lock for the graph
      because it calls bdrv_get_node_name(), which accesses the parents list
      of a node.

      For some places, we know that they will hold the lock, but we don't
      have the GRAPH_RDLOCK annotations yet. In this case, add
      assume_graph_lock() with a FIXME comment. These places will be removed
      once everything is properly annotated.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-15-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: Mark bdrv_amend_options() and callers GRAPH_RDLOCK | Kevin Wolf | 2023-10-12 | 4 | -13/+16

      This adds GRAPH_RDLOCK annotations to declare that callers of
      bdrv_amend_options() need to hold a reader lock for the graph. This
      removes an assume_graph_lock() call in crypto's implementation.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-14-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: Mark bdrv_get_parent_name() and callers GRAPH_RDLOCK | Kevin Wolf | 2023-10-12 | 17 | -14/+55

      This adds GRAPH_RDLOCK annotations to declare that callers of
      bdrv_get_parent_name() need to hold a reader lock for the graph
      because it accesses the parents list of a node.

      For some places, we know that they will hold the lock, but we don't
      have the GRAPH_RDLOCK annotations yet. In this case, add
      assume_graph_lock() with a FIXME comment. These places will be removed
      once everything is properly annotated.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-13-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: Mark bdrv_primary_child() and callers GRAPH_RDLOCK | Kevin Wolf | 2023-10-12 | 3 | -3/+16

      This adds GRAPH_RDLOCK annotations to declare that callers of
      bdrv_primary_child() need to hold a reader lock for the graph because
      it accesses the children list of a node.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-12-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: Mark bdrv_refresh_filename() and callers GRAPH_RDLOCK | Kevin Wolf | 2023-10-12 | 12 | -47/+101

      This adds GRAPH_RDLOCK annotations to declare that callers of
      bdrv_refresh_filename() need to hold a reader lock for the graph
      because it accesses the children list of a node.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-11-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: Mark bdrv_get_xdbg_block_graph() and callers GRAPH_RDLOCK | Kevin Wolf | 2023-10-12 | 2 | -1/+3

      This adds GRAPH_RDLOCK annotations to declare that callers of
      bdrv_get_xdbg_block_graph() need to hold a reader lock for the graph
      because it accesses the children list of a node.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-10-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: Take graph rdlock in parts of reopen | Kevin Wolf | 2023-10-12 | 2 | -27/+43

      Reopen isn't easy with respect to locking because many of its
      functions need to iterate the graph, some change it, and then there
      are some drains in the middle where you can't hold any locks.

      Therefore, just document most of the functions as unlocked, and take
      locks internally before accessing the graph.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-9-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: Mark bdrv_snapshot_fallback() and callers GRAPH_RDLOCK | Kevin Wolf | 2023-10-12 | 5 | -24/+67

      This adds GRAPH_RDLOCK annotations to declare that callers of
      bdrv_snapshot_fallback() need to hold a reader lock for the graph
      because it accesses the children list of a node.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-8-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: Mark bdrv_parent_cb_resize() and callers GRAPH_RDLOCK | Kevin Wolf | 2023-10-12 | 1 | -3/+9

      This adds GRAPH_RDLOCK annotations to declare that callers of
      bdrv_parent_cb_resize() need to hold a reader lock for the graph.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-7-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: Mark drain related functions GRAPH_RDLOCK | Emanuele Giuseppe Esposito | 2023-10-12 | 5 | -17/+54

      Draining recursively traverses the graph, therefore we need to make
      sure that such accesses to the graph are also protected by the graph
      rdlock. There are three different drain callers to consider:

      1. Drain in the main loop: no issue at all, rdlock is a nop.
      2. Drain in an iothread: rdlock only works in the main loop or in
         coroutines, so disallow it.
      3. Drain in a coroutine (regardless of AioContext): the drain
         mechanism schedules a BH in bs->aio_context that then performs the
         actual draining. This is wrong, because as pointed out in (2), if
         bs->aio_context is an iothread, rdlock won't work. Therefore,
         change bdrv_co_yield_to_drain to schedule the BH in the main loop,
         as in the sketch below.

      Caller (2) also implies that we need to modify test-bdrv-drain.c to
      disallow draining in the iothreads.

      For some places, we know that they will hold the lock, but we don't
      have the GRAPH_RDLOCK annotations yet. In this case, add
      assume_graph_lock() with a FIXME comment. These places will be removed
      once everything is properly annotated.

      Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-6-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
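      The change for case (3), sketched (callback and data names assumed;
      the real logic lives in bdrv_co_yield_to_drain() in block/io.c):

          /* Schedule the drain BH in the main loop instead of
           * bs->aio_context: the graph reader lock may only be taken from
           * the main loop or from coroutines, never from an iothread. */
          aio_bh_schedule_oneshot(qemu_get_aio_context(),
                                  bdrv_co_drain_bh_cb, &data);
          qemu_coroutine_yield();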
  * block: Mark bdrv_first_blk() and bdrv_is_root_node() GRAPH_RDLOCK | Kevin Wolf | 2023-10-12 | 14 | -13/+53

      This adds GRAPH_RDLOCK annotations to declare that callers of
      bdrv_first_blk() and bdrv_is_root_node() need to hold a reader lock
      for the graph. These functions are the only functions in
      block-backend.c that access the parent list of a node.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-5-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: Take graph rdlock in bdrv_inactivate_all() | Kevin Wolf | 2023-10-12 | 1 | -3/+4

      The function reads the parents list, so it needs to hold the graph
      lock.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-4-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block-coroutine-wrapper: Add no_co_wrapper_bdrv_rdlock functions | Kevin Wolf | 2023-10-12 | 2 | -5/+12

      Add a new wrapper type for GRAPH_RDLOCK functions that should be
      called from coroutine context.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-3-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
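      Declarations carrying such wrapper markers are processed by
      scripts/block-coroutine-wrapper.py at build time; a hedged sketch with
      a hypothetical function:

          /* Callable from coroutine context: the generated wrapper takes
           * the graph reader lock and drops out of coroutine context to
           * run the underlying no_coroutine_fn implementation. */
          void no_co_wrapper_bdrv_rdlock bdrv_co_example(BlockDriverState *bs);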
  * test-bdrv-drain: Don't call bdrv_graph_wrlock() in coroutine context | Kevin Wolf | 2023-10-12 | 1 | -3/+4

      AIO callbacks are effectively coroutine_mixed_fn. If AIO requests
      don't return immediately, their callback is called from the request
      coroutine. This means that in AIO callbacks, we can't call
      no_coroutine_fns such as bdrv_graph_wrlock(). Unfortunately,
      test-bdrv-drain does so.

      Change the test to use a BH to drop out of coroutine context, and add
      coroutine_mixed_fn and no_coroutine_fn markers to clarify the context
      each function runs in.

      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Message-ID: <20230929145157.45443-2-kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: convert more bdrv_is_allocated* and bdrv_block_status* calls to coroutine versions | Paolo Bonzini | 2023-10-12 | 8 | -32/+33

      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-ID: <20230904100306.156197-5-pbonzini@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: switch to co_wrapper for bdrv_is_allocated_* | Paolo Bonzini | 2023-10-12 | 2 | -51/+14

      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-ID: <20230904100306.156197-4-pbonzini@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  * block: complete public block status API | Paolo Bonzini | 2023-10-12 | 2 | -19/+16

      Include both coroutine and non-coroutine versions, the latter being
      co_wrapper_mixed_bdrv_rdlock of the former.

      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-ID: <20230904100306.156197-3-pbonzini@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
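      In header terms the pairing looks roughly like this (sketch; see
      include/block/block-io.h for the authoritative prototypes):

          int coroutine_fn GRAPH_RDLOCK
          bdrv_co_block_status(BlockDriverState *bs, int64_t offset,
                               int64_t bytes, int64_t *pnum, int64_t *map,
                               BlockDriverState **file);

          /* Generated wrapper: callable from both coroutine and
           * non-coroutine context; takes the graph reader lock itself. */
          int co_wrapper_mixed_bdrv_rdlock
          bdrv_block_status(BlockDriverState *bs, int64_t offset,
                            int64_t bytes, int64_t *pnum, int64_t *map,
                            BlockDriverState **file);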
  * block: rename the bdrv_co_block_status static function | Paolo Bonzini | 2023-10-12 | 1 | -10/+11

      bdrv_block_status exists as a wrapper for bdrv_block_status_above, but
      the name of the (hypothetical) coroutine version,
      bdrv_co_block_status, is squatted by a random static function. Rename
      it to bdrv_co_do_block_status.

      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-ID: <20230904100306.156197-2-pbonzini@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* Merge tag 'pull-shadow-2023-10-12' of https://repo.or.cz/qemu/armbru into staging | Stefan Hajnoczi | 2023-10-16 | 6 | -19/+19

    -Wshadow=local patches for 2023-10-12

    # gpg: Signature made Thu 12 Oct 2023 10:55:23 EDT
    # gpg: using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
    # gpg: issuer "armbru@redhat.com"
    # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [full]
    # gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [full]
    # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867 4E5F 3870 B400 EB91 8653

    * tag 'pull-shadow-2023-10-12' of https://repo.or.cz/qemu/armbru:
      target/i386: fix shadowed variable pasto
      contrib/vhost-user-gpu: Fix compiler warning when compiling with -Wshadow
      hw/virtio/virtio-gpu: Fix compiler warning when compiling with -Wshadow
      libvhost-user: Fix compiler warning with -Wshadow=local
      libvduse: Fix compiler warning with -Wshadow=local

    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  * target/i386: fix shadowed variable pasto | Paolo Bonzini | 2023-10-12 | 1 | -1/+1

      Commit a908985971a ("target/i386/seg_helper: introduce tss_set_busy",
      2023-09-26) failed to use the tss_selector argument of the new
      function, which was therefore unused. This shows up as a #GP fault
      when booting old versions of 32-bit Linux.

      Fixes: a908985971a ("target/i386/seg_helper: introduce tss_set_busy", 2023-09-26)
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-ID: <20231011135350.438492-1-pbonzini@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
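      The shape of the bug, sketched (simplified from
      target/i386/tcg/seg_helper.c; the exact body is an assumption):

          static void tss_set_busy(CPUX86State *env, int tss_selector,
                                   bool value, uintptr_t retaddr)
          {
              /* The pasto kept using env->tr.selector here instead of the
               * tss_selector argument, so the wrong descriptor was marked
               * busy. */
              target_ulong ptr = env->gdt.base + (tss_selector & ~7);
              uint32_t e2 = cpu_ldl_kernel_ra(env, ptr + 4, retaddr);

              if (value) {
                  e2 |= DESC_TSS_BUSY_MASK;
              } else {
                  e2 &= ~DESC_TSS_BUSY_MASK;
              }
              cpu_stl_kernel_ra(env, ptr + 4, e2, retaddr);
          }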
  * contrib/vhost-user-gpu: Fix compiler warning when compiling with -Wshadow | Thomas Huth | 2023-10-12 | 2 | -7/+7

      Rename some variables to avoid compiler warnings when compiling with
      -Wshadow=local.

      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Message-ID: <20231009083726.30301-1-thuth@redhat.com>
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
  * hw/virtio/virtio-gpu: Fix compiler warning when compiling with -Wshadow | Thomas Huth | 2023-10-12 | 1 | -4/+4

      Avoid using trivial variable names in macros, otherwise we get the
      following compiler warning when compiling with -Wshadow=local:

          In file included from ../../qemu/hw/display/virtio-gpu-virgl.c:19:
          ../../home/thuth/devel/qemu/hw/display/virtio-gpu-virgl.c: In function ‘virgl_cmd_submit_3d’:
          ../../qemu/include/hw/virtio/virtio-gpu.h:228:16: error: declaration of ‘s’ shadows a previous local [-Werror=shadow=compatible-local]
            228 |         size_t s;
                |                ^
          ../../qemu/hw/display/virtio-gpu-virgl.c:215:5: note: in expansion of macro ‘VIRTIO_GPU_FILL_CMD’
            215 |     VIRTIO_GPU_FILL_CMD(cs);
                |     ^~~~~~~~~~~~~~~~~~~
          ../../qemu/hw/display/virtio-gpu-virgl.c:213:12: note: shadowed declaration is here
            213 |     size_t s;
                |            ^
          cc1: all warnings being treated as errors

      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Message-ID: <20231009084559.41427-1-thuth@redhat.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
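      The fix pattern, as a self-contained sketch (macro and variable names
      invented for illustration, not QEMU's actual macros):

          #include <stdio.h>

          /* Before: the macro-local 's' shadows the caller's 's'. */
          #define SIZEOF_BAD(x, out) do { size_t s = sizeof(x); (out) = s; } while (0)

          /* After: an unlikely-to-collide name silences -Wshadow=local
           * without changing behavior. */
          #define SIZEOF_OK(x, out) do { size_t sz_ = sizeof(x); (out) = sz_; } while (0)

          int main(void)
          {
              size_t s = 0;      /* caller-level 's', as in virgl_cmd_submit_3d() */
              SIZEOF_OK(int, s); /* no shadowing here */
              printf("%zu\n", s);
              return 0;
          }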
  * libvhost-user: Fix compiler warning with -Wshadow=local | Thomas Huth | 2023-10-12 | 1 | -6/+6

      Rename shadowing variables to make this code compilable with
      -Wshadow=local.

      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Message-ID: <20231006121129.487251-1-thuth@redhat.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
  * libvduse: Fix compiler warning with -Wshadow=local | Thomas Huth | 2023-10-12 | 1 | -1/+1

      No need to declare a new variable with the same name here; we can
      simply reuse the one from the top of the function. With this change,
      the file now compiles fine with -Wshadow=local.

      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Message-ID: <20231006120819.480792-1-thuth@redhat.com>
      Reviewed-by: Xie Yongji <xieyongji@bytedance.com>
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
* Merge tag 'mem-2023-10-12' of https://github.com/davidhildenbrand/qemu into staging | Stefan Hajnoczi | 2023-10-16 | 23 | -113/+839

    Hi,

    "Host Memory Backends" and "Memory devices" queue ("mem"):
    - Support memory devices with multiple memslots
    - Support memory devices that dynamically consume memslots
    - Support memory devices that can automatically decide on the number of
      memslots to use
    - virtio-mem support for exposing memory dynamically via multiple
      memslots
    - Some required cleanups/refactorings

    # gpg: Signature made Thu 12 Oct 2023 09:49:39 EDT
    # gpg: using RSA key 1BD9CAAD735C4C3A460DFCCA4DDE10F700FF835A
    # gpg: issuer "david@redhat.com"
    # gpg: Good signature from "David Hildenbrand <david@redhat.com>" [unknown]
    # gpg:                 aka "David Hildenbrand <davidhildenbrand@gmail.com>" [full]
    # gpg:                 aka "David Hildenbrand <hildenbr@in.tum.de>" [unknown]
    # gpg: WARNING: The key's User ID is not certified with a trusted signature!
    # gpg:          There is no indication that the signature belongs to the owner.
    # Primary key fingerprint: 1BD9 CAAD 735C 4C3A 460D FCCA 4DDE 10F7 00FF 835A

    * tag 'mem-2023-10-12' of https://github.com/davidhildenbrand/qemu:
      virtio-mem: Mark memslot alias memory regions unmergeable
      memory,vhost: Allow for marking memory device memory regions unmergeable
      virtio-mem: Expose device memory dynamically via multiple memslots if enabled
      virtio-mem: Update state to match bitmap as soon as it's been migrated
      virtio-mem: Pass non-const VirtIOMEM via virtio_mem_range_cb
      memory: Clarify mapping requirements for RamDiscardManager
      memory-device,vhost: Support automatic decision on the number of memslots
      vhost: Add vhost_get_max_memslots()
      kvm: Add stub for kvm_get_max_memslots()
      memory-device,vhost: Support memory devices that dynamically consume memslots
      memory-device: Track required and actually used memslots in DeviceMemoryState
      stubs: Rename qmp_memory_device.c to memory_device.c
      memory-device: Support memory devices with multiple memslots
      vhost: Return number of free memslots
      kvm: Return number of free memslots
      softmmu/physmem: Fixup qemu_ram_block_from_host() documentation
      vhost: Remove vhost_backend_can_merge() callback
      vhost: Rework memslot filtering and fix "used_memslot" tracking

    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  * virtio-mem: Mark memslot alias memory regions unmergeable | David Hildenbrand | 2023-10-12 | 1 | -0/+6

      Let's mark the memslot alias memory regions as unmergeable, such that
      flatview and vhost won't merge adjacent memory region aliases and we
      can atomically map/unmap individual aliases without affecting adjacent
      alias memory regions.

      This handles vhost and vfio in multiple-memslot mode correctly (which
      do not support atomic memslot updates) and avoids the temporary
      removal of large memslots, which can be an expensive operation. For
      example, vfio might have to unpin + repin a lot of memory, which is
      undesired.

      Message-ID: <20230926185738.277351-19-david@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
  * memory,vhost: Allow for marking memory device memory regions unmergeable | David Hildenbrand | 2023-10-12 | 3 | -8/+49

      Let's allow for marking memory regions unmergeable, to teach flatview
      code and vhost not to merge adjacent aliases to the same memory region
      into a larger memory section; instead, we want separate aliases to
      stay separate, such that we can atomically map/unmap aliases without
      affecting other aliases.

      This is desired for virtio-mem mapping device memory located on a RAM
      memory region via multiple aliases into a memory region container,
      resulting in separate memslots that can get (un)mapped atomically.

      As an example with virtio-mem, the layout would look something like
      this:

          [...]
          0000000240000000-00000020bfffffff (prio 0, i/o): device-memory
            0000000240000000-000000043fffffff (prio 0, i/o): virtio-mem
              0000000240000000-000000027fffffff (prio 0, ram): alias memslot-0 @mem2 0000000000000000-000000003fffffff
              0000000280000000-00000002bfffffff (prio 0, ram): alias memslot-1 @mem2 0000000040000000-000000007fffffff
              00000002c0000000-00000002ffffffff (prio 0, ram): alias memslot-2 @mem2 0000000080000000-00000000bfffffff
          [...]

      Without unmergeable memory regions, all three memslots would get
      merged into a single memory section. For example, when mapping another
      alias (e.g., virtio-mem-memslot-3) or when unmapping any of the mapped
      aliases, memory listeners would first get notified about the removal
      of the big memory section, and then get notified about the re-adding
      of the new (differently merged) memory section(s).

      In an ideal world, memory listeners would be able to deal with that
      atomically, as KVM nowadays does. However, (a) supporting this for
      other memory listeners (vhost-user, vfio) is fairly hard: temporary
      removal can result in all kinds of issues on concurrent access to
      guest memory; and (b) this handling is undesired, because temporarily
      removing+readding can consume quite some time on bigger memslots and
      is not efficient (e.g., vfio unpinning and repinning pages ...).

      Let's allow for marking a memory region unmergeable, such that we can
      atomically (un)map aliases to the same memory region, similar to
      (un)mapping individual DIMMs.

      Similarly, teach the vhost code not to redo what flatview core stopped
      doing: don't merge such sections. Merging in vhost code is really only
      relevant for handling random holes in boot memory; without this
      merging, the vhost-user backend wouldn't be able to mmap() some boot
      memory backed by hugetlb.

      We'll use this for virtio-mem next.

      Message-ID: <20230926185738.277351-18-david@redhat.com>
      Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
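      Usage of the new flag, sketched (the setter name follows this patch's
      summary; the helper and its call sites are hypothetical):

          static void map_memslot_alias(MemoryRegion *container,
                                        MemoryRegion *backing,
                                        MemoryRegion *alias,
                                        hwaddr offset, uint64_t size)
          {
              memory_region_init_alias(alias, NULL, "virtio-mem-memslot",
                                       backing, offset, size);
              /* Keep this alias as its own FlatView section (and hence its
               * own vhost/KVM memslot), even when adjacent aliases line up
               * contiguously. */
              memory_region_set_unmergeable(alias, true);
              memory_region_add_subregion(container, offset, alias);
          }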
  * virtio-mem: Expose device memory dynamically via multiple memslots if enabled | David Hildenbrand | 2023-10-12 | 3 | -1/+340

      Having large virtio-mem devices that only expose little memory to a VM
      is currently a problem: we map the whole sparse memory region into the
      guest using a single memslot, resulting in one gigantic memslot in
      KVM. KVM allocates metadata for the whole memslot, which can result in
      quite some memory waste.

      Assuming we have a 1 TiB virtio-mem device and only expose little
      (e.g., 1 GiB) memory, we would create a single 1 TiB memslot and KVM
      has to allocate metadata for that 1 TiB memslot: on x86, this implies
      allocating a significant amount of memory for metadata:

      (1) RMAP: 8 bytes per 4 KiB, 8 bytes per 2 MiB, 8 bytes per 1 GiB
          -> For 1 TiB: 2147483648 + 4194304 + 8192 = ~2 GiB (0.2 %)
          With the TDP MMU (cat /sys/module/kvm/parameters/tdp_mmu) this
          gets allocated lazily when required for nested VMs
      (2) gfn_track: 2 bytes per 4 KiB
          -> For 1 TiB: 536870912 = ~512 MiB (0.05 %)
      (3) lpage_info: 4 bytes per 2 MiB, 4 bytes per 1 GiB
          -> For 1 TiB: 2097152 + 4096 = ~2 MiB (0.0002 %)
      (4) 2x dirty bitmaps for tracking: 2x 1 bit per 4 KiB page
          -> For 1 TiB: 536870912 = 64 MiB (0.006 %)

      So we primarily care about (1) and (2). The bad thing is that the
      memory consumption *doubles* once SMM is enabled, because we create
      the memslot once for !SMM and once for SMM.

      Having a 1 TiB memslot without the TDP MMU consumes around:
      * With SMM: 5 GiB
      * Without SMM: 2.5 GiB
      Having a 1 TiB memslot with the TDP MMU consumes around:
      * With SMM: 1 GiB
      * Without SMM: 512 MiB

      ... and that's really something we want to optimize, to be able to
      just start a VM with small boot memory (e.g., 4 GiB) and a virtio-mem
      device that can grow very large (e.g., 1 TiB).

      Consequently, using multiple memslots and only mapping the memslots we
      really need can significantly reduce memory waste and speed up
      memslot-related operations. Let's expose the sparse RAM memory region
      using multiple memslots, mapping only the memslots we currently need
      into our device memory region container.

      The feature can be enabled using "dynamic-memslots=on" and requires
      "unplugged-inaccessible=on", which is nowadays the default. Once
      enabled, we'll auto-detect the number of memslots to use based on the
      memslot limit provided by the core. We'll use at most 1 memslot per
      gigabyte. Note that our global limit of memslots across all memory
      devices is currently set to 256: even with multiple large virtio-mem
      devices, we'd still have a sane limit on the number of memslots used.

      The default is to not dynamically map memslots for now
      ("dynamic-memslots=off"). The optimization must be enabled manually,
      because some vhost setups (e.g., hotplug of vhost-user devices) might
      be problematic until we support more memslots especially in vhost-user
      backends. Note that "dynamic-memslots=on" is just a hint that multiple
      memslots *may* be used for internal optimizations, not that multiple
      memslots *must* be used. The actual number of memslots that are used
      is an internal detail: for example, once memslot metadata is no longer
      an issue, we could simply stop optimizing for that. Migration source
      and destination can differ on the setting of "dynamic-memslots".

      Message-ID: <20230926185738.277351-17-david@redhat.com>
      Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
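      Once merged, the feature is a plain device property; a hedged
      command-line sketch (sizes and IDs illustrative):

          qemu-system-x86_64 -m 4G,maxmem=1028G \
              -object memory-backend-ram,id=mem0,size=1T,reserve=off \
              -device virtio-mem-pci,id=vmem0,memdev=mem0,dynamic-memslots=on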
  * virtio-mem: Update state to match bitmap as soon as it's been migrated | David Hildenbrand | 2023-10-12 | 1 | -5/+21

      It's cleaner and future-proof to just have other state that depends on
      the bitmap state be updated as soon as possible when restoring the
      bitmap. So factor out informing the RamDiscardListener into a function
      and call it in case of early migration right after we restored the
      bitmap.

      Message-ID: <20230926185738.277351-16-david@redhat.com>
      Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
  * virtio-mem: Pass non-const VirtIOMEM via virtio_mem_range_cb | David Hildenbrand | 2023-10-12 | 1 | -5/+5

      Let's prepare for a user that has to modify the VirtIOMEM device
      state.

      Message-ID: <20230926185738.277351-15-david@redhat.com>
      Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
  * memory: Clarify mapping requirements for RamDiscardManager | David Hildenbrand | 2023-10-12 | 2 | -4/+5

      We really only care about the RAM memory region not being mapped into
      an address space yet as long as we're still setting up the
      RamDiscardManager. Once mapped into an address space, memory notifiers
      would get notified about such a region and any attempts to modify the
      RamDiscardManager would be wrong.

      While "mapped into an address space" is easy to check for RAM regions
      that are mapped directly (following the ->container links), it's
      harder to check when such regions are mapped indirectly via aliases.
      For now, we can only detect that a region is mapped through an alias
      (->mapped_via_alias), but we don't have a handle on these aliases to
      follow all their ->container links to test if they are eventually
      mapped into an address space.

      So relax the assertion in memory_region_set_ram_discard_manager(),
      remove the check in memory_region_get_ram_discard_manager() and
      clarify the doc.

      Message-ID: <20230926185738.277351-14-david@redhat.com>
      Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
  * memory-device,vhost: Support automatic decision on the number of memslots | David Hildenbrand | 2023-10-12 | 5 | -4/+147

      We want to support memory devices that can automatically decide how
      many memslots they will use. In the worst case, they have to use a
      single memslot. The target use cases are virtio-mem and the hyper-v
      balloon.

      Let's calculate a reasonable limit such a memory device may use, and
      instruct the device to make a decision based on that limit. Use a
      simple heuristic that considers:
      * a memslot soft-limit of 256 across all memory devices, so we don't
        consume too many memslots -- which could harm performance;
      * the actually still free and unreserved memslots;
      * the percentage of the remaining device memory region that the
        memory device will occupy.
      (A paraphrased sketch of this heuristic follows this message.)

      Further, while we properly check before plugging a memory device
      whether there are still free memslots, we have other memslot consumers
      (such as boot memory and PCI BARs) that don't perform any checks and
      might dynamically consume memslots without any prior reservation. So
      we might succeed in plugging a memory device, but once we dynamically
      map a PCI BAR we would be in trouble. Doing accounting / reservation /
      checks for all such users is problematic (e.g., sometimes we might
      temporarily split boot memory into two memslots, triggered by the
      BIOS).

      We use the historic magic memslot number of 509 as orientation: when
      supporting 256 memory devices -> memslots (leaving 253 for boot memory
      and other devices), it has been proven to work reliably. We'll fall
      back to suggesting a single memslot if we don't have at least 509
      total memslots.

      Plugging vhost devices with fewer than 509 memslots available, while
      we have memory devices plugged that consume multiple memslots due to
      automatic decisions, can be problematic. Most configurations might
      just fail due to "limit < used + reserved"; however, it can also
      happen that these memory devices would suddenly consume memslots that
      would actually be required by other memslot consumers (boot, PCI BARs)
      later. Note that this has always been sketchy with vhost devices that
      support only a small number of memslots, but we don't want to make it
      any worse. So let's keep it simple and simply reject plugging such
      vhost devices in such a configuration.

      Eventually, all vhost devices that want to be fully compatible with
      such memory devices should support a decent number of memslots
      (>= 509).

      Message-ID: <20230926185738.277351-13-david@redhat.com>
      Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
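      A paraphrase of the described heuristic as a C sketch (constants come
      from the message above; the function name and parameters are
      assumptions, not the actual QEMU code):

          #include <stdint.h>

          #define MEMORY_DEVICES_SOFT_MEMSLOT_LIMIT 256
          #define MEMSLOTS_SAFE_TOTAL               509 /* historic magic number */

          static unsigned int auto_memslots(unsigned int total_memslots,
                                            unsigned int free_memslots,
                                            uint64_t device_size,
                                            uint64_t remaining_region_size)
          {
              unsigned int slots;

              if (total_memslots < MEMSLOTS_SAFE_TOTAL) {
                  return 1; /* fall back to a single memslot */
              }
              /* Respect the soft limit and what is actually still free... */
              slots = free_memslots < MEMORY_DEVICES_SOFT_MEMSLOT_LIMIT
                      ? free_memslots : MEMORY_DEVICES_SOFT_MEMSLOT_LIMIT;
              /* ...scaled by the fraction of the remaining device memory
               * region that this device will occupy. */
              slots = (unsigned int)(slots * device_size / remaining_region_size);
              return slots ? slots : 1;
          }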
  * vhost: Add vhost_get_max_memslots() | David Hildenbrand | 2023-10-12 | 3 | -0/+17

      Let's add vhost_get_max_memslots().

      Message-ID: <20230926185738.277351-12-david@redhat.com>
      Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>