focaccia-qemu - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Age	Files	Lines
*	sphinx/qapidoc: Drop code to generate doc for simple union tag	Markus Armbruster	2024-02-12	1	-6/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	QAPISchemaGenRSTVisitor._nodes_for_members() has a special case to auto-generate documentation for a union tag member of implicit (enum) type that lacks documentation. This was useful for simple unions, where the tag member's type was implicitly. The only implicit enum type left today is 'QType'. Not worth a special case. Drop. No change to generated documentation. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240205074709.3613229-6-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
*	qapi: Indent tagged doc comment sections properly	Markus Armbruster	2024-02-12	5	-40/+42
\| \| \| \| \| \| \| \| \| \| \| \|	docs/devel/qapi-code-gen demands that the "second and subsequent lines of sections other than "Example"/"Examples" should be indented". Commit a937b6aa739q (qapi: Reformat doc comments to conform to current conventions) missed a few instances, and messed up a few others. Clean that up. Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240205074709.3613229-5-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
*	qapi/block-core: Fix BlockLatencyHistogramInfo doc markup	Markus Armbruster	2024-02-12	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The description of @bins ends with a literal block: # @bins: list of io request counts corresponding to histogram # intervals, one more element than @boundaries has. For the # example above, @bins may be something like [3, 1, 5, 2], and # corresponding histogram looks like: # # :: # # 5\| * Except it actually ends before the block: the unindented '::' line starts a new section. Makes no sense. We could fix this by indenting the '::' line. Instead, double the colon at the end of the preceding paragraph, and drop the '::' line. This shifts the box for the literal block right in generated documentation, so it lines up with the description. Fixes: commit a0fcff383b34 (qapi: Use rST markup for literal blocks) Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240205074709.3613229-4-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
*	docs/devel/qapi-code-gen: Tweak doc comment whitespace	Markus Armbruster	2024-02-12	1	-2/+3
\| \| \| \| \| \| \| \| \|	Missed in commit a937b6aa739 (qapi: Reformat doc comments to conform to current conventions). Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240205074709.3613229-3-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
*	docs/devel/qapi-code-gen: Normalize version refs x.y.0 to just x.y	Markus Armbruster	2024-02-12	1	-2/+2
\| \| \| \| \| \| \| \| \|	Missed in commit 9bc6e893b72 (qapi: Normalize version references x.y.0 to just x.y). Signed-off-by: Markus Armbruster <armbru@redhat.com> Message-ID: <20240205074709.3613229-2-armbru@redhat.com> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
*	Merge tag 'migration-staging-pull-request' of https://gitlab.com/peterx/qemu ↵	Peter Maydell	2024-02-09	12	-396/+547
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	into staging Migration pull - William's fix on hwpoison migration which used to crash QEMU - Peter's multifd cleanup + bugfix + optimizations - Avihai's fix on multifd crash over non-socket channels - Fabiano's multifd thread-race fix - Peter's CI fix series # -----BEGIN PGP SIGNATURE----- # # iIgEABYKADAWIQS5GE3CDMRX2s990ak7X8zN86vXBgUCZcREtRIccGV0ZXJ4QHJl # ZGhhdC5jb20ACgkQO1/MzfOr1wacrwEAl2aeQkh51h/e+OKX7MG4/4Y6Edf6Oz7o # IJLk/cyrUFQA/2exo2lOdv5zHNOJKwAYj8HYDraezrC/MK1eED4Wji0M # =k53l # -----END PGP SIGNATURE----- # gpg: Signature made Thu 08 Feb 2024 03:04:21 GMT # gpg: using EDDSA key B9184DC20CC457DACF7DD1A93B5FCCCDF3ABD706 # gpg: issuer "peterx@redhat.com" # gpg: Good signature from "Peter Xu <xzpeter@gmail.com>" [marginal] # gpg: aka "Peter Xu <peterx@redhat.com>" [marginal] # gpg: WARNING: This key is not certified with sufficiently trusted signatures! # gpg: It is not certain that the signature belongs to the owner. # Primary key fingerprint: B918 4DC2 0CC4 57DA CF7D D1A9 3B5F CCCD F3AB D706 * tag 'migration-staging-pull-request' of https://gitlab.com/peterx/qemu: (34 commits) ci: Update comment for migration-compat-aarch64 ci: Remove tag dependency for build-previous-qemu tests/migration-test: Stick with gicv3 in aarch64 test migration/multifd: Add a synchronization point for channel creation migration/multifd: Unify multifd and TLS connection paths migration/multifd: Move multifd_send_setup into migration thread migration/multifd: Move multifd_send_setup error handling in to the function migration/multifd: Remove p->running migration/multifd: Join the TLS thread migration: Fix logic of channels and transport compatibility check migration/multifd: Optimize sender side to be lockless migration/multifd: Fix MultiFDSendParams.packet_num race migration/multifd: Stick with send/recv on function names migration/multifd: Cleanup multifd_load_cleanup() migration/multifd: Cleanup multifd_save_cleanup() migration/multifd: Rewrite multifd_queue_page() migration/multifd: Change retval of multifd_send_pages() migration/multifd: Change retval of multifd_queue_page() migration/multifd: Split multifd_send_terminate_threads() migration/multifd: Forbid spurious wakeups ... Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
\| *	ci: Update comment for migration-compat-aarch64	Peter Xu	2024-02-07	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It turns out that we may not be able to enable this test even for the upcoming v9.0. Document what we're still missing. Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Link: https://lore.kernel.org/r/20240207005403.242235-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	ci: Remove tag dependency for build-previous-qemu	Peter Xu	2024-02-07	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new build-previous-qemu job relies on QEMU release tag being present, while that may not be always true for personal git repositories since by default tag is not pushed. The job can fail on those CI kicks, as reported by Peter Maydell. Fix it by fetching the tags remotely from the official repository, as suggested by Dan. [1] https://lore.kernel.org/r/ZcC9ScKJ7VvqektA@redhat.com Reported-by: Peter Maydell <peter.maydell@linaro.org> Suggested-by: "Daniel P. Berrangé" <berrange@redhat.com> Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Link: https://lore.kernel.org/r/20240207005403.242235-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	tests/migration-test: Stick with gicv3 in aarch64 test	Peter Xu	2024-02-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Recently we introduced cross-binary migration test. It's always wanted that migration-test uses stable guest ABI for both QEMU binaries in this case, so that both QEMU binaries will be compatible on the migration stream with the cmdline specified. Switch to a static gic version "3" rather than using version "max", so that GIC should be stable now across any future QEMU binaries for migration-test. Here the version can actually be anything as long as the ABI is stable. We choose "3" because it's the majority of what we already use in QEMU while still new enough: "git grep gic-version=3" shows 6 hit, while version 4 has no direct user yet besides "max". Note that even with this change, aarch64 won't be able to work yet with migration cross binary test, but then the only missing piece will be the stable CPU model. Reviewed-by: "Daniel P. Berrangé" <berrange@redhat.com> Link: https://lore.kernel.org/r/20240207005403.242235-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Add a synchronization point for channel creation	Fabiano Rosas	2024-02-07	1	-6/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is possible that one of the multifd channels fails to be created at multifd_new_send_channel_async() while the rest of the channel creation tasks are still in flight. This could lead to multifd_save_cleanup() executing the qemu_thread_join() loop too early and not waiting for the threads which haven't been created yet, leading to the freeing of resources that the newly created threads will try to access and crash. Add a synchronization point after which there will be no attempts at thread creation and therefore calling multifd_save_cleanup() past that point will ensure it properly waits for the threads. A note about performance: Prior to this patch, if a channel took too long to be established, other channels could finish connecting first and already start taking load. Now we're bounded by the slowest-connecting channel. Reported-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-7-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Unify multifd and TLS connection paths	Fabiano Rosas	2024-02-07	1	-43/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During multifd channel creation (multifd_send_new_channel_async) when TLS is enabled, the multifd_channel_connect function is called twice, once to create the TLS handshake thread and another time after the asynchrounous TLS handshake has finished. This creates a slightly confusing call stack where multifd_channel_connect() is called more times than the number of channels. It also splits error handling between the two callers of multifd_channel_connect() causing some code duplication. Lastly, it gets in the way of having a single point to determine whether all channel creation tasks have been initiated. Refactor the code to move the reentrancy one level up at the multifd_new_send_channel_async() level, de-duplicating the error handling and allowing for the next patch to introduce a synchronization point common to all the multifd channel creation, regardless of TLS. Note that the previous code would never fail once p->c had been set. This patch changes this assumption, which affects refcounting, so add comments around object_unref to explain the situation. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-6-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Move multifd_send_setup into migration thread	Fabiano Rosas	2024-02-07	1	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently have an unfavorable situation around multifd channels creation and the migration thread execution. We create the multifd channels with qio_channel_socket_connect_async -> qio_task_run_in_thread, but only connect them at the multifd_new_send_channel_async callback, called from qio_task_complete, which is registered as a glib event. So at multifd_send_setup() we create the channels, but they will only be actually usable after the whole multifd_send_setup() calling stack returns back to the main loop. Which means that the migration thread is already up and running without any possibility for the multifd channels to be ready on time. We currently rely on the channels-ready semaphore blocking multifd_send_sync_main() until channels start to come up and release it. However there have been bugs recently found when a channel's creation fails and multifd_send_cleanup() is allowed to run while other channels are still being created. Let's start to organize this situation by moving the multifd_send_setup() call into the migration thread. That way we unblock the main-loop to dispatch the completion callbacks and actually have a chance of getting the multifd channels ready for when the migration thread needs them. The next patches will deal with the synchronization aspects. Note that this takes multifd_send_setup() out of the BQL. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-5-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Move multifd_send_setup error handling in to the function	Fabiano Rosas	2024-02-07	3	-13/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Hide the error handling inside multifd_send_setup to make it cleaner for the next patch to move the function around. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-4-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Remove p->running	Fabiano Rosas	2024-02-07	2	-20/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently only need p->running to avoid calling qemu_thread_join() on a non existent thread if the thread has never been created. However, there are at least two bugs in this logic: 1) On the sending side, p->running is set too early and qemu_thread_create() can be skipped due to an error during TLS handshake, leaving the flag set and leading to a crash when multifd_send_cleanup() calls qemu_thread_join(). 2) During exit, the multifd thread clears the flag while holding the channel lock. The counterpart at multifd_send_cleanup() reads the flag outside of the lock and might free the mutex while the multifd thread still has it locked. Fix the first issue by setting the flag right before creating the thread. Rename it from p->running to p->thread_created to clarify its usage. Fix the second issue by not clearing the flag at the multifd thread exit. We don't have any use for that. Note that these bugs are straight-forward logic issues and not race conditions. There is still a gap for races to affect this code due to multifd_send_cleanup() being allowed to run concurrently with the thread creation loop. This issue is solved in the next patches. Cc: qemu-stable <qemu-stable@nongnu.org> Fixes: 29647140157a ("migration/tls: add support for multifd tls-handshake") Reported-by: Avihai Horon <avihaih@nvidia.com> Reported-by: chenyuhui5@huawei.com Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-3-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Join the TLS thread	Fabiano Rosas	2024-02-07	2	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We're currently leaking the resources of the TLS thread by not joining it and also overwriting the p->thread pointer altogether. Fixes: a1af605bd5 ("migration/multifd: fix hangup with TLS-Multifd due to blocking handshake") Cc: qemu-stable <qemu-stable@nongnu.org> Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240206215118.6171-2-farosas@suse.de Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration: Fix logic of channels and transport compatibility check	Avihai Horon	2024-02-07	1	-6/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The commit in the fixes line mistakenly modified the channels and transport compatibility check logic so it now checks multi-channel support only for socket transport type. Thus, running multifd migration using a transport other than socket that is incompatible with multi-channels (such as "exec") would lead to a segmentation fault instead of an error message. For example: (qemu) migrate_set_capability multifd on (qemu) migrate -d "exec:cat > /tmp/vm_state" Segmentation fault (core dumped) Fix it by checking multi-channel compatibility for all transport types. Cc: qemu-stable <qemu-stable@nongnu.org> Fixes: d95533e1cdcc ("migration: modify migration_channels_and_uri_compatible() for new QAPI syntax") Signed-off-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Peter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20240125162528.7552-2-avihaih@nvidia.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Optimize sender side to be lockless	Peter Xu	2024-02-06	2	-27/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When reviewing my attempt to refactor send_prepare(), Fabiano suggested we try out with dropping the mutex in multifd code [1]. I thought about that before but I never tried to change the code. Now maybe it's time to give it a stab. This only optimizes the sender side. The trick here is multifd has a clear provider/consumer model, that the migration main thread publishes requests (either pending_job/pending_sync), while the multifd sender threads are consumers. Here we don't have a lot of complicated data sharing, and the jobs can logically be submitted lockless. Arm the code with atomic weapons. Two things worth mentioning: - For multifd_send_pages(): we can use qatomic_load_acquire() when trying to find a free channel, but that's expensive if we attach one ACQUIRE per channel. Instead, keep the qatomic_read() on reading the pending_job flag as we do already, meanwhile use one smp_mb_acquire() after the loop to guarantee the memory ordering. - For pending_sync: it doesn't have any extra data required since now p->flags are never touched, it should be safe to not use memory barrier. That's different from pending_job. Provide rich comments for all the lockless operations to state how they are paired. With that, we can remove the mutex. [1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de Suggested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-24-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Fix MultiFDSendParams.packet_num race	Peter Xu	2024-02-05	2	-24/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As reported correctly by Fabiano [1] (while per Fabiano, it sourced back to Elena's initial report in Oct 2023), MultiFDSendParams.packet_num is buggy to be assigned and stored. Consider two consequent operations of: (1) queue a job into multifd send thread X, then (2) queue another sync request to the same send thread X. Then the MultiFDSendParams.packet_num will be assigned twice, and the first assignment can get lost already. To avoid that, we move the packet_num assignment from p->packet_num into where the thread will fill in the packet. Use atomic operations to protect the field, making sure there's no race. Note that atomic fetch_add() may not be good for scaling purposes, however multifd should be fine as number of threads should normally not go beyond 16 threads. Let's leave that concern for later but fix the issue first. There's also a trick on how to make it always work even on 32 bit hosts for uint64_t packet number. Switching to uintptr_t as of now to simply the case. It will cause packet number to overflow easier on 32 bit, but that shouldn't be a major concern for now as 32 bit systems is not the major audience for any performance concerns like what multifd wants to address. We also need to move multifd_send_state definition upper, so that multifd_send_fill_packet() can reference it. [1] https://lore.kernel.org/r/87o7d1jlu5.fsf@suse.de Reported-by: Elena Ufimtseva <elena.ufimtseva@oracle.com> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-23-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Stick with send/recv on function names	Peter Xu	2024-02-05	3	-16/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most of the multifd code uses send/recv to represent the two sides, but some rare cases use save/load. Since send/recv is the majority, replacing the save/load use cases to use send/recv globally. Now we reach a consensus on the naming. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-22-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Cleanup multifd_load_cleanup()	Peter Xu	2024-02-05	1	-22/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use similar logic to cleanup the recv side. Note that multifd_recv_terminate_threads() may need some similar rework like the sender side, but let's leave that for later. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-21-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Cleanup multifd_save_cleanup()	Peter Xu	2024-02-05	1	-32/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Shrink the function by moving relevant works into helpers: move the thread join()s into multifd_send_terminate_threads(), then create two more helpers to cover channel/state cleanups. Add a TODO entry for the thread terminate process because p->running is still buggy. We need to fix it at some point but not yet covered. Suggested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-20-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Rewrite multifd_queue_page()	Peter Xu	2024-02-05	1	-19/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current multifd_queue_page() is not easy to read and follow. It is not good with a few reasons: - No helper at all to show what exactly does a condition mean; in short, readability is low. - Rely on pages->ramblock being cleared to detect an empty queue. It's slightly an overload of the ramblock pointer, per Fabiano [1], which I also agree. - Contains a self recursion, even if not necessary.. Rewrite this function. We add some comments to make it even clearer on what it does. [1] https://lore.kernel.org/r/87wmrpjzew.fsf@suse.de Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-19-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Change retval of multifd_send_pages()	Peter Xu	2024-02-05	1	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using int is an overkill when there're only two options. Change it to a boolean. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-18-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Change retval of multifd_queue_page()	Peter Xu	2024-02-05	3	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using int is an overkill when there're only two options. Change it to a boolean. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-17-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Split multifd_send_terminate_threads()	Peter Xu	2024-02-05	2	-10/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Split multifd_send_terminate_threads() into two functions: - multifd_send_set_error(): used when an error happened on the sender side, set error and quit state only - multifd_send_terminate_threads(): used only by the main thread to kick all multifd send threads out of sleep, for the last recycling. Use multifd_send_set_error() in the three old call sites where only the error will be set. Use multifd_send_terminate_threads() in the last one where the main thread will kick the multifd threads at last in multifd_save_cleanup(). Both helpers will need to set quitting=1. Suggested-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-16-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Forbid spurious wakeups	Peter Xu	2024-02-05	1	-4/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now multifd's logic is designed to have no spurious wakeup. I still remember a talk to Juan and he seems to agree we should drop it now, and if my memory was right it was there because multifd used to hit that when still debugging. Let's drop it and see what can explode; as long as it's not reaching soft-freeze. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-15-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Move header prepare/fill into send_prepare()	Peter Xu	2024-02-05	4	-33/+37
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch redefines the interfacing of ->send_prepare(). It further simplifies multifd_send_thread() especially on zero copy. Now with the new interface, we require the hook to do all the work for preparing the IOVs to send. After it's completed, the IOVs should be ready to be dumped into the specific multifd QIOChannel later. So now the API looks like: p->pages -----------> send_prepare() -------------> IOVs This also prepares for the case where the input can be extended to even not any p->pages. But that's for later. This patch will achieve similar goal of what Fabiano used to propose here: https://lore.kernel.org/r/20240126221943.26628-1-farosas@suse.de However the send() interface may not be necessary. I'm boldly attaching a "Co-developed-by" for Fabiano. Co-developed-by: Fabiano Rosas <farosas@suse.de> Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-14-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: multifd_send_prepare_header()	Peter Xu	2024-02-05	2	-8/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce a helper multifd_send_prepare_header() to setup the header packet for multifd sender. It's fine to setup the IOV[0] _before_ send_prepare() because the packet buffer is already ready, even if the content is to be filled in. With this helper, we can already slightly clean up the zero copy path. Note that I explicitly put it into multifd.h, because I want it inlined directly into multifd*.c where necessary later. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-13-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Move trace_multifd_send\|recv()	Peter Xu	2024-02-05	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move them into fill/unfill of packets. With that, we can further cleanup the send/recv thread procedure, and remove one more temp var. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-12-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Move total_normal_pages accounting	Peter Xu	2024-02-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Just like the previous patch, move the accounting for total_normal_pages on both src/dst sides into the packet fill/unfill procedures. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-11-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Rename p->num_packets and clean it up	Peter Xu	2024-02-05	2	-11/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This field, no matter whether on src or dest, is only used for debugging purpose. They can even be removed already, unless it still more or less provide some accounting on "how many packets are sent/recved for this thread". The other more important one is called packet_num, which is embeded in the multifd packet headers (MultiFDPacket_t). So let's keep them for now, but make them much easier to understand, by doing below: - Rename both of them to packets_sent / packets_recved, the old name (num_packets) are waaay too confusing when we already have MultiFDPacket_t.packets_num. - Avoid worrying on the "initial packet": we know we will send it, that's good enough. The accounting won't matter a great deal to start with 0 or with 1. - Move them to where we send/recv the packets. They're: - multifd_send_fill_packet() for senders. - multifd_recv_unfill_packet() for receivers. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-10-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Drop pages->num check in sender thread	Peter Xu	2024-02-05	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now with a split SYNC handler, we always have pages->num set for pending_job==true. Assert it instead. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-9-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Simplify locking in sender thread	Peter Xu	2024-02-05	1	-7/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The sender thread will yield the p->mutex before IO starts, trying to not block the requester thread. This may be unnecessary lock optimizations, because the requester can already read pending_job safely even without the lock, because the requester is currently the only one who can assign a task. Drop that lock complication on both sides: (1) in the sender thread, always take the mutex until job done (2) in the requester thread, check pending_job clear lockless Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-8-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Separate SYNC request with normal jobs	Peter Xu	2024-02-05	2	-16/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Multifd provide a threaded model for processing jobs. On sender side, there can be two kinds of job: (1) a list of pages to send, or (2) a sync request. The sync request is a very special kind of job. It never contains a page array, but only a multifd packet telling the dest side to synchronize with sent pages. Before this patch, both requests use the pending_job field, no matter what the request is, it will boost pending_job, while multifd sender thread will decrement it after it finishes one job. However this should be racy, because SYNC is special in that it needs to set p->flags with MULTIFD_FLAG_SYNC, showing that this is a sync request. Consider a sequence of operations where: - migration thread enqueue a job to send some pages, pending_job++ (0->1) - [...before the selected multifd sender thread wakes up...] - migration thread enqueue another job to sync, pending_job++ (1->2), setup p->flags=MULTIFD_FLAG_SYNC - multifd sender thread wakes up, found pending_job==2 - send the 1st packet with MULTIFD_FLAG_SYNC and list of pages - send the 2nd packet with flags==0 and no pages This is not expected, because MULTIFD_FLAG_SYNC should hopefully be done after all the pages are received. Meanwhile, the 2nd packet will be completely useless, which contains zero information. I didn't verify above, but I think this issue is still benign in that at least on the recv side we always receive pages before handling MULTIFD_FLAG_SYNC. However that's not always guaranteed and just tricky. One other reason I want to separate it is using p->flags to communicate between the two threads is also not clearly defined, it's very hard to read and understand why accessing p->flags is always safe; see the current impl of multifd_send_thread() where we tried to cache only p->flags. It doesn't need to be that complicated. This patch introduces pending_sync, a separate flag just to show that the requester needs a sync. Alongside, we remove the tricky caching of p->flags now because after this patch p->flags should only be used by multifd sender thread now, which will be crystal clear. So it is always thread safe to access p->flags. With that, we can also safely convert the pending_job into a boolean, because we don't support >1 pending jobs anyway. Always use atomic ops to access both flags to make sure no cache effect. When at it, drop the initial setting of "pending_job = 0" because it's always allocated using g_new0(). Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-7-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Drop MultiFDSendParams.normal[] array	Peter Xu	2024-02-05	4	-30/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This array is redundant when p->pages exists. Now we extended the life of p->pages to the whole period where pending_job is set, it should be safe to always use p->pages->offset[] rather than p->normal[]. Drop the array. Alongside, the normal_num is also redundant, which is the same to p->pages->num. This doesn't apply to recv side, because there's no extra buffering on recv side, so p->normal[] array is still needed. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-6-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Postpone reset of MultiFDPages_t	Peter Xu	2024-02-05	1	-4/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now we reset MultiFDPages_t object in the multifd sender thread in the middle of the sending job. That's not necessary, because the "pages" struct will not be reused anyway until pending_job is cleared. Move that to the end after the job is completed, provide a helper to reset a "pages" object. Use that same helper when free the object too. This prepares us to keep using p->pages in the follow up patches, where we may drop p->normal[]. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-5-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Drop MultiFDSendParams.quit, cleanup error paths	Peter Xu	2024-02-05	2	-54/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Multifd send side has two fields to indicate error quits: - MultiFDSendParams.quit - &multifd_send_state->exiting Merge them into the global one. The replacement is done by changing all p->quit checks into the global var check. The global check doesn't need any lock. A few more things done on top of this altogether: - multifd_send_terminate_threads() Moving the xchg() of &multifd_send_state->exiting upper, so as to cover the tracepoint, migrate_set_error() and migrate_set_state(). - multifd_send_sync_main() In the 2nd loop, add one more check over the global var to make sure we don't keep the looping if QEMU already decided to quit. - multifd_tls_outgoing_handshake() Use multifd_send_terminate_threads() to set the error state. That has a benefit of updating MigrationState.error to that error too, so we can persist that 1st error we hit in that specific channel. - multifd_new_send_channel_async() Take similar approach like above, drop the migrate_set_error() because multifd_send_terminate_threads() already covers that. Unwrap the helper multifd_new_send_channel_cleanup() along the way; not really needed. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-4-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: multifd_send_kick_main()	Peter Xu	2024-02-05	1	-6/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a multifd sender thread hit errors, it always needs to kick the main thread by kicking all the semaphores that it can be waiting upon. Provide a helper for it and deduplicate the code. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-3-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration/multifd: Drop stale comment for multifd zero copy	Peter Xu	2024-02-05	1	-11/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We've already done that with multifd_flush_after_each_section, for multifd in general. Drop the stale "TODO-like" comment. Reviewed-by: Fabiano Rosas <farosas@suse.de> Link: https://lore.kernel.org/r/20240202102857.110210-2-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com>
\| *	migration: prevent migration when VM has poisoned memory	William Roche	2024-02-05	4	-0/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A memory page poisoned from the hypervisor level is no longer readable. The migration of a VM will crash Qemu when it tries to read the memory address space and stumbles on the poisoned page with a similar stack trace: Program terminated with signal SIGBUS, Bus error. #0 _mm256_loadu_si256 #1 buffer_zero_avx2 #2 select_accel_fn #3 buffer_is_zero #4 save_zero_page #5 ram_save_target_page_legacy #6 ram_save_host_page #7 ram_find_and_save_block #8 ram_save_iterate #9 qemu_savevm_state_iterate #10 migration_iteration_run #11 migration_thread #12 qemu_thread_start To avoid this VM crash during the migration, prevent the migration when a known hardware poison exists on the VM. Signed-off-by: William Roche <william.roche@oracle.com> Link: https://lore.kernel.org/r/20240130190640.139364-2-william.roche@oracle.com Signed-off-by: Peter Xu <peterx@redhat.com>
* \|	Merge tag 'for-upstream' of https://repo.or.cz/qemu/kevin into staging	Peter Maydell	2024-02-08	13	-158/+236
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Block layer patches - Allow concurrent BB context changes - virtio: Re-enable notifications after drain - virtio-blk: Fix missing use of irqfd - scsi: Don't ignore most usb-storage properties - blkio: Respect memory-alignment for bounce buffer allocations - iotests tmpdir fixes - virtio-blk: Code cleanups # -----BEGIN PGP SIGNATURE----- # # iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmXEkwgRHGt3b2xmQHJl # ZGhhdC5jb20ACgkQfwmycsiPL9Y3jA//TmSBVqHljauyImYOgCt8qCXACttV0xhQ # Q5ldUNx/JmIFMoUR7OlpVAL2MtvdwE0jjY+sDlEmWtz4IFJcCsCTUCHZZb8blreb # +mnMkqrQ6Nb3tPR2jeIknrXqNy1ffyjZItktjWXVcl6jaHB8YabHHqszs9DIaf4n # lcKovBKxula8ckMgvm48wCwTtS7VEPeuC5FrOqUqTtuhg+QKp5ZVoyVFHtf6GKTD # iuXzCd4yxu4fDKAthJJj4N1bQaOmCKU7K9N/665wj9P2TyfmwlBAfNwNAlYbdX1E # Sv7eSioQs2+oUxmfD/PUsF7wTYtDCrSAUFn1kP/XdRyXPJR3dHGiBKV9w9CaWNrU # y8rqOhxVcuoBLRljTF32BK4HniAREjRngtpT2FnQQIyedZrXIwyTAWjs+LW12T6O # NMiU603Nl9ZYhO1et2+qspsVpNIfEpQWpK+OCon6E+ggj1ea+pfqU30VPx4JU05I # VLiydluIbehSkRlTHgFcTgApmx843OGW7CvWfRyen86Cexgx3DEjJUQ4/bYqaCha # yLIi91rToSDmtlzJrg9eYiMs5Y6vz+ORvvX5im1RlbUUb7Kx/LaA4BU/uArEbBt8 # xXm/grO4hFUGqtLgd2LIjWaHSsLoW4jKeEiExFUUfvH5DG9Zl5HmzFwu+DYxX+im # MJLLetDJAWI= # =8tc0 # -----END PGP SIGNATURE----- # gpg: Signature made Thu 08 Feb 2024 08:38:32 GMT # gpg: using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6 # gpg: issuer "kwolf@redhat.com" # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full] # Primary key fingerprint: DC3D EB15 9A9A F95D 3D74 56FE 7F09 B272 C88F 2FD6 * tag 'for-upstream' of https://repo.or.cz/qemu/kevin: virtio-blk: avoid using ioeventfd state in irqfd conditional virtio-blk: Use ioeventfd_attach in start_ioeventfd virtio: Re-enable notifications after drain virtio-scsi: Attach event vq notifier with no_poll blkio: Respect memory-alignment for bounce buffer allocations scsi: Don't ignore most usb-storage properties virtio-blk: do not use C99 mixed declarations iotests: give tempdir an identifying name iotests: fix leak of tmpdir in dry-run mode scsi: Await request purging block-backend: Allow concurrent context changes monitor: use aio_co_reschedule_self() virtio-blk: declare VirtIOBlock::rq with a type virtio-blk: add vq_rq[] bounds check in virtio_blk_dma_restart_cb() virtio-blk: clarify that there is at least 1 virtqueue virtio-blk: enforce iothread-vq-mapping validation Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
\| * \|	virtio-blk: avoid using ioeventfd state in irqfd conditional	Stefan Hajnoczi	2024-02-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Requests that complete in an IOThread use irqfd to notify the guest while requests that complete in the main loop thread use the traditional qdev irq code path. The reason for this conditional is that the irq code path requires the BQL: if (s->ioeventfd_started && !s->ioeventfd_disabled) { virtio_notify_irqfd(vdev, req->vq); } else { virtio_notify(vdev, req->vq); } There is a corner case where the conditional invokes the irq code path instead of the irqfd code path: static void virtio_blk_stop_ioeventfd(VirtIODevice vdev) { ... / * Set ->ioeventfd_started to false before draining so that host notifiers * are not detached/attached anymore. / s->ioeventfd_started = false; / Wait for virtio_blk_dma_restart_bh() and in flight I/O to complete */ blk_drain(s->conf.conf.blk); During blk_drain() the conditional produces the wrong result because ioeventfd_started is false. Use qemu_in_iothread() instead of checking the ioeventfd state. Cc: qemu-stable@nongnu.org Buglink: https://issues.redhat.com/browse/RHEL-15394 Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20240122172625.415386-1-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
\| * \|	virtio-blk: Use ioeventfd_attach in start_ioeventfd	Hanna Czenczek	2024-02-07	1	-11/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit d3f6f294aeadd5f88caf0155e4360808c95b3146 ("virtio-blk: always set ioeventfd during startup") has made virtio_blk_start_ioeventfd() always kick the virtqueue (set the ioeventfd), regardless of whether the BB is drained. That is no longer necessary, because attaching the host notifier will now set the ioeventfd, too; this happens either immediately right here in virtio_blk_start_ioeventfd(), or later when the drain ends, in virtio_blk_ioeventfd_attach(). With event_notifier_set() removed, the code becomes the same as the one in virtio_blk_ioeventfd_attach(), so we can reuse that function. Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-ID: <20240202153158.788922-4-hreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
\| * \|	virtio: Re-enable notifications after drain	Hanna Czenczek	2024-02-07	2	-1/+48
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During drain, we do not care about virtqueue notifications, which is why we remove the handlers on it. When removing those handlers, whether vq notifications are enabled or not depends on whether we were in polling mode or not; if not, they are enabled (by default); if so, they have been disabled by the io_poll_start callback. Because we do not care about those notifications after removing the handlers, this is fine. However, we have to explicitly ensure they are enabled when re-attaching the handlers, so we will resume receiving notifications. We do this in virtio_queue_aio_attach_host_notifier*(). If such a function is called while we are in a polling section, attaching the notifiers will then invoke the io_poll_start callback, re-disabling notifications. Because we will always miss virtqueue updates in the drained section, we also need to poll the virtqueue once after attaching the notifiers. Buglink: https://issues.redhat.com/browse/RHEL-3934 Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-ID: <20240202153158.788922-3-hreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
\| * \|	virtio-scsi: Attach event vq notifier with no_poll	Hanna Czenczek	2024-02-07	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As of commit 38738f7dbbda90fbc161757b7f4be35b52205552 ("virtio-scsi: don't waste CPU polling the event virtqueue"), we only attach an io_read notifier for the virtio-scsi event virtqueue instead, and no polling notifiers. During operation, the event virtqueue is typically non-empty, but none of the buffers are intended to be used immediately. Instead, they only get used when certain events occur. Therefore, it makes no sense to continuously poll it when non-empty, because it is supposed to be and stay non-empty. We do this by using virtio_queue_aio_attach_host_notifier_no_poll() instead of virtio_queue_aio_attach_host_notifier() for the event virtqueue. Commit 766aa2de0f29b657148e04599320d771c36fd126 ("virtio-scsi: implement BlockDevOps->drained_begin()") however has virtio_scsi_drained_end() use virtio_queue_aio_attach_host_notifier() for all virtqueues, including the event virtqueue. This can lead to it being polled again, undoing the benefit of commit 38738f7dbbda90fbc161757b7f4be35b52205552. Fix it by using virtio_queue_aio_attach_host_notifier_no_poll() for the event virtqueue. Reported-by: Fiona Ebner <f.ebner@proxmox.com> Fixes: 766aa2de0f29b657148e04599320d771c36fd126 ("virtio-scsi: implement BlockDevOps->drained_begin()") Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Tested-by: Fiona Ebner <f.ebner@proxmox.com> Reviewed-by: Fiona Ebner <f.ebner@proxmox.com> Signed-off-by: Hanna Czenczek <hreitz@redhat.com> Message-ID: <20240202153158.788922-2-hreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
\| * \|	blkio: Respect memory-alignment for bounce buffer allocations	Kevin Wolf	2024-02-07	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	blkio_alloc_mem_region() requires that the requested buffer size is a multiple of the memory-alignment property. If it isn't, the allocation fails with a return value of -EINVAL. Fix the call in blkio_resize_bounce_pool() to make sure the requested size is properly aligned. I observed this problem with vhost-vdpa, which requires page aligned memory. As the virtio-blk device behind it still had 512 byte blocks, we got bs->bl.request_alignment = 512, but actually any request that needed a bounce buffer and was not aligned to 4k would fail without this fix. Suggested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20240131173140.42398-1-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
\| * \|	scsi: Don't ignore most usb-storage properties	Kevin Wolf	2024-02-07	3	-28/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	usb-storage is for the most part just a wrapper around an internally created scsi-disk device. It uses DEFINE_BLOCK_PROPERTIES() to offer all of the usual block device properties to the user, but then only forwards a few select properties to the internal device while the rest is silently ignored. This changes scsi_bus_legacy_add_drive() to accept a whole BlockConf instead of some individual values inside of it so that usb-storage can now pass the whole configuration to the internal scsi-disk. This enables the remaining block device properties, e.g. logical/physical_block_size or discard_granularity. Buglink: https://issues.redhat.com/browse/RHEL-22375 Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-ID: <20240131130607.24117-1-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
\| * \|	virtio-blk: do not use C99 mixed declarations	Stefan Hajnoczi	2024-02-07	1	-7/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	QEMU's coding style generally forbids C99 mixed declarations. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-ID: <20240206140410.65650-1-stefanha@redhat.com> Reviewed-by: Hanna Czenczek <hreitz@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
\| * \|	iotests: give tempdir an identifying name	Daniel P. Berrangé	2024-02-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If something goes wrong causing the iotests not to cleanup their temporary directory, it is useful if the dir had an identifying name to show what is to blame. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20240205155158.1843304-1-berrange@redhat.com> Revieved-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
\| * \|	iotests: fix leak of tmpdir in dry-run mode	Daniel P. Berrangé	2024-02-07	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Creating an instance of the 'TestEnv' class will create a temporary directory. This dir is only deleted, however, in the __exit__ handler invoked by a context manager. In dry-run mode, we don't use the TestEnv via a context manager, so were leaking the temporary directory. Since meson invokes 'check' 5 times on each configure run, developers /tmp was filling up with empty temporary directories. Signed-off-by: Daniel P. Berrangé <berrange@redhat.com> Message-ID: <20240205154019.1841037-1-berrange@redhat.com> Reviewed-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>