focaccia-qemu - Unnamed repository; edit this file 'description' to name the repository.

	Commit message (Collapse)	Author	Files	Lines
2018-06-21	hmp: Restrict auto-complete in preconfig	Dr. David Alan Gilbert	1	-2/+7
	Don't show the commands that aren't available. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20180620153947.30834-4-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-21	hmp: Allow help on preconfig commands	Dr. David Alan Gilbert	2	-1/+8
	Allow the 'help' command in preconfig state but make it only list the preconfig commands. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Peter Xu <peterx@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20180620153947.30834-3-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-21	hmp: Add flag for preconfig commands	Dr. David Alan Gilbert	1	-0/+20
	Add a flag to command definitions to allow them to be used in preconfig and check it. If users try to use commands that aren't available, tell them to use the exit_preconfig comand we're adding in a few patches. Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Igor Mammedov <imammedo@redhat.com> Message-Id: <20180620153947.30834-2-dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-21	hmp-commands: use long for begin and length in dump-guest-memory	Suraj Jitindar Singh	1	-1/+1
	The dump-guest-memory command is used to dump an area of guest memory to a file, the piece of memory is specified by a begin address and a length. These parameters are specified as ints and thus have a maximum value of 4GB. This means you can't dump the guest memory past the first 4GB and instead get: (qemu) dump-guest-memory tmp 0x100000000 0x100000000 'dump-guest-memory' has failed: integer is for 32-bit values Try "help dump-guest-memory" for more information This limitation is imposed in monitor_parse_arguments() since they are both ints. hmp_dump_guest_memory() uses 64 bit quantities to store both the begin and length values. Thus specify begin and length as long so that the entire guest memory space can be dumped. Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com> Message-Id: <20180620003202.10546-1-sjitindarsingh@gmail.com> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-21	monitor: report entirety of hmp command on error	Collin Walling	1	-2/+6
	When a user incorrectly provides an hmp command, an error response will be printed that prompts the user to try "help <command name>". However, when the command contains multiple parts e.g. "info uuid xyz", only the last whitespace delimited string will be reported (in this example "info" will be dropped and the message will read "Try "help uuid" for more information", which is incorrect). Let's correct this by capturing the entirety of the command from the command line -- excluding any extraneous characters. Reported-by: Mikhail Fokin <fokin@de.ibm.com> Signed-off-by: Collin Walling <walling@linux.ibm.com> Message-Id: <ee680f5e-ac9a-479d-f65e-9f8ae9cfe5d4@linux.ibm.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
2018-06-18	Update OpenBIOS images to 8fe6f5f96f built from submodule.	Mark Cave-Ayland	4	-0/+0
	Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2018-06-18	iotests: Add test for active mirroring	Max Reitz	3	-0/+126
	Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180613181823.13618-15-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18	block/mirror: Add copy mode QAPI interface	Max Reitz	4	-9/+27
	This patch allows the user to specify whether to use active or only background mode for mirror block jobs. Currently, this setting will remain constant for the duration of the entire block job. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180613181823.13618-14-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18	block/mirror: Add active mirroring	Max Reitz	2	-5/+265
	This patch implements active synchronous mirroring. In active mode, the passive mechanism will still be in place and is used to copy all initially dirty clusters off the source disk; but every write request will write data both to the source and the target disk, so the source cannot be dirtied faster than data is mirrored to the target. Also, once the block job has converged (BLOCK_JOB_READY sent), source and target are guaranteed to stay in sync (unless an error occurs). Active mode is completely optional and currently disabled at runtime. A later patch will add a way for users to enable it. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180613181823.13618-13-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18	job: Add job_progress_increase_remaining()	Max Reitz	2	-0/+20
	Signed-off-by: Max Reitz <mreitz@redhat.com> Message-id: 20180613181823.13618-12-mreitz@redhat.com Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18	block/mirror: Add MirrorBDSOpaque	Max Reitz	1	-0/+12
	This will allow us to access the block job data when the mirror block driver becomes more complex. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180613181823.13618-11-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18	block/dirty-bitmap: Add bdrv_dirty_iter_next_area	Max Reitz	2	-0/+57
	This new function allows to look for a consecutively dirty area in a dirty bitmap. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180613181823.13618-10-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18	test-hbitmap: Add non-advancing iter_next tests	Max Reitz	1	-12/+24
	Add a function that wraps hbitmap_iter_next() and always calls it in non-advancing mode first, and in advancing mode next. The result should always be the same. By using this function everywhere we called hbitmap_iter_next() before, we should get good test coverage for non-advancing hbitmap_iter_next(). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180613181823.13618-9-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18	hbitmap: Add @advance param to hbitmap_iter_next()	Max Reitz	5	-19/+26
	This new parameter allows the caller to just query the next dirty position without moving the iterator. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Message-id: 20180613181823.13618-8-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18	block: Generalize should_update_child() rule	Max Reitz	1	-10/+34
	Currently, bdrv_replace_node() refuses to create loops from one BDS to itself if the BDS to be replaced is the backing node of the BDS to replace it: Say there is a node A and a node B. Replacing B by A means making all references to B point to A. If B is a child of A (i.e. A has a reference to B), that would mean we would have to make this reference point to A itself -- so we'd create a loop. bdrv_replace_node() (through should_update_child()) refuses to do so if B is the backing node of A. There is no reason why we should create loops if B is not the backing node of A, though. The BDS graph should never contain loops, so we should always refuse to create them. If B is a child of A and B is to be replaced by A, we should simply leave B in place there because it is the most sensible choice. A more specific argument would be: Putting filter drivers into the BDS graph is basically the same as appending an overlay to a backing chain. But the main child BDS of a filter driver is not "backing" but "file", so restricting the no-loop rule to backing nodes would fail here. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180613181823.13618-7-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18	block/mirror: Use source as a BdrvChild	Max Reitz	1	-8/+6
	With this, the mirror_top_bs is no longer just a technically required node in the BDS graph but actually represents the block job operation. Also, drop MirrorBlockJob.source, as we can reach it through mirror_top_bs->backing. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180613181823.13618-6-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18	block/mirror: Wait for in-flight op conflicts	Max Reitz	1	-18/+84
	This patch makes the mirror code differentiate between simply waiting for any operation to complete (mirror_wait_for_free_in_flight_slot()) and specifically waiting for all operations touching a certain range of the virtual disk to complete (mirror_wait_on_conflicts()). Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180613181823.13618-5-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18	block/mirror: Use CoQueue to wait on in-flight ops	Max Reitz	1	-11/+23
	Attach a CoQueue to each in-flight operation so if we need to wait for any we can use it to wait instead of just blindly yielding and hoping for some operation to wake us. A later patch will use this infrastructure to allow requests accessing the same area of the virtual disk to specifically wait for each other. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180613181823.13618-4-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18	block/mirror: Convert to coroutines	Max Reitz	1	-62/+90
	In order to talk to the source BDS (and maybe in the future to the target BDS as well) directly, we need to convert our existing AIO requests into coroutine I/O requests. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 20180613181823.13618-3-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18	block/mirror: Pull out mirror_perform()	Max Reitz	1	-22/+29
	When converting mirror's I/O to coroutines, we are going to need a point where these coroutines are created. mirror_perform() is going to be that point. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Alberto Garcia <berto@igalia.com> Message-id: 20180613181823.13618-2-mreitz@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>
2018-06-18	monitor: add lock to protect mon_fdsets	Peter Xu	3	-12/+45
	Introduce a new global big lock for mon_fdsets. Take it where needed. The monitor_fdset_get_fd() handling is a bit tricky: now we need to call qemu_mutex_unlock() which might pollute errno, so we need to make sure the correct errno be passed up to the callers. To make things simpler, we let monitor_fdset_get_fd() return the -errno directly when error happens, then in qemu_open() we move it back into errno. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180608035511.7439-8-peterx@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-06-18	monitor: move init global earlier	Peter Xu	1	-6/+1
	Before this patch, monitor fd helpers might be called even earlier than monitor_init_globals(). This can be problematic. After previous work, now monitor_init_globals() does not depend on accelerator initialization any more. Call it earlier (before CLI parsing; that's where the monitor APIs might be called) to make sure it is called before any of the monitor APIs. Suggested-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180608035511.7439-7-peterx@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-06-18	monitor: remove event_clock_type	Peter Xu	1	-9/+16
	Instead, use a dynamic function to detect which clock we'll use. The problem is that the old code will let monitor initialization depend on configure_accelerator() (that's where qtest_enabled() start to take effect). After this change, we don't have such a dependency any more. We just need to make sure configure_accelerator() is called when we start to use it. Now it's only used in monitor_qapi_event_queue() and monitor_qapi_event_handler(), so we're good. Suggested-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180608035511.7439-6-peterx@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> [monitor_get_event_clock() name and comment tweaked] Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-06-18	monitor: fix comment for monitor_lock	Peter Xu	1	-4/+3
	Fix typo in d622cb5879c. Meanwhile move these variables close to each other. monitor_qapi_event_state can be declared static, add that. Reported-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180608035511.7439-5-peterx@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-06-18	monitor: more comments on lock-free elements	Peter Xu	1	-1/+10
	Add some explicit comments for both Readline and cpu_set/cpu_get helpers that they do not need the mon_lock protection. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180608035511.7439-4-peterx@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-06-18	monitor: protect mon->fds with mon_lock	Peter Xu	1	-4/+18
	mon->fds were protected by BQL. Now protect it by mon_lock so that it can even be used in monitor iothread. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180608035511.7439-3-peterx@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-06-18	monitor: rename out_lock to mon_lock	Peter Xu	1	-24/+29
	The out_lock is protecting a few Monitor fields. In the future the monitor code will start to run in multiple threads. We are going to turn it into a bigger lock to protect not only the out buffer but also most of the rest. Since at it, rearrange the Monitor struct a bit. Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Peter Xu <peterx@redhat.com> Message-Id: <20180608035511.7439-2-peterx@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>
2018-06-18	pc-bios/s390-ccw: Update the s390-netboot.img binary	Thomas Huth	1	-0/+0
	This binary now contains the support for pxelinux.cfg-style network booting. Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-06-18	pc-bios/s390-ccw: Optimize the s390-netboot.img for size	Thomas Huth	2	-2/+3
	The -O2 optimization flag is passed via CFLAGS to the firmware Makefile, but in netbook.mak, we've got some rules that only use QEMU_CFLAGS for compiling the libc and libnet from SLOF, so these files get compiled without optimization so far. Use CFLAGS here, too, to create faster and smaller code. We can additionally save some more bytes in the firmware images by compi- ling the code with -fno-asynchronous-unwind-tables. This will omit some ELF sections (used for stack unwinding for example) from the image that we do not need in the firmware. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-06-18	pc-bios/s390-ccw/net: Try to load pxelinux.cfg file accoring to the UUID	Thomas Huth	1	-1/+55
	With the STSI instruction, we can get the UUID of the current VM instance, so we can support loading pxelinux config files via UUID in the file name, too. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-06-18	pc-bios/s390-ccw/net: Add support for pxelinux-style config files	Thomas Huth	2	-4/+89
	Since it is quite cumbersome to manually create a combined kernel with initrd image for network booting, we now support loading via pxelinux configuration files, too. In these files, the kernel, initrd and command line parameters can be specified seperately, and the firmware then takes care of glueing everything together in memory after the files have been downloaded. See this URL for details about the config file layout: https://www.syslinux.org/wiki/index.php?title=PXELINUX The user can either specify a config file directly as bootfile via DHCP (but in this case, the file has to start either with "default" or a "#" comment so we can distinguish it from binary kernels), or a folder (i.e. the bootfile name must end with "/") where the firmware should look for the typical pxelinux.cfg file names, e.g. based on MAC or IP address. We also support the pxelinux.cfg DHCP options 209 and 210 from RFC 5071. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-06-18	pc-bios/s390-ccw/net: Update code for the latest changes in SLOF	Thomas Huth	2	-70/+18
	The ip_version information now has to be stored in the filename_ip_t structure, and there is now a common function called tftp_get_error_info() which can be used to get the error string for a TFTP error code. We can also get rid of some superfluous "(char *)" casts now. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Viktor Mihajlovski <mihajlov@linux.vnet.ibm.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-06-18	roms: Update SLOF submodule to current status	Thomas Huth	1	-0/+0
	We need the latest version of SLOF's libnet for adding pxelinux.cfg support in the s390-ccw bios, too. Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-06-18	pc-bios/s390-ccw: define loadparm length	Collin Walling	4	-7/+9
	Loadparm is defined by the s390 architecture to be 8 bytes in length. Let's define this size in the s390-ccw bios. Suggested-by: Laszlo Ersek <lersek@redhat.com> Signed-off-by: Collin Walling <walling@linux.ibm.com> Reviewed-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-06-18	block: fix QEMU crash with scsi-hd and drive_del	Greg Kurz	1	-0/+5
	Removing a drive with drive_del while it is being used to run an I/O intensive workload can cause QEMU to crash. An AIO flush can yield at some point: blk_aio_flush_entry() blk_co_flush(blk) bdrv_co_flush(blk->root->bs) ... qemu_coroutine_yield() and let the HMP command to run, free blk->root and give control back to the AIO flush: hmp_drive_del() blk_remove_bs() bdrv_root_unref_child(blk->root) child_bs = blk->root->bs bdrv_detach_child(blk->root) bdrv_replace_child(blk->root, NULL) blk->root->bs = NULL g_free(blk->root) <============== blk->root becomes stale bdrv_unref(child_bs) bdrv_delete(child_bs) bdrv_close() bdrv_drained_begin() bdrv_do_drained_begin() bdrv_drain_recurse() aio_poll() ... qemu_coroutine_switch() and the AIO flush completion ends up dereferencing blk->root: blk_aio_complete() scsi_aio_complete() blk_get_aio_context(blk) bs = blk_bs(blk) ie, bs = blk->root ? blk->root->bs : NULL ^^^^^ stale The problem is that we should avoid making block driver graph changes while we have in-flight requests. Let's drain all I/O for this BB before calling bdrv_root_unref_child(). Signed-off-by: Greg Kurz <groug@kaod.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18	test-bdrv-drain: Test graph changes in drain_all section	Kevin Wolf	1	-2/+73
	This tests both adding and remove a node between bdrv_drain_all_begin() and bdrv_drain_all_end(), and enabled the existing detach test for drain_all. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18	block: Allow graph changes in bdrv_drain_all_begin/end sections	Kevin Wolf	4	-17/+79
	bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and did a subtree drain for each of them. This works fine as long as the graph is static, but sadly, reality looks different. If the graph changes so that root nodes are added or removed, we would have to compensate for this. bdrv_next() returns each root node only once even if it's the root node for multiple BlockBackends or for a monitor-owned block driver tree, which would only complicate things. The much easier and more obviously correct way is to fundamentally change the way the functions work: Iterate over all BlockDriverStates, no matter who owns them, and drain them individually. Compensation is only necessary when a new BDS is created inside a drain_all section. Removal of a BDS doesn't require any action because it's gone afterwards anyway. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18	block: ignore_bds_parents parameter for drain functions	Kevin Wolf	5	-44/+78
	In the future, bdrv_drained_all_begin/end() will drain all invidiual nodes separately rather than whole subtrees. This means that we don't want to propagate the drain to all parents any more: If the parent is a BDS, it will already be drained separately. Recursing to all parents is unnecessary work and would make it an O(n²) operation. Prepare the drain function for the changed drain_all by adding an ignore_bds_parents parameter to the internal implementation that prevents the propagation of the drain to BDS parents. We still (have to) propagate it to non-BDS parents like BlockBackends or Jobs because those are not drained separately. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18	block: Move bdrv_drain_all_begin() out of coroutine context	Kevin Wolf	1	-5/+17
	Before we can introduce a single polling loop for all nodes in bdrv_drain_all_begin(), we must make sure to run it outside of coroutine context like we already do for bdrv_do_drained_begin(). Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18	block: Allow AIO_WAIT_WHILE with NULL ctx	Kevin Wolf	1	-4/+9
	bdrv_drain_all() wants to have a single polling loop for draining the in-flight requests of all nodes. This means that the AIO_WAIT_WHILE() condition relies on activity in multiple AioContexts, which is polled from the mainloop context. We must therefore call AIO_WAIT_WHILE() from the mainloop thread and use the AioWait notification mechanism. Just randomly picking the AioContext of any non-mainloop thread would work, but instead of bothering to find such a context in the caller, we can just as well accept NULL for ctx. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18	test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll	Kevin Wolf	1	-14/+88
	This adds a test case that goes wrong if bdrv_drain_invoke() calls aio_poll(). Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18	block: Defer .bdrv_drain_begin callback to polling phase	Kevin Wolf	1	-5/+23
	We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're done with propagating the drain through the graph and are doing the single final BDRV_POLL_WHILE(). Just schedule the coroutine with the callback and increase bs->in_flight to make sure that the polling phase will wait for it. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18	test-bdrv-drain: Graph change through parent callback	Kevin Wolf	1	-0/+130
	Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18	block: Don't poll in parent drain callbacks	Kevin Wolf	3	-9/+26
	bdrv_do_drained_begin() is only safe if we have a single BDRV_POLL_WHILE() after quiescing all affected nodes. We cannot allow that parent callbacks introduce a nested polling loop that could cause graph changes while we're traversing the graph. Split off bdrv_do_drained_begin_quiesce(), which only quiesces a single node without waiting for its requests to complete. These requests will be waited for in the BDRV_POLL_WHILE() call down the call chain. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18	test-bdrv-drain: Test node deletion in subtree recursion	Kevin Wolf	1	-9/+29
	If bdrv_do_drained_begin() polls during its subtree recursion, the graph can change and mess up the bs->children iteration. Test that this doesn't happen. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18	block: Drain recursively with a single BDRV_POLL_WHILE()	Kevin Wolf	3	-22/+52
	Anything can happen inside BDRV_POLL_WHILE(), including graph changes that may interfere with its callers (e.g. child list iteration in recursive callers of bdrv_do_drained_begin). Switch to a single BDRV_POLL_WHILE() call for the whole subtree at the end of bdrv_do_drained_begin() to avoid such effects. The recursion happens now inside the loop condition. As the graph can only change between bdrv_drain_poll() calls, but not inside of it, doing the recursion here is safe. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18	test-bdrv-drain: Add test for node deletion	Max Reitz	1	-0/+169
	This patch adds two bdrv-drain tests for what happens if some BDS goes away during the drainage. The basic idea is that you have a parent BDS with some child nodes. Then, you drain one of the children. Because of that, the party who actually owns the parent decides to (A) delete it, or (B) detach all its children from it -- both while the child is still being drained. A real-world case where this can happen is the mirror block job, which may exit if you drain one of its children. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18	block: Remove bdrv_drain_recurse()	Kevin Wolf	1	-33/+3
	For bdrv_drain(), recursively waiting for child node requests is pointless because we didn't quiesce their parents, so new requests could come in anyway. Letting the function work only on a single node makes it more consistent. For subtree drains and drain_all, we already have the recursion in bdrv_do_drained_begin(), so the extra recursion doesn't add anything either. Remove the useless code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2018-06-18	block: Really pause block jobs on drain	Kevin Wolf	8	-14/+107
	We already requested that block jobs be paused in .bdrv_drained_begin, but no guarantee was made that the job was actually inactive at the point where bdrv_drained_begin() returned. This introduces a new callback BdrvChildRole.bdrv_drained_poll() and uses it to make bdrv_drain_poll() consider block jobs using the node to be drained. For the test case to work as expected, we have to switch from block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even considered active and must be waited for when draining the node. Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2018-06-18	block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE()	Kevin Wolf	2	-15/+18
	Commit 91af091f923 added an additional aio_poll() to BDRV_POLL_WHILE() in order to make sure that all pending BHs are executed on drain. This was the wrong place to make the fix, as it is useless overhead for all other users of the macro and unnecessarily complicates the mechanism. This patch effectively reverts said commit (the context has changed a bit and the code has moved to AIO_WAIT_WHILE()) and instead polls in the loop condition for drain. The effect is probably hard to measure in any real-world use case because actual I/O will dominate, but if I run only the initialisation part of 'qemu-img convert' where it calls bdrv_block_status() for the whole image to find out how much data there is copy, this phase actually needs only roughly half the time after this patch. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>