summary refs log tree commit diff stats
path: root/hw/core/qdev-properties-system.c (unfollow)
Commit message (Collapse)AuthorFilesLines
2018-09-26tests/vm: Use -cpu max rather than -cpu hostPeter Maydell1-2/+1
-cpu max works with any accelerator, so we don't need to use it only conditionally if not using KVM. Just use it all the time. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Message-Id: <20180820155554.23476-1-peter.maydell@linaro.org> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-09-26aio-posix: do skip system call if ctx->notifier polling succeedsPaolo Bonzini1-3/+4
Commit 70232b5253 ("aio-posix: Don't count ctx->notifier as progress when 2018-08-15), by not reporting progress, causes aio_poll to execute the system call when polling succeeds because of ctx->notifier. This introduces latency before the call to aio_bh_poll() and negates the advantages of polling, unfortunately. The fix builds on the previous patch, separating the effect of polling on the timeout from the progress reported to aio_poll(). ctx->notifier does zero the timeout, causing the caller to skip the system call, but it does not report progress, so that the bug fix of commit 70232b5253 still stands. Fixes: 70232b5253a3c4e03ed1ac47ef9246a8ac66c6fa Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180912171040.1732-4-pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-09-26aio-posix: compute timeout before pollingPaolo Bonzini2-27/+36
This is a preparation for the next patch, and also a very small optimization. Compute the timeout only once, before invoking try_poll_mode, and adjust it in run_poll_handlers. The adjustment is the polling time when polling fails, or zero (non-blocking) if polling succeeds. Fixes: 70232b5253a3c4e03ed1ac47ef9246a8ac66c6fa Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180912171040.1732-3-pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-09-26aio-posix: fix concurrent access to poll_disable_cntPaolo Bonzini1-11/+15
It is valid for an aio_set_fd_handler to happen concurrently with aio_poll. In that case, poll_disable_cnt can change under the heels of aio_poll, and the assertion on poll_disable_cnt can fail in run_poll_handlers. Therefore, this patch simply checks the counter on every polling iteration. There are no particular needs for ordering, since the polling loop is terminated anyway by aio_notify at the end of aio_set_fd_handler. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20180912171040.1732-2-pbonzini@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com>
2018-09-25linux-user: do setrlimit selectivelyMax Filippov1-1/+15
setrlimit guest calls that affect memory resources (RLIMIT_{AS,DATA,STACK}) may interfere with QEMU internal memory management. They may result in QEMU lockup because mprotect call in page_unprotect would fail with ENOMEM error code, causing infinite loop of SIGSEGV. E.g. it happens when running libstdc++ testsuite for xtensa target on x86_64 host. Don't call host setrlimit for memory-related resources. Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Message-Id: <20180917181314.22551-1-jcmvbkbc@gmail.com> [lv: rebase on master] Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2018-09-25linux-user: write(fd, NULL, 0) parity with linux's treatment of sameTony Garnock-Jones1-0/+3
Bring linux-user write(2) handling into line with linux for the case of a 0-byte write with a NULL buffer. Based on a patch originally written by Zhuowei Zhang. Addresses https://bugs.launchpad.net/qemu/+bug/1716292. >From Zhuowei Zhang's patch (https://lists.gnu.org/archive/html/qemu-devel/2017-09/msg08073.html): Linux returns success for the special case of calling write with a zero-length NULL buffer: compiling and running int main() { ssize_t ret = write(STDOUT_FILENO, NULL, 0); fprintf(stderr, "write returned %ld\n", ret); return 0; } gives "write returned 0" when run directly, but "write returned -1" in QEMU. This commit checks for this situation and returns success if found. Subsequent discussion raised the following questions (and my answers): - Q. Should TARGET_NR_read pass through to safe_read in this situation too? A. I'm wary of changing unrelated code to the specific problem I'm addressing. TARGET_NR_read is already consistent with Linux for this case. - Q. Do pread64/pwrite64 need to be changed similarly? A. Experiment suggests not: both linux and linux-user yield -1 for NULL 0-length reads/writes. Signed-off-by: Tony Garnock-Jones <tonygarnockjones@gmail.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20180908182205.GB409@mornington.dcs.gla.ac.uk> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2018-09-25linux-user: elf: mmap all the target-pages of hostpage for data segmentShivaprasad G Bhat1-3/+7
If the hostpage size is greater than the TARGET_PAGESIZE, the target-pages of size TARGET_PAGESIZE are marked valid only till the length requested during the elfload. The glibc attempts to consume unused space in the last page of data segment(__libc_memalign() in elf/dl-minimal.c). If PT_LOAD p_align is greater than or equal to hostpage size, the GLRO(dl_pagesize) is actually the host pagesize as set in the auxillary vectors. So, there is no explicit mmap request for the remaining target-pages on the last hostpage. The glibc assumes that particular space as available and subsequent attempts to use those addresses lead to crash as the target_mmap has not marked them valid for those target-pages. The issue is seen when trying to chroot to 16.04-x86_64 ubuntu on a PPC64 host where the fork fails to access the thread_id as it is allocated on a page not marked valid. The recent glibc doesn't have checks for thread-id in fork, but the issue can manifest somewhere else, none the less. The fix here is to map all the target-pages of the hostpage during the elfload if the p_align is greater than or equal to hostpage size, for data segment to allow the glibc for proper consumption. Signed-off-by: Shivaprasad G Bhat <sbhat@linux.vnet.ibm.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <153553435604.51992.5640085189104207249.stgit@lep8c.aus.stglabs.ibm.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2018-09-25linux-user: add SO_LINGER to {g,s}etsockoptCarlo Marcelo Arenas Belón2-1/+56
Original implementation for setsockopt by Chen Gang[1]; all bugs mine, including removing assignment for optname which hopefully makes the logic easier to follow and moving some variables to make the code more selfcontained. [1] http://patchwork.ozlabs.org/patch/565659/ Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Co-Authored-By: Chen Gang <gang.chen.5i5j@gmail.com> Reviewed-by: Laurent Vivier <laurent@vivier.eu> Message-Id: <20180824085601.6259-1-carenas@gmail.com> Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2018-09-25linux-user: move TargetFdTrans functions to their own fileLaurent Vivier4-1448/+1508
This will ease to move out syscall functions from syscall.c Signed-off-by: Laurent Vivier <laurent@vivier.eu> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Message-Id: <20180823222215.13781-1-laurent@vivier.eu>
2018-09-25Revert "check: Move VMXNET3 test to common"Thomas Huth1-2/+2
This reverts commit 7a066770f53c198014add869696427f81d67e9c2. The patch did not work as expected: The vmxnet3 test is currently not run at all anymore. Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-09-25Revert "check: Move endianess test to common"Thomas Huth1-1/+13
This reverts commit 669cc7100065c690cb7b4f3da5cfc471d1ed4740. The patch did not work as expected: The endianess test is currently not run at all anymore. Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-09-25Revert "check: Move wdt_ib700 test to common"Thomas Huth1-2/+2
This reverts commit ee1f6c812b3240420dff07a3860060b7d4abfe09. The patch did not work as expected: The wdt_ib700 test is currently not run at all anymore. Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-09-25tests/migration: Speed up the test on ppc64Thomas Huth1-3/+3
The SLOF boot process is always quite slow ... but we can speed it up a little bit by specifying "-nodefaults" and by using the "nvramrc" variable instead of "boot-command" (since "nvramrc" is evaluated earlier in the SLOF boot process than "boot-command"). Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Reviewed-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-09-25hw/qdev-core: Fix description of instance_initThomas Huth1-2/+3
The part of the documentation of DeviceClass that talks about instance_init is partly wrong: instance_init() functions must not abort or exit, since the function is also called during introspection of the device already. So if a device calls exit() during its instance_init() function, QEMU terminates unexpectedly if somebody tries to just have a look at the interfaces from the device with "device_add xyz,help" or with the "device-list-properties" QOM command. This should never happen. Reviewed-by: Andreas Färber <afaerber@suse.de> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-09-25qdev: fix a typo in commentLi Qiang1-1/+1
Found by reading code. Signed-off-by: Li Qiang <liq3ea@gmail.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-09-25docs: Fix some typos (most found by codespell)Stefan Weil3-4/+4
Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-09-25trivial: Make bios files and source files non-executableThomas Huth6-0/+0
These files can not be executed on the host, so they should not be marked as executable. Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-09-25memfd: fix possible usage of the uninitialized file descriptorDima Stepanov1-0/+1
The qemu_memfd_alloc_check() routine allocates the fd variable on stack. This variable is initialized inside the qemu_memfd_alloc() function. There are several cases when *fd will be left unintialized which can lead to the unexpected close() in the qemu_memfd_free() call. Set file descriptor to -1 before calling the qemu_memfd_alloc routine. Signed-off-by: Dima Stepanov <dimastep@yandex-team.ru> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Reviewed-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-09-25hw/core/machine: Officially deprecate the enforce-config-section parameterThomas Huth2-0/+8
Commit 16f7244842b5135543ef068a1adafd94c6965953 added this parameter to the documentation, including a note that it is deprecated. But it has never been added to the "Deprecated features" appendix, which is our official way to deprecate legacy parameters. So let's do this now. Reviewed-by: Peter Xu <peterx@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-09-25net/slirp: Deprecate the [hub_id name] parameter tupleThomas Huth2-0/+9
The "name" in the [hub_id name] parameter tuple is the same as a "netdev_id" (which should be unique), so specifying the hub_id here is just redundant (it was likely just necessary in the past when the network subsystem was still using "vlans" only and when it did not use unique "id"s yet). Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-09-25net: Deprecate the "name" parameter of -netThomas Huth2-0/+9
In early times, network backends were specified by a "vlan" and "name" tuple. With the introduction of netdevs, the "name" was replaced by an "id" (which is supposed to be unique), but the "name" parameter stayed as an alias which could be used instead of "id". Unfortunately, we miss the duplication check for "name": $ qemu-system-x86_64 -net user,name=n1 -net user,name=n1 ... starts without an error, while "id" correctly complains: $ qemu-system-x86_64 -net user,id=n1 -net user,id=n1 qemu-system-x86_64: -net user,id=n1: Duplicate ID 'n1' for net Instead of trying to fix the code for the legacy "name" parameter, let's rather get rid of this old interface and deprecate the "name" parameter now - this will also be less confusing for the users in the long run. Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com>
2018-09-25Makefile: Add missing dependency for qemu-deprecated.texiThomas Huth1-1/+1
Make sure that the docs get correctly regenerated when the file qemu-deprecated.texi has been changed. Fixes: 44c67847e32c91a6071fb0440c357b9489f08bc6 Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Thomas Huth <thuth@redhat.com> (cherry picked from commit f99ce85279178385f204a52236f855c879c29cdc)
2018-09-25target/arm: Start AArch32 CPUs with EL2 but not EL3 in Hyp modePeter Maydell1-2/+12
The ARMv8 architecture defines that an AArch32 CPU starts in SVC mode, unless EL2 is the highest available EL, in which case it starts in Hyp mode. (In ARMv7 a CPU with EL2 but not EL3 was not a valid configuration, but we don't specifically reject this if the user asks for one.) Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Message-id: 20180823135047.16525-1-peter.maydell@linaro.org
2018-09-25aspeed/smc: fix some alignment issuesCédric Le Goater1-4/+4
Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20180921161939.822-6-clg@kaod.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-25hw/arm/aspeed: Add an Aspeed machine classCédric Le Goater2-142/+116
The code looks better, it removes duplicated lines and it will ease the introduction of common properties for the Aspeed machines. Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20180921161939.822-4-clg@kaod.org Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-25hw/arm/aspeed: change the FMC flash model of the AST2500 evbCédric Le Goater1-1/+1
The AST2500 evb is shipped with a W25Q256 which has a non volatile bit to make the chip operate in 4 Byte address mode at power up. This should be an interesting feature to model as it will exercise a bit more the SMC controllers and MMIO execution at boot time. Signed-off-by: Cédric Le Goater <clg@kaod.org> Message-id: 20180921161939.822-3-clg@kaod.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-25aspeed/timer: fix compile breakage with clang 3.4.2Cédric Le Goater2-3/+1
In file included from /home/thuth/devel/qemu/hw/timer/aspeed_timer.c:16: /home/thuth/devel/qemu/include/hw/misc/aspeed_scu.h:37:3: error: redefinition of typedef 'AspeedSCUState' is a C11 feature [-Werror,-Wtypedef-redefinition] } AspeedSCUState; ^ /home/thuth/devel/qemu/include/hw/timer/aspeed_timer.h:27:31: note: previous definition is here typedef struct AspeedSCUState AspeedSCUState; Reported-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20180921161939.822-2-clg@kaod.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-25hw/timer/cmsdk-apb-dualtimer: Add missing 'break' statementsPeter Maydell1-0/+2
Add 'break' statements missing from a switch in the APB dual-timer write function. Spotted by Coverity as CID 1395626 and 1395633. Reported-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-id: 20180924123122.14549-1-peter.maydell@linaro.org
2018-09-25hw/net/pcnet-pci: Unify pcnet_ioport_read/write and pcnet_mmio_read/writePeter Maydell2-67/+2
The only difference between our implementation of the pcnet ioport accessors and the mmio accessors is that the former check BCR_DWIO to see what access widths are permitted for addresses in the aprom range (0x0..0xf). In fact our failure to do this in the mmio accessors is a bug (one which was fixed for the ioport accessors in commit 7ba79741970 in 2011). The data sheet for the Am79C970A does not describe the DWIO bit as only applying for I/O space mapped I/O resources and not memory mapped I/O resources, and our MMIO accessors already honour DWIO for accesses in the 0x10..0x1f range (since the pcnet_ioport_{read,write}{w,l} functions check it). The data sheet for the later but compatible Am79C976 is clearer: it states specifically "DWIO mode applies to both I/O- and memory-mapped acceses." This seems to be reasonable evidence in favour of interpretating the Am79C970A spec as being the same. (NB: Linux's pcnet driver only supports I/O accesses, so the MMIO access part of this device is probably untested anyway.) Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-25hw/net/pcnet-pci: Convert away from old_mmio accessorsPeter Maydell2-84/+57
Convert the pcnet-pci device away from using the old_mmio MemoryRegionOps accessor functions. This commit is a no-behaviour-change API conversion. (Since PCNET_PNPMMIO_SIZE is 0x20, the old "addr & 0x10" check and the new "addr < 0x10" check are exact opposites; the new code is phrased to be parallel with the pcnet_io_read/write functions.) I have left a TODO comment marker because the similarity between the MMIO and IO accessor behaviour is suspicious and they could be combined, but this will be left to a different patch. Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-25hw/intc/arm_gic: Drop GIC_BASE_IRQ macroPeter Maydell3-20/+14
The GIC_BASE_IRQ macro is a leftover from when we shared code between the GICv2 and the v7M NVIC. Since the NVIC is now split off, GIC_BASE_IRQ is always 0, and we can just delete it. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Reviewed-by: Luc Michel <luc.michel@greensocs.com> Message-id: 20180824161819.11085-1-peter.maydell@linaro.org
2018-09-25hw/intc/arm_gic: Document QEMU interfacePeter Maydell1-0/+43
The GICv2's QEMU interface (sysbus MMIO regions, IRQs, etc) is now quite complicated with the addition of the virtualization extensions. Add a comment in the header file which documents it. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: Luc Michel <luc.michel@greensocs.com> Message-id: 20180823103818.31189-1-peter.maydell@linaro.org
2018-09-25hw/arm/smmuv3: fix eventq recording and IRQ triggerringEric Auger2-14/+14
The event queue management is broken today. Event records are not properly written as EVT_SET_* macro was not updating the actual event record. Also the event queue interrupt is not correctly triggered. Fixes: bb981004eaf4 ("hw/arm/smmuv3: Event queue recording helper") Signed-off-by: Eric Auger <eric.auger@redhat.com> Message-id: 20180921070138.10114-3-eric.auger@redhat.com Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2018-09-25test-bdrv-drain: Test draining job source child and parentKevin Wolf1-8/+69
For the block job drain test, don't only test draining the source and the target node, but create a backing chain for the source (source_backing <- source <- source_overlay) and test draining each of the nodes in it. When using iothreads, the source node (and therefore the job) is in a different AioContext than the drain, which happens from the main thread. This way, the main thread waits in AIO_WAIT_WHILE() for the iothread to make process and aio_wait_kick() is required to notify it. The test validates that calling bdrv_wakeup() for a child or a parent node will actually notify AIO_WAIT_WHILE() instead of letting it hang. Increase the sleep time a bit (to 1 ms) because the test case is racy and with the shorter sleep, it didn't reproduce the bug it is supposed to test for me under 'rr record -n'. This was because bdrv_drain_invoke_entry() (in the main thread) was only called after the job had already reached the pause point, so we got a bdrv_dec_in_flight() from the main thread and the additional aio_wait_kick() when the job becomes idle (that we really wanted to test here) wasn't even necessary any more to make progress. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25block: Use a single global AioWaitKevin Wolf10-65/+26
When draining a block node, we recurse to its parent and for subtree drains also to its children. A single AIO_WAIT_WHILE() is then used to wait for bdrv_drain_poll() to become true, which depends on all of the nodes we recursed to. However, if the respective child or parent becomes quiescent and calls bdrv_wakeup(), only the AioWait of the child/parent is checked, while AIO_WAIT_WHILE() depends on the AioWait of the original node. Fix this by using a single AioWait for all callers of AIO_WAIT_WHILE(). This may mean that the draining thread gets a few more unnecessary wakeups because an unrelated operation got completed, but we already wake it up when something _could_ have changed rather than only if it has certainly changed. Apart from that, drain is a slow path anyway. In theory it would be possible to use wakeups more selectively and still correctly, but the gains are likely not worth the additional complexity. In fact, this patch is a nice simplification for some places in the code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25test-bdrv-drain: Fix outdated commentsKevin Wolf1-5/+5
Commit 89bd030533e changed the test case from using job_sleep_ns() to using qemu_co_sleep_ns() instead. Also, block_job_sleep_ns() became job_sleep_ns() in commit 5d43e86e11f. In both cases, some comments in the test case were not updated. Do that now. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>
2018-09-25test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abortKevin Wolf1-12/+104
This adds tests for calling AIO_WAIT_WHILE() in the .commit and .abort callbacks. Both reasons why .abort could be called for a single job are tested: Either .run or .prepare could return an error. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25job: Avoid deadlocks in job_completed_txn_abort()Kevin Wolf1-5/+11
Amongst others, job_finalize_single() calls the .prepare/.commit/.abort callbacks of the individual job driver. Recently, their use was adapted for all block jobs so that they involve code calling AIO_WAIT_WHILE() now. Such code must be called under the AioContext lock for the respective job, but without holding any other AioContext lock. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level()Kevin Wolf1-0/+13
This is a regression test for a deadlock that could occur in callbacks called from the aio_poll() in bdrv_drain_poll_top_level(). The AioContext lock wasn't released and therefore would be taken a second time in the callback. This would cause a possible AIO_WAIT_WHILE() in the callback to hang. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>
2018-09-25block: Remove aio_poll() in bdrv_drain_poll variantsKevin Wolf1-8/+0
bdrv_drain_poll_top_level() was buggy because it didn't release the AioContext lock of the node to be drained before calling aio_poll(). This way, callbacks called by aio_poll() would possibly take the lock a second time and run into a deadlock with a nested AIO_WAIT_WHILE() call. However, it turns out that the aio_poll() call isn't actually needed any more. It was introduced in commit 91af091f923, which is effectively reverted by this patch. The cases it was supposed to fix are now covered by bdrv_drain_poll(), which waits for block jobs to reach a quiescent state. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25blockjob: Lie better in child_job_drained_poll()Kevin Wolf3-2/+14
Block jobs claim in .drained_poll() that they are in a quiescent state as soon as job->deferred_to_main_loop is true. This is obviously wrong, they still have a completion BH to run. We only get away with this because commit 91af091f923 added an unconditional aio_poll(false) to the drain functions, but this is bypassing the regular drain mechanisms. However, just removing this and telling that the job is still active doesn't work either: The completion callbacks themselves call drain functions (directly, or indirectly with bdrv_reopen), so they would deadlock then. As a better lie, tell that the job is active as long as the BH is pending, but falsely call it quiescent from the point in the BH when the completion callback is called. At this point, nested drain calls won't deadlock because they ignore the job, and outer drains will wait for the job to really reach a quiescent state because the callback is already running. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25block-backend: Decrease in_flight only after callbackKevin Wolf1-1/+1
Request callbacks can do pretty much anything, including operations that will yield from the coroutine (such as draining the backend). In that case, a decreased in_flight would be visible to other code and could lead to a drain completing while the callback hasn't actually completed yet. Note that reordering these operations forbids calling drain directly inside an AIO callback. As Paolo explains, indirectly calling it is okay: - Calling it through a coroutine is okay, because then bdrv_drained_begin() goes through bdrv_co_yield_to_drain() and you have in_flight=2 when bdrv_co_yield_to_drain() yields, then soon in_flight=1 when the aio_co_wake() in the AIO callback completes, then in_flight=0 after the bottom half starts. - Calling it through a bottom half would be okay too, as long as the AIO callback remembers to do inc_in_flight/dec_in_flight just like bdrv_co_yield_to_drain() and bdrv_co_drain_bh_cb() do A few more important cases that come to mind: - A coroutine that yields because of I/O is okay, with a sequence similar to bdrv_co_yield_to_drain(). - A coroutine that yields with no I/O pending will correctly decrease in_flight to zero before yielding. - Calling more AIO from the callback won't overflow the counter just because of mutual recursion, because AIO functions always yield at least once before invoking the callback. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2018-09-25block-backend: Fix potential double blk_delete()Kevin Wolf1-1/+8
blk_unref() first decreases the refcount of the BlockBackend and calls blk_delete() if the refcount reaches zero. Requests can still be in flight at this point, they are only drained during blk_delete(): At this point, arbitrary callbacks can run. If any callback takes a temporary BlockBackend reference, it will first increase the refcount to 1 and then decrease it to 0 again, triggering another blk_delete(). This will cause a use-after-free crash in the outer blk_delete(). Fix it by draining the BlockBackend before decreasing to refcount to 0. Assert in blk_ref() that it never takes the first refcount (which would mean that the BlockBackend is already being deleted). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25block-backend: Add .drained_poll callbackKevin Wolf1-0/+9
A bdrv_drain operation must ensure that all parents are quiesced, this includes BlockBackends. Otherwise, callbacks called by requests that are completed on the BDS layer, but not quite yet on the BlockBackend layer could still create new requests. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25block: Add missing locking in bdrv_co_drain_bh_cb()Kevin Wolf3-0/+25
bdrv_do_drained_begin/end() assume that they are called with the AioContext lock of bs held. If we call drain functions from a coroutine with the AioContext lock held, we yield and schedule a BH to move out of coroutine context. This means that the lock for the home context of the coroutine is released and must be re-acquired in the bottom half. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callbackKevin Wolf1-0/+10
This is a regression test for a deadlock that occurred in block job completion callbacks (via job_defer_to_main_loop) because the AioContext lock was taken twice: once in job_finish_sync() and then again in job_defer_to_main_loop_bh(). This would cause AIO_WAIT_WHILE() to hang. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>
2018-09-25job: Use AIO_WAIT_WHILE() in job_finish_sync()Kevin Wolf1-8/+6
job_finish_sync() needs to release the AioContext lock of the job before calling aio_poll(). Otherwise, callbacks called by aio_poll() would possibly take the lock a second time and run into a deadlock with a nested AIO_WAIT_WHILE() call. Also, job_drain() without aio_poll() isn't necessarily enough to make progress on a job, it could depend on bottom halves to be executed. Combine both open-coded while loops into a single AIO_WAIT_WHILE() call that solves both of these problems. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
2018-09-25test-blockjob: Acquire AioContext around job_cancel_sync()Kevin Wolf2-0/+12
All callers in QEMU proper hold the AioContext lock when calling job_finish_sync(). test-blockjob should do the same when it calls the function indirectly through job_cancel_sync(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>
2018-09-25test-bdrv-drain: Drain with block jobs in an I/O threadKevin Wolf1-6/+86
This extends the existing drain test with a block job to include variants where the block job runs in a different AioContext. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>
2018-09-25aio-wait: Increase num_waiters even in home threadKevin Wolf1-3/+3
Even if AIO_WAIT_WHILE() is called in the home context of the AioContext, we still want to allow the condition to change depending on other threads as long as they kick the AioWait. Specfically block jobs can be running in an I/O thread and should then be able to kick a drain in the main loop context. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>