Diffstat (limited to 'classification_output/04/device')
| -rw-r--r-- | classification_output/04/device/14488057 | 719 |
| -rw-r--r-- | classification_output/04/device/24190340 | 2064 |
| -rw-r--r-- | classification_output/04/device/24930826 | 41 |
| -rw-r--r-- | classification_output/04/device/28596630 | 121 |
| -rw-r--r-- | classification_output/04/device/42226390 | 195 |
| -rw-r--r-- | classification_output/04/device/57195159 | 323 |
| -rw-r--r-- | classification_output/04/device/57231878 | 250 |
| -rw-r--r-- | classification_output/04/device/67821138 | 207 |
| -rw-r--r-- | classification_output/04/device/99674399 | 156 |
9 files changed, 0 insertions, 4076 deletions
diff --git a/classification_output/04/device/14488057 b/classification_output/04/device/14488057 deleted file mode 100644 index 8637a1740..000000000 --- a/classification_output/04/device/14488057 +++ /dev/null @@ -1,719 +0,0 @@ -device: 0.929 -other: 0.922 -assembly: 0.912 -instruction: 0.908 -semantic: 0.905 -boot: 0.892 -graphic: 0.887 -mistranslation: 0.885 -vnc: 0.882 -KVM: 0.880 -network: 0.846 -socket: 0.825 - -[Qemu-devel] [BUG] user-to-root privesc inside VM via bad translation caching - -This is an issue in QEMU's system emulation for X86 in TCG mode. -The issue permits an attacker who can execute code in guest ring 3 -with normal user privileges to inject code into other processes that -are running in guest ring 3, in particular root-owned processes. - -== reproduction steps == - - - Create an x86-64 VM and install Debian Jessie in it. The following - steps should all be executed inside the VM. - - Verify that procmail is installed and the correct version: - address@hidden:~# apt-cache show procmail | egrep 'Version|SHA' - Version: 3.22-24 - SHA1: 54ed2d51db0e76f027f06068ab5371048c13434c - SHA256: 4488cf6975af9134a9b5238d5d70e8be277f70caa45a840dfbefd2dc444bfe7f - - Install build-essential and nasm ("apt install build-essential nasm"). - - Unpack the exploit, compile it and run it: - address@hidden:~$ tar xvf procmail_cache_attack.tar - procmail_cache_attack/ - procmail_cache_attack/shellcode.asm - procmail_cache_attack/xp.c - procmail_cache_attack/compile.sh - procmail_cache_attack/attack.c - address@hidden:~$ cd procmail_cache_attack - address@hidden:~/procmail_cache_attack$ ./compile.sh - address@hidden:~/procmail_cache_attack$ ./attack - memory mappings set up - child is dead, codegen should be complete - executing code as root! :) - address@hidden:~/procmail_cache_attack# id - uid=0(root) gid=0(root) groups=0(root),[...] - -Note: While the exploit depends on the precise version of procmail, -the actual vulnerability is in QEMU, not in procmail. procmail merely -serves as a seldomly-executed setuid root binary into which code can -be injected. - - -== detailed issue description == -QEMU caches translated basic blocks. To look up a translated basic -block, the function tb_find() is used, which uses tb_htable_lookup() -in its slowpath, which in turn compares translated basic blocks -(TranslationBlock) to the lookup information (struct tb_desc) using -tb_cmp(). - -tb_cmp() attempts to ensure (among other things) that both the virtual -start address of the basic block and the physical addresses that the -basic block covers match. When checking the physical addresses, it -assumes that a basic block can span at most two pages. - -gen_intermediate_code() attempts to enforce this by stopping the -translation of a basic block if nearly one page of instructions has -been translated already: - - /* if too long translation, stop generation too */ - if (tcg_op_buf_full() || - (pc_ptr - pc_start) >= (TARGET_PAGE_SIZE - 32) || - num_insns >= max_insns) { - gen_jmp_im(pc_ptr - dc->cs_base); - gen_eob(dc); - break; - } - -However, while real X86 processors have a maximum instruction length -of 15 bytes, QEMU's instruction decoder for X86 does not place any -limit on the instruction length or the number of instruction prefixes. -Therefore, it is possible to create an arbitrarily long instruction -by e.g. prepending an arbitrary number of LOCK prefixes to a normal -instruction. 
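As an illustration of the decoder property described above, the following small guest-side probe (a hypothetical sketch, not part of the attached exploit) executes an ADD to memory preceded by 20 redundant LOCK prefixes, i.e. an instruction well past the architectural 15-byte limit. Real hardware is expected to reject such an over-long instruction with a fault, whereas a decoder that imposes no length limit, as described here for QEMU's TCG x86 front end, will translate and run it:

  /* Sketch only: over-long LOCK-prefixed instruction probe.
   * Build with: gcc -O0 -o locklen locklen.c  (file name is illustrative)
   * On real x86 hardware (and on a fixed emulator) the over-long instruction
   * should fault; the report above says the unpatched TCG decoder accepts it. */
  #include <signal.h>
  #include <stdio.h>
  #include <stdlib.h>

  static void fault_handler(int sig)
  {
      printf("over-long instruction was rejected (signal %d)\n", sig);
      exit(0);
  }

  int main(void)
  {
      volatile int x = 0;

      signal(SIGSEGV, fault_handler);   /* #GP is typically delivered as SIGSEGV */
      signal(SIGILL, fault_handler);    /* be lenient about the exact fault */

      /* 20 LOCK (0xf0) prefixes + "addl $1, (mem)": more than 15 bytes total.
       * LOCK is architecturally allowed here because the ADD has a memory
       * destination; only the total length is out of spec. */
      __asm__ volatile(
          ".rept 20\n\t"
          ".byte 0xf0\n\t"
          ".endr\n\t"
          "addl $1, %0\n\t"
          : "+m"(x));

      printf("over-long instruction executed, x = %d\n", x);
      return 0;
  }

Which of the two messages the probe prints is the observable difference between a length-checking decoder and the behaviour described in this report.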
This permits creating a basic block that spans three -pages by simply appending an approximately page-sized instruction to -the end of a normal basic block that starts close to the end of a -page. - -Such an overlong basic block causes the basic block caching to fail as -follows: If code is generated and cached for a basic block that spans -the physical pages (A,E,B), this basic block will be returned by -lookups in a process in which the physical pages (A,B,C) are mapped -in the same virtual address range (assuming that all other lookup -parameters match). - -This behavior can be abused by an attacker e.g. as follows: If a -non-relocatable world-readable setuid executable legitimately contains -the pages (A,B,C), an attacker can map (A,E,B) into his own process, -at the normal load address of A, where E is an attacker-controlled -page. If a legitimate basic block spans the pages A and B, an attacker -can write arbitrary non-branch instructions at the start of E, then -append an overlong instruction -that ends behind the start of C, yielding a modified basic block that -spans all three pages. If the attacker then executes the modified -basic block in his process, the modified basic block is cached. -Next, the attacker can execute the setuid binary, which will reuse the -cached modified basic block, executing attacker-controlled -instructions in the context of the privileged process. - -I am sending this to qemu-devel because a QEMU security contact -told me that QEMU does not consider privilege escalation inside a -TCG VM to be a security concern. -procmail_cache_attack.tar -Description: -Unix tar archive - -On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote: -> -This is an issue in QEMU's system emulation for X86 in TCG mode. -> -The issue permits an attacker who can execute code in guest ring 3 -> -with normal user privileges to inject code into other processes that -> -are running in guest ring 3, in particular root-owned processes. -> -I am sending this to qemu-devel because a QEMU security contact -> -told me that QEMU does not consider privilege escalation inside a -> -TCG VM to be a security concern. -Correct; it's just a bug. Don't trust TCG QEMU as a security boundary. - -We should really fix the crossing-a-page-boundary code for x86. -I believe we do get it correct for ARM Thumb instructions. - -thanks --- PMM - -On Mon, Mar 20, 2017 at 10:46 AM, Peter Maydell wrote: -> -On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote: -> -> This is an issue in QEMU's system emulation for X86 in TCG mode. -> -> The issue permits an attacker who can execute code in guest ring 3 -> -> with normal user privileges to inject code into other processes that -> -> are running in guest ring 3, in particular root-owned processes. -> -> -> I am sending this to qemu-devel because a QEMU security contact -> -> told me that QEMU does not consider privilege escalation inside a -> -> TCG VM to be a security concern. -> -> -Correct; it's just a bug. Don't trust TCG QEMU as a security boundary. -> -> -We should really fix the crossing-a-page-boundary code for x86. -> -I believe we do get it correct for ARM Thumb instructions. -How about doing the instruction size check as follows? 
- -diff --git a/target/i386/translate.c b/target/i386/translate.c -index 72c1b03a2a..94cf3da719 100644 ---- a/target/i386/translate.c -+++ b/target/i386/translate.c -@@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State -*env, DisasContext *s, - default: - goto unknown_op; - } -+ if (s->pc - pc_start > 15) { -+ s->pc = pc_start; -+ goto illegal_op; -+ } - return s->pc; - illegal_op: - gen_illegal_opcode(s); - -Thanks, --- -Pranith - -On 22 March 2017 at 14:55, Pranith Kumar <address@hidden> wrote: -> -On Mon, Mar 20, 2017 at 10:46 AM, Peter Maydell wrote: -> -> On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote: -> ->> This is an issue in QEMU's system emulation for X86 in TCG mode. -> ->> The issue permits an attacker who can execute code in guest ring 3 -> ->> with normal user privileges to inject code into other processes that -> ->> are running in guest ring 3, in particular root-owned processes. -> -> -> ->> I am sending this to qemu-devel because a QEMU security contact -> ->> told me that QEMU does not consider privilege escalation inside a -> ->> TCG VM to be a security concern. -> -> -> -> Correct; it's just a bug. Don't trust TCG QEMU as a security boundary. -> -> -> -> We should really fix the crossing-a-page-boundary code for x86. -> -> I believe we do get it correct for ARM Thumb instructions. -> -> -How about doing the instruction size check as follows? -> -> -diff --git a/target/i386/translate.c b/target/i386/translate.c -> -index 72c1b03a2a..94cf3da719 100644 -> ---- a/target/i386/translate.c -> -+++ b/target/i386/translate.c -> -@@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State -> -*env, DisasContext *s, -> -default: -> -goto unknown_op; -> -} -> -+ if (s->pc - pc_start > 15) { -> -+ s->pc = pc_start; -> -+ goto illegal_op; -> -+ } -> -return s->pc; -> -illegal_op: -> -gen_illegal_opcode(s); -This doesn't look right because it means we'll check -only after we've emitted all the code to do the -instruction operation, so the effect will be -"execute instruction, then take illegal-opcode -exception". - -We should check what the x86 architecture spec actually -says and implement that. - -thanks --- PMM - -On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell -<address@hidden> wrote: -> -> -> -> How about doing the instruction size check as follows? -> -> -> -> diff --git a/target/i386/translate.c b/target/i386/translate.c -> -> index 72c1b03a2a..94cf3da719 100644 -> -> --- a/target/i386/translate.c -> -> +++ b/target/i386/translate.c -> -> @@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State -> -> *env, DisasContext *s, -> -> default: -> -> goto unknown_op; -> -> } -> -> + if (s->pc - pc_start > 15) { -> -> + s->pc = pc_start; -> -> + goto illegal_op; -> -> + } -> -> return s->pc; -> -> illegal_op: -> -> gen_illegal_opcode(s); -> -> -This doesn't look right because it means we'll check -> -only after we've emitted all the code to do the -> -instruction operation, so the effect will be -> -"execute instruction, then take illegal-opcode -> -exception". -> -The pc is restored to original address (s->pc = pc_start), so the -exception will overwrite the generated illegal instruction and will be -executed first. - -But yes, it's better to follow the architecture manual. 
- -Thanks, --- -Pranith - -On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote: -> -On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell -> -<address@hidden> wrote: -> -> This doesn't look right because it means we'll check -> -> only after we've emitted all the code to do the -> -> instruction operation, so the effect will be -> -> "execute instruction, then take illegal-opcode -> -> exception". -> -The pc is restored to original address (s->pc = pc_start), so the -> -exception will overwrite the generated illegal instruction and will be -> -executed first. -s->pc is the guest PC -- moving that backwards will -not do anything about the generated TCG IR that's -already been written. You'd need to rewind the -write pointer in the IR stream, which there is -no support for doing AFAIK. - -thanks --- PMM - -On Wed, Mar 22, 2017 at 11:21 AM, Peter Maydell -<address@hidden> wrote: -> -On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote: -> -> On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell -> -> <address@hidden> wrote: -> ->> This doesn't look right because it means we'll check -> ->> only after we've emitted all the code to do the -> ->> instruction operation, so the effect will be -> ->> "execute instruction, then take illegal-opcode -> ->> exception". -> -> -> The pc is restored to original address (s->pc = pc_start), so the -> -> exception will overwrite the generated illegal instruction and will be -> -> executed first. -> -> -s->pc is the guest PC -- moving that backwards will -> -not do anything about the generated TCG IR that's -> -already been written. You'd need to rewind the -> -write pointer in the IR stream, which there is -> -no support for doing AFAIK. -Ah, OK. Thanks for the explanation. May be we should check the size of -the instruction while decoding the prefixes and error out once we -exceed the limit. We would not generate any IR code. - --- -Pranith - -On 03/23/2017 02:29 AM, Pranith Kumar wrote: -On Wed, Mar 22, 2017 at 11:21 AM, Peter Maydell -<address@hidden> wrote: -On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote: -On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell -<address@hidden> wrote: -This doesn't look right because it means we'll check -only after we've emitted all the code to do the -instruction operation, so the effect will be -"execute instruction, then take illegal-opcode -exception". -The pc is restored to original address (s->pc = pc_start), so the -exception will overwrite the generated illegal instruction and will be -executed first. -s->pc is the guest PC -- moving that backwards will -not do anything about the generated TCG IR that's -already been written. You'd need to rewind the -write pointer in the IR stream, which there is -no support for doing AFAIK. -Ah, OK. Thanks for the explanation. May be we should check the size of -the instruction while decoding the prefixes and error out once we -exceed the limit. We would not generate any IR code. -Yes. -It would not enforce a true limit of 15 bytes, since you can't know that until -you've done the rest of the decode. But you'd be able to say that no more than -14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 bytes is used. -Which does fix the bug. - - -r~ - -On 22/03/2017 21:01, Richard Henderson wrote: -> -> -> -> Ah, OK. Thanks for the explanation. May be we should check the size of -> -> the instruction while decoding the prefixes and error out once we -> -> exceed the limit. We would not generate any IR code. -> -> -Yes. 
-> -> -It would not enforce a true limit of 15 bytes, since you can't know that -> -until you've done the rest of the decode. But you'd be able to say that -> -no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 -> -bytes is used. -> -> -Which does fix the bug. -Yeah, that would work for 2.9 if somebody wants to put together a patch. - Ensuring that all instruction fetching happens before translation side -effects is a little harder, but perhaps it's also the opportunity to get -rid of s->rip_offset which is a little ugly. - -Paolo - -On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote: -> -> -> -On 22/03/2017 21:01, Richard Henderson wrote: -> ->> -> ->> Ah, OK. Thanks for the explanation. May be we should check the size of -> ->> the instruction while decoding the prefixes and error out once we -> ->> exceed the limit. We would not generate any IR code. -> -> -> -> Yes. -> -> -> -> It would not enforce a true limit of 15 bytes, since you can't know that -> -> until you've done the rest of the decode. But you'd be able to say that -> -> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 -> -> bytes is used. -> -> -> -> Which does fix the bug. -> -> -Yeah, that would work for 2.9 if somebody wants to put together a patch. -> -Ensuring that all instruction fetching happens before translation side -> -effects is a little harder, but perhaps it's also the opportunity to get -> -rid of s->rip_offset which is a little ugly. -How about the following? - -diff --git a/target/i386/translate.c b/target/i386/translate.c -index 72c1b03a2a..67c58b8900 100644 ---- a/target/i386/translate.c -+++ b/target/i386/translate.c -@@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State -*env, DisasContext *s, - s->vex_l = 0; - s->vex_v = 0; - next_byte: -+ /* The prefixes can atmost be 14 bytes since x86 has an upper -+ limit of 15 bytes for the instruction */ -+ if (s->pc - pc_start > 14) { -+ goto illegal_op; -+ } - b = cpu_ldub_code(env, s->pc); - s->pc++; - /* Collect prefixes. */ - --- -Pranith - -On 23/03/2017 17:50, Pranith Kumar wrote: -> -On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote: -> -> -> -> -> -> On 22/03/2017 21:01, Richard Henderson wrote: -> ->>> -> ->>> Ah, OK. Thanks for the explanation. May be we should check the size of -> ->>> the instruction while decoding the prefixes and error out once we -> ->>> exceed the limit. We would not generate any IR code. -> ->> -> ->> Yes. -> ->> -> ->> It would not enforce a true limit of 15 bytes, since you can't know that -> ->> until you've done the rest of the decode. But you'd be able to say that -> ->> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 -> ->> bytes is used. -> ->> -> ->> Which does fix the bug. -> -> -> -> Yeah, that would work for 2.9 if somebody wants to put together a patch. -> -> Ensuring that all instruction fetching happens before translation side -> -> effects is a little harder, but perhaps it's also the opportunity to get -> -> rid of s->rip_offset which is a little ugly. -> -> -How about the following? 
-> -> -diff --git a/target/i386/translate.c b/target/i386/translate.c -> -index 72c1b03a2a..67c58b8900 100644 -> ---- a/target/i386/translate.c -> -+++ b/target/i386/translate.c -> -@@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State -> -*env, DisasContext *s, -> -s->vex_l = 0; -> -s->vex_v = 0; -> -next_byte: -> -+ /* The prefixes can atmost be 14 bytes since x86 has an upper -> -+ limit of 15 bytes for the instruction */ -> -+ if (s->pc - pc_start > 14) { -> -+ goto illegal_op; -> -+ } -> -b = cpu_ldub_code(env, s->pc); -> -s->pc++; -> -/* Collect prefixes. */ -Please make the comment more verbose, based on Richard's remark. We -should apply it to 2.9. - -Also, QEMU usually formats comments with stars on every line. - -Paolo - -On Thu, Mar 23, 2017 at 1:37 PM, Paolo Bonzini <address@hidden> wrote: -> -> -> -On 23/03/2017 17:50, Pranith Kumar wrote: -> -> On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote: -> ->> -> ->> -> ->> On 22/03/2017 21:01, Richard Henderson wrote: -> ->>>> -> ->>>> Ah, OK. Thanks for the explanation. May be we should check the size of -> ->>>> the instruction while decoding the prefixes and error out once we -> ->>>> exceed the limit. We would not generate any IR code. -> ->>> -> ->>> Yes. -> ->>> -> ->>> It would not enforce a true limit of 15 bytes, since you can't know that -> ->>> until you've done the rest of the decode. But you'd be able to say that -> ->>> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 -> ->>> bytes is used. -> ->>> -> ->>> Which does fix the bug. -> ->> -> ->> Yeah, that would work for 2.9 if somebody wants to put together a patch. -> ->> Ensuring that all instruction fetching happens before translation side -> ->> effects is a little harder, but perhaps it's also the opportunity to get -> ->> rid of s->rip_offset which is a little ugly. -> -> -> -> How about the following? -> -> -> -> diff --git a/target/i386/translate.c b/target/i386/translate.c -> -> index 72c1b03a2a..67c58b8900 100644 -> -> --- a/target/i386/translate.c -> -> +++ b/target/i386/translate.c -> -> @@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State -> -> *env, DisasContext *s, -> -> s->vex_l = 0; -> -> s->vex_v = 0; -> -> next_byte: -> -> + /* The prefixes can atmost be 14 bytes since x86 has an upper -> -> + limit of 15 bytes for the instruction */ -> -> + if (s->pc - pc_start > 14) { -> -> + goto illegal_op; -> -> + } -> -> b = cpu_ldub_code(env, s->pc); -> -> s->pc++; -> -> /* Collect prefixes. */ -> -> -Please make the comment more verbose, based on Richard's remark. We -> -should apply it to 2.9. -> -> -Also, QEMU usually formats comments with stars on every line. -OK. I'll send a proper patch with updated comment. - -Thanks, --- -Pranith - diff --git a/classification_output/04/device/24190340 b/classification_output/04/device/24190340 deleted file mode 100644 index ebbe31252..000000000 --- a/classification_output/04/device/24190340 +++ /dev/null @@ -1,2064 +0,0 @@ -device: 0.832 -instruction: 0.818 -other: 0.811 -vnc: 0.808 -boot: 0.803 -semantic: 0.793 -assembly: 0.790 -KVM: 0.776 -graphic: 0.775 -mistranslation: 0.758 -network: 0.723 -socket: 0.715 - -[BUG, RFC] Block graph deadlock on job-dismiss - -Hi all, - -There's a bug in block layer which leads to block graph deadlock. -Notably, it takes place when blockdev IO is processed within a separate -iothread. - -This was initially caught by our tests, and I was able to reduce it to a -relatively simple reproducer. 
Such deadlocks are probably supposed to -be covered in iotests/graph-changes-while-io, but this deadlock isn't. - -Basically what the reproducer does is launches QEMU with a drive having -'iothread' option set, creates a chain of 2 snapshots, launches -block-commit job for a snapshot and then dismisses the job, starting -from the lower snapshot. If the guest is issuing IO at the same time, -there's a race in acquiring block graph lock and a potential deadlock. - -Here's how it can be reproduced: - -1. Run QEMU: -> -SRCDIR=/path/to/srcdir -> -> -> -> -> -$SRCDIR/build/qemu-system-x86_64 -enable-kvm \ -> -> --machine q35 -cpu Nehalem \ -> -> --name guest=alma8-vm,debug-threads=on \ -> -> --m 2g -smp 2 \ -> -> --nographic -nodefaults \ -> -> --qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \ -> -> --serial unix:/var/run/alma8-serial.sock,server=on,wait=off \ -> -> --object iothread,id=iothread0 \ -> -> --blockdev -> -node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2 -> -\ -> --device virtio-blk-pci,drive=disk,iothread=iothread0 -2. Launch IO (random reads) from within the guest: -> -nc -U /var/run/alma8-serial.sock -> -... -> -[root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k -> ---size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting -> ---rw=randread --iodepth=1 --filename=/testfile -3. Run snapshots creation & removal of lower snapshot operation in a -loop (script attached): -> -while /bin/true ; do ./remove_lower_snap.sh ; done -And then it occasionally hangs. - -Note: I've tried bisecting this, and looks like deadlock occurs starting -from the following commit: - -(BAD) 5bdbaebcce virtio: Re-enable notifications after drain -(GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll - -On the latest v10.0.0 it does hang as well. 
- - -Here's backtrace of the main thread: - -> -#0 0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1, -> -timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:43 -> -#1 0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1, -> -timeout=-1) at ../util/qemu-timer.c:329 -> -#2 0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20, -> -ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79 -> -#3 0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) at -> -../util/aio-posix.c:730 -> -#4 0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950, -> -parent=0x0, poll=true) at ../block/io.c:378 -> -#5 0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at -> -../block/io.c:391 -> -#6 0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7682 -> -#7 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7964250, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7608 -> -#8 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7668 -> -#9 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7e59110, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7608 -> -#10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7668 -> -#11 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb814ed80, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7608 -> -#12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../blockjob.c:157 -> -#13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb7c9d3f0, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7592 -> -#14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7661 -> -#15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx -> -(child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = -> -{...}, tran=0x557eb7a87160, errp=0x0) at ../block.c:1234 -> -#16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb8565af0, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7592 -> -#17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0, -> -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -errp=0x0) -> -at ../block.c:7661 -> -#18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context (bs=0x557eb79575e0, -> -ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at ../block.c:7715 -> -#19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) at -> -../block.c:3317 -> -#20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv (job=0x557eb7952800) at -> -../blockjob.c:209 -> -#21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at -> -../blockjob.c:82 -> -#22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) 
at -> -../job.c:474 -> -#23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at -> -../job.c:771 -> -#24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400, -> -errp=0x7ffd94b4f488) at ../job.c:783 -> ---Type <RET> for more, q to quit, c to continue without paging-- -> -#25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 "commit-snap1", -> -errp=0x7ffd94b4f488) at ../job-qmp.c:138 -> -#26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0, -> -ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221 -> -#27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at -> -../qapi/qmp-dispatch.c:128 -> -#28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at -> -../util/async.c:172 -> -#29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at -> -../util/async.c:219 -> -#30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at -> -../util/aio-posix.c:436 -> -#31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200, -> -callback=0x0, user_data=0x0) at ../util/async.c:361 -> -#32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at -> -../glib/gmain.c:3364 -> -#33 g_main_context_dispatch (context=0x557eb76c6430) at ../glib/gmain.c:4079 -> -#34 0x0000557eb47d3ab1 in glib_pollfds_poll () at ../util/main-loop.c:287 -> -#35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at -> -../util/main-loop.c:310 -> -#36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at -> -../util/main-loop.c:589 -> -#37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835 -> -#38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at -> -../system/main.c:50 -> -#39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at -> -../system/main.c:80 -And here's coroutine trying to acquire read lock: - -> -(gdb) qemu coroutine reader_queue->entries.sqh_first -> -#0 0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0, -> -to_=0x7fc537fff508, action=COROUTINE_YIELD) at -> -../util/coroutine-ucontext.c:321 -> -#1 0x0000557eb47d4d4a in qemu_coroutine_yield () at -> -../util/qemu-coroutine.c:339 -> -#2 0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0 -> -<reader_queue>, lock=0x7fc53c57de50, flags=0) at -> -../util/qemu-coroutine-lock.c:60 -> -#3 0x0000557eb461fea7 in bdrv_graph_co_rdlock () at ../block/graph-lock.c:231 -> -#4 0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) at -> -/home/root/src/qemu/master/include/block/graph-lock.h:213 -> -#5 0x0000557eb460fa41 in blk_co_do_preadv_part -> -(blk=0x557eb84c0810, offset=6890553344, bytes=4096, qiov=0x7fc530006988, -> -qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at ../block/block-backend.c:1339 -> -#6 0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at -> -../block/block-backend.c:1619 -> -#7 0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, i1=21886) at -> -../util/coroutine-ucontext.c:175 -> -#8 0x00007fc547c2a360 in __start_context () at -> -../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 -> -#9 0x00007ffd94b4ea40 in () -> -#10 0x0000000000000000 in () -So it looks like main thread is processing job-dismiss request and is -holding write lock taken in block_job_remove_all_bdrv() (frame #20 -above). At the same time iothread spawns a coroutine which performs IO -request. Before the coroutine is spawned, blk_aio_prwv() increases -'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is -trying to acquire the read lock. 
But main thread isn't releasing the -lock as blk_root_drained_poll() returns true since blk->in_flight > 0. -Here's the deadlock. - -Any comments and suggestions on the subject are welcomed. Thanks! - -Andrey -remove_lower_snap.sh -Description: -application/shellscript - -On 4/24/25 8:32 PM, Andrey Drobyshev wrote: -> -Hi all, -> -> -There's a bug in block layer which leads to block graph deadlock. -> -Notably, it takes place when blockdev IO is processed within a separate -> -iothread. -> -> -This was initially caught by our tests, and I was able to reduce it to a -> -relatively simple reproducer. Such deadlocks are probably supposed to -> -be covered in iotests/graph-changes-while-io, but this deadlock isn't. -> -> -Basically what the reproducer does is launches QEMU with a drive having -> -'iothread' option set, creates a chain of 2 snapshots, launches -> -block-commit job for a snapshot and then dismisses the job, starting -> -from the lower snapshot. If the guest is issuing IO at the same time, -> -there's a race in acquiring block graph lock and a potential deadlock. -> -> -Here's how it can be reproduced: -> -> -[...] -> -I took a closer look at iotests/graph-changes-while-io, and have managed -to reproduce the same deadlock in a much simpler setup, without a guest. - -1. Run QSD:> ./build/storage-daemon/qemu-storage-daemon --object -iothread,id=iothread0 \ -> ---blockdev null-co,node-name=node0,read-zeroes=true \ -> -> ---nbd-server addr.type=unix,addr.path=/var/run/qsd_nbd.sock \ -> -> ---export -> -nbd,id=exp0,node-name=node0,iothread=iothread0,fixed-iothread=true,writable=true -> -\ -> ---chardev -> -socket,id=qmp-sock,path=/var/run/qsd_qmp.sock,server=on,wait=off \ -> ---monitor chardev=qmp-sock -2. Launch IO: -> -qemu-img bench -f raw -c 2000000 -> -'nbd+unix:///node0?socket=/var/run/qsd_nbd.sock' -3. Add 2 snapshots and remove lower one (script attached):> while -/bin/true ; do ./rls_qsd.sh ; done - -And then it hangs. - -I'll also send a patch with corresponding test case added directly to -iotests. - -This reproduce seems to be hanging starting from Fiona's commit -67446e605dc ("blockjob: drop AioContext lock before calling -bdrv_graph_wrlock()"). AioContext locks were dropped entirely later on -in Stefan's commit b49f4755c7 ("block: remove AioContext locking"), but -the problem remains. - -Andrey -rls_qsd.sh -Description: -application/shellscript - -From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> - -This case is catching potential deadlock which takes place when job-dismiss -is issued when I/O requests are processed in a separate iothread. 
- -See -https://mail.gnu.org/archive/html/qemu-devel/2025-04/msg04421.html -Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> ---- - .../qemu-iotests/tests/graph-changes-while-io | 101 ++++++++++++++++-- - .../tests/graph-changes-while-io.out | 4 +- - 2 files changed, 96 insertions(+), 9 deletions(-) - -diff --git a/tests/qemu-iotests/tests/graph-changes-while-io -b/tests/qemu-iotests/tests/graph-changes-while-io -index 194fda500e..e30f823da4 100755 ---- a/tests/qemu-iotests/tests/graph-changes-while-io -+++ b/tests/qemu-iotests/tests/graph-changes-while-io -@@ -27,6 +27,8 @@ from iotests import imgfmt, qemu_img, qemu_img_create, -qemu_io, \ - - - top = os.path.join(iotests.test_dir, 'top.img') -+snap1 = os.path.join(iotests.test_dir, 'snap1.img') -+snap2 = os.path.join(iotests.test_dir, 'snap2.img') - nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock') - - -@@ -58,6 +60,15 @@ class TestGraphChangesWhileIO(QMPTestCase): - def tearDown(self) -> None: - self.qsd.stop() - -+ def _wait_for_blockjob(self, status) -> None: -+ done = False -+ while not done: -+ for event in self.qsd.get_qmp().get_events(wait=10.0): -+ if event['event'] != 'JOB_STATUS_CHANGE': -+ continue -+ if event['data']['status'] == status: -+ done = True -+ - def test_blockdev_add_while_io(self) -> None: - # Run qemu-img bench in the background - bench_thr = Thread(target=do_qemu_img_bench) -@@ -116,13 +127,89 @@ class TestGraphChangesWhileIO(QMPTestCase): - 'device': 'job0', - }) - -- cancelled = False -- while not cancelled: -- for event in self.qsd.get_qmp().get_events(wait=10.0): -- if event['event'] != 'JOB_STATUS_CHANGE': -- continue -- if event['data']['status'] == 'null': -- cancelled = True -+ self._wait_for_blockjob('null') -+ -+ bench_thr.join() -+ -+ def test_remove_lower_snapshot_while_io(self) -> None: -+ # Run qemu-img bench in the background -+ bench_thr = Thread(target=do_qemu_img_bench, args=(100000, )) -+ bench_thr.start() -+ -+ # While I/O is performed on 'node0' node, consequently add 2 snapshots -+ # on top of it, then remove (commit) them starting from lower one. 
-+ while bench_thr.is_alive(): -+ # Recreate snapshot images on every iteration -+ qemu_img_create('-f', imgfmt, snap1, '1G') -+ qemu_img_create('-f', imgfmt, snap2, '1G') -+ -+ self.qsd.cmd('blockdev-add', { -+ 'driver': imgfmt, -+ 'node-name': 'snap1', -+ 'file': { -+ 'driver': 'file', -+ 'filename': snap1 -+ } -+ }) -+ -+ self.qsd.cmd('blockdev-snapshot', { -+ 'node': 'node0', -+ 'overlay': 'snap1', -+ }) -+ -+ self.qsd.cmd('blockdev-add', { -+ 'driver': imgfmt, -+ 'node-name': 'snap2', -+ 'file': { -+ 'driver': 'file', -+ 'filename': snap2 -+ } -+ }) -+ -+ self.qsd.cmd('blockdev-snapshot', { -+ 'node': 'snap1', -+ 'overlay': 'snap2', -+ }) -+ -+ self.qsd.cmd('block-commit', { -+ 'job-id': 'commit-snap1', -+ 'device': 'snap2', -+ 'top-node': 'snap1', -+ 'base-node': 'node0', -+ 'auto-finalize': True, -+ 'auto-dismiss': False, -+ }) -+ -+ self._wait_for_blockjob('concluded') -+ self.qsd.cmd('job-dismiss', { -+ 'id': 'commit-snap1', -+ }) -+ -+ self.qsd.cmd('block-commit', { -+ 'job-id': 'commit-snap2', -+ 'device': 'snap2', -+ 'top-node': 'snap2', -+ 'base-node': 'node0', -+ 'auto-finalize': True, -+ 'auto-dismiss': False, -+ }) -+ -+ self._wait_for_blockjob('ready') -+ self.qsd.cmd('job-complete', { -+ 'id': 'commit-snap2', -+ }) -+ -+ self._wait_for_blockjob('concluded') -+ self.qsd.cmd('job-dismiss', { -+ 'id': 'commit-snap2', -+ }) -+ -+ self.qsd.cmd('blockdev-del', { -+ 'node-name': 'snap1' -+ }) -+ self.qsd.cmd('blockdev-del', { -+ 'node-name': 'snap2' -+ }) - - bench_thr.join() - -diff --git a/tests/qemu-iotests/tests/graph-changes-while-io.out -b/tests/qemu-iotests/tests/graph-changes-while-io.out -index fbc63e62f8..8d7e996700 100644 ---- a/tests/qemu-iotests/tests/graph-changes-while-io.out -+++ b/tests/qemu-iotests/tests/graph-changes-while-io.out -@@ -1,5 +1,5 @@ --.. -+... - ---------------------------------------------------------------------- --Ran 2 tests -+Ran 3 tests - - OK --- -2.43.5 - -Am 24.04.25 um 19:32 schrieb Andrey Drobyshev: -> -So it looks like main thread is processing job-dismiss request and is -> -holding write lock taken in block_job_remove_all_bdrv() (frame #20 -> -above). At the same time iothread spawns a coroutine which performs IO -> -request. Before the coroutine is spawned, blk_aio_prwv() increases -> -'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is -> -trying to acquire the read lock. But main thread isn't releasing the -> -lock as blk_root_drained_poll() returns true since blk->in_flight > 0. -> -Here's the deadlock. -And for the IO test you provided, it's client->nb_requests that behaves -similarly to blk->in_flight here. - -The issue also reproduces easily when issuing the following QMP command -in a loop while doing IO on a device: - -> -void qmp_block_locked_drain(const char *node_name, Error **errp) -> -{ -> -BlockDriverState *bs; -> -> -bs = bdrv_find_node(node_name); -> -if (!bs) { -> -error_setg(errp, "node not found"); -> -return; -> -} -> -> -bdrv_graph_wrlock(); -> -bdrv_drained_begin(bs); -> -bdrv_drained_end(bs); -> -bdrv_graph_wrunlock(); -> -} -It seems like either it would be necessary to require: -1. not draining inside an exclusively locked section -or -2. making sure that variables used by drained_poll routines are only set -while holding the reader lock -? 
- -Those seem to require rather involved changes, so a third option might -be to make draining inside an exclusively locked section possible, by -embedding such locked sections in a drained section: - -> -diff --git a/blockjob.c b/blockjob.c -> -index 32007f31a9..9b2f3b3ea9 100644 -> ---- a/blockjob.c -> -+++ b/blockjob.c -> -@@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job) -> -* one to make sure that such a concurrent access does not attempt -> -* to process an already freed BdrvChild. -> -*/ -> -+ bdrv_drain_all_begin(); -> -bdrv_graph_wrlock(); -> -while (job->nodes) { -> -GSList *l = job->nodes; -> -@@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job) -> -g_slist_free_1(l); -> -} -> -bdrv_graph_wrunlock(); -> -+ bdrv_drain_all_end(); -> -} -> -> -bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs) -This seems to fix the issue at hand. I can send a patch if this is -considered an acceptable approach. - -Best Regards, -Fiona - -On 4/30/25 11:47 AM, Fiona Ebner wrote: -> -Am 24.04.25 um 19:32 schrieb Andrey Drobyshev: -> -> So it looks like main thread is processing job-dismiss request and is -> -> holding write lock taken in block_job_remove_all_bdrv() (frame #20 -> -> above). At the same time iothread spawns a coroutine which performs IO -> -> request. Before the coroutine is spawned, blk_aio_prwv() increases -> -> 'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is -> -> trying to acquire the read lock. But main thread isn't releasing the -> -> lock as blk_root_drained_poll() returns true since blk->in_flight > 0. -> -> Here's the deadlock. -> -> -And for the IO test you provided, it's client->nb_requests that behaves -> -similarly to blk->in_flight here. -> -> -The issue also reproduces easily when issuing the following QMP command -> -in a loop while doing IO on a device: -> -> -> void qmp_block_locked_drain(const char *node_name, Error **errp) -> -> { -> -> BlockDriverState *bs; -> -> -> -> bs = bdrv_find_node(node_name); -> -> if (!bs) { -> -> error_setg(errp, "node not found"); -> -> return; -> -> } -> -> -> -> bdrv_graph_wrlock(); -> -> bdrv_drained_begin(bs); -> -> bdrv_drained_end(bs); -> -> bdrv_graph_wrunlock(); -> -> } -> -> -It seems like either it would be necessary to require: -> -1. not draining inside an exclusively locked section -> -or -> -2. making sure that variables used by drained_poll routines are only set -> -while holding the reader lock -> -? -> -> -Those seem to require rather involved changes, so a third option might -> -be to make draining inside an exclusively locked section possible, by -> -embedding such locked sections in a drained section: -> -> -> diff --git a/blockjob.c b/blockjob.c -> -> index 32007f31a9..9b2f3b3ea9 100644 -> -> --- a/blockjob.c -> -> +++ b/blockjob.c -> -> @@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job) -> -> * one to make sure that such a concurrent access does not attempt -> -> * to process an already freed BdrvChild. -> -> */ -> -> + bdrv_drain_all_begin(); -> -> bdrv_graph_wrlock(); -> -> while (job->nodes) { -> -> GSList *l = job->nodes; -> -> @@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job) -> -> g_slist_free_1(l); -> -> } -> -> bdrv_graph_wrunlock(); -> -> + bdrv_drain_all_end(); -> -> } -> -> -> -> bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs) -> -> -This seems to fix the issue at hand. I can send a patch if this is -> -considered an acceptable approach. 
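To make the ordering problem concrete, here is a deliberately simplified model of the cycle discussed in this thread, written with plain pthreads rather than QEMU's graph lock and drain machinery (the names io_thread and in_flight are illustrative only; the sketch claims nothing about QEMU internals beyond what is described above): one thread takes the exclusive lock and then polls for in_flight to drop to zero, while the other thread bumps in_flight and then blocks on the shared side of the same lock. Draining before taking the exclusive lock, as the hunk above does, breaks this cycle, because the in-flight request can still take the read lock and complete while the drain is polling.

  /* Simplified two-thread model of the drain-under-wrlock deadlock.
   * Build with: gcc -O2 -pthread -o drain_model drain_model.c  (name is illustrative) */
  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdio.h>
  #include <unistd.h>

  static pthread_rwlock_t graph_lock = PTHREAD_RWLOCK_INITIALIZER;
  static atomic_int in_flight;

  static void *io_thread(void *arg)
  {
      (void)arg;
      atomic_fetch_add(&in_flight, 1);      /* request counted as in flight */
      pthread_rwlock_rdlock(&graph_lock);   /* blocks: a writer holds the lock */
      pthread_rwlock_unlock(&graph_lock);
      atomic_fetch_sub(&in_flight, 1);      /* never reached in this model */
      return NULL;
  }

  int main(void)
  {
      pthread_t t;

      pthread_rwlock_wrlock(&graph_lock);   /* exclusive "graph change" section */
      pthread_create(&t, NULL, io_thread, NULL);
      sleep(1);                             /* let the request get counted */

      /* "drain": wait for in-flight requests to finish while still holding
       * the write lock -- spins forever, i.e. the deadlock.  Doing this wait
       * before taking the write lock lets io_thread finish first. */
      while (atomic_load(&in_flight) > 0) {
          /* busy-wait stands in for the nested aio_poll() loop */
      }

      pthread_rwlock_unlock(&graph_lock);
      pthread_join(t, NULL);
      printf("not reached\n");
      return 0;
  }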
-> -> -Best Regards, -> -Fiona -> -Hello Fiona, - -Thanks for looking into it. I've tried your 3rd option above and can -confirm it does fix the deadlock, at least I can't reproduce it. Other -iotests also don't seem to be breaking. So I personally am fine with -that patch. Would be nice to hear a word from the maintainers though on -whether there're any caveats with such approach. - -Andrey - -On Wed, Apr 30, 2025 at 10:11 AM Andrey Drobyshev -<andrey.drobyshev@virtuozzo.com> wrote: -> -> -On 4/30/25 11:47 AM, Fiona Ebner wrote: -> -> Am 24.04.25 um 19:32 schrieb Andrey Drobyshev: -> ->> So it looks like main thread is processing job-dismiss request and is -> ->> holding write lock taken in block_job_remove_all_bdrv() (frame #20 -> ->> above). At the same time iothread spawns a coroutine which performs IO -> ->> request. Before the coroutine is spawned, blk_aio_prwv() increases -> ->> 'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is -> ->> trying to acquire the read lock. But main thread isn't releasing the -> ->> lock as blk_root_drained_poll() returns true since blk->in_flight > 0. -> ->> Here's the deadlock. -> -> -> -> And for the IO test you provided, it's client->nb_requests that behaves -> -> similarly to blk->in_flight here. -> -> -> -> The issue also reproduces easily when issuing the following QMP command -> -> in a loop while doing IO on a device: -> -> -> ->> void qmp_block_locked_drain(const char *node_name, Error **errp) -> ->> { -> ->> BlockDriverState *bs; -> ->> -> ->> bs = bdrv_find_node(node_name); -> ->> if (!bs) { -> ->> error_setg(errp, "node not found"); -> ->> return; -> ->> } -> ->> -> ->> bdrv_graph_wrlock(); -> ->> bdrv_drained_begin(bs); -> ->> bdrv_drained_end(bs); -> ->> bdrv_graph_wrunlock(); -> ->> } -> -> -> -> It seems like either it would be necessary to require: -> -> 1. not draining inside an exclusively locked section -> -> or -> -> 2. making sure that variables used by drained_poll routines are only set -> -> while holding the reader lock -> -> ? -> -> -> -> Those seem to require rather involved changes, so a third option might -> -> be to make draining inside an exclusively locked section possible, by -> -> embedding such locked sections in a drained section: -> -> -> ->> diff --git a/blockjob.c b/blockjob.c -> ->> index 32007f31a9..9b2f3b3ea9 100644 -> ->> --- a/blockjob.c -> ->> +++ b/blockjob.c -> ->> @@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job) -> ->> * one to make sure that such a concurrent access does not attempt -> ->> * to process an already freed BdrvChild. -> ->> */ -> ->> + bdrv_drain_all_begin(); -> ->> bdrv_graph_wrlock(); -> ->> while (job->nodes) { -> ->> GSList *l = job->nodes; -> ->> @@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job) -> ->> g_slist_free_1(l); -> ->> } -> ->> bdrv_graph_wrunlock(); -> ->> + bdrv_drain_all_end(); -> ->> } -> ->> -> ->> bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs) -> -> -> -> This seems to fix the issue at hand. I can send a patch if this is -> -> considered an acceptable approach. -Kevin is aware of this thread but it's a public holiday tomorrow so it -may be a little longer. - -Stefan - -Am 24.04.2025 um 19:32 hat Andrey Drobyshev geschrieben: -> -Hi all, -> -> -There's a bug in block layer which leads to block graph deadlock. -> -Notably, it takes place when blockdev IO is processed within a separate -> -iothread. 
-> -> -This was initially caught by our tests, and I was able to reduce it to a -> -relatively simple reproducer. Such deadlocks are probably supposed to -> -be covered in iotests/graph-changes-while-io, but this deadlock isn't. -> -> -Basically what the reproducer does is launches QEMU with a drive having -> -'iothread' option set, creates a chain of 2 snapshots, launches -> -block-commit job for a snapshot and then dismisses the job, starting -> -from the lower snapshot. If the guest is issuing IO at the same time, -> -there's a race in acquiring block graph lock and a potential deadlock. -> -> -Here's how it can be reproduced: -> -> -1. Run QEMU: -> -> SRCDIR=/path/to/srcdir -> -> -> -> -> -> -> -> -> -> $SRCDIR/build/qemu-system-x86_64 -enable-kvm \ -> -> -> -> -machine q35 -cpu Nehalem \ -> -> -> -> -name guest=alma8-vm,debug-threads=on \ -> -> -> -> -m 2g -smp 2 \ -> -> -> -> -nographic -nodefaults \ -> -> -> -> -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \ -> -> -> -> -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \ -> -> -> -> -object iothread,id=iothread0 \ -> -> -> -> -blockdev -> -> node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2 -> -> \ -> -> -device virtio-blk-pci,drive=disk,iothread=iothread0 -> -> -2. Launch IO (random reads) from within the guest: -> -> nc -U /var/run/alma8-serial.sock -> -> ... -> -> [root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k -> -> --size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting -> -> --rw=randread --iodepth=1 --filename=/testfile -> -> -3. Run snapshots creation & removal of lower snapshot operation in a -> -loop (script attached): -> -> while /bin/true ; do ./remove_lower_snap.sh ; done -> -> -And then it occasionally hangs. -> -> -Note: I've tried bisecting this, and looks like deadlock occurs starting -> -from the following commit: -> -> -(BAD) 5bdbaebcce virtio: Re-enable notifications after drain -> -(GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll -> -> -On the latest v10.0.0 it does hang as well. 
-> -> -> -Here's backtrace of the main thread: -> -> -> #0 0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1, -> -> timeout=<optimized out>, sigmask=0x0) at -> -> ../sysdeps/unix/sysv/linux/ppoll.c:43 -> -> #1 0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1, -> -> timeout=-1) at ../util/qemu-timer.c:329 -> -> #2 0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20, -> -> ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79 -> -> #3 0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) at -> -> ../util/aio-posix.c:730 -> -> #4 0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950, -> -> parent=0x0, poll=true) at ../block/io.c:378 -> -> #5 0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at -> -> ../block/io.c:391 -> -> #6 0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7682 -> -> #7 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7964250, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7608 -> -> #8 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7668 -> -> #9 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7e59110, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7608 -> -> #10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7668 -> -> #11 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb814ed80, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7608 -> -> #12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../blockjob.c:157 -> -> #13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb7c9d3f0, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7592 -> -> #14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7661 -> -> #15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx -> -> (child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = -> -> {...}, tran=0x557eb7a87160, errp=0x0) at ../block.c:1234 -> -> #16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb8565af0, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7592 -> -> #17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0, -> -> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -> -> errp=0x0) -> -> at ../block.c:7661 -> -> #18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context (bs=0x557eb79575e0, -> -> ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at ../block.c:7715 -> -> #19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) at -> -> ../block.c:3317 -> -> #20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv (job=0x557eb7952800) at -> -> ../blockjob.c:209 -> -> #21 
0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at -> -> ../blockjob.c:82 -> -> #22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at -> -> ../job.c:474 -> -> #23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at -> -> ../job.c:771 -> -> #24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400, -> -> errp=0x7ffd94b4f488) at ../job.c:783 -> -> --Type <RET> for more, q to quit, c to continue without paging-- -> -> #25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 -> -> "commit-snap1", errp=0x7ffd94b4f488) at ../job-qmp.c:138 -> -> #26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0, -> -> ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221 -> -> #27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at -> -> ../qapi/qmp-dispatch.c:128 -> -> #28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at -> -> ../util/async.c:172 -> -> #29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at -> -> ../util/async.c:219 -> -> #30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at -> -> ../util/aio-posix.c:436 -> -> #31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200, -> -> callback=0x0, user_data=0x0) at ../util/async.c:361 -> -> #32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at -> -> ../glib/gmain.c:3364 -> -> #33 g_main_context_dispatch (context=0x557eb76c6430) at ../glib/gmain.c:4079 -> -> #34 0x0000557eb47d3ab1 in glib_pollfds_poll () at ../util/main-loop.c:287 -> -> #35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at -> -> ../util/main-loop.c:310 -> -> #36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at -> -> ../util/main-loop.c:589 -> -> #37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835 -> -> #38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at -> -> ../system/main.c:50 -> -> #39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at -> -> ../system/main.c:80 -> -> -> -And here's coroutine trying to acquire read lock: -> -> -> (gdb) qemu coroutine reader_queue->entries.sqh_first -> -> #0 0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0, -> -> to_=0x7fc537fff508, action=COROUTINE_YIELD) at -> -> ../util/coroutine-ucontext.c:321 -> -> #1 0x0000557eb47d4d4a in qemu_coroutine_yield () at -> -> ../util/qemu-coroutine.c:339 -> -> #2 0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0 -> -> <reader_queue>, lock=0x7fc53c57de50, flags=0) at -> -> ../util/qemu-coroutine-lock.c:60 -> -> #3 0x0000557eb461fea7 in bdrv_graph_co_rdlock () at -> -> ../block/graph-lock.c:231 -> -> #4 0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) at -> -> /home/root/src/qemu/master/include/block/graph-lock.h:213 -> -> #5 0x0000557eb460fa41 in blk_co_do_preadv_part -> -> (blk=0x557eb84c0810, offset=6890553344, bytes=4096, -> -> qiov=0x7fc530006988, qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at -> -> ../block/block-backend.c:1339 -> -> #6 0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at -> -> ../block/block-backend.c:1619 -> -> #7 0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, i1=21886) -> -> at ../util/coroutine-ucontext.c:175 -> -> #8 0x00007fc547c2a360 in __start_context () at -> -> ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 -> -> #9 0x00007ffd94b4ea40 in () -> -> #10 0x0000000000000000 in () -> -> -> -So it looks like main thread is processing job-dismiss request and is -> -holding write lock taken in 
block_job_remove_all_bdrv() (frame #20 -> -above). At the same time iothread spawns a coroutine which performs IO -> -request. Before the coroutine is spawned, blk_aio_prwv() increases -> -'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is -> -trying to acquire the read lock. But main thread isn't releasing the -> -lock as blk_root_drained_poll() returns true since blk->in_flight > 0. -> -Here's the deadlock. -> -> -Any comments and suggestions on the subject are welcomed. Thanks! -I think this is what the blk_wait_while_drained() call was supposed to -address in blk_co_do_preadv_part(). However, with the use of multiple -I/O threads, this is racy. - -Do you think that in your case we hit the small race window between the -checks in blk_wait_while_drained() and GRAPH_RDLOCK_GUARD()? Or is there -another reason why blk_wait_while_drained() didn't do its job? - -Kevin - -On 5/2/25 19:34, Kevin Wolf wrote: -Am 24.04.2025 um 19:32 hat Andrey Drobyshev geschrieben: -Hi all, - -There's a bug in block layer which leads to block graph deadlock. -Notably, it takes place when blockdev IO is processed within a separate -iothread. - -This was initially caught by our tests, and I was able to reduce it to a -relatively simple reproducer. Such deadlocks are probably supposed to -be covered in iotests/graph-changes-while-io, but this deadlock isn't. - -Basically what the reproducer does is launches QEMU with a drive having -'iothread' option set, creates a chain of 2 snapshots, launches -block-commit job for a snapshot and then dismisses the job, starting -from the lower snapshot. If the guest is issuing IO at the same time, -there's a race in acquiring block graph lock and a potential deadlock. - -Here's how it can be reproduced: - -1. Run QEMU: -SRCDIR=/path/to/srcdir -$SRCDIR/build/qemu-system-x86_64 -enable-kvm \ --machine q35 -cpu Nehalem \ - -name guest=alma8-vm,debug-threads=on \ - -m 2g -smp 2 \ - -nographic -nodefaults \ - -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \ - -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \ - -object iothread,id=iothread0 \ - -blockdev -node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2 - \ - -device virtio-blk-pci,drive=disk,iothread=iothread0 -2. Launch IO (random reads) from within the guest: -nc -U /var/run/alma8-serial.sock -... -[root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k ---size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting ---rw=randread --iodepth=1 --filename=/testfile -3. Run snapshots creation & removal of lower snapshot operation in a -loop (script attached): -while /bin/true ; do ./remove_lower_snap.sh ; done -And then it occasionally hangs. - -Note: I've tried bisecting this, and looks like deadlock occurs starting -from the following commit: - -(BAD) 5bdbaebcce virtio: Re-enable notifications after drain -(GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll - -On the latest v10.0.0 it does hang as well. 
- - -Here's backtrace of the main thread: -#0 0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1, timeout=<optimized -out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:43 -#1 0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1, timeout=-1) -at ../util/qemu-timer.c:329 -#2 0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20, -ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79 -#3 0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) at -../util/aio-posix.c:730 -#4 0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950, parent=0x0, -poll=true) at ../block/io.c:378 -#5 0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at -../block/io.c:391 -#6 0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7682 -#7 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7964250, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7608 -#8 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7668 -#9 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7e59110, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7608 -#10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7668 -#11 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb814ed80, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7608 -#12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../blockjob.c:157 -#13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb7c9d3f0, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7592 -#14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7661 -#15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx - (child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -tran=0x557eb7a87160, errp=0x0) at ../block.c:1234 -#16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb8565af0, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7592 -#17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0, -ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, -errp=0x0) - at ../block.c:7661 -#18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context (bs=0x557eb79575e0, -ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at ../block.c:7715 -#19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) at -../block.c:3317 -#20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv (job=0x557eb7952800) at -../blockjob.c:209 -#21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at -../blockjob.c:82 -#22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at ../job.c:474 -#23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at -../job.c:771 -#24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400, -errp=0x7ffd94b4f488) 
at ../job.c:783 ---Type <RET> for more, q to quit, c to continue without paging-- -#25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 "commit-snap1", -errp=0x7ffd94b4f488) at ../job-qmp.c:138 -#26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0, -ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221 -#27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at -../qapi/qmp-dispatch.c:128 -#28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at ../util/async.c:172 -#29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at -../util/async.c:219 -#30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at -../util/aio-posix.c:436 -#31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200, -callback=0x0, user_data=0x0) at ../util/async.c:361 -#32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at -../glib/gmain.c:3364 -#33 g_main_context_dispatch (context=0x557eb76c6430) at ../glib/gmain.c:4079 -#34 0x0000557eb47d3ab1 in glib_pollfds_poll () at ../util/main-loop.c:287 -#35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at -../util/main-loop.c:310 -#36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at -../util/main-loop.c:589 -#37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835 -#38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at ../system/main.c:50 -#39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at -../system/main.c:80 -And here's coroutine trying to acquire read lock: -(gdb) qemu coroutine reader_queue->entries.sqh_first -#0 0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0, -to_=0x7fc537fff508, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:321 -#1 0x0000557eb47d4d4a in qemu_coroutine_yield () at -../util/qemu-coroutine.c:339 -#2 0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0 -<reader_queue>, lock=0x7fc53c57de50, flags=0) at -../util/qemu-coroutine-lock.c:60 -#3 0x0000557eb461fea7 in bdrv_graph_co_rdlock () at ../block/graph-lock.c:231 -#4 0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) at -/home/root/src/qemu/master/include/block/graph-lock.h:213 -#5 0x0000557eb460fa41 in blk_co_do_preadv_part - (blk=0x557eb84c0810, offset=6890553344, bytes=4096, qiov=0x7fc530006988, -qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at ../block/block-backend.c:1339 -#6 0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at -../block/block-backend.c:1619 -#7 0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, i1=21886) at -../util/coroutine-ucontext.c:175 -#8 0x00007fc547c2a360 in __start_context () at -../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 -#9 0x00007ffd94b4ea40 in () -#10 0x0000000000000000 in () -So it looks like main thread is processing job-dismiss request and is -holding write lock taken in block_job_remove_all_bdrv() (frame #20 -above). At the same time iothread spawns a coroutine which performs IO -request. Before the coroutine is spawned, blk_aio_prwv() increases -'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is -trying to acquire the read lock. But main thread isn't releasing the -lock as blk_root_drained_poll() returns true since blk->in_flight > 0. -Here's the deadlock. - -Any comments and suggestions on the subject are welcomed. Thanks! -I think this is what the blk_wait_while_drained() call was supposed to -address in blk_co_do_preadv_part(). However, with the use of multiple -I/O threads, this is racy. 
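To make the circular wait easier to follow outside of QEMU, here is a minimal stand-alone model of it (ordinary pthreads, invented names, explicitly not QEMU code): one thread stands in for the main loop that has taken the graph write lock and then drains by polling an in-flight counter, the other stands in for the I/O request that has already bumped the counter and then waits for the read lock.

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t graph_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_int in_flight;

/* Stand-in for the request coroutine: in_flight is raised first
 * (blk_aio_prwv() style), the read lock is taken second
 * (GRAPH_RDLOCK_GUARD() style). */
static void *io_request(void *arg)
{
    atomic_fetch_add(&in_flight, 1);
    pthread_rwlock_rdlock(&graph_lock);   /* blocks: the writer holds the lock */
    pthread_rwlock_unlock(&graph_lock);
    atomic_fetch_sub(&in_flight, 1);
    return NULL;
}

int main(void)
{
    pthread_t iothread;

    pthread_rwlock_wrlock(&graph_lock);   /* stand-in for the job-dismiss path */
    pthread_create(&iothread, NULL, io_request, NULL);
    sleep(1);                             /* let the request get queued */

    /* Stand-in for the drain: poll in_flight while still holding the write
     * lock.  A real drain would poll forever; give up after a few rounds so
     * the demo terminates. */
    for (int i = 0; i < 5 && atomic_load(&in_flight) > 0; i++) {
        fprintf(stderr, "draining, in_flight=%d\n", atomic_load(&in_flight));
        sleep(1);
    }
    if (atomic_load(&in_flight) > 0) {
        fprintf(stderr, "deadlock: reader waits for the lock, writer waits for in_flight\n");
    }
    return 0;
}

Built with "gcc -pthread", this prints the drain loop a few times and then the deadlock message; dropping the counter before waiting for the lock, or not draining while holding the write lock, breaks the cycle.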
- -Do you think that in your case we hit the small race window between the -checks in blk_wait_while_drained() and GRAPH_RDLOCK_GUARD()? Or is there -another reason why blk_wait_while_drained() didn't do its job? - -Kevin -At my opinion there is very big race window. Main thread has -eaten graph write lock. After that another coroutine is stalled -within GRAPH_RDLOCK_GUARD() as there is no drain at the moment and only -after that main thread has started drain. That is why Fiona's idea is -looking working. Though this would mean that normally we should always -do that at the moment when we acquire write lock. May be even inside -this function. Den - -Am 02.05.2025 um 19:52 hat Denis V. Lunev geschrieben: -> -On 5/2/25 19:34, Kevin Wolf wrote: -> -> Am 24.04.2025 um 19:32 hat Andrey Drobyshev geschrieben: -> -> > Hi all, -> -> > -> -> > There's a bug in block layer which leads to block graph deadlock. -> -> > Notably, it takes place when blockdev IO is processed within a separate -> -> > iothread. -> -> > -> -> > This was initially caught by our tests, and I was able to reduce it to a -> -> > relatively simple reproducer. Such deadlocks are probably supposed to -> -> > be covered in iotests/graph-changes-while-io, but this deadlock isn't. -> -> > -> -> > Basically what the reproducer does is launches QEMU with a drive having -> -> > 'iothread' option set, creates a chain of 2 snapshots, launches -> -> > block-commit job for a snapshot and then dismisses the job, starting -> -> > from the lower snapshot. If the guest is issuing IO at the same time, -> -> > there's a race in acquiring block graph lock and a potential deadlock. -> -> > -> -> > Here's how it can be reproduced: -> -> > -> -> > 1. Run QEMU: -> -> > > SRCDIR=/path/to/srcdir -> -> > > $SRCDIR/build/qemu-system-x86_64 -enable-kvm \ -> -> > > -machine q35 -cpu Nehalem \ -> -> > > -name guest=alma8-vm,debug-threads=on \ -> -> > > -m 2g -smp 2 \ -> -> > > -nographic -nodefaults \ -> -> > > -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \ -> -> > > -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \ -> -> > > -object iothread,id=iothread0 \ -> -> > > -blockdev -> -> > > node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2 -> -> > > \ -> -> > > -device virtio-blk-pci,drive=disk,iothread=iothread0 -> -> > 2. Launch IO (random reads) from within the guest: -> -> > > nc -U /var/run/alma8-serial.sock -> -> > > ... -> -> > > [root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 -> -> > > --bs=4k --size=1G --numjobs=1 --time_based=1 --runtime=300 -> -> > > --group_reporting --rw=randread --iodepth=1 --filename=/testfile -> -> > 3. Run snapshots creation & removal of lower snapshot operation in a -> -> > loop (script attached): -> -> > > while /bin/true ; do ./remove_lower_snap.sh ; done -> -> > And then it occasionally hangs. -> -> > -> -> > Note: I've tried bisecting this, and looks like deadlock occurs starting -> -> > from the following commit: -> -> > -> -> > (BAD) 5bdbaebcce virtio: Re-enable notifications after drain -> -> > (GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll -> -> > -> -> > On the latest v10.0.0 it does hang as well. 
-> -> > -> -> > -> -> > Here's backtrace of the main thread: -> -> > -> -> > > #0 0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1, -> -> > > timeout=<optimized out>, sigmask=0x0) at -> -> > > ../sysdeps/unix/sysv/linux/ppoll.c:43 -> -> > > #1 0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1, -> -> > > timeout=-1) at ../util/qemu-timer.c:329 -> -> > > #2 0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20, -> -> > > ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79 -> -> > > #3 0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) -> -> > > at ../util/aio-posix.c:730 -> -> > > #4 0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950, -> -> > > parent=0x0, poll=true) at ../block/io.c:378 -> -> > > #5 0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at -> -> > > ../block/io.c:391 -> -> > > #6 0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950, -> -> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7682 -> -> > > #7 0x0000557eb45ebf2b in bdrv_child_change_aio_context -> -> > > (c=0x557eb7964250, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7608 -> -> > > #8 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0, -> -> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7668 -> -> > > #9 0x0000557eb45ebf2b in bdrv_child_change_aio_context -> -> > > (c=0x557eb7e59110, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7608 -> -> > > #10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960, -> -> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7668 -> -> > > #11 0x0000557eb45ebf2b in bdrv_child_change_aio_context -> -> > > (c=0x557eb814ed80, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7608 -> -> > > #12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0, -> -> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../blockjob.c:157 -> -> > > #13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context -> -> > > (c=0x557eb7c9d3f0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7592 -> -> > > #14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310, -> -> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7661 -> -> > > #15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx -> -> > > (child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 -> -> > > = {...}, tran=0x557eb7a87160, errp=0x0) at ../block.c:1234 -> -> > > #16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context -> -> > > (c=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7592 -> -> > > #17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0, -> -> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, -> -> > > tran=0x557eb7a87160, errp=0x0) -> -> > > at ../block.c:7661 -> -> > > #18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context -> -> > > (bs=0x557eb79575e0, ctx=0x557eb76c5f20, 
ignore_child=0x0, errp=0x0) at -> -> > > ../block.c:7715 -> -> > > #19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) -> -> > > at ../block.c:3317 -> -> > > #20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv -> -> > > (job=0x557eb7952800) at ../blockjob.c:209 -> -> > > #21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at -> -> > > ../blockjob.c:82 -> -> > > #22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at -> -> > > ../job.c:474 -> -> > > #23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at -> -> > > ../job.c:771 -> -> > > #24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400, -> -> > > errp=0x7ffd94b4f488) at ../job.c:783 -> -> > > --Type <RET> for more, q to quit, c to continue without paging-- -> -> > > #25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 -> -> > > "commit-snap1", errp=0x7ffd94b4f488) at ../job-qmp.c:138 -> -> > > #26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0, -> -> > > ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221 -> -> > > #27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at -> -> > > ../qapi/qmp-dispatch.c:128 -> -> > > #28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at -> -> > > ../util/async.c:172 -> -> > > #29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at -> -> > > ../util/async.c:219 -> -> > > #30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at -> -> > > ../util/aio-posix.c:436 -> -> > > #31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200, -> -> > > callback=0x0, user_data=0x0) at ../util/async.c:361 -> -> > > #32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at -> -> > > ../glib/gmain.c:3364 -> -> > > #33 g_main_context_dispatch (context=0x557eb76c6430) at -> -> > > ../glib/gmain.c:4079 -> -> > > #34 0x0000557eb47d3ab1 in glib_pollfds_poll () at -> -> > > ../util/main-loop.c:287 -> -> > > #35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at -> -> > > ../util/main-loop.c:310 -> -> > > #36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at -> -> > > ../util/main-loop.c:589 -> -> > > #37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835 -> -> > > #38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at -> -> > > ../system/main.c:50 -> -> > > #39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at -> -> > > ../system/main.c:80 -> -> > -> -> > And here's coroutine trying to acquire read lock: -> -> > -> -> > > (gdb) qemu coroutine reader_queue->entries.sqh_first -> -> > > #0 0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0, -> -> > > to_=0x7fc537fff508, action=COROUTINE_YIELD) at -> -> > > ../util/coroutine-ucontext.c:321 -> -> > > #1 0x0000557eb47d4d4a in qemu_coroutine_yield () at -> -> > > ../util/qemu-coroutine.c:339 -> -> > > #2 0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0 -> -> > > <reader_queue>, lock=0x7fc53c57de50, flags=0) at -> -> > > ../util/qemu-coroutine-lock.c:60 -> -> > > #3 0x0000557eb461fea7 in bdrv_graph_co_rdlock () at -> -> > > ../block/graph-lock.c:231 -> -> > > #4 0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) -> -> > > at /home/root/src/qemu/master/include/block/graph-lock.h:213 -> -> > > #5 0x0000557eb460fa41 in blk_co_do_preadv_part -> -> > > (blk=0x557eb84c0810, offset=6890553344, bytes=4096, -> -> > > qiov=0x7fc530006988, qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at -> -> > > ../block/block-backend.c:1339 -> -> 
> > #6 0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at -> -> > > ../block/block-backend.c:1619 -> -> > > #7 0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, -> -> > > i1=21886) at ../util/coroutine-ucontext.c:175 -> -> > > #8 0x00007fc547c2a360 in __start_context () at -> -> > > ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 -> -> > > #9 0x00007ffd94b4ea40 in () -> -> > > #10 0x0000000000000000 in () -> -> > -> -> > So it looks like main thread is processing job-dismiss request and is -> -> > holding write lock taken in block_job_remove_all_bdrv() (frame #20 -> -> > above). At the same time iothread spawns a coroutine which performs IO -> -> > request. Before the coroutine is spawned, blk_aio_prwv() increases -> -> > 'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is -> -> > trying to acquire the read lock. But main thread isn't releasing the -> -> > lock as blk_root_drained_poll() returns true since blk->in_flight > 0. -> -> > Here's the deadlock. -> -> > -> -> > Any comments and suggestions on the subject are welcomed. Thanks! -> -> I think this is what the blk_wait_while_drained() call was supposed to -> -> address in blk_co_do_preadv_part(). However, with the use of multiple -> -> I/O threads, this is racy. -> -> -> -> Do you think that in your case we hit the small race window between the -> -> checks in blk_wait_while_drained() and GRAPH_RDLOCK_GUARD()? Or is there -> -> another reason why blk_wait_while_drained() didn't do its job? -> -> -> -At my opinion there is very big race window. Main thread has -> -eaten graph write lock. After that another coroutine is stalled -> -within GRAPH_RDLOCK_GUARD() as there is no drain at the moment and only -> -after that main thread has started drain. -You're right, I confused taking the write lock with draining there. - -> -That is why Fiona's idea is looking working. Though this would mean -> -that normally we should always do that at the moment when we acquire -> -write lock. May be even inside this function. -I actually see now that not all of my graph locking patches were merged. -At least I did have the thought that bdrv_drained_begin() must be marked -GRAPH_UNLOCKED because it polls. That means that calling it from inside -bdrv_try_change_aio_context() is actually forbidden (and that's the part -I didn't see back then because it doesn't have TSA annotations). - -If you refactor the code to move the drain out to before the lock is -taken, I think you end up with Fiona's patch, except you'll remove the -forbidden inner drain and add more annotations for some functions and -clarify the rules around them. I don't know, but I wouldn't be surprised -if along the process we find other bugs, too. - -So Fiona's drain looks right to me, but we should probably approach it -more systematically. 
- -Kevin - diff --git a/classification_output/04/device/24930826 b/classification_output/04/device/24930826 deleted file mode 100644 index b4e6bcf3c..000000000 --- a/classification_output/04/device/24930826 +++ /dev/null @@ -1,41 +0,0 @@ -device: 0.709 -graphic: 0.667 -mistranslation: 0.637 -instruction: 0.555 -other: 0.535 -network: 0.513 -semantic: 0.487 -vnc: 0.473 -socket: 0.447 -boot: 0.218 -KVM: 0.172 -assembly: 0.142 - -[Qemu-devel] [BUG] vhost-user: hot-unplug vhost-user nic for windows guest OS will fail with 100% reproduce rate - -Hi, guys - -I met a problem when hot-unplug vhost-user nic for Windows 2008 rc2 sp1 64 -(Guest OS) - -The xml of nic is as followed: -<interface type='vhostuser'> - <mac address='52:54:00:3b:83:aa'/> - <source type='unix' path='/var/run/vhost-user/port1' mode='client'/> - <target dev='port1'/> - <model type='virtio'/> - <driver queues='4'/> - <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/> -</interface> - -Firstly, I use virsh attach-device win2008 vif.xml to hot-plug a nic for Guest -OS. This operation returns success. -After guest OS discover nic successfully, I use virsh detach-device win2008 -vif.xml to hot-unplug it. This operation will fail with 100% reproduce rate. - -However, if I hot-plug and hot-unplug virtio-net nic , it will not fail. - -I have analysis the process of qmp_device_del , I found that qemu have inject -interrupt to acpi to let it notice guest OS to remove nic. -I guess there is something wrong in Windows when handle the interrupt. - diff --git a/classification_output/04/device/28596630 b/classification_output/04/device/28596630 deleted file mode 100644 index 91a754e21..000000000 --- a/classification_output/04/device/28596630 +++ /dev/null @@ -1,121 +0,0 @@ -device: 0.835 -semantic: 0.814 -mistranslation: 0.813 -graphic: 0.785 -network: 0.780 -instruction: 0.748 -assembly: 0.725 -other: 0.707 -socket: 0.697 -vnc: 0.674 -KVM: 0.649 -boot: 0.609 - -[Qemu-devel] [BUG] [low severity] a strange appearance of message involving slirp while doing "empty" make - -Folks, - -If qemu tree is already fully built, and "make" is attempted, for 3.1, the -outcome is: - -$ make - CHK version_gen.h -$ - -For 4.0-rc0, the outcome seems to be different: - -$ make -make[1]: Entering directory '/home/build/malta-mips64r6/qemu-4.0/slirp' -make[1]: Nothing to be done for 'all'. -make[1]: Leaving directory '/home/build/malta-mips64r6/qemu-4.0/slirp' - CHK version_gen.h -$ - -Not sure how significant is that, but I report it just in case. - -Yours, -Aleksandar - -On 20/03/2019 22.08, Aleksandar Markovic wrote: -> -Folks, -> -> -If qemu tree is already fully built, and "make" is attempted, for 3.1, the -> -outcome is: -> -> -$ make -> -CHK version_gen.h -> -$ -> -> -For 4.0-rc0, the outcome seems to be different: -> -> -$ make -> -make[1]: Entering directory '/home/build/malta-mips64r6/qemu-4.0/slirp' -> -make[1]: Nothing to be done for 'all'. -> -make[1]: Leaving directory '/home/build/malta-mips64r6/qemu-4.0/slirp' -> -CHK version_gen.h -> -$ -> -> -Not sure how significant is that, but I report it just in case. -It's likely because slirp is currently being reworked to become a -separate project, so the makefiles have been changed a little bit. I -guess the message will go away again once slirp has become a stand-alone -library. 
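(Independent of which project ends up shipping slirp: the "Entering/Leaving directory" lines themselves are ordinary recursive-make output. They are implied whenever the sub-make is invoked via "make -C" and can be silenced by also passing --no-print-directory, e.g.:

make --no-print-directory -C slirp

which appears to be the effect the usual sub-make flags have for non-verbose builds.)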
- - Thomas - -On Fri, 22 Mar 2019 at 04:59, Thomas Huth <address@hidden> wrote: -> -On 20/03/2019 22.08, Aleksandar Markovic wrote: -> -> $ make -> -> make[1]: Entering directory '/home/build/malta-mips64r6/qemu-4.0/slirp' -> -> make[1]: Nothing to be done for 'all'. -> -> make[1]: Leaving directory '/home/build/malta-mips64r6/qemu-4.0/slirp' -> -> CHK version_gen.h -> -> $ -> -> -> -> Not sure how significant is that, but I report it just in case. -> -> -It's likely because slirp is currently being reworked to become a -> -separate project, so the makefiles have been changed a little bit. I -> -guess the message will go away again once slirp has become a stand-alone -> -library. -Well, we'll still need to ship slirp for the foreseeable future... - -I think the cause of this is that the rule in Makefile for -calling the slirp Makefile is not passing it $(SUBDIR_MAKEFLAGS) -like all the other recursive make invocations. If we do that -then we'll suppress the entering/leaving messages for -non-verbose builds. (Some tweaking will be needed as -it looks like the slirp makefile has picked an incompatible -meaning for $BUILD_DIR, which the SUBDIR_MAKEFLAGS will -also be passing to it.) - -thanks --- PMM - diff --git a/classification_output/04/device/42226390 b/classification_output/04/device/42226390 deleted file mode 100644 index 685aeaae5..000000000 --- a/classification_output/04/device/42226390 +++ /dev/null @@ -1,195 +0,0 @@ -device: 0.951 -boot: 0.943 -graphic: 0.942 -instruction: 0.925 -semantic: 0.924 -assembly: 0.919 -KVM: 0.905 -network: 0.894 -other: 0.894 -socket: 0.882 -vnc: 0.853 -mistranslation: 0.826 - -[BUG] AArch64 boot hang with -icount and -smp >1 (iothread locking issue?) - -Hello, - -I am encountering one or more bugs when using -icount and -smp >1 that I am -attempting to sort out. My current theory is that it is an iothread locking -issue. - -I am using a command-line like the following where $kernel is a recent upstream -AArch64 Linux kernel Image (I can provide a binary if that would be helpful - -let me know how is best to post): - - qemu-system-aarch64 \ - -M virt -cpu cortex-a57 -m 1G \ - -nographic \ - -smp 2 \ - -icount 0 \ - -kernel $kernel - -For any/all of the symptoms described below, they seem to disappear when I -either remove `-icount 0` or change smp to `-smp 1`. In other words, it is the -combination of `-smp >1` and `-icount` which triggers what I'm seeing. - -I am seeing two different (but seemingly related) behaviors. The first (and -what I originally started debugging) shows up as a boot hang. When booting -using the above command after Peter's "icount: Take iothread lock when running -QEMU timers" patch [1], The kernel boots for a while and then hangs after: - -> -...snip... 
-> -[ 0.010764] Serial: AMBA PL011 UART driver -> -[ 0.016334] 9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 13, base_baud -> -= 0) is a PL011 rev1 -> -[ 0.016907] printk: console [ttyAMA0] enabled -> -[ 0.017624] KASLR enabled -> -[ 0.031986] HugeTLB: registered 16.0 GiB page size, pre-allocated 0 pages -> -[ 0.031986] HugeTLB: 16320 KiB vmemmap can be freed for a 16.0 GiB page -> -[ 0.031986] HugeTLB: registered 512 MiB page size, pre-allocated 0 pages -> -[ 0.031986] HugeTLB: 448 KiB vmemmap can be freed for a 512 MiB page -> -[ 0.031986] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages -> -[ 0.031986] HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page -When it hangs here, I drop into QEMU's console, attach to the gdbserver, and it -always reports that it is at address 0xffff800008dc42e8 (as shown below from an -objdump of the vmlinux). I note this is in the middle of messing with timer -system registers - which makes me suspect we're attempting to take the iothread -lock when its already held: - -> -ffff800008dc42b8 <arch_timer_set_next_event_virt>: -> -ffff800008dc42b8: d503201f nop -> -ffff800008dc42bc: d503201f nop -> -ffff800008dc42c0: d503233f paciasp -> -ffff800008dc42c4: d53be321 mrs x1, cntv_ctl_el0 -> -ffff800008dc42c8: 32000021 orr w1, w1, #0x1 -> -ffff800008dc42cc: d5033fdf isb -> -ffff800008dc42d0: d53be042 mrs x2, cntvct_el0 -> -ffff800008dc42d4: ca020043 eor x3, x2, x2 -> -ffff800008dc42d8: 8b2363e3 add x3, sp, x3 -> -ffff800008dc42dc: f940007f ldr xzr, [x3] -> -ffff800008dc42e0: 8b020000 add x0, x0, x2 -> -ffff800008dc42e4: d51be340 msr cntv_cval_el0, x0 -> -* ffff800008dc42e8: 927ef820 and x0, x1, #0xfffffffffffffffd -> -ffff800008dc42ec: d51be320 msr cntv_ctl_el0, x0 -> -ffff800008dc42f0: d5033fdf isb -> -ffff800008dc42f4: 52800000 mov w0, #0x0 -> -// #0 -> -ffff800008dc42f8: d50323bf autiasp -> -ffff800008dc42fc: d65f03c0 ret -The second behavior is that prior to Peter's "icount: Take iothread lock when -running QEMU timers" patch [1], I observe the following message (same command -as above): - -> -ERROR:../accel/tcg/tcg-accel-ops.c:79:tcg_handle_interrupt: assertion failed: -> -(qemu_mutex_iothread_locked()) -> -Aborted (core dumped) -This is the same behavior described in Gitlab issue 1130 [0] and addressed by -[1]. I bisected the appearance of this assertion, and found it was introduced -by Pavel's "replay: rewrite async event handling" commit [2]. Commits prior to -that one boot successfully (neither assertions nor hangs) with `-icount 0 -smp -2`. - -I've looked over these two commits ([1], [2]), but it is not obvious to me -how/why they might be interacting to produce the boot hangs I'm seeing and -I welcome any help investigating further. - -Thanks! - --Aaron Lindsay - -[0] - -https://gitlab.com/qemu-project/qemu/-/issues/1130 -[1] - -https://gitlab.com/qemu-project/qemu/-/commit/c7f26ded6d5065e4116f630f6a490b55f6c5f58e -[2] - -https://gitlab.com/qemu-project/qemu/-/commit/60618e2d77691e44bb78e23b2b0cf07b5c405e56 - -On Fri, 21 Oct 2022 at 16:48, Aaron Lindsay -<aaron@os.amperecomputing.com> wrote: -> -> -Hello, -> -> -I am encountering one or more bugs when using -icount and -smp >1 that I am -> -attempting to sort out. My current theory is that it is an iothread locking -> -issue. -Weird coincidence, that is a bug that's been in the tree for months -but was only reported to me earlier this week. Try reverting -commit a82fd5a4ec24d923ff1e -- that should fix it. 
-CAFEAcA_i8x00hD-4XX18ySLNbCB6ds1-DSazVb4yDnF8skjd9A@mail.gmail.com -/">https://lore.kernel.org/qemu-devel/ -CAFEAcA_i8x00hD-4XX18ySLNbCB6ds1-DSazVb4yDnF8skjd9A@mail.gmail.com -/ -has the explanation. - -thanks --- PMM - -On Oct 21 17:00, Peter Maydell wrote: -> -On Fri, 21 Oct 2022 at 16:48, Aaron Lindsay -> -<aaron@os.amperecomputing.com> wrote: -> -> -> -> Hello, -> -> -> -> I am encountering one or more bugs when using -icount and -smp >1 that I am -> -> attempting to sort out. My current theory is that it is an iothread locking -> -> issue. -> -> -Weird coincidence, that is a bug that's been in the tree for months -> -but was only reported to me earlier this week. Try reverting -> -commit a82fd5a4ec24d923ff1e -- that should fix it. -I can confirm that reverting a82fd5a4ec24d923ff1e fixes it for me. -Thanks for the help and fast response! - --Aaron - diff --git a/classification_output/04/device/57195159 b/classification_output/04/device/57195159 deleted file mode 100644 index bfe64b270..000000000 --- a/classification_output/04/device/57195159 +++ /dev/null @@ -1,323 +0,0 @@ -device: 0.877 -other: 0.868 -graphic: 0.861 -instruction: 0.833 -semantic: 0.794 -assembly: 0.782 -boot: 0.781 -KVM: 0.752 -socket: 0.750 -network: 0.687 -mistranslation: 0.665 -vnc: 0.626 - -[BUG Report] Got a use-after-free error while start arm64 VM with lots of pci controllers - -Hi, - -We got a use-after-free report in our Euler Robot Test, it is can be reproduced -quite easily, -It can be reproduced by start VM with lots of pci controller and virtio-scsi -devices. -You can find the full qemu log from attachment. -We have analyzed the log and got the rough process how it happened, but don't -know how to fix it. - -Could anyone help to fix it ? - -The key message shows bellow: -har device redirected to /dev/pts/1 (label charserial0) -==1517174==WARNING: ASan doesn't fully support makecontext/swapcontext -functions and may produce false positives in some cases! 
-================================================================= -==1517174==ERROR: AddressSanitizer: heap-use-after-free on address -0xfffc31a002a0 at pc 0xaaad73e1f668 bp 0xfffc319fddb0 sp 0xfffc319fddd0 -READ of size 8 at 0xfffc31a002a0 thread T1 - #0 0xaaad73e1f667 in memory_region_unref /home/qemu/memory.c:1771 - #1 0xaaad73e1f667 in flatview_destroy /home/qemu/memory.c:291 - #2 0xaaad74adc85b in call_rcu_thread util/rcu.c:283 - #3 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519 - #4 0xfffc3a1678bb (/lib64/libpthread.so.0+0x78bb) - #5 0xfffc3a0a616b (/lib64/libc.so.6+0xd616b) - -0xfffc31a002a0 is located 544 bytes inside of 1440-byte region -[0xfffc31a00080,0xfffc31a00620) -freed by thread T37 (CPU 0/KVM) here: - #0 0xfffc3c102e23 in free (/lib64/libasan.so.4+0xd2e23) - #1 0xfffc3bbc729f in g_free (/lib64/libglib-2.0.so.0+0x5729f) - #2 0xaaad745cce03 in pci_bridge_update_mappings hw/pci/pci_bridge.c:245 - #3 0xaaad745ccf33 in pci_bridge_write_config hw/pci/pci_bridge.c:271 - #4 0xaaad745ba867 in pci_bridge_dev_write_config -hw/pci-bridge/pci_bridge_dev.c:153 - #5 0xaaad745d6013 in pci_host_config_write_common hw/pci/pci_host.c:81 - #6 0xaaad73e2346f in memory_region_write_accessor /home/qemu/memory.c:483 - #7 0xaaad73e1d9ff in access_with_adjusted_size /home/qemu/memory.c:544 - #8 0xaaad73e28d1f in memory_region_dispatch_write /home/qemu/memory.c:1482 - #9 0xaaad73d7274f in flatview_write_continue /home/qemu/exec.c:3167 - #10 0xaaad73d72a53 in flatview_write /home/qemu/exec.c:3207 - #11 0xaaad73d7c8c3 in address_space_write /home/qemu/exec.c:3297 - #12 0xaaad73e5059b in kvm_cpu_exec /home/qemu/accel/kvm/kvm-all.c:2386 - #13 0xaaad73e07ac7 in qemu_kvm_cpu_thread_fn /home/qemu/cpus.c:1246 - #14 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519 - #15 0xfffc3a1678bb (/lib64/libpthread.so.0+0x78bb) - #16 0xfffc3a0a616b (/lib64/libc.so.6+0xd616b) - -previously allocated by thread T0 here: - #0 0xfffc3c1031cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb) - #1 0xfffc3bbc7163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163) - #2 0xaaad745ccb57 in pci_bridge_region_init hw/pci/pci_bridge.c:188 - #3 0xaaad745cd8cb in pci_bridge_initfn hw/pci/pci_bridge.c:385 - #4 0xaaad745baaf3 in pci_bridge_dev_realize -hw/pci-bridge/pci_bridge_dev.c:64 - #5 0xaaad745cacd7 in pci_qdev_realize hw/pci/pci.c:2095 - #6 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865 - #7 0xaaad7485ed23 in property_set_bool qom/object.c:2102 - #8 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26 - #9 0xaaad74863a43 in object_property_set_bool qom/object.c:1360 - #10 0xaaad742a53b7 in qdev_device_add /home/qemu/qdev-monitor.c:675 - #11 0xaaad742a9c7b in device_init_func /home/qemu/vl.c:2074 - #12 0xaaad74ad4d33 in qemu_opts_foreach util/qemu-option.c:1170 - #13 0xaaad73d60c17 in main /home/qemu/vl.c:4313 - #14 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f) - #15 0xaaad73d6db33 -(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33) - -Thread T1 created by T0 here: - #0 0xfffc3c068f6f in __interceptor_pthread_create -(/lib64/libasan.so.4+0x38f6f) - #1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556 - #2 0xaaad74adc6a7 in rcu_init_complete util/rcu.c:326 - #3 0xaaad74bab2a7 in __libc_csu_init -(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x17cb2a7) - #4 0xfffc39ff0b47 in __libc_start_main (/lib64/libc.so.6+0x20b47) - #5 0xaaad73d6db33 (/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33) - -Thread T37 (CPU 0/KVM) created by T0 
here: - #0 0xfffc3c068f6f in __interceptor_pthread_create -(/lib64/libasan.so.4+0x38f6f) - #1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556 - #2 0xaaad73e09b0f in qemu_dummy_start_vcpu /home/qemu/cpus.c:2045 - #3 0xaaad73e09b0f in qemu_init_vcpu /home/qemu/cpus.c:2077 - #4 0xaaad740d36b7 in arm_cpu_realizefn /home/qemu/target/arm/cpu.c:1712 - #5 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865 - #6 0xaaad7485ed23 in property_set_bool qom/object.c:2102 - #7 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26 - #8 0xaaad74863a43 in object_property_set_bool qom/object.c:1360 - #9 0xaaad73fe3e67 in machvirt_init /home/qemu/hw/arm/virt.c:1682 - #10 0xaaad743acfc7 in machine_run_board_init hw/core/machine.c:1077 - #11 0xaaad73d60b73 in main /home/qemu/vl.c:4292 - #12 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f) - #13 0xaaad73d6db33 -(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33) - -SUMMARY: AddressSanitizer: heap-use-after-free /home/qemu/memory.c:1771 in -memory_region_unref - -Thanks -use-after-free-qemu.log -Description: -Text document - -Cc: address@hidden - -On 1/17/2020 4:18 PM, Pan Nengyuan wrote: -> -Hi, -> -> -We got a use-after-free report in our Euler Robot Test, it is can be -> -reproduced quite easily, -> -It can be reproduced by start VM with lots of pci controller and virtio-scsi -> -devices. -> -You can find the full qemu log from attachment. -> -We have analyzed the log and got the rough process how it happened, but don't -> -know how to fix it. -> -> -Could anyone help to fix it ? -> -> -The key message shows bellow: -> -har device redirected to /dev/pts/1 (label charserial0) -> -==1517174==WARNING: ASan doesn't fully support makecontext/swapcontext -> -functions and may produce false positives in some cases! 
-> -================================================================= -> -==1517174==ERROR: AddressSanitizer: heap-use-after-free on address -> -0xfffc31a002a0 at pc 0xaaad73e1f668 bp 0xfffc319fddb0 sp 0xfffc319fddd0 -> -READ of size 8 at 0xfffc31a002a0 thread T1 -> -#0 0xaaad73e1f667 in memory_region_unref /home/qemu/memory.c:1771 -> -#1 0xaaad73e1f667 in flatview_destroy /home/qemu/memory.c:291 -> -#2 0xaaad74adc85b in call_rcu_thread util/rcu.c:283 -> -#3 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519 -> -#4 0xfffc3a1678bb (/lib64/libpthread.so.0+0x78bb) -> -#5 0xfffc3a0a616b (/lib64/libc.so.6+0xd616b) -> -> -0xfffc31a002a0 is located 544 bytes inside of 1440-byte region -> -[0xfffc31a00080,0xfffc31a00620) -> -freed by thread T37 (CPU 0/KVM) here: -> -#0 0xfffc3c102e23 in free (/lib64/libasan.so.4+0xd2e23) -> -#1 0xfffc3bbc729f in g_free (/lib64/libglib-2.0.so.0+0x5729f) -> -#2 0xaaad745cce03 in pci_bridge_update_mappings hw/pci/pci_bridge.c:245 -> -#3 0xaaad745ccf33 in pci_bridge_write_config hw/pci/pci_bridge.c:271 -> -#4 0xaaad745ba867 in pci_bridge_dev_write_config -> -hw/pci-bridge/pci_bridge_dev.c:153 -> -#5 0xaaad745d6013 in pci_host_config_write_common hw/pci/pci_host.c:81 -> -#6 0xaaad73e2346f in memory_region_write_accessor /home/qemu/memory.c:483 -> -#7 0xaaad73e1d9ff in access_with_adjusted_size /home/qemu/memory.c:544 -> -#8 0xaaad73e28d1f in memory_region_dispatch_write /home/qemu/memory.c:1482 -> -#9 0xaaad73d7274f in flatview_write_continue /home/qemu/exec.c:3167 -> -#10 0xaaad73d72a53 in flatview_write /home/qemu/exec.c:3207 -> -#11 0xaaad73d7c8c3 in address_space_write /home/qemu/exec.c:3297 -> -#12 0xaaad73e5059b in kvm_cpu_exec /home/qemu/accel/kvm/kvm-all.c:2386 -> -#13 0xaaad73e07ac7 in qemu_kvm_cpu_thread_fn /home/qemu/cpus.c:1246 -> -#14 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519 -> -#15 0xfffc3a1678bb (/lib64/libpthread.so.0+0x78bb) -> -#16 0xfffc3a0a616b (/lib64/libc.so.6+0xd616b) -> -> -previously allocated by thread T0 here: -> -#0 0xfffc3c1031cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb) -> -#1 0xfffc3bbc7163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163) -> -#2 0xaaad745ccb57 in pci_bridge_region_init hw/pci/pci_bridge.c:188 -> -#3 0xaaad745cd8cb in pci_bridge_initfn hw/pci/pci_bridge.c:385 -> -#4 0xaaad745baaf3 in pci_bridge_dev_realize -> -hw/pci-bridge/pci_bridge_dev.c:64 -> -#5 0xaaad745cacd7 in pci_qdev_realize hw/pci/pci.c:2095 -> -#6 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865 -> -#7 0xaaad7485ed23 in property_set_bool qom/object.c:2102 -> -#8 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26 -> -#9 0xaaad74863a43 in object_property_set_bool qom/object.c:1360 -> -#10 0xaaad742a53b7 in qdev_device_add /home/qemu/qdev-monitor.c:675 -> -#11 0xaaad742a9c7b in device_init_func /home/qemu/vl.c:2074 -> -#12 0xaaad74ad4d33 in qemu_opts_foreach util/qemu-option.c:1170 -> -#13 0xaaad73d60c17 in main /home/qemu/vl.c:4313 -> -#14 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f) -> -#15 0xaaad73d6db33 -> -(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33) -> -> -Thread T1 created by T0 here: -> -#0 0xfffc3c068f6f in __interceptor_pthread_create -> -(/lib64/libasan.so.4+0x38f6f) -> -#1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556 -> -#2 0xaaad74adc6a7 in rcu_init_complete util/rcu.c:326 -> -#3 0xaaad74bab2a7 in __libc_csu_init -> -(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x17cb2a7) -> -#4 0xfffc39ff0b47 in __libc_start_main 
(/lib64/libc.so.6+0x20b47) -> -#5 0xaaad73d6db33 -> -(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33) -> -> -Thread T37 (CPU 0/KVM) created by T0 here: -> -#0 0xfffc3c068f6f in __interceptor_pthread_create -> -(/lib64/libasan.so.4+0x38f6f) -> -#1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556 -> -#2 0xaaad73e09b0f in qemu_dummy_start_vcpu /home/qemu/cpus.c:2045 -> -#3 0xaaad73e09b0f in qemu_init_vcpu /home/qemu/cpus.c:2077 -> -#4 0xaaad740d36b7 in arm_cpu_realizefn /home/qemu/target/arm/cpu.c:1712 -> -#5 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865 -> -#6 0xaaad7485ed23 in property_set_bool qom/object.c:2102 -> -#7 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26 -> -#8 0xaaad74863a43 in object_property_set_bool qom/object.c:1360 -> -#9 0xaaad73fe3e67 in machvirt_init /home/qemu/hw/arm/virt.c:1682 -> -#10 0xaaad743acfc7 in machine_run_board_init hw/core/machine.c:1077 -> -#11 0xaaad73d60b73 in main /home/qemu/vl.c:4292 -> -#12 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f) -> -#13 0xaaad73d6db33 -> -(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33) -> -> -SUMMARY: AddressSanitizer: heap-use-after-free /home/qemu/memory.c:1771 in -> -memory_region_unref -> -> -Thanks -> -use-after-free-qemu.log -Description: -Text document - diff --git a/classification_output/04/device/57231878 b/classification_output/04/device/57231878 deleted file mode 100644 index d02eeac3d..000000000 --- a/classification_output/04/device/57231878 +++ /dev/null @@ -1,250 +0,0 @@ -device: 0.818 -other: 0.788 -semantic: 0.774 -graphic: 0.751 -assembly: 0.745 -mistranslation: 0.719 -KVM: 0.708 -instruction: 0.661 -network: 0.659 -vnc: 0.640 -socket: 0.624 -boot: 0.609 - -[Qemu-devel] [BUG] qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed. - -Hello all, -I wanted to submit a bug report in the tracker, but it seem to require -an Ubuntu One account, which I'm having trouble with, so I'll just -give it here and hopefully somebody can make use of it. The issue -seems to be in an experimental format, so it's likely not very -consequential anyway. - -For the sake of anyone else simply googling for a workaround, I'll -just paste in the (cleaned up) brief IRC conversation about my issue -from the official channel: -<quy> I'm using QEMU version 2.12.0 on an x86_64 host (Arch Linux, -Kernel v4.17.2), and I'm trying to create an x86_64 virtual machine -(FreeBSD-11.1). The VM always aborts at the same point in the -installation (downloading 'ports.tgz') with the following error -message: -"qemu-system-x86_64: /build/qemu/src/qemu-2.12.0/block/qed.c:1197: -qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed. -zsh: abort (core dumped) qemu-system-x86_64 -smp 2 -m 4096 --enable-kvm -hda freebsd/freebsd.qed -devic" -The commands I ran to create the machine are as follows: -"qemu-img create -f qed freebsd/freebsd.qed 16G" -"qemu-system-x86_64 -smp 2 -m 4096 -enable-kvm -hda -freebsd/freebsd.qed -device e1000,netdev=net0 -netdev user,id=net0 --cdrom FreeBSD-11.1-RELEASE-amd64-bootonly.iso -boot order=d" -I tried adding logging options with the -d flag, but I didn't get -anything that seemed relevant, since I'm not sure what to look for. -<stsquad> ohh what's a qed device? -<stsquad> quy: it might be a workaround to use a qcow2 image for now -<stsquad> ahh the wiki has a statement "It is not recommended to use -QED for any new images. 
" -<danpb> 'qed' was an experimental disk image format created by IBM -before qcow2 v3 came along -<danpb> honestly nothing should ever use QED these days -<danpb> the good ideas from QED became qcow2v3 -<stsquad> danpb: sounds like we should put a warning on the option to -remind users of that fact -<danpb> quy: sounds like qed driver is simply broken - please do file -a bug against qemu bug tracker -<danpb> quy: but you should also really switch to qcow2 -<quy> I see; some people need to update their wikis then. I don't -remember where which guide I read when I first learned what little -QEMU I know, but I remember it specifically remember it saying QED was -the newest and most optimal format. -<stsquad> quy: we can only be responsible for our own wiki I'm afraid... -<danpb> if you remember where you saw that please let us know so we -can try to get it fixed -<quy> Thank you very much for the info; I will switch to QCOW. -Unfortunately, I'm not sure if I will be able to file any bug reports -in the tracker as I can't seem to log Launchpad, which it seems to -require. -<danpb> quy: an email to the mailing list would suffice too if you -can't deal with launchpad -<danpb> kwolf: ^^^ in case you're interested in possible QED -assertions from 2.12 - -If any more info is needed, feel free to email me; I'm not actually -subscribed to this list though. -Thank you, -Quytelda Kahja - -CC Qemu Block; looks like QED is a bit busted. - -On 06/27/2018 10:25 AM, Quytelda Kahja wrote: -> -Hello all, -> -I wanted to submit a bug report in the tracker, but it seem to require -> -an Ubuntu One account, which I'm having trouble with, so I'll just -> -give it here and hopefully somebody can make use of it. The issue -> -seems to be in an experimental format, so it's likely not very -> -consequential anyway. -> -> -For the sake of anyone else simply googling for a workaround, I'll -> -just paste in the (cleaned up) brief IRC conversation about my issue -> -from the official channel: -> -<quy> I'm using QEMU version 2.12.0 on an x86_64 host (Arch Linux, -> -Kernel v4.17.2), and I'm trying to create an x86_64 virtual machine -> -(FreeBSD-11.1). The VM always aborts at the same point in the -> -installation (downloading 'ports.tgz') with the following error -> -message: -> -"qemu-system-x86_64: /build/qemu/src/qemu-2.12.0/block/qed.c:1197: -> -qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed. -> -zsh: abort (core dumped) qemu-system-x86_64 -smp 2 -m 4096 -> --enable-kvm -hda freebsd/freebsd.qed -devic" -> -The commands I ran to create the machine are as follows: -> -"qemu-img create -f qed freebsd/freebsd.qed 16G" -> -"qemu-system-x86_64 -smp 2 -m 4096 -enable-kvm -hda -> -freebsd/freebsd.qed -device e1000,netdev=net0 -netdev user,id=net0 -> --cdrom FreeBSD-11.1-RELEASE-amd64-bootonly.iso -boot order=d" -> -I tried adding logging options with the -d flag, but I didn't get -> -anything that seemed relevant, since I'm not sure what to look for. -> -<stsquad> ohh what's a qed device? -> -<stsquad> quy: it might be a workaround to use a qcow2 image for now -> -<stsquad> ahh the wiki has a statement "It is not recommended to use -> -QED for any new images. 
" -> -<danpb> 'qed' was an experimental disk image format created by IBM -> -before qcow2 v3 came along -> -<danpb> honestly nothing should ever use QED these days -> -<danpb> the good ideas from QED became qcow2v3 -> -<stsquad> danpb: sounds like we should put a warning on the option to -> -remind users of that fact -> -<danpb> quy: sounds like qed driver is simply broken - please do file -> -a bug against qemu bug tracker -> -<danpb> quy: but you should also really switch to qcow2 -> -<quy> I see; some people need to update their wikis then. I don't -> -remember where which guide I read when I first learned what little -> -QEMU I know, but I remember it specifically remember it saying QED was -> -the newest and most optimal format. -> -<stsquad> quy: we can only be responsible for our own wiki I'm afraid... -> -<danpb> if you remember where you saw that please let us know so we -> -can try to get it fixed -> -<quy> Thank you very much for the info; I will switch to QCOW. -> -Unfortunately, I'm not sure if I will be able to file any bug reports -> -in the tracker as I can't seem to log Launchpad, which it seems to -> -require. -> -<danpb> quy: an email to the mailing list would suffice too if you -> -can't deal with launchpad -> -<danpb> kwolf: ^^^ in case you're interested in possible QED -> -assertions from 2.12 -> -> -If any more info is needed, feel free to email me; I'm not actually -> -subscribed to this list though. -> -Thank you, -> -Quytelda Kahja -> - -On 06/29/2018 03:07 PM, John Snow wrote: -CC Qemu Block; looks like QED is a bit busted. - -On 06/27/2018 10:25 AM, Quytelda Kahja wrote: -Hello all, -I wanted to submit a bug report in the tracker, but it seem to require -an Ubuntu One account, which I'm having trouble with, so I'll just -give it here and hopefully somebody can make use of it. The issue -seems to be in an experimental format, so it's likely not very -consequential anyway. -Analysis in another thread may be relevant: -https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08963.html --- -Eric Blake, Principal Software Engineer -Red Hat, Inc. +1-919-301-3266 -Virtualization: qemu.org | libvirt.org - -Am 29.06.2018 um 22:16 hat Eric Blake geschrieben: -> -On 06/29/2018 03:07 PM, John Snow wrote: -> -> CC Qemu Block; looks like QED is a bit busted. -> -> -> -> On 06/27/2018 10:25 AM, Quytelda Kahja wrote: -> -> > Hello all, -> -> > I wanted to submit a bug report in the tracker, but it seem to require -> -> > an Ubuntu One account, which I'm having trouble with, so I'll just -> -> > give it here and hopefully somebody can make use of it. The issue -> -> > seems to be in an experimental format, so it's likely not very -> -> > consequential anyway. -> -> -Analysis in another thread may be relevant: -> -> -https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08963.html -The assertion there was: - -qemu-system-x86_64: block.c:3434: bdrv_replace_node: Assertion -`!atomic_read(&to->in_flight)' failed. - -Which quite clearly pointed to a drain bug. This one, however, doesn't -seem to be related to drain, so I think it's probably a different bug. 
- -Kevin - diff --git a/classification_output/04/device/67821138 b/classification_output/04/device/67821138 deleted file mode 100644 index 4201fc494..000000000 --- a/classification_output/04/device/67821138 +++ /dev/null @@ -1,207 +0,0 @@ -device: 0.916 -assembly: 0.915 -boot: 0.881 -other: 0.853 -semantic: 0.843 -graphic: 0.826 -KVM: 0.822 -instruction: 0.821 -mistranslation: 0.768 -vnc: 0.734 -network: 0.718 -socket: 0.699 - -[BUG, RFC] Base node is in RW after making external snapshot - -Hi everyone, - -When making an external snapshot, we end up in a situation when 2 block -graph nodes related to the same image file (format and storage nodes) -have different RO flags set on them. - -E.g. - -# ls -la /proc/PID/fd -lrwx------ 1 root qemu 64 Apr 24 20:14 12 -> /path/to/harddisk.hdd - -# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' ---pretty | egrep '"node-name"|"ro"' - "ro": false, - "node-name": "libvirt-1-format", - "ro": false, - "node-name": "libvirt-1-storage", - -# virsh snapshot-create-as VM --name snap --disk-only -Domain snapshot snap created - -# ls -la /proc/PID/fd -lr-x------ 1 root qemu 64 Apr 24 20:14 134 -> /path/to/harddisk.hdd -lrwx------ 1 root qemu 64 Apr 24 20:14 135 -> /path/to/harddisk.snap - -# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' ---pretty | egrep '"node-name"|"ro"' - "ro": false, - "node-name": "libvirt-2-format", - "ro": false, - "node-name": "libvirt-2-storage", - "ro": true, - "node-name": "libvirt-1-format", - "ro": false, <-------------- - "node-name": "libvirt-1-storage", - -File descriptor has been reopened in RO, but "libvirt-1-storage" node -still has RW permissions set. - -I'm wondering it this a bug or this is intended? Looks like a bug to -me, although I see that some iotests (e.g. 273) expect 2 nodes related -to the same image file to have different RO flags. - -bdrv_reopen_set_read_only() - bdrv_reopen() - bdrv_reopen_queue() - bdrv_reopen_queue_child() - bdrv_reopen_multiple() - bdrv_list_refresh_perms() - bdrv_topological_dfs() - bdrv_do_refresh_perms() - bdrv_reopen_commit() - -In the stack above bdrv_reopen_set_read_only() is only being called for -the parent (libvirt-1-format) node. There're 2 lists: BDSs from -refresh_list are used by bdrv_drv_set_perm and this leads to actual -reopen with RO of the file descriptor. And then there's reopen queue -bs_queue -- BDSs from this queue get their parameters updated. While -refresh_list ends up having the whole subtree (including children, this -is done in bdrv_topological_dfs()) bs_queue only has the parent. And -that is because storage (child) node's (bs->inherits_from == NULL), so -bdrv_reopen_queue_child() never adds it to the queue. Could it be the -source of this bug? - -Anyway, would greatly appreciate a clarification. - -Andrey - -On 4/24/24 21:00, Andrey Drobyshev wrote: -> -Hi everyone, -> -> -When making an external snapshot, we end up in a situation when 2 block -> -graph nodes related to the same image file (format and storage nodes) -> -have different RO flags set on them. -> -> -E.g. 
-> -> -# ls -la /proc/PID/fd -> -lrwx------ 1 root qemu 64 Apr 24 20:14 12 -> /path/to/harddisk.hdd -> -> -# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' -> ---pretty | egrep '"node-name"|"ro"' -> -"ro": false, -> -"node-name": "libvirt-1-format", -> -"ro": false, -> -"node-name": "libvirt-1-storage", -> -> -# virsh snapshot-create-as VM --name snap --disk-only -> -Domain snapshot snap created -> -> -# ls -la /proc/PID/fd -> -lr-x------ 1 root qemu 64 Apr 24 20:14 134 -> /path/to/harddisk.hdd -> -lrwx------ 1 root qemu 64 Apr 24 20:14 135 -> /path/to/harddisk.snap -> -> -# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' -> ---pretty | egrep '"node-name"|"ro"' -> -"ro": false, -> -"node-name": "libvirt-2-format", -> -"ro": false, -> -"node-name": "libvirt-2-storage", -> -"ro": true, -> -"node-name": "libvirt-1-format", -> -"ro": false, <-------------- -> -"node-name": "libvirt-1-storage", -> -> -File descriptor has been reopened in RO, but "libvirt-1-storage" node -> -still has RW permissions set. -> -> -I'm wondering it this a bug or this is intended? Looks like a bug to -> -me, although I see that some iotests (e.g. 273) expect 2 nodes related -> -to the same image file to have different RO flags. -> -> -bdrv_reopen_set_read_only() -> -bdrv_reopen() -> -bdrv_reopen_queue() -> -bdrv_reopen_queue_child() -> -bdrv_reopen_multiple() -> -bdrv_list_refresh_perms() -> -bdrv_topological_dfs() -> -bdrv_do_refresh_perms() -> -bdrv_reopen_commit() -> -> -In the stack above bdrv_reopen_set_read_only() is only being called for -> -the parent (libvirt-1-format) node. There're 2 lists: BDSs from -> -refresh_list are used by bdrv_drv_set_perm and this leads to actual -> -reopen with RO of the file descriptor. And then there's reopen queue -> -bs_queue -- BDSs from this queue get their parameters updated. While -> -refresh_list ends up having the whole subtree (including children, this -> -is done in bdrv_topological_dfs()) bs_queue only has the parent. And -> -that is because storage (child) node's (bs->inherits_from == NULL), so -> -bdrv_reopen_queue_child() never adds it to the queue. Could it be the -> -source of this bug? -> -> -Anyway, would greatly appreciate a clarification. -> -> -Andrey -Friendly ping. Could somebody confirm that it is a bug indeed? - diff --git a/classification_output/04/device/99674399 b/classification_output/04/device/99674399 deleted file mode 100644 index cbf3c7035..000000000 --- a/classification_output/04/device/99674399 +++ /dev/null @@ -1,156 +0,0 @@ -device: 0.886 -other: 0.883 -instruction: 0.860 -mistranslation: 0.843 -assembly: 0.843 -semantic: 0.822 -boot: 0.822 -graphic: 0.794 -socket: 0.747 -network: 0.711 -KVM: 0.698 -vnc: 0.673 - -[BUG] qemu crashes on assertion in cpu_asidx_from_attrs when cpu is in smm mode - -Hi all! - -First, I see this issue: -https://gitlab.com/qemu-project/qemu/-/issues/1198 -. -where some kvm/hardware failure leads to guest crash, and finally to this -assertion: - - cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed. - -But in the ticket the talk is about the guest crash and fixing the kernel, not -about the final QEMU assertion (which definitely show that something should be -fixed in QEMU code too). - - -We've faced same stack one time: - -(gdb) bt -#0 raise () from /lib/x86_64-linux-gnu/libc.so.6 -#1 abort () from /lib/x86_64-linux-gnu/libc.so.6 -#2 ?? 
() from /lib/x86_64-linux-gnu/libc.so.6
-#3  __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
-#4  cpu_asidx_from_attrs at ../hw/core/cpu-sysemu.c:76
-#5  cpu_memory_rw_debug at ../softmmu/physmem.c:3529
-#6  x86_cpu_dump_state at ../target/i386/cpu-dump.c:560
-#7  kvm_cpu_exec at ../accel/kvm/kvm-all.c:3000
-#8  kvm_vcpu_thread_fn at ../accel/kvm/kvm-accel-ops.c:51
-#9  qemu_thread_start at ../util/qemu-thread-posix.c:505
-#10 start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
-#11 clone () from /lib/x86_64-linux-gnu/libc.so.6
-
-
-And here is what I see:
-
-static inline int x86_asidx_from_attrs(CPUState *cs, MemTxAttrs attrs)
-{
-    return !!attrs.secure;
-}
-
-int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs)
-{
-    int ret = 0;
-
-    if (cpu->cc->sysemu_ops->asidx_from_attrs) {
-        ret = cpu->cc->sysemu_ops->asidx_from_attrs(cpu, attrs);
-        assert(ret < cpu->num_ases && ret >= 0);    <<<<<<<<<<<<<<<<<
-    }
-    return ret;
-}
-
-(gdb) p cpu->num_ases
-$3 = 1
-
-(gdb) fr 5
-#5  0x00005578c8814ba3 in cpu_memory_rw_debug (cpu=c...
-(gdb) p attrs
-$6 = {unspecified = 0, secure = 1, user = 0, memory = 0, requester_id = 0,
-byte_swap = 0, target_tlb_bit0 = 0, target_tlb_bit1 = 0, target_tlb_bit2 = 0}
-
-So .secure is 1 and therefore ret is 1; at the same time num_ases is also 1,
-so the assertion fails.
-
-
-
-Where does .secure come from?
-
-static inline MemTxAttrs cpu_get_mem_attrs(CPUX86State *env)
-{
-    return ((MemTxAttrs) { .secure = (env->hflags & HF_SMM_MASK) != 0 });
-}
-
-Ok, it means we are in SMM mode.
-
-
-
-On the other hand, num_ases seems to always be 1 for x86 under KVM (only the
-TCG sysemu code sets it to 2):
-
-vsementsov@vsementsov-lin:~/work/src/qemu/yc-7.2$ git grep 'num_ases = '
-cpu.c:  cpu->num_ases = 0;
-softmmu/cpus.c:  cpu->num_ases = 1;
-target/arm/cpu.c:  cs->num_ases = 3 + has_secure;
-target/arm/cpu.c:  cs->num_ases = 1 + has_secure;
-target/i386/tcg/sysemu/tcg-cpu.c:  cs->num_ases = 2;
-
-
-So something is inconsistent between cpu->num_ases and x86_asidx_from_attrs(),
-which may return an out-of-range index (1) in SMM mode.
-
-
-The stack starts in
-#7  0x00005578c882f539 in kvm_cpu_exec (cpu=cpu@entry=0x5578ca2eb340) at
-../accel/kvm/kvm-all.c:3000
-    if (ret < 0) {
-        cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
-        vm_stop(RUN_STATE_INTERNAL_ERROR);
-    }
-
-So there was some kvm error, and we decided to call cpu_dump_state(). And it
-crashes.
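To see the failing check in isolation, the following stand-alone sketch reproduces the arithmetic with the values observed in gdb above. It is a simplified model rather than QEMU code: the structs are reduced to the two fields that matter, and the fixed values (num_ases == 1, attrs.secure == 1 because of SMM) are taken from the report.

/* Stand-alone illustration of the failing check; NOT QEMU code. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool secure;        /* set from HF_SMM_MASK by cpu_get_mem_attrs() */
} MemTxAttrs;

typedef struct {
    int num_ases;       /* 1 for x86 under KVM (softmmu/cpus.c) */
} CPUState;

/* mirrors x86_asidx_from_attrs(): SMM -> address-space index 1 */
static int x86_asidx_from_attrs(CPUState *cs, MemTxAttrs attrs)
{
    (void)cs;
    return !!attrs.secure;
}

int main(void)
{
    CPUState cpu = { .num_ases = 1 };
    MemTxAttrs attrs = { .secure = true };   /* guest is in SMM */
    int ret = x86_asidx_from_attrs(&cpu, attrs);

    printf("ret = %d, num_ases = %d\n", ret, cpu.num_ases);
    assert(ret < cpu.num_ases && ret >= 0);  /* 1 < 1 is false -> abort */
    return 0;
}

Compiled with gcc, this prints ret = 1, num_ases = 1 and then aborts on the same assertion expression as the real cpu_asidx_from_attrs().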
cpu_dump_state() is also called from hmp_info_registers, so I can
-reproduce the crash with a tiny patch to master (since only the CPU_DUMP_CODE
-path calls cpu_memory_rw_debug(), as it does in kvm_cpu_exec()):
-
-diff --git a/monitor/hmp-cmds-target.c b/monitor/hmp-cmds-target.c
-index ff01cf9d8d..dcf0189048 100644
---- a/monitor/hmp-cmds-target.c
-+++ b/monitor/hmp-cmds-target.c
-@@ -116,7 +116,7 @@ void hmp_info_registers(Monitor *mon, const QDict *qdict)
-         }
-
-         monitor_printf(mon, "\nCPU#%d\n", cs->cpu_index);
--        cpu_dump_state(cs, NULL, CPU_DUMP_FPU);
-+        cpu_dump_state(cs, NULL, CPU_DUMP_CODE);
-     }
- }
-
-
-Then run
-
-yes "info registers" | ./build/qemu-system-x86_64 -accel kvm -monitor stdio \
-    -global driver=cfi.pflash01,property=secure,value=on \
-    -blockdev "{'driver': 'file', 'filename': '/usr/share/OVMF/OVMF_CODE_4M.secboot.fd', 'node-name': 'ovmf-code', 'read-only': true}" \
-    -blockdev "{'driver': 'file', 'filename': '/usr/share/OVMF/OVMF_VARS_4M.fd', 'node-name': 'ovmf-vars', 'read-only': true}" \
-    -machine q35,smm=on,pflash0=ovmf-code,pflash1=ovmf-vars -m 2G -nodefaults
-
-And after some time (less than 20 seconds for me) it leads to
-
-qemu-system-x86_64: ../hw/core/cpu-sysemu.c:76: cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
-Aborted (core dumped)
-
-
-I've no idea how to correctly fix this bug, but I hope that my reproducer and
-investigation will help a bit.
-
---
-Best regards,
-Vladimir
-
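Since the report ends without a fix, one possible direction, purely as an untested sketch to make the discussion concrete, is to have cpu_asidx_from_attrs() fall back to address space 0 instead of asserting when the callback returns an index the CPU never registered (as happens for x86 under KVM in SMM). Whether silently using AS 0 is the right semantics for an SMM debug access is exactly the open question, so this is an illustration based on the function quoted earlier in the thread, not a proposed patch.

/* Untested sketch: avoid the abort by clamping an out-of-range index to
 * address space 0, e.g. x86 under KVM where SMM sets attrs.secure but the
 * CPU only registered a single address space. */
int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs)
{
    int ret = 0;

    if (cpu->cc->sysemu_ops->asidx_from_attrs) {
        ret = cpu->cc->sysemu_ops->asidx_from_attrs(cpu, attrs);
        if (ret < 0 || ret >= cpu->num_ases) {
            /* No such address space on this CPU; use AS 0 so that debug
             * accesses via cpu_memory_rw_debug() do not abort QEMU. */
            ret = 0;
        }
    }
    return ret;
}

An alternative direction, also untested, would be for the x86 KVM path to register two address spaces the way the TCG sysemu code does (cs->num_ases = 2 in target/i386/tcg/sysemu/tcg-cpu.c), but that has wider implications than the error-reporting path alone.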