add the outputs of the first five revisions of the classifier

author: Christian Krinitsin <mail@krinitsin.com> 2025-06-01 21:19:55 +0200
committer: Christian Krinitsin <mail@krinitsin.com> 2025-06-01 21:19:55 +0200
commit: e5634e2806195bee44407853c4bf8776f7abfa4f (patch)
tree: 6c6e2503f2cd56c0edf08a55d5c8fcb3832fabd0 /classification_output/05/device
parent: 05ed9b00104b3da2e9cb0d541b5c5bbceb027fde (diff)
download: qemu-analysis-e5634e2806195bee44407853c4bf8776f7abfa4f.tar.gz
qemu-analysis-e5634e2806195bee44407853c4bf8776f7abfa4f.zip
12 files changed, 9369 insertions, 0 deletions
diff --git a/classification_output/05/device/14488057 b/classification_output/05/device/14488057
new file mode 100644
index 000000000..8637a1740
--- /dev/null
+++ b/classification_output/05/device/14488057
@@ -0,0 +1,719 @@
+device: 0.929
+other: 0.922
+assembly: 0.912
+instruction: 0.908
+semantic: 0.905
+boot: 0.892
+graphic: 0.887
+mistranslation: 0.885
+vnc: 0.882
+KVM: 0.880
+network: 0.846
+socket: 0.825
+
+[Qemu-devel] [BUG] user-to-root privesc inside VM via bad translation caching
+
+This is an issue in QEMU's system emulation for X86 in TCG mode.
+The issue permits an attacker who can execute code in guest ring 3
+with normal user privileges to inject code into other processes that
+are running in guest ring 3, in particular root-owned processes.
+
+== reproduction steps ==
+
+ - Create an x86-64 VM and install Debian Jessie in it. The following
+   steps should all be executed inside the VM.
+ - Verify that procmail is installed and the correct version:
+       address@hidden:~# apt-cache show procmail | egrep 'Version|SHA'
+       Version: 3.22-24
+       SHA1: 54ed2d51db0e76f027f06068ab5371048c13434c
+       SHA256: 4488cf6975af9134a9b5238d5d70e8be277f70caa45a840dfbefd2dc444bfe7f
+ - Install build-essential and nasm ("apt install build-essential nasm").
+ - Unpack the exploit, compile it and run it:
+       address@hidden:~$ tar xvf procmail_cache_attack.tar
+       procmail_cache_attack/
+       procmail_cache_attack/shellcode.asm
+       procmail_cache_attack/xp.c
+       procmail_cache_attack/compile.sh
+       procmail_cache_attack/attack.c
+       address@hidden:~$ cd procmail_cache_attack
+       address@hidden:~/procmail_cache_attack$ ./compile.sh
+       address@hidden:~/procmail_cache_attack$ ./attack
+       memory mappings set up
+       child is dead, codegen should be complete
+       executing code as root! :)
+       address@hidden:~/procmail_cache_attack# id
+       uid=0(root) gid=0(root) groups=0(root),[...]
+
+Note: While the exploit depends on the precise version of procmail,
+the actual vulnerability is in QEMU, not in procmail. procmail merely
+serves as a seldomly-executed setuid root binary into which code can
+be injected.
+
+
+== detailed issue description ==
+QEMU caches translated basic blocks. To look up a translated basic
+block, the function tb_find() is used, which uses tb_htable_lookup()
+in its slowpath, which in turn compares translated basic blocks
+(TranslationBlock) to the lookup information (struct tb_desc) using
+tb_cmp().
+
+tb_cmp() attempts to ensure (among other things) that both the virtual
+start address of the basic block and the physical addresses that the
+basic block covers match. When checking the physical addresses, it
+assumes that a basic block can span at most two pages.
+
+gen_intermediate_code() attempts to enforce this by stopping the
+translation of a basic block if nearly one page of instructions has
+been translated already:
+
+    /* if too long translation, stop generation too */
+    if (tcg_op_buf_full() ||
+        (pc_ptr - pc_start) >= (TARGET_PAGE_SIZE - 32) ||
+        num_insns >= max_insns) {
+        gen_jmp_im(pc_ptr - dc->cs_base);
+        gen_eob(dc);
+        break;
+    }
+
+However, while real X86 processors have a maximum instruction length
+of 15 bytes, QEMU's instruction decoder for X86 does not place any
+limit on the instruction length or the number of instruction prefixes.
+Therefore, it is possible to create an arbitrarily long instruction
+by e.g. prepending an arbitrary number of LOCK prefixes to a normal
+instruction. This permits creating a basic block that spans three
+pages by simply appending an approximately page-sized instruction to
+the end of a normal basic block that starts close to the end of a
+page.
+
+Such an overlong basic block causes the basic block caching to fail as
+follows: If code is generated and cached for a basic block that spans
+the physical pages (A,E,B), this basic block will be returned by
+lookups in a process in which the physical pages (A,B,C) are mapped
+in the same virtual address range (assuming that all other lookup
+parameters match).
+
+This behavior can be abused by an attacker e.g. as follows: If a
+non-relocatable world-readable setuid executable legitimately contains
+the pages (A,B,C), an attacker can map (A,E,B) into his own process,
+at the normal load address of A, where E is an attacker-controlled
+page. If a legitimate basic block spans the pages A and B, an attacker
+can write arbitrary non-branch instructions at the start of E, then
+append an overlong instruction
+that ends behind the start of C, yielding a modified basic block that
+spans all three pages. If the attacker then executes the modified
+basic block in his process, the modified basic block is cached.
+Next, the attacker can execute the setuid binary, which will reuse the
+cached modified basic block, executing attacker-controlled
+instructions in the context of the privileged process.
+
+I am sending this to qemu-devel because a QEMU security contact
+told me that QEMU does not consider privilege escalation inside a
+TCG VM to be a security concern.
+procmail_cache_attack.tar
+Description:
+Unix tar archive
+
+On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote:
+>
+This is an issue in QEMU's system emulation for X86 in TCG mode.
+>
+The issue permits an attacker who can execute code in guest ring 3
+>
+with normal user privileges to inject code into other processes that
+>
+are running in guest ring 3, in particular root-owned processes.
+>
+I am sending this to qemu-devel because a QEMU security contact
+>
+told me that QEMU does not consider privilege escalation inside a
+>
+TCG VM to be a security concern.
+Correct; it's just a bug. Don't trust TCG QEMU as a security boundary.
+
+We should really fix the crossing-a-page-boundary code for x86.
+I believe we do get it correct for ARM Thumb instructions.
+
+thanks
+-- PMM
+
+On Mon, Mar 20, 2017 at 10:46 AM, Peter Maydell wrote:
+>
+On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote:
+>
+> This is an issue in QEMU's system emulation for X86 in TCG mode.
+>
+> The issue permits an attacker who can execute code in guest ring 3
+>
+> with normal user privileges to inject code into other processes that
+>
+> are running in guest ring 3, in particular root-owned processes.
+>
+>
+> I am sending this to qemu-devel because a QEMU security contact
+>
+> told me that QEMU does not consider privilege escalation inside a
+>
+> TCG VM to be a security concern.
+>
+>
+Correct; it's just a bug. Don't trust TCG QEMU as a security boundary.
+>
+>
+We should really fix the crossing-a-page-boundary code for x86.
+>
+I believe we do get it correct for ARM Thumb instructions.
+How about doing the instruction size check as follows?
+
+diff --git a/target/i386/translate.c b/target/i386/translate.c
+index 72c1b03a2a..94cf3da719 100644
+--- a/target/i386/translate.c
++++ b/target/i386/translate.c
+@@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State
+*env, DisasContext *s,
+     default:
+         goto unknown_op;
+     }
++    if (s->pc - pc_start > 15) {
++        s->pc = pc_start;
++        goto illegal_op;
++    }
+     return s->pc;
+  illegal_op:
+     gen_illegal_opcode(s);
+
+Thanks,
+--
+Pranith
+
+On 22 March 2017 at 14:55, Pranith Kumar <address@hidden> wrote:
+>
+On Mon, Mar 20, 2017 at 10:46 AM, Peter Maydell wrote:
+>
+> On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote:
+>
+>> This is an issue in QEMU's system emulation for X86 in TCG mode.
+>
+>> The issue permits an attacker who can execute code in guest ring 3
+>
+>> with normal user privileges to inject code into other processes that
+>
+>> are running in guest ring 3, in particular root-owned processes.
+>
+>
+>
+>> I am sending this to qemu-devel because a QEMU security contact
+>
+>> told me that QEMU does not consider privilege escalation inside a
+>
+>> TCG VM to be a security concern.
+>
+>
+>
+> Correct; it's just a bug. Don't trust TCG QEMU as a security boundary.
+>
+>
+>
+> We should really fix the crossing-a-page-boundary code for x86.
+>
+> I believe we do get it correct for ARM Thumb instructions.
+>
+>
+How about doing the instruction size check as follows?
+>
+>
+diff --git a/target/i386/translate.c b/target/i386/translate.c
+>
+index 72c1b03a2a..94cf3da719 100644
+>
+--- a/target/i386/translate.c
+>
++++ b/target/i386/translate.c
+>
+@@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State
+>
+*env, DisasContext *s,
+>
+default:
+>
+goto unknown_op;
+>
+}
+>
++    if (s->pc - pc_start > 15) {
+>
++        s->pc = pc_start;
+>
++        goto illegal_op;
+>
++    }
+>
+return s->pc;
+>
+illegal_op:
+>
+gen_illegal_opcode(s);
+This doesn't look right because it means we'll check
+only after we've emitted all the code to do the
+instruction operation, so the effect will be
+"execute instruction, then take illegal-opcode
+exception".
+
+We should check what the x86 architecture spec actually
+says and implement that.
+
+thanks
+-- PMM
+
+On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell
+<address@hidden> wrote:
+>
+>
+>
+> How about doing the instruction size check as follows?
+>
+>
+>
+> diff --git a/target/i386/translate.c b/target/i386/translate.c
+>
+> index 72c1b03a2a..94cf3da719 100644
+>
+> --- a/target/i386/translate.c
+>
+> +++ b/target/i386/translate.c
+>
+> @@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State
+>
+> *env, DisasContext *s,
+>
+>      default:
+>
+>          goto unknown_op;
+>
+>      }
+>
+> +    if (s->pc - pc_start > 15) {
+>
+> +        s->pc = pc_start;
+>
+> +        goto illegal_op;
+>
+> +    }
+>
+>      return s->pc;
+>
+>   illegal_op:
+>
+>      gen_illegal_opcode(s);
+>
+>
+This doesn't look right because it means we'll check
+>
+only after we've emitted all the code to do the
+>
+instruction operation, so the effect will be
+>
+"execute instruction, then take illegal-opcode
+>
+exception".
+>
+The pc is restored to original address (s->pc = pc_start), so the
+exception will overwrite the generated illegal instruction and will be
+executed first.
+
+But yes, it's better to follow the architecture manual.
+
+Thanks,
+--
+Pranith
+
+On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote:
+>
+On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell
+>
+<address@hidden> wrote:
+>
+> This doesn't look right because it means we'll check
+>
+> only after we've emitted all the code to do the
+>
+> instruction operation, so the effect will be
+>
+> "execute instruction, then take illegal-opcode
+>
+> exception".
+>
+The pc is restored to original address (s->pc = pc_start), so the
+>
+exception will overwrite the generated illegal instruction and will be
+>
+executed first.
+s->pc is the guest PC -- moving that backwards will
+not do anything about the generated TCG IR that's
+already been written. You'd need to rewind the
+write pointer in the IR stream, which there is
+no support for doing AFAIK.
+
+thanks
+-- PMM
+
+On Wed, Mar 22, 2017 at 11:21 AM, Peter Maydell
+<address@hidden> wrote:
+>
+On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote:
+>
+> On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell
+>
+> <address@hidden> wrote:
+>
+>> This doesn't look right because it means we'll check
+>
+>> only after we've emitted all the code to do the
+>
+>> instruction operation, so the effect will be
+>
+>> "execute instruction, then take illegal-opcode
+>
+>> exception".
+>
+>
+> The pc is restored to original address (s->pc = pc_start), so the
+>
+> exception will overwrite the generated illegal instruction and will be
+>
+> executed first.
+>
+>
+s->pc is the guest PC -- moving that backwards will
+>
+not do anything about the generated TCG IR that's
+>
+already been written. You'd need to rewind the
+>
+write pointer in the IR stream, which there is
+>
+no support for doing AFAIK.
+Ah, OK. Thanks for the explanation. May be we should check the size of
+the instruction while decoding the prefixes and error out once we
+exceed the limit. We would not generate any IR code.
+
+--
+Pranith
+
+On 03/23/2017 02:29 AM, Pranith Kumar wrote:
+On Wed, Mar 22, 2017 at 11:21 AM, Peter Maydell
+<address@hidden> wrote:
+On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote:
+On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell
+<address@hidden> wrote:
+This doesn't look right because it means we'll check
+only after we've emitted all the code to do the
+instruction operation, so the effect will be
+"execute instruction, then take illegal-opcode
+exception".
+The pc is restored to original address (s->pc = pc_start), so the
+exception will overwrite the generated illegal instruction and will be
+executed first.
+s->pc is the guest PC -- moving that backwards will
+not do anything about the generated TCG IR that's
+already been written. You'd need to rewind the
+write pointer in the IR stream, which there is
+no support for doing AFAIK.
+Ah, OK. Thanks for the explanation. May be we should check the size of
+the instruction while decoding the prefixes and error out once we
+exceed the limit. We would not generate any IR code.
+Yes.
+It would not enforce a true limit of 15 bytes, since you can't know that until
+you've done the rest of the decode.  But you'd be able to say that no more than
+14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 bytes is used.
+Which does fix the bug.
+
+
+r~
+
+On 22/03/2017 21:01, Richard Henderson wrote:
+>
+>
+>
+> Ah, OK. Thanks for the explanation. May be we should check the size of
+>
+> the instruction while decoding the prefixes and error out once we
+>
+> exceed the limit. We would not generate any IR code.
+>
+>
+Yes.
+>
+>
+It would not enforce a true limit of 15 bytes, since you can't know that
+>
+until you've done the rest of the decode.  But you'd be able to say that
+>
+no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25
+>
+bytes is used.
+>
+>
+Which does fix the bug.
+Yeah, that would work for 2.9 if somebody wants to put together a patch.
+ Ensuring that all instruction fetching happens before translation side
+effects is a little harder, but perhaps it's also the opportunity to get
+rid of s->rip_offset which is a little ugly.
+
+Paolo
+
+On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote:
+>
+>
+>
+On 22/03/2017 21:01, Richard Henderson wrote:
+>
+>>
+>
+>> Ah, OK. Thanks for the explanation. May be we should check the size of
+>
+>> the instruction while decoding the prefixes and error out once we
+>
+>> exceed the limit. We would not generate any IR code.
+>
+>
+>
+> Yes.
+>
+>
+>
+> It would not enforce a true limit of 15 bytes, since you can't know that
+>
+> until you've done the rest of the decode.  But you'd be able to say that
+>
+> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25
+>
+> bytes is used.
+>
+>
+>
+> Which does fix the bug.
+>
+>
+Yeah, that would work for 2.9 if somebody wants to put together a patch.
+>
+Ensuring that all instruction fetching happens before translation side
+>
+effects is a little harder, but perhaps it's also the opportunity to get
+>
+rid of s->rip_offset which is a little ugly.
+How about the following?
+
+diff --git a/target/i386/translate.c b/target/i386/translate.c
+index 72c1b03a2a..67c58b8900 100644
+--- a/target/i386/translate.c
++++ b/target/i386/translate.c
+@@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State
+*env, DisasContext *s,
+     s->vex_l = 0;
+     s->vex_v = 0;
+  next_byte:
++    /* The prefixes can atmost be 14 bytes since x86 has an upper
++       limit of 15 bytes for the instruction */
++    if (s->pc - pc_start > 14) {
++        goto illegal_op;
++    }
+     b = cpu_ldub_code(env, s->pc);
+     s->pc++;
+     /* Collect prefixes.  */
+
+--
+Pranith
+
+On 23/03/2017 17:50, Pranith Kumar wrote:
+>
+On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote:
+>
+>
+>
+>
+>
+> On 22/03/2017 21:01, Richard Henderson wrote:
+>
+>>>
+>
+>>> Ah, OK. Thanks for the explanation. May be we should check the size of
+>
+>>> the instruction while decoding the prefixes and error out once we
+>
+>>> exceed the limit. We would not generate any IR code.
+>
+>>
+>
+>> Yes.
+>
+>>
+>
+>> It would not enforce a true limit of 15 bytes, since you can't know that
+>
+>> until you've done the rest of the decode.  But you'd be able to say that
+>
+>> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25
+>
+>> bytes is used.
+>
+>>
+>
+>> Which does fix the bug.
+>
+>
+>
+> Yeah, that would work for 2.9 if somebody wants to put together a patch.
+>
+>  Ensuring that all instruction fetching happens before translation side
+>
+> effects is a little harder, but perhaps it's also the opportunity to get
+>
+> rid of s->rip_offset which is a little ugly.
+>
+>
+How about the following?
+>
+>
+diff --git a/target/i386/translate.c b/target/i386/translate.c
+>
+index 72c1b03a2a..67c58b8900 100644
+>
+--- a/target/i386/translate.c
+>
++++ b/target/i386/translate.c
+>
+@@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State
+>
+*env, DisasContext *s,
+>
+s->vex_l = 0;
+>
+s->vex_v = 0;
+>
+next_byte:
+>
++    /* The prefixes can atmost be 14 bytes since x86 has an upper
+>
++       limit of 15 bytes for the instruction */
+>
++    if (s->pc - pc_start > 14) {
+>
++        goto illegal_op;
+>
++    }
+>
+b = cpu_ldub_code(env, s->pc);
+>
+s->pc++;
+>
+/* Collect prefixes.  */
+Please make the comment more verbose, based on Richard's remark.  We
+should apply it to 2.9.
+
+Also, QEMU usually formats comments with stars on every line.
+
+Paolo
+
+On Thu, Mar 23, 2017 at 1:37 PM, Paolo Bonzini <address@hidden> wrote:
+>
+>
+>
+On 23/03/2017 17:50, Pranith Kumar wrote:
+>
+> On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote:
+>
+>>
+>
+>>
+>
+>> On 22/03/2017 21:01, Richard Henderson wrote:
+>
+>>>>
+>
+>>>> Ah, OK. Thanks for the explanation. May be we should check the size of
+>
+>>>> the instruction while decoding the prefixes and error out once we
+>
+>>>> exceed the limit. We would not generate any IR code.
+>
+>>>
+>
+>>> Yes.
+>
+>>>
+>
+>>> It would not enforce a true limit of 15 bytes, since you can't know that
+>
+>>> until you've done the rest of the decode.  But you'd be able to say that
+>
+>>> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25
+>
+>>> bytes is used.
+>
+>>>
+>
+>>> Which does fix the bug.
+>
+>>
+>
+>> Yeah, that would work for 2.9 if somebody wants to put together a patch.
+>
+>>  Ensuring that all instruction fetching happens before translation side
+>
+>> effects is a little harder, but perhaps it's also the opportunity to get
+>
+>> rid of s->rip_offset which is a little ugly.
+>
+>
+>
+> How about the following?
+>
+>
+>
+> diff --git a/target/i386/translate.c b/target/i386/translate.c
+>
+> index 72c1b03a2a..67c58b8900 100644
+>
+> --- a/target/i386/translate.c
+>
+> +++ b/target/i386/translate.c
+>
+> @@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State
+>
+> *env, DisasContext *s,
+>
+>      s->vex_l = 0;
+>
+>      s->vex_v = 0;
+>
+>   next_byte:
+>
+> +    /* The prefixes can atmost be 14 bytes since x86 has an upper
+>
+> +       limit of 15 bytes for the instruction */
+>
+> +    if (s->pc - pc_start > 14) {
+>
+> +        goto illegal_op;
+>
+> +    }
+>
+>      b = cpu_ldub_code(env, s->pc);
+>
+>      s->pc++;
+>
+>      /* Collect prefixes.  */
+>
+>
+Please make the comment more verbose, based on Richard's remark.  We
+>
+should apply it to 2.9.
+>
+>
+Also, QEMU usually formats comments with stars on every line.
+OK. I'll send a proper patch with updated comment.
+
+Thanks,
+--
+Pranith
+
diff --git a/classification_output/05/device/24190340 b/classification_output/05/device/24190340
new file mode 100644
index 000000000..ebbe31252
--- /dev/null
+++ b/classification_output/05/device/24190340
@@ -0,0 +1,2064 @@
+device: 0.832
+instruction: 0.818
+other: 0.811
+vnc: 0.808
+boot: 0.803
+semantic: 0.793
+assembly: 0.790
+KVM: 0.776
+graphic: 0.775
+mistranslation: 0.758
+network: 0.723
+socket: 0.715
+
+[BUG, RFC] Block graph deadlock on job-dismiss
+
+Hi all,
+
+There's a bug in block layer which leads to block graph deadlock.
+Notably, it takes place when blockdev IO is processed within a separate
+iothread.
+
+This was initially caught by our tests, and I was able to reduce it to a
+relatively simple reproducer.  Such deadlocks are probably supposed to
+be covered in iotests/graph-changes-while-io, but this deadlock isn't.
+
+Basically what the reproducer does is launches QEMU with a drive having
+'iothread' option set, creates a chain of 2 snapshots, launches
+block-commit job for a snapshot and then dismisses the job, starting
+from the lower snapshot.  If the guest is issuing IO at the same time,
+there's a race in acquiring block graph lock and a potential deadlock.
+
+Here's how it can be reproduced:
+
+1. Run QEMU:
+>
+SRCDIR=/path/to/srcdir
+>
+>
+>
+>
+>
+$SRCDIR/build/qemu-system-x86_64 -enable-kvm \
+>
+>
+-machine q35 -cpu Nehalem \
+>
+>
+-name guest=alma8-vm,debug-threads=on \
+>
+>
+-m 2g -smp 2 \
+>
+>
+-nographic -nodefaults \
+>
+>
+-qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \
+>
+>
+-serial unix:/var/run/alma8-serial.sock,server=on,wait=off \
+>
+>
+-object iothread,id=iothread0 \
+>
+>
+-blockdev
+>
+node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2
+>
+\
+>
+-device virtio-blk-pci,drive=disk,iothread=iothread0
+2. Launch IO (random reads) from within the guest:
+>
+nc -U /var/run/alma8-serial.sock
+>
+...
+>
+[root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k
+>
+--size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting
+>
+--rw=randread --iodepth=1 --filename=/testfile
+3. Run snapshots creation & removal of lower snapshot operation in a
+loop (script attached):
+>
+while /bin/true ; do ./remove_lower_snap.sh ; done
+And then it occasionally hangs.
+
+Note: I've tried bisecting this, and looks like deadlock occurs starting
+from the following commit:
+
+(BAD)  5bdbaebcce virtio: Re-enable notifications after drain
+(GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll
+
+On the latest v10.0.0 it does hang as well.
+
+
+Here's backtrace of the main thread:
+
+>
+#0  0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1,
+>
+timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:43
+>
+#1  0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1,
+>
+timeout=-1) at ../util/qemu-timer.c:329
+>
+#2  0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20,
+>
+ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79
+>
+#3  0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) at
+>
+../util/aio-posix.c:730
+>
+#4  0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950,
+>
+parent=0x0, poll=true) at ../block/io.c:378
+>
+#5  0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at
+>
+../block/io.c:391
+>
+#6  0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950,
+>
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+errp=0x0)
+>
+at ../block.c:7682
+>
+#7  0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7964250,
+>
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+errp=0x0)
+>
+at ../block.c:7608
+>
+#8  0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0,
+>
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+errp=0x0)
+>
+at ../block.c:7668
+>
+#9  0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7e59110,
+>
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+errp=0x0)
+>
+at ../block.c:7608
+>
+#10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960,
+>
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+errp=0x0)
+>
+at ../block.c:7668
+>
+#11 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb814ed80,
+>
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+errp=0x0)
+>
+at ../block.c:7608
+>
+#12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0,
+>
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+errp=0x0)
+>
+at ../blockjob.c:157
+>
+#13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb7c9d3f0,
+>
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+errp=0x0)
+>
+at ../block.c:7592
+>
+#14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310,
+>
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+errp=0x0)
+>
+at ../block.c:7661
+>
+#15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx
+>
+(child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 =
+>
+{...}, tran=0x557eb7a87160, errp=0x0) at ../block.c:1234
+>
+#16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb8565af0,
+>
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+errp=0x0)
+>
+at ../block.c:7592
+>
+#17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0,
+>
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+errp=0x0)
+>
+at ../block.c:7661
+>
+#18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context (bs=0x557eb79575e0,
+>
+ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at ../block.c:7715
+>
+#19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) at
+>
+../block.c:3317
+>
+#20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv (job=0x557eb7952800) at
+>
+../blockjob.c:209
+>
+#21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at
+>
+../blockjob.c:82
+>
+#22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at
+>
+../job.c:474
+>
+#23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at
+>
+../job.c:771
+>
+#24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400,
+>
+errp=0x7ffd94b4f488) at ../job.c:783
+>
+--Type <RET> for more, q to quit, c to continue without paging--
+>
+#25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 "commit-snap1",
+>
+errp=0x7ffd94b4f488) at ../job-qmp.c:138
+>
+#26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0,
+>
+ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221
+>
+#27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at
+>
+../qapi/qmp-dispatch.c:128
+>
+#28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at
+>
+../util/async.c:172
+>
+#29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at
+>
+../util/async.c:219
+>
+#30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at
+>
+../util/aio-posix.c:436
+>
+#31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200,
+>
+callback=0x0, user_data=0x0) at ../util/async.c:361
+>
+#32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at
+>
+../glib/gmain.c:3364
+>
+#33 g_main_context_dispatch (context=0x557eb76c6430) at ../glib/gmain.c:4079
+>
+#34 0x0000557eb47d3ab1 in glib_pollfds_poll () at ../util/main-loop.c:287
+>
+#35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at
+>
+../util/main-loop.c:310
+>
+#36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at
+>
+../util/main-loop.c:589
+>
+#37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835
+>
+#38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at
+>
+../system/main.c:50
+>
+#39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at
+>
+../system/main.c:80
+And here's coroutine trying to acquire read lock:
+
+>
+(gdb) qemu coroutine reader_queue->entries.sqh_first
+>
+#0  0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0,
+>
+to_=0x7fc537fff508, action=COROUTINE_YIELD) at
+>
+../util/coroutine-ucontext.c:321
+>
+#1  0x0000557eb47d4d4a in qemu_coroutine_yield () at
+>
+../util/qemu-coroutine.c:339
+>
+#2  0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0
+>
+<reader_queue>, lock=0x7fc53c57de50, flags=0) at
+>
+../util/qemu-coroutine-lock.c:60
+>
+#3  0x0000557eb461fea7 in bdrv_graph_co_rdlock () at ../block/graph-lock.c:231
+>
+#4  0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) at
+>
+/home/root/src/qemu/master/include/block/graph-lock.h:213
+>
+#5  0x0000557eb460fa41 in blk_co_do_preadv_part
+>
+(blk=0x557eb84c0810, offset=6890553344, bytes=4096, qiov=0x7fc530006988,
+>
+qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at ../block/block-backend.c:1339
+>
+#6  0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at
+>
+../block/block-backend.c:1619
+>
+#7  0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, i1=21886) at
+>
+../util/coroutine-ucontext.c:175
+>
+#8  0x00007fc547c2a360 in __start_context () at
+>
+../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
+>
+#9  0x00007ffd94b4ea40 in  ()
+>
+#10 0x0000000000000000 in  ()
+So it looks like main thread is processing job-dismiss request and is
+holding write lock taken in block_job_remove_all_bdrv() (frame #20
+above).  At the same time iothread spawns a coroutine which performs IO
+request.  Before the coroutine is spawned, blk_aio_prwv() increases
+'in_flight' counter for Blk.  Then blk_co_do_preadv_part() (frame #5) is
+trying to acquire the read lock.  But main thread isn't releasing the
+lock as blk_root_drained_poll() returns true since blk->in_flight > 0.
+Here's the deadlock.
+
+Any comments and suggestions on the subject are welcomed.  Thanks!
+
+Andrey
+remove_lower_snap.sh
+Description:
+application/shellscript
+
+On 4/24/25 8:32 PM, Andrey Drobyshev wrote:
+>
+Hi all,
+>
+>
+There's a bug in block layer which leads to block graph deadlock.
+>
+Notably, it takes place when blockdev IO is processed within a separate
+>
+iothread.
+>
+>
+This was initially caught by our tests, and I was able to reduce it to a
+>
+relatively simple reproducer.  Such deadlocks are probably supposed to
+>
+be covered in iotests/graph-changes-while-io, but this deadlock isn't.
+>
+>
+Basically what the reproducer does is launches QEMU with a drive having
+>
+'iothread' option set, creates a chain of 2 snapshots, launches
+>
+block-commit job for a snapshot and then dismisses the job, starting
+>
+from the lower snapshot.  If the guest is issuing IO at the same time,
+>
+there's a race in acquiring block graph lock and a potential deadlock.
+>
+>
+Here's how it can be reproduced:
+>
+>
+[...]
+>
+I took a closer look at iotests/graph-changes-while-io, and have managed
+to reproduce the same deadlock in a much simpler setup, without a guest.
+
+1. Run QSD:> ./build/storage-daemon/qemu-storage-daemon --object
+iothread,id=iothread0 \
+>
+--blockdev null-co,node-name=node0,read-zeroes=true \
+>
+>
+--nbd-server addr.type=unix,addr.path=/var/run/qsd_nbd.sock \
+>
+>
+--export
+>
+nbd,id=exp0,node-name=node0,iothread=iothread0,fixed-iothread=true,writable=true
+>
+\
+>
+--chardev
+>
+socket,id=qmp-sock,path=/var/run/qsd_qmp.sock,server=on,wait=off \
+>
+--monitor chardev=qmp-sock
+2. Launch IO:
+>
+qemu-img bench -f raw -c 2000000
+>
+'nbd+unix:///node0?socket=/var/run/qsd_nbd.sock'
+3. Add 2 snapshots and remove lower one (script attached):> while
+/bin/true ; do ./rls_qsd.sh ; done
+
+And then it hangs.
+
+I'll also send a patch with corresponding test case added directly to
+iotests.
+
+This reproduce seems to be hanging starting from Fiona's commit
+67446e605dc ("blockjob: drop AioContext lock before calling
+bdrv_graph_wrlock()").  AioContext locks were dropped entirely later on
+in Stefan's commit b49f4755c7 ("block: remove AioContext locking"), but
+the problem remains.
+
+Andrey
+rls_qsd.sh
+Description:
+application/shellscript
+
+From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
+
+This case is catching potential deadlock which takes place when job-dismiss
+is issued when I/O requests are processed in a separate iothread.
+
+See
+https://mail.gnu.org/archive/html/qemu-devel/2025-04/msg04421.html
+Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
+---
+ .../qemu-iotests/tests/graph-changes-while-io | 101 ++++++++++++++++--
+ .../tests/graph-changes-while-io.out          |   4 +-
+ 2 files changed, 96 insertions(+), 9 deletions(-)
+
+diff --git a/tests/qemu-iotests/tests/graph-changes-while-io 
+b/tests/qemu-iotests/tests/graph-changes-while-io
+index 194fda500e..e30f823da4 100755
+--- a/tests/qemu-iotests/tests/graph-changes-while-io
++++ b/tests/qemu-iotests/tests/graph-changes-while-io
+@@ -27,6 +27,8 @@ from iotests import imgfmt, qemu_img, qemu_img_create, 
+qemu_io, \
+ 
+ 
+ top = os.path.join(iotests.test_dir, 'top.img')
++snap1 = os.path.join(iotests.test_dir, 'snap1.img')
++snap2 = os.path.join(iotests.test_dir, 'snap2.img')
+ nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock')
+ 
+ 
+@@ -58,6 +60,15 @@ class TestGraphChangesWhileIO(QMPTestCase):
+     def tearDown(self) -> None:
+         self.qsd.stop()
+ 
++    def _wait_for_blockjob(self, status) -> None:
++        done = False
++        while not done:
++            for event in self.qsd.get_qmp().get_events(wait=10.0):
++                if event['event'] != 'JOB_STATUS_CHANGE':
++                    continue
++                if event['data']['status'] == status:
++                    done = True
++
+     def test_blockdev_add_while_io(self) -> None:
+         # Run qemu-img bench in the background
+         bench_thr = Thread(target=do_qemu_img_bench)
+@@ -116,13 +127,89 @@ class TestGraphChangesWhileIO(QMPTestCase):
+                 'device': 'job0',
+             })
+ 
+-            cancelled = False
+-            while not cancelled:
+-                for event in self.qsd.get_qmp().get_events(wait=10.0):
+-                    if event['event'] != 'JOB_STATUS_CHANGE':
+-                        continue
+-                    if event['data']['status'] == 'null':
+-                        cancelled = True
++            self._wait_for_blockjob('null')
++
++        bench_thr.join()
++
++    def test_remove_lower_snapshot_while_io(self) -> None:
++        # Run qemu-img bench in the background
++        bench_thr = Thread(target=do_qemu_img_bench, args=(100000, ))
++        bench_thr.start()
++
++        # While I/O is performed on 'node0' node, consequently add 2 snapshots
++        # on top of it, then remove (commit) them starting from lower one.
++        while bench_thr.is_alive():
++            # Recreate snapshot images on every iteration
++            qemu_img_create('-f', imgfmt, snap1, '1G')
++            qemu_img_create('-f', imgfmt, snap2, '1G')
++
++            self.qsd.cmd('blockdev-add', {
++                'driver': imgfmt,
++                'node-name': 'snap1',
++                'file': {
++                    'driver': 'file',
++                    'filename': snap1
++                }
++            })
++
++            self.qsd.cmd('blockdev-snapshot', {
++                'node': 'node0',
++                'overlay': 'snap1',
++            })
++
++            self.qsd.cmd('blockdev-add', {
++                'driver': imgfmt,
++                'node-name': 'snap2',
++                'file': {
++                    'driver': 'file',
++                    'filename': snap2
++                }
++            })
++
++            self.qsd.cmd('blockdev-snapshot', {
++                'node': 'snap1',
++                'overlay': 'snap2',
++            })
++
++            self.qsd.cmd('block-commit', {
++                'job-id': 'commit-snap1',
++                'device': 'snap2',
++                'top-node': 'snap1',
++                'base-node': 'node0',
++                'auto-finalize': True,
++                'auto-dismiss': False,
++            })
++
++            self._wait_for_blockjob('concluded')
++            self.qsd.cmd('job-dismiss', {
++                'id': 'commit-snap1',
++            })
++
++            self.qsd.cmd('block-commit', {
++                'job-id': 'commit-snap2',
++                'device': 'snap2',
++                'top-node': 'snap2',
++                'base-node': 'node0',
++                'auto-finalize': True,
++                'auto-dismiss': False,
++            })
++
++            self._wait_for_blockjob('ready')
++            self.qsd.cmd('job-complete', {
++                'id': 'commit-snap2',
++            })
++
++            self._wait_for_blockjob('concluded')
++            self.qsd.cmd('job-dismiss', {
++                'id': 'commit-snap2',
++            })
++
++            self.qsd.cmd('blockdev-del', {
++                'node-name': 'snap1'
++            })
++            self.qsd.cmd('blockdev-del', {
++                'node-name': 'snap2'
++            })
+ 
+         bench_thr.join()
+ 
+diff --git a/tests/qemu-iotests/tests/graph-changes-while-io.out 
+b/tests/qemu-iotests/tests/graph-changes-while-io.out
+index fbc63e62f8..8d7e996700 100644
+--- a/tests/qemu-iotests/tests/graph-changes-while-io.out
++++ b/tests/qemu-iotests/tests/graph-changes-while-io.out
+@@ -1,5 +1,5 @@
+-..
++...
+ ----------------------------------------------------------------------
+-Ran 2 tests
++Ran 3 tests
+ 
+ OK
+-- 
+2.43.5
+
+Am 24.04.25 um 19:32 schrieb Andrey Drobyshev:
+>
+So it looks like main thread is processing job-dismiss request and is
+>
+holding write lock taken in block_job_remove_all_bdrv() (frame #20
+>
+above).  At the same time iothread spawns a coroutine which performs IO
+>
+request.  Before the coroutine is spawned, blk_aio_prwv() increases
+>
+'in_flight' counter for Blk.  Then blk_co_do_preadv_part() (frame #5) is
+>
+trying to acquire the read lock.  But main thread isn't releasing the
+>
+lock as blk_root_drained_poll() returns true since blk->in_flight > 0.
+>
+Here's the deadlock.
+And for the IO test you provided, it's client->nb_requests that behaves
+similarly to blk->in_flight here.
+
+The issue also reproduces easily when issuing the following QMP command
+in a loop while doing IO on a device:
+
+>
+void qmp_block_locked_drain(const char *node_name, Error **errp)
+>
+{
+>
+BlockDriverState *bs;
+>
+>
+bs = bdrv_find_node(node_name);
+>
+if (!bs) {
+>
+error_setg(errp, "node not found");
+>
+return;
+>
+}
+>
+>
+bdrv_graph_wrlock();
+>
+bdrv_drained_begin(bs);
+>
+bdrv_drained_end(bs);
+>
+bdrv_graph_wrunlock();
+>
+}
+It seems like either it would be necessary to require:
+1. not draining inside an exclusively locked section
+or
+2. making sure that variables used by drained_poll routines are only set
+while holding the reader lock
+?
+
+Those seem to require rather involved changes, so a third option might
+be to make draining inside an exclusively locked section possible, by
+embedding such locked sections in a drained section:
+
+>
+diff --git a/blockjob.c b/blockjob.c
+>
+index 32007f31a9..9b2f3b3ea9 100644
+>
+--- a/blockjob.c
+>
++++ b/blockjob.c
+>
+@@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
+>
+* one to make sure that such a concurrent access does not attempt
+>
+* to process an already freed BdrvChild.
+>
+*/
+>
++    bdrv_drain_all_begin();
+>
+bdrv_graph_wrlock();
+>
+while (job->nodes) {
+>
+GSList *l = job->nodes;
+>
+@@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
+>
+g_slist_free_1(l);
+>
+}
+>
+bdrv_graph_wrunlock();
+>
++    bdrv_drain_all_end();
+>
+}
+>
+>
+bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs)
+This seems to fix the issue at hand. I can send a patch if this is
+considered an acceptable approach.
+
+Best Regards,
+Fiona
+
+On 4/30/25 11:47 AM, Fiona Ebner wrote:
+>
+Am 24.04.25 um 19:32 schrieb Andrey Drobyshev:
+>
+> So it looks like main thread is processing job-dismiss request and is
+>
+> holding write lock taken in block_job_remove_all_bdrv() (frame #20
+>
+> above).  At the same time iothread spawns a coroutine which performs IO
+>
+> request.  Before the coroutine is spawned, blk_aio_prwv() increases
+>
+> 'in_flight' counter for Blk.  Then blk_co_do_preadv_part() (frame #5) is
+>
+> trying to acquire the read lock.  But main thread isn't releasing the
+>
+> lock as blk_root_drained_poll() returns true since blk->in_flight > 0.
+>
+> Here's the deadlock.
+>
+>
+And for the IO test you provided, it's client->nb_requests that behaves
+>
+similarly to blk->in_flight here.
+>
+>
+The issue also reproduces easily when issuing the following QMP command
+>
+in a loop while doing IO on a device:
+>
+>
+> void qmp_block_locked_drain(const char *node_name, Error **errp)
+>
+> {
+>
+>     BlockDriverState *bs;
+>
+>
+>
+>     bs = bdrv_find_node(node_name);
+>
+>     if (!bs) {
+>
+>         error_setg(errp, "node not found");
+>
+>         return;
+>
+>     }
+>
+>
+>
+>     bdrv_graph_wrlock();
+>
+>     bdrv_drained_begin(bs);
+>
+>     bdrv_drained_end(bs);
+>
+>     bdrv_graph_wrunlock();
+>
+> }
+>
+>
+It seems like either it would be necessary to require:
+>
+1. not draining inside an exclusively locked section
+>
+or
+>
+2. making sure that variables used by drained_poll routines are only set
+>
+while holding the reader lock
+>
+?
+>
+>
+Those seem to require rather involved changes, so a third option might
+>
+be to make draining inside an exclusively locked section possible, by
+>
+embedding such locked sections in a drained section:
+>
+>
+> diff --git a/blockjob.c b/blockjob.c
+>
+> index 32007f31a9..9b2f3b3ea9 100644
+>
+> --- a/blockjob.c
+>
+> +++ b/blockjob.c
+>
+> @@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
+>
+>       * one to make sure that such a concurrent access does not attempt
+>
+>       * to process an already freed BdrvChild.
+>
+>       */
+>
+> +    bdrv_drain_all_begin();
+>
+>      bdrv_graph_wrlock();
+>
+>      while (job->nodes) {
+>
+>          GSList *l = job->nodes;
+>
+> @@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
+>
+>          g_slist_free_1(l);
+>
+>      }
+>
+>      bdrv_graph_wrunlock();
+>
+> +    bdrv_drain_all_end();
+>
+>  }
+>
+>
+>
+>  bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs)
+>
+>
+This seems to fix the issue at hand. I can send a patch if this is
+>
+considered an acceptable approach.
+>
+>
+Best Regards,
+>
+Fiona
+>
+Hello Fiona,
+
+Thanks for looking into it.  I've tried your 3rd option above and can
+confirm it does fix the deadlock, at least I can't reproduce it.  Other
+iotests also don't seem to be breaking.  So I personally am fine with
+that patch.  Would be nice to hear a word from the maintainers though on
+whether there're any caveats with such approach.
+
+Andrey
+
+On Wed, Apr 30, 2025 at 10:11â¯AM Andrey Drobyshev
+<andrey.drobyshev@virtuozzo.com> wrote:
+>
+>
+On 4/30/25 11:47 AM, Fiona Ebner wrote:
+>
+> Am 24.04.25 um 19:32 schrieb Andrey Drobyshev:
+>
+>> So it looks like main thread is processing job-dismiss request and is
+>
+>> holding write lock taken in block_job_remove_all_bdrv() (frame #20
+>
+>> above).  At the same time iothread spawns a coroutine which performs IO
+>
+>> request.  Before the coroutine is spawned, blk_aio_prwv() increases
+>
+>> 'in_flight' counter for Blk.  Then blk_co_do_preadv_part() (frame #5) is
+>
+>> trying to acquire the read lock.  But main thread isn't releasing the
+>
+>> lock as blk_root_drained_poll() returns true since blk->in_flight > 0.
+>
+>> Here's the deadlock.
+>
+>
+>
+> And for the IO test you provided, it's client->nb_requests that behaves
+>
+> similarly to blk->in_flight here.
+>
+>
+>
+> The issue also reproduces easily when issuing the following QMP command
+>
+> in a loop while doing IO on a device:
+>
+>
+>
+>> void qmp_block_locked_drain(const char *node_name, Error **errp)
+>
+>> {
+>
+>>     BlockDriverState *bs;
+>
+>>
+>
+>>     bs = bdrv_find_node(node_name);
+>
+>>     if (!bs) {
+>
+>>         error_setg(errp, "node not found");
+>
+>>         return;
+>
+>>     }
+>
+>>
+>
+>>     bdrv_graph_wrlock();
+>
+>>     bdrv_drained_begin(bs);
+>
+>>     bdrv_drained_end(bs);
+>
+>>     bdrv_graph_wrunlock();
+>
+>> }
+>
+>
+>
+> It seems like either it would be necessary to require:
+>
+> 1. not draining inside an exclusively locked section
+>
+> or
+>
+> 2. making sure that variables used by drained_poll routines are only set
+>
+> while holding the reader lock
+>
+> ?
+>
+>
+>
+> Those seem to require rather involved changes, so a third option might
+>
+> be to make draining inside an exclusively locked section possible, by
+>
+> embedding such locked sections in a drained section:
+>
+>
+>
+>> diff --git a/blockjob.c b/blockjob.c
+>
+>> index 32007f31a9..9b2f3b3ea9 100644
+>
+>> --- a/blockjob.c
+>
+>> +++ b/blockjob.c
+>
+>> @@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
+>
+>>       * one to make sure that such a concurrent access does not attempt
+>
+>>       * to process an already freed BdrvChild.
+>
+>>       */
+>
+>> +    bdrv_drain_all_begin();
+>
+>>      bdrv_graph_wrlock();
+>
+>>      while (job->nodes) {
+>
+>>          GSList *l = job->nodes;
+>
+>> @@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
+>
+>>          g_slist_free_1(l);
+>
+>>      }
+>
+>>      bdrv_graph_wrunlock();
+>
+>> +    bdrv_drain_all_end();
+>
+>>  }
+>
+>>
+>
+>>  bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs)
+>
+>
+>
+> This seems to fix the issue at hand. I can send a patch if this is
+>
+> considered an acceptable approach.
+Kevin is aware of this thread but it's a public holiday tomorrow so it
+may be a little longer.
+
+Stefan
+
+Am 24.04.2025 um 19:32 hat Andrey Drobyshev geschrieben:
+>
+Hi all,
+>
+>
+There's a bug in block layer which leads to block graph deadlock.
+>
+Notably, it takes place when blockdev IO is processed within a separate
+>
+iothread.
+>
+>
+This was initially caught by our tests, and I was able to reduce it to a
+>
+relatively simple reproducer.  Such deadlocks are probably supposed to
+>
+be covered in iotests/graph-changes-while-io, but this deadlock isn't.
+>
+>
+Basically what the reproducer does is launches QEMU with a drive having
+>
+'iothread' option set, creates a chain of 2 snapshots, launches
+>
+block-commit job for a snapshot and then dismisses the job, starting
+>
+from the lower snapshot.  If the guest is issuing IO at the same time,
+>
+there's a race in acquiring block graph lock and a potential deadlock.
+>
+>
+Here's how it can be reproduced:
+>
+>
+1. Run QEMU:
+>
+> SRCDIR=/path/to/srcdir
+>
+>
+>
+>
+>
+>
+>
+>
+>
+> $SRCDIR/build/qemu-system-x86_64 -enable-kvm \
+>
+>
+>
+>   -machine q35 -cpu Nehalem \
+>
+>
+>
+>   -name guest=alma8-vm,debug-threads=on \
+>
+>
+>
+>   -m 2g -smp 2 \
+>
+>
+>
+>   -nographic -nodefaults \
+>
+>
+>
+>   -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \
+>
+>
+>
+>   -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \
+>
+>
+>
+>   -object iothread,id=iothread0 \
+>
+>
+>
+>   -blockdev
+>
+> node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2
+>
+>  \
+>
+>   -device virtio-blk-pci,drive=disk,iothread=iothread0
+>
+>
+2. Launch IO (random reads) from within the guest:
+>
+> nc -U /var/run/alma8-serial.sock
+>
+> ...
+>
+> [root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k
+>
+> --size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting
+>
+> --rw=randread --iodepth=1 --filename=/testfile
+>
+>
+3. Run snapshots creation & removal of lower snapshot operation in a
+>
+loop (script attached):
+>
+> while /bin/true ; do ./remove_lower_snap.sh ; done
+>
+>
+And then it occasionally hangs.
+>
+>
+Note: I've tried bisecting this, and looks like deadlock occurs starting
+>
+from the following commit:
+>
+>
+(BAD)  5bdbaebcce virtio: Re-enable notifications after drain
+>
+(GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll
+>
+>
+On the latest v10.0.0 it does hang as well.
+>
+>
+>
+Here's backtrace of the main thread:
+>
+>
+> #0  0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1,
+>
+> timeout=<optimized out>, sigmask=0x0) at
+>
+> ../sysdeps/unix/sysv/linux/ppoll.c:43
+>
+> #1  0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1,
+>
+> timeout=-1) at ../util/qemu-timer.c:329
+>
+> #2  0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20,
+>
+> ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79
+>
+> #3  0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) at
+>
+> ../util/aio-posix.c:730
+>
+> #4  0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950,
+>
+> parent=0x0, poll=true) at ../block/io.c:378
+>
+> #5  0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at
+>
+> ../block/io.c:391
+>
+> #6  0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950,
+>
+> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+> errp=0x0)
+>
+>     at ../block.c:7682
+>
+> #7  0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7964250,
+>
+> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+> errp=0x0)
+>
+>     at ../block.c:7608
+>
+> #8  0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0,
+>
+> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+> errp=0x0)
+>
+>     at ../block.c:7668
+>
+> #9  0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7e59110,
+>
+> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+> errp=0x0)
+>
+>     at ../block.c:7608
+>
+> #10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960,
+>
+> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+> errp=0x0)
+>
+>     at ../block.c:7668
+>
+> #11 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb814ed80,
+>
+> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+> errp=0x0)
+>
+>     at ../block.c:7608
+>
+> #12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0,
+>
+> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+> errp=0x0)
+>
+>     at ../blockjob.c:157
+>
+> #13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb7c9d3f0,
+>
+> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+> errp=0x0)
+>
+>     at ../block.c:7592
+>
+> #14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310,
+>
+> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+> errp=0x0)
+>
+>     at ../block.c:7661
+>
+> #15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx
+>
+>     (child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 =
+>
+> {...}, tran=0x557eb7a87160, errp=0x0) at ../block.c:1234
+>
+> #16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb8565af0,
+>
+> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+> errp=0x0)
+>
+>     at ../block.c:7592
+>
+> #17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0,
+>
+> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160,
+>
+> errp=0x0)
+>
+>     at ../block.c:7661
+>
+> #18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context (bs=0x557eb79575e0,
+>
+> ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at ../block.c:7715
+>
+> #19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) at
+>
+> ../block.c:3317
+>
+> #20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv (job=0x557eb7952800) at
+>
+> ../blockjob.c:209
+>
+> #21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at
+>
+> ../blockjob.c:82
+>
+> #22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at
+>
+> ../job.c:474
+>
+> #23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at
+>
+> ../job.c:771
+>
+> #24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400,
+>
+> errp=0x7ffd94b4f488) at ../job.c:783
+>
+> --Type <RET> for more, q to quit, c to continue without paging--
+>
+> #25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0
+>
+> "commit-snap1", errp=0x7ffd94b4f488) at ../job-qmp.c:138
+>
+> #26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0,
+>
+> ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221
+>
+> #27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at
+>
+> ../qapi/qmp-dispatch.c:128
+>
+> #28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at
+>
+> ../util/async.c:172
+>
+> #29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at
+>
+> ../util/async.c:219
+>
+> #30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at
+>
+> ../util/aio-posix.c:436
+>
+> #31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200,
+>
+> callback=0x0, user_data=0x0) at ../util/async.c:361
+>
+> #32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at
+>
+> ../glib/gmain.c:3364
+>
+> #33 g_main_context_dispatch (context=0x557eb76c6430) at ../glib/gmain.c:4079
+>
+> #34 0x0000557eb47d3ab1 in glib_pollfds_poll () at ../util/main-loop.c:287
+>
+> #35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at
+>
+> ../util/main-loop.c:310
+>
+> #36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at
+>
+> ../util/main-loop.c:589
+>
+> #37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835
+>
+> #38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at
+>
+> ../system/main.c:50
+>
+> #39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at
+>
+> ../system/main.c:80
+>
+>
+>
+And here's coroutine trying to acquire read lock:
+>
+>
+> (gdb) qemu coroutine reader_queue->entries.sqh_first
+>
+> #0  0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0,
+>
+> to_=0x7fc537fff508, action=COROUTINE_YIELD) at
+>
+> ../util/coroutine-ucontext.c:321
+>
+> #1  0x0000557eb47d4d4a in qemu_coroutine_yield () at
+>
+> ../util/qemu-coroutine.c:339
+>
+> #2  0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0
+>
+> <reader_queue>, lock=0x7fc53c57de50, flags=0) at
+>
+> ../util/qemu-coroutine-lock.c:60
+>
+> #3  0x0000557eb461fea7 in bdrv_graph_co_rdlock () at
+>
+> ../block/graph-lock.c:231
+>
+> #4  0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) at
+>
+> /home/root/src/qemu/master/include/block/graph-lock.h:213
+>
+> #5  0x0000557eb460fa41 in blk_co_do_preadv_part
+>
+>     (blk=0x557eb84c0810, offset=6890553344, bytes=4096,
+>
+> qiov=0x7fc530006988, qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at
+>
+> ../block/block-backend.c:1339
+>
+> #6  0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at
+>
+> ../block/block-backend.c:1619
+>
+> #7  0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, i1=21886)
+>
+> at ../util/coroutine-ucontext.c:175
+>
+> #8  0x00007fc547c2a360 in __start_context () at
+>
+> ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
+>
+> #9  0x00007ffd94b4ea40 in  ()
+>
+> #10 0x0000000000000000 in  ()
+>
+>
+>
+So it looks like main thread is processing job-dismiss request and is
+>
+holding write lock taken in block_job_remove_all_bdrv() (frame #20
+>
+above).  At the same time iothread spawns a coroutine which performs IO
+>
+request.  Before the coroutine is spawned, blk_aio_prwv() increases
+>
+'in_flight' counter for Blk.  Then blk_co_do_preadv_part() (frame #5) is
+>
+trying to acquire the read lock.  But main thread isn't releasing the
+>
+lock as blk_root_drained_poll() returns true since blk->in_flight > 0.
+>
+Here's the deadlock.
+>
+>
+Any comments and suggestions on the subject are welcomed.  Thanks!
+I think this is what the blk_wait_while_drained() call was supposed to
+address in blk_co_do_preadv_part(). However, with the use of multiple
+I/O threads, this is racy.
+
+Do you think that in your case we hit the small race window between the
+checks in blk_wait_while_drained() and GRAPH_RDLOCK_GUARD()? Or is there
+another reason why blk_wait_while_drained() didn't do its job?
+
+Kevin
+
+On 5/2/25 19:34, Kevin Wolf wrote:
+Am 24.04.2025 um 19:32 hat Andrey Drobyshev geschrieben:
+Hi all,
+
+There's a bug in block layer which leads to block graph deadlock.
+Notably, it takes place when blockdev IO is processed within a separate
+iothread.
+
+This was initially caught by our tests, and I was able to reduce it to a
+relatively simple reproducer.  Such deadlocks are probably supposed to
+be covered in iotests/graph-changes-while-io, but this deadlock isn't.
+
+Basically what the reproducer does is launches QEMU with a drive having
+'iothread' option set, creates a chain of 2 snapshots, launches
+block-commit job for a snapshot and then dismisses the job, starting
+from the lower snapshot.  If the guest is issuing IO at the same time,
+there's a race in acquiring block graph lock and a potential deadlock.
+
+Here's how it can be reproduced:
+
+1. Run QEMU:
+SRCDIR=/path/to/srcdir
+$SRCDIR/build/qemu-system-x86_64 -enable-kvm \
+-machine q35 -cpu Nehalem \
+   -name guest=alma8-vm,debug-threads=on \
+   -m 2g -smp 2 \
+   -nographic -nodefaults \
+   -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \
+   -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \
+   -object iothread,id=iothread0 \
+   -blockdev 
+node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2
+ \
+   -device virtio-blk-pci,drive=disk,iothread=iothread0
+2. Launch IO (random reads) from within the guest:
+nc -U /var/run/alma8-serial.sock
+...
+[root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k 
+--size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting 
+--rw=randread --iodepth=1 --filename=/testfile
+3. Run snapshots creation & removal of lower snapshot operation in a
+loop (script attached):
+while /bin/true ; do ./remove_lower_snap.sh ; done
+And then it occasionally hangs.
+
+Note: I've tried bisecting this, and looks like deadlock occurs starting
+from the following commit:
+
+(BAD)  5bdbaebcce virtio: Re-enable notifications after drain
+(GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll
+
+On the latest v10.0.0 it does hang as well.
+
+
+Here's backtrace of the main thread:
+#0  0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1, timeout=<optimized 
+out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:43
+#1  0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1, timeout=-1) 
+at ../util/qemu-timer.c:329
+#2  0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20, 
+ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79
+#3  0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) at 
+../util/aio-posix.c:730
+#4  0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950, parent=0x0, 
+poll=true) at ../block/io.c:378
+#5  0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at 
+../block/io.c:391
+#6  0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950, 
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
+errp=0x0)
+     at ../block.c:7682
+#7  0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7964250, 
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
+errp=0x0)
+     at ../block.c:7608
+#8  0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0, 
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
+errp=0x0)
+     at ../block.c:7668
+#9  0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7e59110, 
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
+errp=0x0)
+     at ../block.c:7608
+#10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960, 
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
+errp=0x0)
+     at ../block.c:7668
+#11 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb814ed80, 
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
+errp=0x0)
+     at ../block.c:7608
+#12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0, 
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
+errp=0x0)
+     at ../blockjob.c:157
+#13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb7c9d3f0, 
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
+errp=0x0)
+     at ../block.c:7592
+#14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310, 
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
+errp=0x0)
+     at ../block.c:7661
+#15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx
+     (child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, 
+tran=0x557eb7a87160, errp=0x0) at ../block.c:1234
+#16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb8565af0, 
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
+errp=0x0)
+     at ../block.c:7592
+#17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0, 
+ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, 
+errp=0x0)
+     at ../block.c:7661
+#18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context (bs=0x557eb79575e0, 
+ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at ../block.c:7715
+#19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) at 
+../block.c:3317
+#20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv (job=0x557eb7952800) at 
+../blockjob.c:209
+#21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at 
+../blockjob.c:82
+#22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at ../job.c:474
+#23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at 
+../job.c:771
+#24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400, 
+errp=0x7ffd94b4f488) at ../job.c:783
+--Type <RET> for more, q to quit, c to continue without paging--
+#25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 "commit-snap1", 
+errp=0x7ffd94b4f488) at ../job-qmp.c:138
+#26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0, 
+ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221
+#27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at 
+../qapi/qmp-dispatch.c:128
+#28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at ../util/async.c:172
+#29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at 
+../util/async.c:219
+#30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at 
+../util/aio-posix.c:436
+#31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200, 
+callback=0x0, user_data=0x0) at ../util/async.c:361
+#32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at 
+../glib/gmain.c:3364
+#33 g_main_context_dispatch (context=0x557eb76c6430) at ../glib/gmain.c:4079
+#34 0x0000557eb47d3ab1 in glib_pollfds_poll () at ../util/main-loop.c:287
+#35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at 
+../util/main-loop.c:310
+#36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at 
+../util/main-loop.c:589
+#37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835
+#38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at ../system/main.c:50
+#39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at 
+../system/main.c:80
+And here's coroutine trying to acquire read lock:
+(gdb) qemu coroutine reader_queue->entries.sqh_first
+#0  0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0, 
+to_=0x7fc537fff508, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:321
+#1  0x0000557eb47d4d4a in qemu_coroutine_yield () at 
+../util/qemu-coroutine.c:339
+#2  0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0 
+<reader_queue>, lock=0x7fc53c57de50, flags=0) at 
+../util/qemu-coroutine-lock.c:60
+#3  0x0000557eb461fea7 in bdrv_graph_co_rdlock () at ../block/graph-lock.c:231
+#4  0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) at 
+/home/root/src/qemu/master/include/block/graph-lock.h:213
+#5  0x0000557eb460fa41 in blk_co_do_preadv_part
+     (blk=0x557eb84c0810, offset=6890553344, bytes=4096, qiov=0x7fc530006988, 
+qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at ../block/block-backend.c:1339
+#6  0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at 
+../block/block-backend.c:1619
+#7  0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, i1=21886) at 
+../util/coroutine-ucontext.c:175
+#8  0x00007fc547c2a360 in __start_context () at 
+../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
+#9  0x00007ffd94b4ea40 in  ()
+#10 0x0000000000000000 in  ()
+So it looks like main thread is processing job-dismiss request and is
+holding write lock taken in block_job_remove_all_bdrv() (frame #20
+above).  At the same time iothread spawns a coroutine which performs IO
+request.  Before the coroutine is spawned, blk_aio_prwv() increases
+'in_flight' counter for Blk.  Then blk_co_do_preadv_part() (frame #5) is
+trying to acquire the read lock.  But main thread isn't releasing the
+lock as blk_root_drained_poll() returns true since blk->in_flight > 0.
+Here's the deadlock.
+
+Any comments and suggestions on the subject are welcomed.  Thanks!
+I think this is what the blk_wait_while_drained() call was supposed to
+address in blk_co_do_preadv_part(). However, with the use of multiple
+I/O threads, this is racy.
+
+Do you think that in your case we hit the small race window between the
+checks in blk_wait_while_drained() and GRAPH_RDLOCK_GUARD()? Or is there
+another reason why blk_wait_while_drained() didn't do its job?
+
+Kevin
+At my opinion there is very big race window. Main thread has
+eaten graph write lock. After that another coroutine is stalled
+within GRAPH_RDLOCK_GUARD() as there is no drain at the moment and only
+after that main thread has started drain. That is why Fiona's idea is
+looking working. Though this would mean that normally we should always
+do that at the moment when we acquire write lock. May be even inside
+this function. Den
+
+Am 02.05.2025 um 19:52 hat Denis V. Lunev geschrieben:
+>
+On 5/2/25 19:34, Kevin Wolf wrote:
+>
+> Am 24.04.2025 um 19:32 hat Andrey Drobyshev geschrieben:
+>
+> > Hi all,
+>
+> >
+>
+> > There's a bug in block layer which leads to block graph deadlock.
+>
+> > Notably, it takes place when blockdev IO is processed within a separate
+>
+> > iothread.
+>
+> >
+>
+> > This was initially caught by our tests, and I was able to reduce it to a
+>
+> > relatively simple reproducer.  Such deadlocks are probably supposed to
+>
+> > be covered in iotests/graph-changes-while-io, but this deadlock isn't.
+>
+> >
+>
+> > Basically what the reproducer does is launches QEMU with a drive having
+>
+> > 'iothread' option set, creates a chain of 2 snapshots, launches
+>
+> > block-commit job for a snapshot and then dismisses the job, starting
+>
+> > from the lower snapshot.  If the guest is issuing IO at the same time,
+>
+> > there's a race in acquiring block graph lock and a potential deadlock.
+>
+> >
+>
+> > Here's how it can be reproduced:
+>
+> >
+>
+> > 1. Run QEMU:
+>
+> > > SRCDIR=/path/to/srcdir
+>
+> > > $SRCDIR/build/qemu-system-x86_64 -enable-kvm \
+>
+> > >    -machine q35 -cpu Nehalem \
+>
+> > >    -name guest=alma8-vm,debug-threads=on \
+>
+> > >    -m 2g -smp 2 \
+>
+> > >    -nographic -nodefaults \
+>
+> > >    -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \
+>
+> > >    -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \
+>
+> > >    -object iothread,id=iothread0 \
+>
+> > >    -blockdev
+>
+> > > node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2
+>
+> > >  \
+>
+> > >    -device virtio-blk-pci,drive=disk,iothread=iothread0
+>
+> > 2. Launch IO (random reads) from within the guest:
+>
+> > > nc -U /var/run/alma8-serial.sock
+>
+> > > ...
+>
+> > > [root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1
+>
+> > > --bs=4k --size=1G --numjobs=1 --time_based=1 --runtime=300
+>
+> > > --group_reporting --rw=randread --iodepth=1 --filename=/testfile
+>
+> > 3. Run snapshots creation & removal of lower snapshot operation in a
+>
+> > loop (script attached):
+>
+> > > while /bin/true ; do ./remove_lower_snap.sh ; done
+>
+> > And then it occasionally hangs.
+>
+> >
+>
+> > Note: I've tried bisecting this, and looks like deadlock occurs starting
+>
+> > from the following commit:
+>
+> >
+>
+> > (BAD)  5bdbaebcce virtio: Re-enable notifications after drain
+>
+> > (GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll
+>
+> >
+>
+> > On the latest v10.0.0 it does hang as well.
+>
+> >
+>
+> >
+>
+> > Here's backtrace of the main thread:
+>
+> >
+>
+> > > #0  0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1,
+>
+> > > timeout=<optimized out>, sigmask=0x0) at
+>
+> > > ../sysdeps/unix/sysv/linux/ppoll.c:43
+>
+> > > #1  0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1,
+>
+> > > timeout=-1) at ../util/qemu-timer.c:329
+>
+> > > #2  0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20,
+>
+> > > ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79
+>
+> > > #3  0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true)
+>
+> > > at ../util/aio-posix.c:730
+>
+> > > #4  0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950,
+>
+> > > parent=0x0, poll=true) at ../block/io.c:378
+>
+> > > #5  0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at
+>
+> > > ../block/io.c:391
+>
+> > > #6  0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950,
+>
+> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
+>
+> > > tran=0x557eb7a87160, errp=0x0)
+>
+> > >      at ../block.c:7682
+>
+> > > #7  0x0000557eb45ebf2b in bdrv_child_change_aio_context
+>
+> > > (c=0x557eb7964250, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
+>
+> > > tran=0x557eb7a87160, errp=0x0)
+>
+> > >      at ../block.c:7608
+>
+> > > #8  0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0,
+>
+> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
+>
+> > > tran=0x557eb7a87160, errp=0x0)
+>
+> > >      at ../block.c:7668
+>
+> > > #9  0x0000557eb45ebf2b in bdrv_child_change_aio_context
+>
+> > > (c=0x557eb7e59110, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
+>
+> > > tran=0x557eb7a87160, errp=0x0)
+>
+> > >      at ../block.c:7608
+>
+> > > #10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960,
+>
+> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
+>
+> > > tran=0x557eb7a87160, errp=0x0)
+>
+> > >      at ../block.c:7668
+>
+> > > #11 0x0000557eb45ebf2b in bdrv_child_change_aio_context
+>
+> > > (c=0x557eb814ed80, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
+>
+> > > tran=0x557eb7a87160, errp=0x0)
+>
+> > >      at ../block.c:7608
+>
+> > > #12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0,
+>
+> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
+>
+> > > tran=0x557eb7a87160, errp=0x0)
+>
+> > >      at ../blockjob.c:157
+>
+> > > #13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context
+>
+> > > (c=0x557eb7c9d3f0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
+>
+> > > tran=0x557eb7a87160, errp=0x0)
+>
+> > >      at ../block.c:7592
+>
+> > > #14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310,
+>
+> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
+>
+> > > tran=0x557eb7a87160, errp=0x0)
+>
+> > >      at ../block.c:7661
+>
+> > > #15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx
+>
+> > >      (child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60
+>
+> > > = {...}, tran=0x557eb7a87160, errp=0x0) at ../block.c:1234
+>
+> > > #16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context
+>
+> > > (c=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
+>
+> > > tran=0x557eb7a87160, errp=0x0)
+>
+> > >      at ../block.c:7592
+>
+> > > #17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0,
+>
+> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...},
+>
+> > > tran=0x557eb7a87160, errp=0x0)
+>
+> > >      at ../block.c:7661
+>
+> > > #18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context
+>
+> > > (bs=0x557eb79575e0, ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at
+>
+> > > ../block.c:7715
+>
+> > > #19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30)
+>
+> > > at ../block.c:3317
+>
+> > > #20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv
+>
+> > > (job=0x557eb7952800) at ../blockjob.c:209
+>
+> > > #21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at
+>
+> > > ../blockjob.c:82
+>
+> > > #22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at
+>
+> > > ../job.c:474
+>
+> > > #23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at
+>
+> > > ../job.c:771
+>
+> > > #24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400,
+>
+> > > errp=0x7ffd94b4f488) at ../job.c:783
+>
+> > > --Type <RET> for more, q to quit, c to continue without paging--
+>
+> > > #25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0
+>
+> > > "commit-snap1", errp=0x7ffd94b4f488) at ../job-qmp.c:138
+>
+> > > #26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0,
+>
+> > > ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221
+>
+> > > #27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at
+>
+> > > ../qapi/qmp-dispatch.c:128
+>
+> > > #28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at
+>
+> > > ../util/async.c:172
+>
+> > > #29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at
+>
+> > > ../util/async.c:219
+>
+> > > #30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at
+>
+> > > ../util/aio-posix.c:436
+>
+> > > #31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200,
+>
+> > > callback=0x0, user_data=0x0) at ../util/async.c:361
+>
+> > > #32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at
+>
+> > > ../glib/gmain.c:3364
+>
+> > > #33 g_main_context_dispatch (context=0x557eb76c6430) at
+>
+> > > ../glib/gmain.c:4079
+>
+> > > #34 0x0000557eb47d3ab1 in glib_pollfds_poll () at
+>
+> > > ../util/main-loop.c:287
+>
+> > > #35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at
+>
+> > > ../util/main-loop.c:310
+>
+> > > #36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at
+>
+> > > ../util/main-loop.c:589
+>
+> > > #37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835
+>
+> > > #38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at
+>
+> > > ../system/main.c:50
+>
+> > > #39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at
+>
+> > > ../system/main.c:80
+>
+> >
+>
+> > And here's coroutine trying to acquire read lock:
+>
+> >
+>
+> > > (gdb) qemu coroutine reader_queue->entries.sqh_first
+>
+> > > #0  0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0,
+>
+> > > to_=0x7fc537fff508, action=COROUTINE_YIELD) at
+>
+> > > ../util/coroutine-ucontext.c:321
+>
+> > > #1  0x0000557eb47d4d4a in qemu_coroutine_yield () at
+>
+> > > ../util/qemu-coroutine.c:339
+>
+> > > #2  0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0
+>
+> > > <reader_queue>, lock=0x7fc53c57de50, flags=0) at
+>
+> > > ../util/qemu-coroutine-lock.c:60
+>
+> > > #3  0x0000557eb461fea7 in bdrv_graph_co_rdlock () at
+>
+> > > ../block/graph-lock.c:231
+>
+> > > #4  0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3)
+>
+> > > at /home/root/src/qemu/master/include/block/graph-lock.h:213
+>
+> > > #5  0x0000557eb460fa41 in blk_co_do_preadv_part
+>
+> > >      (blk=0x557eb84c0810, offset=6890553344, bytes=4096,
+>
+> > > qiov=0x7fc530006988, qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at
+>
+> > > ../block/block-backend.c:1339
+>
+> > > #6  0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at
+>
+> > > ../block/block-backend.c:1619
+>
+> > > #7  0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040,
+>
+> > > i1=21886) at ../util/coroutine-ucontext.c:175
+>
+> > > #8  0x00007fc547c2a360 in __start_context () at
+>
+> > > ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
+>
+> > > #9  0x00007ffd94b4ea40 in  ()
+>
+> > > #10 0x0000000000000000 in  ()
+>
+> >
+>
+> > So it looks like main thread is processing job-dismiss request and is
+>
+> > holding write lock taken in block_job_remove_all_bdrv() (frame #20
+>
+> > above).  At the same time iothread spawns a coroutine which performs IO
+>
+> > request.  Before the coroutine is spawned, blk_aio_prwv() increases
+>
+> > 'in_flight' counter for Blk.  Then blk_co_do_preadv_part() (frame #5) is
+>
+> > trying to acquire the read lock.  But main thread isn't releasing the
+>
+> > lock as blk_root_drained_poll() returns true since blk->in_flight > 0.
+>
+> > Here's the deadlock.
+>
+> >
+>
+> > Any comments and suggestions on the subject are welcomed.  Thanks!
+>
+> I think this is what the blk_wait_while_drained() call was supposed to
+>
+> address in blk_co_do_preadv_part(). However, with the use of multiple
+>
+> I/O threads, this is racy.
+>
+>
+>
+> Do you think that in your case we hit the small race window between the
+>
+> checks in blk_wait_while_drained() and GRAPH_RDLOCK_GUARD()? Or is there
+>
+> another reason why blk_wait_while_drained() didn't do its job?
+>
+>
+>
+At my opinion there is very big race window. Main thread has
+>
+eaten graph write lock. After that another coroutine is stalled
+>
+within GRAPH_RDLOCK_GUARD() as there is no drain at the moment and only
+>
+after that main thread has started drain.
+You're right, I confused taking the write lock with draining there.
+
+>
+That is why Fiona's idea is looking working. Though this would mean
+>
+that normally we should always do that at the moment when we acquire
+>
+write lock. May be even inside this function.
+I actually see now that not all of my graph locking patches were merged.
+At least I did have the thought that bdrv_drained_begin() must be marked
+GRAPH_UNLOCKED because it polls. That means that calling it from inside
+bdrv_try_change_aio_context() is actually forbidden (and that's the part
+I didn't see back then because it doesn't have TSA annotations).
+
+If you refactor the code to move the drain out to before the lock is
+taken, I think you end up with Fiona's patch, except you'll remove the
+forbidden inner drain and add more annotations for some functions and
+clarify the rules around them. I don't know, but I wouldn't be surprised
+if along the process we find other bugs, too.
+
+So Fiona's drain looks right to me, but we should probably approach it
+more systematically.
+
+Kevin
+
diff --git a/classification_output/05/device/24930826 b/classification_output/05/device/24930826
new file mode 100644
index 000000000..b4e6bcf3c
--- /dev/null
+++ b/classification_output/05/device/24930826
@@ -0,0 +1,41 @@
+device: 0.709
+graphic: 0.667
+mistranslation: 0.637
+instruction: 0.555
+other: 0.535
+network: 0.513
+semantic: 0.487
+vnc: 0.473
+socket: 0.447
+boot: 0.218
+KVM: 0.172
+assembly: 0.142
+
+[Qemu-devel] [BUG] vhost-user: hot-unplug vhost-user nic for windows guest OS will fail with 100% reproduce rate
+
+Hi, guys
+
+I met a problem when hot-unplug vhost-user nic for Windows 2008 rc2 sp1 64 
+(Guest OS)
+
+The xml of nic is as followed:
+<interface type='vhostuser'>
+  <mac address='52:54:00:3b:83:aa'/>
+  <source type='unix' path='/var/run/vhost-user/port1' mode='client'/>
+  <target dev='port1'/>
+  <model type='virtio'/>
+  <driver queues='4'/>
+  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
+</interface>
+
+Firstly, I use virsh attach-device win2008 vif.xml to hot-plug a nic for Guest 
+OS. This operation returns success.
+After guest OS discover nic successfully, I use virsh detach-device win2008 
+vif.xml to hot-unplug it. This operation will fail with 100% reproduce rate.
+
+However, if I hot-plug and hot-unplug virtio-net nic , it will not fail.
+
+I have analysis the process of qmp_device_del , I found that qemu have inject 
+interrupt to acpi to let it notice guest OS to remove nic.
+I guess there is something wrong in Windows when handle the interrupt.
+
diff --git a/classification_output/05/device/26095107 b/classification_output/05/device/26095107
new file mode 100644
index 000000000..f23d3275d
--- /dev/null
+++ b/classification_output/05/device/26095107
@@ -0,0 +1,166 @@
+instruction: 0.991
+assembly: 0.988
+device: 0.988
+socket: 0.987
+boot: 0.987
+KVM: 0.985
+other: 0.979
+semantic: 0.974
+vnc: 0.972
+graphic: 0.955
+mistranslation: 0.930
+network: 0.879
+
+[Qemu-devel]  [Bug Report] vm paused after succeeding to migrate
+
+Hi, all
+I encounterd a bug when I try to migrate a windows vm.
+
+Enviroment information:
+host A: cpu E5620(model WestmereEP without flag xsave)
+host B: cpu E5-2643(model SandyBridgeEP with xsave)
+
+The reproduce steps is :
+1. Start a windows 2008 vm with -cpu host(which means host-passthrough).
+2. Migrate the vm to host B when cr4.OSXSAVE=0 (successfully).
+3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1.
+4. Then migrate the vm to host A (successfully), but vm was paused, and qemu 
+printed log as followed:
+
+KVM: entry failed, hardware error 0x80000021
+
+If you're running a guest on an Intel machine without unrestricted mode
+support, the failure can be most likely due to the guest entering an invalid
+state for Intel VT. For example, the guest maybe running in big real mode
+which is not supported on less recent Intel processors.
+
+EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000
+ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20
+EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
+ES =0000 00000000 0000ffff 00009300
+CS =f000 ffff0000 0000ffff 00009b00
+SS =0000 00000000 0000ffff 00009300
+DS =0000 00000000 0000ffff 00009300
+FS =0000 00000000 0000ffff 00009300
+GS =0000 00000000 0000ffff 00009300
+LDT=0000 00000000 0000ffff 00008200
+TR =0000 00000000 0000ffff 00008b00
+GDT=     00000000 0000ffff
+IDT=     00000000 0000ffff
+CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
+DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 
+DR3=0000000000000000
+DR6=00000000ffff0ff0 DR7=0000000000000400
+EFER=0000000000000000
+Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 
+00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+
+I have found that problem happened when kvm_put_sregs returns err -22(called by 
+kvm_arch_put_registers(qemu)).
+Because kvm_arch_vcpu_ioctl_set_sregs(kvm-mod) checked that guest_cpuid_has no 
+X86_FEATURE_XSAVE but cr4.OSXSAVE=1.
+So should we cancel migration when kvm_arch_put_registers returns error?
+
+* linzhecheng (address@hidden) wrote:
+>
+Hi, all
+>
+I encounterd a bug when I try to migrate a windows vm.
+>
+>
+Enviroment information:
+>
+host A: cpu E5620(model WestmereEP without flag xsave)
+>
+host B: cpu E5-2643(model SandyBridgeEP with xsave)
+>
+>
+The reproduce steps is :
+>
+1. Start a windows 2008 vm with -cpu host(which means host-passthrough).
+>
+2. Migrate the vm to host B when cr4.OSXSAVE=0 (successfully).
+>
+3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1.
+>
+4. Then migrate the vm to host A (successfully), but vm was paused, and qemu
+>
+printed log as followed:
+Remember that migrating using -cpu host  across different CPU models is NOT
+expected to work.
+
+>
+KVM: entry failed, hardware error 0x80000021
+>
+>
+If you're running a guest on an Intel machine without unrestricted mode
+>
+support, the failure can be most likely due to the guest entering an invalid
+>
+state for Intel VT. For example, the guest maybe running in big real mode
+>
+which is not supported on less recent Intel processors.
+>
+>
+EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000
+>
+ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20
+>
+EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
+>
+ES =0000 00000000 0000ffff 00009300
+>
+CS =f000 ffff0000 0000ffff 00009b00
+>
+SS =0000 00000000 0000ffff 00009300
+>
+DS =0000 00000000 0000ffff 00009300
+>
+FS =0000 00000000 0000ffff 00009300
+>
+GS =0000 00000000 0000ffff 00009300
+>
+LDT=0000 00000000 0000ffff 00008200
+>
+TR =0000 00000000 0000ffff 00008b00
+>
+GDT=     00000000 0000ffff
+>
+IDT=     00000000 0000ffff
+>
+CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
+>
+DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
+>
+DR3=0000000000000000
+>
+DR6=00000000ffff0ff0 DR7=0000000000000400
+>
+EFER=0000000000000000
+>
+Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00
+>
+00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+>
+00
+>
+>
+I have found that problem happened when kvm_put_sregs returns err -22(called
+>
+by kvm_arch_put_registers(qemu)).
+>
+Because kvm_arch_vcpu_ioctl_set_sregs(kvm-mod) checked that guest_cpuid_has
+>
+no X86_FEATURE_XSAVE but cr4.OSXSAVE=1.
+>
+So should we cancel migration when kvm_arch_put_registers returns error?
+It would seem good if we can make the migration fail there rather than
+hitting that KVM error.
+It looks like we need to do a bit of plumbing to convert the places that
+call it to return a bool rather than void.
+
+Dave
+
+--
+Dr. David Alan Gilbert / address@hidden / Manchester, UK
+
diff --git a/classification_output/05/device/28596630 b/classification_output/05/device/28596630
new file mode 100644
index 000000000..91a754e21
--- /dev/null
+++ b/classification_output/05/device/28596630
@@ -0,0 +1,121 @@
+device: 0.835
+semantic: 0.814
+mistranslation: 0.813
+graphic: 0.785
+network: 0.780
+instruction: 0.748
+assembly: 0.725
+other: 0.707
+socket: 0.697
+vnc: 0.674
+KVM: 0.649
+boot: 0.609
+
+[Qemu-devel] [BUG] [low severity] a strange appearance of message involving slirp while doing "empty" make
+
+Folks,
+
+If qemu tree is already fully built, and "make" is attempted, for 3.1, the 
+outcome is:
+
+$ make
+        CHK version_gen.h
+$
+
+For 4.0-rc0, the outcome seems to be different:
+
+$ make
+make[1]: Entering directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
+make[1]: Nothing to be done for 'all'.
+make[1]: Leaving directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
+        CHK version_gen.h
+$
+
+Not sure how significant is that, but I report it just in case.
+
+Yours,
+Aleksandar
+
+On 20/03/2019 22.08, Aleksandar Markovic wrote:
+>
+Folks,
+>
+>
+If qemu tree is already fully built, and "make" is attempted, for 3.1, the
+>
+outcome is:
+>
+>
+$ make
+>
+CHK version_gen.h
+>
+$
+>
+>
+For 4.0-rc0, the outcome seems to be different:
+>
+>
+$ make
+>
+make[1]: Entering directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
+>
+make[1]: Nothing to be done for 'all'.
+>
+make[1]: Leaving directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
+>
+CHK version_gen.h
+>
+$
+>
+>
+Not sure how significant is that, but I report it just in case.
+It's likely because slirp is currently being reworked to become a
+separate project, so the makefiles have been changed a little bit. I
+guess the message will go away again once slirp has become a stand-alone
+library.
+
+ Thomas
+
+On Fri, 22 Mar 2019 at 04:59, Thomas Huth <address@hidden> wrote:
+>
+On 20/03/2019 22.08, Aleksandar Markovic wrote:
+>
+> $ make
+>
+> make[1]: Entering directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
+>
+> make[1]: Nothing to be done for 'all'.
+>
+> make[1]: Leaving directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
+>
+>       CHK version_gen.h
+>
+> $
+>
+>
+>
+> Not sure how significant is that, but I report it just in case.
+>
+>
+It's likely because slirp is currently being reworked to become a
+>
+separate project, so the makefiles have been changed a little bit. I
+>
+guess the message will go away again once slirp has become a stand-alone
+>
+library.
+Well, we'll still need to ship slirp for the foreseeable future...
+
+I think the cause of this is that the rule in Makefile for
+calling the slirp Makefile is not passing it $(SUBDIR_MAKEFLAGS)
+like all the other recursive make invocations. If we do that
+then we'll suppress the entering/leaving messages for
+non-verbose builds. (Some tweaking will be needed as
+it looks like the slirp makefile has picked an incompatible
+meaning for $BUILD_DIR, which the SUBDIR_MAKEFLAGS will
+also be passing to it.)
+
+thanks
+-- PMM
+
diff --git a/classification_output/05/device/36568044 b/classification_output/05/device/36568044
new file mode 100644
index 000000000..ba6cad70a
--- /dev/null
+++ b/classification_output/05/device/36568044
@@ -0,0 +1,4589 @@
+mistranslation: 0.962
+device: 0.931
+graphic: 0.931
+instruction: 0.930
+other: 0.930
+assembly: 0.926
+semantic: 0.923
+KVM: 0.914
+socket: 0.907
+vnc: 0.905
+network: 0.904
+boot: 0.895
+
+[BUG, RFC] cpr-transfer: qxl guest driver crashes after migration
+
+Hi all,
+
+We've been experimenting with cpr-transfer migration mode recently and
+have discovered the following issue with the guest QXL driver:
+
+Run migration source:
+>
+EMULATOR=/path/to/emulator
+>
+ROOTFS=/path/to/image
+>
+QMPSOCK=/var/run/alma8qmp-src.sock
+>
+>
+$EMULATOR -enable-kvm \
+>
+-machine q35 \
+>
+-cpu host -smp 2 -m 2G \
+>
+-object
+>
+memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\
+>
+-machine memory-backend=ram0 \
+>
+-machine aux-ram-share=on \
+>
+-drive file=$ROOTFS,media=disk,if=virtio \
+>
+-qmp unix:$QMPSOCK,server=on,wait=off \
+>
+-nographic \
+>
+-device qxl-vga
+Run migration target:
+>
+EMULATOR=/path/to/emulator
+>
+ROOTFS=/path/to/image
+>
+QMPSOCK=/var/run/alma8qmp-dst.sock
+>
+>
+>
+>
+$EMULATOR -enable-kvm \
+>
+-machine q35 \
+>
+-cpu host -smp 2 -m 2G \
+>
+-object
+>
+memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\
+>
+-machine memory-backend=ram0 \
+>
+-machine aux-ram-share=on \
+>
+-drive file=$ROOTFS,media=disk,if=virtio \
+>
+-qmp unix:$QMPSOCK,server=on,wait=off \
+>
+-nographic \
+>
+-device qxl-vga \
+>
+-incoming tcp:0:44444 \
+>
+-incoming '{"channel-type": "cpr", "addr": { "transport": "socket",
+>
+"type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
+Launch the migration:
+>
+QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
+>
+QMPSOCK=/var/run/alma8qmp-src.sock
+>
+>
+$QMPSHELL -p $QMPSOCK <<EOF
+>
+migrate-set-parameters mode=cpr-transfer
+>
+migrate
+>
+channels=[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"44444"}},{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-dst.sock"}}]
+>
+EOF
+Then, after a while, QXL guest driver on target crashes spewing the
+following messages:
+>
+[   73.962002] [TTM] Buffer eviction failed
+>
+[   73.962072] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)
+>
+[   73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate
+>
+VRAM BO
+That seems to be a known kernel QXL driver bug:
+https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/
+https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
+(the latter discussion contains that reproduce script which speeds up
+the crash in the guest):
+>
+#!/bin/bash
+>
+>
+chvt 3
+>
+>
+for j in $(seq 80); do
+>
+echo "$(date) starting round $j"
+>
+if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != ""
+>
+]; then
+>
+echo "bug was reproduced after $j tries"
+>
+exit 1
+>
+fi
+>
+for i in $(seq 100); do
+>
+dmesg > /dev/tty3
+>
+done
+>
+done
+>
+>
+echo "bug could not be reproduced"
+>
+exit 0
+The bug itself seems to remain unfixed, as I was able to reproduce that
+with Fedora 41 guest, as well as AlmaLinux 8 guest. However our
+cpr-transfer code also seems to be buggy as it triggers the crash -
+without the cpr-transfer migration the above reproduce doesn't lead to
+crash on the source VM.
+
+I suspect that, as cpr-transfer doesn't migrate the guest memory, but
+rather passes it through the memory backend object, our code might
+somehow corrupt the VRAM.  However, I wasn't able to trace the
+corruption so far.
+
+Could somebody help the investigation and take a look into this?  Any
+suggestions would be appreciated.  Thanks!
+
+Andrey
+
+On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
+Hi all,
+
+We've been experimenting with cpr-transfer migration mode recently and
+have discovered the following issue with the guest QXL driver:
+
+Run migration source:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$EMULATOR -enable-kvm \
+     -machine q35 \
+     -cpu host -smp 2 -m 2G \
+     -object 
+memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\
+     -machine memory-backend=ram0 \
+     -machine aux-ram-share=on \
+     -drive file=$ROOTFS,media=disk,if=virtio \
+     -qmp unix:$QMPSOCK,server=on,wait=off \
+     -nographic \
+     -device qxl-vga
+Run migration target:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-dst.sock
+$EMULATOR -enable-kvm \
+-machine q35 \
+     -cpu host -smp 2 -m 2G \
+     -object 
+memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\
+     -machine memory-backend=ram0 \
+     -machine aux-ram-share=on \
+     -drive file=$ROOTFS,media=disk,if=virtio \
+     -qmp unix:$QMPSOCK,server=on,wait=off \
+     -nographic \
+     -device qxl-vga \
+     -incoming tcp:0:44444 \
+     -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", 
+"path": "/var/run/alma8cpr-dst.sock"}}'
+Launch the migration:
+QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$QMPSHELL -p $QMPSOCK <<EOF
+     migrate-set-parameters mode=cpr-transfer
+     migrate 
+channels=[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"44444"}},{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-dst.sock"}}]
+EOF
+Then, after a while, QXL guest driver on target crashes spewing the
+following messages:
+[   73.962002] [TTM] Buffer eviction failed
+[   73.962072] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)
+[   73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate 
+VRAM BO
+That seems to be a known kernel QXL driver bug:
+https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/
+https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
+(the latter discussion contains that reproduce script which speeds up
+the crash in the guest):
+#!/bin/bash
+
+chvt 3
+
+for j in $(seq 80); do
+         echo "$(date) starting round $j"
+         if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" 
+]; then
+                 echo "bug was reproduced after $j tries"
+                 exit 1
+         fi
+         for i in $(seq 100); do
+                 dmesg > /dev/tty3
+         done
+done
+
+echo "bug could not be reproduced"
+exit 0
+The bug itself seems to remain unfixed, as I was able to reproduce that
+with Fedora 41 guest, as well as AlmaLinux 8 guest. However our
+cpr-transfer code also seems to be buggy as it triggers the crash -
+without the cpr-transfer migration the above reproduce doesn't lead to
+crash on the source VM.
+
+I suspect that, as cpr-transfer doesn't migrate the guest memory, but
+rather passes it through the memory backend object, our code might
+somehow corrupt the VRAM.  However, I wasn't able to trace the
+corruption so far.
+
+Could somebody help the investigation and take a look into this?  Any
+suggestions would be appreciated.  Thanks!
+Possibly some memory region created by qxl is not being preserved.
+Try adding these traces to see what is preserved:
+
+-trace enable='*cpr*'
+-trace enable='*ram_alloc*'
+
+- Steve
+
+On 2/28/2025 1:13 PM, Steven Sistare wrote:
+On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
+Hi all,
+
+We've been experimenting with cpr-transfer migration mode recently and
+have discovered the following issue with the guest QXL driver:
+
+Run migration source:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$EMULATOR -enable-kvm \
+Â Â Â Â  -machine q35 \
+Â Â Â Â  -cpu host -smp 2 -m 2G \
+Â Â Â Â  -object 
+memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\
+Â Â Â Â  -machine memory-backend=ram0 \
+Â Â Â Â  -machine aux-ram-share=on \
+Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+Â Â Â Â  -nographic \
+Â Â Â Â  -device qxl-vga
+Run migration target:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-dst.sock
+$EMULATOR -enable-kvm \
+Â Â Â Â  -machine q35 \
+Â Â Â Â  -cpu host -smp 2 -m 2G \
+Â Â Â Â  -object 
+memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\
+Â Â Â Â  -machine memory-backend=ram0 \
+Â Â Â Â  -machine aux-ram-share=on \
+Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+Â Â Â Â  -nographic \
+Â Â Â Â  -device qxl-vga \
+Â Â Â Â  -incoming tcp:0:44444 \
+Â Â Â Â  -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", 
+"path": "/var/run/alma8cpr-dst.sock"}}'
+Launch the migration:
+QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$QMPSHELL -p $QMPSOCK <<EOF
+Â Â Â Â  migrate-set-parameters mode=cpr-transfer
+Â Â Â Â  migrate 
+channels=[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"44444"}},{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-dst.sock"}}]
+EOF
+Then, after a while, QXL guest driver on target crashes spewing the
+following messages:
+[Â Â  73.962002] [TTM] Buffer eviction failed
+[Â Â  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001)
+[Â Â  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate 
+VRAM BO
+That seems to be a known kernel QXL driver bug:
+https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/
+https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
+(the latter discussion contains that reproduce script which speeds up
+the crash in the guest):
+#!/bin/bash
+
+chvt 3
+
+for j in $(seq 80); do
+Â Â Â Â Â Â Â Â  echo "$(date) starting round $j"
+Â Â Â Â Â Â Â Â  if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" 
+]; then
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  echo "bug was reproduced after $j tries"
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  exit 1
+Â Â Â Â Â Â Â Â  fi
+Â Â Â Â Â Â Â Â  for i in $(seq 100); do
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  dmesg > /dev/tty3
+Â Â Â Â Â Â Â Â  done
+done
+
+echo "bug could not be reproduced"
+exit 0
+The bug itself seems to remain unfixed, as I was able to reproduce that
+with Fedora 41 guest, as well as AlmaLinux 8 guest. However our
+cpr-transfer code also seems to be buggy as it triggers the crash -
+without the cpr-transfer migration the above reproduce doesn't lead to
+crash on the source VM.
+
+I suspect that, as cpr-transfer doesn't migrate the guest memory, but
+rather passes it through the memory backend object, our code might
+somehow corrupt the VRAM.Â  However, I wasn't able to trace the
+corruption so far.
+
+Could somebody help the investigation and take a look into this?Â  Any
+suggestions would be appreciated.Â  Thanks!
+Possibly some memory region created by qxl is not being preserved.
+Try adding these traces to see what is preserved:
+
+-trace enable='*cpr*'
+-trace enable='*ram_alloc*'
+Also try adding this patch to see if it flags any ram blocks as not
+compatible with cpr.  A message is printed at migration start time.
+1740667681-257312-1-git-send-email-steven.sistare@oracle.com
+/">https://lore.kernel.org/qemu-devel/
+1740667681-257312-1-git-send-email-steven.sistare@oracle.com
+/
+- Steve
+
+On 2/28/25 8:20 PM, Steven Sistare wrote:
+>
+On 2/28/2025 1:13 PM, Steven Sistare wrote:
+>
+> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
+>
+>> Hi all,
+>
+>>
+>
+>> We've been experimenting with cpr-transfer migration mode recently and
+>
+>> have discovered the following issue with the guest QXL driver:
+>
+>>
+>
+>> Run migration source:
+>
+>>> EMULATOR=/path/to/emulator
+>
+>>> ROOTFS=/path/to/image
+>
+>>> QMPSOCK=/var/run/alma8qmp-src.sock
+>
+>>>
+>
+>>> $EMULATOR -enable-kvm \
+>
+>>> Â Â Â Â  -machine q35 \
+>
+>>> Â Â Â Â  -cpu host -smp 2 -m 2G \
+>
+>>> Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/
+>
+>>> ram0,share=on\
+>
+>>> Â Â Â Â  -machine memory-backend=ram0 \
+>
+>>> Â Â Â Â  -machine aux-ram-share=on \
+>
+>>> Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+>
+>>> Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+>
+>>> Â Â Â Â  -nographic \
+>
+>>> Â Â Â Â  -device qxl-vga
+>
+>>
+>
+>> Run migration target:
+>
+>>> EMULATOR=/path/to/emulator
+>
+>>> ROOTFS=/path/to/image
+>
+>>> QMPSOCK=/var/run/alma8qmp-dst.sock
+>
+>>> $EMULATOR -enable-kvm \
+>
+>>> Â Â Â Â  -machine q35 \
+>
+>>> Â Â Â Â  -cpu host -smp 2 -m 2G \
+>
+>>> Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/
+>
+>>> ram0,share=on\
+>
+>>> Â Â Â Â  -machine memory-backend=ram0 \
+>
+>>> Â Â Â Â  -machine aux-ram-share=on \
+>
+>>> Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+>
+>>> Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+>
+>>> Â Â Â Â  -nographic \
+>
+>>> Â Â Â Â  -device qxl-vga \
+>
+>>> Â Â Â Â  -incoming tcp:0:44444 \
+>
+>>> Â Â Â Â  -incoming '{"channel-type": "cpr", "addr": { "transport":
+>
+>>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
+>
+>>
+>
+>>
+>
+>> Launch the migration:
+>
+>>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
+>
+>>> QMPSOCK=/var/run/alma8qmp-src.sock
+>
+>>>
+>
+>>> $QMPSHELL -p $QMPSOCK <<EOF
+>
+>>> Â Â Â Â  migrate-set-parameters mode=cpr-transfer
+>
+>>> Â Â Â Â  migrate channels=[{"channel-type":"main","addr":
+>
+>>> {"transport":"socket","type":"inet","host":"0","port":"44444"}},
+>
+>>> {"channel-type":"cpr","addr":
+>
+>>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr-
+>
+>>> dst.sock"}}]
+>
+>>> EOF
+>
+>>
+>
+>> Then, after a while, QXL guest driver on target crashes spewing the
+>
+>> following messages:
+>
+>>> [Â Â  73.962002] [TTM] Buffer eviction failed
+>
+>>> [Â Â  73.962072] qxl 0000:00:02.0: object_init failed for (3149824,
+>
+>>> 0x00000001)
+>
+>>> [Â Â  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to
+>
+>>> allocate VRAM BO
+>
+>>
+>
+>> That seems to be a known kernel QXL driver bug:
+>
+>>
+>
+>>
+https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/
+>
+>>
+https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
+>
+>>
+>
+>> (the latter discussion contains that reproduce script which speeds up
+>
+>> the crash in the guest):
+>
+>>> #!/bin/bash
+>
+>>>
+>
+>>> chvt 3
+>
+>>>
+>
+>>> for j in $(seq 80); do
+>
+>>> Â Â Â Â Â Â Â Â  echo "$(date) starting round $j"
+>
+>>> Â Â Â Â Â Â Â Â  if [ "$(journalctl --boot | grep "failed to allocate VRAM
+>
+>>> BO")" != "" ]; then
+>
+>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  echo "bug was reproduced after $j tries"
+>
+>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  exit 1
+>
+>>> Â Â Â Â Â Â Â Â  fi
+>
+>>> Â Â Â Â Â Â Â Â  for i in $(seq 100); do
+>
+>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  dmesg > /dev/tty3
+>
+>>> Â Â Â Â Â Â Â Â  done
+>
+>>> done
+>
+>>>
+>
+>>> echo "bug could not be reproduced"
+>
+>>> exit 0
+>
+>>
+>
+>> The bug itself seems to remain unfixed, as I was able to reproduce that
+>
+>> with Fedora 41 guest, as well as AlmaLinux 8 guest. However our
+>
+>> cpr-transfer code also seems to be buggy as it triggers the crash -
+>
+>> without the cpr-transfer migration the above reproduce doesn't lead to
+>
+>> crash on the source VM.
+>
+>>
+>
+>> I suspect that, as cpr-transfer doesn't migrate the guest memory, but
+>
+>> rather passes it through the memory backend object, our code might
+>
+>> somehow corrupt the VRAM.Â  However, I wasn't able to trace the
+>
+>> corruption so far.
+>
+>>
+>
+>> Could somebody help the investigation and take a look into this?Â  Any
+>
+>> suggestions would be appreciated.Â  Thanks!
+>
+>
+>
+> Possibly some memory region created by qxl is not being preserved.
+>
+> Try adding these traces to see what is preserved:
+>
+>
+>
+> -trace enable='*cpr*'
+>
+> -trace enable='*ram_alloc*'
+>
+>
+Also try adding this patch to see if it flags any ram blocks as not
+>
+compatible with cpr.Â  A message is printed at migration start time.
+>
+Â
+https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-email-
+>
+steven.sistare@oracle.com/
+>
+>
+- Steve
+>
+With the traces enabled + the "migration: ram block cpr blockers" patch
+applied:
+
+Source:
+>
+cpr_find_fd pc.bios, id 0 returns -1
+>
+cpr_save_fd pc.bios, id 0, fd 22
+>
+qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host
+>
+0x7fec18e00000
+>
+cpr_find_fd pc.rom, id 0 returns -1
+>
+cpr_save_fd pc.rom, id 0, fd 23
+>
+qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host
+>
+0x7fec18c00000
+>
+cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1
+>
+cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24
+>
+qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd
+>
+24 host 0x7fec18a00000
+>
+cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1
+>
+cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25
+>
+qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864
+>
+fd 25 host 0x7feb77e00000
+>
+cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1
+>
+cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27
+>
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 27
+>
+host 0x7fec18800000
+>
+cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1
+>
+cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28
+>
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864
+>
+fd 28 host 0x7feb73c00000
+>
+cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1
+>
+cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34
+>
+qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 34
+>
+host 0x7fec18600000
+>
+cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1
+>
+cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35
+>
+qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 35
+>
+host 0x7fec18200000
+>
+cpr_find_fd /rom@etc/table-loader, id 0 returns -1
+>
+cpr_save_fd /rom@etc/table-loader, id 0, fd 36
+>
+qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 36
+>
+host 0x7feb8b600000
+>
+cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1
+>
+cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37
+>
+qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 37 host
+>
+0x7feb8b400000
+>
+>
+cpr_state_save cpr-transfer mode
+>
+cpr_transfer_output /var/run/alma8cpr-dst.sock
+Target:
+>
+cpr_transfer_input /var/run/alma8cpr-dst.sock
+>
+cpr_state_load cpr-transfer mode
+>
+cpr_find_fd pc.bios, id 0 returns 20
+>
+qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host
+>
+0x7fcdc9800000
+>
+cpr_find_fd pc.rom, id 0 returns 19
+>
+qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host
+>
+0x7fcdc9600000
+>
+cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18
+>
+qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd
+>
+18 host 0x7fcdc9400000
+>
+cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17
+>
+qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864
+>
+fd 17 host 0x7fcd27e00000
+>
+cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16
+>
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 16
+>
+host 0x7fcdc9200000
+>
+cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15
+>
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864
+>
+fd 15 host 0x7fcd23c00000
+>
+cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14
+>
+qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 14
+>
+host 0x7fcdc8800000
+>
+cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13
+>
+qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 13
+>
+host 0x7fcdc8400000
+>
+cpr_find_fd /rom@etc/table-loader, id 0 returns 11
+>
+qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 11
+>
+host 0x7fcdc8200000
+>
+cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10
+>
+qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 10 host
+>
+0x7fcd3be00000
+Looks like both vga.vram and qxl.vram are being preserved (with the same
+addresses), and no incompatible ram blocks are found during migration.
+
+Andrey
+
+On 2/28/25 8:35 PM, Andrey Drobyshev wrote:
+>
+On 2/28/25 8:20 PM, Steven Sistare wrote:
+>
+> On 2/28/2025 1:13 PM, Steven Sistare wrote:
+>
+>> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
+>
+>>> Hi all,
+>
+>>>
+>
+>>> We've been experimenting with cpr-transfer migration mode recently and
+>
+>>> have discovered the following issue with the guest QXL driver:
+>
+>>>
+>
+>>> Run migration source:
+>
+>>>> EMULATOR=/path/to/emulator
+>
+>>>> ROOTFS=/path/to/image
+>
+>>>> QMPSOCK=/var/run/alma8qmp-src.sock
+>
+>>>>
+>
+>>>> $EMULATOR -enable-kvm \
+>
+>>>> Â Â Â Â  -machine q35 \
+>
+>>>> Â Â Â Â  -cpu host -smp 2 -m 2G \
+>
+>>>> Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/
+>
+>>>> ram0,share=on\
+>
+>>>> Â Â Â Â  -machine memory-backend=ram0 \
+>
+>>>> Â Â Â Â  -machine aux-ram-share=on \
+>
+>>>> Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+>
+>>>> Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+>
+>>>> Â Â Â Â  -nographic \
+>
+>>>> Â Â Â Â  -device qxl-vga
+>
+>>>
+>
+>>> Run migration target:
+>
+>>>> EMULATOR=/path/to/emulator
+>
+>>>> ROOTFS=/path/to/image
+>
+>>>> QMPSOCK=/var/run/alma8qmp-dst.sock
+>
+>>>> $EMULATOR -enable-kvm \
+>
+>>>> Â Â Â Â  -machine q35 \
+>
+>>>> Â Â Â Â  -cpu host -smp 2 -m 2G \
+>
+>>>> Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/
+>
+>>>> ram0,share=on\
+>
+>>>> Â Â Â Â  -machine memory-backend=ram0 \
+>
+>>>> Â Â Â Â  -machine aux-ram-share=on \
+>
+>>>> Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+>
+>>>> Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+>
+>>>> Â Â Â Â  -nographic \
+>
+>>>> Â Â Â Â  -device qxl-vga \
+>
+>>>> Â Â Â Â  -incoming tcp:0:44444 \
+>
+>>>> Â Â Â Â  -incoming '{"channel-type": "cpr", "addr": { "transport":
+>
+>>>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
+>
+>>>
+>
+>>>
+>
+>>> Launch the migration:
+>
+>>>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
+>
+>>>> QMPSOCK=/var/run/alma8qmp-src.sock
+>
+>>>>
+>
+>>>> $QMPSHELL -p $QMPSOCK <<EOF
+>
+>>>> Â Â Â Â  migrate-set-parameters mode=cpr-transfer
+>
+>>>> Â Â Â Â  migrate channels=[{"channel-type":"main","addr":
+>
+>>>> {"transport":"socket","type":"inet","host":"0","port":"44444"}},
+>
+>>>> {"channel-type":"cpr","addr":
+>
+>>>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr-
+>
+>>>> dst.sock"}}]
+>
+>>>> EOF
+>
+>>>
+>
+>>> Then, after a while, QXL guest driver on target crashes spewing the
+>
+>>> following messages:
+>
+>>>> [Â Â  73.962002] [TTM] Buffer eviction failed
+>
+>>>> [Â Â  73.962072] qxl 0000:00:02.0: object_init failed for (3149824,
+>
+>>>> 0x00000001)
+>
+>>>> [Â Â  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to
+>
+>>>> allocate VRAM BO
+>
+>>>
+>
+>>> That seems to be a known kernel QXL driver bug:
+>
+>>>
+>
+>>>
+https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/
+>
+>>>
+https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
+>
+>>>
+>
+>>> (the latter discussion contains that reproduce script which speeds up
+>
+>>> the crash in the guest):
+>
+>>>> #!/bin/bash
+>
+>>>>
+>
+>>>> chvt 3
+>
+>>>>
+>
+>>>> for j in $(seq 80); do
+>
+>>>> Â Â Â Â Â Â Â Â  echo "$(date) starting round $j"
+>
+>>>> Â Â Â Â Â Â Â Â  if [ "$(journalctl --boot | grep "failed to allocate VRAM
+>
+>>>> BO")" != "" ]; then
+>
+>>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  echo "bug was reproduced after $j tries"
+>
+>>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  exit 1
+>
+>>>> Â Â Â Â Â Â Â Â  fi
+>
+>>>> Â Â Â Â Â Â Â Â  for i in $(seq 100); do
+>
+>>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  dmesg > /dev/tty3
+>
+>>>> Â Â Â Â Â Â Â Â  done
+>
+>>>> done
+>
+>>>>
+>
+>>>> echo "bug could not be reproduced"
+>
+>>>> exit 0
+>
+>>>
+>
+>>> The bug itself seems to remain unfixed, as I was able to reproduce that
+>
+>>> with Fedora 41 guest, as well as AlmaLinux 8 guest. However our
+>
+>>> cpr-transfer code also seems to be buggy as it triggers the crash -
+>
+>>> without the cpr-transfer migration the above reproduce doesn't lead to
+>
+>>> crash on the source VM.
+>
+>>>
+>
+>>> I suspect that, as cpr-transfer doesn't migrate the guest memory, but
+>
+>>> rather passes it through the memory backend object, our code might
+>
+>>> somehow corrupt the VRAM.Â  However, I wasn't able to trace the
+>
+>>> corruption so far.
+>
+>>>
+>
+>>> Could somebody help the investigation and take a look into this?Â  Any
+>
+>>> suggestions would be appreciated.Â  Thanks!
+>
+>>
+>
+>> Possibly some memory region created by qxl is not being preserved.
+>
+>> Try adding these traces to see what is preserved:
+>
+>>
+>
+>> -trace enable='*cpr*'
+>
+>> -trace enable='*ram_alloc*'
+>
+>
+>
+> Also try adding this patch to see if it flags any ram blocks as not
+>
+> compatible with cpr.Â  A message is printed at migration start time.
+>
+> Â
+https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-email-
+>
+> steven.sistare@oracle.com/
+>
+>
+>
+> - Steve
+>
+>
+>
+>
+With the traces enabled + the "migration: ram block cpr blockers" patch
+>
+applied:
+>
+>
+Source:
+>
+> cpr_find_fd pc.bios, id 0 returns -1
+>
+> cpr_save_fd pc.bios, id 0, fd 22
+>
+> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host
+>
+> 0x7fec18e00000
+>
+> cpr_find_fd pc.rom, id 0 returns -1
+>
+> cpr_save_fd pc.rom, id 0, fd 23
+>
+> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host
+>
+> 0x7fec18c00000
+>
+> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1
+>
+> cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24
+>
+> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd
+>
+> 24 host 0x7fec18a00000
+>
+> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1
+>
+> cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25
+>
+> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864
+>
+> fd 25 host 0x7feb77e00000
+>
+> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1
+>
+> cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27
+>
+> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 27
+>
+> host 0x7fec18800000
+>
+> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1
+>
+> cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28
+>
+> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864
+>
+> fd 28 host 0x7feb73c00000
+>
+> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1
+>
+> cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34
+>
+> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 34
+>
+> host 0x7fec18600000
+>
+> cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1
+>
+> cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35
+>
+> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd
+>
+> 35 host 0x7fec18200000
+>
+> cpr_find_fd /rom@etc/table-loader, id 0 returns -1
+>
+> cpr_save_fd /rom@etc/table-loader, id 0, fd 36
+>
+> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 36
+>
+> host 0x7feb8b600000
+>
+> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1
+>
+> cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37
+>
+> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 37 host
+>
+> 0x7feb8b400000
+>
+>
+>
+> cpr_state_save cpr-transfer mode
+>
+> cpr_transfer_output /var/run/alma8cpr-dst.sock
+>
+>
+Target:
+>
+> cpr_transfer_input /var/run/alma8cpr-dst.sock
+>
+> cpr_state_load cpr-transfer mode
+>
+> cpr_find_fd pc.bios, id 0 returns 20
+>
+> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host
+>
+> 0x7fcdc9800000
+>
+> cpr_find_fd pc.rom, id 0 returns 19
+>
+> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host
+>
+> 0x7fcdc9600000
+>
+> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18
+>
+> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd
+>
+> 18 host 0x7fcdc9400000
+>
+> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17
+>
+> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864
+>
+> fd 17 host 0x7fcd27e00000
+>
+> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16
+>
+> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 16
+>
+> host 0x7fcdc9200000
+>
+> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15
+>
+> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864
+>
+> fd 15 host 0x7fcd23c00000
+>
+> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14
+>
+> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 14
+>
+> host 0x7fcdc8800000
+>
+> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13
+>
+> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd
+>
+> 13 host 0x7fcdc8400000
+>
+> cpr_find_fd /rom@etc/table-loader, id 0 returns 11
+>
+> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 11
+>
+> host 0x7fcdc8200000
+>
+> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10
+>
+> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 10 host
+>
+> 0x7fcd3be00000
+>
+>
+Looks like both vga.vram and qxl.vram are being preserved (with the same
+>
+addresses), and no incompatible ram blocks are found during migration.
+>
+Sorry, addressed are not the same, of course.  However corresponding ram
+blocks do seem to be preserved and initialized.
+
+On 2/28/2025 1:37 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:35 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:20 PM, Steven Sistare wrote:
+On 2/28/2025 1:13 PM, Steven Sistare wrote:
+On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
+Hi all,
+
+We've been experimenting with cpr-transfer migration mode recently and
+have discovered the following issue with the guest QXL driver:
+
+Run migration source:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$EMULATOR -enable-kvm \
+ Â Â Â Â  -machine q35 \
+ Â Â Â Â  -cpu host -smp 2 -m 2G \
+ Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/
+ram0,share=on\
+ Â Â Â Â  -machine memory-backend=ram0 \
+ Â Â Â Â  -machine aux-ram-share=on \
+ Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+ Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+ Â Â Â Â  -nographic \
+ Â Â Â Â  -device qxl-vga
+Run migration target:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-dst.sock
+$EMULATOR -enable-kvm \
+ Â Â Â Â  -machine q35 \
+ Â Â Â Â  -cpu host -smp 2 -m 2G \
+ Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/
+ram0,share=on\
+ Â Â Â Â  -machine memory-backend=ram0 \
+ Â Â Â Â  -machine aux-ram-share=on \
+ Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+ Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+ Â Â Â Â  -nographic \
+ Â Â Â Â  -device qxl-vga \
+ Â Â Â Â  -incoming tcp:0:44444 \
+ Â Â Â Â  -incoming '{"channel-type": "cpr", "addr": { "transport":
+"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
+Launch the migration:
+QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$QMPSHELL -p $QMPSOCK <<EOF
+ Â Â Â Â  migrate-set-parameters mode=cpr-transfer
+ Â Â Â Â  migrate channels=[{"channel-type":"main","addr":
+{"transport":"socket","type":"inet","host":"0","port":"44444"}},
+{"channel-type":"cpr","addr":
+{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-
+dst.sock"}}]
+EOF
+Then, after a while, QXL guest driver on target crashes spewing the
+following messages:
+[Â Â  73.962002] [TTM] Buffer eviction failed
+[Â Â  73.962072] qxl 0000:00:02.0: object_init failed for (3149824,
+0x00000001)
+[Â Â  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to
+allocate VRAM BO
+That seems to be a known kernel QXL driver bug:
+https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/
+https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
+(the latter discussion contains that reproduce script which speeds up
+the crash in the guest):
+#!/bin/bash
+
+chvt 3
+
+for j in $(seq 80); do
+ Â Â Â Â Â Â Â Â  echo "$(date) starting round $j"
+ Â Â Â Â Â Â Â Â  if [ "$(journalctl --boot | grep "failed to allocate VRAM
+BO")" != "" ]; then
+ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  echo "bug was reproduced after $j tries"
+ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  exit 1
+ Â Â Â Â Â Â Â Â  fi
+ Â Â Â Â Â Â Â Â  for i in $(seq 100); do
+ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  dmesg > /dev/tty3
+ Â Â Â Â Â Â Â Â  done
+done
+
+echo "bug could not be reproduced"
+exit 0
+The bug itself seems to remain unfixed, as I was able to reproduce that
+with Fedora 41 guest, as well as AlmaLinux 8 guest. However our
+cpr-transfer code also seems to be buggy as it triggers the crash -
+without the cpr-transfer migration the above reproduce doesn't lead to
+crash on the source VM.
+
+I suspect that, as cpr-transfer doesn't migrate the guest memory, but
+rather passes it through the memory backend object, our code might
+somehow corrupt the VRAM.Â  However, I wasn't able to trace the
+corruption so far.
+
+Could somebody help the investigation and take a look into this?Â  Any
+suggestions would be appreciated.Â  Thanks!
+Possibly some memory region created by qxl is not being preserved.
+Try adding these traces to see what is preserved:
+
+-trace enable='*cpr*'
+-trace enable='*ram_alloc*'
+Also try adding this patch to see if it flags any ram blocks as not
+compatible with cpr.Â  A message is printed at migration start time.
+ Â
+https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-email-
+steven.sistare@oracle.com/
+
+- Steve
+With the traces enabled + the "migration: ram block cpr blockers" patch
+applied:
+
+Source:
+cpr_find_fd pc.bios, id 0 returns -1
+cpr_save_fd pc.bios, id 0, fd 22
+qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host 
+0x7fec18e00000
+cpr_find_fd pc.rom, id 0 returns -1
+cpr_save_fd pc.rom, id 0, fd 23
+qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host 
+0x7fec18c00000
+cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1
+cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24
+qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd 24 
+host 0x7fec18a00000
+cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1
+cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25
+qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 fd 
+25 host 0x7feb77e00000
+cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 27 host 
+0x7fec18800000
+cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 fd 
+28 host 0x7feb73c00000
+cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34
+qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 34 host 
+0x7fec18600000
+cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1
+cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35
+qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 35 
+host 0x7fec18200000
+cpr_find_fd /rom@etc/table-loader, id 0 returns -1
+cpr_save_fd /rom@etc/table-loader, id 0, fd 36
+qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 36 host 
+0x7feb8b600000
+cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1
+cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37
+qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 37 host 
+0x7feb8b400000
+
+cpr_state_save cpr-transfer mode
+cpr_transfer_output /var/run/alma8cpr-dst.sock
+Target:
+cpr_transfer_input /var/run/alma8cpr-dst.sock
+cpr_state_load cpr-transfer mode
+cpr_find_fd pc.bios, id 0 returns 20
+qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host 
+0x7fcdc9800000
+cpr_find_fd pc.rom, id 0 returns 19
+qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host 
+0x7fcdc9600000
+cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18
+qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd 18 
+host 0x7fcdc9400000
+cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17
+qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 fd 
+17 host 0x7fcd27e00000
+cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 16 host 
+0x7fcdc9200000
+cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 fd 
+15 host 0x7fcd23c00000
+cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14
+qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 14 host 
+0x7fcdc8800000
+cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13
+qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 13 
+host 0x7fcdc8400000
+cpr_find_fd /rom@etc/table-loader, id 0 returns 11
+qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 11 host 
+0x7fcdc8200000
+cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10
+qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 10 host 
+0x7fcd3be00000
+Looks like both vga.vram and qxl.vram are being preserved (with the same
+addresses), and no incompatible ram blocks are found during migration.
+Sorry, addressed are not the same, of course.  However corresponding ram
+blocks do seem to be preserved and initialized.
+So far, I have not reproduced the guest driver failure.
+
+However, I have isolated places where new QEMU improperly writes to
+the qxl memory regions prior to starting the guest, by mmap'ing them
+readonly after cpr:
+
+  qemu_ram_alloc_internal()
+    if (reused && (strstr(name, "qxl") || strstr("name", "vga")))
+        ram_flags |= RAM_READONLY;
+    new_block = qemu_ram_alloc_from_fd(...)
+
+I have attached a draft fix; try it and let me know.
+My console window looks fine before and after cpr, using
+-vnc $hostip:0 -vga qxl
+
+- Steve
+0001-hw-qxl-cpr-support-preliminary.patch
+Description:
+Text document
+
+On 3/4/25 9:05 PM, Steven Sistare wrote:
+>
+On 2/28/2025 1:37 PM, Andrey Drobyshev wrote:
+>
+> On 2/28/25 8:35 PM, Andrey Drobyshev wrote:
+>
+>> On 2/28/25 8:20 PM, Steven Sistare wrote:
+>
+>>> On 2/28/2025 1:13 PM, Steven Sistare wrote:
+>
+>>>> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
+>
+>>>>> Hi all,
+>
+>>>>>
+>
+>>>>> We've been experimenting with cpr-transfer migration mode recently
+>
+>>>>> and
+>
+>>>>> have discovered the following issue with the guest QXL driver:
+>
+>>>>>
+>
+>>>>> Run migration source:
+>
+>>>>>> EMULATOR=/path/to/emulator
+>
+>>>>>> ROOTFS=/path/to/image
+>
+>>>>>> QMPSOCK=/var/run/alma8qmp-src.sock
+>
+>>>>>>
+>
+>>>>>> $EMULATOR -enable-kvm \
+>
+>>>>>> Â Â Â Â Â  -machine q35 \
+>
+>>>>>> Â Â Â Â Â  -cpu host -smp 2 -m 2G \
+>
+>>>>>> Â Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/
+>
+>>>>>> ram0,share=on\
+>
+>>>>>> Â Â Â Â Â  -machine memory-backend=ram0 \
+>
+>>>>>> Â Â Â Â Â  -machine aux-ram-share=on \
+>
+>>>>>> Â Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+>
+>>>>>> Â Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+>
+>>>>>> Â Â Â Â Â  -nographic \
+>
+>>>>>> Â Â Â Â Â  -device qxl-vga
+>
+>>>>>
+>
+>>>>> Run migration target:
+>
+>>>>>> EMULATOR=/path/to/emulator
+>
+>>>>>> ROOTFS=/path/to/image
+>
+>>>>>> QMPSOCK=/var/run/alma8qmp-dst.sock
+>
+>>>>>> $EMULATOR -enable-kvm \
+>
+>>>>>> Â Â Â Â Â  -machine q35 \
+>
+>>>>>> Â Â Â Â Â  -cpu host -smp 2 -m 2G \
+>
+>>>>>> Â Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/
+>
+>>>>>> ram0,share=on\
+>
+>>>>>> Â Â Â Â Â  -machine memory-backend=ram0 \
+>
+>>>>>> Â Â Â Â Â  -machine aux-ram-share=on \
+>
+>>>>>> Â Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+>
+>>>>>> Â Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+>
+>>>>>> Â Â Â Â Â  -nographic \
+>
+>>>>>> Â Â Â Â Â  -device qxl-vga \
+>
+>>>>>> Â Â Â Â Â  -incoming tcp:0:44444 \
+>
+>>>>>> Â Â Â Â Â  -incoming '{"channel-type": "cpr", "addr": { "transport":
+>
+>>>>>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
+>
+>>>>>
+>
+>>>>>
+>
+>>>>> Launch the migration:
+>
+>>>>>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
+>
+>>>>>> QMPSOCK=/var/run/alma8qmp-src.sock
+>
+>>>>>>
+>
+>>>>>> $QMPSHELL -p $QMPSOCK <<EOF
+>
+>>>>>> Â Â Â Â Â  migrate-set-parameters mode=cpr-transfer
+>
+>>>>>> Â Â Â Â Â  migrate channels=[{"channel-type":"main","addr":
+>
+>>>>>> {"transport":"socket","type":"inet","host":"0","port":"44444"}},
+>
+>>>>>> {"channel-type":"cpr","addr":
+>
+>>>>>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr-
+>
+>>>>>> dst.sock"}}]
+>
+>>>>>> EOF
+>
+>>>>>
+>
+>>>>> Then, after a while, QXL guest driver on target crashes spewing the
+>
+>>>>> following messages:
+>
+>>>>>> [Â Â  73.962002] [TTM] Buffer eviction failed
+>
+>>>>>> [Â Â  73.962072] qxl 0000:00:02.0: object_init failed for (3149824,
+>
+>>>>>> 0x00000001)
+>
+>>>>>> [Â Â  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to
+>
+>>>>>> allocate VRAM BO
+>
+>>>>>
+>
+>>>>> That seems to be a known kernel QXL driver bug:
+>
+>>>>>
+>
+>>>>>
+https://lore.kernel.org/all/20220907094423.93581-1-
+>
+>>>>> min_halo@163.com/T/
+>
+>>>>>
+https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
+>
+>>>>>
+>
+>>>>> (the latter discussion contains that reproduce script which speeds up
+>
+>>>>> the crash in the guest):
+>
+>>>>>> #!/bin/bash
+>
+>>>>>>
+>
+>>>>>> chvt 3
+>
+>>>>>>
+>
+>>>>>> for j in $(seq 80); do
+>
+>>>>>> Â Â Â Â Â Â Â Â Â  echo "$(date) starting round $j"
+>
+>>>>>> Â Â Â Â Â Â Â Â Â  if [ "$(journalctl --boot | grep "failed to allocate VRAM
+>
+>>>>>> BO")" != "" ]; then
+>
+>>>>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  echo "bug was reproduced after $j tries"
+>
+>>>>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  exit 1
+>
+>>>>>> Â Â Â Â Â Â Â Â Â  fi
+>
+>>>>>> Â Â Â Â Â Â Â Â Â  for i in $(seq 100); do
+>
+>>>>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  dmesg > /dev/tty3
+>
+>>>>>> Â Â Â Â Â Â Â Â Â  done
+>
+>>>>>> done
+>
+>>>>>>
+>
+>>>>>> echo "bug could not be reproduced"
+>
+>>>>>> exit 0
+>
+>>>>>
+>
+>>>>> The bug itself seems to remain unfixed, as I was able to reproduce
+>
+>>>>> that
+>
+>>>>> with Fedora 41 guest, as well as AlmaLinux 8 guest. However our
+>
+>>>>> cpr-transfer code also seems to be buggy as it triggers the crash -
+>
+>>>>> without the cpr-transfer migration the above reproduce doesn't
+>
+>>>>> lead to
+>
+>>>>> crash on the source VM.
+>
+>>>>>
+>
+>>>>> I suspect that, as cpr-transfer doesn't migrate the guest memory, but
+>
+>>>>> rather passes it through the memory backend object, our code might
+>
+>>>>> somehow corrupt the VRAM.Â  However, I wasn't able to trace the
+>
+>>>>> corruption so far.
+>
+>>>>>
+>
+>>>>> Could somebody help the investigation and take a look into this?Â  Any
+>
+>>>>> suggestions would be appreciated.Â  Thanks!
+>
+>>>>
+>
+>>>> Possibly some memory region created by qxl is not being preserved.
+>
+>>>> Try adding these traces to see what is preserved:
+>
+>>>>
+>
+>>>> -trace enable='*cpr*'
+>
+>>>> -trace enable='*ram_alloc*'
+>
+>>>
+>
+>>> Also try adding this patch to see if it flags any ram blocks as not
+>
+>>> compatible with cpr.Â  A message is printed at migration start time.
+>
+>>> Â Â
+https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-
+>
+>>> email-
+>
+>>> steven.sistare@oracle.com/
+>
+>>>
+>
+>>> - Steve
+>
+>>>
+>
+>>
+>
+>> With the traces enabled + the "migration: ram block cpr blockers" patch
+>
+>> applied:
+>
+>>
+>
+>> Source:
+>
+>>> cpr_find_fd pc.bios, id 0 returns -1
+>
+>>> cpr_save_fd pc.bios, id 0, fd 22
+>
+>>> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host
+>
+>>> 0x7fec18e00000
+>
+>>> cpr_find_fd pc.rom, id 0 returns -1
+>
+>>> cpr_save_fd pc.rom, id 0, fd 23
+>
+>>> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host
+>
+>>> 0x7fec18c00000
+>
+>>> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1
+>
+>>> cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24
+>
+>>> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size
+>
+>>> 262144 fd 24 host 0x7fec18a00000
+>
+>>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1
+>
+>>> cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25
+>
+>>> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size
+>
+>>> 67108864 fd 25 host 0x7feb77e00000
+>
+>>> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1
+>
+>>> cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27
+>
+>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192
+>
+>>> fd 27 host 0x7fec18800000
+>
+>>> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1
+>
+>>> cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28
+>
+>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size
+>
+>>> 67108864 fd 28 host 0x7feb73c00000
+>
+>>> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1
+>
+>>> cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34
+>
+>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536
+>
+>>> fd 34 host 0x7fec18600000
+>
+>>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1
+>
+>>> cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35
+>
+>>> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size
+>
+>>> 2097152 fd 35 host 0x7fec18200000
+>
+>>> cpr_find_fd /rom@etc/table-loader, id 0 returns -1
+>
+>>> cpr_save_fd /rom@etc/table-loader, id 0, fd 36
+>
+>>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536
+>
+>>> fd 36 host 0x7feb8b600000
+>
+>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1
+>
+>>> cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37
+>
+>>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd
+>
+>>> 37 host 0x7feb8b400000
+>
+>>>
+>
+>>> cpr_state_save cpr-transfer mode
+>
+>>> cpr_transfer_output /var/run/alma8cpr-dst.sock
+>
+>>
+>
+>> Target:
+>
+>>> cpr_transfer_input /var/run/alma8cpr-dst.sock
+>
+>>> cpr_state_load cpr-transfer mode
+>
+>>> cpr_find_fd pc.bios, id 0 returns 20
+>
+>>> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host
+>
+>>> 0x7fcdc9800000
+>
+>>> cpr_find_fd pc.rom, id 0 returns 19
+>
+>>> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host
+>
+>>> 0x7fcdc9600000
+>
+>>> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18
+>
+>>> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size
+>
+>>> 262144 fd 18 host 0x7fcdc9400000
+>
+>>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17
+>
+>>> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size
+>
+>>> 67108864 fd 17 host 0x7fcd27e00000
+>
+>>> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16
+>
+>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192
+>
+>>> fd 16 host 0x7fcdc9200000
+>
+>>> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15
+>
+>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size
+>
+>>> 67108864 fd 15 host 0x7fcd23c00000
+>
+>>> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14
+>
+>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536
+>
+>>> fd 14 host 0x7fcdc8800000
+>
+>>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13
+>
+>>> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size
+>
+>>> 2097152 fd 13 host 0x7fcdc8400000
+>
+>>> cpr_find_fd /rom@etc/table-loader, id 0 returns 11
+>
+>>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536
+>
+>>> fd 11 host 0x7fcdc8200000
+>
+>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10
+>
+>>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd
+>
+>>> 10 host 0x7fcd3be00000
+>
+>>
+>
+>> Looks like both vga.vram and qxl.vram are being preserved (with the same
+>
+>> addresses), and no incompatible ram blocks are found during migration.
+>
+>
+>
+> Sorry, addressed are not the same, of course.Â  However corresponding ram
+>
+> blocks do seem to be preserved and initialized.
+>
+>
+So far, I have not reproduced the guest driver failure.
+>
+>
+However, I have isolated places where new QEMU improperly writes to
+>
+the qxl memory regions prior to starting the guest, by mmap'ing them
+>
+readonly after cpr:
+>
+>
+Â  qemu_ram_alloc_internal()
+>
+Â Â Â  if (reused && (strstr(name, "qxl") || strstr("name", "vga")))
+>
+Â Â Â Â Â Â Â  ram_flags |= RAM_READONLY;
+>
+Â Â Â  new_block = qemu_ram_alloc_from_fd(...)
+>
+>
+I have attached a draft fix; try it and let me know.
+>
+My console window looks fine before and after cpr, using
+>
+-vnc $hostip:0 -vga qxl
+>
+>
+- Steve
+Regarding the reproduce: when I launch the buggy version with the same
+options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer,
+my VNC client silently hangs on the target after a while.  Could it
+happen on your stand as well?  Could you try launching VM with
+"-nographic -device qxl-vga"?  That way VM's serial console is given you
+directly in the shell, so when qxl driver crashes you're still able to
+inspect the kernel messages.
+
+As for your patch, I can report that it doesn't resolve the issue as it
+is.  But I was able to track down another possible memory corruption
+using your approach with readonly mmap'ing:
+
+>
+Program terminated with signal SIGSEGV, Segmentation fault.
+>
+#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
+>
+412         d->ram->magic       = cpu_to_le32(QXL_RAM_MAGIC);
+>
+[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))]
+>
+(gdb) bt
+>
+#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
+>
+#1  0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70,
+>
+errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142
+>
+#2  0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70,
+>
+errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257
+>
+#3  0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70,
+>
+errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174
+>
+#4  0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70,
+>
+value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494
+>
+#5  0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70,
+>
+v=0x5638996f3770, name=0x56389759b141 "realized", opaque=0x5638987893d0,
+>
+errp=0x7ffd3c2b84e0)
+>
+at ../qom/object.c:2374
+>
+#6  0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70,
+>
+name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0)
+>
+at ../qom/object.c:1449
+>
+#7  0x00005638970f8586 in object_property_set_qobject (obj=0x5638996e0e70,
+>
+name=0x56389759b141 "realized", value=0x5638996df900, errp=0x7ffd3c2b84e0)
+>
+at ../qom/qom-qobject.c:28
+>
+#8  0x00005638970f3d8d in object_property_set_bool (obj=0x5638996e0e70,
+>
+name=0x56389759b141 "realized", value=true, errp=0x7ffd3c2b84e0)
+>
+at ../qom/object.c:1519
+>
+#9  0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70,
+>
+bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276
+>
+#10 0x0000563896dba675 in qdev_device_add_from_qdict (opts=0x5638996dfe50,
+>
+from_json=false, errp=0x7ffd3c2b84e0) at ../system/qdev-monitor.c:714
+>
+#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150,
+>
+errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733
+>
+#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, opts=0x563898786150,
+>
+errp=0x56389855dc40 <error_fatal>) at ../system/vl.c:1207
+>
+#13 0x000056389737a6cc in qemu_opts_foreach
+>
+(list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca
+>
+<device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>)
+>
+at ../util/qemu-option.c:1135
+>
+#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/vl.c:2745
+>
+#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40
+>
+<error_fatal>) at ../system/vl.c:2806
+>
+#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) at
+>
+../system/vl.c:3838
+>
+#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at
+>
+../system/main.c:72
+So the attached adjusted version of your patch does seem to help.  At
+least I can't reproduce the crash on my stand.
+
+I'm wondering, could it be useful to explicitly mark all the reused
+memory regions readonly upon cpr-transfer, and then make them writable
+back again after the migration is done?  That way we will be segfaulting
+early on instead of debugging tricky memory corruptions.
+
+Andrey
+0001-hw-qxl-cpr-support-preliminary.patch
+Description:
+Text Data
+
+On 3/5/2025 11:50 AM, Andrey Drobyshev wrote:
+On 3/4/25 9:05 PM, Steven Sistare wrote:
+On 2/28/2025 1:37 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:35 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:20 PM, Steven Sistare wrote:
+On 2/28/2025 1:13 PM, Steven Sistare wrote:
+On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
+Hi all,
+
+We've been experimenting with cpr-transfer migration mode recently
+and
+have discovered the following issue with the guest QXL driver:
+
+Run migration source:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$EMULATOR -enable-kvm \
+ Â Â Â Â Â  -machine q35 \
+ Â Â Â Â Â  -cpu host -smp 2 -m 2G \
+ Â Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/
+ram0,share=on\
+ Â Â Â Â Â  -machine memory-backend=ram0 \
+ Â Â Â Â Â  -machine aux-ram-share=on \
+ Â Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+ Â Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+ Â Â Â Â Â  -nographic \
+ Â Â Â Â Â  -device qxl-vga
+Run migration target:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-dst.sock
+$EMULATOR -enable-kvm \
+ Â Â Â Â Â  -machine q35 \
+ Â Â Â Â Â  -cpu host -smp 2 -m 2G \
+ Â Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/
+ram0,share=on\
+ Â Â Â Â Â  -machine memory-backend=ram0 \
+ Â Â Â Â Â  -machine aux-ram-share=on \
+ Â Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+ Â Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+ Â Â Â Â Â  -nographic \
+ Â Â Â Â Â  -device qxl-vga \
+ Â Â Â Â Â  -incoming tcp:0:44444 \
+ Â Â Â Â Â  -incoming '{"channel-type": "cpr", "addr": { "transport":
+"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
+Launch the migration:
+QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$QMPSHELL -p $QMPSOCK <<EOF
+ Â Â Â Â Â  migrate-set-parameters mode=cpr-transfer
+ Â Â Â Â Â  migrate channels=[{"channel-type":"main","addr":
+{"transport":"socket","type":"inet","host":"0","port":"44444"}},
+{"channel-type":"cpr","addr":
+{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-
+dst.sock"}}]
+EOF
+Then, after a while, QXL guest driver on target crashes spewing the
+following messages:
+[Â Â  73.962002] [TTM] Buffer eviction failed
+[Â Â  73.962072] qxl 0000:00:02.0: object_init failed for (3149824,
+0x00000001)
+[Â Â  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to
+allocate VRAM BO
+That seems to be a known kernel QXL driver bug:
+https://lore.kernel.org/all/20220907094423.93581-1-
+min_halo@163.com/T/
+https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
+(the latter discussion contains that reproduce script which speeds up
+the crash in the guest):
+#!/bin/bash
+
+chvt 3
+
+for j in $(seq 80); do
+ Â Â Â Â Â Â Â Â Â  echo "$(date) starting round $j"
+ Â Â Â Â Â Â Â Â Â  if [ "$(journalctl --boot | grep "failed to allocate VRAM
+BO")" != "" ]; then
+ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  echo "bug was reproduced after $j tries"
+ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  exit 1
+ Â Â Â Â Â Â Â Â Â  fi
+ Â Â Â Â Â Â Â Â Â  for i in $(seq 100); do
+ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  dmesg > /dev/tty3
+ Â Â Â Â Â Â Â Â Â  done
+done
+
+echo "bug could not be reproduced"
+exit 0
+The bug itself seems to remain unfixed, as I was able to reproduce
+that
+with Fedora 41 guest, as well as AlmaLinux 8 guest. However our
+cpr-transfer code also seems to be buggy as it triggers the crash -
+without the cpr-transfer migration the above reproduce doesn't
+lead to
+crash on the source VM.
+
+I suspect that, as cpr-transfer doesn't migrate the guest memory, but
+rather passes it through the memory backend object, our code might
+somehow corrupt the VRAM.Â  However, I wasn't able to trace the
+corruption so far.
+
+Could somebody help the investigation and take a look into this?Â  Any
+suggestions would be appreciated.Â  Thanks!
+Possibly some memory region created by qxl is not being preserved.
+Try adding these traces to see what is preserved:
+
+-trace enable='*cpr*'
+-trace enable='*ram_alloc*'
+Also try adding this patch to see if it flags any ram blocks as not
+compatible with cpr.Â  A message is printed at migration start time.
+ Â Â
+https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-
+email-
+steven.sistare@oracle.com/
+
+- Steve
+With the traces enabled + the "migration: ram block cpr blockers" patch
+applied:
+
+Source:
+cpr_find_fd pc.bios, id 0 returns -1
+cpr_save_fd pc.bios, id 0, fd 22
+qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host
+0x7fec18e00000
+cpr_find_fd pc.rom, id 0 returns -1
+cpr_save_fd pc.rom, id 0, fd 23
+qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host
+0x7fec18c00000
+cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1
+cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24
+qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size
+262144 fd 24 host 0x7fec18a00000
+cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1
+cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25
+qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size
+67108864 fd 25 host 0x7feb77e00000
+cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192
+fd 27 host 0x7fec18800000
+cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size
+67108864 fd 28 host 0x7feb73c00000
+cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34
+qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536
+fd 34 host 0x7fec18600000
+cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1
+cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35
+qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size
+2097152 fd 35 host 0x7fec18200000
+cpr_find_fd /rom@etc/table-loader, id 0 returns -1
+cpr_save_fd /rom@etc/table-loader, id 0, fd 36
+qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536
+fd 36 host 0x7feb8b600000
+cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1
+cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37
+qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd
+37 host 0x7feb8b400000
+
+cpr_state_save cpr-transfer mode
+cpr_transfer_output /var/run/alma8cpr-dst.sock
+Target:
+cpr_transfer_input /var/run/alma8cpr-dst.sock
+cpr_state_load cpr-transfer mode
+cpr_find_fd pc.bios, id 0 returns 20
+qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host
+0x7fcdc9800000
+cpr_find_fd pc.rom, id 0 returns 19
+qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host
+0x7fcdc9600000
+cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18
+qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size
+262144 fd 18 host 0x7fcdc9400000
+cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17
+qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size
+67108864 fd 17 host 0x7fcd27e00000
+cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192
+fd 16 host 0x7fcdc9200000
+cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size
+67108864 fd 15 host 0x7fcd23c00000
+cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14
+qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536
+fd 14 host 0x7fcdc8800000
+cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13
+qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size
+2097152 fd 13 host 0x7fcdc8400000
+cpr_find_fd /rom@etc/table-loader, id 0 returns 11
+qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536
+fd 11 host 0x7fcdc8200000
+cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10
+qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd
+10 host 0x7fcd3be00000
+Looks like both vga.vram and qxl.vram are being preserved (with the same
+addresses), and no incompatible ram blocks are found during migration.
+Sorry, addressed are not the same, of course.Â  However corresponding ram
+blocks do seem to be preserved and initialized.
+So far, I have not reproduced the guest driver failure.
+
+However, I have isolated places where new QEMU improperly writes to
+the qxl memory regions prior to starting the guest, by mmap'ing them
+readonly after cpr:
+
+ Â  qemu_ram_alloc_internal()
+ Â Â Â  if (reused && (strstr(name, "qxl") || strstr("name", "vga")))
+ Â Â Â Â Â Â Â  ram_flags |= RAM_READONLY;
+ Â Â Â  new_block = qemu_ram_alloc_from_fd(...)
+
+I have attached a draft fix; try it and let me know.
+My console window looks fine before and after cpr, using
+-vnc $hostip:0 -vga qxl
+
+- Steve
+Regarding the reproduce: when I launch the buggy version with the same
+options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer,
+my VNC client silently hangs on the target after a while.  Could it
+happen on your stand as well?
+cpr does not preserve the vnc connection and session.  To test, I specify
+port 0 for the source VM and port 1 for the dest.  When the src vnc goes
+dormant the dest vnc becomes active.
+Could you try launching VM with
+"-nographic -device qxl-vga"?  That way VM's serial console is given you
+directly in the shell, so when qxl driver crashes you're still able to
+inspect the kernel messages.
+I have been running like that, but have not reproduced the qxl driver crash,
+and I suspect my guest image+kernel is too old.  However, once I realized the
+issue was post-cpr modification of qxl memory, I switched my attention to the
+fix.
+As for your patch, I can report that it doesn't resolve the issue as it
+is.  But I was able to track down another possible memory corruption
+using your approach with readonly mmap'ing:
+Program terminated with signal SIGSEGV, Segmentation fault.
+#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
+412         d->ram->magic       = cpu_to_le32(QXL_RAM_MAGIC);
+[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))]
+(gdb) bt
+#0  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
+#1  0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, 
+errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142
+#2  0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, 
+errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257
+#3  0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, 
+errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174
+#4  0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, value=true, 
+errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494
+#5  0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, v=0x5638996f3770, 
+name=0x56389759b141 "realized", opaque=0x5638987893d0, errp=0x7ffd3c2b84e0)
+     at ../qom/object.c:2374
+#6  0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, name=0x56389759b141 
+"realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0)
+     at ../qom/object.c:1449
+#7  0x00005638970f8586 in object_property_set_qobject (obj=0x5638996e0e70, 
+name=0x56389759b141 "realized", value=0x5638996df900, errp=0x7ffd3c2b84e0)
+     at ../qom/qom-qobject.c:28
+#8  0x00005638970f3d8d in object_property_set_bool (obj=0x5638996e0e70, 
+name=0x56389759b141 "realized", value=true, errp=0x7ffd3c2b84e0)
+     at ../qom/object.c:1519
+#9  0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, bus=0x563898cf3c20, 
+errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276
+#10 0x0000563896dba675 in qdev_device_add_from_qdict (opts=0x5638996dfe50, 
+from_json=false, errp=0x7ffd3c2b84e0) at ../system/qdev-monitor.c:714
+#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, errp=0x56389855dc40 
+<error_fatal>) at ../system/qdev-monitor.c:733
+#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, opts=0x563898786150, 
+errp=0x56389855dc40 <error_fatal>) at ../system/vl.c:1207
+#13 0x000056389737a6cc in qemu_opts_foreach
+     (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca <device_init_func>, 
+opaque=0x0, errp=0x56389855dc40 <error_fatal>)
+     at ../util/qemu-option.c:1135
+#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/vl.c:2745
+#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 
+<error_fatal>) at ../system/vl.c:2806
+#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) at 
+../system/vl.c:3838
+#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at 
+../system/main.c:72
+So the attached adjusted version of your patch does seem to help.  At
+least I can't reproduce the crash on my stand.
+Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram are
+definitely harmful.  Try V2 of the patch, attached, which skips the lines
+of init_qxl_ram that modify guest memory.
+I'm wondering, could it be useful to explicitly mark all the reused
+memory regions readonly upon cpr-transfer, and then make them writable
+back again after the migration is done?  That way we will be segfaulting
+early on instead of debugging tricky memory corruptions.
+It's a useful debugging technique, but changing protection on a large memory 
+region
+can be too expensive for production due to TLB shootdowns.
+
+Also, there are cases where writes are performed but the value is guaranteed to
+be the same:
+  qxl_post_load()
+    qxl_set_mode()
+      d->rom->mode = cpu_to_le32(modenr);
+The value is the same because mode and shadow_rom.mode were passed in vmstate
+from old qemu.
+
+- Steve
+0001-hw-qxl-cpr-support-preliminary-V2.patch
+Description:
+Text document
+
+On 3/5/25 22:19, Steven Sistare wrote:
+On 3/5/2025 11:50 AM, Andrey Drobyshev wrote:
+On 3/4/25 9:05 PM, Steven Sistare wrote:
+On 2/28/2025 1:37 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:35 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:20 PM, Steven Sistare wrote:
+On 2/28/2025 1:13 PM, Steven Sistare wrote:
+On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
+Hi all,
+
+We've been experimenting with cpr-transfer migration mode recently
+and
+have discovered the following issue with the guest QXL driver:
+
+Run migration source:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$EMULATOR -enable-kvm \
+Â Â Â Â Â Â  -machine q35 \
+Â Â Â Â Â Â  -cpu host -smp 2 -m 2G \
+Â Â Â Â Â Â  -object
+memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/
+ram0,share=on\
+Â Â Â Â Â Â  -machine memory-backend=ram0 \
+Â Â Â Â Â Â  -machine aux-ram-share=on \
+Â Â Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+Â Â Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+Â Â Â Â Â Â  -nographic \
+Â Â Â Â Â Â  -device qxl-vga
+Run migration target:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-dst.sock
+$EMULATOR -enable-kvm \
+Â Â Â Â Â Â  -machine q35 \
+Â Â Â Â Â Â  -cpu host -smp 2 -m 2G \
+Â Â Â Â Â Â  -object
+memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/
+ram0,share=on\
+Â Â Â Â Â Â  -machine memory-backend=ram0 \
+Â Â Â Â Â Â  -machine aux-ram-share=on \
+Â Â Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+Â Â Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+Â Â Â Â Â Â  -nographic \
+Â Â Â Â Â Â  -device qxl-vga \
+Â Â Â Â Â Â  -incoming tcp:0:44444 \
+Â Â Â Â Â Â  -incoming '{"channel-type": "cpr", "addr": { "transport":
+"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
+Launch the migration:
+QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$QMPSHELL -p $QMPSOCK <<EOF
+Â Â Â Â Â Â  migrate-set-parameters mode=cpr-transfer
+Â Â Â Â Â Â  migrate channels=[{"channel-type":"main","addr":
+{"transport":"socket","type":"inet","host":"0","port":"44444"}},
+{"channel-type":"cpr","addr":
+{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-
+dst.sock"}}]
+EOF
+Then, after a while, QXL guest driver on target crashes spewing
+the
+following messages:
+[Â Â  73.962002] [TTM] Buffer eviction failed
+[Â Â  73.962072] qxl 0000:00:02.0: object_init failed for (3149824,
+0x00000001)
+[Â Â  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR*
+failed to
+allocate VRAM BO
+That seems to be a known kernel QXL driver bug:
+https://lore.kernel.org/all/20220907094423.93581-1-
+min_halo@163.com/T/
+https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
+(the latter discussion contains that reproduce script which
+speeds up
+the crash in the guest):
+#!/bin/bash
+
+chvt 3
+
+for j in $(seq 80); do
+Â Â Â Â Â Â Â Â Â Â  echo "$(date) starting round $j"
+Â Â Â Â Â Â Â Â Â Â  if [ "$(journalctl --boot | grep "failed to
+allocate VRAM
+BO")" != "" ]; then
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  echo "bug was reproduced after $j tries"
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  exit 1
+Â Â Â Â Â Â Â Â Â Â  fi
+Â Â Â Â Â Â Â Â Â Â  for i in $(seq 100); do
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  dmesg > /dev/tty3
+Â Â Â Â Â Â Â Â Â Â  done
+done
+
+echo "bug could not be reproduced"
+exit 0
+The bug itself seems to remain unfixed, as I was able to reproduce
+that
+with Fedora 41 guest, as well as AlmaLinux 8 guest. However our
+cpr-transfer code also seems to be buggy as it triggers the
+crash -
+without the cpr-transfer migration the above reproduce doesn't
+lead to
+crash on the source VM.
+I suspect that, as cpr-transfer doesn't migrate the guest
+memory, but
+rather passes it through the memory backend object, our code might
+somehow corrupt the VRAM.Â  However, I wasn't able to trace the
+corruption so far.
+Could somebody help the investigation and take a look into
+this?Â  Any
+suggestions would be appreciated.Â  Thanks!
+Possibly some memory region created by qxl is not being preserved.
+Try adding these traces to see what is preserved:
+
+-trace enable='*cpr*'
+-trace enable='*ram_alloc*'
+Also try adding this patch to see if it flags any ram blocks as not
+compatible with cpr.Â  A message is printed at migration start time.
+https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-
+email-
+steven.sistare@oracle.com/
+
+- Steve
+With the traces enabled + the "migration: ram block cpr blockers"
+patch
+applied:
+
+Source:
+cpr_find_fd pc.bios, id 0 returns -1
+cpr_save_fd pc.bios, id 0, fd 22
+qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host
+0x7fec18e00000
+cpr_find_fd pc.rom, id 0 returns -1
+cpr_save_fd pc.rom, id 0, fd 23
+qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host
+0x7fec18c00000
+cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1
+cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24
+qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size
+262144 fd 24 host 0x7fec18a00000
+cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1
+cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25
+qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size
+67108864 fd 25 host 0x7feb77e00000
+cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192
+fd 27 host 0x7fec18800000
+cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size
+67108864 fd 28 host 0x7feb73c00000
+cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34
+qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536
+fd 34 host 0x7fec18600000
+cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1
+cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35
+qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size
+2097152 fd 35 host 0x7fec18200000
+cpr_find_fd /rom@etc/table-loader, id 0 returns -1
+cpr_save_fd /rom@etc/table-loader, id 0, fd 36
+qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536
+fd 36 host 0x7feb8b600000
+cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1
+cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37
+qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd
+37 host 0x7feb8b400000
+
+cpr_state_save cpr-transfer mode
+cpr_transfer_output /var/run/alma8cpr-dst.sock
+Target:
+cpr_transfer_input /var/run/alma8cpr-dst.sock
+cpr_state_load cpr-transfer mode
+cpr_find_fd pc.bios, id 0 returns 20
+qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host
+0x7fcdc9800000
+cpr_find_fd pc.rom, id 0 returns 19
+qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host
+0x7fcdc9600000
+cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18
+qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size
+262144 fd 18 host 0x7fcdc9400000
+cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17
+qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size
+67108864 fd 17 host 0x7fcd27e00000
+cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192
+fd 16 host 0x7fcdc9200000
+cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size
+67108864 fd 15 host 0x7fcd23c00000
+cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14
+qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536
+fd 14 host 0x7fcdc8800000
+cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13
+qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size
+2097152 fd 13 host 0x7fcdc8400000
+cpr_find_fd /rom@etc/table-loader, id 0 returns 11
+qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536
+fd 11 host 0x7fcdc8200000
+cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10
+qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd
+10 host 0x7fcd3be00000
+Looks like both vga.vram and qxl.vram are being preserved (with
+the same
+addresses), and no incompatible ram blocks are found during
+migration.
+Sorry, addressed are not the same, of course.Â  However
+corresponding ram
+blocks do seem to be preserved and initialized.
+So far, I have not reproduced the guest driver failure.
+
+However, I have isolated places where new QEMU improperly writes to
+the qxl memory regions prior to starting the guest, by mmap'ing them
+readonly after cpr:
+
+Â Â  qemu_ram_alloc_internal()
+Â Â Â Â  if (reused && (strstr(name, "qxl") || strstr("name", "vga")))
+Â Â Â Â Â Â Â Â  ram_flags |= RAM_READONLY;
+Â Â Â Â  new_block = qemu_ram_alloc_from_fd(...)
+
+I have attached a draft fix; try it and let me know.
+My console window looks fine before and after cpr, using
+-vnc $hostip:0 -vga qxl
+
+- Steve
+Regarding the reproduce: when I launch the buggy version with the same
+options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer,
+my VNC client silently hangs on the target after a while.Â  Could it
+happen on your stand as well?
+cpr does not preserve the vnc connection and session.Â  To test, I specify
+port 0 for the source VM and port 1 for the dest.Â  When the src vnc goes
+dormant the dest vnc becomes active.
+Could you try launching VM with
+"-nographic -device qxl-vga"?Â  That way VM's serial console is given you
+directly in the shell, so when qxl driver crashes you're still able to
+inspect the kernel messages.
+I have been running like that, but have not reproduced the qxl driver
+crash,
+and I suspect my guest image+kernel is too old.Â  However, once I
+realized the
+issue was post-cpr modification of qxl memory, I switched my attention
+to the
+fix.
+As for your patch, I can report that it doesn't resolve the issue as it
+is.Â  But I was able to track down another possible memory corruption
+using your approach with readonly mmap'ing:
+Program terminated with signal SIGSEGV, Segmentation fault.
+#0Â  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
+412Â Â Â Â Â Â Â Â  d->ram->magicÂ Â Â Â Â Â  = cpu_to_le32(QXL_RAM_MAGIC);
+[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))]
+(gdb) bt
+#0Â  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
+#1Â  0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70,
+errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142
+#2Â  0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70,
+errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257
+#3Â  0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70,
+errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174
+#4Â  0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70,
+value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494
+#5Â  0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70,
+v=0x5638996f3770, name=0x56389759b141 "realized",
+opaque=0x5638987893d0, errp=0x7ffd3c2b84e0)
+Â Â Â Â  at ../qom/object.c:2374
+#6Â  0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70,
+name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0)
+Â Â Â Â  at ../qom/object.c:1449
+#7Â  0x00005638970f8586 in object_property_set_qobject
+(obj=0x5638996e0e70, name=0x56389759b141 "realized",
+value=0x5638996df900, errp=0x7ffd3c2b84e0)
+Â Â Â Â  at ../qom/qom-qobject.c:28
+#8Â  0x00005638970f3d8d in object_property_set_bool
+(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true,
+errp=0x7ffd3c2b84e0)
+Â Â Â Â  at ../qom/object.c:1519
+#9Â  0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70,
+bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276
+#10 0x0000563896dba675 in qdev_device_add_from_qdict
+(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at
+../system/qdev-monitor.c:714
+#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150,
+errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733
+#12 0x0000563896dc48f1 in device_init_func (opaque=0x0,
+opts=0x563898786150, errp=0x56389855dc40 <error_fatal>) at
+../system/vl.c:1207
+#13 0x000056389737a6cc in qemu_opts_foreach
+Â Â Â Â  (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca
+<device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>)
+Â Â Â Â  at ../util/qemu-option.c:1135
+#14 0x0000563896dc89b5 in qemu_create_cli_devices () at
+../system/vl.c:2745
+#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40
+<error_fatal>) at ../system/vl.c:2806
+#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948)
+at ../system/vl.c:3838
+#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at
+../system/main.c:72
+So the attached adjusted version of your patch does seem to help.Â  At
+least I can't reproduce the crash on my stand.
+Thanks for the stack trace; the calls to SPICE_RING_INIT in
+init_qxl_ram are
+definitely harmful.Â  Try V2 of the patch, attached, which skips the lines
+of init_qxl_ram that modify guest memory.
+I'm wondering, could it be useful to explicitly mark all the reused
+memory regions readonly upon cpr-transfer, and then make them writable
+back again after the migration is done?Â  That way we will be segfaulting
+early on instead of debugging tricky memory corruptions.
+It's a useful debugging technique, but changing protection on a large
+memory region
+can be too expensive for production due to TLB shootdowns.
+Good point. Though we could move this code under non-default option to
+avoid re-writing.
+
+Den
+
+On 3/5/25 11:19 PM, Steven Sistare wrote:
+>
+On 3/5/2025 11:50 AM, Andrey Drobyshev wrote:
+>
+> On 3/4/25 9:05 PM, Steven Sistare wrote:
+>
+>> On 2/28/2025 1:37 PM, Andrey Drobyshev wrote:
+>
+>>> On 2/28/25 8:35 PM, Andrey Drobyshev wrote:
+>
+>>>> On 2/28/25 8:20 PM, Steven Sistare wrote:
+>
+>>>>> On 2/28/2025 1:13 PM, Steven Sistare wrote:
+>
+>>>>>> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
+>
+>>>>>>> Hi all,
+>
+>>>>>>>
+>
+>>>>>>> We've been experimenting with cpr-transfer migration mode recently
+>
+>>>>>>> and
+>
+>>>>>>> have discovered the following issue with the guest QXL driver:
+>
+>>>>>>>
+>
+>>>>>>> Run migration source:
+>
+>>>>>>>> EMULATOR=/path/to/emulator
+>
+>>>>>>>> ROOTFS=/path/to/image
+>
+>>>>>>>> QMPSOCK=/var/run/alma8qmp-src.sock
+>
+>>>>>>>>
+>
+>>>>>>>> $EMULATOR -enable-kvm \
+>
+>>>>>>>> Â Â Â Â Â Â  -machine q35 \
+>
+>>>>>>>> Â Â Â Â Â Â  -cpu host -smp 2 -m 2G \
+>
+>>>>>>>> Â Â Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/
+>
+>>>>>>>> dev/shm/
+>
+>>>>>>>> ram0,share=on\
+>
+>>>>>>>> Â Â Â Â Â Â  -machine memory-backend=ram0 \
+>
+>>>>>>>> Â Â Â Â Â Â  -machine aux-ram-share=on \
+>
+>>>>>>>> Â Â Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+>
+>>>>>>>> Â Â Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+>
+>>>>>>>> Â Â Â Â Â Â  -nographic \
+>
+>>>>>>>> Â Â Â Â Â Â  -device qxl-vga
+>
+>>>>>>>
+>
+>>>>>>> Run migration target:
+>
+>>>>>>>> EMULATOR=/path/to/emulator
+>
+>>>>>>>> ROOTFS=/path/to/image
+>
+>>>>>>>> QMPSOCK=/var/run/alma8qmp-dst.sock
+>
+>>>>>>>> $EMULATOR -enable-kvm \
+>
+>>>>>>>> Â Â Â Â Â Â  -machine q35 \
+>
+>>>>>>>> Â Â Â Â Â Â  -cpu host -smp 2 -m 2G \
+>
+>>>>>>>> Â Â Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/
+>
+>>>>>>>> dev/shm/
+>
+>>>>>>>> ram0,share=on\
+>
+>>>>>>>> Â Â Â Â Â Â  -machine memory-backend=ram0 \
+>
+>>>>>>>> Â Â Â Â Â Â  -machine aux-ram-share=on \
+>
+>>>>>>>> Â Â Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+>
+>>>>>>>> Â Â Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+>
+>>>>>>>> Â Â Â Â Â Â  -nographic \
+>
+>>>>>>>> Â Â Â Â Â Â  -device qxl-vga \
+>
+>>>>>>>> Â Â Â Â Â Â  -incoming tcp:0:44444 \
+>
+>>>>>>>> Â Â Â Â Â Â  -incoming '{"channel-type": "cpr", "addr": { "transport":
+>
+>>>>>>>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
+>
+>>>>>>>
+>
+>>>>>>>
+>
+>>>>>>> Launch the migration:
+>
+>>>>>>>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
+>
+>>>>>>>> QMPSOCK=/var/run/alma8qmp-src.sock
+>
+>>>>>>>>
+>
+>>>>>>>> $QMPSHELL -p $QMPSOCK <<EOF
+>
+>>>>>>>> Â Â Â Â Â Â  migrate-set-parameters mode=cpr-transfer
+>
+>>>>>>>> Â Â Â Â Â Â  migrate channels=[{"channel-type":"main","addr":
+>
+>>>>>>>> {"transport":"socket","type":"inet","host":"0","port":"44444"}},
+>
+>>>>>>>> {"channel-type":"cpr","addr":
+>
+>>>>>>>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr-
+>
+>>>>>>>> dst.sock"}}]
+>
+>>>>>>>> EOF
+>
+>>>>>>>
+>
+>>>>>>> Then, after a while, QXL guest driver on target crashes spewing the
+>
+>>>>>>> following messages:
+>
+>>>>>>>> [Â Â  73.962002] [TTM] Buffer eviction failed
+>
+>>>>>>>> [Â Â  73.962072] qxl 0000:00:02.0: object_init failed for (3149824,
+>
+>>>>>>>> 0x00000001)
+>
+>>>>>>>> [Â Â  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to
+>
+>>>>>>>> allocate VRAM BO
+>
+>>>>>>>
+>
+>>>>>>> That seems to be a known kernel QXL driver bug:
+>
+>>>>>>>
+>
+>>>>>>>
+https://lore.kernel.org/all/20220907094423.93581-1-
+>
+>>>>>>> min_halo@163.com/T/
+>
+>>>>>>>
+https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
+>
+>>>>>>>
+>
+>>>>>>> (the latter discussion contains that reproduce script which
+>
+>>>>>>> speeds up
+>
+>>>>>>> the crash in the guest):
+>
+>>>>>>>> #!/bin/bash
+>
+>>>>>>>>
+>
+>>>>>>>> chvt 3
+>
+>>>>>>>>
+>
+>>>>>>>> for j in $(seq 80); do
+>
+>>>>>>>> Â Â Â Â Â Â Â Â Â Â  echo "$(date) starting round $j"
+>
+>>>>>>>> Â Â Â Â Â Â Â Â Â Â  if [ "$(journalctl --boot | grep "failed to allocate
+>
+>>>>>>>> VRAM
+>
+>>>>>>>> BO")" != "" ]; then
+>
+>>>>>>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  echo "bug was reproduced after $j tries"
+>
+>>>>>>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  exit 1
+>
+>>>>>>>> Â Â Â Â Â Â Â Â Â Â  fi
+>
+>>>>>>>> Â Â Â Â Â Â Â Â Â Â  for i in $(seq 100); do
+>
+>>>>>>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  dmesg > /dev/tty3
+>
+>>>>>>>> Â Â Â Â Â Â Â Â Â Â  done
+>
+>>>>>>>> done
+>
+>>>>>>>>
+>
+>>>>>>>> echo "bug could not be reproduced"
+>
+>>>>>>>> exit 0
+>
+>>>>>>>
+>
+>>>>>>> The bug itself seems to remain unfixed, as I was able to reproduce
+>
+>>>>>>> that
+>
+>>>>>>> with Fedora 41 guest, as well as AlmaLinux 8 guest. However our
+>
+>>>>>>> cpr-transfer code also seems to be buggy as it triggers the crash -
+>
+>>>>>>> without the cpr-transfer migration the above reproduce doesn't
+>
+>>>>>>> lead to
+>
+>>>>>>> crash on the source VM.
+>
+>>>>>>>
+>
+>>>>>>> I suspect that, as cpr-transfer doesn't migrate the guest
+>
+>>>>>>> memory, but
+>
+>>>>>>> rather passes it through the memory backend object, our code might
+>
+>>>>>>> somehow corrupt the VRAM.Â  However, I wasn't able to trace the
+>
+>>>>>>> corruption so far.
+>
+>>>>>>>
+>
+>>>>>>> Could somebody help the investigation and take a look into
+>
+>>>>>>> this?Â  Any
+>
+>>>>>>> suggestions would be appreciated.Â  Thanks!
+>
+>>>>>>
+>
+>>>>>> Possibly some memory region created by qxl is not being preserved.
+>
+>>>>>> Try adding these traces to see what is preserved:
+>
+>>>>>>
+>
+>>>>>> -trace enable='*cpr*'
+>
+>>>>>> -trace enable='*ram_alloc*'
+>
+>>>>>
+>
+>>>>> Also try adding this patch to see if it flags any ram blocks as not
+>
+>>>>> compatible with cpr.Â  A message is printed at migration start time.
+>
+>>>>> Â Â Â
+https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-
+>
+>>>>> email-
+>
+>>>>> steven.sistare@oracle.com/
+>
+>>>>>
+>
+>>>>> - Steve
+>
+>>>>>
+>
+>>>>
+>
+>>>> With the traces enabled + the "migration: ram block cpr blockers"
+>
+>>>> patch
+>
+>>>> applied:
+>
+>>>>
+>
+>>>> Source:
+>
+>>>>> cpr_find_fd pc.bios, id 0 returns -1
+>
+>>>>> cpr_save_fd pc.bios, id 0, fd 22
+>
+>>>>> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host
+>
+>>>>> 0x7fec18e00000
+>
+>>>>> cpr_find_fd pc.rom, id 0 returns -1
+>
+>>>>> cpr_save_fd pc.rom, id 0, fd 23
+>
+>>>>> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host
+>
+>>>>> 0x7fec18c00000
+>
+>>>>> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1
+>
+>>>>> cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24
+>
+>>>>> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size
+>
+>>>>> 262144 fd 24 host 0x7fec18a00000
+>
+>>>>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1
+>
+>>>>> cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25
+>
+>>>>> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size
+>
+>>>>> 67108864 fd 25 host 0x7feb77e00000
+>
+>>>>> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1
+>
+>>>>> cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27
+>
+>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192
+>
+>>>>> fd 27 host 0x7fec18800000
+>
+>>>>> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1
+>
+>>>>> cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28
+>
+>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size
+>
+>>>>> 67108864 fd 28 host 0x7feb73c00000
+>
+>>>>> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1
+>
+>>>>> cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34
+>
+>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536
+>
+>>>>> fd 34 host 0x7fec18600000
+>
+>>>>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1
+>
+>>>>> cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35
+>
+>>>>> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size
+>
+>>>>> 2097152 fd 35 host 0x7fec18200000
+>
+>>>>> cpr_find_fd /rom@etc/table-loader, id 0 returns -1
+>
+>>>>> cpr_save_fd /rom@etc/table-loader, id 0, fd 36
+>
+>>>>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536
+>
+>>>>> fd 36 host 0x7feb8b600000
+>
+>>>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1
+>
+>>>>> cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37
+>
+>>>>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd
+>
+>>>>> 37 host 0x7feb8b400000
+>
+>>>>>
+>
+>>>>> cpr_state_save cpr-transfer mode
+>
+>>>>> cpr_transfer_output /var/run/alma8cpr-dst.sock
+>
+>>>>
+>
+>>>> Target:
+>
+>>>>> cpr_transfer_input /var/run/alma8cpr-dst.sock
+>
+>>>>> cpr_state_load cpr-transfer mode
+>
+>>>>> cpr_find_fd pc.bios, id 0 returns 20
+>
+>>>>> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host
+>
+>>>>> 0x7fcdc9800000
+>
+>>>>> cpr_find_fd pc.rom, id 0 returns 19
+>
+>>>>> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host
+>
+>>>>> 0x7fcdc9600000
+>
+>>>>> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18
+>
+>>>>> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size
+>
+>>>>> 262144 fd 18 host 0x7fcdc9400000
+>
+>>>>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17
+>
+>>>>> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size
+>
+>>>>> 67108864 fd 17 host 0x7fcd27e00000
+>
+>>>>> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16
+>
+>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192
+>
+>>>>> fd 16 host 0x7fcdc9200000
+>
+>>>>> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15
+>
+>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size
+>
+>>>>> 67108864 fd 15 host 0x7fcd23c00000
+>
+>>>>> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14
+>
+>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536
+>
+>>>>> fd 14 host 0x7fcdc8800000
+>
+>>>>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13
+>
+>>>>> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size
+>
+>>>>> 2097152 fd 13 host 0x7fcdc8400000
+>
+>>>>> cpr_find_fd /rom@etc/table-loader, id 0 returns 11
+>
+>>>>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536
+>
+>>>>> fd 11 host 0x7fcdc8200000
+>
+>>>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10
+>
+>>>>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd
+>
+>>>>> 10 host 0x7fcd3be00000
+>
+>>>>
+>
+>>>> Looks like both vga.vram and qxl.vram are being preserved (with the
+>
+>>>> same
+>
+>>>> addresses), and no incompatible ram blocks are found during migration.
+>
+>>>
+>
+>>> Sorry, addressed are not the same, of course.Â  However corresponding
+>
+>>> ram
+>
+>>> blocks do seem to be preserved and initialized.
+>
+>>
+>
+>> So far, I have not reproduced the guest driver failure.
+>
+>>
+>
+>> However, I have isolated places where new QEMU improperly writes to
+>
+>> the qxl memory regions prior to starting the guest, by mmap'ing them
+>
+>> readonly after cpr:
+>
+>>
+>
+>> Â Â  qemu_ram_alloc_internal()
+>
+>> Â Â Â Â  if (reused && (strstr(name, "qxl") || strstr("name", "vga")))
+>
+>> Â Â Â Â Â Â Â Â  ram_flags |= RAM_READONLY;
+>
+>> Â Â Â Â  new_block = qemu_ram_alloc_from_fd(...)
+>
+>>
+>
+>> I have attached a draft fix; try it and let me know.
+>
+>> My console window looks fine before and after cpr, using
+>
+>> -vnc $hostip:0 -vga qxl
+>
+>>
+>
+>> - Steve
+>
+>
+>
+> Regarding the reproduce: when I launch the buggy version with the same
+>
+> options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer,
+>
+> my VNC client silently hangs on the target after a while.Â  Could it
+>
+> happen on your stand as well?Â
+>
+>
+cpr does not preserve the vnc connection and session.Â  To test, I specify
+>
+port 0 for the source VM and port 1 for the dest.Â  When the src vnc goes
+>
+dormant the dest vnc becomes active.
+>
+Sure, I meant that VNC on the dest (on the port 1) works for a while
+after the migration and then hangs, apparently after the guest QXL crash.
+
+>
+> Could you try launching VM with
+>
+> "-nographic -device qxl-vga"?Â  That way VM's serial console is given you
+>
+> directly in the shell, so when qxl driver crashes you're still able to
+>
+> inspect the kernel messages.
+>
+>
+I have been running like that, but have not reproduced the qxl driver
+>
+crash,
+>
+and I suspect my guest image+kernel is too old.
+Yes, that's probably the case.  But the crash occurs on my Fedora 41
+guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to
+be buggy.
+
+
+>
+However, once I realized the
+>
+issue was post-cpr modification of qxl memory, I switched my attention
+>
+to the
+>
+fix.
+>
+>
+> As for your patch, I can report that it doesn't resolve the issue as it
+>
+> is.Â  But I was able to track down another possible memory corruption
+>
+> using your approach with readonly mmap'ing:
+>
+>
+>
+>> Program terminated with signal SIGSEGV, Segmentation fault.
+>
+>> #0Â  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
+>
+>> 412Â Â Â Â Â Â Â Â  d->ram->magicÂ Â Â Â Â Â  = cpu_to_le32(QXL_RAM_MAGIC);
+>
+>> [Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))]
+>
+>> (gdb) bt
+>
+>> #0Â  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
+>
+>> #1Â  0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70,
+>
+>> errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142
+>
+>> #2Â  0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70,
+>
+>> errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257
+>
+>> #3Â  0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70,
+>
+>> errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174
+>
+>> #4Â  0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70,
+>
+>> value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494
+>
+>> #5Â  0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70,
+>
+>> v=0x5638996f3770, name=0x56389759b141 "realized",
+>
+>> opaque=0x5638987893d0, errp=0x7ffd3c2b84e0)
+>
+>> Â Â Â Â  at ../qom/object.c:2374
+>
+>> #6Â  0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70,
+>
+>> name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0)
+>
+>> Â Â Â Â  at ../qom/object.c:1449
+>
+>> #7Â  0x00005638970f8586 in object_property_set_qobject
+>
+>> (obj=0x5638996e0e70, name=0x56389759b141 "realized",
+>
+>> value=0x5638996df900, errp=0x7ffd3c2b84e0)
+>
+>> Â Â Â Â  at ../qom/qom-qobject.c:28
+>
+>> #8Â  0x00005638970f3d8d in object_property_set_bool
+>
+>> (obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true,
+>
+>> errp=0x7ffd3c2b84e0)
+>
+>> Â Â Â Â  at ../qom/object.c:1519
+>
+>> #9Â  0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70,
+>
+>> bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276
+>
+>> #10 0x0000563896dba675 in qdev_device_add_from_qdict
+>
+>> (opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../
+>
+>> system/qdev-monitor.c:714
+>
+>> #11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150,
+>
+>> errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733
+>
+>> #12 0x0000563896dc48f1 in device_init_func (opaque=0x0,
+>
+>> opts=0x563898786150, errp=0x56389855dc40 <error_fatal>) at ../system/
+>
+>> vl.c:1207
+>
+>> #13 0x000056389737a6cc in qemu_opts_foreach
+>
+>> Â Â Â Â  (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca
+>
+>> <device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>)
+>
+>> Â Â Â Â  at ../util/qemu-option.c:1135
+>
+>> #14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/
+>
+>> vl.c:2745
+>
+>> #15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40
+>
+>> <error_fatal>) at ../system/vl.c:2806
+>
+>> #16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948)
+>
+>> at ../system/vl.c:3838
+>
+>> #17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../
+>
+>> system/main.c:72
+>
+>
+>
+> So the attached adjusted version of your patch does seem to help.Â  At
+>
+> least I can't reproduce the crash on my stand.
+>
+>
+Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram
+>
+are
+>
+definitely harmful.Â  Try V2 of the patch, attached, which skips the lines
+>
+of init_qxl_ram that modify guest memory.
+>
+Thanks, your v2 patch does seem to prevent the crash.  Would you re-send
+it to the list as a proper fix?
+
+>
+> I'm wondering, could it be useful to explicitly mark all the reused
+>
+> memory regions readonly upon cpr-transfer, and then make them writable
+>
+> back again after the migration is done?Â  That way we will be segfaulting
+>
+> early on instead of debugging tricky memory corruptions.
+>
+>
+It's a useful debugging technique, but changing protection on a large
+>
+memory region
+>
+can be too expensive for production due to TLB shootdowns.
+>
+>
+Also, there are cases where writes are performed but the value is
+>
+guaranteed to
+>
+be the same:
+>
+Â  qxl_post_load()
+>
+Â Â Â  qxl_set_mode()
+>
+Â Â Â Â Â  d->rom->mode = cpu_to_le32(modenr);
+>
+The value is the same because mode and shadow_rom.mode were passed in
+>
+vmstate
+>
+from old qemu.
+>
+There're also cases where devices' ROM might be re-initialized.  E.g.
+this segfault occures upon further exploration of RO mapped RAM blocks:
+
+>
+Program terminated with signal SIGSEGV, Segmentation fault.
+>
+#0  __memmove_avx_unaligned_erms () at
+>
+../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664
+>
+664             rep     movsb
+>
+[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))]
+>
+(gdb) bt
+>
+#0  __memmove_avx_unaligned_erms () at
+>
+../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664
+>
+#1  0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380,
+>
+owner=0x55aa2019ac10, name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true)
+>
+at ../hw/core/loader.c:1032
+>
+#2  0x000055aa1d031577 in rom_add_blob
+>
+(name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072,
+>
+max_len=2097152, addr=18446744073709551615, fw_file_name=0x55aa1da51f13
+>
+"etc/acpi/tables", fw_callback=0x55aa1d441f59 <acpi_build_update>,
+>
+callback_opaque=0x55aa20ff0010, as=0x0, read_only=true) at
+>
+../hw/core/loader.c:1147
+>
+#3  0x000055aa1cfd788d in acpi_add_rom_blob
+>
+(update=0x55aa1d441f59 <acpi_build_update>, opaque=0x55aa20ff0010,
+>
+blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at
+>
+../hw/acpi/utils.c:46
+>
+#4  0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720
+>
+#5  0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0)
+>
+at ../hw/i386/pc.c:638
+>
+#6  0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10
+>
+<machine_init_done_notifiers>, data=0x0) at ../util/notify.c:39
+>
+#7  0x000055aa1d039ee5 in qdev_machine_creation_done () at
+>
+../hw/core/machine.c:1749
+>
+#8  0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40
+>
+<error_fatal>) at ../system/vl.c:2779
+>
+#9  0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40
+>
+<error_fatal>) at ../system/vl.c:2807
+>
+#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at
+>
+../system/vl.c:3838
+>
+#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at
+>
+../system/main.c:72
+I'm not sure whether ACPI tables ROM in particular is rewritten with the
+same content, but there might be cases where ROM can be read from file
+system upon initialization.  That is undesirable as guest kernel
+certainly won't be too happy about sudden change of the device's ROM
+content.
+
+So the issue we're dealing with here is any unwanted memory related
+device initialization upon cpr.
+
+For now the only thing that comes to my mind is to make a test where we
+put as many devices as we can into a VM, make ram blocks RO upon cpr
+(and remap them as RW later after migration is done, if needed), and
+catch any unwanted memory violations.  As Den suggested, we might
+consider adding that behaviour as a separate non-default option (or
+"migrate" command flag specific to cpr-transfer), which would only be
+used in the testing.
+
+Andrey
+
+On 3/6/25 16:16, Andrey Drobyshev wrote:
+On 3/5/25 11:19 PM, Steven Sistare wrote:
+On 3/5/2025 11:50 AM, Andrey Drobyshev wrote:
+On 3/4/25 9:05 PM, Steven Sistare wrote:
+On 2/28/2025 1:37 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:35 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:20 PM, Steven Sistare wrote:
+On 2/28/2025 1:13 PM, Steven Sistare wrote:
+On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
+Hi all,
+
+We've been experimenting with cpr-transfer migration mode recently
+and
+have discovered the following issue with the guest QXL driver:
+
+Run migration source:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$EMULATOR -enable-kvm \
+ Â Â Â Â Â Â  -machine q35 \
+ Â Â Â Â Â Â  -cpu host -smp 2 -m 2G \
+ Â Â Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/
+dev/shm/
+ram0,share=on\
+ Â Â Â Â Â Â  -machine memory-backend=ram0 \
+ Â Â Â Â Â Â  -machine aux-ram-share=on \
+ Â Â Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+ Â Â Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+ Â Â Â Â Â Â  -nographic \
+ Â Â Â Â Â Â  -device qxl-vga
+Run migration target:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-dst.sock
+$EMULATOR -enable-kvm \
+ Â Â Â Â Â Â  -machine q35 \
+ Â Â Â Â Â Â  -cpu host -smp 2 -m 2G \
+ Â Â Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/
+dev/shm/
+ram0,share=on\
+ Â Â Â Â Â Â  -machine memory-backend=ram0 \
+ Â Â Â Â Â Â  -machine aux-ram-share=on \
+ Â Â Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+ Â Â Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+ Â Â Â Â Â Â  -nographic \
+ Â Â Â Â Â Â  -device qxl-vga \
+ Â Â Â Â Â Â  -incoming tcp:0:44444 \
+ Â Â Â Â Â Â  -incoming '{"channel-type": "cpr", "addr": { "transport":
+"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
+Launch the migration:
+QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$QMPSHELL -p $QMPSOCK <<EOF
+ Â Â Â Â Â Â  migrate-set-parameters mode=cpr-transfer
+ Â Â Â Â Â Â  migrate channels=[{"channel-type":"main","addr":
+{"transport":"socket","type":"inet","host":"0","port":"44444"}},
+{"channel-type":"cpr","addr":
+{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-
+dst.sock"}}]
+EOF
+Then, after a while, QXL guest driver on target crashes spewing the
+following messages:
+[Â Â  73.962002] [TTM] Buffer eviction failed
+[Â Â  73.962072] qxl 0000:00:02.0: object_init failed for (3149824,
+0x00000001)
+[Â Â  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to
+allocate VRAM BO
+That seems to be a known kernel QXL driver bug:
+https://lore.kernel.org/all/20220907094423.93581-1-
+min_halo@163.com/T/
+https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
+(the latter discussion contains that reproduce script which
+speeds up
+the crash in the guest):
+#!/bin/bash
+
+chvt 3
+
+for j in $(seq 80); do
+ Â Â Â Â Â Â Â Â Â Â  echo "$(date) starting round $j"
+ Â Â Â Â Â Â Â Â Â Â  if [ "$(journalctl --boot | grep "failed to allocate
+VRAM
+BO")" != "" ]; then
+ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  echo "bug was reproduced after $j tries"
+ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  exit 1
+ Â Â Â Â Â Â Â Â Â Â  fi
+ Â Â Â Â Â Â Â Â Â Â  for i in $(seq 100); do
+ Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  dmesg > /dev/tty3
+ Â Â Â Â Â Â Â Â Â Â  done
+done
+
+echo "bug could not be reproduced"
+exit 0
+The bug itself seems to remain unfixed, as I was able to reproduce
+that
+with Fedora 41 guest, as well as AlmaLinux 8 guest. However our
+cpr-transfer code also seems to be buggy as it triggers the crash -
+without the cpr-transfer migration the above reproduce doesn't
+lead to
+crash on the source VM.
+
+I suspect that, as cpr-transfer doesn't migrate the guest
+memory, but
+rather passes it through the memory backend object, our code might
+somehow corrupt the VRAM.Â  However, I wasn't able to trace the
+corruption so far.
+
+Could somebody help the investigation and take a look into
+this?Â  Any
+suggestions would be appreciated.Â  Thanks!
+Possibly some memory region created by qxl is not being preserved.
+Try adding these traces to see what is preserved:
+
+-trace enable='*cpr*'
+-trace enable='*ram_alloc*'
+Also try adding this patch to see if it flags any ram blocks as not
+compatible with cpr.Â  A message is printed at migration start time.
+ Â Â Â
+https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-
+email-
+steven.sistare@oracle.com/
+
+- Steve
+With the traces enabled + the "migration: ram block cpr blockers"
+patch
+applied:
+
+Source:
+cpr_find_fd pc.bios, id 0 returns -1
+cpr_save_fd pc.bios, id 0, fd 22
+qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host
+0x7fec18e00000
+cpr_find_fd pc.rom, id 0 returns -1
+cpr_save_fd pc.rom, id 0, fd 23
+qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host
+0x7fec18c00000
+cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1
+cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24
+qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size
+262144 fd 24 host 0x7fec18a00000
+cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1
+cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25
+qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size
+67108864 fd 25 host 0x7feb77e00000
+cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192
+fd 27 host 0x7fec18800000
+cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size
+67108864 fd 28 host 0x7feb73c00000
+cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34
+qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536
+fd 34 host 0x7fec18600000
+cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1
+cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35
+qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size
+2097152 fd 35 host 0x7fec18200000
+cpr_find_fd /rom@etc/table-loader, id 0 returns -1
+cpr_save_fd /rom@etc/table-loader, id 0, fd 36
+qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536
+fd 36 host 0x7feb8b600000
+cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1
+cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37
+qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd
+37 host 0x7feb8b400000
+
+cpr_state_save cpr-transfer mode
+cpr_transfer_output /var/run/alma8cpr-dst.sock
+Target:
+cpr_transfer_input /var/run/alma8cpr-dst.sock
+cpr_state_load cpr-transfer mode
+cpr_find_fd pc.bios, id 0 returns 20
+qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host
+0x7fcdc9800000
+cpr_find_fd pc.rom, id 0 returns 19
+qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host
+0x7fcdc9600000
+cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18
+qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size
+262144 fd 18 host 0x7fcdc9400000
+cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17
+qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size
+67108864 fd 17 host 0x7fcd27e00000
+cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192
+fd 16 host 0x7fcdc9200000
+cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size
+67108864 fd 15 host 0x7fcd23c00000
+cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14
+qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536
+fd 14 host 0x7fcdc8800000
+cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13
+qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size
+2097152 fd 13 host 0x7fcdc8400000
+cpr_find_fd /rom@etc/table-loader, id 0 returns 11
+qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536
+fd 11 host 0x7fcdc8200000
+cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10
+qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd
+10 host 0x7fcd3be00000
+Looks like both vga.vram and qxl.vram are being preserved (with the
+same
+addresses), and no incompatible ram blocks are found during migration.
+Sorry, addressed are not the same, of course.Â  However corresponding
+ram
+blocks do seem to be preserved and initialized.
+So far, I have not reproduced the guest driver failure.
+
+However, I have isolated places where new QEMU improperly writes to
+the qxl memory regions prior to starting the guest, by mmap'ing them
+readonly after cpr:
+
+ Â Â  qemu_ram_alloc_internal()
+ Â Â Â Â  if (reused && (strstr(name, "qxl") || strstr("name", "vga")))
+ Â Â Â Â Â Â Â Â  ram_flags |= RAM_READONLY;
+ Â Â Â Â  new_block = qemu_ram_alloc_from_fd(...)
+
+I have attached a draft fix; try it and let me know.
+My console window looks fine before and after cpr, using
+-vnc $hostip:0 -vga qxl
+
+- Steve
+Regarding the reproduce: when I launch the buggy version with the same
+options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer,
+my VNC client silently hangs on the target after a while.Â  Could it
+happen on your stand as well?
+cpr does not preserve the vnc connection and session.Â  To test, I specify
+port 0 for the source VM and port 1 for the dest.Â  When the src vnc goes
+dormant the dest vnc becomes active.
+Sure, I meant that VNC on the dest (on the port 1) works for a while
+after the migration and then hangs, apparently after the guest QXL crash.
+Could you try launching VM with
+"-nographic -device qxl-vga"?Â  That way VM's serial console is given you
+directly in the shell, so when qxl driver crashes you're still able to
+inspect the kernel messages.
+I have been running like that, but have not reproduced the qxl driver
+crash,
+and I suspect my guest image+kernel is too old.
+Yes, that's probably the case.  But the crash occurs on my Fedora 41
+guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to
+be buggy.
+However, once I realized the
+issue was post-cpr modification of qxl memory, I switched my attention
+to the
+fix.
+As for your patch, I can report that it doesn't resolve the issue as it
+is.Â  But I was able to track down another possible memory corruption
+using your approach with readonly mmap'ing:
+Program terminated with signal SIGSEGV, Segmentation fault.
+#0Â  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
+412Â Â Â Â Â Â Â Â  d->ram->magicÂ Â Â Â Â Â  = cpu_to_le32(QXL_RAM_MAGIC);
+[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))]
+(gdb) bt
+#0Â  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
+#1Â  0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70,
+errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142
+#2Â  0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70,
+errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257
+#3Â  0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70,
+errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174
+#4Â  0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70,
+value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494
+#5Â  0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70,
+v=0x5638996f3770, name=0x56389759b141 "realized",
+opaque=0x5638987893d0, errp=0x7ffd3c2b84e0)
+ Â Â Â Â  at ../qom/object.c:2374
+#6Â  0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70,
+name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0)
+ Â Â Â Â  at ../qom/object.c:1449
+#7Â  0x00005638970f8586 in object_property_set_qobject
+(obj=0x5638996e0e70, name=0x56389759b141 "realized",
+value=0x5638996df900, errp=0x7ffd3c2b84e0)
+ Â Â Â Â  at ../qom/qom-qobject.c:28
+#8Â  0x00005638970f3d8d in object_property_set_bool
+(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true,
+errp=0x7ffd3c2b84e0)
+ Â Â Â Â  at ../qom/object.c:1519
+#9Â  0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70,
+bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276
+#10 0x0000563896dba675 in qdev_device_add_from_qdict
+(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../
+system/qdev-monitor.c:714
+#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150,
+errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733
+#12 0x0000563896dc48f1 in device_init_func (opaque=0x0,
+opts=0x563898786150, errp=0x56389855dc40 <error_fatal>) at ../system/
+vl.c:1207
+#13 0x000056389737a6cc in qemu_opts_foreach
+ Â Â Â Â  (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca
+<device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>)
+ Â Â Â Â  at ../util/qemu-option.c:1135
+#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/
+vl.c:2745
+#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40
+<error_fatal>) at ../system/vl.c:2806
+#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948)
+at ../system/vl.c:3838
+#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../
+system/main.c:72
+So the attached adjusted version of your patch does seem to help.Â  At
+least I can't reproduce the crash on my stand.
+Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram
+are
+definitely harmful.Â  Try V2 of the patch, attached, which skips the lines
+of init_qxl_ram that modify guest memory.
+Thanks, your v2 patch does seem to prevent the crash.  Would you re-send
+it to the list as a proper fix?
+I'm wondering, could it be useful to explicitly mark all the reused
+memory regions readonly upon cpr-transfer, and then make them writable
+back again after the migration is done?Â  That way we will be segfaulting
+early on instead of debugging tricky memory corruptions.
+It's a useful debugging technique, but changing protection on a large
+memory region
+can be too expensive for production due to TLB shootdowns.
+
+Also, there are cases where writes are performed but the value is
+guaranteed to
+be the same:
+ Â  qxl_post_load()
+ Â Â Â  qxl_set_mode()
+ Â Â Â Â Â  d->rom->mode = cpu_to_le32(modenr);
+The value is the same because mode and shadow_rom.mode were passed in
+vmstate
+from old qemu.
+There're also cases where devices' ROM might be re-initialized.  E.g.
+this segfault occures upon further exploration of RO mapped RAM blocks:
+Program terminated with signal SIGSEGV, Segmentation fault.
+#0  __memmove_avx_unaligned_erms () at 
+../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664
+664             rep     movsb
+[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))]
+(gdb) bt
+#0  __memmove_avx_unaligned_erms () at 
+../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664
+#1  0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10, 
+name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true)
+     at ../hw/core/loader.c:1032
+#2  0x000055aa1d031577 in rom_add_blob
+     (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152, 
+addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables", 
+fw_callback=0x55aa1d441f59 <acpi_build_update>, callback_opaque=0x55aa20ff0010, as=0x0, 
+read_only=true) at ../hw/core/loader.c:1147
+#3  0x000055aa1cfd788d in acpi_add_rom_blob
+     (update=0x55aa1d441f59 <acpi_build_update>, opaque=0x55aa20ff0010, 
+blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46
+#4  0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720
+#5  0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0) 
+at ../hw/i386/pc.c:638
+#6  0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 
+<machine_init_done_notifiers>, data=0x0) at ../util/notify.c:39
+#7  0x000055aa1d039ee5 in qdev_machine_creation_done () at 
+../hw/core/machine.c:1749
+#8  0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 
+<error_fatal>) at ../system/vl.c:2779
+#9  0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 
+<error_fatal>) at ../system/vl.c:2807
+#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at 
+../system/vl.c:3838
+#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at 
+../system/main.c:72
+I'm not sure whether ACPI tables ROM in particular is rewritten with the
+same content, but there might be cases where ROM can be read from file
+system upon initialization.  That is undesirable as guest kernel
+certainly won't be too happy about sudden change of the device's ROM
+content.
+
+So the issue we're dealing with here is any unwanted memory related
+device initialization upon cpr.
+
+For now the only thing that comes to my mind is to make a test where we
+put as many devices as we can into a VM, make ram blocks RO upon cpr
+(and remap them as RW later after migration is done, if needed), and
+catch any unwanted memory violations.  As Den suggested, we might
+consider adding that behaviour as a separate non-default option (or
+"migrate" command flag specific to cpr-transfer), which would only be
+used in the testing.
+
+Andrey
+No way. ACPI with the source must be used in the same way as BIOSes
+and optional ROMs.
+
+Den
+
+On 3/6/2025 10:52 AM, Denis V. Lunev wrote:
+On 3/6/25 16:16, Andrey Drobyshev wrote:
+On 3/5/25 11:19 PM, Steven Sistare wrote:
+On 3/5/2025 11:50 AM, Andrey Drobyshev wrote:
+On 3/4/25 9:05 PM, Steven Sistare wrote:
+On 2/28/2025 1:37 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:35 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:20 PM, Steven Sistare wrote:
+On 2/28/2025 1:13 PM, Steven Sistare wrote:
+On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
+Hi all,
+
+We've been experimenting with cpr-transfer migration mode recently
+and
+have discovered the following issue with the guest QXL driver:
+
+Run migration source:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$EMULATOR -enable-kvm \
+Â Â Â Â Â Â Â  -machine q35 \
+Â Â Â Â Â Â Â  -cpu host -smp 2 -m 2G \
+Â Â Â Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/
+dev/shm/
+ram0,share=on\
+Â Â Â Â Â Â Â  -machine memory-backend=ram0 \
+Â Â Â Â Â Â Â  -machine aux-ram-share=on \
+Â Â Â Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+Â Â Â Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+Â Â Â Â Â Â Â  -nographic \
+Â Â Â Â Â Â Â  -device qxl-vga
+Run migration target:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-dst.sock
+$EMULATOR -enable-kvm \
+Â Â Â Â Â Â Â  -machine q35 \
+Â Â Â Â Â Â Â  -cpu host -smp 2 -m 2G \
+Â Â Â Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/
+dev/shm/
+ram0,share=on\
+Â Â Â Â Â Â Â  -machine memory-backend=ram0 \
+Â Â Â Â Â Â Â  -machine aux-ram-share=on \
+Â Â Â Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+Â Â Â Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+Â Â Â Â Â Â Â  -nographic \
+Â Â Â Â Â Â Â  -device qxl-vga \
+Â Â Â Â Â Â Â  -incoming tcp:0:44444 \
+Â Â Â Â Â Â Â  -incoming '{"channel-type": "cpr", "addr": { "transport":
+"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
+Launch the migration:
+QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$QMPSHELL -p $QMPSOCK <<EOF
+Â Â Â Â Â Â Â  migrate-set-parameters mode=cpr-transfer
+Â Â Â Â Â Â Â  migrate channels=[{"channel-type":"main","addr":
+{"transport":"socket","type":"inet","host":"0","port":"44444"}},
+{"channel-type":"cpr","addr":
+{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-
+dst.sock"}}]
+EOF
+Then, after a while, QXL guest driver on target crashes spewing the
+following messages:
+[Â Â  73.962002] [TTM] Buffer eviction failed
+[Â Â  73.962072] qxl 0000:00:02.0: object_init failed for (3149824,
+0x00000001)
+[Â Â  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to
+allocate VRAM BO
+That seems to be a known kernel QXL driver bug:
+https://lore.kernel.org/all/20220907094423.93581-1-
+min_halo@163.com/T/
+https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
+(the latter discussion contains that reproduce script which
+speeds up
+the crash in the guest):
+#!/bin/bash
+
+chvt 3
+
+for j in $(seq 80); do
+Â Â Â Â Â Â Â Â Â Â Â  echo "$(date) starting round $j"
+Â Â Â Â Â Â Â Â Â Â Â  if [ "$(journalctl --boot | grep "failed to allocate
+VRAM
+BO")" != "" ]; then
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  echo "bug was reproduced after $j tries"
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  exit 1
+Â Â Â Â Â Â Â Â Â Â Â  fi
+Â Â Â Â Â Â Â Â Â Â Â  for i in $(seq 100); do
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  dmesg > /dev/tty3
+Â Â Â Â Â Â Â Â Â Â Â  done
+done
+
+echo "bug could not be reproduced"
+exit 0
+The bug itself seems to remain unfixed, as I was able to reproduce
+that
+with Fedora 41 guest, as well as AlmaLinux 8 guest. However our
+cpr-transfer code also seems to be buggy as it triggers the crash -
+without the cpr-transfer migration the above reproduce doesn't
+lead to
+crash on the source VM.
+
+I suspect that, as cpr-transfer doesn't migrate the guest
+memory, but
+rather passes it through the memory backend object, our code might
+somehow corrupt the VRAM.Â  However, I wasn't able to trace the
+corruption so far.
+
+Could somebody help the investigation and take a look into
+this?Â  Any
+suggestions would be appreciated.Â  Thanks!
+Possibly some memory region created by qxl is not being preserved.
+Try adding these traces to see what is preserved:
+
+-trace enable='*cpr*'
+-trace enable='*ram_alloc*'
+Also try adding this patch to see if it flags any ram blocks as not
+compatible with cpr.Â  A message is printed at migration start time.
+Â Â Â Â
+https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-
+email-
+steven.sistare@oracle.com/
+
+- Steve
+With the traces enabled + the "migration: ram block cpr blockers"
+patch
+applied:
+
+Source:
+cpr_find_fd pc.bios, id 0 returns -1
+cpr_save_fd pc.bios, id 0, fd 22
+qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host
+0x7fec18e00000
+cpr_find_fd pc.rom, id 0 returns -1
+cpr_save_fd pc.rom, id 0, fd 23
+qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host
+0x7fec18c00000
+cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1
+cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24
+qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size
+262144 fd 24 host 0x7fec18a00000
+cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1
+cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25
+qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size
+67108864 fd 25 host 0x7feb77e00000
+cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192
+fd 27 host 0x7fec18800000
+cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size
+67108864 fd 28 host 0x7feb73c00000
+cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34
+qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536
+fd 34 host 0x7fec18600000
+cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1
+cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35
+qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size
+2097152 fd 35 host 0x7fec18200000
+cpr_find_fd /rom@etc/table-loader, id 0 returns -1
+cpr_save_fd /rom@etc/table-loader, id 0, fd 36
+qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536
+fd 36 host 0x7feb8b600000
+cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1
+cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37
+qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd
+37 host 0x7feb8b400000
+
+cpr_state_save cpr-transfer mode
+cpr_transfer_output /var/run/alma8cpr-dst.sock
+Target:
+cpr_transfer_input /var/run/alma8cpr-dst.sock
+cpr_state_load cpr-transfer mode
+cpr_find_fd pc.bios, id 0 returns 20
+qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host
+0x7fcdc9800000
+cpr_find_fd pc.rom, id 0 returns 19
+qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host
+0x7fcdc9600000
+cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18
+qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size
+262144 fd 18 host 0x7fcdc9400000
+cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17
+qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size
+67108864 fd 17 host 0x7fcd27e00000
+cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192
+fd 16 host 0x7fcdc9200000
+cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size
+67108864 fd 15 host 0x7fcd23c00000
+cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14
+qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536
+fd 14 host 0x7fcdc8800000
+cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13
+qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size
+2097152 fd 13 host 0x7fcdc8400000
+cpr_find_fd /rom@etc/table-loader, id 0 returns 11
+qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536
+fd 11 host 0x7fcdc8200000
+cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10
+qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd
+10 host 0x7fcd3be00000
+Looks like both vga.vram and qxl.vram are being preserved (with the
+same
+addresses), and no incompatible ram blocks are found during migration.
+Sorry, addressed are not the same, of course.Â  However corresponding
+ram
+blocks do seem to be preserved and initialized.
+So far, I have not reproduced the guest driver failure.
+
+However, I have isolated places where new QEMU improperly writes to
+the qxl memory regions prior to starting the guest, by mmap'ing them
+readonly after cpr:
+
+Â Â Â  qemu_ram_alloc_internal()
+Â Â Â Â Â  if (reused && (strstr(name, "qxl") || strstr("name", "vga")))
+Â Â Â Â Â Â Â Â Â  ram_flags |= RAM_READONLY;
+Â Â Â Â Â  new_block = qemu_ram_alloc_from_fd(...)
+
+I have attached a draft fix; try it and let me know.
+My console window looks fine before and after cpr, using
+-vnc $hostip:0 -vga qxl
+
+- Steve
+Regarding the reproduce: when I launch the buggy version with the same
+options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer,
+my VNC client silently hangs on the target after a while.Â  Could it
+happen on your stand as well?
+cpr does not preserve the vnc connection and session.Â  To test, I specify
+port 0 for the source VM and port 1 for the dest.Â  When the src vnc goes
+dormant the dest vnc becomes active.
+Sure, I meant that VNC on the dest (on the port 1) works for a while
+after the migration and then hangs, apparently after the guest QXL crash.
+Could you try launching VM with
+"-nographic -device qxl-vga"?Â  That way VM's serial console is given you
+directly in the shell, so when qxl driver crashes you're still able to
+inspect the kernel messages.
+I have been running like that, but have not reproduced the qxl driver
+crash,
+and I suspect my guest image+kernel is too old.
+Yes, that's probably the case.Â  But the crash occurs on my Fedora 41
+guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to
+be buggy.
+However, once I realized the
+issue was post-cpr modification of qxl memory, I switched my attention
+to the
+fix.
+As for your patch, I can report that it doesn't resolve the issue as it
+is.Â  But I was able to track down another possible memory corruption
+using your approach with readonly mmap'ing:
+Program terminated with signal SIGSEGV, Segmentation fault.
+#0Â  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
+412Â Â Â Â Â Â Â Â  d->ram->magicÂ Â Â Â Â Â  = cpu_to_le32(QXL_RAM_MAGIC);
+[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))]
+(gdb) bt
+#0Â  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
+#1Â  0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70,
+errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142
+#2Â  0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70,
+errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257
+#3Â  0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70,
+errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174
+#4Â  0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70,
+value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494
+#5Â  0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70,
+v=0x5638996f3770, name=0x56389759b141 "realized",
+opaque=0x5638987893d0, errp=0x7ffd3c2b84e0)
+Â Â Â Â Â  at ../qom/object.c:2374
+#6Â  0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70,
+name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0)
+Â Â Â Â Â  at ../qom/object.c:1449
+#7Â  0x00005638970f8586 in object_property_set_qobject
+(obj=0x5638996e0e70, name=0x56389759b141 "realized",
+value=0x5638996df900, errp=0x7ffd3c2b84e0)
+Â Â Â Â Â  at ../qom/qom-qobject.c:28
+#8Â  0x00005638970f3d8d in object_property_set_bool
+(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true,
+errp=0x7ffd3c2b84e0)
+Â Â Â Â Â  at ../qom/object.c:1519
+#9Â  0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70,
+bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276
+#10 0x0000563896dba675 in qdev_device_add_from_qdict
+(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../
+system/qdev-monitor.c:714
+#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150,
+errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733
+#12 0x0000563896dc48f1 in device_init_func (opaque=0x0,
+opts=0x563898786150, errp=0x56389855dc40 <error_fatal>) at ../system/
+vl.c:1207
+#13 0x000056389737a6cc in qemu_opts_foreach
+Â Â Â Â Â  (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca
+<device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>)
+Â Â Â Â Â  at ../util/qemu-option.c:1135
+#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/
+vl.c:2745
+#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40
+<error_fatal>) at ../system/vl.c:2806
+#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948)
+at ../system/vl.c:3838
+#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../
+system/main.c:72
+So the attached adjusted version of your patch does seem to help.Â  At
+least I can't reproduce the crash on my stand.
+Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram
+are
+definitely harmful.Â  Try V2 of the patch, attached, which skips the lines
+of init_qxl_ram that modify guest memory.
+Thanks, your v2 patch does seem to prevent the crash.Â  Would you re-send
+it to the list as a proper fix?
+Yes.  Was waiting for your confirmation.
+I'm wondering, could it be useful to explicitly mark all the reused
+memory regions readonly upon cpr-transfer, and then make them writable
+back again after the migration is done?Â  That way we will be segfaulting
+early on instead of debugging tricky memory corruptions.
+It's a useful debugging technique, but changing protection on a large
+memory region
+can be too expensive for production due to TLB shootdowns.
+
+Also, there are cases where writes are performed but the value is
+guaranteed to
+be the same:
+Â Â  qxl_post_load()
+Â Â Â Â  qxl_set_mode()
+Â Â Â Â Â Â  d->rom->mode = cpu_to_le32(modenr);
+The value is the same because mode and shadow_rom.mode were passed in
+vmstate
+from old qemu.
+There're also cases where devices' ROM might be re-initialized.Â  E.g.
+this segfault occures upon further exploration of RO mapped RAM blocks:
+Program terminated with signal SIGSEGV, Segmentation fault.
+#0Â  __memmove_avx_unaligned_erms () at 
+../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664
+664Â Â Â Â Â Â Â Â Â Â Â Â  repÂ Â Â Â  movsb
+[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))]
+(gdb) bt
+#0Â  __memmove_avx_unaligned_erms () at 
+../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664
+#1Â  0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10, 
+name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true)
+Â Â Â Â  at ../hw/core/loader.c:1032
+#2Â  0x000055aa1d031577 in rom_add_blob
+Â Â Â Â  (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152, 
+addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables", 
+fw_callback=0x55aa1d441f59 <acpi_build_update>, callback_opaque=0x55aa20ff0010, as=0x0, 
+read_only=true) at ../hw/core/loader.c:1147
+#3Â  0x000055aa1cfd788d in acpi_add_rom_blob
+Â Â Â Â  (update=0x55aa1d441f59 <acpi_build_update>, opaque=0x55aa20ff0010, 
+blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46
+#4Â  0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720
+#5Â  0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0) 
+at ../hw/i386/pc.c:638
+#6Â  0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 
+<machine_init_done_notifiers>, data=0x0) at ../util/notify.c:39
+#7Â  0x000055aa1d039ee5 in qdev_machine_creation_done () at 
+../hw/core/machine.c:1749
+#8Â  0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 
+<error_fatal>) at ../system/vl.c:2779
+#9Â  0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 
+<error_fatal>) at ../system/vl.c:2807
+#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at 
+../system/vl.c:3838
+#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at 
+../system/main.c:72
+I'm not sure whether ACPI tables ROM in particular is rewritten with the
+same content, but there might be cases where ROM can be read from file
+system upon initialization.Â  That is undesirable as guest kernel
+certainly won't be too happy about sudden change of the device's ROM
+content.
+
+So the issue we're dealing with here is any unwanted memory related
+device initialization upon cpr.
+
+For now the only thing that comes to my mind is to make a test where we
+put as many devices as we can into a VM, make ram blocks RO upon cpr
+(and remap them as RW later after migration is done, if needed), and
+catch any unwanted memory violations.Â  As Den suggested, we might
+consider adding that behaviour as a separate non-default option (or
+"migrate" command flag specific to cpr-transfer), which would only be
+used in the testing.
+I'll look into adding an option, but there may be too many false positives,
+such as the qxl_set_mode case above.  And the maintainers may object to me
+eliminating the false positives by adding more CPR_IN tests, due to gratuitous
+(from their POV) ugliness.
+
+But I will use the technique to look for more write violations.
+Andrey
+No way. ACPI with the source must be used in the same way as BIOSes
+and optional ROMs.
+Yup, its a bug.  Will fix.
+
+- Steve
+
+see
+1741380954-341079-1-git-send-email-steven.sistare@oracle.com
+/">https://lore.kernel.org/qemu-devel/
+1741380954-341079-1-git-send-email-steven.sistare@oracle.com
+/
+- Steve
+
+On 3/6/2025 11:13 AM, Steven Sistare wrote:
+On 3/6/2025 10:52 AM, Denis V. Lunev wrote:
+On 3/6/25 16:16, Andrey Drobyshev wrote:
+On 3/5/25 11:19 PM, Steven Sistare wrote:
+On 3/5/2025 11:50 AM, Andrey Drobyshev wrote:
+On 3/4/25 9:05 PM, Steven Sistare wrote:
+On 2/28/2025 1:37 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:35 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:20 PM, Steven Sistare wrote:
+On 2/28/2025 1:13 PM, Steven Sistare wrote:
+On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
+Hi all,
+
+We've been experimenting with cpr-transfer migration mode recently
+and
+have discovered the following issue with the guest QXL driver:
+
+Run migration source:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$EMULATOR -enable-kvm \
+Â Â Â Â Â Â Â  -machine q35 \
+Â Â Â Â Â Â Â  -cpu host -smp 2 -m 2G \
+Â Â Â Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/
+dev/shm/
+ram0,share=on\
+Â Â Â Â Â Â Â  -machine memory-backend=ram0 \
+Â Â Â Â Â Â Â  -machine aux-ram-share=on \
+Â Â Â Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+Â Â Â Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+Â Â Â Â Â Â Â  -nographic \
+Â Â Â Â Â Â Â  -device qxl-vga
+Run migration target:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-dst.sock
+$EMULATOR -enable-kvm \
+Â Â Â Â Â Â Â  -machine q35 \
+Â Â Â Â Â Â Â  -cpu host -smp 2 -m 2G \
+Â Â Â Â Â Â Â  -object memory-backend-file,id=ram0,size=2G,mem-path=/
+dev/shm/
+ram0,share=on\
+Â Â Â Â Â Â Â  -machine memory-backend=ram0 \
+Â Â Â Â Â Â Â  -machine aux-ram-share=on \
+Â Â Â Â Â Â Â  -drive file=$ROOTFS,media=disk,if=virtio \
+Â Â Â Â Â Â Â  -qmp unix:$QMPSOCK,server=on,wait=off \
+Â Â Â Â Â Â Â  -nographic \
+Â Â Â Â Â Â Â  -device qxl-vga \
+Â Â Â Â Â Â Â  -incoming tcp:0:44444 \
+Â Â Â Â Â Â Â  -incoming '{"channel-type": "cpr", "addr": { "transport":
+"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
+Launch the migration:
+QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$QMPSHELL -p $QMPSOCK <<EOF
+Â Â Â Â Â Â Â  migrate-set-parameters mode=cpr-transfer
+Â Â Â Â Â Â Â  migrate channels=[{"channel-type":"main","addr":
+{"transport":"socket","type":"inet","host":"0","port":"44444"}},
+{"channel-type":"cpr","addr":
+{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-
+dst.sock"}}]
+EOF
+Then, after a while, QXL guest driver on target crashes spewing the
+following messages:
+[Â Â  73.962002] [TTM] Buffer eviction failed
+[Â Â  73.962072] qxl 0000:00:02.0: object_init failed for (3149824,
+0x00000001)
+[Â Â  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to
+allocate VRAM BO
+That seems to be a known kernel QXL driver bug:
+https://lore.kernel.org/all/20220907094423.93581-1-
+min_halo@163.com/T/
+https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
+(the latter discussion contains that reproduce script which
+speeds up
+the crash in the guest):
+#!/bin/bash
+
+chvt 3
+
+for j in $(seq 80); do
+Â Â Â Â Â Â Â Â Â Â Â  echo "$(date) starting round $j"
+Â Â Â Â Â Â Â Â Â Â Â  if [ "$(journalctl --boot | grep "failed to allocate
+VRAM
+BO")" != "" ]; then
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  echo "bug was reproduced after $j tries"
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  exit 1
+Â Â Â Â Â Â Â Â Â Â Â  fi
+Â Â Â Â Â Â Â Â Â Â Â  for i in $(seq 100); do
+Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â  dmesg > /dev/tty3
+Â Â Â Â Â Â Â Â Â Â Â  done
+done
+
+echo "bug could not be reproduced"
+exit 0
+The bug itself seems to remain unfixed, as I was able to reproduce
+that
+with Fedora 41 guest, as well as AlmaLinux 8 guest. However our
+cpr-transfer code also seems to be buggy as it triggers the crash -
+without the cpr-transfer migration the above reproduce doesn't
+lead to
+crash on the source VM.
+
+I suspect that, as cpr-transfer doesn't migrate the guest
+memory, but
+rather passes it through the memory backend object, our code might
+somehow corrupt the VRAM.Â  However, I wasn't able to trace the
+corruption so far.
+
+Could somebody help the investigation and take a look into
+this?Â  Any
+suggestions would be appreciated.Â  Thanks!
+Possibly some memory region created by qxl is not being preserved.
+Try adding these traces to see what is preserved:
+
+-trace enable='*cpr*'
+-trace enable='*ram_alloc*'
+Also try adding this patch to see if it flags any ram blocks as not
+compatible with cpr.Â  A message is printed at migration start time.
+Â Â Â Â
+https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-
+email-
+steven.sistare@oracle.com/
+
+- Steve
+With the traces enabled + the "migration: ram block cpr blockers"
+patch
+applied:
+
+Source:
+cpr_find_fd pc.bios, id 0 returns -1
+cpr_save_fd pc.bios, id 0, fd 22
+qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host
+0x7fec18e00000
+cpr_find_fd pc.rom, id 0 returns -1
+cpr_save_fd pc.rom, id 0, fd 23
+qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host
+0x7fec18c00000
+cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1
+cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24
+qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size
+262144 fd 24 host 0x7fec18a00000
+cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1
+cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25
+qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size
+67108864 fd 25 host 0x7feb77e00000
+cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192
+fd 27 host 0x7fec18800000
+cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size
+67108864 fd 28 host 0x7feb73c00000
+cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1
+cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34
+qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536
+fd 34 host 0x7fec18600000
+cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1
+cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35
+qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size
+2097152 fd 35 host 0x7fec18200000
+cpr_find_fd /rom@etc/table-loader, id 0 returns -1
+cpr_save_fd /rom@etc/table-loader, id 0, fd 36
+qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536
+fd 36 host 0x7feb8b600000
+cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1
+cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37
+qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd
+37 host 0x7feb8b400000
+
+cpr_state_save cpr-transfer mode
+cpr_transfer_output /var/run/alma8cpr-dst.sock
+Target:
+cpr_transfer_input /var/run/alma8cpr-dst.sock
+cpr_state_load cpr-transfer mode
+cpr_find_fd pc.bios, id 0 returns 20
+qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host
+0x7fcdc9800000
+cpr_find_fd pc.rom, id 0 returns 19
+qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host
+0x7fcdc9600000
+cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18
+qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size
+262144 fd 18 host 0x7fcdc9400000
+cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17
+qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size
+67108864 fd 17 host 0x7fcd27e00000
+cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192
+fd 16 host 0x7fcdc9200000
+cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size
+67108864 fd 15 host 0x7fcd23c00000
+cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14
+qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536
+fd 14 host 0x7fcdc8800000
+cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13
+qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size
+2097152 fd 13 host 0x7fcdc8400000
+cpr_find_fd /rom@etc/table-loader, id 0 returns 11
+qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536
+fd 11 host 0x7fcdc8200000
+cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10
+qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd
+10 host 0x7fcd3be00000
+Looks like both vga.vram and qxl.vram are being preserved (with the
+same
+addresses), and no incompatible ram blocks are found during migration.
+Sorry, addressed are not the same, of course.Â  However corresponding
+ram
+blocks do seem to be preserved and initialized.
+So far, I have not reproduced the guest driver failure.
+
+However, I have isolated places where new QEMU improperly writes to
+the qxl memory regions prior to starting the guest, by mmap'ing them
+readonly after cpr:
+
+Â Â Â  qemu_ram_alloc_internal()
+Â Â Â Â Â  if (reused && (strstr(name, "qxl") || strstr("name", "vga")))
+Â Â Â Â Â Â Â Â Â  ram_flags |= RAM_READONLY;
+Â Â Â Â Â  new_block = qemu_ram_alloc_from_fd(...)
+
+I have attached a draft fix; try it and let me know.
+My console window looks fine before and after cpr, using
+-vnc $hostip:0 -vga qxl
+
+- Steve
+Regarding the reproduce: when I launch the buggy version with the same
+options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer,
+my VNC client silently hangs on the target after a while.Â  Could it
+happen on your stand as well?
+cpr does not preserve the vnc connection and session.Â  To test, I specify
+port 0 for the source VM and port 1 for the dest.Â  When the src vnc goes
+dormant the dest vnc becomes active.
+Sure, I meant that VNC on the dest (on the port 1) works for a while
+after the migration and then hangs, apparently after the guest QXL crash.
+Could you try launching VM with
+"-nographic -device qxl-vga"?Â  That way VM's serial console is given you
+directly in the shell, so when qxl driver crashes you're still able to
+inspect the kernel messages.
+I have been running like that, but have not reproduced the qxl driver
+crash,
+and I suspect my guest image+kernel is too old.
+Yes, that's probably the case.Â  But the crash occurs on my Fedora 41
+guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to
+be buggy.
+However, once I realized the
+issue was post-cpr modification of qxl memory, I switched my attention
+to the
+fix.
+As for your patch, I can report that it doesn't resolve the issue as it
+is.Â  But I was able to track down another possible memory corruption
+using your approach with readonly mmap'ing:
+Program terminated with signal SIGSEGV, Segmentation fault.
+#0Â  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
+412Â Â Â Â Â Â Â Â  d->ram->magicÂ Â Â Â Â Â  = cpu_to_le32(QXL_RAM_MAGIC);
+[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))]
+(gdb) bt
+#0Â  init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412
+#1Â  0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70,
+errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142
+#2Â  0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70,
+errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257
+#3Â  0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70,
+errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174
+#4Â  0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70,
+value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494
+#5Â  0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70,
+v=0x5638996f3770, name=0x56389759b141 "realized",
+opaque=0x5638987893d0, errp=0x7ffd3c2b84e0)
+Â Â Â Â Â  at ../qom/object.c:2374
+#6Â  0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70,
+name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0)
+Â Â Â Â Â  at ../qom/object.c:1449
+#7Â  0x00005638970f8586 in object_property_set_qobject
+(obj=0x5638996e0e70, name=0x56389759b141 "realized",
+value=0x5638996df900, errp=0x7ffd3c2b84e0)
+Â Â Â Â Â  at ../qom/qom-qobject.c:28
+#8Â  0x00005638970f3d8d in object_property_set_bool
+(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true,
+errp=0x7ffd3c2b84e0)
+Â Â Â Â Â  at ../qom/object.c:1519
+#9Â  0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70,
+bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276
+#10 0x0000563896dba675 in qdev_device_add_from_qdict
+(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../
+system/qdev-monitor.c:714
+#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150,
+errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733
+#12 0x0000563896dc48f1 in device_init_func (opaque=0x0,
+opts=0x563898786150, errp=0x56389855dc40 <error_fatal>) at ../system/
+vl.c:1207
+#13 0x000056389737a6cc in qemu_opts_foreach
+Â Â Â Â Â  (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca
+<device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>)
+Â Â Â Â Â  at ../util/qemu-option.c:1135
+#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/
+vl.c:2745
+#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40
+<error_fatal>) at ../system/vl.c:2806
+#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948)
+at ../system/vl.c:3838
+#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../
+system/main.c:72
+So the attached adjusted version of your patch does seem to help.Â  At
+least I can't reproduce the crash on my stand.
+Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram
+are
+definitely harmful.Â  Try V2 of the patch, attached, which skips the lines
+of init_qxl_ram that modify guest memory.
+Thanks, your v2 patch does seem to prevent the crash.Â  Would you re-send
+it to the list as a proper fix?
+Yes.Â  Was waiting for your confirmation.
+I'm wondering, could it be useful to explicitly mark all the reused
+memory regions readonly upon cpr-transfer, and then make them writable
+back again after the migration is done?Â  That way we will be segfaulting
+early on instead of debugging tricky memory corruptions.
+It's a useful debugging technique, but changing protection on a large
+memory region
+can be too expensive for production due to TLB shootdowns.
+
+Also, there are cases where writes are performed but the value is
+guaranteed to
+be the same:
+Â Â  qxl_post_load()
+Â Â Â Â  qxl_set_mode()
+Â Â Â Â Â Â  d->rom->mode = cpu_to_le32(modenr);
+The value is the same because mode and shadow_rom.mode were passed in
+vmstate
+from old qemu.
+There're also cases where devices' ROM might be re-initialized.Â  E.g.
+this segfault occures upon further exploration of RO mapped RAM blocks:
+Program terminated with signal SIGSEGV, Segmentation fault.
+#0Â  __memmove_avx_unaligned_erms () at 
+../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664
+664Â Â Â Â Â Â Â Â Â Â Â Â  repÂ Â Â Â  movsb
+[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))]
+(gdb) bt
+#0Â  __memmove_avx_unaligned_erms () at 
+../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664
+#1Â  0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10, 
+name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true)
+Â Â Â Â  at ../hw/core/loader.c:1032
+#2Â  0x000055aa1d031577 in rom_add_blob
+Â Â Â Â  (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152, 
+addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables", 
+fw_callback=0x55aa1d441f59 <acpi_build_update>, callback_opaque=0x55aa20ff0010, as=0x0, 
+read_only=true) at ../hw/core/loader.c:1147
+#3Â  0x000055aa1cfd788d in acpi_add_rom_blob
+Â Â Â Â  (update=0x55aa1d441f59 <acpi_build_update>, opaque=0x55aa20ff0010, 
+blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46
+#4Â  0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720
+#5Â  0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0) 
+at ../hw/i386/pc.c:638
+#6Â  0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 
+<machine_init_done_notifiers>, data=0x0) at ../util/notify.c:39
+#7Â  0x000055aa1d039ee5 in qdev_machine_creation_done () at 
+../hw/core/machine.c:1749
+#8Â  0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 
+<error_fatal>) at ../system/vl.c:2779
+#9Â  0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 
+<error_fatal>) at ../system/vl.c:2807
+#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at 
+../system/vl.c:3838
+#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at 
+../system/main.c:72
+I'm not sure whether ACPI tables ROM in particular is rewritten with the
+same content, but there might be cases where ROM can be read from file
+system upon initialization.Â  That is undesirable as guest kernel
+certainly won't be too happy about sudden change of the device's ROM
+content.
+
+So the issue we're dealing with here is any unwanted memory related
+device initialization upon cpr.
+
+For now the only thing that comes to my mind is to make a test where we
+put as many devices as we can into a VM, make ram blocks RO upon cpr
+(and remap them as RW later after migration is done, if needed), and
+catch any unwanted memory violations.Â  As Den suggested, we might
+consider adding that behaviour as a separate non-default option (or
+"migrate" command flag specific to cpr-transfer), which would only be
+used in the testing.
+I'll look into adding an option, but there may be too many false positives,
+such as the qxl_set_mode case above.Â  And the maintainers may object to me
+eliminating the false positives by adding more CPR_IN tests, due to gratuitous
+(from their POV) ugliness.
+
+But I will use the technique to look for more write violations.
+Andrey
+No way. ACPI with the source must be used in the same way as BIOSes
+and optional ROMs.
+Yup, its a bug.Â  Will fix.
+
+- Steve
+
diff --git a/classification_output/05/device/42226390 b/classification_output/05/device/42226390
new file mode 100644
index 000000000..685aeaae5
--- /dev/null
+++ b/classification_output/05/device/42226390
@@ -0,0 +1,195 @@
+device: 0.951
+boot: 0.943
+graphic: 0.942
+instruction: 0.925
+semantic: 0.924
+assembly: 0.919
+KVM: 0.905
+network: 0.894
+other: 0.894
+socket: 0.882
+vnc: 0.853
+mistranslation: 0.826
+
+[BUG] AArch64 boot hang with -icount and -smp >1 (iothread locking issue?)
+
+Hello,
+
+I am encountering one or more bugs when using -icount and -smp >1 that I am
+attempting to sort out. My current theory is that it is an iothread locking
+issue.
+
+I am using a command-line like the following where $kernel is a recent upstream
+AArch64 Linux kernel Image (I can provide a binary if that would be helpful -
+let me know how is best to post):
+
+        qemu-system-aarch64 \
+                -M virt -cpu cortex-a57 -m 1G \
+                -nographic \
+                -smp 2 \
+                -icount 0 \
+                -kernel $kernel
+
+For any/all of the symptoms described below, they seem to disappear when I
+either remove `-icount 0` or change smp to `-smp 1`. In other words, it is the
+combination of `-smp >1` and `-icount` which triggers what I'm seeing.
+
+I am seeing two different (but seemingly related) behaviors. The first (and
+what I originally started debugging) shows up as a boot hang. When booting
+using the above command after Peter's "icount: Take iothread lock when running
+QEMU timers" patch [1], The kernel boots for a while and then hangs after:
+
+>
+...snip...
+>
+[    0.010764] Serial: AMBA PL011 UART driver
+>
+[    0.016334] 9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 13, base_baud
+>
+= 0) is a PL011 rev1
+>
+[    0.016907] printk: console [ttyAMA0] enabled
+>
+[    0.017624] KASLR enabled
+>
+[    0.031986] HugeTLB: registered 16.0 GiB page size, pre-allocated 0 pages
+>
+[    0.031986] HugeTLB: 16320 KiB vmemmap can be freed for a 16.0 GiB page
+>
+[    0.031986] HugeTLB: registered 512 MiB page size, pre-allocated 0 pages
+>
+[    0.031986] HugeTLB: 448 KiB vmemmap can be freed for a 512 MiB page
+>
+[    0.031986] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages
+>
+[    0.031986] HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page
+When it hangs here, I drop into QEMU's console, attach to the gdbserver, and it
+always reports that it is at address 0xffff800008dc42e8 (as shown below from an
+objdump of the vmlinux). I note this is in the middle of messing with timer
+system registers - which makes me suspect we're attempting to take the iothread
+lock when its already held:
+
+>
+ffff800008dc42b8 <arch_timer_set_next_event_virt>:
+>
+ffff800008dc42b8:       d503201f        nop
+>
+ffff800008dc42bc:       d503201f        nop
+>
+ffff800008dc42c0:       d503233f        paciasp
+>
+ffff800008dc42c4:       d53be321        mrs     x1, cntv_ctl_el0
+>
+ffff800008dc42c8:       32000021        orr     w1, w1, #0x1
+>
+ffff800008dc42cc:       d5033fdf        isb
+>
+ffff800008dc42d0:       d53be042        mrs     x2, cntvct_el0
+>
+ffff800008dc42d4:       ca020043        eor     x3, x2, x2
+>
+ffff800008dc42d8:       8b2363e3        add     x3, sp, x3
+>
+ffff800008dc42dc:       f940007f        ldr     xzr, [x3]
+>
+ffff800008dc42e0:       8b020000        add     x0, x0, x2
+>
+ffff800008dc42e4:       d51be340        msr     cntv_cval_el0, x0
+>
+* ffff800008dc42e8:       927ef820        and     x0, x1, #0xfffffffffffffffd
+>
+ffff800008dc42ec:       d51be320        msr     cntv_ctl_el0, x0
+>
+ffff800008dc42f0:       d5033fdf        isb
+>
+ffff800008dc42f4:       52800000        mov     w0, #0x0
+>
+// #0
+>
+ffff800008dc42f8:       d50323bf        autiasp
+>
+ffff800008dc42fc:       d65f03c0        ret
+The second behavior is that prior to Peter's "icount: Take iothread lock when
+running QEMU timers" patch [1], I observe the following message (same command
+as above):
+
+>
+ERROR:../accel/tcg/tcg-accel-ops.c:79:tcg_handle_interrupt: assertion failed:
+>
+(qemu_mutex_iothread_locked())
+>
+Aborted (core dumped)
+This is the same behavior described in Gitlab issue 1130 [0] and addressed by
+[1]. I bisected the appearance of this assertion, and found it was introduced
+by Pavel's "replay: rewrite async event handling" commit [2]. Commits prior to
+that one boot successfully (neither assertions nor hangs) with `-icount 0 -smp
+2`.
+
+I've looked over these two commits ([1], [2]), but it is not obvious to me
+how/why they might be interacting to produce the boot hangs I'm seeing and
+I welcome any help investigating further.
+
+Thanks!
+
+-Aaron Lindsay
+
+[0] -
+https://gitlab.com/qemu-project/qemu/-/issues/1130
+[1] -
+https://gitlab.com/qemu-project/qemu/-/commit/c7f26ded6d5065e4116f630f6a490b55f6c5f58e
+[2] -
+https://gitlab.com/qemu-project/qemu/-/commit/60618e2d77691e44bb78e23b2b0cf07b5c405e56
+
+On Fri, 21 Oct 2022 at 16:48, Aaron Lindsay
+<aaron@os.amperecomputing.com> wrote:
+>
+>
+Hello,
+>
+>
+I am encountering one or more bugs when using -icount and -smp >1 that I am
+>
+attempting to sort out. My current theory is that it is an iothread locking
+>
+issue.
+Weird coincidence, that is a bug that's been in the tree for months
+but was only reported to me earlier this week. Try reverting
+commit a82fd5a4ec24d923ff1e -- that should fix it.
+CAFEAcA_i8x00hD-4XX18ySLNbCB6ds1-DSazVb4yDnF8skjd9A@mail.gmail.com
+/">https://lore.kernel.org/qemu-devel/
+CAFEAcA_i8x00hD-4XX18ySLNbCB6ds1-DSazVb4yDnF8skjd9A@mail.gmail.com
+/
+has the explanation.
+
+thanks
+-- PMM
+
+On Oct 21 17:00, Peter Maydell wrote:
+>
+On Fri, 21 Oct 2022 at 16:48, Aaron Lindsay
+>
+<aaron@os.amperecomputing.com> wrote:
+>
+>
+>
+> Hello,
+>
+>
+>
+> I am encountering one or more bugs when using -icount and -smp >1 that I am
+>
+> attempting to sort out. My current theory is that it is an iothread locking
+>
+> issue.
+>
+>
+Weird coincidence, that is a bug that's been in the tree for months
+>
+but was only reported to me earlier this week. Try reverting
+>
+commit a82fd5a4ec24d923ff1e -- that should fix it.
+I can confirm that reverting a82fd5a4ec24d923ff1e fixes it for me.
+Thanks for the help and fast response!
+
+-Aaron
+
diff --git a/classification_output/05/device/48245039 b/classification_output/05/device/48245039
new file mode 100644
index 000000000..b1a9e6510
--- /dev/null
+++ b/classification_output/05/device/48245039
@@ -0,0 +1,538 @@
+assembly: 0.956
+device: 0.953
+other: 0.953
+instruction: 0.951
+semantic: 0.939
+graphic: 0.935
+socket: 0.932
+boot: 0.932
+vnc: 0.926
+mistranslation: 0.888
+KVM: 0.855
+network: 0.818
+
+[Qemu-devel] [BUG] gcov support appears to be broken
+
+Hello, according to out docs, here is the procedure that should produce 
+coverage report for execution of the complete "make check":
+
+#./configure --enable-gcov
+#make
+#make check
+#make coverage-report
+
+It seems that first three commands execute as expected. (For example, there are 
+plenty of files generated by "make check" that would've not been generated if 
+"enable-gcov" hadn't been chosen.) However, the last command complains about 
+some missing files related to FP support. If those files are added (for 
+example, artificially, using "touch <missing-file"), that it starts complaining 
+about missing some decodetree-generated files. Other kinds of files are 
+involved too.
+
+It would be nice to have coverage support working. Please somebody take a look, 
+or explain if I make a mistake or misunderstood our gcov support.
+
+Yours,
+Aleksandar
+
+On Mon, 5 Aug 2019 at 11:39, Aleksandar Markovic <address@hidden> wrote:
+>
+>
+Hello, according to out docs, here is the procedure that should produce
+>
+coverage report for execution of the complete "make check":
+>
+>
+#./configure --enable-gcov
+>
+#make
+>
+#make check
+>
+#make coverage-report
+>
+>
+It seems that first three commands execute as expected. (For example, there
+>
+are plenty of files generated by "make check" that would've not been
+>
+generated if "enable-gcov" hadn't been chosen.) However, the last command
+>
+complains about some missing files related to FP support. If those files are
+>
+added (for example, artificially, using "touch <missing-file"), that it
+>
+starts complaining about missing some decodetree-generated files. Other kinds
+>
+of files are involved too.
+>
+>
+It would be nice to have coverage support working. Please somebody take a
+>
+look, or explain if I make a mistake or misunderstood our gcov support.
+Cc'ing Alex who's probably the closest we have to a gcov expert.
+
+(make/make check of a --enable-gcov build is in the set of things our
+Travis CI setup runs, so we do defend that part against regressions.)
+
+thanks
+-- PMM
+
+Peter Maydell <address@hidden> writes:
+
+>
+On Mon, 5 Aug 2019 at 11:39, Aleksandar Markovic <address@hidden> wrote:
+>
+>
+>
+> Hello, according to out docs, here is the procedure that should produce
+>
+> coverage report for execution of the complete "make check":
+>
+>
+>
+> #./configure --enable-gcov
+>
+> #make
+>
+> #make check
+>
+> #make coverage-report
+>
+>
+>
+> It seems that first three commands execute as expected. (For example,
+>
+> there are plenty of files generated by "make check" that would've not
+>
+> been generated if "enable-gcov" hadn't been chosen.) However, the
+>
+> last command complains about some missing files related to FP
+>
+> support. If those files are added (for example, artificially, using
+>
+> "touch <missing-file"), that it starts complaining about missing some
+>
+> decodetree-generated files. Other kinds of files are involved too.
+The gcov tool is fairly noisy about missing files but that just
+indicates the tests haven't exercised those code paths. "make check"
+especially doesn't touch much of the TCG code and a chunk of floating
+point.
+
+>
+>
+>
+> It would be nice to have coverage support working. Please somebody
+>
+> take a look, or explain if I make a mistake or misunderstood our gcov
+>
+> support.
+So your failure mode is no report is generated at all? It's working for
+me here.
+
+>
+>
+Cc'ing Alex who's probably the closest we have to a gcov expert.
+>
+>
+(make/make check of a --enable-gcov build is in the set of things our
+>
+Travis CI setup runs, so we do defend that part against regressions.)
+We defend the build but I have just checked and it seems our
+check_coverage script is currently failing:
+https://travis-ci.org/stsquad/qemu/jobs/567809808#L10328
+But as it's an after_success script it doesn't fail the build.
+
+>
+>
+thanks
+>
+-- PMM
+--
+Alex BennÃ©e
+
+>
+> #./configure --enable-gcov
+>
+> #make
+>
+> #make check
+>
+> #make coverage-report
+>
+>
+>
+> It seems that first three commands execute as expected. (For example,
+>
+> there are plenty of files generated by "make check" that would've not
+>
+> been generated if "enable-gcov" hadn't been chosen.) However, the
+>
+> last command complains about some missing files related to FP
+>
+So your failure mode is no report is generated at all? It's working for
+>
+me here.
+Alex, no report is generated for my test setups - in fact, "make 
+coverage-report" even says that it explicitly deletes what appears to be the 
+main coverage report html file).
+
+This is the terminal output of an unsuccessful executions of "make 
+coverage-report" for recent ToT:
+
+~/Build/qemu-TOT-TEST$ make coverage-report
+make[1]: Entering directory '/home/user/Build/qemu-TOT-TEST/slirp'
+make[1]: Nothing to be done for 'all'.
+make[1]: Leaving directory '/home/user/Build/qemu-TOT-TEST/slirp'
+        CHK version_gen.h
+  GEN     coverage-report.html
+Traceback (most recent call last):
+  File "/usr/bin/gcovr", line 1970, in <module>
+    print_html_report(covdata, options.html_details)
+  File "/usr/bin/gcovr", line 1473, in print_html_report
+    INPUT = open(data['FILENAME'], 'r')
+IOError: [Errno 2] No such file or directory: 'wrap.inc.c'
+Makefile:1048: recipe for target 
+'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' failed
+make: *** 
+[/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html] Error 1
+make: *** Deleting file 
+'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html'
+
+This instance is executed in QEMU 3.0 source tree: (so, it looks the problem 
+existed for quite some time)
+
+~/Build/qemu-3.0$ make coverage-report
+        CHK version_gen.h
+  GEN     coverage-report.html
+Traceback (most recent call last):
+  File "/usr/bin/gcovr", line 1970, in <module>
+    print_html_report(covdata, options.html_details)
+  File "/usr/bin/gcovr", line 1473, in print_html_report
+    INPUT = open(data['FILENAME'], 'r')
+IOError: [Errno 2] No such file or directory: 
+'/home/user/Build/qemu-3.0/target/openrisc/decode.inc.c'
+Makefile:992: recipe for target 
+'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' failed
+make: *** [/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html] 
+Error 1
+make: *** Deleting file 
+'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html'
+
+Fond regards,
+Aleksandar
+
+
+>
+Alex BennÃ©e
+
+>
+> #./configure --enable-gcov
+>
+> #make
+>
+> #make check
+>
+> #make coverage-report
+>
+>
+>
+> It seems that first three commands execute as expected. (For example,
+>
+> there are plenty of files generated by "make check" that would've not
+>
+> been generated if "enable-gcov" hadn't been chosen.) However, the
+>
+> last command complains about some missing files related to FP
+>
+So your failure mode is no report is generated at all? It's working for
+>
+me here.
+Another piece of info:
+
+~/Build/qemu-TOT-TEST$ gcov --version
+gcov (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010
+Copyright (C) 2015 Free Software Foundation, Inc.
+This is free software; see the source for copying conditions.
+There is NO warranty; not even for MERCHANTABILITY or 
+FITNESS FOR A PARTICULAR PURPOSE.
+
+:~/Build/qemu-TOT-TEST$ gcc --version
+gcc (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0
+Copyright (C) 2017 Free Software Foundation, Inc.
+This is free software; see the source for copying conditions.  There is NO
+warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+
+
+
+
+Alex, no report is generated for my test setups - in fact, "make 
+coverage-report" even says that it explicitly deletes what appears to be the 
+main coverage report html file).
+
+This is the terminal output of an unsuccessful executions of "make 
+coverage-report" for recent ToT:
+
+~/Build/qemu-TOT-TEST$ make coverage-report
+make[1]: Entering directory '/home/user/Build/qemu-TOT-TEST/slirp'
+make[1]: Nothing to be done for 'all'.
+make[1]: Leaving directory '/home/user/Build/qemu-TOT-TEST/slirp'
+        CHK version_gen.h
+  GEN     coverage-report.html
+Traceback (most recent call last):
+  File "/usr/bin/gcovr", line 1970, in <module>
+    print_html_report(covdata, options.html_details)
+  File "/usr/bin/gcovr", line 1473, in print_html_report
+    INPUT = open(data['FILENAME'], 'r')
+IOError: [Errno 2] No such file or directory: 'wrap.inc.c'
+Makefile:1048: recipe for target 
+'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' failed
+make: *** 
+[/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html] Error 1
+make: *** Deleting file 
+'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html'
+
+This instance is executed in QEMU 3.0 source tree: (so, it looks the problem 
+existed for quite some time)
+
+~/Build/qemu-3.0$ make coverage-report
+        CHK version_gen.h
+  GEN     coverage-report.html
+Traceback (most recent call last):
+  File "/usr/bin/gcovr", line 1970, in <module>
+    print_html_report(covdata, options.html_details)
+  File "/usr/bin/gcovr", line 1473, in print_html_report
+    INPUT = open(data['FILENAME'], 'r')
+IOError: [Errno 2] No such file or directory: 
+'/home/user/Build/qemu-3.0/target/openrisc/decode.inc.c'
+Makefile:992: recipe for target 
+'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' failed
+make: *** [/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html] 
+Error 1
+make: *** Deleting file 
+'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html'
+
+Fond regards,
+Aleksandar
+
+
+>
+Alex BennÃ©e
+
+>
+> #./configure --enable-gcov
+>
+> #make
+>
+> #make check
+>
+> #make coverage-report
+>
+>
+>
+> It seems that first three commands execute as expected. (For example,
+>
+> there are plenty of files generated by "make check" that would've not
+>
+> been generated if "enable-gcov" hadn't been chosen.) However, the
+>
+> last command complains about some missing files related to FP
+>
+So your failure mode is no report is generated at all? It's working for
+>
+me here.
+Alex, here is the thing:
+
+Seeing that my gcovr is relatively old (2014) 3.2 version, I upgraded it from 
+git repo to the most recent 4.1 (actually, to a dev version, from the very tip 
+of the tree), and "make coverage-report" started generating coverage reports. 
+It did emit some error messages (totally different than previous), but still it 
+did not stop like it used to do with gcovr 3.2.
+
+Perhaps you would want to add some gcov/gcovr minimal version info in our docs. 
+(or at least a statement "this was tested with such and such gcc, gcov and 
+gcovr", etc.?)
+
+Coverage report looked fine at first glance, but it a kind of disappointed me 
+when I digged deeper into its content - for example, it shows very low coverage 
+for our FP code (softfloat), while, in fact, we know that "make check" contains 
+detailed tests on FP functionalities. But this is most likely a separate 
+problem of a very different nature, perhaps the issue of separate git repo for 
+FP tests (testfloat) that our FP tests use as a mid-layer.
+
+I'll try how everything works with my test examples, and will let you know.
+
+Your help is greatly appreciated,
+Aleksandar
+
+Fond regards,
+Aleksandar
+
+
+>
+Alex BennÃ©e
+
+Aleksandar Markovic <address@hidden> writes:
+
+>
+>> #./configure --enable-gcov
+>
+>> #make
+>
+>> #make check
+>
+>> #make coverage-report
+>
+>>
+>
+>> It seems that first three commands execute as expected. (For example,
+>
+>> there are plenty of files generated by "make check" that would've not
+>
+>> been generated if "enable-gcov" hadn't been chosen.) However, the
+>
+>> last command complains about some missing files related to FP
+>
+>
+> So your failure mode is no report is generated at all? It's working for
+>
+> me here.
+>
+>
+Alex, here is the thing:
+>
+>
+Seeing that my gcovr is relatively old (2014) 3.2 version, I upgraded it from
+>
+git repo to the most recent 4.1 (actually, to a dev version, from the very
+>
+tip of the tree), and "make coverage-report" started generating coverage
+>
+reports. It did emit some error messages (totally different than previous),
+>
+but still it did not stop like it used to do with gcovr 3.2.
+>
+>
+Perhaps you would want to add some gcov/gcovr minimal version info in our
+>
+docs. (or at least a statement "this was tested with such and such gcc, gcov
+>
+and gcovr", etc.?)
+>
+>
+Coverage report looked fine at first glance, but it a kind of
+>
+disappointed me when I digged deeper into its content - for example,
+>
+it shows very low coverage for our FP code (softfloat), while, in
+>
+fact, we know that "make check" contains detailed tests on FP
+>
+functionalities. But this is most likely a separate problem of a very
+>
+different nature, perhaps the issue of separate git repo for FP tests
+>
+(testfloat) that our FP tests use as a mid-layer.
+I get:
+
+68.6 %  2593 / 3782     62.2 %  1690 / 2718
+
+Which is not bad considering we don't exercise the 80 and 128 bit
+softfloat code at all (which is not shared by the re-factored 16/32/64
+bit code).
+
+>
+>
+I'll try how everything works with my test examples, and will let you know.
+>
+>
+Your help is greatly appreciated,
+>
+Aleksandar
+>
+>
+Fond regards,
+>
+Aleksandar
+>
+>
+>
+> Alex BennÃ©e
+--
+Alex BennÃ©e
+
+>
+> it shows very low coverage for our FP code (softfloat), while, in
+>
+> fact, we know that "make check" contains detailed tests on FP
+>
+> functionalities. But this is most likely a separate problem of a very
+>
+> different nature, perhaps the issue of separate git repo for FP tests
+>
+> (testfloat) that our FP tests use as a mid-layer.
+>
+>
+I get:
+>
+>
+68.6 %  2593 / 3782     62.2 %  1690 / 2718
+>
+I would expect that kind of result too.
+
+However, I get:
+
+File:   fpu/softfloat.c                 Lines:  8       3334    0.2 %
+Date:   2019-08-05 19:56:58             Branches:       3       2376    0.1 %
+
+:(
+
+OK, I'll try to figure that out, and most likely I could live with it if it is 
+an isolated problem.
+
+Thank you for your assistance in this matter,
+Aleksandar
+
+>
+Which is not bad considering we don't exercise the 80 and 128 bit
+>
+softfloat code at all (which is not shared by the re-factored 16/32/64
+>
+bit code).
+>
+>
+Alex BennÃ©e
+
+>
+> it shows very low coverage for our FP code (softfloat), while, in
+>
+> fact, we know that "make check" contains detailed tests on FP
+>
+> functionalities. But this is most likely a separate problem of a very
+>
+> different nature, perhaps the issue of separate git repo for FP tests
+>
+> (testfloat) that our FP tests use as a mid-layer.
+>
+>
+I get:
+>
+>
+68.6 %  2593 / 3782     62.2 %  1690 / 2718
+>
+This problem is solved too. (and it is my fault)
+
+I worked with multiple versions of QEMU, and my previous low-coverage results 
+were for QEMU 3.0, and for that version the directory tests/fp did not even 
+exist. :D (<blush>)
+
+For QEMU ToT, I get now:
+
+fpu/softfloat.c         
+        68.8 %  2592 / 3770     62.3 %  1693 / 2718
+
+which is identical for all intents and purposes to your result.
+
+Yours cordially,
+Aleksandar
+
diff --git a/classification_output/05/device/57195159 b/classification_output/05/device/57195159
new file mode 100644
index 000000000..bfe64b270
--- /dev/null
+++ b/classification_output/05/device/57195159
@@ -0,0 +1,323 @@
+device: 0.877
+other: 0.868
+graphic: 0.861
+instruction: 0.833
+semantic: 0.794
+assembly: 0.782
+boot: 0.781
+KVM: 0.752
+socket: 0.750
+network: 0.687
+mistranslation: 0.665
+vnc: 0.626
+
+[BUG Report] Got a use-after-free error while start arm64 VM with lots of pci controllers
+
+Hi,
+
+We got a use-after-free report in our Euler Robot Test, it is can be reproduced 
+quite easily,
+It can be reproduced by start VM with lots of pci controller and virtio-scsi 
+devices.
+You can find the full qemu log from attachment.
+We have analyzed the log and got the rough process how it happened, but don't 
+know how to fix it.
+
+Could anyone help to fix it ?
+
+The key message shows bellow:
+har device redirected to /dev/pts/1 (label charserial0)
+==1517174==WARNING: ASan doesn't fully support makecontext/swapcontext 
+functions and may produce false positives in some cases!
+=================================================================
+==1517174==ERROR: AddressSanitizer: heap-use-after-free on address 
+0xfffc31a002a0 at pc 0xaaad73e1f668 bp 0xfffc319fddb0 sp 0xfffc319fddd0
+READ of size 8 at 0xfffc31a002a0 thread T1
+    #0 0xaaad73e1f667 in memory_region_unref /home/qemu/memory.c:1771
+    #1 0xaaad73e1f667 in flatview_destroy /home/qemu/memory.c:291
+    #2 0xaaad74adc85b in call_rcu_thread util/rcu.c:283
+    #3 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519
+    #4 0xfffc3a1678bb  (/lib64/libpthread.so.0+0x78bb)
+    #5 0xfffc3a0a616b  (/lib64/libc.so.6+0xd616b)
+
+0xfffc31a002a0 is located 544 bytes inside of 1440-byte region 
+[0xfffc31a00080,0xfffc31a00620)
+freed by thread T37 (CPU 0/KVM) here:
+    #0 0xfffc3c102e23 in free (/lib64/libasan.so.4+0xd2e23)
+    #1 0xfffc3bbc729f in g_free (/lib64/libglib-2.0.so.0+0x5729f)
+    #2 0xaaad745cce03 in pci_bridge_update_mappings hw/pci/pci_bridge.c:245
+    #3 0xaaad745ccf33 in pci_bridge_write_config hw/pci/pci_bridge.c:271
+    #4 0xaaad745ba867 in pci_bridge_dev_write_config 
+hw/pci-bridge/pci_bridge_dev.c:153
+    #5 0xaaad745d6013 in pci_host_config_write_common hw/pci/pci_host.c:81
+    #6 0xaaad73e2346f in memory_region_write_accessor /home/qemu/memory.c:483
+    #7 0xaaad73e1d9ff in access_with_adjusted_size /home/qemu/memory.c:544
+    #8 0xaaad73e28d1f in memory_region_dispatch_write /home/qemu/memory.c:1482
+    #9 0xaaad73d7274f in flatview_write_continue /home/qemu/exec.c:3167
+    #10 0xaaad73d72a53 in flatview_write /home/qemu/exec.c:3207
+    #11 0xaaad73d7c8c3 in address_space_write /home/qemu/exec.c:3297
+    #12 0xaaad73e5059b in kvm_cpu_exec /home/qemu/accel/kvm/kvm-all.c:2386
+    #13 0xaaad73e07ac7 in qemu_kvm_cpu_thread_fn /home/qemu/cpus.c:1246
+    #14 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519
+    #15 0xfffc3a1678bb  (/lib64/libpthread.so.0+0x78bb)
+    #16 0xfffc3a0a616b  (/lib64/libc.so.6+0xd616b)
+
+previously allocated by thread T0 here:
+    #0 0xfffc3c1031cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb)
+    #1 0xfffc3bbc7163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163)
+    #2 0xaaad745ccb57 in pci_bridge_region_init hw/pci/pci_bridge.c:188
+    #3 0xaaad745cd8cb in pci_bridge_initfn hw/pci/pci_bridge.c:385
+    #4 0xaaad745baaf3 in pci_bridge_dev_realize 
+hw/pci-bridge/pci_bridge_dev.c:64
+    #5 0xaaad745cacd7 in pci_qdev_realize hw/pci/pci.c:2095
+    #6 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865
+    #7 0xaaad7485ed23 in property_set_bool qom/object.c:2102
+    #8 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26
+    #9 0xaaad74863a43 in object_property_set_bool qom/object.c:1360
+    #10 0xaaad742a53b7 in qdev_device_add /home/qemu/qdev-monitor.c:675
+    #11 0xaaad742a9c7b in device_init_func /home/qemu/vl.c:2074
+    #12 0xaaad74ad4d33 in qemu_opts_foreach util/qemu-option.c:1170
+    #13 0xaaad73d60c17 in main /home/qemu/vl.c:4313
+    #14 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f)
+    #15 0xaaad73d6db33  
+(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33)
+
+Thread T1 created by T0 here:
+    #0 0xfffc3c068f6f in __interceptor_pthread_create 
+(/lib64/libasan.so.4+0x38f6f)
+    #1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556
+    #2 0xaaad74adc6a7 in rcu_init_complete util/rcu.c:326
+    #3 0xaaad74bab2a7 in __libc_csu_init 
+(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x17cb2a7)
+    #4 0xfffc39ff0b47 in __libc_start_main (/lib64/libc.so.6+0x20b47)
+    #5 0xaaad73d6db33  (/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33)
+
+Thread T37 (CPU 0/KVM) created by T0 here:
+    #0 0xfffc3c068f6f in __interceptor_pthread_create 
+(/lib64/libasan.so.4+0x38f6f)
+    #1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556
+    #2 0xaaad73e09b0f in qemu_dummy_start_vcpu /home/qemu/cpus.c:2045
+    #3 0xaaad73e09b0f in qemu_init_vcpu /home/qemu/cpus.c:2077
+    #4 0xaaad740d36b7 in arm_cpu_realizefn /home/qemu/target/arm/cpu.c:1712
+    #5 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865
+    #6 0xaaad7485ed23 in property_set_bool qom/object.c:2102
+    #7 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26
+    #8 0xaaad74863a43 in object_property_set_bool qom/object.c:1360
+    #9 0xaaad73fe3e67 in machvirt_init /home/qemu/hw/arm/virt.c:1682
+    #10 0xaaad743acfc7 in machine_run_board_init hw/core/machine.c:1077
+    #11 0xaaad73d60b73 in main /home/qemu/vl.c:4292
+    #12 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f)
+    #13 0xaaad73d6db33  
+(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33)
+
+SUMMARY: AddressSanitizer: heap-use-after-free /home/qemu/memory.c:1771 in 
+memory_region_unref
+
+Thanks
+use-after-free-qemu.log
+Description:
+Text document
+
+Cc: address@hidden
+
+On 1/17/2020 4:18 PM, Pan Nengyuan wrote:
+>
+Hi,
+>
+>
+We got a use-after-free report in our Euler Robot Test, it is can be
+>
+reproduced quite easily,
+>
+It can be reproduced by start VM with lots of pci controller and virtio-scsi
+>
+devices.
+>
+You can find the full qemu log from attachment.
+>
+We have analyzed the log and got the rough process how it happened, but don't
+>
+know how to fix it.
+>
+>
+Could anyone help to fix it ?
+>
+>
+The key message shows bellow:
+>
+har device redirected to /dev/pts/1 (label charserial0)
+>
+==1517174==WARNING: ASan doesn't fully support makecontext/swapcontext
+>
+functions and may produce false positives in some cases!
+>
+=================================================================
+>
+==1517174==ERROR: AddressSanitizer: heap-use-after-free on address
+>
+0xfffc31a002a0 at pc 0xaaad73e1f668 bp 0xfffc319fddb0 sp 0xfffc319fddd0
+>
+READ of size 8 at 0xfffc31a002a0 thread T1
+>
+#0 0xaaad73e1f667 in memory_region_unref /home/qemu/memory.c:1771
+>
+#1 0xaaad73e1f667 in flatview_destroy /home/qemu/memory.c:291
+>
+#2 0xaaad74adc85b in call_rcu_thread util/rcu.c:283
+>
+#3 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519
+>
+#4 0xfffc3a1678bb  (/lib64/libpthread.so.0+0x78bb)
+>
+#5 0xfffc3a0a616b  (/lib64/libc.so.6+0xd616b)
+>
+>
+0xfffc31a002a0 is located 544 bytes inside of 1440-byte region
+>
+[0xfffc31a00080,0xfffc31a00620)
+>
+freed by thread T37 (CPU 0/KVM) here:
+>
+#0 0xfffc3c102e23 in free (/lib64/libasan.so.4+0xd2e23)
+>
+#1 0xfffc3bbc729f in g_free (/lib64/libglib-2.0.so.0+0x5729f)
+>
+#2 0xaaad745cce03 in pci_bridge_update_mappings hw/pci/pci_bridge.c:245
+>
+#3 0xaaad745ccf33 in pci_bridge_write_config hw/pci/pci_bridge.c:271
+>
+#4 0xaaad745ba867 in pci_bridge_dev_write_config
+>
+hw/pci-bridge/pci_bridge_dev.c:153
+>
+#5 0xaaad745d6013 in pci_host_config_write_common hw/pci/pci_host.c:81
+>
+#6 0xaaad73e2346f in memory_region_write_accessor /home/qemu/memory.c:483
+>
+#7 0xaaad73e1d9ff in access_with_adjusted_size /home/qemu/memory.c:544
+>
+#8 0xaaad73e28d1f in memory_region_dispatch_write /home/qemu/memory.c:1482
+>
+#9 0xaaad73d7274f in flatview_write_continue /home/qemu/exec.c:3167
+>
+#10 0xaaad73d72a53 in flatview_write /home/qemu/exec.c:3207
+>
+#11 0xaaad73d7c8c3 in address_space_write /home/qemu/exec.c:3297
+>
+#12 0xaaad73e5059b in kvm_cpu_exec /home/qemu/accel/kvm/kvm-all.c:2386
+>
+#13 0xaaad73e07ac7 in qemu_kvm_cpu_thread_fn /home/qemu/cpus.c:1246
+>
+#14 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519
+>
+#15 0xfffc3a1678bb  (/lib64/libpthread.so.0+0x78bb)
+>
+#16 0xfffc3a0a616b  (/lib64/libc.so.6+0xd616b)
+>
+>
+previously allocated by thread T0 here:
+>
+#0 0xfffc3c1031cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb)
+>
+#1 0xfffc3bbc7163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163)
+>
+#2 0xaaad745ccb57 in pci_bridge_region_init hw/pci/pci_bridge.c:188
+>
+#3 0xaaad745cd8cb in pci_bridge_initfn hw/pci/pci_bridge.c:385
+>
+#4 0xaaad745baaf3 in pci_bridge_dev_realize
+>
+hw/pci-bridge/pci_bridge_dev.c:64
+>
+#5 0xaaad745cacd7 in pci_qdev_realize hw/pci/pci.c:2095
+>
+#6 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865
+>
+#7 0xaaad7485ed23 in property_set_bool qom/object.c:2102
+>
+#8 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26
+>
+#9 0xaaad74863a43 in object_property_set_bool qom/object.c:1360
+>
+#10 0xaaad742a53b7 in qdev_device_add /home/qemu/qdev-monitor.c:675
+>
+#11 0xaaad742a9c7b in device_init_func /home/qemu/vl.c:2074
+>
+#12 0xaaad74ad4d33 in qemu_opts_foreach util/qemu-option.c:1170
+>
+#13 0xaaad73d60c17 in main /home/qemu/vl.c:4313
+>
+#14 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f)
+>
+#15 0xaaad73d6db33
+>
+(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33)
+>
+>
+Thread T1 created by T0 here:
+>
+#0 0xfffc3c068f6f in __interceptor_pthread_create
+>
+(/lib64/libasan.so.4+0x38f6f)
+>
+#1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556
+>
+#2 0xaaad74adc6a7 in rcu_init_complete util/rcu.c:326
+>
+#3 0xaaad74bab2a7 in __libc_csu_init
+>
+(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x17cb2a7)
+>
+#4 0xfffc39ff0b47 in __libc_start_main (/lib64/libc.so.6+0x20b47)
+>
+#5 0xaaad73d6db33
+>
+(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33)
+>
+>
+Thread T37 (CPU 0/KVM) created by T0 here:
+>
+#0 0xfffc3c068f6f in __interceptor_pthread_create
+>
+(/lib64/libasan.so.4+0x38f6f)
+>
+#1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556
+>
+#2 0xaaad73e09b0f in qemu_dummy_start_vcpu /home/qemu/cpus.c:2045
+>
+#3 0xaaad73e09b0f in qemu_init_vcpu /home/qemu/cpus.c:2077
+>
+#4 0xaaad740d36b7 in arm_cpu_realizefn /home/qemu/target/arm/cpu.c:1712
+>
+#5 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865
+>
+#6 0xaaad7485ed23 in property_set_bool qom/object.c:2102
+>
+#7 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26
+>
+#8 0xaaad74863a43 in object_property_set_bool qom/object.c:1360
+>
+#9 0xaaad73fe3e67 in machvirt_init /home/qemu/hw/arm/virt.c:1682
+>
+#10 0xaaad743acfc7 in machine_run_board_init hw/core/machine.c:1077
+>
+#11 0xaaad73d60b73 in main /home/qemu/vl.c:4292
+>
+#12 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f)
+>
+#13 0xaaad73d6db33
+>
+(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33)
+>
+>
+SUMMARY: AddressSanitizer: heap-use-after-free /home/qemu/memory.c:1771 in
+>
+memory_region_unref
+>
+>
+Thanks
+>
+use-after-free-qemu.log
+Description:
+Text document
+
diff --git a/classification_output/05/device/57231878 b/classification_output/05/device/57231878
new file mode 100644
index 000000000..d02eeac3d
--- /dev/null
+++ b/classification_output/05/device/57231878
@@ -0,0 +1,250 @@
+device: 0.818
+other: 0.788
+semantic: 0.774
+graphic: 0.751
+assembly: 0.745
+mistranslation: 0.719
+KVM: 0.708
+instruction: 0.661
+network: 0.659
+vnc: 0.640
+socket: 0.624
+boot: 0.609
+
+[Qemu-devel] [BUG] qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed.
+
+Hello all,
+I wanted to submit a bug report in the tracker, but it seem to require
+an Ubuntu One account, which I'm having trouble with, so I'll just
+give it here and hopefully somebody can make use of it.  The issue
+seems to be in an experimental format, so it's likely not very
+consequential anyway.
+
+For the sake of anyone else simply googling for a workaround, I'll
+just paste in the (cleaned up) brief IRC conversation about my issue
+from the official channel:
+<quy> I'm using QEMU version 2.12.0 on an x86_64 host (Arch Linux,
+Kernel v4.17.2), and I'm trying to create an x86_64 virtual machine
+(FreeBSD-11.1).  The VM always aborts at the same point in the
+installation (downloading 'ports.tgz') with the following error
+message:
+"qemu-system-x86_64: /build/qemu/src/qemu-2.12.0/block/qed.c:1197:
+qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed.
+zsh: abort (core dumped)  qemu-system-x86_64 -smp 2 -m 4096
+-enable-kvm -hda freebsd/freebsd.qed -devic"
+The commands I ran to create the machine are as follows:
+"qemu-img create -f qed freebsd/freebsd.qed 16G"
+"qemu-system-x86_64 -smp 2 -m 4096 -enable-kvm -hda
+freebsd/freebsd.qed -device e1000,netdev=net0 -netdev user,id=net0
+-cdrom FreeBSD-11.1-RELEASE-amd64-bootonly.iso -boot order=d"
+I tried adding logging options with the -d flag, but I didn't get
+anything that seemed relevant, since I'm not sure what to look for.
+<stsquad> ohh what's a qed device?
+<stsquad> quy: it might be a workaround to use a qcow2 image for now
+<stsquad> ahh the wiki has a statement "It is not recommended to use
+QED for any new images. "
+<danpb> 'qed' was an experimental disk image format created by IBM
+before qcow2 v3 came along
+<danpb> honestly nothing should ever use  QED these days
+<danpb> the good ideas from QED became  qcow2v3
+<stsquad> danpb: sounds like we should put a warning on the option to
+remind users of that fact
+<danpb> quy: sounds like qed driver is simply broken - please do file
+a bug against qemu bug tracker
+<danpb> quy: but you should also really switch to qcow2
+<quy> I see; some people need to update their wikis then.  I don't
+remember where which guide I read when I first learned what little
+QEMU I know, but I remember it specifically remember it saying QED was
+the newest and most optimal format.
+<stsquad> quy: we can only be responsible for our own wiki I'm afraid...
+<danpb> if you remember where you saw that please let us know so we
+can try to get it fixed
+<quy> Thank you very much for the info; I will switch to QCOW.
+Unfortunately, I'm not sure if I will be able to file any bug reports
+in the tracker as I can't seem to log Launchpad, which it seems to
+require.
+<danpb> quy:  an email to the mailing list would suffice too if you
+can't deal with launchpad
+<danpb> kwolf: ^^^ in case you're interested in possible QED
+assertions from 2.12
+
+If any more info is needed, feel free to email me; I'm not actually
+subscribed to this list though.
+Thank you,
+Quytelda Kahja
+
+CC Qemu Block; looks like QED is a bit busted.
+
+On 06/27/2018 10:25 AM, Quytelda Kahja wrote:
+>
+Hello all,
+>
+I wanted to submit a bug report in the tracker, but it seem to require
+>
+an Ubuntu One account, which I'm having trouble with, so I'll just
+>
+give it here and hopefully somebody can make use of it.  The issue
+>
+seems to be in an experimental format, so it's likely not very
+>
+consequential anyway.
+>
+>
+For the sake of anyone else simply googling for a workaround, I'll
+>
+just paste in the (cleaned up) brief IRC conversation about my issue
+>
+from the official channel:
+>
+<quy> I'm using QEMU version 2.12.0 on an x86_64 host (Arch Linux,
+>
+Kernel v4.17.2), and I'm trying to create an x86_64 virtual machine
+>
+(FreeBSD-11.1).  The VM always aborts at the same point in the
+>
+installation (downloading 'ports.tgz') with the following error
+>
+message:
+>
+"qemu-system-x86_64: /build/qemu/src/qemu-2.12.0/block/qed.c:1197:
+>
+qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed.
+>
+zsh: abort (core dumped)  qemu-system-x86_64 -smp 2 -m 4096
+>
+-enable-kvm -hda freebsd/freebsd.qed -devic"
+>
+The commands I ran to create the machine are as follows:
+>
+"qemu-img create -f qed freebsd/freebsd.qed 16G"
+>
+"qemu-system-x86_64 -smp 2 -m 4096 -enable-kvm -hda
+>
+freebsd/freebsd.qed -device e1000,netdev=net0 -netdev user,id=net0
+>
+-cdrom FreeBSD-11.1-RELEASE-amd64-bootonly.iso -boot order=d"
+>
+I tried adding logging options with the -d flag, but I didn't get
+>
+anything that seemed relevant, since I'm not sure what to look for.
+>
+<stsquad> ohh what's a qed device?
+>
+<stsquad> quy: it might be a workaround to use a qcow2 image for now
+>
+<stsquad> ahh the wiki has a statement "It is not recommended to use
+>
+QED for any new images. "
+>
+<danpb> 'qed' was an experimental disk image format created by IBM
+>
+before qcow2 v3 came along
+>
+<danpb> honestly nothing should ever use  QED these days
+>
+<danpb> the good ideas from QED became  qcow2v3
+>
+<stsquad> danpb: sounds like we should put a warning on the option to
+>
+remind users of that fact
+>
+<danpb> quy: sounds like qed driver is simply broken - please do file
+>
+a bug against qemu bug tracker
+>
+<danpb> quy: but you should also really switch to qcow2
+>
+<quy> I see; some people need to update their wikis then.  I don't
+>
+remember where which guide I read when I first learned what little
+>
+QEMU I know, but I remember it specifically remember it saying QED was
+>
+the newest and most optimal format.
+>
+<stsquad> quy: we can only be responsible for our own wiki I'm afraid...
+>
+<danpb> if you remember where you saw that please let us know so we
+>
+can try to get it fixed
+>
+<quy> Thank you very much for the info; I will switch to QCOW.
+>
+Unfortunately, I'm not sure if I will be able to file any bug reports
+>
+in the tracker as I can't seem to log Launchpad, which it seems to
+>
+require.
+>
+<danpb> quy:  an email to the mailing list would suffice too if you
+>
+can't deal with launchpad
+>
+<danpb> kwolf: ^^^ in case you're interested in possible QED
+>
+assertions from 2.12
+>
+>
+If any more info is needed, feel free to email me; I'm not actually
+>
+subscribed to this list though.
+>
+Thank you,
+>
+Quytelda Kahja
+>
+
+On 06/29/2018 03:07 PM, John Snow wrote:
+CC Qemu Block; looks like QED is a bit busted.
+
+On 06/27/2018 10:25 AM, Quytelda Kahja wrote:
+Hello all,
+I wanted to submit a bug report in the tracker, but it seem to require
+an Ubuntu One account, which I'm having trouble with, so I'll just
+give it here and hopefully somebody can make use of it.  The issue
+seems to be in an experimental format, so it's likely not very
+consequential anyway.
+Analysis in another thread may be relevant:
+https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08963.html
+--
+Eric Blake, Principal Software Engineer
+Red Hat, Inc.           +1-919-301-3266
+Virtualization:  qemu.org | libvirt.org
+
+Am 29.06.2018 um 22:16 hat Eric Blake geschrieben:
+>
+On 06/29/2018 03:07 PM, John Snow wrote:
+>
+> CC Qemu Block; looks like QED is a bit busted.
+>
+>
+>
+> On 06/27/2018 10:25 AM, Quytelda Kahja wrote:
+>
+> > Hello all,
+>
+> > I wanted to submit a bug report in the tracker, but it seem to require
+>
+> > an Ubuntu One account, which I'm having trouble with, so I'll just
+>
+> > give it here and hopefully somebody can make use of it.  The issue
+>
+> > seems to be in an experimental format, so it's likely not very
+>
+> > consequential anyway.
+>
+>
+Analysis in another thread may be relevant:
+>
+>
+https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08963.html
+The assertion there was:
+
+qemu-system-x86_64: block.c:3434: bdrv_replace_node: Assertion 
+`!atomic_read(&to->in_flight)' failed.
+
+Which quite clearly pointed to a drain bug. This one, however, doesn't
+seem to be related to drain, so I think it's probably a different bug.
+
+Kevin
+
diff --git a/classification_output/05/device/67821138 b/classification_output/05/device/67821138
new file mode 100644
index 000000000..4201fc494
--- /dev/null
+++ b/classification_output/05/device/67821138
@@ -0,0 +1,207 @@
+device: 0.916
+assembly: 0.915
+boot: 0.881
+other: 0.853
+semantic: 0.843
+graphic: 0.826
+KVM: 0.822
+instruction: 0.821
+mistranslation: 0.768
+vnc: 0.734
+network: 0.718
+socket: 0.699
+
+[BUG, RFC] Base node is in RW after making external snapshot
+
+Hi everyone,
+
+When making an external snapshot, we end up in a situation when 2 block
+graph nodes related to the same image file (format and storage nodes)
+have different RO flags set on them.
+
+E.g.
+
+# ls -la /proc/PID/fd
+lrwx------ 1 root qemu 64 Apr 24 20:14 12 -> /path/to/harddisk.hdd
+
+# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}'
+--pretty | egrep '"node-name"|"ro"'
+      "ro": false,
+      "node-name": "libvirt-1-format",
+      "ro": false,
+      "node-name": "libvirt-1-storage",
+
+# virsh snapshot-create-as VM --name snap --disk-only
+Domain snapshot snap created
+
+# ls -la /proc/PID/fd
+lr-x------ 1 root qemu 64 Apr 24 20:14 134 -> /path/to/harddisk.hdd
+lrwx------ 1 root qemu 64 Apr 24 20:14 135 -> /path/to/harddisk.snap
+
+# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}'
+--pretty | egrep '"node-name"|"ro"'
+      "ro": false,
+      "node-name": "libvirt-2-format",
+      "ro": false,
+      "node-name": "libvirt-2-storage",
+      "ro": true,
+      "node-name": "libvirt-1-format",
+      "ro": false,                        <--------------
+      "node-name": "libvirt-1-storage",
+
+File descriptor has been reopened in RO, but "libvirt-1-storage" node
+still has RW permissions set.
+
+I'm wondering it this a bug or this is intended?  Looks like a bug to
+me, although I see that some iotests (e.g. 273) expect 2 nodes related
+to the same image file to have different RO flags.
+
+bdrv_reopen_set_read_only()
+  bdrv_reopen()
+    bdrv_reopen_queue()
+      bdrv_reopen_queue_child()
+    bdrv_reopen_multiple()
+      bdrv_list_refresh_perms()
+        bdrv_topological_dfs()
+        bdrv_do_refresh_perms()
+      bdrv_reopen_commit()
+
+In the stack above bdrv_reopen_set_read_only() is only being called for
+the parent (libvirt-1-format) node.  There're 2 lists: BDSs from
+refresh_list are used by bdrv_drv_set_perm and this leads to actual
+reopen with RO of the file descriptor.  And then there's reopen queue
+bs_queue -- BDSs from this queue get their parameters updated.  While
+refresh_list ends up having the whole subtree (including children, this
+is done in bdrv_topological_dfs()) bs_queue only has the parent.  And
+that is because storage (child) node's (bs->inherits_from == NULL), so
+bdrv_reopen_queue_child() never adds it to the queue.  Could it be the
+source of this bug?
+
+Anyway, would greatly appreciate a clarification.
+
+Andrey
+
+On 4/24/24 21:00, Andrey Drobyshev wrote:
+>
+Hi everyone,
+>
+>
+When making an external snapshot, we end up in a situation when 2 block
+>
+graph nodes related to the same image file (format and storage nodes)
+>
+have different RO flags set on them.
+>
+>
+E.g.
+>
+>
+# ls -la /proc/PID/fd
+>
+lrwx------ 1 root qemu 64 Apr 24 20:14 12 -> /path/to/harddisk.hdd
+>
+>
+# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}'
+>
+--pretty | egrep '"node-name"|"ro"'
+>
+"ro": false,
+>
+"node-name": "libvirt-1-format",
+>
+"ro": false,
+>
+"node-name": "libvirt-1-storage",
+>
+>
+# virsh snapshot-create-as VM --name snap --disk-only
+>
+Domain snapshot snap created
+>
+>
+# ls -la /proc/PID/fd
+>
+lr-x------ 1 root qemu 64 Apr 24 20:14 134 -> /path/to/harddisk.hdd
+>
+lrwx------ 1 root qemu 64 Apr 24 20:14 135 -> /path/to/harddisk.snap
+>
+>
+# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}'
+>
+--pretty | egrep '"node-name"|"ro"'
+>
+"ro": false,
+>
+"node-name": "libvirt-2-format",
+>
+"ro": false,
+>
+"node-name": "libvirt-2-storage",
+>
+"ro": true,
+>
+"node-name": "libvirt-1-format",
+>
+"ro": false,                        <--------------
+>
+"node-name": "libvirt-1-storage",
+>
+>
+File descriptor has been reopened in RO, but "libvirt-1-storage" node
+>
+still has RW permissions set.
+>
+>
+I'm wondering it this a bug or this is intended?  Looks like a bug to
+>
+me, although I see that some iotests (e.g. 273) expect 2 nodes related
+>
+to the same image file to have different RO flags.
+>
+>
+bdrv_reopen_set_read_only()
+>
+bdrv_reopen()
+>
+bdrv_reopen_queue()
+>
+bdrv_reopen_queue_child()
+>
+bdrv_reopen_multiple()
+>
+bdrv_list_refresh_perms()
+>
+bdrv_topological_dfs()
+>
+bdrv_do_refresh_perms()
+>
+bdrv_reopen_commit()
+>
+>
+In the stack above bdrv_reopen_set_read_only() is only being called for
+>
+the parent (libvirt-1-format) node.  There're 2 lists: BDSs from
+>
+refresh_list are used by bdrv_drv_set_perm and this leads to actual
+>
+reopen with RO of the file descriptor.  And then there's reopen queue
+>
+bs_queue -- BDSs from this queue get their parameters updated.  While
+>
+refresh_list ends up having the whole subtree (including children, this
+>
+is done in bdrv_topological_dfs()) bs_queue only has the parent.  And
+>
+that is because storage (child) node's (bs->inherits_from == NULL), so
+>
+bdrv_reopen_queue_child() never adds it to the queue.  Could it be the
+>
+source of this bug?
+>
+>
+Anyway, would greatly appreciate a clarification.
+>
+>
+Andrey
+Friendly ping.  Could somebody confirm that it is a bug indeed?
+
diff --git a/classification_output/05/device/99674399 b/classification_output/05/device/99674399
new file mode 100644
index 000000000..cbf3c7035
--- /dev/null
+++ b/classification_output/05/device/99674399
@@ -0,0 +1,156 @@
+device: 0.886
+other: 0.883
+instruction: 0.860
+mistranslation: 0.843
+assembly: 0.843
+semantic: 0.822
+boot: 0.822
+graphic: 0.794
+socket: 0.747
+network: 0.711
+KVM: 0.698
+vnc: 0.673
+
+[BUG] qemu crashes on assertion in cpu_asidx_from_attrs when cpu is in smm mode
+
+Hi all!
+
+First, I see this issue:
+https://gitlab.com/qemu-project/qemu/-/issues/1198
+. 
+where some kvm/hardware failure leads to guest crash, and finally to this 
+assertion:
+
+   cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
+
+But in the ticket the talk is about the guest crash and fixing the kernel, not 
+about the final QEMU assertion (which definitely show that something should be 
+fixed in QEMU code too).
+
+
+We've faced same stack one time:
+
+(gdb) bt
+#0  raise () from /lib/x86_64-linux-gnu/libc.so.6
+#1  abort () from /lib/x86_64-linux-gnu/libc.so.6
+#2  ?? () from /lib/x86_64-linux-gnu/libc.so.6
+#3  __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
+#4  cpu_asidx_from_attrs  at ../hw/core/cpu-sysemu.c:76
+#5  cpu_memory_rw_debug  at ../softmmu/physmem.c:3529
+#6  x86_cpu_dump_state  at ../target/i386/cpu-dump.c:560
+#7  kvm_cpu_exec  at ../accel/kvm/kvm-all.c:3000
+#8  kvm_vcpu_thread_fn  at ../accel/kvm/kvm-accel-ops.c:51
+#9  qemu_thread_start  at ../util/qemu-thread-posix.c:505
+#10 start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
+#11 clone () from /lib/x86_64-linux-gnu/libc.so.6
+
+
+And what I see:
+
+static inline int x86_asidx_from_attrs(CPUState *cs, MemTxAttrs attrs)
+{
+    return !!attrs.secure;
+}
+
+int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs)
+{
+    int ret = 0;
+
+    if (cpu->cc->sysemu_ops->asidx_from_attrs) {
+        ret = cpu->cc->sysemu_ops->asidx_from_attrs(cpu, attrs);
+        assert(ret < cpu->num_ases && ret >= 0);         <<<<<<<<<<<<<<<<<
+    }
+    return ret;
+}
+
+(gdb) p cpu->num_ases
+$3 = 1
+
+(gdb) fr 5
+#5  0x00005578c8814ba3 in cpu_memory_rw_debug (cpu=c...
+(gdb) p attrs
+$6 = {unspecified = 0, secure = 1, user = 0, memory = 0, requester_id = 0, 
+byte_swap = 0, target_tlb_bit0 = 0, target_tlb_bit1 = 0, target_tlb_bit2 = 0}
+
+so .secure is 1, therefore ret is 1, in the same time num_ases is 1 too and 
+assertion fails.
+
+
+
+Where is .secure from?
+
+static inline MemTxAttrs cpu_get_mem_attrs(CPUX86State *env)
+{
+    return ((MemTxAttrs) { .secure = (env->hflags & HF_SMM_MASK) != 0 });
+}
+
+Ok, it means we in SMM mode.
+
+
+
+On the other hand, it seems that num_ases seems to be always 1 for x86:
+
+vsementsov@vsementsov-lin:~/work/src/qemu/yc-7.2$ git grep 'num_ases = '
+cpu.c:    cpu->num_ases = 0;
+softmmu/cpus.c:        cpu->num_ases = 1;
+target/arm/cpu.c:        cs->num_ases = 3 + has_secure;
+target/arm/cpu.c:        cs->num_ases = 1 + has_secure;
+target/i386/tcg/sysemu/tcg-cpu.c:    cs->num_ases = 2;
+
+
+So, something is wrong around cpu->num_ases and x86_asidx_from_attrs() which 
+may return more in SMM mode.
+
+
+The stack starts in
+//7  0x00005578c882f539 in kvm_cpu_exec (cpu=cpu@entry=0x5578ca2eb340) at 
+../accel/kvm/kvm-all.c:3000
+    if (ret < 0) {
+        cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
+        vm_stop(RUN_STATE_INTERNAL_ERROR);
+    }
+
+So that was some kvm error, and we decided to call cpu_dump_state(). And it 
+crashes. cpu_dump_state() is also called from hmp_info_registers, so I can 
+reproduce the crash with a tiny patch to master (as only CPU_DUMP_CODE path 
+calls cpu_memory_rw_debug(), as it is in kvm_cpu_exec()):
+
+diff --git a/monitor/hmp-cmds-target.c b/monitor/hmp-cmds-target.c
+index ff01cf9d8d..dcf0189048 100644
+--- a/monitor/hmp-cmds-target.c
++++ b/monitor/hmp-cmds-target.c
+@@ -116,7 +116,7 @@ void hmp_info_registers(Monitor *mon, const QDict *qdict)
+         }
+
+         monitor_printf(mon, "\nCPU#%d\n", cs->cpu_index);
+-        cpu_dump_state(cs, NULL, CPU_DUMP_FPU);
++        cpu_dump_state(cs, NULL, CPU_DUMP_CODE);
+     }
+ }
+
+
+Than run
+
+yes "info registers" | ./build/qemu-system-x86_64 -accel kvm -monitor stdio \
+   -global driver=cfi.pflash01,property=secure,value=on \
+   -blockdev "{'driver': 'file', 'filename': 
+'/usr/share/OVMF/OVMF_CODE_4M.secboot.fd', 'node-name': 'ovmf-code', 'read-only': 
+true}" \
+   -blockdev "{'driver': 'file', 'filename': '/usr/share/OVMF/OVMF_VARS_4M.fd', 
+'node-name': 'ovmf-vars', 'read-only': true}" \
+   -machine q35,smm=on,pflash0=ovmf-code,pflash1=ovmf-vars -m 2G -nodefaults
+
+And after some time (less than 20 seconds for me) it leads to
+
+qemu-system-x86_64: ../hw/core/cpu-sysemu.c:76: cpu_asidx_from_attrs: Assertion `ret < 
+cpu->num_ases && ret >= 0' failed.
+Aborted (core dumped)
+
+
+I've no idea how to correctly fix this bug, but I hope that my reproducer and 
+investigation will help a bit.
+
+--
+Best regards,
+Vladimir
+
author	Christian Krinitsin <mail@krinitsin.com>	2025-06-01 21:19:55 +0200
committer	Christian Krinitsin <mail@krinitsin.com>	2025-06-01 21:19:55 +0200
commit	e5634e2806195bee44407853c4bf8776f7abfa4f (patch)
tree	6c6e2503f2cd56c0edf08a55d5c8fcb3832fabd0 /classification_output/05/device
parent	05ed9b00104b3da2e9cb0d541b5c5bbceb027fde (diff)
download	qemu-analysis-e5634e2806195bee44407853c4bf8776f7abfa4f.tar.gz qemu-analysis-e5634e2806195bee44407853c4bf8776f7abfa4f.zip