author     Christian Krinitsin <mail@krinitsin.com>  2025-07-03 19:39:53 +0200
committer  Christian Krinitsin <mail@krinitsin.com>  2025-07-03 19:39:53 +0200
commit     dee4dcba78baf712cab403d47d9db319ab7f95d6
tree       418478faf06786701a56268672f73d6b0b4eb239  /results/classifier/007/permissions
parent     4d9e26c0333abd39bdbd039dcdb30ed429c475ba
download   qemu-analysis-dee4dcba78baf712cab403d47d9db319ab7f95d6.tar.gz
           qemu-analysis-dee4dcba78baf712cab403d47d9db319ab7f95d6.zip
restructure results
Diffstat (limited to 'results/classifier/007/permissions')
-rw-r--r-- results/classifier/007/permissions/12360755   306
-rw-r--r-- results/classifier/007/permissions/14488057   721
-rw-r--r-- results/classifier/007/permissions/14887122   268
-rw-r--r-- results/classifier/007/permissions/16056596   108
-rw-r--r-- results/classifier/007/permissions/21221931   338
-rw-r--r-- results/classifier/007/permissions/23300761   323
-rw-r--r-- results/classifier/007/permissions/26095107   168
-rw-r--r-- results/classifier/007/permissions/26430026   175
-rw-r--r-- results/classifier/007/permissions/48245039   540
-rw-r--r-- results/classifier/007/permissions/51610399   318
-rw-r--r-- results/classifier/007/permissions/55247116  1320
-rw-r--r-- results/classifier/007/permissions/67821138   209
-rw-r--r-- results/classifier/007/permissions/74715356   136
-rw-r--r-- results/classifier/007/permissions/85542195   130
-rw-r--r-- results/classifier/007/permissions/88225572  2910
-rw-r--r-- results/classifier/007/permissions/88281850   291
-rw-r--r-- results/classifier/007/permissions/95154278   165
17 files changed, 0 insertions, 8426 deletions
diff --git a/results/classifier/007/permissions/12360755 b/results/classifier/007/permissions/12360755
deleted file mode 100644
index 3de2a3c4a..000000000
--- a/results/classifier/007/permissions/12360755
+++ /dev/null
@@ -1,306 +0,0 @@
-permissions: 0.930
-debug: 0.922
-semantic: 0.911
-device: 0.902
-graphic: 0.899
-performance: 0.895
-other: 0.886
-PID: 0.876
-files: 0.851
-boot: 0.818
-vnc: 0.810
-socket: 0.805
-KVM: 0.770
-network: 0.738
-
-[Qemu-devel] [BUG] virtio-net linux driver fails to probe on MIPS Malta since 'hw/virtio-pci: fix virtio behaviour'
-
-Hi,
-
-I've bisected the following failure of the virtio_net linux v4.10 driver
-to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine:
-
-virtio_net virtio0: virtio: device uses modern interface but does not have 
-VIRTIO_F_VERSION_1
-virtio_net: probe of virtio0 failed with error -22
-
-To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour").
-
-It appears that adding ",disable-modern=on,disable-legacy=off" to the
-virtio-net -device makes it work again.
-
-I presume this should really just work out of the box. Any ideas why it
-isn't?
-
-Cheers
-James
-signature.asc
-Description:
-Digital signature
-
-On 03/17/2017 11:57 PM, James Hogan wrote:
-Hi,
-
-I've bisected the following failure of the virtio_net linux v4.10 driver
-to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine:
-
-virtio_net virtio0: virtio: device uses modern interface but does not have 
-VIRTIO_F_VERSION_1
-virtio_net: probe of virtio0 failed with error -22
-
-To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour").
-
-It appears that adding ",disable-modern=on,disable-legacy=off" to the
-virtio-net -device makes it work again.
-
-I presume this should really just work out of the box. Any ideas why it
-isn't?
-Hi,
-
-
-This is strange. This commit changes virtio devices from legacy to virtio 
-"transitional".
-(your command line changes it to legacy)
-Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU 
-side
-there is nothing new.
-
-Michael, do you have any idea?
-
-Thanks,
-Marcel
-Cheers
-James
-
-On Mon, Mar 20, 2017 at 05:21:22PM +0200, Marcel Apfelbaum wrote:
->
-On 03/17/2017 11:57 PM, James Hogan wrote:
->
-> Hi,
->
->
->
-> I've bisected the following failure of the virtio_net linux v4.10 driver
->
-> to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine:
->
->
->
-> virtio_net virtio0: virtio: device uses modern interface but does not have
->
-> VIRTIO_F_VERSION_1
->
-> virtio_net: probe of virtio0 failed with error -22
->
->
->
-> To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour").
->
->
->
-> It appears that adding ",disable-modern=on,disable-legacy=off" to the
->
-> virtio-net -device makes it work again.
->
->
->
-> I presume this should really just work out of the box. Any ideas why it
->
-> isn't?
->
->
->
->
-Hi,
->
->
->
-This is strange. This commit changes virtio devices from legacy to virtio
->
-"transitional".
->
-(your command line changes it to legacy)
->
-Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU
->
-side
->
-there is nothing new.
->
->
-Michael, do you have any idea?
->
->
-Thanks,
->
-Marcel
-My guess would be firmware mishandling 64 bit BARs - we saw such
-a case on sparc previously. As a result you are probably reading
-all zeroes from features register or something like that.
-Marcel, could you send a patch making the bar 32 bit?
-If that helps we know what the issue is.
-
->
-> Cheers
->
-> James
->
->
-
-On 03/20/2017 05:43 PM, Michael S. Tsirkin wrote:
-On Mon, Mar 20, 2017 at 05:21:22PM +0200, Marcel Apfelbaum wrote:
-On 03/17/2017 11:57 PM, James Hogan wrote:
-Hi,
-
-I've bisected the following failure of the virtio_net linux v4.10 driver
-to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine:
-
-virtio_net virtio0: virtio: device uses modern interface but does not have 
-VIRTIO_F_VERSION_1
-virtio_net: probe of virtio0 failed with error -22
-
-To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour").
-
-It appears that adding ",disable-modern=on,disable-legacy=off" to the
-virtio-net -device makes it work again.
-
-I presume this should really just work out of the box. Any ideas why it
-isn't?
-Hi,
-
-
-This is strange. This commit changes virtio devices from legacy to virtio 
-"transitional".
-(your command line changes it to legacy)
-Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU 
-side
-there is nothing new.
-
-Michael, do you have any idea?
-
-Thanks,
-Marcel
-My guess would be firmware mishandling 64 bit BARs - we saw such
-a case on sparc previously. As a result you are probably reading
-all zeroes from features register or something like that.
-Marcel, could you send a patch making the bar 32 bit?
-If that helps we know what the issue is.
-Sure,
-
-Thanks,
-Marcel
-Cheers
-James
-
-On 03/20/2017 05:43 PM, Michael S. Tsirkin wrote:
-On Mon, Mar 20, 2017 at 05:21:22PM +0200, Marcel Apfelbaum wrote:
-On 03/17/2017 11:57 PM, James Hogan wrote:
-Hi,
-
-I've bisected the following failure of the virtio_net linux v4.10 driver
-to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine:
-
-virtio_net virtio0: virtio: device uses modern interface but does not have 
-VIRTIO_F_VERSION_1
-virtio_net: probe of virtio0 failed with error -22
-
-To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour").
-
-It appears that adding ",disable-modern=on,disable-legacy=off" to the
-virtio-net -device makes it work again.
-
-I presume this should really just work out of the box. Any ideas why it
-isn't?
-Hi,
-
-
-This is strange. This commit changes virtio devices from legacy to virtio 
-"transitional".
-(your command line changes it to legacy)
-Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU 
-side
-there is nothing new.
-
-Michael, do you have any idea?
-
-Thanks,
-Marcel
-My guess would be firmware mishandling 64 bit BARs - we saw such
-a case on sparc previously. As a result you are probably reading
-all zeroes from features register or something like that.
-Marcel, could you send a patch making the bar 32 bit?
-If that helps we know what the issue is.
-Hi James,
-
-Can you please check if the below patch fixes the problem?
-Please note it is not a solution.
-
-diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
-index f9b7244..5b4d429 100644
---- a/hw/virtio/virtio-pci.c
-+++ b/hw/virtio/virtio-pci.c
-@@ -1671,9 +1671,7 @@ static void virtio_pci_device_plugged(DeviceState *d, 
-Error **errp)
-         }
-
-         pci_register_bar(&proxy->pci_dev, proxy->modern_mem_bar_idx,
--                         PCI_BASE_ADDRESS_SPACE_MEMORY |
--                         PCI_BASE_ADDRESS_MEM_PREFETCH |
--                         PCI_BASE_ADDRESS_MEM_TYPE_64,
-+                         PCI_BASE_ADDRESS_SPACE_MEMORY,
-                          &proxy->modern_bar);
-
-         proxy->config_cap = virtio_pci_add_mem_cap(proxy, &cfg.cap);
-
-
-Thanks,
-Marcel
-
-Hi Marcel,
-
-On Tue, Mar 21, 2017 at 04:16:58PM +0200, Marcel Apfelbaum wrote:
->
-Can you please check if the below patch fixes the problem?
->
-Please note it is not a solution.
->
->
-diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
->
-index f9b7244..5b4d429 100644
->
---- a/hw/virtio/virtio-pci.c
->
-+++ b/hw/virtio/virtio-pci.c
->
-@@ -1671,9 +1671,7 @@ static void virtio_pci_device_plugged(DeviceState *d,
->
-Error **errp)
->
-}
->
->
-pci_register_bar(&proxy->pci_dev, proxy->modern_mem_bar_idx,
->
--                         PCI_BASE_ADDRESS_SPACE_MEMORY |
->
--                         PCI_BASE_ADDRESS_MEM_PREFETCH |
->
--                         PCI_BASE_ADDRESS_MEM_TYPE_64,
->
-+                         PCI_BASE_ADDRESS_SPACE_MEMORY,
->
-&proxy->modern_bar);
->
->
-proxy->config_cap = virtio_pci_add_mem_cap(proxy, &cfg.cap);
-Sorry for the delay trying this, I was away last week.
-
-No, it doesn't seem to make any difference.
-
-Thanks
-James
-signature.asc
-Description:
-Digital signature
-
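Note on the virtio thread above: the probe error means the guest driver saw a device driven through the modern (virtio 1.0) interface whose feature word lacked VIRTIO_F_VERSION_1 (feature bit 32). The -22 in the log is -EINVAL. Michael's hypothesis was that firmware mishandling the 64-bit BAR would leave the driver reading zeros from the features register. The following is a minimal sketch of that symptom only. It is not the Linux or QEMU code, and the helper names are hypothetical. The thread's test patch making the BAR 32-bit did not change the result, so this illustrates the suspected failure mode rather than the confirmed root cause.

```c
#include <errno.h>
#include <stdint.h>

/* Feature bit number defined by the virtio 1.0 specification. */
#define VIRTIO_F_VERSION_1 32

/*
 * Hypothetical helper: a modern virtio-pci driver reads the 64-bit device
 * feature word as two 32-bit halves (selected via device_feature_select).
 */
static uint64_t combine_features(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/*
 * Hypothetical helper, loosely modelled on the guest-side check: a device
 * driven through the modern interface must offer VIRTIO_F_VERSION_1.
 */
static int check_modern_device(uint64_t features)
{
    if (!(features & (1ULL << VIRTIO_F_VERSION_1)))
        return -EINVAL;  /* -22 on Linux, matching the probe log */
    return 0;
}
```

If both halves read back as zero, `check_modern_device(combine_features(0, 0))` fails with -EINVAL. A device that sets bit 0 of the high word, which is bit 32 overall, passes.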
diff --git a/results/classifier/007/permissions/14488057 b/results/classifier/007/permissions/14488057
deleted file mode 100644
index 6fa010b72..000000000
--- a/results/classifier/007/permissions/14488057
+++ /dev/null
@@ -1,721 +0,0 @@
-permissions: 0.940
-PID: 0.930
-device: 0.929
-debug: 0.925
-other: 0.922
-performance: 0.911
-semantic: 0.905
-boot: 0.892
-graphic: 0.887
-vnc: 0.882
-KVM: 0.880
-network: 0.846
-socket: 0.825
-files: 0.823
-
-[Qemu-devel] [BUG] user-to-root privesc inside VM via bad translation caching
-
-This is an issue in QEMU's system emulation for X86 in TCG mode.
-The issue permits an attacker who can execute code in guest ring 3
-with normal user privileges to inject code into other processes that
-are running in guest ring 3, in particular root-owned processes.
-
-== reproduction steps ==
-
- - Create an x86-64 VM and install Debian Jessie in it. The following
-   steps should all be executed inside the VM.
- - Verify that procmail is installed and the correct version:
-       address@hidden:~# apt-cache show procmail | egrep 'Version|SHA'
-       Version: 3.22-24
-       SHA1: 54ed2d51db0e76f027f06068ab5371048c13434c
-       SHA256: 4488cf6975af9134a9b5238d5d70e8be277f70caa45a840dfbefd2dc444bfe7f
- - Install build-essential and nasm ("apt install build-essential nasm").
- - Unpack the exploit, compile it and run it:
-       address@hidden:~$ tar xvf procmail_cache_attack.tar
-       procmail_cache_attack/
-       procmail_cache_attack/shellcode.asm
-       procmail_cache_attack/xp.c
-       procmail_cache_attack/compile.sh
-       procmail_cache_attack/attack.c
-       address@hidden:~$ cd procmail_cache_attack
-       address@hidden:~/procmail_cache_attack$ ./compile.sh
-       address@hidden:~/procmail_cache_attack$ ./attack
-       memory mappings set up
-       child is dead, codegen should be complete
-       executing code as root! :)
-       address@hidden:~/procmail_cache_attack# id
-       uid=0(root) gid=0(root) groups=0(root),[...]
-
-Note: While the exploit depends on the precise version of procmail,
-the actual vulnerability is in QEMU, not in procmail. procmail merely
-serves as a seldomly-executed setuid root binary into which code can
-be injected.
-
-
-== detailed issue description ==
-QEMU caches translated basic blocks. To look up a translated basic
-block, the function tb_find() is used, which uses tb_htable_lookup()
-in its slowpath, which in turn compares translated basic blocks
-(TranslationBlock) to the lookup information (struct tb_desc) using
-tb_cmp().
-
-tb_cmp() attempts to ensure (among other things) that both the virtual
-start address of the basic block and the physical addresses that the
-basic block covers match. When checking the physical addresses, it
-assumes that a basic block can span at most two pages.
-
-gen_intermediate_code() attempts to enforce this by stopping the
-translation of a basic block if nearly one page of instructions has
-been translated already:
-
-    /* if too long translation, stop generation too */
-    if (tcg_op_buf_full() ||
-        (pc_ptr - pc_start) >= (TARGET_PAGE_SIZE - 32) ||
-        num_insns >= max_insns) {
-        gen_jmp_im(pc_ptr - dc->cs_base);
-        gen_eob(dc);
-        break;
-    }
-
-However, while real X86 processors have a maximum instruction length
-of 15 bytes, QEMU's instruction decoder for X86 does not place any
-limit on the instruction length or the number of instruction prefixes.
-Therefore, it is possible to create an arbitrarily long instruction
-by e.g. prepending an arbitrary number of LOCK prefixes to a normal
-instruction. This permits creating a basic block that spans three
-pages by simply appending an approximately page-sized instruction to
-the end of a normal basic block that starts close to the end of a
-page.
-
-Such an overlong basic block causes the basic block caching to fail as
-follows: If code is generated and cached for a basic block that spans
-the physical pages (A,E,B), this basic block will be returned by
-lookups in a process in which the physical pages (A,B,C) are mapped
-in the same virtual address range (assuming that all other lookup
-parameters match).
-
-This behavior can be abused by an attacker e.g. as follows: If a
-non-relocatable world-readable setuid executable legitimately contains
-the pages (A,B,C), an attacker can map (A,E,B) into his own process,
-at the normal load address of A, where E is an attacker-controlled
-page. If a legitimate basic block spans the pages A and B, an attacker
-can write arbitrary non-branch instructions at the start of E, then
-append an overlong instruction
-that ends behind the start of C, yielding a modified basic block that
-spans all three pages. If the attacker then executes the modified
-basic block in his process, the modified basic block is cached.
-Next, the attacker can execute the setuid binary, which will reuse the
-cached modified basic block, executing attacker-controlled
-instructions in the context of the privileged process.
-
-I am sending this to qemu-devel because a QEMU security contact
-told me that QEMU does not consider privilege escalation inside a
-TCG VM to be a security concern.
-procmail_cache_attack.tar
-Description:
-Unix tar archive
-
-On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote:
->
-This is an issue in QEMU's system emulation for X86 in TCG mode.
->
-The issue permits an attacker who can execute code in guest ring 3
->
-with normal user privileges to inject code into other processes that
->
-are running in guest ring 3, in particular root-owned processes.
->
-I am sending this to qemu-devel because a QEMU security contact
->
-told me that QEMU does not consider privilege escalation inside a
->
-TCG VM to be a security concern.
-Correct; it's just a bug. Don't trust TCG QEMU as a security boundary.
-
-We should really fix the crossing-a-page-boundary code for x86.
-I believe we do get it correct for ARM Thumb instructions.
-
-thanks
--- PMM
-
-On Mon, Mar 20, 2017 at 10:46 AM, Peter Maydell wrote:
->
-On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote:
->
-> This is an issue in QEMU's system emulation for X86 in TCG mode.
->
-> The issue permits an attacker who can execute code in guest ring 3
->
-> with normal user privileges to inject code into other processes that
->
-> are running in guest ring 3, in particular root-owned processes.
->
->
-> I am sending this to qemu-devel because a QEMU security contact
->
-> told me that QEMU does not consider privilege escalation inside a
->
-> TCG VM to be a security concern.
->
->
-Correct; it's just a bug. Don't trust TCG QEMU as a security boundary.
->
->
-We should really fix the crossing-a-page-boundary code for x86.
->
-I believe we do get it correct for ARM Thumb instructions.
-How about doing the instruction size check as follows?
-
-diff --git a/target/i386/translate.c b/target/i386/translate.c
-index 72c1b03a2a..94cf3da719 100644
---- a/target/i386/translate.c
-+++ b/target/i386/translate.c
-@@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State
-*env, DisasContext *s,
-     default:
-         goto unknown_op;
-     }
-+    if (s->pc - pc_start > 15) {
-+        s->pc = pc_start;
-+        goto illegal_op;
-+    }
-     return s->pc;
-  illegal_op:
-     gen_illegal_opcode(s);
-
-Thanks,
---
-Pranith
-
-On 22 March 2017 at 14:55, Pranith Kumar <address@hidden> wrote:
->
-On Mon, Mar 20, 2017 at 10:46 AM, Peter Maydell wrote:
->
-> On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote:
->
->> This is an issue in QEMU's system emulation for X86 in TCG mode.
->
->> The issue permits an attacker who can execute code in guest ring 3
->
->> with normal user privileges to inject code into other processes that
->
->> are running in guest ring 3, in particular root-owned processes.
->
->
->
->> I am sending this to qemu-devel because a QEMU security contact
->
->> told me that QEMU does not consider privilege escalation inside a
->
->> TCG VM to be a security concern.
->
->
->
-> Correct; it's just a bug. Don't trust TCG QEMU as a security boundary.
->
->
->
-> We should really fix the crossing-a-page-boundary code for x86.
->
-> I believe we do get it correct for ARM Thumb instructions.
->
->
-How about doing the instruction size check as follows?
->
->
-diff --git a/target/i386/translate.c b/target/i386/translate.c
->
-index 72c1b03a2a..94cf3da719 100644
->
---- a/target/i386/translate.c
->
-+++ b/target/i386/translate.c
->
-@@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State
->
-*env, DisasContext *s,
->
-default:
->
-goto unknown_op;
->
-}
->
-+    if (s->pc - pc_start > 15) {
->
-+        s->pc = pc_start;
->
-+        goto illegal_op;
->
-+    }
->
-return s->pc;
->
-illegal_op:
->
-gen_illegal_opcode(s);
-This doesn't look right because it means we'll check
-only after we've emitted all the code to do the
-instruction operation, so the effect will be
-"execute instruction, then take illegal-opcode
-exception".
-
-We should check what the x86 architecture spec actually
-says and implement that.
-
-thanks
--- PMM
-
-On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell
-<address@hidden> wrote:
->
->
->
-> How about doing the instruction size check as follows?
->
->
->
-> diff --git a/target/i386/translate.c b/target/i386/translate.c
->
-> index 72c1b03a2a..94cf3da719 100644
->
-> --- a/target/i386/translate.c
->
-> +++ b/target/i386/translate.c
->
-> @@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State
->
-> *env, DisasContext *s,
->
->      default:
->
->          goto unknown_op;
->
->      }
->
-> +    if (s->pc - pc_start > 15) {
->
-> +        s->pc = pc_start;
->
-> +        goto illegal_op;
->
-> +    }
->
->      return s->pc;
->
->   illegal_op:
->
->      gen_illegal_opcode(s);
->
->
-This doesn't look right because it means we'll check
->
-only after we've emitted all the code to do the
->
-instruction operation, so the effect will be
->
-"execute instruction, then take illegal-opcode
->
-exception".
->
-The pc is restored to original address (s->pc = pc_start), so the
-exception will overwrite the generated illegal instruction and will be
-executed first.
-
-But yes, it's better to follow the architecture manual.
-
-Thanks,
---
-Pranith
-
-On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote:
->
-On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell
->
-<address@hidden> wrote:
->
-> This doesn't look right because it means we'll check
->
-> only after we've emitted all the code to do the
->
-> instruction operation, so the effect will be
->
-> "execute instruction, then take illegal-opcode
->
-> exception".
->
-The pc is restored to original address (s->pc = pc_start), so the
->
-exception will overwrite the generated illegal instruction and will be
->
-executed first.
-s->pc is the guest PC -- moving that backwards will
-not do anything about the generated TCG IR that's
-already been written. You'd need to rewind the
-write pointer in the IR stream, which there is
-no support for doing AFAIK.
-
-thanks
--- PMM
-
-On Wed, Mar 22, 2017 at 11:21 AM, Peter Maydell
-<address@hidden> wrote:
->
-On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote:
->
-> On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell
->
-> <address@hidden> wrote:
->
->> This doesn't look right because it means we'll check
->
->> only after we've emitted all the code to do the
->
->> instruction operation, so the effect will be
->
->> "execute instruction, then take illegal-opcode
->
->> exception".
->
->
-> The pc is restored to original address (s->pc = pc_start), so the
->
-> exception will overwrite the generated illegal instruction and will be
->
-> executed first.
->
->
-s->pc is the guest PC -- moving that backwards will
->
-not do anything about the generated TCG IR that's
->
-already been written. You'd need to rewind the
->
-write pointer in the IR stream, which there is
->
-no support for doing AFAIK.
-Ah, OK. Thanks for the explanation. May be we should check the size of
-the instruction while decoding the prefixes and error out once we
-exceed the limit. We would not generate any IR code.
-
---
-Pranith
-
-On 03/23/2017 02:29 AM, Pranith Kumar wrote:
-On Wed, Mar 22, 2017 at 11:21 AM, Peter Maydell
-<address@hidden> wrote:
-On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote:
-On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell
-<address@hidden> wrote:
-This doesn't look right because it means we'll check
-only after we've emitted all the code to do the
-instruction operation, so the effect will be
-"execute instruction, then take illegal-opcode
-exception".
-The pc is restored to original address (s->pc = pc_start), so the
-exception will overwrite the generated illegal instruction and will be
-executed first.
-s->pc is the guest PC -- moving that backwards will
-not do anything about the generated TCG IR that's
-already been written. You'd need to rewind the
-write pointer in the IR stream, which there is
-no support for doing AFAIK.
-Ah, OK. Thanks for the explanation. May be we should check the size of
-the instruction while decoding the prefixes and error out once we
-exceed the limit. We would not generate any IR code.
-Yes.
-It would not enforce a true limit of 15 bytes, since you can't know that until
-you've done the rest of the decode.  But you'd be able to say that no more than
-14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 bytes is used.
-Which does fix the bug.
-
-
-r~
-
-On 22/03/2017 21:01, Richard Henderson wrote:
->
->
->
-> Ah, OK. Thanks for the explanation. May be we should check the size of
->
-> the instruction while decoding the prefixes and error out once we
->
-> exceed the limit. We would not generate any IR code.
->
->
-Yes.
->
->
-It would not enforce a true limit of 15 bytes, since you can't know that
->
-until you've done the rest of the decode.  But you'd be able to say that
->
-no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25
->
-bytes is used.
->
->
-Which does fix the bug.
-Yeah, that would work for 2.9 if somebody wants to put together a patch.
- Ensuring that all instruction fetching happens before translation side
-effects is a little harder, but perhaps it's also the opportunity to get
-rid of s->rip_offset which is a little ugly.
-
-Paolo
-
-On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote:
->
->
->
-On 22/03/2017 21:01, Richard Henderson wrote:
->
->>
->
->> Ah, OK. Thanks for the explanation. May be we should check the size of
->
->> the instruction while decoding the prefixes and error out once we
->
->> exceed the limit. We would not generate any IR code.
->
->
->
-> Yes.
->
->
->
-> It would not enforce a true limit of 15 bytes, since you can't know that
->
-> until you've done the rest of the decode.  But you'd be able to say that
->
-> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25
->
-> bytes is used.
->
->
->
-> Which does fix the bug.
->
->
-Yeah, that would work for 2.9 if somebody wants to put together a patch.
->
-Ensuring that all instruction fetching happens before translation side
->
-effects is a little harder, but perhaps it's also the opportunity to get
->
-rid of s->rip_offset which is a little ugly.
-How about the following?
-
-diff --git a/target/i386/translate.c b/target/i386/translate.c
-index 72c1b03a2a..67c58b8900 100644
---- a/target/i386/translate.c
-+++ b/target/i386/translate.c
-@@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State
-*env, DisasContext *s,
-     s->vex_l = 0;
-     s->vex_v = 0;
-  next_byte:
-+    /* The prefixes can atmost be 14 bytes since x86 has an upper
-+       limit of 15 bytes for the instruction */
-+    if (s->pc - pc_start > 14) {
-+        goto illegal_op;
-+    }
-     b = cpu_ldub_code(env, s->pc);
-     s->pc++;
-     /* Collect prefixes.  */
-
---
-Pranith
-
-On 23/03/2017 17:50, Pranith Kumar wrote:
->
-On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote:
->
->
->
->
->
-> On 22/03/2017 21:01, Richard Henderson wrote:
->
->>>
->
->>> Ah, OK. Thanks for the explanation. May be we should check the size of
->
->>> the instruction while decoding the prefixes and error out once we
->
->>> exceed the limit. We would not generate any IR code.
->
->>
->
->> Yes.
->
->>
->
->> It would not enforce a true limit of 15 bytes, since you can't know that
->
->> until you've done the rest of the decode.  But you'd be able to say that
->
->> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25
->
->> bytes is used.
->
->>
->
->> Which does fix the bug.
->
->
->
-> Yeah, that would work for 2.9 if somebody wants to put together a patch.
->
->  Ensuring that all instruction fetching happens before translation side
->
-> effects is a little harder, but perhaps it's also the opportunity to get
->
-> rid of s->rip_offset which is a little ugly.
->
->
-How about the following?
->
->
-diff --git a/target/i386/translate.c b/target/i386/translate.c
->
-index 72c1b03a2a..67c58b8900 100644
->
---- a/target/i386/translate.c
->
-+++ b/target/i386/translate.c
->
-@@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State
->
-*env, DisasContext *s,
->
-s->vex_l = 0;
->
-s->vex_v = 0;
->
-next_byte:
->
-+    /* The prefixes can atmost be 14 bytes since x86 has an upper
->
-+       limit of 15 bytes for the instruction */
->
-+    if (s->pc - pc_start > 14) {
->
-+        goto illegal_op;
->
-+    }
->
-b = cpu_ldub_code(env, s->pc);
->
-s->pc++;
->
-/* Collect prefixes.  */
-Please make the comment more verbose, based on Richard's remark.  We
-should apply it to 2.9.
-
-Also, QEMU usually formats comments with stars on every line.
-
-Paolo
-
-On Thu, Mar 23, 2017 at 1:37 PM, Paolo Bonzini <address@hidden> wrote:
->
->
->
-On 23/03/2017 17:50, Pranith Kumar wrote:
->
-> On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote:
->
->>
->
->>
->
->> On 22/03/2017 21:01, Richard Henderson wrote:
->
->>>>
->
->>>> Ah, OK. Thanks for the explanation. May be we should check the size of
->
->>>> the instruction while decoding the prefixes and error out once we
->
->>>> exceed the limit. We would not generate any IR code.
->
->>>
->
->>> Yes.
->
->>>
->
->>> It would not enforce a true limit of 15 bytes, since you can't know that
->
->>> until you've done the rest of the decode.  But you'd be able to say that
->
->>> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25
->
->>> bytes is used.
->
->>>
->
->>> Which does fix the bug.
->
->>
->
->> Yeah, that would work for 2.9 if somebody wants to put together a patch.
->
->>  Ensuring that all instruction fetching happens before translation side
->
->> effects is a little harder, but perhaps it's also the opportunity to get
->
->> rid of s->rip_offset which is a little ugly.
->
->
->
-> How about the following?
->
->
->
-> diff --git a/target/i386/translate.c b/target/i386/translate.c
->
-> index 72c1b03a2a..67c58b8900 100644
->
-> --- a/target/i386/translate.c
->
-> +++ b/target/i386/translate.c
->
-> @@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State
->
-> *env, DisasContext *s,
->
->      s->vex_l = 0;
->
->      s->vex_v = 0;
->
->   next_byte:
->
-> +    /* The prefixes can atmost be 14 bytes since x86 has an upper
->
-> +       limit of 15 bytes for the instruction */
->
-> +    if (s->pc - pc_start > 14) {
->
-> +        goto illegal_op;
->
-> +    }
->
->      b = cpu_ldub_code(env, s->pc);
->
->      s->pc++;
->
->      /* Collect prefixes.  */
->
->
-Please make the comment more verbose, based on Richard's remark.  We
->
-should apply it to 2.9.
->
->
-Also, QEMU usually formats comments with stars on every line.
-OK. I'll send a proper patch with updated comment.
-
-Thanks,
---
-Pranith
-
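Note on the TCG thread above: the report rests on the fact that tb_cmp() compares the physical addresses of at most two pages, while an unbounded instruction length lets one translated block touch three. The sketch below is plain page arithmetic, not QEMU code. It assumes a 4 KiB guest page, and the names are hypothetical. It counts how many pages the byte range [start, start + len) touches.

```c
#include <stdint.h>

#define GUEST_PAGE_SIZE 4096u  /* assumption: 4 KiB x86 guest pages */

/* Number of distinct guest pages touched by [start, start + len); len > 0. */
static unsigned pages_spanned(uint64_t start, uint64_t len)
{
    uint64_t first = start / GUEST_PAGE_SIZE;
    uint64_t last  = (start + len - 1) / GUEST_PAGE_SIZE;
    return (unsigned)(last - first + 1);
}
```

The quoted stop condition `(pc_ptr - pc_start) >= (TARGET_PAGE_SIZE - 32)` is only evaluated between instructions. The last instruction therefore starts at most 4063 bytes into the block. With a 15-byte cap the block is at most 4063 + 15 = 4078 bytes and touches at most two pages. With Richard's 25-byte bound it is at most 4088 bytes, still two pages. Without any cap, a block starting at offset 4080 that carries a 4200-byte instruction reaches a third page.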
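Note on the fix discussed in the same thread: Pranith's final proposal rejects an instruction while its prefixes are still being collected, before any TCG ops are emitted. This addresses Peter's objection to checking only at the end of disas_insn(). The toy decoder below models only that control flow. It is not QEMU's decoder, its prefix table is an abbreviated assumption (REX and VEX are omitted), and the function names are hypothetical. It mirrors the quoted guard `s->pc - pc_start > 14` evaluated before each byte fetch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, abbreviated table of x86 legacy prefixes. */
static int is_prefix(uint8_t b)
{
    switch (b) {
    case 0xf0:                          /* LOCK */
    case 0xf2: case 0xf3:               /* REPNE / REP */
    case 0x2e: case 0x36: case 0x3e:    /* CS / SS / DS overrides */
    case 0x26: case 0x64: case 0x65:    /* ES / FS / GS overrides */
    case 0x66: case 0x67:               /* operand / address size */
        return 1;
    default:
        return 0;
    }
}

/*
 * Skip prefixes starting at code[0]. Returns the offset of the first
 * non-prefix byte, or -1 ("illegal opcode") once more than 14 bytes have
 * been consumed, or if the buffer ends first. No output is produced for a
 * rejected instruction, which is the point of checking this early.
 */
static long skip_prefixes(const uint8_t *code, size_t len)
{
    size_t pc = 0;

    for (;;) {
        if (pc > 14)    /* same guard as the proposed patch */
            return -1;
        if (pc >= len)
            return -1;
        if (!is_prefix(code[pc]))
            return (long)pc;
        pc++;
    }
}
```

As Richard notes in the thread, this does not enforce the exact 15-byte limit, because the opcode, ModRM/SIB/displacement and immediate bytes still follow. It does bound the whole instruction to 25 bytes, which is enough to keep a block within two pages.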
diff --git a/results/classifier/007/permissions/14887122 b/results/classifier/007/permissions/14887122
deleted file mode 100644
index ae50ba435..000000000
--- a/results/classifier/007/permissions/14887122
+++ /dev/null
@@ -1,268 +0,0 @@
-permissions: 0.964
-files: 0.944
-debug: 0.934
-semantic: 0.928
-device: 0.919
-PID: 0.914
-socket: 0.914
-graphic: 0.910
-performance: 0.897
-other: 0.890
-vnc: 0.871
-network: 0.855
-boot: 0.831
-KVM: 0.814
-
-[BUG][RFC] CPR transfer Issues: Socket permissions and PID files
-
-Hello,
-
-While testing CPR transfer I encountered two issues. The first is that the 
-transfer fails when running with pidfiles due to the destination qemu process 
-attempting to create the pidfile while it is still locked by the source 
-process. The second is that the transfer fails when running with the -run-with 
-user=$USERID parameter. This is because the destination qemu process creates 
-the UNIX sockets used for the CPR transfer before dropping to the lower 
-permissioned user, which causes them to be owned by the original user. The 
-source qemu process then does not have permission to connect to it because it 
-is already running as the lesser permissioned user.
-
-Reproducing the first issue:
-
-Create a source and destination qemu instance associated with the same VM where 
-both processes have the -pidfile parameter passed on the command line. You 
-should see the following error on the command line of the second process:
-
-qemu-system-x86_64: cannot create PID file: Cannot lock pid file: Resource 
-temporarily unavailable
-
-Reproducing the second issue:
-
-Create a source and destination qemu instance associated with the same VM where 
-both processes have -run-with user=$USERID passed on the command line, where 
-$USERID is a different user from the one launching the processes. Then attempt 
-a CPR transfer using UNIX sockets for the main and cpr sockets. You should 
-receive the following error via QMP:
-{"error": {"class": "GenericError", "desc": "Failed to connect to 'cpr.sock': 
-Permission denied"}}
-
-I provided a minimal patch that works around the second issue.
-
-Thank you,
-Ben Chaney
-
----
- include/system/os-posix.h |  4 ++++
- os-posix.c                |  8 --------
- util/qemu-sockets.c       | 21 +++++++++++++++++++++
- 3 files changed, 25 insertions(+), 8 deletions(-)
-
-diff --git a/include/system/os-posix.h b/include/system/os-posix.h
-index ce5b3bccf8..2a414a914a 100644
---- a/include/system/os-posix.h
-+++ b/include/system/os-posix.h
-@@ -55,6 +55,10 @@ void os_setup_limits(void);
- void os_setup_post(void);
- int os_mlock(bool on_fault);
-
-+extern struct passwd *user_pwd;
-+extern uid_t user_uid;
-+extern gid_t user_gid;
-+
- /**
-  * qemu_alloc_stack:
-  * @sz: pointer to a size_t holding the requested usable stack size
-diff --git a/os-posix.c b/os-posix.c
-index 52925c23d3..9369b312a0 100644
---- a/os-posix.c
-+++ b/os-posix.c
-@@ -86,14 +86,6 @@ void os_set_proc_name(const char *s)
- }
-
-
--/*
-- * Must set all three of these at once.
-- * Legal combinations are unset by name by uid
-- */
--static struct passwd *user_pwd; /* NULL non-NULL NULL */
--static uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */
--static gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */
--
- /*
-  * Prepare to change user ID. user_id can be one of 3 forms:
-  * - a username, in which case user ID will be changed to its uid,
-diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
-index 77477c1cd5..987977ead9 100644
---- a/util/qemu-sockets.c
-+++ b/util/qemu-sockets.c
-@@ -871,6 +871,14 @@ static bool saddr_is_tight(UnixSocketAddress *saddr)
- #endif
- }
-
-+/*
-+ * Must set all three of these at once.
-+ * Legal combinations are unset by name by uid
-+ */
-+struct passwd *user_pwd; /* NULL non-NULL NULL */
-+uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */
-+gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */
-+
- static int unix_listen_saddr(UnixSocketAddress *saddr,
-                              int num,
-                              Error **errp)
-@@ -947,6 +955,19 @@ static int unix_listen_saddr(UnixSocketAddress *saddr,
-         error_setg_errno(errp, errno, "Failed to bind socket to %s", path);
-         goto err;
-     }
-+    if (user_pwd) {
-+        if (chown(un.sun_path, user_pwd->pw_uid, user_pwd->pw_gid) < 0) {
-+            error_setg_errno(errp, errno, "Failed to change permissions on socket %s", path);
-+            goto err;
-+        }
-+    }
-+    else if (user_uid != -1 && user_gid != -1) {
-+        if (chown(un.sun_path, user_uid, user_gid) < 0) {
-+            error_setg_errno(errp, errno, "Failed to change permissions on socket %s", path);
-+            goto err;
-+        }
-+    }
-+
-     if (listen(sock, num) < 0) {
-         error_setg_errno(errp, errno, "Failed to listen on socket");
---
-2.40.1
-
-Thank you Ben.  I appreciate you testing CPR and shaking out the bugs.
-I will study these and propose patches.
-
-My initial reaction to the pidfile issue is that the orchestration layer must
-pass a different filename when starting the destination qemu instance.  When
-using live update without containers, these types of resource conflicts in the
-global namespaces are a known issue.
-
-- Steve
-
-On 3/14/2025 2:33 PM, Chaney, Ben wrote:
-[...]
-
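The workaround in Ben's patch boils down to a bind, chown, listen sequence on the listening UNIX socket. Below is a minimal standalone sketch of that sequence, assuming Linux semantics (per unix(7), connecting to a stream socket requires write permission on the socket file). The helper name listen_unix_owned() is hypothetical and is not a QEMU API; QEMU's real code path is unix_listen_saddr() in util/qemu-sockets.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a QEMU API): create a listening UNIX stream
 * socket at @path and give the socket file to @uid/@gid, like the chown()
 * the patch adds to unix_listen_saddr().  Passing (uid_t)-1 and (gid_t)-1
 * skips the chown().  Returns a listening fd, or -1 with errno set.
 */
static int listen_unix_owned(const char *path, uid_t uid, gid_t gid)
{
    struct sockaddr_un un;
    int fd, saved_errno;

    assert(path != NULL);
    if (strlen(path) >= sizeof(un.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    memset(&un, 0, sizeof(un));
    un.sun_family = AF_UNIX;
    strcpy(un.sun_path, path);          /* length checked above */

    if (bind(fd, (struct sockaddr *)&un, sizeof(un)) < 0) {
        saved_errno = errno;
        close(fd);                      /* bind() failed: no socket file */
        errno = saved_errno;
        return -1;
    }

    /*
     * On Linux, connect() to a stream socket needs write permission on the
     * socket file.  The file was created by the still-privileged listener,
     * so hand it to the user that the connecting process runs as.
     */
    if ((uid != (uid_t)-1 || gid != (gid_t)-1) &&
        chown(path, uid, gid) < 0) {
        goto fail_unlink;
    }
    if (listen(fd, 1) < 0) {
        goto fail_unlink;
    }
    return fd;

fail_unlink:
    saved_errno = errno;
    close(fd);
    unlink(path);                       /* remove the file bind() created */
    errno = saved_errno;
    return -1;
}
```

Called from a process that still has root privileges, for example as listen_unix_owned("/run/cpr.sock", uid, gid) with an illustrative path, the socket file ends up owned by the target user, so a source process already running as that user can connect. Doing the chown() between bind() and listen() means the file never accepts connections while it still has the wrong owner.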
diff --git a/results/classifier/007/permissions/16056596 b/results/classifier/007/permissions/16056596
deleted file mode 100644
index e6f8e1f9c..000000000
--- a/results/classifier/007/permissions/16056596
+++ /dev/null
@@ -1,108 +0,0 @@
-permissions: 0.985
-other: 0.980
-semantic: 0.979
-debug: 0.978
-files: 0.975
-device: 0.973
-boot: 0.971
-graphic: 0.970
-performance: 0.970
-PID: 0.961
-socket: 0.952
-vnc: 0.946
-network: 0.940
-KVM: 0.934
-
-[BUG][powerpc] KVM Guest Boot Failure and Hang at "Booting Linux via __start()"
-
-Bug Description:
-Launching a KVM guest with 'qemu-system-ppc64' fails: the guest hangs at
-boot, and the QEMU monitor crashes.
-Reproduction Steps:
-# qemu-system-ppc64 --version
-QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f)
-Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers
-# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine pseries,accel=kvm \
-  -m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \
-  -device virtio-scsi-pci,id=scsi \
-  -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 \
-  -device scsi-hd,drive=drive0,bus=scsi.0 \
-  -netdev bridge,id=net0,br=virbr0 \
-  -device virtio-net-pci,netdev=net0 \
-  -serial pty \
-  -device virtio-balloon-pci \
-  -cpu host
-QEMU 9.2.50 monitor - type 'help' for more information
-char device redirected to /dev/pts/2 (label serial0)
-(qemu)
-(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but unavailable: IRQ_XIVE capability must be present for KVM
-Falling back to kernel-irqchip=off
-** Qemu Hang
-
-(In another ssh session)
-# screen /dev/pts/2
-Preparing to boot Linux version 6.10.4-200.fc40.ppc64le (mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 (Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 15:20:17 UTC 2024
-Detected machine type: 0000000000000101
-command line: BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M
-Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
-Calling ibm,client-architecture-support... done
-memory layout at init:
-  memory_limit : 0000000000000000 (16 MB aligned)
-  alloc_bottom : 0000000008200000
-  alloc_top    : 0000000030000000
-  alloc_top_hi : 0000000800000000
-  rmo_top      : 0000000030000000
-  ram_top      : 0000000800000000
-instantiating rtas at 0x000000002fff0000... done
-prom_hold_cpus: skipped
-copying OF device tree...
-Building dt strings...
-Building dt structure...
-Device tree strings 0x0000000008210000 -> 0x0000000008210bd0
-Device tree struct  0x0000000008220000 -> 0x0000000008230000
-Quiescing Open Firmware ...
-Booting Linux via __start() @ 0x0000000000440000 ...
-** Guest Console Hang
-
-
-Git Bisect:
-Running git bisect points to the following commit:
-# git bisect bad
-e8291ec16da80566c121c68d9112be458954d90b is the first bad commit
-commit e8291ec16da80566c121c68d9112be458954d90b (HEAD)
-Author: Nicholas Piggin <npiggin@gmail.com>
-Date:   Thu Dec 19 13:40:31 2024 +1000
-
-    target/ppc: fix timebase register reset state
-
-    (H)DEC and PURR get reset before icount does, which causes them to be
-    skewed and not match the init state. This can cause replay to not
-    match the recorded trace exactly. For DEC and HDEC this is usually not
-    noticable since they tend to get programmed before affecting the
-    target machine. PURR has been observed to cause replay bugs when
-    running Linux.
-
-    Fix this by resetting using a time of 0.
-
-    Message-ID: <20241219034035.1826173-2-npiggin@gmail.com>
-    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
-
- hw/ppc/ppc.c | 11 ++++++++---
- 1 file changed, 8 insertions(+), 3 deletions(-)
-
-
-Reverting this commit allows the guest to boot.
-Thanks,
-Misbah Anjum N
-
diff --git a/results/classifier/007/permissions/21221931 b/results/classifier/007/permissions/21221931
deleted file mode 100644
index a925c3002..000000000
--- a/results/classifier/007/permissions/21221931
+++ /dev/null
@@ -1,338 +0,0 @@
-permissions: 0.982
-other: 0.979
-network: 0.976
-device: 0.971
-debug: 0.971
-files: 0.967
-semantic: 0.967
-performance: 0.966
-socket: 0.957
-graphic: 0.948
-boot: 0.947
-PID: 0.945
-vnc: 0.944
-KVM: 0.913
-
-[BUG] qemu git error with virgl
-
-Hello,
-
-I can't start any system if I use virgl. I get the following error:
-qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion `con->gl' failed.
-./and.sh: line 27: 3337167 Aborted                 qemu-x86_64 -m 4096
--smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm
--device virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on
--device intel-hda,id=sound0,msi=on
--device hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci
--device usb-tablet,bus=xhci.0 -net nic,macaddr=52:54:00:12:34:62,model=e1000
--net tap,ifname=$INTERFACE,script=no,downscript=no
--drive file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads
-Set 'tap3' nonpersistent
-
-I have bisected the issue:
-
-towo:Defiant> git bisect good
-b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit
-commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4
-Author: Paolo Bonzini <pbonzini@redhat.com>
-Date:   Tue Oct 27 08:44:23 2020 -0400
-
-    vl: remove separate preconfig main_loop
-
-    Move post-preconfig initialization to the x-exit-preconfig. If preconfig
-    is not requested, just exit preconfig mode immediately with the QMP
-    command.
-
-    As a result, the preconfig loop will run with accel_setup_post
-    and os_setup_post restrictions (xen_restrict, chroot, etc.)
-    already done.
-
-    Reviewed-by: Igor Mammedov <imammedo@redhat.com>
-    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- include/sysemu/runstate.h |  1 -
- monitor/qmp-cmds.c        |  9 -----
- softmmu/vl.c              | 95 ++++++++++++++++++++---------------------------
- 3 files changed, 41 insertions(+), 64 deletions(-)
-
-Regards,
-
-Torsten Wohlfarth
-
-Cc'ing Gerd + patch author/reviewer.
-
-On 1/2/21 2:11 PM, Torsten Wohlfarth wrote:
-[...]
-
-On Sun, 3 Jan 2021 18:28:11 +0100
-Philippe Mathieu-Daudé <philmd@redhat.com> wrote:
-
-> Cc'ing Gerd + patch author/reviewer.
->
-> On 1/2/21 2:11 PM, Torsten Wohlfarth wrote:
->> Hello,
->>
->> I can't start any system if I use virgl. I get the following error:
->>
->> qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion
->> `con->gl' failed.
-Does the following patch fix the issue?
-  [PULL 12/55] vl: initialize displays _after_ exiting preconfiguration
-
-[...]
-
-Hi Igor,
-
-Yes, that fixes my issue.
-
-Regards, Torsten
-
-On 04.01.21 at 19:50, Igor Mammedov wrote:
-[...]
-
diff --git a/results/classifier/007/permissions/23300761 b/results/classifier/007/permissions/23300761
deleted file mode 100644
index 929fc4adf..000000000
--- a/results/classifier/007/permissions/23300761
+++ /dev/null
@@ -1,323 +0,0 @@
-permissions: 0.984
-debug: 0.978
-other: 0.963
-performance: 0.952
-PID: 0.950
-semantic: 0.950
-device: 0.932
-boot: 0.929
-socket: 0.927
-vnc: 0.926
-graphic: 0.924
-files: 0.910
-KVM: 0.897
-network: 0.879
-
-[Qemu-devel] [BUG] 216 Alerts reported by LGTM for QEMU (some might be release critical)
-
-Hi,
-LGTM reports 16 errors, 81 warnings and 119 recommendations:
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list.
-Some of them are already known (wrong format strings), others look like
-real errors:
-- several multiplication results which don't work as they should in
-contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
-32 bit!), target/i386/translate.c and other files
-- potential buffer overflows in gdbstub.c and other files
-I am afraid that the overflows in the block code are release critical,
-maybe the one in target/i386/translate.c and other errors, too.
-About half of the alerts are issues which can be fixed later.
-
-Regards
-
-Stefan
-
-On 13/07/19 19:46, Stefan Weil wrote:
-> LGTM reports 16 errors, 81 warnings and 119 recommendations:
-> https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list.
->
-> Some of them are already known (wrong format strings), others look like
-> real errors:
->
-> - several multiplication results which don't work as they should in
-> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
-> 32 bit!), target/i386/translate.c and other files
-
-m->nb_clusters here is limited by s->l2_slice_size (see for example
-handle_alloc) so I wouldn't be surprised if this is a false positive.  I
-couldn't find this particular multiplication in Coverity, but it has
-about 250 issues marked as intentional or false positive so there's
-probably a lot of overlap with what LGTM found.
-
-Paolo
-
-On 13.07.2019 at 21:42, Paolo Bonzini wrote:
-> On 13/07/19 19:46, Stefan Weil wrote:
->> LGTM reports 16 errors, 81 warnings and 119 recommendations:
->> https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list.
->>
->> Some of them are already known (wrong format strings), others look like
->> real errors:
->>
->> - several multiplication results which don't work as they should in
->> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
->> 32 bit!), target/i386/translate.c and other files
->
-> m->nb_clusters here is limited by s->l2_slice_size (see for example
-> handle_alloc) so I wouldn't be surprised if this is a false positive.  I
-> couldn't find this particular multiplication in Coverity, but it has
-> about 250 issues marked as intentional or false positive so there's
-> probably a lot of overlap with what LGTM found.
->
-> Paolo
-
-From other projects I know that there is a certain overlap between the
-results from Coverity Scan and LGTM, but it is good to have both
-analyzers, and the results from LGTM are typically quite reliable.
-
-Even if we know that there is no multiplication overflow, the code could
-be modified. Either the assigned value should use the same data type as
-the factors (possible when there is never an overflow, avoids a size
-extension), or the multiplication could use the larger data type by
-adding a type cast to one of the factors (then an overflow cannot
-happen, static code analysers and human reviewers have an easier job,
-but the multiplication costs more time).
-
-Stefan
-
-On 14.07.2019 at 15:28, Stefan Weil wrote:
-> On 13.07.2019 at 21:42, Paolo Bonzini wrote:
->> On 13/07/19 19:46, Stefan Weil wrote:
->>> LGTM reports 16 errors, 81 warnings and 119 recommendations:
->>> https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list.
->>>
->>> Some of them are already known (wrong format strings), others look like
->>> real errors:
->>>
->>> - several multiplication results which don't work as they should in
->>> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
->>> 32 bit!), target/i386/translate.c and other files
-
-Request sizes are limited to 32 bit in the generic block layer before
-they are even passed to the individual block drivers, so most if not all
-of these are going to be false positives.
-
->> m->nb_clusters here is limited by s->l2_slice_size (see for example
->> handle_alloc) so I wouldn't be surprised if this is a false positive.  I
->> couldn't find this particular multiplication in Coverity, but it has
->> about 250 issues marked as intentional or false positive so there's
->> probably a lot of overlap with what LGTM found.
->>
->> Paolo
->
-> From other projects I know that there is a certain overlap between the
-> results from Coverity Scan and LGTM, but it is good to have both
-> analyzers, and the results from LGTM are typically quite reliable.
->
-> Even if we know that there is no multiplication overflow, the code could
-> be modified. Either the assigned value should use the same data type as
-> the factors (possible when there is never an overflow, avoids a size
-> extension), or the multiplication could use the larger data type by
-> adding a type cast to one of the factors (then an overflow cannot
-> happen, static code analysers and human reviewers have an easier job,
-> but the multiplication costs more time).
-
-But if you look at the code we're talking about, you see that it's
-complaining about things where being more explicit would make things
-less readable.
-
-For example, it complains about the multiplication in this line:
-
-    s->file_size += n * s->header.cluster_size;
-
-We know that n * s->header.cluster_size fits in 32 bits, but
-s->file_size is 64 bits (and has to be 64 bits). Do you really think we
-should introduce another uint32_t variable to store the intermediate
-result? And if we cast n to uint64_t, not only might the multiplication
-cost more time, but also human readers would wonder why the result could
-become larger than 32 bits. So a cast would be misleading.
-
-
-It also complains about this line:
-
-    ret = bdrv_truncate(bs->file, (3 + l1_clusters) * s->cluster_size,
-                        PREALLOC_MODE_OFF, &local_err);
-
-Here, we don't even assign the result to a 64 bit variable, but just
-pass it to a function which takes a 64 bit parameter. Again, I don't
-think introducing additional variables for the intermediate result or
-adding casts would be an improvement of the situation.
-
-
-So I don't think this tool is good enough for us to shape our code around
-what it does and doesn't understand. It would have too much of a negative impact
-on our code. We'd rather need a way to mark false positives as such and
-move on without changing the code in such cases.
-
-Kevin
-
-On Sat, 13 Jul 2019 at 18:46, Stefan Weil <address@hidden> wrote:
-> LGTM reports 16 errors, 81 warnings and 119 recommendations:
-> https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list.
-
-I had a look at some of these before, but mostly I came
-to the conclusion that it wasn't worth trying to put the
-effort into keeping up with the site because they didn't
-seem to provide any useful way to mark things as false
-positives. Coverity has its flaws but at least you can do
-that kind of thing in its UI (it runs at about a 33% fp
-rate, I think.) "Analyzer thinks this multiply can overflow
-but in fact it's not possible" is quite a common false
-positive cause...
-
-Anyway, if you want to fish out specific issues, analyse
-whether they're false positive or real, and report them
-to the mailing list as followups to the patches which
-introduced the issue, that's probably the best way for
-us to make use of this analyzer. (That is essentially
-what I do for coverity.)
-
-thanks
--- PMM
-
-On 14.07.2019 at 19:30, Peter Maydell wrote:
-[...]
-> "Analyzer thinks this multiply can overflow
-> but in fact it's not possible" is quite a common false
-> positive cause...
-
-The analysers don't complain because a multiply can overflow.
-
-They complain because the code indicates that a larger result is
-expected, for example uint64_t = uint32_t * uint32_t. They would not
-complain for the same multiplication if it were assigned to a uint32_t.
-
-So there is a simple solution to write the code in a way which avoids
-false positives...
-
-Stefan
-
-Stefan Weil <address@hidden> writes:
-
-> On 14.07.2019 at 19:30, Peter Maydell wrote:
-> [...]
->> "Analyzer thinks this multiply can overflow
->> but in fact it's not possible" is quite a common false
->> positive cause...
->
-> The analysers don't complain because a multiply can overflow.
->
-> They complain because the code indicates that a larger result is
-> expected, for example uint64_t = uint32_t * uint32_t. They would not
-> complain for the same multiplication if it were assigned to a uint32_t.
-
-I agree this is an anti-pattern.
-
-> So there is a simple solution to write the code in a way which avoids
-> false positives...
-
-You wrote elsewhere in this thread:
-
-    Either the assigned value should use the same data type as the
-    factors (possible when there is never an overflow, avoids a size
-    extension), or the multiplication could use the larger data type by
-    adding a type cast to one of the factors (then an overflow cannot
-    happen, static code analysers and human reviewers have an easier
-    job, but the multiplication costs more time).
-
-Makes sense to me.
-
-On 7/14/19 5:30 PM, Peter Maydell wrote:
-> I had a look at some of these before, but mostly I came
-> to the conclusion that it wasn't worth trying to put the
-> effort into keeping up with the site because they didn't
-> seem to provide any useful way to mark things as false
-> positives. Coverity has its flaws but at least you can do
-> that kind of thing in its UI (it runs at about a 33% fp
-> rate, I think.)
-
-Yes, LGTM wants you to modify the source code with
-
-  /* lgtm [cpp/some-warning-code] */
-
-placed on the same line as the reported problem, which is mildly annoying in
-that you're definitely committing to LGTM in the long term.  Also, for any
-non-trivial bit of code, it will almost certainly run over 80 columns.
-
-
-r~
-
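The multiplication pattern debated in this thread, and the two rewrites Stefan suggests, can be sketched in C as below. The function names and values are hypothetical and only illustrate the types involved; they are not taken from QEMU's block drivers. The sketch assumes the usual ABIs where int is 32 bits, so uint32_t operands are multiplied as unsigned int and wrap modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers illustrating the pattern the analyzers flag
 * (uint64_t = uint32_t * uint32_t) and the two suggested rewrites.
 */

/* Flagged: the product is computed (and wraps modulo 2^32) in 32-bit
 * arithmetic; only the already-truncated result is widened to 64 bits. */
static uint64_t bytes_flagged(uint32_t nb_clusters, uint32_t cluster_size)
{
    return nb_clusters * cluster_size;
}

/* Rewrite 1: keep the result in the factors' type.  Only valid when the
 * product is known to fit, so state that precondition explicitly. */
static uint32_t bytes_narrow(uint32_t nb_clusters, uint32_t cluster_size)
{
    assert(cluster_size == 0 || nb_clusters <= UINT32_MAX / cluster_size);
    return nb_clusters * cluster_size;
}

/* Rewrite 2: widen one factor so the multiplication itself is done in
 * 64 bits and cannot overflow for any pair of 32-bit inputs. */
static uint64_t bytes_wide(uint32_t nb_clusters, uint32_t cluster_size)
{
    return (uint64_t)nb_clusters * cluster_size;
}
```

With nb_clusters = cluster_size = 65536 the true product is 2^32, so bytes_flagged() returns 0 while bytes_wide() returns 4294967296. As Kevin points out, when the operands are already bounded (for example by the block layer's 32-bit request limit), the cast in bytes_wide() mainly serves to satisfy the analyzer, and a reader may wonder why a wider result is expected.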
diff --git a/results/classifier/007/permissions/26095107 b/results/classifier/007/permissions/26095107
deleted file mode 100644
index d5075496a..000000000
--- a/results/classifier/007/permissions/26095107
+++ /dev/null
@@ -1,168 +0,0 @@
-permissions: 0.993
-debug: 0.993
-files: 0.989
-PID: 0.988
-device: 0.988
-performance: 0.987
-socket: 0.987
-boot: 0.987
-KVM: 0.985
-other: 0.979
-semantic: 0.974
-vnc: 0.972
-graphic: 0.955
-network: 0.879
-
-[Qemu-devel]  [Bug Report] vm paused after succeeding to migrate
-
-Hi, all
-I encountered a bug when trying to migrate a Windows VM.
-
-Environment information:
-host A: CPU E5620 (model WestmereEP, without the xsave flag)
-host B: CPU E5-2643 (model SandyBridgeEP, with xsave)
-
-Steps to reproduce:
-1. Start a Windows 2008 VM with -cpu host (i.e. host-passthrough).
-2. Migrate the VM to host B while cr4.OSXSAVE=0 (this succeeds).
-3. Let the VM run on host B for a while, until cr4.OSXSAVE changes to 1.
-4. Migrate the VM to host A. Migration succeeds, but the VM is paused and
-QEMU prints the following log:
-
-KVM: entry failed, hardware error 0x80000021
-
-If you're running a guest on an Intel machine without unrestricted mode
-support, the failure can be most likely due to the guest entering an invalid
-state for Intel VT. For example, the guest maybe running in big real mode
-which is not supported on less recent Intel processors.
-
-EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000
-ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20
-EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
-ES =0000 00000000 0000ffff 00009300
-CS =f000 ffff0000 0000ffff 00009b00
-SS =0000 00000000 0000ffff 00009300
-DS =0000 00000000 0000ffff 00009300
-FS =0000 00000000 0000ffff 00009300
-GS =0000 00000000 0000ffff 00009300
-LDT=0000 00000000 0000ffff 00008200
-TR =0000 00000000 0000ffff 00008b00
-GDT=     00000000 0000ffff
-IDT=     00000000 0000ffff
-CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
-DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
-DR6=00000000ffff0ff0 DR7=0000000000000400
-EFER=0000000000000000
-Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-
-I found that the problem happens when kvm_put_sregs (called by
-kvm_arch_put_registers in QEMU) returns error -22 (-EINVAL), because
-kvm_arch_vcpu_ioctl_set_sregs (in the KVM module) finds that guest_cpuid_has()
-reports no X86_FEATURE_XSAVE while cr4.OSXSAVE=1.
-So should we cancel migration when kvm_arch_put_registers returns an error?
-
-* linzhecheng (address@hidden) wrote:
->
-Hi, all
->
-I encountered a bug when I tried to migrate a Windows VM.
->
->
-Enviroment information:
->
-host A: cpu E5620(model WestmereEP without flag xsave)
->
-host B: cpu E5-2643(model SandyBridgeEP with xsave)
->
->
-The steps to reproduce are:
->
-1. Start a Windows 2008 VM with -cpu host (which means host-passthrough).
->
-2. Migrate the VM to host B when cr4.OSXSAVE=0 (successfully).
->
-3. The VM runs on host B for a while, so that cr4.OSXSAVE changes to 1.
->
-4. Then migrate the VM to host A (successfully), but the VM was paused, and QEMU
->
-printed the following log:
-Remember that migrating using -cpu host across different CPU models is NOT
-expected to work.
-
->
-KVM: entry failed, hardware error 0x80000021
->
->
-If you're running a guest on an Intel machine without unrestricted mode
->
-support, the failure can be most likely due to the guest entering an invalid
->
-state for Intel VT. For example, the guest maybe running in big real mode
->
-which is not supported on less recent Intel processors.
->
->
-EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000
->
-ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20
->
-EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
->
-ES =0000 00000000 0000ffff 00009300
->
-CS =f000 ffff0000 0000ffff 00009b00
->
-SS =0000 00000000 0000ffff 00009300
->
-DS =0000 00000000 0000ffff 00009300
->
-FS =0000 00000000 0000ffff 00009300
->
-GS =0000 00000000 0000ffff 00009300
->
-LDT=0000 00000000 0000ffff 00008200
->
-TR =0000 00000000 0000ffff 00008b00
->
-GDT=     00000000 0000ffff
->
-IDT=     00000000 0000ffff
->
-CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
->
-DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
->
-DR3=0000000000000000
->
-DR6=00000000ffff0ff0 DR7=0000000000000400
->
-EFER=0000000000000000
->
-Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00
->
-00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
->
-00
->
->
-I found that the problem happens when kvm_put_sregs (called
->
-by kvm_arch_put_registers in QEMU) returns error -22 (-EINVAL).
->
-This is because kvm_arch_vcpu_ioctl_set_sregs (in the KVM module) checks guest_cpuid_has
->
-and finds no X86_FEATURE_XSAVE, but cr4.OSXSAVE=1.
->
-So should we cancel the migration when kvm_arch_put_registers returns an error?
-It would seem good if we could make the migration fail there rather than
-hitting that KVM error.
-It looks like we need to do a bit of plumbing to convert the places that
-call it to return a bool rather than void.
-
-Dave
-
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
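Here is a minimal C sketch of the failure path discussed above. It uses a simplified model rather than KVM's or QEMU's real code. `check_sregs()` stands in for the consistency check in kvm_arch_vcpu_ioctl_set_sregs that rejects CR4.OSXSAVE=1 when the guest CPUID lacks XSAVE. `put_registers_ok()` is a hypothetical bool-returning wrapper of the kind Dave suggests, so a caller could fail the migration instead of pausing the VM. The bit positions (CR4 bit 18, CPUID.01H:ECX bit 26) are the architectural ones; the function names are illustrative only.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural bit positions (Intel SDM). */
#define CR4_OSXSAVE      (UINT64_C(1) << 18) /* CR4.OSXSAVE */
#define CPUID1_ECX_XSAVE (UINT32_C(1) << 26) /* CPUID.01H:ECX.XSAVE */

/*
 * Hypothetical model of the set_sregs consistency check: a vCPU state
 * with CR4.OSXSAVE=1 is rejected if the guest CPUID lacks XSAVE.
 */
static int check_sregs(uint64_t cr4, uint32_t guest_cpuid1_ecx)
{
    if ((cr4 & CR4_OSXSAVE) && !(guest_cpuid1_ecx & CPUID1_ECX_XSAVE)) {
        return -EINVAL; /* -22 on Linux */
    }
    return 0;
}

/*
 * Hypothetical bool-returning wrapper: report the failure to the caller
 * (which could then fail the incoming migration) instead of dropping it.
 */
static bool put_registers_ok(uint64_t cr4, uint32_t guest_cpuid1_ecx)
{
    int ret = check_sregs(cr4, guest_cpuid1_ecx);

    if (ret < 0) {
        fprintf(stderr, "put_registers failed: %d\n", ret);
        return false;
    }
    return true;
}
```

Take host A's CPUID (no XSAVE) and a vCPU state saved on host B with CR4.OSXSAVE=1. With those inputs, `check_sregs()` returns -EINVAL (-22) and `put_registers_ok()` returns false.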
diff --git a/results/classifier/007/permissions/26430026 b/results/classifier/007/permissions/26430026
deleted file mode 100644
index d081ea9ab..000000000
--- a/results/classifier/007/permissions/26430026
+++ /dev/null
@@ -1,175 +0,0 @@
-permissions: 0.937
-debug: 0.925
-KVM: 0.919
-semantic: 0.904
-device: 0.904
-performance: 0.898
-PID: 0.894
-vnc: 0.893
-files: 0.879
-graphic: 0.862
-boot: 0.841
-socket: 0.817
-other: 0.813
-network: 0.758
-
-[BUG] cxl,i386: e820 mappings may not be correct for cxl
-
-Context included below from the prior discussion:
-    - `cxl create-region` would fail on inability to allocate memory
-    - traced this down to the memory region being marked RESERVED
-    - E820 map marks the CXL fixed memory window as RESERVED
-
-
-Re: x86 errors, I found that region creation worked with this patch. (I also
-added the SRAT patches that Davidlohr posted, but I do not think they are
-relevant.)
-
-I don't think this is correct, and setting this to E820_RAM causes the
-system to fail to boot at all, but with this change `cxl create-region`
-succeeds, which suggests our e820 mappings in the i386 machine are
-incorrect.
-
-If anyone can help or has an idea of what e820 should actually be
-doing with this region, or knows whether this is correct and something else is
-failing, please help!
-
-
-diff --git a/hw/i386/pc.c b/hw/i386/pc.c
-index 566accf7e6..a5e688a742 100644
---- a/hw/i386/pc.c
-+++ b/hw/i386/pc.c
-@@ -1077,7 +1077,7 @@ void pc_memory_init(PCMachineState *pcms,
-                 memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, fw,
-                                       "cxl-fixed-memory-region", fw->size);
-                 memory_region_add_subregion(system_memory, fw->base, &fw->mr);
--                e820_add_entry(fw->base, fw->size, E820_RESERVED);
-+                e820_add_entry(fw->base, fw->size, E820_NVS);
-                 cxl_fmw_base += fw->size;
-                 cxl_resv_end = cxl_fmw_base;
-             }
-
-
-On Mon, Oct 10, 2022 at 05:32:42PM +0100, Jonathan Cameron wrote:
->
->
-> > but I'm not sure what to do with this info.  We have some proof
->
-> > that real hardware works with this no problem, and the only difference
->
-> > is that the EFI/bios/firmware is setting the memory regions as `usable`
->
-> > or `soft reserved`, which would imply the EDK2 is the blocker here
->
-> > regardless of the OS driver status.
->
-> >
->
-> > But I'd seen elsewhere you had gotten some of this working, and I'm
->
-> > failing to get anything working at the moment.  If you have any input I
->
-> > would greatly appreciate the help.
->
-> >
->
-> > QEMU config:
->
-> >
->
-> > /opt/qemu-cxl2/bin/qemu-system-x86_64 \
->
-> > -drive
->
-> > file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=d\
->
-> > -m 2G,slots=4,maxmem=4G \
->
-> > -smp 4 \
->
-> > -machine type=q35,accel=kvm,cxl=on \
->
-> > -enable-kvm \
->
-> > -nographic \
->
-> > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
->
-> > -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 \
->
-> > -object memory-backend-file,id=cxl-mem0,mem-path=/tmp/cxl-mem0,size=256M \
->
-> > -object memory-backend-file,id=lsa0,mem-path=/tmp/cxl-lsa0,size=256M \
->
-> > -device cxl-type3,bus=rp0,pmem=true,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0
->
-> > \
->
-> > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=256M
->
-> >
->
-> > I'd seen on the lists that you had seen issues with single-rp setups,
->
-> > but no combination of configuration I've tried (including all the ones
->
-> > in the docs and tests) led to a successful region creation with
->
-> > `cxl create-region`
->
->
->
-> Hmm. Let me have a play.  I've not run x86 tests for a while so
->
-> perhaps something is missing there.
->
->
->
-> I'm carrying a patch to override check_last_peer() in
->
-> cxl_port_setup_targets() as that is wrong for some combinations,
->
-> but that doesn't look like it's related to what you are seeing.
->
->
-I'm not sure if it's relevant, but it turned out I'd forgotten I'm carrying 3
->
-patches that aren't upstream (and one is a horrible hack).
->
->
-Hack:
-https://lore.kernel.org/linux-cxl/20220819094655.000005ed@huawei.com/
->
-Shouldn't affect a simple case like this...
->
->
-https://lore.kernel.org/linux-cxl/20220819093133.00006c22@huawei.com/T/#t
->
-(Dan's version)
->
->
-https://lore.kernel.org/linux-cxl/20220815154044.24733-1-Jonathan.Cameron@huawei.com/T/#t
->
->
-For writes to work you will currently need two rps (nothing on the second is
->
-fine)
->
-as we still haven't resolved if the kernel should support an HDM decoder on
->
-a host bridge with one port.  I think it should (the spec allows it); others
->
-are unconvinced.
->
->
-Note I haven't shifted over to x86 yet, so there may still be something different
->
-from
->
-arm64.
->
->
-Jonathan
->
->
-
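The following small sketch makes the e820 discussion concrete with an e820-style table lookup over a simplified in-memory table. The type codes are the standard e820 values, which QEMU's E820_RAM, E820_RESERVED, E820_ACPI, E820_NVS and E820_UNUSABLE constants also use. `struct toy_e820_entry`, `toy_e820_lookup()` and any addresses used with them are hypothetical. The sketch only shows that the type recorded for the CXL fixed memory window is what a consumer of the map sees for that range. It does not model how the kernel or the cxl tooling then treats each type.

```c
#include <stddef.h>
#include <stdint.h>

/* Standard e820 type codes (same values as QEMU's E820_* constants). */
enum {
    E820_RAM      = 1,
    E820_RESERVED = 2,
    E820_ACPI     = 3,
    E820_NVS      = 4,
    E820_UNUSABLE = 5,
};

/* Hypothetical, simplified map entry: [base, base + size). */
struct toy_e820_entry {
    uint64_t base;
    uint64_t size;
    uint32_t type;
};

/* Return the type of the entry covering addr, or 0 if no entry covers it. */
static uint32_t toy_e820_lookup(const struct toy_e820_entry *tbl, size_t n,
                                uint64_t addr)
{
    for (size_t i = 0; i < n; i++) {
        /* Subtract first so base + size cannot overflow. */
        if (addr >= tbl[i].base && addr - tbl[i].base < tbl[i].size) {
            return tbl[i].type;
        }
    }
    return 0;
}
```

In the thread, the window is added with E820_RESERVED, and the experimental patch changes only that type code to E820_NVS. In this sketch, that would change only the value returned for addresses inside the window.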
diff --git a/results/classifier/007/permissions/48245039 b/results/classifier/007/permissions/48245039
deleted file mode 100644
index 18cdaa24d..000000000
--- a/results/classifier/007/permissions/48245039
+++ /dev/null
@@ -1,540 +0,0 @@
-permissions: 0.966
-debug: 0.961
-PID: 0.954
-device: 0.953
-other: 0.953
-semantic: 0.939
-graphic: 0.935
-socket: 0.932
-boot: 0.932
-vnc: 0.926
-files: 0.924
-performance: 0.890
-KVM: 0.855
-network: 0.818
-
-[Qemu-devel] [BUG] gcov support appears to be broken
-
-Hello, according to our docs, here is the procedure that should produce a
-coverage report for the execution of the complete "make check":
-
-#./configure --enable-gcov
-#make
-#make check
-#make coverage-report
-
-It seems that the first three commands execute as expected. (For example, there are
-plenty of files generated by "make check" that would not have been generated if
-"enable-gcov" hadn't been chosen.) However, the last command complains about
-some missing files related to FP support. If those files are added (for
-example, artificially, using "touch <missing-file>"), then it starts complaining
-about some missing decodetree-generated files. Other kinds of files are
-involved too.
-
-It would be nice to have coverage support working. Could somebody please take a look,
-or explain whether I made a mistake or misunderstood our gcov support?
-
-Yours,
-Aleksandar
-
-On Mon, 5 Aug 2019 at 11:39, Aleksandar Markovic <address@hidden> wrote:
->
->
-Hello, according to our docs, here is the procedure that should produce a
->
-coverage report for the execution of the complete "make check":
->
->
-#./configure --enable-gcov
->
-#make
->
-#make check
->
-#make coverage-report
->
->
-It seems that the first three commands execute as expected. (For example, there
->
-are plenty of files generated by "make check" that would not have been
->
-generated if "enable-gcov" hadn't been chosen.) However, the last command
->
-complains about some missing files related to FP support. If those files are
->
-added (for example, artificially, using "touch <missing-file>"), then it
->
-starts complaining about some missing decodetree-generated files. Other kinds
->
-of files are involved too.
->
->
-It would be nice to have coverage support working. Could somebody please take a
->
-look, or explain whether I made a mistake or misunderstood our gcov support?
-Cc'ing Alex who's probably the closest we have to a gcov expert.
-
-(make/make check of a --enable-gcov build is in the set of things our
-Travis CI setup runs, so we do defend that part against regressions.)
-
-thanks
--- PMM
-
-Peter Maydell <address@hidden> writes:
-
->
-On Mon, 5 Aug 2019 at 11:39, Aleksandar Markovic <address@hidden> wrote:
->
->
->
-> Hello, according to our docs, here is the procedure that should produce a
->
-> coverage report for the execution of the complete "make check":
->
->
->
-> #./configure --enable-gcov
->
-> #make
->
-> #make check
->
-> #make coverage-report
->
->
->
-> It seems that the first three commands execute as expected. (For example,
->
-> there are plenty of files generated by "make check" that would not have
->
-> been generated if "enable-gcov" hadn't been chosen.) However, the
->
-> last command complains about some missing files related to FP
->
-> support. If those files are added (for example, artificially, using
->
-> "touch <missing-file>"), then it starts complaining about some missing
->
-> decodetree-generated files. Other kinds of files are involved too.
-The gcov tool is fairly noisy about missing files but that just
-indicates the tests haven't exercised those code paths. "make check"
-especially doesn't touch much of the TCG code and a chunk of floating
-point.
-
->
->
->
-> It would be nice to have coverage support working. Could somebody please
->
-> take a look, or explain whether I made a mistake or misunderstood our gcov
->
-> support?
-So your failure mode is that no report is generated at all? It's working for
-me here.
-
->
->
-Cc'ing Alex who's probably the closest we have to a gcov expert.
->
->
-(make/make check of a --enable-gcov build is in the set of things our
->
-Travis CI setup runs, so we do defend that part against regressions.)
-We defend the build but I have just checked and it seems our
-check_coverage script is currently failing:
-https://travis-ci.org/stsquad/qemu/jobs/567809808#L10328
-But as it's an after_success script it doesn't fail the build.
-
->
->
-thanks
->
--- PMM
---
-Alex Bennée
-
->
-> #./configure --enable-gcov
->
-> #make
->
-> #make check
->
-> #make coverage-report
->
->
->
-> It seems that the first three commands execute as expected. (For example,
->
-> there are plenty of files generated by "make check" that would not have
->
-> been generated if "enable-gcov" hadn't been chosen.) However, the
->
-> last command complains about some missing files related to FP
->
-So your failure mode is that no report is generated at all? It's working for
->
-me here.
-Alex, no report is generated for my test setups (in fact, "make
-coverage-report" even says that it explicitly deletes what appears to be the
-main coverage report HTML file).
-
-This is the terminal output of an unsuccessful execution of "make
-coverage-report" for the recent ToT:
-
-~/Build/qemu-TOT-TEST$ make coverage-report
-make[1]: Entering directory '/home/user/Build/qemu-TOT-TEST/slirp'
-make[1]: Nothing to be done for 'all'.
-make[1]: Leaving directory '/home/user/Build/qemu-TOT-TEST/slirp'
-        CHK version_gen.h
-  GEN     coverage-report.html
-Traceback (most recent call last):
-  File "/usr/bin/gcovr", line 1970, in <module>
-    print_html_report(covdata, options.html_details)
-  File "/usr/bin/gcovr", line 1473, in print_html_report
-    INPUT = open(data['FILENAME'], 'r')
-IOError: [Errno 2] No such file or directory: 'wrap.inc.c'
-Makefile:1048: recipe for target 
-'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' failed
-make: *** 
-[/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html] Error 1
-make: *** Deleting file 
-'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html'
-
-This instance was executed in the QEMU 3.0 source tree (so it looks like the problem
-has existed for quite some time):
-
-~/Build/qemu-3.0$ make coverage-report
-        CHK version_gen.h
-  GEN     coverage-report.html
-Traceback (most recent call last):
-  File "/usr/bin/gcovr", line 1970, in <module>
-    print_html_report(covdata, options.html_details)
-  File "/usr/bin/gcovr", line 1473, in print_html_report
-    INPUT = open(data['FILENAME'], 'r')
-IOError: [Errno 2] No such file or directory: 
-'/home/user/Build/qemu-3.0/target/openrisc/decode.inc.c'
-Makefile:992: recipe for target 
-'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' failed
-make: *** [/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html] 
-Error 1
-make: *** Deleting file 
-'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html'
-
-Fond regards,
-Aleksandar
-
-
->
-Alex Bennée
-
->
-> #./configure --enable-gcov
->
-> #make
->
-> #make check
->
-> #make coverage-report
->
->
->
-> It seems that the first three commands execute as expected. (For example,
->
-> there are plenty of files generated by "make check" that would not have
->
-> been generated if "enable-gcov" hadn't been chosen.) However, the
->
-> last command complains about some missing files related to FP
->
-So your failure mode is that no report is generated at all? It's working for
->
-me here.
-Another piece of info:
-
-~/Build/qemu-TOT-TEST$ gcov --version
-gcov (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010
-Copyright (C) 2015 Free Software Foundation, Inc.
-This is free software; see the source for copying conditions.
-There is NO warranty; not even for MERCHANTABILITY or 
-FITNESS FOR A PARTICULAR PURPOSE.
-
-:~/Build/qemu-TOT-TEST$ gcc --version
-gcc (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0
-Copyright (C) 2017 Free Software Foundation, Inc.
-This is free software; see the source for copying conditions.  There is NO
-warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-
-
-
-
-Alex, no report is generated for my test setups (in fact, "make
-coverage-report" even says that it explicitly deletes what appears to be the
-main coverage report HTML file).
-
-This is the terminal output of an unsuccessful execution of "make
-coverage-report" for the recent ToT:
-
-~/Build/qemu-TOT-TEST$ make coverage-report
-make[1]: Entering directory '/home/user/Build/qemu-TOT-TEST/slirp'
-make[1]: Nothing to be done for 'all'.
-make[1]: Leaving directory '/home/user/Build/qemu-TOT-TEST/slirp'
-        CHK version_gen.h
-  GEN     coverage-report.html
-Traceback (most recent call last):
-  File "/usr/bin/gcovr", line 1970, in <module>
-    print_html_report(covdata, options.html_details)
-  File "/usr/bin/gcovr", line 1473, in print_html_report
-    INPUT = open(data['FILENAME'], 'r')
-IOError: [Errno 2] No such file or directory: 'wrap.inc.c'
-Makefile:1048: recipe for target 
-'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' failed
-make: *** 
-[/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html] Error 1
-make: *** Deleting file 
-'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html'
-
-This instance was executed in the QEMU 3.0 source tree (so it looks like the problem
-has existed for quite some time):
-
-~/Build/qemu-3.0$ make coverage-report
-        CHK version_gen.h
-  GEN     coverage-report.html
-Traceback (most recent call last):
-  File "/usr/bin/gcovr", line 1970, in <module>
-    print_html_report(covdata, options.html_details)
-  File "/usr/bin/gcovr", line 1473, in print_html_report
-    INPUT = open(data['FILENAME'], 'r')
-IOError: [Errno 2] No such file or directory: 
-'/home/user/Build/qemu-3.0/target/openrisc/decode.inc.c'
-Makefile:992: recipe for target 
-'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' failed
-make: *** [/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html] 
-Error 1
-make: *** Deleting file 
-'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html'
-
-Fond regards,
-Aleksandar
-
-
->
-Alex Bennée
-
->
-> #./configure --enable-gcov
->
-> #make
->
-> #make check
->
-> #make coverage-report
->
->
->
-> It seems that the first three commands execute as expected. (For example,
->
-> there are plenty of files generated by "make check" that would not have
->
-> been generated if "enable-gcov" hadn't been chosen.) However, the
->
-> last command complains about some missing files related to FP
->
-So your failure mode is that no report is generated at all? It's working for
->
-me here.
-Alex, here is the thing:
-
-Seeing that my gcovr is a relatively old version (3.2, from 2014), I upgraded it from
-the git repo to the most recent 4.1 (actually, to a dev version, from the very tip
-of the tree), and "make coverage-report" started generating coverage reports.
-It did emit some error messages (totally different from the previous ones), but
-it did not stop like it used to with gcovr 3.2.
-
-Perhaps you would want to add some minimal gcov/gcovr version info to our docs
-(or at least a statement like "this was tested with such and such gcc, gcov and
-gcovr versions")?
-
-The coverage report looked fine at first glance, but it kind of disappointed me
-when I dug deeper into its content: for example, it shows very low coverage
-for our FP code (softfloat), while, in fact, we know that "make check" contains
-detailed tests of FP functionality. But this is most likely a separate
-problem of a very different nature, perhaps related to the separate git repo for
-the FP tests (testfloat) that our FP tests use as a mid-layer.
-
-I'll try out how everything works with my test examples and will let you know.
-
-Your help is greatly appreciated,
-Aleksandar
-
-Fond regards,
-Aleksandar
-
-
->
-Alex Bennée
-
-Aleksandar Markovic <address@hidden> writes:
-
->
->> #./configure --enable-gcov
->
->> #make
->
->> #make check
->
->> #make coverage-report
->
->>
->
->> It seems that the first three commands execute as expected. (For example,
->
->> there are plenty of files generated by "make check" that would not have
->
->> been generated if "enable-gcov" hadn't been chosen.) However, the
->
->> last command complains about some missing files related to FP
->
->
-> So your failure mode is that no report is generated at all? It's working for
->
-> me here.
->
->
-Alex, here is the thing:
->
->
-Seeing that my gcovr is a relatively old version (3.2, from 2014), I upgraded it from
->
-the git repo to the most recent 4.1 (actually, to a dev version, from the very
->
-tip of the tree), and "make coverage-report" started generating coverage
->
-reports. It did emit some error messages (totally different from the previous ones),
->
-but it did not stop like it used to with gcovr 3.2.
->
->
-Perhaps you would want to add some minimal gcov/gcovr version info to our
->
-docs (or at least a statement like "this was tested with such and such gcc, gcov
->
-and gcovr versions")?
->
->
-The coverage report looked fine at first glance, but it kind of
->
-disappointed me when I dug deeper into its content: for example,
->
-it shows very low coverage for our FP code (softfloat), while, in
->
-fact, we know that "make check" contains detailed tests on FP
->
-functionalities. But this is most likely a separate problem of a very
->
-different nature, perhaps related to the separate git repo for the FP tests
->
-(testfloat) that our FP tests use as a mid-layer.
-I get:
-
-68.6 %  2593 / 3782     62.2 %  1690 / 2718
-
-Which is not bad considering we don't exercise the 80 and 128 bit
-softfloat code at all (which is not shared by the re-factored 16/32/64
-bit code).
-
->
->
-I'll try out how everything works with my test examples and will let you know.
->
->
-Your help is greatly appreciated,
->
-Aleksandar
->
->
-Fond regards,
->
-Aleksandar
->
->
->
-> Alex Bennée
---
-Alex Bennée
-
->
-> it shows very low coverage for our FP code (softfloat), while, in
->
-> fact, we know that "make check" contains detailed tests on FP
->
-> functionalities. But this is most likely a separate problem of a very
->
-> different nature, perhaps related to the separate git repo for the FP tests
->
-> (testfloat) that our FP tests use as a mid-layer.
->
->
-I get:
->
->
-68.6 %  2593 / 3782     62.2 %  1690 / 2718
->
-I would expect that kind of result too.
-
-However, I get:
-
-File:   fpu/softfloat.c                 Lines:  8       3334    0.2 %
-Date:   2019-08-05 19:56:58             Branches:       3       2376    0.1 %
-
-:(
-
-OK, I'll try to figure that out, and most likely I could live with it if it is 
-an isolated problem.
-
-Thank you for your assistance in this matter,
-Aleksandar
-
->
-Which is not bad considering we don't exercise the 80 and 128 bit
->
-softfloat code at all (which is not shared by the re-factored 16/32/64
->
-bit code).
->
->
-Alex Bennée
-
->
-> it shows very low coverage for our FP code (softfloat), while, in
->
-> fact, we know that "make check" contains detailed tests on FP
->
-> functionalities. But this is most likely a separate problem of a very
->
-> different nature, perhaps related to the separate git repo for the FP tests
->
-> (testfloat) that our FP tests use as a mid-layer.
->
->
-I get:
->
->
-68.6 %  2593 / 3782     62.2 %  1690 / 2718
->
-This problem is solved too (and it was my fault).
-
-I worked with multiple versions of QEMU, and my previous low-coverage results 
-were for QEMU 3.0, and for that version the directory tests/fp did not even 
-exist. :D (<blush>)
-
-For QEMU ToT, I get now:
-
-fpu/softfloat.c         
-        68.8 %  2592 / 3770     62.3 %  1693 / 2718
-
-which is identical for all intents and purposes to your result.
-
-Yours cordially,
-Aleksandar
-
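The coverage figures quoted in this thread are plain covered/total ratios, rounded to one decimal place. A small C sketch with a hypothetical `coverage_pct()` helper reproduces them; for example, 2593 / 3782 lines gives 68.6 % and 8 / 3334 gives 0.2 %. The equally hypothetical `format_cov()` prints a value in the "68.6 %  2593 / 3782" layout Alex pasted.

```c
#include <stdio.h>

/* Percentage of covered items (lines or branches); 0 when there are none. */
static double coverage_pct(unsigned covered, unsigned total)
{
    return total ? 100.0 * (double)covered / (double)total : 0.0;
}

/* Write e.g. "68.6 %  2593 / 3782" into buf; returns snprintf's result. */
static int format_cov(char *buf, size_t len, unsigned covered, unsigned total)
{
    return snprintf(buf, len, "%.1f %%  %u / %u",
                    coverage_pct(covered, total), covered, total);
}
```

The 0.2 % figure came from the QEMU 3.0 tree, which had no tests/fp directory. The low number therefore reflects what was exercised, not a calculation problem.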
diff --git a/results/classifier/007/permissions/51610399 b/results/classifier/007/permissions/51610399
deleted file mode 100644
index 2e420e72d..000000000
--- a/results/classifier/007/permissions/51610399
+++ /dev/null
@@ -1,318 +0,0 @@
-permissions: 0.988
-debug: 0.986
-boot: 0.986
-graphic: 0.986
-other: 0.985
-semantic: 0.984
-device: 0.984
-performance: 0.983
-files: 0.981
-PID: 0.978
-socket: 0.978
-KVM: 0.975
-vnc: 0.974
-network: 0.973
-
-[BUG][powerpc] KVM Guest Boot Failure – Hangs at "Booting Linux via __start()”
-
-Bug Description:
-Encountering a boot failure when launching a KVM guest with
-qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor
-crashes.
-Reproduction Steps:
-# qemu-system-ppc64 --version
-QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f)
-Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers
-# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 \
-  -machine pseries,accel=kvm \
-  -m 32768 \
-  -smp 32,sockets=1,cores=32,threads=1 \
-  -nographic \
-  -device virtio-scsi-pci,id=scsi \
-  -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 \
-  -device scsi-hd,drive=drive0,bus=scsi.0 \
-  -netdev bridge,id=net0,br=virbr0 \
-  -device virtio-net-pci,netdev=net0 \
-  -serial pty \
-  -device virtio-balloon-pci \
-  -cpu host
-QEMU 9.2.50 monitor - type 'help' for more information
-char device redirected to /dev/pts/2 (label serial0)
-(qemu)
-(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but
-unavailable: IRQ_XIVE capability must be present for KVM
-Falling back to kernel-irqchip=off
-** Qemu Hang
-
-(In another ssh session)
-# screen /dev/pts/2
-Preparing to boot Linux version 6.10.4-200.fc40.ppc64le
-(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801
-(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11
-15:20:17 UTC 2024
-Detected machine type: 0000000000000101
-command line:
-BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le
-root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M
-Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
-Calling ibm,client-architecture-support... done
-memory layout at init:
-  memory_limit : 0000000000000000 (16 MB aligned)
-  alloc_bottom : 0000000008200000
-  alloc_top    : 0000000030000000
-  alloc_top_hi : 0000000800000000
-  rmo_top      : 0000000030000000
-  ram_top      : 0000000800000000
-instantiating rtas at 0x000000002fff0000... done
-prom_hold_cpus: skipped
-copying OF device tree...
-Building dt strings...
-Building dt structure...
-Device tree strings 0x0000000008210000 -> 0x0000000008210bd0
-Device tree struct  0x0000000008220000 -> 0x0000000008230000
-Quiescing Open Firmware ...
-Booting Linux via __start() @ 0x0000000000440000 ...
-** Guest Console Hang
-
-
-Git Bisect:
-Performing git bisect points to the following patch:
-# git bisect bad
-e8291ec16da80566c121c68d9112be458954d90b is the first bad commit
-commit e8291ec16da80566c121c68d9112be458954d90b (HEAD)
-Author: Nicholas Piggin <npiggin@gmail.com>
-Date:   Thu Dec 19 13:40:31 2024 +1000
-
-    target/ppc: fix timebase register reset state
-
-    (H)DEC and PURR get reset before icount does, which
-    causes them to be skewed and not match the init state.
-    This can cause replay to not match the recorded trace
-    exactly. For DEC and HDEC this is usually not noticable
-    since they tend to get programmed before affecting the
-    target machine. PURR has been observed to cause replay
-    bugs when running Linux.
-
-    Fix this by resetting using a time of 0.
-
-    Message-ID: <20241219034035.1826173-2-npiggin@gmail.com>
-    Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
-
- hw/ppc/ppc.c | 11 ++++++++---
- 1 file changed, 8 insertions(+), 3 deletions(-)
-
-
-Reverting the patch helps boot the guest.
-Thanks,
-Misbah Anjum N
-
-Thanks for the report.
-
-Tricky problem. A secondary CPU is hanging before it is started by the
-primary via an RTAS call.
-
-That secondary keeps calling kvm_cpu_exec(), which keeps exiting out
-early with EXCP_HLT because kvm_arch_process_async_events() returns
-true because that CPU has ->halted=1. That just goes around the run
-loop because there is an interrupt pending (DEC).
-
-So it never runs. It also never releases the BQL, and another CPU,
-the primary which is actually supposed to be running, is stuck in
-spapr_set_all_lpcrs() in run_on_cpu() waiting for the BQL.
-
-I think this patch just exposes the bug, by causing the interrupt,
-although I'm not quite sure why it was okay previously (-ve decrementer
-values should be causing a timer exception too). The timer exception
-should not be taken as an interrupt by those secondary CPUs, and it
-isn't, because it is masked until set_all_lpcrs sets an LPCR value
-that enables powersave wakeup on decrementer interrupt.
-
-The start_powered_off state just sets ->halted, which makes it look
-like a powersaving state. Logically I think it's not the same thing
-as far as spapr goes. I don't know why start_powered_off only sets
-->halted, and not ->stop/stopped as well.
-
-Not sure how best to solve it cleanly. I'll send a revert if I can't
-get something working soon.
-
-Thanks,
-Nick
-
-On Tue Mar 18, 2025 at 7:09 AM AEST, misanjum wrote:
->
-Bug Description:
->
-Encountering a boot failure when launching a KVM guest with
->
-qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor
->
-crashes.
->
->
->
-Reproduction Steps:
->
-# qemu-system-ppc64 --version
->
-QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f)
->
-Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers
->
->
-# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 \
->
-  -machine pseries,accel=kvm \
->
-  -m 32768 \
->
-  -smp 32,sockets=1,cores=32,threads=1 \
->
-  -nographic \
->
-  -device virtio-scsi-pci,id=scsi \
->
->
-  -drive file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 \
->
-  -device scsi-hd,drive=drive0,bus=scsi.0 \
->
-  -netdev bridge,id=net0,br=virbr0 \
->
-  -device virtio-net-pci,netdev=net0 \
->
-  -serial pty \
->
-  -device virtio-balloon-pci \
->
-  -cpu host
->
-QEMU 9.2.50 monitor - type 'help' for more information
->
-char device redirected to /dev/pts/2 (label serial0)
->
-(qemu)
->
-(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but
->
-unavailable: IRQ_XIVE capability must be present for KVM
->
-Falling back to kernel-irqchip=off
->
-** Qemu Hang
->
->
-(In another ssh session)
->
-# screen /dev/pts/2
->
-Preparing to boot Linux version 6.10.4-200.fc40.ppc64le
->
-(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801
->
-(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11
->
-15:20:17 UTC 2024
->
-Detected machine type: 0000000000000101
->
-command line:
->
-BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le
->
-root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M
->
-Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
->
-Calling ibm,client-architecture-support... done
->
-memory layout at init:
->
-memory_limit : 0000000000000000 (16 MB aligned)
->
-alloc_bottom : 0000000008200000
->
-alloc_top    : 0000000030000000
->
-alloc_top_hi : 0000000800000000
->
-rmo_top      : 0000000030000000
->
-ram_top      : 0000000800000000
->
-instantiating rtas at 0x000000002fff0000... done
->
-prom_hold_cpus: skipped
->
-copying OF device tree...
->
-Building dt strings...
->
-Building dt structure...
->
-Device tree strings 0x0000000008210000 -> 0x0000000008210bd0
->
-Device tree struct  0x0000000008220000 -> 0x0000000008230000
->
-Quiescing Open Firmware ...
->
-Booting Linux via __start() @ 0x0000000000440000 ...
->
-** Guest Console Hang
->
->
->
-Git Bisect:
->
-Performing git bisect points to the following patch:
->
-# git bisect bad
->
-e8291ec16da80566c121c68d9112be458954d90b is the first bad commit
->
-commit e8291ec16da80566c121c68d9112be458954d90b (HEAD)
->
-Author: Nicholas Piggin <npiggin@gmail.com>
->
-Date:   Thu Dec 19 13:40:31 2024 +1000
->
->
-target/ppc: fix timebase register reset state
->
->
-(H)DEC and PURR get reset before icount does,
->
-which causes them to be skewed and not match the
->
-init state. This can cause replay to not match the
->
-recorded trace exactly. For DEC and HDEC this is
->
-usually not noticable since they tend to get
->
-programmed before affecting the target machine.
->
-PURR has been observed to cause replay bugs when
->
-running Linux.
->
->
-Fix this by resetting using a time of 0.
->
->
-Message-ID: <20241219034035.1826173-2-npiggin@gmail.com>
->
-Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
->
->
-hw/ppc/ppc.c | 11 ++++++++---
->
-1 file changed, 8 insertions(+), 3 deletions(-)
->
->
->
-Reverting the patch helps boot the guest.
->
-Thanks,
->
-Misbah Anjum N
-
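The next deleted file (55247116) archives an April 2017 qemu-devel thread. It reports that xen_invalidate_map_cache() invalidates only the first-level entry of each mapcache bucket and never walks the entries chained through entry->next. The C sketch below is a hypothetical, simplified illustration of the change Stefano Stabellini asks for there ("go through the list for each element of the array").

The Entry struct, its fields and invalidate_all() are assumptions made for illustration. They are not QEMU's actual MapCacheEntry or xen-mapcache.c code. Locked entries are skipped, following the thread's point that destroying a locked mapping while a DMA is still in flight would crash QEMU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a mapcache bucket entry.
 * Field names are illustrative assumptions, not QEMU's MapCacheEntry. */
typedef struct Entry {
    unsigned long paddr_index; /* guest-physical bucket this entry maps */
    void *vaddr_base;          /* host mapping; NULL once invalidated */
    unsigned int lock;         /* > 0 while a user (e.g. DMA) holds it */
    struct Entry *next;        /* further entries chained in the same bucket */
} Entry;

/*
 * Invalidate every unlocked entry in every bucket, walking the
 * entry->next chain instead of stopping at the first-level entry.
 * Locked entries are left alone, since tearing down a mapping that an
 * in-flight DMA still uses is unsafe.
 * Returns the number of locked entries that had to be skipped.
 */
static size_t invalidate_all(Entry *buckets, size_t nr_buckets)
{
    size_t skipped_locked = 0;

    assert(buckets != NULL || nr_buckets == 0);

    for (size_t i = 0; i < nr_buckets; i++) {
        for (Entry *e = &buckets[i]; e != NULL; e = e->next) {
            if (e->vaddr_base == NULL) {
                continue;            /* nothing mapped here */
            }
            if (e->lock > 0) {
                skipped_locked++;    /* still in use: cannot drop safely */
                continue;
            }
            /* Real code would also munmap() the host mapping here. */
            e->vaddr_base = NULL;
            e->paddr_index = 0;
        }
    }
    return skipped_locked;
}
```

In this sketch, a non-zero return value loosely corresponds to the situation the DPRINTF warning discussed in the thread is meant to flag: locked mappings still present when invalidation runs.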
diff --git a/results/classifier/007/permissions/55247116 b/results/classifier/007/permissions/55247116
deleted file mode 100644
index cbe7dfafd..000000000
--- a/results/classifier/007/permissions/55247116
+++ /dev/null
@@ -1,1320 +0,0 @@
-permissions: 0.946
-other: 0.945
-debug: 0.941
-performance: 0.938
-graphic: 0.933
-PID: 0.929
-socket: 0.929
-semantic: 0.928
-device: 0.919
-boot: 0.918
-network: 0.916
-vnc: 0.916
-files: 0.912
-KVM: 0.894
-
-[Qemu-devel]  [RFC/BUG] xen-mapcache: buggy invalidate map cache?
-
-Hi,
-
-In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
-instead of first level entry (if map to rom other than guest memory
-comes first), while in xen_invalidate_map_cache(), when VM ballooned
-out memory, qemu did not invalidate cache entries in linked
-list(entry->next), so when VM balloon back in memory, gfns probably
-mapped to different mfns, thus if guest asks device to DMA to these
-GPA, qemu may DMA to stale MFNs.
-
-So I think in xen_invalidate_map_cache() linked lists should also be
-checked and invalidated.
-
-What’s your opinion? Is this a bug? Is my analysis correct?
-
-On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-Hi,
->
->
-In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
-instead of first level entry (if map to rom other than guest memory
->
-comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
-out memory, qemu did not invalidate cache entries in linked
->
-list(entry->next), so when VM balloon back in memory, gfns probably
->
-mapped to different mfns, thus if guest asks device to DMA to these
->
-GPA, qemu may DMA to stale MFNs.
->
->
-So I think in xen_invalidate_map_cache() linked lists should also be
->
-checked and invalidated.
->
->
-What’s your opinion? Is this a bug? Is my analysis correct?
-Added Jun Nakajima and Alexander Graf
-
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> Hi,
->
->
->
-> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
-> instead of first level entry (if map to rom other than guest memory
->
-> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
-> out memory, qemu did not invalidate cache entries in linked
->
-> list(entry->next), so when VM balloon back in memory, gfns probably
->
-> mapped to different mfns, thus if guest asks device to DMA to these
->
-> GPA, qemu may DMA to stale MFNs.
->
->
->
-> So I think in xen_invalidate_map_cache() linked lists should also be
->
-> checked and invalidated.
->
->
->
-> What’s your opinion? Is this a bug? Is my analysis correct?
->
->
-Added Jun Nakajima and Alexander Graf
-And correct Stefano Stabellini's email address.
-
-On Mon, 10 Apr 2017 00:36:02 +0800
-hrg <address@hidden> wrote:
-
-Hi,
-
->
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
->> Hi,
->
->>
->
->> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
->> instead of first level entry (if map to rom other than guest memory
->
->> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
->> out memory, qemu did not invalidate cache entries in linked
->
->> list(entry->next), so when VM balloon back in memory, gfns probably
->
->> mapped to different mfns, thus if guest asks device to DMA to these
->
->> GPA, qemu may DMA to stale MFNs.
->
->>
->
->> So I think in xen_invalidate_map_cache() linked lists should also be
->
->> checked and invalidated.
->
->>
->
->> What’s your opinion? Is this a bug? Is my analysis correct?
->
->
->
-> Added Jun Nakajima and Alexander Graf
->
-And correct Stefano Stabellini's email address.
-There is in fact a real xen-mapcache corruption issue. I encountered
-it a few months ago while experimenting with Q35 support on Xen. Q35 emulation
-uses an AHCI controller by default, with NCQ mode enabled. The issue can
-be reproduced (somewhat) easily there, though normal i440 emulation
-might also allow reproducing it, using dedicated test
-code on the guest side. With Q35+NCQ the issue can be reproduced "as is".
-
-The issue occurs when a guest domain performs intensive disk I/O, e.g. while
-the guest OS is booting. QEMU crashes with a "Bad ram offset 980aa000"
-message logged, where the address is different each time. The hard part of
-this issue is its very low reproducibility rate.
-
-The corruption happens when there are multiple I/O commands in the NCQ queue.
-Overlapping emulated DMA operations are then in flight, and QEMU performs a
-sequence of mapcache actions that can execute in the "wrong" order, leaving
-the xen-mapcache inconsistent, so a bad address from the wrong
-entry is returned.
-
-The nasty part is that a QEMU crash with "Bad ram offset" is actually a
-relatively good outcome, in the sense that the error is caught. There might
-be a much worse (artificial) situation where the returned address looks valid
-but points to different mapped memory.
-
-The fix itself is not hard (e.g. an additional checked field in MapCacheEntry),
-but some reliable way to test it is needed, given the low
-reproducibility rate.
-
-Regards,
-Alex
-
-On Mon, 10 Apr 2017, hrg wrote:
->
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
->> Hi,
->
->>
->
->> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
->> instead of first level entry (if map to rom other than guest memory
->
->> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
->> out memory, qemu did not invalidate cache entries in linked
->
->> list(entry->next), so when VM balloon back in memory, gfns probably
->
->> mapped to different mfns, thus if guest asks device to DMA to these
->
->> GPA, qemu may DMA to stale MFNs.
->
->>
->
->> So I think in xen_invalidate_map_cache() linked lists should also be
->
->> checked and invalidated.
->
->>
->
->> What’s your opinion? Is this a bug? Is my analysis correct?
-Yes, you are right. We need to go through the list for each element of
-the array in xen_invalidate_map_cache. Can you come up with a patch?
-
-On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-On Mon, 10 Apr 2017, hrg wrote:
->
-> On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> >> Hi,
->
-> >>
->
-> >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
-> >> instead of first level entry (if map to rom other than guest memory
->
-> >> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
-> >> out memory, qemu did not invalidate cache entries in linked
->
-> >> list(entry->next), so when VM balloon back in memory, gfns probably
->
-> >> mapped to different mfns, thus if guest asks device to DMA to these
->
-> >> GPA, qemu may DMA to stale MFNs.
->
-> >>
->
-> >> So I think in xen_invalidate_map_cache() linked lists should also be
->
-> >> checked and invalidated.
->
-> >>
->
-> >> What’s your opinion? Is this a bug? Is my analysis correct?
->
->
-Yes, you are right. We need to go through the list for each element of
->
-the array in xen_invalidate_map_cache. Can you come up with a patch?
-I spoke too soon. In the regular case there should be no locked mappings
-when xen_invalidate_map_cache is called (see the DPRINTF warning at the
-beginning of the functions). Without locked mappings, there should never
-be more than one element in each list (see xen_map_cache_unlocked:
-entry->lock == true is a necessary condition to append a new entry to
-the list, otherwise it is just remapped).
-
-Can you confirm that what you are seeing are locked mappings
-when xen_invalidate_map_cache is called? To find out, enable the DPRINTF
-by turning it into a printf or by defining MAPCACHE_DEBUG.
-
-On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
-<address@hidden> wrote:
->
-On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-> On Mon, 10 Apr 2017, hrg wrote:
->
-> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> > >> Hi,
->
-> > >>
->
-> > >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
-> > >> instead of first level entry (if map to rom other than guest memory
->
-> > >> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
-> > >> out memory, qemu did not invalidate cache entries in linked
->
-> > >> list(entry->next), so when VM balloon back in memory, gfns probably
->
-> > >> mapped to different mfns, thus if guest asks device to DMA to these
->
-> > >> GPA, qemu may DMA to stale MFNs.
->
-> > >>
->
-> > >> So I think in xen_invalidate_map_cache() linked lists should also be
->
-> > >> checked and invalidated.
->
-> > >>
->
-> > >> What’s your opinion? Is this a bug? Is my analysis correct?
->
->
->
-> Yes, you are right. We need to go through the list for each element of
->
-> the array in xen_invalidate_map_cache. Can you come up with a patch?
->
->
-I spoke too soon. In the regular case there should be no locked mappings
->
-when xen_invalidate_map_cache is called (see the DPRINTF warning at the
->
-beginning of the functions). Without locked mappings, there should never
->
-be more than one element in each list (see xen_map_cache_unlocked:
->
-entry->lock == true is a necessary condition to append a new entry to
->
-the list, otherwise it is just remapped).
->
->
-Can you confirm that what you are seeing are locked mappings
->
-when xen_invalidate_map_cache is called? To find out, enable the DPRINTF
->
-by turning it into a printf or by defining MAPCACHE_DEBUG.
-In fact, I think the DPRINTF above is incorrect too. In
-pci_add_option_rom(), rtl8139 rom is locked mapped in
-pci_add_option_rom->memory_region_get_ram_ptr (after
-memory_region_init_ram). So actually I think we should remove the
-DPRINTF warning as it is normal.
-
-On Tue, 11 Apr 2017, hrg wrote:
->
-On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
->
-<address@hidden> wrote:
->
-> On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
->> On Mon, 10 Apr 2017, hrg wrote:
->
->> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
->> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
->> > >> Hi,
->
->> > >>
->
->> > >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
->> > >> instead of first level entry (if map to rom other than guest memory
->
->> > >> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
->> > >> out memory, qemu did not invalidate cache entries in linked
->
->> > >> list(entry->next), so when VM balloon back in memory, gfns probably
->
->> > >> mapped to different mfns, thus if guest asks device to DMA to these
->
->> > >> GPA, qemu may DMA to stale MFNs.
->
->> > >>
->
->> > >> So I think in xen_invalidate_map_cache() linked lists should also be
->
->> > >> checked and invalidated.
->
->> > >>
->
->> > >> What’s your opinion? Is this a bug? Is my analysis correct?
->
->>
->
->> Yes, you are right. We need to go through the list for each element of
->
->> the array in xen_invalidate_map_cache. Can you come up with a patch?
->
->
->
-> I spoke too soon. In the regular case there should be no locked mappings
->
-> when xen_invalidate_map_cache is called (see the DPRINTF warning at the
->
-> beginning of the functions). Without locked mappings, there should never
->
-> be more than one element in each list (see xen_map_cache_unlocked:
->
-> entry->lock == true is a necessary condition to append a new entry to
->
-> the list, otherwise it is just remapped).
->
->
->
-> Can you confirm that what you are seeing are locked mappings
->
-> when xen_invalidate_map_cache is called? To find out, enable the DPRINTF
->
-> by turning it into a printf or by defining MAPCACHE_DEBUG.
->
->
-In fact, I think the DPRINTF above is incorrect too. In
->
-pci_add_option_rom(), rtl8139 rom is locked mapped in
->
-pci_add_option_rom->memory_region_get_ram_ptr (after
->
-memory_region_init_ram). So actually I think we should remove the
->
-DPRINTF warning as it is normal.
-Let me explain why the DPRINTF warning is there: emulated dma operations
-can involve locked mappings. Once a dma operation completes, the related
-mapping is unlocked and can be safely destroyed. But if we destroy a
-locked mapping in xen_invalidate_map_cache, while a dma is still
-ongoing, QEMU will crash. We cannot handle that case.
-
-However, the scenario you described is different. It has nothing to do
-with DMA. It looks like pci_add_option_rom calls
-memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
-locked mapping and it is never unlocked or destroyed.
-
-It looks like "ptr" is not used after pci_add_option_rom returns. Does
-the appended patch fix the problem you are seeing? For the proper fix, I
-think we probably need some sort of memory_region_unmap wrapper or maybe
-a call to address_space_unmap.
-
-
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c
-index e6b08e1..04f98b7 100644
---- a/hw/pci/pci.c
-+++ b/hw/pci/pci.c
-@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool 
-is_default_rom,
-     }
- 
-     pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
-+    xen_invalidate_map_cache_entry(ptr);
- }
- 
- static void pci_del_option_rom(PCIDevice *pdev)
-
-On Tue, 11 Apr 2017 15:32:09 -0700 (PDT)
-Stefano Stabellini <address@hidden> wrote:
-
->
-On Tue, 11 Apr 2017, hrg wrote:
->
-> On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
->
-> <address@hidden> wrote:
->
-> > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-> >> On Mon, 10 Apr 2017, hrg wrote:
->
-> >> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> >> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> >> > >> Hi,
->
-> >> > >>
->
-> >> > >> In xen_map_cache_unlocked(), map to guest memory maybe in
->
-> >> > >> entry->next instead of first level entry (if map to rom other than
->
-> >> > >> guest memory comes first), while in xen_invalidate_map_cache(),
->
-> >> > >> when VM ballooned out memory, qemu did not invalidate cache entries
->
-> >> > >> in linked list(entry->next), so when VM balloon back in memory,
->
-> >> > >> gfns probably mapped to different mfns, thus if guest asks device
->
-> >> > >> to DMA to these GPA, qemu may DMA to stale MFNs.
->
-> >> > >>
->
-> >> > >> So I think in xen_invalidate_map_cache() linked lists should also be
->
-> >> > >> checked and invalidated.
->
-> >> > >>
->
-> >> > >> What’s your opinion? Is this a bug? Is my analysis correct?
->
-> >>
->
-> >> Yes, you are right. We need to go through the list for each element of
->
-> >> the array in xen_invalidate_map_cache. Can you come up with a patch?
->
-> >
->
-> > I spoke too soon. In the regular case there should be no locked mappings
->
-> > when xen_invalidate_map_cache is called (see the DPRINTF warning at the
->
-> > beginning of the functions). Without locked mappings, there should never
->
-> > be more than one element in each list (see xen_map_cache_unlocked:
->
-> > entry->lock == true is a necessary condition to append a new entry to
->
-> > the list, otherwise it is just remapped).
->
-> >
->
-> > Can you confirm that what you are seeing are locked mappings
->
-> > when xen_invalidate_map_cache is called? To find out, enable the DPRINTF
->
-> > by turning it into a printf or by defining MAPCACHE_DEBUG.
->
->
->
-> In fact, I think the DPRINTF above is incorrect too. In
->
-> pci_add_option_rom(), rtl8139 rom is locked mapped in
->
-> pci_add_option_rom->memory_region_get_ram_ptr (after
->
-> memory_region_init_ram). So actually I think we should remove the
->
-> DPRINTF warning as it is normal.
->
->
-Let me explain why the DPRINTF warning is there: emulated dma operations
->
-can involve locked mappings. Once a dma operation completes, the related
->
-mapping is unlocked and can be safely destroyed. But if we destroy a
->
-locked mapping in xen_invalidate_map_cache, while a dma is still
->
-ongoing, QEMU will crash. We cannot handle that case.
->
->
-However, the scenario you described is different. It has nothing to do
->
-with DMA. It looks like pci_add_option_rom calls
->
-memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
->
-locked mapping and it is never unlocked or destroyed.
->
->
-It looks like "ptr" is not used after pci_add_option_rom returns. Does
->
-the appended patch fix the problem you are seeing? For the proper fix, I
->
-think we probably need some sort of memory_region_unmap wrapper or maybe
->
-a call to address_space_unmap.
-Hmm, for some reason my message to the Xen-devel list got rejected but was sent
-to Qemu-devel instead, without any notice. Sorry if I'm missing something
-obvious as a list newbie.
-
-Stefano, hrg,
-
-There is an issue with inconsistency between the list of normal MapCacheEntry's
-and their 'reverse' counterparts - MapCacheRev's in locked_entries.
-When bad situation happens, there are multiple (locked) MapCacheEntry
-entries in the bucket's linked list along with a number of MapCacheRev's. And
-when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the
-first list and calculates a wrong pointer from it which may then be caught with
-the "Bad RAM offset" check (or not). Mapcache invalidation might be related to
-this issue as well I think.
-
-I'll try to provide a test code which can reproduce the issue from the
-guest side using an emulated IDE controller, though it's much simpler to achieve
-this result with an AHCI controller using multiple NCQ I/O commands. So far I've
-seen this issue only with Windows 7 (and above) guest on AHCI, but any block I/O
-DMA should be enough I think.
-
-On 2017/4/12 14:17, Alexey G wrote:
-On Tue, 11 Apr 2017 15:32:09 -0700 (PDT)
-Stefano Stabellini <address@hidden> wrote:
-On Tue, 11 Apr 2017, hrg wrote:
-On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
-<address@hidden> wrote:
-On Mon, 10 Apr 2017, Stefano Stabellini wrote:
-On Mon, 10 Apr 2017, hrg wrote:
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
-On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
-Hi,
-
-In xen_map_cache_unlocked(), map to guest memory maybe in
-entry->next instead of first level entry (if map to rom other than
-guest memory comes first), while in xen_invalidate_map_cache(),
-when VM ballooned out memory, qemu did not invalidate cache entries
-in linked list(entry->next), so when VM balloon back in memory,
-gfns probably mapped to different mfns, thus if guest asks device
-to DMA to these GPA, qemu may DMA to stale MFNs.
-
-So I think in xen_invalidate_map_cache() linked lists should also be
-checked and invalidated.
-
-What’s your opinion? Is this a bug? Is my analysis correct?
-Yes, you are right. We need to go through the list for each element of
-the array in xen_invalidate_map_cache. Can you come up with a patch?
-I spoke too soon. In the regular case there should be no locked mappings
-when xen_invalidate_map_cache is called (see the DPRINTF warning at the
-beginning of the functions). Without locked mappings, there should never
-be more than one element in each list (see xen_map_cache_unlocked:
-entry->lock == true is a necessary condition to append a new entry to
-the list, otherwise it is just remapped).
-
-Can you confirm that what you are seeing are locked mappings
-when xen_invalidate_map_cache is called? To find out, enable the DPRINTF
-by turning it into a printf or by defining MAPCACHE_DEBUG.
-In fact, I think the DPRINTF above is incorrect too. In
-pci_add_option_rom(), rtl8139 rom is locked mapped in
-pci_add_option_rom->memory_region_get_ram_ptr (after
-memory_region_init_ram). So actually I think we should remove the
-DPRINTF warning as it is normal.
-Let me explain why the DPRINTF warning is there: emulated dma operations
-can involve locked mappings. Once a dma operation completes, the related
-mapping is unlocked and can be safely destroyed. But if we destroy a
-locked mapping in xen_invalidate_map_cache, while a dma is still
-ongoing, QEMU will crash. We cannot handle that case.
-
-However, the scenario you described is different. It has nothing to do
-with DMA. It looks like pci_add_option_rom calls
-memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
-locked mapping and it is never unlocked or destroyed.
-
-It looks like "ptr" is not used after pci_add_option_rom returns. Does
-the appended patch fix the problem you are seeing? For the proper fix, I
-think we probably need some sort of memory_region_unmap wrapper or maybe
-a call to address_space_unmap.
-Hmm, for some reason my message to the Xen-devel list got rejected but was sent
-to Qemu-devel instead, without any notice. Sorry if I'm missing something
-obvious as a list newbie.
-
-Stefano, hrg,
-
-There is an issue with inconsistency between the list of normal MapCacheEntry's
-and their 'reverse' counterparts - MapCacheRev's in locked_entries.
-When bad situation happens, there are multiple (locked) MapCacheEntry
-entries in the bucket's linked list along with a number of MapCacheRev's. And
-when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the
-first list and calculates a wrong pointer from it which may then be caught with
-the "Bad RAM offset" check (or not). Mapcache invalidation might be related to
-this issue as well I think.
-
-I'll try to provide a test code which can reproduce the issue from the
-guest side using an emulated IDE controller, though it's much simpler to achieve
-this result with an AHCI controller using multiple NCQ I/O commands. So far I've
-seen this issue only with Windows 7 (and above) guest on AHCI, but any block I/O
-DMA should be enough I think.
-Yes, I think there may be other bugs lurking, considering the complexity, 
-though we need to reproduce it if we want to delve into it.
-
-On Wed, 12 Apr 2017, Alexey G wrote:
->
-On Tue, 11 Apr 2017 15:32:09 -0700 (PDT)
->
-Stefano Stabellini <address@hidden> wrote:
->
->
-> On Tue, 11 Apr 2017, hrg wrote:
->
-> > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
->
-> > <address@hidden> wrote:
->
-> > > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-> > >> On Mon, 10 Apr 2017, hrg wrote:
->
-> > >> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> > >> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> > >> > >> Hi,
->
-> > >> > >>
->
-> > >> > >> In xen_map_cache_unlocked(), map to guest memory maybe in
->
-> > >> > >> entry->next instead of first level entry (if map to rom other than
->
-> > >> > >> guest memory comes first), while in xen_invalidate_map_cache(),
->
-> > >> > >> when VM ballooned out memory, qemu did not invalidate cache
->
-> > >> > >> entries
->
-> > >> > >> in linked list(entry->next), so when VM balloon back in memory,
->
-> > >> > >> gfns probably mapped to different mfns, thus if guest asks device
->
-> > >> > >> to DMA to these GPA, qemu may DMA to stale MFNs.
->
-> > >> > >>
->
-> > >> > >> So I think in xen_invalidate_map_cache() linked lists should also
->
-> > >> > >> be
->
-> > >> > >> checked and invalidated.
->
-> > >> > >>
->
-> > >> > >> What’s your opinion? Is this a bug? Is my analysis correct?
->
-> > >>
->
-> > >> Yes, you are right. We need to go through the list for each element of
->
-> > >> the array in xen_invalidate_map_cache. Can you come up with a patch?
->
-> > >
->
-> > > I spoke too soon. In the regular case there should be no locked mappings
->
-> > > when xen_invalidate_map_cache is called (see the DPRINTF warning at the
->
-> > > beginning of the functions). Without locked mappings, there should never
->
-> > > be more than one element in each list (see xen_map_cache_unlocked:
->
-> > > entry->lock == true is a necessary condition to append a new entry to
->
-> > > the list, otherwise it is just remapped).
->
-> > >
->
-> > > Can you confirm that what you are seeing are locked mappings
->
-> > > when xen_invalidate_map_cache is called? To find out, enable the DPRINTF
->
-> > > by turning it into a printf or by defining MAPCACHE_DEBUG.
->
-> >
->
-> > In fact, I think the DPRINTF above is incorrect too. In
->
-> > pci_add_option_rom(), rtl8139 rom is locked mapped in
->
-> > pci_add_option_rom->memory_region_get_ram_ptr (after
->
-> > memory_region_init_ram). So actually I think we should remove the
->
-> > DPRINTF warning as it is normal.
->
->
->
-> Let me explain why the DPRINTF warning is there: emulated dma operations
->
-> can involve locked mappings. Once a dma operation completes, the related
->
-> mapping is unlocked and can be safely destroyed. But if we destroy a
->
-> locked mapping in xen_invalidate_map_cache, while a dma is still
->
-> ongoing, QEMU will crash. We cannot handle that case.
->
->
->
-> However, the scenario you described is different. It has nothing to do
->
-> with DMA. It looks like pci_add_option_rom calls
->
-> memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
->
-> locked mapping and it is never unlocked or destroyed.
->
->
->
-> It looks like "ptr" is not used after pci_add_option_rom returns. Does
->
-> the appended patch fix the problem you are seeing? For the proper fix, I
->
-> think we probably need some sort of memory_region_unmap wrapper or maybe
->
-> a call to address_space_unmap.
->
->
-Hmm, for some reason my message to the Xen-devel list got rejected but was
->
-sent
->
-to Qemu-devel instead, without any notice. Sorry if I'm missing something
->
-obvious as a list newbie.
->
->
-Stefano, hrg,
->
->
-There is an issue with inconsistency between the list of normal
->
-MapCacheEntry's
->
-and their 'reverse' counterparts - MapCacheRev's in locked_entries.
->
-When bad situation happens, there are multiple (locked) MapCacheEntry
->
-entries in the bucket's linked list along with a number of MapCacheRev's. And
->
-when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the
->
-first list and calculates a wrong pointer from it which may then be caught
->
-with
->
-the "Bad RAM offset" check (or not). Mapcache invalidation might be related to
->
-this issue as well I think.
->
->
-I'll try to provide a test code which can reproduce the issue from the
->
-guest side using an emulated IDE controller, though it's much simpler to
->
-achieve
->
-this result with an AHCI controller using multiple NCQ I/O commands. So far
->
-I've
->
-seen this issue only with Windows 7 (and above) guest on AHCI, but any block
->
-I/O
->
-DMA should be enough I think.
-That would be helpful. Please see if you can reproduce it after fixing
-the other issue (
-http://marc.info/?l=qemu-devel&m=149195042500707&w=2
-).
-
-On 2017/4/12 6:32, Stefano Stabellini wrote:
-On Tue, 11 Apr 2017, hrg wrote:
-On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
-<address@hidden> wrote:
-On Mon, 10 Apr 2017, Stefano Stabellini wrote:
-On Mon, 10 Apr 2017, hrg wrote:
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
-On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
-Hi,
-
-In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
-instead of first level entry (if map to rom other than guest memory
-comes first), while in xen_invalidate_map_cache(), when VM ballooned
-out memory, qemu did not invalidate cache entries in linked
-list(entry->next), so when VM balloon back in memory, gfns probably
-mapped to different mfns, thus if guest asks device to DMA to these
-GPA, qemu may DMA to stale MFNs.
-
-So I think in xen_invalidate_map_cache() linked lists should also be
-checked and invalidated.
-
-What’s your opinion? Is this a bug? Is my analysis correct?
-Yes, you are right. We need to go through the list for each element of
-the array in xen_invalidate_map_cache. Can you come up with a patch?
-I spoke too soon. In the regular case there should be no locked mappings
-when xen_invalidate_map_cache is called (see the DPRINTF warning at the
-beginning of the functions). Without locked mappings, there should never
-be more than one element in each list (see xen_map_cache_unlocked:
-entry->lock == true is a necessary condition to append a new entry to
-the list, otherwise it is just remapped).
-
-Can you confirm that what you are seeing are locked mappings
-when xen_invalidate_map_cache is called? To find out, enable the DPRINTF
-by turning it into a printf or by defining MAPCACHE_DEBUG.
-In fact, I think the DPRINTF above is incorrect too. In
-pci_add_option_rom(), rtl8139 rom is locked mapped in
-pci_add_option_rom->memory_region_get_ram_ptr (after
-memory_region_init_ram). So actually I think we should remove the
-DPRINTF warning as it is normal.
-Let me explain why the DPRINTF warning is there: emulated dma operations
-can involve locked mappings. Once a dma operation completes, the related
-mapping is unlocked and can be safely destroyed. But if we destroy a
-locked mapping in xen_invalidate_map_cache, while a dma is still
-ongoing, QEMU will crash. We cannot handle that case.
-
-However, the scenario you described is different. It has nothing to do
-with DMA. It looks like pci_add_option_rom calls
-memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
-locked mapping and it is never unlocked or destroyed.
-
-It looks like "ptr" is not used after pci_add_option_rom returns. Does
-the appended patch fix the problem you are seeing? For the proper fix, I
-think we probably need some sort of memory_region_unmap wrapper or maybe
-a call to address_space_unmap.
-Yes, I think so, maybe this is the proper way to fix this.
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c
-index e6b08e1..04f98b7 100644
---- a/hw/pci/pci.c
-+++ b/hw/pci/pci.c
-@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool 
-is_default_rom,
-      }
-pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
-+    xen_invalidate_map_cache_entry(ptr);
-  }
-static void pci_del_option_rom(PCIDevice *pdev)
-
-On Wed, 12 Apr 2017, Herongguang (Stephen) wrote:
->
-On 2017/4/12 6:32, Stefano Stabellini wrote:
->
-> On Tue, 11 Apr 2017, hrg wrote:
->
-> > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
->
-> > <address@hidden> wrote:
->
-> > > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-> > > > On Mon, 10 Apr 2017, hrg wrote:
->
-> > > > > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> > > > > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> > > > > > > Hi,
->
-> > > > > > >
->
-> > > > > > > In xen_map_cache_unlocked(), map to guest memory maybe in
->
-> > > > > > > entry->next
->
-> > > > > > > instead of first level entry (if map to rom other than guest
->
-> > > > > > > memory
->
-> > > > > > > comes first), while in xen_invalidate_map_cache(), when VM
->
-> > > > > > > ballooned
->
-> > > > > > > out memory, qemu did not invalidate cache entries in linked
->
-> > > > > > > list(entry->next), so when VM balloon back in memory, gfns
->
-> > > > > > > probably
->
-> > > > > > > mapped to different mfns, thus if guest asks device to DMA to
->
-> > > > > > > these
->
-> > > > > > > GPA, qemu may DMA to stale MFNs.
->
-> > > > > > >
->
-> > > > > > > So I think in xen_invalidate_map_cache() linked lists should
->
-> > > > > > > also be
->
-> > > > > > > checked and invalidated.
->
-> > > > > > >
->
-> > > > > > > What’s your opinion? Is this a bug? Is my analysis correct?
->
-> > > > Yes, you are right. We need to go through the list for each element of
->
-> > > > the array in xen_invalidate_map_cache. Can you come up with a patch?
->
-> > > I spoke too soon. In the regular case there should be no locked mappings
->
-> > > when xen_invalidate_map_cache is called (see the DPRINTF warning at the
->
-> > > beginning of the functions). Without locked mappings, there should never
->
-> > > be more than one element in each list (see xen_map_cache_unlocked:
->
-> > > entry->lock == true is a necessary condition to append a new entry to
->
-> > > the list, otherwise it is just remapped).
->
-> > >
->
-> > > Can you confirm that what you are seeing are locked mappings
->
-> > > when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
->
-> > > by turning it into a printf or by defininig MAPCACHE_DEBUG.
->
-> > In fact, I think the DPRINTF above is incorrect too. In
->
-> > pci_add_option_rom(), rtl8139 rom is locked mapped in
->
-> > pci_add_option_rom->memory_region_get_ram_ptr (after
->
-> > memory_region_init_ram). So actually I think we should remove the
->
-> > DPRINTF warning as it is normal.
->
-> Let me explain why the DPRINTF warning is there: emulated dma operations
->
-> can involve locked mappings. Once a dma operation completes, the related
->
-> mapping is unlocked and can be safely destroyed. But if we destroy a
->
-> locked mapping in xen_invalidate_map_cache, while a dma is still
->
-> ongoing, QEMU will crash. We cannot handle that case.
->
->
->
-> However, the scenario you described is different. It has nothing to do
->
-> with DMA. It looks like pci_add_option_rom calls
->
-> memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
->
-> locked mapping and it is never unlocked or destroyed.
->
->
->
-> It looks like "ptr" is not used after pci_add_option_rom returns. Does
->
-> the append patch fix the problem you are seeing? For the proper fix, I
->
-> think we probably need some sort of memory_region_unmap wrapper or maybe
->
-> a call to address_space_unmap.
->
->
-Yes, I think so, maybe this is the proper way to fix this.
-Would you be up for sending a proper patch and testing it? We cannot call
-xen_invalidate_map_cache_entry directly from pci.c though, it would need
-to be one of the other functions like address_space_unmap for example.
-
-
->
-> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
->
-> index e6b08e1..04f98b7 100644
->
-> --- a/hw/pci/pci.c
->
-> +++ b/hw/pci/pci.c
->
-> @@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool
->
-> is_default_rom,
->
->       }
->
->         pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
->
-> +    xen_invalidate_map_cache_entry(ptr);
->
->   }
->
->     static void pci_del_option_rom(PCIDevice *pdev)
-
-On 2017/4/13 7:51, Stefano Stabellini wrote:
-> Would you be up for sending a proper patch and testing it? We cannot call
-> xen_invalidate_map_cache_entry directly from pci.c though, it would need
-> to be one of the other functions like address_space_unmap for example.
-
-Yes, I will look into this.
-
-On Thu, 13 Apr 2017, Herongguang (Stephen) wrote:
-> On 2017/4/13 7:51, Stefano Stabellini wrote:
-> > Would you be up for sending a proper patch and testing it? We cannot call
-> > xen_invalidate_map_cache_entry directly from pci.c though, it would need
-> > to be one of the other functions like address_space_unmap for example.
-> Yes, I will look into this.
-
-Any updates?
-
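The first-level-only invalidation hrg describes can be illustrated with a heavily simplified, hypothetical model of a bucketed map cache. The names `ToyEntry`, `ToyMapCache` and `toy_invalidate_all` are illustrative only and are not QEMU's `xen-mapcache.c` code. Each bucket heads a chain linked through `next`; the sketch walks the whole chain, leaves locked entries in place (as Stefano explains, a locked mapping may back an in-flight DMA), and reports how many it skipped.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified map cache: an array of first-level
 * entries, each heading a singly linked chain through `next`. */
typedef struct ToyEntry {
    unsigned long paddr_index; /* guest address range this entry maps */
    void *vaddr;               /* NULL when nothing is mapped */
    unsigned lock;             /* > 0 while a user holds the mapping */
    struct ToyEntry *next;     /* extra entries added while the head is locked */
} ToyEntry;

typedef struct {
    ToyEntry *buckets;
    size_t nr_buckets;
} ToyMapCache;

/* Invalidate every unlocked mapping, including the chained ones that a
 * loop over the first-level array alone would miss.  Returns the number
 * of locked entries that were left in place. */
static size_t toy_invalidate_all(ToyMapCache *mc)
{
    size_t skipped = 0;

    assert(mc != NULL && mc->buckets != NULL);
    for (size_t i = 0; i < mc->nr_buckets; i++) {
        for (ToyEntry *e = &mc->buckets[i]; e != NULL; e = e->next) {
            if (e->vaddr == NULL) {
                continue;              /* nothing mapped here */
            }
            if (e->lock > 0) {
                skipped++;             /* may back in-flight DMA: keep it */
                continue;
            }
            /* A real cache would munmap() the range here. */
            e->vaddr = NULL;
            e->paddr_index = 0;
        }
    }
    return skipped;
}
```

With bucket 0 holding a locked ROM mapping and a guest-memory mapping chained behind it, a loop over `&buckets[i]` alone would leave the chained entry mapped. Walking `next` clears it while the locked head survives.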
diff --git a/results/classifier/007/permissions/67821138 b/results/classifier/007/permissions/67821138
deleted file mode 100644
index 37755ea48..000000000
--- a/results/classifier/007/permissions/67821138
+++ /dev/null
@@ -1,209 +0,0 @@
-permissions: 0.935
-device: 0.916
-PID: 0.909
-boot: 0.881
-debug: 0.870
-other: 0.853
-performance: 0.845
-semantic: 0.843
-graphic: 0.826
-files: 0.824
-KVM: 0.822
-vnc: 0.734
-network: 0.718
-socket: 0.699
-
-[BUG, RFC] Base node is in RW after making external snapshot
-
-Hi everyone,
-
-When making an external snapshot, we end up in a situation where two block
-graph nodes related to the same image file (the format and storage nodes)
-have different RO flags set on them.
-
-E.g.
-
-# ls -la /proc/PID/fd
-lrwx------ 1 root qemu 64 Apr 24 20:14 12 -> /path/to/harddisk.hdd
-
-# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' --pretty | egrep '"node-name"|"ro"'
-      "ro": false,
-      "node-name": "libvirt-1-format",
-      "ro": false,
-      "node-name": "libvirt-1-storage",
-
-# virsh snapshot-create-as VM --name snap --disk-only
-Domain snapshot snap created
-
-# ls -la /proc/PID/fd
-lr-x------ 1 root qemu 64 Apr 24 20:14 134 -> /path/to/harddisk.hdd
-lrwx------ 1 root qemu 64 Apr 24 20:14 135 -> /path/to/harddisk.snap
-
-# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' --pretty | egrep '"node-name"|"ro"'
-      "ro": false,
-      "node-name": "libvirt-2-format",
-      "ro": false,
-      "node-name": "libvirt-2-storage",
-      "ro": true,
-      "node-name": "libvirt-1-format",
-      "ro": false,                        <--------------
-      "node-name": "libvirt-1-storage",
-
-The file descriptor has been reopened read-only, but the "libvirt-1-storage"
-node still has RW permissions set.
-
-I'm wondering whether this is a bug or intended behavior.  It looks like a
-bug to me, although I see that some iotests (e.g. 273) expect two nodes
-related to the same image file to have different RO flags.
-
-bdrv_reopen_set_read_only()
-  bdrv_reopen()
-    bdrv_reopen_queue()
-      bdrv_reopen_queue_child()
-    bdrv_reopen_multiple()
-      bdrv_list_refresh_perms()
-        bdrv_topological_dfs()
-        bdrv_do_refresh_perms()
-      bdrv_reopen_commit()
-
-In the stack above, bdrv_reopen_set_read_only() is only called for the
-parent (libvirt-1-format) node.  There are two lists involved.  The BDSs
-in refresh_list are used by bdrv_drv_set_perm(), and this is what actually
-reopens the file descriptor read-only.  Then there is the reopen queue,
-bs_queue: only the BDSs in this queue get their parameters updated.
-refresh_list ends up containing the whole subtree (children included;
-bdrv_topological_dfs() takes care of that), whereas bs_queue only contains
-the parent.  That is because the storage (child) node has
-bs->inherits_from == NULL, so bdrv_reopen_queue_child() never adds it to
-the queue.  Could this be the source of the bug?
-
-Anyway, I would greatly appreciate a clarification.
-
-Andrey
-
-On 4/24/24 21:00, Andrey Drobyshev wrote:
-> Hi everyone,
->
-> When making an external snapshot, we end up in a situation when 2 block
-> graph nodes related to the same image file (format and storage nodes)
-> have different RO flags set on them.
-> [...]
-> Anyway, would greatly appreciate a clarification.
->
-> Andrey
-
-Friendly ping.  Could somebody confirm that it is a bug indeed?
-
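The mismatch Andrey describes between the two lists can be reproduced in a small, hypothetical model. `ToyNode`, `build_refresh_list`, `build_reopen_queue` and `commit_read_only` are illustrative names, not QEMU's block-layer API. A one-child chain stands in for the subtree that bdrv_topological_dfs() visits. The reopen queue only takes a child whose `inherits_from` points at its parent, which is the condition the report identifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 8

/* Hypothetical toy block graph node: at most one child, plus the
 * inherits_from back-pointer that decides reopen-queue membership. */
typedef struct ToyNode {
    const char *name;
    struct ToyNode *child;
    struct ToyNode *inherits_from;
    bool read_only;
} ToyNode;

typedef struct {
    ToyNode *items[MAX_NODES];
    size_t len;
} ToyList;

static void list_add(ToyList *l, ToyNode *n)
{
    assert(l->len < MAX_NODES);
    l->items[l->len++] = n;
}

/* Permission refresh: every node in the subtree is visited. */
static void build_refresh_list(ToyList *l, ToyNode *root)
{
    for (ToyNode *n = root; n != NULL; n = n->child) {
        list_add(l, n);
    }
}

/* Reopen queue as described in the report: a child is queued only if
 * it inherits its options from the node above it. */
static void build_reopen_queue(ToyList *l, ToyNode *root)
{
    list_add(l, root);
    for (ToyNode *n = root; n->child != NULL; n = n->child) {
        if (n->child->inherits_from != n) {
            break;
        }
        list_add(l, n->child);
    }
}

/* Only queued nodes get their flags committed. */
static void commit_read_only(ToyList *queue)
{
    for (size_t i = 0; i < queue->len; i++) {
        queue->items[i]->read_only = true;
    }
}
```

Under these assumptions, reopening the format node visits both nodes for permissions but commits the RO flag only on the format node, matching the `"ro": true` / `"ro": false` pair in the report.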
diff --git a/results/classifier/007/permissions/74715356 b/results/classifier/007/permissions/74715356
deleted file mode 100644
index d579d4ad3..000000000
--- a/results/classifier/007/permissions/74715356
+++ /dev/null
@@ -1,136 +0,0 @@
-permissions: 0.930
-other: 0.927
-semantic: 0.916
-debug: 0.907
-performance: 0.905
-device: 0.900
-PID: 0.897
-graphic: 0.894
-boot: 0.881
-KVM: 0.863
-vnc: 0.850
-files: 0.850
-socket: 0.843
-network: 0.838
-
-[Bug] x86 EFLAGS refresh is not happening correctly
-
-Hello,
-I'm posting this here instead of opening an issue as it is not clear to me if this is a bug or not.
-The issue is located in the function "cpu_compute_eflags" in target/i386/cpu.h
-(https://gitlab.com/qemu-project/qemu/-/blob/master/target/i386/cpu.h#L2071).
-This function is executed outside the CPU loop context.
-It is used to synchronize the TCG-internal eflags registers (CC_OP, CC_SRC, etc.) with the CPU eflags field upon loop exit.
-It does:
-    eflags |= cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
-Shouldn't it be:
-    eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
-as eflags is entirely reevaluated by "cpu_cc_compute_all"?
-Thanks,
-Kind regards,
-Stevie
-
-On 05/08/21 11:51, Stevie Lavern wrote:
-> Shouldn't it be:
-> eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
-> as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
-
-No, both are wrong.  env->eflags contains flags other than the
-arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved.
-The right code is in helper_read_eflags.  You can move it into
-cpu_compute_eflags, and make helper_read_eflags use it.
-Paolo
-
-On 05/08/21 13:24, Paolo Bonzini wrote:
-> On 05/08/21 11:51, Stevie Lavern wrote:
-> > Shouldn't it be:
-> > eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
-> > as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
-> No, both are wrong.  env->eflags contains flags other than the
-> arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved.
-> The right code is in helper_read_eflags.  You can move it into
-> cpu_compute_eflags, and make helper_read_eflags use it.
-
-Ah, actually the two are really the same, the TF/VM bits do not apply to
-cpu_compute_eflags so it's correct.
-What seems wrong is migration of the EFLAGS register.  There should be
-code in cpu_pre_save and cpu_post_load to special-case it and setup
-CC_DST/CC_OP as done in cpu_load_eflags.
-Also, cpu_load_eflags should assert that update_mask does not include
-any of the arithmetic flags.
-Paolo
-
-Thanks for your reply!
-It's still a bit cryptic to me.
-I should point out that I'm using a custom x86_64 user mode, based on Linux user mode, that I'm developing (unfortunately I cannot share the code), with modifications in the translation loop (I've added CPU loop exits on specific instructions that are not control-flow instructions).
-If my understanding is correct, in the user-mode case 'cpu_compute_eflags' is called directly by 'x86_cpu_exec_exit' with the intention of synchronizing the CPU env->eflags field with its real value (represented by the CC_* fields).
-I'm not sure how 'cpu_pre_save' and 'cpu_post_load' are involved in this case.
-
-As you said in your first email, 'helper_read_eflags' seems to be the correct way to go.
-Here are some details about my current experimentation/understanding of this "issue":
-With the current implementation
-    eflags |= cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
-if I exit the loop with a CC_OP different from CC_OP_EFLAGS, I found that the resulting env->eflags may be invalid.
-In my test case, the loop was exiting with eflags = 0x44 and CC_OP = CC_OP_SUBL with CC_DST=1, CC_SRC=258, CC_SRC2=0.
-While 'cpu_cc_compute_all' computes the correct flags (ZF:0, PF:0), the result will still be 0x44 (ZF:1, PF:1) due to the 'or' operation, thus leading to an incorrect eflags value loaded into the CPU env.
-In my case, after loop reentry, it caused an incorrect branch to be taken.
-Thanks for your time!
-Regards
-Stevie
-
-On Thu, Aug 5, 2021 at 1:33 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
-> On 05/08/21 13:24, Paolo Bonzini wrote:
-> > On 05/08/21 11:51, Stevie Lavern wrote:
-> > [...]
-> Ah, actually the two are really the same, the TF/VM bits do not apply to
-> cpu_compute_eflags so it's correct.
-> [...]
-> Paolo
-
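The disagreement in this thread hinges on an invariant. An OR-based merge like the one in cpu_compute_eflags is only correct if the stored eflags word has its arithmetic bits cleared while the lazily computed CC_* state is authoritative. The hypothetical sketch below uses toy names with the architectural x86 bit values. It contrasts the OR merge with a masked merge and includes the load-time assertion Paolo suggests. It illustrates these assumptions and is not QEMU's implementation. Note that `df` here holds DF or 0, not QEMU's +1/-1 encoding.

```c
#include <assert.h>
#include <stdint.h>

/* x86 EFLAGS bits (architectural values). */
#define FLAG_CF 0x0001u
#define FLAG_PF 0x0004u
#define FLAG_AF 0x0010u
#define FLAG_ZF 0x0040u
#define FLAG_SF 0x0080u
#define FLAG_DF 0x0400u
#define FLAG_OF 0x0800u
#define ARITH_MASK (FLAG_CF | FLAG_PF | FLAG_AF | FLAG_ZF | FLAG_SF | FLAG_OF)

/* Hypothetical CPU state: `stored` plays env->eflags, `lazy_arith` the
 * result of cpu_cc_compute_all(), `df` is FLAG_DF or 0. */
typedef struct {
    uint32_t stored;
    uint32_t lazy_arith;
    uint32_t df;
} ToyCpu;

/* OR-based merge: correct only if `stored` has no arithmetic bits set. */
static uint32_t merge_or(const ToyCpu *c)
{
    return c->stored | c->lazy_arith | (c->df & FLAG_DF);
}

/* Defensive variant: discard stale arithmetic/DF bits before merging. */
static uint32_t merge_masked(const ToyCpu *c)
{
    return (c->stored & ~(ARITH_MASK | FLAG_DF))
           | (c->lazy_arith & ARITH_MASK)
           | (c->df & FLAG_DF);
}

/* Loop entry: move arithmetic flags and DF out of `stored`, which is
 * what keeps merge_or() correct on exit. */
static void enter_loop(ToyCpu *c)
{
    c->lazy_arith = c->stored & ARITH_MASK;
    c->df = c->stored & FLAG_DF;
    c->stored &= ~(ARITH_MASK | FLAG_DF);
}

/* Loads must never touch the arithmetic flags (Paolo's suggested check). */
static void load_eflags(ToyCpu *c, uint32_t value, uint32_t update_mask)
{
    assert((update_mask & ARITH_MASK) == 0);
    c->stored = (c->stored & ~update_mask) | (value & update_mask);
}
```

With Stevie's values (stored 0x44, lazily computed arithmetic flags 0), `merge_or` returns 0x44 while `merge_masked` returns 0. When `enter_loop` has cleared the arithmetic bits first, both merges agree. This suggests that a custom loop exit which skips that clearing step would produce the symptom he saw.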
diff --git a/results/classifier/007/permissions/85542195 b/results/classifier/007/permissions/85542195
deleted file mode 100644
index 328f31033..000000000
--- a/results/classifier/007/permissions/85542195
+++ /dev/null
@@ -1,130 +0,0 @@
-permissions: 0.968
-PID: 0.945
-other: 0.944
-semantic: 0.941
-graphic: 0.938
-device: 0.936
-performance: 0.933
-boot: 0.932
-vnc: 0.923
-files: 0.920
-debug: 0.915
-socket: 0.905
-network: 0.899
-KVM: 0.898
-
-[Qemu-devel] [Bug in qemu-system-ppc running Mac OS 9 on Windows 10]
-
-Hi all,
-
-I've been experiencing issues when installing Mac OS 9.x using
-qemu-system-ppc.exe in Windows 10. After booting from CD image,
-partitioning a fresh disk image often hangs Qemu. When using a
-pre-partitioned disk image, the OS installation process halts
-somewhere during the process. The issues can be resolved by setting
-qemu-system-ppc.exe to run in Windows 7 compatibility mode.
-AFAIK all Qemu builds for Windows since Mac OS 9 became available as
-guest are affected.
-The issue is reproducible by installing Qemu for Windows from Stephan
-Weil on Windows 10 and booting/installing Mac OS 9.x.
-
-Best regards and thanks for looking into this,
-Howard
-
-On Nov 25, 2016, at 9:26 AM, address@hidden wrote:
-> Hi all,
->
-> I've been experiencing issues when installing Mac OS 9.x using
-> qemu-system-ppc.exe in Windows 10. [...]
-
-I assume there was some kind of behavior change in part of the Windows
-API between Windows 7 and Windows 10; that is my guess as to why
-compatibility mode works. Could you run 'make check' on your system,
-once in Windows 7 and once in Windows 10? Maybe the tests will tell us
-something. I'm hoping that one of the tests succeeds in Windows 7 and
-fails in Windows 10. That would help us pinpoint the problem.
-By "run in Windows 7" I mean setting the mingw environment to run in
-Windows 7 compatibility mode (if possible). If you have Windows 7 on
-another partition you could boot from, that would be better.
-Good luck.
-P.S. Use 'make check -k' to allow all the tests to run (even if one
-or more of the tests fail).
-
-> Could you run 'make check' on your system, once in Windows 7 and once in
-> Windows 10. Maybe the tests will tell us something.
-> [...]
-Hi,
-
-Thank you for your suggestion, but I have no means to run the check you
-suggest. I cross-compile from Linux.
-
-Best regards,
-Howard
-
diff --git a/results/classifier/007/permissions/88225572 b/results/classifier/007/permissions/88225572
deleted file mode 100644
index 292ea66b8..000000000
--- a/results/classifier/007/permissions/88225572
+++ /dev/null
@@ -1,2910 +0,0 @@
-permissions: 0.992
-other: 0.987
-debug: 0.986
-PID: 0.984
-semantic: 0.976
-graphic: 0.974
-device: 0.970
-boot: 0.969
-performance: 0.965
-vnc: 0.958
-files: 0.957
-socket: 0.955
-network: 0.950
-KVM: 0.924
-
-[BUG qemu 4.0] segfault when unplugging virtio-blk-pci device
-
-Hi,
-
-I'm using QEMU 4.0 and hit a segfault when tearing down a Kata sandbox. I
-think it's because an I/O completion hits a use-after-free when the device
-is already gone. Is this a known bug that has been fixed? (I went through
-the git log but didn't find anything obvious.)
-
-gdb backtrace is:
-
-Core was generated by `/usr/local/libexec/qemu-kvm -name sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
-Program terminated with signal 11, Segmentation fault.
-#0 object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
-903        return obj->class;
-(gdb) bt
-#0  object_get_class (obj=obj@entry=0x0) at /usr/src/debug/qemu-4.0/qom/object.c:903
-#1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
-    vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
-#2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
-    opaque=0x558a2f2fd420, ret=0)
-    at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
-#3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
-    at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
-#4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
-    i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
-#5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
-#6  0x00007fff9ed75780 in ?? ()
-#7  0x0000000000000000 in ?? ()
-
-It seems like QEMU was completing a discard/write_zeroes request, but the
-parent BusState had already been freed and set to NULL.
-
-Do we need to drain all pending requests before unrealizing the virtio-blk
-device, as the following patch proposed?
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
-If more info is needed, please let me know.
-
-Thanks,
-Eryu
-
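Eryu's question (drain pending requests before unrealizing) comes down to ordering: every completion that dereferences the device must run before the device is freed. The sketch below is a hypothetical, single-threaded model; `ToyDevice`, `toy_drain` and `toy_unrealize` are illustrative names. In QEMU the corresponding step would presumably be a drain on the device's BlockBackend, such as `blk_drain()`, but the thread does not show which call the referenced patches actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_INFLIGHT 4

/* Hypothetical device whose completions notify it (like
 * virtio_notify_vector in the backtrace). */
typedef struct {
    int notifications;
} ToyDevice;

typedef struct {
    ToyDevice *dev;
    int pending;
} ToyRequest;

typedef struct {
    ToyRequest reqs[MAX_INFLIGHT];
    size_t inflight;
} ToyBackend;

static void toy_submit(ToyBackend *be, ToyDevice *dev)
{
    assert(be->inflight < MAX_INFLIGHT);
    be->reqs[be->inflight++] = (ToyRequest){ .dev = dev, .pending = 1 };
}

/* Completion callback: dereferences the device, so the device must
 * still be alive when this runs. */
static void toy_complete(ToyRequest *req)
{
    req->dev->notifications++;
    req->pending = 0;
}

/* Drain: run every outstanding completion now. */
static void toy_drain(ToyBackend *be)
{
    for (size_t i = 0; i < be->inflight; i++) {
        if (be->reqs[i].pending) {
            toy_complete(&be->reqs[i]);
        }
    }
    be->inflight = 0;
}

/* Unrealize: drain first, then free.  Freeing first and completing
 * later is the use-after-free pattern seen in the backtrace. */
static int toy_unrealize(ToyBackend *be, ToyDevice *dev)
{
    toy_drain(be);
    int seen = dev->notifications;
    free(dev);
    return seen;
}
```

Putting the drain inside the unrealize path means no caller can free the device while requests are outstanding.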
-On Tue, 31 Dec 2019 18:34:34 +0800
-Eryu Guan <address@hidden> wrote:
-> Hi,
->
-> I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
-> think it's because io completion hits use-after-free when device is
-> already gone. Is this a known bug that has been fixed? (I went through
-> the git log but didn't find anything obvious).
-> [...]
-> Do we need to drain all pending request before unrealizing virtio-blk
-> device? Like the following patch proposed?
-> https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> If more info is needed, please let me know.
-Maybe this will help:
-https://patchwork.kernel.org/patch/11213047/
-
-On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-On Tue, 31 Dec 2019 18:34:34 +0800
->
-Eryu Guan <address@hidden> wrote:
->
->
-> Hi,
->
->
->
-> I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I
->
-> think it's because io completion hits use-after-free when device is
->
-> already gone. Is this a known bug that has been fixed? (I went through
->
-> the git log but didn't find anything obvious).
->
->
->
-> gdb backtrace is:
->
->
->
-> Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> Program terminated with signal 11, Segmentation fault.
->
-> #0 object_get_class (obj=obj@entry=0x0) at
->
-> /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> 903        return obj->class;
->
-> (gdb) bt
->
-> #0  object_get_class (obj=obj@entry=0x0) at
->
-> /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
->
->     vector=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
->
->     opaque=0x558a2f2fd420, ret=0)
->
->     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
->     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
->
->     i1=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> #6  0x00007fff9ed75780 in ?? ()
->
-> #7  0x0000000000000000 in ?? ()
->
->
->
-> It seems like qemu was completing a discard/write_zero request, but
->
-> parent BusState was already freed & set to NULL.
->
->
->
-> Do we need to drain all pending requests before unrealizing virtio-blk
->
-> device? Like the following patch proposed?
->
->
->
->
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
->
->
-> If more info is needed, please let me know.
->
->
-maybe this will help:
-https://patchwork.kernel.org/patch/11213047/
-Yeah, this looks promising! I'll try it out (though it's a one-time
-crash for me). Thanks!
-
-Eryu
-
-On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> On Tue, 31 Dec 2019 18:34:34 +0800
->
-> Eryu Guan <address@hidden> wrote:
->
->
->
-> > Hi,
->
-> >
->
-> > I'm using qemu 4.0 and hit a segfault when tearing down a kata sandbox, I
->
-> > think it's because io completion hits use-after-free when device is
->
-> > already gone. Is this a known bug that has been fixed? (I went through
->
-> > the git log but didn't find anything obvious).
->
-> >
->
-> > gdb backtrace is:
->
-> >
->
-> > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > Program terminated with signal 11, Segmentation fault.
->
-> > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > 903        return obj->class;
->
-> > (gdb) bt
->
-> > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
->
-> >     vector=<optimized out>) at
->
-> > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
->
-> >     opaque=0x558a2f2fd420, ret=0)
->
-> >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-> >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
->
-> >     i1=<optimized out>) at
->
-> > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > #6  0x00007fff9ed75780 in ?? ()
->
-> > #7  0x0000000000000000 in ?? ()
->
-> >
->
-> > It seems like qemu was completing a discard/write_zero request, but
->
-> > parent BusState was already freed & set to NULL.
->
-> >
->
-> > Do we need to drain all pending requests before unrealizing virtio-blk
->
-> > device? Like the following patch proposed?
->
-> >
->
-> >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> >
->
-> > If more info is needed, please let me know.
->
->
->
-> maybe this will help:
-https://patchwork.kernel.org/patch/11213047/
->
->
-Yeah, this looks promising! I'll try it out (though it's a one-time
->
-crash for me). Thanks!
-After applying this patch, I don't see the original segfault and
-backtrace, but I see this crash
-
-[Thread debugging using libthread_db enabled]
-Using host libthread_db library "/lib64/libthread_db.so.1".
-Core was generated by `/usr/local/libexec/qemu-kvm -name 
-sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
-Program terminated with signal 11, Segmentation fault.
-#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, 
-addr=0, val=<optimized out>, size=<optimized out>) at 
-/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
-1324        VirtIOPCIProxy *proxy = 
-VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
-Missing separate debuginfos, use: debuginfo-install 
-glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 
-libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 
-libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 
-pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
-(gdb) bt
-#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, 
-addr=0, val=<optimized out>, size=<optimized out>) at 
-/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
-#1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, 
-addr=<optimized out>, value=<optimized out>, size=<optimized out>, 
-shift=<optimized out>, mask=<optimized out>, attrs=...) at 
-/usr/src/debug/qemu-4.0/memory.c:502
-#2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, 
-value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized 
-out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 
-<memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
-    at /usr/src/debug/qemu-4.0/memory.c:568
-#3  0x0000561216837c66 in memory_region_dispatch_write 
-(mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, 
-attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
-#4  0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, 
-addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 
-0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, 
-l=<optimized out>, mr=0x56121846d340)
-    at /usr/src/debug/qemu-4.0/exec.c:3279
-#5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, 
-attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at 
-/usr/src/debug/qemu-4.0/exec.c:3318
-#6  0x00005612167e4a1b in address_space_write (as=<optimized out>, 
-addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at 
-/usr/src/debug/qemu-4.0/exec.c:3408
-#7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized 
-out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address 
-0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) 
-at /usr/src/debug/qemu-4.0/exec.c:3419
-#8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at 
-/usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
-#9  0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) 
-at /usr/src/debug/qemu-4.0/cpus.c:1281
-#10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at 
-/usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
-#11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
-#12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
-
-And I searched and found
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the same
-backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
-blk_drain() to virtio_blk_device_unrealize()") is meant to fix this particular
-bug.
-
-But I can still hit the bug even after applying the commit. Am I missing
-anything?
-
-Thanks,
-Eryu
->
-Eryu
-
-On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
->
->
-On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-> On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> > On Tue, 31 Dec 2019 18:34:34 +0800
->
-> > Eryu Guan <address@hidden> wrote:
->
-> >
->
-> > > Hi,
->
-> > >
->
-> > > I'm using qemu 4.0 and hit a segfault when tearing down a kata sandbox, I
->
-> > > think it's because io completion hits use-after-free when device is
->
-> > > already gone. Is this a known bug that has been fixed? (I went through
->
-> > > the git log but didn't find anything obvious).
->
-> > >
->
-> > > gdb backtrace is:
->
-> > >
->
-> > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > > Program terminated with signal 11, Segmentation fault.
->
-> > > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > 903        return obj->class;
->
-> > > (gdb) bt
->
-> > > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
->
-> > >     vector=<optimized out>) at
->
-> > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
->
-> > >     opaque=0x558a2f2fd420, ret=0)
->
-> > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-> > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
->
-> > >     i1=<optimized out>) at
->
-> > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > > #6  0x00007fff9ed75780 in ?? ()
->
-> > > #7  0x0000000000000000 in ?? ()
->
-> > >
->
-> > > It seems like qemu was completing a discard/write_zero request, but
->
-> > > parent BusState was already freed & set to NULL.
->
-> > >
->
-> > > Do we need to drain all pending requests before unrealizing virtio-blk
->
-> > > device? Like the following patch proposed?
->
-> > >
->
-> > >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> > >
->
-> > > If more info is needed, please let me know.
->
-> >
->
-> > maybe this will help:
-https://patchwork.kernel.org/patch/11213047/
->
->
->
-> Yeah, this looks promising! I'll try it out (though it's a one-time
->
-> crash for me). Thanks!
->
->
-After applying this patch, I don't see the original segfault and
->
-backtrace, but I see this crash
->
->
-[Thread debugging using libthread_db enabled]
->
-Using host libthread_db library "/lib64/libthread_db.so.1".
->
-Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
->
-Program terminated with signal 11, Segmentation fault.
->
-#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-addr=0, val=<optimized out>, size=<optimized out>) at
->
-/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-1324        VirtIOPCIProxy *proxy =
->
-VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
->
-Missing separate debuginfos, use: debuginfo-install
->
-glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
->
-libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
->
-libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
->
-pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
->
-(gdb) bt
->
-#0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-addr=0, val=<optimized out>, size=<optimized out>) at
->
-/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-#1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>,
->
-addr=<optimized out>, value=<optimized out>, size=<optimized out>,
->
-shift=<optimized out>, mask=<optimized out>, attrs=...) at
->
-/usr/src/debug/qemu-4.0/memory.c:502
->
-#2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
->
-value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
->
-access_size_min=<optimized out>, access_size_max=<optimized out>,
->
-access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340,
->
-attrs=...)
->
-at /usr/src/debug/qemu-4.0/memory.c:568
->
-#3  0x0000561216837c66 in memory_region_dispatch_write
->
-(mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
->
-attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
->
-#4  0x00005612167e036f in flatview_write_continue
->
-(fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
->
-buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340)
->
-at /usr/src/debug/qemu-4.0/exec.c:3279
->
-#5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
->
-addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out
->
-of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
->
-#6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
->
-addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at
->
-/usr/src/debug/qemu-4.0/exec.c:3408
->
-#7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
->
-addr=<optimized out>, attrs=..., attrs@entry=...,
->
-buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-len=<optimized out>, is_write=<optimized out>) at
->
-/usr/src/debug/qemu-4.0/exec.c:3419
->
-#8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at
->
-/usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
->
-#9  0x000056121682255e in qemu_kvm_cpu_thread_fn
->
-(arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
->
-#10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
->
-/usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
->
-#11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
->
-#12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
->
->
-And I searched and found
->
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the same
->
-backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
->
-blk_drain() to virtio_blk_device_unrealize()") is meant to fix this particular
->
-bug.
->
->
-But I can still hit the bug even after applying the commit. Am I missing
->
-anything?
-Hi Eryu,
-This backtrace seems to be caused by this bug (there were two bugs in
-1706759):
-https://bugzilla.redhat.com/show_bug.cgi?id=1708480
-Although the solution hasn't been tested on virtio-blk yet, you may
-want to apply this patch:
-https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
-Let me know if this works.
-
-Best regards, Julia Suvorova.
-
-On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
->
-On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
->
->
->
-> On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-> > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> > > On Tue, 31 Dec 2019 18:34:34 +0800
->
-> > > Eryu Guan <address@hidden> wrote:
->
-> > >
->
-> > > > Hi,
->
-> > > >
->
-> > > > I'm using qemu 4.0 and hit a segfault when tearing down a kata sandbox, I
->
-> > > > think it's because io completion hits use-after-free when device is
->
-> > > > already gone. Is this a known bug that has been fixed? (I went through
->
-> > > > the git log but didn't find anything obvious).
->
-> > > >
->
-> > > > gdb backtrace is:
->
-> > > >
->
-> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > 903        return obj->class;
->
-> > > > (gdb) bt
->
-> > > > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
->
-> > > >     vector=<optimized out>) at
->
-> > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
->
-> > > >     opaque=0x558a2f2fd420, ret=0)
->
-> > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-> > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
->
-> > > >     i1=<optimized out>) at
->
-> > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > > > #6  0x00007fff9ed75780 in ?? ()
->
-> > > > #7  0x0000000000000000 in ?? ()
->
-> > > >
->
-> > > > It seems like qemu was completing a discard/write_zero request, but
->
-> > > > parent BusState was already freed & set to NULL.
->
-> > > >
->
-> > > > Do we need to drain all pending requests before unrealizing virtio-blk
->
-> > > > device? Like the following patch proposed?
->
-> > > >
->
-> > > >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> > > >
->
-> > > > If more info is needed, please let me know.
->
-> > >
->
-> > > maybe this will help:
-https://patchwork.kernel.org/patch/11213047/
->
-> >
->
-> > Yeah, this looks promising! I'll try it out (though it's a one-time
->
-> > crash for me). Thanks!
->
->
->
-> After applying this patch, I don't see the original segfault and
->
-> backtrace, but I see this crash
->
->
->
-> [Thread debugging using libthread_db enabled]
->
-> Using host libthread_db library "/lib64/libthread_db.so.1".
->
-> Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
->
-> Program terminated with signal 11, Segmentation fault.
->
-> #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-> addr=0, val=<optimized out>, size=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> 1324        VirtIOPCIProxy *proxy =
->
-> VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
->
-> Missing separate debuginfos, use: debuginfo-install
->
-> glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
->
-> libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
->
-> libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
->
-> pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
->
-> (gdb) bt
->
-> #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-> addr=0, val=<optimized out>, size=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>,
->
-> addr=<optimized out>, value=<optimized out>, size=<optimized out>,
->
-> shift=<optimized out>, mask=<optimized out>, attrs=...) at
->
-> /usr/src/debug/qemu-4.0/memory.c:502
->
-> #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
->
-> value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
->
-> access_size_min=<optimized out>, access_size_max=<optimized out>,
->
-> access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340,
->
-> attrs=...)
->
->     at /usr/src/debug/qemu-4.0/memory.c:568
->
-> #3  0x0000561216837c66 in memory_region_dispatch_write
->
-> (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
->
-> attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
->
-> #4  0x00005612167e036f in flatview_write_continue
->
-> (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
->
-> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
->
-> mr=0x56121846d340)
->
->     at /usr/src/debug/qemu-4.0/exec.c:3279
->
-> #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
->
-> addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028
->
-> out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
->
-> #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
->
-> addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>)
->
-> at /usr/src/debug/qemu-4.0/exec.c:3408
->
-> #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
->
-> addr=<optimized out>, attrs=..., attrs@entry=...,
->
-> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> len=<optimized out>, is_write=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/exec.c:3419
->
-> #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at
->
-> /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
->
-> #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
->
-> (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
->
-> #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
->
-> #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
->
-> #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
->
->
->
-> And I searched and found
->
->
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the same
->
-> backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
->
-> blk_drain() to virtio_blk_device_unrealize()") is meant to fix this particular
->
-> bug.
->
->
->
-> But I can still hit the bug even after applying the commit. Am I missing
->
-> anything?
->
->
-Hi Eryu,
->
-This backtrace seems to be caused by this bug (there were two bugs in
->
-1706759):
-https://bugzilla.redhat.com/show_bug.cgi?id=1708480
->
-Although the solution hasn't been tested on virtio-blk yet, you may
->
-want to apply this patch:
->
-https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
->
-Let me know if this works.
-Will try it out, thanks a lot!
-
-Eryu
-
-On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
->
-On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
->
->
->
-> On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-> > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> > > On Tue, 31 Dec 2019 18:34:34 +0800
->
-> > > Eryu Guan <address@hidden> wrote:
->
-> > >
->
-> > > > Hi,
->
-> > > >
->
-> > > > I'm using qemu 4.0 and hit a segfault when tearing down a kata sandbox, I
->
-> > > > think it's because io completion hits use-after-free when device is
->
-> > > > already gone. Is this a known bug that has been fixed? (I went through
->
-> > > > the git log but didn't find anything obvious).
->
-> > > >
->
-> > > > gdb backtrace is:
->
-> > > >
->
-> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > 903        return obj->class;
->
-> > > > (gdb) bt
->
-> > > > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
->
-> > > >     vector=<optimized out>) at
->
-> > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
->
-> > > >     opaque=0x558a2f2fd420, ret=0)
->
-> > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-> > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
->
-> > > >     i1=<optimized out>) at
->
-> > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > > > #6  0x00007fff9ed75780 in ?? ()
->
-> > > > #7  0x0000000000000000 in ?? ()
->
-> > > >
->
-> > > > It seems like qemu was completing a discard/write_zero request, but
->
-> > > > parent BusState was already freed & set to NULL.
->
-> > > >
->
-> > > > Do we need to drain all pending requests before unrealizing virtio-blk
->
-> > > > device? Like the following patch proposed?
->
-> > > >
->
-> > > >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> > > >
->
-> > > > If more info is needed, please let me know.
->
-> > >
->
-> > > maybe this will help:
-https://patchwork.kernel.org/patch/11213047/
->
-> >
->
-> > Yeah, this looks promising! I'll try it out (though it's a one-time
->
-> > crash for me). Thanks!
->
->
->
-> After applying this patch, I don't see the original segfault and
->
-> backtrace, but I see this crash
->
->
->
-> [Thread debugging using libthread_db enabled]
->
-> Using host libthread_db library "/lib64/libthread_db.so.1".
->
-> Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
->
-> Program terminated with signal 11, Segmentation fault.
->
-> #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-> addr=0, val=<optimized out>, size=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> 1324        VirtIOPCIProxy *proxy =
->
-> VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
->
-> Missing separate debuginfos, use: debuginfo-install
->
-> glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
->
-> libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
->
-> libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
->
-> pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
->
-> (gdb) bt
->
-> #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-> addr=0, val=<optimized out>, size=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>,
->
-> addr=<optimized out>, value=<optimized out>, size=<optimized out>,
->
-> shift=<optimized out>, mask=<optimized out>, attrs=...) at
->
-> /usr/src/debug/qemu-4.0/memory.c:502
->
-> #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
->
-> value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
->
-> access_size_min=<optimized out>, access_size_max=<optimized out>,
->
-> access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340,
->
-> attrs=...)
->
->     at /usr/src/debug/qemu-4.0/memory.c:568
->
-> #3  0x0000561216837c66 in memory_region_dispatch_write
->
-> (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
->
-> attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
->
-> #4  0x00005612167e036f in flatview_write_continue
->
-> (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
->
-> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
->
-> mr=0x56121846d340)
->
->     at /usr/src/debug/qemu-4.0/exec.c:3279
->
-> #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
->
-> addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028
->
-> out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
->
-> #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
->
-> addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>)
->
-> at /usr/src/debug/qemu-4.0/exec.c:3408
->
-> #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
->
-> addr=<optimized out>, attrs=..., attrs@entry=...,
->
-> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> len=<optimized out>, is_write=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/exec.c:3419
->
-> #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at
->
-> /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
->
-> #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
->
-> (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
->
-> #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
->
-> /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
->
-> #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
->
-> #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
->
->
->
-> And I searched and found
->
->
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the same
->
-> backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
->
-> blk_drain() to virtio_blk_device_unrealize()") is meant to fix this particular
->
-> bug.
->
->
->
-> But I can still hit the bug even after applying the commit. Am I missing
->
-> anything?
->
->
-Hi Eryu,
->
-This backtrace seems to be caused by this bug (there were two bugs in
->
-1706759):
-https://bugzilla.redhat.com/show_bug.cgi?id=1708480
->
-Although the solution hasn't been tested on virtio-blk yet, you may
->
-want to apply this patch:
->
-https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
->
-Let me know if this works.
-Unfortunately, I still see the same segfault & backtrace after applying
-commit 421afd2fe8dd ("virtio: reset region cache when on queue
-deletion")
-
-Anything I can help to debug?
-
-Thanks,
-Eryu
-
-On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
->
-On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
->
-> On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
->
-> >
->
-> > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-> > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> > > > On Tue, 31 Dec 2019 18:34:34 +0800
->
-> > > > Eryu Guan <address@hidden> wrote:
->
-> > > >
->
-> > > > > Hi,
->
-> > > > >
->
-> > > > > I'm using qemu 4.0 and hit a segfault when tearing down a kata sandbox,
->
-> > > > > I
->
-> > > > > think it's because io completion hits use-after-free when device is
->
-> > > > > already gone. Is this a known bug that has been fixed? (I went
->
-> > > > > through
->
-> > > > > the git log but didn't find anything obvious).
->
-> > > > >
->
-> > > > > gdb backtrace is:
->
-> > > > >
->
-> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > 903        return obj->class;
->
-> > > > > (gdb) bt
->
-> > > > > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0,
->
-> > > > >     vector=<optimized out>) at
->
-> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > > > > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete (
->
-> > > > >     opaque=0x558a2f2fd420, ret=0)
->
-> > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-> > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>,
->
-> > > > >     i1=<optimized out>) at
->
-> > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > > > > #6  0x00007fff9ed75780 in ?? ()
->
-> > > > > #7  0x0000000000000000 in ?? ()
->
-> > > > >
->
-> > > > > It seems like qemu was completing a discard/write_zero request, but
->
-> > > > > parent BusState was already freed & set to NULL.
->
-> > > > >
->
-> > > > > Do we need to drain all pending requests before unrealizing
->
-> > > > > virtio-blk
->
-> > > > > device? Like the following patch proposed?
->
-> > > > >
->
-> > > > >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> > > > >
->
-> > > > > If more info is needed, please let me know.
->
-> > > >
->
-> > > > maybe this will help:
-https://patchwork.kernel.org/patch/11213047/
->
-> > >
->
-> > > Yeah, this looks promising! I'll try it out (though it's a one-time
->
-> > > crash for me). Thanks!
->
-> >
->
-> > After applying this patch, I don't see the original segfault and
->
-> > backtrace, but I see this crash
->
-> >
->
-> > [Thread debugging using libthread_db enabled]
->
-> > Using host libthread_db library "/lib64/libthread_db.so.1".
->
-> > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
->
-> > Program terminated with signal 11, Segmentation fault.
->
-> > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-> > addr=0, val=<optimized out>, size=<optimized out>) at
->
-> > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > 1324        VirtIOPCIProxy *proxy =
->
-> > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
->
-> > Missing separate debuginfos, use: debuginfo-install
->
-> > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
->
-> > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
->
-> > libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
->
-> > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
->
-> > (gdb) bt
->
-> > #0  0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0,
->
-> > addr=0, val=<optimized out>, size=<optimized out>) at
->
-> > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized
->
-> > out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>,
->
-> > shift=<optimized out>, mask=<optimized out>, attrs=...) at
->
-> > /usr/src/debug/qemu-4.0/memory.c:502
->
-> > #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
->
-> > value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
->
-> > access_size_min=<optimized out>, access_size_max=<optimized out>,
->
-> > access_fn=0x561216835ac0 <memory_region_write_accessor>,
->
-> > mr=0x56121846d340, attrs=...)
->
-> >     at /usr/src/debug/qemu-4.0/memory.c:568
->
-> > #3  0x0000561216837c66 in memory_region_dispatch_write
->
-> > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
->
-> > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
->
-> > #4  0x00005612167e036f in flatview_write_continue
->
-> > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
->
-> > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> > len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
->
-> > mr=0x56121846d340)
->
-> >     at /usr/src/debug/qemu-4.0/exec.c:3279
->
-> > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
->
-> > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028
->
-> > out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318
->
-> > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
->
-> > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized
->
-> > out>) at /usr/src/debug/qemu-4.0/exec.c:3408
->
-> > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
->
-> > addr=<optimized out>, attrs=..., attrs@entry=...,
->
-> > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> > len=<optimized out>, is_write=<optimized out>) at
->
-> > /usr/src/debug/qemu-4.0/exec.c:3419
->
-> > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at
->
-> > /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
->
-> > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
->
-> > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
->
-> > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
->
-> > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
->
-> > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
->
-> > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
->
-> >
->
-> > And I searched and found
->
-> >
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the same
->
-> > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
->
-> > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
->
-> > bug.
->
-> >
->
-> > But I can still hit the bug even after applying the commit. Do I miss
->
-> > anything?
->
->
->
-> Hi Eryu,
->
-> This backtrace seems to be caused by this bug (there were two bugs in
->
-> 1706759):
-https://bugzilla.redhat.com/show_bug.cgi?id=1708480
->
-> Although the solution hasn't been tested on virtio-blk yet, you may
->
-> want to apply this patch:
->
->
-https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
->
-> Let me know if this works.
->
->
-Unfortunately, I still see the same segfault & backtrace after applying
->
-commit 421afd2fe8dd ("virtio: reset region cache when on queue
->
-deletion")
->
->
-Anything I can help to debug?
-Please post the QEMU command-line and the QMP commands used to remove the
-device.
-
-The backtrace shows a vcpu thread submitting a request.  The device
-seems to be partially destroyed.  That's surprising because the monitor
-and the vcpu thread should use the QEMU global mutex to avoid race
-conditions.  Maybe seeing the QMP commands will make it clearer...
-
-Stefan
-signature.asc
-Description:
-PGP signature
-
-On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote:
->
-On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
->
-> On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
->
-> > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
->
-> > >
->
-> > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-> > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> > > > > On Tue, 31 Dec 2019 18:34:34 +0800
->
-> > > > > Eryu Guan <address@hidden> wrote:
->
-> > > > >
->
-> > > > > > Hi,
->
-> > > > > >
->
-> > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata
->
-> > > > > > sandbox, I
->
-> > > > > > think it's because io completion hits use-after-free when device
->
-> > > > > > is
->
-> > > > > > already gone. Is this a known bug that has been fixed? (I went
->
-> > > > > > through
->
-> > > > > > the git log but didn't find anything obvious).
->
-> > > > > >
->
-> > > > > > gdb backtrace is:
->
-> > > > > >
->
-> > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > > > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > > > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > > 903        return obj->class;
->
-> > > > > > (gdb) bt
->
-> > > > > > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > > #1  0x0000558a2c009e9b in virtio_notify_vector
->
-> > > > > > (vdev=0x558a2e7751d0,
->
-> > > > > >     vector=<optimized out>) at
->
-> > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > > > > > #2  0x0000558a2bfdcb1e in
->
-> > > > > > virtio_blk_discard_write_zeroes_complete (
->
-> > > > > >     opaque=0x558a2f2fd420, ret=0)
->
-> > > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-> > > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized
->
-> > > > > > out>,
->
-> > > > > >     i1=<optimized out>) at
->
-> > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > > > > > #6  0x00007fff9ed75780 in ?? ()
->
-> > > > > > #7  0x0000000000000000 in ?? ()
->
-> > > > > >
->
-> > > > > > It seems like qemu was completing a discard/write_zero request,
->
-> > > > > > but
->
-> > > > > > parent BusState was already freed & set to NULL.
->
-> > > > > >
->
-> > > > > > Do we need to drain all pending requests before unrealizing
->
-> > > > > > virtio-blk
->
-> > > > > > device? Like the following patch proposed?
->
-> > > > > >
->
-> > > > > >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> > > > > >
->
-> > > > > > If more info is needed, please let me know.
->
-> > > > >
->
-> > > > > maybe this will help:
-https://patchwork.kernel.org/patch/11213047/
->
-> > > >
->
-> > > > Yeah, this looks promising! I'll try it out (though it's a one-time
->
-> > > > crash for me). Thanks!
->
-> > >
->
-> > > After applying this patch, I don't see the original segfault and
->
-> > > backtrace, but I see this crash
->
-> > >
->
-> > > [Thread debugging using libthread_db enabled]
->
-> > > Using host libthread_db library "/lib64/libthread_db.so.1".
->
-> > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
->
-> > > Program terminated with signal 11, Segmentation fault.
->
-> > > #0  0x0000561216a57609 in virtio_pci_notify_write
->
-> > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized
->
-> > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > > 1324        VirtIOPCIProxy *proxy =
->
-> > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
->
-> > > Missing separate debuginfos, use: debuginfo-install
->
-> > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
->
-> > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
->
-> > > libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64
->
-> > > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
->
-> > > (gdb) bt
->
-> > > #0  0x0000561216a57609 in virtio_pci_notify_write
->
-> > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized
->
-> > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized
->
-> > > out>, addr=<optimized out>, value=<optimized out>, size=<optimized
->
-> > > out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at
->
-> > > /usr/src/debug/qemu-4.0/memory.c:502
->
-> > > #2  0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0,
->
-> > > value=value@entry=0x7fcdeab1b8a8, size=size@entry=2,
->
-> > > access_size_min=<optimized out>, access_size_max=<optimized out>,
->
-> > > access_fn=0x561216835ac0 <memory_region_write_accessor>,
->
-> > > mr=0x56121846d340, attrs=...)
->
-> > >     at /usr/src/debug/qemu-4.0/memory.c:568
->
-> > > #3  0x0000561216837c66 in memory_region_dispatch_write
->
-> > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
->
-> > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
->
-> > > #4  0x00005612167e036f in flatview_write_continue
->
-> > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
->
-> > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> > > len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
->
-> > > mr=0x56121846d340)
->
-> > >     at /usr/src/debug/qemu-4.0/exec.c:3279
->
-> > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
->
-> > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address
->
-> > > 0x7fce7dd97028 out of bounds>, len=2) at
->
-> > > /usr/src/debug/qemu-4.0/exec.c:3318
->
-> > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
->
-> > > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized
->
-> > > out>) at /usr/src/debug/qemu-4.0/exec.c:3408
->
-> > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
->
-> > > addr=<optimized out>, attrs=..., attrs@entry=...,
->
-> > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> > > len=<optimized out>, is_write=<optimized out>) at
->
-> > > /usr/src/debug/qemu-4.0/exec.c:3419
->
-> > > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00)
->
-> > > at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
->
-> > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
->
-> > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
->
-> > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
->
-> > > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
->
-> > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
->
-> > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
->
-> > >
->
-> > > And I searched and found
->
-> > >
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the same
->
-> > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
->
-> > > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular
->
-> > > bug.
->
-> > >
->
-> > > But I can still hit the bug even after applying the commit. Do I miss
->
-> > > anything?
->
-> >
->
-> > Hi Eryu,
->
-> > This backtrace seems to be caused by this bug (there were two bugs in
->
-> > 1706759):
-https://bugzilla.redhat.com/show_bug.cgi?id=1708480
->
-> > Although the solution hasn't been tested on virtio-blk yet, you may
->
-> > want to apply this patch:
->
-> >
-https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
->
-> > Let me know if this works.
->
->
->
-> Unfortunately, I still see the same segfault & backtrace after applying
->
-> commit 421afd2fe8dd ("virtio: reset region cache when on queue
->
-> deletion")
->
->
->
-> Anything I can help to debug?
->
->
-Please post the QEMU command-line and the QMP commands used to remove the
->
-device.
-It's a normal kata instance using virtio-fs as rootfs.
-
-/usr/local/libexec/qemu-kvm -name 
-sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \
- -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine 
-q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \
- -cpu host -qmp 
-unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
- \
- -qmp 
-unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
- \
- -m 2048M,slots=10,maxmem=773893M -device 
-pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \
- -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device 
-virtconsole,chardev=charconsole0,id=console0 \
- -chardev 
-socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait
- \
- -device 
-virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \
- -chardev 
-socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait
- \
- -device nvdimm,id=nv0,memdev=mem0 -object 
-memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456
- \
- -object rng-random,id=rng0,filename=/dev/urandom -device 
-virtio-rng,rng=rng0,romfile= \
- -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \
- -chardev 
-socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait
- \
- -chardev 
-socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock
- \
- -device 
-vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M 
--netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \
- -device 
-driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile=
- \
- -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults 
--nographic -daemonize \
- -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on 
--numa node,memdev=dimm1 -kernel /usr/local/share/kernel \
- -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 
-i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 
-console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 
-root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro 
-rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 
-agent.use_vsock=false init=/usr/lib/systemd/systemd 
-systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service 
-systemd.mask=systemd-networkd.socket \
- -pidfile 
-/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid 
-\
- -smp 1,cores=1,threads=1,sockets=96,maxcpus=96
-
-QMP command to delete device (the device id is just an example, not the
-one that caused the crash):
-
-"{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}"
-
-which has been hot plugged by:
-"{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}"
-"{\"return\": {}}"
-"{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}"
-"{\"return\": {}}"
-
->
->
-The backtrace shows a vcpu thread submitting a request.  The device
->
-seems to be partially destroyed.  That's surprising because the monitor
->
-and the vcpu thread should use the QEMU global mutex to avoid race
->
-conditions.  Maybe seeing the QMP commands will make it clearer...
->
->
-Stefan
-Thanks!
-
-Eryu
-
-On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote:
->
-On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote:
->
-> On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
->
-> > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
->
-> > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
->
-> > > >
->
-> > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-> > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> > > > > > On Tue, 31 Dec 2019 18:34:34 +0800
->
-> > > > > > Eryu Guan <address@hidden> wrote:
->
-> > > > > >
->
-> > > > > > > Hi,
->
-> > > > > > >
->
-> > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata
->
-> > > > > > > sandbox, I
->
-> > > > > > > think it's because io completion hits use-after-free when
->
-> > > > > > > device is
->
-> > > > > > > already gone. Is this a known bug that has been fixed? (I went
->
-> > > > > > > through
->
-> > > > > > > the git log but didn't find anything obvious).
->
-> > > > > > >
->
-> > > > > > > gdb backtrace is:
->
-> > > > > > >
->
-> > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > > > > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > > > > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > > > 903        return obj->class;
->
-> > > > > > > (gdb) bt
->
-> > > > > > > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > > > #1  0x0000558a2c009e9b in virtio_notify_vector
->
-> > > > > > > (vdev=0x558a2e7751d0,
->
-> > > > > > >     vector=<optimized out>) at
->
-> > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > > > > > > #2  0x0000558a2bfdcb1e in
->
-> > > > > > > virtio_blk_discard_write_zeroes_complete (
->
-> > > > > > >     opaque=0x558a2f2fd420, ret=0)
->
-> > > > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > > > > > > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420)
->
-> > > > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > > > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized
->
-> > > > > > > out>,
->
-> > > > > > >     i1=<optimized out>) at
->
-> > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > > > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > > > > > > #6  0x00007fff9ed75780 in ?? ()
->
-> > > > > > > #7  0x0000000000000000 in ?? ()
->
-> > > > > > >
->
-> > > > > > > It seems like qemu was completing a discard/write_zero request,
->
-> > > > > > > but
->
-> > > > > > > parent BusState was already freed & set to NULL.
->
-> > > > > > >
->
-> > > > > > > Do we need to drain all pending requests before unrealizing
->
-> > > > > > > virtio-blk
->
-> > > > > > > device? Like the following patch proposed?
->
-> > > > > > >
->
-> > > > > > >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> > > > > > >
->
-> > > > > > > If more info is needed, please let me know.
->
-> > > > > >
->
-> > > > > > maybe this will help:
->
-> > > > > >
-https://patchwork.kernel.org/patch/11213047/
->
-> > > > >
->
-> > > > > Yeah, this looks promising! I'll try it out (though it's a one-time
->
-> > > > > crash for me). Thanks!
->
-> > > >
->
-> > > > After applying this patch, I don't see the original segfault and
->
-> > > > backtrace, but I see this crash
->
-> > > >
->
-> > > > [Thread debugging using libthread_db enabled]
->
-> > > > Using host libthread_db library "/lib64/libthread_db.so.1".
->
-> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
->
-> > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > #0  0x0000561216a57609 in virtio_pci_notify_write
->
-> > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized
->
-> > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > > > 1324        VirtIOPCIProxy *proxy =
->
-> > > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
->
-> > > > Missing separate debuginfos, use: debuginfo-install
->
-> > > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
->
-> > > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
->
-> > > > libstdc++-4.8.5-28.alios7.1.x86_64
->
-> > > > numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64
->
-> > > > zlib-1.2.7-16.2.alios7.x86_64
->
-> > > > (gdb) bt
->
-> > > > #0  0x0000561216a57609 in virtio_pci_notify_write
->
-> > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized
->
-> > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > > > #1  0x0000561216835b22 in memory_region_write_accessor (mr=<optimized
->
-> > > > out>, addr=<optimized out>, value=<optimized out>, size=<optimized
->
-> > > > out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at
->
-> > > > /usr/src/debug/qemu-4.0/memory.c:502
->
-> > > > #2  0x0000561216833c5d in access_with_adjusted_size
->
-> > > > (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8,
->
-> > > > size=size@entry=2, access_size_min=<optimized out>,
->
-> > > > access_size_max=<optimized out>, access_fn=0x561216835ac0
->
-> > > > <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
->
-> > > >     at /usr/src/debug/qemu-4.0/memory.c:568
->
-> > > > #3  0x0000561216837c66 in memory_region_dispatch_write
->
-> > > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
->
-> > > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
->
-> > > > #4  0x00005612167e036f in flatview_write_continue
->
-> > > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=...,
->
-> > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> > > > len=len@entry=2, addr1=<optimized out>, l=<optimized out>,
->
-> > > > mr=0x56121846d340)
->
-> > > >     at /usr/src/debug/qemu-4.0/exec.c:3279
->
-> > > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
->
-> > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address
->
-> > > > 0x7fce7dd97028 out of bounds>, len=2) at
->
-> > > > /usr/src/debug/qemu-4.0/exec.c:3318
->
-> > > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
->
-> > > > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized
->
-> > > > out>) at /usr/src/debug/qemu-4.0/exec.c:3408
->
-> > > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
->
-> > > > addr=<optimized out>, attrs=..., attrs@entry=...,
->
-> > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>,
->
-> > > > len=<optimized out>, is_write=<optimized out>) at
->
-> > > > /usr/src/debug/qemu-4.0/exec.c:3419
->
-> > > > #8  0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00)
->
-> > > > at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
->
-> > > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
->
-> > > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281
->
-> > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at
->
-> > > > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
->
-> > > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0
->
-> > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
->
-> > > >
->
-> > > > And I searched and found
->
-> > > >
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the
->
-> > > > same
->
-> > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add
->
-> > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this
->
-> > > > particular
->
-> > > > bug.
->
-> > > >
->
-> > > > But I can still hit the bug even after applying the commit. Do I miss
->
-> > > > anything?
->
-> > >
->
-> > > Hi Eryu,
->
-> > > This backtrace seems to be caused by this bug (there were two bugs in
->
-> > > 1706759):
-https://bugzilla.redhat.com/show_bug.cgi?id=1708480
->
-> > > Although the solution hasn't been tested on virtio-blk yet, you may
->
-> > > want to apply this patch:
->
-> > >
->
-> > >
-https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
->
-> > > Let me know if this works.
->
-> >
->
-> > Unfortunately, I still see the same segfault & backtrace after applying
->
-> > commit 421afd2fe8dd ("virtio: reset region cache when on queue
->
-> > deletion")
->
-> >
->
-> > Anything I can help to debug?
->
->
->
-> Please post the QEMU command-line and the QMP commands used to remove the
->
-> device.
->
->
-It's a normal kata instance using virtio-fs as rootfs.
->
->
-/usr/local/libexec/qemu-kvm -name
->
-sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \
->
--uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine
->
-q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \
->
--cpu host -qmp
->
-unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
->
-\
->
--qmp
->
-unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
->
-\
->
--m 2048M,slots=10,maxmem=773893M -device
->
-pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \
->
--device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device
->
-virtconsole,chardev=charconsole0,id=console0 \
->
--chardev
->
-socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait
->
-\
->
--device
->
-virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \
->
--chardev
->
-socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait
->
-\
->
--device nvdimm,id=nv0,memdev=mem0 -object
->
-memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456
->
-\
->
--object rng-random,id=rng0,filename=/dev/urandom -device
->
-virtio-rng,rng=rng0,romfile= \
->
--device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \
->
--chardev
->
-socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait
->
-\
->
--chardev
->
-socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock
->
-\
->
--device
->
-vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M
->
--netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \
->
--device
->
-driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile=
->
-\
->
--global kvm-pit.lost_tick_policy=discard -vga none -no-user-config
->
--nodefaults -nographic -daemonize \
->
--object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on
->
--numa node,memdev=dimm1 -kernel /usr/local/share/kernel \
->
--append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1
->
-i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k
->
-console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0
->
-pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro
->
-ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96
->
-agent.use_vsock=false init=/usr/lib/systemd/systemd
->
-systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service
->
-systemd.mask=systemd-networkd.socket \
->
--pidfile
->
-/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid
->
-\
->
--smp 1,cores=1,threads=1,sockets=96,maxcpus=96
->
->
-QMP command to delete device (the device id is just an example, not the
->
-one that caused the crash):
->
->
-"{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}"
->
->
-which has been hot plugged by:
->
-"{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}"
->
-"{\"return\": {}}"
->
-"{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}"
->
-"{\"return\": {}}"
-Thanks.  I wasn't able to reproduce this crash with qemu.git/master.
-
-One thing that is strange about the latest backtrace you posted: QEMU is
-dispatching the memory access instead of using the ioeventfd code that
-virtio-blk-pci normally takes when a virtqueue is notified.  I
-guess this means ioeventfd has already been disabled due to the hot
-unplug.
-
-Could you try with machine type "i440fx" instead of "q35"?  I wonder if
-pci-bridge/shpc is part of the problem.
-
-Stefan
-signature.asc
-Description:
-PGP signature
-
-On Tue, Jan 14, 2020 at 04:16:24PM +0000, Stefan Hajnoczi wrote:
->
-On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote:
->
-> On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote:
->
-> > On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote:
->
-> > > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote:
->
-> > > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote:
->
-> > > > >
->
-> > > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote:
->
-> > > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote:
->
-> > > > > > > On Tue, 31 Dec 2019 18:34:34 +0800
->
-> > > > > > > Eryu Guan <address@hidden> wrote:
->
-> > > > > > >
->
-> > > > > > > > Hi,
->
-> > > > > > > >
->
-> > > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata
->
-> > > > > > > > sandbox, I
->
-> > > > > > > > think it's because io completion hits use-after-free when
->
-> > > > > > > > device is
->
-> > > > > > > > already gone. Is this a known bug that has been fixed? (I
->
-> > > > > > > > went through
->
-> > > > > > > > the git log but didn't find anything obvious).
->
-> > > > > > > >
->
-> > > > > > > > gdb backtrace is:
->
-> > > > > > > >
->
-> > > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'.
->
-> > > > > > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at
->
-> > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > > > > 903        return obj->class;
->
-> > > > > > > > (gdb) bt
->
-> > > > > > > > #0  object_get_class (obj=obj@entry=0x0) at
->
-> > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903
->
-> > > > > > > > #1  0x0000558a2c009e9b in virtio_notify_vector
->
-> > > > > > > > (vdev=0x558a2e7751d0,
->
-> > > > > > > >     vector=<optimized out>) at
->
-> > > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118
->
-> > > > > > > > #2  0x0000558a2bfdcb1e in
->
-> > > > > > > > virtio_blk_discard_write_zeroes_complete (
->
-> > > > > > > >     opaque=0x558a2f2fd420, ret=0)
->
-> > > > > > > >     at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186
->
-> > > > > > > > #3  0x0000558a2c261c7e in blk_aio_complete
->
-> > > > > > > > (acb=0x558a2eed7420)
->
-> > > > > > > >     at /usr/src/debug/qemu-4.0/block/block-backend.c:1305
->
-> > > > > > > > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized
->
-> > > > > > > > out>,
->
-> > > > > > > >     i1=<optimized out>) at
->
-> > > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116
->
-> > > > > > > > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6
->
-> > > > > > > > #6  0x00007fff9ed75780 in ?? ()
->
-> > > > > > > > #7  0x0000000000000000 in ?? ()
->
-> > > > > > > >
->
-> > > > > > > > It seems like qemu was completing a discard/write_zero
->
-> > > > > > > > request, but the
->
-> > > > > > > > parent BusState had already been freed and set to NULL.
->
-> > > > > > > >
->
-> > > > > > > > Do we need to drain all pending requests before unrealizing
->
-> > > > > > > > virtio-blk
->
-> > > > > > > > device? Like the following patch proposed?
->
-> > > > > > > >
->
-> > > > > > > >
-https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html
->
-> > > > > > > >
->
-> > > > > > > > If more info is needed, please let me know.
->
-> > > > > > >
->
-> > > > > > > Maybe this will help:
->
-> > > > > > >
-https://patchwork.kernel.org/patch/11213047/
->
-> > > > > >
->
-> > > > > > Yeah, this looks promising! I'll try it out (though it's a
->
-> > > > > > one-time
->
-> > > > > > crash for me). Thanks!
->
-> > > > >
->
-> > > > > After applying this patch, I don't see the original segfault and
->
-> > > > > backtrace, but I see this crash:
->
-> > > > >
->
-> > > > > [Thread debugging using libthread_db enabled]
->
-> > > > > Using host libthread_db library "/lib64/libthread_db.so.1".
->
-> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name
->
-> > > > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'.
->
-> > > > > Program terminated with signal 11, Segmentation fault.
->
-> > > > > #0  0x0000561216a57609 in virtio_pci_notify_write
->
-> > > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>,
->
-> > > > > size=<optimized out>) at
->
-> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > > > > 1324        VirtIOPCIProxy *proxy =
->
-> > > > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent);
->
-> > > > > Missing separate debuginfos, use: debuginfo-install
->
-> > > > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64
->
-> > > > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64
->
-> > > > > libstdc++-4.8.5-28.alios7.1.x86_64
->
-> > > > > numactl-libs-2.0.9-5.1.alios7.x86_64
->
-> > > > > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64
->
-> > > > > (gdb) bt
->
-> > > > > #0  0x0000561216a57609 in virtio_pci_notify_write
->
-> > > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>,
->
-> > > > > size=<optimized out>) at
->
-> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324
->
-> > > > > #1  0x0000561216835b22 in memory_region_write_accessor
->
-> > > > > (mr=<optimized out>, addr=<optimized out>, value=<optimized out>,
->
-> > > > > size=<optimized out>, shift=<optimized out>, mask=<optimized out>,
->
-> > > > > attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502
->
-> > > > > #2  0x0000561216833c5d in access_with_adjusted_size
->
-> > > > > (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8,
->
-> > > > > size=size@entry=2, access_size_min=<optimized out>,
->
-> > > > > access_size_max=<optimized out>, access_fn=0x561216835ac0
->
-> > > > > <memory_region_write_accessor>, mr=0x56121846d340, attrs=...)
->
-> > > > >     at /usr/src/debug/qemu-4.0/memory.c:568
->
-> > > > > #3  0x0000561216837c66 in memory_region_dispatch_write
->
-> > > > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2,
->
-> > > > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503
->
-> > > > > #4  0x00005612167e036f in flatview_write_continue
->
-> > > > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304,
->
-> > > > > attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out
->
-> > > > > of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized
->
-> > > > > out>, mr=0x56121846d340)
->
-> > > > >     at /usr/src/debug/qemu-4.0/exec.c:3279
->
-> > > > > #5  0x00005612167e0506 in flatview_write (fv=0x56121852edd0,
->
-> > > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address
->
-> > > > > 0x7fce7dd97028 out of bounds>, len=2) at
->
-> > > > > /usr/src/debug/qemu-4.0/exec.c:3318
->
-> > > > > #6  0x00005612167e4a1b in address_space_write (as=<optimized out>,
->
-> > > > > addr=<optimized out>, attrs=..., buf=<optimized out>,
->
-> > > > > len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408
->
-> > > > > #7  0x00005612167e4aa5 in address_space_rw (as=<optimized out>,
->
-> > > > > addr=<optimized out>, attrs=..., attrs@entry=...,
->
-> > > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of
->
-> > > > > bounds>, len=<optimized out>, is_write=<optimized out>) at
->
-> > > > > /usr/src/debug/qemu-4.0/exec.c:3419
->
-> > > > > #8  0x0000561216849da1 in kvm_cpu_exec
->
-> > > > > (cpu=cpu@entry=0x56121849aa00) at
->
-> > > > > /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034
->
-> > > > > #9  0x000056121682255e in qemu_kvm_cpu_thread_fn
->
-> > > > > (arg=arg@entry=0x56121849aa00) at
->
-> > > > > /usr/src/debug/qemu-4.0/cpus.c:1281
->
-> > > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>)
->
-> > > > > at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502
->
-> > > > > #11 0x00007fce7bef6e25 in start_thread () from
->
-> > > > > /lib64/libpthread.so.0
->
-> > > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6
->
-> > > > >
->
-> > > > > And I searched and found
->
-> > > > >
-https://bugzilla.redhat.com/show_bug.cgi?id=1706759
-, which has the
->
-> > > > > same
->
-> > > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk:
->
-> > > > > Add
->
-> > > > > blk_drain() to virtio_blk_device_unrealize()") is meant to fix this
->
-> > > > > particular
->
-> > > > > bug.
->
-> > > > >
->
-> > > > > But I can still hit the bug even after applying the commit. Am I
->
-> > > > > missing
->
-> > > > > anything?
->
-> > > >
->
-> > > > Hi Eryu,
->
-> > > > This backtrace seems to be caused by this bug (there were two bugs in
->
-> > > > 1706759):
-https://bugzilla.redhat.com/show_bug.cgi?id=1708480
->
-> > > > Although the solution hasn't been tested on virtio-blk yet, you may
->
-> > > > want to apply this patch:
->
-> > > >
->
-> > > >
-https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html
->
-> > > > Let me know if this works.
->
-> > >
->
-> > > Unfortunately, I still see the same segfault & backtrace after applying
->
-> > > commit 421afd2fe8dd ("virtio: reset region cache when on queue
->
-> > > deletion")
->
-> > >
->
-> > > Is there anything I can do to help debug this?
->
-> >
->
-> > Please post the QEMU command-line and the QMP commands used to remove the
->
-> > device.
->
->
->
-> It's a normal kata instance using virtio-fs as rootfs.
->
->
->
-> /usr/local/libexec/qemu-kvm -name
->
-> sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \
->
->  -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine
->
-> q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \
->
->  -cpu host -qmp
->
-> unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
->
->  \
->
->  -qmp
->
-> unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait
->
->  \
->
->  -m 2048M,slots=10,maxmem=773893M -device
->
-> pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \
->
->  -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device
->
-> virtconsole,chardev=charconsole0,id=console0 \
->
->  -chardev
->
-> socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait
->
->  \
->
->  -device
->
-> virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10
->
->  \
->
->  -chardev
->
-> socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait
->
->  \
->
->  -device nvdimm,id=nv0,memdev=mem0 -object
->
-> memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456
->
->  \
->
->  -object rng-random,id=rng0,filename=/dev/urandom -device
->
-> virtio-rng,rng=rng0,romfile= \
->
->  -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \
->
->  -chardev
->
-> socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait
->
->  \
->
->  -chardev
->
-> socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock
->
->  \
->
->  -device
->
-> vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M
->
->  -netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \
->
->  -device
->
-> driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile=
->
->  \
->
->  -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config
->
-> -nodefaults -nographic -daemonize \
->
->  -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on
->
-> -numa node,memdev=dimm1 -kernel /usr/local/share/kernel \
->
->  -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1
->
-> i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp
->
-> reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests
->
-> net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1
->
-> rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 quiet
->
-> systemd.show_status=false panic=1 nr_cpus=96 agent.use_vsock=false
->
-> init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target
->
-> systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket \
->
->  -pidfile
->
-> /run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid
->
->  \
->
->  -smp 1,cores=1,threads=1,sockets=96,maxcpus=96
->
->
->
-> QMP command to delete device (the device id is just an example, not the
->
-> one that caused the crash):
->
->
->
-> "{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}"
->
->
->
-> which had been hot-plugged with:
->
-> "{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}"
->
-> "{\"return\": {}}"
->
-> "{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}"
->
-> "{\"return\": {}}"
->
->
-Thanks.  I wasn't able to reproduce this crash with qemu.git/master.
->
->
-One thing that is strange about the latest backtrace you posted: QEMU is
->
-dispatching the memory access instead of using the ioeventfd code that
->
-virtio-blk-pci normally takes when a virtqueue is notified.  I
->
-guess this means ioeventfd has already been disabled due to the hot
->
-unplug.
->
->
-Could you try with machine type "i440fx" instead of "q35"?  I wonder if
->
-pci-bridge/shpc is part of the problem.
-Sure, I will try it. It may take some time, though, as the test bed is busy
-with other testing tasks. I'll report back once I get the results.
-
-Thanks,
-Eryu
-
diff --git a/results/classifier/007/permissions/88281850 b/results/classifier/007/permissions/88281850
deleted file mode 100644
index 02513facf..000000000
--- a/results/classifier/007/permissions/88281850
+++ /dev/null
@@ -1,291 +0,0 @@
-permissions: 0.985
-other: 0.983
-debug: 0.979
-graphic: 0.974
-network: 0.973
-device: 0.970
-performance: 0.969
-semantic: 0.968
-boot: 0.967
-socket: 0.966
-files: 0.962
-PID: 0.959
-vnc: 0.945
-KVM: 0.881
-
-[Bug] Takes more than 150s to boot qemu on ARM64
-
-Hi all,
-I encountered an issue with kernel 5.19-rc1 on an ARM64 board: it takes
-about 150s from starting the qemu command until the Linux kernel begins
-to boot ("EFI stub: Booting Linux Kernel...").
-With kernel 5.18-rc4 it takes only about 5s. I bisected the kernel
-and it points to commit c2445d387850 ("srcu: Add contention check to
-call_srcu() srcu_data ->lock acquisition").
-The qemu command I run (qemu version 6.2.92) is:
-
-./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \
---trace "kvm*" \
--cpu host \
--machine virt,accel=kvm,gic-version=3  \
--machine smp.cpus=2,smp.sockets=2 \
--no-reboot \
--nographic \
--monitor unix:/home/cx/qmp-test,server,nowait \
--bios /home/cx/boot/QEMU_EFI.fd \
--kernel /home/cx/boot/Image  \
--device
-pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1
-\
--device vfio-pci,host=7d:01.3,id=net0 \
--device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4  \
--drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \
--append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \
--net none \
--D /home/cx/qemu_log.txt
-I am not familiar with the RCU code and don't know how it causes the issue.
-Do you have any idea about this issue?
-Best Regards,
-
-Xiang Chen
-
-On Mon, Jun 13, 2022 at 08:26:34PM +0800, chenxiang (M) wrote:
->
-Hi all,
->
->
-I encountered an issue with kernel 5.19-rc1 on an ARM64 board: it takes about
->
-150s from starting the qemu command until the Linux kernel begins to boot
->
-("EFI stub: Booting Linux Kernel...").
->
->
-With kernel 5.18-rc4 it takes only about 5s. I bisected the kernel and it
->
-points to commit c2445d387850 ("srcu: Add contention check to call_srcu()
->
-srcu_data ->lock acquisition").
->
->
-The qemu command I run (qemu version 6.2.92) is:
->
->
-./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \
->
---trace "kvm*" \
->
--cpu host \
->
--machine virt,accel=kvm,gic-version=3  \
->
--machine smp.cpus=2,smp.sockets=2 \
->
--no-reboot \
->
--nographic \
->
--monitor unix:/home/cx/qmp-test,server,nowait \
->
--bios /home/cx/boot/QEMU_EFI.fd \
->
--kernel /home/cx/boot/Image  \
->
--device
->
-pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1
->
-\
->
--device vfio-pci,host=7d:01.3,id=net0 \
->
--device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4  \
->
--drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \
->
--append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \
->
--net none \
->
--D /home/cx/qemu_log.txt
->
->
-I am not familiar with the RCU code and don't know how it causes the issue. Do
->
-you have any idea about this issue?
-Please see the discussion here:
-https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
-Though that report requires ACPI to be forced on to get the
-delay, which results in more than 9,000 back-to-back calls to
-synchronize_srcu_expedited().  I cannot reproduce this on my setup, even
-with an artificial tight loop invoking synchronize_srcu_expedited(),
-but then again I don't have ARM hardware.
-
-My current guess is that the following patch would help, but with larger values for
-SRCU_MAX_NODELAY_PHASE.  Here "larger" might well be up in the hundreds,
-or perhaps even larger.
-
-If you get a chance to experiment with this, could you please reply
-to the discussion at the above URL?  (Or let me know, and I can CC
-you on the next message in that thread.)
-
-                                                Thanx, Paul
-
-------------------------------------------------------------------------
-
-diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
-index 50ba70f019dea..0db7873f4e95b 100644
---- a/kernel/rcu/srcutree.c
-+++ b/kernel/rcu/srcutree.c
-@@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
- 
- #define SRCU_INTERVAL          1       // Base delay if no expedited GPs 
-pending.
- #define SRCU_MAX_INTERVAL      10      // Maximum incremental delay from slow 
-readers.
--#define SRCU_MAX_NODELAY_PHASE 1       // Maximum per-GP-phase consecutive 
-no-delay instances.
-+#define SRCU_MAX_NODELAY_PHASE 3       // Maximum per-GP-phase consecutive 
-no-delay instances.
- #define SRCU_MAX_NODELAY       100     // Maximum consecutive no-delay 
-instances.
- 
- /*
-@@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
-  */
- static unsigned long srcu_get_delay(struct srcu_struct *ssp)
- {
-+       unsigned long gpstart;
-+       unsigned long j;
-        unsigned long jbase = SRCU_INTERVAL;
- 
-        if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), 
-READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
-                jbase = 0;
--       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
--               jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
--       if (!jbase) {
--               WRITE_ONCE(ssp->srcu_n_exp_nodelay, 
-READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
--               if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
--                       jbase = 1;
-+       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
-+               j = jiffies - 1;
-+               gpstart = READ_ONCE(ssp->srcu_gp_start);
-+               if (time_after(j, gpstart))
-+                       jbase += j - gpstart;
-+               if (!jbase) {
-+                       WRITE_ONCE(ssp->srcu_n_exp_nodelay, 
-READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
-+                       if (READ_ONCE(ssp->srcu_n_exp_nodelay) > 
-SRCU_MAX_NODELAY_PHASE)
-+                               jbase = 1;
-+               }
-        }
-        return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
- }
-
-On 2022/6/13 21:22, Paul E. McKenney wrote:
-On Mon, Jun 13, 2022 at 08:26:34PM +0800, chenxiang (M) wrote:
-Hi all,
-
-I encountered an issue with kernel 5.19-rc1 on an ARM64 board: it takes about
-150s from starting the qemu command until the Linux kernel begins to boot
-("EFI stub: Booting Linux Kernel...").
-
-With kernel 5.18-rc4 it takes only about 5s. I bisected the kernel and it
-points to commit c2445d387850 ("srcu: Add contention check to call_srcu()
-srcu_data ->lock acquisition").
-
-The qemu command I run (qemu version 6.2.92) is:
-
-./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \
---trace "kvm*" \
--cpu host \
--machine virt,accel=kvm,gic-version=3  \
--machine smp.cpus=2,smp.sockets=2 \
--no-reboot \
--nographic \
--monitor unix:/home/cx/qmp-test,server,nowait \
--bios /home/cx/boot/QEMU_EFI.fd \
--kernel /home/cx/boot/Image  \
--device 
-pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1
-\
--device vfio-pci,host=7d:01.3,id=net0 \
--device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4  \
--drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \
--append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \
--net none \
--D /home/cx/qemu_log.txt
-
-I am not familiar with the RCU code and don't know how it causes the issue. Do
-you have any idea about this issue?
-Please see the discussion here:
-https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
-Though that report requires ACPI to be forced on to get the
-delay, which results in more than 9,000 back-to-back calls to
-synchronize_srcu_expedited().  I cannot reproduce this on my setup, even
-with an artificial tight loop invoking synchronize_srcu_expedited(),
-but then again I don't have ARM hardware.
-
-My current guess is that the following patch would help, but with larger values for
-SRCU_MAX_NODELAY_PHASE.  Here "larger" might well be up in the hundreds,
-or perhaps even larger.
-
-If you get a chance to experiment with this, could you please reply
-to the discussion at the above URL?  (Or let me know, and I can CC
-you on the next message in that thread.)
-OK, thanks, I will reply at the above URL.
-Thanx, Paul
-
-------------------------------------------------------------------------
-
-diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
-index 50ba70f019dea..0db7873f4e95b 100644
---- a/kernel/rcu/srcutree.c
-+++ b/kernel/rcu/srcutree.c
-@@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
- #define SRCU_INTERVAL          1       // Base delay if no expedited GPs pending.
- #define SRCU_MAX_INTERVAL      10      // Maximum incremental delay from slow 
-readers.
--#define SRCU_MAX_NODELAY_PHASE 1       // Maximum per-GP-phase consecutive 
-no-delay instances.
-+#define SRCU_MAX_NODELAY_PHASE 3       // Maximum per-GP-phase consecutive 
-no-delay instances.
- #define SRCU_MAX_NODELAY       100     // Maximum consecutive no-delay 
-instances.
- /*
-@@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
-  */
- static unsigned long srcu_get_delay(struct srcu_struct *ssp)
- {
-+       unsigned long gpstart;
-+       unsigned long j;
-        unsigned long jbase = SRCU_INTERVAL;
-        if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
-                jbase = 0;
--       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
--               jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
--       if (!jbase) {
--               WRITE_ONCE(ssp->srcu_n_exp_nodelay, 
-READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
--               if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
--                       jbase = 1;
-+       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
-+               j = jiffies - 1;
-+               gpstart = READ_ONCE(ssp->srcu_gp_start);
-+               if (time_after(j, gpstart))
-+                       jbase += j - gpstart;
-+               if (!jbase) {
-+                       WRITE_ONCE(ssp->srcu_n_exp_nodelay, 
-READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
-+                       if (READ_ONCE(ssp->srcu_n_exp_nodelay) > 
-SRCU_MAX_NODELAY_PHASE)
-+                               jbase = 1;
-+               }
-        }
-        return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
- }
-.
-
diff --git a/results/classifier/007/permissions/95154278 b/results/classifier/007/permissions/95154278
deleted file mode 100644
index 2dc0c2ffc..000000000
--- a/results/classifier/007/permissions/95154278
+++ /dev/null
@@ -1,165 +0,0 @@
-permissions: 0.989
-other: 0.953
-debug: 0.951
-device: 0.951
-graphic: 0.950
-PID: 0.949
-vnc: 0.948
-semantic: 0.937
-performance: 0.936
-files: 0.918
-KVM: 0.916
-socket: 0.913
-network: 0.913
-boot: 0.902
-
-[Qemu-devel] [BUG] checkpatch.pl hangs on target/mips/msa_helper.c
-
-If checkpatch.pl is run (with the "-f" switch) on the file
-target/mips/msa_helper.c, it hangs.
-
-There is a workaround for this particular file:
-
-These lines in msa_helper.c:
-
-        uint## BITS ##_t S = _S, T = _T;                            \
-        uint## BITS ##_t as, at, xs, xt, xd;                        \
-
-should be replaced with:
-
-        uint## BITS ## _t S = _S, T = _T;                           \
-        uint## BITS ## _t as, at, xs, xt, xd;                       \
-
-(a space is added after the second "##" in each line)
-
-I found the workaround by partially deleting and restoring code in
-msa_helper.c in a binary-search fashion.
-
-I will soon submit this workaround as a patch in a series on misc
-MIPS issues.
-
-I took a look at the checkpatch.pl code, and it looks fairly complicated to
-fix the issue, since it happens in a code segment involving intricate logic
-conditions.
-
-Regards,
-Aleksandar
-
-On Wed, Jul 04, 2018 at 03:35:18PM +0000, Aleksandar Markovic wrote:
->
-If checkpatch.pl is run (with the "-f" switch) on the file
->
-target/mips/msa_helper.c, it hangs.
->
->
-There is a workaround for this particular file:
->
->
-These lines in msa_helper.c:
->
->
-uint## BITS ##_t S = _S, T = _T;                            \
->
-uint## BITS ##_t as, at, xs, xt, xd;                        \
->
->
-should be replaced with:
->
->
-uint## BITS ## _t S = _S, T = _T;                           \
->
-uint## BITS ## _t as, at, xs, xt, xd;                       \
->
->
-(a space is added after the second "##" in each line)
->
->
-I found the workaround by partially deleting and restoring code in
->
-msa_helper.c in a binary-search fashion.
->
->
-I will soon submit this workaround as a patch in a series on
->
-misc MIPS issues.
->
->
-I took a look at the checkpatch.pl code, and it looks fairly complicated to
->
-fix the issue, since it happens in a code segment involving intricate logic
->
-conditions.
-Thanks for figuring this out, Aleksandar.  Not sure if anyone else has
-the appetite to fix checkpatch.pl.
-
-Stefan
-signature.asc
-Description:
-PGP signature
-
-On 07/11/2018 09:36 AM, Stefan Hajnoczi wrote:
->
-On Wed, Jul 04, 2018 at 03:35:18PM +0000, Aleksandar Markovic wrote:
->
-> If checkpatch.pl is run (with the "-f" switch) on the file
->
-> target/mips/msa_helper.c, it hangs.
->
->
->
-> There is a workaround for this particular file:
->
->
->
-> These lines in msa_helper.c:
->
->
->
->         uint## BITS ##_t S = _S, T = _T;                            \
->
->         uint## BITS ##_t as, at, xs, xt, xd;                        \
->
->
->
-> should be replaced with:
->
->
->
->         uint## BITS ## _t S = _S, T = _T;                           \
->
->         uint## BITS ## _t as, at, xs, xt, xd;                       \
->
->
->
-> (a space is added after the second "##" in each line)
->
->
->
-> I found the workaround by partially deleting and restoring code in
->
-> msa_helper.c in a binary-search fashion.
->
->
->
-> I will soon submit this workaround as a patch in a series on
->
-> misc MIPS issues.
->
->
->
-> I took a look at the checkpatch.pl code, and it looks fairly complicated
->
-> to fix the issue, since it happens in a code segment involving intricate
->
-> logic conditions.
->
->
-Thanks for figuring this out, Aleksandar.  Not sure if anyone else has
->
-the appetite to fix checkpatch.pl.
-Anyone else but Paolo ;P
-http://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg01250.html
-signature.asc
-Description:
-OpenPGP digital signature
-