restructure results

author: Christian Krinitsin <mail@krinitsin.com> 2025-07-03 19:39:53 +0200
committer: Christian Krinitsin <mail@krinitsin.com> 2025-07-03 19:39:53 +0200
commit: dee4dcba78baf712cab403d47d9db319ab7f95d6 (patch)
tree: 418478faf06786701a56268672f73d6b0b4eb239 /results/classifier/009/permissions
parent: 4d9e26c0333abd39bdbd039dcdb30ed429c475ba (diff)
download: qemu-analysis-dee4dcba78baf712cab403d47d9db319ab7f95d6.tar.gz
qemu-analysis-dee4dcba78baf712cab403d47d9db319ab7f95d6.zip
17 files changed, 0 insertions, 5635 deletions
diff --git a/results/classifier/009/permissions/12360755 b/results/classifier/009/permissions/12360755
deleted file mode 100644
index 3de2a3c4a..000000000
--- a/results/classifier/009/permissions/12360755
+++ /dev/null
@@ -1,306 +0,0 @@
-permissions: 0.930
-debug: 0.922
-semantic: 0.911
-device: 0.902
-graphic: 0.899
-performance: 0.895
-other: 0.886
-PID: 0.876
-files: 0.851
-boot: 0.818
-vnc: 0.810
-socket: 0.805
-KVM: 0.770
-network: 0.738
-
-[Qemu-devel] [BUG] virtio-net linux driver fails to probe on MIPS Malta since 'hw/virtio-pci: fix virtio behaviour'
-
-Hi,
-
-I've bisected the following failure of the virtio_net linux v4.10 driver
-to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine:
-
-virtio_net virtio0: virtio: device uses modern interface but does not have 
-VIRTIO_F_VERSION_1
-virtio_net: probe of virtio0 failed with error -22
-
-To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour").
-
-It appears that adding ",disable-modern=on,disable-legacy=off" to the
-virtio-net -device makes it work again.
-
-I presume this should really just work out of the box. Any ideas why it
-isn't?
-
-Cheers
-James
-signature.asc
-Description:
-Digital signature
-
-On 03/17/2017 11:57 PM, James Hogan wrote:
-Hi,
-
-I've bisected the following failure of the virtio_net linux v4.10 driver
-to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine:
-
-virtio_net virtio0: virtio: device uses modern interface but does not have 
-VIRTIO_F_VERSION_1
-virtio_net: probe of virtio0 failed with error -22
-
-To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour").
-
-It appears that adding ",disable-modern=on,disable-legacy=off" to the
-virtio-net -device makes it work again.
-
-I presume this should really just work out of the box. Any ideas why it
-isn't?
-Hi,
-
-
-This is strange. This commit changes virtio devices from legacy to virtio 
-"transitional".
-(your command line changes it to legacy)
-Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU 
-side
-there is nothing new.
-
-Michael, do you have any idea?
-
-Thanks,
-Marcel
-Cheers
-James
-
-On Mon, Mar 20, 2017 at 05:21:22PM +0200, Marcel Apfelbaum wrote:
->
-On 03/17/2017 11:57 PM, James Hogan wrote:
->
-> Hi,
->
->
->
-> I've bisected the following failure of the virtio_net linux v4.10 driver
->
-> to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine:
->
->
->
-> virtio_net virtio0: virtio: device uses modern interface but does not have
->
-> VIRTIO_F_VERSION_1
->
-> virtio_net: probe of virtio0 failed with error -22
->
->
->
-> To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour").
->
->
->
-> It appears that adding ",disable-modern=on,disable-legacy=off" to the
->
-> virtio-net -device makes it work again.
->
->
->
-> I presume this should really just work out of the box. Any ideas why it
->
-> isn't?
->
->
->
->
-Hi,
->
->
->
-This is strange. This commit changes virtio devices from legacy to virtio
->
-"transitional".
->
-(your command line changes it to legacy)
->
-Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU
->
-side
->
-there is nothing new.
->
->
-Michael, do you have any idea?
->
->
-Thanks,
->
-Marcel
-My guess would be firmware mishandling 64 bit BARs - we saw such
-a case on sparc previously. As a result you are probably reading
-all zeroes from features register or something like that.
-Marcel, could you send a patch making the bar 32 bit?
-If that helps we know what the issue is.
-
->
-> Cheers
->
-> James
->
->
-
-On 03/20/2017 05:43 PM, Michael S. Tsirkin wrote:
-On Mon, Mar 20, 2017 at 05:21:22PM +0200, Marcel Apfelbaum wrote:
-On 03/17/2017 11:57 PM, James Hogan wrote:
-Hi,
-
-I've bisected the following failure of the virtio_net linux v4.10 driver
-to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine:
-
-virtio_net virtio0: virtio: device uses modern interface but does not have 
-VIRTIO_F_VERSION_1
-virtio_net: probe of virtio0 failed with error -22
-
-To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour").
-
-It appears that adding ",disable-modern=on,disable-legacy=off" to the
-virtio-net -device makes it work again.
-
-I presume this should really just work out of the box. Any ideas why it
-isn't?
-Hi,
-
-
-This is strange. This commit changes virtio devices from legacy to virtio 
-"transitional".
-(your command line changes it to legacy)
-Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU 
-side
-there is nothing new.
-
-Michael, do you have any idea?
-
-Thanks,
-Marcel
-My guess would be firmware mishandling 64 bit BARs - we saw such
-a case on sparc previously. As a result you are probably reading
-all zeroes from features register or something like that.
-Marcel, could you send a patch making the bar 32 bit?
-If that helps we know what the issue is.
-Sure,
-
-Thanks,
-Marcel
-Cheers
-James
-
-On 03/20/2017 05:43 PM, Michael S. Tsirkin wrote:
-On Mon, Mar 20, 2017 at 05:21:22PM +0200, Marcel Apfelbaum wrote:
-On 03/17/2017 11:57 PM, James Hogan wrote:
-Hi,
-
-I've bisected the following failure of the virtio_net linux v4.10 driver
-to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine:
-
-virtio_net virtio0: virtio: device uses modern interface but does not have 
-VIRTIO_F_VERSION_1
-virtio_net: probe of virtio0 failed with error -22
-
-To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour").
-
-It appears that adding ",disable-modern=on,disable-legacy=off" to the
-virtio-net -device makes it work again.
-
-I presume this should really just work out of the box. Any ideas why it
-isn't?
-Hi,
-
-
-This is strange. This commit changes virtio devices from legacy to virtio 
-"transitional".
-(your command line changes it to legacy)
-Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU 
-side
-there is nothing new.
-
-Michael, do you have any idea?
-
-Thanks,
-Marcel
-My guess would be firmware mishandling 64 bit BARs - we saw such
-a case on sparc previously. As a result you are probably reading
-all zeroes from features register or something like that.
-Marcel, could you send a patch making the bar 32 bit?
-If that helps we know what the issue is.
-Hi James,
-
-Can you please check if the below patch fixes the problem?
-Please note it is not a solution.
-
-diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
-index f9b7244..5b4d429 100644
---- a/hw/virtio/virtio-pci.c
-+++ b/hw/virtio/virtio-pci.c
-@@ -1671,9 +1671,7 @@ static void virtio_pci_device_plugged(DeviceState *d, 
-Error **errp)
-         }
-
-         pci_register_bar(&proxy->pci_dev, proxy->modern_mem_bar_idx,
--                         PCI_BASE_ADDRESS_SPACE_MEMORY |
--                         PCI_BASE_ADDRESS_MEM_PREFETCH |
--                         PCI_BASE_ADDRESS_MEM_TYPE_64,
-+                         PCI_BASE_ADDRESS_SPACE_MEMORY,
-                          &proxy->modern_bar);
-
-         proxy->config_cap = virtio_pci_add_mem_cap(proxy, &cfg.cap);
-
-
-Thanks,
-Marcel
-
-Hi Marcel,
-
-On Tue, Mar 21, 2017 at 04:16:58PM +0200, Marcel Apfelbaum wrote:
->
-Can you please check if the below patch fixes the problem?
->
-Please note it is not a solution.
->
->
-diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
->
-index f9b7244..5b4d429 100644
->
---- a/hw/virtio/virtio-pci.c
->
-+++ b/hw/virtio/virtio-pci.c
->
-@@ -1671,9 +1671,7 @@ static void virtio_pci_device_plugged(DeviceState *d,
->
-Error **errp)
->
-}
->
->
-pci_register_bar(&proxy->pci_dev, proxy->modern_mem_bar_idx,
->
--                         PCI_BASE_ADDRESS_SPACE_MEMORY |
->
--                         PCI_BASE_ADDRESS_MEM_PREFETCH |
->
--                         PCI_BASE_ADDRESS_MEM_TYPE_64,
->
-+                         PCI_BASE_ADDRESS_SPACE_MEMORY,
->
-&proxy->modern_bar);
->
->
-proxy->config_cap = virtio_pci_add_mem_cap(proxy, &cfg.cap);
-Sorry for the delay trying this, I was away last week.
-
-No, it doesn't seem to make any difference.
-
-Thanks
-James
-signature.asc
-Description:
-Digital signature
-
diff --git a/results/classifier/009/permissions/14488057 b/results/classifier/009/permissions/14488057
deleted file mode 100644
index 6fa010b72..000000000
--- a/results/classifier/009/permissions/14488057
+++ /dev/null
@@ -1,721 +0,0 @@
-permissions: 0.940
-PID: 0.930
-device: 0.929
-debug: 0.925
-other: 0.922
-performance: 0.911
-semantic: 0.905
-boot: 0.892
-graphic: 0.887
-vnc: 0.882
-KVM: 0.880
-network: 0.846
-socket: 0.825
-files: 0.823
-
-[Qemu-devel] [BUG] user-to-root privesc inside VM via bad translation caching
-
-This is an issue in QEMU's system emulation for X86 in TCG mode.
-The issue permits an attacker who can execute code in guest ring 3
-with normal user privileges to inject code into other processes that
-are running in guest ring 3, in particular root-owned processes.
-
-== reproduction steps ==
-
- - Create an x86-64 VM and install Debian Jessie in it. The following
-   steps should all be executed inside the VM.
- - Verify that procmail is installed and the correct version:
-       address@hidden:~# apt-cache show procmail | egrep 'Version|SHA'
-       Version: 3.22-24
-       SHA1: 54ed2d51db0e76f027f06068ab5371048c13434c
-       SHA256: 4488cf6975af9134a9b5238d5d70e8be277f70caa45a840dfbefd2dc444bfe7f
- - Install build-essential and nasm ("apt install build-essential nasm").
- - Unpack the exploit, compile it and run it:
-       address@hidden:~$ tar xvf procmail_cache_attack.tar
-       procmail_cache_attack/
-       procmail_cache_attack/shellcode.asm
-       procmail_cache_attack/xp.c
-       procmail_cache_attack/compile.sh
-       procmail_cache_attack/attack.c
-       address@hidden:~$ cd procmail_cache_attack
-       address@hidden:~/procmail_cache_attack$ ./compile.sh
-       address@hidden:~/procmail_cache_attack$ ./attack
-       memory mappings set up
-       child is dead, codegen should be complete
-       executing code as root! :)
-       address@hidden:~/procmail_cache_attack# id
-       uid=0(root) gid=0(root) groups=0(root),[...]
-
-Note: While the exploit depends on the precise version of procmail,
-the actual vulnerability is in QEMU, not in procmail. procmail merely
-serves as a seldomly-executed setuid root binary into which code can
-be injected.
-
-
-== detailed issue description ==
-QEMU caches translated basic blocks. To look up a translated basic
-block, the function tb_find() is used, which uses tb_htable_lookup()
-in its slowpath, which in turn compares translated basic blocks
-(TranslationBlock) to the lookup information (struct tb_desc) using
-tb_cmp().
-
-tb_cmp() attempts to ensure (among other things) that both the virtual
-start address of the basic block and the physical addresses that the
-basic block covers match. When checking the physical addresses, it
-assumes that a basic block can span at most two pages.
-
-gen_intermediate_code() attempts to enforce this by stopping the
-translation of a basic block if nearly one page of instructions has
-been translated already:
-
-    /* if too long translation, stop generation too */
-    if (tcg_op_buf_full() ||
-        (pc_ptr - pc_start) >= (TARGET_PAGE_SIZE - 32) ||
-        num_insns >= max_insns) {
-        gen_jmp_im(pc_ptr - dc->cs_base);
-        gen_eob(dc);
-        break;
-    }
-
-However, while real X86 processors have a maximum instruction length
-of 15 bytes, QEMU's instruction decoder for X86 does not place any
-limit on the instruction length or the number of instruction prefixes.
-Therefore, it is possible to create an arbitrarily long instruction
-by e.g. prepending an arbitrary number of LOCK prefixes to a normal
-instruction. This permits creating a basic block that spans three
-pages by simply appending an approximately page-sized instruction to
-the end of a normal basic block that starts close to the end of a
-page.
-
-Such an overlong basic block causes the basic block caching to fail as
-follows: If code is generated and cached for a basic block that spans
-the physical pages (A,E,B), this basic block will be returned by
-lookups in a process in which the physical pages (A,B,C) are mapped
-in the same virtual address range (assuming that all other lookup
-parameters match).
-
-This behavior can be abused by an attacker e.g. as follows: If a
-non-relocatable world-readable setuid executable legitimately contains
-the pages (A,B,C), an attacker can map (A,E,B) into his own process,
-at the normal load address of A, where E is an attacker-controlled
-page. If a legitimate basic block spans the pages A and B, an attacker
-can write arbitrary non-branch instructions at the start of E, then
-append an overlong instruction
-that ends behind the start of C, yielding a modified basic block that
-spans all three pages. If the attacker then executes the modified
-basic block in his process, the modified basic block is cached.
-Next, the attacker can execute the setuid binary, which will reuse the
-cached modified basic block, executing attacker-controlled
-instructions in the context of the privileged process.
-
-I am sending this to qemu-devel because a QEMU security contact
-told me that QEMU does not consider privilege escalation inside a
-TCG VM to be a security concern.
-procmail_cache_attack.tar
-Description:
-Unix tar archive
-
-On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote:
->
-This is an issue in QEMU's system emulation for X86 in TCG mode.
->
-The issue permits an attacker who can execute code in guest ring 3
->
-with normal user privileges to inject code into other processes that
->
-are running in guest ring 3, in particular root-owned processes.
->
-I am sending this to qemu-devel because a QEMU security contact
->
-told me that QEMU does not consider privilege escalation inside a
->
-TCG VM to be a security concern.
-Correct; it's just a bug. Don't trust TCG QEMU as a security boundary.
-
-We should really fix the crossing-a-page-boundary code for x86.
-I believe we do get it correct for ARM Thumb instructions.
-
-thanks
--- PMM
-
-On Mon, Mar 20, 2017 at 10:46 AM, Peter Maydell wrote:
->
-On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote:
->
-> This is an issue in QEMU's system emulation for X86 in TCG mode.
->
-> The issue permits an attacker who can execute code in guest ring 3
->
-> with normal user privileges to inject code into other processes that
->
-> are running in guest ring 3, in particular root-owned processes.
->
->
-> I am sending this to qemu-devel because a QEMU security contact
->
-> told me that QEMU does not consider privilege escalation inside a
->
-> TCG VM to be a security concern.
->
->
-Correct; it's just a bug. Don't trust TCG QEMU as a security boundary.
->
->
-We should really fix the crossing-a-page-boundary code for x86.
->
-I believe we do get it correct for ARM Thumb instructions.
-How about doing the instruction size check as follows?
-
-diff --git a/target/i386/translate.c b/target/i386/translate.c
-index 72c1b03a2a..94cf3da719 100644
---- a/target/i386/translate.c
-+++ b/target/i386/translate.c
-@@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State
-*env, DisasContext *s,
-     default:
-         goto unknown_op;
-     }
-+    if (s->pc - pc_start > 15) {
-+        s->pc = pc_start;
-+        goto illegal_op;
-+    }
-     return s->pc;
-  illegal_op:
-     gen_illegal_opcode(s);
-
-Thanks,
---
-Pranith
-
-On 22 March 2017 at 14:55, Pranith Kumar <address@hidden> wrote:
->
-On Mon, Mar 20, 2017 at 10:46 AM, Peter Maydell wrote:
->
-> On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote:
->
->> This is an issue in QEMU's system emulation for X86 in TCG mode.
->
->> The issue permits an attacker who can execute code in guest ring 3
->
->> with normal user privileges to inject code into other processes that
->
->> are running in guest ring 3, in particular root-owned processes.
->
->
->
->> I am sending this to qemu-devel because a QEMU security contact
->
->> told me that QEMU does not consider privilege escalation inside a
->
->> TCG VM to be a security concern.
->
->
->
-> Correct; it's just a bug. Don't trust TCG QEMU as a security boundary.
->
->
->
-> We should really fix the crossing-a-page-boundary code for x86.
->
-> I believe we do get it correct for ARM Thumb instructions.
->
->
-How about doing the instruction size check as follows?
->
->
-diff --git a/target/i386/translate.c b/target/i386/translate.c
->
-index 72c1b03a2a..94cf3da719 100644
->
---- a/target/i386/translate.c
->
-+++ b/target/i386/translate.c
->
-@@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State
->
-*env, DisasContext *s,
->
-default:
->
-goto unknown_op;
->
-}
->
-+    if (s->pc - pc_start > 15) {
->
-+        s->pc = pc_start;
->
-+        goto illegal_op;
->
-+    }
->
-return s->pc;
->
-illegal_op:
->
-gen_illegal_opcode(s);
-This doesn't look right because it means we'll check
-only after we've emitted all the code to do the
-instruction operation, so the effect will be
-"execute instruction, then take illegal-opcode
-exception".
-
-We should check what the x86 architecture spec actually
-says and implement that.
-
-thanks
--- PMM
-
-On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell
-<address@hidden> wrote:
->
->
->
-> How about doing the instruction size check as follows?
->
->
->
-> diff --git a/target/i386/translate.c b/target/i386/translate.c
->
-> index 72c1b03a2a..94cf3da719 100644
->
-> --- a/target/i386/translate.c
->
-> +++ b/target/i386/translate.c
->
-> @@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State
->
-> *env, DisasContext *s,
->
->      default:
->
->          goto unknown_op;
->
->      }
->
-> +    if (s->pc - pc_start > 15) {
->
-> +        s->pc = pc_start;
->
-> +        goto illegal_op;
->
-> +    }
->
->      return s->pc;
->
->   illegal_op:
->
->      gen_illegal_opcode(s);
->
->
-This doesn't look right because it means we'll check
->
-only after we've emitted all the code to do the
->
-instruction operation, so the effect will be
->
-"execute instruction, then take illegal-opcode
->
-exception".
->
-The pc is restored to original address (s->pc = pc_start), so the
-exception will overwrite the generated illegal instruction and will be
-executed first.
-
-But yes, it's better to follow the architecture manual.
-
-Thanks,
---
-Pranith
-
-On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote:
->
-On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell
->
-<address@hidden> wrote:
->
-> This doesn't look right because it means we'll check
->
-> only after we've emitted all the code to do the
->
-> instruction operation, so the effect will be
->
-> "execute instruction, then take illegal-opcode
->
-> exception".
->
-The pc is restored to original address (s->pc = pc_start), so the
->
-exception will overwrite the generated illegal instruction and will be
->
-executed first.
-s->pc is the guest PC -- moving that backwards will
-not do anything about the generated TCG IR that's
-already been written. You'd need to rewind the
-write pointer in the IR stream, which there is
-no support for doing AFAIK.
-
-thanks
--- PMM
-
-On Wed, Mar 22, 2017 at 11:21 AM, Peter Maydell
-<address@hidden> wrote:
->
-On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote:
->
-> On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell
->
-> <address@hidden> wrote:
->
->> This doesn't look right because it means we'll check
->
->> only after we've emitted all the code to do the
->
->> instruction operation, so the effect will be
->
->> "execute instruction, then take illegal-opcode
->
->> exception".
->
->
-> The pc is restored to original address (s->pc = pc_start), so the
->
-> exception will overwrite the generated illegal instruction and will be
->
-> executed first.
->
->
-s->pc is the guest PC -- moving that backwards will
->
-not do anything about the generated TCG IR that's
->
-already been written. You'd need to rewind the
->
-write pointer in the IR stream, which there is
->
-no support for doing AFAIK.
-Ah, OK. Thanks for the explanation. May be we should check the size of
-the instruction while decoding the prefixes and error out once we
-exceed the limit. We would not generate any IR code.
-
---
-Pranith
-
-On 03/23/2017 02:29 AM, Pranith Kumar wrote:
-On Wed, Mar 22, 2017 at 11:21 AM, Peter Maydell
-<address@hidden> wrote:
-On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote:
-On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell
-<address@hidden> wrote:
-This doesn't look right because it means we'll check
-only after we've emitted all the code to do the
-instruction operation, so the effect will be
-"execute instruction, then take illegal-opcode
-exception".
-The pc is restored to original address (s->pc = pc_start), so the
-exception will overwrite the generated illegal instruction and will be
-executed first.
-s->pc is the guest PC -- moving that backwards will
-not do anything about the generated TCG IR that's
-already been written. You'd need to rewind the
-write pointer in the IR stream, which there is
-no support for doing AFAIK.
-Ah, OK. Thanks for the explanation. May be we should check the size of
-the instruction while decoding the prefixes and error out once we
-exceed the limit. We would not generate any IR code.
-Yes.
-It would not enforce a true limit of 15 bytes, since you can't know that until
-you've done the rest of the decode.  But you'd be able to say that no more than
-14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 bytes is used.
-Which does fix the bug.
-
-
-r~
-
-On 22/03/2017 21:01, Richard Henderson wrote:
->
->
->
-> Ah, OK. Thanks for the explanation. May be we should check the size of
->
-> the instruction while decoding the prefixes and error out once we
->
-> exceed the limit. We would not generate any IR code.
->
->
-Yes.
->
->
-It would not enforce a true limit of 15 bytes, since you can't know that
->
-until you've done the rest of the decode.  But you'd be able to say that
->
-no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25
->
-bytes is used.
->
->
-Which does fix the bug.
-Yeah, that would work for 2.9 if somebody wants to put together a patch.
- Ensuring that all instruction fetching happens before translation side
-effects is a little harder, but perhaps it's also the opportunity to get
-rid of s->rip_offset which is a little ugly.
-
-Paolo
-
-On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote:
->
->
->
-On 22/03/2017 21:01, Richard Henderson wrote:
->
->>
->
->> Ah, OK. Thanks for the explanation. May be we should check the size of
->
->> the instruction while decoding the prefixes and error out once we
->
->> exceed the limit. We would not generate any IR code.
->
->
->
-> Yes.
->
->
->
-> It would not enforce a true limit of 15 bytes, since you can't know that
->
-> until you've done the rest of the decode.  But you'd be able to say that
->
-> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25
->
-> bytes is used.
->
->
->
-> Which does fix the bug.
->
->
-Yeah, that would work for 2.9 if somebody wants to put together a patch.
->
-Ensuring that all instruction fetching happens before translation side
->
-effects is a little harder, but perhaps it's also the opportunity to get
->
-rid of s->rip_offset which is a little ugly.
-How about the following?
-
-diff --git a/target/i386/translate.c b/target/i386/translate.c
-index 72c1b03a2a..67c58b8900 100644
---- a/target/i386/translate.c
-+++ b/target/i386/translate.c
-@@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State
-*env, DisasContext *s,
-     s->vex_l = 0;
-     s->vex_v = 0;
-  next_byte:
-+    /* The prefixes can atmost be 14 bytes since x86 has an upper
-+       limit of 15 bytes for the instruction */
-+    if (s->pc - pc_start > 14) {
-+        goto illegal_op;
-+    }
-     b = cpu_ldub_code(env, s->pc);
-     s->pc++;
-     /* Collect prefixes.  */
-
---
-Pranith
-
-On 23/03/2017 17:50, Pranith Kumar wrote:
->
-On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote:
->
->
->
->
->
-> On 22/03/2017 21:01, Richard Henderson wrote:
->
->>>
->
->>> Ah, OK. Thanks for the explanation. May be we should check the size of
->
->>> the instruction while decoding the prefixes and error out once we
->
->>> exceed the limit. We would not generate any IR code.
->
->>
->
->> Yes.
->
->>
->
->> It would not enforce a true limit of 15 bytes, since you can't know that
->
->> until you've done the rest of the decode.  But you'd be able to say that
->
->> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25
->
->> bytes is used.
->
->>
->
->> Which does fix the bug.
->
->
->
-> Yeah, that would work for 2.9 if somebody wants to put together a patch.
->
->  Ensuring that all instruction fetching happens before translation side
->
-> effects is a little harder, but perhaps it's also the opportunity to get
->
-> rid of s->rip_offset which is a little ugly.
->
->
-How about the following?
->
->
-diff --git a/target/i386/translate.c b/target/i386/translate.c
->
-index 72c1b03a2a..67c58b8900 100644
->
---- a/target/i386/translate.c
->
-+++ b/target/i386/translate.c
->
-@@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State
->
-*env, DisasContext *s,
->
-s->vex_l = 0;
->
-s->vex_v = 0;
->
-next_byte:
->
-+    /* The prefixes can atmost be 14 bytes since x86 has an upper
->
-+       limit of 15 bytes for the instruction */
->
-+    if (s->pc - pc_start > 14) {
->
-+        goto illegal_op;
->
-+    }
->
-b = cpu_ldub_code(env, s->pc);
->
-s->pc++;
->
-/* Collect prefixes.  */
-Please make the comment more verbose, based on Richard's remark.  We
-should apply it to 2.9.
-
-Also, QEMU usually formats comments with stars on every line.
-
-Paolo
-
-On Thu, Mar 23, 2017 at 1:37 PM, Paolo Bonzini <address@hidden> wrote:
->
->
->
-On 23/03/2017 17:50, Pranith Kumar wrote:
->
-> On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote:
->
->>
->
->>
->
->> On 22/03/2017 21:01, Richard Henderson wrote:
->
->>>>
->
->>>> Ah, OK. Thanks for the explanation. May be we should check the size of
->
->>>> the instruction while decoding the prefixes and error out once we
->
->>>> exceed the limit. We would not generate any IR code.
->
->>>
->
->>> Yes.
->
->>>
->
->>> It would not enforce a true limit of 15 bytes, since you can't know that
->
->>> until you've done the rest of the decode.  But you'd be able to say that
->
->>> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25
->
->>> bytes is used.
->
->>>
->
->>> Which does fix the bug.
->
->>
->
->> Yeah, that would work for 2.9 if somebody wants to put together a patch.
->
->>  Ensuring that all instruction fetching happens before translation side
->
->> effects is a little harder, but perhaps it's also the opportunity to get
->
->> rid of s->rip_offset which is a little ugly.
->
->
->
-> How about the following?
->
->
->
-> diff --git a/target/i386/translate.c b/target/i386/translate.c
->
-> index 72c1b03a2a..67c58b8900 100644
->
-> --- a/target/i386/translate.c
->
-> +++ b/target/i386/translate.c
->
-> @@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State
->
-> *env, DisasContext *s,
->
->      s->vex_l = 0;
->
->      s->vex_v = 0;
->
->   next_byte:
->
-> +    /* The prefixes can atmost be 14 bytes since x86 has an upper
->
-> +       limit of 15 bytes for the instruction */
->
-> +    if (s->pc - pc_start > 14) {
->
-> +        goto illegal_op;
->
-> +    }
->
->      b = cpu_ldub_code(env, s->pc);
->
->      s->pc++;
->
->      /* Collect prefixes.  */
->
->
-Please make the comment more verbose, based on Richard's remark.  We
->
-should apply it to 2.9.
->
->
-Also, QEMU usually formats comments with stars on every line.
-OK. I'll send a proper patch with updated comment.
-
-Thanks,
---
-Pranith
-
diff --git a/results/classifier/009/permissions/14887122 b/results/classifier/009/permissions/14887122
deleted file mode 100644
index ae50ba435..000000000
--- a/results/classifier/009/permissions/14887122
+++ /dev/null
@@ -1,268 +0,0 @@
-permissions: 0.964
-files: 0.944
-debug: 0.934
-semantic: 0.928
-device: 0.919
-PID: 0.914
-socket: 0.914
-graphic: 0.910
-performance: 0.897
-other: 0.890
-vnc: 0.871
-network: 0.855
-boot: 0.831
-KVM: 0.814
-
-[BUG][RFC] CPR transfer Issues: Socket permissions and PID files
-
-Hello,
-
-While testing CPR transfer I encountered two issues. The first is that the 
-transfer fails when running with pidfiles due to the destination qemu process 
-attempting to create the pidfile while it is still locked by the source 
-process. The second is that the transfer fails when running with the -run-with 
-user=$USERID parameter. This is because the destination qemu process creates 
-the UNIX sockets used for the CPR transfer before dropping to the lower 
-permissioned user, which causes them to be owned by the original user. The 
-source qemu process then does not have permission to connect to it because it 
-is already running as the lesser permissioned user.
-
-Reproducing the first issue:
-
-Create a source and destination qemu instance associated with the same VM where 
-both processes have the -pidfile parameter passed on the command line. You 
-should see the following error on the command line of the second process:
-
-qemu-system-x86_64: cannot create PID file: Cannot lock pid file: Resource 
-temporarily unavailable
-
-Reproducing the second issue:
-
-Create a source and destination qemu instance associated with the same VM where 
-both processes have -run-with user=$USERID passed on the command line, where 
-$USERID is a different user from the one launching the processes. Then attempt 
-a CPR transfer using UNIX sockets for the main and cpr sockets. You should 
-receive the following error via QMP:
-{"error": {"class": "GenericError", "desc": "Failed to connect to 'cpr.sock': 
-Permission denied"}}
-
-I provided a minimal patch that works around the second issue.
-
-Thank you,
-Ben Chaney
-
----
-include/system/os-posix.h | 4 ++++
-os-posix.c | 8 --------
-util/qemu-sockets.c | 21 +++++++++++++++++++++
-3 files changed, 25 insertions(+), 8 deletions(-)
-
-diff --git a/include/system/os-posix.h b/include/system/os-posix.h
-index ce5b3bccf8..2a414a914a 100644
---- a/include/system/os-posix.h
-+++ b/include/system/os-posix.h
-@@ -55,6 +55,10 @@ void os_setup_limits(void);
-void os_setup_post(void);
-int os_mlock(bool on_fault);
-
-+extern struct passwd *user_pwd;
-+extern uid_t user_uid;
-+extern gid_t user_gid;
-+
-/**
-* qemu_alloc_stack:
-* @sz: pointer to a size_t holding the requested usable stack size
-diff --git a/os-posix.c b/os-posix.c
-index 52925c23d3..9369b312a0 100644
---- a/os-posix.c
-+++ b/os-posix.c
-@@ -86,14 +86,6 @@ void os_set_proc_name(const char *s)
-}
-
-
--/*
-- * Must set all three of these at once.
-- * Legal combinations are unset by name by uid
-- */
--static struct passwd *user_pwd; /* NULL non-NULL NULL */
--static uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */
--static gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */
--
-/*
-* Prepare to change user ID. user_id can be one of 3 forms:
-* - a username, in which case user ID will be changed to its uid,
-diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
-index 77477c1cd5..987977ead9 100644
---- a/util/qemu-sockets.c
-+++ b/util/qemu-sockets.c
-@@ -871,6 +871,14 @@ static bool saddr_is_tight(UnixSocketAddress *saddr)
-#endif
-}
-
-+/*
-+ * Must set all three of these at once.
-+ * Legal combinations are unset by name by uid
-+ */
-+struct passwd *user_pwd; /* NULL non-NULL NULL */
-+uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */
-+gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */
-+
-static int unix_listen_saddr(UnixSocketAddress *saddr,
-int num,
-Error **errp)
-@@ -947,6 +955,19 @@ static int unix_listen_saddr(UnixSocketAddress *saddr,
-error_setg_errno(errp, errno, "Failed to bind socket to %s", path);
-goto err;
-}
-+ if (user_pwd) {
-+ if (chown(un.sun_path, user_pwd->pw_uid, user_pwd->pw_gid) < 0) {
-+ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", 
-path);
-+ goto err;
-+ }
-+ }
-+ else if (user_uid != -1 && user_gid != -1) {
-+ if (chown(un.sun_path, user_uid, user_gid) < 0) {
-+ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", 
-path);
-+ goto err;
-+ }
-+ }
-+
-if (listen(sock, num) < 0) {
-error_setg_errno(errp, errno, "Failed to listen on socket");
-goto err;
---
-2.40.1
-
-Thank you Ben.  I appreciate you testing CPR and shaking out the bugs.
-I will study these and propose patches.
-
-My initial reaction to the pidfile issue is that the orchestration layer must
-pass a different filename when starting the destination qemu instance.  When
-using live update without containers, these types of resource conflicts in the
-global namespaces are a known issue.
-
-- Steve
-
-On 3/14/2025 2:33 PM, Chaney, Ben wrote:
-Hello,
-
-While testing CPR transfer I encountered two issues. The first is that the 
-transfer fails when running with pidfiles due to the destination qemu process 
-attempting to create the pidfile while it is still locked by the source 
-process. The second is that the transfer fails when running with the -run-with 
-user=$USERID parameter. This is because the destination qemu process creates 
-the UNIX sockets used for the CPR transfer before dropping to the lower 
-permissioned user, which causes them to be owned by the original user. The 
-source qemu process then does not have permission to connect to it because it 
-is already running as the lesser permissioned user.
-
-Reproducing the first issue:
-
-Create a source and destination qemu instance associated with the same VM where 
-both processes have the -pidfile parameter passed on the command line. You 
-should see the following error on the command line of the second process:
-
-qemu-system-x86_64: cannot create PID file: Cannot lock pid file: Resource 
-temporarily unavailable
-
-Reproducing the second issue:
-
-Create a source and destination qemu instance associated with the same VM where 
-both processes have -run-with user=$USERID passed on the command line, where 
-$USERID is a different user from the one launching the processes. Then attempt 
-a CPR transfer using UNIX sockets for the main and cpr sockets. You should 
-receive the following error via QMP:
-{"error": {"class": "GenericError", "desc": "Failed to connect to 'cpr.sock': 
-Permission denied"}}
-
-I provided a minimal patch that works around the second issue.
-
-Thank you,
-Ben Chaney
-
----
-include/system/os-posix.h | 4 ++++
-os-posix.c | 8 --------
-util/qemu-sockets.c | 21 +++++++++++++++++++++
-3 files changed, 25 insertions(+), 8 deletions(-)
-
-diff --git a/include/system/os-posix.h b/include/system/os-posix.h
-index ce5b3bccf8..2a414a914a 100644
---- a/include/system/os-posix.h
-+++ b/include/system/os-posix.h
-@@ -55,6 +55,10 @@ void os_setup_limits(void);
-void os_setup_post(void);
-int os_mlock(bool on_fault);
-
-+extern struct passwd *user_pwd;
-+extern uid_t user_uid;
-+extern gid_t user_gid;
-+
-/**
-* qemu_alloc_stack:
-* @sz: pointer to a size_t holding the requested usable stack size
-diff --git a/os-posix.c b/os-posix.c
-index 52925c23d3..9369b312a0 100644
---- a/os-posix.c
-+++ b/os-posix.c
-@@ -86,14 +86,6 @@ void os_set_proc_name(const char *s)
-}
-
-
--/*
-- * Must set all three of these at once.
-- * Legal combinations are unset by name by uid
-- */
--static struct passwd *user_pwd; /* NULL non-NULL NULL */
--static uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */
--static gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */
--
-/*
-* Prepare to change user ID. user_id can be one of 3 forms:
-* - a username, in which case user ID will be changed to its uid,
-diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c
-index 77477c1cd5..987977ead9 100644
---- a/util/qemu-sockets.c
-+++ b/util/qemu-sockets.c
-@@ -871,6 +871,14 @@ static bool saddr_is_tight(UnixSocketAddress *saddr)
-#endif
-}
-
-+/*
-+ * Must set all three of these at once.
-+ * Legal combinations are unset by name by uid
-+ */
-+struct passwd *user_pwd; /* NULL non-NULL NULL */
-+uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */
-+gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */
-+
-static int unix_listen_saddr(UnixSocketAddress *saddr,
-int num,
-Error **errp)
-@@ -947,6 +955,19 @@ static int unix_listen_saddr(UnixSocketAddress *saddr,
-error_setg_errno(errp, errno, "Failed to bind socket to %s", path);
-goto err;
-}
-+ if (user_pwd) {
-+ if (chown(un.sun_path, user_pwd->pw_uid, user_pwd->pw_gid) < 0) {
-+ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", 
-path);
-+ goto err;
-+ }
-+ }
-+ else if (user_uid != -1 && user_gid != -1) {
-+ if (chown(un.sun_path, user_uid, user_gid) < 0) {
-+ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", 
-path);
-+ goto err;
-+ }
-+ }
-+
-if (listen(sock, num) < 0) {
-error_setg_errno(errp, errno, "Failed to listen on socket");
-goto err;
---
-2.40.1
-
diff --git a/results/classifier/009/permissions/23300761 b/results/classifier/009/permissions/23300761
deleted file mode 100644
index 929fc4adf..000000000
--- a/results/classifier/009/permissions/23300761
+++ /dev/null
@@ -1,323 +0,0 @@
-permissions: 0.984
-debug: 0.978
-other: 0.963
-performance: 0.952
-PID: 0.950
-semantic: 0.950
-device: 0.932
-boot: 0.929
-socket: 0.927
-vnc: 0.926
-graphic: 0.924
-files: 0.910
-KVM: 0.897
-network: 0.879
-
-[Qemu-devel] [BUG] 216 Alerts reported by LGTM for QEMU (some might be release critical)
-
-Hi,
-LGTM reports 16 errors, 81 warnings and 119 recommendations:
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
-.
-Some of them are already know (wrong format strings), others look like
-real errors:
-- several multiplication results which don't work as they should in
-contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
-32 bit!),Â  target/i386/translate.c and other files
-- potential buffer overflows in gdbstub.c and other files
-I am afraid that the overflows in the block code are release critical,
-maybe that in target/i386/translate.c and other errors, too.
-About half of the alerts are issues which can be fixed later.
-
-Regards
-
-Stefan
-
-On 13/07/19 19:46, Stefan Weil wrote:
->
->
-LGTM reports 16 errors, 81 warnings and 119 recommendations:
->
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
-.
->
->
-Some of them are already know (wrong format strings), others look like
->
-real errors:
->
->
-- several multiplication results which don't work as they should in
->
-contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
->
-32 bit!),Â  target/i386/translate.c and other files
-m->nb_clusters here is limited by s->l2_slice_size (see for example
-handle_alloc) so I wouldn't be surprised if this is a false positive.  I
-couldn't find this particular multiplication in Coverity, but it has
-about 250 issues marked as intentional or false positive so there's
-probably a lot of overlap with what LGTM found.
-
-Paolo
-
-Am 13.07.2019 um 21:42 schrieb Paolo Bonzini:
->
-On 13/07/19 19:46, Stefan Weil wrote:
->
-> LGTM reports 16 errors, 81 warnings and 119 recommendations:
->
->
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
-.
->
->
->
-> Some of them are already known (wrong format strings), others look like
->
-> real errors:
->
->
->
-> - several multiplication results which don't work as they should in
->
-> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
->
-> 32 bit!),Â  target/i386/translate.c and other files
->
-m->nb_clusters here is limited by s->l2_slice_size (see for example
->
-handle_alloc) so I wouldn't be surprised if this is a false positive.  I
->
-couldn't find this particular multiplication in Coverity, but it has
->
-about 250 issues marked as intentional or false positive so there's
->
-probably a lot of overlap with what LGTM found.
->
->
-Paolo
->
-From other projects I know that there is a certain overlap between the
-results from Coverity Scan an LGTM, but it is good to have both
-analyzers, and the results from LGTM are typically quite reliable.
-
-Even if we know that there is no multiplication overflow, the code could
-be modified. Either the assigned value should use the same data type as
-the factors (possible when there is never an overflow, avoids a size
-extension), or the multiplication could use the larger data type by
-adding a type cast to one of the factors (then an overflow cannot
-happen, static code analysers and human reviewers have an easier job,
-but the multiplication costs more time).
-
-Stefan
-
-Am 14.07.2019 um 15:28 hat Stefan Weil geschrieben:
->
-Am 13.07.2019 um 21:42 schrieb Paolo Bonzini:
->
-> On 13/07/19 19:46, Stefan Weil wrote:
->
->> LGTM reports 16 errors, 81 warnings and 119 recommendations:
->
->>
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
-.
->
->>
->
->> Some of them are already known (wrong format strings), others look like
->
->> real errors:
->
->>
->
->> - several multiplication results which don't work as they should in
->
->> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only
->
->> 32 bit!),Â  target/i386/translate.c and other files
-Request sizes are limited to 32 bit in the generic block layer before
-they are even passed to the individual block drivers, so most if not all
-of these are going to be false positives.
-
->
-> m->nb_clusters here is limited by s->l2_slice_size (see for example
->
-> handle_alloc) so I wouldn't be surprised if this is a false positive.  I
->
-> couldn't find this particular multiplication in Coverity, but it has
->
-> about 250 issues marked as intentional or false positive so there's
->
-> probably a lot of overlap with what LGTM found.
->
->
->
-> Paolo
->
->
-From other projects I know that there is a certain overlap between the
->
-results from Coverity Scan an LGTM, but it is good to have both
->
-analyzers, and the results from LGTM are typically quite reliable.
->
->
-Even if we know that there is no multiplication overflow, the code could
->
-be modified. Either the assigned value should use the same data type as
->
-the factors (possible when there is never an overflow, avoids a size
->
-extension), or the multiplication could use the larger data type by
->
-adding a type cast to one of the factors (then an overflow cannot
->
-happen, static code analysers and human reviewers have an easier job,
->
-but the multiplication costs more time).
-But if you look at the code we're talking about, you see that it's
-complaining about things where being more explicit would make things
-less readable.
-
-For example, if complains about the multiplication in this line:
-
-    s->file_size += n * s->header.cluster_size;
-
-We know that n * s->header.cluster_size fits in 32 bits, but
-s->file_size is 64 bits (and has to be 64 bits). Do you really think we
-should introduce another uint32_t variable to store the intermediate
-result? And if we cast n to uint64_t, not only might the multiplication
-cost more time, but also human readers would wonder why the result could
-become larger than 32 bits. So a cast would be misleading.
-
-
-It also complains about this line:
-
-    ret = bdrv_truncate(bs->file, (3 + l1_clusters) * s->cluster_size,
-                        PREALLOC_MODE_OFF, &local_err);
-
-Here, we don't even assign the result to a 64 bit variable, but just
-pass it to a function which takes a 64 bit parameter. Again, I don't
-think introducing additional variables for the intermediate result or
-adding casts would be an improvement of the situation.
-
-
-So I don't think this is a good enough tool to base our code on what it
-does and doesn't understand. It would have too much of a negative impact
-on our code. We'd rather need a way to mark false positives as such and
-move on without changing the code in such cases.
-
-Kevin
-
-On Sat, 13 Jul 2019 at 18:46, Stefan Weil <address@hidden> wrote:
->
-LGTM reports 16 errors, 81 warnings and 119 recommendations:
->
-https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list
-.
-I had a look at some of these before, but mostly I came
-to the conclusion that it wasn't worth trying to put the
-effort into keeping up with the site because they didn't
-seem to provide any useful way to mark things as false
-positives. Coverity has its flaws but at least you can do
-that kind of thing in its UI (it runs at about a 33% fp
-rate, I think.) "Analyzer thinks this multiply can overflow
-but in fact it's not possible" is quite a common false
-positive cause...
-
-Anyway, if you want to fish out specific issues, analyse
-whether they're false positive or real, and report them
-to the mailing list as followups to the patches which
-introduced the issue, that's probably the best way for
-us to make use of this analyzer. (That is essentially
-what I do for coverity.)
-
-thanks
--- PMM
-
-Am 14.07.2019 um 19:30 schrieb Peter Maydell:
-[...]
->
-"Analyzer thinks this multiply can overflow
->
-but in fact it's not possible" is quite a common false
->
-positive cause...
-The analysers don't complain because a multiply can overflow.
-
-They complain because the code indicates that a larger result is
-expected, for example uint64_t = uint32_t * uint32_t. They would not
-complain for the same multiplication if it were assigned to a uint32_t.
-
-So there is a simple solution to write the code in a way which avoids
-false positives...
-
-Stefan
-
-Stefan Weil <address@hidden> writes:
-
->
-Am 14.07.2019 um 19:30 schrieb Peter Maydell:
->
-[...]
->
-> "Analyzer thinks this multiply can overflow
->
-> but in fact it's not possible" is quite a common false
->
-> positive cause...
->
->
->
-The analysers don't complain because a multiply can overflow.
->
->
-They complain because the code indicates that a larger result is
->
-expected, for example uint64_t = uint32_t * uint32_t. They would not
->
-complain for the same multiplication if it were assigned to a uint32_t.
-I agree this is an anti-pattern.
-
->
-So there is a simple solution to write the code in a way which avoids
->
-false positives...
-You wrote elsewhere in this thread:
-
-    Either the assigned value should use the same data type as the
-    factors (possible when there is never an overflow, avoids a size
-    extension), or the multiplication could use the larger data type by
-    adding a type cast to one of the factors (then an overflow cannot
-    happen, static code analysers and human reviewers have an easier
-    job, but the multiplication costs more time).
-
-Makes sense to me.
-
-On 7/14/19 5:30 PM, Peter Maydell wrote:
->
-I had a look at some of these before, but mostly I came
->
-to the conclusion that it wasn't worth trying to put the
->
-effort into keeping up with the site because they didn't
->
-seem to provide any useful way to mark things as false
->
-positives. Coverity has its flaws but at least you can do
->
-that kind of thing in its UI (it runs at about a 33% fp
->
-rate, I think.)
-Yes, LGTM wants you to modify the source code with
-
-  /* lgtm [cpp/some-warning-code] */
-
-and on the same line as the reported problem.  Which is mildly annoying in that
-you're definitely committing to LGTM in the long term.  Also for any
-non-trivial bit of code, it will almost certainly run over 80 columns.
-
-
-r~
-
diff --git a/results/classifier/009/permissions/26095107 b/results/classifier/009/permissions/26095107
deleted file mode 100644
index d5075496a..000000000
--- a/results/classifier/009/permissions/26095107
+++ /dev/null
@@ -1,168 +0,0 @@
-permissions: 0.993
-debug: 0.993
-files: 0.989
-PID: 0.988
-device: 0.988
-performance: 0.987
-socket: 0.987
-boot: 0.987
-KVM: 0.985
-other: 0.979
-semantic: 0.974
-vnc: 0.972
-graphic: 0.955
-network: 0.879
-
-[Qemu-devel]  [Bug Report] vm paused after succeeding to migrate
-
-Hi, all
-I encounterd a bug when I try to migrate a windows vm.
-
-Enviroment information:
-host A: cpu E5620(model WestmereEP without flag xsave)
-host B: cpu E5-2643(model SandyBridgeEP with xsave)
-
-The reproduce steps is :
-1. Start a windows 2008 vm with -cpu host(which means host-passthrough).
-2. Migrate the vm to host B when cr4.OSXSAVE=0 (successfully).
-3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1.
-4. Then migrate the vm to host A (successfully), but vm was paused, and qemu 
-printed log as followed:
-
-KVM: entry failed, hardware error 0x80000021
-
-If you're running a guest on an Intel machine without unrestricted mode
-support, the failure can be most likely due to the guest entering an invalid
-state for Intel VT. For example, the guest maybe running in big real mode
-which is not supported on less recent Intel processors.
-
-EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000
-ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20
-EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
-ES =0000 00000000 0000ffff 00009300
-CS =f000 ffff0000 0000ffff 00009b00
-SS =0000 00000000 0000ffff 00009300
-DS =0000 00000000 0000ffff 00009300
-FS =0000 00000000 0000ffff 00009300
-GS =0000 00000000 0000ffff 00009300
-LDT=0000 00000000 0000ffff 00008200
-TR =0000 00000000 0000ffff 00008b00
-GDT=     00000000 0000ffff
-IDT=     00000000 0000ffff
-CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
-DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 
-DR3=0000000000000000
-DR6=00000000ffff0ff0 DR7=0000000000000400
-EFER=0000000000000000
-Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 
-00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
-
-I have found that problem happened when kvm_put_sregs returns err -22(called by 
-kvm_arch_put_registers(qemu)).
-Because kvm_arch_vcpu_ioctl_set_sregs(kvm-mod) checked that guest_cpuid_has no 
-X86_FEATURE_XSAVE but cr4.OSXSAVE=1.
-So should we cancel migration when kvm_arch_put_registers returns error?
-
-* linzhecheng (address@hidden) wrote:
->
-Hi, all
->
-I encounterd a bug when I try to migrate a windows vm.
->
->
-Enviroment information:
->
-host A: cpu E5620(model WestmereEP without flag xsave)
->
-host B: cpu E5-2643(model SandyBridgeEP with xsave)
->
->
-The reproduce steps is :
->
-1. Start a windows 2008 vm with -cpu host(which means host-passthrough).
->
-2. Migrate the vm to host B when cr4.OSXSAVE=0 (successfully).
->
-3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1.
->
-4. Then migrate the vm to host A (successfully), but vm was paused, and qemu
->
-printed log as followed:
-Remember that migrating using -cpu host  across different CPU models is NOT
-expected to work.
-
->
-KVM: entry failed, hardware error 0x80000021
->
->
-If you're running a guest on an Intel machine without unrestricted mode
->
-support, the failure can be most likely due to the guest entering an invalid
->
-state for Intel VT. For example, the guest maybe running in big real mode
->
-which is not supported on less recent Intel processors.
->
->
-EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000
->
-ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20
->
-EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
->
-ES =0000 00000000 0000ffff 00009300
->
-CS =f000 ffff0000 0000ffff 00009b00
->
-SS =0000 00000000 0000ffff 00009300
->
-DS =0000 00000000 0000ffff 00009300
->
-FS =0000 00000000 0000ffff 00009300
->
-GS =0000 00000000 0000ffff 00009300
->
-LDT=0000 00000000 0000ffff 00008200
->
-TR =0000 00000000 0000ffff 00008b00
->
-GDT=     00000000 0000ffff
->
-IDT=     00000000 0000ffff
->
-CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000
->
-DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000
->
-DR3=0000000000000000
->
-DR6=00000000ffff0ff0 DR7=0000000000000400
->
-EFER=0000000000000000
->
-Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00
->
-00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
->
-00
->
->
-I have found that problem happened when kvm_put_sregs returns err -22(called
->
-by kvm_arch_put_registers(qemu)).
->
-Because kvm_arch_vcpu_ioctl_set_sregs(kvm-mod) checked that guest_cpuid_has
->
-no X86_FEATURE_XSAVE but cr4.OSXSAVE=1.
->
-So should we cancel migration when kvm_arch_put_registers returns error?
-It would seem good if we can make the migration fail there rather than
-hitting that KVM error.
-It looks like we need to do a bit of plumbing to convert the places that
-call it to return a bool rather than void.
-
-Dave
-
---
-Dr. David Alan Gilbert / address@hidden / Manchester, UK
-
diff --git a/results/classifier/009/permissions/26430026 b/results/classifier/009/permissions/26430026
deleted file mode 100644
index d081ea9ab..000000000
--- a/results/classifier/009/permissions/26430026
+++ /dev/null
@@ -1,175 +0,0 @@
-permissions: 0.937
-debug: 0.925
-KVM: 0.919
-semantic: 0.904
-device: 0.904
-performance: 0.898
-PID: 0.894
-vnc: 0.893
-files: 0.879
-graphic: 0.862
-boot: 0.841
-socket: 0.817
-other: 0.813
-network: 0.758
-
-[BUG] cxl,i386: e820 mappings may not be correct for cxl
-
-Context included below from prior discussion
-    - `cxl create-region` would fail on inability to allocate memory
-    - traced this down to the memory region being marked RESERVED
-    - E820 map marks the CXL fixed memory window as RESERVED
-
-
-Re: x86 errors, I found that region worked with this patch. (I also
-added the SRAT patches the Davidlohr posted, but I do not think they are
-relevant).
-
-I don't think this is correct, and setting this to E820_RAM causes the
-system to fail to boot at all, but with this change `cxl create-region`
-succeeds, which suggests our e820 mappings in the i386 machine are
-incorrect.
-
-Anyone who can help or have an idea as to what e820 should actually be
-doing with this region, or if this is correct and something else is
-failing, please help!
-
-
-diff --git a/hw/i386/pc.c b/hw/i386/pc.c
-index 566accf7e6..a5e688a742 100644
---- a/hw/i386/pc.c
-+++ b/hw/i386/pc.c
-@@ -1077,7 +1077,7 @@ void pc_memory_init(PCMachineState *pcms,
-                 memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, fw,
-                                       "cxl-fixed-memory-region", fw->size);
-                 memory_region_add_subregion(system_memory, fw->base, &fw->mr);
--                e820_add_entry(fw->base, fw->size, E820_RESERVED);
-+                e820_add_entry(fw->base, fw->size, E820_NVS);
-                 cxl_fmw_base += fw->size;
-                 cxl_resv_end = cxl_fmw_base;
-             }
-
-
-On Mon, Oct 10, 2022 at 05:32:42PM +0100, Jonathan Cameron wrote:
->
->
-> > but i'm not sure of what to do with this info.  We have some proof
->
-> > that real hardware works with this no problem, and the only difference
->
-> > is that the EFI/bios/firmware is setting the memory regions as `usable`
->
-> > or `soft reserved`, which would imply the EDK2 is the blocker here
->
-> > regardless of the OS driver status.
->
-> >
->
-> > But I'd seen elsewhere you had gotten some of this working, and I'm
->
-> > failing to get anything working at the moment.  If you have any input i
->
-> > would greatly appreciate the help.
->
-> >
->
-> > QEMU config:
->
-> >
->
-> > /opt/qemu-cxl2/bin/qemu-system-x86_64 \
->
-> > -drive
->
-> > file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=d\
->
-> > -m 2G,slots=4,maxmem=4G \
->
-> > -smp 4 \
->
-> > -machine type=q35,accel=kvm,cxl=on \
->
-> > -enable-kvm \
->
-> > -nographic \
->
-> > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
->
-> > -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 \
->
-> > -object memory-backend-file,id=cxl-mem0,mem-path=/tmp/cxl-mem0,size=256M \
->
-> > -object memory-backend-file,id=lsa0,mem-path=/tmp/cxl-lsa0,size=256M \
->
-> > -device cxl-type3,bus=rp0,pmem=true,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0
->
-> > \
->
-> > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=256M
->
-> >
->
-> > I'd seen on the lists that you had seen issues with single-rp setups,
->
-> > but no combination of configuration I've tried (including all the ones
->
-> > in the docs and tests) lead to a successful region creation with
->
-> > `cxl create-region`
->
->
->
-> Hmm. Let me have a play.  I've not run x86 tests for a while so
->
-> perhaps something is missing there.
->
->
->
-> I'm carrying a patch to override check_last_peer() in
->
-> cxl_port_setup_targets() as that is wrong for some combinations,
->
-> but that doesn't look like it's related to what you are seeing.
->
->
-I'm not sure if it's relevant, but turned out I'd forgotten I'm carrying 3
->
-patches that aren't upstream (and one is a horrible hack).
->
->
-Hack:
-https://lore.kernel.org/linux-cxl/20220819094655.000005ed@huawei.com/
->
-Shouldn't affect a simple case like this...
->
->
-https://lore.kernel.org/linux-cxl/20220819093133.00006c22@huawei.com/T/#t
->
-(Dan's version)
->
->
-https://lore.kernel.org/linux-cxl/20220815154044.24733-1-Jonathan.Cameron@huawei.com/T/#t
->
->
-For writes to work you will currently need two rps (nothing on the second is
->
-fine)
->
-as we still haven't resolved if the kernel should support an HDM decoder on
->
-a host bridge with one port.  I think it should (Spec allows it), others
->
-unconvinced.
->
->
-Note I haven't shifted over to x86 yet so may still be something different
->
-from
->
-arm64.
->
->
-Jonathan
->
->
-
diff --git a/results/classifier/009/permissions/31349848 b/results/classifier/009/permissions/31349848
deleted file mode 100644
index 8dc0eb6b0..000000000
--- a/results/classifier/009/permissions/31349848
+++ /dev/null
@@ -1,164 +0,0 @@
-permissions: 0.908
-PID: 0.903
-other: 0.901
-device: 0.881
-graphic: 0.876
-performance: 0.864
-vnc: 0.854
-semantic: 0.846
-socket: 0.846
-KVM: 0.827
-debug: 0.826
-files: 0.820
-boot: 0.815
-network: 0.769
-
-[Qemu-devel]  [BUG] qemu stuck when detach host-usb device
-
-Description of problem:
-The guest has a host-usb device(Kingston Technology DataTraveler 100 G3/G4/SE9 
-G2), which is attached
-to xhci controller(on host). Qemu will stuck if I detach it from guest.
-
-How reproducible:
-100%
-
-Steps to Reproduce:
-1.            Use usb stick to copy files in guest , make it busy working.
-2.            virsh detach-device vm_name usb.xml
-
-Then qemu will stuck for 20s, I found this is because libusb_release_interface 
-block for 20s.
-Dmesg prints:
-
-[35442.034861] usb 4-2.1: Disable of device-initiated U1 failed.
-[35447.034993] usb 4-2.1: Disable of device-initiated U2 failed.
-[35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed.
-[35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed.
-
-Is this a hardware error or software's bug?
-
-On Tue, Nov 27, 2018 at 01:26:24AM +0000, linzhecheng wrote:
->
-Description of problem:
->
-The guest has a host-usb device(Kingston Technology DataTraveler 100
->
-G3/G4/SE9 G2), which is attached
->
-to xhci controller(on host). Qemu will stuck if I detach it from guest.
->
->
-How reproducible:
->
-100%
->
->
-Steps to Reproduce:
->
-1.            Use usb stick to copy files in guest , make it busy working.
->
-2.            virsh detach-device vm_name usb.xml
->
->
-Then qemu will stuck for 20s, I found this is because
->
-libusb_release_interface block for 20s.
->
-Dmesg prints:
->
->
-[35442.034861] usb 4-2.1: Disable of device-initiated U1 failed.
->
-[35447.034993] usb 4-2.1: Disable of device-initiated U2 failed.
->
-[35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed.
->
-[35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed.
->
->
-Is this a hardware error or software's bug?
-I'd guess software error, could be is libusb or (host) linux kernel.
-Cc'ing libusb-devel.
-
-cheers,
-  Gerd
-
->
------Original Message-----
->
-From: Gerd Hoffmann [
-mailto:address@hidden
->
-Sent: Tuesday, November 27, 2018 2:09 PM
->
-To: linzhecheng <address@hidden>
->
-Cc: address@hidden; wangxin (U) <address@hidden>;
->
-Zhoujian (jay) <address@hidden>; address@hidden
->
-Subject: Re: [Qemu-devel] [BUG] qemu stuck when detach host-usb device
->
->
-On Tue, Nov 27, 2018 at 01:26:24AM +0000, linzhecheng wrote:
->
-> Description of problem:
->
-> The guest has a host-usb device(Kingston Technology DataTraveler 100
->
-> G3/G4/SE9 G2), which is attached to xhci controller(on host). Qemu will
->
-> stuck
->
-if I detach it from guest.
->
->
->
-> How reproducible:
->
-> 100%
->
->
->
-> Steps to Reproduce:
->
-> 1.            Use usb stick to copy files in guest , make it busy working.
->
-> 2.            virsh detach-device vm_name usb.xml
->
->
->
-> Then qemu will stuck for 20s, I found this is because
->
-> libusb_release_interface
->
-block for 20s.
->
-> Dmesg prints:
->
->
->
-> [35442.034861] usb 4-2.1: Disable of device-initiated U1 failed.
->
-> [35447.034993] usb 4-2.1: Disable of device-initiated U2 failed.
->
-> [35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed.
->
-> [35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed.
->
->
->
-> Is this a hardware error or software's bug?
->
->
-I'd guess software error, could be is libusb or (host) linux kernel.
->
-Cc'ing libusb-devel.
-Perhaps it's usb driver's bug. Could you also reproduce it?
->
->
-cheers,
->
-Gerd
-
diff --git a/results/classifier/009/permissions/48245039 b/results/classifier/009/permissions/48245039
deleted file mode 100644
index 18cdaa24d..000000000
--- a/results/classifier/009/permissions/48245039
+++ /dev/null
@@ -1,540 +0,0 @@
-permissions: 0.966
-debug: 0.961
-PID: 0.954
-device: 0.953
-other: 0.953
-semantic: 0.939
-graphic: 0.935
-socket: 0.932
-boot: 0.932
-vnc: 0.926
-files: 0.924
-performance: 0.890
-KVM: 0.855
-network: 0.818
-
-[Qemu-devel] [BUG] gcov support appears to be broken
-
-Hello, according to out docs, here is the procedure that should produce 
-coverage report for execution of the complete "make check":
-
-#./configure --enable-gcov
-#make
-#make check
-#make coverage-report
-
-It seems that first three commands execute as expected. (For example, there are 
-plenty of files generated by "make check" that would've not been generated if 
-"enable-gcov" hadn't been chosen.) However, the last command complains about 
-some missing files related to FP support. If those files are added (for 
-example, artificially, using "touch <missing-file"), that it starts complaining 
-about missing some decodetree-generated files. Other kinds of files are 
-involved too.
-
-It would be nice to have coverage support working. Please somebody take a look, 
-or explain if I make a mistake or misunderstood our gcov support.
-
-Yours,
-Aleksandar
-
-On Mon, 5 Aug 2019 at 11:39, Aleksandar Markovic <address@hidden> wrote:
->
->
-Hello, according to out docs, here is the procedure that should produce
->
-coverage report for execution of the complete "make check":
->
->
-#./configure --enable-gcov
->
-#make
->
-#make check
->
-#make coverage-report
->
->
-It seems that first three commands execute as expected. (For example, there
->
-are plenty of files generated by "make check" that would've not been
->
-generated if "enable-gcov" hadn't been chosen.) However, the last command
->
-complains about some missing files related to FP support. If those files are
->
-added (for example, artificially, using "touch <missing-file"), that it
->
-starts complaining about missing some decodetree-generated files. Other kinds
->
-of files are involved too.
->
->
-It would be nice to have coverage support working. Please somebody take a
->
-look, or explain if I make a mistake or misunderstood our gcov support.
-Cc'ing Alex who's probably the closest we have to a gcov expert.
-
-(make/make check of a --enable-gcov build is in the set of things our
-Travis CI setup runs, so we do defend that part against regressions.)
-
-thanks
--- PMM
-
-Peter Maydell <address@hidden> writes:
-
->
-On Mon, 5 Aug 2019 at 11:39, Aleksandar Markovic <address@hidden> wrote:
->
->
->
-> Hello, according to out docs, here is the procedure that should produce
->
-> coverage report for execution of the complete "make check":
->
->
->
-> #./configure --enable-gcov
->
-> #make
->
-> #make check
->
-> #make coverage-report
->
->
->
-> It seems that first three commands execute as expected. (For example,
->
-> there are plenty of files generated by "make check" that would've not
->
-> been generated if "enable-gcov" hadn't been chosen.) However, the
->
-> last command complains about some missing files related to FP
->
-> support. If those files are added (for example, artificially, using
->
-> "touch <missing-file"), that it starts complaining about missing some
->
-> decodetree-generated files. Other kinds of files are involved too.
-The gcov tool is fairly noisy about missing files but that just
-indicates the tests haven't exercised those code paths. "make check"
-especially doesn't touch much of the TCG code and a chunk of floating
-point.
-
->
->
->
-> It would be nice to have coverage support working. Please somebody
->
-> take a look, or explain if I make a mistake or misunderstood our gcov
->
-> support.
-So your failure mode is no report is generated at all? It's working for
-me here.
-
->
->
-Cc'ing Alex who's probably the closest we have to a gcov expert.
->
->
-(make/make check of a --enable-gcov build is in the set of things our
->
-Travis CI setup runs, so we do defend that part against regressions.)
-We defend the build but I have just checked and it seems our
-check_coverage script is currently failing:
-https://travis-ci.org/stsquad/qemu/jobs/567809808#L10328
-But as it's an after_success script it doesn't fail the build.
-
->
->
-thanks
->
--- PMM
---
-Alex BennÃ©e
-
->
-> #./configure --enable-gcov
->
-> #make
->
-> #make check
->
-> #make coverage-report
->
->
->
-> It seems that first three commands execute as expected. (For example,
->
-> there are plenty of files generated by "make check" that would've not
->
-> been generated if "enable-gcov" hadn't been chosen.) However, the
->
-> last command complains about some missing files related to FP
->
-So your failure mode is no report is generated at all? It's working for
->
-me here.
-Alex, no report is generated for my test setups - in fact, "make 
-coverage-report" even says that it explicitly deletes what appears to be the 
-main coverage report html file).
-
-This is the terminal output of an unsuccessful executions of "make 
-coverage-report" for recent ToT:
-
-~/Build/qemu-TOT-TEST$ make coverage-report
-make[1]: Entering directory '/home/user/Build/qemu-TOT-TEST/slirp'
-make[1]: Nothing to be done for 'all'.
-make[1]: Leaving directory '/home/user/Build/qemu-TOT-TEST/slirp'
-        CHK version_gen.h
-  GEN     coverage-report.html
-Traceback (most recent call last):
-  File "/usr/bin/gcovr", line 1970, in <module>
-    print_html_report(covdata, options.html_details)
-  File "/usr/bin/gcovr", line 1473, in print_html_report
-    INPUT = open(data['FILENAME'], 'r')
-IOError: [Errno 2] No such file or directory: 'wrap.inc.c'
-Makefile:1048: recipe for target 
-'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' failed
-make: *** 
-[/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html] Error 1
-make: *** Deleting file 
-'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html'
-
-This instance is executed in QEMU 3.0 source tree: (so, it looks the problem 
-existed for quite some time)
-
-~/Build/qemu-3.0$ make coverage-report
-        CHK version_gen.h
-  GEN     coverage-report.html
-Traceback (most recent call last):
-  File "/usr/bin/gcovr", line 1970, in <module>
-    print_html_report(covdata, options.html_details)
-  File "/usr/bin/gcovr", line 1473, in print_html_report
-    INPUT = open(data['FILENAME'], 'r')
-IOError: [Errno 2] No such file or directory: 
-'/home/user/Build/qemu-3.0/target/openrisc/decode.inc.c'
-Makefile:992: recipe for target 
-'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' failed
-make: *** [/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html] 
-Error 1
-make: *** Deleting file 
-'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html'
-
-Fond regards,
-Aleksandar
-
-
->
-Alex BennÃ©e
-
->
-> #./configure --enable-gcov
->
-> #make
->
-> #make check
->
-> #make coverage-report
->
->
->
-> It seems that first three commands execute as expected. (For example,
->
-> there are plenty of files generated by "make check" that would've not
->
-> been generated if "enable-gcov" hadn't been chosen.) However, the
->
-> last command complains about some missing files related to FP
->
-So your failure mode is no report is generated at all? It's working for
->
-me here.
-Another piece of info:
-
-~/Build/qemu-TOT-TEST$ gcov --version
-gcov (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010
-Copyright (C) 2015 Free Software Foundation, Inc.
-This is free software; see the source for copying conditions.
-There is NO warranty; not even for MERCHANTABILITY or 
-FITNESS FOR A PARTICULAR PURPOSE.
-
-:~/Build/qemu-TOT-TEST$ gcc --version
-gcc (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0
-Copyright (C) 2017 Free Software Foundation, Inc.
-This is free software; see the source for copying conditions.  There is NO
-warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-
-
-
-
-Alex, no report is generated for my test setups - in fact, "make 
-coverage-report" even says that it explicitly deletes what appears to be the 
-main coverage report html file).
-
-This is the terminal output of an unsuccessful executions of "make 
-coverage-report" for recent ToT:
-
-~/Build/qemu-TOT-TEST$ make coverage-report
-make[1]: Entering directory '/home/user/Build/qemu-TOT-TEST/slirp'
-make[1]: Nothing to be done for 'all'.
-make[1]: Leaving directory '/home/user/Build/qemu-TOT-TEST/slirp'
-        CHK version_gen.h
-  GEN     coverage-report.html
-Traceback (most recent call last):
-  File "/usr/bin/gcovr", line 1970, in <module>
-    print_html_report(covdata, options.html_details)
-  File "/usr/bin/gcovr", line 1473, in print_html_report
-    INPUT = open(data['FILENAME'], 'r')
-IOError: [Errno 2] No such file or directory: 'wrap.inc.c'
-Makefile:1048: recipe for target 
-'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' failed
-make: *** 
-[/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html] Error 1
-make: *** Deleting file 
-'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html'
-
-This instance is executed in QEMU 3.0 source tree: (so, it looks the problem 
-existed for quite some time)
-
-~/Build/qemu-3.0$ make coverage-report
-        CHK version_gen.h
-  GEN     coverage-report.html
-Traceback (most recent call last):
-  File "/usr/bin/gcovr", line 1970, in <module>
-    print_html_report(covdata, options.html_details)
-  File "/usr/bin/gcovr", line 1473, in print_html_report
-    INPUT = open(data['FILENAME'], 'r')
-IOError: [Errno 2] No such file or directory: 
-'/home/user/Build/qemu-3.0/target/openrisc/decode.inc.c'
-Makefile:992: recipe for target 
-'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' failed
-make: *** [/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html] 
-Error 1
-make: *** Deleting file 
-'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html'
-
-Fond regards,
-Aleksandar
-
-
->
-Alex BennÃ©e
-
->
-> #./configure --enable-gcov
->
-> #make
->
-> #make check
->
-> #make coverage-report
->
->
->
-> It seems that first three commands execute as expected. (For example,
->
-> there are plenty of files generated by "make check" that would've not
->
-> been generated if "enable-gcov" hadn't been chosen.) However, the
->
-> last command complains about some missing files related to FP
->
-So your failure mode is no report is generated at all? It's working for
->
-me here.
-Alex, here is the thing:
-
-Seeing that my gcovr is relatively old (2014) 3.2 version, I upgraded it from 
-git repo to the most recent 4.1 (actually, to a dev version, from the very tip 
-of the tree), and "make coverage-report" started generating coverage reports. 
-It did emit some error messages (totally different than previous), but still it 
-did not stop like it used to do with gcovr 3.2.
-
-Perhaps you would want to add some gcov/gcovr minimal version info in our docs. 
-(or at least a statement "this was tested with such and such gcc, gcov and 
-gcovr", etc.?)
-
-Coverage report looked fine at first glance, but it a kind of disappointed me 
-when I digged deeper into its content - for example, it shows very low coverage 
-for our FP code (softfloat), while, in fact, we know that "make check" contains 
-detailed tests on FP functionalities. But this is most likely a separate 
-problem of a very different nature, perhaps the issue of separate git repo for 
-FP tests (testfloat) that our FP tests use as a mid-layer.
-
-I'll try how everything works with my test examples, and will let you know.
-
-Your help is greatly appreciated,
-Aleksandar
-
-Fond regards,
-Aleksandar
-
-
->
-Alex BennÃ©e
-
-Aleksandar Markovic <address@hidden> writes:
-
->
->> #./configure --enable-gcov
->
->> #make
->
->> #make check
->
->> #make coverage-report
->
->>
->
->> It seems that first three commands execute as expected. (For example,
->
->> there are plenty of files generated by "make check" that would've not
->
->> been generated if "enable-gcov" hadn't been chosen.) However, the
->
->> last command complains about some missing files related to FP
->
->
-> So your failure mode is no report is generated at all? It's working for
->
-> me here.
->
->
-Alex, here is the thing:
->
->
-Seeing that my gcovr is relatively old (2014) 3.2 version, I upgraded it from
->
-git repo to the most recent 4.1 (actually, to a dev version, from the very
->
-tip of the tree), and "make coverage-report" started generating coverage
->
-reports. It did emit some error messages (totally different than previous),
->
-but still it did not stop like it used to do with gcovr 3.2.
->
->
-Perhaps you would want to add some gcov/gcovr minimal version info in our
->
-docs. (or at least a statement "this was tested with such and such gcc, gcov
->
-and gcovr", etc.?)
->
->
-Coverage report looked fine at first glance, but it a kind of
->
-disappointed me when I digged deeper into its content - for example,
->
-it shows very low coverage for our FP code (softfloat), while, in
->
-fact, we know that "make check" contains detailed tests on FP
->
-functionalities. But this is most likely a separate problem of a very
->
-different nature, perhaps the issue of separate git repo for FP tests
->
-(testfloat) that our FP tests use as a mid-layer.
-I get:
-
-68.6 %  2593 / 3782     62.2 %  1690 / 2718
-
-Which is not bad considering we don't exercise the 80 and 128 bit
-softfloat code at all (which is not shared by the re-factored 16/32/64
-bit code).
-
->
->
-I'll try how everything works with my test examples, and will let you know.
->
->
-Your help is greatly appreciated,
->
-Aleksandar
->
->
-Fond regards,
->
-Aleksandar
->
->
->
-> Alex BennÃ©e
---
-Alex BennÃ©e
-
->
-> it shows very low coverage for our FP code (softfloat), while, in
->
-> fact, we know that "make check" contains detailed tests on FP
->
-> functionalities. But this is most likely a separate problem of a very
->
-> different nature, perhaps the issue of separate git repo for FP tests
->
-> (testfloat) that our FP tests use as a mid-layer.
->
->
-I get:
->
->
-68.6 %  2593 / 3782     62.2 %  1690 / 2718
->
-I would expect that kind of result too.
-
-However, I get:
-
-File:   fpu/softfloat.c                 Lines:  8       3334    0.2 %
-Date:   2019-08-05 19:56:58             Branches:       3       2376    0.1 %
-
-:(
-
-OK, I'll try to figure that out, and most likely I could live with it if it is 
-an isolated problem.
-
-Thank you for your assistance in this matter,
-Aleksandar
-
->
-Which is not bad considering we don't exercise the 80 and 128 bit
->
-softfloat code at all (which is not shared by the re-factored 16/32/64
->
-bit code).
->
->
-Alex BennÃ©e
-
->
-> it shows very low coverage for our FP code (softfloat), while, in
->
-> fact, we know that "make check" contains detailed tests on FP
->
-> functionalities. But this is most likely a separate problem of a very
->
-> different nature, perhaps the issue of separate git repo for FP tests
->
-> (testfloat) that our FP tests use as a mid-layer.
->
->
-I get:
->
->
-68.6 %  2593 / 3782     62.2 %  1690 / 2718
->
-This problem is solved too. (and it is my fault)
-
-I worked with multiple versions of QEMU, and my previous low-coverage results 
-were for QEMU 3.0, and for that version the directory tests/fp did not even 
-exist. :D (<blush>)
-
-For QEMU ToT, I get now:
-
-fpu/softfloat.c         
-        68.8 %  2592 / 3770     62.3 %  1693 / 2718
-
-which is identical for all intents and purposes to your result.
-
-Yours cordially,
-Aleksandar
-
diff --git a/results/classifier/009/permissions/50773216 b/results/classifier/009/permissions/50773216
deleted file mode 100644
index b7f8c0d20..000000000
--- a/results/classifier/009/permissions/50773216
+++ /dev/null
@@ -1,120 +0,0 @@
-permissions: 0.813
-device: 0.764
-other: 0.737
-graphic: 0.723
-semantic: 0.669
-files: 0.666
-debug: 0.659
-vnc: 0.656
-socket: 0.652
-boot: 0.637
-PID: 0.636
-performance: 0.628
-network: 0.606
-KVM: 0.601
-
-[Qemu-devel] Can I have someone's feedback on [bug 1809075] Concurrency bug on keyboard events: capslock LED messing up keycode streams causes character misses at guest kernel
-
-Hi everyone.
-Can I please have someone's feedback on this bug?
-https://bugs.launchpad.net/qemu/+bug/1809075
-Briefly, guest OS loses characters sent to it via vnc. And I spot the
-bug in relation to ps2 driver.
-I'm thinking of possible fixes and I might want to use a memory barrier.
-But I would really like to have some suggestion from a qemu developer
-first. For example, can we brutally drop capslock LED key events in ps2
-queue?
-It is actually relevant to openQA, an automated QA tool for openSUSE.
-And this bug blocks a few test cases for us.
-Thank you in advance!
-
-Kind regards,
-Gao Zhiyuan
-
-Cc'ing Marc-AndrÃ© & Gerd.
-
-On 12/19/18 10:31 AM, Gao Zhiyuan wrote:
->
-Hi everyone.
->
->
-Can I please have someone's feedback on this bug?
->
-https://bugs.launchpad.net/qemu/+bug/1809075
->
-Briefly, guest OS loses characters sent to it via vnc. And I spot the
->
-bug in relation to ps2 driver.
->
->
-I'm thinking of possible fixes and I might want to use a memory barrier.
->
-But I would really like to have some suggestion from a qemu developer
->
-first. For example, can we brutally drop capslock LED key events in ps2
->
-queue?
->
->
-It is actually relevant to openQA, an automated QA tool for openSUSE.
->
-And this bug blocks a few test cases for us.
->
->
-Thank you in advance!
->
->
-Kind regards,
->
-Gao Zhiyuan
->
-
-On Thu, Jan 03, 2019 at 12:05:54PM +0100, Philippe Mathieu-DaudÃ© wrote:
->
-Cc'ing Marc-AndrÃ© & Gerd.
->
->
-On 12/19/18 10:31 AM, Gao Zhiyuan wrote:
->
-> Hi everyone.
->
->
->
-> Can I please have someone's feedback on this bug?
->
->
-https://bugs.launchpad.net/qemu/+bug/1809075
->
-> Briefly, guest OS loses characters sent to it via vnc. And I spot the
->
-> bug in relation to ps2 driver.
->
->
->
-> I'm thinking of possible fixes and I might want to use a memory barrier.
->
-> But I would really like to have some suggestion from a qemu developer
->
-> first. For example, can we brutally drop capslock LED key events in ps2
->
-> queue?
-There is no "capslock LED key event".  0xfa is KBD_REPLY_ACK, and the
-device queues it in response to guest port writes.  Yes, the ack can
-race with actual key events.  But IMO that isn't a bug in qemu.
-
-Probably the linux kernel just throws away everything until it got the
-ack for the port write, and that way the key event gets lost.  On
-physical hardware you will not notice because it is next to impossible
-to type fast enough to hit the race window.
-
-So, go fix the kernel.
-
-Alternatively fix vncdotool to send uppercase letters properly with
-shift key pressed.  Then qemu wouldn't generate capslock key events
-(that happens because qemu thinks guest and host capslock state is out
-of sync) and the guests's capslock led update request wouldn't get into
-the way.
-
-cheers,
-  Gerd
-
diff --git a/results/classifier/009/permissions/55247116 b/results/classifier/009/permissions/55247116
deleted file mode 100644
index cbe7dfafd..000000000
--- a/results/classifier/009/permissions/55247116
+++ /dev/null
@@ -1,1320 +0,0 @@
-permissions: 0.946
-other: 0.945
-debug: 0.941
-performance: 0.938
-graphic: 0.933
-PID: 0.929
-socket: 0.929
-semantic: 0.928
-device: 0.919
-boot: 0.918
-network: 0.916
-vnc: 0.916
-files: 0.912
-KVM: 0.894
-
-[Qemu-devel]  [RFC/BUG] xen-mapcache: buggy invalidate map cache?
-
-Hi,
-
-In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
-instead of first level entry (if map to rom other than guest memory
-comes first), while in xen_invalidate_map_cache(), when VM ballooned
-out memory, qemu did not invalidate cache entries in linked
-list(entry->next), so when VM balloon back in memory, gfns probably
-mapped to different mfns, thus if guest asks device to DMA to these
-GPA, qemu may DMA to stale MFNs.
-
-So I think in xen_invalidate_map_cache() linked lists should also be
-checked and invalidated.
-
-Whatâs your opinion? Is this a bug? Is my analyze correct?
-
-On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-Hi,
->
->
-In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
-instead of first level entry (if map to rom other than guest memory
->
-comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
-out memory, qemu did not invalidate cache entries in linked
->
-list(entry->next), so when VM balloon back in memory, gfns probably
->
-mapped to different mfns, thus if guest asks device to DMA to these
->
-GPA, qemu may DMA to stale MFNs.
->
->
-So I think in xen_invalidate_map_cache() linked lists should also be
->
-checked and invalidated.
->
->
-Whatâs your opinion? Is this a bug? Is my analyze correct?
-Added Jun Nakajima and Alexander Graf
-
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> Hi,
->
->
->
-> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
-> instead of first level entry (if map to rom other than guest memory
->
-> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
-> out memory, qemu did not invalidate cache entries in linked
->
-> list(entry->next), so when VM balloon back in memory, gfns probably
->
-> mapped to different mfns, thus if guest asks device to DMA to these
->
-> GPA, qemu may DMA to stale MFNs.
->
->
->
-> So I think in xen_invalidate_map_cache() linked lists should also be
->
-> checked and invalidated.
->
->
->
-> Whatâs your opinion? Is this a bug? Is my analyze correct?
->
->
-Added Jun Nakajima and Alexander Graf
-And correct Stefano Stabellini's email address.
-
-On Mon, 10 Apr 2017 00:36:02 +0800
-hrg <address@hidden> wrote:
-
-Hi,
-
->
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
->> Hi,
->
->>
->
->> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
->> instead of first level entry (if map to rom other than guest memory
->
->> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
->> out memory, qemu did not invalidate cache entries in linked
->
->> list(entry->next), so when VM balloon back in memory, gfns probably
->
->> mapped to different mfns, thus if guest asks device to DMA to these
->
->> GPA, qemu may DMA to stale MFNs.
->
->>
->
->> So I think in xen_invalidate_map_cache() linked lists should also be
->
->> checked and invalidated.
->
->>
->
->> Whatâs your opinion? Is this a bug? Is my analyze correct?
->
->
->
-> Added Jun Nakajima and Alexander Graf
->
-And correct Stefano Stabellini's email address.
-There is a real issue with the xen-mapcache corruption in fact. I encountered
-it a few months ago while experimenting with Q35 support on Xen. Q35 emulation
-uses an AHCI controller by default, along with NCQ mode enabled. The issue can
-be (somewhat) easily reproduced there, though using a normal i440 emulation
-might possibly allow to reproduce the issue as well, using a dedicated test
-code from a guest side. In case of Q35+NCQ the issue can be reproduced "as is".
-
-The issue occurs when a guest domain performs an intensive disk I/O, ex. while
-guest OS booting. QEMU crashes with "Bad ram offset 980aa000"
-message logged, where the address is different each time. The hard thing with
-this issue is that it has a very low reproducibility rate.
-
-The corruption happens when there are multiple I/O commands in the NCQ queue.
-So there are overlapping emulated DMA operations in flight and QEMU uses a
-sequence of mapcache actions which can be executed in the "wrong" order thus
-leading to an inconsistent xen-mapcache - so a bad address from the wrong
-entry is returned.
-
-The bad thing with this issue is that QEMU crash due to "Bad ram offset"
-appearance is a relatively good situation in the sense that this is a caught
-error. But there might be a much worse (artificial) situation where the returned
-address looks valid but points to a different mapped memory.
-
-The fix itself is not hard (ex. an additional checked field in MapCacheEntry),
-but there is a need of some reliable way to test it considering the low
-reproducibility rate.
-
-Regards,
-Alex
-
-On Mon, 10 Apr 2017, hrg wrote:
->
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
->> Hi,
->
->>
->
->> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
->> instead of first level entry (if map to rom other than guest memory
->
->> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
->> out memory, qemu did not invalidate cache entries in linked
->
->> list(entry->next), so when VM balloon back in memory, gfns probably
->
->> mapped to different mfns, thus if guest asks device to DMA to these
->
->> GPA, qemu may DMA to stale MFNs.
->
->>
->
->> So I think in xen_invalidate_map_cache() linked lists should also be
->
->> checked and invalidated.
->
->>
->
->> Whatâs your opinion? Is this a bug? Is my analyze correct?
-Yes, you are right. We need to go through the list for each element of
-the array in xen_invalidate_map_cache. Can you come up with a patch?
-
-On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-On Mon, 10 Apr 2017, hrg wrote:
->
-> On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> >> Hi,
->
-> >>
->
-> >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
-> >> instead of first level entry (if map to rom other than guest memory
->
-> >> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
-> >> out memory, qemu did not invalidate cache entries in linked
->
-> >> list(entry->next), so when VM balloon back in memory, gfns probably
->
-> >> mapped to different mfns, thus if guest asks device to DMA to these
->
-> >> GPA, qemu may DMA to stale MFNs.
->
-> >>
->
-> >> So I think in xen_invalidate_map_cache() linked lists should also be
->
-> >> checked and invalidated.
->
-> >>
->
-> >> Whatâs your opinion? Is this a bug? Is my analyze correct?
->
->
-Yes, you are right. We need to go through the list for each element of
->
-the array in xen_invalidate_map_cache. Can you come up with a patch?
-I spoke too soon. In the regular case there should be no locked mappings
-when xen_invalidate_map_cache is called (see the DPRINTF warning at the
-beginning of the functions). Without locked mappings, there should never
-be more than one element in each list (see xen_map_cache_unlocked:
-entry->lock == true is a necessary condition to append a new entry to
-the list, otherwise it is just remapped).
-
-Can you confirm that what you are seeing are locked mappings
-when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
-by turning it into a printf or by defininig MAPCACHE_DEBUG.
-
-On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
-<address@hidden> wrote:
->
-On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-> On Mon, 10 Apr 2017, hrg wrote:
->
-> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> > >> Hi,
->
-> > >>
->
-> > >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
-> > >> instead of first level entry (if map to rom other than guest memory
->
-> > >> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
-> > >> out memory, qemu did not invalidate cache entries in linked
->
-> > >> list(entry->next), so when VM balloon back in memory, gfns probably
->
-> > >> mapped to different mfns, thus if guest asks device to DMA to these
->
-> > >> GPA, qemu may DMA to stale MFNs.
->
-> > >>
->
-> > >> So I think in xen_invalidate_map_cache() linked lists should also be
->
-> > >> checked and invalidated.
->
-> > >>
->
-> > >> Whatâs your opinion? Is this a bug? Is my analyze correct?
->
->
->
-> Yes, you are right. We need to go through the list for each element of
->
-> the array in xen_invalidate_map_cache. Can you come up with a patch?
->
->
-I spoke too soon. In the regular case there should be no locked mappings
->
-when xen_invalidate_map_cache is called (see the DPRINTF warning at the
->
-beginning of the functions). Without locked mappings, there should never
->
-be more than one element in each list (see xen_map_cache_unlocked:
->
-entry->lock == true is a necessary condition to append a new entry to
->
-the list, otherwise it is just remapped).
->
->
-Can you confirm that what you are seeing are locked mappings
->
-when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
->
-by turning it into a printf or by defininig MAPCACHE_DEBUG.
-In fact, I think the DPRINTF above is incorrect too. In
-pci_add_option_rom(), rtl8139 rom is locked mapped in
-pci_add_option_rom->memory_region_get_ram_ptr (after
-memory_region_init_ram). So actually I think we should remove the
-DPRINTF warning as it is normal.
-
-On Tue, 11 Apr 2017, hrg wrote:
->
-On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
->
-<address@hidden> wrote:
->
-> On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
->> On Mon, 10 Apr 2017, hrg wrote:
->
->> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
->> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
->> > >> Hi,
->
->> > >>
->
->> > >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
->
->> > >> instead of first level entry (if map to rom other than guest memory
->
->> > >> comes first), while in xen_invalidate_map_cache(), when VM ballooned
->
->> > >> out memory, qemu did not invalidate cache entries in linked
->
->> > >> list(entry->next), so when VM balloon back in memory, gfns probably
->
->> > >> mapped to different mfns, thus if guest asks device to DMA to these
->
->> > >> GPA, qemu may DMA to stale MFNs.
->
->> > >>
->
->> > >> So I think in xen_invalidate_map_cache() linked lists should also be
->
->> > >> checked and invalidated.
->
->> > >>
->
->> > >> Whatâs your opinion? Is this a bug? Is my analyze correct?
->
->>
->
->> Yes, you are right. We need to go through the list for each element of
->
->> the array in xen_invalidate_map_cache. Can you come up with a patch?
->
->
->
-> I spoke too soon. In the regular case there should be no locked mappings
->
-> when xen_invalidate_map_cache is called (see the DPRINTF warning at the
->
-> beginning of the functions). Without locked mappings, there should never
->
-> be more than one element in each list (see xen_map_cache_unlocked:
->
-> entry->lock == true is a necessary condition to append a new entry to
->
-> the list, otherwise it is just remapped).
->
->
->
-> Can you confirm that what you are seeing are locked mappings
->
-> when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
->
-> by turning it into a printf or by defininig MAPCACHE_DEBUG.
->
->
-In fact, I think the DPRINTF above is incorrect too. In
->
-pci_add_option_rom(), rtl8139 rom is locked mapped in
->
-pci_add_option_rom->memory_region_get_ram_ptr (after
->
-memory_region_init_ram). So actually I think we should remove the
->
-DPRINTF warning as it is normal.
-Let me explain why the DPRINTF warning is there: emulated dma operations
-can involve locked mappings. Once a dma operation completes, the related
-mapping is unlocked and can be safely destroyed. But if we destroy a
-locked mapping in xen_invalidate_map_cache, while a dma is still
-ongoing, QEMU will crash. We cannot handle that case.
-
-However, the scenario you described is different. It has nothing to do
-with DMA. It looks like pci_add_option_rom calls
-memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
-locked mapping and it is never unlocked or destroyed.
-
-It looks like "ptr" is not used after pci_add_option_rom returns. Does
-the append patch fix the problem you are seeing? For the proper fix, I
-think we probably need some sort of memory_region_unmap wrapper or maybe
-a call to address_space_unmap.
-
-
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c
-index e6b08e1..04f98b7 100644
---- a/hw/pci/pci.c
-+++ b/hw/pci/pci.c
-@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool 
-is_default_rom,
-     }
- 
-     pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
-+    xen_invalidate_map_cache_entry(ptr);
- }
- 
- static void pci_del_option_rom(PCIDevice *pdev)
-
-On Tue, 11 Apr 2017 15:32:09 -0700 (PDT)
-Stefano Stabellini <address@hidden> wrote:
-
->
-On Tue, 11 Apr 2017, hrg wrote:
->
-> On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
->
-> <address@hidden> wrote:
->
-> > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-> >> On Mon, 10 Apr 2017, hrg wrote:
->
-> >> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> >> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> >> > >> Hi,
->
-> >> > >>
->
-> >> > >> In xen_map_cache_unlocked(), map to guest memory maybe in
->
-> >> > >> entry->next instead of first level entry (if map to rom other than
->
-> >> > >> guest memory comes first), while in xen_invalidate_map_cache(),
->
-> >> > >> when VM ballooned out memory, qemu did not invalidate cache entries
->
-> >> > >> in linked list(entry->next), so when VM balloon back in memory,
->
-> >> > >> gfns probably mapped to different mfns, thus if guest asks device
->
-> >> > >> to DMA to these GPA, qemu may DMA to stale MFNs.
->
-> >> > >>
->
-> >> > >> So I think in xen_invalidate_map_cache() linked lists should also be
->
-> >> > >> checked and invalidated.
->
-> >> > >>
->
-> >> > >> Whatâs your opinion? Is this a bug? Is my analyze correct?
->
-> >>
->
-> >> Yes, you are right. We need to go through the list for each element of
->
-> >> the array in xen_invalidate_map_cache. Can you come up with a patch?
->
-> >
->
-> > I spoke too soon. In the regular case there should be no locked mappings
->
-> > when xen_invalidate_map_cache is called (see the DPRINTF warning at the
->
-> > beginning of the functions). Without locked mappings, there should never
->
-> > be more than one element in each list (see xen_map_cache_unlocked:
->
-> > entry->lock == true is a necessary condition to append a new entry to
->
-> > the list, otherwise it is just remapped).
->
-> >
->
-> > Can you confirm that what you are seeing are locked mappings
->
-> > when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
->
-> > by turning it into a printf or by defininig MAPCACHE_DEBUG.
->
->
->
-> In fact, I think the DPRINTF above is incorrect too. In
->
-> pci_add_option_rom(), rtl8139 rom is locked mapped in
->
-> pci_add_option_rom->memory_region_get_ram_ptr (after
->
-> memory_region_init_ram). So actually I think we should remove the
->
-> DPRINTF warning as it is normal.
->
->
-Let me explain why the DPRINTF warning is there: emulated dma operations
->
-can involve locked mappings. Once a dma operation completes, the related
->
-mapping is unlocked and can be safely destroyed. But if we destroy a
->
-locked mapping in xen_invalidate_map_cache, while a dma is still
->
-ongoing, QEMU will crash. We cannot handle that case.
->
->
-However, the scenario you described is different. It has nothing to do
->
-with DMA. It looks like pci_add_option_rom calls
->
-memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
->
-locked mapping and it is never unlocked or destroyed.
->
->
-It looks like "ptr" is not used after pci_add_option_rom returns. Does
->
-the append patch fix the problem you are seeing? For the proper fix, I
->
-think we probably need some sort of memory_region_unmap wrapper or maybe
->
-a call to address_space_unmap.
-Hmm, for some reason my message to the Xen-devel list got rejected but was sent
-to Qemu-devel instead, without any notice. Sorry if I'm missing something
-obvious as a list newbie.
-
-Stefano, hrg,
-
-There is an issue with inconsistency between the list of normal MapCacheEntry's
-and their 'reverse' counterparts - MapCacheRev's in locked_entries.
-When bad situation happens, there are multiple (locked) MapCacheEntry
-entries in the bucket's linked list along with a number of MapCacheRev's. And
-when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the
-first list and calculates a wrong pointer from it which may then be caught with
-the "Bad RAM offset" check (or not). Mapcache invalidation might be related to
-this issue as well I think.
-
-I'll try to provide a test code which can reproduce the issue from the
-guest side using an emulated IDE controller, though it's much simpler to achieve
-this result with an AHCI controller using multiple NCQ I/O commands. So far I've
-seen this issue only with Windows 7 (and above) guest on AHCI, but any block I/O
-DMA should be enough I think.
-
-On 2017/4/12 14:17, Alexey G wrote:
-On Tue, 11 Apr 2017 15:32:09 -0700 (PDT)
-Stefano Stabellini <address@hidden> wrote:
-On Tue, 11 Apr 2017, hrg wrote:
-On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
-<address@hidden> wrote:
-On Mon, 10 Apr 2017, Stefano Stabellini wrote:
-On Mon, 10 Apr 2017, hrg wrote:
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
-On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
-Hi,
-
-In xen_map_cache_unlocked(), map to guest memory maybe in
-entry->next instead of first level entry (if map to rom other than
-guest memory comes first), while in xen_invalidate_map_cache(),
-when VM ballooned out memory, qemu did not invalidate cache entries
-in linked list(entry->next), so when VM balloon back in memory,
-gfns probably mapped to different mfns, thus if guest asks device
-to DMA to these GPA, qemu may DMA to stale MFNs.
-
-So I think in xen_invalidate_map_cache() linked lists should also be
-checked and invalidated.
-
-Whatâs your opinion? Is this a bug? Is my analyze correct?
-Yes, you are right. We need to go through the list for each element of
-the array in xen_invalidate_map_cache. Can you come up with a patch?
-I spoke too soon. In the regular case there should be no locked mappings
-when xen_invalidate_map_cache is called (see the DPRINTF warning at the
-beginning of the functions). Without locked mappings, there should never
-be more than one element in each list (see xen_map_cache_unlocked:
-entry->lock == true is a necessary condition to append a new entry to
-the list, otherwise it is just remapped).
-
-Can you confirm that what you are seeing are locked mappings
-when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
-by turning it into a printf or by defininig MAPCACHE_DEBUG.
-In fact, I think the DPRINTF above is incorrect too. In
-pci_add_option_rom(), rtl8139 rom is locked mapped in
-pci_add_option_rom->memory_region_get_ram_ptr (after
-memory_region_init_ram). So actually I think we should remove the
-DPRINTF warning as it is normal.
-Let me explain why the DPRINTF warning is there: emulated dma operations
-can involve locked mappings. Once a dma operation completes, the related
-mapping is unlocked and can be safely destroyed. But if we destroy a
-locked mapping in xen_invalidate_map_cache, while a dma is still
-ongoing, QEMU will crash. We cannot handle that case.
-
-However, the scenario you described is different. It has nothing to do
-with DMA. It looks like pci_add_option_rom calls
-memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
-locked mapping and it is never unlocked or destroyed.
-
-It looks like "ptr" is not used after pci_add_option_rom returns. Does
-the append patch fix the problem you are seeing? For the proper fix, I
-think we probably need some sort of memory_region_unmap wrapper or maybe
-a call to address_space_unmap.
-Hmm, for some reason my message to the Xen-devel list got rejected but was sent
-to Qemu-devel instead, without any notice. Sorry if I'm missing something
-obvious as a list newbie.
-
-Stefano, hrg,
-
-There is an issue with inconsistency between the list of normal MapCacheEntry's
-and their 'reverse' counterparts - MapCacheRev's in locked_entries.
-When bad situation happens, there are multiple (locked) MapCacheEntry
-entries in the bucket's linked list along with a number of MapCacheRev's. And
-when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the
-first list and calculates a wrong pointer from it which may then be caught with
-the "Bad RAM offset" check (or not). Mapcache invalidation might be related to
-this issue as well I think.
-
-I'll try to provide a test code which can reproduce the issue from the
-guest side using an emulated IDE controller, though it's much simpler to achieve
-this result with an AHCI controller using multiple NCQ I/O commands. So far I've
-seen this issue only with Windows 7 (and above) guest on AHCI, but any block I/O
-DMA should be enough I think.
-Yes, I think there may be other bugs lurking, considering the complexity, 
-though we need to reproduce it if we want to delve into it.
-
-On Wed, 12 Apr 2017, Alexey G wrote:
->
-On Tue, 11 Apr 2017 15:32:09 -0700 (PDT)
->
-Stefano Stabellini <address@hidden> wrote:
->
->
-> On Tue, 11 Apr 2017, hrg wrote:
->
-> > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
->
-> > <address@hidden> wrote:
->
-> > > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-> > >> On Mon, 10 Apr 2017, hrg wrote:
->
-> > >> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> > >> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> > >> > >> Hi,
->
-> > >> > >>
->
-> > >> > >> In xen_map_cache_unlocked(), map to guest memory maybe in
->
-> > >> > >> entry->next instead of first level entry (if map to rom other than
->
-> > >> > >> guest memory comes first), while in xen_invalidate_map_cache(),
->
-> > >> > >> when VM ballooned out memory, qemu did not invalidate cache
->
-> > >> > >> entries
->
-> > >> > >> in linked list(entry->next), so when VM balloon back in memory,
->
-> > >> > >> gfns probably mapped to different mfns, thus if guest asks device
->
-> > >> > >> to DMA to these GPA, qemu may DMA to stale MFNs.
->
-> > >> > >>
->
-> > >> > >> So I think in xen_invalidate_map_cache() linked lists should also
->
-> > >> > >> be
->
-> > >> > >> checked and invalidated.
->
-> > >> > >>
->
-> > >> > >> Whatâs your opinion? Is this a bug? Is my analyze correct?
->
-> > >>
->
-> > >> Yes, you are right. We need to go through the list for each element of
->
-> > >> the array in xen_invalidate_map_cache. Can you come up with a patch?
->
-> > >
->
-> > > I spoke too soon. In the regular case there should be no locked mappings
->
-> > > when xen_invalidate_map_cache is called (see the DPRINTF warning at the
->
-> > > beginning of the functions). Without locked mappings, there should never
->
-> > > be more than one element in each list (see xen_map_cache_unlocked:
->
-> > > entry->lock == true is a necessary condition to append a new entry to
->
-> > > the list, otherwise it is just remapped).
->
-> > >
->
-> > > Can you confirm that what you are seeing are locked mappings
->
-> > > when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
->
-> > > by turning it into a printf or by defininig MAPCACHE_DEBUG.
->
-> >
->
-> > In fact, I think the DPRINTF above is incorrect too. In
->
-> > pci_add_option_rom(), rtl8139 rom is locked mapped in
->
-> > pci_add_option_rom->memory_region_get_ram_ptr (after
->
-> > memory_region_init_ram). So actually I think we should remove the
->
-> > DPRINTF warning as it is normal.
->
->
->
-> Let me explain why the DPRINTF warning is there: emulated dma operations
->
-> can involve locked mappings. Once a dma operation completes, the related
->
-> mapping is unlocked and can be safely destroyed. But if we destroy a
->
-> locked mapping in xen_invalidate_map_cache, while a dma is still
->
-> ongoing, QEMU will crash. We cannot handle that case.
->
->
->
-> However, the scenario you described is different. It has nothing to do
->
-> with DMA. It looks like pci_add_option_rom calls
->
-> memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
->
-> locked mapping and it is never unlocked or destroyed.
->
->
->
-> It looks like "ptr" is not used after pci_add_option_rom returns. Does
->
-> the append patch fix the problem you are seeing? For the proper fix, I
->
-> think we probably need some sort of memory_region_unmap wrapper or maybe
->
-> a call to address_space_unmap.
->
->
-Hmm, for some reason my message to the Xen-devel list got rejected but was
->
-sent
->
-to Qemu-devel instead, without any notice. Sorry if I'm missing something
->
-obvious as a list newbie.
->
->
-Stefano, hrg,
->
->
-There is an issue with inconsistency between the list of normal
->
-MapCacheEntry's
->
-and their 'reverse' counterparts - MapCacheRev's in locked_entries.
->
-When bad situation happens, there are multiple (locked) MapCacheEntry
->
-entries in the bucket's linked list along with a number of MapCacheRev's. And
->
-when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the
->
-first list and calculates a wrong pointer from it which may then be caught
->
-with
->
-the "Bad RAM offset" check (or not). Mapcache invalidation might be related to
->
-this issue as well I think.
->
->
-I'll try to provide a test code which can reproduce the issue from the
->
-guest side using an emulated IDE controller, though it's much simpler to
->
-achieve
->
-this result with an AHCI controller using multiple NCQ I/O commands. So far
->
-I've
->
-seen this issue only with Windows 7 (and above) guest on AHCI, but any block
->
-I/O
->
-DMA should be enough I think.
-That would be helpful. Please see if you can reproduce it after fixing
-the other issue (
-http://marc.info/?l=qemu-devel&m=149195042500707&w=2
-).
-
-On 2017/4/12 6:32, Stefano Stabellini wrote:
-On Tue, 11 Apr 2017, hrg wrote:
-On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
-<address@hidden> wrote:
-On Mon, 10 Apr 2017, Stefano Stabellini wrote:
-On Mon, 10 Apr 2017, hrg wrote:
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
-On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
-Hi,
-
-In xen_map_cache_unlocked(), map to guest memory maybe in entry->next
-instead of first level entry (if map to rom other than guest memory
-comes first), while in xen_invalidate_map_cache(), when VM ballooned
-out memory, qemu did not invalidate cache entries in linked
-list(entry->next), so when VM balloon back in memory, gfns probably
-mapped to different mfns, thus if guest asks device to DMA to these
-GPA, qemu may DMA to stale MFNs.
-
-So I think in xen_invalidate_map_cache() linked lists should also be
-checked and invalidated.
-
-Whatâs your opinion? Is this a bug? Is my analyze correct?
-Yes, you are right. We need to go through the list for each element of
-the array in xen_invalidate_map_cache. Can you come up with a patch?
-I spoke too soon. In the regular case there should be no locked mappings
-when xen_invalidate_map_cache is called (see the DPRINTF warning at the
-beginning of the functions). Without locked mappings, there should never
-be more than one element in each list (see xen_map_cache_unlocked:
-entry->lock == true is a necessary condition to append a new entry to
-the list, otherwise it is just remapped).
-
-Can you confirm that what you are seeing are locked mappings
-when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
-by turning it into a printf or by defininig MAPCACHE_DEBUG.
-In fact, I think the DPRINTF above is incorrect too. In
-pci_add_option_rom(), rtl8139 rom is locked mapped in
-pci_add_option_rom->memory_region_get_ram_ptr (after
-memory_region_init_ram). So actually I think we should remove the
-DPRINTF warning as it is normal.
-Let me explain why the DPRINTF warning is there: emulated dma operations
-can involve locked mappings. Once a dma operation completes, the related
-mapping is unlocked and can be safely destroyed. But if we destroy a
-locked mapping in xen_invalidate_map_cache, while a dma is still
-ongoing, QEMU will crash. We cannot handle that case.
-
-However, the scenario you described is different. It has nothing to do
-with DMA. It looks like pci_add_option_rom calls
-memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
-locked mapping and it is never unlocked or destroyed.
-
-It looks like "ptr" is not used after pci_add_option_rom returns. Does
-the append patch fix the problem you are seeing? For the proper fix, I
-think we probably need some sort of memory_region_unmap wrapper or maybe
-a call to address_space_unmap.
-Yes, I think so, maybe this is the proper way to fix this.
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c
-index e6b08e1..04f98b7 100644
---- a/hw/pci/pci.c
-+++ b/hw/pci/pci.c
-@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool 
-is_default_rom,
-      }
-pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
-+    xen_invalidate_map_cache_entry(ptr);
-  }
-static void pci_del_option_rom(PCIDevice *pdev)
-
-On Wed, 12 Apr 2017, Herongguang (Stephen) wrote:
->
-On 2017/4/12 6:32, Stefano Stabellini wrote:
->
-> On Tue, 11 Apr 2017, hrg wrote:
->
-> > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
->
-> > <address@hidden> wrote:
->
-> > > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-> > > > On Mon, 10 Apr 2017, hrg wrote:
->
-> > > > > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
->
-> > > > > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
->
-> > > > > > > Hi,
->
-> > > > > > >
->
-> > > > > > > In xen_map_cache_unlocked(), map to guest memory maybe in
->
-> > > > > > > entry->next
->
-> > > > > > > instead of first level entry (if map to rom other than guest
->
-> > > > > > > memory
->
-> > > > > > > comes first), while in xen_invalidate_map_cache(), when VM
->
-> > > > > > > ballooned
->
-> > > > > > > out memory, qemu did not invalidate cache entries in linked
->
-> > > > > > > list(entry->next), so when VM balloon back in memory, gfns
->
-> > > > > > > probably
->
-> > > > > > > mapped to different mfns, thus if guest asks device to DMA to
->
-> > > > > > > these
->
-> > > > > > > GPA, qemu may DMA to stale MFNs.
->
-> > > > > > >
->
-> > > > > > > So I think in xen_invalidate_map_cache() linked lists should
->
-> > > > > > > also be
->
-> > > > > > > checked and invalidated.
->
-> > > > > > >
->
-> > > > > > > Whatâs your opinion? Is this a bug? Is my analyze correct?
->
-> > > > Yes, you are right. We need to go through the list for each element of
->
-> > > > the array in xen_invalidate_map_cache. Can you come up with a patch?
->
-> > > I spoke too soon. In the regular case there should be no locked mappings
->
-> > > when xen_invalidate_map_cache is called (see the DPRINTF warning at the
->
-> > > beginning of the functions). Without locked mappings, there should never
->
-> > > be more than one element in each list (see xen_map_cache_unlocked:
->
-> > > entry->lock == true is a necessary condition to append a new entry to
->
-> > > the list, otherwise it is just remapped).
->
-> > >
->
-> > > Can you confirm that what you are seeing are locked mappings
->
-> > > when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
->
-> > > by turning it into a printf or by defininig MAPCACHE_DEBUG.
->
-> > In fact, I think the DPRINTF above is incorrect too. In
->
-> > pci_add_option_rom(), rtl8139 rom is locked mapped in
->
-> > pci_add_option_rom->memory_region_get_ram_ptr (after
->
-> > memory_region_init_ram). So actually I think we should remove the
->
-> > DPRINTF warning as it is normal.
->
-> Let me explain why the DPRINTF warning is there: emulated dma operations
->
-> can involve locked mappings. Once a dma operation completes, the related
->
-> mapping is unlocked and can be safely destroyed. But if we destroy a
->
-> locked mapping in xen_invalidate_map_cache, while a dma is still
->
-> ongoing, QEMU will crash. We cannot handle that case.
->
->
->
-> However, the scenario you described is different. It has nothing to do
->
-> with DMA. It looks like pci_add_option_rom calls
->
-> memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
->
-> locked mapping and it is never unlocked or destroyed.
->
->
->
-> It looks like "ptr" is not used after pci_add_option_rom returns. Does
->
-> the append patch fix the problem you are seeing? For the proper fix, I
->
-> think we probably need some sort of memory_region_unmap wrapper or maybe
->
-> a call to address_space_unmap.
->
->
-Yes, I think so, maybe this is the proper way to fix this.
-Would you be up for sending a proper patch and testing it? We cannot call
-xen_invalidate_map_cache_entry directly from pci.c though, it would need
-to be one of the other functions like address_space_unmap for example.
-
-
->
-> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
->
-> index e6b08e1..04f98b7 100644
->
-> --- a/hw/pci/pci.c
->
-> +++ b/hw/pci/pci.c
->
-> @@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool
->
-> is_default_rom,
->
->       }
->
->         pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
->
-> +    xen_invalidate_map_cache_entry(ptr);
->
->   }
->
->     static void pci_del_option_rom(PCIDevice *pdev)
-
-On 2017/4/13 7:51, Stefano Stabellini wrote:
-On Wed, 12 Apr 2017, Herongguang (Stephen) wrote:
-On 2017/4/12 6:32, Stefano Stabellini wrote:
-On Tue, 11 Apr 2017, hrg wrote:
-On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
-<address@hidden> wrote:
-On Mon, 10 Apr 2017, Stefano Stabellini wrote:
-On Mon, 10 Apr 2017, hrg wrote:
-On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote:
-On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote:
-Hi,
-
-In xen_map_cache_unlocked(), map to guest memory maybe in
-entry->next
-instead of first level entry (if map to rom other than guest
-memory
-comes first), while in xen_invalidate_map_cache(), when VM
-ballooned
-out memory, qemu did not invalidate cache entries in linked
-list(entry->next), so when VM balloon back in memory, gfns
-probably
-mapped to different mfns, thus if guest asks device to DMA to
-these
-GPA, qemu may DMA to stale MFNs.
-
-So I think in xen_invalidate_map_cache() linked lists should
-also be
-checked and invalidated.
-
-Whatâs your opinion? Is this a bug? Is my analyze correct?
-Yes, you are right. We need to go through the list for each element of
-the array in xen_invalidate_map_cache. Can you come up with a patch?
-I spoke too soon. In the regular case there should be no locked mappings
-when xen_invalidate_map_cache is called (see the DPRINTF warning at the
-beginning of the functions). Without locked mappings, there should never
-be more than one element in each list (see xen_map_cache_unlocked:
-entry->lock == true is a necessary condition to append a new entry to
-the list, otherwise it is just remapped).
-
-Can you confirm that what you are seeing are locked mappings
-when xen_invalidate_map_cache is called? To find out, enable the DPRINTK
-by turning it into a printf or by defininig MAPCACHE_DEBUG.
-In fact, I think the DPRINTF above is incorrect too. In
-pci_add_option_rom(), rtl8139 rom is locked mapped in
-pci_add_option_rom->memory_region_get_ram_ptr (after
-memory_region_init_ram). So actually I think we should remove the
-DPRINTF warning as it is normal.
-Let me explain why the DPRINTF warning is there: emulated dma operations
-can involve locked mappings. Once a dma operation completes, the related
-mapping is unlocked and can be safely destroyed. But if we destroy a
-locked mapping in xen_invalidate_map_cache, while a dma is still
-ongoing, QEMU will crash. We cannot handle that case.
-
-However, the scenario you described is different. It has nothing to do
-with DMA. It looks like pci_add_option_rom calls
-memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
-locked mapping and it is never unlocked or destroyed.
-
-It looks like "ptr" is not used after pci_add_option_rom returns. Does
-the append patch fix the problem you are seeing? For the proper fix, I
-think we probably need some sort of memory_region_unmap wrapper or maybe
-a call to address_space_unmap.
-Yes, I think so, maybe this is the proper way to fix this.
-Would you be up for sending a proper patch and testing it? We cannot call
-xen_invalidate_map_cache_entry directly from pci.c though, it would need
-to be one of the other functions like address_space_unmap for example.
-Yes, I will look into this.
-diff --git a/hw/pci/pci.c b/hw/pci/pci.c
-index e6b08e1..04f98b7 100644
---- a/hw/pci/pci.c
-+++ b/hw/pci/pci.c
-@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool
-is_default_rom,
-       }
-         pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
-+    xen_invalidate_map_cache_entry(ptr);
-   }
-     static void pci_del_option_rom(PCIDevice *pdev)
-
-On Thu, 13 Apr 2017, Herongguang (Stephen) wrote:
->
-On 2017/4/13 7:51, Stefano Stabellini wrote:
->
-> On Wed, 12 Apr 2017, Herongguang (Stephen) wrote:
->
-> > On 2017/4/12 6:32, Stefano Stabellini wrote:
->
-> > > On Tue, 11 Apr 2017, hrg wrote:
->
-> > > > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini
->
-> > > > <address@hidden> wrote:
->
-> > > > > On Mon, 10 Apr 2017, Stefano Stabellini wrote:
->
-> > > > > > On Mon, 10 Apr 2017, hrg wrote:
->
-> > > > > > > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden>
->
-> > > > > > > wrote:
->
-> > > > > > > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden>
->
-> > > > > > > > wrote:
->
-> > > > > > > > > Hi,
->
-> > > > > > > > >
->
-> > > > > > > > > In xen_map_cache_unlocked(), map to guest memory maybe in
->
-> > > > > > > > > entry->next
->
-> > > > > > > > > instead of first level entry (if map to rom other than guest
->
-> > > > > > > > > memory
->
-> > > > > > > > > comes first), while in xen_invalidate_map_cache(), when VM
->
-> > > > > > > > > ballooned
->
-> > > > > > > > > out memory, qemu did not invalidate cache entries in linked
->
-> > > > > > > > > list(entry->next), so when VM balloon back in memory, gfns
->
-> > > > > > > > > probably
->
-> > > > > > > > > mapped to different mfns, thus if guest asks device to DMA
->
-> > > > > > > > > to
->
-> > > > > > > > > these
->
-> > > > > > > > > GPA, qemu may DMA to stale MFNs.
->
-> > > > > > > > >
->
-> > > > > > > > > So I think in xen_invalidate_map_cache() linked lists should
->
-> > > > > > > > > also be
->
-> > > > > > > > > checked and invalidated.
->
-> > > > > > > > >
->
-> > > > > > > > > Whatâs your opinion? Is this a bug? Is my analyze correct?
->
-> > > > > > Yes, you are right. We need to go through the list for each
->
-> > > > > > element of
->
-> > > > > > the array in xen_invalidate_map_cache. Can you come up with a
->
-> > > > > > patch?
->
-> > > > > I spoke too soon. In the regular case there should be no locked
->
-> > > > > mappings
->
-> > > > > when xen_invalidate_map_cache is called (see the DPRINTF warning at
->
-> > > > > the
->
-> > > > > beginning of the functions). Without locked mappings, there should
->
-> > > > > never
->
-> > > > > be more than one element in each list (see xen_map_cache_unlocked:
->
-> > > > > entry->lock == true is a necessary condition to append a new entry
->
-> > > > > to
->
-> > > > > the list, otherwise it is just remapped).
->
-> > > > >
->
-> > > > > Can you confirm that what you are seeing are locked mappings
->
-> > > > > when xen_invalidate_map_cache is called? To find out, enable the
->
-> > > > > DPRINTK
->
-> > > > > by turning it into a printf or by defininig MAPCACHE_DEBUG.
->
-> > > > In fact, I think the DPRINTF above is incorrect too. In
->
-> > > > pci_add_option_rom(), rtl8139 rom is locked mapped in
->
-> > > > pci_add_option_rom->memory_region_get_ram_ptr (after
->
-> > > > memory_region_init_ram). So actually I think we should remove the
->
-> > > > DPRINTF warning as it is normal.
->
-> > > Let me explain why the DPRINTF warning is there: emulated dma operations
->
-> > > can involve locked mappings. Once a dma operation completes, the related
->
-> > > mapping is unlocked and can be safely destroyed. But if we destroy a
->
-> > > locked mapping in xen_invalidate_map_cache, while a dma is still
->
-> > > ongoing, QEMU will crash. We cannot handle that case.
->
-> > >
->
-> > > However, the scenario you described is different. It has nothing to do
->
-> > > with DMA. It looks like pci_add_option_rom calls
->
-> > > memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a
->
-> > > locked mapping and it is never unlocked or destroyed.
->
-> > >
->
-> > > It looks like "ptr" is not used after pci_add_option_rom returns. Does
->
-> > > the append patch fix the problem you are seeing? For the proper fix, I
->
-> > > think we probably need some sort of memory_region_unmap wrapper or maybe
->
-> > > a call to address_space_unmap.
->
-> >
->
-> > Yes, I think so, maybe this is the proper way to fix this.
->
->
->
-> Would you be up for sending a proper patch and testing it? We cannot call
->
-> xen_invalidate_map_cache_entry directly from pci.c though, it would need
->
-> to be one of the other functions like address_space_unmap for example.
->
->
->
->
->
-Yes, I will look into this.
-Any updates?
-
-
->
-> > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
->
-> > > index e6b08e1..04f98b7 100644
->
-> > > --- a/hw/pci/pci.c
->
-> > > +++ b/hw/pci/pci.c
->
-> > > @@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev,
->
-> > > bool
->
-> > > is_default_rom,
->
-> > >        }
->
-> > >          pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
->
-> > > +    xen_invalidate_map_cache_entry(ptr);
->
-> > >    }
->
-> > >      static void pci_del_option_rom(PCIDevice *pdev)
->
-
diff --git a/results/classifier/009/permissions/57231878 b/results/classifier/009/permissions/57231878
deleted file mode 100644
index 9c7802717..000000000
--- a/results/classifier/009/permissions/57231878
+++ /dev/null
@@ -1,252 +0,0 @@
-permissions: 0.856
-device: 0.818
-other: 0.788
-semantic: 0.774
-graphic: 0.751
-debug: 0.732
-KVM: 0.708
-PID: 0.684
-network: 0.659
-performance: 0.644
-vnc: 0.640
-socket: 0.624
-boot: 0.609
-files: 0.587
-
-[Qemu-devel] [BUG] qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed.
-
-Hello all,
-I wanted to submit a bug report in the tracker, but it seem to require
-an Ubuntu One account, which I'm having trouble with, so I'll just
-give it here and hopefully somebody can make use of it.  The issue
-seems to be in an experimental format, so it's likely not very
-consequential anyway.
-
-For the sake of anyone else simply googling for a workaround, I'll
-just paste in the (cleaned up) brief IRC conversation about my issue
-from the official channel:
-<quy> I'm using QEMU version 2.12.0 on an x86_64 host (Arch Linux,
-Kernel v4.17.2), and I'm trying to create an x86_64 virtual machine
-(FreeBSD-11.1).  The VM always aborts at the same point in the
-installation (downloading 'ports.tgz') with the following error
-message:
-"qemu-system-x86_64: /build/qemu/src/qemu-2.12.0/block/qed.c:1197:
-qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed.
-zsh: abort (core dumped)  qemu-system-x86_64 -smp 2 -m 4096
--enable-kvm -hda freebsd/freebsd.qed -devic"
-The commands I ran to create the machine are as follows:
-"qemu-img create -f qed freebsd/freebsd.qed 16G"
-"qemu-system-x86_64 -smp 2 -m 4096 -enable-kvm -hda
-freebsd/freebsd.qed -device e1000,netdev=net0 -netdev user,id=net0
--cdrom FreeBSD-11.1-RELEASE-amd64-bootonly.iso -boot order=d"
-I tried adding logging options with the -d flag, but I didn't get
-anything that seemed relevant, since I'm not sure what to look for.
-<stsquad> ohh what's a qed device?
-<stsquad> quy: it might be a workaround to use a qcow2 image for now
-<stsquad> ahh the wiki has a statement "It is not recommended to use
-QED for any new images. "
-<danpb> 'qed' was an experimental disk image format created by IBM
-before qcow2 v3 came along
-<danpb> honestly nothing should ever use  QED these days
-<danpb> the good ideas from QED became  qcow2v3
-<stsquad> danpb: sounds like we should put a warning on the option to
-remind users of that fact
-<danpb> quy: sounds like qed driver is simply broken - please do file
-a bug against qemu bug tracker
-<danpb> quy: but you should also really switch to qcow2
-<quy> I see; some people need to update their wikis then.  I don't
-remember where which guide I read when I first learned what little
-QEMU I know, but I remember it specifically remember it saying QED was
-the newest and most optimal format.
-<stsquad> quy: we can only be responsible for our own wiki I'm afraid...
-<danpb> if you remember where you saw that please let us know so we
-can try to get it fixed
-<quy> Thank you very much for the info; I will switch to QCOW.
-Unfortunately, I'm not sure if I will be able to file any bug reports
-in the tracker as I can't seem to log Launchpad, which it seems to
-require.
-<danpb> quy:  an email to the mailing list would suffice too if you
-can't deal with launchpad
-<danpb> kwolf: ^^^ in case you're interested in possible QED
-assertions from 2.12
-
-If any more info is needed, feel free to email me; I'm not actually
-subscribed to this list though.
-Thank you,
-Quytelda Kahja
-
-CC Qemu Block; looks like QED is a bit busted.
-
-On 06/27/2018 10:25 AM, Quytelda Kahja wrote:
->
-Hello all,
->
-I wanted to submit a bug report in the tracker, but it seem to require
->
-an Ubuntu One account, which I'm having trouble with, so I'll just
->
-give it here and hopefully somebody can make use of it.  The issue
->
-seems to be in an experimental format, so it's likely not very
->
-consequential anyway.
->
->
-For the sake of anyone else simply googling for a workaround, I'll
->
-just paste in the (cleaned up) brief IRC conversation about my issue
->
-from the official channel:
->
-<quy> I'm using QEMU version 2.12.0 on an x86_64 host (Arch Linux,
->
-Kernel v4.17.2), and I'm trying to create an x86_64 virtual machine
->
-(FreeBSD-11.1).  The VM always aborts at the same point in the
->
-installation (downloading 'ports.tgz') with the following error
->
-message:
->
-"qemu-system-x86_64: /build/qemu/src/qemu-2.12.0/block/qed.c:1197:
->
-qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed.
->
-zsh: abort (core dumped)  qemu-system-x86_64 -smp 2 -m 4096
->
--enable-kvm -hda freebsd/freebsd.qed -devic"
->
-The commands I ran to create the machine are as follows:
->
-"qemu-img create -f qed freebsd/freebsd.qed 16G"
->
-"qemu-system-x86_64 -smp 2 -m 4096 -enable-kvm -hda
->
-freebsd/freebsd.qed -device e1000,netdev=net0 -netdev user,id=net0
->
--cdrom FreeBSD-11.1-RELEASE-amd64-bootonly.iso -boot order=d"
->
-I tried adding logging options with the -d flag, but I didn't get
->
-anything that seemed relevant, since I'm not sure what to look for.
->
-<stsquad> ohh what's a qed device?
->
-<stsquad> quy: it might be a workaround to use a qcow2 image for now
->
-<stsquad> ahh the wiki has a statement "It is not recommended to use
->
-QED for any new images. "
->
-<danpb> 'qed' was an experimental disk image format created by IBM
->
-before qcow2 v3 came along
->
-<danpb> honestly nothing should ever use  QED these days
->
-<danpb> the good ideas from QED became  qcow2v3
->
-<stsquad> danpb: sounds like we should put a warning on the option to
->
-remind users of that fact
->
-<danpb> quy: sounds like qed driver is simply broken - please do file
->
-a bug against qemu bug tracker
->
-<danpb> quy: but you should also really switch to qcow2
->
-<quy> I see; some people need to update their wikis then.  I don't
->
-remember where which guide I read when I first learned what little
->
-QEMU I know, but I remember it specifically remember it saying QED was
->
-the newest and most optimal format.
->
-<stsquad> quy: we can only be responsible for our own wiki I'm afraid...
->
-<danpb> if you remember where you saw that please let us know so we
->
-can try to get it fixed
->
-<quy> Thank you very much for the info; I will switch to QCOW.
->
-Unfortunately, I'm not sure if I will be able to file any bug reports
->
-in the tracker as I can't seem to log Launchpad, which it seems to
->
-require.
->
-<danpb> quy:  an email to the mailing list would suffice too if you
->
-can't deal with launchpad
->
-<danpb> kwolf: ^^^ in case you're interested in possible QED
->
-assertions from 2.12
->
->
-If any more info is needed, feel free to email me; I'm not actually
->
-subscribed to this list though.
->
-Thank you,
->
-Quytelda Kahja
->
-
-On 06/29/2018 03:07 PM, John Snow wrote:
-CC Qemu Block; looks like QED is a bit busted.
-
-On 06/27/2018 10:25 AM, Quytelda Kahja wrote:
-Hello all,
-I wanted to submit a bug report in the tracker, but it seem to require
-an Ubuntu One account, which I'm having trouble with, so I'll just
-give it here and hopefully somebody can make use of it.  The issue
-seems to be in an experimental format, so it's likely not very
-consequential anyway.
-Analysis in another thread may be relevant:
-https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08963.html
---
-Eric Blake, Principal Software Engineer
-Red Hat, Inc.           +1-919-301-3266
-Virtualization:  qemu.org | libvirt.org
-
-Am 29.06.2018 um 22:16 hat Eric Blake geschrieben:
->
-On 06/29/2018 03:07 PM, John Snow wrote:
->
-> CC Qemu Block; looks like QED is a bit busted.
->
->
->
-> On 06/27/2018 10:25 AM, Quytelda Kahja wrote:
->
-> > Hello all,
->
-> > I wanted to submit a bug report in the tracker, but it seem to require
->
-> > an Ubuntu One account, which I'm having trouble with, so I'll just
->
-> > give it here and hopefully somebody can make use of it.  The issue
->
-> > seems to be in an experimental format, so it's likely not very
->
-> > consequential anyway.
->
->
-Analysis in another thread may be relevant:
->
->
-https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08963.html
-The assertion there was:
-
-qemu-system-x86_64: block.c:3434: bdrv_replace_node: Assertion 
-`!atomic_read(&to->in_flight)' failed.
-
-Which quite clearly pointed to a drain bug. This one, however, doesn't
-seem to be related to drain, so I think it's probably a different bug.
-
-Kevin
-
diff --git a/results/classifier/009/permissions/67821138 b/results/classifier/009/permissions/67821138
deleted file mode 100644
index 37755ea48..000000000
--- a/results/classifier/009/permissions/67821138
+++ /dev/null
@@ -1,209 +0,0 @@
-permissions: 0.935
-device: 0.916
-PID: 0.909
-boot: 0.881
-debug: 0.870
-other: 0.853
-performance: 0.845
-semantic: 0.843
-graphic: 0.826
-files: 0.824
-KVM: 0.822
-vnc: 0.734
-network: 0.718
-socket: 0.699
-
-[BUG, RFC] Base node is in RW after making external snapshot
-
-Hi everyone,
-
-When making an external snapshot, we end up in a situation when 2 block
-graph nodes related to the same image file (format and storage nodes)
-have different RO flags set on them.
-
-E.g.
-
-# ls -la /proc/PID/fd
-lrwx------ 1 root qemu 64 Apr 24 20:14 12 -> /path/to/harddisk.hdd
-
-# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}'
---pretty | egrep '"node-name"|"ro"'
-      "ro": false,
-      "node-name": "libvirt-1-format",
-      "ro": false,
-      "node-name": "libvirt-1-storage",
-
-# virsh snapshot-create-as VM --name snap --disk-only
-Domain snapshot snap created
-
-# ls -la /proc/PID/fd
-lr-x------ 1 root qemu 64 Apr 24 20:14 134 -> /path/to/harddisk.hdd
-lrwx------ 1 root qemu 64 Apr 24 20:14 135 -> /path/to/harddisk.snap
-
-# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}'
---pretty | egrep '"node-name"|"ro"'
-      "ro": false,
-      "node-name": "libvirt-2-format",
-      "ro": false,
-      "node-name": "libvirt-2-storage",
-      "ro": true,
-      "node-name": "libvirt-1-format",
-      "ro": false,                        <--------------
-      "node-name": "libvirt-1-storage",
-
-File descriptor has been reopened in RO, but "libvirt-1-storage" node
-still has RW permissions set.
-
-I'm wondering it this a bug or this is intended?  Looks like a bug to
-me, although I see that some iotests (e.g. 273) expect 2 nodes related
-to the same image file to have different RO flags.
-
-bdrv_reopen_set_read_only()
-  bdrv_reopen()
-    bdrv_reopen_queue()
-      bdrv_reopen_queue_child()
-    bdrv_reopen_multiple()
-      bdrv_list_refresh_perms()
-        bdrv_topological_dfs()
-        bdrv_do_refresh_perms()
-      bdrv_reopen_commit()
-
-In the stack above bdrv_reopen_set_read_only() is only being called for
-the parent (libvirt-1-format) node.  There're 2 lists: BDSs from
-refresh_list are used by bdrv_drv_set_perm and this leads to actual
-reopen with RO of the file descriptor.  And then there's reopen queue
-bs_queue -- BDSs from this queue get their parameters updated.  While
-refresh_list ends up having the whole subtree (including children, this
-is done in bdrv_topological_dfs()) bs_queue only has the parent.  And
-that is because storage (child) node's (bs->inherits_from == NULL), so
-bdrv_reopen_queue_child() never adds it to the queue.  Could it be the
-source of this bug?
-
-Anyway, would greatly appreciate a clarification.
-
-Andrey
-
-On 4/24/24 21:00, Andrey Drobyshev wrote:
->
-Hi everyone,
->
->
-When making an external snapshot, we end up in a situation when 2 block
->
-graph nodes related to the same image file (format and storage nodes)
->
-have different RO flags set on them.
->
->
-E.g.
->
->
-# ls -la /proc/PID/fd
->
-lrwx------ 1 root qemu 64 Apr 24 20:14 12 -> /path/to/harddisk.hdd
->
->
-# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}'
->
---pretty | egrep '"node-name"|"ro"'
->
-"ro": false,
->
-"node-name": "libvirt-1-format",
->
-"ro": false,
->
-"node-name": "libvirt-1-storage",
->
->
-# virsh snapshot-create-as VM --name snap --disk-only
->
-Domain snapshot snap created
->
->
-# ls -la /proc/PID/fd
->
-lr-x------ 1 root qemu 64 Apr 24 20:14 134 -> /path/to/harddisk.hdd
->
-lrwx------ 1 root qemu 64 Apr 24 20:14 135 -> /path/to/harddisk.snap
->
->
-# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}'
->
---pretty | egrep '"node-name"|"ro"'
->
-"ro": false,
->
-"node-name": "libvirt-2-format",
->
-"ro": false,
->
-"node-name": "libvirt-2-storage",
->
-"ro": true,
->
-"node-name": "libvirt-1-format",
->
-"ro": false,                        <--------------
->
-"node-name": "libvirt-1-storage",
->
->
-File descriptor has been reopened in RO, but "libvirt-1-storage" node
->
-still has RW permissions set.
->
->
-I'm wondering it this a bug or this is intended?  Looks like a bug to
->
-me, although I see that some iotests (e.g. 273) expect 2 nodes related
->
-to the same image file to have different RO flags.
->
->
-bdrv_reopen_set_read_only()
->
-bdrv_reopen()
->
-bdrv_reopen_queue()
->
-bdrv_reopen_queue_child()
->
-bdrv_reopen_multiple()
->
-bdrv_list_refresh_perms()
->
-bdrv_topological_dfs()
->
-bdrv_do_refresh_perms()
->
-bdrv_reopen_commit()
->
->
-In the stack above bdrv_reopen_set_read_only() is only being called for
->
-the parent (libvirt-1-format) node.  There're 2 lists: BDSs from
->
-refresh_list are used by bdrv_drv_set_perm and this leads to actual
->
-reopen with RO of the file descriptor.  And then there's reopen queue
->
-bs_queue -- BDSs from this queue get their parameters updated.  While
->
-refresh_list ends up having the whole subtree (including children, this
->
-is done in bdrv_topological_dfs()) bs_queue only has the parent.  And
->
-that is because storage (child) node's (bs->inherits_from == NULL), so
->
-bdrv_reopen_queue_child() never adds it to the queue.  Could it be the
->
-source of this bug?
->
->
-Anyway, would greatly appreciate a clarification.
->
->
-Andrey
-Friendly ping.  Could somebody confirm that it is a bug indeed?
-
diff --git a/results/classifier/009/permissions/74545755 b/results/classifier/009/permissions/74545755
deleted file mode 100644
index 85e70c209..000000000
--- a/results/classifier/009/permissions/74545755
+++ /dev/null
@@ -1,354 +0,0 @@
-permissions: 0.770
-debug: 0.740
-performance: 0.721
-device: 0.720
-other: 0.683
-semantic: 0.669
-KVM: 0.661
-graphic: 0.660
-vnc: 0.650
-boot: 0.607
-files: 0.577
-network: 0.550
-socket: 0.549
-PID: 0.479
-
-[Bug Report][RFC PATCH 0/1] block: fix failing assert on paused VM migration
-
-There's a bug (failing assert) which is reproduced during migration of
-a paused VM.  I am able to reproduce it on a stand with 2 nodes and a common
-NFS share, with VM's disk on that share.
-
-root@fedora40-1-vm:~# virsh domblklist alma8-vm
- Target   Source
-------------------------------------------
- sda      /mnt/shared/images/alma8.qcow2
-
-root@fedora40-1-vm:~# df -Th /mnt/shared
-Filesystem          Type  Size  Used Avail Use% Mounted on
-127.0.0.1:/srv/nfsd nfs4   63G   16G   48G  25% /mnt/shared
-
-On the 1st node:
-
-root@fedora40-1-vm:~# virsh start alma8-vm ; virsh suspend alma8-vm
-root@fedora40-1-vm:~# virsh migrate --compressed --p2p --persistent 
---undefinesource --live alma8-vm qemu+ssh://fedora40-2-vm/system
-
-Then on the 2nd node:
-
-root@fedora40-2-vm:~# virsh migrate --compressed --p2p --persistent 
---undefinesource --live alma8-vm qemu+ssh://fedora40-1-vm/system
-error: operation failed: domain is not running
-
-root@fedora40-2-vm:~# tail -3 /var/log/libvirt/qemu/alma8-vm.log
-2024-09-19 13:53:33.336+0000: initiating migration
-qemu-system-x86_64: ../block.c:6976: int 
-bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & 
-BDRV_O_INACTIVE)' failed.
-2024-09-19 13:53:42.991+0000: shutting down, reason=crashed
-
-Backtrace:
-
-(gdb) bt
-#0  0x00007f7eaa2f1664 in __pthread_kill_implementation () at /lib64/libc.so.6
-#1  0x00007f7eaa298c4e in raise () at /lib64/libc.so.6
-#2  0x00007f7eaa280902 in abort () at /lib64/libc.so.6
-#3  0x00007f7eaa28081e in __assert_fail_base.cold () at /lib64/libc.so.6
-#4  0x00007f7eaa290d87 in __assert_fail () at /lib64/libc.so.6
-#5  0x0000563c38b95eb8 in bdrv_inactivate_recurse (bs=0x563c3b6c60c0) at 
-../block.c:6976
-#6  0x0000563c38b95aeb in bdrv_inactivate_all () at ../block.c:7038
-#7  0x0000563c3884d354 in qemu_savevm_state_complete_precopy_non_iterable 
-(f=0x563c3b700c20, in_postcopy=false, inactivate_disks=true)
-    at ../migration/savevm.c:1571
-#8  0x0000563c3884dc1a in qemu_savevm_state_complete_precopy (f=0x563c3b700c20, 
-iterable_only=false, inactivate_disks=true) at ../migration/savevm.c:1631
-#9  0x0000563c3883a340 in migration_completion_precopy (s=0x563c3b4d51f0, 
-current_active_state=<optimized out>) at ../migration/migration.c:2780
-#10 migration_completion (s=0x563c3b4d51f0) at ../migration/migration.c:2844
-#11 migration_iteration_run (s=0x563c3b4d51f0) at ../migration/migration.c:3270
-#12 migration_thread (opaque=0x563c3b4d51f0) at ../migration/migration.c:3536
-#13 0x0000563c38dbcf14 in qemu_thread_start (args=0x563c3c2d5bf0) at 
-../util/qemu-thread-posix.c:541
-#14 0x00007f7eaa2ef6d7 in start_thread () at /lib64/libc.so.6
-#15 0x00007f7eaa373414 in clone () at /lib64/libc.so.6
-
-What happens here is that after 1st migration BDS related to HDD remains
-inactive as VM is still paused.  Then when we initiate 2nd migration,
-bdrv_inactivate_all() leads to the attempt to set BDRV_O_INACTIVE flag
-on that node which is already set, thus assert fails.
-
-Attached patch which simply skips setting flag if it's already set is more
-of a kludge than a clean solution.  Should we use more sophisticated logic
-which allows some of the nodes be in inactive state prior to the migration,
-and takes them into account during bdrv_inactivate_all()?  Comments would
-be appreciated.
-
-Andrey
-
-Andrey Drobyshev (1):
-  block: do not fail when inactivating node which is inactive
-
- block.c | 10 +++++++++-
- 1 file changed, 9 insertions(+), 1 deletion(-)
-
--- 
-2.39.3
-
-Instead of throwing an assert let's just ignore that flag is already set
-and return.  We assume that it's going to be safe to ignore.  Otherwise
-this assert fails when migrating a paused VM back and forth.
-
-Ideally we'd like to have a more sophisticated solution, e.g. not even
-scan the nodes which should be inactive at this point.
-
-Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
----
- block.c | 10 +++++++++-
- 1 file changed, 9 insertions(+), 1 deletion(-)
-
-diff --git a/block.c b/block.c
-index 7d90007cae..c1dcf906d1 100644
---- a/block.c
-+++ b/block.c
-@@ -6973,7 +6973,15 @@ static int GRAPH_RDLOCK 
-bdrv_inactivate_recurse(BlockDriverState *bs)
-         return 0;
-     }
- 
--    assert(!(bs->open_flags & BDRV_O_INACTIVE));
-+    if (bs->open_flags & BDRV_O_INACTIVE) {
-+        /*
-+         * Return here instead of throwing assert as a workaround to
-+         * prevent failure on migrating paused VM.
-+         * Here we assume that if we're trying to inactivate BDS that's
-+         * already inactive, it's safe to just ignore it.
-+         */
-+        return 0;
-+    }
- 
-     /* Inactivate this node */
-     if (bs->drv->bdrv_inactivate) {
--- 
-2.39.3
-
-[add migration maintainers]
-
-On 24.09.24 15:56, Andrey Drobyshev wrote:
-Instead of throwing an assert let's just ignore that flag is already set
-and return.  We assume that it's going to be safe to ignore.  Otherwise
-this assert fails when migrating a paused VM back and forth.
-
-Ideally we'd like to have a more sophisticated solution, e.g. not even
-scan the nodes which should be inactive at this point.
-
-Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
----
-  block.c | 10 +++++++++-
-  1 file changed, 9 insertions(+), 1 deletion(-)
-
-diff --git a/block.c b/block.c
-index 7d90007cae..c1dcf906d1 100644
---- a/block.c
-+++ b/block.c
-@@ -6973,7 +6973,15 @@ static int GRAPH_RDLOCK 
-bdrv_inactivate_recurse(BlockDriverState *bs)
-          return 0;
-      }
--    assert(!(bs->open_flags & BDRV_O_INACTIVE));
-+    if (bs->open_flags & BDRV_O_INACTIVE) {
-+        /*
-+         * Return here instead of throwing assert as a workaround to
-+         * prevent failure on migrating paused VM.
-+         * Here we assume that if we're trying to inactivate BDS that's
-+         * already inactive, it's safe to just ignore it.
-+         */
-+        return 0;
-+    }
-/* Inactivate this node */
-if (bs->drv->bdrv_inactivate) {
-I doubt that this a correct way to go.
-
-As far as I understand, "inactive" actually means that "storage is not belong to 
-qemu, but to someone else (another qemu process for example), and may be changed 
-transparently". In turn this means that Qemu should do nothing with inactive disks. So the 
-problem is that nobody called bdrv_activate_all on target, and we shouldn't ignore that.
-
-Hmm, I see in process_incoming_migration_bh() we do call bdrv_activate_all(), 
-but only in some scenarios. May be, the condition should be less strict here.
-
-Why we need any condition here at all? Don't we want to activate block-layer on 
-target after migration anyway?
-
---
-Best regards,
-Vladimir
-
-On 9/30/24 12:25 PM, Vladimir Sementsov-Ogievskiy wrote:
->
-[add migration maintainers]
->
->
-On 24.09.24 15:56, Andrey Drobyshev wrote:
->
-> [...]
->
->
-I doubt that this a correct way to go.
->
->
-As far as I understand, "inactive" actually means that "storage is not
->
-belong to qemu, but to someone else (another qemu process for example),
->
-and may be changed transparently". In turn this means that Qemu should
->
-do nothing with inactive disks. So the problem is that nobody called
->
-bdrv_activate_all on target, and we shouldn't ignore that.
->
->
-Hmm, I see in process_incoming_migration_bh() we do call
->
-bdrv_activate_all(), but only in some scenarios. May be, the condition
->
-should be less strict here.
->
->
-Why we need any condition here at all? Don't we want to activate
->
-block-layer on target after migration anyway?
->
-Hmm I'm not sure about the unconditional activation, since we at least
-have to honor LATE_BLOCK_ACTIVATE cap if it's set (and probably delay it
-in such a case).  In current libvirt upstream I see such code:
-
->
-/* Migration capabilities which should always be enabled as long as they
->
->
-* are supported by QEMU. If the capability is supposed to be enabled on both
->
->
-* sides of migration, it won't be enabled unless both sides support it.
->
->
-*/
->
->
-static const qemuMigrationParamsAlwaysOnItem qemuMigrationParamsAlwaysOn[] =
->
-{
->
->
-{QEMU_MIGRATION_CAP_PAUSE_BEFORE_SWITCHOVER,
->
->
-QEMU_MIGRATION_SOURCE},
->
->
->
->
-{QEMU_MIGRATION_CAP_LATE_BLOCK_ACTIVATE,
->
->
-QEMU_MIGRATION_DESTINATION},
->
->
-};
-which means that libvirt always wants LATE_BLOCK_ACTIVATE to be set.
-
-The code from process_incoming_migration_bh() you're referring to:
-
->
-/* If capability late_block_activate is set:
->
->
-* Only fire up the block code now if we're going to restart the
->
->
-* VM, else 'cont' will do it.
->
->
-* This causes file locking to happen; so we don't want it to happen
->
->
-* unless we really are starting the VM.
->
->
-*/
->
->
-if (!migrate_late_block_activate() ||
->
->
-(autostart && (!global_state_received() ||
->
->
-runstate_is_live(global_state_get_runstate())))) {
->
->
-/* Make sure all file formats throw away their mutable metadata.
->
->
->
-* If we get an error here, just don't restart the VM yet. */
->
->
-bdrv_activate_all(&local_err);
->
->
-if (local_err) {
->
->
-error_report_err(local_err);
->
->
-local_err = NULL;
->
->
-autostart = false;
->
->
-}
->
->
-}
-It states explicitly that we're either going to start VM right at this
-point if (autostart == true), or we wait till "cont" command happens.
-None of this is going to happen if we start another migration while
-still being in PAUSED state.  So I think it seems reasonable to take
-such case into account.  For instance, this patch does prevent the crash:
-
->
-diff --git a/migration/migration.c b/migration/migration.c
->
-index ae2be31557..3222f6745b 100644
->
---- a/migration/migration.c
->
-+++ b/migration/migration.c
->
-@@ -733,7 +733,8 @@ static void process_incoming_migration_bh(void *opaque)
->
-*/
->
-if (!migrate_late_block_activate() ||
->
-(autostart && (!global_state_received() ||
->
--            runstate_is_live(global_state_get_runstate())))) {
->
-+            runstate_is_live(global_state_get_runstate()))) ||
->
-+         (!autostart && global_state_get_runstate() == RUN_STATE_PAUSED)) {
->
-/* Make sure all file formats throw away their mutable metadata.
->
-* If we get an error here, just don't restart the VM yet. */
->
-bdrv_activate_all(&local_err);
-What are your thoughts on it?
-
-Andrey
-
diff --git a/results/classifier/009/permissions/74715356 b/results/classifier/009/permissions/74715356
deleted file mode 100644
index d579d4ad3..000000000
--- a/results/classifier/009/permissions/74715356
+++ /dev/null
@@ -1,136 +0,0 @@
-permissions: 0.930
-other: 0.927
-semantic: 0.916
-debug: 0.907
-performance: 0.905
-device: 0.900
-PID: 0.897
-graphic: 0.894
-boot: 0.881
-KVM: 0.863
-vnc: 0.850
-files: 0.850
-socket: 0.843
-network: 0.838
-
-[Bug] x86 EFLAGS refresh is not happening correctly
-
-Hello,
-I'm posting this here instead of opening an issue as it is not clear to me if this is a bug or not.
-The issue is located in function "cpu_compute_eflags" in target/i386/cpu.h
-(
-https://gitlab.com/qemu-project/qemu/-/blob/master/target/i386/cpu.h#L2071
-)
-This function is exectued in an out of cpu loop context.
-It is used to synchronize TCG internal eflags registers (CC_OP, CC_SRC,Â  etc...) with the CPU eflags field upon loop exit.
-It does:
-Â  Â  eflags
-|=
-cpu_cc_compute_all
-(
-env
-,
-CC_OP
-)
-|
-(
-env
-->
-df
-&
-DF_MASK
-);
-Shouldn't it be:
-Â  Â  Â
-eflags
-=
-cpu_cc_compute_all
-(
-env
-,
-CC_OP
-)
-|
-(
-env
-->
-df
-&
-DF_MASK
-);
-as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
-Thanks,
-Kind regards,
-Stevie
-
-On 05/08/21 11:51, Stevie Lavern wrote:
-Shouldn't it be:
-eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
-as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
-No, both are wrong.  env->eflags contains flags other than the
-arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved.
-The right code is in helper_read_eflags.  You can move it into
-cpu_compute_eflags, and make helper_read_eflags use it.
-Paolo
-
-On 05/08/21 13:24, Paolo Bonzini wrote:
-On 05/08/21 11:51, Stevie Lavern wrote:
-Shouldn't it be:
-eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
-as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
-No, both are wrong.Â  env->eflags contains flags other than the
-arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved.
-The right code is in helper_read_eflags.Â  You can move it into
-cpu_compute_eflags, and make helper_read_eflags use it.
-Ah, actually the two are really the same, the TF/VM bits do not apply to
-cpu_compute_eflags so it's correct.
-What seems wrong is migration of the EFLAGS register.  There should be
-code in cpu_pre_save and cpu_post_load to special-case it and setup
-CC_DST/CC_OP as done in cpu_load_eflags.
-Also, cpu_load_eflags should assert that update_mask does not include
-any of the arithmetic flags.
-Paolo
-
-Thank for your reply!
-It's still a bit cryptic for me.
-I think i need to precise that I'm using a x86_64 custom user-mode,base on linux user-mode, that i'm developing (unfortunately i cannot share the code) with modifications in the translation loop (I've added cpu loop exits on specific instructions which are not control flow instructions).
-If my understanding is correct, in the user-mode case 'cpu_compute_eflags' is called directly by 'x86_cpu_exec_exit' with the intention of synchronizing the CPU env->eflags field with its real value (represented by the CC_* fields).
-I'm not sure how 'cpu_pre_save' and 'cpu_post_load' are involved in this case.
-Â
-As you said in your first email, 'helper_read_eflags' seems to be the correct way to go.
-Here is some detail about my current experimentation/understanding of this "issue":
-With the current implementationÂ
-Â  Â  Â  Â  Â
-eflags |= cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
-if I exit the loop with a CC_OP different from CC_OP_EFLAGS, I found that the resulting env->eflags may be invalid.
-In my test case, the loop was exiting with eflags = 0x44 and CC_OP = CC_OP_SUBL with CC_DST=1, CC_SRC=258, CC_SRC2=0.
-While 'cpu_cc_compute_all' computes the correct flags (ZF:0, PF:0), the result will still be 0x44 (ZF:1, PF:1) due to the 'or' operation, thus leading to an incorrect eflags value loaded into the CPU env.Â
-In my case, after loop reentry, it led to an invalid branch to be taken.
-Thanks for your time!
-Regards
-Stevie
-Â
-On Thu, Aug 5, 2021 at 1:33 PM Paolo Bonzini <
-pbonzini@redhat.com
-> wrote:
-On 05/08/21 13:24, Paolo Bonzini wrote:
-> On 05/08/21 11:51, Stevie Lavern wrote:
->>
->> Shouldn't it be:
->> eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
->> as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
->
-> No, both are wrong.Â  env->eflags contains flags other than the
-> arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved.
->
-> The right code is in helper_read_eflags.Â  You can move it into
-> cpu_compute_eflags, and make helper_read_eflags use it.
-Ah, actually the two are really the same, the TF/VM bits do not apply to
-cpu_compute_eflags so it's correct.
-What seems wrong is migration of the EFLAGS register.Â  There should be
-code in cpu_pre_save and cpu_post_load to special-case it and setup
-CC_DST/CC_OP as done in cpu_load_eflags.
-Also, cpu_load_eflags should assert that update_mask does not include
-any of the arithmetic flags.
-Paolo
-
diff --git a/results/classifier/009/permissions/85542195 b/results/classifier/009/permissions/85542195
deleted file mode 100644
index 328f31033..000000000
--- a/results/classifier/009/permissions/85542195
+++ /dev/null
@@ -1,130 +0,0 @@
-permissions: 0.968
-PID: 0.945
-other: 0.944
-semantic: 0.941
-graphic: 0.938
-device: 0.936
-performance: 0.933
-boot: 0.932
-vnc: 0.923
-files: 0.920
-debug: 0.915
-socket: 0.905
-network: 0.899
-KVM: 0.898
-
-[Qemu-devel] [Bug in qemu-system-ppc running Mac OS 9 on Windows 10]
-
-Hi all,
-
-I've been experiencing issues when installing Mac OS 9.x using
-qemu-system-ppc.exe in Windows 10. After booting from CD image,
-partitioning a fresh disk image often hangs Qemu. When using a
-pre-partitioned disk image, the OS installation process halts
-somewhere during the process. The issues can be resolved by setting
-qemu-system-ppc.exe to run in Windows 7 compatibility mode.
-AFAIK all Qemu builds for Windows since Mac OS 9 became available as
-guest are affected.
-The issue is reproducible by installing Qemu for Windows from Stephan
-Weil on Windows 10 and boot/install Mac OS 9.x
-
-Best regards and thanks for looking into this,
-Howard
-
-On Nov 25, 2016, at 9:26 AM, address@hidden wrote:
-Hi all,
-
-I've been experiencing issues when installing Mac OS 9.x using
-qemu-system-ppc.exe in Windows 10. After booting from CD image,
-partitioning a fresh disk image often hangs Qemu. When using a
-pre-partitioned disk image, the OS installation process halts
-somewhere during the process. The issues can be resolved by setting
-qemu-system-ppc.exe to run in Windows 7 compatibility mode.
-AFAIK all Qemu builds for Windows since Mac OS 9 became available as
-guest are affected.
-The issue is reproducible by installing Qemu for Windows from Stephan
-Weil on Windows 10 and boot/install Mac OS 9.x
-
-Best regards and thanks for looking into this,
-Howard
-I assume there was some kind of behavior change for some of the
-Windows API between Windows 7 and Windows 10, that is my guess as to
-why the compatibility mode works. Could you run 'make check' on your
-system, once in Windows 7 and once in Windows 10. Maybe the tests
-will tell us something. I'm hoping that one of the tests succeeds in
-Windows 7 and fails in Windows 10. That would help us pinpoint what
-the problem is.
-What I mean by run in Windows 7 is set the mingw environment to run
-in Windows 7 compatibility mode (if possible). If you have Windows 7
-on another partition you could boot from, that would be better.
-Good luck.
-p.s. use 'make check -k' to allow all the tests to run (even if one
-or more of the tests fails).
-
->
-> Hi all,
->
->
->
-> I've been experiencing issues when installing Mac OS 9.x using
->
-> qemu-system-ppc.exe in Windows 10. After booting from CD image,
->
-> partitioning a fresh disk image often hangs Qemu. When using a
->
-> pre-partitioned disk image, the OS installation process halts
->
-> somewhere during the process. The issues can be resolved by setting
->
-> qemu-system-ppc.exe to run in Windows 7 compatibility mode.
->
-> AFAIK all Qemu builds for Windows since Mac OS 9 became available as
->
-> guest are affected.
->
-> The issue is reproducible by installing Qemu for Windows from Stephan
->
-> Weil on Windows 10 and boot/install Mac OS 9.x
->
->
->
-> Best regards and thanks for looking into this,
->
-> Howard
->
->
->
-I assume there was some kind of behavior change for some of the Windows API
->
-between Windows 7 and Windows 10, that is my guess as to why the
->
-compatibility mode works. Could you run 'make check' on your system, once in
->
-Windows 7 and once in Windows 10. Maybe the tests will tell us something.
->
-I'm hoping that one of the tests succeeds in Windows 7 and fails in Windows
->
-10. That would help us pinpoint what the problem is.
->
->
-What I mean by run in Windows 7 is set the mingw environment to run in
->
-Windows 7 compatibility mode (if possible). If you have Windows 7 on another
->
-partition you could boot from, that would be better.
->
->
-Good luck.
->
->
-p.s. use 'make check -k' to allow all the tests to run (even if one or more
->
-of the tests fails).
-Hi,
-
-Thank you for you suggestion, but I have no means to run the check you
-suggest. I cross-compile from Linux.
-
-Best regards,
-Howard
-
diff --git a/results/classifier/009/permissions/88281850 b/results/classifier/009/permissions/88281850
deleted file mode 100644
index 02513facf..000000000
--- a/results/classifier/009/permissions/88281850
+++ /dev/null
@@ -1,291 +0,0 @@
-permissions: 0.985
-other: 0.983
-debug: 0.979
-graphic: 0.974
-network: 0.973
-device: 0.970
-performance: 0.969
-semantic: 0.968
-boot: 0.967
-socket: 0.966
-files: 0.962
-PID: 0.959
-vnc: 0.945
-KVM: 0.881
-
-[Bug] Take more 150s to boot qemu on ARM64
-
-Hi all,
-I encounter a issue with kernel 5.19-rc1 on a ARM64 board:  it takes
-about 150s between beginning to run qemu command and beginng to boot
-Linux kernel ("EFI stub: Booting Linux Kernel...").
-But in kernel 5.18-rc4, it only takes about 5s. I git bisect the kernel
-code and it finds c2445d387850 ("srcu: Add contention check to
-call_srcu() srcu_data ->lock acquisition").
-The qemu (qemu version is 6.2.92) command i run is :
-
-./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \
---trace "kvm*" \
--cpu host \
--machine virt,accel=kvm,gic-version=3  \
--machine smp.cpus=2,smp.sockets=2 \
--no-reboot \
--nographic \
--monitor unix:/home/cx/qmp-test,server,nowait \
--bios /home/cx/boot/QEMU_EFI.fd \
--kernel /home/cx/boot/Image  \
--device
-pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1
-\
--device vfio-pci,host=7d:01.3,id=net0 \
--device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4  \
--drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \
--append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \
--net none \
--D /home/cx/qemu_log.txt
-I am not familiar with rcu code, and don't know how it causes the issue.
-Do you have any idea about this issue?
-Best Regard,
-
-Xiang Chen
-
-On Mon, Jun 13, 2022 at 08:26:34PM +0800, chenxiang (M) wrote:
->
-Hi all,
->
->
-I encounter a issue with kernel 5.19-rc1 on a ARM64 board:  it takes about
->
-150s between beginning to run qemu command and beginng to boot Linux kernel
->
-("EFI stub: Booting Linux Kernel...").
->
->
-But in kernel 5.18-rc4, it only takes about 5s. I git bisect the kernel code
->
-and it finds c2445d387850 ("srcu: Add contention check to call_srcu()
->
-srcu_data ->lock acquisition").
->
->
-The qemu (qemu version is 6.2.92) command i run is :
->
->
-./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \
->
---trace "kvm*" \
->
--cpu host \
->
--machine virt,accel=kvm,gic-version=3  \
->
--machine smp.cpus=2,smp.sockets=2 \
->
--no-reboot \
->
--nographic \
->
--monitor unix:/home/cx/qmp-test,server,nowait \
->
--bios /home/cx/boot/QEMU_EFI.fd \
->
--kernel /home/cx/boot/Image  \
->
--device
->
-pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1
->
-\
->
--device vfio-pci,host=7d:01.3,id=net0 \
->
--device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4  \
->
--drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \
->
--append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \
->
--net none \
->
--D /home/cx/qemu_log.txt
->
->
-I am not familiar with rcu code, and don't know how it causes the issue. Do
->
-you have any idea about this issue?
-Please see the discussion here:
-https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
-Though that report requires ACPI to be forced on to get the
-delay, which results in more than 9,000 back-to-back calls to
-synchronize_srcu_expedited().  I cannot reproduce this on my setup, even
-with an artificial tight loop invoking synchronize_srcu_expedited(),
-but then again I don't have ARM hardware.
-
-My current guess is that the following patch, but with larger values for
-SRCU_MAX_NODELAY_PHASE.  Here "larger" might well be up in the hundreds,
-or perhaps even larger.
-
-If you get a chance to experiment with this, could you please reply
-to the discussion at the above URL?  (Or let me know, and I can CC
-you on the next message in that thread.)
-
-                                                Thanx, Paul
-
-------------------------------------------------------------------------
-
-diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
-index 50ba70f019dea..0db7873f4e95b 100644
---- a/kernel/rcu/srcutree.c
-+++ b/kernel/rcu/srcutree.c
-@@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
- 
- #define SRCU_INTERVAL          1       // Base delay if no expedited GPs 
-pending.
- #define SRCU_MAX_INTERVAL      10      // Maximum incremental delay from slow 
-readers.
--#define SRCU_MAX_NODELAY_PHASE 1       // Maximum per-GP-phase consecutive 
-no-delay instances.
-+#define SRCU_MAX_NODELAY_PHASE 3       // Maximum per-GP-phase consecutive 
-no-delay instances.
- #define SRCU_MAX_NODELAY       100     // Maximum consecutive no-delay 
-instances.
- 
- /*
-@@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
-  */
- static unsigned long srcu_get_delay(struct srcu_struct *ssp)
- {
-+       unsigned long gpstart;
-+       unsigned long j;
-        unsigned long jbase = SRCU_INTERVAL;
- 
-        if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), 
-READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
-                jbase = 0;
--       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
--               jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
--       if (!jbase) {
--               WRITE_ONCE(ssp->srcu_n_exp_nodelay, 
-READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
--               if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
--                       jbase = 1;
-+       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
-+               j = jiffies - 1;
-+               gpstart = READ_ONCE(ssp->srcu_gp_start);
-+               if (time_after(j, gpstart))
-+                       jbase += j - gpstart;
-+               if (!jbase) {
-+                       WRITE_ONCE(ssp->srcu_n_exp_nodelay, 
-READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
-+                       if (READ_ONCE(ssp->srcu_n_exp_nodelay) > 
-SRCU_MAX_NODELAY_PHASE)
-+                               jbase = 1;
-+               }
-        }
-        return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
- }
-
-å¨ 2022/6/13 21:22, Paul E. McKenney åé:
-On Mon, Jun 13, 2022 at 08:26:34PM +0800, chenxiang (M) wrote:
-Hi all,
-
-I encounter a issue with kernel 5.19-rc1 on a ARM64 board:  it takes about
-150s between beginning to run qemu command and beginng to boot Linux kernel
-("EFI stub: Booting Linux Kernel...").
-
-But in kernel 5.18-rc4, it only takes about 5s. I git bisect the kernel code
-and it finds c2445d387850 ("srcu: Add contention check to call_srcu()
-srcu_data ->lock acquisition").
-
-The qemu (qemu version is 6.2.92) command i run is :
-
-./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \
---trace "kvm*" \
--cpu host \
--machine virt,accel=kvm,gic-version=3  \
--machine smp.cpus=2,smp.sockets=2 \
--no-reboot \
--nographic \
--monitor unix:/home/cx/qmp-test,server,nowait \
--bios /home/cx/boot/QEMU_EFI.fd \
--kernel /home/cx/boot/Image  \
--device 
-pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1
-\
--device vfio-pci,host=7d:01.3,id=net0 \
--device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4  \
--drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \
--append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \
--net none \
--D /home/cx/qemu_log.txt
-
-I am not familiar with rcu code, and don't know how it causes the issue. Do
-you have any idea about this issue?
-Please see the discussion here:
-https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/
-Though that report requires ACPI to be forced on to get the
-delay, which results in more than 9,000 back-to-back calls to
-synchronize_srcu_expedited().  I cannot reproduce this on my setup, even
-with an artificial tight loop invoking synchronize_srcu_expedited(),
-but then again I don't have ARM hardware.
-
-My current guess is that the following patch, but with larger values for
-SRCU_MAX_NODELAY_PHASE.  Here "larger" might well be up in the hundreds,
-or perhaps even larger.
-
-If you get a chance to experiment with this, could you please reply
-to the discussion at the above URL?  (Or let me know, and I can CC
-you on the next message in that thread.)
-Ok, thanks, i will reply it on above URL.
-Thanx, Paul
-
-------------------------------------------------------------------------
-
-diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
-index 50ba70f019dea..0db7873f4e95b 100644
---- a/kernel/rcu/srcutree.c
-+++ b/kernel/rcu/srcutree.c
-@@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
-#define SRCU_INTERVAL		1	// Base delay if no expedited GPs pending.
-#define SRCU_MAX_INTERVAL     10      // Maximum incremental delay from slow 
-readers.
--#define SRCU_MAX_NODELAY_PHASE 1       // Maximum per-GP-phase consecutive 
-no-delay instances.
-+#define SRCU_MAX_NODELAY_PHASE 3       // Maximum per-GP-phase consecutive 
-no-delay instances.
-  #define SRCU_MAX_NODELAY      100     // Maximum consecutive no-delay 
-instances.
-/*
-@@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp)
-   */
-  static unsigned long srcu_get_delay(struct srcu_struct *ssp)
-  {
-+       unsigned long gpstart;
-+       unsigned long j;
-        unsigned long jbase = SRCU_INTERVAL;
-if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp)))
-jbase = 0;
--       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq)))
--               jbase += jiffies - READ_ONCE(ssp->srcu_gp_start);
--       if (!jbase) {
--               WRITE_ONCE(ssp->srcu_n_exp_nodelay, 
-READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
--               if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE)
--                       jbase = 1;
-+       if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) {
-+               j = jiffies - 1;
-+               gpstart = READ_ONCE(ssp->srcu_gp_start);
-+               if (time_after(j, gpstart))
-+                       jbase += j - gpstart;
-+               if (!jbase) {
-+                       WRITE_ONCE(ssp->srcu_n_exp_nodelay, 
-READ_ONCE(ssp->srcu_n_exp_nodelay) + 1);
-+                       if (READ_ONCE(ssp->srcu_n_exp_nodelay) > 
-SRCU_MAX_NODELAY_PHASE)
-+                               jbase = 1;
-+               }
-        }
-        return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase;
-  }
-.
-
diff --git a/results/classifier/009/permissions/99674399 b/results/classifier/009/permissions/99674399
deleted file mode 100644
index 3e80733b0..000000000
--- a/results/classifier/009/permissions/99674399
+++ /dev/null
@@ -1,158 +0,0 @@
-permissions: 0.896
-device: 0.886
-other: 0.883
-debug: 0.857
-performance: 0.845
-semantic: 0.822
-boot: 0.822
-PID: 0.812
-graphic: 0.794
-files: 0.787
-socket: 0.747
-network: 0.711
-KVM: 0.698
-vnc: 0.673
-
-[BUG] qemu crashes on assertion in cpu_asidx_from_attrs when cpu is in smm mode
-
-Hi all!
-
-First, I see this issue:
-https://gitlab.com/qemu-project/qemu/-/issues/1198
-. 
-where some kvm/hardware failure leads to guest crash, and finally to this 
-assertion:
-
-   cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed.
-
-But in the ticket the talk is about the guest crash and fixing the kernel, not 
-about the final QEMU assertion (which definitely show that something should be 
-fixed in QEMU code too).
-
-
-We've faced same stack one time:
-
-(gdb) bt
-#0  raise () from /lib/x86_64-linux-gnu/libc.so.6
-#1  abort () from /lib/x86_64-linux-gnu/libc.so.6
-#2  ?? () from /lib/x86_64-linux-gnu/libc.so.6
-#3  __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
-#4  cpu_asidx_from_attrs  at ../hw/core/cpu-sysemu.c:76
-#5  cpu_memory_rw_debug  at ../softmmu/physmem.c:3529
-#6  x86_cpu_dump_state  at ../target/i386/cpu-dump.c:560
-#7  kvm_cpu_exec  at ../accel/kvm/kvm-all.c:3000
-#8  kvm_vcpu_thread_fn  at ../accel/kvm/kvm-accel-ops.c:51
-#9  qemu_thread_start  at ../util/qemu-thread-posix.c:505
-#10 start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
-#11 clone () from /lib/x86_64-linux-gnu/libc.so.6
-
-
-And what I see:
-
-static inline int x86_asidx_from_attrs(CPUState *cs, MemTxAttrs attrs)
-{
-    return !!attrs.secure;
-}
-
-int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs)
-{
-    int ret = 0;
-
-    if (cpu->cc->sysemu_ops->asidx_from_attrs) {
-        ret = cpu->cc->sysemu_ops->asidx_from_attrs(cpu, attrs);
-        assert(ret < cpu->num_ases && ret >= 0);         <<<<<<<<<<<<<<<<<
-    }
-    return ret;
-}
-
-(gdb) p cpu->num_ases
-$3 = 1
-
-(gdb) fr 5
-#5  0x00005578c8814ba3 in cpu_memory_rw_debug (cpu=c...
-(gdb) p attrs
-$6 = {unspecified = 0, secure = 1, user = 0, memory = 0, requester_id = 0, 
-byte_swap = 0, target_tlb_bit0 = 0, target_tlb_bit1 = 0, target_tlb_bit2 = 0}
-
-so .secure is 1, therefore ret is 1, in the same time num_ases is 1 too and 
-assertion fails.
-
-
-
-Where is .secure from?
-
-static inline MemTxAttrs cpu_get_mem_attrs(CPUX86State *env)
-{
-    return ((MemTxAttrs) { .secure = (env->hflags & HF_SMM_MASK) != 0 });
-}
-
-Ok, it means we in SMM mode.
-
-
-
-On the other hand, it seems that num_ases seems to be always 1 for x86:
-
-vsementsov@vsementsov-lin:~/work/src/qemu/yc-7.2$ git grep 'num_ases = '
-cpu.c:    cpu->num_ases = 0;
-softmmu/cpus.c:        cpu->num_ases = 1;
-target/arm/cpu.c:        cs->num_ases = 3 + has_secure;
-target/arm/cpu.c:        cs->num_ases = 1 + has_secure;
-target/i386/tcg/sysemu/tcg-cpu.c:    cs->num_ases = 2;
-
-
-So, something is wrong around cpu->num_ases and x86_asidx_from_attrs() which 
-may return more in SMM mode.
-
-
-The stack starts in
-//7  0x00005578c882f539 in kvm_cpu_exec (cpu=cpu@entry=0x5578ca2eb340) at 
-../accel/kvm/kvm-all.c:3000
-    if (ret < 0) {
-        cpu_dump_state(cpu, stderr, CPU_DUMP_CODE);
-        vm_stop(RUN_STATE_INTERNAL_ERROR);
-    }
-
-So that was some kvm error, and we decided to call cpu_dump_state(). And it 
-crashes. cpu_dump_state() is also called from hmp_info_registers, so I can 
-reproduce the crash with a tiny patch to master (as only CPU_DUMP_CODE path 
-calls cpu_memory_rw_debug(), as it is in kvm_cpu_exec()):
-
-diff --git a/monitor/hmp-cmds-target.c b/monitor/hmp-cmds-target.c
-index ff01cf9d8d..dcf0189048 100644
---- a/monitor/hmp-cmds-target.c
-+++ b/monitor/hmp-cmds-target.c
-@@ -116,7 +116,7 @@ void hmp_info_registers(Monitor *mon, const QDict *qdict)
-         }
-
-         monitor_printf(mon, "\nCPU#%d\n", cs->cpu_index);
--        cpu_dump_state(cs, NULL, CPU_DUMP_FPU);
-+        cpu_dump_state(cs, NULL, CPU_DUMP_CODE);
-     }
- }
-
-
-Than run
-
-yes "info registers" | ./build/qemu-system-x86_64 -accel kvm -monitor stdio \
-   -global driver=cfi.pflash01,property=secure,value=on \
-   -blockdev "{'driver': 'file', 'filename': 
-'/usr/share/OVMF/OVMF_CODE_4M.secboot.fd', 'node-name': 'ovmf-code', 'read-only': 
-true}" \
-   -blockdev "{'driver': 'file', 'filename': '/usr/share/OVMF/OVMF_VARS_4M.fd', 
-'node-name': 'ovmf-vars', 'read-only': true}" \
-   -machine q35,smm=on,pflash0=ovmf-code,pflash1=ovmf-vars -m 2G -nodefaults
-
-And after some time (less than 20 seconds for me) it leads to
-
-qemu-system-x86_64: ../hw/core/cpu-sysemu.c:76: cpu_asidx_from_attrs: Assertion `ret < 
-cpu->num_ases && ret >= 0' failed.
-Aborted (core dumped)
-
-
-I've no idea how to correctly fix this bug, but I hope that my reproducer and 
-investigation will help a bit.
-
---
-Best regards,
-Vladimir
-
author	Christian Krinitsin <mail@krinitsin.com>	2025-07-03 19:39:53 +0200
committer	Christian Krinitsin <mail@krinitsin.com>	2025-07-03 19:39:53 +0200
commit	dee4dcba78baf712cab403d47d9db319ab7f95d6 (patch)
tree	418478faf06786701a56268672f73d6b0b4eb239 /results/classifier/009/permissions
parent	4d9e26c0333abd39bdbd039dcdb30ed429c475ba (diff)
download	qemu-analysis-dee4dcba78baf712cab403d47d9db319ab7f95d6.tar.gz qemu-analysis-dee4dcba78baf712cab403d47d9db319ab7f95d6.zip