diff options
Diffstat (limited to 'results/classifier/zero-shot/012/permissions')
18 files changed, 5698 insertions, 0 deletions
diff --git a/results/classifier/zero-shot/012/permissions/12360755 b/results/classifier/zero-shot/012/permissions/12360755 new file mode 100644 index 000000000..d5efc2be5 --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/12360755 @@ -0,0 +1,314 @@ +permissions: 0.930 +debug: 0.922 +semantic: 0.911 +device: 0.902 +register: 0.901 +graphic: 0.899 +performance: 0.895 +assembly: 0.893 +other: 0.886 +architecture: 0.885 +PID: 0.876 +arm: 0.875 +risc-v: 0.868 +files: 0.851 +mistranslation: 0.844 +boot: 0.818 +kernel virtual machine: 0.814 +vnc: 0.810 +socket: 0.805 +network: 0.738 +TCG: 0.735 +x86: 0.656 + +[Qemu-devel] [BUG] virtio-net linux driver fails to probe on MIPS Malta since 'hw/virtio-pci: fix virtio behaviour' + +Hi, + +I've bisected the following failure of the virtio_net linux v4.10 driver +to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine: + +virtio_net virtio0: virtio: device uses modern interface but does not have +VIRTIO_F_VERSION_1 +virtio_net: probe of virtio0 failed with error -22 + +To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour"). + +It appears that adding ",disable-modern=on,disable-legacy=off" to the +virtio-net -device makes it work again. + +I presume this should really just work out of the box. Any ideas why it +isn't? + +Cheers +James +signature.asc +Description: +Digital signature + +On 03/17/2017 11:57 PM, James Hogan wrote: +Hi, + +I've bisected the following failure of the virtio_net linux v4.10 driver +to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine: + +virtio_net virtio0: virtio: device uses modern interface but does not have +VIRTIO_F_VERSION_1 +virtio_net: probe of virtio0 failed with error -22 + +To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour"). + +It appears that adding ",disable-modern=on,disable-legacy=off" to the +virtio-net -device makes it work again. + +I presume this should really just work out of the box. Any ideas why it +isn't? +Hi, + + +This is strange. This commit changes virtio devices from legacy to virtio +"transitional". +(your command line changes it to legacy) +Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU +side +there is nothing new. + +Michael, do you have any idea? + +Thanks, +Marcel +Cheers +James + +On Mon, Mar 20, 2017 at 05:21:22PM +0200, Marcel Apfelbaum wrote: +> +On 03/17/2017 11:57 PM, James Hogan wrote: +> +> Hi, +> +> +> +> I've bisected the following failure of the virtio_net linux v4.10 driver +> +> to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine: +> +> +> +> virtio_net virtio0: virtio: device uses modern interface but does not have +> +> VIRTIO_F_VERSION_1 +> +> virtio_net: probe of virtio0 failed with error -22 +> +> +> +> To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour"). +> +> +> +> It appears that adding ",disable-modern=on,disable-legacy=off" to the +> +> virtio-net -device makes it work again. +> +> +> +> I presume this should really just work out of the box. Any ideas why it +> +> isn't? +> +> +> +> +Hi, +> +> +> +This is strange. This commit changes virtio devices from legacy to virtio +> +"transitional". +> +(your command line changes it to legacy) +> +Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU +> +side +> +there is nothing new. +> +> +Michael, do you have any idea? +> +> +Thanks, +> +Marcel +My guess would be firmware mishandling 64 bit BARs - we saw such +a case on sparc previously. As a result you are probably reading +all zeroes from features register or something like that. +Marcel, could you send a patch making the bar 32 bit? +If that helps we know what the issue is. + +> +> Cheers +> +> James +> +> + +On 03/20/2017 05:43 PM, Michael S. Tsirkin wrote: +On Mon, Mar 20, 2017 at 05:21:22PM +0200, Marcel Apfelbaum wrote: +On 03/17/2017 11:57 PM, James Hogan wrote: +Hi, + +I've bisected the following failure of the virtio_net linux v4.10 driver +to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine: + +virtio_net virtio0: virtio: device uses modern interface but does not have +VIRTIO_F_VERSION_1 +virtio_net: probe of virtio0 failed with error -22 + +To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour"). + +It appears that adding ",disable-modern=on,disable-legacy=off" to the +virtio-net -device makes it work again. + +I presume this should really just work out of the box. Any ideas why it +isn't? +Hi, + + +This is strange. This commit changes virtio devices from legacy to virtio +"transitional". +(your command line changes it to legacy) +Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU +side +there is nothing new. + +Michael, do you have any idea? + +Thanks, +Marcel +My guess would be firmware mishandling 64 bit BARs - we saw such +a case on sparc previously. As a result you are probably reading +all zeroes from features register or something like that. +Marcel, could you send a patch making the bar 32 bit? +If that helps we know what the issue is. +Sure, + +Thanks, +Marcel +Cheers +James + +On 03/20/2017 05:43 PM, Michael S. Tsirkin wrote: +On Mon, Mar 20, 2017 at 05:21:22PM +0200, Marcel Apfelbaum wrote: +On 03/17/2017 11:57 PM, James Hogan wrote: +Hi, + +I've bisected the following failure of the virtio_net linux v4.10 driver +to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine: + +virtio_net virtio0: virtio: device uses modern interface but does not have +VIRTIO_F_VERSION_1 +virtio_net: probe of virtio0 failed with error -22 + +To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour"). + +It appears that adding ",disable-modern=on,disable-legacy=off" to the +virtio-net -device makes it work again. + +I presume this should really just work out of the box. Any ideas why it +isn't? +Hi, + + +This is strange. This commit changes virtio devices from legacy to virtio +"transitional". +(your command line changes it to legacy) +Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU +side +there is nothing new. + +Michael, do you have any idea? + +Thanks, +Marcel +My guess would be firmware mishandling 64 bit BARs - we saw such +a case on sparc previously. As a result you are probably reading +all zeroes from features register or something like that. +Marcel, could you send a patch making the bar 32 bit? +If that helps we know what the issue is. +Hi James, + +Can you please check if the below patch fixes the problem? +Please note it is not a solution. + +diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c +index f9b7244..5b4d429 100644 +--- a/hw/virtio/virtio-pci.c ++++ b/hw/virtio/virtio-pci.c +@@ -1671,9 +1671,7 @@ static void virtio_pci_device_plugged(DeviceState *d, +Error **errp) + } + + pci_register_bar(&proxy->pci_dev, proxy->modern_mem_bar_idx, +- PCI_BASE_ADDRESS_SPACE_MEMORY | +- PCI_BASE_ADDRESS_MEM_PREFETCH | +- PCI_BASE_ADDRESS_MEM_TYPE_64, ++ PCI_BASE_ADDRESS_SPACE_MEMORY, + &proxy->modern_bar); + + proxy->config_cap = virtio_pci_add_mem_cap(proxy, &cfg.cap); + + +Thanks, +Marcel + +Hi Marcel, + +On Tue, Mar 21, 2017 at 04:16:58PM +0200, Marcel Apfelbaum wrote: +> +Can you please check if the below patch fixes the problem? +> +Please note it is not a solution. +> +> +diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c +> +index f9b7244..5b4d429 100644 +> +--- a/hw/virtio/virtio-pci.c +> ++++ b/hw/virtio/virtio-pci.c +> +@@ -1671,9 +1671,7 @@ static void virtio_pci_device_plugged(DeviceState *d, +> +Error **errp) +> +} +> +> +pci_register_bar(&proxy->pci_dev, proxy->modern_mem_bar_idx, +> +- PCI_BASE_ADDRESS_SPACE_MEMORY | +> +- PCI_BASE_ADDRESS_MEM_PREFETCH | +> +- PCI_BASE_ADDRESS_MEM_TYPE_64, +> ++ PCI_BASE_ADDRESS_SPACE_MEMORY, +> +&proxy->modern_bar); +> +> +proxy->config_cap = virtio_pci_add_mem_cap(proxy, &cfg.cap); +Sorry for the delay trying this, I was away last week. + +No, it doesn't seem to make any difference. + +Thanks +James +signature.asc +Description: +Digital signature + diff --git a/results/classifier/zero-shot/012/permissions/14488057 b/results/classifier/zero-shot/012/permissions/14488057 new file mode 100644 index 000000000..8e65ad6b6 --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/14488057 @@ -0,0 +1,729 @@ +permissions: 0.940 +architecture: 0.936 +PID: 0.930 +device: 0.929 +register: 0.927 +debug: 0.925 +other: 0.922 +TCG: 0.921 +risc-v: 0.917 +assembly: 0.912 +performance: 0.911 +semantic: 0.905 +arm: 0.900 +kernel virtual machine: 0.893 +boot: 0.892 +x86: 0.889 +graphic: 0.887 +mistranslation: 0.885 +vnc: 0.882 +network: 0.846 +socket: 0.825 +files: 0.823 + +[Qemu-devel] [BUG] user-to-root privesc inside VM via bad translation caching + +This is an issue in QEMU's system emulation for X86 in TCG mode. +The issue permits an attacker who can execute code in guest ring 3 +with normal user privileges to inject code into other processes that +are running in guest ring 3, in particular root-owned processes. + +== reproduction steps == + + - Create an x86-64 VM and install Debian Jessie in it. The following + steps should all be executed inside the VM. + - Verify that procmail is installed and the correct version: + address@hidden:~# apt-cache show procmail | egrep 'Version|SHA' + Version: 3.22-24 + SHA1: 54ed2d51db0e76f027f06068ab5371048c13434c + SHA256: 4488cf6975af9134a9b5238d5d70e8be277f70caa45a840dfbefd2dc444bfe7f + - Install build-essential and nasm ("apt install build-essential nasm"). + - Unpack the exploit, compile it and run it: + address@hidden:~$ tar xvf procmail_cache_attack.tar + procmail_cache_attack/ + procmail_cache_attack/shellcode.asm + procmail_cache_attack/xp.c + procmail_cache_attack/compile.sh + procmail_cache_attack/attack.c + address@hidden:~$ cd procmail_cache_attack + address@hidden:~/procmail_cache_attack$ ./compile.sh + address@hidden:~/procmail_cache_attack$ ./attack + memory mappings set up + child is dead, codegen should be complete + executing code as root! :) + address@hidden:~/procmail_cache_attack# id + uid=0(root) gid=0(root) groups=0(root),[...] + +Note: While the exploit depends on the precise version of procmail, +the actual vulnerability is in QEMU, not in procmail. procmail merely +serves as a seldomly-executed setuid root binary into which code can +be injected. + + +== detailed issue description == +QEMU caches translated basic blocks. To look up a translated basic +block, the function tb_find() is used, which uses tb_htable_lookup() +in its slowpath, which in turn compares translated basic blocks +(TranslationBlock) to the lookup information (struct tb_desc) using +tb_cmp(). + +tb_cmp() attempts to ensure (among other things) that both the virtual +start address of the basic block and the physical addresses that the +basic block covers match. When checking the physical addresses, it +assumes that a basic block can span at most two pages. + +gen_intermediate_code() attempts to enforce this by stopping the +translation of a basic block if nearly one page of instructions has +been translated already: + + /* if too long translation, stop generation too */ + if (tcg_op_buf_full() || + (pc_ptr - pc_start) >= (TARGET_PAGE_SIZE - 32) || + num_insns >= max_insns) { + gen_jmp_im(pc_ptr - dc->cs_base); + gen_eob(dc); + break; + } + +However, while real X86 processors have a maximum instruction length +of 15 bytes, QEMU's instruction decoder for X86 does not place any +limit on the instruction length or the number of instruction prefixes. +Therefore, it is possible to create an arbitrarily long instruction +by e.g. prepending an arbitrary number of LOCK prefixes to a normal +instruction. This permits creating a basic block that spans three +pages by simply appending an approximately page-sized instruction to +the end of a normal basic block that starts close to the end of a +page. + +Such an overlong basic block causes the basic block caching to fail as +follows: If code is generated and cached for a basic block that spans +the physical pages (A,E,B), this basic block will be returned by +lookups in a process in which the physical pages (A,B,C) are mapped +in the same virtual address range (assuming that all other lookup +parameters match). + +This behavior can be abused by an attacker e.g. as follows: If a +non-relocatable world-readable setuid executable legitimately contains +the pages (A,B,C), an attacker can map (A,E,B) into his own process, +at the normal load address of A, where E is an attacker-controlled +page. If a legitimate basic block spans the pages A and B, an attacker +can write arbitrary non-branch instructions at the start of E, then +append an overlong instruction +that ends behind the start of C, yielding a modified basic block that +spans all three pages. If the attacker then executes the modified +basic block in his process, the modified basic block is cached. +Next, the attacker can execute the setuid binary, which will reuse the +cached modified basic block, executing attacker-controlled +instructions in the context of the privileged process. + +I am sending this to qemu-devel because a QEMU security contact +told me that QEMU does not consider privilege escalation inside a +TCG VM to be a security concern. +procmail_cache_attack.tar +Description: +Unix tar archive + +On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote: +> +This is an issue in QEMU's system emulation for X86 in TCG mode. +> +The issue permits an attacker who can execute code in guest ring 3 +> +with normal user privileges to inject code into other processes that +> +are running in guest ring 3, in particular root-owned processes. +> +I am sending this to qemu-devel because a QEMU security contact +> +told me that QEMU does not consider privilege escalation inside a +> +TCG VM to be a security concern. +Correct; it's just a bug. Don't trust TCG QEMU as a security boundary. + +We should really fix the crossing-a-page-boundary code for x86. +I believe we do get it correct for ARM Thumb instructions. + +thanks +-- PMM + +On Mon, Mar 20, 2017 at 10:46 AM, Peter Maydell wrote: +> +On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote: +> +> This is an issue in QEMU's system emulation for X86 in TCG mode. +> +> The issue permits an attacker who can execute code in guest ring 3 +> +> with normal user privileges to inject code into other processes that +> +> are running in guest ring 3, in particular root-owned processes. +> +> +> I am sending this to qemu-devel because a QEMU security contact +> +> told me that QEMU does not consider privilege escalation inside a +> +> TCG VM to be a security concern. +> +> +Correct; it's just a bug. Don't trust TCG QEMU as a security boundary. +> +> +We should really fix the crossing-a-page-boundary code for x86. +> +I believe we do get it correct for ARM Thumb instructions. +How about doing the instruction size check as follows? + +diff --git a/target/i386/translate.c b/target/i386/translate.c +index 72c1b03a2a..94cf3da719 100644 +--- a/target/i386/translate.c ++++ b/target/i386/translate.c +@@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State +*env, DisasContext *s, + default: + goto unknown_op; + } ++ if (s->pc - pc_start > 15) { ++ s->pc = pc_start; ++ goto illegal_op; ++ } + return s->pc; + illegal_op: + gen_illegal_opcode(s); + +Thanks, +-- +Pranith + +On 22 March 2017 at 14:55, Pranith Kumar <address@hidden> wrote: +> +On Mon, Mar 20, 2017 at 10:46 AM, Peter Maydell wrote: +> +> On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote: +> +>> This is an issue in QEMU's system emulation for X86 in TCG mode. +> +>> The issue permits an attacker who can execute code in guest ring 3 +> +>> with normal user privileges to inject code into other processes that +> +>> are running in guest ring 3, in particular root-owned processes. +> +> +> +>> I am sending this to qemu-devel because a QEMU security contact +> +>> told me that QEMU does not consider privilege escalation inside a +> +>> TCG VM to be a security concern. +> +> +> +> Correct; it's just a bug. Don't trust TCG QEMU as a security boundary. +> +> +> +> We should really fix the crossing-a-page-boundary code for x86. +> +> I believe we do get it correct for ARM Thumb instructions. +> +> +How about doing the instruction size check as follows? +> +> +diff --git a/target/i386/translate.c b/target/i386/translate.c +> +index 72c1b03a2a..94cf3da719 100644 +> +--- a/target/i386/translate.c +> ++++ b/target/i386/translate.c +> +@@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State +> +*env, DisasContext *s, +> +default: +> +goto unknown_op; +> +} +> ++ if (s->pc - pc_start > 15) { +> ++ s->pc = pc_start; +> ++ goto illegal_op; +> ++ } +> +return s->pc; +> +illegal_op: +> +gen_illegal_opcode(s); +This doesn't look right because it means we'll check +only after we've emitted all the code to do the +instruction operation, so the effect will be +"execute instruction, then take illegal-opcode +exception". + +We should check what the x86 architecture spec actually +says and implement that. + +thanks +-- PMM + +On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell +<address@hidden> wrote: +> +> +> +> How about doing the instruction size check as follows? +> +> +> +> diff --git a/target/i386/translate.c b/target/i386/translate.c +> +> index 72c1b03a2a..94cf3da719 100644 +> +> --- a/target/i386/translate.c +> +> +++ b/target/i386/translate.c +> +> @@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State +> +> *env, DisasContext *s, +> +> default: +> +> goto unknown_op; +> +> } +> +> + if (s->pc - pc_start > 15) { +> +> + s->pc = pc_start; +> +> + goto illegal_op; +> +> + } +> +> return s->pc; +> +> illegal_op: +> +> gen_illegal_opcode(s); +> +> +This doesn't look right because it means we'll check +> +only after we've emitted all the code to do the +> +instruction operation, so the effect will be +> +"execute instruction, then take illegal-opcode +> +exception". +> +The pc is restored to original address (s->pc = pc_start), so the +exception will overwrite the generated illegal instruction and will be +executed first. + +But yes, it's better to follow the architecture manual. + +Thanks, +-- +Pranith + +On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote: +> +On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell +> +<address@hidden> wrote: +> +> This doesn't look right because it means we'll check +> +> only after we've emitted all the code to do the +> +> instruction operation, so the effect will be +> +> "execute instruction, then take illegal-opcode +> +> exception". +> +The pc is restored to original address (s->pc = pc_start), so the +> +exception will overwrite the generated illegal instruction and will be +> +executed first. +s->pc is the guest PC -- moving that backwards will +not do anything about the generated TCG IR that's +already been written. You'd need to rewind the +write pointer in the IR stream, which there is +no support for doing AFAIK. + +thanks +-- PMM + +On Wed, Mar 22, 2017 at 11:21 AM, Peter Maydell +<address@hidden> wrote: +> +On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote: +> +> On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell +> +> <address@hidden> wrote: +> +>> This doesn't look right because it means we'll check +> +>> only after we've emitted all the code to do the +> +>> instruction operation, so the effect will be +> +>> "execute instruction, then take illegal-opcode +> +>> exception". +> +> +> The pc is restored to original address (s->pc = pc_start), so the +> +> exception will overwrite the generated illegal instruction and will be +> +> executed first. +> +> +s->pc is the guest PC -- moving that backwards will +> +not do anything about the generated TCG IR that's +> +already been written. You'd need to rewind the +> +write pointer in the IR stream, which there is +> +no support for doing AFAIK. +Ah, OK. Thanks for the explanation. May be we should check the size of +the instruction while decoding the prefixes and error out once we +exceed the limit. We would not generate any IR code. + +-- +Pranith + +On 03/23/2017 02:29 AM, Pranith Kumar wrote: +On Wed, Mar 22, 2017 at 11:21 AM, Peter Maydell +<address@hidden> wrote: +On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote: +On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell +<address@hidden> wrote: +This doesn't look right because it means we'll check +only after we've emitted all the code to do the +instruction operation, so the effect will be +"execute instruction, then take illegal-opcode +exception". +The pc is restored to original address (s->pc = pc_start), so the +exception will overwrite the generated illegal instruction and will be +executed first. +s->pc is the guest PC -- moving that backwards will +not do anything about the generated TCG IR that's +already been written. You'd need to rewind the +write pointer in the IR stream, which there is +no support for doing AFAIK. +Ah, OK. Thanks for the explanation. May be we should check the size of +the instruction while decoding the prefixes and error out once we +exceed the limit. We would not generate any IR code. +Yes. +It would not enforce a true limit of 15 bytes, since you can't know that until +you've done the rest of the decode. But you'd be able to say that no more than +14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 bytes is used. +Which does fix the bug. + + +r~ + +On 22/03/2017 21:01, Richard Henderson wrote: +> +> +> +> Ah, OK. Thanks for the explanation. May be we should check the size of +> +> the instruction while decoding the prefixes and error out once we +> +> exceed the limit. We would not generate any IR code. +> +> +Yes. +> +> +It would not enforce a true limit of 15 bytes, since you can't know that +> +until you've done the rest of the decode. But you'd be able to say that +> +no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 +> +bytes is used. +> +> +Which does fix the bug. +Yeah, that would work for 2.9 if somebody wants to put together a patch. + Ensuring that all instruction fetching happens before translation side +effects is a little harder, but perhaps it's also the opportunity to get +rid of s->rip_offset which is a little ugly. + +Paolo + +On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote: +> +> +> +On 22/03/2017 21:01, Richard Henderson wrote: +> +>> +> +>> Ah, OK. Thanks for the explanation. May be we should check the size of +> +>> the instruction while decoding the prefixes and error out once we +> +>> exceed the limit. We would not generate any IR code. +> +> +> +> Yes. +> +> +> +> It would not enforce a true limit of 15 bytes, since you can't know that +> +> until you've done the rest of the decode. But you'd be able to say that +> +> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 +> +> bytes is used. +> +> +> +> Which does fix the bug. +> +> +Yeah, that would work for 2.9 if somebody wants to put together a patch. +> +Ensuring that all instruction fetching happens before translation side +> +effects is a little harder, but perhaps it's also the opportunity to get +> +rid of s->rip_offset which is a little ugly. +How about the following? + +diff --git a/target/i386/translate.c b/target/i386/translate.c +index 72c1b03a2a..67c58b8900 100644 +--- a/target/i386/translate.c ++++ b/target/i386/translate.c +@@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State +*env, DisasContext *s, + s->vex_l = 0; + s->vex_v = 0; + next_byte: ++ /* The prefixes can atmost be 14 bytes since x86 has an upper ++ limit of 15 bytes for the instruction */ ++ if (s->pc - pc_start > 14) { ++ goto illegal_op; ++ } + b = cpu_ldub_code(env, s->pc); + s->pc++; + /* Collect prefixes. */ + +-- +Pranith + +On 23/03/2017 17:50, Pranith Kumar wrote: +> +On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote: +> +> +> +> +> +> On 22/03/2017 21:01, Richard Henderson wrote: +> +>>> +> +>>> Ah, OK. Thanks for the explanation. May be we should check the size of +> +>>> the instruction while decoding the prefixes and error out once we +> +>>> exceed the limit. We would not generate any IR code. +> +>> +> +>> Yes. +> +>> +> +>> It would not enforce a true limit of 15 bytes, since you can't know that +> +>> until you've done the rest of the decode. But you'd be able to say that +> +>> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 +> +>> bytes is used. +> +>> +> +>> Which does fix the bug. +> +> +> +> Yeah, that would work for 2.9 if somebody wants to put together a patch. +> +> Ensuring that all instruction fetching happens before translation side +> +> effects is a little harder, but perhaps it's also the opportunity to get +> +> rid of s->rip_offset which is a little ugly. +> +> +How about the following? +> +> +diff --git a/target/i386/translate.c b/target/i386/translate.c +> +index 72c1b03a2a..67c58b8900 100644 +> +--- a/target/i386/translate.c +> ++++ b/target/i386/translate.c +> +@@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State +> +*env, DisasContext *s, +> +s->vex_l = 0; +> +s->vex_v = 0; +> +next_byte: +> ++ /* The prefixes can atmost be 14 bytes since x86 has an upper +> ++ limit of 15 bytes for the instruction */ +> ++ if (s->pc - pc_start > 14) { +> ++ goto illegal_op; +> ++ } +> +b = cpu_ldub_code(env, s->pc); +> +s->pc++; +> +/* Collect prefixes. */ +Please make the comment more verbose, based on Richard's remark. We +should apply it to 2.9. + +Also, QEMU usually formats comments with stars on every line. + +Paolo + +On Thu, Mar 23, 2017 at 1:37 PM, Paolo Bonzini <address@hidden> wrote: +> +> +> +On 23/03/2017 17:50, Pranith Kumar wrote: +> +> On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote: +> +>> +> +>> +> +>> On 22/03/2017 21:01, Richard Henderson wrote: +> +>>>> +> +>>>> Ah, OK. Thanks for the explanation. May be we should check the size of +> +>>>> the instruction while decoding the prefixes and error out once we +> +>>>> exceed the limit. We would not generate any IR code. +> +>>> +> +>>> Yes. +> +>>> +> +>>> It would not enforce a true limit of 15 bytes, since you can't know that +> +>>> until you've done the rest of the decode. But you'd be able to say that +> +>>> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 +> +>>> bytes is used. +> +>>> +> +>>> Which does fix the bug. +> +>> +> +>> Yeah, that would work for 2.9 if somebody wants to put together a patch. +> +>> Ensuring that all instruction fetching happens before translation side +> +>> effects is a little harder, but perhaps it's also the opportunity to get +> +>> rid of s->rip_offset which is a little ugly. +> +> +> +> How about the following? +> +> +> +> diff --git a/target/i386/translate.c b/target/i386/translate.c +> +> index 72c1b03a2a..67c58b8900 100644 +> +> --- a/target/i386/translate.c +> +> +++ b/target/i386/translate.c +> +> @@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State +> +> *env, DisasContext *s, +> +> s->vex_l = 0; +> +> s->vex_v = 0; +> +> next_byte: +> +> + /* The prefixes can atmost be 14 bytes since x86 has an upper +> +> + limit of 15 bytes for the instruction */ +> +> + if (s->pc - pc_start > 14) { +> +> + goto illegal_op; +> +> + } +> +> b = cpu_ldub_code(env, s->pc); +> +> s->pc++; +> +> /* Collect prefixes. */ +> +> +Please make the comment more verbose, based on Richard's remark. We +> +should apply it to 2.9. +> +> +Also, QEMU usually formats comments with stars on every line. +OK. I'll send a proper patch with updated comment. + +Thanks, +-- +Pranith + diff --git a/results/classifier/zero-shot/012/permissions/14887122 b/results/classifier/zero-shot/012/permissions/14887122 new file mode 100644 index 000000000..ee3acf6fb --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/14887122 @@ -0,0 +1,276 @@ +permissions: 0.964 +files: 0.944 +risc-v: 0.934 +debug: 0.934 +mistranslation: 0.930 +semantic: 0.928 +register: 0.922 +device: 0.919 +assembly: 0.918 +PID: 0.914 +socket: 0.914 +graphic: 0.910 +architecture: 0.899 +performance: 0.897 +x86: 0.895 +other: 0.890 +arm: 0.883 +vnc: 0.871 +kernel virtual machine: 0.869 +network: 0.855 +TCG: 0.836 +boot: 0.831 + +[BUG][RFC] CPR transfer Issues: Socket permissions and PID files + +Hello, + +While testing CPR transfer I encountered two issues. The first is that the +transfer fails when running with pidfiles due to the destination qemu process +attempting to create the pidfile while it is still locked by the source +process. The second is that the transfer fails when running with the -run-with +user=$USERID parameter. This is because the destination qemu process creates +the UNIX sockets used for the CPR transfer before dropping to the lower +permissioned user, which causes them to be owned by the original user. The +source qemu process then does not have permission to connect to it because it +is already running as the lesser permissioned user. + +Reproducing the first issue: + +Create a source and destination qemu instance associated with the same VM where +both processes have the -pidfile parameter passed on the command line. You +should see the following error on the command line of the second process: + +qemu-system-x86_64: cannot create PID file: Cannot lock pid file: Resource +temporarily unavailable + +Reproducing the second issue: + +Create a source and destination qemu instance associated with the same VM where +both processes have -run-with user=$USERID passed on the command line, where +$USERID is a different user from the one launching the processes. Then attempt +a CPR transfer using UNIX sockets for the main and cpr sockets. You should +receive the following error via QMP: +{"error": {"class": "GenericError", "desc": "Failed to connect to 'cpr.sock': +Permission denied"}} + +I provided a minimal patch that works around the second issue. + +Thank you, +Ben Chaney + +--- +include/system/os-posix.h | 4 ++++ +os-posix.c | 8 -------- +util/qemu-sockets.c | 21 +++++++++++++++++++++ +3 files changed, 25 insertions(+), 8 deletions(-) + +diff --git a/include/system/os-posix.h b/include/system/os-posix.h +index ce5b3bccf8..2a414a914a 100644 +--- a/include/system/os-posix.h ++++ b/include/system/os-posix.h +@@ -55,6 +55,10 @@ void os_setup_limits(void); +void os_setup_post(void); +int os_mlock(bool on_fault); + ++extern struct passwd *user_pwd; ++extern uid_t user_uid; ++extern gid_t user_gid; ++ +/** +* qemu_alloc_stack: +* @sz: pointer to a size_t holding the requested usable stack size +diff --git a/os-posix.c b/os-posix.c +index 52925c23d3..9369b312a0 100644 +--- a/os-posix.c ++++ b/os-posix.c +@@ -86,14 +86,6 @@ void os_set_proc_name(const char *s) +} + + +-/* +- * Must set all three of these at once. +- * Legal combinations are unset by name by uid +- */ +-static struct passwd *user_pwd; /* NULL non-NULL NULL */ +-static uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ +-static gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ +- +/* +* Prepare to change user ID. user_id can be one of 3 forms: +* - a username, in which case user ID will be changed to its uid, +diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c +index 77477c1cd5..987977ead9 100644 +--- a/util/qemu-sockets.c ++++ b/util/qemu-sockets.c +@@ -871,6 +871,14 @@ static bool saddr_is_tight(UnixSocketAddress *saddr) +#endif +} + ++/* ++ * Must set all three of these at once. ++ * Legal combinations are unset by name by uid ++ */ ++struct passwd *user_pwd; /* NULL non-NULL NULL */ ++uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ ++gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ ++ +static int unix_listen_saddr(UnixSocketAddress *saddr, +int num, +Error **errp) +@@ -947,6 +955,19 @@ static int unix_listen_saddr(UnixSocketAddress *saddr, +error_setg_errno(errp, errno, "Failed to bind socket to %s", path); +goto err; +} ++ if (user_pwd) { ++ if (chown(un.sun_path, user_pwd->pw_uid, user_pwd->pw_gid) < 0) { ++ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", +path); ++ goto err; ++ } ++ } ++ else if (user_uid != -1 && user_gid != -1) { ++ if (chown(un.sun_path, user_uid, user_gid) < 0) { ++ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", +path); ++ goto err; ++ } ++ } ++ +if (listen(sock, num) < 0) { +error_setg_errno(errp, errno, "Failed to listen on socket"); +goto err; +-- +2.40.1 + +Thank you Ben. I appreciate you testing CPR and shaking out the bugs. +I will study these and propose patches. + +My initial reaction to the pidfile issue is that the orchestration layer must +pass a different filename when starting the destination qemu instance. When +using live update without containers, these types of resource conflicts in the +global namespaces are a known issue. + +- Steve + +On 3/14/2025 2:33 PM, Chaney, Ben wrote: +Hello, + +While testing CPR transfer I encountered two issues. The first is that the +transfer fails when running with pidfiles due to the destination qemu process +attempting to create the pidfile while it is still locked by the source +process. The second is that the transfer fails when running with the -run-with +user=$USERID parameter. This is because the destination qemu process creates +the UNIX sockets used for the CPR transfer before dropping to the lower +permissioned user, which causes them to be owned by the original user. The +source qemu process then does not have permission to connect to it because it +is already running as the lesser permissioned user. + +Reproducing the first issue: + +Create a source and destination qemu instance associated with the same VM where +both processes have the -pidfile parameter passed on the command line. You +should see the following error on the command line of the second process: + +qemu-system-x86_64: cannot create PID file: Cannot lock pid file: Resource +temporarily unavailable + +Reproducing the second issue: + +Create a source and destination qemu instance associated with the same VM where +both processes have -run-with user=$USERID passed on the command line, where +$USERID is a different user from the one launching the processes. Then attempt +a CPR transfer using UNIX sockets for the main and cpr sockets. You should +receive the following error via QMP: +{"error": {"class": "GenericError", "desc": "Failed to connect to 'cpr.sock': +Permission denied"}} + +I provided a minimal patch that works around the second issue. + +Thank you, +Ben Chaney + +--- +include/system/os-posix.h | 4 ++++ +os-posix.c | 8 -------- +util/qemu-sockets.c | 21 +++++++++++++++++++++ +3 files changed, 25 insertions(+), 8 deletions(-) + +diff --git a/include/system/os-posix.h b/include/system/os-posix.h +index ce5b3bccf8..2a414a914a 100644 +--- a/include/system/os-posix.h ++++ b/include/system/os-posix.h +@@ -55,6 +55,10 @@ void os_setup_limits(void); +void os_setup_post(void); +int os_mlock(bool on_fault); + ++extern struct passwd *user_pwd; ++extern uid_t user_uid; ++extern gid_t user_gid; ++ +/** +* qemu_alloc_stack: +* @sz: pointer to a size_t holding the requested usable stack size +diff --git a/os-posix.c b/os-posix.c +index 52925c23d3..9369b312a0 100644 +--- a/os-posix.c ++++ b/os-posix.c +@@ -86,14 +86,6 @@ void os_set_proc_name(const char *s) +} + + +-/* +- * Must set all three of these at once. +- * Legal combinations are unset by name by uid +- */ +-static struct passwd *user_pwd; /* NULL non-NULL NULL */ +-static uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ +-static gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ +- +/* +* Prepare to change user ID. user_id can be one of 3 forms: +* - a username, in which case user ID will be changed to its uid, +diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c +index 77477c1cd5..987977ead9 100644 +--- a/util/qemu-sockets.c ++++ b/util/qemu-sockets.c +@@ -871,6 +871,14 @@ static bool saddr_is_tight(UnixSocketAddress *saddr) +#endif +} + ++/* ++ * Must set all three of these at once. ++ * Legal combinations are unset by name by uid ++ */ ++struct passwd *user_pwd; /* NULL non-NULL NULL */ ++uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ ++gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ ++ +static int unix_listen_saddr(UnixSocketAddress *saddr, +int num, +Error **errp) +@@ -947,6 +955,19 @@ static int unix_listen_saddr(UnixSocketAddress *saddr, +error_setg_errno(errp, errno, "Failed to bind socket to %s", path); +goto err; +} ++ if (user_pwd) { ++ if (chown(un.sun_path, user_pwd->pw_uid, user_pwd->pw_gid) < 0) { ++ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", +path); ++ goto err; ++ } ++ } ++ else if (user_uid != -1 && user_gid != -1) { ++ if (chown(un.sun_path, user_uid, user_gid) < 0) { ++ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", +path); ++ goto err; ++ } ++ } ++ +if (listen(sock, num) < 0) { +error_setg_errno(errp, errno, "Failed to listen on socket"); +goto err; +-- +2.40.1 + diff --git a/results/classifier/zero-shot/012/permissions/16056596 b/results/classifier/zero-shot/012/permissions/16056596 new file mode 100644 index 000000000..e002a7695 --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/16056596 @@ -0,0 +1,116 @@ +permissions: 0.985 +other: 0.980 +semantic: 0.979 +debug: 0.978 +register: 0.977 +assembly: 0.975 +files: 0.975 +device: 0.973 +boot: 0.971 +arm: 0.970 +graphic: 0.970 +architecture: 0.970 +performance: 0.970 +kernel virtual machine: 0.965 +PID: 0.961 +mistranslation: 0.961 +socket: 0.952 +vnc: 0.946 +network: 0.940 +risc-v: 0.940 +TCG: 0.935 +x86: 0.847 + +[BUG][powerpc] KVM Guest Boot Failure and Hang at "Booting Linux via __start()" + +Bug Description: +Encountering a boot failure when launching a KVM guest with +'qemu-system-ppc64'. The guest hangs at boot, and the QEMU monitor +crashes. +Reproduction Steps: +# qemu-system-ppc64 --version +QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f) +Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers +# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine +pseries,accel=kvm \ +-m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \ + -device virtio-scsi-pci,id=scsi \ +-drive +file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 +\ +-device scsi-hd,drive=drive0,bus=scsi.0 \ + -netdev bridge,id=net0,br=virbr0 \ + -device virtio-net-pci,netdev=net0 \ + -serial pty \ + -device virtio-balloon-pci \ + -cpu host +QEMU 9.2.50 monitor - type 'help' for more information +char device redirected to /dev/pts/2 (label serial0) +(qemu) +(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but +unavailable: IRQ_XIVE capability must be present for KVM +Falling back to kernel-irqchip=off +** Qemu Hang + +(In another ssh session) +# screen /dev/pts/2 +Preparing to boot Linux version 6.10.4-200.fc40.ppc64le +(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 +(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 +15:20:17 UTC 2024 +Detected machine type: 0000000000000101 +command line: +BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le +root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M +Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) +Calling ibm,client-architecture-support... done +memory layout at init: + memory_limit : 0000000000000000 (16 MB aligned) + alloc_bottom : 0000000008200000 + alloc_top : 0000000030000000 + alloc_top_hi : 0000000800000000 + rmo_top : 0000000030000000 + ram_top : 0000000800000000 +instantiating rtas at 0x000000002fff0000... done +prom_hold_cpus: skipped +copying OF device tree... +Building dt strings... +Building dt structure... +Device tree strings 0x0000000008210000 -> 0x0000000008210bd0 +Device tree struct 0x0000000008220000 -> 0x0000000008230000 +Quiescing Open Firmware ... +Booting Linux via __start() @ 0x0000000000440000 ... +** Guest Console Hang + + +Git Bisect: +Performing git bisect points to the following patch: +# git bisect bad +e8291ec16da80566c121c68d9112be458954d90b is the first bad commit +commit e8291ec16da80566c121c68d9112be458954d90b (HEAD) +Author: Nicholas Piggin <npiggin@gmail.com> +Date: Thu Dec 19 13:40:31 2024 +1000 + + target/ppc: fix timebase register reset state +(H)DEC and PURR get reset before icount does, which causes them to +be +skewed and not match the init state. This can cause replay to not +match the recorded trace exactly. For DEC and HDEC this is usually +not +noticable since they tend to get programmed before affecting the + target machine. PURR has been observed to cause replay bugs when + running Linux. + + Fix this by resetting using a time of 0. + + Message-ID: <20241219034035.1826173-2-npiggin@gmail.com> + Signed-off-by: Nicholas Piggin <npiggin@gmail.com> + + hw/ppc/ppc.c | 11 ++++++++--- + 1 file changed, 8 insertions(+), 3 deletions(-) + + +Reverting the patch helps boot the guest. +Thanks, +Misbah Anjum N + diff --git a/results/classifier/zero-shot/012/permissions/23300761 b/results/classifier/zero-shot/012/permissions/23300761 new file mode 100644 index 000000000..ba857a848 --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/23300761 @@ -0,0 +1,331 @@ +permissions: 0.984 +debug: 0.978 +other: 0.963 +assembly: 0.958 +performance: 0.952 +PID: 0.950 +semantic: 0.950 +architecture: 0.946 +arm: 0.946 +register: 0.934 +device: 0.932 +boot: 0.929 +socket: 0.927 +vnc: 0.926 +risc-v: 0.926 +graphic: 0.924 +files: 0.910 +network: 0.879 +TCG: 0.868 +x86: 0.782 +mistranslation: 0.770 +kernel virtual machine: 0.738 + +[Qemu-devel] [BUG] 216 Alerts reported by LGTM for QEMU (some might be release critical) + +Hi, +LGTM reports 16 errors, 81 warnings and 119 recommendations: +https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list +. +Some of them are already know (wrong format strings), others look like +real errors: +- several multiplication results which don't work as they should in +contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only +32 bit!), target/i386/translate.c and other files +- potential buffer overflows in gdbstub.c and other files +I am afraid that the overflows in the block code are release critical, +maybe that in target/i386/translate.c and other errors, too. +About half of the alerts are issues which can be fixed later. + +Regards + +Stefan + +On 13/07/19 19:46, Stefan Weil wrote: +> +> +LGTM reports 16 errors, 81 warnings and 119 recommendations: +> +https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list +. +> +> +Some of them are already know (wrong format strings), others look like +> +real errors: +> +> +- several multiplication results which don't work as they should in +> +contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only +> +32 bit!), target/i386/translate.c and other files +m->nb_clusters here is limited by s->l2_slice_size (see for example +handle_alloc) so I wouldn't be surprised if this is a false positive. I +couldn't find this particular multiplication in Coverity, but it has +about 250 issues marked as intentional or false positive so there's +probably a lot of overlap with what LGTM found. + +Paolo + +Am 13.07.2019 um 21:42 schrieb Paolo Bonzini: +> +On 13/07/19 19:46, Stefan Weil wrote: +> +> LGTM reports 16 errors, 81 warnings and 119 recommendations: +> +> +https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list +. +> +> +> +> Some of them are already known (wrong format strings), others look like +> +> real errors: +> +> +> +> - several multiplication results which don't work as they should in +> +> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only +> +> 32 bit!), target/i386/translate.c and other files +> +m->nb_clusters here is limited by s->l2_slice_size (see for example +> +handle_alloc) so I wouldn't be surprised if this is a false positive. I +> +couldn't find this particular multiplication in Coverity, but it has +> +about 250 issues marked as intentional or false positive so there's +> +probably a lot of overlap with what LGTM found. +> +> +Paolo +> +From other projects I know that there is a certain overlap between the +results from Coverity Scan an LGTM, but it is good to have both +analyzers, and the results from LGTM are typically quite reliable. + +Even if we know that there is no multiplication overflow, the code could +be modified. Either the assigned value should use the same data type as +the factors (possible when there is never an overflow, avoids a size +extension), or the multiplication could use the larger data type by +adding a type cast to one of the factors (then an overflow cannot +happen, static code analysers and human reviewers have an easier job, +but the multiplication costs more time). + +Stefan + +Am 14.07.2019 um 15:28 hat Stefan Weil geschrieben: +> +Am 13.07.2019 um 21:42 schrieb Paolo Bonzini: +> +> On 13/07/19 19:46, Stefan Weil wrote: +> +>> LGTM reports 16 errors, 81 warnings and 119 recommendations: +> +>> +https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list +. +> +>> +> +>> Some of them are already known (wrong format strings), others look like +> +>> real errors: +> +>> +> +>> - several multiplication results which don't work as they should in +> +>> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only +> +>> 32 bit!), target/i386/translate.c and other files +Request sizes are limited to 32 bit in the generic block layer before +they are even passed to the individual block drivers, so most if not all +of these are going to be false positives. + +> +> m->nb_clusters here is limited by s->l2_slice_size (see for example +> +> handle_alloc) so I wouldn't be surprised if this is a false positive. I +> +> couldn't find this particular multiplication in Coverity, but it has +> +> about 250 issues marked as intentional or false positive so there's +> +> probably a lot of overlap with what LGTM found. +> +> +> +> Paolo +> +> +From other projects I know that there is a certain overlap between the +> +results from Coverity Scan an LGTM, but it is good to have both +> +analyzers, and the results from LGTM are typically quite reliable. +> +> +Even if we know that there is no multiplication overflow, the code could +> +be modified. Either the assigned value should use the same data type as +> +the factors (possible when there is never an overflow, avoids a size +> +extension), or the multiplication could use the larger data type by +> +adding a type cast to one of the factors (then an overflow cannot +> +happen, static code analysers and human reviewers have an easier job, +> +but the multiplication costs more time). +But if you look at the code we're talking about, you see that it's +complaining about things where being more explicit would make things +less readable. + +For example, if complains about the multiplication in this line: + + s->file_size += n * s->header.cluster_size; + +We know that n * s->header.cluster_size fits in 32 bits, but +s->file_size is 64 bits (and has to be 64 bits). Do you really think we +should introduce another uint32_t variable to store the intermediate +result? And if we cast n to uint64_t, not only might the multiplication +cost more time, but also human readers would wonder why the result could +become larger than 32 bits. So a cast would be misleading. + + +It also complains about this line: + + ret = bdrv_truncate(bs->file, (3 + l1_clusters) * s->cluster_size, + PREALLOC_MODE_OFF, &local_err); + +Here, we don't even assign the result to a 64 bit variable, but just +pass it to a function which takes a 64 bit parameter. Again, I don't +think introducing additional variables for the intermediate result or +adding casts would be an improvement of the situation. + + +So I don't think this is a good enough tool to base our code on what it +does and doesn't understand. It would have too much of a negative impact +on our code. We'd rather need a way to mark false positives as such and +move on without changing the code in such cases. + +Kevin + +On Sat, 13 Jul 2019 at 18:46, Stefan Weil <address@hidden> wrote: +> +LGTM reports 16 errors, 81 warnings and 119 recommendations: +> +https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list +. +I had a look at some of these before, but mostly I came +to the conclusion that it wasn't worth trying to put the +effort into keeping up with the site because they didn't +seem to provide any useful way to mark things as false +positives. Coverity has its flaws but at least you can do +that kind of thing in its UI (it runs at about a 33% fp +rate, I think.) "Analyzer thinks this multiply can overflow +but in fact it's not possible" is quite a common false +positive cause... + +Anyway, if you want to fish out specific issues, analyse +whether they're false positive or real, and report them +to the mailing list as followups to the patches which +introduced the issue, that's probably the best way for +us to make use of this analyzer. (That is essentially +what I do for coverity.) + +thanks +-- PMM + +Am 14.07.2019 um 19:30 schrieb Peter Maydell: +[...] +> +"Analyzer thinks this multiply can overflow +> +but in fact it's not possible" is quite a common false +> +positive cause... +The analysers don't complain because a multiply can overflow. + +They complain because the code indicates that a larger result is +expected, for example uint64_t = uint32_t * uint32_t. They would not +complain for the same multiplication if it were assigned to a uint32_t. + +So there is a simple solution to write the code in a way which avoids +false positives... + +Stefan + +Stefan Weil <address@hidden> writes: + +> +Am 14.07.2019 um 19:30 schrieb Peter Maydell: +> +[...] +> +> "Analyzer thinks this multiply can overflow +> +> but in fact it's not possible" is quite a common false +> +> positive cause... +> +> +> +The analysers don't complain because a multiply can overflow. +> +> +They complain because the code indicates that a larger result is +> +expected, for example uint64_t = uint32_t * uint32_t. They would not +> +complain for the same multiplication if it were assigned to a uint32_t. +I agree this is an anti-pattern. + +> +So there is a simple solution to write the code in a way which avoids +> +false positives... +You wrote elsewhere in this thread: + + Either the assigned value should use the same data type as the + factors (possible when there is never an overflow, avoids a size + extension), or the multiplication could use the larger data type by + adding a type cast to one of the factors (then an overflow cannot + happen, static code analysers and human reviewers have an easier + job, but the multiplication costs more time). + +Makes sense to me. + +On 7/14/19 5:30 PM, Peter Maydell wrote: +> +I had a look at some of these before, but mostly I came +> +to the conclusion that it wasn't worth trying to put the +> +effort into keeping up with the site because they didn't +> +seem to provide any useful way to mark things as false +> +positives. Coverity has its flaws but at least you can do +> +that kind of thing in its UI (it runs at about a 33% fp +> +rate, I think.) +Yes, LGTM wants you to modify the source code with + + /* lgtm [cpp/some-warning-code] */ + +and on the same line as the reported problem. Which is mildly annoying in that +you're definitely committing to LGTM in the long term. Also for any +non-trivial bit of code, it will almost certainly run over 80 columns. + + +r~ + diff --git a/results/classifier/zero-shot/012/permissions/26095107 b/results/classifier/zero-shot/012/permissions/26095107 new file mode 100644 index 000000000..5b9600ca9 --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/26095107 @@ -0,0 +1,176 @@ +permissions: 0.993 +debug: 0.993 +register: 0.990 +files: 0.989 +arm: 0.989 +assembly: 0.988 +PID: 0.988 +device: 0.988 +performance: 0.987 +socket: 0.987 +boot: 0.987 +architecture: 0.986 +risc-v: 0.981 +other: 0.979 +semantic: 0.974 +vnc: 0.972 +x86: 0.969 +graphic: 0.955 +kernel virtual machine: 0.948 +mistranslation: 0.930 +TCG: 0.917 +network: 0.879 + +[Qemu-devel] [Bug Report] vm paused after succeeding to migrate + +Hi, all +I encounterd a bug when I try to migrate a windows vm. + +Enviroment information: +host A: cpu E5620(model WestmereEP without flag xsave) +host B: cpu E5-2643(model SandyBridgeEP with xsave) + +The reproduce steps is : +1. Start a windows 2008 vm with -cpu host(which means host-passthrough). +2. Migrate the vm to host B when cr4.OSXSAVE=0 (successfully). +3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1. +4. Then migrate the vm to host A (successfully), but vm was paused, and qemu +printed log as followed: + +KVM: entry failed, hardware error 0x80000021 + +If you're running a guest on an Intel machine without unrestricted mode +support, the failure can be most likely due to the guest entering an invalid +state for Intel VT. For example, the guest maybe running in big real mode +which is not supported on less recent Intel processors. + +EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000 +ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20 +EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 +ES =0000 00000000 0000ffff 00009300 +CS =f000 ffff0000 0000ffff 00009b00 +SS =0000 00000000 0000ffff 00009300 +DS =0000 00000000 0000ffff 00009300 +FS =0000 00000000 0000ffff 00009300 +GS =0000 00000000 0000ffff 00009300 +LDT=0000 00000000 0000ffff 00008200 +TR =0000 00000000 0000ffff 00008b00 +GDT= 00000000 0000ffff +IDT= 00000000 0000ffff +CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000 +DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 +DR3=0000000000000000 +DR6=00000000ffff0ff0 DR7=0000000000000400 +EFER=0000000000000000 +Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 +00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + +I have found that problem happened when kvm_put_sregs returns err -22(called by +kvm_arch_put_registers(qemu)). +Because kvm_arch_vcpu_ioctl_set_sregs(kvm-mod) checked that guest_cpuid_has no +X86_FEATURE_XSAVE but cr4.OSXSAVE=1. +So should we cancel migration when kvm_arch_put_registers returns error? + +* linzhecheng (address@hidden) wrote: +> +Hi, all +> +I encounterd a bug when I try to migrate a windows vm. +> +> +Enviroment information: +> +host A: cpu E5620(model WestmereEP without flag xsave) +> +host B: cpu E5-2643(model SandyBridgeEP with xsave) +> +> +The reproduce steps is : +> +1. Start a windows 2008 vm with -cpu host(which means host-passthrough). +> +2. Migrate the vm to host B when cr4.OSXSAVE=0 (successfully). +> +3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1. +> +4. Then migrate the vm to host A (successfully), but vm was paused, and qemu +> +printed log as followed: +Remember that migrating using -cpu host across different CPU models is NOT +expected to work. + +> +KVM: entry failed, hardware error 0x80000021 +> +> +If you're running a guest on an Intel machine without unrestricted mode +> +support, the failure can be most likely due to the guest entering an invalid +> +state for Intel VT. For example, the guest maybe running in big real mode +> +which is not supported on less recent Intel processors. +> +> +EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000 +> +ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20 +> +EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 +> +ES =0000 00000000 0000ffff 00009300 +> +CS =f000 ffff0000 0000ffff 00009b00 +> +SS =0000 00000000 0000ffff 00009300 +> +DS =0000 00000000 0000ffff 00009300 +> +FS =0000 00000000 0000ffff 00009300 +> +GS =0000 00000000 0000ffff 00009300 +> +LDT=0000 00000000 0000ffff 00008200 +> +TR =0000 00000000 0000ffff 00008b00 +> +GDT= 00000000 0000ffff +> +IDT= 00000000 0000ffff +> +CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000 +> +DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 +> +DR3=0000000000000000 +> +DR6=00000000ffff0ff0 DR7=0000000000000400 +> +EFER=0000000000000000 +> +Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 +> +00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 +> +00 +> +> +I have found that problem happened when kvm_put_sregs returns err -22(called +> +by kvm_arch_put_registers(qemu)). +> +Because kvm_arch_vcpu_ioctl_set_sregs(kvm-mod) checked that guest_cpuid_has +> +no X86_FEATURE_XSAVE but cr4.OSXSAVE=1. +> +So should we cancel migration when kvm_arch_put_registers returns error? +It would seem good if we can make the migration fail there rather than +hitting that KVM error. +It looks like we need to do a bit of plumbing to convert the places that +call it to return a bool rather than void. + +Dave + +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + diff --git a/results/classifier/zero-shot/012/permissions/26430026 b/results/classifier/zero-shot/012/permissions/26430026 new file mode 100644 index 000000000..42d5a71d9 --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/26430026 @@ -0,0 +1,183 @@ +permissions: 0.937 +TCG: 0.934 +debug: 0.925 +architecture: 0.924 +register: 0.922 +mistranslation: 0.915 +risc-v: 0.914 +x86: 0.909 +semantic: 0.904 +device: 0.904 +kernel virtual machine: 0.901 +performance: 0.898 +assembly: 0.896 +PID: 0.894 +vnc: 0.893 +arm: 0.887 +files: 0.879 +graphic: 0.862 +boot: 0.841 +socket: 0.817 +other: 0.813 +network: 0.758 + +[BUG] cxl,i386: e820 mappings may not be correct for cxl + +Context included below from prior discussion + - `cxl create-region` would fail on inability to allocate memory + - traced this down to the memory region being marked RESERVED + - E820 map marks the CXL fixed memory window as RESERVED + + +Re: x86 errors, I found that region worked with this patch. (I also +added the SRAT patches the Davidlohr posted, but I do not think they are +relevant). + +I don't think this is correct, and setting this to E820_RAM causes the +system to fail to boot at all, but with this change `cxl create-region` +succeeds, which suggests our e820 mappings in the i386 machine are +incorrect. + +Anyone who can help or have an idea as to what e820 should actually be +doing with this region, or if this is correct and something else is +failing, please help! + + +diff --git a/hw/i386/pc.c b/hw/i386/pc.c +index 566accf7e6..a5e688a742 100644 +--- a/hw/i386/pc.c ++++ b/hw/i386/pc.c +@@ -1077,7 +1077,7 @@ void pc_memory_init(PCMachineState *pcms, + memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, fw, + "cxl-fixed-memory-region", fw->size); + memory_region_add_subregion(system_memory, fw->base, &fw->mr); +- e820_add_entry(fw->base, fw->size, E820_RESERVED); ++ e820_add_entry(fw->base, fw->size, E820_NVS); + cxl_fmw_base += fw->size; + cxl_resv_end = cxl_fmw_base; + } + + +On Mon, Oct 10, 2022 at 05:32:42PM +0100, Jonathan Cameron wrote: +> +> +> > but i'm not sure of what to do with this info. We have some proof +> +> > that real hardware works with this no problem, and the only difference +> +> > is that the EFI/bios/firmware is setting the memory regions as `usable` +> +> > or `soft reserved`, which would imply the EDK2 is the blocker here +> +> > regardless of the OS driver status. +> +> > +> +> > But I'd seen elsewhere you had gotten some of this working, and I'm +> +> > failing to get anything working at the moment. If you have any input i +> +> > would greatly appreciate the help. +> +> > +> +> > QEMU config: +> +> > +> +> > /opt/qemu-cxl2/bin/qemu-system-x86_64 \ +> +> > -drive +> +> > file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=d\ +> +> > -m 2G,slots=4,maxmem=4G \ +> +> > -smp 4 \ +> +> > -machine type=q35,accel=kvm,cxl=on \ +> +> > -enable-kvm \ +> +> > -nographic \ +> +> > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \ +> +> > -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 \ +> +> > -object memory-backend-file,id=cxl-mem0,mem-path=/tmp/cxl-mem0,size=256M \ +> +> > -object memory-backend-file,id=lsa0,mem-path=/tmp/cxl-lsa0,size=256M \ +> +> > -device cxl-type3,bus=rp0,pmem=true,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 +> +> > \ +> +> > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=256M +> +> > +> +> > I'd seen on the lists that you had seen issues with single-rp setups, +> +> > but no combination of configuration I've tried (including all the ones +> +> > in the docs and tests) lead to a successful region creation with +> +> > `cxl create-region` +> +> +> +> Hmm. Let me have a play. I've not run x86 tests for a while so +> +> perhaps something is missing there. +> +> +> +> I'm carrying a patch to override check_last_peer() in +> +> cxl_port_setup_targets() as that is wrong for some combinations, +> +> but that doesn't look like it's related to what you are seeing. +> +> +I'm not sure if it's relevant, but turned out I'd forgotten I'm carrying 3 +> +patches that aren't upstream (and one is a horrible hack). +> +> +Hack: +https://lore.kernel.org/linux-cxl/20220819094655.000005ed@huawei.com/ +> +Shouldn't affect a simple case like this... +> +> +https://lore.kernel.org/linux-cxl/20220819093133.00006c22@huawei.com/T/#t +> +(Dan's version) +> +> +https://lore.kernel.org/linux-cxl/20220815154044.24733-1-Jonathan.Cameron@huawei.com/T/#t +> +> +For writes to work you will currently need two rps (nothing on the second is +> +fine) +> +as we still haven't resolved if the kernel should support an HDM decoder on +> +a host bridge with one port. I think it should (Spec allows it), others +> +unconvinced. +> +> +Note I haven't shifted over to x86 yet so may still be something different +> +from +> +arm64. +> +> +Jonathan +> +> + diff --git a/results/classifier/zero-shot/012/permissions/31349848 b/results/classifier/zero-shot/012/permissions/31349848 new file mode 100644 index 000000000..fafb18e01 --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/31349848 @@ -0,0 +1,172 @@ +permissions: 0.908 +PID: 0.903 +other: 0.901 +device: 0.881 +graphic: 0.876 +register: 0.870 +risc-v: 0.865 +performance: 0.864 +assembly: 0.855 +vnc: 0.854 +arm: 0.852 +semantic: 0.846 +socket: 0.846 +architecture: 0.845 +TCG: 0.828 +debug: 0.826 +files: 0.820 +boot: 0.815 +x86: 0.815 +mistranslation: 0.781 +network: 0.769 +kernel virtual machine: 0.716 + +[Qemu-devel] [BUG] qemu stuck when detach host-usb device + +Description of problem: +The guest has a host-usb device(Kingston Technology DataTraveler 100 G3/G4/SE9 +G2), which is attached +to xhci controller(on host). Qemu will stuck if I detach it from guest. + +How reproducible: +100% + +Steps to Reproduce: +1. Use usb stick to copy files in guest , make it busy working. +2. virsh detach-device vm_name usb.xml + +Then qemu will stuck for 20s, I found this is because libusb_release_interface +block for 20s. +Dmesg prints: + +[35442.034861] usb 4-2.1: Disable of device-initiated U1 failed. +[35447.034993] usb 4-2.1: Disable of device-initiated U2 failed. +[35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed. +[35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed. + +Is this a hardware error or software's bug? + +On Tue, Nov 27, 2018 at 01:26:24AM +0000, linzhecheng wrote: +> +Description of problem: +> +The guest has a host-usb device(Kingston Technology DataTraveler 100 +> +G3/G4/SE9 G2), which is attached +> +to xhci controller(on host). Qemu will stuck if I detach it from guest. +> +> +How reproducible: +> +100% +> +> +Steps to Reproduce: +> +1. Use usb stick to copy files in guest , make it busy working. +> +2. virsh detach-device vm_name usb.xml +> +> +Then qemu will stuck for 20s, I found this is because +> +libusb_release_interface block for 20s. +> +Dmesg prints: +> +> +[35442.034861] usb 4-2.1: Disable of device-initiated U1 failed. +> +[35447.034993] usb 4-2.1: Disable of device-initiated U2 failed. +> +[35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed. +> +[35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed. +> +> +Is this a hardware error or software's bug? +I'd guess software error, could be is libusb or (host) linux kernel. +Cc'ing libusb-devel. + +cheers, + Gerd + +> +-----Original Message----- +> +From: Gerd Hoffmann [ +mailto:address@hidden +> +Sent: Tuesday, November 27, 2018 2:09 PM +> +To: linzhecheng <address@hidden> +> +Cc: address@hidden; wangxin (U) <address@hidden>; +> +Zhoujian (jay) <address@hidden>; address@hidden +> +Subject: Re: [Qemu-devel] [BUG] qemu stuck when detach host-usb device +> +> +On Tue, Nov 27, 2018 at 01:26:24AM +0000, linzhecheng wrote: +> +> Description of problem: +> +> The guest has a host-usb device(Kingston Technology DataTraveler 100 +> +> G3/G4/SE9 G2), which is attached to xhci controller(on host). Qemu will +> +> stuck +> +if I detach it from guest. +> +> +> +> How reproducible: +> +> 100% +> +> +> +> Steps to Reproduce: +> +> 1. Use usb stick to copy files in guest , make it busy working. +> +> 2. virsh detach-device vm_name usb.xml +> +> +> +> Then qemu will stuck for 20s, I found this is because +> +> libusb_release_interface +> +block for 20s. +> +> Dmesg prints: +> +> +> +> [35442.034861] usb 4-2.1: Disable of device-initiated U1 failed. +> +> [35447.034993] usb 4-2.1: Disable of device-initiated U2 failed. +> +> [35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed. +> +> [35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed. +> +> +> +> Is this a hardware error or software's bug? +> +> +I'd guess software error, could be is libusb or (host) linux kernel. +> +Cc'ing libusb-devel. +Perhaps it's usb driver's bug. Could you also reproduce it? +> +> +cheers, +> +Gerd + diff --git a/results/classifier/zero-shot/012/permissions/48245039 b/results/classifier/zero-shot/012/permissions/48245039 new file mode 100644 index 000000000..e7c5c36e4 --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/48245039 @@ -0,0 +1,548 @@ +permissions: 0.966 +debug: 0.961 +assembly: 0.956 +PID: 0.954 +device: 0.953 +other: 0.953 +register: 0.941 +arm: 0.940 +semantic: 0.939 +graphic: 0.935 +socket: 0.932 +boot: 0.932 +vnc: 0.926 +architecture: 0.925 +files: 0.924 +TCG: 0.919 +performance: 0.890 +mistranslation: 0.888 +risc-v: 0.878 +x86: 0.875 +network: 0.818 +kernel virtual machine: 0.730 + +[Qemu-devel] [BUG] gcov support appears to be broken + +Hello, according to out docs, here is the procedure that should produce +coverage report for execution of the complete "make check": + +#./configure --enable-gcov +#make +#make check +#make coverage-report + +It seems that first three commands execute as expected. (For example, there are +plenty of files generated by "make check" that would've not been generated if +"enable-gcov" hadn't been chosen.) However, the last command complains about +some missing files related to FP support. If those files are added (for +example, artificially, using "touch <missing-file"), that it starts complaining +about missing some decodetree-generated files. Other kinds of files are +involved too. + +It would be nice to have coverage support working. Please somebody take a look, +or explain if I make a mistake or misunderstood our gcov support. + +Yours, +Aleksandar + +On Mon, 5 Aug 2019 at 11:39, Aleksandar Markovic <address@hidden> wrote: +> +> +Hello, according to out docs, here is the procedure that should produce +> +coverage report for execution of the complete "make check": +> +> +#./configure --enable-gcov +> +#make +> +#make check +> +#make coverage-report +> +> +It seems that first three commands execute as expected. (For example, there +> +are plenty of files generated by "make check" that would've not been +> +generated if "enable-gcov" hadn't been chosen.) However, the last command +> +complains about some missing files related to FP support. If those files are +> +added (for example, artificially, using "touch <missing-file"), that it +> +starts complaining about missing some decodetree-generated files. Other kinds +> +of files are involved too. +> +> +It would be nice to have coverage support working. Please somebody take a +> +look, or explain if I make a mistake or misunderstood our gcov support. +Cc'ing Alex who's probably the closest we have to a gcov expert. + +(make/make check of a --enable-gcov build is in the set of things our +Travis CI setup runs, so we do defend that part against regressions.) + +thanks +-- PMM + +Peter Maydell <address@hidden> writes: + +> +On Mon, 5 Aug 2019 at 11:39, Aleksandar Markovic <address@hidden> wrote: +> +> +> +> Hello, according to out docs, here is the procedure that should produce +> +> coverage report for execution of the complete "make check": +> +> +> +> #./configure --enable-gcov +> +> #make +> +> #make check +> +> #make coverage-report +> +> +> +> It seems that first three commands execute as expected. (For example, +> +> there are plenty of files generated by "make check" that would've not +> +> been generated if "enable-gcov" hadn't been chosen.) However, the +> +> last command complains about some missing files related to FP +> +> support. If those files are added (for example, artificially, using +> +> "touch <missing-file"), that it starts complaining about missing some +> +> decodetree-generated files. Other kinds of files are involved too. +The gcov tool is fairly noisy about missing files but that just +indicates the tests haven't exercised those code paths. "make check" +especially doesn't touch much of the TCG code and a chunk of floating +point. + +> +> +> +> It would be nice to have coverage support working. Please somebody +> +> take a look, or explain if I make a mistake or misunderstood our gcov +> +> support. +So your failure mode is no report is generated at all? It's working for +me here. + +> +> +Cc'ing Alex who's probably the closest we have to a gcov expert. +> +> +(make/make check of a --enable-gcov build is in the set of things our +> +Travis CI setup runs, so we do defend that part against regressions.) +We defend the build but I have just checked and it seems our +check_coverage script is currently failing: +https://travis-ci.org/stsquad/qemu/jobs/567809808#L10328 +But as it's an after_success script it doesn't fail the build. + +> +> +thanks +> +-- PMM +-- +Alex Bennée + +> +> #./configure --enable-gcov +> +> #make +> +> #make check +> +> #make coverage-report +> +> +> +> It seems that first three commands execute as expected. (For example, +> +> there are plenty of files generated by "make check" that would've not +> +> been generated if "enable-gcov" hadn't been chosen.) However, the +> +> last command complains about some missing files related to FP +> +So your failure mode is no report is generated at all? It's working for +> +me here. +Alex, no report is generated for my test setups - in fact, "make +coverage-report" even says that it explicitly deletes what appears to be the +main coverage report html file). + +This is the terminal output of an unsuccessful executions of "make +coverage-report" for recent ToT: + +~/Build/qemu-TOT-TEST$ make coverage-report +make[1]: Entering directory '/home/user/Build/qemu-TOT-TEST/slirp' +make[1]: Nothing to be done for 'all'. +make[1]: Leaving directory '/home/user/Build/qemu-TOT-TEST/slirp' + CHK version_gen.h + GEN coverage-report.html +Traceback (most recent call last): + File "/usr/bin/gcovr", line 1970, in <module> + print_html_report(covdata, options.html_details) + File "/usr/bin/gcovr", line 1473, in print_html_report + INPUT = open(data['FILENAME'], 'r') +IOError: [Errno 2] No such file or directory: 'wrap.inc.c' +Makefile:1048: recipe for target +'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' failed +make: *** +[/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html] Error 1 +make: *** Deleting file +'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' + +This instance is executed in QEMU 3.0 source tree: (so, it looks the problem +existed for quite some time) + +~/Build/qemu-3.0$ make coverage-report + CHK version_gen.h + GEN coverage-report.html +Traceback (most recent call last): + File "/usr/bin/gcovr", line 1970, in <module> + print_html_report(covdata, options.html_details) + File "/usr/bin/gcovr", line 1473, in print_html_report + INPUT = open(data['FILENAME'], 'r') +IOError: [Errno 2] No such file or directory: +'/home/user/Build/qemu-3.0/target/openrisc/decode.inc.c' +Makefile:992: recipe for target +'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' failed +make: *** [/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html] +Error 1 +make: *** Deleting file +'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' + +Fond regards, +Aleksandar + + +> +Alex Bennée + +> +> #./configure --enable-gcov +> +> #make +> +> #make check +> +> #make coverage-report +> +> +> +> It seems that first three commands execute as expected. (For example, +> +> there are plenty of files generated by "make check" that would've not +> +> been generated if "enable-gcov" hadn't been chosen.) However, the +> +> last command complains about some missing files related to FP +> +So your failure mode is no report is generated at all? It's working for +> +me here. +Another piece of info: + +~/Build/qemu-TOT-TEST$ gcov --version +gcov (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010 +Copyright (C) 2015 Free Software Foundation, Inc. +This is free software; see the source for copying conditions. +There is NO warranty; not even for MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. + +:~/Build/qemu-TOT-TEST$ gcc --version +gcc (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0 +Copyright (C) 2017 Free Software Foundation, Inc. +This is free software; see the source for copying conditions. There is NO +warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + + + + +Alex, no report is generated for my test setups - in fact, "make +coverage-report" even says that it explicitly deletes what appears to be the +main coverage report html file). + +This is the terminal output of an unsuccessful executions of "make +coverage-report" for recent ToT: + +~/Build/qemu-TOT-TEST$ make coverage-report +make[1]: Entering directory '/home/user/Build/qemu-TOT-TEST/slirp' +make[1]: Nothing to be done for 'all'. +make[1]: Leaving directory '/home/user/Build/qemu-TOT-TEST/slirp' + CHK version_gen.h + GEN coverage-report.html +Traceback (most recent call last): + File "/usr/bin/gcovr", line 1970, in <module> + print_html_report(covdata, options.html_details) + File "/usr/bin/gcovr", line 1473, in print_html_report + INPUT = open(data['FILENAME'], 'r') +IOError: [Errno 2] No such file or directory: 'wrap.inc.c' +Makefile:1048: recipe for target +'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' failed +make: *** +[/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html] Error 1 +make: *** Deleting file +'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' + +This instance is executed in QEMU 3.0 source tree: (so, it looks the problem +existed for quite some time) + +~/Build/qemu-3.0$ make coverage-report + CHK version_gen.h + GEN coverage-report.html +Traceback (most recent call last): + File "/usr/bin/gcovr", line 1970, in <module> + print_html_report(covdata, options.html_details) + File "/usr/bin/gcovr", line 1473, in print_html_report + INPUT = open(data['FILENAME'], 'r') +IOError: [Errno 2] No such file or directory: +'/home/user/Build/qemu-3.0/target/openrisc/decode.inc.c' +Makefile:992: recipe for target +'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' failed +make: *** [/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html] +Error 1 +make: *** Deleting file +'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' + +Fond regards, +Aleksandar + + +> +Alex Bennée + +> +> #./configure --enable-gcov +> +> #make +> +> #make check +> +> #make coverage-report +> +> +> +> It seems that first three commands execute as expected. (For example, +> +> there are plenty of files generated by "make check" that would've not +> +> been generated if "enable-gcov" hadn't been chosen.) However, the +> +> last command complains about some missing files related to FP +> +So your failure mode is no report is generated at all? It's working for +> +me here. +Alex, here is the thing: + +Seeing that my gcovr is relatively old (2014) 3.2 version, I upgraded it from +git repo to the most recent 4.1 (actually, to a dev version, from the very tip +of the tree), and "make coverage-report" started generating coverage reports. +It did emit some error messages (totally different than previous), but still it +did not stop like it used to do with gcovr 3.2. + +Perhaps you would want to add some gcov/gcovr minimal version info in our docs. +(or at least a statement "this was tested with such and such gcc, gcov and +gcovr", etc.?) + +Coverage report looked fine at first glance, but it a kind of disappointed me +when I digged deeper into its content - for example, it shows very low coverage +for our FP code (softfloat), while, in fact, we know that "make check" contains +detailed tests on FP functionalities. But this is most likely a separate +problem of a very different nature, perhaps the issue of separate git repo for +FP tests (testfloat) that our FP tests use as a mid-layer. + +I'll try how everything works with my test examples, and will let you know. + +Your help is greatly appreciated, +Aleksandar + +Fond regards, +Aleksandar + + +> +Alex Bennée + +Aleksandar Markovic <address@hidden> writes: + +> +>> #./configure --enable-gcov +> +>> #make +> +>> #make check +> +>> #make coverage-report +> +>> +> +>> It seems that first three commands execute as expected. (For example, +> +>> there are plenty of files generated by "make check" that would've not +> +>> been generated if "enable-gcov" hadn't been chosen.) However, the +> +>> last command complains about some missing files related to FP +> +> +> So your failure mode is no report is generated at all? It's working for +> +> me here. +> +> +Alex, here is the thing: +> +> +Seeing that my gcovr is relatively old (2014) 3.2 version, I upgraded it from +> +git repo to the most recent 4.1 (actually, to a dev version, from the very +> +tip of the tree), and "make coverage-report" started generating coverage +> +reports. It did emit some error messages (totally different than previous), +> +but still it did not stop like it used to do with gcovr 3.2. +> +> +Perhaps you would want to add some gcov/gcovr minimal version info in our +> +docs. (or at least a statement "this was tested with such and such gcc, gcov +> +and gcovr", etc.?) +> +> +Coverage report looked fine at first glance, but it a kind of +> +disappointed me when I digged deeper into its content - for example, +> +it shows very low coverage for our FP code (softfloat), while, in +> +fact, we know that "make check" contains detailed tests on FP +> +functionalities. But this is most likely a separate problem of a very +> +different nature, perhaps the issue of separate git repo for FP tests +> +(testfloat) that our FP tests use as a mid-layer. +I get: + +68.6 % 2593 / 3782 62.2 % 1690 / 2718 + +Which is not bad considering we don't exercise the 80 and 128 bit +softfloat code at all (which is not shared by the re-factored 16/32/64 +bit code). + +> +> +I'll try how everything works with my test examples, and will let you know. +> +> +Your help is greatly appreciated, +> +Aleksandar +> +> +Fond regards, +> +Aleksandar +> +> +> +> Alex Bennée +-- +Alex Bennée + +> +> it shows very low coverage for our FP code (softfloat), while, in +> +> fact, we know that "make check" contains detailed tests on FP +> +> functionalities. But this is most likely a separate problem of a very +> +> different nature, perhaps the issue of separate git repo for FP tests +> +> (testfloat) that our FP tests use as a mid-layer. +> +> +I get: +> +> +68.6 % 2593 / 3782 62.2 % 1690 / 2718 +> +I would expect that kind of result too. + +However, I get: + +File: fpu/softfloat.c Lines: 8 3334 0.2 % +Date: 2019-08-05 19:56:58 Branches: 3 2376 0.1 % + +:( + +OK, I'll try to figure that out, and most likely I could live with it if it is +an isolated problem. + +Thank you for your assistance in this matter, +Aleksandar + +> +Which is not bad considering we don't exercise the 80 and 128 bit +> +softfloat code at all (which is not shared by the re-factored 16/32/64 +> +bit code). +> +> +Alex Bennée + +> +> it shows very low coverage for our FP code (softfloat), while, in +> +> fact, we know that "make check" contains detailed tests on FP +> +> functionalities. But this is most likely a separate problem of a very +> +> different nature, perhaps the issue of separate git repo for FP tests +> +> (testfloat) that our FP tests use as a mid-layer. +> +> +I get: +> +> +68.6 % 2593 / 3782 62.2 % 1690 / 2718 +> +This problem is solved too. (and it is my fault) + +I worked with multiple versions of QEMU, and my previous low-coverage results +were for QEMU 3.0, and for that version the directory tests/fp did not even +exist. :D (<blush>) + +For QEMU ToT, I get now: + +fpu/softfloat.c + 68.8 % 2592 / 3770 62.3 % 1693 / 2718 + +which is identical for all intents and purposes to your result. + +Yours cordially, +Aleksandar + diff --git a/results/classifier/zero-shot/012/permissions/50773216 b/results/classifier/zero-shot/012/permissions/50773216 new file mode 100644 index 000000000..320ce82a5 --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/50773216 @@ -0,0 +1,128 @@ +permissions: 0.813 +risc-v: 0.813 +architecture: 0.788 +device: 0.764 +arm: 0.758 +register: 0.751 +other: 0.737 +graphic: 0.723 +kernel virtual machine: 0.702 +semantic: 0.669 +files: 0.666 +debug: 0.659 +vnc: 0.656 +socket: 0.652 +mistranslation: 0.652 +boot: 0.637 +PID: 0.636 +performance: 0.628 +assembly: 0.623 +network: 0.606 +TCG: 0.571 +x86: 0.510 + +[Qemu-devel] Can I have someone's feedback on [bug 1809075] Concurrency bug on keyboard events: capslock LED messing up keycode streams causes character misses at guest kernel + +Hi everyone. +Can I please have someone's feedback on this bug? +https://bugs.launchpad.net/qemu/+bug/1809075 +Briefly, guest OS loses characters sent to it via vnc. And I spot the +bug in relation to ps2 driver. +I'm thinking of possible fixes and I might want to use a memory barrier. +But I would really like to have some suggestion from a qemu developer +first. For example, can we brutally drop capslock LED key events in ps2 +queue? +It is actually relevant to openQA, an automated QA tool for openSUSE. +And this bug blocks a few test cases for us. +Thank you in advance! + +Kind regards, +Gao Zhiyuan + +Cc'ing Marc-André & Gerd. + +On 12/19/18 10:31 AM, Gao Zhiyuan wrote: +> +Hi everyone. +> +> +Can I please have someone's feedback on this bug? +> +https://bugs.launchpad.net/qemu/+bug/1809075 +> +Briefly, guest OS loses characters sent to it via vnc. And I spot the +> +bug in relation to ps2 driver. +> +> +I'm thinking of possible fixes and I might want to use a memory barrier. +> +But I would really like to have some suggestion from a qemu developer +> +first. For example, can we brutally drop capslock LED key events in ps2 +> +queue? +> +> +It is actually relevant to openQA, an automated QA tool for openSUSE. +> +And this bug blocks a few test cases for us. +> +> +Thank you in advance! +> +> +Kind regards, +> +Gao Zhiyuan +> + +On Thu, Jan 03, 2019 at 12:05:54PM +0100, Philippe Mathieu-Daudé wrote: +> +Cc'ing Marc-André & Gerd. +> +> +On 12/19/18 10:31 AM, Gao Zhiyuan wrote: +> +> Hi everyone. +> +> +> +> Can I please have someone's feedback on this bug? +> +> +https://bugs.launchpad.net/qemu/+bug/1809075 +> +> Briefly, guest OS loses characters sent to it via vnc. And I spot the +> +> bug in relation to ps2 driver. +> +> +> +> I'm thinking of possible fixes and I might want to use a memory barrier. +> +> But I would really like to have some suggestion from a qemu developer +> +> first. For example, can we brutally drop capslock LED key events in ps2 +> +> queue? +There is no "capslock LED key event". 0xfa is KBD_REPLY_ACK, and the +device queues it in response to guest port writes. Yes, the ack can +race with actual key events. But IMO that isn't a bug in qemu. + +Probably the linux kernel just throws away everything until it got the +ack for the port write, and that way the key event gets lost. On +physical hardware you will not notice because it is next to impossible +to type fast enough to hit the race window. + +So, go fix the kernel. + +Alternatively fix vncdotool to send uppercase letters properly with +shift key pressed. Then qemu wouldn't generate capslock key events +(that happens because qemu thinks guest and host capslock state is out +of sync) and the guests's capslock led update request wouldn't get into +the way. + +cheers, + Gerd + diff --git a/results/classifier/zero-shot/012/permissions/55247116 b/results/classifier/zero-shot/012/permissions/55247116 new file mode 100644 index 000000000..e5e4ddce1 --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/55247116 @@ -0,0 +1,1328 @@ +permissions: 0.946 +other: 0.945 +debug: 0.941 +assembly: 0.938 +performance: 0.938 +arm: 0.936 +graphic: 0.933 +PID: 0.929 +socket: 0.929 +architecture: 0.928 +register: 0.928 +semantic: 0.928 +device: 0.919 +boot: 0.918 +risc-v: 0.918 +network: 0.916 +vnc: 0.916 +files: 0.912 +x86: 0.855 +TCG: 0.853 +mistranslation: 0.841 +kernel virtual machine: 0.828 + +[Qemu-devel] [RFC/BUG] xen-mapcache: buggy invalidate map cache? + +Hi, + +In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +instead of first level entry (if map to rom other than guest memory +comes first), while in xen_invalidate_map_cache(), when VM ballooned +out memory, qemu did not invalidate cache entries in linked +list(entry->next), so when VM balloon back in memory, gfns probably +mapped to different mfns, thus if guest asks device to DMA to these +GPA, qemu may DMA to stale MFNs. + +So I think in xen_invalidate_map_cache() linked lists should also be +checked and invalidated. + +Whatâs your opinion? Is this a bug? Is my analyze correct? + +On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +Hi, +> +> +In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +> +instead of first level entry (if map to rom other than guest memory +> +comes first), while in xen_invalidate_map_cache(), when VM ballooned +> +out memory, qemu did not invalidate cache entries in linked +> +list(entry->next), so when VM balloon back in memory, gfns probably +> +mapped to different mfns, thus if guest asks device to DMA to these +> +GPA, qemu may DMA to stale MFNs. +> +> +So I think in xen_invalidate_map_cache() linked lists should also be +> +checked and invalidated. +> +> +Whatâs your opinion? Is this a bug? Is my analyze correct? +Added Jun Nakajima and Alexander Graf + +On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +> Hi, +> +> +> +> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +> +> instead of first level entry (if map to rom other than guest memory +> +> comes first), while in xen_invalidate_map_cache(), when VM ballooned +> +> out memory, qemu did not invalidate cache entries in linked +> +> list(entry->next), so when VM balloon back in memory, gfns probably +> +> mapped to different mfns, thus if guest asks device to DMA to these +> +> GPA, qemu may DMA to stale MFNs. +> +> +> +> So I think in xen_invalidate_map_cache() linked lists should also be +> +> checked and invalidated. +> +> +> +> Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> +Added Jun Nakajima and Alexander Graf +And correct Stefano Stabellini's email address. + +On Mon, 10 Apr 2017 00:36:02 +0800 +hrg <address@hidden> wrote: + +Hi, + +> +On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +> On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +>> Hi, +> +>> +> +>> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +> +>> instead of first level entry (if map to rom other than guest memory +> +>> comes first), while in xen_invalidate_map_cache(), when VM ballooned +> +>> out memory, qemu did not invalidate cache entries in linked +> +>> list(entry->next), so when VM balloon back in memory, gfns probably +> +>> mapped to different mfns, thus if guest asks device to DMA to these +> +>> GPA, qemu may DMA to stale MFNs. +> +>> +> +>> So I think in xen_invalidate_map_cache() linked lists should also be +> +>> checked and invalidated. +> +>> +> +>> Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> +> +> Added Jun Nakajima and Alexander Graf +> +And correct Stefano Stabellini's email address. +There is a real issue with the xen-mapcache corruption in fact. I encountered +it a few months ago while experimenting with Q35 support on Xen. Q35 emulation +uses an AHCI controller by default, along with NCQ mode enabled. The issue can +be (somewhat) easily reproduced there, though using a normal i440 emulation +might possibly allow to reproduce the issue as well, using a dedicated test +code from a guest side. In case of Q35+NCQ the issue can be reproduced "as is". + +The issue occurs when a guest domain performs an intensive disk I/O, ex. while +guest OS booting. QEMU crashes with "Bad ram offset 980aa000" +message logged, where the address is different each time. The hard thing with +this issue is that it has a very low reproducibility rate. + +The corruption happens when there are multiple I/O commands in the NCQ queue. +So there are overlapping emulated DMA operations in flight and QEMU uses a +sequence of mapcache actions which can be executed in the "wrong" order thus +leading to an inconsistent xen-mapcache - so a bad address from the wrong +entry is returned. + +The bad thing with this issue is that QEMU crash due to "Bad ram offset" +appearance is a relatively good situation in the sense that this is a caught +error. But there might be a much worse (artificial) situation where the returned +address looks valid but points to a different mapped memory. + +The fix itself is not hard (ex. an additional checked field in MapCacheEntry), +but there is a need of some reliable way to test it considering the low +reproducibility rate. + +Regards, +Alex + +On Mon, 10 Apr 2017, hrg wrote: +> +On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +> On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +>> Hi, +> +>> +> +>> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +> +>> instead of first level entry (if map to rom other than guest memory +> +>> comes first), while in xen_invalidate_map_cache(), when VM ballooned +> +>> out memory, qemu did not invalidate cache entries in linked +> +>> list(entry->next), so when VM balloon back in memory, gfns probably +> +>> mapped to different mfns, thus if guest asks device to DMA to these +> +>> GPA, qemu may DMA to stale MFNs. +> +>> +> +>> So I think in xen_invalidate_map_cache() linked lists should also be +> +>> checked and invalidated. +> +>> +> +>> Whatâs your opinion? Is this a bug? Is my analyze correct? +Yes, you are right. We need to go through the list for each element of +the array in xen_invalidate_map_cache. Can you come up with a patch? + +On Mon, 10 Apr 2017, Stefano Stabellini wrote: +> +On Mon, 10 Apr 2017, hrg wrote: +> +> On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +> > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +> >> Hi, +> +> >> +> +> >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +> +> >> instead of first level entry (if map to rom other than guest memory +> +> >> comes first), while in xen_invalidate_map_cache(), when VM ballooned +> +> >> out memory, qemu did not invalidate cache entries in linked +> +> >> list(entry->next), so when VM balloon back in memory, gfns probably +> +> >> mapped to different mfns, thus if guest asks device to DMA to these +> +> >> GPA, qemu may DMA to stale MFNs. +> +> >> +> +> >> So I think in xen_invalidate_map_cache() linked lists should also be +> +> >> checked and invalidated. +> +> >> +> +> >> Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> +Yes, you are right. We need to go through the list for each element of +> +the array in xen_invalidate_map_cache. Can you come up with a patch? +I spoke too soon. In the regular case there should be no locked mappings +when xen_invalidate_map_cache is called (see the DPRINTF warning at the +beginning of the functions). Without locked mappings, there should never +be more than one element in each list (see xen_map_cache_unlocked: +entry->lock == true is a necessary condition to append a new entry to +the list, otherwise it is just remapped). + +Can you confirm that what you are seeing are locked mappings +when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +by turning it into a printf or by defininig MAPCACHE_DEBUG. + +On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +<address@hidden> wrote: +> +On Mon, 10 Apr 2017, Stefano Stabellini wrote: +> +> On Mon, 10 Apr 2017, hrg wrote: +> +> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +> > >> Hi, +> +> > >> +> +> > >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +> +> > >> instead of first level entry (if map to rom other than guest memory +> +> > >> comes first), while in xen_invalidate_map_cache(), when VM ballooned +> +> > >> out memory, qemu did not invalidate cache entries in linked +> +> > >> list(entry->next), so when VM balloon back in memory, gfns probably +> +> > >> mapped to different mfns, thus if guest asks device to DMA to these +> +> > >> GPA, qemu may DMA to stale MFNs. +> +> > >> +> +> > >> So I think in xen_invalidate_map_cache() linked lists should also be +> +> > >> checked and invalidated. +> +> > >> +> +> > >> Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> +> +> Yes, you are right. We need to go through the list for each element of +> +> the array in xen_invalidate_map_cache. Can you come up with a patch? +> +> +I spoke too soon. In the regular case there should be no locked mappings +> +when xen_invalidate_map_cache is called (see the DPRINTF warning at the +> +beginning of the functions). Without locked mappings, there should never +> +be more than one element in each list (see xen_map_cache_unlocked: +> +entry->lock == true is a necessary condition to append a new entry to +> +the list, otherwise it is just remapped). +> +> +Can you confirm that what you are seeing are locked mappings +> +when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +> +by turning it into a printf or by defininig MAPCACHE_DEBUG. +In fact, I think the DPRINTF above is incorrect too. In +pci_add_option_rom(), rtl8139 rom is locked mapped in +pci_add_option_rom->memory_region_get_ram_ptr (after +memory_region_init_ram). So actually I think we should remove the +DPRINTF warning as it is normal. + +On Tue, 11 Apr 2017, hrg wrote: +> +On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +> +<address@hidden> wrote: +> +> On Mon, 10 Apr 2017, Stefano Stabellini wrote: +> +>> On Mon, 10 Apr 2017, hrg wrote: +> +>> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +>> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +>> > >> Hi, +> +>> > >> +> +>> > >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +> +>> > >> instead of first level entry (if map to rom other than guest memory +> +>> > >> comes first), while in xen_invalidate_map_cache(), when VM ballooned +> +>> > >> out memory, qemu did not invalidate cache entries in linked +> +>> > >> list(entry->next), so when VM balloon back in memory, gfns probably +> +>> > >> mapped to different mfns, thus if guest asks device to DMA to these +> +>> > >> GPA, qemu may DMA to stale MFNs. +> +>> > >> +> +>> > >> So I think in xen_invalidate_map_cache() linked lists should also be +> +>> > >> checked and invalidated. +> +>> > >> +> +>> > >> Whatâs your opinion? Is this a bug? Is my analyze correct? +> +>> +> +>> Yes, you are right. We need to go through the list for each element of +> +>> the array in xen_invalidate_map_cache. Can you come up with a patch? +> +> +> +> I spoke too soon. In the regular case there should be no locked mappings +> +> when xen_invalidate_map_cache is called (see the DPRINTF warning at the +> +> beginning of the functions). Without locked mappings, there should never +> +> be more than one element in each list (see xen_map_cache_unlocked: +> +> entry->lock == true is a necessary condition to append a new entry to +> +> the list, otherwise it is just remapped). +> +> +> +> Can you confirm that what you are seeing are locked mappings +> +> when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +> +> by turning it into a printf or by defininig MAPCACHE_DEBUG. +> +> +In fact, I think the DPRINTF above is incorrect too. In +> +pci_add_option_rom(), rtl8139 rom is locked mapped in +> +pci_add_option_rom->memory_region_get_ram_ptr (after +> +memory_region_init_ram). So actually I think we should remove the +> +DPRINTF warning as it is normal. +Let me explain why the DPRINTF warning is there: emulated dma operations +can involve locked mappings. Once a dma operation completes, the related +mapping is unlocked and can be safely destroyed. But if we destroy a +locked mapping in xen_invalidate_map_cache, while a dma is still +ongoing, QEMU will crash. We cannot handle that case. + +However, the scenario you described is different. It has nothing to do +with DMA. It looks like pci_add_option_rom calls +memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a +locked mapping and it is never unlocked or destroyed. + +It looks like "ptr" is not used after pci_add_option_rom returns. Does +the append patch fix the problem you are seeing? For the proper fix, I +think we probably need some sort of memory_region_unmap wrapper or maybe +a call to address_space_unmap. + + +diff --git a/hw/pci/pci.c b/hw/pci/pci.c +index e6b08e1..04f98b7 100644 +--- a/hw/pci/pci.c ++++ b/hw/pci/pci.c +@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool +is_default_rom, + } + + pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); ++ xen_invalidate_map_cache_entry(ptr); + } + + static void pci_del_option_rom(PCIDevice *pdev) + +On Tue, 11 Apr 2017 15:32:09 -0700 (PDT) +Stefano Stabellini <address@hidden> wrote: + +> +On Tue, 11 Apr 2017, hrg wrote: +> +> On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +> +> <address@hidden> wrote: +> +> > On Mon, 10 Apr 2017, Stefano Stabellini wrote: +> +> >> On Mon, 10 Apr 2017, hrg wrote: +> +> >> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +> >> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +> >> > >> Hi, +> +> >> > >> +> +> >> > >> In xen_map_cache_unlocked(), map to guest memory maybe in +> +> >> > >> entry->next instead of first level entry (if map to rom other than +> +> >> > >> guest memory comes first), while in xen_invalidate_map_cache(), +> +> >> > >> when VM ballooned out memory, qemu did not invalidate cache entries +> +> >> > >> in linked list(entry->next), so when VM balloon back in memory, +> +> >> > >> gfns probably mapped to different mfns, thus if guest asks device +> +> >> > >> to DMA to these GPA, qemu may DMA to stale MFNs. +> +> >> > >> +> +> >> > >> So I think in xen_invalidate_map_cache() linked lists should also be +> +> >> > >> checked and invalidated. +> +> >> > >> +> +> >> > >> Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> >> +> +> >> Yes, you are right. We need to go through the list for each element of +> +> >> the array in xen_invalidate_map_cache. Can you come up with a patch? +> +> > +> +> > I spoke too soon. In the regular case there should be no locked mappings +> +> > when xen_invalidate_map_cache is called (see the DPRINTF warning at the +> +> > beginning of the functions). Without locked mappings, there should never +> +> > be more than one element in each list (see xen_map_cache_unlocked: +> +> > entry->lock == true is a necessary condition to append a new entry to +> +> > the list, otherwise it is just remapped). +> +> > +> +> > Can you confirm that what you are seeing are locked mappings +> +> > when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +> +> > by turning it into a printf or by defininig MAPCACHE_DEBUG. +> +> +> +> In fact, I think the DPRINTF above is incorrect too. In +> +> pci_add_option_rom(), rtl8139 rom is locked mapped in +> +> pci_add_option_rom->memory_region_get_ram_ptr (after +> +> memory_region_init_ram). So actually I think we should remove the +> +> DPRINTF warning as it is normal. +> +> +Let me explain why the DPRINTF warning is there: emulated dma operations +> +can involve locked mappings. Once a dma operation completes, the related +> +mapping is unlocked and can be safely destroyed. But if we destroy a +> +locked mapping in xen_invalidate_map_cache, while a dma is still +> +ongoing, QEMU will crash. We cannot handle that case. +> +> +However, the scenario you described is different. It has nothing to do +> +with DMA. It looks like pci_add_option_rom calls +> +memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a +> +locked mapping and it is never unlocked or destroyed. +> +> +It looks like "ptr" is not used after pci_add_option_rom returns. Does +> +the append patch fix the problem you are seeing? For the proper fix, I +> +think we probably need some sort of memory_region_unmap wrapper or maybe +> +a call to address_space_unmap. +Hmm, for some reason my message to the Xen-devel list got rejected but was sent +to Qemu-devel instead, without any notice. Sorry if I'm missing something +obvious as a list newbie. + +Stefano, hrg, + +There is an issue with inconsistency between the list of normal MapCacheEntry's +and their 'reverse' counterparts - MapCacheRev's in locked_entries. +When bad situation happens, there are multiple (locked) MapCacheEntry +entries in the bucket's linked list along with a number of MapCacheRev's. And +when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the +first list and calculates a wrong pointer from it which may then be caught with +the "Bad RAM offset" check (or not). Mapcache invalidation might be related to +this issue as well I think. + +I'll try to provide a test code which can reproduce the issue from the +guest side using an emulated IDE controller, though it's much simpler to achieve +this result with an AHCI controller using multiple NCQ I/O commands. So far I've +seen this issue only with Windows 7 (and above) guest on AHCI, but any block I/O +DMA should be enough I think. + +On 2017/4/12 14:17, Alexey G wrote: +On Tue, 11 Apr 2017 15:32:09 -0700 (PDT) +Stefano Stabellini <address@hidden> wrote: +On Tue, 11 Apr 2017, hrg wrote: +On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +<address@hidden> wrote: +On Mon, 10 Apr 2017, Stefano Stabellini wrote: +On Mon, 10 Apr 2017, hrg wrote: +On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +Hi, + +In xen_map_cache_unlocked(), map to guest memory maybe in +entry->next instead of first level entry (if map to rom other than +guest memory comes first), while in xen_invalidate_map_cache(), +when VM ballooned out memory, qemu did not invalidate cache entries +in linked list(entry->next), so when VM balloon back in memory, +gfns probably mapped to different mfns, thus if guest asks device +to DMA to these GPA, qemu may DMA to stale MFNs. + +So I think in xen_invalidate_map_cache() linked lists should also be +checked and invalidated. + +Whatâs your opinion? Is this a bug? Is my analyze correct? +Yes, you are right. We need to go through the list for each element of +the array in xen_invalidate_map_cache. Can you come up with a patch? +I spoke too soon. In the regular case there should be no locked mappings +when xen_invalidate_map_cache is called (see the DPRINTF warning at the +beginning of the functions). Without locked mappings, there should never +be more than one element in each list (see xen_map_cache_unlocked: +entry->lock == true is a necessary condition to append a new entry to +the list, otherwise it is just remapped). + +Can you confirm that what you are seeing are locked mappings +when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +by turning it into a printf or by defininig MAPCACHE_DEBUG. +In fact, I think the DPRINTF above is incorrect too. In +pci_add_option_rom(), rtl8139 rom is locked mapped in +pci_add_option_rom->memory_region_get_ram_ptr (after +memory_region_init_ram). So actually I think we should remove the +DPRINTF warning as it is normal. +Let me explain why the DPRINTF warning is there: emulated dma operations +can involve locked mappings. Once a dma operation completes, the related +mapping is unlocked and can be safely destroyed. But if we destroy a +locked mapping in xen_invalidate_map_cache, while a dma is still +ongoing, QEMU will crash. We cannot handle that case. + +However, the scenario you described is different. It has nothing to do +with DMA. It looks like pci_add_option_rom calls +memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a +locked mapping and it is never unlocked or destroyed. + +It looks like "ptr" is not used after pci_add_option_rom returns. Does +the append patch fix the problem you are seeing? For the proper fix, I +think we probably need some sort of memory_region_unmap wrapper or maybe +a call to address_space_unmap. +Hmm, for some reason my message to the Xen-devel list got rejected but was sent +to Qemu-devel instead, without any notice. Sorry if I'm missing something +obvious as a list newbie. + +Stefano, hrg, + +There is an issue with inconsistency between the list of normal MapCacheEntry's +and their 'reverse' counterparts - MapCacheRev's in locked_entries. +When bad situation happens, there are multiple (locked) MapCacheEntry +entries in the bucket's linked list along with a number of MapCacheRev's. And +when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the +first list and calculates a wrong pointer from it which may then be caught with +the "Bad RAM offset" check (or not). Mapcache invalidation might be related to +this issue as well I think. + +I'll try to provide a test code which can reproduce the issue from the +guest side using an emulated IDE controller, though it's much simpler to achieve +this result with an AHCI controller using multiple NCQ I/O commands. So far I've +seen this issue only with Windows 7 (and above) guest on AHCI, but any block I/O +DMA should be enough I think. +Yes, I think there may be other bugs lurking, considering the complexity, +though we need to reproduce it if we want to delve into it. + +On Wed, 12 Apr 2017, Alexey G wrote: +> +On Tue, 11 Apr 2017 15:32:09 -0700 (PDT) +> +Stefano Stabellini <address@hidden> wrote: +> +> +> On Tue, 11 Apr 2017, hrg wrote: +> +> > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +> +> > <address@hidden> wrote: +> +> > > On Mon, 10 Apr 2017, Stefano Stabellini wrote: +> +> > >> On Mon, 10 Apr 2017, hrg wrote: +> +> > >> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +> > >> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +> > >> > >> Hi, +> +> > >> > >> +> +> > >> > >> In xen_map_cache_unlocked(), map to guest memory maybe in +> +> > >> > >> entry->next instead of first level entry (if map to rom other than +> +> > >> > >> guest memory comes first), while in xen_invalidate_map_cache(), +> +> > >> > >> when VM ballooned out memory, qemu did not invalidate cache +> +> > >> > >> entries +> +> > >> > >> in linked list(entry->next), so when VM balloon back in memory, +> +> > >> > >> gfns probably mapped to different mfns, thus if guest asks device +> +> > >> > >> to DMA to these GPA, qemu may DMA to stale MFNs. +> +> > >> > >> +> +> > >> > >> So I think in xen_invalidate_map_cache() linked lists should also +> +> > >> > >> be +> +> > >> > >> checked and invalidated. +> +> > >> > >> +> +> > >> > >> Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> > >> +> +> > >> Yes, you are right. We need to go through the list for each element of +> +> > >> the array in xen_invalidate_map_cache. Can you come up with a patch? +> +> > > +> +> > > I spoke too soon. In the regular case there should be no locked mappings +> +> > > when xen_invalidate_map_cache is called (see the DPRINTF warning at the +> +> > > beginning of the functions). Without locked mappings, there should never +> +> > > be more than one element in each list (see xen_map_cache_unlocked: +> +> > > entry->lock == true is a necessary condition to append a new entry to +> +> > > the list, otherwise it is just remapped). +> +> > > +> +> > > Can you confirm that what you are seeing are locked mappings +> +> > > when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +> +> > > by turning it into a printf or by defininig MAPCACHE_DEBUG. +> +> > +> +> > In fact, I think the DPRINTF above is incorrect too. In +> +> > pci_add_option_rom(), rtl8139 rom is locked mapped in +> +> > pci_add_option_rom->memory_region_get_ram_ptr (after +> +> > memory_region_init_ram). So actually I think we should remove the +> +> > DPRINTF warning as it is normal. +> +> +> +> Let me explain why the DPRINTF warning is there: emulated dma operations +> +> can involve locked mappings. Once a dma operation completes, the related +> +> mapping is unlocked and can be safely destroyed. But if we destroy a +> +> locked mapping in xen_invalidate_map_cache, while a dma is still +> +> ongoing, QEMU will crash. We cannot handle that case. +> +> +> +> However, the scenario you described is different. It has nothing to do +> +> with DMA. It looks like pci_add_option_rom calls +> +> memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a +> +> locked mapping and it is never unlocked or destroyed. +> +> +> +> It looks like "ptr" is not used after pci_add_option_rom returns. Does +> +> the append patch fix the problem you are seeing? For the proper fix, I +> +> think we probably need some sort of memory_region_unmap wrapper or maybe +> +> a call to address_space_unmap. +> +> +Hmm, for some reason my message to the Xen-devel list got rejected but was +> +sent +> +to Qemu-devel instead, without any notice. Sorry if I'm missing something +> +obvious as a list newbie. +> +> +Stefano, hrg, +> +> +There is an issue with inconsistency between the list of normal +> +MapCacheEntry's +> +and their 'reverse' counterparts - MapCacheRev's in locked_entries. +> +When bad situation happens, there are multiple (locked) MapCacheEntry +> +entries in the bucket's linked list along with a number of MapCacheRev's. And +> +when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the +> +first list and calculates a wrong pointer from it which may then be caught +> +with +> +the "Bad RAM offset" check (or not). Mapcache invalidation might be related to +> +this issue as well I think. +> +> +I'll try to provide a test code which can reproduce the issue from the +> +guest side using an emulated IDE controller, though it's much simpler to +> +achieve +> +this result with an AHCI controller using multiple NCQ I/O commands. So far +> +I've +> +seen this issue only with Windows 7 (and above) guest on AHCI, but any block +> +I/O +> +DMA should be enough I think. +That would be helpful. Please see if you can reproduce it after fixing +the other issue ( +http://marc.info/?l=qemu-devel&m=149195042500707&w=2 +). + +On 2017/4/12 6:32, Stefano Stabellini wrote: +On Tue, 11 Apr 2017, hrg wrote: +On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +<address@hidden> wrote: +On Mon, 10 Apr 2017, Stefano Stabellini wrote: +On Mon, 10 Apr 2017, hrg wrote: +On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +Hi, + +In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +instead of first level entry (if map to rom other than guest memory +comes first), while in xen_invalidate_map_cache(), when VM ballooned +out memory, qemu did not invalidate cache entries in linked +list(entry->next), so when VM balloon back in memory, gfns probably +mapped to different mfns, thus if guest asks device to DMA to these +GPA, qemu may DMA to stale MFNs. + +So I think in xen_invalidate_map_cache() linked lists should also be +checked and invalidated. + +Whatâs your opinion? Is this a bug? Is my analyze correct? +Yes, you are right. We need to go through the list for each element of +the array in xen_invalidate_map_cache. Can you come up with a patch? +I spoke too soon. In the regular case there should be no locked mappings +when xen_invalidate_map_cache is called (see the DPRINTF warning at the +beginning of the functions). Without locked mappings, there should never +be more than one element in each list (see xen_map_cache_unlocked: +entry->lock == true is a necessary condition to append a new entry to +the list, otherwise it is just remapped). + +Can you confirm that what you are seeing are locked mappings +when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +by turning it into a printf or by defininig MAPCACHE_DEBUG. +In fact, I think the DPRINTF above is incorrect too. In +pci_add_option_rom(), rtl8139 rom is locked mapped in +pci_add_option_rom->memory_region_get_ram_ptr (after +memory_region_init_ram). So actually I think we should remove the +DPRINTF warning as it is normal. +Let me explain why the DPRINTF warning is there: emulated dma operations +can involve locked mappings. Once a dma operation completes, the related +mapping is unlocked and can be safely destroyed. But if we destroy a +locked mapping in xen_invalidate_map_cache, while a dma is still +ongoing, QEMU will crash. We cannot handle that case. + +However, the scenario you described is different. It has nothing to do +with DMA. It looks like pci_add_option_rom calls +memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a +locked mapping and it is never unlocked or destroyed. + +It looks like "ptr" is not used after pci_add_option_rom returns. Does +the append patch fix the problem you are seeing? For the proper fix, I +think we probably need some sort of memory_region_unmap wrapper or maybe +a call to address_space_unmap. +Yes, I think so, maybe this is the proper way to fix this. +diff --git a/hw/pci/pci.c b/hw/pci/pci.c +index e6b08e1..04f98b7 100644 +--- a/hw/pci/pci.c ++++ b/hw/pci/pci.c +@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool +is_default_rom, + } +pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); ++ xen_invalidate_map_cache_entry(ptr); + } +static void pci_del_option_rom(PCIDevice *pdev) + +On Wed, 12 Apr 2017, Herongguang (Stephen) wrote: +> +On 2017/4/12 6:32, Stefano Stabellini wrote: +> +> On Tue, 11 Apr 2017, hrg wrote: +> +> > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +> +> > <address@hidden> wrote: +> +> > > On Mon, 10 Apr 2017, Stefano Stabellini wrote: +> +> > > > On Mon, 10 Apr 2017, hrg wrote: +> +> > > > > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +> > > > > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +> > > > > > > Hi, +> +> > > > > > > +> +> > > > > > > In xen_map_cache_unlocked(), map to guest memory maybe in +> +> > > > > > > entry->next +> +> > > > > > > instead of first level entry (if map to rom other than guest +> +> > > > > > > memory +> +> > > > > > > comes first), while in xen_invalidate_map_cache(), when VM +> +> > > > > > > ballooned +> +> > > > > > > out memory, qemu did not invalidate cache entries in linked +> +> > > > > > > list(entry->next), so when VM balloon back in memory, gfns +> +> > > > > > > probably +> +> > > > > > > mapped to different mfns, thus if guest asks device to DMA to +> +> > > > > > > these +> +> > > > > > > GPA, qemu may DMA to stale MFNs. +> +> > > > > > > +> +> > > > > > > So I think in xen_invalidate_map_cache() linked lists should +> +> > > > > > > also be +> +> > > > > > > checked and invalidated. +> +> > > > > > > +> +> > > > > > > Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> > > > Yes, you are right. We need to go through the list for each element of +> +> > > > the array in xen_invalidate_map_cache. Can you come up with a patch? +> +> > > I spoke too soon. In the regular case there should be no locked mappings +> +> > > when xen_invalidate_map_cache is called (see the DPRINTF warning at the +> +> > > beginning of the functions). Without locked mappings, there should never +> +> > > be more than one element in each list (see xen_map_cache_unlocked: +> +> > > entry->lock == true is a necessary condition to append a new entry to +> +> > > the list, otherwise it is just remapped). +> +> > > +> +> > > Can you confirm that what you are seeing are locked mappings +> +> > > when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +> +> > > by turning it into a printf or by defininig MAPCACHE_DEBUG. +> +> > In fact, I think the DPRINTF above is incorrect too. In +> +> > pci_add_option_rom(), rtl8139 rom is locked mapped in +> +> > pci_add_option_rom->memory_region_get_ram_ptr (after +> +> > memory_region_init_ram). So actually I think we should remove the +> +> > DPRINTF warning as it is normal. +> +> Let me explain why the DPRINTF warning is there: emulated dma operations +> +> can involve locked mappings. Once a dma operation completes, the related +> +> mapping is unlocked and can be safely destroyed. But if we destroy a +> +> locked mapping in xen_invalidate_map_cache, while a dma is still +> +> ongoing, QEMU will crash. We cannot handle that case. +> +> +> +> However, the scenario you described is different. It has nothing to do +> +> with DMA. It looks like pci_add_option_rom calls +> +> memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a +> +> locked mapping and it is never unlocked or destroyed. +> +> +> +> It looks like "ptr" is not used after pci_add_option_rom returns. Does +> +> the append patch fix the problem you are seeing? For the proper fix, I +> +> think we probably need some sort of memory_region_unmap wrapper or maybe +> +> a call to address_space_unmap. +> +> +Yes, I think so, maybe this is the proper way to fix this. +Would you be up for sending a proper patch and testing it? We cannot call +xen_invalidate_map_cache_entry directly from pci.c though, it would need +to be one of the other functions like address_space_unmap for example. + + +> +> diff --git a/hw/pci/pci.c b/hw/pci/pci.c +> +> index e6b08e1..04f98b7 100644 +> +> --- a/hw/pci/pci.c +> +> +++ b/hw/pci/pci.c +> +> @@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool +> +> is_default_rom, +> +> } +> +> pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); +> +> + xen_invalidate_map_cache_entry(ptr); +> +> } +> +> static void pci_del_option_rom(PCIDevice *pdev) + +On 2017/4/13 7:51, Stefano Stabellini wrote: +On Wed, 12 Apr 2017, Herongguang (Stephen) wrote: +On 2017/4/12 6:32, Stefano Stabellini wrote: +On Tue, 11 Apr 2017, hrg wrote: +On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +<address@hidden> wrote: +On Mon, 10 Apr 2017, Stefano Stabellini wrote: +On Mon, 10 Apr 2017, hrg wrote: +On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +Hi, + +In xen_map_cache_unlocked(), map to guest memory maybe in +entry->next +instead of first level entry (if map to rom other than guest +memory +comes first), while in xen_invalidate_map_cache(), when VM +ballooned +out memory, qemu did not invalidate cache entries in linked +list(entry->next), so when VM balloon back in memory, gfns +probably +mapped to different mfns, thus if guest asks device to DMA to +these +GPA, qemu may DMA to stale MFNs. + +So I think in xen_invalidate_map_cache() linked lists should +also be +checked and invalidated. + +Whatâs your opinion? Is this a bug? Is my analyze correct? +Yes, you are right. We need to go through the list for each element of +the array in xen_invalidate_map_cache. Can you come up with a patch? +I spoke too soon. In the regular case there should be no locked mappings +when xen_invalidate_map_cache is called (see the DPRINTF warning at the +beginning of the functions). Without locked mappings, there should never +be more than one element in each list (see xen_map_cache_unlocked: +entry->lock == true is a necessary condition to append a new entry to +the list, otherwise it is just remapped). + +Can you confirm that what you are seeing are locked mappings +when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +by turning it into a printf or by defininig MAPCACHE_DEBUG. +In fact, I think the DPRINTF above is incorrect too. In +pci_add_option_rom(), rtl8139 rom is locked mapped in +pci_add_option_rom->memory_region_get_ram_ptr (after +memory_region_init_ram). So actually I think we should remove the +DPRINTF warning as it is normal. +Let me explain why the DPRINTF warning is there: emulated dma operations +can involve locked mappings. Once a dma operation completes, the related +mapping is unlocked and can be safely destroyed. But if we destroy a +locked mapping in xen_invalidate_map_cache, while a dma is still +ongoing, QEMU will crash. We cannot handle that case. + +However, the scenario you described is different. It has nothing to do +with DMA. It looks like pci_add_option_rom calls +memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a +locked mapping and it is never unlocked or destroyed. + +It looks like "ptr" is not used after pci_add_option_rom returns. Does +the append patch fix the problem you are seeing? For the proper fix, I +think we probably need some sort of memory_region_unmap wrapper or maybe +a call to address_space_unmap. +Yes, I think so, maybe this is the proper way to fix this. +Would you be up for sending a proper patch and testing it? We cannot call +xen_invalidate_map_cache_entry directly from pci.c though, it would need +to be one of the other functions like address_space_unmap for example. +Yes, I will look into this. +diff --git a/hw/pci/pci.c b/hw/pci/pci.c +index e6b08e1..04f98b7 100644 +--- a/hw/pci/pci.c ++++ b/hw/pci/pci.c +@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool +is_default_rom, + } + pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); ++ xen_invalidate_map_cache_entry(ptr); + } + static void pci_del_option_rom(PCIDevice *pdev) + +On Thu, 13 Apr 2017, Herongguang (Stephen) wrote: +> +On 2017/4/13 7:51, Stefano Stabellini wrote: +> +> On Wed, 12 Apr 2017, Herongguang (Stephen) wrote: +> +> > On 2017/4/12 6:32, Stefano Stabellini wrote: +> +> > > On Tue, 11 Apr 2017, hrg wrote: +> +> > > > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +> +> > > > <address@hidden> wrote: +> +> > > > > On Mon, 10 Apr 2017, Stefano Stabellini wrote: +> +> > > > > > On Mon, 10 Apr 2017, hrg wrote: +> +> > > > > > > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> +> +> > > > > > > wrote: +> +> > > > > > > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> +> +> > > > > > > > wrote: +> +> > > > > > > > > Hi, +> +> > > > > > > > > +> +> > > > > > > > > In xen_map_cache_unlocked(), map to guest memory maybe in +> +> > > > > > > > > entry->next +> +> > > > > > > > > instead of first level entry (if map to rom other than guest +> +> > > > > > > > > memory +> +> > > > > > > > > comes first), while in xen_invalidate_map_cache(), when VM +> +> > > > > > > > > ballooned +> +> > > > > > > > > out memory, qemu did not invalidate cache entries in linked +> +> > > > > > > > > list(entry->next), so when VM balloon back in memory, gfns +> +> > > > > > > > > probably +> +> > > > > > > > > mapped to different mfns, thus if guest asks device to DMA +> +> > > > > > > > > to +> +> > > > > > > > > these +> +> > > > > > > > > GPA, qemu may DMA to stale MFNs. +> +> > > > > > > > > +> +> > > > > > > > > So I think in xen_invalidate_map_cache() linked lists should +> +> > > > > > > > > also be +> +> > > > > > > > > checked and invalidated. +> +> > > > > > > > > +> +> > > > > > > > > Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> > > > > > Yes, you are right. We need to go through the list for each +> +> > > > > > element of +> +> > > > > > the array in xen_invalidate_map_cache. Can you come up with a +> +> > > > > > patch? +> +> > > > > I spoke too soon. In the regular case there should be no locked +> +> > > > > mappings +> +> > > > > when xen_invalidate_map_cache is called (see the DPRINTF warning at +> +> > > > > the +> +> > > > > beginning of the functions). Without locked mappings, there should +> +> > > > > never +> +> > > > > be more than one element in each list (see xen_map_cache_unlocked: +> +> > > > > entry->lock == true is a necessary condition to append a new entry +> +> > > > > to +> +> > > > > the list, otherwise it is just remapped). +> +> > > > > +> +> > > > > Can you confirm that what you are seeing are locked mappings +> +> > > > > when xen_invalidate_map_cache is called? To find out, enable the +> +> > > > > DPRINTK +> +> > > > > by turning it into a printf or by defininig MAPCACHE_DEBUG. +> +> > > > In fact, I think the DPRINTF above is incorrect too. In +> +> > > > pci_add_option_rom(), rtl8139 rom is locked mapped in +> +> > > > pci_add_option_rom->memory_region_get_ram_ptr (after +> +> > > > memory_region_init_ram). So actually I think we should remove the +> +> > > > DPRINTF warning as it is normal. +> +> > > Let me explain why the DPRINTF warning is there: emulated dma operations +> +> > > can involve locked mappings. Once a dma operation completes, the related +> +> > > mapping is unlocked and can be safely destroyed. But if we destroy a +> +> > > locked mapping in xen_invalidate_map_cache, while a dma is still +> +> > > ongoing, QEMU will crash. We cannot handle that case. +> +> > > +> +> > > However, the scenario you described is different. It has nothing to do +> +> > > with DMA. It looks like pci_add_option_rom calls +> +> > > memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a +> +> > > locked mapping and it is never unlocked or destroyed. +> +> > > +> +> > > It looks like "ptr" is not used after pci_add_option_rom returns. Does +> +> > > the append patch fix the problem you are seeing? For the proper fix, I +> +> > > think we probably need some sort of memory_region_unmap wrapper or maybe +> +> > > a call to address_space_unmap. +> +> > +> +> > Yes, I think so, maybe this is the proper way to fix this. +> +> +> +> Would you be up for sending a proper patch and testing it? We cannot call +> +> xen_invalidate_map_cache_entry directly from pci.c though, it would need +> +> to be one of the other functions like address_space_unmap for example. +> +> +> +> +> +Yes, I will look into this. +Any updates? + + +> +> > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c +> +> > > index e6b08e1..04f98b7 100644 +> +> > > --- a/hw/pci/pci.c +> +> > > +++ b/hw/pci/pci.c +> +> > > @@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, +> +> > > bool +> +> > > is_default_rom, +> +> > > } +> +> > > pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); +> +> > > + xen_invalidate_map_cache_entry(ptr); +> +> > > } +> +> > > static void pci_del_option_rom(PCIDevice *pdev) +> + diff --git a/results/classifier/zero-shot/012/permissions/57231878 b/results/classifier/zero-shot/012/permissions/57231878 new file mode 100644 index 000000000..c192129ee --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/57231878 @@ -0,0 +1,260 @@ +permissions: 0.856 +device: 0.818 +other: 0.788 +register: 0.783 +semantic: 0.774 +graphic: 0.751 +assembly: 0.745 +debug: 0.732 +mistranslation: 0.719 +arm: 0.713 +x86: 0.713 +TCG: 0.689 +PID: 0.684 +architecture: 0.678 +network: 0.659 +performance: 0.644 +vnc: 0.640 +socket: 0.624 +risc-v: 0.611 +boot: 0.609 +files: 0.587 +kernel virtual machine: 0.584 + +[Qemu-devel] [BUG] qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed. + +Hello all, +I wanted to submit a bug report in the tracker, but it seem to require +an Ubuntu One account, which I'm having trouble with, so I'll just +give it here and hopefully somebody can make use of it. The issue +seems to be in an experimental format, so it's likely not very +consequential anyway. + +For the sake of anyone else simply googling for a workaround, I'll +just paste in the (cleaned up) brief IRC conversation about my issue +from the official channel: +<quy> I'm using QEMU version 2.12.0 on an x86_64 host (Arch Linux, +Kernel v4.17.2), and I'm trying to create an x86_64 virtual machine +(FreeBSD-11.1). The VM always aborts at the same point in the +installation (downloading 'ports.tgz') with the following error +message: +"qemu-system-x86_64: /build/qemu/src/qemu-2.12.0/block/qed.c:1197: +qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed. +zsh: abort (core dumped) qemu-system-x86_64 -smp 2 -m 4096 +-enable-kvm -hda freebsd/freebsd.qed -devic" +The commands I ran to create the machine are as follows: +"qemu-img create -f qed freebsd/freebsd.qed 16G" +"qemu-system-x86_64 -smp 2 -m 4096 -enable-kvm -hda +freebsd/freebsd.qed -device e1000,netdev=net0 -netdev user,id=net0 +-cdrom FreeBSD-11.1-RELEASE-amd64-bootonly.iso -boot order=d" +I tried adding logging options with the -d flag, but I didn't get +anything that seemed relevant, since I'm not sure what to look for. +<stsquad> ohh what's a qed device? +<stsquad> quy: it might be a workaround to use a qcow2 image for now +<stsquad> ahh the wiki has a statement "It is not recommended to use +QED for any new images. " +<danpb> 'qed' was an experimental disk image format created by IBM +before qcow2 v3 came along +<danpb> honestly nothing should ever use QED these days +<danpb> the good ideas from QED became qcow2v3 +<stsquad> danpb: sounds like we should put a warning on the option to +remind users of that fact +<danpb> quy: sounds like qed driver is simply broken - please do file +a bug against qemu bug tracker +<danpb> quy: but you should also really switch to qcow2 +<quy> I see; some people need to update their wikis then. I don't +remember where which guide I read when I first learned what little +QEMU I know, but I remember it specifically remember it saying QED was +the newest and most optimal format. +<stsquad> quy: we can only be responsible for our own wiki I'm afraid... +<danpb> if you remember where you saw that please let us know so we +can try to get it fixed +<quy> Thank you very much for the info; I will switch to QCOW. +Unfortunately, I'm not sure if I will be able to file any bug reports +in the tracker as I can't seem to log Launchpad, which it seems to +require. +<danpb> quy: an email to the mailing list would suffice too if you +can't deal with launchpad +<danpb> kwolf: ^^^ in case you're interested in possible QED +assertions from 2.12 + +If any more info is needed, feel free to email me; I'm not actually +subscribed to this list though. +Thank you, +Quytelda Kahja + +CC Qemu Block; looks like QED is a bit busted. + +On 06/27/2018 10:25 AM, Quytelda Kahja wrote: +> +Hello all, +> +I wanted to submit a bug report in the tracker, but it seem to require +> +an Ubuntu One account, which I'm having trouble with, so I'll just +> +give it here and hopefully somebody can make use of it. The issue +> +seems to be in an experimental format, so it's likely not very +> +consequential anyway. +> +> +For the sake of anyone else simply googling for a workaround, I'll +> +just paste in the (cleaned up) brief IRC conversation about my issue +> +from the official channel: +> +<quy> I'm using QEMU version 2.12.0 on an x86_64 host (Arch Linux, +> +Kernel v4.17.2), and I'm trying to create an x86_64 virtual machine +> +(FreeBSD-11.1). The VM always aborts at the same point in the +> +installation (downloading 'ports.tgz') with the following error +> +message: +> +"qemu-system-x86_64: /build/qemu/src/qemu-2.12.0/block/qed.c:1197: +> +qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed. +> +zsh: abort (core dumped) qemu-system-x86_64 -smp 2 -m 4096 +> +-enable-kvm -hda freebsd/freebsd.qed -devic" +> +The commands I ran to create the machine are as follows: +> +"qemu-img create -f qed freebsd/freebsd.qed 16G" +> +"qemu-system-x86_64 -smp 2 -m 4096 -enable-kvm -hda +> +freebsd/freebsd.qed -device e1000,netdev=net0 -netdev user,id=net0 +> +-cdrom FreeBSD-11.1-RELEASE-amd64-bootonly.iso -boot order=d" +> +I tried adding logging options with the -d flag, but I didn't get +> +anything that seemed relevant, since I'm not sure what to look for. +> +<stsquad> ohh what's a qed device? +> +<stsquad> quy: it might be a workaround to use a qcow2 image for now +> +<stsquad> ahh the wiki has a statement "It is not recommended to use +> +QED for any new images. " +> +<danpb> 'qed' was an experimental disk image format created by IBM +> +before qcow2 v3 came along +> +<danpb> honestly nothing should ever use QED these days +> +<danpb> the good ideas from QED became qcow2v3 +> +<stsquad> danpb: sounds like we should put a warning on the option to +> +remind users of that fact +> +<danpb> quy: sounds like qed driver is simply broken - please do file +> +a bug against qemu bug tracker +> +<danpb> quy: but you should also really switch to qcow2 +> +<quy> I see; some people need to update their wikis then. I don't +> +remember where which guide I read when I first learned what little +> +QEMU I know, but I remember it specifically remember it saying QED was +> +the newest and most optimal format. +> +<stsquad> quy: we can only be responsible for our own wiki I'm afraid... +> +<danpb> if you remember where you saw that please let us know so we +> +can try to get it fixed +> +<quy> Thank you very much for the info; I will switch to QCOW. +> +Unfortunately, I'm not sure if I will be able to file any bug reports +> +in the tracker as I can't seem to log Launchpad, which it seems to +> +require. +> +<danpb> quy: an email to the mailing list would suffice too if you +> +can't deal with launchpad +> +<danpb> kwolf: ^^^ in case you're interested in possible QED +> +assertions from 2.12 +> +> +If any more info is needed, feel free to email me; I'm not actually +> +subscribed to this list though. +> +Thank you, +> +Quytelda Kahja +> + +On 06/29/2018 03:07 PM, John Snow wrote: +CC Qemu Block; looks like QED is a bit busted. + +On 06/27/2018 10:25 AM, Quytelda Kahja wrote: +Hello all, +I wanted to submit a bug report in the tracker, but it seem to require +an Ubuntu One account, which I'm having trouble with, so I'll just +give it here and hopefully somebody can make use of it. The issue +seems to be in an experimental format, so it's likely not very +consequential anyway. +Analysis in another thread may be relevant: +https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08963.html +-- +Eric Blake, Principal Software Engineer +Red Hat, Inc. +1-919-301-3266 +Virtualization: qemu.org | libvirt.org + +Am 29.06.2018 um 22:16 hat Eric Blake geschrieben: +> +On 06/29/2018 03:07 PM, John Snow wrote: +> +> CC Qemu Block; looks like QED is a bit busted. +> +> +> +> On 06/27/2018 10:25 AM, Quytelda Kahja wrote: +> +> > Hello all, +> +> > I wanted to submit a bug report in the tracker, but it seem to require +> +> > an Ubuntu One account, which I'm having trouble with, so I'll just +> +> > give it here and hopefully somebody can make use of it. The issue +> +> > seems to be in an experimental format, so it's likely not very +> +> > consequential anyway. +> +> +Analysis in another thread may be relevant: +> +> +https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08963.html +The assertion there was: + +qemu-system-x86_64: block.c:3434: bdrv_replace_node: Assertion +`!atomic_read(&to->in_flight)' failed. + +Which quite clearly pointed to a drain bug. This one, however, doesn't +seem to be related to drain, so I think it's probably a different bug. + +Kevin + diff --git a/results/classifier/zero-shot/012/permissions/67821138 b/results/classifier/zero-shot/012/permissions/67821138 new file mode 100644 index 000000000..1d7afe60d --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/67821138 @@ -0,0 +1,217 @@ +architecture: 0.939 +permissions: 0.935 +device: 0.916 +assembly: 0.915 +risc-v: 0.914 +PID: 0.909 +register: 0.891 +boot: 0.881 +arm: 0.872 +debug: 0.870 +other: 0.853 +performance: 0.845 +semantic: 0.843 +graphic: 0.826 +files: 0.824 +mistranslation: 0.768 +kernel virtual machine: 0.756 +vnc: 0.734 +network: 0.718 +socket: 0.699 +x86: 0.540 +TCG: 0.498 + +[BUG, RFC] Base node is in RW after making external snapshot + +Hi everyone, + +When making an external snapshot, we end up in a situation when 2 block +graph nodes related to the same image file (format and storage nodes) +have different RO flags set on them. + +E.g. + +# ls -la /proc/PID/fd +lrwx------ 1 root qemu 64 Apr 24 20:14 12 -> /path/to/harddisk.hdd + +# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' +--pretty | egrep '"node-name"|"ro"' + "ro": false, + "node-name": "libvirt-1-format", + "ro": false, + "node-name": "libvirt-1-storage", + +# virsh snapshot-create-as VM --name snap --disk-only +Domain snapshot snap created + +# ls -la /proc/PID/fd +lr-x------ 1 root qemu 64 Apr 24 20:14 134 -> /path/to/harddisk.hdd +lrwx------ 1 root qemu 64 Apr 24 20:14 135 -> /path/to/harddisk.snap + +# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' +--pretty | egrep '"node-name"|"ro"' + "ro": false, + "node-name": "libvirt-2-format", + "ro": false, + "node-name": "libvirt-2-storage", + "ro": true, + "node-name": "libvirt-1-format", + "ro": false, <-------------- + "node-name": "libvirt-1-storage", + +File descriptor has been reopened in RO, but "libvirt-1-storage" node +still has RW permissions set. + +I'm wondering it this a bug or this is intended? Looks like a bug to +me, although I see that some iotests (e.g. 273) expect 2 nodes related +to the same image file to have different RO flags. + +bdrv_reopen_set_read_only() + bdrv_reopen() + bdrv_reopen_queue() + bdrv_reopen_queue_child() + bdrv_reopen_multiple() + bdrv_list_refresh_perms() + bdrv_topological_dfs() + bdrv_do_refresh_perms() + bdrv_reopen_commit() + +In the stack above bdrv_reopen_set_read_only() is only being called for +the parent (libvirt-1-format) node. There're 2 lists: BDSs from +refresh_list are used by bdrv_drv_set_perm and this leads to actual +reopen with RO of the file descriptor. And then there's reopen queue +bs_queue -- BDSs from this queue get their parameters updated. While +refresh_list ends up having the whole subtree (including children, this +is done in bdrv_topological_dfs()) bs_queue only has the parent. And +that is because storage (child) node's (bs->inherits_from == NULL), so +bdrv_reopen_queue_child() never adds it to the queue. Could it be the +source of this bug? + +Anyway, would greatly appreciate a clarification. + +Andrey + +On 4/24/24 21:00, Andrey Drobyshev wrote: +> +Hi everyone, +> +> +When making an external snapshot, we end up in a situation when 2 block +> +graph nodes related to the same image file (format and storage nodes) +> +have different RO flags set on them. +> +> +E.g. +> +> +# ls -la /proc/PID/fd +> +lrwx------ 1 root qemu 64 Apr 24 20:14 12 -> /path/to/harddisk.hdd +> +> +# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' +> +--pretty | egrep '"node-name"|"ro"' +> +"ro": false, +> +"node-name": "libvirt-1-format", +> +"ro": false, +> +"node-name": "libvirt-1-storage", +> +> +# virsh snapshot-create-as VM --name snap --disk-only +> +Domain snapshot snap created +> +> +# ls -la /proc/PID/fd +> +lr-x------ 1 root qemu 64 Apr 24 20:14 134 -> /path/to/harddisk.hdd +> +lrwx------ 1 root qemu 64 Apr 24 20:14 135 -> /path/to/harddisk.snap +> +> +# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' +> +--pretty | egrep '"node-name"|"ro"' +> +"ro": false, +> +"node-name": "libvirt-2-format", +> +"ro": false, +> +"node-name": "libvirt-2-storage", +> +"ro": true, +> +"node-name": "libvirt-1-format", +> +"ro": false, <-------------- +> +"node-name": "libvirt-1-storage", +> +> +File descriptor has been reopened in RO, but "libvirt-1-storage" node +> +still has RW permissions set. +> +> +I'm wondering it this a bug or this is intended? Looks like a bug to +> +me, although I see that some iotests (e.g. 273) expect 2 nodes related +> +to the same image file to have different RO flags. +> +> +bdrv_reopen_set_read_only() +> +bdrv_reopen() +> +bdrv_reopen_queue() +> +bdrv_reopen_queue_child() +> +bdrv_reopen_multiple() +> +bdrv_list_refresh_perms() +> +bdrv_topological_dfs() +> +bdrv_do_refresh_perms() +> +bdrv_reopen_commit() +> +> +In the stack above bdrv_reopen_set_read_only() is only being called for +> +the parent (libvirt-1-format) node. There're 2 lists: BDSs from +> +refresh_list are used by bdrv_drv_set_perm and this leads to actual +> +reopen with RO of the file descriptor. And then there's reopen queue +> +bs_queue -- BDSs from this queue get their parameters updated. While +> +refresh_list ends up having the whole subtree (including children, this +> +is done in bdrv_topological_dfs()) bs_queue only has the parent. And +> +that is because storage (child) node's (bs->inherits_from == NULL), so +> +bdrv_reopen_queue_child() never adds it to the queue. Could it be the +> +source of this bug? +> +> +Anyway, would greatly appreciate a clarification. +> +> +Andrey +Friendly ping. Could somebody confirm that it is a bug indeed? + diff --git a/results/classifier/zero-shot/012/permissions/74715356 b/results/classifier/zero-shot/012/permissions/74715356 new file mode 100644 index 000000000..202ed2bc8 --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/74715356 @@ -0,0 +1,144 @@ +permissions: 0.930 +other: 0.927 +register: 0.925 +semantic: 0.916 +debug: 0.907 +architecture: 0.906 +performance: 0.905 +assembly: 0.905 +TCG: 0.901 +device: 0.900 +arm: 0.898 +PID: 0.897 +graphic: 0.894 +risc-v: 0.888 +x86: 0.883 +boot: 0.881 +mistranslation: 0.870 +vnc: 0.850 +files: 0.850 +socket: 0.843 +network: 0.838 +kernel virtual machine: 0.726 + +[Bug] x86 EFLAGS refresh is not happening correctly + +Hello, +I'm posting this here instead of opening an issue as it is not clear to me if this is a bug or not. +The issue is located in function "cpu_compute_eflags" in target/i386/cpu.h +( +https://gitlab.com/qemu-project/qemu/-/blob/master/target/i386/cpu.h#L2071 +) +This function is exectued in an out of cpu loop context. +It is used to synchronize TCG internal eflags registers (CC_OP, CC_SRC, etc...) with the CPU eflags field upon loop exit. +It does: +  eflags +|= +cpu_cc_compute_all +( +env +, +CC_OP +) +| +( +env +-> +df +& +DF_MASK +); +Shouldn't it be: +   +eflags += +cpu_cc_compute_all +( +env +, +CC_OP +) +| +( +env +-> +df +& +DF_MASK +); +as eflags is entirely reevaluated by "cpu_cc_compute_all" ? +Thanks, +Kind regards, +Stevie + +On 05/08/21 11:51, Stevie Lavern wrote: +Shouldn't it be: +eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK); +as eflags is entirely reevaluated by "cpu_cc_compute_all" ? +No, both are wrong. env->eflags contains flags other than the +arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved. +The right code is in helper_read_eflags. You can move it into +cpu_compute_eflags, and make helper_read_eflags use it. +Paolo + +On 05/08/21 13:24, Paolo Bonzini wrote: +On 05/08/21 11:51, Stevie Lavern wrote: +Shouldn't it be: +eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK); +as eflags is entirely reevaluated by "cpu_cc_compute_all" ? +No, both are wrong. env->eflags contains flags other than the +arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved. +The right code is in helper_read_eflags. You can move it into +cpu_compute_eflags, and make helper_read_eflags use it. +Ah, actually the two are really the same, the TF/VM bits do not apply to +cpu_compute_eflags so it's correct. +What seems wrong is migration of the EFLAGS register. There should be +code in cpu_pre_save and cpu_post_load to special-case it and setup +CC_DST/CC_OP as done in cpu_load_eflags. +Also, cpu_load_eflags should assert that update_mask does not include +any of the arithmetic flags. +Paolo + +Thank for your reply! +It's still a bit cryptic for me. +I think i need to precise that I'm using a x86_64 custom user-mode,base on linux user-mode, that i'm developing (unfortunately i cannot share the code) with modifications in the translation loop (I've added cpu loop exits on specific instructions which are not control flow instructions). +If my understanding is correct, in the user-mode case 'cpu_compute_eflags' is called directly by 'x86_cpu_exec_exit' with the intention of synchronizing the CPU env->eflags field with its real value (represented by the CC_* fields). +I'm not sure how 'cpu_pre_save' and 'cpu_post_load' are involved in this case. + +As you said in your first email, 'helper_read_eflags' seems to be the correct way to go. +Here is some detail about my current experimentation/understanding of this "issue": +With the current implementation +     +eflags |= cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK); +if I exit the loop with a CC_OP different from CC_OP_EFLAGS, I found that the resulting env->eflags may be invalid. +In my test case, the loop was exiting with eflags = 0x44 and CC_OP = CC_OP_SUBL with CC_DST=1, CC_SRC=258, CC_SRC2=0. +While 'cpu_cc_compute_all' computes the correct flags (ZF:0, PF:0), the result will still be 0x44 (ZF:1, PF:1) due to the 'or' operation, thus leading to an incorrect eflags value loaded into the CPU env. +In my case, after loop reentry, it led to an invalid branch to be taken. +Thanks for your time! +Regards +Stevie + +On Thu, Aug 5, 2021 at 1:33 PM Paolo Bonzini < +pbonzini@redhat.com +> wrote: +On 05/08/21 13:24, Paolo Bonzini wrote: +> On 05/08/21 11:51, Stevie Lavern wrote: +>> +>> Shouldn't it be: +>> eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK); +>> as eflags is entirely reevaluated by "cpu_cc_compute_all" ? +> +> No, both are wrong. env->eflags contains flags other than the +> arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved. +> +> The right code is in helper_read_eflags. You can move it into +> cpu_compute_eflags, and make helper_read_eflags use it. +Ah, actually the two are really the same, the TF/VM bits do not apply to +cpu_compute_eflags so it's correct. +What seems wrong is migration of the EFLAGS register. There should be +code in cpu_pre_save and cpu_post_load to special-case it and setup +CC_DST/CC_OP as done in cpu_load_eflags. +Also, cpu_load_eflags should assert that update_mask does not include +any of the arithmetic flags. +Paolo + diff --git a/results/classifier/zero-shot/012/permissions/85542195 b/results/classifier/zero-shot/012/permissions/85542195 new file mode 100644 index 000000000..91435f757 --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/85542195 @@ -0,0 +1,138 @@ +permissions: 0.968 +register: 0.947 +PID: 0.945 +other: 0.944 +semantic: 0.941 +assembly: 0.941 +graphic: 0.938 +device: 0.936 +performance: 0.933 +arm: 0.932 +boot: 0.932 +vnc: 0.923 +files: 0.920 +architecture: 0.916 +debug: 0.915 +risc-v: 0.913 +mistranslation: 0.907 +socket: 0.905 +network: 0.899 +x86: 0.888 +TCG: 0.877 +kernel virtual machine: 0.815 + +[Qemu-devel] [Bug in qemu-system-ppc running Mac OS 9 on Windows 10] + +Hi all, + +I've been experiencing issues when installing Mac OS 9.x using +qemu-system-ppc.exe in Windows 10. After booting from CD image, +partitioning a fresh disk image often hangs Qemu. When using a +pre-partitioned disk image, the OS installation process halts +somewhere during the process. The issues can be resolved by setting +qemu-system-ppc.exe to run in Windows 7 compatibility mode. +AFAIK all Qemu builds for Windows since Mac OS 9 became available as +guest are affected. +The issue is reproducible by installing Qemu for Windows from Stephan +Weil on Windows 10 and boot/install Mac OS 9.x + +Best regards and thanks for looking into this, +Howard + +On Nov 25, 2016, at 9:26 AM, address@hidden wrote: +Hi all, + +I've been experiencing issues when installing Mac OS 9.x using +qemu-system-ppc.exe in Windows 10. After booting from CD image, +partitioning a fresh disk image often hangs Qemu. When using a +pre-partitioned disk image, the OS installation process halts +somewhere during the process. The issues can be resolved by setting +qemu-system-ppc.exe to run in Windows 7 compatibility mode. +AFAIK all Qemu builds for Windows since Mac OS 9 became available as +guest are affected. +The issue is reproducible by installing Qemu for Windows from Stephan +Weil on Windows 10 and boot/install Mac OS 9.x + +Best regards and thanks for looking into this, +Howard +I assume there was some kind of behavior change for some of the +Windows API between Windows 7 and Windows 10, that is my guess as to +why the compatibility mode works. Could you run 'make check' on your +system, once in Windows 7 and once in Windows 10. Maybe the tests +will tell us something. I'm hoping that one of the tests succeeds in +Windows 7 and fails in Windows 10. That would help us pinpoint what +the problem is. +What I mean by run in Windows 7 is set the mingw environment to run +in Windows 7 compatibility mode (if possible). If you have Windows 7 +on another partition you could boot from, that would be better. +Good luck. +p.s. use 'make check -k' to allow all the tests to run (even if one +or more of the tests fails). + +> +> Hi all, +> +> +> +> I've been experiencing issues when installing Mac OS 9.x using +> +> qemu-system-ppc.exe in Windows 10. After booting from CD image, +> +> partitioning a fresh disk image often hangs Qemu. When using a +> +> pre-partitioned disk image, the OS installation process halts +> +> somewhere during the process. The issues can be resolved by setting +> +> qemu-system-ppc.exe to run in Windows 7 compatibility mode. +> +> AFAIK all Qemu builds for Windows since Mac OS 9 became available as +> +> guest are affected. +> +> The issue is reproducible by installing Qemu for Windows from Stephan +> +> Weil on Windows 10 and boot/install Mac OS 9.x +> +> +> +> Best regards and thanks for looking into this, +> +> Howard +> +> +> +I assume there was some kind of behavior change for some of the Windows API +> +between Windows 7 and Windows 10, that is my guess as to why the +> +compatibility mode works. Could you run 'make check' on your system, once in +> +Windows 7 and once in Windows 10. Maybe the tests will tell us something. +> +I'm hoping that one of the tests succeeds in Windows 7 and fails in Windows +> +10. That would help us pinpoint what the problem is. +> +> +What I mean by run in Windows 7 is set the mingw environment to run in +> +Windows 7 compatibility mode (if possible). If you have Windows 7 on another +> +partition you could boot from, that would be better. +> +> +Good luck. +> +> +p.s. use 'make check -k' to allow all the tests to run (even if one or more +> +of the tests fails). +Hi, + +Thank you for you suggestion, but I have no means to run the check you +suggest. I cross-compile from Linux. + +Best regards, +Howard + diff --git a/results/classifier/zero-shot/012/permissions/88281850 b/results/classifier/zero-shot/012/permissions/88281850 new file mode 100644 index 000000000..77e648912 --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/88281850 @@ -0,0 +1,299 @@ +permissions: 0.985 +other: 0.983 +architecture: 0.980 +debug: 0.979 +arm: 0.979 +graphic: 0.974 +network: 0.973 +assembly: 0.972 +device: 0.970 +performance: 0.969 +semantic: 0.968 +boot: 0.967 +socket: 0.966 +register: 0.964 +files: 0.962 +PID: 0.959 +risc-v: 0.950 +mistranslation: 0.948 +vnc: 0.945 +TCG: 0.938 +kernel virtual machine: 0.901 +x86: 0.865 + +[Bug] Take more 150s to boot qemu on ARM64 + +Hi all, +I encounter a issue with kernel 5.19-rc1 on a ARM64 board: it takes +about 150s between beginning to run qemu command and beginng to boot +Linux kernel ("EFI stub: Booting Linux Kernel..."). +But in kernel 5.18-rc4, it only takes about 5s. I git bisect the kernel +code and it finds c2445d387850 ("srcu: Add contention check to +call_srcu() srcu_data ->lock acquisition"). +The qemu (qemu version is 6.2.92) command i run is : + +./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \ +--trace "kvm*" \ +-cpu host \ +-machine virt,accel=kvm,gic-version=3 \ +-machine smp.cpus=2,smp.sockets=2 \ +-no-reboot \ +-nographic \ +-monitor unix:/home/cx/qmp-test,server,nowait \ +-bios /home/cx/boot/QEMU_EFI.fd \ +-kernel /home/cx/boot/Image \ +-device +pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1 +\ +-device vfio-pci,host=7d:01.3,id=net0 \ +-device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4 \ +-drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \ +-append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \ +-net none \ +-D /home/cx/qemu_log.txt +I am not familiar with rcu code, and don't know how it causes the issue. +Do you have any idea about this issue? +Best Regard, + +Xiang Chen + +On Mon, Jun 13, 2022 at 08:26:34PM +0800, chenxiang (M) wrote: +> +Hi all, +> +> +I encounter a issue with kernel 5.19-rc1 on a ARM64 board: it takes about +> +150s between beginning to run qemu command and beginng to boot Linux kernel +> +("EFI stub: Booting Linux Kernel..."). +> +> +But in kernel 5.18-rc4, it only takes about 5s. I git bisect the kernel code +> +and it finds c2445d387850 ("srcu: Add contention check to call_srcu() +> +srcu_data ->lock acquisition"). +> +> +The qemu (qemu version is 6.2.92) command i run is : +> +> +./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \ +> +--trace "kvm*" \ +> +-cpu host \ +> +-machine virt,accel=kvm,gic-version=3 \ +> +-machine smp.cpus=2,smp.sockets=2 \ +> +-no-reboot \ +> +-nographic \ +> +-monitor unix:/home/cx/qmp-test,server,nowait \ +> +-bios /home/cx/boot/QEMU_EFI.fd \ +> +-kernel /home/cx/boot/Image \ +> +-device +> +pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1 +> +\ +> +-device vfio-pci,host=7d:01.3,id=net0 \ +> +-device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4 \ +> +-drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \ +> +-append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \ +> +-net none \ +> +-D /home/cx/qemu_log.txt +> +> +I am not familiar with rcu code, and don't know how it causes the issue. Do +> +you have any idea about this issue? +Please see the discussion here: +https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/ +Though that report requires ACPI to be forced on to get the +delay, which results in more than 9,000 back-to-back calls to +synchronize_srcu_expedited(). I cannot reproduce this on my setup, even +with an artificial tight loop invoking synchronize_srcu_expedited(), +but then again I don't have ARM hardware. + +My current guess is that the following patch, but with larger values for +SRCU_MAX_NODELAY_PHASE. Here "larger" might well be up in the hundreds, +or perhaps even larger. + +If you get a chance to experiment with this, could you please reply +to the discussion at the above URL? (Or let me know, and I can CC +you on the next message in that thread.) + + Thanx, Paul + +------------------------------------------------------------------------ + +diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c +index 50ba70f019dea..0db7873f4e95b 100644 +--- a/kernel/rcu/srcutree.c ++++ b/kernel/rcu/srcutree.c +@@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp) + + #define SRCU_INTERVAL 1 // Base delay if no expedited GPs +pending. + #define SRCU_MAX_INTERVAL 10 // Maximum incremental delay from slow +readers. +-#define SRCU_MAX_NODELAY_PHASE 1 // Maximum per-GP-phase consecutive +no-delay instances. ++#define SRCU_MAX_NODELAY_PHASE 3 // Maximum per-GP-phase consecutive +no-delay instances. + #define SRCU_MAX_NODELAY 100 // Maximum consecutive no-delay +instances. + + /* +@@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp) + */ + static unsigned long srcu_get_delay(struct srcu_struct *ssp) + { ++ unsigned long gpstart; ++ unsigned long j; + unsigned long jbase = SRCU_INTERVAL; + + if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), +READ_ONCE(ssp->srcu_gp_seq_needed_exp))) + jbase = 0; +- if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) +- jbase += jiffies - READ_ONCE(ssp->srcu_gp_start); +- if (!jbase) { +- WRITE_ONCE(ssp->srcu_n_exp_nodelay, +READ_ONCE(ssp->srcu_n_exp_nodelay) + 1); +- if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE) +- jbase = 1; ++ if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) { ++ j = jiffies - 1; ++ gpstart = READ_ONCE(ssp->srcu_gp_start); ++ if (time_after(j, gpstart)) ++ jbase += j - gpstart; ++ if (!jbase) { ++ WRITE_ONCE(ssp->srcu_n_exp_nodelay, +READ_ONCE(ssp->srcu_n_exp_nodelay) + 1); ++ if (READ_ONCE(ssp->srcu_n_exp_nodelay) > +SRCU_MAX_NODELAY_PHASE) ++ jbase = 1; ++ } + } + return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase; + } + +å¨ 2022/6/13 21:22, Paul E. McKenney åé: +On Mon, Jun 13, 2022 at 08:26:34PM +0800, chenxiang (M) wrote: +Hi all, + +I encounter a issue with kernel 5.19-rc1 on a ARM64 board: it takes about +150s between beginning to run qemu command and beginng to boot Linux kernel +("EFI stub: Booting Linux Kernel..."). + +But in kernel 5.18-rc4, it only takes about 5s. I git bisect the kernel code +and it finds c2445d387850 ("srcu: Add contention check to call_srcu() +srcu_data ->lock acquisition"). + +The qemu (qemu version is 6.2.92) command i run is : + +./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \ +--trace "kvm*" \ +-cpu host \ +-machine virt,accel=kvm,gic-version=3 \ +-machine smp.cpus=2,smp.sockets=2 \ +-no-reboot \ +-nographic \ +-monitor unix:/home/cx/qmp-test,server,nowait \ +-bios /home/cx/boot/QEMU_EFI.fd \ +-kernel /home/cx/boot/Image \ +-device +pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1 +\ +-device vfio-pci,host=7d:01.3,id=net0 \ +-device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4 \ +-drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \ +-append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \ +-net none \ +-D /home/cx/qemu_log.txt + +I am not familiar with rcu code, and don't know how it causes the issue. Do +you have any idea about this issue? +Please see the discussion here: +https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/ +Though that report requires ACPI to be forced on to get the +delay, which results in more than 9,000 back-to-back calls to +synchronize_srcu_expedited(). I cannot reproduce this on my setup, even +with an artificial tight loop invoking synchronize_srcu_expedited(), +but then again I don't have ARM hardware. + +My current guess is that the following patch, but with larger values for +SRCU_MAX_NODELAY_PHASE. Here "larger" might well be up in the hundreds, +or perhaps even larger. + +If you get a chance to experiment with this, could you please reply +to the discussion at the above URL? (Or let me know, and I can CC +you on the next message in that thread.) +Ok, thanks, i will reply it on above URL. +Thanx, Paul + +------------------------------------------------------------------------ + +diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c +index 50ba70f019dea..0db7873f4e95b 100644 +--- a/kernel/rcu/srcutree.c ++++ b/kernel/rcu/srcutree.c +@@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp) +#define SRCU_INTERVAL 1 // Base delay if no expedited GPs pending. +#define SRCU_MAX_INTERVAL 10 // Maximum incremental delay from slow +readers. +-#define SRCU_MAX_NODELAY_PHASE 1 // Maximum per-GP-phase consecutive +no-delay instances. ++#define SRCU_MAX_NODELAY_PHASE 3 // Maximum per-GP-phase consecutive +no-delay instances. + #define SRCU_MAX_NODELAY 100 // Maximum consecutive no-delay +instances. +/* +@@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp) + */ + static unsigned long srcu_get_delay(struct srcu_struct *ssp) + { ++ unsigned long gpstart; ++ unsigned long j; + unsigned long jbase = SRCU_INTERVAL; +if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp))) +jbase = 0; +- if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) +- jbase += jiffies - READ_ONCE(ssp->srcu_gp_start); +- if (!jbase) { +- WRITE_ONCE(ssp->srcu_n_exp_nodelay, +READ_ONCE(ssp->srcu_n_exp_nodelay) + 1); +- if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE) +- jbase = 1; ++ if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) { ++ j = jiffies - 1; ++ gpstart = READ_ONCE(ssp->srcu_gp_start); ++ if (time_after(j, gpstart)) ++ jbase += j - gpstart; ++ if (!jbase) { ++ WRITE_ONCE(ssp->srcu_n_exp_nodelay, +READ_ONCE(ssp->srcu_n_exp_nodelay) + 1); ++ if (READ_ONCE(ssp->srcu_n_exp_nodelay) > +SRCU_MAX_NODELAY_PHASE) ++ jbase = 1; ++ } + } + return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase; + } +. + diff --git a/results/classifier/zero-shot/012/permissions/95154278 b/results/classifier/zero-shot/012/permissions/95154278 new file mode 100644 index 000000000..1773d675b --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/95154278 @@ -0,0 +1,173 @@ +permissions: 0.989 +register: 0.957 +risc-v: 0.954 +other: 0.953 +debug: 0.951 +device: 0.951 +graphic: 0.950 +PID: 0.949 +vnc: 0.948 +assembly: 0.944 +semantic: 0.937 +performance: 0.936 +x86: 0.936 +arm: 0.935 +architecture: 0.928 +files: 0.918 +socket: 0.913 +network: 0.913 +TCG: 0.906 +boot: 0.902 +mistranslation: 0.897 +kernel virtual machine: 0.828 + +[Qemu-devel] [BUG] checkpatch.pl hangs on target/mips/msa_helper.c + +If checkpatch.pl is applied (using switch "-f") on file +target/mips/msa_helper.c, it will hang. + +There is a workaround for this particular file: + +These lines in msa_helper.c: + + uint## BITS ##_t S = _S, T = _T; \ + uint## BITS ##_t as, at, xs, xt, xd; \ + +should be replaced with: + + uint## BITS ## _t S = _S, T = _T; \ + uint## BITS ## _t as, at, xs, xt, xd; \ + +(a space is added after the second "##" in each line) + +The workaround is found by partial deleting and undeleting of the code in +msa_helper.c in binary search fashion. + +This workaround will soon be submitted by me as a patch within a series on misc +MIPS issues. + +I took a look at checkpatch.pl code, and it looks it is fairly complicated to +fix the issue, since it happens in the code segment involving intricate logic +conditions. + +Regards, +Aleksandar + +On Wed, Jul 04, 2018 at 03:35:18PM +0000, Aleksandar Markovic wrote: +> +If checkpatch.pl is applied (using switch "-f") on file +> +target/mips/msa_helper.c, it will hang. +> +> +There is a workaround for this particular file: +> +> +These lines in msa_helper.c: +> +> +uint## BITS ##_t S = _S, T = _T; \ +> +uint## BITS ##_t as, at, xs, xt, xd; \ +> +> +should be replaced with: +> +> +uint## BITS ## _t S = _S, T = _T; \ +> +uint## BITS ## _t as, at, xs, xt, xd; \ +> +> +(a space is added after the second "##" in each line) +> +> +The workaround is found by partial deleting and undeleting of the code in +> +msa_helper.c in binary search fashion. +> +> +This workaround will soon be submitted by me as a patch within a series on +> +misc MIPS issues. +> +> +I took a look at checkpatch.pl code, and it looks it is fairly complicated to +> +fix the issue, since it happens in the code segment involving intricate logic +> +conditions. +Thanks for figuring this out, Aleksandar. Not sure if anyone else has +the apetite to fix checkpatch.pl. + +Stefan +signature.asc +Description: +PGP signature + +On 07/11/2018 09:36 AM, Stefan Hajnoczi wrote: +> +On Wed, Jul 04, 2018 at 03:35:18PM +0000, Aleksandar Markovic wrote: +> +> If checkpatch.pl is applied (using switch "-f") on file +> +> target/mips/msa_helper.c, it will hang. +> +> +> +> There is a workaround for this particular file: +> +> +> +> These lines in msa_helper.c: +> +> +> +> uint## BITS ##_t S = _S, T = _T; \ +> +> uint## BITS ##_t as, at, xs, xt, xd; \ +> +> +> +> should be replaced with: +> +> +> +> uint## BITS ## _t S = _S, T = _T; \ +> +> uint## BITS ## _t as, at, xs, xt, xd; \ +> +> +> +> (a space is added after the second "##" in each line) +> +> +> +> The workaround is found by partial deleting and undeleting of the code in +> +> msa_helper.c in binary search fashion. +> +> +> +> This workaround will soon be submitted by me as a patch within a series on +> +> misc MIPS issues. +> +> +> +> I took a look at checkpatch.pl code, and it looks it is fairly complicated +> +> to fix the issue, since it happens in the code segment involving intricate +> +> logic conditions. +> +> +Thanks for figuring this out, Aleksandar. Not sure if anyone else has +> +the apetite to fix checkpatch.pl. +Anyone else but Paolo ;P +http://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg01250.html +signature.asc +Description: +OpenPGP digital signature + diff --git a/results/classifier/zero-shot/012/permissions/99674399 b/results/classifier/zero-shot/012/permissions/99674399 new file mode 100644 index 000000000..5a79c8e40 --- /dev/null +++ b/results/classifier/zero-shot/012/permissions/99674399 @@ -0,0 +1,166 @@ +permissions: 0.896 +device: 0.886 +other: 0.883 +debug: 0.857 +performance: 0.845 +mistranslation: 0.843 +assembly: 0.843 +register: 0.841 +architecture: 0.834 +arm: 0.829 +semantic: 0.822 +boot: 0.822 +PID: 0.812 +risc-v: 0.800 +kernel virtual machine: 0.797 +graphic: 0.794 +files: 0.787 +socket: 0.747 +x86: 0.731 +network: 0.711 +TCG: 0.686 +vnc: 0.673 + +[BUG] qemu crashes on assertion in cpu_asidx_from_attrs when cpu is in smm mode + +Hi all! + +First, I see this issue: +https://gitlab.com/qemu-project/qemu/-/issues/1198 +. +where some kvm/hardware failure leads to guest crash, and finally to this +assertion: + + cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed. + +But in the ticket the talk is about the guest crash and fixing the kernel, not +about the final QEMU assertion (which definitely show that something should be +fixed in QEMU code too). + + +We've faced same stack one time: + +(gdb) bt +#0 raise () from /lib/x86_64-linux-gnu/libc.so.6 +#1 abort () from /lib/x86_64-linux-gnu/libc.so.6 +#2 ?? () from /lib/x86_64-linux-gnu/libc.so.6 +#3 __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6 +#4 cpu_asidx_from_attrs at ../hw/core/cpu-sysemu.c:76 +#5 cpu_memory_rw_debug at ../softmmu/physmem.c:3529 +#6 x86_cpu_dump_state at ../target/i386/cpu-dump.c:560 +#7 kvm_cpu_exec at ../accel/kvm/kvm-all.c:3000 +#8 kvm_vcpu_thread_fn at ../accel/kvm/kvm-accel-ops.c:51 +#9 qemu_thread_start at ../util/qemu-thread-posix.c:505 +#10 start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 +#11 clone () from /lib/x86_64-linux-gnu/libc.so.6 + + +And what I see: + +static inline int x86_asidx_from_attrs(CPUState *cs, MemTxAttrs attrs) +{ + return !!attrs.secure; +} + +int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs) +{ + int ret = 0; + + if (cpu->cc->sysemu_ops->asidx_from_attrs) { + ret = cpu->cc->sysemu_ops->asidx_from_attrs(cpu, attrs); + assert(ret < cpu->num_ases && ret >= 0); <<<<<<<<<<<<<<<<< + } + return ret; +} + +(gdb) p cpu->num_ases +$3 = 1 + +(gdb) fr 5 +#5 0x00005578c8814ba3 in cpu_memory_rw_debug (cpu=c... +(gdb) p attrs +$6 = {unspecified = 0, secure = 1, user = 0, memory = 0, requester_id = 0, +byte_swap = 0, target_tlb_bit0 = 0, target_tlb_bit1 = 0, target_tlb_bit2 = 0} + +so .secure is 1, therefore ret is 1, in the same time num_ases is 1 too and +assertion fails. + + + +Where is .secure from? + +static inline MemTxAttrs cpu_get_mem_attrs(CPUX86State *env) +{ + return ((MemTxAttrs) { .secure = (env->hflags & HF_SMM_MASK) != 0 }); +} + +Ok, it means we in SMM mode. + + + +On the other hand, it seems that num_ases seems to be always 1 for x86: + +vsementsov@vsementsov-lin:~/work/src/qemu/yc-7.2$ git grep 'num_ases = ' +cpu.c: cpu->num_ases = 0; +softmmu/cpus.c: cpu->num_ases = 1; +target/arm/cpu.c: cs->num_ases = 3 + has_secure; +target/arm/cpu.c: cs->num_ases = 1 + has_secure; +target/i386/tcg/sysemu/tcg-cpu.c: cs->num_ases = 2; + + +So, something is wrong around cpu->num_ases and x86_asidx_from_attrs() which +may return more in SMM mode. + + +The stack starts in +//7 0x00005578c882f539 in kvm_cpu_exec (cpu=cpu@entry=0x5578ca2eb340) at +../accel/kvm/kvm-all.c:3000 + if (ret < 0) { + cpu_dump_state(cpu, stderr, CPU_DUMP_CODE); + vm_stop(RUN_STATE_INTERNAL_ERROR); + } + +So that was some kvm error, and we decided to call cpu_dump_state(). And it +crashes. cpu_dump_state() is also called from hmp_info_registers, so I can +reproduce the crash with a tiny patch to master (as only CPU_DUMP_CODE path +calls cpu_memory_rw_debug(), as it is in kvm_cpu_exec()): + +diff --git a/monitor/hmp-cmds-target.c b/monitor/hmp-cmds-target.c +index ff01cf9d8d..dcf0189048 100644 +--- a/monitor/hmp-cmds-target.c ++++ b/monitor/hmp-cmds-target.c +@@ -116,7 +116,7 @@ void hmp_info_registers(Monitor *mon, const QDict *qdict) + } + + monitor_printf(mon, "\nCPU#%d\n", cs->cpu_index); +- cpu_dump_state(cs, NULL, CPU_DUMP_FPU); ++ cpu_dump_state(cs, NULL, CPU_DUMP_CODE); + } + } + + +Than run + +yes "info registers" | ./build/qemu-system-x86_64 -accel kvm -monitor stdio \ + -global driver=cfi.pflash01,property=secure,value=on \ + -blockdev "{'driver': 'file', 'filename': +'/usr/share/OVMF/OVMF_CODE_4M.secboot.fd', 'node-name': 'ovmf-code', 'read-only': +true}" \ + -blockdev "{'driver': 'file', 'filename': '/usr/share/OVMF/OVMF_VARS_4M.fd', +'node-name': 'ovmf-vars', 'read-only': true}" \ + -machine q35,smm=on,pflash0=ovmf-code,pflash1=ovmf-vars -m 2G -nodefaults + +And after some time (less than 20 seconds for me) it leads to + +qemu-system-x86_64: ../hw/core/cpu-sysemu.c:76: cpu_asidx_from_attrs: Assertion `ret < +cpu->num_ases && ret >= 0' failed. +Aborted (core dumped) + + +I've no idea how to correctly fix this bug, but I hope that my reproducer and +investigation will help a bit. + +-- +Best regards, +Vladimir + |