diff options
Diffstat (limited to 'results/classifier/105/other/1866892')
| -rw-r--r-- | results/classifier/105/other/1866892 | 187 |
1 files changed, 187 insertions, 0 deletions
diff --git a/results/classifier/105/other/1866892 b/results/classifier/105/other/1866892 new file mode 100644 index 00000000..10bbaa66 --- /dev/null +++ b/results/classifier/105/other/1866892 @@ -0,0 +1,187 @@ +semantic: 0.989 +other: 0.984 +graphic: 0.976 +instruction: 0.970 +assembly: 0.969 +mistranslation: 0.966 +device: 0.954 +vnc: 0.953 +boot: 0.938 +KVM: 0.936 +socket: 0.933 +network: 0.899 + +guest OS catches a page fault bug when running dotnet + +The linux guest OS catches a page fault bug when running the dotnet application. + +host = metal = x86_64 +host OS = ubuntu 19.10 +qemu emulation, without KVM, with "tiny code generator" tcg; no plugins; built from head/master +guest emulation = x86_64 +guest OS = ubuntu 19.10 +guest app = dotnet, running any program + +qemu sha=7bc4d1980f95387c4cc921d7a066217ff4e42b70 (head/master Mar 10, 2020) + +qemu invocation is: + +qemu/build/x86_64-softmmu/qemu-system-x86_64 \ + -m size=4096 \ + -smp cpus=1 \ + -machine type=pc-i440fx-5.0,accel=tcg \ + -cpu Skylake-Server-v1 \ + -nographic \ + -bios OVMF-pure-efi.fd \ + -drive if=none,id=hd0,file=ubuntu-19.10-server-cloudimg-amd64.img \ + -device virtio-blk,drive=hd0 \ + -drive if=none,id=cloud,file=linux_cloud_config.img \ + -device virtio-blk,drive=cloud \ + -netdev user,id=user0,hostfwd=tcp::2223-:22 \ + -device virtio-net,netdev=user0 + + +Here's the guest kernel console output: + + +[ 2834.005449] BUG: unable to handle page fault for address: 00007fffffffc2c0 +[ 2834.009895] #PF: supervisor read access in user mode +[ 2834.013872] #PF: error_code(0x0001) - permissions violation +[ 2834.018025] IDT: 0xfffffe0000000000 (limit=0xfff) GDT: 0xfffffe0000001000 (limit=0x7f) +[ 2834.022242] LDTR: NULL +[ 2834.026306] TR: 0x40 -- base=0xfffffe0000003000 limit=0x206f +[ 2834.030395] PGD 80000000360d0067 P4D 80000000360d0067 PUD 36105067 PMD 36193067 PTE 8000000076d8e867 +[ 2834.038672] Oops: 0001 [#4] SMP PTI +[ 2834.042707] CPU: 0 PID: 13537 Comm: dotnet Tainted: G D 5.3.0-29-generic #31-Ubuntu +[ 2834.050591] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 +[ 2834.054785] RIP: 0033:0x1555547eaeda +[ 2834.059017] Code: d0 00 00 00 4c 8b a7 d8 00 00 00 4c 8b af e0 00 00 00 4c 8b b7 e8 00 00 00 4c 8b bf f0 00 00 00 48 8b bf b0 00 00 00 9d 74 02 <48> cf 48 8d 64 24 30 5d c3 90 cc c3 66 90 55 4c 8b a7 d8 00 00 00 +[ 2834.072103] RSP: 002b:00007fffffffc2c0 EFLAGS: 00000202 +[ 2834.076507] RAX: 0000000000000000 RBX: 00001554b401af38 RCX: 0000000000000001 +[ 2834.080832] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007fffffffcfb0 +[ 2834.085010] RBP: 00007fffffffd730 R08: 0000000000000000 R09: 00007fffffffd1b0 +[ 2834.089184] R10: 0000155555331dd5 R11: 00001555553ad8d0 R12: 0000000000000002 +[ 2834.093350] R13: 0000000000000001 R14: 0000000000000001 R15: 00001554b401d388 +[ 2834.097309] FS: 0000155554fa5740 GS: 0000000000000000 +[ 2834.101131] Modules linked in: isofs nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev input_leds serio_raw parport_pc parport sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper virtio_net psmouse net_failover failover virtio_blk floppy +[ 2834.122539] CR2: 00007fffffffc2c0 +[ 2834.126867] ---[ end trace dfae51f1d9432708 ]--- +[ 2834.131239] RIP: 0033:0x14d793262eda +[ 2834.135715] Code: Bad RIP value. +[ 2834.140243] RSP: 002b:00007ffddb4e2980 EFLAGS: 00000202 +[ 2834.144615] RAX: 0000000000000000 RBX: 000014d6f402acb8 RCX: 0000000000000002 +[ 2834.148943] RDX: 0000000001cd6950 RSI: 0000000000000000 RDI: 00007ffddb4e3670 +[ 2834.153335] RBP: 00007ffddb4e3df0 R08: 0000000000000001 R09: 00007ffddb4e3870 +[ 2834.157774] R10: 000014d793da9dd5 R11: 000014d793e258d0 R12: 0000000000000002 +[ 2834.162132] R13: 0000000000000001 R14: 0000000000000001 R15: 000014d6f402d040 +[ 2834.166239] FS: 0000155554fa5740(0000) GS:ffff97213ba00000(0000) knlGS:0000000000000000 +[ 2834.170529] CS: 0033 DS: 0000 ES: 0000 CR0: 0000000080050033 +[ 2834.174751] CR2: 000014d793262eb0 CR3: 0000000036130000 CR4: 00000000007406f0 +[ 2834.178892] PKRU: 55555554 + +I run the application from a shell with `ulimit -s unlimited` (unlimited stack to size). + +The application creates a number of threads, and those threads make a lot of calls to sigaltstack() and mprotect(); see the relevant source for dotnet here https://github.com/dotnet/runtime/blob/15ec69e47b4dc56098e6058a11ccb6ae4d5d4fa1/src/coreclr/src/pal/src/thread/thread.cpp#L2467 + +using strace -f on the app shows that no alt stacks come anywhere near the failing address; all alt stacks are in the heap, as expected. None of the mmap/mprotect/munmap syscalls were given arguments in the high memory 0x7fffffff0000 and up. + +gdb (with default signal stop/print/pass semantics) does not report any signals prior to the kernel bug being tripped, so I doubt the alternate signal stack is actually used. + +When I run the same dotnet binary on the host (eg, on "bare metal"), the host kernel seems happy and dotnet runs as expected. + +I have not tried different qemu or guest or host O/S. + +A simpler case seems to produce the same error. See https://bugs.launchpad.net/qemu/+bug/1824344 + +I have confirmed that the dotnet guest application is executing a "iretq" instruction when this guest kernel bug is hit. A first round of analysis shows nothing unreasonable at the point the iretq is executed. The $rsp points into the middle of a mapped in page, the returned-to $rip looks reasonable, etc. We continue our analysis of qemu and the dotnet runtime. + +Is dotnet intentionally doing an iret? It seems like an odd thing for a userspace program to do, given it's basically "return from interrupt". + + +yes, it is intentional. I don't yet understand why, but am talking to those who do. https://github.com/dotnet/runtime/blob/1b02665be501b695b9c22c1ebd69148c07a225f6/src/coreclr/src/pal/src/arch/amd64/context2.S#L183 + + +Thanks, that's pretty clear. I expect you'll find the bug is just that QEMU doesn't get the semantics of an iret from userspace correct. The helper_iret_protected() function is probably a good place to look. + + +I've stepped/nexted from the helper_iret_protected, going deep into the bowels of the TLB, MMU and page table engine. None of which I understand. The helper_ret_protected faults in the first POPQ_RA. I'll investigate the value of sp at the time of the POPQ_RA. + +Here's the POPQ_RA in i386/seg_helper.c:2140 + + sp = env->regs[R_ESP]; + ssp = env->segs[R_SS].base; + new_eflags = 0; /* avoid warning */ +#ifdef TARGET_X86_64 + if (shift == 2) { + POPQ_RA(sp, new_eip, retaddr); + POPQ_RA(sp, new_cs, retaddr); + new_cs &= 0xffff; + if (is_iret) { + POPQ_RA(sp, new_eflags, retaddr); + } + +and here's the stack. Note some of the logical intermediate frames are optimized out due to -O3 and inline. (the value of env-errorcode is 1) + +0 0x0000555555a370c0 in raise_interrupt2 + (env=env@entry=0x5555566ef200, intno=14, is_int=is_int@entry=0, error_code=1, next_eip_addend=next_eip_addend@entry=0, retaddr=retaddr@entry=140736367565663) at /mnt/robhenry/qemu_robhenry_amd64/qemu/include/exec/cpu-all.h:426 +#1 0x0000555555a377f9 in raise_exception_err_ra + (env=env@entry=0x5555566ef200, exception_index=<optimized out>, error_code=<optimized out>, retaddr=retaddr@entry=140736367565663) at /mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/excp_helper.c:127 +#2 0x0000555555a37d69 in x86_cpu_tlb_fill + (cs=0x5555566e69a0, addr=140727872411616, size=<optimized out>, access_type=MMU_DATA_LOAD, mmu_idx=0, probe=<optimized out>, retaddr=140736367565663) at /mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/excp_helper.c:697 +#3 0x0000555555952295 in tlb_fill + (cpu=0x5555566e69a0, addr=140727872411616, size=8, access_type=MMU_DATA_LOAD, mmu_idx=0, retaddr=140736367565663) + at /mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1017 +#4 0x0000555555956320 in load_helper + (full_load=0x555555956140 <helper_le_ldq_mmu>, code_read=false, op=MO_64, retaddr=93825010692608, oi=48, addr=140727872411616, env=0x5555566ef200) at /mnt/robhenry/qemu_robhenry_amd64/qemu/include/exec/cpu-all.h:426 +#5 0x0000555555956320 in helper_le_ldq_mmu + (env=env@entry=0x5555566ef200, addr=addr@entry=140727872411616, oi=oi@entry=48, retaddr=retaddr@entry=140736367565663) + at /mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1688 +#6 0x0000555555956dc0 in cpu_load_helper + (full_load=0x555555956140 <helper_le_ldq_mmu>, op=MO_64, retaddr=140736367565663, mmu_idx=<optimized out>, addr=140727872411616, env=0x5555566ef200) at /mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1752 +#7 0x0000555555956dc0 in cpu_ldq_mmuidx_ra + (env=env@entry=0x5555566ef200, addr=addr@entry=140727872411616, mmu_idx=<optimized out>, ra=ra@entry=140736367565663) +--Type <RET> for more, q to quit, c to continue without paging-- + at /mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1799 +#8 0x0000555555a4ff09 in helper_ret_protected + (env=env@entry=0x5555566ef200, shift=shift@entry=2, is_iret=is_iret@entry=1, addend=addend@entry=0, retaddr=140736367565663) + at /mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/seg_helper.c:2140 +#9 0x0000555555a50ff5 in helper_iret_protected (env=0x5555566ef200, shift=2, next_eip=-999377888) + at /mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/seg_helper.c:2363 +#10 0x00007fffbd321b5f in code_gen_buffer () + + +Peter: I think your intuition is right. The POPQ_RA (pop quad, passing through return address handle) is only called from helper_ret_protected, and it suspiciously calls cpu_ldq_kernel_ra which calls cpu_mmu_index_kernel which only is prepared for kernel space iretq (and of course the substring _kernel in the function name tells us that too). + +That function is for "do a guest memory load as if from the kernel" (ie with the permissions and page table settings that guest kernel-mode accesses would use). We'd need to look at what the required semantics for the instruction are for user-mode iretq: are the loads supposed to be done with only the user-mode access and page tables? + + +Robert, please try replacing s/_kernel_/_data_/g. + +That will use the current address space and permissions as opposed to cpl=0 address space and permissions. + +The change should only be dynamically visible when doing an iretq from and to the same protection level, AFAICT. The code clearly[sic] works now for the interrupt return that is used by the linux kernel, presumably {from=kernel, to=kernel} or {from=kernel, to=user}. I would claim that to make this code correct it needs to work for all 4*4 state changes, as windows (notoriously) uses all 4 protection levels. I'm still digging, but your suggestion is certainly on the path forward. + +The QEMU project is currently considering to move its bug tracking to +another system. For this we need to know which bugs are still valid +and which could be closed already. Thus we are setting older bugs to +"Incomplete" now. + +If you still think this bug report here is valid, then please switch +the state back to "New" within the next 60 days, otherwise this report +will be marked as "Expired". Or please mark it as "Fix Released" if +the problem has been solved with a newer version of QEMU already. + +Thank you and sorry for the inconvenience. + + +The QEMU code implementing the iret insn is still doing loads/stores as if kernel mode, so this is still a bug. + + + +This is an automated cleanup. This bug report has been moved to QEMU's +new bug tracker on gitlab.com and thus gets marked as 'expired' now. +Please continue with the discussion here: + + https://gitlab.com/qemu-project/qemu/-/issues/249 + + |