Diffstat (limited to 'classification_output')
84 files changed, 59258 insertions, 0 deletions
diff --git a/classification_output/01/README.md b/classification_output/01/README.md
new file mode 100644
index 000000000..b71a384d1
--- /dev/null
+++ b/classification_output/01/README.md
@@ -0,0 +1,4 @@
+- instruction: 13
+- mistranslation: 14
+- semantic: 8
+- other: 48
diff --git a/classification_output/01/instruction/0966902 b/classification_output/01/instruction/0966902
new file mode 100644
index 000000000..80cdabd29
--- /dev/null
+++ b/classification_output/01/instruction/0966902
@@ -0,0 +1,39 @@
+instruction: 0.803
+semantic: 0.775
+mistranslation: 0.718
+other: 0.715
+
+[Bug] "-ht" flag ignored under KVM - guest still reports HT
+
+Hi Community,
+We have observed that the 'ht' feature bit cannot be disabled when QEMU runs
+with KVM acceleration.
+qemu-system-x86_64 \
+    --enable-kvm \
+    -machine q35 \
+    -cpu host,-ht \
+    -smp 4 \
+    -m 4G \
+    -drive file=rootfs.img,format=raw \
+    -nographic \
+    -append 'console=ttyS0 root=/dev/sda rw'
+Because '-ht' is specified, the guest should expose no HT capability
+(cpuid.1.edx[28] = 0) and /proc/cpuinfo shouldn't show the 'ht' flag, but we
+still see 'ht' in the Linux guest when running 'cat /proc/cpuinfo'.
+XiaoYao mentioned that:
+
+It has been the behavior of QEMU since
+
+    commit 400281af34e5ee6aa9f5496b53d8f82c6fef9319
+    Author: Andre Przywara <andre.przywara@amd.com>
+    Date: Wed Aug 19 15:42:42 2009 +0200
+
+    set CPUID bits to present cores and threads topology
+
+that we cannot remove the HT CPUID bit from the guest via "-cpu xxx,-ht" if the
+VM has >= 2 vcpus.
+I'd like to know whether there's a plan to address this issue, or if the current
+behaviour is considered acceptable.
+Best regards,
+Ewan.
+
diff --git a/classification_output/01/instruction/2609717 b/classification_output/01/instruction/2609717
new file mode 100644
index 000000000..b8e563ad9
--- /dev/null
+++ b/classification_output/01/instruction/2609717
@@ -0,0 +1,4939 @@
+instruction: 0.693
+mistranslation: 0.687
+semantic: 0.656
+other: 0.637
+
+[BUG] cxl can not create region
+
+Hi list
+
+I want to test CXL functions on arm64, and have hit some problems I can't
+figure out.
+
+My test environment:
+
+1. build latest bios from https://github.com/tianocore/edk2.git master
+branch (cc2db6ebfb6d9d85ba4c7b35fba1fa37fffc0bc2)
+2. build latest qemu-system-aarch64 from git://git.qemu.org/qemu.git master
+branch (846dcf0ba4eff824c295f06550b8673ff3f31314), with the cxl arm
+support patch:
+https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-Jonathan.Cameron@huawei.com/
+3. build Linux kernel from
+https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git preview
+branch (65fc1c3d26b96002a5aa1f4012fae4dc98fd5683)
+4. 
build latest ndctl tools from +https://github.com/pmem/ndctl +create_region branch(8558b394e449779e3a4f3ae90fae77ede0bca159) + +And my qemu test commands: +sudo $QEMU_BIN -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 \ + -cpu max -smp 8 -nographic -no-reboot \ + -kernel $KERNEL -bios $BIOS_BIN \ + -drive if=none,file=$ROOTFS,format=qcow2,id=hd \ + -device virtio-blk-pci,drive=hd -append 'root=/dev/vda1 +nokaslr dyndbg="module cxl* +p"' \ + -object memory-backend-ram,size=4G,id=mem0 \ + -numa node,nodeid=0,cpus=0-7,memdev=mem0 \ + -net nic -net user,hostfwd=tcp::2222-:22 -enable-kvm \ + -object +memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M +\ + -object +memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M +\ + -object +memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M +\ + -object +memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M +\ + -object +memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M +\ + -object +memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M +\ + -object +memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M +\ + -object +memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M +\ + -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ + -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \ + -device cxl-upstream,bus=root_port0,id=us0 \ + -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \ + -device +cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \ + -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \ + -device +cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \ + -device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \ + -device +cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \ + -device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \ + -device +cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \ + -M +cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k + +And I have got two problems. +1. When I want to create x1 region with command: "cxl create-region -d +decoder0.0 -w 1 -g 4096 mem0", kernel crashed with null pointer +reference. 
Crash log: + +[ 534.697324] cxl_region region0: config state: 0 +[ 534.697346] cxl_region region0: probe: -6 +[ 534.697368] cxl_acpi ACPI0017:00: decoder0.0: created region0 +[ 534.699115] cxl region0: mem0:endpoint3 decoder3.0 add: +mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1 +[ 534.699149] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1 +[ 534.699167] cxl region0: ACPI0016:00:port1 decoder1.0 add: +mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1 +[ 534.699176] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256 +[ 534.699182] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0 +for mem0:decoder3.0 @ 0 +[ 534.699189] cxl region0: 0000:0d:00.0:port2 iw: 1 ig: 256 +[ 534.699193] cxl region0: 0000:0d:00.0:port2 target[0] = +0000:0e:00.0 for mem0:decoder3.0 @ 0 +[ 534.699405] Unable to handle kernel NULL pointer dereference at +virtual address 0000000000000000 +[ 534.701474] Mem abort info: +[ 534.701994] ESR = 0x0000000086000004 +[ 534.702653] EC = 0x21: IABT (current EL), IL = 32 bits +[ 534.703616] SET = 0, FnV = 0 +[ 534.704174] EA = 0, S1PTW = 0 +[ 534.704803] FSC = 0x04: level 0 translation fault +[ 534.705694] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010144a000 +[ 534.706875] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 +[ 534.709855] Internal error: Oops: 86000004 [#1] PREEMPT SMP +[ 534.710301] Modules linked in: +[ 534.710546] CPU: 7 PID: 331 Comm: cxl Not tainted +5.19.0-rc3-00064-g65fc1c3d26b9-dirty #11 +[ 534.715393] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 +[ 534.717179] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) +[ 534.719190] pc : 0x0 +[ 534.719928] lr : commit_store+0x118/0x2cc +[ 534.721007] sp : ffff80000aec3c30 +[ 534.721793] x29: ffff80000aec3c30 x28: ffff0000da62e740 x27: ffff0000c0c06b30 +[ 534.723875] x26: 0000000000000000 x25: ffff0000c0a2a400 x24: ffff0000c0a29400 +[ 534.725440] x23: 0000000000000003 x22: 0000000000000000 x21: ffff0000c0c06800 +[ 534.727312] x20: 0000000000000000 x19: ffff0000c1559800 x18: 0000000000000000 +[ 534.729138] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffd41fe838 +[ 534.731046] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 +[ 534.732402] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 +[ 534.734432] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff0000c0906e80 +[ 534.735921] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff80000aec3bf0 +[ 534.737437] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c155a000 +[ 534.738878] Call trace: +[ 534.739368] 0x0 +[ 534.739713] dev_attr_store+0x1c/0x30 +[ 534.740186] sysfs_kf_write+0x48/0x58 +[ 534.740961] kernfs_fop_write_iter+0x128/0x184 +[ 534.741872] new_sync_write+0xdc/0x158 +[ 534.742706] vfs_write+0x1ac/0x2a8 +[ 534.743440] ksys_write+0x68/0xf0 +[ 534.744328] __arm64_sys_write+0x1c/0x28 +[ 534.745180] invoke_syscall+0x44/0xf0 +[ 534.745989] el0_svc_common+0x4c/0xfc +[ 534.746661] do_el0_svc+0x60/0xa8 +[ 534.747378] el0_svc+0x2c/0x78 +[ 534.748066] el0t_64_sync_handler+0xb8/0x12c +[ 534.748919] el0t_64_sync+0x18c/0x190 +[ 534.749629] Code: bad PC value +[ 534.750169] ---[ end trace 0000000000000000 ]--- + +2. When I want to create x4 region with command: "cxl create-region -d +decoder0.0 -w 4 -g 4096 -m mem0 mem1 mem2 mem3". 
I got below errors: + +cxl region: create_region: region0: failed to set target3 to mem3 +cxl region: cmd_create_region: created 0 regions + +And kernel log as below: +[ 60.536663] cxl_region region0: config state: 0 +[ 60.536675] cxl_region region0: probe: -6 +[ 60.536696] cxl_acpi ACPI0017:00: decoder0.0: created region0 +[ 60.538251] cxl region0: mem0:endpoint3 decoder3.0 add: +mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1 +[ 60.538278] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1 +[ 60.538295] cxl region0: ACPI0016:00:port1 decoder1.0 add: +mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1 +[ 60.538647] cxl region0: mem1:endpoint4 decoder4.0 add: +mem1:decoder4.0 @ 1 next: none nr_eps: 1 nr_targets: 1 +[ 60.538663] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +mem1:decoder4.0 @ 1 next: mem1 nr_eps: 2 nr_targets: 2 +[ 60.538675] cxl region0: ACPI0016:00:port1 decoder1.0 add: +mem1:decoder4.0 @ 1 next: 0000:0d:00.0 nr_eps: 2 nr_targets: 1 +[ 60.539311] cxl region0: mem2:endpoint5 decoder5.0 add: +mem2:decoder5.0 @ 2 next: none nr_eps: 1 nr_targets: 1 +[ 60.539332] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +mem2:decoder5.0 @ 2 next: mem2 nr_eps: 3 nr_targets: 3 +[ 60.539343] cxl region0: ACPI0016:00:port1 decoder1.0 add: +mem2:decoder5.0 @ 2 next: 0000:0d:00.0 nr_eps: 3 nr_targets: 1 +[ 60.539711] cxl region0: mem3:endpoint6 decoder6.0 add: +mem3:decoder6.0 @ 3 next: none nr_eps: 1 nr_targets: 1 +[ 60.539723] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +mem3:decoder6.0 @ 3 next: mem3 nr_eps: 4 nr_targets: 4 +[ 60.539735] cxl region0: ACPI0016:00:port1 decoder1.0 add: +mem3:decoder6.0 @ 3 next: 0000:0d:00.0 nr_eps: 4 nr_targets: 1 +[ 60.539742] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256 +[ 60.539747] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0 +for mem0:decoder3.0 @ 0 +[ 60.539754] cxl region0: 0000:0d:00.0:port2 iw: 4 ig: 512 +[ 60.539758] cxl region0: 0000:0d:00.0:port2 target[0] = +0000:0e:00.0 for mem0:decoder3.0 @ 0 +[ 60.539764] cxl region0: ACPI0016:00:port1: cannot host mem1:decoder4.0 at 1 + +I have tried to write sysfs node manually, got same errors. + +Hope I can get some helps here. + +Bob + +On Fri, 5 Aug 2022 10:20:23 +0800 +Bobo WL <lmw.bobo@gmail.com> wrote: + +> +Hi list +> +> +I want to test cxl functions in arm64, and found some problems I can't +> +figure out. +Hi Bob, + +Glad to see people testing this code. + +> +> +My test environment: +> +> +1. build latest bios from +https://github.com/tianocore/edk2.git +master +> +branch(cc2db6ebfb6d9d85ba4c7b35fba1fa37fffc0bc2) +> +2. build latest qemu-system-aarch64 from git://git.qemu.org/qemu.git +> +master branch(846dcf0ba4eff824c295f06550b8673ff3f31314). With cxl arm +> +support patch: +> +https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-Jonathan.Cameron@huawei.com/ +> +3. build Linux kernel from +> +https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git +preview +> +branch(65fc1c3d26b96002a5aa1f4012fae4dc98fd5683) +> +4. 
build latest ndctl tools from +https://github.com/pmem/ndctl +> +create_region branch(8558b394e449779e3a4f3ae90fae77ede0bca159) +> +> +And my qemu test commands: +> +sudo $QEMU_BIN -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 \ +> +-cpu max -smp 8 -nographic -no-reboot \ +> +-kernel $KERNEL -bios $BIOS_BIN \ +> +-drive if=none,file=$ROOTFS,format=qcow2,id=hd \ +> +-device virtio-blk-pci,drive=hd -append 'root=/dev/vda1 +> +nokaslr dyndbg="module cxl* +p"' \ +> +-object memory-backend-ram,size=4G,id=mem0 \ +> +-numa node,nodeid=0,cpus=0-7,memdev=mem0 \ +> +-net nic -net user,hostfwd=tcp::2222-:22 -enable-kvm \ +> +-object +> +memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M +> +\ +> +-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ +> +-device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \ +Probably not related to your problem, but there is a disconnect in QEMU / +kernel assumptionsaround the presence of an HDM decoder when a HB only +has a single root port. Spec allows it to be provided or not as an +implementation choice. +Kernel assumes it isn't provide. Qemu assumes it is. + +The temporary solution is to throw in a second root port on the HB and not +connect anything to it. Longer term I may special case this so that the +particular +decoder defaults to pass through settings in QEMU if there is only one root +port. + +> +-device cxl-upstream,bus=root_port0,id=us0 \ +> +-device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \ +> +-device +> +cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \ +> +-device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \ +> +-device +> +cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \ +> +-device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \ +> +-device +> +cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \ +> +-device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \ +> +-device +> +cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \ +> +-M +> +cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k +> +> +And I have got two problems. +> +1. When I want to create x1 region with command: "cxl create-region -d +> +decoder0.0 -w 1 -g 4096 mem0", kernel crashed with null pointer +> +reference. Crash log: +> +> +[ 534.697324] cxl_region region0: config state: 0 +> +[ 534.697346] cxl_region region0: probe: -6 +Seems odd this is up here. But maybe fine. 
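(A minimal sketch of the workaround described above: add a second, unconnected root port under the same pxb-cxl host bridge so the kernel's and QEMU's assumptions about the HDM decoder line up. The root_port1 id and its chassis/slot values are illustrative, not taken from the original command line.)

  -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \
  -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \
  -device cxl-rp,port=1,bus=cxl.1,id=root_port1,chassis=0,slot=1 \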
+ +> +[ 534.697368] cxl_acpi ACPI0017:00: decoder0.0: created region0 +> +[ 534.699115] cxl region0: mem0:endpoint3 decoder3.0 add: +> +mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1 +> +[ 534.699149] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1 +> +[ 534.699167] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1 +> +[ 534.699176] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256 +> +[ 534.699182] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0 +> +for mem0:decoder3.0 @ 0 +> +[ 534.699189] cxl region0: 0000:0d:00.0:port2 iw: 1 ig: 256 +> +[ 534.699193] cxl region0: 0000:0d:00.0:port2 target[0] = +> +0000:0e:00.0 for mem0:decoder3.0 @ 0 +> +[ 534.699405] Unable to handle kernel NULL pointer dereference at +> +virtual address 0000000000000000 +> +[ 534.701474] Mem abort info: +> +[ 534.701994] ESR = 0x0000000086000004 +> +[ 534.702653] EC = 0x21: IABT (current EL), IL = 32 bits +> +[ 534.703616] SET = 0, FnV = 0 +> +[ 534.704174] EA = 0, S1PTW = 0 +> +[ 534.704803] FSC = 0x04: level 0 translation fault +> +[ 534.705694] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010144a000 +> +[ 534.706875] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 +> +[ 534.709855] Internal error: Oops: 86000004 [#1] PREEMPT SMP +> +[ 534.710301] Modules linked in: +> +[ 534.710546] CPU: 7 PID: 331 Comm: cxl Not tainted +> +5.19.0-rc3-00064-g65fc1c3d26b9-dirty #11 +> +[ 534.715393] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 +> +[ 534.717179] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) +> +[ 534.719190] pc : 0x0 +> +[ 534.719928] lr : commit_store+0x118/0x2cc +> +[ 534.721007] sp : ffff80000aec3c30 +> +[ 534.721793] x29: ffff80000aec3c30 x28: ffff0000da62e740 x27: +> +ffff0000c0c06b30 +> +[ 534.723875] x26: 0000000000000000 x25: ffff0000c0a2a400 x24: +> +ffff0000c0a29400 +> +[ 534.725440] x23: 0000000000000003 x22: 0000000000000000 x21: +> +ffff0000c0c06800 +> +[ 534.727312] x20: 0000000000000000 x19: ffff0000c1559800 x18: +> +0000000000000000 +> +[ 534.729138] x17: 0000000000000000 x16: 0000000000000000 x15: +> +0000ffffd41fe838 +> +[ 534.731046] x14: 0000000000000000 x13: 0000000000000000 x12: +> +0000000000000000 +> +[ 534.732402] x11: 0000000000000000 x10: 0000000000000000 x9 : +> +0000000000000000 +> +[ 534.734432] x8 : 0000000000000000 x7 : 0000000000000000 x6 : +> +ffff0000c0906e80 +> +[ 534.735921] x5 : 0000000000000000 x4 : 0000000000000000 x3 : +> +ffff80000aec3bf0 +> +[ 534.737437] x2 : 0000000000000000 x1 : 0000000000000000 x0 : +> +ffff0000c155a000 +> +[ 534.738878] Call trace: +> +[ 534.739368] 0x0 +> +[ 534.739713] dev_attr_store+0x1c/0x30 +> +[ 534.740186] sysfs_kf_write+0x48/0x58 +> +[ 534.740961] kernfs_fop_write_iter+0x128/0x184 +> +[ 534.741872] new_sync_write+0xdc/0x158 +> +[ 534.742706] vfs_write+0x1ac/0x2a8 +> +[ 534.743440] ksys_write+0x68/0xf0 +> +[ 534.744328] __arm64_sys_write+0x1c/0x28 +> +[ 534.745180] invoke_syscall+0x44/0xf0 +> +[ 534.745989] el0_svc_common+0x4c/0xfc +> +[ 534.746661] do_el0_svc+0x60/0xa8 +> +[ 534.747378] el0_svc+0x2c/0x78 +> +[ 534.748066] el0t_64_sync_handler+0xb8/0x12c +> +[ 534.748919] el0t_64_sync+0x18c/0x190 +> +[ 534.749629] Code: bad PC value +> +[ 534.750169] ---[ end trace 0000000000000000 ]--- +> +> +2. When I want to create x4 region with command: "cxl create-region -d +> +decoder0.0 -w 4 -g 4096 -m mem0 mem1 mem2 mem3". 
I got below errors: +> +> +cxl region: create_region: region0: failed to set target3 to mem3 +> +cxl region: cmd_create_region: created 0 regions +> +> +And kernel log as below: +> +[ 60.536663] cxl_region region0: config state: 0 +> +[ 60.536675] cxl_region region0: probe: -6 +> +[ 60.536696] cxl_acpi ACPI0017:00: decoder0.0: created region0 +> +[ 60.538251] cxl region0: mem0:endpoint3 decoder3.0 add: +> +mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.538278] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1 +> +[ 60.538295] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1 +> +[ 60.538647] cxl region0: mem1:endpoint4 decoder4.0 add: +> +mem1:decoder4.0 @ 1 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.538663] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem1:decoder4.0 @ 1 next: mem1 nr_eps: 2 nr_targets: 2 +> +[ 60.538675] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem1:decoder4.0 @ 1 next: 0000:0d:00.0 nr_eps: 2 nr_targets: 1 +> +[ 60.539311] cxl region0: mem2:endpoint5 decoder5.0 add: +> +mem2:decoder5.0 @ 2 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.539332] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem2:decoder5.0 @ 2 next: mem2 nr_eps: 3 nr_targets: 3 +> +[ 60.539343] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem2:decoder5.0 @ 2 next: 0000:0d:00.0 nr_eps: 3 nr_targets: 1 +> +[ 60.539711] cxl region0: mem3:endpoint6 decoder6.0 add: +> +mem3:decoder6.0 @ 3 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.539723] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem3:decoder6.0 @ 3 next: mem3 nr_eps: 4 nr_targets: 4 +> +[ 60.539735] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem3:decoder6.0 @ 3 next: 0000:0d:00.0 nr_eps: 4 nr_targets: 1 +> +[ 60.539742] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256 +> +[ 60.539747] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0 +> +for mem0:decoder3.0 @ 0 +> +[ 60.539754] cxl region0: 0000:0d:00.0:port2 iw: 4 ig: 512 +This looks like off by 1 that should be fixed in the below mentioned +cxl/pending branch. That ig should be 256. Note the fix was +for a test case with a fat HB and no switch, but certainly looks +like this is the same issue. + +> +[ 60.539758] cxl region0: 0000:0d:00.0:port2 target[0] = +> +0000:0e:00.0 for mem0:decoder3.0 @ 0 +> +[ 60.539764] cxl region0: ACPI0016:00:port1: cannot host mem1:decoder4.0 at +> +1 +> +> +I have tried to write sysfs node manually, got same errors. +When stepping through by hand, which sysfs write triggers the crash above? + +Not sure it's related, but I've just sent out a fix to the +target register handling in QEMU. +20220808122051.14822-1-Jonathan.Cameron@huawei.com +/T/#m47ff985412ce44559e6b04d677c302f8cd371330">https://lore.kernel.org/linux-cxl/ +20220808122051.14822-1-Jonathan.Cameron@huawei.com +/T/#m47ff985412ce44559e6b04d677c302f8cd371330 +I did have one instance last week of triggering what looked to be a race +condition but +the stack trace doesn't looks related to what you've hit. + +It will probably be a few days before I have time to take a look at replicating +what you have seen. + +If you have time, try using the kernel.org cxl/pending branch as there are +a few additional fixes on there since you sent this email. Optimistic to hope +this is covered by one of those, but at least it will mean we are trying to +replicate +on same branch. + +Jonathan + + +> +> +Hope I can get some helps here. 
+> +> +Bob + +Hi Jonathan + +Thanks for your reply! + +On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +<Jonathan.Cameron@huawei.com> wrote: +> +> +Probably not related to your problem, but there is a disconnect in QEMU / +> +kernel assumptionsaround the presence of an HDM decoder when a HB only +> +has a single root port. Spec allows it to be provided or not as an +> +implementation choice. +> +Kernel assumes it isn't provide. Qemu assumes it is. +> +> +The temporary solution is to throw in a second root port on the HB and not +> +connect anything to it. Longer term I may special case this so that the +> +particular +> +decoder defaults to pass through settings in QEMU if there is only one root +> +port. +> +You are right! After adding an extra HB in qemu, I can create a x1 +region successfully. +But have some errors in Nvdimm: + +[ 74.925838] Unknown online node for memory at 0x10000000000, assuming node 0 +[ 74.925846] Unknown target node for memory at 0x10000000000, assuming node 0 +[ 74.927470] nd_region region0: nmem0: is disabled, failing probe + +And x4 region still failed with same errors, using latest cxl/preview +branch don't work. +I have picked "Two CXL emulation fixes" patches in qemu, still not working. + +Bob + +On Tue, 9 Aug 2022 21:07:06 +0800 +Bobo WL <lmw.bobo@gmail.com> wrote: + +> +Hi Jonathan +> +> +Thanks for your reply! +> +> +On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +<Jonathan.Cameron@huawei.com> wrote: +> +> +> +> Probably not related to your problem, but there is a disconnect in QEMU / +> +> kernel assumptionsaround the presence of an HDM decoder when a HB only +> +> has a single root port. Spec allows it to be provided or not as an +> +> implementation choice. +> +> Kernel assumes it isn't provide. Qemu assumes it is. +> +> +> +> The temporary solution is to throw in a second root port on the HB and not +> +> connect anything to it. Longer term I may special case this so that the +> +> particular +> +> decoder defaults to pass through settings in QEMU if there is only one root +> +> port. +> +> +> +> +You are right! After adding an extra HB in qemu, I can create a x1 +> +region successfully. +> +But have some errors in Nvdimm: +> +> +[ 74.925838] Unknown online node for memory at 0x10000000000, assuming node > 0 +> +[ 74.925846] Unknown target node for memory at 0x10000000000, assuming node > 0 +> +[ 74.927470] nd_region region0: nmem0: is disabled, failing probe +Ah. I've seen this one, but not chased it down yet. Was on my todo list to +chase +down. Once I reach this state I can verify the HDM Decode is correct which is +what +I've been using to test (Which wasn't true until earlier this week). +I'm currently testing via devmem, more for historical reasons than because it +makes +that much sense anymore. + +> +> +And x4 region still failed with same errors, using latest cxl/preview +> +branch don't work. +> +I have picked "Two CXL emulation fixes" patches in qemu, still not working. +> +> +Bob + +On Tue, 9 Aug 2022 17:08:25 +0100 +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: + +> +On Tue, 9 Aug 2022 21:07:06 +0800 +> +Bobo WL <lmw.bobo@gmail.com> wrote: +> +> +> Hi Jonathan +> +> +> +> Thanks for your reply! +> +> +> +> On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> <Jonathan.Cameron@huawei.com> wrote: +> +> > +> +> > Probably not related to your problem, but there is a disconnect in QEMU / +> +> > kernel assumptionsaround the presence of an HDM decoder when a HB only +> +> > has a single root port. 
Spec allows it to be provided or not as an +> +> > implementation choice. +> +> > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > +> +> > The temporary solution is to throw in a second root port on the HB and not +> +> > connect anything to it. Longer term I may special case this so that the +> +> > particular +> +> > decoder defaults to pass through settings in QEMU if there is only one +> +> > root port. +> +> > +> +> +> +> You are right! After adding an extra HB in qemu, I can create a x1 +> +> region successfully. +> +> But have some errors in Nvdimm: +> +> +> +> [ 74.925838] Unknown online node for memory at 0x10000000000, assuming +> +> node 0 +> +> [ 74.925846] Unknown target node for memory at 0x10000000000, assuming +> +> node 0 +> +> [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> +Ah. I've seen this one, but not chased it down yet. Was on my todo list to +> +chase +> +down. Once I reach this state I can verify the HDM Decode is correct which is +> +what +> +I've been using to test (Which wasn't true until earlier this week). +> +I'm currently testing via devmem, more for historical reasons than because it +> +makes +> +that much sense anymore. +*embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +I'd forgotten that was still on the todo list. I don't think it will +be particularly hard to do and will take a look in next few days. + +Very very indirectly this error is causing a driver probe fail that means that +we hit a code path that has a rather odd looking check on NDD_LABELING. +Should not have gotten near that path though - hence the problem is actually +when we call cxl_pmem_get_config_data() and it returns an error because +we haven't fully connected up the command in QEMU. + +Jonathan + + +> +> +> +> +> And x4 region still failed with same errors, using latest cxl/preview +> +> branch don't work. +> +> I have picked "Two CXL emulation fixes" patches in qemu, still not working. +> +> +> +> Bob + +On Thu, 11 Aug 2022 18:08:57 +0100 +Jonathan Cameron via <qemu-devel@nongnu.org> wrote: + +> +On Tue, 9 Aug 2022 17:08:25 +0100 +> +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> On Tue, 9 Aug 2022 21:07:06 +0800 +> +> Bobo WL <lmw.bobo@gmail.com> wrote: +> +> +> +> > Hi Jonathan +> +> > +> +> > Thanks for your reply! +> +> > +> +> > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > <Jonathan.Cameron@huawei.com> wrote: +> +> > > +> +> > > Probably not related to your problem, but there is a disconnect in QEMU +> +> > > / +> +> > > kernel assumptionsaround the presence of an HDM decoder when a HB only +> +> > > has a single root port. Spec allows it to be provided or not as an +> +> > > implementation choice. +> +> > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > +> +> > > The temporary solution is to throw in a second root port on the HB and +> +> > > not +> +> > > connect anything to it. Longer term I may special case this so that +> +> > > the particular +> +> > > decoder defaults to pass through settings in QEMU if there is only one +> +> > > root port. +> +> > > +> +> > +> +> > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > region successfully. 
+> +> > But have some errors in Nvdimm: +> +> > +> +> > [ 74.925838] Unknown online node for memory at 0x10000000000, assuming +> +> > node 0 +> +> > [ 74.925846] Unknown target node for memory at 0x10000000000, assuming +> +> > node 0 +> +> > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> +> +> Ah. I've seen this one, but not chased it down yet. Was on my todo list to +> +> chase +> +> down. Once I reach this state I can verify the HDM Decode is correct which +> +> is what +> +> I've been using to test (Which wasn't true until earlier this week). +> +> I'm currently testing via devmem, more for historical reasons than because +> +> it makes +> +> that much sense anymore. +> +> +*embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +I'd forgotten that was still on the todo list. I don't think it will +> +be particularly hard to do and will take a look in next few days. +> +> +Very very indirectly this error is causing a driver probe fail that means that +> +we hit a code path that has a rather odd looking check on NDD_LABELING. +> +Should not have gotten near that path though - hence the problem is actually +> +when we call cxl_pmem_get_config_data() and it returns an error because +> +we haven't fully connected up the command in QEMU. +So a least one bug in QEMU. We were not supporting variable length payloads on +mailbox +inputs (but were on outputs). That hasn't mattered until we get to LSA writes. +We just need to relax condition on the supplied length. + +diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +index c352a935c4..fdda9529fe 100644 +--- a/hw/cxl/cxl-mailbox-utils.c ++++ b/hw/cxl/cxl-mailbox-utils.c +@@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) + cxl_cmd = &cxl_cmd_set[set][cmd]; + h = cxl_cmd->handler; + if (h) { +- if (len == cxl_cmd->in) { ++ if (len == cxl_cmd->in || !cxl_cmd->in) { + cxl_cmd->payload = cxl_dstate->mbox_reg_state + + A_CXL_DEV_CMD_PAYLOAD; + ret = (*h)(cxl_cmd, cxl_dstate, &len); + + +This lets the nvdimm/region probe fine, but I'm getting some issues with +namespace capacity so I'll look at what is causing that next. +Unfortunately I'm not that familiar with the driver/nvdimm side of things +so it's take a while to figure out what kicks off what! + +Jonathan + +> +> +Jonathan +> +> +> +> +> +> > +> +> > And x4 region still failed with same errors, using latest cxl/preview +> +> > branch don't work. +> +> > I have picked "Two CXL emulation fixes" patches in qemu, still not +> +> > working. +> +> > +> +> > Bob +> +> + +Jonathan Cameron wrote: +> +On Thu, 11 Aug 2022 18:08:57 +0100 +> +Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> +> On Tue, 9 Aug 2022 17:08:25 +0100 +> +> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> +> > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > +> +> > > Hi Jonathan +> +> > > +> +> > > Thanks for your reply! +> +> > > +> +> > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > +> +> > > > Probably not related to your problem, but there is a disconnect in +> +> > > > QEMU / +> +> > > > kernel assumptionsaround the presence of an HDM decoder when a HB only +> +> > > > has a single root port. Spec allows it to be provided or not as an +> +> > > > implementation choice. +> +> > > > Kernel assumes it isn't provide. Qemu assumes it is. 
+> +> > > > +> +> > > > The temporary solution is to throw in a second root port on the HB +> +> > > > and not +> +> > > > connect anything to it. Longer term I may special case this so that +> +> > > > the particular +> +> > > > decoder defaults to pass through settings in QEMU if there is only +> +> > > > one root port. +> +> > > > +> +> > > +> +> > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > region successfully. +> +> > > But have some errors in Nvdimm: +> +> > > +> +> > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > assuming node 0 +> +> > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > assuming node 0 +> +> > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> > +> +> > Ah. I've seen this one, but not chased it down yet. Was on my todo list +> +> > to chase +> +> > down. Once I reach this state I can verify the HDM Decode is correct +> +> > which is what +> +> > I've been using to test (Which wasn't true until earlier this week). +> +> > I'm currently testing via devmem, more for historical reasons than +> +> > because it makes +> +> > that much sense anymore. +> +> +> +> *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> I'd forgotten that was still on the todo list. I don't think it will +> +> be particularly hard to do and will take a look in next few days. +> +> +> +> Very very indirectly this error is causing a driver probe fail that means +> +> that +> +> we hit a code path that has a rather odd looking check on NDD_LABELING. +> +> Should not have gotten near that path though - hence the problem is actually +> +> when we call cxl_pmem_get_config_data() and it returns an error because +> +> we haven't fully connected up the command in QEMU. +> +> +So a least one bug in QEMU. We were not supporting variable length payloads +> +on mailbox +> +inputs (but were on outputs). That hasn't mattered until we get to LSA +> +writes. +> +We just need to relax condition on the supplied length. +> +> +diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +index c352a935c4..fdda9529fe 100644 +> +--- a/hw/cxl/cxl-mailbox-utils.c +> ++++ b/hw/cxl/cxl-mailbox-utils.c +> +@@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +cxl_cmd = &cxl_cmd_set[set][cmd]; +> +h = cxl_cmd->handler; +> +if (h) { +> +- if (len == cxl_cmd->in) { +> ++ if (len == cxl_cmd->in || !cxl_cmd->in) { +> +cxl_cmd->payload = cxl_dstate->mbox_reg_state + +> +A_CXL_DEV_CMD_PAYLOAD; +> +ret = (*h)(cxl_cmd, cxl_dstate, &len); +> +> +> +This lets the nvdimm/region probe fine, but I'm getting some issues with +> +namespace capacity so I'll look at what is causing that next. +> +Unfortunately I'm not that familiar with the driver/nvdimm side of things +> +so it's take a while to figure out what kicks off what! +The whirlwind tour is that 'struct nd_region' instances that represent a +persitent memory address range are composed of one more mappings of +'struct nvdimm' objects. The nvdimm object is driven by the dimm driver +in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking +the dimm (if locked) and interrogating the label area to look for +namespace labels. + +The label command calls are routed to the '->ndctl()' callback that was +registered when the CXL nvdimm_bus_descriptor was created. That callback +handles both 'bus' scope calls, currently none for CXL, and per nvdimm +calls. 
cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands +to CXL commands. + +The 'struct nvdimm' objects that the CXL side registers have the +NDD_LABELING flag set which means that namespaces need to be explicitly +created / provisioned from region capacity. Otherwise, if +drivers/nvdimm/dimm.c does not find a namespace-label-index block then +the region reverts to label-less mode and a default namespace equal to +the size of the region is instantiated. + +If you are seeing small mismatches in namespace capacity then it may +just be the fact that by default 'ndctl create-namespace' results in an +'fsdax' mode namespace which just means that it is a block device where +1.5% of the capacity is reserved for 'struct page' metadata. You should +be able to see namespace capacity == region capacity by doing "ndctl +create-namespace -m raw", and disable DAX operation. + +Hope that helps. + +On Fri, 12 Aug 2022 09:03:02 -0700 +Dan Williams <dan.j.williams@intel.com> wrote: + +> +Jonathan Cameron wrote: +> +> On Thu, 11 Aug 2022 18:08:57 +0100 +> +> Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> +> +> > On Tue, 9 Aug 2022 17:08:25 +0100 +> +> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> > +> +> > > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > > +> +> > > > Hi Jonathan +> +> > > > +> +> > > > Thanks for your reply! +> +> > > > +> +> > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > > +> +> > > > > Probably not related to your problem, but there is a disconnect in +> +> > > > > QEMU / +> +> > > > > kernel assumptionsaround the presence of an HDM decoder when a HB +> +> > > > > only +> +> > > > > has a single root port. Spec allows it to be provided or not as an +> +> > > > > implementation choice. +> +> > > > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > > > +> +> > > > > The temporary solution is to throw in a second root port on the HB +> +> > > > > and not +> +> > > > > connect anything to it. Longer term I may special case this so +> +> > > > > that the particular +> +> > > > > decoder defaults to pass through settings in QEMU if there is only +> +> > > > > one root port. +> +> > > > > +> +> > > > +> +> > > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > > region successfully. +> +> > > > But have some errors in Nvdimm: +> +> > > > +> +> > > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > > assuming node 0 +> +> > > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > > assuming node 0 +> +> > > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> > > > +> +> > > +> +> > > Ah. I've seen this one, but not chased it down yet. Was on my todo +> +> > > list to chase +> +> > > down. Once I reach this state I can verify the HDM Decode is correct +> +> > > which is what +> +> > > I've been using to test (Which wasn't true until earlier this week). +> +> > > I'm currently testing via devmem, more for historical reasons than +> +> > > because it makes +> +> > > that much sense anymore. +> +> > +> +> > *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> > I'd forgotten that was still on the todo list. I don't think it will +> +> > be particularly hard to do and will take a look in next few days. 
+> +> > +> +> > Very very indirectly this error is causing a driver probe fail that means +> +> > that +> +> > we hit a code path that has a rather odd looking check on NDD_LABELING. +> +> > Should not have gotten near that path though - hence the problem is +> +> > actually +> +> > when we call cxl_pmem_get_config_data() and it returns an error because +> +> > we haven't fully connected up the command in QEMU. +> +> +> +> So a least one bug in QEMU. We were not supporting variable length payloads +> +> on mailbox +> +> inputs (but were on outputs). That hasn't mattered until we get to LSA +> +> writes. +> +> We just need to relax condition on the supplied length. +> +> +> +> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +> index c352a935c4..fdda9529fe 100644 +> +> --- a/hw/cxl/cxl-mailbox-utils.c +> +> +++ b/hw/cxl/cxl-mailbox-utils.c +> +> @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +> cxl_cmd = &cxl_cmd_set[set][cmd]; +> +> h = cxl_cmd->handler; +> +> if (h) { +> +> - if (len == cxl_cmd->in) { +> +> + if (len == cxl_cmd->in || !cxl_cmd->in) { +> +> cxl_cmd->payload = cxl_dstate->mbox_reg_state + +> +> A_CXL_DEV_CMD_PAYLOAD; +> +> ret = (*h)(cxl_cmd, cxl_dstate, &len); +> +> +> +> +> +> This lets the nvdimm/region probe fine, but I'm getting some issues with +> +> namespace capacity so I'll look at what is causing that next. +> +> Unfortunately I'm not that familiar with the driver/nvdimm side of things +> +> so it's take a while to figure out what kicks off what! +> +> +The whirlwind tour is that 'struct nd_region' instances that represent a +> +persitent memory address range are composed of one more mappings of +> +'struct nvdimm' objects. The nvdimm object is driven by the dimm driver +> +in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking +> +the dimm (if locked) and interrogating the label area to look for +> +namespace labels. +> +> +The label command calls are routed to the '->ndctl()' callback that was +> +registered when the CXL nvdimm_bus_descriptor was created. That callback +> +handles both 'bus' scope calls, currently none for CXL, and per nvdimm +> +calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands +> +to CXL commands. +> +> +The 'struct nvdimm' objects that the CXL side registers have the +> +NDD_LABELING flag set which means that namespaces need to be explicitly +> +created / provisioned from region capacity. Otherwise, if +> +drivers/nvdimm/dimm.c does not find a namespace-label-index block then +> +the region reverts to label-less mode and a default namespace equal to +> +the size of the region is instantiated. +> +> +If you are seeing small mismatches in namespace capacity then it may +> +just be the fact that by default 'ndctl create-namespace' results in an +> +'fsdax' mode namespace which just means that it is a block device where +> +1.5% of the capacity is reserved for 'struct page' metadata. You should +> +be able to see namespace capacity == region capacity by doing "ndctl +> +create-namespace -m raw", and disable DAX operation. +Currently ndctl create-namespace crashes qemu ;) +Which isn't ideal! + +> +> +Hope that helps. +Got me looking at the right code. Thanks! 
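(A usage sketch of the suggestion above, assuming the region from this thread is region0: a raw-mode namespace skips the ~1.5% 'struct page' reservation made by the default fsdax mode, so namespace capacity should match region capacity.)

  ndctl create-namespace -m raw -r region0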
+ +Jonathan + +On Fri, 12 Aug 2022 17:15:09 +0100 +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: + +> +On Fri, 12 Aug 2022 09:03:02 -0700 +> +Dan Williams <dan.j.williams@intel.com> wrote: +> +> +> Jonathan Cameron wrote: +> +> > On Thu, 11 Aug 2022 18:08:57 +0100 +> +> > Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> > +> +> > > On Tue, 9 Aug 2022 17:08:25 +0100 +> +> > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> > > +> +> > > > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > > > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > > > +> +> > > > > Hi Jonathan +> +> > > > > +> +> > > > > Thanks for your reply! +> +> > > > > +> +> > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > > > +> +> > > > > > Probably not related to your problem, but there is a disconnect +> +> > > > > > in QEMU / +> +> > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB +> +> > > > > > only +> +> > > > > > has a single root port. Spec allows it to be provided or not as +> +> > > > > > an implementation choice. +> +> > > > > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > > > > +> +> > > > > > The temporary solution is to throw in a second root port on the +> +> > > > > > HB and not +> +> > > > > > connect anything to it. Longer term I may special case this so +> +> > > > > > that the particular +> +> > > > > > decoder defaults to pass through settings in QEMU if there is +> +> > > > > > only one root port. +> +> > > > > > +> +> > > > > +> +> > > > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > > > region successfully. +> +> > > > > But have some errors in Nvdimm: +> +> > > > > +> +> > > > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > > > assuming node 0 +> +> > > > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > > > assuming node 0 +> +> > > > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> > > > > +> +> > > > +> +> > > > Ah. I've seen this one, but not chased it down yet. Was on my todo +> +> > > > list to chase +> +> > > > down. Once I reach this state I can verify the HDM Decode is correct +> +> > > > which is what +> +> > > > I've been using to test (Which wasn't true until earlier this week). +> +> > > > I'm currently testing via devmem, more for historical reasons than +> +> > > > because it makes +> +> > > > that much sense anymore. +> +> > > +> +> > > *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> > > I'd forgotten that was still on the todo list. I don't think it will +> +> > > be particularly hard to do and will take a look in next few days. +> +> > > +> +> > > Very very indirectly this error is causing a driver probe fail that +> +> > > means that +> +> > > we hit a code path that has a rather odd looking check on NDD_LABELING. +> +> > > Should not have gotten near that path though - hence the problem is +> +> > > actually +> +> > > when we call cxl_pmem_get_config_data() and it returns an error because +> +> > > we haven't fully connected up the command in QEMU. +> +> > +> +> > So a least one bug in QEMU. We were not supporting variable length +> +> > payloads on mailbox +> +> > inputs (but were on outputs). That hasn't mattered until we get to LSA +> +> > writes. +> +> > We just need to relax condition on the supplied length. 
+> +> > +> +> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +> > index c352a935c4..fdda9529fe 100644 +> +> > --- a/hw/cxl/cxl-mailbox-utils.c +> +> > +++ b/hw/cxl/cxl-mailbox-utils.c +> +> > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +> > cxl_cmd = &cxl_cmd_set[set][cmd]; +> +> > h = cxl_cmd->handler; +> +> > if (h) { +> +> > - if (len == cxl_cmd->in) { +> +> > + if (len == cxl_cmd->in || !cxl_cmd->in) { +> +> > cxl_cmd->payload = cxl_dstate->mbox_reg_state + +> +> > A_CXL_DEV_CMD_PAYLOAD; +> +> > ret = (*h)(cxl_cmd, cxl_dstate, &len); +> +> > +> +> > +> +> > This lets the nvdimm/region probe fine, but I'm getting some issues with +> +> > namespace capacity so I'll look at what is causing that next. +> +> > Unfortunately I'm not that familiar with the driver/nvdimm side of things +> +> > so it's take a while to figure out what kicks off what! +> +> +> +> The whirlwind tour is that 'struct nd_region' instances that represent a +> +> persitent memory address range are composed of one more mappings of +> +> 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver +> +> in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking +> +> the dimm (if locked) and interrogating the label area to look for +> +> namespace labels. +> +> +> +> The label command calls are routed to the '->ndctl()' callback that was +> +> registered when the CXL nvdimm_bus_descriptor was created. That callback +> +> handles both 'bus' scope calls, currently none for CXL, and per nvdimm +> +> calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands +> +> to CXL commands. +> +> +> +> The 'struct nvdimm' objects that the CXL side registers have the +> +> NDD_LABELING flag set which means that namespaces need to be explicitly +> +> created / provisioned from region capacity. Otherwise, if +> +> drivers/nvdimm/dimm.c does not find a namespace-label-index block then +> +> the region reverts to label-less mode and a default namespace equal to +> +> the size of the region is instantiated. +> +> +> +> If you are seeing small mismatches in namespace capacity then it may +> +> just be the fact that by default 'ndctl create-namespace' results in an +> +> 'fsdax' mode namespace which just means that it is a block device where +> +> 1.5% of the capacity is reserved for 'struct page' metadata. You should +> +> be able to see namespace capacity == region capacity by doing "ndctl +> +> create-namespace -m raw", and disable DAX operation. +> +> +Currently ndctl create-namespace crashes qemu ;) +> +Which isn't ideal! +> +Found a cause for this one. Mailbox payload may be as small as 256 bytes. +We have code in kernel sanity checking that output payload fits in the +mailbox, but nothing on the input payload. Symptom is that we write just +off the end whatever size the payload is. Note doing this shouldn't crash +qemu - so I need to fix a range check somewhere. + +I think this is because cxl_pmem_get_config_size() returns the mailbox +payload size as being the available LSA size, forgetting to remove the +size of the headers on the set_lsa side of things. +https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/tree/drivers/cxl/pmem.c?h=next#n110 +I've hacked the max_payload to be -8 + +Now we still don't succeed in creating the namespace, but bonus is it doesn't +crash any more. + + +Jonathan + + + +> +> +> +> Hope that helps. +> +Got me looking at the right code. Thanks! 
+> +> +Jonathan +> +> + +On Mon, 15 Aug 2022 15:18:09 +0100 +Jonathan Cameron via <qemu-devel@nongnu.org> wrote: + +> +On Fri, 12 Aug 2022 17:15:09 +0100 +> +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> On Fri, 12 Aug 2022 09:03:02 -0700 +> +> Dan Williams <dan.j.williams@intel.com> wrote: +> +> +> +> > Jonathan Cameron wrote: +> +> > > On Thu, 11 Aug 2022 18:08:57 +0100 +> +> > > Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> > > +> +> > > > On Tue, 9 Aug 2022 17:08:25 +0100 +> +> > > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> > > > +> +> > > > > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > > > > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > > > > +> +> > > > > > Hi Jonathan +> +> > > > > > +> +> > > > > > Thanks for your reply! +> +> > > > > > +> +> > > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > > > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > > > > +> +> > > > > > > Probably not related to your problem, but there is a disconnect +> +> > > > > > > in QEMU / +> +> > > > > > > kernel assumptionsaround the presence of an HDM decoder when a +> +> > > > > > > HB only +> +> > > > > > > has a single root port. Spec allows it to be provided or not as +> +> > > > > > > an implementation choice. +> +> > > > > > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > > > > > +> +> > > > > > > The temporary solution is to throw in a second root port on the +> +> > > > > > > HB and not +> +> > > > > > > connect anything to it. Longer term I may special case this so +> +> > > > > > > that the particular +> +> > > > > > > decoder defaults to pass through settings in QEMU if there is +> +> > > > > > > only one root port. +> +> > > > > > > +> +> > > > > > +> +> > > > > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > > > > region successfully. +> +> > > > > > But have some errors in Nvdimm: +> +> > > > > > +> +> > > > > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > > > > assuming node 0 +> +> > > > > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > > > > assuming node 0 +> +> > > > > > [ 74.927470] nd_region region0: nmem0: is disabled, failing +> +> > > > > > probe +> +> > > > > +> +> > > > > Ah. I've seen this one, but not chased it down yet. Was on my todo +> +> > > > > list to chase +> +> > > > > down. Once I reach this state I can verify the HDM Decode is +> +> > > > > correct which is what +> +> > > > > I've been using to test (Which wasn't true until earlier this +> +> > > > > week). +> +> > > > > I'm currently testing via devmem, more for historical reasons than +> +> > > > > because it makes +> +> > > > > that much sense anymore. +> +> > > > +> +> > > > *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> > > > I'd forgotten that was still on the todo list. I don't think it will +> +> > > > be particularly hard to do and will take a look in next few days. +> +> > > > +> +> > > > Very very indirectly this error is causing a driver probe fail that +> +> > > > means that +> +> > > > we hit a code path that has a rather odd looking check on +> +> > > > NDD_LABELING. +> +> > > > Should not have gotten near that path though - hence the problem is +> +> > > > actually +> +> > > > when we call cxl_pmem_get_config_data() and it returns an error +> +> > > > because +> +> > > > we haven't fully connected up the command in QEMU. +> +> > > +> +> > > So a least one bug in QEMU. 
We were not supporting variable length +> +> > > payloads on mailbox +> +> > > inputs (but were on outputs). That hasn't mattered until we get to LSA +> +> > > writes. +> +> > > We just need to relax condition on the supplied length. +> +> > > +> +> > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +> > > index c352a935c4..fdda9529fe 100644 +> +> > > --- a/hw/cxl/cxl-mailbox-utils.c +> +> > > +++ b/hw/cxl/cxl-mailbox-utils.c +> +> > > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +> > > cxl_cmd = &cxl_cmd_set[set][cmd]; +> +> > > h = cxl_cmd->handler; +> +> > > if (h) { +> +> > > - if (len == cxl_cmd->in) { +> +> > > + if (len == cxl_cmd->in || !cxl_cmd->in) { +> +> > > cxl_cmd->payload = cxl_dstate->mbox_reg_state + +> +> > > A_CXL_DEV_CMD_PAYLOAD; +> +> > > ret = (*h)(cxl_cmd, cxl_dstate, &len); +> +> > > +> +> > > +> +> > > This lets the nvdimm/region probe fine, but I'm getting some issues with +> +> > > namespace capacity so I'll look at what is causing that next. +> +> > > Unfortunately I'm not that familiar with the driver/nvdimm side of +> +> > > things +> +> > > so it's take a while to figure out what kicks off what! +> +> > +> +> > The whirlwind tour is that 'struct nd_region' instances that represent a +> +> > persitent memory address range are composed of one more mappings of +> +> > 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver +> +> > in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking +> +> > the dimm (if locked) and interrogating the label area to look for +> +> > namespace labels. +> +> > +> +> > The label command calls are routed to the '->ndctl()' callback that was +> +> > registered when the CXL nvdimm_bus_descriptor was created. That callback +> +> > handles both 'bus' scope calls, currently none for CXL, and per nvdimm +> +> > calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands +> +> > to CXL commands. +> +> > +> +> > The 'struct nvdimm' objects that the CXL side registers have the +> +> > NDD_LABELING flag set which means that namespaces need to be explicitly +> +> > created / provisioned from region capacity. Otherwise, if +> +> > drivers/nvdimm/dimm.c does not find a namespace-label-index block then +> +> > the region reverts to label-less mode and a default namespace equal to +> +> > the size of the region is instantiated. +> +> > +> +> > If you are seeing small mismatches in namespace capacity then it may +> +> > just be the fact that by default 'ndctl create-namespace' results in an +> +> > 'fsdax' mode namespace which just means that it is a block device where +> +> > 1.5% of the capacity is reserved for 'struct page' metadata. You should +> +> > be able to see namespace capacity == region capacity by doing "ndctl +> +> > create-namespace -m raw", and disable DAX operation. +> +> +> +> Currently ndctl create-namespace crashes qemu ;) +> +> Which isn't ideal! +> +> +> +> +Found a cause for this one. Mailbox payload may be as small as 256 bytes. +> +We have code in kernel sanity checking that output payload fits in the +> +mailbox, but nothing on the input payload. Symptom is that we write just +> +off the end whatever size the payload is. Note doing this shouldn't crash +> +qemu - so I need to fix a range check somewhere. +> +> +I think this is because cxl_pmem_get_config_size() returns the mailbox +> +payload size as being the available LSA size, forgetting to remove the +> +size of the headers on the set_lsa side of things. 
+> +https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/tree/drivers/cxl/pmem.c?h=next#n110 +> +> +I've hacked the max_payload to be -8 +> +> +Now we still don't succeed in creating the namespace, but bonus is it doesn't +> +crash any more. +In the interests of defensive / correct handling from QEMU I took a +look into why it was crashing. Turns out that providing a NULL write callback +for +the memory device region (that the above overlarge write was spilling into) +isn't +a safe thing to do. Needs a stub. Oops. + +On plus side we might never have noticed this was going wrong without the crash +*silver lining in every cloud* + +Fix to follow... + +Jonathan + + +> +> +> +Jonathan +> +> +> +> +> > +> +> > Hope that helps. +> +> Got me looking at the right code. Thanks! +> +> +> +> Jonathan +> +> +> +> +> +> + +On Mon, 15 Aug 2022 at 15:55, Jonathan Cameron via <qemu-arm@nongnu.org> wrote: +> +In the interests of defensive / correct handling from QEMU I took a +> +look into why it was crashing. Turns out that providing a NULL write +> +callback for +> +the memory device region (that the above overlarge write was spilling into) +> +isn't +> +a safe thing to do. Needs a stub. Oops. +Yeah. We've talked before about adding an assert so that that kind of +"missing function" bug is caught at device creation rather than only +if the guest tries to access the device, but we never quite got around +to it... + +-- PMM + +On Fri, 12 Aug 2022 16:44:03 +0100 +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: + +> +On Thu, 11 Aug 2022 18:08:57 +0100 +> +Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> +> On Tue, 9 Aug 2022 17:08:25 +0100 +> +> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> +> > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > +> +> > > Hi Jonathan +> +> > > +> +> > > Thanks for your reply! +> +> > > +> +> > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > +> +> > > > Probably not related to your problem, but there is a disconnect in +> +> > > > QEMU / +> +> > > > kernel assumptionsaround the presence of an HDM decoder when a HB only +> +> > > > has a single root port. Spec allows it to be provided or not as an +> +> > > > implementation choice. +> +> > > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > > +> +> > > > The temporary solution is to throw in a second root port on the HB +> +> > > > and not +> +> > > > connect anything to it. Longer term I may special case this so that +> +> > > > the particular +> +> > > > decoder defaults to pass through settings in QEMU if there is only +> +> > > > one root port. +> +> > > > +> +> > > +> +> > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > region successfully. +> +> > > But have some errors in Nvdimm: +> +> > > +> +> > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > assuming node 0 +> +> > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > assuming node 0 +> +> > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> > > +> +> > +> +> > Ah. I've seen this one, but not chased it down yet. Was on my todo list +> +> > to chase +> +> > down. Once I reach this state I can verify the HDM Decode is correct +> +> > which is what +> +> > I've been using to test (Which wasn't true until earlier this week). 
+> +> > I'm currently testing via devmem, more for historical reasons than +> +> > because it makes +> +> > that much sense anymore. +> +> +> +> *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> I'd forgotten that was still on the todo list. I don't think it will +> +> be particularly hard to do and will take a look in next few days. +> +> +> +> Very very indirectly this error is causing a driver probe fail that means +> +> that +> +> we hit a code path that has a rather odd looking check on NDD_LABELING. +> +> Should not have gotten near that path though - hence the problem is actually +> +> when we call cxl_pmem_get_config_data() and it returns an error because +> +> we haven't fully connected up the command in QEMU. +> +> +So a least one bug in QEMU. We were not supporting variable length payloads +> +on mailbox +> +inputs (but were on outputs). That hasn't mattered until we get to LSA +> +writes. +> +We just need to relax condition on the supplied length. +> +> +diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +index c352a935c4..fdda9529fe 100644 +> +--- a/hw/cxl/cxl-mailbox-utils.c +> ++++ b/hw/cxl/cxl-mailbox-utils.c +> +@@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +cxl_cmd = &cxl_cmd_set[set][cmd]; +> +h = cxl_cmd->handler; +> +if (h) { +> +- if (len == cxl_cmd->in) { +> ++ if (len == cxl_cmd->in || !cxl_cmd->in) { +Fix is wrong as we use ~0 as the placeholder for variable payload, not 0. + +With that fixed we hit new fun paths - after some errors we get the +worrying - not totally sure but looks like a failure on an error cleanup. +I'll chase down the error source, but even then this is probably triggerable by +hardware problem or similar. Some bonus prints in here from me chasing +error paths, but it's otherwise just cxl/next + the fix I posted earlier today. + +[ 69.919877] nd_bus ndbus0: START: nd_region.probe(region0) +[ 69.920108] nd_region_probe +[ 69.920623] ------------[ cut here ]------------ +[ 69.920675] refcount_t: addition on 0; use-after-free. 
+[ 69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 +refcount_warn_saturate+0xa0/0x144 +[ 69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi +cxl_core +[ 69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399 +[ 69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 +[ 69.931482] Workqueue: events_unbound async_run_entry_fn +[ 69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) +[ 69.934023] pc : refcount_warn_saturate+0xa0/0x144 +[ 69.935161] lr : refcount_warn_saturate+0xa0/0x144 +[ 69.936541] sp : ffff80000890b960 +[ 69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: 0000000000000000 +[ 69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: 0000000000000000 +[ 69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: ffff0000c5254800 +[ 69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: ffffffffffffffff +[ 69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: 0000000000000000 +[ 69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: 657466612d657375 +[ 69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : ffffa54a8f63d288 +[ 69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 00000000fffff31e +[ 69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : ffff5ab66e5ef000 +root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [ 69.954752] x2 : +0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740 +[ 69.957098] Call trace: +[ 69.957959] refcount_warn_saturate+0xa0/0x144 +[ 69.958773] get_ndd+0x5c/0x80 +[ 69.959294] nd_region_register_namespaces+0xe4/0xe90 +[ 69.960253] nd_region_probe+0x100/0x290 +[ 69.960796] nvdimm_bus_probe+0xf4/0x1c0 +[ 69.962087] really_probe+0x19c/0x3f0 +[ 69.962620] __driver_probe_device+0x11c/0x190 +[ 69.963258] driver_probe_device+0x44/0xf4 +[ 69.963773] __device_attach_driver+0xa4/0x140 +[ 69.964471] bus_for_each_drv+0x84/0xe0 +[ 69.965068] __device_attach+0xb0/0x1f0 +[ 69.966101] device_initial_probe+0x20/0x30 +[ 69.967142] bus_probe_device+0xa4/0xb0 +[ 69.968104] device_add+0x3e8/0x910 +[ 69.969111] nd_async_device_register+0x24/0x74 +[ 69.969928] async_run_entry_fn+0x40/0x150 +[ 69.970725] process_one_work+0x1dc/0x450 +[ 69.971796] worker_thread+0x154/0x450 +[ 69.972700] kthread+0x118/0x120 +[ 69.974141] ret_from_fork+0x10/0x20 +[ 69.975141] ---[ end trace 0000000000000000 ]--- +[ 70.117887] Into nd_namespace_pmem_set_resource() + +> +cxl_cmd->payload = cxl_dstate->mbox_reg_state + +> +A_CXL_DEV_CMD_PAYLOAD; +> +ret = (*h)(cxl_cmd, cxl_dstate, &len); +> +> +> +This lets the nvdimm/region probe fine, but I'm getting some issues with +> +namespace capacity so I'll look at what is causing that next. +> +Unfortunately I'm not that familiar with the driver/nvdimm side of things +> +so it's take a while to figure out what kicks off what! +> +> +Jonathan +> +> +> +> +> Jonathan +> +> +> +> +> +> > +> +> > > +> +> > > And x4 region still failed with same errors, using latest cxl/preview +> +> > > branch don't work. +> +> > > I have picked "Two CXL emulation fixes" patches in qemu, still not +> +> > > working. 
+> +> > > +> +> > > Bob +> +> +> +> +> + +On Mon, 15 Aug 2022 18:04:44 +0100 +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: + +> +On Fri, 12 Aug 2022 16:44:03 +0100 +> +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> On Thu, 11 Aug 2022 18:08:57 +0100 +> +> Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> +> +> > On Tue, 9 Aug 2022 17:08:25 +0100 +> +> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> > +> +> > > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > > +> +> > > > Hi Jonathan +> +> > > > +> +> > > > Thanks for your reply! +> +> > > > +> +> > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > > +> +> > > > > Probably not related to your problem, but there is a disconnect in +> +> > > > > QEMU / +> +> > > > > kernel assumptionsaround the presence of an HDM decoder when a HB +> +> > > > > only +> +> > > > > has a single root port. Spec allows it to be provided or not as an +> +> > > > > implementation choice. +> +> > > > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > > > +> +> > > > > The temporary solution is to throw in a second root port on the HB +> +> > > > > and not +> +> > > > > connect anything to it. Longer term I may special case this so +> +> > > > > that the particular +> +> > > > > decoder defaults to pass through settings in QEMU if there is only +> +> > > > > one root port. +> +> > > > > +> +> > > > +> +> > > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > > region successfully. +> +> > > > But have some errors in Nvdimm: +> +> > > > +> +> > > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > > assuming node 0 +> +> > > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > > assuming node 0 +> +> > > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> > > > +> +> > > +> +> > > Ah. I've seen this one, but not chased it down yet. Was on my todo +> +> > > list to chase +> +> > > down. Once I reach this state I can verify the HDM Decode is correct +> +> > > which is what +> +> > > I've been using to test (Which wasn't true until earlier this week). +> +> > > I'm currently testing via devmem, more for historical reasons than +> +> > > because it makes +> +> > > that much sense anymore. +> +> > +> +> > *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> > I'd forgotten that was still on the todo list. I don't think it will +> +> > be particularly hard to do and will take a look in next few days. +> +> > +> +> > Very very indirectly this error is causing a driver probe fail that means +> +> > that +> +> > we hit a code path that has a rather odd looking check on NDD_LABELING. +> +> > Should not have gotten near that path though - hence the problem is +> +> > actually +> +> > when we call cxl_pmem_get_config_data() and it returns an error because +> +> > we haven't fully connected up the command in QEMU. +> +> +> +> So a least one bug in QEMU. We were not supporting variable length payloads +> +> on mailbox +> +> inputs (but were on outputs). That hasn't mattered until we get to LSA +> +> writes. +> +> We just need to relax condition on the supplied length. 
+> +> +> +> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +> index c352a935c4..fdda9529fe 100644 +> +> --- a/hw/cxl/cxl-mailbox-utils.c +> +> +++ b/hw/cxl/cxl-mailbox-utils.c +> +> @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +> cxl_cmd = &cxl_cmd_set[set][cmd]; +> +> h = cxl_cmd->handler; +> +> if (h) { +> +> - if (len == cxl_cmd->in) { +> +> + if (len == cxl_cmd->in || !cxl_cmd->in) { +> +Fix is wrong as we use ~0 as the placeholder for variable payload, not 0. +Cause of the error is a failure in GET_LSA. +Reason, payload length is wrong in QEMU but was hidden previously by my wrong +fix here. Probably still a good idea to inject an error in GET_LSA and chase +down the refcount issue. + + +diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +index fdda9529fe..e8565fbd6e 100644 +--- a/hw/cxl/cxl-mailbox-utils.c ++++ b/hw/cxl/cxl-mailbox-utils.c +@@ -489,7 +489,7 @@ static struct cxl_cmd cxl_cmd_set[256][256] = { + cmd_identify_memory_device, 0, 0 }, + [CCLS][GET_PARTITION_INFO] = { "CCLS_GET_PARTITION_INFO", + cmd_ccls_get_partition_info, 0, 0 }, +- [CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 0, 0 }, ++ [CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 8, 0 }, + [CCLS][SET_LSA] = { "CCLS_SET_LSA", cmd_ccls_set_lsa, + ~0, IMMEDIATE_CONFIG_CHANGE | IMMEDIATE_DATA_CHANGE }, + [MEDIA_AND_POISON][GET_POISON_LIST] = { "MEDIA_AND_POISON_GET_POISON_LIST", +@@ -510,12 +510,13 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) + cxl_cmd = &cxl_cmd_set[set][cmd]; + h = cxl_cmd->handler; + if (h) { +- if (len == cxl_cmd->in || !cxl_cmd->in) { ++ if (len == cxl_cmd->in || cxl_cmd->in == ~0) { + cxl_cmd->payload = cxl_dstate->mbox_reg_state + + A_CXL_DEV_CMD_PAYLOAD; + +And woot, we get a namespace in the LSA :) + +I'll post QEMU fixes in next day or two. Kernel side now seems more or less +fine be it with suspicious refcount underflow. + +> +> +With that fixed we hit new fun paths - after some errors we get the +> +worrying - not totally sure but looks like a failure on an error cleanup. +> +I'll chase down the error source, but even then this is probably triggerable +> +by +> +hardware problem or similar. Some bonus prints in here from me chasing +> +error paths, but it's otherwise just cxl/next + the fix I posted earlier +> +today. +> +> +[ 69.919877] nd_bus ndbus0: START: nd_region.probe(region0) +> +[ 69.920108] nd_region_probe +> +[ 69.920623] ------------[ cut here ]------------ +> +[ 69.920675] refcount_t: addition on 0; use-after-free. 
+> +[ 69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 +> +refcount_warn_saturate+0xa0/0x144 +> +[ 69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi +> +cxl_core +> +[ 69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399 +> +[ 69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 +> +[ 69.931482] Workqueue: events_unbound async_run_entry_fn +> +[ 69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) +> +[ 69.934023] pc : refcount_warn_saturate+0xa0/0x144 +> +[ 69.935161] lr : refcount_warn_saturate+0xa0/0x144 +> +[ 69.936541] sp : ffff80000890b960 +> +[ 69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: +> +0000000000000000 +> +[ 69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: +> +0000000000000000 +> +[ 69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: +> +ffff0000c5254800 +> +[ 69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: +> +ffffffffffffffff +> +[ 69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: +> +0000000000000000 +> +[ 69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: +> +657466612d657375 +> +[ 69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : +> +ffffa54a8f63d288 +> +[ 69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : +> +00000000fffff31e +> +[ 69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : +> +ffff5ab66e5ef000 +> +root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [ 69.954752] x2 : +> +0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740 +> +[ 69.957098] Call trace: +> +[ 69.957959] refcount_warn_saturate+0xa0/0x144 +> +[ 69.958773] get_ndd+0x5c/0x80 +> +[ 69.959294] nd_region_register_namespaces+0xe4/0xe90 +> +[ 69.960253] nd_region_probe+0x100/0x290 +> +[ 69.960796] nvdimm_bus_probe+0xf4/0x1c0 +> +[ 69.962087] really_probe+0x19c/0x3f0 +> +[ 69.962620] __driver_probe_device+0x11c/0x190 +> +[ 69.963258] driver_probe_device+0x44/0xf4 +> +[ 69.963773] __device_attach_driver+0xa4/0x140 +> +[ 69.964471] bus_for_each_drv+0x84/0xe0 +> +[ 69.965068] __device_attach+0xb0/0x1f0 +> +[ 69.966101] device_initial_probe+0x20/0x30 +> +[ 69.967142] bus_probe_device+0xa4/0xb0 +> +[ 69.968104] device_add+0x3e8/0x910 +> +[ 69.969111] nd_async_device_register+0x24/0x74 +> +[ 69.969928] async_run_entry_fn+0x40/0x150 +> +[ 69.970725] process_one_work+0x1dc/0x450 +> +[ 69.971796] worker_thread+0x154/0x450 +> +[ 69.972700] kthread+0x118/0x120 +> +[ 69.974141] ret_from_fork+0x10/0x20 +> +[ 69.975141] ---[ end trace 0000000000000000 ]--- +> +[ 70.117887] Into nd_namespace_pmem_set_resource() +> +> +> cxl_cmd->payload = cxl_dstate->mbox_reg_state + +> +> A_CXL_DEV_CMD_PAYLOAD; +> +> ret = (*h)(cxl_cmd, cxl_dstate, &len); +> +> +> +> +> +> This lets the nvdimm/region probe fine, but I'm getting some issues with +> +> namespace capacity so I'll look at what is causing that next. +> +> Unfortunately I'm not that familiar with the driver/nvdimm side of things +> +> so it's take a while to figure out what kicks off what! +> +> +> +> Jonathan +> +> +> +> > +> +> > Jonathan +> +> > +> +> > +> +> > > +> +> > > > +> +> > > > And x4 region still failed with same errors, using latest cxl/preview +> +> > > > branch don't work. +> +> > > > I have picked "Two CXL emulation fixes" patches in qemu, still not +> +> > > > working. 
+> +> > > > +> +> > > > Bob +> +> > +> +> > +> +> +> + +Jonathan Cameron wrote: +> +On Fri, 12 Aug 2022 16:44:03 +0100 +> +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> On Thu, 11 Aug 2022 18:08:57 +0100 +> +> Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> +> +> > On Tue, 9 Aug 2022 17:08:25 +0100 +> +> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> > +> +> > > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > > +> +> > > > Hi Jonathan +> +> > > > +> +> > > > Thanks for your reply! +> +> > > > +> +> > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > > +> +> > > > > Probably not related to your problem, but there is a disconnect in +> +> > > > > QEMU / +> +> > > > > kernel assumptionsaround the presence of an HDM decoder when a HB +> +> > > > > only +> +> > > > > has a single root port. Spec allows it to be provided or not as an +> +> > > > > implementation choice. +> +> > > > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > > > +> +> > > > > The temporary solution is to throw in a second root port on the HB +> +> > > > > and not +> +> > > > > connect anything to it. Longer term I may special case this so +> +> > > > > that the particular +> +> > > > > decoder defaults to pass through settings in QEMU if there is only +> +> > > > > one root port. +> +> > > > > +> +> > > > +> +> > > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > > region successfully. +> +> > > > But have some errors in Nvdimm: +> +> > > > +> +> > > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > > assuming node 0 +> +> > > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > > assuming node 0 +> +> > > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> > > > +> +> > > +> +> > > Ah. I've seen this one, but not chased it down yet. Was on my todo +> +> > > list to chase +> +> > > down. Once I reach this state I can verify the HDM Decode is correct +> +> > > which is what +> +> > > I've been using to test (Which wasn't true until earlier this week). +> +> > > I'm currently testing via devmem, more for historical reasons than +> +> > > because it makes +> +> > > that much sense anymore. +> +> > +> +> > *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> > I'd forgotten that was still on the todo list. I don't think it will +> +> > be particularly hard to do and will take a look in next few days. +> +> > +> +> > Very very indirectly this error is causing a driver probe fail that means +> +> > that +> +> > we hit a code path that has a rather odd looking check on NDD_LABELING. +> +> > Should not have gotten near that path though - hence the problem is +> +> > actually +> +> > when we call cxl_pmem_get_config_data() and it returns an error because +> +> > we haven't fully connected up the command in QEMU. +> +> +> +> So a least one bug in QEMU. We were not supporting variable length payloads +> +> on mailbox +> +> inputs (but were on outputs). That hasn't mattered until we get to LSA +> +> writes. +> +> We just need to relax condition on the supplied length. 
+> +> +> +> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +> index c352a935c4..fdda9529fe 100644 +> +> --- a/hw/cxl/cxl-mailbox-utils.c +> +> +++ b/hw/cxl/cxl-mailbox-utils.c +> +> @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +> cxl_cmd = &cxl_cmd_set[set][cmd]; +> +> h = cxl_cmd->handler; +> +> if (h) { +> +> - if (len == cxl_cmd->in) { +> +> + if (len == cxl_cmd->in || !cxl_cmd->in) { +> +Fix is wrong as we use ~0 as the placeholder for variable payload, not 0. +> +> +With that fixed we hit new fun paths - after some errors we get the +> +worrying - not totally sure but looks like a failure on an error cleanup. +> +I'll chase down the error source, but even then this is probably triggerable +> +by +> +hardware problem or similar. Some bonus prints in here from me chasing +> +error paths, but it's otherwise just cxl/next + the fix I posted earlier +> +today. +One of the scenarios that I cannot rule out is nvdimm_probe() racing +nd_region_probe(), but given all the work it takes to create a region I +suspect all the nvdimm_probe() work to have completed... + +It is at least one potentially wrong hypothesis that needs to be chased +down. + +> +> +[ 69.919877] nd_bus ndbus0: START: nd_region.probe(region0) +> +[ 69.920108] nd_region_probe +> +[ 69.920623] ------------[ cut here ]------------ +> +[ 69.920675] refcount_t: addition on 0; use-after-free. +> +[ 69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 +> +refcount_warn_saturate+0xa0/0x144 +> +[ 69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi +> +cxl_core +> +[ 69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399 +> +[ 69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 +> +[ 69.931482] Workqueue: events_unbound async_run_entry_fn +> +[ 69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) +> +[ 69.934023] pc : refcount_warn_saturate+0xa0/0x144 +> +[ 69.935161] lr : refcount_warn_saturate+0xa0/0x144 +> +[ 69.936541] sp : ffff80000890b960 +> +[ 69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: +> +0000000000000000 +> +[ 69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: +> +0000000000000000 +> +[ 69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: +> +ffff0000c5254800 +> +[ 69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: +> +ffffffffffffffff +> +[ 69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: +> +0000000000000000 +> +[ 69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: +> +657466612d657375 +> +[ 69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : +> +ffffa54a8f63d288 +> +[ 69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : +> +00000000fffff31e +> +[ 69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : +> +ffff5ab66e5ef000 +> +root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [ 69.954752] x2 : +> +0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740 +> +[ 69.957098] Call trace: +> +[ 69.957959] refcount_warn_saturate+0xa0/0x144 +> +[ 69.958773] get_ndd+0x5c/0x80 +> +[ 69.959294] nd_region_register_namespaces+0xe4/0xe90 +> +[ 69.960253] nd_region_probe+0x100/0x290 +> +[ 69.960796] nvdimm_bus_probe+0xf4/0x1c0 +> +[ 69.962087] really_probe+0x19c/0x3f0 +> +[ 69.962620] __driver_probe_device+0x11c/0x190 +> +[ 69.963258] driver_probe_device+0x44/0xf4 +> +[ 69.963773] __device_attach_driver+0xa4/0x140 +> +[ 69.964471] bus_for_each_drv+0x84/0xe0 +> +[ 69.965068] __device_attach+0xb0/0x1f0 +> +[ 
69.966101] device_initial_probe+0x20/0x30 +> +[ 69.967142] bus_probe_device+0xa4/0xb0 +> +[ 69.968104] device_add+0x3e8/0x910 +> +[ 69.969111] nd_async_device_register+0x24/0x74 +> +[ 69.969928] async_run_entry_fn+0x40/0x150 +> +[ 69.970725] process_one_work+0x1dc/0x450 +> +[ 69.971796] worker_thread+0x154/0x450 +> +[ 69.972700] kthread+0x118/0x120 +> +[ 69.974141] ret_from_fork+0x10/0x20 +> +[ 69.975141] ---[ end trace 0000000000000000 ]--- +> +[ 70.117887] Into nd_namespace_pmem_set_resource() + +On Mon, 15 Aug 2022 15:55:15 -0700 +Dan Williams <dan.j.williams@intel.com> wrote: + +> +Jonathan Cameron wrote: +> +> On Fri, 12 Aug 2022 16:44:03 +0100 +> +> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> +> > On Thu, 11 Aug 2022 18:08:57 +0100 +> +> > Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> > +> +> > > On Tue, 9 Aug 2022 17:08:25 +0100 +> +> > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> > > +> +> > > > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > > > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > > > +> +> > > > > Hi Jonathan +> +> > > > > +> +> > > > > Thanks for your reply! +> +> > > > > +> +> > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > > > +> +> > > > > > Probably not related to your problem, but there is a disconnect +> +> > > > > > in QEMU / +> +> > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB +> +> > > > > > only +> +> > > > > > has a single root port. Spec allows it to be provided or not as +> +> > > > > > an implementation choice. +> +> > > > > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > > > > +> +> > > > > > The temporary solution is to throw in a second root port on the +> +> > > > > > HB and not +> +> > > > > > connect anything to it. Longer term I may special case this so +> +> > > > > > that the particular +> +> > > > > > decoder defaults to pass through settings in QEMU if there is +> +> > > > > > only one root port. +> +> > > > > > +> +> > > > > +> +> > > > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > > > region successfully. +> +> > > > > But have some errors in Nvdimm: +> +> > > > > +> +> > > > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > > > assuming node 0 +> +> > > > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > > > assuming node 0 +> +> > > > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> > > > > +> +> > > > +> +> > > > Ah. I've seen this one, but not chased it down yet. Was on my todo +> +> > > > list to chase +> +> > > > down. Once I reach this state I can verify the HDM Decode is correct +> +> > > > which is what +> +> > > > I've been using to test (Which wasn't true until earlier this week). +> +> > > > I'm currently testing via devmem, more for historical reasons than +> +> > > > because it makes +> +> > > > that much sense anymore. +> +> > > +> +> > > *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> > > I'd forgotten that was still on the todo list. I don't think it will +> +> > > be particularly hard to do and will take a look in next few days. +> +> > > +> +> > > Very very indirectly this error is causing a driver probe fail that +> +> > > means that +> +> > > we hit a code path that has a rather odd looking check on NDD_LABELING. 
+> +> > > Should not have gotten near that path though - hence the problem is +> +> > > actually +> +> > > when we call cxl_pmem_get_config_data() and it returns an error because +> +> > > we haven't fully connected up the command in QEMU. +> +> > +> +> > So a least one bug in QEMU. We were not supporting variable length +> +> > payloads on mailbox +> +> > inputs (but were on outputs). That hasn't mattered until we get to LSA +> +> > writes. +> +> > We just need to relax condition on the supplied length. +> +> > +> +> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +> > index c352a935c4..fdda9529fe 100644 +> +> > --- a/hw/cxl/cxl-mailbox-utils.c +> +> > +++ b/hw/cxl/cxl-mailbox-utils.c +> +> > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +> > cxl_cmd = &cxl_cmd_set[set][cmd]; +> +> > h = cxl_cmd->handler; +> +> > if (h) { +> +> > - if (len == cxl_cmd->in) { +> +> > + if (len == cxl_cmd->in || !cxl_cmd->in) { +> +> Fix is wrong as we use ~0 as the placeholder for variable payload, not 0. +> +> +> +> With that fixed we hit new fun paths - after some errors we get the +> +> worrying - not totally sure but looks like a failure on an error cleanup. +> +> I'll chase down the error source, but even then this is probably +> +> triggerable by +> +> hardware problem or similar. Some bonus prints in here from me chasing +> +> error paths, but it's otherwise just cxl/next + the fix I posted earlier +> +> today. +> +> +One of the scenarios that I cannot rule out is nvdimm_probe() racing +> +nd_region_probe(), but given all the work it takes to create a region I +> +suspect all the nvdimm_probe() work to have completed... +> +> +It is at least one potentially wrong hypothesis that needs to be chased +> +down. +Maybe there should be a special award for the non-intuitive +ndctl create-namespace command (modifies existing namespace and might create +a different empty one...) I'm sure there is some interesting history behind +that one :) + +Upshot is I just threw a filesystem on fsdax and wrote some text files on it +to allow easy grepping. The right data ends up in the memory and a plausible +namespace description is stored in the LSA. + +So to some degree at least it's 'working' on an 8 way direct connected +set of emulated devices. + +One snag is that serial number support isn't yet upstream in QEMU. +(I have had it in my tree for a while but not posted it yet because of + QEMU feature freeze) +https://gitlab.com/jic23/qemu/-/commit/144c783ea8a5fbe169f46ea1ba92940157f42733 +That's needed for meaningful cookie generation. Otherwise you can build the +namespace once, but it won't work on next probe as the cookie is 0 and you +hit some error paths. + +Maybe sensible to add a sanity check and fail namespace creation if +cookie is 0? (Silly side question, but is there a theoretical risk of +a serial number / other data combination leading to a fletcher64() +checksum that happens to be 0 - that would give a very odd bug report!) + +So to make it work the following is needed: + +1) The kernel fix for mailbox buffer overflow. +2) Qemu fix for size of arguements for get_lsa +3) Qemu fix to allow variable size input arguements (for set_lsa) +4) Serial number patch above + command lines to qemu to set appropriate + serial numbers. + +I'll send out the QEMU fixes shortly and post the Serial number patch, +though that almost certainly won't go in until next QEMU development +cycle starts in a few weeks. 
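+
+A minimal sketch of the zero-cookie sanity check floated above (illustrative
+only; the helper name is hypothetical, not the existing LIBNVDIMM code):
+refuse namespace creation when the interleave-set cookie works out to 0,
+rather than writing a label that will fail validation on the next probe.
+
+u64 cookie = region_interleave_set_cookie(nd_region);  /* hypothetical helper */
+
+if (cookie == 0) {
+        dev_dbg(&nd_region->dev, "refusing namespace: interleave-set cookie is 0\n");
+        return -ENXIO;
+}
+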
+ +Next up, run through same tests on some other topologies. + +Jonathan + +> +> +> +> +> [ 69.919877] nd_bus ndbus0: START: nd_region.probe(region0) +> +> [ 69.920108] nd_region_probe +> +> [ 69.920623] ------------[ cut here ]------------ +> +> [ 69.920675] refcount_t: addition on 0; use-after-free. +> +> [ 69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 +> +> refcount_warn_saturate+0xa0/0x144 +> +> [ 69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port +> +> cxl_acpi cxl_core +> +> [ 69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ +> +> #399 +> +> [ 69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 +> +> 02/06/2015 +> +> [ 69.931482] Workqueue: events_unbound async_run_entry_fn +> +> [ 69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS +> +> BTYPE=--) +> +> [ 69.934023] pc : refcount_warn_saturate+0xa0/0x144 +> +> [ 69.935161] lr : refcount_warn_saturate+0xa0/0x144 +> +> [ 69.936541] sp : ffff80000890b960 +> +> [ 69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: +> +> 0000000000000000 +> +> [ 69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: +> +> 0000000000000000 +> +> [ 69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: +> +> ffff0000c5254800 +> +> [ 69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: +> +> ffffffffffffffff +> +> [ 69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: +> +> 0000000000000000 +> +> [ 69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: +> +> 657466612d657375 +> +> [ 69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : +> +> ffffa54a8f63d288 +> +> [ 69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : +> +> 00000000fffff31e +> +> [ 69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : +> +> ffff5ab66e5ef000 +> +> root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [ 69.954752] x2 : +> +> 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740 +> +> [ 69.957098] Call trace: +> +> [ 69.957959] refcount_warn_saturate+0xa0/0x144 +> +> [ 69.958773] get_ndd+0x5c/0x80 +> +> [ 69.959294] nd_region_register_namespaces+0xe4/0xe90 +> +> [ 69.960253] nd_region_probe+0x100/0x290 +> +> [ 69.960796] nvdimm_bus_probe+0xf4/0x1c0 +> +> [ 69.962087] really_probe+0x19c/0x3f0 +> +> [ 69.962620] __driver_probe_device+0x11c/0x190 +> +> [ 69.963258] driver_probe_device+0x44/0xf4 +> +> [ 69.963773] __device_attach_driver+0xa4/0x140 +> +> [ 69.964471] bus_for_each_drv+0x84/0xe0 +> +> [ 69.965068] __device_attach+0xb0/0x1f0 +> +> [ 69.966101] device_initial_probe+0x20/0x30 +> +> [ 69.967142] bus_probe_device+0xa4/0xb0 +> +> [ 69.968104] device_add+0x3e8/0x910 +> +> [ 69.969111] nd_async_device_register+0x24/0x74 +> +> [ 69.969928] async_run_entry_fn+0x40/0x150 +> +> [ 69.970725] process_one_work+0x1dc/0x450 +> +> [ 69.971796] worker_thread+0x154/0x450 +> +> [ 69.972700] kthread+0x118/0x120 +> +> [ 69.974141] ret_from_fork+0x10/0x20 +> +> [ 69.975141] ---[ end trace 0000000000000000 ]--- +> +> [ 70.117887] Into nd_namespace_pmem_set_resource() + +Bobo WL wrote: +> +Hi list +> +> +I want to test cxl functions in arm64, and found some problems I can't +> +figure out. +> +> +My test environment: +> +> +1. build latest bios from +https://github.com/tianocore/edk2.git +master +> +branch(cc2db6ebfb6d9d85ba4c7b35fba1fa37fffc0bc2) +> +2. build latest qemu-system-aarch64 from git://git.qemu.org/qemu.git +> +master branch(846dcf0ba4eff824c295f06550b8673ff3f31314). 
With cxl arm +> +support patch: +> +https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-Jonathan.Cameron@huawei.com/ +> +3. build Linux kernel from +> +https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git +preview +> +branch(65fc1c3d26b96002a5aa1f4012fae4dc98fd5683) +> +4. build latest ndctl tools from +https://github.com/pmem/ndctl +> +create_region branch(8558b394e449779e3a4f3ae90fae77ede0bca159) +> +> +And my qemu test commands: +> +sudo $QEMU_BIN -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 \ +> +-cpu max -smp 8 -nographic -no-reboot \ +> +-kernel $KERNEL -bios $BIOS_BIN \ +> +-drive if=none,file=$ROOTFS,format=qcow2,id=hd \ +> +-device virtio-blk-pci,drive=hd -append 'root=/dev/vda1 +> +nokaslr dyndbg="module cxl* +p"' \ +> +-object memory-backend-ram,size=4G,id=mem0 \ +> +-numa node,nodeid=0,cpus=0-7,memdev=mem0 \ +> +-net nic -net user,hostfwd=tcp::2222-:22 -enable-kvm \ +> +-object +> +memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M +> +\ +> +-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ +> +-device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \ +> +-device cxl-upstream,bus=root_port0,id=us0 \ +> +-device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \ +> +-device +> +cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \ +> +-device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \ +> +-device +> +cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \ +> +-device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \ +> +-device +> +cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \ +> +-device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \ +> +-device +> +cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \ +> +-M +> +cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k +> +> +And I have got two problems. +> +1. When I want to create x1 region with command: "cxl create-region -d +> +decoder0.0 -w 1 -g 4096 mem0", kernel crashed with null pointer +> +reference. 
Crash log: +> +> +[ 534.697324] cxl_region region0: config state: 0 +> +[ 534.697346] cxl_region region0: probe: -6 +> +[ 534.697368] cxl_acpi ACPI0017:00: decoder0.0: created region0 +> +[ 534.699115] cxl region0: mem0:endpoint3 decoder3.0 add: +> +mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1 +> +[ 534.699149] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1 +> +[ 534.699167] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1 +> +[ 534.699176] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256 +> +[ 534.699182] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0 +> +for mem0:decoder3.0 @ 0 +> +[ 534.699189] cxl region0: 0000:0d:00.0:port2 iw: 1 ig: 256 +> +[ 534.699193] cxl region0: 0000:0d:00.0:port2 target[0] = +> +0000:0e:00.0 for mem0:decoder3.0 @ 0 +> +[ 534.699405] Unable to handle kernel NULL pointer dereference at +> +virtual address 0000000000000000 +> +[ 534.701474] Mem abort info: +> +[ 534.701994] ESR = 0x0000000086000004 +> +[ 534.702653] EC = 0x21: IABT (current EL), IL = 32 bits +> +[ 534.703616] SET = 0, FnV = 0 +> +[ 534.704174] EA = 0, S1PTW = 0 +> +[ 534.704803] FSC = 0x04: level 0 translation fault +> +[ 534.705694] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010144a000 +> +[ 534.706875] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 +> +[ 534.709855] Internal error: Oops: 86000004 [#1] PREEMPT SMP +> +[ 534.710301] Modules linked in: +> +[ 534.710546] CPU: 7 PID: 331 Comm: cxl Not tainted +> +5.19.0-rc3-00064-g65fc1c3d26b9-dirty #11 +> +[ 534.715393] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 +> +[ 534.717179] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) +> +[ 534.719190] pc : 0x0 +> +[ 534.719928] lr : commit_store+0x118/0x2cc +> +[ 534.721007] sp : ffff80000aec3c30 +> +[ 534.721793] x29: ffff80000aec3c30 x28: ffff0000da62e740 x27: +> +ffff0000c0c06b30 +> +[ 534.723875] x26: 0000000000000000 x25: ffff0000c0a2a400 x24: +> +ffff0000c0a29400 +> +[ 534.725440] x23: 0000000000000003 x22: 0000000000000000 x21: +> +ffff0000c0c06800 +> +[ 534.727312] x20: 0000000000000000 x19: ffff0000c1559800 x18: +> +0000000000000000 +> +[ 534.729138] x17: 0000000000000000 x16: 0000000000000000 x15: +> +0000ffffd41fe838 +> +[ 534.731046] x14: 0000000000000000 x13: 0000000000000000 x12: +> +0000000000000000 +> +[ 534.732402] x11: 0000000000000000 x10: 0000000000000000 x9 : +> +0000000000000000 +> +[ 534.734432] x8 : 0000000000000000 x7 : 0000000000000000 x6 : +> +ffff0000c0906e80 +> +[ 534.735921] x5 : 0000000000000000 x4 : 0000000000000000 x3 : +> +ffff80000aec3bf0 +> +[ 534.737437] x2 : 0000000000000000 x1 : 0000000000000000 x0 : +> +ffff0000c155a000 +> +[ 534.738878] Call trace: +> +[ 534.739368] 0x0 +> +[ 534.739713] dev_attr_store+0x1c/0x30 +> +[ 534.740186] sysfs_kf_write+0x48/0x58 +> +[ 534.740961] kernfs_fop_write_iter+0x128/0x184 +> +[ 534.741872] new_sync_write+0xdc/0x158 +> +[ 534.742706] vfs_write+0x1ac/0x2a8 +> +[ 534.743440] ksys_write+0x68/0xf0 +> +[ 534.744328] __arm64_sys_write+0x1c/0x28 +> +[ 534.745180] invoke_syscall+0x44/0xf0 +> +[ 534.745989] el0_svc_common+0x4c/0xfc +> +[ 534.746661] do_el0_svc+0x60/0xa8 +> +[ 534.747378] el0_svc+0x2c/0x78 +> +[ 534.748066] el0t_64_sync_handler+0xb8/0x12c +> +[ 534.748919] el0t_64_sync+0x18c/0x190 +> +[ 534.749629] Code: bad PC value +> +[ 534.750169] ---[ end trace 0000000000000000 ]--- +What was the top kernel commit when you ran this test? 
What is the line +number of "commit_store+0x118"? + +> +2. When I want to create x4 region with command: "cxl create-region -d +> +decoder0.0 -w 4 -g 4096 -m mem0 mem1 mem2 mem3". I got below errors: +> +> +cxl region: create_region: region0: failed to set target3 to mem3 +> +cxl region: cmd_create_region: created 0 regions +> +> +And kernel log as below: +> +[ 60.536663] cxl_region region0: config state: 0 +> +[ 60.536675] cxl_region region0: probe: -6 +> +[ 60.536696] cxl_acpi ACPI0017:00: decoder0.0: created region0 +> +[ 60.538251] cxl region0: mem0:endpoint3 decoder3.0 add: +> +mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.538278] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1 +> +[ 60.538295] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1 +> +[ 60.538647] cxl region0: mem1:endpoint4 decoder4.0 add: +> +mem1:decoder4.0 @ 1 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.538663] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem1:decoder4.0 @ 1 next: mem1 nr_eps: 2 nr_targets: 2 +> +[ 60.538675] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem1:decoder4.0 @ 1 next: 0000:0d:00.0 nr_eps: 2 nr_targets: 1 +> +[ 60.539311] cxl region0: mem2:endpoint5 decoder5.0 add: +> +mem2:decoder5.0 @ 2 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.539332] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem2:decoder5.0 @ 2 next: mem2 nr_eps: 3 nr_targets: 3 +> +[ 60.539343] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem2:decoder5.0 @ 2 next: 0000:0d:00.0 nr_eps: 3 nr_targets: 1 +> +[ 60.539711] cxl region0: mem3:endpoint6 decoder6.0 add: +> +mem3:decoder6.0 @ 3 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.539723] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem3:decoder6.0 @ 3 next: mem3 nr_eps: 4 nr_targets: 4 +> +[ 60.539735] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem3:decoder6.0 @ 3 next: 0000:0d:00.0 nr_eps: 4 nr_targets: 1 +> +[ 60.539742] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256 +> +[ 60.539747] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0 +> +for mem0:decoder3.0 @ 0 +> +[ 60.539754] cxl region0: 0000:0d:00.0:port2 iw: 4 ig: 512 +> +[ 60.539758] cxl region0: 0000:0d:00.0:port2 target[0] = +> +0000:0e:00.0 for mem0:decoder3.0 @ 0 +> +[ 60.539764] cxl region0: ACPI0016:00:port1: cannot host mem1:decoder4.0 at +> +1 +> +> +I have tried to write sysfs node manually, got same errors. +> +> +Hope I can get some helps here. +What is the output of: + + cxl list -MDTu -d decoder0.0 + +...? It might be the case that mem1 cannot be mapped by decoder0.0, or +at least not in the specified order, or that validation check is broken. + +Hi Dan, + +Thanks for your reply! + +On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote: +> +> +What is the output of: +> +> +cxl list -MDTu -d decoder0.0 +> +> +...? It might be the case that mem1 cannot be mapped by decoder0.0, or +> +at least not in the specified order, or that validation check is broken. 
+Command "cxl list -MDTu -d decoder0.0" output: + +[ + { + "memdevs":[ + { + "memdev":"mem2", + "pmem_size":"256.00 MiB (268.44 MB)", + "ram_size":0, + "serial":"0", + "host":"0000:11:00.0" + }, + { + "memdev":"mem1", + "pmem_size":"256.00 MiB (268.44 MB)", + "ram_size":0, + "serial":"0", + "host":"0000:10:00.0" + }, + { + "memdev":"mem0", + "pmem_size":"256.00 MiB (268.44 MB)", + "ram_size":0, + "serial":"0", + "host":"0000:0f:00.0" + }, + { + "memdev":"mem3", + "pmem_size":"256.00 MiB (268.44 MB)", + "ram_size":0, + "serial":"0", + "host":"0000:12:00.0" + } + ] + }, + { + "root decoders":[ + { + "decoder":"decoder0.0", + "resource":"0x10000000000", + "size":"4.00 GiB (4.29 GB)", + "pmem_capable":true, + "volatile_capable":true, + "accelmem_capable":true, + "nr_targets":1, + "targets":[ + { + "target":"ACPI0016:01", + "alias":"pci0000:0c", + "position":0, + "id":"0xc" + } + ] + } + ] + } +] + +Bobo WL wrote: +> +Hi Dan, +> +> +Thanks for your reply! +> +> +On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote: +> +> +> +> What is the output of: +> +> +> +> cxl list -MDTu -d decoder0.0 +> +> +> +> ...? It might be the case that mem1 cannot be mapped by decoder0.0, or +> +> at least not in the specified order, or that validation check is broken. +> +> +Command "cxl list -MDTu -d decoder0.0" output: +Thanks for this, I think I know the problem, but will try some +experiments with cxl_test first. + +Did the commit_store() crash stop reproducing with latest cxl/preview +branch? + +On Tue, Aug 9, 2022 at 11:17 PM Dan Williams <dan.j.williams@intel.com> wrote: +> +> +Bobo WL wrote: +> +> Hi Dan, +> +> +> +> Thanks for your reply! +> +> +> +> On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> +> +> wrote: +> +> > +> +> > What is the output of: +> +> > +> +> > cxl list -MDTu -d decoder0.0 +> +> > +> +> > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or +> +> > at least not in the specified order, or that validation check is broken. +> +> +> +> Command "cxl list -MDTu -d decoder0.0" output: +> +> +Thanks for this, I think I know the problem, but will try some +> +experiments with cxl_test first. +> +> +Did the commit_store() crash stop reproducing with latest cxl/preview +> +branch? +No, still hitting this bug if don't add extra HB device in qemu + +Dan Williams wrote: +> +Bobo WL wrote: +> +> Hi Dan, +> +> +> +> Thanks for your reply! +> +> +> +> On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> +> +> wrote: +> +> > +> +> > What is the output of: +> +> > +> +> > cxl list -MDTu -d decoder0.0 +> +> > +> +> > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or +> +> > at least not in the specified order, or that validation check is broken. +> +> +> +> Command "cxl list -MDTu -d decoder0.0" output: +> +> +Thanks for this, I think I know the problem, but will try some +> +experiments with cxl_test first. +Hmm, so my cxl_test experiment unfortunately passed so I'm not +reproducing the failure mode. 
This is the result of creating x4 region +with devices directly attached to a single host-bridge: + +# cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30)) +{ + "region":"region8", + "resource":"0xf1f0000000", + "size":"1024.00 MiB (1073.74 MB)", + "interleave_ways":4, + "interleave_granularity":256, + "decode_state":"commit", + "mappings":[ + { + "position":3, + "memdev":"mem11", + "decoder":"decoder21.0" + }, + { + "position":2, + "memdev":"mem9", + "decoder":"decoder19.0" + }, + { + "position":1, + "memdev":"mem10", + "decoder":"decoder20.0" + }, + { + "position":0, + "memdev":"mem12", + "decoder":"decoder22.0" + } + ] +} +cxl region: cmd_create_region: created 1 region + +> +Did the commit_store() crash stop reproducing with latest cxl/preview +> +branch? +I missed the answer to this question. + +All of these changes are now in Linus' tree perhaps give that a try and +post the debug log again? + +On Thu, 11 Aug 2022 17:46:55 -0700 +Dan Williams <dan.j.williams@intel.com> wrote: + +> +Dan Williams wrote: +> +> Bobo WL wrote: +> +> > Hi Dan, +> +> > +> +> > Thanks for your reply! +> +> > +> +> > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> +> +> > wrote: +> +> > > +> +> > > What is the output of: +> +> > > +> +> > > cxl list -MDTu -d decoder0.0 +> +> > > +> +> > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or +> +> > > at least not in the specified order, or that validation check is +> +> > > broken. +> +> > +> +> > Command "cxl list -MDTu -d decoder0.0" output: +> +> +> +> Thanks for this, I think I know the problem, but will try some +> +> experiments with cxl_test first. +> +> +Hmm, so my cxl_test experiment unfortunately passed so I'm not +> +reproducing the failure mode. This is the result of creating x4 region +> +with devices directly attached to a single host-bridge: +> +> +# cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30)) +> +{ +> +"region":"region8", +> +"resource":"0xf1f0000000", +> +"size":"1024.00 MiB (1073.74 MB)", +> +"interleave_ways":4, +> +"interleave_granularity":256, +> +"decode_state":"commit", +> +"mappings":[ +> +{ +> +"position":3, +> +"memdev":"mem11", +> +"decoder":"decoder21.0" +> +}, +> +{ +> +"position":2, +> +"memdev":"mem9", +> +"decoder":"decoder19.0" +> +}, +> +{ +> +"position":1, +> +"memdev":"mem10", +> +"decoder":"decoder20.0" +> +}, +> +{ +> +"position":0, +> +"memdev":"mem12", +> +"decoder":"decoder22.0" +> +} +> +] +> +} +> +cxl region: cmd_create_region: created 1 region +> +> +> Did the commit_store() crash stop reproducing with latest cxl/preview +> +> branch? +> +> +I missed the answer to this question. +> +> +All of these changes are now in Linus' tree perhaps give that a try and +> +post the debug log again? +Hi Dan, + +I've moved onto looking at this one. +1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy that +up +at some stage), 1 switch, 4 downstream switch ports each with a type 3 + +I'm not getting a crash, but can't successfully setup a region. +Upon adding the final target +It's failing in check_last_peer() as pos < distance. +Seems distance is 4 which makes me think it's using the wrong level of the +heirarchy for +some reason or that distance check is wrong. +Wasn't a good idea to just skip that step though as it goes boom - though +stack trace is not useful. 
+ +Jonathan + +On Wed, 17 Aug 2022 17:16:19 +0100 +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: + +> +On Thu, 11 Aug 2022 17:46:55 -0700 +> +Dan Williams <dan.j.williams@intel.com> wrote: +> +> +> Dan Williams wrote: +> +> > Bobo WL wrote: +> +> > > Hi Dan, +> +> > > +> +> > > Thanks for your reply! +> +> > > +> +> > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> +> +> > > wrote: +> +> > > > +> +> > > > What is the output of: +> +> > > > +> +> > > > cxl list -MDTu -d decoder0.0 +> +> > > > +> +> > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or +> +> > > > at least not in the specified order, or that validation check is +> +> > > > broken. +> +> > > +> +> > > Command "cxl list -MDTu -d decoder0.0" output: +> +> > +> +> > Thanks for this, I think I know the problem, but will try some +> +> > experiments with cxl_test first. +> +> +> +> Hmm, so my cxl_test experiment unfortunately passed so I'm not +> +> reproducing the failure mode. This is the result of creating x4 region +> +> with devices directly attached to a single host-bridge: +> +> +> +> # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s +> +> $((1<<30)) +> +> { +> +> "region":"region8", +> +> "resource":"0xf1f0000000", +> +> "size":"1024.00 MiB (1073.74 MB)", +> +> "interleave_ways":4, +> +> "interleave_granularity":256, +> +> "decode_state":"commit", +> +> "mappings":[ +> +> { +> +> "position":3, +> +> "memdev":"mem11", +> +> "decoder":"decoder21.0" +> +> }, +> +> { +> +> "position":2, +> +> "memdev":"mem9", +> +> "decoder":"decoder19.0" +> +> }, +> +> { +> +> "position":1, +> +> "memdev":"mem10", +> +> "decoder":"decoder20.0" +> +> }, +> +> { +> +> "position":0, +> +> "memdev":"mem12", +> +> "decoder":"decoder22.0" +> +> } +> +> ] +> +> } +> +> cxl region: cmd_create_region: created 1 region +> +> +> +> > Did the commit_store() crash stop reproducing with latest cxl/preview +> +> > branch? +> +> +> +> I missed the answer to this question. +> +> +> +> All of these changes are now in Linus' tree perhaps give that a try and +> +> post the debug log again? +> +> +Hi Dan, +> +> +I've moved onto looking at this one. +> +1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy +> +that up +> +at some stage), 1 switch, 4 downstream switch ports each with a type 3 +> +> +I'm not getting a crash, but can't successfully setup a region. +> +Upon adding the final target +> +It's failing in check_last_peer() as pos < distance. +> +Seems distance is 4 which makes me think it's using the wrong level of the +> +heirarchy for +> +some reason or that distance check is wrong. +> +Wasn't a good idea to just skip that step though as it goes boom - though +> +stack trace is not useful. +Turns out really weird corruption happens if you accidentally back two type3 +devices +with the same memory device. Who would have thought it :) + +That aside ignoring the check_last_peer() failure seems to make everything work +for this +topology. I'm not seeing the crash, so my guess is we fixed it somewhere along +the way. + +Now for the fun one. I've replicated the crash if we have + +1HB 1*RP 1SW, 4SW-DSP, 4Type3 + +Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be +programmed +but the null pointer dereference isn't related to that. + +The bug is straight forward. Not all decoders have commit callbacks... Will +send out +a possible fix shortly. 
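+
+A minimal sketch of the class of fix being promised here (an illustration,
+not the patch that was actually sent): treat the decoder commit callback as
+optional and guard the call, since pass-through decoders may never have one
+populated.
+
+/* e.g. on the region commit path, where cxld is a struct cxl_decoder * */
+if (cxld->commit)
+        rc = cxld->commit(cxld);
+else
+        rc = 0;         /* nothing to program for this decoder */
+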
+ +Jonathan + + + +> +> +Jonathan +> +> +> +> +> +> + +On Thu, 18 Aug 2022 17:37:40 +0100 +Jonathan Cameron via <qemu-devel@nongnu.org> wrote: + +> +On Wed, 17 Aug 2022 17:16:19 +0100 +> +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> On Thu, 11 Aug 2022 17:46:55 -0700 +> +> Dan Williams <dan.j.williams@intel.com> wrote: +> +> +> +> > Dan Williams wrote: +> +> > > Bobo WL wrote: +> +> > > > Hi Dan, +> +> > > > +> +> > > > Thanks for your reply! +> +> > > > +> +> > > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams +> +> > > > <dan.j.williams@intel.com> wrote: +> +> > > > > +> +> > > > > What is the output of: +> +> > > > > +> +> > > > > cxl list -MDTu -d decoder0.0 +> +> > > > > +> +> > > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, +> +> > > > > or +> +> > > > > at least not in the specified order, or that validation check is +> +> > > > > broken. +> +> > > > +> +> > > > Command "cxl list -MDTu -d decoder0.0" output: +> +> > > +> +> > > Thanks for this, I think I know the problem, but will try some +> +> > > experiments with cxl_test first. +> +> > +> +> > Hmm, so my cxl_test experiment unfortunately passed so I'm not +> +> > reproducing the failure mode. This is the result of creating x4 region +> +> > with devices directly attached to a single host-bridge: +> +> > +> +> > # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s +> +> > $((1<<30)) +> +> > { +> +> > "region":"region8", +> +> > "resource":"0xf1f0000000", +> +> > "size":"1024.00 MiB (1073.74 MB)", +> +> > "interleave_ways":4, +> +> > "interleave_granularity":256, +> +> > "decode_state":"commit", +> +> > "mappings":[ +> +> > { +> +> > "position":3, +> +> > "memdev":"mem11", +> +> > "decoder":"decoder21.0" +> +> > }, +> +> > { +> +> > "position":2, +> +> > "memdev":"mem9", +> +> > "decoder":"decoder19.0" +> +> > }, +> +> > { +> +> > "position":1, +> +> > "memdev":"mem10", +> +> > "decoder":"decoder20.0" +> +> > }, +> +> > { +> +> > "position":0, +> +> > "memdev":"mem12", +> +> > "decoder":"decoder22.0" +> +> > } +> +> > ] +> +> > } +> +> > cxl region: cmd_create_region: created 1 region +> +> > +> +> > > Did the commit_store() crash stop reproducing with latest cxl/preview +> +> > > branch? +> +> > +> +> > I missed the answer to this question. +> +> > +> +> > All of these changes are now in Linus' tree perhaps give that a try and +> +> > post the debug log again? +> +> +> +> Hi Dan, +> +> +> +> I've moved onto looking at this one. +> +> 1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy +> +> that up +> +> at some stage), 1 switch, 4 downstream switch ports each with a type 3 +> +> +> +> I'm not getting a crash, but can't successfully setup a region. +> +> Upon adding the final target +> +> It's failing in check_last_peer() as pos < distance. +> +> Seems distance is 4 which makes me think it's using the wrong level of the +> +> heirarchy for +> +> some reason or that distance check is wrong. +> +> Wasn't a good idea to just skip that step though as it goes boom - though +> +> stack trace is not useful. +> +> +Turns out really weird corruption happens if you accidentally back two type3 +> +devices +> +with the same memory device. Who would have thought it :) +> +> +That aside ignoring the check_last_peer() failure seems to make everything +> +work for this +> +topology. I'm not seeing the crash, so my guess is we fixed it somewhere +> +along the way. +> +> +Now for the fun one. 
I've replicated the crash if we have +> +> +1HB 1*RP 1SW, 4SW-DSP, 4Type3 +> +> +Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be +> +programmed +> +but the null pointer dereference isn't related to that. +> +> +The bug is straight forward. Not all decoders have commit callbacks... Will +> +send out +> +a possible fix shortly. +> +For completeness I'm carrying this hack because I haven't gotten my head +around the right fix for check_last_peer() failing on this test topology. + +diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c +index c49d9a5f1091..275e143bd748 100644 +--- a/drivers/cxl/core/region.c ++++ b/drivers/cxl/core/region.c +@@ -978,7 +978,7 @@ static int cxl_port_setup_targets(struct cxl_port *port, + rc = check_last_peer(cxled, ep, cxl_rr, + distance); + if (rc) +- return rc; ++ // return rc; + goto out_target_set; + } + goto add_target; +-- + +I might find more bugs with more testing, but this is all the ones I've +seen so far + in Bobo's reports. Qemu fixes are now in upstream so +will be there in the release. + +As a reminder, testing on QEMU has a few corners... + +Need a patch to add serial number ECAP support. It is on list for revew, +but will have wait for after QEMU 7.1 release (which may be next week) + +QEMU still assumes HDM decoder on the host bridge will be programmed. +So if you want anything to work there should be at least +2 RP below the HB (no need to plug anything in to one of them). + +I don't want to add a commandline parameter to hide the decoder in QEMU +and detecting there is only one RP would require moving a bunch of static +stuff into runtime code (I think). + +I still think we should make the kernel check to see if there is a decoder, +but if not I might see how bad a hack it is to have QEMU ignore that decoder +if not committed in this one special case (HB HDM decoder with only one place +it can send stuff). Obviously that would be a break from specification +so less than idea! + +Thanks, + +Jonathan + +On Fri, 19 Aug 2022 09:46:55 +0100 +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: + +> +On Thu, 18 Aug 2022 17:37:40 +0100 +> +Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> +> On Wed, 17 Aug 2022 17:16:19 +0100 +> +> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> +> > On Thu, 11 Aug 2022 17:46:55 -0700 +> +> > Dan Williams <dan.j.williams@intel.com> wrote: +> +> > +> +> > > Dan Williams wrote: +> +> > > > Bobo WL wrote: +> +> > > > > Hi Dan, +> +> > > > > +> +> > > > > Thanks for your reply! +> +> > > > > +> +> > > > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams +> +> > > > > <dan.j.williams@intel.com> wrote: +> +> > > > > > +> +> > > > > > What is the output of: +> +> > > > > > +> +> > > > > > cxl list -MDTu -d decoder0.0 +> +> > > > > > +> +> > > > > > ...? It might be the case that mem1 cannot be mapped by +> +> > > > > > decoder0.0, or +> +> > > > > > at least not in the specified order, or that validation check is +> +> > > > > > broken. +> +> > > > > +> +> > > > > Command "cxl list -MDTu -d decoder0.0" output: +> +> > > > +> +> > > > Thanks for this, I think I know the problem, but will try some +> +> > > > experiments with cxl_test first. +> +> > > +> +> > > Hmm, so my cxl_test experiment unfortunately passed so I'm not +> +> > > reproducing the failure mode. 
This is the result of creating x4 region +> +> > > with devices directly attached to a single host-bridge: +> +> > > +> +> > > # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s +> +> > > $((1<<30)) +> +> > > { +> +> > > "region":"region8", +> +> > > "resource":"0xf1f0000000", +> +> > > "size":"1024.00 MiB (1073.74 MB)", +> +> > > "interleave_ways":4, +> +> > > "interleave_granularity":256, +> +> > > "decode_state":"commit", +> +> > > "mappings":[ +> +> > > { +> +> > > "position":3, +> +> > > "memdev":"mem11", +> +> > > "decoder":"decoder21.0" +> +> > > }, +> +> > > { +> +> > > "position":2, +> +> > > "memdev":"mem9", +> +> > > "decoder":"decoder19.0" +> +> > > }, +> +> > > { +> +> > > "position":1, +> +> > > "memdev":"mem10", +> +> > > "decoder":"decoder20.0" +> +> > > }, +> +> > > { +> +> > > "position":0, +> +> > > "memdev":"mem12", +> +> > > "decoder":"decoder22.0" +> +> > > } +> +> > > ] +> +> > > } +> +> > > cxl region: cmd_create_region: created 1 region +> +> > > +> +> > > > Did the commit_store() crash stop reproducing with latest cxl/preview +> +> > > > branch? +> +> > > +> +> > > I missed the answer to this question. +> +> > > +> +> > > All of these changes are now in Linus' tree perhaps give that a try and +> +> > > post the debug log again? +> +> > +> +> > Hi Dan, +> +> > +> +> > I've moved onto looking at this one. +> +> > 1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy +> +> > that up +> +> > at some stage), 1 switch, 4 downstream switch ports each with a type 3 +> +> > +> +> > I'm not getting a crash, but can't successfully setup a region. +> +> > Upon adding the final target +> +> > It's failing in check_last_peer() as pos < distance. +> +> > Seems distance is 4 which makes me think it's using the wrong level of +> +> > the heirarchy for +> +> > some reason or that distance check is wrong. +> +> > Wasn't a good idea to just skip that step though as it goes boom - though +> +> > stack trace is not useful. +> +> +> +> Turns out really weird corruption happens if you accidentally back two +> +> type3 devices +> +> with the same memory device. Who would have thought it :) +> +> +> +> That aside ignoring the check_last_peer() failure seems to make everything +> +> work for this +> +> topology. I'm not seeing the crash, so my guess is we fixed it somewhere +> +> along the way. +> +> +> +> Now for the fun one. I've replicated the crash if we have +> +> +> +> 1HB 1*RP 1SW, 4SW-DSP, 4Type3 +> +> +> +> Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be +> +> programmed +> +> but the null pointer dereference isn't related to that. +> +> +> +> The bug is straight forward. Not all decoders have commit callbacks... +> +> Will send out +> +> a possible fix shortly. +> +> +> +For completeness I'm carrying this hack because I haven't gotten my head +> +around the right fix for check_last_peer() failing on this test topology. +> +> +diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c +> +index c49d9a5f1091..275e143bd748 100644 +> +--- a/drivers/cxl/core/region.c +> ++++ b/drivers/cxl/core/region.c +> +@@ -978,7 +978,7 @@ static int cxl_port_setup_targets(struct cxl_port *port, +> +rc = check_last_peer(cxled, ep, cxl_rr, +> +distance); +> +if (rc) +> +- return rc; +> ++ // return rc; +> +goto out_target_set; +> +} +> +goto add_target; +I'm still carrying this hack and still haven't worked out the right fix. + +Suggestions welcome! If not I'll hopefully get some time on this +towards the end of the week. 
+ +Jonathan + diff --git a/classification_output/01/instruction/2880487 b/classification_output/01/instruction/2880487 new file mode 100644 index 000000000..1d455d6fa --- /dev/null +++ b/classification_output/01/instruction/2880487 @@ -0,0 +1,187 @@ +instruction: 0.925 +semantic: 0.924 +other: 0.894 +mistranslation: 0.826 + +[BUG] AArch64 boot hang with -icount and -smp >1 (iothread locking issue?) + +Hello, + +I am encountering one or more bugs when using -icount and -smp >1 that I am +attempting to sort out. My current theory is that it is an iothread locking +issue. + +I am using a command-line like the following where $kernel is a recent upstream +AArch64 Linux kernel Image (I can provide a binary if that would be helpful - +let me know how is best to post): + + qemu-system-aarch64 \ + -M virt -cpu cortex-a57 -m 1G \ + -nographic \ + -smp 2 \ + -icount 0 \ + -kernel $kernel + +For any/all of the symptoms described below, they seem to disappear when I +either remove `-icount 0` or change smp to `-smp 1`. In other words, it is the +combination of `-smp >1` and `-icount` which triggers what I'm seeing. + +I am seeing two different (but seemingly related) behaviors. The first (and +what I originally started debugging) shows up as a boot hang. When booting +using the above command after Peter's "icount: Take iothread lock when running +QEMU timers" patch [1], The kernel boots for a while and then hangs after: + +> +...snip... +> +[ 0.010764] Serial: AMBA PL011 UART driver +> +[ 0.016334] 9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 13, base_baud +> += 0) is a PL011 rev1 +> +[ 0.016907] printk: console [ttyAMA0] enabled +> +[ 0.017624] KASLR enabled +> +[ 0.031986] HugeTLB: registered 16.0 GiB page size, pre-allocated 0 pages +> +[ 0.031986] HugeTLB: 16320 KiB vmemmap can be freed for a 16.0 GiB page +> +[ 0.031986] HugeTLB: registered 512 MiB page size, pre-allocated 0 pages +> +[ 0.031986] HugeTLB: 448 KiB vmemmap can be freed for a 512 MiB page +> +[ 0.031986] HugeTLB: registered 2.00 MiB page size, pre-allocated 0 pages +> +[ 0.031986] HugeTLB: 0 KiB vmemmap can be freed for a 2.00 MiB page +When it hangs here, I drop into QEMU's console, attach to the gdbserver, and it +always reports that it is at address 0xffff800008dc42e8 (as shown below from an +objdump of the vmlinux). 
I note this is in the middle of messing with timer +system registers - which makes me suspect we're attempting to take the iothread +lock when its already held: + +> +ffff800008dc42b8 <arch_timer_set_next_event_virt>: +> +ffff800008dc42b8: d503201f nop +> +ffff800008dc42bc: d503201f nop +> +ffff800008dc42c0: d503233f paciasp +> +ffff800008dc42c4: d53be321 mrs x1, cntv_ctl_el0 +> +ffff800008dc42c8: 32000021 orr w1, w1, #0x1 +> +ffff800008dc42cc: d5033fdf isb +> +ffff800008dc42d0: d53be042 mrs x2, cntvct_el0 +> +ffff800008dc42d4: ca020043 eor x3, x2, x2 +> +ffff800008dc42d8: 8b2363e3 add x3, sp, x3 +> +ffff800008dc42dc: f940007f ldr xzr, [x3] +> +ffff800008dc42e0: 8b020000 add x0, x0, x2 +> +ffff800008dc42e4: d51be340 msr cntv_cval_el0, x0 +> +* ffff800008dc42e8: 927ef820 and x0, x1, #0xfffffffffffffffd +> +ffff800008dc42ec: d51be320 msr cntv_ctl_el0, x0 +> +ffff800008dc42f0: d5033fdf isb +> +ffff800008dc42f4: 52800000 mov w0, #0x0 +> +// #0 +> +ffff800008dc42f8: d50323bf autiasp +> +ffff800008dc42fc: d65f03c0 ret +The second behavior is that prior to Peter's "icount: Take iothread lock when +running QEMU timers" patch [1], I observe the following message (same command +as above): + +> +ERROR:../accel/tcg/tcg-accel-ops.c:79:tcg_handle_interrupt: assertion failed: +> +(qemu_mutex_iothread_locked()) +> +Aborted (core dumped) +This is the same behavior described in Gitlab issue 1130 [0] and addressed by +[1]. I bisected the appearance of this assertion, and found it was introduced +by Pavel's "replay: rewrite async event handling" commit [2]. Commits prior to +that one boot successfully (neither assertions nor hangs) with `-icount 0 -smp +2`. + +I've looked over these two commits ([1], [2]), but it is not obvious to me +how/why they might be interacting to produce the boot hangs I'm seeing and +I welcome any help investigating further. + +Thanks! + +-Aaron Lindsay + +[0] - +https://gitlab.com/qemu-project/qemu/-/issues/1130 +[1] - +https://gitlab.com/qemu-project/qemu/-/commit/c7f26ded6d5065e4116f630f6a490b55f6c5f58e +[2] - +https://gitlab.com/qemu-project/qemu/-/commit/60618e2d77691e44bb78e23b2b0cf07b5c405e56 + +On Fri, 21 Oct 2022 at 16:48, Aaron Lindsay +<aaron@os.amperecomputing.com> wrote: +> +> +Hello, +> +> +I am encountering one or more bugs when using -icount and -smp >1 that I am +> +attempting to sort out. My current theory is that it is an iothread locking +> +issue. +Weird coincidence, that is a bug that's been in the tree for months +but was only reported to me earlier this week. Try reverting +commit a82fd5a4ec24d923ff1e -- that should fix it. +CAFEAcA_i8x00hD-4XX18ySLNbCB6ds1-DSazVb4yDnF8skjd9A@mail.gmail.com +/">https://lore.kernel.org/qemu-devel/ +CAFEAcA_i8x00hD-4XX18ySLNbCB6ds1-DSazVb4yDnF8skjd9A@mail.gmail.com +/ +has the explanation. + +thanks +-- PMM + +On Oct 21 17:00, Peter Maydell wrote: +> +On Fri, 21 Oct 2022 at 16:48, Aaron Lindsay +> +<aaron@os.amperecomputing.com> wrote: +> +> +> +> Hello, +> +> +> +> I am encountering one or more bugs when using -icount and -smp >1 that I am +> +> attempting to sort out. My current theory is that it is an iothread locking +> +> issue. +> +> +Weird coincidence, that is a bug that's been in the tree for months +> +but was only reported to me earlier this week. Try reverting +> +commit a82fd5a4ec24d923ff1e -- that should fix it. +I can confirm that reverting a82fd5a4ec24d923ff1e fixes it for me. +Thanks for the help and fast response! 
+ +-Aaron + diff --git a/classification_output/01/instruction/3457423 b/classification_output/01/instruction/3457423 new file mode 100644 index 000000000..ffcf905b4 --- /dev/null +++ b/classification_output/01/instruction/3457423 @@ -0,0 +1,40 @@ +instruction: 0.778 +semantic: 0.635 +mistranslation: 0.537 +other: 0.236 + +[Qemu-devel] [BUG] Failed to compile using gcc7.1 + +Hi all, + +After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with gcc. + +The error is: + +------ + CC block/blkdebug.o +block/blkdebug.c: In function 'blkdebug_refresh_filename': +block/blkdebug.c:693:31: error: '%s' directive output may be truncated +writing up to 4095 bytes into a region of size 4086 +[-Werror=format-truncation=] +"blkdebug:%s:%s", s->config_file ?: "", + ^~ +In file included from /usr/include/stdio.h:939:0, + from /home/adam/qemu/include/qemu/osdep.h:68, + from block/blkdebug.c:25: +/usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' +output 11 or more bytes (assuming 4106) into a destination of size 4096 +return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1, + ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + __bos (__s), __fmt, __va_arg_pack ()); + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +cc1: all warnings being treated as errors +make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1 +------ + +It seems that gcc 7 is introducing more restrict check for printf. +If using clang, although there are some extra warning, it can at least +pass the compile. +Thanks, +Qu + diff --git a/classification_output/01/instruction/5843372 b/classification_output/01/instruction/5843372 new file mode 100644 index 000000000..784962c9c --- /dev/null +++ b/classification_output/01/instruction/5843372 @@ -0,0 +1,2056 @@ +instruction: 0.818 +other: 0.811 +semantic: 0.793 +mistranslation: 0.758 + +[BUG, RFC] Block graph deadlock on job-dismiss + +Hi all, + +There's a bug in block layer which leads to block graph deadlock. +Notably, it takes place when blockdev IO is processed within a separate +iothread. + +This was initially caught by our tests, and I was able to reduce it to a +relatively simple reproducer. Such deadlocks are probably supposed to +be covered in iotests/graph-changes-while-io, but this deadlock isn't. + +Basically what the reproducer does is launches QEMU with a drive having +'iothread' option set, creates a chain of 2 snapshots, launches +block-commit job for a snapshot and then dismisses the job, starting +from the lower snapshot. If the guest is issuing IO at the same time, +there's a race in acquiring block graph lock and a potential deadlock. + +Here's how it can be reproduced: + +1. Run QEMU: +> +SRCDIR=/path/to/srcdir +> +> +> +> +> +$SRCDIR/build/qemu-system-x86_64 -enable-kvm \ +> +> +-machine q35 -cpu Nehalem \ +> +> +-name guest=alma8-vm,debug-threads=on \ +> +> +-m 2g -smp 2 \ +> +> +-nographic -nodefaults \ +> +> +-qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \ +> +> +-serial unix:/var/run/alma8-serial.sock,server=on,wait=off \ +> +> +-object iothread,id=iothread0 \ +> +> +-blockdev +> +node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2 +> +\ +> +-device virtio-blk-pci,drive=disk,iothread=iothread0 +2. Launch IO (random reads) from within the guest: +> +nc -U /var/run/alma8-serial.sock +> +... 
+> +[root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k +> +--size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting +> +--rw=randread --iodepth=1 --filename=/testfile +3. Run snapshots creation & removal of lower snapshot operation in a +loop (script attached): +> +while /bin/true ; do ./remove_lower_snap.sh ; done +And then it occasionally hangs. + +Note: I've tried bisecting this, and looks like deadlock occurs starting +from the following commit: + +(BAD) 5bdbaebcce virtio: Re-enable notifications after drain +(GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll + +On the latest v10.0.0 it does hang as well. + + +Here's backtrace of the main thread: + +> +#0 0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1, +> +timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:43 +> +#1 0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1, +> +timeout=-1) at ../util/qemu-timer.c:329 +> +#2 0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20, +> +ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79 +> +#3 0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) at +> +../util/aio-posix.c:730 +> +#4 0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950, +> +parent=0x0, poll=true) at ../block/io.c:378 +> +#5 0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at +> +../block/io.c:391 +> +#6 0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950, +> +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +errp=0x0) +> +at ../block.c:7682 +> +#7 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7964250, +> +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +errp=0x0) +> +at ../block.c:7608 +> +#8 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0, +> +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +errp=0x0) +> +at ../block.c:7668 +> +#9 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7e59110, +> +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +errp=0x0) +> +at ../block.c:7608 +> +#10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960, +> +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +errp=0x0) +> +at ../block.c:7668 +> +#11 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb814ed80, +> +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +errp=0x0) +> +at ../block.c:7608 +> +#12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0, +> +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +errp=0x0) +> +at ../blockjob.c:157 +> +#13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb7c9d3f0, +> +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +errp=0x0) +> +at ../block.c:7592 +> +#14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310, +> +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +errp=0x0) +> +at ../block.c:7661 +> +#15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx +> +(child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = +> +{...}, tran=0x557eb7a87160, errp=0x0) at ../block.c:1234 +> +#16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb8565af0, +> +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +errp=0x0) +> +at ../block.c:7592 +> +#17 
0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0, +> +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +errp=0x0) +> +at ../block.c:7661 +> +#18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context (bs=0x557eb79575e0, +> +ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at ../block.c:7715 +> +#19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) at +> +../block.c:3317 +> +#20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv (job=0x557eb7952800) at +> +../blockjob.c:209 +> +#21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at +> +../blockjob.c:82 +> +#22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at +> +../job.c:474 +> +#23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at +> +../job.c:771 +> +#24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400, +> +errp=0x7ffd94b4f488) at ../job.c:783 +> +--Type <RET> for more, q to quit, c to continue without paging-- +> +#25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 "commit-snap1", +> +errp=0x7ffd94b4f488) at ../job-qmp.c:138 +> +#26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0, +> +ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221 +> +#27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at +> +../qapi/qmp-dispatch.c:128 +> +#28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at +> +../util/async.c:172 +> +#29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at +> +../util/async.c:219 +> +#30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at +> +../util/aio-posix.c:436 +> +#31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200, +> +callback=0x0, user_data=0x0) at ../util/async.c:361 +> +#32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at +> +../glib/gmain.c:3364 +> +#33 g_main_context_dispatch (context=0x557eb76c6430) at ../glib/gmain.c:4079 +> +#34 0x0000557eb47d3ab1 in glib_pollfds_poll () at ../util/main-loop.c:287 +> +#35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at +> +../util/main-loop.c:310 +> +#36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at +> +../util/main-loop.c:589 +> +#37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835 +> +#38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at +> +../system/main.c:50 +> +#39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at +> +../system/main.c:80 +And here's coroutine trying to acquire read lock: + +> +(gdb) qemu coroutine reader_queue->entries.sqh_first +> +#0 0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0, +> +to_=0x7fc537fff508, action=COROUTINE_YIELD) at +> +../util/coroutine-ucontext.c:321 +> +#1 0x0000557eb47d4d4a in qemu_coroutine_yield () at +> +../util/qemu-coroutine.c:339 +> +#2 0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0 +> +<reader_queue>, lock=0x7fc53c57de50, flags=0) at +> +../util/qemu-coroutine-lock.c:60 +> +#3 0x0000557eb461fea7 in bdrv_graph_co_rdlock () at ../block/graph-lock.c:231 +> +#4 0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) at +> +/home/root/src/qemu/master/include/block/graph-lock.h:213 +> +#5 0x0000557eb460fa41 in blk_co_do_preadv_part +> +(blk=0x557eb84c0810, offset=6890553344, bytes=4096, qiov=0x7fc530006988, +> +qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at ../block/block-backend.c:1339 +> +#6 0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at +> +../block/block-backend.c:1619 +> +#7 
0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, i1=21886) at +> +../util/coroutine-ucontext.c:175 +> +#8 0x00007fc547c2a360 in __start_context () at +> +../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 +> +#9 0x00007ffd94b4ea40 in () +> +#10 0x0000000000000000 in () +So it looks like main thread is processing job-dismiss request and is +holding write lock taken in block_job_remove_all_bdrv() (frame #20 +above). At the same time iothread spawns a coroutine which performs IO +request. Before the coroutine is spawned, blk_aio_prwv() increases +'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is +trying to acquire the read lock. But main thread isn't releasing the +lock as blk_root_drained_poll() returns true since blk->in_flight > 0. +Here's the deadlock. + +Any comments and suggestions on the subject are welcomed. Thanks! + +Andrey +remove_lower_snap.sh +Description: +application/shellscript + +On 4/24/25 8:32 PM, Andrey Drobyshev wrote: +> +Hi all, +> +> +There's a bug in block layer which leads to block graph deadlock. +> +Notably, it takes place when blockdev IO is processed within a separate +> +iothread. +> +> +This was initially caught by our tests, and I was able to reduce it to a +> +relatively simple reproducer. Such deadlocks are probably supposed to +> +be covered in iotests/graph-changes-while-io, but this deadlock isn't. +> +> +Basically what the reproducer does is launches QEMU with a drive having +> +'iothread' option set, creates a chain of 2 snapshots, launches +> +block-commit job for a snapshot and then dismisses the job, starting +> +from the lower snapshot. If the guest is issuing IO at the same time, +> +there's a race in acquiring block graph lock and a potential deadlock. +> +> +Here's how it can be reproduced: +> +> +[...] +> +I took a closer look at iotests/graph-changes-while-io, and have managed +to reproduce the same deadlock in a much simpler setup, without a guest. + +1. Run QSD:> ./build/storage-daemon/qemu-storage-daemon --object +iothread,id=iothread0 \ +> +--blockdev null-co,node-name=node0,read-zeroes=true \ +> +> +--nbd-server addr.type=unix,addr.path=/var/run/qsd_nbd.sock \ +> +> +--export +> +nbd,id=exp0,node-name=node0,iothread=iothread0,fixed-iothread=true,writable=true +> +\ +> +--chardev +> +socket,id=qmp-sock,path=/var/run/qsd_qmp.sock,server=on,wait=off \ +> +--monitor chardev=qmp-sock +2. Launch IO: +> +qemu-img bench -f raw -c 2000000 +> +'nbd+unix:///node0?socket=/var/run/qsd_nbd.sock' +3. Add 2 snapshots and remove lower one (script attached):> while +/bin/true ; do ./rls_qsd.sh ; done + +And then it hangs. + +I'll also send a patch with corresponding test case added directly to +iotests. + +This reproduce seems to be hanging starting from Fiona's commit +67446e605dc ("blockjob: drop AioContext lock before calling +bdrv_graph_wrlock()"). AioContext locks were dropped entirely later on +in Stefan's commit b49f4755c7 ("block: remove AioContext locking"), but +the problem remains. + +Andrey +rls_qsd.sh +Description: +application/shellscript + +From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> + +This case is catching potential deadlock which takes place when job-dismiss +is issued when I/O requests are processed in a separate iothread. 
+ +See +https://mail.gnu.org/archive/html/qemu-devel/2025-04/msg04421.html +Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> +--- + .../qemu-iotests/tests/graph-changes-while-io | 101 ++++++++++++++++-- + .../tests/graph-changes-while-io.out | 4 +- + 2 files changed, 96 insertions(+), 9 deletions(-) + +diff --git a/tests/qemu-iotests/tests/graph-changes-while-io +b/tests/qemu-iotests/tests/graph-changes-while-io +index 194fda500e..e30f823da4 100755 +--- a/tests/qemu-iotests/tests/graph-changes-while-io ++++ b/tests/qemu-iotests/tests/graph-changes-while-io +@@ -27,6 +27,8 @@ from iotests import imgfmt, qemu_img, qemu_img_create, +qemu_io, \ + + + top = os.path.join(iotests.test_dir, 'top.img') ++snap1 = os.path.join(iotests.test_dir, 'snap1.img') ++snap2 = os.path.join(iotests.test_dir, 'snap2.img') + nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock') + + +@@ -58,6 +60,15 @@ class TestGraphChangesWhileIO(QMPTestCase): + def tearDown(self) -> None: + self.qsd.stop() + ++ def _wait_for_blockjob(self, status) -> None: ++ done = False ++ while not done: ++ for event in self.qsd.get_qmp().get_events(wait=10.0): ++ if event['event'] != 'JOB_STATUS_CHANGE': ++ continue ++ if event['data']['status'] == status: ++ done = True ++ + def test_blockdev_add_while_io(self) -> None: + # Run qemu-img bench in the background + bench_thr = Thread(target=do_qemu_img_bench) +@@ -116,13 +127,89 @@ class TestGraphChangesWhileIO(QMPTestCase): + 'device': 'job0', + }) + +- cancelled = False +- while not cancelled: +- for event in self.qsd.get_qmp().get_events(wait=10.0): +- if event['event'] != 'JOB_STATUS_CHANGE': +- continue +- if event['data']['status'] == 'null': +- cancelled = True ++ self._wait_for_blockjob('null') ++ ++ bench_thr.join() ++ ++ def test_remove_lower_snapshot_while_io(self) -> None: ++ # Run qemu-img bench in the background ++ bench_thr = Thread(target=do_qemu_img_bench, args=(100000, )) ++ bench_thr.start() ++ ++ # While I/O is performed on 'node0' node, consequently add 2 snapshots ++ # on top of it, then remove (commit) them starting from lower one. 
++ while bench_thr.is_alive(): ++ # Recreate snapshot images on every iteration ++ qemu_img_create('-f', imgfmt, snap1, '1G') ++ qemu_img_create('-f', imgfmt, snap2, '1G') ++ ++ self.qsd.cmd('blockdev-add', { ++ 'driver': imgfmt, ++ 'node-name': 'snap1', ++ 'file': { ++ 'driver': 'file', ++ 'filename': snap1 ++ } ++ }) ++ ++ self.qsd.cmd('blockdev-snapshot', { ++ 'node': 'node0', ++ 'overlay': 'snap1', ++ }) ++ ++ self.qsd.cmd('blockdev-add', { ++ 'driver': imgfmt, ++ 'node-name': 'snap2', ++ 'file': { ++ 'driver': 'file', ++ 'filename': snap2 ++ } ++ }) ++ ++ self.qsd.cmd('blockdev-snapshot', { ++ 'node': 'snap1', ++ 'overlay': 'snap2', ++ }) ++ ++ self.qsd.cmd('block-commit', { ++ 'job-id': 'commit-snap1', ++ 'device': 'snap2', ++ 'top-node': 'snap1', ++ 'base-node': 'node0', ++ 'auto-finalize': True, ++ 'auto-dismiss': False, ++ }) ++ ++ self._wait_for_blockjob('concluded') ++ self.qsd.cmd('job-dismiss', { ++ 'id': 'commit-snap1', ++ }) ++ ++ self.qsd.cmd('block-commit', { ++ 'job-id': 'commit-snap2', ++ 'device': 'snap2', ++ 'top-node': 'snap2', ++ 'base-node': 'node0', ++ 'auto-finalize': True, ++ 'auto-dismiss': False, ++ }) ++ ++ self._wait_for_blockjob('ready') ++ self.qsd.cmd('job-complete', { ++ 'id': 'commit-snap2', ++ }) ++ ++ self._wait_for_blockjob('concluded') ++ self.qsd.cmd('job-dismiss', { ++ 'id': 'commit-snap2', ++ }) ++ ++ self.qsd.cmd('blockdev-del', { ++ 'node-name': 'snap1' ++ }) ++ self.qsd.cmd('blockdev-del', { ++ 'node-name': 'snap2' ++ }) + + bench_thr.join() + +diff --git a/tests/qemu-iotests/tests/graph-changes-while-io.out +b/tests/qemu-iotests/tests/graph-changes-while-io.out +index fbc63e62f8..8d7e996700 100644 +--- a/tests/qemu-iotests/tests/graph-changes-while-io.out ++++ b/tests/qemu-iotests/tests/graph-changes-while-io.out +@@ -1,5 +1,5 @@ +-.. ++... + ---------------------------------------------------------------------- +-Ran 2 tests ++Ran 3 tests + + OK +-- +2.43.5 + +Am 24.04.25 um 19:32 schrieb Andrey Drobyshev: +> +So it looks like main thread is processing job-dismiss request and is +> +holding write lock taken in block_job_remove_all_bdrv() (frame #20 +> +above). At the same time iothread spawns a coroutine which performs IO +> +request. Before the coroutine is spawned, blk_aio_prwv() increases +> +'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is +> +trying to acquire the read lock. But main thread isn't releasing the +> +lock as blk_root_drained_poll() returns true since blk->in_flight > 0. +> +Here's the deadlock. +And for the IO test you provided, it's client->nb_requests that behaves +similarly to blk->in_flight here. + +The issue also reproduces easily when issuing the following QMP command +in a loop while doing IO on a device: + +> +void qmp_block_locked_drain(const char *node_name, Error **errp) +> +{ +> +BlockDriverState *bs; +> +> +bs = bdrv_find_node(node_name); +> +if (!bs) { +> +error_setg(errp, "node not found"); +> +return; +> +} +> +> +bdrv_graph_wrlock(); +> +bdrv_drained_begin(bs); +> +bdrv_drained_end(bs); +> +bdrv_graph_wrunlock(); +> +} +It seems like either it would be necessary to require: +1. not draining inside an exclusively locked section +or +2. making sure that variables used by drained_poll routines are only set +while holding the reader lock +? 
+ +Those seem to require rather involved changes, so a third option might +be to make draining inside an exclusively locked section possible, by +embedding such locked sections in a drained section: + +> +diff --git a/blockjob.c b/blockjob.c +> +index 32007f31a9..9b2f3b3ea9 100644 +> +--- a/blockjob.c +> ++++ b/blockjob.c +> +@@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job) +> +* one to make sure that such a concurrent access does not attempt +> +* to process an already freed BdrvChild. +> +*/ +> ++ bdrv_drain_all_begin(); +> +bdrv_graph_wrlock(); +> +while (job->nodes) { +> +GSList *l = job->nodes; +> +@@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job) +> +g_slist_free_1(l); +> +} +> +bdrv_graph_wrunlock(); +> ++ bdrv_drain_all_end(); +> +} +> +> +bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs) +This seems to fix the issue at hand. I can send a patch if this is +considered an acceptable approach. + +Best Regards, +Fiona + +On 4/30/25 11:47 AM, Fiona Ebner wrote: +> +Am 24.04.25 um 19:32 schrieb Andrey Drobyshev: +> +> So it looks like main thread is processing job-dismiss request and is +> +> holding write lock taken in block_job_remove_all_bdrv() (frame #20 +> +> above). At the same time iothread spawns a coroutine which performs IO +> +> request. Before the coroutine is spawned, blk_aio_prwv() increases +> +> 'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is +> +> trying to acquire the read lock. But main thread isn't releasing the +> +> lock as blk_root_drained_poll() returns true since blk->in_flight > 0. +> +> Here's the deadlock. +> +> +And for the IO test you provided, it's client->nb_requests that behaves +> +similarly to blk->in_flight here. +> +> +The issue also reproduces easily when issuing the following QMP command +> +in a loop while doing IO on a device: +> +> +> void qmp_block_locked_drain(const char *node_name, Error **errp) +> +> { +> +> BlockDriverState *bs; +> +> +> +> bs = bdrv_find_node(node_name); +> +> if (!bs) { +> +> error_setg(errp, "node not found"); +> +> return; +> +> } +> +> +> +> bdrv_graph_wrlock(); +> +> bdrv_drained_begin(bs); +> +> bdrv_drained_end(bs); +> +> bdrv_graph_wrunlock(); +> +> } +> +> +It seems like either it would be necessary to require: +> +1. not draining inside an exclusively locked section +> +or +> +2. making sure that variables used by drained_poll routines are only set +> +while holding the reader lock +> +? +> +> +Those seem to require rather involved changes, so a third option might +> +be to make draining inside an exclusively locked section possible, by +> +embedding such locked sections in a drained section: +> +> +> diff --git a/blockjob.c b/blockjob.c +> +> index 32007f31a9..9b2f3b3ea9 100644 +> +> --- a/blockjob.c +> +> +++ b/blockjob.c +> +> @@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job) +> +> * one to make sure that such a concurrent access does not attempt +> +> * to process an already freed BdrvChild. +> +> */ +> +> + bdrv_drain_all_begin(); +> +> bdrv_graph_wrlock(); +> +> while (job->nodes) { +> +> GSList *l = job->nodes; +> +> @@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job) +> +> g_slist_free_1(l); +> +> } +> +> bdrv_graph_wrunlock(); +> +> + bdrv_drain_all_end(); +> +> } +> +> +> +> bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs) +> +> +This seems to fix the issue at hand. I can send a patch if this is +> +considered an acceptable approach. 
+> +> +Best Regards, +> +Fiona +> +Hello Fiona, + +Thanks for looking into it. I've tried your 3rd option above and can +confirm it does fix the deadlock, at least I can't reproduce it. Other +iotests also don't seem to be breaking. So I personally am fine with +that patch. Would be nice to hear a word from the maintainers though on +whether there're any caveats with such approach. + +Andrey + +On Wed, Apr 30, 2025 at 10:11â¯AM Andrey Drobyshev +<andrey.drobyshev@virtuozzo.com> wrote: +> +> +On 4/30/25 11:47 AM, Fiona Ebner wrote: +> +> Am 24.04.25 um 19:32 schrieb Andrey Drobyshev: +> +>> So it looks like main thread is processing job-dismiss request and is +> +>> holding write lock taken in block_job_remove_all_bdrv() (frame #20 +> +>> above). At the same time iothread spawns a coroutine which performs IO +> +>> request. Before the coroutine is spawned, blk_aio_prwv() increases +> +>> 'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is +> +>> trying to acquire the read lock. But main thread isn't releasing the +> +>> lock as blk_root_drained_poll() returns true since blk->in_flight > 0. +> +>> Here's the deadlock. +> +> +> +> And for the IO test you provided, it's client->nb_requests that behaves +> +> similarly to blk->in_flight here. +> +> +> +> The issue also reproduces easily when issuing the following QMP command +> +> in a loop while doing IO on a device: +> +> +> +>> void qmp_block_locked_drain(const char *node_name, Error **errp) +> +>> { +> +>> BlockDriverState *bs; +> +>> +> +>> bs = bdrv_find_node(node_name); +> +>> if (!bs) { +> +>> error_setg(errp, "node not found"); +> +>> return; +> +>> } +> +>> +> +>> bdrv_graph_wrlock(); +> +>> bdrv_drained_begin(bs); +> +>> bdrv_drained_end(bs); +> +>> bdrv_graph_wrunlock(); +> +>> } +> +> +> +> It seems like either it would be necessary to require: +> +> 1. not draining inside an exclusively locked section +> +> or +> +> 2. making sure that variables used by drained_poll routines are only set +> +> while holding the reader lock +> +> ? +> +> +> +> Those seem to require rather involved changes, so a third option might +> +> be to make draining inside an exclusively locked section possible, by +> +> embedding such locked sections in a drained section: +> +> +> +>> diff --git a/blockjob.c b/blockjob.c +> +>> index 32007f31a9..9b2f3b3ea9 100644 +> +>> --- a/blockjob.c +> +>> +++ b/blockjob.c +> +>> @@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job) +> +>> * one to make sure that such a concurrent access does not attempt +> +>> * to process an already freed BdrvChild. +> +>> */ +> +>> + bdrv_drain_all_begin(); +> +>> bdrv_graph_wrlock(); +> +>> while (job->nodes) { +> +>> GSList *l = job->nodes; +> +>> @@ -211,6 +212,7 @@ void block_job_remove_all_bdrv(BlockJob *job) +> +>> g_slist_free_1(l); +> +>> } +> +>> bdrv_graph_wrunlock(); +> +>> + bdrv_drain_all_end(); +> +>> } +> +>> +> +>> bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs) +> +> +> +> This seems to fix the issue at hand. I can send a patch if this is +> +> considered an acceptable approach. +Kevin is aware of this thread but it's a public holiday tomorrow so it +may be a little longer. + +Stefan + +Am 24.04.2025 um 19:32 hat Andrey Drobyshev geschrieben: +> +Hi all, +> +> +There's a bug in block layer which leads to block graph deadlock. +> +Notably, it takes place when blockdev IO is processed within a separate +> +iothread. 
+> +> +This was initially caught by our tests, and I was able to reduce it to a +> +relatively simple reproducer. Such deadlocks are probably supposed to +> +be covered in iotests/graph-changes-while-io, but this deadlock isn't. +> +> +Basically what the reproducer does is launches QEMU with a drive having +> +'iothread' option set, creates a chain of 2 snapshots, launches +> +block-commit job for a snapshot and then dismisses the job, starting +> +from the lower snapshot. If the guest is issuing IO at the same time, +> +there's a race in acquiring block graph lock and a potential deadlock. +> +> +Here's how it can be reproduced: +> +> +1. Run QEMU: +> +> SRCDIR=/path/to/srcdir +> +> +> +> +> +> +> +> +> +> $SRCDIR/build/qemu-system-x86_64 -enable-kvm \ +> +> +> +> -machine q35 -cpu Nehalem \ +> +> +> +> -name guest=alma8-vm,debug-threads=on \ +> +> +> +> -m 2g -smp 2 \ +> +> +> +> -nographic -nodefaults \ +> +> +> +> -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \ +> +> +> +> -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \ +> +> +> +> -object iothread,id=iothread0 \ +> +> +> +> -blockdev +> +> node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2 +> +> \ +> +> -device virtio-blk-pci,drive=disk,iothread=iothread0 +> +> +2. Launch IO (random reads) from within the guest: +> +> nc -U /var/run/alma8-serial.sock +> +> ... +> +> [root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k +> +> --size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting +> +> --rw=randread --iodepth=1 --filename=/testfile +> +> +3. Run snapshots creation & removal of lower snapshot operation in a +> +loop (script attached): +> +> while /bin/true ; do ./remove_lower_snap.sh ; done +> +> +And then it occasionally hangs. +> +> +Note: I've tried bisecting this, and looks like deadlock occurs starting +> +from the following commit: +> +> +(BAD) 5bdbaebcce virtio: Re-enable notifications after drain +> +(GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll +> +> +On the latest v10.0.0 it does hang as well. 
+> +> +> +Here's backtrace of the main thread: +> +> +> #0 0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1, +> +> timeout=<optimized out>, sigmask=0x0) at +> +> ../sysdeps/unix/sysv/linux/ppoll.c:43 +> +> #1 0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1, +> +> timeout=-1) at ../util/qemu-timer.c:329 +> +> #2 0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20, +> +> ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79 +> +> #3 0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) at +> +> ../util/aio-posix.c:730 +> +> #4 0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950, +> +> parent=0x0, poll=true) at ../block/io.c:378 +> +> #5 0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at +> +> ../block/io.c:391 +> +> #6 0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950, +> +> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +> errp=0x0) +> +> at ../block.c:7682 +> +> #7 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7964250, +> +> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +> errp=0x0) +> +> at ../block.c:7608 +> +> #8 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0, +> +> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +> errp=0x0) +> +> at ../block.c:7668 +> +> #9 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7e59110, +> +> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +> errp=0x0) +> +> at ../block.c:7608 +> +> #10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960, +> +> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +> errp=0x0) +> +> at ../block.c:7668 +> +> #11 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb814ed80, +> +> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +> errp=0x0) +> +> at ../block.c:7608 +> +> #12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0, +> +> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +> errp=0x0) +> +> at ../blockjob.c:157 +> +> #13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb7c9d3f0, +> +> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +> errp=0x0) +> +> at ../block.c:7592 +> +> #14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310, +> +> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +> errp=0x0) +> +> at ../block.c:7661 +> +> #15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx +> +> (child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = +> +> {...}, tran=0x557eb7a87160, errp=0x0) at ../block.c:1234 +> +> #16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb8565af0, +> +> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +> errp=0x0) +> +> at ../block.c:7592 +> +> #17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0, +> +> ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +> +> errp=0x0) +> +> at ../block.c:7661 +> +> #18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context (bs=0x557eb79575e0, +> +> ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at ../block.c:7715 +> +> #19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) at +> +> ../block.c:3317 +> +> #20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv (job=0x557eb7952800) at +> +> ../blockjob.c:209 +> +> #21 
0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at +> +> ../blockjob.c:82 +> +> #22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at +> +> ../job.c:474 +> +> #23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at +> +> ../job.c:771 +> +> #24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400, +> +> errp=0x7ffd94b4f488) at ../job.c:783 +> +> --Type <RET> for more, q to quit, c to continue without paging-- +> +> #25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 +> +> "commit-snap1", errp=0x7ffd94b4f488) at ../job-qmp.c:138 +> +> #26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0, +> +> ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221 +> +> #27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at +> +> ../qapi/qmp-dispatch.c:128 +> +> #28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at +> +> ../util/async.c:172 +> +> #29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at +> +> ../util/async.c:219 +> +> #30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at +> +> ../util/aio-posix.c:436 +> +> #31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200, +> +> callback=0x0, user_data=0x0) at ../util/async.c:361 +> +> #32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at +> +> ../glib/gmain.c:3364 +> +> #33 g_main_context_dispatch (context=0x557eb76c6430) at ../glib/gmain.c:4079 +> +> #34 0x0000557eb47d3ab1 in glib_pollfds_poll () at ../util/main-loop.c:287 +> +> #35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at +> +> ../util/main-loop.c:310 +> +> #36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at +> +> ../util/main-loop.c:589 +> +> #37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835 +> +> #38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at +> +> ../system/main.c:50 +> +> #39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at +> +> ../system/main.c:80 +> +> +> +And here's coroutine trying to acquire read lock: +> +> +> (gdb) qemu coroutine reader_queue->entries.sqh_first +> +> #0 0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0, +> +> to_=0x7fc537fff508, action=COROUTINE_YIELD) at +> +> ../util/coroutine-ucontext.c:321 +> +> #1 0x0000557eb47d4d4a in qemu_coroutine_yield () at +> +> ../util/qemu-coroutine.c:339 +> +> #2 0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0 +> +> <reader_queue>, lock=0x7fc53c57de50, flags=0) at +> +> ../util/qemu-coroutine-lock.c:60 +> +> #3 0x0000557eb461fea7 in bdrv_graph_co_rdlock () at +> +> ../block/graph-lock.c:231 +> +> #4 0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) at +> +> /home/root/src/qemu/master/include/block/graph-lock.h:213 +> +> #5 0x0000557eb460fa41 in blk_co_do_preadv_part +> +> (blk=0x557eb84c0810, offset=6890553344, bytes=4096, +> +> qiov=0x7fc530006988, qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at +> +> ../block/block-backend.c:1339 +> +> #6 0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at +> +> ../block/block-backend.c:1619 +> +> #7 0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, i1=21886) +> +> at ../util/coroutine-ucontext.c:175 +> +> #8 0x00007fc547c2a360 in __start_context () at +> +> ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 +> +> #9 0x00007ffd94b4ea40 in () +> +> #10 0x0000000000000000 in () +> +> +> +So it looks like main thread is processing job-dismiss request and is +> +holding write lock taken in 
block_job_remove_all_bdrv() (frame #20 +> +above). At the same time iothread spawns a coroutine which performs IO +> +request. Before the coroutine is spawned, blk_aio_prwv() increases +> +'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is +> +trying to acquire the read lock. But main thread isn't releasing the +> +lock as blk_root_drained_poll() returns true since blk->in_flight > 0. +> +Here's the deadlock. +> +> +Any comments and suggestions on the subject are welcomed. Thanks! +I think this is what the blk_wait_while_drained() call was supposed to +address in blk_co_do_preadv_part(). However, with the use of multiple +I/O threads, this is racy. + +Do you think that in your case we hit the small race window between the +checks in blk_wait_while_drained() and GRAPH_RDLOCK_GUARD()? Or is there +another reason why blk_wait_while_drained() didn't do its job? + +Kevin + +On 5/2/25 19:34, Kevin Wolf wrote: +Am 24.04.2025 um 19:32 hat Andrey Drobyshev geschrieben: +Hi all, + +There's a bug in block layer which leads to block graph deadlock. +Notably, it takes place when blockdev IO is processed within a separate +iothread. + +This was initially caught by our tests, and I was able to reduce it to a +relatively simple reproducer. Such deadlocks are probably supposed to +be covered in iotests/graph-changes-while-io, but this deadlock isn't. + +Basically what the reproducer does is launches QEMU with a drive having +'iothread' option set, creates a chain of 2 snapshots, launches +block-commit job for a snapshot and then dismisses the job, starting +from the lower snapshot. If the guest is issuing IO at the same time, +there's a race in acquiring block graph lock and a potential deadlock. + +Here's how it can be reproduced: + +1. Run QEMU: +SRCDIR=/path/to/srcdir +$SRCDIR/build/qemu-system-x86_64 -enable-kvm \ +-machine q35 -cpu Nehalem \ + -name guest=alma8-vm,debug-threads=on \ + -m 2g -smp 2 \ + -nographic -nodefaults \ + -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \ + -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \ + -object iothread,id=iothread0 \ + -blockdev +node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2 + \ + -device virtio-blk-pci,drive=disk,iothread=iothread0 +2. Launch IO (random reads) from within the guest: +nc -U /var/run/alma8-serial.sock +... +[root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 --bs=4k +--size=1G --numjobs=1 --time_based=1 --runtime=300 --group_reporting +--rw=randread --iodepth=1 --filename=/testfile +3. Run snapshots creation & removal of lower snapshot operation in a +loop (script attached): +while /bin/true ; do ./remove_lower_snap.sh ; done +And then it occasionally hangs. + +Note: I've tried bisecting this, and looks like deadlock occurs starting +from the following commit: + +(BAD) 5bdbaebcce virtio: Re-enable notifications after drain +(GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll + +On the latest v10.0.0 it does hang as well. 
+ + +Here's backtrace of the main thread: +#0 0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1, timeout=<optimized +out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:43 +#1 0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1, timeout=-1) +at ../util/qemu-timer.c:329 +#2 0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20, +ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79 +#3 0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) at +../util/aio-posix.c:730 +#4 0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950, parent=0x0, +poll=true) at ../block/io.c:378 +#5 0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at +../block/io.c:391 +#6 0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950, +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +errp=0x0) + at ../block.c:7682 +#7 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7964250, +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +errp=0x0) + at ../block.c:7608 +#8 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0, +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +errp=0x0) + at ../block.c:7668 +#9 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb7e59110, +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +errp=0x0) + at ../block.c:7608 +#10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960, +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +errp=0x0) + at ../block.c:7668 +#11 0x0000557eb45ebf2b in bdrv_child_change_aio_context (c=0x557eb814ed80, +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +errp=0x0) + at ../block.c:7608 +#12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0, +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +errp=0x0) + at ../blockjob.c:157 +#13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb7c9d3f0, +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +errp=0x0) + at ../block.c:7592 +#14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310, +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +errp=0x0) + at ../block.c:7661 +#15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx + (child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, +tran=0x557eb7a87160, errp=0x0) at ../block.c:1234 +#16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context (c=0x557eb8565af0, +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +errp=0x0) + at ../block.c:7592 +#17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0, +ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, tran=0x557eb7a87160, +errp=0x0) + at ../block.c:7661 +#18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context (bs=0x557eb79575e0, +ctx=0x557eb76c5f20, ignore_child=0x0, errp=0x0) at ../block.c:7715 +#19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) at +../block.c:3317 +#20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv (job=0x557eb7952800) at +../blockjob.c:209 +#21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at +../blockjob.c:82 +#22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at ../job.c:474 +#23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at +../job.c:771 +#24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400, +errp=0x7ffd94b4f488) 
at ../job.c:783 +--Type <RET> for more, q to quit, c to continue without paging-- +#25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 "commit-snap1", +errp=0x7ffd94b4f488) at ../job-qmp.c:138 +#26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0, +ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221 +#27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at +../qapi/qmp-dispatch.c:128 +#28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at ../util/async.c:172 +#29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at +../util/async.c:219 +#30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at +../util/aio-posix.c:436 +#31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200, +callback=0x0, user_data=0x0) at ../util/async.c:361 +#32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at +../glib/gmain.c:3364 +#33 g_main_context_dispatch (context=0x557eb76c6430) at ../glib/gmain.c:4079 +#34 0x0000557eb47d3ab1 in glib_pollfds_poll () at ../util/main-loop.c:287 +#35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at +../util/main-loop.c:310 +#36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at +../util/main-loop.c:589 +#37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835 +#38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at ../system/main.c:50 +#39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at +../system/main.c:80 +And here's coroutine trying to acquire read lock: +(gdb) qemu coroutine reader_queue->entries.sqh_first +#0 0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0, +to_=0x7fc537fff508, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:321 +#1 0x0000557eb47d4d4a in qemu_coroutine_yield () at +../util/qemu-coroutine.c:339 +#2 0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0 +<reader_queue>, lock=0x7fc53c57de50, flags=0) at +../util/qemu-coroutine-lock.c:60 +#3 0x0000557eb461fea7 in bdrv_graph_co_rdlock () at ../block/graph-lock.c:231 +#4 0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) at +/home/root/src/qemu/master/include/block/graph-lock.h:213 +#5 0x0000557eb460fa41 in blk_co_do_preadv_part + (blk=0x557eb84c0810, offset=6890553344, bytes=4096, qiov=0x7fc530006988, +qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at ../block/block-backend.c:1339 +#6 0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at +../block/block-backend.c:1619 +#7 0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, i1=21886) at +../util/coroutine-ucontext.c:175 +#8 0x00007fc547c2a360 in __start_context () at +../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 +#9 0x00007ffd94b4ea40 in () +#10 0x0000000000000000 in () +So it looks like main thread is processing job-dismiss request and is +holding write lock taken in block_job_remove_all_bdrv() (frame #20 +above). At the same time iothread spawns a coroutine which performs IO +request. Before the coroutine is spawned, blk_aio_prwv() increases +'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is +trying to acquire the read lock. But main thread isn't releasing the +lock as blk_root_drained_poll() returns true since blk->in_flight > 0. +Here's the deadlock. + +Any comments and suggestions on the subject are welcomed. Thanks! +I think this is what the blk_wait_while_drained() call was supposed to +address in blk_co_do_preadv_part(). However, with the use of multiple +I/O threads, this is racy. 
+ +Do you think that in your case we hit the small race window between the +checks in blk_wait_while_drained() and GRAPH_RDLOCK_GUARD()? Or is there +another reason why blk_wait_while_drained() didn't do its job? + +Kevin +At my opinion there is very big race window. Main thread has +eaten graph write lock. After that another coroutine is stalled +within GRAPH_RDLOCK_GUARD() as there is no drain at the moment and only +after that main thread has started drain. That is why Fiona's idea is +looking working. Though this would mean that normally we should always +do that at the moment when we acquire write lock. May be even inside +this function. Den + +Am 02.05.2025 um 19:52 hat Denis V. Lunev geschrieben: +> +On 5/2/25 19:34, Kevin Wolf wrote: +> +> Am 24.04.2025 um 19:32 hat Andrey Drobyshev geschrieben: +> +> > Hi all, +> +> > +> +> > There's a bug in block layer which leads to block graph deadlock. +> +> > Notably, it takes place when blockdev IO is processed within a separate +> +> > iothread. +> +> > +> +> > This was initially caught by our tests, and I was able to reduce it to a +> +> > relatively simple reproducer. Such deadlocks are probably supposed to +> +> > be covered in iotests/graph-changes-while-io, but this deadlock isn't. +> +> > +> +> > Basically what the reproducer does is launches QEMU with a drive having +> +> > 'iothread' option set, creates a chain of 2 snapshots, launches +> +> > block-commit job for a snapshot and then dismisses the job, starting +> +> > from the lower snapshot. If the guest is issuing IO at the same time, +> +> > there's a race in acquiring block graph lock and a potential deadlock. +> +> > +> +> > Here's how it can be reproduced: +> +> > +> +> > 1. Run QEMU: +> +> > > SRCDIR=/path/to/srcdir +> +> > > $SRCDIR/build/qemu-system-x86_64 -enable-kvm \ +> +> > > -machine q35 -cpu Nehalem \ +> +> > > -name guest=alma8-vm,debug-threads=on \ +> +> > > -m 2g -smp 2 \ +> +> > > -nographic -nodefaults \ +> +> > > -qmp unix:/var/run/alma8-qmp.sock,server=on,wait=off \ +> +> > > -serial unix:/var/run/alma8-serial.sock,server=on,wait=off \ +> +> > > -object iothread,id=iothread0 \ +> +> > > -blockdev +> +> > > node-name=disk,driver=qcow2,file.driver=file,file.filename=/path/to/img/alma8.qcow2 +> +> > > \ +> +> > > -device virtio-blk-pci,drive=disk,iothread=iothread0 +> +> > 2. Launch IO (random reads) from within the guest: +> +> > > nc -U /var/run/alma8-serial.sock +> +> > > ... +> +> > > [root@alma8-vm ~]# fio --name=randread --ioengine=libaio --direct=1 +> +> > > --bs=4k --size=1G --numjobs=1 --time_based=1 --runtime=300 +> +> > > --group_reporting --rw=randread --iodepth=1 --filename=/testfile +> +> > 3. Run snapshots creation & removal of lower snapshot operation in a +> +> > loop (script attached): +> +> > > while /bin/true ; do ./remove_lower_snap.sh ; done +> +> > And then it occasionally hangs. +> +> > +> +> > Note: I've tried bisecting this, and looks like deadlock occurs starting +> +> > from the following commit: +> +> > +> +> > (BAD) 5bdbaebcce virtio: Re-enable notifications after drain +> +> > (GOOD) c42c3833e0 virtio-scsi: Attach event vq notifier with no_poll +> +> > +> +> > On the latest v10.0.0 it does hang as well. 
+> +> > +> +> > +> +> > Here's backtrace of the main thread: +> +> > +> +> > > #0 0x00007fc547d427ce in __ppoll (fds=0x557eb79657b0, nfds=1, +> +> > > timeout=<optimized out>, sigmask=0x0) at +> +> > > ../sysdeps/unix/sysv/linux/ppoll.c:43 +> +> > > #1 0x0000557eb47d955c in qemu_poll_ns (fds=0x557eb79657b0, nfds=1, +> +> > > timeout=-1) at ../util/qemu-timer.c:329 +> +> > > #2 0x0000557eb47b2204 in fdmon_poll_wait (ctx=0x557eb76c5f20, +> +> > > ready_list=0x7ffd94b4edd8, timeout=-1) at ../util/fdmon-poll.c:79 +> +> > > #3 0x0000557eb47b1c45 in aio_poll (ctx=0x557eb76c5f20, blocking=true) +> +> > > at ../util/aio-posix.c:730 +> +> > > #4 0x0000557eb4621edd in bdrv_do_drained_begin (bs=0x557eb795e950, +> +> > > parent=0x0, poll=true) at ../block/io.c:378 +> +> > > #5 0x0000557eb4621f7b in bdrv_drained_begin (bs=0x557eb795e950) at +> +> > > ../block/io.c:391 +> +> > > #6 0x0000557eb45ec125 in bdrv_change_aio_context (bs=0x557eb795e950, +> +> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, +> +> > > tran=0x557eb7a87160, errp=0x0) +> +> > > at ../block.c:7682 +> +> > > #7 0x0000557eb45ebf2b in bdrv_child_change_aio_context +> +> > > (c=0x557eb7964250, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, +> +> > > tran=0x557eb7a87160, errp=0x0) +> +> > > at ../block.c:7608 +> +> > > #8 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb79575e0, +> +> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, +> +> > > tran=0x557eb7a87160, errp=0x0) +> +> > > at ../block.c:7668 +> +> > > #9 0x0000557eb45ebf2b in bdrv_child_change_aio_context +> +> > > (c=0x557eb7e59110, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, +> +> > > tran=0x557eb7a87160, errp=0x0) +> +> > > at ../block.c:7608 +> +> > > #10 0x0000557eb45ec0c4 in bdrv_change_aio_context (bs=0x557eb7e51960, +> +> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, +> +> > > tran=0x557eb7a87160, errp=0x0) +> +> > > at ../block.c:7668 +> +> > > #11 0x0000557eb45ebf2b in bdrv_child_change_aio_context +> +> > > (c=0x557eb814ed80, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, +> +> > > tran=0x557eb7a87160, errp=0x0) +> +> > > at ../block.c:7608 +> +> > > #12 0x0000557eb45ee8e4 in child_job_change_aio_ctx (c=0x557eb7c9d3f0, +> +> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, +> +> > > tran=0x557eb7a87160, errp=0x0) +> +> > > at ../blockjob.c:157 +> +> > > #13 0x0000557eb45ebe2d in bdrv_parent_change_aio_context +> +> > > (c=0x557eb7c9d3f0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, +> +> > > tran=0x557eb7a87160, errp=0x0) +> +> > > at ../block.c:7592 +> +> > > #14 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb7d74310, +> +> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, +> +> > > tran=0x557eb7a87160, errp=0x0) +> +> > > at ../block.c:7661 +> +> > > #15 0x0000557eb45dcd7e in bdrv_child_cb_change_aio_ctx +> +> > > (child=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 +> +> > > = {...}, tran=0x557eb7a87160, errp=0x0) at ../block.c:1234 +> +> > > #16 0x0000557eb45ebe2d in bdrv_parent_change_aio_context +> +> > > (c=0x557eb8565af0, ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, +> +> > > tran=0x557eb7a87160, errp=0x0) +> +> > > at ../block.c:7592 +> +> > > #17 0x0000557eb45ec06b in bdrv_change_aio_context (bs=0x557eb79575e0, +> +> > > ctx=0x557eb76c5f20, visited=0x557eb7e06b60 = {...}, +> +> > > tran=0x557eb7a87160, errp=0x0) +> +> > > at ../block.c:7661 +> +> > > #18 0x0000557eb45ec1f3 in bdrv_try_change_aio_context +> +> > > (bs=0x557eb79575e0, ctx=0x557eb76c5f20, 
ignore_child=0x0, errp=0x0) at +> +> > > ../block.c:7715 +> +> > > #19 0x0000557eb45e1b15 in bdrv_root_unref_child (child=0x557eb7966f30) +> +> > > at ../block.c:3317 +> +> > > #20 0x0000557eb45eeaa8 in block_job_remove_all_bdrv +> +> > > (job=0x557eb7952800) at ../blockjob.c:209 +> +> > > #21 0x0000557eb45ee641 in block_job_free (job=0x557eb7952800) at +> +> > > ../blockjob.c:82 +> +> > > #22 0x0000557eb45f17af in job_unref_locked (job=0x557eb7952800) at +> +> > > ../job.c:474 +> +> > > #23 0x0000557eb45f257d in job_do_dismiss_locked (job=0x557eb7952800) at +> +> > > ../job.c:771 +> +> > > #24 0x0000557eb45f25fe in job_dismiss_locked (jobptr=0x7ffd94b4f400, +> +> > > errp=0x7ffd94b4f488) at ../job.c:783 +> +> > > --Type <RET> for more, q to quit, c to continue without paging-- +> +> > > #25 0x0000557eb45d8e84 in qmp_job_dismiss (id=0x557eb7aa42b0 +> +> > > "commit-snap1", errp=0x7ffd94b4f488) at ../job-qmp.c:138 +> +> > > #26 0x0000557eb472f6a3 in qmp_marshal_job_dismiss (args=0x7fc52c00a3b0, +> +> > > ret=0x7fc53c880da8, errp=0x7fc53c880da0) at qapi/qapi-commands-job.c:221 +> +> > > #27 0x0000557eb47a35f3 in do_qmp_dispatch_bh (opaque=0x7fc53c880e40) at +> +> > > ../qapi/qmp-dispatch.c:128 +> +> > > #28 0x0000557eb47d1cd2 in aio_bh_call (bh=0x557eb79568f0) at +> +> > > ../util/async.c:172 +> +> > > #29 0x0000557eb47d1df5 in aio_bh_poll (ctx=0x557eb76c0200) at +> +> > > ../util/async.c:219 +> +> > > #30 0x0000557eb47b12f3 in aio_dispatch (ctx=0x557eb76c0200) at +> +> > > ../util/aio-posix.c:436 +> +> > > #31 0x0000557eb47d2266 in aio_ctx_dispatch (source=0x557eb76c0200, +> +> > > callback=0x0, user_data=0x0) at ../util/async.c:361 +> +> > > #32 0x00007fc549232f4f in g_main_dispatch (context=0x557eb76c6430) at +> +> > > ../glib/gmain.c:3364 +> +> > > #33 g_main_context_dispatch (context=0x557eb76c6430) at +> +> > > ../glib/gmain.c:4079 +> +> > > #34 0x0000557eb47d3ab1 in glib_pollfds_poll () at +> +> > > ../util/main-loop.c:287 +> +> > > #35 0x0000557eb47d3b38 in os_host_main_loop_wait (timeout=0) at +> +> > > ../util/main-loop.c:310 +> +> > > #36 0x0000557eb47d3c58 in main_loop_wait (nonblocking=0) at +> +> > > ../util/main-loop.c:589 +> +> > > #37 0x0000557eb4218b01 in qemu_main_loop () at ../system/runstate.c:835 +> +> > > #38 0x0000557eb46df166 in qemu_default_main (opaque=0x0) at +> +> > > ../system/main.c:50 +> +> > > #39 0x0000557eb46df215 in main (argc=24, argv=0x7ffd94b4f8d8) at +> +> > > ../system/main.c:80 +> +> > +> +> > And here's coroutine trying to acquire read lock: +> +> > +> +> > > (gdb) qemu coroutine reader_queue->entries.sqh_first +> +> > > #0 0x0000557eb47d7068 in qemu_coroutine_switch (from_=0x557eb7aa48b0, +> +> > > to_=0x7fc537fff508, action=COROUTINE_YIELD) at +> +> > > ../util/coroutine-ucontext.c:321 +> +> > > #1 0x0000557eb47d4d4a in qemu_coroutine_yield () at +> +> > > ../util/qemu-coroutine.c:339 +> +> > > #2 0x0000557eb47d56c8 in qemu_co_queue_wait_impl (queue=0x557eb59954c0 +> +> > > <reader_queue>, lock=0x7fc53c57de50, flags=0) at +> +> > > ../util/qemu-coroutine-lock.c:60 +> +> > > #3 0x0000557eb461fea7 in bdrv_graph_co_rdlock () at +> +> > > ../block/graph-lock.c:231 +> +> > > #4 0x0000557eb460c81a in graph_lockable_auto_lock (x=0x7fc53c57dee3) +> +> > > at /home/root/src/qemu/master/include/block/graph-lock.h:213 +> +> > > #5 0x0000557eb460fa41 in blk_co_do_preadv_part +> +> > > (blk=0x557eb84c0810, offset=6890553344, bytes=4096, +> +> > > qiov=0x7fc530006988, qiov_offset=0, flags=BDRV_REQ_REGISTERED_BUF) at +> +> > > ../block/block-backend.c:1339 +> +> 
> > #6 0x0000557eb46104d7 in blk_aio_read_entry (opaque=0x7fc530003240) at +> +> > > ../block/block-backend.c:1619 +> +> > > #7 0x0000557eb47d6c40 in coroutine_trampoline (i0=-1213577040, +> +> > > i1=21886) at ../util/coroutine-ucontext.c:175 +> +> > > #8 0x00007fc547c2a360 in __start_context () at +> +> > > ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 +> +> > > #9 0x00007ffd94b4ea40 in () +> +> > > #10 0x0000000000000000 in () +> +> > +> +> > So it looks like main thread is processing job-dismiss request and is +> +> > holding write lock taken in block_job_remove_all_bdrv() (frame #20 +> +> > above). At the same time iothread spawns a coroutine which performs IO +> +> > request. Before the coroutine is spawned, blk_aio_prwv() increases +> +> > 'in_flight' counter for Blk. Then blk_co_do_preadv_part() (frame #5) is +> +> > trying to acquire the read lock. But main thread isn't releasing the +> +> > lock as blk_root_drained_poll() returns true since blk->in_flight > 0. +> +> > Here's the deadlock. +> +> > +> +> > Any comments and suggestions on the subject are welcomed. Thanks! +> +> I think this is what the blk_wait_while_drained() call was supposed to +> +> address in blk_co_do_preadv_part(). However, with the use of multiple +> +> I/O threads, this is racy. +> +> +> +> Do you think that in your case we hit the small race window between the +> +> checks in blk_wait_while_drained() and GRAPH_RDLOCK_GUARD()? Or is there +> +> another reason why blk_wait_while_drained() didn't do its job? +> +> +> +At my opinion there is very big race window. Main thread has +> +eaten graph write lock. After that another coroutine is stalled +> +within GRAPH_RDLOCK_GUARD() as there is no drain at the moment and only +> +after that main thread has started drain. +You're right, I confused taking the write lock with draining there. + +> +That is why Fiona's idea is looking working. Though this would mean +> +that normally we should always do that at the moment when we acquire +> +write lock. May be even inside this function. +I actually see now that not all of my graph locking patches were merged. +At least I did have the thought that bdrv_drained_begin() must be marked +GRAPH_UNLOCKED because it polls. That means that calling it from inside +bdrv_try_change_aio_context() is actually forbidden (and that's the part +I didn't see back then because it doesn't have TSA annotations). + +If you refactor the code to move the drain out to before the lock is +taken, I think you end up with Fiona's patch, except you'll remove the +forbidden inner drain and add more annotations for some functions and +clarify the rules around them. I don't know, but I wouldn't be surprised +if along the process we find other bugs, too. + +So Fiona's drain looks right to me, but we should probably approach it +more systematically. 
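+In pseudo-code, the reordering would look roughly like this (illustrative
+stand-in names, not the actual QEMU graph-lock API):
+
+    /* quiesce I/O first, while no graph lock is held */
+    drain_begin(bs);
+    /* requests already in flight have completed; new ones are held off */
+    graph_write_lock();
+    /* ... detach the job's children, change AioContexts, etc. ... */
+    graph_write_unlock();
+    drain_end(bs);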
+ +Kevin + diff --git a/classification_output/01/instruction/6117378 b/classification_output/01/instruction/6117378 new file mode 100644 index 000000000..5dad058d5 --- /dev/null +++ b/classification_output/01/instruction/6117378 @@ -0,0 +1,31 @@ +instruction: 0.693 +mistranslation: 0.533 +other: 0.519 +semantic: 0.454 + +[Qemu-devel] [BUG] network : windows os lost ip address of the network card in some cases + +we found this problem for a long time ãFor example, if we has three network +card in virtual xml file ï¼such as "network connection 1" / "network connection +2"/"network connection 3" ã + +Echo network card has own ip address ï¼such as 192.168.1.1 / 2.1 /3.1 , when +delete the first card ï¼reboot the windows virtual os, then this problem +happened ! + + + + +we found that the sencond network card will replace the first one , then the +ip address of "network connection 2 " become 192.168.1.1 ã + + +Our third party users began to complain about this bug ãAll the business of the +second ip lost !!! + +I mean both of windows and linux has this bug , we solve this bug in linux +throught bonding netcrad pci and mac address ã + +There is no good solution on windows os . thera are ? we implemented a plan to +resumption of IP by QGA. Is there a better way ? + diff --git a/classification_output/01/instruction/7647456 b/classification_output/01/instruction/7647456 new file mode 100644 index 000000000..d887fe7b5 --- /dev/null +++ b/classification_output/01/instruction/7647456 @@ -0,0 +1,110 @@ +instruction: 0.768 +other: 0.737 +semantic: 0.669 +mistranslation: 0.652 + +[Qemu-devel] Can I have someone's feedback on [bug 1809075] Concurrency bug on keyboard events: capslock LED messing up keycode streams causes character misses at guest kernel + +Hi everyone. +Can I please have someone's feedback on this bug? +https://bugs.launchpad.net/qemu/+bug/1809075 +Briefly, guest OS loses characters sent to it via vnc. And I spot the +bug in relation to ps2 driver. +I'm thinking of possible fixes and I might want to use a memory barrier. +But I would really like to have some suggestion from a qemu developer +first. For example, can we brutally drop capslock LED key events in ps2 +queue? +It is actually relevant to openQA, an automated QA tool for openSUSE. +And this bug blocks a few test cases for us. +Thank you in advance! + +Kind regards, +Gao Zhiyuan + +Cc'ing Marc-André & Gerd. + +On 12/19/18 10:31 AM, Gao Zhiyuan wrote: +> +Hi everyone. +> +> +Can I please have someone's feedback on this bug? +> +https://bugs.launchpad.net/qemu/+bug/1809075 +> +Briefly, guest OS loses characters sent to it via vnc. And I spot the +> +bug in relation to ps2 driver. +> +> +I'm thinking of possible fixes and I might want to use a memory barrier. +> +But I would really like to have some suggestion from a qemu developer +> +first. For example, can we brutally drop capslock LED key events in ps2 +> +queue? +> +> +It is actually relevant to openQA, an automated QA tool for openSUSE. +> +And this bug blocks a few test cases for us. +> +> +Thank you in advance! +> +> +Kind regards, +> +Gao Zhiyuan +> + +On Thu, Jan 03, 2019 at 12:05:54PM +0100, Philippe Mathieu-Daudé wrote: +> +Cc'ing Marc-André & Gerd. +> +> +On 12/19/18 10:31 AM, Gao Zhiyuan wrote: +> +> Hi everyone. +> +> +> +> Can I please have someone's feedback on this bug? +> +> +https://bugs.launchpad.net/qemu/+bug/1809075 +> +> Briefly, guest OS loses characters sent to it via vnc. And I spot the +> +> bug in relation to ps2 driver. 
+> +> +> +> I'm thinking of possible fixes and I might want to use a memory barrier. +> +> But I would really like to have some suggestion from a qemu developer +> +> first. For example, can we brutally drop capslock LED key events in ps2 +> +> queue? +There is no "capslock LED key event". 0xfa is KBD_REPLY_ACK, and the +device queues it in response to guest port writes. Yes, the ack can +race with actual key events. But IMO that isn't a bug in qemu. + +Probably the linux kernel just throws away everything until it got the +ack for the port write, and that way the key event gets lost. On +physical hardware you will not notice because it is next to impossible +to type fast enough to hit the race window. + +So, go fix the kernel. + +Alternatively fix vncdotool to send uppercase letters properly with +shift key pressed. Then qemu wouldn't generate capslock key events +(that happens because qemu thinks guest and host capslock state is out +of sync) and the guests's capslock led update request wouldn't get into +the way. + +cheers, + Gerd + diff --git a/classification_output/01/instruction/7658242 b/classification_output/01/instruction/7658242 new file mode 100644 index 000000000..3ff255be0 --- /dev/null +++ b/classification_output/01/instruction/7658242 @@ -0,0 +1,1125 @@ +instruction: 0.775 +other: 0.771 +mistranslation: 0.719 +semantic: 0.673 + +[BUG] hw/i386/pc.c: CXL Fixed Memory Window should not reserve e820 in bios + +Early-boot e820 records will be inserted by the bios/efi/early boot +software and be reported to the kernel via insert_resource. Later, when +CXL drivers iterate through the regions again, they will insert another +resource and make the RESERVED memory area a child. + +This RESERVED memory area causes the memory region to become unusable, +and as a result attempting to create memory regions with + + `cxl create-region ...` + +Will fail due to the RESERVED area intersecting with the CXL window. + + +During boot the following traceback is observed: + +0xffffffff81101650 in insert_resource_expand_to_fit () +0xffffffff83d964c5 in e820__reserve_resources_late () +0xffffffff83e03210 in pcibios_resource_survey () +0xffffffff83e04f4a in pcibios_init () + +Which produces a call to reserve the CFMWS area: + +(gdb) p *new +$54 = {start = 0x290000000, end = 0x2cfffffff, name = "Reserved", + flags = 0x200, desc = 0x7, parent = 0x0, sibling = 0x0, + child = 0x0} + +Later the Kernel parses ACPI tables and reserves the exact same area as +the CXL Fixed Memory Window. The use of `insert_resource_conflict` +retains the RESERVED region and makes it a child of the new region. + +0xffffffff811016a4 in insert_resource_conflict () + insert_resource () +0xffffffff81a81389 in cxl_parse_cfmws () +0xffffffff818c4a81 in call_handler () + acpi_parse_entries_array () + +(gdb) p/x *new +$59 = {start = 0x290000000, end = 0x2cfffffff, name = "CXL Window 0", + flags = 0x200, desc = 0x0, parent = 0x0, sibling = 0x0, + child = 0x0} + +This produces the following output in /proc/iomem: + +590000000-68fffffff : CXL Window 0 + 590000000-68fffffff : Reserved + +This reserved area causes `get_free_mem_region()` to fail due to a check +against `__region_intersects()`. Due to this reserved area, the +intersect check will only ever return REGION_INTERSECTS, which causes +`cxl create-region` to always fail. 
+ +Signed-off-by: Gregory Price <gregory.price@memverge.com> +--- + hw/i386/pc.c | 2 -- + 1 file changed, 2 deletions(-) + +diff --git a/hw/i386/pc.c b/hw/i386/pc.c +index 566accf7e6..5bf5465a21 100644 +--- a/hw/i386/pc.c ++++ b/hw/i386/pc.c +@@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms, + hwaddr cxl_size = MiB; + + cxl_base = pc_get_cxl_range_start(pcms); +- e820_add_entry(cxl_base, cxl_size, E820_RESERVED); + memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size); + memory_region_add_subregion(system_memory, cxl_base, mr); + cxl_resv_end = cxl_base + cxl_size; +@@ -1077,7 +1076,6 @@ void pc_memory_init(PCMachineState *pcms, + memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, fw, + "cxl-fixed-memory-region", fw->size); + memory_region_add_subregion(system_memory, fw->base, &fw->mr); +- e820_add_entry(fw->base, fw->size, E820_RESERVED); + cxl_fmw_base += fw->size; + cxl_resv_end = cxl_fmw_base; + } +-- +2.37.3 + +Early-boot e820 records will be inserted by the bios/efi/early boot +software and be reported to the kernel via insert_resource. Later, when +CXL drivers iterate through the regions again, they will insert another +resource and make the RESERVED memory area a child. + +This RESERVED memory area causes the memory region to become unusable, +and as a result attempting to create memory regions with + + `cxl create-region ...` + +Will fail due to the RESERVED area intersecting with the CXL window. + + +During boot the following traceback is observed: + +0xffffffff81101650 in insert_resource_expand_to_fit () +0xffffffff83d964c5 in e820__reserve_resources_late () +0xffffffff83e03210 in pcibios_resource_survey () +0xffffffff83e04f4a in pcibios_init () + +Which produces a call to reserve the CFMWS area: + +(gdb) p *new +$54 = {start = 0x290000000, end = 0x2cfffffff, name = "Reserved", + flags = 0x200, desc = 0x7, parent = 0x0, sibling = 0x0, + child = 0x0} + +Later the Kernel parses ACPI tables and reserves the exact same area as +the CXL Fixed Memory Window. The use of `insert_resource_conflict` +retains the RESERVED region and makes it a child of the new region. + +0xffffffff811016a4 in insert_resource_conflict () + insert_resource () +0xffffffff81a81389 in cxl_parse_cfmws () +0xffffffff818c4a81 in call_handler () + acpi_parse_entries_array () + +(gdb) p/x *new +$59 = {start = 0x290000000, end = 0x2cfffffff, name = "CXL Window 0", + flags = 0x200, desc = 0x0, parent = 0x0, sibling = 0x0, + child = 0x0} + +This produces the following output in /proc/iomem: + +590000000-68fffffff : CXL Window 0 + 590000000-68fffffff : Reserved + +This reserved area causes `get_free_mem_region()` to fail due to a check +against `__region_intersects()`. Due to this reserved area, the +intersect check will only ever return REGION_INTERSECTS, which causes +`cxl create-region` to always fail. 
+ +Signed-off-by: Gregory Price <gregory.price@memverge.com> +--- + hw/i386/pc.c | 2 -- + 1 file changed, 2 deletions(-) + +diff --git a/hw/i386/pc.c b/hw/i386/pc.c +index 566accf7e6..5bf5465a21 100644 +--- a/hw/i386/pc.c ++++ b/hw/i386/pc.c +@@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms, + hwaddr cxl_size = MiB; +cxl_base = pc_get_cxl_range_start(pcms); +- e820_add_entry(cxl_base, cxl_size, E820_RESERVED); + memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size); + memory_region_add_subregion(system_memory, cxl_base, mr); + cxl_resv_end = cxl_base + cxl_size; +@@ -1077,7 +1076,6 @@ void pc_memory_init(PCMachineState *pcms, + memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, +fw, + "cxl-fixed-memory-region", fw->size); + memory_region_add_subregion(system_memory, fw->base, &fw->mr); +Or will this be subregion of cxl_base? + +Thanks, +Pankaj +- e820_add_entry(fw->base, fw->size, E820_RESERVED); + cxl_fmw_base += fw->size; + cxl_resv_end = cxl_fmw_base; + } + +> +> - e820_add_entry(cxl_base, cxl_size, E820_RESERVED); +> +> memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size); +> +> memory_region_add_subregion(system_memory, cxl_base, mr); +> +> cxl_resv_end = cxl_base + cxl_size; +> +> @@ -1077,7 +1076,6 @@ void pc_memory_init(PCMachineState *pcms, +> +> memory_region_init_io(&fw->mr, OBJECT(machine), +> +> &cfmws_ops, fw, +> +> "cxl-fixed-memory-region", +> +> fw->size); +> +> memory_region_add_subregion(system_memory, fw->base, +> +> &fw->mr); +> +> +Or will this be subregion of cxl_base? +> +> +Thanks, +> +Pankaj +The memory region backing this memory area still has to be initialized +and added in the QEMU system, but it will now be initialized for use by +linux after PCI/ACPI setup occurs and the CXL driver discovers it via +CDAT. + +It's also still possible to assign this area a static memory region at +bool by setting up the SRATs in the ACPI tables, but that patch is not +upstream yet. + +On Tue, Oct 18, 2022 at 5:14 AM Gregory Price <gourry.memverge@gmail.com> wrote: +> +> +Early-boot e820 records will be inserted by the bios/efi/early boot +> +software and be reported to the kernel via insert_resource. Later, when +> +CXL drivers iterate through the regions again, they will insert another +> +resource and make the RESERVED memory area a child. +I have already sent a patch +https://www.mail-archive.com/qemu-devel@nongnu.org/msg882012.html +. +When the patch is applied, there would not be any reserved entries +even with passing E820_RESERVED . +So this patch needs to be evaluated in the light of the above patch I +sent. Once you apply my patch, does the issue still exist? + +> +> +This RESERVED memory area causes the memory region to become unusable, +> +and as a result attempting to create memory regions with +> +> +`cxl create-region ...` +> +> +Will fail due to the RESERVED area intersecting with the CXL window. +> +> +> +During boot the following traceback is observed: +> +> +0xffffffff81101650 in insert_resource_expand_to_fit () +> +0xffffffff83d964c5 in e820__reserve_resources_late () +> +0xffffffff83e03210 in pcibios_resource_survey () +> +0xffffffff83e04f4a in pcibios_init () +> +> +Which produces a call to reserve the CFMWS area: +> +> +(gdb) p *new +> +$54 = {start = 0x290000000, end = 0x2cfffffff, name = "Reserved", +> +flags = 0x200, desc = 0x7, parent = 0x0, sibling = 0x0, +> +child = 0x0} +> +> +Later the Kernel parses ACPI tables and reserves the exact same area as +> +the CXL Fixed Memory Window. 
The use of `insert_resource_conflict` +> +retains the RESERVED region and makes it a child of the new region. +> +> +0xffffffff811016a4 in insert_resource_conflict () +> +insert_resource () +> +0xffffffff81a81389 in cxl_parse_cfmws () +> +0xffffffff818c4a81 in call_handler () +> +acpi_parse_entries_array () +> +> +(gdb) p/x *new +> +$59 = {start = 0x290000000, end = 0x2cfffffff, name = "CXL Window 0", +> +flags = 0x200, desc = 0x0, parent = 0x0, sibling = 0x0, +> +child = 0x0} +> +> +This produces the following output in /proc/iomem: +> +> +590000000-68fffffff : CXL Window 0 +> +590000000-68fffffff : Reserved +> +> +This reserved area causes `get_free_mem_region()` to fail due to a check +> +against `__region_intersects()`. Due to this reserved area, the +> +intersect check will only ever return REGION_INTERSECTS, which causes +> +`cxl create-region` to always fail. +> +> +Signed-off-by: Gregory Price <gregory.price@memverge.com> +> +--- +> +hw/i386/pc.c | 2 -- +> +1 file changed, 2 deletions(-) +> +> +diff --git a/hw/i386/pc.c b/hw/i386/pc.c +> +index 566accf7e6..5bf5465a21 100644 +> +--- a/hw/i386/pc.c +> ++++ b/hw/i386/pc.c +> +@@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms, +> +hwaddr cxl_size = MiB; +> +> +cxl_base = pc_get_cxl_range_start(pcms); +> +- e820_add_entry(cxl_base, cxl_size, E820_RESERVED); +> +memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size); +> +memory_region_add_subregion(system_memory, cxl_base, mr); +> +cxl_resv_end = cxl_base + cxl_size; +> +@@ -1077,7 +1076,6 @@ void pc_memory_init(PCMachineState *pcms, +> +memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, +> +fw, +> +"cxl-fixed-memory-region", fw->size); +> +memory_region_add_subregion(system_memory, fw->base, +> +&fw->mr); +> +- e820_add_entry(fw->base, fw->size, E820_RESERVED); +> +cxl_fmw_base += fw->size; +> +cxl_resv_end = cxl_fmw_base; +> +} +> +-- +> +2.37.3 +> + +This patch does not resolve the issue, reserved entries are still created. +[  0.000000] BIOS-e820: [mem 0x0000000280000000-0x00000002800fffff] reserved +[  0.000000] BIOS-e820: [mem 0x0000000290000000-0x000000029fffffff] reserved +# cat /proc/iomem +290000000-29fffffff : CXL Window 0 + 290000000-29fffffff : Reserved +# cxl create-region -m -d decoder0.0 -w 1 -g 256 mem0 +cxl region: create_region: region0: set_size failed: Numerical result out of range +cxl region: cmd_create_region: created 0 regions +On Tue, Oct 18, 2022 at 2:05 AM Ani Sinha < +ani@anisinha.ca +> wrote: +On Tue, Oct 18, 2022 at 5:14 AM Gregory Price < +gourry.memverge@gmail.com +> wrote: +> +> Early-boot e820 records will be inserted by the bios/efi/early boot +> software and be reported to the kernel via insert_resource. Later, when +> CXL drivers iterate through the regions again, they will insert another +> resource and make the RESERVED memory area a child. +I have already sent a patch +https://www.mail-archive.com/qemu-devel@nongnu.org/msg882012.html +. +When the patch is applied, there would not be any reserved entries +even with passing E820_RESERVED . +So this patch needs to be evaluated in the light of the above patch I +sent. Once you apply my patch, does the issue still exist? +> +> This RESERVED memory area causes the memory region to become unusable, +> and as a result attempting to create memory regions with +> +>   `cxl create-region ...` +> +> Will fail due to the RESERVED area intersecting with the CXL window. 
+> +> +> During boot the following traceback is observed: +> +> 0xffffffff81101650 in insert_resource_expand_to_fit () +> 0xffffffff83d964c5 in e820__reserve_resources_late () +> 0xffffffff83e03210 in pcibios_resource_survey () +> 0xffffffff83e04f4a in pcibios_init () +> +> Which produces a call to reserve the CFMWS area: +> +> (gdb) p *new +> $54 = {start = 0x290000000, end = 0x2cfffffff, name = "Reserved", +>    flags = 0x200, desc = 0x7, parent = 0x0, sibling = 0x0, +>    child = 0x0} +> +> Later the Kernel parses ACPI tables and reserves the exact same area as +> the CXL Fixed Memory Window. The use of `insert_resource_conflict` +> retains the RESERVED region and makes it a child of the new region. +> +> 0xffffffff811016a4 in insert_resource_conflict () +>            insert_resource () +> 0xffffffff81a81389 in cxl_parse_cfmws () +> 0xffffffff818c4a81 in call_handler () +>            acpi_parse_entries_array () +> +> (gdb) p/x *new +> $59 = {start = 0x290000000, end = 0x2cfffffff, name = "CXL Window 0", +>    flags = 0x200, desc = 0x0, parent = 0x0, sibling = 0x0, +>    child = 0x0} +> +> This produces the following output in /proc/iomem: +> +> 590000000-68fffffff : CXL Window 0 +>  590000000-68fffffff : Reserved +> +> This reserved area causes `get_free_mem_region()` to fail due to a check +> against `__region_intersects()`. Due to this reserved area, the +> intersect check will only ever return REGION_INTERSECTS, which causes +> `cxl create-region` to always fail. +> +> Signed-off-by: Gregory Price < +gregory.price@memverge.com +> +> --- +> hw/i386/pc.c | 2 -- +> 1 file changed, 2 deletions(-) +> +> diff --git a/hw/i386/pc.c b/hw/i386/pc.c +> index 566accf7e6..5bf5465a21 100644 +> --- a/hw/i386/pc.c +> +++ b/hw/i386/pc.c +> @@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms, +>     hwaddr cxl_size = MiB; +> +>     cxl_base = pc_get_cxl_range_start(pcms); +> -    e820_add_entry(cxl_base, cxl_size, E820_RESERVED); +>     memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size); +>     memory_region_add_subregion(system_memory, cxl_base, mr); +>     cxl_resv_end = cxl_base + cxl_size; +> @@ -1077,7 +1076,6 @@ void pc_memory_init(PCMachineState *pcms, +>         memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, fw, +>                    "cxl-fixed-memory-region", fw->size); +>         memory_region_add_subregion(system_memory, fw->base, &fw->mr); +> -        e820_add_entry(fw->base, fw->size, E820_RESERVED); +>         cxl_fmw_base += fw->size; +>         cxl_resv_end = cxl_fmw_base; +>       } +> -- +> 2.37.3 +> + ++Gerd Hoffmann + +On Tue, Oct 18, 2022 at 8:16 PM Gregory Price <gourry.memverge@gmail.com> wrote: +> +> +This patch does not resolve the issue, reserved entries are still created. 
+> +> +[ 0.000000] BIOS-e820: [mem 0x0000000280000000-0x00000002800fffff] reserved +> +[ 0.000000] BIOS-e820: [mem 0x0000000290000000-0x000000029fffffff] reserved +> +> +# cat /proc/iomem +> +290000000-29fffffff : CXL Window 0 +> +290000000-29fffffff : Reserved +> +> +# cxl create-region -m -d decoder0.0 -w 1 -g 256 mem0 +> +cxl region: create_region: region0: set_size failed: Numerical result out of +> +range +> +cxl region: cmd_create_region: created 0 regions +> +> +On Tue, Oct 18, 2022 at 2:05 AM Ani Sinha <ani@anisinha.ca> wrote: +> +> +> +> On Tue, Oct 18, 2022 at 5:14 AM Gregory Price <gourry.memverge@gmail.com> +> +> wrote: +> +> > +> +> > Early-boot e820 records will be inserted by the bios/efi/early boot +> +> > software and be reported to the kernel via insert_resource. Later, when +> +> > CXL drivers iterate through the regions again, they will insert another +> +> > resource and make the RESERVED memory area a child. +> +> +> +> I have already sent a patch +> +> +https://www.mail-archive.com/qemu-devel@nongnu.org/msg882012.html +. +> +> When the patch is applied, there would not be any reserved entries +> +> even with passing E820_RESERVED . +> +> So this patch needs to be evaluated in the light of the above patch I +> +> sent. Once you apply my patch, does the issue still exist? +> +> +> +> > +> +> > This RESERVED memory area causes the memory region to become unusable, +> +> > and as a result attempting to create memory regions with +> +> > +> +> > `cxl create-region ...` +> +> > +> +> > Will fail due to the RESERVED area intersecting with the CXL window. +> +> > +> +> > +> +> > During boot the following traceback is observed: +> +> > +> +> > 0xffffffff81101650 in insert_resource_expand_to_fit () +> +> > 0xffffffff83d964c5 in e820__reserve_resources_late () +> +> > 0xffffffff83e03210 in pcibios_resource_survey () +> +> > 0xffffffff83e04f4a in pcibios_init () +> +> > +> +> > Which produces a call to reserve the CFMWS area: +> +> > +> +> > (gdb) p *new +> +> > $54 = {start = 0x290000000, end = 0x2cfffffff, name = "Reserved", +> +> > flags = 0x200, desc = 0x7, parent = 0x0, sibling = 0x0, +> +> > child = 0x0} +> +> > +> +> > Later the Kernel parses ACPI tables and reserves the exact same area as +> +> > the CXL Fixed Memory Window. The use of `insert_resource_conflict` +> +> > retains the RESERVED region and makes it a child of the new region. +> +> > +> +> > 0xffffffff811016a4 in insert_resource_conflict () +> +> > insert_resource () +> +> > 0xffffffff81a81389 in cxl_parse_cfmws () +> +> > 0xffffffff818c4a81 in call_handler () +> +> > acpi_parse_entries_array () +> +> > +> +> > (gdb) p/x *new +> +> > $59 = {start = 0x290000000, end = 0x2cfffffff, name = "CXL Window 0", +> +> > flags = 0x200, desc = 0x0, parent = 0x0, sibling = 0x0, +> +> > child = 0x0} +> +> > +> +> > This produces the following output in /proc/iomem: +> +> > +> +> > 590000000-68fffffff : CXL Window 0 +> +> > 590000000-68fffffff : Reserved +> +> > +> +> > This reserved area causes `get_free_mem_region()` to fail due to a check +> +> > against `__region_intersects()`. Due to this reserved area, the +> +> > intersect check will only ever return REGION_INTERSECTS, which causes +> +> > `cxl create-region` to always fail. 
+> +> > +> +> > Signed-off-by: Gregory Price <gregory.price@memverge.com> +> +> > --- +> +> > hw/i386/pc.c | 2 -- +> +> > 1 file changed, 2 deletions(-) +> +> > +> +> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c +> +> > index 566accf7e6..5bf5465a21 100644 +> +> > --- a/hw/i386/pc.c +> +> > +++ b/hw/i386/pc.c +> +> > @@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms, +> +> > hwaddr cxl_size = MiB; +> +> > +> +> > cxl_base = pc_get_cxl_range_start(pcms); +> +> > - e820_add_entry(cxl_base, cxl_size, E820_RESERVED); +> +> > memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size); +> +> > memory_region_add_subregion(system_memory, cxl_base, mr); +> +> > cxl_resv_end = cxl_base + cxl_size; +> +> > @@ -1077,7 +1076,6 @@ void pc_memory_init(PCMachineState *pcms, +> +> > memory_region_init_io(&fw->mr, OBJECT(machine), +> +> > &cfmws_ops, fw, +> +> > "cxl-fixed-memory-region", +> +> > fw->size); +> +> > memory_region_add_subregion(system_memory, fw->base, +> +> > &fw->mr); +> +> > - e820_add_entry(fw->base, fw->size, E820_RESERVED); +> +> > cxl_fmw_base += fw->size; +> +> > cxl_resv_end = cxl_fmw_base; +> +> > } +> +> > -- +> +> > 2.37.3 +> +> > + +> +>> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c +> +>> > index 566accf7e6..5bf5465a21 100644 +> +>> > --- a/hw/i386/pc.c +> +>> > +++ b/hw/i386/pc.c +> +>> > @@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms, +> +>> > hwaddr cxl_size = MiB; +> +>> > +> +>> > cxl_base = pc_get_cxl_range_start(pcms); +> +>> > - e820_add_entry(cxl_base, cxl_size, E820_RESERVED); +Just dropping it doesn't look like a good plan to me. + +You can try set etc/reserved-memory-end fw_cfg file instead. Firmware +(both seabios and ovmf) read it and will make sure the 64bit pci mmio +window is placed above that address, i.e. this effectively reserves +address space. Right now used by memory hotplug code, but should work +for cxl too I think (disclaimer: don't know much about cxl ...). + +take care & HTH, + Gerd + +On Tue, 8 Nov 2022 12:21:11 +0100 +Gerd Hoffmann <kraxel@redhat.com> wrote: + +> +> >> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c +> +> >> > index 566accf7e6..5bf5465a21 100644 +> +> >> > --- a/hw/i386/pc.c +> +> >> > +++ b/hw/i386/pc.c +> +> >> > @@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms, +> +> >> > hwaddr cxl_size = MiB; +> +> >> > +> +> >> > cxl_base = pc_get_cxl_range_start(pcms); +> +> >> > - e820_add_entry(cxl_base, cxl_size, E820_RESERVED); +> +> +Just dropping it doesn't look like a good plan to me. +> +> +You can try set etc/reserved-memory-end fw_cfg file instead. Firmware +> +(both seabios and ovmf) read it and will make sure the 64bit pci mmio +> +window is placed above that address, i.e. this effectively reserves +> +address space. Right now used by memory hotplug code, but should work +> +for cxl too I think (disclaimer: don't know much about cxl ...). +As far as I know CXL impl. in QEMU isn't using etc/reserved-memory-end +at all, it' has its own mapping. + +Regardless of that, reserved E820 entries look wrong, and looking at +commit message OS is right to bailout on them (expected according +to ACPI spec). +Also spec says + +" +E820 Assumptions and Limitations + [...] + The platform boot firmware does not return a range description for the memory +mapping of + PCI devices, ISA Option ROMs, and ISA Plug and Play cards because the OS has +mechanisms + available to detect them. +" + +so dropping reserved entries looks reasonable from ACPI spec point of view. +(disclaimer: don't know much about cxl ... 
either) +> +> +take care & HTH, +> +Gerd +> + +On Fri, Nov 11, 2022 at 11:51:23AM +0100, Igor Mammedov wrote: +> +On Tue, 8 Nov 2022 12:21:11 +0100 +> +Gerd Hoffmann <kraxel@redhat.com> wrote: +> +> +> > >> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c +> +> > >> > index 566accf7e6..5bf5465a21 100644 +> +> > >> > --- a/hw/i386/pc.c +> +> > >> > +++ b/hw/i386/pc.c +> +> > >> > @@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms, +> +> > >> > hwaddr cxl_size = MiB; +> +> > >> > +> +> > >> > cxl_base = pc_get_cxl_range_start(pcms); +> +> > >> > - e820_add_entry(cxl_base, cxl_size, E820_RESERVED); +> +> +> +> Just dropping it doesn't look like a good plan to me. +> +> +> +> You can try set etc/reserved-memory-end fw_cfg file instead. Firmware +> +> (both seabios and ovmf) read it and will make sure the 64bit pci mmio +> +> window is placed above that address, i.e. this effectively reserves +> +> address space. Right now used by memory hotplug code, but should work +> +> for cxl too I think (disclaimer: don't know much about cxl ...). +> +> +As far as I know CXL impl. in QEMU isn't using etc/reserved-memory-end +> +at all, it' has its own mapping. +This should be changed. cxl should make sure the highest address used +is stored in etc/reserved-memory-end to avoid the firmware mapping pci +resources there. + +> +so dropping reserved entries looks reasonable from ACPI spec point of view. +Yep, I don't want dispute that. + +I suspect the reason for these entries to exist in the first place is to +inform the firmware that it should not place stuff there, and if we +remove that to conform with the spec we need some alternative way for +that ... + +take care, + Gerd + +On Fri, 11 Nov 2022 12:40:59 +0100 +Gerd Hoffmann <kraxel@redhat.com> wrote: + +> +On Fri, Nov 11, 2022 at 11:51:23AM +0100, Igor Mammedov wrote: +> +> On Tue, 8 Nov 2022 12:21:11 +0100 +> +> Gerd Hoffmann <kraxel@redhat.com> wrote: +> +> +> +> > > >> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c +> +> > > >> > index 566accf7e6..5bf5465a21 100644 +> +> > > >> > --- a/hw/i386/pc.c +> +> > > >> > +++ b/hw/i386/pc.c +> +> > > >> > @@ -1061,7 +1061,6 @@ void pc_memory_init(PCMachineState *pcms, +> +> > > >> > hwaddr cxl_size = MiB; +> +> > > >> > +> +> > > >> > cxl_base = pc_get_cxl_range_start(pcms); +> +> > > >> > - e820_add_entry(cxl_base, cxl_size, E820_RESERVED); +> +> > +> +> > Just dropping it doesn't look like a good plan to me. +> +> > +> +> > You can try set etc/reserved-memory-end fw_cfg file instead. Firmware +> +> > (both seabios and ovmf) read it and will make sure the 64bit pci mmio +> +> > window is placed above that address, i.e. this effectively reserves +> +> > address space. Right now used by memory hotplug code, but should work +> +> > for cxl too I think (disclaimer: don't know much about cxl ...). +> +> +> +> As far as I know CXL impl. in QEMU isn't using etc/reserved-memory-end +> +> at all, it' has its own mapping. +> +> +This should be changed. cxl should make sure the highest address used +> +is stored in etc/reserved-memory-end to avoid the firmware mapping pci +> +resources there. +if (pcmc->has_reserved_memory && machine->device_memory->base) { + +[...] 
+ + if (pcms->cxl_devices_state.is_enabled) { + + res_mem_end = cxl_resv_end; + +that should be handled by this line + + } + + *val = cpu_to_le64(ROUND_UP(res_mem_end, 1 * GiB)); + + fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val, sizeof(*val)); + + } + +so SeaBIOS shouldn't intrude into CXL address space +(I assume EDK2 behave similarly here) + +> +> so dropping reserved entries looks reasonable from ACPI spec point of view. +> +> +> +> +Yep, I don't want dispute that. +> +> +I suspect the reason for these entries to exist in the first place is to +> +inform the firmware that it should not place stuff there, and if we +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +just to educate me, can you point out what SeaBIOS code does with reservations. + +> +remove that to conform with the spec we need some alternative way for +> +that ... +with etc/reserved-memory-end set as above, +is E820_RESERVED really needed here? + +(my understanding was that E820_RESERVED weren't accounted for when +initializing PCI devices) + +> +> +take care, +> +Gerd +> + +> +if (pcmc->has_reserved_memory && machine->device_memory->base) { +> +> +[...] +> +> +if (pcms->cxl_devices_state.is_enabled) { +> +> +res_mem_end = cxl_resv_end; +> +> +that should be handled by this line +> +> +} +> +> +*val = cpu_to_le64(ROUND_UP(res_mem_end, 1 * GiB)); +> +> +fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val, +> +sizeof(*val)); +> +} +> +> +so SeaBIOS shouldn't intrude into CXL address space +Yes, looks good, so with this in place already everyting should be fine. + +> +(I assume EDK2 behave similarly here) +Correct, ovmf reads that fw_cfg file too. + +> +> I suspect the reason for these entries to exist in the first place is to +> +> inform the firmware that it should not place stuff there, and if we +> +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> +just to educate me, can you point out what SeaBIOS code does with +> +reservations. +They are added to the e820 map which gets passed on to the OS. seabios +uses (and updateas) the e820 map too, when allocating memory for +example. While thinking about it I'm not fully sure it actually looks +at reservations, maybe it only uses (and updates) ram entries when +allocating memory. + +> +> remove that to conform with the spec we need some alternative way for +> +> that ... +> +> +with etc/reserved-memory-end set as above, +> +is E820_RESERVED really needed here? +No. Setting etc/reserved-memory-end is enough. + +So for the original patch: +Acked-by: Gerd Hoffmann <kraxel@redhat.com> + +take care, + Gerd + +On Fri, Nov 11, 2022 at 02:36:02PM +0100, Gerd Hoffmann wrote: +> +> if (pcmc->has_reserved_memory && machine->device_memory->base) { +> +> +> +> [...] +> +> +> +> if (pcms->cxl_devices_state.is_enabled) { +> +> +> +> res_mem_end = cxl_resv_end; +> +> +> +> that should be handled by this line +> +> +> +> } +> +> +> +> *val = cpu_to_le64(ROUND_UP(res_mem_end, 1 * GiB)); +> +> +> +> fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val, +> +> sizeof(*val)); +> +> } +> +> +> +> so SeaBIOS shouldn't intrude into CXL address space +> +> +Yes, looks good, so with this in place already everyting should be fine. +> +> +> (I assume EDK2 behave similarly here) +> +> +Correct, ovmf reads that fw_cfg file too. 
+> +> +> > I suspect the reason for these entries to exist in the first place is to +> +> > inform the firmware that it should not place stuff there, and if we +> +> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +> +> just to educate me, can you point out what SeaBIOS code does with +> +> reservations. +> +> +They are added to the e820 map which gets passed on to the OS. seabios +> +uses (and updateas) the e820 map too, when allocating memory for +> +example. While thinking about it I'm not fully sure it actually looks +> +at reservations, maybe it only uses (and updates) ram entries when +> +allocating memory. +> +> +> > remove that to conform with the spec we need some alternative way for +> +> > that ... +> +> +> +> with etc/reserved-memory-end set as above, +> +> is E820_RESERVED really needed here? +> +> +No. Setting etc/reserved-memory-end is enough. +> +> +So for the original patch: +> +Acked-by: Gerd Hoffmann <kraxel@redhat.com> +> +> +take care, +> +Gerd +It's upstream already, sorry I can't add your tag. + +-- +MST + diff --git a/classification_output/01/instruction/7733130 b/classification_output/01/instruction/7733130 new file mode 100644 index 000000000..1c3bc483f --- /dev/null +++ b/classification_output/01/instruction/7733130 @@ -0,0 +1,47 @@ +instruction: 0.758 +semantic: 0.694 +other: 0.687 +mistranslation: 0.516 + +[Qemu-devel] [BUG] VNC: client won't send FramebufferUpdateRequest if job in flight is aborted + +Hi Gerd, Daniel. + +We noticed that if VncSharePolicy was configured with +VNC_SHARE_POLICY_FORCE_SHARED mode and +multiple vnc clients opened vnc connections, some clients could go blank screen +at high probability. +This problem can be reproduced when we regularly reboot suse12sp3 in graphic +mode both +with RealVNC and noVNC client. + +Then we dig into it and find out that some clients go blank screen because they +don't +send FramebufferUpdateRequest any more. One step further, we notice that each +time +the job in flight is aborted one client go blank screen. + +The bug is triggered in the following procedure. +Guest reboot => graphic mode switch => graphic_hw_update => vga_update_display +=> vga_draw_graphic (full_update = 1) => dpy_gfx_replace_surface => +vnc_dpy_switch => +vnc_abort_display_jobs (client may have job in flight) => job removed from the +queue +If one client has vnc job in flight, *vnc_abort_display_jobs* will wait until +its job is abandoned. +This behavior is done in vnc_worker_thread_loop when 'if (job->vs->ioc == NULL +|| job->vs->abort == true)' +branch is taken. + +As we can see, *vnc_abort_display_jobs* is intended to do some optimization to +avoid unnecessary client update. +But if client sends FramebufferUpdateRequest for some graphic area and its +FramebufferUpdate response job +is abandoned, the client may wait for the response and never send new +FramebufferUpdateRequest, which may +case the client go blank screen forever. + +So I am wondering whether we should drop the *vnc_abort_display_jobs* +optimization or do some trick here +to push the client to send new FramebufferUpdateRequest. Do you have any idea ? 
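+One possible middle ground, sketched below with made-up type and helper
+names (this is not the real QEMU VNC code), would be to keep the
+optimization but flag every client whose queued job was dropped so that
+the next refresh cycle sends it an update anyway:
+
+#include <stdbool.h>
+#include <stddef.h>
+
+/* Illustrative stand-ins for the per-connection VNC state. */
+typedef struct Client {
+    bool update_requested;   /* client sent FramebufferUpdateRequest */
+    bool force_update;       /* answer it on the next cycle regardless */
+    struct Client *next;
+} Client;
+
+/* Call this right after the in-flight jobs have been aborted (e.g. from
+ * the vnc_dpy_switch path), so no client is left waiting forever for a
+ * FramebufferUpdate that will never be produced. */
+static void refresh_clients_after_abort(Client *clients)
+{
+    for (Client *c = clients; c != NULL; c = c->next) {
+        if (c->update_requested) {
+            c->force_update = true;
+        }
+    }
+}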
+ diff --git a/classification_output/01/instruction/7960594 b/classification_output/01/instruction/7960594 new file mode 100644 index 000000000..c06d35dd8 --- /dev/null +++ b/classification_output/01/instruction/7960594 @@ -0,0 +1,158 @@ +instruction: 0.991 +other: 0.979 +semantic: 0.974 +mistranslation: 0.930 + +[Qemu-devel] [Bug Report] vm paused after succeeding to migrate + +Hi, all +I encounterd a bug when I try to migrate a windows vm. + +Enviroment information: +host A: cpu E5620(model WestmereEP without flag xsave) +host B: cpu E5-2643(model SandyBridgeEP with xsave) + +The reproduce steps is : +1. Start a windows 2008 vm with -cpu host(which means host-passthrough). +2. Migrate the vm to host B when cr4.OSXSAVE=0 (successfully). +3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1. +4. Then migrate the vm to host A (successfully), but vm was paused, and qemu +printed log as followed: + +KVM: entry failed, hardware error 0x80000021 + +If you're running a guest on an Intel machine without unrestricted mode +support, the failure can be most likely due to the guest entering an invalid +state for Intel VT. For example, the guest maybe running in big real mode +which is not supported on less recent Intel processors. + +EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000 +ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20 +EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 +ES =0000 00000000 0000ffff 00009300 +CS =f000 ffff0000 0000ffff 00009b00 +SS =0000 00000000 0000ffff 00009300 +DS =0000 00000000 0000ffff 00009300 +FS =0000 00000000 0000ffff 00009300 +GS =0000 00000000 0000ffff 00009300 +LDT=0000 00000000 0000ffff 00008200 +TR =0000 00000000 0000ffff 00008b00 +GDT= 00000000 0000ffff +IDT= 00000000 0000ffff +CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000 +DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 +DR3=0000000000000000 +DR6=00000000ffff0ff0 DR7=0000000000000400 +EFER=0000000000000000 +Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 +00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 + +I have found that problem happened when kvm_put_sregs returns err -22(called by +kvm_arch_put_registers(qemu)). +Because kvm_arch_vcpu_ioctl_set_sregs(kvm-mod) checked that guest_cpuid_has no +X86_FEATURE_XSAVE but cr4.OSXSAVE=1. +So should we cancel migration when kvm_arch_put_registers returns error? + +* linzhecheng (address@hidden) wrote: +> +Hi, all +> +I encounterd a bug when I try to migrate a windows vm. +> +> +Enviroment information: +> +host A: cpu E5620(model WestmereEP without flag xsave) +> +host B: cpu E5-2643(model SandyBridgeEP with xsave) +> +> +The reproduce steps is : +> +1. Start a windows 2008 vm with -cpu host(which means host-passthrough). +> +2. Migrate the vm to host B when cr4.OSXSAVE=0 (successfully). +> +3. Vm runs on host B for a while so that cr4.OSXSAVE changes to 1. +> +4. Then migrate the vm to host A (successfully), but vm was paused, and qemu +> +printed log as followed: +Remember that migrating using -cpu host across different CPU models is NOT +expected to work. + +> +KVM: entry failed, hardware error 0x80000021 +> +> +If you're running a guest on an Intel machine without unrestricted mode +> +support, the failure can be most likely due to the guest entering an invalid +> +state for Intel VT. For example, the guest maybe running in big real mode +> +which is not supported on less recent Intel processors. 
+> +> +EAX=019b3bb0 EBX=01a3ae80 ECX=01a61ce8 EDX=00000000 +> +ESI=01a62000 EDI=00000000 EBP=00000000 ESP=01718b20 +> +EIP=0185d982 EFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 +> +ES =0000 00000000 0000ffff 00009300 +> +CS =f000 ffff0000 0000ffff 00009b00 +> +SS =0000 00000000 0000ffff 00009300 +> +DS =0000 00000000 0000ffff 00009300 +> +FS =0000 00000000 0000ffff 00009300 +> +GS =0000 00000000 0000ffff 00009300 +> +LDT=0000 00000000 0000ffff 00008200 +> +TR =0000 00000000 0000ffff 00008b00 +> +GDT= 00000000 0000ffff +> +IDT= 00000000 0000ffff +> +CR0=60000010 CR2=00000000 CR3=00000000 CR4=00000000 +> +DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 +> +DR3=0000000000000000 +> +DR6=00000000ffff0ff0 DR7=0000000000000400 +> +EFER=0000000000000000 +> +Code=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 +> +00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 +> +00 +> +> +I have found that problem happened when kvm_put_sregs returns err -22(called +> +by kvm_arch_put_registers(qemu)). +> +Because kvm_arch_vcpu_ioctl_set_sregs(kvm-mod) checked that guest_cpuid_has +> +no X86_FEATURE_XSAVE but cr4.OSXSAVE=1. +> +So should we cancel migration when kvm_arch_put_registers returns error? +It would seem good if we can make the migration fail there rather than +hitting that KVM error. +It looks like we need to do a bit of plumbing to convert the places that +call it to return a bool rather than void. + +Dave + +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + diff --git a/classification_output/01/instruction/8019995 b/classification_output/01/instruction/8019995 new file mode 100644 index 000000000..92d85cc82 --- /dev/null +++ b/classification_output/01/instruction/8019995 @@ -0,0 +1,31 @@ +instruction: 0.753 +semantic: 0.698 +mistranslation: 0.633 +other: 0.620 + +[BUG]The latest qemu crashed when I tested cxl + +I test cxl with the patch:[v11,0/2] arm/virt: + CXL support via pxb_cxl. +https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-Jonathan.Cameron@huawei.com/ +But the qemu crashed,and showing an error: +qemu-system-aarch64: ../hw/arm/virt.c:1735: virt_get_high_memmap_enabled: + Assertion `ARRAY_SIZE(extended_memmap) - VIRT_LOWMEMMAP_LAST == ARRAY_SIZE(enabled_array)' failed. +Then I modify the patch to fix the bug: +diff --git a/hw/arm/virt.c b/hw/arm/virt.c +index ea2413a0ba..3d4cee3491 100644 +--- a/hw/arm/virt.c ++++ b/hw/arm/virt.c +@@ -1710,6 +1730,7 @@ static inline bool *virt_get_high_memmap_enabled(VirtMachineState + *vms, +&vms->highmem_redists, +&vms->highmem_ecam, +&vms->highmem_mmio, ++ &vms->cxl_devices_state.is_enabled, +}; +Now qemu works good. +Could you tell me when the patch( +arm/virt: + CXL support via pxb_cxl +) will be merged into upstream? + diff --git a/classification_output/01/instruction/8566429 b/classification_output/01/instruction/8566429 new file mode 100644 index 000000000..dfac92bf4 --- /dev/null +++ b/classification_output/01/instruction/8566429 @@ -0,0 +1,49 @@ +instruction: 0.905 +other: 0.898 +semantic: 0.825 +mistranslation: 0.462 + +[Qemu-devel] [BUG]pcibus_reset assertion failure on guest reboot + +Qemu-2.6.2 + +Start a vm with vhost-net , do reboot and hot-unplug viritio-net nic in short +time, we touch +pcibus_reset assertion failure. 
+ +Here is qemu log: +22:29:46.359386+08:00 acpi_pm1_cnt_write -> guest do soft power off +22:29:46.785310+08:00 qemu_devices_reset +22:29:46.788093+08:00 virtio_pci_device_unplugged -> virtio net unpluged +22:29:46.803427+08:00 pcibus_reset: Assertion `bus->irq_count[i] == 0' failed. + +Here is stack info: +(gdb) bt +#0 0x00007f9a336795d7 in raise () from /usr/lib64/libc.so.6 +#1 0x00007f9a3367acc8 in abort () from /usr/lib64/libc.so.6 +#2 0x00007f9a33672546 in __assert_fail_base () from /usr/lib64/libc.so.6 +#3 0x00007f9a336725f2 in __assert_fail () from /usr/lib64/libc.so.6 +#4 0x0000000000641884 in pcibus_reset (qbus=0x29eee60) at hw/pci/pci.c:283 +#5 0x00000000005bfc30 in qbus_reset_one (bus=0x29eee60, opaque=<optimized +out>) at hw/core/qdev.c:319 +#6 0x00000000005c1b19 in qdev_walk_children (dev=0x29ed2b0, pre_devfn=0x0, +pre_busfn=0x0, post_devfn=0x5c2440 ... +#7 0x00000000005c1c59 in qbus_walk_children (bus=0x2736f80, pre_devfn=0x0, +pre_busfn=0x0, post_devfn=0x5c2440 ... +#8 0x00000000005513f5 in qemu_devices_reset () at vl.c:1998 +#9 0x00000000004cab9d in pc_machine_reset () at +/home/abuild/rpmbuild/BUILD/qemu-kvm-2.6.0/hw/i386/pc.c:1976 +#10 0x000000000055148b in qemu_system_reset (address@hidden) at vl.c:2011 +#11 0x000000000055164f in main_loop_should_exit () at vl.c:2169 +#12 0x0000000000551719 in main_loop () at vl.c:2212 +#13 0x000000000041c9a8 in main (argc=<optimized out>, argv=<optimized out>, +envp=<optimized out>) at vl.c:5130 +(gdb) f 4 +... +(gdb) p bus->irq_count[0] +$6 = 1 + +Seems pci_update_irq_disabled doesn't work well + +can anyone help? + diff --git a/classification_output/01/instruction/9818783 b/classification_output/01/instruction/9818783 new file mode 100644 index 000000000..a78585284 --- /dev/null +++ b/classification_output/01/instruction/9818783 @@ -0,0 +1,308 @@ +instruction: 0.985 +other: 0.985 +semantic: 0.984 +mistranslation: 0.983 + +[BUG][powerpc] KVM Guest Boot Failure – Hangs at "Booting Linux via __start()” + +Bug Description: +Encountering a boot failure when launching a KVM guest with +qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor +crashes. 
+Reproduction Steps: +# qemu-system-ppc64 --version +QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f) +Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers +# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine +pseries,accel=kvm \ +-m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \ + -device virtio-scsi-pci,id=scsi \ +-drive +file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 +\ +-device scsi-hd,drive=drive0,bus=scsi.0 \ + -netdev bridge,id=net0,br=virbr0 \ + -device virtio-net-pci,netdev=net0 \ + -serial pty \ + -device virtio-balloon-pci \ + -cpu host +QEMU 9.2.50 monitor - type 'help' for more information +char device redirected to /dev/pts/2 (label serial0) +(qemu) +(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but +unavailable: IRQ_XIVE capability must be present for KVM +Falling back to kernel-irqchip=off +** Qemu Hang + +(In another ssh session) +# screen /dev/pts/2 +Preparing to boot Linux version 6.10.4-200.fc40.ppc64le +(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 +(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 +15:20:17 UTC 2024 +Detected machine type: 0000000000000101 +command line: +BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le +root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M +Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) +Calling ibm,client-architecture-support... done +memory layout at init: + memory_limit : 0000000000000000 (16 MB aligned) + alloc_bottom : 0000000008200000 + alloc_top : 0000000030000000 + alloc_top_hi : 0000000800000000 + rmo_top : 0000000030000000 + ram_top : 0000000800000000 +instantiating rtas at 0x000000002fff0000... done +prom_hold_cpus: skipped +copying OF device tree... +Building dt strings... +Building dt structure... +Device tree strings 0x0000000008210000 -> 0x0000000008210bd0 +Device tree struct 0x0000000008220000 -> 0x0000000008230000 +Quiescing Open Firmware ... +Booting Linux via __start() @ 0x0000000000440000 ... +** Guest Console Hang + + +Git Bisect: +Performing git bisect points to the following patch: +# git bisect bad +e8291ec16da80566c121c68d9112be458954d90b is the first bad commit +commit e8291ec16da80566c121c68d9112be458954d90b (HEAD) +Author: Nicholas Piggin <npiggin@gmail.com> +Date: Thu Dec 19 13:40:31 2024 +1000 + + target/ppc: fix timebase register reset state +(H)DEC and PURR get reset before icount does, which causes them to +be +skewed and not match the init state. This can cause replay to not +match the recorded trace exactly. For DEC and HDEC this is usually +not +noticable since they tend to get programmed before affecting the + target machine. PURR has been observed to cause replay bugs when + running Linux. + + Fix this by resetting using a time of 0. + + Message-ID: <20241219034035.1826173-2-npiggin@gmail.com> + Signed-off-by: Nicholas Piggin <npiggin@gmail.com> + + hw/ppc/ppc.c | 11 ++++++++--- + 1 file changed, 8 insertions(+), 3 deletions(-) + + +Reverting the patch helps boot the guest. +Thanks, +Misbah Anjum N + +Thanks for the report. + +Tricky problem. A secondary CPU is hanging before it is started by the +primary via rtas call. + +That secondary keeps calling kvm_cpu_exec(), which keeps exiting out +early with EXCP_HLT because kvm_arch_process_async_events() returns +true because that cpu has ->halted=1. That just goes around he run +loop because there is an interrupt pending (DEC). + +So it never runs. 
It also never releases the BQL, and another CPU, +the primary which is actually supposed to be running, is stuck in +spapr_set_all_lpcrs() in run_on_cpu() waiting for the BQL. + +This patch just exposes the bug I think, by causing the interrupt. +although I'm not quite sure why it's okay previously (-ve decrementer +values should be causing a timer exception too). The timer exception +should not be taken as an interrupt by those secondary CPUs, and it +doesn't because it is masked, until set_all_lpcrs sets an LPCR value +that enables powersave wakeup on decrementer interrupt. + +The start_powered_off sate just sets ->halted, which makes it look +like a powersaving state. Logically I think it's not the same thing +as far as spapr goes. I don't know why start_powered_off only sets +->halted, and not ->stop/stopped as well. + +Not sure how best to solve it cleanly. I'll send a revert if I can't +get something working soon. + +Thanks, +Nick + +On Tue Mar 18, 2025 at 7:09 AM AEST, misanjum wrote: +> +Bug Description: +> +Encountering a boot failure when launching a KVM guest with +> +qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor +> +crashes. +> +> +> +Reproduction Steps: +> +# qemu-system-ppc64 --version +> +QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f) +> +Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers +> +> +# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine +> +pseries,accel=kvm \ +> +-m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \ +> +-device virtio-scsi-pci,id=scsi \ +> +-drive +> +file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 +> +> +\ +> +-device scsi-hd,drive=drive0,bus=scsi.0 \ +> +-netdev bridge,id=net0,br=virbr0 \ +> +-device virtio-net-pci,netdev=net0 \ +> +-serial pty \ +> +-device virtio-balloon-pci \ +> +-cpu host +> +QEMU 9.2.50 monitor - type 'help' for more information +> +char device redirected to /dev/pts/2 (label serial0) +> +(qemu) +> +(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but +> +unavailable: IRQ_XIVE capability must be present for KVM +> +Falling back to kernel-irqchip=off +> +** Qemu Hang +> +> +(In another ssh session) +> +# screen /dev/pts/2 +> +Preparing to boot Linux version 6.10.4-200.fc40.ppc64le +> +(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 +> +(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 +> +15:20:17 UTC 2024 +> +Detected machine type: 0000000000000101 +> +command line: +> +BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le +> +root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M +> +Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) +> +Calling ibm,client-architecture-support... done +> +memory layout at init: +> +memory_limit : 0000000000000000 (16 MB aligned) +> +alloc_bottom : 0000000008200000 +> +alloc_top : 0000000030000000 +> +alloc_top_hi : 0000000800000000 +> +rmo_top : 0000000030000000 +> +ram_top : 0000000800000000 +> +instantiating rtas at 0x000000002fff0000... done +> +prom_hold_cpus: skipped +> +copying OF device tree... +> +Building dt strings... +> +Building dt structure... +> +Device tree strings 0x0000000008210000 -> 0x0000000008210bd0 +> +Device tree struct 0x0000000008220000 -> 0x0000000008230000 +> +Quiescing Open Firmware ... +> +Booting Linux via __start() @ 0x0000000000440000 ... 
+> +** Guest Console Hang +> +> +> +Git Bisect: +> +Performing git bisect points to the following patch: +> +# git bisect bad +> +e8291ec16da80566c121c68d9112be458954d90b is the first bad commit +> +commit e8291ec16da80566c121c68d9112be458954d90b (HEAD) +> +Author: Nicholas Piggin <npiggin@gmail.com> +> +Date: Thu Dec 19 13:40:31 2024 +1000 +> +> +target/ppc: fix timebase register reset state +> +> +(H)DEC and PURR get reset before icount does, which causes them to +> +be +> +skewed and not match the init state. This can cause replay to not +> +match the recorded trace exactly. For DEC and HDEC this is usually +> +not +> +noticable since they tend to get programmed before affecting the +> +target machine. PURR has been observed to cause replay bugs when +> +running Linux. +> +> +Fix this by resetting using a time of 0. +> +> +Message-ID: <20241219034035.1826173-2-npiggin@gmail.com> +> +Signed-off-by: Nicholas Piggin <npiggin@gmail.com> +> +> +hw/ppc/ppc.c | 11 ++++++++--- +> +1 file changed, 8 insertions(+), 3 deletions(-) +> +> +> +Reverting the patch helps boot the guest. +> +Thanks, +> +Misbah Anjum N + diff --git a/classification_output/01/mistranslation/0247400 b/classification_output/01/mistranslation/0247400 new file mode 100644 index 000000000..746a624cc --- /dev/null +++ b/classification_output/01/mistranslation/0247400 @@ -0,0 +1,1486 @@ +mistranslation: 0.659 +instruction: 0.624 +semantic: 0.600 +other: 0.598 + +[Qemu-devel][bug] qemu crash when migrate vm and vm's disks + +When migrate vm and vmâs disks target host qemu crash due to an invalid free. +#0 object_unref (obj=0x1000) at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/qom/object.c:920 +#1 0x0000560434d79e79 in memory_region_unref (mr=<optimized out>) +at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:1730 +#2 flatview_destroy (view=0x560439653880) at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:292 +#3 0x000056043514dfbe in call_rcu_thread (opaque=<optimized out>) +at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/util/rcu.c:284 +#4 0x00007fbc2b36fe25 in start_thread () from /lib64/libpthread.so.0 +#5 0x00007fbc2b099bad in clone () from /lib64/libc.so.6 +test base qemu-2.12.0 +ï¼ +but use lastest qemu(v6.0.0-rc2) also reproduce. +As follow patch can resolve this problem: +https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02272.html +Steps to reproduce: +(1) Create VM (virsh define) +(2) Add 64 virtio scsi disks +(3) migrate vm and vmâdisks +------------------------------------------------------------------------------------------------------------------------------------- +æ¬é®ä»¶åå ¶é件嫿æ°åä¸éå¢çä¿å¯ä¿¡æ¯ï¼ä» éäºåéç»ä¸é¢å°åä¸ååº +ç个人æç¾¤ç»ãç¦æ¢ä»»ä½å ¶ä»äººä»¥ä»»ä½å½¢å¼ä½¿ç¨ï¼å æ¬ä½ä¸éäºå ¨é¨æé¨åå°æ³é²ãå¤å¶ã +ææ£åï¼æ¬é®ä»¶ä¸çä¿¡æ¯ã妿æ¨éæ¶äºæ¬é®ä»¶ï¼è¯·æ¨ç«å³çµè¯æé®ä»¶éç¥å件人并å 餿¬ +é®ä»¶ï¼ +This e-mail and its attachments contain confidential information from New H3C, which is +intended only for the person or entity whose address is listed above. Any use of the +information contained herein in any way (including, but not limited to, total or partial +disclosure, reproduction, or dissemination) by persons other than the intended +recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender +by phone or email immediately and delete it! + +* Yuchen (yu.chen@h3c.com) wrote: +> +When migrate vm and vmâs disks target host qemu crash due to an invalid free. 
+> +> +#0 object_unref (obj=0x1000) at +> +/qemu-2.12/rpmbuild/BUILD/qemu-2.12/qom/object.c:920 +> +#1 0x0000560434d79e79 in memory_region_unref (mr=<optimized out>) +> +at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:1730 +> +#2 flatview_destroy (view=0x560439653880) at +> +/qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:292 +> +#3 0x000056043514dfbe in call_rcu_thread (opaque=<optimized out>) +> +at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/util/rcu.c:284 +> +#4 0x00007fbc2b36fe25 in start_thread () from /lib64/libpthread.so.0 +> +#5 0x00007fbc2b099bad in clone () from /lib64/libc.so.6 +> +> +test base qemu-2.12.0ï¼but use lastest qemu(v6.0.0-rc2) also reproduce. +Interesting. + +> +As follow patch can resolve this problem: +> +https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02272.html +That's a pci/rcu change; ccing Paolo and Micahel. + +> +Steps to reproduce: +> +(1) Create VM (virsh define) +> +(2) Add 64 virtio scsi disks +Is that hot adding the disks later, or are they included in the VM at +creation? +Can you provide a libvirt XML example? + +> +(3) migrate vm and vmâdisks +What do you mean by 'and vm disks' - are you doing a block migration? + +Dave + +> +------------------------------------------------------------------------------------------------------------------------------------- +> +æ¬é®ä»¶åå ¶é件嫿æ°åä¸éå¢çä¿å¯ä¿¡æ¯ï¼ä» éäºåéç»ä¸é¢å°åä¸ååº +> +ç个人æç¾¤ç»ãç¦æ¢ä»»ä½å ¶ä»äººä»¥ä»»ä½å½¢å¼ä½¿ç¨ï¼å æ¬ä½ä¸éäºå ¨é¨æé¨åå°æ³é²ãå¤å¶ã +> +ææ£åï¼æ¬é®ä»¶ä¸çä¿¡æ¯ã妿æ¨éæ¶äºæ¬é®ä»¶ï¼è¯·æ¨ç«å³çµè¯æé®ä»¶éç¥å件人并å 餿¬ +> +é®ä»¶ï¼ +> +This e-mail and its attachments contain confidential information from New +> +H3C, which is +> +intended only for the person or entity whose address is listed above. Any use +> +of the +> +information contained herein in any way (including, but not limited to, total +> +or partial +> +disclosure, reproduction, or dissemination) by persons other than the intended +> +recipient(s) is prohibited. If you receive this e-mail in error, please +> +notify the sender +> +by phone or email immediately and delete it! +-- +Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK + +> +-----é®ä»¶åä»¶----- +> +å件人: Dr. David Alan Gilbert [ +mailto:dgilbert@redhat.com +] +> +åéæ¶é´: 2021å¹´4æ8æ¥ 19:27 +> +æ¶ä»¶äºº: yuchen (Cloud) <yu.chen@h3c.com>; pbonzini@redhat.com; +> +mst@redhat.com +> +æé: qemu-devel@nongnu.org +> +主é¢: Re: [Qemu-devel][bug] qemu crash when migrate vm and vm's disks +> +> +* Yuchen (yu.chen@h3c.com) wrote: +> +> When migrate vm and vmâs disks target host qemu crash due to an invalid +> +free. +> +> +> +> #0 object_unref (obj=0x1000) at +> +> /qemu-2.12/rpmbuild/BUILD/qemu-2.12/qom/object.c:920 +> +> #1 0x0000560434d79e79 in memory_region_unref (mr=<optimized out>) +> +> at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:1730 +> +> #2 flatview_destroy (view=0x560439653880) at +> +> /qemu-2.12/rpmbuild/BUILD/qemu-2.12/memory.c:292 +> +> #3 0x000056043514dfbe in call_rcu_thread (opaque=<optimized out>) +> +> at /qemu-2.12/rpmbuild/BUILD/qemu-2.12/util/rcu.c:284 +> +> #4 0x00007fbc2b36fe25 in start_thread () from /lib64/libpthread.so.0 +> +> #5 0x00007fbc2b099bad in clone () from /lib64/libc.so.6 +> +> +> +> test base qemu-2.12.0ï¼but use lastest qemu(v6.0.0-rc2) also reproduce. +> +> +Interesting. +> +> +> As follow patch can resolve this problem: +> +> +https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg02272.html +> +> +That's a pci/rcu change; ccing Paolo and Micahel. 
+> +> +> Steps to reproduce: +> +> (1) Create VM (virsh define) +> +> (2) Add 64 virtio scsi disks +> +> +Is that hot adding the disks later, or are they included in the VM at +> +creation? +> +Can you provide a libvirt XML example? +> +Include disks in the VM at creation + +vm disks xml (only virtio scsi disks): + <devices> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native'/> + <source file='/vms/tempp/vm-os'/> + <target dev='vda' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x08' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data1'/> + <target dev='sda' bus='scsi'/> + <address type='drive' controller='2' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data2'/> + <target dev='sdb' bus='scsi'/> + <address type='drive' controller='3' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data3'/> + <target dev='sdc' bus='scsi'/> + <address type='drive' controller='4' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data4'/> + <target dev='sdd' bus='scsi'/> + <address type='drive' controller='5' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data5'/> + <target dev='sde' bus='scsi'/> + <address type='drive' controller='6' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data6'/> + <target dev='sdf' bus='scsi'/> + <address type='drive' controller='7' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data7'/> + <target dev='sdg' bus='scsi'/> + <address type='drive' controller='8' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data8'/> + <target dev='sdh' bus='scsi'/> + <address type='drive' controller='9' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data9'/> + <target dev='sdi' bus='scsi'/> + <address type='drive' controller='10' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data10'/> + <target dev='sdj' bus='scsi'/> + <address type='drive' controller='11' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data11'/> + <target dev='sdk' bus='scsi'/> + <address type='drive' controller='12' bus='0' target='0' 
unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data12'/> + <target dev='sdl' bus='scsi'/> + <address type='drive' controller='13' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data13'/> + <target dev='sdm' bus='scsi'/> + <address type='drive' controller='14' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data14'/> + <target dev='sdn' bus='scsi'/> + <address type='drive' controller='15' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data15'/> + <target dev='sdo' bus='scsi'/> + <address type='drive' controller='16' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data16'/> + <target dev='sdp' bus='scsi'/> + <address type='drive' controller='17' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data17'/> + <target dev='sdq' bus='scsi'/> + <address type='drive' controller='18' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data18'/> + <target dev='sdr' bus='scsi'/> + <address type='drive' controller='19' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data19'/> + <target dev='sds' bus='scsi'/> + <address type='drive' controller='20' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data20'/> + <target dev='sdt' bus='scsi'/> + <address type='drive' controller='21' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data21'/> + <target dev='sdu' bus='scsi'/> + <address type='drive' controller='22' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data22'/> + <target dev='sdv' bus='scsi'/> + <address type='drive' controller='23' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data23'/> + <target dev='sdw' bus='scsi'/> + <address type='drive' controller='24' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data24'/> + <target dev='sdx' bus='scsi'/> + <address type='drive' controller='25' bus='0' target='0' 
unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data25'/> + <target dev='sdy' bus='scsi'/> + <address type='drive' controller='26' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data26'/> + <target dev='sdz' bus='scsi'/> + <address type='drive' controller='27' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data27'/> + <target dev='sdaa' bus='scsi'/> + <address type='drive' controller='28' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data28'/> + <target dev='sdab' bus='scsi'/> + <address type='drive' controller='29' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data29'/> + <target dev='sdac' bus='scsi'/> + <address type='drive' controller='30' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data30'/> + <target dev='sdad' bus='scsi'/> + <address type='drive' controller='31' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data31'/> + <target dev='sdae' bus='scsi'/> + <address type='drive' controller='32' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data32'/> + <target dev='sdaf' bus='scsi'/> + <address type='drive' controller='33' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data33'/> + <target dev='sdag' bus='scsi'/> + <address type='drive' controller='34' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data34'/> + <target dev='sdah' bus='scsi'/> + <address type='drive' controller='35' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data35'/> + <target dev='sdai' bus='scsi'/> + <address type='drive' controller='36' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data36'/> + <target dev='sdaj' bus='scsi'/> + <address type='drive' controller='37' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data37'/> + <target dev='sdak' bus='scsi'/> + <address type='drive' controller='38' bus='0' 
target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data38'/> + <target dev='sdal' bus='scsi'/> + <address type='drive' controller='39' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data39'/> + <target dev='sdam' bus='scsi'/> + <address type='drive' controller='40' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data40'/> + <target dev='sdan' bus='scsi'/> + <address type='drive' controller='41' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data41'/> + <target dev='sdao' bus='scsi'/> + <address type='drive' controller='42' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data42'/> + <target dev='sdap' bus='scsi'/> + <address type='drive' controller='43' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data43'/> + <target dev='sdaq' bus='scsi'/> + <address type='drive' controller='44' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data44'/> + <target dev='sdar' bus='scsi'/> + <address type='drive' controller='45' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data45'/> + <target dev='sdas' bus='scsi'/> + <address type='drive' controller='46' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data46'/> + <target dev='sdat' bus='scsi'/> + <address type='drive' controller='47' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data47'/> + <target dev='sdau' bus='scsi'/> + <address type='drive' controller='48' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data48'/> + <target dev='sdav' bus='scsi'/> + <address type='drive' controller='49' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data49'/> + <target dev='sdaw' bus='scsi'/> + <address type='drive' controller='50' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data50'/> + <target dev='sdax' bus='scsi'/> + <address type='drive' controller='51' 
bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data51'/> + <target dev='sday' bus='scsi'/> + <address type='drive' controller='52' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data52'/> + <target dev='sdaz' bus='scsi'/> + <address type='drive' controller='53' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data53'/> + <target dev='sdba' bus='scsi'/> + <address type='drive' controller='54' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data54'/> + <target dev='sdbb' bus='scsi'/> + <address type='drive' controller='55' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data55'/> + <target dev='sdbc' bus='scsi'/> + <address type='drive' controller='56' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data56'/> + <target dev='sdbd' bus='scsi'/> + <address type='drive' controller='57' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data57'/> + <target dev='sdbe' bus='scsi'/> + <address type='drive' controller='58' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data58'/> + <target dev='sdbf' bus='scsi'/> + <address type='drive' controller='59' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data59'/> + <target dev='sdbg' bus='scsi'/> + <address type='drive' controller='60' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data60'/> + <target dev='sdbh' bus='scsi'/> + <address type='drive' controller='61' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data61'/> + <target dev='sdbi' bus='scsi'/> + <address type='drive' controller='62' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data62'/> + <target dev='sdbj' bus='scsi'/> + <address type='drive' controller='63' bus='0' target='0' unit='0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data63'/> + <target dev='sdbk' bus='scsi'/> + <address type='drive' 
controller='64' bus='0' target='0' unit='0'/> + </disk> + <controller type='scsi' index='0'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x02' +function='0x0'/> + </controller> + <controller type='scsi' index='1' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' +function='0x0'/> + </controller> + <controller type='scsi' index='2' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x01' +function='0x0'/> + </controller> + <controller type='scsi' index='3' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x03' +function='0x0'/> + </controller> + <controller type='scsi' index='4' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x04' +function='0x0'/> + </controller> + <controller type='scsi' index='5' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x05' +function='0x0'/> + </controller> + <controller type='scsi' index='6' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x06' +function='0x0'/> + </controller> + <controller type='scsi' index='7' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x07' +function='0x0'/> + </controller> + <controller type='scsi' index='8' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x08' +function='0x0'/> + </controller> + <controller type='scsi' index='9' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x09' +function='0x0'/> + </controller> + <controller type='scsi' index='10' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x0a' +function='0x0'/> + </controller> + <controller type='scsi' index='11' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x0b' +function='0x0'/> + </controller> + <controller type='scsi' index='12' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x0c' +function='0x0'/> + </controller> + <controller type='scsi' index='13' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x0d' +function='0x0'/> + </controller> + <controller type='scsi' index='14' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x0e' +function='0x0'/> + </controller> + <controller type='scsi' index='15' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x0f' +function='0x0'/> + </controller> + <controller type='scsi' index='16' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x10' +function='0x0'/> + </controller> + <controller type='scsi' index='17' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x11' +function='0x0'/> + </controller> + <controller type='scsi' index='18' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x12' +function='0x0'/> + </controller> + <controller type='scsi' index='19' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x13' +function='0x0'/> + </controller> + <controller type='scsi' index='20' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x14' +function='0x0'/> + </controller> + <controller type='scsi' index='21' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x15' +function='0x0'/> + </controller> + <controller type='scsi' index='22' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x16' +function='0x0'/> + </controller> + 
<controller type='scsi' index='23' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x17' +function='0x0'/> + </controller> + <controller type='scsi' index='24' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x18' +function='0x0'/> + </controller> + <controller type='scsi' index='25' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x19' +function='0x0'/> + </controller> + <controller type='scsi' index='26' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x1a' +function='0x0'/> + </controller> + <controller type='scsi' index='27' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x1b' +function='0x0'/> + </controller> + <controller type='scsi' index='28' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x1c' +function='0x0'/> + </controller> + <controller type='scsi' index='29' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x1d' +function='0x0'/> + </controller> + <controller type='scsi' index='30' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x01' slot='0x1e' +function='0x0'/> + </controller> + <controller type='scsi' index='31' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x02' slot='0x01' +function='0x0'/> + </controller> + <controller type='scsi' index='32' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x02' slot='0x02' +function='0x0'/> + </controller> + <controller type='scsi' index='33' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x02' slot='0x03' +function='0x0'/> + </controller> + <controller type='scsi' index='34' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x02' slot='0x04' +function='0x0'/> + </controller> + <controller type='scsi' index='35' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x02' slot='0x05' +function='0x0'/> + </controller> + <controller type='scsi' index='36' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x02' slot='0x06' +function='0x0'/> + </controller> + <controller type='scsi' index='37' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x02' slot='0x07' +function='0x0'/> + </controller> + <controller type='scsi' index='38' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x02' slot='0x08' +function='0x0'/> + </controller> + <controller type='scsi' index='39' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x02' slot='0x09' +function='0x0'/> + </controller> + <controller type='scsi' index='40' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x02' slot='0x0a' +function='0x0'/> + </controller> + <controller type='scsi' index='41' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x02' slot='0x0b' +function='0x0'/> + </controller> + <controller type='scsi' index='42' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x02' slot='0x0c' +function='0x0'/> + </controller> + <controller type='scsi' index='43' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x02' slot='0x0d' +function='0x0'/> + </controller> + <controller type='scsi' index='44' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' +function='0x0'/> + </controller> + <controller type='scsi' index='45' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x09' +function='0x0'/> + </controller> + <controller type='scsi' index='46' 
model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' +function='0x0'/> + </controller> + <controller type='scsi' index='47' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x0c' +function='0x0'/> + </controller> + <controller type='scsi' index='48' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x0d' +function='0x0'/> + </controller> + <controller type='scsi' index='49' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x0e' +function='0x0'/> + </controller> + <controller type='scsi' index='50' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x0f' +function='0x0'/> + </controller> + <controller type='scsi' index='51' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x10' +function='0x0'/> + </controller> + <controller type='scsi' index='52' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x11' +function='0x0'/> + </controller> + <controller type='scsi' index='53' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x12' +function='0x0'/> + </controller> + <controller type='scsi' index='54' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x13' +function='0x0'/> + </controller> + <controller type='scsi' index='55' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x14' +function='0x0'/> + </controller> + <controller type='scsi' index='56' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x15' +function='0x0'/> + </controller> + <controller type='scsi' index='57' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x16' +function='0x0'/> + </controller> + <controller type='scsi' index='58' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x17' +function='0x0'/> + </controller> + <controller type='scsi' index='59' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x18' +function='0x0'/> + </controller> + <controller type='scsi' index='60' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x19' +function='0x0'/> + </controller> + <controller type='scsi' index='61' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1a' +function='0x0'/> + </controller> + <controller type='scsi' index='62' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1b' +function='0x0'/> + </controller> + <controller type='scsi' index='63' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1c' +function='0x0'/> + </controller> + <controller type='scsi' index='64' model='virtio-scsi'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' +function='0x0'/> + </controller> + <controller type='pci' index='0' model='pci-root'/> + <controller type='pci' index='1' model='pci-bridge'> + <model name='pci-bridge'/> + <target chassisNr='1'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' +function='0x0'/> + </controller> + <controller type='pci' index='2' model='pci-bridge'> + <model name='pci-bridge'/> + <target chassisNr='2'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x1f' +function='0x0'/> + </controller> + </devices> + +vm disks xml (only virtio disks): + <devices> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native'/> + <source file='/vms/tempp/vm-os'/> + <target dev='vda' 
bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x08' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data2'/> + <target dev='vdb' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x06' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data3'/> + <target dev='vdc' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x09' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data4'/> + <target dev='vdd' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data5'/> + <target dev='vde' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x0c' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data6'/> + <target dev='vdf' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x0d' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data7'/> + <target dev='vdg' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x0e' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data8'/> + <target dev='vdh' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x0f' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data9'/> + <target dev='vdi' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x10' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data10'/> + <target dev='vdj' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x11' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data11'/> + <target dev='vdk' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x12' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data12'/> + <target dev='vdl' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x13' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data13'/> + <target dev='vdm' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x14' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver 
name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data14'/> + <target dev='vdn' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x15' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data15'/> + <target dev='vdo' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x16' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data16'/> + <target dev='vdp' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x17' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data17'/> + <target dev='vdq' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x18' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data18'/> + <target dev='vdr' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x19' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data19'/> + <target dev='vds' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1a' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data20'/> + <target dev='vdt' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1b' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data21'/> + <target dev='vdu' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1c' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data22'/> + <target dev='vdv' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data23'/> + <target dev='vdw' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data24'/> + <target dev='vdx' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x01' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data25'/> + <target dev='vdy' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x03' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data26'/> + <target dev='vdz' 
bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x04' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data27'/> + <target dev='vdaa' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x05' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data28'/> + <target dev='vdab' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x06' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data29'/> + <target dev='vdac' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x07' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data30'/> + <target dev='vdad' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x08' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data31'/> + <target dev='vdae' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x09' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data32'/> + <target dev='vdaf' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x0a' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data33'/> + <target dev='vdag' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x0b' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data34'/> + <target dev='vdah' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x0c' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data35'/> + <target dev='vdai' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x0d' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data36'/> + <target dev='vdaj' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x0e' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data37'/> + <target dev='vdak' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x0f' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data38'/> + <target dev='vdal' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x10' +function='0x0'/> + </disk> + <disk type='file' 
device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data39'/> + <target dev='vdam' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x11' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data40'/> + <target dev='vdan' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x12' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data41'/> + <target dev='vdao' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x13' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data42'/> + <target dev='vdap' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x14' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data43'/> + <target dev='vdaq' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x15' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data44'/> + <target dev='vdar' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x16' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data45'/> + <target dev='vdas' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x17' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data46'/> + <target dev='vdat' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x18' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data47'/> + <target dev='vdau' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x19' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data48'/> + <target dev='vdav' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x1a' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data49'/> + <target dev='vdaw' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x1b' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data50'/> + <target dev='vdax' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x1c' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source 
file='/vms/tempp/vm-data51'/> + <target dev='vday' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x1d' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data52'/> + <target dev='vdaz' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x1e' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data53'/> + <target dev='vdba' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x01' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data54'/> + <target dev='vdbb' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x02' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data55'/> + <target dev='vdbc' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x03' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data56'/> + <target dev='vdbd' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x04' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data57'/> + <target dev='vdbe' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x05' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data58'/> + <target dev='vdbf' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x06' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data59'/> + <target dev='vdbg' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x07' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data60'/> + <target dev='vdbh' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x08' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data61'/> + <target dev='vdbi' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x09' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data62'/> + <target dev='vdbj' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x0a' +function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data63'/> + <target dev='vdbk' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x02' slot='0x0b' 
+function='0x0'/> + </disk> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2' cache='directsync' io='native' +discard='unmap'/> + <source file='/vms/tempp/vm-data1'/> + <target dev='vdbl' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x03' +function='0x0'/> + </disk> + <controller type='pci' index='0' model='pci-root'/> + <controller type='pci' index='1' model='pci-bridge'> + <model name='pci-bridge'/> + <target chassisNr='1'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' +function='0x0'/> + </controller> + <controller type='pci' index='2' model='pci-bridge'> + <model name='pci-bridge'/> + <target chassisNr='2'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x1f' +function='0x0'/> + </controller> + </devices> + +> +> (3) migrate vm and vmâdisks +> +> +What do you mean by 'and vm disks' - are you doing a block migration? +> +Yes, block migration. +In fact, only migration domain also reproduced. + +> +Dave +> +> +> ---------------------------------------------------------------------- +> +> --------------------------------------------------------------- +> +Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK +------------------------------------------------------------------------------------------------------------------------------------- +æ¬é®ä»¶åå ¶é件嫿æ°åä¸éå¢çä¿å¯ä¿¡æ¯ï¼ä» éäºåéç»ä¸é¢å°åä¸ååº +ç个人æç¾¤ç»ãç¦æ¢ä»»ä½å ¶ä»äººä»¥ä»»ä½å½¢å¼ä½¿ç¨ï¼å æ¬ä½ä¸éäºå ¨é¨æé¨åå°æ³é²ãå¤å¶ã +ææ£åï¼æ¬é®ä»¶ä¸çä¿¡æ¯ã妿æ¨éæ¶äºæ¬é®ä»¶ï¼è¯·æ¨ç«å³çµè¯æé®ä»¶éç¥å件人并å 餿¬ +é®ä»¶ï¼ +This e-mail and its attachments contain confidential information from New H3C, +which is +intended only for the person or entity whose address is listed above. Any use +of the +information contained herein in any way (including, but not limited to, total +or partial +disclosure, reproduction, or dissemination) by persons other than the intended +recipient(s) is prohibited. If you receive this e-mail in error, please notify +the sender +by phone or email immediately and delete it! + diff --git a/classification_output/01/mistranslation/1267916 b/classification_output/01/mistranslation/1267916 new file mode 100644 index 000000000..fffafcf77 --- /dev/null +++ b/classification_output/01/mistranslation/1267916 @@ -0,0 +1,1878 @@ +mistranslation: 0.927 +instruction: 0.903 +semantic: 0.891 +other: 0.877 + +[Qemu-devel] [TCG only][Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration + +Hi all, + +Does anyboday remember the similar issue post by hailiang months ago +http://patchwork.ozlabs.org/patch/454322/ +At least tow bugs about migration had been fixed since that. +And now we found the same issue at the tcg vm(kvm is fine), after +migration, the content VM's memory is inconsistent. +we add a patch to check memory content, you can find it from affix + +steps to reporduce: +1) apply the patch and re-build qemu +2) prepare the ubuntu guest and run memtest in grub. 
+source side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off +destination side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 +3) start migration +with 1000M NIC, migration will finish within 3 min. + +at source: +(qemu) migrate tcp:192.168.2.66:8881 +after saving ram complete +e9e725df678d392b1a83b3a917f332bb +qemu-system-x86_64: end ram md5 +(qemu) + +at destination: +...skip... +Completed load of VM with exit code 0 seq iteration 1264 +Completed load of VM with exit code 0 seq iteration 1265 +Completed load of VM with exit code 0 seq iteration 1266 +qemu-system-x86_64: after loading state section id 2(ram) +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 +qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init + +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 + +This occurs occasionally, and only with a TCG machine. It seems that +some pages dirtied on the source side are not transferred to the destination. +This problem can be reproduced even if we disable virtio. +Is it OK for some pages not to be transferred to the destination during +migration? Or is it a bug? +Any idea... 
+ +=================md5 check patch============================= + +diff --git a/Makefile.target b/Makefile.target +index 962d004..e2cb8e9 100644 +--- a/Makefile.target ++++ b/Makefile.target +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o + obj-y += memory_mapping.o + obj-y += dump.o + obj-y += migration/ram.o migration/savevm.o +-LIBS := $(libs_softmmu) $(LIBS) ++LIBS := $(libs_softmmu) $(LIBS) -lplumb + + # xen support + obj-$(CONFIG_XEN) += xen-common.o +diff --git a/migration/ram.c b/migration/ram.c +index 1eb155a..3b7a09d 100644 +--- a/migration/ram.c ++++ b/migration/ram.c +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +version_id) +} + + rcu_read_unlock(); +- DPRINTF("Completed load of VM with exit code %d seq iteration " ++ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " + "%" PRIu64 "\n", ret, seq_iter); + return ret; + } +diff --git a/migration/savevm.c b/migration/savevm.c +index 0ad1b93..3feaa61 100644 +--- a/migration/savevm.c ++++ b/migration/savevm.c +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) + + } + ++#include "exec/ram_addr.h" ++#include "qemu/rcu_queue.h" ++#include <clplumbing/md5.h> ++#ifndef MD5_DIGEST_LENGTH ++#define MD5_DIGEST_LENGTH 16 ++#endif ++ ++static void check_host_md5(void) ++{ ++ int i; ++ unsigned char md[MD5_DIGEST_LENGTH]; ++ rcu_read_lock(); ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +'pc.ram' block */ ++ rcu_read_unlock(); ++ ++ MD5(block->host, block->used_length, md); ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { ++ fprintf(stderr, "%02x", md[i]); ++ } ++ fprintf(stderr, "\n"); ++ error_report("end ram md5"); ++} ++ + void qemu_savevm_state_begin(QEMUFile *f, + const MigrationParams *params) + { +@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile +*f, bool iterable_only) +save_section_header(f, se, QEMU_VM_SECTION_END); + + ret = se->ops->save_live_complete_precopy(f, se->opaque); ++ ++ fprintf(stderr, "after saving %s complete\n", se->idstr); ++ check_host_md5(); ++ + trace_savevm_section_end(se->idstr, se->section_id, ret); + save_section_footer(f, se); + if (ret < 0) { +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +MigrationIncomingState *mis) +section_id, le->se->idstr); + return ret; + } ++ if (section_type == QEMU_VM_SECTION_END) { ++ error_report("after loading state section id %d(%s)", ++ section_id, le->se->idstr); ++ check_host_md5(); ++ } + if (!check_section_footer(f, le)) { + return -EINVAL; + } +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) + } + + cpu_synchronize_all_post_init(); ++ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); ++ check_host_md5(); + + return ret; + } + +* Li Zhijian (address@hidden) wrote: +> +Hi all, +> +> +Does anyboday remember the similar issue post by hailiang months ago +> +http://patchwork.ozlabs.org/patch/454322/ +> +At least tow bugs about migration had been fixed since that. +Yes, I wondered what happened to that. + +> +And now we found the same issue at the tcg vm(kvm is fine), after migration, +> +the content VM's memory is inconsistent. +Hmm, TCG only - I don't know much about that; but I guess something must +be accessing memory without using the proper macros/functions so +it doesn't mark it as dirty. + +> +we add a patch to check memory content, you can find it from affix +> +> +steps to reporduce: +> +1) apply the patch and re-build qemu +> +2) prepare the ubuntu guest and run memtest in grub. 
+> +soruce side: +> +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +pc-i440fx-2.3,accel=tcg,usb=off +> +> +destination side: +> +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 +> +> +3) start migration +> +with 1000M NIC, migration will finish within 3 min. +> +> +at source: +> +(qemu) migrate tcp:192.168.2.66:8881 +> +after saving ram complete +> +e9e725df678d392b1a83b3a917f332bb +> +qemu-system-x86_64: end ram md5 +> +(qemu) +> +> +at destination: +> +...skip... +> +Completed load of VM with exit code 0 seq iteration 1264 +> +Completed load of VM with exit code 0 seq iteration 1265 +> +Completed load of VM with exit code 0 seq iteration 1266 +> +qemu-system-x86_64: after loading state section id 2(ram) +> +49c2dac7bde0e5e22db7280dcb3824f9 +> +qemu-system-x86_64: end ram md5 +> +qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init +> +> +49c2dac7bde0e5e22db7280dcb3824f9 +> +qemu-system-x86_64: end ram md5 +> +> +This occurs occasionally and only at tcg machine. It seems that +> +some pages dirtied in source side don't transferred to destination. +> +This problem can be reproduced even if we disable virtio. +> +> +Is it OK for some pages that not transferred to destination when do +> +migration ? Or is it a bug? +I'm pretty sure that means it's a bug. Hard to find though, I guess +at least memtest is smaller than a big OS. I think I'd dump the whole +of memory on both sides, hexdump and diff them - I'd guess it would +just be one byte/word different, maybe that would offer some idea what +wrote it. + +Dave + +> +Any idea... 
+> +> +=================md5 check patch============================= +> +> +diff --git a/Makefile.target b/Makefile.target +> +index 962d004..e2cb8e9 100644 +> +--- a/Makefile.target +> ++++ b/Makefile.target +> +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o +> +obj-y += memory_mapping.o +> +obj-y += dump.o +> +obj-y += migration/ram.o migration/savevm.o +> +-LIBS := $(libs_softmmu) $(LIBS) +> ++LIBS := $(libs_softmmu) $(LIBS) -lplumb +> +> +# xen support +> +obj-$(CONFIG_XEN) += xen-common.o +> +diff --git a/migration/ram.c b/migration/ram.c +> +index 1eb155a..3b7a09d 100644 +> +--- a/migration/ram.c +> ++++ b/migration/ram.c +> +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +> +version_id) +> +} +> +> +rcu_read_unlock(); +> +- DPRINTF("Completed load of VM with exit code %d seq iteration " +> ++ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " +> +"%" PRIu64 "\n", ret, seq_iter); +> +return ret; +> +} +> +diff --git a/migration/savevm.c b/migration/savevm.c +> +index 0ad1b93..3feaa61 100644 +> +--- a/migration/savevm.c +> ++++ b/migration/savevm.c +> +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) +> +> +} +> +> ++#include "exec/ram_addr.h" +> ++#include "qemu/rcu_queue.h" +> ++#include <clplumbing/md5.h> +> ++#ifndef MD5_DIGEST_LENGTH +> ++#define MD5_DIGEST_LENGTH 16 +> ++#endif +> ++ +> ++static void check_host_md5(void) +> ++{ +> ++ int i; +> ++ unsigned char md[MD5_DIGEST_LENGTH]; +> ++ rcu_read_lock(); +> ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +> +'pc.ram' block */ +> ++ rcu_read_unlock(); +> ++ +> ++ MD5(block->host, block->used_length, md); +> ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { +> ++ fprintf(stderr, "%02x", md[i]); +> ++ } +> ++ fprintf(stderr, "\n"); +> ++ error_report("end ram md5"); +> ++} +> ++ +> +void qemu_savevm_state_begin(QEMUFile *f, +> +const MigrationParams *params) +> +{ +> +@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, +> +bool iterable_only) +> +save_section_header(f, se, QEMU_VM_SECTION_END); +> +> +ret = se->ops->save_live_complete_precopy(f, se->opaque); +> ++ +> ++ fprintf(stderr, "after saving %s complete\n", se->idstr); +> ++ check_host_md5(); +> ++ +> +trace_savevm_section_end(se->idstr, se->section_id, ret); +> +save_section_footer(f, se); +> +if (ret < 0) { +> +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +> +MigrationIncomingState *mis) +> +section_id, le->se->idstr); +> +return ret; +> +} +> ++ if (section_type == QEMU_VM_SECTION_END) { +> ++ error_report("after loading state section id %d(%s)", +> ++ section_id, le->se->idstr); +> ++ check_host_md5(); +> ++ } +> +if (!check_section_footer(f, le)) { +> +return -EINVAL; +> +} +> +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) +> +} +> +> +cpu_synchronize_all_post_init(); +> ++ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); +> ++ check_host_md5(); +> +> +return ret; +> +} +> +> +> +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +On 2015/12/3 17:24, Dr. David Alan Gilbert wrote: +* Li Zhijian (address@hidden) wrote: +Hi all, + +Does anyboday remember the similar issue post by hailiang months ago +http://patchwork.ozlabs.org/patch/454322/ +At least tow bugs about migration had been fixed since that. +Yes, I wondered what happened to that. +And now we found the same issue at the tcg vm(kvm is fine), after migration, +the content VM's memory is inconsistent. 
+Hmm, TCG only - I don't know much about that; but I guess something must +be accessing memory without using the proper macros/functions so +it doesn't mark it as dirty. +we add a patch to check memory content, you can find it from affix + +steps to reporduce: +1) apply the patch and re-build qemu +2) prepare the ubuntu guest and run memtest in grub. +soruce side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off + +destination side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 + +3) start migration +with 1000M NIC, migration will finish within 3 min. + +at source: +(qemu) migrate tcp:192.168.2.66:8881 +after saving ram complete +e9e725df678d392b1a83b3a917f332bb +qemu-system-x86_64: end ram md5 +(qemu) + +at destination: +...skip... +Completed load of VM with exit code 0 seq iteration 1264 +Completed load of VM with exit code 0 seq iteration 1265 +Completed load of VM with exit code 0 seq iteration 1266 +qemu-system-x86_64: after loading state section id 2(ram) +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 +qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init + +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 + +This occurs occasionally and only at tcg machine. It seems that +some pages dirtied in source side don't transferred to destination. +This problem can be reproduced even if we disable virtio. + +Is it OK for some pages that not transferred to destination when do +migration ? Or is it a bug? +I'm pretty sure that means it's a bug. Hard to find though, I guess +at least memtest is smaller than a big OS. I think I'd dump the whole +of memory on both sides, hexdump and diff them - I'd guess it would +just be one byte/word different, maybe that would offer some idea what +wrote it. +Maybe one better way to do that is with the help of userfaultfd's write-protect +capability. It is still in the development by Andrea Arcangeli, but there +is a RFC version available, please refer to +http://www.spinics.net/lists/linux-mm/msg97422.html +ï¼I'm developing live memory snapshot which based on it, maybe this is another +scene where we +can use userfaultfd's WP ;) ). +Dave +Any idea... 
+ +=================md5 check patch============================= + +diff --git a/Makefile.target b/Makefile.target +index 962d004..e2cb8e9 100644 +--- a/Makefile.target ++++ b/Makefile.target +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o + obj-y += memory_mapping.o + obj-y += dump.o + obj-y += migration/ram.o migration/savevm.o +-LIBS := $(libs_softmmu) $(LIBS) ++LIBS := $(libs_softmmu) $(LIBS) -lplumb + + # xen support + obj-$(CONFIG_XEN) += xen-common.o +diff --git a/migration/ram.c b/migration/ram.c +index 1eb155a..3b7a09d 100644 +--- a/migration/ram.c ++++ b/migration/ram.c +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +version_id) + } + + rcu_read_unlock(); +- DPRINTF("Completed load of VM with exit code %d seq iteration " ++ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " + "%" PRIu64 "\n", ret, seq_iter); + return ret; + } +diff --git a/migration/savevm.c b/migration/savevm.c +index 0ad1b93..3feaa61 100644 +--- a/migration/savevm.c ++++ b/migration/savevm.c +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) + + } + ++#include "exec/ram_addr.h" ++#include "qemu/rcu_queue.h" ++#include <clplumbing/md5.h> ++#ifndef MD5_DIGEST_LENGTH ++#define MD5_DIGEST_LENGTH 16 ++#endif ++ ++static void check_host_md5(void) ++{ ++ int i; ++ unsigned char md[MD5_DIGEST_LENGTH]; ++ rcu_read_lock(); ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +'pc.ram' block */ ++ rcu_read_unlock(); ++ ++ MD5(block->host, block->used_length, md); ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { ++ fprintf(stderr, "%02x", md[i]); ++ } ++ fprintf(stderr, "\n"); ++ error_report("end ram md5"); ++} ++ + void qemu_savevm_state_begin(QEMUFile *f, + const MigrationParams *params) + { +@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, +bool iterable_only) + save_section_header(f, se, QEMU_VM_SECTION_END); + + ret = se->ops->save_live_complete_precopy(f, se->opaque); ++ ++ fprintf(stderr, "after saving %s complete\n", se->idstr); ++ check_host_md5(); ++ + trace_savevm_section_end(se->idstr, se->section_id, ret); + save_section_footer(f, se); + if (ret < 0) { +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +MigrationIncomingState *mis) + section_id, le->se->idstr); + return ret; + } ++ if (section_type == QEMU_VM_SECTION_END) { ++ error_report("after loading state section id %d(%s)", ++ section_id, le->se->idstr); ++ check_host_md5(); ++ } + if (!check_section_footer(f, le)) { + return -EINVAL; + } +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) + } + + cpu_synchronize_all_post_init(); ++ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); ++ check_host_md5(); + + return ret; + } +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +. + +On 12/03/2015 05:37 PM, Hailiang Zhang wrote: +On 2015/12/3 17:24, Dr. David Alan Gilbert wrote: +* Li Zhijian (address@hidden) wrote: +Hi all, + +Does anyboday remember the similar issue post by hailiang months ago +http://patchwork.ozlabs.org/patch/454322/ +At least tow bugs about migration had been fixed since that. +Yes, I wondered what happened to that. +And now we found the same issue at the tcg vm(kvm is fine), after +migration, +the content VM's memory is inconsistent. +Hmm, TCG only - I don't know much about that; but I guess something must +be accessing memory without using the proper macros/functions so +it doesn't mark it as dirty. 
+we add a patch to check memory content, you can find it from affix + +steps to reporduce: +1) apply the patch and re-build qemu +2) prepare the ubuntu guest and run memtest in grub. +soruce side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 + +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off + +destination side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 + +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 + +3) start migration +with 1000M NIC, migration will finish within 3 min. + +at source: +(qemu) migrate tcp:192.168.2.66:8881 +after saving ram complete +e9e725df678d392b1a83b3a917f332bb +qemu-system-x86_64: end ram md5 +(qemu) + +at destination: +...skip... +Completed load of VM with exit code 0 seq iteration 1264 +Completed load of VM with exit code 0 seq iteration 1265 +Completed load of VM with exit code 0 seq iteration 1266 +qemu-system-x86_64: after loading state section id 2(ram) +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 +qemu-system-x86_64: qemu_loadvm_state: after +cpu_synchronize_all_post_init + +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 + +This occurs occasionally and only at tcg machine. It seems that +some pages dirtied in source side don't transferred to destination. +This problem can be reproduced even if we disable virtio. + +Is it OK for some pages that not transferred to destination when do +migration ? Or is it a bug? +I'm pretty sure that means it's a bug. Hard to find though, I guess +at least memtest is smaller than a big OS. I think I'd dump the whole +of memory on both sides, hexdump and diff them - I'd guess it would +just be one byte/word different, maybe that would offer some idea what +wrote it. +Maybe one better way to do that is with the help of userfaultfd's +write-protect +capability. It is still in the development by Andrea Arcangeli, but there +is a RFC version available, please refer to +http://www.spinics.net/lists/linux-mm/msg97422.html +ï¼I'm developing live memory snapshot which based on it, maybe this is +another scene where we +can use userfaultfd's WP ;) ). +sounds good. + +thanks +Li +Dave +Any idea... 
+ +=================md5 check patch============================= + +diff --git a/Makefile.target b/Makefile.target +index 962d004..e2cb8e9 100644 +--- a/Makefile.target ++++ b/Makefile.target +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o + obj-y += memory_mapping.o + obj-y += dump.o + obj-y += migration/ram.o migration/savevm.o +-LIBS := $(libs_softmmu) $(LIBS) ++LIBS := $(libs_softmmu) $(LIBS) -lplumb + + # xen support + obj-$(CONFIG_XEN) += xen-common.o +diff --git a/migration/ram.c b/migration/ram.c +index 1eb155a..3b7a09d 100644 +--- a/migration/ram.c ++++ b/migration/ram.c +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +version_id) + } + + rcu_read_unlock(); +- DPRINTF("Completed load of VM with exit code %d seq iteration " ++ fprintf(stderr, "Completed load of VM with exit code %d seq +iteration " + "%" PRIu64 "\n", ret, seq_iter); + return ret; + } +diff --git a/migration/savevm.c b/migration/savevm.c +index 0ad1b93..3feaa61 100644 +--- a/migration/savevm.c ++++ b/migration/savevm.c +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) + + } + ++#include "exec/ram_addr.h" ++#include "qemu/rcu_queue.h" ++#include <clplumbing/md5.h> ++#ifndef MD5_DIGEST_LENGTH ++#define MD5_DIGEST_LENGTH 16 ++#endif ++ ++static void check_host_md5(void) ++{ ++ int i; ++ unsigned char md[MD5_DIGEST_LENGTH]; ++ rcu_read_lock(); ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +'pc.ram' block */ ++ rcu_read_unlock(); ++ ++ MD5(block->host, block->used_length, md); ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { ++ fprintf(stderr, "%02x", md[i]); ++ } ++ fprintf(stderr, "\n"); ++ error_report("end ram md5"); ++} ++ + void qemu_savevm_state_begin(QEMUFile *f, + const MigrationParams *params) + { +@@ -1056,6 +1079,10 @@ void +qemu_savevm_state_complete_precopy(QEMUFile *f, +bool iterable_only) + save_section_header(f, se, QEMU_VM_SECTION_END); + + ret = se->ops->save_live_complete_precopy(f, se->opaque); ++ ++ fprintf(stderr, "after saving %s complete\n", se->idstr); ++ check_host_md5(); ++ + trace_savevm_section_end(se->idstr, se->section_id, ret); + save_section_footer(f, se); + if (ret < 0) { +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +MigrationIncomingState *mis) + section_id, le->se->idstr); + return ret; + } ++ if (section_type == QEMU_VM_SECTION_END) { ++ error_report("after loading state section id %d(%s)", ++ section_id, le->se->idstr); ++ check_host_md5(); ++ } + if (!check_section_footer(f, le)) { + return -EINVAL; + } +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) + } + + cpu_synchronize_all_post_init(); ++ error_report("%s: after cpu_synchronize_all_post_init\n", +__func__); ++ check_host_md5(); + + return ret; + } +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +. +. +-- +Best regards. +Li Zhijian (8555) + +On 12/03/2015 05:24 PM, Dr. David Alan Gilbert wrote: +* Li Zhijian (address@hidden) wrote: +Hi all, + +Does anyboday remember the similar issue post by hailiang months ago +http://patchwork.ozlabs.org/patch/454322/ +At least tow bugs about migration had been fixed since that. +Yes, I wondered what happened to that. +And now we found the same issue at the tcg vm(kvm is fine), after migration, +the content VM's memory is inconsistent. +Hmm, TCG only - I don't know much about that; but I guess something must +be accessing memory without using the proper macros/functions so +it doesn't mark it as dirty. 
+we add a patch to check memory content, you can find it from affix + +steps to reporduce: +1) apply the patch and re-build qemu +2) prepare the ubuntu guest and run memtest in grub. +soruce side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off + +destination side: +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 + +3) start migration +with 1000M NIC, migration will finish within 3 min. + +at source: +(qemu) migrate tcp:192.168.2.66:8881 +after saving ram complete +e9e725df678d392b1a83b3a917f332bb +qemu-system-x86_64: end ram md5 +(qemu) + +at destination: +...skip... +Completed load of VM with exit code 0 seq iteration 1264 +Completed load of VM with exit code 0 seq iteration 1265 +Completed load of VM with exit code 0 seq iteration 1266 +qemu-system-x86_64: after loading state section id 2(ram) +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 +qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init + +49c2dac7bde0e5e22db7280dcb3824f9 +qemu-system-x86_64: end ram md5 + +This occurs occasionally and only at tcg machine. It seems that +some pages dirtied in source side don't transferred to destination. +This problem can be reproduced even if we disable virtio. + +Is it OK for some pages that not transferred to destination when do +migration ? Or is it a bug? +I'm pretty sure that means it's a bug. Hard to find though, I guess +at least memtest is smaller than a big OS. I think I'd dump the whole +of memory on both sides, hexdump and diff them - I'd guess it would +just be one byte/word different, maybe that would offer some idea what +wrote it. +I try to dump and compare them, more than 10 pages are different. +in source side, they are random value rather than always 'FF' 'FB' 'EF' +'BF'... in destination. +and not all of the different pages are continuous. + +thanks +Li +Dave +Any idea... 
+ +=================md5 check patch============================= + +diff --git a/Makefile.target b/Makefile.target +index 962d004..e2cb8e9 100644 +--- a/Makefile.target ++++ b/Makefile.target +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o + obj-y += memory_mapping.o + obj-y += dump.o + obj-y += migration/ram.o migration/savevm.o +-LIBS := $(libs_softmmu) $(LIBS) ++LIBS := $(libs_softmmu) $(LIBS) -lplumb + + # xen support + obj-$(CONFIG_XEN) += xen-common.o +diff --git a/migration/ram.c b/migration/ram.c +index 1eb155a..3b7a09d 100644 +--- a/migration/ram.c ++++ b/migration/ram.c +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +version_id) + } + + rcu_read_unlock(); +- DPRINTF("Completed load of VM with exit code %d seq iteration " ++ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " + "%" PRIu64 "\n", ret, seq_iter); + return ret; + } +diff --git a/migration/savevm.c b/migration/savevm.c +index 0ad1b93..3feaa61 100644 +--- a/migration/savevm.c ++++ b/migration/savevm.c +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) + + } + ++#include "exec/ram_addr.h" ++#include "qemu/rcu_queue.h" ++#include <clplumbing/md5.h> ++#ifndef MD5_DIGEST_LENGTH ++#define MD5_DIGEST_LENGTH 16 ++#endif ++ ++static void check_host_md5(void) ++{ ++ int i; ++ unsigned char md[MD5_DIGEST_LENGTH]; ++ rcu_read_lock(); ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +'pc.ram' block */ ++ rcu_read_unlock(); ++ ++ MD5(block->host, block->used_length, md); ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { ++ fprintf(stderr, "%02x", md[i]); ++ } ++ fprintf(stderr, "\n"); ++ error_report("end ram md5"); ++} ++ + void qemu_savevm_state_begin(QEMUFile *f, + const MigrationParams *params) + { +@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, +bool iterable_only) + save_section_header(f, se, QEMU_VM_SECTION_END); + + ret = se->ops->save_live_complete_precopy(f, se->opaque); ++ ++ fprintf(stderr, "after saving %s complete\n", se->idstr); ++ check_host_md5(); ++ + trace_savevm_section_end(se->idstr, se->section_id, ret); + save_section_footer(f, se); + if (ret < 0) { +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +MigrationIncomingState *mis) + section_id, le->se->idstr); + return ret; + } ++ if (section_type == QEMU_VM_SECTION_END) { ++ error_report("after loading state section id %d(%s)", ++ section_id, le->se->idstr); ++ check_host_md5(); ++ } + if (!check_section_footer(f, le)) { + return -EINVAL; + } +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) + } + + cpu_synchronize_all_post_init(); ++ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); ++ check_host_md5(); + + return ret; + } +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + + +. +-- +Best regards. +Li Zhijian (8555) + +* Li Zhijian (address@hidden) wrote: +> +> +> +On 12/03/2015 05:24 PM, Dr. David Alan Gilbert wrote: +> +>* Li Zhijian (address@hidden) wrote: +> +>>Hi all, +> +>> +> +>>Does anyboday remember the similar issue post by hailiang months ago +> +>> +http://patchwork.ozlabs.org/patch/454322/ +> +>>At least tow bugs about migration had been fixed since that. +> +> +> +>Yes, I wondered what happened to that. +> +> +> +>>And now we found the same issue at the tcg vm(kvm is fine), after migration, +> +>>the content VM's memory is inconsistent. 
+> +> +> +>Hmm, TCG only - I don't know much about that; but I guess something must +> +>be accessing memory without using the proper macros/functions so +> +>it doesn't mark it as dirty. +> +> +> +>>we add a patch to check memory content, you can find it from affix +> +>> +> +>>steps to reporduce: +> +>>1) apply the patch and re-build qemu +> +>>2) prepare the ubuntu guest and run memtest in grub. +> +>>soruce side: +> +>>x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +>>e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +>>if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +>>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +>>-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +>>tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +>>pc-i440fx-2.3,accel=tcg,usb=off +> +>> +> +>>destination side: +> +>>x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +>>e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +>>if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +>>virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +>>-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +>>tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +>>pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 +> +>> +> +>>3) start migration +> +>>with 1000M NIC, migration will finish within 3 min. +> +>> +> +>>at source: +> +>>(qemu) migrate tcp:192.168.2.66:8881 +> +>>after saving ram complete +> +>>e9e725df678d392b1a83b3a917f332bb +> +>>qemu-system-x86_64: end ram md5 +> +>>(qemu) +> +>> +> +>>at destination: +> +>>...skip... +> +>>Completed load of VM with exit code 0 seq iteration 1264 +> +>>Completed load of VM with exit code 0 seq iteration 1265 +> +>>Completed load of VM with exit code 0 seq iteration 1266 +> +>>qemu-system-x86_64: after loading state section id 2(ram) +> +>>49c2dac7bde0e5e22db7280dcb3824f9 +> +>>qemu-system-x86_64: end ram md5 +> +>>qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init +> +>> +> +>>49c2dac7bde0e5e22db7280dcb3824f9 +> +>>qemu-system-x86_64: end ram md5 +> +>> +> +>>This occurs occasionally and only at tcg machine. It seems that +> +>>some pages dirtied in source side don't transferred to destination. +> +>>This problem can be reproduced even if we disable virtio. +> +>> +> +>>Is it OK for some pages that not transferred to destination when do +> +>>migration ? Or is it a bug? +> +> +> +>I'm pretty sure that means it's a bug. Hard to find though, I guess +> +>at least memtest is smaller than a big OS. I think I'd dump the whole +> +>of memory on both sides, hexdump and diff them - I'd guess it would +> +>just be one byte/word different, maybe that would offer some idea what +> +>wrote it. +> +> +I try to dump and compare them, more than 10 pages are different. +> +in source side, they are random value rather than always 'FF' 'FB' 'EF' +> +'BF'... in destination. +> +> +and not all of the different pages are continuous. +I wonder if it happens on all of memtest's different test patterns, +perhaps it might be possible to narrow it down if you tell memtest +to only run one test at a time. + +Dave + +> +> +thanks +> +Li +> +> +> +> +> +>Dave +> +> +> +>>Any idea... 
+> +>> +> +>>=================md5 check patch============================= +> +>> +> +>>diff --git a/Makefile.target b/Makefile.target +> +>>index 962d004..e2cb8e9 100644 +> +>>--- a/Makefile.target +> +>>+++ b/Makefile.target +> +>>@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o +> +>> obj-y += memory_mapping.o +> +>> obj-y += dump.o +> +>> obj-y += migration/ram.o migration/savevm.o +> +>>-LIBS := $(libs_softmmu) $(LIBS) +> +>>+LIBS := $(libs_softmmu) $(LIBS) -lplumb +> +>> +> +>> # xen support +> +>> obj-$(CONFIG_XEN) += xen-common.o +> +>>diff --git a/migration/ram.c b/migration/ram.c +> +>>index 1eb155a..3b7a09d 100644 +> +>>--- a/migration/ram.c +> +>>+++ b/migration/ram.c +> +>>@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int +> +>>version_id) +> +>> } +> +>> +> +>> rcu_read_unlock(); +> +>>- DPRINTF("Completed load of VM with exit code %d seq iteration " +> +>>+ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " +> +>> "%" PRIu64 "\n", ret, seq_iter); +> +>> return ret; +> +>> } +> +>>diff --git a/migration/savevm.c b/migration/savevm.c +> +>>index 0ad1b93..3feaa61 100644 +> +>>--- a/migration/savevm.c +> +>>+++ b/migration/savevm.c +> +>>@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) +> +>> +> +>> } +> +>> +> +>>+#include "exec/ram_addr.h" +> +>>+#include "qemu/rcu_queue.h" +> +>>+#include <clplumbing/md5.h> +> +>>+#ifndef MD5_DIGEST_LENGTH +> +>>+#define MD5_DIGEST_LENGTH 16 +> +>>+#endif +> +>>+ +> +>>+static void check_host_md5(void) +> +>>+{ +> +>>+ int i; +> +>>+ unsigned char md[MD5_DIGEST_LENGTH]; +> +>>+ rcu_read_lock(); +> +>>+ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +> +>>'pc.ram' block */ +> +>>+ rcu_read_unlock(); +> +>>+ +> +>>+ MD5(block->host, block->used_length, md); +> +>>+ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { +> +>>+ fprintf(stderr, "%02x", md[i]); +> +>>+ } +> +>>+ fprintf(stderr, "\n"); +> +>>+ error_report("end ram md5"); +> +>>+} +> +>>+ +> +>> void qemu_savevm_state_begin(QEMUFile *f, +> +>> const MigrationParams *params) +> +>> { +> +>>@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, +> +>>bool iterable_only) +> +>> save_section_header(f, se, QEMU_VM_SECTION_END); +> +>> +> +>> ret = se->ops->save_live_complete_precopy(f, se->opaque); +> +>>+ +> +>>+ fprintf(stderr, "after saving %s complete\n", se->idstr); +> +>>+ check_host_md5(); +> +>>+ +> +>> trace_savevm_section_end(se->idstr, se->section_id, ret); +> +>> save_section_footer(f, se); +> +>> if (ret < 0) { +> +>>@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +> +>>MigrationIncomingState *mis) +> +>> section_id, le->se->idstr); +> +>> return ret; +> +>> } +> +>>+ if (section_type == QEMU_VM_SECTION_END) { +> +>>+ error_report("after loading state section id %d(%s)", +> +>>+ section_id, le->se->idstr); +> +>>+ check_host_md5(); +> +>>+ } +> +>> if (!check_section_footer(f, le)) { +> +>> return -EINVAL; +> +>> } +> +>>@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) +> +>> } +> +>> +> +>> cpu_synchronize_all_post_init(); +> +>>+ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); +> +>>+ check_host_md5(); +> +>> +> +>> return ret; +> +>> } +> +>> +> +>> +> +>> +> +>-- +> +>Dr. David Alan Gilbert / address@hidden / Manchester, UK +> +> +> +> +> +>. +> +> +> +> +-- +> +Best regards. +> +Li Zhijian (8555) +> +> +-- +Dr. 
David Alan Gilbert / address@hidden / Manchester, UK + +Li Zhijian <address@hidden> wrote: +> +Hi all, +> +> +Does anyboday remember the similar issue post by hailiang months ago +> +http://patchwork.ozlabs.org/patch/454322/ +> +At least tow bugs about migration had been fixed since that. +> +> +And now we found the same issue at the tcg vm(kvm is fine), after +> +migration, the content VM's memory is inconsistent. +> +> +we add a patch to check memory content, you can find it from affix +> +> +steps to reporduce: +> +1) apply the patch and re-build qemu +> +2) prepare the ubuntu guest and run memtest in grub. +> +soruce side: +> +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +pc-i440fx-2.3,accel=tcg,usb=off +> +> +destination side: +> +x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device +> +e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive +> +if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device +> +virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 +> +-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp +> +tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine +> +pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881 +> +> +3) start migration +> +with 1000M NIC, migration will finish within 3 min. +> +> +at source: +> +(qemu) migrate tcp:192.168.2.66:8881 +> +after saving ram complete +> +e9e725df678d392b1a83b3a917f332bb +> +qemu-system-x86_64: end ram md5 +> +(qemu) +> +> +at destination: +> +...skip... +> +Completed load of VM with exit code 0 seq iteration 1264 +> +Completed load of VM with exit code 0 seq iteration 1265 +> +Completed load of VM with exit code 0 seq iteration 1266 +> +qemu-system-x86_64: after loading state section id 2(ram) +> +49c2dac7bde0e5e22db7280dcb3824f9 +> +qemu-system-x86_64: end ram md5 +> +qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init +> +> +49c2dac7bde0e5e22db7280dcb3824f9 +> +qemu-system-x86_64: end ram md5 +> +> +This occurs occasionally and only at tcg machine. It seems that +> +some pages dirtied in source side don't transferred to destination. +> +This problem can be reproduced even if we disable virtio. +> +> +Is it OK for some pages that not transferred to destination when do +> +migration ? Or is it a bug? +> +> +Any idea... +Thanks for describing how to reproduce the bug. +If some pages are not transferred to destination then it is a bug, so we +need to know what the problem is, notice that the problem can be that +TCG is not marking dirty some page, that Migration code "forgets" about +that page, or anything eles altogether, that is what we need to find. + +There are more posibilities, I am not sure that memtest is on 32bit +mode, and it is inside posibility that we are missing some state when we +are on real mode. + +Will try to take a look at this. + +THanks, again. 
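As a concrete form of the dump-and-diff idea discussed above, the small standalone tool below compares two RAM dumps page by page and reports how many pages differ and how many bytes differ per page (the kind of summary asked for later in this thread). It is only an illustrative sketch, not part of QEMU: the file names, the 4 KiB page size, and the use of the monitor's pmemsave command to produce the dumps on each side are assumptions.

/* ramdiff.c - compare two guest RAM dumps page by page (illustrative only).
 * The dumps could be taken on each side with the HMP command
 *   pmemsave 0 <ram-size> <file>
 * File names and the 4 KiB page size below are assumptions, not QEMU defaults.
 * Build: gcc -O2 -o ramdiff ramdiff.c
 * Usage: ./ramdiff ram-src.bin ram-dst.bin
 */
#include <stdio.h>

#define PAGE_SIZE 4096

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <dump-a> <dump-b>\n", argv[0]);
        return 1;
    }
    FILE *a = fopen(argv[1], "rb");
    FILE *b = fopen(argv[2], "rb");
    if (!a || !b) {
        perror("fopen");
        return 1;
    }

    unsigned char pa[PAGE_SIZE], pb[PAGE_SIZE];
    unsigned long page = 0, bad_pages = 0, bad_bytes = 0;

    for (;;) {
        size_t ra = fread(pa, 1, PAGE_SIZE, a);
        size_t rb = fread(pb, 1, PAGE_SIZE, b);
        if (ra == 0 && rb == 0) {
            break;                          /* both dumps fully read */
        }
        if (ra != rb) {
            fprintf(stderr, "dumps have different sizes\n");
            return 1;
        }
        /* Count the bytes that differ inside this page. */
        size_t diff = 0;
        for (size_t i = 0; i < ra; i++) {
            if (pa[i] != pb[i]) {
                diff++;
            }
        }
        if (diff) {
            printf("page %lu (offset 0x%lx): %zu differing bytes\n",
                   page, page * PAGE_SIZE, diff);
            bad_pages++;
            bad_bytes += diff;
        }
        page++;
    }
    printf("%lu pages compared, %lu differ, %.1f differing bytes per differing page\n",
           page, bad_pages, bad_pages ? (double)bad_bytes / bad_pages : 0.0);
    return 0;
}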
+ + +> +> +=================md5 check patch============================= +> +> +diff --git a/Makefile.target b/Makefile.target +> +index 962d004..e2cb8e9 100644 +> +--- a/Makefile.target +> ++++ b/Makefile.target +> +@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o +> +obj-y += memory_mapping.o +> +obj-y += dump.o +> +obj-y += migration/ram.o migration/savevm.o +> +-LIBS := $(libs_softmmu) $(LIBS) +> ++LIBS := $(libs_softmmu) $(LIBS) -lplumb +> +> +# xen support +> +obj-$(CONFIG_XEN) += xen-common.o +> +diff --git a/migration/ram.c b/migration/ram.c +> +index 1eb155a..3b7a09d 100644 +> +--- a/migration/ram.c +> ++++ b/migration/ram.c +> +@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, +> +int version_id) +> +} +> +> +rcu_read_unlock(); +> +- DPRINTF("Completed load of VM with exit code %d seq iteration " +> ++ fprintf(stderr, "Completed load of VM with exit code %d seq iteration " +> +"%" PRIu64 "\n", ret, seq_iter); +> +return ret; +> +} +> +diff --git a/migration/savevm.c b/migration/savevm.c +> +index 0ad1b93..3feaa61 100644 +> +--- a/migration/savevm.c +> ++++ b/migration/savevm.c +> +@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f) +> +> +} +> +> ++#include "exec/ram_addr.h" +> ++#include "qemu/rcu_queue.h" +> ++#include <clplumbing/md5.h> +> ++#ifndef MD5_DIGEST_LENGTH +> ++#define MD5_DIGEST_LENGTH 16 +> ++#endif +> ++ +> ++static void check_host_md5(void) +> ++{ +> ++ int i; +> ++ unsigned char md[MD5_DIGEST_LENGTH]; +> ++ rcu_read_lock(); +> ++ RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks);/* Only check +> +'pc.ram' block */ +> ++ rcu_read_unlock(); +> ++ +> ++ MD5(block->host, block->used_length, md); +> ++ for(i = 0; i < MD5_DIGEST_LENGTH; i++) { +> ++ fprintf(stderr, "%02x", md[i]); +> ++ } +> ++ fprintf(stderr, "\n"); +> ++ error_report("end ram md5"); +> ++} +> ++ +> +void qemu_savevm_state_begin(QEMUFile *f, +> +const MigrationParams *params) +> +{ +> +@@ -1056,6 +1079,10 @@ void +> +qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only) +> +save_section_header(f, se, QEMU_VM_SECTION_END); +> +> +ret = se->ops->save_live_complete_precopy(f, se->opaque); +> ++ +> ++ fprintf(stderr, "after saving %s complete\n", se->idstr); +> ++ check_host_md5(); +> ++ +> +trace_savevm_section_end(se->idstr, se->section_id, ret); +> +save_section_footer(f, se); +> +if (ret < 0) { +> +@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, +> +MigrationIncomingState *mis) +> +section_id, le->se->idstr); +> +return ret; +> +} +> ++ if (section_type == QEMU_VM_SECTION_END) { +> ++ error_report("after loading state section id %d(%s)", +> ++ section_id, le->se->idstr); +> ++ check_host_md5(); +> ++ } +> +if (!check_section_footer(f, le)) { +> +return -EINVAL; +> +} +> +@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f) +> +} +> +> +cpu_synchronize_all_post_init(); +> ++ error_report("%s: after cpu_synchronize_all_post_init\n", __func__); +> ++ check_host_md5(); +> +> +return ret; +> +} + +> +> +Thanks for describing how to reproduce the bug. +> +If some pages are not transferred to destination then it is a bug, so we need +> +to know what the problem is, notice that the problem can be that TCG is not +> +marking dirty some page, that Migration code "forgets" about that page, or +> +anything eles altogether, that is what we need to find. +> +> +There are more posibilities, I am not sure that memtest is on 32bit mode, and +> +it is inside posibility that we are missing some state when we are on real +> +mode. 
+> +> +Will try to take a look at this. +> +> +THanks, again. +> +Hi Juan & Amit + + Do you think we should add a mechanism to check the data integrity during LM +like Zhijian's patch did? It may be very helpful for developers. + Actually, I did a similar thing before in order to make sure that I did the +right thing when I changed the code related to LM. + +Liang + +On (Fri) 04 Dec 2015 [01:43:07], Li, Liang Z wrote: +> +> +> +> Thanks for describing how to reproduce the bug. +> +> If some pages are not transferred to destination then it is a bug, so we +> +> need +> +> to know what the problem is, notice that the problem can be that TCG is not +> +> marking dirty some page, that Migration code "forgets" about that page, or +> +> anything eles altogether, that is what we need to find. +> +> +> +> There are more posibilities, I am not sure that memtest is on 32bit mode, +> +> and +> +> it is inside posibility that we are missing some state when we are on real +> +> mode. +> +> +> +> Will try to take a look at this. +> +> +> +> THanks, again. +> +> +> +> +Hi Juan & Amit +> +> +Do you think we should add a mechanism to check the data integrity during LM +> +like Zhijian's patch did? it may be very helpful for developers. +> +Actually, I did the similar thing before in order to make sure that I did +> +the right thing we I change the code related to LM. +If you mean for debugging, something that's not always on, then I'm +fine with it. + +A script that goes along that shows the result of comparison of the +diff will be helpful too, something that shows how many pages are +different, how many bytes in a page on average, and so on. + + Amit + diff --git a/classification_output/01/mistranslation/1693040 b/classification_output/01/mistranslation/1693040 new file mode 100644 index 000000000..67353acda --- /dev/null +++ b/classification_output/01/mistranslation/1693040 @@ -0,0 +1,1061 @@ +mistranslation: 0.862 +semantic: 0.858 +instruction: 0.856 +other: 0.852 + +[Qemu-devel] Reply: Re: Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang + +Hi: + +Yes, it is better. + +And should we delete + +#ifdef WIN32 + + QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL); + +#endif + +in qio_channel_socket_accept? + +qio_channel_socket_new already has it. + + +Original Mail + +From: address@hidden +To: wangguang 10165992 +Cc: address@hidden address@hidden address@hidden address@hidden +Date: 2017-03-22 15:03 +Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang + + +Hi, + +On 2017/3/22 9:42, address@hidden wrote: +> diff --git a/migration/socket.c b/migration/socket.c +> +> +> index 13966f1..d65a0ea 100644 +> +> +> --- a/migration/socket.c +> +> +> +++ b/migration/socket.c +> +> +> @@ -147,8 +147,9 @@ static gboolean +socket_accept_incoming_migration(QIOChannel *ioc, +> +> +> } +> +> +> +> +> +> trace_migration_socket_incoming_accepted(); +> +> +> +> +> +> qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming"); +> +> +> + qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN); +> +> +> migration_channel_process_incoming(migrate_get_current(), +> +> +> QIO_CHANNEL(sioc)); +> +> +> object_unref(OBJECT(sioc)); +> +> +> +> +> Is this patch ok? 
+ï¼ + +Yes, i think this works, but a better way maybe to call +qio_channel_set_feature() +in qio_channel_socket_accept(), we didn't set the SHUTDOWN feature for the +socket accept fd, +Or fix it by this: + +diff --git a/io/channel-socket.c b/io/channel-socket.c +index f546c68..ce6894c 100644 +--- a/io/channel-socket.c ++++ b/io/channel-socket.c +@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc, + Error **errp) + { + QIOChannelSocket *cioc +- +- cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)) +- cioc-ï¼fd = -1 ++ ++ cioc = qio_channel_socket_new() + cioc-ï¼remoteAddrLen = sizeof(ioc-ï¼remoteAddr) + cioc-ï¼localAddrLen = sizeof(ioc-ï¼localAddr) + + +Thanks, +Hailiang + +ï¼ I have test it . The test could not hang any more. +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ åå§é®ä»¶ +ï¼ +ï¼ +ï¼ +ï¼ åä»¶äººï¼ address@hidden +ï¼ æ¶ä»¶äººï¼ address@hidden address@hidden +ï¼ æéäººï¼ address@hidden address@hidden address@hidden +ï¼ æ¥ æ ï¼2017å¹´03æ22æ¥ 09:11 +ï¼ ä¸» é¢ ï¼Re: [Qemu-devel] çå¤: Re: çå¤: Re: [BUG]COLO failover hang +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ On 2017/3/21 19:56, Dr. David Alan Gilbert wrote: +ï¼ ï¼ * Hailiang Zhang (address@hidden) wrote: +ï¼ ï¼ï¼ Hi, +ï¼ ï¼ï¼ +ï¼ ï¼ï¼ Thanks for reporting this, and i confirmed it in my test, and it is a bug. +ï¼ ï¼ï¼ +ï¼ ï¼ï¼ Though we tried to call qemu_file_shutdown() to shutdown the related fd, in +ï¼ ï¼ï¼ case COLO thread/incoming thread is stuck in read/write() while do +failover, +ï¼ ï¼ï¼ but it didn't take effect, because all the fd used by COLO (also migration) +ï¼ ï¼ï¼ has been wrapped by qio channel, and it will not call the shutdown API if +ï¼ ï¼ï¼ we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN). +ï¼ ï¼ï¼ +ï¼ ï¼ï¼ Cc: Dr. David Alan Gilbert address@hidden +ï¼ ï¼ï¼ +ï¼ ï¼ï¼ I doubted migration cancel has the same problem, it may be stuck in write() +ï¼ ï¼ï¼ if we tried to cancel migration. +ï¼ ï¼ï¼ +ï¼ ï¼ï¼ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, +Error **errp) +ï¼ ï¼ï¼ { +ï¼ ï¼ï¼ qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing") +ï¼ ï¼ï¼ migration_channel_connect(s, ioc, NULL) +ï¼ ï¼ï¼ ... ... +ï¼ ï¼ï¼ We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN) above, +ï¼ ï¼ï¼ and the +ï¼ ï¼ï¼ migrate_fd_cancel() +ï¼ ï¼ï¼ { +ï¼ ï¼ï¼ ... ... +ï¼ ï¼ï¼ if (s-ï¼state == MIGRATION_STATUS_CANCELLING && f) { +ï¼ ï¼ï¼ qemu_file_shutdown(f) --ï¼ This will not take effect. No ? +ï¼ ï¼ï¼ } +ï¼ ï¼ï¼ } +ï¼ ï¼ +ï¼ ï¼ (cc'd in Daniel Berrange). +ï¼ ï¼ I see that we call qio_channel_set_feature(ioc, +QIO_CHANNEL_FEATURE_SHUTDOWN) at the +ï¼ ï¼ top of qio_channel_socket_new so I think that's safe isn't it? +ï¼ ï¼ +ï¼ +ï¼ Hmm, you are right, this problem is only exist for the migration incoming fd, +thanks. +ï¼ +ï¼ ï¼ Dave +ï¼ ï¼ +ï¼ ï¼ï¼ Thanks, +ï¼ ï¼ï¼ Hailiang +ï¼ ï¼ï¼ +ï¼ ï¼ï¼ On 2017/3/21 16:10, address@hidden wrote: +ï¼ ï¼ï¼ï¼ Thank youã +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ I have test areadyã +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ When the Primary Node panic,the Secondary Node qemu hang at the same +placeã +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ Incorrding +http://wiki.qemu-project.org/Features/COLO +ï¼kill Primary Node +qemu will not produce the problem,but Primary Node panic canã +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ I think due to the feature of channel does not support +QIO_CHANNEL_FEATURE_SHUTDOWN. +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ when failover,channel_shutdown could not shut down the channel. 
+ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ so the colo_process_incoming_thread will hang at recvmsg. +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ I test a patch: +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ diff --git a/migration/socket.c b/migration/socket.c +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ index 13966f1..d65a0ea 100644 +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ --- a/migration/socket.c +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +++ b/migration/socket.c +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ @@ -147,8 +147,9 @@ static gboolean +socket_accept_incoming_migration(QIOChannel *ioc, +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ } +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ trace_migration_socket_incoming_accepted() +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ qio_channel_set_name(QIO_CHANNEL(sioc), +"migration-socket-incoming") +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ + qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN) +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ migration_channel_process_incoming(migrate_get_current(), +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ QIO_CHANNEL(sioc)) +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ object_unref(OBJECT(sioc)) +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ My test will not hang any more. +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ åå§é®ä»¶ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ åä»¶äººï¼ address@hidden +ï¼ ï¼ï¼ï¼ æ¶ä»¶äººï¼ç广10165992 address@hidden +ï¼ ï¼ï¼ï¼ æéäººï¼ address@hidden address@hidden +ï¼ ï¼ï¼ï¼ æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58 +ï¼ ï¼ï¼ï¼ 主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: [BUG]COLO failover hang +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ Hi,Wang. +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ You can test this branch: +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ and please follow wiki ensure your own configuration correctly. +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +http://wiki.qemu-project.org/Features/COLO +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ Thanks +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ Zhang Chen +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ +ï¼ ï¼ï¼ï¼ On 03/21/2017 03:27 PM, address@hidden wrote: +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ hi. +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ I test the git qemu master have the same problem. 
+ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ (gdb) bt +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, +ï¼ ï¼ï¼ï¼ ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #1 0x00007f658e4aa0c2 in qio_channel_read +ï¼ ï¼ï¼ï¼ ï¼ (address@hidden, address@hidden "", +ï¼ ï¼ï¼ï¼ ï¼ address@hidden, address@hidden) at io/channel.c:114 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #2 0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, +ï¼ ï¼ï¼ï¼ ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at +ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file-channel.c:78 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at +ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file.c:295 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, +ï¼ ï¼ï¼ï¼ ï¼ address@hidden) at migration/qemu-file.c:555 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at +ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file.c:568 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at +ï¼ ï¼ï¼ï¼ ï¼ migration/qemu-file.c:648 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, +ï¼ ï¼ï¼ï¼ ï¼ address@hidden) at migration/colo.c:244 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #8 0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized +ï¼ ï¼ï¼ï¼ ï¼ outï¼, address@hidden, +ï¼ ï¼ï¼ï¼ ï¼ address@hidden) +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ at migration/colo.c:264 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #9 0x00007f658e3e740e in colo_process_incoming_thread +ï¼ ï¼ï¼ï¼ ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming" +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ $3 = 0 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ (gdb) bt +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, +ï¼ ï¼ï¼ï¼ ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #1 0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at +ï¼ ï¼ï¼ï¼ ï¼ gmain.c:3054 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #2 g_main_context_dispatch (context=ï¼optimized outï¼, +ï¼ ï¼ï¼ï¼ ï¼ address@hidden) at gmain.c:3630 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #4 os_host_main_loop_wait (timeout=ï¼optimized outï¼) at +ï¼ ï¼ï¼ï¼ ï¼ util/main-loop.c:258 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #5 main_loop_wait (address@hidden) at +ï¼ ï¼ï¼ï¼ ï¼ util/main-loop.c:506 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #6 0x00007fdccb526187 in main_loop () at vl.c:1898 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ #7 main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized +ï¼ ï¼ï¼ï¼ ï¼ outï¼) at vl.c:4709 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ $1 = 6 +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener" +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ May be socket_accept_incoming_migration should +ï¼ ï¼ï¼ï¼ ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ +ï¼ ï¼ï¼ï¼ ï¼ thank you. 
+> >>> >
+> >>> > Original mail
+> >>> > address@hidden
+> >>> > address@hidden
+> >>> > address@hidden@huawei.com
+> >>> > Date: 2017-03-16 14:46
+> >>> > Subject: Re: [Qemu-devel] COLO failover hang
+> >>> >
+> >>> > On 03/15/2017 05:06 PM, wangguang wrote:
+> >>> > > am testing QEMU COLO feature described here [QEMU
+> >>> > > Wiki](http://wiki.qemu-project.org/Features/COLO).
+> >>> > >
+> >>> > > When the Primary Node panics, the Secondary Node qemu hangs,
+> >>> > > hanging at recvmsg in qio_channel_socket_readv.
+> >>> > > And I run { 'execute': 'nbd-server-stop' } and { "execute":
+> >>> > > "x-colo-lost-heartbeat" } in the Secondary VM's
+> >>> > > monitor; the Secondary Node qemu still hangs at recvmsg.
+> >>> > >
+> >>> > > I found that the colo in qemu is not complete yet.
+> >>> > > Does colo have any plan for development?
+> >>> >
+> >>> > Yes, we are developing. You can see some of the patches we are pushing.
+> >>> >
+> >>> > > Has anyone ever run it successfully? Any help is appreciated!
+> >>> >
+> >>> > Our internal version can run it successfully.
+> >>> > For the failover details you can ask Zhanghailiang for help.
+> >>> > Next time if you have some question about COLO,
+> >>> > please cc me and zhanghailiang address@hidden
+> >>> >
+> >>> > Thanks
+> >>> > Zhang Chen
+> >>> >
+> >>> > >
+> >>> > > centos7.2+qemu2.7.50
+> >>> > > (gdb) bt
+> >>> > > #0  0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+> >>> > > #1  0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
+> >>> > >     iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0) at
+> >>> > >     io/channel-socket.c:497
+> >>> > > #2  0x00007f3e03329472 in qio_channel_read (address@hidden,
+> >>> > >     address@hidden "", address@hidden,
+> >>> > >     address@hidden) at io/channel.c:97
+> >>> > > #3  0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
+> >>> > >     buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
+> >>> > >     migration/qemu-file-channel.c:78
+> >>> > > #4  0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
+> >>> > >     migration/qemu-file.c:257
+> >>> > > #5  0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
+> >>> > >     address@hidden) at migration/qemu-file.c:510
+> >>> > > #6  0x00007f3e03274aab in qemu_get_byte (address@hidden) at
+> >>> > >     migration/qemu-file.c:523
+> >>> > > #7  0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
+> >>> > >     migration/qemu-file.c:603
+> >>> > > #8  0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
+> >>> > >     address@hidden) at migration/colo.c:215
+> >>> > > #9  0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
+> >>> > >     checkpoint_request=<synthetic pointer>, f=<optimized out>) at
+> >>> > >     migration/colo.c:546
+> >>> > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
+> >>> > >     migration/colo.c:649
+> >>> > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
+> >>> > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+> >>> > >
+> >>> > > --
+> >>> > > View this message in context: http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+> >>> > > Sent from the Developer mailing list archive at Nabble.com.
+> >>> > >
+> >>> >
+> >>> > --
+> >>> > Thanks
+> >>> > Zhang Chen
+> >>> >
+> >>>
+> >>
+> > --
+> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
+> >
+> > .
+> >
+>
+
+On 2017/3/22 16:09, address@hidden wrote:
+hi:
+
+yes, it is better.
+
+And should we delete
+Yes, you are right.
+#ifdef WIN32
+
+    QIO_CHANNEL(cioc)->event = CreateEvent(NULL, FALSE, FALSE, NULL);
+
+#endif
+
+in qio_channel_socket_accept?
+
+qio_channel_socket_new already has it.
+
+
+Original mail
+
+From: address@hidden
+To: wangguang 10165992
+Cc: address@hidden address@hidden address@hidden address@hidden
+Date: 2017-03-22 15:03
+Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: Reply: Re: [BUG]COLO failover hang
+
+
+Hi,
+
+On 2017/3/22 9:42, address@hidden wrote:
+> diff --git a/migration/socket.c b/migration/socket.c
+> index 13966f1..d65a0ea 100644
+> --- a/migration/socket.c
+> +++ b/migration/socket.c
+> @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
+>      }
+>
+>      trace_migration_socket_incoming_accepted();
+>
+>      qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
+> +    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
+>      migration_channel_process_incoming(migrate_get_current(),
+>                                         QIO_CHANNEL(sioc));
+>      object_unref(OBJECT(sioc));
+>
+> Is this patch ok?
+
+Yes, i think this works, but a better way may be to call qio_channel_set_feature()
+in qio_channel_socket_accept(); we didn't set the SHUTDOWN feature for the
+socket accept fd. Or fix it by this:
+
+diff --git a/io/channel-socket.c b/io/channel-socket.c
+index f546c68..ce6894c 100644
+--- a/io/channel-socket.c
++++ b/io/channel-socket.c
+@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc,
+                          Error **errp)
+ {
+     QIOChannelSocket *cioc;
+-
+-    cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET));
+-    cioc->fd = -1;
++
++    cioc = qio_channel_socket_new();
+     cioc->remoteAddrLen = sizeof(ioc->remoteAddr);
+     cioc->localAddrLen = sizeof(ioc->localAddr);
+
+
+Thanks,
+Hailiang
+
+> I have tested it. The test does not hang any more.
+
+
+Original mail
+
+From: address@hidden
+To: address@hidden address@hidden
+Cc: address@hidden address@hidden address@hidden
+Date: 2017-03-22 09:11
+Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang
+
+
+On 2017/3/21 19:56, Dr. David Alan Gilbert wrote:
+> * Hailiang Zhang (address@hidden) wrote:
+>> Hi,
+>>
+>> Thanks for reporting this, and i confirmed it in my test, and it is a bug.
+>>
+>> Though we tried to call qemu_file_shutdown() to shut down the related fd, in
+>> case the COLO thread/incoming thread is stuck in read/write() while doing
+>> failover, it didn't take effect, because all the fds used by COLO (also migration)
+>> have been wrapped by qio channel, and it will not call the shutdown API if
+>> we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN).
+>>
+>> Cc: Dr. David Alan Gilbert address@hidden
+>>
+>> I doubted migration cancel has the same problem; it may be stuck in write()
+>> if we tried to cancel migration.
+>>
+>> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
+>> {
+>>     qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing");
+>>     migration_channel_connect(s, ioc, NULL);
+>>     ... ...
+>> We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) above,
+>> and the
+>> migrate_fd_cancel()
+>> {
+>>     ... ...
+>>     if (s->state == MIGRATION_STATUS_CANCELLING && f) {
+>>         qemu_file_shutdown(f);   --> This will not take effect. No ?
+>>     }
+>> }
+>
+> (cc'd in Daniel Berrange).
+> I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) at the
+> top of qio_channel_socket_new so I think that's safe isn't it?
+>
+
+Hmm, you are right, this problem only exists for the migration incoming fd, thanks.
+
+> Dave
+>
+>> Thanks,
+>> Hailiang
+>>
+>> On 2017/3/21 16:10, address@hidden wrote:
+>>> Thank you.
+>>>
+>>> I have tested already.
+>>>
+>>> When the Primary Node panics, the Secondary Node qemu hangs at the same place.
+>>>
+>>> According to http://wiki.qemu-project.org/Features/COLO, killing the Primary Node
+>>> qemu will not produce the problem, but a Primary Node panic can.
+>>>
+>>> I think it is because the feature of the channel does not support
+>>> QIO_CHANNEL_FEATURE_SHUTDOWN.
+>>>
+>>> When failover happens, channel_shutdown could not shut down the channel,
+>>>
+>>> so the colo_process_incoming_thread will hang at recvmsg.
+>>>
+>>> I tested a patch:
+>>>
+>>> diff --git a/migration/socket.c b/migration/socket.c
+>>> index 13966f1..d65a0ea 100644
+>>> --- a/migration/socket.c
+>>> +++ b/migration/socket.c
+>>> @@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel *ioc,
+>>>      }
+>>>
+>>>      trace_migration_socket_incoming_accepted();
+>>>
+>>>      qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming");
+>>> +    qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN);
+>>>      migration_channel_process_incoming(migrate_get_current(),
+>>>                                         QIO_CHANNEL(sioc));
+>>>      object_unref(OBJECT(sioc));
+>>>
+>>> My test will not hang any more.
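+The failover fix above relies on shutdown(2) semantics: a thread blocked in
+recvmsg() on a socket returns as soon as another thread shuts that socket down,
+which is what qemu_file_shutdown() triggers once the channel advertises
+QIO_CHANNEL_FEATURE_SHUTDOWN. A minimal standalone illustration (plain POSIX
+sockets, not QEMU code):
+
+#include <pthread.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+static void *reader(void *arg)
+{
+    int fd = *(int *)arg;
+    char buf[16];
+
+    /* Blocks here, like colo_process_incoming_thread blocks in recvmsg(). */
+    ssize_t n = recv(fd, buf, sizeof(buf), 0);
+    printf("recv() returned %zd\n", n);   /* 0 (EOF) once shutdown() is called */
+    return NULL;
+}
+
+int main(void)
+{
+    int fds[2];
+    pthread_t tid;
+
+    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
+        perror("socketpair");
+        return 1;
+    }
+    pthread_create(&tid, NULL, reader, &fds[0]);
+    sleep(1);                       /* let the reader block */
+    shutdown(fds[0], SHUT_RDWR);    /* the failover path: wake the reader */
+    pthread_join(tid, NULL);
+    close(fds[0]);
+    close(fds[1]);
+    return 0;
+}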
+>>>
+>>>
+>>> Original mail
+>>>
+>>> From: address@hidden
+>>> To: wangguang 10165992, address@hidden
+>>> Cc: address@hidden address@hidden
+>>> Date: 2017-03-21 15:58
+>>> Subject: Re: [Qemu-devel] Reply: Re: [BUG]COLO failover hang
+>>>
+>>>
+>>> Hi, Wang.
+>>>
+>>> You can test this branch:
+>>>
+>>> https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk
+>>>
+>>> and please follow the wiki to ensure your own configuration is correct.
+>>>
+>>> http://wiki.qemu-project.org/Features/COLO
+>>>
+>>> Thanks
+>>> Zhang Chen
+>>>
+>>> On 03/21/2017 03:27 PM, address@hidden wrote:
+>>> >
+>>> > hi.
+>>> >
+>>> > I test the git qemu master and have the same problem.
+>>> >
+>>> > (gdb) bt
+>>> > #0  qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880,
+>>> >     niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461
+>>> > [... the rest of this quote repeats, verbatim, the two backtraces, the
+>>> > ioc->features / ioc->name dumps, the question about calling
+>>> > qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN), and the
+>>> > 2017-03-16 reply from Zhang Chen (including the centos7.2+qemu2.7.50
+>>> > backtrace and the Nabble footer) already quoted in full above ...]
+>>> >
+>>>
+>>> --
+>>> Thanks
+>>> Zhang Chen
+>>>
+>>>
+>>
+>
+> >
+> > --
+> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
+> >
+> > .
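+The fix the thread converged on (have qio_channel_socket_accept() build the
+accepted channel through qio_channel_socket_new() instead of a bare object_new())
+follows a general pattern: capability bits that later code relies on should be set
+in one constructor. A compressed standalone sketch of that pattern (illustrative
+names, not QEMU source):
+
+#include <stdio.h>
+#include <stdlib.h>
+
+enum { FEATURE_SHUTDOWN = 1u << 0 };
+
+typedef struct Channel {
+    unsigned features;
+    int fd;
+} Channel;
+
+/* The "qio_channel_socket_new()"-style constructor: advertises shutdown. */
+static Channel *channel_new(void)
+{
+    Channel *c = calloc(1, sizeof(*c));
+    c->fd = -1;
+    c->features |= FEATURE_SHUTDOWN;
+    return c;
+}
+
+/* Shutdown silently becomes a no-op when the feature bit is missing. */
+static int channel_shutdown(Channel *c)
+{
+    if (!(c->features & FEATURE_SHUTDOWN)) {
+        return -1;              /* caller keeps blocking -> the reported hang */
+    }
+    /* shutdown(c->fd, SHUT_RDWR) would go here */
+    return 0;
+}
+
+int main(void)
+{
+    Channel *raw = calloc(1, sizeof(*raw));   /* like the old object_new() path */
+    Channel *made = channel_new();            /* like the fixed accept() path   */
+
+    printf("raw accept fd:  shutdown %s\n", channel_shutdown(raw) ? "ignored" : "works");
+    printf("constructed fd: shutdown %s\n", channel_shutdown(made) ? "ignored" : "works");
+    free(raw);
+    free(made);
+    return 0;
+}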
diff --git a/classification_output/01/mistranslation/3886413 b/classification_output/01/mistranslation/3886413
new file mode 100644
index 000000000..5f79c452f
--- /dev/null
+++ b/classification_output/01/mistranslation/3886413
@@ -0,0 +1,33 @@
+mistranslation: 0.637
+instruction: 0.555
+other: 0.535
+semantic: 0.487
+
+[Qemu-devel] [BUG] vhost-user: hot-unplug vhost-user nic for windows guest OS will fail with 100% reproduce rate
+
+Hi, guys
+
+I met a problem when hot-unplugging a vhost-user nic for Windows 2008 rc2 sp1 64
+(Guest OS)
+
+The xml of the nic is as follows:
+<interface type='vhostuser'>
+  <mac address='52:54:00:3b:83:aa'/>
+  <source type='unix' path='/var/run/vhost-user/port1' mode='client'/>
+  <target dev='port1'/>
+  <model type='virtio'/>
+  <driver queues='4'/>
+  <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
+</interface>
+
+Firstly, I use virsh attach-device win2008 vif.xml to hot-plug a nic for the Guest
+OS. This operation returns success.
+After the guest OS discovers the nic successfully, I use virsh detach-device win2008
+vif.xml to hot-unplug it. This operation will fail with a 100% reproduce rate.
+
+However, if I hot-plug and hot-unplug a virtio-net nic, it will not fail.
+
+I have analysed the process of qmp_device_del, and I found that qemu does inject an
+interrupt to acpi to let it notify the guest OS to remove the nic.
+I guess there is something wrong in Windows when handling the interrupt.
+
diff --git a/classification_output/01/mistranslation/4158985 b/classification_output/01/mistranslation/4158985
new file mode 100644
index 000000000..798c2e866
--- /dev/null
+++ b/classification_output/01/mistranslation/4158985
@@ -0,0 +1,1480 @@
+mistranslation: 0.922
+other: 0.898
+semantic: 0.890
+instruction: 0.877
+
+[BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
+
+When I start qemu with a second virtio-net-ccw device (i.e. adding
+-device virtio-net-ccw in addition to the autogenerated device), I get
+a segfault. gdb points to
+
+#0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
+    config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
+146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
+
+(backtrace doesn't go further)
+
+Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
+the autogenerated virtio-net-ccw device is present) works. Specifying
+several "-device virtio-net-pci" works as well.
+
+Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
+client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
+works (in-between state does not compile).
+
+This is reproducible with tcg as well. Same problem both with
+--enable-vhost-vdpa and --disable-vhost-vdpa.
+
+Have not yet tried to figure out what might be special with
+virtio-ccw... anyone have an idea?
+
+[This should probably be considered a blocker?]
+
+On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
+>
+When I start qemu with a second virtio-net-ccw device (i.e. adding
+>
+-device virtio-net-ccw in addition to the autogenerated device), I get
+>
+a segfault.
gdb points to +> +> +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, +> +config=0x55d6ad9e3f80 "RT") at +> +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> +(backtrace doesn't go further) +> +> +Starting qemu with no additional "-device virtio-net-ccw" (i.e., only +> +the autogenerated virtio-net-ccw device is present) works. Specifying +> +several "-device virtio-net-pci" works as well. +> +> +Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net +> +client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") +> +works (in-between state does not compile). +Ouch. I didn't test all in-between states :( +But I wish we had a 0-day instrastructure like kernel has, +that catches things like that. + +> +This is reproducible with tcg as well. Same problem both with +> +--enable-vhost-vdpa and --disable-vhost-vdpa. +> +> +Have not yet tried to figure out what might be special with +> +virtio-ccw... anyone have an idea? +> +> +[This should probably be considered a blocker?] + +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. Tsirkin" <mst@redhat.com> wrote: + +> +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +> When I start qemu with a second virtio-net-ccw device (i.e. adding +> +> -device virtio-net-ccw in addition to the autogenerated device), I get +> +> a segfault. gdb points to +> +> +> +> #0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, +> +> config=0x55d6ad9e3f80 "RT") at +> +> /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +> 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> +> +> (backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. + +> +> +> +> Starting qemu with no additional "-device virtio-net-ccw" (i.e., only +> +> the autogenerated virtio-net-ccw device is present) works. Specifying +> +> several "-device virtio-net-pci" works as well. +> +> +> +> Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net +> +> client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") +> +> works (in-between state does not compile). +> +> +Ouch. I didn't test all in-between states :( +> +But I wish we had a 0-day instrastructure like kernel has, +> +that catches things like that. +Yep, that would be useful... so patchew only builds the complete series? + +> +> +> This is reproducible with tcg as well. Same problem both with +> +> --enable-vhost-vdpa and --disable-vhost-vdpa. +> +> +> +> Have not yet tried to figure out what might be special with +> +> virtio-ccw... anyone have an idea? +> +> +> +> [This should probably be considered a blocker?] +I think so, as it makes s390x unusable with more that one +virtio-net-ccw device, and I don't even see a workaround. 
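+The crash above is a plain NULL dereference: the second virtio-net-ccw device has
+no network backend attached, so nc->peer is NULL when the first config-space read
+arrives. A standalone sketch of the kind of guard being discussed (illustrative
+types, not the actual QEMU structures or the patch referenced later in the thread):
+
+#include <stdio.h>
+#include <stddef.h>
+
+typedef enum { DRIVER_TAP, DRIVER_VHOST_VDPA } DriverType;
+
+typedef struct Peer { DriverType type; } Peer;
+typedef struct NetClient { Peer *peer; } NetClient;
+
+static void get_config(const NetClient *nc, unsigned char *mac)
+{
+    /* Guard first: a NIC created without a backend has nc->peer == NULL. */
+    if (nc->peer && nc->peer->type == DRIVER_VHOST_VDPA) {
+        /* only then would the vhost-vdpa backend be asked for its config */
+    }
+    mac[0] = 0x52;   /* placeholder MAC bytes, like the "RT" in the report */
+    mac[1] = 0x54;
+}
+
+int main(void)
+{
+    unsigned char mac[2];
+    NetClient no_backend = { .peer = NULL };   /* -device virtio-net-ccw with no peer */
+
+    get_config(&no_backend, mac);
+    printf("config read survived: %02x %02x\n", mac[0], mac[1]);
+    return 0;
+}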
+ +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +On Fri, 24 Jul 2020 09:30:58 -0400 +> +"Michael S. Tsirkin" <mst@redhat.com> wrote: +> +> +> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +> > When I start qemu with a second virtio-net-ccw device (i.e. adding +> +> > -device virtio-net-ccw in addition to the autogenerated device), I get +> +> > a segfault. gdb points to +> +> > +> +> > #0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, +> +> > config=0x55d6ad9e3f80 "RT") at +> +> > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +> > 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> > +> +> > (backtrace doesn't go further) +> +> +The core was incomplete, but running under gdb directly shows that it +> +is just a bog-standard config space access (first for that device). +> +> +The cause of the crash is that nc->peer is not set... no idea how that +> +can happen, not that familiar with that part of QEMU. (Should the code +> +check, or is that really something that should not happen?) +> +> +What I don't understand is why it is set correctly for the first, +> +autogenerated virtio-net-ccw device, but not for the second one, and +> +why virtio-net-pci doesn't show these problems. The only difference +> +between -ccw and -pci that comes to my mind here is that config space +> +accesses for ccw are done via an asynchronous operation, so timing +> +might be different. +Hopefully Jason has an idea. Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? + +> +> > +> +> > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only +> +> > the autogenerated virtio-net-ccw device is present) works. Specifying +> +> > several "-device virtio-net-pci" works as well. +> +> > +> +> > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net +> +> > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") +> +> > works (in-between state does not compile). +> +> +> +> Ouch. I didn't test all in-between states :( +> +> But I wish we had a 0-day instrastructure like kernel has, +> +> that catches things like that. +> +> +Yep, that would be useful... so patchew only builds the complete series? +> +> +> +> +> > This is reproducible with tcg as well. Same problem both with +> +> > --enable-vhost-vdpa and --disable-vhost-vdpa. +> +> > +> +> > Have not yet tried to figure out what might be special with +> +> > virtio-ccw... anyone have an idea? +> +> > +> +> > [This should probably be considered a blocker?] +> +> +I think so, as it makes s390x unusable with more that one +> +virtio-net-ccw device, and I don't even see a workaround. + +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin" <mst@redhat.com> wrote: + +> +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +> On Fri, 24 Jul 2020 09:30:58 -0400 +> +> "Michael S. Tsirkin" <mst@redhat.com> wrote: +> +> +> +> > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +> > > When I start qemu with a second virtio-net-ccw device (i.e. adding +> +> > > -device virtio-net-ccw in addition to the autogenerated device), I get +> +> > > a segfault. 
gdb points to +> +> > > +> +> > > #0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, +> +> > > config=0x55d6ad9e3f80 "RT") at +> +> > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +> > > 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> > > +> +> > > (backtrace doesn't go further) +> +> +> +> The core was incomplete, but running under gdb directly shows that it +> +> is just a bog-standard config space access (first for that device). +> +> +> +> The cause of the crash is that nc->peer is not set... no idea how that +> +> can happen, not that familiar with that part of QEMU. (Should the code +> +> check, or is that really something that should not happen?) +> +> +> +> What I don't understand is why it is set correctly for the first, +> +> autogenerated virtio-net-ccw device, but not for the second one, and +> +> why virtio-net-pci doesn't show these problems. The only difference +> +> between -ccw and -pci that comes to my mind here is that config space +> +> accesses for ccw are done via an asynchronous operation, so timing +> +> might be different. +> +> +Hopefully Jason has an idea. Could you post a full command line +> +please? Do you need a working guest to trigger this? Does this trigger +> +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 + +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) + +> +> +> > > +> +> > > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only +> +> > > the autogenerated virtio-net-ccw device is present) works. Specifying +> +> > > several "-device virtio-net-pci" works as well. +> +> > > +> +> > > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net +> +> > > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") +> +> > > works (in-between state does not compile). +> +> > +> +> > Ouch. I didn't test all in-between states :( +> +> > But I wish we had a 0-day instrastructure like kernel has, +> +> > that catches things like that. +> +> +> +> Yep, that would be useful... so patchew only builds the complete series? +> +> +> +> > +> +> > > This is reproducible with tcg as well. Same problem both with +> +> > > --enable-vhost-vdpa and --disable-vhost-vdpa. +> +> > > +> +> > > Have not yet tried to figure out what might be special with +> +> > > virtio-ccw... anyone have an idea? +> +> > > +> +> > > [This should probably be considered a blocker?] +> +> +> +> I think so, as it makes s390x unusable with more that one +> +> virtio-net-ccw device, and I don't even see a workaround. +> + +On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. 
Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, + config=0x55d6ad9e3f80 "RT") at +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. +Hopefully Jason has an idea. Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) +It looks to me we forget the check the existence of peer. + +Please try the attached patch to see if it works. + +Thanks +0001-virtio-net-check-the-existence-of-peer-before-accesi.patch +Description: +Text Data + +On Sat, 25 Jul 2020 08:40:07 +0800 +Jason Wang <jasowang@redhat.com> wrote: + +> +On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +> +> On Fri, 24 Jul 2020 11:17:57 -0400 +> +> "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +> +> +>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +>>> On Fri, 24 Jul 2020 09:30:58 -0400 +> +>>> "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +>>> +> +>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding +> +>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get +> +>>>>> a segfault. gdb points to +> +>>>>> +> +>>>>> #0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, +> +>>>>> config=0x55d6ad9e3f80 "RT") at +> +>>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +>>>>> 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +>>>>> +> +>>>>> (backtrace doesn't go further) +> +>>> The core was incomplete, but running under gdb directly shows that it +> +>>> is just a bog-standard config space access (first for that device). +> +>>> +> +>>> The cause of the crash is that nc->peer is not set... 
no idea how that +> +>>> can happen, not that familiar with that part of QEMU. (Should the code +> +>>> check, or is that really something that should not happen?) +> +>>> +> +>>> What I don't understand is why it is set correctly for the first, +> +>>> autogenerated virtio-net-ccw device, but not for the second one, and +> +>>> why virtio-net-pci doesn't show these problems. The only difference +> +>>> between -ccw and -pci that comes to my mind here is that config space +> +>>> accesses for ccw are done via an asynchronous operation, so timing +> +>>> might be different. +> +>> Hopefully Jason has an idea. Could you post a full command line +> +>> please? Do you need a working guest to trigger this? Does this trigger +> +>> on an x86 host? +> +> Yes, it does trigger with tcg-on-x86 as well. I've been using +> +> +> +> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu +> +> qemu,zpci=on +> +> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +> +> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +> +> -device +> +> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +> +> -device virtio-net-ccw +> +> +> +> It seems it needs the guest actually doing something with the nics; I +> +> cannot reproduce the crash if I use the old advent calendar moon buggy +> +> image and just add a virtio-net-ccw device. +> +> +> +> (I don't think it's a problem with my local build, as I see the problem +> +> both on my laptop and on an LPAR.) +> +> +> +It looks to me we forget the check the existence of peer. +> +> +Please try the attached patch to see if it works. +Thanks, that patch gets my guest up and running again. So, FWIW, + +Tested-by: Cornelia Huck <cohuck@redhat.com> + +Any idea why this did not hit with virtio-net-pci (or the autogenerated +virtio-net-ccw device)? + +On 2020/7/27 ä¸å2:43, Cornelia Huck wrote: +On Sat, 25 Jul 2020 08:40:07 +0800 +Jason Wang <jasowang@redhat.com> wrote: +On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, + config=0x55d6ad9e3f80 "RT") at +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. +Hopefully Jason has an idea. Could you post a full command line +please? 
Do you need a working guest to trigger this? Does this trigger +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) +It looks to me we forget the check the existence of peer. + +Please try the attached patch to see if it works. +Thanks, that patch gets my guest up and running again. So, FWIW, + +Tested-by: Cornelia Huck <cohuck@redhat.com> + +Any idea why this did not hit with virtio-net-pci (or the autogenerated +virtio-net-ccw device)? +It can be hit with virtio-net-pci as well (just start without peer). +For autogenerated virtio-net-cww, I think the reason is that it has +already had a peer set. +Thanks + +On Mon, 27 Jul 2020 15:38:12 +0800 +Jason Wang <jasowang@redhat.com> wrote: + +> +On 2020/7/27 ä¸å2:43, Cornelia Huck wrote: +> +> On Sat, 25 Jul 2020 08:40:07 +0800 +> +> Jason Wang <jasowang@redhat.com> wrote: +> +> +> +>> On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +> +>>> On Fri, 24 Jul 2020 11:17:57 -0400 +> +>>> "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +>>> +> +>>>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +>>>>> On Fri, 24 Jul 2020 09:30:58 -0400 +> +>>>>> "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +>>>>> +> +>>>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +>>>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding +> +>>>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get +> +>>>>>>> a segfault. gdb points to +> +>>>>>>> +> +>>>>>>> #0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, +> +>>>>>>> config=0x55d6ad9e3f80 "RT") at +> +>>>>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +>>>>>>> 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +>>>>>>> +> +>>>>>>> (backtrace doesn't go further) +> +>>>>> The core was incomplete, but running under gdb directly shows that it +> +>>>>> is just a bog-standard config space access (first for that device). +> +>>>>> +> +>>>>> The cause of the crash is that nc->peer is not set... no idea how that +> +>>>>> can happen, not that familiar with that part of QEMU. (Should the code +> +>>>>> check, or is that really something that should not happen?) +> +>>>>> +> +>>>>> What I don't understand is why it is set correctly for the first, +> +>>>>> autogenerated virtio-net-ccw device, but not for the second one, and +> +>>>>> why virtio-net-pci doesn't show these problems. The only difference +> +>>>>> between -ccw and -pci that comes to my mind here is that config space +> +>>>>> accesses for ccw are done via an asynchronous operation, so timing +> +>>>>> might be different. +> +>>>> Hopefully Jason has an idea. Could you post a full command line +> +>>>> please? Do you need a working guest to trigger this? Does this trigger +> +>>>> on an x86 host? +> +>>> Yes, it does trigger with tcg-on-x86 as well. 
I've been using +> +>>> +> +>>> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu +> +>>> qemu,zpci=on +> +>>> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +> +>>> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +> +>>> -device +> +>>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +> +>>> -device virtio-net-ccw +> +>>> +> +>>> It seems it needs the guest actually doing something with the nics; I +> +>>> cannot reproduce the crash if I use the old advent calendar moon buggy +> +>>> image and just add a virtio-net-ccw device. +> +>>> +> +>>> (I don't think it's a problem with my local build, as I see the problem +> +>>> both on my laptop and on an LPAR.) +> +>> +> +>> It looks to me we forget the check the existence of peer. +> +>> +> +>> Please try the attached patch to see if it works. +> +> Thanks, that patch gets my guest up and running again. So, FWIW, +> +> +> +> Tested-by: Cornelia Huck <cohuck@redhat.com> +> +> +> +> Any idea why this did not hit with virtio-net-pci (or the autogenerated +> +> virtio-net-ccw device)? +> +> +> +It can be hit with virtio-net-pci as well (just start without peer). +Hm, I had not been able to reproduce the crash with a 'naked' -device +virtio-net-pci. But checking seems to be the right idea anyway. + +> +> +For autogenerated virtio-net-cww, I think the reason is that it has +> +already had a peer set. +Ok, that might well be. + +On 2020/7/27 ä¸å4:41, Cornelia Huck wrote: +On Mon, 27 Jul 2020 15:38:12 +0800 +Jason Wang <jasowang@redhat.com> wrote: +On 2020/7/27 ä¸å2:43, Cornelia Huck wrote: +On Sat, 25 Jul 2020 08:40:07 +0800 +Jason Wang <jasowang@redhat.com> wrote: +On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, + config=0x55d6ad9e3f80 "RT") at +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. +Hopefully Jason has an idea. Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. 
I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) +It looks to me we forget the check the existence of peer. + +Please try the attached patch to see if it works. +Thanks, that patch gets my guest up and running again. So, FWIW, + +Tested-by: Cornelia Huck <cohuck@redhat.com> + +Any idea why this did not hit with virtio-net-pci (or the autogenerated +virtio-net-ccw device)? +It can be hit with virtio-net-pci as well (just start without peer). +Hm, I had not been able to reproduce the crash with a 'naked' -device +virtio-net-pci. But checking seems to be the right idea anyway. +Sorry for being unclear, I meant for networking part, you just need +start without peer, and you need a real guest (any Linux) that is trying +to access the config space of virtio-net. +Thanks +For autogenerated virtio-net-cww, I think the reason is that it has +already had a peer set. +Ok, that might well be. + +On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: +> +> +On 2020/7/27 ä¸å4:41, Cornelia Huck wrote: +> +> On Mon, 27 Jul 2020 15:38:12 +0800 +> +> Jason Wang <jasowang@redhat.com> wrote: +> +> +> +> > On 2020/7/27 ä¸å2:43, Cornelia Huck wrote: +> +> > > On Sat, 25 Jul 2020 08:40:07 +0800 +> +> > > Jason Wang <jasowang@redhat.com> wrote: +> +> > > > On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +> +> > > > > On Fri, 24 Jul 2020 11:17:57 -0400 +> +> > > > > "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +> > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +> > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400 +> +> > > > > > > "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +> > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +> > > > > > > > > When I start qemu with a second virtio-net-ccw device (i.e. +> +> > > > > > > > > adding +> +> > > > > > > > > -device virtio-net-ccw in addition to the autogenerated +> +> > > > > > > > > device), I get +> +> > > > > > > > > a segfault. gdb points to +> +> > > > > > > > > +> +> > > > > > > > > #0 0x000055d6ab52681d in virtio_net_get_config +> +> > > > > > > > > (vdev=<optimized out>, +> +> > > > > > > > > config=0x55d6ad9e3f80 "RT") at +> +> > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +> > > > > > > > > 146 if (nc->peer->info->type == +> +> > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> > > > > > > > > +> +> > > > > > > > > (backtrace doesn't go further) +> +> > > > > > > The core was incomplete, but running under gdb directly shows +> +> > > > > > > that it +> +> > > > > > > is just a bog-standard config space access (first for that +> +> > > > > > > device). +> +> > > > > > > +> +> > > > > > > The cause of the crash is that nc->peer is not set... no idea +> +> > > > > > > how that +> +> > > > > > > can happen, not that familiar with that part of QEMU. (Should +> +> > > > > > > the code +> +> > > > > > > check, or is that really something that should not happen?) 
+> +> > > > > > > +> +> > > > > > > What I don't understand is why it is set correctly for the +> +> > > > > > > first, +> +> > > > > > > autogenerated virtio-net-ccw device, but not for the second +> +> > > > > > > one, and +> +> > > > > > > why virtio-net-pci doesn't show these problems. The only +> +> > > > > > > difference +> +> > > > > > > between -ccw and -pci that comes to my mind here is that config +> +> > > > > > > space +> +> > > > > > > accesses for ccw are done via an asynchronous operation, so +> +> > > > > > > timing +> +> > > > > > > might be different. +> +> > > > > > Hopefully Jason has an idea. Could you post a full command line +> +> > > > > > please? Do you need a working guest to trigger this? Does this +> +> > > > > > trigger +> +> > > > > > on an x86 host? +> +> > > > > Yes, it does trigger with tcg-on-x86 as well. I've been using +> +> > > > > +> +> > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu +> +> > > > > qemu,zpci=on +> +> > > > > -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +> +> > > > > -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +> +> > > > > -device +> +> > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +> +> > > > > -device virtio-net-ccw +> +> > > > > +> +> > > > > It seems it needs the guest actually doing something with the nics; +> +> > > > > I +> +> > > > > cannot reproduce the crash if I use the old advent calendar moon +> +> > > > > buggy +> +> > > > > image and just add a virtio-net-ccw device. +> +> > > > > +> +> > > > > (I don't think it's a problem with my local build, as I see the +> +> > > > > problem +> +> > > > > both on my laptop and on an LPAR.) +> +> > > > It looks to me we forget the check the existence of peer. +> +> > > > +> +> > > > Please try the attached patch to see if it works. +> +> > > Thanks, that patch gets my guest up and running again. So, FWIW, +> +> > > +> +> > > Tested-by: Cornelia Huck <cohuck@redhat.com> +> +> > > +> +> > > Any idea why this did not hit with virtio-net-pci (or the autogenerated +> +> > > virtio-net-ccw device)? +> +> > +> +> > It can be hit with virtio-net-pci as well (just start without peer). +> +> Hm, I had not been able to reproduce the crash with a 'naked' -device +> +> virtio-net-pci. But checking seems to be the right idea anyway. +> +> +> +Sorry for being unclear, I meant for networking part, you just need start +> +without peer, and you need a real guest (any Linux) that is trying to access +> +the config space of virtio-net. +> +> +Thanks +A pxe guest will do it, but that doesn't support ccw, right? + +I'm still unclear why this triggers with ccw but not pci - +any idea? + +> +> +> +> +> > For autogenerated virtio-net-cww, I think the reason is that it has +> +> > already had a peer set. +> +> Ok, that might well be. +> +> +> +> + +On 2020/7/27 ä¸å7:43, Michael S. Tsirkin wrote: +On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: +On 2020/7/27 ä¸å4:41, Cornelia Huck wrote: +On Mon, 27 Jul 2020 15:38:12 +0800 +Jason Wang<jasowang@redhat.com> wrote: +On 2020/7/27 ä¸å2:43, Cornelia Huck wrote: +On Sat, 25 Jul 2020 08:40:07 +0800 +Jason Wang<jasowang@redhat.com> wrote: +On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. 
Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, + config=0x55d6ad9e3f80 "RT") at +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. +Hopefully Jason has an idea. Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) +It looks to me we forget the check the existence of peer. + +Please try the attached patch to see if it works. +Thanks, that patch gets my guest up and running again. So, FWIW, + +Tested-by: Cornelia Huck<cohuck@redhat.com> + +Any idea why this did not hit with virtio-net-pci (or the autogenerated +virtio-net-ccw device)? +It can be hit with virtio-net-pci as well (just start without peer). +Hm, I had not been able to reproduce the crash with a 'naked' -device +virtio-net-pci. But checking seems to be the right idea anyway. +Sorry for being unclear, I meant for networking part, you just need start +without peer, and you need a real guest (any Linux) that is trying to access +the config space of virtio-net. + +Thanks +A pxe guest will do it, but that doesn't support ccw, right? +Yes, it depends on the cli actually. +I'm still unclear why this triggers with ccw but not pci - +any idea? +I don't test pxe but I can reproduce this with pci (just start a linux +guest without a peer). +Thanks + +On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote: +> +> +On 2020/7/27 ä¸å7:43, Michael S. 
Tsirkin wrote: +> +> On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: +> +> > On 2020/7/27 ä¸å4:41, Cornelia Huck wrote: +> +> > > On Mon, 27 Jul 2020 15:38:12 +0800 +> +> > > Jason Wang<jasowang@redhat.com> wrote: +> +> > > +> +> > > > On 2020/7/27 ä¸å2:43, Cornelia Huck wrote: +> +> > > > > On Sat, 25 Jul 2020 08:40:07 +0800 +> +> > > > > Jason Wang<jasowang@redhat.com> wrote: +> +> > > > > > On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +> +> > > > > > > On Fri, 24 Jul 2020 11:17:57 -0400 +> +> > > > > > > "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +> > > > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +> > > > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400 +> +> > > > > > > > > "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +> > > > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck +> +> > > > > > > > > > wrote: +> +> > > > > > > > > > > When I start qemu with a second virtio-net-ccw device +> +> > > > > > > > > > > (i.e. adding +> +> > > > > > > > > > > -device virtio-net-ccw in addition to the autogenerated +> +> > > > > > > > > > > device), I get +> +> > > > > > > > > > > a segfault. gdb points to +> +> > > > > > > > > > > +> +> > > > > > > > > > > #0 0x000055d6ab52681d in virtio_net_get_config +> +> > > > > > > > > > > (vdev=<optimized out>, +> +> > > > > > > > > > > config=0x55d6ad9e3f80 "RT") at +> +> > > > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +> > > > > > > > > > > 146 if (nc->peer->info->type == +> +> > > > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> > > > > > > > > > > +> +> > > > > > > > > > > (backtrace doesn't go further) +> +> > > > > > > > > The core was incomplete, but running under gdb directly +> +> > > > > > > > > shows that it +> +> > > > > > > > > is just a bog-standard config space access (first for that +> +> > > > > > > > > device). +> +> > > > > > > > > +> +> > > > > > > > > The cause of the crash is that nc->peer is not set... no +> +> > > > > > > > > idea how that +> +> > > > > > > > > can happen, not that familiar with that part of QEMU. +> +> > > > > > > > > (Should the code +> +> > > > > > > > > check, or is that really something that should not happen?) +> +> > > > > > > > > +> +> > > > > > > > > What I don't understand is why it is set correctly for the +> +> > > > > > > > > first, +> +> > > > > > > > > autogenerated virtio-net-ccw device, but not for the second +> +> > > > > > > > > one, and +> +> > > > > > > > > why virtio-net-pci doesn't show these problems. The only +> +> > > > > > > > > difference +> +> > > > > > > > > between -ccw and -pci that comes to my mind here is that +> +> > > > > > > > > config space +> +> > > > > > > > > accesses for ccw are done via an asynchronous operation, so +> +> > > > > > > > > timing +> +> > > > > > > > > might be different. +> +> > > > > > > > Hopefully Jason has an idea. Could you post a full command +> +> > > > > > > > line +> +> > > > > > > > please? Do you need a working guest to trigger this? Does +> +> > > > > > > > this trigger +> +> > > > > > > > on an x86 host? +> +> > > > > > > Yes, it does trigger with tcg-on-x86 as well. 
I've been using +> +> > > > > > > +> +> > > > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg +> +> > > > > > > -cpu qemu,zpci=on +> +> > > > > > > -m 1024 -nographic -device +> +> > > > > > > virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +> +> > > > > > > -drive +> +> > > > > > > file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +> +> > > > > > > -device +> +> > > > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +> +> > > > > > > -device virtio-net-ccw +> +> > > > > > > +> +> > > > > > > It seems it needs the guest actually doing something with the +> +> > > > > > > nics; I +> +> > > > > > > cannot reproduce the crash if I use the old advent calendar +> +> > > > > > > moon buggy +> +> > > > > > > image and just add a virtio-net-ccw device. +> +> > > > > > > +> +> > > > > > > (I don't think it's a problem with my local build, as I see the +> +> > > > > > > problem +> +> > > > > > > both on my laptop and on an LPAR.) +> +> > > > > > It looks to me we forget the check the existence of peer. +> +> > > > > > +> +> > > > > > Please try the attached patch to see if it works. +> +> > > > > Thanks, that patch gets my guest up and running again. So, FWIW, +> +> > > > > +> +> > > > > Tested-by: Cornelia Huck<cohuck@redhat.com> +> +> > > > > +> +> > > > > Any idea why this did not hit with virtio-net-pci (or the +> +> > > > > autogenerated +> +> > > > > virtio-net-ccw device)? +> +> > > > It can be hit with virtio-net-pci as well (just start without peer). +> +> > > Hm, I had not been able to reproduce the crash with a 'naked' -device +> +> > > virtio-net-pci. But checking seems to be the right idea anyway. +> +> > Sorry for being unclear, I meant for networking part, you just need start +> +> > without peer, and you need a real guest (any Linux) that is trying to +> +> > access +> +> > the config space of virtio-net. +> +> > +> +> > Thanks +> +> A pxe guest will do it, but that doesn't support ccw, right? +> +> +> +Yes, it depends on the cli actually. +> +> +> +> +> +> I'm still unclear why this triggers with ccw but not pci - +> +> any idea? +> +> +> +I don't test pxe but I can reproduce this with pci (just start a linux guest +> +without a peer). +> +> +Thanks +> +Might be a good addition to a unit test. Not sure what would the +test do exactly: just make sure guest runs? Looks like a lot of work +for an empty test ... maybe we can poke at the guest config with +qtest commands at least. + +-- +MST + +On 2020/7/27 ä¸å9:16, Michael S. Tsirkin wrote: +On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote: +On 2020/7/27 ä¸å7:43, Michael S. Tsirkin wrote: +On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: +On 2020/7/27 ä¸å4:41, Cornelia Huck wrote: +On Mon, 27 Jul 2020 15:38:12 +0800 +Jason Wang<jasowang@redhat.com> wrote: +On 2020/7/27 ä¸å2:43, Cornelia Huck wrote: +On Sat, 25 Jul 2020 08:40:07 +0800 +Jason Wang<jasowang@redhat.com> wrote: +On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. 
gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, + config=0x55d6ad9e3f80 "RT") at +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. +Hopefully Jason has an idea. Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) +It looks to me we forget the check the existence of peer. + +Please try the attached patch to see if it works. +Thanks, that patch gets my guest up and running again. So, FWIW, + +Tested-by: Cornelia Huck<cohuck@redhat.com> + +Any idea why this did not hit with virtio-net-pci (or the autogenerated +virtio-net-ccw device)? +It can be hit with virtio-net-pci as well (just start without peer). +Hm, I had not been able to reproduce the crash with a 'naked' -device +virtio-net-pci. But checking seems to be the right idea anyway. +Sorry for being unclear, I meant for networking part, you just need start +without peer, and you need a real guest (any Linux) that is trying to access +the config space of virtio-net. + +Thanks +A pxe guest will do it, but that doesn't support ccw, right? +Yes, it depends on the cli actually. +I'm still unclear why this triggers with ccw but not pci - +any idea? +I don't test pxe but I can reproduce this with pci (just start a linux guest +without a peer). + +Thanks +Might be a good addition to a unit test. Not sure what would the +test do exactly: just make sure guest runs? Looks like a lot of work +for an empty test ... maybe we can poke at the guest config with +qtest commands at least. +That should work or we can simply extend the exist virtio-net qtest to +do that. 
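+For reference, the shape of the peer check discussed in this thread (a standalone
+sketch of the pattern, not the attached patch itself) is simply to test nc->peer
+before dereferencing it. The stand-in types below only model the fields involved
+in the crash at hw/net/virtio-net.c:146 and are not the QEMU definitions:
+
+#include <stdio.h>
+
+/* Stand-in types, illustration only -- not the QEMU definitions. */
+typedef struct NetClientInfo { int type; } NetClientInfo;
+typedef struct NetClientState {
+    struct NetClientState *peer;   /* NULL when no backend is attached */
+    NetClientInfo *info;
+} NetClientState;
+
+enum { NET_CLIENT_DRIVER_VHOST_VDPA = 1 };
+
+static int peer_is_vhost_vdpa(const NetClientState *nc)
+{
+    /* the guard under discussion: check the peer exists before using it */
+    return nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA;
+}
+
+int main(void)
+{
+    NetClientState no_peer = { .peer = NULL, .info = NULL };
+    printf("%d\n", peer_is_vhost_vdpa(&no_peer));   /* prints 0 where the unguarded access faults */
+    return 0;
+}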
+Thanks + diff --git a/classification_output/01/mistranslation/4412535 b/classification_output/01/mistranslation/4412535 new file mode 100644 index 000000000..97712c2f5 --- /dev/null +++ b/classification_output/01/mistranslation/4412535 @@ -0,0 +1,348 @@ +mistranslation: 0.800 +other: 0.786 +instruction: 0.751 +semantic: 0.737 + +[BUG] accel/tcg: cpu_exec_longjmp_cleanup: assertion failed: (cpu == current_cpu) + +It seems there is a bug in SIGALRM handling when 486 system emulates x86_64 +code. + +This code: + +#include <stdio.h> +#include <stdlib.h> +#include <pthread.h> +#include <signal.h> +#include <unistd.h> + +pthread_t thread1, thread2; + +// Signal handler for SIGALRM +void alarm_handler(int sig) { + // Do nothing, just wake up the other thread +} + +// Thread 1 function +void* thread1_func(void* arg) { + // Set up the signal handler for SIGALRM + signal(SIGALRM, alarm_handler); + + // Wait for 5 seconds + sleep(1); + + // Send SIGALRM signal to thread 2 + pthread_kill(thread2, SIGALRM); + + return NULL; +} + +// Thread 2 function +void* thread2_func(void* arg) { + // Wait for the SIGALRM signal + pause(); + + printf("Thread 2 woke up!\n"); + + return NULL; +} + +int main() { + // Create thread 1 + if (pthread_create(&thread1, NULL, thread1_func, NULL) != 0) { + fprintf(stderr, "Failed to create thread 1\n"); + return 1; + } + + // Create thread 2 + if (pthread_create(&thread2, NULL, thread2_func, NULL) != 0) { + fprintf(stderr, "Failed to create thread 2\n"); + return 1; + } + + // Wait for both threads to finish + pthread_join(thread1, NULL); + pthread_join(thread2, NULL); + + return 0; +} + + +Fails with this -strace log (there are also unsupported syscalls 334 and 435, +but it seems it doesn't affect the code much): + +... +736 rt_sigaction(SIGALRM,0x000000001123ec20,0x000000001123ecc0) = 0 +736 clock_nanosleep(CLOCK_REALTIME,0,{tv_sec = 1,tv_nsec = 0},{tv_sec = +1,tv_nsec = 0}) +736 rt_sigprocmask(SIG_BLOCK,0x00000000109fad20,0x0000000010800b38,8) = 0 +736 Unknown syscall 435 +736 +clone(CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID| + ... +736 rt_sigprocmask(SIG_SETMASK,0x0000000010800b38,NULL,8) +736 set_robust_list(0x11a419a0,0) = -1 errno=38 (Function not implemented) +736 rt_sigprocmask(SIG_SETMASK,0x0000000011a41fb0,NULL,8) = 0 + = 0 +736 pause(0,0,2,277186368,0,295966400) +736 +futex(0x000000001123f990,FUTEX_CLOCK_REALTIME|FUTEX_WAIT_BITSET,738,NULL,NULL,0) + = 0 +736 rt_sigprocmask(SIG_BLOCK,0x00000000109fad20,0x000000001123ee88,8) = 0 +736 getpid() = 736 +736 tgkill(736,739,SIGALRM) = 0 + = -1 errno=4 (Interrupted system call) +--- SIGALRM {si_signo=SIGALRM, si_code=SI_TKILL, si_pid=736, si_uid=0} --- +0x48874a != 0x3c69e10 +736 rt_sigprocmask(SIG_SETMASK,0x000000001123ee88,NULL,8) = 0 +** +ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: +(cpu == current_cpu) +Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion +failed: (cpu == current_cpu) +0x48874a != 0x3c69e10 +** +ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: +(cpu == current_cpu) +Bail out! 
ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion +failed: (cpu == current_cpu) +# + +The code fails either with or without -singlestep, the command line: + +/usr/bin/qemu-x86_64 -L /opt/x86_64 -strace -singlestep /opt/x86_64/alarm.bin + +Source code of QEMU 8.1.1 was modified with patch "[PATCH] qemu/timer: Don't +use RDTSC on i486" [1], +with added few ioctls (not relevant) and cpu_exec_longjmp_cleanup() now prints +current pointers of +cpu and current_cpu (line "0x48874a != 0x3c69e10"). + +config.log (built as a part of buildroot, basically the minimal possible +configuration for running x86_64 on 486): + +# Configured with: +'/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/build/qemu-8.1.1/configure' + '--prefix=/usr' +'--cross-prefix=/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/host/bin/i486-buildroot-linux-gnu-' + '--audio-drv-list=' +'--python=/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/host/bin/python3' + +'--ninja=/mnt/hd_8tb_p1/p1/home/crossgen/buildroot_486_2/output/host/bin/ninja' +'--disable-alsa' '--disable-bpf' '--disable-brlapi' '--disable-bsd-user' +'--disable-cap-ng' '--disable-capstone' '--disable-containers' +'--disable-coreaudio' '--disable-curl' '--disable-curses' +'--disable-dbus-display' '--disable-docs' '--disable-dsound' '--disable-hvf' +'--disable-jack' '--disable-libiscsi' '--disable-linux-aio' +'--disable-linux-io-uring' '--disable-malloc-trim' '--disable-membarrier' +'--disable-mpath' '--disable-netmap' '--disable-opengl' '--disable-oss' +'--disable-pa' '--disable-rbd' '--disable-sanitizers' '--disable-selinux' +'--disable-sparse' '--disable-strip' '--disable-vde' '--disable-vhost-crypto' +'--disable-vhost-user-blk-server' '--disable-virtfs' '--disable-whpx' +'--disable-xen' '--disable-attr' '--disable-kvm' '--disable-vhost-net' +'--disable-download' '--disable-hexagon-idef-parser' '--disable-system' +'--enable-linux-user' '--target-list=x86_64-linux-user' '--disable-vhost-user' +'--disable-slirp' '--disable-sdl' '--disable-fdt' '--enable-trace-backends=nop' +'--disable-tools' '--disable-guest-agent' '--disable-fuse' +'--disable-fuse-lseek' '--disable-seccomp' '--disable-libssh' +'--disable-libusb' '--disable-vnc' '--disable-nettle' '--disable-numa' +'--disable-pipewire' '--disable-spice' '--disable-usb-redir' +'--disable-install-blobs' + +Emulation of the same x86_64 code with qemu 6.2.0 installed on another x86_64 +native machine works fine. + +[1] +https://lists.nongnu.org/archive/html/qemu-devel/2023-11/msg05387.html +Best regards, +Petr + +On Sat, 25 Nov 2023 at 13:09, Petr Cvek <petrcvekcz@gmail.com> wrote: +> +> +It seems there is a bug in SIGALRM handling when 486 system emulates x86_64 +> +code. +486 host is pretty well out of support currently. Can you reproduce +this on a less ancient host CPU type ? + +> +ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: +> +(cpu == current_cpu) +> +Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: +> +assertion failed: (cpu == current_cpu) +> +0x48874a != 0x3c69e10 +> +** +> +ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: +> +(cpu == current_cpu) +> +Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: +> +assertion failed: (cpu == current_cpu) +What compiler version do you build QEMU with? That +assert is there because we have seen some buggy compilers +in the past which don't correctly preserve the variable +value as the setjmp/longjmp spec requires them to. 
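+As a standalone illustration of the C rule Peter is referring to (nothing
+QEMU-specific): a non-volatile automatic variable that is modified between
+setjmp() and the corresponding longjmp() has an indeterminate value afterwards,
+so either the compiler has to do the right thing or the variable must be
+volatile-qualified:
+
+#include <setjmp.h>
+#include <stdio.h>
+
+static jmp_buf env;
+
+static void bail_out(void)
+{
+    longjmp(env, 1);    /* analogous to the longjmp taken on a CPU exit */
+}
+
+int main(void)
+{
+    /* without 'volatile' the value observed after longjmp() is indeterminate */
+    volatile int cpu = 1;
+
+    if (setjmp(env) == 0) {
+        cpu = 2;        /* modified between setjmp() and longjmp() */
+        bail_out();
+    }
+    printf("cpu = %d\n", cpu);   /* guaranteed to print 2 only because of 'volatile' */
+    return 0;
+}
+
+Per Peter's explanation above, the (cpu == current_cpu) assertion exists to catch
+compilers that fail to preserve values the setjmp/longjmp specification says must
+survive the jump.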
+ +thanks +-- PMM + +Dne 27. 11. 23 v 10:37 Peter Maydell napsal(a): +> +On Sat, 25 Nov 2023 at 13:09, Petr Cvek <petrcvekcz@gmail.com> wrote: +> +> +> +> It seems there is a bug in SIGALRM handling when 486 system emulates x86_64 +> +> code. +> +> +486 host is pretty well out of support currently. Can you reproduce +> +this on a less ancient host CPU type ? +> +It seems it only fails when the code is compiled for i486. QEMU built with the +same compiler with -march=i586 and above runs on the same physical hardware +without a problem. All -march= variants were executed on ryzen 3600. + +> +> ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion +> +> failed: (cpu == current_cpu) +> +> Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: +> +> assertion failed: (cpu == current_cpu) +> +> 0x48874a != 0x3c69e10 +> +> ** +> +> ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion +> +> failed: (cpu == current_cpu) +> +> Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: +> +> assertion failed: (cpu == current_cpu) +> +> +What compiler version do you build QEMU with? That +> +assert is there because we have seen some buggy compilers +> +in the past which don't correctly preserve the variable +> +value as the setjmp/longjmp spec requires them to. +> +i486 and i586+ code variants were compiled with GCC 13.2.0 (more exactly, +slackware64 current multilib distribution). + +i486 binary which runs on the real 486 is also GCC 13.2.0 and installed as a +part of the buildroot crosscompiler (about two week old git snapshot). + +> +thanks +> +-- PMM +best regards, +Petr + +On 11/25/23 07:08, Petr Cvek wrote: +ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion failed: +(cpu == current_cpu) +Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion +failed: (cpu == current_cpu) +# + +The code fails either with or without -singlestep, the command line: + +/usr/bin/qemu-x86_64 -L /opt/x86_64 -strace -singlestep /opt/x86_64/alarm.bin + +Source code of QEMU 8.1.1 was modified with patch "[PATCH] qemu/timer: Don't use +RDTSC on i486" [1], +with added few ioctls (not relevant) and cpu_exec_longjmp_cleanup() now prints +current pointers of +cpu and current_cpu (line "0x48874a != 0x3c69e10"). +If you try this again with 8.2-rc2, you should not see an assertion failure. +You should see instead + +QEMU internal SIGILL {code=ILLOPC, addr=0x12345678} +which I think more accurately summarizes the situation of attempting RDTSC on hardware +that does not support it. +r~ + +Dne 29. 11. 23 v 15:25 Richard Henderson napsal(a): +> +On 11/25/23 07:08, Petr Cvek wrote: +> +> ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: assertion +> +> failed: (cpu == current_cpu) +> +> Bail out! ERROR:../accel/tcg/cpu-exec.c:546:cpu_exec_longjmp_cleanup: +> +> assertion failed: (cpu == current_cpu) +> +> # +> +> +> +> The code fails either with or without -singlestep, the command line: +> +> +> +> /usr/bin/qemu-x86_64 -L /opt/x86_64 -strace -singlestep +> +> /opt/x86_64/alarm.bin +> +> +> +> Source code of QEMU 8.1.1 was modified with patch "[PATCH] qemu/timer: Don't +> +> use RDTSC on i486" [1], +> +> with added few ioctls (not relevant) and cpu_exec_longjmp_cleanup() now +> +> prints current pointers of +> +> cpu and current_cpu (line "0x48874a != 0x3c69e10"). +> +> +> +If you try this again with 8.2-rc2, you should not see an assertion failure. 
+> +You should see instead +> +> +QEMU internal SIGILL {code=ILLOPC, addr=0x12345678} +> +> +which I think more accurately summarizes the situation of attempting RDTSC on +> +hardware that does not support it. +> +> +Compilation of vanilla qemu v8.2.0-rc2 with -march=i486 by GCC 13.2.0 and +running the resulting binary on ryzen still leads to: + +** +ERROR:../accel/tcg/cpu-exec.c:533:cpu_exec_longjmp_cleanup: assertion failed: +(cpu == current_cpu) +Bail out! ERROR:../accel/tcg/cpu-exec.c:533:cpu_exec_longjmp_cleanup: assertion +failed: (cpu == current_cpu) +Aborted + +> +> +r~ +Petr + diff --git a/classification_output/01/mistranslation/5373318 b/classification_output/01/mistranslation/5373318 new file mode 100644 index 000000000..e4d4789c4 --- /dev/null +++ b/classification_output/01/mistranslation/5373318 @@ -0,0 +1,692 @@ +mistranslation: 0.881 +other: 0.839 +instruction: 0.755 +semantic: 0.752 + +[Qemu-devel] [BUG?] aio_get_linux_aio: Assertion `ctx->linux_aio' failed + +Hi, + +I am seeing some strange QEMU assertion failures for qemu on s390x, +which prevents a guest from starting. + +Git bisecting points to the following commit as the source of the error. + +commit ed6e2161715c527330f936d44af4c547f25f687e +Author: Nishanth Aravamudan <address@hidden> +Date: Fri Jun 22 12:37:00 2018 -0700 + + linux-aio: properly bubble up errors from initialization + + laio_init() can fail for a couple of reasons, which will lead to a NULL + pointer dereference in laio_attach_aio_context(). + + To solve this, add a aio_setup_linux_aio() function which is called + early in raw_open_common. If this fails, propagate the error up. The + signature of aio_get_linux_aio() was not modified, because it seems + preferable to return the actual errno from the possible failing + initialization calls. + + Additionally, when the AioContext changes, we need to associate a + LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context + callback and call the new aio_setup_linux_aio(), which will allocate a +new AioContext if needed, and return errors on failures. If it +fails for +any reason, fallback to threaded AIO with an error message, as the + device is already in-use by the guest. + + Add an assert that aio_get_linux_aio() cannot return NULL. + + Signed-off-by: Nishanth Aravamudan <address@hidden> + Message-id: address@hidden + Signed-off-by: Stefan Hajnoczi <address@hidden> +Not sure what is causing this assertion to fail. 
Here is the qemu +command line of the guest, from qemu log, which throws this error: +LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin +QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-s390x -name +guest=rt_vm1,debug-threads=on -S -object +secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-rt_vm1/master-key.aes +-machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off -m +1024 -realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object +iothread,id=iothread1 -uuid 0cde16cd-091d-41bd-9ac2-5243df5c9a0d +-display none -no-user-config -nodefaults -chardev +socket,id=charmonitor,fd=28,server,nowait -mon +chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown +-boot strict=on -drive +file=/dev/mapper/360050763998b0883980000002a000031,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native +-device +virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on +-netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device +virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:3a:c8:67:95:84,devno=fe.0.0000 +-netdev tap,fd=32,id=hostnet1,vhost=on,vhostfd=33 -device +virtio-net-ccw,netdev=hostnet1,id=net1,mac=52:54:00:2a:e5:08,devno=fe.0.0002 +-chardev pty,id=charconsole0 -device +sclpconsole,chardev=charconsole0,id=console0 -device +virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -sandbox +on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny +-msg timestamp=on +2018-07-17 15:48:42.252+0000: Domain id=21 is tainted: high-privileges +2018-07-17T15:48:42.279380Z qemu-system-s390x: -chardev +pty,id=charconsole0: char device redirected to /dev/pts/3 (label +charconsole0) +qemu-system-s390x: util/async.c:339: aio_get_linux_aio: Assertion +`ctx->linux_aio' failed. +2018-07-17 15:48:43.309+0000: shutting down, reason=failed + + +Any help debugging this would be greatly appreciated. + +Thank you +Farhan + +On 17.07.2018 [13:25:53 -0400], Farhan Ali wrote: +> +Hi, +> +> +I am seeing some strange QEMU assertion failures for qemu on s390x, +> +which prevents a guest from starting. +> +> +Git bisecting points to the following commit as the source of the error. +> +> +commit ed6e2161715c527330f936d44af4c547f25f687e +> +Author: Nishanth Aravamudan <address@hidden> +> +Date: Fri Jun 22 12:37:00 2018 -0700 +> +> +linux-aio: properly bubble up errors from initialization +> +> +laio_init() can fail for a couple of reasons, which will lead to a NULL +> +pointer dereference in laio_attach_aio_context(). +> +> +To solve this, add a aio_setup_linux_aio() function which is called +> +early in raw_open_common. If this fails, propagate the error up. The +> +signature of aio_get_linux_aio() was not modified, because it seems +> +preferable to return the actual errno from the possible failing +> +initialization calls. +> +> +Additionally, when the AioContext changes, we need to associate a +> +LinuxAioState with the new AioContext. Use the bdrv_attach_aio_context +> +callback and call the new aio_setup_linux_aio(), which will allocate a +> +new AioContext if needed, and return errors on failures. If it fails for +> +any reason, fallback to threaded AIO with an error message, as the +> +device is already in-use by the guest. +> +> +Add an assert that aio_get_linux_aio() cannot return NULL. +> +> +Signed-off-by: Nishanth Aravamudan <address@hidden> +> +Message-id: address@hidden +> +Signed-off-by: Stefan Hajnoczi <address@hidden> +> +> +> +Not sure what is causing this assertion to fail. 
Here is the qemu command +> +line of the guest, from qemu log, which throws this error: +> +> +> +LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin +> +QEMU_AUDIO_DRV=none /usr/local/bin/qemu-system-s390x -name +> +guest=rt_vm1,debug-threads=on -S -object +> +secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-21-rt_vm1/master-key.aes +> +-machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off -m 1024 +> +-realtime mlock=off -smp 4,sockets=4,cores=1,threads=1 -object +> +iothread,id=iothread1 -uuid 0cde16cd-091d-41bd-9ac2-5243df5c9a0d -display +> +none -no-user-config -nodefaults -chardev +> +socket,id=charmonitor,fd=28,server,nowait -mon +> +chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot +> +strict=on -drive +> +file=/dev/mapper/360050763998b0883980000002a000031,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native +> +-device +> +virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1,write-cache=on +> +-netdev tap,fd=30,id=hostnet0,vhost=on,vhostfd=31 -device +> +virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:3a:c8:67:95:84,devno=fe.0.0000 +> +-netdev tap,fd=32,id=hostnet1,vhost=on,vhostfd=33 -device +> +virtio-net-ccw,netdev=hostnet1,id=net1,mac=52:54:00:2a:e5:08,devno=fe.0.0002 +> +-chardev pty,id=charconsole0 -device +> +sclpconsole,chardev=charconsole0,id=console0 -device +> +virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -sandbox +> +on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg +> +timestamp=on +> +> +> +> +2018-07-17 15:48:42.252+0000: Domain id=21 is tainted: high-privileges +> +2018-07-17T15:48:42.279380Z qemu-system-s390x: -chardev pty,id=charconsole0: +> +char device redirected to /dev/pts/3 (label charconsole0) +> +qemu-system-s390x: util/async.c:339: aio_get_linux_aio: Assertion +> +`ctx->linux_aio' failed. +> +2018-07-17 15:48:43.309+0000: shutting down, reason=failed +> +> +> +Any help debugging this would be greatly appreciated. +iiuc, this possibly implies AIO was not actually used previously on this +guest (it might have silently been falling back to threaded IO?). I +don't have access to s390x, but would it be possible to run qemu under +gdb and see if aio_setup_linux_aio is being called at all (I think it +might not be, but I'm not sure why), and if so, if it's for the context +in question? + +If it's not being called first, could you see what callpath is calling +aio_get_linux_aio when this assertion trips? + +Thanks! +-Nish + +On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: +iiuc, this possibly implies AIO was not actually used previously on this +guest (it might have silently been falling back to threaded IO?). I +don't have access to s390x, but would it be possible to run qemu under +gdb and see if aio_setup_linux_aio is being called at all (I think it +might not be, but I'm not sure why), and if so, if it's for the context +in question? + +If it's not being called first, could you see what callpath is calling +aio_get_linux_aio when this assertion trips? + +Thanks! 
+-Nish +Hi Nishant, +From the coredump of the guest this is the call trace that calls +aio_get_linux_aio: +Stack trace of thread 145158: +#0 0x000003ff94dbe274 raise (libc.so.6) +#1 0x000003ff94da39a8 abort (libc.so.6) +#2 0x000003ff94db62ce __assert_fail_base (libc.so.6) +#3 0x000003ff94db634c __assert_fail (libc.so.6) +#4 0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) +#5 0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) +#6 0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) +#7 0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) +#8 0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) +#9 0x000002aa20db3c34 aio_poll (qemu-system-s390x) +#10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) +#11 0x000003ff94f879a8 start_thread (libpthread.so.0) +#12 0x000003ff94e797ee thread_start (libc.so.6) + + +Thanks for taking a look and responding. + +Thanks +Farhan + +On 07/18/2018 09:42 AM, Farhan Ali wrote: +On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: +iiuc, this possibly implies AIO was not actually used previously on this +guest (it might have silently been falling back to threaded IO?). I +don't have access to s390x, but would it be possible to run qemu under +gdb and see if aio_setup_linux_aio is being called at all (I think it +might not be, but I'm not sure why), and if so, if it's for the context +in question? + +If it's not being called first, could you see what callpath is calling +aio_get_linux_aio when this assertion trips? + +Thanks! +-Nish +Hi Nishant, +From the coredump of the guest this is the call trace that calls +aio_get_linux_aio: +Stack trace of thread 145158: +#0 0x000003ff94dbe274 raise (libc.so.6) +#1 0x000003ff94da39a8 abort (libc.so.6) +#2 0x000003ff94db62ce __assert_fail_base (libc.so.6) +#3 0x000003ff94db634c __assert_fail (libc.so.6) +#4 0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) +#5 0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) +#6 0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) +#7 0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) +#8 0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) +#9 0x000002aa20db3c34 aio_poll (qemu-system-s390x) +#10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) +#11 0x000003ff94f879a8 start_thread (libpthread.so.0) +#12 0x000003ff94e797ee thread_start (libc.so.6) + + +Thanks for taking a look and responding. + +Thanks +Farhan +Trying to debug a little further, the block device in this case is a +"host device". And looking at your commit carefully you use the +bdrv_attach_aio_context callback to setup a Linux AioContext. +For some reason the "host device" struct (BlockDriver bdrv_host_device +in block/file-posix.c) does not have a bdrv_attach_aio_context defined. +So a simple change of adding the callback to the struct solves the issue +and the guest starts fine. +diff --git a/block/file-posix.c b/block/file-posix.c +index 28824aa..b8d59fb 100644 +--- a/block/file-posix.c ++++ b/block/file-posix.c +@@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = { + .bdrv_refresh_limits = raw_refresh_limits, + .bdrv_io_plug = raw_aio_plug, + .bdrv_io_unplug = raw_aio_unplug, ++ .bdrv_attach_aio_context = raw_aio_attach_aio_context, + + .bdrv_co_truncate = raw_co_truncate, + .bdrv_getlength = raw_getlength, +I am not too familiar with block device code in QEMU, so not sure if +this is the right fix or if there are some underlying problems. 
+Thanks +Farhan + +On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote: +> +> +> +On 07/18/2018 09:42 AM, Farhan Ali wrote: +> +> +> +> +> +> On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: +> +> > iiuc, this possibly implies AIO was not actually used previously on this +> +> > guest (it might have silently been falling back to threaded IO?). I +> +> > don't have access to s390x, but would it be possible to run qemu under +> +> > gdb and see if aio_setup_linux_aio is being called at all (I think it +> +> > might not be, but I'm not sure why), and if so, if it's for the context +> +> > in question? +> +> > +> +> > If it's not being called first, could you see what callpath is calling +> +> > aio_get_linux_aio when this assertion trips? +> +> > +> +> > Thanks! +> +> > -Nish +> +> +> +> +> +> Hi Nishant, +> +> +> +> From the coredump of the guest this is the call trace that calls +> +> aio_get_linux_aio: +> +> +> +> +> +> Stack trace of thread 145158: +> +> #0 0x000003ff94dbe274 raise (libc.so.6) +> +> #1 0x000003ff94da39a8 abort (libc.so.6) +> +> #2 0x000003ff94db62ce __assert_fail_base (libc.so.6) +> +> #3 0x000003ff94db634c __assert_fail (libc.so.6) +> +> #4 0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) +> +> #5 0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) +> +> #6 0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) +> +> #7 0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) +> +> #8 0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) +> +> #9 0x000002aa20db3c34 aio_poll (qemu-system-s390x) +> +> #10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) +> +> #11 0x000003ff94f879a8 start_thread (libpthread.so.0) +> +> #12 0x000003ff94e797ee thread_start (libc.so.6) +> +> +> +> +> +> Thanks for taking a look and responding. +> +> +> +> Thanks +> +> Farhan +> +> +> +> +> +> +> +> +Trying to debug a little further, the block device in this case is a "host +> +device". And looking at your commit carefully you use the +> +bdrv_attach_aio_context callback to setup a Linux AioContext. +> +> +For some reason the "host device" struct (BlockDriver bdrv_host_device in +> +block/file-posix.c) does not have a bdrv_attach_aio_context defined. +> +So a simple change of adding the callback to the struct solves the issue and +> +the guest starts fine. +> +> +> +diff --git a/block/file-posix.c b/block/file-posix.c +> +index 28824aa..b8d59fb 100644 +> +--- a/block/file-posix.c +> ++++ b/block/file-posix.c +> +@@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = { +> +.bdrv_refresh_limits = raw_refresh_limits, +> +.bdrv_io_plug = raw_aio_plug, +> +.bdrv_io_unplug = raw_aio_unplug, +> ++ .bdrv_attach_aio_context = raw_aio_attach_aio_context, +> +> +.bdrv_co_truncate = raw_co_truncate, +> +.bdrv_getlength = raw_getlength, +> +> +> +> +I am not too familiar with block device code in QEMU, so not sure if +> +this is the right fix or if there are some underlying problems. +Oh this is quite embarassing! I only added the bdrv_attach_aio_context +callback for the file-backed device. Your fix is definitely corect for +host device. Let me make sure there weren't any others missed and I will +send out a properly formatted patch. Thank you for the quick testing and +turnaround! 
+ +-Nish + +On 07/18/2018 08:52 PM, Nishanth Aravamudan wrote: +> +On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote: +> +> +> +> +> +> On 07/18/2018 09:42 AM, Farhan Ali wrote: +> +>> +> +>> +> +>> On 07/17/2018 04:52 PM, Nishanth Aravamudan wrote: +> +>>> iiuc, this possibly implies AIO was not actually used previously on this +> +>>> guest (it might have silently been falling back to threaded IO?). I +> +>>> don't have access to s390x, but would it be possible to run qemu under +> +>>> gdb and see if aio_setup_linux_aio is being called at all (I think it +> +>>> might not be, but I'm not sure why), and if so, if it's for the context +> +>>> in question? +> +>>> +> +>>> If it's not being called first, could you see what callpath is calling +> +>>> aio_get_linux_aio when this assertion trips? +> +>>> +> +>>> Thanks! +> +>>> -Nish +> +>> +> +>> +> +>> Hi Nishant, +> +>> +> +>> From the coredump of the guest this is the call trace that calls +> +>> aio_get_linux_aio: +> +>> +> +>> +> +>> Stack trace of thread 145158: +> +>> #0 0x000003ff94dbe274 raise (libc.so.6) +> +>> #1 0x000003ff94da39a8 abort (libc.so.6) +> +>> #2 0x000003ff94db62ce __assert_fail_base (libc.so.6) +> +>> #3 0x000003ff94db634c __assert_fail (libc.so.6) +> +>> #4 0x000002aa20db067a aio_get_linux_aio (qemu-system-s390x) +> +>> #5 0x000002aa20d229a8 raw_aio_plug (qemu-system-s390x) +> +>> #6 0x000002aa20d309ee bdrv_io_plug (qemu-system-s390x) +> +>> #7 0x000002aa20b5a8ea virtio_blk_handle_vq (qemu-system-s390x) +> +>> #8 0x000002aa20db2f6e aio_dispatch_handlers (qemu-system-s390x) +> +>> #9 0x000002aa20db3c34 aio_poll (qemu-system-s390x) +> +>> #10 0x000002aa20be32a2 iothread_run (qemu-system-s390x) +> +>> #11 0x000003ff94f879a8 start_thread (libpthread.so.0) +> +>> #12 0x000003ff94e797ee thread_start (libc.so.6) +> +>> +> +>> +> +>> Thanks for taking a look and responding. +> +>> +> +>> Thanks +> +>> Farhan +> +>> +> +>> +> +>> +> +> +> +> Trying to debug a little further, the block device in this case is a "host +> +> device". And looking at your commit carefully you use the +> +> bdrv_attach_aio_context callback to setup a Linux AioContext. +> +> +> +> For some reason the "host device" struct (BlockDriver bdrv_host_device in +> +> block/file-posix.c) does not have a bdrv_attach_aio_context defined. +> +> So a simple change of adding the callback to the struct solves the issue and +> +> the guest starts fine. +> +> +> +> +> +> diff --git a/block/file-posix.c b/block/file-posix.c +> +> index 28824aa..b8d59fb 100644 +> +> --- a/block/file-posix.c +> +> +++ b/block/file-posix.c +> +> @@ -3135,6 +3135,7 @@ static BlockDriver bdrv_host_device = { +> +> .bdrv_refresh_limits = raw_refresh_limits, +> +> .bdrv_io_plug = raw_aio_plug, +> +> .bdrv_io_unplug = raw_aio_unplug, +> +> + .bdrv_attach_aio_context = raw_aio_attach_aio_context, +> +> +> +> .bdrv_co_truncate = raw_co_truncate, +> +> .bdrv_getlength = raw_getlength, +> +> +> +> +> +> +> +> I am not too familiar with block device code in QEMU, so not sure if +> +> this is the right fix or if there are some underlying problems. +> +> +Oh this is quite embarassing! I only added the bdrv_attach_aio_context +> +callback for the file-backed device. Your fix is definitely corect for +> +host device. Let me make sure there weren't any others missed and I will +> +send out a properly formatted patch. Thank you for the quick testing and +> +turnaround! +Farhan, can you respin your patch with proper sign-off and patch description? +Adding qemu-block. 
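+For context on why the one-line addition is enough: per the commit message quoted
+at the top of this report, the attach callback is what calls aio_setup_linux_aio()
+for the new AioContext, so a BlockDriver that never wires it up leaves
+ctx->linux_aio uninitialised for the iothread and the first aio_get_linux_aio()
+call trips the assertion. A rough sketch of what such a callback does follows; it
+is an illustrative paraphrase of the behaviour described in the commit message,
+not the actual QEMU code, and the exact return convention of aio_setup_linux_aio()
+as well as the BDRVRawState/use_linux_aio/error_reportf_err names are assumed here:
+
+static void raw_aio_attach_aio_context(BlockDriverState *bs,
+                                       AioContext *new_context)
+{
+#ifdef CONFIG_LINUX_AIO
+    BDRVRawState *s = bs->opaque;
+
+    if (s->use_linux_aio) {
+        Error *local_err = NULL;
+        if (!aio_setup_linux_aio(new_context, &local_err)) {
+            /* per the commit message: fall back to threaded AIO with an error,
+             * since the device is already in use by the guest */
+            error_reportf_err(local_err, "Unable to use native AIO: ");
+            s->use_linux_aio = false;
+        }
+    }
+#endif
+}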
+ +Hi Christian, + +On 19.07.2018 [08:55:20 +0200], Christian Borntraeger wrote: +> +> +> +On 07/18/2018 08:52 PM, Nishanth Aravamudan wrote: +> +> On 18.07.2018 [11:10:27 -0400], Farhan Ali wrote: +> +>> +> +>> +> +>> On 07/18/2018 09:42 AM, Farhan Ali wrote: +<snip> + +> +>> I am not too familiar with block device code in QEMU, so not sure if +> +>> this is the right fix or if there are some underlying problems. +> +> +> +> Oh this is quite embarassing! I only added the bdrv_attach_aio_context +> +> callback for the file-backed device. Your fix is definitely corect for +> +> host device. Let me make sure there weren't any others missed and I will +> +> send out a properly formatted patch. Thank you for the quick testing and +> +> turnaround! +> +> +Farhan, can you respin your patch with proper sign-off and patch description? +> +Adding qemu-block. +I sent it yesterday, sorry I didn't cc everyone from this e-mail: +http://lists.nongnu.org/archive/html/qemu-block/2018-07/msg00516.html +Thanks, +Nish + diff --git a/classification_output/01/mistranslation/5798945 b/classification_output/01/mistranslation/5798945 new file mode 100644 index 000000000..95c3f61d1 --- /dev/null +++ b/classification_output/01/mistranslation/5798945 @@ -0,0 +1,43 @@ +mistranslation: 0.472 +semantic: 0.387 +other: 0.345 +instruction: 0.261 + +[BUG][CPU hot-plug]CPU hot-plugs cause the qemu process to coredump + +Hello,Recently, when I was developing CPU hot-plugs under the loongarch +architecture, +I found that there was a problem with qemu cpu hot-plugs under x86 +architecture, +which caused the qemu process coredump when repeatedly inserting and +unplugging +the CPU when the TCG was accelerated. + + +The specific operation process is as follows: + +1.Use the following command to start the virtual machine + +qemu-system-x86_64 \ +-machine q35 \ +-cpu Broadwell-IBRS \ +-smp 1,maxcpus=4,sockets=4,cores=1,threads=1 \ +-m 4G \ +-drive file=~/anolis-8.8.qcow2 \ +-serial stdio  \ +-monitor telnet:localhost:4498,server,nowait + + +2.Enter QEMU Monitor via telnet for repeated CPU insertion and unplugging + +telnet 127.0.0.1 4498 +(qemu) device_add +Broadwell-IBRS-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,id=cpu1 +(qemu) device_del cpu1 +(qemu) device_add +Broadwell-IBRS-x86_64-cpu,socket-id=1,core-id=0,thread-id=0,id=cpu1 +3.You will notice that the QEMU process has a coredump + +# malloc(): unsorted double linked list corrupted +Aborted (core dumped) + diff --git a/classification_output/01/mistranslation/5933279 b/classification_output/01/mistranslation/5933279 new file mode 100644 index 000000000..719c03c74 --- /dev/null +++ b/classification_output/01/mistranslation/5933279 @@ -0,0 +1,4581 @@ +mistranslation: 0.962 +instruction: 0.930 +other: 0.930 +semantic: 0.923 + +[BUG, RFC] cpr-transfer: qxl guest driver crashes after migration + +Hi all, + +We've been experimenting with cpr-transfer migration mode recently and +have discovered the following issue with the guest QXL driver: + +Run migration source: +> +EMULATOR=/path/to/emulator +> +ROOTFS=/path/to/image +> +QMPSOCK=/var/run/alma8qmp-src.sock +> +> +$EMULATOR -enable-kvm \ +> +-machine q35 \ +> +-cpu host -smp 2 -m 2G \ +> +-object +> +memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ +> +-machine memory-backend=ram0 \ +> +-machine aux-ram-share=on \ +> +-drive file=$ROOTFS,media=disk,if=virtio \ +> +-qmp unix:$QMPSOCK,server=on,wait=off \ +> +-nographic \ +> +-device qxl-vga +Run migration target: +> +EMULATOR=/path/to/emulator +> 
+ROOTFS=/path/to/image +> +QMPSOCK=/var/run/alma8qmp-dst.sock +> +> +> +> +$EMULATOR -enable-kvm \ +> +-machine q35 \ +> +-cpu host -smp 2 -m 2G \ +> +-object +> +memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ +> +-machine memory-backend=ram0 \ +> +-machine aux-ram-share=on \ +> +-drive file=$ROOTFS,media=disk,if=virtio \ +> +-qmp unix:$QMPSOCK,server=on,wait=off \ +> +-nographic \ +> +-device qxl-vga \ +> +-incoming tcp:0:44444 \ +> +-incoming '{"channel-type": "cpr", "addr": { "transport": "socket", +> +"type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' +Launch the migration: +> +QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +> +QMPSOCK=/var/run/alma8qmp-src.sock +> +> +$QMPSHELL -p $QMPSOCK <<EOF +> +migrate-set-parameters mode=cpr-transfer +> +migrate +> +channels=[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"44444"}},{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-dst.sock"}}] +> +EOF +Then, after a while, QXL guest driver on target crashes spewing the +following messages: +> +[ 73.962002] [TTM] Buffer eviction failed +> +[ 73.962072] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001) +> +[ 73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate +> +VRAM BO +That seems to be a known kernel QXL driver bug: +https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/ +https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ +(the latter discussion contains that reproduce script which speeds up +the crash in the guest): +> +#!/bin/bash +> +> +chvt 3 +> +> +for j in $(seq 80); do +> +echo "$(date) starting round $j" +> +if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" +> +]; then +> +echo "bug was reproduced after $j tries" +> +exit 1 +> +fi +> +for i in $(seq 100); do +> +dmesg > /dev/tty3 +> +done +> +done +> +> +echo "bug could not be reproduced" +> +exit 0 +The bug itself seems to remain unfixed, as I was able to reproduce that +with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +cpr-transfer code also seems to be buggy as it triggers the crash - +without the cpr-transfer migration the above reproduce doesn't lead to +crash on the source VM. + +I suspect that, as cpr-transfer doesn't migrate the guest memory, but +rather passes it through the memory backend object, our code might +somehow corrupt the VRAM. However, I wasn't able to trace the +corruption so far. + +Could somebody help the investigation and take a look into this? Any +suggestions would be appreciated. Thanks! 
+ +Andrey + +On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: +Hi all, + +We've been experimenting with cpr-transfer migration mode recently and +have discovered the following issue with the guest QXL driver: + +Run migration source: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-src.sock + +$EMULATOR -enable-kvm \ + -machine q35 \ + -cpu host -smp 2 -m 2G \ + -object +memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ + -machine memory-backend=ram0 \ + -machine aux-ram-share=on \ + -drive file=$ROOTFS,media=disk,if=virtio \ + -qmp unix:$QMPSOCK,server=on,wait=off \ + -nographic \ + -device qxl-vga +Run migration target: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-dst.sock +$EMULATOR -enable-kvm \ +-machine q35 \ + -cpu host -smp 2 -m 2G \ + -object +memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ + -machine memory-backend=ram0 \ + -machine aux-ram-share=on \ + -drive file=$ROOTFS,media=disk,if=virtio \ + -qmp unix:$QMPSOCK,server=on,wait=off \ + -nographic \ + -device qxl-vga \ + -incoming tcp:0:44444 \ + -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", +"path": "/var/run/alma8cpr-dst.sock"}}' +Launch the migration: +QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +QMPSOCK=/var/run/alma8qmp-src.sock + +$QMPSHELL -p $QMPSOCK <<EOF + migrate-set-parameters mode=cpr-transfer + migrate +channels=[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"44444"}},{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-dst.sock"}}] +EOF +Then, after a while, QXL guest driver on target crashes spewing the +following messages: +[ 73.962002] [TTM] Buffer eviction failed +[ 73.962072] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001) +[ 73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate +VRAM BO +That seems to be a known kernel QXL driver bug: +https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/ +https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ +(the latter discussion contains that reproduce script which speeds up +the crash in the guest): +#!/bin/bash + +chvt 3 + +for j in $(seq 80); do + echo "$(date) starting round $j" + if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" +]; then + echo "bug was reproduced after $j tries" + exit 1 + fi + for i in $(seq 100); do + dmesg > /dev/tty3 + done +done + +echo "bug could not be reproduced" +exit 0 +The bug itself seems to remain unfixed, as I was able to reproduce that +with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +cpr-transfer code also seems to be buggy as it triggers the crash - +without the cpr-transfer migration the above reproduce doesn't lead to +crash on the source VM. + +I suspect that, as cpr-transfer doesn't migrate the guest memory, but +rather passes it through the memory backend object, our code might +somehow corrupt the VRAM. However, I wasn't able to trace the +corruption so far. + +Could somebody help the investigation and take a look into this? Any +suggestions would be appreciated. Thanks! +Possibly some memory region created by qxl is not being preserved. 
+Try adding these traces to see what is preserved: + +-trace enable='*cpr*' +-trace enable='*ram_alloc*' + +- Steve + +On 2/28/2025 1:13 PM, Steven Sistare wrote: +On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: +Hi all, + +We've been experimenting with cpr-transfer migration mode recently and +have discovered the following issue with the guest QXL driver: + +Run migration source: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-src.sock + +$EMULATOR -enable-kvm \ +    -machine q35 \ +    -cpu host -smp 2 -m 2G \ +    -object +memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ +    -machine memory-backend=ram0 \ +    -machine aux-ram-share=on \ +    -drive file=$ROOTFS,media=disk,if=virtio \ +    -qmp unix:$QMPSOCK,server=on,wait=off \ +    -nographic \ +    -device qxl-vga +Run migration target: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-dst.sock +$EMULATOR -enable-kvm \ +    -machine q35 \ +    -cpu host -smp 2 -m 2G \ +    -object +memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ram0,share=on\ +    -machine memory-backend=ram0 \ +    -machine aux-ram-share=on \ +    -drive file=$ROOTFS,media=disk,if=virtio \ +    -qmp unix:$QMPSOCK,server=on,wait=off \ +    -nographic \ +    -device qxl-vga \ +    -incoming tcp:0:44444 \ +    -incoming '{"channel-type": "cpr", "addr": { "transport": "socket", "type": "unix", +"path": "/var/run/alma8cpr-dst.sock"}}' +Launch the migration: +QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +QMPSOCK=/var/run/alma8qmp-src.sock + +$QMPSHELL -p $QMPSOCK <<EOF +    migrate-set-parameters mode=cpr-transfer +    migrate +channels=[{"channel-type":"main","addr":{"transport":"socket","type":"inet","host":"0","port":"44444"}},{"channel-type":"cpr","addr":{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-dst.sock"}}] +EOF +Then, after a while, QXL guest driver on target crashes spewing the +following messages: +[  73.962002] [TTM] Buffer eviction failed +[  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, 0x00000001) +[  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to allocate +VRAM BO +That seems to be a known kernel QXL driver bug: +https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/ +https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ +(the latter discussion contains that reproduce script which speeds up +the crash in the guest): +#!/bin/bash + +chvt 3 + +for j in $(seq 80); do +        echo "$(date) starting round $j" +        if [ "$(journalctl --boot | grep "failed to allocate VRAM BO")" != "" +]; then +                echo "bug was reproduced after $j tries" +                exit 1 +        fi +        for i in $(seq 100); do +                dmesg > /dev/tty3 +        done +done + +echo "bug could not be reproduced" +exit 0 +The bug itself seems to remain unfixed, as I was able to reproduce that +with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +cpr-transfer code also seems to be buggy as it triggers the crash - +without the cpr-transfer migration the above reproduce doesn't lead to +crash on the source VM. + +I suspect that, as cpr-transfer doesn't migrate the guest memory, but +rather passes it through the memory backend object, our code might +somehow corrupt the VRAM. However, I wasn't able to trace the +corruption so far. + +Could somebody help the investigation and take a look into this? Any +suggestions would be appreciated. Thanks! 
+Possibly some memory region created by qxl is not being preserved. +Try adding these traces to see what is preserved: + +-trace enable='*cpr*' +-trace enable='*ram_alloc*' +Also try adding this patch to see if it flags any ram blocks as not +compatible with cpr. A message is printed at migration start time. +1740667681-257312-1-git-send-email-steven.sistare@oracle.com +/">https://lore.kernel.org/qemu-devel/ +1740667681-257312-1-git-send-email-steven.sistare@oracle.com +/ +- Steve + +On 2/28/25 8:20 PM, Steven Sistare wrote: +> +On 2/28/2025 1:13 PM, Steven Sistare wrote: +> +> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: +> +>> Hi all, +> +>> +> +>> We've been experimenting with cpr-transfer migration mode recently and +> +>> have discovered the following issue with the guest QXL driver: +> +>> +> +>> Run migration source: +> +>>> EMULATOR=/path/to/emulator +> +>>> ROOTFS=/path/to/image +> +>>> QMPSOCK=/var/run/alma8qmp-src.sock +> +>>> +> +>>> $EMULATOR -enable-kvm \ +> +>>>     -machine q35 \ +> +>>>     -cpu host -smp 2 -m 2G \ +> +>>>     -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ +> +>>> ram0,share=on\ +> +>>>     -machine memory-backend=ram0 \ +> +>>>     -machine aux-ram-share=on \ +> +>>>     -drive file=$ROOTFS,media=disk,if=virtio \ +> +>>>     -qmp unix:$QMPSOCK,server=on,wait=off \ +> +>>>     -nographic \ +> +>>>     -device qxl-vga +> +>> +> +>> Run migration target: +> +>>> EMULATOR=/path/to/emulator +> +>>> ROOTFS=/path/to/image +> +>>> QMPSOCK=/var/run/alma8qmp-dst.sock +> +>>> $EMULATOR -enable-kvm \ +> +>>>     -machine q35 \ +> +>>>     -cpu host -smp 2 -m 2G \ +> +>>>     -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ +> +>>> ram0,share=on\ +> +>>>     -machine memory-backend=ram0 \ +> +>>>     -machine aux-ram-share=on \ +> +>>>     -drive file=$ROOTFS,media=disk,if=virtio \ +> +>>>     -qmp unix:$QMPSOCK,server=on,wait=off \ +> +>>>     -nographic \ +> +>>>     -device qxl-vga \ +> +>>>     -incoming tcp:0:44444 \ +> +>>>     -incoming '{"channel-type": "cpr", "addr": { "transport": +> +>>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' +> +>> +> +>> +> +>> Launch the migration: +> +>>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +> +>>> QMPSOCK=/var/run/alma8qmp-src.sock +> +>>> +> +>>> $QMPSHELL -p $QMPSOCK <<EOF +> +>>>     migrate-set-parameters mode=cpr-transfer +> +>>>     migrate channels=[{"channel-type":"main","addr": +> +>>> {"transport":"socket","type":"inet","host":"0","port":"44444"}}, +> +>>> {"channel-type":"cpr","addr": +> +>>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr- +> +>>> dst.sock"}}] +> +>>> EOF +> +>> +> +>> Then, after a while, QXL guest driver on target crashes spewing the +> +>> following messages: +> +>>> [  73.962002] [TTM] Buffer eviction failed +> +>>> [  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, +> +>>> 0x00000001) +> +>>> [  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to +> +>>> allocate VRAM BO +> +>> +> +>> That seems to be a known kernel QXL driver bug: +> +>> +> +>> +https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/ +> +>> +https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ +> +>> +> +>> (the latter discussion contains that reproduce script which speeds up +> +>> the crash in the guest): +> +>>> #!/bin/bash +> +>>> +> +>>> chvt 3 +> +>>> +> +>>> for j in $(seq 80); do +> +>>>         echo "$(date) starting round $j" +> +>>>         if [ "$(journalctl --boot | grep "failed to 
allocate VRAM +> +>>> BO")" != "" ]; then +> +>>>                 echo "bug was reproduced after $j tries" +> +>>>                 exit 1 +> +>>>         fi +> +>>>         for i in $(seq 100); do +> +>>>                 dmesg > /dev/tty3 +> +>>>         done +> +>>> done +> +>>> +> +>>> echo "bug could not be reproduced" +> +>>> exit 0 +> +>> +> +>> The bug itself seems to remain unfixed, as I was able to reproduce that +> +>> with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +> +>> cpr-transfer code also seems to be buggy as it triggers the crash - +> +>> without the cpr-transfer migration the above reproduce doesn't lead to +> +>> crash on the source VM. +> +>> +> +>> I suspect that, as cpr-transfer doesn't migrate the guest memory, but +> +>> rather passes it through the memory backend object, our code might +> +>> somehow corrupt the VRAM. However, I wasn't able to trace the +> +>> corruption so far. +> +>> +> +>> Could somebody help the investigation and take a look into this? Any +> +>> suggestions would be appreciated. Thanks! +> +> +> +> Possibly some memory region created by qxl is not being preserved. +> +> Try adding these traces to see what is preserved: +> +> +> +> -trace enable='*cpr*' +> +> -trace enable='*ram_alloc*' +> +> +Also try adding this patch to see if it flags any ram blocks as not +> +compatible with cpr. A message is printed at migration start time. +> + +https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-email- +> +steven.sistare@oracle.com/ +> +> +- Steve +> +With the traces enabled + the "migration: ram block cpr blockers" patch +applied: + +Source: +> +cpr_find_fd pc.bios, id 0 returns -1 +> +cpr_save_fd pc.bios, id 0, fd 22 +> +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host +> +0x7fec18e00000 +> +cpr_find_fd pc.rom, id 0 returns -1 +> +cpr_save_fd pc.rom, id 0, fd 23 +> +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host +> +0x7fec18c00000 +> +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 +> +cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 +> +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd +> +24 host 0x7fec18a00000 +> +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 +> +cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 +> +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 +> +fd 25 host 0x7feb77e00000 +> +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 +> +cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 +> +qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 27 +> +host 0x7fec18800000 +> +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 +> +cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 +> +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 +> +fd 28 host 0x7feb73c00000 +> +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 +> +cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 +> +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 34 +> +host 0x7fec18600000 +> +cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 +> +cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 +> +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 35 +> +host 0x7fec18200000 +> +cpr_find_fd /rom@etc/table-loader, id 0 returns -1 +> +cpr_save_fd /rom@etc/table-loader, id 0, fd 36 +> +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 36 +> +host 0x7feb8b600000 +> +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 +> +cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 
+> +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 37 host +> +0x7feb8b400000 +> +> +cpr_state_save cpr-transfer mode +> +cpr_transfer_output /var/run/alma8cpr-dst.sock +Target: +> +cpr_transfer_input /var/run/alma8cpr-dst.sock +> +cpr_state_load cpr-transfer mode +> +cpr_find_fd pc.bios, id 0 returns 20 +> +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host +> +0x7fcdc9800000 +> +cpr_find_fd pc.rom, id 0 returns 19 +> +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host +> +0x7fcdc9600000 +> +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 +> +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd +> +18 host 0x7fcdc9400000 +> +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 +> +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 +> +fd 17 host 0x7fcd27e00000 +> +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 +> +qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 16 +> +host 0x7fcdc9200000 +> +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 +> +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 +> +fd 15 host 0x7fcd23c00000 +> +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 +> +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 14 +> +host 0x7fcdc8800000 +> +cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 +> +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 13 +> +host 0x7fcdc8400000 +> +cpr_find_fd /rom@etc/table-loader, id 0 returns 11 +> +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 11 +> +host 0x7fcdc8200000 +> +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 +> +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 10 host +> +0x7fcd3be00000 +Looks like both vga.vram and qxl.vram are being preserved (with the same +addresses), and no incompatible ram blocks are found during migration. 
+ +Andrey + +On 2/28/25 8:35 PM, Andrey Drobyshev wrote: +> +On 2/28/25 8:20 PM, Steven Sistare wrote: +> +> On 2/28/2025 1:13 PM, Steven Sistare wrote: +> +>> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: +> +>>> Hi all, +> +>>> +> +>>> We've been experimenting with cpr-transfer migration mode recently and +> +>>> have discovered the following issue with the guest QXL driver: +> +>>> +> +>>> Run migration source: +> +>>>> EMULATOR=/path/to/emulator +> +>>>> ROOTFS=/path/to/image +> +>>>> QMPSOCK=/var/run/alma8qmp-src.sock +> +>>>> +> +>>>> $EMULATOR -enable-kvm \ +> +>>>>     -machine q35 \ +> +>>>>     -cpu host -smp 2 -m 2G \ +> +>>>>     -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ +> +>>>> ram0,share=on\ +> +>>>>     -machine memory-backend=ram0 \ +> +>>>>     -machine aux-ram-share=on \ +> +>>>>     -drive file=$ROOTFS,media=disk,if=virtio \ +> +>>>>     -qmp unix:$QMPSOCK,server=on,wait=off \ +> +>>>>     -nographic \ +> +>>>>     -device qxl-vga +> +>>> +> +>>> Run migration target: +> +>>>> EMULATOR=/path/to/emulator +> +>>>> ROOTFS=/path/to/image +> +>>>> QMPSOCK=/var/run/alma8qmp-dst.sock +> +>>>> $EMULATOR -enable-kvm \ +> +>>>>     -machine q35 \ +> +>>>>     -cpu host -smp 2 -m 2G \ +> +>>>>     -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ +> +>>>> ram0,share=on\ +> +>>>>     -machine memory-backend=ram0 \ +> +>>>>     -machine aux-ram-share=on \ +> +>>>>     -drive file=$ROOTFS,media=disk,if=virtio \ +> +>>>>     -qmp unix:$QMPSOCK,server=on,wait=off \ +> +>>>>     -nographic \ +> +>>>>     -device qxl-vga \ +> +>>>>     -incoming tcp:0:44444 \ +> +>>>>     -incoming '{"channel-type": "cpr", "addr": { "transport": +> +>>>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' +> +>>> +> +>>> +> +>>> Launch the migration: +> +>>>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +> +>>>> QMPSOCK=/var/run/alma8qmp-src.sock +> +>>>> +> +>>>> $QMPSHELL -p $QMPSOCK <<EOF +> +>>>>     migrate-set-parameters mode=cpr-transfer +> +>>>>     migrate channels=[{"channel-type":"main","addr": +> +>>>> {"transport":"socket","type":"inet","host":"0","port":"44444"}}, +> +>>>> {"channel-type":"cpr","addr": +> +>>>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr- +> +>>>> dst.sock"}}] +> +>>>> EOF +> +>>> +> +>>> Then, after a while, QXL guest driver on target crashes spewing the +> +>>> following messages: +> +>>>> [  73.962002] [TTM] Buffer eviction failed +> +>>>> [  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, +> +>>>> 0x00000001) +> +>>>> [  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to +> +>>>> allocate VRAM BO +> +>>> +> +>>> That seems to be a known kernel QXL driver bug: +> +>>> +> +>>> +https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/ +> +>>> +https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ +> +>>> +> +>>> (the latter discussion contains that reproduce script which speeds up +> +>>> the crash in the guest): +> +>>>> #!/bin/bash +> +>>>> +> +>>>> chvt 3 +> +>>>> +> +>>>> for j in $(seq 80); do +> +>>>>         echo "$(date) starting round $j" +> +>>>>         if [ "$(journalctl --boot | grep "failed to allocate VRAM +> +>>>> BO")" != "" ]; then +> +>>>>                 echo "bug was reproduced after $j tries" +> +>>>>                 exit 1 +> +>>>>         fi +> +>>>>         for i in $(seq 100); do +> +>>>>                 dmesg > /dev/tty3 +> +>>>>         done +> +>>>> done +> +>>>> +> +>>>> echo "bug could not be reproduced" +> +>>>> exit 0 +> 
+>>> +> +>>> The bug itself seems to remain unfixed, as I was able to reproduce that +> +>>> with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +> +>>> cpr-transfer code also seems to be buggy as it triggers the crash - +> +>>> without the cpr-transfer migration the above reproduce doesn't lead to +> +>>> crash on the source VM. +> +>>> +> +>>> I suspect that, as cpr-transfer doesn't migrate the guest memory, but +> +>>> rather passes it through the memory backend object, our code might +> +>>> somehow corrupt the VRAM. However, I wasn't able to trace the +> +>>> corruption so far. +> +>>> +> +>>> Could somebody help the investigation and take a look into this? Any +> +>>> suggestions would be appreciated. Thanks! +> +>> +> +>> Possibly some memory region created by qxl is not being preserved. +> +>> Try adding these traces to see what is preserved: +> +>> +> +>> -trace enable='*cpr*' +> +>> -trace enable='*ram_alloc*' +> +> +> +> Also try adding this patch to see if it flags any ram blocks as not +> +> compatible with cpr. A message is printed at migration start time. +> +>  +https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-email- +> +> steven.sistare@oracle.com/ +> +> +> +> - Steve +> +> +> +> +With the traces enabled + the "migration: ram block cpr blockers" patch +> +applied: +> +> +Source: +> +> cpr_find_fd pc.bios, id 0 returns -1 +> +> cpr_save_fd pc.bios, id 0, fd 22 +> +> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host +> +> 0x7fec18e00000 +> +> cpr_find_fd pc.rom, id 0 returns -1 +> +> cpr_save_fd pc.rom, id 0, fd 23 +> +> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host +> +> 0x7fec18c00000 +> +> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 +> +> cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 +> +> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd +> +> 24 host 0x7fec18a00000 +> +> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 +> +> cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 +> +> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 +> +> fd 25 host 0x7feb77e00000 +> +> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 +> +> cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 +> +> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 27 +> +> host 0x7fec18800000 +> +> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 +> +> cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 +> +> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 +> +> fd 28 host 0x7feb73c00000 +> +> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 +> +> cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 +> +> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 34 +> +> host 0x7fec18600000 +> +> cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 +> +> cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 +> +> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd +> +> 35 host 0x7fec18200000 +> +> cpr_find_fd /rom@etc/table-loader, id 0 returns -1 +> +> cpr_save_fd /rom@etc/table-loader, id 0, fd 36 +> +> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 36 +> +> host 0x7feb8b600000 +> +> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 +> +> cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 +> +> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 37 host +> +> 0x7feb8b400000 +> +> +> +> cpr_state_save cpr-transfer mode +> +> cpr_transfer_output /var/run/alma8cpr-dst.sock +> +> 
+Target: +> +> cpr_transfer_input /var/run/alma8cpr-dst.sock +> +> cpr_state_load cpr-transfer mode +> +> cpr_find_fd pc.bios, id 0 returns 20 +> +> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host +> +> 0x7fcdc9800000 +> +> cpr_find_fd pc.rom, id 0 returns 19 +> +> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host +> +> 0x7fcdc9600000 +> +> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 +> +> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd +> +> 18 host 0x7fcdc9400000 +> +> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 +> +> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 +> +> fd 17 host 0x7fcd27e00000 +> +> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 +> +> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 16 +> +> host 0x7fcdc9200000 +> +> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 +> +> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 +> +> fd 15 host 0x7fcd23c00000 +> +> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 +> +> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 14 +> +> host 0x7fcdc8800000 +> +> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 +> +> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd +> +> 13 host 0x7fcdc8400000 +> +> cpr_find_fd /rom@etc/table-loader, id 0 returns 11 +> +> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 11 +> +> host 0x7fcdc8200000 +> +> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 +> +> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 10 host +> +> 0x7fcd3be00000 +> +> +Looks like both vga.vram and qxl.vram are being preserved (with the same +> +addresses), and no incompatible ram blocks are found during migration. +> +Sorry, addressed are not the same, of course. However corresponding ram +blocks do seem to be preserved and initialized. 
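For reference, the trace pattern above is the fd-preservation handshake: on the
source, cpr_find_fd misses (returns -1) and cpr_save_fd registers the block's
memfd in cpr state; on the target, the same lookup hits and the block is
re-created from the inherited fd instead of being reallocated. A small
self-contained sketch of that find-or-save pattern, with all names invented for
the example rather than taken from QEMU:

  /* Illustration only: mimics the find-or-save fd pattern seen in the
   * traces.  Names and signatures are made up, not QEMU's cpr API. */
  #include <stdio.h>
  #include <string.h>

  #define MAX_BLOCKS 16

  struct fd_entry { char name[64]; int id; int fd; };
  static struct fd_entry table[MAX_BLOCKS];
  static int nentries;

  static int demo_find_fd(const char *name, int id)
  {
      for (int i = 0; i < nentries; i++) {
          if (table[i].id == id && strcmp(table[i].name, name) == 0) {
              return table[i].fd;      /* target side: fd was preserved */
          }
      }
      return -1;                       /* source side: nothing saved yet */
  }

  static void demo_save_fd(const char *name, int id, int fd)
  {
      if (nentries < MAX_BLOCKS) {
          snprintf(table[nentries].name, sizeof(table[nentries].name), "%s", name);
          table[nentries].id = id;
          table[nentries].fd = fd;
          nentries++;
      }
  }

  int main(void)
  {
      const char *block = "0000:00:02.0/qxl.vram";
      int fd = demo_find_fd(block, 0);

      if (fd < 0) {                    /* first (source-side) allocation */
          fd = 28;                     /* stand-in for a freshly created memfd */
          demo_save_fd(block, 0, fd);
      }
      printf("%s uses fd %d\n", block, fd);
      return 0;
  }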
+ +On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: +On 2/28/25 8:35 PM, Andrey Drobyshev wrote: +On 2/28/25 8:20 PM, Steven Sistare wrote: +On 2/28/2025 1:13 PM, Steven Sistare wrote: +On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: +Hi all, + +We've been experimenting with cpr-transfer migration mode recently and +have discovered the following issue with the guest QXL driver: + +Run migration source: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-src.sock + +$EMULATOR -enable-kvm \ +     -machine q35 \ +     -cpu host -smp 2 -m 2G \ +     -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ +ram0,share=on\ +     -machine memory-backend=ram0 \ +     -machine aux-ram-share=on \ +     -drive file=$ROOTFS,media=disk,if=virtio \ +     -qmp unix:$QMPSOCK,server=on,wait=off \ +     -nographic \ +     -device qxl-vga +Run migration target: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-dst.sock +$EMULATOR -enable-kvm \ +     -machine q35 \ +     -cpu host -smp 2 -m 2G \ +     -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ +ram0,share=on\ +     -machine memory-backend=ram0 \ +     -machine aux-ram-share=on \ +     -drive file=$ROOTFS,media=disk,if=virtio \ +     -qmp unix:$QMPSOCK,server=on,wait=off \ +     -nographic \ +     -device qxl-vga \ +     -incoming tcp:0:44444 \ +     -incoming '{"channel-type": "cpr", "addr": { "transport": +"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' +Launch the migration: +QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +QMPSOCK=/var/run/alma8qmp-src.sock + +$QMPSHELL -p $QMPSOCK <<EOF +     migrate-set-parameters mode=cpr-transfer +     migrate channels=[{"channel-type":"main","addr": +{"transport":"socket","type":"inet","host":"0","port":"44444"}}, +{"channel-type":"cpr","addr": +{"transport":"socket","type":"unix","path":"/var/run/alma8cpr- +dst.sock"}}] +EOF +Then, after a while, QXL guest driver on target crashes spewing the +following messages: +[  73.962002] [TTM] Buffer eviction failed +[  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, +0x00000001) +[  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to +allocate VRAM BO +That seems to be a known kernel QXL driver bug: +https://lore.kernel.org/all/20220907094423.93581-1-min_halo@163.com/T/ +https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ +(the latter discussion contains that reproduce script which speeds up +the crash in the guest): +#!/bin/bash + +chvt 3 + +for j in $(seq 80); do +         echo "$(date) starting round $j" +         if [ "$(journalctl --boot | grep "failed to allocate VRAM +BO")" != "" ]; then +                 echo "bug was reproduced after $j tries" +                 exit 1 +         fi +         for i in $(seq 100); do +                 dmesg > /dev/tty3 +         done +done + +echo "bug could not be reproduced" +exit 0 +The bug itself seems to remain unfixed, as I was able to reproduce that +with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +cpr-transfer code also seems to be buggy as it triggers the crash - +without the cpr-transfer migration the above reproduce doesn't lead to +crash on the source VM. + +I suspect that, as cpr-transfer doesn't migrate the guest memory, but +rather passes it through the memory backend object, our code might +somehow corrupt the VRAM. However, I wasn't able to trace the +corruption so far. + +Could somebody help the investigation and take a look into this? Any +suggestions would be appreciated. 
Thanks! +Possibly some memory region created by qxl is not being preserved. +Try adding these traces to see what is preserved: + +-trace enable='*cpr*' +-trace enable='*ram_alloc*' +Also try adding this patch to see if it flags any ram blocks as not +compatible with cpr. A message is printed at migration start time. +  +https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send-email- +steven.sistare@oracle.com/ + +- Steve +With the traces enabled + the "migration: ram block cpr blockers" patch +applied: + +Source: +cpr_find_fd pc.bios, id 0 returns -1 +cpr_save_fd pc.bios, id 0, fd 22 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host +0x7fec18e00000 +cpr_find_fd pc.rom, id 0 returns -1 +cpr_save_fd pc.rom, id 0, fd 23 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host +0x7fec18c00000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 +cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd 24 +host 0x7fec18a00000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 fd +25 host 0x7feb77e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 27 host +0x7fec18800000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 fd +28 host 0x7feb73c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 34 host +0x7fec18600000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 35 +host 0x7fec18200000 +cpr_find_fd /rom@etc/table-loader, id 0 returns -1 +cpr_save_fd /rom@etc/table-loader, id 0, fd 36 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 36 host +0x7feb8b600000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 37 host +0x7feb8b400000 + +cpr_state_save cpr-transfer mode +cpr_transfer_output /var/run/alma8cpr-dst.sock +Target: +cpr_transfer_input /var/run/alma8cpr-dst.sock +cpr_state_load cpr-transfer mode +cpr_find_fd pc.bios, id 0 returns 20 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host +0x7fcdc9800000 +cpr_find_fd pc.rom, id 0 returns 19 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host +0x7fcdc9600000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size 262144 fd 18 +host 0x7fcdc9400000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size 67108864 fd +17 host 0x7fcd27e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 fd 16 host +0x7fcdc9200000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size 67108864 fd +15 host 0x7fcd23c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 +qemu_ram_alloc_shared 
0000:00:02.0/qxl.rom size 65536 max_size 65536 fd 14 host +0x7fcdc8800000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size 2097152 fd 13 +host 0x7fcdc8400000 +cpr_find_fd /rom@etc/table-loader, id 0 returns 11 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 fd 11 host +0x7fcdc8200000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd 10 host +0x7fcd3be00000 +Looks like both vga.vram and qxl.vram are being preserved (with the same +addresses), and no incompatible ram blocks are found during migration. +Sorry, addressed are not the same, of course. However corresponding ram +blocks do seem to be preserved and initialized. +So far, I have not reproduced the guest driver failure. + +However, I have isolated places where new QEMU improperly writes to +the qxl memory regions prior to starting the guest, by mmap'ing them +readonly after cpr: + + qemu_ram_alloc_internal() + if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) + ram_flags |= RAM_READONLY; + new_block = qemu_ram_alloc_from_fd(...) + +I have attached a draft fix; try it and let me know. +My console window looks fine before and after cpr, using +-vnc $hostip:0 -vga qxl + +- Steve +0001-hw-qxl-cpr-support-preliminary.patch +Description: +Text document + +On 3/4/25 9:05 PM, Steven Sistare wrote: +> +On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: +> +> On 2/28/25 8:35 PM, Andrey Drobyshev wrote: +> +>> On 2/28/25 8:20 PM, Steven Sistare wrote: +> +>>> On 2/28/2025 1:13 PM, Steven Sistare wrote: +> +>>>> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: +> +>>>>> Hi all, +> +>>>>> +> +>>>>> We've been experimenting with cpr-transfer migration mode recently +> +>>>>> and +> +>>>>> have discovered the following issue with the guest QXL driver: +> +>>>>> +> +>>>>> Run migration source: +> +>>>>>> EMULATOR=/path/to/emulator +> +>>>>>> ROOTFS=/path/to/image +> +>>>>>> QMPSOCK=/var/run/alma8qmp-src.sock +> +>>>>>> +> +>>>>>> $EMULATOR -enable-kvm \ +> +>>>>>>      -machine q35 \ +> +>>>>>>      -cpu host -smp 2 -m 2G \ +> +>>>>>>      -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ +> +>>>>>> ram0,share=on\ +> +>>>>>>      -machine memory-backend=ram0 \ +> +>>>>>>      -machine aux-ram-share=on \ +> +>>>>>>      -drive file=$ROOTFS,media=disk,if=virtio \ +> +>>>>>>      -qmp unix:$QMPSOCK,server=on,wait=off \ +> +>>>>>>      -nographic \ +> +>>>>>>      -device qxl-vga +> +>>>>> +> +>>>>> Run migration target: +> +>>>>>> EMULATOR=/path/to/emulator +> +>>>>>> ROOTFS=/path/to/image +> +>>>>>> QMPSOCK=/var/run/alma8qmp-dst.sock +> +>>>>>> $EMULATOR -enable-kvm \ +> +>>>>>>      -machine q35 \ +> +>>>>>>      -cpu host -smp 2 -m 2G \ +> +>>>>>>      -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ +> +>>>>>> ram0,share=on\ +> +>>>>>>      -machine memory-backend=ram0 \ +> +>>>>>>      -machine aux-ram-share=on \ +> +>>>>>>      -drive file=$ROOTFS,media=disk,if=virtio \ +> +>>>>>>      -qmp unix:$QMPSOCK,server=on,wait=off \ +> +>>>>>>      -nographic \ +> +>>>>>>      -device qxl-vga \ +> +>>>>>>      -incoming tcp:0:44444 \ +> +>>>>>>      -incoming '{"channel-type": "cpr", "addr": { "transport": +> +>>>>>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' +> +>>>>> +> +>>>>> +> +>>>>> Launch the migration: +> +>>>>>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +> +>>>>>> QMPSOCK=/var/run/alma8qmp-src.sock +> +>>>>>> +> +>>>>>> 
$QMPSHELL -p $QMPSOCK <<EOF +> +>>>>>>      migrate-set-parameters mode=cpr-transfer +> +>>>>>>      migrate channels=[{"channel-type":"main","addr": +> +>>>>>> {"transport":"socket","type":"inet","host":"0","port":"44444"}}, +> +>>>>>> {"channel-type":"cpr","addr": +> +>>>>>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr- +> +>>>>>> dst.sock"}}] +> +>>>>>> EOF +> +>>>>> +> +>>>>> Then, after a while, QXL guest driver on target crashes spewing the +> +>>>>> following messages: +> +>>>>>> [  73.962002] [TTM] Buffer eviction failed +> +>>>>>> [  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, +> +>>>>>> 0x00000001) +> +>>>>>> [  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to +> +>>>>>> allocate VRAM BO +> +>>>>> +> +>>>>> That seems to be a known kernel QXL driver bug: +> +>>>>> +> +>>>>> +https://lore.kernel.org/all/20220907094423.93581-1- +> +>>>>> min_halo@163.com/T/ +> +>>>>> +https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ +> +>>>>> +> +>>>>> (the latter discussion contains that reproduce script which speeds up +> +>>>>> the crash in the guest): +> +>>>>>> #!/bin/bash +> +>>>>>> +> +>>>>>> chvt 3 +> +>>>>>> +> +>>>>>> for j in $(seq 80); do +> +>>>>>>          echo "$(date) starting round $j" +> +>>>>>>          if [ "$(journalctl --boot | grep "failed to allocate VRAM +> +>>>>>> BO")" != "" ]; then +> +>>>>>>                  echo "bug was reproduced after $j tries" +> +>>>>>>                  exit 1 +> +>>>>>>          fi +> +>>>>>>          for i in $(seq 100); do +> +>>>>>>                  dmesg > /dev/tty3 +> +>>>>>>          done +> +>>>>>> done +> +>>>>>> +> +>>>>>> echo "bug could not be reproduced" +> +>>>>>> exit 0 +> +>>>>> +> +>>>>> The bug itself seems to remain unfixed, as I was able to reproduce +> +>>>>> that +> +>>>>> with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +> +>>>>> cpr-transfer code also seems to be buggy as it triggers the crash - +> +>>>>> without the cpr-transfer migration the above reproduce doesn't +> +>>>>> lead to +> +>>>>> crash on the source VM. +> +>>>>> +> +>>>>> I suspect that, as cpr-transfer doesn't migrate the guest memory, but +> +>>>>> rather passes it through the memory backend object, our code might +> +>>>>> somehow corrupt the VRAM. However, I wasn't able to trace the +> +>>>>> corruption so far. +> +>>>>> +> +>>>>> Could somebody help the investigation and take a look into this? Any +> +>>>>> suggestions would be appreciated. Thanks! +> +>>>> +> +>>>> Possibly some memory region created by qxl is not being preserved. +> +>>>> Try adding these traces to see what is preserved: +> +>>>> +> +>>>> -trace enable='*cpr*' +> +>>>> -trace enable='*ram_alloc*' +> +>>> +> +>>> Also try adding this patch to see if it flags any ram blocks as not +> +>>> compatible with cpr. A message is printed at migration start time. 
+> +>>>   +https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- +> +>>> email- +> +>>> steven.sistare@oracle.com/ +> +>>> +> +>>> - Steve +> +>>> +> +>> +> +>> With the traces enabled + the "migration: ram block cpr blockers" patch +> +>> applied: +> +>> +> +>> Source: +> +>>> cpr_find_fd pc.bios, id 0 returns -1 +> +>>> cpr_save_fd pc.bios, id 0, fd 22 +> +>>> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host +> +>>> 0x7fec18e00000 +> +>>> cpr_find_fd pc.rom, id 0 returns -1 +> +>>> cpr_save_fd pc.rom, id 0, fd 23 +> +>>> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host +> +>>> 0x7fec18c00000 +> +>>> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 +> +>>> cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 +> +>>> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +> +>>> 262144 fd 24 host 0x7fec18a00000 +> +>>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 +> +>>> cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 +> +>>> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +> +>>> 67108864 fd 25 host 0x7feb77e00000 +> +>>> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 +> +>>> cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 +> +>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +> +>>> fd 27 host 0x7fec18800000 +> +>>> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 +> +>>> cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 +> +>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +> +>>> 67108864 fd 28 host 0x7feb73c00000 +> +>>> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 +> +>>> cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 +> +>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +> +>>> fd 34 host 0x7fec18600000 +> +>>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 +> +>>> cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 +> +>>> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +> +>>> 2097152 fd 35 host 0x7fec18200000 +> +>>> cpr_find_fd /rom@etc/table-loader, id 0 returns -1 +> +>>> cpr_save_fd /rom@etc/table-loader, id 0, fd 36 +> +>>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +> +>>> fd 36 host 0x7feb8b600000 +> +>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 +> +>>> cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 +> +>>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +> +>>> 37 host 0x7feb8b400000 +> +>>> +> +>>> cpr_state_save cpr-transfer mode +> +>>> cpr_transfer_output /var/run/alma8cpr-dst.sock +> +>> +> +>> Target: +> +>>> cpr_transfer_input /var/run/alma8cpr-dst.sock +> +>>> cpr_state_load cpr-transfer mode +> +>>> cpr_find_fd pc.bios, id 0 returns 20 +> +>>> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host +> +>>> 0x7fcdc9800000 +> +>>> cpr_find_fd pc.rom, id 0 returns 19 +> +>>> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host +> +>>> 0x7fcdc9600000 +> +>>> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 +> +>>> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +> +>>> 262144 fd 18 host 0x7fcdc9400000 +> +>>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 +> +>>> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +> +>>> 67108864 fd 17 host 0x7fcd27e00000 +> +>>> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 +> +>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +> +>>> fd 16 host 0x7fcdc9200000 +> +>>> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 
+> +>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +> +>>> 67108864 fd 15 host 0x7fcd23c00000 +> +>>> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 +> +>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +> +>>> fd 14 host 0x7fcdc8800000 +> +>>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 +> +>>> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +> +>>> 2097152 fd 13 host 0x7fcdc8400000 +> +>>> cpr_find_fd /rom@etc/table-loader, id 0 returns 11 +> +>>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +> +>>> fd 11 host 0x7fcdc8200000 +> +>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 +> +>>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +> +>>> 10 host 0x7fcd3be00000 +> +>> +> +>> Looks like both vga.vram and qxl.vram are being preserved (with the same +> +>> addresses), and no incompatible ram blocks are found during migration. +> +> +> +> Sorry, addressed are not the same, of course. However corresponding ram +> +> blocks do seem to be preserved and initialized. +> +> +So far, I have not reproduced the guest driver failure. +> +> +However, I have isolated places where new QEMU improperly writes to +> +the qxl memory regions prior to starting the guest, by mmap'ing them +> +readonly after cpr: +> +> + qemu_ram_alloc_internal() +> +   if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) +> +       ram_flags |= RAM_READONLY; +> +   new_block = qemu_ram_alloc_from_fd(...) +> +> +I have attached a draft fix; try it and let me know. +> +My console window looks fine before and after cpr, using +> +-vnc $hostip:0 -vga qxl +> +> +- Steve +Regarding the reproduce: when I launch the buggy version with the same +options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, +my VNC client silently hangs on the target after a while. Could it +happen on your stand as well? Could you try launching VM with +"-nographic -device qxl-vga"? That way VM's serial console is given you +directly in the shell, so when qxl driver crashes you're still able to +inspect the kernel messages. + +As for your patch, I can report that it doesn't resolve the issue as it +is. But I was able to track down another possible memory corruption +using your approach with readonly mmap'ing: + +> +Program terminated with signal SIGSEGV, Segmentation fault. 
+> +#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +> +412 d->ram->magic = cpu_to_le32(QXL_RAM_MAGIC); +> +[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] +> +(gdb) bt +> +#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +> +#1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, +> +errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 +> +#2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, +> +errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 +> +#3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, +> +errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 +> +#4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, +> +value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 +> +#5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, +> +v=0x5638996f3770, name=0x56389759b141 "realized", opaque=0x5638987893d0, +> +errp=0x7ffd3c2b84e0) +> +at ../qom/object.c:2374 +> +#6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, +> +name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) +> +at ../qom/object.c:1449 +> +#7 0x00005638970f8586 in object_property_set_qobject (obj=0x5638996e0e70, +> +name=0x56389759b141 "realized", value=0x5638996df900, errp=0x7ffd3c2b84e0) +> +at ../qom/qom-qobject.c:28 +> +#8 0x00005638970f3d8d in object_property_set_bool (obj=0x5638996e0e70, +> +name=0x56389759b141 "realized", value=true, errp=0x7ffd3c2b84e0) +> +at ../qom/object.c:1519 +> +#9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, +> +bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 +> +#10 0x0000563896dba675 in qdev_device_add_from_qdict (opts=0x5638996dfe50, +> +from_json=false, errp=0x7ffd3c2b84e0) at ../system/qdev-monitor.c:714 +> +#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, +> +errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733 +> +#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, opts=0x563898786150, +> +errp=0x56389855dc40 <error_fatal>) at ../system/vl.c:1207 +> +#13 0x000056389737a6cc in qemu_opts_foreach +> +(list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca +> +<device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>) +> +at ../util/qemu-option.c:1135 +> +#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/vl.c:2745 +> +#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 +> +<error_fatal>) at ../system/vl.c:2806 +> +#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) at +> +../system/vl.c:3838 +> +#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at +> +../system/main.c:72 +So the attached adjusted version of your patch does seem to help. At +least I can't reproduce the crash on my stand. + +I'm wondering, could it be useful to explicitly mark all the reused +memory regions readonly upon cpr-transfer, and then make them writable +back again after the migration is done? That way we will be segfaulting +early on instead of debugging tricky memory corruptions. 
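A minimal sketch of that debugging aid, assuming it is hooked in where a
preserved RAM block is re-mapped from its fd and where the incoming migration
completes; the helper name and the hook points are assumptions, not existing
QEMU code:

  /* Debugging aid sketch: flip a preserved RAM block between read-only and
   * read-write so any premature write by new QEMU faults immediately
   * instead of silently corrupting guest-visible memory. */
  #include <stddef.h>
  #include <stdio.h>
  #include <sys/mman.h>

  static int debug_set_block_writable(void *host, size_t len, int writable)
  {
      int prot = PROT_READ | (writable ? PROT_WRITE : 0);

      if (mprotect(host, len, prot) != 0) {  /* host/len must be page-aligned */
          perror("mprotect");
          return -1;
      }
      return 0;
  }

  /* Usage idea: debug_set_block_writable(host, len, 0) right after a reused
   * block is mapped from the preserved fd, and debug_set_block_writable(host,
   * len, 1) once the incoming migration has finished, before the guest runs. */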
+ +Andrey +0001-hw-qxl-cpr-support-preliminary.patch +Description: +Text Data + +On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: +On 3/4/25 9:05 PM, Steven Sistare wrote: +On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: +On 2/28/25 8:35 PM, Andrey Drobyshev wrote: +On 2/28/25 8:20 PM, Steven Sistare wrote: +On 2/28/2025 1:13 PM, Steven Sistare wrote: +On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: +Hi all, + +We've been experimenting with cpr-transfer migration mode recently +and +have discovered the following issue with the guest QXL driver: + +Run migration source: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-src.sock + +$EMULATOR -enable-kvm \ +      -machine q35 \ +      -cpu host -smp 2 -m 2G \ +      -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ +ram0,share=on\ +      -machine memory-backend=ram0 \ +      -machine aux-ram-share=on \ +      -drive file=$ROOTFS,media=disk,if=virtio \ +      -qmp unix:$QMPSOCK,server=on,wait=off \ +      -nographic \ +      -device qxl-vga +Run migration target: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-dst.sock +$EMULATOR -enable-kvm \ +      -machine q35 \ +      -cpu host -smp 2 -m 2G \ +      -object memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ +ram0,share=on\ +      -machine memory-backend=ram0 \ +      -machine aux-ram-share=on \ +      -drive file=$ROOTFS,media=disk,if=virtio \ +      -qmp unix:$QMPSOCK,server=on,wait=off \ +      -nographic \ +      -device qxl-vga \ +      -incoming tcp:0:44444 \ +      -incoming '{"channel-type": "cpr", "addr": { "transport": +"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' +Launch the migration: +QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +QMPSOCK=/var/run/alma8qmp-src.sock + +$QMPSHELL -p $QMPSOCK <<EOF +      migrate-set-parameters mode=cpr-transfer +      migrate channels=[{"channel-type":"main","addr": +{"transport":"socket","type":"inet","host":"0","port":"44444"}}, +{"channel-type":"cpr","addr": +{"transport":"socket","type":"unix","path":"/var/run/alma8cpr- +dst.sock"}}] +EOF +Then, after a while, QXL guest driver on target crashes spewing the +following messages: +[  73.962002] [TTM] Buffer eviction failed +[  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, +0x00000001) +[  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to +allocate VRAM BO +That seems to be a known kernel QXL driver bug: +https://lore.kernel.org/all/20220907094423.93581-1- +min_halo@163.com/T/ +https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ +(the latter discussion contains that reproduce script which speeds up +the crash in the guest): +#!/bin/bash + +chvt 3 + +for j in $(seq 80); do +          echo "$(date) starting round $j" +          if [ "$(journalctl --boot | grep "failed to allocate VRAM +BO")" != "" ]; then +                  echo "bug was reproduced after $j tries" +                  exit 1 +          fi +          for i in $(seq 100); do +                  dmesg > /dev/tty3 +          done +done + +echo "bug could not be reproduced" +exit 0 +The bug itself seems to remain unfixed, as I was able to reproduce +that +with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +cpr-transfer code also seems to be buggy as it triggers the crash - +without the cpr-transfer migration the above reproduce doesn't +lead to +crash on the source VM. 
+ +I suspect that, as cpr-transfer doesn't migrate the guest memory, but +rather passes it through the memory backend object, our code might +somehow corrupt the VRAM. However, I wasn't able to trace the +corruption so far. + +Could somebody help the investigation and take a look into this? Any +suggestions would be appreciated. Thanks! +Possibly some memory region created by qxl is not being preserved. +Try adding these traces to see what is preserved: + +-trace enable='*cpr*' +-trace enable='*ram_alloc*' +Also try adding this patch to see if it flags any ram blocks as not +compatible with cpr. A message is printed at migration start time. +   +https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- +email- +steven.sistare@oracle.com/ + +- Steve +With the traces enabled + the "migration: ram block cpr blockers" patch +applied: + +Source: +cpr_find_fd pc.bios, id 0 returns -1 +cpr_save_fd pc.bios, id 0, fd 22 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host +0x7fec18e00000 +cpr_find_fd pc.rom, id 0 returns -1 +cpr_save_fd pc.rom, id 0, fd 23 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host +0x7fec18c00000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 +cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 24 host 0x7fec18a00000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 25 host 0x7feb77e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 27 host 0x7fec18800000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +67108864 fd 28 host 0x7feb73c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +fd 34 host 0x7fec18600000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +2097152 fd 35 host 0x7fec18200000 +cpr_find_fd /rom@etc/table-loader, id 0 returns -1 +cpr_save_fd /rom@etc/table-loader, id 0, fd 36 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +fd 36 host 0x7feb8b600000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +37 host 0x7feb8b400000 + +cpr_state_save cpr-transfer mode +cpr_transfer_output /var/run/alma8cpr-dst.sock +Target: +cpr_transfer_input /var/run/alma8cpr-dst.sock +cpr_state_load cpr-transfer mode +cpr_find_fd pc.bios, id 0 returns 20 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host +0x7fcdc9800000 +cpr_find_fd pc.rom, id 0 returns 19 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host +0x7fcdc9600000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 18 host 0x7fcdc9400000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 17 host 0x7fcd27e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 16 host 0x7fcdc9200000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +67108864 fd 15 host 0x7fcd23c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +fd 14 host 0x7fcdc8800000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +2097152 fd 13 host 0x7fcdc8400000 +cpr_find_fd /rom@etc/table-loader, id 0 returns 11 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +fd 11 host 0x7fcdc8200000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +10 host 0x7fcd3be00000 +Looks like both vga.vram and qxl.vram are being preserved (with the same +addresses), and no incompatible ram blocks are found during migration. +Sorry, addressed are not the same, of course. However corresponding ram +blocks do seem to be preserved and initialized. +So far, I have not reproduced the guest driver failure. + +However, I have isolated places where new QEMU improperly writes to +the qxl memory regions prior to starting the guest, by mmap'ing them +readonly after cpr: + +  qemu_ram_alloc_internal() +    if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) +        ram_flags |= RAM_READONLY; +    new_block = qemu_ram_alloc_from_fd(...) + +I have attached a draft fix; try it and let me know. +My console window looks fine before and after cpr, using +-vnc $hostip:0 -vga qxl + +- Steve +Regarding the reproduce: when I launch the buggy version with the same +options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, +my VNC client silently hangs on the target after a while. Could it +happen on your stand as well? +cpr does not preserve the vnc connection and session. To test, I specify +port 0 for the source VM and port 1 for the dest. When the src vnc goes +dormant the dest vnc becomes active. +Could you try launching VM with +"-nographic -device qxl-vga"? That way VM's serial console is given you +directly in the shell, so when qxl driver crashes you're still able to +inspect the kernel messages. +I have been running like that, but have not reproduced the qxl driver crash, +and I suspect my guest image+kernel is too old. However, once I realized the +issue was post-cpr modification of qxl memory, I switched my attention to the +fix. +As for your patch, I can report that it doesn't resolve the issue as it +is. But I was able to track down another possible memory corruption +using your approach with readonly mmap'ing: +Program terminated with signal SIGSEGV, Segmentation fault. 
+#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +412 d->ram->magic = cpu_to_le32(QXL_RAM_MAGIC); +[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] +(gdb) bt +#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +#1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, +errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 +#2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, +errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 +#3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, +errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 +#4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, value=true, +errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 +#5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, v=0x5638996f3770, +name=0x56389759b141 "realized", opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) + at ../qom/object.c:2374 +#6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, name=0x56389759b141 +"realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) + at ../qom/object.c:1449 +#7 0x00005638970f8586 in object_property_set_qobject (obj=0x5638996e0e70, +name=0x56389759b141 "realized", value=0x5638996df900, errp=0x7ffd3c2b84e0) + at ../qom/qom-qobject.c:28 +#8 0x00005638970f3d8d in object_property_set_bool (obj=0x5638996e0e70, +name=0x56389759b141 "realized", value=true, errp=0x7ffd3c2b84e0) + at ../qom/object.c:1519 +#9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, bus=0x563898cf3c20, +errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 +#10 0x0000563896dba675 in qdev_device_add_from_qdict (opts=0x5638996dfe50, +from_json=false, errp=0x7ffd3c2b84e0) at ../system/qdev-monitor.c:714 +#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, errp=0x56389855dc40 +<error_fatal>) at ../system/qdev-monitor.c:733 +#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, opts=0x563898786150, +errp=0x56389855dc40 <error_fatal>) at ../system/vl.c:1207 +#13 0x000056389737a6cc in qemu_opts_foreach + (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca <device_init_func>, +opaque=0x0, errp=0x56389855dc40 <error_fatal>) + at ../util/qemu-option.c:1135 +#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/vl.c:2745 +#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 +<error_fatal>) at ../system/vl.c:2806 +#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) at +../system/vl.c:3838 +#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at +../system/main.c:72 +So the attached adjusted version of your patch does seem to help. At +least I can't reproduce the crash on my stand. +Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram are +definitely harmful. Try V2 of the patch, attached, which skips the lines +of init_qxl_ram that modify guest memory. +I'm wondering, could it be useful to explicitly mark all the reused +memory regions readonly upon cpr-transfer, and then make them writable +back again after the migration is done? That way we will be segfaulting +early on instead of debugging tricky memory corruptions. +It's a useful debugging technique, but changing protection on a large memory +region +can be too expensive for production due to TLB shootdowns. + +Also, there are cases where writes are performed but the value is guaranteed to +be the same: + qxl_post_load() + qxl_set_mode() + d->rom->mode = cpu_to_le32(modenr); +The value is the same because mode and shadow_rom.mode were passed in vmstate +from old qemu. 
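To make that distinction concrete: stores like the mode write above are
harmless because the value written is unchanged, whereas the realize-time
stores in init_qxl_ram are not, which is why they get skipped when the memory
was inherited across cpr. A toy illustration of that skip, with every name
invented for the example (this is not the attached V2 patch):

  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  #define DEMO_RAM_MAGIC 0x41525851u      /* stand-in for QXL_RAM_MAGIC */

  struct demo_ram { uint32_t magic; uint32_t int_pending; };

  /* Realize-time init: only touch guest-visible state when it was not
   * preserved across cpr-transfer, since the guest may still be using it. */
  static void demo_init_ram(struct demo_ram *ram, bool preserved_by_cpr)
  {
      if (preserved_by_cpr) {
          return;                         /* old QEMU already initialized it */
      }
      ram->magic = DEMO_RAM_MAGIC;
      ram->int_pending = 0;
  }

  int main(void)
  {
      struct demo_ram ram = { .magic = DEMO_RAM_MAGIC, .int_pending = 7 };

      demo_init_ram(&ram, true);          /* cpr path: state left untouched */
      printf("magic=0x%x int_pending=%u\n",
             (unsigned)ram.magic, (unsigned)ram.int_pending);
      return 0;
  }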
+ +- Steve +0001-hw-qxl-cpr-support-preliminary-V2.patch +Description: +Text document + +On 3/5/25 22:19, Steven Sistare wrote: +On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: +On 3/4/25 9:05 PM, Steven Sistare wrote: +On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: +On 2/28/25 8:35 PM, Andrey Drobyshev wrote: +On 2/28/25 8:20 PM, Steven Sistare wrote: +On 2/28/2025 1:13 PM, Steven Sistare wrote: +On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: +Hi all, + +We've been experimenting with cpr-transfer migration mode recently +and +have discovered the following issue with the guest QXL driver: + +Run migration source: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-src.sock + +$EMULATOR -enable-kvm \ +      -machine q35 \ +      -cpu host -smp 2 -m 2G \ +      -object +memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ +ram0,share=on\ +      -machine memory-backend=ram0 \ +      -machine aux-ram-share=on \ +      -drive file=$ROOTFS,media=disk,if=virtio \ +      -qmp unix:$QMPSOCK,server=on,wait=off \ +      -nographic \ +      -device qxl-vga +Run migration target: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-dst.sock +$EMULATOR -enable-kvm \ +      -machine q35 \ +      -cpu host -smp 2 -m 2G \ +      -object +memory-backend-file,id=ram0,size=2G,mem-path=/dev/shm/ +ram0,share=on\ +      -machine memory-backend=ram0 \ +      -machine aux-ram-share=on \ +      -drive file=$ROOTFS,media=disk,if=virtio \ +      -qmp unix:$QMPSOCK,server=on,wait=off \ +      -nographic \ +      -device qxl-vga \ +      -incoming tcp:0:44444 \ +      -incoming '{"channel-type": "cpr", "addr": { "transport": +"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' +Launch the migration: +QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +QMPSOCK=/var/run/alma8qmp-src.sock + +$QMPSHELL -p $QMPSOCK <<EOF +      migrate-set-parameters mode=cpr-transfer +      migrate channels=[{"channel-type":"main","addr": +{"transport":"socket","type":"inet","host":"0","port":"44444"}}, +{"channel-type":"cpr","addr": +{"transport":"socket","type":"unix","path":"/var/run/alma8cpr- +dst.sock"}}] +EOF +Then, after a while, QXL guest driver on target crashes spewing +the +following messages: +[  73.962002] [TTM] Buffer eviction failed +[  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, +0x00000001) +[  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* +failed to +allocate VRAM BO +That seems to be a known kernel QXL driver bug: +https://lore.kernel.org/all/20220907094423.93581-1- +min_halo@163.com/T/ +https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ +(the latter discussion contains that reproduce script which +speeds up +the crash in the guest): +#!/bin/bash + +chvt 3 + +for j in $(seq 80); do +          echo "$(date) starting round $j" +          if [ "$(journalctl --boot | grep "failed to +allocate VRAM +BO")" != "" ]; then +                  echo "bug was reproduced after $j tries" +                  exit 1 +          fi +          for i in $(seq 100); do +                  dmesg > /dev/tty3 +          done +done + +echo "bug could not be reproduced" +exit 0 +The bug itself seems to remain unfixed, as I was able to reproduce +that +with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +cpr-transfer code also seems to be buggy as it triggers the +crash - +without the cpr-transfer migration the above reproduce doesn't +lead to +crash on the source VM. 
+I suspect that, as cpr-transfer doesn't migrate the guest +memory, but +rather passes it through the memory backend object, our code might +somehow corrupt the VRAM. However, I wasn't able to trace the +corruption so far. +Could somebody help the investigation and take a look into +this? Any +suggestions would be appreciated. Thanks! +Possibly some memory region created by qxl is not being preserved. +Try adding these traces to see what is preserved: + +-trace enable='*cpr*' +-trace enable='*ram_alloc*' +Also try adding this patch to see if it flags any ram blocks as not +compatible with cpr. A message is printed at migration start time. +https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- +email- +steven.sistare@oracle.com/ + +- Steve +With the traces enabled + the "migration: ram block cpr blockers" +patch +applied: + +Source: +cpr_find_fd pc.bios, id 0 returns -1 +cpr_save_fd pc.bios, id 0, fd 22 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host +0x7fec18e00000 +cpr_find_fd pc.rom, id 0 returns -1 +cpr_save_fd pc.rom, id 0, fd 23 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host +0x7fec18c00000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 +cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 24 host 0x7fec18a00000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 25 host 0x7feb77e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 27 host 0x7fec18800000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +67108864 fd 28 host 0x7feb73c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +fd 34 host 0x7fec18600000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +2097152 fd 35 host 0x7fec18200000 +cpr_find_fd /rom@etc/table-loader, id 0 returns -1 +cpr_save_fd /rom@etc/table-loader, id 0, fd 36 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +fd 36 host 0x7feb8b600000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +37 host 0x7feb8b400000 + +cpr_state_save cpr-transfer mode +cpr_transfer_output /var/run/alma8cpr-dst.sock +Target: +cpr_transfer_input /var/run/alma8cpr-dst.sock +cpr_state_load cpr-transfer mode +cpr_find_fd pc.bios, id 0 returns 20 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host +0x7fcdc9800000 +cpr_find_fd pc.rom, id 0 returns 19 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host +0x7fcdc9600000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 18 host 0x7fcdc9400000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 17 host 0x7fcd27e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 16 host 0x7fcdc9200000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +67108864 fd 15 host 0x7fcd23c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +fd 14 host 0x7fcdc8800000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +2097152 fd 13 host 0x7fcdc8400000 +cpr_find_fd /rom@etc/table-loader, id 0 returns 11 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +fd 11 host 0x7fcdc8200000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +10 host 0x7fcd3be00000 +Looks like both vga.vram and qxl.vram are being preserved (with +the same +addresses), and no incompatible ram blocks are found during +migration. +Sorry, addressed are not the same, of course. However +corresponding ram +blocks do seem to be preserved and initialized. +So far, I have not reproduced the guest driver failure. + +However, I have isolated places where new QEMU improperly writes to +the qxl memory regions prior to starting the guest, by mmap'ing them +readonly after cpr: + +  qemu_ram_alloc_internal() +    if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) +        ram_flags |= RAM_READONLY; +    new_block = qemu_ram_alloc_from_fd(...) + +I have attached a draft fix; try it and let me know. +My console window looks fine before and after cpr, using +-vnc $hostip:0 -vga qxl + +- Steve +Regarding the reproduce: when I launch the buggy version with the same +options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, +my VNC client silently hangs on the target after a while. Could it +happen on your stand as well? +cpr does not preserve the vnc connection and session. To test, I specify +port 0 for the source VM and port 1 for the dest. When the src vnc goes +dormant the dest vnc becomes active. +Could you try launching VM with +"-nographic -device qxl-vga"? That way VM's serial console is given you +directly in the shell, so when qxl driver crashes you're still able to +inspect the kernel messages. +I have been running like that, but have not reproduced the qxl driver +crash, +and I suspect my guest image+kernel is too old. However, once I +realized the +issue was post-cpr modification of qxl memory, I switched my attention +to the +fix. +As for your patch, I can report that it doesn't resolve the issue as it +is. But I was able to track down another possible memory corruption +using your approach with readonly mmap'ing: +Program terminated with signal SIGSEGV, Segmentation fault. 
+#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +412        d->ram->magic      = cpu_to_le32(QXL_RAM_MAGIC); +[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] +(gdb) bt +#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +#1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, +errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 +#2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, +errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 +#3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, +errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 +#4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, +value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 +#5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, +v=0x5638996f3770, name=0x56389759b141 "realized", +opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) +    at ../qom/object.c:2374 +#6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, +name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) +    at ../qom/object.c:1449 +#7 0x00005638970f8586 in object_property_set_qobject +(obj=0x5638996e0e70, name=0x56389759b141 "realized", +value=0x5638996df900, errp=0x7ffd3c2b84e0) +    at ../qom/qom-qobject.c:28 +#8 0x00005638970f3d8d in object_property_set_bool +(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, +errp=0x7ffd3c2b84e0) +    at ../qom/object.c:1519 +#9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, +bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 +#10 0x0000563896dba675 in qdev_device_add_from_qdict +(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at +../system/qdev-monitor.c:714 +#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, +errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733 +#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, +opts=0x563898786150, errp=0x56389855dc40 <error_fatal>) at +../system/vl.c:1207 +#13 0x000056389737a6cc in qemu_opts_foreach +    (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca +<device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>) +    at ../util/qemu-option.c:1135 +#14 0x0000563896dc89b5 in qemu_create_cli_devices () at +../system/vl.c:2745 +#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 +<error_fatal>) at ../system/vl.c:2806 +#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) +at ../system/vl.c:3838 +#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at +../system/main.c:72 +So the attached adjusted version of your patch does seem to help. At +least I can't reproduce the crash on my stand. +Thanks for the stack trace; the calls to SPICE_RING_INIT in +init_qxl_ram are +definitely harmful. Try V2 of the patch, attached, which skips the lines +of init_qxl_ram that modify guest memory. +I'm wondering, could it be useful to explicitly mark all the reused +memory regions readonly upon cpr-transfer, and then make them writable +back again after the migration is done? That way we will be segfaulting +early on instead of debugging tricky memory corruptions. +It's a useful debugging technique, but changing protection on a large +memory region +can be too expensive for production due to TLB shootdowns. +Good point. Though we could move this code under non-default option to +avoid re-writing. 
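One way such an opt-in switch could look, purely as an illustration (the
environment variable is invented for this sketch, not an existing knob):

  #include <stdbool.h>
  #include <stdlib.h>

  /* Only apply the mprotect()-based read-only debug mapping when explicitly
   * requested, so the TLB-shootdown cost is never paid in production runs. */
  static bool cpr_debug_readonly_enabled(void)
  {
      const char *v = getenv("DEMO_CPR_DEBUG_RDONLY");

      return v != NULL && v[0] == '1';
  }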
+ +Den + +On 3/5/25 11:19 PM, Steven Sistare wrote: +> +On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: +> +> On 3/4/25 9:05 PM, Steven Sistare wrote: +> +>> On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: +> +>>> On 2/28/25 8:35 PM, Andrey Drobyshev wrote: +> +>>>> On 2/28/25 8:20 PM, Steven Sistare wrote: +> +>>>>> On 2/28/2025 1:13 PM, Steven Sistare wrote: +> +>>>>>> On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: +> +>>>>>>> Hi all, +> +>>>>>>> +> +>>>>>>> We've been experimenting with cpr-transfer migration mode recently +> +>>>>>>> and +> +>>>>>>> have discovered the following issue with the guest QXL driver: +> +>>>>>>> +> +>>>>>>> Run migration source: +> +>>>>>>>> EMULATOR=/path/to/emulator +> +>>>>>>>> ROOTFS=/path/to/image +> +>>>>>>>> QMPSOCK=/var/run/alma8qmp-src.sock +> +>>>>>>>> +> +>>>>>>>> $EMULATOR -enable-kvm \ +> +>>>>>>>>       -machine q35 \ +> +>>>>>>>>       -cpu host -smp 2 -m 2G \ +> +>>>>>>>>       -object memory-backend-file,id=ram0,size=2G,mem-path=/ +> +>>>>>>>> dev/shm/ +> +>>>>>>>> ram0,share=on\ +> +>>>>>>>>       -machine memory-backend=ram0 \ +> +>>>>>>>>       -machine aux-ram-share=on \ +> +>>>>>>>>       -drive file=$ROOTFS,media=disk,if=virtio \ +> +>>>>>>>>       -qmp unix:$QMPSOCK,server=on,wait=off \ +> +>>>>>>>>       -nographic \ +> +>>>>>>>>       -device qxl-vga +> +>>>>>>> +> +>>>>>>> Run migration target: +> +>>>>>>>> EMULATOR=/path/to/emulator +> +>>>>>>>> ROOTFS=/path/to/image +> +>>>>>>>> QMPSOCK=/var/run/alma8qmp-dst.sock +> +>>>>>>>> $EMULATOR -enable-kvm \ +> +>>>>>>>>       -machine q35 \ +> +>>>>>>>>       -cpu host -smp 2 -m 2G \ +> +>>>>>>>>       -object memory-backend-file,id=ram0,size=2G,mem-path=/ +> +>>>>>>>> dev/shm/ +> +>>>>>>>> ram0,share=on\ +> +>>>>>>>>       -machine memory-backend=ram0 \ +> +>>>>>>>>       -machine aux-ram-share=on \ +> +>>>>>>>>       -drive file=$ROOTFS,media=disk,if=virtio \ +> +>>>>>>>>       -qmp unix:$QMPSOCK,server=on,wait=off \ +> +>>>>>>>>       -nographic \ +> +>>>>>>>>       -device qxl-vga \ +> +>>>>>>>>       -incoming tcp:0:44444 \ +> +>>>>>>>>       -incoming '{"channel-type": "cpr", "addr": { "transport": +> +>>>>>>>> "socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' +> +>>>>>>> +> +>>>>>>> +> +>>>>>>> Launch the migration: +> +>>>>>>>> QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +> +>>>>>>>> QMPSOCK=/var/run/alma8qmp-src.sock +> +>>>>>>>> +> +>>>>>>>> $QMPSHELL -p $QMPSOCK <<EOF +> +>>>>>>>>       migrate-set-parameters mode=cpr-transfer +> +>>>>>>>>       migrate channels=[{"channel-type":"main","addr": +> +>>>>>>>> {"transport":"socket","type":"inet","host":"0","port":"44444"}}, +> +>>>>>>>> {"channel-type":"cpr","addr": +> +>>>>>>>> {"transport":"socket","type":"unix","path":"/var/run/alma8cpr- +> +>>>>>>>> dst.sock"}}] +> +>>>>>>>> EOF +> +>>>>>>> +> +>>>>>>> Then, after a while, QXL guest driver on target crashes spewing the +> +>>>>>>> following messages: +> +>>>>>>>> [  73.962002] [TTM] Buffer eviction failed +> +>>>>>>>> [  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, +> +>>>>>>>> 0x00000001) +> +>>>>>>>> [  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to +> +>>>>>>>> allocate VRAM BO +> +>>>>>>> +> +>>>>>>> That seems to be a known kernel QXL driver bug: +> +>>>>>>> +> +>>>>>>> +https://lore.kernel.org/all/20220907094423.93581-1- +> +>>>>>>> min_halo@163.com/T/ +> +>>>>>>> +https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ +> +>>>>>>> +> +>>>>>>> (the latter discussion contains that reproduce script which +> +>>>>>>> 
speeds up +> +>>>>>>> the crash in the guest): +> +>>>>>>>> #!/bin/bash +> +>>>>>>>> +> +>>>>>>>> chvt 3 +> +>>>>>>>> +> +>>>>>>>> for j in $(seq 80); do +> +>>>>>>>>           echo "$(date) starting round $j" +> +>>>>>>>>           if [ "$(journalctl --boot | grep "failed to allocate +> +>>>>>>>> VRAM +> +>>>>>>>> BO")" != "" ]; then +> +>>>>>>>>                   echo "bug was reproduced after $j tries" +> +>>>>>>>>                   exit 1 +> +>>>>>>>>           fi +> +>>>>>>>>           for i in $(seq 100); do +> +>>>>>>>>                   dmesg > /dev/tty3 +> +>>>>>>>>           done +> +>>>>>>>> done +> +>>>>>>>> +> +>>>>>>>> echo "bug could not be reproduced" +> +>>>>>>>> exit 0 +> +>>>>>>> +> +>>>>>>> The bug itself seems to remain unfixed, as I was able to reproduce +> +>>>>>>> that +> +>>>>>>> with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +> +>>>>>>> cpr-transfer code also seems to be buggy as it triggers the crash - +> +>>>>>>> without the cpr-transfer migration the above reproduce doesn't +> +>>>>>>> lead to +> +>>>>>>> crash on the source VM. +> +>>>>>>> +> +>>>>>>> I suspect that, as cpr-transfer doesn't migrate the guest +> +>>>>>>> memory, but +> +>>>>>>> rather passes it through the memory backend object, our code might +> +>>>>>>> somehow corrupt the VRAM. However, I wasn't able to trace the +> +>>>>>>> corruption so far. +> +>>>>>>> +> +>>>>>>> Could somebody help the investigation and take a look into +> +>>>>>>> this? Any +> +>>>>>>> suggestions would be appreciated. Thanks! +> +>>>>>> +> +>>>>>> Possibly some memory region created by qxl is not being preserved. +> +>>>>>> Try adding these traces to see what is preserved: +> +>>>>>> +> +>>>>>> -trace enable='*cpr*' +> +>>>>>> -trace enable='*ram_alloc*' +> +>>>>> +> +>>>>> Also try adding this patch to see if it flags any ram blocks as not +> +>>>>> compatible with cpr. A message is printed at migration start time. 
+> +>>>>>    +https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- +> +>>>>> email- +> +>>>>> steven.sistare@oracle.com/ +> +>>>>> +> +>>>>> - Steve +> +>>>>> +> +>>>> +> +>>>> With the traces enabled + the "migration: ram block cpr blockers" +> +>>>> patch +> +>>>> applied: +> +>>>> +> +>>>> Source: +> +>>>>> cpr_find_fd pc.bios, id 0 returns -1 +> +>>>>> cpr_save_fd pc.bios, id 0, fd 22 +> +>>>>> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host +> +>>>>> 0x7fec18e00000 +> +>>>>> cpr_find_fd pc.rom, id 0 returns -1 +> +>>>>> cpr_save_fd pc.rom, id 0, fd 23 +> +>>>>> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host +> +>>>>> 0x7fec18c00000 +> +>>>>> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 +> +>>>>> cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 +> +>>>>> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +> +>>>>> 262144 fd 24 host 0x7fec18a00000 +> +>>>>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 +> +>>>>> cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 +> +>>>>> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +> +>>>>> 67108864 fd 25 host 0x7feb77e00000 +> +>>>>> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 +> +>>>>> cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 +> +>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +> +>>>>> fd 27 host 0x7fec18800000 +> +>>>>> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 +> +>>>>> cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 +> +>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +> +>>>>> 67108864 fd 28 host 0x7feb73c00000 +> +>>>>> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 +> +>>>>> cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 +> +>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +> +>>>>> fd 34 host 0x7fec18600000 +> +>>>>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 +> +>>>>> cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 +> +>>>>> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +> +>>>>> 2097152 fd 35 host 0x7fec18200000 +> +>>>>> cpr_find_fd /rom@etc/table-loader, id 0 returns -1 +> +>>>>> cpr_save_fd /rom@etc/table-loader, id 0, fd 36 +> +>>>>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +> +>>>>> fd 36 host 0x7feb8b600000 +> +>>>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 +> +>>>>> cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 +> +>>>>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +> +>>>>> 37 host 0x7feb8b400000 +> +>>>>> +> +>>>>> cpr_state_save cpr-transfer mode +> +>>>>> cpr_transfer_output /var/run/alma8cpr-dst.sock +> +>>>> +> +>>>> Target: +> +>>>>> cpr_transfer_input /var/run/alma8cpr-dst.sock +> +>>>>> cpr_state_load cpr-transfer mode +> +>>>>> cpr_find_fd pc.bios, id 0 returns 20 +> +>>>>> qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host +> +>>>>> 0x7fcdc9800000 +> +>>>>> cpr_find_fd pc.rom, id 0 returns 19 +> +>>>>> qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host +> +>>>>> 0x7fcdc9600000 +> +>>>>> cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 +> +>>>>> qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +> +>>>>> 262144 fd 18 host 0x7fcdc9400000 +> +>>>>> cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 +> +>>>>> qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +> +>>>>> 67108864 fd 17 host 0x7fcd27e00000 +> +>>>>> cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 +> +>>>>> 
qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +> +>>>>> fd 16 host 0x7fcdc9200000 +> +>>>>> cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 +> +>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +> +>>>>> 67108864 fd 15 host 0x7fcd23c00000 +> +>>>>> cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 +> +>>>>> qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +> +>>>>> fd 14 host 0x7fcdc8800000 +> +>>>>> cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 +> +>>>>> qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +> +>>>>> 2097152 fd 13 host 0x7fcdc8400000 +> +>>>>> cpr_find_fd /rom@etc/table-loader, id 0 returns 11 +> +>>>>> qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +> +>>>>> fd 11 host 0x7fcdc8200000 +> +>>>>> cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 +> +>>>>> qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +> +>>>>> 10 host 0x7fcd3be00000 +> +>>>> +> +>>>> Looks like both vga.vram and qxl.vram are being preserved (with the +> +>>>> same +> +>>>> addresses), and no incompatible ram blocks are found during migration. +> +>>> +> +>>> Sorry, addressed are not the same, of course. However corresponding +> +>>> ram +> +>>> blocks do seem to be preserved and initialized. +> +>> +> +>> So far, I have not reproduced the guest driver failure. +> +>> +> +>> However, I have isolated places where new QEMU improperly writes to +> +>> the qxl memory regions prior to starting the guest, by mmap'ing them +> +>> readonly after cpr: +> +>> +> +>>   qemu_ram_alloc_internal() +> +>>     if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) +> +>>         ram_flags |= RAM_READONLY; +> +>>     new_block = qemu_ram_alloc_from_fd(...) +> +>> +> +>> I have attached a draft fix; try it and let me know. +> +>> My console window looks fine before and after cpr, using +> +>> -vnc $hostip:0 -vga qxl +> +>> +> +>> - Steve +> +> +> +> Regarding the reproduce: when I launch the buggy version with the same +> +> options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, +> +> my VNC client silently hangs on the target after a while. Could it +> +> happen on your stand as well? +> +> +cpr does not preserve the vnc connection and session. To test, I specify +> +port 0 for the source VM and port 1 for the dest. When the src vnc goes +> +dormant the dest vnc becomes active. +> +Sure, I meant that VNC on the dest (on the port 1) works for a while +after the migration and then hangs, apparently after the guest QXL crash. + +> +> Could you try launching VM with +> +> "-nographic -device qxl-vga"? That way VM's serial console is given you +> +> directly in the shell, so when qxl driver crashes you're still able to +> +> inspect the kernel messages. +> +> +I have been running like that, but have not reproduced the qxl driver +> +crash, +> +and I suspect my guest image+kernel is too old. +Yes, that's probably the case. But the crash occurs on my Fedora 41 +guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to +be buggy. + + +> +However, once I realized the +> +issue was post-cpr modification of qxl memory, I switched my attention +> +to the +> +fix. +> +> +> As for your patch, I can report that it doesn't resolve the issue as it +> +> is. But I was able to track down another possible memory corruption +> +> using your approach with readonly mmap'ing: +> +> +> +>> Program terminated with signal SIGSEGV, Segmentation fault. 
+> +>> #0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +> +>> 412        d->ram->magic      = cpu_to_le32(QXL_RAM_MAGIC); +> +>> [Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] +> +>> (gdb) bt +> +>> #0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +> +>> #1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, +> +>> errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 +> +>> #2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, +> +>> errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 +> +>> #3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, +> +>> errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 +> +>> #4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, +> +>> value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 +> +>> #5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, +> +>> v=0x5638996f3770, name=0x56389759b141 "realized", +> +>> opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) +> +>>     at ../qom/object.c:2374 +> +>> #6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, +> +>> name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) +> +>>     at ../qom/object.c:1449 +> +>> #7 0x00005638970f8586 in object_property_set_qobject +> +>> (obj=0x5638996e0e70, name=0x56389759b141 "realized", +> +>> value=0x5638996df900, errp=0x7ffd3c2b84e0) +> +>>     at ../qom/qom-qobject.c:28 +> +>> #8 0x00005638970f3d8d in object_property_set_bool +> +>> (obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, +> +>> errp=0x7ffd3c2b84e0) +> +>>     at ../qom/object.c:1519 +> +>> #9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, +> +>> bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 +> +>> #10 0x0000563896dba675 in qdev_device_add_from_qdict +> +>> (opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../ +> +>> system/qdev-monitor.c:714 +> +>> #11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, +> +>> errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733 +> +>> #12 0x0000563896dc48f1 in device_init_func (opaque=0x0, +> +>> opts=0x563898786150, errp=0x56389855dc40 <error_fatal>) at ../system/ +> +>> vl.c:1207 +> +>> #13 0x000056389737a6cc in qemu_opts_foreach +> +>>     (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca +> +>> <device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>) +> +>>     at ../util/qemu-option.c:1135 +> +>> #14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/ +> +>> vl.c:2745 +> +>> #15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 +> +>> <error_fatal>) at ../system/vl.c:2806 +> +>> #16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) +> +>> at ../system/vl.c:3838 +> +>> #17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../ +> +>> system/main.c:72 +> +> +> +> So the attached adjusted version of your patch does seem to help. At +> +> least I can't reproduce the crash on my stand. +> +> +Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram +> +are +> +definitely harmful. Try V2 of the patch, attached, which skips the lines +> +of init_qxl_ram that modify guest memory. +> +Thanks, your v2 patch does seem to prevent the crash. Would you re-send +it to the list as a proper fix? + +> +> I'm wondering, could it be useful to explicitly mark all the reused +> +> memory regions readonly upon cpr-transfer, and then make them writable +> +> back again after the migration is done? 
That way we will be segfaulting +> +> early on instead of debugging tricky memory corruptions. +> +> +It's a useful debugging technique, but changing protection on a large +> +memory region +> +can be too expensive for production due to TLB shootdowns. +> +> +Also, there are cases where writes are performed but the value is +> +guaranteed to +> +be the same: +> + qxl_post_load() +> +   qxl_set_mode() +> +     d->rom->mode = cpu_to_le32(modenr); +> +The value is the same because mode and shadow_rom.mode were passed in +> +vmstate +> +from old qemu. +> +There're also cases where devices' ROM might be re-initialized. E.g. +this segfault occures upon further exploration of RO mapped RAM blocks: + +> +Program terminated with signal SIGSEGV, Segmentation fault. +> +#0 __memmove_avx_unaligned_erms () at +> +../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 +> +664 rep movsb +> +[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))] +> +(gdb) bt +> +#0 __memmove_avx_unaligned_erms () at +> +../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 +> +#1 0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, +> +owner=0x55aa2019ac10, name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true) +> +at ../hw/core/loader.c:1032 +> +#2 0x000055aa1d031577 in rom_add_blob +> +(name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, +> +max_len=2097152, addr=18446744073709551615, fw_file_name=0x55aa1da51f13 +> +"etc/acpi/tables", fw_callback=0x55aa1d441f59 <acpi_build_update>, +> +callback_opaque=0x55aa20ff0010, as=0x0, read_only=true) at +> +../hw/core/loader.c:1147 +> +#3 0x000055aa1cfd788d in acpi_add_rom_blob +> +(update=0x55aa1d441f59 <acpi_build_update>, opaque=0x55aa20ff0010, +> +blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at +> +../hw/acpi/utils.c:46 +> +#4 0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720 +> +#5 0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0) +> +at ../hw/i386/pc.c:638 +> +#6 0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 +> +<machine_init_done_notifiers>, data=0x0) at ../util/notify.c:39 +> +#7 0x000055aa1d039ee5 in qdev_machine_creation_done () at +> +../hw/core/machine.c:1749 +> +#8 0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 +> +<error_fatal>) at ../system/vl.c:2779 +> +#9 0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 +> +<error_fatal>) at ../system/vl.c:2807 +> +#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at +> +../system/vl.c:3838 +> +#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at +> +../system/main.c:72 +I'm not sure whether ACPI tables ROM in particular is rewritten with the +same content, but there might be cases where ROM can be read from file +system upon initialization. That is undesirable as guest kernel +certainly won't be too happy about sudden change of the device's ROM +content. + +So the issue we're dealing with here is any unwanted memory related +device initialization upon cpr. + +For now the only thing that comes to my mind is to make a test where we +put as many devices as we can into a VM, make ram blocks RO upon cpr +(and remap them as RW later after migration is done, if needed), and +catch any unwanted memory violations. As Den suggested, we might +consider adding that behaviour as a separate non-default option (or +"migrate" command flag specific to cpr-transfer), which would only be +used in the testing. 
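
+To make that test-only behaviour concrete, here is a minimal standalone
+sketch (plain POSIX code, not QEMU internals; every name in it is made up
+for illustration) of the idea for a single preserved RAM block: map the fd
+received over the cpr channel read-only, let device realize run so that any
+stray write faults immediately, then restore normal protection once the
+incoming migration has finished.
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/mman.h>
+
+/* Map a preserved, fd-backed RAM block read-only so that any write made
+ * while the target QEMU is still realizing devices faults immediately. */
+static void *map_preserved_block_ro(int fd, size_t size)
+{
+    void *host = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
+    if (host == MAP_FAILED) {
+        perror("mmap");
+        exit(1);
+    }
+    return host;
+}
+
+/* Once the incoming migration is complete, give the block back its normal
+ * read-write protection. */
+static void unprotect_after_migration(void *host, size_t size)
+{
+    if (mprotect(host, size, PROT_READ | PROT_WRITE) < 0) {
+        perror("mprotect");
+        exit(1);
+    }
+}
+
+int main(void)
+{
+    size_t size = 64 << 20;                /* a qxl.vram-sized block, 64 MiB */
+    int fd = memfd_create("qxl.vram", 0);  /* stand-in for the fd cpr passes */
+    if (fd < 0 || ftruncate(fd, size) < 0) {
+        perror("memfd_create/ftruncate");
+        return 1;
+    }
+    char *host = map_preserved_block_ro(fd, size);
+    /* ... device realize would run here; a write such as host[0] = 1 at
+     * this point would SIGSEGV and pinpoint the offending code path ... */
+    unprotect_after_migration(host, size);
+    host[0] = 1;                           /* fine once write access returns */
+    return 0;
+}
+
+In the real cpr-transfer path the fd comes from cpr_find_fd() rather than
+memfd_create(), and, as Steve notes above, flipping protection on
+guest-sized regions is likely too costly for anything but a test build.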
+ +Andrey + +On 3/6/25 16:16, Andrey Drobyshev wrote: +On 3/5/25 11:19 PM, Steven Sistare wrote: +On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: +On 3/4/25 9:05 PM, Steven Sistare wrote: +On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: +On 2/28/25 8:35 PM, Andrey Drobyshev wrote: +On 2/28/25 8:20 PM, Steven Sistare wrote: +On 2/28/2025 1:13 PM, Steven Sistare wrote: +On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: +Hi all, + +We've been experimenting with cpr-transfer migration mode recently +and +have discovered the following issue with the guest QXL driver: + +Run migration source: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-src.sock + +$EMULATOR -enable-kvm \ +       -machine q35 \ +       -cpu host -smp 2 -m 2G \ +       -object memory-backend-file,id=ram0,size=2G,mem-path=/ +dev/shm/ +ram0,share=on\ +       -machine memory-backend=ram0 \ +       -machine aux-ram-share=on \ +       -drive file=$ROOTFS,media=disk,if=virtio \ +       -qmp unix:$QMPSOCK,server=on,wait=off \ +       -nographic \ +       -device qxl-vga +Run migration target: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-dst.sock +$EMULATOR -enable-kvm \ +       -machine q35 \ +       -cpu host -smp 2 -m 2G \ +       -object memory-backend-file,id=ram0,size=2G,mem-path=/ +dev/shm/ +ram0,share=on\ +       -machine memory-backend=ram0 \ +       -machine aux-ram-share=on \ +       -drive file=$ROOTFS,media=disk,if=virtio \ +       -qmp unix:$QMPSOCK,server=on,wait=off \ +       -nographic \ +       -device qxl-vga \ +       -incoming tcp:0:44444 \ +       -incoming '{"channel-type": "cpr", "addr": { "transport": +"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' +Launch the migration: +QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +QMPSOCK=/var/run/alma8qmp-src.sock + +$QMPSHELL -p $QMPSOCK <<EOF +       migrate-set-parameters mode=cpr-transfer +       migrate channels=[{"channel-type":"main","addr": +{"transport":"socket","type":"inet","host":"0","port":"44444"}}, +{"channel-type":"cpr","addr": +{"transport":"socket","type":"unix","path":"/var/run/alma8cpr- +dst.sock"}}] +EOF +Then, after a while, QXL guest driver on target crashes spewing the +following messages: +[  73.962002] [TTM] Buffer eviction failed +[  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, +0x00000001) +[  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to +allocate VRAM BO +That seems to be a known kernel QXL driver bug: +https://lore.kernel.org/all/20220907094423.93581-1- +min_halo@163.com/T/ +https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ +(the latter discussion contains that reproduce script which +speeds up +the crash in the guest): +#!/bin/bash + +chvt 3 + +for j in $(seq 80); do +           echo "$(date) starting round $j" +           if [ "$(journalctl --boot | grep "failed to allocate +VRAM +BO")" != "" ]; then +                   echo "bug was reproduced after $j tries" +                   exit 1 +           fi +           for i in $(seq 100); do +                   dmesg > /dev/tty3 +           done +done + +echo "bug could not be reproduced" +exit 0 +The bug itself seems to remain unfixed, as I was able to reproduce +that +with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +cpr-transfer code also seems to be buggy as it triggers the crash - +without the cpr-transfer migration the above reproduce doesn't +lead to +crash on the source VM. 
+ +I suspect that, as cpr-transfer doesn't migrate the guest +memory, but +rather passes it through the memory backend object, our code might +somehow corrupt the VRAM. However, I wasn't able to trace the +corruption so far. + +Could somebody help the investigation and take a look into +this? Any +suggestions would be appreciated. Thanks! +Possibly some memory region created by qxl is not being preserved. +Try adding these traces to see what is preserved: + +-trace enable='*cpr*' +-trace enable='*ram_alloc*' +Also try adding this patch to see if it flags any ram blocks as not +compatible with cpr. A message is printed at migration start time. +    +https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- +email- +steven.sistare@oracle.com/ + +- Steve +With the traces enabled + the "migration: ram block cpr blockers" +patch +applied: + +Source: +cpr_find_fd pc.bios, id 0 returns -1 +cpr_save_fd pc.bios, id 0, fd 22 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host +0x7fec18e00000 +cpr_find_fd pc.rom, id 0 returns -1 +cpr_save_fd pc.rom, id 0, fd 23 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host +0x7fec18c00000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 +cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 24 host 0x7fec18a00000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 25 host 0x7feb77e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 27 host 0x7fec18800000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +67108864 fd 28 host 0x7feb73c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +fd 34 host 0x7fec18600000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +2097152 fd 35 host 0x7fec18200000 +cpr_find_fd /rom@etc/table-loader, id 0 returns -1 +cpr_save_fd /rom@etc/table-loader, id 0, fd 36 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +fd 36 host 0x7feb8b600000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +37 host 0x7feb8b400000 + +cpr_state_save cpr-transfer mode +cpr_transfer_output /var/run/alma8cpr-dst.sock +Target: +cpr_transfer_input /var/run/alma8cpr-dst.sock +cpr_state_load cpr-transfer mode +cpr_find_fd pc.bios, id 0 returns 20 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host +0x7fcdc9800000 +cpr_find_fd pc.rom, id 0 returns 19 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host +0x7fcdc9600000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 18 host 0x7fcdc9400000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 17 host 0x7fcd27e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 16 host 0x7fcdc9200000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +67108864 fd 15 host 0x7fcd23c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +fd 14 host 0x7fcdc8800000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +2097152 fd 13 host 0x7fcdc8400000 +cpr_find_fd /rom@etc/table-loader, id 0 returns 11 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +fd 11 host 0x7fcdc8200000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +10 host 0x7fcd3be00000 +Looks like both vga.vram and qxl.vram are being preserved (with the +same +addresses), and no incompatible ram blocks are found during migration. +Sorry, addressed are not the same, of course. However corresponding +ram +blocks do seem to be preserved and initialized. +So far, I have not reproduced the guest driver failure. + +However, I have isolated places where new QEMU improperly writes to +the qxl memory regions prior to starting the guest, by mmap'ing them +readonly after cpr: + +   qemu_ram_alloc_internal() +     if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) +         ram_flags |= RAM_READONLY; +     new_block = qemu_ram_alloc_from_fd(...) + +I have attached a draft fix; try it and let me know. +My console window looks fine before and after cpr, using +-vnc $hostip:0 -vga qxl + +- Steve +Regarding the reproduce: when I launch the buggy version with the same +options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, +my VNC client silently hangs on the target after a while. Could it +happen on your stand as well? +cpr does not preserve the vnc connection and session. To test, I specify +port 0 for the source VM and port 1 for the dest. When the src vnc goes +dormant the dest vnc becomes active. +Sure, I meant that VNC on the dest (on the port 1) works for a while +after the migration and then hangs, apparently after the guest QXL crash. +Could you try launching VM with +"-nographic -device qxl-vga"? That way VM's serial console is given you +directly in the shell, so when qxl driver crashes you're still able to +inspect the kernel messages. +I have been running like that, but have not reproduced the qxl driver +crash, +and I suspect my guest image+kernel is too old. +Yes, that's probably the case. But the crash occurs on my Fedora 41 +guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to +be buggy. +However, once I realized the +issue was post-cpr modification of qxl memory, I switched my attention +to the +fix. +As for your patch, I can report that it doesn't resolve the issue as it +is. But I was able to track down another possible memory corruption +using your approach with readonly mmap'ing: +Program terminated with signal SIGSEGV, Segmentation fault. 
+#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +412        d->ram->magic      = cpu_to_le32(QXL_RAM_MAGIC); +[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] +(gdb) bt +#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +#1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, +errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 +#2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, +errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 +#3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, +errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 +#4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, +value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 +#5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, +v=0x5638996f3770, name=0x56389759b141 "realized", +opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) +     at ../qom/object.c:2374 +#6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, +name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) +     at ../qom/object.c:1449 +#7 0x00005638970f8586 in object_property_set_qobject +(obj=0x5638996e0e70, name=0x56389759b141 "realized", +value=0x5638996df900, errp=0x7ffd3c2b84e0) +     at ../qom/qom-qobject.c:28 +#8 0x00005638970f3d8d in object_property_set_bool +(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, +errp=0x7ffd3c2b84e0) +     at ../qom/object.c:1519 +#9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, +bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 +#10 0x0000563896dba675 in qdev_device_add_from_qdict +(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../ +system/qdev-monitor.c:714 +#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, +errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733 +#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, +opts=0x563898786150, errp=0x56389855dc40 <error_fatal>) at ../system/ +vl.c:1207 +#13 0x000056389737a6cc in qemu_opts_foreach +     (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca +<device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>) +     at ../util/qemu-option.c:1135 +#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/ +vl.c:2745 +#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 +<error_fatal>) at ../system/vl.c:2806 +#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) +at ../system/vl.c:3838 +#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../ +system/main.c:72 +So the attached adjusted version of your patch does seem to help. At +least I can't reproduce the crash on my stand. +Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram +are +definitely harmful. Try V2 of the patch, attached, which skips the lines +of init_qxl_ram that modify guest memory. +Thanks, your v2 patch does seem to prevent the crash. Would you re-send +it to the list as a proper fix? +I'm wondering, could it be useful to explicitly mark all the reused +memory regions readonly upon cpr-transfer, and then make them writable +back again after the migration is done? That way we will be segfaulting +early on instead of debugging tricky memory corruptions. +It's a useful debugging technique, but changing protection on a large +memory region +can be too expensive for production due to TLB shootdowns. 
+ +Also, there are cases where writes are performed but the value is +guaranteed to +be the same: +  qxl_post_load() +    qxl_set_mode() +      d->rom->mode = cpu_to_le32(modenr); +The value is the same because mode and shadow_rom.mode were passed in +vmstate +from old qemu. +There're also cases where devices' ROM might be re-initialized. E.g. +this segfault occures upon further exploration of RO mapped RAM blocks: +Program terminated with signal SIGSEGV, Segmentation fault. +#0 __memmove_avx_unaligned_erms () at +../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 +664 rep movsb +[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))] +(gdb) bt +#0 __memmove_avx_unaligned_erms () at +../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 +#1 0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10, +name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true) + at ../hw/core/loader.c:1032 +#2 0x000055aa1d031577 in rom_add_blob + (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152, +addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables", +fw_callback=0x55aa1d441f59 <acpi_build_update>, callback_opaque=0x55aa20ff0010, as=0x0, +read_only=true) at ../hw/core/loader.c:1147 +#3 0x000055aa1cfd788d in acpi_add_rom_blob + (update=0x55aa1d441f59 <acpi_build_update>, opaque=0x55aa20ff0010, +blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46 +#4 0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720 +#5 0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0) +at ../hw/i386/pc.c:638 +#6 0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 +<machine_init_done_notifiers>, data=0x0) at ../util/notify.c:39 +#7 0x000055aa1d039ee5 in qdev_machine_creation_done () at +../hw/core/machine.c:1749 +#8 0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 +<error_fatal>) at ../system/vl.c:2779 +#9 0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 +<error_fatal>) at ../system/vl.c:2807 +#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at +../system/vl.c:3838 +#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at +../system/main.c:72 +I'm not sure whether ACPI tables ROM in particular is rewritten with the +same content, but there might be cases where ROM can be read from file +system upon initialization. That is undesirable as guest kernel +certainly won't be too happy about sudden change of the device's ROM +content. + +So the issue we're dealing with here is any unwanted memory related +device initialization upon cpr. + +For now the only thing that comes to my mind is to make a test where we +put as many devices as we can into a VM, make ram blocks RO upon cpr +(and remap them as RW later after migration is done, if needed), and +catch any unwanted memory violations. As Den suggested, we might +consider adding that behaviour as a separate non-default option (or +"migrate" command flag specific to cpr-transfer), which would only be +used in the testing. + +Andrey +No way. ACPI with the source must be used in the same way as BIOSes +and optional ROMs. + +Den + +On 3/6/2025 10:52 AM, Denis V. 
Lunev wrote: +On 3/6/25 16:16, Andrey Drobyshev wrote: +On 3/5/25 11:19 PM, Steven Sistare wrote: +On 3/5/2025 11:50 AM, Andrey Drobyshev wrote: +On 3/4/25 9:05 PM, Steven Sistare wrote: +On 2/28/2025 1:37 PM, Andrey Drobyshev wrote: +On 2/28/25 8:35 PM, Andrey Drobyshev wrote: +On 2/28/25 8:20 PM, Steven Sistare wrote: +On 2/28/2025 1:13 PM, Steven Sistare wrote: +On 2/28/2025 12:39 PM, Andrey Drobyshev wrote: +Hi all, + +We've been experimenting with cpr-transfer migration mode recently +and +have discovered the following issue with the guest QXL driver: + +Run migration source: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-src.sock + +$EMULATOR -enable-kvm \ +       -machine q35 \ +       -cpu host -smp 2 -m 2G \ +       -object memory-backend-file,id=ram0,size=2G,mem-path=/ +dev/shm/ +ram0,share=on\ +       -machine memory-backend=ram0 \ +       -machine aux-ram-share=on \ +       -drive file=$ROOTFS,media=disk,if=virtio \ +       -qmp unix:$QMPSOCK,server=on,wait=off \ +       -nographic \ +       -device qxl-vga +Run migration target: +EMULATOR=/path/to/emulator +ROOTFS=/path/to/image +QMPSOCK=/var/run/alma8qmp-dst.sock +$EMULATOR -enable-kvm \ +       -machine q35 \ +       -cpu host -smp 2 -m 2G \ +       -object memory-backend-file,id=ram0,size=2G,mem-path=/ +dev/shm/ +ram0,share=on\ +       -machine memory-backend=ram0 \ +       -machine aux-ram-share=on \ +       -drive file=$ROOTFS,media=disk,if=virtio \ +       -qmp unix:$QMPSOCK,server=on,wait=off \ +       -nographic \ +       -device qxl-vga \ +       -incoming tcp:0:44444 \ +       -incoming '{"channel-type": "cpr", "addr": { "transport": +"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}' +Launch the migration: +QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell +QMPSOCK=/var/run/alma8qmp-src.sock + +$QMPSHELL -p $QMPSOCK <<EOF +       migrate-set-parameters mode=cpr-transfer +       migrate channels=[{"channel-type":"main","addr": +{"transport":"socket","type":"inet","host":"0","port":"44444"}}, +{"channel-type":"cpr","addr": +{"transport":"socket","type":"unix","path":"/var/run/alma8cpr- +dst.sock"}}] +EOF +Then, after a while, QXL guest driver on target crashes spewing the +following messages: +[  73.962002] [TTM] Buffer eviction failed +[  73.962072] qxl 0000:00:02.0: object_init failed for (3149824, +0x00000001) +[  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to +allocate VRAM BO +That seems to be a known kernel QXL driver bug: +https://lore.kernel.org/all/20220907094423.93581-1- +min_halo@163.com/T/ +https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/ +(the latter discussion contains that reproduce script which +speeds up +the crash in the guest): +#!/bin/bash + +chvt 3 + +for j in $(seq 80); do +           echo "$(date) starting round $j" +           if [ "$(journalctl --boot | grep "failed to allocate +VRAM +BO")" != "" ]; then +                   echo "bug was reproduced after $j tries" +                   exit 1 +           fi +           for i in $(seq 100); do +                   dmesg > /dev/tty3 +           done +done + +echo "bug could not be reproduced" +exit 0 +The bug itself seems to remain unfixed, as I was able to reproduce +that +with Fedora 41 guest, as well as AlmaLinux 8 guest. However our +cpr-transfer code also seems to be buggy as it triggers the crash - +without the cpr-transfer migration the above reproduce doesn't +lead to +crash on the source VM. 
+ +I suspect that, as cpr-transfer doesn't migrate the guest +memory, but +rather passes it through the memory backend object, our code might +somehow corrupt the VRAM. However, I wasn't able to trace the +corruption so far. + +Could somebody help the investigation and take a look into +this? Any +suggestions would be appreciated. Thanks! +Possibly some memory region created by qxl is not being preserved. +Try adding these traces to see what is preserved: + +-trace enable='*cpr*' +-trace enable='*ram_alloc*' +Also try adding this patch to see if it flags any ram blocks as not +compatible with cpr. A message is printed at migration start time. +    +https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- +email- +steven.sistare@oracle.com/ + +- Steve +With the traces enabled + the "migration: ram block cpr blockers" +patch +applied: + +Source: +cpr_find_fd pc.bios, id 0 returns -1 +cpr_save_fd pc.bios, id 0, fd 22 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host +0x7fec18e00000 +cpr_find_fd pc.rom, id 0 returns -1 +cpr_save_fd pc.rom, id 0, fd 23 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host +0x7fec18c00000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 +cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 24 host 0x7fec18a00000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 25 host 0x7feb77e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 27 host 0x7fec18800000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +67108864 fd 28 host 0x7feb73c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +fd 34 host 0x7fec18600000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +2097152 fd 35 host 0x7fec18200000 +cpr_find_fd /rom@etc/table-loader, id 0 returns -1 +cpr_save_fd /rom@etc/table-loader, id 0, fd 36 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +fd 36 host 0x7feb8b600000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +37 host 0x7feb8b400000 + +cpr_state_save cpr-transfer mode +cpr_transfer_output /var/run/alma8cpr-dst.sock +Target: +cpr_transfer_input /var/run/alma8cpr-dst.sock +cpr_state_load cpr-transfer mode +cpr_find_fd pc.bios, id 0 returns 20 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host +0x7fcdc9800000 +cpr_find_fd pc.rom, id 0 returns 19 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host +0x7fcdc9600000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 18 host 0x7fcdc9400000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 17 host 0x7fcd27e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 
+qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 16 host 0x7fcdc9200000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +67108864 fd 15 host 0x7fcd23c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +fd 14 host 0x7fcdc8800000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +2097152 fd 13 host 0x7fcdc8400000 +cpr_find_fd /rom@etc/table-loader, id 0 returns 11 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +fd 11 host 0x7fcdc8200000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +10 host 0x7fcd3be00000 +Looks like both vga.vram and qxl.vram are being preserved (with the +same +addresses), and no incompatible ram blocks are found during migration. +Sorry, addressed are not the same, of course. However corresponding +ram +blocks do seem to be preserved and initialized. +So far, I have not reproduced the guest driver failure. + +However, I have isolated places where new QEMU improperly writes to +the qxl memory regions prior to starting the guest, by mmap'ing them +readonly after cpr: + +   qemu_ram_alloc_internal() +     if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) +         ram_flags |= RAM_READONLY; +     new_block = qemu_ram_alloc_from_fd(...) + +I have attached a draft fix; try it and let me know. +My console window looks fine before and after cpr, using +-vnc $hostip:0 -vga qxl + +- Steve +Regarding the reproduce: when I launch the buggy version with the same +options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, +my VNC client silently hangs on the target after a while. Could it +happen on your stand as well? +cpr does not preserve the vnc connection and session. To test, I specify +port 0 for the source VM and port 1 for the dest. When the src vnc goes +dormant the dest vnc becomes active. +Sure, I meant that VNC on the dest (on the port 1) works for a while +after the migration and then hangs, apparently after the guest QXL crash. +Could you try launching VM with +"-nographic -device qxl-vga"? That way VM's serial console is given you +directly in the shell, so when qxl driver crashes you're still able to +inspect the kernel messages. +I have been running like that, but have not reproduced the qxl driver +crash, +and I suspect my guest image+kernel is too old. +Yes, that's probably the case. But the crash occurs on my Fedora 41 +guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to +be buggy. +However, once I realized the +issue was post-cpr modification of qxl memory, I switched my attention +to the +fix. +As for your patch, I can report that it doesn't resolve the issue as it +is. But I was able to track down another possible memory corruption +using your approach with readonly mmap'ing: +Program terminated with signal SIGSEGV, Segmentation fault. 
+#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +412        d->ram->magic      = cpu_to_le32(QXL_RAM_MAGIC); +[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] +(gdb) bt +#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +#1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, +errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 +#2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, +errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 +#3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, +errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 +#4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, +value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 +#5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, +v=0x5638996f3770, name=0x56389759b141 "realized", +opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) +     at ../qom/object.c:2374 +#6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, +name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) +     at ../qom/object.c:1449 +#7 0x00005638970f8586 in object_property_set_qobject +(obj=0x5638996e0e70, name=0x56389759b141 "realized", +value=0x5638996df900, errp=0x7ffd3c2b84e0) +     at ../qom/qom-qobject.c:28 +#8 0x00005638970f3d8d in object_property_set_bool +(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, +errp=0x7ffd3c2b84e0) +     at ../qom/object.c:1519 +#9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, +bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 +#10 0x0000563896dba675 in qdev_device_add_from_qdict +(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../ +system/qdev-monitor.c:714 +#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, +errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733 +#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, +opts=0x563898786150, errp=0x56389855dc40 <error_fatal>) at ../system/ +vl.c:1207 +#13 0x000056389737a6cc in qemu_opts_foreach +     (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca +<device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>) +     at ../util/qemu-option.c:1135 +#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/ +vl.c:2745 +#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 +<error_fatal>) at ../system/vl.c:2806 +#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) +at ../system/vl.c:3838 +#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../ +system/main.c:72 +So the attached adjusted version of your patch does seem to help. At +least I can't reproduce the crash on my stand. +Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram +are +definitely harmful. Try V2 of the patch, attached, which skips the lines +of init_qxl_ram that modify guest memory. +Thanks, your v2 patch does seem to prevent the crash. Would you re-send +it to the list as a proper fix? +Yes. Was waiting for your confirmation. +I'm wondering, could it be useful to explicitly mark all the reused +memory regions readonly upon cpr-transfer, and then make them writable +back again after the migration is done? That way we will be segfaulting +early on instead of debugging tricky memory corruptions. +It's a useful debugging technique, but changing protection on a large +memory region +can be too expensive for production due to TLB shootdowns. 
+ +Also, there are cases where writes are performed but the value is +guaranteed to +be the same: +  qxl_post_load() +    qxl_set_mode() +      d->rom->mode = cpu_to_le32(modenr); +The value is the same because mode and shadow_rom.mode were passed in +vmstate +from old qemu. +There're also cases where devices' ROM might be re-initialized. E.g. +this segfault occures upon further exploration of RO mapped RAM blocks: +Program terminated with signal SIGSEGV, Segmentation fault. +#0 __memmove_avx_unaligned_erms () at +../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 +664            rep    movsb +[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))] +(gdb) bt +#0 __memmove_avx_unaligned_erms () at +../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 +#1 0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10, +name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true) +    at ../hw/core/loader.c:1032 +#2 0x000055aa1d031577 in rom_add_blob +    (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152, +addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables", +fw_callback=0x55aa1d441f59 <acpi_build_update>, callback_opaque=0x55aa20ff0010, as=0x0, +read_only=true) at ../hw/core/loader.c:1147 +#3 0x000055aa1cfd788d in acpi_add_rom_blob +    (update=0x55aa1d441f59 <acpi_build_update>, opaque=0x55aa20ff0010, +blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46 +#4 0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720 +#5 0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0) +at ../hw/i386/pc.c:638 +#6 0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 +<machine_init_done_notifiers>, data=0x0) at ../util/notify.c:39 +#7 0x000055aa1d039ee5 in qdev_machine_creation_done () at +../hw/core/machine.c:1749 +#8 0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 +<error_fatal>) at ../system/vl.c:2779 +#9 0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 +<error_fatal>) at ../system/vl.c:2807 +#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at +../system/vl.c:3838 +#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at +../system/main.c:72 +I'm not sure whether ACPI tables ROM in particular is rewritten with the +same content, but there might be cases where ROM can be read from file +system upon initialization. That is undesirable as guest kernel +certainly won't be too happy about sudden change of the device's ROM +content. + +So the issue we're dealing with here is any unwanted memory related +device initialization upon cpr. + +For now the only thing that comes to my mind is to make a test where we +put as many devices as we can into a VM, make ram blocks RO upon cpr +(and remap them as RW later after migration is done, if needed), and +catch any unwanted memory violations. As Den suggested, we might +consider adding that behaviour as a separate non-default option (or +"migrate" command flag specific to cpr-transfer), which would only be +used in the testing. +I'll look into adding an option, but there may be too many false positives, +such as the qxl_set_mode case above. And the maintainers may object to me +eliminating the false positives by adding more CPR_IN tests, due to gratuitous +(from their POV) ugliness. + +But I will use the technique to look for more write violations. +Andrey +No way. ACPI with the source must be used in the same way as BIOSes +and optional ROMs. 
+Yup, it's a bug. Will fix.
+
+- Steve
+
+see
+https://lore.kernel.org/qemu-devel/1741380954-341079-1-git-send-email-steven.sistare@oracle.com/
+- Steve
+
+On 3/6/2025 11:13 AM, Steven Sistare wrote:
+On 3/6/2025 10:52 AM, Denis V. Lunev wrote:
+On 3/6/25 16:16, Andrey Drobyshev wrote:
+On 3/5/25 11:19 PM, Steven Sistare wrote:
+On 3/5/2025 11:50 AM, Andrey Drobyshev wrote:
+On 3/4/25 9:05 PM, Steven Sistare wrote:
+On 2/28/2025 1:37 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:35 PM, Andrey Drobyshev wrote:
+On 2/28/25 8:20 PM, Steven Sistare wrote:
+On 2/28/2025 1:13 PM, Steven Sistare wrote:
+On 2/28/2025 12:39 PM, Andrey Drobyshev wrote:
+Hi all,
+
+We've been experimenting with cpr-transfer migration mode recently
+and
+have discovered the following issue with the guest QXL driver:
+
+Run migration source:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$EMULATOR -enable-kvm \
+       -machine q35 \
+       -cpu host -smp 2 -m 2G \
+       -object memory-backend-file,id=ram0,size=2G,mem-path=/
+dev/shm/
+ram0,share=on\
+       -machine memory-backend=ram0 \
+       -machine aux-ram-share=on \
+       -drive file=$ROOTFS,media=disk,if=virtio \
+       -qmp unix:$QMPSOCK,server=on,wait=off \
+       -nographic \
+       -device qxl-vga
+Run migration target:
+EMULATOR=/path/to/emulator
+ROOTFS=/path/to/image
+QMPSOCK=/var/run/alma8qmp-dst.sock
+$EMULATOR -enable-kvm \
+       -machine q35 \
+       -cpu host -smp 2 -m 2G \
+       -object memory-backend-file,id=ram0,size=2G,mem-path=/
+dev/shm/
+ram0,share=on\
+       -machine memory-backend=ram0 \
+       -machine aux-ram-share=on \
+       -drive file=$ROOTFS,media=disk,if=virtio \
+       -qmp unix:$QMPSOCK,server=on,wait=off \
+       -nographic \
+       -device qxl-vga \
+       -incoming tcp:0:44444 \
+       -incoming '{"channel-type": "cpr", "addr": { "transport":
+"socket", "type": "unix", "path": "/var/run/alma8cpr-dst.sock"}}'
+Launch the migration:
+QMPSHELL=/root/src/qemu/master/scripts/qmp/qmp-shell
+QMPSOCK=/var/run/alma8qmp-src.sock
+
+$QMPSHELL -p $QMPSOCK <<EOF
+       migrate-set-parameters mode=cpr-transfer
+       migrate channels=[{"channel-type":"main","addr":
+{"transport":"socket","type":"inet","host":"0","port":"44444"}},
+{"channel-type":"cpr","addr":
+{"transport":"socket","type":"unix","path":"/var/run/alma8cpr-
+dst.sock"}}]
+EOF
+Then, after a while, QXL guest driver on target crashes spewing the
+following messages:
+[  73.962002] [TTM] Buffer eviction failed
+[  73.962072] qxl 0000:00:02.0: object_init failed for (3149824,
+0x00000001)
+[  73.962081] [drm:qxl_alloc_bo_reserved [qxl]] *ERROR* failed to
+allocate VRAM BO
+That seems to be a known kernel QXL driver bug:
+https://lore.kernel.org/all/20220907094423.93581-1-
+min_halo@163.com/T/
+https://lore.kernel.org/lkml/ZTgydqRlK6WX_b29@eldamar.lan/
+(the latter discussion contains that reproduce script which
+speeds up
+the crash in the guest):
+#!/bin/bash
+
+chvt 3
+
+for j in $(seq 80); do
+           echo "$(date) starting round $j"
+           if [ "$(journalctl --boot | grep "failed to allocate
+VRAM
+BO")" != "" ]; then
+                   echo "bug was reproduced after $j tries"
+                   exit 1
+           fi
+           for i in $(seq 100); do
+                   dmesg > /dev/tty3
+           done
+done
+
+echo "bug could not be reproduced"
+exit 0
+The bug itself seems to remain unfixed, as I was able to reproduce
+that
+with
Fedora 41 guest, as well as AlmaLinux 8 guest. However our +cpr-transfer code also seems to be buggy as it triggers the crash - +without the cpr-transfer migration the above reproduce doesn't +lead to +crash on the source VM. + +I suspect that, as cpr-transfer doesn't migrate the guest +memory, but +rather passes it through the memory backend object, our code might +somehow corrupt the VRAM. However, I wasn't able to trace the +corruption so far. + +Could somebody help the investigation and take a look into +this? Any +suggestions would be appreciated. Thanks! +Possibly some memory region created by qxl is not being preserved. +Try adding these traces to see what is preserved: + +-trace enable='*cpr*' +-trace enable='*ram_alloc*' +Also try adding this patch to see if it flags any ram blocks as not +compatible with cpr. A message is printed at migration start time. +    +https://lore.kernel.org/qemu-devel/1740667681-257312-1-git-send- +email- +steven.sistare@oracle.com/ + +- Steve +With the traces enabled + the "migration: ram block cpr blockers" +patch +applied: + +Source: +cpr_find_fd pc.bios, id 0 returns -1 +cpr_save_fd pc.bios, id 0, fd 22 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 22 host +0x7fec18e00000 +cpr_find_fd pc.rom, id 0 returns -1 +cpr_save_fd pc.rom, id 0, fd 23 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 23 host +0x7fec18c00000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns -1 +cpr_save_fd 0000:00:01.0/e1000e.rom, id 0, fd 24 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 24 host 0x7fec18a00000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/vga.vram, id 0, fd 25 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 25 host 0x7feb77e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vrom, id 0, fd 27 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 27 host 0x7fec18800000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.vram, id 0, fd 28 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +67108864 fd 28 host 0x7feb73c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns -1 +cpr_save_fd 0000:00:02.0/qxl.rom, id 0, fd 34 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +fd 34 host 0x7fec18600000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/tables, id 0, fd 35 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +2097152 fd 35 host 0x7fec18200000 +cpr_find_fd /rom@etc/table-loader, id 0 returns -1 +cpr_save_fd /rom@etc/table-loader, id 0, fd 36 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +fd 36 host 0x7feb8b600000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns -1 +cpr_save_fd /rom@etc/acpi/rsdp, id 0, fd 37 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +37 host 0x7feb8b400000 + +cpr_state_save cpr-transfer mode +cpr_transfer_output /var/run/alma8cpr-dst.sock +Target: +cpr_transfer_input /var/run/alma8cpr-dst.sock +cpr_state_load cpr-transfer mode +cpr_find_fd pc.bios, id 0 returns 20 +qemu_ram_alloc_shared pc.bios size 262144 max_size 262144 fd 20 host +0x7fcdc9800000 +cpr_find_fd pc.rom, id 0 returns 19 +qemu_ram_alloc_shared pc.rom size 131072 max_size 131072 fd 19 host +0x7fcdc9600000 +cpr_find_fd 0000:00:01.0/e1000e.rom, id 0 returns 18 +qemu_ram_alloc_shared 0000:00:01.0/e1000e.rom size 262144 max_size +262144 fd 18 host 
0x7fcdc9400000 +cpr_find_fd 0000:00:02.0/vga.vram, id 0 returns 17 +qemu_ram_alloc_shared 0000:00:02.0/vga.vram size 67108864 max_size +67108864 fd 17 host 0x7fcd27e00000 +cpr_find_fd 0000:00:02.0/qxl.vrom, id 0 returns 16 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vrom size 8192 max_size 8192 +fd 16 host 0x7fcdc9200000 +cpr_find_fd 0000:00:02.0/qxl.vram, id 0 returns 15 +qemu_ram_alloc_shared 0000:00:02.0/qxl.vram size 67108864 max_size +67108864 fd 15 host 0x7fcd23c00000 +cpr_find_fd 0000:00:02.0/qxl.rom, id 0 returns 14 +qemu_ram_alloc_shared 0000:00:02.0/qxl.rom size 65536 max_size 65536 +fd 14 host 0x7fcdc8800000 +cpr_find_fd /rom@etc/acpi/tables, id 0 returns 13 +qemu_ram_alloc_shared /rom@etc/acpi/tables size 131072 max_size +2097152 fd 13 host 0x7fcdc8400000 +cpr_find_fd /rom@etc/table-loader, id 0 returns 11 +qemu_ram_alloc_shared /rom@etc/table-loader size 4096 max_size 65536 +fd 11 host 0x7fcdc8200000 +cpr_find_fd /rom@etc/acpi/rsdp, id 0 returns 10 +qemu_ram_alloc_shared /rom@etc/acpi/rsdp size 4096 max_size 4096 fd +10 host 0x7fcd3be00000 +Looks like both vga.vram and qxl.vram are being preserved (with the +same +addresses), and no incompatible ram blocks are found during migration. +Sorry, addressed are not the same, of course. However corresponding +ram +blocks do seem to be preserved and initialized. +So far, I have not reproduced the guest driver failure. + +However, I have isolated places where new QEMU improperly writes to +the qxl memory regions prior to starting the guest, by mmap'ing them +readonly after cpr: + +   qemu_ram_alloc_internal() +     if (reused && (strstr(name, "qxl") || strstr("name", "vga"))) +         ram_flags |= RAM_READONLY; +     new_block = qemu_ram_alloc_from_fd(...) + +I have attached a draft fix; try it and let me know. +My console window looks fine before and after cpr, using +-vnc $hostip:0 -vga qxl + +- Steve +Regarding the reproduce: when I launch the buggy version with the same +options as you, i.e. "-vnc 0.0.0.0:$port -vga qxl", and do cpr-transfer, +my VNC client silently hangs on the target after a while. Could it +happen on your stand as well? +cpr does not preserve the vnc connection and session. To test, I specify +port 0 for the source VM and port 1 for the dest. When the src vnc goes +dormant the dest vnc becomes active. +Sure, I meant that VNC on the dest (on the port 1) works for a while +after the migration and then hangs, apparently after the guest QXL crash. +Could you try launching VM with +"-nographic -device qxl-vga"? That way VM's serial console is given you +directly in the shell, so when qxl driver crashes you're still able to +inspect the kernel messages. +I have been running like that, but have not reproduced the qxl driver +crash, +and I suspect my guest image+kernel is too old. +Yes, that's probably the case. But the crash occurs on my Fedora 41 +guest with the 6.11.5-300.fc41.x86_64 kernel, so newer kernels seem to +be buggy. +However, once I realized the +issue was post-cpr modification of qxl memory, I switched my attention +to the +fix. +As for your patch, I can report that it doesn't resolve the issue as it +is. But I was able to track down another possible memory corruption +using your approach with readonly mmap'ing: +Program terminated with signal SIGSEGV, Segmentation fault. 
+#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +412        d->ram->magic      = cpu_to_le32(QXL_RAM_MAGIC); +[Current thread is 1 (Thread 0x7f1a4f83b480 (LWP 229798))] +(gdb) bt +#0 init_qxl_ram (d=0x5638996e0e70) at ../hw/display/qxl.c:412 +#1 0x0000563896e7f467 in qxl_realize_common (qxl=0x5638996e0e70, +errp=0x7ffd3c2b8170) at ../hw/display/qxl.c:2142 +#2 0x0000563896e7fda1 in qxl_realize_primary (dev=0x5638996e0e70, +errp=0x7ffd3c2b81d0) at ../hw/display/qxl.c:2257 +#3 0x0000563896c7e8f2 in pci_qdev_realize (qdev=0x5638996e0e70, +errp=0x7ffd3c2b8250) at ../hw/pci/pci.c:2174 +#4 0x00005638970eb54b in device_set_realized (obj=0x5638996e0e70, +value=true, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:494 +#5 0x00005638970f5e14 in property_set_bool (obj=0x5638996e0e70, +v=0x5638996f3770, name=0x56389759b141 "realized", +opaque=0x5638987893d0, errp=0x7ffd3c2b84e0) +     at ../qom/object.c:2374 +#6 0x00005638970f39f8 in object_property_set (obj=0x5638996e0e70, +name=0x56389759b141 "realized", v=0x5638996f3770, errp=0x7ffd3c2b84e0) +     at ../qom/object.c:1449 +#7 0x00005638970f8586 in object_property_set_qobject +(obj=0x5638996e0e70, name=0x56389759b141 "realized", +value=0x5638996df900, errp=0x7ffd3c2b84e0) +     at ../qom/qom-qobject.c:28 +#8 0x00005638970f3d8d in object_property_set_bool +(obj=0x5638996e0e70, name=0x56389759b141 "realized", value=true, +errp=0x7ffd3c2b84e0) +     at ../qom/object.c:1519 +#9 0x00005638970eacb0 in qdev_realize (dev=0x5638996e0e70, +bus=0x563898cf3c20, errp=0x7ffd3c2b84e0) at ../hw/core/qdev.c:276 +#10 0x0000563896dba675 in qdev_device_add_from_qdict +(opts=0x5638996dfe50, from_json=false, errp=0x7ffd3c2b84e0) at ../ +system/qdev-monitor.c:714 +#11 0x0000563896dba721 in qdev_device_add (opts=0x563898786150, +errp=0x56389855dc40 <error_fatal>) at ../system/qdev-monitor.c:733 +#12 0x0000563896dc48f1 in device_init_func (opaque=0x0, +opts=0x563898786150, errp=0x56389855dc40 <error_fatal>) at ../system/ +vl.c:1207 +#13 0x000056389737a6cc in qemu_opts_foreach +     (list=0x563898427b60 <qemu_device_opts>, func=0x563896dc48ca +<device_init_func>, opaque=0x0, errp=0x56389855dc40 <error_fatal>) +     at ../util/qemu-option.c:1135 +#14 0x0000563896dc89b5 in qemu_create_cli_devices () at ../system/ +vl.c:2745 +#15 0x0000563896dc8c00 in qmp_x_exit_preconfig (errp=0x56389855dc40 +<error_fatal>) at ../system/vl.c:2806 +#16 0x0000563896dcb5de in qemu_init (argc=33, argv=0x7ffd3c2b8948) +at ../system/vl.c:3838 +#17 0x0000563897297323 in main (argc=33, argv=0x7ffd3c2b8948) at ../ +system/main.c:72 +So the attached adjusted version of your patch does seem to help. At +least I can't reproduce the crash on my stand. +Thanks for the stack trace; the calls to SPICE_RING_INIT in init_qxl_ram +are +definitely harmful. Try V2 of the patch, attached, which skips the lines +of init_qxl_ram that modify guest memory. +Thanks, your v2 patch does seem to prevent the crash. Would you re-send +it to the list as a proper fix? +Yes. Was waiting for your confirmation. +I'm wondering, could it be useful to explicitly mark all the reused +memory regions readonly upon cpr-transfer, and then make them writable +back again after the migration is done? That way we will be segfaulting +early on instead of debugging tricky memory corruptions. +It's a useful debugging technique, but changing protection on a large +memory region +can be too expensive for production due to TLB shootdowns. 
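+For reference, the read-only trick described above boils down to an
+mprotect() toggle on the reused host mapping. The sketch below is a
+debug-only illustration, not the actual CPR code: the host pointer,
+length and call sites are assumptions, and as noted the TLB shootdown
+cost makes this unsuitable for production:
+
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdio.h>
+#include <sys/mman.h>
+
+/* Debug aid: write-protect a reused, page-aligned guest RAM mapping so
+ * that any stray store after CPR faults with SIGSEGV (and a backtrace
+ * pointing at the offending store) instead of silently corrupting
+ * guest memory. */
+static int ram_block_set_readonly(void *host, size_t len, bool ro)
+{
+    int prot = ro ? PROT_READ : (PROT_READ | PROT_WRITE);
+
+    if (mprotect(host, len, prot) < 0) {
+        perror("mprotect");
+        return -1;
+    }
+    return 0;
+}
+
+/* Typical debug-build use: protect the block right after its fd is
+ * reused on the target, and drop the protection again once migration
+ * has completed and the device state is fully restored. */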
+ +Also, there are cases where writes are performed but the value is +guaranteed to +be the same: +  qxl_post_load() +    qxl_set_mode() +      d->rom->mode = cpu_to_le32(modenr); +The value is the same because mode and shadow_rom.mode were passed in +vmstate +from old qemu. +There're also cases where devices' ROM might be re-initialized. E.g. +this segfault occures upon further exploration of RO mapped RAM blocks: +Program terminated with signal SIGSEGV, Segmentation fault. +#0 __memmove_avx_unaligned_erms () at +../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 +664            rep    movsb +[Current thread is 1 (Thread 0x7f6e7d08b480 (LWP 310379))] +(gdb) bt +#0 __memmove_avx_unaligned_erms () at +../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:664 +#1 0x000055aa1d030ecd in rom_set_mr (rom=0x55aa200ba380, owner=0x55aa2019ac10, +name=0x7fffb8272bc0 "/rom@etc/acpi/tables", ro=true) +    at ../hw/core/loader.c:1032 +#2 0x000055aa1d031577 in rom_add_blob +    (name=0x55aa1da51f13 "etc/acpi/tables", blob=0x55aa208a1070, len=131072, max_len=2097152, +addr=18446744073709551615, fw_file_name=0x55aa1da51f13 "etc/acpi/tables", +fw_callback=0x55aa1d441f59 <acpi_build_update>, callback_opaque=0x55aa20ff0010, as=0x0, +read_only=true) at ../hw/core/loader.c:1147 +#3 0x000055aa1cfd788d in acpi_add_rom_blob +    (update=0x55aa1d441f59 <acpi_build_update>, opaque=0x55aa20ff0010, +blob=0x55aa1fc9aa00, name=0x55aa1da51f13 "etc/acpi/tables") at ../hw/acpi/utils.c:46 +#4 0x000055aa1d44213f in acpi_setup () at ../hw/i386/acpi-build.c:2720 +#5 0x000055aa1d434199 in pc_machine_done (notifier=0x55aa1ff15050, data=0x0) +at ../hw/i386/pc.c:638 +#6 0x000055aa1d876845 in notifier_list_notify (list=0x55aa1ea25c10 +<machine_init_done_notifiers>, data=0x0) at ../util/notify.c:39 +#7 0x000055aa1d039ee5 in qdev_machine_creation_done () at +../hw/core/machine.c:1749 +#8 0x000055aa1d2c7b3e in qemu_machine_creation_done (errp=0x55aa1ea5cc40 +<error_fatal>) at ../system/vl.c:2779 +#9 0x000055aa1d2c7c7d in qmp_x_exit_preconfig (errp=0x55aa1ea5cc40 +<error_fatal>) at ../system/vl.c:2807 +#10 0x000055aa1d2ca64f in qemu_init (argc=35, argv=0x7fffb82730e8) at +../system/vl.c:3838 +#11 0x000055aa1d79638c in main (argc=35, argv=0x7fffb82730e8) at +../system/main.c:72 +I'm not sure whether ACPI tables ROM in particular is rewritten with the +same content, but there might be cases where ROM can be read from file +system upon initialization. That is undesirable as guest kernel +certainly won't be too happy about sudden change of the device's ROM +content. + +So the issue we're dealing with here is any unwanted memory related +device initialization upon cpr. + +For now the only thing that comes to my mind is to make a test where we +put as many devices as we can into a VM, make ram blocks RO upon cpr +(and remap them as RW later after migration is done, if needed), and +catch any unwanted memory violations. As Den suggested, we might +consider adding that behaviour as a separate non-default option (or +"migrate" command flag specific to cpr-transfer), which would only be +used in the testing. +I'll look into adding an option, but there may be too many false positives, +such as the qxl_set_mode case above. And the maintainers may object to me +eliminating the false positives by adding more CPR_IN tests, due to gratuitous +(from their POV) ugliness. + +But I will use the technique to look for more write violations. +Andrey +No way. ACPI with the source must be used in the same way as BIOSes +and optional ROMs. 
+Yup, its a bug. Will fix. + +- Steve + diff --git a/classification_output/01/mistranslation/6178292 b/classification_output/01/mistranslation/6178292 new file mode 100644 index 000000000..f13db3b86 --- /dev/null +++ b/classification_output/01/mistranslation/6178292 @@ -0,0 +1,258 @@ +mistranslation: 0.930 +semantic: 0.928 +instruction: 0.905 +other: 0.890 + +[BUG][RFC] CPR transfer Issues: Socket permissions and PID files + +Hello, + +While testing CPR transfer I encountered two issues. The first is that the +transfer fails when running with pidfiles due to the destination qemu process +attempting to create the pidfile while it is still locked by the source +process. The second is that the transfer fails when running with the -run-with +user=$USERID parameter. This is because the destination qemu process creates +the UNIX sockets used for the CPR transfer before dropping to the lower +permissioned user, which causes them to be owned by the original user. The +source qemu process then does not have permission to connect to it because it +is already running as the lesser permissioned user. + +Reproducing the first issue: + +Create a source and destination qemu instance associated with the same VM where +both processes have the -pidfile parameter passed on the command line. You +should see the following error on the command line of the second process: + +qemu-system-x86_64: cannot create PID file: Cannot lock pid file: Resource +temporarily unavailable + +Reproducing the second issue: + +Create a source and destination qemu instance associated with the same VM where +both processes have -run-with user=$USERID passed on the command line, where +$USERID is a different user from the one launching the processes. Then attempt +a CPR transfer using UNIX sockets for the main and cpr sockets. You should +receive the following error via QMP: +{"error": {"class": "GenericError", "desc": "Failed to connect to 'cpr.sock': +Permission denied"}} + +I provided a minimal patch that works around the second issue. + +Thank you, +Ben Chaney + +--- +include/system/os-posix.h | 4 ++++ +os-posix.c | 8 -------- +util/qemu-sockets.c | 21 +++++++++++++++++++++ +3 files changed, 25 insertions(+), 8 deletions(-) + +diff --git a/include/system/os-posix.h b/include/system/os-posix.h +index ce5b3bccf8..2a414a914a 100644 +--- a/include/system/os-posix.h ++++ b/include/system/os-posix.h +@@ -55,6 +55,10 @@ void os_setup_limits(void); +void os_setup_post(void); +int os_mlock(bool on_fault); + ++extern struct passwd *user_pwd; ++extern uid_t user_uid; ++extern gid_t user_gid; ++ +/** +* qemu_alloc_stack: +* @sz: pointer to a size_t holding the requested usable stack size +diff --git a/os-posix.c b/os-posix.c +index 52925c23d3..9369b312a0 100644 +--- a/os-posix.c ++++ b/os-posix.c +@@ -86,14 +86,6 @@ void os_set_proc_name(const char *s) +} + + +-/* +- * Must set all three of these at once. +- * Legal combinations are unset by name by uid +- */ +-static struct passwd *user_pwd; /* NULL non-NULL NULL */ +-static uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ +-static gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ +- +/* +* Prepare to change user ID. user_id can be one of 3 forms: +* - a username, in which case user ID will be changed to its uid, +diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c +index 77477c1cd5..987977ead9 100644 +--- a/util/qemu-sockets.c ++++ b/util/qemu-sockets.c +@@ -871,6 +871,14 @@ static bool saddr_is_tight(UnixSocketAddress *saddr) +#endif +} + ++/* ++ * Must set all three of these at once. 
++ * Legal combinations are unset by name by uid ++ */ ++struct passwd *user_pwd; /* NULL non-NULL NULL */ ++uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ ++gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ ++ +static int unix_listen_saddr(UnixSocketAddress *saddr, +int num, +Error **errp) +@@ -947,6 +955,19 @@ static int unix_listen_saddr(UnixSocketAddress *saddr, +error_setg_errno(errp, errno, "Failed to bind socket to %s", path); +goto err; +} ++ if (user_pwd) { ++ if (chown(un.sun_path, user_pwd->pw_uid, user_pwd->pw_gid) < 0) { ++ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", +path); ++ goto err; ++ } ++ } ++ else if (user_uid != -1 && user_gid != -1) { ++ if (chown(un.sun_path, user_uid, user_gid) < 0) { ++ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", +path); ++ goto err; ++ } ++ } ++ +if (listen(sock, num) < 0) { +error_setg_errno(errp, errno, "Failed to listen on socket"); +goto err; +-- +2.40.1 + +Thank you Ben. I appreciate you testing CPR and shaking out the bugs. +I will study these and propose patches. + +My initial reaction to the pidfile issue is that the orchestration layer must +pass a different filename when starting the destination qemu instance. When +using live update without containers, these types of resource conflicts in the +global namespaces are a known issue. + +- Steve + +On 3/14/2025 2:33 PM, Chaney, Ben wrote: +Hello, + +While testing CPR transfer I encountered two issues. The first is that the +transfer fails when running with pidfiles due to the destination qemu process +attempting to create the pidfile while it is still locked by the source +process. The second is that the transfer fails when running with the -run-with +user=$USERID parameter. This is because the destination qemu process creates +the UNIX sockets used for the CPR transfer before dropping to the lower +permissioned user, which causes them to be owned by the original user. The +source qemu process then does not have permission to connect to it because it +is already running as the lesser permissioned user. + +Reproducing the first issue: + +Create a source and destination qemu instance associated with the same VM where +both processes have the -pidfile parameter passed on the command line. You +should see the following error on the command line of the second process: + +qemu-system-x86_64: cannot create PID file: Cannot lock pid file: Resource +temporarily unavailable + +Reproducing the second issue: + +Create a source and destination qemu instance associated with the same VM where +both processes have -run-with user=$USERID passed on the command line, where +$USERID is a different user from the one launching the processes. Then attempt +a CPR transfer using UNIX sockets for the main and cpr sockets. You should +receive the following error via QMP: +{"error": {"class": "GenericError", "desc": "Failed to connect to 'cpr.sock': +Permission denied"}} + +I provided a minimal patch that works around the second issue. 
+ +Thank you, +Ben Chaney + +--- +include/system/os-posix.h | 4 ++++ +os-posix.c | 8 -------- +util/qemu-sockets.c | 21 +++++++++++++++++++++ +3 files changed, 25 insertions(+), 8 deletions(-) + +diff --git a/include/system/os-posix.h b/include/system/os-posix.h +index ce5b3bccf8..2a414a914a 100644 +--- a/include/system/os-posix.h ++++ b/include/system/os-posix.h +@@ -55,6 +55,10 @@ void os_setup_limits(void); +void os_setup_post(void); +int os_mlock(bool on_fault); + ++extern struct passwd *user_pwd; ++extern uid_t user_uid; ++extern gid_t user_gid; ++ +/** +* qemu_alloc_stack: +* @sz: pointer to a size_t holding the requested usable stack size +diff --git a/os-posix.c b/os-posix.c +index 52925c23d3..9369b312a0 100644 +--- a/os-posix.c ++++ b/os-posix.c +@@ -86,14 +86,6 @@ void os_set_proc_name(const char *s) +} + + +-/* +- * Must set all three of these at once. +- * Legal combinations are unset by name by uid +- */ +-static struct passwd *user_pwd; /* NULL non-NULL NULL */ +-static uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ +-static gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ +- +/* +* Prepare to change user ID. user_id can be one of 3 forms: +* - a username, in which case user ID will be changed to its uid, +diff --git a/util/qemu-sockets.c b/util/qemu-sockets.c +index 77477c1cd5..987977ead9 100644 +--- a/util/qemu-sockets.c ++++ b/util/qemu-sockets.c +@@ -871,6 +871,14 @@ static bool saddr_is_tight(UnixSocketAddress *saddr) +#endif +} + ++/* ++ * Must set all three of these at once. ++ * Legal combinations are unset by name by uid ++ */ ++struct passwd *user_pwd; /* NULL non-NULL NULL */ ++uid_t user_uid = (uid_t)-1; /* -1 -1 >=0 */ ++gid_t user_gid = (gid_t)-1; /* -1 -1 >=0 */ ++ +static int unix_listen_saddr(UnixSocketAddress *saddr, +int num, +Error **errp) +@@ -947,6 +955,19 @@ static int unix_listen_saddr(UnixSocketAddress *saddr, +error_setg_errno(errp, errno, "Failed to bind socket to %s", path); +goto err; +} ++ if (user_pwd) { ++ if (chown(un.sun_path, user_pwd->pw_uid, user_pwd->pw_gid) < 0) { ++ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", +path); ++ goto err; ++ } ++ } ++ else if (user_uid != -1 && user_gid != -1) { ++ if (chown(un.sun_path, user_uid, user_gid) < 0) { ++ error_setg_errno(errp, errno, "Failed to change permissions on socket %s", +path); ++ goto err; ++ } ++ } ++ +if (listen(sock, num) < 0) { +error_setg_errno(errp, errno, "Failed to listen on socket"); +goto err; +-- +2.40.1 + diff --git a/classification_output/01/mistranslation/6866700 b/classification_output/01/mistranslation/6866700 new file mode 100644 index 000000000..2f16ce872 --- /dev/null +++ b/classification_output/01/mistranslation/6866700 @@ -0,0 +1,54 @@ +mistranslation: 0.936 +semantic: 0.906 +other: 0.881 +instruction: 0.864 + +[Qemu-devel] [BUG] trace: QEMU hangs on initialization with the "simple" backend + +While starting the softmmu version of QEMU, the simple backend waits for the +writeout thread to signal a condition variable when initializing the output file +path. But since the writeout thread has not been created, it just waits forever. + +Thanks, + Lluis + +On Tue, Feb 09, 2016 at 09:24:04PM +0100, LluÃs Vilanova wrote: +> +While starting the softmmu version of QEMU, the simple backend waits for the +> +writeout thread to signal a condition variable when initializing the output +> +file +> +path. But since the writeout thread has not been created, it just waits +> +forever. 
+Denis Lunev posted a fix: +https://patchwork.ozlabs.org/patch/580968/ +Stefan +signature.asc +Description: +PGP signature + +Stefan Hajnoczi writes: + +> +On Tue, Feb 09, 2016 at 09:24:04PM +0100, LluÃs Vilanova wrote: +> +> While starting the softmmu version of QEMU, the simple backend waits for the +> +> writeout thread to signal a condition variable when initializing the output +> +> file +> +> path. But since the writeout thread has not been created, it just waits +> +> forever. +> +Denis Lunev posted a fix: +> +https://patchwork.ozlabs.org/patch/580968/ +Great, thanks. + +Lluis + diff --git a/classification_output/01/mistranslation/7711787 b/classification_output/01/mistranslation/7711787 new file mode 100644 index 000000000..ead1f32fd --- /dev/null +++ b/classification_output/01/mistranslation/7711787 @@ -0,0 +1,165 @@ +mistranslation: 0.915 +semantic: 0.904 +instruction: 0.888 +other: 0.813 + +[BUG] cxl,i386: e820 mappings may not be correct for cxl + +Context included below from prior discussion + - `cxl create-region` would fail on inability to allocate memory + - traced this down to the memory region being marked RESERVED + - E820 map marks the CXL fixed memory window as RESERVED + + +Re: x86 errors, I found that region worked with this patch. (I also +added the SRAT patches the Davidlohr posted, but I do not think they are +relevant). + +I don't think this is correct, and setting this to E820_RAM causes the +system to fail to boot at all, but with this change `cxl create-region` +succeeds, which suggests our e820 mappings in the i386 machine are +incorrect. + +Anyone who can help or have an idea as to what e820 should actually be +doing with this region, or if this is correct and something else is +failing, please help! + + +diff --git a/hw/i386/pc.c b/hw/i386/pc.c +index 566accf7e6..a5e688a742 100644 +--- a/hw/i386/pc.c ++++ b/hw/i386/pc.c +@@ -1077,7 +1077,7 @@ void pc_memory_init(PCMachineState *pcms, + memory_region_init_io(&fw->mr, OBJECT(machine), &cfmws_ops, fw, + "cxl-fixed-memory-region", fw->size); + memory_region_add_subregion(system_memory, fw->base, &fw->mr); +- e820_add_entry(fw->base, fw->size, E820_RESERVED); ++ e820_add_entry(fw->base, fw->size, E820_NVS); + cxl_fmw_base += fw->size; + cxl_resv_end = cxl_fmw_base; + } + + +On Mon, Oct 10, 2022 at 05:32:42PM +0100, Jonathan Cameron wrote: +> +> +> > but i'm not sure of what to do with this info. We have some proof +> +> > that real hardware works with this no problem, and the only difference +> +> > is that the EFI/bios/firmware is setting the memory regions as `usable` +> +> > or `soft reserved`, which would imply the EDK2 is the blocker here +> +> > regardless of the OS driver status. +> +> > +> +> > But I'd seen elsewhere you had gotten some of this working, and I'm +> +> > failing to get anything working at the moment. If you have any input i +> +> > would greatly appreciate the help. 
+> +> > +> +> > QEMU config: +> +> > +> +> > /opt/qemu-cxl2/bin/qemu-system-x86_64 \ +> +> > -drive +> +> > file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=d\ +> +> > -m 2G,slots=4,maxmem=4G \ +> +> > -smp 4 \ +> +> > -machine type=q35,accel=kvm,cxl=on \ +> +> > -enable-kvm \ +> +> > -nographic \ +> +> > -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \ +> +> > -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,slot=0 \ +> +> > -object memory-backend-file,id=cxl-mem0,mem-path=/tmp/cxl-mem0,size=256M \ +> +> > -object memory-backend-file,id=lsa0,mem-path=/tmp/cxl-lsa0,size=256M \ +> +> > -device cxl-type3,bus=rp0,pmem=true,memdev=cxl-mem0,lsa=lsa0,id=cxl-pmem0 +> +> > \ +> +> > -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=256M +> +> > +> +> > I'd seen on the lists that you had seen issues with single-rp setups, +> +> > but no combination of configuration I've tried (including all the ones +> +> > in the docs and tests) lead to a successful region creation with +> +> > `cxl create-region` +> +> +> +> Hmm. Let me have a play. I've not run x86 tests for a while so +> +> perhaps something is missing there. +> +> +> +> I'm carrying a patch to override check_last_peer() in +> +> cxl_port_setup_targets() as that is wrong for some combinations, +> +> but that doesn't look like it's related to what you are seeing. +> +> +I'm not sure if it's relevant, but turned out I'd forgotten I'm carrying 3 +> +patches that aren't upstream (and one is a horrible hack). +> +> +Hack: +https://lore.kernel.org/linux-cxl/20220819094655.000005ed@huawei.com/ +> +Shouldn't affect a simple case like this... +> +> +https://lore.kernel.org/linux-cxl/20220819093133.00006c22@huawei.com/T/#t +> +(Dan's version) +> +> +https://lore.kernel.org/linux-cxl/20220815154044.24733-1-Jonathan.Cameron@huawei.com/T/#t +> +> +For writes to work you will currently need two rps (nothing on the second is +> +fine) +> +as we still haven't resolved if the kernel should support an HDM decoder on +> +a host bridge with one port. I think it should (Spec allows it), others +> +unconvinced. +> +> +Note I haven't shifted over to x86 yet so may still be something different +> +from +> +arm64. +> +> +Jonathan +> +> + diff --git a/classification_output/01/mistranslation/8720260 b/classification_output/01/mistranslation/8720260 new file mode 100644 index 000000000..32d247ac7 --- /dev/null +++ b/classification_output/01/mistranslation/8720260 @@ -0,0 +1,344 @@ +mistranslation: 0.752 +instruction: 0.700 +other: 0.683 +semantic: 0.669 + +[Bug Report][RFC PATCH 0/1] block: fix failing assert on paused VM migration + +There's a bug (failing assert) which is reproduced during migration of +a paused VM. I am able to reproduce it on a stand with 2 nodes and a common +NFS share, with VM's disk on that share. 
+ +root@fedora40-1-vm:~# virsh domblklist alma8-vm + Target Source +------------------------------------------ + sda /mnt/shared/images/alma8.qcow2 + +root@fedora40-1-vm:~# df -Th /mnt/shared +Filesystem Type Size Used Avail Use% Mounted on +127.0.0.1:/srv/nfsd nfs4 63G 16G 48G 25% /mnt/shared + +On the 1st node: + +root@fedora40-1-vm:~# virsh start alma8-vm ; virsh suspend alma8-vm +root@fedora40-1-vm:~# virsh migrate --compressed --p2p --persistent +--undefinesource --live alma8-vm qemu+ssh://fedora40-2-vm/system + +Then on the 2nd node: + +root@fedora40-2-vm:~# virsh migrate --compressed --p2p --persistent +--undefinesource --live alma8-vm qemu+ssh://fedora40-1-vm/system +error: operation failed: domain is not running + +root@fedora40-2-vm:~# tail -3 /var/log/libvirt/qemu/alma8-vm.log +2024-09-19 13:53:33.336+0000: initiating migration +qemu-system-x86_64: ../block.c:6976: int +bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags & +BDRV_O_INACTIVE)' failed. +2024-09-19 13:53:42.991+0000: shutting down, reason=crashed + +Backtrace: + +(gdb) bt +#0 0x00007f7eaa2f1664 in __pthread_kill_implementation () at /lib64/libc.so.6 +#1 0x00007f7eaa298c4e in raise () at /lib64/libc.so.6 +#2 0x00007f7eaa280902 in abort () at /lib64/libc.so.6 +#3 0x00007f7eaa28081e in __assert_fail_base.cold () at /lib64/libc.so.6 +#4 0x00007f7eaa290d87 in __assert_fail () at /lib64/libc.so.6 +#5 0x0000563c38b95eb8 in bdrv_inactivate_recurse (bs=0x563c3b6c60c0) at +../block.c:6976 +#6 0x0000563c38b95aeb in bdrv_inactivate_all () at ../block.c:7038 +#7 0x0000563c3884d354 in qemu_savevm_state_complete_precopy_non_iterable +(f=0x563c3b700c20, in_postcopy=false, inactivate_disks=true) + at ../migration/savevm.c:1571 +#8 0x0000563c3884dc1a in qemu_savevm_state_complete_precopy (f=0x563c3b700c20, +iterable_only=false, inactivate_disks=true) at ../migration/savevm.c:1631 +#9 0x0000563c3883a340 in migration_completion_precopy (s=0x563c3b4d51f0, +current_active_state=<optimized out>) at ../migration/migration.c:2780 +#10 migration_completion (s=0x563c3b4d51f0) at ../migration/migration.c:2844 +#11 migration_iteration_run (s=0x563c3b4d51f0) at ../migration/migration.c:3270 +#12 migration_thread (opaque=0x563c3b4d51f0) at ../migration/migration.c:3536 +#13 0x0000563c38dbcf14 in qemu_thread_start (args=0x563c3c2d5bf0) at +../util/qemu-thread-posix.c:541 +#14 0x00007f7eaa2ef6d7 in start_thread () at /lib64/libc.so.6 +#15 0x00007f7eaa373414 in clone () at /lib64/libc.so.6 + +What happens here is that after 1st migration BDS related to HDD remains +inactive as VM is still paused. Then when we initiate 2nd migration, +bdrv_inactivate_all() leads to the attempt to set BDRV_O_INACTIVE flag +on that node which is already set, thus assert fails. + +Attached patch which simply skips setting flag if it's already set is more +of a kludge than a clean solution. Should we use more sophisticated logic +which allows some of the nodes be in inactive state prior to the migration, +and takes them into account during bdrv_inactivate_all()? Comments would +be appreciated. + +Andrey + +Andrey Drobyshev (1): + block: do not fail when inactivating node which is inactive + + block.c | 10 +++++++++- + 1 file changed, 9 insertions(+), 1 deletion(-) + +-- +2.39.3 + +Instead of throwing an assert let's just ignore that flag is already set +and return. We assume that it's going to be safe to ignore. Otherwise +this assert fails when migrating a paused VM back and forth. + +Ideally we'd like to have a more sophisticated solution, e.g. 
not even +scan the nodes which should be inactive at this point. + +Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> +--- + block.c | 10 +++++++++- + 1 file changed, 9 insertions(+), 1 deletion(-) + +diff --git a/block.c b/block.c +index 7d90007cae..c1dcf906d1 100644 +--- a/block.c ++++ b/block.c +@@ -6973,7 +6973,15 @@ static int GRAPH_RDLOCK +bdrv_inactivate_recurse(BlockDriverState *bs) + return 0; + } + +- assert(!(bs->open_flags & BDRV_O_INACTIVE)); ++ if (bs->open_flags & BDRV_O_INACTIVE) { ++ /* ++ * Return here instead of throwing assert as a workaround to ++ * prevent failure on migrating paused VM. ++ * Here we assume that if we're trying to inactivate BDS that's ++ * already inactive, it's safe to just ignore it. ++ */ ++ return 0; ++ } + + /* Inactivate this node */ + if (bs->drv->bdrv_inactivate) { +-- +2.39.3 + +[add migration maintainers] + +On 24.09.24 15:56, Andrey Drobyshev wrote: +Instead of throwing an assert let's just ignore that flag is already set +and return. We assume that it's going to be safe to ignore. Otherwise +this assert fails when migrating a paused VM back and forth. + +Ideally we'd like to have a more sophisticated solution, e.g. not even +scan the nodes which should be inactive at this point. + +Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com> +--- + block.c | 10 +++++++++- + 1 file changed, 9 insertions(+), 1 deletion(-) + +diff --git a/block.c b/block.c +index 7d90007cae..c1dcf906d1 100644 +--- a/block.c ++++ b/block.c +@@ -6973,7 +6973,15 @@ static int GRAPH_RDLOCK +bdrv_inactivate_recurse(BlockDriverState *bs) + return 0; + } +- assert(!(bs->open_flags & BDRV_O_INACTIVE)); ++ if (bs->open_flags & BDRV_O_INACTIVE) { ++ /* ++ * Return here instead of throwing assert as a workaround to ++ * prevent failure on migrating paused VM. ++ * Here we assume that if we're trying to inactivate BDS that's ++ * already inactive, it's safe to just ignore it. ++ */ ++ return 0; ++ } +/* Inactivate this node */ +if (bs->drv->bdrv_inactivate) { +I doubt that this a correct way to go. + +As far as I understand, "inactive" actually means that "storage is not belong to +qemu, but to someone else (another qemu process for example), and may be changed +transparently". In turn this means that Qemu should do nothing with inactive disks. So the +problem is that nobody called bdrv_activate_all on target, and we shouldn't ignore that. + +Hmm, I see in process_incoming_migration_bh() we do call bdrv_activate_all(), +but only in some scenarios. May be, the condition should be less strict here. + +Why we need any condition here at all? Don't we want to activate block-layer on +target after migration anyway? + +-- +Best regards, +Vladimir + +On 9/30/24 12:25 PM, Vladimir Sementsov-Ogievskiy wrote: +> +[add migration maintainers] +> +> +On 24.09.24 15:56, Andrey Drobyshev wrote: +> +> [...] +> +> +I doubt that this a correct way to go. +> +> +As far as I understand, "inactive" actually means that "storage is not +> +belong to qemu, but to someone else (another qemu process for example), +> +and may be changed transparently". In turn this means that Qemu should +> +do nothing with inactive disks. So the problem is that nobody called +> +bdrv_activate_all on target, and we shouldn't ignore that. +> +> +Hmm, I see in process_incoming_migration_bh() we do call +> +bdrv_activate_all(), but only in some scenarios. May be, the condition +> +should be less strict here. +> +> +Why we need any condition here at all? 
Don't we want to activate +> +block-layer on target after migration anyway? +> +Hmm I'm not sure about the unconditional activation, since we at least +have to honor LATE_BLOCK_ACTIVATE cap if it's set (and probably delay it +in such a case). In current libvirt upstream I see such code: + +> +/* Migration capabilities which should always be enabled as long as they +> +> +* are supported by QEMU. If the capability is supposed to be enabled on both +> +> +* sides of migration, it won't be enabled unless both sides support it. +> +> +*/ +> +> +static const qemuMigrationParamsAlwaysOnItem qemuMigrationParamsAlwaysOn[] = +> +{ +> +> +{QEMU_MIGRATION_CAP_PAUSE_BEFORE_SWITCHOVER, +> +> +QEMU_MIGRATION_SOURCE}, +> +> +> +> +{QEMU_MIGRATION_CAP_LATE_BLOCK_ACTIVATE, +> +> +QEMU_MIGRATION_DESTINATION}, +> +> +}; +which means that libvirt always wants LATE_BLOCK_ACTIVATE to be set. + +The code from process_incoming_migration_bh() you're referring to: + +> +/* If capability late_block_activate is set: +> +> +* Only fire up the block code now if we're going to restart the +> +> +* VM, else 'cont' will do it. +> +> +* This causes file locking to happen; so we don't want it to happen +> +> +* unless we really are starting the VM. +> +> +*/ +> +> +if (!migrate_late_block_activate() || +> +> +(autostart && (!global_state_received() || +> +> +runstate_is_live(global_state_get_runstate())))) { +> +> +/* Make sure all file formats throw away their mutable metadata. +> +> +> +* If we get an error here, just don't restart the VM yet. */ +> +> +bdrv_activate_all(&local_err); +> +> +if (local_err) { +> +> +error_report_err(local_err); +> +> +local_err = NULL; +> +> +autostart = false; +> +> +} +> +> +} +It states explicitly that we're either going to start VM right at this +point if (autostart == true), or we wait till "cont" command happens. +None of this is going to happen if we start another migration while +still being in PAUSED state. So I think it seems reasonable to take +such case into account. For instance, this patch does prevent the crash: + +> +diff --git a/migration/migration.c b/migration/migration.c +> +index ae2be31557..3222f6745b 100644 +> +--- a/migration/migration.c +> ++++ b/migration/migration.c +> +@@ -733,7 +733,8 @@ static void process_incoming_migration_bh(void *opaque) +> +*/ +> +if (!migrate_late_block_activate() || +> +(autostart && (!global_state_received() || +> +- runstate_is_live(global_state_get_runstate())))) { +> ++ runstate_is_live(global_state_get_runstate()))) || +> ++ (!autostart && global_state_get_runstate() == RUN_STATE_PAUSED)) { +> +/* Make sure all file formats throw away their mutable metadata. +> +* If we get an error here, just don't restart the VM yet. */ +> +bdrv_activate_all(&local_err); +What are your thoughts on it? + +Andrey + diff --git a/classification_output/01/mistranslation/8874178 b/classification_output/01/mistranslation/8874178 new file mode 100644 index 000000000..1ebfe2889 --- /dev/null +++ b/classification_output/01/mistranslation/8874178 @@ -0,0 +1,202 @@ +mistranslation: 0.928 +other: 0.912 +instruction: 0.835 +semantic: 0.829 + +[Qemu-devel] [Bug?] Guest pause because VMPTRLD failed in KVM + +Hello, + + We encountered a problem that a guest paused because the KMOD report VMPTRLD +failed. 
+ +The related information is as follows: + +1) Qemu command: + /usr/bin/qemu-kvm -name omu1 -S -machine pc-i440fx-2.3,accel=kvm,usb=off -cpu +host -m 15625 -realtime mlock=off -smp 8,sockets=1,cores=8,threads=1 -uuid +a2aacfff-6583-48b4-b6a4-e6830e519931 -no-user-config -nodefaults -chardev +socket,id=charmonitor,path=/var/lib/libvirt/qemu/omu1.monitor,server,nowait +-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown +-boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device +virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive +file=/home/env/guest1.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,aio=native + -device +virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk0,id=virtio-disk0 + -drive +file=/home/env/guest_300G.img,if=none,id=drive-virtio-disk1,format=raw,cache=none,aio=native + -device +virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk1,id=virtio-disk1 + -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=26 -device +virtio-net-pci,netdev=hostnet0,id=net0,mac=00:00:80:05:00:00,bus=pci.0,addr=0x3 +-netdev tap,fd=27,id=hostnet1,vhost=on,vhostfd=28 -device +virtio-net-pci,netdev=hostnet1,id=net1,mac=00:00:80:05:00:01,bus=pci.0,addr=0x4 +-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 +-device usb-tablet,id=input0 -vnc 0.0.0.0:0 -device +cirrus-vga,id=video0,vgamem_mb=16,bus=pci.0,addr=0x2 -device +virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on + + 2) Qemu log: + KVM: entry failed, hardware error 0x4 + RAX=00000000ffffffed RBX=ffff8803fa2d7fd8 RCX=0100000000000000 +RDX=0000000000000000 + RSI=0000000000000000 RDI=0000000000000046 RBP=ffff8803fa2d7e90 +RSP=ffff8803fa2efe90 + R8 =0000000000000000 R9 =0000000000000000 R10=0000000000000000 +R11=000000000000b69a + R12=0000000000000001 R13=ffffffff81a25b40 R14=0000000000000000 +R15=ffff8803fa2d7fd8 + RIP=ffffffff81053e16 RFL=00000286 [--S--P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 + ES =0000 0000000000000000 ffffffff 00c00000 + CS =0010 0000000000000000 ffffffff 00a09b00 DPL=0 CS64 [-RA] + SS =0018 0000000000000000 ffffffff 00c09300 DPL=0 DS [-WA] + DS =0000 0000000000000000 ffffffff 00c00000 + FS =0000 0000000000000000 ffffffff 00c00000 + GS =0000 ffff88040f540000 ffffffff 00c00000 + LDT=0000 0000000000000000 ffffffff 00c00000 + TR =0040 ffff88040f550a40 00002087 00008b00 DPL=0 TSS64-busy + GDT= ffff88040f549000 0000007f + IDT= ffffffffff529000 00000fff + CR0=80050033 CR2=00007f81ca0c5000 CR3=00000003f5081000 CR4=000407e0 + DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 +DR3=0000000000000000 + DR6=00000000ffff0ff0 DR7=0000000000000400 + EFER=0000000000000d01 + Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? <??> ?? ?? +?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? 
+ + 3) Demsg + [347315.028339] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed + klogd 1.4.1, ---------- state change ---------- + [347315.039506] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed + [347315.051728] kvm: vmptrld ffff8817ec5f0000/17ec5f0000 failed + [347315.057472] vmwrite error: reg 6c0a value ffff88307e66e480 (err +2120672384) + [347315.064567] Pid: 69523, comm: qemu-kvm Tainted: GF X +3.0.93-0.8-default #1 + [347315.064569] Call Trace: + [347315.064587] [<ffffffff810049d5>] dump_trace+0x75/0x300 + [347315.064595] [<ffffffff8145e3e3>] dump_stack+0x69/0x6f + [347315.064617] [<ffffffffa03738de>] vmx_vcpu_load+0x11e/0x1d0 [kvm_intel] + [347315.064647] [<ffffffffa029a204>] kvm_arch_vcpu_load+0x44/0x1d0 [kvm] + [347315.064669] [<ffffffff81054ee1>] finish_task_switch+0x81/0xe0 + [347315.064676] [<ffffffff8145f0b4>] thread_return+0x3b/0x2a7 + [347315.064687] [<ffffffffa028d9b5>] kvm_vcpu_block+0x65/0xa0 [kvm] + [347315.064703] [<ffffffffa02a16d1>] __vcpu_run+0xd1/0x260 [kvm] + [347315.064732] [<ffffffffa02a2418>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 +[kvm] + [347315.064759] [<ffffffffa028ecee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] + [347315.064771] [<ffffffff8116bdfb>] do_vfs_ioctl+0x8b/0x3b0 + [347315.064776] [<ffffffff8116c1c1>] sys_ioctl+0xa1/0xb0 + [347315.064783] [<ffffffff81469272>] system_call_fastpath+0x16/0x1b + [347315.064797] [<00007fee51969ce7>] 0x7fee51969ce6 + [347315.064799] vmwrite error: reg 6c0c value ffff88307e664000 (err +2120630272) + [347315.064802] Pid: 69523, comm: qemu-kvm Tainted: GF X +3.0.93-0.8-default #1 + [347315.064803] Call Trace: + [347315.064807] [<ffffffff810049d5>] dump_trace+0x75/0x300 + [347315.064811] [<ffffffff8145e3e3>] dump_stack+0x69/0x6f + [347315.064817] [<ffffffffa03738ec>] vmx_vcpu_load+0x12c/0x1d0 [kvm_intel] + [347315.064832] [<ffffffffa029a204>] kvm_arch_vcpu_load+0x44/0x1d0 [kvm] + [347315.064851] [<ffffffff81054ee1>] finish_task_switch+0x81/0xe0 + [347315.064855] [<ffffffff8145f0b4>] thread_return+0x3b/0x2a7 + [347315.064865] [<ffffffffa028d9b5>] kvm_vcpu_block+0x65/0xa0 [kvm] + [347315.064880] [<ffffffffa02a16d1>] __vcpu_run+0xd1/0x260 [kvm] + [347315.064907] [<ffffffffa02a2418>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 +[kvm] + [347315.064933] [<ffffffffa028ecee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] + [347315.064943] [<ffffffff8116bdfb>] do_vfs_ioctl+0x8b/0x3b0 + [347315.064947] [<ffffffff8116c1c1>] sys_ioctl+0xa1/0xb0 + [347315.064951] [<ffffffff81469272>] system_call_fastpath+0x16/0x1b + [347315.064957] [<00007fee51969ce7>] 0x7fee51969ce6 + [347315.064959] vmwrite error: reg 6c10 value 0 (err 0) + + 4) The isssue can't be reporduced. I search the Intel VMX sepc about reaseons +of vmptrld failure: + The instruction fails if its operand is not properly aligned, sets +unsupported physical-address bits, or is equal to the VMXON + pointer. In addition, the instruction fails if the 32 bits in memory +referenced by the operand do not match the VMCS + revision identifier supported by this processor. + + But I can't find any cues from the KVM source code. It seems each + error conditions is impossible in theory. :( + +Any suggestions will be appreciated! Paolo? + +-- +Regards, +-Gonglei + +On 10/11/2016 15:10, gong lei wrote: +> +4) The isssue can't be reporduced. I search the Intel VMX sepc about +> +reaseons +> +of vmptrld failure: +> +The instruction fails if its operand is not properly aligned, sets +> +unsupported physical-address bits, or is equal to the VMXON +> +pointer. 
In addition, the instruction fails if the 32 bits in memory +> +referenced by the operand do not match the VMCS +> +revision identifier supported by this processor. +> +> +But I can't find any cues from the KVM source code. It seems each +> +error conditions is impossible in theory. :( +Yes, it should not happen. :( + +If it's not reproducible, it's really hard to say what it was, except a +random memory corruption elsewhere or even a bit flip (!). + +Paolo + +On 2016/11/17 20:39, Paolo Bonzini wrote: +> +> +On 10/11/2016 15:10, gong lei wrote: +> +> 4) The isssue can't be reporduced. I search the Intel VMX sepc about +> +> reaseons +> +> of vmptrld failure: +> +> The instruction fails if its operand is not properly aligned, sets +> +> unsupported physical-address bits, or is equal to the VMXON +> +> pointer. In addition, the instruction fails if the 32 bits in memory +> +> referenced by the operand do not match the VMCS +> +> revision identifier supported by this processor. +> +> +> +> But I can't find any cues from the KVM source code. It seems each +> +> error conditions is impossible in theory. :( +> +Yes, it should not happen. :( +> +> +If it's not reproducible, it's really hard to say what it was, except a +> +random memory corruption elsewhere or even a bit flip (!). +> +> +Paolo +Thanks for your reply, Paolo :) + +-- +Regards, +-Gonglei + diff --git a/classification_output/01/other/0001467 b/classification_output/01/other/0001467 new file mode 100644 index 000000000..ebd922167 --- /dev/null +++ b/classification_output/01/other/0001467 @@ -0,0 +1,100 @@ +other: 0.954 +mistranslation: 0.947 +semantic: 0.933 +instruction: 0.922 + +[BUG] Qemu abort with error "kvm_mem_ioeventfd_add: error adding ioeventfd: File exists (17)" + +Hi list, + +When I did some tests in my virtual domain with live-attached virtio deivces, I +got a coredump file of Qemu. + +The error print from qemu is "kvm_mem_ioeventfd_add: error adding ioeventfd: +File exists (17)". +And the call trace in the coredump file displays as below: +#0 0x0000ffff89acecc8 in ?? () from /usr/lib64/libc.so.6 +#1 0x0000ffff89a8acbc in raise () from /usr/lib64/libc.so.6 +#2 0x0000ffff89a78d2c in abort () from /usr/lib64/libc.so.6 +#3 0x0000aaaabd7ccf1c in kvm_mem_ioeventfd_add (listener=<optimized out>, +section=<optimized out>, match_data=<optimized out>, data=<optimized out>, +e=<optimized out>) at ../accel/kvm/kvm-all.c:1607 +#4 0x0000aaaabd6e0304 in address_space_add_del_ioeventfds (fds_old_nb=164, +fds_old=0xffff5c80a1d0, fds_new_nb=160, fds_new=0xffff5c565080, +as=0xaaaabdfa8810 <address_space_memory>) + at ../softmmu/memory.c:795 +#5 address_space_update_ioeventfds (as=0xaaaabdfa8810 <address_space_memory>) +at ../softmmu/memory.c:856 +#6 0x0000aaaabd6e24d8 in memory_region_commit () at ../softmmu/memory.c:1113 +#7 0x0000aaaabd6e25c4 in memory_region_transaction_commit () at +../softmmu/memory.c:1144 +#8 0x0000aaaabd394eb4 in pci_bridge_update_mappings +(br=br@entry=0xaaaae755f7c0) at ../hw/pci/pci_bridge.c:248 +#9 0x0000aaaabd394f4c in pci_bridge_write_config (d=0xaaaae755f7c0, +address=44, val=<optimized out>, len=4) at ../hw/pci/pci_bridge.c:272 +#10 0x0000aaaabd39a928 in rp_write_config (d=0xaaaae755f7c0, address=44, +val=128, len=4) at ../hw/pci-bridge/pcie_root_port.c:39 +#11 0x0000aaaabd6df328 in memory_region_write_accessor (mr=0xaaaae63898d0, +addr=65580, value=<optimized out>, size=4, shift=<optimized out>, +mask=<optimized out>, attrs=...) 
at ../softmmu/memory.c:494 +#12 0x0000aaaabd6dcb6c in access_with_adjusted_size (addr=addr@entry=65580, +value=value@entry=0xffff817adc78, size=size@entry=4, access_size_min=<optimized +out>, access_size_max=<optimized out>, + access_fn=access_fn@entry=0xaaaabd6df284 <memory_region_write_accessor>, +mr=mr@entry=0xaaaae63898d0, attrs=attrs@entry=...) at ../softmmu/memory.c:556 +#13 0x0000aaaabd6e0dc8 in memory_region_dispatch_write +(mr=mr@entry=0xaaaae63898d0, addr=65580, data=<optimized out>, op=MO_32, +attrs=attrs@entry=...) at ../softmmu/memory.c:1534 +#14 0x0000aaaabd6d0574 in flatview_write_continue (fv=fv@entry=0xffff5c02da00, +addr=addr@entry=275146407980, attrs=attrs@entry=..., +ptr=ptr@entry=0xffff8aa8c028, len=len@entry=4, + addr1=<optimized out>, l=<optimized out>, mr=mr@entry=0xaaaae63898d0) at +/usr/src/debug/qemu-6.2.0-226.aarch64/include/qemu/host-utils.h:165 +#15 0x0000aaaabd6d4584 in flatview_write (len=4, buf=0xffff8aa8c028, attrs=..., +addr=275146407980, fv=0xffff5c02da00) at ../softmmu/physmem.c:3375 +#16 address_space_write (as=<optimized out>, addr=275146407980, attrs=..., +buf=buf@entry=0xffff8aa8c028, len=4) at ../softmmu/physmem.c:3467 +#17 0x0000aaaabd6d462c in address_space_rw (as=<optimized out>, addr=<optimized +out>, attrs=..., attrs@entry=..., buf=buf@entry=0xffff8aa8c028, len=<optimized +out>, is_write=<optimized out>) + at ../softmmu/physmem.c:3477 +#18 0x0000aaaabd7cf6e8 in kvm_cpu_exec (cpu=cpu@entry=0xaaaae625dfd0) at +../accel/kvm/kvm-all.c:2970 +#19 0x0000aaaabd7d09bc in kvm_vcpu_thread_fn (arg=arg@entry=0xaaaae625dfd0) at +../accel/kvm/kvm-accel-ops.c:49 +#20 0x0000aaaabd94ccd8 in qemu_thread_start (args=<optimized out>) at +../util/qemu-thread-posix.c:559 + + +By printing more info in the coredump file, I found that the addr of +fds_old[146] and fds_new[146] are same, but fds_old[146] belonged to a +live-attached virtio-scsi device while fds_new[146] was owned by another +live-attached virtio-net. +The reason why addr conflicted was then been found from vm's console log. Just +before qemu aborted, the guest kernel crashed and kdump.service booted the +dump-capture kernel where re-alloced address for the devices. +Because those virtio devices were live-attached after vm creating, different +addr may been assigned to them in the dump-capture kernel: + +the initial kernel booting log: +[ 1.663297] pci 0000:00:02.1: BAR 14: assigned [mem 0x11900000-0x11afffff] +[ 1.664560] pci 0000:00:02.1: BAR 15: assigned [mem +0x8001800000-0x80019fffff 64bit pref] + +the dump-capture kernel booting log: +[ 1.845211] pci 0000:00:02.0: BAR 14: assigned [mem 0x11900000-0x11bfffff] +[ 1.846542] pci 0000:00:02.0: BAR 15: assigned [mem +0x8001800000-0x8001afffff 64bit pref] + + +I think directly aborting the qemu process may not be the best choice in this +case cuz it will interrupt the work of kdump.service so that failed to generate +memory dump of the crashed guest kernel. +Perhaps, IMO, the error could be simply ignored in this case and just let kdump +to reboot the system after memory-dump finishing, but I failed to find a +suitable judgment in the codes. + +Any solution for this problem? Hope I can get some helps here. 
+ +Hao + diff --git a/classification_output/01/other/0804350 b/classification_output/01/other/0804350 new file mode 100644 index 000000000..c030ad8d3 --- /dev/null +++ b/classification_output/01/other/0804350 @@ -0,0 +1,7448 @@ +other: 0.963 +semantic: 0.946 +mistranslation: 0.929 +instruction: 0.880 + +[Qemu-devel] [BUG]Unassigned mem write during pci device hot-plug + +Hi all, + +In our test, we configured VM with several pci-bridges and a virtio-net nic +been attached with bus 4, +After VM is startup, We ping this nic from host to judge if it is working +normally. Then, we hot add pci devices to this VM with bus 0. +We found the virtio-net NIC in bus 4 is not working (can not connect) +occasionally, as it kick virtio backend failure with error below: + Unassigned mem write 00000000fc803004 = 0x1 + +memory-region: pci_bridge_pci + 0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci + 00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci + 00000000fc800000-00000000fc800fff (prio 0, RW): virtio-pci-common + 00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr + 00000000fc802000-00000000fc802fff (prio 0, RW): virtio-pci-device + 00000000fc803000-00000000fc803fff (prio 0, RW): virtio-pci-notify <- io +mem unassigned + ⦠+ +We caught an exceptional address changing while this problem happened, show as +follow: +Before pci_bridge_update_mappingsï¼ + 00000000fc000000-00000000fc1fffff (prio 1, RW): alias pci_bridge_pref_mem +@pci_bridge_pci 00000000fc000000-00000000fc1fffff + 00000000fc200000-00000000fc3fffff (prio 1, RW): alias pci_bridge_pref_mem +@pci_bridge_pci 00000000fc200000-00000000fc3fffff + 00000000fc400000-00000000fc5fffff (prio 1, RW): alias pci_bridge_pref_mem +@pci_bridge_pci 00000000fc400000-00000000fc5fffff + 00000000fc600000-00000000fc7fffff (prio 1, RW): alias pci_bridge_pref_mem +@pci_bridge_pci 00000000fc600000-00000000fc7fffff + 00000000fc800000-00000000fc9fffff (prio 1, RW): alias pci_bridge_pref_mem +@pci_bridge_pci 00000000fc800000-00000000fc9fffff <- correct Adress Spce + 00000000fca00000-00000000fcbfffff (prio 1, RW): alias pci_bridge_pref_mem +@pci_bridge_pci 00000000fca00000-00000000fcbfffff + 00000000fcc00000-00000000fcdfffff (prio 1, RW): alias pci_bridge_pref_mem +@pci_bridge_pci 00000000fcc00000-00000000fcdfffff + 00000000fce00000-00000000fcffffff (prio 1, RW): alias pci_bridge_pref_mem +@pci_bridge_pci 00000000fce00000-00000000fcffffff + +After pci_bridge_update_mappingsï¼ + 00000000fda00000-00000000fdbfffff (prio 1, RW): alias pci_bridge_mem +@pci_bridge_pci 00000000fda00000-00000000fdbfffff + 00000000fdc00000-00000000fddfffff (prio 1, RW): alias pci_bridge_mem +@pci_bridge_pci 00000000fdc00000-00000000fddfffff + 00000000fde00000-00000000fdffffff (prio 1, RW): alias pci_bridge_mem +@pci_bridge_pci 00000000fde00000-00000000fdffffff + 00000000fe000000-00000000fe1fffff (prio 1, RW): alias pci_bridge_mem +@pci_bridge_pci 00000000fe000000-00000000fe1fffff + 00000000fe200000-00000000fe3fffff (prio 1, RW): alias pci_bridge_mem +@pci_bridge_pci 00000000fe200000-00000000fe3fffff + 00000000fe400000-00000000fe5fffff (prio 1, RW): alias pci_bridge_mem +@pci_bridge_pci 00000000fe400000-00000000fe5fffff + 00000000fe600000-00000000fe7fffff (prio 1, RW): alias pci_bridge_mem +@pci_bridge_pci 00000000fe600000-00000000fe7fffff + 00000000fe800000-00000000fe9fffff (prio 1, RW): alias pci_bridge_mem +@pci_bridge_pci 00000000fe800000-00000000fe9fffff + fffffffffc800000-fffffffffc800000 (prio 1, RW): alias pci_bridge_pref_mem +@pci_bridge_pci 
fffffffffc800000-fffffffffc800000 <- Exceptional Adress Space + +We have figured out why this address becomes this value, according to pci +spec, pci driver can get BAR address size by writing 0xffffffff to +the pci register firstly, and then read back the value from this register. +We didn't handle this value specially while process pci write in qemu, the +function call stack is: +Pci_bridge_dev_write_config +-> pci_bridge_write_config +-> pci_default_write_config (we update the config[address] value here to +fffffffffc800000, which should be 0xfc800000 ) +-> pci_bridge_update_mappings + ->pci_bridge_region_del(br, br->windows); +-> pci_bridge_region_init + +->pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong value +fffffffffc800000) + -> +memory_region_transaction_commit + +So, as we can see, we use the wrong base address in qemu to update the memory +regions, though, we update the base address to +The correct value after pci driver in VM write the original value back, the +virtio NIC in bus 4 may still sends net packets concurrently with +The wrong memory region address. + +We have tried to skip the memory region update action in qemu while detect pci +write with 0xffffffff value, and it does work, but +This seems to be not gently. + +diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c +index b2e50c3..84b405d 100644 +--- a/hw/pci/pci_bridge.c ++++ b/hw/pci/pci_bridge.c +@@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d, + pci_default_write_config(d, address, val, len); +- if (ranges_overlap(address, len, PCI_COMMAND, 2) || ++ if ( (val != 0xffffffff) && ++ (ranges_overlap(address, len, PCI_COMMAND, 2) || + /* io base/limit */ + ranges_overlap(address, len, PCI_IO_BASE, 2) || +@@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d, + ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || + /* vga enable */ +- ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { ++ ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) { + pci_bridge_update_mappings(s); + } + +Thinks, +Xu + +On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: +> +Hi all, +> +> +> +> +In our test, we configured VM with several pci-bridges and a virtio-net nic +> +been attached with bus 4, +> +> +After VM is startup, We ping this nic from host to judge if it is working +> +normally. Then, we hot add pci devices to this VM with bus 0. +> +> +We found the virtio-net NIC in bus 4 is not working (can not connect) +> +occasionally, as it kick virtio backend failure with error below: +> +> +Unassigned mem write 00000000fc803004 = 0x1 +Thanks for the report. Which guest was used to produce this problem? + +-- +MST + +n Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: +> +> Hi all, +> +> +> +> +> +> +> +> In our test, we configured VM with several pci-bridges and a +> +> virtio-net nic been attached with bus 4, +> +> +> +> After VM is startup, We ping this nic from host to judge if it is +> +> working normally. Then, we hot add pci devices to this VM with bus 0. +> +> +> +> We found the virtio-net NIC in bus 4 is not working (can not connect) +> +> occasionally, as it kick virtio backend failure with error below: +> +> +> +> Unassigned mem write 00000000fc803004 = 0x1 +> +> +Thanks for the report. Which guest was used to produce this problem? +> +> +-- +> +MST +I was seeing this problem when I hotplug a VFIO device to guest CentOS 7.4, +after that I compiled the latest Linux kernel and it also contains this problem. 
+ +Thinks, +Xu + +On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: +> +Hi all, +> +> +> +> +In our test, we configured VM with several pci-bridges and a virtio-net nic +> +been attached with bus 4, +> +> +After VM is startup, We ping this nic from host to judge if it is working +> +normally. Then, we hot add pci devices to this VM with bus 0. +> +> +We found the virtio-net NIC in bus 4 is not working (can not connect) +> +occasionally, as it kick virtio backend failure with error below: +> +> +Unassigned mem write 00000000fc803004 = 0x1 +> +> +> +> +memory-region: pci_bridge_pci +> +> +0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci +> +> +00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci +> +> +00000000fc800000-00000000fc800fff (prio 0, RW): virtio-pci-common +> +> +00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr +> +> +00000000fc802000-00000000fc802fff (prio 0, RW): virtio-pci-device +> +> +00000000fc803000-00000000fc803fff (prio 0, RW): virtio-pci-notify <- io +> +mem unassigned +> +> +⦠+> +> +> +> +We caught an exceptional address changing while this problem happened, show as +> +follow: +> +> +Before pci_bridge_update_mappingsï¼ +> +> +00000000fc000000-00000000fc1fffff (prio 1, RW): alias +> +pci_bridge_pref_mem +> +@pci_bridge_pci 00000000fc000000-00000000fc1fffff +> +> +00000000fc200000-00000000fc3fffff (prio 1, RW): alias +> +pci_bridge_pref_mem +> +@pci_bridge_pci 00000000fc200000-00000000fc3fffff +> +> +00000000fc400000-00000000fc5fffff (prio 1, RW): alias +> +pci_bridge_pref_mem +> +@pci_bridge_pci 00000000fc400000-00000000fc5fffff +> +> +00000000fc600000-00000000fc7fffff (prio 1, RW): alias +> +pci_bridge_pref_mem +> +@pci_bridge_pci 00000000fc600000-00000000fc7fffff +> +> +00000000fc800000-00000000fc9fffff (prio 1, RW): alias +> +pci_bridge_pref_mem +> +@pci_bridge_pci 00000000fc800000-00000000fc9fffff <- correct Adress Spce +> +> +00000000fca00000-00000000fcbfffff (prio 1, RW): alias +> +pci_bridge_pref_mem +> +@pci_bridge_pci 00000000fca00000-00000000fcbfffff +> +> +00000000fcc00000-00000000fcdfffff (prio 1, RW): alias +> +pci_bridge_pref_mem +> +@pci_bridge_pci 00000000fcc00000-00000000fcdfffff +> +> +00000000fce00000-00000000fcffffff (prio 1, RW): alias +> +pci_bridge_pref_mem +> +@pci_bridge_pci 00000000fce00000-00000000fcffffff +> +> +> +> +After pci_bridge_update_mappingsï¼ +> +> +00000000fda00000-00000000fdbfffff (prio 1, RW): alias pci_bridge_mem +> +@pci_bridge_pci 00000000fda00000-00000000fdbfffff +> +> +00000000fdc00000-00000000fddfffff (prio 1, RW): alias pci_bridge_mem +> +@pci_bridge_pci 00000000fdc00000-00000000fddfffff +> +> +00000000fde00000-00000000fdffffff (prio 1, RW): alias pci_bridge_mem +> +@pci_bridge_pci 00000000fde00000-00000000fdffffff +> +> +00000000fe000000-00000000fe1fffff (prio 1, RW): alias pci_bridge_mem +> +@pci_bridge_pci 00000000fe000000-00000000fe1fffff +> +> +00000000fe200000-00000000fe3fffff (prio 1, RW): alias pci_bridge_mem +> +@pci_bridge_pci 00000000fe200000-00000000fe3fffff +> +> +00000000fe400000-00000000fe5fffff (prio 1, RW): alias pci_bridge_mem +> +@pci_bridge_pci 00000000fe400000-00000000fe5fffff +> +> +00000000fe600000-00000000fe7fffff (prio 1, RW): alias pci_bridge_mem +> +@pci_bridge_pci 00000000fe600000-00000000fe7fffff +> +> +00000000fe800000-00000000fe9fffff (prio 1, RW): alias pci_bridge_mem +> +@pci_bridge_pci 00000000fe800000-00000000fe9fffff +> +> +fffffffffc800000-fffffffffc800000 (prio 1, RW): alias +> +pci_bridge_pref_mem +> +@pci_bridge_pci 
fffffffffc800000-fffffffffc800000 <- Exceptional Adress +> +Space +This one is empty though right? + +> +> +> +We have figured out why this address becomes this value, according to pci +> +spec, pci driver can get BAR address size by writing 0xffffffff to +> +> +the pci register firstly, and then read back the value from this register. +OK however as you show below the BAR being sized is the BAR +if a bridge. Are you then adding a bridge device by hotplug? + + + +> +We didn't handle this value specially while process pci write in qemu, the +> +function call stack is: +> +> +Pci_bridge_dev_write_config +> +> +-> pci_bridge_write_config +> +> +-> pci_default_write_config (we update the config[address] value here to +> +fffffffffc800000, which should be 0xfc800000 ) +> +> +-> pci_bridge_update_mappings +> +> +->pci_bridge_region_del(br, br->windows); +> +> +-> pci_bridge_region_init +> +> +-> +> +pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong value +> +fffffffffc800000) +> +> +-> +> +memory_region_transaction_commit +> +> +> +> +So, as we can see, we use the wrong base address in qemu to update the memory +> +regions, though, we update the base address to +> +> +The correct value after pci driver in VM write the original value back, the +> +virtio NIC in bus 4 may still sends net packets concurrently with +> +> +The wrong memory region address. +> +> +> +> +We have tried to skip the memory region update action in qemu while detect pci +> +write with 0xffffffff value, and it does work, but +> +> +This seems to be not gently. +For sure. But I'm still puzzled as to why does Linux try to +size the BAR of the bridge while a device behind it is +used. + +Can you pls post your QEMU command line? + + + +> +> +> +diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c +> +> +index b2e50c3..84b405d 100644 +> +> +--- a/hw/pci/pci_bridge.c +> +> ++++ b/hw/pci/pci_bridge.c +> +> +@@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d, +> +> +pci_default_write_config(d, address, val, len); +> +> +- if (ranges_overlap(address, len, PCI_COMMAND, 2) || +> +> ++ if ( (val != 0xffffffff) && +> +> ++ (ranges_overlap(address, len, PCI_COMMAND, 2) || +> +> +/* io base/limit */ +> +> +ranges_overlap(address, len, PCI_IO_BASE, 2) || +> +> +@@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d, +> +> +ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || +> +> +/* vga enable */ +> +> +- ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { +> +> ++ ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) { +> +> +pci_bridge_update_mappings(s); +> +> +} +> +> +> +> +Thinks, +> +> +Xu +> + +On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: +> +> Hi all, +> +> +> +> +> +> +> +> In our test, we configured VM with several pci-bridges and a +> +> virtio-net nic been attached with bus 4, +> +> +> +> After VM is startup, We ping this nic from host to judge if it is +> +> working normally. Then, we hot add pci devices to this VM with bus 0. 
+> +> +> +> We found the virtio-net NIC in bus 4 is not working (can not connect) +> +> occasionally, as it kick virtio backend failure with error below: +> +> +> +> Unassigned mem write 00000000fc803004 = 0x1 +> +> +> +> +> +> +> +> memory-region: pci_bridge_pci +> +> +> +> 0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci +> +> +> +> 00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci +> +> +> +> 00000000fc800000-00000000fc800fff (prio 0, RW): +> +> virtio-pci-common +> +> +> +> 00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr +> +> +> +> 00000000fc802000-00000000fc802fff (prio 0, RW): +> +> virtio-pci-device +> +> +> +> 00000000fc803000-00000000fc803fff (prio 0, RW): +> +> virtio-pci-notify <- io mem unassigned +> +> +> +> ⦠+> +> +> +> +> +> +> +> We caught an exceptional address changing while this problem happened, +> +> show as +> +> follow: +> +> +> +> Before pci_bridge_update_mappingsï¼ +> +> +> +> 00000000fc000000-00000000fc1fffff (prio 1, RW): alias +> +> pci_bridge_pref_mem @pci_bridge_pci 00000000fc000000-00000000fc1fffff +> +> +> +> 00000000fc200000-00000000fc3fffff (prio 1, RW): alias +> +> pci_bridge_pref_mem @pci_bridge_pci 00000000fc200000-00000000fc3fffff +> +> +> +> 00000000fc400000-00000000fc5fffff (prio 1, RW): alias +> +> pci_bridge_pref_mem @pci_bridge_pci 00000000fc400000-00000000fc5fffff +> +> +> +> 00000000fc600000-00000000fc7fffff (prio 1, RW): alias +> +> pci_bridge_pref_mem @pci_bridge_pci 00000000fc600000-00000000fc7fffff +> +> +> +> 00000000fc800000-00000000fc9fffff (prio 1, RW): alias +> +> pci_bridge_pref_mem @pci_bridge_pci 00000000fc800000-00000000fc9fffff +> +> <- correct Adress Spce +> +> +> +> 00000000fca00000-00000000fcbfffff (prio 1, RW): alias +> +> pci_bridge_pref_mem @pci_bridge_pci 00000000fca00000-00000000fcbfffff +> +> +> +> 00000000fcc00000-00000000fcdfffff (prio 1, RW): alias +> +> pci_bridge_pref_mem @pci_bridge_pci 00000000fcc00000-00000000fcdfffff +> +> +> +> 00000000fce00000-00000000fcffffff (prio 1, RW): alias +> +> pci_bridge_pref_mem @pci_bridge_pci 00000000fce00000-00000000fcffffff +> +> +> +> +> +> +> +> After pci_bridge_update_mappingsï¼ +> +> +> +> 00000000fda00000-00000000fdbfffff (prio 1, RW): alias +> +> pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff +> +> +> +> 00000000fdc00000-00000000fddfffff (prio 1, RW): alias +> +> pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff +> +> +> +> 00000000fde00000-00000000fdffffff (prio 1, RW): alias +> +> pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff +> +> +> +> 00000000fe000000-00000000fe1fffff (prio 1, RW): alias +> +> pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff +> +> +> +> 00000000fe200000-00000000fe3fffff (prio 1, RW): alias +> +> pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff +> +> +> +> 00000000fe400000-00000000fe5fffff (prio 1, RW): alias +> +> pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff +> +> +> +> 00000000fe600000-00000000fe7fffff (prio 1, RW): alias +> +> pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff +> +> +> +> 00000000fe800000-00000000fe9fffff (prio 1, RW): alias +> +> pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff +> +> +> +> fffffffffc800000-fffffffffc800000 (prio 1, RW): alias +> +> pci_bridge_pref_mem +> +> @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- Exceptional Adress +> +Space +> +> +This one is empty though right? 
+> +> +> +> +> +> +> We have figured out why this address becomes this value, according to +> +> pci spec, pci driver can get BAR address size by writing 0xffffffff +> +> to +> +> +> +> the pci register firstly, and then read back the value from this register. +> +> +> +OK however as you show below the BAR being sized is the BAR if a bridge. Are +> +you then adding a bridge device by hotplug? +No, I just simply hot plugged a VFIO device to Bus 0, another interesting +phenomenon is +If I hot plug the device to other bus, this doesn't happened. + +> +> +> +> We didn't handle this value specially while process pci write in +> +> qemu, the function call stack is: +> +> +> +> Pci_bridge_dev_write_config +> +> +> +> -> pci_bridge_write_config +> +> +> +> -> pci_default_write_config (we update the config[address] value here +> +> -> to +> +> fffffffffc800000, which should be 0xfc800000 ) +> +> +> +> -> pci_bridge_update_mappings +> +> +> +> ->pci_bridge_region_del(br, br->windows); +> +> +> +> -> pci_bridge_region_init +> +> +> +> -> +> +> pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong +> +> value +> +> fffffffffc800000) +> +> +> +> -> +> +> memory_region_transaction_commit +> +> +> +> +> +> +> +> So, as we can see, we use the wrong base address in qemu to update the +> +> memory regions, though, we update the base address to +> +> +> +> The correct value after pci driver in VM write the original value +> +> back, the virtio NIC in bus 4 may still sends net packets concurrently +> +> with +> +> +> +> The wrong memory region address. +> +> +> +> +> +> +> +> We have tried to skip the memory region update action in qemu while +> +> detect pci write with 0xffffffff value, and it does work, but +> +> +> +> This seems to be not gently. +> +> +For sure. But I'm still puzzled as to why does Linux try to size the BAR of +> +the +> +bridge while a device behind it is used. +> +> +Can you pls post your QEMU command line? 
+My QEMU command line: +/root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S -object +secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-Linux/master-key.aes + -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu +host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m +size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp +20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024 -numa +node,nodeid=1,cpus=5-9,mem=1024 -numa node,nodeid=2,cpus=10-14,mem=1024 -numa +node,nodeid=3,cpus=15-19,mem=1024 -uuid 34a588c7-b0f2-4952-b39c-47fae3411439 +-no-user-config -nodefaults -chardev +socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/monitor.sock,server,nowait + -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet +-global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on -device +pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device +pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device +pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device +pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device +pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device +piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device +usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device +nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device +virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device +virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device +virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device +virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device +virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive +file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-virtio-disk0,cache=none + -device +virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 + -drive if=none,id=drive-ide0-1-1,readonly=on,cache=none -device +ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev +tap,fd=35,id=hostnet0 -device +virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4,addr=0x1 +-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 +-device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device +cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device +virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on + +I am also very curious about this issue, in the linux kernel code, maybe double +check in function pci_bridge_check_ranges triggered this problem. 
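+
+For readers who want to see what that "double check" is: the v4.11-era
+pci_bridge_check_ranges() in drivers/pci/setup-bus.c probes whether a bridge
+really implements a 64-bit prefetchable window. The following is a paraphrase
+from memory of that block (not copied from this thread, so treat the exact
+shape as approximate); the write/read-back of PCI_PREF_BASE_UPPER32 is the
+config access QEMU sees:
+
+if (b_res[2].flags & IORESOURCE_MEM_64) {
+    u32 mem_base_hi, tmp;
+
+    pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &mem_base_hi);
+    pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, 0xffffffff);
+    pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &tmp);
+    if (!tmp)                                   /* upper dword not writable */
+        b_res[2].flags &= ~IORESOURCE_MEM_64;   /* so treat window as 32-bit */
+    pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, mem_base_hi);
+}
+
+Note that this runs against a bridge whose windows are already programmed and
+in use, which is why the transient all-ones upper dword matters at all.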
+ + +> +> +> +> +> +> +> +> +> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c +> +> +> +> index b2e50c3..84b405d 100644 +> +> +> +> --- a/hw/pci/pci_bridge.c +> +> +> +> +++ b/hw/pci/pci_bridge.c +> +> +> +> @@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d, +> +> +> +> pci_default_write_config(d, address, val, len); +> +> +> +> - if (ranges_overlap(address, len, PCI_COMMAND, 2) || +> +> +> +> + if ( (val != 0xffffffff) && +> +> +> +> + (ranges_overlap(address, len, PCI_COMMAND, 2) || +> +> +> +> /* io base/limit */ +> +> +> +> ranges_overlap(address, len, PCI_IO_BASE, 2) || +> +> +> +> @@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d, +> +> +> +> ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || +> +> +> +> /* vga enable */ +> +> +> +> - ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { +> +> +> +> + ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) { +> +> +> +> pci_bridge_update_mappings(s); +> +> +> +> } +> +> +> +> +> +> +> +> Thinks, +> +> +> +> Xu +> +> + +On Mon, Dec 10, 2018 at 03:12:53AM +0000, xuyandong wrote: +> +On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: +> +> > Hi all, +> +> > +> +> > +> +> > +> +> > In our test, we configured VM with several pci-bridges and a +> +> > virtio-net nic been attached with bus 4, +> +> > +> +> > After VM is startup, We ping this nic from host to judge if it is +> +> > working normally. Then, we hot add pci devices to this VM with bus 0. +> +> > +> +> > We found the virtio-net NIC in bus 4 is not working (can not connect) +> +> > occasionally, as it kick virtio backend failure with error below: +> +> > +> +> > Unassigned mem write 00000000fc803004 = 0x1 +> +> > +> +> > +> +> > +> +> > memory-region: pci_bridge_pci +> +> > +> +> > 0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci +> +> > +> +> > 00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci +> +> > +> +> > 00000000fc800000-00000000fc800fff (prio 0, RW): +> +> > virtio-pci-common +> +> > +> +> > 00000000fc801000-00000000fc801fff (prio 0, RW): virtio-pci-isr +> +> > +> +> > 00000000fc802000-00000000fc802fff (prio 0, RW): +> +> > virtio-pci-device +> +> > +> +> > 00000000fc803000-00000000fc803fff (prio 0, RW): +> +> > virtio-pci-notify <- io mem unassigned +> +> > +> +> > ⦠+> +> > +> +> > +> +> > +> +> > We caught an exceptional address changing while this problem happened, +> +> > show as +> +> > follow: +> +> > +> +> > Before pci_bridge_update_mappingsï¼ +> +> > +> +> > 00000000fc000000-00000000fc1fffff (prio 1, RW): alias +> +> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc000000-00000000fc1fffff +> +> > +> +> > 00000000fc200000-00000000fc3fffff (prio 1, RW): alias +> +> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc200000-00000000fc3fffff +> +> > +> +> > 00000000fc400000-00000000fc5fffff (prio 1, RW): alias +> +> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc400000-00000000fc5fffff +> +> > +> +> > 00000000fc600000-00000000fc7fffff (prio 1, RW): alias +> +> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc600000-00000000fc7fffff +> +> > +> +> > 00000000fc800000-00000000fc9fffff (prio 1, RW): alias +> +> > pci_bridge_pref_mem @pci_bridge_pci 00000000fc800000-00000000fc9fffff +> +> > <- correct Adress Spce +> +> > +> +> > 00000000fca00000-00000000fcbfffff (prio 1, RW): alias +> +> > pci_bridge_pref_mem @pci_bridge_pci 00000000fca00000-00000000fcbfffff +> +> > +> +> > 00000000fcc00000-00000000fcdfffff (prio 1, RW): alias +> +> > pci_bridge_pref_mem @pci_bridge_pci 00000000fcc00000-00000000fcdfffff +> +> > +> +> 
> 00000000fce00000-00000000fcffffff (prio 1, RW): alias +> +> > pci_bridge_pref_mem @pci_bridge_pci 00000000fce00000-00000000fcffffff +> +> > +> +> > +> +> > +> +> > After pci_bridge_update_mappingsï¼ +> +> > +> +> > 00000000fda00000-00000000fdbfffff (prio 1, RW): alias +> +> > pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff +> +> > +> +> > 00000000fdc00000-00000000fddfffff (prio 1, RW): alias +> +> > pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff +> +> > +> +> > 00000000fde00000-00000000fdffffff (prio 1, RW): alias +> +> > pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff +> +> > +> +> > 00000000fe000000-00000000fe1fffff (prio 1, RW): alias +> +> > pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff +> +> > +> +> > 00000000fe200000-00000000fe3fffff (prio 1, RW): alias +> +> > pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff +> +> > +> +> > 00000000fe400000-00000000fe5fffff (prio 1, RW): alias +> +> > pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff +> +> > +> +> > 00000000fe600000-00000000fe7fffff (prio 1, RW): alias +> +> > pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff +> +> > +> +> > 00000000fe800000-00000000fe9fffff (prio 1, RW): alias +> +> > pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff +> +> > +> +> > fffffffffc800000-fffffffffc800000 (prio 1, RW): alias +> +> > pci_bridge_pref_mem +> +> > @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- Exceptional Adress +> +> Space +> +> +> +> This one is empty though right? +> +> +> +> > +> +> > +> +> > We have figured out why this address becomes this value, according to +> +> > pci spec, pci driver can get BAR address size by writing 0xffffffff +> +> > to +> +> > +> +> > the pci register firstly, and then read back the value from this register. +> +> +> +> +> +> OK however as you show below the BAR being sized is the BAR if a bridge. Are +> +> you then adding a bridge device by hotplug? +> +> +No, I just simply hot plugged a VFIO device to Bus 0, another interesting +> +phenomenon is +> +If I hot plug the device to other bus, this doesn't happened. +> +> +> +> +> +> +> > We didn't handle this value specially while process pci write in +> +> > qemu, the function call stack is: +> +> > +> +> > Pci_bridge_dev_write_config +> +> > +> +> > -> pci_bridge_write_config +> +> > +> +> > -> pci_default_write_config (we update the config[address] value here +> +> > -> to +> +> > fffffffffc800000, which should be 0xfc800000 ) +> +> > +> +> > -> pci_bridge_update_mappings +> +> > +> +> > ->pci_bridge_region_del(br, br->windows); +> +> > +> +> > -> pci_bridge_region_init +> +> > +> +> > -> +> +> > pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong +> +> > value +> +> > fffffffffc800000) +> +> > +> +> > -> +> +> > memory_region_transaction_commit +> +> > +> +> > +> +> > +> +> > So, as we can see, we use the wrong base address in qemu to update the +> +> > memory regions, though, we update the base address to +> +> > +> +> > The correct value after pci driver in VM write the original value +> +> > back, the virtio NIC in bus 4 may still sends net packets concurrently +> +> > with +> +> > +> +> > The wrong memory region address. +> +> > +> +> > +> +> > +> +> > We have tried to skip the memory region update action in qemu while +> +> > detect pci write with 0xffffffff value, and it does work, but +> +> > +> +> > This seems to be not gently. +> +> +> +> For sure. 
But I'm still puzzled as to why does Linux try to size the BAR of +> +> the +> +> bridge while a device behind it is used. +> +> +> +> Can you pls post your QEMU command line? +> +> +My QEMU command line: +> +/root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S -object +> +secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194-Linux/master-key.aes +> +-machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu +> +host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m +> +size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp +> +20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024 -numa +> +node,nodeid=1,cpus=5-9,mem=1024 -numa node,nodeid=2,cpus=10-14,mem=1024 -numa +> +node,nodeid=3,cpus=15-19,mem=1024 -uuid 34a588c7-b0f2-4952-b39c-47fae3411439 +> +-no-user-config -nodefaults -chardev +> +socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/monitor.sock,server,nowait +> +-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet +> +-global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on -device +> +pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device +> +pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device +> +pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device +> +pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device +> +pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device +> +piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device +> +usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device +> +nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device +> +virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device +> +virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device +> +virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device +> +virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device +> +virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive +> +file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-virtio-disk0,cache=none +> +-device +> +virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 +> +-drive if=none,id=drive-ide0-1-1,readonly=on,cache=none -device +> +ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev +> +tap,fd=35,id=hostnet0 -device +> +virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4,addr=0x1 +> +-chardev pty,id=charserial0 -device +> +isa-serial,chardev=charserial0,id=serial0 -device +> +usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device +> +cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device +> +virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on +> +> +I am also very curious about this issue, in the linux kernel code, maybe +> +double check in function pci_bridge_check_ranges triggered this problem. +If you can get the stacktrace in Linux when it tries to write this +fffff value, that would be quite helpful. 
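+
+For what it's worth, the "fffff value" is exactly what a bridge-window base
+composes to while the upper dword is mid-probe. A minimal sketch (modelled on
+how QEMU's pci_bridge_get_base() combines the two registers; the exact QEMU
+source is not reproduced here) reproduces the number seen in the dumps:
+
+#include <inttypes.h>
+#include <stdint.h>
+#include <stdio.h>
+
+int main(void)
+{
+    /* From the dumps earlier in the thread: the prefetchable window of the
+     * bridge starts at 0xfc800000, so PCI_PREF_MEMORY_BASE holds 0xfc80. */
+    uint16_t pref_base_lo = 0xfc80;      /* low register, 64 KiB units     */
+    uint32_t pref_base_hi = 0xffffffff;  /* transient all-ones from sizing */
+
+    /* A 64-bit window base is (upper dword << 32) | (low register << 16),
+     * which is how the 0xfffffffffc800000 alias in the dump comes about. */
+    uint64_t base = ((uint64_t)pref_base_hi << 32)
+                  | ((uint64_t)(pref_base_lo & 0xfff0) << 16);
+
+    printf("window base = 0x%016" PRIx64 "\n", base);
+    return 0;
+}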
+ + +> +> +> +> +> +> +> +> +> > +> +> > +> +> > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c +> +> > +> +> > index b2e50c3..84b405d 100644 +> +> > +> +> > --- a/hw/pci/pci_bridge.c +> +> > +> +> > +++ b/hw/pci/pci_bridge.c +> +> > +> +> > @@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d, +> +> > +> +> > pci_default_write_config(d, address, val, len); +> +> > +> +> > - if (ranges_overlap(address, len, PCI_COMMAND, 2) || +> +> > +> +> > + if ( (val != 0xffffffff) && +> +> > +> +> > + (ranges_overlap(address, len, PCI_COMMAND, 2) || +> +> > +> +> > /* io base/limit */ +> +> > +> +> > ranges_overlap(address, len, PCI_IO_BASE, 2) || +> +> > +> +> > @@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d, +> +> > +> +> > ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || +> +> > +> +> > /* vga enable */ +> +> > +> +> > - ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { +> +> > +> +> > + ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) { +> +> > +> +> > pci_bridge_update_mappings(s); +> +> > +> +> > } +> +> > +> +> > +> +> > +> +> > Thinks, +> +> > +> +> > Xu +> +> > + +On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: +> +> > > Hi all, +> +> > > +> +> > > +> +> > > +> +> > > In our test, we configured VM with several pci-bridges and a +> +> > > virtio-net nic been attached with bus 4, +> +> > > +> +> > > After VM is startup, We ping this nic from host to judge if it is +> +> > > working normally. Then, we hot add pci devices to this VM with bus 0. +> +> > > +> +> > > We found the virtio-net NIC in bus 4 is not working (can not +> +> > > connect) occasionally, as it kick virtio backend failure with error +> +> > > below: +> +> > > +> +> > > Unassigned mem write 00000000fc803004 = 0x1 +> +> > > +> +> > > +> +> > > +> +> > > memory-region: pci_bridge_pci +> +> > > +> +> > > 0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci +> +> > > +> +> > > 00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci +> +> > > +> +> > > 00000000fc800000-00000000fc800fff (prio 0, RW): +> +> > > virtio-pci-common +> +> > > +> +> > > 00000000fc801000-00000000fc801fff (prio 0, RW): +> +> > > virtio-pci-isr +> +> > > +> +> > > 00000000fc802000-00000000fc802fff (prio 0, RW): +> +> > > virtio-pci-device +> +> > > +> +> > > 00000000fc803000-00000000fc803fff (prio 0, RW): +> +> > > virtio-pci-notify <- io mem unassigned +> +> > > +> +> > > ⦠+> +> > > +> +> > > +> +> > > +> +> > > We caught an exceptional address changing while this problem +> +> > > happened, show as +> +> > > follow: +> +> > > +> +> > > Before pci_bridge_update_mappingsï¼ +> +> > > +> +> > > 00000000fc000000-00000000fc1fffff (prio 1, RW): alias +> +> > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > 00000000fc000000-00000000fc1fffff +> +> > > +> +> > > 00000000fc200000-00000000fc3fffff (prio 1, RW): alias +> +> > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > 00000000fc200000-00000000fc3fffff +> +> > > +> +> > > 00000000fc400000-00000000fc5fffff (prio 1, RW): alias +> +> > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > 00000000fc400000-00000000fc5fffff +> +> > > +> +> > > 00000000fc600000-00000000fc7fffff (prio 1, RW): alias +> +> > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > 00000000fc600000-00000000fc7fffff +> +> > > +> +> > > 00000000fc800000-00000000fc9fffff (prio 1, RW): alias +> +> > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > 00000000fc800000-00000000fc9fffff +> +> > > <- correct Adress Spce +> +> > > +> +> > > 00000000fca00000-00000000fcbfffff (prio 1, RW): alias +> +> > > 
pci_bridge_pref_mem @pci_bridge_pci +> +> > > 00000000fca00000-00000000fcbfffff +> +> > > +> +> > > 00000000fcc00000-00000000fcdfffff (prio 1, RW): alias +> +> > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > 00000000fcc00000-00000000fcdfffff +> +> > > +> +> > > 00000000fce00000-00000000fcffffff (prio 1, RW): alias +> +> > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > 00000000fce00000-00000000fcffffff +> +> > > +> +> > > +> +> > > +> +> > > After pci_bridge_update_mappingsï¼ +> +> > > +> +> > > 00000000fda00000-00000000fdbfffff (prio 1, RW): alias +> +> > > pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff +> +> > > +> +> > > 00000000fdc00000-00000000fddfffff (prio 1, RW): alias +> +> > > pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff +> +> > > +> +> > > 00000000fde00000-00000000fdffffff (prio 1, RW): alias +> +> > > pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff +> +> > > +> +> > > 00000000fe000000-00000000fe1fffff (prio 1, RW): alias +> +> > > pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff +> +> > > +> +> > > 00000000fe200000-00000000fe3fffff (prio 1, RW): alias +> +> > > pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff +> +> > > +> +> > > 00000000fe400000-00000000fe5fffff (prio 1, RW): alias +> +> > > pci_bridge_mem @pci_bridge_pci 00000000fe400000-00000000fe5fffff +> +> > > +> +> > > 00000000fe600000-00000000fe7fffff (prio 1, RW): alias +> +> > > pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff +> +> > > +> +> > > 00000000fe800000-00000000fe9fffff (prio 1, RW): alias +> +> > > pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff +> +> > > +> +> > > fffffffffc800000-fffffffffc800000 (prio 1, RW): alias +> +pci_bridge_pref_mem +> +> > > @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- Exceptional +> +> > > Adress +> +> > Space +> +> > +> +> > This one is empty though right? +> +> > +> +> > > +> +> > > +> +> > > We have figured out why this address becomes this value, +> +> > > according to pci spec, pci driver can get BAR address size by +> +> > > writing 0xffffffff to +> +> > > +> +> > > the pci register firstly, and then read back the value from this +> +> > > register. +> +> > +> +> > +> +> > OK however as you show below the BAR being sized is the BAR if a +> +> > bridge. Are you then adding a bridge device by hotplug? +> +> +> +> No, I just simply hot plugged a VFIO device to Bus 0, another +> +> interesting phenomenon is If I hot plug the device to other bus, this +> +> doesn't +> +happened. 
+> +> +> +> > +> +> > +> +> > > We didn't handle this value specially while process pci write in +> +> > > qemu, the function call stack is: +> +> > > +> +> > > Pci_bridge_dev_write_config +> +> > > +> +> > > -> pci_bridge_write_config +> +> > > +> +> > > -> pci_default_write_config (we update the config[address] value +> +> > > -> here to +> +> > > fffffffffc800000, which should be 0xfc800000 ) +> +> > > +> +> > > -> pci_bridge_update_mappings +> +> > > +> +> > > ->pci_bridge_region_del(br, br->windows); +> +> > > +> +> > > -> pci_bridge_region_init +> +> > > +> +> > > -> +> +> > > pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong +> +> > > value +> +> > > fffffffffc800000) +> +> > > +> +> > > -> +> +> > > memory_region_transaction_commit +> +> > > +> +> > > +> +> > > +> +> > > So, as we can see, we use the wrong base address in qemu to update +> +> > > the memory regions, though, we update the base address to +> +> > > +> +> > > The correct value after pci driver in VM write the original value +> +> > > back, the virtio NIC in bus 4 may still sends net packets +> +> > > concurrently with +> +> > > +> +> > > The wrong memory region address. +> +> > > +> +> > > +> +> > > +> +> > > We have tried to skip the memory region update action in qemu +> +> > > while detect pci write with 0xffffffff value, and it does work, +> +> > > but +> +> > > +> +> > > This seems to be not gently. +> +> > +> +> > For sure. But I'm still puzzled as to why does Linux try to size the +> +> > BAR of the bridge while a device behind it is used. +> +> > +> +> > Can you pls post your QEMU command line? +> +> +> +> My QEMU command line: +> +> /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S +> +> -object +> +> secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194- +> +> Linux/master-key.aes -machine +> +> pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu +> +> host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m +> +> size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp +> +> 20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024 +> +> -numa node,nodeid=1,cpus=5-9,mem=1024 -numa +> +> node,nodeid=2,cpus=10-14,mem=1024 -numa +> +> node,nodeid=3,cpus=15-19,mem=1024 -uuid +> +> 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults +> +> -chardev +> +> socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/moni +> +> tor.sock,server,nowait -mon +> +> chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet +> +> -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on +> +> -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device +> +> pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device +> +> pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device +> +> pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device +> +> pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device +> +> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device +> +> usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device +> +> nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device +> +> virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device +> +> virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device +> +> virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device +> +> virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device +> +> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive +> +> file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-v +> +> irtio-disk0,cache=none -device +> +> 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id +> +> =virtio-disk0,bootindex=1 -drive +> +> if=none,id=drive-ide0-1-1,readonly=on,cache=none -device +> +> ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev +> +> tap,fd=35,id=hostnet0 -device +> +> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4 +> +> ,addr=0x1 -chardev pty,id=charserial0 -device +> +> isa-serial,chardev=charserial0,id=serial0 -device +> +> usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device +> +> cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device +> +> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on +> +> +> +> I am also very curious about this issue, in the linux kernel code, maybe +> +> double +> +check in function pci_bridge_check_ranges triggered this problem. +> +> +If you can get the stacktrace in Linux when it tries to write this fffff +> +value, that +> +would be quite helpful. +> +After I add mdelay(100) in function pci_bridge_check_ranges, this phenomenon is +easier to reproduce, below is my modify in kernel: +diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c +index cb389277..86e232d 100644 +--- a/drivers/pci/setup-bus.c ++++ b/drivers/pci/setup-bus.c +@@ -27,7 +27,7 @@ + #include <linux/slab.h> + #include <linux/acpi.h> + #include "pci.h" +- ++#include <linux/delay.h> + unsigned int pci_flags; + + struct pci_dev_resource { +@@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus *bus) + pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, + 0xffffffff); + pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &tmp); ++ mdelay(100); ++ printk(KERN_ERR "sleep\n"); ++ dump_stack(); + if (!tmp) + b_res[2].flags &= ~IORESOURCE_MEM_64; + pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, + +After hot plugging, we get the following log: + +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:14.0: BAR 0: assigned [mem +0xc2360000-0xc237ffff 64bit pref] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:14.0: BAR 3: assigned [mem +0xc2328000-0xc232bfff 64bit pref] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:16 
uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:16 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] 
+Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem 
+0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:17 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:18 uefi-linux kernel: sleep +Dec 11 09:28:18 uefi-linux kernel: CPU: 16 PID: 502 Comm: kworker/u40:1 Not +tainted 4.11.0-rc3+ #11 +Dec 11 09:28:18 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + +PIIX, 1996), BIOS 0.0.0 02/06/2015 +Dec 11 09:28:18 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn +Dec 11 09:28:18 uefi-linux kernel: Call Trace: +Dec 11 09:28:18 uefi-linux kernel: dump_stack+0x63/0x87 +Dec 11 09:28:18 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960 +Dec 11 09:28:18 uefi-linux kernel: ? dev_printk+0x4d/0x50 +Dec 11 09:28:18 uefi-linux kernel: enable_slot+0x140/0x2f0 +Dec 11 09:28:18 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80 +Dec 11 09:28:18 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120 +Dec 11 09:28:18 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120 +Dec 11 09:28:18 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0 +Dec 11 09:28:18 uefi-linux kernel: ? 
acpiphp_post_dock_fixup+0xc0/0xc0 +Dec 11 09:28:18 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3 +Dec 11 09:28:18 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29 +Dec 11 09:28:18 uefi-linux kernel: process_one_work+0x165/0x410 +Dec 11 09:28:18 uefi-linux kernel: worker_thread+0x137/0x4c0 +Dec 11 09:28:18 uefi-linux kernel: kthread+0x101/0x140 +Dec 11 09:28:18 uefi-linux kernel: ? rescuer_thread+0x380/0x380 +Dec 11 09:28:18 uefi-linux kernel: ? kthread_park+0x90/0x90 +Dec 11 09:28:18 uefi-linux kernel: ret_from_fork+0x2c/0x40 +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:18 uefi-linux kernel: sleep +Dec 11 09:28:18 uefi-linux kernel: CPU: 16 PID: 502 Comm: kworker/u40:1 Not +tainted 4.11.0-rc3+ #11 +Dec 11 09:28:18 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + +PIIX, 1996), BIOS 0.0.0 02/06/2015 +Dec 11 09:28:18 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn +Dec 11 09:28:18 uefi-linux kernel: Call Trace: +Dec 11 09:28:18 uefi-linux kernel: dump_stack+0x63/0x87 +Dec 11 09:28:18 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960 +Dec 11 09:28:18 uefi-linux kernel: ? dev_printk+0x4d/0x50 +Dec 11 09:28:18 uefi-linux kernel: enable_slot+0x140/0x2f0 +Dec 11 09:28:18 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80 +Dec 11 09:28:18 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120 +Dec 11 09:28:18 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120 +Dec 11 09:28:18 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0 +Dec 11 09:28:18 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0 +Dec 11 09:28:18 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3 +Dec 11 09:28:18 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29 +Dec 11 09:28:18 uefi-linux kernel: process_one_work+0x165/0x410 +Dec 11 09:28:18 uefi-linux kernel: worker_thread+0x137/0x4c0 +Dec 11 09:28:18 uefi-linux kernel: kthread+0x101/0x140 +Dec 11 09:28:18 uefi-linux kernel: ? rescuer_thread+0x380/0x380 +Dec 11 09:28:18 uefi-linux kernel: ? 
kthread_park+0x90/0x90 +Dec 11 09:28:18 uefi-linux kernel: ret_from_fork+0x2c/0x40 +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:18 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:19 uefi-linux kernel: sleep +Dec 11 09:28:19 uefi-linux kernel: CPU: 17 PID: 502 Comm: kworker/u40:1 Not +tainted 4.11.0-rc3+ #11 +Dec 11 09:28:19 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + +PIIX, 1996), BIOS 0.0.0 02/06/2015 +Dec 11 09:28:19 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn +Dec 11 09:28:19 uefi-linux kernel: Call Trace: +Dec 11 09:28:19 uefi-linux kernel: dump_stack+0x63/0x87 +Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960 +Dec 11 09:28:19 uefi-linux kernel: ? dev_printk+0x4d/0x50 +Dec 11 09:28:19 uefi-linux kernel: enable_slot+0x140/0x2f0 +Dec 11 09:28:19 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80 +Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120 +Dec 11 09:28:19 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120 +Dec 11 09:28:19 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0 +Dec 11 09:28:19 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0 +Dec 11 09:28:19 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3 +Dec 11 09:28:19 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29 +Dec 11 09:28:19 uefi-linux kernel: process_one_work+0x165/0x410 +Dec 11 09:28:19 uefi-linux kernel: worker_thread+0x137/0x4c0 +Dec 11 09:28:19 uefi-linux kernel: kthread+0x101/0x140 +Dec 11 09:28:19 uefi-linux kernel: ? rescuer_thread+0x380/0x380 +Dec 11 09:28:19 uefi-linux kernel: ? 
kthread_park+0x90/0x90 +Dec 11 09:28:19 uefi-linux kernel: ret_from_fork+0x2c/0x40 +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:19 uefi-linux kernel: sleep +Dec 11 09:28:19 uefi-linux kernel: CPU: 17 PID: 502 Comm: kworker/u40:1 Not +tainted 4.11.0-rc3+ #11 +Dec 11 09:28:19 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + +PIIX, 1996), BIOS 0.0.0 02/06/2015 +Dec 11 09:28:19 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn +Dec 11 09:28:19 uefi-linux kernel: Call Trace: +Dec 11 09:28:19 uefi-linux kernel: dump_stack+0x63/0x87 +Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960 +Dec 11 09:28:19 uefi-linux kernel: ? pci_conf1_read+0xba/0x100 +Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0xe9/0x960 +Dec 11 09:28:19 uefi-linux kernel: ? dev_printk+0x4d/0x50 +Dec 11 09:28:19 uefi-linux kernel: ? pcibios_allocate_rom_resources+0x45/0x80 +Dec 11 09:28:19 uefi-linux kernel: enable_slot+0x140/0x2f0 +Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120 +Dec 11 09:28:19 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80 +Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120 +Dec 11 09:28:19 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120 +Dec 11 09:28:19 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0 +Dec 11 09:28:19 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0 +Dec 11 09:28:19 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3 +Dec 11 09:28:19 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29 +Dec 11 09:28:19 uefi-linux kernel: process_one_work+0x165/0x410 +Dec 11 09:28:19 uefi-linux kernel: worker_thread+0x137/0x4c0 +Dec 11 09:28:19 uefi-linux kernel: kthread+0x101/0x140 +Dec 11 09:28:19 uefi-linux kernel: ? rescuer_thread+0x380/0x380 +Dec 11 09:28:19 uefi-linux kernel: ? 
kthread_park+0x90/0x90 +Dec 11 09:28:19 uefi-linux kernel: ret_from_fork+0x2c/0x40 +Dec 11 09:28:19 uefi-linux kernel: sleep +Dec 11 09:28:19 uefi-linux kernel: CPU: 17 PID: 502 Comm: kworker/u40:1 Not +tainted 4.11.0-rc3+ #11 +Dec 11 09:28:19 uefi-linux kernel: Hardware name: QEMU Standard PC (i440FX + +PIIX, 1996), BIOS 0.0.0 02/06/2015 +Dec 11 09:28:19 uefi-linux kernel: Workqueue: kacpi_hotplug acpi_hotplug_work_fn +Dec 11 09:28:19 uefi-linux kernel: Call Trace: +Dec 11 09:28:19 uefi-linux kernel: dump_stack+0x63/0x87 +Dec 11 09:28:19 uefi-linux kernel: __pci_bus_size_bridges+0x931/0x960 +Dec 11 09:28:19 uefi-linux kernel: ? dev_printk+0x4d/0x50 +Dec 11 09:28:19 uefi-linux kernel: enable_slot+0x140/0x2f0 +Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120 +Dec 11 09:28:19 uefi-linux kernel: ? __pm_runtime_resume+0x5c/0x80 +Dec 11 09:28:19 uefi-linux kernel: ? trim_stale_devices+0x9a/0x120 +Dec 11 09:28:19 uefi-linux kernel: acpiphp_check_bridge.part.6+0xf5/0x120 +Dec 11 09:28:19 uefi-linux kernel: acpiphp_hotplug_notify+0x145/0x1c0 +Dec 11 09:28:19 uefi-linux kernel: ? acpiphp_post_dock_fixup+0xc0/0xc0 +Dec 11 09:28:19 uefi-linux kernel: acpi_device_hotplug+0x3a6/0x3f3 +Dec 11 09:28:19 uefi-linux kernel: acpi_hotplug_work_fn+0x1e/0x29 +Dec 11 09:28:19 uefi-linux kernel: process_one_work+0x165/0x410 +Dec 11 09:28:19 uefi-linux kernel: worker_thread+0x137/0x4c0 +Dec 11 09:28:19 uefi-linux kernel: kthread+0x101/0x140 +Dec 11 09:28:19 uefi-linux kernel: ? rescuer_thread+0x380/0x380 +Dec 11 09:28:19 uefi-linux kernel: ? kthread_park+0x90/0x90 +Dec 11 09:28:19 uefi-linux kernel: ret_from_fork+0x2c/0x40 +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:19 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] 
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 lost sync at byte 1 +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 - driver resynced. +Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 lost sync at byte 1 +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 - driver resynced. 
+Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 lost sync at byte 1 +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 - driver resynced. +Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 lost sync at byte 1 +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 - driver resynced. +Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 lost sync at byte 1 +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 - driver resynced. +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 lost sync at byte 1 +Dec 11 09:28:20 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:20 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 - driver resynced. 
+Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:20 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 lost sync at byte 1 +Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 - driver resynced. +Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 lost sync at byte 1 +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 - driver resynced. +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 lost sync at byte 1 +Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 lost sync at byte 1 +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:21 uefi-linux kernel: psmouse serio1: VMMouse at +isa0060/serio1/input0 - driver resynced. 
+Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:21 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] +Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0: PCI bridge to [bus 01] +Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0: bridge window [io +0xf000-0xffff] +Dec 11 
09:28:22 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2800000-0xc29fffff] +Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:08.0: bridge window [mem +0xc2b00000-0xc2cfffff 64bit pref] +Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0: PCI bridge to [bus 02] +Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0: bridge window [io +0xe000-0xefff] +Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2600000-0xc27fffff] +Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:09.0: bridge window [mem +0xc2d00000-0xc2efffff 64bit pref] +Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0: PCI bridge to [bus 03] +Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0: bridge window [io +0xd000-0xdfff] +Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2400000-0xc25fffff] +Dec 11 09:28:22 uefi-linux kernel: pci 0000:00:0a.0: bridge window [mem +0xc2f00000-0xc30fffff 64bit pref] +Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0: PCI bridge to [bus 05] +Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0: bridge window [io +0xc000-0xcfff] +Dec 11 09:28:22 uefi-linux kernel: pci 0000:04:0c.0: bridge window [mem +0xc2000000-0xc21fffff] + +> +> +> +> > +> +> > +> +> > +> +> > > +> +> > > +> +> > > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c +> +> > > +> +> > > index b2e50c3..84b405d 100644 +> +> > > +> +> > > --- a/hw/pci/pci_bridge.c +> +> > > +> +> > > +++ b/hw/pci/pci_bridge.c +> +> > > +> +> > > @@ -256,7 +256,8 @@ void pci_bridge_write_config(PCIDevice *d, +> +> > > +> +> > > pci_default_write_config(d, address, val, len); +> +> > > +> +> > > - if (ranges_overlap(address, len, PCI_COMMAND, 2) || +> +> > > +> +> > > + if ( (val != 0xffffffff) && +> +> > > +> +> > > + (ranges_overlap(address, len, PCI_COMMAND, 2) || +> +> > > +> +> > > /* io base/limit */ +> +> > > +> +> > > ranges_overlap(address, len, PCI_IO_BASE, 2) || +> +> > > +> +> > > @@ -266,7 +267,7 @@ void pci_bridge_write_config(PCIDevice *d, +> +> > > +> +> > > ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || +> +> > > +> +> > > /* vga enable */ +> +> > > +> +> > > - ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { +> +> > > +> +> > > + ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2))) { +> +> > > +> +> > > pci_bridge_update_mappings(s); +> +> > > +> +> > > } +> +> > > +> +> > > +> +> > > +> +> > > Thinks, +> +> > > +> +> > > Xu +> +> > > + +On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote: +> +On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: +> +> > > > Hi all, +> +> > > > +> +> > > > +> +> > > > +> +> > > > In our test, we configured VM with several pci-bridges and a +> +> > > > virtio-net nic been attached with bus 4, +> +> > > > +> +> > > > After VM is startup, We ping this nic from host to judge if it is +> +> > > > working normally. Then, we hot add pci devices to this VM with bus 0. 
+> +> > > > +> +> > > > We found the virtio-net NIC in bus 4 is not working (can not +> +> > > > connect) occasionally, as it kick virtio backend failure with error +> +> > > > below: +> +> > > > +> +> > > > Unassigned mem write 00000000fc803004 = 0x1 +> +> > > > +> +> > > > +> +> > > > +> +> > > > memory-region: pci_bridge_pci +> +> > > > +> +> > > > 0000000000000000-ffffffffffffffff (prio 0, RW): pci_bridge_pci +> +> > > > +> +> > > > 00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci +> +> > > > +> +> > > > 00000000fc800000-00000000fc800fff (prio 0, RW): +> +> > > > virtio-pci-common +> +> > > > +> +> > > > 00000000fc801000-00000000fc801fff (prio 0, RW): +> +> > > > virtio-pci-isr +> +> > > > +> +> > > > 00000000fc802000-00000000fc802fff (prio 0, RW): +> +> > > > virtio-pci-device +> +> > > > +> +> > > > 00000000fc803000-00000000fc803fff (prio 0, RW): +> +> > > > virtio-pci-notify <- io mem unassigned +> +> > > > +> +> > > > ⦠+> +> > > > +> +> > > > +> +> > > > +> +> > > > We caught an exceptional address changing while this problem +> +> > > > happened, show as +> +> > > > follow: +> +> > > > +> +> > > > Before pci_bridge_update_mappingsï¼ +> +> > > > +> +> > > > 00000000fc000000-00000000fc1fffff (prio 1, RW): alias +> +> > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > 00000000fc000000-00000000fc1fffff +> +> > > > +> +> > > > 00000000fc200000-00000000fc3fffff (prio 1, RW): alias +> +> > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > 00000000fc200000-00000000fc3fffff +> +> > > > +> +> > > > 00000000fc400000-00000000fc5fffff (prio 1, RW): alias +> +> > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > 00000000fc400000-00000000fc5fffff +> +> > > > +> +> > > > 00000000fc600000-00000000fc7fffff (prio 1, RW): alias +> +> > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > 00000000fc600000-00000000fc7fffff +> +> > > > +> +> > > > 00000000fc800000-00000000fc9fffff (prio 1, RW): alias +> +> > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > 00000000fc800000-00000000fc9fffff +> +> > > > <- correct Adress Spce +> +> > > > +> +> > > > 00000000fca00000-00000000fcbfffff (prio 1, RW): alias +> +> > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > 00000000fca00000-00000000fcbfffff +> +> > > > +> +> > > > 00000000fcc00000-00000000fcdfffff (prio 1, RW): alias +> +> > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > 00000000fcc00000-00000000fcdfffff +> +> > > > +> +> > > > 00000000fce00000-00000000fcffffff (prio 1, RW): alias +> +> > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > 00000000fce00000-00000000fcffffff +> +> > > > +> +> > > > +> +> > > > +> +> > > > After pci_bridge_update_mappingsï¼ +> +> > > > +> +> > > > 00000000fda00000-00000000fdbfffff (prio 1, RW): alias +> +> > > > pci_bridge_mem @pci_bridge_pci 00000000fda00000-00000000fdbfffff +> +> > > > +> +> > > > 00000000fdc00000-00000000fddfffff (prio 1, RW): alias +> +> > > > pci_bridge_mem @pci_bridge_pci 00000000fdc00000-00000000fddfffff +> +> > > > +> +> > > > 00000000fde00000-00000000fdffffff (prio 1, RW): alias +> +> > > > pci_bridge_mem @pci_bridge_pci 00000000fde00000-00000000fdffffff +> +> > > > +> +> > > > 00000000fe000000-00000000fe1fffff (prio 1, RW): alias +> +> > > > pci_bridge_mem @pci_bridge_pci 00000000fe000000-00000000fe1fffff +> +> > > > +> +> > > > 00000000fe200000-00000000fe3fffff (prio 1, RW): alias +> +> > > > pci_bridge_mem @pci_bridge_pci 00000000fe200000-00000000fe3fffff +> +> > > > +> +> > > > 00000000fe400000-00000000fe5fffff (prio 1, RW): alias +> +> > > > pci_bridge_mem 
@pci_bridge_pci 00000000fe400000-00000000fe5fffff +> +> > > > +> +> > > > 00000000fe600000-00000000fe7fffff (prio 1, RW): alias +> +> > > > pci_bridge_mem @pci_bridge_pci 00000000fe600000-00000000fe7fffff +> +> > > > +> +> > > > 00000000fe800000-00000000fe9fffff (prio 1, RW): alias +> +> > > > pci_bridge_mem @pci_bridge_pci 00000000fe800000-00000000fe9fffff +> +> > > > +> +> > > > fffffffffc800000-fffffffffc800000 (prio 1, RW): alias +> +> pci_bridge_pref_mem +> +> > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- Exceptional +> +> > > > Adress +> +> > > Space +> +> > > +> +> > > This one is empty though right? +> +> > > +> +> > > > +> +> > > > +> +> > > > We have figured out why this address becomes this value, +> +> > > > according to pci spec, pci driver can get BAR address size by +> +> > > > writing 0xffffffff to +> +> > > > +> +> > > > the pci register firstly, and then read back the value from this +> +> > > > register. +> +> > > +> +> > > +> +> > > OK however as you show below the BAR being sized is the BAR if a +> +> > > bridge. Are you then adding a bridge device by hotplug? +> +> > +> +> > No, I just simply hot plugged a VFIO device to Bus 0, another +> +> > interesting phenomenon is If I hot plug the device to other bus, this +> +> > doesn't +> +> happened. +> +> > +> +> > > +> +> > > +> +> > > > We didn't handle this value specially while process pci write in +> +> > > > qemu, the function call stack is: +> +> > > > +> +> > > > Pci_bridge_dev_write_config +> +> > > > +> +> > > > -> pci_bridge_write_config +> +> > > > +> +> > > > -> pci_default_write_config (we update the config[address] value +> +> > > > -> here to +> +> > > > fffffffffc800000, which should be 0xfc800000 ) +> +> > > > +> +> > > > -> pci_bridge_update_mappings +> +> > > > +> +> > > > ->pci_bridge_region_del(br, br->windows); +> +> > > > +> +> > > > -> pci_bridge_region_init +> +> > > > +> +> > > > -> +> +> > > > pci_bridge_init_alias (here pci_bridge_get_base, we use the wrong +> +> > > > value +> +> > > > fffffffffc800000) +> +> > > > +> +> > > > -> +> +> > > > memory_region_transaction_commit +> +> > > > +> +> > > > +> +> > > > +> +> > > > So, as we can see, we use the wrong base address in qemu to update +> +> > > > the memory regions, though, we update the base address to +> +> > > > +> +> > > > The correct value after pci driver in VM write the original value +> +> > > > back, the virtio NIC in bus 4 may still sends net packets +> +> > > > concurrently with +> +> > > > +> +> > > > The wrong memory region address. +> +> > > > +> +> > > > +> +> > > > +> +> > > > We have tried to skip the memory region update action in qemu +> +> > > > while detect pci write with 0xffffffff value, and it does work, +> +> > > > but +> +> > > > +> +> > > > This seems to be not gently. +> +> > > +> +> > > For sure. But I'm still puzzled as to why does Linux try to size the +> +> > > BAR of the bridge while a device behind it is used. +> +> > > +> +> > > Can you pls post your QEMU command line? 
+> +> > +> +> > My QEMU command line: +> +> > /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S +> +> > -object +> +> > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain-194- +> +> > Linux/master-key.aes -machine +> +> > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu +> +> > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m +> +> > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp +> +> > 20,sockets=20,cores=1,threads=1 -numa node,nodeid=0,cpus=0-4,mem=1024 +> +> > -numa node,nodeid=1,cpus=5-9,mem=1024 -numa +> +> > node,nodeid=2,cpus=10-14,mem=1024 -numa +> +> > node,nodeid=3,cpus=15-19,mem=1024 -uuid +> +> > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults +> +> > -chardev +> +> > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/moni +> +> > tor.sock,server,nowait -mon +> +> > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet +> +> > -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on +> +> > -device pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device +> +> > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device +> +> > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device +> +> > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device +> +> > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device +> +> > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device +> +> > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device +> +> > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device +> +> > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device +> +> > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device +> +> > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device +> +> > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device +> +> > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive +> +> > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=drive-v +> +> > irtio-disk0,cache=none -device +> +> > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk0,id +> +> > =virtio-disk0,bootindex=1 -drive +> +> > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device +> +> > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev +> +> > tap,fd=35,id=hostnet0 -device +> +> > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=pci.4 +> +> > ,addr=0x1 -chardev pty,id=charserial0 -device +> +> > isa-serial,chardev=charserial0,id=serial0 -device +> +> > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device +> +> > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device +> +> > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg timestamp=on +> +> > +> +> > I am also very curious about this issue, in the linux kernel code, maybe +> +> > double +> +> check in function pci_bridge_check_ranges triggered this problem. +> +> +> +> If you can get the stacktrace in Linux when it tries to write this fffff +> +> value, that +> +> would be quite helpful. 
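For readers following the thread: the probe that produces the bogus window is the bridge-range check in the guest's drivers/pci/setup-bus.c, which writes all-ones to PCI_PREF_BASE_UPPER32, reads it back to test whether the register is writable (64-bit capable), and only then restores the real value. The following is a minimal user-space sketch, not the actual kernel or QEMU code; the 0xfc800000 window value is taken from the memory-region dump earlier in the thread, and the helper name is illustrative. It reproduces the arithmetic and shows the transient base a device model would compute if it rebuilt its mappings between the all-ones write and the restore.

/*
 * Sketch only: models the prefetchable-window registers of one bridge
 * and the sizing probe performed on PCI_PREF_BASE_UPPER32.
 */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static uint16_t pref_memory_base  = 0xfc81;      /* offset 0x24: bits 15:4 = address bits 31:20 */
static uint32_t pref_base_upper32 = 0x00000000;  /* offset 0x28: PCI_PREF_BASE_UPPER32 */

/* 64-bit prefetchable window base, derived the way a bridge model derives it. */
static uint64_t pref_window_base(void)
{
    return ((uint64_t)pref_base_upper32 << 32) |
           ((uint64_t)(pref_memory_base & 0xfff0) << 16);
}

int main(void)
{
    printf("base before probe: 0x%016" PRIx64 "\n", pref_window_base());

    uint32_t saved = pref_base_upper32;
    pref_base_upper32 = 0xffffffff;              /* sizing write, as in setup-bus.c */
    printf("base during probe: 0x%016" PRIx64 "\n", pref_window_base());

    /* the read-back happens here; a non-zero result means 64-bit capable */
    pref_base_upper32 = saved;                   /* restore the real value */
    printf("base after probe:  0x%016" PRIx64 "\n", pref_window_base());
    return 0;
}

The "during probe" value printed by this sketch is exactly the fffffffffc800000 alias seen in the dump; the mdelay(100) instrumentation shown in the next message simply widens that window so the race becomes easy to reproduce.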
+> +> +> +> +After I add mdelay(100) in function pci_bridge_check_ranges, this phenomenon +> +is +> +easier to reproduce, below is my modify in kernel: +> +diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c +> +index cb389277..86e232d 100644 +> +--- a/drivers/pci/setup-bus.c +> ++++ b/drivers/pci/setup-bus.c +> +@@ -27,7 +27,7 @@ +> +#include <linux/slab.h> +> +#include <linux/acpi.h> +> +#include "pci.h" +> +- +> ++#include <linux/delay.h> +> +unsigned int pci_flags; +> +> +struct pci_dev_resource { +> +@@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus *bus) +> +pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, +> +0xffffffff); +> +pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, &tmp); +> ++ mdelay(100); +> ++ printk(KERN_ERR "sleep\n"); +> ++ dump_stack(); +> +if (!tmp) +> +b_res[2].flags &= ~IORESOURCE_MEM_64; +> +pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, +> +OK! +I just sent a Linux patch that should help. +I would appreciate it if you will give it a try +and if that helps reply to it with +a Tested-by: tag. + +-- +MST + +On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote: +> +> On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: +> +> > > > > Hi all, +> +> > > > > +> +> > > > > +> +> > > > > +> +> > > > > In our test, we configured VM with several pci-bridges and a +> +> > > > > virtio-net nic been attached with bus 4, +> +> > > > > +> +> > > > > After VM is startup, We ping this nic from host to judge if it +> +> > > > > is working normally. Then, we hot add pci devices to this VM with +> +> > > > > bus +> +0. +> +> > > > > +> +> > > > > We found the virtio-net NIC in bus 4 is not working (can not +> +> > > > > connect) occasionally, as it kick virtio backend failure with error +> +> > > > > below: +> +> > > > > +> +> > > > > Unassigned mem write 00000000fc803004 = 0x1 +> +> > > > > +> +> > > > > +> +> > > > > +> +> > > > > memory-region: pci_bridge_pci +> +> > > > > +> +> > > > > 0000000000000000-ffffffffffffffff (prio 0, RW): +> +> > > > > pci_bridge_pci +> +> > > > > +> +> > > > > 00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci +> +> > > > > +> +> > > > > 00000000fc800000-00000000fc800fff (prio 0, RW): +> +> > > > > virtio-pci-common +> +> > > > > +> +> > > > > 00000000fc801000-00000000fc801fff (prio 0, RW): +> +> > > > > virtio-pci-isr +> +> > > > > +> +> > > > > 00000000fc802000-00000000fc802fff (prio 0, RW): +> +> > > > > virtio-pci-device +> +> > > > > +> +> > > > > 00000000fc803000-00000000fc803fff (prio 0, RW): +> +> > > > > virtio-pci-notify <- io mem unassigned +> +> > > > > +> +> > > > > ⦠+> +> > > > > +> +> > > > > +> +> > > > > +> +> > > > > We caught an exceptional address changing while this problem +> +> > > > > happened, show as +> +> > > > > follow: +> +> > > > > +> +> > > > > Before pci_bridge_update_mappingsï¼ +> +> > > > > +> +> > > > > 00000000fc000000-00000000fc1fffff (prio 1, RW): alias +> +> > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > 00000000fc000000-00000000fc1fffff +> +> > > > > +> +> > > > > 00000000fc200000-00000000fc3fffff (prio 1, RW): alias +> +> > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > 00000000fc200000-00000000fc3fffff +> +> > > > > +> +> > > > > 00000000fc400000-00000000fc5fffff (prio 1, RW): alias +> +> > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > 00000000fc400000-00000000fc5fffff +> +> > > > > +> +> > > > > 00000000fc600000-00000000fc7fffff (prio 1, RW): alias +> +> > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > 
00000000fc600000-00000000fc7fffff +> +> > > > > +> +> > > > > 00000000fc800000-00000000fc9fffff (prio 1, RW): alias +> +> > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > 00000000fc800000-00000000fc9fffff +> +> > > > > <- correct Adress Spce +> +> > > > > +> +> > > > > 00000000fca00000-00000000fcbfffff (prio 1, RW): alias +> +> > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > 00000000fca00000-00000000fcbfffff +> +> > > > > +> +> > > > > 00000000fcc00000-00000000fcdfffff (prio 1, RW): alias +> +> > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > 00000000fcc00000-00000000fcdfffff +> +> > > > > +> +> > > > > 00000000fce00000-00000000fcffffff (prio 1, RW): alias +> +> > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > 00000000fce00000-00000000fcffffff +> +> > > > > +> +> > > > > +> +> > > > > +> +> > > > > After pci_bridge_update_mappingsï¼ +> +> > > > > +> +> > > > > 00000000fda00000-00000000fdbfffff (prio 1, RW): alias +> +> > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > 00000000fda00000-00000000fdbfffff +> +> > > > > +> +> > > > > 00000000fdc00000-00000000fddfffff (prio 1, RW): alias +> +> > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > 00000000fdc00000-00000000fddfffff +> +> > > > > +> +> > > > > 00000000fde00000-00000000fdffffff (prio 1, RW): alias +> +> > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > 00000000fde00000-00000000fdffffff +> +> > > > > +> +> > > > > 00000000fe000000-00000000fe1fffff (prio 1, RW): alias +> +> > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > 00000000fe000000-00000000fe1fffff +> +> > > > > +> +> > > > > 00000000fe200000-00000000fe3fffff (prio 1, RW): alias +> +> > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > 00000000fe200000-00000000fe3fffff +> +> > > > > +> +> > > > > 00000000fe400000-00000000fe5fffff (prio 1, RW): alias +> +> > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > 00000000fe400000-00000000fe5fffff +> +> > > > > +> +> > > > > 00000000fe600000-00000000fe7fffff (prio 1, RW): alias +> +> > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > 00000000fe600000-00000000fe7fffff +> +> > > > > +> +> > > > > 00000000fe800000-00000000fe9fffff (prio 1, RW): alias +> +> > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > 00000000fe800000-00000000fe9fffff +> +> > > > > +> +> > > > > fffffffffc800000-fffffffffc800000 (prio 1, RW): alias +> +> > pci_bridge_pref_mem +> +> > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- Exceptional +> +Adress +> +> > > > Space +> +> > > > +> +> > > > This one is empty though right? +> +> > > > +> +> > > > > +> +> > > > > +> +> > > > > We have figured out why this address becomes this value, +> +> > > > > according to pci spec, pci driver can get BAR address size by +> +> > > > > writing 0xffffffff to +> +> > > > > +> +> > > > > the pci register firstly, and then read back the value from this +> +> > > > > register. +> +> > > > +> +> > > > +> +> > > > OK however as you show below the BAR being sized is the BAR if a +> +> > > > bridge. Are you then adding a bridge device by hotplug? +> +> > > +> +> > > No, I just simply hot plugged a VFIO device to Bus 0, another +> +> > > interesting phenomenon is If I hot plug the device to other bus, +> +> > > this doesn't +> +> > happened. 
+> +> > > +> +> > > > +> +> > > > +> +> > > > > We didn't handle this value specially while process pci write +> +> > > > > in qemu, the function call stack is: +> +> > > > > +> +> > > > > Pci_bridge_dev_write_config +> +> > > > > +> +> > > > > -> pci_bridge_write_config +> +> > > > > +> +> > > > > -> pci_default_write_config (we update the config[address] +> +> > > > > -> value here to +> +> > > > > fffffffffc800000, which should be 0xfc800000 ) +> +> > > > > +> +> > > > > -> pci_bridge_update_mappings +> +> > > > > +> +> > > > > ->pci_bridge_region_del(br, br->windows); +> +> > > > > +> +> > > > > -> pci_bridge_region_init +> +> > > > > +> +> > > > > +> +> > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use the +> +> > > > > wrong value +> +> > > > > fffffffffc800000) +> +> > > > > +> +> > > > > -> +> +> > > > > memory_region_transaction_commit +> +> > > > > +> +> > > > > +> +> > > > > +> +> > > > > So, as we can see, we use the wrong base address in qemu to +> +> > > > > update the memory regions, though, we update the base address +> +> > > > > to +> +> > > > > +> +> > > > > The correct value after pci driver in VM write the original +> +> > > > > value back, the virtio NIC in bus 4 may still sends net +> +> > > > > packets concurrently with +> +> > > > > +> +> > > > > The wrong memory region address. +> +> > > > > +> +> > > > > +> +> > > > > +> +> > > > > We have tried to skip the memory region update action in qemu +> +> > > > > while detect pci write with 0xffffffff value, and it does +> +> > > > > work, but +> +> > > > > +> +> > > > > This seems to be not gently. +> +> > > > +> +> > > > For sure. But I'm still puzzled as to why does Linux try to size +> +> > > > the BAR of the bridge while a device behind it is used. +> +> > > > +> +> > > > Can you pls post your QEMU command line? 
+> +> > > +> +> > > My QEMU command line: +> +> > > /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S +> +> > > -object +> +> > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain- +> +> > > 194- +> +> > > Linux/master-key.aes -machine +> +> > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu +> +> > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m +> +> > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp +> +> > > 20,sockets=20,cores=1,threads=1 -numa +> +> > > node,nodeid=0,cpus=0-4,mem=1024 -numa +> +> > > node,nodeid=1,cpus=5-9,mem=1024 -numa +> +> > > node,nodeid=2,cpus=10-14,mem=1024 -numa +> +> > > node,nodeid=3,cpus=15-19,mem=1024 -uuid +> +> > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults +> +> > > -chardev +> +> > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/ +> +> > > moni +> +> > > tor.sock,server,nowait -mon +> +> > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet +> +> > > -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot +> +> > > strict=on -device +> +> > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device +> +> > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device +> +> > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device +> +> > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device +> +> > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device +> +> > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device +> +> > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device +> +> > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device +> +> > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device +> +> > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device +> +> > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device +> +> > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device +> +> > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive +> +> > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=dri +> +> > > ve-v +> +> > > irtio-disk0,cache=none -device +> +> > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk +> +> > > 0,id +> +> > > =virtio-disk0,bootindex=1 -drive +> +> > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device +> +> > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev +> +> > > tap,fd=35,id=hostnet0 -device +> +> > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=p +> +> > > ci.4 +> +> > > ,addr=0x1 -chardev pty,id=charserial0 -device +> +> > > isa-serial,chardev=charserial0,id=serial0 -device +> +> > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device +> +> > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device +> +> > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg +> +> > > timestamp=on +> +> > > +> +> > > I am also very curious about this issue, in the linux kernel code, +> +> > > maybe double +> +> > check in function pci_bridge_check_ranges triggered this problem. +> +> > +> +> > If you can get the stacktrace in Linux when it tries to write this +> +> > fffff value, that would be quite helpful. 
+> +> > +> +> +> +> After I add mdelay(100) in function pci_bridge_check_ranges, this +> +> phenomenon is easier to reproduce, below is my modify in kernel: +> +> diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index +> +> cb389277..86e232d 100644 +> +> --- a/drivers/pci/setup-bus.c +> +> +++ b/drivers/pci/setup-bus.c +> +> @@ -27,7 +27,7 @@ +> +> #include <linux/slab.h> +> +> #include <linux/acpi.h> +> +> #include "pci.h" +> +> - +> +> +#include <linux/delay.h> +> +> unsigned int pci_flags; +> +> +> +> struct pci_dev_resource { +> +> @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus +> +*bus) +> +> pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, +> +> 0xffffffff); +> +> pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, +> +> &tmp); +> +> + mdelay(100); +> +> + printk(KERN_ERR "sleep\n"); +> +> + dump_stack(); +> +> if (!tmp) +> +> b_res[2].flags &= ~IORESOURCE_MEM_64; +> +> pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, +> +> +> +> +OK! +> +I just sent a Linux patch that should help. +> +I would appreciate it if you will give it a try and if that helps reply to it +> +with a +> +Tested-by: tag. +> +I tested this patch and it works fine on my machine. + +But I have another question, if we only fix this problem in the kernel, the +Linux +version that has been released does not work well on the virtualization +platform. +Is there a way to fix this problem in the backend? + +> +-- +> +MST + +On Tue, Dec 11, 2018 at 02:55:43AM +0000, xuyandong wrote: +> +On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote: +> +> > On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: +> +> > > > > > Hi all, +> +> > > > > > +> +> > > > > > +> +> > > > > > +> +> > > > > > In our test, we configured VM with several pci-bridges and a +> +> > > > > > virtio-net nic been attached with bus 4, +> +> > > > > > +> +> > > > > > After VM is startup, We ping this nic from host to judge if it +> +> > > > > > is working normally. Then, we hot add pci devices to this VM with +> +> > > > > > bus +> +> 0. 
+> +> > > > > > +> +> > > > > > We found the virtio-net NIC in bus 4 is not working (can not +> +> > > > > > connect) occasionally, as it kick virtio backend failure with +> +> > > > > > error below: +> +> > > > > > +> +> > > > > > Unassigned mem write 00000000fc803004 = 0x1 +> +> > > > > > +> +> > > > > > +> +> > > > > > +> +> > > > > > memory-region: pci_bridge_pci +> +> > > > > > +> +> > > > > > 0000000000000000-ffffffffffffffff (prio 0, RW): +> +> > > > > > pci_bridge_pci +> +> > > > > > +> +> > > > > > 00000000fc800000-00000000fc803fff (prio 1, RW): virtio-pci +> +> > > > > > +> +> > > > > > 00000000fc800000-00000000fc800fff (prio 0, RW): +> +> > > > > > virtio-pci-common +> +> > > > > > +> +> > > > > > 00000000fc801000-00000000fc801fff (prio 0, RW): +> +> > > > > > virtio-pci-isr +> +> > > > > > +> +> > > > > > 00000000fc802000-00000000fc802fff (prio 0, RW): +> +> > > > > > virtio-pci-device +> +> > > > > > +> +> > > > > > 00000000fc803000-00000000fc803fff (prio 0, RW): +> +> > > > > > virtio-pci-notify <- io mem unassigned +> +> > > > > > +> +> > > > > > ⦠+> +> > > > > > +> +> > > > > > +> +> > > > > > +> +> > > > > > We caught an exceptional address changing while this problem +> +> > > > > > happened, show as +> +> > > > > > follow: +> +> > > > > > +> +> > > > > > Before pci_bridge_update_mappingsï¼ +> +> > > > > > +> +> > > > > > 00000000fc000000-00000000fc1fffff (prio 1, RW): alias +> +> > > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > 00000000fc000000-00000000fc1fffff +> +> > > > > > +> +> > > > > > 00000000fc200000-00000000fc3fffff (prio 1, RW): alias +> +> > > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > 00000000fc200000-00000000fc3fffff +> +> > > > > > +> +> > > > > > 00000000fc400000-00000000fc5fffff (prio 1, RW): alias +> +> > > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > 00000000fc400000-00000000fc5fffff +> +> > > > > > +> +> > > > > > 00000000fc600000-00000000fc7fffff (prio 1, RW): alias +> +> > > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > 00000000fc600000-00000000fc7fffff +> +> > > > > > +> +> > > > > > 00000000fc800000-00000000fc9fffff (prio 1, RW): alias +> +> > > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > 00000000fc800000-00000000fc9fffff +> +> > > > > > <- correct Adress Spce +> +> > > > > > +> +> > > > > > 00000000fca00000-00000000fcbfffff (prio 1, RW): alias +> +> > > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > 00000000fca00000-00000000fcbfffff +> +> > > > > > +> +> > > > > > 00000000fcc00000-00000000fcdfffff (prio 1, RW): alias +> +> > > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > 00000000fcc00000-00000000fcdfffff +> +> > > > > > +> +> > > > > > 00000000fce00000-00000000fcffffff (prio 1, RW): alias +> +> > > > > > pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > 00000000fce00000-00000000fcffffff +> +> > > > > > +> +> > > > > > +> +> > > > > > +> +> > > > > > After pci_bridge_update_mappingsï¼ +> +> > > > > > +> +> > > > > > 00000000fda00000-00000000fdbfffff (prio 1, RW): alias +> +> > > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > > 00000000fda00000-00000000fdbfffff +> +> > > > > > +> +> > > > > > 00000000fdc00000-00000000fddfffff (prio 1, RW): alias +> +> > > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > > 00000000fdc00000-00000000fddfffff +> +> > > > > > +> +> > > > > > 00000000fde00000-00000000fdffffff (prio 1, RW): alias +> +> > > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > > 00000000fde00000-00000000fdffffff +> +> > > > > > +> +> > > > > > 
00000000fe000000-00000000fe1fffff (prio 1, RW): alias +> +> > > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > > 00000000fe000000-00000000fe1fffff +> +> > > > > > +> +> > > > > > 00000000fe200000-00000000fe3fffff (prio 1, RW): alias +> +> > > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > > 00000000fe200000-00000000fe3fffff +> +> > > > > > +> +> > > > > > 00000000fe400000-00000000fe5fffff (prio 1, RW): alias +> +> > > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > > 00000000fe400000-00000000fe5fffff +> +> > > > > > +> +> > > > > > 00000000fe600000-00000000fe7fffff (prio 1, RW): alias +> +> > > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > > 00000000fe600000-00000000fe7fffff +> +> > > > > > +> +> > > > > > 00000000fe800000-00000000fe9fffff (prio 1, RW): alias +> +> > > > > > pci_bridge_mem @pci_bridge_pci +> +> > > > > > 00000000fe800000-00000000fe9fffff +> +> > > > > > +> +> > > > > > fffffffffc800000-fffffffffc800000 (prio 1, RW): alias +> +> > > pci_bridge_pref_mem +> +> > > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- Exceptional +> +> Adress +> +> > > > > Space +> +> > > > > +> +> > > > > This one is empty though right? +> +> > > > > +> +> > > > > > +> +> > > > > > +> +> > > > > > We have figured out why this address becomes this value, +> +> > > > > > according to pci spec, pci driver can get BAR address size by +> +> > > > > > writing 0xffffffff to +> +> > > > > > +> +> > > > > > the pci register firstly, and then read back the value from this +> +> > > > > > register. +> +> > > > > +> +> > > > > +> +> > > > > OK however as you show below the BAR being sized is the BAR if a +> +> > > > > bridge. Are you then adding a bridge device by hotplug? +> +> > > > +> +> > > > No, I just simply hot plugged a VFIO device to Bus 0, another +> +> > > > interesting phenomenon is If I hot plug the device to other bus, +> +> > > > this doesn't +> +> > > happened. +> +> > > > +> +> > > > > +> +> > > > > +> +> > > > > > We didn't handle this value specially while process pci write +> +> > > > > > in qemu, the function call stack is: +> +> > > > > > +> +> > > > > > Pci_bridge_dev_write_config +> +> > > > > > +> +> > > > > > -> pci_bridge_write_config +> +> > > > > > +> +> > > > > > -> pci_default_write_config (we update the config[address] +> +> > > > > > -> value here to +> +> > > > > > fffffffffc800000, which should be 0xfc800000 ) +> +> > > > > > +> +> > > > > > -> pci_bridge_update_mappings +> +> > > > > > +> +> > > > > > ->pci_bridge_region_del(br, br->windows); +> +> > > > > > +> +> > > > > > -> pci_bridge_region_init +> +> > > > > > +> +> > > > > > +> +> > > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use the +> +> > > > > > wrong value +> +> > > > > > fffffffffc800000) +> +> > > > > > +> +> > > > > > -> +> +> > > > > > memory_region_transaction_commit +> +> > > > > > +> +> > > > > > +> +> > > > > > +> +> > > > > > So, as we can see, we use the wrong base address in qemu to +> +> > > > > > update the memory regions, though, we update the base address +> +> > > > > > to +> +> > > > > > +> +> > > > > > The correct value after pci driver in VM write the original +> +> > > > > > value back, the virtio NIC in bus 4 may still sends net +> +> > > > > > packets concurrently with +> +> > > > > > +> +> > > > > > The wrong memory region address. 
+> +> > > > > > +> +> > > > > > +> +> > > > > > +> +> > > > > > We have tried to skip the memory region update action in qemu +> +> > > > > > while detect pci write with 0xffffffff value, and it does +> +> > > > > > work, but +> +> > > > > > +> +> > > > > > This seems to be not gently. +> +> > > > > +> +> > > > > For sure. But I'm still puzzled as to why does Linux try to size +> +> > > > > the BAR of the bridge while a device behind it is used. +> +> > > > > +> +> > > > > Can you pls post your QEMU command line? +> +> > > > +> +> > > > My QEMU command line: +> +> > > > /root/xyd/qemu-system-x86_64 -name guest=Linux,debug-threads=on -S +> +> > > > -object +> +> > > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/domain- +> +> > > > 194- +> +> > > > Linux/master-key.aes -machine +> +> > > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu +> +> > > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m +> +> > > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off -smp +> +> > > > 20,sockets=20,cores=1,threads=1 -numa +> +> > > > node,nodeid=0,cpus=0-4,mem=1024 -numa +> +> > > > node,nodeid=1,cpus=5-9,mem=1024 -numa +> +> > > > node,nodeid=2,cpus=10-14,mem=1024 -numa +> +> > > > node,nodeid=3,cpus=15-19,mem=1024 -uuid +> +> > > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config -nodefaults +> +> > > > -chardev +> +> > > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Linux/ +> +> > > > moni +> +> > > > tor.sock,server,nowait -mon +> +> > > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-hpet +> +> > > > -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot +> +> > > > strict=on -device +> +> > > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device +> +> > > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device +> +> > > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device +> +> > > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device +> +> > > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device +> +> > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device +> +> > > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device +> +> > > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device +> +> > > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device +> +> > > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device +> +> > > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device +> +> > > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device +> +> > > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive +> +> > > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id=dri +> +> > > > ve-v +> +> > > > irtio-disk0,cache=none -device +> +> > > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio-disk +> +> > > > 0,id +> +> > > > =virtio-disk0,bootindex=1 -drive +> +> > > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device +> +> > > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 -netdev +> +> > > > tap,fd=35,id=hostnet0 -device +> +> > > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,bus=p +> +> > > > ci.4 +> +> > > > ,addr=0x1 -chardev pty,id=charserial0 -device +> +> > > > isa-serial,chardev=charserial0,id=serial0 -device +> +> > > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device +> +> > > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device +> +> > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg +> +> > > > timestamp=on +> +> > > > +> +> > > > I am also very curious about this issue, in the linux 
kernel code, +> +> > > > maybe double +> +> > > check in function pci_bridge_check_ranges triggered this problem. +> +> > > +> +> > > If you can get the stacktrace in Linux when it tries to write this +> +> > > fffff value, that would be quite helpful. +> +> > > +> +> > +> +> > After I add mdelay(100) in function pci_bridge_check_ranges, this +> +> > phenomenon is easier to reproduce, below is my modify in kernel: +> +> > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index +> +> > cb389277..86e232d 100644 +> +> > --- a/drivers/pci/setup-bus.c +> +> > +++ b/drivers/pci/setup-bus.c +> +> > @@ -27,7 +27,7 @@ +> +> > #include <linux/slab.h> +> +> > #include <linux/acpi.h> +> +> > #include "pci.h" +> +> > - +> +> > +#include <linux/delay.h> +> +> > unsigned int pci_flags; +> +> > +> +> > struct pci_dev_resource { +> +> > @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct pci_bus +> +> *bus) +> +> > pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, +> +> > 0xffffffff); +> +> > pci_read_config_dword(bridge, PCI_PREF_BASE_UPPER32, +> +> > &tmp); +> +> > + mdelay(100); +> +> > + printk(KERN_ERR "sleep\n"); +> +> > + dump_stack(); +> +> > if (!tmp) +> +> > b_res[2].flags &= ~IORESOURCE_MEM_64; +> +> > pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, +> +> > +> +> +> +> OK! +> +> I just sent a Linux patch that should help. +> +> I would appreciate it if you will give it a try and if that helps reply to +> +> it with a +> +> Tested-by: tag. +> +> +> +> +I tested this patch and it works fine on my machine. +> +> +But I have another question, if we only fix this problem in the kernel, the +> +Linux +> +version that has been released does not work well on the virtualization +> +platform. +> +Is there a way to fix this problem in the backend? +There could we a way to work around this. +Does below help? + +diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c +index 236a20eaa8..7834cac4b0 100644 +--- a/hw/i386/acpi-build.c ++++ b/hw/i386/acpi-build.c +@@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml *parent_scope, +PCIBus *bus, + + aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM"))); + aml_append(method, +- aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check */) ++ aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device Check +Light */) + ); + aml_append(method, + aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request */) + +> +On Tue, Dec 11, 2018 at 02:55:43AM +0000, xuyandong wrote: +> +> On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote: +> +> > > On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: +> +> > > > > > > Hi all, +> +> > > > > > > +> +> > > > > > > +> +> > > > > > > +> +> > > > > > > In our test, we configured VM with several pci-bridges and +> +> > > > > > > a virtio-net nic been attached with bus 4, +> +> > > > > > > +> +> > > > > > > After VM is startup, We ping this nic from host to judge +> +> > > > > > > if it is working normally. Then, we hot add pci devices to +> +> > > > > > > this VM with bus +> +> > 0. 
+> +> > > > > > > +> +> > > > > > > We found the virtio-net NIC in bus 4 is not working (can +> +> > > > > > > not +> +> > > > > > > connect) occasionally, as it kick virtio backend failure with +> +> > > > > > > error +> +below: +> +> > > > > > > +> +> > > > > > > Unassigned mem write 00000000fc803004 = 0x1 +> +> > > > > > > +> +> > > > > > > +> +> > > > > > > +> +> > > > > > > memory-region: pci_bridge_pci +> +> > > > > > > +> +> > > > > > > 0000000000000000-ffffffffffffffff (prio 0, RW): +> +> > > > > > > pci_bridge_pci +> +> > > > > > > +> +> > > > > > > 00000000fc800000-00000000fc803fff (prio 1, RW): +> +> > > > > > > virtio-pci +> +> > > > > > > +> +> > > > > > > 00000000fc800000-00000000fc800fff (prio 0, RW): +> +> > > > > > > virtio-pci-common +> +> > > > > > > +> +> > > > > > > 00000000fc801000-00000000fc801fff (prio 0, RW): +> +> > > > > > > virtio-pci-isr +> +> > > > > > > +> +> > > > > > > 00000000fc802000-00000000fc802fff (prio 0, RW): +> +> > > > > > > virtio-pci-device +> +> > > > > > > +> +> > > > > > > 00000000fc803000-00000000fc803fff (prio 0, RW): +> +> > > > > > > virtio-pci-notify <- io mem unassigned +> +> > > > > > > +> +> > > > > > > ⦠+> +> > > > > > > +> +> > > > > > > +> +> > > > > > > +> +> > > > > > > We caught an exceptional address changing while this +> +> > > > > > > problem happened, show as +> +> > > > > > > follow: +> +> > > > > > > +> +> > > > > > > Before pci_bridge_update_mappingsï¼ +> +> > > > > > > +> +> > > > > > > 00000000fc000000-00000000fc1fffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > 00000000fc000000-00000000fc1fffff +> +> > > > > > > +> +> > > > > > > 00000000fc200000-00000000fc3fffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > 00000000fc200000-00000000fc3fffff +> +> > > > > > > +> +> > > > > > > 00000000fc400000-00000000fc5fffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > 00000000fc400000-00000000fc5fffff +> +> > > > > > > +> +> > > > > > > 00000000fc600000-00000000fc7fffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > 00000000fc600000-00000000fc7fffff +> +> > > > > > > +> +> > > > > > > 00000000fc800000-00000000fc9fffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > 00000000fc800000-00000000fc9fffff +> +> > > > > > > <- correct Adress Spce +> +> > > > > > > +> +> > > > > > > 00000000fca00000-00000000fcbfffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > 00000000fca00000-00000000fcbfffff +> +> > > > > > > +> +> > > > > > > 00000000fcc00000-00000000fcdfffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > 00000000fcc00000-00000000fcdfffff +> +> > > > > > > +> +> > > > > > > 00000000fce00000-00000000fcffffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > 00000000fce00000-00000000fcffffff +> +> > > > > > > +> +> > > > > > > +> +> > > > > > > +> +> > > > > > > After pci_bridge_update_mappingsï¼ +> +> > > > > > > +> +> > > > > > > 00000000fda00000-00000000fdbfffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > 00000000fda00000-00000000fdbfffff +> +> > > > > > > +> +> > > > > > > 00000000fdc00000-00000000fddfffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > 00000000fdc00000-00000000fddfffff +> +> > > > 
> > > +> +> > > > > > > 00000000fde00000-00000000fdffffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > 00000000fde00000-00000000fdffffff +> +> > > > > > > +> +> > > > > > > 00000000fe000000-00000000fe1fffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > 00000000fe000000-00000000fe1fffff +> +> > > > > > > +> +> > > > > > > 00000000fe200000-00000000fe3fffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > 00000000fe200000-00000000fe3fffff +> +> > > > > > > +> +> > > > > > > 00000000fe400000-00000000fe5fffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > 00000000fe400000-00000000fe5fffff +> +> > > > > > > +> +> > > > > > > 00000000fe600000-00000000fe7fffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > 00000000fe600000-00000000fe7fffff +> +> > > > > > > +> +> > > > > > > 00000000fe800000-00000000fe9fffff (prio 1, RW): +> +> > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > 00000000fe800000-00000000fe9fffff +> +> > > > > > > +> +> > > > > > > fffffffffc800000-fffffffffc800000 (prio 1, RW): +> +> > > > > > > alias +> +> > > > pci_bridge_pref_mem +> +> > > > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- +> +> > > > > > > Exceptional +> +> > Adress +> +> > > > > > Space +> +> > > > > > +> +> > > > > > This one is empty though right? +> +> > > > > > +> +> > > > > > > +> +> > > > > > > +> +> > > > > > > We have figured out why this address becomes this value, +> +> > > > > > > according to pci spec, pci driver can get BAR address +> +> > > > > > > size by writing 0xffffffff to +> +> > > > > > > +> +> > > > > > > the pci register firstly, and then read back the value from this +> +register. +> +> > > > > > +> +> > > > > > +> +> > > > > > OK however as you show below the BAR being sized is the BAR +> +> > > > > > if a bridge. Are you then adding a bridge device by hotplug? +> +> > > > > +> +> > > > > No, I just simply hot plugged a VFIO device to Bus 0, another +> +> > > > > interesting phenomenon is If I hot plug the device to other +> +> > > > > bus, this doesn't +> +> > > > happened. 
+> +> > > > > +> +> > > > > > +> +> > > > > > +> +> > > > > > > We didn't handle this value specially while process pci +> +> > > > > > > write in qemu, the function call stack is: +> +> > > > > > > +> +> > > > > > > Pci_bridge_dev_write_config +> +> > > > > > > +> +> > > > > > > -> pci_bridge_write_config +> +> > > > > > > +> +> > > > > > > -> pci_default_write_config (we update the config[address] +> +> > > > > > > -> value here to +> +> > > > > > > fffffffffc800000, which should be 0xfc800000 ) +> +> > > > > > > +> +> > > > > > > -> pci_bridge_update_mappings +> +> > > > > > > +> +> > > > > > > ->pci_bridge_region_del(br, br->windows); +> +> > > > > > > +> +> > > > > > > -> pci_bridge_region_init +> +> > > > > > > +> +> > > > > > > +> +> > > > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use +> +> > > > > > > -> the +> +> > > > > > > wrong value +> +> > > > > > > fffffffffc800000) +> +> > > > > > > +> +> > > > > > > -> +> +> > > > > > > memory_region_transaction_commit +> +> > > > > > > +> +> > > > > > > +> +> > > > > > > +> +> > > > > > > So, as we can see, we use the wrong base address in qemu +> +> > > > > > > to update the memory regions, though, we update the base +> +> > > > > > > address to +> +> > > > > > > +> +> > > > > > > The correct value after pci driver in VM write the +> +> > > > > > > original value back, the virtio NIC in bus 4 may still +> +> > > > > > > sends net packets concurrently with +> +> > > > > > > +> +> > > > > > > The wrong memory region address. +> +> > > > > > > +> +> > > > > > > +> +> > > > > > > +> +> > > > > > > We have tried to skip the memory region update action in +> +> > > > > > > qemu while detect pci write with 0xffffffff value, and it +> +> > > > > > > does work, but +> +> > > > > > > +> +> > > > > > > This seems to be not gently. +> +> > > > > > +> +> > > > > > For sure. But I'm still puzzled as to why does Linux try to +> +> > > > > > size the BAR of the bridge while a device behind it is used. +> +> > > > > > +> +> > > > > > Can you pls post your QEMU command line? 
+> +> > > > > +> +> > > > > My QEMU command line: +> +> > > > > /root/xyd/qemu-system-x86_64 -name +> +> > > > > guest=Linux,debug-threads=on -S -object +> +> > > > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/dom +> +> > > > > ain- +> +> > > > > 194- +> +> > > > > Linux/master-key.aes -machine +> +> > > > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu +> +> > > > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m +> +> > > > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off +> +> > > > > -smp +> +> > > > > 20,sockets=20,cores=1,threads=1 -numa +> +> > > > > node,nodeid=0,cpus=0-4,mem=1024 -numa +> +> > > > > node,nodeid=1,cpus=5-9,mem=1024 -numa +> +> > > > > node,nodeid=2,cpus=10-14,mem=1024 -numa +> +> > > > > node,nodeid=3,cpus=15-19,mem=1024 -uuid +> +> > > > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config +> +> > > > > -nodefaults -chardev +> +> > > > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Li +> +> > > > > nux/ +> +> > > > > moni +> +> > > > > tor.sock,server,nowait -mon +> +> > > > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc +> +> > > > > -no-hpet -global kvm-pit.lost_tick_policy=delay -no-shutdown +> +> > > > > -boot strict=on -device +> +> > > > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device +> +> > > > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device +> +> > > > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device +> +> > > > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device +> +> > > > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device +> +> > > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device +> +> > > > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device +> +> > > > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device +> +> > > > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device +> +> > > > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device +> +> > > > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device +> +> > > > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device +> +> > > > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive +> +> > > > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id +> +> > > > > =dri +> +> > > > > ve-v +> +> > > > > irtio-disk0,cache=none -device +> +> > > > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio- +> +> > > > > disk +> +> > > > > 0,id +> +> > > > > =virtio-disk0,bootindex=1 -drive +> +> > > > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device +> +> > > > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 +> +> > > > > -netdev +> +> > > > > tap,fd=35,id=hostnet0 -device +> +> > > > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,b +> +> > > > > us=p +> +> > > > > ci.4 +> +> > > > > ,addr=0x1 -chardev pty,id=charserial0 -device +> +> > > > > isa-serial,chardev=charserial0,id=serial0 -device +> +> > > > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device +> +> > > > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device +> +> > > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg +> +> > > > > timestamp=on +> +> > > > > +> +> > > > > I am also very curious about this issue, in the linux kernel +> +> > > > > code, maybe double +> +> > > > check in function pci_bridge_check_ranges triggered this problem. +> +> > > > +> +> > > > If you can get the stacktrace in Linux when it tries to write +> +> > > > this fffff value, that would be quite helpful. 
+> +> > > > +> +> > > +> +> > > After I add mdelay(100) in function pci_bridge_check_ranges, this +> +> > > phenomenon is easier to reproduce, below is my modify in kernel: +> +> > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c +> +> > > index cb389277..86e232d 100644 +> +> > > --- a/drivers/pci/setup-bus.c +> +> > > +++ b/drivers/pci/setup-bus.c +> +> > > @@ -27,7 +27,7 @@ +> +> > > #include <linux/slab.h> +> +> > > #include <linux/acpi.h> +> +> > > #include "pci.h" +> +> > > - +> +> > > +#include <linux/delay.h> +> +> > > unsigned int pci_flags; +> +> > > +> +> > > struct pci_dev_resource { +> +> > > @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct +> +> > > pci_bus +> +> > *bus) +> +> > > pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, +> +> > > 0xffffffff); +> +> > > pci_read_config_dword(bridge, +> +> > > PCI_PREF_BASE_UPPER32, &tmp); +> +> > > + mdelay(100); +> +> > > + printk(KERN_ERR "sleep\n"); +> +> > > + dump_stack(); +> +> > > if (!tmp) +> +> > > b_res[2].flags &= ~IORESOURCE_MEM_64; +> +> > > pci_write_config_dword(bridge, +> +> > > PCI_PREF_BASE_UPPER32, +> +> > > +> +> > +> +> > OK! +> +> > I just sent a Linux patch that should help. +> +> > I would appreciate it if you will give it a try and if that helps +> +> > reply to it with a +> +> > Tested-by: tag. +> +> > +> +> +> +> I tested this patch and it works fine on my machine. +> +> +> +> But I have another question, if we only fix this problem in the +> +> kernel, the Linux version that has been released does not work well on the +> +virtualization platform. +> +> Is there a way to fix this problem in the backend? +> +> +There could we a way to work around this. +> +Does below help? +I am sorry to tell you, I tested this patch and it doesn't work fine. + +> +> +diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index +> +236a20eaa8..7834cac4b0 100644 +> +--- a/hw/i386/acpi-build.c +> ++++ b/hw/i386/acpi-build.c +> +@@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml +> +*parent_scope, PCIBus *bus, +> +> +aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM"))); +> +aml_append(method, +> +- aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check +> +*/) +> ++ aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device +> ++ Check Light */) +> +); +> +aml_append(method, +> +aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request +> +*/) + +On Tue, Dec 11, 2018 at 03:51:09AM +0000, xuyandong wrote: +> +> There could we a way to work around this. +> +> Does below help? +> +> +I am sorry to tell you, I tested this patch and it doesn't work fine. +What happens? 
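+As a rough standalone illustration (not something posted in this thread), the
+sizing probe that pci_bridge_check_ranges() performs on the bridge's
+"Prefetchable Base Upper 32 Bits" register (type-1 config header offset 0x28)
+boils down to the sequence below. The sysfs path and bridge BDF are
+placeholders and it needs root; per the analysis above, between the all-ones
+write and the restore the emulated register really does hold 0xffffffff, which
+is the window in which QEMU can commit the bogus fffffffffc800000 alias.
+
+#include <fcntl.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <unistd.h>
+
+int main(void)
+{
+    /* Placeholder path: substitute the BDF of the pci-bridge under test. */
+    const char *cfg = "/sys/bus/pci/devices/0000:00:08.0/config";
+    int fd = open(cfg, O_RDWR);
+    if (fd < 0) { perror("open"); return 1; }
+
+    uint32_t saved = 0, probe = 0xffffffff, readback = 0;
+    if (pread(fd, &saved, 4, 0x28) != 4) { perror("pread"); return 1; }
+    pwrite(fd, &probe, 4, 0x28);    /* sizing write: all ones (PCI_PREF_BASE_UPPER32) */
+    pread(fd, &readback, 4, 0x28);  /* read back the writable bits */
+    pwrite(fd, &saved, 4, 0x28);    /* restore the original value */
+
+    printf("upper32: saved=%08x readback=%08x\n", saved, readback);
+    close(fd);
+    return 0;
+}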
+ +> +> +> +> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index +> +> 236a20eaa8..7834cac4b0 100644 +> +> --- a/hw/i386/acpi-build.c +> +> +++ b/hw/i386/acpi-build.c +> +> @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml +> +> *parent_scope, PCIBus *bus, +> +> +> +> aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM"))); +> +> aml_append(method, +> +> - aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check +> +> */) +> +> + aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device +> +> + Check Light */) +> +> ); +> +> aml_append(method, +> +> aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request +> +> */) + +On Tue, Dec 11, 2018 at 03:51:09AM +0000, xuyandong wrote: +> +> On Tue, Dec 11, 2018 at 02:55:43AM +0000, xuyandong wrote: +> +> > On Tue, Dec 11, 2018 at 01:47:37AM +0000, xuyandong wrote: +> +> > > > On Sat, Dec 08, 2018 at 11:58:59AM +0000, xuyandong wrote: +> +> > > > > > > > Hi all, +> +> > > > > > > > +> +> > > > > > > > +> +> > > > > > > > +> +> > > > > > > > In our test, we configured VM with several pci-bridges and +> +> > > > > > > > a virtio-net nic been attached with bus 4, +> +> > > > > > > > +> +> > > > > > > > After VM is startup, We ping this nic from host to judge +> +> > > > > > > > if it is working normally. Then, we hot add pci devices to +> +> > > > > > > > this VM with bus +> +> > > 0. +> +> > > > > > > > +> +> > > > > > > > We found the virtio-net NIC in bus 4 is not working (can +> +> > > > > > > > not +> +> > > > > > > > connect) occasionally, as it kick virtio backend failure with +> +> > > > > > > > error +> +> below: +> +> > > > > > > > +> +> > > > > > > > Unassigned mem write 00000000fc803004 = 0x1 +> +> > > > > > > > +> +> > > > > > > > +> +> > > > > > > > +> +> > > > > > > > memory-region: pci_bridge_pci +> +> > > > > > > > +> +> > > > > > > > 0000000000000000-ffffffffffffffff (prio 0, RW): +> +> > > > > > > > pci_bridge_pci +> +> > > > > > > > +> +> > > > > > > > 00000000fc800000-00000000fc803fff (prio 1, RW): +> +> > > > > > > > virtio-pci +> +> > > > > > > > +> +> > > > > > > > 00000000fc800000-00000000fc800fff (prio 0, RW): +> +> > > > > > > > virtio-pci-common +> +> > > > > > > > +> +> > > > > > > > 00000000fc801000-00000000fc801fff (prio 0, RW): +> +> > > > > > > > virtio-pci-isr +> +> > > > > > > > +> +> > > > > > > > 00000000fc802000-00000000fc802fff (prio 0, RW): +> +> > > > > > > > virtio-pci-device +> +> > > > > > > > +> +> > > > > > > > 00000000fc803000-00000000fc803fff (prio 0, RW): +> +> > > > > > > > virtio-pci-notify <- io mem unassigned +> +> > > > > > > > +> +> > > > > > > > ⦠+> +> > > > > > > > +> +> > > > > > > > +> +> > > > > > > > +> +> > > > > > > > We caught an exceptional address changing while this +> +> > > > > > > > problem happened, show as +> +> > > > > > > > follow: +> +> > > > > > > > +> +> > > > > > > > Before pci_bridge_update_mappingsï¼ +> +> > > > > > > > +> +> > > > > > > > 00000000fc000000-00000000fc1fffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > > 00000000fc000000-00000000fc1fffff +> +> > > > > > > > +> +> > > > > > > > 00000000fc200000-00000000fc3fffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > > 00000000fc200000-00000000fc3fffff +> +> > > > > > > > +> +> > > > > > > > 00000000fc400000-00000000fc5fffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > > 00000000fc400000-00000000fc5fffff +> +> > > > > > > > +> +> 
> > > > > > > 00000000fc600000-00000000fc7fffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > > 00000000fc600000-00000000fc7fffff +> +> > > > > > > > +> +> > > > > > > > 00000000fc800000-00000000fc9fffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > > 00000000fc800000-00000000fc9fffff +> +> > > > > > > > <- correct Adress Spce +> +> > > > > > > > +> +> > > > > > > > 00000000fca00000-00000000fcbfffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > > 00000000fca00000-00000000fcbfffff +> +> > > > > > > > +> +> > > > > > > > 00000000fcc00000-00000000fcdfffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > > 00000000fcc00000-00000000fcdfffff +> +> > > > > > > > +> +> > > > > > > > 00000000fce00000-00000000fcffffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_pref_mem @pci_bridge_pci +> +> > > > > > > > 00000000fce00000-00000000fcffffff +> +> > > > > > > > +> +> > > > > > > > +> +> > > > > > > > +> +> > > > > > > > After pci_bridge_update_mappingsï¼ +> +> > > > > > > > +> +> > > > > > > > 00000000fda00000-00000000fdbfffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > > 00000000fda00000-00000000fdbfffff +> +> > > > > > > > +> +> > > > > > > > 00000000fdc00000-00000000fddfffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > > 00000000fdc00000-00000000fddfffff +> +> > > > > > > > +> +> > > > > > > > 00000000fde00000-00000000fdffffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > > 00000000fde00000-00000000fdffffff +> +> > > > > > > > +> +> > > > > > > > 00000000fe000000-00000000fe1fffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > > 00000000fe000000-00000000fe1fffff +> +> > > > > > > > +> +> > > > > > > > 00000000fe200000-00000000fe3fffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > > 00000000fe200000-00000000fe3fffff +> +> > > > > > > > +> +> > > > > > > > 00000000fe400000-00000000fe5fffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > > 00000000fe400000-00000000fe5fffff +> +> > > > > > > > +> +> > > > > > > > 00000000fe600000-00000000fe7fffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > > 00000000fe600000-00000000fe7fffff +> +> > > > > > > > +> +> > > > > > > > 00000000fe800000-00000000fe9fffff (prio 1, RW): +> +> > > > > > > > alias pci_bridge_mem @pci_bridge_pci +> +> > > > > > > > 00000000fe800000-00000000fe9fffff +> +> > > > > > > > +> +> > > > > > > > fffffffffc800000-fffffffffc800000 (prio 1, RW): +> +> > > > > > > > alias +> +> > > > > pci_bridge_pref_mem +> +> > > > > > > > @pci_bridge_pci fffffffffc800000-fffffffffc800000 <- +> +> > > > > > > > Exceptional +> +> > > Adress +> +> > > > > > > Space +> +> > > > > > > +> +> > > > > > > This one is empty though right? +> +> > > > > > > +> +> > > > > > > > +> +> > > > > > > > +> +> > > > > > > > We have figured out why this address becomes this value, +> +> > > > > > > > according to pci spec, pci driver can get BAR address +> +> > > > > > > > size by writing 0xffffffff to +> +> > > > > > > > +> +> > > > > > > > the pci register firstly, and then read back the value from +> +> > > > > > > > this +> +> register. 
+> +> > > > > > > +> +> > > > > > > +> +> > > > > > > OK however as you show below the BAR being sized is the BAR +> +> > > > > > > if a bridge. Are you then adding a bridge device by hotplug? +> +> > > > > > +> +> > > > > > No, I just simply hot plugged a VFIO device to Bus 0, another +> +> > > > > > interesting phenomenon is If I hot plug the device to other +> +> > > > > > bus, this doesn't +> +> > > > > happened. +> +> > > > > > +> +> > > > > > > +> +> > > > > > > +> +> > > > > > > > We didn't handle this value specially while process pci +> +> > > > > > > > write in qemu, the function call stack is: +> +> > > > > > > > +> +> > > > > > > > Pci_bridge_dev_write_config +> +> > > > > > > > +> +> > > > > > > > -> pci_bridge_write_config +> +> > > > > > > > +> +> > > > > > > > -> pci_default_write_config (we update the config[address] +> +> > > > > > > > -> value here to +> +> > > > > > > > fffffffffc800000, which should be 0xfc800000 ) +> +> > > > > > > > +> +> > > > > > > > -> pci_bridge_update_mappings +> +> > > > > > > > +> +> > > > > > > > ->pci_bridge_region_del(br, br->windows); +> +> > > > > > > > +> +> > > > > > > > -> pci_bridge_region_init +> +> > > > > > > > +> +> > > > > > > > +> +> > > > > > > > -> pci_bridge_init_alias (here pci_bridge_get_base, we use +> +> > > > > > > > -> the +> +> > > > > > > > wrong value +> +> > > > > > > > fffffffffc800000) +> +> > > > > > > > +> +> > > > > > > > -> +> +> > > > > > > > memory_region_transaction_commit +> +> > > > > > > > +> +> > > > > > > > +> +> > > > > > > > +> +> > > > > > > > So, as we can see, we use the wrong base address in qemu +> +> > > > > > > > to update the memory regions, though, we update the base +> +> > > > > > > > address to +> +> > > > > > > > +> +> > > > > > > > The correct value after pci driver in VM write the +> +> > > > > > > > original value back, the virtio NIC in bus 4 may still +> +> > > > > > > > sends net packets concurrently with +> +> > > > > > > > +> +> > > > > > > > The wrong memory region address. +> +> > > > > > > > +> +> > > > > > > > +> +> > > > > > > > +> +> > > > > > > > We have tried to skip the memory region update action in +> +> > > > > > > > qemu while detect pci write with 0xffffffff value, and it +> +> > > > > > > > does work, but +> +> > > > > > > > +> +> > > > > > > > This seems to be not gently. +> +> > > > > > > +> +> > > > > > > For sure. But I'm still puzzled as to why does Linux try to +> +> > > > > > > size the BAR of the bridge while a device behind it is used. +> +> > > > > > > +> +> > > > > > > Can you pls post your QEMU command line? 
+> +> > > > > > +> +> > > > > > My QEMU command line: +> +> > > > > > /root/xyd/qemu-system-x86_64 -name +> +> > > > > > guest=Linux,debug-threads=on -S -object +> +> > > > > > secret,id=masterKey0,format=raw,file=/var/run/libvirt/qemu/dom +> +> > > > > > ain- +> +> > > > > > 194- +> +> > > > > > Linux/master-key.aes -machine +> +> > > > > > pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu +> +> > > > > > host,+kvm_pv_eoi -bios /usr/share/OVMF/OVMF.fd -m +> +> > > > > > size=4194304k,slots=256,maxmem=33554432k -realtime mlock=off +> +> > > > > > -smp +> +> > > > > > 20,sockets=20,cores=1,threads=1 -numa +> +> > > > > > node,nodeid=0,cpus=0-4,mem=1024 -numa +> +> > > > > > node,nodeid=1,cpus=5-9,mem=1024 -numa +> +> > > > > > node,nodeid=2,cpus=10-14,mem=1024 -numa +> +> > > > > > node,nodeid=3,cpus=15-19,mem=1024 -uuid +> +> > > > > > 34a588c7-b0f2-4952-b39c-47fae3411439 -no-user-config +> +> > > > > > -nodefaults -chardev +> +> > > > > > socket,id=charmonitor,path=/var/run/libvirt/qemu/domain-194-Li +> +> > > > > > nux/ +> +> > > > > > moni +> +> > > > > > tor.sock,server,nowait -mon +> +> > > > > > chardev=charmonitor,id=monitor,mode=control -rtc base=utc +> +> > > > > > -no-hpet -global kvm-pit.lost_tick_policy=delay -no-shutdown +> +> > > > > > -boot strict=on -device +> +> > > > > > pci-bridge,chassis_nr=1,id=pci.1,bus=pci.0,addr=0x8 -device +> +> > > > > > pci-bridge,chassis_nr=2,id=pci.2,bus=pci.0,addr=0x9 -device +> +> > > > > > pci-bridge,chassis_nr=3,id=pci.3,bus=pci.0,addr=0xa -device +> +> > > > > > pci-bridge,chassis_nr=4,id=pci.4,bus=pci.0,addr=0xb -device +> +> > > > > > pci-bridge,chassis_nr=5,id=pci.5,bus=pci.0,addr=0xc -device +> +> > > > > > piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device +> +> > > > > > usb-ehci,id=usb1,bus=pci.0,addr=0x10 -device +> +> > > > > > nec-usb-xhci,id=usb2,bus=pci.0,addr=0x11 -device +> +> > > > > > virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device +> +> > > > > > virtio-scsi-pci,id=scsi1,bus=pci.0,addr=0x4 -device +> +> > > > > > virtio-scsi-pci,id=scsi2,bus=pci.0,addr=0x5 -device +> +> > > > > > virtio-scsi-pci,id=scsi3,bus=pci.0,addr=0x6 -device +> +> > > > > > virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x7 -drive +> +> > > > > > file=/mnt/sdb/xml/centos_74_x64_uefi.raw,format=raw,if=none,id +> +> > > > > > =dri +> +> > > > > > ve-v +> +> > > > > > irtio-disk0,cache=none -device +> +> > > > > > virtio-blk-pci,scsi=off,bus=pci.0,addr=0x2,drive=drive-virtio- +> +> > > > > > disk +> +> > > > > > 0,id +> +> > > > > > =virtio-disk0,bootindex=1 -drive +> +> > > > > > if=none,id=drive-ide0-1-1,readonly=on,cache=none -device +> +> > > > > > ide-cd,bus=ide.1,unit=1,drive=drive-ide0-1-1,id=ide0-1-1 +> +> > > > > > -netdev +> +> > > > > > tap,fd=35,id=hostnet0 -device +> +> > > > > > virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:89:5d:8b,b +> +> > > > > > us=p +> +> > > > > > ci.4 +> +> > > > > > ,addr=0x1 -chardev pty,id=charserial0 -device +> +> > > > > > isa-serial,chardev=charserial0,id=serial0 -device +> +> > > > > > usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:0 -device +> +> > > > > > cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x12 -device +> +> > > > > > virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xd -msg +> +> > > > > > timestamp=on +> +> > > > > > +> +> > > > > > I am also very curious about this issue, in the linux kernel +> +> > > > > > code, maybe double +> +> > > > > check in function pci_bridge_check_ranges triggered this problem. 
+> +> > > > > +> +> > > > > If you can get the stacktrace in Linux when it tries to write +> +> > > > > this fffff value, that would be quite helpful. +> +> > > > > +> +> > > > +> +> > > > After I add mdelay(100) in function pci_bridge_check_ranges, this +> +> > > > phenomenon is easier to reproduce, below is my modify in kernel: +> +> > > > diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c +> +> > > > index cb389277..86e232d 100644 +> +> > > > --- a/drivers/pci/setup-bus.c +> +> > > > +++ b/drivers/pci/setup-bus.c +> +> > > > @@ -27,7 +27,7 @@ +> +> > > > #include <linux/slab.h> +> +> > > > #include <linux/acpi.h> +> +> > > > #include "pci.h" +> +> > > > - +> +> > > > +#include <linux/delay.h> +> +> > > > unsigned int pci_flags; +> +> > > > +> +> > > > struct pci_dev_resource { +> +> > > > @@ -787,6 +787,9 @@ static void pci_bridge_check_ranges(struct +> +> > > > pci_bus +> +> > > *bus) +> +> > > > pci_write_config_dword(bridge, PCI_PREF_BASE_UPPER32, +> +> > > > 0xffffffff); +> +> > > > pci_read_config_dword(bridge, +> +> > > > PCI_PREF_BASE_UPPER32, &tmp); +> +> > > > + mdelay(100); +> +> > > > + printk(KERN_ERR "sleep\n"); +> +> > > > + dump_stack(); +> +> > > > if (!tmp) +> +> > > > b_res[2].flags &= ~IORESOURCE_MEM_64; +> +> > > > pci_write_config_dword(bridge, +> +> > > > PCI_PREF_BASE_UPPER32, +> +> > > > +> +> > > +> +> > > OK! +> +> > > I just sent a Linux patch that should help. +> +> > > I would appreciate it if you will give it a try and if that helps +> +> > > reply to it with a +> +> > > Tested-by: tag. +> +> > > +> +> > +> +> > I tested this patch and it works fine on my machine. +> +> > +> +> > But I have another question, if we only fix this problem in the +> +> > kernel, the Linux version that has been released does not work well on the +> +> virtualization platform. +> +> > Is there a way to fix this problem in the backend? +> +> +> +> There could we a way to work around this. +> +> Does below help? +> +> +I am sorry to tell you, I tested this patch and it doesn't work fine. +> +> +> +> +> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index +> +> 236a20eaa8..7834cac4b0 100644 +> +> --- a/hw/i386/acpi-build.c +> +> +++ b/hw/i386/acpi-build.c +> +> @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml +> +> *parent_scope, PCIBus *bus, +> +> +> +> aml_append(method, aml_store(aml_int(bsel_val), aml_name("BNUM"))); +> +> aml_append(method, +> +> - aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device Check +> +> */) +> +> + aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* Device +> +> + Check Light */) +> +> ); +> +> aml_append(method, +> +> aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject Request +> +> */) +Oh I see, another bug: + + case ACPI_NOTIFY_DEVICE_CHECK_LIGHT: + acpi_handle_debug(handle, "ACPI_NOTIFY_DEVICE_CHECK_LIGHT +event\n"); + /* TBD: Exactly what does 'light' mean? */ + break; + +And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 type) +and friends all just ignore this event type. + + + +-- +MST + +> +> > > > > > > > > Hi all, +> +> > > > > > > > > +> +> > > > > > > > > +> +> > > > > > > > > +> +> > > > > > > > > In our test, we configured VM with several pci-bridges +> +> > > > > > > > > and a virtio-net nic been attached with bus 4, +> +> > > > > > > > > +> +> > > > > > > > > After VM is startup, We ping this nic from host to +> +> > > > > > > > > judge if it is working normally. Then, we hot add pci +> +> > > > > > > > > devices to this VM with bus +> +> > > > 0. 
+> +> > > > > > > > > +> +> > > > > > > > > We found the virtio-net NIC in bus 4 is not working +> +> > > > > > > > > (can not +> +> > > > > > > > > connect) occasionally, as it kick virtio backend +> +> > > > > > > > > failure with error +> +> > > But I have another question, if we only fix this problem in the +> +> > > kernel, the Linux version that has been released does not work +> +> > > well on the +> +> > virtualization platform. +> +> > > Is there a way to fix this problem in the backend? +> +> > +> +> > There could we a way to work around this. +> +> > Does below help? +> +> +> +> I am sorry to tell you, I tested this patch and it doesn't work fine. +> +> +> +> > +> +> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index +> +> > 236a20eaa8..7834cac4b0 100644 +> +> > --- a/hw/i386/acpi-build.c +> +> > +++ b/hw/i386/acpi-build.c +> +> > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml +> +> > *parent_scope, PCIBus *bus, +> +> > +> +> > aml_append(method, aml_store(aml_int(bsel_val), +> +aml_name("BNUM"))); +> +> > aml_append(method, +> +> > - aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device +> +> > Check +> +*/) +> +> > + aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* +> +> > + Device Check Light */) +> +> > ); +> +> > aml_append(method, +> +> > aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject +> +> > Request */) +> +> +> +Oh I see, another bug: +> +> +case ACPI_NOTIFY_DEVICE_CHECK_LIGHT: +> +acpi_handle_debug(handle, "ACPI_NOTIFY_DEVICE_CHECK_LIGHT +> +event\n"); +> +/* TBD: Exactly what does 'light' mean? */ +> +break; +> +> +And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 type) +> +and friends all just ignore this event type. +> +> +> +> +-- +> +MST +Hi Michael, + +If we want to fix this problem on the backend, it is not enough to consider +only PCI +device hot plugging, because I found that if we use a command like +"echo 1 > /sys/bus/pci/rescan" in guest, this problem is very easy to reproduce. + +From the perspective of device emulation, when guest writes 0xffffffff to the +BAR, +guest just want to get the size of the region but not really updating the +address space. +So I made the following patch to avoid update pci mapping. + +Do you think this make sense? + +[PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR + +When guest writes 0xffffffff to the BAR, guest just want to get the size of the +region +but not really updating the address space. +So when guest writes 0xffffffff to BAR, we need avoid pci_update_mappings +or pci_bridge_update_mappings. 
+ +Signed-off-by: xuyandong <address@hidden> +--- + hw/pci/pci.c | 6 ++++-- + hw/pci/pci_bridge.c | 8 +++++--- + 2 files changed, 9 insertions(+), 5 deletions(-) + +diff --git a/hw/pci/pci.c b/hw/pci/pci.c +index 56b13b3..ef368e1 100644 +--- a/hw/pci/pci.c ++++ b/hw/pci/pci.c +@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t +addr, uint32_t val_in, int + { + int i, was_irq_disabled = pci_irq_disabled(d); + uint32_t val = val_in; ++ uint64_t barmask = (1 << l*8) - 1; + + for (i = 0; i < l; val >>= 8, ++i) { + uint8_t wmask = d->wmask[addr + i]; +@@ -1369,9 +1370,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t +addr, uint32_t val_in, int + d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask); + d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */ + } +- if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || ++ if ((val_in != barmask && ++ (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || + ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) || +- ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) || ++ ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) || + range_covers_byte(addr, l, PCI_COMMAND)) + pci_update_mappings(d); + +diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c +index ee9dff2..f2bad79 100644 +--- a/hw/pci/pci_bridge.c ++++ b/hw/pci/pci_bridge.c +@@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d, + PCIBridge *s = PCI_BRIDGE(d); + uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL); + uint16_t newctl; ++ uint64_t barmask = (1 << len * 8) - 1; + + pci_default_write_config(d, address, val, len); + + if (ranges_overlap(address, len, PCI_COMMAND, 2) || + +- /* io base/limit */ +- ranges_overlap(address, len, PCI_IO_BASE, 2) || ++ (val != barmask && ++ /* io base/limit */ ++ (ranges_overlap(address, len, PCI_IO_BASE, 2) || + + /* memory base/limit, prefetchable base/limit and + io base/limit upper 16 */ +- ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || ++ ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) || + + /* vga enable */ + ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { +-- +1.8.3.1 + +On Mon, Jan 07, 2019 at 02:37:17PM +0000, xuyandong wrote: +> +> > > > > > > > > > Hi all, +> +> > > > > > > > > > +> +> > > > > > > > > > +> +> > > > > > > > > > +> +> > > > > > > > > > In our test, we configured VM with several pci-bridges +> +> > > > > > > > > > and a virtio-net nic been attached with bus 4, +> +> > > > > > > > > > +> +> > > > > > > > > > After VM is startup, We ping this nic from host to +> +> > > > > > > > > > judge if it is working normally. Then, we hot add pci +> +> > > > > > > > > > devices to this VM with bus +> +> > > > > 0. +> +> > > > > > > > > > +> +> > > > > > > > > > We found the virtio-net NIC in bus 4 is not working +> +> > > > > > > > > > (can not +> +> > > > > > > > > > connect) occasionally, as it kick virtio backend +> +> > > > > > > > > > failure with error +> +> +> > > > But I have another question, if we only fix this problem in the +> +> > > > kernel, the Linux version that has been released does not work +> +> > > > well on the +> +> > > virtualization platform. +> +> > > > Is there a way to fix this problem in the backend? +> +> > > +> +> > > There could we a way to work around this. +> +> > > Does below help? +> +> > +> +> > I am sorry to tell you, I tested this patch and it doesn't work fine. 
+> +> > +> +> > > +> +> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index +> +> > > 236a20eaa8..7834cac4b0 100644 +> +> > > --- a/hw/i386/acpi-build.c +> +> > > +++ b/hw/i386/acpi-build.c +> +> > > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml +> +> > > *parent_scope, PCIBus *bus, +> +> > > +> +> > > aml_append(method, aml_store(aml_int(bsel_val), +> +> aml_name("BNUM"))); +> +> > > aml_append(method, +> +> > > - aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device +> +> > > Check +> +> */) +> +> > > + aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* +> +> > > + Device Check Light */) +> +> > > ); +> +> > > aml_append(method, +> +> > > aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* Eject +> +> > > Request */) +> +> +> +> +> +> Oh I see, another bug: +> +> +> +> case ACPI_NOTIFY_DEVICE_CHECK_LIGHT: +> +> acpi_handle_debug(handle, "ACPI_NOTIFY_DEVICE_CHECK_LIGHT +> +> event\n"); +> +> /* TBD: Exactly what does 'light' mean? */ +> +> break; +> +> +> +> And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 type) +> +> and friends all just ignore this event type. +> +> +> +> +> +> +> +> -- +> +> MST +> +> +Hi Michael, +> +> +If we want to fix this problem on the backend, it is not enough to consider +> +only PCI +> +device hot plugging, because I found that if we use a command like +> +"echo 1 > /sys/bus/pci/rescan" in guest, this problem is very easy to +> +reproduce. +> +> +From the perspective of device emulation, when guest writes 0xffffffff to the +> +BAR, +> +guest just want to get the size of the region but not really updating the +> +address space. +> +So I made the following patch to avoid update pci mapping. +> +> +Do you think this make sense? +> +> +[PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR +> +> +When guest writes 0xffffffff to the BAR, guest just want to get the size of +> +the region +> +but not really updating the address space. +> +So when guest writes 0xffffffff to BAR, we need avoid pci_update_mappings +> +or pci_bridge_update_mappings. +> +> +Signed-off-by: xuyandong <address@hidden> +I see how that will address the common case however there are a bunch of +issues here. First of all it's easy to trigger the update by some other +action like VM migration. More importantly it's just possible that +guest actually does want to set the low 32 bit of the address to all +ones. For example, that is clearly listed as a way to disable all +devices behind the bridge in the pci to pci bridge spec. + +Given upstream is dragging it's feet I'm open to adding a flag +that will help keep guests going as a temporary measure. +We will need to think about ways to restrict this as much as +we can. 
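+One side note on the patch above (an observation, not something raised in the
+thread): for a 4-byte config write l == 4, so the expression (1 << l*8) shifts
+a 32-bit int by 32, which is undefined behaviour in C; barmask has to be
+computed in 64 bits, which the final version of the patch posted later in this
+thread addresses with a (uint64_t) cast. A tiny illustration:
+
+#include <stdint.h>
+#include <stdio.h>
+
+int main(void)
+{
+    unsigned l = 4;                                   /* a 32-bit BAR write */
+    /* uint64_t bad = (1 << (l * 8)) - 1; */          /* undefined: int shifted by 32 */
+    uint64_t barmask = ((uint64_t)1 << (l * 8)) - 1;  /* 0xffffffff, as intended */
+    printf("barmask for l=%u is %#llx\n", l, (unsigned long long)barmask);
+    return 0;
+}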
+ + +> +--- +> +hw/pci/pci.c | 6 ++++-- +> +hw/pci/pci_bridge.c | 8 +++++--- +> +2 files changed, 9 insertions(+), 5 deletions(-) +> +> +diff --git a/hw/pci/pci.c b/hw/pci/pci.c +> +index 56b13b3..ef368e1 100644 +> +--- a/hw/pci/pci.c +> ++++ b/hw/pci/pci.c +> +@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t +> +addr, uint32_t val_in, int +> +{ +> +int i, was_irq_disabled = pci_irq_disabled(d); +> +uint32_t val = val_in; +> ++ uint64_t barmask = (1 << l*8) - 1; +> +> +for (i = 0; i < l; val >>= 8, ++i) { +> +uint8_t wmask = d->wmask[addr + i]; +> +@@ -1369,9 +1370,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t +> +addr, uint32_t val_in, int +> +d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask); +> +d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */ +> +} +> +- if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || +> ++ if ((val_in != barmask && +> ++ (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || +> +ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) || +> +- ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) || +> ++ ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) || +> +range_covers_byte(addr, l, PCI_COMMAND)) +> +pci_update_mappings(d); +> +> +diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c +> +index ee9dff2..f2bad79 100644 +> +--- a/hw/pci/pci_bridge.c +> ++++ b/hw/pci/pci_bridge.c +> +@@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d, +> +PCIBridge *s = PCI_BRIDGE(d); +> +uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL); +> +uint16_t newctl; +> ++ uint64_t barmask = (1 << len * 8) - 1; +> +> +pci_default_write_config(d, address, val, len); +> +> +if (ranges_overlap(address, len, PCI_COMMAND, 2) || +> +> +- /* io base/limit */ +> +- ranges_overlap(address, len, PCI_IO_BASE, 2) || +> ++ (val != barmask && +> ++ /* io base/limit */ +> ++ (ranges_overlap(address, len, PCI_IO_BASE, 2) || +> +> +/* memory base/limit, prefetchable base/limit and +> +io base/limit upper 16 */ +> +- ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || +> ++ ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) || +> +> +/* vga enable */ +> +ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { +> +-- +> +1.8.3.1 +> +> +> + +> +-----Original Message----- +> +From: Michael S. Tsirkin [ +mailto:address@hidden +> +Sent: Monday, January 07, 2019 11:06 PM +> +To: xuyandong <address@hidden> +> +Cc: address@hidden; Paolo Bonzini <address@hidden>; qemu- +> +address@hidden; Zhanghailiang <address@hidden>; +> +wangxin (U) <address@hidden>; Huangweidong (C) +> +<address@hidden> +> +Subject: Re: [BUG]Unassigned mem write during pci device hot-plug +> +> +On Mon, Jan 07, 2019 at 02:37:17PM +0000, xuyandong wrote: +> +> > > > > > > > > > > Hi all, +> +> > > > > > > > > > > +> +> > > > > > > > > > > +> +> > > > > > > > > > > +> +> > > > > > > > > > > In our test, we configured VM with several +> +> > > > > > > > > > > pci-bridges and a virtio-net nic been attached +> +> > > > > > > > > > > with bus 4, +> +> > > > > > > > > > > +> +> > > > > > > > > > > After VM is startup, We ping this nic from host to +> +> > > > > > > > > > > judge if it is working normally. Then, we hot add +> +> > > > > > > > > > > pci devices to this VM with bus +> +> > > > > > 0. 
+> +> > > > > > > > > > > +> +> > > > > > > > > > > We found the virtio-net NIC in bus 4 is not +> +> > > > > > > > > > > working (can not +> +> > > > > > > > > > > connect) occasionally, as it kick virtio backend +> +> > > > > > > > > > > failure with error +> +> +> +> > > > > But I have another question, if we only fix this problem in +> +> > > > > the kernel, the Linux version that has been released does not +> +> > > > > work well on the +> +> > > > virtualization platform. +> +> > > > > Is there a way to fix this problem in the backend? +> +> +> +> Hi Michael, +> +> +> +> If we want to fix this problem on the backend, it is not enough to +> +> consider only PCI device hot plugging, because I found that if we use +> +> a command like "echo 1 > /sys/bus/pci/rescan" in guest, this problem is very +> +easy to reproduce. +> +> +> +> From the perspective of device emulation, when guest writes 0xffffffff +> +> to the BAR, guest just want to get the size of the region but not really +> +updating the address space. +> +> So I made the following patch to avoid update pci mapping. +> +> +> +> Do you think this make sense? +> +> +> +> [PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR +> +> +> +> When guest writes 0xffffffff to the BAR, guest just want to get the +> +> size of the region but not really updating the address space. +> +> So when guest writes 0xffffffff to BAR, we need avoid +> +> pci_update_mappings or pci_bridge_update_mappings. +> +> +> +> Signed-off-by: xuyandong <address@hidden> +> +> +I see how that will address the common case however there are a bunch of +> +issues here. First of all it's easy to trigger the update by some other +> +action like +> +VM migration. More importantly it's just possible that guest actually does +> +want +> +to set the low 32 bit of the address to all ones. For example, that is +> +clearly +> +listed as a way to disable all devices behind the bridge in the pci to pci +> +bridge +> +spec. +Ok, I see. If I only skip upate when guest writing 0xFFFFFFFF to Prefetcable +Base Upper 32 Bits +to meet the kernel double check problem. +Do you think there is still risk? + +> +> +Given upstream is dragging it's feet I'm open to adding a flag that will help +> +keep guests going as a temporary measure. +> +We will need to think about ways to restrict this as much as we can. 
+> +> +> +> --- +> +> hw/pci/pci.c | 6 ++++-- +> +> hw/pci/pci_bridge.c | 8 +++++--- +> +> 2 files changed, 9 insertions(+), 5 deletions(-) +> +> +> +> diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 56b13b3..ef368e1 100644 +> +> --- a/hw/pci/pci.c +> +> +++ b/hw/pci/pci.c +> +> @@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, +> +> uint32_t addr, uint32_t val_in, int { +> +> int i, was_irq_disabled = pci_irq_disabled(d); +> +> uint32_t val = val_in; +> +> + uint64_t barmask = (1 << l*8) - 1; +> +> +> +> for (i = 0; i < l; val >>= 8, ++i) { +> +> uint8_t wmask = d->wmask[addr + i]; @@ -1369,9 +1370,10 @@ +> +> void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, +> +int +> +> d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & +> +> wmask); +> +> d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear +> +> */ +> +> } +> +> - if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || +> +> + if ((val_in != barmask && +> +> + (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || +> +> ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) || +> +> - ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) || +> +> + ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) || +> +> range_covers_byte(addr, l, PCI_COMMAND)) +> +> pci_update_mappings(d); +> +> +> +> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index +> +> ee9dff2..f2bad79 100644 +> +> --- a/hw/pci/pci_bridge.c +> +> +++ b/hw/pci/pci_bridge.c +> +> @@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d, +> +> PCIBridge *s = PCI_BRIDGE(d); +> +> uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL); +> +> uint16_t newctl; +> +> + uint64_t barmask = (1 << len * 8) - 1; +> +> +> +> pci_default_write_config(d, address, val, len); +> +> +> +> if (ranges_overlap(address, len, PCI_COMMAND, 2) || +> +> +> +> - /* io base/limit */ +> +> - ranges_overlap(address, len, PCI_IO_BASE, 2) || +> +> + (val != barmask && +> +> + /* io base/limit */ +> +> + (ranges_overlap(address, len, PCI_IO_BASE, 2) || +> +> +> +> /* memory base/limit, prefetchable base/limit and +> +> io base/limit upper 16 */ +> +> - ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || +> +> + ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) || +> +> +> +> /* vga enable */ +> +> ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { +> +> -- +> +> 1.8.3.1 +> +> +> +> +> +> + +On Mon, Jan 07, 2019 at 03:28:36PM +0000, xuyandong wrote: +> +> +> +> -----Original Message----- +> +> From: Michael S. Tsirkin [ +mailto:address@hidden +> +> Sent: Monday, January 07, 2019 11:06 PM +> +> To: xuyandong <address@hidden> +> +> Cc: address@hidden; Paolo Bonzini <address@hidden>; qemu- +> +> address@hidden; Zhanghailiang <address@hidden>; +> +> wangxin (U) <address@hidden>; Huangweidong (C) +> +> <address@hidden> +> +> Subject: Re: [BUG]Unassigned mem write during pci device hot-plug +> +> +> +> On Mon, Jan 07, 2019 at 02:37:17PM +0000, xuyandong wrote: +> +> > > > > > > > > > > > Hi all, +> +> > > > > > > > > > > > +> +> > > > > > > > > > > > +> +> > > > > > > > > > > > +> +> > > > > > > > > > > > In our test, we configured VM with several +> +> > > > > > > > > > > > pci-bridges and a virtio-net nic been attached +> +> > > > > > > > > > > > with bus 4, +> +> > > > > > > > > > > > +> +> > > > > > > > > > > > After VM is startup, We ping this nic from host to +> +> > > > > > > > > > > > judge if it is working normally. Then, we hot add +> +> > > > > > > > > > > > pci devices to this VM with bus +> +> > > > > > > 0. 
+> +> > > > > > > > > > > > +> +> > > > > > > > > > > > We found the virtio-net NIC in bus 4 is not +> +> > > > > > > > > > > > working (can not +> +> > > > > > > > > > > > connect) occasionally, as it kick virtio backend +> +> > > > > > > > > > > > failure with error +> +> > +> +> > > > > > But I have another question, if we only fix this problem in +> +> > > > > > the kernel, the Linux version that has been released does not +> +> > > > > > work well on the +> +> > > > > virtualization platform. +> +> > > > > > Is there a way to fix this problem in the backend? +> +> > +> +> > Hi Michael, +> +> > +> +> > If we want to fix this problem on the backend, it is not enough to +> +> > consider only PCI device hot plugging, because I found that if we use +> +> > a command like "echo 1 > /sys/bus/pci/rescan" in guest, this problem is +> +> > very +> +> easy to reproduce. +> +> > +> +> > From the perspective of device emulation, when guest writes 0xffffffff +> +> > to the BAR, guest just want to get the size of the region but not really +> +> updating the address space. +> +> > So I made the following patch to avoid update pci mapping. +> +> > +> +> > Do you think this make sense? +> +> > +> +> > [PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR +> +> > +> +> > When guest writes 0xffffffff to the BAR, guest just want to get the +> +> > size of the region but not really updating the address space. +> +> > So when guest writes 0xffffffff to BAR, we need avoid +> +> > pci_update_mappings or pci_bridge_update_mappings. +> +> > +> +> > Signed-off-by: xuyandong <address@hidden> +> +> +> +> I see how that will address the common case however there are a bunch of +> +> issues here. First of all it's easy to trigger the update by some other +> +> action like +> +> VM migration. More importantly it's just possible that guest actually does +> +> want +> +> to set the low 32 bit of the address to all ones. For example, that is +> +> clearly +> +> listed as a way to disable all devices behind the bridge in the pci to pci +> +> bridge +> +> spec. +> +> +Ok, I see. If I only skip upate when guest writing 0xFFFFFFFF to Prefetcable +> +Base Upper 32 Bits +> +to meet the kernel double check problem. +> +Do you think there is still risk? +Well it's non zero since spec says such a write should disable all +accesses. Just an idea: why not add an option to disable upper 32 bit? +That is ugly and limits space but spec compliant. + +> +> +> +> Given upstream is dragging it's feet I'm open to adding a flag that will +> +> help +> +> keep guests going as a temporary measure. +> +> We will need to think about ways to restrict this as much as we can. 
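+For what the "disable upper 32 bit" idea could look like in practice, one
+possible direction (purely an illustrative sketch, untested and not proposed
+in the thread; the hook point and any option plumbing are assumptions) would
+be to clear the write mask of the bridge's prefetchable upper-32 registers, so
+the guest can neither size nor program them and the prefetchable window stays
+below 4G:
+
+/* Hypothetical helper, imagined to be called from the bridge's init path
+ * when such an option is enabled; it makes guest writes to the upper-32
+ * registers (including the 0xffffffff sizing probe) have no effect. */
+static void pci_bridge_disable_pref64(PCIDevice *dev)
+{
+    pci_set_long(dev->wmask + PCI_PREF_BASE_UPPER32, 0);
+    pci_set_long(dev->wmask + PCI_PREF_LIMIT_UPPER32, 0);
+}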
+> +> +> +> +> +> > --- +> +> > hw/pci/pci.c | 6 ++++-- +> +> > hw/pci/pci_bridge.c | 8 +++++--- +> +> > 2 files changed, 9 insertions(+), 5 deletions(-) +> +> > +> +> > diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 56b13b3..ef368e1 100644 +> +> > --- a/hw/pci/pci.c +> +> > +++ b/hw/pci/pci.c +> +> > @@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, +> +> > uint32_t addr, uint32_t val_in, int { +> +> > int i, was_irq_disabled = pci_irq_disabled(d); +> +> > uint32_t val = val_in; +> +> > + uint64_t barmask = (1 << l*8) - 1; +> +> > +> +> > for (i = 0; i < l; val >>= 8, ++i) { +> +> > uint8_t wmask = d->wmask[addr + i]; @@ -1369,9 +1370,10 @@ +> +> > void pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t +> +> > val_in, +> +> int +> +> > d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & +> +> > wmask); +> +> > d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to +> +> > Clear */ +> +> > } +> +> > - if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || +> +> > + if ((val_in != barmask && +> +> > + (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || +> +> > ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) || +> +> > - ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) || +> +> > + ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) || +> +> > range_covers_byte(addr, l, PCI_COMMAND)) +> +> > pci_update_mappings(d); +> +> > +> +> > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index +> +> > ee9dff2..f2bad79 100644 +> +> > --- a/hw/pci/pci_bridge.c +> +> > +++ b/hw/pci/pci_bridge.c +> +> > @@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d, +> +> > PCIBridge *s = PCI_BRIDGE(d); +> +> > uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL); +> +> > uint16_t newctl; +> +> > + uint64_t barmask = (1 << len * 8) - 1; +> +> > +> +> > pci_default_write_config(d, address, val, len); +> +> > +> +> > if (ranges_overlap(address, len, PCI_COMMAND, 2) || +> +> > +> +> > - /* io base/limit */ +> +> > - ranges_overlap(address, len, PCI_IO_BASE, 2) || +> +> > + (val != barmask && +> +> > + /* io base/limit */ +> +> > + (ranges_overlap(address, len, PCI_IO_BASE, 2) || +> +> > +> +> > /* memory base/limit, prefetchable base/limit and +> +> > io base/limit upper 16 */ +> +> > - ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || +> +> > + ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) || +> +> > +> +> > /* vga enable */ +> +> > ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { +> +> > -- +> +> > 1.8.3.1 +> +> > +> +> > +> +> > + +> +-----Original Message----- +> +From: xuyandong +> +Sent: Monday, January 07, 2019 10:37 PM +> +To: 'Michael S. Tsirkin' <address@hidden> +> +Cc: address@hidden; Paolo Bonzini <address@hidden>; qemu- +> +address@hidden; Zhanghailiang <address@hidden>; +> +wangxin (U) <address@hidden>; Huangweidong (C) +> +<address@hidden> +> +Subject: RE: [BUG]Unassigned mem write during pci device hot-plug +> +> +> > > > > > > > > > Hi all, +> +> > > > > > > > > > +> +> > > > > > > > > > +> +> > > > > > > > > > +> +> > > > > > > > > > In our test, we configured VM with several +> +> > > > > > > > > > pci-bridges and a virtio-net nic been attached with +> +> > > > > > > > > > bus 4, +> +> > > > > > > > > > +> +> > > > > > > > > > After VM is startup, We ping this nic from host to +> +> > > > > > > > > > judge if it is working normally. Then, we hot add +> +> > > > > > > > > > pci devices to this VM with bus +> +> > > > > 0. 
+> +> > > > > > > > > > +> +> > > > > > > > > > We found the virtio-net NIC in bus 4 is not working +> +> > > > > > > > > > (can not +> +> > > > > > > > > > connect) occasionally, as it kick virtio backend +> +> > > > > > > > > > failure with error +> +> +> > > > But I have another question, if we only fix this problem in the +> +> > > > kernel, the Linux version that has been released does not work +> +> > > > well on the +> +> > > virtualization platform. +> +> > > > Is there a way to fix this problem in the backend? +> +> > > +> +> > > There could we a way to work around this. +> +> > > Does below help? +> +> > +> +> > I am sorry to tell you, I tested this patch and it doesn't work fine. +> +> > +> +> > > +> +> > > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c index +> +> > > 236a20eaa8..7834cac4b0 100644 +> +> > > --- a/hw/i386/acpi-build.c +> +> > > +++ b/hw/i386/acpi-build.c +> +> > > @@ -551,7 +551,7 @@ static void build_append_pci_bus_devices(Aml +> +> > > *parent_scope, PCIBus *bus, +> +> > > +> +> > > aml_append(method, aml_store(aml_int(bsel_val), +> +> aml_name("BNUM"))); +> +> > > aml_append(method, +> +> > > - aml_call2("DVNT", aml_name("PCIU"), aml_int(1) /* Device +> +Check +> +> */) +> +> > > + aml_call2("DVNT", aml_name("PCIU"), aml_int(4) /* +> +> > > + Device Check Light */) +> +> > > ); +> +> > > aml_append(method, +> +> > > aml_call2("DVNT", aml_name("PCID"), aml_int(3)/* +> +> > > Eject Request */) +> +> +> +> +> +> Oh I see, another bug: +> +> +> +> case ACPI_NOTIFY_DEVICE_CHECK_LIGHT: +> +> acpi_handle_debug(handle, +> +> "ACPI_NOTIFY_DEVICE_CHECK_LIGHT event\n"); +> +> /* TBD: Exactly what does 'light' mean? */ +> +> break; +> +> +> +> And then e.g. acpi_generic_hotplug_event(struct acpi_device *adev, u32 +> +> type) and friends all just ignore this event type. +> +> +> +> +> +> +> +> -- +> +> MST +> +> +Hi Michael, +> +> +If we want to fix this problem on the backend, it is not enough to consider +> +only +> +PCI device hot plugging, because I found that if we use a command like "echo +> +1 > +> +/sys/bus/pci/rescan" in guest, this problem is very easy to reproduce. +> +> +From the perspective of device emulation, when guest writes 0xffffffff to the +> +BAR, guest just want to get the size of the region but not really updating the +> +address space. +> +So I made the following patch to avoid update pci mapping. +> +> +Do you think this make sense? +> +> +[PATCH] pci: avoid update pci mapping when writing 0xFFFF FFFF to BAR +> +> +When guest writes 0xffffffff to the BAR, guest just want to get the size of +> +the +> +region but not really updating the address space. +> +So when guest writes 0xffffffff to BAR, we need avoid pci_update_mappings or +> +pci_bridge_update_mappings. 
+> +> +Signed-off-by: xuyandong <address@hidden> +> +--- +> +hw/pci/pci.c | 6 ++++-- +> +hw/pci/pci_bridge.c | 8 +++++--- +> +2 files changed, 9 insertions(+), 5 deletions(-) +> +> +diff --git a/hw/pci/pci.c b/hw/pci/pci.c index 56b13b3..ef368e1 100644 +> +--- a/hw/pci/pci.c +> ++++ b/hw/pci/pci.c +> +@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t +> +addr, uint32_t val_in, int { +> +int i, was_irq_disabled = pci_irq_disabled(d); +> +uint32_t val = val_in; +> ++ uint64_t barmask = (1 << l*8) - 1; +> +> +for (i = 0; i < l; val >>= 8, ++i) { +> +uint8_t wmask = d->wmask[addr + i]; @@ -1369,9 +1370,10 @@ void +> +pci_default_write_config(PCIDevice *d, uint32_t addr, uint32_t val_in, int +> +d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask); +> +d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */ +> +} +> +- if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || +> ++ if ((val_in != barmask && +> ++ (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || +> +ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) || +> +- ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) || +> ++ ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) || +> +range_covers_byte(addr, l, PCI_COMMAND)) +> +pci_update_mappings(d); +> +> +diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c index ee9dff2..f2bad79 +> +100644 +> +--- a/hw/pci/pci_bridge.c +> ++++ b/hw/pci/pci_bridge.c +> +@@ -253,17 +253,19 @@ void pci_bridge_write_config(PCIDevice *d, +> +PCIBridge *s = PCI_BRIDGE(d); +> +uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL); +> +uint16_t newctl; +> ++ uint64_t barmask = (1 << len * 8) - 1; +> +> +pci_default_write_config(d, address, val, len); +> +> +if (ranges_overlap(address, len, PCI_COMMAND, 2) || +> +> +- /* io base/limit */ +> +- ranges_overlap(address, len, PCI_IO_BASE, 2) || +> ++ (val != barmask && +> ++ /* io base/limit */ +> ++ (ranges_overlap(address, len, PCI_IO_BASE, 2) || +> +> +/* memory base/limit, prefetchable base/limit and +> +io base/limit upper 16 */ +> +- ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || +> ++ ranges_overlap(address, len, PCI_MEMORY_BASE, 20))) || +> +> +/* vga enable */ +> +ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { +> +-- +> +1.8.3.1 +> +> +Sorry, please ignore the patch above. 
+ +Here is the patch I want to post: + +diff --git a/hw/pci/pci.c b/hw/pci/pci.c +index 56b13b3..38a300f 100644 +--- a/hw/pci/pci.c ++++ b/hw/pci/pci.c +@@ -1361,6 +1361,7 @@ void pci_default_write_config(PCIDevice *d, uint32_t +addr, uint32_t val_in, int + { + int i, was_irq_disabled = pci_irq_disabled(d); + uint32_t val = val_in; ++ uint64_t barmask = ((uint64_t)1 << l*8) - 1; + + for (i = 0; i < l; val >>= 8, ++i) { + uint8_t wmask = d->wmask[addr + i]; +@@ -1369,9 +1370,10 @@ void pci_default_write_config(PCIDevice *d, uint32_t +addr, uint32_t val_in, int + d->config[addr + i] = (d->config[addr + i] & ~wmask) | (val & wmask); + d->config[addr + i] &= ~(val & w1cmask); /* W1C: Write 1 to Clear */ + } +- if (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || ++ if ((val_in != barmask && ++ (ranges_overlap(addr, l, PCI_BASE_ADDRESS_0, 24) || + ranges_overlap(addr, l, PCI_ROM_ADDRESS, 4) || +- ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4) || ++ ranges_overlap(addr, l, PCI_ROM_ADDRESS1, 4))) || + range_covers_byte(addr, l, PCI_COMMAND)) + pci_update_mappings(d); + +diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c +index ee9dff2..b8f7d48 100644 +--- a/hw/pci/pci_bridge.c ++++ b/hw/pci/pci_bridge.c +@@ -253,20 +253,22 @@ void pci_bridge_write_config(PCIDevice *d, + PCIBridge *s = PCI_BRIDGE(d); + uint16_t oldctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL); + uint16_t newctl; ++ uint64_t barmask = ((uint64_t)1 << len * 8) - 1; + + pci_default_write_config(d, address, val, len); + + if (ranges_overlap(address, len, PCI_COMMAND, 2) || + +- /* io base/limit */ +- ranges_overlap(address, len, PCI_IO_BASE, 2) || ++ /* vga enable */ ++ ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2) || + +- /* memory base/limit, prefetchable base/limit and +- io base/limit upper 16 */ +- ranges_overlap(address, len, PCI_MEMORY_BASE, 20) || ++ (val != barmask && ++ /* io base/limit */ ++ (ranges_overlap(address, len, PCI_IO_BASE, 2) || + +- /* vga enable */ +- ranges_overlap(address, len, PCI_BRIDGE_CONTROL, 2)) { ++ /* memory base/limit, prefetchable base/limit and ++ io base/limit upper 16 */ ++ ranges_overlap(address, len, PCI_MEMORY_BASE, 20)))) { + pci_bridge_update_mappings(s); + } + +-- +1.8.3.1 + diff --git a/classification_output/01/other/1067127 b/classification_output/01/other/1067127 new file mode 100644 index 000000000..5ebbf9849 --- /dev/null +++ b/classification_output/01/other/1067127 @@ -0,0 +1,154 @@ +other: 0.901 +semantic: 0.846 +instruction: 0.845 +mistranslation: 0.781 + +[Qemu-devel] [BUG] qemu stuck when detach host-usb device + +Description of problem: +The guest has a host-usb device(Kingston Technology DataTraveler 100 G3/G4/SE9 +G2), which is attached +to xhci controller(on host). Qemu will stuck if I detach it from guest. + +How reproducible: +100% + +Steps to Reproduce: +1. Use usb stick to copy files in guest , make it busy working. +2. virsh detach-device vm_name usb.xml + +Then qemu will stuck for 20s, I found this is because libusb_release_interface +block for 20s. +Dmesg prints: + +[35442.034861] usb 4-2.1: Disable of device-initiated U1 failed. +[35447.034993] usb 4-2.1: Disable of device-initiated U2 failed. +[35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed. +[35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed. + +Is this a hardware error or software's bug? 
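+One way to narrow down whether the 20 second stall sits in libusb or in the
+host kernel is to time libusb_release_interface() in a small standalone
+program, outside of QEMU, while the stick is busy. A rough sketch follows; the
+VID:PID is a placeholder to be taken from lsusb, it claims interface 0, and
+whether this minimal claim/release cycle reproduces the stall at all is an
+open question (build with -lusb-1.0):
+
+#include <libusb-1.0/libusb.h>
+#include <stdio.h>
+#include <time.h>
+
+int main(void)
+{
+    libusb_context *ctx = NULL;
+    if (libusb_init(&ctx) != 0) { fprintf(stderr, "libusb_init failed\n"); return 1; }
+
+    /* Placeholder IDs: substitute the stick's real VID:PID from lsusb. */
+    libusb_device_handle *h = libusb_open_device_with_vid_pid(ctx, 0x0951, 0x1666);
+    if (!h) { fprintf(stderr, "device not found\n"); libusb_exit(ctx); return 1; }
+
+    libusb_set_auto_detach_kernel_driver(h, 1);
+    if (libusb_claim_interface(h, 0) != 0) { fprintf(stderr, "claim failed\n"); }
+
+    struct timespec t0, t1;
+    clock_gettime(CLOCK_MONOTONIC, &t0);
+    int rc = libusb_release_interface(h, 0);   /* the call QEMU is reported to block in */
+    clock_gettime(CLOCK_MONOTONIC, &t1);
+
+    printf("release rc=%d took %.1fs\n", rc,
+           (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
+
+    libusb_close(h);
+    libusb_exit(ctx);
+    return 0;
+}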
+ +On Tue, Nov 27, 2018 at 01:26:24AM +0000, linzhecheng wrote: +> +Description of problem: +> +The guest has a host-usb device(Kingston Technology DataTraveler 100 +> +G3/G4/SE9 G2), which is attached +> +to xhci controller(on host). Qemu will stuck if I detach it from guest. +> +> +How reproducible: +> +100% +> +> +Steps to Reproduce: +> +1. Use usb stick to copy files in guest , make it busy working. +> +2. virsh detach-device vm_name usb.xml +> +> +Then qemu will stuck for 20s, I found this is because +> +libusb_release_interface block for 20s. +> +Dmesg prints: +> +> +[35442.034861] usb 4-2.1: Disable of device-initiated U1 failed. +> +[35447.034993] usb 4-2.1: Disable of device-initiated U2 failed. +> +[35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed. +> +[35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed. +> +> +Is this a hardware error or software's bug? +I'd guess software error, could be is libusb or (host) linux kernel. +Cc'ing libusb-devel. + +cheers, + Gerd + +> +-----Original Message----- +> +From: Gerd Hoffmann [ +mailto:address@hidden +> +Sent: Tuesday, November 27, 2018 2:09 PM +> +To: linzhecheng <address@hidden> +> +Cc: address@hidden; wangxin (U) <address@hidden>; +> +Zhoujian (jay) <address@hidden>; address@hidden +> +Subject: Re: [Qemu-devel] [BUG] qemu stuck when detach host-usb device +> +> +On Tue, Nov 27, 2018 at 01:26:24AM +0000, linzhecheng wrote: +> +> Description of problem: +> +> The guest has a host-usb device(Kingston Technology DataTraveler 100 +> +> G3/G4/SE9 G2), which is attached to xhci controller(on host). Qemu will +> +> stuck +> +if I detach it from guest. +> +> +> +> How reproducible: +> +> 100% +> +> +> +> Steps to Reproduce: +> +> 1. Use usb stick to copy files in guest , make it busy working. +> +> 2. virsh detach-device vm_name usb.xml +> +> +> +> Then qemu will stuck for 20s, I found this is because +> +> libusb_release_interface +> +block for 20s. +> +> Dmesg prints: +> +> +> +> [35442.034861] usb 4-2.1: Disable of device-initiated U1 failed. +> +> [35447.034993] usb 4-2.1: Disable of device-initiated U2 failed. +> +> [35452.035131] usb 4-2.1: Set SEL for device-initiated U1 failed. +> +> [35457.035259] usb 4-2.1: Set SEL for device-initiated U2 failed. +> +> +> +> Is this a hardware error or software's bug? +> +> +I'd guess software error, could be is libusb or (host) linux kernel. +> +Cc'ing libusb-devel. +Perhaps it's usb driver's bug. Could you also reproduce it? +> +> +cheers, +> +Gerd + diff --git a/classification_output/01/other/1195866 b/classification_output/01/other/1195866 new file mode 100644 index 000000000..e91a98df4 --- /dev/null +++ b/classification_output/01/other/1195866 @@ -0,0 +1,242 @@ +other: 0.788 +semantic: 0.774 +mistranslation: 0.719 +instruction: 0.661 + +[Qemu-devel] [BUG] qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed. + +Hello all, +I wanted to submit a bug report in the tracker, but it seem to require +an Ubuntu One account, which I'm having trouble with, so I'll just +give it here and hopefully somebody can make use of it. The issue +seems to be in an experimental format, so it's likely not very +consequential anyway. + +For the sake of anyone else simply googling for a workaround, I'll +just paste in the (cleaned up) brief IRC conversation about my issue +from the official channel: +<quy> I'm using QEMU version 2.12.0 on an x86_64 host (Arch Linux, +Kernel v4.17.2), and I'm trying to create an x86_64 virtual machine +(FreeBSD-11.1). 
The VM always aborts at the same point in the +installation (downloading 'ports.tgz') with the following error +message: +"qemu-system-x86_64: /build/qemu/src/qemu-2.12.0/block/qed.c:1197: +qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed. +zsh: abort (core dumped) qemu-system-x86_64 -smp 2 -m 4096 +-enable-kvm -hda freebsd/freebsd.qed -devic" +The commands I ran to create the machine are as follows: +"qemu-img create -f qed freebsd/freebsd.qed 16G" +"qemu-system-x86_64 -smp 2 -m 4096 -enable-kvm -hda +freebsd/freebsd.qed -device e1000,netdev=net0 -netdev user,id=net0 +-cdrom FreeBSD-11.1-RELEASE-amd64-bootonly.iso -boot order=d" +I tried adding logging options with the -d flag, but I didn't get +anything that seemed relevant, since I'm not sure what to look for. +<stsquad> ohh what's a qed device? +<stsquad> quy: it might be a workaround to use a qcow2 image for now +<stsquad> ahh the wiki has a statement "It is not recommended to use +QED for any new images. " +<danpb> 'qed' was an experimental disk image format created by IBM +before qcow2 v3 came along +<danpb> honestly nothing should ever use QED these days +<danpb> the good ideas from QED became qcow2v3 +<stsquad> danpb: sounds like we should put a warning on the option to +remind users of that fact +<danpb> quy: sounds like qed driver is simply broken - please do file +a bug against qemu bug tracker +<danpb> quy: but you should also really switch to qcow2 +<quy> I see; some people need to update their wikis then. I don't +remember where which guide I read when I first learned what little +QEMU I know, but I remember it specifically remember it saying QED was +the newest and most optimal format. +<stsquad> quy: we can only be responsible for our own wiki I'm afraid... +<danpb> if you remember where you saw that please let us know so we +can try to get it fixed +<quy> Thank you very much for the info; I will switch to QCOW. +Unfortunately, I'm not sure if I will be able to file any bug reports +in the tracker as I can't seem to log Launchpad, which it seems to +require. +<danpb> quy: an email to the mailing list would suffice too if you +can't deal with launchpad +<danpb> kwolf: ^^^ in case you're interested in possible QED +assertions from 2.12 + +If any more info is needed, feel free to email me; I'm not actually +subscribed to this list though. +Thank you, +Quytelda Kahja + +CC Qemu Block; looks like QED is a bit busted. + +On 06/27/2018 10:25 AM, Quytelda Kahja wrote: +> +Hello all, +> +I wanted to submit a bug report in the tracker, but it seem to require +> +an Ubuntu One account, which I'm having trouble with, so I'll just +> +give it here and hopefully somebody can make use of it. The issue +> +seems to be in an experimental format, so it's likely not very +> +consequential anyway. +> +> +For the sake of anyone else simply googling for a workaround, I'll +> +just paste in the (cleaned up) brief IRC conversation about my issue +> +from the official channel: +> +<quy> I'm using QEMU version 2.12.0 on an x86_64 host (Arch Linux, +> +Kernel v4.17.2), and I'm trying to create an x86_64 virtual machine +> +(FreeBSD-11.1). The VM always aborts at the same point in the +> +installation (downloading 'ports.tgz') with the following error +> +message: +> +"qemu-system-x86_64: /build/qemu/src/qemu-2.12.0/block/qed.c:1197: +> +qed_aio_write_alloc: Assertion `s->allocating_acb == NULL' failed. 
+> +zsh: abort (core dumped) qemu-system-x86_64 -smp 2 -m 4096 +> +-enable-kvm -hda freebsd/freebsd.qed -devic" +> +The commands I ran to create the machine are as follows: +> +"qemu-img create -f qed freebsd/freebsd.qed 16G" +> +"qemu-system-x86_64 -smp 2 -m 4096 -enable-kvm -hda +> +freebsd/freebsd.qed -device e1000,netdev=net0 -netdev user,id=net0 +> +-cdrom FreeBSD-11.1-RELEASE-amd64-bootonly.iso -boot order=d" +> +I tried adding logging options with the -d flag, but I didn't get +> +anything that seemed relevant, since I'm not sure what to look for. +> +<stsquad> ohh what's a qed device? +> +<stsquad> quy: it might be a workaround to use a qcow2 image for now +> +<stsquad> ahh the wiki has a statement "It is not recommended to use +> +QED for any new images. " +> +<danpb> 'qed' was an experimental disk image format created by IBM +> +before qcow2 v3 came along +> +<danpb> honestly nothing should ever use QED these days +> +<danpb> the good ideas from QED became qcow2v3 +> +<stsquad> danpb: sounds like we should put a warning on the option to +> +remind users of that fact +> +<danpb> quy: sounds like qed driver is simply broken - please do file +> +a bug against qemu bug tracker +> +<danpb> quy: but you should also really switch to qcow2 +> +<quy> I see; some people need to update their wikis then. I don't +> +remember where which guide I read when I first learned what little +> +QEMU I know, but I remember it specifically remember it saying QED was +> +the newest and most optimal format. +> +<stsquad> quy: we can only be responsible for our own wiki I'm afraid... +> +<danpb> if you remember where you saw that please let us know so we +> +can try to get it fixed +> +<quy> Thank you very much for the info; I will switch to QCOW. +> +Unfortunately, I'm not sure if I will be able to file any bug reports +> +in the tracker as I can't seem to log Launchpad, which it seems to +> +require. +> +<danpb> quy: an email to the mailing list would suffice too if you +> +can't deal with launchpad +> +<danpb> kwolf: ^^^ in case you're interested in possible QED +> +assertions from 2.12 +> +> +If any more info is needed, feel free to email me; I'm not actually +> +subscribed to this list though. +> +Thank you, +> +Quytelda Kahja +> + +On 06/29/2018 03:07 PM, John Snow wrote: +CC Qemu Block; looks like QED is a bit busted. + +On 06/27/2018 10:25 AM, Quytelda Kahja wrote: +Hello all, +I wanted to submit a bug report in the tracker, but it seem to require +an Ubuntu One account, which I'm having trouble with, so I'll just +give it here and hopefully somebody can make use of it. The issue +seems to be in an experimental format, so it's likely not very +consequential anyway. +Analysis in another thread may be relevant: +https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08963.html +-- +Eric Blake, Principal Software Engineer +Red Hat, Inc. +1-919-301-3266 +Virtualization: qemu.org | libvirt.org + +Am 29.06.2018 um 22:16 hat Eric Blake geschrieben: +> +On 06/29/2018 03:07 PM, John Snow wrote: +> +> CC Qemu Block; looks like QED is a bit busted. +> +> +> +> On 06/27/2018 10:25 AM, Quytelda Kahja wrote: +> +> > Hello all, +> +> > I wanted to submit a bug report in the tracker, but it seem to require +> +> > an Ubuntu One account, which I'm having trouble with, so I'll just +> +> > give it here and hopefully somebody can make use of it. The issue +> +> > seems to be in an experimental format, so it's likely not very +> +> > consequential anyway. 
+> +> +Analysis in another thread may be relevant: +> +> +https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg08963.html +The assertion there was: + +qemu-system-x86_64: block.c:3434: bdrv_replace_node: Assertion +`!atomic_read(&to->in_flight)' failed. + +Which quite clearly pointed to a drain bug. This one, however, doesn't +seem to be related to drain, so I think it's probably a different bug. + +Kevin + diff --git a/classification_output/01/other/1398669 b/classification_output/01/other/1398669 new file mode 100644 index 000000000..17e9f6ea6 --- /dev/null +++ b/classification_output/01/other/1398669 @@ -0,0 +1,785 @@ +other: 0.922 +mistranslation: 0.917 +semantic: 0.903 +instruction: 0.894 + +[BUG] Migration hv_time rollback + +Hi, + +We are experiencing timestamp rollbacks during live-migration of +Windows 10 guests with the following qemu configuration (linux 5.4.46 +and qemu master): +``` +$ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...] +``` + +I have tracked the bug to the fact that `kvmclock` is not exposed and +disabled from qemu PoV but is in fact used by `hv-time` (in KVM). + +I think we should enable the `kvmclock` (qemu device) if `hv-time` is +present and add Hyper-V support for the `kvmclock_current_nsec` +function. + +I'm asking for advice because I am unsure this is the _right_ approach +and how to keep migration compatibility between qemu versions. + +Thank you all, + +-- +Antoine 'xdbob' Damhet +signature.asc +Description: +PGP signature + +cc'ing in Vitaly who knows about the hv stuff. + +* Antoine Damhet (antoine.damhet@blade-group.com) wrote: +> +Hi, +> +> +We are experiencing timestamp rollbacks during live-migration of +> +Windows 10 guests with the following qemu configuration (linux 5.4.46 +> +and qemu master): +> +``` +> +$ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...] +> +``` +How big a jump are you seeing, and how did you notice it in the guest? + +Dave + +> +I have tracked the bug to the fact that `kvmclock` is not exposed and +> +disabled from qemu PoV but is in fact used by `hv-time` (in KVM). +> +> +I think we should enable the `kvmclock` (qemu device) if `hv-time` is +> +present and add Hyper-V support for the `kvmclock_current_nsec` +> +function. +> +> +I'm asking for advice because I am unsure this is the _right_ approach +> +and how to keep migration compatibility between qemu versions. +> +> +Thank you all, +> +> +-- +> +Antoine 'xdbob' Damhet +-- +Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK + +"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: + +> +cc'ing in Vitaly who knows about the hv stuff. +> +cc'ing Marcelo who knows about clocksources :-) + +> +* Antoine Damhet (antoine.damhet@blade-group.com) wrote: +> +> Hi, +> +> +> +> We are experiencing timestamp rollbacks during live-migration of +> +> Windows 10 guests +Are you migrating to the same hardware (with the same TSC frequency)? Is +TSC used as the clocksource on the host? + +> +> with the following qemu configuration (linux 5.4.46 +> +> and qemu master): +> +> ``` +> +> $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...] +> +> ``` +Out of pure curiosity, what's the purpose of doing 'kvm=off'? Windows is +not going to check for KVM identification anyway so we pretend we're +Hyper-V. + +Also, have you tried adding more Hyper-V enlightenments? + +> +> +How big a jump are you seeing, and how did you notice it in the guest? 
+> +> +Dave +> +> +> I have tracked the bug to the fact that `kvmclock` is not exposed and +> +> disabled from qemu PoV but is in fact used by `hv-time` (in KVM). +> +> +> +> I think we should enable the `kvmclock` (qemu device) if `hv-time` is +> +> present and add Hyper-V support for the `kvmclock_current_nsec` +> +> function. +AFAICT kvmclock_current_nsec() checks whether kvmclock was enabled by +the guest: + + if (!(env->system_time_msr & 1ULL)) { + /* KVM clock not active */ + return 0; + } + +and this is (and way) always false for Windows guests. + +> +> +> +> I'm asking for advice because I am unsure this is the _right_ approach +> +> and how to keep migration compatibility between qemu versions. +> +> +> +> Thank you all, +> +> +> +> -- +> +> Antoine 'xdbob' Damhet +-- +Vitaly + +On Wed, Sep 16, 2020 at 01:59:43PM +0200, Vitaly Kuznetsov wrote: +> +"Dr. David Alan Gilbert" <dgilbert@redhat.com> writes: +> +> +> cc'ing in Vitaly who knows about the hv stuff. +> +> +> +> +cc'ing Marcelo who knows about clocksources :-) +> +> +> * Antoine Damhet (antoine.damhet@blade-group.com) wrote: +> +>> Hi, +> +>> +> +>> We are experiencing timestamp rollbacks during live-migration of +> +>> Windows 10 guests +> +> +Are you migrating to the same hardware (with the same TSC frequency)? Is +> +TSC used as the clocksource on the host? +Yes we are migrating to the exact same hardware. And yes TSC is used as +a clocksource in the host (but the bug is still happening with `hpet` as +a clocksource). + +> +> +>> with the following qemu configuration (linux 5.4.46 +> +>> and qemu master): +> +>> ``` +> +>> $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...] +> +>> ``` +> +> +Out of pure curiosity, what's the purpose of doing 'kvm=off'? Windows is +> +not going to check for KVM identification anyway so we pretend we're +> +Hyper-V. +Some softwares explicitly checks for the presence of KVM and then crash +if they find it in CPUID :/ + +> +> +Also, have you tried adding more Hyper-V enlightenments? +Yes, I published a stripped-down command-line for a minimal reproducer +but even `hv-frequencies` and `hv-reenlightenment` don't help. + +> +> +> +> +> How big a jump are you seeing, and how did you notice it in the guest? +> +> +> +> Dave +> +> +> +>> I have tracked the bug to the fact that `kvmclock` is not exposed and +> +>> disabled from qemu PoV but is in fact used by `hv-time` (in KVM). +> +>> +> +>> I think we should enable the `kvmclock` (qemu device) if `hv-time` is +> +>> present and add Hyper-V support for the `kvmclock_current_nsec` +> +>> function. +> +> +AFAICT kvmclock_current_nsec() checks whether kvmclock was enabled by +> +the guest: +> +> +if (!(env->system_time_msr & 1ULL)) { +> +/* KVM clock not active */ +> +return 0; +> +} +> +> +and this is (and way) always false for Windows guests. +Hooo, I missed this piece. When is `clock_is_reliable` expected to be +false ? Because if it is I still think we should be able to query at +least `HV_X64_MSR_REFERENCE_TSC` + +> +> +>> +> +>> I'm asking for advice because I am unsure this is the _right_ approach +> +>> and how to keep migration compatibility between qemu versions. +> +>> +> +>> Thank you all, +> +>> +> +>> -- +> +>> Antoine 'xdbob' Damhet +> +> +-- +> +Vitaly +> +-- +Antoine 'xdbob' Damhet +signature.asc +Description: +PGP signature + +On Wed, Sep 16, 2020 at 12:29:56PM +0100, Dr. David Alan Gilbert wrote: +> +cc'ing in Vitaly who knows about the hv stuff. 
+Thanks + +> +> +* Antoine Damhet (antoine.damhet@blade-group.com) wrote: +> +> Hi, +> +> +> +> We are experiencing timestamp rollbacks during live-migration of +> +> Windows 10 guests with the following qemu configuration (linux 5.4.46 +> +> and qemu master): +> +> ``` +> +> $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...] +> +> ``` +> +> +How big a jump are you seeing, and how did you notice it in the guest? +I'm seeing jumps of about the guest uptime (indicating a reset of the +counter). It's expected because we won't call `KVM_SET_CLOCK` to +restore any value. + +We first noticed it because after some migrations `dwm.exe` crashes with +the "(NTSTATUS) 0x8898009b - QueryPerformanceCounter returned a time in +the past." error code. + +I can also confirm the following hack makes the behavior disappear: + +``` +diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c +index 64283358f9..f334bdf35f 100644 +--- a/hw/i386/kvm/clock.c ++++ b/hw/i386/kvm/clock.c +@@ -332,11 +332,7 @@ void kvmclock_create(void) + { + X86CPU *cpu = X86_CPU(first_cpu); + +- if (kvm_enabled() && +- cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) | +- (1ULL << KVM_FEATURE_CLOCKSOURCE2))) { +- sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL); +- } ++ sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL); + } + + static void kvmclock_register_types(void) +diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c +index 32b1453e6a..11d980ba85 100644 +--- a/hw/i386/pc_piix.c ++++ b/hw/i386/pc_piix.c +@@ -158,9 +158,7 @@ static void pc_init1(MachineState *machine, + + x86_cpus_init(x86ms, pcmc->default_cpu_version); + +- if (kvm_enabled() && pcmc->kvmclock_enabled) { +- kvmclock_create(); +- } ++ kvmclock_create(); + + if (pcmc->pci_enabled) { + pci_memory = g_new(MemoryRegion, 1); +``` + +> +> +Dave +> +> +> I have tracked the bug to the fact that `kvmclock` is not exposed and +> +> disabled from qemu PoV but is in fact used by `hv-time` (in KVM). +> +> +> +> I think we should enable the `kvmclock` (qemu device) if `hv-time` is +> +> present and add Hyper-V support for the `kvmclock_current_nsec` +> +> function. +> +> +> +> I'm asking for advice because I am unsure this is the _right_ approach +> +> and how to keep migration compatibility between qemu versions. +> +> +> +> Thank you all, +> +> +> +> -- +> +> Antoine 'xdbob' Damhet +> +> +> +-- +> +Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK +> +-- +Antoine 'xdbob' Damhet +signature.asc +Description: +PGP signature + +Antoine Damhet <antoine.damhet@blade-group.com> writes: + +> +On Wed, Sep 16, 2020 at 12:29:56PM +0100, Dr. David Alan Gilbert wrote: +> +> cc'ing in Vitaly who knows about the hv stuff. +> +> +Thanks +> +> +> +> +> * Antoine Damhet (antoine.damhet@blade-group.com) wrote: +> +> > Hi, +> +> > +> +> > We are experiencing timestamp rollbacks during live-migration of +> +> > Windows 10 guests with the following qemu configuration (linux 5.4.46 +> +> > and qemu master): +> +> > ``` +> +> > $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...] +> +> > ``` +> +> +> +> How big a jump are you seeing, and how did you notice it in the guest? +> +> +I'm seeing jumps of about the guest uptime (indicating a reset of the +> +counter). It's expected because we won't call `KVM_SET_CLOCK` to +> +restore any value. +> +> +We first noticed it because after some migrations `dwm.exe` crashes with +> +the "(NTSTATUS) 0x8898009b - QueryPerformanceCounter returned a time in +> +the past." error code. 
+> +> +I can also confirm the following hack makes the behavior disappear: +> +> +``` +> +diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c +> +index 64283358f9..f334bdf35f 100644 +> +--- a/hw/i386/kvm/clock.c +> ++++ b/hw/i386/kvm/clock.c +> +@@ -332,11 +332,7 @@ void kvmclock_create(void) +> +{ +> +X86CPU *cpu = X86_CPU(first_cpu); +> +> +- if (kvm_enabled() && +> +- cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) | +> +- (1ULL << KVM_FEATURE_CLOCKSOURCE2))) { +> +- sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL); +> +- } +> ++ sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL); +> +} +> +Oh, I think I see what's going on. When you add 'kvm=off' +cpu->env.features[FEAT_KVM] is reset (see x86_cpu_expand_features()) so +kvmclock QEMU device is not created and nobody calls KVM_SET_CLOCK on +migration. + +In case we really want to support 'kvm=off' I think we can add Hyper-V +features check here along with KVM, this should do the job. + +-- +Vitaly + +Vitaly Kuznetsov <vkuznets@redhat.com> writes: + +> +Antoine Damhet <antoine.damhet@blade-group.com> writes: +> +> +> On Wed, Sep 16, 2020 at 12:29:56PM +0100, Dr. David Alan Gilbert wrote: +> +>> cc'ing in Vitaly who knows about the hv stuff. +> +> +> +> Thanks +> +> +> +>> +> +>> * Antoine Damhet (antoine.damhet@blade-group.com) wrote: +> +>> > Hi, +> +>> > +> +>> > We are experiencing timestamp rollbacks during live-migration of +> +>> > Windows 10 guests with the following qemu configuration (linux 5.4.46 +> +>> > and qemu master): +> +>> > ``` +> +>> > $ qemu-system-x86_64 -enable-kvm -cpu host,kvm=off,hv_time [...] +> +>> > ``` +> +>> +> +>> How big a jump are you seeing, and how did you notice it in the guest? +> +> +> +> I'm seeing jumps of about the guest uptime (indicating a reset of the +> +> counter). It's expected because we won't call `KVM_SET_CLOCK` to +> +> restore any value. +> +> +> +> We first noticed it because after some migrations `dwm.exe` crashes with +> +> the "(NTSTATUS) 0x8898009b - QueryPerformanceCounter returned a time in +> +> the past." error code. +> +> +> +> I can also confirm the following hack makes the behavior disappear: +> +> +> +> ``` +> +> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c +> +> index 64283358f9..f334bdf35f 100644 +> +> --- a/hw/i386/kvm/clock.c +> +> +++ b/hw/i386/kvm/clock.c +> +> @@ -332,11 +332,7 @@ void kvmclock_create(void) +> +> { +> +> X86CPU *cpu = X86_CPU(first_cpu); +> +> +> +> - if (kvm_enabled() && +> +> - cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) | +> +> - (1ULL << KVM_FEATURE_CLOCKSOURCE2))) +> +> { +> +> - sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL); +> +> - } +> +> + sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL); +> +> } +> +> +> +> +> +Oh, I think I see what's going on. When you add 'kvm=off' +> +cpu->env.features[FEAT_KVM] is reset (see x86_cpu_expand_features()) so +> +kvmclock QEMU device is not created and nobody calls KVM_SET_CLOCK on +> +migration. +> +> +In case we really want to support 'kvm=off' I think we can add Hyper-V +> +features check here along with KVM, this should do the job. 
+Does the untested + +diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c +index 64283358f91d..e03b2ca6d8f6 100644 +--- a/hw/i386/kvm/clock.c ++++ b/hw/i386/kvm/clock.c +@@ -333,8 +333,9 @@ void kvmclock_create(void) + X86CPU *cpu = X86_CPU(first_cpu); + + if (kvm_enabled() && +- cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) | +- (1ULL << KVM_FEATURE_CLOCKSOURCE2))) { ++ ((cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) | ++ (1ULL << KVM_FEATURE_CLOCKSOURCE2))) +|| ++ (cpu->env.features[FEAT_HYPERV_EAX] & HV_TIME_REF_COUNT_AVAILABLE))) { + sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL); + } + } + +help? + +(I don't think we need to remove all 'if (kvm_enabled())' checks from +machine types as 'kvm=off' should not be related). + +-- +Vitaly + +On Wed, Sep 16, 2020 at 02:50:56PM +0200, Vitaly Kuznetsov wrote: +[...] + +> +>> +> +> +> +> +> +> Oh, I think I see what's going on. When you add 'kvm=off' +> +> cpu->env.features[FEAT_KVM] is reset (see x86_cpu_expand_features()) so +> +> kvmclock QEMU device is not created and nobody calls KVM_SET_CLOCK on +> +> migration. +> +> +> +> In case we really want to support 'kvm=off' I think we can add Hyper-V +> +> features check here along with KVM, this should do the job. +> +> +Does the untested +> +> +diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c +> +index 64283358f91d..e03b2ca6d8f6 100644 +> +--- a/hw/i386/kvm/clock.c +> ++++ b/hw/i386/kvm/clock.c +> +@@ -333,8 +333,9 @@ void kvmclock_create(void) +> +X86CPU *cpu = X86_CPU(first_cpu); +> +> +if (kvm_enabled() && +> +- cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) | +> +- (1ULL << KVM_FEATURE_CLOCKSOURCE2))) { +> ++ ((cpu->env.features[FEAT_KVM] & ((1ULL << KVM_FEATURE_CLOCKSOURCE) | +> ++ (1ULL << +> +KVM_FEATURE_CLOCKSOURCE2))) || +> ++ (cpu->env.features[FEAT_HYPERV_EAX] & +> +HV_TIME_REF_COUNT_AVAILABLE))) { +> +sysbus_create_simple(TYPE_KVM_CLOCK, -1, NULL); +> +} +> +} +> +> +help? +It appears to work :) + +> +> +(I don't think we need to remove all 'if (kvm_enabled())' checks from +> +machine types as 'kvm=off' should not be related). +Indeed (I didn't look at the macro, it was just quick & dirty). + +> +> +-- +> +Vitaly +> +> +-- +Antoine 'xdbob' Damhet +signature.asc +Description: +PGP signature + +On 16/09/20 13:29, Dr. David Alan Gilbert wrote: +> +> I have tracked the bug to the fact that `kvmclock` is not exposed and +> +> disabled from qemu PoV but is in fact used by `hv-time` (in KVM). +> +> +> +> I think we should enable the `kvmclock` (qemu device) if `hv-time` is +> +> present and add Hyper-V support for the `kvmclock_current_nsec` +> +> function. +Yes, this seems correct. I would have to check but it may even be +better to _always_ send kvmclock data in the live migration stream. + +Paolo + +Paolo Bonzini <pbonzini@redhat.com> writes: + +> +On 16/09/20 13:29, Dr. David Alan Gilbert wrote: +> +>> I have tracked the bug to the fact that `kvmclock` is not exposed and +> +>> disabled from qemu PoV but is in fact used by `hv-time` (in KVM). +> +>> +> +>> I think we should enable the `kvmclock` (qemu device) if `hv-time` is +> +>> present and add Hyper-V support for the `kvmclock_current_nsec` +> +>> function. +> +> +Yes, this seems correct. I would have to check but it may even be +> +better to _always_ send kvmclock data in the live migration stream. +> +The question I have is: with 'kvm=off', do we actually restore TSC +reading on migration? 
(and I guess the answer is 'no' or Hyper-V TSC +page would 'just work' I guess). So yea, maybe dropping the +'cpu->env.features[FEAT_KVM]' check is the right fix. + +-- +Vitaly + diff --git a/classification_output/01/other/1412913 b/classification_output/01/other/1412913 new file mode 100644 index 000000000..b3cb5b265 --- /dev/null +++ b/classification_output/01/other/1412913 @@ -0,0 +1,2900 @@ +other: 0.987 +semantic: 0.976 +instruction: 0.974 +mistranslation: 0.942 + +[BUG qemu 4.0] segfault when unplugging virtio-blk-pci device + +Hi, + +I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I +think it's because io completion hits use-after-free when device is +already gone. Is this a known bug that has been fixed? (I went through +the git log but didn't find anything obvious). + +gdb backtrace is: + +Core was generated by `/usr/local/libexec/qemu-kvm -name +sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +Program terminated with signal 11, Segmentation fault. +#0 object_get_class (obj=obj@entry=0x0) at +/usr/src/debug/qemu-4.0/qom/object.c:903 +903 return obj->class; +(gdb) bt +#0 object_get_class (obj=obj@entry=0x0) at +/usr/src/debug/qemu-4.0/qom/object.c:903 +#1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +  vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +#2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +  opaque=0x558a2f2fd420, ret=0) +  at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +#3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +  at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +#4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +  i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +#5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +#6  0x00007fff9ed75780 in ?? () +#7  0x0000000000000000 in ?? () + +It seems like qemu was completing a discard/write_zero request, but +parent BusState was already freed & set to NULL. + +Do we need to drain all pending request before unrealizing virtio-blk +device? Like the following patch proposed? +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +If more info is needed, please let me know. + +Thanks, +Eryu + +On Tue, 31 Dec 2019 18:34:34 +0800 +Eryu Guan <address@hidden> wrote: + +> +Hi, +> +> +I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I +> +think it's because io completion hits use-after-free when device is +> +already gone. Is this a known bug that has been fixed? (I went through +> +the git log but didn't find anything obvious). +> +> +gdb backtrace is: +> +> +Core was generated by `/usr/local/libexec/qemu-kvm -name +> +sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +Program terminated with signal 11, Segmentation fault. 
+> +#0 object_get_class (obj=obj@entry=0x0) at +> +/usr/src/debug/qemu-4.0/qom/object.c:903 +> +903 return obj->class; +> +(gdb) bt +> +#0 object_get_class (obj=obj@entry=0x0) at +> +/usr/src/debug/qemu-4.0/qom/object.c:903 +> +#1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +> +  vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +#2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +> +  opaque=0x558a2f2fd420, ret=0) +> +  at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +#3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +  at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +#4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +> +  i1=<optimized out>) at +> +/usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +#5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +#6  0x00007fff9ed75780 in ?? () +> +#7  0x0000000000000000 in ?? () +> +> +It seems like qemu was completing a discard/write_zero request, but +> +parent BusState was already freed & set to NULL. +> +> +Do we need to drain all pending request before unrealizing virtio-blk +> +device? Like the following patch proposed? +> +> +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> +If more info is needed, please let me know. +may be this will help: +https://patchwork.kernel.org/patch/11213047/ +> +> +Thanks, +> +Eryu +> + +On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +On Tue, 31 Dec 2019 18:34:34 +0800 +> +Eryu Guan <address@hidden> wrote: +> +> +> Hi, +> +> +> +> I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I +> +> think it's because io completion hits use-after-free when device is +> +> already gone. Is this a known bug that has been fixed? (I went through +> +> the git log but didn't find anything obvious). +> +> +> +> gdb backtrace is: +> +> +> +> Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> Program terminated with signal 11, Segmentation fault. +> +> #0 object_get_class (obj=obj@entry=0x0) at +> +> /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> 903 return obj->class; +> +> (gdb) bt +> +> #0 object_get_class (obj=obj@entry=0x0) at +> +> /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +> +>   vector=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +> +>   opaque=0x558a2f2fd420, ret=0) +> +>   at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +>   at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +> +>   i1=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> #6  0x00007fff9ed75780 in ?? () +> +> #7  0x0000000000000000 in ?? () +> +> +> +> It seems like qemu was completing a discard/write_zero request, but +> +> parent BusState was already freed & set to NULL. +> +> +> +> Do we need to drain all pending request before unrealizing virtio-blk +> +> device? Like the following patch proposed? +> +> +> +> +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> +> +> If more info is needed, please let me know. 
+> +> +may be this will help: +https://patchwork.kernel.org/patch/11213047/ +Yeah, this looks promising! I'll try it out (though it's a one-time +crash for me). Thanks! + +Eryu + +On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> On Tue, 31 Dec 2019 18:34:34 +0800 +> +> Eryu Guan <address@hidden> wrote: +> +> +> +> > Hi, +> +> > +> +> > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I +> +> > think it's because io completion hits use-after-free when device is +> +> > already gone. Is this a known bug that has been fixed? (I went through +> +> > the git log but didn't find anything obvious). +> +> > +> +> > gdb backtrace is: +> +> > +> +> > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > Program terminated with signal 11, Segmentation fault. +> +> > #0 object_get_class (obj=obj@entry=0x0) at +> +> > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > 903 return obj->class; +> +> > (gdb) bt +> +> > #0 object_get_class (obj=obj@entry=0x0) at +> +> > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +> +> >   vector=<optimized out>) at +> +> > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +> +> >   opaque=0x558a2f2fd420, ret=0) +> +> >   at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +> >   at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +> +> >   i1=<optimized out>) at +> +> > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > #6  0x00007fff9ed75780 in ?? () +> +> > #7  0x0000000000000000 in ?? () +> +> > +> +> > It seems like qemu was completing a discard/write_zero request, but +> +> > parent BusState was already freed & set to NULL. +> +> > +> +> > Do we need to drain all pending request before unrealizing virtio-blk +> +> > device? Like the following patch proposed? +> +> > +> +> > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > +> +> > If more info is needed, please let me know. +> +> +> +> may be this will help: +https://patchwork.kernel.org/patch/11213047/ +> +> +Yeah, this looks promising! I'll try it out (though it's a one-time +> +crash for me). Thanks! +After applying this patch, I don't see the original segfaut and +backtrace, but I see this crash + +[Thread debugging using libthread_db enabled] +Using host libthread_db library "/lib64/libthread_db.so.1". +Core was generated by `/usr/local/libexec/qemu-kvm -name +sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +Program terminated with signal 11, Segmentation fault. 
+#0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +addr=0, val=<optimized out>, size=<optimized out>) at +/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +1324 VirtIOPCIProxy *proxy = +VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +Missing separate debuginfos, use: debuginfo-install +glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 +pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 +(gdb) bt +#0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +addr=0, val=<optimized out>, size=<optimized out>) at +/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +#1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, +addr=<optimized out>, value=<optimized out>, size=<optimized out>, +shift=<optimized out>, mask=<optimized out>, attrs=...) at +/usr/src/debug/qemu-4.0/memory.c:502 +#2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, +value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized +out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 +<memory_region_write_accessor>, mr=0x56121846d340, attrs=...) + at /usr/src/debug/qemu-4.0/memory.c:568 +#3 0x0000561216837c66 in memory_region_dispatch_write +(mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 +#4 0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, +addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address +0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, +l=<optimized out>, mr=0x56121846d340) + at /usr/src/debug/qemu-4.0/exec.c:3279 +#5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, +attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at +/usr/src/debug/qemu-4.0/exec.c:3318 +#6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at +/usr/src/debug/qemu-4.0/exec.c:3408 +#7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized +out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address +0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) +at /usr/src/debug/qemu-4.0/exec.c:3419 +#8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at +/usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +#9 0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) +at /usr/src/debug/qemu-4.0/cpus.c:1281 +#10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at +/usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +#11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 +#12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 + +And I searched and found +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the same +backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add +blk_drain() to virtio_blk_device_unrealize()") is to fix this particular +bug. + +But I can still hit the bug even after applying the commit. Do I miss +anything? 
+ +Thanks, +Eryu +> +Eryu + +On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: +> +> +On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +> On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> > On Tue, 31 Dec 2019 18:34:34 +0800 +> +> > Eryu Guan <address@hidden> wrote: +> +> > +> +> > > Hi, +> +> > > +> +> > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I +> +> > > think it's because io completion hits use-after-free when device is +> +> > > already gone. Is this a known bug that has been fixed? (I went through +> +> > > the git log but didn't find anything obvious). +> +> > > +> +> > > gdb backtrace is: +> +> > > +> +> > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > > Program terminated with signal 11, Segmentation fault. +> +> > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > 903 return obj->class; +> +> > > (gdb) bt +> +> > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > #1 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +> +> > > vector=<optimized out>) at +> +> > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > > #2 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +> +> > > opaque=0x558a2f2fd420, ret=0) +> +> > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +> > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +> +> > > i1=<optimized out>) at +> +> > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > > #6 0x00007fff9ed75780 in ?? () +> +> > > #7 0x0000000000000000 in ?? () +> +> > > +> +> > > It seems like qemu was completing a discard/write_zero request, but +> +> > > parent BusState was already freed & set to NULL. +> +> > > +> +> > > Do we need to drain all pending request before unrealizing virtio-blk +> +> > > device? Like the following patch proposed? +> +> > > +> +> > > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > > +> +> > > If more info is needed, please let me know. +> +> > +> +> > may be this will help: +https://patchwork.kernel.org/patch/11213047/ +> +> +> +> Yeah, this looks promising! I'll try it out (though it's a one-time +> +> crash for me). Thanks! +> +> +After applying this patch, I don't see the original segfaut and +> +backtrace, but I see this crash +> +> +[Thread debugging using libthread_db enabled] +> +Using host libthread_db library "/lib64/libthread_db.so.1". +> +Core was generated by `/usr/local/libexec/qemu-kvm -name +> +sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +> +Program terminated with signal 11, Segmentation fault. 
+> +#0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +addr=0, val=<optimized out>, size=<optimized out>) at +> +/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +1324 VirtIOPCIProxy *proxy = +> +VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +> +Missing separate debuginfos, use: debuginfo-install +> +glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +> +libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +> +libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 +> +pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 +> +(gdb) bt +> +#0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +addr=0, val=<optimized out>, size=<optimized out>) at +> +/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +#1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, +> +addr=<optimized out>, value=<optimized out>, size=<optimized out>, +> +shift=<optimized out>, mask=<optimized out>, attrs=...) at +> +/usr/src/debug/qemu-4.0/memory.c:502 +> +#2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, +> +value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, +> +access_size_min=<optimized out>, access_size_max=<optimized out>, +> +access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, +> +attrs=...) +> +at /usr/src/debug/qemu-4.0/memory.c:568 +> +#3 0x0000561216837c66 in memory_region_dispatch_write +> +(mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +> +attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 +> +#4 0x00005612167e036f in flatview_write_continue +> +(fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., +> +buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340) +> +at /usr/src/debug/qemu-4.0/exec.c:3279 +> +#5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, +> +addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out +> +of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318 +> +#6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +> +addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at +> +/usr/src/debug/qemu-4.0/exec.c:3408 +> +#7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, +> +addr=<optimized out>, attrs=..., attrs@entry=..., +> +buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +len=<optimized out>, is_write=<optimized out>) at +> +/usr/src/debug/qemu-4.0/exec.c:3419 +> +#8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at +> +/usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +> +#9 0x000056121682255e in qemu_kvm_cpu_thread_fn +> +(arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 +> +#10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at +> +/usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +> +#11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 +> +#12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 +> +> +And I searched and found +> +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the same +> +backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add +> +blk_drain() to virtio_blk_device_unrealize()") is to fix this particular +> +bug. +> +> +But I can still hit the bug even after applying the commit. Do I miss +> +anything? 
+Hi Eryu, +This backtrace seems to be caused by this bug (there were two bugs in +1706759): +https://bugzilla.redhat.com/show_bug.cgi?id=1708480 +Although the solution hasn't been tested on virtio-blk yet, you may +want to apply this patch: +https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html +Let me know if this works. + +Best regards, Julia Suvorova. + +On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: +> +On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: +> +> +> +> On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +> > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> > > On Tue, 31 Dec 2019 18:34:34 +0800 +> +> > > Eryu Guan <address@hidden> wrote: +> +> > > +> +> > > > Hi, +> +> > > > +> +> > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I +> +> > > > think it's because io completion hits use-after-free when device is +> +> > > > already gone. Is this a known bug that has been fixed? (I went through +> +> > > > the git log but didn't find anything obvious). +> +> > > > +> +> > > > gdb backtrace is: +> +> > > > +> +> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > > > Program terminated with signal 11, Segmentation fault. +> +> > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > 903 return obj->class; +> +> > > > (gdb) bt +> +> > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > #1 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +> +> > > > vector=<optimized out>) at +> +> > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > > > #2 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +> +> > > > opaque=0x558a2f2fd420, ret=0) +> +> > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +> > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +> +> > > > i1=<optimized out>) at +> +> > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > > > #6 0x00007fff9ed75780 in ?? () +> +> > > > #7 0x0000000000000000 in ?? () +> +> > > > +> +> > > > It seems like qemu was completing a discard/write_zero request, but +> +> > > > parent BusState was already freed & set to NULL. +> +> > > > +> +> > > > Do we need to drain all pending request before unrealizing virtio-blk +> +> > > > device? Like the following patch proposed? +> +> > > > +> +> > > > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > > > +> +> > > > If more info is needed, please let me know. +> +> > > +> +> > > may be this will help: +https://patchwork.kernel.org/patch/11213047/ +> +> > +> +> > Yeah, this looks promising! I'll try it out (though it's a one-time +> +> > crash for me). Thanks! +> +> +> +> After applying this patch, I don't see the original segfaut and +> +> backtrace, but I see this crash +> +> +> +> [Thread debugging using libthread_db enabled] +> +> Using host libthread_db library "/lib64/libthread_db.so.1". +> +> Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +> +> Program terminated with signal 11, Segmentation fault. 
+> +> #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +> addr=0, val=<optimized out>, size=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> 1324 VirtIOPCIProxy *proxy = +> +> VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +> +> Missing separate debuginfos, use: debuginfo-install +> +> glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +> +> libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +> +> libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 +> +> pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 +> +> (gdb) bt +> +> #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +> addr=0, val=<optimized out>, size=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> #1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, +> +> addr=<optimized out>, value=<optimized out>, size=<optimized out>, +> +> shift=<optimized out>, mask=<optimized out>, attrs=...) at +> +> /usr/src/debug/qemu-4.0/memory.c:502 +> +> #2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, +> +> value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, +> +> access_size_min=<optimized out>, access_size_max=<optimized out>, +> +> access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, +> +> attrs=...) +> +> at /usr/src/debug/qemu-4.0/memory.c:568 +> +> #3 0x0000561216837c66 in memory_region_dispatch_write +> +> (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +> +> attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 +> +> #4 0x00005612167e036f in flatview_write_continue +> +> (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., +> +> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> len=len@entry=2, addr1=<optimized out>, l=<optimized out>, +> +> mr=0x56121846d340) +> +> at /usr/src/debug/qemu-4.0/exec.c:3279 +> +> #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, +> +> addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 +> +> out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318 +> +> #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +> +> addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) +> +> at /usr/src/debug/qemu-4.0/exec.c:3408 +> +> #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, +> +> addr=<optimized out>, attrs=..., attrs@entry=..., +> +> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> len=<optimized out>, is_write=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/exec.c:3419 +> +> #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at +> +> /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +> +> #9 0x000056121682255e in qemu_kvm_cpu_thread_fn +> +> (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 +> +> #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +> +> #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 +> +> #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 +> +> +> +> And I searched and found +> +> +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the same +> +> backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add +> +> blk_drain() to virtio_blk_device_unrealize()") is to fix this particular +> +> bug. 
+> +> +> +> But I can still hit the bug even after applying the commit. Do I miss +> +> anything? +> +> +Hi Eryu, +> +This backtrace seems to be caused by this bug (there were two bugs in +> +1706759): +https://bugzilla.redhat.com/show_bug.cgi?id=1708480 +> +Although the solution hasn't been tested on virtio-blk yet, you may +> +want to apply this patch: +> +https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html +> +Let me know if this works. +Will try it out, thanks a lot! + +Eryu + +On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: +> +On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: +> +> +> +> On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +> > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> > > On Tue, 31 Dec 2019 18:34:34 +0800 +> +> > > Eryu Guan <address@hidden> wrote: +> +> > > +> +> > > > Hi, +> +> > > > +> +> > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I +> +> > > > think it's because io completion hits use-after-free when device is +> +> > > > already gone. Is this a known bug that has been fixed? (I went through +> +> > > > the git log but didn't find anything obvious). +> +> > > > +> +> > > > gdb backtrace is: +> +> > > > +> +> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > > > Program terminated with signal 11, Segmentation fault. +> +> > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > 903 return obj->class; +> +> > > > (gdb) bt +> +> > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > #1 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +> +> > > > vector=<optimized out>) at +> +> > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > > > #2 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +> +> > > > opaque=0x558a2f2fd420, ret=0) +> +> > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +> > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +> +> > > > i1=<optimized out>) at +> +> > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > > > #6 0x00007fff9ed75780 in ?? () +> +> > > > #7 0x0000000000000000 in ?? () +> +> > > > +> +> > > > It seems like qemu was completing a discard/write_zero request, but +> +> > > > parent BusState was already freed & set to NULL. +> +> > > > +> +> > > > Do we need to drain all pending request before unrealizing virtio-blk +> +> > > > device? Like the following patch proposed? +> +> > > > +> +> > > > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > > > +> +> > > > If more info is needed, please let me know. +> +> > > +> +> > > may be this will help: +https://patchwork.kernel.org/patch/11213047/ +> +> > +> +> > Yeah, this looks promising! I'll try it out (though it's a one-time +> +> > crash for me). Thanks! +> +> +> +> After applying this patch, I don't see the original segfaut and +> +> backtrace, but I see this crash +> +> +> +> [Thread debugging using libthread_db enabled] +> +> Using host libthread_db library "/lib64/libthread_db.so.1". 
+> +> Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +> +> Program terminated with signal 11, Segmentation fault. +> +> #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +> addr=0, val=<optimized out>, size=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> 1324 VirtIOPCIProxy *proxy = +> +> VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +> +> Missing separate debuginfos, use: debuginfo-install +> +> glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +> +> libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +> +> libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 +> +> pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 +> +> (gdb) bt +> +> #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +> addr=0, val=<optimized out>, size=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> #1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, +> +> addr=<optimized out>, value=<optimized out>, size=<optimized out>, +> +> shift=<optimized out>, mask=<optimized out>, attrs=...) at +> +> /usr/src/debug/qemu-4.0/memory.c:502 +> +> #2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, +> +> value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, +> +> access_size_min=<optimized out>, access_size_max=<optimized out>, +> +> access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, +> +> attrs=...) +> +> at /usr/src/debug/qemu-4.0/memory.c:568 +> +> #3 0x0000561216837c66 in memory_region_dispatch_write +> +> (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +> +> attrs=attrs@entry=...) 
at /usr/src/debug/qemu-4.0/memory.c:1503 +> +> #4 0x00005612167e036f in flatview_write_continue +> +> (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., +> +> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> len=len@entry=2, addr1=<optimized out>, l=<optimized out>, +> +> mr=0x56121846d340) +> +> at /usr/src/debug/qemu-4.0/exec.c:3279 +> +> #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, +> +> addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 +> +> out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318 +> +> #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +> +> addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) +> +> at /usr/src/debug/qemu-4.0/exec.c:3408 +> +> #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, +> +> addr=<optimized out>, attrs=..., attrs@entry=..., +> +> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> len=<optimized out>, is_write=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/exec.c:3419 +> +> #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at +> +> /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +> +> #9 0x000056121682255e in qemu_kvm_cpu_thread_fn +> +> (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 +> +> #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +> +> #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 +> +> #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 +> +> +> +> And I searched and found +> +> +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the same +> +> backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add +> +> blk_drain() to virtio_blk_device_unrealize()") is to fix this particular +> +> bug. +> +> +> +> But I can still hit the bug even after applying the commit. Do I miss +> +> anything? +> +> +Hi Eryu, +> +This backtrace seems to be caused by this bug (there were two bugs in +> +1706759): +https://bugzilla.redhat.com/show_bug.cgi?id=1708480 +> +Although the solution hasn't been tested on virtio-blk yet, you may +> +want to apply this patch: +> +https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html +> +Let me know if this works. +Unfortunately, I still see the same segfault & backtrace after applying +commit 421afd2fe8dd ("virtio: reset region cache when on queue +deletion") + +Anything I can help to debug? + +Thanks, +Eryu + +On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: +> +On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: +> +> On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: +> +> > +> +> > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +> > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> > > > On Tue, 31 Dec 2019 18:34:34 +0800 +> +> > > > Eryu Guan <address@hidden> wrote: +> +> > > > +> +> > > > > Hi, +> +> > > > > +> +> > > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, +> +> > > > > I +> +> > > > > think it's because io completion hits use-after-free when device is +> +> > > > > already gone. Is this a known bug that has been fixed? (I went +> +> > > > > through +> +> > > > > the git log but didn't find anything obvious). 
+> +> > > > > +> +> > > > > gdb backtrace is: +> +> > > > > +> +> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > > > > Program terminated with signal 11, Segmentation fault. +> +> > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > 903 return obj->class; +> +> > > > > (gdb) bt +> +> > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > #1 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +> +> > > > > vector=<optimized out>) at +> +> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > > > > #2 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +> +> > > > > opaque=0x558a2f2fd420, ret=0) +> +> > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +> > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +> +> > > > > i1=<optimized out>) at +> +> > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > > > > #6 0x00007fff9ed75780 in ?? () +> +> > > > > #7 0x0000000000000000 in ?? () +> +> > > > > +> +> > > > > It seems like qemu was completing a discard/write_zero request, but +> +> > > > > parent BusState was already freed & set to NULL. +> +> > > > > +> +> > > > > Do we need to drain all pending request before unrealizing +> +> > > > > virtio-blk +> +> > > > > device? Like the following patch proposed? +> +> > > > > +> +> > > > > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > > > > +> +> > > > > If more info is needed, please let me know. +> +> > > > +> +> > > > may be this will help: +https://patchwork.kernel.org/patch/11213047/ +> +> > > +> +> > > Yeah, this looks promising! I'll try it out (though it's a one-time +> +> > > crash for me). Thanks! +> +> > +> +> > After applying this patch, I don't see the original segfaut and +> +> > backtrace, but I see this crash +> +> > +> +> > [Thread debugging using libthread_db enabled] +> +> > Using host libthread_db library "/lib64/libthread_db.so.1". +> +> > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +> +> > Program terminated with signal 11, Segmentation fault. 
+> +> > #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +> > addr=0, val=<optimized out>, size=<optimized out>) at +> +> > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > 1324 VirtIOPCIProxy *proxy = +> +> > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +> +> > Missing separate debuginfos, use: debuginfo-install +> +> > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +> +> > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +> +> > libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 +> +> > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 +> +> > (gdb) bt +> +> > #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +> > addr=0, val=<optimized out>, size=<optimized out>) at +> +> > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > #1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized +> +> > out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, +> +> > shift=<optimized out>, mask=<optimized out>, attrs=...) at +> +> > /usr/src/debug/qemu-4.0/memory.c:502 +> +> > #2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, +> +> > value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, +> +> > access_size_min=<optimized out>, access_size_max=<optimized out>, +> +> > access_fn=0x561216835ac0 <memory_region_write_accessor>, +> +> > mr=0x56121846d340, attrs=...) +> +> > at /usr/src/debug/qemu-4.0/memory.c:568 +> +> > #3 0x0000561216837c66 in memory_region_dispatch_write +> +> > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +> +> > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 +> +> > #4 0x00005612167e036f in flatview_write_continue +> +> > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., +> +> > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> > len=len@entry=2, addr1=<optimized out>, l=<optimized out>, +> +> > mr=0x56121846d340) +> +> > at /usr/src/debug/qemu-4.0/exec.c:3279 +> +> > #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, +> +> > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 +> +> > out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318 +> +> > #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +> +> > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized +> +> > out>) at /usr/src/debug/qemu-4.0/exec.c:3408 +> +> > #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, +> +> > addr=<optimized out>, attrs=..., attrs@entry=..., +> +> > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> > len=<optimized out>, is_write=<optimized out>) at +> +> > /usr/src/debug/qemu-4.0/exec.c:3419 +> +> > #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at +> +> > /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +> +> > #9 0x000056121682255e in qemu_kvm_cpu_thread_fn +> +> > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 +> +> > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at +> +> > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +> +> > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 +> +> > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 +> +> > +> +> > And I searched and found +> +> > +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the same +> +> > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add +> +> 
> blk_drain() to virtio_blk_device_unrealize()") is to fix this particular +> +> > bug. +> +> > +> +> > But I can still hit the bug even after applying the commit. Do I miss +> +> > anything? +> +> +> +> Hi Eryu, +> +> This backtrace seems to be caused by this bug (there were two bugs in +> +> 1706759): +https://bugzilla.redhat.com/show_bug.cgi?id=1708480 +> +> Although the solution hasn't been tested on virtio-blk yet, you may +> +> want to apply this patch: +> +> +https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html +> +> Let me know if this works. +> +> +Unfortunately, I still see the same segfault & backtrace after applying +> +commit 421afd2fe8dd ("virtio: reset region cache when on queue +> +deletion") +> +> +Anything I can help to debug? +Please post the QEMU command-line and the QMP commands use to remove the +device. + +The backtrace shows a vcpu thread submitting a request. The device +seems to be partially destroyed. That's surprising because the monitor +and the vcpu thread should use the QEMU global mutex to avoid race +conditions. Maybe seeing the QMP commands will make it clearer... + +Stefan +signature.asc +Description: +PGP signature + +On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote: +> +On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: +> +> On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: +> +> > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: +> +> > > +> +> > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +> > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> > > > > On Tue, 31 Dec 2019 18:34:34 +0800 +> +> > > > > Eryu Guan <address@hidden> wrote: +> +> > > > > +> +> > > > > > Hi, +> +> > > > > > +> +> > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata +> +> > > > > > sandbox, I +> +> > > > > > think it's because io completion hits use-after-free when device +> +> > > > > > is +> +> > > > > > already gone. Is this a known bug that has been fixed? (I went +> +> > > > > > through +> +> > > > > > the git log but didn't find anything obvious). +> +> > > > > > +> +> > > > > > gdb backtrace is: +> +> > > > > > +> +> > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > > > > > Program terminated with signal 11, Segmentation fault. +> +> > > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > > 903 return obj->class; +> +> > > > > > (gdb) bt +> +> > > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > > #1 0x0000558a2c009e9b in virtio_notify_vector +> +> > > > > > (vdev=0x558a2e7751d0, +> +> > > > > > vector=<optimized out>) at +> +> > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > > > > > #2 0x0000558a2bfdcb1e in +> +> > > > > > virtio_blk_discard_write_zeroes_complete ( +> +> > > > > > opaque=0x558a2f2fd420, ret=0) +> +> > > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > > > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +> > > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized +> +> > > > > > out>, +> +> > > > > > i1=<optimized out>) at +> +> > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > > > > > #5 0x00007f45b2f8b080 in ?? 
() from /lib64/libc.so.6 +> +> > > > > > #6 0x00007fff9ed75780 in ?? () +> +> > > > > > #7 0x0000000000000000 in ?? () +> +> > > > > > +> +> > > > > > It seems like qemu was completing a discard/write_zero request, +> +> > > > > > but +> +> > > > > > parent BusState was already freed & set to NULL. +> +> > > > > > +> +> > > > > > Do we need to drain all pending request before unrealizing +> +> > > > > > virtio-blk +> +> > > > > > device? Like the following patch proposed? +> +> > > > > > +> +> > > > > > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > > > > > +> +> > > > > > If more info is needed, please let me know. +> +> > > > > +> +> > > > > may be this will help: +https://patchwork.kernel.org/patch/11213047/ +> +> > > > +> +> > > > Yeah, this looks promising! I'll try it out (though it's a one-time +> +> > > > crash for me). Thanks! +> +> > > +> +> > > After applying this patch, I don't see the original segfaut and +> +> > > backtrace, but I see this crash +> +> > > +> +> > > [Thread debugging using libthread_db enabled] +> +> > > Using host libthread_db library "/lib64/libthread_db.so.1". +> +> > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +> +> > > Program terminated with signal 11, Segmentation fault. +> +> > > #0 0x0000561216a57609 in virtio_pci_notify_write +> +> > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized +> +> > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > > 1324 VirtIOPCIProxy *proxy = +> +> > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +> +> > > Missing separate debuginfos, use: debuginfo-install +> +> > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +> +> > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +> +> > > libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 +> +> > > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 +> +> > > (gdb) bt +> +> > > #0 0x0000561216a57609 in virtio_pci_notify_write +> +> > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized +> +> > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > > #1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized +> +> > > out>, addr=<optimized out>, value=<optimized out>, size=<optimized +> +> > > out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at +> +> > > /usr/src/debug/qemu-4.0/memory.c:502 +> +> > > #2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, +> +> > > value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, +> +> > > access_size_min=<optimized out>, access_size_max=<optimized out>, +> +> > > access_fn=0x561216835ac0 <memory_region_write_accessor>, +> +> > > mr=0x56121846d340, attrs=...) +> +> > > at /usr/src/debug/qemu-4.0/memory.c:568 +> +> > > #3 0x0000561216837c66 in memory_region_dispatch_write +> +> > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +> +> > > attrs=attrs@entry=...) 
at /usr/src/debug/qemu-4.0/memory.c:1503 +> +> > > #4 0x00005612167e036f in flatview_write_continue +> +> > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., +> +> > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> > > len=len@entry=2, addr1=<optimized out>, l=<optimized out>, +> +> > > mr=0x56121846d340) +> +> > > at /usr/src/debug/qemu-4.0/exec.c:3279 +> +> > > #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, +> +> > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address +> +> > > 0x7fce7dd97028 out of bounds>, len=2) at +> +> > > /usr/src/debug/qemu-4.0/exec.c:3318 +> +> > > #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +> +> > > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized +> +> > > out>) at /usr/src/debug/qemu-4.0/exec.c:3408 +> +> > > #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, +> +> > > addr=<optimized out>, attrs=..., attrs@entry=..., +> +> > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> > > len=<optimized out>, is_write=<optimized out>) at +> +> > > /usr/src/debug/qemu-4.0/exec.c:3419 +> +> > > #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) +> +> > > at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +> +> > > #9 0x000056121682255e in qemu_kvm_cpu_thread_fn +> +> > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 +> +> > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at +> +> > > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +> +> > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 +> +> > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 +> +> > > +> +> > > And I searched and found +> +> > > +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the same +> +> > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add +> +> > > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular +> +> > > bug. +> +> > > +> +> > > But I can still hit the bug even after applying the commit. Do I miss +> +> > > anything? +> +> > +> +> > Hi Eryu, +> +> > This backtrace seems to be caused by this bug (there were two bugs in +> +> > 1706759): +https://bugzilla.redhat.com/show_bug.cgi?id=1708480 +> +> > Although the solution hasn't been tested on virtio-blk yet, you may +> +> > want to apply this patch: +> +> > +https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html +> +> > Let me know if this works. +> +> +> +> Unfortunately, I still see the same segfault & backtrace after applying +> +> commit 421afd2fe8dd ("virtio: reset region cache when on queue +> +> deletion") +> +> +> +> Anything I can help to debug? +> +> +Please post the QEMU command-line and the QMP commands use to remove the +> +device. +It's a normal kata instance using virtio-fs as rootfs. 
+ +/usr/local/libexec/qemu-kvm -name +sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \ + -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine +q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \ + -cpu host -qmp +unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait + \ + -qmp +unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait + \ + -m 2048M,slots=10,maxmem=773893M -device +pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \ + -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device +virtconsole,chardev=charconsole0,id=console0 \ + -chardev +socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait + \ + -device +virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \ + -chardev +socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait + \ + -device nvdimm,id=nv0,memdev=mem0 -object +memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 + \ + -object rng-random,id=rng0,filename=/dev/urandom -device +virtio-rng,rng=rng0,romfile= \ + -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \ + -chardev +socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait + \ + -chardev +socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock + \ + -device +vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M +-netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \ + -device +driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= + \ + -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults +-nographic -daemonize \ + -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on +-numa node,memdev=dimm1 -kernel /usr/local/share/kernel \ + -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 +i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 +console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 +root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro +rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 +agent.use_vsock=false init=/usr/lib/systemd/systemd +systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service +systemd.mask=systemd-networkd.socket \ + -pidfile +/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid +\ + -smp 1,cores=1,threads=1,sockets=96,maxcpus=96 + +QMP command to delete device (the device id is just an example, not the +one caused the crash): + +"{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}" + +which has been hot plugged by: +"{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}" +"{\"return\": {}}" 
+"{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}" +"{\"return\": {}}" + +> +> +The backtrace shows a vcpu thread submitting a request. The device +> +seems to be partially destroyed. That's surprising because the monitor +> +and the vcpu thread should use the QEMU global mutex to avoid race +> +conditions. Maybe seeing the QMP commands will make it clearer... +> +> +Stefan +Thanks! + +Eryu + +On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote: +> +On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote: +> +> On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: +> +> > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: +> +> > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: +> +> > > > +> +> > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +> > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> > > > > > On Tue, 31 Dec 2019 18:34:34 +0800 +> +> > > > > > Eryu Guan <address@hidden> wrote: +> +> > > > > > +> +> > > > > > > Hi, +> +> > > > > > > +> +> > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata +> +> > > > > > > sandbox, I +> +> > > > > > > think it's because io completion hits use-after-free when +> +> > > > > > > device is +> +> > > > > > > already gone. Is this a known bug that has been fixed? (I went +> +> > > > > > > through +> +> > > > > > > the git log but didn't find anything obvious). +> +> > > > > > > +> +> > > > > > > gdb backtrace is: +> +> > > > > > > +> +> > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > > > > > > Program terminated with signal 11, Segmentation fault. +> +> > > > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > > > 903 return obj->class; +> +> > > > > > > (gdb) bt +> +> > > > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > > > #1 0x0000558a2c009e9b in virtio_notify_vector +> +> > > > > > > (vdev=0x558a2e7751d0, +> +> > > > > > > vector=<optimized out>) at +> +> > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > > > > > > #2 0x0000558a2bfdcb1e in +> +> > > > > > > virtio_blk_discard_write_zeroes_complete ( +> +> > > > > > > opaque=0x558a2f2fd420, ret=0) +> +> > > > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > > > > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +> > > > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > > > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized +> +> > > > > > > out>, +> +> > > > > > > i1=<optimized out>) at +> +> > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > > > > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > > > > > > #6 0x00007fff9ed75780 in ?? () +> +> > > > > > > #7 0x0000000000000000 in ?? () +> +> > > > > > > +> +> > > > > > > It seems like qemu was completing a discard/write_zero request, +> +> > > > > > > but +> +> > > > > > > parent BusState was already freed & set to NULL. +> +> > > > > > > +> +> > > > > > > Do we need to drain all pending request before unrealizing +> +> > > > > > > virtio-blk +> +> > > > > > > device? 
Like the following patch proposed? +> +> > > > > > > +> +> > > > > > > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > > > > > > +> +> > > > > > > If more info is needed, please let me know. +> +> > > > > > +> +> > > > > > may be this will help: +> +> > > > > > +https://patchwork.kernel.org/patch/11213047/ +> +> > > > > +> +> > > > > Yeah, this looks promising! I'll try it out (though it's a one-time +> +> > > > > crash for me). Thanks! +> +> > > > +> +> > > > After applying this patch, I don't see the original segfaut and +> +> > > > backtrace, but I see this crash +> +> > > > +> +> > > > [Thread debugging using libthread_db enabled] +> +> > > > Using host libthread_db library "/lib64/libthread_db.so.1". +> +> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +> +> > > > Program terminated with signal 11, Segmentation fault. +> +> > > > #0 0x0000561216a57609 in virtio_pci_notify_write +> +> > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized +> +> > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > > > 1324 VirtIOPCIProxy *proxy = +> +> > > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +> +> > > > Missing separate debuginfos, use: debuginfo-install +> +> > > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +> +> > > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +> +> > > > libstdc++-4.8.5-28.alios7.1.x86_64 +> +> > > > numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64 +> +> > > > zlib-1.2.7-16.2.alios7.x86_64 +> +> > > > (gdb) bt +> +> > > > #0 0x0000561216a57609 in virtio_pci_notify_write +> +> > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized +> +> > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > > > #1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized +> +> > > > out>, addr=<optimized out>, value=<optimized out>, size=<optimized +> +> > > > out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at +> +> > > > /usr/src/debug/qemu-4.0/memory.c:502 +> +> > > > #2 0x0000561216833c5d in access_with_adjusted_size +> +> > > > (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, +> +> > > > size=size@entry=2, access_size_min=<optimized out>, +> +> > > > access_size_max=<optimized out>, access_fn=0x561216835ac0 +> +> > > > <memory_region_write_accessor>, mr=0x56121846d340, attrs=...) +> +> > > > at /usr/src/debug/qemu-4.0/memory.c:568 +> +> > > > #3 0x0000561216837c66 in memory_region_dispatch_write +> +> > > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +> +> > > > attrs=attrs@entry=...) 
at /usr/src/debug/qemu-4.0/memory.c:1503 +> +> > > > #4 0x00005612167e036f in flatview_write_continue +> +> > > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., +> +> > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> > > > len=len@entry=2, addr1=<optimized out>, l=<optimized out>, +> +> > > > mr=0x56121846d340) +> +> > > > at /usr/src/debug/qemu-4.0/exec.c:3279 +> +> > > > #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, +> +> > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address +> +> > > > 0x7fce7dd97028 out of bounds>, len=2) at +> +> > > > /usr/src/debug/qemu-4.0/exec.c:3318 +> +> > > > #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +> +> > > > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized +> +> > > > out>) at /usr/src/debug/qemu-4.0/exec.c:3408 +> +> > > > #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, +> +> > > > addr=<optimized out>, attrs=..., attrs@entry=..., +> +> > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> > > > len=<optimized out>, is_write=<optimized out>) at +> +> > > > /usr/src/debug/qemu-4.0/exec.c:3419 +> +> > > > #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) +> +> > > > at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +> +> > > > #9 0x000056121682255e in qemu_kvm_cpu_thread_fn +> +> > > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 +> +> > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at +> +> > > > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +> +> > > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 +> +> > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 +> +> > > > +> +> > > > And I searched and found +> +> > > > +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the +> +> > > > same +> +> > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add +> +> > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this +> +> > > > particular +> +> > > > bug. +> +> > > > +> +> > > > But I can still hit the bug even after applying the commit. Do I miss +> +> > > > anything? +> +> > > +> +> > > Hi Eryu, +> +> > > This backtrace seems to be caused by this bug (there were two bugs in +> +> > > 1706759): +https://bugzilla.redhat.com/show_bug.cgi?id=1708480 +> +> > > Although the solution hasn't been tested on virtio-blk yet, you may +> +> > > want to apply this patch: +> +> > > +> +> > > +https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html +> +> > > Let me know if this works. +> +> > +> +> > Unfortunately, I still see the same segfault & backtrace after applying +> +> > commit 421afd2fe8dd ("virtio: reset region cache when on queue +> +> > deletion") +> +> > +> +> > Anything I can help to debug? +> +> +> +> Please post the QEMU command-line and the QMP commands use to remove the +> +> device. +> +> +It's a normal kata instance using virtio-fs as rootfs. 
+> +> +/usr/local/libexec/qemu-kvm -name +> +sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \ +> +-uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine +> +q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \ +> +-cpu host -qmp +> +unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait +> +\ +> +-qmp +> +unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait +> +\ +> +-m 2048M,slots=10,maxmem=773893M -device +> +pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \ +> +-device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device +> +virtconsole,chardev=charconsole0,id=console0 \ +> +-chardev +> +socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait +> +\ +> +-device +> +virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \ +> +-chardev +> +socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait +> +\ +> +-device nvdimm,id=nv0,memdev=mem0 -object +> +memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 +> +\ +> +-object rng-random,id=rng0,filename=/dev/urandom -device +> +virtio-rng,rng=rng0,romfile= \ +> +-device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \ +> +-chardev +> +socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait +> +\ +> +-chardev +> +socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock +> +\ +> +-device +> +vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M +> +-netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \ +> +-device +> +driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= +> +\ +> +-global kvm-pit.lost_tick_policy=discard -vga none -no-user-config +> +-nodefaults -nographic -daemonize \ +> +-object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on +> +-numa node,memdev=dimm1 -kernel /usr/local/share/kernel \ +> +-append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 +> +i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k +> +console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 +> +pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro +> +ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 +> +agent.use_vsock=false init=/usr/lib/systemd/systemd +> +systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service +> +systemd.mask=systemd-networkd.socket \ +> +-pidfile +> +/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid +> +\ +> +-smp 1,cores=1,threads=1,sockets=96,maxcpus=96 +> +> +QMP command to delete device (the device id is just an example, not the +> +one caused the crash): +> +> +"{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}" +> +> +which has been hot plugged by: +> +"{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}" +> +"{\"return\": {}}" +> 
+"{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}" +> +"{\"return\": {}}" +Thanks. I wasn't able to reproduce this crash with qemu.git/master. + +One thing that is strange about the latest backtrace you posted: QEMU is +dispatching the memory access instead of using the ioeventfd code that +that virtio-blk-pci normally takes when a virtqueue is notified. I +guess this means ioeventfd has already been disabled due to the hot +unplug. + +Could you try with machine type "i440fx" instead of "q35"? I wonder if +pci-bridge/shpc is part of the problem. + +Stefan +signature.asc +Description: +PGP signature + +On Tue, Jan 14, 2020 at 04:16:24PM +0000, Stefan Hajnoczi wrote: +> +On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote: +> +> On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote: +> +> > On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: +> +> > > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: +> +> > > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: +> +> > > > > +> +> > > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +> > > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> > > > > > > On Tue, 31 Dec 2019 18:34:34 +0800 +> +> > > > > > > Eryu Guan <address@hidden> wrote: +> +> > > > > > > +> +> > > > > > > > Hi, +> +> > > > > > > > +> +> > > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata +> +> > > > > > > > sandbox, I +> +> > > > > > > > think it's because io completion hits use-after-free when +> +> > > > > > > > device is +> +> > > > > > > > already gone. Is this a known bug that has been fixed? (I +> +> > > > > > > > went through +> +> > > > > > > > the git log but didn't find anything obvious). +> +> > > > > > > > +> +> > > > > > > > gdb backtrace is: +> +> > > > > > > > +> +> > > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > > > > > > > Program terminated with signal 11, Segmentation fault. +> +> > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > > > > 903 return obj->class; +> +> > > > > > > > (gdb) bt +> +> > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > > > > #1 0x0000558a2c009e9b in virtio_notify_vector +> +> > > > > > > > (vdev=0x558a2e7751d0, +> +> > > > > > > > vector=<optimized out>) at +> +> > > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > > > > > > > #2 0x0000558a2bfdcb1e in +> +> > > > > > > > virtio_blk_discard_write_zeroes_complete ( +> +> > > > > > > > opaque=0x558a2f2fd420, ret=0) +> +> > > > > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > > > > > > > #3 0x0000558a2c261c7e in blk_aio_complete +> +> > > > > > > > (acb=0x558a2eed7420) +> +> > > > > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > > > > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized +> +> > > > > > > > out>, +> +> > > > > > > > i1=<optimized out>) at +> +> > > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > > > > > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > > > > > > > #6 0x00007fff9ed75780 in ?? 
() +> +> > > > > > > > #7 0x0000000000000000 in ?? () +> +> > > > > > > > +> +> > > > > > > > It seems like qemu was completing a discard/write_zero +> +> > > > > > > > request, but +> +> > > > > > > > parent BusState was already freed & set to NULL. +> +> > > > > > > > +> +> > > > > > > > Do we need to drain all pending request before unrealizing +> +> > > > > > > > virtio-blk +> +> > > > > > > > device? Like the following patch proposed? +> +> > > > > > > > +> +> > > > > > > > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > > > > > > > +> +> > > > > > > > If more info is needed, please let me know. +> +> > > > > > > +> +> > > > > > > may be this will help: +> +> > > > > > > +https://patchwork.kernel.org/patch/11213047/ +> +> > > > > > +> +> > > > > > Yeah, this looks promising! I'll try it out (though it's a +> +> > > > > > one-time +> +> > > > > > crash for me). Thanks! +> +> > > > > +> +> > > > > After applying this patch, I don't see the original segfaut and +> +> > > > > backtrace, but I see this crash +> +> > > > > +> +> > > > > [Thread debugging using libthread_db enabled] +> +> > > > > Using host libthread_db library "/lib64/libthread_db.so.1". +> +> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +> +> > > > > Program terminated with signal 11, Segmentation fault. +> +> > > > > #0 0x0000561216a57609 in virtio_pci_notify_write +> +> > > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, +> +> > > > > size=<optimized out>) at +> +> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > > > > 1324 VirtIOPCIProxy *proxy = +> +> > > > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +> +> > > > > Missing separate debuginfos, use: debuginfo-install +> +> > > > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +> +> > > > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +> +> > > > > libstdc++-4.8.5-28.alios7.1.x86_64 +> +> > > > > numactl-libs-2.0.9-5.1.alios7.x86_64 +> +> > > > > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 +> +> > > > > (gdb) bt +> +> > > > > #0 0x0000561216a57609 in virtio_pci_notify_write +> +> > > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, +> +> > > > > size=<optimized out>) at +> +> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > > > > #1 0x0000561216835b22 in memory_region_write_accessor +> +> > > > > (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, +> +> > > > > size=<optimized out>, shift=<optimized out>, mask=<optimized out>, +> +> > > > > attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502 +> +> > > > > #2 0x0000561216833c5d in access_with_adjusted_size +> +> > > > > (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, +> +> > > > > size=size@entry=2, access_size_min=<optimized out>, +> +> > > > > access_size_max=<optimized out>, access_fn=0x561216835ac0 +> +> > > > > <memory_region_write_accessor>, mr=0x56121846d340, attrs=...) +> +> > > > > at /usr/src/debug/qemu-4.0/memory.c:568 +> +> > > > > #3 0x0000561216837c66 in memory_region_dispatch_write +> +> > > > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +> +> > > > > attrs=attrs@entry=...) 
at /usr/src/debug/qemu-4.0/memory.c:1503 +> +> > > > > #4 0x00005612167e036f in flatview_write_continue +> +> > > > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, +> +> > > > > attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out +> +> > > > > of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized +> +> > > > > out>, mr=0x56121846d340) +> +> > > > > at /usr/src/debug/qemu-4.0/exec.c:3279 +> +> > > > > #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, +> +> > > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address +> +> > > > > 0x7fce7dd97028 out of bounds>, len=2) at +> +> > > > > /usr/src/debug/qemu-4.0/exec.c:3318 +> +> > > > > #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +> +> > > > > addr=<optimized out>, attrs=..., buf=<optimized out>, +> +> > > > > len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408 +> +> > > > > #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, +> +> > > > > addr=<optimized out>, attrs=..., attrs@entry=..., +> +> > > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of +> +> > > > > bounds>, len=<optimized out>, is_write=<optimized out>) at +> +> > > > > /usr/src/debug/qemu-4.0/exec.c:3419 +> +> > > > > #8 0x0000561216849da1 in kvm_cpu_exec +> +> > > > > (cpu=cpu@entry=0x56121849aa00) at +> +> > > > > /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +> +> > > > > #9 0x000056121682255e in qemu_kvm_cpu_thread_fn +> +> > > > > (arg=arg@entry=0x56121849aa00) at +> +> > > > > /usr/src/debug/qemu-4.0/cpus.c:1281 +> +> > > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) +> +> > > > > at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +> +> > > > > #11 0x00007fce7bef6e25 in start_thread () from +> +> > > > > /lib64/libpthread.so.0 +> +> > > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 +> +> > > > > +> +> > > > > And I searched and found +> +> > > > > +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the +> +> > > > > same +> +> > > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: +> +> > > > > Add +> +> > > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this +> +> > > > > particular +> +> > > > > bug. +> +> > > > > +> +> > > > > But I can still hit the bug even after applying the commit. Do I +> +> > > > > miss +> +> > > > > anything? +> +> > > > +> +> > > > Hi Eryu, +> +> > > > This backtrace seems to be caused by this bug (there were two bugs in +> +> > > > 1706759): +https://bugzilla.redhat.com/show_bug.cgi?id=1708480 +> +> > > > Although the solution hasn't been tested on virtio-blk yet, you may +> +> > > > want to apply this patch: +> +> > > > +> +> > > > +https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html +> +> > > > Let me know if this works. +> +> > > +> +> > > Unfortunately, I still see the same segfault & backtrace after applying +> +> > > commit 421afd2fe8dd ("virtio: reset region cache when on queue +> +> > > deletion") +> +> > > +> +> > > Anything I can help to debug? +> +> > +> +> > Please post the QEMU command-line and the QMP commands use to remove the +> +> > device. +> +> +> +> It's a normal kata instance using virtio-fs as rootfs. 
+> +> +> +> /usr/local/libexec/qemu-kvm -name +> +> sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \ +> +> -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine +> +> q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \ +> +> -cpu host -qmp +> +> unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait +> +> \ +> +> -qmp +> +> unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait +> +> \ +> +> -m 2048M,slots=10,maxmem=773893M -device +> +> pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \ +> +> -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device +> +> virtconsole,chardev=charconsole0,id=console0 \ +> +> -chardev +> +> socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait +> +> \ +> +> -device +> +> virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 +> +> \ +> +> -chardev +> +> socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait +> +> \ +> +> -device nvdimm,id=nv0,memdev=mem0 -object +> +> memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 +> +> \ +> +> -object rng-random,id=rng0,filename=/dev/urandom -device +> +> virtio-rng,rng=rng0,romfile= \ +> +> -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \ +> +> -chardev +> +> socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait +> +> \ +> +> -chardev +> +> socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock +> +> \ +> +> -device +> +> vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M +> +> -netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \ +> +> -device +> +> driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= +> +> \ +> +> -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config +> +> -nodefaults -nographic -daemonize \ +> +> -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on +> +> -numa node,memdev=dimm1 -kernel /usr/local/share/kernel \ +> +> -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 +> +> i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp +> +> reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests +> +> net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 +> +> rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 quiet +> +> systemd.show_status=false panic=1 nr_cpus=96 agent.use_vsock=false +> +> init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target +> +> systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket \ +> +> -pidfile +> +> /run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid +> +> \ +> +> -smp 1,cores=1,threads=1,sockets=96,maxcpus=96 +> +> +> +> QMP command to delete device (the device id is just an example, not the +> +> one caused the crash): +> +> +> +> "{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}" +> +> +> +> which has been hot plugged by: +> +> 
"{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}" +> +> "{\"return\": {}}" +> +> "{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}" +> +> "{\"return\": {}}" +> +> +Thanks. I wasn't able to reproduce this crash with qemu.git/master. +> +> +One thing that is strange about the latest backtrace you posted: QEMU is +> +dispatching the memory access instead of using the ioeventfd code that +> +that virtio-blk-pci normally takes when a virtqueue is notified. I +> +guess this means ioeventfd has already been disabled due to the hot +> +unplug. +> +> +Could you try with machine type "i440fx" instead of "q35"? I wonder if +> +pci-bridge/shpc is part of the problem. +Sure, will try it. But it may take some time, as the test bed is busy +with other testing tasks. I'll report back once I got the results. + +Thanks, +Eryu + diff --git a/classification_output/01/other/2308923 b/classification_output/01/other/2308923 new file mode 100644 index 000000000..9c03c9d16 --- /dev/null +++ b/classification_output/01/other/2308923 @@ -0,0 +1,235 @@ +other: 0.877 +semantic: 0.825 +instruction: 0.816 +mistranslation: 0.811 + +[Qemu-devel] [BUG] Monitor QMP is broken ? + +Hello! + + I have updated my qemu to the recent version and it seems to have lost +compatibility with +libvirt. The error message is: +--- cut --- +internal error: unable to execute QEMU command 'qmp_capabilities': QMP input +object member +'id' is unexpected +--- cut --- + What does it mean? Is it intentional or not? + +Kind regards, +Pavel Fedin +Expert Engineer +Samsung Electronics Research center Russia + +Hello! + +> +I have updated my qemu to the recent version and it seems to have lost +> +compatibility +with +> +libvirt. The error message is: +> +--- cut --- +> +internal error: unable to execute QEMU command 'qmp_capabilities': QMP input +> +object +> +member +> +'id' is unexpected +> +--- cut --- +> +What does it mean? Is it intentional or not? +I have found the problem. It is caused by commit +65207c59d99f2260c5f1d3b9c491146616a522aa. libvirt does not seem to use the +removed +asynchronous interface but it still feeds in JSONs with 'id' field set to +something. So i +think the related fragment in qmp_check_input_obj() function should be brought +back + +Kind regards, +Pavel Fedin +Expert Engineer +Samsung Electronics Research center Russia + +On Fri, Jun 05, 2015 at 04:58:46PM +0300, Pavel Fedin wrote: +> +Hello! +> +> +> I have updated my qemu to the recent version and it seems to have lost +> +> compatibility +> +with +> +> libvirt. The error message is: +> +> --- cut --- +> +> internal error: unable to execute QEMU command 'qmp_capabilities': QMP +> +> input object +> +> member +> +> 'id' is unexpected +> +> --- cut --- +> +> What does it mean? Is it intentional or not? +> +> +I have found the problem. It is caused by commit +> +65207c59d99f2260c5f1d3b9c491146616a522aa. libvirt does not seem to use the +> +removed +> +asynchronous interface but it still feeds in JSONs with 'id' field set to +> +something. So i +> +think the related fragment in qmp_check_input_obj() function should be +> +brought back +If QMP is rejecting the 'id' parameter that is a regression bug. 
+ +[quote] +The QMP spec says + +2.3 Issuing Commands +-------------------- + +The format for command execution is: + +{ "execute": json-string, "arguments": json-object, "id": json-value } + + Where, + +- The "execute" member identifies the command to be executed by the Server +- The "arguments" member is used to pass any arguments required for the + execution of the command, it is optional when no arguments are + required. Each command documents what contents will be considered + valid when handling the json-argument +- The "id" member is a transaction identification associated with the + command execution, it is optional and will be part of the response if + provided. The "id" member can be any json-value, although most + clients merely use a json-number incremented for each successive + command + + +2.4 Commands Responses +---------------------- + +There are two possible responses which the Server will issue as the result +of a command execution: success or error. + +2.4.1 success +------------- + +The format of a success response is: + +{ "return": json-value, "id": json-value } + + Where, + +- The "return" member contains the data returned by the command, which + is defined on a per-command basis (usually a json-object or + json-array of json-objects, but sometimes a json-number, json-string, + or json-array of json-strings); it is an empty json-object if the + command does not return data +- The "id" member contains the transaction identification associated + with the command execution if issued by the Client + +[/quote] + +And as such, libvirt chose to /always/ send an 'id' parameter in all +commands it issues. + +We don't however validate the id in the reply, though arguably we +should have done so. + +Regards, +Daniel +-- +|: +http://berrange.com +-o- +http://www.flickr.com/photos/dberrange/ +:| +|: +http://libvirt.org +-o- +http://virt-manager.org +:| +|: +http://autobuild.org +-o- +http://search.cpan.org/~danberr/ +:| +|: +http://entangle-photo.org +-o- +http://live.gnome.org/gtk-vnc +:| + +"Daniel P. Berrange" <address@hidden> writes: + +> +On Fri, Jun 05, 2015 at 04:58:46PM +0300, Pavel Fedin wrote: +> +> Hello! +> +> +> +> > I have updated my qemu to the recent version and it seems to have +> +> > lost compatibility +> +> with +> +> > libvirt. The error message is: +> +> > --- cut --- +> +> > internal error: unable to execute QEMU command 'qmp_capabilities': +> +> > QMP input object +> +> > member +> +> > 'id' is unexpected +> +> > --- cut --- +> +> > What does it mean? Is it intentional or not? +> +> +> +> I have found the problem. It is caused by commit +> +> 65207c59d99f2260c5f1d3b9c491146616a522aa. libvirt does not seem to +> +> use the removed +> +> asynchronous interface but it still feeds in JSONs with 'id' field +> +> set to something. So i +> +> think the related fragment in qmp_check_input_obj() function should +> +> be brought back +> +> +If QMP is rejecting the 'id' parameter that is a regression bug. +It is definitely a regression, my fault, and I'll get it fixed a.s.a.p. + +[...] + diff --git a/classification_output/01/other/2393649 b/classification_output/01/other/2393649 new file mode 100644 index 000000000..87a9c6fbe --- /dev/null +++ b/classification_output/01/other/2393649 @@ -0,0 +1,344 @@ +other: 0.791 +mistranslation: 0.735 +semantic: 0.705 +instruction: 0.653 + +[Qemu-devel] [Bug] virtio-blk: qemu will crash if hotplug virtio-blk device failed + +I found that hotplug virtio-blk device will lead to qemu crash. + +Re-production steps: + +1. 
Run VM named vm001 + +2. Create a virtio-blk.xml which contains wrong configurations: +<disk device="lun" rawio="yes" type="block"> + <driver cache="none" io="native" name="qemu" type="raw" /> + <source dev="/dev/mapper/11-dm" /> + <target bus="virtio" dev="vdx" /> +</disk> + +3. Run command : virsh attach-device vm001 vm001 + +Libvirt will return err msg: + +error: Failed to attach device from blk-scsi.xml + +error: internal error: unable to execute QEMU command 'device_add': Please set +scsi=off for virtio-blk devices in order to use virtio 1.0 + +it means hotplug virtio-blk device failed. + +4. Suspend or shutdown VM will leads to qemu crash + + + +from gdb: + + +(gdb) bt +#0 object_get_class (address@hidden) at qom/object.c:750 +#1 0x00007f9a72582e01 in virtio_vmstate_change (opaque=0x7f9a73d10960, +running=0, state=<optimized out>) at +/mnt/sdb/lzc/code/open/qemu/hw/virtio/virtio.c:2203 +#2 0x00007f9a7261ef52 in vm_state_notify (address@hidden, address@hidden) at +vl.c:1685 +#3 0x00007f9a7252603a in do_vm_stop (state=RUN_STATE_PAUSED) at +/mnt/sdb/lzc/code/open/qemu/cpus.c:941 +#4 vm_stop (address@hidden) at /mnt/sdb/lzc/code/open/qemu/cpus.c:1807 +#5 0x00007f9a7262eb1b in qmp_stop (address@hidden) at qmp.c:102 +#6 0x00007f9a7262c70a in qmp_marshal_stop (args=<optimized out>, +ret=<optimized out>, errp=0x7ffe63e255d8) at qmp-marshal.c:5854 +#7 0x00007f9a72897e79 in do_qmp_dispatch (errp=0x7ffe63e255d0, +request=0x7f9a76510120, cmds=0x7f9a72ee7980 <qmp_commands>) at +qapi/qmp-dispatch.c:104 +#8 qmp_dispatch (cmds=0x7f9a72ee7980 <qmp_commands>, address@hidden) at +qapi/qmp-dispatch.c:131 +#9 0x00007f9a725288d5 in handle_qmp_command (parser=<optimized out>, +tokens=<optimized out>) at /mnt/sdb/lzc/code/open/qemu/monitor.c:3852 +#10 0x00007f9a7289d514 in json_message_process_token (lexer=0x7f9a73ce4498, +input=0x7f9a73cc6880, type=JSON_RCURLY, x=36, y=17) at +qobject/json-streamer.c:105 +#11 0x00007f9a728bb69b in json_lexer_feed_char (address@hidden, ch=125 '}', +address@hidden) at qobject/json-lexer.c:323 +#12 0x00007f9a728bb75e in json_lexer_feed (lexer=0x7f9a73ce4498, +buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:373 +#13 0x00007f9a7289d5d9 in json_message_parser_feed (parser=<optimized out>, +buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:124 +#14 0x00007f9a7252722e in monitor_qmp_read (opaque=<optimized out>, +buf=<optimized out>, size=<optimized out>) at +/mnt/sdb/lzc/code/open/qemu/monitor.c:3894 +#15 0x00007f9a7284ee1b in tcp_chr_read (chan=<optimized out>, cond=<optimized +out>, opaque=<optimized out>) at chardev/char-socket.c:441 +#16 0x00007f9a6e03e99a in g_main_context_dispatch () from +/usr/lib64/libglib-2.0.so.0 +#17 0x00007f9a728a342c in glib_pollfds_poll () at util/main-loop.c:214 +#18 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261 +#19 main_loop_wait (address@hidden) at util/main-loop.c:515 +#20 0x00007f9a724e7547 in main_loop () at vl.c:1999 +#21 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at +vl.c:4877 + +Problem happens in virtio_vmstate_change which is called by vm_state_notify, +static void virtio_vmstate_change(void *opaque, int running, RunState state) +{ + VirtIODevice *vdev = opaque; + BusState *qbus = qdev_get_parent_bus(DEVICE(vdev)); + VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); + bool backend_run = running && (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK); + vdev->vm_running = running; + + if (backend_run) { + virtio_set_status(vdev, vdev->status); + } + + 
if (k->vmstate_change) { + k->vmstate_change(qbus->parent, backend_run); + } + + if (!backend_run) { + virtio_set_status(vdev, vdev->status); + } +} + +Vdev's parent_bus is NULL, so qdev_get_parent_bus(DEVICE(vdev)) will crash. +virtio_vmstate_change is added to the list vm_change_state_head at +virtio_blk_device_realize(virtio_init), +but after hotplug virtio-blk failed, virtio_vmstate_change will not be removed +from vm_change_state_head. + + +I apply a patch as follews: + +diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c +index 5884ce3..ea532dc 100644 +--- a/hw/virtio/virtio.c ++++ b/hw/virtio/virtio.c +@@ -2491,6 +2491,7 @@ static void virtio_device_realize(DeviceState *dev, Error +**errp) + virtio_bus_device_plugged(vdev, &err); + if (err != NULL) { + error_propagate(errp, err); ++ vdc->unrealize(dev, NULL); + return; + } + +On Tue, Oct 31, 2017 at 05:19:08AM +0000, linzhecheng wrote: +> +I found that hotplug virtio-blk device will lead to qemu crash. +The author posted a patch in a separate email thread. Please see +"[PATCH] fix: unrealize virtio device if we fail to hotplug it". + +> +Re-production steps: +> +> +1. Run VM named vm001 +> +> +2. Create a virtio-blk.xml which contains wrong configurations: +> +<disk device="lun" rawio="yes" type="block"> +> +<driver cache="none" io="native" name="qemu" type="raw" /> +> +<source dev="/dev/mapper/11-dm" /> +> +<target bus="virtio" dev="vdx" /> +> +</disk> +> +> +3. Run command : virsh attach-device vm001 vm001 +> +> +Libvirt will return err msg: +> +> +error: Failed to attach device from blk-scsi.xml +> +> +error: internal error: unable to execute QEMU command 'device_add': Please +> +set scsi=off for virtio-blk devices in order to use virtio 1.0 +> +> +it means hotplug virtio-blk device failed. +> +> +4. 
Suspend or shutdown VM will leads to qemu crash +> +> +> +> +from gdb: +> +> +> +(gdb) bt +> +#0 object_get_class (address@hidden) at qom/object.c:750 +> +#1 0x00007f9a72582e01 in virtio_vmstate_change (opaque=0x7f9a73d10960, +> +running=0, state=<optimized out>) at +> +/mnt/sdb/lzc/code/open/qemu/hw/virtio/virtio.c:2203 +> +#2 0x00007f9a7261ef52 in vm_state_notify (address@hidden, address@hidden) at +> +vl.c:1685 +> +#3 0x00007f9a7252603a in do_vm_stop (state=RUN_STATE_PAUSED) at +> +/mnt/sdb/lzc/code/open/qemu/cpus.c:941 +> +#4 vm_stop (address@hidden) at /mnt/sdb/lzc/code/open/qemu/cpus.c:1807 +> +#5 0x00007f9a7262eb1b in qmp_stop (address@hidden) at qmp.c:102 +> +#6 0x00007f9a7262c70a in qmp_marshal_stop (args=<optimized out>, +> +ret=<optimized out>, errp=0x7ffe63e255d8) at qmp-marshal.c:5854 +> +#7 0x00007f9a72897e79 in do_qmp_dispatch (errp=0x7ffe63e255d0, +> +request=0x7f9a76510120, cmds=0x7f9a72ee7980 <qmp_commands>) at +> +qapi/qmp-dispatch.c:104 +> +#8 qmp_dispatch (cmds=0x7f9a72ee7980 <qmp_commands>, address@hidden) at +> +qapi/qmp-dispatch.c:131 +> +#9 0x00007f9a725288d5 in handle_qmp_command (parser=<optimized out>, +> +tokens=<optimized out>) at /mnt/sdb/lzc/code/open/qemu/monitor.c:3852 +> +#10 0x00007f9a7289d514 in json_message_process_token (lexer=0x7f9a73ce4498, +> +input=0x7f9a73cc6880, type=JSON_RCURLY, x=36, y=17) at +> +qobject/json-streamer.c:105 +> +#11 0x00007f9a728bb69b in json_lexer_feed_char (address@hidden, ch=125 '}', +> +address@hidden) at qobject/json-lexer.c:323 +> +#12 0x00007f9a728bb75e in json_lexer_feed (lexer=0x7f9a73ce4498, +> +buffer=<optimized out>, size=<optimized out>) at qobject/json-lexer.c:373 +> +#13 0x00007f9a7289d5d9 in json_message_parser_feed (parser=<optimized out>, +> +buffer=<optimized out>, size=<optimized out>) at qobject/json-streamer.c:124 +> +#14 0x00007f9a7252722e in monitor_qmp_read (opaque=<optimized out>, +> +buf=<optimized out>, size=<optimized out>) at +> +/mnt/sdb/lzc/code/open/qemu/monitor.c:3894 +> +#15 0x00007f9a7284ee1b in tcp_chr_read (chan=<optimized out>, cond=<optimized +> +out>, opaque=<optimized out>) at chardev/char-socket.c:441 +> +#16 0x00007f9a6e03e99a in g_main_context_dispatch () from +> +/usr/lib64/libglib-2.0.so.0 +> +#17 0x00007f9a728a342c in glib_pollfds_poll () at util/main-loop.c:214 +> +#18 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:261 +> +#19 main_loop_wait (address@hidden) at util/main-loop.c:515 +> +#20 0x00007f9a724e7547 in main_loop () at vl.c:1999 +> +#21 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) +> +at vl.c:4877 +> +> +Problem happens in virtio_vmstate_change which is called by vm_state_notify, +> +static void virtio_vmstate_change(void *opaque, int running, RunState state) +> +{ +> +VirtIODevice *vdev = opaque; +> +BusState *qbus = qdev_get_parent_bus(DEVICE(vdev)); +> +VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus); +> +bool backend_run = running && (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK); +> +vdev->vm_running = running; +> +> +if (backend_run) { +> +virtio_set_status(vdev, vdev->status); +> +} +> +> +if (k->vmstate_change) { +> +k->vmstate_change(qbus->parent, backend_run); +> +} +> +> +if (!backend_run) { +> +virtio_set_status(vdev, vdev->status); +> +} +> +} +> +> +Vdev's parent_bus is NULL, so qdev_get_parent_bus(DEVICE(vdev)) will crash. 
+> +virtio_vmstate_change is added to the list vm_change_state_head at +> +virtio_blk_device_realize(virtio_init), +> +but after hotplug virtio-blk failed, virtio_vmstate_change will not be +> +removed from vm_change_state_head. +> +> +> +I apply a patch as follews: +> +> +diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c +> +index 5884ce3..ea532dc 100644 +> +--- a/hw/virtio/virtio.c +> ++++ b/hw/virtio/virtio.c +> +@@ -2491,6 +2491,7 @@ static void virtio_device_realize(DeviceState *dev, +> +Error **errp) +> +virtio_bus_device_plugged(vdev, &err); +> +if (err != NULL) { +> +error_propagate(errp, err); +> ++ vdc->unrealize(dev, NULL); +> +return; +> +} +signature.asc +Description: +PGP signature + diff --git a/classification_output/01/other/2409210 b/classification_output/01/other/2409210 new file mode 100644 index 000000000..7371cd578 --- /dev/null +++ b/classification_output/01/other/2409210 @@ -0,0 +1,418 @@ +other: 0.997 +semantic: 0.995 +instruction: 0.993 +mistranslation: 0.974 + +[Qemu-devel] Fwd: [BUG] Failed to compile using gcc7.1 + +Hi all, +I encountered the same problem on gcc 7.1.1 and found Qu's mail in +this list from google search. + +Temporarily fix it by specifying the string length in snprintf +directive. Hope this is helpful to other people encountered the same +problem. + +@@ -1,9 +1,7 @@ +--- +--- a/block/blkdebug.c +- "blkdebug:%s:%s", s->config_file ?: "", +--- a/block/blkverify.c +- "blkverify:%s:%s", +--- a/hw/usb/bus.c +- snprintf(downstream->path, sizeof(downstream->path), "%s.%d", +- snprintf(downstream->path, sizeof(downstream->path), "%d", portnr); +-- ++++ b/block/blkdebug.c ++ "blkdebug:%.2037s:%.2037s", s->config_file ?: "", ++++ b/block/blkverify.c ++ "blkverify:%.2038s:%.2038s", ++++ b/hw/usb/bus.c ++ snprintf(downstream->path, sizeof(downstream->path), "%.12s.%d", ++ snprintf(downstream->path, sizeof(downstream->path), "%.12d", portnr); + +Tsung-en Hsiao + +> +Qu Wenruo Wrote: +> +> +Hi all, +> +> +After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with gcc. +> +> +The error is: +> +> +------ +> +CC block/blkdebug.o +> +block/blkdebug.c: In function 'blkdebug_refresh_filename': +> +> +block/blkdebug.c:693:31: error: '%s' directive output may be truncated +> +writing up to 4095 bytes into a region of size 4086 +> +[-Werror=format-truncation=] +> +> +"blkdebug:%s:%s", s->config_file ?: "", +> +^~ +> +In file included from /usr/include/stdio.h:939:0, +> +from /home/adam/qemu/include/qemu/osdep.h:68, +> +from block/blkdebug.c:25: +> +> +/usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' output 11 +> +or more bytes (assuming 4106) into a destination of size 4096 +> +> +return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1, +> +^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +> +__bos (__s), __fmt, __va_arg_pack ()); +> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +> +cc1: all warnings being treated as errors +> +make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1 +> +------ +> +> +It seems that gcc 7 is introducing more restrict check for printf. +> +> +If using clang, although there are some extra warning, it can at least pass +> +the compile. +> +> +Thanks, +> +Qu + +Hi Tsung-en, + +On 06/11/2017 04:08 PM, Tsung-en Hsiao wrote: +Hi all, +I encountered the same problem on gcc 7.1.1 and found Qu's mail in +this list from google search. + +Temporarily fix it by specifying the string length in snprintf +directive. Hope this is helpful to other people encountered the same +problem. 
+Thank your for sharing this. +@@ -1,9 +1,7 @@ +--- +--- a/block/blkdebug.c +- "blkdebug:%s:%s", s->config_file ?: "", +--- a/block/blkverify.c +- "blkverify:%s:%s", +--- a/hw/usb/bus.c +- snprintf(downstream->path, sizeof(downstream->path), "%s.%d", +- snprintf(downstream->path, sizeof(downstream->path), "%d", portnr); +-- ++++ b/block/blkdebug.c ++ "blkdebug:%.2037s:%.2037s", s->config_file ?: "", +It is a rather funny way to silent this warning :) Truncating the +filename until it fits. +However I don't think it is the correct way since there is indeed an +overflow of bs->exact_filename. +Apparently exact_filename from "block/block_int.h" is defined to hold a +pathname: +char exact_filename[PATH_MAX]; +but is used for more than that (for example in blkdebug.c it might use +until 10+2*PATH_MAX chars). +I suppose it started as a buffer to hold a pathname then more block +drivers were added and this buffer ended used differently. +If it is a multi-purpose buffer one safer option might be to declare it +as a GString* and use g_string_printf(). +I CC'ed the block folks to have their feedback. + +Regards, + +Phil. ++++ b/block/blkverify.c ++ "blkverify:%.2038s:%.2038s", ++++ b/hw/usb/bus.c ++ snprintf(downstream->path, sizeof(downstream->path), "%.12s.%d", ++ snprintf(downstream->path, sizeof(downstream->path), "%.12d", portnr); + +Tsung-en Hsiao +Qu Wenruo Wrote: + +Hi all, + +After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with gcc. + +The error is: + +------ + CC block/blkdebug.o +block/blkdebug.c: In function 'blkdebug_refresh_filename': + +block/blkdebug.c:693:31: error: '%s' directive output may be truncated writing +up to 4095 bytes into a region of size 4086 [-Werror=format-truncation=] + + "blkdebug:%s:%s", s->config_file ?: "", + ^~ +In file included from /usr/include/stdio.h:939:0, + from /home/adam/qemu/include/qemu/osdep.h:68, + from block/blkdebug.c:25: + +/usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' output 11 or +more bytes (assuming 4106) into a destination of size 4096 + + return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1, + ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + __bos (__s), __fmt, __va_arg_pack ()); + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +cc1: all warnings being treated as errors +make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1 +------ + +It seems that gcc 7 is introducing more restrict check for printf. + +If using clang, although there are some extra warning, it can at least pass the +compile. + +Thanks, +Qu + +On 2017-06-12 05:19, Philippe Mathieu-Daudé wrote: +> +Hi Tsung-en, +> +> +On 06/11/2017 04:08 PM, Tsung-en Hsiao wrote: +> +> Hi all, +> +> I encountered the same problem on gcc 7.1.1 and found Qu's mail in +> +> this list from google search. +> +> +> +> Temporarily fix it by specifying the string length in snprintf +> +> directive. Hope this is helpful to other people encountered the same +> +> problem. +> +> +Thank your for sharing this. 
+> +> +> +> +> @@ -1,9 +1,7 @@ +> +> --- +> +> --- a/block/blkdebug.c +> +> - "blkdebug:%s:%s", s->config_file ?: "", +> +> --- a/block/blkverify.c +> +> - "blkverify:%s:%s", +> +> --- a/hw/usb/bus.c +> +> - snprintf(downstream->path, sizeof(downstream->path), "%s.%d", +> +> - snprintf(downstream->path, sizeof(downstream->path), "%d", +> +> portnr); +> +> -- +> +> +++ b/block/blkdebug.c +> +> + "blkdebug:%.2037s:%.2037s", s->config_file ?: "", +> +> +It is a rather funny way to silent this warning :) Truncating the +> +filename until it fits. +> +> +However I don't think it is the correct way since there is indeed an +> +overflow of bs->exact_filename. +> +> +Apparently exact_filename from "block/block_int.h" is defined to hold a +> +pathname: +> +char exact_filename[PATH_MAX]; +> +> +but is used for more than that (for example in blkdebug.c it might use +> +until 10+2*PATH_MAX chars). +In any case, truncating the filenames will do just as much as truncating +the result: You'll get an unusable filename. + +> +I suppose it started as a buffer to hold a pathname then more block +> +drivers were added and this buffer ended used differently. +> +> +If it is a multi-purpose buffer one safer option might be to declare it +> +as a GString* and use g_string_printf(). +What it is supposed to be now is just an information string we can print +to the user, because strings are nicer than JSON objects. There are some +commands that take a filename for identifying a block node, but I dream +we can get rid of them in 3.0... + +The right solution is to remove it altogether and have a +"char *bdrv_filename(BlockDriverState *bs)" function (which generates +the filename every time it's called). I've been working on this for some +years now, actually, but it was never pressing enough to get it finished +(so I never had enough time). + +What we can do in the meantime is to not generate a plain filename if it +won't fit into bs->exact_filename. + +(The easiest way to do this probably would be to truncate +bs->exact_filename back to an empty string if snprintf() returns a value +greater than or equal to the length of bs->exact_filename.) + +What to do about hw/usb/bus.c I don't know (I guess the best solution +would be to ignore the warning, but I don't suppose that is going to work). + +Max + +> +> +I CC'ed the block folks to have their feedback. +> +> +Regards, +> +> +Phil. +> +> +> +++ b/block/blkverify.c +> +> + "blkverify:%.2038s:%.2038s", +> +> +++ b/hw/usb/bus.c +> +> + snprintf(downstream->path, sizeof(downstream->path), "%.12s.%d", +> +> + snprintf(downstream->path, sizeof(downstream->path), "%.12d", +> +> portnr); +> +> +> +> Tsung-en Hsiao +> +> +> +>> Qu Wenruo Wrote: +> +>> +> +>> Hi all, +> +>> +> +>> After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with +> +>> gcc. 
+> +>> +> +>> The error is: +> +>> +> +>> ------ +> +>> CC block/blkdebug.o +> +>> block/blkdebug.c: In function 'blkdebug_refresh_filename': +> +>> +> +>> block/blkdebug.c:693:31: error: '%s' directive output may be +> +>> truncated writing up to 4095 bytes into a region of size 4086 +> +>> [-Werror=format-truncation=] +> +>> +> +>> "blkdebug:%s:%s", s->config_file ?: "", +> +>> ^~ +> +>> In file included from /usr/include/stdio.h:939:0, +> +>> from /home/adam/qemu/include/qemu/osdep.h:68, +> +>> from block/blkdebug.c:25: +> +>> +> +>> /usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' +> +>> output 11 or more bytes (assuming 4106) into a destination of size 4096 +> +>> +> +>> return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1, +> +>> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +> +>> __bos (__s), __fmt, __va_arg_pack ()); +> +>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +> +>> cc1: all warnings being treated as errors +> +>> make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1 +> +>> ------ +> +>> +> +>> It seems that gcc 7 is introducing more restrict check for printf. +> +>> +> +>> If using clang, although there are some extra warning, it can at +> +>> least pass the compile. +> +>> +> +>> Thanks, +> +>> Qu +> +> +signature.asc +Description: +OpenPGP digital signature + diff --git a/classification_output/01/other/2537817 b/classification_output/01/other/2537817 new file mode 100644 index 000000000..099e56865 --- /dev/null +++ b/classification_output/01/other/2537817 @@ -0,0 +1,532 @@ +other: 0.626 +mistranslation: 0.615 +instruction: 0.572 +semantic: 0.555 + +[Qemu-devel] [Bug] Docs build fails at interop.rst + +https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw +running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31 +(Rawhide) + +uname - a +Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32 +UTC 2019 x86_64 x86_64 x86_64 GNU/Linux + +Reverting commmit 90edef80a0852cf8a3d2668898ee40e8970e431 +allows for the build to occur + +Regards +Aarushi Mehta + +On 5/20/19 7:30 AM, Aarushi Mehta wrote: +> +https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw +> +running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31 +> +(Rawhide) +> +> +uname - a +> +Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32 +> +UTC 2019 x86_64 x86_64 x86_64 GNU/Linux +> +> +Reverting commmit 90edef80a0852cf8a3d2668898ee40e8970e431 +> +allows for the build to occur +> +> +Regards +> +Aarushi Mehta +> +> +Ah, dang. The blocks aren't strictly conforming json, but the version I +tested this under didn't seem to care. Your version is much newer. (I +was using 1.7 as provided by Fedora 29.) + +For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead, +which should at least turn off the "warnings as errors" option, but I +don't think that reverting -n will turn off this warning. + +I'll try to get ahold of this newer version and see if I can't fix it +more appropriately. 
+ +--js + +On 5/20/19 12:37 PM, John Snow wrote: +> +> +> +On 5/20/19 7:30 AM, Aarushi Mehta wrote: +> +> +https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw +> +> running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31 +> +> (Rawhide) +> +> +> +> uname - a +> +> Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32 +> +> UTC 2019 x86_64 x86_64 x86_64 GNU/Linux +> +> +> +> Reverting commmit 90edef80a0852cf8a3d2668898ee40e8970e431 +> +> allows for the build to occur +> +> +> +> Regards +> +> Aarushi Mehta +> +> +> +> +> +> +Ah, dang. The blocks aren't strictly conforming json, but the version I +> +tested this under didn't seem to care. Your version is much newer. (I +> +was using 1.7 as provided by Fedora 29.) +> +> +For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead, +> +which should at least turn off the "warnings as errors" option, but I +> +don't think that reverting -n will turn off this warning. +> +> +I'll try to get ahold of this newer version and see if I can't fix it +> +more appropriately. +> +> +--js +> +...Sigh, okay. + +So, I am still not actually sure what changed from pygments 2.2 and +sphinx 1.7 to pygments 2.4 and sphinx 2.0.1, but it appears as if Sphinx +by default always tries to do add a filter to the pygments lexer that +raises an error on highlighting failure, instead of the default behavior +which is to just highlight those errors in the output. There is no +option to Sphinx that I am aware of to retain this lexing behavior. +(Effectively, it's strict or nothing.) + +This approach, apparently, is broken in Sphinx 1.7/Pygments 2.2, so the +build works with our malformed json. + +There are a few options: + +1. Update conf.py to ignore these warnings (and all future lexing +errors), and settle for the fact that there will be no QMP highlighting +wherever we use the directionality indicators ('->', '<-'). + +2. Update bitmaps.rst to remove the directionality indicators. + +3. Update bitmaps.rst to format the QMP blocks as raw text instead of JSON. + +4. Update bitmaps.rst to remove the "json" specification from the code +block. This will cause sphinx to "guess" the formatting, and the +pygments guesser will decide it's Python3. + +This will parse well enough, but will mis-highlight 'true' and 'false' +which are not python keywords. This approach may break in the future if +the Python3 lexer is upgraded to be stricter (because '->' and '<-' are +still invalid), and leaves us at the mercy of both the guesser and the +lexer. + +I'm not actually sure what I dislike the least; I think I dislike #1 the +most. #4 gets us most of what we want but is perhaps porcelain. + +I suspect if we attempt to move more of our documentation to ReST and +Sphinx that we will need to answer for ourselves how we intend to +document QMP code flow examples. + +--js + +On Mon, May 20, 2019 at 05:25:28PM -0400, John Snow wrote: +> +> +> +On 5/20/19 12:37 PM, John Snow wrote: +> +> +> +> +> +> On 5/20/19 7:30 AM, Aarushi Mehta wrote: +> +>> +https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw +> +>> running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31 +> +>> (Rawhide) +> +>> +> +>> uname - a +> +>> Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32 +> +>> UTC 2019 x86_64 x86_64 x86_64 GNU/Linux +> +>> +> +>> Reverting commmit 90edef80a0852cf8a3d2668898ee40e8970e431 +> +>> allows for the build to occur +> +>> +> +>> Regards +> +>> Aarushi Mehta +> +>> +> +>> +> +> +> +> Ah, dang. 
The blocks aren't strictly conforming json, but the version I +> +> tested this under didn't seem to care. Your version is much newer. (I +> +> was using 1.7 as provided by Fedora 29.) +> +> +> +> For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead, +> +> which should at least turn off the "warnings as errors" option, but I +> +> don't think that reverting -n will turn off this warning. +> +> +> +> I'll try to get ahold of this newer version and see if I can't fix it +> +> more appropriately. +> +> +> +> --js +> +> +> +> +...Sigh, okay. +> +> +So, I am still not actually sure what changed from pygments 2.2 and +> +sphinx 1.7 to pygments 2.4 and sphinx 2.0.1, but it appears as if Sphinx +> +by default always tries to do add a filter to the pygments lexer that +> +raises an error on highlighting failure, instead of the default behavior +> +which is to just highlight those errors in the output. There is no +> +option to Sphinx that I am aware of to retain this lexing behavior. +> +(Effectively, it's strict or nothing.) +> +> +This approach, apparently, is broken in Sphinx 1.7/Pygments 2.2, so the +> +build works with our malformed json. +> +> +There are a few options: +> +> +1. Update conf.py to ignore these warnings (and all future lexing +> +errors), and settle for the fact that there will be no QMP highlighting +> +wherever we use the directionality indicators ('->', '<-'). +> +> +2. Update bitmaps.rst to remove the directionality indicators. +> +> +3. Update bitmaps.rst to format the QMP blocks as raw text instead of JSON. +> +> +4. Update bitmaps.rst to remove the "json" specification from the code +> +block. This will cause sphinx to "guess" the formatting, and the +> +pygments guesser will decide it's Python3. +> +> +This will parse well enough, but will mis-highlight 'true' and 'false' +> +which are not python keywords. This approach may break in the future if +> +the Python3 lexer is upgraded to be stricter (because '->' and '<-' are +> +still invalid), and leaves us at the mercy of both the guesser and the +> +lexer. +> +> +I'm not actually sure what I dislike the least; I think I dislike #1 the +> +most. #4 gets us most of what we want but is perhaps porcelain. +> +> +I suspect if we attempt to move more of our documentation to ReST and +> +Sphinx that we will need to answer for ourselves how we intend to +> +document QMP code flow examples. +Writing a custom lexer that handles "<-" and "->" was simple (see below). + +Now, is it possible to convince Sphinx to register and use a custom lexer? 
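+(One way such a lexer can be wired in, sketched here rather than taken from
+the patch that was eventually sent: conf.py can register extra lexers through
+Sphinx's setup() hook. The module name "qmp_lexer" is an assumption; the
+class name mirrors the lexer defined just below.)
+
+# conf.py, sketch only
+from qmp_lexer import QMPExampleLexer
+
+def setup(app):
+    # Map a code-block language name to the custom pygments lexer.
+    # Sphinx versions of this era expect an instance here.
+    app.add_lexer('QMP', QMPExampleLexer())
+
+With that in place, examples could be marked up as ".. code-block:: QMP"
+instead of "json".
+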
+ +$ cat > /tmp/lexer.py <<EOF +from pygments.lexer import RegexLexer, DelegatingLexer +from pygments.lexers.data import JsonLexer +import re +from pygments.token import * + +class QMPExampleMarkersLexer(RegexLexer): + tokens = { + 'root': [ + (r' *-> *', Generic.Prompt), + (r' *<- *', Generic.Output), + ] + } + +class QMPExampleLexer(DelegatingLexer): + def __init__(self, **options): + super(QMPExampleLexer, self).__init__(JsonLexer, +QMPExampleMarkersLexer, Error, **options) +EOF +$ pygmentize -l /tmp/lexer.py:QMPExampleLexer -x -f html <<EOF + -> { + "execute": "drive-backup", + "arguments": { + "device": "drive0", + "bitmap": "bitmap0", + "target": "drive0.inc0.qcow2", + "format": "qcow2", + "sync": "incremental", + "mode": "existing" + } + } + + <- { "return": {} } +EOF +<div class="highlight"><pre><span></span><span class="gp"> -> +</span><span class="p">{</span> + <span class="nt">"execute"</span><span class="p">:</span> +<span class="s2">"drive-backup"</span><span class="p">,</span> + <span class="nt">"arguments"</span><span class="p">:</span> +<span class="p">{</span> + <span class="nt">"device"</span><span class="p">:</span> +<span class="s2">"drive0"</span><span class="p">,</span> + <span class="nt">"bitmap"</span><span class="p">:</span> +<span class="s2">"bitmap0"</span><span class="p">,</span> + <span class="nt">"target"</span><span class="p">:</span> +<span class="s2">"drive0.inc0.qcow2"</span><span class="p">,</span> + <span class="nt">"format"</span><span class="p">:</span> +<span class="s2">"qcow2"</span><span class="p">,</span> + <span class="nt">"sync"</span><span class="p">:</span> +<span class="s2">"incremental"</span><span class="p">,</span> + <span class="nt">"mode"</span><span class="p">:</span> +<span class="s2">"existing"</span> + <span class="p">}</span> + <span class="p">}</span> + +<span class="go"> <- </span><span class="p">{</span> <span +class="nt">"return"</span><span class="p">:</span> <span +class="p">{}</span> <span class="p">}</span> +</pre></div> +$ + + +-- +Eduardo + +On 5/20/19 7:04 PM, Eduardo Habkost wrote: +> +On Mon, May 20, 2019 at 05:25:28PM -0400, John Snow wrote: +> +> +> +> +> +> On 5/20/19 12:37 PM, John Snow wrote: +> +>> +> +>> +> +>> On 5/20/19 7:30 AM, Aarushi Mehta wrote: +> +>>> +https://paste.fedoraproject.org/paste/kOPx4jhtUli---TmxSLrlw +> +>>> running python3-sphinx-2.0.1-1.fc31.noarch on Fedora release 31 +> +>>> (Rawhide) +> +>>> +> +>>> uname - a +> +>>> Linux iouring 5.1.0-0.rc6.git3.1.fc31.x86_64 #1 SMP Thu Apr 25 14:25:32 +> +>>> UTC 2019 x86_64 x86_64 x86_64 GNU/Linux +> +>>> +> +>>> Reverting commmit 90edef80a0852cf8a3d2668898ee40e8970e431 +> +>>> allows for the build to occur +> +>>> +> +>>> Regards +> +>>> Aarushi Mehta +> +>>> +> +>>> +> +>> +> +>> Ah, dang. The blocks aren't strictly conforming json, but the version I +> +>> tested this under didn't seem to care. Your version is much newer. (I +> +>> was using 1.7 as provided by Fedora 29.) +> +>> +> +>> For now, try reverting 9e5b6cb87db66dfb606604fe6cf40e5ddf1ef0e7 instead, +> +>> which should at least turn off the "warnings as errors" option, but I +> +>> don't think that reverting -n will turn off this warning. +> +>> +> +>> I'll try to get ahold of this newer version and see if I can't fix it +> +>> more appropriately. +> +>> +> +>> --js +> +>> +> +> +> +> ...Sigh, okay. 
+> +> +> +> So, I am still not actually sure what changed from pygments 2.2 and +> +> sphinx 1.7 to pygments 2.4 and sphinx 2.0.1, but it appears as if Sphinx +> +> by default always tries to do add a filter to the pygments lexer that +> +> raises an error on highlighting failure, instead of the default behavior +> +> which is to just highlight those errors in the output. There is no +> +> option to Sphinx that I am aware of to retain this lexing behavior. +> +> (Effectively, it's strict or nothing.) +> +> +> +> This approach, apparently, is broken in Sphinx 1.7/Pygments 2.2, so the +> +> build works with our malformed json. +> +> +> +> There are a few options: +> +> +> +> 1. Update conf.py to ignore these warnings (and all future lexing +> +> errors), and settle for the fact that there will be no QMP highlighting +> +> wherever we use the directionality indicators ('->', '<-'). +> +> +> +> 2. Update bitmaps.rst to remove the directionality indicators. +> +> +> +> 3. Update bitmaps.rst to format the QMP blocks as raw text instead of JSON. +> +> +> +> 4. Update bitmaps.rst to remove the "json" specification from the code +> +> block. This will cause sphinx to "guess" the formatting, and the +> +> pygments guesser will decide it's Python3. +> +> +> +> This will parse well enough, but will mis-highlight 'true' and 'false' +> +> which are not python keywords. This approach may break in the future if +> +> the Python3 lexer is upgraded to be stricter (because '->' and '<-' are +> +> still invalid), and leaves us at the mercy of both the guesser and the +> +> lexer. +> +> +> +> I'm not actually sure what I dislike the least; I think I dislike #1 the +> +> most. #4 gets us most of what we want but is perhaps porcelain. +> +> +> +> I suspect if we attempt to move more of our documentation to ReST and +> +> Sphinx that we will need to answer for ourselves how we intend to +> +> document QMP code flow examples. +> +> +Writing a custom lexer that handles "<-" and "->" was simple (see below). +> +> +Now, is it possible to convince Sphinx to register and use a custom lexer? +> +Spoilers, yes, and I've sent a patch to list. Thanks for your help! + diff --git a/classification_output/01/other/2562302 b/classification_output/01/other/2562302 new file mode 100644 index 000000000..9058ba907 --- /dev/null +++ b/classification_output/01/other/2562302 @@ -0,0 +1,149 @@ +other: 0.332 +semantic: 0.327 +mistranslation: 0.314 +instruction: 0.307 + +[Qemu-devel] [PATCH, Bug 1612908] scripts: Add TCP endpoints for qom-* scripts + +From: Carl Allendorph <address@hidden> + +I've created a patch for bug #1612908. The current docs for the scripts +in the "scripts/qmp/" directory suggest that both unix sockets and +tcp endpoints can be used. The TCP endpoints don't work for most of the +scripts, with notable exception of 'qmp-shell'. This patch attempts to +refactor the process of distinguishing between unix path endpoints and +tcp endpoints to work for all of these scripts. + +Carl Allendorph (1): + scripts: Add ability for qom-* python scripts to target tcp endpoints + + scripts/qmp/qmp-shell | 22 ++-------------------- + scripts/qmp/qmp.py | 23 ++++++++++++++++++++--- + 2 files changed, 22 insertions(+), 23 deletions(-) + +-- +2.7.4 + +From: Carl Allendorph <address@hidden> + +The current code for QEMUMonitorProtocol accepts both a unix socket +endpoint as a string and a tcp endpoint as a tuple. Most of the scripts +that use this class don't massage the command line argument to generate +a tuple. 
This patch refactors qmp-shell slightly to reuse the existing +parsing of the "host:port" string for all the qom-* scripts. + +Signed-off-by: Carl Allendorph <address@hidden> +--- + scripts/qmp/qmp-shell | 22 ++-------------------- + scripts/qmp/qmp.py | 23 ++++++++++++++++++++--- + 2 files changed, 22 insertions(+), 23 deletions(-) + +diff --git a/scripts/qmp/qmp-shell b/scripts/qmp/qmp-shell +index 0373b24..8a2a437 100755 +--- a/scripts/qmp/qmp-shell ++++ b/scripts/qmp/qmp-shell +@@ -83,9 +83,6 @@ class QMPCompleter(list): + class QMPShellError(Exception): + pass + +-class QMPShellBadPort(QMPShellError): +- pass +- + class FuzzyJSON(ast.NodeTransformer): + '''This extension of ast.NodeTransformer filters literal "true/false/null" + values in an AST and replaces them by proper "True/False/None" values that +@@ -103,28 +100,13 @@ class FuzzyJSON(ast.NodeTransformer): + # _execute_cmd()). Let's design a better one. + class QMPShell(qmp.QEMUMonitorProtocol): + def __init__(self, address, pretty=False): +- qmp.QEMUMonitorProtocol.__init__(self, self.__get_address(address)) ++ qmp.QEMUMonitorProtocol.__init__(self, address) + self._greeting = None + self._completer = None + self._pretty = pretty + self._transmode = False + self._actions = list() + +- def __get_address(self, arg): +- """ +- Figure out if the argument is in the port:host form, if it's not it's +- probably a file path. +- """ +- addr = arg.split(':') +- if len(addr) == 2: +- try: +- port = int(addr[1]) +- except ValueError: +- raise QMPShellBadPort +- return ( addr[0], port ) +- # socket path +- return arg +- + def _fill_completion(self): + for cmd in self.cmd('query-commands')['return']: + self._completer.append(cmd['name']) +@@ -400,7 +382,7 @@ def main(): + + if qemu is None: + fail_cmdline() +- except QMPShellBadPort: ++ except qmp.QMPShellBadPort: + die('bad port number in command-line') + + try: +diff --git a/scripts/qmp/qmp.py b/scripts/qmp/qmp.py +index 62d3651..261ece8 100644 +--- a/scripts/qmp/qmp.py ++++ b/scripts/qmp/qmp.py +@@ -25,21 +25,23 @@ class QMPCapabilitiesError(QMPError): + class QMPTimeoutError(QMPError): + pass + ++class QMPShellBadPort(QMPError): ++ pass ++ + class QEMUMonitorProtocol: + def __init__(self, address, server=False, debug=False): + """ + Create a QEMUMonitorProtocol class. + + @param address: QEMU address, can be either a unix socket path (string) +- or a tuple in the form ( address, port ) for a TCP +- connection ++ or a TCP endpoint (string in the format "host:port") + @param server: server mode listens on the socket (bool) + @raise socket.error on socket connection errors + @note No connection is established, this is done by the connect() or + accept() methods + """ + self.__events = [] +- self.__address = address ++ self.__address = self.__get_address(address) + self._debug = debug + self.__sock = self.__get_sock() + if server: +@@ -47,6 +49,21 @@ class QEMUMonitorProtocol: + self.__sock.bind(self.__address) + self.__sock.listen(1) + ++ def __get_address(self, arg): ++ """ ++ Figure out if the argument is in the port:host form, if it's not it's ++ probably a file path. 
++ """ ++ addr = arg.split(':') ++ if len(addr) == 2: ++ try: ++ port = int(addr[1]) ++ except ValueError: ++ raise QMPShellBadPort ++ return ( addr[0], port ) ++ # socket path ++ return arg ++ + def __get_sock(self): + if isinstance(self.__address, tuple): + family = socket.AF_INET +-- +2.7.4 + diff --git a/classification_output/01/other/3223447 b/classification_output/01/other/3223447 new file mode 100644 index 000000000..8d257ea4d --- /dev/null +++ b/classification_output/01/other/3223447 @@ -0,0 +1,199 @@ +other: 0.853 +semantic: 0.843 +instruction: 0.821 +mistranslation: 0.768 + +[BUG, RFC] Base node is in RW after making external snapshot + +Hi everyone, + +When making an external snapshot, we end up in a situation when 2 block +graph nodes related to the same image file (format and storage nodes) +have different RO flags set on them. + +E.g. + +# ls -la /proc/PID/fd +lrwx------ 1 root qemu 64 Apr 24 20:14 12 -> /path/to/harddisk.hdd + +# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' +--pretty | egrep '"node-name"|"ro"' + "ro": false, + "node-name": "libvirt-1-format", + "ro": false, + "node-name": "libvirt-1-storage", + +# virsh snapshot-create-as VM --name snap --disk-only +Domain snapshot snap created + +# ls -la /proc/PID/fd +lr-x------ 1 root qemu 64 Apr 24 20:14 134 -> /path/to/harddisk.hdd +lrwx------ 1 root qemu 64 Apr 24 20:14 135 -> /path/to/harddisk.snap + +# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' +--pretty | egrep '"node-name"|"ro"' + "ro": false, + "node-name": "libvirt-2-format", + "ro": false, + "node-name": "libvirt-2-storage", + "ro": true, + "node-name": "libvirt-1-format", + "ro": false, <-------------- + "node-name": "libvirt-1-storage", + +File descriptor has been reopened in RO, but "libvirt-1-storage" node +still has RW permissions set. + +I'm wondering it this a bug or this is intended? Looks like a bug to +me, although I see that some iotests (e.g. 273) expect 2 nodes related +to the same image file to have different RO flags. + +bdrv_reopen_set_read_only() + bdrv_reopen() + bdrv_reopen_queue() + bdrv_reopen_queue_child() + bdrv_reopen_multiple() + bdrv_list_refresh_perms() + bdrv_topological_dfs() + bdrv_do_refresh_perms() + bdrv_reopen_commit() + +In the stack above bdrv_reopen_set_read_only() is only being called for +the parent (libvirt-1-format) node. There're 2 lists: BDSs from +refresh_list are used by bdrv_drv_set_perm and this leads to actual +reopen with RO of the file descriptor. And then there's reopen queue +bs_queue -- BDSs from this queue get their parameters updated. While +refresh_list ends up having the whole subtree (including children, this +is done in bdrv_topological_dfs()) bs_queue only has the parent. And +that is because storage (child) node's (bs->inherits_from == NULL), so +bdrv_reopen_queue_child() never adds it to the queue. Could it be the +source of this bug? + +Anyway, would greatly appreciate a clarification. + +Andrey + +On 4/24/24 21:00, Andrey Drobyshev wrote: +> +Hi everyone, +> +> +When making an external snapshot, we end up in a situation when 2 block +> +graph nodes related to the same image file (format and storage nodes) +> +have different RO flags set on them. +> +> +E.g. 
+> +> +# ls -la /proc/PID/fd +> +lrwx------ 1 root qemu 64 Apr 24 20:14 12 -> /path/to/harddisk.hdd +> +> +# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' +> +--pretty | egrep '"node-name"|"ro"' +> +"ro": false, +> +"node-name": "libvirt-1-format", +> +"ro": false, +> +"node-name": "libvirt-1-storage", +> +> +# virsh snapshot-create-as VM --name snap --disk-only +> +Domain snapshot snap created +> +> +# ls -la /proc/PID/fd +> +lr-x------ 1 root qemu 64 Apr 24 20:14 134 -> /path/to/harddisk.hdd +> +lrwx------ 1 root qemu 64 Apr 24 20:14 135 -> /path/to/harddisk.snap +> +> +# virsh qemu-monitor-command VM '{"execute": "query-named-block-nodes"}' +> +--pretty | egrep '"node-name"|"ro"' +> +"ro": false, +> +"node-name": "libvirt-2-format", +> +"ro": false, +> +"node-name": "libvirt-2-storage", +> +"ro": true, +> +"node-name": "libvirt-1-format", +> +"ro": false, <-------------- +> +"node-name": "libvirt-1-storage", +> +> +File descriptor has been reopened in RO, but "libvirt-1-storage" node +> +still has RW permissions set. +> +> +I'm wondering it this a bug or this is intended? Looks like a bug to +> +me, although I see that some iotests (e.g. 273) expect 2 nodes related +> +to the same image file to have different RO flags. +> +> +bdrv_reopen_set_read_only() +> +bdrv_reopen() +> +bdrv_reopen_queue() +> +bdrv_reopen_queue_child() +> +bdrv_reopen_multiple() +> +bdrv_list_refresh_perms() +> +bdrv_topological_dfs() +> +bdrv_do_refresh_perms() +> +bdrv_reopen_commit() +> +> +In the stack above bdrv_reopen_set_read_only() is only being called for +> +the parent (libvirt-1-format) node. There're 2 lists: BDSs from +> +refresh_list are used by bdrv_drv_set_perm and this leads to actual +> +reopen with RO of the file descriptor. And then there's reopen queue +> +bs_queue -- BDSs from this queue get their parameters updated. While +> +refresh_list ends up having the whole subtree (including children, this +> +is done in bdrv_topological_dfs()) bs_queue only has the parent. And +> +that is because storage (child) node's (bs->inherits_from == NULL), so +> +bdrv_reopen_queue_child() never adds it to the queue. Could it be the +> +source of this bug? +> +> +Anyway, would greatly appreciate a clarification. +> +> +Andrey +Friendly ping. Could somebody confirm that it is a bug indeed? + diff --git a/classification_output/01/other/3501174 b/classification_output/01/other/3501174 new file mode 100644 index 000000000..6543b7f26 --- /dev/null +++ b/classification_output/01/other/3501174 @@ -0,0 +1,2793 @@ +other: 0.727 +instruction: 0.670 +semantic: 0.665 +mistranslation: 0.650 + +[Qemu-devel] 答复: Re: 答复: Re: [BUG]COLO failover hang + +Thank youã + +I have test areadyã + +When the Primary Node panic,the Secondary Node qemu hang at the same placeã + +Incorrding +http://wiki.qemu-project.org/Features/COLO +ï¼kill Primary Node qemu +will not produce the problem,but Primary Node panic canã + +I think due to the feature of channel does not support +QIO_CHANNEL_FEATURE_SHUTDOWN. + + +when failover,channel_shutdown could not shut down the channel. + + +so the colo_process_incoming_thread will hang at recvmsg. 
+ + +I test a patch: + + +diff --git a/migration/socket.c b/migration/socket.c + + +index 13966f1..d65a0ea 100644 + + +--- a/migration/socket.c + + ++++ b/migration/socket.c + + +@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel +*ioc, + + + } + + + + + + trace_migration_socket_incoming_accepted() + + + + + + qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") + + ++ qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) + + + migration_channel_process_incoming(migrate_get_current(), + + + QIO_CHANNEL(sioc)) + + + object_unref(OBJECT(sioc)) + + + + +My test will not hang any more. + + + + + + + + + + + + + + + + + +åå§é®ä»¶ + + + +åä»¶äººï¼ address@hidden +æ¶ä»¶äººï¼ç广10165992 address@hidden +æéäººï¼ address@hidden address@hidden +æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58 +主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: [BUG]COLO failover hang + + + + + +Hi,Wang. + +You can test this branch: +https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk +and please follow wiki ensure your own configuration correctly. +http://wiki.qemu-project.org/Features/COLO +Thanks + +Zhang Chen + + +On 03/21/2017 03:27 PM, address@hidden wrote: +ï¼ +ï¼ hi. +ï¼ +ï¼ I test the git qemu master have the same problem. +ï¼ +ï¼ (gdb) bt +ï¼ +ï¼ #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, +ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 +ï¼ +ï¼ #1 0x00007f658e4aa0c2 in qio_channel_read +ï¼ (address@hidden, address@hidden "", +ï¼ address@hidden, address@hidden) at io/channel.c:114 +ï¼ +ï¼ #2 0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, +ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at +ï¼ migration/qemu-file-channel.c:78 +ï¼ +ï¼ #3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at +ï¼ migration/qemu-file.c:295 +ï¼ +ï¼ #4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, +ï¼ address@hidden) at migration/qemu-file.c:555 +ï¼ +ï¼ #5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at +ï¼ migration/qemu-file.c:568 +ï¼ +ï¼ #6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at +ï¼ migration/qemu-file.c:648 +ï¼ +ï¼ #7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, +ï¼ address@hidden) at migration/colo.c:244 +ï¼ +ï¼ #8 0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized +ï¼ outï¼, address@hidden, +ï¼ address@hidden) +ï¼ +ï¼ at migration/colo.c:264 +ï¼ +ï¼ #9 0x00007f658e3e740e in colo_process_incoming_thread +ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577 +ï¼ +ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 +ï¼ +ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6 +ï¼ +ï¼ (gdb) p ioc-ï¼name +ï¼ +ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming" +ï¼ +ï¼ (gdb) p ioc-ï¼features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN +ï¼ +ï¼ $3 = 0 +ï¼ +ï¼ +ï¼ (gdb) bt +ï¼ +ï¼ #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, +ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 +ï¼ +ï¼ #1 0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at +ï¼ gmain.c:3054 +ï¼ +ï¼ #2 g_main_context_dispatch (context=ï¼optimized outï¼, +ï¼ address@hidden) at gmain.c:3630 +ï¼ +ï¼ #3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 +ï¼ +ï¼ #4 os_host_main_loop_wait (timeout=ï¼optimized outï¼) at +ï¼ util/main-loop.c:258 +ï¼ +ï¼ #5 main_loop_wait (address@hidden) at +ï¼ util/main-loop.c:506 +ï¼ +ï¼ #6 0x00007fdccb526187 in main_loop () at 
vl.c:1898 +ï¼ +ï¼ #7 main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized +ï¼ outï¼) at vl.c:4709 +ï¼ +ï¼ (gdb) p ioc-ï¼features +ï¼ +ï¼ $1 = 6 +ï¼ +ï¼ (gdb) p ioc-ï¼name +ï¼ +ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener" +ï¼ +ï¼ +ï¼ May be socket_accept_incoming_migration should +ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? +ï¼ +ï¼ +ï¼ thank you. +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ åå§é®ä»¶ +ï¼ address@hidden +ï¼ address@hidden +ï¼ address@hidden@huawei.comï¼ +ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46 +ï¼ *主 é¢ ï¼**Re: [Qemu-devel] COLO failover hang* +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ On 03/15/2017 05:06 PM, wangguang wrote: +ï¼ ï¼ am testing QEMU COLO feature described here [QEMU +ï¼ ï¼ Wiki]( +http://wiki.qemu-project.org/Features/COLO +). +ï¼ ï¼ +ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang. +ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv. +ï¼ ï¼ And I run { 'execute': 'nbd-server-stop' } and { "execute": +ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's +ï¼ ï¼ monitor,the Secondary Node qemu still hang at recvmsg . +ï¼ ï¼ +ï¼ ï¼ I found that the colo in qemu is not complete yet. +ï¼ ï¼ Do the colo have any plan for development? +ï¼ +ï¼ Yes, We are developing. You can see some of patch we pushing. +ï¼ +ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated! +ï¼ +ï¼ In our internal version can run it successfully, +ï¼ The failover detail you can ask Zhanghailiang for help. +ï¼ Next time if you have some question about COLO, +ï¼ please cc me and zhanghailiang address@hidden +ï¼ +ï¼ +ï¼ Thanks +ï¼ Zhang Chen +ï¼ +ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ centos7.2+qemu2.7.50 +ï¼ ï¼ (gdb) bt +ï¼ ï¼ #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 +ï¼ ï¼ #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼, +ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at +ï¼ ï¼ io/channel-socket.c:497 +ï¼ ï¼ #2 0x00007f3e03329472 in qio_channel_read (address@hidden, +ï¼ ï¼ address@hidden "", address@hidden, +ï¼ ï¼ address@hidden) at io/channel.c:97 +ï¼ ï¼ #3 0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼, +ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at +ï¼ ï¼ migration/qemu-file-channel.c:78 +ï¼ ï¼ #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at +ï¼ ï¼ migration/qemu-file.c:257 +ï¼ ï¼ #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, +ï¼ ï¼ address@hidden) at migration/qemu-file.c:510 +ï¼ ï¼ #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at +ï¼ ï¼ migration/qemu-file.c:523 +ï¼ ï¼ #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at +ï¼ ï¼ migration/qemu-file.c:603 +ï¼ ï¼ #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, +ï¼ ï¼ address@hidden) at migration/colo.c:215 +ï¼ ï¼ #9 0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48, +ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at +ï¼ ï¼ migration/colo.c:546 +ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at +ï¼ ï¼ migration/colo.c:649 +ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 +ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6 +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ -- +ï¼ ï¼ View this message in context: +http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html +ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com. 
+ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ +ï¼ -- +ï¼ Thanks +ï¼ Zhang Chen +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ + +-- +Thanks +Zhang Chen + +Hi, + +On 2017/3/21 16:10, address@hidden wrote: +Thank youã + +I have test areadyã + +When the Primary Node panic,the Secondary Node qemu hang at the same placeã + +Incorrding +http://wiki.qemu-project.org/Features/COLO +ï¼kill Primary Node qemu +will not produce the problem,but Primary Node panic canã + +I think due to the feature of channel does not support +QIO_CHANNEL_FEATURE_SHUTDOWN. +Yes, you are right, when we do failover for primary/secondary VM, we will +shutdown the related +fd in case it is stuck in the read/write fd. + +It seems that you didn't follow the above introduction exactly to do the test. +Could you +share your test procedures ? Especially the commands used in the test. + +Thanks, +Hailiang +when failover,channel_shutdown could not shut down the channel. + + +so the colo_process_incoming_thread will hang at recvmsg. + + +I test a patch: + + +diff --git a/migration/socket.c b/migration/socket.c + + +index 13966f1..d65a0ea 100644 + + +--- a/migration/socket.c + + ++++ b/migration/socket.c + + +@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel +*ioc, + + + } + + + + + + trace_migration_socket_incoming_accepted() + + + + + + qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") + + ++ qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) + + + migration_channel_process_incoming(migrate_get_current(), + + + QIO_CHANNEL(sioc)) + + + object_unref(OBJECT(sioc)) + + + + +My test will not hang any more. + + + + + + + + + + + + + + + + + +åå§é®ä»¶ + + + +åä»¶äººï¼ address@hidden +æ¶ä»¶äººï¼ç广10165992 address@hidden +æéäººï¼ address@hidden address@hidden +æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58 +主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: [BUG]COLO failover hang + + + + + +Hi,Wang. + +You can test this branch: +https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk +and please follow wiki ensure your own configuration correctly. +http://wiki.qemu-project.org/Features/COLO +Thanks + +Zhang Chen + + +On 03/21/2017 03:27 PM, address@hidden wrote: +ï¼ +ï¼ hi. +ï¼ +ï¼ I test the git qemu master have the same problem. 
+ï¼ +ï¼ (gdb) bt +ï¼ +ï¼ #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, +ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 +ï¼ +ï¼ #1 0x00007f658e4aa0c2 in qio_channel_read +ï¼ (address@hidden, address@hidden "", +ï¼ address@hidden, address@hidden) at io/channel.c:114 +ï¼ +ï¼ #2 0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, +ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at +ï¼ migration/qemu-file-channel.c:78 +ï¼ +ï¼ #3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at +ï¼ migration/qemu-file.c:295 +ï¼ +ï¼ #4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, +ï¼ address@hidden) at migration/qemu-file.c:555 +ï¼ +ï¼ #5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at +ï¼ migration/qemu-file.c:568 +ï¼ +ï¼ #6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at +ï¼ migration/qemu-file.c:648 +ï¼ +ï¼ #7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, +ï¼ address@hidden) at migration/colo.c:244 +ï¼ +ï¼ #8 0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized +ï¼ outï¼, address@hidden, +ï¼ address@hidden) +ï¼ +ï¼ at migration/colo.c:264 +ï¼ +ï¼ #9 0x00007f658e3e740e in colo_process_incoming_thread +ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577 +ï¼ +ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 +ï¼ +ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6 +ï¼ +ï¼ (gdb) p ioc-ï¼name +ï¼ +ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming" +ï¼ +ï¼ (gdb) p ioc-ï¼features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN +ï¼ +ï¼ $3 = 0 +ï¼ +ï¼ +ï¼ (gdb) bt +ï¼ +ï¼ #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, +ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 +ï¼ +ï¼ #1 0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at +ï¼ gmain.c:3054 +ï¼ +ï¼ #2 g_main_context_dispatch (context=ï¼optimized outï¼, +ï¼ address@hidden) at gmain.c:3630 +ï¼ +ï¼ #3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 +ï¼ +ï¼ #4 os_host_main_loop_wait (timeout=ï¼optimized outï¼) at +ï¼ util/main-loop.c:258 +ï¼ +ï¼ #5 main_loop_wait (address@hidden) at +ï¼ util/main-loop.c:506 +ï¼ +ï¼ #6 0x00007fdccb526187 in main_loop () at vl.c:1898 +ï¼ +ï¼ #7 main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized +ï¼ outï¼) at vl.c:4709 +ï¼ +ï¼ (gdb) p ioc-ï¼features +ï¼ +ï¼ $1 = 6 +ï¼ +ï¼ (gdb) p ioc-ï¼name +ï¼ +ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener" +ï¼ +ï¼ +ï¼ May be socket_accept_incoming_migration should +ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? +ï¼ +ï¼ +ï¼ thank you. +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ åå§é®ä»¶ +ï¼ address@hidden +ï¼ address@hidden +ï¼ address@hidden@huawei.comï¼ +ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46 +ï¼ *主 é¢ ï¼**Re: [Qemu-devel] COLO failover hang* +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ On 03/15/2017 05:06 PM, wangguang wrote: +ï¼ ï¼ am testing QEMU COLO feature described here [QEMU +ï¼ ï¼ Wiki]( +http://wiki.qemu-project.org/Features/COLO +). +ï¼ ï¼ +ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang. +ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv. +ï¼ ï¼ And I run { 'execute': 'nbd-server-stop' } and { "execute": +ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's +ï¼ ï¼ monitor,the Secondary Node qemu still hang at recvmsg . +ï¼ ï¼ +ï¼ ï¼ I found that the colo in qemu is not complete yet. +ï¼ ï¼ Do the colo have any plan for development? +ï¼ +ï¼ Yes, We are developing. You can see some of patch we pushing. 
+ï¼ +ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated! +ï¼ +ï¼ In our internal version can run it successfully, +ï¼ The failover detail you can ask Zhanghailiang for help. +ï¼ Next time if you have some question about COLO, +ï¼ please cc me and zhanghailiang address@hidden +ï¼ +ï¼ +ï¼ Thanks +ï¼ Zhang Chen +ï¼ +ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ centos7.2+qemu2.7.50 +ï¼ ï¼ (gdb) bt +ï¼ ï¼ #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 +ï¼ ï¼ #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼, +ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at +ï¼ ï¼ io/channel-socket.c:497 +ï¼ ï¼ #2 0x00007f3e03329472 in qio_channel_read (address@hidden, +ï¼ ï¼ address@hidden "", address@hidden, +ï¼ ï¼ address@hidden) at io/channel.c:97 +ï¼ ï¼ #3 0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼, +ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at +ï¼ ï¼ migration/qemu-file-channel.c:78 +ï¼ ï¼ #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at +ï¼ ï¼ migration/qemu-file.c:257 +ï¼ ï¼ #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, +ï¼ ï¼ address@hidden) at migration/qemu-file.c:510 +ï¼ ï¼ #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at +ï¼ ï¼ migration/qemu-file.c:523 +ï¼ ï¼ #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at +ï¼ ï¼ migration/qemu-file.c:603 +ï¼ ï¼ #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, +ï¼ ï¼ address@hidden) at migration/colo.c:215 +ï¼ ï¼ #9 0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48, +ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at +ï¼ ï¼ migration/colo.c:546 +ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at +ï¼ ï¼ migration/colo.c:649 +ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 +ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6 +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ -- +ï¼ ï¼ View this message in context: +http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html +ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com. +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ +ï¼ -- +ï¼ Thanks +ï¼ Zhang Chen +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ + +Hi, + +Thanks for reporting this, and i confirmed it in my test, and it is a bug. + +Though we tried to call qemu_file_shutdown() to shutdown the related fd, in +case COLO thread/incoming thread is stuck in read/write() while do failover, +but it didn't take effect, because all the fd used by COLO (also migration) +has been wrapped by qio channel, and it will not call the shutdown API if +we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN). + +Cc: Dr. David Alan Gilbert <address@hidden> + +I doubted migration cancel has the same problem, it may be stuck in write() +if we tried to cancel migration. + +void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error +**errp) +{ + qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing"); + migration_channel_connect(s, ioc, NULL); + ... ... +We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN) above, +and the +migrate_fd_cancel() +{ + ... ... + if (s->state == MIGRATION_STATUS_CANCELLING && f) { + qemu_file_shutdown(f); --> This will not take effect. No ? 
+ } +} + +Thanks, +Hailiang + +On 2017/3/21 16:10, address@hidden wrote: +Thank youã + +I have test areadyã + +When the Primary Node panic,the Secondary Node qemu hang at the same placeã + +Incorrding +http://wiki.qemu-project.org/Features/COLO +ï¼kill Primary Node qemu +will not produce the problem,but Primary Node panic canã + +I think due to the feature of channel does not support +QIO_CHANNEL_FEATURE_SHUTDOWN. + + +when failover,channel_shutdown could not shut down the channel. + + +so the colo_process_incoming_thread will hang at recvmsg. + + +I test a patch: + + +diff --git a/migration/socket.c b/migration/socket.c + + +index 13966f1..d65a0ea 100644 + + +--- a/migration/socket.c + + ++++ b/migration/socket.c + + +@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel +*ioc, + + + } + + + + + + trace_migration_socket_incoming_accepted() + + + + + + qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") + + ++ qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) + + + migration_channel_process_incoming(migrate_get_current(), + + + QIO_CHANNEL(sioc)) + + + object_unref(OBJECT(sioc)) + + + + +My test will not hang any more. + + + + + + + + + + + + + + + + + +åå§é®ä»¶ + + + +åä»¶äººï¼ address@hidden +æ¶ä»¶äººï¼ç广10165992 address@hidden +æéäººï¼ address@hidden address@hidden +æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58 +主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: [BUG]COLO failover hang + + + + + +Hi,Wang. + +You can test this branch: +https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk +and please follow wiki ensure your own configuration correctly. +http://wiki.qemu-project.org/Features/COLO +Thanks + +Zhang Chen + + +On 03/21/2017 03:27 PM, address@hidden wrote: +ï¼ +ï¼ hi. +ï¼ +ï¼ I test the git qemu master have the same problem. 
+ï¼ +ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated! +ï¼ +ï¼ In our internal version can run it successfully, +ï¼ The failover detail you can ask Zhanghailiang for help. +ï¼ Next time if you have some question about COLO, +ï¼ please cc me and zhanghailiang address@hidden +ï¼ +ï¼ +ï¼ Thanks +ï¼ Zhang Chen +ï¼ +ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ centos7.2+qemu2.7.50 +ï¼ ï¼ (gdb) bt +ï¼ ï¼ #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 +ï¼ ï¼ #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼, +ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at +ï¼ ï¼ io/channel-socket.c:497 +ï¼ ï¼ #2 0x00007f3e03329472 in qio_channel_read (address@hidden, +ï¼ ï¼ address@hidden "", address@hidden, +ï¼ ï¼ address@hidden) at io/channel.c:97 +ï¼ ï¼ #3 0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼, +ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at +ï¼ ï¼ migration/qemu-file-channel.c:78 +ï¼ ï¼ #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at +ï¼ ï¼ migration/qemu-file.c:257 +ï¼ ï¼ #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, +ï¼ ï¼ address@hidden) at migration/qemu-file.c:510 +ï¼ ï¼ #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at +ï¼ ï¼ migration/qemu-file.c:523 +ï¼ ï¼ #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at +ï¼ ï¼ migration/qemu-file.c:603 +ï¼ ï¼ #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, +ï¼ ï¼ address@hidden) at migration/colo.c:215 +ï¼ ï¼ #9 0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48, +ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at +ï¼ ï¼ migration/colo.c:546 +ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at +ï¼ ï¼ migration/colo.c:649 +ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 +ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6 +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ -- +ï¼ ï¼ View this message in context: +http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html +ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com. +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ +ï¼ -- +ï¼ Thanks +ï¼ Zhang Chen +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ + +* Hailiang Zhang (address@hidden) wrote: +> +Hi, +> +> +Thanks for reporting this, and i confirmed it in my test, and it is a bug. +> +> +Though we tried to call qemu_file_shutdown() to shutdown the related fd, in +> +case COLO thread/incoming thread is stuck in read/write() while do failover, +> +but it didn't take effect, because all the fd used by COLO (also migration) +> +has been wrapped by qio channel, and it will not call the shutdown API if +> +we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), +> +QIO_CHANNEL_FEATURE_SHUTDOWN). +> +> +Cc: Dr. David Alan Gilbert <address@hidden> +> +> +I doubted migration cancel has the same problem, it may be stuck in write() +> +if we tried to cancel migration. +> +> +void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error +> +**errp) +> +{ +> +qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing"); +> +migration_channel_connect(s, ioc, NULL); +> +... ... +> +We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), +> +QIO_CHANNEL_FEATURE_SHUTDOWN) above, +> +and the +> +migrate_fd_cancel() +> +{ +> +... ... +> +if (s->state == MIGRATION_STATUS_CANCELLING && f) { +> +qemu_file_shutdown(f); --> This will not take effect. No ? +> +} +> +} +(cc'd in Daniel Berrange). 
+I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN); +at the +top of qio_channel_socket_new; so I think that's safe isn't it? + +Dave + +> +Thanks, +> +Hailiang +> +> +On 2017/3/21 16:10, address@hidden wrote: +> +> Thank youã +> +> +> +> I have test areadyã +> +> +> +> When the Primary Node panic,the Secondary Node qemu hang at the same placeã +> +> +> +> Incorrding +http://wiki.qemu-project.org/Features/COLO +ï¼kill Primary Node +> +> qemu will not produce the problem,but Primary Node panic canã +> +> +> +> I think due to the feature of channel does not support +> +> QIO_CHANNEL_FEATURE_SHUTDOWN. +> +> +> +> +> +> when failover,channel_shutdown could not shut down the channel. +> +> +> +> +> +> so the colo_process_incoming_thread will hang at recvmsg. +> +> +> +> +> +> I test a patch: +> +> +> +> +> +> diff --git a/migration/socket.c b/migration/socket.c +> +> +> +> +> +> index 13966f1..d65a0ea 100644 +> +> +> +> +> +> --- a/migration/socket.c +> +> +> +> +> +> +++ b/migration/socket.c +> +> +> +> +> +> @@ -147,8 +147,9 @@ static gboolean +> +> socket_accept_incoming_migration(QIOChannel *ioc, +> +> +> +> +> +> } +> +> +> +> +> +> +> +> +> +> +> +> trace_migration_socket_incoming_accepted() +> +> +> +> +> +> +> +> +> +> +> +> qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") +> +> +> +> +> +> + qio_channel_set_feature(QIO_CHANNEL(sioc), +> +> QIO_CHANNEL_FEATURE_SHUTDOWN) +> +> +> +> +> +> migration_channel_process_incoming(migrate_get_current(), +> +> +> +> +> +> QIO_CHANNEL(sioc)) +> +> +> +> +> +> object_unref(OBJECT(sioc)) +> +> +> +> +> +> +> +> +> +> My test will not hang any more. +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> +> åå§é®ä»¶ +> +> +> +> +> +> +> +> åä»¶äººï¼ address@hidden +> +> æ¶ä»¶äººï¼ç广10165992 address@hidden +> +> æéäººï¼ address@hidden address@hidden +> +> æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58 +> +> 主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: [BUG]COLO failover hang +> +> +> +> +> +> +> +> +> +> +> +> Hi,Wang. +> +> +> +> You can test this branch: +> +> +> +> +https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk +> +> +> +> and please follow wiki ensure your own configuration correctly. +> +> +> +> +http://wiki.qemu-project.org/Features/COLO +> +> +> +> +> +> Thanks +> +> +> +> Zhang Chen +> +> +> +> +> +> On 03/21/2017 03:27 PM, address@hidden wrote: +> +> ï¼ +> +> ï¼ hi. +> +> ï¼ +> +> ï¼ I test the git qemu master have the same problem. 
+> +> ï¼ ï¼ +> +> ï¼ ï¼ +> +> ï¼ ï¼ +> +> ï¼ ï¼ +> +> ï¼ +> +> ï¼ -- +> +> ï¼ Thanks +> +> ï¼ Zhang Chen +> +> ï¼ +> +> ï¼ +> +> ï¼ +> +> ï¼ +> +> ï¼ +> +> +> +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +On 2017/3/21 19:56, Dr. David Alan Gilbert wrote: +* Hailiang Zhang (address@hidden) wrote: +Hi, + +Thanks for reporting this, and i confirmed it in my test, and it is a bug. + +Though we tried to call qemu_file_shutdown() to shutdown the related fd, in +case COLO thread/incoming thread is stuck in read/write() while do failover, +but it didn't take effect, because all the fd used by COLO (also migration) +has been wrapped by qio channel, and it will not call the shutdown API if +we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN). + +Cc: Dr. David Alan Gilbert <address@hidden> + +I doubted migration cancel has the same problem, it may be stuck in write() +if we tried to cancel migration. + +void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error +**errp) +{ + qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing"); + migration_channel_connect(s, ioc, NULL); + ... ... +We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN) above, +and the +migrate_fd_cancel() +{ + ... ... + if (s->state == MIGRATION_STATUS_CANCELLING && f) { + qemu_file_shutdown(f); --> This will not take effect. No ? + } +} +(cc'd in Daniel Berrange). +I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN); +at the +top of qio_channel_socket_new; so I think that's safe isn't it? +Hmm, you are right, this problem is only exist for the migration incoming fd, +thanks. +Dave +Thanks, +Hailiang + +On 2017/3/21 16:10, address@hidden wrote: +Thank youã + +I have test areadyã + +When the Primary Node panic,the Secondary Node qemu hang at the same placeã + +Incorrding +http://wiki.qemu-project.org/Features/COLO +ï¼kill Primary Node qemu +will not produce the problem,but Primary Node panic canã + +I think due to the feature of channel does not support +QIO_CHANNEL_FEATURE_SHUTDOWN. + + +when failover,channel_shutdown could not shut down the channel. + + +so the colo_process_incoming_thread will hang at recvmsg. + + +I test a patch: + + +diff --git a/migration/socket.c b/migration/socket.c + + +index 13966f1..d65a0ea 100644 + + +--- a/migration/socket.c + + ++++ b/migration/socket.c + + +@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel +*ioc, + + + } + + + + + + trace_migration_socket_incoming_accepted() + + + + + + qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") + + ++ qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) + + + migration_channel_process_incoming(migrate_get_current(), + + + QIO_CHANNEL(sioc)) + + + object_unref(OBJECT(sioc)) + + + + +My test will not hang any more. + + + + + + + + + + + + + + + + + +åå§é®ä»¶ + + + +åä»¶äººï¼ address@hidden +æ¶ä»¶äººï¼ç广10165992 address@hidden +æéäººï¼ address@hidden address@hidden +æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58 +主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: [BUG]COLO failover hang + + + + + +Hi,Wang. + +You can test this branch: +https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk +and please follow wiki ensure your own configuration correctly. +http://wiki.qemu-project.org/Features/COLO +Thanks + +Zhang Chen + + +On 03/21/2017 03:27 PM, address@hidden wrote: +ï¼ +ï¼ hi. +ï¼ +ï¼ I test the git qemu master have the same problem. 
+ï¼ +ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated! +ï¼ +ï¼ In our internal version can run it successfully, +ï¼ The failover detail you can ask Zhanghailiang for help. +ï¼ Next time if you have some question about COLO, +ï¼ please cc me and zhanghailiang address@hidden +ï¼ +ï¼ +ï¼ Thanks +ï¼ Zhang Chen +ï¼ +ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ centos7.2+qemu2.7.50 +ï¼ ï¼ (gdb) bt +ï¼ ï¼ #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 +ï¼ ï¼ #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼, +ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at +ï¼ ï¼ io/channel-socket.c:497 +ï¼ ï¼ #2 0x00007f3e03329472 in qio_channel_read (address@hidden, +ï¼ ï¼ address@hidden "", address@hidden, +ï¼ ï¼ address@hidden) at io/channel.c:97 +ï¼ ï¼ #3 0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼, +ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at +ï¼ ï¼ migration/qemu-file-channel.c:78 +ï¼ ï¼ #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at +ï¼ ï¼ migration/qemu-file.c:257 +ï¼ ï¼ #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, +ï¼ ï¼ address@hidden) at migration/qemu-file.c:510 +ï¼ ï¼ #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at +ï¼ ï¼ migration/qemu-file.c:523 +ï¼ ï¼ #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at +ï¼ ï¼ migration/qemu-file.c:603 +ï¼ ï¼ #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, +ï¼ ï¼ address@hidden) at migration/colo.c:215 +ï¼ ï¼ #9 0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48, +ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at +ï¼ ï¼ migration/colo.c:546 +ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at +ï¼ ï¼ migration/colo.c:649 +ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 +ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6 +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ -- +ï¼ ï¼ View this message in context: +http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html +ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com. +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ +ï¼ -- +ï¼ Thanks +ï¼ Zhang Chen +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +. + +* Hailiang Zhang (address@hidden) wrote: +> +On 2017/3/21 19:56, Dr. David Alan Gilbert wrote: +> +> * Hailiang Zhang (address@hidden) wrote: +> +> > Hi, +> +> > +> +> > Thanks for reporting this, and i confirmed it in my test, and it is a bug. +> +> > +> +> > Though we tried to call qemu_file_shutdown() to shutdown the related fd, +> +> > in +> +> > case COLO thread/incoming thread is stuck in read/write() while do +> +> > failover, +> +> > but it didn't take effect, because all the fd used by COLO (also +> +> > migration) +> +> > has been wrapped by qio channel, and it will not call the shutdown API if +> +> > we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), +> +> > QIO_CHANNEL_FEATURE_SHUTDOWN). +> +> > +> +> > Cc: Dr. David Alan Gilbert <address@hidden> +> +> > +> +> > I doubted migration cancel has the same problem, it may be stuck in +> +> > write() +> +> > if we tried to cancel migration. +> +> > +> +> > void fd_start_outgoing_migration(MigrationState *s, const char *fdname, +> +> > Error **errp) +> +> > { +> +> > qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing"); +> +> > migration_channel_connect(s, ioc, NULL); +> +> > ... ... 
+> +> > We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), +> +> > QIO_CHANNEL_FEATURE_SHUTDOWN) above, +> +> > and the +> +> > migrate_fd_cancel() +> +> > { +> +> > ... ... +> +> > if (s->state == MIGRATION_STATUS_CANCELLING && f) { +> +> > qemu_file_shutdown(f); --> This will not take effect. No ? +> +> > } +> +> > } +> +> +> +> (cc'd in Daniel Berrange). +> +> I see that we call qio_channel_set_feature(ioc, +> +> QIO_CHANNEL_FEATURE_SHUTDOWN); at the +> +> top of qio_channel_socket_new; so I think that's safe isn't it? +> +> +> +> +Hmm, you are right, this problem is only exist for the migration incoming fd, +> +thanks. +Yes, and I don't think we normally do a cancel on the incoming side of a +migration. + +Dave + +> +> Dave +> +> +> +> > Thanks, +> +> > Hailiang +> +> > +> +> > On 2017/3/21 16:10, address@hidden wrote: +> +> > > Thank youã +> +> > > +> +> > > I have test areadyã +> +> > > +> +> > > When the Primary Node panic,the Secondary Node qemu hang at the same +> +> > > placeã +> +> > > +> +> > > Incorrding +http://wiki.qemu-project.org/Features/COLO +ï¼kill Primary +> +> > > Node qemu will not produce the problem,but Primary Node panic canã +> +> > > +> +> > > I think due to the feature of channel does not support +> +> > > QIO_CHANNEL_FEATURE_SHUTDOWN. +> +> > > +> +> > > +> +> > > when failover,channel_shutdown could not shut down the channel. +> +> > > +> +> > > +> +> > > so the colo_process_incoming_thread will hang at recvmsg. +> +> > > +> +> > > +> +> > > I test a patch: +> +> > > +> +> > > +> +> > > diff --git a/migration/socket.c b/migration/socket.c +> +> > > +> +> > > +> +> > > index 13966f1..d65a0ea 100644 +> +> > > +> +> > > +> +> > > --- a/migration/socket.c +> +> > > +> +> > > +> +> > > +++ b/migration/socket.c +> +> > > +> +> > > +> +> > > @@ -147,8 +147,9 @@ static gboolean +> +> > > socket_accept_incoming_migration(QIOChannel *ioc, +> +> > > +> +> > > +> +> > > } +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > trace_migration_socket_incoming_accepted() +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > qio_channel_set_name(QIO_CHANNEL(sioc), +> +> > > "migration-socket-incoming") +> +> > > +> +> > > +> +> > > + qio_channel_set_feature(QIO_CHANNEL(sioc), +> +> > > QIO_CHANNEL_FEATURE_SHUTDOWN) +> +> > > +> +> > > +> +> > > migration_channel_process_incoming(migrate_get_current(), +> +> > > +> +> > > +> +> > > QIO_CHANNEL(sioc)) +> +> > > +> +> > > +> +> > > object_unref(OBJECT(sioc)) +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > My test will not hang any more. +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > åå§é®ä»¶ +> +> > > +> +> > > +> +> > > +> +> > > åä»¶äººï¼ address@hidden +> +> > > æ¶ä»¶äººï¼ç广10165992 address@hidden +> +> > > æéäººï¼ address@hidden address@hidden +> +> > > æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58 +> +> > > 主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: [BUG]COLO failover hang +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > +> +> > > Hi,Wang. +> +> > > +> +> > > You can test this branch: +> +> > > +> +> > > +https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk +> +> > > +> +> > > and please follow wiki ensure your own configuration correctly. 
+> +> > > +> +> > > +http://wiki.qemu-project.org/Features/COLO +> +> > > +> +> > > +> +> > > Thanks +> +> > > +> +> > > Zhang Chen +> +> > > +> +> > > +> +> > > On 03/21/2017 03:27 PM, address@hidden wrote: +> +> > > ï¼ +> +> > > ï¼ hi. +> +> > > ï¼ +> +> > > ï¼ I test the git qemu master have the same problem. +> +> > > ï¼ +> +> > > ï¼ (gdb) bt +> +> > > ï¼ +> +> > > ï¼ #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, +> +> > > ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 +> +> > > ï¼ +> +> > > ï¼ #1 0x00007f658e4aa0c2 in qio_channel_read +> +> > > ï¼ (address@hidden, address@hidden "", +> +> > > ï¼ address@hidden, address@hidden) at io/channel.c:114 +> +> > > ï¼ +> +> > > ï¼ #2 0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, +> +> > > ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at +> +> > > ï¼ migration/qemu-file-channel.c:78 +> +> > > ï¼ +> +> > > ï¼ #3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at +> +> > > ï¼ migration/qemu-file.c:295 +> +> > > ï¼ +> +> > > ï¼ #4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, +> +> > > ï¼ address@hidden) at migration/qemu-file.c:555 +> +> > > ï¼ +> +> > > ï¼ #5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at +> +> > > ï¼ migration/qemu-file.c:568 +> +> > > ï¼ +> +> > > ï¼ #6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at +> +> > > ï¼ migration/qemu-file.c:648 +> +> > > ï¼ +> +> > > ï¼ #7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, +> +> > > ï¼ address@hidden) at migration/colo.c:244 +> +> > > ï¼ +> +> > > ï¼ #8 0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized +> +> > > ï¼ outï¼, address@hidden, +> +> > > ï¼ address@hidden) +> +> > > ï¼ +> +> > > ï¼ at migration/colo.c:264 +> +> > > ï¼ +> +> > > ï¼ #9 0x00007f658e3e740e in colo_process_incoming_thread +> +> > > ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577 +> +> > > ï¼ +> +> > > ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 +> +> > > ï¼ +> +> > > ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6 +> +> > > ï¼ +> +> > > ï¼ (gdb) p ioc-ï¼name +> +> > > ï¼ +> +> > > ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming" +> +> > > ï¼ +> +> > > ï¼ (gdb) p ioc-ï¼features Do not support +> +> > > QIO_CHANNEL_FEATURE_SHUTDOWN +> +> > > ï¼ +> +> > > ï¼ $3 = 0 +> +> > > ï¼ +> +> > > ï¼ +> +> > > ï¼ (gdb) bt +> +> > > ï¼ +> +> > > ï¼ #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, +> +> > > ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 +> +> > > ï¼ +> +> > > ï¼ #1 0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at +> +> > > ï¼ gmain.c:3054 +> +> > > ï¼ +> +> > > ï¼ #2 g_main_context_dispatch (context=ï¼optimized outï¼, +> +> > > ï¼ address@hidden) at gmain.c:3630 +> +> > > ï¼ +> +> > > ï¼ #3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 +> +> > > ï¼ +> +> > > ï¼ #4 os_host_main_loop_wait (timeout=ï¼optimized outï¼) at +> +> > > ï¼ util/main-loop.c:258 +> +> > > ï¼ +> +> > > ï¼ #5 main_loop_wait (address@hidden) at +> +> > > ï¼ util/main-loop.c:506 +> +> > > ï¼ +> +> > > ï¼ #6 0x00007fdccb526187 in main_loop () at vl.c:1898 +> +> > > ï¼ +> +> > > ï¼ #7 main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized +> +> > > ï¼ outï¼) at vl.c:4709 +> +> > > ï¼ +> +> > > ï¼ (gdb) p ioc-ï¼features +> +> > > ï¼ +> +> > > ï¼ $1 = 6 +> +> > > ï¼ +> +> > > ï¼ (gdb) p ioc-ï¼name +> +> > > ï¼ +> +> > > ï¼ $2 = 0x7fdcce1b1ab0 
"migration-socket-listener" +> +> > > ï¼ +> +> > > ï¼ +> +> > > ï¼ May be socket_accept_incoming_migration should +> +> > > ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? +> +> > > ï¼ +> +> > > ï¼ +> +> > > ï¼ thank you. +> +> > > ï¼ +> +> > > ï¼ +> +> > > ï¼ +> +> > > ï¼ +> +> > > ï¼ +> +> > > ï¼ åå§é®ä»¶ +> +> > > ï¼ address@hidden +> +> > > ï¼ address@hidden +> +> > > ï¼ address@hidden@huawei.comï¼ +> +> > > ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46 +> +> > > ï¼ *主 é¢ ï¼**Re: [Qemu-devel] COLO failover hang* +> +> > > ï¼ +> +> > > ï¼ +> +> > > ï¼ +> +> > > ï¼ +> +> > > ï¼ On 03/15/2017 05:06 PM, wangguang wrote: +> +> > > ï¼ ï¼ am testing QEMU COLO feature described here [QEMU +> +> > > ï¼ ï¼ Wiki]( +http://wiki.qemu-project.org/Features/COLO +). +> +> > > ï¼ ï¼ +> +> > > ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang. +> +> > > ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv. +> +> > > ï¼ ï¼ And I run { 'execute': 'nbd-server-stop' } and { "execute": +> +> > > ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's +> +> > > ï¼ ï¼ monitor,the Secondary Node qemu still hang at recvmsg . +> +> > > ï¼ ï¼ +> +> > > ï¼ ï¼ I found that the colo in qemu is not complete yet. +> +> > > ï¼ ï¼ Do the colo have any plan for development? +> +> > > ï¼ +> +> > > ï¼ Yes, We are developing. You can see some of patch we pushing. +> +> > > ï¼ +> +> > > ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated! +> +> > > ï¼ +> +> > > ï¼ In our internal version can run it successfully, +> +> > > ï¼ The failover detail you can ask Zhanghailiang for help. +> +> > > ï¼ Next time if you have some question about COLO, +> +> > > ï¼ please cc me and zhanghailiang address@hidden +> +> > > ï¼ +> +> > > ï¼ +> +> > > ï¼ Thanks +> +> > > ï¼ Zhang Chen +> +> > > ï¼ +> +> > > ï¼ +> +> > > ï¼ ï¼ +> +> > > ï¼ ï¼ +> +> > > ï¼ ï¼ +> +> > > ï¼ ï¼ centos7.2+qemu2.7.50 +> +> > > ï¼ ï¼ (gdb) bt +> +> > > ï¼ ï¼ #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 +> +> > > ï¼ ï¼ #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized +> +> > > outï¼, +> +> > > ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, +> +> > > errp=0x0) at +> +> > > ï¼ ï¼ io/channel-socket.c:497 +> +> > > ï¼ ï¼ #2 0x00007f3e03329472 in qio_channel_read (address@hidden, +> +> > > ï¼ ï¼ address@hidden "", address@hidden, +> +> > > ï¼ ï¼ address@hidden) at io/channel.c:97 +> +> > > ï¼ ï¼ #3 0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized +> +> > > outï¼, +> +> > > ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at +> +> > > ï¼ ï¼ migration/qemu-file-channel.c:78 +> +> > > ï¼ ï¼ #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at +> +> > > ï¼ ï¼ migration/qemu-file.c:257 +> +> > > ï¼ ï¼ #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, +> +> > > ï¼ ï¼ address@hidden) at migration/qemu-file.c:510 +> +> > > ï¼ ï¼ #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at +> +> > > ï¼ ï¼ migration/qemu-file.c:523 +> +> > > ï¼ ï¼ #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at +> +> > > ï¼ ï¼ migration/qemu-file.c:603 +> +> > > ï¼ ï¼ #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, +> +> > > ï¼ ï¼ address@hidden) at migration/colo.c:215 +> +> > > ï¼ ï¼ #9 0x00007f3e0327250d in colo_wait_handle_message +> +> > > (errp=0x7f3d62bfaa48, +> +> > > ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at +> +> > > ï¼ ï¼ migration/colo.c:546 +> +> > > ï¼ ï¼ #10 colo_process_incoming_thread 
(opaque=0x7f3e067245e0) at +> +> > > ï¼ ï¼ migration/colo.c:649 +> +> > > ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from +> +> > > /lib64/libpthread.so.0 +> +> > > ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6 +> +> > > ï¼ ï¼ +> +> > > ï¼ ï¼ +> +> > > ï¼ ï¼ +> +> > > ï¼ ï¼ +> +> > > ï¼ ï¼ +> +> > > ï¼ ï¼ -- +> +> > > ï¼ ï¼ View this message in context: +> +> > > +http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html +> +> > > ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com. +> +> > > ï¼ ï¼ +> +> > > ï¼ ï¼ +> +> > > ï¼ ï¼ +> +> > > ï¼ ï¼ +> +> > > ï¼ +> +> > > ï¼ -- +> +> > > ï¼ Thanks +> +> > > ï¼ Zhang Chen +> +> > > ï¼ +> +> > > ï¼ +> +> > > ï¼ +> +> > > ï¼ +> +> > > ï¼ +> +> > > +> +> > +> +> -- +> +> Dr. David Alan Gilbert / address@hidden / Manchester, UK +> +> +> +> . +> +> +> +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + diff --git a/classification_output/01/other/3749377 b/classification_output/01/other/3749377 new file mode 100644 index 000000000..0142a9653 --- /dev/null +++ b/classification_output/01/other/3749377 @@ -0,0 +1,363 @@ +other: 0.956 +semantic: 0.942 +instruction: 0.927 +mistranslation: 0.912 + +[Qemu-devel] [BUG] Inappropriate size of target_sigset_t + +Hello, Peter, Laurent, + +While working on another problem yesterday, I think I discovered a +long-standing bug in QEMU Linux user mode: our target_sigset_t structure is +eight times smaller as it should be! + +In this code segment from syscalls_def.h: + +#ifdef TARGET_MIPS +#define TARGET_NSIG 128 +#else +#define TARGET_NSIG 64 +#endif +#define TARGET_NSIG_BPW TARGET_ABI_BITS +#define TARGET_NSIG_WORDS (TARGET_NSIG / TARGET_NSIG_BPW) + +typedef struct { + abi_ulong sig[TARGET_NSIG_WORDS]; +} target_sigset_t; + +... TARGET_ABI_BITS should be replaced by eight times smaller constant (in +fact, semantically, we need TARGET_ABI_BYTES, but it is not defined) (what is +needed is actually "a byte per signal" in target_sigset_t, and we allow "a bit +per signal"). + +All this probably sounds to you like something impossible, since this code is +in QEMU "since forever", but I checked everything, and the bug seems real. I +wish you can prove me wrong. + +I just wanted to let you know about this, given the sensitive timing of current +softfreeze, and the fact that I won't be able to do more investigation on this +in coming weeks, since I am busy with other tasks, but perhaps you can analyze +and do something which you consider appropriate. + +Yours, +Aleksandar + +Le 03/07/2019 à 21:46, Aleksandar Markovic a écrit : +> +Hello, Peter, Laurent, +> +> +While working on another problem yesterday, I think I discovered a +> +long-standing bug in QEMU Linux user mode: our target_sigset_t structure is +> +eight times smaller as it should be! +> +> +In this code segment from syscalls_def.h: +> +> +#ifdef TARGET_MIPS +> +#define TARGET_NSIG 128 +> +#else +> +#define TARGET_NSIG 64 +> +#endif +> +#define TARGET_NSIG_BPW TARGET_ABI_BITS +> +#define TARGET_NSIG_WORDS (TARGET_NSIG / TARGET_NSIG_BPW) +> +> +typedef struct { +> +abi_ulong sig[TARGET_NSIG_WORDS]; +> +} target_sigset_t; +> +> +... TARGET_ABI_BITS should be replaced by eight times smaller constant (in +> +fact, semantically, we need TARGET_ABI_BYTES, but it is not defined) (what is +> +needed is actually "a byte per signal" in target_sigset_t, and we allow "a +> +bit per signal"). +TARGET_NSIG is divided by TARGET_ABI_BITS which gives you the number of +abi_ulong words we need in target_sigset_t. 
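+A minimal standalone C sketch (not QEMU source; the MIPS64 values are assumed
+purely for illustration) makes the arithmetic concrete: dividing the signal
+count by the ABI word size in bits gives one bit per signal, the same layout
+the kernel's sigset_t uses.
+
+/* Standalone illustration only, not QEMU code. */
+#include <assert.h>
+#include <stdint.h>
+#include <stdio.h>
+
+#define TARGET_NSIG       128                    /* MIPS */
+#define TARGET_ABI_BITS   64                     /* MIPS64: sizeof(abi_ulong) * 8 */
+#define TARGET_NSIG_BPW   TARGET_ABI_BITS
+#define TARGET_NSIG_WORDS (TARGET_NSIG / TARGET_NSIG_BPW)
+
+typedef uint64_t abi_ulong;                      /* stand-in for the real typedef */
+
+typedef struct {
+    abi_ulong sig[TARGET_NSIG_WORDS];            /* 128 / 64 = 2 words, 16 bytes */
+} target_sigset_t;
+
+int main(void)
+{
+    /* The kernel uses _NSIG_BPW = sizeof(unsigned long) * 8, i.e. one bit
+     * per signal, which is exactly what the division above produces. */
+    assert(sizeof(target_sigset_t) == TARGET_NSIG / 8);
+    printf("words=%d size=%zu\n", TARGET_NSIG_WORDS, sizeof(target_sigset_t));
+    return 0;
+}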
+ +> +All this probably sounds to you like something impossible, since this code is +> +in QEMU "since forever", but I checked everything, and the bug seems real. I +> +wish you can prove me wrong. +> +> +I just wanted to let you know about this, given the sensitive timing of +> +current softfreeze, and the fact that I won't be able to do more +> +investigation on this in coming weeks, since I am busy with other tasks, but +> +perhaps you can analyze and do something which you consider appropriate. +If I compare with kernel, it looks good: + +In Linux: + + arch/mips/include/uapi/asm/signal.h + + #define _NSIG 128 + #define _NSIG_BPW (sizeof(unsigned long) * 8) + #define _NSIG_WORDS (_NSIG / _NSIG_BPW) + + typedef struct { + unsigned long sig[_NSIG_WORDS]; + } sigset_t; + +_NSIG_BPW is 8 * 8 = 64 on MIPS64 or 4 * 8 = 32 on MIPS + +In QEMU: + +TARGET_NSIG_BPW is TARGET_ABI_BITS which is TARGET_LONG_BITS which is +64 on MIPS64 and 32 on MIPS. + +I think there is no problem. + +Thanks, +Laurent + +From: Laurent Vivier <address@hidden> +> +If I compare with kernel, it looks good: +> +... +> +I think there is no problem. +Sure, thanks for such fast response - again, I am glad if you are right. +However, for some reason, glibc (and musl too) define sigset_t differently than +kernel. Please take a look. I am not sure if this is covered fine in our code. + +Yours, +Aleksandar + +> +Thanks, +> +Laurent + +On Wed, 3 Jul 2019 at 21:20, Aleksandar Markovic <address@hidden> wrote: +> +> +From: Laurent Vivier <address@hidden> +> +> If I compare with kernel, it looks good: +> +> ... +> +> I think there is no problem. +> +> +Sure, thanks for such fast response - again, I am glad if you are right. +> +However, for some reason, glibc (and musl too) define sigset_t differently +> +than kernel. Please take a look. I am not sure if this is covered fine in our +> +code. +Yeah, the libc definitions of sigset_t don't match the +kernel ones (this is for obscure historical reasons IIRC). +We're providing implementations of the target +syscall interface, so our target_sigset_t should be the +target kernel's version (and the target libc's version doesn't +matter to us). On the other hand we will be using the +host libc version, I think, so a little caution is required +and it's possible we have some bugs in our code. + +thanks +-- PMM + +> +From: Peter Maydell <address@hidden> +> +> +On Wed, 3 Jul 2019 at 21:20, Aleksandar Markovic <address@hidden> wrote: +> +> +> +> From: Laurent Vivier <address@hidden> +> +> > If I compare with kernel, it looks good: +> +> > ... +> +> > I think there is no problem. +> +> +> +> Sure, thanks for such fast response - again, I am glad if you are right. +> +> However, for some reason, glibc (and musl too) define sigset_t differently +> +> than kernel. Please take a look. I am not sure if this is covered fine in +> +> our code. +> +> +Yeah, the libc definitions of sigset_t don't match the +> +kernel ones (this is for obscure historical reasons IIRC). +> +We're providing implementations of the target +> +syscall interface, so our target_sigset_t should be the +> +target kernel's version (and the target libc's version doesn't +> +matter to us). On the other hand we will be using the +> +host libc version, I think, so a little caution is required +> +and it's possible we have some bugs in our code. +OK, I gather than this is not something that requires our immediate attention +(for 4.1), but we can analyze it later on. + +Thanks for response!! 
+ +Sincerely, +Aleksandar + +> +thanks +> +-- PMM + +Le 03/07/2019 à 22:28, Peter Maydell a écrit : +> +On Wed, 3 Jul 2019 at 21:20, Aleksandar Markovic <address@hidden> wrote: +> +> +> +> From: Laurent Vivier <address@hidden> +> +>> If I compare with kernel, it looks good: +> +>> ... +> +>> I think there is no problem. +> +> +> +> Sure, thanks for such fast response - again, I am glad if you are right. +> +> However, for some reason, glibc (and musl too) define sigset_t differently +> +> than kernel. Please take a look. I am not sure if this is covered fine in +> +> our code. +> +> +Yeah, the libc definitions of sigset_t don't match the +> +kernel ones (this is for obscure historical reasons IIRC). +> +We're providing implementations of the target +> +syscall interface, so our target_sigset_t should be the +> +target kernel's version (and the target libc's version doesn't +> +matter to us). On the other hand we will be using the +> +host libc version, I think, so a little caution is required +> +and it's possible we have some bugs in our code. +It's why we need host_to_target_sigset_internal() and +target_to_host_sigset_internal() that translates bits and bytes between +guest kernel interface and host libc interface. + +void host_to_target_sigset_internal(target_sigset_t *d, + const sigset_t *s) +{ + int i; + target_sigemptyset(d); + for (i = 1; i <= TARGET_NSIG; i++) { + if (sigismember(s, i)) { + target_sigaddset(d, host_to_target_signal(i)); + } + } +} + +void target_to_host_sigset_internal(sigset_t *d, + const target_sigset_t *s) +{ + int i; + sigemptyset(d); + for (i = 1; i <= TARGET_NSIG; i++) { + if (target_sigismember(s, i)) { + sigaddset(d, target_to_host_signal(i)); + } + } +} + +Thanks, +Laurent + +Hi Aleksandar, + +On Wed, Jul 3, 2019 at 12:48 PM Aleksandar Markovic +<address@hidden> wrote: +> +#define TARGET_NSIG_BPW TARGET_ABI_BITS +> +#define TARGET_NSIG_WORDS (TARGET_NSIG / TARGET_NSIG_BPW) +> +> +typedef struct { +> +abi_ulong sig[TARGET_NSIG_WORDS]; +> +} target_sigset_t; +> +> +... TARGET_ABI_BITS should be replaced by eight times smaller constant (in +> +fact, +> +semantically, we need TARGET_ABI_BYTES, but it is not defined) (what is needed +> +is actually "a byte per signal" in target_sigset_t, and we allow "a bit per +> +signal"). +Why do we need a byte per target signal, if the functions in linux-user/signal.c +operate with bits? + +-- +Thanks. +-- Max + +> +Why do we need a byte per target signal, if the functions in +> +linux-user/signal.c +> +operate with bits? +Max, + +I did not base my findings on code analysis, but on dumping size/offsets of +elements of some structures, as they are emulated in QEMU, and in real systems. +So, I can't really answer your question. + +Yours, +Aleksandar + +> +-- +> +Thanks. +> +-- Max + diff --git a/classification_output/01/other/3825088 b/classification_output/01/other/3825088 new file mode 100644 index 000000000..e9f59b8c0 --- /dev/null +++ b/classification_output/01/other/3825088 @@ -0,0 +1,521 @@ +other: 0.933 +instruction: 0.812 +semantic: 0.798 +mistranslation: 0.641 + +[Qemu-devel] [BUG] QEMU crashes with dpdk virtio pmd + +Qemu crashes, with pre-condition: +vm xml config with multiqueue, and the vm's driver virtio-net support +multi-queue + +reproduce steps: +i. start dpdk testpmd in VM with the virtio nic +ii. stop testpmd +iii. reboot the VM + +This commit "f9d6dbf0 remove virtio queues if the guest doesn't support +multiqueue" is introduced. 
+ +Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a) +VM DPDK version: DPDK-1.6.1 + +Call Trace: +#0 0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6 +#1 0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6 +#2 0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6 +#3 0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6 +#4 0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0 +#5 0x00007f6088fea32c in iter_remove_or_steal () from +/usr/lib64/libglib-2.0.so.0 +#6 0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) at +qom/object.c:410 +#7 object_finalize (data=0x7f6091e74800) at qom/object.c:467 +#8 object_unref (address@hidden) at qom/object.c:903 +#9 0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at +git/qemu/exec.c:1154 +#10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163 +#11 address_space_dispatch_free (d=0x7f6090b72b90) at git/qemu/exec.c:2514 +#12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at +util/rcu.c:272 +#13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0 +#14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6 + +Call Trace: +#0 0x00007fdccaeb9790 in ?? () +#1 0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at +qom/object.c:405 +#2 object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467 +#3 object_unref (address@hidden) at qom/object.c:903 +#4 0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at +git/qemu/exec.c:1154 +#5 phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163 +#6 address_space_dispatch_free (d=0x7fdcdc86a9e0) at git/qemu/exec.c:2514 +#7 0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at +util/rcu.c:272 +#8 0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0 +#9 0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6 + +The q->tx_bh will free in virtio_net_del_queue() function, when remove virtio +queues +if the guest doesn't support multiqueue. But it might be still referenced by +others (eg . virtio_net_set_status()), +which need so set NULL. + +diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c +index 7d091c9..98bd683 100644 +--- a/hw/net/virtio-net.c ++++ b/hw/net/virtio-net.c +@@ -1522,9 +1522,12 @@ static void virtio_net_del_queue(VirtIONet *n, int index) + if (q->tx_timer) { + timer_del(q->tx_timer); + timer_free(q->tx_timer); ++ q->tx_timer = NULL; + } else { + qemu_bh_delete(q->tx_bh); ++ q->tx_bh = NULL; + } ++ q->tx_waiting = 0; + virtio_del_queue(vdev, index * 2 + 1); + } + +From: wangyunjian +Sent: Monday, April 24, 2017 6:10 PM +To: address@hidden; Michael S. Tsirkin <address@hidden>; 'Jason Wang' +<address@hidden> +Cc: wangyunjian <address@hidden>; caihe <address@hidden> +Subject: [Qemu-devel][BUG] QEMU crashes with dpdk virtio pmd + +Qemu crashes, with pre-condition: +vm xml config with multiqueue, and the vm's driver virtio-net support +multi-queue + +reproduce steps: +i. start dpdk testpmd in VM with the virtio nic +ii. stop testpmd +iii. reboot the VM + +This commit "f9d6dbf0 remove virtio queues if the guest doesn't support +multiqueue" is introduced. 
+ +Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a) +VM DPDK version:  DPDK-1.6.1 + +Call Trace: +#0 0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6 +#1 0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6 +#2 0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6 +#3 0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6 +#4 0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0 +#5 0x00007f6088fea32c in iter_remove_or_steal () from +/usr/lib64/libglib-2.0.so.0 +#6 0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) at +qom/object.c:410 +#7 object_finalize (data=0x7f6091e74800) at qom/object.c:467 +#8 object_unref (address@hidden) at qom/object.c:903 +#9 0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at +git/qemu/exec.c:1154 +#10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163 +#11 address_space_dispatch_free (d=0x7f6090b72b90) at git/qemu/exec.c:2514 +#12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at +util/rcu.c:272 +#13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0 +#14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6 + +Call Trace: +#0 0x00007fdccaeb9790 in ?? () +#1 0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at +qom/object.c:405 +#2 object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467 +#3 object_unref (address@hidden) at qom/object.c:903 +#4 0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at +git/qemu/exec.c:1154 +#5 phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163 +#6 address_space_dispatch_free (d=0x7fdcdc86a9e0) at git/qemu/exec.c:2514 +#7 0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at +util/rcu.c:272 +#8 0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0 +#9 0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6 + +On 2017å¹´04æ25æ¥ 19:37, wangyunjian wrote: +The q->tx_bh will free in virtio_net_del_queue() function, when remove virtio +queues +if the guest doesn't support multiqueue. But it might be still referenced by +others (eg . virtio_net_set_status()), +which need so set NULL. + +diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c +index 7d091c9..98bd683 100644 +--- a/hw/net/virtio-net.c ++++ b/hw/net/virtio-net.c +@@ -1522,9 +1522,12 @@ static void virtio_net_del_queue(VirtIONet *n, int index) + if (q->tx_timer) { + timer_del(q->tx_timer); + timer_free(q->tx_timer); ++ q->tx_timer = NULL; + } else { + qemu_bh_delete(q->tx_bh); ++ q->tx_bh = NULL; + } ++ q->tx_waiting = 0; + virtio_del_queue(vdev, index * 2 + 1); + } +Thanks a lot for the fix. + +Two questions: +- If virtio_net_set_status() is the only function that may access tx_bh, +it looks like setting tx_waiting to zero is sufficient? +- Can you post a formal patch for this? + +Thanks +From: wangyunjian +Sent: Monday, April 24, 2017 6:10 PM +To: address@hidden; Michael S. Tsirkin <address@hidden>; 'Jason Wang' +<address@hidden> +Cc: wangyunjian <address@hidden>; caihe <address@hidden> +Subject: [Qemu-devel][BUG] QEMU crashes with dpdk virtio pmd + +Qemu crashes, with pre-condition: +vm xml config with multiqueue, and the vm's driver virtio-net support +multi-queue + +reproduce steps: +i. start dpdk testpmd in VM with the virtio nic +ii. stop testpmd +iii. reboot the VM + +This commit "f9d6dbf0 remove virtio queues if the guest doesn't support +multiqueue" is introduced. 
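+As a rough illustration of the hazard the fix above guards against, here is a
+toy, self-contained C model (not QEMU source; the struct and function are
+simplified stand-ins for the tx state and for the status path that may still
+consult it): because virtio_net_set_status() can run after the queue was torn
+down, the teardown must both NULL the pointer and clear tx_waiting.
+
+#include <stdio.h>
+#include <stdlib.h>
+
+struct queue {
+    void *tx_bh;        /* QEMUBH * in the real structure */
+    int   tx_waiting;
+};
+
+/* stands in for the tx handling inside a later status change */
+static void status_path(struct queue *q)
+{
+    if (q->tx_waiting && q->tx_bh) {
+        printf("would schedule bh %p\n", q->tx_bh);
+    } else {
+        printf("nothing pending, queue already torn down\n");
+    }
+}
+
+int main(void)
+{
+    struct queue q = { .tx_bh = malloc(16), .tx_waiting = 1 };
+
+    /* teardown as in the posted hunk: free, then clear the references */
+    free(q.tx_bh);
+    q.tx_bh = NULL;     /* without these two lines the status path would   */
+    q.tx_waiting = 0;   /* still act on freed memory; with them it becomes */
+                        /* a harmless no-op                                */
+    status_path(&q);
+    return 0;
+}
+
+Run standalone, it takes the safe branch only because the teardown cleared
+both fields before the status path ran.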
+ +Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a) +VM DPDK version: DPDK-1.6.1 + +Call Trace: +#0 0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6 +#1 0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6 +#2 0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6 +#3 0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6 +#4 0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0 +#5 0x00007f6088fea32c in iter_remove_or_steal () from +/usr/lib64/libglib-2.0.so.0 +#6 0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) at +qom/object.c:410 +#7 object_finalize (data=0x7f6091e74800) at qom/object.c:467 +#8 object_unref (address@hidden) at qom/object.c:903 +#9 0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at +git/qemu/exec.c:1154 +#10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163 +#11 address_space_dispatch_free (d=0x7f6090b72b90) at git/qemu/exec.c:2514 +#12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at +util/rcu.c:272 +#13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0 +#14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6 + +Call Trace: +#0 0x00007fdccaeb9790 in ?? () +#1 0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at +qom/object.c:405 +#2 object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467 +#3 object_unref (address@hidden) at qom/object.c:903 +#4 0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at +git/qemu/exec.c:1154 +#5 phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163 +#6 address_space_dispatch_free (d=0x7fdcdc86a9e0) at git/qemu/exec.c:2514 +#7 0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at +util/rcu.c:272 +#8 0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0 +#9 0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6 + +CCing Paolo and Stefan, since it has a relationship with bh in Qemu. + +> +-----Original Message----- +> +From: Jason Wang [ +mailto:address@hidden +> +> +> +On 2017å¹´04æ25æ¥ 19:37, wangyunjian wrote: +> +> The q->tx_bh will free in virtio_net_del_queue() function, when remove +> +> virtio +> +queues +> +> if the guest doesn't support multiqueue. But it might be still referenced by +> +others (eg . virtio_net_set_status()), +> +> which need so set NULL. +> +> +> +> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c +> +> index 7d091c9..98bd683 100644 +> +> --- a/hw/net/virtio-net.c +> +> +++ b/hw/net/virtio-net.c +> +> @@ -1522,9 +1522,12 @@ static void virtio_net_del_queue(VirtIONet *n, +> +int index) +> +> if (q->tx_timer) { +> +> timer_del(q->tx_timer); +> +> timer_free(q->tx_timer); +> +> + q->tx_timer = NULL; +> +> } else { +> +> qemu_bh_delete(q->tx_bh); +> +> + q->tx_bh = NULL; +> +> } +> +> + q->tx_waiting = 0; +> +> virtio_del_queue(vdev, index * 2 + 1); +> +> } +> +> +Thanks a lot for the fix. +> +> +Two questions: +> +> +- If virtio_net_set_status() is the only function that may access tx_bh, +> +it looks like setting tx_waiting to zero is sufficient? +Currently yes, but we don't assure that it works for all scenarios, so +we set the tx_bh and tx_timer to NULL to avoid to possibly access wild pointer, +which is the common method for usage of bh in Qemu. + +I have another question about the root cause of this issure. 
+ +This below trace is the path of setting tx_waiting to one in +virtio_net_handle_tx_bh() : + +Breakpoint 1, virtio_net_handle_tx_bh (vdev=0x0, vq=0x7f335ad13900) at +/data/wyj/git/qemu/hw/net/virtio-net.c:1398 +1398 { +(gdb) bt +#0 virtio_net_handle_tx_bh (vdev=0x0, vq=0x7f335ad13900) at +/data/wyj/git/qemu/hw/net/virtio-net.c:1398 +#1 0x00007f3357bddf9c in virtio_bus_set_host_notifier (bus=<optimized out>, +address@hidden, address@hidden) at hw/virtio/virtio-bus.c:297 +#2 0x00007f3357a0055d in vhost_dev_disable_notifiers (address@hidden, +address@hidden) at /data/wyj/git/qemu/hw/virtio/vhost.c:1422 +#3 0x00007f33579e3373 in vhost_net_stop_one (net=0x7f335ad84dc0, +dev=0x7f335c6f5f90) at /data/wyj/git/qemu/hw/net/vhost_net.c:289 +#4 0x00007f33579e385b in vhost_net_stop (address@hidden, ncs=<optimized out>, +address@hidden) at /data/wyj/git/qemu/hw/net/vhost_net.c:367 +#5 0x00007f33579e15de in virtio_net_vhost_status (status=<optimized out>, +n=0x7f335c6f5f90) at /data/wyj/git/qemu/hw/net/virtio-net.c:176 +#6 virtio_net_set_status (vdev=0x7f335c6f5f90, status=0 '\000') at +/data/wyj/git/qemu/hw/net/virtio-net.c:250 +#7 0x00007f33579f8dc6 in virtio_set_status (address@hidden, address@hidden +'\000') at /data/wyj/git/qemu/hw/virtio/virtio.c:1146 +#8 0x00007f3357bdd3cc in virtio_ioport_write (val=0, addr=18, +opaque=0x7f335c6eda80) at hw/virtio/virtio-pci.c:387 +#9 virtio_pci_config_write (opaque=0x7f335c6eda80, addr=18, val=0, +size=<optimized out>) at hw/virtio/virtio-pci.c:511 +#10 0x00007f33579b2155 in memory_region_write_accessor (mr=0x7f335c6ee470, +addr=18, value=<optimized out>, size=1, shift=<optimized out>, mask=<optimized +out>, attrs=...) at /data/wyj/git/qemu/memory.c:526 +#11 0x00007f33579af2e9 in access_with_adjusted_size (address@hidden, +address@hidden, address@hidden, access_size_min=<optimized out>, +access_size_max=<optimized out>, address@hidden + 0x7f33579b20f0 <memory_region_write_accessor>, address@hidden, +address@hidden) at /data/wyj/git/qemu/memory.c:592 +#12 0x00007f33579b2e15 in memory_region_dispatch_write (address@hidden, +address@hidden, data=0, address@hidden, address@hidden) at +/data/wyj/git/qemu/memory.c:1319 +#13 0x00007f335796cd93 in address_space_write_continue (mr=0x7f335c6ee470, l=1, +addr1=18, len=1, buf=0x7f335773d000 "", attrs=..., addr=49170, +as=0x7f3358317060 <address_space_io>) at /data/wyj/git/qemu/exec.c:2834 +#14 address_space_write (as=<optimized out>, addr=<optimized out>, attrs=..., +buf=<optimized out>, len=<optimized out>) at /data/wyj/git/qemu/exec.c:2879 +#15 0x00007f335796d3ad in address_space_rw (as=<optimized out>, address@hidden, +attrs=..., address@hidden, buf=<optimized out>, address@hidden, address@hidden) +at /data/wyj/git/qemu/exec.c:2981 +#16 0x00007f33579ae226 in kvm_handle_io (count=1, size=1, direction=<optimized +out>, data=<optimized out>, attrs=..., port=49170) at +/data/wyj/git/qemu/kvm-all.c:1803 +#17 kvm_cpu_exec (address@hidden) at /data/wyj/git/qemu/kvm-all.c:2032 +#18 0x00007f335799b632 in qemu_kvm_cpu_thread_fn (arg=0x7f335ae82070) at +/data/wyj/git/qemu/cpus.c:1118 +#19 0x00007f3352983dc5 in start_thread () from /usr/lib64/libpthread.so.0 +#20 0x00007f335113571d in clone () from /usr/lib64/libc.so.6 + +It calls qemu_bh_schedule(q->tx_bh) at the bottom of virtio_net_handle_tx_bh(), +I don't know why virtio_net_tx_bh() doesn't be invoked, so that the +q->tx_waiting is not zero. +[ps: we added logs in virtio_net_tx_bh() to verify that] + +Some other information: + +It won't crash if we don't use vhost-net. 
+ + +Thanks, +-Gonglei + +> +- Can you post a formal patch for this? +> +> +Thanks +> +> +> From: wangyunjian +> +> Sent: Monday, April 24, 2017 6:10 PM +> +> To: address@hidden; Michael S. Tsirkin <address@hidden>; 'Jason +> +Wang' <address@hidden> +> +> Cc: wangyunjian <address@hidden>; caihe <address@hidden> +> +> Subject: [Qemu-devel][BUG] QEMU crashes with dpdk virtio pmd +> +> +> +> Qemu crashes, with pre-condition: +> +> vm xml config with multiqueue, and the vm's driver virtio-net support +> +multi-queue +> +> +> +> reproduce steps: +> +> i. start dpdk testpmd in VM with the virtio nic +> +> ii. stop testpmd +> +> iii. reboot the VM +> +> +> +> This commit "f9d6dbf0 remove virtio queues if the guest doesn't support +> +multiqueue" is introduced. +> +> +> +> Qemu version: QEMU emulator version 2.9.50 (v2.9.0-137-g32c7e0a) +> +> VM DPDK version: DPDK-1.6.1 +> +> +> +> Call Trace: +> +> #0 0x00007f60881fe5d7 in raise () from /usr/lib64/libc.so.6 +> +> #1 0x00007f60881ffcc8 in abort () from /usr/lib64/libc.so.6 +> +> #2 0x00007f608823e2f7 in __libc_message () from /usr/lib64/libc.so.6 +> +> #3 0x00007f60882456d3 in _int_free () from /usr/lib64/libc.so.6 +> +> #4 0x00007f608900158f in g_free () from /usr/lib64/libglib-2.0.so.0 +> +> #5 0x00007f6088fea32c in iter_remove_or_steal () from +> +/usr/lib64/libglib-2.0.so.0 +> +> #6 0x00007f608edc0986 in object_property_del_all (obj=0x7f6091e74800) +> +at qom/object.c:410 +> +> #7 object_finalize (data=0x7f6091e74800) at qom/object.c:467 +> +> #8 object_unref (address@hidden) at qom/object.c:903 +> +> #9 0x00007f608eaf1fd3 in phys_section_destroy (mr=0x7f6091e74800) at +> +git/qemu/exec.c:1154 +> +> #10 phys_sections_free (map=0x7f6090b72bb0) at git/qemu/exec.c:1163 +> +> #11 address_space_dispatch_free (d=0x7f6090b72b90) at +> +git/qemu/exec.c:2514 +> +> #12 0x00007f608ee91ace in call_rcu_thread (opaque=<optimized out>) at +> +util/rcu.c:272 +> +> #13 0x00007f6089b0ddc5 in start_thread () from /usr/lib64/libpthread.so.0 +> +> #14 0x00007f60882bf71d in clone () from /usr/lib64/libc.so.6 +> +> +> +> Call Trace: +> +> #0 0x00007fdccaeb9790 in ?? () +> +> #1 0x00007fdcd82d09fc in object_property_del_all (obj=0x7fdcdb8acf60) at +> +qom/object.c:405 +> +> #2 object_finalize (data=0x7fdcdb8acf60) at qom/object.c:467 +> +> #3 object_unref (address@hidden) at qom/object.c:903 +> +> #4 0x00007fdcd8001fd3 in phys_section_destroy (mr=0x7fdcdb8acf60) at +> +git/qemu/exec.c:1154 +> +> #5 phys_sections_free (map=0x7fdcdc86aa00) at git/qemu/exec.c:1163 +> +> #6 address_space_dispatch_free (d=0x7fdcdc86a9e0) at +> +git/qemu/exec.c:2514 +> +> #7 0x00007fdcd83a1ace in call_rcu_thread (opaque=<optimized out>) at +> +util/rcu.c:272 +> +> #8 0x00007fdcd301ddc5 in start_thread () from /usr/lib64/libpthread.so.0 +> +> #9 0x00007fdcd17cf71d in clone () from /usr/lib64/libc.so.6 +> +> +> +> + +On 25/04/2017 14:02, Jason Wang wrote: +> +> +Thanks a lot for the fix. +> +> +Two questions: +> +> +- If virtio_net_set_status() is the only function that may access tx_bh, +> +it looks like setting tx_waiting to zero is sufficient? +I think clearing tx_bh is better anyway, as leaving a dangling pointer +is not very hygienic. + +Paolo + +> +- Can you post a formal patch for this? 
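A note on Jason's question above, about whether zeroing tx_waiting alone is
enough: the per-queue start/stop logic in virtio_net_set_status() only touches
the timer/bh pointers when tx_waiting is non-zero. The sketch below paraphrases
that loop from the 2.9-era hw/net/virtio-net.c; treat it as illustrative rather
than an exact quote of the upstream source.

    for (int i = 0; i < n->max_queues; i++) {
        VirtIONetQueue *q = &n->vqs[i];
        /* queue_status is the per-queue view of 'status', computed earlier
         * in the real loop body; elided here. */
        bool queue_started = virtio_net_started(n, queue_status) &&
                             !n->vhost_started;

        if (!q->tx_waiting) {
            continue;                 /* tx_timer/tx_bh are never touched */
        }

        if (queue_started) {
            if (q->tx_timer) {
                timer_mod(q->tx_timer,
                          qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + n->tx_timeout);
            } else {
                qemu_bh_schedule(q->tx_bh);   /* use-after-free if tx_bh dangles */
            }
        } else {
            if (q->tx_timer) {
                timer_del(q->tx_timer);
            } else {
                qemu_bh_cancel(q->tx_bh);
            }
        }
    }

So once virtio_net_del_queue() has freed the bh, q->tx_waiting = 0 makes this
loop skip the stale pointer, but any other path that reached tx_bh without that
guard would still dereference freed memory, which is why the formal patch also
clears tx_timer and tx_bh.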
+ diff --git a/classification_output/01/other/4314117 b/classification_output/01/other/4314117 new file mode 100644 index 000000000..e48b51621 --- /dev/null +++ b/classification_output/01/other/4314117 @@ -0,0 +1,711 @@ +other: 0.922 +instruction: 0.908 +semantic: 0.905 +mistranslation: 0.885 + +[Qemu-devel] [BUG] user-to-root privesc inside VM via bad translation caching + +This is an issue in QEMU's system emulation for X86 in TCG mode. +The issue permits an attacker who can execute code in guest ring 3 +with normal user privileges to inject code into other processes that +are running in guest ring 3, in particular root-owned processes. + +== reproduction steps == + + - Create an x86-64 VM and install Debian Jessie in it. The following + steps should all be executed inside the VM. + - Verify that procmail is installed and the correct version: + address@hidden:~# apt-cache show procmail | egrep 'Version|SHA' + Version: 3.22-24 + SHA1: 54ed2d51db0e76f027f06068ab5371048c13434c + SHA256: 4488cf6975af9134a9b5238d5d70e8be277f70caa45a840dfbefd2dc444bfe7f + - Install build-essential and nasm ("apt install build-essential nasm"). + - Unpack the exploit, compile it and run it: + address@hidden:~$ tar xvf procmail_cache_attack.tar + procmail_cache_attack/ + procmail_cache_attack/shellcode.asm + procmail_cache_attack/xp.c + procmail_cache_attack/compile.sh + procmail_cache_attack/attack.c + address@hidden:~$ cd procmail_cache_attack + address@hidden:~/procmail_cache_attack$ ./compile.sh + address@hidden:~/procmail_cache_attack$ ./attack + memory mappings set up + child is dead, codegen should be complete + executing code as root! :) + address@hidden:~/procmail_cache_attack# id + uid=0(root) gid=0(root) groups=0(root),[...] + +Note: While the exploit depends on the precise version of procmail, +the actual vulnerability is in QEMU, not in procmail. procmail merely +serves as a seldomly-executed setuid root binary into which code can +be injected. + + +== detailed issue description == +QEMU caches translated basic blocks. To look up a translated basic +block, the function tb_find() is used, which uses tb_htable_lookup() +in its slowpath, which in turn compares translated basic blocks +(TranslationBlock) to the lookup information (struct tb_desc) using +tb_cmp(). + +tb_cmp() attempts to ensure (among other things) that both the virtual +start address of the basic block and the physical addresses that the +basic block covers match. When checking the physical addresses, it +assumes that a basic block can span at most two pages. + +gen_intermediate_code() attempts to enforce this by stopping the +translation of a basic block if nearly one page of instructions has +been translated already: + + /* if too long translation, stop generation too */ + if (tcg_op_buf_full() || + (pc_ptr - pc_start) >= (TARGET_PAGE_SIZE - 32) || + num_insns >= max_insns) { + gen_jmp_im(pc_ptr - dc->cs_base); + gen_eob(dc); + break; + } + +However, while real X86 processors have a maximum instruction length +of 15 bytes, QEMU's instruction decoder for X86 does not place any +limit on the instruction length or the number of instruction prefixes. +Therefore, it is possible to create an arbitrarily long instruction +by e.g. prepending an arbitrary number of LOCK prefixes to a normal +instruction. This permits creating a basic block that spans three +pages by simply appending an approximately page-sized instruction to +the end of a normal basic block that starts close to the end of a +page. 
+ +Such an overlong basic block causes the basic block caching to fail as +follows: If code is generated and cached for a basic block that spans +the physical pages (A,E,B), this basic block will be returned by +lookups in a process in which the physical pages (A,B,C) are mapped +in the same virtual address range (assuming that all other lookup +parameters match). + +This behavior can be abused by an attacker e.g. as follows: If a +non-relocatable world-readable setuid executable legitimately contains +the pages (A,B,C), an attacker can map (A,E,B) into his own process, +at the normal load address of A, where E is an attacker-controlled +page. If a legitimate basic block spans the pages A and B, an attacker +can write arbitrary non-branch instructions at the start of E, then +append an overlong instruction +that ends behind the start of C, yielding a modified basic block that +spans all three pages. If the attacker then executes the modified +basic block in his process, the modified basic block is cached. +Next, the attacker can execute the setuid binary, which will reuse the +cached modified basic block, executing attacker-controlled +instructions in the context of the privileged process. + +I am sending this to qemu-devel because a QEMU security contact +told me that QEMU does not consider privilege escalation inside a +TCG VM to be a security concern. +procmail_cache_attack.tar +Description: +Unix tar archive + +On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote: +> +This is an issue in QEMU's system emulation for X86 in TCG mode. +> +The issue permits an attacker who can execute code in guest ring 3 +> +with normal user privileges to inject code into other processes that +> +are running in guest ring 3, in particular root-owned processes. +> +I am sending this to qemu-devel because a QEMU security contact +> +told me that QEMU does not consider privilege escalation inside a +> +TCG VM to be a security concern. +Correct; it's just a bug. Don't trust TCG QEMU as a security boundary. + +We should really fix the crossing-a-page-boundary code for x86. +I believe we do get it correct for ARM Thumb instructions. + +thanks +-- PMM + +On Mon, Mar 20, 2017 at 10:46 AM, Peter Maydell wrote: +> +On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote: +> +> This is an issue in QEMU's system emulation for X86 in TCG mode. +> +> The issue permits an attacker who can execute code in guest ring 3 +> +> with normal user privileges to inject code into other processes that +> +> are running in guest ring 3, in particular root-owned processes. +> +> +> I am sending this to qemu-devel because a QEMU security contact +> +> told me that QEMU does not consider privilege escalation inside a +> +> TCG VM to be a security concern. +> +> +Correct; it's just a bug. Don't trust TCG QEMU as a security boundary. +> +> +We should really fix the crossing-a-page-boundary code for x86. +> +I believe we do get it correct for ARM Thumb instructions. +How about doing the instruction size check as follows? 
+ +diff --git a/target/i386/translate.c b/target/i386/translate.c +index 72c1b03a2a..94cf3da719 100644 +--- a/target/i386/translate.c ++++ b/target/i386/translate.c +@@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State +*env, DisasContext *s, + default: + goto unknown_op; + } ++ if (s->pc - pc_start > 15) { ++ s->pc = pc_start; ++ goto illegal_op; ++ } + return s->pc; + illegal_op: + gen_illegal_opcode(s); + +Thanks, +-- +Pranith + +On 22 March 2017 at 14:55, Pranith Kumar <address@hidden> wrote: +> +On Mon, Mar 20, 2017 at 10:46 AM, Peter Maydell wrote: +> +> On 20 March 2017 at 14:36, Jann Horn <address@hidden> wrote: +> +>> This is an issue in QEMU's system emulation for X86 in TCG mode. +> +>> The issue permits an attacker who can execute code in guest ring 3 +> +>> with normal user privileges to inject code into other processes that +> +>> are running in guest ring 3, in particular root-owned processes. +> +> +> +>> I am sending this to qemu-devel because a QEMU security contact +> +>> told me that QEMU does not consider privilege escalation inside a +> +>> TCG VM to be a security concern. +> +> +> +> Correct; it's just a bug. Don't trust TCG QEMU as a security boundary. +> +> +> +> We should really fix the crossing-a-page-boundary code for x86. +> +> I believe we do get it correct for ARM Thumb instructions. +> +> +How about doing the instruction size check as follows? +> +> +diff --git a/target/i386/translate.c b/target/i386/translate.c +> +index 72c1b03a2a..94cf3da719 100644 +> +--- a/target/i386/translate.c +> ++++ b/target/i386/translate.c +> +@@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State +> +*env, DisasContext *s, +> +default: +> +goto unknown_op; +> +} +> ++ if (s->pc - pc_start > 15) { +> ++ s->pc = pc_start; +> ++ goto illegal_op; +> ++ } +> +return s->pc; +> +illegal_op: +> +gen_illegal_opcode(s); +This doesn't look right because it means we'll check +only after we've emitted all the code to do the +instruction operation, so the effect will be +"execute instruction, then take illegal-opcode +exception". + +We should check what the x86 architecture spec actually +says and implement that. + +thanks +-- PMM + +On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell +<address@hidden> wrote: +> +> +> +> How about doing the instruction size check as follows? +> +> +> +> diff --git a/target/i386/translate.c b/target/i386/translate.c +> +> index 72c1b03a2a..94cf3da719 100644 +> +> --- a/target/i386/translate.c +> +> +++ b/target/i386/translate.c +> +> @@ -8235,6 +8235,10 @@ static target_ulong disas_insn(CPUX86State +> +> *env, DisasContext *s, +> +> default: +> +> goto unknown_op; +> +> } +> +> + if (s->pc - pc_start > 15) { +> +> + s->pc = pc_start; +> +> + goto illegal_op; +> +> + } +> +> return s->pc; +> +> illegal_op: +> +> gen_illegal_opcode(s); +> +> +This doesn't look right because it means we'll check +> +only after we've emitted all the code to do the +> +instruction operation, so the effect will be +> +"execute instruction, then take illegal-opcode +> +exception". +> +The pc is restored to original address (s->pc = pc_start), so the +exception will overwrite the generated illegal instruction and will be +executed first. + +But yes, it's better to follow the architecture manual. 
+ +Thanks, +-- +Pranith + +On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote: +> +On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell +> +<address@hidden> wrote: +> +> This doesn't look right because it means we'll check +> +> only after we've emitted all the code to do the +> +> instruction operation, so the effect will be +> +> "execute instruction, then take illegal-opcode +> +> exception". +> +The pc is restored to original address (s->pc = pc_start), so the +> +exception will overwrite the generated illegal instruction and will be +> +executed first. +s->pc is the guest PC -- moving that backwards will +not do anything about the generated TCG IR that's +already been written. You'd need to rewind the +write pointer in the IR stream, which there is +no support for doing AFAIK. + +thanks +-- PMM + +On Wed, Mar 22, 2017 at 11:21 AM, Peter Maydell +<address@hidden> wrote: +> +On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote: +> +> On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell +> +> <address@hidden> wrote: +> +>> This doesn't look right because it means we'll check +> +>> only after we've emitted all the code to do the +> +>> instruction operation, so the effect will be +> +>> "execute instruction, then take illegal-opcode +> +>> exception". +> +> +> The pc is restored to original address (s->pc = pc_start), so the +> +> exception will overwrite the generated illegal instruction and will be +> +> executed first. +> +> +s->pc is the guest PC -- moving that backwards will +> +not do anything about the generated TCG IR that's +> +already been written. You'd need to rewind the +> +write pointer in the IR stream, which there is +> +no support for doing AFAIK. +Ah, OK. Thanks for the explanation. May be we should check the size of +the instruction while decoding the prefixes and error out once we +exceed the limit. We would not generate any IR code. + +-- +Pranith + +On 03/23/2017 02:29 AM, Pranith Kumar wrote: +On Wed, Mar 22, 2017 at 11:21 AM, Peter Maydell +<address@hidden> wrote: +On 22 March 2017 at 15:14, Pranith Kumar <address@hidden> wrote: +On Wed, Mar 22, 2017 at 11:04 AM, Peter Maydell +<address@hidden> wrote: +This doesn't look right because it means we'll check +only after we've emitted all the code to do the +instruction operation, so the effect will be +"execute instruction, then take illegal-opcode +exception". +The pc is restored to original address (s->pc = pc_start), so the +exception will overwrite the generated illegal instruction and will be +executed first. +s->pc is the guest PC -- moving that backwards will +not do anything about the generated TCG IR that's +already been written. You'd need to rewind the +write pointer in the IR stream, which there is +no support for doing AFAIK. +Ah, OK. Thanks for the explanation. May be we should check the size of +the instruction while decoding the prefixes and error out once we +exceed the limit. We would not generate any IR code. +Yes. +It would not enforce a true limit of 15 bytes, since you can't know that until +you've done the rest of the decode. But you'd be able to say that no more than +14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 bytes is used. +Which does fix the bug. + + +r~ + +On 22/03/2017 21:01, Richard Henderson wrote: +> +> +> +> Ah, OK. Thanks for the explanation. May be we should check the size of +> +> the instruction while decoding the prefixes and error out once we +> +> exceed the limit. We would not generate any IR code. +> +> +Yes. 
+> +> +It would not enforce a true limit of 15 bytes, since you can't know that +> +until you've done the rest of the decode. But you'd be able to say that +> +no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 +> +bytes is used. +> +> +Which does fix the bug. +Yeah, that would work for 2.9 if somebody wants to put together a patch. + Ensuring that all instruction fetching happens before translation side +effects is a little harder, but perhaps it's also the opportunity to get +rid of s->rip_offset which is a little ugly. + +Paolo + +On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote: +> +> +> +On 22/03/2017 21:01, Richard Henderson wrote: +> +>> +> +>> Ah, OK. Thanks for the explanation. May be we should check the size of +> +>> the instruction while decoding the prefixes and error out once we +> +>> exceed the limit. We would not generate any IR code. +> +> +> +> Yes. +> +> +> +> It would not enforce a true limit of 15 bytes, since you can't know that +> +> until you've done the rest of the decode. But you'd be able to say that +> +> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 +> +> bytes is used. +> +> +> +> Which does fix the bug. +> +> +Yeah, that would work for 2.9 if somebody wants to put together a patch. +> +Ensuring that all instruction fetching happens before translation side +> +effects is a little harder, but perhaps it's also the opportunity to get +> +rid of s->rip_offset which is a little ugly. +How about the following? + +diff --git a/target/i386/translate.c b/target/i386/translate.c +index 72c1b03a2a..67c58b8900 100644 +--- a/target/i386/translate.c ++++ b/target/i386/translate.c +@@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State +*env, DisasContext *s, + s->vex_l = 0; + s->vex_v = 0; + next_byte: ++ /* The prefixes can atmost be 14 bytes since x86 has an upper ++ limit of 15 bytes for the instruction */ ++ if (s->pc - pc_start > 14) { ++ goto illegal_op; ++ } + b = cpu_ldub_code(env, s->pc); + s->pc++; + /* Collect prefixes. */ + +-- +Pranith + +On 23/03/2017 17:50, Pranith Kumar wrote: +> +On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote: +> +> +> +> +> +> On 22/03/2017 21:01, Richard Henderson wrote: +> +>>> +> +>>> Ah, OK. Thanks for the explanation. May be we should check the size of +> +>>> the instruction while decoding the prefixes and error out once we +> +>>> exceed the limit. We would not generate any IR code. +> +>> +> +>> Yes. +> +>> +> +>> It would not enforce a true limit of 15 bytes, since you can't know that +> +>> until you've done the rest of the decode. But you'd be able to say that +> +>> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 +> +>> bytes is used. +> +>> +> +>> Which does fix the bug. +> +> +> +> Yeah, that would work for 2.9 if somebody wants to put together a patch. +> +> Ensuring that all instruction fetching happens before translation side +> +> effects is a little harder, but perhaps it's also the opportunity to get +> +> rid of s->rip_offset which is a little ugly. +> +> +How about the following? 
+> +> +diff --git a/target/i386/translate.c b/target/i386/translate.c +> +index 72c1b03a2a..67c58b8900 100644 +> +--- a/target/i386/translate.c +> ++++ b/target/i386/translate.c +> +@@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State +> +*env, DisasContext *s, +> +s->vex_l = 0; +> +s->vex_v = 0; +> +next_byte: +> ++ /* The prefixes can atmost be 14 bytes since x86 has an upper +> ++ limit of 15 bytes for the instruction */ +> ++ if (s->pc - pc_start > 14) { +> ++ goto illegal_op; +> ++ } +> +b = cpu_ldub_code(env, s->pc); +> +s->pc++; +> +/* Collect prefixes. */ +Please make the comment more verbose, based on Richard's remark. We +should apply it to 2.9. + +Also, QEMU usually formats comments with stars on every line. + +Paolo + +On Thu, Mar 23, 2017 at 1:37 PM, Paolo Bonzini <address@hidden> wrote: +> +> +> +On 23/03/2017 17:50, Pranith Kumar wrote: +> +> On Thu, Mar 23, 2017 at 6:27 AM, Paolo Bonzini <address@hidden> wrote: +> +>> +> +>> +> +>> On 22/03/2017 21:01, Richard Henderson wrote: +> +>>>> +> +>>>> Ah, OK. Thanks for the explanation. May be we should check the size of +> +>>>> the instruction while decoding the prefixes and error out once we +> +>>>> exceed the limit. We would not generate any IR code. +> +>>> +> +>>> Yes. +> +>>> +> +>>> It would not enforce a true limit of 15 bytes, since you can't know that +> +>>> until you've done the rest of the decode. But you'd be able to say that +> +>>> no more than 14 prefix + 1 opc + 6 modrm+sib+ofs + 4 immediate = 25 +> +>>> bytes is used. +> +>>> +> +>>> Which does fix the bug. +> +>> +> +>> Yeah, that would work for 2.9 if somebody wants to put together a patch. +> +>> Ensuring that all instruction fetching happens before translation side +> +>> effects is a little harder, but perhaps it's also the opportunity to get +> +>> rid of s->rip_offset which is a little ugly. +> +> +> +> How about the following? +> +> +> +> diff --git a/target/i386/translate.c b/target/i386/translate.c +> +> index 72c1b03a2a..67c58b8900 100644 +> +> --- a/target/i386/translate.c +> +> +++ b/target/i386/translate.c +> +> @@ -4418,6 +4418,11 @@ static target_ulong disas_insn(CPUX86State +> +> *env, DisasContext *s, +> +> s->vex_l = 0; +> +> s->vex_v = 0; +> +> next_byte: +> +> + /* The prefixes can atmost be 14 bytes since x86 has an upper +> +> + limit of 15 bytes for the instruction */ +> +> + if (s->pc - pc_start > 14) { +> +> + goto illegal_op; +> +> + } +> +> b = cpu_ldub_code(env, s->pc); +> +> s->pc++; +> +> /* Collect prefixes. */ +> +> +Please make the comment more verbose, based on Richard's remark. We +> +should apply it to 2.9. +> +> +Also, QEMU usually formats comments with stars on every line. +OK. I'll send a proper patch with updated comment. + +Thanks, +-- +Pranith + diff --git a/classification_output/01/other/4774720 b/classification_output/01/other/4774720 new file mode 100644 index 000000000..97f1a001c --- /dev/null +++ b/classification_output/01/other/4774720 @@ -0,0 +1,328 @@ +other: 0.979 +instruction: 0.974 +semantic: 0.967 +mistranslation: 0.933 + +[BUG] qemu git error with virgl + +Hello, + +i can't start any system if i use virgl. I get the following error: +qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion +`con->gl' failed. 
+./and.sh: line 27: 3337167 Aborted                qemu-x86_64 -m 4096 +-smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device +virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device +intel-hda,id=sound0,msi=on -device +hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci +-device usb-tablet,bus=xhci.0 -net +nic,macaddr=52:54:00:12:34:62,model=e1000 -net +tap,ifname=$INTERFACE,script=no,downscript=no -drive +file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads +Set 'tap3' nonpersistent + +i have bicected the issue: + +towo:Defiant> git bisect good +b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit +commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4 +Author: Paolo Bonzini <pbonzini@redhat.com> +Date:  Tue Oct 27 08:44:23 2020 -0400 + +   vl: remove separate preconfig main_loop +   Move post-preconfig initialization to the x-exit-preconfig. If +preconfig +   is not requested, just exit preconfig mode immediately with the QMP +   command. + +   As a result, the preconfig loop will run with accel_setup_post +   and os_setup_post restrictions (xen_restrict, chroot, etc.) +   already done. + +   Reviewed-by: Igor Mammedov <imammedo@redhat.com> +   Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> + + include/sysemu/runstate.h | 1 - + monitor/qmp-cmds.c       | 9 ----- + softmmu/vl.c             | 95 +++++++++++++++++++++--------------------------- + 3 files changed, 41 insertions(+), 64 deletions(-) + +Regards, + +Torsten Wohlfarth + +Cc'ing Gerd + patch author/reviewer. + +On 1/2/21 2:11 PM, Torsten Wohlfarth wrote: +> +Hello, +> +> +i can't start any system if i use virgl. I get the following error: +> +> +qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion +> +`con->gl' failed. +> +./and.sh: line 27: 3337167 Aborted                qemu-x86_64 -m 4096 +> +-smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device +> +virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device +> +intel-hda,id=sound0,msi=on -device +> +hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci +> +-device usb-tablet,bus=xhci.0 -net +> +nic,macaddr=52:54:00:12:34:62,model=e1000 -net +> +tap,ifname=$INTERFACE,script=no,downscript=no -drive +> +file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads +> +> +Set 'tap3' nonpersistent +> +> +i have bicected the issue: +> +> +towo:Defiant> git bisect good +> +b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit +> +commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4 +> +Author: Paolo Bonzini <pbonzini@redhat.com> +> +Date:  Tue Oct 27 08:44:23 2020 -0400 +> +> +   vl: remove separate preconfig main_loop +> +> +   Move post-preconfig initialization to the x-exit-preconfig. If +> +preconfig +> +   is not requested, just exit preconfig mode immediately with the QMP +> +   command. +> +> +   As a result, the preconfig loop will run with accel_setup_post +> +   and os_setup_post restrictions (xen_restrict, chroot, etc.) +> +   already done. 
+> +> +   Reviewed-by: Igor Mammedov <imammedo@redhat.com> +> +   Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> +> +> + include/sysemu/runstate.h | 1 - +> + monitor/qmp-cmds.c       | 9 ----- +> + softmmu/vl.c             | 95 +> +++++++++++++++++++++--------------------------- +> + 3 files changed, 41 insertions(+), 64 deletions(-) +> +> +Regards, +> +> +Torsten Wohlfarth +> +> +> + +On Sun, 3 Jan 2021 18:28:11 +0100 +Philippe Mathieu-Daudé <philmd@redhat.com> wrote: + +> +Cc'ing Gerd + patch author/reviewer. +> +> +On 1/2/21 2:11 PM, Torsten Wohlfarth wrote: +> +> Hello, +> +> +> +> i can't start any system if i use virgl. I get the following error: +> +> +> +> qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion +> +> `con->gl' failed. +Does following fix issue: + [PULL 12/55] vl: initialize displays _after_ exiting preconfiguration + +> +> ./and.sh: line 27: 3337167 Aborted                qemu-x86_64 -m 4096 +> +> -smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device +> +> virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device +> +> intel-hda,id=sound0,msi=on -device +> +> hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci +> +> -device usb-tablet,bus=xhci.0 -net +> +> nic,macaddr=52:54:00:12:34:62,model=e1000 -net +> +> tap,ifname=$INTERFACE,script=no,downscript=no -drive +> +> file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads +> +> +> +> Set 'tap3' nonpersistent +> +> +> +> i have bicected the issue: +> +> +> +> towo:Defiant> git bisect good +> +> b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit +> +> commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4 +> +> Author: Paolo Bonzini <pbonzini@redhat.com> +> +> Date:  Tue Oct 27 08:44:23 2020 -0400 +> +> +> +>    vl: remove separate preconfig main_loop +> +> +> +>    Move post-preconfig initialization to the x-exit-preconfig. If +> +> preconfig +> +>    is not requested, just exit preconfig mode immediately with the QMP +> +>    command. +> +> +> +>    As a result, the preconfig loop will run with accel_setup_post +> +>    and os_setup_post restrictions (xen_restrict, chroot, etc.) +> +>    already done. +> +> +> +>    Reviewed-by: Igor Mammedov <imammedo@redhat.com> +> +>    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> +> +> +> +>  include/sysemu/runstate.h | 1 - +> +>  monitor/qmp-cmds.c       | 9 ----- +> +>  softmmu/vl.c             | 95 +> +> ++++++++++++++++++++--------------------------- +> +>  3 files changed, 41 insertions(+), 64 deletions(-) +> +> +> +> Regards, +> +> +> +> Torsten Wohlfarth +> +> +> +> +> +> +> +> + +Hi Igor, + +yes, that fixes my issue. + +Regards, Torsten + +Am 04.01.21 um 19:50 schrieb Igor Mammedov: +On Sun, 3 Jan 2021 18:28:11 +0100 +Philippe Mathieu-Daudé <philmd@redhat.com> wrote: +Cc'ing Gerd + patch author/reviewer. + +On 1/2/21 2:11 PM, Torsten Wohlfarth wrote: +Hello, + +i can't start any system if i use virgl. I get the following error: + +qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion +`con->gl' failed. 
+Does following fix issue: + [PULL 12/55] vl: initialize displays _after_ exiting preconfiguration +./and.sh: line 27: 3337167 Aborted                qemu-x86_64 -m 4096 +-smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device +virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device +intel-hda,id=sound0,msi=on -device +hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci +-device usb-tablet,bus=xhci.0 -net +nic,macaddr=52:54:00:12:34:62,model=e1000 -net +tap,ifname=$INTERFACE,script=no,downscript=no -drive +file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads + +Set 'tap3' nonpersistent + +i have bicected the issue: +towo:Defiant> git bisect good +b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit +commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4 +Author: Paolo Bonzini <pbonzini@redhat.com> +Date:  Tue Oct 27 08:44:23 2020 -0400 + +    vl: remove separate preconfig main_loop + +    Move post-preconfig initialization to the x-exit-preconfig. If +preconfig +    is not requested, just exit preconfig mode immediately with the QMP +    command. + +    As a result, the preconfig loop will run with accel_setup_post +    and os_setup_post restrictions (xen_restrict, chroot, etc.) +    already done. + +    Reviewed-by: Igor Mammedov <imammedo@redhat.com> +    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> + +  include/sysemu/runstate.h | 1 - +  monitor/qmp-cmds.c       | 9 ----- +  softmmu/vl.c             | 95 +++++++++++++++++++++--------------------------- +  3 files changed, 41 insertions(+), 64 deletions(-) + +Regards, + +Torsten Wohlfarth + diff --git a/classification_output/01/other/4800759 b/classification_output/01/other/4800759 new file mode 100644 index 000000000..95a58c71f --- /dev/null +++ b/classification_output/01/other/4800759 @@ -0,0 +1,369 @@ +other: 0.886 +instruction: 0.861 +mistranslation: 0.859 +semantic: 0.850 + +[Qemu-devel] [BUG] nanoMIPS support problem related to extract2 support for i386 TCG target + +Hello, Richard, Peter, and others. + +As a part of activities before 4.1 release, I tested nanoMIPS support +in QEMU (which was officially fully integrated in 4.0, is currently +limited to system mode only, and was tested in a similar fashion right +prior to 4.0). + +This support appears to be broken now. Following command line works in +4.0, but results in kernel panic for the current tip of the tree: + +~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel +-cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m +1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append +"mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda" + +(kernel and rootfs image files used in this commend line can be +downloaded from the locations mentioned in our user guide) + +The quick bisect points to the commit: + +commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab +Author: Richard Henderson <address@hidden> +Date: Mon Feb 25 11:42:35 2019 -0800 + + tcg/i386: Support INDEX_op_extract2_{i32,i64} + + Signed-off-by: Richard Henderson <address@hidden> + +Please advise on further actions. + +Yours, +Aleksandar + +On Fri, Jul 12, 2019 at 8:09 PM Aleksandar Markovic +<address@hidden> wrote: +> +> +Hello, Richard, Peter, and others. 
+> +> +As a part of activities before 4.1 release, I tested nanoMIPS support +> +in QEMU (which was officially fully integrated in 4.0, is currently +> +limited to system mode only, and was tested in a similar fashion right +> +prior to 4.0). +> +> +This support appears to be broken now. Following command line works in +> +4.0, but results in kernel panic for the current tip of the tree: +> +> +~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel +> +-cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m +> +1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append +> +"mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda" +> +> +(kernel and rootfs image files used in this commend line can be +> +downloaded from the locations mentioned in our user guide) +> +> +The quick bisect points to the commit: +> +> +commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab +> +Author: Richard Henderson <address@hidden> +> +Date: Mon Feb 25 11:42:35 2019 -0800 +> +> +tcg/i386: Support INDEX_op_extract2_{i32,i64} +> +> +Signed-off-by: Richard Henderson <address@hidden> +> +> +Please advise on further actions. +> +Just to add a data point: + +If the following change is applied: + +diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h +index 928e8b8..b6a4cf2 100644 +--- a/tcg/i386/tcg-target.h ++++ b/tcg/i386/tcg-target.h +@@ -124,7 +124,7 @@ extern bool have_avx2; + #define TCG_TARGET_HAS_deposit_i32 1 + #define TCG_TARGET_HAS_extract_i32 1 + #define TCG_TARGET_HAS_sextract_i32 1 +-#define TCG_TARGET_HAS_extract2_i32 1 ++#define TCG_TARGET_HAS_extract2_i32 0 + #define TCG_TARGET_HAS_movcond_i32 1 + #define TCG_TARGET_HAS_add2_i32 1 + #define TCG_TARGET_HAS_sub2_i32 1 +@@ -163,7 +163,7 @@ extern bool have_avx2; + #define TCG_TARGET_HAS_deposit_i64 1 + #define TCG_TARGET_HAS_extract_i64 1 + #define TCG_TARGET_HAS_sextract_i64 0 +-#define TCG_TARGET_HAS_extract2_i64 1 ++#define TCG_TARGET_HAS_extract2_i64 0 + #define TCG_TARGET_HAS_movcond_i64 1 + #define TCG_TARGET_HAS_add2_i64 1 + #define TCG_TARGET_HAS_sub2_i64 1 + +... the problem disappears. + + +> +Yours, +> +Aleksandar + +On Fri, Jul 12, 2019 at 8:19 PM Aleksandar Markovic +<address@hidden> wrote: +> +> +On Fri, Jul 12, 2019 at 8:09 PM Aleksandar Markovic +> +<address@hidden> wrote: +> +> +> +> Hello, Richard, Peter, and others. +> +> +> +> As a part of activities before 4.1 release, I tested nanoMIPS support +> +> in QEMU (which was officially fully integrated in 4.0, is currently +> +> limited to system mode only, and was tested in a similar fashion right +> +> prior to 4.0). +> +> +> +> This support appears to be broken now. 
Following command line works in +> +> 4.0, but results in kernel panic for the current tip of the tree: +> +> +> +> ~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel +> +> -cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m +> +> 1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append +> +> "mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda" +> +> +> +> (kernel and rootfs image files used in this commend line can be +> +> downloaded from the locations mentioned in our user guide) +> +> +> +> The quick bisect points to the commit: +> +> +> +> commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab +> +> Author: Richard Henderson <address@hidden> +> +> Date: Mon Feb 25 11:42:35 2019 -0800 +> +> +> +> tcg/i386: Support INDEX_op_extract2_{i32,i64} +> +> +> +> Signed-off-by: Richard Henderson <address@hidden> +> +> +> +> Please advise on further actions. +> +> +> +> +Just to add a data point: +> +> +If the following change is applied: +> +> +diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h +> +index 928e8b8..b6a4cf2 100644 +> +--- a/tcg/i386/tcg-target.h +> ++++ b/tcg/i386/tcg-target.h +> +@@ -124,7 +124,7 @@ extern bool have_avx2; +> +#define TCG_TARGET_HAS_deposit_i32 1 +> +#define TCG_TARGET_HAS_extract_i32 1 +> +#define TCG_TARGET_HAS_sextract_i32 1 +> +-#define TCG_TARGET_HAS_extract2_i32 1 +> ++#define TCG_TARGET_HAS_extract2_i32 0 +> +#define TCG_TARGET_HAS_movcond_i32 1 +> +#define TCG_TARGET_HAS_add2_i32 1 +> +#define TCG_TARGET_HAS_sub2_i32 1 +> +@@ -163,7 +163,7 @@ extern bool have_avx2; +> +#define TCG_TARGET_HAS_deposit_i64 1 +> +#define TCG_TARGET_HAS_extract_i64 1 +> +#define TCG_TARGET_HAS_sextract_i64 0 +> +-#define TCG_TARGET_HAS_extract2_i64 1 +> ++#define TCG_TARGET_HAS_extract2_i64 0 +> +#define TCG_TARGET_HAS_movcond_i64 1 +> +#define TCG_TARGET_HAS_add2_i64 1 +> +#define TCG_TARGET_HAS_sub2_i64 1 +> +> +... the problem disappears. +> +It looks the problem is in this code segment in of tcg_gen_deposit_i32(): + + if (ofs == 0) { + tcg_gen_extract2_i32(ret, arg1, arg2, len); + tcg_gen_rotli_i32(ret, ret, len); + goto done; + } + +) + +If that code segment is deleted altogether (which effectively forces +usage of "fallback" part of tcg_gen_deposit_i32()), the problem also +vanishes (without changes from my previous mail). + +> +> +> Yours, +> +> Aleksandar + +Aleksandar Markovic <address@hidden> writes: + +> +Hello, Richard, Peter, and others. +> +> +As a part of activities before 4.1 release, I tested nanoMIPS support +> +in QEMU (which was officially fully integrated in 4.0, is currently +> +limited to system mode only, and was tested in a similar fashion right +> +prior to 4.0). +> +> +This support appears to be broken now. 
Following command line works in +> +4.0, but results in kernel panic for the current tip of the tree: +> +> +~/Build/qemu-test-revert-c6fb8c0cf704/mipsel-softmmu/qemu-system-mipsel +> +-cpu I7200 -kernel generic_nano32r6el_page4k -M malta -serial stdio -m +> +1G -hda nanomips32r6_le_sf_2017.05-03-59-gf5595d6.ext4 -append +> +"mem=256m@0x0 rw console=ttyS0 vga=cirrus vesa=0x111 root=/dev/sda" +> +> +(kernel and rootfs image files used in this commend line can be +> +downloaded from the locations mentioned in our user guide) +> +> +The quick bisect points to the commit: +> +> +commit c6fb8c0cf704c4a1a48c3e99e995ad4c58150dab +> +Author: Richard Henderson <address@hidden> +> +Date: Mon Feb 25 11:42:35 2019 -0800 +> +> +tcg/i386: Support INDEX_op_extract2_{i32,i64} +> +> +Signed-off-by: Richard Henderson <address@hidden> +> +> +Please advise on further actions. +Please see the fix: + + Subject: [PATCH for-4.1] tcg: Fix constant folding of INDEX_op_extract2_i32 + Date: Tue, 9 Jul 2019 14:19:00 +0200 + Message-Id: <address@hidden> + +> +> +Yours, +> +Aleksandar +-- +Alex Bennée + +On Sat, Jul 13, 2019 at 9:21 AM Alex Bennée <address@hidden> wrote: +> +> +Please see the fix: +> +> +Subject: [PATCH for-4.1] tcg: Fix constant folding of INDEX_op_extract2_i32 +> +Date: Tue, 9 Jul 2019 14:19:00 +0200 +> +Message-Id: <address@hidden> +> +Thanks, this fixed the behavior. + +Sincerely, +Aleksandar + +> +> +> +> +> Yours, +> +> Aleksandar +> +> +> +-- +> +Alex Bennée +> + diff --git a/classification_output/01/other/4938208 b/classification_output/01/other/4938208 new file mode 100644 index 000000000..33edbe3c0 --- /dev/null +++ b/classification_output/01/other/4938208 @@ -0,0 +1,1844 @@ +other: 0.535 +mistranslation: 0.518 +instruction: 0.442 +semantic: 0.411 + +[Qemu-devel] [Bug?] 
BQL about live migration + +Hello Juan & Dave, + +We hit a bug in our test: +Network error occurs when migrating a guest, libvirt then rollback the +migration, causes qemu coredump +qemu log: +2017-03-01T12:54:33.904949+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|: + {"timestamp": {"seconds": 1488344073, "microseconds": 904914}, "event": "STOP"} +2017-03-01T12:54:37.522500+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|: + qmp_cmd_name: migrate_cancel +2017-03-01T12:54:37.522607+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|: + {"timestamp": {"seconds": 1488344077, "microseconds": 522556}, "event": +"MIGRATION", "data": {"status": "cancelling"}} +2017-03-01T12:54:37.524671+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|: + qmp_cmd_name: cont +2017-03-01T12:54:37.524733+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: + virtio-balloon device status is 7 that means DRIVER OK +2017-03-01T12:54:37.525434+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: + virtio-net device status is 7 that means DRIVER OK +2017-03-01T12:54:37.525484+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: + virtio-blk device status is 7 that means DRIVER OK +2017-03-01T12:54:37.525562+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: + virtio-serial device status is 7 that means DRIVER OK +2017-03-01T12:54:37.527653+08:00|info|qemu[17672]|[17672]|vm_start[981]|: +vm_state-notify:3ms +2017-03-01T12:54:37.528523+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|: + {"timestamp": {"seconds": 1488344077, "microseconds": 527699}, "event": +"RESUME"} +2017-03-01T12:54:37.530680+08:00|info|qemu[17672]|[33614]|migration_bitmap_sync[720]|: + this iteration cycle takes 3s, new dirtied data:0MB +2017-03-01T12:54:37.530909+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|: + {"timestamp": {"seconds": 1488344077, "microseconds": 530733}, "event": +"MIGRATION_PASS", "data": {"pass": 3}} +2017-03-01T04:54:37.530997Z qemu-kvm: socket_writev_buffer: Got err=32 for +(131583/18446744073709551615) +qemu-kvm: /home/abuild/rpmbuild/BUILD/qemu-kvm-2.6.0/hw/net/virtio_net.c:1519: +virtio_net_save: Assertion `!n->vhost_started' failed. +2017-03-01 12:54:43.028: shutting down + +> +From qemu log, qemu received and processed migrate_cancel/cont qmp commands +after guest been stopped and entered the last round of migration. Then +migration thread try to save device state when guest is running(started by +cont command), causes assert and coredump. 
+This is because in last iter, we call cpu_synchronize_all_states() to +synchronize vcpu states, this call will release qemu_global_mutex and wait +for do_kvm_cpu_synchronize_state() to be executed on target vcpu: +(gdb) bt +#0 0x00007f763d1046d5 in pthread_cond_wait@@GLIBC_2.3.2 () from +/lib64/libpthread.so.0 +#1 0x00007f7643e51d7f in qemu_cond_wait (cond=0x7f764445eca0 <qemu_work_cond>, +mutex=0x7f764445eba0 <qemu_global_mutex>) at util/qemu-thread-posix.c:132 +#2 0x00007f7643a2e154 in run_on_cpu (cpu=0x7f7644e06d80, func=0x7f7643a46413 +<do_kvm_cpu_synchronize_state>, data=0x7f7644e06d80) at +/mnt/public/yanghy/qemu-kvm/cpus.c:995 +#3 0x00007f7643a46487 in kvm_cpu_synchronize_state (cpu=0x7f7644e06d80) at +/mnt/public/yanghy/qemu-kvm/kvm-all.c:1805 +#4 0x00007f7643a2c700 in cpu_synchronize_state (cpu=0x7f7644e06d80) at +/mnt/public/yanghy/qemu-kvm/include/sysemu/kvm.h:457 +#5 0x00007f7643a2db0c in cpu_synchronize_all_states () at +/mnt/public/yanghy/qemu-kvm/cpus.c:766 +#6 0x00007f7643a67b5b in qemu_savevm_state_complete_precopy (f=0x7f76462f2d30, +iterable_only=false) at /mnt/public/yanghy/qemu-kvm/migration/savevm.c:1051 +#7 0x00007f7643d121e9 in migration_completion (s=0x7f76443e78c0 +<current_migration.37571>, current_active_state=4, +old_vm_running=0x7f74343fda00, start_time=0x7f74343fda08) at +migration/migration.c:1753 +#8 0x00007f7643d126c5 in migration_thread (opaque=0x7f76443e78c0 +<current_migration.37571>) at migration/migration.c:1922 +#9 0x00007f763d100dc5 in start_thread () from /lib64/libpthread.so.0 +#10 0x00007f763ce2e71d in clone () from /lib64/libc.so.6 +(gdb) p iothread_locked +$1 = true + +and then, qemu main thread been executed, it won't block because migration +thread released the qemu_global_mutex: +(gdb) thr 1 +[Switching to thread 1 (Thread 0x7fe298e08bc0 (LWP 30767))] +#0 os_host_main_loop_wait (timeout=931565) at main-loop.c:270 +270 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout %d\n", +timeout); +(gdb) p iothread_locked +$2 = true +(gdb) l 268 +263 +264 ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, +timeout); +265 +266 +267 if (timeout) { +268 qemu_mutex_lock_iothread(); +269 if (runstate_check(RUN_STATE_FINISH_MIGRATE)) { +270 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout %d\n", +timeout); +271 } +272 } +(gdb) + +So, although we've hold iothread_lock in stop© phase of migration, we +can't guarantee the iothread been locked all through the stop & copy phase, +any thoughts on how to solve this problem? 
+ + +Thanks, +-Gonglei + +On Fri, 03/03 09:29, Gonglei (Arei) wrote: +> +Hello Juan & Dave, +> +> +We hit a bug in our test: +> +Network error occurs when migrating a guest, libvirt then rollback the +> +migration, causes qemu coredump +> +qemu log: +> +2017-03-01T12:54:33.904949+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|: +> +{"timestamp": {"seconds": 1488344073, "microseconds": 904914}, "event": +> +"STOP"} +> +2017-03-01T12:54:37.522500+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|: +> +qmp_cmd_name: migrate_cancel +> +2017-03-01T12:54:37.522607+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|: +> +{"timestamp": {"seconds": 1488344077, "microseconds": 522556}, "event": +> +"MIGRATION", "data": {"status": "cancelling"}} +> +2017-03-01T12:54:37.524671+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|: +> +qmp_cmd_name: cont +> +2017-03-01T12:54:37.524733+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: +> +virtio-balloon device status is 7 that means DRIVER OK +> +2017-03-01T12:54:37.525434+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: +> +virtio-net device status is 7 that means DRIVER OK +> +2017-03-01T12:54:37.525484+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: +> +virtio-blk device status is 7 that means DRIVER OK +> +2017-03-01T12:54:37.525562+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: +> +virtio-serial device status is 7 that means DRIVER OK +> +2017-03-01T12:54:37.527653+08:00|info|qemu[17672]|[17672]|vm_start[981]|: +> +vm_state-notify:3ms +> +2017-03-01T12:54:37.528523+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|: +> +{"timestamp": {"seconds": 1488344077, "microseconds": 527699}, "event": +> +"RESUME"} +> +2017-03-01T12:54:37.530680+08:00|info|qemu[17672]|[33614]|migration_bitmap_sync[720]|: +> +this iteration cycle takes 3s, new dirtied data:0MB +> +2017-03-01T12:54:37.530909+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|: +> +{"timestamp": {"seconds": 1488344077, "microseconds": 530733}, "event": +> +"MIGRATION_PASS", "data": {"pass": 3}} +> +2017-03-01T04:54:37.530997Z qemu-kvm: socket_writev_buffer: Got err=32 for +> +(131583/18446744073709551615) +> +qemu-kvm: +> +/home/abuild/rpmbuild/BUILD/qemu-kvm-2.6.0/hw/net/virtio_net.c:1519: +> +virtio_net_save: Assertion `!n->vhost_started' failed. +> +2017-03-01 12:54:43.028: shutting down +> +> +From qemu log, qemu received and processed migrate_cancel/cont qmp commands +> +after guest been stopped and entered the last round of migration. Then +> +migration thread try to save device state when guest is running(started by +> +cont command), causes assert and coredump. 
+> +This is because in last iter, we call cpu_synchronize_all_states() to +> +synchronize vcpu states, this call will release qemu_global_mutex and wait +> +for do_kvm_cpu_synchronize_state() to be executed on target vcpu: +> +(gdb) bt +> +#0 0x00007f763d1046d5 in pthread_cond_wait@@GLIBC_2.3.2 () from +> +/lib64/libpthread.so.0 +> +#1 0x00007f7643e51d7f in qemu_cond_wait (cond=0x7f764445eca0 +> +<qemu_work_cond>, mutex=0x7f764445eba0 <qemu_global_mutex>) at +> +util/qemu-thread-posix.c:132 +> +#2 0x00007f7643a2e154 in run_on_cpu (cpu=0x7f7644e06d80, func=0x7f7643a46413 +> +<do_kvm_cpu_synchronize_state>, data=0x7f7644e06d80) at +> +/mnt/public/yanghy/qemu-kvm/cpus.c:995 +> +#3 0x00007f7643a46487 in kvm_cpu_synchronize_state (cpu=0x7f7644e06d80) at +> +/mnt/public/yanghy/qemu-kvm/kvm-all.c:1805 +> +#4 0x00007f7643a2c700 in cpu_synchronize_state (cpu=0x7f7644e06d80) at +> +/mnt/public/yanghy/qemu-kvm/include/sysemu/kvm.h:457 +> +#5 0x00007f7643a2db0c in cpu_synchronize_all_states () at +> +/mnt/public/yanghy/qemu-kvm/cpus.c:766 +> +#6 0x00007f7643a67b5b in qemu_savevm_state_complete_precopy +> +(f=0x7f76462f2d30, iterable_only=false) at +> +/mnt/public/yanghy/qemu-kvm/migration/savevm.c:1051 +> +#7 0x00007f7643d121e9 in migration_completion (s=0x7f76443e78c0 +> +<current_migration.37571>, current_active_state=4, +> +old_vm_running=0x7f74343fda00, start_time=0x7f74343fda08) at +> +migration/migration.c:1753 +> +#8 0x00007f7643d126c5 in migration_thread (opaque=0x7f76443e78c0 +> +<current_migration.37571>) at migration/migration.c:1922 +> +#9 0x00007f763d100dc5 in start_thread () from /lib64/libpthread.so.0 +> +#10 0x00007f763ce2e71d in clone () from /lib64/libc.so.6 +> +(gdb) p iothread_locked +> +$1 = true +> +> +and then, qemu main thread been executed, it won't block because migration +> +thread released the qemu_global_mutex: +> +(gdb) thr 1 +> +[Switching to thread 1 (Thread 0x7fe298e08bc0 (LWP 30767))] +> +#0 os_host_main_loop_wait (timeout=931565) at main-loop.c:270 +> +270 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout +> +%d\n", timeout); +> +(gdb) p iothread_locked +> +$2 = true +> +(gdb) l 268 +> +263 +> +264 ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, +> +timeout); +> +265 +> +266 +> +267 if (timeout) { +> +268 qemu_mutex_lock_iothread(); +> +269 if (runstate_check(RUN_STATE_FINISH_MIGRATE)) { +> +270 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout +> +%d\n", timeout); +> +271 } +> +272 } +> +(gdb) +> +> +So, although we've hold iothread_lock in stop© phase of migration, we +> +can't guarantee the iothread been locked all through the stop & copy phase, +> +any thoughts on how to solve this problem? +Could you post a backtrace of the assertion? 
+ +Fam + +On 2017/3/3 18:42, Fam Zheng wrote: +> +On Fri, 03/03 09:29, Gonglei (Arei) wrote: +> +> Hello Juan & Dave, +> +> +> +> We hit a bug in our test: +> +> Network error occurs when migrating a guest, libvirt then rollback the +> +> migration, causes qemu coredump +> +> qemu log: +> +> 2017-03-01T12:54:33.904949+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|: +> +> {"timestamp": {"seconds": 1488344073, "microseconds": 904914}, "event": +> +> "STOP"} +> +> 2017-03-01T12:54:37.522500+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|: +> +> qmp_cmd_name: migrate_cancel +> +> 2017-03-01T12:54:37.522607+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|: +> +> {"timestamp": {"seconds": 1488344077, "microseconds": 522556}, "event": +> +> "MIGRATION", "data": {"status": "cancelling"}} +> +> 2017-03-01T12:54:37.524671+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|: +> +> qmp_cmd_name: cont +> +> 2017-03-01T12:54:37.524733+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: +> +> virtio-balloon device status is 7 that means DRIVER OK +> +> 2017-03-01T12:54:37.525434+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: +> +> virtio-net device status is 7 that means DRIVER OK +> +> 2017-03-01T12:54:37.525484+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: +> +> virtio-blk device status is 7 that means DRIVER OK +> +> 2017-03-01T12:54:37.525562+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: +> +> virtio-serial device status is 7 that means DRIVER OK +> +> 2017-03-01T12:54:37.527653+08:00|info|qemu[17672]|[17672]|vm_start[981]|: +> +> vm_state-notify:3ms +> +> 2017-03-01T12:54:37.528523+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|: +> +> {"timestamp": {"seconds": 1488344077, "microseconds": 527699}, "event": +> +> "RESUME"} +> +> 2017-03-01T12:54:37.530680+08:00|info|qemu[17672]|[33614]|migration_bitmap_sync[720]|: +> +> this iteration cycle takes 3s, new dirtied data:0MB +> +> 2017-03-01T12:54:37.530909+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|: +> +> {"timestamp": {"seconds": 1488344077, "microseconds": 530733}, "event": +> +> "MIGRATION_PASS", "data": {"pass": 3}} +> +> 2017-03-01T04:54:37.530997Z qemu-kvm: socket_writev_buffer: Got err=32 for +> +> (131583/18446744073709551615) +> +> qemu-kvm: +> +> /home/abuild/rpmbuild/BUILD/qemu-kvm-2.6.0/hw/net/virtio_net.c:1519: +> +> virtio_net_save: Assertion `!n->vhost_started' failed. +> +> 2017-03-01 12:54:43.028: shutting down +> +> +> +> From qemu log, qemu received and processed migrate_cancel/cont qmp commands +> +> after guest been stopped and entered the last round of migration. Then +> +> migration thread try to save device state when guest is running(started by +> +> cont command), causes assert and coredump. 
+> +> This is because in last iter, we call cpu_synchronize_all_states() to +> +> synchronize vcpu states, this call will release qemu_global_mutex and wait +> +> for do_kvm_cpu_synchronize_state() to be executed on target vcpu: +> +> (gdb) bt +> +> #0 0x00007f763d1046d5 in pthread_cond_wait@@GLIBC_2.3.2 () from +> +> /lib64/libpthread.so.0 +> +> #1 0x00007f7643e51d7f in qemu_cond_wait (cond=0x7f764445eca0 +> +> <qemu_work_cond>, mutex=0x7f764445eba0 <qemu_global_mutex>) at +> +> util/qemu-thread-posix.c:132 +> +> #2 0x00007f7643a2e154 in run_on_cpu (cpu=0x7f7644e06d80, +> +> func=0x7f7643a46413 <do_kvm_cpu_synchronize_state>, data=0x7f7644e06d80) at +> +> /mnt/public/yanghy/qemu-kvm/cpus.c:995 +> +> #3 0x00007f7643a46487 in kvm_cpu_synchronize_state (cpu=0x7f7644e06d80) at +> +> /mnt/public/yanghy/qemu-kvm/kvm-all.c:1805 +> +> #4 0x00007f7643a2c700 in cpu_synchronize_state (cpu=0x7f7644e06d80) at +> +> /mnt/public/yanghy/qemu-kvm/include/sysemu/kvm.h:457 +> +> #5 0x00007f7643a2db0c in cpu_synchronize_all_states () at +> +> /mnt/public/yanghy/qemu-kvm/cpus.c:766 +> +> #6 0x00007f7643a67b5b in qemu_savevm_state_complete_precopy +> +> (f=0x7f76462f2d30, iterable_only=false) at +> +> /mnt/public/yanghy/qemu-kvm/migration/savevm.c:1051 +> +> #7 0x00007f7643d121e9 in migration_completion (s=0x7f76443e78c0 +> +> <current_migration.37571>, current_active_state=4, +> +> old_vm_running=0x7f74343fda00, start_time=0x7f74343fda08) at +> +> migration/migration.c:1753 +> +> #8 0x00007f7643d126c5 in migration_thread (opaque=0x7f76443e78c0 +> +> <current_migration.37571>) at migration/migration.c:1922 +> +> #9 0x00007f763d100dc5 in start_thread () from /lib64/libpthread.so.0 +> +> #10 0x00007f763ce2e71d in clone () from /lib64/libc.so.6 +> +> (gdb) p iothread_locked +> +> $1 = true +> +> +> +> and then, qemu main thread been executed, it won't block because migration +> +> thread released the qemu_global_mutex: +> +> (gdb) thr 1 +> +> [Switching to thread 1 (Thread 0x7fe298e08bc0 (LWP 30767))] +> +> #0 os_host_main_loop_wait (timeout=931565) at main-loop.c:270 +> +> 270 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout +> +> %d\n", timeout); +> +> (gdb) p iothread_locked +> +> $2 = true +> +> (gdb) l 268 +> +> 263 +> +> 264 ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, +> +> timeout); +> +> 265 +> +> 266 +> +> 267 if (timeout) { +> +> 268 qemu_mutex_lock_iothread(); +> +> 269 if (runstate_check(RUN_STATE_FINISH_MIGRATE)) { +> +> 270 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout +> +> %d\n", timeout); +> +> 271 } +> +> 272 } +> +> (gdb) +> +> +> +> So, although we've hold iothread_lock in stop© phase of migration, we +> +> can't guarantee the iothread been locked all through the stop & copy phase, +> +> any thoughts on how to solve this problem? +> +> +Could you post a backtrace of the assertion? 
+#0 0x00007f97b1fbe5d7 in raise () from /usr/lib64/libc.so.6 +#1 0x00007f97b1fbfcc8 in abort () from /usr/lib64/libc.so.6 +#2 0x00007f97b1fb7546 in __assert_fail_base () from /usr/lib64/libc.so.6 +#3 0x00007f97b1fb75f2 in __assert_fail () from /usr/lib64/libc.so.6 +#4 0x000000000049fd19 in virtio_net_save (f=0x7f97a8ca44d0, +opaque=0x7f97a86e9018) at /usr/src/debug/qemu-kvm-2.6.0/hw/ +#5 0x000000000047e380 in vmstate_save_old_style (address@hidden, +address@hidden, se=0x7f9 +#6 0x000000000047fb93 in vmstate_save (address@hidden, address@hidden, +address@hidden +#7 0x0000000000481ad2 in qemu_savevm_state_complete_precopy (f=0x7f97a8ca44d0, +address@hidden) +#8 0x00000000006c6b60 in migration_completion (address@hidden +<current_migration.38312>, current_active_state=curre + address@hidden) at migration/migration.c:1761 +#9 0x00000000006c71db in migration_thread (address@hidden +<current_migration.38312>) at migration/migrati + +> +> +Fam +> +-- +Thanks, +Yang + +* Gonglei (Arei) (address@hidden) wrote: +> +Hello Juan & Dave, +cc'ing in pbonzini since it's magic involving cpu_synrhonize_all_states() + +> +We hit a bug in our test: +> +Network error occurs when migrating a guest, libvirt then rollback the +> +migration, causes qemu coredump +> +qemu log: +> +2017-03-01T12:54:33.904949+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|: +> +{"timestamp": {"seconds": 1488344073, "microseconds": 904914}, "event": +> +"STOP"} +> +2017-03-01T12:54:37.522500+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|: +> +qmp_cmd_name: migrate_cancel +> +2017-03-01T12:54:37.522607+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|: +> +{"timestamp": {"seconds": 1488344077, "microseconds": 522556}, "event": +> +"MIGRATION", "data": {"status": "cancelling"}} +> +2017-03-01T12:54:37.524671+08:00|info|qemu[17672]|[17672]|handle_qmp_command[3930]|: +> +qmp_cmd_name: cont +> +2017-03-01T12:54:37.524733+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: +> +virtio-balloon device status is 7 that means DRIVER OK +> +2017-03-01T12:54:37.525434+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: +> +virtio-net device status is 7 that means DRIVER OK +> +2017-03-01T12:54:37.525484+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: +> +virtio-blk device status is 7 that means DRIVER OK +> +2017-03-01T12:54:37.525562+08:00|info|qemu[17672]|[17672]|virtio_set_status[725]|: +> +virtio-serial device status is 7 that means DRIVER OK +> +2017-03-01T12:54:37.527653+08:00|info|qemu[17672]|[17672]|vm_start[981]|: +> +vm_state-notify:3ms +> +2017-03-01T12:54:37.528523+08:00|info|qemu[17672]|[17672]|monitor_qapi_event_emit[479]|: +> +{"timestamp": {"seconds": 1488344077, "microseconds": 527699}, "event": +> +"RESUME"} +> +2017-03-01T12:54:37.530680+08:00|info|qemu[17672]|[33614]|migration_bitmap_sync[720]|: +> +this iteration cycle takes 3s, new dirtied data:0MB +> +2017-03-01T12:54:37.530909+08:00|info|qemu[17672]|[33614]|monitor_qapi_event_emit[479]|: +> +{"timestamp": {"seconds": 1488344077, "microseconds": 530733}, "event": +> +"MIGRATION_PASS", "data": {"pass": 3}} +> +2017-03-01T04:54:37.530997Z qemu-kvm: socket_writev_buffer: Got err=32 for +> +(131583/18446744073709551615) +> +qemu-kvm: +> +/home/abuild/rpmbuild/BUILD/qemu-kvm-2.6.0/hw/net/virtio_net.c:1519: +> +virtio_net_save: Assertion `!n->vhost_started' failed. 
+> +2017-03-01 12:54:43.028: shutting down +> +> +From qemu log, qemu received and processed migrate_cancel/cont qmp commands +> +after guest been stopped and entered the last round of migration. Then +> +migration thread try to save device state when guest is running(started by +> +cont command), causes assert and coredump. +> +This is because in last iter, we call cpu_synchronize_all_states() to +> +synchronize vcpu states, this call will release qemu_global_mutex and wait +> +for do_kvm_cpu_synchronize_state() to be executed on target vcpu: +> +(gdb) bt +> +#0 0x00007f763d1046d5 in pthread_cond_wait@@GLIBC_2.3.2 () from +> +/lib64/libpthread.so.0 +> +#1 0x00007f7643e51d7f in qemu_cond_wait (cond=0x7f764445eca0 +> +<qemu_work_cond>, mutex=0x7f764445eba0 <qemu_global_mutex>) at +> +util/qemu-thread-posix.c:132 +> +#2 0x00007f7643a2e154 in run_on_cpu (cpu=0x7f7644e06d80, func=0x7f7643a46413 +> +<do_kvm_cpu_synchronize_state>, data=0x7f7644e06d80) at +> +/mnt/public/yanghy/qemu-kvm/cpus.c:995 +> +#3 0x00007f7643a46487 in kvm_cpu_synchronize_state (cpu=0x7f7644e06d80) at +> +/mnt/public/yanghy/qemu-kvm/kvm-all.c:1805 +> +#4 0x00007f7643a2c700 in cpu_synchronize_state (cpu=0x7f7644e06d80) at +> +/mnt/public/yanghy/qemu-kvm/include/sysemu/kvm.h:457 +> +#5 0x00007f7643a2db0c in cpu_synchronize_all_states () at +> +/mnt/public/yanghy/qemu-kvm/cpus.c:766 +> +#6 0x00007f7643a67b5b in qemu_savevm_state_complete_precopy +> +(f=0x7f76462f2d30, iterable_only=false) at +> +/mnt/public/yanghy/qemu-kvm/migration/savevm.c:1051 +> +#7 0x00007f7643d121e9 in migration_completion (s=0x7f76443e78c0 +> +<current_migration.37571>, current_active_state=4, +> +old_vm_running=0x7f74343fda00, start_time=0x7f74343fda08) at +> +migration/migration.c:1753 +> +#8 0x00007f7643d126c5 in migration_thread (opaque=0x7f76443e78c0 +> +<current_migration.37571>) at migration/migration.c:1922 +> +#9 0x00007f763d100dc5 in start_thread () from /lib64/libpthread.so.0 +> +#10 0x00007f763ce2e71d in clone () from /lib64/libc.so.6 +> +(gdb) p iothread_locked +> +$1 = true +> +> +and then, qemu main thread been executed, it won't block because migration +> +thread released the qemu_global_mutex: +> +(gdb) thr 1 +> +[Switching to thread 1 (Thread 0x7fe298e08bc0 (LWP 30767))] +> +#0 os_host_main_loop_wait (timeout=931565) at main-loop.c:270 +> +270 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout +> +%d\n", timeout); +> +(gdb) p iothread_locked +> +$2 = true +> +(gdb) l 268 +> +263 +> +264 ret = qemu_poll_ns((GPollFD *)gpollfds->data, gpollfds->len, +> +timeout); +> +265 +> +266 +> +267 if (timeout) { +> +268 qemu_mutex_lock_iothread(); +> +269 if (runstate_check(RUN_STATE_FINISH_MIGRATE)) { +> +270 QEMU_LOG(LOG_INFO,"***** after qemu_pool_ns: timeout +> +%d\n", timeout); +> +271 } +> +272 } +> +(gdb) +> +> +So, although we've hold iothread_lock in stop© phase of migration, we +> +can't guarantee the iothread been locked all through the stop & copy phase, +> +any thoughts on how to solve this problem? +Ouch that's pretty nasty; I remember Paolo explaining to me a while ago that +their were times when run_on_cpu would have to drop the BQL and I worried about +it, +but this is the 1st time I've seen an error due to it. + +Do you know what the migration state was at that point? Was it +MIGRATION_STATUS_CANCELLING? +I'm thinking perhaps we should stop 'cont' from continuing while migration is in +MIGRATION_STATUS_CANCELLING. Do we send an event when we hit CANCELLED - so +that +perhaps libvirt could avoid sending the 'cont' until then? 
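+
+A minimal sketch of the guard floated here -- reject 'cont' while the source
+is still in the device-save phase -- keyed off the run state rather than the
+migration state (which of the two to check is discussed further below). The
+error message and placement are illustrative, not a quote of an actual patch:
+
+  void qmp_cont(Error **errp)
+  {
+      /* Don't let the guest be restarted under the migration thread
+       * while it is saving device state. */
+      if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
+          error_setg(errp, "Migration is not finalized yet");
+          return;
+      }
+
+      /* ... existing resume path: reset block devices, vm_start(), ... */
+      vm_start();
+  }
+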
+ +Dave + + +> +> +Thanks, +> +-Gonglei +> +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +On 03/03/2017 13:00, Dr. David Alan Gilbert wrote: +> +Ouch that's pretty nasty; I remember Paolo explaining to me a while ago that +> +their were times when run_on_cpu would have to drop the BQL and I worried +> +about it, +> +but this is the 1st time I've seen an error due to it. +> +> +Do you know what the migration state was at that point? Was it +> +MIGRATION_STATUS_CANCELLING? +> +I'm thinking perhaps we should stop 'cont' from continuing while migration is +> +in +> +MIGRATION_STATUS_CANCELLING. Do we send an event when we hit CANCELLED - so +> +that +> +perhaps libvirt could avoid sending the 'cont' until then? +No, there's no event, though I thought libvirt would poll until +"query-migrate" returns the cancelled state. Of course that is a small +consolation, because a segfault is unacceptable. + +One possibility is to suspend the monitor in qmp_migrate_cancel and +resume it (with add_migration_state_change_notifier) when we hit the +CANCELLED state. I'm not sure what the latency would be between the end +of migrate_fd_cancel and finally reaching CANCELLED. + +Paolo + +* Paolo Bonzini (address@hidden) wrote: +> +> +> +On 03/03/2017 13:00, Dr. David Alan Gilbert wrote: +> +> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago that +> +> their were times when run_on_cpu would have to drop the BQL and I worried +> +> about it, +> +> but this is the 1st time I've seen an error due to it. +> +> +> +> Do you know what the migration state was at that point? Was it +> +> MIGRATION_STATUS_CANCELLING? +> +> I'm thinking perhaps we should stop 'cont' from continuing while migration +> +> is in +> +> MIGRATION_STATUS_CANCELLING. Do we send an event when we hit CANCELLED - +> +> so that +> +> perhaps libvirt could avoid sending the 'cont' until then? +> +> +No, there's no event, though I thought libvirt would poll until +> +"query-migrate" returns the cancelled state. Of course that is a small +> +consolation, because a segfault is unacceptable. +I think you might get an event if you set the new migrate capability called +'events' on! + +void migrate_set_state(int *state, int old_state, int new_state) +{ + if (atomic_cmpxchg(state, old_state, new_state) == old_state) { + trace_migrate_set_state(new_state); + migrate_generate_event(new_state); + } +} + +static void migrate_generate_event(int new_state) +{ + if (migrate_use_events()) { + qapi_event_send_migration(new_state, &error_abort); + } +} + +That event feature went in sometime after 2.3.0. + +> +One possibility is to suspend the monitor in qmp_migrate_cancel and +> +resume it (with add_migration_state_change_notifier) when we hit the +> +CANCELLED state. I'm not sure what the latency would be between the end +> +of migrate_fd_cancel and finally reaching CANCELLED. +I don't like suspending monitors; it can potentially take quite a significant +time to do a cancel. +How about making 'cont' fail if we're in CANCELLING? + +I'd really love to see the 'run_on_cpu' being more careful about the BQL; +we really need all of the rest of the devices to stay quiesced at times. + +Dave + +> +Paolo +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +On 03/03/2017 14:11, Dr. David Alan Gilbert wrote: +> +* Paolo Bonzini (address@hidden) wrote: +> +> +> +> +> +> On 03/03/2017 13:00, Dr. 
David Alan Gilbert wrote: +> +>> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago that +> +>> their were times when run_on_cpu would have to drop the BQL and I worried +> +>> about it, +> +>> but this is the 1st time I've seen an error due to it. +> +>> +> +>> Do you know what the migration state was at that point? Was it +> +>> MIGRATION_STATUS_CANCELLING? +> +>> I'm thinking perhaps we should stop 'cont' from continuing while migration +> +>> is in +> +>> MIGRATION_STATUS_CANCELLING. Do we send an event when we hit CANCELLED - +> +>> so that +> +>> perhaps libvirt could avoid sending the 'cont' until then? +> +> +> +> No, there's no event, though I thought libvirt would poll until +> +> "query-migrate" returns the cancelled state. Of course that is a small +> +> consolation, because a segfault is unacceptable. +> +> +I think you might get an event if you set the new migrate capability called +> +'events' on! +> +> +void migrate_set_state(int *state, int old_state, int new_state) +> +{ +> +if (atomic_cmpxchg(state, old_state, new_state) == old_state) { +> +trace_migrate_set_state(new_state); +> +migrate_generate_event(new_state); +> +} +> +} +> +> +static void migrate_generate_event(int new_state) +> +{ +> +if (migrate_use_events()) { +> +qapi_event_send_migration(new_state, &error_abort); +> +} +> +} +> +> +That event feature went in sometime after 2.3.0. +> +> +> One possibility is to suspend the monitor in qmp_migrate_cancel and +> +> resume it (with add_migration_state_change_notifier) when we hit the +> +> CANCELLED state. I'm not sure what the latency would be between the end +> +> of migrate_fd_cancel and finally reaching CANCELLED. +> +> +I don't like suspending monitors; it can potentially take quite a significant +> +time to do a cancel. +> +How about making 'cont' fail if we're in CANCELLING? +Actually I thought that would be the case already (in fact CANCELLING is +internal only; the outside world sees it as "active" in query-migrate). + +Lei, what is the runstate? (That is, why did cont succeed at all)? + +Paolo + +> +I'd really love to see the 'run_on_cpu' being more careful about the BQL; +> +we really need all of the rest of the devices to stay quiesced at times. +That's not really possible, because of how condition variables work. :( + +* Paolo Bonzini (address@hidden) wrote: +> +> +> +On 03/03/2017 14:11, Dr. David Alan Gilbert wrote: +> +> * Paolo Bonzini (address@hidden) wrote: +> +>> +> +>> +> +>> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote: +> +>>> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago +> +>>> that +> +>>> their were times when run_on_cpu would have to drop the BQL and I worried +> +>>> about it, +> +>>> but this is the 1st time I've seen an error due to it. +> +>>> +> +>>> Do you know what the migration state was at that point? Was it +> +>>> MIGRATION_STATUS_CANCELLING? +> +>>> I'm thinking perhaps we should stop 'cont' from continuing while +> +>>> migration is in +> +>>> MIGRATION_STATUS_CANCELLING. Do we send an event when we hit CANCELLED - +> +>>> so that +> +>>> perhaps libvirt could avoid sending the 'cont' until then? +> +>> +> +>> No, there's no event, though I thought libvirt would poll until +> +>> "query-migrate" returns the cancelled state. Of course that is a small +> +>> consolation, because a segfault is unacceptable. +> +> +> +> I think you might get an event if you set the new migrate capability called +> +> 'events' on! 
+> +> +> +> void migrate_set_state(int *state, int old_state, int new_state) +> +> { +> +> if (atomic_cmpxchg(state, old_state, new_state) == old_state) { +> +> trace_migrate_set_state(new_state); +> +> migrate_generate_event(new_state); +> +> } +> +> } +> +> +> +> static void migrate_generate_event(int new_state) +> +> { +> +> if (migrate_use_events()) { +> +> qapi_event_send_migration(new_state, &error_abort); +> +> } +> +> } +> +> +> +> That event feature went in sometime after 2.3.0. +> +> +> +>> One possibility is to suspend the monitor in qmp_migrate_cancel and +> +>> resume it (with add_migration_state_change_notifier) when we hit the +> +>> CANCELLED state. I'm not sure what the latency would be between the end +> +>> of migrate_fd_cancel and finally reaching CANCELLED. +> +> +> +> I don't like suspending monitors; it can potentially take quite a +> +> significant +> +> time to do a cancel. +> +> How about making 'cont' fail if we're in CANCELLING? +> +> +Actually I thought that would be the case already (in fact CANCELLING is +> +internal only; the outside world sees it as "active" in query-migrate). +> +> +Lei, what is the runstate? (That is, why did cont succeed at all)? +I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the device +save, and that's what we get at the end of a migrate and it's legal to restart +from there. + +> +Paolo +> +> +> I'd really love to see the 'run_on_cpu' being more careful about the BQL; +> +> we really need all of the rest of the devices to stay quiesced at times. +> +> +That's not really possible, because of how condition variables work. :( +*Really* we need to find a solution to that - there's probably lots of +other things that can spring up in that small window other than the +'cont'. + +Dave + +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +On 03/03/2017 14:26, Dr. David Alan Gilbert wrote: +> +* Paolo Bonzini (address@hidden) wrote: +> +> +> +> +> +> On 03/03/2017 14:11, Dr. David Alan Gilbert wrote: +> +>> * Paolo Bonzini (address@hidden) wrote: +> +>>> +> +>>> +> +>>> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote: +> +>>>> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago +> +>>>> that +> +>>>> their were times when run_on_cpu would have to drop the BQL and I worried +> +>>>> about it, +> +>>>> but this is the 1st time I've seen an error due to it. +> +>>>> +> +>>>> Do you know what the migration state was at that point? Was it +> +>>>> MIGRATION_STATUS_CANCELLING? +> +>>>> I'm thinking perhaps we should stop 'cont' from continuing while +> +>>>> migration is in +> +>>>> MIGRATION_STATUS_CANCELLING. Do we send an event when we hit CANCELLED - +> +>>>> so that +> +>>>> perhaps libvirt could avoid sending the 'cont' until then? +> +>>> +> +>>> No, there's no event, though I thought libvirt would poll until +> +>>> "query-migrate" returns the cancelled state. Of course that is a small +> +>>> consolation, because a segfault is unacceptable. +> +>> +> +>> I think you might get an event if you set the new migrate capability called +> +>> 'events' on! 
+> +>> +> +>> void migrate_set_state(int *state, int old_state, int new_state) +> +>> { +> +>> if (atomic_cmpxchg(state, old_state, new_state) == old_state) { +> +>> trace_migrate_set_state(new_state); +> +>> migrate_generate_event(new_state); +> +>> } +> +>> } +> +>> +> +>> static void migrate_generate_event(int new_state) +> +>> { +> +>> if (migrate_use_events()) { +> +>> qapi_event_send_migration(new_state, &error_abort); +> +>> } +> +>> } +> +>> +> +>> That event feature went in sometime after 2.3.0. +> +>> +> +>>> One possibility is to suspend the monitor in qmp_migrate_cancel and +> +>>> resume it (with add_migration_state_change_notifier) when we hit the +> +>>> CANCELLED state. I'm not sure what the latency would be between the end +> +>>> of migrate_fd_cancel and finally reaching CANCELLED. +> +>> +> +>> I don't like suspending monitors; it can potentially take quite a +> +>> significant +> +>> time to do a cancel. +> +>> How about making 'cont' fail if we're in CANCELLING? +> +> +> +> Actually I thought that would be the case already (in fact CANCELLING is +> +> internal only; the outside world sees it as "active" in query-migrate). +> +> +> +> Lei, what is the runstate? (That is, why did cont succeed at all)? +> +> +I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the device +> +save, and that's what we get at the end of a migrate and it's legal to restart +> +from there. +Yeah, but I think we get there at the end of a failed migrate only. So +perhaps we can introduce a new state RUN_STATE_FAILED_MIGRATE and forbid +"cont" from finish-migrate (only allow it from failed-migrate)? + +Paolo + +> +> Paolo +> +> +> +>> I'd really love to see the 'run_on_cpu' being more careful about the BQL; +> +>> we really need all of the rest of the devices to stay quiesced at times. +> +> +> +> That's not really possible, because of how condition variables work. :( +> +> +*Really* we need to find a solution to that - there's probably lots of +> +other things that can spring up in that small window other than the +> +'cont'. +> +> +Dave +> +> +-- +> +Dr. David Alan Gilbert / address@hidden / Manchester, UK +> + +Hi Paolo, + +On Fri, Mar 3, 2017 at 9:33 PM, Paolo Bonzini <address@hidden> wrote: + +> +> +> +On 03/03/2017 14:26, Dr. David Alan Gilbert wrote: +> +> * Paolo Bonzini (address@hidden) wrote: +> +>> +> +>> +> +>> On 03/03/2017 14:11, Dr. David Alan Gilbert wrote: +> +>>> * Paolo Bonzini (address@hidden) wrote: +> +>>>> +> +>>>> +> +>>>> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote: +> +>>>>> Ouch that's pretty nasty; I remember Paolo explaining to me a while +> +ago that +> +>>>>> their were times when run_on_cpu would have to drop the BQL and I +> +worried about it, +> +>>>>> but this is the 1st time I've seen an error due to it. +> +>>>>> +> +>>>>> Do you know what the migration state was at that point? Was it +> +MIGRATION_STATUS_CANCELLING? +> +>>>>> I'm thinking perhaps we should stop 'cont' from continuing while +> +migration is in +> +>>>>> MIGRATION_STATUS_CANCELLING. Do we send an event when we hit +> +CANCELLED - so that +> +>>>>> perhaps libvirt could avoid sending the 'cont' until then? +> +>>>> +> +>>>> No, there's no event, though I thought libvirt would poll until +> +>>>> "query-migrate" returns the cancelled state. Of course that is a +> +small +> +>>>> consolation, because a segfault is unacceptable. +> +>>> +> +>>> I think you might get an event if you set the new migrate capability +> +called +> +>>> 'events' on! 
+> +>>> +> +>>> void migrate_set_state(int *state, int old_state, int new_state) +> +>>> { +> +>>> if (atomic_cmpxchg(state, old_state, new_state) == old_state) { +> +>>> trace_migrate_set_state(new_state); +> +>>> migrate_generate_event(new_state); +> +>>> } +> +>>> } +> +>>> +> +>>> static void migrate_generate_event(int new_state) +> +>>> { +> +>>> if (migrate_use_events()) { +> +>>> qapi_event_send_migration(new_state, &error_abort); +> +>>> } +> +>>> } +> +>>> +> +>>> That event feature went in sometime after 2.3.0. +> +>>> +> +>>>> One possibility is to suspend the monitor in qmp_migrate_cancel and +> +>>>> resume it (with add_migration_state_change_notifier) when we hit the +> +>>>> CANCELLED state. I'm not sure what the latency would be between the +> +end +> +>>>> of migrate_fd_cancel and finally reaching CANCELLED. +> +>>> +> +>>> I don't like suspending monitors; it can potentially take quite a +> +significant +> +>>> time to do a cancel. +> +>>> How about making 'cont' fail if we're in CANCELLING? +> +>> +> +>> Actually I thought that would be the case already (in fact CANCELLING is +> +>> internal only; the outside world sees it as "active" in query-migrate). +> +>> +> +>> Lei, what is the runstate? (That is, why did cont succeed at all)? +> +> +> +> I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the +> +device +> +> save, and that's what we get at the end of a migrate and it's legal to +> +restart +> +> from there. +> +> +Yeah, but I think we get there at the end of a failed migrate only. So +> +perhaps we can introduce a new state RUN_STATE_FAILED_MIGRATE +I think we do not need to introduce a new state here. If we hit 'cont' and +the run state is RUN_STATE_FINISH_MIGRATE, we could assume that +migration failed because 'RUN_STATE_FINISH_MIGRATE' only exists on +source side, means we are finishing migration, a 'cont' at the meantime +indicates that we are rolling back, otherwise source side should be +destroyed. + + +> +and forbid +> +"cont" from finish-migrate (only allow it from failed-migrate)? +> +The problem of forbid 'cont' here is that it will result in a failed +migration and the source +side will remain paused. We actually expect a usable guest when rollback. +Is there a way to kill migration thread when we're under main thread, if +there is, we +could do the following to solve this problem: +1. 'cont' received during runstate RUN_STATE_FINISH_MIGRATE +2. kill migration thread +3. vm_start() + +But this only solves 'cont' problem. As Dave said before, other things could +happen during the small windows while we are finishing migration, that's +what I was worried about... + + +> +Paolo +> +> +>> Paolo +> +>> +> +>>> I'd really love to see the 'run_on_cpu' being more careful about the +> +BQL; +> +>>> we really need all of the rest of the devices to stay quiesced at +> +times. +> +>> +> +>> That's not really possible, because of how condition variables work. :( +> +> +> +> *Really* we need to find a solution to that - there's probably lots of +> +> other things that can spring up in that small window other than the +> +> 'cont'. +> +> +> +> Dave +> +> +> +> -- +> +> Dr. David Alan Gilbert / address@hidden / Manchester, UK +> +> +> +> + +* Paolo Bonzini (address@hidden) wrote: +> +> +> +On 03/03/2017 14:26, Dr. David Alan Gilbert wrote: +> +> * Paolo Bonzini (address@hidden) wrote: +> +>> +> +>> +> +>> On 03/03/2017 14:11, Dr. David Alan Gilbert wrote: +> +>>> * Paolo Bonzini (address@hidden) wrote: +> +>>>> +> +>>>> +> +>>>> On 03/03/2017 13:00, Dr. 
David Alan Gilbert wrote: +> +>>>>> Ouch that's pretty nasty; I remember Paolo explaining to me a while ago +> +>>>>> that +> +>>>>> their were times when run_on_cpu would have to drop the BQL and I +> +>>>>> worried about it, +> +>>>>> but this is the 1st time I've seen an error due to it. +> +>>>>> +> +>>>>> Do you know what the migration state was at that point? Was it +> +>>>>> MIGRATION_STATUS_CANCELLING? +> +>>>>> I'm thinking perhaps we should stop 'cont' from continuing while +> +>>>>> migration is in +> +>>>>> MIGRATION_STATUS_CANCELLING. Do we send an event when we hit CANCELLED +> +>>>>> - so that +> +>>>>> perhaps libvirt could avoid sending the 'cont' until then? +> +>>>> +> +>>>> No, there's no event, though I thought libvirt would poll until +> +>>>> "query-migrate" returns the cancelled state. Of course that is a small +> +>>>> consolation, because a segfault is unacceptable. +> +>>> +> +>>> I think you might get an event if you set the new migrate capability +> +>>> called +> +>>> 'events' on! +> +>>> +> +>>> void migrate_set_state(int *state, int old_state, int new_state) +> +>>> { +> +>>> if (atomic_cmpxchg(state, old_state, new_state) == old_state) { +> +>>> trace_migrate_set_state(new_state); +> +>>> migrate_generate_event(new_state); +> +>>> } +> +>>> } +> +>>> +> +>>> static void migrate_generate_event(int new_state) +> +>>> { +> +>>> if (migrate_use_events()) { +> +>>> qapi_event_send_migration(new_state, &error_abort); +> +>>> } +> +>>> } +> +>>> +> +>>> That event feature went in sometime after 2.3.0. +> +>>> +> +>>>> One possibility is to suspend the monitor in qmp_migrate_cancel and +> +>>>> resume it (with add_migration_state_change_notifier) when we hit the +> +>>>> CANCELLED state. I'm not sure what the latency would be between the end +> +>>>> of migrate_fd_cancel and finally reaching CANCELLED. +> +>>> +> +>>> I don't like suspending monitors; it can potentially take quite a +> +>>> significant +> +>>> time to do a cancel. +> +>>> How about making 'cont' fail if we're in CANCELLING? +> +>> +> +>> Actually I thought that would be the case already (in fact CANCELLING is +> +>> internal only; the outside world sees it as "active" in query-migrate). +> +>> +> +>> Lei, what is the runstate? (That is, why did cont succeed at all)? +> +> +> +> I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the +> +> device +> +> save, and that's what we get at the end of a migrate and it's legal to +> +> restart +> +> from there. +> +> +Yeah, but I think we get there at the end of a failed migrate only. So +> +perhaps we can introduce a new state RUN_STATE_FAILED_MIGRATE and forbid +> +"cont" from finish-migrate (only allow it from failed-migrate)? +OK, I was wrong in my previous statement; we actually go +FINISH_MIGRATE->POSTMIGRATE +so no new state is needed; you shouldn't be restarting the cpu in +FINISH_MIGRATE. + +My preference is to get libvirt to wait for the transition to POSTMIGRATE before +it issues the 'cont'. I'd rather not block the monitor with 'cont' but I'm +not sure how we'd cleanly make cont fail without breaking existing libvirts +that usually don't hit this race. (cc'ing in Jiri). + +Dave + +> +Paolo +> +> +>> Paolo +> +>> +> +>>> I'd really love to see the 'run_on_cpu' being more careful about the BQL; +> +>>> we really need all of the rest of the devices to stay quiesced at times. +> +>> +> +>> That's not really possible, because of how condition variables work. 
:( +> +> +> +> *Really* we need to find a solution to that - there's probably lots of +> +> other things that can spring up in that small window other than the +> +> 'cont'. +> +> +> +> Dave +> +> +> +> -- +> +> Dr. David Alan Gilbert / address@hidden / Manchester, UK +> +> +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +Hi Dave, + +On Fri, Mar 3, 2017 at 9:26 PM, Dr. David Alan Gilbert <address@hidden> +wrote: + +> +* Paolo Bonzini (address@hidden) wrote: +> +> +> +> +> +> On 03/03/2017 14:11, Dr. David Alan Gilbert wrote: +> +> > * Paolo Bonzini (address@hidden) wrote: +> +> >> +> +> >> +> +> >> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote: +> +... +> +> > That event feature went in sometime after 2.3.0. +> +> > +> +> >> One possibility is to suspend the monitor in qmp_migrate_cancel and +> +> >> resume it (with add_migration_state_change_notifier) when we hit the +> +> >> CANCELLED state. I'm not sure what the latency would be between the +> +end +> +> >> of migrate_fd_cancel and finally reaching CANCELLED. +> +> > +> +> > I don't like suspending monitors; it can potentially take quite a +> +significant +> +> > time to do a cancel. +> +> > How about making 'cont' fail if we're in CANCELLING? +> +> +> +> Actually I thought that would be the case already (in fact CANCELLING is +> +> internal only; the outside world sees it as "active" in query-migrate). +> +> +> +> Lei, what is the runstate? (That is, why did cont succeed at all)? +> +> +I suspect it's RUN_STATE_FINISH_MIGRATE - we set that before we do the +> +device +> +It is RUN_STATE_FINISH_MIGRATE. + + +> +save, and that's what we get at the end of a migrate and it's legal to +> +restart +> +from there. +> +> +> Paolo +> +> +> +> > I'd really love to see the 'run_on_cpu' being more careful about the +> +BQL; +> +> > we really need all of the rest of the devices to stay quiesced at +> +times. +> +> +> +> That's not really possible, because of how condition variables work. :( +> +> +*Really* we need to find a solution to that - there's probably lots of +> +other things that can spring up in that small window other than the +> +'cont'. +> +This is what I was worry about. Not only sync_cpu_state() will call +run_on_cpu() +but also vm_stop_force_state() will, both of them did hit the small windows +in our +test. + + +> +> +Dave +> +> +-- +> +Dr. David Alan Gilbert / address@hidden / Manchester, UK +> +> + diff --git a/classification_output/01/other/4970412 b/classification_output/01/other/4970412 new file mode 100644 index 000000000..7fa99572a --- /dev/null +++ b/classification_output/01/other/4970412 @@ -0,0 +1,88 @@ +other: 0.964 +mistranslation: 0.935 +instruction: 0.919 +semantic: 0.891 + +[BUG FIX][PATCH v3 0/3] vhost-user-blk: fix bug on device disconnection during initialization + +This is a series fixing a bug in + host-user-blk. +Is there any chance for it to be considered for the next rc? +Thanks! +Denis +On 29.03.2021 16:44, Denis Plotnikov + wrote: +ping! +On 25.03.2021 18:12, Denis Plotnikov + wrote: +v3: + * 0003: a new patch added fixing the problem on vm shutdown + I stumbled on this bug after v2 sending. + * 0001: gramma fixing (Raphael) + * 0002: commit message fixing (Raphael) + +v2: + * split the initial patch into two (Raphael) + * rename init to realized (Raphael) + * remove unrelated comment (Raphael) + +When the vhost-user-blk device lose the connection to the daemon during +the initialization phase it kills qemu because of the assert in the code. +The series fixes the bug. 
+ +0001 is preparation for the fix +0002 fixes the bug, patch description has the full motivation for the series +0003 (added in v3) fix bug on vm shutdown + +Denis Plotnikov (3): + vhost-user-blk: use different event handlers on initialization + vhost-user-blk: perform immediate cleanup if disconnect on + initialization + vhost-user-blk: add immediate cleanup on shutdown + + hw/block/vhost-user-blk.c | 79 ++++++++++++++++++++++++--------------- + 1 file changed, 48 insertions(+), 31 deletions(-) + +On 01.04.2021 14:21, Denis Plotnikov wrote: +This is a series fixing a bug in host-user-blk. +More specifically, it's not just a bug but crasher. + +Valentine +Is there any chance for it to be considered for the next rc? + +Thanks! + +Denis + +On 29.03.2021 16:44, Denis Plotnikov wrote: +ping! + +On 25.03.2021 18:12, Denis Plotnikov wrote: +v3: + * 0003: a new patch added fixing the problem on vm shutdown + I stumbled on this bug after v2 sending. + * 0001: gramma fixing (Raphael) + * 0002: commit message fixing (Raphael) + +v2: + * split the initial patch into two (Raphael) + * rename init to realized (Raphael) + * remove unrelated comment (Raphael) + +When the vhost-user-blk device lose the connection to the daemon during +the initialization phase it kills qemu because of the assert in the code. +The series fixes the bug. + +0001 is preparation for the fix +0002 fixes the bug, patch description has the full motivation for the series +0003 (added in v3) fix bug on vm shutdown + +Denis Plotnikov (3): + vhost-user-blk: use different event handlers on initialization + vhost-user-blk: perform immediate cleanup if disconnect on + initialization + vhost-user-blk: add immediate cleanup on shutdown + + hw/block/vhost-user-blk.c | 79 ++++++++++++++++++++++++--------------- + 1 file changed, 48 insertions(+), 31 deletions(-) + diff --git a/classification_output/01/other/5057521 b/classification_output/01/other/5057521 new file mode 100644 index 000000000..b2baccb37 --- /dev/null +++ b/classification_output/01/other/5057521 @@ -0,0 +1,771 @@ +other: 0.984 +instruction: 0.966 +semantic: 0.962 +mistranslation: 0.959 + +[Qemu-devel] [BUG] living migrate vm pause forever + +Sometimes, living migrate vm pause forever, migrate job stop, but very small +probability, I canât reproduce. +qemu wait semaphore from libvirt send migrate continue, however libvirt wait +semaphore from qemu send vm pause. 
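+
+For context on the two waits: the QEMU side is blocked on pause_sem inside
+migration_maybe_pause(), which is only posted when the management layer issues
+migrate-continue. Roughly (a sketch with bodies abbreviated, following the
+upstream migration code of this era rather than quoting it verbatim):
+
+  /* Migration thread side: enter PRE_SWITCHOVER and block. */
+  static int migration_maybe_pause(MigrationState *s,
+                                   int *current_active_state, int new_state)
+  {
+      if (!migrate_pause_before_switchover()) {
+          return 0;
+      }
+      migrate_set_state(&s->state, *current_active_state,
+                        MIGRATION_STATUS_PRE_SWITCHOVER);
+      qemu_sem_wait(&s->pause_sem);        /* <-- where this report's QEMU sits */
+      migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER, new_state);
+      *current_active_state = new_state;
+      return s->state == new_state ? 0 : -EINVAL;
+  }
+
+  /* QMP side: 'migrate-continue' is what posts that semaphore. */
+  void qmp_migrate_continue(MigrationStatus state, Error **errp)
+  {
+      MigrationState *s = migrate_get_current();
+
+      if (s->state != state) {
+          error_setg(errp, "Migration not in expected state: %s",
+                     MigrationStatus_str(s->state));
+          return;
+      }
+      qemu_sem_post(&s->pause_sem);
+  }
+
+libvirt, in turn, only issues migrate-continue once it has seen the events/job
+state it expects from QEMU, which is why a missed or unprocessed notification
+can leave both sides waiting on each other, as in the stacks below.
+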
+ +follow stack: +qemu: +Thread 6 (Thread 0x7f50445f3700 (LWP 18120)): +#0 0x00007f504b84d670 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0 +#1 0x00005574eda1e164 in qemu_sem_wait (sem=sem@entry=0x5574ef6930e0) at +qemu-2.12/util/qemu-thread-posix.c:322 +#2 0x00005574ed8dd72e in migration_maybe_pause (s=0x5574ef692f50, +current_active_state=0x7f50445f2ae4, new_state=10) + at qemu-2.12/migration/migration.c:2106 +#3 0x00005574ed8df51a in migration_completion (s=0x5574ef692f50) at +qemu-2.12/migration/migration.c:2137 +#4 migration_iteration_run (s=0x5574ef692f50) at +qemu-2.12/migration/migration.c:2311 +#5 migration_thread (opaque=0x5574ef692f50) +atqemu-2.12/migration/migration.c:2415 +#6 0x00007f504b847184 in start_thread () from +/lib/x86_64-linux-gnu/libpthread.so.0 +#7 0x00007f504b574bed in clone () from /lib/x86_64-linux-gnu/libc.so.6 + +libvirt: +Thread 95 (Thread 0x7fdb82ffd700 (LWP 28775)): +#0 0x00007fdd177dc404 in pthread_cond_wait@@GLIBC_2.3.2 () from +/lib/x86_64-linux-gnu/libpthread.so.0 +#1 0x00007fdd198c3b07 in virCondWait (c=0x7fdbc4003000, m=0x7fdbc4002f30) at +../../../src/util/virthread.c:252 +#2 0x00007fdd198f36d2 in virDomainObjWait (vm=0x7fdbc4002f20) at +../../../src/conf/domain_conf.c:3303 +#3 0x00007fdd09ffaa44 in qemuMigrationRun (driver=0x7fdd000037b0, +vm=0x7fdbc4002f20, persist_xml=0x0, + cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss +</hostname>\n +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +flags=777, + resource=0, spec=0x7fdb82ffc670, dconn=0x0, graphicsuri=0x0, +nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, +migParams=0x7fdb82ffc900) + at ../../../src/qemu/qemu_migration.c:3937 +#4 0x00007fdd09ffb26a in doNativeMigrate (driver=0x7fdd000037b0, +vm=0x7fdbc4002f20, persist_xml=0x0, uri=0x7fdb780073a0 +"tcp://172.16.202.17:49152", + cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss</hostname>\n + <hos---Type <return> to continue, or q <return> to quit--- +tuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +flags=777, + resource=0, dconn=0x0, graphicsuri=0x0, nmigrate_disks=0, +migrate_disks=0x0, compression=0x7fdb78007990, migParams=0x7fdb82ffc900) + at ../../../src/qemu/qemu_migration.c:4118 +#5 0x00007fdd09ffd808 in qemuMigrationPerformPhase (driver=0x7fdd000037b0, +conn=0x7fdb500205d0, vm=0x7fdbc4002f20, persist_xml=0x0, + uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, +nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, +migParams=0x7fdb82ffc900, + cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss</hostname>\n + <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +flags=777, + resource=0) at ../../../src/qemu/qemu_migration.c:5030 +#6 0x00007fdd09ffdbb5 in qemuMigrationPerform (driver=0x7fdd000037b0, +conn=0x7fdb500205d0, vm=0x7fdbc4002f20, xmlin=0x0, persist_xml=0x0, +dconnuri=0x0, + uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, +listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0, 
+compression=0x7fdb78007990, + migParams=0x7fdb82ffc900, + cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss</hostname>\n + <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +flags=777, + dname=0x0, resource=0, v3proto=true) at +../../../src/qemu/qemu_migration.c:5124 +#7 0x00007fdd0a054725 in qemuDomainMigratePerform3 (dom=0x7fdb78007b00, +xmlin=0x0, + cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss</hostname>\n + <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +dconnuri=0x0, + uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, +resource=0) at ../../../src/qemu/qemu_driver.c:12996 +#8 0x00007fdd199ad0f0 in virDomainMigratePerform3 (domain=0x7fdb78007b00, +xmlin=0x0, + cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss</hostname>\n + <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +dconnuri=0x0, + uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, +bandwidth=0) at ../../../src/libvirt-domain.c:4698 +#9 0x000055d13923a939 in remoteDispatchDomainMigratePerform3 +(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, +rerr=0x7fdb82ffcbc0, + args=0x7fdb7800b220, ret=0x7fdb78021e90) at ../../../daemon/remote.c:4528 +#10 0x000055d13921a043 in remoteDispatchDomainMigratePerform3Helper +(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, +rerr=0x7fdb82ffcbc0, + args=0x7fdb7800b220, ret=0x7fdb78021e90) at +../../../daemon/remote_dispatch.h:7944 +#11 0x00007fdd19a260b4 in virNetServerProgramDispatchCall (prog=0x55d13af98b50, +server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620) + at ../../../src/rpc/virnetserverprogram.c:436 +#12 0x00007fdd19a25c17 in virNetServerProgramDispatch (prog=0x55d13af98b50, +server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620) + at ../../../src/rpc/virnetserverprogram.c:307 +#13 0x000055d13925933b in virNetServerProcessMsg (srv=0x55d13af90e60, +client=0x55d13b0156f0, prog=0x55d13af98b50, msg=0x55d13afbf620) + at ../../../src/rpc/virnetserver.c:148 +------------------------------------------------------------------------------------------------------------------------------------- +æ¬é®ä»¶åå ¶é件嫿æ°åä¸éå¢çä¿å¯ä¿¡æ¯ï¼ä» éäºåéç»ä¸é¢å°åä¸ååº +ç个人æç¾¤ç»ãç¦æ¢ä»»ä½å ¶ä»äººä»¥ä»»ä½å½¢å¼ä½¿ç¨ï¼å æ¬ä½ä¸éäºå ¨é¨æé¨åå°æ³é²ãå¤å¶ã +ææ£åï¼æ¬é®ä»¶ä¸çä¿¡æ¯ã妿æ¨éæ¶äºæ¬é®ä»¶ï¼è¯·æ¨ç«å³çµè¯æé®ä»¶éç¥å件人并å 餿¬ +é®ä»¶ï¼ +This e-mail and its attachments contain confidential information from New H3C, +which is +intended only for the person or entity whose address is listed above. Any use +of the +information contained herein in any way (including, but not limited to, total +or partial +disclosure, reproduction, or dissemination) by persons other than the intended +recipient(s) is prohibited. If you receive this e-mail in error, please notify +the sender +by phone or email immediately and delete it! + +* Yuchen (address@hidden) wrote: +> +Sometimes, living migrate vm pause forever, migrate job stop, but very small +> +probability, I canât reproduce. 
+> +qemu wait semaphore from libvirt send migrate continue, however libvirt wait +> +semaphore from qemu send vm pause. +Hi, + I've copied in Jiri Denemark from libvirt. +Can you confirm exactly which qemu and libvirt versions you're using +please. + +> +follow stack: +> +qemu: +> +Thread 6 (Thread 0x7f50445f3700 (LWP 18120)): +> +#0 0x00007f504b84d670 in sem_wait () from +> +/lib/x86_64-linux-gnu/libpthread.so.0 +> +#1 0x00005574eda1e164 in qemu_sem_wait (sem=sem@entry=0x5574ef6930e0) at +> +qemu-2.12/util/qemu-thread-posix.c:322 +> +#2 0x00005574ed8dd72e in migration_maybe_pause (s=0x5574ef692f50, +> +current_active_state=0x7f50445f2ae4, new_state=10) +> +at qemu-2.12/migration/migration.c:2106 +> +#3 0x00005574ed8df51a in migration_completion (s=0x5574ef692f50) at +> +qemu-2.12/migration/migration.c:2137 +> +#4 migration_iteration_run (s=0x5574ef692f50) at +> +qemu-2.12/migration/migration.c:2311 +> +#5 migration_thread (opaque=0x5574ef692f50) +> +atqemu-2.12/migration/migration.c:2415 +> +#6 0x00007f504b847184 in start_thread () from +> +/lib/x86_64-linux-gnu/libpthread.so.0 +> +#7 0x00007f504b574bed in clone () from /lib/x86_64-linux-gnu/libc.so.6 +In migration_maybe_pause we have: + + migrate_set_state(&s->state, *current_active_state, + MIGRATION_STATUS_PRE_SWITCHOVER); + qemu_sem_wait(&s->pause_sem); + migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER, + new_state); + +the line numbers don't match my 2.12.0 checkout; so I guess that it's +that qemu_sem_wait it's stuck at. + +QEMU must have sent the switch to PRE_SWITCHOVER and that should have +sent an event to libvirt, and libvirt should notice that - I'm +not sure how to tell whether libvirt has seen that event yet or not? + +Dave + +> +libvirt: +> +Thread 95 (Thread 0x7fdb82ffd700 (LWP 28775)): +> +#0 0x00007fdd177dc404 in pthread_cond_wait@@GLIBC_2.3.2 () from +> +/lib/x86_64-linux-gnu/libpthread.so.0 +> +#1 0x00007fdd198c3b07 in virCondWait (c=0x7fdbc4003000, m=0x7fdbc4002f30) at +> +../../../src/util/virthread.c:252 +> +#2 0x00007fdd198f36d2 in virDomainObjWait (vm=0x7fdbc4002f20) at +> +../../../src/conf/domain_conf.c:3303 +> +#3 0x00007fdd09ffaa44 in qemuMigrationRun (driver=0x7fdd000037b0, +> +vm=0x7fdbc4002f20, persist_xml=0x0, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss +> +</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +flags=777, +> +resource=0, spec=0x7fdb82ffc670, dconn=0x0, graphicsuri=0x0, +> +nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, +> +migParams=0x7fdb82ffc900) +> +at ../../../src/qemu/qemu_migration.c:3937 +> +#4 0x00007fdd09ffb26a in doNativeMigrate (driver=0x7fdd000037b0, +> +vm=0x7fdbc4002f20, persist_xml=0x0, uri=0x7fdb780073a0 +> +"tcp://172.16.202.17:49152", +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n <hos---Type <return> to continue, or q <return> +> +to quit--- +> +tuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +flags=777, +> +resource=0, dconn=0x0, graphicsuri=0x0, nmigrate_disks=0, +> +migrate_disks=0x0, compression=0x7fdb78007990, migParams=0x7fdb82ffc900) +> +at ../../../src/qemu/qemu_migration.c:4118 +> +#5 
0x00007fdd09ffd808 in qemuMigrationPerformPhase (driver=0x7fdd000037b0, +> +conn=0x7fdb500205d0, vm=0x7fdbc4002f20, persist_xml=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, +> +nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, +> +migParams=0x7fdb82ffc900, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +flags=777, +> +resource=0) at ../../../src/qemu/qemu_migration.c:5030 +> +#6 0x00007fdd09ffdbb5 in qemuMigrationPerform (driver=0x7fdd000037b0, +> +conn=0x7fdb500205d0, vm=0x7fdbc4002f20, xmlin=0x0, persist_xml=0x0, +> +dconnuri=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, +> +listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0, +> +compression=0x7fdb78007990, +> +migParams=0x7fdb82ffc900, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +flags=777, +> +dname=0x0, resource=0, v3proto=true) at +> +../../../src/qemu/qemu_migration.c:5124 +> +#7 0x00007fdd0a054725 in qemuDomainMigratePerform3 (dom=0x7fdb78007b00, +> +xmlin=0x0, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +dconnuri=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, +> +resource=0) at ../../../src/qemu/qemu_driver.c:12996 +> +#8 0x00007fdd199ad0f0 in virDomainMigratePerform3 (domain=0x7fdb78007b00, +> +xmlin=0x0, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +dconnuri=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, +> +bandwidth=0) at ../../../src/libvirt-domain.c:4698 +> +#9 0x000055d13923a939 in remoteDispatchDomainMigratePerform3 +> +(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, +> +rerr=0x7fdb82ffcbc0, +> +args=0x7fdb7800b220, ret=0x7fdb78021e90) at ../../../daemon/remote.c:4528 +> +#10 0x000055d13921a043 in remoteDispatchDomainMigratePerform3Helper +> +(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, +> +rerr=0x7fdb82ffcbc0, +> +args=0x7fdb7800b220, ret=0x7fdb78021e90) at +> +../../../daemon/remote_dispatch.h:7944 +> +#11 0x00007fdd19a260b4 in virNetServerProgramDispatchCall +> +(prog=0x55d13af98b50, server=0x55d13af90e60, client=0x55d13b0156f0, +> +msg=0x55d13afbf620) +> +at ../../../src/rpc/virnetserverprogram.c:436 +> +#12 0x00007fdd19a25c17 in virNetServerProgramDispatch (prog=0x55d13af98b50, +> +server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620) +> +at ../../../src/rpc/virnetserverprogram.c:307 +> +#13 0x000055d13925933b in virNetServerProcessMsg 
(srv=0x55d13af90e60, +> +client=0x55d13b0156f0, prog=0x55d13af98b50, msg=0x55d13afbf620) +> +at ../../../src/rpc/virnetserver.c:148 +> +------------------------------------------------------------------------------------------------------------------------------------- +> +æ¬é®ä»¶åå ¶é件嫿æ°åä¸éå¢çä¿å¯ä¿¡æ¯ï¼ä» éäºåéç»ä¸é¢å°åä¸ååº +> +ç个人æç¾¤ç»ãç¦æ¢ä»»ä½å ¶ä»äººä»¥ä»»ä½å½¢å¼ä½¿ç¨ï¼å æ¬ä½ä¸éäºå ¨é¨æé¨åå°æ³é²ãå¤å¶ã +> +ææ£åï¼æ¬é®ä»¶ä¸çä¿¡æ¯ã妿æ¨éæ¶äºæ¬é®ä»¶ï¼è¯·æ¨ç«å³çµè¯æé®ä»¶éç¥å件人并å 餿¬ +> +é®ä»¶ï¼ +> +This e-mail and its attachments contain confidential information from New +> +H3C, which is +> +intended only for the person or entity whose address is listed above. Any use +> +of the +> +information contained herein in any way (including, but not limited to, total +> +or partial +> +disclosure, reproduction, or dissemination) by persons other than the intended +> +recipient(s) is prohibited. If you receive this e-mail in error, please +> +notify the sender +> +by phone or email immediately and delete it! +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +In migration_maybe_pause we have: + + migrate_set_state(&s->state, *current_active_state, + MIGRATION_STATUS_PRE_SWITCHOVER); + qemu_sem_wait(&s->pause_sem); + migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER, + new_state); + +the line numbers don't match my 2.12.0 checkout; so I guess that it's that +qemu_sem_wait it's stuck at. + +QEMU must have sent the switch to PRE_SWITCHOVER and that should have sent an +event to libvirt, and libvirt should notice that - I'm not sure how to tell +whether libvirt has seen that event yet or not? + + +Thank you for your attention. +Yes, you are right, QEMU wait semaphore in this place. +I use qemu-2.12.1, libvirt-4.0.0. +Because I added some debug code, so the line numbers doesn't match open qemu + +-----é®ä»¶åä»¶----- +å件人: Dr. David Alan Gilbert [ +mailto:address@hidden +] +åéæ¶é´: 2019å¹´8æ21æ¥ 19:13 +æ¶ä»¶äºº: yuchen (Cloud) <address@hidden>; address@hidden +æé: address@hidden +主é¢: Re: [Qemu-devel] [BUG] living migrate vm pause forever + +* Yuchen (address@hidden) wrote: +> +Sometimes, living migrate vm pause forever, migrate job stop, but very small +> +probability, I canât reproduce. +> +qemu wait semaphore from libvirt send migrate continue, however libvirt wait +> +semaphore from qemu send vm pause. +Hi, + I've copied in Jiri Denemark from libvirt. +Can you confirm exactly which qemu and libvirt versions you're using please. 
+ +> +follow stack: +> +qemu: +> +Thread 6 (Thread 0x7f50445f3700 (LWP 18120)): +> +#0 0x00007f504b84d670 in sem_wait () from +> +/lib/x86_64-linux-gnu/libpthread.so.0 +> +#1 0x00005574eda1e164 in qemu_sem_wait (sem=sem@entry=0x5574ef6930e0) +> +at qemu-2.12/util/qemu-thread-posix.c:322 +> +#2 0x00005574ed8dd72e in migration_maybe_pause (s=0x5574ef692f50, +> +current_active_state=0x7f50445f2ae4, new_state=10) +> +at qemu-2.12/migration/migration.c:2106 +> +#3 0x00005574ed8df51a in migration_completion (s=0x5574ef692f50) at +> +qemu-2.12/migration/migration.c:2137 +> +#4 migration_iteration_run (s=0x5574ef692f50) at +> +qemu-2.12/migration/migration.c:2311 +> +#5 migration_thread (opaque=0x5574ef692f50) +> +atqemu-2.12/migration/migration.c:2415 +> +#6 0x00007f504b847184 in start_thread () from +> +/lib/x86_64-linux-gnu/libpthread.so.0 +> +#7 0x00007f504b574bed in clone () from +> +/lib/x86_64-linux-gnu/libc.so.6 +In migration_maybe_pause we have: + + migrate_set_state(&s->state, *current_active_state, + MIGRATION_STATUS_PRE_SWITCHOVER); + qemu_sem_wait(&s->pause_sem); + migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER, + new_state); + +the line numbers don't match my 2.12.0 checkout; so I guess that it's that +qemu_sem_wait it's stuck at. + +QEMU must have sent the switch to PRE_SWITCHOVER and that should have sent an +event to libvirt, and libvirt should notice that - I'm not sure how to tell +whether libvirt has seen that event yet or not? + +Dave + +> +libvirt: +> +Thread 95 (Thread 0x7fdb82ffd700 (LWP 28775)): +> +#0 0x00007fdd177dc404 in pthread_cond_wait@@GLIBC_2.3.2 () from +> +/lib/x86_64-linux-gnu/libpthread.so.0 +> +#1 0x00007fdd198c3b07 in virCondWait (c=0x7fdbc4003000, +> +m=0x7fdbc4002f30) at ../../../src/util/virthread.c:252 +> +#2 0x00007fdd198f36d2 in virDomainObjWait (vm=0x7fdbc4002f20) at +> +../../../src/conf/domain_conf.c:3303 +> +#3 0x00007fdd09ffaa44 in qemuMigrationRun (driver=0x7fdd000037b0, +> +vm=0x7fdbc4002f20, persist_xml=0x0, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss +> +</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +flags=777, +> +resource=0, spec=0x7fdb82ffc670, dconn=0x0, graphicsuri=0x0, +> +nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, +> +migParams=0x7fdb82ffc900) +> +at ../../../src/qemu/qemu_migration.c:3937 +> +#4 0x00007fdd09ffb26a in doNativeMigrate (driver=0x7fdd000037b0, +> +vm=0x7fdbc4002f20, persist_xml=0x0, uri=0x7fdb780073a0 +> +"tcp://172.16.202.17:49152", +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n +> +<name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n <hos---Type <return> to continue, or q +> +<return> to quit--- +> +tuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra".. 
+> +tuuid>., cookieinlen=207, cookieout=0x7fdb82ffcad0, +> +tuuid>cookieoutlen=0x7fdb82ffcac8, flags=777, +> +resource=0, dconn=0x0, graphicsuri=0x0, nmigrate_disks=0, +> +migrate_disks=0x0, compression=0x7fdb78007990, migParams=0x7fdb82ffc900) +> +at ../../../src/qemu/qemu_migration.c:4118 +> +#5 0x00007fdd09ffd808 in qemuMigrationPerformPhase (driver=0x7fdd000037b0, +> +conn=0x7fdb500205d0, vm=0x7fdbc4002f20, persist_xml=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, +> +nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, +> +migParams=0x7fdb82ffc900, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +flags=777, +> +resource=0) at ../../../src/qemu/qemu_migration.c:5030 +> +#6 0x00007fdd09ffdbb5 in qemuMigrationPerform (driver=0x7fdd000037b0, +> +conn=0x7fdb500205d0, vm=0x7fdbc4002f20, xmlin=0x0, persist_xml=0x0, +> +dconnuri=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, +> +listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0, +> +compression=0x7fdb78007990, +> +migParams=0x7fdb82ffc900, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +flags=777, +> +dname=0x0, resource=0, v3proto=true) at +> +../../../src/qemu/qemu_migration.c:5124 +> +#7 0x00007fdd0a054725 in qemuDomainMigratePerform3 (dom=0x7fdb78007b00, +> +xmlin=0x0, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +dconnuri=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, +> +dname=0x0, resource=0) at ../../../src/qemu/qemu_driver.c:12996 +> +#8 0x00007fdd199ad0f0 in virDomainMigratePerform3 (domain=0x7fdb78007b00, +> +xmlin=0x0, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +dconnuri=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, +> +dname=0x0, bandwidth=0) at ../../../src/libvirt-domain.c:4698 +> +#9 0x000055d13923a939 in remoteDispatchDomainMigratePerform3 +> +(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, +> +rerr=0x7fdb82ffcbc0, +> +args=0x7fdb7800b220, ret=0x7fdb78021e90) at +> +../../../daemon/remote.c:4528 +> +#10 0x000055d13921a043 in remoteDispatchDomainMigratePerform3Helper +> +(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, +> +rerr=0x7fdb82ffcbc0, +> +args=0x7fdb7800b220, ret=0x7fdb78021e90) at +> +../../../daemon/remote_dispatch.h:7944 +> +#11 0x00007fdd19a260b4 in virNetServerProgramDispatchCall +> +(prog=0x55d13af98b50, server=0x55d13af90e60, client=0x55d13b0156f0, +> +msg=0x55d13afbf620) 
+> +at ../../../src/rpc/virnetserverprogram.c:436 +> +#12 0x00007fdd19a25c17 in virNetServerProgramDispatch (prog=0x55d13af98b50, +> +server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620) +> +at ../../../src/rpc/virnetserverprogram.c:307 +> +#13 0x000055d13925933b in virNetServerProcessMsg (srv=0x55d13af90e60, +> +client=0x55d13b0156f0, prog=0x55d13af98b50, msg=0x55d13afbf620) +> +at ../../../src/rpc/virnetserver.c:148 +> +---------------------------------------------------------------------- +> +--------------------------------------------------------------- +> +æ¬é®ä»¶åå ¶é件嫿æ°åä¸éå¢çä¿å¯ä¿¡æ¯ï¼ä» éäºåéç»ä¸é¢å°åä¸ååº +> +ç个人æç¾¤ç»ãç¦æ¢ä»»ä½å ¶ä»äººä»¥ä»»ä½å½¢å¼ä½¿ç¨ï¼å æ¬ä½ä¸éäºå ¨é¨æé¨åå°æ³é²ãå¤å¶ã +> +ææ£åï¼æ¬é®ä»¶ä¸çä¿¡æ¯ã妿æ¨éæ¶äºæ¬é®ä»¶ï¼è¯·æ¨ç«å³çµè¯æé®ä»¶éç¥å件人并å 餿¬ +> +é®ä»¶ï¼ +> +This e-mail and its attachments contain confidential information from +> +New H3C, which is intended only for the person or entity whose address +> +is listed above. Any use of the information contained herein in any +> +way (including, but not limited to, total or partial disclosure, +> +reproduction, or dissemination) by persons other than the intended +> +recipient(s) is prohibited. If you receive this e-mail in error, +> +please notify the sender by phone or email immediately and delete it! +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + diff --git a/classification_output/01/other/5215275 b/classification_output/01/other/5215275 new file mode 100644 index 000000000..d563c8126 --- /dev/null +++ b/classification_output/01/other/5215275 @@ -0,0 +1,538 @@ +other: 0.781 +semantic: 0.764 +instruction: 0.754 +mistranslation: 0.665 + +[Qemu-devel] [BUG/RFC] INIT IPI lost when VM starts + +Hi, +We encountered a problem that when a domain starts, seabios failed to online a +vCPU. + +After investigation, we found that the reason is in kvm-kmod, KVM_APIC_INIT bit +in +vcpu->arch.apic->pending_events was overwritten by qemu, and thus an INIT IPI +sent +to AP was lost. Qemu does this since libvirtd sends a âquery-cpusâ qmp command +to qemu +on VM start. + +In qemu, qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state-> +do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from kvm-kmod and +sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call +kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus pending_events is +overwritten by qemu. + +I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true after +âquery-cpusâ, +and kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am not sure +whether +it is OK for qemu to set cpu->kvm_vcpu_dirty in do_kvm_cpu_synchronize_state in +each caller. + +Whatâs your opinion? + +Let me clarify it more clearly. Time sequence is that qemu handles âquery-cpusâ qmp +command, vcpu 1 (and vcpu 0) got registers from kvm-kmod (qmp_query_cpus-> +cpu_synchronize_state-> kvm_cpu_synchronize_state-> +> do_kvm_cpu_synchronize_state-> kvm_arch_get_registers), then vcpu 0 (BSP) +sends INIT-SIPI to vcpu 1(AP). In kvm-kmod, vcpu 1âs pending_eventsâs KVM_APIC_INIT +bit set. +Then vcpu 1 continue running, vcpu1 thread in qemu calls +kvm_arch_put_registers-> kvm_put_vcpu_events, so KVM_APIC_INIT bit in vcpu 1âs +pending_events got cleared, i.e., lost. + +In kvm-kmod, except for pending_events, sipi_vector may also be overwritten., +so I am not sure if there are other fields/registers in danger, i.e., those may +be modified asynchronously with vcpu thread itself. 
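+
+To make the suggestion above concrete (that qemu need not mark the vcpu dirty
+after 'query-cpus'), one possible shape is a read-only state fetch for
+query-style commands. This is purely a sketch of the idea, not an existing
+QEMU API; scheduling it onto the vcpu thread via run_on_cpu() is elided since
+that prototype differs between the QEMU versions discussed here:
+
+  /* Hypothetical read-only synchronize: pull registers for reporting,
+   * but leave kvm_vcpu_dirty clear so the vcpu thread never calls
+   * kvm_arch_put_registers() with this stale copy of pending_events. */
+  static void do_kvm_cpu_synchronize_state_readonly(CPUState *cpu)
+  {
+      if (!cpu->kvm_vcpu_dirty) {
+          kvm_arch_get_registers(cpu);
+          /* deliberately NOT setting cpu->kvm_vcpu_dirty = true */
+      }
+  }
+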
+ +BTW, using a sleep like following can reliably reproduce this problem, if VM +equipped with more than 2 vcpus and starting VM using libvirtd. + +diff --git a/target/i386/kvm.c b/target/i386/kvm.c +index 55865db..5099290 100644 +--- a/target/i386/kvm.c ++++ b/target/i386/kvm.c +@@ -2534,6 +2534,11 @@ static int kvm_put_vcpu_events(X86CPU *cpu, int level) + KVM_VCPUEVENT_VALID_NMI_PENDING | KVM_VCPUEVENT_VALID_SIPI_VECTOR; + } + ++ if (CPU(cpu)->cpu_index == 1) { ++ fprintf(stderr, "vcpu 1 sleep!!!!\n"); ++ sleep(10); ++ } ++ + return kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events); + } + + +On 2017/3/20 22:21, Herongguang (Stephen) wrote: +Hi, +We encountered a problem that when a domain starts, seabios failed to online a +vCPU. + +After investigation, we found that the reason is in kvm-kmod, KVM_APIC_INIT bit +in +vcpu->arch.apic->pending_events was overwritten by qemu, and thus an INIT IPI +sent +to AP was lost. Qemu does this since libvirtd sends a âquery-cpusâ qmp command +to qemu +on VM start. + +In qemu, qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state-> +do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from kvm-kmod and +sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call +kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus pending_events is +overwritten by qemu. + +I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true after +âquery-cpusâ, +and kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am not sure +whether +it is OK for qemu to set cpu->kvm_vcpu_dirty in do_kvm_cpu_synchronize_state in +each caller. + +Whatâs your opinion? + +On 20/03/2017 15:21, Herongguang (Stephen) wrote: +> +> +We encountered a problem that when a domain starts, seabios failed to +> +online a vCPU. +> +> +After investigation, we found that the reason is in kvm-kmod, +> +KVM_APIC_INIT bit in +> +vcpu->arch.apic->pending_events was overwritten by qemu, and thus an +> +INIT IPI sent +> +to AP was lost. Qemu does this since libvirtd sends a âquery-cpusâ qmp +> +command to qemu +> +on VM start. +> +> +In qemu, qmp_query_cpus-> cpu_synchronize_state-> +> +kvm_cpu_synchronize_state-> +> +do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from +> +kvm-kmod and +> +sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call +> +kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus +> +pending_events is +> +overwritten by qemu. +> +> +I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true +> +after âquery-cpusâ, +> +and kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am +> +not sure whether +> +it is OK for qemu to set cpu->kvm_vcpu_dirty in +> +do_kvm_cpu_synchronize_state in each caller. +> +> +Whatâs your opinion? +Hi Rongguang, + +sorry for the late response. + +Where exactly is KVM_APIC_INIT dropped? kvm_get_mp_state does clear the +bit, but the result of the INIT is stored in mp_state. + +kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves +KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes +it back. Maybe it should ignore events.smi.latched_init if not in SMM, +but I would like to understand the exact sequence of events. + +Thanks, + +paolo + +On 2017/4/6 0:16, Paolo Bonzini wrote: +On 20/03/2017 15:21, Herongguang (Stephen) wrote: +We encountered a problem that when a domain starts, seabios failed to +online a vCPU. 
+ +After investigation, we found that the reason is in kvm-kmod, +KVM_APIC_INIT bit in +vcpu->arch.apic->pending_events was overwritten by qemu, and thus an +INIT IPI sent +to AP was lost. Qemu does this since libvirtd sends a âquery-cpusâ qmp +command to qemu +on VM start. + +In qemu, qmp_query_cpus-> cpu_synchronize_state-> +kvm_cpu_synchronize_state-> +do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from +kvm-kmod and +sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call +kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus +pending_events is +overwritten by qemu. + +I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true +after âquery-cpusâ, +and kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am +not sure whether +it is OK for qemu to set cpu->kvm_vcpu_dirty in +do_kvm_cpu_synchronize_state in each caller. + +Whatâs your opinion? +Hi Rongguang, + +sorry for the late response. + +Where exactly is KVM_APIC_INIT dropped? kvm_get_mp_state does clear the +bit, but the result of the INIT is stored in mp_state. +It's dropped in KVM_SET_VCPU_EVENTS, see below. +kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves +KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes +it back. Maybe it should ignore events.smi.latched_init if not in SMM, +but I would like to understand the exact sequence of events. +time0: +vcpu1: +qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state-> +> do_kvm_cpu_synchronize_state(and set vcpu1's cpu->kvm_vcpu_dirty to true)-> +kvm_arch_get_registers(KVM_APIC_INIT bit in vcpu->arch.apic->pending_events was not set) + +time1: +vcpu0: +send INIT-SIPI to all AP->(in vcpu 0's context)__apic_accept_irq(KVM_APIC_INIT bit +in vcpu1's arch.apic->pending_events is set) + +time2: +vcpu1: +kvm_cpu_exec->(if cpu->kvm_vcpu_dirty is +true)kvm_arch_put_registers->kvm_put_vcpu_events(overwritten KVM_APIC_INIT bit in +vcpu->arch.apic->pending_events!) + +So it's a race between vcpu1 get/put registers with kvm/other vcpus changing +vcpu1's status/structure fields in the mean time, I am in worry of if there are +other fields may be overwritten, +sipi_vector is one. + +also see: +https://www.mail-archive.com/address@hidden/msg438675.html +Thanks, + +paolo + +. + +Hi Paolo, + +What's your opinion about this patch? We found it just before finishing patches +for the past two days. + + +Thanks, +-Gonglei + + +> +-----Original Message----- +> +From: address@hidden [ +mailto:address@hidden +On +> +Behalf Of Herongguang (Stephen) +> +Sent: Thursday, April 06, 2017 9:47 AM +> +To: Paolo Bonzini; address@hidden; address@hidden; +> +address@hidden; address@hidden; address@hidden; +> +wangxin (U); Huangweidong (C) +> +Subject: Re: [BUG/RFC] INIT IPI lost when VM starts +> +> +> +> +On 2017/4/6 0:16, Paolo Bonzini wrote: +> +> +> +> On 20/03/2017 15:21, Herongguang (Stephen) wrote: +> +>> We encountered a problem that when a domain starts, seabios failed to +> +>> online a vCPU. +> +>> +> +>> After investigation, we found that the reason is in kvm-kmod, +> +>> KVM_APIC_INIT bit in +> +>> vcpu->arch.apic->pending_events was overwritten by qemu, and thus an +> +>> INIT IPI sent +> +>> to AP was lost. Qemu does this since libvirtd sends a âquery-cpusâ qmp +> +>> command to qemu +> +>> on VM start. 
+> +>> +> +>> In qemu, qmp_query_cpus-> cpu_synchronize_state-> +> +>> kvm_cpu_synchronize_state-> +> +>> do_kvm_cpu_synchronize_state, qemu gets registers/vcpu_events from +> +>> kvm-kmod and +> +>> sets cpu->kvm_vcpu_dirty to true, and vcpu thread in qemu will call +> +>> kvm_arch_put_registers if cpu->kvm_vcpu_dirty is true, thus +> +>> pending_events is +> +>> overwritten by qemu. +> +>> +> +>> I think there is no need for qemu to set cpu->kvm_vcpu_dirty to true +> +>> after âquery-cpusâ, +> +>> and kvm-kmod should not clear KVM_APIC_INIT unconditionally. And I am +> +>> not sure whether +> +>> it is OK for qemu to set cpu->kvm_vcpu_dirty in +> +>> do_kvm_cpu_synchronize_state in each caller. +> +>> +> +>> Whatâs your opinion? +> +> Hi Rongguang, +> +> +> +> sorry for the late response. +> +> +> +> Where exactly is KVM_APIC_INIT dropped? kvm_get_mp_state does clear +> +the +> +> bit, but the result of the INIT is stored in mp_state. +> +> +It's dropped in KVM_SET_VCPU_EVENTS, see below. +> +> +> +> +> kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves +> +> KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes +> +> it back. Maybe it should ignore events.smi.latched_init if not in SMM, +> +> but I would like to understand the exact sequence of events. +> +> +time0: +> +vcpu1: +> +qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state-> +> +> do_kvm_cpu_synchronize_state(and set vcpu1's cpu->kvm_vcpu_dirty to +> +true)-> kvm_arch_get_registers(KVM_APIC_INIT bit in +> +vcpu->arch.apic->pending_events was not set) +> +> +time1: +> +vcpu0: +> +send INIT-SIPI to all AP->(in vcpu 0's +> +context)__apic_accept_irq(KVM_APIC_INIT bit in vcpu1's +> +arch.apic->pending_events is set) +> +> +time2: +> +vcpu1: +> +kvm_cpu_exec->(if cpu->kvm_vcpu_dirty is +> +true)kvm_arch_put_registers->kvm_put_vcpu_events(overwritten +> +KVM_APIC_INIT bit in vcpu->arch.apic->pending_events!) +> +> +So it's a race between vcpu1 get/put registers with kvm/other vcpus changing +> +vcpu1's status/structure fields in the mean time, I am in worry of if there +> +are +> +other fields may be overwritten, +> +sipi_vector is one. +> +> +also see: +> +https://www.mail-archive.com/address@hidden/msg438675.html +> +> +> Thanks, +> +> +> +> paolo +> +> +> +> . +> +> +> + +2017-11-20 06:57+0000, Gonglei (Arei): +> +Hi Paolo, +> +> +What's your opinion about this patch? We found it just before finishing +> +patches +> +for the past two days. +I think your case was fixed by f4ef19108608 ("KVM: X86: Fix loss of +pending INIT due to race"), but that patch didn't fix it perfectly, so +maybe you're hitting a similar case that happens in SMM ... + +> +> -----Original Message----- +> +> From: address@hidden [ +mailto:address@hidden +On +> +> Behalf Of Herongguang (Stephen) +> +> On 2017/4/6 0:16, Paolo Bonzini wrote: +> +> > Hi Rongguang, +> +> > +> +> > sorry for the late response. +> +> > +> +> > Where exactly is KVM_APIC_INIT dropped? kvm_get_mp_state does clear +> +> the +> +> > bit, but the result of the INIT is stored in mp_state. +> +> +> +> It's dropped in KVM_SET_VCPU_EVENTS, see below. +> +> +> +> > +> +> > kvm_get_vcpu_events is called after kvm_get_mp_state; it retrieves +> +> > KVM_APIC_INIT in events.smi.latched_init and kvm_set_vcpu_events passes +> +> > it back. Maybe it should ignore events.smi.latched_init if not in SMM, +> +> > but I would like to understand the exact sequence of events. 
+> +> +> +> time0: +> +> vcpu1: +> +> qmp_query_cpus-> cpu_synchronize_state-> kvm_cpu_synchronize_state-> +> +> > do_kvm_cpu_synchronize_state(and set vcpu1's cpu->kvm_vcpu_dirty to +> +> true)-> kvm_arch_get_registers(KVM_APIC_INIT bit in +> +> vcpu->arch.apic->pending_events was not set) +> +> +> +> time1: +> +> vcpu0: +> +> send INIT-SIPI to all AP->(in vcpu 0's +> +> context)__apic_accept_irq(KVM_APIC_INIT bit in vcpu1's +> +> arch.apic->pending_events is set) +> +> +> +> time2: +> +> vcpu1: +> +> kvm_cpu_exec->(if cpu->kvm_vcpu_dirty is +> +> true)kvm_arch_put_registers->kvm_put_vcpu_events(overwritten +> +> KVM_APIC_INIT bit in vcpu->arch.apic->pending_events!) +> +> +> +> So it's a race between vcpu1 get/put registers with kvm/other vcpus changing +> +> vcpu1's status/structure fields in the mean time, I am in worry of if there +> +> are +> +> other fields may be overwritten, +> +> sipi_vector is one. +Fields that can be asynchronously written by other VCPUs (like SIPI, +NMI) must not be SET if other VCPUs were not paused since the last GET. +(Looking at the interface, we can currently lose pending SMI.) + +INIT is one of the restricted fields, but the API unconditionally +couples SMM with latched INIT, which means that we can lose an INIT if +the VCPU is in SMM mode -- do you see SMM in kvm_vcpu_events? + +Thanks. + diff --git a/classification_output/01/other/5321072 b/classification_output/01/other/5321072 new file mode 100644 index 000000000..55a82678b --- /dev/null +++ b/classification_output/01/other/5321072 @@ -0,0 +1,421 @@ +other: 0.869 +instruction: 0.794 +semantic: 0.770 +mistranslation: 0.693 + +[Qemu-devel] 答复: Re: [BUG]COLO failover hang + +hi. + + +I test the git qemu master have the same problem. + + +(gdb) bt + + +#0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, niov=1, +fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 + + +#1 0x00007f658e4aa0c2 in qio_channel_read (address@hidden, address@hidden "", +address@hidden, address@hidden) at io/channel.c:114 + + +#2 0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, +buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at +migration/qemu-file-channel.c:78 + + +#3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at +migration/qemu-file.c:295 + + +#4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, address@hidden) at +migration/qemu-file.c:555 + + +#5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at +migration/qemu-file.c:568 + + +#6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at +migration/qemu-file.c:648 + + +#7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, +address@hidden) at migration/colo.c:244 + + +#8 0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized outï¼, +address@hidden, address@hidden) + + + at migration/colo.c:264 + + +#9 0x00007f658e3e740e in colo_process_incoming_thread (opaque=0x7f658eb30360 +ï¼mis_current.31286ï¼) at migration/colo.c:577 + + +#10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 + + +#11 0x00007f65881983ed in clone () from /lib64/libc.so.6 + + +(gdb) p ioc-ï¼name + + +$2 = 0x7f658ff7d5c0 "migration-socket-incoming" + + +(gdb) p ioc-ï¼features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN + + +$3 = 0 + + + + + +(gdb) bt + + +#0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, condition=G_IO_IN, +opaque=0x7fdcceeafa90) at migration/socket.c:137 + + +#1 0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at +gmain.c:3054 + + +#2 g_main_context_dispatch 
(context=ï¼optimized outï¼, address@hidden) at +gmain.c:3630 + + +#3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 + + +#4 os_host_main_loop_wait (timeout=ï¼optimized outï¼) at util/main-loop.c:258 + + +#5 main_loop_wait (address@hidden) at util/main-loop.c:506 + + +#6 0x00007fdccb526187 in main_loop () at vl.c:1898 + + +#7 main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized outï¼) at +vl.c:4709 + + +(gdb) p ioc-ï¼features + + +$1 = 6 + + +(gdb) p ioc-ï¼name + + +$2 = 0x7fdcce1b1ab0 "migration-socket-listener" + + + + + +May be socket_accept_incoming_migration should call +qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? + + + + + +thank you. + + + + + + + + + + + + + + + +åå§é®ä»¶ + + + +åä»¶äººï¼ address@hidden +æ¶ä»¶äººï¼ç广10165992 address@hidden +æéäººï¼ address@hidden address@hidden +æ¥ æ ï¼2017å¹´03æ16æ¥ 14:46 +主 é¢ ï¼Re: [Qemu-devel] COLO failover hang + + + + + + + +On 03/15/2017 05:06 PM, wangguang wrote: +ï¼ am testing QEMU COLO feature described here [QEMU +ï¼ Wiki]( +http://wiki.qemu-project.org/Features/COLO +). +ï¼ +ï¼ When the Primary Node panic,the Secondary Node qemu hang. +ï¼ hang at recvmsg in qio_channel_socket_readv. +ï¼ And I run { 'execute': 'nbd-server-stop' } and { "execute": +ï¼ "x-colo-lost-heartbeat" } in Secondary VM's +ï¼ monitor,the Secondary Node qemu still hang at recvmsg . +ï¼ +ï¼ I found that the colo in qemu is not complete yet. +ï¼ Do the colo have any plan for development? + +Yes, We are developing. You can see some of patch we pushing. + +ï¼ Has anyone ever run it successfully? Any help is appreciated! + +In our internal version can run it successfully, +The failover detail you can ask Zhanghailiang for help. +Next time if you have some question about COLO, +please cc me and zhanghailiang address@hidden + + +Thanks +Zhang Chen + + +ï¼ +ï¼ +ï¼ +ï¼ centos7.2+qemu2.7.50 +ï¼ (gdb) bt +ï¼ #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 +ï¼ #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼, +ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at +ï¼ io/channel-socket.c:497 +ï¼ #2 0x00007f3e03329472 in qio_channel_read (address@hidden, +ï¼ address@hidden "", address@hidden, +ï¼ address@hidden) at io/channel.c:97 +ï¼ #3 0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼, +ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at +ï¼ migration/qemu-file-channel.c:78 +ï¼ #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at +ï¼ migration/qemu-file.c:257 +ï¼ #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, +ï¼ address@hidden) at migration/qemu-file.c:510 +ï¼ #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at +ï¼ migration/qemu-file.c:523 +ï¼ #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at +ï¼ migration/qemu-file.c:603 +ï¼ #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, +ï¼ address@hidden) at migration/colo.c:215 +ï¼ #9 0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48, +ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at +ï¼ migration/colo.c:546 +ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at +ï¼ migration/colo.c:649 +ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 +ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6 +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ -- +ï¼ View this message in context: +http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html +ï¼ Sent from the Developer mailing list archive at 
Nabble.com. +ï¼ +ï¼ +ï¼ +ï¼ + +-- +Thanks +Zhang Chen + +Hi,Wang. + +You can test this branch: +https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk +and please follow wiki ensure your own configuration correctly. +http://wiki.qemu-project.org/Features/COLO +Thanks + +Zhang Chen + + +On 03/21/2017 03:27 PM, address@hidden wrote: +hi. + +I test the git qemu master have the same problem. + +(gdb) bt +#0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, +niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 +#1 0x00007f658e4aa0c2 in qio_channel_read +(address@hidden, address@hidden "", +address@hidden, address@hidden) at io/channel.c:114 +#2 0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, +buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at +migration/qemu-file-channel.c:78 +#3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at +migration/qemu-file.c:295 +#4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, +address@hidden) at migration/qemu-file.c:555 +#5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at +migration/qemu-file.c:568 +#6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at +migration/qemu-file.c:648 +#7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, +address@hidden) at migration/colo.c:244 +#8 0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized +outï¼, address@hidden, +address@hidden) +at migration/colo.c:264 +#9 0x00007f658e3e740e in colo_process_incoming_thread +(opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577 +#10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 + +#11 0x00007f65881983ed in clone () from /lib64/libc.so.6 + +(gdb) p ioc-ï¼name + +$2 = 0x7f658ff7d5c0 "migration-socket-incoming" + +(gdb) p ioc-ï¼features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN + +$3 = 0 + + +(gdb) bt +#0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, +condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 +#1 0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at +gmain.c:3054 +#2 g_main_context_dispatch (context=ï¼optimized outï¼, +address@hidden) at gmain.c:3630 +#3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 +#4 os_host_main_loop_wait (timeout=ï¼optimized outï¼) at +util/main-loop.c:258 +#5 main_loop_wait (address@hidden) at +util/main-loop.c:506 +#6 0x00007fdccb526187 in main_loop () at vl.c:1898 +#7 main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized +outï¼) at vl.c:4709 +(gdb) p ioc-ï¼features + +$1 = 6 + +(gdb) p ioc-ï¼name + +$2 = 0x7fdcce1b1ab0 "migration-socket-listener" +May be socket_accept_incoming_migration should +call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? +thank you. + + + + + +åå§é®ä»¶ +address@hidden; +*æ¶ä»¶äººï¼*ç广10165992;address@hidden; +address@hidden;address@hidden; +*æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46 +*主 é¢ ï¼**Re: [Qemu-devel] COLO failover hang* + + + + +On 03/15/2017 05:06 PM, wangguang wrote: +ï¼ am testing QEMU COLO feature described here [QEMU +ï¼ Wiki]( +http://wiki.qemu-project.org/Features/COLO +). +ï¼ +ï¼ When the Primary Node panic,the Secondary Node qemu hang. +ï¼ hang at recvmsg in qio_channel_socket_readv. +ï¼ And I run { 'execute': 'nbd-server-stop' } and { "execute": +ï¼ "x-colo-lost-heartbeat" } in Secondary VM's +ï¼ monitor,the Secondary Node qemu still hang at recvmsg . +ï¼ +ï¼ I found that the colo in qemu is not complete yet. +ï¼ Do the colo have any plan for development? 
+ +Yes, We are developing. You can see some of patch we pushing. + +ï¼ Has anyone ever run it successfully? Any help is appreciated! + +In our internal version can run it successfully, +The failover detail you can ask Zhanghailiang for help. +Next time if you have some question about COLO, +please cc me and zhanghailiang address@hidden + + +Thanks +Zhang Chen + + +ï¼ +ï¼ +ï¼ +ï¼ centos7.2+qemu2.7.50 +ï¼ (gdb) bt +ï¼ #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 +ï¼ #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼, +ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at +ï¼ io/channel-socket.c:497 +ï¼ #2 0x00007f3e03329472 in qio_channel_read (address@hidden, +ï¼ address@hidden "", address@hidden, +ï¼ address@hidden) at io/channel.c:97 +ï¼ #3 0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼, +ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at +ï¼ migration/qemu-file-channel.c:78 +ï¼ #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at +ï¼ migration/qemu-file.c:257 +ï¼ #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, +ï¼ address@hidden) at migration/qemu-file.c:510 +ï¼ #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at +ï¼ migration/qemu-file.c:523 +ï¼ #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at +ï¼ migration/qemu-file.c:603 +ï¼ #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, +ï¼ address@hidden) at migration/colo.c:215 +ï¼ #9 0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48, +ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at +ï¼ migration/colo.c:546 +ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at +ï¼ migration/colo.c:649 +ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 +ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6 +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ -- +ï¼ View this message in context: +http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html +ï¼ Sent from the Developer mailing list archive at Nabble.com. +ï¼ +ï¼ +ï¼ +ï¼ + +-- +Thanks +Zhang Chen +-- +Thanks +Zhang Chen + diff --git a/classification_output/01/other/5362491 b/classification_output/01/other/5362491 new file mode 100644 index 000000000..db7724db8 --- /dev/null +++ b/classification_output/01/other/5362491 @@ -0,0 +1,98 @@ +other: 0.980 +semantic: 0.979 +instruction: 0.975 +mistranslation: 0.961 + +[BUG][powerpc] KVM Guest Boot Failure and Hang at "Booting Linux via __start()" + +Bug Description: +Encountering a boot failure when launching a KVM guest with +'qemu-system-ppc64'. The guest hangs at boot, and the QEMU monitor +crashes. 
+Reproduction Steps: +# qemu-system-ppc64 --version +QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f) +Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers +# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine +pseries,accel=kvm \ +-m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \ + -device virtio-scsi-pci,id=scsi \ +-drive +file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 +\ +-device scsi-hd,drive=drive0,bus=scsi.0 \ + -netdev bridge,id=net0,br=virbr0 \ + -device virtio-net-pci,netdev=net0 \ + -serial pty \ + -device virtio-balloon-pci \ + -cpu host +QEMU 9.2.50 monitor - type 'help' for more information +char device redirected to /dev/pts/2 (label serial0) +(qemu) +(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but +unavailable: IRQ_XIVE capability must be present for KVM +Falling back to kernel-irqchip=off +** Qemu Hang + +(In another ssh session) +# screen /dev/pts/2 +Preparing to boot Linux version 6.10.4-200.fc40.ppc64le +(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 +(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 +15:20:17 UTC 2024 +Detected machine type: 0000000000000101 +command line: +BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le +root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M +Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) +Calling ibm,client-architecture-support... done +memory layout at init: + memory_limit : 0000000000000000 (16 MB aligned) + alloc_bottom : 0000000008200000 + alloc_top : 0000000030000000 + alloc_top_hi : 0000000800000000 + rmo_top : 0000000030000000 + ram_top : 0000000800000000 +instantiating rtas at 0x000000002fff0000... done +prom_hold_cpus: skipped +copying OF device tree... +Building dt strings... +Building dt structure... +Device tree strings 0x0000000008210000 -> 0x0000000008210bd0 +Device tree struct 0x0000000008220000 -> 0x0000000008230000 +Quiescing Open Firmware ... +Booting Linux via __start() @ 0x0000000000440000 ... +** Guest Console Hang + + +Git Bisect: +Performing git bisect points to the following patch: +# git bisect bad +e8291ec16da80566c121c68d9112be458954d90b is the first bad commit +commit e8291ec16da80566c121c68d9112be458954d90b (HEAD) +Author: Nicholas Piggin <npiggin@gmail.com> +Date: Thu Dec 19 13:40:31 2024 +1000 + + target/ppc: fix timebase register reset state +(H)DEC and PURR get reset before icount does, which causes them to +be +skewed and not match the init state. This can cause replay to not +match the recorded trace exactly. For DEC and HDEC this is usually +not +noticable since they tend to get programmed before affecting the + target machine. PURR has been observed to cause replay bugs when + running Linux. + + Fix this by resetting using a time of 0. + + Message-ID: <20241219034035.1826173-2-npiggin@gmail.com> + Signed-off-by: Nicholas Piggin <npiggin@gmail.com> + + hw/ppc/ppc.c | 11 ++++++++--- + 1 file changed, 8 insertions(+), 3 deletions(-) + + +Reverting the patch helps boot the guest. 
+Thanks,
+Misbah Anjum N
+
diff --git a/classification_output/01/other/5396868 b/classification_output/01/other/5396868 new file mode 100644 index 000000000..f04ba7919 --- /dev/null +++ b/classification_output/01/other/5396868 @@ -0,0 +1,223 @@ +other: 0.856 +semantic: 0.832 +instruction: 0.829 +mistranslation: 0.794
+
+[Qemu-devel] [Snapshot Bug?]Qcow2 meta data corruption
+
+Hi all,
+There is a problem with the qcow2 image files of several of my VMs that I could not figure out,
+so I have to ask for some help.
+Here is the thing:
+At first, I found some data corruption in a VM, so I ran qemu-img check on all my VMs.
+Parts of the check report:
+3-Leaked cluster 2926229 refcount=1 reference=0
+4-Leaked cluster 3021181 refcount=1 reference=0
+5-Leaked cluster 3021182 refcount=1 reference=0
+6-Leaked cluster 3021183 refcount=1 reference=0
+7-Leaked cluster 3021184 refcount=1 reference=0
+8-ERROR cluster 3102547 refcount=3 reference=4
+9-ERROR cluster 3111536 refcount=3 reference=4
+10-ERROR cluster 3113369 refcount=3 reference=4
+11-ERROR cluster 3235590 refcount=10 reference=11
+12-ERROR cluster 3235591 refcount=10 reference=11
+423-Warning: cluster offset=0xc000c00020000 is after the end of the image file, can't properly check refcounts.
+424-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
+425-Warning: cluster offset=0xc0001000c0000 is after the end of the image file, can't properly check refcounts.
+426-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
+427-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
+428-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
+429-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts.
+430-Warning: cluster offset=0xc000c00010000 is after the end of the image file, can't properly check refcounts.
+After a further look, I found two L2 entries pointing to the same cluster, and the same thing was found in several qcow2 image files of different VMs.
+Like this:
+table entry conflict (with our qcow2 check
+tool):
+a table offset : 0x00000093f7080000 level : 2, l1 table entry 100, l2 table entry 7
+b table offset : 0x00000093f7080000 level : 2, l1 table entry 5, l2 table entry 7
+table entry conflict :
+a table offset : 0x00000000a01e0000 level : 2, l1 table entry 100, l2 table entry 19
+b table offset : 0x00000000a01e0000 level : 2, l1 table entry 5, l2 table entry 19
+table entry conflict :
+a table offset : 0x00000000a01d0000 level : 2, l1 table entry 100, l2 table entry 18
+b table offset : 0x00000000a01d0000 level : 2, l1 table entry 5, l2 table entry 18
+table entry conflict :
+a table offset : 0x00000000a01c0000 level : 2, l1 table entry 100, l2 table entry 17
+b table offset : 0x00000000a01c0000 level : 2, l1 table entry 5, l2 table entry 17
+table entry conflict :
+a table offset : 0x00000000a01b0000 level : 2, l1 table entry 100, l2 table entry 16
+b table offset : 0x00000000a01b0000 level : 2, l1 table entry 5, l2 table entry 16
+I think the problem is related to snapshot create/delete, but I can't reproduce it.
+Can anyone give a hint about how this happens?
+QEMU version is 2.0.1; I downloaded the source code and installed it with make install.
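+The "table entry conflict" listing above is the core symptom: two different
+L2 entries resolve to the same host cluster. Below is a small, self-contained
+C sketch of that kind of cross-reference check; the tiny fixed-size tables and
+the offset reused from the listing are illustrative assumptions, not the real
+qcow2 on-disk layout or the actual check tool mentioned above.
+
+#include <stdint.h>
+#include <stdio.h>
+
+#define ENTRIES_PER_L2 8   /* toy table size; real qcow2 L2 tables are much larger */
+
+/* Two toy L2 tables (indexed by guest cluster), each entry holding the
+ * host-file offset of the data cluster, 0 meaning unallocated.  The reused
+ * offset in slot 7 mirrors the first conflict in the listing above. */
+static const uint64_t l2_of_l1_entry_5[ENTRIES_PER_L2]   = { 0, 0, 0, 0, 0, 0, 0, 0x93f7080000ULL };
+static const uint64_t l2_of_l1_entry_100[ENTRIES_PER_L2] = { 0, 0, 0, 0, 0, 0, 0, 0x93f7080000ULL };
+
+/* Report every host cluster that is referenced by both tables. */
+static void check_pair(int l1_a, const uint64_t *a, int l1_b, const uint64_t *b)
+{
+    for (int i = 0; i < ENTRIES_PER_L2; i++) {
+        for (int j = 0; j < ENTRIES_PER_L2; j++) {
+            if (a[i] != 0 && a[i] == b[j]) {
+                printf("table entry conflict: offset 0x%llx, "
+                       "l1 entry %d/l2 entry %d vs l1 entry %d/l2 entry %d\n",
+                       (unsigned long long)a[i], l1_a, i, l1_b, j);
+            }
+        }
+    }
+}
+
+int main(void)
+{
+    check_pair(5, l2_of_l1_entry_5, 100, l2_of_l1_entry_100);
+    return 0;
+}
+
+Run as-is it reports a single conflict for offset 0x93f7080000, mirroring the
+first pair in the listing above.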
+Qemu parameters: +/usr/bin/kvm -chardev socket,id=qmp,path=/var/run/qemu-server/5855899639838.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5855899639838.pid -daemonize -name yfMailSvr-200.200.0.14 -smp sockets=1,cores=4 -cpu core2duo,hv_spinlocks=0xffff,hv_relaxed,hv_time,hv_vapic,+sse4.1,+sse4.2,+x2apic,+erms,+smep,+fsgsbase,+f16c,+dca,+pcid,+pdcm,+xtpr,+ht,+ss,+acpi,+ds -nodefaults -vga cirrus -k en-us -boot menu=on,splash-time=8000 -m 8192 -usb -drive if=none,id=drive-ide0,media=cdrom,aio=native -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0 -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=585589963983800,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,romfile=,mac=FE:FC:FE:F0:AB:BA,netdev=net0,bus=pci.0,addr=0x12,id=net0 -rtc driftfix=slew,clock=rt,base=localtime -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 +Thanks +Sangfor VT. +leijian + +Hi all, +There was a problem about qcow2 image file happened in my serval vms and I could not figure it out, +so have to ask for some help. +Here is the thing: +At first, I found there were some data corruption in a vm, so I did qemu-img check to all my vms. +parts of check report: +3-Leaked cluster 2926229 refcount=1 reference=0 +4-Leaked cluster 3021181 refcount=1 reference=0 +5-Leaked cluster 3021182 refcount=1 reference=0 +6-Leaked cluster 3021183 refcount=1 reference=0 +7-Leaked cluster 3021184 refcount=1 reference=0 +8-ERROR cluster 3102547 refcount=3 reference=4 +9-ERROR cluster 3111536 refcount=3 reference=4 +10-ERROR cluster 3113369 refcount=3 reference=4 +11-ERROR cluster 3235590 refcount=10 reference=11 +12-ERROR cluster 3235591 refcount=10 reference=11 +423-Warning: cluster offset=0xc000c00020000 is after the end of the image file, can't properly check refcounts. +424-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts. +425-Warning: cluster offset=0xc0001000c0000 is after the end of the image file, can't properly check refcounts. +426-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts. +427-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts. +428-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts. +429-Warning: cluster offset=0xc000c000c0000 is after the end of the image file, can't properly check refcounts. +430-Warning: cluster offset=0xc000c00010000 is after the end of the image file, can't properly check refcounts. +After a futher look in, I found two l2 entries point to the same cluster, and that was found in serval qcow2 image files of different vms. 
+Like this: +table entry conflict (with our qcow2 check +tool): +a table offset : 0x00000093f7080000 level : 2, l1 table entry 100, l2 table entry 7 +b table offset : 0x00000093f7080000 level : 2, l1 table entry 5, l2 table entry 7 +table entry conflict : +a table offset : 0x00000000a01e0000 level : 2, l1 table entry 100, l2 table entry 19 +b table offset : 0x00000000a01e0000 level : 2, l1 table entry 5, l2 table entry 19 +table entry conflict : +a table offset : 0x00000000a01d0000 level : 2, l1 table entry 100, l2 table entry 18 +b table offset : 0x00000000a01d0000 level : 2, l1 table entry 5, l2 table entry 18 +table entry conflict : +a table offset : 0x00000000a01c0000 level : 2, l1 table entry 100, l2 table entry 17 +b table offset : 0x00000000a01c0000 level : 2, l1 table entry 5, l2 table entry 17 +table entry conflict : +a table offset : 0x00000000a01b0000 level : 2, l1 table entry 100, l2 table entry 16 +b table offset : 0x00000000a01b0000 level : 2, l1 table entry 5, l2 table entry 16 +I think the problem is relate to the snapshot create, delete. But I cant reproduce it . +Can Anyone give a hint about how this happen? +Qemu version 2.0.1, I download the source code and make install it. +Qemu parameters: +/usr/bin/kvm -chardev socket,id=qmp,path=/var/run/qemu-server/5855899639838.qmp,server,nowait -mon chardev=qmp,mode=control -vnc :0,websocket,to=200 -enable-kvm -pidfile /var/run/qemu-server/5855899639838.pid -daemonize -name yfMailSvr-200.200.0.14 -smp sockets=1,cores=4 -cpu core2duo,hv_spinlocks=0xffff,hv_relaxed,hv_time,hv_vapic,+sse4.1,+sse4.2,+x2apic,+erms,+smep,+fsgsbase,+f16c,+dca,+pcid,+pdcm,+xtpr,+ht,+ss,+acpi,+ds -nodefaults -vga cirrus -k en-us -boot menu=on,splash-time=8000 -m 8192 -usb -drive if=none,id=drive-ide0,media=cdrom,aio=native -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0 -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-1.qcow2,if=none,id=drive-virtio1,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio1,id=virtio1,bus=pci.0,addr=0xb -drive file=/sf/data/36b82a720d3a278001ba904e80c20c13e_ecf4bbbf3e94/images/host-ecf4bbbf3e94/784f3f08532a/yfMailSvr-200.200.0.14.vm/vm-disk-2.qcow2,if=none,id=drive-virtio2,cache=none,aio=native -device virtio-blk-pci,drive=drive-virtio2,id=virtio2,bus=pci.0,addr=0xc,bootindex=101 -netdev type=tap,id=net0,ifname=585589963983800,script=/sf/etc/kvm/vtp-bridge,vhost=on,vhostforce=on -device virtio-net-pci,romfile=,mac=FE:FC:FE:F0:AB:BA,netdev=net0,bus=pci.0,addr=0x12,id=net0 -rtc driftfix=slew,clock=rt,base=localtime -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 +Thanks +Sangfor VT. +leijian + +Am 03.04.2015 um 12:04 hat leijian geschrieben: +> +Hi all, +> +> +There was a problem about qcow2 image file happened in my serval vms and I +> +could not figure it out, +> +so have to ask for some help. +> +[...] +> +I think the problem is relate to the snapshot create, delete. But I cant +> +reproduce it . +> +Can Anyone give a hint about how this happen? +How did you create/delete your snapshots? + +More specifically, did you take care to never access your image from +more than one process (except if both are read-only)? It happens +occasionally that people use 'qemu-img snapshot' while the VM is +running. This is wrong and can corrupt the image. 
+ +Kevin + +On 04/07/2015 03:33 AM, Kevin Wolf wrote: +> +More specifically, did you take care to never access your image from +> +more than one process (except if both are read-only)? It happens +> +occasionally that people use 'qemu-img snapshot' while the VM is +> +running. This is wrong and can corrupt the image. +Since this has been done by more than one person, I'm wondering if there +is something we can do in the qcow2 format itself to make it harder for +the casual user to cause corruption. Maybe if we declare some bit or +extension header for an image open for writing, which other readers can +use as a warning ("this image is being actively modified; reading it may +fail"), and other writers can use to deny access ("another process is +already modifying this image"), where a writer should set that bit +before writing anything else in the file, then clear it on exit. Of +course, you'd need a way to override the bit to actively clear it to +recover from the case of a writer dying unexpectedly without resetting +it normally. And it won't help the case of a reader opening the file +first, followed by a writer, where the reader could still get thrown off +track. + +Or maybe we could document in the qcow2 format that all readers and +writers should attempt to obtain the appropriate flock() permissions [or +other appropriate advisory locking scheme] over the file header, so that +cooperating processes that both use advisory locking will know when the +file is in use by another process. + +-- +Eric Blake eblake redhat com +1-919-301-3266 +Libvirt virtualization library +http://libvirt.org +signature.asc +Description: +OpenPGP digital signature + + +I created/deleted the snapshot by using qmp command "snapshot_blkdev_internal"/"snapshot_delete_blkdev_internal", and for avoiding the case you mentioned above, I have added the flock() permission in the qemu_open(). +Here is the test of doing qemu-img snapshot to a running vm: +Diskfile:/sf/data/36c81f660e38b3b001b183da50b477d89_f8bc123b3e74/images/host-f8bc123b3e74/4a8d8728fcdc/Devried30030.vm/vm-disk-1.qcow2 is used! errno=Resource temporarily unavailable +Does the two cluster entry happen to be the same because of the refcount of using cluster decrease to 0 unexpectedly and is allocated again? +If it was not accessing the image from more than one process, any other exceptions I can test for? +Thanks +leijian +From: +Eric Blake +Date: +2015-04-07 23:27 +To: +Kevin Wolf +; +leijian +CC: +qemu-devel +; +stefanha +Subject: +Re: [Qemu-devel] [Snapshot Bug?]Qcow2 meta data +corruption +On 04/07/2015 03:33 AM, Kevin Wolf wrote: +> More specifically, did you take care to never access your image from +> more than one process (except if both are read-only)? It happens +> occasionally that people use 'qemu-img snapshot' while the VM is +> running. This is wrong and can corrupt the image. +Since this has been done by more than one person, I'm wondering if there +is something we can do in the qcow2 format itself to make it harder for +the casual user to cause corruption. Maybe if we declare some bit or +extension header for an image open for writing, which other readers can +use as a warning ("this image is being actively modified; reading it may +fail"), and other writers can use to deny access ("another process is +already modifying this image"), where a writer should set that bit +before writing anything else in the file, then clear it on exit. 
Of +course, you'd need a way to override the bit to actively clear it to +recover from the case of a writer dying unexpectedly without resetting +it normally. And it won't help the case of a reader opening the file +first, followed by a writer, where the reader could still get thrown off +track. +Or maybe we could document in the qcow2 format that all readers and +writers should attempt to obtain the appropriate flock() permissions [or +other appropriate advisory locking scheme] over the file header, so that +cooperating processes that both use advisory locking will know when the +file is in use by another process. +-- +Eric Blake eblake redhat com +1-919-301-3266 +Libvirt virtualization library http://libvirt.org + diff --git a/classification_output/01/other/5443005 b/classification_output/01/other/5443005 new file mode 100644 index 000000000..95b28c963 --- /dev/null +++ b/classification_output/01/other/5443005 @@ -0,0 +1,576 @@ +other: 0.846 +instruction: 0.845 +mistranslation: 0.817 +semantic: 0.815 + +[BUG][KVM_SET_USER_MEMORY_REGION] KVM_SET_USER_MEMORY_REGION failed + +Hi all, +I start a VM in openstack, and openstack use libvirt to start qemu VM, but now log show this ERROR. +Is there any one know this? +The ERROR log from /var/log/libvirt/qemu/instance-0000000e.log +``` +2023-03-14T10:09:17.674114Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument +kvm_set_phys_mem: error registering slot: Invalid argument +2023-03-14 10:09:18.198+0000: shutting down, reason=crashed +``` +The xml file +``` +root@c1c2:~# cat /etc/libvirt/qemu/instance-0000000e.xml +<!-- +WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE +OVERWRITTEN AND LOST. Changes to this xml configuration should be made using: + virsh edit instance-0000000e +or other application using the libvirt API. 
+--> +<domain type='kvm'> + <name>instance-0000000e</name> + <uuid>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</uuid> + <metadata> +  <nova:instance xmlns:nova=" +http://openstack.org/xmlns/libvirt/nova/1.1 +"> +   <nova:package version="25.1.0"/> +   <nova:name>provider-instance</nova:name> +   <nova:creationTime>2023-03-14 10:09:13</nova:creationTime> +   <nova:flavor name="cirros-os-dpu-test-1"> +    <nova:memory>64</nova:memory> +    <nova:disk>1</nova:disk> +    <nova:swap>0</nova:swap> +    <nova:ephemeral>0</nova:ephemeral> +    <nova:vcpus>1</nova:vcpus> +   </nova:flavor> +   <nova:owner> +    <nova:user uuid="ff627ad39ed94479b9c5033bc462cf78">admin</nova:user> +    <nova:project uuid="512866f9994f4ad8916d8539a7cdeec9">admin</nova:project> +   </nova:owner> +   <nova:root type="image" uuid="9e58cb69-316a-4093-9f23-c1d1bd8edffe"/> +   <nova:ports> +    <nova:port uuid="77c1dc00-af39-4463-bea0-12808f4bc340"> +     <nova:ip type="fixed" address="172.1.1.43" ipVersion="4"/> +    </nova:port> +   </nova:ports> +  </nova:instance> + </metadata> + <memory unit='KiB'>65536</memory> + <currentMemory unit='KiB'>65536</currentMemory> + <vcpu placement='static'>1</vcpu> + <sysinfo type='smbios'> +  <system> +   <entry name='manufacturer'>OpenStack Foundation</entry> +   <entry name='product'>OpenStack Nova</entry> +   <entry name='version'>25.1.0</entry> +   <entry name='serial'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> +   <entry name='uuid'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> +   <entry name='family'>Virtual Machine</entry> +  </system> + </sysinfo> + <os> +  <type arch='x86_64' machine='pc-i440fx-6.2'>hvm</type> +  <boot dev='hd'/> +  <smbios mode='sysinfo'/> + </os> + <features> +  <acpi/> +  <apic/> +  <vmcoreinfo state='on'/> + </features> + <cpu mode='host-model' check='partial'> +  <topology sockets='1' dies='1' cores='1' threads='1'/> + </cpu> + <clock offset='utc'> +  <timer name='pit' tickpolicy='delay'/> +  <timer name='rtc' tickpolicy='catchup'/> +  <timer name='hpet' present='no'/> + </clock> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> +  <emulator>/usr/bin/qemu-system-x86_64</emulator> +  <disk type='file' device='disk'> +   <driver name='qemu' type='qcow2' cache='none'/> +   <source file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk'/> +   <target dev='vda' bus='virtio'/> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> +  </disk> +  <controller type='usb' index='0' model='piix3-uhci'> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> +  </controller> +  <controller type='pci' index='0' model='pci-root'/> +  <interface type='hostdev' managed='yes'> +   <mac address='fa:16:3e:aa:d9:23'/> +   <source> +    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x5'/> +   </source> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> +  </interface> +  <serial type='pty'> +   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> +   <target type='isa-serial' port='0'> +    <model name='isa-serial'/> +   </target> +  </serial> +  <console type='pty'> +   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> +   <target type='serial' port='0'/> +  </console> +  <input type='tablet' bus='usb'> +   <address type='usb' bus='0' port='1'/> +  </input> +  <input type='mouse' bus='ps2'/> +  <input 
type='keyboard' bus='ps2'/> +  <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'> +   <listen type='address' address='0.0.0.0'/> +  </graphics> +  <audio id='1' type='none'/> +  <video> +   <model type='virtio' heads='1' primary='yes'/> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> +  </video> +  <hostdev mode='subsystem' type='pci' managed='yes'> +   <source> +    <address domain='0x0000' bus='0x01' slot='0x00' function='0x6'/> +   </source> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> +  </hostdev> +  <memballoon model='virtio'> +   <stats period='10'/> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> +  </memballoon> +  <rng model='virtio'> +   <backend model='random'>/dev/urandom</backend> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> +  </rng> + </devices> +</domain> +``` +---- +Simon Jones + +This is happened in ubuntu22.04. +QEMU is install by apt like this: +apt install -y qemu qemu-kvm qemu-system +and QEMU version is 6.2.0 +---- +Simon Jones +Simon Jones < +batmanustc@gmail.com +> äº2023å¹´3æ21æ¥å¨äº 08:40åéï¼ +Hi all, +I start a VM in openstack, and openstack use libvirt to start qemu VM, but now log show this ERROR. +Is there any one know this? +The ERROR log from /var/log/libvirt/qemu/instance-0000000e.log +``` +2023-03-14T10:09:17.674114Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument +kvm_set_phys_mem: error registering slot: Invalid argument +2023-03-14 10:09:18.198+0000: shutting down, reason=crashed +``` +The xml file +``` +root@c1c2:~# cat /etc/libvirt/qemu/instance-0000000e.xml +<!-- +WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE +OVERWRITTEN AND LOST. Changes to this xml configuration should be made using: + virsh edit instance-0000000e +or other application using the libvirt API. 
+--> +<domain type='kvm'> + <name>instance-0000000e</name> + <uuid>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</uuid> + <metadata> +  <nova:instance xmlns:nova=" +http://openstack.org/xmlns/libvirt/nova/1.1 +"> +   <nova:package version="25.1.0"/> +   <nova:name>provider-instance</nova:name> +   <nova:creationTime>2023-03-14 10:09:13</nova:creationTime> +   <nova:flavor name="cirros-os-dpu-test-1"> +    <nova:memory>64</nova:memory> +    <nova:disk>1</nova:disk> +    <nova:swap>0</nova:swap> +    <nova:ephemeral>0</nova:ephemeral> +    <nova:vcpus>1</nova:vcpus> +   </nova:flavor> +   <nova:owner> +    <nova:user uuid="ff627ad39ed94479b9c5033bc462cf78">admin</nova:user> +    <nova:project uuid="512866f9994f4ad8916d8539a7cdeec9">admin</nova:project> +   </nova:owner> +   <nova:root type="image" uuid="9e58cb69-316a-4093-9f23-c1d1bd8edffe"/> +   <nova:ports> +    <nova:port uuid="77c1dc00-af39-4463-bea0-12808f4bc340"> +     <nova:ip type="fixed" address="172.1.1.43" ipVersion="4"/> +    </nova:port> +   </nova:ports> +  </nova:instance> + </metadata> + <memory unit='KiB'>65536</memory> + <currentMemory unit='KiB'>65536</currentMemory> + <vcpu placement='static'>1</vcpu> + <sysinfo type='smbios'> +  <system> +   <entry name='manufacturer'>OpenStack Foundation</entry> +   <entry name='product'>OpenStack Nova</entry> +   <entry name='version'>25.1.0</entry> +   <entry name='serial'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> +   <entry name='uuid'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> +   <entry name='family'>Virtual Machine</entry> +  </system> + </sysinfo> + <os> +  <type arch='x86_64' machine='pc-i440fx-6.2'>hvm</type> +  <boot dev='hd'/> +  <smbios mode='sysinfo'/> + </os> + <features> +  <acpi/> +  <apic/> +  <vmcoreinfo state='on'/> + </features> + <cpu mode='host-model' check='partial'> +  <topology sockets='1' dies='1' cores='1' threads='1'/> + </cpu> + <clock offset='utc'> +  <timer name='pit' tickpolicy='delay'/> +  <timer name='rtc' tickpolicy='catchup'/> +  <timer name='hpet' present='no'/> + </clock> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> +  <emulator>/usr/bin/qemu-system-x86_64</emulator> +  <disk type='file' device='disk'> +   <driver name='qemu' type='qcow2' cache='none'/> +   <source file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk'/> +   <target dev='vda' bus='virtio'/> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> +  </disk> +  <controller type='usb' index='0' model='piix3-uhci'> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> +  </controller> +  <controller type='pci' index='0' model='pci-root'/> +  <interface type='hostdev' managed='yes'> +   <mac address='fa:16:3e:aa:d9:23'/> +   <source> +    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x5'/> +   </source> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> +  </interface> +  <serial type='pty'> +   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> +   <target type='isa-serial' port='0'> +    <model name='isa-serial'/> +   </target> +  </serial> +  <console type='pty'> +   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> +   <target type='serial' port='0'/> +  </console> +  <input type='tablet' bus='usb'> +   <address type='usb' bus='0' port='1'/> +  </input> +  <input type='mouse' bus='ps2'/> +  <input 
type='keyboard' bus='ps2'/> +  <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'> +   <listen type='address' address='0.0.0.0'/> +  </graphics> +  <audio id='1' type='none'/> +  <video> +   <model type='virtio' heads='1' primary='yes'/> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> +  </video> +  <hostdev mode='subsystem' type='pci' managed='yes'> +   <source> +    <address domain='0x0000' bus='0x01' slot='0x00' function='0x6'/> +   </source> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> +  </hostdev> +  <memballoon model='virtio'> +   <stats period='10'/> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> +  </memballoon> +  <rng model='virtio'> +   <backend model='random'>/dev/urandom</backend> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> +  </rng> + </devices> +</domain> +``` +---- +Simon Jones + +This is full ERROR log +2023-03-23 08:00:52.362+0000: starting up libvirt version: 8.0.0, package: 1ubuntu7.4 (Christian Ehrhardt < +christian.ehrhardt@canonical.com +> Tue, 22 Nov 2022 15:59:28 +0100), qemu version: 6.2.0Debian 1:6.2+dfsg-2ubuntu6.6, kernel: 5.19.0-35-generic, hostname: c1c2 +LC_ALL=C \ +PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \ +HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e \ +XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e/.local/share \ +XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e/.cache \ +XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-4-instance-0000000e/.config \ +/usr/bin/qemu-system-x86_64 \ +-name guest=instance-0000000e,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-4-instance-0000000e/master-key.aes"}' \ +-machine pc-i440fx-6.2,usb=off,dump-guest-core=off,memory-backend=pc.ram \ +-accel kvm \ +-cpu Cooperlake,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,sha-ni=on,umip=on,waitpkg=on,gfni=on,vaes=on,vpclmulqdq=on,rdpid=on,movdiri=on,movdir64b=on,fsrm=on,md-clear=on,avx-vnni=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,hle=off,rtm=off,avx512f=off,avx512dq=off,avx512cd=off,avx512bw=off,avx512vl=off,avx512vnni=off,avx512-bf16=off,taa-no=off \ +-m 64 \ +-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":67108864}' \ +-overcommit mem-lock=off \ +-smp 1,sockets=1,dies=1,cores=1,threads=1 \ +-uuid ff91d2dc-69a1-43ef-abde-c9e4e9a0305b \ +-smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=25.1.0,serial=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,uuid=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,family=Virtual Machine' \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=33,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ +-rtc base=utc,driftfix=slew \ +-global kvm-pit.lost_tick_policy=delay \ +-no-hpet \ +-no-shutdown \ +-boot strict=on \ +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ +-blockdev '{"driver":"file","filename":"/var/lib/nova/instances/_base/8b58db82a488248e7c5e769599954adaa47a5314","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ +-blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \ +-blockdev 
'{"driver":"file","filename":"/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ +-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \ +-device virtio-blk-pci,bus=pci.0,addr=0x3,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \ +-add-fd set=1,fd=34 \ +-chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on \ +-device isa-serial,chardev=charserial0,id=serial0 \ +-device usb-tablet,id=input0,bus=usb.0,port=1 \ +-audiodev '{"id":"audio1","driver":"none"}' \ +-vnc +0.0.0.0:0 +,audiodev=audio1 \ +-device virtio-vga,id=video0,max_outputs=1,bus=pci.0,addr=0x2 \ +-device vfio-pci,host=0000:01:00.5,id=hostdev0,bus=pci.0,addr=0x4 \ +-device vfio-pci,host=0000:01:00.6,id=hostdev1,bus=pci.0,addr=0x5 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \ +-object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \ +-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 \ +-device vmcoreinfo \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on +char device redirected to /dev/pts/3 (label charserial0) +2023-03-23T08:00:53.728550Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument +kvm_set_phys_mem: error registering slot: Invalid argument +2023-03-23 08:00:54.201+0000: shutting down, reason=crashed +2023-03-23 08:54:43.468+0000: starting up libvirt version: 8.0.0, package: 1ubuntu7.4 (Christian Ehrhardt < +christian.ehrhardt@canonical.com +> Tue, 22 Nov 2022 15:59:28 +0100), qemu version: 6.2.0Debian 1:6.2+dfsg-2ubuntu6.6, kernel: 5.19.0-35-generic, hostname: c1c2 +LC_ALL=C \ +PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \ +HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e \ +XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e/.local/share \ +XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e/.cache \ +XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-5-instance-0000000e/.config \ +/usr/bin/qemu-system-x86_64 \ +-name guest=instance-0000000e,debug-threads=on \ +-S \ +-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-5-instance-0000000e/master-key.aes"}' \ +-machine pc-i440fx-6.2,usb=off,dump-guest-core=off,memory-backend=pc.ram \ +-accel kvm \ +-cpu Cooperlake,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,sha-ni=on,umip=on,waitpkg=on,gfni=on,vaes=on,vpclmulqdq=on,rdpid=on,movdiri=on,movdir64b=on,fsrm=on,md-clear=on,avx-vnni=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,hle=off,rtm=off,avx512f=off,avx512dq=off,avx512cd=off,avx512bw=off,avx512vl=off,avx512vnni=off,avx512-bf16=off,taa-no=off \ +-m 64 \ +-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":67108864}' \ +-overcommit mem-lock=off \ +-smp 1,sockets=1,dies=1,cores=1,threads=1 \ +-uuid ff91d2dc-69a1-43ef-abde-c9e4e9a0305b \ +-smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=25.1.0,serial=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,uuid=ff91d2dc-69a1-43ef-abde-c9e4e9a0305b,family=Virtual Machine' \ +-no-user-config \ +-nodefaults \ +-chardev socket,id=charmonitor,fd=33,server=on,wait=off \ +-mon chardev=charmonitor,id=monitor,mode=control \ 
+-rtc base=utc,driftfix=slew \ +-global kvm-pit.lost_tick_policy=delay \ +-no-hpet \ +-no-shutdown \ +-boot strict=on \ +-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \ +-blockdev '{"driver":"file","filename":"/var/lib/nova/instances/_base/8b58db82a488248e7c5e769599954adaa47a5314","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ +-blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \ +-blockdev '{"driver":"file","filename":"/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ +-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \ +-device virtio-blk-pci,bus=pci.0,addr=0x3,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \ +-add-fd set=1,fd=34 \ +-chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on \ +-device isa-serial,chardev=charserial0,id=serial0 \ +-device usb-tablet,id=input0,bus=usb.0,port=1 \ +-audiodev '{"id":"audio1","driver":"none"}' \ +-vnc +0.0.0.0:0 +,audiodev=audio1 \ +-device virtio-vga,id=video0,max_outputs=1,bus=pci.0,addr=0x2 \ +-device vfio-pci,host=0000:01:00.5,id=hostdev0,bus=pci.0,addr=0x4 \ +-device vfio-pci,host=0000:01:00.6,id=hostdev1,bus=pci.0,addr=0x5 \ +-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \ +-object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/urandom"}' \ +-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 \ +-device vmcoreinfo \ +-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ +-msg timestamp=on +char device redirected to /dev/pts/3 (label charserial0) +2023-03-23T08:54:44.755039Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument +kvm_set_phys_mem: error registering slot: Invalid argument +2023-03-23 08:54:45.230+0000: shutting down, reason=crashed +---- +Simon Jones +Simon Jones < +batmanustc@gmail.com +> äº2023å¹´3æ23æ¥å¨å 05:49åéï¼ +This is happened in ubuntu22.04. +QEMU is install by apt like this: +apt install -y qemu qemu-kvm qemu-system +and QEMU version is 6.2.0 +---- +Simon Jones +Simon Jones < +batmanustc@gmail.com +> äº2023å¹´3æ21æ¥å¨äº 08:40åéï¼ +Hi all, +I start a VM in openstack, and openstack use libvirt to start qemu VM, but now log show this ERROR. +Is there any one know this? +The ERROR log from /var/log/libvirt/qemu/instance-0000000e.log +``` +2023-03-14T10:09:17.674114Z qemu-system-x86_64: kvm_set_user_memory_region: KVM_SET_USER_MEMORY_REGION failed, slot=4, start=0xfffffffffe000000, size=0x2000: Invalid argument +kvm_set_phys_mem: error registering slot: Invalid argument +2023-03-14 10:09:18.198+0000: shutting down, reason=crashed +``` +The xml file +``` +root@c1c2:~# cat /etc/libvirt/qemu/instance-0000000e.xml +<!-- +WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE +OVERWRITTEN AND LOST. Changes to this xml configuration should be made using: + virsh edit instance-0000000e +or other application using the libvirt API. 
+--> +<domain type='kvm'> + <name>instance-0000000e</name> + <uuid>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</uuid> + <metadata> +  <nova:instance xmlns:nova=" +http://openstack.org/xmlns/libvirt/nova/1.1 +"> +   <nova:package version="25.1.0"/> +   <nova:name>provider-instance</nova:name> +   <nova:creationTime>2023-03-14 10:09:13</nova:creationTime> +   <nova:flavor name="cirros-os-dpu-test-1"> +    <nova:memory>64</nova:memory> +    <nova:disk>1</nova:disk> +    <nova:swap>0</nova:swap> +    <nova:ephemeral>0</nova:ephemeral> +    <nova:vcpus>1</nova:vcpus> +   </nova:flavor> +   <nova:owner> +    <nova:user uuid="ff627ad39ed94479b9c5033bc462cf78">admin</nova:user> +    <nova:project uuid="512866f9994f4ad8916d8539a7cdeec9">admin</nova:project> +   </nova:owner> +   <nova:root type="image" uuid="9e58cb69-316a-4093-9f23-c1d1bd8edffe"/> +   <nova:ports> +    <nova:port uuid="77c1dc00-af39-4463-bea0-12808f4bc340"> +     <nova:ip type="fixed" address="172.1.1.43" ipVersion="4"/> +    </nova:port> +   </nova:ports> +  </nova:instance> + </metadata> + <memory unit='KiB'>65536</memory> + <currentMemory unit='KiB'>65536</currentMemory> + <vcpu placement='static'>1</vcpu> + <sysinfo type='smbios'> +  <system> +   <entry name='manufacturer'>OpenStack Foundation</entry> +   <entry name='product'>OpenStack Nova</entry> +   <entry name='version'>25.1.0</entry> +   <entry name='serial'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> +   <entry name='uuid'>ff91d2dc-69a1-43ef-abde-c9e4e9a0305b</entry> +   <entry name='family'>Virtual Machine</entry> +  </system> + </sysinfo> + <os> +  <type arch='x86_64' machine='pc-i440fx-6.2'>hvm</type> +  <boot dev='hd'/> +  <smbios mode='sysinfo'/> + </os> + <features> +  <acpi/> +  <apic/> +  <vmcoreinfo state='on'/> + </features> + <cpu mode='host-model' check='partial'> +  <topology sockets='1' dies='1' cores='1' threads='1'/> + </cpu> + <clock offset='utc'> +  <timer name='pit' tickpolicy='delay'/> +  <timer name='rtc' tickpolicy='catchup'/> +  <timer name='hpet' present='no'/> + </clock> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <devices> +  <emulator>/usr/bin/qemu-system-x86_64</emulator> +  <disk type='file' device='disk'> +   <driver name='qemu' type='qcow2' cache='none'/> +   <source file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/disk'/> +   <target dev='vda' bus='virtio'/> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> +  </disk> +  <controller type='usb' index='0' model='piix3-uhci'> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/> +  </controller> +  <controller type='pci' index='0' model='pci-root'/> +  <interface type='hostdev' managed='yes'> +   <mac address='fa:16:3e:aa:d9:23'/> +   <source> +    <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x5'/> +   </source> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> +  </interface> +  <serial type='pty'> +   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> +   <target type='isa-serial' port='0'> +    <model name='isa-serial'/> +   </target> +  </serial> +  <console type='pty'> +   <log file='/var/lib/nova/instances/ff91d2dc-69a1-43ef-abde-c9e4e9a0305b/console.log' append='off'/> +   <target type='serial' port='0'/> +  </console> +  <input type='tablet' bus='usb'> +   <address type='usb' bus='0' port='1'/> +  </input> +  <input type='mouse' bus='ps2'/> +  <input 
type='keyboard' bus='ps2'/> +  <graphics type='vnc' port='-1' autoport='yes' listen='0.0.0.0'> +   <listen type='address' address='0.0.0.0'/> +  </graphics> +  <audio id='1' type='none'/> +  <video> +   <model type='virtio' heads='1' primary='yes'/> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> +  </video> +  <hostdev mode='subsystem' type='pci' managed='yes'> +   <source> +    <address domain='0x0000' bus='0x01' slot='0x00' function='0x6'/> +   </source> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/> +  </hostdev> +  <memballoon model='virtio'> +   <stats period='10'/> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/> +  </memballoon> +  <rng model='virtio'> +   <backend model='random'>/dev/urandom</backend> +   <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/> +  </rng> + </devices> +</domain> +``` +---- +Simon Jones + diff --git a/classification_output/01/other/5745618 b/classification_output/01/other/5745618 new file mode 100644 index 000000000..afd2ea5d3 --- /dev/null +++ b/classification_output/01/other/5745618 @@ -0,0 +1,155 @@ +other: 0.953 +instruction: 0.938 +semantic: 0.937 +mistranslation: 0.897 + +[Qemu-devel] [BUG] checkpatch.pl hangs on target/mips/msa_helper.c + +If checkpatch.pl is applied (using switch "-f") on file +target/mips/msa_helper.c, it will hang. + +There is a workaround for this particular file: + +These lines in msa_helper.c: + + uint## BITS ##_t S = _S, T = _T; \ + uint## BITS ##_t as, at, xs, xt, xd; \ + +should be replaced with: + + uint## BITS ## _t S = _S, T = _T; \ + uint## BITS ## _t as, at, xs, xt, xd; \ + +(a space is added after the second "##" in each line) + +The workaround is found by partial deleting and undeleting of the code in +msa_helper.c in binary search fashion. + +This workaround will soon be submitted by me as a patch within a series on misc +MIPS issues. + +I took a look at checkpatch.pl code, and it looks it is fairly complicated to +fix the issue, since it happens in the code segment involving intricate logic +conditions. + +Regards, +Aleksandar + +On Wed, Jul 04, 2018 at 03:35:18PM +0000, Aleksandar Markovic wrote: +> +If checkpatch.pl is applied (using switch "-f") on file +> +target/mips/msa_helper.c, it will hang. +> +> +There is a workaround for this particular file: +> +> +These lines in msa_helper.c: +> +> +uint## BITS ##_t S = _S, T = _T; \ +> +uint## BITS ##_t as, at, xs, xt, xd; \ +> +> +should be replaced with: +> +> +uint## BITS ## _t S = _S, T = _T; \ +> +uint## BITS ## _t as, at, xs, xt, xd; \ +> +> +(a space is added after the second "##" in each line) +> +> +The workaround is found by partial deleting and undeleting of the code in +> +msa_helper.c in binary search fashion. +> +> +This workaround will soon be submitted by me as a patch within a series on +> +misc MIPS issues. +> +> +I took a look at checkpatch.pl code, and it looks it is fairly complicated to +> +fix the issue, since it happens in the code segment involving intricate logic +> +conditions. +Thanks for figuring this out, Aleksandar. Not sure if anyone else has +the apetite to fix checkpatch.pl. + +Stefan +signature.asc +Description: +PGP signature + +On 07/11/2018 09:36 AM, Stefan Hajnoczi wrote: +> +On Wed, Jul 04, 2018 at 03:35:18PM +0000, Aleksandar Markovic wrote: +> +> If checkpatch.pl is applied (using switch "-f") on file +> +> target/mips/msa_helper.c, it will hang. 
+> +> +> +> There is a workaround for this particular file: +> +> +> +> These lines in msa_helper.c: +> +> +> +> uint## BITS ##_t S = _S, T = _T; \ +> +> uint## BITS ##_t as, at, xs, xt, xd; \ +> +> +> +> should be replaced with: +> +> +> +> uint## BITS ## _t S = _S, T = _T; \ +> +> uint## BITS ## _t as, at, xs, xt, xd; \ +> +> +> +> (a space is added after the second "##" in each line) +> +> +> +> The workaround is found by partial deleting and undeleting of the code in +> +> msa_helper.c in binary search fashion. +> +> +> +> This workaround will soon be submitted by me as a patch within a series on +> +> misc MIPS issues. +> +> +> +> I took a look at checkpatch.pl code, and it looks it is fairly complicated +> +> to fix the issue, since it happens in the code segment involving intricate +> +> logic conditions. +> +> +Thanks for figuring this out, Aleksandar. Not sure if anyone else has +> +the apetite to fix checkpatch.pl.
+Anyone else but Paolo ;P
+http://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg01250.html
+signature.asc +Description: +OpenPGP digital signature
+ diff --git a/classification_output/01/other/5912779 b/classification_output/01/other/5912779 new file mode 100644 index 000000000..df589df38 --- /dev/null +++ b/classification_output/01/other/5912779 @@ -0,0 +1,315 @@ +other: 0.868 +instruction: 0.833 +semantic: 0.794 +mistranslation: 0.665
+
+[BUG Report] Got a use-after-free error while starting an arm64 VM with lots of PCI controllers
+
+Hi,
+
+We got a use-after-free report in our Euler Robot Test; it can be reproduced quite easily.
+It can be reproduced by starting a VM with lots of PCI controllers and virtio-scsi devices.
+You can find the full qemu log in the attachment.
+We have analyzed the log and worked out roughly how it happens, but we don't know how to fix it.
+
+Could anyone help to fix it?
+
+The key messages are shown below:
+char device redirected to /dev/pts/1 (label charserial0)
+==1517174==WARNING: ASan doesn't fully support makecontext/swapcontext +functions and may produce false positives in some cases!
+================================================================= +==1517174==ERROR: AddressSanitizer: heap-use-after-free on address +0xfffc31a002a0 at pc 0xaaad73e1f668 bp 0xfffc319fddb0 sp 0xfffc319fddd0 +READ of size 8 at 0xfffc31a002a0 thread T1 + #0 0xaaad73e1f667 in memory_region_unref /home/qemu/memory.c:1771 + #1 0xaaad73e1f667 in flatview_destroy /home/qemu/memory.c:291 + #2 0xaaad74adc85b in call_rcu_thread util/rcu.c:283 + #3 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519 + #4 0xfffc3a1678bb (/lib64/libpthread.so.0+0x78bb) + #5 0xfffc3a0a616b (/lib64/libc.so.6+0xd616b) + +0xfffc31a002a0 is located 544 bytes inside of 1440-byte region +[0xfffc31a00080,0xfffc31a00620) +freed by thread T37 (CPU 0/KVM) here: + #0 0xfffc3c102e23 in free (/lib64/libasan.so.4+0xd2e23) + #1 0xfffc3bbc729f in g_free (/lib64/libglib-2.0.so.0+0x5729f) + #2 0xaaad745cce03 in pci_bridge_update_mappings hw/pci/pci_bridge.c:245 + #3 0xaaad745ccf33 in pci_bridge_write_config hw/pci/pci_bridge.c:271 + #4 0xaaad745ba867 in pci_bridge_dev_write_config +hw/pci-bridge/pci_bridge_dev.c:153 + #5 0xaaad745d6013 in pci_host_config_write_common hw/pci/pci_host.c:81 + #6 0xaaad73e2346f in memory_region_write_accessor /home/qemu/memory.c:483 + #7 0xaaad73e1d9ff in access_with_adjusted_size /home/qemu/memory.c:544 + #8 0xaaad73e28d1f in memory_region_dispatch_write /home/qemu/memory.c:1482 + #9 0xaaad73d7274f in flatview_write_continue /home/qemu/exec.c:3167 + #10 0xaaad73d72a53 in flatview_write /home/qemu/exec.c:3207 + #11 0xaaad73d7c8c3 in address_space_write /home/qemu/exec.c:3297 + #12 0xaaad73e5059b in kvm_cpu_exec /home/qemu/accel/kvm/kvm-all.c:2386 + #13 0xaaad73e07ac7 in qemu_kvm_cpu_thread_fn /home/qemu/cpus.c:1246 + #14 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519 + #15 0xfffc3a1678bb (/lib64/libpthread.so.0+0x78bb) + #16 0xfffc3a0a616b (/lib64/libc.so.6+0xd616b) + +previously allocated by thread T0 here: + #0 0xfffc3c1031cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb) + #1 0xfffc3bbc7163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163) + #2 0xaaad745ccb57 in pci_bridge_region_init hw/pci/pci_bridge.c:188 + #3 0xaaad745cd8cb in pci_bridge_initfn hw/pci/pci_bridge.c:385 + #4 0xaaad745baaf3 in pci_bridge_dev_realize +hw/pci-bridge/pci_bridge_dev.c:64 + #5 0xaaad745cacd7 in pci_qdev_realize hw/pci/pci.c:2095 + #6 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865 + #7 0xaaad7485ed23 in property_set_bool qom/object.c:2102 + #8 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26 + #9 0xaaad74863a43 in object_property_set_bool qom/object.c:1360 + #10 0xaaad742a53b7 in qdev_device_add /home/qemu/qdev-monitor.c:675 + #11 0xaaad742a9c7b in device_init_func /home/qemu/vl.c:2074 + #12 0xaaad74ad4d33 in qemu_opts_foreach util/qemu-option.c:1170 + #13 0xaaad73d60c17 in main /home/qemu/vl.c:4313 + #14 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f) + #15 0xaaad73d6db33 +(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33) + +Thread T1 created by T0 here: + #0 0xfffc3c068f6f in __interceptor_pthread_create +(/lib64/libasan.so.4+0x38f6f) + #1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556 + #2 0xaaad74adc6a7 in rcu_init_complete util/rcu.c:326 + #3 0xaaad74bab2a7 in __libc_csu_init +(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x17cb2a7) + #4 0xfffc39ff0b47 in __libc_start_main (/lib64/libc.so.6+0x20b47) + #5 0xaaad73d6db33 (/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33) + +Thread T37 (CPU 0/KVM) created by T0 
here: + #0 0xfffc3c068f6f in __interceptor_pthread_create +(/lib64/libasan.so.4+0x38f6f) + #1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556 + #2 0xaaad73e09b0f in qemu_dummy_start_vcpu /home/qemu/cpus.c:2045 + #3 0xaaad73e09b0f in qemu_init_vcpu /home/qemu/cpus.c:2077 + #4 0xaaad740d36b7 in arm_cpu_realizefn /home/qemu/target/arm/cpu.c:1712 + #5 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865 + #6 0xaaad7485ed23 in property_set_bool qom/object.c:2102 + #7 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26 + #8 0xaaad74863a43 in object_property_set_bool qom/object.c:1360 + #9 0xaaad73fe3e67 in machvirt_init /home/qemu/hw/arm/virt.c:1682 + #10 0xaaad743acfc7 in machine_run_board_init hw/core/machine.c:1077 + #11 0xaaad73d60b73 in main /home/qemu/vl.c:4292 + #12 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f) + #13 0xaaad73d6db33 +(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33) + +SUMMARY: AddressSanitizer: heap-use-after-free /home/qemu/memory.c:1771 in +memory_region_unref + +Thanks +use-after-free-qemu.log +Description: +Text document + +Cc: address@hidden + +On 1/17/2020 4:18 PM, Pan Nengyuan wrote: +> +Hi, +> +> +We got a use-after-free report in our Euler Robot Test, it is can be +> +reproduced quite easily, +> +It can be reproduced by start VM with lots of pci controller and virtio-scsi +> +devices. +> +You can find the full qemu log from attachment. +> +We have analyzed the log and got the rough process how it happened, but don't +> +know how to fix it. +> +> +Could anyone help to fix it ? +> +> +The key message shows bellow: +> +har device redirected to /dev/pts/1 (label charserial0) +> +==1517174==WARNING: ASan doesn't fully support makecontext/swapcontext +> +functions and may produce false positives in some cases! 
+> +================================================================= +> +==1517174==ERROR: AddressSanitizer: heap-use-after-free on address +> +0xfffc31a002a0 at pc 0xaaad73e1f668 bp 0xfffc319fddb0 sp 0xfffc319fddd0 +> +READ of size 8 at 0xfffc31a002a0 thread T1 +> +#0 0xaaad73e1f667 in memory_region_unref /home/qemu/memory.c:1771 +> +#1 0xaaad73e1f667 in flatview_destroy /home/qemu/memory.c:291 +> +#2 0xaaad74adc85b in call_rcu_thread util/rcu.c:283 +> +#3 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519 +> +#4 0xfffc3a1678bb (/lib64/libpthread.so.0+0x78bb) +> +#5 0xfffc3a0a616b (/lib64/libc.so.6+0xd616b) +> +> +0xfffc31a002a0 is located 544 bytes inside of 1440-byte region +> +[0xfffc31a00080,0xfffc31a00620) +> +freed by thread T37 (CPU 0/KVM) here: +> +#0 0xfffc3c102e23 in free (/lib64/libasan.so.4+0xd2e23) +> +#1 0xfffc3bbc729f in g_free (/lib64/libglib-2.0.so.0+0x5729f) +> +#2 0xaaad745cce03 in pci_bridge_update_mappings hw/pci/pci_bridge.c:245 +> +#3 0xaaad745ccf33 in pci_bridge_write_config hw/pci/pci_bridge.c:271 +> +#4 0xaaad745ba867 in pci_bridge_dev_write_config +> +hw/pci-bridge/pci_bridge_dev.c:153 +> +#5 0xaaad745d6013 in pci_host_config_write_common hw/pci/pci_host.c:81 +> +#6 0xaaad73e2346f in memory_region_write_accessor /home/qemu/memory.c:483 +> +#7 0xaaad73e1d9ff in access_with_adjusted_size /home/qemu/memory.c:544 +> +#8 0xaaad73e28d1f in memory_region_dispatch_write /home/qemu/memory.c:1482 +> +#9 0xaaad73d7274f in flatview_write_continue /home/qemu/exec.c:3167 +> +#10 0xaaad73d72a53 in flatview_write /home/qemu/exec.c:3207 +> +#11 0xaaad73d7c8c3 in address_space_write /home/qemu/exec.c:3297 +> +#12 0xaaad73e5059b in kvm_cpu_exec /home/qemu/accel/kvm/kvm-all.c:2386 +> +#13 0xaaad73e07ac7 in qemu_kvm_cpu_thread_fn /home/qemu/cpus.c:1246 +> +#14 0xaaad74ab31db in qemu_thread_start util/qemu-thread-posix.c:519 +> +#15 0xfffc3a1678bb (/lib64/libpthread.so.0+0x78bb) +> +#16 0xfffc3a0a616b (/lib64/libc.so.6+0xd616b) +> +> +previously allocated by thread T0 here: +> +#0 0xfffc3c1031cb in __interceptor_malloc (/lib64/libasan.so.4+0xd31cb) +> +#1 0xfffc3bbc7163 in g_malloc (/lib64/libglib-2.0.so.0+0x57163) +> +#2 0xaaad745ccb57 in pci_bridge_region_init hw/pci/pci_bridge.c:188 +> +#3 0xaaad745cd8cb in pci_bridge_initfn hw/pci/pci_bridge.c:385 +> +#4 0xaaad745baaf3 in pci_bridge_dev_realize +> +hw/pci-bridge/pci_bridge_dev.c:64 +> +#5 0xaaad745cacd7 in pci_qdev_realize hw/pci/pci.c:2095 +> +#6 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865 +> +#7 0xaaad7485ed23 in property_set_bool qom/object.c:2102 +> +#8 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26 +> +#9 0xaaad74863a43 in object_property_set_bool qom/object.c:1360 +> +#10 0xaaad742a53b7 in qdev_device_add /home/qemu/qdev-monitor.c:675 +> +#11 0xaaad742a9c7b in device_init_func /home/qemu/vl.c:2074 +> +#12 0xaaad74ad4d33 in qemu_opts_foreach util/qemu-option.c:1170 +> +#13 0xaaad73d60c17 in main /home/qemu/vl.c:4313 +> +#14 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f) +> +#15 0xaaad73d6db33 +> +(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33) +> +> +Thread T1 created by T0 here: +> +#0 0xfffc3c068f6f in __interceptor_pthread_create +> +(/lib64/libasan.so.4+0x38f6f) +> +#1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556 +> +#2 0xaaad74adc6a7 in rcu_init_complete util/rcu.c:326 +> +#3 0xaaad74bab2a7 in __libc_csu_init +> +(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x17cb2a7) +> +#4 0xfffc39ff0b47 in __libc_start_main 
(/lib64/libc.so.6+0x20b47) +> +#5 0xaaad73d6db33 +> +(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33) +> +> +Thread T37 (CPU 0/KVM) created by T0 here: +> +#0 0xfffc3c068f6f in __interceptor_pthread_create +> +(/lib64/libasan.so.4+0x38f6f) +> +#1 0xaaad74ab54ab in qemu_thread_create util/qemu-thread-posix.c:556 +> +#2 0xaaad73e09b0f in qemu_dummy_start_vcpu /home/qemu/cpus.c:2045 +> +#3 0xaaad73e09b0f in qemu_init_vcpu /home/qemu/cpus.c:2077 +> +#4 0xaaad740d36b7 in arm_cpu_realizefn /home/qemu/target/arm/cpu.c:1712 +> +#5 0xaaad7439d9f7 in device_set_realized hw/core/qdev.c:865 +> +#6 0xaaad7485ed23 in property_set_bool qom/object.c:2102 +> +#7 0xaaad74868f4b in object_property_set_qobject qom/qom-qobject.c:26 +> +#8 0xaaad74863a43 in object_property_set_bool qom/object.c:1360 +> +#9 0xaaad73fe3e67 in machvirt_init /home/qemu/hw/arm/virt.c:1682 +> +#10 0xaaad743acfc7 in machine_run_board_init hw/core/machine.c:1077 +> +#11 0xaaad73d60b73 in main /home/qemu/vl.c:4292 +> +#12 0xfffc39ff0b9f in __libc_start_main (/lib64/libc.so.6+0x20b9f) +> +#13 0xaaad73d6db33 +> +(/home/qemu/aarch64-softmmu/qemu-system-aarch64+0x98db33) +> +> +SUMMARY: AddressSanitizer: heap-use-after-free /home/qemu/memory.c:1771 in +> +memory_region_unref +> +> +Thanks +> +use-after-free-qemu.log +Description: +Text document + diff --git a/classification_output/01/other/6156219 b/classification_output/01/other/6156219 new file mode 100644 index 000000000..a0977e39c --- /dev/null +++ b/classification_output/01/other/6156219 @@ -0,0 +1,1421 @@ +other: 0.899 +mistranslation: 0.861 +instruction: 0.854 +semantic: 0.835 + +[Qemu-devel] 答复: Re: 答复: Re: 答复: Re: [BUG]COLO failover hang + +amost like wikiï¼but panic in Primary Node. + + + + +setp: + +1 + +Primary Node. + +x86_64-softmmu/qemu-system-x86_64 -enable-kvm -boot c -m 2048 -smp 2 -qmp stdio +-vnc :7 -name primary -cpu qemu64,+kvmclock -device piix3-usb-uhci -usb +-usbdevice tablet\ + + -drive +if=virtio,id=colo-disk0,driver=quorum,read-pattern=fifo,vote-threshold=1, + + +children.0.file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,children.0.driver=qcow2 + -S \ + + -netdev +tap,id=hn1,vhost=off,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \ + + -device e1000,id=e1,netdev=hn1,mac=52:a4:00:12:78:67 \ + + -netdev +tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ + + -device e1000,id=e0,netdev=hn0,mac=52:a4:00:12:78:66 \ + + -chardev socket,id=mirror0,host=9.61.1.8,port=9003,server,nowait -chardev +socket,id=compare1,host=9.61.1.8,port=9004,server,nowait \ + + -chardev socket,id=compare0,host=9.61.1.8,port=9001,server,nowait -chardev +socket,id=compare0-0,host=9.61.1.8,port=9001 \ + + -chardev socket,id=compare_out,host=9.61.1.8,port=9005,server,nowait \ + + -chardev socket,id=compare_out0,host=9.61.1.8,port=9005 \ + + -object filter-mirror,id=m0,netdev=hn0,queue=tx,outdev=mirror0 \ + + -object filter-redirector,netdev=hn0,id=redire0,queue=rx,indev=compare_out +-object filter-redirector,netdev=hn0,id=redire1,queue=rx,outdev=compare0 \ + + -object +colo-compare,id=comp0,primary_in=compare0-0,secondary_in=compare1,outdev=compare_out0 + +2 Second node: + +x86_64-softmmu/qemu-system-x86_64 -boot c -m 2048 -smp 2 -qmp stdio -vnc :7 +-name secondary -enable-kvm -cpu qemu64,+kvmclock -device piix3-usb-uhci -usb +-usbdevice tablet\ + + -drive +if=none,id=colo-disk0,file.filename=/mnt/sdd/pure_IMG/linux/redhat/rhel_6.5_64_2U_ide,driver=qcow2,node-name=node0 + \ + + -drive 
+if=virtio,id=active-disk0,driver=replication,mode=secondary,file.driver=qcow2,top-id=active-disk0,file.file.filename=/mnt/ramfstest/active_disk.img,file.backing.driver=qcow2,file.backing.file.filename=/mnt/ramfstest/hidden_disk.img,file.backing.backing=colo-disk0 + \ +
+ -netdev +tap,id=hn1,vhost=off,script=/etc/qemu-ifup2,downscript=/etc/qemu-ifdown2 \ +
+ -device e1000,id=e1,netdev=hn1,mac=52:a4:00:12:78:67 \ +
+ -netdev +tap,id=hn0,vhost=off,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown \ +
+ -device e1000,netdev=hn0,mac=52:a4:00:12:78:66 -chardev +socket,id=red0,host=9.61.1.8,port=9003 \ +
+ -chardev socket,id=red1,host=9.61.1.8,port=9004 \ +
+ -object filter-redirector,id=f1,netdev=hn0,queue=tx,indev=red0 \ +
+ -object filter-redirector,id=f2,netdev=hn0,queue=rx,outdev=red1 \ +
+ -object filter-rewriter,id=rew0,netdev=hn0,queue=all -incoming tcp:0:8888 +
+3: Secondary node: +
+{'execute':'qmp_capabilities'} +
+{ 'execute': 'nbd-server-start', + + 'arguments': {'addr': {'type': 'inet', 'data': {'host': '9.61.1.7', 'port': +'8889'} } } + +} +
+{'execute': 'nbd-server-add', 'arguments': {'device': 'colo-disk0', 'writable': +true } } +
+4: Primary Node: +
+{'execute':'qmp_capabilities'} + +
+{ 'execute': 'human-monitor-command', + + 'arguments': {'command-line': 'drive_add -n buddy +driver=replication,mode=primary,file.driver=nbd,file.host=9.61.1.7,file.port=8889,file.export=colo-disk0,node-name=node0'}} +
+{ 'execute':'x-blockdev-change', 'arguments':{'parent': 'colo-disk0', 'node': +'node0' } } +
+{ 'execute': 'migrate-set-capabilities', + + 'arguments': {'capabilities': [ {'capability': 'x-colo', 'state': true } +] } } +
+{ 'execute': 'migrate', 'arguments': {'uri': 'tcp:9.61.1.7:8888' } } +
+Then you can see two running VMs; whenever you make changes to the PVM, the SVM will be synced. +
+5: Primary Node: +
+echo c > /proc/sysrq-trigger +
+6: Secondary node: +
+{ 'execute': 'nbd-server-stop' } +
+{ "execute": "x-colo-lost-heartbeat" } +
+Then you can see the Secondary node qemu hang at recvmsg. +
+Original message +
+From: address@hidden
+To: wangguang 10165992 address@hidden
+Cc: address@hidden address@hidden
+Date: 2017-03-21 16:27
+Subject: Re: [Qemu-devel] Reply: Re: Reply: Re: [BUG]COLO failover hang +
+Hi, +
+On 2017/3/21 16:10, address@hidden wrote:
+> Thank you.
+>
+> I have tested already.
+>
+> When the Primary Node panics, the Secondary Node qemu hangs at the same place.
+>
+> According to
+http://wiki.qemu-project.org/Features/COLO
+, killing the Primary Node qemu will not produce the problem, but a Primary Node panic can.
+>
+> I think it is because the channel does not support +QIO_CHANNEL_FEATURE_SHUTDOWN.
+>
+
+Yes, you are right, when we do failover for primary/secondary VM, we will +shut down the related +fd in case it is stuck in read/write on the fd. +
+It seems that you didn't follow the above introduction exactly to do the test. +Could you +share your test procedures? Especially the commands used in the test. +
+Thanks, +Hailiang +
+> When failover happens, channel_shutdown could not shut down the channel.
+>
+> so the colo_process_incoming_thread will hang at recvmsg.
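For reference, the mechanism the patch discussed below relies on is that shutdown() on the underlying socket forces a recv()/recvmsg() blocked in another thread to return. The following is a minimal standalone sketch of that behaviour using plain POSIX sockets; it is not QEMU code, and the names are illustrative only. It just shows what qemu_file_shutdown() needs to happen to the COLO incoming channel, which (per the discussion in this thread) only occurs when the channel has QIO_CHANNEL_FEATURE_SHUTDOWN set.
```
/*
 * Standalone sketch (not QEMU code): a thread blocked in recv() is woken
 * up when shutdown() is called on the same socket from another thread.
 */
#include <pthread.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

static int fds[2];

static void *reader(void *arg)
{
    char buf[16];
    (void)arg;
    /* Blocks here, like colo_process_incoming_thread blocks in recvmsg(). */
    ssize_t n = recv(fds[0], buf, sizeof(buf), 0);
    printf("recv returned %zd\n", n);   /* returns 0 once shutdown() is called */
    return NULL;
}

int main(void)
{
    pthread_t t;

    socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
    pthread_create(&t, NULL, reader, NULL);
    sleep(1);                        /* give the reader time to block in recv() */
    shutdown(fds[0], SHUT_RDWR);     /* the shutdown the failover path wants to issue */
    pthread_join(t, NULL);
    close(fds[0]);
    close(fds[1]);
    return 0;
}
```
Without the SHUTDOWN feature flag on the accepted incoming channel, the shutdown call is never issued for that fd, so the blocked recvmsg() is never interrupted and failover hangs, which is what the patch below addresses.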
+ï¼ +ï¼ +ï¼ I test a patch: +ï¼ +ï¼ +ï¼ diff --git a/migration/socket.c b/migration/socket.c +ï¼ +ï¼ +ï¼ index 13966f1..d65a0ea 100644 +ï¼ +ï¼ +ï¼ --- a/migration/socket.c +ï¼ +ï¼ +ï¼ +++ b/migration/socket.c +ï¼ +ï¼ +ï¼ @@ -147,8 +147,9 @@ static gboolean +socket_accept_incoming_migration(QIOChannel *ioc, +ï¼ +ï¼ +ï¼ } +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ trace_migration_socket_incoming_accepted() +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") +ï¼ +ï¼ +ï¼ + qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) +ï¼ +ï¼ +ï¼ migration_channel_process_incoming(migrate_get_current(), +ï¼ +ï¼ +ï¼ QIO_CHANNEL(sioc)) +ï¼ +ï¼ +ï¼ object_unref(OBJECT(sioc)) +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ My test will not hang any more. +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ åå§é®ä»¶ +ï¼ +ï¼ +ï¼ +ï¼ åä»¶äººï¼ address@hidden +ï¼ æ¶ä»¶äººï¼ç广10165992 address@hidden +ï¼ æéäººï¼ address@hidden address@hidden +ï¼ æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58 +ï¼ ä¸» é¢ ï¼Re: [Qemu-devel] çå¤: Re: [BUG]COLO failover hang +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ +ï¼ Hi,Wang. +ï¼ +ï¼ You can test this branch: +ï¼ +ï¼ +https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk +ï¼ +ï¼ and please follow wiki ensure your own configuration correctly. +ï¼ +ï¼ +http://wiki.qemu-project.org/Features/COLO +ï¼ +ï¼ +ï¼ Thanks +ï¼ +ï¼ Zhang Chen +ï¼ +ï¼ +ï¼ On 03/21/2017 03:27 PM, address@hidden wrote: +ï¼ ï¼ +ï¼ ï¼ hi. +ï¼ ï¼ +ï¼ ï¼ I test the git qemu master have the same problem. +ï¼ ï¼ +ï¼ ï¼ (gdb) bt +ï¼ ï¼ +ï¼ ï¼ #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, +ï¼ ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 +ï¼ ï¼ +ï¼ ï¼ #1 0x00007f658e4aa0c2 in qio_channel_read +ï¼ ï¼ (address@hidden, address@hidden "", +ï¼ ï¼ address@hidden, address@hidden) at io/channel.c:114 +ï¼ ï¼ +ï¼ ï¼ #2 0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, +ï¼ ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at +ï¼ ï¼ migration/qemu-file-channel.c:78 +ï¼ ï¼ +ï¼ ï¼ #3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at +ï¼ ï¼ migration/qemu-file.c:295 +ï¼ ï¼ +ï¼ ï¼ #4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, +ï¼ ï¼ address@hidden) at migration/qemu-file.c:555 +ï¼ ï¼ +ï¼ ï¼ #5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at +ï¼ ï¼ migration/qemu-file.c:568 +ï¼ ï¼ +ï¼ ï¼ #6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at +ï¼ ï¼ migration/qemu-file.c:648 +ï¼ ï¼ +ï¼ ï¼ #7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, +ï¼ ï¼ address@hidden) at migration/colo.c:244 +ï¼ ï¼ +ï¼ ï¼ #8 0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized +ï¼ ï¼ outï¼, address@hidden, +ï¼ ï¼ address@hidden) +ï¼ ï¼ +ï¼ ï¼ at migration/colo.c:264 +ï¼ ï¼ +ï¼ ï¼ #9 0x00007f658e3e740e in colo_process_incoming_thread +ï¼ ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577 +ï¼ ï¼ +ï¼ ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 +ï¼ ï¼ +ï¼ ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6 +ï¼ ï¼ +ï¼ ï¼ (gdb) p ioc-ï¼name +ï¼ ï¼ +ï¼ ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming" +ï¼ ï¼ +ï¼ ï¼ (gdb) p ioc-ï¼features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN +ï¼ ï¼ +ï¼ ï¼ $3 = 0 +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ (gdb) bt +ï¼ ï¼ +ï¼ ï¼ #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, +ï¼ ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 +ï¼ ï¼ +ï¼ ï¼ #1 0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) 
at +ï¼ ï¼ gmain.c:3054 +ï¼ ï¼ +ï¼ ï¼ #2 g_main_context_dispatch (context=ï¼optimized outï¼, +ï¼ ï¼ address@hidden) at gmain.c:3630 +ï¼ ï¼ +ï¼ ï¼ #3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 +ï¼ ï¼ +ï¼ ï¼ #4 os_host_main_loop_wait (timeout=ï¼optimized outï¼) at +ï¼ ï¼ util/main-loop.c:258 +ï¼ ï¼ +ï¼ ï¼ #5 main_loop_wait (address@hidden) at +ï¼ ï¼ util/main-loop.c:506 +ï¼ ï¼ +ï¼ ï¼ #6 0x00007fdccb526187 in main_loop () at vl.c:1898 +ï¼ ï¼ +ï¼ ï¼ #7 main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized +ï¼ ï¼ outï¼) at vl.c:4709 +ï¼ ï¼ +ï¼ ï¼ (gdb) p ioc-ï¼features +ï¼ ï¼ +ï¼ ï¼ $1 = 6 +ï¼ ï¼ +ï¼ ï¼ (gdb) p ioc-ï¼name +ï¼ ï¼ +ï¼ ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener" +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ May be socket_accept_incoming_migration should +ï¼ ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ thank you. +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ åå§é®ä»¶ +ï¼ ï¼ address@hidden +ï¼ ï¼ address@hidden +ï¼ ï¼ address@hidden@huawei.comï¼ +ï¼ ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46 +ï¼ ï¼ *主 é¢ ï¼**Re: [Qemu-devel] COLO failover hang* +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ On 03/15/2017 05:06 PM, wangguang wrote: +ï¼ ï¼ ï¼ am testing QEMU COLO feature described here [QEMU +ï¼ ï¼ ï¼ Wiki]( +http://wiki.qemu-project.org/Features/COLO +). +ï¼ ï¼ ï¼ +ï¼ ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang. +ï¼ ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv. +ï¼ ï¼ ï¼ And I run { 'execute': 'nbd-server-stop' } and { "execute": +ï¼ ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's +ï¼ ï¼ ï¼ monitor,the Secondary Node qemu still hang at recvmsg . +ï¼ ï¼ ï¼ +ï¼ ï¼ ï¼ I found that the colo in qemu is not complete yet. +ï¼ ï¼ ï¼ Do the colo have any plan for development? +ï¼ ï¼ +ï¼ ï¼ Yes, We are developing. You can see some of patch we pushing. +ï¼ ï¼ +ï¼ ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated! +ï¼ ï¼ +ï¼ ï¼ In our internal version can run it successfully, +ï¼ ï¼ The failover detail you can ask Zhanghailiang for help. 
+ï¼ ï¼ Next time if you have some question about COLO, +ï¼ ï¼ please cc me and zhanghailiang address@hidden +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ Thanks +ï¼ ï¼ Zhang Chen +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ ï¼ +ï¼ ï¼ ï¼ +ï¼ ï¼ ï¼ +ï¼ ï¼ ï¼ centos7.2+qemu2.7.50 +ï¼ ï¼ ï¼ (gdb) bt +ï¼ ï¼ ï¼ #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 +ï¼ ï¼ ï¼ #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼, +ï¼ ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) at +ï¼ ï¼ ï¼ io/channel-socket.c:497 +ï¼ ï¼ ï¼ #2 0x00007f3e03329472 in qio_channel_read (address@hidden, +ï¼ ï¼ ï¼ address@hidden "", address@hidden, +ï¼ ï¼ ï¼ address@hidden) at io/channel.c:97 +ï¼ ï¼ ï¼ #3 0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼, +ï¼ ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at +ï¼ ï¼ ï¼ migration/qemu-file-channel.c:78 +ï¼ ï¼ ï¼ #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at +ï¼ ï¼ ï¼ migration/qemu-file.c:257 +ï¼ ï¼ ï¼ #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, +ï¼ ï¼ ï¼ address@hidden) at migration/qemu-file.c:510 +ï¼ ï¼ ï¼ #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at +ï¼ ï¼ ï¼ migration/qemu-file.c:523 +ï¼ ï¼ ï¼ #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at +ï¼ ï¼ ï¼ migration/qemu-file.c:603 +ï¼ ï¼ ï¼ #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, +ï¼ ï¼ ï¼ address@hidden) at migration/colo..c:215 +ï¼ ï¼ ï¼ #9 0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48, +ï¼ ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at +ï¼ ï¼ ï¼ migration/colo.c:546 +ï¼ ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at +ï¼ ï¼ ï¼ migration/colo.c:649 +ï¼ ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 +ï¼ ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6 +ï¼ ï¼ ï¼ +ï¼ ï¼ ï¼ +ï¼ ï¼ ï¼ +ï¼ ï¼ ï¼ +ï¼ ï¼ ï¼ +ï¼ ï¼ ï¼ -- +ï¼ ï¼ ï¼ View this message in context: +http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html +ï¼ ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com. +ï¼ ï¼ ï¼ +ï¼ ï¼ ï¼ +ï¼ ï¼ ï¼ +ï¼ ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ -- +ï¼ ï¼ Thanks +ï¼ ï¼ Zhang Chen +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ ï¼ +ï¼ + +diff --git a/migration/socket.c b/migration/socket.c + + +index 13966f1..d65a0ea 100644 + + +--- a/migration/socket.c + + ++++ b/migration/socket.c + + +@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel +*ioc, + + + } + + + + + + trace_migration_socket_incoming_accepted() + + + + + + qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") + + ++ qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) + + + migration_channel_process_incoming(migrate_get_current(), + + + QIO_CHANNEL(sioc)) + + + object_unref(OBJECT(sioc)) + + + + +Is this patch ok? + +I have test it . The test could not hang any more. + + + + + + + + + + + + +åå§é®ä»¶ + + + +åä»¶äººï¼ address@hidden +æ¶ä»¶äººï¼ address@hidden address@hidden +æéäººï¼ address@hidden address@hidden address@hidden +æ¥ æ ï¼2017å¹´03æ22æ¥ 09:11 +主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: çå¤: Re: [BUG]COLO failover hang + + + + + +On 2017/3/21 19:56, Dr. David Alan Gilbert wrote: +ï¼ * Hailiang Zhang (address@hidden) wrote: +ï¼ï¼ Hi, +ï¼ï¼ +ï¼ï¼ Thanks for reporting this, and i confirmed it in my test, and it is a bug. 
+ï¼ï¼ +ï¼ï¼ Though we tried to call qemu_file_shutdown() to shutdown the related fd, in +ï¼ï¼ case COLO thread/incoming thread is stuck in read/write() while do failover, +ï¼ï¼ but it didn't take effect, because all the fd used by COLO (also migration) +ï¼ï¼ has been wrapped by qio channel, and it will not call the shutdown API if +ï¼ï¼ we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN). +ï¼ï¼ +ï¼ï¼ Cc: Dr. David Alan Gilbert address@hidden +ï¼ï¼ +ï¼ï¼ I doubted migration cancel has the same problem, it may be stuck in write() +ï¼ï¼ if we tried to cancel migration. +ï¼ï¼ +ï¼ï¼ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, +Error **errp) +ï¼ï¼ { +ï¼ï¼ qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing") +ï¼ï¼ migration_channel_connect(s, ioc, NULL) +ï¼ï¼ ... ... +ï¼ï¼ We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN) above, +ï¼ï¼ and the +ï¼ï¼ migrate_fd_cancel() +ï¼ï¼ { +ï¼ï¼ ... ... +ï¼ï¼ if (s-ï¼state == MIGRATION_STATUS_CANCELLING && f) { +ï¼ï¼ qemu_file_shutdown(f) --ï¼ This will not take effect. No ? +ï¼ï¼ } +ï¼ï¼ } +ï¼ +ï¼ (cc'd in Daniel Berrange). +ï¼ I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) +at the +ï¼ top of qio_channel_socket_new so I think that's safe isn't it? +ï¼ + +Hmm, you are right, this problem is only exist for the migration incoming fd, +thanks. + +ï¼ Dave +ï¼ +ï¼ï¼ Thanks, +ï¼ï¼ Hailiang +ï¼ï¼ +ï¼ï¼ On 2017/3/21 16:10, address@hidden wrote: +ï¼ï¼ï¼ Thank youã +ï¼ï¼ï¼ +ï¼ï¼ï¼ I have test areadyã +ï¼ï¼ï¼ +ï¼ï¼ï¼ When the Primary Node panic,the Secondary Node qemu hang at the same placeã +ï¼ï¼ï¼ +ï¼ï¼ï¼ Incorrding +http://wiki.qemu-project.org/Features/COLO +ï¼kill Primary Node +qemu will not produce the problem,but Primary Node panic canã +ï¼ï¼ï¼ +ï¼ï¼ï¼ I think due to the feature of channel does not support +QIO_CHANNEL_FEATURE_SHUTDOWN. +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ when failover,channel_shutdown could not shut down the channel. +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ so the colo_process_incoming_thread will hang at recvmsg. +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ I test a patch: +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ diff --git a/migration/socket.c b/migration/socket.c +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ index 13966f1..d65a0ea 100644 +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ --- a/migration/socket.c +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +++ b/migration/socket.c +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ @@ -147,8 +147,9 @@ static gboolean +socket_accept_incoming_migration(QIOChannel *ioc, +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ } +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ trace_migration_socket_incoming_accepted() +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ + qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN) +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ migration_channel_process_incoming(migrate_get_current(), +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ QIO_CHANNEL(sioc)) +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ object_unref(OBJECT(sioc)) +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ My test will not hang any more. +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ åå§é®ä»¶ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ åä»¶äººï¼ address@hidden +ï¼ï¼ï¼ æ¶ä»¶äººï¼ç广10165992 address@hidden +ï¼ï¼ï¼ æéäººï¼ address@hidden address@hidden +ï¼ï¼ï¼ æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58 +ï¼ï¼ï¼ 主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: [BUG]COLO failover hang +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ Hi,Wang. 
+ï¼ï¼ï¼ +ï¼ï¼ï¼ You can test this branch: +ï¼ï¼ï¼ +ï¼ï¼ï¼ +https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk +ï¼ï¼ï¼ +ï¼ï¼ï¼ and please follow wiki ensure your own configuration correctly. +ï¼ï¼ï¼ +ï¼ï¼ï¼ +http://wiki.qemu-project.org/Features/COLO +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ Thanks +ï¼ï¼ï¼ +ï¼ï¼ï¼ Zhang Chen +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ On 03/21/2017 03:27 PM, address@hidden wrote: +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ hi. +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ I test the git qemu master have the same problem. +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ (gdb) bt +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, +ï¼ï¼ï¼ ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #1 0x00007f658e4aa0c2 in qio_channel_read +ï¼ï¼ï¼ ï¼ (address@hidden, address@hidden "", +ï¼ï¼ï¼ ï¼ address@hidden, address@hidden) at io/channel.c:114 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #2 0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, +ï¼ï¼ï¼ ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at +ï¼ï¼ï¼ ï¼ migration/qemu-file-channel.c:78 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at +ï¼ï¼ï¼ ï¼ migration/qemu-file.c:295 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, +ï¼ï¼ï¼ ï¼ address@hidden) at migration/qemu-file.c:555 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at +ï¼ï¼ï¼ ï¼ migration/qemu-file.c:568 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at +ï¼ï¼ï¼ ï¼ migration/qemu-file.c:648 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, +ï¼ï¼ï¼ ï¼ address@hidden) at migration/colo.c:244 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #8 0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized +ï¼ï¼ï¼ ï¼ outï¼, address@hidden, +ï¼ï¼ï¼ ï¼ address@hidden) +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ at migration/colo.c:264 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #9 0x00007f658e3e740e in colo_process_incoming_thread +ï¼ï¼ï¼ ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming" +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ $3 = 0 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ (gdb) bt +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, +ï¼ï¼ï¼ ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #1 0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at +ï¼ï¼ï¼ ï¼ gmain.c:3054 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #2 g_main_context_dispatch (context=ï¼optimized outï¼, +ï¼ï¼ï¼ ï¼ address@hidden) at gmain.c:3630 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #4 os_host_main_loop_wait (timeout=ï¼optimized outï¼) at +ï¼ï¼ï¼ ï¼ util/main-loop.c:258 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #5 main_loop_wait (address@hidden) at +ï¼ï¼ï¼ ï¼ util/main-loop.c:506 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #6 0x00007fdccb526187 in main_loop () at vl.c:1898 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #7 main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized +ï¼ï¼ï¼ ï¼ outï¼) at vl.c:4709 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ $1 = 6 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ $2 = 
0x7fdcce1b1ab0 "migration-socket-listener" +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ May be socket_accept_incoming_migration should +ï¼ï¼ï¼ ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ thank you. +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ åå§é®ä»¶ +ï¼ï¼ï¼ ï¼ address@hidden +ï¼ï¼ï¼ ï¼ address@hidden +ï¼ï¼ï¼ ï¼ address@hidden@huawei.comï¼ +ï¼ï¼ï¼ ï¼ *æ¥ æ ï¼*2017å¹´03æ16æ¥ 14:46 +ï¼ï¼ï¼ ï¼ *主 é¢ ï¼**Re: [Qemu-devel] COLO failover hang* +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ On 03/15/2017 05:06 PM, wangguang wrote: +ï¼ï¼ï¼ ï¼ ï¼ am testing QEMU COLO feature described here [QEMU +ï¼ï¼ï¼ ï¼ ï¼ Wiki]( +http://wiki.qemu-project.org/Features/COLO +). +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ When the Primary Node panic,the Secondary Node qemu hang. +ï¼ï¼ï¼ ï¼ ï¼ hang at recvmsg in qio_channel_socket_readv. +ï¼ï¼ï¼ ï¼ ï¼ And I run { 'execute': 'nbd-server-stop' } and { "execute": +ï¼ï¼ï¼ ï¼ ï¼ "x-colo-lost-heartbeat" } in Secondary VM's +ï¼ï¼ï¼ ï¼ ï¼ monitor,the Secondary Node qemu still hang at recvmsg . +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ I found that the colo in qemu is not complete yet. +ï¼ï¼ï¼ ï¼ ï¼ Do the colo have any plan for development? +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ Yes, We are developing. You can see some of patch we pushing. +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ Has anyone ever run it successfully? Any help is appreciated! +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ In our internal version can run it successfully, +ï¼ï¼ï¼ ï¼ The failover detail you can ask Zhanghailiang for help. +ï¼ï¼ï¼ ï¼ Next time if you have some question about COLO, +ï¼ï¼ï¼ ï¼ please cc me and zhanghailiang address@hidden +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ Thanks +ï¼ï¼ï¼ ï¼ Zhang Chen +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ centos7.2+qemu2.7.50 +ï¼ï¼ï¼ ï¼ ï¼ (gdb) bt +ï¼ï¼ï¼ ï¼ ï¼ #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0 +ï¼ï¼ï¼ ï¼ ï¼ #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=ï¼optimized outï¼, +ï¼ï¼ï¼ ï¼ ï¼ iov=ï¼optimized outï¼, niov=ï¼optimized outï¼, fds=0x0, nfds=0x0, errp=0x0) +at +ï¼ï¼ï¼ ï¼ ï¼ io/channel-socket.c:497 +ï¼ï¼ï¼ ï¼ ï¼ #2 0x00007f3e03329472 in qio_channel_read (address@hidden, +ï¼ï¼ï¼ ï¼ ï¼ address@hidden "", address@hidden, +ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at io/channel.c:97 +ï¼ï¼ï¼ ï¼ ï¼ #3 0x00007f3e032750e0 in channel_get_buffer (opaque=ï¼optimized outï¼, +ï¼ï¼ï¼ ï¼ ï¼ buf=0x7f3e05910f38 "", pos=ï¼optimized outï¼, size=32768) at +ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file-channel.c:78 +ï¼ï¼ï¼ ï¼ ï¼ #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at +ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:257 +ï¼ï¼ï¼ ï¼ ï¼ #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden, +ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/qemu-file.c:510 +ï¼ï¼ï¼ ï¼ ï¼ #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at +ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:523 +ï¼ï¼ï¼ ï¼ ï¼ #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at +ï¼ï¼ï¼ ï¼ ï¼ migration/qemu-file.c:603 +ï¼ï¼ï¼ ï¼ ï¼ #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00, +ï¼ï¼ï¼ ï¼ ï¼ address@hidden) at migration/colo.c:215 +ï¼ï¼ï¼ ï¼ ï¼ #9 0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48, +ï¼ï¼ï¼ ï¼ ï¼ checkpoint_request=ï¼synthetic pointerï¼, f=ï¼optimized outï¼) at +ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:546 +ï¼ï¼ï¼ ï¼ ï¼ #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at +ï¼ï¼ï¼ ï¼ ï¼ migration/colo.c:649 +ï¼ï¼ï¼ ï¼ ï¼ #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0 +ï¼ï¼ï¼ ï¼ ï¼ #12 0x00007f3dfc9c03ed in clone 
() from /lib64/libc..so.6 +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ -- +ï¼ï¼ï¼ ï¼ ï¼ View this message in context: +http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html +ï¼ï¼ï¼ ï¼ ï¼ Sent from the Developer mailing list archive at Nabble.com. +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ -- +ï¼ï¼ï¼ ï¼ Thanks +ï¼ï¼ï¼ ï¼ Zhang Chen +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ +ï¼ï¼ +ï¼ -- +ï¼ Dr. David Alan Gilbert / address@hidden / Manchester, UK +ï¼ +ï¼ . +ï¼ + +Hi, + +On 2017/3/22 9:42, address@hidden wrote: +diff --git a/migration/socket.c b/migration/socket.c + + +index 13966f1..d65a0ea 100644 + + +--- a/migration/socket.c + + ++++ b/migration/socket.c + + +@@ -147,8 +147,9 @@ static gboolean socket_accept_incoming_migration(QIOChannel +*ioc, + + + } + + + + + + trace_migration_socket_incoming_accepted() + + + + + + qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") + + ++ qio_channel_set_feature(QIO_CHANNEL(sioc), QIO_CHANNEL_FEATURE_SHUTDOWN) + + + migration_channel_process_incoming(migrate_get_current(), + + + QIO_CHANNEL(sioc)) + + + object_unref(OBJECT(sioc)) + + + + +Is this patch ok? +Yes, i think this works, but a better way maybe to call +qio_channel_set_feature() +in qio_channel_socket_accept(), we didn't set the SHUTDOWN feature for the +socket accept fd, +Or fix it by this: + +diff --git a/io/channel-socket.c b/io/channel-socket.c +index f546c68..ce6894c 100644 +--- a/io/channel-socket.c ++++ b/io/channel-socket.c +@@ -330,9 +330,8 @@ qio_channel_socket_accept(QIOChannelSocket *ioc, + Error **errp) + { + QIOChannelSocket *cioc; +- +- cioc = QIO_CHANNEL_SOCKET(object_new(TYPE_QIO_CHANNEL_SOCKET)); +- cioc->fd = -1; ++ ++ cioc = qio_channel_socket_new(); + cioc->remoteAddrLen = sizeof(ioc->remoteAddr); + cioc->localAddrLen = sizeof(ioc->localAddr); + + +Thanks, +Hailiang +I have test it . The test could not hang any more. + + + + + + + + + + + + +åå§é®ä»¶ + + + +åä»¶äººï¼ address@hidden +æ¶ä»¶äººï¼ address@hidden address@hidden +æéäººï¼ address@hidden address@hidden address@hidden +æ¥ æ ï¼2017å¹´03æ22æ¥ 09:11 +主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: çå¤: Re: [BUG]COLO failover hang + + + + + +On 2017/3/21 19:56, Dr. David Alan Gilbert wrote: +ï¼ * Hailiang Zhang (address@hidden) wrote: +ï¼ï¼ Hi, +ï¼ï¼ +ï¼ï¼ Thanks for reporting this, and i confirmed it in my test, and it is a bug. +ï¼ï¼ +ï¼ï¼ Though we tried to call qemu_file_shutdown() to shutdown the related fd, in +ï¼ï¼ case COLO thread/incoming thread is stuck in read/write() while do failover, +ï¼ï¼ but it didn't take effect, because all the fd used by COLO (also migration) +ï¼ï¼ has been wrapped by qio channel, and it will not call the shutdown API if +ï¼ï¼ we didn't qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN). +ï¼ï¼ +ï¼ï¼ Cc: Dr. David Alan Gilbert address@hidden +ï¼ï¼ +ï¼ï¼ I doubted migration cancel has the same problem, it may be stuck in write() +ï¼ï¼ if we tried to cancel migration. +ï¼ï¼ +ï¼ï¼ void fd_start_outgoing_migration(MigrationState *s, const char *fdname, +Error **errp) +ï¼ï¼ { +ï¼ï¼ qio_channel_set_name(QIO_CHANNEL(ioc), "migration-fd-outgoing") +ï¼ï¼ migration_channel_connect(s, ioc, NULL) +ï¼ï¼ ... ... +ï¼ï¼ We didn't call qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN) above, +ï¼ï¼ and the +ï¼ï¼ migrate_fd_cancel() +ï¼ï¼ { +ï¼ï¼ ... ... 
+ï¼ï¼ if (s-ï¼state == MIGRATION_STATUS_CANCELLING && f) { +ï¼ï¼ qemu_file_shutdown(f) --ï¼ This will not take effect. No ? +ï¼ï¼ } +ï¼ï¼ } +ï¼ +ï¼ (cc'd in Daniel Berrange). +ï¼ I see that we call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN) +at the +ï¼ top of qio_channel_socket_new so I think that's safe isn't it? +ï¼ + +Hmm, you are right, this problem is only exist for the migration incoming fd, +thanks. + +ï¼ Dave +ï¼ +ï¼ï¼ Thanks, +ï¼ï¼ Hailiang +ï¼ï¼ +ï¼ï¼ On 2017/3/21 16:10, address@hidden wrote: +ï¼ï¼ï¼ Thank youã +ï¼ï¼ï¼ +ï¼ï¼ï¼ I have test areadyã +ï¼ï¼ï¼ +ï¼ï¼ï¼ When the Primary Node panic,the Secondary Node qemu hang at the same placeã +ï¼ï¼ï¼ +ï¼ï¼ï¼ Incorrding +http://wiki.qemu-project.org/Features/COLO +ï¼kill Primary Node +qemu will not produce the problem,but Primary Node panic canã +ï¼ï¼ï¼ +ï¼ï¼ï¼ I think due to the feature of channel does not support +QIO_CHANNEL_FEATURE_SHUTDOWN. +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ when failover,channel_shutdown could not shut down the channel. +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ so the colo_process_incoming_thread will hang at recvmsg. +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ I test a patch: +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ diff --git a/migration/socket.c b/migration/socket.c +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ index 13966f1..d65a0ea 100644 +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ --- a/migration/socket.c +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +++ b/migration/socket.c +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ @@ -147,8 +147,9 @@ static gboolean +socket_accept_incoming_migration(QIOChannel *ioc, +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ } +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ trace_migration_socket_incoming_accepted() +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ qio_channel_set_name(QIO_CHANNEL(sioc), "migration-socket-incoming") +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ + qio_channel_set_feature(QIO_CHANNEL(sioc), +QIO_CHANNEL_FEATURE_SHUTDOWN) +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ migration_channel_process_incoming(migrate_get_current(), +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ QIO_CHANNEL(sioc)) +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ object_unref(OBJECT(sioc)) +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ My test will not hang any more. +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ åå§é®ä»¶ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ åä»¶äººï¼ address@hidden +ï¼ï¼ï¼ æ¶ä»¶äººï¼ç广10165992 address@hidden +ï¼ï¼ï¼ æéäººï¼ address@hidden address@hidden +ï¼ï¼ï¼ æ¥ æ ï¼2017å¹´03æ21æ¥ 15:58 +ï¼ï¼ï¼ 主 é¢ ï¼Re: [Qemu-devel] çå¤: Re: [BUG]COLO failover hang +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ Hi,Wang. +ï¼ï¼ï¼ +ï¼ï¼ï¼ You can test this branch: +ï¼ï¼ï¼ +ï¼ï¼ï¼ +https://github.com/coloft/qemu/tree/colo-v5.1-developing-COLO-frame-v21-with-shared-disk +ï¼ï¼ï¼ +ï¼ï¼ï¼ and please follow wiki ensure your own configuration correctly. +ï¼ï¼ï¼ +ï¼ï¼ï¼ +http://wiki.qemu-project.org/Features/COLO +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ Thanks +ï¼ï¼ï¼ +ï¼ï¼ï¼ Zhang Chen +ï¼ï¼ï¼ +ï¼ï¼ï¼ +ï¼ï¼ï¼ On 03/21/2017 03:27 PM, address@hidden wrote: +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ hi. +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ I test the git qemu master have the same problem. 
+ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ (gdb) bt +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #0 qio_channel_socket_readv (ioc=0x7f65911b4e50, iov=0x7f64ef3fd880, +ï¼ï¼ï¼ ï¼ niov=1, fds=0x0, nfds=0x0, errp=0x0) at io/channel-socket.c:461 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #1 0x00007f658e4aa0c2 in qio_channel_read +ï¼ï¼ï¼ ï¼ (address@hidden, address@hidden "", +ï¼ï¼ï¼ ï¼ address@hidden, address@hidden) at io/channel.c:114 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #2 0x00007f658e3ea990 in channel_get_buffer (opaque=ï¼optimized outï¼, +ï¼ï¼ï¼ ï¼ buf=0x7f65907cb838 "", pos=ï¼optimized outï¼, size=32768) at +ï¼ï¼ï¼ ï¼ migration/qemu-file-channel.c:78 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #3 0x00007f658e3e97fc in qemu_fill_buffer (f=0x7f65907cb800) at +ï¼ï¼ï¼ ï¼ migration/qemu-file.c:295 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #4 0x00007f658e3ea2e1 in qemu_peek_byte (address@hidden, +ï¼ï¼ï¼ ï¼ address@hidden) at migration/qemu-file.c:555 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #5 0x00007f658e3ea34b in qemu_get_byte (address@hidden) at +ï¼ï¼ï¼ ï¼ migration/qemu-file.c:568 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #6 0x00007f658e3ea552 in qemu_get_be32 (address@hidden) at +ï¼ï¼ï¼ ï¼ migration/qemu-file.c:648 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #7 0x00007f658e3e66e5 in colo_receive_message (f=0x7f65907cb800, +ï¼ï¼ï¼ ï¼ address@hidden) at migration/colo.c:244 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #8 0x00007f658e3e681e in colo_receive_check_message (f=ï¼optimized +ï¼ï¼ï¼ ï¼ outï¼, address@hidden, +ï¼ï¼ï¼ ï¼ address@hidden) +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ at migration/colo.c:264 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #9 0x00007f658e3e740e in colo_process_incoming_thread +ï¼ï¼ï¼ ï¼ (opaque=0x7f658eb30360 ï¼mis_current.31286ï¼) at migration/colo.c:577 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #10 0x00007f658be09df3 in start_thread () from /lib64/libpthread.so.0 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #11 0x00007f65881983ed in clone () from /lib64/libc.so.6 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ $2 = 0x7f658ff7d5c0 "migration-socket-incoming" +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features Do not support QIO_CHANNEL_FEATURE_SHUTDOWN +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ $3 = 0 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ (gdb) bt +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #0 socket_accept_incoming_migration (ioc=0x7fdcceeafa90, +ï¼ï¼ï¼ ï¼ condition=G_IO_IN, opaque=0x7fdcceeafa90) at migration/socket.c:137 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #1 0x00007fdcc6966350 in g_main_dispatch (context=ï¼optimized outï¼) at +ï¼ï¼ï¼ ï¼ gmain.c:3054 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #2 g_main_context_dispatch (context=ï¼optimized outï¼, +ï¼ï¼ï¼ ï¼ address@hidden) at gmain.c:3630 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #3 0x00007fdccb8a6dcc in glib_pollfds_poll () at util/main-loop.c:213 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #4 os_host_main_loop_wait (timeout=ï¼optimized outï¼) at +ï¼ï¼ï¼ ï¼ util/main-loop.c:258 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #5 main_loop_wait (address@hidden) at +ï¼ï¼ï¼ ï¼ util/main-loop.c:506 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #6 0x00007fdccb526187 in main_loop () at vl.c:1898 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ #7 main (argc=ï¼optimized outï¼, argv=ï¼optimized outï¼, envp=ï¼optimized +ï¼ï¼ï¼ ï¼ outï¼) at vl.c:4709 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼features +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ $1 = 6 +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ (gdb) p ioc-ï¼name +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ $2 = 0x7fdcce1b1ab0 "migration-socket-listener" +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ May be socket_accept_incoming_migration should +ï¼ï¼ï¼ ï¼ call qio_channel_set_feature(ioc, QIO_CHANNEL_FEATURE_SHUTDOWN)?? +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ thank you. 
+> > > >
+> > > > ---- Original mail ----
+> > > >
+> > > > address@hidden
+> > > > address@hidden
+> > > > address@hidden@huawei.com)
+> > > > Date: 2017-03-16 14:46
+> > > > Subject: Re: [Qemu-devel] COLO failover hang
+> > > >
+> > > > On 03/15/2017 05:06 PM, wangguang wrote:
+> > > > > I am testing the QEMU COLO feature described here [QEMU Wiki](http://wiki.qemu-project.org/Features/COLO).
+> > > > >
+> > > > > When the Primary Node panics, the Secondary Node qemu hangs.
+> > > > > It hangs at recvmsg in qio_channel_socket_readv.
+> > > > > And I run { 'execute': 'nbd-server-stop' } and { "execute":
+> > > > > "x-colo-lost-heartbeat" } in the Secondary VM's
+> > > > > monitor, the Secondary Node qemu still hangs at recvmsg.
+> > > > >
+> > > > > I found that the COLO in qemu is not complete yet.
+> > > > > Does COLO have any plan for development?
+> > > >
+> > > > Yes, we are developing. You can see some of the patches we are pushing.
+> > > >
+> > > > > Has anyone ever run it successfully? Any help is appreciated!
+> > > >
+> > > > Our internal version can run it successfully.
+> > > > For the failover details you can ask Zhanghailiang for help.
+> > > > Next time if you have some question about COLO,
+> > > > please cc me and zhanghailiang address@hidden
+> > > >
+> > > > Thanks
+> > > > Zhang Chen
+> > > >
+> > > > >
+> > > > > centos7.2+qemu2.7.50
+> > > > > (gdb) bt
+> > > > > #0 0x00007f3e00cc86ad in recvmsg () from /lib64/libpthread.so.0
+> > > > > #1 0x00007f3e0332b738 in qio_channel_socket_readv (ioc=<optimized out>,
+> > > > >     iov=<optimized out>, niov=<optimized out>, fds=0x0, nfds=0x0, errp=0x0)
+> > > > >     at io/channel-socket.c:497
+> > > > > #2 0x00007f3e03329472 in qio_channel_read (address@hidden,
+> > > > >     address@hidden "", address@hidden,
+> > > > >     address@hidden) at io/channel.c:97
+> > > > > #3 0x00007f3e032750e0 in channel_get_buffer (opaque=<optimized out>,
+> > > > >     buf=0x7f3e05910f38 "", pos=<optimized out>, size=32768) at
+> > > > >     migration/qemu-file-channel.c:78
+> > > > > #4 0x00007f3e0327412c in qemu_fill_buffer (f=0x7f3e05910f00) at
+> > > > >     migration/qemu-file.c:257
+> > > > > #5 0x00007f3e03274a41 in qemu_peek_byte (address@hidden,
+> > > > >     address@hidden) at migration/qemu-file.c:510
+> > > > > #6 0x00007f3e03274aab in qemu_get_byte (address@hidden) at
+> > > > >     migration/qemu-file.c:523
+> > > > > #7 0x00007f3e03274cb2 in qemu_get_be32 (address@hidden) at
+> > > > >     migration/qemu-file.c:603
+> > > > > #8 0x00007f3e03271735 in colo_receive_message (f=0x7f3e05910f00,
+> > > > >     address@hidden) at migration/colo.c:215
+> > > > > #9 0x00007f3e0327250d in colo_wait_handle_message (errp=0x7f3d62bfaa48,
+> > > > >     checkpoint_request=<synthetic pointer>, f=<optimized out>) at
+> > > > >     migration/colo.c:546
+> > > > > #10 colo_process_incoming_thread (opaque=0x7f3e067245e0) at
+> > > > >     migration/colo.c:649
+> > > > > #11 0x00007f3e00cc1df3 in start_thread () from /lib64/libpthread.so.0
+> > > > > #12 0x00007f3dfc9c03ed in clone () from /lib64/libc.so.6
+> > > > >
+> > > > > --
+> > > > > View this message in context: http://qemu.11.n7.nabble.com/COLO-failover-hang-tp473250.html
+> > > > > Sent from
the Developer mailing list archive at Nabble.com. +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ -- +ï¼ï¼ï¼ ï¼ Thanks +ï¼ï¼ï¼ ï¼ Zhang Chen +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ ï¼ +ï¼ï¼ï¼ +ï¼ï¼ +ï¼ -- +ï¼ Dr. David Alan Gilbert / address@hidden / Manchester, UK +ï¼ +ï¼ . +ï¼ + diff --git a/classification_output/01/other/6257722 b/classification_output/01/other/6257722 new file mode 100644 index 000000000..afdfbfd3e --- /dev/null +++ b/classification_output/01/other/6257722 @@ -0,0 +1,716 @@ +other: 0.714 +semantic: 0.671 +instruction: 0.641 +mistranslation: 0.535 + +[Qemu-devel] [BUG] VM abort after migration + +Hi guys, + +We found a qemu core in our testing environment, the assertion +'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and +the bus->irq_count[i] is '-1'. + +Through analysis, it was happened after VM migration and we think +it was caused by the following sequence: + +*Migration Source* +1. save bus pci.0 state, including irq_count[x] ( =0 , old ) +2. save E1000: + e1000_pre_save + e1000_mit_timer + set_interrupt_cause + pci_set_irq --> update pci_dev->irq_state to 1 and + update bus->irq_count[x] to 1 ( new ) + the irq_state sent to dest. + +*Migration Dest* +1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1. +2. If the e1000 need change irqline , it would call to pci_irq_handler(), + the irq_state maybe change to 0 and bus->irq_count[x] will become + -1 in this situation. +3. do VM reboot then the assertion will be triggered. + +We also found some guys faced the similar problem: +[1] +https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html +[2] +https://bugs.launchpad.net/qemu/+bug/1702621 +Is there some patches to fix this problem ? +Can we save pcibus state after all the pci devs are saved ? + +Thanks, +Longpeng(Mike) + +* longpeng (address@hidden) wrote: +> +Hi guys, +> +> +We found a qemu core in our testing environment, the assertion +> +'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and +> +the bus->irq_count[i] is '-1'. +> +> +Through analysis, it was happened after VM migration and we think +> +it was caused by the following sequence: +> +> +*Migration Source* +> +1. save bus pci.0 state, including irq_count[x] ( =0 , old ) +> +2. save E1000: +> +e1000_pre_save +> +e1000_mit_timer +> +set_interrupt_cause +> +pci_set_irq --> update pci_dev->irq_state to 1 and +> +update bus->irq_count[x] to 1 ( new ) +> +the irq_state sent to dest. +> +> +*Migration Dest* +> +1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1. +> +2. If the e1000 need change irqline , it would call to pci_irq_handler(), +> +the irq_state maybe change to 0 and bus->irq_count[x] will become +> +-1 in this situation. +> +3. do VM reboot then the assertion will be triggered. +> +> +We also found some guys faced the similar problem: +> +[1] +https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html +> +[2] +https://bugs.launchpad.net/qemu/+bug/1702621 +> +> +Is there some patches to fix this problem ? +I don't remember any. + +> +Can we save pcibus state after all the pci devs are saved ? +Does this problem only happen with e1000? I think so. +If it's only e1000 I think we should fix it - I think once the VM is +stopped for doing the device migration it shouldn't be raising +interrupts. + +Dave + +> +Thanks, +> +Longpeng(Mike) +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +On 2019/7/8 ä¸å5:47, Dr. 
David Alan Gilbert wrote: +* longpeng (address@hidden) wrote: +Hi guys, + +We found a qemu core in our testing environment, the assertion +'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and +the bus->irq_count[i] is '-1'. + +Through analysis, it was happened after VM migration and we think +it was caused by the following sequence: + +*Migration Source* +1. save bus pci.0 state, including irq_count[x] ( =0 , old ) +2. save E1000: + e1000_pre_save + e1000_mit_timer + set_interrupt_cause + pci_set_irq --> update pci_dev->irq_state to 1 and + update bus->irq_count[x] to 1 ( new ) + the irq_state sent to dest. + +*Migration Dest* +1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1. +2. If the e1000 need change irqline , it would call to pci_irq_handler(), + the irq_state maybe change to 0 and bus->irq_count[x] will become + -1 in this situation. +3. do VM reboot then the assertion will be triggered. + +We also found some guys faced the similar problem: +[1] +https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html +[2] +https://bugs.launchpad.net/qemu/+bug/1702621 +Is there some patches to fix this problem ? +I don't remember any. +Can we save pcibus state after all the pci devs are saved ? +Does this problem only happen with e1000? I think so. +If it's only e1000 I think we should fix it - I think once the VM is +stopped for doing the device migration it shouldn't be raising +interrupts. +I wonder maybe we can simply fix this by no setting ICS on pre_save() +but scheduling mit timer unconditionally in post_load(). +Thanks +Dave +Thanks, +Longpeng(Mike) +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +å¨ 2019/7/10 11:25, Jason Wang åé: +> +> +On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote: +> +> * longpeng (address@hidden) wrote: +> +>> Hi guys, +> +>> +> +>> We found a qemu core in our testing environment, the assertion +> +>> 'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and +> +>> the bus->irq_count[i] is '-1'. +> +>> +> +>> Through analysis, it was happened after VM migration and we think +> +>> it was caused by the following sequence: +> +>> +> +>> *Migration Source* +> +>> 1. save bus pci.0 state, including irq_count[x] ( =0 , old ) +> +>> 2. save E1000: +> +>>    e1000_pre_save +> +>>     e1000_mit_timer +> +>>      set_interrupt_cause +> +>>       pci_set_irq --> update pci_dev->irq_state to 1 and +> +>>                   update bus->irq_count[x] to 1 ( new ) +> +>>     the irq_state sent to dest. +> +>> +> +>> *Migration Dest* +> +>> 1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1. +> +>> 2. If the e1000 need change irqline , it would call to pci_irq_handler(), +> +>>   the irq_state maybe change to 0 and bus->irq_count[x] will become +> +>>   -1 in this situation. +> +>> 3. do VM reboot then the assertion will be triggered. +> +>> +> +>> We also found some guys faced the similar problem: +> +>> [1] +https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html +> +>> [2] +https://bugs.launchpad.net/qemu/+bug/1702621 +> +>> +> +>> Is there some patches to fix this problem ? +> +> I don't remember any. +> +> +> +>> Can we save pcibus state after all the pci devs are saved ? +> +> Does this problem only happen with e1000? I think so. +> +> If it's only e1000 I think we should fix it - I think once the VM is +> +> stopped for doing the device migration it shouldn't be raising +> +> interrupts. 
+> +> +> +I wonder maybe we can simply fix this by no setting ICS on pre_save() but +> +scheduling mit timer unconditionally in post_load(). +> +I also think this is a bug of e1000 because we find more cores with the same +frame thease days. + +I'm not familiar with e1000 so hope someone could fix it, thanks. :) + +> +Thanks +> +> +> +> +> +> Dave +> +> +> +>> Thanks, +> +>> Longpeng(Mike) +> +> -- +> +> Dr. David Alan Gilbert / address@hidden / Manchester, UK +> +> +. +> +-- +Regards, +Longpeng(Mike) + +On 2019/7/10 ä¸å11:36, Longpeng (Mike) wrote: +å¨ 2019/7/10 11:25, Jason Wang åé: +On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote: +* longpeng (address@hidden) wrote: +Hi guys, + +We found a qemu core in our testing environment, the assertion +'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and +the bus->irq_count[i] is '-1'. + +Through analysis, it was happened after VM migration and we think +it was caused by the following sequence: + +*Migration Source* +1. save bus pci.0 state, including irq_count[x] ( =0 , old ) +2. save E1000: +    e1000_pre_save +     e1000_mit_timer +      set_interrupt_cause +       pci_set_irq --> update pci_dev->irq_state to 1 and +                   update bus->irq_count[x] to 1 ( new ) +     the irq_state sent to dest. + +*Migration Dest* +1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1. +2. If the e1000 need change irqline , it would call to pci_irq_handler(), +   the irq_state maybe change to 0 and bus->irq_count[x] will become +   -1 in this situation. +3. do VM reboot then the assertion will be triggered. + +We also found some guys faced the similar problem: +[1] +https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html +[2] +https://bugs.launchpad.net/qemu/+bug/1702621 +Is there some patches to fix this problem ? +I don't remember any. +Can we save pcibus state after all the pci devs are saved ? +Does this problem only happen with e1000? I think so. +If it's only e1000 I think we should fix it - I think once the VM is +stopped for doing the device migration it shouldn't be raising +interrupts. +I wonder maybe we can simply fix this by no setting ICS on pre_save() but +scheduling mit timer unconditionally in post_load(). +I also think this is a bug of e1000 because we find more cores with the same +frame thease days. + +I'm not familiar with e1000 so hope someone could fix it, thanks. :) +Draft a path in attachment, please test. + +Thanks +Thanks +Dave +Thanks, +Longpeng(Mike) +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK +. +0001-e1000-don-t-raise-interrupt-in-pre_save.patch +Description: +Text Data + +å¨ 2019/7/10 11:57, Jason Wang åé: +> +> +On 2019/7/10 ä¸å11:36, Longpeng (Mike) wrote: +> +> å¨ 2019/7/10 11:25, Jason Wang åé: +> +>> On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote: +> +>>> * longpeng (address@hidden) wrote: +> +>>>> Hi guys, +> +>>>> +> +>>>> We found a qemu core in our testing environment, the assertion +> +>>>> 'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and +> +>>>> the bus->irq_count[i] is '-1'. +> +>>>> +> +>>>> Through analysis, it was happened after VM migration and we think +> +>>>> it was caused by the following sequence: +> +>>>> +> +>>>> *Migration Source* +> +>>>> 1. save bus pci.0 state, including irq_count[x] ( =0 , old ) +> +>>>> 2. 
save E1000: +> +>>>>     e1000_pre_save +> +>>>>      e1000_mit_timer +> +>>>>       set_interrupt_cause +> +>>>>        pci_set_irq --> update pci_dev->irq_state to 1 and +> +>>>>                    update bus->irq_count[x] to 1 ( new ) +> +>>>>      the irq_state sent to dest. +> +>>>> +> +>>>> *Migration Dest* +> +>>>> 1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is +> +>>>> 1. +> +>>>> 2. If the e1000 need change irqline , it would call to pci_irq_handler(), +> +>>>>    the irq_state maybe change to 0 and bus->irq_count[x] will become +> +>>>>    -1 in this situation. +> +>>>> 3. do VM reboot then the assertion will be triggered. +> +>>>> +> +>>>> We also found some guys faced the similar problem: +> +>>>> [1] +https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html +> +>>>> [2] +https://bugs.launchpad.net/qemu/+bug/1702621 +> +>>>> +> +>>>> Is there some patches to fix this problem ? +> +>>> I don't remember any. +> +>>> +> +>>>> Can we save pcibus state after all the pci devs are saved ? +> +>>> Does this problem only happen with e1000? I think so. +> +>>> If it's only e1000 I think we should fix it - I think once the VM is +> +>>> stopped for doing the device migration it shouldn't be raising +> +>>> interrupts. +> +>> +> +>> I wonder maybe we can simply fix this by no setting ICS on pre_save() but +> +>> scheduling mit timer unconditionally in post_load(). +> +>> +> +> I also think this is a bug of e1000 because we find more cores with the same +> +> frame thease days. +> +> +> +> I'm not familiar with e1000 so hope someone could fix it, thanks. :) +> +> +> +> +Draft a path in attachment, please test. +> +Thanks. We'll test it for a few weeks and then give you the feedback. :) + +> +Thanks +> +> +> +>> Thanks +> +>> +> +>> +> +>>> Dave +> +>>> +> +>>>> Thanks, +> +>>>> Longpeng(Mike) +> +>>> -- +> +>>> Dr. David Alan Gilbert / address@hidden / Manchester, UK +> +>> . +> +>> +-- +Regards, +Longpeng(Mike) + +å¨ 2019/7/10 11:57, Jason Wang åé: +> +> +On 2019/7/10 ä¸å11:36, Longpeng (Mike) wrote: +> +> å¨ 2019/7/10 11:25, Jason Wang åé: +> +>> On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote: +> +>>> * longpeng (address@hidden) wrote: +> +>>>> Hi guys, +> +>>>> +> +>>>> We found a qemu core in our testing environment, the assertion +> +>>>> 'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and +> +>>>> the bus->irq_count[i] is '-1'. +> +>>>> +> +>>>> Through analysis, it was happened after VM migration and we think +> +>>>> it was caused by the following sequence: +> +>>>> +> +>>>> *Migration Source* +> +>>>> 1. save bus pci.0 state, including irq_count[x] ( =0 , old ) +> +>>>> 2. save E1000: +> +>>>>     e1000_pre_save +> +>>>>      e1000_mit_timer +> +>>>>       set_interrupt_cause +> +>>>>        pci_set_irq --> update pci_dev->irq_state to 1 and +> +>>>>                    update bus->irq_count[x] to 1 ( new ) +> +>>>>      the irq_state sent to dest. +> +>>>> +> +>>>> *Migration Dest* +> +>>>> 1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is +> +>>>> 1. +> +>>>> 2. If the e1000 need change irqline , it would call to pci_irq_handler(), +> +>>>>    the irq_state maybe change to 0 and bus->irq_count[x] will become +> +>>>>    -1 in this situation. +> +>>>> 3. do VM reboot then the assertion will be triggered. 
+> +>>>> +> +>>>> We also found some guys faced the similar problem: +> +>>>> [1] +https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html +> +>>>> [2] +https://bugs.launchpad.net/qemu/+bug/1702621 +> +>>>> +> +>>>> Is there some patches to fix this problem ? +> +>>> I don't remember any. +> +>>> +> +>>>> Can we save pcibus state after all the pci devs are saved ? +> +>>> Does this problem only happen with e1000? I think so. +> +>>> If it's only e1000 I think we should fix it - I think once the VM is +> +>>> stopped for doing the device migration it shouldn't be raising +> +>>> interrupts. +> +>> +> +>> I wonder maybe we can simply fix this by no setting ICS on pre_save() but +> +>> scheduling mit timer unconditionally in post_load(). +> +>> +> +> I also think this is a bug of e1000 because we find more cores with the same +> +> frame thease days. +> +> +> +> I'm not familiar with e1000 so hope someone could fix it, thanks. :) +> +> +> +> +Draft a path in attachment, please test. +> +Hi Jason, + +We've tested the patch for about two weeks, everything went well, thanks! + +Feel free to add my: +Reported-and-tested-by: Longpeng <address@hidden> + +> +Thanks +> +> +> +>> Thanks +> +>> +> +>> +> +>>> Dave +> +>>> +> +>>>> Thanks, +> +>>>> Longpeng(Mike) +> +>>> -- +> +>>> Dr. David Alan Gilbert / address@hidden / Manchester, UK +> +>> . +> +>> +-- +Regards, +Longpeng(Mike) + +On 2019/7/27 ä¸å2:10, Longpeng (Mike) wrote: +å¨ 2019/7/10 11:57, Jason Wang åé: +On 2019/7/10 ä¸å11:36, Longpeng (Mike) wrote: +å¨ 2019/7/10 11:25, Jason Wang åé: +On 2019/7/8 ä¸å5:47, Dr. David Alan Gilbert wrote: +* longpeng (address@hidden) wrote: +Hi guys, + +We found a qemu core in our testing environment, the assertion +'assert(bus->irq_count[i] == 0)' in pcibus_reset() was triggered and +the bus->irq_count[i] is '-1'. + +Through analysis, it was happened after VM migration and we think +it was caused by the following sequence: + +*Migration Source* +1. save bus pci.0 state, including irq_count[x] ( =0 , old ) +2. save E1000: +     e1000_pre_save +      e1000_mit_timer +       set_interrupt_cause +        pci_set_irq --> update pci_dev->irq_state to 1 and +                    update bus->irq_count[x] to 1 ( new ) +      the irq_state sent to dest. + +*Migration Dest* +1. Receive the irq_count[x] of pci.0 is 0 , but the irq_state of e1000 is 1. +2. If the e1000 need change irqline , it would call to pci_irq_handler(), +    the irq_state maybe change to 0 and bus->irq_count[x] will become +    -1 in this situation. +3. do VM reboot then the assertion will be triggered. + +We also found some guys faced the similar problem: +[1] +https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg02525.html +[2] +https://bugs.launchpad.net/qemu/+bug/1702621 +Is there some patches to fix this problem ? +I don't remember any. +Can we save pcibus state after all the pci devs are saved ? +Does this problem only happen with e1000? I think so. +If it's only e1000 I think we should fix it - I think once the VM is +stopped for doing the device migration it shouldn't be raising +interrupts. +I wonder maybe we can simply fix this by no setting ICS on pre_save() but +scheduling mit timer unconditionally in post_load(). +I also think this is a bug of e1000 because we find more cores with the same +frame thease days. + +I'm not familiar with e1000 so hope someone could fix it, thanks. :) +Draft a path in attachment, please test. +Hi Jason, + +We've tested the patch for about two weeks, everything went well, thanks! 
+ +Feel free to add my: +Reported-and-tested-by: Longpeng <address@hidden> +Applied. + +Thanks +Thanks +Thanks +Dave +Thanks, +Longpeng(Mike) +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK +. + diff --git a/classification_output/01/other/6355518 b/classification_output/01/other/6355518 new file mode 100644 index 000000000..650c10083 --- /dev/null +++ b/classification_output/01/other/6355518 @@ -0,0 +1,530 @@ +other: 0.953 +instruction: 0.951 +semantic: 0.939 +mistranslation: 0.888 + +[Qemu-devel] [BUG] gcov support appears to be broken + +Hello, according to out docs, here is the procedure that should produce +coverage report for execution of the complete "make check": + +#./configure --enable-gcov +#make +#make check +#make coverage-report + +It seems that first three commands execute as expected. (For example, there are +plenty of files generated by "make check" that would've not been generated if +"enable-gcov" hadn't been chosen.) However, the last command complains about +some missing files related to FP support. If those files are added (for +example, artificially, using "touch <missing-file"), that it starts complaining +about missing some decodetree-generated files. Other kinds of files are +involved too. + +It would be nice to have coverage support working. Please somebody take a look, +or explain if I make a mistake or misunderstood our gcov support. + +Yours, +Aleksandar + +On Mon, 5 Aug 2019 at 11:39, Aleksandar Markovic <address@hidden> wrote: +> +> +Hello, according to out docs, here is the procedure that should produce +> +coverage report for execution of the complete "make check": +> +> +#./configure --enable-gcov +> +#make +> +#make check +> +#make coverage-report +> +> +It seems that first three commands execute as expected. (For example, there +> +are plenty of files generated by "make check" that would've not been +> +generated if "enable-gcov" hadn't been chosen.) However, the last command +> +complains about some missing files related to FP support. If those files are +> +added (for example, artificially, using "touch <missing-file"), that it +> +starts complaining about missing some decodetree-generated files. Other kinds +> +of files are involved too. +> +> +It would be nice to have coverage support working. Please somebody take a +> +look, or explain if I make a mistake or misunderstood our gcov support. +Cc'ing Alex who's probably the closest we have to a gcov expert. + +(make/make check of a --enable-gcov build is in the set of things our +Travis CI setup runs, so we do defend that part against regressions.) + +thanks +-- PMM + +Peter Maydell <address@hidden> writes: + +> +On Mon, 5 Aug 2019 at 11:39, Aleksandar Markovic <address@hidden> wrote: +> +> +> +> Hello, according to out docs, here is the procedure that should produce +> +> coverage report for execution of the complete "make check": +> +> +> +> #./configure --enable-gcov +> +> #make +> +> #make check +> +> #make coverage-report +> +> +> +> It seems that first three commands execute as expected. (For example, +> +> there are plenty of files generated by "make check" that would've not +> +> been generated if "enable-gcov" hadn't been chosen.) However, the +> +> last command complains about some missing files related to FP +> +> support. If those files are added (for example, artificially, using +> +> "touch <missing-file"), that it starts complaining about missing some +> +> decodetree-generated files. Other kinds of files are involved too. 
+The gcov tool is fairly noisy about missing files but that just +indicates the tests haven't exercised those code paths. "make check" +especially doesn't touch much of the TCG code and a chunk of floating +point. + +> +> +> +> It would be nice to have coverage support working. Please somebody +> +> take a look, or explain if I make a mistake or misunderstood our gcov +> +> support. +So your failure mode is no report is generated at all? It's working for +me here. + +> +> +Cc'ing Alex who's probably the closest we have to a gcov expert. +> +> +(make/make check of a --enable-gcov build is in the set of things our +> +Travis CI setup runs, so we do defend that part against regressions.) +We defend the build but I have just checked and it seems our +check_coverage script is currently failing: +https://travis-ci.org/stsquad/qemu/jobs/567809808#L10328 +But as it's an after_success script it doesn't fail the build. + +> +> +thanks +> +-- PMM +-- +Alex Bennée + +> +> #./configure --enable-gcov +> +> #make +> +> #make check +> +> #make coverage-report +> +> +> +> It seems that first three commands execute as expected. (For example, +> +> there are plenty of files generated by "make check" that would've not +> +> been generated if "enable-gcov" hadn't been chosen.) However, the +> +> last command complains about some missing files related to FP +> +So your failure mode is no report is generated at all? It's working for +> +me here. +Alex, no report is generated for my test setups - in fact, "make +coverage-report" even says that it explicitly deletes what appears to be the +main coverage report html file). + +This is the terminal output of an unsuccessful executions of "make +coverage-report" for recent ToT: + +~/Build/qemu-TOT-TEST$ make coverage-report +make[1]: Entering directory '/home/user/Build/qemu-TOT-TEST/slirp' +make[1]: Nothing to be done for 'all'. 
+make[1]: Leaving directory '/home/user/Build/qemu-TOT-TEST/slirp' + CHK version_gen.h + GEN coverage-report.html +Traceback (most recent call last): + File "/usr/bin/gcovr", line 1970, in <module> + print_html_report(covdata, options.html_details) + File "/usr/bin/gcovr", line 1473, in print_html_report + INPUT = open(data['FILENAME'], 'r') +IOError: [Errno 2] No such file or directory: 'wrap.inc.c' +Makefile:1048: recipe for target +'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' failed +make: *** +[/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html] Error 1 +make: *** Deleting file +'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' + +This instance is executed in QEMU 3.0 source tree: (so, it looks the problem +existed for quite some time) + +~/Build/qemu-3.0$ make coverage-report + CHK version_gen.h + GEN coverage-report.html +Traceback (most recent call last): + File "/usr/bin/gcovr", line 1970, in <module> + print_html_report(covdata, options.html_details) + File "/usr/bin/gcovr", line 1473, in print_html_report + INPUT = open(data['FILENAME'], 'r') +IOError: [Errno 2] No such file or directory: +'/home/user/Build/qemu-3.0/target/openrisc/decode.inc.c' +Makefile:992: recipe for target +'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' failed +make: *** [/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html] +Error 1 +make: *** Deleting file +'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' + +Fond regards, +Aleksandar + + +> +Alex Bennée + +> +> #./configure --enable-gcov +> +> #make +> +> #make check +> +> #make coverage-report +> +> +> +> It seems that first three commands execute as expected. (For example, +> +> there are plenty of files generated by "make check" that would've not +> +> been generated if "enable-gcov" hadn't been chosen.) However, the +> +> last command complains about some missing files related to FP +> +So your failure mode is no report is generated at all? It's working for +> +me here. +Another piece of info: + +~/Build/qemu-TOT-TEST$ gcov --version +gcov (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010 +Copyright (C) 2015 Free Software Foundation, Inc. +This is free software; see the source for copying conditions. +There is NO warranty; not even for MERCHANTABILITY or +FITNESS FOR A PARTICULAR PURPOSE. + +:~/Build/qemu-TOT-TEST$ gcc --version +gcc (Ubuntu 7.2.0-1ubuntu1~16.04) 7.2.0 +Copyright (C) 2017 Free Software Foundation, Inc. +This is free software; see the source for copying conditions. There is NO +warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + + + + +Alex, no report is generated for my test setups - in fact, "make +coverage-report" even says that it explicitly deletes what appears to be the +main coverage report html file). + +This is the terminal output of an unsuccessful executions of "make +coverage-report" for recent ToT: + +~/Build/qemu-TOT-TEST$ make coverage-report +make[1]: Entering directory '/home/user/Build/qemu-TOT-TEST/slirp' +make[1]: Nothing to be done for 'all'. 
+make[1]: Leaving directory '/home/user/Build/qemu-TOT-TEST/slirp' + CHK version_gen.h + GEN coverage-report.html +Traceback (most recent call last): + File "/usr/bin/gcovr", line 1970, in <module> + print_html_report(covdata, options.html_details) + File "/usr/bin/gcovr", line 1473, in print_html_report + INPUT = open(data['FILENAME'], 'r') +IOError: [Errno 2] No such file or directory: 'wrap.inc.c' +Makefile:1048: recipe for target +'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' failed +make: *** +[/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html] Error 1 +make: *** Deleting file +'/home/user/Build/qemu-TOT-TEST/reports/coverage/coverage-report.html' + +This instance is executed in QEMU 3.0 source tree: (so, it looks the problem +existed for quite some time) + +~/Build/qemu-3.0$ make coverage-report + CHK version_gen.h + GEN coverage-report.html +Traceback (most recent call last): + File "/usr/bin/gcovr", line 1970, in <module> + print_html_report(covdata, options.html_details) + File "/usr/bin/gcovr", line 1473, in print_html_report + INPUT = open(data['FILENAME'], 'r') +IOError: [Errno 2] No such file or directory: +'/home/user/Build/qemu-3.0/target/openrisc/decode.inc.c' +Makefile:992: recipe for target +'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' failed +make: *** [/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html] +Error 1 +make: *** Deleting file +'/home/user/Build/qemu-3.0/reports/coverage/coverage-report.html' + +Fond regards, +Aleksandar + + +> +Alex Bennée + +> +> #./configure --enable-gcov +> +> #make +> +> #make check +> +> #make coverage-report +> +> +> +> It seems that first three commands execute as expected. (For example, +> +> there are plenty of files generated by "make check" that would've not +> +> been generated if "enable-gcov" hadn't been chosen.) However, the +> +> last command complains about some missing files related to FP +> +So your failure mode is no report is generated at all? It's working for +> +me here. +Alex, here is the thing: + +Seeing that my gcovr is relatively old (2014) 3.2 version, I upgraded it from +git repo to the most recent 4.1 (actually, to a dev version, from the very tip +of the tree), and "make coverage-report" started generating coverage reports. +It did emit some error messages (totally different than previous), but still it +did not stop like it used to do with gcovr 3.2. + +Perhaps you would want to add some gcov/gcovr minimal version info in our docs. +(or at least a statement "this was tested with such and such gcc, gcov and +gcovr", etc.?) + +Coverage report looked fine at first glance, but it a kind of disappointed me +when I digged deeper into its content - for example, it shows very low coverage +for our FP code (softfloat), while, in fact, we know that "make check" contains +detailed tests on FP functionalities. But this is most likely a separate +problem of a very different nature, perhaps the issue of separate git repo for +FP tests (testfloat) that our FP tests use as a mid-layer. + +I'll try how everything works with my test examples, and will let you know. + +Your help is greatly appreciated, +Aleksandar + +Fond regards, +Aleksandar + + +> +Alex Bennée + +Aleksandar Markovic <address@hidden> writes: + +> +>> #./configure --enable-gcov +> +>> #make +> +>> #make check +> +>> #make coverage-report +> +>> +> +>> It seems that first three commands execute as expected. 
(For example, +> +>> there are plenty of files generated by "make check" that would've not +> +>> been generated if "enable-gcov" hadn't been chosen.) However, the +> +>> last command complains about some missing files related to FP +> +> +> So your failure mode is no report is generated at all? It's working for +> +> me here. +> +> +Alex, here is the thing: +> +> +Seeing that my gcovr is relatively old (2014) 3.2 version, I upgraded it from +> +git repo to the most recent 4.1 (actually, to a dev version, from the very +> +tip of the tree), and "make coverage-report" started generating coverage +> +reports. It did emit some error messages (totally different than previous), +> +but still it did not stop like it used to do with gcovr 3.2. +> +> +Perhaps you would want to add some gcov/gcovr minimal version info in our +> +docs. (or at least a statement "this was tested with such and such gcc, gcov +> +and gcovr", etc.?) +> +> +Coverage report looked fine at first glance, but it a kind of +> +disappointed me when I digged deeper into its content - for example, +> +it shows very low coverage for our FP code (softfloat), while, in +> +fact, we know that "make check" contains detailed tests on FP +> +functionalities. But this is most likely a separate problem of a very +> +different nature, perhaps the issue of separate git repo for FP tests +> +(testfloat) that our FP tests use as a mid-layer. +I get: + +68.6 % 2593 / 3782 62.2 % 1690 / 2718 + +Which is not bad considering we don't exercise the 80 and 128 bit +softfloat code at all (which is not shared by the re-factored 16/32/64 +bit code). + +> +> +I'll try how everything works with my test examples, and will let you know. +> +> +Your help is greatly appreciated, +> +Aleksandar +> +> +Fond regards, +> +Aleksandar +> +> +> +> Alex Bennée +-- +Alex Bennée + +> +> it shows very low coverage for our FP code (softfloat), while, in +> +> fact, we know that "make check" contains detailed tests on FP +> +> functionalities. But this is most likely a separate problem of a very +> +> different nature, perhaps the issue of separate git repo for FP tests +> +> (testfloat) that our FP tests use as a mid-layer. +> +> +I get: +> +> +68.6 % 2593 / 3782 62.2 % 1690 / 2718 +> +I would expect that kind of result too. + +However, I get: + +File: fpu/softfloat.c Lines: 8 3334 0.2 % +Date: 2019-08-05 19:56:58 Branches: 3 2376 0.1 % + +:( + +OK, I'll try to figure that out, and most likely I could live with it if it is +an isolated problem. + +Thank you for your assistance in this matter, +Aleksandar + +> +Which is not bad considering we don't exercise the 80 and 128 bit +> +softfloat code at all (which is not shared by the re-factored 16/32/64 +> +bit code). +> +> +Alex Bennée + +> +> it shows very low coverage for our FP code (softfloat), while, in +> +> fact, we know that "make check" contains detailed tests on FP +> +> functionalities. But this is most likely a separate problem of a very +> +> different nature, perhaps the issue of separate git repo for FP tests +> +> (testfloat) that our FP tests use as a mid-layer. +> +> +I get: +> +> +68.6 % 2593 / 3782 62.2 % 1690 / 2718 +> +This problem is solved too. (and it is my fault) + +I worked with multiple versions of QEMU, and my previous low-coverage results +were for QEMU 3.0, and for that version the directory tests/fp did not even +exist. :D (<blush>) + +For QEMU ToT, I get now: + +fpu/softfloat.c + 68.8 % 2592 / 3770 62.3 % 1693 / 2718 + +which is identical for all intents and purposes to your result. 
+ +Yours cordially, +Aleksandar + diff --git a/classification_output/01/other/6416205 b/classification_output/01/other/6416205 new file mode 100644 index 000000000..c32e40433 --- /dev/null +++ b/classification_output/01/other/6416205 @@ -0,0 +1,61 @@ +other: 0.776 +instruction: 0.713 +mistranslation: 0.699 +semantic: 0.662 + +[BUG] scsi: vmw_pvscsi: Boot hangs during scsi under qemu, post commit e662502b3a78 + +Hi, + +Commit e662502b3a78 ("scsi: vmw_pvscsi: Set correct residual data length"), +and its backports to stable trees, makes kernel hang during boot, when +ran as a VM under qemu with following parameters: + + -drive file=$DISKFILE,if=none,id=sda + -device pvscsi + -device scsi-hd,bus=scsi.0,drive=sda + +Diving deeper, commit e662502b3a78 + + @@ -585,7 +585,13 @@ static void pvscsi_complete_request(struct +pvscsi_adapter *adapter, + case BTSTAT_SUCCESS: + + /* + + * Commands like INQUIRY may transfer less data than + + * requested by the initiator via bufflen. Set residual + + * count to make upper layer aware of the actual amount + + * of data returned. + + */ + + scsi_set_resid(cmd, scsi_bufflen(cmd) - e->dataLen); + +assumes 'e->dataLen' is properly armed with actual num of bytes +transferred; alas qemu's hw/scsi/vmw_pvscsi.c never arms the 'dataLen' +field of the completion descriptor (kept zero). + +As a result, the residual count is set as the *entire* 'scsi_bufflen' of a +good transfer, which makes upper scsi layers repeatedly ignore this +valid transfer. + +Not properly arming 'dataLen' seems as an oversight in qemu, which needs +to be fixed. + +However, since kernels with commit e662502b3a78 (and backports) now fail +to boot under qemu's "-device pvscsi", a suggested workaround is to set +the residual count *only* if 'e->dataLen' is armed, e.g: + + @@ -588,7 +588,8 @@ static void pvscsi_complete_request(struct pvscsi_adapter +*adapter, + * count to make upper layer aware of the actual +amount + * of data returned. + */ + - scsi_set_resid(cmd, scsi_bufflen(cmd) - e->dataLen); + + if (e->dataLen) + + scsi_set_resid(cmd, scsi_bufflen(cmd) - +e->dataLen); + +in order to make kernels boot on old qemu binaries. + +Best, +Shmulik + diff --git a/classification_output/01/other/6531392 b/classification_output/01/other/6531392 new file mode 100644 index 000000000..73752120b --- /dev/null +++ b/classification_output/01/other/6531392 @@ -0,0 +1,293 @@ +other: 0.734 +mistranslation: 0.649 +instruction: 0.581 +semantic: 0.577 + +[RESEND][BUG FIX HELP] QEMU main thread endlessly hangs in __ppoll() + +Hi Genius, +I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may still +exist in the mainline. +Thanks in advance to heroes who can take a look and share understanding. 
+ +The qemu main thread endlessly hangs in the handle of the qmp statement: +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': +'drive_del replication0' } } +and we have the call trace looks like: +#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1, +timeout=<optimized out>, timeout@entry=0x7ffc56c66db0, +sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44 +#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0, +__nfds=<optimized out>, __fds=<optimized out>) +at /usr/include/x86_64-linux-gnu/bits/poll2.h:77 +#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, +timeout=<optimized out>) at util/qemu-timer.c:348 +#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0, +blocking=blocking@entry=true) at util/aio-posix.c:669 +#4 0x000055561019268d in bdrv_do_drained_begin (poll=true, +ignore_bds_parents=false, parent=0x0, recursive=false, +bs=0x55561138b0a0) at block/io.c:430 +#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>, +parent=0x0, ignore_bds_parents=<optimized out>, +poll=<optimized out>) at block/io.c:396 +#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0, +child=0x7f36dc0ce380, errp=<optimized out>) +at block/quorum.c:1063 +#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120 +"colo-disk0", has_child=<optimized out>, +child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0, +errp=0x7ffc56c66f98) at blockdev.c:4494 +#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized +out>, ret=<optimized out>, errp=0x7ffc56c67018) +at qapi/qapi-commands-block-core.c:1538 +#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010, +allow_oob=<optimized out>, request=<optimized out>, +cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132 +#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized +out>, allow_oob=<optimized out>) +at qapi/qmp-dispatch.c:175 +#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40, +req=<optimized out>) at monitor/qmp.c:145 +#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized out>) +at monitor/qmp.c:234 +#13 0x000055561021dbec in aio_bh_call (bh=0x5556112164bGrateful0) at +util/async.c:117 +#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117 +#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at +util/aio-posix.c:459 +#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>, +callback=<optimized out>, user_data=<optimized out>) +at util/async.c:260 +#17 0x00007f3c22302fbd in g_main_context_dispatch () from +/lib/x86_64-linux-gnu/libglib-2.0.so.0 +#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219 +#19 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242 +#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518 +#21 0x000055560ff600fe in main_loop () at vl.c:1814 +#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized out>, +envp=<optimized out>) at vl.c:4503 +We found that we're doing endless check in the line of +block/io.c:bdrv_do_drained_begin(): +BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent)); +and it turns out that the bdrv_drain_poll() always get true from: +- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents) +- AND atomic_read(&bs->in_flight) + +I personally think this is a deadlock issue in the a QEMU block layer +(as we know, we have some #FIXME comments in related codes, such as block +permisson update). 
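+
+A condensed sketch of the loop described above (illustrative only, not the
+literal block/io.c code; the helper name is invented, and
+bdrv_parent_drained_poll() is internal to block/io.c): bdrv_do_drained_begin()
+keeps polling while bdrv_drain_poll() reports outstanding activity, so if a
+parent keeps answering "still draining" or bs->in_flight never reaches 0, the
+main thread never leaves BDRV_POLL_WHILE().
+
+    /* Sketch of the condition that BDRV_POLL_WHILE() re-evaluates; in the
+     * hang above it never becomes false. */
+    static bool sketch_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent)
+    {
+        return bdrv_parent_drained_poll(bs, ignore_parent, false) ||
+               atomic_read(&bs->in_flight) > 0;
+    }
+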
+Any comments are welcome and appreciated. + +--- +thx,likexu + +On 2/28/21 9:39 PM, Like Xu wrote: +Hi Genius, +I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may +still exist in the mainline. +Thanks in advance to heroes who can take a look and share understanding. +Do you have a test case that reproduces on 5.2? It'd be nice to know if +it was still a problem in the latest source tree or not. +--js +The qemu main thread endlessly hangs in the handle of the qmp statement: +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': +'drive_del replication0' } } +and we have the call trace looks like: +#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1, +timeout=<optimized out>, timeout@entry=0x7ffc56c66db0, +sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44 +#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0, +__nfds=<optimized out>, __fds=<optimized out>) +at /usr/include/x86_64-linux-gnu/bits/poll2.h:77 +#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, +timeout=<optimized out>) at util/qemu-timer.c:348 +#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0, +blocking=blocking@entry=true) at util/aio-posix.c:669 +#4 0x000055561019268d in bdrv_do_drained_begin (poll=true, +ignore_bds_parents=false, parent=0x0, recursive=false, +bs=0x55561138b0a0) at block/io.c:430 +#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>, +parent=0x0, ignore_bds_parents=<optimized out>, +poll=<optimized out>) at block/io.c:396 +#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0, +child=0x7f36dc0ce380, errp=<optimized out>) +at block/quorum.c:1063 +#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120 +"colo-disk0", has_child=<optimized out>, +child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0, +errp=0x7ffc56c66f98) at blockdev.c:4494 +#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized +out>, ret=<optimized out>, errp=0x7ffc56c67018) +at qapi/qapi-commands-block-core.c:1538 +#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010, +allow_oob=<optimized out>, request=<optimized out>, +cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132 +#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized +out>, allow_oob=<optimized out>) +at qapi/qmp-dispatch.c:175 +#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40, +req=<optimized out>) at monitor/qmp.c:145 +#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized +out>) at monitor/qmp.c:234 +#13 0x000055561021dbec in aio_bh_call (bh=0x5556112164bGrateful0) at +util/async.c:117 +#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117 +#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at +util/aio-posix.c:459 +#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>, +callback=<optimized out>, user_data=<optimized out>) +at util/async.c:260 +#17 0x00007f3c22302fbd in g_main_context_dispatch () from +/lib/x86_64-linux-gnu/libglib-2.0.so.0 +#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219 +#19 os_host_main_loop_wait (timeout=<optimized out>) at +util/main-loop.c:242 +#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518 +#21 0x000055560ff600fe in main_loop () at vl.c:1814 +#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized +out>, envp=<optimized out>) at vl.c:4503 +We found that we're doing endless check in the line of 
+block/io.c:bdrv_do_drained_begin(): +    BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent)); +and it turns out that the bdrv_drain_poll() always get true from: +- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents) +- AND atomic_read(&bs->in_flight) + +I personally think this is a deadlock issue in the a QEMU block layer +(as we know, we have some #FIXME comments in related codes, such as +block permisson update). +Any comments are welcome and appreciated. + +--- +thx,likexu + +Hi John, + +Thanks for your comment. + +On 2021/3/5 7:53, John Snow wrote: +On 2/28/21 9:39 PM, Like Xu wrote: +Hi Genius, +I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may +still exist in the mainline. +Thanks in advance to heroes who can take a look and share understanding. +Do you have a test case that reproduces on 5.2? It'd be nice to know if it +was still a problem in the latest source tree or not. +We narrowed down the source of the bug, which basically came from +the following qmp usage: +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': +'drive_del replication0' } } +One of the test cases is the COLO usage (docs/colo-proxy.txt). + +This issue is sporadic,the probability may be 1/15 for a io-heavy guest. + +I believe it's reproducible on 5.2 and the latest tree. +--js +The qemu main thread endlessly hangs in the handle of the qmp statement: +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': +'drive_del replication0' } } +and we have the call trace looks like: +#0 0x00007f3c22045bf6 in __ppoll (fds=0x555611328410, nfds=1, +timeout=<optimized out>, timeout@entry=0x7ffc56c66db0, +sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44 +#1 0x000055561021f415 in ppoll (__ss=0x0, __timeout=0x7ffc56c66db0, +__nfds=<optimized out>, __fds=<optimized out>) +at /usr/include/x86_64-linux-gnu/bits/poll2.h:77 +#2 qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, +timeout=<optimized out>) at util/qemu-timer.c:348 +#3 0x0000555610221430 in aio_poll (ctx=ctx@entry=0x5556113010f0, +blocking=blocking@entry=true) at util/aio-posix.c:669 +#4 0x000055561019268d in bdrv_do_drained_begin (poll=true, +ignore_bds_parents=false, parent=0x0, recursive=false, +bs=0x55561138b0a0) at block/io.c:430 +#5 bdrv_do_drained_begin (bs=0x55561138b0a0, recursive=<optimized out>, +parent=0x0, ignore_bds_parents=<optimized out>, +poll=<optimized out>) at block/io.c:396 +#6 0x000055561017b60b in quorum_del_child (bs=0x55561138b0a0, +child=0x7f36dc0ce380, errp=<optimized out>) +at block/quorum.c:1063 +#7 0x000055560ff5836b in qmp_x_blockdev_change (parent=0x555612373120 +"colo-disk0", has_child=<optimized out>, +child=0x5556112df3e0 "children.1", has_node=<optimized out>, node=0x0, +errp=0x7ffc56c66f98) at blockdev.c:4494 +#8 0x00005556100f8f57 in qmp_marshal_x_blockdev_change (args=<optimized +out>, ret=<optimized out>, errp=0x7ffc56c67018) +at qapi/qapi-commands-block-core.c:1538 +#9 0x00005556101d8290 in do_qmp_dispatch (errp=0x7ffc56c67010, +allow_oob=<optimized out>, request=<optimized out>, +cmds=0x5556109c69a0 <qmp_commands>) at qapi/qmp-dispatch.c:132 +#10 qmp_dispatch (cmds=0x5556109c69a0 <qmp_commands>, request=<optimized +out>, allow_oob=<optimized out>) +at qapi/qmp-dispatch.c:175 +#11 0x00005556100d4c4d in monitor_qmp_dispatch (mon=0x5556113a6f40, +req=<optimized out>) at monitor/qmp.c:145 +#12 0x00005556100d5437 in monitor_qmp_bh_dispatcher (data=<optimized +out>) at monitor/qmp.c:234 +#13 0x000055561021dbec in aio_bh_call 
(bh=0x5556112164bGrateful0) at +util/async.c:117 +#14 aio_bh_poll (ctx=ctx@entry=0x5556112151b0) at util/async.c:117 +#15 0x00005556102212c4 in aio_dispatch (ctx=0x5556112151b0) at +util/aio-posix.c:459 +#16 0x000055561021dab2 in aio_ctx_dispatch (source=<optimized out>, +callback=<optimized out>, user_data=<optimized out>) +at util/async.c:260 +#17 0x00007f3c22302fbd in g_main_context_dispatch () from +/lib/x86_64-linux-gnu/libglib-2.0.so.0 +#18 0x0000555610220358 in glib_pollfds_poll () at util/main-loop.c:219 +#19 os_host_main_loop_wait (timeout=<optimized out>) at util/main-loop.c:242 +#20 main_loop_wait (nonblocking=<optimized out>) at util/main-loop.c:518 +#21 0x000055560ff600fe in main_loop () at vl.c:1814 +#22 0x000055560fddbce9 in main (argc=<optimized out>, argv=<optimized +out>, envp=<optimized out>) at vl.c:4503 +We found that we're doing endless check in the line of +block/io.c:bdrv_do_drained_begin(): +     BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, recursive, parent)); +and it turns out that the bdrv_drain_poll() always get true from: +- bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents) +- AND atomic_read(&bs->in_flight) + +I personally think this is a deadlock issue in the a QEMU block layer +(as we know, we have some #FIXME comments in related codes, such as block +permisson update). +Any comments are welcome and appreciated. + +--- +thx,likexu + +On 3/4/21 10:08 PM, Like Xu wrote: +Hi John, + +Thanks for your comment. + +On 2021/3/5 7:53, John Snow wrote: +On 2/28/21 9:39 PM, Like Xu wrote: +Hi Genius, +I am a user of QEMU v4.2.0 and stuck in an interesting bug, which may +still exist in the mainline. +Thanks in advance to heroes who can take a look and share understanding. +Do you have a test case that reproduces on 5.2? It'd be nice to know +if it was still a problem in the latest source tree or not. +We narrowed down the source of the bug, which basically came from +the following qmp usage: +{'execute': 'human-monitor-command', 'arguments':{ 'command-line': +'drive_del replication0' } } +One of the test cases is the COLO usage (docs/colo-proxy.txt). + +This issue is sporadic,the probability may be 1/15 for a io-heavy guest. + +I believe it's reproducible on 5.2 and the latest tree. +Can you please test and confirm that this is the case, and then file a +bug report on the LP: +https://launchpad.net/qemu +and include: +- The exact commit you used (current origin/master debug build would be +the most ideal.) +- Which QEMU binary you are using (qemu-system-x86_64?) +- The shortest command line you are aware of that reproduces the problem +- The host OS and kernel version +- An updated call trace +- Any relevant commands issued prior to the one that caused the hang; or +detailed reproduction steps if possible. 
+Thanks, +--js + diff --git a/classification_output/01/other/6739993 b/classification_output/01/other/6739993 new file mode 100644 index 000000000..795a7c15f --- /dev/null +++ b/classification_output/01/other/6739993 @@ -0,0 +1,265 @@ +other: 0.990 +semantic: 0.987 +instruction: 0.983 +mistranslation: 0.982 + +[BUG REPORT] cxl process in infinity loop + +Hi, all + +When I did the cxl memory hot-plug test on QEMU, I accidentally connected +two memdev to the same downstream port, the command like below: + +> +-object memory-backend-ram,size=262144k,share=on,id=vmem0 \ +> +-object memory-backend-ram,size=262144k,share=on,id=vmem1 \ +> +-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ +> +-device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \ +> +-device cxl-upstream,bus=root_port0,id=us0 \ +> +-device cxl-downstream,port=0,bus=us0,id=swport00,chassis=0,slot=5 \ +> +-device cxl-downstream,port=0,bus=us0,id=swport01,chassis=0,slot=7 \ +same downstream port but has different slot! + +> +-device cxl-type3,bus=swport00,volatile-memdev=vmem0,id=cxl-vmem0 \ +> +-device cxl-type3,bus=swport01,volatile-memdev=vmem1,id=cxl-vmem1 \ +> +-M +> +cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=64G,cxl-fmw.0.interleave-granularity=4k +> +\ +There is no error occurred when vm start, but when I executed the âcxl listâ +command to view +the CXL objects info, the process can not end properly. + +Then I used strace to trace the process, I found that the process is in +infinity loop: +# strace cxl list +...... +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 + +[Environment]: +linux: V6.10-rc3 +QEMU: V9.0.0 +ndctl: v79 + +I know this is because of the wrong use of the QEMU command, but I think we +should +be aware of this error in one of the QEMU, OS or ndctl side at least. 
+ +Thanks +Xingtao + +On Tue, 2 Jul 2024 00:30:06 +0000 +"Xingtao Yao (Fujitsu)" <yaoxt.fnst@fujitsu.com> wrote: + +> +Hi, all +> +> +When I did the cxl memory hot-plug test on QEMU, I accidentally connected +> +two memdev to the same downstream port, the command like below: +> +> +> -object memory-backend-ram,size=262144k,share=on,id=vmem0 \ +> +> -object memory-backend-ram,size=262144k,share=on,id=vmem1 \ +> +> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ +> +> -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \ +> +> -device cxl-upstream,bus=root_port0,id=us0 \ +> +> -device cxl-downstream,port=0,bus=us0,id=swport00,chassis=0,slot=5 \ +> +> -device cxl-downstream,port=0,bus=us0,id=swport01,chassis=0,slot=7 \ +> +same downstream port but has different slot! +> +> +> -device cxl-type3,bus=swport00,volatile-memdev=vmem0,id=cxl-vmem0 \ +> +> -device cxl-type3,bus=swport01,volatile-memdev=vmem1,id=cxl-vmem1 \ +> +> -M +> +> cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=64G,cxl-fmw.0.interleave-granularity=4k +> +> \ +> +> +There is no error occurred when vm start, but when I executed the âcxl listâ +> +command to view +> +the CXL objects info, the process can not end properly. +I'd be happy to look preventing this on QEMU side if you send one, +but in general there are are lots of ways to shoot yourself in the +foot with CXL and PCI device emulation in QEMU so I'm not going +to rush to solve this specific one. + +Likewise, some hardening in kernel / userspace probably makes sense but +this is a non compliant switch so priority of a fix is probably fairly low. + +Jonathan + +> +> +Then I used strace to trace the process, I found that the process is in +> +infinity loop: +> +# strace cxl list +> +...... +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> +access("/run/udev/queue", F_OK) = 0 +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> +access("/run/udev/queue", F_OK) = 0 +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> +access("/run/udev/queue", F_OK) = 0 +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> +access("/run/udev/queue", F_OK) = 0 +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> +access("/run/udev/queue", F_OK) = 0 +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> +access("/run/udev/queue", F_OK) = 0 +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> +access("/run/udev/queue", F_OK) = 0 +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> 
+access("/run/udev/queue", F_OK) = 0 +> +> +[Environment]: +> +linux: V6.10-rc3 +> +QEMU: V9.0.0 +> +ndctl: v79 +> +> +I know this is because of the wrong use of the QEMU command, but I think we +> +should +> +be aware of this error in one of the QEMU, OS or ndctl side at least. +> +> +Thanks +> +Xingtao + diff --git a/classification_output/01/other/6983580 b/classification_output/01/other/6983580 new file mode 100644 index 000000000..b0e56b614 --- /dev/null +++ b/classification_output/01/other/6983580 @@ -0,0 +1,429 @@ +other: 0.940 +instruction: 0.920 +semantic: 0.917 +mistranslation: 0.882 + +[Bug Report] Possible Missing Endianness Conversion + +The virtio packed virtqueue support patch[1] suggests converting +endianness by lines: + +virtio_tswap16s(vdev, &e->off_wrap); +virtio_tswap16s(vdev, &e->flags); + +Though both of these conversion statements aren't present in the +latest qemu code here[2] + +Is this intentional? + +[1]: +https://mail.gnu.org/archive/html/qemu-block/2019-10/msg01492.html +[2]: +https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L314 + +CCing Jason. + +On Mon, Jun 24, 2024 at 4:30â¯PM Xoykie <xoykie@gmail.com> wrote: +> +> +The virtio packed virtqueue support patch[1] suggests converting +> +endianness by lines: +> +> +virtio_tswap16s(vdev, &e->off_wrap); +> +virtio_tswap16s(vdev, &e->flags); +> +> +Though both of these conversion statements aren't present in the +> +latest qemu code here[2] +> +> +Is this intentional? +Good catch! + +It looks like it was removed (maybe by mistake) by commit +d152cdd6f6 ("virtio: use virtio accessor to access packed event") + +Jason can you confirm that? + +Thanks, +Stefano + +> +> +[1]: +https://mail.gnu.org/archive/html/qemu-block/2019-10/msg01492.html +> +[2]: +https://elixir.bootlin.com/qemu/latest/source/hw/virtio/virtio.c#L314 +> + +On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote: +> +> +CCing Jason. +> +> +On Mon, Jun 24, 2024 at 4:30â¯PM Xoykie <xoykie@gmail.com> wrote: +> +> +> +> The virtio packed virtqueue support patch[1] suggests converting +> +> endianness by lines: +> +> +> +> virtio_tswap16s(vdev, &e->off_wrap); +> +> virtio_tswap16s(vdev, &e->flags); +> +> +> +> Though both of these conversion statements aren't present in the +> +> latest qemu code here[2] +> +> +> +> Is this intentional? +> +> +Good catch! +> +> +It looks like it was removed (maybe by mistake) by commit +> +d152cdd6f6 ("virtio: use virtio accessor to access packed event") +That commit changes from: + +- address_space_read_cached(cache, off_off, &e->off_wrap, +- sizeof(e->off_wrap)); +- virtio_tswap16s(vdev, &e->off_wrap); + +which does a byte read of 2 bytes and then swaps the bytes +depending on the host endianness and the value of +virtio_access_is_big_endian() + +to this: + ++ e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off); + +virtio_lduw_phys_cached() is a small function which calls +either lduw_be_phys_cached() or lduw_le_phys_cached() +depending on the value of virtio_access_is_big_endian(). +(And lduw_be_phys_cached() and lduw_le_phys_cached() do +the right thing for the host-endianness to do a "load +a specifically big or little endian 16-bit value".) + +Which is to say that because we use a load/store function that's +explicit about the size of the data type it is accessing, the +function itself can handle doing the load as big or little +endian, rather than the calling code having to do a manual swap after +it has done a load-as-bag-of-bytes. 
This is generally preferable +as it's less error-prone. + +(Explicit swap-after-loading still has a place where the +code is doing a load of a whole structure out of the +guest and then swapping each struct field after the fact, +because it means we can do a single load-from-guest-memory +rather than a whole sequence of calls all the way down +through the memory subsystem.) + +thanks +-- PMM + +On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote: +On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote: +CCing Jason. + +On Mon, Jun 24, 2024 at 4:30â¯PM Xoykie <xoykie@gmail.com> wrote: +> +> The virtio packed virtqueue support patch[1] suggests converting +> endianness by lines: +> +> virtio_tswap16s(vdev, &e->off_wrap); +> virtio_tswap16s(vdev, &e->flags); +> +> Though both of these conversion statements aren't present in the +> latest qemu code here[2] +> +> Is this intentional? + +Good catch! + +It looks like it was removed (maybe by mistake) by commit +d152cdd6f6 ("virtio: use virtio accessor to access packed event") +That commit changes from: + +- address_space_read_cached(cache, off_off, &e->off_wrap, +- sizeof(e->off_wrap)); +- virtio_tswap16s(vdev, &e->off_wrap); + +which does a byte read of 2 bytes and then swaps the bytes +depending on the host endianness and the value of +virtio_access_is_big_endian() + +to this: + ++ e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off); + +virtio_lduw_phys_cached() is a small function which calls +either lduw_be_phys_cached() or lduw_le_phys_cached() +depending on the value of virtio_access_is_big_endian(). +(And lduw_be_phys_cached() and lduw_le_phys_cached() do +the right thing for the host-endianness to do a "load +a specifically big or little endian 16-bit value".) + +Which is to say that because we use a load/store function that's +explicit about the size of the data type it is accessing, the +function itself can handle doing the load as big or little +endian, rather than the calling code having to do a manual swap after +it has done a load-as-bag-of-bytes. This is generally preferable +as it's less error-prone. +Thanks for the details! + +So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ? + +I mean: +diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c +index 893a072c9d..2e5e67bdb9 100644 +--- a/hw/virtio/virtio.c ++++ b/hw/virtio/virtio.c +@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev, + /* Make sure flags is seen before off_wrap */ + smp_rmb(); + e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off); +- virtio_tswap16s(vdev, &e->flags); + } + + static void vring_packed_off_wrap_write(VirtIODevice *vdev, + +Thanks, +Stefano +(Explicit swap-after-loading still has a place where the +code is doing a load of a whole structure out of the +guest and then swapping each struct field after the fact, +because it means we can do a single load-from-guest-memory +rather than a whole sequence of calls all the way down +through the memory subsystem.) + +thanks +-- PMM + +On Tue, 25 Jun 2024 at 08:18, Stefano Garzarella <sgarzare@redhat.com> wrote: +> +> +On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote: +> +>On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote: +> +>> +> +>> CCing Jason. 
+> +>> +> +>> On Mon, Jun 24, 2024 at 4:30â¯PM Xoykie <xoykie@gmail.com> wrote: +> +>> > +> +>> > The virtio packed virtqueue support patch[1] suggests converting +> +>> > endianness by lines: +> +>> > +> +>> > virtio_tswap16s(vdev, &e->off_wrap); +> +>> > virtio_tswap16s(vdev, &e->flags); +> +>> > +> +>> > Though both of these conversion statements aren't present in the +> +>> > latest qemu code here[2] +> +>> > +> +>> > Is this intentional? +> +>> +> +>> Good catch! +> +>> +> +>> It looks like it was removed (maybe by mistake) by commit +> +>> d152cdd6f6 ("virtio: use virtio accessor to access packed event") +> +> +> +>That commit changes from: +> +> +> +>- address_space_read_cached(cache, off_off, &e->off_wrap, +> +>- sizeof(e->off_wrap)); +> +>- virtio_tswap16s(vdev, &e->off_wrap); +> +> +> +>which does a byte read of 2 bytes and then swaps the bytes +> +>depending on the host endianness and the value of +> +>virtio_access_is_big_endian() +> +> +> +>to this: +> +> +> +>+ e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off); +> +> +> +>virtio_lduw_phys_cached() is a small function which calls +> +>either lduw_be_phys_cached() or lduw_le_phys_cached() +> +>depending on the value of virtio_access_is_big_endian(). +> +>(And lduw_be_phys_cached() and lduw_le_phys_cached() do +> +>the right thing for the host-endianness to do a "load +> +>a specifically big or little endian 16-bit value".) +> +> +> +>Which is to say that because we use a load/store function that's +> +>explicit about the size of the data type it is accessing, the +> +>function itself can handle doing the load as big or little +> +>endian, rather than the calling code having to do a manual swap after +> +>it has done a load-as-bag-of-bytes. This is generally preferable +> +>as it's less error-prone. +> +> +Thanks for the details! +> +> +So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ? +> +> +I mean: +> +diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c +> +index 893a072c9d..2e5e67bdb9 100644 +> +--- a/hw/virtio/virtio.c +> ++++ b/hw/virtio/virtio.c +> +@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev, +> +/* Make sure flags is seen before off_wrap */ +> +smp_rmb(); +> +e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off); +> +- virtio_tswap16s(vdev, &e->flags); +> +} +That definitely looks like it's probably not correct... + +-- PMM + +On Fri, Jun 28, 2024 at 03:53:09PM GMT, Peter Maydell wrote: +On Tue, 25 Jun 2024 at 08:18, Stefano Garzarella <sgarzare@redhat.com> wrote: +On Mon, Jun 24, 2024 at 04:19:52PM GMT, Peter Maydell wrote: +>On Mon, 24 Jun 2024 at 16:11, Stefano Garzarella <sgarzare@redhat.com> wrote: +>> +>> CCing Jason. +>> +>> On Mon, Jun 24, 2024 at 4:30â¯PM Xoykie <xoykie@gmail.com> wrote: +>> > +>> > The virtio packed virtqueue support patch[1] suggests converting +>> > endianness by lines: +>> > +>> > virtio_tswap16s(vdev, &e->off_wrap); +>> > virtio_tswap16s(vdev, &e->flags); +>> > +>> > Though both of these conversion statements aren't present in the +>> > latest qemu code here[2] +>> > +>> > Is this intentional? +>> +>> Good catch! 
+>> +>> It looks like it was removed (maybe by mistake) by commit +>> d152cdd6f6 ("virtio: use virtio accessor to access packed event") +> +>That commit changes from: +> +>- address_space_read_cached(cache, off_off, &e->off_wrap, +>- sizeof(e->off_wrap)); +>- virtio_tswap16s(vdev, &e->off_wrap); +> +>which does a byte read of 2 bytes and then swaps the bytes +>depending on the host endianness and the value of +>virtio_access_is_big_endian() +> +>to this: +> +>+ e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off); +> +>virtio_lduw_phys_cached() is a small function which calls +>either lduw_be_phys_cached() or lduw_le_phys_cached() +>depending on the value of virtio_access_is_big_endian(). +>(And lduw_be_phys_cached() and lduw_le_phys_cached() do +>the right thing for the host-endianness to do a "load +>a specifically big or little endian 16-bit value".) +> +>Which is to say that because we use a load/store function that's +>explicit about the size of the data type it is accessing, the +>function itself can handle doing the load as big or little +>endian, rather than the calling code having to do a manual swap after +>it has done a load-as-bag-of-bytes. This is generally preferable +>as it's less error-prone. + +Thanks for the details! + +So, should we also remove `virtio_tswap16s(vdev, &e->flags);` ? + +I mean: +diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c +index 893a072c9d..2e5e67bdb9 100644 +--- a/hw/virtio/virtio.c ++++ b/hw/virtio/virtio.c +@@ -323,7 +323,6 @@ static void vring_packed_event_read(VirtIODevice *vdev, + /* Make sure flags is seen before off_wrap */ + smp_rmb(); + e->off_wrap = virtio_lduw_phys_cached(vdev, cache, off_off); +- virtio_tswap16s(vdev, &e->flags); + } +That definitely looks like it's probably not correct... +Yeah, I just sent that patch: +20240701075208.19634-1-sgarzare@redhat.com +">https://lore.kernel.org/qemu-devel/ +20240701075208.19634-1-sgarzare@redhat.com +We can continue the discussion there. + +Thanks, +Stefano + diff --git a/classification_output/01/other/6998781 b/classification_output/01/other/6998781 new file mode 100644 index 000000000..98a199ee9 --- /dev/null +++ b/classification_output/01/other/6998781 @@ -0,0 +1,1077 @@ +other: 0.892 +instruction: 0.842 +mistranslation: 0.842 +semantic: 0.825 + +[Qemu-devel] [BUG/RFC] Two cpus are not brought up normally in SLES11 sp3 VM after reboot + +Hi, + +Recently we encountered a problem in our project: 2 CPUs in VM are not brought +up normally after reboot. + +Our host is using KVM kmod 3.6 and QEMU 2.1. +A SLES 11 sp3 VM configured with 8 vcpus, +cpu model is configured with 'host-passthrough'. + +After VM's first time started up, everything seems to be OK. +and then VM is paniced and rebooted. +After reboot, only 6 cpus are brought up in VM, cpu1 and cpu7 are not online. + +This is the only message we can get from VM: +VM dmesg shows: +[ 0.069867] Booting Node 0, Processors #1 +[ 5.060042] CPU1: Stuck ?? +[ 5.060499] #2 +[ 5.088322] kvm-clock: cpu 2, msr 6:3fc90901, secondary cpu clock +[ 5.088335] KVM setup async PF for cpu 2 +[ 5.092967] NMI watchdog enabled, takes one hw-pmu counter. +[ 5.094405] #3 +[ 5.108324] kvm-clock: cpu 3, msr 6:3fcd0901, secondary cpu clock +[ 5.108333] KVM setup async PF for cpu 3 +[ 5.113553] NMI watchdog enabled, takes one hw-pmu counter. +[ 5.114970] #4 +[ 5.128325] kvm-clock: cpu 4, msr 6:3fd10901, secondary cpu clock +[ 5.128336] KVM setup async PF for cpu 4 +[ 5.134576] NMI watchdog enabled, takes one hw-pmu counter. 
+[ 5.135998] #5 +[ 5.152324] kvm-clock: cpu 5, msr 6:3fd50901, secondary cpu clock +[ 5.152334] KVM setup async PF for cpu 5 +[ 5.154764] NMI watchdog enabled, takes one hw-pmu counter. +[ 5.156467] #6 +[ 5.172327] kvm-clock: cpu 6, msr 6:3fd90901, secondary cpu clock +[ 5.172341] KVM setup async PF for cpu 6 +[ 5.180738] NMI watchdog enabled, takes one hw-pmu counter. +[ 5.182173] #7 Ok. +[ 10.170815] CPU7: Stuck ?? +[ 10.171648] Brought up 6 CPUs +[ 10.172394] Total of 6 processors activated (28799.97 BogoMIPS). + +From host, we found that QEMU vcpu1 thread and vcpu7 thread were not consuming +any cpu (Should be in idle state), +All of VCPUs' stacks in host is like bellow: + +[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] +[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] +[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] +[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] +[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 +[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 +[<ffffffff81468092>] system_call_fastpath+0x16/0x1b +[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 +[<ffffffffffffffff>] 0xffffffffffffffff + +We looked into the kernel codes that could leading to the above 'Stuck' warning, +and found that the only possible is the emulation of 'cpuid' instruct in +kvm/qemu has something wrong. +But since we canât reproduce this problem, we are not quite sure. +Is there any possible that the cupid emulation in kvm/qemu has some bug ? + +Has anyone come across these problem before? Or any idea? + +Thanks, +zhanghailiang + +On 06/07/2015 09:54, zhanghailiang wrote: +> +> +From host, we found that QEMU vcpu1 thread and vcpu7 thread were not +> +consuming any cpu (Should be in idle state), +> +All of VCPUs' stacks in host is like bellow: +> +> +[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] +> +[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] +> +[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] +> +[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] +> +[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 +> +[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 +> +[<ffffffff81468092>] system_call_fastpath+0x16/0x1b +> +[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 +> +[<ffffffffffffffff>] 0xffffffffffffffff +> +> +We looked into the kernel codes that could leading to the above 'Stuck' +> +warning, +> +and found that the only possible is the emulation of 'cpuid' instruct in +> +kvm/qemu has something wrong. +> +But since we canât reproduce this problem, we are not quite sure. +> +Is there any possible that the cupid emulation in kvm/qemu has some bug ? +Can you explain the relationship to the cpuid emulation? What do the +traces say about vcpus 1 and 7? 
+ +Paolo + +On 2015/7/6 16:45, Paolo Bonzini wrote: +On 06/07/2015 09:54, zhanghailiang wrote: +From host, we found that QEMU vcpu1 thread and vcpu7 thread were not +consuming any cpu (Should be in idle state), +All of VCPUs' stacks in host is like bellow: + +[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] +[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] +[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] +[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] +[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 +[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 +[<ffffffff81468092>] system_call_fastpath+0x16/0x1b +[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 +[<ffffffffffffffff>] 0xffffffffffffffff + +We looked into the kernel codes that could leading to the above 'Stuck' +warning, +and found that the only possible is the emulation of 'cpuid' instruct in +kvm/qemu has something wrong. +But since we canât reproduce this problem, we are not quite sure. +Is there any possible that the cupid emulation in kvm/qemu has some bug ? +Can you explain the relationship to the cpuid emulation? What do the +traces say about vcpus 1 and 7? +OK, we searched the VM's kernel codes with the 'Stuck' message, and it is +located in +do_boot_cpu(). It's in BSP context, the call process is: +BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() +-> wakeup_secondary_via_INIT() to trigger APs. +It will wait 5s for APs to startup, if some AP not startup normally, it will +print 'CPU%d Stuck' or 'CPU%d: Not responding'. + +If it prints 'Stuck', it means the AP has received the SIPI interrupt and +begins to execute the code +'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places before +smp_callin()(smpboot.c). +The follow is the starup process of BSP and AP. +BSP: +start_kernel() + ->smp_init() + ->smp_boot_cpus() + ->do_boot_cpu() + ->start_ip = trampoline_address(); //set the address that AP will go +to execute + ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU + ->for (timeout = 0; timeout < 50000; timeout++) + if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if AP +startup or not + +APs: +ENTRY(trampoline_data) (trampoline_64.S) + ->ENTRY(secondary_startup_64) (head_64.S) + ->start_secondary() (smpboot.c) + ->cpu_init(); + ->smp_callin(); + ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP comes +here, the BSP will not prints the error message. + +From above call process, we can be sure that, the AP has been stuck between +trampoline_data and the cpumask_set_cpu() in +smp_callin(), we look through these codes path carefully, and only found a +'hlt' instruct that could block the process. +It is located in trampoline_data(): + +ENTRY(trampoline_data) + ... + + call verify_cpu # Verify the cpu supports long mode + testl %eax, %eax # Check for return code + jnz no_longmode + + ... + +no_longmode: + hlt + jmp no_longmode + +For the process verify_cpu(), +we can only find the 'cpuid' sensitive instruct that could lead VM exit from +No-root mode. +This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to +the fail in verify_cpu. + +From the message in VM, we know vcpu1 and vcpu7 is something wrong. +[ 5.060042] CPU1: Stuck ?? +[ 10.170815] CPU7: Stuck ?? +[ 10.171648] Brought up 6 CPUs + +Besides, the follow is the cpus message got from host. 
+80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command +instance-0000000 +* CPU #0: pc=0x00007f64160c683d thread_id=68570 + CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 + CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 + CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 + CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 + CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 + CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 + CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 + +Oh, i also forgot to mention in the above message that, we have bond each vCPU +to different physical CPU in +host. + +Thanks, +zhanghailiang + +On 06/07/2015 11:59, zhanghailiang wrote: +> +> +> +Besides, the follow is the cpus message got from host. +> +80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh +> +qemu-monitor-command instance-0000000 +> +* CPU #0: pc=0x00007f64160c683d thread_id=68570 +> +CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 +> +CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 +> +CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 +> +CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 +> +CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 +> +CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 +> +CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 +> +> +Oh, i also forgot to mention in the above message that, we have bond +> +each vCPU to different physical CPU in +> +host. +Can you capture a trace on the host (trace-cmd record -e kvm) and send +it privately? Please note which CPUs get stuck, since I guess it's not +always 1 and 7. + +Paolo + +On Mon, 6 Jul 2015 17:59:10 +0800 +zhanghailiang <address@hidden> wrote: + +> +On 2015/7/6 16:45, Paolo Bonzini wrote: +> +> +> +> +> +> On 06/07/2015 09:54, zhanghailiang wrote: +> +>> +> +>> From host, we found that QEMU vcpu1 thread and vcpu7 thread were not +> +>> consuming any cpu (Should be in idle state), +> +>> All of VCPUs' stacks in host is like bellow: +> +>> +> +>> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] +> +>> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] +> +>> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] +> +>> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] +> +>> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 +> +>> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 +> +>> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b +> +>> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 +> +>> [<ffffffffffffffff>] 0xffffffffffffffff +> +>> +> +>> We looked into the kernel codes that could leading to the above 'Stuck' +> +>> warning, +in current upstream there isn't any printk(...Stuck...) left since that code +path +has been reworked. +I've often seen this on over-committed host during guest CPUs up/down torture +test. +Could you update guest kernel to upstream and see if issue reproduces? + +> +>> and found that the only possible is the emulation of 'cpuid' instruct in +> +>> kvm/qemu has something wrong. +> +>> But since we canât reproduce this problem, we are not quite sure. +> +>> Is there any possible that the cupid emulation in kvm/qemu has some bug ? +> +> +> +> Can you explain the relationship to the cpuid emulation? What do the +> +> traces say about vcpus 1 and 7? +> +> +OK, we searched the VM's kernel codes with the 'Stuck' message, and it is +> +located in +> +do_boot_cpu(). 
It's in BSP context, the call process is: +> +BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() +> +-> wakeup_secondary_via_INIT() to trigger APs. +> +It will wait 5s for APs to startup, if some AP not startup normally, it will +> +print 'CPU%d Stuck' or 'CPU%d: Not responding'. +> +> +If it prints 'Stuck', it means the AP has received the SIPI interrupt and +> +begins to execute the code +> +'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places +> +before smp_callin()(smpboot.c). +> +The follow is the starup process of BSP and AP. +> +BSP: +> +start_kernel() +> +->smp_init() +> +->smp_boot_cpus() +> +->do_boot_cpu() +> +->start_ip = trampoline_address(); //set the address that AP will +> +go to execute +> +->wakeup_secondary_cpu_via_init(); // kick the secondary CPU +> +->for (timeout = 0; timeout < 50000; timeout++) +> +if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if +> +AP startup or not +> +> +APs: +> +ENTRY(trampoline_data) (trampoline_64.S) +> +->ENTRY(secondary_startup_64) (head_64.S) +> +->start_secondary() (smpboot.c) +> +->cpu_init(); +> +->smp_callin(); +> +->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP +> +comes here, the BSP will not prints the error message. +> +> +From above call process, we can be sure that, the AP has been stuck between +> +trampoline_data and the cpumask_set_cpu() in +> +smp_callin(), we look through these codes path carefully, and only found a +> +'hlt' instruct that could block the process. +> +It is located in trampoline_data(): +> +> +ENTRY(trampoline_data) +> +... +> +> +call verify_cpu # Verify the cpu supports long mode +> +testl %eax, %eax # Check for return code +> +jnz no_longmode +> +> +... +> +> +no_longmode: +> +hlt +> +jmp no_longmode +> +> +For the process verify_cpu(), +> +we can only find the 'cpuid' sensitive instruct that could lead VM exit from +> +No-root mode. +> +This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to +> +the fail in verify_cpu. +> +> +From the message in VM, we know vcpu1 and vcpu7 is something wrong. +> +[ 5.060042] CPU1: Stuck ?? +> +[ 10.170815] CPU7: Stuck ?? +> +[ 10.171648] Brought up 6 CPUs +> +> +Besides, the follow is the cpus message got from host. +> +80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh +> +qemu-monitor-command instance-0000000 +> +* CPU #0: pc=0x00007f64160c683d thread_id=68570 +> +CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 +> +CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 +> +CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 +> +CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 +> +CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 +> +CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 +> +CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 +> +> +Oh, i also forgot to mention in the above message that, we have bond each +> +vCPU to different physical CPU in +> +host. 
+> +> +Thanks, +> +zhanghailiang +> +> +> +> +> +-- +> +To unsubscribe from this list: send the line "unsubscribe kvm" in +> +the body of a message to address@hidden +> +More majordomo info at +http://vger.kernel.org/majordomo-info.html + +On 2015/7/7 19:23, Igor Mammedov wrote: +On Mon, 6 Jul 2015 17:59:10 +0800 +zhanghailiang <address@hidden> wrote: +On 2015/7/6 16:45, Paolo Bonzini wrote: +On 06/07/2015 09:54, zhanghailiang wrote: +From host, we found that QEMU vcpu1 thread and vcpu7 thread were not +consuming any cpu (Should be in idle state), +All of VCPUs' stacks in host is like bellow: + +[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] +[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] +[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] +[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] +[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 +[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 +[<ffffffff81468092>] system_call_fastpath+0x16/0x1b +[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 +[<ffffffffffffffff>] 0xffffffffffffffff + +We looked into the kernel codes that could leading to the above 'Stuck' +warning, +in current upstream there isn't any printk(...Stuck...) left since that code +path +has been reworked. +I've often seen this on over-committed host during guest CPUs up/down torture +test. +Could you update guest kernel to upstream and see if issue reproduces? +Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to +reproduce it. + +For your test case, is it a kernel bug? +Or is there any related patch could solve your test problem been merged into +upstream ? + +Thanks, +zhanghailiang +and found that the only possible is the emulation of 'cpuid' instruct in +kvm/qemu has something wrong. +But since we canât reproduce this problem, we are not quite sure. +Is there any possible that the cupid emulation in kvm/qemu has some bug ? +Can you explain the relationship to the cpuid emulation? What do the +traces say about vcpus 1 and 7? +OK, we searched the VM's kernel codes with the 'Stuck' message, and it is +located in +do_boot_cpu(). It's in BSP context, the call process is: +BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() +-> wakeup_secondary_via_INIT() to trigger APs. +It will wait 5s for APs to startup, if some AP not startup normally, it will +print 'CPU%d Stuck' or 'CPU%d: Not responding'. + +If it prints 'Stuck', it means the AP has received the SIPI interrupt and +begins to execute the code +'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places before +smp_callin()(smpboot.c). +The follow is the starup process of BSP and AP. +BSP: +start_kernel() + ->smp_init() + ->smp_boot_cpus() + ->do_boot_cpu() + ->start_ip = trampoline_address(); //set the address that AP will +go to execute + ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU + ->for (timeout = 0; timeout < 50000; timeout++) + if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if +AP startup or not + +APs: +ENTRY(trampoline_data) (trampoline_64.S) + ->ENTRY(secondary_startup_64) (head_64.S) + ->start_secondary() (smpboot.c) + ->cpu_init(); + ->smp_callin(); + ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP +comes here, the BSP will not prints the error message. + + From above call process, we can be sure that, the AP has been stuck between +trampoline_data and the cpumask_set_cpu() in +smp_callin(), we look through these codes path carefully, and only found a +'hlt' instruct that could block the process. 
+It is located in trampoline_data(): + +ENTRY(trampoline_data) + ... + + call verify_cpu # Verify the cpu supports long mode + testl %eax, %eax # Check for return code + jnz no_longmode + + ... + +no_longmode: + hlt + jmp no_longmode + +For the process verify_cpu(), +we can only find the 'cpuid' sensitive instruct that could lead VM exit from +No-root mode. +This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to +the fail in verify_cpu. + + From the message in VM, we know vcpu1 and vcpu7 is something wrong. +[ 5.060042] CPU1: Stuck ?? +[ 10.170815] CPU7: Stuck ?? +[ 10.171648] Brought up 6 CPUs + +Besides, the follow is the cpus message got from host. +80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command +instance-0000000 +* CPU #0: pc=0x00007f64160c683d thread_id=68570 + CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 + CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 + CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 + CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 + CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 + CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 + CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 + +Oh, i also forgot to mention in the above message that, we have bond each vCPU +to different physical CPU in +host. + +Thanks, +zhanghailiang + + + + +-- +To unsubscribe from this list: send the line "unsubscribe kvm" in +the body of a message to address@hidden +More majordomo info at +http://vger.kernel.org/majordomo-info.html +. + +On Tue, 7 Jul 2015 19:43:35 +0800 +zhanghailiang <address@hidden> wrote: + +> +On 2015/7/7 19:23, Igor Mammedov wrote: +> +> On Mon, 6 Jul 2015 17:59:10 +0800 +> +> zhanghailiang <address@hidden> wrote: +> +> +> +>> On 2015/7/6 16:45, Paolo Bonzini wrote: +> +>>> +> +>>> +> +>>> On 06/07/2015 09:54, zhanghailiang wrote: +> +>>>> +> +>>>> From host, we found that QEMU vcpu1 thread and vcpu7 thread were not +> +>>>> consuming any cpu (Should be in idle state), +> +>>>> All of VCPUs' stacks in host is like bellow: +> +>>>> +> +>>>> [<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] +> +>>>> [<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] +> +>>>> [<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] +> +>>>> [<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] +> +>>>> [<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 +> +>>>> [<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 +> +>>>> [<ffffffff81468092>] system_call_fastpath+0x16/0x1b +> +>>>> [<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 +> +>>>> [<ffffffffffffffff>] 0xffffffffffffffff +> +>>>> +> +>>>> We looked into the kernel codes that could leading to the above 'Stuck' +> +>>>> warning, +> +> in current upstream there isn't any printk(...Stuck...) left since that +> +> code path +> +> has been reworked. +> +> I've often seen this on over-committed host during guest CPUs up/down +> +> torture test. +> +> Could you update guest kernel to upstream and see if issue reproduces? +> +> +> +> +Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to +> +reproduce it. +> +> +For your test case, is it a kernel bug? +> +Or is there any related patch could solve your test problem been merged into +> +upstream ? +I don't remember all prerequisite patches but you should be able to find +http://marc.info/?l=linux-kernel&m=140326703108009&w=2 +"x86/smpboot: Initialize secondary CPU only if master CPU will wait for it" +and then look for dependencies. 
+ + +> +> +Thanks, +> +zhanghailiang +> +> +>>>> and found that the only possible is the emulation of 'cpuid' instruct in +> +>>>> kvm/qemu has something wrong. +> +>>>> But since we canât reproduce this problem, we are not quite sure. +> +>>>> Is there any possible that the cupid emulation in kvm/qemu has some bug ? +> +>>> +> +>>> Can you explain the relationship to the cpuid emulation? What do the +> +>>> traces say about vcpus 1 and 7? +> +>> +> +>> OK, we searched the VM's kernel codes with the 'Stuck' message, and it is +> +>> located in +> +>> do_boot_cpu(). It's in BSP context, the call process is: +> +>> BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> +> +>> do_boot_cpu() -> wakeup_secondary_via_INIT() to trigger APs. +> +>> It will wait 5s for APs to startup, if some AP not startup normally, it +> +>> will print 'CPU%d Stuck' or 'CPU%d: Not responding'. +> +>> +> +>> If it prints 'Stuck', it means the AP has received the SIPI interrupt and +> +>> begins to execute the code +> +>> 'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places +> +>> before smp_callin()(smpboot.c). +> +>> The follow is the starup process of BSP and AP. +> +>> BSP: +> +>> start_kernel() +> +>> ->smp_init() +> +>> ->smp_boot_cpus() +> +>> ->do_boot_cpu() +> +>> ->start_ip = trampoline_address(); //set the address that AP +> +>> will go to execute +> +>> ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU +> +>> ->for (timeout = 0; timeout < 50000; timeout++) +> +>> if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// +> +>> check if AP startup or not +> +>> +> +>> APs: +> +>> ENTRY(trampoline_data) (trampoline_64.S) +> +>> ->ENTRY(secondary_startup_64) (head_64.S) +> +>> ->start_secondary() (smpboot.c) +> +>> ->cpu_init(); +> +>> ->smp_callin(); +> +>> ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP +> +>> comes here, the BSP will not prints the error message. +> +>> +> +>> From above call process, we can be sure that, the AP has been stuck +> +>> between trampoline_data and the cpumask_set_cpu() in +> +>> smp_callin(), we look through these codes path carefully, and only found a +> +>> 'hlt' instruct that could block the process. +> +>> It is located in trampoline_data(): +> +>> +> +>> ENTRY(trampoline_data) +> +>> ... +> +>> +> +>> call verify_cpu # Verify the cpu supports long mode +> +>> testl %eax, %eax # Check for return code +> +>> jnz no_longmode +> +>> +> +>> ... +> +>> +> +>> no_longmode: +> +>> hlt +> +>> jmp no_longmode +> +>> +> +>> For the process verify_cpu(), +> +>> we can only find the 'cpuid' sensitive instruct that could lead VM exit +> +>> from No-root mode. +> +>> This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading +> +>> to the fail in verify_cpu. +> +>> +> +>> From the message in VM, we know vcpu1 and vcpu7 is something wrong. +> +>> [ 5.060042] CPU1: Stuck ?? +> +>> [ 10.170815] CPU7: Stuck ?? +> +>> [ 10.171648] Brought up 6 CPUs +> +>> +> +>> Besides, the follow is the cpus message got from host. 
+> +>> 80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh +> +>> qemu-monitor-command instance-0000000 +> +>> * CPU #0: pc=0x00007f64160c683d thread_id=68570 +> +>> CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 +> +>> CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 +> +>> CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 +> +>> CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 +> +>> CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 +> +>> CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 +> +>> CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 +> +>> +> +>> Oh, i also forgot to mention in the above message that, we have bond each +> +>> vCPU to different physical CPU in +> +>> host. +> +>> +> +>> Thanks, +> +>> zhanghailiang +> +>> +> +>> +> +>> +> +>> +> +>> -- +> +>> To unsubscribe from this list: send the line "unsubscribe kvm" in +> +>> the body of a message to address@hidden +> +>> More majordomo info at +http://vger.kernel.org/majordomo-info.html +> +> +> +> +> +> . +> +> +> +> +> + +On 2015/7/7 20:21, Igor Mammedov wrote: +On Tue, 7 Jul 2015 19:43:35 +0800 +zhanghailiang <address@hidden> wrote: +On 2015/7/7 19:23, Igor Mammedov wrote: +On Mon, 6 Jul 2015 17:59:10 +0800 +zhanghailiang <address@hidden> wrote: +On 2015/7/6 16:45, Paolo Bonzini wrote: +On 06/07/2015 09:54, zhanghailiang wrote: +From host, we found that QEMU vcpu1 thread and vcpu7 thread were not +consuming any cpu (Should be in idle state), +All of VCPUs' stacks in host is like bellow: + +[<ffffffffa07089b5>] kvm_vcpu_block+0x65/0xa0 [kvm] +[<ffffffffa071c7c1>] __vcpu_run+0xd1/0x260 [kvm] +[<ffffffffa071d508>] kvm_arch_vcpu_ioctl_run+0x68/0x1a0 [kvm] +[<ffffffffa0709cee>] kvm_vcpu_ioctl+0x38e/0x580 [kvm] +[<ffffffff8116be8b>] do_vfs_ioctl+0x8b/0x3b0 +[<ffffffff8116c251>] sys_ioctl+0xa1/0xb0 +[<ffffffff81468092>] system_call_fastpath+0x16/0x1b +[<00002ab9fe1f99a7>] 0x2ab9fe1f99a7 +[<ffffffffffffffff>] 0xffffffffffffffff + +We looked into the kernel codes that could leading to the above 'Stuck' +warning, +in current upstream there isn't any printk(...Stuck...) left since that code +path +has been reworked. +I've often seen this on over-committed host during guest CPUs up/down torture +test. +Could you update guest kernel to upstream and see if issue reproduces? +Hmm, Unfortunately, it is very hard to reproduce, and we are still trying to +reproduce it. + +For your test case, is it a kernel bug? +Or is there any related patch could solve your test problem been merged into +upstream ? +I don't remember all prerequisite patches but you should be able to find +http://marc.info/?l=linux-kernel&m=140326703108009&w=2 +"x86/smpboot: Initialize secondary CPU only if master CPU will wait for it" +and then look for dependencies. +Er, we have investigated this patch, and it is not related to our problem, :) + +Thanks. +Thanks, +zhanghailiang +and found that the only possible is the emulation of 'cpuid' instruct in +kvm/qemu has something wrong. +But since we canât reproduce this problem, we are not quite sure. +Is there any possible that the cupid emulation in kvm/qemu has some bug ? +Can you explain the relationship to the cpuid emulation? What do the +traces say about vcpus 1 and 7? +OK, we searched the VM's kernel codes with the 'Stuck' message, and it is +located in +do_boot_cpu(). It's in BSP context, the call process is: +BSP executes start_kernel() -> smp_init() -> smp_boot_cpus() -> do_boot_cpu() +-> wakeup_secondary_via_INIT() to trigger APs. 
+It will wait 5s for APs to startup, if some AP not startup normally, it will +print 'CPU%d Stuck' or 'CPU%d: Not responding'. + +If it prints 'Stuck', it means the AP has received the SIPI interrupt and +begins to execute the code +'ENTRY(trampoline_data)' (trampoline_64.S) , but be stuck in some places before +smp_callin()(smpboot.c). +The follow is the starup process of BSP and AP. +BSP: +start_kernel() + ->smp_init() + ->smp_boot_cpus() + ->do_boot_cpu() + ->start_ip = trampoline_address(); //set the address that AP will +go to execute + ->wakeup_secondary_cpu_via_init(); // kick the secondary CPU + ->for (timeout = 0; timeout < 50000; timeout++) + if (cpumask_test_cpu(cpu, cpu_callin_mask)) break;// check if +AP startup or not + +APs: +ENTRY(trampoline_data) (trampoline_64.S) + ->ENTRY(secondary_startup_64) (head_64.S) + ->start_secondary() (smpboot.c) + ->cpu_init(); + ->smp_callin(); + ->cpumask_set_cpu(cpuid, cpu_callin_mask); ->Note: if AP +comes here, the BSP will not prints the error message. + + From above call process, we can be sure that, the AP has been stuck between +trampoline_data and the cpumask_set_cpu() in +smp_callin(), we look through these codes path carefully, and only found a +'hlt' instruct that could block the process. +It is located in trampoline_data(): + +ENTRY(trampoline_data) + ... + + call verify_cpu # Verify the cpu supports long mode + testl %eax, %eax # Check for return code + jnz no_longmode + + ... + +no_longmode: + hlt + jmp no_longmode + +For the process verify_cpu(), +we can only find the 'cpuid' sensitive instruct that could lead VM exit from +No-root mode. +This is why we doubt if cpuid emulation is wrong in KVM/QEMU that leading to +the fail in verify_cpu. + + From the message in VM, we know vcpu1 and vcpu7 is something wrong. +[ 5.060042] CPU1: Stuck ?? +[ 10.170815] CPU7: Stuck ?? +[ 10.171648] Brought up 6 CPUs + +Besides, the follow is the cpus message got from host. +80FF72F5-FF6D-E411-A8C8-000000821800:/home/fsp/hrg # virsh qemu-monitor-command +instance-0000000 +* CPU #0: pc=0x00007f64160c683d thread_id=68570 + CPU #1: pc=0xffffffff810301f1 (halted) thread_id=68573 + CPU #2: pc=0xffffffff810301e2 (halted) thread_id=68575 + CPU #3: pc=0xffffffff810301e2 (halted) thread_id=68576 + CPU #4: pc=0xffffffff810301e2 (halted) thread_id=68577 + CPU #5: pc=0xffffffff810301e2 (halted) thread_id=68578 + CPU #6: pc=0xffffffff810301e2 (halted) thread_id=68583 + CPU #7: pc=0xffffffff810301f1 (halted) thread_id=68584 + +Oh, i also forgot to mention in the above message that, we have bond each vCPU +to different physical CPU in +host. + +Thanks, +zhanghailiang + + + + +-- +To unsubscribe from this list: send the line "unsubscribe kvm" in +the body of a message to address@hidden +More majordomo info at +http://vger.kernel.org/majordomo-info.html +. +. + diff --git a/classification_output/01/other/7143139 b/classification_output/01/other/7143139 new file mode 100644 index 000000000..18fcc3e04 --- /dev/null +++ b/classification_output/01/other/7143139 @@ -0,0 +1,126 @@ +other: 0.927 +semantic: 0.916 +instruction: 0.910 +mistranslation: 0.870 + +[Bug] x86 EFLAGS refresh is not happening correctly + +Hello, +I'm posting this here instead of opening an issue as it is not clear to me if this is a bug or not. +The issue is located in function "cpu_compute_eflags" in target/i386/cpu.h +( +https://gitlab.com/qemu-project/qemu/-/blob/master/target/i386/cpu.h#L2071 +) +This function is exectued in an out of cpu loop context. 
+It is used to synchronize TCG internal eflags registers (CC_OP, CC_SRC, etc...) with the CPU eflags field upon loop exit.
+It does:
+  eflags |= cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
+Shouldn't it be:
+  eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
+as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
+Thanks,
+Kind regards,
+Stevie
+
+On 05/08/21 11:51, Stevie Lavern wrote:
+Shouldn't it be:
+eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
+as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
+No, both are wrong. env->eflags contains flags other than the
+arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved.
+The right code is in helper_read_eflags. You can move it into
+cpu_compute_eflags, and make helper_read_eflags use it.
+Paolo
+
+On 05/08/21 13:24, Paolo Bonzini wrote:
+On 05/08/21 11:51, Stevie Lavern wrote:
+Shouldn't it be:
+eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
+as eflags is entirely reevaluated by "cpu_cc_compute_all" ?
+No, both are wrong. env->eflags contains flags other than the
+arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved.
+The right code is in helper_read_eflags. You can move it into
+cpu_compute_eflags, and make helper_read_eflags use it.
+Ah, actually the two are really the same, the TF/VM bits do not apply to
+cpu_compute_eflags so it's correct.
+What seems wrong is migration of the EFLAGS register. There should be
+code in cpu_pre_save and cpu_post_load to special-case it and setup
+CC_DST/CC_OP as done in cpu_load_eflags.
+Also, cpu_load_eflags should assert that update_mask does not include
+any of the arithmetic flags.
+Paolo
+
+Thanks for your reply!
+It's still a bit cryptic for me.
+I should mention that I'm using a custom x86_64 user mode, based on linux-user mode, that I'm developing (unfortunately I cannot share the code) with modifications in the translation loop (I've added cpu loop exits on specific instructions which are not control flow instructions).
+If my understanding is correct, in the user-mode case 'cpu_compute_eflags' is called directly by 'x86_cpu_exec_exit' with the intention of synchronizing the CPU env->eflags field with its real value (represented by the CC_* fields).
+I'm not sure how 'cpu_pre_save' and 'cpu_post_load' are involved in this case.
+
+As you said in your first email, 'helper_read_eflags' seems to be the correct way to go.
+Here is some detail about my current experimentation/understanding of this "issue":
+With the current implementation
+  eflags |= cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
+if I exit the loop with a CC_OP different from CC_OP_EFLAGS, I found that the resulting env->eflags may be invalid.
+In my test case, the loop was exiting with eflags = 0x44 and CC_OP = CC_OP_SUBL with CC_DST=1, CC_SRC=258, CC_SRC2=0.
+While 'cpu_cc_compute_all' computes the correct flags (ZF:0, PF:0), the result will still be 0x44 (ZF:1, PF:1) due to the 'or' operation, thus leading to an incorrect eflags value loaded into the CPU env.
+In my case, after loop reentry, it led to an invalid branch to be taken.
+Thanks for your time!
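To make the effect concrete, here is a small standalone sketch (plain C, not QEMU code; the only assumptions are the standard x86 EFLAGS bit positions and, as in the test case above, freshly computed arithmetic flags that happen to be all clear). It shows why OR-ing the new flags into a stale eflags value keeps the old ZF/PF bits set, while masking the arithmetic flags out first does not:

#include <stdint.h>
#include <stdio.h>

/* Standard x86 EFLAGS bit positions for the arithmetic flags. */
#define FLAG_CF 0x001u
#define FLAG_PF 0x004u
#define FLAG_AF 0x010u
#define FLAG_ZF 0x040u
#define FLAG_SF 0x080u
#define FLAG_OF 0x800u
#define ARITH_FLAGS (FLAG_CF | FLAG_PF | FLAG_AF | FLAG_ZF | FLAG_SF | FLAG_OF)

int main(void)
{
    uint32_t stale_eflags = 0x44;   /* old value with ZF and PF set */
    uint32_t computed     = 0x00;   /* freshly computed arithmetic flags: all clear */

    uint32_t with_or   = stale_eflags | computed;                  /* stale ZF/PF survive */
    uint32_t with_mask = (stale_eflags & ~ARITH_FLAGS) | computed; /* stale bits dropped */

    printf("or: %#x, masked: %#x\n", (unsigned)with_or, (unsigned)with_mask);
    return 0;
}

Running this prints "or: 0x44, masked: 0", matching the mismatch described in the test case.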
+Regards +Stevie + +On Thu, Aug 5, 2021 at 1:33 PM Paolo Bonzini < +pbonzini@redhat.com +> wrote: +On 05/08/21 13:24, Paolo Bonzini wrote: +> On 05/08/21 11:51, Stevie Lavern wrote: +>> +>> Shouldn't it be: +>> eflags = cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK); +>> as eflags is entirely reevaluated by "cpu_cc_compute_all" ? +> +> No, both are wrong. env->eflags contains flags other than the +> arithmetic flags (OF/SF/ZF/AF/PF/CF) and those have to be preserved. +> +> The right code is in helper_read_eflags. You can move it into +> cpu_compute_eflags, and make helper_read_eflags use it. +Ah, actually the two are really the same, the TF/VM bits do not apply to +cpu_compute_eflags so it's correct. +What seems wrong is migration of the EFLAGS register. There should be +code in cpu_pre_save and cpu_post_load to special-case it and setup +CC_DST/CC_OP as done in cpu_load_eflags. +Also, cpu_load_eflags should assert that update_mask does not include +any of the arithmetic flags. +Paolo + diff --git a/classification_output/01/other/7427991 b/classification_output/01/other/7427991 new file mode 100644 index 000000000..7c8b06ccf --- /dev/null +++ b/classification_output/01/other/7427991 @@ -0,0 +1,313 @@ +other: 0.963 +semantic: 0.950 +instruction: 0.929 +mistranslation: 0.770 + +[Qemu-devel] [BUG] 216 Alerts reported by LGTM for QEMU (some might be release critical) + +Hi, +LGTM reports 16 errors, 81 warnings and 119 recommendations: +https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list +. +Some of them are already know (wrong format strings), others look like +real errors: +- several multiplication results which don't work as they should in +contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only +32 bit!), target/i386/translate.c and other files +- potential buffer overflows in gdbstub.c and other files +I am afraid that the overflows in the block code are release critical, +maybe that in target/i386/translate.c and other errors, too. +About half of the alerts are issues which can be fixed later. + +Regards + +Stefan + +On 13/07/19 19:46, Stefan Weil wrote: +> +> +LGTM reports 16 errors, 81 warnings and 119 recommendations: +> +https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list +. +> +> +Some of them are already know (wrong format strings), others look like +> +real errors: +> +> +- several multiplication results which don't work as they should in +> +contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only +> +32 bit!), target/i386/translate.c and other files +m->nb_clusters here is limited by s->l2_slice_size (see for example +handle_alloc) so I wouldn't be surprised if this is a false positive. I +couldn't find this particular multiplication in Coverity, but it has +about 250 issues marked as intentional or false positive so there's +probably a lot of overlap with what LGTM found. + +Paolo + +Am 13.07.2019 um 21:42 schrieb Paolo Bonzini: +> +On 13/07/19 19:46, Stefan Weil wrote: +> +> LGTM reports 16 errors, 81 warnings and 119 recommendations: +> +> +https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list +. 
+> +> +> +> Some of them are already known (wrong format strings), others look like +> +> real errors: +> +> +> +> - several multiplication results which don't work as they should in +> +> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only +> +> 32 bit!), target/i386/translate.c and other files +> +m->nb_clusters here is limited by s->l2_slice_size (see for example +> +handle_alloc) so I wouldn't be surprised if this is a false positive. I +> +couldn't find this particular multiplication in Coverity, but it has +> +about 250 issues marked as intentional or false positive so there's +> +probably a lot of overlap with what LGTM found. +> +> +Paolo +> +From other projects I know that there is a certain overlap between the +results from Coverity Scan an LGTM, but it is good to have both +analyzers, and the results from LGTM are typically quite reliable. + +Even if we know that there is no multiplication overflow, the code could +be modified. Either the assigned value should use the same data type as +the factors (possible when there is never an overflow, avoids a size +extension), or the multiplication could use the larger data type by +adding a type cast to one of the factors (then an overflow cannot +happen, static code analysers and human reviewers have an easier job, +but the multiplication costs more time). + +Stefan + +Am 14.07.2019 um 15:28 hat Stefan Weil geschrieben: +> +Am 13.07.2019 um 21:42 schrieb Paolo Bonzini: +> +> On 13/07/19 19:46, Stefan Weil wrote: +> +>> LGTM reports 16 errors, 81 warnings and 119 recommendations: +> +>> +https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list +. +> +>> +> +>> Some of them are already known (wrong format strings), others look like +> +>> real errors: +> +>> +> +>> - several multiplication results which don't work as they should in +> +>> contrib/vhost-user-gpu, block/* (m->nb_clusters * s->cluster_size only +> +>> 32 bit!), target/i386/translate.c and other files +Request sizes are limited to 32 bit in the generic block layer before +they are even passed to the individual block drivers, so most if not all +of these are going to be false positives. + +> +> m->nb_clusters here is limited by s->l2_slice_size (see for example +> +> handle_alloc) so I wouldn't be surprised if this is a false positive. I +> +> couldn't find this particular multiplication in Coverity, but it has +> +> about 250 issues marked as intentional or false positive so there's +> +> probably a lot of overlap with what LGTM found. +> +> +> +> Paolo +> +> +From other projects I know that there is a certain overlap between the +> +results from Coverity Scan an LGTM, but it is good to have both +> +analyzers, and the results from LGTM are typically quite reliable. +> +> +Even if we know that there is no multiplication overflow, the code could +> +be modified. Either the assigned value should use the same data type as +> +the factors (possible when there is never an overflow, avoids a size +> +extension), or the multiplication could use the larger data type by +> +adding a type cast to one of the factors (then an overflow cannot +> +happen, static code analysers and human reviewers have an easier job, +> +but the multiplication costs more time). +But if you look at the code we're talking about, you see that it's +complaining about things where being more explicit would make things +less readable. 
+ +For example, if complains about the multiplication in this line: + + s->file_size += n * s->header.cluster_size; + +We know that n * s->header.cluster_size fits in 32 bits, but +s->file_size is 64 bits (and has to be 64 bits). Do you really think we +should introduce another uint32_t variable to store the intermediate +result? And if we cast n to uint64_t, not only might the multiplication +cost more time, but also human readers would wonder why the result could +become larger than 32 bits. So a cast would be misleading. + + +It also complains about this line: + + ret = bdrv_truncate(bs->file, (3 + l1_clusters) * s->cluster_size, + PREALLOC_MODE_OFF, &local_err); + +Here, we don't even assign the result to a 64 bit variable, but just +pass it to a function which takes a 64 bit parameter. Again, I don't +think introducing additional variables for the intermediate result or +adding casts would be an improvement of the situation. + + +So I don't think this is a good enough tool to base our code on what it +does and doesn't understand. It would have too much of a negative impact +on our code. We'd rather need a way to mark false positives as such and +move on without changing the code in such cases. + +Kevin + +On Sat, 13 Jul 2019 at 18:46, Stefan Weil <address@hidden> wrote: +> +LGTM reports 16 errors, 81 warnings and 119 recommendations: +> +https://lgtm.com/projects/g/qemu/qemu/alerts/?mode=list +. +I had a look at some of these before, but mostly I came +to the conclusion that it wasn't worth trying to put the +effort into keeping up with the site because they didn't +seem to provide any useful way to mark things as false +positives. Coverity has its flaws but at least you can do +that kind of thing in its UI (it runs at about a 33% fp +rate, I think.) "Analyzer thinks this multiply can overflow +but in fact it's not possible" is quite a common false +positive cause... + +Anyway, if you want to fish out specific issues, analyse +whether they're false positive or real, and report them +to the mailing list as followups to the patches which +introduced the issue, that's probably the best way for +us to make use of this analyzer. (That is essentially +what I do for coverity.) + +thanks +-- PMM + +Am 14.07.2019 um 19:30 schrieb Peter Maydell: +[...] +> +"Analyzer thinks this multiply can overflow +> +but in fact it's not possible" is quite a common false +> +positive cause... +The analysers don't complain because a multiply can overflow. + +They complain because the code indicates that a larger result is +expected, for example uint64_t = uint32_t * uint32_t. They would not +complain for the same multiplication if it were assigned to a uint32_t. + +So there is a simple solution to write the code in a way which avoids +false positives... + +Stefan + +Stefan Weil <address@hidden> writes: + +> +Am 14.07.2019 um 19:30 schrieb Peter Maydell: +> +[...] +> +> "Analyzer thinks this multiply can overflow +> +> but in fact it's not possible" is quite a common false +> +> positive cause... +> +> +> +The analysers don't complain because a multiply can overflow. +> +> +They complain because the code indicates that a larger result is +> +expected, for example uint64_t = uint32_t * uint32_t. They would not +> +complain for the same multiplication if it were assigned to a uint32_t. +I agree this is an anti-pattern. + +> +So there is a simple solution to write the code in a way which avoids +> +false positives... 
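For illustration, a minimal sketch of the three variants discussed in this thread (invented variable names, not code taken from QEMU): the 32-bit multiply assigned to a 64-bit value that the analyzers flag, the cast that widens the multiply so it cannot overflow, and the same-type assignment that documents the product is known to fit:

#include <stdint.h>

/* Flagged form: the multiply is done in 32 bits, then the result is widened. */
uint64_t size_flagged(uint32_t nb_clusters, uint32_t cluster_size)
{
    return nb_clusters * cluster_size;
}

/* Cast one factor: the multiply itself is 64-bit, so it cannot overflow. */
uint64_t size_cast(uint32_t nb_clusters, uint32_t cluster_size)
{
    return (uint64_t)nb_clusters * cluster_size;
}

/* Same-type assignment: states that the product is known to fit in 32 bits. */
uint32_t size_narrow(uint32_t nb_clusters, uint32_t cluster_size)
{
    return nb_clusters * cluster_size;
}
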
+You wrote elsewhere in this thread: + + Either the assigned value should use the same data type as the + factors (possible when there is never an overflow, avoids a size + extension), or the multiplication could use the larger data type by + adding a type cast to one of the factors (then an overflow cannot + happen, static code analysers and human reviewers have an easier + job, but the multiplication costs more time). + +Makes sense to me. + +On 7/14/19 5:30 PM, Peter Maydell wrote: +> +I had a look at some of these before, but mostly I came +> +to the conclusion that it wasn't worth trying to put the +> +effort into keeping up with the site because they didn't +> +seem to provide any useful way to mark things as false +> +positives. Coverity has its flaws but at least you can do +> +that kind of thing in its UI (it runs at about a 33% fp +> +rate, I think.) +Yes, LGTM wants you to modify the source code with + + /* lgtm [cpp/some-warning-code] */ + +and on the same line as the reported problem. Which is mildly annoying in that +you're definitely committing to LGTM in the long term. Also for any +non-trivial bit of code, it will almost certainly run over 80 columns. + + +r~ + diff --git a/classification_output/01/other/7639274 b/classification_output/01/other/7639274 new file mode 100644 index 000000000..cbe17cc27 --- /dev/null +++ b/classification_output/01/other/7639274 @@ -0,0 +1,1310 @@ +other: 0.945 +semantic: 0.928 +instruction: 0.928 +mistranslation: 0.841 + +[Qemu-devel] [RFC/BUG] xen-mapcache: buggy invalidate map cache? + +Hi, + +In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +instead of first level entry (if map to rom other than guest memory +comes first), while in xen_invalidate_map_cache(), when VM ballooned +out memory, qemu did not invalidate cache entries in linked +list(entry->next), so when VM balloon back in memory, gfns probably +mapped to different mfns, thus if guest asks device to DMA to these +GPA, qemu may DMA to stale MFNs. + +So I think in xen_invalidate_map_cache() linked lists should also be +checked and invalidated. + +Whatâs your opinion? Is this a bug? Is my analyze correct? + +On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +Hi, +> +> +In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +> +instead of first level entry (if map to rom other than guest memory +> +comes first), while in xen_invalidate_map_cache(), when VM ballooned +> +out memory, qemu did not invalidate cache entries in linked +> +list(entry->next), so when VM balloon back in memory, gfns probably +> +mapped to different mfns, thus if guest asks device to DMA to these +> +GPA, qemu may DMA to stale MFNs. +> +> +So I think in xen_invalidate_map_cache() linked lists should also be +> +checked and invalidated. +> +> +Whatâs your opinion? Is this a bug? Is my analyze correct? +Added Jun Nakajima and Alexander Graf + +On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +> Hi, +> +> +> +> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +> +> instead of first level entry (if map to rom other than guest memory +> +> comes first), while in xen_invalidate_map_cache(), when VM ballooned +> +> out memory, qemu did not invalidate cache entries in linked +> +> list(entry->next), so when VM balloon back in memory, gfns probably +> +> mapped to different mfns, thus if guest asks device to DMA to these +> +> GPA, qemu may DMA to stale MFNs. 
+> +> +> +> So I think in xen_invalidate_map_cache() linked lists should also be +> +> checked and invalidated. +> +> +> +> Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> +Added Jun Nakajima and Alexander Graf +And correct Stefano Stabellini's email address. + +On Mon, 10 Apr 2017 00:36:02 +0800 +hrg <address@hidden> wrote: + +Hi, + +> +On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +> On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +>> Hi, +> +>> +> +>> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +> +>> instead of first level entry (if map to rom other than guest memory +> +>> comes first), while in xen_invalidate_map_cache(), when VM ballooned +> +>> out memory, qemu did not invalidate cache entries in linked +> +>> list(entry->next), so when VM balloon back in memory, gfns probably +> +>> mapped to different mfns, thus if guest asks device to DMA to these +> +>> GPA, qemu may DMA to stale MFNs. +> +>> +> +>> So I think in xen_invalidate_map_cache() linked lists should also be +> +>> checked and invalidated. +> +>> +> +>> Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> +> +> Added Jun Nakajima and Alexander Graf +> +And correct Stefano Stabellini's email address. +There is a real issue with the xen-mapcache corruption in fact. I encountered +it a few months ago while experimenting with Q35 support on Xen. Q35 emulation +uses an AHCI controller by default, along with NCQ mode enabled. The issue can +be (somewhat) easily reproduced there, though using a normal i440 emulation +might possibly allow to reproduce the issue as well, using a dedicated test +code from a guest side. In case of Q35+NCQ the issue can be reproduced "as is". + +The issue occurs when a guest domain performs an intensive disk I/O, ex. while +guest OS booting. QEMU crashes with "Bad ram offset 980aa000" +message logged, where the address is different each time. The hard thing with +this issue is that it has a very low reproducibility rate. + +The corruption happens when there are multiple I/O commands in the NCQ queue. +So there are overlapping emulated DMA operations in flight and QEMU uses a +sequence of mapcache actions which can be executed in the "wrong" order thus +leading to an inconsistent xen-mapcache - so a bad address from the wrong +entry is returned. + +The bad thing with this issue is that QEMU crash due to "Bad ram offset" +appearance is a relatively good situation in the sense that this is a caught +error. But there might be a much worse (artificial) situation where the returned +address looks valid but points to a different mapped memory. + +The fix itself is not hard (ex. an additional checked field in MapCacheEntry), +but there is a need of some reliable way to test it considering the low +reproducibility rate. + +Regards, +Alex + +On Mon, 10 Apr 2017, hrg wrote: +> +On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +> On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +>> Hi, +> +>> +> +>> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +> +>> instead of first level entry (if map to rom other than guest memory +> +>> comes first), while in xen_invalidate_map_cache(), when VM ballooned +> +>> out memory, qemu did not invalidate cache entries in linked +> +>> list(entry->next), so when VM balloon back in memory, gfns probably +> +>> mapped to different mfns, thus if guest asks device to DMA to these +> +>> GPA, qemu may DMA to stale MFNs. 
+> +>> +> +>> So I think in xen_invalidate_map_cache() linked lists should also be +> +>> checked and invalidated. +> +>> +> +>> Whatâs your opinion? Is this a bug? Is my analyze correct? +Yes, you are right. We need to go through the list for each element of +the array in xen_invalidate_map_cache. Can you come up with a patch? + +On Mon, 10 Apr 2017, Stefano Stabellini wrote: +> +On Mon, 10 Apr 2017, hrg wrote: +> +> On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +> > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +> >> Hi, +> +> >> +> +> >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +> +> >> instead of first level entry (if map to rom other than guest memory +> +> >> comes first), while in xen_invalidate_map_cache(), when VM ballooned +> +> >> out memory, qemu did not invalidate cache entries in linked +> +> >> list(entry->next), so when VM balloon back in memory, gfns probably +> +> >> mapped to different mfns, thus if guest asks device to DMA to these +> +> >> GPA, qemu may DMA to stale MFNs. +> +> >> +> +> >> So I think in xen_invalidate_map_cache() linked lists should also be +> +> >> checked and invalidated. +> +> >> +> +> >> Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> +Yes, you are right. We need to go through the list for each element of +> +the array in xen_invalidate_map_cache. Can you come up with a patch? +I spoke too soon. In the regular case there should be no locked mappings +when xen_invalidate_map_cache is called (see the DPRINTF warning at the +beginning of the functions). Without locked mappings, there should never +be more than one element in each list (see xen_map_cache_unlocked: +entry->lock == true is a necessary condition to append a new entry to +the list, otherwise it is just remapped). + +Can you confirm that what you are seeing are locked mappings +when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +by turning it into a printf or by defininig MAPCACHE_DEBUG. + +On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +<address@hidden> wrote: +> +On Mon, 10 Apr 2017, Stefano Stabellini wrote: +> +> On Mon, 10 Apr 2017, hrg wrote: +> +> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +> > >> Hi, +> +> > >> +> +> > >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +> +> > >> instead of first level entry (if map to rom other than guest memory +> +> > >> comes first), while in xen_invalidate_map_cache(), when VM ballooned +> +> > >> out memory, qemu did not invalidate cache entries in linked +> +> > >> list(entry->next), so when VM balloon back in memory, gfns probably +> +> > >> mapped to different mfns, thus if guest asks device to DMA to these +> +> > >> GPA, qemu may DMA to stale MFNs. +> +> > >> +> +> > >> So I think in xen_invalidate_map_cache() linked lists should also be +> +> > >> checked and invalidated. +> +> > >> +> +> > >> Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> +> +> Yes, you are right. We need to go through the list for each element of +> +> the array in xen_invalidate_map_cache. Can you come up with a patch? +> +> +I spoke too soon. In the regular case there should be no locked mappings +> +when xen_invalidate_map_cache is called (see the DPRINTF warning at the +> +beginning of the functions). 
Without locked mappings, there should never +> +be more than one element in each list (see xen_map_cache_unlocked: +> +entry->lock == true is a necessary condition to append a new entry to +> +the list, otherwise it is just remapped). +> +> +Can you confirm that what you are seeing are locked mappings +> +when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +> +by turning it into a printf or by defininig MAPCACHE_DEBUG. +In fact, I think the DPRINTF above is incorrect too. In +pci_add_option_rom(), rtl8139 rom is locked mapped in +pci_add_option_rom->memory_region_get_ram_ptr (after +memory_region_init_ram). So actually I think we should remove the +DPRINTF warning as it is normal. + +On Tue, 11 Apr 2017, hrg wrote: +> +On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +> +<address@hidden> wrote: +> +> On Mon, 10 Apr 2017, Stefano Stabellini wrote: +> +>> On Mon, 10 Apr 2017, hrg wrote: +> +>> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +>> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +>> > >> Hi, +> +>> > >> +> +>> > >> In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +> +>> > >> instead of first level entry (if map to rom other than guest memory +> +>> > >> comes first), while in xen_invalidate_map_cache(), when VM ballooned +> +>> > >> out memory, qemu did not invalidate cache entries in linked +> +>> > >> list(entry->next), so when VM balloon back in memory, gfns probably +> +>> > >> mapped to different mfns, thus if guest asks device to DMA to these +> +>> > >> GPA, qemu may DMA to stale MFNs. +> +>> > >> +> +>> > >> So I think in xen_invalidate_map_cache() linked lists should also be +> +>> > >> checked and invalidated. +> +>> > >> +> +>> > >> Whatâs your opinion? Is this a bug? Is my analyze correct? +> +>> +> +>> Yes, you are right. We need to go through the list for each element of +> +>> the array in xen_invalidate_map_cache. Can you come up with a patch? +> +> +> +> I spoke too soon. In the regular case there should be no locked mappings +> +> when xen_invalidate_map_cache is called (see the DPRINTF warning at the +> +> beginning of the functions). Without locked mappings, there should never +> +> be more than one element in each list (see xen_map_cache_unlocked: +> +> entry->lock == true is a necessary condition to append a new entry to +> +> the list, otherwise it is just remapped). +> +> +> +> Can you confirm that what you are seeing are locked mappings +> +> when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +> +> by turning it into a printf or by defininig MAPCACHE_DEBUG. +> +> +In fact, I think the DPRINTF above is incorrect too. In +> +pci_add_option_rom(), rtl8139 rom is locked mapped in +> +pci_add_option_rom->memory_region_get_ram_ptr (after +> +memory_region_init_ram). So actually I think we should remove the +> +DPRINTF warning as it is normal. +Let me explain why the DPRINTF warning is there: emulated dma operations +can involve locked mappings. Once a dma operation completes, the related +mapping is unlocked and can be safely destroyed. But if we destroy a +locked mapping in xen_invalidate_map_cache, while a dma is still +ongoing, QEMU will crash. We cannot handle that case. + +However, the scenario you described is different. It has nothing to do +with DMA. It looks like pci_add_option_rom calls +memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a +locked mapping and it is never unlocked or destroyed. 
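A rough sketch of the lifecycle described above (the function names follow hw/i386/xen/xen-mapcache.c loosely; the surrounding calls are illustrative assumptions, not exact QEMU code):

    /* Normal emulated-DMA case: map with the lock taken, use the mapping,
     * then drop the lock once the DMA has completed. */
    ptr = xen_map_cache(guest_addr, len, 1 /* lock */);   /* entry->lock++ */
    do_dma(ptr, len);                                     /* hypothetical DMA user */
    xen_invalidate_map_cache_entry(ptr);                  /* entry->lock--, entry may now be torn down */

    /* ROM case discussed here: memory_region_get_ram_ptr() ends up taking a
     * locked mapping, but no matching unlock ever happens, so the entry
     * stays locked when xen_invalidate_map_cache() later runs. */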
+ +It looks like "ptr" is not used after pci_add_option_rom returns. Does +the append patch fix the problem you are seeing? For the proper fix, I +think we probably need some sort of memory_region_unmap wrapper or maybe +a call to address_space_unmap. + + +diff --git a/hw/pci/pci.c b/hw/pci/pci.c +index e6b08e1..04f98b7 100644 +--- a/hw/pci/pci.c ++++ b/hw/pci/pci.c +@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool +is_default_rom, + } + + pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); ++ xen_invalidate_map_cache_entry(ptr); + } + + static void pci_del_option_rom(PCIDevice *pdev) + +On Tue, 11 Apr 2017 15:32:09 -0700 (PDT) +Stefano Stabellini <address@hidden> wrote: + +> +On Tue, 11 Apr 2017, hrg wrote: +> +> On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +> +> <address@hidden> wrote: +> +> > On Mon, 10 Apr 2017, Stefano Stabellini wrote: +> +> >> On Mon, 10 Apr 2017, hrg wrote: +> +> >> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +> >> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +> >> > >> Hi, +> +> >> > >> +> +> >> > >> In xen_map_cache_unlocked(), map to guest memory maybe in +> +> >> > >> entry->next instead of first level entry (if map to rom other than +> +> >> > >> guest memory comes first), while in xen_invalidate_map_cache(), +> +> >> > >> when VM ballooned out memory, qemu did not invalidate cache entries +> +> >> > >> in linked list(entry->next), so when VM balloon back in memory, +> +> >> > >> gfns probably mapped to different mfns, thus if guest asks device +> +> >> > >> to DMA to these GPA, qemu may DMA to stale MFNs. +> +> >> > >> +> +> >> > >> So I think in xen_invalidate_map_cache() linked lists should also be +> +> >> > >> checked and invalidated. +> +> >> > >> +> +> >> > >> Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> >> +> +> >> Yes, you are right. We need to go through the list for each element of +> +> >> the array in xen_invalidate_map_cache. Can you come up with a patch? +> +> > +> +> > I spoke too soon. In the regular case there should be no locked mappings +> +> > when xen_invalidate_map_cache is called (see the DPRINTF warning at the +> +> > beginning of the functions). Without locked mappings, there should never +> +> > be more than one element in each list (see xen_map_cache_unlocked: +> +> > entry->lock == true is a necessary condition to append a new entry to +> +> > the list, otherwise it is just remapped). +> +> > +> +> > Can you confirm that what you are seeing are locked mappings +> +> > when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +> +> > by turning it into a printf or by defininig MAPCACHE_DEBUG. +> +> +> +> In fact, I think the DPRINTF above is incorrect too. In +> +> pci_add_option_rom(), rtl8139 rom is locked mapped in +> +> pci_add_option_rom->memory_region_get_ram_ptr (after +> +> memory_region_init_ram). So actually I think we should remove the +> +> DPRINTF warning as it is normal. +> +> +Let me explain why the DPRINTF warning is there: emulated dma operations +> +can involve locked mappings. Once a dma operation completes, the related +> +mapping is unlocked and can be safely destroyed. But if we destroy a +> +locked mapping in xen_invalidate_map_cache, while a dma is still +> +ongoing, QEMU will crash. We cannot handle that case. +> +> +However, the scenario you described is different. It has nothing to do +> +with DMA. It looks like pci_add_option_rom calls +> +memory_region_get_ram_ptr to map the rtl8139 rom. 
The mapping is a +> +locked mapping and it is never unlocked or destroyed. +> +> +It looks like "ptr" is not used after pci_add_option_rom returns. Does +> +the append patch fix the problem you are seeing? For the proper fix, I +> +think we probably need some sort of memory_region_unmap wrapper or maybe +> +a call to address_space_unmap. +Hmm, for some reason my message to the Xen-devel list got rejected but was sent +to Qemu-devel instead, without any notice. Sorry if I'm missing something +obvious as a list newbie. + +Stefano, hrg, + +There is an issue with inconsistency between the list of normal MapCacheEntry's +and their 'reverse' counterparts - MapCacheRev's in locked_entries. +When bad situation happens, there are multiple (locked) MapCacheEntry +entries in the bucket's linked list along with a number of MapCacheRev's. And +when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the +first list and calculates a wrong pointer from it which may then be caught with +the "Bad RAM offset" check (or not). Mapcache invalidation might be related to +this issue as well I think. + +I'll try to provide a test code which can reproduce the issue from the +guest side using an emulated IDE controller, though it's much simpler to achieve +this result with an AHCI controller using multiple NCQ I/O commands. So far I've +seen this issue only with Windows 7 (and above) guest on AHCI, but any block I/O +DMA should be enough I think. + +On 2017/4/12 14:17, Alexey G wrote: +On Tue, 11 Apr 2017 15:32:09 -0700 (PDT) +Stefano Stabellini <address@hidden> wrote: +On Tue, 11 Apr 2017, hrg wrote: +On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +<address@hidden> wrote: +On Mon, 10 Apr 2017, Stefano Stabellini wrote: +On Mon, 10 Apr 2017, hrg wrote: +On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +Hi, + +In xen_map_cache_unlocked(), map to guest memory maybe in +entry->next instead of first level entry (if map to rom other than +guest memory comes first), while in xen_invalidate_map_cache(), +when VM ballooned out memory, qemu did not invalidate cache entries +in linked list(entry->next), so when VM balloon back in memory, +gfns probably mapped to different mfns, thus if guest asks device +to DMA to these GPA, qemu may DMA to stale MFNs. + +So I think in xen_invalidate_map_cache() linked lists should also be +checked and invalidated. + +Whatâs your opinion? Is this a bug? Is my analyze correct? +Yes, you are right. We need to go through the list for each element of +the array in xen_invalidate_map_cache. Can you come up with a patch? +I spoke too soon. In the regular case there should be no locked mappings +when xen_invalidate_map_cache is called (see the DPRINTF warning at the +beginning of the functions). Without locked mappings, there should never +be more than one element in each list (see xen_map_cache_unlocked: +entry->lock == true is a necessary condition to append a new entry to +the list, otherwise it is just remapped). + +Can you confirm that what you are seeing are locked mappings +when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +by turning it into a printf or by defininig MAPCACHE_DEBUG. +In fact, I think the DPRINTF above is incorrect too. In +pci_add_option_rom(), rtl8139 rom is locked mapped in +pci_add_option_rom->memory_region_get_ram_ptr (after +memory_region_init_ram). So actually I think we should remove the +DPRINTF warning as it is normal. 
+Let me explain why the DPRINTF warning is there: emulated dma operations +can involve locked mappings. Once a dma operation completes, the related +mapping is unlocked and can be safely destroyed. But if we destroy a +locked mapping in xen_invalidate_map_cache, while a dma is still +ongoing, QEMU will crash. We cannot handle that case. + +However, the scenario you described is different. It has nothing to do +with DMA. It looks like pci_add_option_rom calls +memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a +locked mapping and it is never unlocked or destroyed. + +It looks like "ptr" is not used after pci_add_option_rom returns. Does +the append patch fix the problem you are seeing? For the proper fix, I +think we probably need some sort of memory_region_unmap wrapper or maybe +a call to address_space_unmap. +Hmm, for some reason my message to the Xen-devel list got rejected but was sent +to Qemu-devel instead, without any notice. Sorry if I'm missing something +obvious as a list newbie. + +Stefano, hrg, + +There is an issue with inconsistency between the list of normal MapCacheEntry's +and their 'reverse' counterparts - MapCacheRev's in locked_entries. +When bad situation happens, there are multiple (locked) MapCacheEntry +entries in the bucket's linked list along with a number of MapCacheRev's. And +when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the +first list and calculates a wrong pointer from it which may then be caught with +the "Bad RAM offset" check (or not). Mapcache invalidation might be related to +this issue as well I think. + +I'll try to provide a test code which can reproduce the issue from the +guest side using an emulated IDE controller, though it's much simpler to achieve +this result with an AHCI controller using multiple NCQ I/O commands. So far I've +seen this issue only with Windows 7 (and above) guest on AHCI, but any block I/O +DMA should be enough I think. +Yes, I think there may be other bugs lurking, considering the complexity, +though we need to reproduce it if we want to delve into it. + +On Wed, 12 Apr 2017, Alexey G wrote: +> +On Tue, 11 Apr 2017 15:32:09 -0700 (PDT) +> +Stefano Stabellini <address@hidden> wrote: +> +> +> On Tue, 11 Apr 2017, hrg wrote: +> +> > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +> +> > <address@hidden> wrote: +> +> > > On Mon, 10 Apr 2017, Stefano Stabellini wrote: +> +> > >> On Mon, 10 Apr 2017, hrg wrote: +> +> > >> > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +> > >> > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +> > >> > >> Hi, +> +> > >> > >> +> +> > >> > >> In xen_map_cache_unlocked(), map to guest memory maybe in +> +> > >> > >> entry->next instead of first level entry (if map to rom other than +> +> > >> > >> guest memory comes first), while in xen_invalidate_map_cache(), +> +> > >> > >> when VM ballooned out memory, qemu did not invalidate cache +> +> > >> > >> entries +> +> > >> > >> in linked list(entry->next), so when VM balloon back in memory, +> +> > >> > >> gfns probably mapped to different mfns, thus if guest asks device +> +> > >> > >> to DMA to these GPA, qemu may DMA to stale MFNs. +> +> > >> > >> +> +> > >> > >> So I think in xen_invalidate_map_cache() linked lists should also +> +> > >> > >> be +> +> > >> > >> checked and invalidated. +> +> > >> > >> +> +> > >> > >> Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> > >> +> +> > >> Yes, you are right. 
We need to go through the list for each element of +> +> > >> the array in xen_invalidate_map_cache. Can you come up with a patch? +> +> > > +> +> > > I spoke too soon. In the regular case there should be no locked mappings +> +> > > when xen_invalidate_map_cache is called (see the DPRINTF warning at the +> +> > > beginning of the functions). Without locked mappings, there should never +> +> > > be more than one element in each list (see xen_map_cache_unlocked: +> +> > > entry->lock == true is a necessary condition to append a new entry to +> +> > > the list, otherwise it is just remapped). +> +> > > +> +> > > Can you confirm that what you are seeing are locked mappings +> +> > > when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +> +> > > by turning it into a printf or by defininig MAPCACHE_DEBUG. +> +> > +> +> > In fact, I think the DPRINTF above is incorrect too. In +> +> > pci_add_option_rom(), rtl8139 rom is locked mapped in +> +> > pci_add_option_rom->memory_region_get_ram_ptr (after +> +> > memory_region_init_ram). So actually I think we should remove the +> +> > DPRINTF warning as it is normal. +> +> +> +> Let me explain why the DPRINTF warning is there: emulated dma operations +> +> can involve locked mappings. Once a dma operation completes, the related +> +> mapping is unlocked and can be safely destroyed. But if we destroy a +> +> locked mapping in xen_invalidate_map_cache, while a dma is still +> +> ongoing, QEMU will crash. We cannot handle that case. +> +> +> +> However, the scenario you described is different. It has nothing to do +> +> with DMA. It looks like pci_add_option_rom calls +> +> memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a +> +> locked mapping and it is never unlocked or destroyed. +> +> +> +> It looks like "ptr" is not used after pci_add_option_rom returns. Does +> +> the append patch fix the problem you are seeing? For the proper fix, I +> +> think we probably need some sort of memory_region_unmap wrapper or maybe +> +> a call to address_space_unmap. +> +> +Hmm, for some reason my message to the Xen-devel list got rejected but was +> +sent +> +to Qemu-devel instead, without any notice. Sorry if I'm missing something +> +obvious as a list newbie. +> +> +Stefano, hrg, +> +> +There is an issue with inconsistency between the list of normal +> +MapCacheEntry's +> +and their 'reverse' counterparts - MapCacheRev's in locked_entries. +> +When bad situation happens, there are multiple (locked) MapCacheEntry +> +entries in the bucket's linked list along with a number of MapCacheRev's. And +> +when it comes to a reverse lookup, xen-mapcache picks the wrong entry from the +> +first list and calculates a wrong pointer from it which may then be caught +> +with +> +the "Bad RAM offset" check (or not). Mapcache invalidation might be related to +> +this issue as well I think. +> +> +I'll try to provide a test code which can reproduce the issue from the +> +guest side using an emulated IDE controller, though it's much simpler to +> +achieve +> +this result with an AHCI controller using multiple NCQ I/O commands. So far +> +I've +> +seen this issue only with Windows 7 (and above) guest on AHCI, but any block +> +I/O +> +DMA should be enough I think. +That would be helpful. Please see if you can reproduce it after fixing +the other issue ( +http://marc.info/?l=qemu-devel&m=149195042500707&w=2 +). 
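For reference, the change hrg proposes earlier in the thread amounts to something like the following inside xen_invalidate_map_cache() (a sketch assuming the MapCacheEntry fields vaddr_base, size, lock and next from xen-mapcache.c; not a tested patch, and it glosses over freeing the chained list nodes themselves):

    for (i = 0; i < mapcache->nr_buckets; i++) {
        MapCacheEntry *entry;

        /* Walk the whole per-bucket list, not only the head entry. */
        for (entry = &mapcache->entry[i]; entry != NULL; entry = entry->next) {
            if (entry->vaddr_base == NULL) {
                continue;
            }
            if (entry->lock > 0) {
                /* Possibly the target of an in-flight DMA; see the
                 * discussion above about why such entries must be kept. */
                continue;
            }
            munmap(entry->vaddr_base, entry->size);
            entry->vaddr_base = NULL;
        }
    }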
+ +On 2017/4/12 6:32, Stefano Stabellini wrote: +On Tue, 11 Apr 2017, hrg wrote: +On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +<address@hidden> wrote: +On Mon, 10 Apr 2017, Stefano Stabellini wrote: +On Mon, 10 Apr 2017, hrg wrote: +On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +Hi, + +In xen_map_cache_unlocked(), map to guest memory maybe in entry->next +instead of first level entry (if map to rom other than guest memory +comes first), while in xen_invalidate_map_cache(), when VM ballooned +out memory, qemu did not invalidate cache entries in linked +list(entry->next), so when VM balloon back in memory, gfns probably +mapped to different mfns, thus if guest asks device to DMA to these +GPA, qemu may DMA to stale MFNs. + +So I think in xen_invalidate_map_cache() linked lists should also be +checked and invalidated. + +Whatâs your opinion? Is this a bug? Is my analyze correct? +Yes, you are right. We need to go through the list for each element of +the array in xen_invalidate_map_cache. Can you come up with a patch? +I spoke too soon. In the regular case there should be no locked mappings +when xen_invalidate_map_cache is called (see the DPRINTF warning at the +beginning of the functions). Without locked mappings, there should never +be more than one element in each list (see xen_map_cache_unlocked: +entry->lock == true is a necessary condition to append a new entry to +the list, otherwise it is just remapped). + +Can you confirm that what you are seeing are locked mappings +when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +by turning it into a printf or by defininig MAPCACHE_DEBUG. +In fact, I think the DPRINTF above is incorrect too. In +pci_add_option_rom(), rtl8139 rom is locked mapped in +pci_add_option_rom->memory_region_get_ram_ptr (after +memory_region_init_ram). So actually I think we should remove the +DPRINTF warning as it is normal. +Let me explain why the DPRINTF warning is there: emulated dma operations +can involve locked mappings. Once a dma operation completes, the related +mapping is unlocked and can be safely destroyed. But if we destroy a +locked mapping in xen_invalidate_map_cache, while a dma is still +ongoing, QEMU will crash. We cannot handle that case. + +However, the scenario you described is different. It has nothing to do +with DMA. It looks like pci_add_option_rom calls +memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a +locked mapping and it is never unlocked or destroyed. + +It looks like "ptr" is not used after pci_add_option_rom returns. Does +the append patch fix the problem you are seeing? For the proper fix, I +think we probably need some sort of memory_region_unmap wrapper or maybe +a call to address_space_unmap. +Yes, I think so, maybe this is the proper way to fix this. 
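One possible shape of the "proper way", anticipating the point made below that generic PCI code should not call the Xen-specific helper directly; memory_region_drop_ram_ptr() here is a hypothetical wrapper name, not an existing API:

    pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom);
    /* Hypothetical generic wrapper; the thread suggests routing this through
     * something like address_space_unmap() instead of calling
     * xen_invalidate_map_cache_entry() from hw/pci/pci.c. */
    memory_region_drop_ram_ptr(&pdev->rom, ptr);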
+diff --git a/hw/pci/pci.c b/hw/pci/pci.c +index e6b08e1..04f98b7 100644 +--- a/hw/pci/pci.c ++++ b/hw/pci/pci.c +@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool +is_default_rom, + } +pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); ++ xen_invalidate_map_cache_entry(ptr); + } +static void pci_del_option_rom(PCIDevice *pdev) + +On Wed, 12 Apr 2017, Herongguang (Stephen) wrote: +> +On 2017/4/12 6:32, Stefano Stabellini wrote: +> +> On Tue, 11 Apr 2017, hrg wrote: +> +> > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +> +> > <address@hidden> wrote: +> +> > > On Mon, 10 Apr 2017, Stefano Stabellini wrote: +> +> > > > On Mon, 10 Apr 2017, hrg wrote: +> +> > > > > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +> +> > > > > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +> +> > > > > > > Hi, +> +> > > > > > > +> +> > > > > > > In xen_map_cache_unlocked(), map to guest memory maybe in +> +> > > > > > > entry->next +> +> > > > > > > instead of first level entry (if map to rom other than guest +> +> > > > > > > memory +> +> > > > > > > comes first), while in xen_invalidate_map_cache(), when VM +> +> > > > > > > ballooned +> +> > > > > > > out memory, qemu did not invalidate cache entries in linked +> +> > > > > > > list(entry->next), so when VM balloon back in memory, gfns +> +> > > > > > > probably +> +> > > > > > > mapped to different mfns, thus if guest asks device to DMA to +> +> > > > > > > these +> +> > > > > > > GPA, qemu may DMA to stale MFNs. +> +> > > > > > > +> +> > > > > > > So I think in xen_invalidate_map_cache() linked lists should +> +> > > > > > > also be +> +> > > > > > > checked and invalidated. +> +> > > > > > > +> +> > > > > > > Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> > > > Yes, you are right. We need to go through the list for each element of +> +> > > > the array in xen_invalidate_map_cache. Can you come up with a patch? +> +> > > I spoke too soon. In the regular case there should be no locked mappings +> +> > > when xen_invalidate_map_cache is called (see the DPRINTF warning at the +> +> > > beginning of the functions). Without locked mappings, there should never +> +> > > be more than one element in each list (see xen_map_cache_unlocked: +> +> > > entry->lock == true is a necessary condition to append a new entry to +> +> > > the list, otherwise it is just remapped). +> +> > > +> +> > > Can you confirm that what you are seeing are locked mappings +> +> > > when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +> +> > > by turning it into a printf or by defininig MAPCACHE_DEBUG. +> +> > In fact, I think the DPRINTF above is incorrect too. In +> +> > pci_add_option_rom(), rtl8139 rom is locked mapped in +> +> > pci_add_option_rom->memory_region_get_ram_ptr (after +> +> > memory_region_init_ram). So actually I think we should remove the +> +> > DPRINTF warning as it is normal. +> +> Let me explain why the DPRINTF warning is there: emulated dma operations +> +> can involve locked mappings. Once a dma operation completes, the related +> +> mapping is unlocked and can be safely destroyed. But if we destroy a +> +> locked mapping in xen_invalidate_map_cache, while a dma is still +> +> ongoing, QEMU will crash. We cannot handle that case. +> +> +> +> However, the scenario you described is different. It has nothing to do +> +> with DMA. It looks like pci_add_option_rom calls +> +> memory_region_get_ram_ptr to map the rtl8139 rom. 
The mapping is a +> +> locked mapping and it is never unlocked or destroyed. +> +> +> +> It looks like "ptr" is not used after pci_add_option_rom returns. Does +> +> the append patch fix the problem you are seeing? For the proper fix, I +> +> think we probably need some sort of memory_region_unmap wrapper or maybe +> +> a call to address_space_unmap. +> +> +Yes, I think so, maybe this is the proper way to fix this. +Would you be up for sending a proper patch and testing it? We cannot call +xen_invalidate_map_cache_entry directly from pci.c though, it would need +to be one of the other functions like address_space_unmap for example. + + +> +> diff --git a/hw/pci/pci.c b/hw/pci/pci.c +> +> index e6b08e1..04f98b7 100644 +> +> --- a/hw/pci/pci.c +> +> +++ b/hw/pci/pci.c +> +> @@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool +> +> is_default_rom, +> +> } +> +> pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); +> +> + xen_invalidate_map_cache_entry(ptr); +> +> } +> +> static void pci_del_option_rom(PCIDevice *pdev) + +On 2017/4/13 7:51, Stefano Stabellini wrote: +On Wed, 12 Apr 2017, Herongguang (Stephen) wrote: +On 2017/4/12 6:32, Stefano Stabellini wrote: +On Tue, 11 Apr 2017, hrg wrote: +On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +<address@hidden> wrote: +On Mon, 10 Apr 2017, Stefano Stabellini wrote: +On Mon, 10 Apr 2017, hrg wrote: +On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> wrote: +On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> wrote: +Hi, + +In xen_map_cache_unlocked(), map to guest memory maybe in +entry->next +instead of first level entry (if map to rom other than guest +memory +comes first), while in xen_invalidate_map_cache(), when VM +ballooned +out memory, qemu did not invalidate cache entries in linked +list(entry->next), so when VM balloon back in memory, gfns +probably +mapped to different mfns, thus if guest asks device to DMA to +these +GPA, qemu may DMA to stale MFNs. + +So I think in xen_invalidate_map_cache() linked lists should +also be +checked and invalidated. + +Whatâs your opinion? Is this a bug? Is my analyze correct? +Yes, you are right. We need to go through the list for each element of +the array in xen_invalidate_map_cache. Can you come up with a patch? +I spoke too soon. In the regular case there should be no locked mappings +when xen_invalidate_map_cache is called (see the DPRINTF warning at the +beginning of the functions). Without locked mappings, there should never +be more than one element in each list (see xen_map_cache_unlocked: +entry->lock == true is a necessary condition to append a new entry to +the list, otherwise it is just remapped). + +Can you confirm that what you are seeing are locked mappings +when xen_invalidate_map_cache is called? To find out, enable the DPRINTK +by turning it into a printf or by defininig MAPCACHE_DEBUG. +In fact, I think the DPRINTF above is incorrect too. In +pci_add_option_rom(), rtl8139 rom is locked mapped in +pci_add_option_rom->memory_region_get_ram_ptr (after +memory_region_init_ram). So actually I think we should remove the +DPRINTF warning as it is normal. +Let me explain why the DPRINTF warning is there: emulated dma operations +can involve locked mappings. Once a dma operation completes, the related +mapping is unlocked and can be safely destroyed. But if we destroy a +locked mapping in xen_invalidate_map_cache, while a dma is still +ongoing, QEMU will crash. We cannot handle that case. + +However, the scenario you described is different. 
It has nothing to do +with DMA. It looks like pci_add_option_rom calls +memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a +locked mapping and it is never unlocked or destroyed. + +It looks like "ptr" is not used after pci_add_option_rom returns. Does +the append patch fix the problem you are seeing? For the proper fix, I +think we probably need some sort of memory_region_unmap wrapper or maybe +a call to address_space_unmap. +Yes, I think so, maybe this is the proper way to fix this. +Would you be up for sending a proper patch and testing it? We cannot call +xen_invalidate_map_cache_entry directly from pci.c though, it would need +to be one of the other functions like address_space_unmap for example. +Yes, I will look into this. +diff --git a/hw/pci/pci.c b/hw/pci/pci.c +index e6b08e1..04f98b7 100644 +--- a/hw/pci/pci.c ++++ b/hw/pci/pci.c +@@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool +is_default_rom, + } + pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); ++ xen_invalidate_map_cache_entry(ptr); + } + static void pci_del_option_rom(PCIDevice *pdev) + +On Thu, 13 Apr 2017, Herongguang (Stephen) wrote: +> +On 2017/4/13 7:51, Stefano Stabellini wrote: +> +> On Wed, 12 Apr 2017, Herongguang (Stephen) wrote: +> +> > On 2017/4/12 6:32, Stefano Stabellini wrote: +> +> > > On Tue, 11 Apr 2017, hrg wrote: +> +> > > > On Tue, Apr 11, 2017 at 3:50 AM, Stefano Stabellini +> +> > > > <address@hidden> wrote: +> +> > > > > On Mon, 10 Apr 2017, Stefano Stabellini wrote: +> +> > > > > > On Mon, 10 Apr 2017, hrg wrote: +> +> > > > > > > On Sun, Apr 9, 2017 at 11:55 PM, hrg <address@hidden> +> +> > > > > > > wrote: +> +> > > > > > > > On Sun, Apr 9, 2017 at 11:52 PM, hrg <address@hidden> +> +> > > > > > > > wrote: +> +> > > > > > > > > Hi, +> +> > > > > > > > > +> +> > > > > > > > > In xen_map_cache_unlocked(), map to guest memory maybe in +> +> > > > > > > > > entry->next +> +> > > > > > > > > instead of first level entry (if map to rom other than guest +> +> > > > > > > > > memory +> +> > > > > > > > > comes first), while in xen_invalidate_map_cache(), when VM +> +> > > > > > > > > ballooned +> +> > > > > > > > > out memory, qemu did not invalidate cache entries in linked +> +> > > > > > > > > list(entry->next), so when VM balloon back in memory, gfns +> +> > > > > > > > > probably +> +> > > > > > > > > mapped to different mfns, thus if guest asks device to DMA +> +> > > > > > > > > to +> +> > > > > > > > > these +> +> > > > > > > > > GPA, qemu may DMA to stale MFNs. +> +> > > > > > > > > +> +> > > > > > > > > So I think in xen_invalidate_map_cache() linked lists should +> +> > > > > > > > > also be +> +> > > > > > > > > checked and invalidated. +> +> > > > > > > > > +> +> > > > > > > > > Whatâs your opinion? Is this a bug? Is my analyze correct? +> +> > > > > > Yes, you are right. We need to go through the list for each +> +> > > > > > element of +> +> > > > > > the array in xen_invalidate_map_cache. Can you come up with a +> +> > > > > > patch? +> +> > > > > I spoke too soon. In the regular case there should be no locked +> +> > > > > mappings +> +> > > > > when xen_invalidate_map_cache is called (see the DPRINTF warning at +> +> > > > > the +> +> > > > > beginning of the functions). 
Without locked mappings, there should +> +> > > > > never +> +> > > > > be more than one element in each list (see xen_map_cache_unlocked: +> +> > > > > entry->lock == true is a necessary condition to append a new entry +> +> > > > > to +> +> > > > > the list, otherwise it is just remapped). +> +> > > > > +> +> > > > > Can you confirm that what you are seeing are locked mappings +> +> > > > > when xen_invalidate_map_cache is called? To find out, enable the +> +> > > > > DPRINTK +> +> > > > > by turning it into a printf or by defininig MAPCACHE_DEBUG. +> +> > > > In fact, I think the DPRINTF above is incorrect too. In +> +> > > > pci_add_option_rom(), rtl8139 rom is locked mapped in +> +> > > > pci_add_option_rom->memory_region_get_ram_ptr (after +> +> > > > memory_region_init_ram). So actually I think we should remove the +> +> > > > DPRINTF warning as it is normal. +> +> > > Let me explain why the DPRINTF warning is there: emulated dma operations +> +> > > can involve locked mappings. Once a dma operation completes, the related +> +> > > mapping is unlocked and can be safely destroyed. But if we destroy a +> +> > > locked mapping in xen_invalidate_map_cache, while a dma is still +> +> > > ongoing, QEMU will crash. We cannot handle that case. +> +> > > +> +> > > However, the scenario you described is different. It has nothing to do +> +> > > with DMA. It looks like pci_add_option_rom calls +> +> > > memory_region_get_ram_ptr to map the rtl8139 rom. The mapping is a +> +> > > locked mapping and it is never unlocked or destroyed. +> +> > > +> +> > > It looks like "ptr" is not used after pci_add_option_rom returns. Does +> +> > > the append patch fix the problem you are seeing? For the proper fix, I +> +> > > think we probably need some sort of memory_region_unmap wrapper or maybe +> +> > > a call to address_space_unmap. +> +> > +> +> > Yes, I think so, maybe this is the proper way to fix this. +> +> +> +> Would you be up for sending a proper patch and testing it? We cannot call +> +> xen_invalidate_map_cache_entry directly from pci.c though, it would need +> +> to be one of the other functions like address_space_unmap for example. +> +> +> +> +> +Yes, I will look into this. +Any updates? + + +> +> > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c +> +> > > index e6b08e1..04f98b7 100644 +> +> > > --- a/hw/pci/pci.c +> +> > > +++ b/hw/pci/pci.c +> +> > > @@ -2242,6 +2242,7 @@ static void pci_add_option_rom(PCIDevice *pdev, +> +> > > bool +> +> > > is_default_rom, +> +> > > } +> +> > > pci_register_bar(pdev, PCI_ROM_SLOT, 0, &pdev->rom); +> +> > > + xen_invalidate_map_cache_entry(ptr); +> +> > > } +> +> > > static void pci_del_option_rom(PCIDevice *pdev) +> + diff --git a/classification_output/01/other/8109943 b/classification_output/01/other/8109943 new file mode 100644 index 000000000..7f2b12fbd --- /dev/null +++ b/classification_output/01/other/8109943 @@ -0,0 +1,409 @@ +other: 0.943 +semantic: 0.920 +instruction: 0.911 +mistranslation: 0.877 + +[Qemu-devel] [BUG] Windows 7 got stuck easily while run PCMark10 application + +Hiï¼ + +We hit a bug in our test while run PCMark 10 in a windows 7 VM, +The VM got stuck and the wallclock was hang after several minutes running +PCMark 10 in it. +It is quite easily to reproduce the bug with the upstream KVM and Qemu. + +We found that KVM can not inject any RTC irq to VM after it was hang, it fails +to +Deliver irq in ioapic_set_irq() because RTC irq is still pending in ioapic->irr. 
+ +static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq, + int irq_level, bool line_status) +{ +⦠⦠+ if (!irq_level) { + ioapic->irr &= ~mask; + ret = 1; + goto out; + } +⦠⦠+ if ((edge && old_irr == ioapic->irr) || + (!edge && entry.fields.remote_irr)) { + ret = 0; + goto out; + } + +According to RTC spec, after RTC injects a High level irq, OS will read CMOSâs +register C to to clear the irq flag, and pull down the irq electric pin. + +For Qemu, we will emulate the reading operation in cmos_ioport_read(), +but Guest OS will fire a write operation before to tell which register will be +read +after this write, where we use s->cmos_index to record the following register +to read. + +But in our test, we found that there is a possible situation that Vcpu fails to +read +RTC_REG_C to clear irq, This could happens while two VCpus are writing/reading +registers at the same time, for example, vcpu 0 is trying to read RTC_REG_C, +so it write RTC_REG_C first, where the s->cmos_index will be RTC_REG_C, +but before it tries to read register C, another vcpu1 is going to read RTC_YEAR, +it changes s->cmos_index to RTC_YEAR by a writing action. +The next operation of vcpu0 will be lead to read RTC_YEAR, In this case, we +will miss +calling qemu_irq_lower(s->irq) to clear the irq. After this, kvm will never +inject RTC irq, +and Windows VM will hang. +static void cmos_ioport_write(void *opaque, hwaddr addr, + uint64_t data, unsigned size) +{ + RTCState *s = opaque; + + if ((addr & 1) == 0) { + s->cmos_index = data & 0x7f; + } +â¦â¦ +static uint64_t cmos_ioport_read(void *opaque, hwaddr addr, + unsigned size) +{ + RTCState *s = opaque; + int ret; + if ((addr & 1) == 0) { + return 0xff; + } else { + switch(s->cmos_index) { + +According to CMOS spec, âany write to PROT 0070h should be followed by an +action to PROT 0071h or the RTC +Will be RTC will be left in an unknown stateâ, but it seems that we can not +ensure this sequence in qemu/kvm. + +Any ideas ? + +Thanks, +Hailiang + +Pls see the trace of kvm_pio: + + CPU 1/KVM-15567 [003] .... 209311.762579: kvm_pio: pio_read at 0x70 size +1 count 1 val 0xff + CPU 1/KVM-15567 [003] .... 209311.762582: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x89 + CPU 1/KVM-15567 [003] .... 209311.762590: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x17 + CPU 0/KVM-15566 [005] .... 209311.762611: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0xc + CPU 1/KVM-15567 [003] .... 209311.762615: kvm_pio: pio_read at 0x70 size +1 count 1 val 0xff + CPU 1/KVM-15567 [003] .... 209311.762619: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x88 + CPU 1/KVM-15567 [003] .... 209311.762627: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x12 + CPU 0/KVM-15566 [005] .... 209311.762632: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x12 + CPU 1/KVM-15567 [003] .... 209311.762633: kvm_pio: pio_read at 0x70 size +1 count 1 val 0xff + CPU 0/KVM-15566 [005] .... 209311.762634: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0xc <--- Firstly write to 0x70, cmo_index = 0xc & +0x7f = 0xc + CPU 1/KVM-15567 [003] .... 209311.762636: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x86 <-- Secondly write to 0x70, cmo_index = 0x86 & +0x7f = 0x6, cover the cmo_index result of first time + CPU 0/KVM-15566 [005] .... 209311.762641: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x6 <-- vcpu0 read 0x6 because cmo_index is 0x6 now + CPU 1/KVM-15567 [003] .... 209311.762644: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x6 <- vcpu1 read 0x6 + CPU 1/KVM-15567 [003] .... 
209311.762649: kvm_pio: pio_read at 0x70 size +1 count 1 val 0xff + CPU 1/KVM-15567 [003] .... 209311.762669: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x87 + CPU 1/KVM-15567 [003] .... 209311.762678: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x1 + CPU 1/KVM-15567 [003] .... 209311.762683: kvm_pio: pio_read at 0x70 size +1 count 1 val 0xff + CPU 1/KVM-15567 [003] .... 209311.762686: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x84 + CPU 1/KVM-15567 [003] .... 209311.762693: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x10 + CPU 1/KVM-15567 [003] .... 209311.762699: kvm_pio: pio_read at 0x70 size +1 count 1 val 0xff + CPU 1/KVM-15567 [003] .... 209311.762702: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x82 + CPU 1/KVM-15567 [003] .... 209311.762709: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x25 + CPU 1/KVM-15567 [003] .... 209311.762714: kvm_pio: pio_read at 0x70 size +1 count 1 val 0xff + CPU 1/KVM-15567 [003] .... 209311.762717: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x80 + + +Regards, +-Gonglei + +From: Zhanghailiang +Sent: Friday, December 01, 2017 3:03 AM +To: address@hidden; address@hidden; Paolo Bonzini +Cc: Huangweidong (C); Gonglei (Arei); wangxin (U); Xiexiangyou +Subject: [BUG] Windows 7 got stuck easily while run PCMark10 application + +Hiï¼ + +We hit a bug in our test while run PCMark 10 in a windows 7 VM, +The VM got stuck and the wallclock was hang after several minutes running +PCMark 10 in it. +It is quite easily to reproduce the bug with the upstream KVM and Qemu. + +We found that KVM can not inject any RTC irq to VM after it was hang, it fails +to +Deliver irq in ioapic_set_irq() because RTC irq is still pending in ioapic->irr. + +static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq, + int irq_level, bool line_status) +{ +⦠⦠+ if (!irq_level) { + ioapic->irr &= ~mask; + ret = 1; + goto out; + } +⦠⦠+ if ((edge && old_irr == ioapic->irr) || + (!edge && entry.fields.remote_irr)) { + ret = 0; + goto out; + } + +According to RTC spec, after RTC injects a High level irq, OS will read CMOSâs +register C to to clear the irq flag, and pull down the irq electric pin. + +For Qemu, we will emulate the reading operation in cmos_ioport_read(), +but Guest OS will fire a write operation before to tell which register will be +read +after this write, where we use s->cmos_index to record the following register +to read. + +But in our test, we found that there is a possible situation that Vcpu fails to +read +RTC_REG_C to clear irq, This could happens while two VCpus are writing/reading +registers at the same time, for example, vcpu 0 is trying to read RTC_REG_C, +so it write RTC_REG_C first, where the s->cmos_index will be RTC_REG_C, +but before it tries to read register C, another vcpu1 is going to read RTC_YEAR, +it changes s->cmos_index to RTC_YEAR by a writing action. +The next operation of vcpu0 will be lead to read RTC_YEAR, In this case, we +will miss +calling qemu_irq_lower(s->irq) to clear the irq. After this, kvm will never +inject RTC irq, +and Windows VM will hang. 
+static void cmos_ioport_write(void *opaque, hwaddr addr, + uint64_t data, unsigned size) +{ + RTCState *s = opaque; + + if ((addr & 1) == 0) { + s->cmos_index = data & 0x7f; + } +â¦â¦ +static uint64_t cmos_ioport_read(void *opaque, hwaddr addr, + unsigned size) +{ + RTCState *s = opaque; + int ret; + if ((addr & 1) == 0) { + return 0xff; + } else { + switch(s->cmos_index) { + +According to CMOS spec, âany write to PROT 0070h should be followed by an +action to PROT 0071h or the RTC +Will be RTC will be left in an unknown stateâ, but it seems that we can not +ensure this sequence in qemu/kvm. + +Any ideas ? + +Thanks, +Hailiang + +On 01/12/2017 08:08, Gonglei (Arei) wrote: +> +First write to 0x70, cmos_index = 0xc & 0x7f = 0xc +> +      CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc> +> +Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6>       CPU 1/KVM-15567 +> +kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86> vcpu0 read 0x6 because +> +cmos_index is 0x6 now:>       CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size +> +1 count 1 val 0x6> vcpu1 read 0x6:>       CPU 1/KVM-15567 kvm_pio: pio_read +> +at 0x71 size 1 count 1 val 0x6 +This seems to be a Windows bug. The easiest workaround that I +can think of is to clear the interrupts already when 0xc is written, +without waiting for the read (because REG_C can only be read). + +What do you think? + +Thanks, + +Paolo + +I also think it's windows bug, the problem is that it doesn't occur on xen +platform. And there are some other works need to be done while reading REG_C. +So I wrote that patch. + +Thanks, +Gonglei +å件人ï¼Paolo Bonzini +æ¶ä»¶äººï¼é¾ç£,å¼ æµ·äº®,qemu-devel,Michael S. Tsirkin +æéï¼é»ä¼æ ,çæ¬£,谢祥æ +æ¶é´ï¼2017-12-02 01:10:08 +主é¢:Re: [BUG] Windows 7 got stuck easily while run PCMark10 application + +On 01/12/2017 08:08, Gonglei (Arei) wrote: +> +First write to 0x70, cmos_index = 0xc & 0x7f = 0xc +> +CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc> +> +Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6> CPU 1/KVM-15567 +> +kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86> vcpu0 read 0x6 because +> +cmos_index is 0x6 now:> CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size +> +1 count 1 val 0x6> vcpu1 read 0x6:> CPU 1/KVM-15567 kvm_pio: pio_read +> +at 0x71 size 1 count 1 val 0x6 +This seems to be a Windows bug. The easiest workaround that I +can think of is to clear the interrupts already when 0xc is written, +without waiting for the read (because REG_C can only be read). + +What do you think? + +Thanks, + +Paolo + +On 01/12/2017 18:45, Gonglei (Arei) wrote: +> +I also think it's windows bug, the problem is that it doesn't occur on +> +xen platform. +It's a race, it may just be that RTC PIO is faster in Xen because it's +implemented in the hypervisor. + +I will try reporting it to Microsoft. + +Thanks, + +Paolo + +> +Thanks, +> +Gonglei +> +*å件人ï¼*Paolo Bonzini +> +*æ¶ä»¶äººï¼*é¾ç£,å¼ æµ·äº®,qemu-devel,Michael S. 
Tsirkin +> +*æéï¼*é»ä¼æ ,çæ¬£,谢祥æ +> +*æ¶é´ï¼*2017-12-02 01:10:08 +> +*主é¢:*Re: [BUG] Windows 7 got stuck easily while run PCMark10 application +> +> +On 01/12/2017 08:08, Gonglei (Arei) wrote: +> +> First write to 0x70, cmos_index = 0xc & 0x7f = 0xc +> +>       CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc> +> +> Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6>       CPU 1/KVM-15567 +> +> kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86> vcpu0 read 0x6 because +> +> cmos_index is 0x6 now:>       CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 +> +> size 1 count 1 val 0x6> vcpu1 +> +read 0x6:>       CPU 1/KVM-15567 kvm_pio: pio_read at 0x71 size 1 count +> +1 val 0x6 +> +This seems to be a Windows bug. The easiest workaround that I +> +can think of is to clear the interrupts already when 0xc is written, +> +without waiting for the read (because REG_C can only be read). +> +> +What do you think? +> +> +Thanks, +> +> +Paolo + +On 2017/12/2 2:37, Paolo Bonzini wrote: +On 01/12/2017 18:45, Gonglei (Arei) wrote: +I also think it's windows bug, the problem is that it doesn't occur on +xen platform. +It's a race, it may just be that RTC PIO is faster in Xen because it's +implemented in the hypervisor. +No, In Xen, it does not has such problem because it injects the RTC irq without +checking whether its previous irq been cleared or not, which we do has such +checking +in KVM. + +static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq, + int irq_level, bool line_status) +{ + ... ... + if (!irq_level) { + ioapic->irr &= ~mask; -->clear the RTC irq in irr, Or we will can not +inject RTC irq. + ret = 1; + goto out; + } + +I agree that we move the operation of clearing RTC irq from cmos_ioport_read() +to +cmos_ioport_write() to ensure the action been done. + +Thanks, +Hailiang +I will try reporting it to Microsoft. + +Thanks, + +Paolo +Thanks, +Gonglei +*å件人ï¼*Paolo Bonzini +*æ¶ä»¶äººï¼*é¾ç£,å¼ æµ·äº®,qemu-devel,Michael S. Tsirkin +*æéï¼*é»ä¼æ ,çæ¬£,谢祥æ +*æ¶é´ï¼*2017-12-02 01:10:08 +*主é¢:*Re: [BUG] Windows 7 got stuck easily while run PCMark10 application + +On 01/12/2017 08:08, Gonglei (Arei) wrote: +First write to 0x70, cmos_index = 0xc & 0x7f = 0xc + CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc> Second write to +0x70, cmos_index = 0x86 & 0x7f = 0x6> CPU 1/KVM-15567 kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x86> vcpu0 read 0x6 because cmos_index is 0x6 now:> CPU +0/KVM-15566 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6> vcpu1 +read 0x6:> CPU 1/KVM-15567 kvm_pio: pio_read at 0x71 size 1 count +1 val 0x6 +This seems to be a Windows bug. The easiest workaround that I +can think of is to clear the interrupts already when 0xc is written, +without waiting for the read (because REG_C can only be read). + +What do you think? + +Thanks, + +Paolo +. + diff --git a/classification_output/01/other/8621822 b/classification_output/01/other/8621822 new file mode 100644 index 000000000..a27abd648 --- /dev/null +++ b/classification_output/01/other/8621822 @@ -0,0 +1,376 @@ +other: 0.989 +instruction: 0.986 +semantic: 0.985 +mistranslation: 0.978 + +[BUG] No irqchip created after commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator property") + +I apologize if this was already reported, + +I just noticed that with the latest updates QEMU doesn't start with the +following configuration: + +qemu-system-x86_64 -name guest=win10 -machine pc,accel=kvm -cpu +host,hv_vpindex,hv_synic ... 
+ +qemu-system-x86_64: failed to turn on HyperV SynIC in KVM: Invalid argument +qemu-system-x86_64: kvm_init_vcpu failed: Invalid argument + +If I add 'kernel-irqchip=split' or ',kernel-irqchip=on' it starts as +usual. I bisected this to the following commit: + +commit 11bc4a13d1f4b07dafbd1dda4d4bf0fdd7ad65f2 (HEAD, refs/bisect/bad) +Author: Paolo Bonzini <address@hidden> +Date: Wed Nov 13 10:56:53 2019 +0100 + + kvm: convert "-machine kernel_irqchip" to an accelerator property + +so aparently we now default to 'kernel_irqchip=off'. Is this the desired +behavior? + +-- +Vitaly + +No, absolutely not. I was sure I had tested it, but I will take a look. +Paolo +Il ven 20 dic 2019, 15:11 Vitaly Kuznetsov < +address@hidden +> ha scritto: +I apologize if this was already reported, +I just noticed that with the latest updates QEMU doesn't start with the +following configuration: +qemu-system-x86_64 -name guest=win10 -machine pc,accel=kvm -cpu host,hv_vpindex,hv_synic ... +qemu-system-x86_64: failed to turn on HyperV SynIC in KVM: Invalid argument +qemu-system-x86_64: kvm_init_vcpu failed: Invalid argument +If I add 'kernel-irqchip=split' or ',kernel-irqchip=on' it starts as +usual. I bisected this to the following commit: +commit 11bc4a13d1f4b07dafbd1dda4d4bf0fdd7ad65f2 (HEAD, refs/bisect/bad) +Author: Paolo Bonzini < +address@hidden +> +Date:  Wed Nov 13 10:56:53 2019 +0100 +  kvm: convert "-machine kernel_irqchip" to an accelerator property +so aparently we now default to 'kernel_irqchip=off'. Is this the desired +behavior? +-- +Vitaly + +Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an +accelerator property") moves kernel_irqchip property from "-machine" to +"-accel kvm", but it forgets to set the default value of +kernel_irqchip_allowed and kernel_irqchip_split. + +Also cleaning up the three useless members (kernel_irqchip_allowed, +kernel_irqchip_required, kernel_irqchip_split) in struct MachineState. + +Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator +property") +Signed-off-by: Xiaoyao Li <address@hidden> +--- + accel/kvm/kvm-all.c | 3 +++ + include/hw/boards.h | 3 --- + 2 files changed, 3 insertions(+), 3 deletions(-) + +diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c +index b2f1a5bcb5ef..40f74094f8d3 100644 +--- a/accel/kvm/kvm-all.c ++++ b/accel/kvm/kvm-all.c +@@ -3044,8 +3044,11 @@ bool kvm_kernel_irqchip_split(void) + static void kvm_accel_instance_init(Object *obj) + { + KVMState *s = KVM_STATE(obj); ++ MachineClass *mc = MACHINE_GET_CLASS(current_machine); + + s->kvm_shadow_mem = -1; ++ s->kernel_irqchip_allowed = true; ++ s->kernel_irqchip_split = mc->default_kernel_irqchip_split; + } + + static void kvm_accel_class_init(ObjectClass *oc, void *data) +diff --git a/include/hw/boards.h b/include/hw/boards.h +index 61f8bb8e5a42..fb1b43d5b972 100644 +--- a/include/hw/boards.h ++++ b/include/hw/boards.h +@@ -271,9 +271,6 @@ struct MachineState { + + /*< public >*/ + +- bool kernel_irqchip_allowed; +- bool kernel_irqchip_required; +- bool kernel_irqchip_split; + char *dtb; + char *dumpdtb; + int phandle_start; +-- +2.19.1 + +Il sab 28 dic 2019, 09:48 Xiaoyao Li < +address@hidden +> ha scritto: +Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an +accelerator property") moves kernel_irqchip property from "-machine" to +"-accel kvm", but it forgets to set the default value of +kernel_irqchip_allowed and kernel_irqchip_split. 
+Also cleaning up the three useless members (kernel_irqchip_allowed, +kernel_irqchip_required, kernel_irqchip_split) in struct MachineState. +Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator property") +Signed-off-by: Xiaoyao Li < +address@hidden +> +Please also add a Reported-by line for Vitaly Kuznetsov. +--- + accel/kvm/kvm-all.c | 3 +++ + include/hw/boards.h | 3 --- + 2 files changed, 3 insertions(+), 3 deletions(-) +diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c +index b2f1a5bcb5ef..40f74094f8d3 100644 +--- a/accel/kvm/kvm-all.c ++++ b/accel/kvm/kvm-all.c +@@ -3044,8 +3044,11 @@ bool kvm_kernel_irqchip_split(void) + static void kvm_accel_instance_init(Object *obj) + { +   KVMState *s = KVM_STATE(obj); ++  MachineClass *mc = MACHINE_GET_CLASS(current_machine); +   s->kvm_shadow_mem = -1; ++  s->kernel_irqchip_allowed = true; ++  s->kernel_irqchip_split = mc->default_kernel_irqchip_split; +Can you initialize this from the init_machine method instead of assuming that current_machine has been initialized earlier? +Thanks for the quick fix! +Paolo + } + static void kvm_accel_class_init(ObjectClass *oc, void *data) +diff --git a/include/hw/boards.h b/include/hw/boards.h +index 61f8bb8e5a42..fb1b43d5b972 100644 +--- a/include/hw/boards.h ++++ b/include/hw/boards.h +@@ -271,9 +271,6 @@ struct MachineState { +   /*< public >*/ +-  bool kernel_irqchip_allowed; +-  bool kernel_irqchip_required; +-  bool kernel_irqchip_split; +   char *dtb; +   char *dumpdtb; +   int phandle_start; +-- +2.19.1 + +On Sat, 2019-12-28 at 10:02 +0000, Paolo Bonzini wrote: +> +> +> +Il sab 28 dic 2019, 09:48 Xiaoyao Li <address@hidden> ha scritto: +> +> Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an +> +> accelerator property") moves kernel_irqchip property from "-machine" to +> +> "-accel kvm", but it forgets to set the default value of +> +> kernel_irqchip_allowed and kernel_irqchip_split. +> +> +> +> Also cleaning up the three useless members (kernel_irqchip_allowed, +> +> kernel_irqchip_required, kernel_irqchip_split) in struct MachineState. +> +> +> +> Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an +> +> accelerator property") +> +> Signed-off-by: Xiaoyao Li <address@hidden> +> +> +Please also add a Reported-by line for Vitaly Kuznetsov. +Sure. + +> +> --- +> +> accel/kvm/kvm-all.c | 3 +++ +> +> include/hw/boards.h | 3 --- +> +> 2 files changed, 3 insertions(+), 3 deletions(-) +> +> +> +> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c +> +> index b2f1a5bcb5ef..40f74094f8d3 100644 +> +> --- a/accel/kvm/kvm-all.c +> +> +++ b/accel/kvm/kvm-all.c +> +> @@ -3044,8 +3044,11 @@ bool kvm_kernel_irqchip_split(void) +> +> static void kvm_accel_instance_init(Object *obj) +> +> { +> +> KVMState *s = KVM_STATE(obj); +> +> + MachineClass *mc = MACHINE_GET_CLASS(current_machine); +> +> +> +> s->kvm_shadow_mem = -1; +> +> + s->kernel_irqchip_allowed = true; +> +> + s->kernel_irqchip_split = mc->default_kernel_irqchip_split; +> +> +Can you initialize this from the init_machine method instead of assuming that +> +current_machine has been initialized earlier? +OK, will do it in v2. + +> +Thanks for the quick fix! +BTW, it seems that this patch makes kernel_irqchip default on to workaround the +bug. +However, when explicitly configuring kernel_irqchip=off, guest still fails +booting due to "KVM: failed to send PV IPI: -95" with a latest upstream kernel +ubuntu guest. Any idea about this? 
+ +> +Paolo +> +> } +> +> +> +> static void kvm_accel_class_init(ObjectClass *oc, void *data) +> +> diff --git a/include/hw/boards.h b/include/hw/boards.h +> +> index 61f8bb8e5a42..fb1b43d5b972 100644 +> +> --- a/include/hw/boards.h +> +> +++ b/include/hw/boards.h +> +> @@ -271,9 +271,6 @@ struct MachineState { +> +> +> +> /*< public >*/ +> +> +> +> - bool kernel_irqchip_allowed; +> +> - bool kernel_irqchip_required; +> +> - bool kernel_irqchip_split; +> +> char *dtb; +> +> char *dumpdtb; +> +> int phandle_start; + +Il sab 28 dic 2019, 10:24 Xiaoyao Li < +address@hidden +> ha scritto: +BTW, it seems that this patch makes kernel_irqchip default on to workaround the +bug. +However, when explicitly configuring kernel_irqchip=off, guest still fails +booting due to "KVM: failed to send PV IPI: -95" with a latest upstream kernel +ubuntu guest. Any idea about this? +We need to clear the PV IPI feature for userspace irqchip. Are you using -cpu host by chance? +Paolo +> Paolo +> > } +> > +> > static void kvm_accel_class_init(ObjectClass *oc, void *data) +> > diff --git a/include/hw/boards.h b/include/hw/boards.h +> > index 61f8bb8e5a42..fb1b43d5b972 100644 +> > --- a/include/hw/boards.h +> > +++ b/include/hw/boards.h +> > @@ -271,9 +271,6 @@ struct MachineState { +> > +> >   /*< public >*/ +> > +> > -  bool kernel_irqchip_allowed; +> > -  bool kernel_irqchip_required; +> > -  bool kernel_irqchip_split; +> >   char *dtb; +> >   char *dumpdtb; +> >   int phandle_start; + +On Sat, 2019-12-28 at 10:57 +0000, Paolo Bonzini wrote: +> +> +> +Il sab 28 dic 2019, 10:24 Xiaoyao Li <address@hidden> ha scritto: +> +> BTW, it seems that this patch makes kernel_irqchip default on to workaround +> +> the +> +> bug. +> +> However, when explicitly configuring kernel_irqchip=off, guest still fails +> +> booting due to "KVM: failed to send PV IPI: -95" with a latest upstream +> +> kernel +> +> ubuntu guest. Any idea about this? +> +> +We need to clear the PV IPI feature for userspace irqchip. Are you using -cpu +> +host by chance? +Yes, I used -cpu host. + +After using "-cpu host,-kvm-pv-ipi" with kernel_irqchip=off, it can boot +successfully. + +> +Paolo +> +> +> > Paolo +> +> > > } +> +> > > +> +> > > static void kvm_accel_class_init(ObjectClass *oc, void *data) +> +> > > diff --git a/include/hw/boards.h b/include/hw/boards.h +> +> > > index 61f8bb8e5a42..fb1b43d5b972 100644 +> +> > > --- a/include/hw/boards.h +> +> > > +++ b/include/hw/boards.h +> +> > > @@ -271,9 +271,6 @@ struct MachineState { +> +> > > +> +> > > /*< public >*/ +> +> > > +> +> > > - bool kernel_irqchip_allowed; +> +> > > - bool kernel_irqchip_required; +> +> > > - bool kernel_irqchip_split; +> +> > > char *dtb; +> +> > > char *dumpdtb; +> +> > > int phandle_start; +> +> + diff --git a/classification_output/01/other/8627146 b/classification_output/01/other/8627146 new file mode 100644 index 000000000..dc13c18a0 --- /dev/null +++ b/classification_output/01/other/8627146 @@ -0,0 +1,364 @@ +other: 0.967 +semantic: 0.951 +instruction: 0.930 +mistranslation: 0.855 + +[Bug] QEMU TCG warnings after commit c6bd2dd63420 - HTT / CMP_LEG bits + +Hi Community, + +This email contains 3 bugs appear to share the same root cause. 
+ +[1] We ran into the following warnings when running QEMU v10.0.0 in TCG mode: + +qemu-system-x86_64 \ + -machine q35 \ + -m 4G -smp 4 \ + -kernel ./arch/x86/boot/bzImage \ + -bios /usr/share/ovmf/OVMF.fd \ + -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \ + -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \ + -nographic \ + -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr' +qemu-system-x86_64: warning: TCG doesn't support requested feature: +CPUID.01H:EDX.ht [bit 28] +qemu-system-x86_64: warning: TCG doesn't support requested feature: +CPUID.80000001H:ECX.cmp-legacy [bit 1] +(repeats 4 times, once per vCPU) +Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up CPUID_HT in +x86_cpu_expand_features() instead of cpu_x86_cpuid()" is what introduced the +warnings. +Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28]) and +CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT support, these +bits trigger the warnings above. +[2] Also, Zhao pointed me to a similar report on GitLab: +https://gitlab.com/qemu-project/qemu/-/issues/2894 +The symptoms there look identical to what we're seeing. +By convention we file one issue per email, but these two appear to share the +same root cause, so I'm describing them together here. +[3] My colleague Alan noticed what appears to be a related problem: if we launch +a guest with '-cpu <model>,-ht --enable-kvm', which means explicitly removing +the ht flag, but the guest still reports HT(cat /proc/cpuinfo in linux guest) +enabled. In other words, under KVM the ht bit seems to be forced on even when +the user tries to disable it. +Best regards, +Ewan + +On 4/29/25 11:02 AM, Ewan Hai wrote: +Hi Community, + +This email contains 3 bugs appear to share the same root cause. + +[1] We ran into the following warnings when running QEMU v10.0.0 in TCG mode: + +qemu-system-x86_64 \ +  -machine q35 \ +  -m 4G -smp 4 \ +  -kernel ./arch/x86/boot/bzImage \ +  -bios /usr/share/ovmf/OVMF.fd \ +  -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \ +  -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \ +  -nographic \ +  -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr' +qemu-system-x86_64: warning: TCG doesn't support requested feature: +CPUID.01H:EDX.ht [bit 28] +qemu-system-x86_64: warning: TCG doesn't support requested feature: +CPUID.80000001H:ECX.cmp-legacy [bit 1] +(repeats 4 times, once per vCPU) +Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up CPUID_HT in +x86_cpu_expand_features() instead of cpu_x86_cpuid()" is what introduced the +warnings. +Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28]) and +CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT support, these +bits trigger the warnings above. +[2] Also, Zhao pointed me to a similar report on GitLab: +https://gitlab.com/qemu-project/qemu/-/issues/2894 +The symptoms there look identical to what we're seeing. +By convention we file one issue per email, but these two appear to share the +same root cause, so I'm describing them together here. +[3] My colleague Alan noticed what appears to be a related problem: if we launch +a guest with '-cpu <model>,-ht --enable-kvm', which means explicitly removing +the ht flag, but the guest still reports HT(cat /proc/cpuinfo in linux guest) +enabled. In other words, under KVM the ht bit seems to be forced on even when +the user tries to disable it. 
+XiaoYao reminded me that issue [3] stems from a different patch. Please ignore +it for nowâI'll start a separate thread to discuss that one independently. +Best regards, +Ewan + +On 4/29/2025 11:02 AM, Ewan Hai wrote: +Hi Community, + +This email contains 3 bugs appear to share the same root cause. +[1] We ran into the following warnings when running QEMU v10.0.0 in TCG +mode: +qemu-system-x86_64 \ +  -machine q35 \ +  -m 4G -smp 4 \ +  -kernel ./arch/x86/boot/bzImage \ +  -bios /usr/share/ovmf/OVMF.fd \ +  -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \ +  -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \ +  -nographic \ +  -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr' +qemu-system-x86_64: warning: TCG doesn't support requested feature: +CPUID.01H:EDX.ht [bit 28] +qemu-system-x86_64: warning: TCG doesn't support requested feature: +CPUID.80000001H:ECX.cmp-legacy [bit 1] +(repeats 4 times, once per vCPU) +Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up +CPUID_HT in x86_cpu_expand_features() instead of cpu_x86_cpuid()" is +what introduced the warnings. +Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28]) +and CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT +support, these bits trigger the warnings above. +[2] Also, Zhao pointed me to a similar report on GitLab: +https://gitlab.com/qemu-project/qemu/-/issues/2894 +The symptoms there look identical to what we're seeing. +By convention we file one issue per email, but these two appear to share +the same root cause, so I'm describing them together here. +It was caused by my two patches. I think the fix can be as follow. +If no objection from the community, I can submit the formal patch. + +diff --git a/target/i386/cpu.c b/target/i386/cpu.c +index 1f970aa4daa6..fb95aadd6161 100644 +--- a/target/i386/cpu.c ++++ b/target/i386/cpu.c +@@ -776,11 +776,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t +vendor1, +CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC | CPUID_SEP | \ + CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | CPUID_PAT | \ + CPUID_PSE36 | CPUID_CLFLUSH | CPUID_ACPI | CPUID_MMX | \ +- CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE) ++ CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE | \ ++ CPUID_HT) + /* partly implemented: + CPUID_MTRR, CPUID_MCA, CPUID_CLFLUSH (needed for Win64) */ + /* missing: +- CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_HT, CPUID_TM, CPUID_PBE */ ++ CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_TM, CPUID_PBE */ + + /* + * Kernel-only features that can be shown to usermode programs even if +@@ -848,7 +849,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t +vendor1, +#define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \ + CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A | \ +- CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES) ++ CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES | \ ++ CPUID_EXT3_CMP_LEG) + + #define TCG_EXT4_FEATURES 0 +[3] My colleague Alan noticed what appears to be a related problem: if +we launch a guest with '-cpu <model>,-ht --enable-kvm', which means +explicitly removing the ht flag, but the guest still reports HT(cat / +proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht +bit seems to be forced on even when the user tries to disable it. +This has been the behavior of QEMU for many years, not some regression +introduced by my patches. We can discuss how to address it separately. 
+Best regards, +Ewan + +On Tue, Apr 29, 2025 at 01:55:59PM +0800, Xiaoyao Li wrote: +> +Date: Tue, 29 Apr 2025 13:55:59 +0800 +> +From: Xiaoyao Li <xiaoyao.li@intel.com> +> +Subject: Re: [Bug] QEMU TCG warnings after commit c6bd2dd63420 - HTT / +> +CMP_LEG bits +> +> +On 4/29/2025 11:02 AM, Ewan Hai wrote: +> +> Hi Community, +> +> +> +> This email contains 3 bugs appear to share the same root cause. +> +> +> +> [1] We ran into the following warnings when running QEMU v10.0.0 in TCG +> +> mode: +> +> +> +> qemu-system-x86_64 \ +> +>  -machine q35 \ +> +>  -m 4G -smp 4 \ +> +>  -kernel ./arch/x86/boot/bzImage \ +> +>  -bios /usr/share/ovmf/OVMF.fd \ +> +>  -drive file=~/kernel/rootfs.ext4,index=0,format=raw,media=disk \ +> +>  -drive file=~/kernel/swap.img,index=1,format=raw,media=disk \ +> +>  -nographic \ +> +>  -append 'root=/dev/sda rw resume=/dev/sdb console=ttyS0 nokaslr' +> +> +> +> qemu-system-x86_64: warning: TCG doesn't support requested feature: +> +> CPUID.01H:EDX.ht [bit 28] +> +> qemu-system-x86_64: warning: TCG doesn't support requested feature: +> +> CPUID.80000001H:ECX.cmp-legacy [bit 1] +> +> (repeats 4 times, once per vCPU) +> +> +> +> Tracing the history shows that commit c6bd2dd63420 "i386/cpu: Set up +> +> CPUID_HT in x86_cpu_expand_features() instead of cpu_x86_cpuid()" is +> +> what introduced the warnings. +> +> +> +> Since that commit, TCG unconditionally advertises HTT (CPUID 1 EDX[28]) +> +> and CMP_LEG (CPUID 8000_0001 ECX[1]). Because TCG itself has no SMT +> +> support, these bits trigger the warnings above. +> +> +> +> [2] Also, Zhao pointed me to a similar report on GitLab: +> +> +https://gitlab.com/qemu-project/qemu/-/issues/2894 +> +> The symptoms there look identical to what we're seeing. +> +> +> +> By convention we file one issue per email, but these two appear to share +> +> the same root cause, so I'm describing them together here. +> +> +It was caused by my two patches. I think the fix can be as follow. +> +If no objection from the community, I can submit the formal patch. +> +> +diff --git a/target/i386/cpu.c b/target/i386/cpu.c +> +index 1f970aa4daa6..fb95aadd6161 100644 +> +--- a/target/i386/cpu.c +> ++++ b/target/i386/cpu.c +> +@@ -776,11 +776,12 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t +> +vendor1, +> +CPUID_PAE | CPUID_MCE | CPUID_CX8 | CPUID_APIC | CPUID_SEP | \ +> +CPUID_MTRR | CPUID_PGE | CPUID_MCA | CPUID_CMOV | CPUID_PAT | \ +> +CPUID_PSE36 | CPUID_CLFLUSH | CPUID_ACPI | CPUID_MMX | \ +> +- CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE) +> ++ CPUID_FXSR | CPUID_SSE | CPUID_SSE2 | CPUID_SS | CPUID_DE | \ +> ++ CPUID_HT) +> +/* partly implemented: +> +CPUID_MTRR, CPUID_MCA, CPUID_CLFLUSH (needed for Win64) */ +> +/* missing: +> +- CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_HT, CPUID_TM, CPUID_PBE */ +> ++ CPUID_VME, CPUID_DTS, CPUID_SS, CPUID_TM, CPUID_PBE */ +> +> +/* +> +* Kernel-only features that can be shown to usermode programs even if +> +@@ -848,7 +849,8 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t +> +vendor1, +> +> +#define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \ +> +CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A | \ +> +- CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES) +> ++ CPUID_EXT3_3DNOWPREFETCH | CPUID_EXT3_KERNEL_FEATURES | \ +> ++ CPUID_EXT3_CMP_LEG) +> +> +#define TCG_EXT4_FEATURES 0 +This fix is fine for me...at least from SDM, HTT depends on topology and +it should exist when user sets "-smp 4". 
+ +> +> [3] My colleague Alan noticed what appears to be a related problem: if +> +> we launch a guest with '-cpu <model>,-ht --enable-kvm', which means +> +> explicitly removing the ht flag, but the guest still reports HT(cat +> +> /proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht +> +> bit seems to be forced on even when the user tries to disable it. +> +> +XiaoYao reminded me that issue [3] stems from a different patch. Please +> +ignore it for nowâI'll start a separate thread to discuss that one +> +independently. +I haven't found any other thread :-). + +By the way, just curious, in what cases do you need to disbale the HT +flag? "-smp 4" means 4 cores with 1 thread per core, and is it not +enough? + +As for the â-htâ behavior, I'm also unsure whether this should be fixed +or not - one possible consideration is whether â-htâ would be useful. + +On 5/8/25 5:04 PM, Zhao Liu wrote: +[3] My colleague Alan noticed what appears to be a related problem: if +we launch a guest with '-cpu <model>,-ht --enable-kvm', which means +explicitly removing the ht flag, but the guest still reports HT(cat +/proc/cpuinfo in linux guest) enabled. In other words, under KVM the ht +bit seems to be forced on even when the user tries to disable it. +XiaoYao reminded me that issue [3] stems from a different patch. Please +ignore it for nowâI'll start a separate thread to discuss that one +independently. +I haven't found any other thread :-). +Please refer to +https://lore.kernel.org/all/db6ae3bb-f4e5-4719-9beb-623fcff56af2@zhaoxin.com/ +. +By the way, just curious, in what cases do you need to disbale the HT +flag? "-smp 4" means 4 cores with 1 thread per core, and is it not +enough? + +As for the â-htâ behavior, I'm also unsure whether this should be fixed +or not - one possible consideration is whether â-htâ would be useful. +I wasn't trying to target any specific use case, using "-ht" was simply a way to +check how the ht feature behaves under both KVM and TCG. There's no special +workload behind it; I just wanted to confirm that the flag is respected (or not) +in each mode. + diff --git a/classification_output/01/other/8653736 b/classification_output/01/other/8653736 new file mode 100644 index 000000000..f3d47976b --- /dev/null +++ b/classification_output/01/other/8653736 @@ -0,0 +1,120 @@ +other: 0.944 +semantic: 0.941 +instruction: 0.935 +mistranslation: 0.907 + +[Qemu-devel] [Bug in qemu-system-ppc running Mac OS 9 on Windows 10] + +Hi all, + +I've been experiencing issues when installing Mac OS 9.x using +qemu-system-ppc.exe in Windows 10. After booting from CD image, +partitioning a fresh disk image often hangs Qemu. When using a +pre-partitioned disk image, the OS installation process halts +somewhere during the process. The issues can be resolved by setting +qemu-system-ppc.exe to run in Windows 7 compatibility mode. +AFAIK all Qemu builds for Windows since Mac OS 9 became available as +guest are affected. +The issue is reproducible by installing Qemu for Windows from Stephan +Weil on Windows 10 and boot/install Mac OS 9.x + +Best regards and thanks for looking into this, +Howard + +On Nov 25, 2016, at 9:26 AM, address@hidden wrote: +Hi all, + +I've been experiencing issues when installing Mac OS 9.x using +qemu-system-ppc.exe in Windows 10. After booting from CD image, +partitioning a fresh disk image often hangs Qemu. When using a +pre-partitioned disk image, the OS installation process halts +somewhere during the process. 
The issues can be resolved by setting +qemu-system-ppc.exe to run in Windows 7 compatibility mode. +AFAIK all Qemu builds for Windows since Mac OS 9 became available as +guest are affected. +The issue is reproducible by installing Qemu for Windows from Stephan +Weil on Windows 10 and boot/install Mac OS 9.x + +Best regards and thanks for looking into this, +Howard +I assume there was some kind of behavior change for some of the +Windows API between Windows 7 and Windows 10, that is my guess as to +why the compatibility mode works. Could you run 'make check' on your +system, once in Windows 7 and once in Windows 10. Maybe the tests +will tell us something. I'm hoping that one of the tests succeeds in +Windows 7 and fails in Windows 10. That would help us pinpoint what +the problem is. +What I mean by run in Windows 7 is set the mingw environment to run +in Windows 7 compatibility mode (if possible). If you have Windows 7 +on another partition you could boot from, that would be better. +Good luck. +p.s. use 'make check -k' to allow all the tests to run (even if one +or more of the tests fails). + +> +> Hi all, +> +> +> +> I've been experiencing issues when installing Mac OS 9.x using +> +> qemu-system-ppc.exe in Windows 10. After booting from CD image, +> +> partitioning a fresh disk image often hangs Qemu. When using a +> +> pre-partitioned disk image, the OS installation process halts +> +> somewhere during the process. The issues can be resolved by setting +> +> qemu-system-ppc.exe to run in Windows 7 compatibility mode. +> +> AFAIK all Qemu builds for Windows since Mac OS 9 became available as +> +> guest are affected. +> +> The issue is reproducible by installing Qemu for Windows from Stephan +> +> Weil on Windows 10 and boot/install Mac OS 9.x +> +> +> +> Best regards and thanks for looking into this, +> +> Howard +> +> +> +I assume there was some kind of behavior change for some of the Windows API +> +between Windows 7 and Windows 10, that is my guess as to why the +> +compatibility mode works. Could you run 'make check' on your system, once in +> +Windows 7 and once in Windows 10. Maybe the tests will tell us something. +> +I'm hoping that one of the tests succeeds in Windows 7 and fails in Windows +> +10. That would help us pinpoint what the problem is. +> +> +What I mean by run in Windows 7 is set the mingw environment to run in +> +Windows 7 compatibility mode (if possible). If you have Windows 7 on another +> +partition you could boot from, that would be better. +> +> +Good luck. +> +> +p.s. use 'make check -k' to allow all the tests to run (even if one or more +> +of the tests fails). +Hi, + +Thank you for you suggestion, but I have no means to run the check you +suggest. I cross-compile from Linux. + +Best regards, +Howard + diff --git a/classification_output/01/other/8691137 b/classification_output/01/other/8691137 new file mode 100644 index 000000000..7847ba8bc --- /dev/null +++ b/classification_output/01/other/8691137 @@ -0,0 +1,180 @@ +other: 0.690 +instruction: 0.581 +mistranslation: 0.554 +semantic: 0.521 + +[Qemu-devel] [BUG 2.6] Broken CONFIG_TPM? + +A compilation test with clang -Weverything reported this problem: + +config-host.h:112:20: warning: '$' in identifier +[-Wdollar-in-identifier-extension] + +The line of code looks like this: + +#define CONFIG_TPM $(CONFIG_SOFTMMU) + +This is fine for Makefile code, but won't work as expected in C code. 
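+
+As a minimal illustration (not part of the original report, and relying only
+on standard C preprocessor behaviour), the stand-alone fragment below shows
+why such a generated line defeats #ifdef-based configuration: the macro is
+defined no matter how QEMU was configured, so the TPM code path is always
+compiled in, and the make-style $(...) text is never expanded by the C
+preprocessor.
+
+/* Hypothetical example, not QEMU code; it compiles, and clang's
+ * -Weverything flags the '$' exactly as in the report above. */
+#define CONFIG_TPM $(CONFIG_SOFTMMU)   /* what config-host.h contains */
+
+#ifdef CONFIG_TPM
+/* Always taken: #ifdef only asks whether the macro is defined, and it
+ * is, even when softmmu/TPM support was configured out. */
+void tpm_init_stub(void) { }
+#endif
+
+/* A correct generated header would emit either nothing (TPM disabled)
+ * or a plain: #define CONFIG_TPM 1 */
+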
+ +Am 28.04.2016 um 22:33 schrieb Stefan Weil: +> +A compilation test with clang -Weverything reported this problem: +> +> +config-host.h:112:20: warning: '$' in identifier +> +[-Wdollar-in-identifier-extension] +> +> +The line of code looks like this: +> +> +#define CONFIG_TPM $(CONFIG_SOFTMMU) +> +> +This is fine for Makefile code, but won't work as expected in C code. +> +A complete 64 bit build with clang -Weverything creates a log file of +1.7 GB. +Here are the uniq warnings sorted by their frequency: + + 1 -Wflexible-array-extensions + 1 -Wgnu-folding-constant + 1 -Wunknown-pragmas + 1 -Wunknown-warning-option + 1 -Wunreachable-code-loop-increment + 2 -Warray-bounds-pointer-arithmetic + 2 -Wdollar-in-identifier-extension + 3 -Woverlength-strings + 3 -Wweak-vtables + 4 -Wgnu-empty-struct + 4 -Wstring-conversion + 6 -Wclass-varargs + 7 -Wc99-extensions + 7 -Wc++-compat + 8 -Wfloat-equal + 11 -Wformat-nonliteral + 16 -Wshift-negative-value + 19 -Wglobal-constructors + 28 -Wc++11-long-long + 29 -Wembedded-directive + 38 -Wvla + 40 -Wcovered-switch-default + 40 -Wmissing-variable-declarations + 49 -Wold-style-cast + 53 -Wgnu-conditional-omitted-operand + 56 -Wformat-pedantic + 61 -Wvariadic-macros + 77 -Wc++11-extensions + 83 -Wgnu-flexible-array-initializer + 83 -Wzero-length-array + 96 -Wgnu-designator + 102 -Wmissing-noreturn + 103 -Wconditional-uninitialized + 107 -Wdisabled-macro-expansion + 115 -Wunreachable-code-return + 134 -Wunreachable-code + 243 -Wunreachable-code-break + 257 -Wfloat-conversion + 280 -Wswitch-enum + 291 -Wpointer-arith + 298 -Wshadow + 378 -Wassign-enum + 395 -Wused-but-marked-unused + 420 -Wreserved-id-macro + 493 -Wdocumentation + 510 -Wshift-sign-overflow + 565 -Wgnu-case-range + 566 -Wgnu-zero-variadic-macro-arguments + 650 -Wbad-function-cast + 705 -Wmissing-field-initializers + 817 -Wgnu-statement-expression + 968 -Wdocumentation-unknown-command + 1021 -Wextra-semi + 1112 -Wgnu-empty-initializer + 1138 -Wcast-qual + 1509 -Wcast-align + 1766 -Wextended-offsetof + 1937 -Wsign-compare + 2130 -Wpacked + 2404 -Wunused-macros + 3081 -Wpadded + 4182 -Wconversion + 5430 -Wlanguage-extension-token + 6655 -Wshorten-64-to-32 + 6995 -Wpedantic + 7354 -Wunused-parameter + 27659 -Wsign-conversion + +Stefan Weil <address@hidden> writes: + +> +A compilation test with clang -Weverything reported this problem: +> +> +config-host.h:112:20: warning: '$' in identifier +> +[-Wdollar-in-identifier-extension] +> +> +The line of code looks like this: +> +> +#define CONFIG_TPM $(CONFIG_SOFTMMU) +> +> +This is fine for Makefile code, but won't work as expected in C code. +Broken in commit 3b8acc1 "configure: fix TPM logic". Cc'ing Paolo. + +Impact: #ifdef CONFIG_TPM never disables code. There are no other uses +of CONFIG_TPM in C code. + +I had a quick peek at configure and create_config, but refrained from +attempting to fix this, since I don't understand when exactly CONFIG_TPM +should be defined. + +On 29 April 2016 at 08:42, Markus Armbruster <address@hidden> wrote: +> +Stefan Weil <address@hidden> writes: +> +> +> A compilation test with clang -Weverything reported this problem: +> +> +> +> config-host.h:112:20: warning: '$' in identifier +> +> [-Wdollar-in-identifier-extension] +> +> +> +> The line of code looks like this: +> +> +> +> #define CONFIG_TPM $(CONFIG_SOFTMMU) +> +> +> +> This is fine for Makefile code, but won't work as expected in C code. +> +> +Broken in commit 3b8acc1 "configure: fix TPM logic". Cc'ing Paolo. 
+> +> +Impact: #ifdef CONFIG_TPM never disables code. There are no other uses +> +of CONFIG_TPM in C code. +> +> +I had a quick peek at configure and create_config, but refrained from +> +attempting to fix this, since I don't understand when exactly CONFIG_TPM +> +should be defined. +Looking at 'git blame' suggests this has been wrong like this for +some years, so we don't need to scramble to fix it for 2.6. + +thanks +-- PMM + diff --git a/classification_output/01/other/9777608 b/classification_output/01/other/9777608 new file mode 100644 index 000000000..c82ece20b --- /dev/null +++ b/classification_output/01/other/9777608 @@ -0,0 +1,281 @@ +other: 0.983 +instruction: 0.978 +semantic: 0.968 +mistranslation: 0.948 + +[Bug] Take more 150s to boot qemu on ARM64 + +Hi all, +I encounter a issue with kernel 5.19-rc1 on a ARM64 board: it takes +about 150s between beginning to run qemu command and beginng to boot +Linux kernel ("EFI stub: Booting Linux Kernel..."). +But in kernel 5.18-rc4, it only takes about 5s. I git bisect the kernel +code and it finds c2445d387850 ("srcu: Add contention check to +call_srcu() srcu_data ->lock acquisition"). +The qemu (qemu version is 6.2.92) command i run is : + +./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \ +--trace "kvm*" \ +-cpu host \ +-machine virt,accel=kvm,gic-version=3 \ +-machine smp.cpus=2,smp.sockets=2 \ +-no-reboot \ +-nographic \ +-monitor unix:/home/cx/qmp-test,server,nowait \ +-bios /home/cx/boot/QEMU_EFI.fd \ +-kernel /home/cx/boot/Image \ +-device +pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1 +\ +-device vfio-pci,host=7d:01.3,id=net0 \ +-device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4 \ +-drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \ +-append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \ +-net none \ +-D /home/cx/qemu_log.txt +I am not familiar with rcu code, and don't know how it causes the issue. +Do you have any idea about this issue? +Best Regard, + +Xiang Chen + +On Mon, Jun 13, 2022 at 08:26:34PM +0800, chenxiang (M) wrote: +> +Hi all, +> +> +I encounter a issue with kernel 5.19-rc1 on a ARM64 board: it takes about +> +150s between beginning to run qemu command and beginng to boot Linux kernel +> +("EFI stub: Booting Linux Kernel..."). +> +> +But in kernel 5.18-rc4, it only takes about 5s. I git bisect the kernel code +> +and it finds c2445d387850 ("srcu: Add contention check to call_srcu() +> +srcu_data ->lock acquisition"). +> +> +The qemu (qemu version is 6.2.92) command i run is : +> +> +./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \ +> +--trace "kvm*" \ +> +-cpu host \ +> +-machine virt,accel=kvm,gic-version=3 \ +> +-machine smp.cpus=2,smp.sockets=2 \ +> +-no-reboot \ +> +-nographic \ +> +-monitor unix:/home/cx/qmp-test,server,nowait \ +> +-bios /home/cx/boot/QEMU_EFI.fd \ +> +-kernel /home/cx/boot/Image \ +> +-device +> +pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1 +> +\ +> +-device vfio-pci,host=7d:01.3,id=net0 \ +> +-device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4 \ +> +-drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \ +> +-append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \ +> +-net none \ +> +-D /home/cx/qemu_log.txt +> +> +I am not familiar with rcu code, and don't know how it causes the issue. Do +> +you have any idea about this issue? 
+Please see the discussion here: +https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/ +Though that report requires ACPI to be forced on to get the +delay, which results in more than 9,000 back-to-back calls to +synchronize_srcu_expedited(). I cannot reproduce this on my setup, even +with an artificial tight loop invoking synchronize_srcu_expedited(), +but then again I don't have ARM hardware. + +My current guess is that the following patch, but with larger values for +SRCU_MAX_NODELAY_PHASE. Here "larger" might well be up in the hundreds, +or perhaps even larger. + +If you get a chance to experiment with this, could you please reply +to the discussion at the above URL? (Or let me know, and I can CC +you on the next message in that thread.) + + Thanx, Paul + +------------------------------------------------------------------------ + +diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c +index 50ba70f019dea..0db7873f4e95b 100644 +--- a/kernel/rcu/srcutree.c ++++ b/kernel/rcu/srcutree.c +@@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp) + + #define SRCU_INTERVAL 1 // Base delay if no expedited GPs +pending. + #define SRCU_MAX_INTERVAL 10 // Maximum incremental delay from slow +readers. +-#define SRCU_MAX_NODELAY_PHASE 1 // Maximum per-GP-phase consecutive +no-delay instances. ++#define SRCU_MAX_NODELAY_PHASE 3 // Maximum per-GP-phase consecutive +no-delay instances. + #define SRCU_MAX_NODELAY 100 // Maximum consecutive no-delay +instances. + + /* +@@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp) + */ + static unsigned long srcu_get_delay(struct srcu_struct *ssp) + { ++ unsigned long gpstart; ++ unsigned long j; + unsigned long jbase = SRCU_INTERVAL; + + if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), +READ_ONCE(ssp->srcu_gp_seq_needed_exp))) + jbase = 0; +- if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) +- jbase += jiffies - READ_ONCE(ssp->srcu_gp_start); +- if (!jbase) { +- WRITE_ONCE(ssp->srcu_n_exp_nodelay, +READ_ONCE(ssp->srcu_n_exp_nodelay) + 1); +- if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE) +- jbase = 1; ++ if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) { ++ j = jiffies - 1; ++ gpstart = READ_ONCE(ssp->srcu_gp_start); ++ if (time_after(j, gpstart)) ++ jbase += j - gpstart; ++ if (!jbase) { ++ WRITE_ONCE(ssp->srcu_n_exp_nodelay, +READ_ONCE(ssp->srcu_n_exp_nodelay) + 1); ++ if (READ_ONCE(ssp->srcu_n_exp_nodelay) > +SRCU_MAX_NODELAY_PHASE) ++ jbase = 1; ++ } + } + return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase; + } + +å¨ 2022/6/13 21:22, Paul E. McKenney åé: +On Mon, Jun 13, 2022 at 08:26:34PM +0800, chenxiang (M) wrote: +Hi all, + +I encounter a issue with kernel 5.19-rc1 on a ARM64 board: it takes about +150s between beginning to run qemu command and beginng to boot Linux kernel +("EFI stub: Booting Linux Kernel..."). + +But in kernel 5.18-rc4, it only takes about 5s. I git bisect the kernel code +and it finds c2445d387850 ("srcu: Add contention check to call_srcu() +srcu_data ->lock acquisition"). 
+ +The qemu (qemu version is 6.2.92) command i run is : + +./qemu-system-aarch64 -m 4G,slots=4,maxmem=8g \ +--trace "kvm*" \ +-cpu host \ +-machine virt,accel=kvm,gic-version=3 \ +-machine smp.cpus=2,smp.sockets=2 \ +-no-reboot \ +-nographic \ +-monitor unix:/home/cx/qmp-test,server,nowait \ +-bios /home/cx/boot/QEMU_EFI.fd \ +-kernel /home/cx/boot/Image \ +-device +pcie-root-port,port=0x8,chassis=1,id=net1,bus=pcie.0,multifunction=on,addr=0x1 +\ +-device vfio-pci,host=7d:01.3,id=net0 \ +-device virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=4 \ +-drive file=/home/cx/boot/boot_ubuntu.img,if=none,id=drive0 \ +-append "rdinit=init console=ttyAMA0 root=/dev/vda rootfstype=ext4 rw " \ +-net none \ +-D /home/cx/qemu_log.txt + +I am not familiar with rcu code, and don't know how it causes the issue. Do +you have any idea about this issue? +Please see the discussion here: +https://lore.kernel.org/all/20615615-0013-5adc-584f-2b1d5c03ebfc@linaro.org/ +Though that report requires ACPI to be forced on to get the +delay, which results in more than 9,000 back-to-back calls to +synchronize_srcu_expedited(). I cannot reproduce this on my setup, even +with an artificial tight loop invoking synchronize_srcu_expedited(), +but then again I don't have ARM hardware. + +My current guess is that the following patch, but with larger values for +SRCU_MAX_NODELAY_PHASE. Here "larger" might well be up in the hundreds, +or perhaps even larger. + +If you get a chance to experiment with this, could you please reply +to the discussion at the above URL? (Or let me know, and I can CC +you on the next message in that thread.) +Ok, thanks, i will reply it on above URL. +Thanx, Paul + +------------------------------------------------------------------------ + +diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c +index 50ba70f019dea..0db7873f4e95b 100644 +--- a/kernel/rcu/srcutree.c ++++ b/kernel/rcu/srcutree.c +@@ -513,7 +513,7 @@ static bool srcu_readers_active(struct srcu_struct *ssp) +#define SRCU_INTERVAL 1 // Base delay if no expedited GPs pending. +#define SRCU_MAX_INTERVAL 10 // Maximum incremental delay from slow +readers. +-#define SRCU_MAX_NODELAY_PHASE 1 // Maximum per-GP-phase consecutive +no-delay instances. ++#define SRCU_MAX_NODELAY_PHASE 3 // Maximum per-GP-phase consecutive +no-delay instances. + #define SRCU_MAX_NODELAY 100 // Maximum consecutive no-delay +instances. +/* +@@ -522,16 +522,22 @@ static bool srcu_readers_active(struct srcu_struct *ssp) + */ + static unsigned long srcu_get_delay(struct srcu_struct *ssp) + { ++ unsigned long gpstart; ++ unsigned long j; + unsigned long jbase = SRCU_INTERVAL; +if (ULONG_CMP_LT(READ_ONCE(ssp->srcu_gp_seq), READ_ONCE(ssp->srcu_gp_seq_needed_exp))) +jbase = 0; +- if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) +- jbase += jiffies - READ_ONCE(ssp->srcu_gp_start); +- if (!jbase) { +- WRITE_ONCE(ssp->srcu_n_exp_nodelay, +READ_ONCE(ssp->srcu_n_exp_nodelay) + 1); +- if (READ_ONCE(ssp->srcu_n_exp_nodelay) > SRCU_MAX_NODELAY_PHASE) +- jbase = 1; ++ if (rcu_seq_state(READ_ONCE(ssp->srcu_gp_seq))) { ++ j = jiffies - 1; ++ gpstart = READ_ONCE(ssp->srcu_gp_start); ++ if (time_after(j, gpstart)) ++ jbase += j - gpstart; ++ if (!jbase) { ++ WRITE_ONCE(ssp->srcu_n_exp_nodelay, +READ_ONCE(ssp->srcu_n_exp_nodelay) + 1); ++ if (READ_ONCE(ssp->srcu_n_exp_nodelay) > +SRCU_MAX_NODELAY_PHASE) ++ jbase = 1; ++ } + } + return jbase > SRCU_MAX_INTERVAL ? SRCU_MAX_INTERVAL : jbase; + } +. 
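+
+For anyone reproducing the slow boot, a minimal experiment along the lines
+Paul suggests above (a sketch only: it assumes his patch is already applied,
+and the value 100 is illustrative rather than a proposed fix) is to enlarge
+the per-phase no-delay budget and re-measure the time from launching qemu to
+"EFI stub: Booting Linux Kernel...":
+
+--- a/kernel/rcu/srcutree.c
++++ b/kernel/rcu/srcutree.c
+@@
+-#define SRCU_MAX_NODELAY_PHASE 3 // Maximum per-GP-phase consecutive no-delay instances.
++#define SRCU_MAX_NODELAY_PHASE 100 // Experimental value, per the "hundreds" suggestion above.
+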
+ diff --git a/classification_output/01/other/9840852 b/classification_output/01/other/9840852 new file mode 100644 index 000000000..d5827c3d0 --- /dev/null +++ b/classification_output/01/other/9840852 @@ -0,0 +1,1179 @@ +other: 0.980 +semantic: 0.975 +instruction: 0.947 +mistranslation: 0.942 + +[Bug Report] smmuv3 event 0x10 report when running virtio-blk-pci + +Hi All, + +When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10 +during kernel booting up. + +qemu command which I use is as below: + +qemu-system-aarch64 -machine virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ +-kernel Image -initrd minifs.cpio.gz \ +-enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ +-append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ +-device +pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 +\ +-device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \ +-device +virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ +-drive file=/home/boot.img,if=none,id=drive0,format=raw + +smmuv3 event 0x10 log: +[...] +[ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 +[ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) +[ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues +[ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks (1.07 +GB/1.00 GiB) +[ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +[ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 +[ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +[ 1.968381] clk: Disabling unused clocks +[ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +[ 1.968990] PM: genpd: Disabling unused power domains +[ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +[ 1.969814] ALSA device list: +[ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +[ 1.970471] No soundcards found. +[ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +[ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +[ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +[ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +[ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +[ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +[ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +[ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +[ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +[ 1.975005] Freeing unused kernel memory: 10112K +[ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +[ 1.975442] Run init as init process + +Another information is that if "maxcpus=3" is removed from the kernel command +line, +it will be OK. + +I am not sure if there is a bug about vsmmu. It will be very appreciated if +anyone +know this issue or can take a look at it. + +Thanks, +Zhou + +On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote: +> +> +Hi All, +> +> +When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10 +> +during kernel booting up. +Does it still do this if you either: + (1) use the v9.1.0 release (commit fd1952d814da) + (2) use "-machine virt-9.1" instead of "-machine virt" + +? + +My suspicion is that this will have started happening now that +we expose an SMMU with two-stage translation support to the guest +in the "virt" machine type (which we do not if you either +use virt-9.1 or in the v9.1.0 release). 
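+
+(An aside, not part of the original thread: assuming the event-record layout
+used by the Linux arm-smmu-v3 driver - event ID in bits [7:0] of the first
+64-bit word, StreamID in bits [63:32] - the first record word quoted in the
+log above can be decoded with a small stand-alone helper. It ties the 0x10
+(F_TRANSLATION) fault to StreamID 0x200, i.e. the virtio-blk device at PCI
+02:00.0, with a faulting input address of 0, which matches the
+"translation failed for iova=0x0" qemu trace quoted later in the thread.)
+
+#include <stdint.h>
+#include <stdio.h>
+
+int main(void)
+{
+    uint64_t evt0 = 0x0000020000000010ULL; /* first event-record word from the log */
+    uint64_t evt2 = 0x0000000000000000ULL; /* third word: faulting input address */
+
+    printf("event id  : 0x%02llx\n", (unsigned long long)(evt0 & 0xff)); /* 0x10 = F_TRANSLATION */
+    printf("stream id : 0x%llx\n", (unsigned long long)(evt0 >> 32));    /* 0x200 = PCI 02:00.0 */
+    printf("input addr: 0x%llx\n", (unsigned long long)evt2);            /* 0x0 */
+    return 0;
+}
+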
+ +I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of +the two-stage support). + +> +qemu command which I use is as below: +> +> +qemu-system-aarch64 -machine +> +virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ +> +-kernel Image -initrd minifs.cpio.gz \ +> +-enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ +> +-append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ +> +-device +> +pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 +> +\ +> +-device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \ +> +-device +> +virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ +> +-drive file=/home/boot.img,if=none,id=drive0,format=raw +> +> +smmuv3 event 0x10 log: +> +[...] +> +[ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 +> +[ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) +> +[ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues +> +[ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks +> +(1.07 GB/1.00 GiB) +> +[ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +[ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 +> +[ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +[ 1.968381] clk: Disabling unused clocks +> +[ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +[ 1.968990] PM: genpd: Disabling unused power domains +> +[ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +[ 1.969814] ALSA device list: +> +[ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +[ 1.970471] No soundcards found. +> +[ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +[ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +[ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +[ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +[ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +[ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +[ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +[ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +[ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +[ 1.975005] Freeing unused kernel memory: 10112K +> +[ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +[ 1.975442] Run init as init process +> +> +Another information is that if "maxcpus=3" is removed from the kernel command +> +line, +> +it will be OK. +> +> +I am not sure if there is a bug about vsmmu. It will be very appreciated if +> +anyone +> +know this issue or can take a look at it. +thanks +-- PMM + +On 2024/9/9 22:31, Peter Maydell wrote: +> +On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote: +> +> +> +> Hi All, +> +> +> +> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10 +> +> during kernel booting up. +> +> +Does it still do this if you either: +> +(1) use the v9.1.0 release (commit fd1952d814da) +> +(2) use "-machine virt-9.1" instead of "-machine virt" +I tested above two cases, the problem is still there. + +> +> +? +> +> +My suspicion is that this will have started happening now that +> +we expose an SMMU with two-stage translation support to the guest +> +in the "virt" machine type (which we do not if you either +> +use virt-9.1 or in the v9.1.0 release). +> +> +I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of +> +the two-stage support). 
+> +> +> qemu command which I use is as below: +> +> +> +> qemu-system-aarch64 -machine +> +> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ +> +> -kernel Image -initrd minifs.cpio.gz \ +> +> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ +> +> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ +> +> -device +> +> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 +> +> \ +> +> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \ +> +> -device +> +> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ +> +> -drive file=/home/boot.img,if=none,id=drive0,format=raw +> +> +> +> smmuv3 event 0x10 log: +> +> [...] +> +> [ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 +> +> [ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) +> +> [ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues +> +> [ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks +> +> (1.07 GB/1.00 GiB) +> +> [ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +> [ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 +> +> [ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +> [ 1.968381] clk: Disabling unused clocks +> +> [ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +> [ 1.968990] PM: genpd: Disabling unused power domains +> +> [ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +> [ 1.969814] ALSA device list: +> +> [ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +> [ 1.970471] No soundcards found. +> +> [ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +> [ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +> [ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +> [ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +> [ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +> [ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +> [ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +> [ 1.975005] Freeing unused kernel memory: 10112K +> +> [ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +> [ 1.975442] Run init as init process +> +> +> +> Another information is that if "maxcpus=3" is removed from the kernel +> +> command line, +> +> it will be OK. +> +> +> +> I am not sure if there is a bug about vsmmu. It will be very appreciated if +> +> anyone +> +> know this issue or can take a look at it. +> +> +thanks +> +-- PMM +> +. + +Hi Zhou, +On 9/10/24 03:24, Zhou Wang via wrote: +> +On 2024/9/9 22:31, Peter Maydell wrote: +> +> On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote: +> +>> Hi All, +> +>> +> +>> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10 +> +>> during kernel booting up. +> +> Does it still do this if you either: +> +> (1) use the v9.1.0 release (commit fd1952d814da) +> +> (2) use "-machine virt-9.1" instead of "-machine virt" +> +I tested above two cases, the problem is still there. +Thank you for reporting. I am able to reproduce and effectively the +maxcpus kernel option is triggering the issue. It works without. I will +come back to you asap. + +Eric +> +> +> ? 
+> +> +> +> My suspicion is that this will have started happening now that +> +> we expose an SMMU with two-stage translation support to the guest +> +> in the "virt" machine type (which we do not if you either +> +> use virt-9.1 or in the v9.1.0 release). +> +> +> +> I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of +> +> the two-stage support). +> +> +> +>> qemu command which I use is as below: +> +>> +> +>> qemu-system-aarch64 -machine +> +>> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ +> +>> -kernel Image -initrd minifs.cpio.gz \ +> +>> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ +> +>> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ +> +>> -device +> +>> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 +> +>> \ +> +>> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \ +> +>> -device +> +>> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ +> +>> -drive file=/home/boot.img,if=none,id=drive0,format=raw +> +>> +> +>> smmuv3 event 0x10 log: +> +>> [...] +> +>> [ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 +> +>> [ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) +> +>> [ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues +> +>> [ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks +> +>> (1.07 GB/1.00 GiB) +> +>> [ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +>> [ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 +> +>> [ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +>> [ 1.968381] clk: Disabling unused clocks +> +>> [ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +>> [ 1.968990] PM: genpd: Disabling unused power domains +> +>> [ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.969814] ALSA device list: +> +>> [ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.970471] No soundcards found. +> +>> [ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +>> [ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +>> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +>> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +>> [ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +>> [ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +>> [ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.975005] Freeing unused kernel memory: 10112K +> +>> [ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.975442] Run init as init process +> +>> +> +>> Another information is that if "maxcpus=3" is removed from the kernel +> +>> command line, +> +>> it will be OK. +> +>> +> +>> I am not sure if there is a bug about vsmmu. It will be very appreciated if +> +>> anyone +> +>> know this issue or can take a look at it. +> +> thanks +> +> -- PMM +> +> . + +Hi, + +On 9/10/24 03:24, Zhou Wang via wrote: +> +On 2024/9/9 22:31, Peter Maydell wrote: +> +> On Mon, 9 Sept 2024 at 15:22, Zhou Wang via <qemu-devel@nongnu.org> wrote: +> +>> Hi All, +> +>> +> +>> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10 +> +>> during kernel booting up. 
+> +> Does it still do this if you either: +> +> (1) use the v9.1.0 release (commit fd1952d814da) +> +> (2) use "-machine virt-9.1" instead of "-machine virt" +> +I tested above two cases, the problem is still there. +I have not much progressed yet but I see it comes with +qemu traces. + +smmuv3-iommu-memory-region-0-0 translation failed for iova=0x0 +(SMMU_EVT_F_TRANSLATION) +../.. +qemu-system-aarch64: virtio-blk failed to set guest notifier (-22), +ensure -accel kvm is set. +qemu-system-aarch64: virtio_bus_start_ioeventfd: failed. Fallback to +userspace (slower). + +the PCIe Host bridge seems to cause that translation failure at iova=0 + +Also virtio-iommu has the same issue: +qemu-system-aarch64: virtio_iommu_translate no mapping for 0x0 for sid=1024 +qemu-system-aarch64: virtio-blk failed to set guest notifier (-22), +ensure -accel kvm is set. +qemu-system-aarch64: virtio_bus_start_ioeventfd: failed. Fallback to +userspace (slower). + +Only happens with maxcpus=3. Note the virtio-blk-pci is not protected by +the vIOMMU in your case. + +Thanks + +Eric + +> +> +> ? +> +> +> +> My suspicion is that this will have started happening now that +> +> we expose an SMMU with two-stage translation support to the guest +> +> in the "virt" machine type (which we do not if you either +> +> use virt-9.1 or in the v9.1.0 release). +> +> +> +> I've cc'd Eric (smmuv3 maintainer) and Mostafa (author of +> +> the two-stage support). +> +> +> +>> qemu command which I use is as below: +> +>> +> +>> qemu-system-aarch64 -machine +> +>> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ +> +>> -kernel Image -initrd minifs.cpio.gz \ +> +>> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ +> +>> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ +> +>> -device +> +>> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 +> +>> \ +> +>> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \ +> +>> -device +> +>> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ +> +>> -drive file=/home/boot.img,if=none,id=drive0,format=raw +> +>> +> +>> smmuv3 event 0x10 log: +> +>> [...] +> +>> [ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 +> +>> [ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) +> +>> [ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues +> +>> [ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks +> +>> (1.07 GB/1.00 GiB) +> +>> [ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +>> [ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 +> +>> [ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +>> [ 1.968381] clk: Disabling unused clocks +> +>> [ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +>> [ 1.968990] PM: genpd: Disabling unused power domains +> +>> [ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.969814] ALSA device list: +> +>> [ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.970471] No soundcards found. 
+> +>> [ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +>> [ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +>> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +>> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +>> [ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +>> [ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +>> [ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.975005] Freeing unused kernel memory: 10112K +> +>> [ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.975442] Run init as init process +> +>> +> +>> Another information is that if "maxcpus=3" is removed from the kernel +> +>> command line, +> +>> it will be OK. +> +>> +> +>> I am not sure if there is a bug about vsmmu. It will be very appreciated if +> +>> anyone +> +>> know this issue or can take a look at it. +> +> thanks +> +> -- PMM +> +> . + +Hi Zhou, + +On Mon, Sep 9, 2024 at 3:22â¯PM Zhou Wang via <qemu-devel@nongnu.org> wrote: +> +> +Hi All, +> +> +When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10 +> +during kernel booting up. +> +> +qemu command which I use is as below: +> +> +qemu-system-aarch64 -machine +> +virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ +> +-kernel Image -initrd minifs.cpio.gz \ +> +-enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ +> +-append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ +> +-device +> +pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 +> +\ +> +-device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \ +> +-device +> +virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ +> +-drive file=/home/boot.img,if=none,id=drive0,format=raw +> +> +smmuv3 event 0x10 log: +> +[...] +> +[ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 +> +[ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) +> +[ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues +> +[ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks +> +(1.07 GB/1.00 GiB) +> +[ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +[ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 +> +[ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +[ 1.968381] clk: Disabling unused clocks +> +[ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +[ 1.968990] PM: genpd: Disabling unused power domains +> +[ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +[ 1.969814] ALSA device list: +> +[ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +[ 1.970471] No soundcards found. 
+> +[ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +[ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +[ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +[ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +[ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +[ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +[ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +[ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +[ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +[ 1.975005] Freeing unused kernel memory: 10112K +> +[ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +[ 1.975442] Run init as init process +> +> +Another information is that if "maxcpus=3" is removed from the kernel command +> +line, +> +it will be OK. +> +That's interesting, not sure how that would be related. + +> +I am not sure if there is a bug about vsmmu. It will be very appreciated if +> +anyone +> +know this issue or can take a look at it. +> +Can you please provide logs with adding "-d trace:smmu*" to qemu invocation. + +Also if possible, can you please provide which Linux kernel version +you are using, I will see if I can repro. + +Thanks, +Mostafa + +> +Thanks, +> +Zhou +> +> +> + +On 2024/9/9 22:47, Mostafa Saleh wrote: +> +Hi Zhou, +> +> +On Mon, Sep 9, 2024 at 3:22â¯PM Zhou Wang via <qemu-devel@nongnu.org> wrote: +> +> +> +> Hi All, +> +> +> +> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event 0x10 +> +> during kernel booting up. +> +> +> +> qemu command which I use is as below: +> +> +> +> qemu-system-aarch64 -machine +> +> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ +> +> -kernel Image -initrd minifs.cpio.gz \ +> +> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ +> +> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ +> +> -device +> +> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 +> +> \ +> +> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 \ +> +> -device +> +> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ +> +> -drive file=/home/boot.img,if=none,id=drive0,format=raw +> +> +> +> smmuv3 event 0x10 log: +> +> [...] +> +> [ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 +> +> [ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) +> +> [ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues +> +> [ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks +> +> (1.07 GB/1.00 GiB) +> +> [ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +> [ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 +> +> [ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +> [ 1.968381] clk: Disabling unused clocks +> +> [ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +> [ 1.968990] PM: genpd: Disabling unused power domains +> +> [ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +> [ 1.969814] ALSA device list: +> +> [ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +> [ 1.970471] No soundcards found. 
+> +> [ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +> [ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +> [ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +> [ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +> [ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +> [ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +> [ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +> [ 1.975005] Freeing unused kernel memory: 10112K +> +> [ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +> [ 1.975442] Run init as init process +> +> +> +> Another information is that if "maxcpus=3" is removed from the kernel +> +> command line, +> +> it will be OK. +> +> +> +> +That's interesting, not sure how that would be related. +> +> +> I am not sure if there is a bug about vsmmu. It will be very appreciated if +> +> anyone +> +> know this issue or can take a look at it. +> +> +> +> +Can you please provide logs with adding "-d trace:smmu*" to qemu invocation. +Sure. Please see the attached log(using above qemu commit and command). + +> +> +Also if possible, can you please provide which Linux kernel version +> +you are using, I will see if I can repro. +I just use the latest mainline kernel(commit b831f83e40a2) with defconfig. + +Thanks, +Zhou + +> +> +Thanks, +> +Mostafa +> +> +> Thanks, +> +> Zhou +> +> +> +> +> +> +> +> +. +qemu_boot_log.txt +Description: +Text document + +On Tue, Sep 10, 2024 at 2:51â¯AM Zhou Wang <wangzhou1@hisilicon.com> wrote: +> +> +On 2024/9/9 22:47, Mostafa Saleh wrote: +> +> Hi Zhou, +> +> +> +> On Mon, Sep 9, 2024 at 3:22â¯PM Zhou Wang via <qemu-devel@nongnu.org> wrote: +> +>> +> +>> Hi All, +> +>> +> +>> When I tested mainline qemu(commit 7b87a25f49), it reports smmuv3 event +> +>> 0x10 +> +>> during kernel booting up. +> +>> +> +>> qemu command which I use is as below: +> +>> +> +>> qemu-system-aarch64 -machine +> +>> virt,kernel_irqchip=on,gic-version=3,iommu=smmuv3 \ +> +>> -kernel Image -initrd minifs.cpio.gz \ +> +>> -enable-kvm -net none -nographic -m 3G -smp 6 -cpu host \ +> +>> -append 'rdinit=init console=ttyAMA0 ealycon=pl0ll,0x90000000 maxcpus=3' \ +> +>> -device +> +>> pcie-root-port,port=0x8,chassis=0,id=pci.0,bus=pcie.0,multifunction=on,addr=0x2 +> +>> \ +> +>> -device pcie-root-port,port=0x9,chassis=1,id=pci.1,bus=pcie.0,addr=0x2.0x1 +> +>> \ +> +>> -device +> +>> virtio-blk-pci,drive=drive0,id=virtblk0,num-queues=8,packed=on,bus=pci.1 \ +> +>> -drive file=/home/boot.img,if=none,id=drive0,format=raw +> +>> +> +>> smmuv3 event 0x10 log: +> +>> [...] 
+> +>> [ 1.962656] virtio-pci 0000:02:00.0: Adding to iommu group 0 +> +>> [ 1.963150] virtio-pci 0000:02:00.0: enabling device (0000 -> 0002) +> +>> [ 1.964707] virtio_blk virtio0: 6/0/0 default/read/poll queues +> +>> [ 1.965759] virtio_blk virtio0: [vda] 2097152 512-byte logical blocks +> +>> (1.07 GB/1.00 GiB) +> +>> [ 1.966934] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +>> [ 1.967442] input: gpio-keys as /devices/platform/gpio-keys/input/input0 +> +>> [ 1.967478] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +>> [ 1.968381] clk: Disabling unused clocks +> +>> [ 1.968677] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +>> [ 1.968990] PM: genpd: Disabling unused power domains +> +>> [ 1.969424] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.969814] ALSA device list: +> +>> [ 1.970240] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.970471] No soundcards found. +> +>> [ 1.970902] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +>> [ 1.971600] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +>> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +>> [ 1.971601] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.971602] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.971606] arm-smmu-v3 9050000.smmuv3: event 0x10 received: +> +>> [ 1.971607] arm-smmu-v3 9050000.smmuv3: 0x0000020000000010 +> +>> [ 1.974202] arm-smmu-v3 9050000.smmuv3: 0x0000020000000000 +> +>> [ 1.974634] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.975005] Freeing unused kernel memory: 10112K +> +>> [ 1.975062] arm-smmu-v3 9050000.smmuv3: 0x0000000000000000 +> +>> [ 1.975442] Run init as init process +> +>> +> +>> Another information is that if "maxcpus=3" is removed from the kernel +> +>> command line, +> +>> it will be OK. +> +>> +> +> +> +> That's interesting, not sure how that would be related. +> +> +> +>> I am not sure if there is a bug about vsmmu. It will be very appreciated +> +>> if anyone +> +>> know this issue or can take a look at it. +> +>> +> +> +> +> Can you please provide logs with adding "-d trace:smmu*" to qemu invocation. +> +> +Sure. Please see the attached log(using above qemu commit and command). +> +Thanks a lot, it seems the SMMUv3 indeed receives a translation +request with addr 0x0 which causes this event. +I don't see any kind of modification (alignment) of the address in this path. +So my hunch it's not related to the SMMUv3 and the initiator is +issuing bogus addresses. + +> +> +> +> Also if possible, can you please provide which Linux kernel version +> +> you are using, I will see if I can repro. +> +> +I just use the latest mainline kernel(commit b831f83e40a2) with defconfig. +> +I see, I can't repro in my setup which has no "--enable-kvm" and with +"-cpu max" instead of host. +I will try other options and see if I can repro. + +Thanks, +Mostafa +> +Thanks, +> +Zhou +> +> +> +> +> Thanks, +> +> Mostafa +> +> +> +>> Thanks, +> +>> Zhou +> +>> +> +>> +> +>> +> +> +> +> . + diff --git a/classification_output/01/other/9937102 b/classification_output/01/other/9937102 new file mode 100644 index 000000000..181ac23cb --- /dev/null +++ b/classification_output/01/other/9937102 @@ -0,0 +1,148 @@ +other: 0.883 +instruction: 0.860 +mistranslation: 0.843 +semantic: 0.822 + +[BUG] qemu crashes on assertion in cpu_asidx_from_attrs when cpu is in smm mode + +Hi all! + +First, I see this issue: +https://gitlab.com/qemu-project/qemu/-/issues/1198 +. 
+where some kvm/hardware failure leads to guest crash, and finally to this +assertion: + + cpu_asidx_from_attrs: Assertion `ret < cpu->num_ases && ret >= 0' failed. + +But in the ticket the talk is about the guest crash and fixing the kernel, not +about the final QEMU assertion (which definitely show that something should be +fixed in QEMU code too). + + +We've faced same stack one time: + +(gdb) bt +#0 raise () from /lib/x86_64-linux-gnu/libc.so.6 +#1 abort () from /lib/x86_64-linux-gnu/libc.so.6 +#2 ?? () from /lib/x86_64-linux-gnu/libc.so.6 +#3 __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6 +#4 cpu_asidx_from_attrs at ../hw/core/cpu-sysemu.c:76 +#5 cpu_memory_rw_debug at ../softmmu/physmem.c:3529 +#6 x86_cpu_dump_state at ../target/i386/cpu-dump.c:560 +#7 kvm_cpu_exec at ../accel/kvm/kvm-all.c:3000 +#8 kvm_vcpu_thread_fn at ../accel/kvm/kvm-accel-ops.c:51 +#9 qemu_thread_start at ../util/qemu-thread-posix.c:505 +#10 start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 +#11 clone () from /lib/x86_64-linux-gnu/libc.so.6 + + +And what I see: + +static inline int x86_asidx_from_attrs(CPUState *cs, MemTxAttrs attrs) +{ + return !!attrs.secure; +} + +int cpu_asidx_from_attrs(CPUState *cpu, MemTxAttrs attrs) +{ + int ret = 0; + + if (cpu->cc->sysemu_ops->asidx_from_attrs) { + ret = cpu->cc->sysemu_ops->asidx_from_attrs(cpu, attrs); + assert(ret < cpu->num_ases && ret >= 0); <<<<<<<<<<<<<<<<< + } + return ret; +} + +(gdb) p cpu->num_ases +$3 = 1 + +(gdb) fr 5 +#5 0x00005578c8814ba3 in cpu_memory_rw_debug (cpu=c... +(gdb) p attrs +$6 = {unspecified = 0, secure = 1, user = 0, memory = 0, requester_id = 0, +byte_swap = 0, target_tlb_bit0 = 0, target_tlb_bit1 = 0, target_tlb_bit2 = 0} + +so .secure is 1, therefore ret is 1, in the same time num_ases is 1 too and +assertion fails. + + + +Where is .secure from? + +static inline MemTxAttrs cpu_get_mem_attrs(CPUX86State *env) +{ + return ((MemTxAttrs) { .secure = (env->hflags & HF_SMM_MASK) != 0 }); +} + +Ok, it means we in SMM mode. + + + +On the other hand, it seems that num_ases seems to be always 1 for x86: + +vsementsov@vsementsov-lin:~/work/src/qemu/yc-7.2$ git grep 'num_ases = ' +cpu.c: cpu->num_ases = 0; +softmmu/cpus.c: cpu->num_ases = 1; +target/arm/cpu.c: cs->num_ases = 3 + has_secure; +target/arm/cpu.c: cs->num_ases = 1 + has_secure; +target/i386/tcg/sysemu/tcg-cpu.c: cs->num_ases = 2; + + +So, something is wrong around cpu->num_ases and x86_asidx_from_attrs() which +may return more in SMM mode. + + +The stack starts in +//7 0x00005578c882f539 in kvm_cpu_exec (cpu=cpu@entry=0x5578ca2eb340) at +../accel/kvm/kvm-all.c:3000 + if (ret < 0) { + cpu_dump_state(cpu, stderr, CPU_DUMP_CODE); + vm_stop(RUN_STATE_INTERNAL_ERROR); + } + +So that was some kvm error, and we decided to call cpu_dump_state(). And it +crashes. 
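To make the failing check concrete, here is a small standalone program that mirrors the arithmetic described above (a simplified illustration only, not QEMU code): under KVM, num_ases stays at 1, but with HF_SMM_MASK set the callback returns attrs.secure = 1, so ret equals num_ases and the assertion cannot hold.

#include <assert.h>
#include <stdio.h>

/* Mirrors x86_asidx_from_attrs() quoted above: the index is just !!attrs.secure. */
static int asidx_from_secure(int secure)
{
    return !!secure;
}

int main(void)
{
    int num_ases = 1;   /* softmmu/cpus.c sets num_ases = 1 (see the grep above) */
    int secure = 1;     /* SMM entered -> cpu_get_mem_attrs() reports .secure = 1 */
    int ret = asidx_from_secure(secure);

    printf("ret=%d, num_ases=%d\n", ret, num_ases);
    assert(ret < num_ases && ret >= 0);   /* fails: 1 < 1 is false */
    return 0;
}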
cpu_dump_state() is also called from hmp_info_registers, so I can +reproduce the crash with a tiny patch to master (as only CPU_DUMP_CODE path +calls cpu_memory_rw_debug(), as it is in kvm_cpu_exec()): + +diff --git a/monitor/hmp-cmds-target.c b/monitor/hmp-cmds-target.c +index ff01cf9d8d..dcf0189048 100644 +--- a/monitor/hmp-cmds-target.c ++++ b/monitor/hmp-cmds-target.c +@@ -116,7 +116,7 @@ void hmp_info_registers(Monitor *mon, const QDict *qdict) + } + + monitor_printf(mon, "\nCPU#%d\n", cs->cpu_index); +- cpu_dump_state(cs, NULL, CPU_DUMP_FPU); ++ cpu_dump_state(cs, NULL, CPU_DUMP_CODE); + } + } + + +Than run + +yes "info registers" | ./build/qemu-system-x86_64 -accel kvm -monitor stdio \ + -global driver=cfi.pflash01,property=secure,value=on \ + -blockdev "{'driver': 'file', 'filename': +'/usr/share/OVMF/OVMF_CODE_4M.secboot.fd', 'node-name': 'ovmf-code', 'read-only': +true}" \ + -blockdev "{'driver': 'file', 'filename': '/usr/share/OVMF/OVMF_VARS_4M.fd', +'node-name': 'ovmf-vars', 'read-only': true}" \ + -machine q35,smm=on,pflash0=ovmf-code,pflash1=ovmf-vars -m 2G -nodefaults + +And after some time (less than 20 seconds for me) it leads to + +qemu-system-x86_64: ../hw/core/cpu-sysemu.c:76: cpu_asidx_from_attrs: Assertion `ret < +cpu->num_ases && ret >= 0' failed. +Aborted (core dumped) + + +I've no idea how to correctly fix this bug, but I hope that my reproducer and +investigation will help a bit. + +-- +Best regards, +Vladimir + diff --git a/classification_output/01/other/9948366 b/classification_output/01/other/9948366 new file mode 100644 index 000000000..fcd04e57a --- /dev/null +++ b/classification_output/01/other/9948366 @@ -0,0 +1,1321 @@ +other: 0.640 +mistranslation: 0.584 +instruction: 0.508 +semantic: 0.374 + +[Qemu-devel] [BUG] I/O thread segfault for QEMU on s390x + +Hi, +I have been noticing some segfaults for QEMU on s390x, and I have been +hitting this issue quite reliably (at least once in 10 runs of a test +case). The qemu version is 2.11.50, and I have systemd created coredumps +when this happens. 
+ +Here is a back trace of the segfaulting thread: + + +#0 0x000003ffafed202c in swapcontext () from /lib64/libc.so.6 +#1 0x000002aa355c02ee in qemu_coroutine_new () at +util/coroutine-ucontext.c:164 +#2 0x000002aa355bec34 in qemu_coroutine_create +(address@hidden <blk_aio_read_entry>, +address@hidden) at util/qemu-coroutine.c:76 +#3 0x000002aa35510262 in blk_aio_prwv (blk=0x2aa65fbefa0, +offset=<optimized out>, bytes=<optimized out>, qiov=0x3ffa002a9c0, +address@hidden <blk_aio_read_entry>, flags=0, +cb=0x2aa35340a50 <virtio_blk_rw_complete>, opaque=0x3ffa002a960) at +block/block-backend.c:1299 +#4 0x000002aa35510376 in blk_aio_preadv (blk=<optimized out>, +offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>, +cb=<optimized out>, opaque=0x3ffa002a960) at block/block-backend.c:1392 +#5 0x000002aa3534114e in submit_requests (niov=<optimized out>, +num_reqs=<optimized out>, start=<optimized out>, mrb=<optimized out>, +blk=<optimized out>) at +/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:372 +#6 virtio_blk_submit_multireq (blk=<optimized out>, +address@hidden) at +/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:402 +#7 0x000002aa353422e0 in virtio_blk_handle_vq (s=0x2aa6611e7d8, +vq=0x3ffb0f5f010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620 +#8 0x000002aa3536655a in virtio_queue_notify_aio_vq +(address@hidden) at +/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515 +#9 0x000002aa35366cd6 in virtio_queue_notify_aio_vq (vq=0x3ffb0f5f010) +at /usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1511 +#10 virtio_queue_host_notifier_aio_poll (opaque=0x3ffb0f5f078) at +/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:2409 +#11 0x000002aa355a8ba4 in run_poll_handlers_once +(address@hidden) at util/aio-posix.c:497 +#12 0x000002aa355a9b74 in run_poll_handlers (max_ns=<optimized out>, +ctx=0x2aa65f99310) at util/aio-posix.c:534 +#13 try_poll_mode (blocking=true, ctx=0x2aa65f99310) at util/aio-posix.c:562 +#14 aio_poll (ctx=0x2aa65f99310, address@hidden) at +util/aio-posix.c:602 +#15 0x000002aa353d2d0a in iothread_run (opaque=0x2aa65f990f0) at +iothread.c:60 +#16 0x000003ffb0f07e82 in start_thread () from /lib64/libpthread.so.0 +#17 0x000003ffaff91596 in thread_start () from /lib64/libc.so.6 +I don't have much knowledge about i/o threads and the block layer code +in QEMU, so I would like to report to the community about this issue. +I believe this very similar to the bug that I reported upstream couple +of days ago +( +https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04452.html +). +Any help would be greatly appreciated. + +Thanks +Farhan + +On Thu, Mar 1, 2018 at 10:33 PM, Farhan Ali <address@hidden> wrote: +> +Hi, +> +> +I have been noticing some segfaults for QEMU on s390x, and I have been +> +hitting this issue quite reliably (at least once in 10 runs of a test case). +> +The qemu version is 2.11.50, and I have systemd created coredumps +> +when this happens. +Can you describe the test case or suggest how to reproduce it for us? + +Fam + +On 03/02/2018 01:13 AM, Fam Zheng wrote: +On Thu, Mar 1, 2018 at 10:33 PM, Farhan Ali <address@hidden> wrote: +Hi, + +I have been noticing some segfaults for QEMU on s390x, and I have been +hitting this issue quite reliably (at least once in 10 runs of a test case). +The qemu version is 2.11.50, and I have systemd created coredumps +when this happens. +Can you describe the test case or suggest how to reproduce it for us? + +Fam +The test case is with a single guest, running a memory intensive +workload. 
The guest has 8 vpcus and 4G of memory. +Here is the qemu command line, if that helps: + +/usr/bin/qemu-kvm -name guest=sles,debug-threads=on \ +-S -object +secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-sles/master-key.aes +\ +-machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off \ +-m 4096 -realtime mlock=off -smp 8,sockets=8,cores=1,threads=1 \ +-object iothread,id=iothread1 -object iothread,id=iothread2 -uuid +b83a596b-3a1a-4ac9-9f3e-d9a4032ee52c \ +-display none -no-user-config -nodefaults -chardev +socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-sles/monitor.sock,server,nowait +-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc +-no-shutdown \ +-boot strict=on -drive +file=/dev/mapper/360050763998b0883980000002400002b,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native +-device +virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 +-drive +file=/dev/mapper/360050763998b0883980000002800002f,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native +-device +virtio-blk-ccw,iothread=iothread2,scsi=off,devno=fe.0.0002,drive=drive-virtio-disk1,id=virtio-disk1 +-netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device +virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:38:a6:36:e8:1f,devno=fe.0.0000 +-chardev pty,id=charconsole0 -device +sclpconsole,chardev=charconsole0,id=console0 -device +virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -msg timestamp=on +Please let me know if I need to provide any other information. + +Thanks +Farhan + +On Thu, Mar 01, 2018 at 09:33:35AM -0500, Farhan Ali wrote: +> +Hi, +> +> +I have been noticing some segfaults for QEMU on s390x, and I have been +> +hitting this issue quite reliably (at least once in 10 runs of a test case). +> +The qemu version is 2.11.50, and I have systemd created coredumps +> +when this happens. +> +> +Here is a back trace of the segfaulting thread: +The backtrace looks normal. + +Please post the QEMU command-line and the details of the segfault (which +memory access faulted?). 
+ +> +#0 0x000003ffafed202c in swapcontext () from /lib64/libc.so.6 +> +#1 0x000002aa355c02ee in qemu_coroutine_new () at +> +util/coroutine-ucontext.c:164 +> +#2 0x000002aa355bec34 in qemu_coroutine_create +> +(address@hidden <blk_aio_read_entry>, +> +address@hidden) at util/qemu-coroutine.c:76 +> +#3 0x000002aa35510262 in blk_aio_prwv (blk=0x2aa65fbefa0, offset=<optimized +> +out>, bytes=<optimized out>, qiov=0x3ffa002a9c0, +> +address@hidden <blk_aio_read_entry>, flags=0, +> +cb=0x2aa35340a50 <virtio_blk_rw_complete>, opaque=0x3ffa002a960) at +> +block/block-backend.c:1299 +> +#4 0x000002aa35510376 in blk_aio_preadv (blk=<optimized out>, +> +offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>, +> +cb=<optimized out>, opaque=0x3ffa002a960) at block/block-backend.c:1392 +> +#5 0x000002aa3534114e in submit_requests (niov=<optimized out>, +> +num_reqs=<optimized out>, start=<optimized out>, mrb=<optimized out>, +> +blk=<optimized out>) at +> +/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:372 +> +#6 virtio_blk_submit_multireq (blk=<optimized out>, +> +address@hidden) at +> +/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:402 +> +#7 0x000002aa353422e0 in virtio_blk_handle_vq (s=0x2aa6611e7d8, +> +vq=0x3ffb0f5f010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620 +> +#8 0x000002aa3536655a in virtio_queue_notify_aio_vq +> +(address@hidden) at +> +/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515 +> +#9 0x000002aa35366cd6 in virtio_queue_notify_aio_vq (vq=0x3ffb0f5f010) at +> +/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1511 +> +#10 virtio_queue_host_notifier_aio_poll (opaque=0x3ffb0f5f078) at +> +/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:2409 +> +#11 0x000002aa355a8ba4 in run_poll_handlers_once +> +(address@hidden) at util/aio-posix.c:497 +> +#12 0x000002aa355a9b74 in run_poll_handlers (max_ns=<optimized out>, +> +ctx=0x2aa65f99310) at util/aio-posix.c:534 +> +#13 try_poll_mode (blocking=true, ctx=0x2aa65f99310) at util/aio-posix.c:562 +> +#14 aio_poll (ctx=0x2aa65f99310, address@hidden) at +> +util/aio-posix.c:602 +> +#15 0x000002aa353d2d0a in iothread_run (opaque=0x2aa65f990f0) at +> +iothread.c:60 +> +#16 0x000003ffb0f07e82 in start_thread () from /lib64/libpthread.so.0 +> +#17 0x000003ffaff91596 in thread_start () from /lib64/libc.so.6 +> +> +> +I don't have much knowledge about i/o threads and the block layer code in +> +QEMU, so I would like to report to the community about this issue. +> +I believe this very similar to the bug that I reported upstream couple of +> +days ago +> +( +https://lists.gnu.org/archive/html/qemu-devel/2018-02/msg04452.html +). +> +> +Any help would be greatly appreciated. +> +> +Thanks +> +Farhan +> +signature.asc +Description: +PGP signature + +On 03/02/2018 04:23 AM, Stefan Hajnoczi wrote: +On Thu, Mar 01, 2018 at 09:33:35AM -0500, Farhan Ali wrote: +Hi, + +I have been noticing some segfaults for QEMU on s390x, and I have been +hitting this issue quite reliably (at least once in 10 runs of a test case). +The qemu version is 2.11.50, and I have systemd created coredumps +when this happens. + +Here is a back trace of the segfaulting thread: +The backtrace looks normal. + +Please post the QEMU command-line and the details of the segfault (which +memory access faulted?). 
+I was able to create another crash today and here is the qemu comand line + +/usr/bin/qemu-kvm -name guest=sles,debug-threads=on \ +-S -object +secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-sles/master-key.aes +\ +-machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off \ +-m 4096 -realtime mlock=off -smp 8,sockets=8,cores=1,threads=1 \ +-object iothread,id=iothread1 -object iothread,id=iothread2 -uuid +b83a596b-3a1a-4ac9-9f3e-d9a4032ee52c \ +-display none -no-user-config -nodefaults -chardev +socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-sles/monitor.sock,server,nowait +-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc +-no-shutdown \ +-boot strict=on -drive +file=/dev/mapper/360050763998b0883980000002400002b,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native +-device +virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 +-drive +file=/dev/mapper/360050763998b0883980000002800002f,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native +-device +virtio-blk-ccw,iothread=iothread2,scsi=off,devno=fe.0.0002,drive=drive-virtio-disk1,id=virtio-disk1 +-netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device +virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:38:a6:36:e8:1f,devno=fe.0.0000 +-chardev pty,id=charconsole0 -device +sclpconsole,chardev=charconsole0,id=console0 -device +virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -msg timestamp=on +This the latest back trace on the segfaulting thread, and it seems to +segfault in swapcontext. +Program terminated with signal SIGSEGV, Segmentation fault. +#0 0x000003ff8595202c in swapcontext () from /lib64/libc.so.6 + + +This is the remaining back trace: + +#0 0x000003ff8595202c in swapcontext () from /lib64/libc.so.6 +#1 0x000002aa33b45566 in qemu_coroutine_new () at +util/coroutine-ucontext.c:164 +#2 0x000002aa33b43eac in qemu_coroutine_create +(address@hidden <blk_aio_write_entry>, +address@hidden) at util/qemu-coroutine.c:76 +#3 0x000002aa33a954da in blk_aio_prwv (blk=0x2aa4f0efda0, +offset=<optimized out>, bytes=<optimized out>, qiov=0x3ff74019080, +address@hidden <blk_aio_write_entry>, flags=0, +cb=0x2aa338c62e8 <virtio_blk_rw_complete>, opaque=0x3ff74019020) at +block/block-backend.c:1299 +#4 0x000002aa33a9563e in blk_aio_pwritev (blk=<optimized out>, +offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>, +cb=<optimized out>, opaque=0x3ff74019020) at block/block-backend.c:1400 +#5 0x000002aa338c6a38 in submit_requests (niov=<optimized out>, +num_reqs=1, start=<optimized out>, mrb=0x3ff831fe6e0, blk=<optimized +out>) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:369 +#6 virtio_blk_submit_multireq (blk=<optimized out>, +address@hidden) at +/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:426 +#7 0x000002aa338c7b78 in virtio_blk_handle_vq (s=0x2aa4f2507c8, +vq=0x3ff869df010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620 +#8 0x000002aa338ebdf2 in virtio_queue_notify_aio_vq (vq=0x3ff869df010) +at /usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515 +#9 0x000002aa33b2df46 in aio_dispatch_handlers +(address@hidden) at util/aio-posix.c:406 +#10 0x000002aa33b2eb50 in aio_poll (ctx=0x2aa4f0ca050, +address@hidden) at util/aio-posix.c:692 +#11 0x000002aa33957f6a in iothread_run (opaque=0x2aa4f0c9630) at +iothread.c:60 +#12 0x000003ff86987e82 in start_thread () from /lib64/libpthread.so.0 +#13 0x000003ff85a11596 in thread_start () from /lib64/libc.so.6 +Backtrace stopped: previous frame 
identical to this frame (corrupt stack?) + +On Fri, Mar 02, 2018 at 10:30:57AM -0500, Farhan Ali wrote: +> +> +> +On 03/02/2018 04:23 AM, Stefan Hajnoczi wrote: +> +> On Thu, Mar 01, 2018 at 09:33:35AM -0500, Farhan Ali wrote: +> +> > Hi, +> +> > +> +> > I have been noticing some segfaults for QEMU on s390x, and I have been +> +> > hitting this issue quite reliably (at least once in 10 runs of a test +> +> > case). +> +> > The qemu version is 2.11.50, and I have systemd created coredumps +> +> > when this happens. +> +> > +> +> > Here is a back trace of the segfaulting thread: +> +> The backtrace looks normal. +> +> +> +> Please post the QEMU command-line and the details of the segfault (which +> +> memory access faulted?). +> +> +> +> +> +I was able to create another crash today and here is the qemu comand line +> +> +/usr/bin/qemu-kvm -name guest=sles,debug-threads=on \ +> +-S -object +> +secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-sles/master-key.aes +> +\ +> +-machine s390-ccw-virtio-2.12,accel=kvm,usb=off,dump-guest-core=off \ +> +-m 4096 -realtime mlock=off -smp 8,sockets=8,cores=1,threads=1 \ +> +-object iothread,id=iothread1 -object iothread,id=iothread2 -uuid +> +b83a596b-3a1a-4ac9-9f3e-d9a4032ee52c \ +> +-display none -no-user-config -nodefaults -chardev +> +socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-sles/monitor.sock,server,nowait +> +> +-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown +> +\ +> +-boot strict=on -drive +> +file=/dev/mapper/360050763998b0883980000002400002b,format=raw,if=none,id=drive-virtio-disk0,cache=none,aio=native +> +-device +> +virtio-blk-ccw,iothread=iothread1,scsi=off,devno=fe.0.0001,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 +> +-drive +> +file=/dev/mapper/360050763998b0883980000002800002f,format=raw,if=none,id=drive-virtio-disk1,cache=none,aio=native +> +-device +> +virtio-blk-ccw,iothread=iothread2,scsi=off,devno=fe.0.0002,drive=drive-virtio-disk1,id=virtio-disk1 +> +-netdev tap,fd=24,id=hostnet0,vhost=on,vhostfd=26 -device +> +virtio-net-ccw,netdev=hostnet0,id=net0,mac=02:38:a6:36:e8:1f,devno=fe.0.0000 +> +-chardev pty,id=charconsole0 -device +> +sclpconsole,chardev=charconsole0,id=console0 -device +> +virtio-balloon-ccw,id=balloon0,devno=fe.3.ffba -msg timestamp=on +> +> +> +This the latest back trace on the segfaulting thread, and it seems to +> +segfault in swapcontext. +> +> +Program terminated with signal SIGSEGV, Segmentation fault. +> +#0 0x000003ff8595202c in swapcontext () from /lib64/libc.so.6 +Please include the following gdb output: + + (gdb) disas swapcontext + (gdb) i r + +That way it's possible to see which instruction faulted and which +registers were being accessed. 
+ +> +This is the remaining back trace: +> +> +#0 0x000003ff8595202c in swapcontext () from /lib64/libc.so.6 +> +#1 0x000002aa33b45566 in qemu_coroutine_new () at +> +util/coroutine-ucontext.c:164 +> +#2 0x000002aa33b43eac in qemu_coroutine_create +> +(address@hidden <blk_aio_write_entry>, +> +address@hidden) at util/qemu-coroutine.c:76 +> +#3 0x000002aa33a954da in blk_aio_prwv (blk=0x2aa4f0efda0, offset=<optimized +> +out>, bytes=<optimized out>, qiov=0x3ff74019080, +> +address@hidden <blk_aio_write_entry>, flags=0, +> +cb=0x2aa338c62e8 <virtio_blk_rw_complete>, opaque=0x3ff74019020) at +> +block/block-backend.c:1299 +> +#4 0x000002aa33a9563e in blk_aio_pwritev (blk=<optimized out>, +> +offset=<optimized out>, qiov=<optimized out>, flags=<optimized out>, +> +cb=<optimized out>, opaque=0x3ff74019020) at block/block-backend.c:1400 +> +#5 0x000002aa338c6a38 in submit_requests (niov=<optimized out>, num_reqs=1, +> +start=<optimized out>, mrb=0x3ff831fe6e0, blk=<optimized out>) at +> +/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:369 +> +#6 virtio_blk_submit_multireq (blk=<optimized out>, +> +address@hidden) at +> +/usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:426 +> +#7 0x000002aa338c7b78 in virtio_blk_handle_vq (s=0x2aa4f2507c8, +> +vq=0x3ff869df010) at /usr/src/debug/qemu-2.11.50/hw/block/virtio-blk.c:620 +> +#8 0x000002aa338ebdf2 in virtio_queue_notify_aio_vq (vq=0x3ff869df010) at +> +/usr/src/debug/qemu-2.11.50/hw/virtio/virtio.c:1515 +> +#9 0x000002aa33b2df46 in aio_dispatch_handlers +> +(address@hidden) at util/aio-posix.c:406 +> +#10 0x000002aa33b2eb50 in aio_poll (ctx=0x2aa4f0ca050, +> +address@hidden) at util/aio-posix.c:692 +> +#11 0x000002aa33957f6a in iothread_run (opaque=0x2aa4f0c9630) at +> +iothread.c:60 +> +#12 0x000003ff86987e82 in start_thread () from /lib64/libpthread.so.0 +> +#13 0x000003ff85a11596 in thread_start () from /lib64/libc.so.6 +> +Backtrace stopped: previous frame identical to this frame (corrupt stack?) +> +signature.asc +Description: +PGP signature + +On 03/05/2018 06:03 AM, Stefan Hajnoczi wrote: +Please include the following gdb output: + + (gdb) disas swapcontext + (gdb) i r + +That way it's possible to see which instruction faulted and which +registers were being accessed. +here is the disas out for swapcontext, this is on a coredump with +debugging symbols enabled for qemu. So the addresses from the previous +dump is a little different. 
+(gdb) disas swapcontext +Dump of assembler code for function swapcontext: + 0x000003ff90751fb8 <+0>: lgr %r1,%r2 + 0x000003ff90751fbc <+4>: lgr %r0,%r3 + 0x000003ff90751fc0 <+8>: stfpc 248(%r1) + 0x000003ff90751fc4 <+12>: std %f0,256(%r1) + 0x000003ff90751fc8 <+16>: std %f1,264(%r1) + 0x000003ff90751fcc <+20>: std %f2,272(%r1) + 0x000003ff90751fd0 <+24>: std %f3,280(%r1) + 0x000003ff90751fd4 <+28>: std %f4,288(%r1) + 0x000003ff90751fd8 <+32>: std %f5,296(%r1) + 0x000003ff90751fdc <+36>: std %f6,304(%r1) + 0x000003ff90751fe0 <+40>: std %f7,312(%r1) + 0x000003ff90751fe4 <+44>: std %f8,320(%r1) + 0x000003ff90751fe8 <+48>: std %f9,328(%r1) + 0x000003ff90751fec <+52>: std %f10,336(%r1) + 0x000003ff90751ff0 <+56>: std %f11,344(%r1) + 0x000003ff90751ff4 <+60>: std %f12,352(%r1) + 0x000003ff90751ff8 <+64>: std %f13,360(%r1) + 0x000003ff90751ffc <+68>: std %f14,368(%r1) + 0x000003ff90752000 <+72>: std %f15,376(%r1) + 0x000003ff90752004 <+76>: slgr %r2,%r2 + 0x000003ff90752008 <+80>: stam %a0,%a15,184(%r1) + 0x000003ff9075200c <+84>: stmg %r0,%r15,56(%r1) + 0x000003ff90752012 <+90>: la %r2,2 + 0x000003ff90752016 <+94>: lgr %r5,%r0 + 0x000003ff9075201a <+98>: la %r3,384(%r5) + 0x000003ff9075201e <+102>: la %r4,384(%r1) + 0x000003ff90752022 <+106>: lghi %r5,8 + 0x000003ff90752026 <+110>: svc 175 + 0x000003ff90752028 <+112>: lgr %r5,%r0 +=> 0x000003ff9075202c <+116>: lfpc 248(%r5) + 0x000003ff90752030 <+120>: ld %f0,256(%r5) + 0x000003ff90752034 <+124>: ld %f1,264(%r5) + 0x000003ff90752038 <+128>: ld %f2,272(%r5) + 0x000003ff9075203c <+132>: ld %f3,280(%r5) + 0x000003ff90752040 <+136>: ld %f4,288(%r5) + 0x000003ff90752044 <+140>: ld %f5,296(%r5) + 0x000003ff90752048 <+144>: ld %f6,304(%r5) + 0x000003ff9075204c <+148>: ld %f7,312(%r5) + 0x000003ff90752050 <+152>: ld %f8,320(%r5) + 0x000003ff90752054 <+156>: ld %f9,328(%r5) + 0x000003ff90752058 <+160>: ld %f10,336(%r5) + 0x000003ff9075205c <+164>: ld %f11,344(%r5) + 0x000003ff90752060 <+168>: ld %f12,352(%r5) + 0x000003ff90752064 <+172>: ld %f13,360(%r5) + 0x000003ff90752068 <+176>: ld %f14,368(%r5) + 0x000003ff9075206c <+180>: ld %f15,376(%r5) + 0x000003ff90752070 <+184>: lam %a2,%a15,192(%r5) + 0x000003ff90752074 <+188>: lmg %r0,%r15,56(%r5) + 0x000003ff9075207a <+194>: br %r14 +End of assembler dump. + +(gdb) i r +r0 0x0 0 +r1 0x3ff8fe7de40 4396165881408 +r2 0x0 0 +r3 0x3ff8fe7e1c0 4396165882304 +r4 0x3ff8fe7dfc0 4396165881792 +r5 0x0 0 +r6 0xffffffff88004880 18446744071696304256 +r7 0x3ff880009e0 4396033247712 +r8 0x27ff89000 10736930816 +r9 0x3ff88001460 4396033250400 +r10 0x1000 4096 +r11 0x1261be0 19274720 +r12 0x3ff88001e00 4396033252864 +r13 0x14d0bc0 21826496 +r14 0x1312ac8 19999432 +r15 0x3ff8fe7dc80 4396165880960 +pc 0x3ff9075202c 0x3ff9075202c <swapcontext+116> +cc 0x2 2 + +On 03/05/2018 07:45 PM, Farhan Ali wrote: +> +> +> +On 03/05/2018 06:03 AM, Stefan Hajnoczi wrote: +> +> Please include the following gdb output: +> +> +> +>   (gdb) disas swapcontext +> +>   (gdb) i r +> +> +> +> That way it's possible to see which instruction faulted and which +> +> registers were being accessed. +> +> +> +here is the disas out for swapcontext, this is on a coredump with debugging +> +symbols enabled for qemu. So the addresses from the previous dump is a little +> +different. 
+> +> +> +(gdb) disas swapcontext +> +Dump of assembler code for function swapcontext: +> +  0x000003ff90751fb8 <+0>:   lgr   %r1,%r2 +> +  0x000003ff90751fbc <+4>:   lgr   %r0,%r3 +> +  0x000003ff90751fc0 <+8>:   stfpc   248(%r1) +> +  0x000003ff90751fc4 <+12>:   std   %f0,256(%r1) +> +  0x000003ff90751fc8 <+16>:   std   %f1,264(%r1) +> +  0x000003ff90751fcc <+20>:   std   %f2,272(%r1) +> +  0x000003ff90751fd0 <+24>:   std   %f3,280(%r1) +> +  0x000003ff90751fd4 <+28>:   std   %f4,288(%r1) +> +  0x000003ff90751fd8 <+32>:   std   %f5,296(%r1) +> +  0x000003ff90751fdc <+36>:   std   %f6,304(%r1) +> +  0x000003ff90751fe0 <+40>:   std   %f7,312(%r1) +> +  0x000003ff90751fe4 <+44>:   std   %f8,320(%r1) +> +  0x000003ff90751fe8 <+48>:   std   %f9,328(%r1) +> +  0x000003ff90751fec <+52>:   std   %f10,336(%r1) +> +  0x000003ff90751ff0 <+56>:   std   %f11,344(%r1) +> +  0x000003ff90751ff4 <+60>:   std   %f12,352(%r1) +> +  0x000003ff90751ff8 <+64>:   std   %f13,360(%r1) +> +  0x000003ff90751ffc <+68>:   std   %f14,368(%r1) +> +  0x000003ff90752000 <+72>:   std   %f15,376(%r1) +> +  0x000003ff90752004 <+76>:   slgr   %r2,%r2 +> +  0x000003ff90752008 <+80>:   stam   %a0,%a15,184(%r1) +> +  0x000003ff9075200c <+84>:   stmg   %r0,%r15,56(%r1) +> +  0x000003ff90752012 <+90>:   la   %r2,2 +> +  0x000003ff90752016 <+94>:   lgr   %r5,%r0 +> +  0x000003ff9075201a <+98>:   la   %r3,384(%r5) +> +  0x000003ff9075201e <+102>:   la   %r4,384(%r1) +> +  0x000003ff90752022 <+106>:   lghi   %r5,8 +> +  0x000003ff90752026 <+110>:   svc   175 +sys_rt_sigprocmask. r0 should not be changed by the system call. + +> +  0x000003ff90752028 <+112>:   lgr   %r5,%r0 +> +=> 0x000003ff9075202c <+116>:   lfpc   248(%r5) +so r5 is zero and it was loaded from r0. r0 was loaded from r3 (which is the +2nd parameter to this +function). Now this is odd. + +> +  0x000003ff90752030 <+120>:   ld   %f0,256(%r5) +> +  0x000003ff90752034 <+124>:   ld   %f1,264(%r5) +> +  0x000003ff90752038 <+128>:   ld   %f2,272(%r5) +> +  0x000003ff9075203c <+132>:   ld   %f3,280(%r5) +> +  0x000003ff90752040 <+136>:   ld   %f4,288(%r5) +> +  0x000003ff90752044 <+140>:   ld   %f5,296(%r5) +> +  0x000003ff90752048 <+144>:   ld   %f6,304(%r5) +> +  0x000003ff9075204c <+148>:   ld   %f7,312(%r5) +> +  0x000003ff90752050 <+152>:   ld   %f8,320(%r5) +> +  0x000003ff90752054 <+156>:   ld   %f9,328(%r5) +> +  0x000003ff90752058 <+160>:   ld   %f10,336(%r5) +> +  0x000003ff9075205c <+164>:   ld   %f11,344(%r5) +> +  0x000003ff90752060 <+168>:   ld   %f12,352(%r5) +> +  0x000003ff90752064 <+172>:   ld   %f13,360(%r5) +> +  0x000003ff90752068 <+176>:   ld   %f14,368(%r5) +> +  0x000003ff9075206c <+180>:   ld   %f15,376(%r5) +> +  0x000003ff90752070 <+184>:   lam   %a2,%a15,192(%r5) +> +  0x000003ff90752074 <+188>:   lmg   %r0,%r15,56(%r5) +> +  0x000003ff9075207a <+194>:   br   %r14 +> +End of assembler dump. 
+> +> +(gdb) i r +> +r0            0x0   0 +> +r1            0x3ff8fe7de40   4396165881408 +> +r2            0x0   0 +> +r3            0x3ff8fe7e1c0   4396165882304 +> +r4            0x3ff8fe7dfc0   4396165881792 +> +r5            0x0   0 +> +r6            0xffffffff88004880   18446744071696304256 +> +r7            0x3ff880009e0   4396033247712 +> +r8            0x27ff89000   10736930816 +> +r9            0x3ff88001460   4396033250400 +> +r10           0x1000   4096 +> +r11           0x1261be0   19274720 +> +r12           0x3ff88001e00   4396033252864 +> +r13           0x14d0bc0   21826496 +> +r14           0x1312ac8   19999432 +> +r15           0x3ff8fe7dc80   4396165880960 +> +pc            0x3ff9075202c   0x3ff9075202c <swapcontext+116> +> +cc            0x2   2 + +On 5 March 2018 at 18:54, Christian Borntraeger <address@hidden> wrote: +> +> +> +On 03/05/2018 07:45 PM, Farhan Ali wrote: +> +> 0x000003ff90752026 <+110>: svc 175 +> +> +sys_rt_sigprocmask. r0 should not be changed by the system call. +> +> +> 0x000003ff90752028 <+112>: lgr %r5,%r0 +> +> => 0x000003ff9075202c <+116>: lfpc 248(%r5) +> +> +so r5 is zero and it was loaded from r0. r0 was loaded from r3 (which is the +> +2nd parameter to this +> +function). Now this is odd. +...particularly given that the only place we call swapcontext() +the second parameter is always the address of a local variable +and can't be 0... + +thanks +-- PMM + +Do you happen to run with a recent host kernel that has + +commit 7041d28115e91f2144f811ffe8a195c696b1e1d0 + s390: scrub registers on kernel entry and KVM exit + + + + + +Can you run with this on top +diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S +index 13a133a6015c..d6dc0e5e8f74 100644 +--- a/arch/s390/kernel/entry.S ++++ b/arch/s390/kernel/entry.S +@@ -426,13 +426,13 @@ ENTRY(system_call) + UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER + BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP + stmg %r0,%r7,__PT_R0(%r11) +- # clear user controlled register to prevent speculative use +- xgr %r0,%r0 + mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC + mvc __PT_PSW(16,%r11),__LC_SVC_OLD_PSW + mvc __PT_INT_CODE(4,%r11),__LC_SVC_ILC + stg %r14,__PT_FLAGS(%r11) + .Lsysc_do_svc: ++ # clear user controlled register to prevent speculative use ++ xgr %r0,%r0 + # load address of system call table + lg %r10,__THREAD_sysc_table(%r13,%r12) + llgh %r8,__PT_INT_CODE+2(%r11) + + +To me it looks like that the critical section cleanup (interrupt during system +call entry) might +save the registers again into ptregs but we have already zeroed out r0. +This patch moves the clearing of r0 after sysc_do_svc, which should fix the +critical +section cleanup. + +Adding Martin and Heiko. Will spin a patch. + + +On 03/05/2018 07:54 PM, Christian Borntraeger wrote: +> +> +> +On 03/05/2018 07:45 PM, Farhan Ali wrote: +> +> +> +> +> +> On 03/05/2018 06:03 AM, Stefan Hajnoczi wrote: +> +>> Please include the following gdb output: +> +>> +> +>>   (gdb) disas swapcontext +> +>>   (gdb) i r +> +>> +> +>> That way it's possible to see which instruction faulted and which +> +>> registers were being accessed. +> +> +> +> +> +> here is the disas out for swapcontext, this is on a coredump with debugging +> +> symbols enabled for qemu. So the addresses from the previous dump is a +> +> little different. 
+> +> +> +> +> +> (gdb) disas swapcontext +> +> Dump of assembler code for function swapcontext: +> +>   0x000003ff90751fb8 <+0>:   lgr   %r1,%r2 +> +>   0x000003ff90751fbc <+4>:   lgr   %r0,%r3 +> +>   0x000003ff90751fc0 <+8>:   stfpc   248(%r1) +> +>   0x000003ff90751fc4 <+12>:   std   %f0,256(%r1) +> +>   0x000003ff90751fc8 <+16>:   std   %f1,264(%r1) +> +>   0x000003ff90751fcc <+20>:   std   %f2,272(%r1) +> +>   0x000003ff90751fd0 <+24>:   std   %f3,280(%r1) +> +>   0x000003ff90751fd4 <+28>:   std   %f4,288(%r1) +> +>   0x000003ff90751fd8 <+32>:   std   %f5,296(%r1) +> +>   0x000003ff90751fdc <+36>:   std   %f6,304(%r1) +> +>   0x000003ff90751fe0 <+40>:   std   %f7,312(%r1) +> +>   0x000003ff90751fe4 <+44>:   std   %f8,320(%r1) +> +>   0x000003ff90751fe8 <+48>:   std   %f9,328(%r1) +> +>   0x000003ff90751fec <+52>:   std   %f10,336(%r1) +> +>   0x000003ff90751ff0 <+56>:   std   %f11,344(%r1) +> +>   0x000003ff90751ff4 <+60>:   std   %f12,352(%r1) +> +>   0x000003ff90751ff8 <+64>:   std   %f13,360(%r1) +> +>   0x000003ff90751ffc <+68>:   std   %f14,368(%r1) +> +>   0x000003ff90752000 <+72>:   std   %f15,376(%r1) +> +>   0x000003ff90752004 <+76>:   slgr   %r2,%r2 +> +>   0x000003ff90752008 <+80>:   stam   %a0,%a15,184(%r1) +> +>   0x000003ff9075200c <+84>:   stmg   %r0,%r15,56(%r1) +> +>   0x000003ff90752012 <+90>:   la   %r2,2 +> +>   0x000003ff90752016 <+94>:   lgr   %r5,%r0 +> +>   0x000003ff9075201a <+98>:   la   %r3,384(%r5) +> +>   0x000003ff9075201e <+102>:   la   %r4,384(%r1) +> +>   0x000003ff90752022 <+106>:   lghi   %r5,8 +> +>   0x000003ff90752026 <+110>:   svc   175 +> +> +sys_rt_sigprocmask. r0 should not be changed by the system call. +> +> +>   0x000003ff90752028 <+112>:   lgr   %r5,%r0 +> +> => 0x000003ff9075202c <+116>:   lfpc   248(%r5) +> +> +so r5 is zero and it was loaded from r0. r0 was loaded from r3 (which is the +> +2nd parameter to this +> +function). Now this is odd. +> +> +>   0x000003ff90752030 <+120>:   ld   %f0,256(%r5) +> +>   0x000003ff90752034 <+124>:   ld   %f1,264(%r5) +> +>   0x000003ff90752038 <+128>:   ld   %f2,272(%r5) +> +>   0x000003ff9075203c <+132>:   ld   %f3,280(%r5) +> +>   0x000003ff90752040 <+136>:   ld   %f4,288(%r5) +> +>   0x000003ff90752044 <+140>:   ld   %f5,296(%r5) +> +>   0x000003ff90752048 <+144>:   ld   %f6,304(%r5) +> +>   0x000003ff9075204c <+148>:   ld   %f7,312(%r5) +> +>   0x000003ff90752050 <+152>:   ld   %f8,320(%r5) +> +>   0x000003ff90752054 <+156>:   ld   %f9,328(%r5) +> +>   0x000003ff90752058 <+160>:   ld   %f10,336(%r5) +> +>   0x000003ff9075205c <+164>:   ld   %f11,344(%r5) +> +>   0x000003ff90752060 <+168>:   ld   %f12,352(%r5) +> +>   0x000003ff90752064 <+172>:   ld   %f13,360(%r5) +> +>   0x000003ff90752068 <+176>:   ld   %f14,368(%r5) +> +>   0x000003ff9075206c <+180>:   ld   %f15,376(%r5) +> +>   0x000003ff90752070 <+184>:   lam   %a2,%a15,192(%r5) +> +>   0x000003ff90752074 <+188>:   lmg   %r0,%r15,56(%r5) +> +>   0x000003ff9075207a <+194>:   br   %r14 +> +> End of assembler dump. 
+> +> +> +> (gdb) i r +> +> r0            0x0   0 +> +> r1            0x3ff8fe7de40   4396165881408 +> +> r2            0x0   0 +> +> r3            0x3ff8fe7e1c0   4396165882304 +> +> r4            0x3ff8fe7dfc0   4396165881792 +> +> r5            0x0   0 +> +> r6            0xffffffff88004880   18446744071696304256 +> +> r7            0x3ff880009e0   4396033247712 +> +> r8            0x27ff89000   10736930816 +> +> r9            0x3ff88001460   4396033250400 +> +> r10           0x1000   4096 +> +> r11           0x1261be0   19274720 +> +> r12           0x3ff88001e00   4396033252864 +> +> r13           0x14d0bc0   21826496 +> +> r14           0x1312ac8   19999432 +> +> r15           0x3ff8fe7dc80   4396165880960 +> +> pc            0x3ff9075202c   0x3ff9075202c <swapcontext+116> +> +> cc            0x2   2 + +On 03/05/2018 02:08 PM, Christian Borntraeger wrote: +Do you happen to run with a recent host kernel that has + +commit 7041d28115e91f2144f811ffe8a195c696b1e1d0 + s390: scrub registers on kernel entry and KVM exit +Yes. +Can you run with this on top +diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S +index 13a133a6015c..d6dc0e5e8f74 100644 +--- a/arch/s390/kernel/entry.S ++++ b/arch/s390/kernel/entry.S +@@ -426,13 +426,13 @@ ENTRY(system_call) + UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER + BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP + stmg %r0,%r7,__PT_R0(%r11) +- # clear user controlled register to prevent speculative use +- xgr %r0,%r0 + mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC + mvc __PT_PSW(16,%r11),__LC_SVC_OLD_PSW + mvc __PT_INT_CODE(4,%r11),__LC_SVC_ILC + stg %r14,__PT_FLAGS(%r11) + .Lsysc_do_svc: ++ # clear user controlled register to prevent speculative use ++ xgr %r0,%r0 + # load address of system call table + lg %r10,__THREAD_sysc_table(%r13,%r12) + llgh %r8,__PT_INT_CODE+2(%r11) + + +To me it looks like that the critical section cleanup (interrupt during system +call entry) might +save the registers again into ptregs but we have already zeroed out r0. +This patch moves the clearing of r0 after sysc_do_svc, which should fix the +critical +section cleanup. +Okay I will run with this. +Adding Martin and Heiko. Will spin a patch. + + +On 03/05/2018 07:54 PM, Christian Borntraeger wrote: +On 03/05/2018 07:45 PM, Farhan Ali wrote: +On 03/05/2018 06:03 AM, Stefan Hajnoczi wrote: +Please include the following gdb output: + +   (gdb) disas swapcontext +   (gdb) i r + +That way it's possible to see which instruction faulted and which +registers were being accessed. +here is the disas out for swapcontext, this is on a coredump with debugging +symbols enabled for qemu. So the addresses from the previous dump is a little +different. 
+ + +(gdb) disas swapcontext +Dump of assembler code for function swapcontext: +   0x000003ff90751fb8 <+0>:   lgr   %r1,%r2 +   0x000003ff90751fbc <+4>:   lgr   %r0,%r3 +   0x000003ff90751fc0 <+8>:   stfpc   248(%r1) +   0x000003ff90751fc4 <+12>:   std   %f0,256(%r1) +   0x000003ff90751fc8 <+16>:   std   %f1,264(%r1) +   0x000003ff90751fcc <+20>:   std   %f2,272(%r1) +   0x000003ff90751fd0 <+24>:   std   %f3,280(%r1) +   0x000003ff90751fd4 <+28>:   std   %f4,288(%r1) +   0x000003ff90751fd8 <+32>:   std   %f5,296(%r1) +   0x000003ff90751fdc <+36>:   std   %f6,304(%r1) +   0x000003ff90751fe0 <+40>:   std   %f7,312(%r1) +   0x000003ff90751fe4 <+44>:   std   %f8,320(%r1) +   0x000003ff90751fe8 <+48>:   std   %f9,328(%r1) +   0x000003ff90751fec <+52>:   std   %f10,336(%r1) +   0x000003ff90751ff0 <+56>:   std   %f11,344(%r1) +   0x000003ff90751ff4 <+60>:   std   %f12,352(%r1) +   0x000003ff90751ff8 <+64>:   std   %f13,360(%r1) +   0x000003ff90751ffc <+68>:   std   %f14,368(%r1) +   0x000003ff90752000 <+72>:   std   %f15,376(%r1) +   0x000003ff90752004 <+76>:   slgr   %r2,%r2 +   0x000003ff90752008 <+80>:   stam   %a0,%a15,184(%r1) +   0x000003ff9075200c <+84>:   stmg   %r0,%r15,56(%r1) +   0x000003ff90752012 <+90>:   la   %r2,2 +   0x000003ff90752016 <+94>:   lgr   %r5,%r0 +   0x000003ff9075201a <+98>:   la   %r3,384(%r5) +   0x000003ff9075201e <+102>:   la   %r4,384(%r1) +   0x000003ff90752022 <+106>:   lghi   %r5,8 +   0x000003ff90752026 <+110>:   svc   175 +sys_rt_sigprocmask. r0 should not be changed by the system call. +  0x000003ff90752028 <+112>:   lgr   %r5,%r0 +=> 0x000003ff9075202c <+116>:   lfpc   248(%r5) +so r5 is zero and it was loaded from r0. r0 was loaded from r3 (which is the +2nd parameter to this +function). Now this is odd. +  0x000003ff90752030 <+120>:   ld   %f0,256(%r5) +   0x000003ff90752034 <+124>:   ld   %f1,264(%r5) +   0x000003ff90752038 <+128>:   ld   %f2,272(%r5) +   0x000003ff9075203c <+132>:   ld   %f3,280(%r5) +   0x000003ff90752040 <+136>:   ld   %f4,288(%r5) +   0x000003ff90752044 <+140>:   ld   %f5,296(%r5) +   0x000003ff90752048 <+144>:   ld   %f6,304(%r5) +   0x000003ff9075204c <+148>:   ld   %f7,312(%r5) +   0x000003ff90752050 <+152>:   ld   %f8,320(%r5) +   0x000003ff90752054 <+156>:   ld   %f9,328(%r5) +   0x000003ff90752058 <+160>:   ld   %f10,336(%r5) +   0x000003ff9075205c <+164>:   ld   %f11,344(%r5) +   0x000003ff90752060 <+168>:   ld   %f12,352(%r5) +   0x000003ff90752064 <+172>:   ld   %f13,360(%r5) +   0x000003ff90752068 <+176>:   ld   %f14,368(%r5) +   0x000003ff9075206c <+180>:   ld   %f15,376(%r5) +   0x000003ff90752070 <+184>:   lam   %a2,%a15,192(%r5) +   0x000003ff90752074 <+188>:   lmg   %r0,%r15,56(%r5) +   0x000003ff9075207a <+194>:   br   %r14 +End of assembler dump. 
+ +(gdb) i r +r0            0x0   0 +r1            0x3ff8fe7de40   4396165881408 +r2            0x0   0 +r3            0x3ff8fe7e1c0   4396165882304 +r4            0x3ff8fe7dfc0   4396165881792 +r5            0x0   0 +r6            0xffffffff88004880   18446744071696304256 +r7            0x3ff880009e0   4396033247712 +r8            0x27ff89000   10736930816 +r9            0x3ff88001460   4396033250400 +r10           0x1000   4096 +r11           0x1261be0   19274720 +r12           0x3ff88001e00   4396033252864 +r13           0x14d0bc0   21826496 +r14           0x1312ac8   19999432 +r15           0x3ff8fe7dc80   4396165880960 +pc            0x3ff9075202c   0x3ff9075202c <swapcontext+116> +cc            0x2   2 + +On Mon, 5 Mar 2018 20:08:45 +0100 +Christian Borntraeger <address@hidden> wrote: + +> +Do you happen to run with a recent host kernel that has +> +> +commit 7041d28115e91f2144f811ffe8a195c696b1e1d0 +> +s390: scrub registers on kernel entry and KVM exit +> +> +Can you run with this on top +> +diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S +> +index 13a133a6015c..d6dc0e5e8f74 100644 +> +--- a/arch/s390/kernel/entry.S +> ++++ b/arch/s390/kernel/entry.S +> +@@ -426,13 +426,13 @@ ENTRY(system_call) +> +UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER +> +BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP +> +stmg %r0,%r7,__PT_R0(%r11) +> +- # clear user controlled register to prevent speculative use +> +- xgr %r0,%r0 +> +mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC +> +mvc __PT_PSW(16,%r11),__LC_SVC_OLD_PSW +> +mvc __PT_INT_CODE(4,%r11),__LC_SVC_ILC +> +stg %r14,__PT_FLAGS(%r11) +> +.Lsysc_do_svc: +> ++ # clear user controlled register to prevent speculative use +> ++ xgr %r0,%r0 +> +# load address of system call table +> +lg %r10,__THREAD_sysc_table(%r13,%r12) +> +llgh %r8,__PT_INT_CODE+2(%r11) +> +> +> +To me it looks like that the critical section cleanup (interrupt during +> +system call entry) might +> +save the registers again into ptregs but we have already zeroed out r0. +> +This patch moves the clearing of r0 after sysc_do_svc, which should fix the +> +critical +> +section cleanup. +> +> +Adding Martin and Heiko. Will spin a patch. +Argh, yes. Thanks Chrisitan, this is it. I have been searching for the bug +for days now. The point is that if the system call handler is interrupted +after the xgr but before .Lsysc_do_svc the code at .Lcleanup_system_call +repeats the stmg for %r0-%r7 but now %r0 is already zero. + +Please commit a patch for this and I'll will queue it up immediately. + +-- +blue skies, + Martin. + +"Reality continues to ruin my life." - Calvin. 
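As background for the disassembly discussed above: QEMU's coroutine-ucontext backend drives coroutines with swapcontext(&old, &new), which saves the caller's registers into old and reloads them from new. The sketch below illustrates only the generic ucontext pattern (it is not QEMU's actual code); if the pointer to the new context is clobbered to 0 on the way in, as %r0 was by the interrupted syscall entry, the restore sequence faults on its very first load from the context, which is exactly the `lfpc 248(%r5)` seen in the dump.

#include <stdio.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];

static void coroutine_fn(void)
{
    printf("in coroutine\n");
    swapcontext(&co_ctx, &main_ctx);    /* save coroutine state, resume main */
}

int main(void)
{
    /* Prepare a context that will run coroutine_fn on its own stack. */
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof(co_stack);
    co_ctx.uc_link = &main_ctx;
    makecontext(&co_ctx, coroutine_fn, 0);

    printf("entering coroutine\n");
    swapcontext(&main_ctx, &co_ctx);    /* save main state, load co_ctx; a NULL
                                           or garbage 2nd argument would fault
                                           right here, in the register restore */
    printf("back in main\n");
    return 0;
}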
+ +On 03/06/2018 01:34 AM, Martin Schwidefsky wrote: +On Mon, 5 Mar 2018 20:08:45 +0100 +Christian Borntraeger <address@hidden> wrote: +Do you happen to run with a recent host kernel that has + +commit 7041d28115e91f2144f811ffe8a195c696b1e1d0 + s390: scrub registers on kernel entry and KVM exit + +Can you run with this on top +diff --git a/arch/s390/kernel/entry.S b/arch/s390/kernel/entry.S +index 13a133a6015c..d6dc0e5e8f74 100644 +--- a/arch/s390/kernel/entry.S ++++ b/arch/s390/kernel/entry.S +@@ -426,13 +426,13 @@ ENTRY(system_call) + UPDATE_VTIME %r8,%r9,__LC_SYNC_ENTER_TIMER + BPENTER __TI_flags(%r12),_TIF_ISOLATE_BP + stmg %r0,%r7,__PT_R0(%r11) +- # clear user controlled register to prevent speculative use +- xgr %r0,%r0 + mvc __PT_R8(64,%r11),__LC_SAVE_AREA_SYNC + mvc __PT_PSW(16,%r11),__LC_SVC_OLD_PSW + mvc __PT_INT_CODE(4,%r11),__LC_SVC_ILC + stg %r14,__PT_FLAGS(%r11) + .Lsysc_do_svc: ++ # clear user controlled register to prevent speculative use ++ xgr %r0,%r0 + # load address of system call table + lg %r10,__THREAD_sysc_table(%r13,%r12) + llgh %r8,__PT_INT_CODE+2(%r11) + + +To me it looks like that the critical section cleanup (interrupt during system +call entry) might +save the registers again into ptregs but we have already zeroed out r0. +This patch moves the clearing of r0 after sysc_do_svc, which should fix the +critical +section cleanup. + +Adding Martin and Heiko. Will spin a patch. +Argh, yes. Thanks Chrisitan, this is it. I have been searching for the bug +for days now. The point is that if the system call handler is interrupted +after the xgr but before .Lsysc_do_svc the code at .Lcleanup_system_call +repeats the stmg for %r0-%r7 but now %r0 is already zero. + +Please commit a patch for this and I'll will queue it up immediately. +This patch does fix the QEMU crash. I haven't seen the crash after +running the test case for more than a day. Thanks to everyone for taking +a look at this problem :) +Thanks +Farhan + diff --git a/classification_output/01/semantic/0504199 b/classification_output/01/semantic/0504199 new file mode 100644 index 000000000..1d5b4d118 --- /dev/null +++ b/classification_output/01/semantic/0504199 @@ -0,0 +1,595 @@ +semantic: 0.953 +other: 0.944 +instruction: 0.919 +mistranslation: 0.799 + +[BUG]QEMU jump into interrupt when single-stepping on aarch64 + +Dear, folks, + +I try to debug Linux kernel with QEMU in single-stepping mode on aarch64 +platform, +the added breakpoint hits but after I type `step`, the gdb always jumps into +interrupt. + +My env: + + gdb-10.2 + qemu-6.2.0 + host kernel: 5.10.84 + VM kernel: 5.10.84 + +The steps to reproduce: + # host console: run a VM with only one core, the import arg: <qemu:arg +value='-s'/> + # details can be found here: +https://www.redhat.com/en/blog/debugging-kernel-qemulibvirt +virsh create dev_core0.xml + + # run gdb client + gdb ./vmlinux + + # gdb client on host console + (gdb) dir +./usr/src/debug/kernel-5.10.84/linux-5.10.84-004.alpha.ali5000.alios7.aarch64 + (gdb) target remote localhost:1234 + (gdb) info b + Num Type Disp Enb Address What + 1 breakpoint keep y <MULTIPLE> + 1.1 y 0xffff800010361444 +mm/memory-failure.c:1318 + 1.2 y 0xffff800010361450 in memory_failure + at mm/memory-failure.c:1488 + (gdb) c + Continuing. + + # console in VM, use madvise to inject a hwposion at virtual address +vaddr, + # which will hit the b inmemory_failur: madvise(vaddr, pagesize, +MADV_HWPOISON); + # and the VM pause + ./run_madvise.c + + # gdb client on host console + (gdb) + Continuing. 
+ Breakpoint 1, 0xffff800010361444 in memory_failure () at +mm/memory-failure.c:1318 + 1318 res = -EHWPOISON; + (gdb) n + vectors () at arch/arm64/kernel/entry.S:552 + 552 kernel_ventry 1, irq // IRQ +EL1h + (gdb) n + (gdb) n + (gdb) n + (gdb) n + gic_handle_irq (regs=0xffff8000147c3b80) at +drivers/irqchip/irq-gic-v3.c:721 + # after several step, I got the irqnr + (gdb) p irqnr + $5 = 8262 + +Sometimes, the irqnr is 27ï¼ which is used for arch_timer. + +I was wondering do you have any comments on this? And feedback are welcomed. + +Thank you. + +Best Regards. +Shuai + +On 4/6/22 09:30, Shuai Xue wrote: +Dear, folks, + +I try to debug Linux kernel with QEMU in single-stepping mode on aarch64 +platform, +the added breakpoint hits but after I type `step`, the gdb always jumps into +interrupt. + +My env: + + gdb-10.2 + qemu-6.2.0 + host kernel: 5.10.84 + VM kernel: 5.10.84 + +The steps to reproduce: + # host console: run a VM with only one core, the import arg: <qemu:arg +value='-s'/> + # details can be found here: +https://www.redhat.com/en/blog/debugging-kernel-qemulibvirt +virsh create dev_core0.xml + + # run gdb client + gdb ./vmlinux + + # gdb client on host console + (gdb) dir +./usr/src/debug/kernel-5.10.84/linux-5.10.84-004.alpha.ali5000.alios7.aarch64 + (gdb) target remote localhost:1234 + (gdb) info b + Num Type Disp Enb Address What + 1 breakpoint keep y <MULTIPLE> + 1.1 y 0xffff800010361444 +mm/memory-failure.c:1318 + 1.2 y 0xffff800010361450 in memory_failure + at mm/memory-failure.c:1488 + (gdb) c + Continuing. + + # console in VM, use madvise to inject a hwposion at virtual address +vaddr, + # which will hit the b inmemory_failur: madvise(vaddr, pagesize, +MADV_HWPOISON); + # and the VM pause + ./run_madvise.c + + # gdb client on host console + (gdb) + Continuing. + Breakpoint 1, 0xffff800010361444 in memory_failure () at +mm/memory-failure.c:1318 + 1318 res = -EHWPOISON; + (gdb) n + vectors () at arch/arm64/kernel/entry.S:552 + 552 kernel_ventry 1, irq // IRQ +EL1h +The 'n' command is not a single-step: use stepi, which will suppress interrupts. +Anyway, not a bug. + +r~ + +å¨ 2022/4/7 AM12:57, Richard Henderson åé: +> +On 4/6/22 09:30, Shuai Xue wrote: +> +> Dear, folks, +> +> +> +> I try to debug Linux kernel with QEMU in single-stepping mode on aarch64 +> +> platform, +> +> the added breakpoint hits but after I type `step`, the gdb always jumps into +> +> interrupt. +> +> +> +> My env: +> +> +> +>     gdb-10.2 +> +>     qemu-6.2.0 +> +>     host kernel: 5.10.84 +> +>     VM kernel: 5.10.84 +> +> +> +> The steps to reproduce: +> +>     # host console: run a VM with only one core, the import arg: <qemu:arg +> +> value='-s'/> +> +>     # details can be found here: +> +> +https://www.redhat.com/en/blog/debugging-kernel-qemulibvirt +> +>     virsh create dev_core0.xml +> +>     +> +>     # run gdb client +> +>     gdb ./vmlinux +> +> +> +>     # gdb client on host console +> +>     (gdb) dir +> +> ./usr/src/debug/kernel-5.10.84/linux-5.10.84-004.alpha.ali5000.alios7.aarch64 +> +>     (gdb) target remote localhost:1234 +> +>     (gdb) info b +> +>     Num    Type          Disp Enb Address           What +> +>     1      breakpoint    keep y  <MULTIPLE> +> +>     1.1                        y  0xffff800010361444 +> +> mm/memory-failure.c:1318 +> +>     1.2                        y  0xffff800010361450 in memory_failure +> +>                                                    at +> +> mm/memory-failure.c:1488 +> +>     (gdb) c +> +>     Continuing. 
+> +> +> +>     # console in VM, use madvise to inject a hwposion at virtual address +> +> vaddr, +> +>     # which will hit the b inmemory_failur: madvise(vaddr, pagesize, +> +> MADV_HWPOISON); +> +>     # and the VM pause +> +>     ./run_madvise.c +> +> +> +>     # gdb client on host console +> +>     (gdb) +> +>     Continuing. +> +>     Breakpoint 1, 0xffff800010361444 in memory_failure () at +> +> mm/memory-failure.c:1318 +> +>     1318                   res = -EHWPOISON; +> +>     (gdb) n +> +>     vectors () at arch/arm64/kernel/entry.S:552 +> +>     552            kernel_ventry  1, irq                         // IRQ +> +> EL1h +> +> +The 'n' command is not a single-step: use stepi, which will suppress +> +interrupts. +> +Anyway, not a bug. +> +> +r~ +Hi, Richard, + +Thank you for your quick reply, I also try `stepi`, but it does NOT work either. + + (gdb) c + Continuing. + + Breakpoint 1, memory_failure (pfn=1273982, flags=1) at +mm/memory-failure.c:1488 + 1488 { + (gdb) stepi + vectors () at arch/arm64/kernel/entry.S:552 + 552 kernel_ventry 1, irq // IRQ +EL1h + +According to QEMU doc[1]: the default single stepping behavior is step with the +IRQs +and timer service routines off. I checked the MASK bits used to control the +single +stepping IE on my machine as bellow: + + # gdb client on host (x86 plafrom) + (gdb) maintenance packet qqemu.sstepbits + sending: "qqemu.sstepbits" + received: "ENABLE=1,NOIRQ=2,NOTIMER=4" + +The sstep MASK looks as expected, but does not work as expected. + +I also try the same kernel and qemu version on X86 platform: +> +> gdb-10.2 +> +> qemu-6.2.0 +> +> host kernel: 5.10.84 +> +> VM kernel: 5.10.84 +The command `n` jumps to the next instruction. + + # gdb client on host (x86 plafrom) + (gdb) b memory-failure.c:1488 + Breakpoint 1, memory_failure (pfn=1128931, flags=1) at +mm/memory-failure.c:1488 + 1488 { + (gdb) n + 1497 if (!sysctl_memory_failure_recovery) + (gdb) stepi + 0xffffffff812efdbc 1497 if +(!sysctl_memory_failure_recovery) + (gdb) stepi + 0xffffffff812efdbe 1497 if +(!sysctl_memory_failure_recovery) + (gdb) n + 1500 p = pfn_to_online_page(pfn); + (gdb) l + 1496 + 1497 if (!sysctl_memory_failure_recovery) + 1498 panic("Memory failure on page %lx", pfn); + 1499 + 1500 p = pfn_to_online_page(pfn); + 1501 if (!p) { + +Best Regrades, +Shuai + + +[1] +https://github.com/qemu/qemu/blob/master/docs/system/gdb.rst + +å¨ 2022/4/7 PM12:10, Shuai Xue åé: +> +å¨ 2022/4/7 AM12:57, Richard Henderson åé: +> +> On 4/6/22 09:30, Shuai Xue wrote: +> +>> Dear, folks, +> +>> +> +>> I try to debug Linux kernel with QEMU in single-stepping mode on aarch64 +> +>> platform, +> +>> the added breakpoint hits but after I type `step`, the gdb always jumps +> +>> into interrupt. 
+> +>> +> +>> My env: +> +>> +> +>>     gdb-10.2 +> +>>     qemu-6.2.0 +> +>>     host kernel: 5.10.84 +> +>>     VM kernel: 5.10.84 +> +>> +> +>> The steps to reproduce: +> +>>     # host console: run a VM with only one core, the import arg: <qemu:arg +> +>> value='-s'/> +> +>>     # details can be found here: +> +>> +https://www.redhat.com/en/blog/debugging-kernel-qemulibvirt +> +>>     virsh create dev_core0.xml +> +>>     +> +>>     # run gdb client +> +>>     gdb ./vmlinux +> +>> +> +>>     # gdb client on host console +> +>>     (gdb) dir +> +>> ./usr/src/debug/kernel-5.10.84/linux-5.10.84-004.alpha.ali5000.alios7.aarch64 +> +>>     (gdb) target remote localhost:1234 +> +>>     (gdb) info b +> +>>     Num    Type          Disp Enb Address           What +> +>>     1      breakpoint    keep y  <MULTIPLE> +> +>>     1.1                        y  0xffff800010361444 +> +>> mm/memory-failure.c:1318 +> +>>     1.2                        y  0xffff800010361450 in memory_failure +> +>>                                                    at +> +>> mm/memory-failure.c:1488 +> +>>     (gdb) c +> +>>     Continuing. +> +>> +> +>>     # console in VM, use madvise to inject a hwposion at virtual address +> +>> vaddr, +> +>>     # which will hit the b inmemory_failur: madvise(vaddr, pagesize, +> +>> MADV_HWPOISON); +> +>>     # and the VM pause +> +>>     ./run_madvise.c +> +>> +> +>>     # gdb client on host console +> +>>     (gdb) +> +>>     Continuing. +> +>>     Breakpoint 1, 0xffff800010361444 in memory_failure () at +> +>> mm/memory-failure.c:1318 +> +>>     1318                   res = -EHWPOISON; +> +>>     (gdb) n +> +>>     vectors () at arch/arm64/kernel/entry.S:552 +> +>>     552            kernel_ventry  1, irq                         // IRQ +> +>> EL1h +> +> +> +> The 'n' command is not a single-step: use stepi, which will suppress +> +> interrupts. +> +> Anyway, not a bug. +> +> +> +> r~ +> +> +Hi, Richard, +> +> +Thank you for your quick reply, I also try `stepi`, but it does NOT work +> +either. +> +> +(gdb) c +> +Continuing. +> +> +Breakpoint 1, memory_failure (pfn=1273982, flags=1) at +> +mm/memory-failure.c:1488 +> +1488 { +> +(gdb) stepi +> +vectors () at arch/arm64/kernel/entry.S:552 +> +552 kernel_ventry 1, irq // IRQ +> +EL1h +> +> +According to QEMU doc[1]: the default single stepping behavior is step with +> +the IRQs +> +and timer service routines off. I checked the MASK bits used to control the +> +single +> +stepping IE on my machine as bellow: +> +> +# gdb client on host (x86 plafrom) +> +(gdb) maintenance packet qqemu.sstepbits +> +sending: "qqemu.sstepbits" +> +received: "ENABLE=1,NOIRQ=2,NOTIMER=4" +> +> +The sstep MASK looks as expected, but does not work as expected. +> +> +I also try the same kernel and qemu version on X86 platform: +> +>> gdb-10.2 +> +>> qemu-6.2.0 +> +>> host kernel: 5.10.84 +> +>> VM kernel: 5.10.84 +> +> +> +The command `n` jumps to the next instruction. 
+> +> +# gdb client on host (x86 plafrom) +> +(gdb) b memory-failure.c:1488 +> +Breakpoint 1, memory_failure (pfn=1128931, flags=1) at +> +mm/memory-failure.c:1488 +> +1488 { +> +(gdb) n +> +1497 if (!sysctl_memory_failure_recovery) +> +(gdb) stepi +> +0xffffffff812efdbc 1497 if +> +(!sysctl_memory_failure_recovery) +> +(gdb) stepi +> +0xffffffff812efdbe 1497 if +> +(!sysctl_memory_failure_recovery) +> +(gdb) n +> +1500 p = pfn_to_online_page(pfn); +> +(gdb) l +> +1496 +> +1497 if (!sysctl_memory_failure_recovery) +> +1498 panic("Memory failure on page %lx", pfn); +> +1499 +> +1500 p = pfn_to_online_page(pfn); +> +1501 if (!p) { +> +> +Best Regrades, +> +Shuai +> +> +> +[1] +https://github.com/qemu/qemu/blob/master/docs/system/gdb.rst +Hi, Richard, + +I was wondering that do you have any comments to this? + +Best Regrades, +Shuai + diff --git a/classification_output/01/semantic/0891566 b/classification_output/01/semantic/0891566 new file mode 100644 index 000000000..f2e9cca08 --- /dev/null +++ b/classification_output/01/semantic/0891566 @@ -0,0 +1,400 @@ +semantic: 0.978 +instruction: 0.978 +other: 0.978 +mistranslation: 0.973 + +[Qemu-devel] [vhost-user BUG ?] QEMU process segfault when shutdown or reboot with vhost-user + +Hi, + +We catch a segfault in our project. + +Qemu version is 2.3.0 + +The Stack backtrace is: +(gdb) bt +#0 0x0000000000000000 in ?? () +#1 0x00007f7ad9280b2f in qemu_deliver_packet (sender=<optimized out>, flags=<optimized +out>, data=<optimized out>, size=100, opaque= + 0x7f7ad9d6db10) at net/net.c:510 +#2 0x00007f7ad92831fa in qemu_net_queue_deliver (size=<optimized out>, data=<optimized +out>, flags=<optimized out>, + sender=<optimized out>, queue=<optimized out>) at net/queue.c:157 +#3 qemu_net_queue_flush (queue=0x7f7ad9d39630) at net/queue.c:254 +#4 0x00007f7ad9280dac in qemu_flush_or_purge_queued_packets +(nc=0x7f7ad9d6db10, purge=true) at net/net.c:539 +#5 0x00007f7ad9280e76 in net_vm_change_state_handler (opaque=<optimized out>, +running=<optimized out>, state=100) at net/net.c:1214 +#6 0x00007f7ad915612f in vm_state_notify (running=0, state=RUN_STATE_SHUTDOWN) +at vl.c:1820 +#7 0x00007f7ad906db1a in do_vm_stop (state=<optimized out>) at +/usr/src/packages/BUILD/qemu-kvm-2.3.0/cpus.c:631 +#8 vm_stop (state=RUN_STATE_SHUTDOWN) at +/usr/src/packages/BUILD/qemu-kvm-2.3.0/cpus.c:1325 +#9 0x00007f7ad915e4a2 in main_loop_should_exit () at vl.c:2080 +#10 main_loop () at vl.c:2131 +#11 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at +vl.c:4721 +(gdb) p *(NetClientState *)0x7f7ad9d6db10 +$1 = {info = 0x7f7ad9824520, link_down = 0, next = {tqe_next = 0x7f7ad0f06d10, +tqe_prev = 0x7f7ad98b1cf0}, peer = 0x7f7ad0f06d10, + incoming_queue = 0x7f7ad9d39630, model = 0x7f7ad9d39590 "vhost_user", name = +0x7f7ad9d39570 "hostnet0", info_str = + "vhost-user to charnet0", '\000' <repeats 233 times>, receive_disabled = 0, +destructor = + 0x7f7ad92821f0 <qemu_net_client_destructor>, queue_index = 0, +rxfilter_notify_enabled = 0} +(gdb) p *(NetClientInfo *)0x7f7ad9824520 +$2 = {type = NET_CLIENT_OPTIONS_KIND_VHOST_USER, size = 360, receive = 0, +receive_raw = 0, receive_iov = 0, can_receive = 0, cleanup = + 0x7f7ad9288850 <vhost_user_cleanup>, link_status_changed = 0, +query_rx_filter = 0, poll = 0, has_ufo = + 0x7f7ad92886d0 <vhost_user_has_ufo>, has_vnet_hdr = 0x7f7ad9288670 +<vhost_user_has_vnet_hdr>, has_vnet_hdr_len = 0, + using_vnet_hdr = 0, set_offload = 0, set_vnet_hdr_len = 0} +(gdb) + +The corresponding codes where gdb reports error are: (We have 
added some codes +in net.c) +ssize_t qemu_deliver_packet(NetClientState *sender, + unsigned flags, + const uint8_t *data, + size_t size, + void *opaque) +{ + NetClientState *nc = opaque; + ssize_t ret; + + if (nc->link_down) { + return size; + } + + if (nc->receive_disabled) { + return 0; + } + + if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { + ret = nc->info->receive_raw(nc, data, size); + } else { + ret = nc->info->receive(nc, data, size); ----> Here is 510 line + } + +I'm not quite familiar with vhost-user, but for vhost-user, these two callback +functions seem to be always NULL, +Why we can come here ? +Is it an error to add VM state change handler for vhost-user ? + +Thanks, +zhanghailiang + +Hi + +On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang +<address@hidden> wrote: +> +The corresponding codes where gdb reports error are: (We have added some +> +codes in net.c) +Can you reproduce with unmodified qemu? Could you give instructions to do so? + +> +ssize_t qemu_deliver_packet(NetClientState *sender, +> +unsigned flags, +> +const uint8_t *data, +> +size_t size, +> +void *opaque) +> +{ +> +NetClientState *nc = opaque; +> +ssize_t ret; +> +> +if (nc->link_down) { +> +return size; +> +} +> +> +if (nc->receive_disabled) { +> +return 0; +> +} +> +> +if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { +> +ret = nc->info->receive_raw(nc, data, size); +> +} else { +> +ret = nc->info->receive(nc, data, size); ----> Here is 510 line +> +} +> +> +I'm not quite familiar with vhost-user, but for vhost-user, these two +> +callback functions seem to be always NULL, +> +Why we can come here ? +You should not come here, vhost-user has nc->receive_disabled (it +changes in 2.5) + +-- +Marc-André Lureau + +On 2015/11/3 22:54, Marc-André Lureau wrote: +Hi + +On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang +<address@hidden> wrote: +The corresponding codes where gdb reports error are: (We have added some +codes in net.c) +Can you reproduce with unmodified qemu? Could you give instructions to do so? +OK, i will try to do it. There is nothing special, we run iperf tool in VM, +and then shutdown or reboot it. There is change you can catch segfault. +ssize_t qemu_deliver_packet(NetClientState *sender, + unsigned flags, + const uint8_t *data, + size_t size, + void *opaque) +{ + NetClientState *nc = opaque; + ssize_t ret; + + if (nc->link_down) { + return size; + } + + if (nc->receive_disabled) { + return 0; + } + + if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { + ret = nc->info->receive_raw(nc, data, size); + } else { + ret = nc->info->receive(nc, data, size); ----> Here is 510 line + } + +I'm not quite familiar with vhost-user, but for vhost-user, these two +callback functions seem to be always NULL, +Why we can come here ? +You should not come here, vhost-user has nc->receive_disabled (it +changes in 2.5) +I have looked at the newest codes, i think we can still have chance to +come here, since we will change nc->receive_disable to false temporarily in +qemu_flush_or_purge_queued_packets(), there is no difference between 2.3 and 2.5 +for this. +Besides, is it possible for !QTAILQ_EMPTY(&queue->packets) to be true +in qemu_net_queue_flush() for vhost-user ? + +i will try to reproduce it by using newest qemu. 
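For illustration only (a minimal defensive sketch, not the fix that eventually went upstream): a guard at the top of qemu_deliver_packet() would at least avoid calling a NULL callback for backends such as vhost-user that register no receive handlers:

    if (!nc->info->receive &&
        !((flags & QEMU_NET_PACKET_FLAG_RAW) && nc->info->receive_raw)) {
        /* Nothing in userspace can consume the packet (e.g. vhost-user,
         * whose data path lives in the backend process), so report it as
         * delivered instead of dereferencing a NULL pointer. */
        return size;
    }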
+ +Thanks, +zhanghailiang + +On 11/04/2015 10:24 AM, zhanghailiang wrote: +> +On 2015/11/3 22:54, Marc-André Lureau wrote: +> +> Hi +> +> +> +> On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang +> +> <address@hidden> wrote: +> +>> The corresponding codes where gdb reports error are: (We have added +> +>> some +> +>> codes in net.c) +> +> +> +> Can you reproduce with unmodified qemu? Could you give instructions +> +> to do so? +> +> +> +> +OK, i will try to do it. There is nothing special, we run iperf tool +> +in VM, +> +and then shutdown or reboot it. There is change you can catch segfault. +> +> +>> ssize_t qemu_deliver_packet(NetClientState *sender, +> +>> unsigned flags, +> +>> const uint8_t *data, +> +>> size_t size, +> +>> void *opaque) +> +>> { +> +>> NetClientState *nc = opaque; +> +>> ssize_t ret; +> +>> +> +>> if (nc->link_down) { +> +>> return size; +> +>> } +> +>> +> +>> if (nc->receive_disabled) { +> +>> return 0; +> +>> } +> +>> +> +>> if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { +> +>> ret = nc->info->receive_raw(nc, data, size); +> +>> } else { +> +>> ret = nc->info->receive(nc, data, size); ----> Here is +> +>> 510 line +> +>> } +> +>> +> +>> I'm not quite familiar with vhost-user, but for vhost-user, these two +> +>> callback functions seem to be always NULL, +> +>> Why we can come here ? +> +> +> +> You should not come here, vhost-user has nc->receive_disabled (it +> +> changes in 2.5) +> +> +> +> +I have looked at the newest codes, i think we can still have chance to +> +come here, since we will change nc->receive_disable to false +> +temporarily in +> +qemu_flush_or_purge_queued_packets(), there is no difference between +> +2.3 and 2.5 +> +for this. +> +Besides, is it possible for !QTAILQ_EMPTY(&queue->packets) to be true +> +in qemu_net_queue_flush() for vhost-user ? +The only thing I can image is self announcing. Are you trying to do +migration? 2.5 only support sending rarp through this. + +And it's better to have a breakpoint to see why a packet was queued for +vhost-user. The stack trace may also help in this case. + +> +> +i will try to reproduce it by using newest qemu. +> +> +Thanks, +> +zhanghailiang +> + +On 2015/11/4 11:19, Jason Wang wrote: +On 11/04/2015 10:24 AM, zhanghailiang wrote: +On 2015/11/3 22:54, Marc-André Lureau wrote: +Hi + +On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang +<address@hidden> wrote: +The corresponding codes where gdb reports error are: (We have added +some +codes in net.c) +Can you reproduce with unmodified qemu? Could you give instructions +to do so? +OK, i will try to do it. There is nothing special, we run iperf tool +in VM, +and then shutdown or reboot it. There is change you can catch segfault. +ssize_t qemu_deliver_packet(NetClientState *sender, + unsigned flags, + const uint8_t *data, + size_t size, + void *opaque) +{ + NetClientState *nc = opaque; + ssize_t ret; + + if (nc->link_down) { + return size; + } + + if (nc->receive_disabled) { + return 0; + } + + if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { + ret = nc->info->receive_raw(nc, data, size); + } else { + ret = nc->info->receive(nc, data, size); ----> Here is +510 line + } + +I'm not quite familiar with vhost-user, but for vhost-user, these two +callback functions seem to be always NULL, +Why we can come here ? 
+You should not come here, vhost-user has nc->receive_disabled (it +changes in 2.5) +I have looked at the newest codes, i think we can still have chance to +come here, since we will change nc->receive_disable to false +temporarily in +qemu_flush_or_purge_queued_packets(), there is no difference between +2.3 and 2.5 +for this. +Besides, is it possible for !QTAILQ_EMPTY(&queue->packets) to be true +in qemu_net_queue_flush() for vhost-user ? +The only thing I can image is self announcing. Are you trying to do +migration? 2.5 only support sending rarp through this. +Hmm, it's not triggered by migration, For qemu-2.5, IMHO, it doesn't have such +problem, +since the callback function 'receive' is not NULL. It is vhost_user_receive(). +And it's better to have a breakpoint to see why a packet was queued for +vhost-user. The stack trace may also help in this case. +OK, i'm trying to reproduce it. + +Thanks, +zhanghailiang +i will try to reproduce it by using newest qemu. + +Thanks, +zhanghailiang +. + diff --git a/classification_output/01/semantic/1452608 b/classification_output/01/semantic/1452608 new file mode 100644 index 000000000..6750978a6 --- /dev/null +++ b/classification_output/01/semantic/1452608 @@ -0,0 +1,78 @@ +semantic: 0.943 +instruction: 0.932 +other: 0.921 +mistranslation: 0.854 + +[BUG] x86/PAT handling severely crippled AMD-V SVM KVM performance + +Hi, I maintain an out-of-tree 3D APIs pass-through QEMU device models at +https://github.com/kjliew/qemu-3dfx +that provide 3D acceleration for legacy +32-bit Windows guests (Win98SE, WinME, Win2k and WinXP) with the focus on +playing old legacy games from 1996-2003. It currently supports the now-defunct +3Dfx propriety API called Glide and an alternative OpenGL pass-through based on +MESA implementation. + +The basic concept of both implementations create memory-mapped virtual +interfaces consist of host/guest shared memory with guest-push model instead of +a more common host-pull model for typical QEMU device model implementation. +Guest uses shared memory as FIFOs for drawing commands and data to bulk up the +operations until serialization event that flushes the FIFOs into host. This +achieves extremely good performance since virtual CPUs are fast with hardware +acceleration (Intel VT/AMD-V) and reduces the overhead of frequent VMEXITs to +service the device emulation. Both implementations work on Windows 10 with WHPX +and HAXM accelerators as well as KVM in Linux. + +On Windows 10, QEMU WHPX implementation does not sync MSR_IA32_PAT during +host/guest states sync. There is no visibility into the closed-source WHPX on +how things are managed behind the scene, but from measuring performance figures +I can conclude that it didn't handle the MSR_IA32_PAT correctly for both Intel +and AMD. Call this fair enough, if you will, it didn't flag any concerns, in +fact games such as Quake2 and Quake3 were still within playable frame rate of +40~60FPS on Win2k/XP guest. Until the same games were run on Win98/ME guest and +the frame rate blew off the roof (300~500FPS) on the same CPU and GPU. In fact, +the later seemed to be more inlined with runnng the games bare-metal with vsync +off. + +On Linux (at the time of writing kernel 5.6.7/Mesa 20.0), the difference +prevailed. Intel CPUs (and it so happened that I was on laptop with Intel GPU), +the VMX-based kvm_intel got it right while SVM-based kvm_amd did not. 
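A quick way to see the PAT programming being discussed is to read the MSR directly. A minimal sketch, assuming a Linux system with the msr module loaded (modprobe msr) and root privileges; it can be run inside the guest to inspect what the vCPU ended up with, or on the host for comparison:

    #include <stdio.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/cpu/0/msr", O_RDONLY);
        uint64_t pat;

        /* IA32_PAT is MSR 0x277; the MSR index is the file offset. */
        if (fd < 0 || pread(fd, &pat, sizeof(pat), 0x277) != sizeof(pat)) {
            perror("IA32_PAT");
            return 1;
        }
        /* Each byte is one PAT entry: 0x06 = WB, 0x04 = WT, 0x01 = WC,
         * 0x00 = UC.  The architectural reset value is 0x0007040600070406;
         * the 0x0606...0606... value mentioned below forces the commonly
         * used entries to write-back. */
        printf("IA32_PAT = 0x%016llx\n", (unsigned long long)pat);
        return 0;
    }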
+To put this in simple exaggeration, an aging Core i3-4010U/HD Graphics 4400 +(Haswell GT2) exhibited an insane performance in Quake2/Quake3 timedemos that +totally crushed more recent AMD Ryzen 2500U APU/Vega 8 Graphics and AMD +FX8300/NVIDIA GT730 on desktop. Simply unbelievable! + +It turned out that there was something to do with AMD-V NPT. By loading kvm_amd +with npt=0, AMD Ryzen APU and FX8300 regained a huge performance leap. However, +AMD NPT issue with KVM was supposedly fixed in 2017 kernel commits. NPT=0 would +actually incur performance loss for VM due to intervention required by +hypervisors to maintain the shadow page tables. Finally, I was able to find the +pointer that pointed to MSR_IA32_PAT register. By updating the MSR_IA32_PAT to +0x0606xxxx0606xxxxULL, AMD CPUs now regain their rightful performance without +taking the hit of NPT=0 for Linux KVM. Taking the same solution into Windows, +both Intel and AMD CPUs no longer require Win98/ME guest to unleash the full +performance potentials and performance figures based on games measured on WHPX +were not very far behind Linux KVM. + +So I guess the problem lies in host/guest shared memory regions mapped as +uncacheable from virtual CPU perspective. As virtual CPUs now completely execute +in hardware context with x86 hardware virtualiztion extensions, the cacheability +of memory types would severely impact the performance on guests. WHPX didn't +handle it for both Intel EPT and AMD NPT, but KVM seems to do it right for Intel +EPT. I don't have the correct fix for QEMU. But what I can do for my 3D APIs +pass-through device models is to implement host-side hooks to reprogram and +restore MSR_IA32_PAT upon activation/deactivation of the 3D APIs. Perhaps there +is also a better solution of having the proper kernel drivers for virtual +interfaces to manage the memory types of host/guest shared memory in kernel +space, but to do that and the needs of Microsoft tools/DDKs, I will just forget +it. The guest stubs uses the same kernel drivers included in 3Dfx drivers for +memory mapping and the virtual interfaces remain driver-less from Windows OS +perspective. Considering the current state of halting progress for QEMU native +virgil3D to support Windows OS, I am just being pragmatic. I understand that +QEMU virgil3D will eventually bring 3D acceleration for Windows guests, but I do +not expect anything to support legacy 32-bit Windows OSes which have out-grown +their commercial usefulness. + +Regards, +KJ Liew + diff --git a/classification_output/01/semantic/2047990 b/classification_output/01/semantic/2047990 new file mode 100644 index 000000000..d0391c1e7 --- /dev/null +++ b/classification_output/01/semantic/2047990 @@ -0,0 +1,999 @@ +semantic: 0.984 +other: 0.982 +instruction: 0.974 +mistranslation: 0.949 + +[Qemu-devel] [BUG] Migrate failes between boards with different PMC counts + +Hi all, + +Recently, I found migration failed when enable vPMU. + +migrate vPMU state was introduced in linux-3.10 + qemu-1.7. + +As long as enable vPMU, qemu will save / load the +vmstate_msr_architectural_pmu(msr_global_ctrl) register during the migration. +But global_ctrl generated based on cpuid(0xA), the number of general-purpose +performance +monitoring counters(PMC) can vary according to Intel SDN. The number of PMC +presented +to vm, does not support configuration currently, it depend on host cpuid, and +enable all pmc +defaultly at KVM. It cause migration to fail between boards with different PMC +counts. 
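For reference, the number everything below hinges on comes from CPUID leaf 0xA: EAX[7:0] is the architectural PMU version and EAX[15:8] is the number of general-purpose counters per logical CPU. A minimal sketch for comparing two hosts, assuming an x86 host and GCC/clang's <cpuid.h>:

    #include <stdio.h>
    #include <cpuid.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        /* Leaf 0xA: architectural performance monitoring. */
        if (!__get_cpuid(0xA, &eax, &ebx, &ecx, &edx)) {
            fprintf(stderr, "CPUID leaf 0xA not available\n");
            return 1;
        }
        printf("PMU version:              %u\n", eax & 0xff);
        printf("general-purpose counters: %u\n", (eax >> 8) & 0xff);
        return 0;
    }

Two machines that print different counter counts here are exactly the pairs on which this migration fails.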
+ +The return value of cpuid (0xA) is different dur to cpu, according to Intel +SDNï¼18-10 Vol. 3B: + +Note: The number of general-purpose performance monitoring counters (i.e. N in +Figure 18-9) +can vary across processor generations within a processor family, across +processor families, or +could be different depending on the configuration chosen at boot time in the +BIOS regarding +Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom processors; N +=4 for processors +based on the Nehalem microarchitecture; for processors based on the Sandy Bridge +microarchitecture, N = 4 if Intel Hyper Threading Technology is active and N=8 +if not active). + +Also I found, N=8 if HT is not active based on the broadwellï¼, +such as CPU E7-8890 v4 @ 2.20GHz + +# ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda +/data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming +tcp::8888 +Completed 100 % +qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff +qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: +kvm_put_msrs: +Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. +Aborted + +So make number of pmc configurable to vm ? Any better idea ? + + +Regards, +-Zhuang Yanying + +* Zhuangyanying (address@hidden) wrote: +> +Hi all, +> +> +Recently, I found migration failed when enable vPMU. +> +> +migrate vPMU state was introduced in linux-3.10 + qemu-1.7. +> +> +As long as enable vPMU, qemu will save / load the +> +vmstate_msr_architectural_pmu(msr_global_ctrl) register during the migration. +> +But global_ctrl generated based on cpuid(0xA), the number of general-purpose +> +performance +> +monitoring counters(PMC) can vary according to Intel SDN. The number of PMC +> +presented +> +to vm, does not support configuration currently, it depend on host cpuid, and +> +enable all pmc +> +defaultly at KVM. It cause migration to fail between boards with different +> +PMC counts. +> +> +The return value of cpuid (0xA) is different dur to cpu, according to Intel +> +SDNï¼18-10 Vol. 3B: +> +> +Note: The number of general-purpose performance monitoring counters (i.e. N +> +in Figure 18-9) +> +can vary across processor generations within a processor family, across +> +processor families, or +> +could be different depending on the configuration chosen at boot time in the +> +BIOS regarding +> +Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom processors; +> +N =4 for processors +> +based on the Nehalem microarchitecture; for processors based on the Sandy +> +Bridge +> +microarchitecture, N = 4 if Intel Hyper Threading Technology is active and +> +N=8 if not active). +> +> +Also I found, N=8 if HT is not active based on the broadwellï¼, +> +such as CPU E7-8890 v4 @ 2.20GHz +> +> +# ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda +> +/data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming +> +tcp::8888 +> +Completed 100 % +> +qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff +> +qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: +> +kvm_put_msrs: +> +Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. +> +Aborted +> +> +So make number of pmc configurable to vm ? Any better idea ? +Coincidentally we hit a similar problem a few days ago with -cpu host - it +took me +quite a while to spot the difference between the machines was the source +had hyperthreading disabled. 
+ +An option to set the number of counters makes sense to me; but I wonder +how many other options we need as well. Also, I'm not sure there's any +easy way for libvirt etc to figure out how many counters a host supports - it's +not in /proc/cpuinfo. + +Dave + +> +> +Regards, +> +-Zhuang Yanying +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: +> +* Zhuangyanying (address@hidden) wrote: +> +> Hi all, +> +> +> +> Recently, I found migration failed when enable vPMU. +> +> +> +> migrate vPMU state was introduced in linux-3.10 + qemu-1.7. +> +> +> +> As long as enable vPMU, qemu will save / load the +> +> vmstate_msr_architectural_pmu(msr_global_ctrl) register during the +> +> migration. +> +> But global_ctrl generated based on cpuid(0xA), the number of +> +> general-purpose performance +> +> monitoring counters(PMC) can vary according to Intel SDN. The number of PMC +> +> presented +> +> to vm, does not support configuration currently, it depend on host cpuid, +> +> and enable all pmc +> +> defaultly at KVM. It cause migration to fail between boards with different +> +> PMC counts. +> +> +> +> The return value of cpuid (0xA) is different dur to cpu, according to Intel +> +> SDNï¼18-10 Vol. 3B: +> +> +> +> Note: The number of general-purpose performance monitoring counters (i.e. N +> +> in Figure 18-9) +> +> can vary across processor generations within a processor family, across +> +> processor families, or +> +> could be different depending on the configuration chosen at boot time in +> +> the BIOS regarding +> +> Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom +> +> processors; N =4 for processors +> +> based on the Nehalem microarchitecture; for processors based on the Sandy +> +> Bridge +> +> microarchitecture, N = 4 if Intel Hyper Threading Technology is active and +> +> N=8 if not active). +> +> +> +> Also I found, N=8 if HT is not active based on the broadwellï¼, +> +> such as CPU E7-8890 v4 @ 2.20GHz +> +> +> +> # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda +> +> /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming +> +> tcp::8888 +> +> Completed 100 % +> +> qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff +> +> qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: +> +> kvm_put_msrs: +> +> Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. +> +> Aborted +> +> +> +> So make number of pmc configurable to vm ? Any better idea ? +> +> +Coincidentally we hit a similar problem a few days ago with -cpu host - it +> +took me +> +quite a while to spot the difference between the machines was the source +> +had hyperthreading disabled. +> +> +An option to set the number of counters makes sense to me; but I wonder +> +how many other options we need as well. Also, I'm not sure there's any +> +easy way for libvirt etc to figure out how many counters a host supports - +> +it's not in /proc/cpuinfo. +We actually try to avoid /proc/cpuinfo whereever possible. We do direct +CPUID asm instructions to identify features, and prefer to use +/sys/devices/system/cpu if that has suitable data + +Where do the PMC counts come from originally ? CPUID or something else ? + +Regards, +Daniel +-- +|: +https://berrange.com +-o- +https://www.flickr.com/photos/dberrange +:| +|: +https://libvirt.org +-o- +https://fstop138.berrange.com +:| +|: +https://entangle-photo.org +-o- +https://www.instagram.com/dberrange +:| + +* Daniel P. 
Berrange (address@hidden) wrote: +> +On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: +> +> * Zhuangyanying (address@hidden) wrote: +> +> > Hi all, +> +> > +> +> > Recently, I found migration failed when enable vPMU. +> +> > +> +> > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. +> +> > +> +> > As long as enable vPMU, qemu will save / load the +> +> > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the +> +> > migration. +> +> > But global_ctrl generated based on cpuid(0xA), the number of +> +> > general-purpose performance +> +> > monitoring counters(PMC) can vary according to Intel SDN. The number of +> +> > PMC presented +> +> > to vm, does not support configuration currently, it depend on host cpuid, +> +> > and enable all pmc +> +> > defaultly at KVM. It cause migration to fail between boards with +> +> > different PMC counts. +> +> > +> +> > The return value of cpuid (0xA) is different dur to cpu, according to +> +> > Intel SDNï¼18-10 Vol. 3B: +> +> > +> +> > Note: The number of general-purpose performance monitoring counters (i.e. +> +> > N in Figure 18-9) +> +> > can vary across processor generations within a processor family, across +> +> > processor families, or +> +> > could be different depending on the configuration chosen at boot time in +> +> > the BIOS regarding +> +> > Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom +> +> > processors; N =4 for processors +> +> > based on the Nehalem microarchitecture; for processors based on the Sandy +> +> > Bridge +> +> > microarchitecture, N = 4 if Intel Hyper Threading Technology is active +> +> > and N=8 if not active). +> +> > +> +> > Also I found, N=8 if HT is not active based on the broadwellï¼, +> +> > such as CPU E7-8890 v4 @ 2.20GHz +> +> > +> +> > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda +> +> > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming +> +> > tcp::8888 +> +> > Completed 100 % +> +> > qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff +> +> > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: +> +> > kvm_put_msrs: +> +> > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. +> +> > Aborted +> +> > +> +> > So make number of pmc configurable to vm ? Any better idea ? +> +> +> +> Coincidentally we hit a similar problem a few days ago with -cpu host - it +> +> took me +> +> quite a while to spot the difference between the machines was the source +> +> had hyperthreading disabled. +> +> +> +> An option to set the number of counters makes sense to me; but I wonder +> +> how many other options we need as well. Also, I'm not sure there's any +> +> easy way for libvirt etc to figure out how many counters a host supports - +> +> it's not in /proc/cpuinfo. +> +> +We actually try to avoid /proc/cpuinfo whereever possible. We do direct +> +CPUID asm instructions to identify features, and prefer to use +> +/sys/devices/system/cpu if that has suitable data +> +> +Where do the PMC counts come from originally ? CPUID or something else ? +Yes, they're bits 8..15 of CPUID leaf 0xa + +Dave + +> +Regards, +> +Daniel +> +-- +> +|: +https://berrange.com +-o- +https://www.flickr.com/photos/dberrange +:| +> +|: +https://libvirt.org +-o- +https://fstop138.berrange.com +:| +> +|: +https://entangle-photo.org +-o- +https://www.instagram.com/dberrange +:| +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +On Mon, Apr 24, 2017 at 11:27:16AM +0100, Dr. 
David Alan Gilbert wrote: +> +* Daniel P. Berrange (address@hidden) wrote: +> +> On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: +> +> > * Zhuangyanying (address@hidden) wrote: +> +> > > Hi all, +> +> > > +> +> > > Recently, I found migration failed when enable vPMU. +> +> > > +> +> > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. +> +> > > +> +> > > As long as enable vPMU, qemu will save / load the +> +> > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the +> +> > > migration. +> +> > > But global_ctrl generated based on cpuid(0xA), the number of +> +> > > general-purpose performance +> +> > > monitoring counters(PMC) can vary according to Intel SDN. The number of +> +> > > PMC presented +> +> > > to vm, does not support configuration currently, it depend on host +> +> > > cpuid, and enable all pmc +> +> > > defaultly at KVM. It cause migration to fail between boards with +> +> > > different PMC counts. +> +> > > +> +> > > The return value of cpuid (0xA) is different dur to cpu, according to +> +> > > Intel SDNï¼18-10 Vol. 3B: +> +> > > +> +> > > Note: The number of general-purpose performance monitoring counters +> +> > > (i.e. N in Figure 18-9) +> +> > > can vary across processor generations within a processor family, across +> +> > > processor families, or +> +> > > could be different depending on the configuration chosen at boot time +> +> > > in the BIOS regarding +> +> > > Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom +> +> > > processors; N =4 for processors +> +> > > based on the Nehalem microarchitecture; for processors based on the +> +> > > Sandy Bridge +> +> > > microarchitecture, N = 4 if Intel Hyper Threading Technology is active +> +> > > and N=8 if not active). +> +> > > +> +> > > Also I found, N=8 if HT is not active based on the broadwellï¼, +> +> > > such as CPU E7-8890 v4 @ 2.20GHz +> +> > > +> +> > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda +> +> > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true +> +> > > -incoming tcp::8888 +> +> > > Completed 100 % +> +> > > qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff +> +> > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: +> +> > > kvm_put_msrs: +> +> > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. +> +> > > Aborted +> +> > > +> +> > > So make number of pmc configurable to vm ? Any better idea ? +> +> > +> +> > Coincidentally we hit a similar problem a few days ago with -cpu host - +> +> > it took me +> +> > quite a while to spot the difference between the machines was the source +> +> > had hyperthreading disabled. +> +> > +> +> > An option to set the number of counters makes sense to me; but I wonder +> +> > how many other options we need as well. Also, I'm not sure there's any +> +> > easy way for libvirt etc to figure out how many counters a host supports - +> +> > it's not in /proc/cpuinfo. +> +> +> +> We actually try to avoid /proc/cpuinfo whereever possible. We do direct +> +> CPUID asm instructions to identify features, and prefer to use +> +> /sys/devices/system/cpu if that has suitable data +> +> +> +> Where do the PMC counts come from originally ? CPUID or something else ? +> +> +Yes, they're bits 8..15 of CPUID leaf 0xa +Ok, that's easy enough for libvirt to detect then. More a question of what +libvirt should then do this with the info.... 
+ +Regards, +Daniel +-- +|: +https://berrange.com +-o- +https://www.flickr.com/photos/dberrange +:| +|: +https://libvirt.org +-o- +https://fstop138.berrange.com +:| +|: +https://entangle-photo.org +-o- +https://www.instagram.com/dberrange +:| + +> +-----Original Message----- +> +From: Daniel P. Berrange [ +mailto:address@hidden +> +Sent: Monday, April 24, 2017 6:34 PM +> +To: Dr. David Alan Gilbert +> +Cc: Zhuangyanying; Zhanghailiang; wangxin (U); address@hidden; +> +Gonglei (Arei); Huangzhichao; address@hidden +> +Subject: Re: [Qemu-devel] [BUG] Migrate failes between boards with different +> +PMC counts +> +> +On Mon, Apr 24, 2017 at 11:27:16AM +0100, Dr. David Alan Gilbert wrote: +> +> * Daniel P. Berrange (address@hidden) wrote: +> +> > On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: +> +> > > * Zhuangyanying (address@hidden) wrote: +> +> > > > Hi all, +> +> > > > +> +> > > > Recently, I found migration failed when enable vPMU. +> +> > > > +> +> > > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. +> +> > > > +> +> > > > As long as enable vPMU, qemu will save / load the +> +> > > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the +> +migration. +> +> > > > But global_ctrl generated based on cpuid(0xA), the number of +> +> > > > general-purpose performance monitoring counters(PMC) can vary +> +> > > > according to Intel SDN. The number of PMC presented to vm, does +> +> > > > not support configuration currently, it depend on host cpuid, and +> +> > > > enable +> +all pmc defaultly at KVM. It cause migration to fail between boards with +> +different PMC counts. +> +> > > > +> +> > > > The return value of cpuid (0xA) is different dur to cpu, according to +> +> > > > Intel +> +SDNï¼18-10 Vol. 3B: +> +> > > > +> +> > > > Note: The number of general-purpose performance monitoring +> +> > > > counters (i.e. N in Figure 18-9) can vary across processor +> +> > > > generations within a processor family, across processor +> +> > > > families, or could be different depending on the configuration +> +> > > > chosen at boot time in the BIOS regarding Intel Hyper Threading +> +> > > > Technology, (e.g. N=2 for 45 nm Intel Atom processors; N =4 for +> +processors based on the Nehalem microarchitecture; for processors based on +> +the Sandy Bridge microarchitecture, N = 4 if Intel Hyper Threading Technology +> +is active and N=8 if not active). +> +> > > > +> +> > > > Also I found, N=8 if HT is not active based on the broadwellï¼, +> +> > > > such as CPU E7-8890 v4 @ 2.20GHz +> +> > > > +> +> > > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m +> +> > > > 4096 -hda +> +> > > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true +> +> > > > -incoming tcp::8888 Completed 100 % +> +> > > > qemu-system-x86_64: error: failed to set MSR 0x38f to +> +> > > > 0x7000000ff +> +> > > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: +> +kvm_put_msrs: +> +> > > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. +> +> > > > Aborted +> +> > > > +> +> > > > So make number of pmc configurable to vm ? Any better idea ? +> +> > > +> +> > > Coincidentally we hit a similar problem a few days ago with -cpu +> +> > > host - it took me quite a while to spot the difference between +> +> > > the machines was the source had hyperthreading disabled. +> +> > > +> +> > > An option to set the number of counters makes sense to me; but I +> +> > > wonder how many other options we need as well. 
Also, I'm not sure +> +> > > there's any easy way for libvirt etc to figure out how many +> +> > > counters a host supports - it's not in /proc/cpuinfo. +> +> > +> +> > We actually try to avoid /proc/cpuinfo whereever possible. We do +> +> > direct CPUID asm instructions to identify features, and prefer to +> +> > use /sys/devices/system/cpu if that has suitable data +> +> > +> +> > Where do the PMC counts come from originally ? CPUID or something +> +else ? +> +> +> +> Yes, they're bits 8..15 of CPUID leaf 0xa +> +> +Ok, that's easy enough for libvirt to detect then. More a question of what +> +libvirt +> +should then do this with the info.... +> +Do you mean to do a validation at the begining of migration? in +qemuMigrationBakeCookie() & qemuMigrationEatCookie(), if the PMC numbers are +not equal, just quit migration? +It maybe a good enough first edition. +But for a further better edition, maybe it's better to support Heterogeneous +migration I think, so we might need to make PMC number configrable, then we +need to modify KVM/qemu as well. + +Regards, +-Zhuang Yanying + +* Zhuangyanying (address@hidden) wrote: +> +> +> +> -----Original Message----- +> +> From: Daniel P. Berrange [ +mailto:address@hidden +> +> Sent: Monday, April 24, 2017 6:34 PM +> +> To: Dr. David Alan Gilbert +> +> Cc: Zhuangyanying; Zhanghailiang; wangxin (U); address@hidden; +> +> Gonglei (Arei); Huangzhichao; address@hidden +> +> Subject: Re: [Qemu-devel] [BUG] Migrate failes between boards with different +> +> PMC counts +> +> +> +> On Mon, Apr 24, 2017 at 11:27:16AM +0100, Dr. David Alan Gilbert wrote: +> +> > * Daniel P. Berrange (address@hidden) wrote: +> +> > > On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: +> +> > > > * Zhuangyanying (address@hidden) wrote: +> +> > > > > Hi all, +> +> > > > > +> +> > > > > Recently, I found migration failed when enable vPMU. +> +> > > > > +> +> > > > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. +> +> > > > > +> +> > > > > As long as enable vPMU, qemu will save / load the +> +> > > > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the +> +> migration. +> +> > > > > But global_ctrl generated based on cpuid(0xA), the number of +> +> > > > > general-purpose performance monitoring counters(PMC) can vary +> +> > > > > according to Intel SDN. The number of PMC presented to vm, does +> +> > > > > not support configuration currently, it depend on host cpuid, and +> +> > > > > enable +> +> all pmc defaultly at KVM. It cause migration to fail between boards with +> +> different PMC counts. +> +> > > > > +> +> > > > > The return value of cpuid (0xA) is different dur to cpu, according +> +> > > > > to Intel +> +> SDNï¼18-10 Vol. 3B: +> +> > > > > +> +> > > > > Note: The number of general-purpose performance monitoring +> +> > > > > counters (i.e. N in Figure 18-9) can vary across processor +> +> > > > > generations within a processor family, across processor +> +> > > > > families, or could be different depending on the configuration +> +> > > > > chosen at boot time in the BIOS regarding Intel Hyper Threading +> +> > > > > Technology, (e.g. N=2 for 45 nm Intel Atom processors; N =4 for +> +> processors based on the Nehalem microarchitecture; for processors based on +> +> the Sandy Bridge microarchitecture, N = 4 if Intel Hyper Threading +> +> Technology +> +> is active and N=8 if not active). 
+> +> > > > > +> +> > > > > Also I found, N=8 if HT is not active based on the broadwellï¼, +> +> > > > > such as CPU E7-8890 v4 @ 2.20GHz +> +> > > > > +> +> > > > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m +> +> > > > > 4096 -hda +> +> > > > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true +> +> > > > > -incoming tcp::8888 Completed 100 % +> +> > > > > qemu-system-x86_64: error: failed to set MSR 0x38f to +> +> > > > > 0x7000000ff +> +> > > > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: +> +> kvm_put_msrs: +> +> > > > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. +> +> > > > > Aborted +> +> > > > > +> +> > > > > So make number of pmc configurable to vm ? Any better idea ? +> +> > > > +> +> > > > Coincidentally we hit a similar problem a few days ago with -cpu +> +> > > > host - it took me quite a while to spot the difference between +> +> > > > the machines was the source had hyperthreading disabled. +> +> > > > +> +> > > > An option to set the number of counters makes sense to me; but I +> +> > > > wonder how many other options we need as well. Also, I'm not sure +> +> > > > there's any easy way for libvirt etc to figure out how many +> +> > > > counters a host supports - it's not in /proc/cpuinfo. +> +> > > +> +> > > We actually try to avoid /proc/cpuinfo whereever possible. We do +> +> > > direct CPUID asm instructions to identify features, and prefer to +> +> > > use /sys/devices/system/cpu if that has suitable data +> +> > > +> +> > > Where do the PMC counts come from originally ? CPUID or something +> +> else ? +> +> > +> +> > Yes, they're bits 8..15 of CPUID leaf 0xa +> +> +> +> Ok, that's easy enough for libvirt to detect then. More a question of what +> +> libvirt +> +> should then do this with the info.... +> +> +> +> +Do you mean to do a validation at the begining of migration? in +> +qemuMigrationBakeCookie() & qemuMigrationEatCookie(), if the PMC numbers are +> +not equal, just quit migration? +> +It maybe a good enough first edition. +> +But for a further better edition, maybe it's better to support Heterogeneous +> +migration I think, so we might need to make PMC number configrable, then we +> +need to modify KVM/qemu as well. +Yes agreed; the only thing I wanted to check was that libvirt would have enough +information to be able to use any feature we added to QEMU. + +Dave + +> +Regards, +> +-Zhuang Yanying +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + diff --git a/classification_output/01/semantic/3242247 b/classification_output/01/semantic/3242247 new file mode 100644 index 000000000..d8352e564 --- /dev/null +++ b/classification_output/01/semantic/3242247 @@ -0,0 +1,406 @@ +semantic: 0.965 +mistranslation: 0.946 +other: 0.927 +instruction: 0.906 + +[Qemu-devel] [Bug?] Windows 7's time drift obviously while RTC rate switching frequently between high and low timer rate + +Hi, + +We tested with the latest QEMU, and found that time drift obviously (clock fast +in guest) +in Windows 7 64 bits guest in some cases. 
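To put a rough number on the drift, using the tick accounting shown in the log further down in this report: Windows credits 1/1024 s (about 976.56 us) per tick at the high rate and 1/64 s (15625 us) at the low rate, so 835 high-rate ticks plus 11 low-rate ticks are booked as roughly 835 * 976.56 us + 11 * 15625 us, i.e. about 987300 us of guest time, against a measured real interval of about 911520 us. The guest clock therefore gains on the order of 75 ms over that single interval.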
+ +It is easily to reproduce, using the follow QEMU command line to start windows +7: + +# x86_64-softmmu/qemu-system-x86_64 -name win7_64_2U_raw -machine +pc-i440fx-2.6,accel=kvm,usb=off -cpu host -m 2048 -realtime mlock=off -smp +4,sockets=2,cores=2,threads=1 -rtc base=utc,clock=vm,driftfix=slew -no-hpet +-global kvm-pit.lost_tick_policy=discard -hda /mnt/nfs/win7_sp1_32_2U_raw -vnc +:11 -netdev tap,id=hn0,vhost=off -device rtl8139,id=net-pci0,netdev=hn0 -device +piix3-usb-uhci,id=usb -device usb-tablet,id=input0 -device usb-mouse,id=input1 +-device usb-kbd,id=input2 -monitor stdio + +Adjust the VM's time to host time, and run java application or run the follow +program +in windows 7: + +#pragma comment(lib, "winmm") +#include <stdio.h> +#include <windows.h> + +#define SWITCH_PEROID 13 + +int main() +{ + DWORD count = 0; + + while (1) + { + count++; + timeBeginPeriod(1); + DWORD start = timeGetTime(); + Sleep(40); + timeEndPeriod(1); + if ((count % SWITCH_PEROID) == 0) { + Sleep(1); + } + } + return 0; +} + +After few minutes, you will find that the time in windows 7 goes ahead of the +host time, drifts about several seconds. + +I have dug deeper in this problem. For windows systems that use the CMOS timer, +the base interrupt rate is usually 64Hz, but running some application in VM +will raise the timer rate to 1024Hz, running java application and or above +program will raise the timer rate. +Besides, Windows operating systems generally keep time by counting timer +interrupts (ticks). But QEMU seems not emulate the rate converting fine. + +We update the timer in function periodic_timer_update(): +static void periodic_timer_update(RTCState *s, int64_t current_time) +{ + + cur_clock = muldiv64(current_time, RTC_CLOCK_RATE, get_ticks_per_sec()); + next_irq_clock = (cur_clock & ~(period - 1)) + period; + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Here we calculate the next interrupt time by align the current clock with the +new period, I'm a little confused that why we care about the *history* time ? +If VM switches from high rate to low rate, the next interrupt time may come +earlier than it supposed to be. We have observed it in our test. we printed the +interval time of interrupts and the VM's current time (We got the time from VM). + +Here is part of the log: +... ... +period=512 irq inject 1534: 15625 us +Tue Mar 29 04:38:00 2016 +*irq_num_period_32=0, irq_num_period_512=64: [3]: Real time interval is 999696 +us +... ... +*irq_num_period_32=893, irq_num_period_512=9 [81]: Real time interval is 951086 +us +Convert 32 --- > 512: 703: 96578 us +period=512 irq inject 44391: 12702 us +Convert 512 --- > 32: 704: 12704 us11 +period=32 irq inject 44392: 979 us +... ... +32 --- > 512: 705: 24388 us +period=512 irq inject 44417: 6834 us +Convert 512 --- > 32: 706: 6830 us +period=32 irq inject 44418: 978 us +... ... +Convert 32 --- > 512: 707: 60525 us +period=512 irq inject 44480: 1945 us +Convert 512 --- > 32: 708: 1955 us +period=32 irq inject 44481: 977 us +... ... +Convert 32 --- > 512: 709: 36105 us +period=512 irq inject 44518: 10741 us +Convert 512 --- > 32: 710: 10736 us +period=32 irq inject 44519: 989 us +... ... +Convert 32 --- > 512: 711: 123998 us +period=512 irq inject 44646: 974 us +period=512 irq inject 44647: 15607 us +Convert 512 --- > 32: 712: 16560 us +period=32 irq inject 44648: 980 us +... ... +period=32 irq inject 44738: 974 us +Convert 32 --- > 512: 713: 88828 us +period=512 irq inject 44739: 4885 us +Convert 512 --- > 32: 714: 4882 us +period=32 irq inject 44740: 989 us +... 
... +period=32 irq inject 44842: 974 us +Convert 32 --- > 512: 715: 100537 us +period=512 irq inject 44843: 8788 us +Convert 512 --- > 32: 716: 8789 us +period=32 irq inject 44844: 972 us +... ... +period=32 irq inject 44941: 979 us +Convert 32 --- > 512: 717: 95677 us +period=512 irq inject 44942: 13661 us +Convert 512 --- > 32: 718: 13657 us +period=32 irq inject 44943: 987 us +... ... +Convert 32 --- > 512: 719: 94690 us +period=512 irq inject 45040: 14643 us +Convert 512 --- > 32: 720: 14642 us +period=32 irq inject 45041: 974 us +... ... +Convert 32 --- > 512: 721: 88848 us +period=512 irq inject 45132: 4892 us +Convert 512 --- > 32: 722: 4931 us +period=32 irq inject 45133: 964 us +... ... +Tue Mar 29 04:39:19 2016 +*irq_num_period_32:835, irq_num_period_512:11 [82], Real time interval is +911520 us + +For windows 7, it has got 835 IRQs which injected during the period of 32, +and got 11 IRQs that injected during the period of 512. it updated the +wall-clock +time with one second, because it supposed it has counted +(835*976.5+11*15625)= 987252.5 us, but the real interval time is 911520 us. + +IMHO, we should calculate the next interrupt time based on the time of last +interrupt injected, and it seems to be more similar with hardware CMOS timer +in this way. +Maybe someone can tell me the reason why we calculated the interrupt timer +in that way, or is it a bug ? ;) + +Thanks, +Hailiang + +ping... + +It seems that we can eliminate the drift by the following patch. +(I tested it for two hours, and there is no drift, before, the timer +in Windows 7 drifts about 2 seconds per minute.) I'm not sure if it is +the right way to solve the problem. +Any comments are welcomed. Thanks. + +From bd6acd577cbbc9d92d6376c770219470f184f7de Mon Sep 17 00:00:00 2001 +From: zhanghailiang <address@hidden> +Date: Thu, 31 Mar 2016 16:36:15 -0400 +Subject: [PATCH] timer/mc146818rtc: fix timer drift in Windows OS while RTC + rate converting frequently + +Signed-off-by: zhanghailiang <address@hidden> +--- + hw/timer/mc146818rtc.c | 25 ++++++++++++++++++++++--- + 1 file changed, 22 insertions(+), 3 deletions(-) + +diff --git a/hw/timer/mc146818rtc.c b/hw/timer/mc146818rtc.c +index 2ac0fd3..e39d2da 100644 +--- a/hw/timer/mc146818rtc.c ++++ b/hw/timer/mc146818rtc.c +@@ -79,6 +79,7 @@ typedef struct RTCState { + /* periodic timer */ + QEMUTimer *periodic_timer; + int64_t next_periodic_time; ++ uint64_t last_periodic_time; + /* update-ended timer */ + QEMUTimer *update_timer; + uint64_t next_alarm_time; +@@ -152,7 +153,8 @@ static void rtc_coalesced_timer(void *opaque) + static void periodic_timer_update(RTCState *s, int64_t current_time) + { + int period_code, period; +- int64_t cur_clock, next_irq_clock; ++ int64_t cur_clock, next_irq_clock, pre_irq_clock; ++ bool change = false; + + period_code = s->cmos_data[RTC_REG_A] & 0x0f; + if (period_code != 0 +@@ -165,14 +167,28 @@ static void periodic_timer_update(RTCState *s, int64_t +current_time) + if (period != s->period) { + s->irq_coalesced = (s->irq_coalesced * s->period) / period; + DPRINTF_C("cmos: coalesced irqs scaled to %d\n", s->irq_coalesced); ++ if (s->period && period) { ++ change = true; ++ } + } + s->period = period; + #endif + /* compute 32 khz clock */ + cur_clock = + muldiv64(current_time, RTC_CLOCK_RATE, NANOSECONDS_PER_SECOND); ++ if (change) { ++ int offset = 0; + +- next_irq_clock = (cur_clock & ~(period - 1)) + period; ++ pre_irq_clock = muldiv64(s->last_periodic_time, RTC_CLOCK_RATE, ++ NANOSECONDS_PER_SECOND); ++ if ((cur_clock - 
pre_irq_clock) > period) { ++ offset = (cur_clock - pre_irq_clock) / period; ++ } ++ s->irq_coalesced += offset; ++ next_irq_clock = pre_irq_clock + (offset + 1) * period; ++ } else { ++ next_irq_clock = (cur_clock & ~(period - 1)) + period; ++ } + s->next_periodic_time = muldiv64(next_irq_clock, +NANOSECONDS_PER_SECOND, + RTC_CLOCK_RATE) + 1; + timer_mod(s->periodic_timer, s->next_periodic_time); +@@ -187,7 +203,9 @@ static void periodic_timer_update(RTCState *s, int64_t +current_time) + static void rtc_periodic_timer(void *opaque) + { + RTCState *s = opaque; +- ++ int64_t next_periodic_time; ++ ++ next_periodic_time = s->next_periodic_time; + periodic_timer_update(s, s->next_periodic_time); + s->cmos_data[RTC_REG_C] |= REG_C_PF; + if (s->cmos_data[RTC_REG_B] & REG_B_PIE) { +@@ -204,6 +222,7 @@ static void rtc_periodic_timer(void *opaque) + DPRINTF_C("cmos: coalesced irqs increased to %d\n", + s->irq_coalesced); + } ++ s->last_periodic_time = next_periodic_time; + } else + #endif + qemu_irq_raise(s->irq); +-- +1.8.3.1 + + +On 2016/3/29 19:58, Hailiang Zhang wrote: +Hi, + +We tested with the latest QEMU, and found that time drift obviously (clock fast +in guest) +in Windows 7 64 bits guest in some cases. + +It is easily to reproduce, using the follow QEMU command line to start windows +7: + +# x86_64-softmmu/qemu-system-x86_64 -name win7_64_2U_raw -machine +pc-i440fx-2.6,accel=kvm,usb=off -cpu host -m 2048 -realtime mlock=off -smp +4,sockets=2,cores=2,threads=1 -rtc base=utc,clock=vm,driftfix=slew -no-hpet +-global kvm-pit.lost_tick_policy=discard -hda /mnt/nfs/win7_sp1_32_2U_raw -vnc +:11 -netdev tap,id=hn0,vhost=off -device rtl8139,id=net-pci0,netdev=hn0 -device +piix3-usb-uhci,id=usb -device usb-tablet,id=input0 -device usb-mouse,id=input1 +-device usb-kbd,id=input2 -monitor stdio + +Adjust the VM's time to host time, and run java application or run the follow +program +in windows 7: + +#pragma comment(lib, "winmm") +#include <stdio.h> +#include <windows.h> + +#define SWITCH_PEROID 13 + +int main() +{ + DWORD count = 0; + + while (1) + { + count++; + timeBeginPeriod(1); + DWORD start = timeGetTime(); + Sleep(40); + timeEndPeriod(1); + if ((count % SWITCH_PEROID) == 0) { + Sleep(1); + } + } + return 0; +} + +After few minutes, you will find that the time in windows 7 goes ahead of the +host time, drifts about several seconds. + +I have dug deeper in this problem. For windows systems that use the CMOS timer, +the base interrupt rate is usually 64Hz, but running some application in VM +will raise the timer rate to 1024Hz, running java application and or above +program will raise the timer rate. +Besides, Windows operating systems generally keep time by counting timer +interrupts (ticks). But QEMU seems not emulate the rate converting fine. + +We update the timer in function periodic_timer_update(): +static void periodic_timer_update(RTCState *s, int64_t current_time) +{ + + cur_clock = muldiv64(current_time, RTC_CLOCK_RATE, +get_ticks_per_sec()); + next_irq_clock = (cur_clock & ~(period - 1)) + period; + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Here we calculate the next interrupt time by align the current clock with the +new period, I'm a little confused that why we care about the *history* time ? +If VM switches from high rate to low rate, the next interrupt time may come +earlier than it supposed to be. We have observed it in our test. we printed the +interval time of interrupts and the VM's current time (We got the time from VM). + +Here is part of the log: +... ... 
+period=512 irq inject 1534: 15625 us
+Tue Mar 29 04:38:00 2016
+*irq_num_period_32=0, irq_num_period_512=64: [3]: Real time interval is 999696
+us
+... ...
+*irq_num_period_32=893, irq_num_period_512=9 [81]: Real time interval is 951086
+us
+Convert 32 --- > 512: 703: 96578 us
+period=512 irq inject 44391: 12702 us
+Convert 512 --- > 32: 704: 12704 us11
+period=32 irq inject 44392: 979 us
+... ...
+32 --- > 512: 705: 24388 us
+period=512 irq inject 44417: 6834 us
+Convert 512 --- > 32: 706: 6830 us
+period=32 irq inject 44418: 978 us
+... ...
+Convert 32 --- > 512: 707: 60525 us
+period=512 irq inject 44480: 1945 us
+Convert 512 --- > 32: 708: 1955 us
+period=32 irq inject 44481: 977 us
+... ...
+Convert 32 --- > 512: 709: 36105 us
+period=512 irq inject 44518: 10741 us
+Convert 512 --- > 32: 710: 10736 us
+period=32 irq inject 44519: 989 us
+... ...
+Convert 32 --- > 512: 711: 123998 us
+period=512 irq inject 44646: 974 us
+period=512 irq inject 44647: 15607 us
+Convert 512 --- > 32: 712: 16560 us
+period=32 irq inject 44648: 980 us
+... ...
+period=32 irq inject 44738: 974 us
+Convert 32 --- > 512: 713: 88828 us
+period=512 irq inject 44739: 4885 us
+Convert 512 --- > 32: 714: 4882 us
+period=32 irq inject 44740: 989 us
+... ...
+period=32 irq inject 44842: 974 us
+Convert 32 --- > 512: 715: 100537 us
+period=512 irq inject 44843: 8788 us
+Convert 512 --- > 32: 716: 8789 us
+period=32 irq inject 44844: 972 us
+... ...
+period=32 irq inject 44941: 979 us
+Convert 32 --- > 512: 717: 95677 us
+period=512 irq inject 44942: 13661 us
+Convert 512 --- > 32: 718: 13657 us
+period=32 irq inject 44943: 987 us
+... ...
+Convert 32 --- > 512: 719: 94690 us
+period=512 irq inject 45040: 14643 us
+Convert 512 --- > 32: 720: 14642 us
+period=32 irq inject 45041: 974 us
+... ...
+Convert 32 --- > 512: 721: 88848 us
+period=512 irq inject 45132: 4892 us
+Convert 512 --- > 32: 722: 4931 us
+period=32 irq inject 45133: 964 us
+... ...
+Tue Mar 29 04:39:19 2016
+*irq_num_period_32:835, irq_num_period_512:11 [82], Real time interval is
+911520 us
+
+For windows 7, it has got 835 IRQs which injected during the period of 32,
+and got 11 IRQs that injected during the period of 512. it updated the
+wall-clock time with one second, because it supposed it has counted
+(835*976.5+11*15625)= 987252.5 us, but the real interval time is 911520 us.
+
+IMHO, we should calculate the next interrupt time based on the time of last
+interrupt injected, and it seems to be more similar with hardware CMOS timer
+in this way.
+Maybe someone can tell me the reason why we calculated the interrupt timer
+in that way, or is it a bug ? ;)
+
+Thanks,
+Hailiang
+
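To make the two scheduling policies discussed above concrete, here is a minimal,
self-contained C sketch (not QEMU code; the tick values are invented for
illustration) contrasting the period-aligned calculation currently used in
periodic_timer_update() with the last-interrupt-based calculation proposed in
the thread. All values are in ticks of the 32 kHz (32768 Hz) RTC clock, so
period 32 corresponds to the 1024 Hz rate (~976.5 us) and period 512 to the
64 Hz rate (15625 us).

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Policy currently used by periodic_timer_update(): align the next IRQ
 * to the next multiple of the (possibly new) period. */
static int64_t next_irq_aligned(int64_t cur_clock, int64_t period)
{
    return (cur_clock & ~(period - 1)) + period;
}

/* Policy proposed in the thread: schedule one full period after the
 * last injected IRQ. */
static int64_t next_irq_from_last(int64_t last_irq_clock, int64_t period)
{
    return last_irq_clock + period;
}

int main(void)
{
    /* Hypothetical numbers: the last IRQ fired at tick 10016 while the
     * guest was at period 32, and a few ticks later the guest reprograms
     * REG_A to the 64 Hz rate (period 512). */
    int64_t last_irq_clock = 10016;  /* multiple of 32, not of 512 */
    int64_t cur_clock      = 10020;

    int64_t aligned  = next_irq_aligned(cur_clock, 512);
    int64_t relative = next_irq_from_last(last_irq_clock, 512);

    printf("aligned policy:  next IRQ at tick %" PRId64 ", only %" PRId64
           " ticks (%.1f us) after the previous IRQ\n",
           aligned, aligned - last_irq_clock,
           (aligned - last_irq_clock) * 1000000.0 / 32768);
    printf("last-IRQ policy: next IRQ at tick %" PRId64 ", %" PRId64
           " ticks (%.1f us) after the previous IRQ\n",
           relative, relative - last_irq_clock,
           (relative - last_irq_clock) * 1000000.0 / 32768);
    return 0;
}

With the aligned policy a rate change can shorten the very next tick to a
fraction of the new period (224 ticks, about 6.8 ms, in this made-up example),
which matches the early injections visible in the log above; scheduling from
the last injected IRQ always leaves a full period between interrupts.
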
diff --git a/classification_output/01/semantic/3847403 b/classification_output/01/semantic/3847403 new file mode 100644 index 000000000..be4dcd774 --- /dev/null +++ b/classification_output/01/semantic/3847403 @@ -0,0 +1,83 @@
+semantic: 0.866
+mistranslation: 0.759
+instruction: 0.597
+other: 0.200
+
+[Qemu-devel] [BUG] network qga : windows os lost ip address of the network card in some cases
+
+We think this problem could be solved in the QGA module. Can anybody give some
+advice?
+
+
+[BUG] network : windows os lost ip address of the network card in some cases
+
+We have seen this problem for a long time. For example, suppose we have three
+network cards in the virtual machine XML file, such as "network connection 1" /
+"network connection 2" / "network connection 3".
+
+Each network card has its own IP address, such as 192.168.1.1 / 192.168.2.1 /
+192.168.3.1. When we delete the first card and reboot the Windows guest, the
+problem happens!
+
+We found that the second network card replaces the first one, and the IP
+address of "network connection 2" becomes 192.168.1.1.
+
+Our third-party users began to complain about this bug. All the business bound
+to the second IP is lost!!!
+
+I mean both Windows and Linux have this bug; on Linux we solved it by bonding
+the network card's PCI and MAC address.
+
+There is no good solution on Windows OS. Is there one? We implemented a plan to
+restore the IP via QGA. Is there a better way?
+
+
+Original mail
+
+From: 尹作为 10144574
+To: address@hidden
+Date: 2017-04-14 16:46
+Subject: [BUG] network : windows os lost ip address of the network card in some
+cases
+
+
+We have seen this problem for a long time. For example, suppose we have three
+network cards in the virtual machine XML file, such as "network connection 1" /
+"network connection 2" / "network connection 3".
+
+Each network card has its own IP address, such as 192.168.1.1 / 192.168.2.1 /
+192.168.3.1. When we delete the first card and reboot the Windows guest, the
+problem happens!
+
+We found that the second network card replaces the first one, and the IP
+address of "network connection 2" becomes 192.168.1.1.
+
+Our third-party users began to complain about this bug. All the business bound
+to the second IP is lost!!!
+
+I mean both Windows and Linux have this bug; on Linux we solved it by bonding
+the network card's PCI and MAC address.
+
+There is no good solution on Windows OS. Is there one? We implemented a plan to
+restore the IP via QGA. Is there a better way?
+
 diff --git a/classification_output/01/semantic/7837801 b/classification_output/01/semantic/7837801 new file mode 100644 index 000000000..8816300a8 --- /dev/null +++ b/classification_output/01/semantic/7837801 @@ -0,0 +1,113 @@
+semantic: 0.814
+mistranslation: 0.813
+instruction: 0.748
+other: 0.707
+
+[Qemu-devel] [BUG] [low severity] a strange appearance of message involving slirp while doing "empty" make
+
+Folks,
+
+If qemu tree is already fully built, and "make" is attempted, for 3.1, the
+outcome is:
+
+$ make
+ CHK version_gen.h
+$
+
+For 4.0-rc0, the outcome seems to be different:
+
+$ make
+make[1]: Entering directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
+make[1]: Nothing to be done for 'all'.
+make[1]: Leaving directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
+ CHK version_gen.h
+$
+
+Not sure how significant is that, but I report it just in case.
+
+Yours,
+Aleksandar
+
+On 20/03/2019 22.08, Aleksandar Markovic wrote:
+>
+Folks,
+>
+>
+If qemu tree is already fully built, and "make" is attempted, for 3.1, the
+>
+outcome is:
+>
+>
+$ make
+>
+CHK version_gen.h
+>
+$
+>
+>
+For 4.0-rc0, the outcome seems to be different:
+>
+>
+$ make
+>
+make[1]: Entering directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
+>
+make[1]: Nothing to be done for 'all'.
+>
+make[1]: Leaving directory '/home/build/malta-mips64r6/qemu-4.0/slirp'
+>
+CHK version_gen.h
+>
+$
+>
+>
+Not sure how significant is that, but I report it just in case. 
+It's likely because slirp is currently being reworked to become a +separate project, so the makefiles have been changed a little bit. I +guess the message will go away again once slirp has become a stand-alone +library. + + Thomas + +On Fri, 22 Mar 2019 at 04:59, Thomas Huth <address@hidden> wrote: +> +On 20/03/2019 22.08, Aleksandar Markovic wrote: +> +> $ make +> +> make[1]: Entering directory '/home/build/malta-mips64r6/qemu-4.0/slirp' +> +> make[1]: Nothing to be done for 'all'. +> +> make[1]: Leaving directory '/home/build/malta-mips64r6/qemu-4.0/slirp' +> +> CHK version_gen.h +> +> $ +> +> +> +> Not sure how significant is that, but I report it just in case. +> +> +It's likely because slirp is currently being reworked to become a +> +separate project, so the makefiles have been changed a little bit. I +> +guess the message will go away again once slirp has become a stand-alone +> +library. +Well, we'll still need to ship slirp for the foreseeable future... + +I think the cause of this is that the rule in Makefile for +calling the slirp Makefile is not passing it $(SUBDIR_MAKEFLAGS) +like all the other recursive make invocations. If we do that +then we'll suppress the entering/leaving messages for +non-verbose builds. (Some tweaking will be needed as +it looks like the slirp makefile has picked an incompatible +meaning for $BUILD_DIR, which the SUBDIR_MAKEFLAGS will +also be passing to it.) + +thanks +-- PMM + diff --git a/classification_output/01/semantic/8511484 b/classification_output/01/semantic/8511484 new file mode 100644 index 000000000..1340f1aa7 --- /dev/null +++ b/classification_output/01/semantic/8511484 @@ -0,0 +1,296 @@ +semantic: 0.911 +instruction: 0.894 +other: 0.886 +mistranslation: 0.844 + +[Qemu-devel] [BUG] virtio-net linux driver fails to probe on MIPS Malta since 'hw/virtio-pci: fix virtio behaviour' + +Hi, + +I've bisected the following failure of the virtio_net linux v4.10 driver +to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine: + +virtio_net virtio0: virtio: device uses modern interface but does not have +VIRTIO_F_VERSION_1 +virtio_net: probe of virtio0 failed with error -22 + +To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour"). + +It appears that adding ",disable-modern=on,disable-legacy=off" to the +virtio-net -device makes it work again. + +I presume this should really just work out of the box. Any ideas why it +isn't? + +Cheers +James +signature.asc +Description: +Digital signature + +On 03/17/2017 11:57 PM, James Hogan wrote: +Hi, + +I've bisected the following failure of the virtio_net linux v4.10 driver +to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine: + +virtio_net virtio0: virtio: device uses modern interface but does not have +VIRTIO_F_VERSION_1 +virtio_net: probe of virtio0 failed with error -22 + +To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour"). + +It appears that adding ",disable-modern=on,disable-legacy=off" to the +virtio-net -device makes it work again. + +I presume this should really just work out of the box. Any ideas why it +isn't? +Hi, + + +This is strange. This commit changes virtio devices from legacy to virtio +"transitional". +(your command line changes it to legacy) +Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU +side +there is nothing new. + +Michael, do you have any idea? 
+ +Thanks, +Marcel +Cheers +James + +On Mon, Mar 20, 2017 at 05:21:22PM +0200, Marcel Apfelbaum wrote: +> +On 03/17/2017 11:57 PM, James Hogan wrote: +> +> Hi, +> +> +> +> I've bisected the following failure of the virtio_net linux v4.10 driver +> +> to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine: +> +> +> +> virtio_net virtio0: virtio: device uses modern interface but does not have +> +> VIRTIO_F_VERSION_1 +> +> virtio_net: probe of virtio0 failed with error -22 +> +> +> +> To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour"). +> +> +> +> It appears that adding ",disable-modern=on,disable-legacy=off" to the +> +> virtio-net -device makes it work again. +> +> +> +> I presume this should really just work out of the box. Any ideas why it +> +> isn't? +> +> +> +> +Hi, +> +> +> +This is strange. This commit changes virtio devices from legacy to virtio +> +"transitional". +> +(your command line changes it to legacy) +> +Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU +> +side +> +there is nothing new. +> +> +Michael, do you have any idea? +> +> +Thanks, +> +Marcel +My guess would be firmware mishandling 64 bit BARs - we saw such +a case on sparc previously. As a result you are probably reading +all zeroes from features register or something like that. +Marcel, could you send a patch making the bar 32 bit? +If that helps we know what the issue is. + +> +> Cheers +> +> James +> +> + +On 03/20/2017 05:43 PM, Michael S. Tsirkin wrote: +On Mon, Mar 20, 2017 at 05:21:22PM +0200, Marcel Apfelbaum wrote: +On 03/17/2017 11:57 PM, James Hogan wrote: +Hi, + +I've bisected the following failure of the virtio_net linux v4.10 driver +to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine: + +virtio_net virtio0: virtio: device uses modern interface but does not have +VIRTIO_F_VERSION_1 +virtio_net: probe of virtio0 failed with error -22 + +To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour"). + +It appears that adding ",disable-modern=on,disable-legacy=off" to the +virtio-net -device makes it work again. + +I presume this should really just work out of the box. Any ideas why it +isn't? +Hi, + + +This is strange. This commit changes virtio devices from legacy to virtio +"transitional". +(your command line changes it to legacy) +Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU +side +there is nothing new. + +Michael, do you have any idea? + +Thanks, +Marcel +My guess would be firmware mishandling 64 bit BARs - we saw such +a case on sparc previously. As a result you are probably reading +all zeroes from features register or something like that. +Marcel, could you send a patch making the bar 32 bit? +If that helps we know what the issue is. +Sure, + +Thanks, +Marcel +Cheers +James + +On 03/20/2017 05:43 PM, Michael S. Tsirkin wrote: +On Mon, Mar 20, 2017 at 05:21:22PM +0200, Marcel Apfelbaum wrote: +On 03/17/2017 11:57 PM, James Hogan wrote: +Hi, + +I've bisected the following failure of the virtio_net linux v4.10 driver +to probe in QEMU v2.9.0-rc1 emulating a MIPS Malta machine: + +virtio_net virtio0: virtio: device uses modern interface but does not have +VIRTIO_F_VERSION_1 +virtio_net: probe of virtio0 failed with error -22 + +To QEMU commit 9a4c0e220d8a ("hw/virtio-pci: fix virtio behaviour"). + +It appears that adding ",disable-modern=on,disable-legacy=off" to the +virtio-net -device makes it work again. + +I presume this should really just work out of the box. Any ideas why it +isn't? 
+Hi, + + +This is strange. This commit changes virtio devices from legacy to virtio +"transitional". +(your command line changes it to legacy) +Linux 4.10 supports virtio modern/transitional (as far as I know) and on QEMU +side +there is nothing new. + +Michael, do you have any idea? + +Thanks, +Marcel +My guess would be firmware mishandling 64 bit BARs - we saw such +a case on sparc previously. As a result you are probably reading +all zeroes from features register or something like that. +Marcel, could you send a patch making the bar 32 bit? +If that helps we know what the issue is. +Hi James, + +Can you please check if the below patch fixes the problem? +Please note it is not a solution. + +diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c +index f9b7244..5b4d429 100644 +--- a/hw/virtio/virtio-pci.c ++++ b/hw/virtio/virtio-pci.c +@@ -1671,9 +1671,7 @@ static void virtio_pci_device_plugged(DeviceState *d, +Error **errp) + } + + pci_register_bar(&proxy->pci_dev, proxy->modern_mem_bar_idx, +- PCI_BASE_ADDRESS_SPACE_MEMORY | +- PCI_BASE_ADDRESS_MEM_PREFETCH | +- PCI_BASE_ADDRESS_MEM_TYPE_64, ++ PCI_BASE_ADDRESS_SPACE_MEMORY, + &proxy->modern_bar); + + proxy->config_cap = virtio_pci_add_mem_cap(proxy, &cfg.cap); + + +Thanks, +Marcel + +Hi Marcel, + +On Tue, Mar 21, 2017 at 04:16:58PM +0200, Marcel Apfelbaum wrote: +> +Can you please check if the below patch fixes the problem? +> +Please note it is not a solution. +> +> +diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c +> +index f9b7244..5b4d429 100644 +> +--- a/hw/virtio/virtio-pci.c +> ++++ b/hw/virtio/virtio-pci.c +> +@@ -1671,9 +1671,7 @@ static void virtio_pci_device_plugged(DeviceState *d, +> +Error **errp) +> +} +> +> +pci_register_bar(&proxy->pci_dev, proxy->modern_mem_bar_idx, +> +- PCI_BASE_ADDRESS_SPACE_MEMORY | +> +- PCI_BASE_ADDRESS_MEM_PREFETCH | +> +- PCI_BASE_ADDRESS_MEM_TYPE_64, +> ++ PCI_BASE_ADDRESS_SPACE_MEMORY, +> +&proxy->modern_bar); +> +> +proxy->config_cap = virtio_pci_add_mem_cap(proxy, &cfg.cap); +Sorry for the delay trying this, I was away last week. + +No, it doesn't seem to make any difference. + +Thanks +James +signature.asc +Description: +Digital signature + |
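Michael's hypothesis above is that the Malta firmware mishandles the 64-bit
prefetchable memory BAR used by the modern virtio-pci device (legacy virtio-pci
typically uses a 32-bit I/O BAR, which would explain why forcing
disable-modern=on,disable-legacy=off sidesteps the problem). As a rough
illustration of that failure mode, here is a small, self-contained C sketch of
how a 64-bit memory BAR is decoded from two consecutive 32-bit config-space
dwords; the register values are invented, and this is generic PCI decoding
rather than QEMU, Linux, or Malta firmware code.

#include <stdint.h>
#include <stdio.h>

/* Memory BAR layout bits, mirroring the usual pci_regs.h values. */
#define PCI_BASE_ADDRESS_SPACE_IO     0x1
#define PCI_BASE_ADDRESS_MEM_TYPE_64  0x4
#define PCI_BASE_ADDRESS_MEM_MASK     (~0xfULL)

/* A 64-bit memory BAR occupies two consecutive config dwords: the low
 * dword carries the flag bits, the high dword the upper address bits. */
static uint64_t decode_mem_bar64(uint32_t lo, uint32_t hi)
{
    return (((uint64_t)hi << 32) | lo) & PCI_BASE_ADDRESS_MEM_MASK;
}

int main(void)
{
    /* Invented example values for the modern virtio BAR. */
    uint32_t bar_lo       = 0x1000000c;  /* memory, prefetchable, 64-bit type */
    uint32_t bar_hi_good  = 0x00000000;  /* firmware programmed the pair */
    uint32_t bar_hi_stale = 0xffffffff;  /* firmware never wrote the upper half */

    if (!(bar_lo & PCI_BASE_ADDRESS_SPACE_IO) &&
        (bar_lo & PCI_BASE_ADDRESS_MEM_TYPE_64)) {
        printf("BAR handled as 64-bit: 0x%016llx\n",
               (unsigned long long)decode_mem_bar64(bar_lo, bar_hi_good));
        printf("upper dword ignored:   0x%016llx (bogus mapping)\n",
               (unsigned long long)decode_mem_bar64(bar_lo, bar_hi_stale));
    }
    return 0;
}

If the upper dword is left stale, the guest ends up with a bogus mapping and
MMIO reads of the modern virtio registers return garbage or zero, which would
match the "does not have VIRTIO_F_VERSION_1" probe failure. Note, however, that
James reports Marcel's 32-bit-BAR test patch did not change the behaviour, so
this remains only one candidate explanation and the thread ends without a
confirmed root cause.
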