diff options
Diffstat (limited to 'classification_output/03/KVM/33802194')
| -rw-r--r-- | classification_output/03/KVM/33802194 | 4942 |
1 files changed, 4942 insertions, 0 deletions
diff --git a/classification_output/03/KVM/33802194 b/classification_output/03/KVM/33802194 new file mode 100644 index 000000000..5a0c1895c --- /dev/null +++ b/classification_output/03/KVM/33802194 @@ -0,0 +1,4942 @@ +KVM: 0.725 +instruction: 0.693 +mistranslation: 0.687 +semantic: 0.656 +network: 0.644 +other: 0.637 +boot: 0.631 + +[BUG] cxl can not create region + +Hi list + +I want to test cxl functions in arm64, and found some problems I can't +figure out. + +My test environment: + +1. build latest bios from +https://github.com/tianocore/edk2.git +master +branch(cc2db6ebfb6d9d85ba4c7b35fba1fa37fffc0bc2) +2. build latest qemu-system-aarch64 from git://git.qemu.org/qemu.git +master branch(846dcf0ba4eff824c295f06550b8673ff3f31314). With cxl arm +support patch: +https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-Jonathan.Cameron@huawei.com/ +3. build Linux kernel from +https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git +preview +branch(65fc1c3d26b96002a5aa1f4012fae4dc98fd5683) +4. build latest ndctl tools from +https://github.com/pmem/ndctl +create_region branch(8558b394e449779e3a4f3ae90fae77ede0bca159) + +And my qemu test commands: +sudo $QEMU_BIN -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 \ + -cpu max -smp 8 -nographic -no-reboot \ + -kernel $KERNEL -bios $BIOS_BIN \ + -drive if=none,file=$ROOTFS,format=qcow2,id=hd \ + -device virtio-blk-pci,drive=hd -append 'root=/dev/vda1 +nokaslr dyndbg="module cxl* +p"' \ + -object memory-backend-ram,size=4G,id=mem0 \ + -numa node,nodeid=0,cpus=0-7,memdev=mem0 \ + -net nic -net user,hostfwd=tcp::2222-:22 -enable-kvm \ + -object +memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M +\ + -object +memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M +\ + -object +memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M +\ + -object +memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M +\ + -object +memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M +\ + -object +memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M +\ + -object +memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M +\ + -object +memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M +\ + -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ + -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \ + -device cxl-upstream,bus=root_port0,id=us0 \ + -device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \ + -device +cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \ + -device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \ + -device +cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \ + -device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \ + -device +cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \ + -device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \ + -device +cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \ + -M +cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k + +And I have got two problems. +1. When I want to create x1 region with command: "cxl create-region -d +decoder0.0 -w 1 -g 4096 mem0", kernel crashed with null pointer +reference. Crash log: + +[ 534.697324] cxl_region region0: config state: 0 +[ 534.697346] cxl_region region0: probe: -6 +[ 534.697368] cxl_acpi ACPI0017:00: decoder0.0: created region0 +[ 534.699115] cxl region0: mem0:endpoint3 decoder3.0 add: +mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1 +[ 534.699149] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1 +[ 534.699167] cxl region0: ACPI0016:00:port1 decoder1.0 add: +mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1 +[ 534.699176] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256 +[ 534.699182] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0 +for mem0:decoder3.0 @ 0 +[ 534.699189] cxl region0: 0000:0d:00.0:port2 iw: 1 ig: 256 +[ 534.699193] cxl region0: 0000:0d:00.0:port2 target[0] = +0000:0e:00.0 for mem0:decoder3.0 @ 0 +[ 534.699405] Unable to handle kernel NULL pointer dereference at +virtual address 0000000000000000 +[ 534.701474] Mem abort info: +[ 534.701994] ESR = 0x0000000086000004 +[ 534.702653] EC = 0x21: IABT (current EL), IL = 32 bits +[ 534.703616] SET = 0, FnV = 0 +[ 534.704174] EA = 0, S1PTW = 0 +[ 534.704803] FSC = 0x04: level 0 translation fault +[ 534.705694] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010144a000 +[ 534.706875] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 +[ 534.709855] Internal error: Oops: 86000004 [#1] PREEMPT SMP +[ 534.710301] Modules linked in: +[ 534.710546] CPU: 7 PID: 331 Comm: cxl Not tainted +5.19.0-rc3-00064-g65fc1c3d26b9-dirty #11 +[ 534.715393] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 +[ 534.717179] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) +[ 534.719190] pc : 0x0 +[ 534.719928] lr : commit_store+0x118/0x2cc +[ 534.721007] sp : ffff80000aec3c30 +[ 534.721793] x29: ffff80000aec3c30 x28: ffff0000da62e740 x27: ffff0000c0c06b30 +[ 534.723875] x26: 0000000000000000 x25: ffff0000c0a2a400 x24: ffff0000c0a29400 +[ 534.725440] x23: 0000000000000003 x22: 0000000000000000 x21: ffff0000c0c06800 +[ 534.727312] x20: 0000000000000000 x19: ffff0000c1559800 x18: 0000000000000000 +[ 534.729138] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffd41fe838 +[ 534.731046] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000 +[ 534.732402] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000 +[ 534.734432] x8 : 0000000000000000 x7 : 0000000000000000 x6 : ffff0000c0906e80 +[ 534.735921] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffff80000aec3bf0 +[ 534.737437] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c155a000 +[ 534.738878] Call trace: +[ 534.739368] 0x0 +[ 534.739713] dev_attr_store+0x1c/0x30 +[ 534.740186] sysfs_kf_write+0x48/0x58 +[ 534.740961] kernfs_fop_write_iter+0x128/0x184 +[ 534.741872] new_sync_write+0xdc/0x158 +[ 534.742706] vfs_write+0x1ac/0x2a8 +[ 534.743440] ksys_write+0x68/0xf0 +[ 534.744328] __arm64_sys_write+0x1c/0x28 +[ 534.745180] invoke_syscall+0x44/0xf0 +[ 534.745989] el0_svc_common+0x4c/0xfc +[ 534.746661] do_el0_svc+0x60/0xa8 +[ 534.747378] el0_svc+0x2c/0x78 +[ 534.748066] el0t_64_sync_handler+0xb8/0x12c +[ 534.748919] el0t_64_sync+0x18c/0x190 +[ 534.749629] Code: bad PC value +[ 534.750169] ---[ end trace 0000000000000000 ]--- + +2. When I want to create x4 region with command: "cxl create-region -d +decoder0.0 -w 4 -g 4096 -m mem0 mem1 mem2 mem3". I got below errors: + +cxl region: create_region: region0: failed to set target3 to mem3 +cxl region: cmd_create_region: created 0 regions + +And kernel log as below: +[ 60.536663] cxl_region region0: config state: 0 +[ 60.536675] cxl_region region0: probe: -6 +[ 60.536696] cxl_acpi ACPI0017:00: decoder0.0: created region0 +[ 60.538251] cxl region0: mem0:endpoint3 decoder3.0 add: +mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1 +[ 60.538278] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1 +[ 60.538295] cxl region0: ACPI0016:00:port1 decoder1.0 add: +mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1 +[ 60.538647] cxl region0: mem1:endpoint4 decoder4.0 add: +mem1:decoder4.0 @ 1 next: none nr_eps: 1 nr_targets: 1 +[ 60.538663] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +mem1:decoder4.0 @ 1 next: mem1 nr_eps: 2 nr_targets: 2 +[ 60.538675] cxl region0: ACPI0016:00:port1 decoder1.0 add: +mem1:decoder4.0 @ 1 next: 0000:0d:00.0 nr_eps: 2 nr_targets: 1 +[ 60.539311] cxl region0: mem2:endpoint5 decoder5.0 add: +mem2:decoder5.0 @ 2 next: none nr_eps: 1 nr_targets: 1 +[ 60.539332] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +mem2:decoder5.0 @ 2 next: mem2 nr_eps: 3 nr_targets: 3 +[ 60.539343] cxl region0: ACPI0016:00:port1 decoder1.0 add: +mem2:decoder5.0 @ 2 next: 0000:0d:00.0 nr_eps: 3 nr_targets: 1 +[ 60.539711] cxl region0: mem3:endpoint6 decoder6.0 add: +mem3:decoder6.0 @ 3 next: none nr_eps: 1 nr_targets: 1 +[ 60.539723] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +mem3:decoder6.0 @ 3 next: mem3 nr_eps: 4 nr_targets: 4 +[ 60.539735] cxl region0: ACPI0016:00:port1 decoder1.0 add: +mem3:decoder6.0 @ 3 next: 0000:0d:00.0 nr_eps: 4 nr_targets: 1 +[ 60.539742] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256 +[ 60.539747] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0 +for mem0:decoder3.0 @ 0 +[ 60.539754] cxl region0: 0000:0d:00.0:port2 iw: 4 ig: 512 +[ 60.539758] cxl region0: 0000:0d:00.0:port2 target[0] = +0000:0e:00.0 for mem0:decoder3.0 @ 0 +[ 60.539764] cxl region0: ACPI0016:00:port1: cannot host mem1:decoder4.0 at 1 + +I have tried to write sysfs node manually, got same errors. + +Hope I can get some helps here. + +Bob + +On Fri, 5 Aug 2022 10:20:23 +0800 +Bobo WL <lmw.bobo@gmail.com> wrote: + +> +Hi list +> +> +I want to test cxl functions in arm64, and found some problems I can't +> +figure out. +Hi Bob, + +Glad to see people testing this code. + +> +> +My test environment: +> +> +1. build latest bios from +https://github.com/tianocore/edk2.git +master +> +branch(cc2db6ebfb6d9d85ba4c7b35fba1fa37fffc0bc2) +> +2. build latest qemu-system-aarch64 from git://git.qemu.org/qemu.git +> +master branch(846dcf0ba4eff824c295f06550b8673ff3f31314). With cxl arm +> +support patch: +> +https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-Jonathan.Cameron@huawei.com/ +> +3. build Linux kernel from +> +https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git +preview +> +branch(65fc1c3d26b96002a5aa1f4012fae4dc98fd5683) +> +4. build latest ndctl tools from +https://github.com/pmem/ndctl +> +create_region branch(8558b394e449779e3a4f3ae90fae77ede0bca159) +> +> +And my qemu test commands: +> +sudo $QEMU_BIN -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 \ +> +-cpu max -smp 8 -nographic -no-reboot \ +> +-kernel $KERNEL -bios $BIOS_BIN \ +> +-drive if=none,file=$ROOTFS,format=qcow2,id=hd \ +> +-device virtio-blk-pci,drive=hd -append 'root=/dev/vda1 +> +nokaslr dyndbg="module cxl* +p"' \ +> +-object memory-backend-ram,size=4G,id=mem0 \ +> +-numa node,nodeid=0,cpus=0-7,memdev=mem0 \ +> +-net nic -net user,hostfwd=tcp::2222-:22 -enable-kvm \ +> +-object +> +memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M +> +\ +> +-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ +> +-device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \ +Probably not related to your problem, but there is a disconnect in QEMU / +kernel assumptionsaround the presence of an HDM decoder when a HB only +has a single root port. Spec allows it to be provided or not as an +implementation choice. +Kernel assumes it isn't provide. Qemu assumes it is. + +The temporary solution is to throw in a second root port on the HB and not +connect anything to it. Longer term I may special case this so that the +particular +decoder defaults to pass through settings in QEMU if there is only one root +port. + +> +-device cxl-upstream,bus=root_port0,id=us0 \ +> +-device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \ +> +-device +> +cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \ +> +-device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \ +> +-device +> +cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \ +> +-device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \ +> +-device +> +cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \ +> +-device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \ +> +-device +> +cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \ +> +-M +> +cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k +> +> +And I have got two problems. +> +1. When I want to create x1 region with command: "cxl create-region -d +> +decoder0.0 -w 1 -g 4096 mem0", kernel crashed with null pointer +> +reference. Crash log: +> +> +[ 534.697324] cxl_region region0: config state: 0 +> +[ 534.697346] cxl_region region0: probe: -6 +Seems odd this is up here. But maybe fine. + +> +[ 534.697368] cxl_acpi ACPI0017:00: decoder0.0: created region0 +> +[ 534.699115] cxl region0: mem0:endpoint3 decoder3.0 add: +> +mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1 +> +[ 534.699149] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1 +> +[ 534.699167] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1 +> +[ 534.699176] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256 +> +[ 534.699182] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0 +> +for mem0:decoder3.0 @ 0 +> +[ 534.699189] cxl region0: 0000:0d:00.0:port2 iw: 1 ig: 256 +> +[ 534.699193] cxl region0: 0000:0d:00.0:port2 target[0] = +> +0000:0e:00.0 for mem0:decoder3.0 @ 0 +> +[ 534.699405] Unable to handle kernel NULL pointer dereference at +> +virtual address 0000000000000000 +> +[ 534.701474] Mem abort info: +> +[ 534.701994] ESR = 0x0000000086000004 +> +[ 534.702653] EC = 0x21: IABT (current EL), IL = 32 bits +> +[ 534.703616] SET = 0, FnV = 0 +> +[ 534.704174] EA = 0, S1PTW = 0 +> +[ 534.704803] FSC = 0x04: level 0 translation fault +> +[ 534.705694] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010144a000 +> +[ 534.706875] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 +> +[ 534.709855] Internal error: Oops: 86000004 [#1] PREEMPT SMP +> +[ 534.710301] Modules linked in: +> +[ 534.710546] CPU: 7 PID: 331 Comm: cxl Not tainted +> +5.19.0-rc3-00064-g65fc1c3d26b9-dirty #11 +> +[ 534.715393] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 +> +[ 534.717179] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) +> +[ 534.719190] pc : 0x0 +> +[ 534.719928] lr : commit_store+0x118/0x2cc +> +[ 534.721007] sp : ffff80000aec3c30 +> +[ 534.721793] x29: ffff80000aec3c30 x28: ffff0000da62e740 x27: +> +ffff0000c0c06b30 +> +[ 534.723875] x26: 0000000000000000 x25: ffff0000c0a2a400 x24: +> +ffff0000c0a29400 +> +[ 534.725440] x23: 0000000000000003 x22: 0000000000000000 x21: +> +ffff0000c0c06800 +> +[ 534.727312] x20: 0000000000000000 x19: ffff0000c1559800 x18: +> +0000000000000000 +> +[ 534.729138] x17: 0000000000000000 x16: 0000000000000000 x15: +> +0000ffffd41fe838 +> +[ 534.731046] x14: 0000000000000000 x13: 0000000000000000 x12: +> +0000000000000000 +> +[ 534.732402] x11: 0000000000000000 x10: 0000000000000000 x9 : +> +0000000000000000 +> +[ 534.734432] x8 : 0000000000000000 x7 : 0000000000000000 x6 : +> +ffff0000c0906e80 +> +[ 534.735921] x5 : 0000000000000000 x4 : 0000000000000000 x3 : +> +ffff80000aec3bf0 +> +[ 534.737437] x2 : 0000000000000000 x1 : 0000000000000000 x0 : +> +ffff0000c155a000 +> +[ 534.738878] Call trace: +> +[ 534.739368] 0x0 +> +[ 534.739713] dev_attr_store+0x1c/0x30 +> +[ 534.740186] sysfs_kf_write+0x48/0x58 +> +[ 534.740961] kernfs_fop_write_iter+0x128/0x184 +> +[ 534.741872] new_sync_write+0xdc/0x158 +> +[ 534.742706] vfs_write+0x1ac/0x2a8 +> +[ 534.743440] ksys_write+0x68/0xf0 +> +[ 534.744328] __arm64_sys_write+0x1c/0x28 +> +[ 534.745180] invoke_syscall+0x44/0xf0 +> +[ 534.745989] el0_svc_common+0x4c/0xfc +> +[ 534.746661] do_el0_svc+0x60/0xa8 +> +[ 534.747378] el0_svc+0x2c/0x78 +> +[ 534.748066] el0t_64_sync_handler+0xb8/0x12c +> +[ 534.748919] el0t_64_sync+0x18c/0x190 +> +[ 534.749629] Code: bad PC value +> +[ 534.750169] ---[ end trace 0000000000000000 ]--- +> +> +2. When I want to create x4 region with command: "cxl create-region -d +> +decoder0.0 -w 4 -g 4096 -m mem0 mem1 mem2 mem3". I got below errors: +> +> +cxl region: create_region: region0: failed to set target3 to mem3 +> +cxl region: cmd_create_region: created 0 regions +> +> +And kernel log as below: +> +[ 60.536663] cxl_region region0: config state: 0 +> +[ 60.536675] cxl_region region0: probe: -6 +> +[ 60.536696] cxl_acpi ACPI0017:00: decoder0.0: created region0 +> +[ 60.538251] cxl region0: mem0:endpoint3 decoder3.0 add: +> +mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.538278] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1 +> +[ 60.538295] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1 +> +[ 60.538647] cxl region0: mem1:endpoint4 decoder4.0 add: +> +mem1:decoder4.0 @ 1 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.538663] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem1:decoder4.0 @ 1 next: mem1 nr_eps: 2 nr_targets: 2 +> +[ 60.538675] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem1:decoder4.0 @ 1 next: 0000:0d:00.0 nr_eps: 2 nr_targets: 1 +> +[ 60.539311] cxl region0: mem2:endpoint5 decoder5.0 add: +> +mem2:decoder5.0 @ 2 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.539332] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem2:decoder5.0 @ 2 next: mem2 nr_eps: 3 nr_targets: 3 +> +[ 60.539343] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem2:decoder5.0 @ 2 next: 0000:0d:00.0 nr_eps: 3 nr_targets: 1 +> +[ 60.539711] cxl region0: mem3:endpoint6 decoder6.0 add: +> +mem3:decoder6.0 @ 3 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.539723] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem3:decoder6.0 @ 3 next: mem3 nr_eps: 4 nr_targets: 4 +> +[ 60.539735] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem3:decoder6.0 @ 3 next: 0000:0d:00.0 nr_eps: 4 nr_targets: 1 +> +[ 60.539742] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256 +> +[ 60.539747] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0 +> +for mem0:decoder3.0 @ 0 +> +[ 60.539754] cxl region0: 0000:0d:00.0:port2 iw: 4 ig: 512 +This looks like off by 1 that should be fixed in the below mentioned +cxl/pending branch. That ig should be 256. Note the fix was +for a test case with a fat HB and no switch, but certainly looks +like this is the same issue. + +> +[ 60.539758] cxl region0: 0000:0d:00.0:port2 target[0] = +> +0000:0e:00.0 for mem0:decoder3.0 @ 0 +> +[ 60.539764] cxl region0: ACPI0016:00:port1: cannot host mem1:decoder4.0 at +> +1 +> +> +I have tried to write sysfs node manually, got same errors. +When stepping through by hand, which sysfs write triggers the crash above? + +Not sure it's related, but I've just sent out a fix to the +target register handling in QEMU. +20220808122051.14822-1-Jonathan.Cameron@huawei.com +/T/#m47ff985412ce44559e6b04d677c302f8cd371330">https://lore.kernel.org/linux-cxl/ +20220808122051.14822-1-Jonathan.Cameron@huawei.com +/T/#m47ff985412ce44559e6b04d677c302f8cd371330 +I did have one instance last week of triggering what looked to be a race +condition but +the stack trace doesn't looks related to what you've hit. + +It will probably be a few days before I have time to take a look at replicating +what you have seen. + +If you have time, try using the kernel.org cxl/pending branch as there are +a few additional fixes on there since you sent this email. Optimistic to hope +this is covered by one of those, but at least it will mean we are trying to +replicate +on same branch. + +Jonathan + + +> +> +Hope I can get some helps here. +> +> +Bob + +Hi Jonathan + +Thanks for your reply! + +On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +<Jonathan.Cameron@huawei.com> wrote: +> +> +Probably not related to your problem, but there is a disconnect in QEMU / +> +kernel assumptionsaround the presence of an HDM decoder when a HB only +> +has a single root port. Spec allows it to be provided or not as an +> +implementation choice. +> +Kernel assumes it isn't provide. Qemu assumes it is. +> +> +The temporary solution is to throw in a second root port on the HB and not +> +connect anything to it. Longer term I may special case this so that the +> +particular +> +decoder defaults to pass through settings in QEMU if there is only one root +> +port. +> +You are right! After adding an extra HB in qemu, I can create a x1 +region successfully. +But have some errors in Nvdimm: + +[ 74.925838] Unknown online node for memory at 0x10000000000, assuming node 0 +[ 74.925846] Unknown target node for memory at 0x10000000000, assuming node 0 +[ 74.927470] nd_region region0: nmem0: is disabled, failing probe + +And x4 region still failed with same errors, using latest cxl/preview +branch don't work. +I have picked "Two CXL emulation fixes" patches in qemu, still not working. + +Bob + +On Tue, 9 Aug 2022 21:07:06 +0800 +Bobo WL <lmw.bobo@gmail.com> wrote: + +> +Hi Jonathan +> +> +Thanks for your reply! +> +> +On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +<Jonathan.Cameron@huawei.com> wrote: +> +> +> +> Probably not related to your problem, but there is a disconnect in QEMU / +> +> kernel assumptionsaround the presence of an HDM decoder when a HB only +> +> has a single root port. Spec allows it to be provided or not as an +> +> implementation choice. +> +> Kernel assumes it isn't provide. Qemu assumes it is. +> +> +> +> The temporary solution is to throw in a second root port on the HB and not +> +> connect anything to it. Longer term I may special case this so that the +> +> particular +> +> decoder defaults to pass through settings in QEMU if there is only one root +> +> port. +> +> +> +> +You are right! After adding an extra HB in qemu, I can create a x1 +> +region successfully. +> +But have some errors in Nvdimm: +> +> +[ 74.925838] Unknown online node for memory at 0x10000000000, assuming node > 0 +> +[ 74.925846] Unknown target node for memory at 0x10000000000, assuming node > 0 +> +[ 74.927470] nd_region region0: nmem0: is disabled, failing probe +Ah. I've seen this one, but not chased it down yet. Was on my todo list to +chase +down. Once I reach this state I can verify the HDM Decode is correct which is +what +I've been using to test (Which wasn't true until earlier this week). +I'm currently testing via devmem, more for historical reasons than because it +makes +that much sense anymore. + +> +> +And x4 region still failed with same errors, using latest cxl/preview +> +branch don't work. +> +I have picked "Two CXL emulation fixes" patches in qemu, still not working. +> +> +Bob + +On Tue, 9 Aug 2022 17:08:25 +0100 +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: + +> +On Tue, 9 Aug 2022 21:07:06 +0800 +> +Bobo WL <lmw.bobo@gmail.com> wrote: +> +> +> Hi Jonathan +> +> +> +> Thanks for your reply! +> +> +> +> On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> <Jonathan.Cameron@huawei.com> wrote: +> +> > +> +> > Probably not related to your problem, but there is a disconnect in QEMU / +> +> > kernel assumptionsaround the presence of an HDM decoder when a HB only +> +> > has a single root port. Spec allows it to be provided or not as an +> +> > implementation choice. +> +> > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > +> +> > The temporary solution is to throw in a second root port on the HB and not +> +> > connect anything to it. Longer term I may special case this so that the +> +> > particular +> +> > decoder defaults to pass through settings in QEMU if there is only one +> +> > root port. +> +> > +> +> +> +> You are right! After adding an extra HB in qemu, I can create a x1 +> +> region successfully. +> +> But have some errors in Nvdimm: +> +> +> +> [ 74.925838] Unknown online node for memory at 0x10000000000, assuming +> +> node 0 +> +> [ 74.925846] Unknown target node for memory at 0x10000000000, assuming +> +> node 0 +> +> [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> +Ah. I've seen this one, but not chased it down yet. Was on my todo list to +> +chase +> +down. Once I reach this state I can verify the HDM Decode is correct which is +> +what +> +I've been using to test (Which wasn't true until earlier this week). +> +I'm currently testing via devmem, more for historical reasons than because it +> +makes +> +that much sense anymore. +*embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +I'd forgotten that was still on the todo list. I don't think it will +be particularly hard to do and will take a look in next few days. + +Very very indirectly this error is causing a driver probe fail that means that +we hit a code path that has a rather odd looking check on NDD_LABELING. +Should not have gotten near that path though - hence the problem is actually +when we call cxl_pmem_get_config_data() and it returns an error because +we haven't fully connected up the command in QEMU. + +Jonathan + + +> +> +> +> +> And x4 region still failed with same errors, using latest cxl/preview +> +> branch don't work. +> +> I have picked "Two CXL emulation fixes" patches in qemu, still not working. +> +> +> +> Bob + +On Thu, 11 Aug 2022 18:08:57 +0100 +Jonathan Cameron via <qemu-devel@nongnu.org> wrote: + +> +On Tue, 9 Aug 2022 17:08:25 +0100 +> +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> On Tue, 9 Aug 2022 21:07:06 +0800 +> +> Bobo WL <lmw.bobo@gmail.com> wrote: +> +> +> +> > Hi Jonathan +> +> > +> +> > Thanks for your reply! +> +> > +> +> > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > <Jonathan.Cameron@huawei.com> wrote: +> +> > > +> +> > > Probably not related to your problem, but there is a disconnect in QEMU +> +> > > / +> +> > > kernel assumptionsaround the presence of an HDM decoder when a HB only +> +> > > has a single root port. Spec allows it to be provided or not as an +> +> > > implementation choice. +> +> > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > +> +> > > The temporary solution is to throw in a second root port on the HB and +> +> > > not +> +> > > connect anything to it. Longer term I may special case this so that +> +> > > the particular +> +> > > decoder defaults to pass through settings in QEMU if there is only one +> +> > > root port. +> +> > > +> +> > +> +> > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > region successfully. +> +> > But have some errors in Nvdimm: +> +> > +> +> > [ 74.925838] Unknown online node for memory at 0x10000000000, assuming +> +> > node 0 +> +> > [ 74.925846] Unknown target node for memory at 0x10000000000, assuming +> +> > node 0 +> +> > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> +> +> Ah. I've seen this one, but not chased it down yet. Was on my todo list to +> +> chase +> +> down. Once I reach this state I can verify the HDM Decode is correct which +> +> is what +> +> I've been using to test (Which wasn't true until earlier this week). +> +> I'm currently testing via devmem, more for historical reasons than because +> +> it makes +> +> that much sense anymore. +> +> +*embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +I'd forgotten that was still on the todo list. I don't think it will +> +be particularly hard to do and will take a look in next few days. +> +> +Very very indirectly this error is causing a driver probe fail that means that +> +we hit a code path that has a rather odd looking check on NDD_LABELING. +> +Should not have gotten near that path though - hence the problem is actually +> +when we call cxl_pmem_get_config_data() and it returns an error because +> +we haven't fully connected up the command in QEMU. +So a least one bug in QEMU. We were not supporting variable length payloads on +mailbox +inputs (but were on outputs). That hasn't mattered until we get to LSA writes. +We just need to relax condition on the supplied length. + +diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +index c352a935c4..fdda9529fe 100644 +--- a/hw/cxl/cxl-mailbox-utils.c ++++ b/hw/cxl/cxl-mailbox-utils.c +@@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) + cxl_cmd = &cxl_cmd_set[set][cmd]; + h = cxl_cmd->handler; + if (h) { +- if (len == cxl_cmd->in) { ++ if (len == cxl_cmd->in || !cxl_cmd->in) { + cxl_cmd->payload = cxl_dstate->mbox_reg_state + + A_CXL_DEV_CMD_PAYLOAD; + ret = (*h)(cxl_cmd, cxl_dstate, &len); + + +This lets the nvdimm/region probe fine, but I'm getting some issues with +namespace capacity so I'll look at what is causing that next. +Unfortunately I'm not that familiar with the driver/nvdimm side of things +so it's take a while to figure out what kicks off what! + +Jonathan + +> +> +Jonathan +> +> +> +> +> +> > +> +> > And x4 region still failed with same errors, using latest cxl/preview +> +> > branch don't work. +> +> > I have picked "Two CXL emulation fixes" patches in qemu, still not +> +> > working. +> +> > +> +> > Bob +> +> + +Jonathan Cameron wrote: +> +On Thu, 11 Aug 2022 18:08:57 +0100 +> +Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> +> On Tue, 9 Aug 2022 17:08:25 +0100 +> +> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> +> > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > +> +> > > Hi Jonathan +> +> > > +> +> > > Thanks for your reply! +> +> > > +> +> > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > +> +> > > > Probably not related to your problem, but there is a disconnect in +> +> > > > QEMU / +> +> > > > kernel assumptionsaround the presence of an HDM decoder when a HB only +> +> > > > has a single root port. Spec allows it to be provided or not as an +> +> > > > implementation choice. +> +> > > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > > +> +> > > > The temporary solution is to throw in a second root port on the HB +> +> > > > and not +> +> > > > connect anything to it. Longer term I may special case this so that +> +> > > > the particular +> +> > > > decoder defaults to pass through settings in QEMU if there is only +> +> > > > one root port. +> +> > > > +> +> > > +> +> > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > region successfully. +> +> > > But have some errors in Nvdimm: +> +> > > +> +> > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > assuming node 0 +> +> > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > assuming node 0 +> +> > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> > +> +> > Ah. I've seen this one, but not chased it down yet. Was on my todo list +> +> > to chase +> +> > down. Once I reach this state I can verify the HDM Decode is correct +> +> > which is what +> +> > I've been using to test (Which wasn't true until earlier this week). +> +> > I'm currently testing via devmem, more for historical reasons than +> +> > because it makes +> +> > that much sense anymore. +> +> +> +> *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> I'd forgotten that was still on the todo list. I don't think it will +> +> be particularly hard to do and will take a look in next few days. +> +> +> +> Very very indirectly this error is causing a driver probe fail that means +> +> that +> +> we hit a code path that has a rather odd looking check on NDD_LABELING. +> +> Should not have gotten near that path though - hence the problem is actually +> +> when we call cxl_pmem_get_config_data() and it returns an error because +> +> we haven't fully connected up the command in QEMU. +> +> +So a least one bug in QEMU. We were not supporting variable length payloads +> +on mailbox +> +inputs (but were on outputs). That hasn't mattered until we get to LSA +> +writes. +> +We just need to relax condition on the supplied length. +> +> +diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +index c352a935c4..fdda9529fe 100644 +> +--- a/hw/cxl/cxl-mailbox-utils.c +> ++++ b/hw/cxl/cxl-mailbox-utils.c +> +@@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +cxl_cmd = &cxl_cmd_set[set][cmd]; +> +h = cxl_cmd->handler; +> +if (h) { +> +- if (len == cxl_cmd->in) { +> ++ if (len == cxl_cmd->in || !cxl_cmd->in) { +> +cxl_cmd->payload = cxl_dstate->mbox_reg_state + +> +A_CXL_DEV_CMD_PAYLOAD; +> +ret = (*h)(cxl_cmd, cxl_dstate, &len); +> +> +> +This lets the nvdimm/region probe fine, but I'm getting some issues with +> +namespace capacity so I'll look at what is causing that next. +> +Unfortunately I'm not that familiar with the driver/nvdimm side of things +> +so it's take a while to figure out what kicks off what! +The whirlwind tour is that 'struct nd_region' instances that represent a +persitent memory address range are composed of one more mappings of +'struct nvdimm' objects. The nvdimm object is driven by the dimm driver +in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking +the dimm (if locked) and interrogating the label area to look for +namespace labels. + +The label command calls are routed to the '->ndctl()' callback that was +registered when the CXL nvdimm_bus_descriptor was created. That callback +handles both 'bus' scope calls, currently none for CXL, and per nvdimm +calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands +to CXL commands. + +The 'struct nvdimm' objects that the CXL side registers have the +NDD_LABELING flag set which means that namespaces need to be explicitly +created / provisioned from region capacity. Otherwise, if +drivers/nvdimm/dimm.c does not find a namespace-label-index block then +the region reverts to label-less mode and a default namespace equal to +the size of the region is instantiated. + +If you are seeing small mismatches in namespace capacity then it may +just be the fact that by default 'ndctl create-namespace' results in an +'fsdax' mode namespace which just means that it is a block device where +1.5% of the capacity is reserved for 'struct page' metadata. You should +be able to see namespace capacity == region capacity by doing "ndctl +create-namespace -m raw", and disable DAX operation. + +Hope that helps. + +On Fri, 12 Aug 2022 09:03:02 -0700 +Dan Williams <dan.j.williams@intel.com> wrote: + +> +Jonathan Cameron wrote: +> +> On Thu, 11 Aug 2022 18:08:57 +0100 +> +> Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> +> +> > On Tue, 9 Aug 2022 17:08:25 +0100 +> +> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> > +> +> > > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > > +> +> > > > Hi Jonathan +> +> > > > +> +> > > > Thanks for your reply! +> +> > > > +> +> > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > > +> +> > > > > Probably not related to your problem, but there is a disconnect in +> +> > > > > QEMU / +> +> > > > > kernel assumptionsaround the presence of an HDM decoder when a HB +> +> > > > > only +> +> > > > > has a single root port. Spec allows it to be provided or not as an +> +> > > > > implementation choice. +> +> > > > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > > > +> +> > > > > The temporary solution is to throw in a second root port on the HB +> +> > > > > and not +> +> > > > > connect anything to it. Longer term I may special case this so +> +> > > > > that the particular +> +> > > > > decoder defaults to pass through settings in QEMU if there is only +> +> > > > > one root port. +> +> > > > > +> +> > > > +> +> > > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > > region successfully. +> +> > > > But have some errors in Nvdimm: +> +> > > > +> +> > > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > > assuming node 0 +> +> > > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > > assuming node 0 +> +> > > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> > > > +> +> > > +> +> > > Ah. I've seen this one, but not chased it down yet. Was on my todo +> +> > > list to chase +> +> > > down. Once I reach this state I can verify the HDM Decode is correct +> +> > > which is what +> +> > > I've been using to test (Which wasn't true until earlier this week). +> +> > > I'm currently testing via devmem, more for historical reasons than +> +> > > because it makes +> +> > > that much sense anymore. +> +> > +> +> > *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> > I'd forgotten that was still on the todo list. I don't think it will +> +> > be particularly hard to do and will take a look in next few days. +> +> > +> +> > Very very indirectly this error is causing a driver probe fail that means +> +> > that +> +> > we hit a code path that has a rather odd looking check on NDD_LABELING. +> +> > Should not have gotten near that path though - hence the problem is +> +> > actually +> +> > when we call cxl_pmem_get_config_data() and it returns an error because +> +> > we haven't fully connected up the command in QEMU. +> +> +> +> So a least one bug in QEMU. We were not supporting variable length payloads +> +> on mailbox +> +> inputs (but were on outputs). That hasn't mattered until we get to LSA +> +> writes. +> +> We just need to relax condition on the supplied length. +> +> +> +> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +> index c352a935c4..fdda9529fe 100644 +> +> --- a/hw/cxl/cxl-mailbox-utils.c +> +> +++ b/hw/cxl/cxl-mailbox-utils.c +> +> @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +> cxl_cmd = &cxl_cmd_set[set][cmd]; +> +> h = cxl_cmd->handler; +> +> if (h) { +> +> - if (len == cxl_cmd->in) { +> +> + if (len == cxl_cmd->in || !cxl_cmd->in) { +> +> cxl_cmd->payload = cxl_dstate->mbox_reg_state + +> +> A_CXL_DEV_CMD_PAYLOAD; +> +> ret = (*h)(cxl_cmd, cxl_dstate, &len); +> +> +> +> +> +> This lets the nvdimm/region probe fine, but I'm getting some issues with +> +> namespace capacity so I'll look at what is causing that next. +> +> Unfortunately I'm not that familiar with the driver/nvdimm side of things +> +> so it's take a while to figure out what kicks off what! +> +> +The whirlwind tour is that 'struct nd_region' instances that represent a +> +persitent memory address range are composed of one more mappings of +> +'struct nvdimm' objects. The nvdimm object is driven by the dimm driver +> +in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking +> +the dimm (if locked) and interrogating the label area to look for +> +namespace labels. +> +> +The label command calls are routed to the '->ndctl()' callback that was +> +registered when the CXL nvdimm_bus_descriptor was created. That callback +> +handles both 'bus' scope calls, currently none for CXL, and per nvdimm +> +calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands +> +to CXL commands. +> +> +The 'struct nvdimm' objects that the CXL side registers have the +> +NDD_LABELING flag set which means that namespaces need to be explicitly +> +created / provisioned from region capacity. Otherwise, if +> +drivers/nvdimm/dimm.c does not find a namespace-label-index block then +> +the region reverts to label-less mode and a default namespace equal to +> +the size of the region is instantiated. +> +> +If you are seeing small mismatches in namespace capacity then it may +> +just be the fact that by default 'ndctl create-namespace' results in an +> +'fsdax' mode namespace which just means that it is a block device where +> +1.5% of the capacity is reserved for 'struct page' metadata. You should +> +be able to see namespace capacity == region capacity by doing "ndctl +> +create-namespace -m raw", and disable DAX operation. +Currently ndctl create-namespace crashes qemu ;) +Which isn't ideal! + +> +> +Hope that helps. +Got me looking at the right code. Thanks! + +Jonathan + +On Fri, 12 Aug 2022 17:15:09 +0100 +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: + +> +On Fri, 12 Aug 2022 09:03:02 -0700 +> +Dan Williams <dan.j.williams@intel.com> wrote: +> +> +> Jonathan Cameron wrote: +> +> > On Thu, 11 Aug 2022 18:08:57 +0100 +> +> > Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> > +> +> > > On Tue, 9 Aug 2022 17:08:25 +0100 +> +> > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> > > +> +> > > > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > > > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > > > +> +> > > > > Hi Jonathan +> +> > > > > +> +> > > > > Thanks for your reply! +> +> > > > > +> +> > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > > > +> +> > > > > > Probably not related to your problem, but there is a disconnect +> +> > > > > > in QEMU / +> +> > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB +> +> > > > > > only +> +> > > > > > has a single root port. Spec allows it to be provided or not as +> +> > > > > > an implementation choice. +> +> > > > > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > > > > +> +> > > > > > The temporary solution is to throw in a second root port on the +> +> > > > > > HB and not +> +> > > > > > connect anything to it. Longer term I may special case this so +> +> > > > > > that the particular +> +> > > > > > decoder defaults to pass through settings in QEMU if there is +> +> > > > > > only one root port. +> +> > > > > > +> +> > > > > +> +> > > > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > > > region successfully. +> +> > > > > But have some errors in Nvdimm: +> +> > > > > +> +> > > > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > > > assuming node 0 +> +> > > > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > > > assuming node 0 +> +> > > > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> > > > > +> +> > > > +> +> > > > Ah. I've seen this one, but not chased it down yet. Was on my todo +> +> > > > list to chase +> +> > > > down. Once I reach this state I can verify the HDM Decode is correct +> +> > > > which is what +> +> > > > I've been using to test (Which wasn't true until earlier this week). +> +> > > > I'm currently testing via devmem, more for historical reasons than +> +> > > > because it makes +> +> > > > that much sense anymore. +> +> > > +> +> > > *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> > > I'd forgotten that was still on the todo list. I don't think it will +> +> > > be particularly hard to do and will take a look in next few days. +> +> > > +> +> > > Very very indirectly this error is causing a driver probe fail that +> +> > > means that +> +> > > we hit a code path that has a rather odd looking check on NDD_LABELING. +> +> > > Should not have gotten near that path though - hence the problem is +> +> > > actually +> +> > > when we call cxl_pmem_get_config_data() and it returns an error because +> +> > > we haven't fully connected up the command in QEMU. +> +> > +> +> > So a least one bug in QEMU. We were not supporting variable length +> +> > payloads on mailbox +> +> > inputs (but were on outputs). That hasn't mattered until we get to LSA +> +> > writes. +> +> > We just need to relax condition on the supplied length. +> +> > +> +> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +> > index c352a935c4..fdda9529fe 100644 +> +> > --- a/hw/cxl/cxl-mailbox-utils.c +> +> > +++ b/hw/cxl/cxl-mailbox-utils.c +> +> > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +> > cxl_cmd = &cxl_cmd_set[set][cmd]; +> +> > h = cxl_cmd->handler; +> +> > if (h) { +> +> > - if (len == cxl_cmd->in) { +> +> > + if (len == cxl_cmd->in || !cxl_cmd->in) { +> +> > cxl_cmd->payload = cxl_dstate->mbox_reg_state + +> +> > A_CXL_DEV_CMD_PAYLOAD; +> +> > ret = (*h)(cxl_cmd, cxl_dstate, &len); +> +> > +> +> > +> +> > This lets the nvdimm/region probe fine, but I'm getting some issues with +> +> > namespace capacity so I'll look at what is causing that next. +> +> > Unfortunately I'm not that familiar with the driver/nvdimm side of things +> +> > so it's take a while to figure out what kicks off what! +> +> +> +> The whirlwind tour is that 'struct nd_region' instances that represent a +> +> persitent memory address range are composed of one more mappings of +> +> 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver +> +> in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking +> +> the dimm (if locked) and interrogating the label area to look for +> +> namespace labels. +> +> +> +> The label command calls are routed to the '->ndctl()' callback that was +> +> registered when the CXL nvdimm_bus_descriptor was created. That callback +> +> handles both 'bus' scope calls, currently none for CXL, and per nvdimm +> +> calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands +> +> to CXL commands. +> +> +> +> The 'struct nvdimm' objects that the CXL side registers have the +> +> NDD_LABELING flag set which means that namespaces need to be explicitly +> +> created / provisioned from region capacity. Otherwise, if +> +> drivers/nvdimm/dimm.c does not find a namespace-label-index block then +> +> the region reverts to label-less mode and a default namespace equal to +> +> the size of the region is instantiated. +> +> +> +> If you are seeing small mismatches in namespace capacity then it may +> +> just be the fact that by default 'ndctl create-namespace' results in an +> +> 'fsdax' mode namespace which just means that it is a block device where +> +> 1.5% of the capacity is reserved for 'struct page' metadata. You should +> +> be able to see namespace capacity == region capacity by doing "ndctl +> +> create-namespace -m raw", and disable DAX operation. +> +> +Currently ndctl create-namespace crashes qemu ;) +> +Which isn't ideal! +> +Found a cause for this one. Mailbox payload may be as small as 256 bytes. +We have code in kernel sanity checking that output payload fits in the +mailbox, but nothing on the input payload. Symptom is that we write just +off the end whatever size the payload is. Note doing this shouldn't crash +qemu - so I need to fix a range check somewhere. + +I think this is because cxl_pmem_get_config_size() returns the mailbox +payload size as being the available LSA size, forgetting to remove the +size of the headers on the set_lsa side of things. +https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/tree/drivers/cxl/pmem.c?h=next#n110 +I've hacked the max_payload to be -8 + +Now we still don't succeed in creating the namespace, but bonus is it doesn't +crash any more. + + +Jonathan + + + +> +> +> +> Hope that helps. +> +Got me looking at the right code. Thanks! +> +> +Jonathan +> +> + +On Mon, 15 Aug 2022 15:18:09 +0100 +Jonathan Cameron via <qemu-devel@nongnu.org> wrote: + +> +On Fri, 12 Aug 2022 17:15:09 +0100 +> +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> On Fri, 12 Aug 2022 09:03:02 -0700 +> +> Dan Williams <dan.j.williams@intel.com> wrote: +> +> +> +> > Jonathan Cameron wrote: +> +> > > On Thu, 11 Aug 2022 18:08:57 +0100 +> +> > > Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> > > +> +> > > > On Tue, 9 Aug 2022 17:08:25 +0100 +> +> > > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> > > > +> +> > > > > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > > > > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > > > > +> +> > > > > > Hi Jonathan +> +> > > > > > +> +> > > > > > Thanks for your reply! +> +> > > > > > +> +> > > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > > > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > > > > +> +> > > > > > > Probably not related to your problem, but there is a disconnect +> +> > > > > > > in QEMU / +> +> > > > > > > kernel assumptionsaround the presence of an HDM decoder when a +> +> > > > > > > HB only +> +> > > > > > > has a single root port. Spec allows it to be provided or not as +> +> > > > > > > an implementation choice. +> +> > > > > > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > > > > > +> +> > > > > > > The temporary solution is to throw in a second root port on the +> +> > > > > > > HB and not +> +> > > > > > > connect anything to it. Longer term I may special case this so +> +> > > > > > > that the particular +> +> > > > > > > decoder defaults to pass through settings in QEMU if there is +> +> > > > > > > only one root port. +> +> > > > > > > +> +> > > > > > +> +> > > > > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > > > > region successfully. +> +> > > > > > But have some errors in Nvdimm: +> +> > > > > > +> +> > > > > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > > > > assuming node 0 +> +> > > > > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > > > > assuming node 0 +> +> > > > > > [ 74.927470] nd_region region0: nmem0: is disabled, failing +> +> > > > > > probe +> +> > > > > +> +> > > > > Ah. I've seen this one, but not chased it down yet. Was on my todo +> +> > > > > list to chase +> +> > > > > down. Once I reach this state I can verify the HDM Decode is +> +> > > > > correct which is what +> +> > > > > I've been using to test (Which wasn't true until earlier this +> +> > > > > week). +> +> > > > > I'm currently testing via devmem, more for historical reasons than +> +> > > > > because it makes +> +> > > > > that much sense anymore. +> +> > > > +> +> > > > *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> > > > I'd forgotten that was still on the todo list. I don't think it will +> +> > > > be particularly hard to do and will take a look in next few days. +> +> > > > +> +> > > > Very very indirectly this error is causing a driver probe fail that +> +> > > > means that +> +> > > > we hit a code path that has a rather odd looking check on +> +> > > > NDD_LABELING. +> +> > > > Should not have gotten near that path though - hence the problem is +> +> > > > actually +> +> > > > when we call cxl_pmem_get_config_data() and it returns an error +> +> > > > because +> +> > > > we haven't fully connected up the command in QEMU. +> +> > > +> +> > > So a least one bug in QEMU. We were not supporting variable length +> +> > > payloads on mailbox +> +> > > inputs (but were on outputs). That hasn't mattered until we get to LSA +> +> > > writes. +> +> > > We just need to relax condition on the supplied length. +> +> > > +> +> > > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +> > > index c352a935c4..fdda9529fe 100644 +> +> > > --- a/hw/cxl/cxl-mailbox-utils.c +> +> > > +++ b/hw/cxl/cxl-mailbox-utils.c +> +> > > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +> > > cxl_cmd = &cxl_cmd_set[set][cmd]; +> +> > > h = cxl_cmd->handler; +> +> > > if (h) { +> +> > > - if (len == cxl_cmd->in) { +> +> > > + if (len == cxl_cmd->in || !cxl_cmd->in) { +> +> > > cxl_cmd->payload = cxl_dstate->mbox_reg_state + +> +> > > A_CXL_DEV_CMD_PAYLOAD; +> +> > > ret = (*h)(cxl_cmd, cxl_dstate, &len); +> +> > > +> +> > > +> +> > > This lets the nvdimm/region probe fine, but I'm getting some issues with +> +> > > namespace capacity so I'll look at what is causing that next. +> +> > > Unfortunately I'm not that familiar with the driver/nvdimm side of +> +> > > things +> +> > > so it's take a while to figure out what kicks off what! +> +> > +> +> > The whirlwind tour is that 'struct nd_region' instances that represent a +> +> > persitent memory address range are composed of one more mappings of +> +> > 'struct nvdimm' objects. The nvdimm object is driven by the dimm driver +> +> > in drivers/nvdimm/dimm.c. That driver is mainly charged with unlocking +> +> > the dimm (if locked) and interrogating the label area to look for +> +> > namespace labels. +> +> > +> +> > The label command calls are routed to the '->ndctl()' callback that was +> +> > registered when the CXL nvdimm_bus_descriptor was created. That callback +> +> > handles both 'bus' scope calls, currently none for CXL, and per nvdimm +> +> > calls. cxl_pmem_nvdimm_ctl() translates those generic LIBNVDIMM commands +> +> > to CXL commands. +> +> > +> +> > The 'struct nvdimm' objects that the CXL side registers have the +> +> > NDD_LABELING flag set which means that namespaces need to be explicitly +> +> > created / provisioned from region capacity. Otherwise, if +> +> > drivers/nvdimm/dimm.c does not find a namespace-label-index block then +> +> > the region reverts to label-less mode and a default namespace equal to +> +> > the size of the region is instantiated. +> +> > +> +> > If you are seeing small mismatches in namespace capacity then it may +> +> > just be the fact that by default 'ndctl create-namespace' results in an +> +> > 'fsdax' mode namespace which just means that it is a block device where +> +> > 1.5% of the capacity is reserved for 'struct page' metadata. You should +> +> > be able to see namespace capacity == region capacity by doing "ndctl +> +> > create-namespace -m raw", and disable DAX operation. +> +> +> +> Currently ndctl create-namespace crashes qemu ;) +> +> Which isn't ideal! +> +> +> +> +Found a cause for this one. Mailbox payload may be as small as 256 bytes. +> +We have code in kernel sanity checking that output payload fits in the +> +mailbox, but nothing on the input payload. Symptom is that we write just +> +off the end whatever size the payload is. Note doing this shouldn't crash +> +qemu - so I need to fix a range check somewhere. +> +> +I think this is because cxl_pmem_get_config_size() returns the mailbox +> +payload size as being the available LSA size, forgetting to remove the +> +size of the headers on the set_lsa side of things. +> +https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/tree/drivers/cxl/pmem.c?h=next#n110 +> +> +I've hacked the max_payload to be -8 +> +> +Now we still don't succeed in creating the namespace, but bonus is it doesn't +> +crash any more. +In the interests of defensive / correct handling from QEMU I took a +look into why it was crashing. Turns out that providing a NULL write callback +for +the memory device region (that the above overlarge write was spilling into) +isn't +a safe thing to do. Needs a stub. Oops. + +On plus side we might never have noticed this was going wrong without the crash +*silver lining in every cloud* + +Fix to follow... + +Jonathan + + +> +> +> +Jonathan +> +> +> +> +> > +> +> > Hope that helps. +> +> Got me looking at the right code. Thanks! +> +> +> +> Jonathan +> +> +> +> +> +> + +On Mon, 15 Aug 2022 at 15:55, Jonathan Cameron via <qemu-arm@nongnu.org> wrote: +> +In the interests of defensive / correct handling from QEMU I took a +> +look into why it was crashing. Turns out that providing a NULL write +> +callback for +> +the memory device region (that the above overlarge write was spilling into) +> +isn't +> +a safe thing to do. Needs a stub. Oops. +Yeah. We've talked before about adding an assert so that that kind of +"missing function" bug is caught at device creation rather than only +if the guest tries to access the device, but we never quite got around +to it... + +-- PMM + +On Fri, 12 Aug 2022 16:44:03 +0100 +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: + +> +On Thu, 11 Aug 2022 18:08:57 +0100 +> +Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> +> On Tue, 9 Aug 2022 17:08:25 +0100 +> +> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> +> > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > +> +> > > Hi Jonathan +> +> > > +> +> > > Thanks for your reply! +> +> > > +> +> > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > +> +> > > > Probably not related to your problem, but there is a disconnect in +> +> > > > QEMU / +> +> > > > kernel assumptionsaround the presence of an HDM decoder when a HB only +> +> > > > has a single root port. Spec allows it to be provided or not as an +> +> > > > implementation choice. +> +> > > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > > +> +> > > > The temporary solution is to throw in a second root port on the HB +> +> > > > and not +> +> > > > connect anything to it. Longer term I may special case this so that +> +> > > > the particular +> +> > > > decoder defaults to pass through settings in QEMU if there is only +> +> > > > one root port. +> +> > > > +> +> > > +> +> > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > region successfully. +> +> > > But have some errors in Nvdimm: +> +> > > +> +> > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > assuming node 0 +> +> > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > assuming node 0 +> +> > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> > > +> +> > +> +> > Ah. I've seen this one, but not chased it down yet. Was on my todo list +> +> > to chase +> +> > down. Once I reach this state I can verify the HDM Decode is correct +> +> > which is what +> +> > I've been using to test (Which wasn't true until earlier this week). +> +> > I'm currently testing via devmem, more for historical reasons than +> +> > because it makes +> +> > that much sense anymore. +> +> +> +> *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> I'd forgotten that was still on the todo list. I don't think it will +> +> be particularly hard to do and will take a look in next few days. +> +> +> +> Very very indirectly this error is causing a driver probe fail that means +> +> that +> +> we hit a code path that has a rather odd looking check on NDD_LABELING. +> +> Should not have gotten near that path though - hence the problem is actually +> +> when we call cxl_pmem_get_config_data() and it returns an error because +> +> we haven't fully connected up the command in QEMU. +> +> +So a least one bug in QEMU. We were not supporting variable length payloads +> +on mailbox +> +inputs (but were on outputs). That hasn't mattered until we get to LSA +> +writes. +> +We just need to relax condition on the supplied length. +> +> +diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +index c352a935c4..fdda9529fe 100644 +> +--- a/hw/cxl/cxl-mailbox-utils.c +> ++++ b/hw/cxl/cxl-mailbox-utils.c +> +@@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +cxl_cmd = &cxl_cmd_set[set][cmd]; +> +h = cxl_cmd->handler; +> +if (h) { +> +- if (len == cxl_cmd->in) { +> ++ if (len == cxl_cmd->in || !cxl_cmd->in) { +Fix is wrong as we use ~0 as the placeholder for variable payload, not 0. + +With that fixed we hit new fun paths - after some errors we get the +worrying - not totally sure but looks like a failure on an error cleanup. +I'll chase down the error source, but even then this is probably triggerable by +hardware problem or similar. Some bonus prints in here from me chasing +error paths, but it's otherwise just cxl/next + the fix I posted earlier today. + +[ 69.919877] nd_bus ndbus0: START: nd_region.probe(region0) +[ 69.920108] nd_region_probe +[ 69.920623] ------------[ cut here ]------------ +[ 69.920675] refcount_t: addition on 0; use-after-free. +[ 69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 +refcount_warn_saturate+0xa0/0x144 +[ 69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi +cxl_core +[ 69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399 +[ 69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 +[ 69.931482] Workqueue: events_unbound async_run_entry_fn +[ 69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) +[ 69.934023] pc : refcount_warn_saturate+0xa0/0x144 +[ 69.935161] lr : refcount_warn_saturate+0xa0/0x144 +[ 69.936541] sp : ffff80000890b960 +[ 69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: 0000000000000000 +[ 69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: 0000000000000000 +[ 69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: ffff0000c5254800 +[ 69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: ffffffffffffffff +[ 69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: 0000000000000000 +[ 69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: 657466612d657375 +[ 69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : ffffa54a8f63d288 +[ 69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : 00000000fffff31e +[ 69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : ffff5ab66e5ef000 +root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [ 69.954752] x2 : +0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740 +[ 69.957098] Call trace: +[ 69.957959] refcount_warn_saturate+0xa0/0x144 +[ 69.958773] get_ndd+0x5c/0x80 +[ 69.959294] nd_region_register_namespaces+0xe4/0xe90 +[ 69.960253] nd_region_probe+0x100/0x290 +[ 69.960796] nvdimm_bus_probe+0xf4/0x1c0 +[ 69.962087] really_probe+0x19c/0x3f0 +[ 69.962620] __driver_probe_device+0x11c/0x190 +[ 69.963258] driver_probe_device+0x44/0xf4 +[ 69.963773] __device_attach_driver+0xa4/0x140 +[ 69.964471] bus_for_each_drv+0x84/0xe0 +[ 69.965068] __device_attach+0xb0/0x1f0 +[ 69.966101] device_initial_probe+0x20/0x30 +[ 69.967142] bus_probe_device+0xa4/0xb0 +[ 69.968104] device_add+0x3e8/0x910 +[ 69.969111] nd_async_device_register+0x24/0x74 +[ 69.969928] async_run_entry_fn+0x40/0x150 +[ 69.970725] process_one_work+0x1dc/0x450 +[ 69.971796] worker_thread+0x154/0x450 +[ 69.972700] kthread+0x118/0x120 +[ 69.974141] ret_from_fork+0x10/0x20 +[ 69.975141] ---[ end trace 0000000000000000 ]--- +[ 70.117887] Into nd_namespace_pmem_set_resource() + +> +cxl_cmd->payload = cxl_dstate->mbox_reg_state + +> +A_CXL_DEV_CMD_PAYLOAD; +> +ret = (*h)(cxl_cmd, cxl_dstate, &len); +> +> +> +This lets the nvdimm/region probe fine, but I'm getting some issues with +> +namespace capacity so I'll look at what is causing that next. +> +Unfortunately I'm not that familiar with the driver/nvdimm side of things +> +so it's take a while to figure out what kicks off what! +> +> +Jonathan +> +> +> +> +> Jonathan +> +> +> +> +> +> > +> +> > > +> +> > > And x4 region still failed with same errors, using latest cxl/preview +> +> > > branch don't work. +> +> > > I have picked "Two CXL emulation fixes" patches in qemu, still not +> +> > > working. +> +> > > +> +> > > Bob +> +> +> +> +> + +On Mon, 15 Aug 2022 18:04:44 +0100 +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: + +> +On Fri, 12 Aug 2022 16:44:03 +0100 +> +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> On Thu, 11 Aug 2022 18:08:57 +0100 +> +> Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> +> +> > On Tue, 9 Aug 2022 17:08:25 +0100 +> +> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> > +> +> > > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > > +> +> > > > Hi Jonathan +> +> > > > +> +> > > > Thanks for your reply! +> +> > > > +> +> > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > > +> +> > > > > Probably not related to your problem, but there is a disconnect in +> +> > > > > QEMU / +> +> > > > > kernel assumptionsaround the presence of an HDM decoder when a HB +> +> > > > > only +> +> > > > > has a single root port. Spec allows it to be provided or not as an +> +> > > > > implementation choice. +> +> > > > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > > > +> +> > > > > The temporary solution is to throw in a second root port on the HB +> +> > > > > and not +> +> > > > > connect anything to it. Longer term I may special case this so +> +> > > > > that the particular +> +> > > > > decoder defaults to pass through settings in QEMU if there is only +> +> > > > > one root port. +> +> > > > > +> +> > > > +> +> > > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > > region successfully. +> +> > > > But have some errors in Nvdimm: +> +> > > > +> +> > > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > > assuming node 0 +> +> > > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > > assuming node 0 +> +> > > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> > > > +> +> > > +> +> > > Ah. I've seen this one, but not chased it down yet. Was on my todo +> +> > > list to chase +> +> > > down. Once I reach this state I can verify the HDM Decode is correct +> +> > > which is what +> +> > > I've been using to test (Which wasn't true until earlier this week). +> +> > > I'm currently testing via devmem, more for historical reasons than +> +> > > because it makes +> +> > > that much sense anymore. +> +> > +> +> > *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> > I'd forgotten that was still on the todo list. I don't think it will +> +> > be particularly hard to do and will take a look in next few days. +> +> > +> +> > Very very indirectly this error is causing a driver probe fail that means +> +> > that +> +> > we hit a code path that has a rather odd looking check on NDD_LABELING. +> +> > Should not have gotten near that path though - hence the problem is +> +> > actually +> +> > when we call cxl_pmem_get_config_data() and it returns an error because +> +> > we haven't fully connected up the command in QEMU. +> +> +> +> So a least one bug in QEMU. We were not supporting variable length payloads +> +> on mailbox +> +> inputs (but were on outputs). That hasn't mattered until we get to LSA +> +> writes. +> +> We just need to relax condition on the supplied length. +> +> +> +> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +> index c352a935c4..fdda9529fe 100644 +> +> --- a/hw/cxl/cxl-mailbox-utils.c +> +> +++ b/hw/cxl/cxl-mailbox-utils.c +> +> @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +> cxl_cmd = &cxl_cmd_set[set][cmd]; +> +> h = cxl_cmd->handler; +> +> if (h) { +> +> - if (len == cxl_cmd->in) { +> +> + if (len == cxl_cmd->in || !cxl_cmd->in) { +> +Fix is wrong as we use ~0 as the placeholder for variable payload, not 0. +Cause of the error is a failure in GET_LSA. +Reason, payload length is wrong in QEMU but was hidden previously by my wrong +fix here. Probably still a good idea to inject an error in GET_LSA and chase +down the refcount issue. + + +diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +index fdda9529fe..e8565fbd6e 100644 +--- a/hw/cxl/cxl-mailbox-utils.c ++++ b/hw/cxl/cxl-mailbox-utils.c +@@ -489,7 +489,7 @@ static struct cxl_cmd cxl_cmd_set[256][256] = { + cmd_identify_memory_device, 0, 0 }, + [CCLS][GET_PARTITION_INFO] = { "CCLS_GET_PARTITION_INFO", + cmd_ccls_get_partition_info, 0, 0 }, +- [CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 0, 0 }, ++ [CCLS][GET_LSA] = { "CCLS_GET_LSA", cmd_ccls_get_lsa, 8, 0 }, + [CCLS][SET_LSA] = { "CCLS_SET_LSA", cmd_ccls_set_lsa, + ~0, IMMEDIATE_CONFIG_CHANGE | IMMEDIATE_DATA_CHANGE }, + [MEDIA_AND_POISON][GET_POISON_LIST] = { "MEDIA_AND_POISON_GET_POISON_LIST", +@@ -510,12 +510,13 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) + cxl_cmd = &cxl_cmd_set[set][cmd]; + h = cxl_cmd->handler; + if (h) { +- if (len == cxl_cmd->in || !cxl_cmd->in) { ++ if (len == cxl_cmd->in || cxl_cmd->in == ~0) { + cxl_cmd->payload = cxl_dstate->mbox_reg_state + + A_CXL_DEV_CMD_PAYLOAD; + +And woot, we get a namespace in the LSA :) + +I'll post QEMU fixes in next day or two. Kernel side now seems more or less +fine be it with suspicious refcount underflow. + +> +> +With that fixed we hit new fun paths - after some errors we get the +> +worrying - not totally sure but looks like a failure on an error cleanup. +> +I'll chase down the error source, but even then this is probably triggerable +> +by +> +hardware problem or similar. Some bonus prints in here from me chasing +> +error paths, but it's otherwise just cxl/next + the fix I posted earlier +> +today. +> +> +[ 69.919877] nd_bus ndbus0: START: nd_region.probe(region0) +> +[ 69.920108] nd_region_probe +> +[ 69.920623] ------------[ cut here ]------------ +> +[ 69.920675] refcount_t: addition on 0; use-after-free. +> +[ 69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 +> +refcount_warn_saturate+0xa0/0x144 +> +[ 69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi +> +cxl_core +> +[ 69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399 +> +[ 69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 +> +[ 69.931482] Workqueue: events_unbound async_run_entry_fn +> +[ 69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) +> +[ 69.934023] pc : refcount_warn_saturate+0xa0/0x144 +> +[ 69.935161] lr : refcount_warn_saturate+0xa0/0x144 +> +[ 69.936541] sp : ffff80000890b960 +> +[ 69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: +> +0000000000000000 +> +[ 69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: +> +0000000000000000 +> +[ 69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: +> +ffff0000c5254800 +> +[ 69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: +> +ffffffffffffffff +> +[ 69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: +> +0000000000000000 +> +[ 69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: +> +657466612d657375 +> +[ 69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : +> +ffffa54a8f63d288 +> +[ 69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : +> +00000000fffff31e +> +[ 69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : +> +ffff5ab66e5ef000 +> +root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [ 69.954752] x2 : +> +0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740 +> +[ 69.957098] Call trace: +> +[ 69.957959] refcount_warn_saturate+0xa0/0x144 +> +[ 69.958773] get_ndd+0x5c/0x80 +> +[ 69.959294] nd_region_register_namespaces+0xe4/0xe90 +> +[ 69.960253] nd_region_probe+0x100/0x290 +> +[ 69.960796] nvdimm_bus_probe+0xf4/0x1c0 +> +[ 69.962087] really_probe+0x19c/0x3f0 +> +[ 69.962620] __driver_probe_device+0x11c/0x190 +> +[ 69.963258] driver_probe_device+0x44/0xf4 +> +[ 69.963773] __device_attach_driver+0xa4/0x140 +> +[ 69.964471] bus_for_each_drv+0x84/0xe0 +> +[ 69.965068] __device_attach+0xb0/0x1f0 +> +[ 69.966101] device_initial_probe+0x20/0x30 +> +[ 69.967142] bus_probe_device+0xa4/0xb0 +> +[ 69.968104] device_add+0x3e8/0x910 +> +[ 69.969111] nd_async_device_register+0x24/0x74 +> +[ 69.969928] async_run_entry_fn+0x40/0x150 +> +[ 69.970725] process_one_work+0x1dc/0x450 +> +[ 69.971796] worker_thread+0x154/0x450 +> +[ 69.972700] kthread+0x118/0x120 +> +[ 69.974141] ret_from_fork+0x10/0x20 +> +[ 69.975141] ---[ end trace 0000000000000000 ]--- +> +[ 70.117887] Into nd_namespace_pmem_set_resource() +> +> +> cxl_cmd->payload = cxl_dstate->mbox_reg_state + +> +> A_CXL_DEV_CMD_PAYLOAD; +> +> ret = (*h)(cxl_cmd, cxl_dstate, &len); +> +> +> +> +> +> This lets the nvdimm/region probe fine, but I'm getting some issues with +> +> namespace capacity so I'll look at what is causing that next. +> +> Unfortunately I'm not that familiar with the driver/nvdimm side of things +> +> so it's take a while to figure out what kicks off what! +> +> +> +> Jonathan +> +> +> +> > +> +> > Jonathan +> +> > +> +> > +> +> > > +> +> > > > +> +> > > > And x4 region still failed with same errors, using latest cxl/preview +> +> > > > branch don't work. +> +> > > > I have picked "Two CXL emulation fixes" patches in qemu, still not +> +> > > > working. +> +> > > > +> +> > > > Bob +> +> > +> +> > +> +> +> + +Jonathan Cameron wrote: +> +On Fri, 12 Aug 2022 16:44:03 +0100 +> +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> On Thu, 11 Aug 2022 18:08:57 +0100 +> +> Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> +> +> > On Tue, 9 Aug 2022 17:08:25 +0100 +> +> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> > +> +> > > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > > +> +> > > > Hi Jonathan +> +> > > > +> +> > > > Thanks for your reply! +> +> > > > +> +> > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > > +> +> > > > > Probably not related to your problem, but there is a disconnect in +> +> > > > > QEMU / +> +> > > > > kernel assumptionsaround the presence of an HDM decoder when a HB +> +> > > > > only +> +> > > > > has a single root port. Spec allows it to be provided or not as an +> +> > > > > implementation choice. +> +> > > > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > > > +> +> > > > > The temporary solution is to throw in a second root port on the HB +> +> > > > > and not +> +> > > > > connect anything to it. Longer term I may special case this so +> +> > > > > that the particular +> +> > > > > decoder defaults to pass through settings in QEMU if there is only +> +> > > > > one root port. +> +> > > > > +> +> > > > +> +> > > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > > region successfully. +> +> > > > But have some errors in Nvdimm: +> +> > > > +> +> > > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > > assuming node 0 +> +> > > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > > assuming node 0 +> +> > > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> > > > +> +> > > +> +> > > Ah. I've seen this one, but not chased it down yet. Was on my todo +> +> > > list to chase +> +> > > down. Once I reach this state I can verify the HDM Decode is correct +> +> > > which is what +> +> > > I've been using to test (Which wasn't true until earlier this week). +> +> > > I'm currently testing via devmem, more for historical reasons than +> +> > > because it makes +> +> > > that much sense anymore. +> +> > +> +> > *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> > I'd forgotten that was still on the todo list. I don't think it will +> +> > be particularly hard to do and will take a look in next few days. +> +> > +> +> > Very very indirectly this error is causing a driver probe fail that means +> +> > that +> +> > we hit a code path that has a rather odd looking check on NDD_LABELING. +> +> > Should not have gotten near that path though - hence the problem is +> +> > actually +> +> > when we call cxl_pmem_get_config_data() and it returns an error because +> +> > we haven't fully connected up the command in QEMU. +> +> +> +> So a least one bug in QEMU. We were not supporting variable length payloads +> +> on mailbox +> +> inputs (but were on outputs). That hasn't mattered until we get to LSA +> +> writes. +> +> We just need to relax condition on the supplied length. +> +> +> +> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +> index c352a935c4..fdda9529fe 100644 +> +> --- a/hw/cxl/cxl-mailbox-utils.c +> +> +++ b/hw/cxl/cxl-mailbox-utils.c +> +> @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +> cxl_cmd = &cxl_cmd_set[set][cmd]; +> +> h = cxl_cmd->handler; +> +> if (h) { +> +> - if (len == cxl_cmd->in) { +> +> + if (len == cxl_cmd->in || !cxl_cmd->in) { +> +Fix is wrong as we use ~0 as the placeholder for variable payload, not 0. +> +> +With that fixed we hit new fun paths - after some errors we get the +> +worrying - not totally sure but looks like a failure on an error cleanup. +> +I'll chase down the error source, but even then this is probably triggerable +> +by +> +hardware problem or similar. Some bonus prints in here from me chasing +> +error paths, but it's otherwise just cxl/next + the fix I posted earlier +> +today. +One of the scenarios that I cannot rule out is nvdimm_probe() racing +nd_region_probe(), but given all the work it takes to create a region I +suspect all the nvdimm_probe() work to have completed... + +It is at least one potentially wrong hypothesis that needs to be chased +down. + +> +> +[ 69.919877] nd_bus ndbus0: START: nd_region.probe(region0) +> +[ 69.920108] nd_region_probe +> +[ 69.920623] ------------[ cut here ]------------ +> +[ 69.920675] refcount_t: addition on 0; use-after-free. +> +[ 69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 +> +refcount_warn_saturate+0xa0/0x144 +> +[ 69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port cxl_acpi +> +cxl_core +> +[ 69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ #399 +> +[ 69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 +> +[ 69.931482] Workqueue: events_unbound async_run_entry_fn +> +[ 69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) +> +[ 69.934023] pc : refcount_warn_saturate+0xa0/0x144 +> +[ 69.935161] lr : refcount_warn_saturate+0xa0/0x144 +> +[ 69.936541] sp : ffff80000890b960 +> +[ 69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: +> +0000000000000000 +> +[ 69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: +> +0000000000000000 +> +[ 69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: +> +ffff0000c5254800 +> +[ 69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: +> +ffffffffffffffff +> +[ 69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: +> +0000000000000000 +> +[ 69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: +> +657466612d657375 +> +[ 69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : +> +ffffa54a8f63d288 +> +[ 69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : +> +00000000fffff31e +> +[ 69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : +> +ffff5ab66e5ef000 +> +root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [ 69.954752] x2 : +> +0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740 +> +[ 69.957098] Call trace: +> +[ 69.957959] refcount_warn_saturate+0xa0/0x144 +> +[ 69.958773] get_ndd+0x5c/0x80 +> +[ 69.959294] nd_region_register_namespaces+0xe4/0xe90 +> +[ 69.960253] nd_region_probe+0x100/0x290 +> +[ 69.960796] nvdimm_bus_probe+0xf4/0x1c0 +> +[ 69.962087] really_probe+0x19c/0x3f0 +> +[ 69.962620] __driver_probe_device+0x11c/0x190 +> +[ 69.963258] driver_probe_device+0x44/0xf4 +> +[ 69.963773] __device_attach_driver+0xa4/0x140 +> +[ 69.964471] bus_for_each_drv+0x84/0xe0 +> +[ 69.965068] __device_attach+0xb0/0x1f0 +> +[ 69.966101] device_initial_probe+0x20/0x30 +> +[ 69.967142] bus_probe_device+0xa4/0xb0 +> +[ 69.968104] device_add+0x3e8/0x910 +> +[ 69.969111] nd_async_device_register+0x24/0x74 +> +[ 69.969928] async_run_entry_fn+0x40/0x150 +> +[ 69.970725] process_one_work+0x1dc/0x450 +> +[ 69.971796] worker_thread+0x154/0x450 +> +[ 69.972700] kthread+0x118/0x120 +> +[ 69.974141] ret_from_fork+0x10/0x20 +> +[ 69.975141] ---[ end trace 0000000000000000 ]--- +> +[ 70.117887] Into nd_namespace_pmem_set_resource() + +On Mon, 15 Aug 2022 15:55:15 -0700 +Dan Williams <dan.j.williams@intel.com> wrote: + +> +Jonathan Cameron wrote: +> +> On Fri, 12 Aug 2022 16:44:03 +0100 +> +> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> +> > On Thu, 11 Aug 2022 18:08:57 +0100 +> +> > Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> > +> +> > > On Tue, 9 Aug 2022 17:08:25 +0100 +> +> > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> > > +> +> > > > On Tue, 9 Aug 2022 21:07:06 +0800 +> +> > > > Bobo WL <lmw.bobo@gmail.com> wrote: +> +> > > > +> +> > > > > Hi Jonathan +> +> > > > > +> +> > > > > Thanks for your reply! +> +> > > > > +> +> > > > > On Mon, Aug 8, 2022 at 8:37 PM Jonathan Cameron +> +> > > > > <Jonathan.Cameron@huawei.com> wrote: +> +> > > > > > +> +> > > > > > Probably not related to your problem, but there is a disconnect +> +> > > > > > in QEMU / +> +> > > > > > kernel assumptionsaround the presence of an HDM decoder when a HB +> +> > > > > > only +> +> > > > > > has a single root port. Spec allows it to be provided or not as +> +> > > > > > an implementation choice. +> +> > > > > > Kernel assumes it isn't provide. Qemu assumes it is. +> +> > > > > > +> +> > > > > > The temporary solution is to throw in a second root port on the +> +> > > > > > HB and not +> +> > > > > > connect anything to it. Longer term I may special case this so +> +> > > > > > that the particular +> +> > > > > > decoder defaults to pass through settings in QEMU if there is +> +> > > > > > only one root port. +> +> > > > > > +> +> > > > > +> +> > > > > You are right! After adding an extra HB in qemu, I can create a x1 +> +> > > > > region successfully. +> +> > > > > But have some errors in Nvdimm: +> +> > > > > +> +> > > > > [ 74.925838] Unknown online node for memory at 0x10000000000, +> +> > > > > assuming node 0 +> +> > > > > [ 74.925846] Unknown target node for memory at 0x10000000000, +> +> > > > > assuming node 0 +> +> > > > > [ 74.927470] nd_region region0: nmem0: is disabled, failing probe +> +> > > > > +> +> > > > +> +> > > > Ah. I've seen this one, but not chased it down yet. Was on my todo +> +> > > > list to chase +> +> > > > down. Once I reach this state I can verify the HDM Decode is correct +> +> > > > which is what +> +> > > > I've been using to test (Which wasn't true until earlier this week). +> +> > > > I'm currently testing via devmem, more for historical reasons than +> +> > > > because it makes +> +> > > > that much sense anymore. +> +> > > +> +> > > *embarassed cough*. We haven't fully hooked the LSA up in qemu yet. +> +> > > I'd forgotten that was still on the todo list. I don't think it will +> +> > > be particularly hard to do and will take a look in next few days. +> +> > > +> +> > > Very very indirectly this error is causing a driver probe fail that +> +> > > means that +> +> > > we hit a code path that has a rather odd looking check on NDD_LABELING. +> +> > > Should not have gotten near that path though - hence the problem is +> +> > > actually +> +> > > when we call cxl_pmem_get_config_data() and it returns an error because +> +> > > we haven't fully connected up the command in QEMU. +> +> > +> +> > So a least one bug in QEMU. We were not supporting variable length +> +> > payloads on mailbox +> +> > inputs (but were on outputs). That hasn't mattered until we get to LSA +> +> > writes. +> +> > We just need to relax condition on the supplied length. +> +> > +> +> > diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c +> +> > index c352a935c4..fdda9529fe 100644 +> +> > --- a/hw/cxl/cxl-mailbox-utils.c +> +> > +++ b/hw/cxl/cxl-mailbox-utils.c +> +> > @@ -510,7 +510,7 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate) +> +> > cxl_cmd = &cxl_cmd_set[set][cmd]; +> +> > h = cxl_cmd->handler; +> +> > if (h) { +> +> > - if (len == cxl_cmd->in) { +> +> > + if (len == cxl_cmd->in || !cxl_cmd->in) { +> +> Fix is wrong as we use ~0 as the placeholder for variable payload, not 0. +> +> +> +> With that fixed we hit new fun paths - after some errors we get the +> +> worrying - not totally sure but looks like a failure on an error cleanup. +> +> I'll chase down the error source, but even then this is probably +> +> triggerable by +> +> hardware problem or similar. Some bonus prints in here from me chasing +> +> error paths, but it's otherwise just cxl/next + the fix I posted earlier +> +> today. +> +> +One of the scenarios that I cannot rule out is nvdimm_probe() racing +> +nd_region_probe(), but given all the work it takes to create a region I +> +suspect all the nvdimm_probe() work to have completed... +> +> +It is at least one potentially wrong hypothesis that needs to be chased +> +down. +Maybe there should be a special award for the non-intuitive +ndctl create-namespace command (modifies existing namespace and might create +a different empty one...) I'm sure there is some interesting history behind +that one :) + +Upshot is I just threw a filesystem on fsdax and wrote some text files on it +to allow easy grepping. The right data ends up in the memory and a plausible +namespace description is stored in the LSA. + +So to some degree at least it's 'working' on an 8 way direct connected +set of emulated devices. + +One snag is that serial number support isn't yet upstream in QEMU. +(I have had it in my tree for a while but not posted it yet because of + QEMU feature freeze) +https://gitlab.com/jic23/qemu/-/commit/144c783ea8a5fbe169f46ea1ba92940157f42733 +That's needed for meaningful cookie generation. Otherwise you can build the +namespace once, but it won't work on next probe as the cookie is 0 and you +hit some error paths. + +Maybe sensible to add a sanity check and fail namespace creation if +cookie is 0? (Silly side question, but is there a theoretical risk of +a serial number / other data combination leading to a fletcher64() +checksum that happens to be 0 - that would give a very odd bug report!) + +So to make it work the following is needed: + +1) The kernel fix for mailbox buffer overflow. +2) Qemu fix for size of arguements for get_lsa +3) Qemu fix to allow variable size input arguements (for set_lsa) +4) Serial number patch above + command lines to qemu to set appropriate + serial numbers. + +I'll send out the QEMU fixes shortly and post the Serial number patch, +though that almost certainly won't go in until next QEMU development +cycle starts in a few weeks. + +Next up, run through same tests on some other topologies. + +Jonathan + +> +> +> +> +> [ 69.919877] nd_bus ndbus0: START: nd_region.probe(region0) +> +> [ 69.920108] nd_region_probe +> +> [ 69.920623] ------------[ cut here ]------------ +> +> [ 69.920675] refcount_t: addition on 0; use-after-free. +> +> [ 69.921314] WARNING: CPU: 3 PID: 710 at lib/refcount.c:25 +> +> refcount_warn_saturate+0xa0/0x144 +> +> [ 69.926949] Modules linked in: cxl_pmem cxl_mem cxl_pci cxl_port +> +> cxl_acpi cxl_core +> +> [ 69.928830] CPU: 3 PID: 710 Comm: kworker/u8:9 Not tainted 5.19.0-rc3+ +> +> #399 +> +> [ 69.930596] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 +> +> 02/06/2015 +> +> [ 69.931482] Workqueue: events_unbound async_run_entry_fn +> +> [ 69.932403] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS +> +> BTYPE=--) +> +> [ 69.934023] pc : refcount_warn_saturate+0xa0/0x144 +> +> [ 69.935161] lr : refcount_warn_saturate+0xa0/0x144 +> +> [ 69.936541] sp : ffff80000890b960 +> +> [ 69.937921] x29: ffff80000890b960 x28: 0000000000000000 x27: +> +> 0000000000000000 +> +> [ 69.940917] x26: ffffa54a90d5cb10 x25: ffffa54a90809e98 x24: +> +> 0000000000000000 +> +> [ 69.942537] x23: ffffa54a91a3d8d8 x22: ffff0000c5254800 x21: +> +> ffff0000c5254800 +> +> [ 69.944013] x20: ffff0000ce924180 x19: ffff0000c5254800 x18: +> +> ffffffffffffffff +> +> [ 69.946100] x17: ffff5ab66e5ef000 x16: ffff80000801c000 x15: +> +> 0000000000000000 +> +> [ 69.947585] x14: 0000000000000001 x13: 0a2e656572662d72 x12: +> +> 657466612d657375 +> +> [ 69.948670] x11: 203b30206e6f206e x10: 6f69746964646120 x9 : +> +> ffffa54a8f63d288 +> +> [ 69.950679] x8 : 206e6f206e6f6974 x7 : 69646461203a745f x6 : +> +> 00000000fffff31e +> +> [ 69.952113] x5 : ffff0000ff61ba08 x4 : 00000000fffff31e x3 : +> +> ffff5ab66e5ef000 +> +> root@debian:/sys/bus/cxl/devices/decoder0.0/region0# [ 69.954752] x2 : +> +> 0000000000000000 x1 : 0000000000000000 x0 : ffff0000c512e740 +> +> [ 69.957098] Call trace: +> +> [ 69.957959] refcount_warn_saturate+0xa0/0x144 +> +> [ 69.958773] get_ndd+0x5c/0x80 +> +> [ 69.959294] nd_region_register_namespaces+0xe4/0xe90 +> +> [ 69.960253] nd_region_probe+0x100/0x290 +> +> [ 69.960796] nvdimm_bus_probe+0xf4/0x1c0 +> +> [ 69.962087] really_probe+0x19c/0x3f0 +> +> [ 69.962620] __driver_probe_device+0x11c/0x190 +> +> [ 69.963258] driver_probe_device+0x44/0xf4 +> +> [ 69.963773] __device_attach_driver+0xa4/0x140 +> +> [ 69.964471] bus_for_each_drv+0x84/0xe0 +> +> [ 69.965068] __device_attach+0xb0/0x1f0 +> +> [ 69.966101] device_initial_probe+0x20/0x30 +> +> [ 69.967142] bus_probe_device+0xa4/0xb0 +> +> [ 69.968104] device_add+0x3e8/0x910 +> +> [ 69.969111] nd_async_device_register+0x24/0x74 +> +> [ 69.969928] async_run_entry_fn+0x40/0x150 +> +> [ 69.970725] process_one_work+0x1dc/0x450 +> +> [ 69.971796] worker_thread+0x154/0x450 +> +> [ 69.972700] kthread+0x118/0x120 +> +> [ 69.974141] ret_from_fork+0x10/0x20 +> +> [ 69.975141] ---[ end trace 0000000000000000 ]--- +> +> [ 70.117887] Into nd_namespace_pmem_set_resource() + +Bobo WL wrote: +> +Hi list +> +> +I want to test cxl functions in arm64, and found some problems I can't +> +figure out. +> +> +My test environment: +> +> +1. build latest bios from +https://github.com/tianocore/edk2.git +master +> +branch(cc2db6ebfb6d9d85ba4c7b35fba1fa37fffc0bc2) +> +2. build latest qemu-system-aarch64 from git://git.qemu.org/qemu.git +> +master branch(846dcf0ba4eff824c295f06550b8673ff3f31314). With cxl arm +> +support patch: +> +https://patchwork.kernel.org/project/cxl/cover/20220616141950.23374-1-Jonathan.Cameron@huawei.com/ +> +3. build Linux kernel from +> +https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git +preview +> +branch(65fc1c3d26b96002a5aa1f4012fae4dc98fd5683) +> +4. build latest ndctl tools from +https://github.com/pmem/ndctl +> +create_region branch(8558b394e449779e3a4f3ae90fae77ede0bca159) +> +> +And my qemu test commands: +> +sudo $QEMU_BIN -M virt,gic-version=3,cxl=on -m 4g,maxmem=8G,slots=8 \ +> +-cpu max -smp 8 -nographic -no-reboot \ +> +-kernel $KERNEL -bios $BIOS_BIN \ +> +-drive if=none,file=$ROOTFS,format=qcow2,id=hd \ +> +-device virtio-blk-pci,drive=hd -append 'root=/dev/vda1 +> +nokaslr dyndbg="module cxl* +p"' \ +> +-object memory-backend-ram,size=4G,id=mem0 \ +> +-numa node,nodeid=0,cpus=0-7,memdev=mem0 \ +> +-net nic -net user,hostfwd=tcp::2222-:22 -enable-kvm \ +> +-object +> +memory-backend-file,id=cxl-mem0,share=on,mem-path=/tmp/cxltest.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-mem1,share=on,mem-path=/tmp/cxltest1.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-mem2,share=on,mem-path=/tmp/cxltest2.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-mem3,share=on,mem-path=/tmp/cxltest3.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa0,share=on,mem-path=/tmp/lsa0.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa1,share=on,mem-path=/tmp/lsa1.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa2,share=on,mem-path=/tmp/lsa2.raw,size=256M +> +\ +> +-object +> +memory-backend-file,id=cxl-lsa3,share=on,mem-path=/tmp/lsa3.raw,size=256M +> +\ +> +-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ +> +-device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \ +> +-device cxl-upstream,bus=root_port0,id=us0 \ +> +-device cxl-downstream,port=0,bus=us0,id=swport0,chassis=0,slot=4 \ +> +-device +> +cxl-type3,bus=swport0,memdev=cxl-mem0,lsa=cxl-lsa0,id=cxl-pmem0 \ +> +-device cxl-downstream,port=1,bus=us0,id=swport1,chassis=0,slot=5 \ +> +-device +> +cxl-type3,bus=swport1,memdev=cxl-mem1,lsa=cxl-lsa1,id=cxl-pmem1 \ +> +-device cxl-downstream,port=2,bus=us0,id=swport2,chassis=0,slot=6 \ +> +-device +> +cxl-type3,bus=swport2,memdev=cxl-mem2,lsa=cxl-lsa2,id=cxl-pmem2 \ +> +-device cxl-downstream,port=3,bus=us0,id=swport3,chassis=0,slot=7 \ +> +-device +> +cxl-type3,bus=swport3,memdev=cxl-mem3,lsa=cxl-lsa3,id=cxl-pmem3 \ +> +-M +> +cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=4G,cxl-fmw.0.interleave-granularity=4k +> +> +And I have got two problems. +> +1. When I want to create x1 region with command: "cxl create-region -d +> +decoder0.0 -w 1 -g 4096 mem0", kernel crashed with null pointer +> +reference. Crash log: +> +> +[ 534.697324] cxl_region region0: config state: 0 +> +[ 534.697346] cxl_region region0: probe: -6 +> +[ 534.697368] cxl_acpi ACPI0017:00: decoder0.0: created region0 +> +[ 534.699115] cxl region0: mem0:endpoint3 decoder3.0 add: +> +mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1 +> +[ 534.699149] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1 +> +[ 534.699167] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1 +> +[ 534.699176] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256 +> +[ 534.699182] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0 +> +for mem0:decoder3.0 @ 0 +> +[ 534.699189] cxl region0: 0000:0d:00.0:port2 iw: 1 ig: 256 +> +[ 534.699193] cxl region0: 0000:0d:00.0:port2 target[0] = +> +0000:0e:00.0 for mem0:decoder3.0 @ 0 +> +[ 534.699405] Unable to handle kernel NULL pointer dereference at +> +virtual address 0000000000000000 +> +[ 534.701474] Mem abort info: +> +[ 534.701994] ESR = 0x0000000086000004 +> +[ 534.702653] EC = 0x21: IABT (current EL), IL = 32 bits +> +[ 534.703616] SET = 0, FnV = 0 +> +[ 534.704174] EA = 0, S1PTW = 0 +> +[ 534.704803] FSC = 0x04: level 0 translation fault +> +[ 534.705694] user pgtable: 4k pages, 48-bit VAs, pgdp=000000010144a000 +> +[ 534.706875] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 +> +[ 534.709855] Internal error: Oops: 86000004 [#1] PREEMPT SMP +> +[ 534.710301] Modules linked in: +> +[ 534.710546] CPU: 7 PID: 331 Comm: cxl Not tainted +> +5.19.0-rc3-00064-g65fc1c3d26b9-dirty #11 +> +[ 534.715393] Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 +> +[ 534.717179] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) +> +[ 534.719190] pc : 0x0 +> +[ 534.719928] lr : commit_store+0x118/0x2cc +> +[ 534.721007] sp : ffff80000aec3c30 +> +[ 534.721793] x29: ffff80000aec3c30 x28: ffff0000da62e740 x27: +> +ffff0000c0c06b30 +> +[ 534.723875] x26: 0000000000000000 x25: ffff0000c0a2a400 x24: +> +ffff0000c0a29400 +> +[ 534.725440] x23: 0000000000000003 x22: 0000000000000000 x21: +> +ffff0000c0c06800 +> +[ 534.727312] x20: 0000000000000000 x19: ffff0000c1559800 x18: +> +0000000000000000 +> +[ 534.729138] x17: 0000000000000000 x16: 0000000000000000 x15: +> +0000ffffd41fe838 +> +[ 534.731046] x14: 0000000000000000 x13: 0000000000000000 x12: +> +0000000000000000 +> +[ 534.732402] x11: 0000000000000000 x10: 0000000000000000 x9 : +> +0000000000000000 +> +[ 534.734432] x8 : 0000000000000000 x7 : 0000000000000000 x6 : +> +ffff0000c0906e80 +> +[ 534.735921] x5 : 0000000000000000 x4 : 0000000000000000 x3 : +> +ffff80000aec3bf0 +> +[ 534.737437] x2 : 0000000000000000 x1 : 0000000000000000 x0 : +> +ffff0000c155a000 +> +[ 534.738878] Call trace: +> +[ 534.739368] 0x0 +> +[ 534.739713] dev_attr_store+0x1c/0x30 +> +[ 534.740186] sysfs_kf_write+0x48/0x58 +> +[ 534.740961] kernfs_fop_write_iter+0x128/0x184 +> +[ 534.741872] new_sync_write+0xdc/0x158 +> +[ 534.742706] vfs_write+0x1ac/0x2a8 +> +[ 534.743440] ksys_write+0x68/0xf0 +> +[ 534.744328] __arm64_sys_write+0x1c/0x28 +> +[ 534.745180] invoke_syscall+0x44/0xf0 +> +[ 534.745989] el0_svc_common+0x4c/0xfc +> +[ 534.746661] do_el0_svc+0x60/0xa8 +> +[ 534.747378] el0_svc+0x2c/0x78 +> +[ 534.748066] el0t_64_sync_handler+0xb8/0x12c +> +[ 534.748919] el0t_64_sync+0x18c/0x190 +> +[ 534.749629] Code: bad PC value +> +[ 534.750169] ---[ end trace 0000000000000000 ]--- +What was the top kernel commit when you ran this test? What is the line +number of "commit_store+0x118"? + +> +2. When I want to create x4 region with command: "cxl create-region -d +> +decoder0.0 -w 4 -g 4096 -m mem0 mem1 mem2 mem3". I got below errors: +> +> +cxl region: create_region: region0: failed to set target3 to mem3 +> +cxl region: cmd_create_region: created 0 regions +> +> +And kernel log as below: +> +[ 60.536663] cxl_region region0: config state: 0 +> +[ 60.536675] cxl_region region0: probe: -6 +> +[ 60.536696] cxl_acpi ACPI0017:00: decoder0.0: created region0 +> +[ 60.538251] cxl region0: mem0:endpoint3 decoder3.0 add: +> +mem0:decoder3.0 @ 0 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.538278] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem0:decoder3.0 @ 0 next: mem0 nr_eps: 1 nr_targets: 1 +> +[ 60.538295] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem0:decoder3.0 @ 0 next: 0000:0d:00.0 nr_eps: 1 nr_targets: 1 +> +[ 60.538647] cxl region0: mem1:endpoint4 decoder4.0 add: +> +mem1:decoder4.0 @ 1 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.538663] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem1:decoder4.0 @ 1 next: mem1 nr_eps: 2 nr_targets: 2 +> +[ 60.538675] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem1:decoder4.0 @ 1 next: 0000:0d:00.0 nr_eps: 2 nr_targets: 1 +> +[ 60.539311] cxl region0: mem2:endpoint5 decoder5.0 add: +> +mem2:decoder5.0 @ 2 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.539332] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem2:decoder5.0 @ 2 next: mem2 nr_eps: 3 nr_targets: 3 +> +[ 60.539343] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem2:decoder5.0 @ 2 next: 0000:0d:00.0 nr_eps: 3 nr_targets: 1 +> +[ 60.539711] cxl region0: mem3:endpoint6 decoder6.0 add: +> +mem3:decoder6.0 @ 3 next: none nr_eps: 1 nr_targets: 1 +> +[ 60.539723] cxl region0: 0000:0d:00.0:port2 decoder2.0 add: +> +mem3:decoder6.0 @ 3 next: mem3 nr_eps: 4 nr_targets: 4 +> +[ 60.539735] cxl region0: ACPI0016:00:port1 decoder1.0 add: +> +mem3:decoder6.0 @ 3 next: 0000:0d:00.0 nr_eps: 4 nr_targets: 1 +> +[ 60.539742] cxl region0: ACPI0016:00:port1 iw: 1 ig: 256 +> +[ 60.539747] cxl region0: ACPI0016:00:port1 target[0] = 0000:0c:00.0 +> +for mem0:decoder3.0 @ 0 +> +[ 60.539754] cxl region0: 0000:0d:00.0:port2 iw: 4 ig: 512 +> +[ 60.539758] cxl region0: 0000:0d:00.0:port2 target[0] = +> +0000:0e:00.0 for mem0:decoder3.0 @ 0 +> +[ 60.539764] cxl region0: ACPI0016:00:port1: cannot host mem1:decoder4.0 at +> +1 +> +> +I have tried to write sysfs node manually, got same errors. +> +> +Hope I can get some helps here. +What is the output of: + + cxl list -MDTu -d decoder0.0 + +...? It might be the case that mem1 cannot be mapped by decoder0.0, or +at least not in the specified order, or that validation check is broken. + +Hi Dan, + +Thanks for your reply! + +On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote: +> +> +What is the output of: +> +> +cxl list -MDTu -d decoder0.0 +> +> +...? It might be the case that mem1 cannot be mapped by decoder0.0, or +> +at least not in the specified order, or that validation check is broken. +Command "cxl list -MDTu -d decoder0.0" output: + +[ + { + "memdevs":[ + { + "memdev":"mem2", + "pmem_size":"256.00 MiB (268.44 MB)", + "ram_size":0, + "serial":"0", + "host":"0000:11:00.0" + }, + { + "memdev":"mem1", + "pmem_size":"256.00 MiB (268.44 MB)", + "ram_size":0, + "serial":"0", + "host":"0000:10:00.0" + }, + { + "memdev":"mem0", + "pmem_size":"256.00 MiB (268.44 MB)", + "ram_size":0, + "serial":"0", + "host":"0000:0f:00.0" + }, + { + "memdev":"mem3", + "pmem_size":"256.00 MiB (268.44 MB)", + "ram_size":0, + "serial":"0", + "host":"0000:12:00.0" + } + ] + }, + { + "root decoders":[ + { + "decoder":"decoder0.0", + "resource":"0x10000000000", + "size":"4.00 GiB (4.29 GB)", + "pmem_capable":true, + "volatile_capable":true, + "accelmem_capable":true, + "nr_targets":1, + "targets":[ + { + "target":"ACPI0016:01", + "alias":"pci0000:0c", + "position":0, + "id":"0xc" + } + ] + } + ] + } +] + +Bobo WL wrote: +> +Hi Dan, +> +> +Thanks for your reply! +> +> +On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> wrote: +> +> +> +> What is the output of: +> +> +> +> cxl list -MDTu -d decoder0.0 +> +> +> +> ...? It might be the case that mem1 cannot be mapped by decoder0.0, or +> +> at least not in the specified order, or that validation check is broken. +> +> +Command "cxl list -MDTu -d decoder0.0" output: +Thanks for this, I think I know the problem, but will try some +experiments with cxl_test first. + +Did the commit_store() crash stop reproducing with latest cxl/preview +branch? + +On Tue, Aug 9, 2022 at 11:17 PM Dan Williams <dan.j.williams@intel.com> wrote: +> +> +Bobo WL wrote: +> +> Hi Dan, +> +> +> +> Thanks for your reply! +> +> +> +> On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> +> +> wrote: +> +> > +> +> > What is the output of: +> +> > +> +> > cxl list -MDTu -d decoder0.0 +> +> > +> +> > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or +> +> > at least not in the specified order, or that validation check is broken. +> +> +> +> Command "cxl list -MDTu -d decoder0.0" output: +> +> +Thanks for this, I think I know the problem, but will try some +> +experiments with cxl_test first. +> +> +Did the commit_store() crash stop reproducing with latest cxl/preview +> +branch? +No, still hitting this bug if don't add extra HB device in qemu + +Dan Williams wrote: +> +Bobo WL wrote: +> +> Hi Dan, +> +> +> +> Thanks for your reply! +> +> +> +> On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> +> +> wrote: +> +> > +> +> > What is the output of: +> +> > +> +> > cxl list -MDTu -d decoder0.0 +> +> > +> +> > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or +> +> > at least not in the specified order, or that validation check is broken. +> +> +> +> Command "cxl list -MDTu -d decoder0.0" output: +> +> +Thanks for this, I think I know the problem, but will try some +> +experiments with cxl_test first. +Hmm, so my cxl_test experiment unfortunately passed so I'm not +reproducing the failure mode. This is the result of creating x4 region +with devices directly attached to a single host-bridge: + +# cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30)) +{ + "region":"region8", + "resource":"0xf1f0000000", + "size":"1024.00 MiB (1073.74 MB)", + "interleave_ways":4, + "interleave_granularity":256, + "decode_state":"commit", + "mappings":[ + { + "position":3, + "memdev":"mem11", + "decoder":"decoder21.0" + }, + { + "position":2, + "memdev":"mem9", + "decoder":"decoder19.0" + }, + { + "position":1, + "memdev":"mem10", + "decoder":"decoder20.0" + }, + { + "position":0, + "memdev":"mem12", + "decoder":"decoder22.0" + } + ] +} +cxl region: cmd_create_region: created 1 region + +> +Did the commit_store() crash stop reproducing with latest cxl/preview +> +branch? +I missed the answer to this question. + +All of these changes are now in Linus' tree perhaps give that a try and +post the debug log again? + +On Thu, 11 Aug 2022 17:46:55 -0700 +Dan Williams <dan.j.williams@intel.com> wrote: + +> +Dan Williams wrote: +> +> Bobo WL wrote: +> +> > Hi Dan, +> +> > +> +> > Thanks for your reply! +> +> > +> +> > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> +> +> > wrote: +> +> > > +> +> > > What is the output of: +> +> > > +> +> > > cxl list -MDTu -d decoder0.0 +> +> > > +> +> > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or +> +> > > at least not in the specified order, or that validation check is +> +> > > broken. +> +> > +> +> > Command "cxl list -MDTu -d decoder0.0" output: +> +> +> +> Thanks for this, I think I know the problem, but will try some +> +> experiments with cxl_test first. +> +> +Hmm, so my cxl_test experiment unfortunately passed so I'm not +> +reproducing the failure mode. This is the result of creating x4 region +> +with devices directly attached to a single host-bridge: +> +> +# cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s $((1<<30)) +> +{ +> +"region":"region8", +> +"resource":"0xf1f0000000", +> +"size":"1024.00 MiB (1073.74 MB)", +> +"interleave_ways":4, +> +"interleave_granularity":256, +> +"decode_state":"commit", +> +"mappings":[ +> +{ +> +"position":3, +> +"memdev":"mem11", +> +"decoder":"decoder21.0" +> +}, +> +{ +> +"position":2, +> +"memdev":"mem9", +> +"decoder":"decoder19.0" +> +}, +> +{ +> +"position":1, +> +"memdev":"mem10", +> +"decoder":"decoder20.0" +> +}, +> +{ +> +"position":0, +> +"memdev":"mem12", +> +"decoder":"decoder22.0" +> +} +> +] +> +} +> +cxl region: cmd_create_region: created 1 region +> +> +> Did the commit_store() crash stop reproducing with latest cxl/preview +> +> branch? +> +> +I missed the answer to this question. +> +> +All of these changes are now in Linus' tree perhaps give that a try and +> +post the debug log again? +Hi Dan, + +I've moved onto looking at this one. +1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy that +up +at some stage), 1 switch, 4 downstream switch ports each with a type 3 + +I'm not getting a crash, but can't successfully setup a region. +Upon adding the final target +It's failing in check_last_peer() as pos < distance. +Seems distance is 4 which makes me think it's using the wrong level of the +heirarchy for +some reason or that distance check is wrong. +Wasn't a good idea to just skip that step though as it goes boom - though +stack trace is not useful. + +Jonathan + +On Wed, 17 Aug 2022 17:16:19 +0100 +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: + +> +On Thu, 11 Aug 2022 17:46:55 -0700 +> +Dan Williams <dan.j.williams@intel.com> wrote: +> +> +> Dan Williams wrote: +> +> > Bobo WL wrote: +> +> > > Hi Dan, +> +> > > +> +> > > Thanks for your reply! +> +> > > +> +> > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams <dan.j.williams@intel.com> +> +> > > wrote: +> +> > > > +> +> > > > What is the output of: +> +> > > > +> +> > > > cxl list -MDTu -d decoder0.0 +> +> > > > +> +> > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, or +> +> > > > at least not in the specified order, or that validation check is +> +> > > > broken. +> +> > > +> +> > > Command "cxl list -MDTu -d decoder0.0" output: +> +> > +> +> > Thanks for this, I think I know the problem, but will try some +> +> > experiments with cxl_test first. +> +> +> +> Hmm, so my cxl_test experiment unfortunately passed so I'm not +> +> reproducing the failure mode. This is the result of creating x4 region +> +> with devices directly attached to a single host-bridge: +> +> +> +> # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s +> +> $((1<<30)) +> +> { +> +> "region":"region8", +> +> "resource":"0xf1f0000000", +> +> "size":"1024.00 MiB (1073.74 MB)", +> +> "interleave_ways":4, +> +> "interleave_granularity":256, +> +> "decode_state":"commit", +> +> "mappings":[ +> +> { +> +> "position":3, +> +> "memdev":"mem11", +> +> "decoder":"decoder21.0" +> +> }, +> +> { +> +> "position":2, +> +> "memdev":"mem9", +> +> "decoder":"decoder19.0" +> +> }, +> +> { +> +> "position":1, +> +> "memdev":"mem10", +> +> "decoder":"decoder20.0" +> +> }, +> +> { +> +> "position":0, +> +> "memdev":"mem12", +> +> "decoder":"decoder22.0" +> +> } +> +> ] +> +> } +> +> cxl region: cmd_create_region: created 1 region +> +> +> +> > Did the commit_store() crash stop reproducing with latest cxl/preview +> +> > branch? +> +> +> +> I missed the answer to this question. +> +> +> +> All of these changes are now in Linus' tree perhaps give that a try and +> +> post the debug log again? +> +> +Hi Dan, +> +> +I've moved onto looking at this one. +> +1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy +> +that up +> +at some stage), 1 switch, 4 downstream switch ports each with a type 3 +> +> +I'm not getting a crash, but can't successfully setup a region. +> +Upon adding the final target +> +It's failing in check_last_peer() as pos < distance. +> +Seems distance is 4 which makes me think it's using the wrong level of the +> +heirarchy for +> +some reason or that distance check is wrong. +> +Wasn't a good idea to just skip that step though as it goes boom - though +> +stack trace is not useful. +Turns out really weird corruption happens if you accidentally back two type3 +devices +with the same memory device. Who would have thought it :) + +That aside ignoring the check_last_peer() failure seems to make everything work +for this +topology. I'm not seeing the crash, so my guess is we fixed it somewhere along +the way. + +Now for the fun one. I've replicated the crash if we have + +1HB 1*RP 1SW, 4SW-DSP, 4Type3 + +Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be +programmed +but the null pointer dereference isn't related to that. + +The bug is straight forward. Not all decoders have commit callbacks... Will +send out +a possible fix shortly. + +Jonathan + + + +> +> +Jonathan +> +> +> +> +> +> + +On Thu, 18 Aug 2022 17:37:40 +0100 +Jonathan Cameron via <qemu-devel@nongnu.org> wrote: + +> +On Wed, 17 Aug 2022 17:16:19 +0100 +> +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> On Thu, 11 Aug 2022 17:46:55 -0700 +> +> Dan Williams <dan.j.williams@intel.com> wrote: +> +> +> +> > Dan Williams wrote: +> +> > > Bobo WL wrote: +> +> > > > Hi Dan, +> +> > > > +> +> > > > Thanks for your reply! +> +> > > > +> +> > > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams +> +> > > > <dan.j.williams@intel.com> wrote: +> +> > > > > +> +> > > > > What is the output of: +> +> > > > > +> +> > > > > cxl list -MDTu -d decoder0.0 +> +> > > > > +> +> > > > > ...? It might be the case that mem1 cannot be mapped by decoder0.0, +> +> > > > > or +> +> > > > > at least not in the specified order, or that validation check is +> +> > > > > broken. +> +> > > > +> +> > > > Command "cxl list -MDTu -d decoder0.0" output: +> +> > > +> +> > > Thanks for this, I think I know the problem, but will try some +> +> > > experiments with cxl_test first. +> +> > +> +> > Hmm, so my cxl_test experiment unfortunately passed so I'm not +> +> > reproducing the failure mode. This is the result of creating x4 region +> +> > with devices directly attached to a single host-bridge: +> +> > +> +> > # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s +> +> > $((1<<30)) +> +> > { +> +> > "region":"region8", +> +> > "resource":"0xf1f0000000", +> +> > "size":"1024.00 MiB (1073.74 MB)", +> +> > "interleave_ways":4, +> +> > "interleave_granularity":256, +> +> > "decode_state":"commit", +> +> > "mappings":[ +> +> > { +> +> > "position":3, +> +> > "memdev":"mem11", +> +> > "decoder":"decoder21.0" +> +> > }, +> +> > { +> +> > "position":2, +> +> > "memdev":"mem9", +> +> > "decoder":"decoder19.0" +> +> > }, +> +> > { +> +> > "position":1, +> +> > "memdev":"mem10", +> +> > "decoder":"decoder20.0" +> +> > }, +> +> > { +> +> > "position":0, +> +> > "memdev":"mem12", +> +> > "decoder":"decoder22.0" +> +> > } +> +> > ] +> +> > } +> +> > cxl region: cmd_create_region: created 1 region +> +> > +> +> > > Did the commit_store() crash stop reproducing with latest cxl/preview +> +> > > branch? +> +> > +> +> > I missed the answer to this question. +> +> > +> +> > All of these changes are now in Linus' tree perhaps give that a try and +> +> > post the debug log again? +> +> +> +> Hi Dan, +> +> +> +> I've moved onto looking at this one. +> +> 1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy +> +> that up +> +> at some stage), 1 switch, 4 downstream switch ports each with a type 3 +> +> +> +> I'm not getting a crash, but can't successfully setup a region. +> +> Upon adding the final target +> +> It's failing in check_last_peer() as pos < distance. +> +> Seems distance is 4 which makes me think it's using the wrong level of the +> +> heirarchy for +> +> some reason or that distance check is wrong. +> +> Wasn't a good idea to just skip that step though as it goes boom - though +> +> stack trace is not useful. +> +> +Turns out really weird corruption happens if you accidentally back two type3 +> +devices +> +with the same memory device. Who would have thought it :) +> +> +That aside ignoring the check_last_peer() failure seems to make everything +> +work for this +> +topology. I'm not seeing the crash, so my guess is we fixed it somewhere +> +along the way. +> +> +Now for the fun one. I've replicated the crash if we have +> +> +1HB 1*RP 1SW, 4SW-DSP, 4Type3 +> +> +Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be +> +programmed +> +but the null pointer dereference isn't related to that. +> +> +The bug is straight forward. Not all decoders have commit callbacks... Will +> +send out +> +a possible fix shortly. +> +For completeness I'm carrying this hack because I haven't gotten my head +around the right fix for check_last_peer() failing on this test topology. + +diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c +index c49d9a5f1091..275e143bd748 100644 +--- a/drivers/cxl/core/region.c ++++ b/drivers/cxl/core/region.c +@@ -978,7 +978,7 @@ static int cxl_port_setup_targets(struct cxl_port *port, + rc = check_last_peer(cxled, ep, cxl_rr, + distance); + if (rc) +- return rc; ++ // return rc; + goto out_target_set; + } + goto add_target; +-- + +I might find more bugs with more testing, but this is all the ones I've +seen so far + in Bobo's reports. Qemu fixes are now in upstream so +will be there in the release. + +As a reminder, testing on QEMU has a few corners... + +Need a patch to add serial number ECAP support. It is on list for revew, +but will have wait for after QEMU 7.1 release (which may be next week) + +QEMU still assumes HDM decoder on the host bridge will be programmed. +So if you want anything to work there should be at least +2 RP below the HB (no need to plug anything in to one of them). + +I don't want to add a commandline parameter to hide the decoder in QEMU +and detecting there is only one RP would require moving a bunch of static +stuff into runtime code (I think). + +I still think we should make the kernel check to see if there is a decoder, +but if not I might see how bad a hack it is to have QEMU ignore that decoder +if not committed in this one special case (HB HDM decoder with only one place +it can send stuff). Obviously that would be a break from specification +so less than idea! + +Thanks, + +Jonathan + +On Fri, 19 Aug 2022 09:46:55 +0100 +Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: + +> +On Thu, 18 Aug 2022 17:37:40 +0100 +> +Jonathan Cameron via <qemu-devel@nongnu.org> wrote: +> +> +> On Wed, 17 Aug 2022 17:16:19 +0100 +> +> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote: +> +> +> +> > On Thu, 11 Aug 2022 17:46:55 -0700 +> +> > Dan Williams <dan.j.williams@intel.com> wrote: +> +> > +> +> > > Dan Williams wrote: +> +> > > > Bobo WL wrote: +> +> > > > > Hi Dan, +> +> > > > > +> +> > > > > Thanks for your reply! +> +> > > > > +> +> > > > > On Mon, Aug 8, 2022 at 11:58 PM Dan Williams +> +> > > > > <dan.j.williams@intel.com> wrote: +> +> > > > > > +> +> > > > > > What is the output of: +> +> > > > > > +> +> > > > > > cxl list -MDTu -d decoder0.0 +> +> > > > > > +> +> > > > > > ...? It might be the case that mem1 cannot be mapped by +> +> > > > > > decoder0.0, or +> +> > > > > > at least not in the specified order, or that validation check is +> +> > > > > > broken. +> +> > > > > +> +> > > > > Command "cxl list -MDTu -d decoder0.0" output: +> +> > > > +> +> > > > Thanks for this, I think I know the problem, but will try some +> +> > > > experiments with cxl_test first. +> +> > > +> +> > > Hmm, so my cxl_test experiment unfortunately passed so I'm not +> +> > > reproducing the failure mode. This is the result of creating x4 region +> +> > > with devices directly attached to a single host-bridge: +> +> > > +> +> > > # cxl create-region -d decoder3.5 -w 4 -m -g 256 mem{12,10,9,11} -s +> +> > > $((1<<30)) +> +> > > { +> +> > > "region":"region8", +> +> > > "resource":"0xf1f0000000", +> +> > > "size":"1024.00 MiB (1073.74 MB)", +> +> > > "interleave_ways":4, +> +> > > "interleave_granularity":256, +> +> > > "decode_state":"commit", +> +> > > "mappings":[ +> +> > > { +> +> > > "position":3, +> +> > > "memdev":"mem11", +> +> > > "decoder":"decoder21.0" +> +> > > }, +> +> > > { +> +> > > "position":2, +> +> > > "memdev":"mem9", +> +> > > "decoder":"decoder19.0" +> +> > > }, +> +> > > { +> +> > > "position":1, +> +> > > "memdev":"mem10", +> +> > > "decoder":"decoder20.0" +> +> > > }, +> +> > > { +> +> > > "position":0, +> +> > > "memdev":"mem12", +> +> > > "decoder":"decoder22.0" +> +> > > } +> +> > > ] +> +> > > } +> +> > > cxl region: cmd_create_region: created 1 region +> +> > > +> +> > > > Did the commit_store() crash stop reproducing with latest cxl/preview +> +> > > > branch? +> +> > > +> +> > > I missed the answer to this question. +> +> > > +> +> > > All of these changes are now in Linus' tree perhaps give that a try and +> +> > > post the debug log again? +> +> > +> +> > Hi Dan, +> +> > +> +> > I've moved onto looking at this one. +> +> > 1 HB, 2RP (to make it configure the HDM decoder in the QEMU HB, I'll tidy +> +> > that up +> +> > at some stage), 1 switch, 4 downstream switch ports each with a type 3 +> +> > +> +> > I'm not getting a crash, but can't successfully setup a region. +> +> > Upon adding the final target +> +> > It's failing in check_last_peer() as pos < distance. +> +> > Seems distance is 4 which makes me think it's using the wrong level of +> +> > the heirarchy for +> +> > some reason or that distance check is wrong. +> +> > Wasn't a good idea to just skip that step though as it goes boom - though +> +> > stack trace is not useful. +> +> +> +> Turns out really weird corruption happens if you accidentally back two +> +> type3 devices +> +> with the same memory device. Who would have thought it :) +> +> +> +> That aside ignoring the check_last_peer() failure seems to make everything +> +> work for this +> +> topology. I'm not seeing the crash, so my guess is we fixed it somewhere +> +> along the way. +> +> +> +> Now for the fun one. I've replicated the crash if we have +> +> +> +> 1HB 1*RP 1SW, 4SW-DSP, 4Type3 +> +> +> +> Now, I'd expect to see it not 'work' because the QEMU HDM decoder won't be +> +> programmed +> +> but the null pointer dereference isn't related to that. +> +> +> +> The bug is straight forward. Not all decoders have commit callbacks... +> +> Will send out +> +> a possible fix shortly. +> +> +> +For completeness I'm carrying this hack because I haven't gotten my head +> +around the right fix for check_last_peer() failing on this test topology. +> +> +diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c +> +index c49d9a5f1091..275e143bd748 100644 +> +--- a/drivers/cxl/core/region.c +> ++++ b/drivers/cxl/core/region.c +> +@@ -978,7 +978,7 @@ static int cxl_port_setup_targets(struct cxl_port *port, +> +rc = check_last_peer(cxled, ep, cxl_rr, +> +distance); +> +if (rc) +> +- return rc; +> ++ // return rc; +> +goto out_target_set; +> +} +> +goto add_target; +I'm still carrying this hack and still haven't worked out the right fix. + +Suggestions welcome! If not I'll hopefully get some time on this +towards the end of the week. + +Jonathan + |