TCG: 0.923 mistranslation: 0.922 performance: 0.919 device: 0.917 debug: 0.901 graphic: 0.901 other: 0.898 PID: 0.896 permissions: 0.892 semantic: 0.890 architecture: 0.888 assembly: 0.886 socket: 0.884 register: 0.881 vnc: 0.881 risc-v: 0.880 arm: 0.869 network: 0.865 files: 0.861 boot: 0.860 x86: 0.828 kernel virtual machine: 0.824

[BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device

When I start qemu with a second virtio-net-ccw device (i.e. adding
-device virtio-net-ccw in addition to the autogenerated device), I get
a segfault. gdb points to

#0 0x000055d6ab52681d in virtio_net_get_config (vdev=,
   config=0x55d6ad9e3f80 "RT") at
   /home/cohuck/git/qemu/hw/net/virtio-net.c:146
146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {

(backtrace doesn't go further)

Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
the autogenerated virtio-net-ccw device is present) works. Specifying
several "-device virtio-net-pci" works as well.

Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
client"); 38140cc4d971 ("vhost_net: introduce set_config & get_config")
works (in-between state does not compile).

This is reproducible with tcg as well. Same problem both with
--enable-vhost-vdpa and --disable-vhost-vdpa.

Have not yet tried to figure out what might be special with
virtio-ccw... anyone have an idea?

[This should probably be considered a blocker?]

On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
> Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
> client"); 38140cc4d971 ("vhost_net: introduce set_config & get_config")
> works (in-between state does not compile).

Ouch. I didn't test all in-between states :(
But I wish we had a 0-day infrastructure like the kernel has,
that catches things like that.

On Fri, 24 Jul 2020 09:30:58 -0400 "Michael S. Tsirkin" wrote:
> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
> > #0 0x000055d6ab52681d in virtio_net_get_config (vdev=,
> >    config=0x55d6ad9e3f80 "RT") at
> >    /home/cohuck/git/qemu/hw/net/virtio-net.c:146
> > 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
> >
> > (backtrace doesn't go further)

The core was incomplete, but running under gdb directly shows that it
is just a bog-standard config space access (first for that device).

The cause of the crash is that nc->peer is not set... no idea how that
can happen, not that familiar with that part of QEMU. (Should the code
check, or is that really something that should not happen?)
What I don't understand is why it is set correctly for the first,
autogenerated virtio-net-ccw device, but not for the second one, and
why virtio-net-pci doesn't show these problems. The only difference
between -ccw and -pci that comes to my mind here is that config space
accesses for ccw are done via an asynchronous operation, so timing
might be different.

> > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
> > client"); 38140cc4d971 ("vhost_net: introduce set_config & get_config")
> > works (in-between state does not compile).
>
> Ouch. I didn't test all in-between states :(
> But I wish we had a 0-day infrastructure like the kernel has,
> that catches things like that.

Yep, that would be useful... so patchew only builds the complete series?

> > [This should probably be considered a blocker?]

I think so, as it makes s390x unusable with more than one
virtio-net-ccw device, and I don't even see a workaround.

On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
> The cause of the crash is that nc->peer is not set... no idea how that
> can happen, not that familiar with that part of QEMU. (Should the code
> check, or is that really something that should not happen?)
>
> What I don't understand is why it is set correctly for the first,
> autogenerated virtio-net-ccw device, but not for the second one, and
> why virtio-net-pci doesn't show these problems. The only difference
> between -ccw and -pci that comes to my mind here is that config space
> accesses for ccw are done via an asynchronous operation, so timing
> might be different.

Hopefully Jason has an idea. Could you post a full command line
please? Do you need a working guest to trigger this? Does this trigger
on an x86 host?

On Fri, 24 Jul 2020 11:17:57 -0400 "Michael S. Tsirkin" wrote:
> Hopefully Jason has an idea. Could you post a full command line
> please? Do you need a working guest to trigger this? Does this trigger
> on an x86 host?

Yes, it does trigger with tcg-on-x86 as well.
I've been using

s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
-device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
-device virtio-net-ccw

It seems it needs the guest actually doing something with the nics; I
cannot reproduce the crash if I use the old advent calendar moon buggy
image and just add a virtio-net-ccw device.

(I don't think it's a problem with my local build, as I see the problem
both on my laptop and on an LPAR.)

On 2020/7/24 11:34 PM, Cornelia Huck wrote:
> It seems it needs the guest actually doing something with the nics; I
> cannot reproduce the crash if I use the old advent calendar moon buggy
> image and just add a virtio-net-ccw device.
>
> (I don't think it's a problem with my local build, as I see the problem
> both on my laptop and on an LPAR.)

It looks to me like we forgot to check the existence of peer.

Please try the attached patch to see if it works.

Thanks

Attachment: 0001-virtio-net-check-the-existence-of-peer-before-accesi.patch

On Sat, 25 Jul 2020 08:40:07 +0800 Jason Wang wrote:
> It looks to me like we forgot to check the existence of peer.
>
> Please try the attached patch to see if it works.

Thanks, that patch gets my guest up and running again. So, FWIW,

Tested-by: Cornelia Huck

Any idea why this did not hit with virtio-net-pci (or the autogenerated
virtio-net-ccw device)?

On 2020/7/27 2:43 PM, Cornelia Huck wrote:
> Any idea why this did not hit with virtio-net-pci (or the autogenerated
> virtio-net-ccw device)?

It can be hit with virtio-net-pci as well (just start without peer).

For the autogenerated virtio-net-ccw, I think the reason is that it
already has a peer set.

Thanks

On Mon, 27 Jul 2020 15:38:12 +0800 Jason Wang wrote:
> It can be hit with virtio-net-pci as well (just start without peer).

Hm, I had not been able to reproduce the crash with a 'naked' -device
virtio-net-pci. But checking seems to be the right idea anyway.

> For the autogenerated virtio-net-ccw, I think the reason is that it
> already has a peer set.

Ok, that might well be.
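
The missing check discussed above can be illustrated with a small standalone sketch. This is not the attached patch (its contents are not shown in this thread), and it does not use QEMU's real types: MockDriverType, MockNetClientInfo, MockNetClient and mock_peer_is_vdpa() are hypothetical stand-ins, assumed purely for illustration. The point it tries to show is that testing the peer pointer before dereferencing it turns the "no peer" case into a plain "not vDPA" answer instead of a NULL dereference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's net client types (not the real ones). */
typedef enum {
    MOCK_DRIVER_TAP,
    MOCK_DRIVER_VHOST_VDPA
} MockDriverType;

typedef struct MockNetClientInfo {
    MockDriverType type;
} MockNetClientInfo;

typedef struct MockNetClient {
    const MockNetClientInfo *info;
    struct MockNetClient *peer; /* NULL when no backend is attached */
} MockNetClient;

/*
 * Returns true only if the NIC has a peer and that peer is a vhost-vdpa
 * backend. The peer pointer is tested first, so a NIC without a peer is
 * simply reported as "not vDPA".
 */
bool mock_peer_is_vdpa(const MockNetClient *nc)
{
    assert(nc != NULL);
    return nc->peer != NULL &&
           nc->peer->info->type == MOCK_DRIVER_VHOST_VDPA;
}
```

For a NIC whose peer is NULL, mock_peer_is_vdpa() returns false. The unguarded form from line 146 of the backtrace, nc->peer->info->type, would dereference NULL in that same situation.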
On 2020/7/27 4:41 PM, Cornelia Huck wrote:
> Hm, I had not been able to reproduce the crash with a 'naked' -device
> virtio-net-pci. But checking seems to be the right idea anyway.

Sorry for being unclear, I meant that for the networking part, you just
need to start without a peer, and you need a real guest (any Linux) that
is trying to access the config space of virtio-net.

Thanks

On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
> Sorry for being unclear, I meant that for the networking part, you just
> need to start without a peer, and you need a real guest (any Linux) that
> is trying to access the config space of virtio-net.

A pxe guest will do it, but that doesn't support ccw, right?
I'm still unclear why this triggers with ccw but not pci - any idea?

> > > For the autogenerated virtio-net-ccw, I think the reason is that it
> > > already has a peer set.
> > Ok, that might well be.

On 2020/7/27 7:43 PM, Michael S. Tsirkin wrote:
> On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
> > Sorry for being unclear, I meant for networking part, you just need to
> > start without a peer, and you need a real guest (any Linux) that is
> > trying to access the config space of virtio-net.
> >
> > Thanks
> A pxe guest will do it, but that doesn't support ccw, right?

Yes, it depends on the cli actually.

> I'm still unclear why this triggers with ccw but not pci - any idea?

I didn't test pxe but I can reproduce this with pci (just start a Linux
guest without a peer).

Thanks

On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote: > > On 2020/7/27 7:43 PM, Michael S.
Tsirkin wrote: > > On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: > > > On 2020/7/27 4:41 PM, Cornelia Huck wrote: > > > > On Mon, 27 Jul 2020 15:38:12 +0800 > > > > Jason Wang wrote: > > > > > > > > > On 2020/7/27 2:43 PM, Cornelia Huck wrote: > > > > > > On Sat, 25 Jul 2020 08:40:07 +0800 > > > > > > Jason Wang wrote: > > > > > > > On 2020/7/24 11:34 PM, Cornelia Huck wrote: > > > > > > > > On Fri, 24 Jul 2020 11:17:57 -0400 > > > > > > > > "Michael S. Tsirkin" wrote: > > > > > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: > > > > > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400 > > > > > > > > > > "Michael S. Tsirkin" wrote: > > > > > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck > > > > > > > > > > > wrote: > > > > > > > > > > > > When I start qemu with a second virtio-net-ccw device > > > > > > > > > > > > (i.e. adding > > > > > > > > > > > > -device virtio-net-ccw in addition to the autogenerated > > > > > > > > > > > > device), I get > > > > > > > > > > > > a segfault. gdb points to > > > > > > > > > > > > > > > > > > > > > > > > #0 0x000055d6ab52681d in virtio_net_get_config > > > > > > > > > > > > (vdev=, > > > > > > > > > > > > config=0x55d6ad9e3f80 "RT") at > > > > > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 > > > > > > > > > > > > 146 if (nc->peer->info->type == > > > > > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) { > > > > > > > > > > > > > > > > > > > > > > > > (backtrace doesn't go further) > > > > > > > > > > The core was incomplete, but running under gdb directly > > > > > > > > > > shows that it > > > > > > > > > > is just a bog-standard config space access (first for that > > > > > > > > > > device). > > > > > > > > > > > > > > > > > > > > The cause of the crash is that nc->peer is not set... no > > > > > > > > > > idea how that > > > > > > > > > > can happen, not that familiar with that part of QEMU.
> > > > > > > > > > (Should the code > > > > > > > > > > check, or is that really something that should not happen?) > > > > > > > > > > > > > > > > > > > > What I don't understand is why it is set correctly for the > > > > > > > > > > first, > > > > > > > > > > autogenerated virtio-net-ccw device, but not for the second > > > > > > > > > > one, and > > > > > > > > > > why virtio-net-pci doesn't show these problems. The only > > > > > > > > > > difference > > > > > > > > > > between -ccw and -pci that comes to my mind here is that > > > > > > > > > > config space > > > > > > > > > > accesses for ccw are done via an asynchronous operation, so > > > > > > > > > > timing > > > > > > > > > > might be different. > > > > > > > > > Hopefully Jason has an idea. Could you post a full command > > > > > > > > > line > > > > > > > > > please? Do you need a working guest to trigger this? Does > > > > > > > > > this trigger > > > > > > > > > on an x86 host? > > > > > > > > Yes, it does trigger with tcg-on-x86 as well. I've been using > > > > > > > > > > > > > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg > > > > > > > > -cpu qemu,zpci=on > > > > > > > > -m 1024 -nographic -device > > > > > > > > virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 > > > > > > > > -drive > > > > > > > > file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 > > > > > > > > -device > > > > > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 > > > > > > > > -device virtio-net-ccw > > > > > > > > > > > > > > > > It seems it needs the guest actually doing something with the > > > > > > > > nics; I > > > > > > > > cannot reproduce the crash if I use the old advent calendar > > > > > > > > moon buggy > > > > > > > > image and just add a virtio-net-ccw device. > > > > > > > > > > > > > > > > (I don't think it's a problem with my local build, as I see the > > > > > > > > problem > > > > > > > > both on my laptop and on an LPAR.) 
> > > > > > > It looks to me like we forgot to check the existence of the peer.
> > > > > > >
> > > > > > > Please try the attached patch to see if it works.
> > > > > > Thanks, that patch gets my guest up and running again. So, FWIW,
> > > > > >
> > > > > > Tested-by: Cornelia Huck
> > > > > >
> > > > > > Any idea why this did not hit with virtio-net-pci (or the
> > > > > > autogenerated virtio-net-ccw device)?
> > > > > It can be hit with virtio-net-pci as well (just start without peer).
> > > > Hm, I had not been able to reproduce the crash with a 'naked' -device
> > > > virtio-net-pci. But checking seems to be the right idea anyway.
> > > Sorry for being unclear, I meant for networking part, you just need to
> > > start without a peer, and you need a real guest (any Linux) that is
> > > trying to access the config space of virtio-net.
> > >
> > > Thanks
> > A pxe guest will do it, but that doesn't support ccw, right?
>
> Yes, it depends on the cli actually.
>
> > I'm still unclear why this triggers with ccw but not pci - any idea?
>
> I didn't test pxe but I can reproduce this with pci (just start a Linux
> guest without a peer).
>
> Thanks

Might be a good addition to a unit test.
Not sure what the test would do exactly: just make sure the guest runs?
Looks like a lot of work for an empty test ... maybe we can poke at the
guest config with qtest commands at least.

--
MST

On 2020/7/27 9:16 PM, Michael S. Tsirkin wrote:
> On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote:
> > I didn't test pxe but I can reproduce this with pci (just start a Linux
> > guest without a peer).
> >
> > Thanks
>
> Might be a good addition to a unit test.
> Not sure what the test would do exactly: just make sure the guest runs?
> Looks like a lot of work for an empty test ... maybe we can poke at the
> guest config with qtest commands at least.

That should work, or we can simply extend the existing virtio-net qtest to
do that.

Thanks
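
For reference, the crash discussed in this thread is a NULL dereference of nc->peer in virtio_net_get_config() when a virtio-net device is started without a backend. The attached patch itself is not reproduced here, so what follows is only a minimal, self-contained sketch of the kind of check being discussed. The Demo* types and demo_* functions are hypothetical stand-ins invented for illustration; they are not QEMU's definitions, and the real fix may look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for QEMU's net client types (not the real ones). */
typedef enum {
    DEMO_DRIVER_TAP,
    DEMO_DRIVER_VHOST_VDPA,
} DemoDriverType;

typedef struct DemoNetClientInfo {
    DemoDriverType type;
} DemoNetClientInfo;

typedef struct DemoNetClientState {
    const DemoNetClientInfo *info;
    struct DemoNetClientState *peer; /* NULL when no backend is attached */
} DemoNetClientState;

/* True only if a peer exists and it is a vhost-vdpa backend. */
bool demo_peer_is_vhost_vdpa(const DemoNetClientState *nc)
{
    return nc->peer != NULL &&
           nc->peer->info != NULL &&
           nc->peer->info->type == DEMO_DRIVER_VHOST_VDPA;
}

/*
 * Sketch of a get_config handler: always fill the buffer from device-local
 * state, and only take the backend path when a vhost-vdpa peer is present.
 * Returns true if the backend path was taken.
 */
bool demo_get_config(const DemoNetClientState *nc, uint8_t *config, size_t len)
{
    assert(nc != NULL && config != NULL);
    memset(config, 0, len); /* placeholder for the device-local config */

    if (!demo_peer_is_vhost_vdpa(nc)) {
        return false; /* no peer, or peer is not vdpa: nothing more to do */
    }
    /* Placeholder: a real device would query the backend's config here. */
    return true;
}
```

With a check of this shape, a device created with a bare -device virtio-net-ccw (no peer) takes the early-return path on its first config space access instead of dereferencing a NULL pointer.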