diff options
Diffstat (limited to 'results/classifier/009/performance')
| -rw-r--r-- | results/classifier/009/performance/79834768 | 419 | ||||
| -rw-r--r-- | results/classifier/009/performance/80604314 | 1490 |
2 files changed, 1909 insertions, 0 deletions
diff --git a/results/classifier/009/performance/79834768 b/results/classifier/009/performance/79834768 new file mode 100644 index 000000000..95c9f99ec --- /dev/null +++ b/results/classifier/009/performance/79834768 @@ -0,0 +1,419 @@ +performance: 0.952 +debug: 0.943 +other: 0.943 +permissions: 0.939 +graphic: 0.933 +semantic: 0.920 +PID: 0.916 +device: 0.915 +socket: 0.912 +files: 0.904 +vnc: 0.885 +boot: 0.880 +KVM: 0.840 +network: 0.830 + +[Qemu-devel] [BUG] Windows 7 got stuck easily while run PCMark10 application + +Hiï¼ + +We hit a bug in our test while run PCMark 10 in a windows 7 VM, +The VM got stuck and the wallclock was hang after several minutes running +PCMark 10 in it. +It is quite easily to reproduce the bug with the upstream KVM and Qemu. + +We found that KVM can not inject any RTC irq to VM after it was hang, it fails +to +Deliver irq in ioapic_set_irq() because RTC irq is still pending in ioapic->irr. + +static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq, + int irq_level, bool line_status) +{ +⦠⦠+ if (!irq_level) { + ioapic->irr &= ~mask; + ret = 1; + goto out; + } +⦠⦠+ if ((edge && old_irr == ioapic->irr) || + (!edge && entry.fields.remote_irr)) { + ret = 0; + goto out; + } + +According to RTC spec, after RTC injects a High level irq, OS will read CMOSâs +register C to to clear the irq flag, and pull down the irq electric pin. + +For Qemu, we will emulate the reading operation in cmos_ioport_read(), +but Guest OS will fire a write operation before to tell which register will be +read +after this write, where we use s->cmos_index to record the following register +to read. + +But in our test, we found that there is a possible situation that Vcpu fails to +read +RTC_REG_C to clear irq, This could happens while two VCpus are writing/reading +registers at the same time, for example, vcpu 0 is trying to read RTC_REG_C, +so it write RTC_REG_C first, where the s->cmos_index will be RTC_REG_C, +but before it tries to read register C, another vcpu1 is going to read RTC_YEAR, +it changes s->cmos_index to RTC_YEAR by a writing action. +The next operation of vcpu0 will be lead to read RTC_YEAR, In this case, we +will miss +calling qemu_irq_lower(s->irq) to clear the irq. After this, kvm will never +inject RTC irq, +and Windows VM will hang. +static void cmos_ioport_write(void *opaque, hwaddr addr, + uint64_t data, unsigned size) +{ + RTCState *s = opaque; + + if ((addr & 1) == 0) { + s->cmos_index = data & 0x7f; + } +â¦â¦ +static uint64_t cmos_ioport_read(void *opaque, hwaddr addr, + unsigned size) +{ + RTCState *s = opaque; + int ret; + if ((addr & 1) == 0) { + return 0xff; + } else { + switch(s->cmos_index) { + +According to CMOS spec, âany write to PROT 0070h should be followed by an +action to PROT 0071h or the RTC +Will be RTC will be left in an unknown stateâ, but it seems that we can not +ensure this sequence in qemu/kvm. + +Any ideas ? + +Thanks, +Hailiang + +Pls see the trace of kvm_pio: + + CPU 1/KVM-15567 [003] .... 209311.762579: kvm_pio: pio_read at 0x70 size +1 count 1 val 0xff + CPU 1/KVM-15567 [003] .... 209311.762582: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x89 + CPU 1/KVM-15567 [003] .... 209311.762590: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x17 + CPU 0/KVM-15566 [005] .... 209311.762611: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0xc + CPU 1/KVM-15567 [003] .... 209311.762615: kvm_pio: pio_read at 0x70 size +1 count 1 val 0xff + CPU 1/KVM-15567 [003] .... 209311.762619: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x88 + CPU 1/KVM-15567 [003] .... 209311.762627: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x12 + CPU 0/KVM-15566 [005] .... 209311.762632: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x12 + CPU 1/KVM-15567 [003] .... 209311.762633: kvm_pio: pio_read at 0x70 size +1 count 1 val 0xff + CPU 0/KVM-15566 [005] .... 209311.762634: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0xc <--- Firstly write to 0x70, cmo_index = 0xc & +0x7f = 0xc + CPU 1/KVM-15567 [003] .... 209311.762636: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x86 <-- Secondly write to 0x70, cmo_index = 0x86 & +0x7f = 0x6, cover the cmo_index result of first time + CPU 0/KVM-15566 [005] .... 209311.762641: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x6 <-- vcpu0 read 0x6 because cmo_index is 0x6 now + CPU 1/KVM-15567 [003] .... 209311.762644: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x6 <- vcpu1 read 0x6 + CPU 1/KVM-15567 [003] .... 209311.762649: kvm_pio: pio_read at 0x70 size +1 count 1 val 0xff + CPU 1/KVM-15567 [003] .... 209311.762669: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x87 + CPU 1/KVM-15567 [003] .... 209311.762678: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x1 + CPU 1/KVM-15567 [003] .... 209311.762683: kvm_pio: pio_read at 0x70 size +1 count 1 val 0xff + CPU 1/KVM-15567 [003] .... 209311.762686: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x84 + CPU 1/KVM-15567 [003] .... 209311.762693: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x10 + CPU 1/KVM-15567 [003] .... 209311.762699: kvm_pio: pio_read at 0x70 size +1 count 1 val 0xff + CPU 1/KVM-15567 [003] .... 209311.762702: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x82 + CPU 1/KVM-15567 [003] .... 209311.762709: kvm_pio: pio_read at 0x71 size +1 count 1 val 0x25 + CPU 1/KVM-15567 [003] .... 209311.762714: kvm_pio: pio_read at 0x70 size +1 count 1 val 0xff + CPU 1/KVM-15567 [003] .... 209311.762717: kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x80 + + +Regards, +-Gonglei + +From: Zhanghailiang +Sent: Friday, December 01, 2017 3:03 AM +To: address@hidden; address@hidden; Paolo Bonzini +Cc: Huangweidong (C); Gonglei (Arei); wangxin (U); Xiexiangyou +Subject: [BUG] Windows 7 got stuck easily while run PCMark10 application + +Hiï¼ + +We hit a bug in our test while run PCMark 10 in a windows 7 VM, +The VM got stuck and the wallclock was hang after several minutes running +PCMark 10 in it. +It is quite easily to reproduce the bug with the upstream KVM and Qemu. + +We found that KVM can not inject any RTC irq to VM after it was hang, it fails +to +Deliver irq in ioapic_set_irq() because RTC irq is still pending in ioapic->irr. + +static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq, + int irq_level, bool line_status) +{ +⦠⦠+ if (!irq_level) { + ioapic->irr &= ~mask; + ret = 1; + goto out; + } +⦠⦠+ if ((edge && old_irr == ioapic->irr) || + (!edge && entry.fields.remote_irr)) { + ret = 0; + goto out; + } + +According to RTC spec, after RTC injects a High level irq, OS will read CMOSâs +register C to to clear the irq flag, and pull down the irq electric pin. + +For Qemu, we will emulate the reading operation in cmos_ioport_read(), +but Guest OS will fire a write operation before to tell which register will be +read +after this write, where we use s->cmos_index to record the following register +to read. + +But in our test, we found that there is a possible situation that Vcpu fails to +read +RTC_REG_C to clear irq, This could happens while two VCpus are writing/reading +registers at the same time, for example, vcpu 0 is trying to read RTC_REG_C, +so it write RTC_REG_C first, where the s->cmos_index will be RTC_REG_C, +but before it tries to read register C, another vcpu1 is going to read RTC_YEAR, +it changes s->cmos_index to RTC_YEAR by a writing action. +The next operation of vcpu0 will be lead to read RTC_YEAR, In this case, we +will miss +calling qemu_irq_lower(s->irq) to clear the irq. After this, kvm will never +inject RTC irq, +and Windows VM will hang. +static void cmos_ioport_write(void *opaque, hwaddr addr, + uint64_t data, unsigned size) +{ + RTCState *s = opaque; + + if ((addr & 1) == 0) { + s->cmos_index = data & 0x7f; + } +â¦â¦ +static uint64_t cmos_ioport_read(void *opaque, hwaddr addr, + unsigned size) +{ + RTCState *s = opaque; + int ret; + if ((addr & 1) == 0) { + return 0xff; + } else { + switch(s->cmos_index) { + +According to CMOS spec, âany write to PROT 0070h should be followed by an +action to PROT 0071h or the RTC +Will be RTC will be left in an unknown stateâ, but it seems that we can not +ensure this sequence in qemu/kvm. + +Any ideas ? + +Thanks, +Hailiang + +On 01/12/2017 08:08, Gonglei (Arei) wrote: +> +First write to 0x70, cmos_index = 0xc & 0x7f = 0xc +> +      CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc> +> +Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6>       CPU 1/KVM-15567 +> +kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86> vcpu0 read 0x6 because +> +cmos_index is 0x6 now:>       CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size +> +1 count 1 val 0x6> vcpu1 read 0x6:>       CPU 1/KVM-15567 kvm_pio: pio_read +> +at 0x71 size 1 count 1 val 0x6 +This seems to be a Windows bug. The easiest workaround that I +can think of is to clear the interrupts already when 0xc is written, +without waiting for the read (because REG_C can only be read). + +What do you think? + +Thanks, + +Paolo + +I also think it's windows bug, the problem is that it doesn't occur on xen +platform. And there are some other works need to be done while reading REG_C. +So I wrote that patch. + +Thanks, +Gonglei +å件人ï¼Paolo Bonzini +æ¶ä»¶äººï¼é¾ç£,å¼ æµ·äº®,qemu-devel,Michael S. Tsirkin +æéï¼é»ä¼æ ,çæ¬£,谢祥æ +æ¶é´ï¼2017-12-02 01:10:08 +主é¢:Re: [BUG] Windows 7 got stuck easily while run PCMark10 application + +On 01/12/2017 08:08, Gonglei (Arei) wrote: +> +First write to 0x70, cmos_index = 0xc & 0x7f = 0xc +> +CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc> +> +Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6> CPU 1/KVM-15567 +> +kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86> vcpu0 read 0x6 because +> +cmos_index is 0x6 now:> CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size +> +1 count 1 val 0x6> vcpu1 read 0x6:> CPU 1/KVM-15567 kvm_pio: pio_read +> +at 0x71 size 1 count 1 val 0x6 +This seems to be a Windows bug. The easiest workaround that I +can think of is to clear the interrupts already when 0xc is written, +without waiting for the read (because REG_C can only be read). + +What do you think? + +Thanks, + +Paolo + +On 01/12/2017 18:45, Gonglei (Arei) wrote: +> +I also think it's windows bug, the problem is that it doesn't occur on +> +xen platform. +It's a race, it may just be that RTC PIO is faster in Xen because it's +implemented in the hypervisor. + +I will try reporting it to Microsoft. + +Thanks, + +Paolo + +> +Thanks, +> +Gonglei +> +*å件人ï¼*Paolo Bonzini +> +*æ¶ä»¶äººï¼*é¾ç£,å¼ æµ·äº®,qemu-devel,Michael S. Tsirkin +> +*æéï¼*é»ä¼æ ,çæ¬£,谢祥æ +> +*æ¶é´ï¼*2017-12-02 01:10:08 +> +*主é¢:*Re: [BUG] Windows 7 got stuck easily while run PCMark10 application +> +> +On 01/12/2017 08:08, Gonglei (Arei) wrote: +> +> First write to 0x70, cmos_index = 0xc & 0x7f = 0xc +> +>       CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc> +> +> Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6>       CPU 1/KVM-15567 +> +> kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86> vcpu0 read 0x6 because +> +> cmos_index is 0x6 now:>       CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 +> +> size 1 count 1 val 0x6> vcpu1 +> +read 0x6:>       CPU 1/KVM-15567 kvm_pio: pio_read at 0x71 size 1 count +> +1 val 0x6 +> +This seems to be a Windows bug. The easiest workaround that I +> +can think of is to clear the interrupts already when 0xc is written, +> +without waiting for the read (because REG_C can only be read). +> +> +What do you think? +> +> +Thanks, +> +> +Paolo + +On 2017/12/2 2:37, Paolo Bonzini wrote: +On 01/12/2017 18:45, Gonglei (Arei) wrote: +I also think it's windows bug, the problem is that it doesn't occur on +xen platform. +It's a race, it may just be that RTC PIO is faster in Xen because it's +implemented in the hypervisor. +No, In Xen, it does not has such problem because it injects the RTC irq without +checking whether its previous irq been cleared or not, which we do has such +checking +in KVM. + +static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq, + int irq_level, bool line_status) +{ + ... ... + if (!irq_level) { + ioapic->irr &= ~mask; -->clear the RTC irq in irr, Or we will can not +inject RTC irq. + ret = 1; + goto out; + } + +I agree that we move the operation of clearing RTC irq from cmos_ioport_read() +to +cmos_ioport_write() to ensure the action been done. + +Thanks, +Hailiang +I will try reporting it to Microsoft. + +Thanks, + +Paolo +Thanks, +Gonglei +*å件人ï¼*Paolo Bonzini +*æ¶ä»¶äººï¼*é¾ç£,å¼ æµ·äº®,qemu-devel,Michael S. Tsirkin +*æéï¼*é»ä¼æ ,çæ¬£,谢祥æ +*æ¶é´ï¼*2017-12-02 01:10:08 +*主é¢:*Re: [BUG] Windows 7 got stuck easily while run PCMark10 application + +On 01/12/2017 08:08, Gonglei (Arei) wrote: +First write to 0x70, cmos_index = 0xc & 0x7f = 0xc + CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc> Second write to +0x70, cmos_index = 0x86 & 0x7f = 0x6> CPU 1/KVM-15567 kvm_pio: pio_write at 0x70 +size 1 count 1 val 0x86> vcpu0 read 0x6 because cmos_index is 0x6 now:> CPU +0/KVM-15566 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6> vcpu1 +read 0x6:> CPU 1/KVM-15567 kvm_pio: pio_read at 0x71 size 1 count +1 val 0x6 +This seems to be a Windows bug. The easiest workaround that I +can think of is to clear the interrupts already when 0xc is written, +without waiting for the read (because REG_C can only be read). + +What do you think? + +Thanks, + +Paolo +. + diff --git a/results/classifier/009/performance/80604314 b/results/classifier/009/performance/80604314 new file mode 100644 index 000000000..79b9997e8 --- /dev/null +++ b/results/classifier/009/performance/80604314 @@ -0,0 +1,1490 @@ +performance: 0.919 +device: 0.917 +debug: 0.901 +graphic: 0.901 +other: 0.898 +PID: 0.896 +permissions: 0.892 +KVM: 0.891 +semantic: 0.890 +socket: 0.884 +vnc: 0.881 +network: 0.865 +files: 0.861 +boot: 0.860 + +[BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device + +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, + config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) + +Starting qemu with no additional "-device virtio-net-ccw" (i.e., only +the autogenerated virtio-net-ccw device is present) works. Specifying +several "-device virtio-net-pci" works as well. + +Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net +client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") +works (in-between state does not compile). + +This is reproducible with tcg as well. Same problem both with +--enable-vhost-vdpa and --disable-vhost-vdpa. + +Have not yet tried to figure out what might be special with +virtio-ccw... anyone have an idea? + +[This should probably be considered a blocker?] + +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +When I start qemu with a second virtio-net-ccw device (i.e. adding +> +-device virtio-net-ccw in addition to the autogenerated device), I get +> +a segfault. gdb points to +> +> +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, +> +config=0x55d6ad9e3f80 "RT") at +> +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> +(backtrace doesn't go further) +> +> +Starting qemu with no additional "-device virtio-net-ccw" (i.e., only +> +the autogenerated virtio-net-ccw device is present) works. Specifying +> +several "-device virtio-net-pci" works as well. +> +> +Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net +> +client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") +> +works (in-between state does not compile). +Ouch. I didn't test all in-between states :( +But I wish we had a 0-day instrastructure like kernel has, +that catches things like that. + +> +This is reproducible with tcg as well. Same problem both with +> +--enable-vhost-vdpa and --disable-vhost-vdpa. +> +> +Have not yet tried to figure out what might be special with +> +virtio-ccw... anyone have an idea? +> +> +[This should probably be considered a blocker?] + +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. Tsirkin" <mst@redhat.com> wrote: + +> +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +> When I start qemu with a second virtio-net-ccw device (i.e. adding +> +> -device virtio-net-ccw in addition to the autogenerated device), I get +> +> a segfault. gdb points to +> +> +> +> #0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, +> +> config=0x55d6ad9e3f80 "RT") at +> +> /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +> 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> +> +> (backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. + +> +> +> +> Starting qemu with no additional "-device virtio-net-ccw" (i.e., only +> +> the autogenerated virtio-net-ccw device is present) works. Specifying +> +> several "-device virtio-net-pci" works as well. +> +> +> +> Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net +> +> client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") +> +> works (in-between state does not compile). +> +> +Ouch. I didn't test all in-between states :( +> +But I wish we had a 0-day instrastructure like kernel has, +> +that catches things like that. +Yep, that would be useful... so patchew only builds the complete series? + +> +> +> This is reproducible with tcg as well. Same problem both with +> +> --enable-vhost-vdpa and --disable-vhost-vdpa. +> +> +> +> Have not yet tried to figure out what might be special with +> +> virtio-ccw... anyone have an idea? +> +> +> +> [This should probably be considered a blocker?] +I think so, as it makes s390x unusable with more that one +virtio-net-ccw device, and I don't even see a workaround. + +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +On Fri, 24 Jul 2020 09:30:58 -0400 +> +"Michael S. Tsirkin" <mst@redhat.com> wrote: +> +> +> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +> > When I start qemu with a second virtio-net-ccw device (i.e. adding +> +> > -device virtio-net-ccw in addition to the autogenerated device), I get +> +> > a segfault. gdb points to +> +> > +> +> > #0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, +> +> > config=0x55d6ad9e3f80 "RT") at +> +> > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +> > 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> > +> +> > (backtrace doesn't go further) +> +> +The core was incomplete, but running under gdb directly shows that it +> +is just a bog-standard config space access (first for that device). +> +> +The cause of the crash is that nc->peer is not set... no idea how that +> +can happen, not that familiar with that part of QEMU. (Should the code +> +check, or is that really something that should not happen?) +> +> +What I don't understand is why it is set correctly for the first, +> +autogenerated virtio-net-ccw device, but not for the second one, and +> +why virtio-net-pci doesn't show these problems. The only difference +> +between -ccw and -pci that comes to my mind here is that config space +> +accesses for ccw are done via an asynchronous operation, so timing +> +might be different. +Hopefully Jason has an idea. Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? + +> +> > +> +> > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only +> +> > the autogenerated virtio-net-ccw device is present) works. Specifying +> +> > several "-device virtio-net-pci" works as well. +> +> > +> +> > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net +> +> > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") +> +> > works (in-between state does not compile). +> +> +> +> Ouch. I didn't test all in-between states :( +> +> But I wish we had a 0-day instrastructure like kernel has, +> +> that catches things like that. +> +> +Yep, that would be useful... so patchew only builds the complete series? +> +> +> +> +> > This is reproducible with tcg as well. Same problem both with +> +> > --enable-vhost-vdpa and --disable-vhost-vdpa. +> +> > +> +> > Have not yet tried to figure out what might be special with +> +> > virtio-ccw... anyone have an idea? +> +> > +> +> > [This should probably be considered a blocker?] +> +> +I think so, as it makes s390x unusable with more that one +> +virtio-net-ccw device, and I don't even see a workaround. + +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin" <mst@redhat.com> wrote: + +> +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +> On Fri, 24 Jul 2020 09:30:58 -0400 +> +> "Michael S. Tsirkin" <mst@redhat.com> wrote: +> +> +> +> > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +> > > When I start qemu with a second virtio-net-ccw device (i.e. adding +> +> > > -device virtio-net-ccw in addition to the autogenerated device), I get +> +> > > a segfault. gdb points to +> +> > > +> +> > > #0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, +> +> > > config=0x55d6ad9e3f80 "RT") at +> +> > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +> > > 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> > > +> +> > > (backtrace doesn't go further) +> +> +> +> The core was incomplete, but running under gdb directly shows that it +> +> is just a bog-standard config space access (first for that device). +> +> +> +> The cause of the crash is that nc->peer is not set... no idea how that +> +> can happen, not that familiar with that part of QEMU. (Should the code +> +> check, or is that really something that should not happen?) +> +> +> +> What I don't understand is why it is set correctly for the first, +> +> autogenerated virtio-net-ccw device, but not for the second one, and +> +> why virtio-net-pci doesn't show these problems. The only difference +> +> between -ccw and -pci that comes to my mind here is that config space +> +> accesses for ccw are done via an asynchronous operation, so timing +> +> might be different. +> +> +Hopefully Jason has an idea. Could you post a full command line +> +please? Do you need a working guest to trigger this? Does this trigger +> +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 + +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) + +> +> +> > > +> +> > > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only +> +> > > the autogenerated virtio-net-ccw device is present) works. Specifying +> +> > > several "-device virtio-net-pci" works as well. +> +> > > +> +> > > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net +> +> > > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config") +> +> > > works (in-between state does not compile). +> +> > +> +> > Ouch. I didn't test all in-between states :( +> +> > But I wish we had a 0-day instrastructure like kernel has, +> +> > that catches things like that. +> +> +> +> Yep, that would be useful... so patchew only builds the complete series? +> +> +> +> > +> +> > > This is reproducible with tcg as well. Same problem both with +> +> > > --enable-vhost-vdpa and --disable-vhost-vdpa. +> +> > > +> +> > > Have not yet tried to figure out what might be special with +> +> > > virtio-ccw... anyone have an idea? +> +> > > +> +> > > [This should probably be considered a blocker?] +> +> +> +> I think so, as it makes s390x unusable with more that one +> +> virtio-net-ccw device, and I don't even see a workaround. +> + +On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, + config=0x55d6ad9e3f80 "RT") at +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. +Hopefully Jason has an idea. Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) +It looks to me we forget the check the existence of peer. + +Please try the attached patch to see if it works. + +Thanks +0001-virtio-net-check-the-existence-of-peer-before-accesi.patch +Description: +Text Data + +On Sat, 25 Jul 2020 08:40:07 +0800 +Jason Wang <jasowang@redhat.com> wrote: + +> +On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +> +> On Fri, 24 Jul 2020 11:17:57 -0400 +> +> "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +> +> +>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +>>> On Fri, 24 Jul 2020 09:30:58 -0400 +> +>>> "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +>>> +> +>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding +> +>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get +> +>>>>> a segfault. gdb points to +> +>>>>> +> +>>>>> #0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, +> +>>>>> config=0x55d6ad9e3f80 "RT") at +> +>>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +>>>>> 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +>>>>> +> +>>>>> (backtrace doesn't go further) +> +>>> The core was incomplete, but running under gdb directly shows that it +> +>>> is just a bog-standard config space access (first for that device). +> +>>> +> +>>> The cause of the crash is that nc->peer is not set... no idea how that +> +>>> can happen, not that familiar with that part of QEMU. (Should the code +> +>>> check, or is that really something that should not happen?) +> +>>> +> +>>> What I don't understand is why it is set correctly for the first, +> +>>> autogenerated virtio-net-ccw device, but not for the second one, and +> +>>> why virtio-net-pci doesn't show these problems. The only difference +> +>>> between -ccw and -pci that comes to my mind here is that config space +> +>>> accesses for ccw are done via an asynchronous operation, so timing +> +>>> might be different. +> +>> Hopefully Jason has an idea. Could you post a full command line +> +>> please? Do you need a working guest to trigger this? Does this trigger +> +>> on an x86 host? +> +> Yes, it does trigger with tcg-on-x86 as well. I've been using +> +> +> +> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu +> +> qemu,zpci=on +> +> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +> +> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +> +> -device +> +> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +> +> -device virtio-net-ccw +> +> +> +> It seems it needs the guest actually doing something with the nics; I +> +> cannot reproduce the crash if I use the old advent calendar moon buggy +> +> image and just add a virtio-net-ccw device. +> +> +> +> (I don't think it's a problem with my local build, as I see the problem +> +> both on my laptop and on an LPAR.) +> +> +> +It looks to me we forget the check the existence of peer. +> +> +Please try the attached patch to see if it works. +Thanks, that patch gets my guest up and running again. So, FWIW, + +Tested-by: Cornelia Huck <cohuck@redhat.com> + +Any idea why this did not hit with virtio-net-pci (or the autogenerated +virtio-net-ccw device)? + +On 2020/7/27 ä¸å2:43, Cornelia Huck wrote: +On Sat, 25 Jul 2020 08:40:07 +0800 +Jason Wang <jasowang@redhat.com> wrote: +On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, + config=0x55d6ad9e3f80 "RT") at +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. +Hopefully Jason has an idea. Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) +It looks to me we forget the check the existence of peer. + +Please try the attached patch to see if it works. +Thanks, that patch gets my guest up and running again. So, FWIW, + +Tested-by: Cornelia Huck <cohuck@redhat.com> + +Any idea why this did not hit with virtio-net-pci (or the autogenerated +virtio-net-ccw device)? +It can be hit with virtio-net-pci as well (just start without peer). +For autogenerated virtio-net-cww, I think the reason is that it has +already had a peer set. +Thanks + +On Mon, 27 Jul 2020 15:38:12 +0800 +Jason Wang <jasowang@redhat.com> wrote: + +> +On 2020/7/27 ä¸å2:43, Cornelia Huck wrote: +> +> On Sat, 25 Jul 2020 08:40:07 +0800 +> +> Jason Wang <jasowang@redhat.com> wrote: +> +> +> +>> On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +> +>>> On Fri, 24 Jul 2020 11:17:57 -0400 +> +>>> "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +>>> +> +>>>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +>>>>> On Fri, 24 Jul 2020 09:30:58 -0400 +> +>>>>> "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +>>>>> +> +>>>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +>>>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding +> +>>>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get +> +>>>>>>> a segfault. gdb points to +> +>>>>>>> +> +>>>>>>> #0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, +> +>>>>>>> config=0x55d6ad9e3f80 "RT") at +> +>>>>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +>>>>>>> 146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { +> +>>>>>>> +> +>>>>>>> (backtrace doesn't go further) +> +>>>>> The core was incomplete, but running under gdb directly shows that it +> +>>>>> is just a bog-standard config space access (first for that device). +> +>>>>> +> +>>>>> The cause of the crash is that nc->peer is not set... no idea how that +> +>>>>> can happen, not that familiar with that part of QEMU. (Should the code +> +>>>>> check, or is that really something that should not happen?) +> +>>>>> +> +>>>>> What I don't understand is why it is set correctly for the first, +> +>>>>> autogenerated virtio-net-ccw device, but not for the second one, and +> +>>>>> why virtio-net-pci doesn't show these problems. The only difference +> +>>>>> between -ccw and -pci that comes to my mind here is that config space +> +>>>>> accesses for ccw are done via an asynchronous operation, so timing +> +>>>>> might be different. +> +>>>> Hopefully Jason has an idea. Could you post a full command line +> +>>>> please? Do you need a working guest to trigger this? Does this trigger +> +>>>> on an x86 host? +> +>>> Yes, it does trigger with tcg-on-x86 as well. I've been using +> +>>> +> +>>> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu +> +>>> qemu,zpci=on +> +>>> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +> +>>> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +> +>>> -device +> +>>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +> +>>> -device virtio-net-ccw +> +>>> +> +>>> It seems it needs the guest actually doing something with the nics; I +> +>>> cannot reproduce the crash if I use the old advent calendar moon buggy +> +>>> image and just add a virtio-net-ccw device. +> +>>> +> +>>> (I don't think it's a problem with my local build, as I see the problem +> +>>> both on my laptop and on an LPAR.) +> +>> +> +>> It looks to me we forget the check the existence of peer. +> +>> +> +>> Please try the attached patch to see if it works. +> +> Thanks, that patch gets my guest up and running again. So, FWIW, +> +> +> +> Tested-by: Cornelia Huck <cohuck@redhat.com> +> +> +> +> Any idea why this did not hit with virtio-net-pci (or the autogenerated +> +> virtio-net-ccw device)? +> +> +> +It can be hit with virtio-net-pci as well (just start without peer). +Hm, I had not been able to reproduce the crash with a 'naked' -device +virtio-net-pci. But checking seems to be the right idea anyway. + +> +> +For autogenerated virtio-net-cww, I think the reason is that it has +> +already had a peer set. +Ok, that might well be. + +On 2020/7/27 ä¸å4:41, Cornelia Huck wrote: +On Mon, 27 Jul 2020 15:38:12 +0800 +Jason Wang <jasowang@redhat.com> wrote: +On 2020/7/27 ä¸å2:43, Cornelia Huck wrote: +On Sat, 25 Jul 2020 08:40:07 +0800 +Jason Wang <jasowang@redhat.com> wrote: +On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, + config=0x55d6ad9e3f80 "RT") at +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. +Hopefully Jason has an idea. Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) +It looks to me we forget the check the existence of peer. + +Please try the attached patch to see if it works. +Thanks, that patch gets my guest up and running again. So, FWIW, + +Tested-by: Cornelia Huck <cohuck@redhat.com> + +Any idea why this did not hit with virtio-net-pci (or the autogenerated +virtio-net-ccw device)? +It can be hit with virtio-net-pci as well (just start without peer). +Hm, I had not been able to reproduce the crash with a 'naked' -device +virtio-net-pci. But checking seems to be the right idea anyway. +Sorry for being unclear, I meant for networking part, you just need +start without peer, and you need a real guest (any Linux) that is trying +to access the config space of virtio-net. +Thanks +For autogenerated virtio-net-cww, I think the reason is that it has +already had a peer set. +Ok, that might well be. + +On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: +> +> +On 2020/7/27 ä¸å4:41, Cornelia Huck wrote: +> +> On Mon, 27 Jul 2020 15:38:12 +0800 +> +> Jason Wang <jasowang@redhat.com> wrote: +> +> +> +> > On 2020/7/27 ä¸å2:43, Cornelia Huck wrote: +> +> > > On Sat, 25 Jul 2020 08:40:07 +0800 +> +> > > Jason Wang <jasowang@redhat.com> wrote: +> +> > > > On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +> +> > > > > On Fri, 24 Jul 2020 11:17:57 -0400 +> +> > > > > "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +> > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +> > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400 +> +> > > > > > > "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +> > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +> +> > > > > > > > > When I start qemu with a second virtio-net-ccw device (i.e. +> +> > > > > > > > > adding +> +> > > > > > > > > -device virtio-net-ccw in addition to the autogenerated +> +> > > > > > > > > device), I get +> +> > > > > > > > > a segfault. gdb points to +> +> > > > > > > > > +> +> > > > > > > > > #0 0x000055d6ab52681d in virtio_net_get_config +> +> > > > > > > > > (vdev=<optimized out>, +> +> > > > > > > > > config=0x55d6ad9e3f80 "RT") at +> +> > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +> > > > > > > > > 146 if (nc->peer->info->type == +> +> > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> > > > > > > > > +> +> > > > > > > > > (backtrace doesn't go further) +> +> > > > > > > The core was incomplete, but running under gdb directly shows +> +> > > > > > > that it +> +> > > > > > > is just a bog-standard config space access (first for that +> +> > > > > > > device). +> +> > > > > > > +> +> > > > > > > The cause of the crash is that nc->peer is not set... no idea +> +> > > > > > > how that +> +> > > > > > > can happen, not that familiar with that part of QEMU. (Should +> +> > > > > > > the code +> +> > > > > > > check, or is that really something that should not happen?) +> +> > > > > > > +> +> > > > > > > What I don't understand is why it is set correctly for the +> +> > > > > > > first, +> +> > > > > > > autogenerated virtio-net-ccw device, but not for the second +> +> > > > > > > one, and +> +> > > > > > > why virtio-net-pci doesn't show these problems. The only +> +> > > > > > > difference +> +> > > > > > > between -ccw and -pci that comes to my mind here is that config +> +> > > > > > > space +> +> > > > > > > accesses for ccw are done via an asynchronous operation, so +> +> > > > > > > timing +> +> > > > > > > might be different. +> +> > > > > > Hopefully Jason has an idea. Could you post a full command line +> +> > > > > > please? Do you need a working guest to trigger this? Does this +> +> > > > > > trigger +> +> > > > > > on an x86 host? +> +> > > > > Yes, it does trigger with tcg-on-x86 as well. I've been using +> +> > > > > +> +> > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu +> +> > > > > qemu,zpci=on +> +> > > > > -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +> +> > > > > -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +> +> > > > > -device +> +> > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +> +> > > > > -device virtio-net-ccw +> +> > > > > +> +> > > > > It seems it needs the guest actually doing something with the nics; +> +> > > > > I +> +> > > > > cannot reproduce the crash if I use the old advent calendar moon +> +> > > > > buggy +> +> > > > > image and just add a virtio-net-ccw device. +> +> > > > > +> +> > > > > (I don't think it's a problem with my local build, as I see the +> +> > > > > problem +> +> > > > > both on my laptop and on an LPAR.) +> +> > > > It looks to me we forget the check the existence of peer. +> +> > > > +> +> > > > Please try the attached patch to see if it works. +> +> > > Thanks, that patch gets my guest up and running again. So, FWIW, +> +> > > +> +> > > Tested-by: Cornelia Huck <cohuck@redhat.com> +> +> > > +> +> > > Any idea why this did not hit with virtio-net-pci (or the autogenerated +> +> > > virtio-net-ccw device)? +> +> > +> +> > It can be hit with virtio-net-pci as well (just start without peer). +> +> Hm, I had not been able to reproduce the crash with a 'naked' -device +> +> virtio-net-pci. But checking seems to be the right idea anyway. +> +> +> +Sorry for being unclear, I meant for networking part, you just need start +> +without peer, and you need a real guest (any Linux) that is trying to access +> +the config space of virtio-net. +> +> +Thanks +A pxe guest will do it, but that doesn't support ccw, right? + +I'm still unclear why this triggers with ccw but not pci - +any idea? + +> +> +> +> +> > For autogenerated virtio-net-cww, I think the reason is that it has +> +> > already had a peer set. +> +> Ok, that might well be. +> +> +> +> + +On 2020/7/27 ä¸å7:43, Michael S. Tsirkin wrote: +On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: +On 2020/7/27 ä¸å4:41, Cornelia Huck wrote: +On Mon, 27 Jul 2020 15:38:12 +0800 +Jason Wang<jasowang@redhat.com> wrote: +On 2020/7/27 ä¸å2:43, Cornelia Huck wrote: +On Sat, 25 Jul 2020 08:40:07 +0800 +Jason Wang<jasowang@redhat.com> wrote: +On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, + config=0x55d6ad9e3f80 "RT") at +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. +Hopefully Jason has an idea. Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) +It looks to me we forget the check the existence of peer. + +Please try the attached patch to see if it works. +Thanks, that patch gets my guest up and running again. So, FWIW, + +Tested-by: Cornelia Huck<cohuck@redhat.com> + +Any idea why this did not hit with virtio-net-pci (or the autogenerated +virtio-net-ccw device)? +It can be hit with virtio-net-pci as well (just start without peer). +Hm, I had not been able to reproduce the crash with a 'naked' -device +virtio-net-pci. But checking seems to be the right idea anyway. +Sorry for being unclear, I meant for networking part, you just need start +without peer, and you need a real guest (any Linux) that is trying to access +the config space of virtio-net. + +Thanks +A pxe guest will do it, but that doesn't support ccw, right? +Yes, it depends on the cli actually. +I'm still unclear why this triggers with ccw but not pci - +any idea? +I don't test pxe but I can reproduce this with pci (just start a linux +guest without a peer). +Thanks + +On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote: +> +> +On 2020/7/27 ä¸å7:43, Michael S. Tsirkin wrote: +> +> On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: +> +> > On 2020/7/27 ä¸å4:41, Cornelia Huck wrote: +> +> > > On Mon, 27 Jul 2020 15:38:12 +0800 +> +> > > Jason Wang<jasowang@redhat.com> wrote: +> +> > > +> +> > > > On 2020/7/27 ä¸å2:43, Cornelia Huck wrote: +> +> > > > > On Sat, 25 Jul 2020 08:40:07 +0800 +> +> > > > > Jason Wang<jasowang@redhat.com> wrote: +> +> > > > > > On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +> +> > > > > > > On Fri, 24 Jul 2020 11:17:57 -0400 +> +> > > > > > > "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +> > > > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +> +> > > > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400 +> +> > > > > > > > > "Michael S. Tsirkin"<mst@redhat.com> wrote: +> +> > > > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck +> +> > > > > > > > > > wrote: +> +> > > > > > > > > > > When I start qemu with a second virtio-net-ccw device +> +> > > > > > > > > > > (i.e. adding +> +> > > > > > > > > > > -device virtio-net-ccw in addition to the autogenerated +> +> > > > > > > > > > > device), I get +> +> > > > > > > > > > > a segfault. gdb points to +> +> > > > > > > > > > > +> +> > > > > > > > > > > #0 0x000055d6ab52681d in virtio_net_get_config +> +> > > > > > > > > > > (vdev=<optimized out>, +> +> > > > > > > > > > > config=0x55d6ad9e3f80 "RT") at +> +> > > > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146 +> +> > > > > > > > > > > 146 if (nc->peer->info->type == +> +> > > > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) { +> +> > > > > > > > > > > +> +> > > > > > > > > > > (backtrace doesn't go further) +> +> > > > > > > > > The core was incomplete, but running under gdb directly +> +> > > > > > > > > shows that it +> +> > > > > > > > > is just a bog-standard config space access (first for that +> +> > > > > > > > > device). +> +> > > > > > > > > +> +> > > > > > > > > The cause of the crash is that nc->peer is not set... no +> +> > > > > > > > > idea how that +> +> > > > > > > > > can happen, not that familiar with that part of QEMU. +> +> > > > > > > > > (Should the code +> +> > > > > > > > > check, or is that really something that should not happen?) +> +> > > > > > > > > +> +> > > > > > > > > What I don't understand is why it is set correctly for the +> +> > > > > > > > > first, +> +> > > > > > > > > autogenerated virtio-net-ccw device, but not for the second +> +> > > > > > > > > one, and +> +> > > > > > > > > why virtio-net-pci doesn't show these problems. The only +> +> > > > > > > > > difference +> +> > > > > > > > > between -ccw and -pci that comes to my mind here is that +> +> > > > > > > > > config space +> +> > > > > > > > > accesses for ccw are done via an asynchronous operation, so +> +> > > > > > > > > timing +> +> > > > > > > > > might be different. +> +> > > > > > > > Hopefully Jason has an idea. Could you post a full command +> +> > > > > > > > line +> +> > > > > > > > please? Do you need a working guest to trigger this? Does +> +> > > > > > > > this trigger +> +> > > > > > > > on an x86 host? +> +> > > > > > > Yes, it does trigger with tcg-on-x86 as well. I've been using +> +> > > > > > > +> +> > > > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg +> +> > > > > > > -cpu qemu,zpci=on +> +> > > > > > > -m 1024 -nographic -device +> +> > > > > > > virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +> +> > > > > > > -drive +> +> > > > > > > file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +> +> > > > > > > -device +> +> > > > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +> +> > > > > > > -device virtio-net-ccw +> +> > > > > > > +> +> > > > > > > It seems it needs the guest actually doing something with the +> +> > > > > > > nics; I +> +> > > > > > > cannot reproduce the crash if I use the old advent calendar +> +> > > > > > > moon buggy +> +> > > > > > > image and just add a virtio-net-ccw device. +> +> > > > > > > +> +> > > > > > > (I don't think it's a problem with my local build, as I see the +> +> > > > > > > problem +> +> > > > > > > both on my laptop and on an LPAR.) +> +> > > > > > It looks to me we forget the check the existence of peer. +> +> > > > > > +> +> > > > > > Please try the attached patch to see if it works. +> +> > > > > Thanks, that patch gets my guest up and running again. So, FWIW, +> +> > > > > +> +> > > > > Tested-by: Cornelia Huck<cohuck@redhat.com> +> +> > > > > +> +> > > > > Any idea why this did not hit with virtio-net-pci (or the +> +> > > > > autogenerated +> +> > > > > virtio-net-ccw device)? +> +> > > > It can be hit with virtio-net-pci as well (just start without peer). +> +> > > Hm, I had not been able to reproduce the crash with a 'naked' -device +> +> > > virtio-net-pci. But checking seems to be the right idea anyway. +> +> > Sorry for being unclear, I meant for networking part, you just need start +> +> > without peer, and you need a real guest (any Linux) that is trying to +> +> > access +> +> > the config space of virtio-net. +> +> > +> +> > Thanks +> +> A pxe guest will do it, but that doesn't support ccw, right? +> +> +> +Yes, it depends on the cli actually. +> +> +> +> +> +> I'm still unclear why this triggers with ccw but not pci - +> +> any idea? +> +> +> +I don't test pxe but I can reproduce this with pci (just start a linux guest +> +without a peer). +> +> +Thanks +> +Might be a good addition to a unit test. Not sure what would the +test do exactly: just make sure guest runs? Looks like a lot of work +for an empty test ... maybe we can poke at the guest config with +qtest commands at least. + +-- +MST + +On 2020/7/27 ä¸å9:16, Michael S. Tsirkin wrote: +On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote: +On 2020/7/27 ä¸å7:43, Michael S. Tsirkin wrote: +On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote: +On 2020/7/27 ä¸å4:41, Cornelia Huck wrote: +On Mon, 27 Jul 2020 15:38:12 +0800 +Jason Wang<jasowang@redhat.com> wrote: +On 2020/7/27 ä¸å2:43, Cornelia Huck wrote: +On Sat, 25 Jul 2020 08:40:07 +0800 +Jason Wang<jasowang@redhat.com> wrote: +On 2020/7/24 ä¸å11:34, Cornelia Huck wrote: +On Fri, 24 Jul 2020 11:17:57 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote: +On Fri, 24 Jul 2020 09:30:58 -0400 +"Michael S. Tsirkin"<mst@redhat.com> wrote: +On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote: +When I start qemu with a second virtio-net-ccw device (i.e. adding +-device virtio-net-ccw in addition to the autogenerated device), I get +a segfault. gdb points to + +#0 0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, + config=0x55d6ad9e3f80 "RT") at +/home/cohuck/git/qemu/hw/net/virtio-net.c:146 +146 if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) { + +(backtrace doesn't go further) +The core was incomplete, but running under gdb directly shows that it +is just a bog-standard config space access (first for that device). + +The cause of the crash is that nc->peer is not set... no idea how that +can happen, not that familiar with that part of QEMU. (Should the code +check, or is that really something that should not happen?) + +What I don't understand is why it is set correctly for the first, +autogenerated virtio-net-ccw device, but not for the second one, and +why virtio-net-pci doesn't show these problems. The only difference +between -ccw and -pci that comes to my mind here is that config space +accesses for ccw are done via an asynchronous operation, so timing +might be different. +Hopefully Jason has an idea. Could you post a full command line +please? Do you need a working guest to trigger this? Does this trigger +on an x86 host? +Yes, it does trigger with tcg-on-x86 as well. I've been using + +s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on +-m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 +-drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 +-device +scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1 +-device virtio-net-ccw + +It seems it needs the guest actually doing something with the nics; I +cannot reproduce the crash if I use the old advent calendar moon buggy +image and just add a virtio-net-ccw device. + +(I don't think it's a problem with my local build, as I see the problem +both on my laptop and on an LPAR.) +It looks to me we forget the check the existence of peer. + +Please try the attached patch to see if it works. +Thanks, that patch gets my guest up and running again. So, FWIW, + +Tested-by: Cornelia Huck<cohuck@redhat.com> + +Any idea why this did not hit with virtio-net-pci (or the autogenerated +virtio-net-ccw device)? +It can be hit with virtio-net-pci as well (just start without peer). +Hm, I had not been able to reproduce the crash with a 'naked' -device +virtio-net-pci. But checking seems to be the right idea anyway. +Sorry for being unclear, I meant for networking part, you just need start +without peer, and you need a real guest (any Linux) that is trying to access +the config space of virtio-net. + +Thanks +A pxe guest will do it, but that doesn't support ccw, right? +Yes, it depends on the cli actually. +I'm still unclear why this triggers with ccw but not pci - +any idea? +I don't test pxe but I can reproduce this with pci (just start a linux guest +without a peer). + +Thanks +Might be a good addition to a unit test. Not sure what would the +test do exactly: just make sure guest runs? Looks like a lot of work +for an empty test ... maybe we can poke at the guest config with +qtest commands at least. +That should work or we can simply extend the exist virtio-net qtest to +do that. +Thanks + |