path: root/results/classifier/009/performance
author Christian Krinitsin <mail@krinitsin.com> 2025-07-03 19:39:53 +0200
committer Christian Krinitsin <mail@krinitsin.com> 2025-07-03 19:39:53 +0200
commit dee4dcba78baf712cab403d47d9db319ab7f95d6 (patch)
tree 418478faf06786701a56268672f73d6b0b4eb239 /results/classifier/009/performance
parent 4d9e26c0333abd39bdbd039dcdb30ed429c475ba (diff)
download qemu-analysis-dee4dcba78baf712cab403d47d9db319ab7f95d6.tar.gz
qemu-analysis-dee4dcba78baf712cab403d47d9db319ab7f95d6.zip
restructure results
Diffstat (limited to 'results/classifier/009/performance')
-rw-r--r-- results/classifier/009/performance/79834768 419
-rw-r--r-- results/classifier/009/performance/80604314 1490
2 files changed, 0 insertions, 1909 deletions
diff --git a/results/classifier/009/performance/79834768 b/results/classifier/009/performance/79834768
deleted file mode 100644
index 95c9f99ec..000000000
--- a/results/classifier/009/performance/79834768
+++ /dev/null
@@ -1,419 +0,0 @@
-performance: 0.952
-debug: 0.943
-other: 0.943
-permissions: 0.939
-graphic: 0.933
-semantic: 0.920
-PID: 0.916
-device: 0.915
-socket: 0.912
-files: 0.904
-vnc: 0.885
-boot: 0.880
-KVM: 0.840
-network: 0.830
-
-[Qemu-devel] [BUG] Windows 7 got stuck easily while run PCMark10 application
-
-Hi,
-
-We hit a bug in our tests while running PCMark 10 in a Windows 7 VM.
-The VM got stuck and the wall clock hung after several minutes of running
-PCMark 10 in it.
-It is quite easy to reproduce the bug with upstream KVM and QEMU.
-
-We found that KVM cannot inject any RTC irq into the VM after it hangs: it
-fails to deliver the irq in ioapic_set_irq() because the RTC irq is still
-pending in ioapic->irr.
-
-static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
-                  int irq_level, bool line_status)
-{
-… …
-         if (!irq_level) {
-                  ioapic->irr &= ~mask;
-                  ret = 1;
-                  goto out;
-         }
-… …
-         if ((edge && old_irr == ioapic->irr) ||
-             (!edge && entry.fields.remote_irr)) {
-                  ret = 0;
-                  goto out;
-         }
-
-According to the RTC spec, after the RTC raises a high-level irq, the OS reads
-CMOS register C to clear the irq flag and pull the irq line low.
-
-QEMU emulates this read in cmos_ioport_read(),
-but the guest OS first issues a write to select which register will be
-read
-next; QEMU uses s->cmos_index to record the register
-to read.
-
-In our test, however, we found a situation in which a vcpu fails to
-read
-RTC_REG_C and clear the irq. This can happen when two vcpus are writing/reading
-registers at the same time. For example, vcpu0 is trying to read RTC_REG_C,
-so it writes RTC_REG_C first, and s->cmos_index becomes RTC_REG_C,
-but before it reads register C, vcpu1 wants to read RTC_YEAR and
-changes s->cmos_index to RTC_YEAR with its own write.
-The next read by vcpu0 then returns RTC_YEAR instead. In this case, we
-miss
-calling qemu_irq_lower(s->irq) to clear the irq. After this, KVM never
-injects the RTC irq again,
-and the Windows VM hangs.
-static void cmos_ioport_write(void *opaque, hwaddr addr,
-                              uint64_t data, unsigned size)
-{
-    RTCState *s = opaque;
-
-    if ((addr & 1) == 0) {
-        s->cmos_index = data & 0x7f;
-    }
-……
-static uint64_t cmos_ioport_read(void *opaque, hwaddr addr,
-                                 unsigned size)
-{
-    RTCState *s = opaque;
-    int ret;
-    if ((addr & 1) == 0) {
-        return 0xff;
-    } else {
-        switch(s->cmos_index) {
-
-According to the CMOS spec, ‘any write to PORT 0070h should be followed by an
-action to PORT 0071h or the RTC
-will be left in an unknown state’, but it seems that we cannot
-ensure this sequence in QEMU/KVM.
-
-Any ideas ?
-
-Thanks,
-Hailiang
-
-Please see the kvm_pio trace:
-
-       CPU 1/KVM-15567 [003] .... 209311.762579: kvm_pio: pio_read at 0x70 size 
-1 count 1 val 0xff
-       CPU 1/KVM-15567 [003] .... 209311.762582: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x89
-       CPU 1/KVM-15567 [003] .... 209311.762590: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x17
-       CPU 0/KVM-15566 [005] .... 209311.762611: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0xc
-       CPU 1/KVM-15567 [003] .... 209311.762615: kvm_pio: pio_read at 0x70 size 
-1 count 1 val 0xff
-       CPU 1/KVM-15567 [003] .... 209311.762619: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x88
-       CPU 1/KVM-15567 [003] .... 209311.762627: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x12
-       CPU 0/KVM-15566 [005] .... 209311.762632: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x12
-       CPU 1/KVM-15567 [003] .... 209311.762633: kvm_pio: pio_read at 0x70 size 
-1 count 1 val 0xff
-       CPU 0/KVM-15566 [005] .... 209311.762634: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0xc           <--- first write to 0x70, cmos_index = 0xc &
-0x7f = 0xc
-       CPU 1/KVM-15567 [003] .... 209311.762636: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x86       <--- second write to 0x70, cmos_index = 0x86 &
-0x7f = 0x6, overwriting the cmos_index set by the first write
-       CPU 0/KVM-15566 [005] .... 209311.762641: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x6      <--- vcpu0 reads 0x6 because cmos_index is 0x6 now
-       CPU 1/KVM-15567 [003] .... 209311.762644: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x6     <--- vcpu1 reads 0x6
-       CPU 1/KVM-15567 [003] .... 209311.762649: kvm_pio: pio_read at 0x70 size 
-1 count 1 val 0xff
-       CPU 1/KVM-15567 [003] .... 209311.762669: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x87
-       CPU 1/KVM-15567 [003] .... 209311.762678: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x1
-       CPU 1/KVM-15567 [003] .... 209311.762683: kvm_pio: pio_read at 0x70 size 
-1 count 1 val 0xff
-       CPU 1/KVM-15567 [003] .... 209311.762686: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x84
-       CPU 1/KVM-15567 [003] .... 209311.762693: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x10
-       CPU 1/KVM-15567 [003] .... 209311.762699: kvm_pio: pio_read at 0x70 size 
-1 count 1 val 0xff
-       CPU 1/KVM-15567 [003] .... 209311.762702: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x82
-       CPU 1/KVM-15567 [003] .... 209311.762709: kvm_pio: pio_read at 0x71 size 
-1 count 1 val 0x25
-       CPU 1/KVM-15567 [003] .... 209311.762714: kvm_pio: pio_read at 0x70 size 
-1 count 1 val 0xff
-       CPU 1/KVM-15567 [003] .... 209311.762717: kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x80
-
-
-Regards,
--Gonglei
-
-From: Zhanghailiang
-Sent: Friday, December 01, 2017 3:03 AM
-To: address@hidden; address@hidden; Paolo Bonzini
-Cc: Huangweidong (C); Gonglei (Arei); wangxin (U); Xiexiangyou
-Subject: [BUG] Windows 7 got stuck easily while run PCMark10 application
-
-Hi,
-
-We hit a bug in our test while run PCMark 10 in a windows 7 VM,
-The VM got stuck and the wallclock was hang after several minutes running
-PCMark 10 in it.
-It is quite easily to reproduce the bug with the upstream KVM and Qemu.
-
-We found that KVM can not inject any RTC irq to VM after it was hang, it fails 
-to
-Deliver irq in ioapic_set_irq() because RTC irq is still pending in ioapic->irr.
-
-static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
-                  int irq_level, bool line_status)
-{
-… …
-         if (!irq_level) {
-                  ioapic->irr &= ~mask;
-                  ret = 1;
-                  goto out;
-         }
-… …
-         if ((edge && old_irr == ioapic->irr) ||
-             (!edge && entry.fields.remote_irr)) {
-                  ret = 0;
-                  goto out;
-         }
-
-According to RTC spec, after RTC injects a High level irq, OS will read CMOS’s
-register C to to clear the irq flag, and pull down the irq electric pin.
-
-For Qemu, we will emulate the reading operation in cmos_ioport_read(),
-but Guest OS will fire a write operation before to tell which register will be 
-read
-after this write, where we use s->cmos_index to record the following register 
-to read.
-
-But in our test, we found that there is a possible situation that Vcpu fails to 
-read
-RTC_REG_C to clear irq, This could happens while two VCpus are writing/reading
-registers at the same time, for example, vcpu 0 is trying to read RTC_REG_C,
-so it write RTC_REG_C first, where the s->cmos_index will be RTC_REG_C,
-but before it tries to read register C, another vcpu1 is going to read RTC_YEAR,
-it changes s->cmos_index to RTC_YEAR by a writing action.
-The next operation of vcpu0 will be lead to read RTC_YEAR, In this case, we 
-will miss
-calling qemu_irq_lower(s->irq) to clear the irq. After this, kvm will never 
-inject RTC irq,
-and Windows VM will hang.
-static void cmos_ioport_write(void *opaque, hwaddr addr,
-                              uint64_t data, unsigned size)
-{
-    RTCState *s = opaque;
-
-    if ((addr & 1) == 0) {
-        s->cmos_index = data & 0x7f;
-    }
-……
-static uint64_t cmos_ioport_read(void *opaque, hwaddr addr,
-                                 unsigned size)
-{
-    RTCState *s = opaque;
-    int ret;
-    if ((addr & 1) == 0) {
-        return 0xff;
-    } else {
-        switch(s->cmos_index) {
-
-According to CMOS spec, ‘any write to PROT 0070h should be followed by an 
-action to PROT 0071h or the RTC
-Will be RTC will be left in an unknown state’, but it seems that we can not 
-ensure this sequence in qemu/kvm.
-
-Any ideas ?
-
-Thanks,
-Hailiang
-
-On 01/12/2017 08:08, Gonglei (Arei) wrote:
->
-First write to 0x70, cmos_index = 0xc & 0x7f = 0xc
->
-       CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc>
->
-Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6>        CPU 1/KVM-15567
->
-kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86> vcpu0 read 0x6 because
->
-cmos_index is 0x6 now:>        CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size
->
-1 count 1 val 0x6> vcpu1 read 0x6:>        CPU 1/KVM-15567 kvm_pio: pio_read
->
-at 0x71 size 1 count 1 val 0x6
-This seems to be a Windows bug.  The easiest workaround that I
-can think of is to clear the interrupts already when 0xc is written,
-without waiting for the read (because REG_C can only be read).
-
-What do you think?
-
-Thanks,
-
-Paolo
-
-I also think it's a Windows bug; the problem is that it doesn't occur on the Xen
-platform. Also, some other work needs to be done while reading REG_C.
-So I wrote that patch.
-
-Thanks,
-Gonglei
-From: Paolo Bonzini
-To: 龚磊, 张海亮, qemu-devel, Michael S. Tsirkin
-Cc: 黄伟栋, 王欣, 谢祥有
-Sent: 2017-12-02 01:10:08
-Subject: Re: [BUG] Windows 7 got stuck easily while run PCMark10 application
-
-On 01/12/2017 08:08, Gonglei (Arei) wrote:
->
-First write to 0x70, cmos_index = 0xc & 0x7f = 0xc
->
-CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc>
->
-Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6>        CPU 1/KVM-15567
->
-kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86> vcpu0 read 0x6 because
->
-cmos_index is 0x6 now:>        CPU 0/KVM-15566 kvm_pio: pio_read at 0x71 size
->
-1 count 1 val 0x6> vcpu1 read 0x6:>        CPU 1/KVM-15567 kvm_pio: pio_read
->
-at 0x71 size 1 count 1 val 0x6
-This seems to be a Windows bug.  The easiest workaround that I
-can think of is to clear the interrupts already when 0xc is written,
-without waiting for the read (because REG_C can only be read).
-
-What do you think?
-
-Thanks,
-
-Paolo
-
-On 01/12/2017 18:45, Gonglei (Arei) wrote:
->
-I also think it's windows bug, the problem is that it doesn't occur on
->
-xen platform.
-It's a race, it may just be that RTC PIO is faster in Xen because it's
-implemented in the hypervisor.
-
-I will try reporting it to Microsoft.
-
-Thanks,
-
-Paolo
-
->
-Thanks,
->
-Gonglei
->
-*From:* Paolo Bonzini
->
-*To:* 龚磊, 张海亮, qemu-devel, Michael S. Tsirkin
->
-*Cc:* 黄伟栋, 王欣, 谢祥有
->
-*Sent:* 2017-12-02 01:10:08
->
-*Subject:* Re: [BUG] Windows 7 got stuck easily while run PCMark10 application
->
->
-On 01/12/2017 08:08, Gonglei (Arei) wrote:
->
-> First write to 0x70, cmos_index = 0xc & 0x7f = 0xc
->
->        CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc>
->
-> Second write to 0x70, cmos_index = 0x86 & 0x7f = 0x6>        CPU 1/KVM-15567
->
-> kvm_pio: pio_write at 0x70 size 1 count 1 val 0x86> vcpu0 read 0x6 because
->
-> cmos_index is 0x6 now:>        CPU 0/KVM-15566 kvm_pio: pio_read at 0x71
->
-> size 1 count 1 val 0x6> vcpu1
->
-read 0x6:>        CPU 1/KVM-15567 kvm_pio: pio_read at 0x71 size 1 count
->
-1 val 0x6
->
-This seems to be a Windows bug.  The easiest workaround that I
->
-can think of is to clear the interrupts already when 0xc is written,
->
-without waiting for the read (because REG_C can only be read).
->
->
-What do you think?
->
->
-Thanks,
->
->
-Paolo
-
-On 2017/12/2 2:37, Paolo Bonzini wrote:
-On 01/12/2017 18:45, Gonglei (Arei) wrote:
-I also think it's windows bug, the problem is that it doesn't occur on
-xen platform.
-It's a race, it may just be that RTC PIO is faster in Xen because it's
-implemented in the hypervisor.
-No. Xen does not have this problem because it injects the RTC irq without
-checking whether the previous irq has been cleared, whereas we do have such
-a check
-in KVM.
-
-static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
-        int irq_level, bool line_status)
-{
-   ... ...
-    if (!irq_level) {
-        ioapic->irr &= ~mask; --> clears the RTC irq in irr; otherwise we cannot
-inject the RTC irq.
-        ret = 1;
-        goto out;
-    }
-
-I agree that we should move the clearing of the RTC irq from cmos_ioport_read()
-to
-cmos_ioport_write() to ensure that it is done.
-
-Thanks,
-Hailiang
-I will try reporting it to Microsoft.
-
-Thanks,
-
-Paolo
-Thanks,
-Gonglei
-*From:* Paolo Bonzini
-*To:* 龚磊, 张海亮, qemu-devel, Michael S. Tsirkin
-*Cc:* 黄伟栋, 王欣, 谢祥有
-*Sent:* 2017-12-02 01:10:08
-*Subject:* Re: [BUG] Windows 7 got stuck easily while run PCMark10 application
-
-On 01/12/2017 08:08, Gonglei (Arei) wrote:
-First write to 0x70, cmos_index = 0xc & 0x7f = 0xc
-        CPU 0/KVM-15566 kvm_pio: pio_write at 0x70 size 1 count 1 val 0xc> Second write to 
-0x70, cmos_index = 0x86 & 0x7f = 0x6>        CPU 1/KVM-15567 kvm_pio: pio_write at 0x70 
-size 1 count 1 val 0x86> vcpu0 read 0x6 because cmos_index is 0x6 now:>        CPU 
-0/KVM-15566 kvm_pio: pio_read at 0x71 size 1 count 1 val 0x6> vcpu1
-read 0x6:>        CPU 1/KVM-15567 kvm_pio: pio_read at 0x71 size 1 count
-1 val 0x6
-This seems to be a Windows bug.  The easiest workaround that I
-can think of is to clear the interrupts already when 0xc is written,
-without waiting for the read (because REG_C can only be read).
-
-What do you think?
-
-Thanks,
-
-Paolo
-.
-
diff --git a/results/classifier/009/performance/80604314 b/results/classifier/009/performance/80604314
deleted file mode 100644
index 79b9997e8..000000000
--- a/results/classifier/009/performance/80604314
+++ /dev/null
@@ -1,1490 +0,0 @@
-performance: 0.919
-device: 0.917
-debug: 0.901
-graphic: 0.901
-other: 0.898
-PID: 0.896
-permissions: 0.892
-KVM: 0.891
-semantic: 0.890
-socket: 0.884
-vnc: 0.881
-network: 0.865
-files: 0.861
-boot: 0.860
-
-[BUG] vhost-vdpa: qemu-system-s390x crashes with second virtio-net-ccw device
-
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>, 
-    config=0x55d6ad9e3f80 "RT") at /home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-
-Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
-the autogenerated virtio-net-ccw device is present) works. Specifying
-several "-device virtio-net-pci" works as well.
-
-Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
-client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
-works (in-between state does not compile).
-
-This is reproducible with tcg as well. Same problem both with
---enable-vhost-vdpa and --disable-vhost-vdpa.
-
-Have not yet tried to figure out what might be special with
-virtio-ccw... anyone have an idea?
-
-[This should probably be considered a blocker?]
-
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
-When I start qemu with a second virtio-net-ccw device (i.e. adding
->
--device virtio-net-ccw in addition to the autogenerated device), I get
->
-a segfault. gdb points to
->
->
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
-config=0x55d6ad9e3f80 "RT") at
->
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-146       if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
->
-(backtrace doesn't go further)
->
->
-Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
->
-the autogenerated virtio-net-ccw device is present) works. Specifying
->
-several "-device virtio-net-pci" works as well.
->
->
-Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
->
-client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
->
-works (in-between state does not compile).
-Ouch. I didn't test all in-between states :(
-But I wish we had a 0-day infrastructure like the kernel has,
-that catches things like that.
-
->
-This is reproducible with tcg as well. Same problem both with
->
---enable-vhost-vdpa and --disable-vhost-vdpa.
->
->
-Have not yet tried to figure out what might be special with
->
-virtio-ccw... anyone have an idea?
->
->
-[This should probably be considered a blocker?]
-
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin" <mst@redhat.com> wrote:
-
->
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
-> When I start qemu with a second virtio-net-ccw device (i.e. adding
->
-> -device virtio-net-ccw in addition to the autogenerated device), I get
->
-> a segfault. gdb points to
->
->
->
-> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
->     config=0x55d6ad9e3f80 "RT") at
->
-> /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-> 146     if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
->
->
-> (backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-
->
->
->
-> Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
->
-> the autogenerated virtio-net-ccw device is present) works. Specifying
->
-> several "-device virtio-net-pci" works as well.
->
->
->
-> Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
->
-> client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
->
-> works (in-between state does not compile).
->
->
-Ouch. I didn't test all in-between states :(
->
-But I wish we had a 0-day infrastructure like the kernel has,
->
-that catches things like that.
-Yep, that would be useful... so patchew only builds the complete series?
-
->
->
-> This is reproducible with tcg as well. Same problem both with
->
-> --enable-vhost-vdpa and --disable-vhost-vdpa.
->
->
->
-> Have not yet tried to figure out what might be special with
->
-> virtio-ccw... anyone have an idea?
->
->
->
-> [This should probably be considered a blocker?]
-I think so, as it makes s390x unusable with more than one
-virtio-net-ccw device, and I don't even see a workaround.
-
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
-On Fri, 24 Jul 2020 09:30:58 -0400
->
-"Michael S. Tsirkin" <mst@redhat.com> wrote:
->
->
-> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
-> > When I start qemu with a second virtio-net-ccw device (i.e. adding
->
-> > -device virtio-net-ccw in addition to the autogenerated device), I get
->
-> > a segfault. gdb points to
->
-> >
->
-> > #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
-> >     config=0x55d6ad9e3f80 "RT") at
->
-> > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-> > 146           if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
-> >
->
-> > (backtrace doesn't go further)
->
->
-The core was incomplete, but running under gdb directly shows that it
->
-is just a bog-standard config space access (first for that device).
->
->
-The cause of the crash is that nc->peer is not set... no idea how that
->
-can happen, not that familiar with that part of QEMU. (Should the code
->
-check, or is that really something that should not happen?)
->
->
-What I don't understand is why it is set correctly for the first,
->
-autogenerated virtio-net-ccw device, but not for the second one, and
->
-why virtio-net-pci doesn't show these problems. The only difference
->
-between -ccw and -pci that comes to my mind here is that config space
->
-accesses for ccw are done via an asynchronous operation, so timing
->
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-
->
-> >
->
-> > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
->
-> > the autogenerated virtio-net-ccw device is present) works. Specifying
->
-> > several "-device virtio-net-pci" works as well.
->
-> >
->
-> > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
->
-> > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
->
-> > works (in-between state does not compile).
->
->
->
-> Ouch. I didn't test all in-between states :(
->
-> But I wish we had a 0-day infrastructure like the kernel has,
->
-> that catches things like that.
->
->
-Yep, that would be useful... so patchew only builds the complete series?
->
->
->
->
-> > This is reproducible with tcg as well. Same problem both with
->
-> > --enable-vhost-vdpa and --disable-vhost-vdpa.
->
-> >
->
-> > Have not yet tried to figure out what might be special with
->
-> > virtio-ccw... anyone have an idea?
->
-> >
->
-> > [This should probably be considered a blocker?]
->
->
-I think so, as it makes s390x unusable with more than one
->
-virtio-net-ccw device, and I don't even see a workaround.
-
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin" <mst@redhat.com> wrote:
-
->
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
-> On Fri, 24 Jul 2020 09:30:58 -0400
->
-> "Michael S. Tsirkin" <mst@redhat.com> wrote:
->
->
->
-> > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
-> > > When I start qemu with a second virtio-net-ccw device (i.e. adding
->
-> > > -device virtio-net-ccw in addition to the autogenerated device), I get
->
-> > > a segfault. gdb points to
->
-> > >
->
-> > > #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
-> > >     config=0x55d6ad9e3f80 "RT") at
->
-> > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-> > > 146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
-> > >
->
-> > > (backtrace doesn't go further)
->
->
->
-> The core was incomplete, but running under gdb directly shows that it
->
-> is just a bog-standard config space access (first for that device).
->
->
->
-> The cause of the crash is that nc->peer is not set... no idea how that
->
-> can happen, not that familiar with that part of QEMU. (Should the code
->
-> check, or is that really something that should not happen?)
->
->
->
-> What I don't understand is why it is set correctly for the first,
->
-> autogenerated virtio-net-ccw device, but not for the second one, and
->
-> why virtio-net-pci doesn't show these problems. The only difference
->
-> between -ccw and -pci that comes to my mind here is that config space
->
-> accesses for ccw are done via an asynchronous operation, so timing
->
-> might be different.
->
->
-Hopefully Jason has an idea. Could you post a full command line
->
-please? Do you need a working guest to trigger this? Does this trigger
->
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on 
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001 
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0 
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
- 
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-
->
->
-> > >
->
-> > > Starting qemu with no additional "-device virtio-net-ccw" (i.e., only
->
-> > > the autogenerated virtio-net-ccw device is present) works. Specifying
->
-> > > several "-device virtio-net-pci" works as well.
->
-> > >
->
-> > > Things break with 1e0a84ea49b6 ("vhost-vdpa: introduce vhost-vdpa net
->
-> > > client"), 38140cc4d971 ("vhost_net: introduce set_config & get_config")
->
-> > > works (in-between state does not compile).
->
-> >
->
-> > Ouch. I didn't test all in-between states :(
->
-> > But I wish we had a 0-day infrastructure like the kernel has,
->
-> > that catches things like that.
->
->
->
-> Yep, that would be useful... so patchew only builds the complete series?
->
->
->
-> >
->
-> > > This is reproducible with tcg as well. Same problem both with
->
-> > > --enable-vhost-vdpa and --disable-vhost-vdpa.
->
-> > >
->
-> > > Have not yet tried to figure out what might be special with
->
-> > > virtio-ccw... anyone have an idea?
->
-> > >
->
-> > > [This should probably be considered a blocker?]
->
->
->
-> I think so, as it makes s390x unusable with more than one
->
-> virtio-net-ccw device, and I don't even see a workaround.
->
-
-On 2020/7/24 11:34 PM, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
-     config=0x55d6ad9e3f80 "RT") at 
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-It looks to me like we forgot to check for the existence of the peer.
-
-Please try the attached patch to see if it works.
-
-Thanks
-0001-virtio-net-check-the-existence-of-peer-before-accesi.patch
-Description:
-Text Data
-
-On Sat, 25 Jul 2020 08:40:07 +0800
-Jason Wang <jasowang@redhat.com> wrote:
-
->
-On 2020/7/24 11:34 PM, Cornelia Huck wrote:
->
-> On Fri, 24 Jul 2020 11:17:57 -0400
->
-> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
->
->
->> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
->>> On Fri, 24 Jul 2020 09:30:58 -0400
->
->>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
->>>
->
->>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
->>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding
->
->>>>> -device virtio-net-ccw in addition to the autogenerated device), I get
->
->>>>> a segfault. gdb points to
->
->>>>>
->
->>>>> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
->>>>>      config=0x55d6ad9e3f80 "RT") at
->
->>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
->>>>> 146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
->>>>>
->
->>>>> (backtrace doesn't go further)
->
->>> The core was incomplete, but running under gdb directly shows that it
->
->>> is just a bog-standard config space access (first for that device).
->
->>>
->
->>> The cause of the crash is that nc->peer is not set... no idea how that
->
->>> can happen, not that familiar with that part of QEMU. (Should the code
->
->>> check, or is that really something that should not happen?)
->
->>>
->
->>> What I don't understand is why it is set correctly for the first,
->
->>> autogenerated virtio-net-ccw device, but not for the second one, and
->
->>> why virtio-net-pci doesn't show these problems. The only difference
->
->>> between -ccw and -pci that comes to my mind here is that config space
->
->>> accesses for ccw are done via an asynchronous operation, so timing
->
->>> might be different.
->
->> Hopefully Jason has an idea. Could you post a full command line
->
->> please? Do you need a working guest to trigger this? Does this trigger
->
->> on an x86 host?
->
-> Yes, it does trigger with tcg-on-x86 as well. I've been using
->
->
->
-> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu
->
-> qemu,zpci=on
->
-> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
->
-> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
->
-> -device
->
-> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
->
-> -device virtio-net-ccw
->
->
->
-> It seems it needs the guest actually doing something with the nics; I
->
-> cannot reproduce the crash if I use the old advent calendar moon buggy
->
-> image and just add a virtio-net-ccw device.
->
->
->
-> (I don't think it's a problem with my local build, as I see the problem
->
-> both on my laptop and on an LPAR.)
->
->
->
-It looks to me like we forgot to check the existence of the peer.
->
->
-Please try the attached patch to see if it works.
-Thanks, that patch gets my guest up and running again. So, FWIW,
-
-Tested-by: Cornelia Huck <cohuck@redhat.com>
-
-Any idea why this did not hit with virtio-net-pci (or the autogenerated
-virtio-net-ccw device)?
-
-On 2020/7/27 下午2:43, Cornelia Huck wrote:
-On Sat, 25 Jul 2020 08:40:07 +0800
-Jason Wang <jasowang@redhat.com> wrote:
-On 2020/7/24 下午11:34, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
-      config=0x55d6ad9e3f80 "RT") at 
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-It looks to me like we forgot to check the existence of the peer.
-
-Please try the attached patch to see if it works.
-Thanks, that patch gets my guest up and running again. So, FWIW,
-
-Tested-by: Cornelia Huck <cohuck@redhat.com>
-
-Any idea why this did not hit with virtio-net-pci (or the autogenerated
-virtio-net-ccw device)?
-It can be hit with virtio-net-pci as well (just start without peer).
-For autogenerated virtio-net-ccw, I think the reason is that it has
-already had a peer set.
-Thanks
-
-On Mon, 27 Jul 2020 15:38:12 +0800
-Jason Wang <jasowang@redhat.com> wrote:
-
->
-On 2020/7/27 下午2:43, Cornelia Huck wrote:
->
-> On Sat, 25 Jul 2020 08:40:07 +0800
->
-> Jason Wang <jasowang@redhat.com> wrote:
->
->
->
->> On 2020/7/24 下午11:34, Cornelia Huck wrote:
->
->>> On Fri, 24 Jul 2020 11:17:57 -0400
->
->>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
->>>
->
->>>> On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
->>>>> On Fri, 24 Jul 2020 09:30:58 -0400
->
->>>>> "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
->>>>>
->
->>>>>> On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
->>>>>>> When I start qemu with a second virtio-net-ccw device (i.e. adding
->
->>>>>>> -device virtio-net-ccw in addition to the autogenerated device), I get
->
->>>>>>> a segfault. gdb points to
->
->>>>>>>
->
->>>>>>> #0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
->
->>>>>>>       config=0x55d6ad9e3f80 "RT") at
->
->>>>>>> /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
->>>>>>> 146       if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
->
->>>>>>>
->
->>>>>>> (backtrace doesn't go further)
->
->>>>> The core was incomplete, but running under gdb directly shows that it
->
->>>>> is just a bog-standard config space access (first for that device).
->
->>>>>
->
->>>>> The cause of the crash is that nc->peer is not set... no idea how that
->
->>>>> can happen, not that familiar with that part of QEMU. (Should the code
->
->>>>> check, or is that really something that should not happen?)
->
->>>>>
->
->>>>> What I don't understand is why it is set correctly for the first,
->
->>>>> autogenerated virtio-net-ccw device, but not for the second one, and
->
->>>>> why virtio-net-pci doesn't show these problems. The only difference
->
->>>>> between -ccw and -pci that comes to my mind here is that config space
->
->>>>> accesses for ccw are done via an asynchronous operation, so timing
->
->>>>> might be different.
->
->>>> Hopefully Jason has an idea. Could you post a full command line
->
->>>> please? Do you need a working guest to trigger this? Does this trigger
->
->>>> on an x86 host?
->
->>> Yes, it does trigger with tcg-on-x86 as well. I've been using
->
->>>
->
->>> s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu
->
->>> qemu,zpci=on
->
->>> -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
->
->>> -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
->
->>> -device
->
->>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
->
->>> -device virtio-net-ccw
->
->>>
->
->>> It seems it needs the guest actually doing something with the nics; I
->
->>> cannot reproduce the crash if I use the old advent calendar moon buggy
->
->>> image and just add a virtio-net-ccw device.
->
->>>
->
->>> (I don't think it's a problem with my local build, as I see the problem
->
->>> both on my laptop and on an LPAR.)
->
->>
->
->> It looks to me like we forgot to check the existence of the peer.
->
->>
->
->> Please try the attached patch to see if it works.
->
-> Thanks, that patch gets my guest up and running again. So, FWIW,
->
->
->
-> Tested-by: Cornelia Huck <cohuck@redhat.com>
->
->
->
-> Any idea why this did not hit with virtio-net-pci (or the autogenerated
->
-> virtio-net-ccw device)?
->
->
->
-It can be hit with virtio-net-pci as well (just start without peer).
-Hm, I had not been able to reproduce the crash with a 'naked' -device
-virtio-net-pci. But checking seems to be the right idea anyway.
-
->
->
-For autogenerated virtio-net-ccw, I think the reason is that it has
->
-already had a peer set.
-Ok, that might well be.
-
-On 2020/7/27 下午4:41, Cornelia Huck wrote:
-On Mon, 27 Jul 2020 15:38:12 +0800
-Jason Wang <jasowang@redhat.com> wrote:
-On 2020/7/27 下午2:43, Cornelia Huck wrote:
-On Sat, 25 Jul 2020 08:40:07 +0800
-Jason Wang <jasowang@redhat.com> wrote:
-On 2020/7/24 下午11:34, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin"<mst@redhat.com>  wrote:
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
-       config=0x55d6ad9e3f80 "RT") at 
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-It looks to me like we forgot to check the existence of the peer.
-
-Please try the attached patch to see if it works.
-Thanks, that patch gets my guest up and running again. So, FWIW,
-
-Tested-by: Cornelia Huck <cohuck@redhat.com>
-
-Any idea why this did not hit with virtio-net-pci (or the autogenerated
-virtio-net-ccw device)?
-It can be hit with virtio-net-pci as well (just start without peer).
-Hm, I had not been able to reproduce the crash with a 'naked' -device
-virtio-net-pci. But checking seems to be the right idea anyway.
-Sorry for being unclear, I meant for the networking part, you just need to
-start without peer, and you need a real guest (any Linux) that is trying
-to access the config space of virtio-net.
-Thanks
-For autogenerated virtio-net-ccw, I think the reason is that it has
-already had a peer set.
-Ok, that might well be.
-
-On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
->
->
-On 2020/7/27 下午4:41, Cornelia Huck wrote:
->
-> On Mon, 27 Jul 2020 15:38:12 +0800
->
-> Jason Wang <jasowang@redhat.com> wrote:
->
->
->
-> > On 2020/7/27 下午2:43, Cornelia Huck wrote:
->
-> > > On Sat, 25 Jul 2020 08:40:07 +0800
->
-> > > Jason Wang <jasowang@redhat.com> wrote:
->
-> > > > On 2020/7/24 下午11:34, Cornelia Huck wrote:
->
-> > > > > On Fri, 24 Jul 2020 11:17:57 -0400
->
-> > > > > "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
-> > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
-> > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400
->
-> > > > > > > "Michael S. Tsirkin"<mst@redhat.com>  wrote:
->
-> > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
->
-> > > > > > > > > When I start qemu with a second virtio-net-ccw device (i.e.
->
-> > > > > > > > > adding
->
-> > > > > > > > > -device virtio-net-ccw in addition to the autogenerated
->
-> > > > > > > > > device), I get
->
-> > > > > > > > > a segfault. gdb points to
->
-> > > > > > > > >
->
-> > > > > > > > > #0  0x000055d6ab52681d in virtio_net_get_config
->
-> > > > > > > > > (vdev=<optimized out>,
->
-> > > > > > > > >        config=0x55d6ad9e3f80 "RT") at
->
-> > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-> > > > > > > > > 146     if (nc->peer->info->type ==
->
-> > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) {
->
-> > > > > > > > >
->
-> > > > > > > > > (backtrace doesn't go further)
->
-> > > > > > > The core was incomplete, but running under gdb directly shows
->
-> > > > > > > that it
->
-> > > > > > > is just a bog-standard config space access (first for that
->
-> > > > > > > device).
->
-> > > > > > >
->
-> > > > > > > The cause of the crash is that nc->peer is not set... no idea
->
-> > > > > > > how that
->
-> > > > > > > can happen, not that familiar with that part of QEMU. (Should
->
-> > > > > > > the code
->
-> > > > > > > check, or is that really something that should not happen?)
->
-> > > > > > >
->
-> > > > > > > What I don't understand is why it is set correctly for the
->
-> > > > > > > first,
->
-> > > > > > > autogenerated virtio-net-ccw device, but not for the second
->
-> > > > > > > one, and
->
-> > > > > > > why virtio-net-pci doesn't show these problems. The only
->
-> > > > > > > difference
->
-> > > > > > > between -ccw and -pci that comes to my mind here is that config
->
-> > > > > > > space
->
-> > > > > > > accesses for ccw are done via an asynchronous operation, so
->
-> > > > > > > timing
->
-> > > > > > > might be different.
->
-> > > > > > Hopefully Jason has an idea. Could you post a full command line
->
-> > > > > > please? Do you need a working guest to trigger this? Does this
->
-> > > > > > trigger
->
-> > > > > > on an x86 host?
->
-> > > > > Yes, it does trigger with tcg-on-x86 as well. I've been using
->
-> > > > >
->
-> > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu
->
-> > > > > qemu,zpci=on
->
-> > > > > -m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
->
-> > > > > -drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
->
-> > > > > -device
->
-> > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
->
-> > > > > -device virtio-net-ccw
->
-> > > > >
->
-> > > > > It seems it needs the guest actually doing something with the nics;
->
-> > > > > I
->
-> > > > > cannot reproduce the crash if I use the old advent calendar moon
->
-> > > > > buggy
->
-> > > > > image and just add a virtio-net-ccw device.
->
-> > > > >
->
-> > > > > (I don't think it's a problem with my local build, as I see the
->
-> > > > > problem
->
-> > > > > both on my laptop and on an LPAR.)
->
-> > > > It looks to me like we forgot to check the existence of the peer.
->
-> > > >
->
-> > > > Please try the attached patch to see if it works.
->
-> > > Thanks, that patch gets my guest up and running again. So, FWIW,
->
-> > >
->
-> > > Tested-by: Cornelia Huck <cohuck@redhat.com>
->
-> > >
->
-> > > Any idea why this did not hit with virtio-net-pci (or the autogenerated
->
-> > > virtio-net-ccw device)?
->
-> >
->
-> > It can be hit with virtio-net-pci as well (just start without peer).
->
-> Hm, I had not been able to reproduce the crash with a 'naked' -device
->
-> virtio-net-pci. But checking seems to be the right idea anyway.
->
->
->
-Sorry for being unclear, I meant for the networking part, you just need to start
->
-without peer, and you need a real guest (any Linux) that is trying to access
->
-the config space of virtio-net.
->
->
-Thanks
-A pxe guest will do it, but that doesn't support ccw, right?
-
-I'm still unclear why this triggers with ccw but not pci -
-any idea?
-
->
->
->
->
-> > For autogenerated virtio-net-ccw, I think the reason is that it has
->
-> > already had a peer set.
->
-> Ok, that might well be.
->
->
->
->
-
-On 2020/7/27 下午7:43, Michael S. Tsirkin wrote:
-On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
-On 2020/7/27 下午4:41, Cornelia Huck wrote:
-On Mon, 27 Jul 2020 15:38:12 +0800
-Jason Wang<jasowang@redhat.com>  wrote:
-On 2020/7/27 下午2:43, Cornelia Huck wrote:
-On Sat, 25 Jul 2020 08:40:07 +0800
-Jason Wang<jasowang@redhat.com>  wrote:
-On 2020/7/24 下午11:34, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin"<mst@redhat.com>   wrote:
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin"<mst@redhat.com>   wrote:
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
-        config=0x55d6ad9e3f80 "RT") at 
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-It looks to me like we forgot to check the existence of the peer.
-
-Please try the attached patch to see if it works.
-Thanks, that patch gets my guest up and running again. So, FWIW,
-
-Tested-by: Cornelia Huck<cohuck@redhat.com>
-
-Any idea why this did not hit with virtio-net-pci (or the autogenerated
-virtio-net-ccw device)?
-It can be hit with virtio-net-pci as well (just start without peer).
-Hm, I had not been able to reproduce the crash with a 'naked' -device
-virtio-net-pci. But checking seems to be the right idea anyway.
-Sorry for being unclear, I meant for the networking part, you just need to start
-without peer, and you need a real guest (any Linux) that is trying to access
-the config space of virtio-net.
-
-Thanks
-A pxe guest will do it, but that doesn't support ccw, right?
-Yes, it depends on the cli actually.
-I'm still unclear why this triggers with ccw but not pci -
-any idea?
-I haven't tested pxe, but I can reproduce this with pci (just start a Linux
-guest without a peer).
-Thanks
-
-On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote:
->
->
-On 2020/7/27 下午7:43, Michael S. Tsirkin wrote:
->
-> On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
->
-> > On 2020/7/27 下午4:41, Cornelia Huck wrote:
->
-> > > On Mon, 27 Jul 2020 15:38:12 +0800
->
-> > > Jason Wang<jasowang@redhat.com>  wrote:
->
-> > >
->
-> > > > On 2020/7/27 下午2:43, Cornelia Huck wrote:
->
-> > > > > On Sat, 25 Jul 2020 08:40:07 +0800
->
-> > > > > Jason Wang<jasowang@redhat.com>  wrote:
->
-> > > > > > On 2020/7/24 下午11:34, Cornelia Huck wrote:
->
-> > > > > > > On Fri, 24 Jul 2020 11:17:57 -0400
->
-> > > > > > > "Michael S. Tsirkin"<mst@redhat.com>   wrote:
->
-> > > > > > > > On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
->
-> > > > > > > > > On Fri, 24 Jul 2020 09:30:58 -0400
->
-> > > > > > > > > "Michael S. Tsirkin"<mst@redhat.com>   wrote:
->
-> > > > > > > > > > On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck
->
-> > > > > > > > > > wrote:
->
-> > > > > > > > > > > When I start qemu with a second virtio-net-ccw device
->
-> > > > > > > > > > > (i.e. adding
->
-> > > > > > > > > > > -device virtio-net-ccw in addition to the autogenerated
->
-> > > > > > > > > > > device), I get
->
-> > > > > > > > > > > a segfault. gdb points to
->
-> > > > > > > > > > >
->
-> > > > > > > > > > > #0  0x000055d6ab52681d in virtio_net_get_config
->
-> > > > > > > > > > > (vdev=<optimized out>,
->
-> > > > > > > > > > >         config=0x55d6ad9e3f80 "RT") at
->
-> > > > > > > > > > > /home/cohuck/git/qemu/hw/net/virtio-net.c:146
->
-> > > > > > > > > > > 146         if (nc->peer->info->type ==
->
-> > > > > > > > > > > NET_CLIENT_DRIVER_VHOST_VDPA) {
->
-> > > > > > > > > > >
->
-> > > > > > > > > > > (backtrace doesn't go further)
->
-> > > > > > > > > The core was incomplete, but running under gdb directly
->
-> > > > > > > > > shows that it
->
-> > > > > > > > > is just a bog-standard config space access (first for that
->
-> > > > > > > > > device).
->
-> > > > > > > > >
->
-> > > > > > > > > The cause of the crash is that nc->peer is not set... no
->
-> > > > > > > > > idea how that
->
-> > > > > > > > > can happen, not that familiar with that part of QEMU.
->
-> > > > > > > > > (Should the code
->
-> > > > > > > > > check, or is that really something that should not happen?)
->
-> > > > > > > > >
->
-> > > > > > > > > What I don't understand is why it is set correctly for the
->
-> > > > > > > > > first,
->
-> > > > > > > > > autogenerated virtio-net-ccw device, but not for the second
->
-> > > > > > > > > one, and
->
-> > > > > > > > > why virtio-net-pci doesn't show these problems. The only
->
-> > > > > > > > > difference
->
-> > > > > > > > > between -ccw and -pci that comes to my mind here is that
->
-> > > > > > > > > config space
->
-> > > > > > > > > accesses for ccw are done via an asynchronous operation, so
->
-> > > > > > > > > timing
->
-> > > > > > > > > might be different.
->
-> > > > > > > > Hopefully Jason has an idea. Could you post a full command
->
-> > > > > > > > line
->
-> > > > > > > > please? Do you need a working guest to trigger this? Does
->
-> > > > > > > > this trigger
->
-> > > > > > > > on an x86 host?
->
-> > > > > > > Yes, it does trigger with tcg-on-x86 as well. I've been using
->
-> > > > > > >
->
-> > > > > > > s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg
->
-> > > > > > > -cpu qemu,zpci=on
->
-> > > > > > > -m 1024 -nographic -device
->
-> > > > > > > virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
->
-> > > > > > > -drive
->
-> > > > > > > file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
->
-> > > > > > > -device
->
-> > > > > > > scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
->
-> > > > > > > -device virtio-net-ccw
->
-> > > > > > >
->
-> > > > > > > It seems it needs the guest actually doing something with the
->
-> > > > > > > nics; I
->
-> > > > > > > cannot reproduce the crash if I use the old advent calendar
->
-> > > > > > > moon buggy
->
-> > > > > > > image and just add a virtio-net-ccw device.
->
-> > > > > > >
->
-> > > > > > > (I don't think it's a problem with my local build, as I see the
->
-> > > > > > > problem
->
-> > > > > > > both on my laptop and on an LPAR.)
->
-> > > > > > It looks to me like we forgot to check the existence of the peer.
->
-> > > > > >
->
-> > > > > > Please try the attached patch to see if it works.
->
-> > > > > Thanks, that patch gets my guest up and running again. So, FWIW,
->
-> > > > >
->
-> > > > > Tested-by: Cornelia Huck<cohuck@redhat.com>
->
-> > > > >
->
-> > > > > Any idea why this did not hit with virtio-net-pci (or the
->
-> > > > > autogenerated
->
-> > > > > virtio-net-ccw device)?
->
-> > > > It can be hit with virtio-net-pci as well (just start without peer).
->
-> > > Hm, I had not been able to reproduce the crash with a 'naked' -device
->
-> > > virtio-net-pci. But checking seems to be the right idea anyway.
->
-> > Sorry for being unclear, I meant for the networking part, you just need to start
->
-> > without peer, and you need a real guest (any Linux) that is trying to
->
-> > access
->
-> > the config space of virtio-net.
->
-> >
->
-> > Thanks
->
-> A pxe guest will do it, but that doesn't support ccw, right?
->
->
->
-Yes, it depends on the cli actually.
->
->
->
->
->
-> I'm still unclear why this triggers with ccw but not pci -
->
-> any idea?
->
->
->
-I haven't tested pxe, but I can reproduce this with pci (just start a Linux guest
->
-without a peer).
->
->
-Thanks
->
-Might be a good addition to a unit test. Not sure what would the
-test do exactly: just make sure guest runs? Looks like a lot of work
-for an empty test ... maybe we can poke at the guest config with
-qtest commands at least.
-
--- 
-MST
-
-On 2020/7/27 下午9:16, Michael S. Tsirkin wrote:
-On Mon, Jul 27, 2020 at 08:44:09PM +0800, Jason Wang wrote:
-On 2020/7/27 下午7:43, Michael S. Tsirkin wrote:
-On Mon, Jul 27, 2020 at 04:51:23PM +0800, Jason Wang wrote:
-On 2020/7/27 下午4:41, Cornelia Huck wrote:
-On Mon, 27 Jul 2020 15:38:12 +0800
-Jason Wang<jasowang@redhat.com>  wrote:
-On 2020/7/27 下午2:43, Cornelia Huck wrote:
-On Sat, 25 Jul 2020 08:40:07 +0800
-Jason Wang<jasowang@redhat.com>  wrote:
-On 2020/7/24 下午11:34, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 11:17:57 -0400
-"Michael S. Tsirkin"<mst@redhat.com>   wrote:
-On Fri, Jul 24, 2020 at 04:56:27PM +0200, Cornelia Huck wrote:
-On Fri, 24 Jul 2020 09:30:58 -0400
-"Michael S. Tsirkin"<mst@redhat.com>   wrote:
-On Fri, Jul 24, 2020 at 03:27:18PM +0200, Cornelia Huck wrote:
-When I start qemu with a second virtio-net-ccw device (i.e. adding
--device virtio-net-ccw in addition to the autogenerated device), I get
-a segfault. gdb points to
-
-#0  0x000055d6ab52681d in virtio_net_get_config (vdev=<optimized out>,
-         config=0x55d6ad9e3f80 "RT") at 
-/home/cohuck/git/qemu/hw/net/virtio-net.c:146
-146         if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_VDPA) {
-
-(backtrace doesn't go further)
-The core was incomplete, but running under gdb directly shows that it
-is just a bog-standard config space access (first for that device).
-
-The cause of the crash is that nc->peer is not set... no idea how that
-can happen, not that familiar with that part of QEMU. (Should the code
-check, or is that really something that should not happen?)
-
-What I don't understand is why it is set correctly for the first,
-autogenerated virtio-net-ccw device, but not for the second one, and
-why virtio-net-pci doesn't show these problems. The only difference
-between -ccw and -pci that comes to my mind here is that config space
-accesses for ccw are done via an asynchronous operation, so timing
-might be different.
-Hopefully Jason has an idea. Could you post a full command line
-please? Do you need a working guest to trigger this? Does this trigger
-on an x86 host?
-Yes, it does trigger with tcg-on-x86 as well. I've been using
-
-s390x-softmmu/qemu-system-s390x -M s390-ccw-virtio,accel=tcg -cpu qemu,zpci=on
--m 1024 -nographic -device virtio-scsi-ccw,id=scsi0,devno=fe.0.0001
--drive file=/path/to/image,format=qcow2,if=none,id=drive-scsi0-0-0-0
--device 
-scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1
--device virtio-net-ccw
-
-It seems it needs the guest actually doing something with the nics; I
-cannot reproduce the crash if I use the old advent calendar moon buggy
-image and just add a virtio-net-ccw device.
-
-(I don't think it's a problem with my local build, as I see the problem
-both on my laptop and on an LPAR.)
-It looks to me like we forgot to check the existence of the peer.
-
-Please try the attached patch to see if it works.
-Thanks, that patch gets my guest up and running again. So, FWIW,
-
-Tested-by: Cornelia Huck<cohuck@redhat.com>
-
-Any idea why this did not hit with virtio-net-pci (or the autogenerated
-virtio-net-ccw device)?
-It can be hit with virtio-net-pci as well (just start without peer).
-Hm, I had not been able to reproduce the crash with a 'naked' -device
-virtio-net-pci. But checking seems to be the right idea anyway.
-Sorry for being unclear, I meant for the networking part, you just need to start
-without peer, and you need a real guest (any Linux) that is trying to access
-the config space of virtio-net.
-
-Thanks
-A pxe guest will do it, but that doesn't support ccw, right?
-Yes, it depends on the cli actually.
-I'm still unclear why this triggers with ccw but not pci -
-any idea?
-I haven't tested pxe, but I can reproduce this with pci (just start a Linux guest
-without a peer).
-
-Thanks
-Might be a good addition to a unit test. Not sure what would the
-test do exactly: just make sure guest runs? Looks like a lot of work
-for an empty test ... maybe we can poke at the guest config with
-qtest commands at least.
-That should work, or we can simply extend the existing virtio-net qtest to
-do that.
-Thanks
-