summary refs log tree commit diff stats
path: root/results/classifier/118/none/1793791
diff options
context:
space:
mode:
Diffstat (limited to 'results/classifier/118/none/1793791')
-rw-r--r--results/classifier/118/none/1793791313
1 files changed, 313 insertions, 0 deletions
diff --git a/results/classifier/118/none/1793791 b/results/classifier/118/none/1793791
new file mode 100644
index 000000000..4b73d8f40
--- /dev/null
+++ b/results/classifier/118/none/1793791
@@ -0,0 +1,313 @@
+VMM: 0.613
+risc-v: 0.573
+mistranslation: 0.542
+semantic: 0.526
+user-level: 0.524
+hypervisor: 0.517
+register: 0.513
+KVM: 0.504
+virtual: 0.499
+ppc: 0.486
+graphic: 0.477
+arm: 0.475
+debug: 0.472
+TCG: 0.462
+performance: 0.457
+permissions: 0.451
+vnc: 0.450
+assembly: 0.449
+kernel: 0.404
+boot: 0.400
+network: 0.398
+architecture: 0.391
+x86: 0.390
+PID: 0.388
+device: 0.371
+peripherals: 0.358
+socket: 0.291
+files: 0.239
+i386: 0.178
+
+Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed
+
+Qemu version on both sides: 2.12.1
+Host A Linux: 4.9.76
+Host B Linux: 4.14.67
+
+While calling from Host A:
+virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource --persistent --verbose --copy-storage-all
+
+I get a qemu crash with:
+
+2018-09-21 16:12:23.073+0000: 14428: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f922c03d990
+qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed.
+2018-09-21 16:12:41.230+0000: shutting down, reason=crashed
+2018-09-21 16:12:52.900+0000: shutting down, reason=failed
+
+It doesn't do it every time, but most of the time.
+
+Tested with Qemu 3.0.0 and this still happens.
+
+Also tested with kernel 4.9.128 on one side and 4.9.76 on the other thinking it might be a kernel 4.14 issue.
+
+On 9/21/18 11:42 AM, Matthew Schumacher wrote:
+> Public bug reported:
+> 
+> Qemu version on both sides: 2.12.1
+> Host A Linux: 4.9.76
+> Host B Linux: 4.14.67
+> 
+> While calling from Host A:
+> virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource --persistent --verbose --copy-storage-all
+
+Can you show the qemu command line generated by libvirt?
+
+> 
+> I get a qemu crash with:
+> 
+> 2018-09-21 16:12:23.073+0000: 14428: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f922c03d990
+> qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed.
+
+That assertion is the client complaining that it things the server has 
+sent invalid data.
+
+> 2018-09-21 16:12:41.230+0000: shutting down, reason=crashed
+> 2018-09-21 16:12:52.900+0000: shutting down, reason=failed
+> 
+> It doesn't do it every time, but most of the time.
+
+Hmm - you're running under libvirt, so it's harder to inject command 
+line options, but I'd be interested in figuring out how to enable trace 
+output of nbd_* on both the server and the destination during the block 
+migration portion of 'virsh migrate', to see if the tail of that trace 
+when reproducing the failure gives any more insights into which side is 
+breaking the protocol.  I'm also trying to inspect the code in 
+nbd/server.c to see if the server can ever actually send a 
+NBD_REPLY_TYPE_NONE packet without setting the NBD_REPLY_FLAG_DONE bit, 
+which is what the assertion is complaining about.
+
+-- 
+Eric Blake, Principal Software Engineer
+Red Hat, Inc.           +1-919-301-3266
+Virtualization:  qemu.org | libvirt.org
+
+
+Hi Eric,
+
+Thanks for looking at this.  
+
+I looked at the nbd/server.c code and couldn't see how it could send a NBD_REPLY_TYPE_NONE packet without setting the NBD_REPLY_FLAG_DONE bit.  The only place NBD_REPLY_TYPE_NONE is set is on line 1603:
+
+    set_be_chunk(&chunk, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE, handle, 0);
+
+Anyway, here is the command line generated:
+
+/usr/bin/qemu-system-x86_64 -name guest=dng-smokeping,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-7-dng-smokeping/master-key.aes -machine pc-1.1,accel=kvm,usb=off,dump-guest-core=off -cpu qemu64,pmu=off -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -object iothread,id=iothread1 -uuid 3d0e1603-ad08-4876-9d9f-2d563fac07ea -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=26,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,clock=vm,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/datastore/vm/dng-smokeping.raw,format=raw,if=none,id=drive-virtio-disk0,cache=writeback,aio=threads -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2,write-cache=on -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1d:da:b9,bus=pci.0,addr=0x3 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:59 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on
+
+
+Is there some way to turn on NBD trace?  I don't see any trace code around the assert, so I'm guessing it would need to be written....   Is there a log event in QMP?  Can that be used to trace what is going on?  If so it would be easy to make libvirt log all of that, which should tell us what is going on...
+
+If that won't work, I can run the VM outside of libvirt and tell it to migrate over the QMP socket
+
+
+I'm back to trying to figure this out.  I can't use migrate and copy storage until this bug is fixed, so I'm pretty motivated.  Today I configured libvirt/qemu to dump the core, and I compiled qemu with debugging symbols.  Here is the backtrace.  I'm not sure it says anything we don't already know.
+
+I may try to hack in some more debugging later today, but my C is terrible.  Any other ideas on ways I can help?
+
+(gdb) bt full
+#0  0x00007f1a6a3313f8 in raise () at /lib64/libc.so.6
+#1  0x00007f1a6a332ffa in abort () at /lib64/libc.so.6
+#2  0x00007f1a6a329c17 in __assert_fail_base () at /lib64/libc.so.6
+#3  0x00007f1a6a329cc2 in  () at /lib64/libc.so.6
+#4  0x000055a6cba705a6 in nbd_reply_chunk_iter_receive (s=s@entry=0x55a6ce458200, iter=iter@entry=0x7f1945fe8890, handle=handle@entry=94174913593865, qiov=qiov@entry=0x0, reply=0x7f1945fe8800,
+    reply@entry=0x0, payload=payload@entry=0x0) at block/nbd-client.c:606
+        local_reply =
+            {simple = {magic = 1732535960, error = 0, handle = 94174913593865}, structured = {magic = 1732535960, flags = 0, type = 0, handle = 94174913593865, length = 0}, {magic = 1732535960, _skip = 0, handle = 94174913593865}}
+        chunk = 0x7f1945fe8800
+        local_err = 0x0
+        __func__ = "nbd_reply_chunk_iter_receive"
+        __PRETTY_FUNCTION__ = "nbd_reply_chunk_iter_receive"
+#5  0x000055a6cba706d6 in nbd_co_request (errp=0x7f1945fe8888, handle=94174913593865, s=0x55a6ce458200) at block/nbd-client.c:634
+        iter = {ret = 0, fatal = false, err = 0x0, done = false, only_structured = true}
+        ret = <optimized out>
+        local_err = 0x0
+        client = 0x55a6ce458200
+        __PRETTY_FUNCTION__ = "nbd_co_request"
+#6  0x000055a6cba706d6 in nbd_co_request (bs=bs@entry=0x55a6ce450130, request=request@entry=0x7f1945fe88e0, write_qiov=write_qiov@entry=0x0) at block/nbd-client.c:772
+        ret = <optimized out>
+        local_err = 0x0
+        client = 0x55a6ce458200
+        __PRETTY_FUNCTION__ = "nbd_co_request"
+#7  0x000055a6cba70cb5 in nbd_client_co_pwrite_zeroes (bs=0x55a6ce450130, offset=2483027968, bytes=16777216, flags=<optimized out>) at block/nbd-client.c:860
+        client = <optimized out>
+        request = {handle = 94174913593865, from = 2483027968, len = 16777216, flags = 0, type = 6}
+        __PRETTY_FUNCTION__ = "nbd_client_co_pwrite_zeroes"
+#8  0x000055a6cba67f44 in bdrv_co_do_pwrite_zeroes (bs=bs@entry=0x55a6ce450130, offset=offset@entry=2483027968, bytes=bytes@entry=16777216, flags=flags@entry=6) at block/io.c:1410
+        num = 16777216
+        drv = 0x55a6cc3b0600 <bdrv_nbd_unix>
+        qiov = {iov = 0x100000, niov = -834338512, nalloc = 21926, size = 1831862272}
+        iov = {iov_base = 0x0, iov_len = 0}
+        ret = -95
+        need_flush = false
+        head = 0
+        tail = 0
+        max_write_zeroes = 33554432
+        alignment = 512
+        max_transfer = 16777216
+        __PRETTY_FUNCTION__ = "bdrv_co_do_pwrite_zeroes"
+#9  0x000055a6cba68373 in bdrv_aligned_pwritev (req=req@entry=0x7f1945fe8b50, offset=offset@entry=2483027968, bytes=bytes@entry=16777216, align=align@entry=512, qiov=0x0, flags=6, child=0x55a6ce333f50, child=0x55a6ce333f50)
+    at block/io.c:1522
+        bs = 0x55a6ce450130
+        drv = 0x55a6cc3b0600 <bdrv_nbd_unix>
+        waited = <optimized out>
+        ret = <optimized out>
+        end_sector = 4882432
+        bytes_remaining = 16777216
+        max_transfer = 33554432
+#10 0x000055a6cba69a42 in bdrv_co_pwritev (req=0x7f1945fe8b50, flags=6, bytes=16777216, offset=2483027968, child=0x55a6ce333f50) at block/io.c:1625
+        aligned_bytes = 16777216
+        bs = 0x55a6ce450130
+        buf = <optimized out>
+        tail_padding_bytes = 0
+---Type <return> to continue, or q <return> to quit---
+        local_qiov = {iov = 0x0, niov = 0, nalloc = 0, size = 1825570816}
+        align = 512
+        head_padding_bytes = <optimized out>
+        ret = 0
+        iov = {iov_base = 0x7f1945fe8bc0, iov_len = 1}
+        bs = 0x55a6ce450130
+        req =
+          {bs = 0x55a6ce450130, offset = 2483027968, bytes = 16777216, type = BDRV_TRACKED_WRITE, serialising = false, overlap_offset = 2483027968, overlap_bytes = 16777216, list = {le_next = 0x0, le_prev = 0x7f19452dbb80}, co = 0x7f1a5c003030, wait_queue = {entries = {sqh_first = 0x0, sqh_last = 0x7f1945fe8b98}}, waiting_for = 0x0}
+        align = <optimized out>
+        head_buf = 0x0
+        tail_buf = 0x0
+        local_qiov = {iov = 0x7f1945fe8bc0, niov = 1, nalloc = 0, size = 94171452932608}
+        use_local_qiov = false
+        ret = <optimized out>
+        __PRETTY_FUNCTION__ = "bdrv_co_pwritev"
+#11 0x000055a6cba69a42 in bdrv_co_pwritev (child=child@entry=0x55a6ce333f50, offset=offset@entry=2483027968, bytes=bytes@entry=16777216, qiov=qiov@entry=0x0, flags=flags@entry=6) at block/io.c:1698
+        bs = 0x55a6ce450130
+        req =
+          {bs = 0x55a6ce450130, offset = 2483027968, bytes = 16777216, type = BDRV_TRACKED_WRITE, serialising = false, overlap_offset = 2483027968, overlap_bytes = 16777216, list = {le_next = 0x0, le_prev = 0x7f19452dbb80}, co = 0x7f1a5c003030, wait_queue = {entries = {sqh_first = 0x0, sqh_last = 0x7f1945fe8b98}}, waiting_for = 0x0}
+        align = <optimized out>
+        head_buf = 0x0
+        tail_buf = 0x0
+        local_qiov = {iov = 0x7f1945fe8bc0, niov = 1, nalloc = 0, size = 94171452932608}
+        use_local_qiov = false
+        ret = <optimized out>
+        __PRETTY_FUNCTION__ = "bdrv_co_pwritev"
+#12 0x000055a6cba6a3be in bdrv_co_pwrite_zeroes (child=0x55a6ce333f50, offset=2483027968, bytes=16777216, flags=<optimized out>) at block/io.c:1822
+#13 0x000055a6cba67f44 in bdrv_co_do_pwrite_zeroes (bs=bs@entry=0x55a6ce35dd80, offset=offset@entry=2483027968, bytes=bytes@entry=16777216, flags=flags@entry=6) at block/io.c:1410
+        num = 16777216
+        drv = 0x55a6cc3aa5a0 <bdrv_raw>
+        qiov = {iov = 0x6d400000, niov = -878284120, nalloc = 21926, size = 1}
+        iov = {iov_base = 0x0, iov_len = 0}
+        ret = -95
+        need_flush = false
+        head = 0
+        tail = 0
+        max_write_zeroes = 2147483647
+        alignment = 1
+        max_transfer = 16777216
+        __PRETTY_FUNCTION__ = "bdrv_co_do_pwrite_zeroes"
+#14 0x000055a6cba68373 in bdrv_aligned_pwritev (req=req@entry=0x7f1945fe8e90, offset=offset@entry=2483027968, bytes=bytes@entry=16777216, align=align@entry=1, qiov=0x0, flags=6, child=0x55a6ce457960, child=0x55a6ce457960)
+    at block/io.c:1522
+        bs = 0x55a6ce35dd80
+        drv = 0x55a6cc3aa5a0 <bdrv_raw>
+        waited = <optimized out>
+        ret = <optimized out>
+        end_sector = 4882432
+        bytes_remaining = 16777216
+        max_transfer = 33554432
+#15 0x000055a6cba69a42 in bdrv_co_pwritev (req=0x7f1945fe8e90, flags=6, bytes=16777216, offset=2483027968, child=0x55a6ce457960) at block/io.c:1625
+        aligned_bytes = 16777216
+        bs = 0x55a6ce35dd80
+        buf = <optimized out>
+---Type <return> to continue, or q <return> to quit---
+        tail_padding_bytes = 0
+        local_qiov = {iov = 0x55a6ce3f8b40, niov = 1543529136, nalloc = 32538, size = 139746525220544}
+        align = 1
+        head_padding_bytes = <optimized out>
+        ret = 0
+        iov = {iov_base = 0x7f1a5c003030, iov_len = 94174870259312}
+        bs = 0x55a6ce35dd80
+        req =
+          {bs = 0x55a6ce35dd80, offset = 2483027968, bytes = 16777216, type = BDRV_TRACKED_WRITE, serialising = false, overlap_offset = 2483027968, overlap_bytes = 16777216, list = {le_next = 0x0, le_prev = 0x7f19452dbec0}, co = 0x7f1a5c003030, wait_queue = {entries = {sqh_first = 0x0, sqh_last = 0x7f1945fe8ed8}}, waiting_for = 0x0}
+        align = <optimized out>
+        head_buf = 0x0
+        tail_buf = 0x0
+        local_qiov = {iov = 0x7f1a5c003030, niov = -877640080, nalloc = 21926, size = 139751189406384}
+        use_local_qiov = false
+        ret = <optimized out>
+        __PRETTY_FUNCTION__ = "bdrv_co_pwritev"
+#16 0x000055a6cba69a42 in bdrv_co_pwritev (child=0x55a6ce457960, offset=offset@entry=2483027968, bytes=bytes@entry=16777216, qiov=qiov@entry=0x0, flags=6) at block/io.c:1698
+        bs = 0x55a6ce35dd80
+        req =
+          {bs = 0x55a6ce35dd80, offset = 2483027968, bytes = 16777216, type = BDRV_TRACKED_WRITE, serialising = false, overlap_offset = 2483027968, overlap_bytes = 16777216, list = {le_next = 0x0, le_prev = 0x7f19452dbec0}, co = 0x7f1a5c003030, wait_queue = {entries = {sqh_first = 0x0, sqh_last = 0x7f1945fe8ed8}}, waiting_for = 0x0}
+        align = <optimized out>
+        head_buf = 0x0
+        tail_buf = 0x0
+        local_qiov = {iov = 0x7f1a5c003030, niov = -877640080, nalloc = 21926, size = 139751189406384}
+        use_local_qiov = false
+        ret = <optimized out>
+        __PRETTY_FUNCTION__ = "bdrv_co_pwritev"
+#17 0x000055a6cba5898b in blk_co_pwritev (blk=0x55a6ce4576b0, offset=2483027968, bytes=16777216, qiov=0x0, flags=<optimized out>) at block/block-backend.c:1188
+        ret = <optimized out>
+        bs = 0x55a6ce35dd80
+#18 0x000055a6cba58a9b in blk_aio_write_entry (opaque=0x7f1a5c0084b0) at block/block-backend.c:1394
+        acb = 0x7f1a5c0084b0
+        rwco = 0x7f1a5c0084d8
+        qiov = <optimized out>
+#19 0x000055a6cbb046ba in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116
+        co = 0x7f1a5c003030
+#20 0x00007f1a6a346560 in __start_context () at /lib64/libc.so.6
+#21 0x00007f1a617a3fb0 in  ()
+#22 0x0000000000000000 in  ()
+
+From the core:
+
+structured = {magic = 1732535960, flags = 0, type = 0, handle = 94174913593865, length = 0}
+
+You would think that would pass:
+
+    chunk = &reply->structured;
+
+    if (chunk->type == NBD_REPLY_TYPE_NONE) {
+        /* NBD_REPLY_FLAG_DONE is already checked in nbd_co_receive_one_chunk */
+        assert(chunk->flags & NBD_REPLY_FLAG_DONE);
+        goto break_loop;
+    }
+
+Given:
+
+#define NBD_REPLY_TYPE_NONE          0
+
+
+Perhaps this is a problem with my compiler.  (or maybe it's an ignorant guess) I'm using:
+
+gcc version 5.5.0 (GCC)
+
+
+
+Okay, this is probably a race condition bug.  If remove:
+
+<iothreads>1</iothreads>
+and
+iothread='1' from the disk which causes the command to change from: 
+
+-device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2,write-cache=on
+
+to
+
+-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2,write-cache=on
+
+I don't get crashes anymore.  
+
+So for sure it has something to do with iothreads.
+
+
+The QEMU project is currently considering to move its bug tracking to another system. For this we need to know which bugs are still valid and which could be closed already. Thus we are setting older bugs to "Incomplete" now.
+If you still think this bug report here is valid, then please switch the state back to "New" within the next 60 days, otherwise this report will be marked as "Expired". Or mark it as "Fix Released" if the problem has been solved with a newer version of QEMU already. Thank you and sorry for the inconvenience.
+
+[Expired for QEMU because there has been no activity for 60 days.]
+