mistranslation: 0.542
semantic: 0.526
KVM: 0.504
other: 0.491
graphic: 0.477
vnc: 0.450
assembly: 0.449
boot: 0.400
instruction: 0.398
network: 0.398
device: 0.371
socket: 0.291

Crash with nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed

Qemu version on both sides: 2.12.1
Host A Linux: 4.9.76
Host B Linux: 4.14.67

While calling from Host A:
virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource --persistent --verbose --copy-storage-all

I get a qemu crash with:

2018-09-21 16:12:23.073+0000: 14428: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f922c03d990
qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed.
2018-09-21 16:12:41.230+0000: shutting down, reason=crashed
2018-09-21 16:12:52.900+0000: shutting down, reason=failed

It doesn't happen every time, but it does most of the time.

Tested with Qemu 3.0.0 and this still happens.

Also tested with kernel 4.9.128 on one side and 4.9.76 on the other, thinking it might be a kernel 4.14 issue.

On 9/21/18 11:42 AM, Matthew Schumacher wrote:
> Public bug reported:
> 
> Qemu version on both sides: 2.12.1
> Host A Linux: 4.9.76
> Host B Linux: 4.14.67
> 
> While calling from Host A:
> virsh migrate virtualmachine qemu+ssh://hostB/system --live --undefinesource --persistent --verbose --copy-storage-all

Can you show the qemu command line generated by libvirt?

> 
> I get a qemu crash with:
> 
> 2018-09-21 16:12:23.073+0000: 14428: info : virObjectUnref:350 : OBJECT_UNREF: obj=0x7f922c03d990
> qemu-system-x86_64: block/nbd-client.c:606: nbd_reply_chunk_iter_receive: Assertion `chunk->flags & NBD_REPLY_FLAG_DONE' failed.

That assertion is the client complaining that it thinks the server has 
sent invalid data.

> 2018-09-21 16:12:41.230+0000: shutting down, reason=crashed
> 2018-09-21 16:12:52.900+0000: shutting down, reason=failed
> 
> It doesn't happen every time, but it does most of the time.

Hmm - you're running under libvirt, so it's harder to inject command 
line options, but I'd be interested in figuring out how to enable trace 
output of nbd_* on both the server and the destination during the block 
migration portion of 'virsh migrate', to see if the tail of that trace 
when reproducing the failure gives any more insights into which side is 
breaking the protocol.  I'm also trying to inspect the code in 
nbd/server.c to see if the server can ever actually send a 
NBD_REPLY_TYPE_NONE packet without setting the NBD_REPLY_FLAG_DONE bit, 
which is what the assertion is complaining about.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


Hi Eric,

Thanks for looking at this.  

I looked at the nbd/server.c code and couldn't see how it could send a NBD_REPLY_TYPE_NONE packet without setting the NBD_REPLY_FLAG_DONE bit.  The only place NBD_REPLY_TYPE_NONE is set is on line 1603:

    set_be_chunk(&chunk, NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE, handle, 0);
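
For context, here is a minimal sketch of the two NBD reply headers involved, with the field layout taken from the NBD protocol spec. The struct and constant names only approximate QEMU's include/block/nbd.h, so treat them as illustrative; on the wire these are packed and big-endian, which the sketch ignores:

    #include <stdint.h>

    /* Sketch of the NBD reply headers, per the NBD protocol spec.
     * Names approximate QEMU's include/block/nbd.h. */
    #define NBD_SIMPLE_REPLY_MAGIC      0x67446698u
    #define NBD_STRUCTURED_REPLY_MAGIC  0x668e33efu

    #define NBD_REPLY_FLAG_DONE         (1 << 0)  /* chunk is the last one */
    #define NBD_REPLY_TYPE_NONE         0         /* no payload; must have DONE set */

    typedef struct NBDSimpleReply {
        uint32_t magic;   /* NBD_SIMPLE_REPLY_MAGIC */
        uint32_t error;   /* 0 on success */
        uint64_t handle;  /* handle echoed from the request */
    } NBDSimpleReply;

    typedef struct NBDStructuredReplyChunk {
        uint32_t magic;   /* NBD_STRUCTURED_REPLY_MAGIC */
        uint16_t flags;   /* combination of NBD_REPLY_FLAG_* */
        uint16_t type;    /* NBD_REPLY_TYPE_* */
        uint64_t handle;  /* handle echoed from the request */
        uint32_t length;  /* length of payload following the header */
    } NBDStructuredReplyChunk;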

Anyway, here is the command line generated:

/usr/bin/qemu-system-x86_64 -name guest=dng-smokeping,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-7-dng-smokeping/master-key.aes -machine pc-1.1,accel=kvm,usb=off,dump-guest-core=off -cpu qemu64,pmu=off -m 4096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -object iothread,id=iothread1 -uuid 3d0e1603-ad08-4876-9d9f-2d563fac07ea -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=26,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime,clock=vm,driftfix=slew -global kvm-pit.lost_tick_policy=delay -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/datastore/vm/dng-smokeping.raw,format=raw,if=none,id=drive-virtio-disk0,cache=writeback,aio=threads -device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2,write-cache=on -drive if=none,id=drive-ide0-1-0,readonly=on -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0,bootindex=1 -netdev tap,fd=28,id=hostnet0,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1d:da:b9,bus=pci.0,addr=0x3 -device usb-tablet,id=input0,bus=usb.0,port=1 -vnc 0.0.0.0:59 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on


Is there some way to turn on NBD tracing?  I don't see any trace code around the assert, so I'm guessing it would need to be written.  Is there a log event in QMP that could be used to trace this?  If so, it would be easy to make libvirt log all of that, which should tell us what is going on.

If that won't work, I can run the VM outside of libvirt and tell it to migrate over the QMP socket.


I'm back to trying to figure this out.  I can't use migration with --copy-storage-all until this bug is fixed, so I'm pretty motivated.  Today I configured libvirt/qemu to dump core, and I compiled qemu with debugging symbols.  Here is the backtrace; I'm not sure it says anything we don't already know.

I may try to hack in some more debugging later today, but my C is terrible.  Any other ideas on ways I can help?

(gdb) bt full
#0  0x00007f1a6a3313f8 in raise () at /lib64/libc.so.6
#1  0x00007f1a6a332ffa in abort () at /lib64/libc.so.6
#2  0x00007f1a6a329c17 in __assert_fail_base () at /lib64/libc.so.6
#3  0x00007f1a6a329cc2 in  () at /lib64/libc.so.6
#4  0x000055a6cba705a6 in nbd_reply_chunk_iter_receive (s=s@entry=0x55a6ce458200, iter=iter@entry=0x7f1945fe8890, handle=handle@entry=94174913593865, qiov=qiov@entry=0x0, reply=0x7f1945fe8800,
    reply@entry=0x0, payload=payload@entry=0x0) at block/nbd-client.c:606
        local_reply =
            {simple = {magic = 1732535960, error = 0, handle = 94174913593865}, structured = {magic = 1732535960, flags = 0, type = 0, handle = 94174913593865, length = 0}, {magic = 1732535960, _skip = 0, handle = 94174913593865}}
        chunk = 0x7f1945fe8800
        local_err = 0x0
        __func__ = "nbd_reply_chunk_iter_receive"
        __PRETTY_FUNCTION__ = "nbd_reply_chunk_iter_receive"
#5  0x000055a6cba706d6 in nbd_co_request (errp=0x7f1945fe8888, handle=94174913593865, s=0x55a6ce458200) at block/nbd-client.c:634
        iter = {ret = 0, fatal = false, err = 0x0, done = false, only_structured = true}
        ret = <optimized out>
        local_err = 0x0
        client = 0x55a6ce458200
        __PRETTY_FUNCTION__ = "nbd_co_request"
#6  0x000055a6cba706d6 in nbd_co_request (bs=bs@entry=0x55a6ce450130, request=request@entry=0x7f1945fe88e0, write_qiov=write_qiov@entry=0x0) at block/nbd-client.c:772
        ret = <optimized out>
        local_err = 0x0
        client = 0x55a6ce458200
        __PRETTY_FUNCTION__ = "nbd_co_request"
#7  0x000055a6cba70cb5 in nbd_client_co_pwrite_zeroes (bs=0x55a6ce450130, offset=2483027968, bytes=16777216, flags=<optimized out>) at block/nbd-client.c:860
        client = <optimized out>
        request = {handle = 94174913593865, from = 2483027968, len = 16777216, flags = 0, type = 6}
        __PRETTY_FUNCTION__ = "nbd_client_co_pwrite_zeroes"
#8  0x000055a6cba67f44 in bdrv_co_do_pwrite_zeroes (bs=bs@entry=0x55a6ce450130, offset=offset@entry=2483027968, bytes=bytes@entry=16777216, flags=flags@entry=6) at block/io.c:1410
        num = 16777216
        drv = 0x55a6cc3b0600 <bdrv_nbd_unix>
        qiov = {iov = 0x100000, niov = -834338512, nalloc = 21926, size = 1831862272}
        iov = {iov_base = 0x0, iov_len = 0}
        ret = -95
        need_flush = false
        head = 0
        tail = 0
        max_write_zeroes = 33554432
        alignment = 512
        max_transfer = 16777216
        __PRETTY_FUNCTION__ = "bdrv_co_do_pwrite_zeroes"
#9  0x000055a6cba68373 in bdrv_aligned_pwritev (req=req@entry=0x7f1945fe8b50, offset=offset@entry=2483027968, bytes=bytes@entry=16777216, align=align@entry=512, qiov=0x0, flags=6, child=0x55a6ce333f50, child=0x55a6ce333f50)
    at block/io.c:1522
        bs = 0x55a6ce450130
        drv = 0x55a6cc3b0600 <bdrv_nbd_unix>
        waited = <optimized out>
        ret = <optimized out>
        end_sector = 4882432
        bytes_remaining = 16777216
        max_transfer = 33554432
#10 0x000055a6cba69a42 in bdrv_co_pwritev (req=0x7f1945fe8b50, flags=6, bytes=16777216, offset=2483027968, child=0x55a6ce333f50) at block/io.c:1625
        aligned_bytes = 16777216
        bs = 0x55a6ce450130
        buf = <optimized out>
        tail_padding_bytes = 0
        local_qiov = {iov = 0x0, niov = 0, nalloc = 0, size = 1825570816}
        align = 512
        head_padding_bytes = <optimized out>
        ret = 0
        iov = {iov_base = 0x7f1945fe8bc0, iov_len = 1}
        bs = 0x55a6ce450130
        req =
          {bs = 0x55a6ce450130, offset = 2483027968, bytes = 16777216, type = BDRV_TRACKED_WRITE, serialising = false, overlap_offset = 2483027968, overlap_bytes = 16777216, list = {le_next = 0x0, le_prev = 0x7f19452dbb80}, co = 0x7f1a5c003030, wait_queue = {entries = {sqh_first = 0x0, sqh_last = 0x7f1945fe8b98}}, waiting_for = 0x0}
        align = <optimized out>
        head_buf = 0x0
        tail_buf = 0x0
        local_qiov = {iov = 0x7f1945fe8bc0, niov = 1, nalloc = 0, size = 94171452932608}
        use_local_qiov = false
        ret = <optimized out>
        __PRETTY_FUNCTION__ = "bdrv_co_pwritev"
#11 0x000055a6cba69a42 in bdrv_co_pwritev (child=child@entry=0x55a6ce333f50, offset=offset@entry=2483027968, bytes=bytes@entry=16777216, qiov=qiov@entry=0x0, flags=flags@entry=6) at block/io.c:1698
        bs = 0x55a6ce450130
        req =
          {bs = 0x55a6ce450130, offset = 2483027968, bytes = 16777216, type = BDRV_TRACKED_WRITE, serialising = false, overlap_offset = 2483027968, overlap_bytes = 16777216, list = {le_next = 0x0, le_prev = 0x7f19452dbb80}, co = 0x7f1a5c003030, wait_queue = {entries = {sqh_first = 0x0, sqh_last = 0x7f1945fe8b98}}, waiting_for = 0x0}
        align = <optimized out>
        head_buf = 0x0
        tail_buf = 0x0
        local_qiov = {iov = 0x7f1945fe8bc0, niov = 1, nalloc = 0, size = 94171452932608}
        use_local_qiov = false
        ret = <optimized out>
        __PRETTY_FUNCTION__ = "bdrv_co_pwritev"
#12 0x000055a6cba6a3be in bdrv_co_pwrite_zeroes (child=0x55a6ce333f50, offset=2483027968, bytes=16777216, flags=<optimized out>) at block/io.c:1822
#13 0x000055a6cba67f44 in bdrv_co_do_pwrite_zeroes (bs=bs@entry=0x55a6ce35dd80, offset=offset@entry=2483027968, bytes=bytes@entry=16777216, flags=flags@entry=6) at block/io.c:1410
        num = 16777216
        drv = 0x55a6cc3aa5a0 <bdrv_raw>
        qiov = {iov = 0x6d400000, niov = -878284120, nalloc = 21926, size = 1}
        iov = {iov_base = 0x0, iov_len = 0}
        ret = -95
        need_flush = false
        head = 0
        tail = 0
        max_write_zeroes = 2147483647
        alignment = 1
        max_transfer = 16777216
        __PRETTY_FUNCTION__ = "bdrv_co_do_pwrite_zeroes"
#14 0x000055a6cba68373 in bdrv_aligned_pwritev (req=req@entry=0x7f1945fe8e90, offset=offset@entry=2483027968, bytes=bytes@entry=16777216, align=align@entry=1, qiov=0x0, flags=6, child=0x55a6ce457960, child=0x55a6ce457960)
    at block/io.c:1522
        bs = 0x55a6ce35dd80
        drv = 0x55a6cc3aa5a0 <bdrv_raw>
        waited = <optimized out>
        ret = <optimized out>
        end_sector = 4882432
        bytes_remaining = 16777216
        max_transfer = 33554432
#15 0x000055a6cba69a42 in bdrv_co_pwritev (req=0x7f1945fe8e90, flags=6, bytes=16777216, offset=2483027968, child=0x55a6ce457960) at block/io.c:1625
        aligned_bytes = 16777216
        bs = 0x55a6ce35dd80
        buf = <optimized out>
        tail_padding_bytes = 0
        local_qiov = {iov = 0x55a6ce3f8b40, niov = 1543529136, nalloc = 32538, size = 139746525220544}
        align = 1
        head_padding_bytes = <optimized out>
        ret = 0
        iov = {iov_base = 0x7f1a5c003030, iov_len = 94174870259312}
        bs = 0x55a6ce35dd80
        req =
          {bs = 0x55a6ce35dd80, offset = 2483027968, bytes = 16777216, type = BDRV_TRACKED_WRITE, serialising = false, overlap_offset = 2483027968, overlap_bytes = 16777216, list = {le_next = 0x0, le_prev = 0x7f19452dbec0}, co = 0x7f1a5c003030, wait_queue = {entries = {sqh_first = 0x0, sqh_last = 0x7f1945fe8ed8}}, waiting_for = 0x0}
        align = <optimized out>
        head_buf = 0x0
        tail_buf = 0x0
        local_qiov = {iov = 0x7f1a5c003030, niov = -877640080, nalloc = 21926, size = 139751189406384}
        use_local_qiov = false
        ret = <optimized out>
        __PRETTY_FUNCTION__ = "bdrv_co_pwritev"
#16 0x000055a6cba69a42 in bdrv_co_pwritev (child=0x55a6ce457960, offset=offset@entry=2483027968, bytes=bytes@entry=16777216, qiov=qiov@entry=0x0, flags=6) at block/io.c:1698
        bs = 0x55a6ce35dd80
        req =
          {bs = 0x55a6ce35dd80, offset = 2483027968, bytes = 16777216, type = BDRV_TRACKED_WRITE, serialising = false, overlap_offset = 2483027968, overlap_bytes = 16777216, list = {le_next = 0x0, le_prev = 0x7f19452dbec0}, co = 0x7f1a5c003030, wait_queue = {entries = {sqh_first = 0x0, sqh_last = 0x7f1945fe8ed8}}, waiting_for = 0x0}
        align = <optimized out>
        head_buf = 0x0
        tail_buf = 0x0
        local_qiov = {iov = 0x7f1a5c003030, niov = -877640080, nalloc = 21926, size = 139751189406384}
        use_local_qiov = false
        ret = <optimized out>
        __PRETTY_FUNCTION__ = "bdrv_co_pwritev"
#17 0x000055a6cba5898b in blk_co_pwritev (blk=0x55a6ce4576b0, offset=2483027968, bytes=16777216, qiov=0x0, flags=<optimized out>) at block/block-backend.c:1188
        ret = <optimized out>
        bs = 0x55a6ce35dd80
#18 0x000055a6cba58a9b in blk_aio_write_entry (opaque=0x7f1a5c0084b0) at block/block-backend.c:1394
        acb = 0x7f1a5c0084b0
        rwco = 0x7f1a5c0084d8
        qiov = <optimized out>
#19 0x000055a6cbb046ba in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at util/coroutine-ucontext.c:116
        co = 0x7f1a5c003030
#20 0x00007f1a6a346560 in __start_context () at /lib64/libc.so.6
#21 0x00007f1a617a3fb0 in  ()
#22 0x0000000000000000 in  ()

From the core:

structured = {magic = 1732535960, flags = 0, type = 0, handle = 94174913593865, length = 0}

You would think that would pass:

    chunk = &reply->structured;

    if (chunk->type == NBD_REPLY_TYPE_NONE) {
        /* NBD_REPLY_FLAG_DONE is already checked in nbd_co_receive_one_chunk */
        assert(chunk->flags & NBD_REPLY_FLAG_DONE);
        goto break_loop;
    }

Given:

#define NBD_REPLY_TYPE_NONE          0


Perhaps this is a problem with my compiler (or maybe that's an ignorant guess).  I'm using:

gcc version 5.5.0 (GCC)
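
A hedged side observation on the values above (my reading, using the reply magic constants from the NBD protocol spec): 1732535960 is 0x67446698, which is the simple reply magic, not the structured reply magic 0x668e33ef.  A simple reply parsed through the structured layout lines its zero error field up with flags and type, giving exactly the flags = 0, type = 0 seen in the core:

    #include <inttypes.h>
    #include <stdio.h>

    /* Sanity check on the magic value from the core dump above.
     * Magic constants per the NBD protocol spec. */
    #define NBD_SIMPLE_REPLY_MAGIC      0x67446698u
    #define NBD_STRUCTURED_REPLY_MAGIC  0x668e33efu

    int main(void)
    {
        uint32_t magic = 1732535960u;  /* reply.magic from the core */

        printf("magic = 0x%08" PRIx32 "\n", magic);  /* 0x67446698 */
        printf("matches simple magic:     %d\n",
               magic == NBD_SIMPLE_REPLY_MAGIC);      /* prints 1 */
        printf("matches structured magic: %d\n",
               magic == NBD_STRUCTURED_REPLY_MAGIC);  /* prints 0 */
        return 0;
    }

If that reading is right, the compiler probably isn't at fault: the client received a simple reply while expecting structured chunks (frame #5 shows only_structured = true), which fits replies getting matched to the wrong request.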



Okay, this is probably a race condition bug.  If I remove:

<iothreads>1</iothreads>

and the iothread='1' attribute from the disk, which changes the command line from:

-device virtio-blk-pci,iothread=iothread1,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2,write-cache=on

to

-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=2,write-cache=on

I don't get crashes anymore.  

So it definitely has something to do with iothreads.


The QEMU project is currently considering moving its bug tracking to another system. For this we need to know which bugs are still valid and which could be closed already. Thus we are setting older bugs to "Incomplete" now.
If you still think this bug report is valid, then please switch the state back to "New" within the next 60 days, otherwise this report will be marked as "Expired". Or mark it as "Fix Released" if the problem has already been solved with a newer version of QEMU. Thank you and sorry for the inconvenience.

[Expired for QEMU because there has been no activity for 60 days.]