live migration intermittently fails in CI with "VQ 0 size 0x80 Guest index 0x12c inconsistent with Host index 0x134: delta 0xfff8"

Seen here:

http://logs.openstack.org/37/522537/20/check/legacy-tempest-dsvm-multinode-live-migration/8de6e74/logs/subnode-2/libvirt/qemu/instance-00000002.txt.gz

2018-04-05T21:48:38.205752Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on: char device redirected to /dev/pts/0 (label charserial0)
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
2018-04-05T21:48:43.153268Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 0x12c inconsistent with Host index 0x134: delta 0xfff8
2018-04-05T21:48:43.153288Z qemu-system-x86_64: Failed to load virtio-blk:virtio
2018-04-05T21:48:43.153292Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:04.0/virtio-blk'
2018-04-05T21:48:43.153347Z qemu-system-x86_64: load of migration failed: Operation not permitted
2018-04-05 21:48:43.198+0000: shutting down, reason=crashed
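
For reference, the "delta" in that error is plain 16-bit wraparound arithmetic: virtio ring indices are free-running uint16_t counters, and on load QEMU refuses the stream when the distance between the avail index recovered from migrated guest RAM ("Guest index") and the index saved in the device state ("Host index") exceeds the ring size. A minimal C sketch of that check, using the values from the log above (a simplified model for illustration, with illustrative variable names, not the literal QEMU source):

----------------------------------------------------------------------------
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t vring_num   = 0x80;  /* "VQ 0 size 0x80" */
    uint16_t guest_index = 0x12c; /* avail index read from migrated RAM */
    uint16_t host_index  = 0x134; /* device's saved progress counter */

    /* The counters wrap modulo 2^16, so 0x12c - 0x134 == 0xfff8 (-8). */
    uint16_t delta = guest_index - host_index;

    /* A sane snapshot has at most vring_num requests outstanding. */
    if (delta > vring_num) {
        printf("VQ 0 size 0x%x Guest index 0x%x inconsistent with "
               "Host index 0x%x: delta 0x%x\n",
               vring_num, guest_index, host_index, delta);
        return 1;
    }
    return 0;
}
----------------------------------------------------------------------------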

And in the n-cpu logs on the other host:

http://logs.openstack.org/37/522537/20/check/legacy-tempest-dsvm-multinode-live-migration/8de6e74/logs/screen-n-cpu.txt.gz#_Apr_05_21_48_43_257541

There is a related Red Hat bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1450524

The failing CI jobs are at present using the Pike UCA:

ii  libvirt-bin                         3.6.0-1ubuntu6.2~cloud0
ii  qemu-system-x86                     1:2.10+dfsg-0ubuntu3.5~cloud0

Maybe when we move to use the Queens UCA we'll have better luck with this:

https://review.openstack.org/#/c/554314/

That has these package versions:

ii  libvirt-bin                         4.0.0-1ubuntu4~cloud0
ii  qemu-system-x86                     1:2.11+dfsg-1ubuntu2~cloud0

The full QEMU log of the offending instance that crashed:

http://logs.openstack.org/37/522537/20/check/legacy-tempest-dsvm-multinode-live-migration/8de6e74/logs/subnode-2/libvirt/qemu/instance-00000002.txt.gz

----------------------------------------------------------------------------
2018-04-05 21:48:38.136+0000: starting up libvirt version: 3.6.0, package: 1ubuntu6.2~cloud0 (Openstack Ubuntu Testing Bot <email address hidden> Wed, 07 Feb 2018 20:05:24 +0000), qemu version: 2.10.1(Debian 1:2.10+dfsg-0ubuntu3.5~cloud0), hostname: ubuntu-xenial-ovh-bhs1-0003361707
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name guest=instance-00000002,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-instance-00000002/master-key.aes -machine pc-i440fx-artful,accel=tcg,usb=off,dump-guest-core=off -m 64 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid e70df34f-395e-4482-9c24-101f12cf635d -smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=17.0.0,serial=97e1e00c-8afb-4356-b55e-92e997c5a1a7,uuid=e70df34f-395e-4482-9c24-101f12cf635d,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-instance-00000002/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/opt/stack/data/nova/instances/e70df34f-395e-4482-9c24-101f12cf635d/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=29,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:15:ea:59,bus=pci.0,addr=0x3 -add-fd set=1,fd=32 -chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2018-04-05T21:48:38.205752Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on: char device redirected to /dev/pts/0 (label charserial0)
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
2018-04-05T21:48:43.153268Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 0x12c inconsistent with Host index 0x134: delta 0xfff8
2018-04-05T21:48:43.153288Z qemu-system-x86_64: Failed to load virtio-blk:virtio
2018-04-05T21:48:43.153292Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:04.0/virtio-blk'
2018-04-05T21:48:43.153347Z qemu-system-x86_64: load of migration failed: Operation not permitted
2018-04-05 21:48:43.198+0000: shutting down, reason=crashed
----------------------------------------------------------------------------



I'm not sure that's the same as bz 1450524 - in this case it's virtio-blk, where I *think* that bz was tracked down to a virtio-balloon and a virtio-net issue.

Hm, looks like the e-r query [1] for this bug doesn't find it if the screen-n-cpu.txt is on subnode-2:

http://logs.openstack.org/25/566425/4/gate/nova-live-migration/27afb39/logs/subnode-2/screen-n-cpu.txt.gz?level=TRACE#_May_15_00_54_26_234161

[1] https://github.com/openstack-infra/elastic-recheck/blob/master/queries/1761798.yaml

Ran into a similar issue: when snapshotting an instance, it won't start afterwards.
The root cause is an inconsistency between the state of the virtio-scsi device and the saved VM RAM state in /var/lib/libvirt/qemu/save/.
Removing the instance-0000****.save file from this folder allows starting the VM back up, but may cause data corruption.
Specifics of my case: using the virtio-scsi driver with discard enabled; the backend is Ceph. Impacted 2 Windows VMs so far.
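(Note: libvirt can discard that saved image with 'virsh managedsave-remove <domain>', which is safer than deleting the .save file by hand; either way the next start is a cold boot, with the same risk of losing in-flight guest writes.)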

I also think this is a qemu bug; I will do some tests and try to reproduce.

We hit a very similar issue: https://zuul.opendev.org/t/openstack/build/d50877ae15db4022b82f4bb1d1d52cea/log/logs/subnode-2/screen-n-cpu.txt?severity=0#13482

Source node 
https://zuul.opendev.org/t/openstack/build/d50877ae15db4022b82f4bb1d1d52cea/log/logs/subnode-2/libvirt/qemu/instance-0000001a.txt

2020-11-20 14:25:24.887+0000: starting up libvirt version: 6.0.0, package: 0ubuntu8.4~cloud0 (Openstack Ubuntu Testing Bot <email address hidden> Tue, 15 Sep 2020 20:36:28 +0000), qemu version: 4.2.1Debian 1:4.2-3ubuntu6.7~cloud0, kernel: 4.15.0-124-generic, hostname: ubuntu-bionic-ovh-bhs1-0021872195
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
HOME=/var/lib/libvirt/qemu/domain-19-instance-0000001a \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-19-instance-0000001a/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-19-instance-0000001a/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-19-instance-0000001a/.config \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-x86_64 \
-name guest=instance-0000001a,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-19-instance-0000001a/master-key.aes \
-machine pc-i440fx-4.2,accel=tcg,usb=off,dump-guest-core=off \
-cpu qemu64,hypervisor=on,lahf-lm=on \
-m 128 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid 2c468d92-4b19-426a-8c25-16b4624c21a4 \
-smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=22.1.0,serial=2c468d92-4b19-426a-8c25-16b4624c21a4,uuid=2c468d92-4b19-426a-8c25-16b4624c21a4,family=Virtual Machine' \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=35,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-blockdev '{"driver":"file","filename":"/opt/stack/data/nova/instances/_base/61bd5e531ab4c82456aa5300ede7266b3610be79","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-blockdev '{"driver":"file","filename":"/opt/stack/data/nova/instances/2c468d92-4b19-426a-8c25-16b4624c21a4/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=37,id=hostnet0 \
-device virtio-net-pci,host_mtu=1400,netdev=hostnet0,id=net0,mac=fa:16:3e:43:11:f4,bus=pci.0,addr=0x3 \
-add-fd set=2,fd=39 \
-chardev pty,id=charserial0,logfile=/dev/fdset/2,logappend=on \
-device isa-serial,chardev=charserial0,id=serial0 \
-vnc 127.0.0.1:0 \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-incoming defer \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/2 (label charserial0)
virtio: bogus descriptor or out of resources
2020-11-20 14:25:39.911+0000: initiating migration
2020-11-20T14:26:21.409517Z qemu-system-x86_64: terminating on signal 15 from pid 17395 (/usr/sbin/libvirtd)
2020-11-20 14:26:21.610+0000: shutting down, reason=destroyed


Target node
https://zuul.opendev.org/t/openstack/build/d50877ae15db4022b82f4bb1d1d52cea/log/logs/libvirt/qemu/instance-0000001a.txt

2020-11-20 14:25:11.589+0000: starting up libvirt version: 6.0.0, package: 0ubuntu8.4~cloud0 (Openstack Ubuntu Testing Bot <email address hidden> Tue, 15 Sep 2020 20:36:28 +0000), qemu version: 4.2.1Debian 1:4.2-3ubuntu6.7~cloud0, kernel: 4.15.0-124-generic, hostname: ubuntu-bionic-ovh-bhs1-0021872194
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
HOME=/var/lib/libvirt/qemu/domain-10-instance-0000001a \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-10-instance-0000001a/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-10-instance-0000001a/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-10-instance-0000001a/.config \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-x86_64 \
-name guest=instance-0000001a,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-10-instance-0000001a/master-key.aes \
-machine pc-i440fx-4.2,accel=tcg,usb=off,dump-guest-core=off \
-cpu qemu64 \
-m 128 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid 2c468d92-4b19-426a-8c25-16b4624c21a4 \
-smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=22.1.0,serial=2c468d92-4b19-426a-8c25-16b4624c21a4,uuid=2c468d92-4b19-426a-8c25-16b4624c21a4,family=Virtual Machine' \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=32,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-blockdev '{"driver":"file","filename":"/opt/stack/data/nova/instances/_base/61bd5e531ab4c82456aa5300ede7266b3610be79","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-blockdev '{"driver":"file","filename":"/opt/stack/data/nova/instances/2c468d92-4b19-426a-8c25-16b4624c21a4/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=34,id=hostnet0 \
-device virtio-net-pci,host_mtu=1400,netdev=hostnet0,id=net0,mac=fa:16:3e:43:11:f4,bus=pci.0,addr=0x3 \
-add-fd set=2,fd=36 \
-chardev pty,id=charserial0,logfile=/dev/fdset/2,logappend=on \
-device isa-serial,chardev=charserial0,id=serial0 \
-vnc 0.0.0.0:0 \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/0 (label charserial0)
2020-11-20 14:25:25.637+0000: initiating migration
2020-11-20 14:25:26.776+0000: shutting down, reason=migrated
2020-11-20T14:25:26.777394Z qemu-system-x86_64: terminating on signal 15 from pid 31113 (/usr/sbin/libvirtd)
2020-11-20 14:25:38.909+0000: starting up libvirt version: 6.0.0, package: 0ubuntu8.4~cloud0 (Openstack Ubuntu Testing Bot <email address hidden> Tue, 15 Sep 2020 20:36:28 +0000), qemu version: 4.2.1Debian 1:4.2-3ubuntu6.7~cloud0, kernel: 4.15.0-124-generic, hostname: ubuntu-bionic-ovh-bhs1-0021872194
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
HOME=/var/lib/libvirt/qemu/domain-13-instance-0000001a \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-13-instance-0000001a/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-13-instance-0000001a/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-13-instance-0000001a/.config \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-x86_64 \
-name guest=instance-0000001a,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-13-instance-0000001a/master-key.aes \
-machine pc-i440fx-4.2,accel=tcg,usb=off,dump-guest-core=off \
-cpu qemu64,hypervisor=on,lahf-lm=on \
-m 128 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid 2c468d92-4b19-426a-8c25-16b4624c21a4 \
-smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=22.1.0,serial=2c468d92-4b19-426a-8c25-16b4624c21a4,uuid=2c468d92-4b19-426a-8c25-16b4624c21a4,family=Virtual Machine' \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=34,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-blockdev '{"driver":"file","filename":"/opt/stack/data/nova/instances/_base/61bd5e531ab4c82456aa5300ede7266b3610be79","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-blockdev '{"driver":"file","filename":"/opt/stack/data/nova/instances/2c468d92-4b19-426a-8c25-16b4624c21a4/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=36,id=hostnet0 \
-device virtio-net-pci,host_mtu=1400,netdev=hostnet0,id=net0,mac=fa:16:3e:43:11:f4,bus=pci.0,addr=0x3 \
-add-fd set=2,fd=38 \
-chardev pty,id=charserial0,logfile=/dev/fdset/2,logappend=on \
-device isa-serial,chardev=charserial0,id=serial0 \
-vnc 0.0.0.0:0 \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-incoming defer \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
2020-11-20T14:25:40.720757Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 0xb8 inconsistent with Host index 0xe0: delta 0xffd8
2020-11-20T14:25:40.720785Z qemu-system-x86_64: Failed to load virtio-blk:virtio
2020-11-20T14:25:40.720790Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:04.0/virtio-blk'
2020-11-20T14:25:40.720824Z qemu-system-x86_64: load of migration failed: Operation not permitted
2020-11-20 14:25:40.778+0000: shutting down, reason=failed


In the new case we have the same qemu/libvirt version on both nodes:

4.2.1Debian 1:4.2-3ubuntu6.7~cloud0

and in this case it's failing for the virtio-blk device, not the memory balloon.

before the migration starts we see

-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/2 (label charserial0)
virtio: bogus descriptor or out of resources
2020-11-20 14:25:39.911+0000: initiating migration

on the source
https://zuul.opendev.org/t/openstack/build/d50877ae15db4022b82f4bb1d1d52cea/log/logs/subnode-2/libvirt/qemu/instance-0000001a.txt

then on the destination we see

char device redirected to /dev/pts/1 (label charserial0)
2020-11-20T14:25:40.720757Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 0xb8 inconsistent with Host index 0xe0: delta 0xffd8
2020-11-20T14:25:40.720785Z qemu-system-x86_64: Failed to load virtio-blk:virtio
2020-11-20T14:25:40.720790Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:04.0/virtio-blk'
2020-11-20T14:25:40.720824Z qemu-system-x86_64: load of migration failed: Operation not permitted

when it fails
https://zuul.opendev.org/t/openstack/build/d50877ae15db4022b82f4bb1d1d52cea/log/logs/libvirt/qemu/instance-0000001a.txt



[Copy/pasting my comment from here: https://bugs.launchpad.net/nova/+bug/1737625/comments/4]

I just talked to Dave Gilbert from upstream QEMU. Overall, as I implied in comment #2, this gnarly issue requires specialized debugging, digging deep into the bowels of QEMU, 'virtio-blk' and 'virtio'.

That said, Dave notes that we get this "guest index inconsistent" error when the migrated RAM is inconsistent with the migrated 'virtio' device state. And a common case is where a 'virtio' device does an operation after the vCPU is stopped and after RAM has been transmitted.

Dave makes an educated guess at a potential scenario where this can occur:

  - Guest is running
  - ... live migration starts
  - ... a "block read" request gets submitted
  - ... live migration stops the vCPUs, finishes transmitting RAM
  - ... the "block read" completes, 'virtio-blk' updates pointers
  - ... live migration "serializes" the 'virtio-blk' state

So the "guest index inconsistent" state would only happen if you got unlucky with the timing of that read.

Another possibility, Dave points out, is that the guest has screwed up the device state somehow; the migration code in 'virtio' checks the state a lot. We have ruled this possibility out because the guest is just a garden-variety CirrOS instance idling; nothing special about it.


hang on, I've just noticed this:

char device redirected to /dev/pts/2 (label charserial0)
virtio: bogus descriptor or out of resources
2020-11-20 14:25:39.911+0000: initiating migration

I've never seen that 'bogus descriptor or out of resources' one before; the fact that it's happening before the migration starts makes me suspect that something had already gone wrong before the migration started.
Is that warning present in all these failures?

I see two recent hits that we still have logs for.

The one described in comment #10.

And another, but there the error message is a bit different:

on the migration source host:
https://6f0be18d925d64906a23-689ad0b9b6f06bc0c51bfb99bf86ea04.ssl.cf5.rackcdn.com/698706/4/check/nova-grenade-multinode/ee2dbea/logs/libvirt/qemu/instance-0000001b.txt

char device redirected to /dev/pts/0 (label charserial0)
virtio: zero sized buffers are not allowed
2020-11-23 22:20:54.297+0000: initiating migration

on the migration destination
https://6f0be18d925d64906a23-689ad0b9b6f06bc0c51bfb99bf86ea04.ssl.cf5.rackcdn.com/698706/4/check/nova-grenade-multinode/ee2dbea/logs/subnode-2/libvirt/qemu/instance-0000001b.txt

char device redirected to /dev/pts/0 (label charserial0)
2020-11-23T22:20:55.129189Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 0x62 inconsistent with Host index 0xa1: delta 0xffc1
2020-11-23T22:20:55.129230Z qemu-system-x86_64: Failed to load virtio-blk:virtio
2020-11-23T22:20:55.129241Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-blk'
2020-11-23T22:20:55.129259Z qemu-system-x86_64: load of migration failed: Operation not permitted


OK, but that still says that in both cases here we've got a virtio error telling us that the queues are broken before migration even starts; so we should try and figure out why that's happening first.

Is this still an issue with the latest release of QEMU (v6.0)?

I got the same issue on CentOS 7 Stein.


Hello, some news... I wonder if it can help:
I am testing with some virtual machines again.
If I follow these steps it works (but I lose network connectivity):

1) Detach the network interface from the instance
2) Attach the network interface to the instance
3) Migrate the instance
4) Log in to the instance using the console and restart networking

while if I restart networking before the live migration it does not work.
So, when someone mentioned

########################
we get this "guest index inconsistent" error when the migrated RAM is inconsistent with the migrated 'virtio' device state. And a common case is where a 'virtio' device does an operation after the vCPU is stopped and after RAM has been transmitted.
########################

could the network traffic be the problem?
Ignazio

Be careful, it might not be the same bug.

Yes, it *shouldn't* be a problem, but if the virtio code in qemu is broken then it will keep accepting incoming packets even when the guest is stopped in the final part of the migration, and you get the contents of the RAM taken before the reception of the packet but the virtio state that's in the migration stream after the reception of the packet, and it's inconsistent.

But the case the other reporter mentioned is on a virtio-blk device; the same thing can happen if the storage device stalls or is slow during the migration - i.e. a block read takes ages to complete and happens to complete after the point at which it should have stopped for migration.

Is this still happening with the latest release?

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

[Expired for QEMU because there has been no activity for 60 days.]