other: 0.860
semantic: 0.837
instruction: 0.823
device: 0.819
assembly: 0.810
graphic: 0.786
KVM: 0.782
vnc: 0.751
socket: 0.747
boot: 0.732
network: 0.728
mistranslation: 0.698

live migration intermittently fails in CI with "VQ 0 size 0x80 Guest index 0x12c inconsistent with Host index 0x134: delta 0xfff8"

Seen here:

http://logs.openstack.org/37/522537/20/check/legacy-tempest-dsvm-multinode-live-migration/8de6e74/logs/subnode-2/libvirt/qemu/instance-00000002.txt.gz

2018-04-05T21:48:38.205752Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on: char device redirected to /dev/pts/0 (label charserial0)
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
2018-04-05T21:48:43.153268Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 0x12c inconsistent with Host index 0x134: delta 0xfff8
2018-04-05T21:48:43.153288Z qemu-system-x86_64: Failed to load virtio-blk:virtio
2018-04-05T21:48:43.153292Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:04.0/virtio-blk'
2018-04-05T21:48:43.153347Z qemu-system-x86_64: load of migration failed: Operation not permitted
2018-04-05 21:48:43.198+0000: shutting down, reason=crashed

And in the n-cpu logs on the other host:

http://logs.openstack.org/37/522537/20/check/legacy-tempest-dsvm-multinode-live-migration/8de6e74/logs/screen-n-cpu.txt.gz#_Apr_05_21_48_43_257541

There is a related Red Hat bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1450524

The CI job failures are at present using the Pike UCA:

ii libvirt-bin 3.6.0-1ubuntu6.2~cloud0
ii qemu-system-x86 1:2.10+dfsg-0ubuntu3.5~cloud0

Maybe when we move to use the Queens UCA we'll have better luck with this:
https://review.openstack.org/#/c/554314/

That has these package versions:

ii libvirt-bin 4.0.0-1ubuntu4~cloud0
ii qemu-system-x86 1:2.11+dfsg-1ubuntu2~cloud0

The offending instance QEMU from the log that crashed:

http://logs.openstack.org/37/522537/20/check/legacy-tempest-dsvm-multinode-live-migration/8de6e74/logs/subnode-2/libvirt/qemu/instance-00000002.txt.gz

----------------------------------------------------------------------------
2018-04-05 21:48:38.136+0000: starting up libvirt version: 3.6.0, package: 1ubuntu6.2~cloud0 (Openstack Ubuntu Testing Bot <email address hidden> Wed, 07 Feb 2018 20:05:24 +0000), qemu version: 2.10.1(Debian 1:2.10+dfsg-0ubuntu3.5~cloud0), hostname: ubuntu-xenial-ovh-bhs1-0003361707
LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin QEMU_AUDIO_DRV=none /usr/bin/qemu-system-x86_64 -name guest=instance-00000002,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-instance-00000002/master-key.aes -machine pc-i440fx-artful,accel=tcg,usb=off,dump-guest-core=off -m 64 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -uuid e70df34f-395e-4482-9c24-101f12cf635d -smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=17.0.0,serial=97e1e00c-8afb-4356-b55e-92e997c5a1a7,uuid=e70df34f-395e-4482-9c24-101f12cf635d,family=Virtual Machine' -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-3-instance-00000002/monitor.sock,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/opt/stack/data/nova/instances/e70df34f-395e-4482-9c24-101f12cf635d/disk,format=qcow2,if=none,id=drive-virtio-disk0,cache=none -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -netdev tap,fd=29,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=fa:16:3e:15:ea:59,bus=pci.0,addr=0x3 -add-fd set=1,fd=32 -chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0 -device cirrus-vga,id=video0,bus=pci.0,addr=0x2 -incoming defer -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 -msg timestamp=on
2018-04-05T21:48:38.205752Z qemu-system-x86_64: -chardev pty,id=charserial0,logfile=/dev/fdset/1,logappend=on: char device redirected to /dev/pts/0 (label charserial0)
warning: TCG doesn't support requested feature: CPUID.01H:ECX.vmx [bit 5]
2018-04-05T21:48:43.153268Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 0x12c inconsistent with Host index 0x134: delta 0xfff8
2018-04-05T21:48:43.153288Z qemu-system-x86_64: Failed to load virtio-blk:virtio
2018-04-05T21:48:43.153292Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:04.0/virtio-blk'
2018-04-05T21:48:43.153347Z qemu-system-x86_64: load of migration failed: Operation not permitted
2018-04-05 21:48:43.198+0000: shutting down, reason=crashed
----------------------------------------------------------------------------

I'm not sure that's the same as bz 1450524 - in this case it's virtio-blk, where I *think* that bz was tracked down to a virtio-balloon and a virtio-net issue.

Hm, looks like the e-r query [1] for this bug doesn't find it if the screen-n-cpu.txt is on subnode-2:

http://logs.openstack.org/25/566425/4/gate/nova-live-migration/27afb39/logs/subnode-2/screen-n-cpu.txt.gz?level=TRACE#_May_15_00_54_26_234161

[1] https://github.com/openstack-infra/elastic-recheck/blob/master/queries/1761798.yaml

Ran into a similar issue: when snapshotting an instance, it won't start afterwards.
Root cause is an inconsistency between the state of the virtio-scsi device and the saved VM RAM state in /var/lib/libvirt/qemu/save/.
Removing the instance-0000****.save file from this folder allows starting the VM back up, but may cause data corruption.
Specifics of my case: virtio-scsi driver with discard enabled, backend is Ceph. Impacted 2 Windows VMs so far.

I also think this is a qemu bug; I will do some tests and try to reproduce.

We hit a very similar issue: https://zuul.opendev.org/t/openstack/build/d50877ae15db4022b82f4bb1d1d52cea/log/logs/subnode-2/screen-n-cpu.txt?severity=0#13482

Source node
https://zuul.opendev.org/t/openstack/build/d50877ae15db4022b82f4bb1d1d52cea/log/logs/subnode-2/libvirt/qemu/instance-0000001a.txt

2020-11-20 14:25:24.887+0000: starting up libvirt version: 6.0.0, package: 0ubuntu8.4~cloud0 (Openstack Ubuntu Testing Bot <email address hidden> Tue, 15 Sep 2020 20:36:28 +0000), qemu version: 4.2.1Debian 1:4.2-3ubuntu6.7~cloud0, kernel: 4.15.0-124-generic, hostname: ubuntu-bionic-ovh-bhs1-0021872195
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
HOME=/var/lib/libvirt/qemu/domain-19-instance-0000001a \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-19-instance-0000001a/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-19-instance-0000001a/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-19-instance-0000001a/.config \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-x86_64 \
-name guest=instance-0000001a,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-19-instance-0000001a/master-key.aes \
-machine pc-i440fx-4.2,accel=tcg,usb=off,dump-guest-core=off \
-cpu qemu64,hypervisor=on,lahf-lm=on \
-m 128 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid 2c468d92-4b19-426a-8c25-16b4624c21a4 \
-smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=22.1.0,serial=2c468d92-4b19-426a-8c25-16b4624c21a4,uuid=2c468d92-4b19-426a-8c25-16b4624c21a4,family=Virtual Machine' \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=35,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-blockdev '{"driver":"file","filename":"/opt/stack/data/nova/instances/_base/61bd5e531ab4c82456aa5300ede7266b3610be79","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-blockdev '{"driver":"file","filename":"/opt/stack/data/nova/instances/2c468d92-4b19-426a-8c25-16b4624c21a4/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=37,id=hostnet0 \
-device virtio-net-pci,host_mtu=1400,netdev=hostnet0,id=net0,mac=fa:16:3e:43:11:f4,bus=pci.0,addr=0x3 \
-add-fd set=2,fd=39 \
-chardev pty,id=charserial0,logfile=/dev/fdset/2,logappend=on \
-device isa-serial,chardev=charserial0,id=serial0 \
-vnc 127.0.0.1:0 \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-incoming defer \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/2 (label charserial0)
virtio: bogus descriptor or out of resources
2020-11-20 14:25:39.911+0000: initiating migration
2020-11-20T14:26:21.409517Z qemu-system-x86_64: terminating on signal 15 from pid 17395 (/usr/sbin/libvirtd)
2020-11-20 14:26:21.610+0000: shutting down, reason=destroyed

Target node
https://zuul.opendev.org/t/openstack/build/d50877ae15db4022b82f4bb1d1d52cea/log/logs/libvirt/qemu/instance-0000001a.txt

2020-11-20 14:25:11.589+0000: starting up libvirt version: 6.0.0, package: 0ubuntu8.4~cloud0 (Openstack Ubuntu Testing Bot <email address hidden> Tue, 15 Sep 2020 20:36:28 +0000), qemu version: 4.2.1Debian 1:4.2-3ubuntu6.7~cloud0, kernel: 4.15.0-124-generic, hostname: ubuntu-bionic-ovh-bhs1-0021872194
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
HOME=/var/lib/libvirt/qemu/domain-10-instance-0000001a \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-10-instance-0000001a/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-10-instance-0000001a/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-10-instance-0000001a/.config \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-x86_64 \
-name guest=instance-0000001a,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-10-instance-0000001a/master-key.aes \
-machine pc-i440fx-4.2,accel=tcg,usb=off,dump-guest-core=off \
-cpu qemu64 \
-m 128 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid 2c468d92-4b19-426a-8c25-16b4624c21a4 \
-smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=22.1.0,serial=2c468d92-4b19-426a-8c25-16b4624c21a4,uuid=2c468d92-4b19-426a-8c25-16b4624c21a4,family=Virtual Machine' \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=32,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-blockdev '{"driver":"file","filename":"/opt/stack/data/nova/instances/_base/61bd5e531ab4c82456aa5300ede7266b3610be79","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-blockdev '{"driver":"file","filename":"/opt/stack/data/nova/instances/2c468d92-4b19-426a-8c25-16b4624c21a4/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=34,id=hostnet0 \
-device virtio-net-pci,host_mtu=1400,netdev=hostnet0,id=net0,mac=fa:16:3e:43:11:f4,bus=pci.0,addr=0x3 \
-add-fd set=2,fd=36 \
-chardev pty,id=charserial0,logfile=/dev/fdset/2,logappend=on \
-device isa-serial,chardev=charserial0,id=serial0 \
-vnc 0.0.0.0:0 \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/0 (label charserial0)
2020-11-20 14:25:25.637+0000: initiating migration
2020-11-20 14:25:26.776+0000: shutting down, reason=migrated
2020-11-20T14:25:26.777394Z qemu-system-x86_64: terminating on signal 15 from pid 31113 (/usr/sbin/libvirtd)
2020-11-20 14:25:38.909+0000: starting up libvirt version: 6.0.0, package: 0ubuntu8.4~cloud0 (Openstack Ubuntu Testing Bot <email address hidden> Tue, 15 Sep 2020 20:36:28 +0000), qemu version: 4.2.1Debian 1:4.2-3ubuntu6.7~cloud0, kernel: 4.15.0-124-generic, hostname: ubuntu-bionic-ovh-bhs1-0021872194
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
HOME=/var/lib/libvirt/qemu/domain-13-instance-0000001a \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-13-instance-0000001a/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-13-instance-0000001a/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-13-instance-0000001a/.config \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-x86_64 \
-name guest=instance-0000001a,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-13-instance-0000001a/master-key.aes \
-machine pc-i440fx-4.2,accel=tcg,usb=off,dump-guest-core=off \
-cpu qemu64,hypervisor=on,lahf-lm=on \
-m 128 \
-overcommit mem-lock=off \
-smp 1,sockets=1,cores=1,threads=1 \
-uuid 2c468d92-4b19-426a-8c25-16b4624c21a4 \
-smbios 'type=1,manufacturer=OpenStack Foundation,product=OpenStack Nova,version=22.1.0,serial=2c468d92-4b19-426a-8c25-16b4624c21a4,uuid=2c468d92-4b19-426a-8c25-16b4624c21a4,family=Virtual Machine' \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=34,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc \
-no-shutdown \
-boot strict=on \
-device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
-blockdev '{"driver":"file","filename":"/opt/stack/data/nova/instances/_base/61bd5e531ab4c82456aa5300ede7266b3610be79","node-name":"libvirt-2-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-2-format","read-only":true,"cache":{"direct":true,"no-flush":false},"driver":"raw","file":"libvirt-2-storage"}' \
-blockdev '{"driver":"file","filename":"/opt/stack/data/nova/instances/2c468d92-4b19-426a-8c25-16b4624c21a4/disk","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":"libvirt-2-format"}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=36,id=hostnet0 \
-device virtio-net-pci,host_mtu=1400,netdev=hostnet0,id=net0,mac=fa:16:3e:43:11:f4,bus=pci.0,addr=0x3 \
-add-fd set=2,fd=38 \
-chardev pty,id=charserial0,logfile=/dev/fdset/2,logappend=on \
-device isa-serial,chardev=charserial0,id=serial0 \
-vnc 0.0.0.0:0 \
-device cirrus-vga,id=video0,bus=pci.0,addr=0x2 \
-incoming defer \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5 \
-object rng-random,id=objrng0,filename=/dev/urandom \
-device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
2020-11-20T14:25:40.720757Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 0xb8 inconsistent with Host index 0xe0: delta 0xffd8
2020-11-20T14:25:40.720785Z qemu-system-x86_64: Failed to load virtio-blk:virtio
2020-11-20T14:25:40.720790Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:04.0/virtio-blk'
2020-11-20T14:25:40.720824Z qemu-system-x86_64: load of migration failed: Operation not permitted
2020-11-20 14:25:40.778+0000: shutting down, reason=failed

In the new case we have the same qemu/libvirt version on both nodes:

4.2.1Debian 1:4.2-3ubuntu6.7~cloud0

and in this case it's failing for the virtio-blk device, not the memory balloon.

Before the migration starts we see:

-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/2 (label charserial0)
virtio: bogus descriptor or out of resources
2020-11-20 14:25:39.911+0000: initiating migration

on the destination
https://zuul.opendev.org/t/openstack/build/d50877ae15db4022b82f4bb1d1d52cea/log/logs/subnode-2/libvirt/qemu/instance-0000001a.txt

then on the source we see

char device redirected to /dev/pts/1 (label charserial0)
2020-11-20T14:25:40.720757Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 0xb8 inconsistent with Host index 0xe0: delta 0xffd8
2020-11-20T14:25:40.720785Z qemu-system-x86_64: Failed to load virtio-blk:virtio
2020-11-20T14:25:40.720790Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:04.0/virtio-blk'
2020-11-20T14:25:40.720824Z qemu-system-x86_64: load of migration failed: Operation not permitted

when it fails
https://zuul.opendev.org/t/openstack/build/d50877ae15db4022b82f4bb1d1d52cea/log/logs/libvirt/qemu/instance-0000001a.txt

[Copy/pasting my comment from here: https://bugs.launchpad.net/nova/+bug/1737625/comments/4]

I just talked to Dave Gilbert from upstream QEMU. Overall, as I implied in comment #2, this gnarly issue requires specialized debugging, digging deep into the bowels of QEMU, 'virtio-blk' and 'virtio'.

That said, Dave notes that we get this "guest index inconsistent" error when the migrated RAM is inconsistent with the migrated 'virtio' device state. A common case is where a 'virtio' device does an operation after the vCPU is stopped and after RAM has been transmitted.

Dave makes some guesses at a potential scenario where this can occur:

 - Guest is running
 - ... live migration starts
 - ... a "block read" request gets submitted
 - ... live migration stops the vCPUs, finishes transmitting RAM
 - ... the "block read" completes, 'virtio-blk' updates pointers
 - ... live migration "serializes" the 'virtio-blk' state

So the "guest index inconsistent" state would only happen if you got unlucky with the timing of that read.

Another possibility, Dave points out, is that the guest has screwed up the device state somehow; the migration code in 'virtio' checks the state a lot. We have ruled this possibility out because the guest is just a garden-variety CirrOS instance idling; nothing special about it.

Hang on, I've just noticed the:

char device redirected to /dev/pts/2 (label charserial0)
virtio: bogus descriptor or out of resources
2020-11-20 14:25:39.911+0000: initiating migration

I've never seen that 'bogus descriptor or out of resources' one before; the fact that it's happening before the migration starts suggests something had already gone wrong before migration started.
Is that warning present in all these failures?

I see two recent hits we still have logs for.

The one described in comment #10.

And another, but there the error message is a bit different:

on the migration source host:
https://6f0be18d925d64906a23-689ad0b9b6f06bc0c51bfb99bf86ea04.ssl.cf5.rackcdn.com/698706/4/check/nova-grenade-multinode/ee2dbea/logs/libvirt/qemu/instance-0000001b.txt

char device redirected to /dev/pts/0 (label charserial0)
virtio: zero sized buffers are not allowed
2020-11-23 22:20:54.297+0000: initiating migration

on the migration destination:
https://6f0be18d925d64906a23-689ad0b9b6f06bc0c51bfb99bf86ea04.ssl.cf5.rackcdn.com/698706/4/check/nova-grenade-multinode/ee2dbea/logs/subnode-2/libvirt/qemu/instance-0000001b.txt

char device redirected to /dev/pts/0 (label charserial0)
2020-11-23T22:20:55.129189Z qemu-system-x86_64: VQ 0 size 0x80 Guest index 0x62 inconsistent with Host index 0xa1: delta 0xffc1
2020-11-23T22:20:55.129230Z qemu-system-x86_64: Failed to load virtio-blk:virtio
2020-11-23T22:20:55.129241Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/virtio-blk'
2020-11-23T22:20:55.129259Z qemu-system-x86_64: load of migration failed: Operation not permitted

OK, but that still says that in both cases here we've got a virtio error telling us the queues are broken before migration even starts; so we should try to figure out why that's happening first.

Is this still an issue with the latest release of QEMU (v6.0)?

I got the same issue on CentOS 7 Stein.

Hello, some news... I wonder if it can help:
I am testing with some virtual machines again.
If I follow these steps it works (but I lose network connectivity):

1) Detach network interface from instance
2) Attach network interface to instance
3) Migrate instance
4) Log in to instance using the console and restart networking

while if I restart networking before live migration it does not work.
So, when someone mentioned

########################
we get this "guest index inconsistent" error when the migrated RAM is inconsistent with the migrated 'virtio' device state. And a common case is where a 'virtio' device does an operation after the vCPU is stopped and after RAM has been transmitted.
########################

could the network traffic be the problem?
Ignazio

Be careful, it might not be the same bug.

Yes, it *shouldn't* be a problem, but if the virtio code in qemu is broken then it will keep accepting incoming packets even when the guest is stopped in the final part of the migration; you then get the contents of RAM taken before the reception of the packet, but the virtio state in the migration stream from after the reception of the packet, and they are inconsistent.

But the case the other reporter mentioned is a virtio-blk device; the same thing can happen if the storage device stalls or is slow during migration - i.e. a block read takes ages to complete and happens to complete after the point at which it should have stopped for migration.

Is this still happening with the latest release?
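For what it's worth, the "delta" values in all three failures (0xfff8, 0xffd8, 0xffc1) are consistent with the race described above: virtio ring indices are free-running 16-bit counters, and in each case the device-side index ended up a few entries ahead of the index in the migrated RAM, so the subtraction wraps to a huge delta that fails QEMU's load-time sanity check. A rough Python sketch of that arithmetic (the function names here are illustrative, not QEMU's actual API; the real check is in virtio_load() in hw/virtio):

```python
# Sketch of the consistency check behind "VQ 0 size 0x80 Guest index 0x..
# inconsistent with Host index 0x..: delta 0x..".  virtio ring indices are
# free-running 16-bit counters, so differences are taken modulo 2**16.

VQ_SIZE = 0x80  # ring size reported in the log messages above

def index_delta(guest_avail_idx, host_last_avail_idx):
    """16-bit wrap-around difference between the avail index found in the
    migrated guest RAM and the device's own last-seen index."""
    return (guest_avail_idx - host_last_avail_idx) & 0xFFFF

def vq_consistent(guest_avail_idx, host_last_avail_idx):
    """Loosely mirrors the load-time check: the number of outstanding
    requests must fit within the ring."""
    return index_delta(guest_avail_idx, host_last_avail_idx) <= VQ_SIZE

# The three failures in this bug: the device index is *ahead* of the index
# in RAM (the device kept completing work after RAM was transmitted), so
# the subtraction wraps to a huge positive delta:
for guest, host in [(0x12c, 0x134), (0xb8, 0xe0), (0x62, 0xa1)]:
    print(hex(index_delta(guest, host)), vq_consistent(guest, host))
# -> 0xfff8 False / 0xffd8 False / 0xffc1 False, matching the logged deltas.
```

In the healthy direction (guest index ahead of the device index by at most VQ_SIZE entries) the delta is small and the check passes; the wrapped values above can only arise when the device state was serialized after it had moved past the RAM snapshot.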
[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

[Expired for QEMU because there has been no activity for 60 days.]