virtio-balloon change breaks post 4.0 upgrade We upgraded the libvirt UCA packages from 3.6 to 4.0 as part of a queens upgrade and noticed that virtio-ballon is broken when instances live migrate (started with a prior 3.6 version) with: 2019-07-24T06:46:49.487109Z qemu-system-x86_64: warning: Unknown firmware file in legacy mode: etc/msr_feature_control 2019-07-24T06:47:22.187749Z qemu-system-x86_64: VQ 2 size 0x80 < last_avail_idx 0xb57 - used_idx 0xb59 2019-07-24T06:47:22.187768Z qemu-system-x86_64: Failed to load virtio-balloon:virtio 2019-07-24T06:47:22.187771Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-balloon' 2019-07-24T06:47:22.188194Z qemu-system-x86_64: load of migration failed: Operation not permitted 2019-07-24 06:47:22.430+0000: shutting down, reason=failed This seem to be the exact problem as reported by https://lists.gnu.org/archive/html/qemu-devel/2019-07/msg02228.html Listed the packages which changed: Start-Date: 2019-07-06 06:40:55 Commandline: /usr/bin/apt-get -y -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confold install libvirt-bin python-libvirt qemu qemu-utils qemu-system qemu-system-arm qemu-system-mips qemu-system-ppc qemu-system-sparc qemu-system-x86 qemu-system-misc qemu-block-extra qemu-utils qemu-user qemu-kvm Install: librdmacm1:amd64 (17.1-1ubuntu0.1~cloud0, automatic), libvirt-daemon-driver-storage-rbd:amd64 (4.0.0-1ubuntu8.10~cloud0, automatic), ipxe-qemu-256k-compat-efi-roms:amd64 (1.0.0+git-20150424.a25a16d-0ubuntu2~cloud0, automatic) Upgrade: qemu-system-mips:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-system-misc:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-system-ppc:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), python-libvirt:amd64 (3.5.0-1build1~cloud0, 4.0.0-1~cloud0), qemu-system-x86:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt-clients:amd64 (3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0), qemu-user:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt-bin:amd64 (3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0), qemu:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-utils:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt-daemon-system:amd64 (3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0), qemu-system-sparc:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-user-binfmt:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-kvm:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt0:amd64 (3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0), qemu-system-arm:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-block-extra:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-system-common:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), qemu-system:amd64 (1:2.10+dfsg-0ubuntu3.8~cloud1, 1:2.11+dfsg-1ubuntu7.13~cloud0), libvirt-daemon:amd64 (3.6.0-1ubuntu6.8~cloud0, 4.0.0-1ubuntu8.10~cloud0) End-Date: 2019-07-06 06:41:08 At this point the instances would have to be hard rebooted or stopped/started to fix the issue for future live migration attemps Hi Bjoern, I don't think this is the same bug as the one you reference; that's a config space disagreement, not a virtio queue disagreeement. It almost feels like the old 4a1e48becab81020adfb74b22c76a595f2d02a01 stats migration fix; but that went in with 2.8 so shouldn't be a problem going 2.10 to 2.11 In regard to "similar bugs" it sounds more like [1] to me. Which was around needing [2]. But just like the commit David mentioned is in 2.8 this one is in since 2.6 (and backported). [1]: https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1647389 [2]: https://git.qemu.org/?p=qemu.git;a=commit;h=4eae2a657d1ff5ada56eb9b4966eae0eff333b0b Status changed to 'Confirmed' because the bug affects multiple users. With recent release of OpenStack Train this issue reappears... Upgrading from Stein to Train will require all VMs to be hard-rebooted to be migrated as a final step because Live Migration fails with: Oct 17 10:28:43 h2.1.openstack.r0cket.net libvirtd[1545]: Unable to read from monitor: Connection reset by peer Oct 17 10:28:43 h2.1.openstack.r0cket.net libvirtd[1545]: internal error: qemu unexpectedly closed the monitor: 2019-10-17T10:28:42.981201Z qemu-system-x86_64: get_pci_config_device: Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0 2019-10-17T10:28:42.981250Z qemu-system-x86_64: Failed to load PCIDevice:config 2019-10-17T10:28:42.981263Z qemu-system-x86_64: Failed to load virtio-balloon:virtio 2019-10-17T10:28:42.981272Z qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:05.0/virtio-balloon' 2019-10-17T10:28:42.981391Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2532609 kHz) and host (2532608 kHz), and TSC scaling unavailable 2019-10-17T10:28:42.983157Z qemu-system-x86_64: warning: TSC frequency mismatch between VM (2532609 kHz) and host (2532608 kHz), and TSC scaling unavailable 2019-10-17T10:28:42.983672Z qemu-system-x86_64: load of migration failed: Invalid argument Dnaiel: That's a different problem; 'Bad config data: i=0x10 read: a1 device: 1 cmask: ff wmask: c0 w1cmask:0'; so should probably be a separate bug. I'd bet on this being the one fixed by 2bbadb08ce272d65e1f78621002008b07d1e0f03 > I'd bet on this being the one fixed by > 2bbadb08ce272d65e1f78621002008b07d1e0f03 But wasn't the breakage this fixes only added in qemu 4.0? He reports his change is from qemu 2.10 to 2.11. Unfortunately 2bbadb08 doesn't have a "fixes" line, maybe Ubuntu has something backported that makes the Ubuntu 2.11 being affected by it. I checked the Delta but there was nothing related backported that I could identify. But I agree, that first of all this should be a new bug instead of reviving the old one here. > I'd bet on this being the one fixed by > 2bbadb08ce272d65e1f78621002008b07d1e0f03 But wasn't the breakage this fixes only added in qemu 4.0? He reports his change is from qemu 2.10 to 2.11. Unfortunately 2bbadb08 doesn't have a "fixes" line, maybe Ubuntu has something backported that makes the Ubuntu 2.11 being affected by it. I checked the Delta but there was nothing related backported that I could identify. But I agree, that first of all this should be a new bug instead of reviving the old one here. P.S. this will race as I sent the same via mail :-/, but I need to extend. I have now realized that the later report (comment #4) is about Openstack Train which at least on Ubuntu would come with qemu 4.0 - and since 2bbadb08 was released only with 4.1 that would be an open issue. I forked that new discussion into https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1848497 Please follow there and leave this bug here to the originally reported error signature. It seems the fix is comitted with https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1848497. Closing this issue