[SRU] migration was active, but no RAM info was set

While live-migrating many instances concurrently, libvirt sometimes returns internal error: migration was active, but no RAM info was set:

~~~
2022-03-30 06:08:37.197 7 WARNING nova.virt.libvirt.driver [req-5c3296cf-88ee-4af6-ae6a-ddba99935e23 - - - - -] [instance: af339c99-1182-4489-b15c-21e52f50f724] Error monitoring migration: internal error: migration was active, but no RAM info was set: libvirt.libvirtError: internal error: migration was active, but no RAM info was set
~~~

From upstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=2074205

[Impact]

 * Effects of this bug are mostly observed in large scale clusters with a lot of live migration activity.
 * Has second order effects for consumers of the migration monitor such as libvirt and openstack.

[Test Case]

Steps to Reproduce:
1. live evacuate a compute
2. live migration of one or more instances fails with the above error

N.B. Due to the nature of this bug, it is difficult to reproduce consistently.

[Where problems could occur]

 * In the event of a regression the migration monitor may report an inconsistent state.

The attachment "lp1994002-qemu-ussuri.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff.

If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-sponsors, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]

If you need something from upstream QEMU, please use the new bug tracker here: https://gitlab.com/qemu-project/qemu/-/issues

Hi Brett,

Thanks for the debdiffs!

I just reviewed them, and there are changes that should be made. I could do those myself, but that wouldn't be an opportunity to learn/practice some details for SRUs for you, so I'll add notes.
*However*, if you're too busy and can't do that, do let me know.

cheers,
Mauricio

...

qemu.git

 $ git describe --contains 552de79bfdd5e9e53847eb3c6d6e4cd898a4370e
 v7.1.0-rc0~136^2

ubuntu archive:

 $ rmadison -a source qemu
 ...
 qemu | 1:2.11+dfsg-1ubuntu7    | bionic          | source
 qemu | 1:2.11+dfsg-1ubuntu7.40 | bionic-security | source
 qemu | 1:2.11+dfsg-1ubuntu7.40 | bionic-updates  | source
 qemu | 1:4.2-3ubuntu6          | focal           | source
 qemu | 1:4.2-3ubuntu6.23       | focal-security  | source
 qemu | 1:4.2-3ubuntu6.23       | focal-updates   | source
 qemu | 1:6.2+dfsg-2ubuntu6     | jammy           | source
 qemu | 1:6.2+dfsg-2ubuntu6.2   | jammy-security  | source
 qemu | 1:6.2+dfsg-2ubuntu6.5   | jammy-updates   | source
 qemu | 1:7.0+dfsg-7ubuntu2     | kinetic         | source
 qemu | 1:7.0+dfsg-7ubuntu2     | lunar           | source

0) Development release

The development release (lunar) still doesn't have the patch. That is required for SRU / stable releases.

We'll need a debdiff for lunar, slightly different than kinetic (release name and greater version string for the upgrade path).

I just checked w/ Christian and we shouldn't wait on qemu 7.1 merge from Debian (sid), which would include the patch, since the merge from Debian should happen in January to get qemu 7.2.

1) Oldest LTS in standard support

Would Bionic benefit from this fix in the long run as well, just before it goes into expanded/out of standard support? Apparently, some deployments/clouds still use Bionic on kvm compute nodes.

If so, the backport targets qmp_query_migrate()/same file, per commit 65ace0604551 ("migration: add postcopy total blocktime into query-migrate").

2) Debdiffs:

- version strings: the 'lp*' version suffix is fine for test builds, but for official packages usually (see [1]): just increment '.1' on stable releases, and '1' on dev.

  example:
  kinetic (sru): 1:7.0+dfsg-7ubuntu2 -> ubuntu2.1
  lunar (devel): 1:7.0+dfsg-7ubuntu2 -> ubuntu3

- changelog: mostly good! (d/p/file.patch; LP: #number?; releases).
  The LP bug number 1982284 refers to another/openstack bug, but the Ubuntu SRUs are coming through this bug, apparently.

  Since this is the bug where Ubuntu Archive/Cloud Archive have packages/series on, to be closed when SRUs land in -proposed and -updates (and UCA), we should change:
  1) the LP bug number in the changelog
  2) and patch file names
  3) also, it's a good idea to link to other LP bug in the SRU template '[Other Info]' section.

  (you could also just move the SRU template/packages/series/tracks to the other LP bug, I guess. Up to you.)

- quilt patch: add DEP3 headers [2] (Origin:/Bug-Ubuntu:)

- quilt series: missing 'ubuntu/' dir on k/j (not on f)

- duplications: jammy has duplicated messages, and focal has that plus duplicated changelog entries? -- for HA? x)

[1] https://wiki.ubuntu.com/SecurityTeam/UpdatePreparation#Update_the_packaging
[2] https://dep-team.pages.debian.net/deps/dep3/

Brett, per our email conversation, please ignore this:

> - quilt series: missing 'ubuntu/' dir on k/j (not on f)

I missed that focal uses `d/p/ubuntu/` too (it just wasn't present in `d/p/series` context lines in the debdiff, for CVEs).

Sorry for the confusion, and thanks for checking!

Hi Mauricio,

Thanks for your review. I've made the changes you've requested.
Looking forward to your feedback.

Thanks, Brett!

Very minor nitpicks left (changelog entry/release for lunar, and URL for Origin:), I can handle those.

For Lunar/devel release, I'll send a MR for Christian to review/upload (my upload rights are for stable releases).

This includes a fix to FTBFS per a package change in the last 24 hours :) happy to catch it now!

It's currently (re)build-testing on all supported archs. If all goes well now, I'll send the MR for Lunar, and once it lands, we'll proceed w/ SRUs.

...

I also played with GDB for a synthetic reproducer. It seems to be possible, but needs a little more study on the monitor path. We can sync on that later!
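
For reference, the version-string convention from the review notes above ('.1' on stable releases, '1' on devel) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: `next_version` is a hypothetical helper, not part of any Ubuntu tooling, it only handles versions ending in an `ubuntuN` or `ubuntuN.M` suffix, and the wiki page linked as [1] remains the authoritative description.

```python
import re

# Matches a trailing Ubuntu suffix such as "ubuntu2" or "ubuntu6.23".
_UBUNTU_SUFFIX = re.compile(r"ubuntu(\d+)(?:\.(\d+))?$")


def next_version(version: str, stable: bool) -> str:
    """Hypothetical helper: suggest the next package version string.

    stable=True  -> bump the part after the dot (ubuntu2 -> ubuntu2.1)
    stable=False -> bump the ubuntu number itself (ubuntu2 -> ubuntu3)
    """
    match = _UBUNTU_SUFFIX.search(version)
    if match is None:
        raise ValueError(f"no 'ubuntuN' suffix in {version!r}")
    major = int(match.group(1))
    minor = int(match.group(2)) if match.group(2) else 0
    if stable:
        suffix = f"ubuntu{major}.{minor + 1}"
    else:
        suffix = f"ubuntu{major + 1}"
    # Keep everything before the suffix (epoch, upstream, Debian revision).
    return version[: match.start()] + suffix


if __name__ == "__main__":
    print(next_version("1:7.0+dfsg-7ubuntu2", stable=True))   # kinetic (sru)
    print(next_version("1:7.0+dfsg-7ubuntu2", stable=False))  # lunar (devel)
```

With the kinetic/lunar version from the example in the review, this yields `1:7.0+dfsg-7ubuntu2.1` for the SRU and `1:7.0+dfsg-7ubuntu3` for devel, so the devel version stays greater and the upgrade path is preserved.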
All archs finished building successfully on ppa:mfo/lp1994002v2.

Just sent the MR for Lunar. If/once it lands, I can do the SRUs.

https://code.launchpad.net/~mfo/ubuntu/+source/qemu/+git/qemu/+merge/434118

For documentation purposes,

The qemu package in lunar-proposed has its migration blocked to lunar(-release) because of autopkgtests failures (sbuild), which have been analyzed/understood.

We're waiting on the autopkgtests queue to run sbuild w/ triggers on qemu _and_ sbuild from lunar-proposed, which should address the error w/ sbuild/unshare (lack of adduser command in the sbuild chroot, as apt no longer deps on that).

Once that runs, we'll check if any other errors happen, and address those.

cheers,
Mauricio

The sbuild autopkgtest failure on the 'unshare' test is indeed fixed w/ sbuild in lunar-proposed; however, now the test 'unshare-qemuwrapper' timed out.

 autopkgtest [23:36:43]: @@@@@@@@@@@@@@@@@@@@ summary
 build-procenv        PASS
 unshare-qemuwrapper  FAIL timed out
 unshare              PASS

It timed out on the 'guestfish' command, so I enabled `export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1` there, and ran autopkgtests against its build in a PPA [1].

Then it finished successfully w/out timing out! x)

 autopkgtest [16:17:39]: @@@@@@@@@@@@@@@@@@@@ summary
 build-procenv        PASS
 unshare-qemuwrapper  PASS
 unshare              PASS

Not a very useful result, but it did show that a step in guestfish took ~25 minutes; 30 mins total:

 autopkgtest [15:22:52]: test unshare-qemuwrapper: [-----------------------
 ...
 + export LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1
 + guestfish <...>
 ...
 libguestfs: trace: tar_in "/tmp/.../ubuntu-lunar-host.tar" "/"
 ...
 tar -C /sysroot/ -xf - 2> /tmp/tarSfYHJX
 ...
 guestfsd: => tar_in (0x45) took 1489.08 secs
 ...
 autopkgtest [15:52:27]: test unshare-qemuwrapper: -----------------------]
 unshare-qemuwrapper  PASS

So, well, it might have been due to load in the autopkgtest infrastructure at the time tests ran, so I just triggered retries on sbuild and sbuild+qemu.
Hopefully they will pass and unblock proposed migration for both sbuild & qemu.

[1] https://autopkgtest.ubuntu.com/results/autopkgtest-lunar-mfo-build/lunar/amd64/s/sbuild/20221215_161801_a2772@/log.gz

The sbuild autopkgtests need a fix for lunar-proposed; reported bug 2000015 w/ analysis and debdiff attached.

This bug was fixed in the package qemu - 1:7.0+dfsg-7ubuntu3

---------------
qemu (1:7.0+dfsg-7ubuntu3) lunar; urgency=medium

  [ Brett Milford ]
  * d/p/u/lp1994002-migration-Read-state-once.patch: Fix for libvirt error
    'migration was active, but no RAM info was set' (LP: #1994002)

  [ Mauricio Faria de Oliveira ]
  * d/p/u/ebpf-replace-deprecated-bpf_program__set_socket_filt.patch:
    Fix FTBFS with libbpf 1.0.1-2.

 -- Mauricio Faria de Oliveira