qemu-img convert intermittently corrupts output images

Found in releases qemu-2.0.0, qemu-2.0.2 and qemu-2.1.0. Tested on Ubuntu 14.04 using ext4 filesystems.

The command

    qemu-img convert -O raw inputimage.qcow2 outputimage.raw

intermittently creates corrupted output images when the input image has not yet been fully synchronized to disk. The issue was discovered while operating OpenStack nova, but it can be reproduced "easily" on the command line using

    cat $SRC_PATH > $TMP_PATH && $QEMU_IMG_PATH convert -O raw $TMP_PATH $DST_PATH && cksum $DST_PATH

on filesystems exposing this behavior. (The difficult part of this exercise is preparing a filesystem that reliably triggers the race. On my test machine some filesystems are affected while others aren't, and unfortunately I haven't yet found the relevant difference between them. Possibly it's a timing issue completely out of userspace control ...)

The root cause, however, is the same as in http://lists.gnu.org/archive/html/coreutils/2011-04/msg00069.html and it can be solved the same way as suggested in http://lists.gnu.org/archive/html/coreutils/2011-04/msg00102.html

In qemu, block/raw-posix.c should use FIEMAP_FLAG_SYNC, i.e. change

    f.fm.fm_flags = 0;

to

    f.fm.fm_flags = FIEMAP_FLAG_SYNC;

As discussed in the thread mentioned above, a page-cache-coherent map of file extents can only be retrieved after an fsync on that file.

See also https://bugs.launchpad.net/nova/+bug/1350766

In that bug report, filed against nova, it had been suggested that the framework invoking qemu-img should perform the fsync. However, since the choice of fiemap (which implies this otherwise unneeded fsync of a temporary file) is made not by the caller but by qemu-img, I agree with the nova bug reviewer's objection to putting it into nova.
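To make the proposed one-line change concrete, here is a minimal, Linux-only sketch of a FIEMAP query with FIEMAP_FLAG_SYNC set. The helper names fiemap_range_has_data and path_has_data are hypothetical. They only mirror the shape of try_fiemap in block/raw-posix.c (room for a single struct fiemap_extent and a yes/no answer) and are not qemu code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_FLAG_SYNC */

/*
 * Hypothetical helper (a sketch, not qemu's try_fiemap): report whether
 * [start, start + len) of fd holds any mapped extent.
 * Returns 1 for data, 0 for a hole, -errno on failure.
 */
static int fiemap_range_has_data(int fd, uint64_t start, uint64_t len)
{
    struct fiemap *fm;
    int ret;

    assert(len > 0);
    /* struct fiemap ends in a flexible array; reserve room for one extent. */
    fm = calloc(1, sizeof(*fm) + sizeof(struct fiemap_extent));
    if (fm == NULL)
        return -ENOMEM;

    fm->fm_start = start;
    fm->fm_length = len;
    /* Ask the kernel to flush the file's dirty pages before mapping, so
     * data still sitting in the page cache is not reported as a hole. */
    fm->fm_flags = FIEMAP_FLAG_SYNC;
    fm->fm_extent_count = 1;    /* one extent is enough for a yes/no answer */

    if (ioctl(fd, FS_IOC_FIEMAP, fm) == -1)
        ret = -errno;           /* e.g. -EOPNOTSUPP without FIEMAP support */
    else
        ret = fm->fm_mapped_extents > 0;

    free(fm);
    return ret;
}

/* Hypothetical convenience wrapper: open path read-only and query its
 * first len bytes. */
static int path_has_data(const char *path, uint64_t len)
{
    int fd = open(path, O_RDONLY);
    int ret;

    if (fd == -1)
        return -errno;
    ret = fiemap_range_has_data(fd, 0, len);
    close(fd);
    return ret;
}
```

Setting fm_flags back to 0 restores the original behaviour, where, per the report above, freshly written data on an affected filesystem can come back as a hole and end up zeroed in the converted image.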
The fsync should instead be triggered by qemu-img using FIEMAP_FLAG_SYNC, which is intended specifically for that purpose.

Is there a minimum version of qemu that would be required to use the FIEMAP_FLAG_SYNC flag?

The affected code was introduced in version 1.2.0. However, due to https://bugs.launchpad.net/qemu/+bug/1193628 I can't build these old releases to verify whether they actually exhibit the same behaviour.

It seems the dust is settling a bit: I found the relevant difference between my various filesystems, and how to reproduce the failure. Susceptible filesystems don't have the extent feature of ext4 enabled. You can create such a filesystem using

    mke2fs -t ext4 -O ^extent /dev/...
    mount /dev/... /mnt

Adapting the command line example provided above, you can see

    rm -f /mnt/tmp.qcow2
    cat $SRC_PATH > /mnt/tmp.qcow2 && qemu-img convert -O raw /mnt/tmp.qcow2 $DST_PATH && cksum $DST_PATH

creating corrupt (usually zero-filled) result images. By inserting a sleep of at least 33 seconds between the cat command and the qemu-img invocation, I get correct output.

It is now unclear to me where the actual defect is located. Creating ext4 filesystems with certain features disabled (such as the extent tree) is apparently supported and OK. Is the fiemap ioctl supposed to handle this gracefully, for example by assuming FIEMAP_FLAG_SYNC in the absence of an extent tree? Or are clients such as qemu-img supposed to always pass FIEMAP_FLAG_SYNC to be safe?

I see SEEK_HOLE is supported in the latest qemu-img, so I would reorder things so that it is tried first, like:

    if lseek(SEEK_HOLE) != ENOTSUP
        use_that
    else if fiemap(FIEMAP_FLAG_SYNC)
        use_that

The fallback cascade Pádraig mentions is already implemented in qemu-2.1.0, in the function raw_co_get_block_status. Just swap ret = try_fiemap( ... ) and ret = try_seek_hole( ... ) to reverse the order.
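The reordering suggested above can be sketched as follows: ask lseek(SEEK_DATA), the counterpart of SEEK_HOLE, first, and fall back to FIEMAP with FIEMAP_FLAG_SYNC only when lseek cannot answer. This is a hypothetical, Linux-only illustration (range_has_data and fiemap_sync_probe are made-up names), not the code of raw_co_get_block_status. Treating every lseek error other than ENXIO as "unsupported" is an assumption of the sketch.

```c
#define _GNU_SOURCE        /* SEEK_DATA / SEEK_HOLE in glibc */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_FLAG_SYNC */

/* Fallback path: FIEMAP with FIEMAP_FLAG_SYNC.
 * Returns 1 for data, 0 for a hole, -errno on failure. */
static int fiemap_sync_probe(int fd, off_t start, off_t len)
{
    struct fiemap *fm = calloc(1, sizeof(*fm) + sizeof(struct fiemap_extent));
    int ret;

    if (fm == NULL)
        return -ENOMEM;
    fm->fm_start = (uint64_t)start;
    fm->fm_length = (uint64_t)len;
    fm->fm_flags = FIEMAP_FLAG_SYNC;   /* flush dirty pages before mapping */
    fm->fm_extent_count = 1;
    if (ioctl(fd, FS_IOC_FIEMAP, fm) == -1)
        ret = -errno;
    else
        ret = fm->fm_mapped_extents > 0;
    free(fm);
    return ret;
}

/*
 * Hypothetical cascade: SEEK_DATA first (filesystems without native support
 * report the whole file as data, which is safe), FIEMAP + sync only if
 * lseek fails for a reason other than ENXIO.
 * Returns 1 if [start, start + len) contains data, 0 if it is a hole.
 */
static int range_has_data(int fd, off_t start, off_t len)
{
    off_t data;

    assert(start >= 0 && len > 0);
    data = lseek(fd, start, SEEK_DATA);
    if (data >= 0)
        return data < start + len;   /* next data begins inside the range? */
    if (errno == ENXIO)
        return 0;                    /* no data at or after start */
    /* Any other error, e.g. EINVAL from kernels older than 3.1 that do not
     * know SEEK_DATA: fall back to a synced FIEMAP query. */
    return fiemap_sync_probe(fd, start, len);
}
```

The design point from the thread is that the sync is only paid for on the fallback path. Where SEEK_DATA/SEEK_HOLE work, the temporary file does not need an extra fsync.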
I can confirm that this works just fine on a 3.13 kernel (and on all kernels since 3.1, according to lseek(2)), while older kernels will fall back to fiemap, which needs to be protected with FIEMAP_FLAG_SYNC in try_fiemap to be safe. This should work under all conditions and avoid redundant syncs where possible, right?

Marking as High since duplicate bug 1350766 was marked High.

OpenStack review at: https://review.openstack.org/#/c/123957/

Qemu patches at: http://patchwork.ozlabs.org/patch/393494/ and http://patchwork.ozlabs.org/patch/393495/

FWIW, the following 2 commits in qemu master resolve the issue for qemu-img:
http://git.qemu.org/?p=qemu.git;a=commit;h=38c4d0aea3e1264c86e282d99560330adf2b6e25
http://git.qemu.org/?p=qemu.git;a=commit;h=7c15903789953ead14a417882657d52dc0c19a24
If possible, they should be backported to trusty and utopic. You'll also need something like http://git.qemu.org/?p=qemu.git;a=commit;h=4f11aa8a40351b28c0e67c7276e0003b38cc46ac before my 2 patches.

Thanks for the information. Looks like we can apply these in Debian too.

Status changed to 'Confirmed' because the bug affects multiple users.

This bug was fixed in the package qemu - 2.1+dfsg-4ubuntu7

---------------
qemu (2.1+dfsg-4ubuntu7) vivid; urgency=medium

  * Apply two patches to fix intermittent qemu-img corruption (LP: #1368815)
    - 501-block-raw-posix-fix-disk-corruption-in-try-fiemap
    - 502-block-raw-posic-use-seek-hole-ahead-of-fiemap

 -- Serge Hallyn