semantic: 0.710
graphic: 0.688
assembly: 0.639
device: 0.637
mistranslation: 0.635
other: 0.620
instruction: 0.601
vnc: 0.444
network: 0.336
boot: 0.309
socket: 0.275
KVM: 0.245

qemu-img convert intermittently corrupts output images

-- Found in releases qemu-2.0.0, qemu-2.0.2, qemu-2.1.0. Tested on Ubuntu 14.04 using Ext4 filesystems.

The command

  qemu-img convert -O raw inputimage.qcow2 outputimage.raw

intermittently creates corrupted output images, when the input image is not yet fully synchronized to disk. While the issue has actually been discovered in operation of of OpenStack nova, it can be reproduced "easily" on command line using

  cat $SRC_PATH > $TMP_PATH && $QEMU_IMG_PATH convert -O raw $TMP_PATH $DST_PATH && cksum $DST_PATH

on filesystems exposing this behavior. (The difficult part of this exercise is to prepare a filesystem to reliably trigger this race. On my test machine some filesystems are affected while other aren't, and unfortunately I haven't found the relevant difference between them, yet. Possible it's timing issues completely out of userspace control ...)

The root cause, however, is the same as in

  http://lists.gnu.org/archive/html/coreutils/2011-04/msg00069.html

and it can be solved the same way as suggested in

  http://lists.gnu.org/archive/html/coreutils/2011-04/msg00102.html

In qemu, file block/raw-posix.c use the FIEMAP_FLAG_SYNC, i.e change 

    f.fm.fm_flags = 0;

to

    f.fm.fm_flags = FIEMAP_FLAG_SYNC;

As discussed in the thread mentioned above, retrieving a page cache coherent map of file extents is possible only after fsync on that file.

See also

  https://bugs.launchpad.net/nova/+bug/1350766

In that bug report filed against nova, fsync had been suggested to be performed by the framework invoking qemu-img. However, as the choice of fiemap -- implying this otherwise unneeded fsync of a temporary file  -- is not made by the caller but by qemu-img, I agree with the nova bug reviewer's objection to put it into nova. The fsync should instead be triggered by qemu-img utilizing the FIEMAP_FLAG_SYNC, specifically intended for that purpose.

Is there a minimum version of qemu that would be required to use the FIEMAP_FLAG_SYNC flag?

The affected code was introduced with version 1.2.0. However, due to https://bugs.launchpad.net/qemu/+bug/1193628 I can't build these old releases to verify whether they actually expose the same behaviour.

It seems the dust settles a bit: Found the relevant difference between my various filesystems, and how to reproduce the failure: Susceptible filesystems don't have the extent feature of ext4 enabled.

You can create such a filesystem using

  mke2fs -t ext4 -O ^extent /dev/...
  mount /mnt /dev/...
 
Adapting the command line example provided above you can see

  rm -f /mnt/tmp.qcow2
  cat $SRC_PATH > /mnt/tmp.qcow2 && qemu-img convert -O raw  /mnt/tmp.qcow /mnt/tmp.qcow
  cksum  /mnt/tmp.qcow

creating corrupt (usually nullified) result images. By inserting a sleep of at least 33 seconds between the cat command and the qemu-img invocation I'm getting proper output.

To me it's unclear now, where the actual defect is located. Creating ext4 filesystems with certain features disabled (such as the exetent tree) is apparently supported and ok. Is the fiemap ioctl supposed to handle this gracefully, for example by assuming FIEMAP_FLAG_SYNC in absence of an extent tree? Or are clients such as qemu-img supposed to always FIEMAP_FLAG_SYNC to be safe?

I see seek hole is supported in the latest qemu-img so I would reorder so that's tried first like:

    if lseek(SEEK_HOLE) == ENOTSUP
        use_that
        if fiemap(FIEMAP_FLAG_SYNC)
            use_that

The fallback cascade Pádraig mentions is already implemented in qemu-2.1.0, in function raw_co_get_block_status. Just swap

  ret = try_fiemap( ... )

and

  ret = try_seek_hole( ... )

to reverse the order. I can confirm that it works just fine on 3.13 kernel (all version since 3.1, according to lseek(2)), while older versions will fall back to fiemap, which needs to be protected with FIEMAP_FLAG_SYNC in try_fiemap, to be safe.

This should work under all conditions, and avoid redundant syncs where possible, right?

  
Marking as High since duplicate bug 1350766 was marked High.

openstack review at:
  https://review.openstack.org/#/c/123957/

Qemu patches at:
  http://patchwork.ozlabs.org/patch/393494/ ; and
  http://patchwork.ozlabs.org/patch/393495/

FWIW the following 2 commits in qemu master resolve the issue for qemu-img.

  http://git.qemu.org/?p=qemu.git;a=commit;h=38c4d0aea3e1264c86e282d99560330adf2b6e25
  http://git.qemu.org/?p=qemu.git;a=commit;h=7c15903789953ead14a417882657d52dc0c19a24

If possible they should be back ported to trusty and utopic.

You'll also need something like:

  http://git.qemu.org/?p=qemu.git;a=commit;h=4f11aa8a40351b28c0e67c7276e0003b38cc46ac

before my 2 patches.

Thanks for the information.  Looks like we can apply these in debian too.

Status changed to 'Confirmed' because the bug affects multiple users.

This bug was fixed in the package qemu - 2.1+dfsg-4ubuntu7

---------------
qemu (2.1+dfsg-4ubuntu7) vivid; urgency=medium

  * Apply two patches to fix intermittent qemu-img corruption
    (LP: #1368815)
    - 501-block-raw-posix-fix-disk-corruption-in-try-fiemap
    - 502-block-raw-posic-use-seek-hole-ahead-of-fiemap
 -- Serge Hallyn <email address hidden>   Wed, 29 Oct 2014 22:31:43 -0500

Hi Serge,


Is there any chance these fixes will go into trusty?

Hi Tony,

yes, I've uploaded a proposed fix for trusty-proposed earlier today.  It should be available for testing as soon as it is accepted.

Awesome.

Thanks!

Hello Michael, or anyone else affected,

Accepted qemu into utopic-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/qemu/2.1+dfsg-4ubuntu6.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package.  See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed.  Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed.  In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in advance!

Hello Michael, or anyone else affected,

Accepted qemu into trusty-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/qemu/2.0.0+dfsg-2ubuntu1.8 in a few hours, and then in the -proposed repository.

Please help us by testing this new package.  See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed.  Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed.  In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in advance!

Tested qemu-utils  2.0.0+dfsg-2ubuntu1.8. Successful.

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates.  Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report.  In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

This bug was fixed in the package qemu - 2.0.0+dfsg-2ubuntu1.8

---------------
qemu (2.0.0+dfsg-2ubuntu1.8) trusty-proposed; urgency=medium

  * debian/qemu-system-x86.qemu-kvm.upstart: create /dev/kvm in a
    container. (LP: #1370199)
  * Cherrypick upstream patch to fix intermittent qemu-img corruption
    (LP: #1368815)
    - 501-block-raw-posix-fix-disk-corruption-in-try-fiemap
    - (note - 502-block-raw-posic-use-seek-hole-ahead-of-fiemap (which was
      also needed in utopic) appears to be unneeded here as the code being
      changed has not yet been switched to using try_fiemap)
 -- Serge Hallyn <email address hidden>   Thu, 20 Nov 2014 11:24:51 -0600

@Michael,

by any chance would you be albe to test on utopic?

I couldn't reproduce the bug on the old qemu myself, however Michael has verified the (same) fix on trusty, and the full qa-regression-test passed for me on utopic-proposed.  So I would request that we call this verification-done.

Looking at the fixes, I also see the following commits remove the above changes, which could mean we might encounter this again:
c4875e5 raw-posix: SEEK_HOLE suffices, get rid of FIEMAP
d1f06fe raw-posix: The SEEK_HOLE code is flawed, rewrite it

Note there is also a related issue:
bug 1292234
So far testing with the proposed qemu version or upstream I still encounter issues on ext4 w/ ^extent and ext3 filesystems.

Filed a separate issue for MOS https://bugs.launchpad.net/mos/+bug/1401261

Hi Chris,
Markus' rework will not reintroduce this bug as it completely removes all fiemap code.

bug 129224 is a different issue, I'll comment on that bug.

You say: you encounter issues with upstream with ^extent and ext3 filesystems.  Just to be clear: Are you saying that *this* bug is still a problem for you?

if it's a different bug then I write it up and I'll take a look.

Tony,

Yea, its a different bug. I tested with the above patched package and upstream qemu from git, and I can still hit bug 129224. I was hoping this also fixed my issue, but unfortunately it seems to be a different issue that occurs when using the same types of filesystems. I have a solid reproducer on my desk so let me know which experiments / areas of code / etc I should look at.

Just to clarify it's bug 1292234 in the previous comment.

Chris,
I've read through 1292234 and I'll have a play with your reproducer locally and see if I can gain any insight.

I'm sorry my fix didn't help 1292234, but glad you can't hit 1368815 with upstream, I was kinda having kittens here ;P

This bug was fixed in the package qemu - 2.1+dfsg-4ubuntu6.2

---------------
qemu (2.1+dfsg-4ubuntu6.2) utopic-proposed; urgency=medium

  * Apply two patches to fix intermittent qemu-img corruption
    (LP: #1368815)
    - 501-block-raw-posix-fix-disk-corruption-in-try-fiemap
    - 502-block-raw-posic-use-seek-hole-ahead-of-fiemap
 -- Serge Hallyn <email address hidden>   Thu, 20 Nov 2014 16:33:09 -0600

I'm happy to tackle to also fix cinder with a version of the nova fix (for consistency).  I propose waiting until the nova fix lands

I'd elevate this to high so it matches nova and ubuntu but I don't have permissions to do so.

Fix proposed to branch: master
Review: https://review.openstack.org/141259

> - 501-block-raw-posix-fix-disk-corruption-in-try-fiemap
>   - (note - 502-block-raw-posic-use-seek-hole-ahead-of-fiemap (which was
>     also needed in utopic) appears to be unneeded here as the code being
>      changed has not yet been switched to using try_fiemap)

Actually such a enforces fsync and drastically reduces the performance of conversion.
I propose to use seek_hole instead of FIEMAP (which is basically what 
 502-block-raw-posic-use-seek-hole-ahead-of-fiemap does). 


The second part of the fix (which does not reduce the performance) for qemu 2.0 (apparently uploading two patches at once is not so easy)

Patchg 0500-block-raw-posix-Try-both-FIEMAP-and-SEEK_HOLE.patch appears to be part of a bigger re-write of the related code.   and is ON TOP of the patches already applied in this bug.


No doubt the rewirtten code is "better" but backporting it contains more risk than the 2 simple fixes I already nominated.

> Patch 0500-block-raw-posix-Try-both-FIEMAP-and-SEEK_HOLE.patch appears to be part of a bigger re-write
> of the related code. and is ON TOP of the patches already applied in this bug.

Yep, sorry for not mentioning this. As far as I understand qemu-2.1 package contains this partially rewritten
code too (without any recent changes like disabling FIEMAP completely and rewriting the code using SEEK_HOLE).

> No doubt the rewirtten code is "better" but backporting it contains more risk than the 2 simple fixes I already nominated.

Can we completely disable the FIEMAP code and pretend that all blocks are allocated? I'm afraid fsync'ing 100+ GB
files might be even slower than ignoring the sparseness.

Change abandoned by John Griffith (<email address hidden>) on branch: master
Review: https://review.openstack.org/141259

Fix proposed to branch: master
Review: https://review.openstack.org/143575

Change abandoned by Mike Perez (<email address hidden>) on branch: master
Review: https://review.openstack.org/143575
Reason: 1 month, no update.

Change abandoned by Tony Breeds (<email address hidden>) on branch: master
Review: https://review.openstack.org/123957
Reason: The main distros we care about have landed or are in progress.

Marking as Wont-Fix.

Change abandoned by Mike Perez (<email address hidden>) on branch: master
Review: https://review.openstack.org/143575
Reason: No activity for over a month.

Closing based on the assumption that a working qemu-img is available now.

According to comment #8 the fixes have been included in the upstream QEMU repository, so setting the status to "Fix released" now.