1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
|
On Windows, qcow2 is corrupted on expansion
Description of problem:
On Windows, the qcow2 loses blocks on account of which the filesystem withing is corrupted as data is copied to it, just the same way as in #727 VHDX is corrupted on expansion on both Linux/Windows.
After filing a bug for WNBD https://github.com/cloudbase/wnbd/issues/63 , I was suggested to try raw and qcow2. In the process I found that qcow2 is also affected. But it is also true that the kernel-5.15.4 ... 5.15.13 series have also been buggy https://bugzilla.kernel.org/show_bug.cgi?id=215460 .
On Linux, qcow2 never showed any signs of corruption.
On Windows, however, qcow2 does corrupt.
It is possible that, as Linux is so much more efficient at files and disk-IO, the kernel-block-code, qemu-block-code and qemu-qcow2-code do not hit the bug, and so the corruption does not show up as easily in Linux. Windows, being a little slower at this, might be causing the bug to show up in this qcow2 test. Possibly, the issue more likely to show up on slower machines. I am using an 2013-era intel-4rth gen i7-4700mq Haswell machine.
It is possible that, the resolution for this issue and that for #727 could be the same or very closely related. The bug may not be in qcow2.c or vhdx.c but maybe in the qemu/block subsystem. If the data-block that arrives from the VM-interface/nbd-interface which has to be written to file, but never gets to the virtual-disk code, not allocated and written to, then the data-block is lost.
Steps to reproduce:
1. Prepare virtual-disk1 as empty qcow2. In my-setup, the qcow2 file resides on an 150 GiB ExFAT partition on 512 GiB SSD. I use ExFAT as the ExFAT-filesystem does not have a concept of sparse files, eliminating that factor from troubleshooting.
```qemu-img.exe create -f qcow2 H:\gkpics01.qcow2 99723771904```
2. Prepare virtual-disk2 VHDX with synthetic generated data (sgdata). Scriptlets to recreate sgdata are described in https://gitlab.com/qemu-project/qemu/-/issues/727#note_739930694 . In my-setup, the vhdx file resides on an 1 TiB NTFS partition on a 2 TiB HDD.
3. Start qemu with arguments as given above.
4. Inside VM, boot and bringup livecd desktop, close the installer and open a terminal
5. Use gdisk to put an ext4 partition on /dev/sda
6. Put ext4 partition on sda1 ```mkfs.ext4 -L fs_gkpics01 /dev/sda1```
7. Create mount directories ```mkdir /mnt/a /mnt/b```
8. Mount the empty partition from virtual-disk-1 ```mount -t ext4 /dev/sda1 /mnt/a```
9. Mount the sgdata partition from virtual-disk-2 ```mount.ntfs-3g /dev/sdb2 /mnt/b``` or ```mount -t ntfs3 /dev/sdb2 /mnt/b```
10. Keep a terminal tab open with ```dmesg -w``` running
11. Rsync sgdata ```( sdate=`date` ; cd /mnt/b ; rsync -avH ./photos001 /mnt/a | tee /tmp/rst.txt ; echo $sdate ; date )```
12. Check sha256sum ```( sdate=`date` ; cd /mnt/a/photos001 ; shas256sum -c ./find.CHECKSUM --quiet ; echo $sdate ; date )```
corruption will show even without needing to unmount-remount or reboot-remount.
- About 1.4 GiB free-space left on the ext4 partition.
- Compared to #727, The number of files corrupted are less ``` sha256sum: WARNING: 31 computed checksums did not match ```
- After, VM guest OS warm reboot, a recheck of the sha256sum shows the same 31 files as corrupted
- After, qemu poweroff, restart qemu, VM guest OS cold boot, a recheck of the sha256sum shows the same 31 files as corrupted
- df shows: sda1 has 95271336 1k-blocks, of which 88840860 are used, 1544820 available, 99% used. The numbers don't add up. Either file-blocks are lost in lost-clusters or the ext4-filesystem has a large journal or the file-system-metadata is too large, or the ext4-filesystem has large cluster-size which results in inefficient space usage.
- An ```unmount /dev/sda1 ; fsck -y /dev/sda1 ; mount -t ext4 /dev/sda1 /mnt/a``` did not find any lost clusters.
The reason I don't think this is a kernel bug, is because the raw-file as virtual-disk-1 doesn't show this issue. Also, it happens regardless of whether sgdata is on ntfs-3g or ntfs3-paragon.
Additional information:
|