summary refs log tree commit diff stats
path: root/gitlab/issues_text/target_missing/host_missing/accel_missing/727
diff options
context:
space:
mode:
Diffstat (limited to 'gitlab/issues_text/target_missing/host_missing/accel_missing/727')
-rw-r--r--gitlab/issues_text/target_missing/host_missing/accel_missing/727156
1 files changed, 0 insertions, 156 deletions
diff --git a/gitlab/issues_text/target_missing/host_missing/accel_missing/727 b/gitlab/issues_text/target_missing/host_missing/accel_missing/727
deleted file mode 100644
index 2ed6bc8e5..000000000
--- a/gitlab/issues_text/target_missing/host_missing/accel_missing/727
+++ /dev/null
@@ -1,156 +0,0 @@
-VHDX is corrupted on expansion
-Description of problem:
-Fresh VHDX corrupts with data loss upon copying data into it.
-Steps to reproduce:
-1. Create new dynamic vhdx file of about 93Gib (unexpanded, starting size is small ~205Mib, freshly created and NTFS formatted in windows.) 
-2. Connect drive using qemu-nbd to /dev/nbd0
-3. Ensure partition using gdisk
-4. format partition with ntfs/ExFAT volume
-5. mount volume
-6. copy/rsync data of about 85Gib of data into the mounted volume
-7. unmount volume
-8. disconnect /dev/nbd0
-9. reconnect /dev/nbd0
-10. attempt mount, sometimes mount may fail if corrupted
-11. If mount succeeds, verify data/all-files using some method like sha256sum. Some data is likely to fail
-
-Given the amount of data I am rsync-ing into the volume, there is very high chance of corruption. 
-
-The corruption is not apparent until **disconnection and reconnection** of virtual-disk. Simply unmounting and remounting without disconnecting is unlikely to cause one to suspect corruption. 
-
-If the expanded corrupted volume is again disconnected, reconnected, reformatted and data is again re-copied onto it, then the volume is less likely to experience a corruption, perhaps because new block allocation is not required.
-
-Errors vary and include:
-- sometimes mount fails
-- sometimes ls -l output is garbled
-- sometimes one cannot cd into a directory
-- several consecutive errors in shasum256 start midway through the file-list processing. Error is shown as if rsync failed and files do not exist.
-  ```
-  sha256sum: ./201207/IMG_2406.JPG: No such file or directory
-  ./201207/IMG_2406.JPG: FAILED open or read
-  ```
-- Doing chdsk on windows may just create FOUND.000/FILE0000.CHK files.
-Additional information:
-See comment https://gitlab.com/qemu-project/qemu/-/issues/136#note_731044761 from where this all began. Some summary included here.
-
-```
-[root@sirius a16]# uname -a
-Linux sirius 5.15.0-60.fc35.x86_64 #1 SMP Tue Nov 2 15:38:03 IST 2021 x86_64 x86_64 x86_64 GNU/Linux
-
-[root@sirius ~]# qemu-system-x86_64 --version
-QEMU emulator version 6.1.0 (qemu-6.1.0-10.fc35)
-Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers
-
-[root@sirius ~]# cat /etc/mtab | grep -E "a16|a17" | grep ntfs3
-/dev/sda16 /mnt/a16 ExFAT rw,relatime,fmask=0022,dmask=0022,iocharset=utf8,errors=remount-ro 0 0
-/dev/sda17 /mnt/a17 ntfs3 rw,relatime,uid=0,gid=0,iocharset=utf8 0 0
-
-[root@sirius ~]# uname -a # self-built rpmbuild kernel from fedora rawhide kernel-src rpm 
-Linux sirius 5.15.0-60.fc35.x86_64 #1 SMP Tue Nov 2 15:38:03 IST 2021 x86_64 x86_64 x86_64 GNU/Linux
-```
-
-Test/Activity being done: About 85Gib of data is copied onto a size 93Gib VHDX on host-FS ntfs3 with guest-FS ntfs3. 
-```
-Prefer windows method: Inside windows-10, using powershell command New-VHD, one may a 93Gib VHDX
-  New-VHD -Path I:\gkpics01.vhdx -SizeBytes 99723771904 -Dynamic
-  Then attach disk and format volume inside to ntfs.
-or Alternatively, Linux method (less preferred)
-  qemu-img create -f qcow2 /mnt/a16/gkpics01.qcow2 99723771904
-  qemu-img create -f vhdx -o subformat=dynamic /mnt/a16/gkpics01.vhdx 99723771904
-:
-sync ; sleep 1 ; qemu-nbd -c /dev/nbd0 /mnt/a16/gkpics01.vhdx
-:
-create appropriate partitions on /dev/nbd0 if not already partitioned
-gdisk /dev/nbd0 
-:
-format volume with filesystem ntfs, or ext4 etc if not already formatted
-mkfs -t ntfs -Q -L fs_gkpics01 /dev/nbd0p2 
-:
-mount partition
-sync ; sleep 1 ; mount -t ntfs3 /dev/nbd0p2 /mnt/t1
-:
-do copy/rsync etc
-( fl="photos001" ; src="/mnt/c13" ; dst="/mnt/t1" ; cd "$src" ;rsync -avH "$fl" "$dst" ; sudo -u gana DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus DISPLAY=:0.0 -- notify-send "$src/$fl" "rsync $src/$fl" )
-:
-sync ; sleep 1 ; umount /mnt/t1
-:
-sync ; sleep 1 ; blockdev --flushbufs /dev/nbd0 ; sleep 2 ; qemu-nbd -d /dev/nbd0 ; sleep 1 ; sync
-:
-sync ; sleep 1 ; qemu-nbd -c /dev/nbd0 /mnt/a16/gkpics01.vhdx
-:
-sync ; sleep 1 ; mount -t ntfs3 /dev/nbd0p2 /mnt/t1
-:
-do ls-l/verify/sha256sum-c etc
-( fl="photos001" ; rtpt="/mnt/t1" ; cd "${rtpt}/${fl}" ; sdate=`date` ; echo "$sdate" ; sha256sum -c "$rtpt/$fl/find.CHECKSUM" --quiet ; echo "$sdate" ; date ; sudo -u gana DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus DISPLAY=:0.0 -- notify-send "$src/$fl" "checksum $src/$fl" )
-```
-
-In the below list detailing under what circumstance corruption occurs
-
-- Format:  kernel-version/ disk-attaching-sw/ hostFS/ VDISK/ guestFS with any parameters in parenthesis.
-- Corruption does happen with kernel-5.15.0-60/qemu-6.1.0-10/ntfs3/VHDX/ntfs3
-- Corruption does happen with kernel-5.15.0-60/qemu-6.1.0-10/ntfs3/VHDX/ext4
-- Corruption does happen with kernel-5.15.0-60/guestfish-1.46.0(backend=direct)/ntfs3/VHDX/ntfs3
-- Corruption does happen with kernel-5.15.0-60/guestfish-1.46.0(backend=libvirt-7.6.0-3)/ntfs3/VHDX/ntfs3
-- Corruption does happen on host-FS **ExFAT too** with kernel-5.15.0-60/qemu-6.1.0-10/ExFAT/VHDX/ntfs3
-- Corruption does happen with kernel-5.15.0-60/qemu-6.0.0-10/ExFAT/VHDX/ntfs3
-- Corruption does happen with kernel-5.14.18-300/qemu-6.0.0-12/ExFAT/VHDX/ntfs3g-fuseblk
-- Corruption does happen with kernel-5.14.18-300/qemu-6.0.0-12/ExFAT/VHDX(created by qemu-img)/ntfs3g-fuseblk
-  ``` Failed to mount '/dev/nbd0p2': Input/output error NTFS is either inconsistent, or there is a hardware fault,```
-- Corruption does **not** happen with kernel-5.14.18-300/qemu-6.0.0-12/ExFAT/qcow2/ext4
-- Corruption does **not** happen with kernel-5.14.18-300/qemu-6.0.0-12/ExFAT/qcow2/ntfs3g-fuseblk
-- Corruption does happen with kernel-5.15.0-60/qemu-6.1.0-10/ExFAT/VHDX(cache=none,aio=threads)/ntfs3
-- Corruption does happen with kernel-5.15.0-60/qemu-6.1.0-10/ExFAT/VHDX(cache=none,aio=io_uring)/ntfs3
-- VHDX fixed disk grows in size. Filed as different bug: https://gitlab.com/qemu-project/qemu/-/issues/806 
-  - Corruption **does happen** with kernel-5.15.0-60/qemu-6.1.0-10/ExFAT/VHDX(fixed)/ntfs3
-    A fixed vhdx disk should not grow in size. It is as if the blocks are added to a vhdx-journal instead of overwriting preallocated blocks.
-- Corruption does happen with kernel-5.15.0-60/qemu-6.1.0-10/ext4/VHDX/ntfs3
-- Corruption does happen with kernel-5.15.2-200/**qemu-6.2.0-rc1**/ExFAT/VHDX/ntfs3
-- Corruption does **not** happen with kernel-5.15.2-200/qemu-6.2.0-rc1/ExFAT/**VMDK**(v4,monolithicSparse)/ntfs3
-- Corruption does not happen with kernel-5.15.2-200/qemu-6.2.0-rc1/ExFAT/VMDK(compat6,monolithicSparse)/ntfs3
-- Corruption does **not** happen with kernel-5.15.2-200/qemu-6.2.0-rc1/ExFAT/**VDI**/ntfs3
-- Corruption does **not** happen with kernel-5.15.2-200/qemu-6.2.0-rc1/ExFAT/**VPC**(dynamic)/ntfs3
-- Corruption does happen with kernel-5.15.2-200/**qemu-5.2.0-8**/ExFAT/VHDX/ntfs3
-- Corruption does happen with kernel-5.15.2-200/**qemu-4.2.1-1**/ExFAT/VHDX/ntfs3
-- Corruption does happen with vhdx-file is on 2Tb NTFS 1Tb partition of **external USB HDD** 2Tb,  with kernel-5.15.2-200/qemu-6.2.0-rc1/ntfs3/VHDX/ntfs3
-- Corruption does happen when using src is on ntfs3 partition on external USB drive, which is **generated synthetic data (sgdata)** sgdata/kernel-5.15.2-200/qemu-6.2.0-rc1/ExFat/VHDX/ntfs3
-- Corruption does happen when starting with qemu-img created vhdx image with sgdata/kernel-5.15.2-200/qemu-6.2.0-rc1/ExFat/VHDX(created by qemu-img)/ext4 superblock mount fail
-- Corruption does happen older fc34-kernel on Fedora-35, sgdata/kernel-5.13.19-200/qemu-6.2.0-rc2/ExFAT/VHDX/ntfs3g-fuseblk , different, fewer files 3 small files affected
-- Corruption does happen with older fc32-kernel on Fedora-35, sgdata/kernel-5.11.22-100/qemu-6.2.0-rc2/ExFAT/VHDX/ntfs3g-fuseblk , fewer files, different, but same as above  3 small files affected, 
-- Corruption does happen with older fc32-kernel on Fedora-35, sgdata/kernel-5.11.22-100/qemu-6.2.0-rc2/ExFAT/VHDX/ext4  
-- Corruption does happen with self-built 5.10 LTS kernel on Fedora-35, sgdata/kernel-5.10.90-200/qemu-6.2.0-1/ExFAT/VHDX/ext4 (sgdata accessed using ntfs-fuseblk)  
-- As the host kernel invoking qemu-nbd, these kernels showed less errors than if they were run inside a VM as a guest. If run as a guest VM, These kernels, 5.15.4 and above, may also have kernel bugs https://bugzilla.kernel.org/show_bug.cgi?id=215460 or https://bugzilla.kernel.org/show_bug.cgi?id=215563 resulting in additional compounded errors in the failure test results, even in raw-img and qcow2(fixed).
-  - Corruption does happen with sgdata/kernel-5.15.4-201/qemu-6.2.0-rc1/ExFAT/VHDX(created by qemu-img)/ext4
-  - Corruption does happen with sgdata/kernel-5.15.4-201/**qemu-6.2.0-rc2**/ExFAT/VHDX(created by qemu-img)/ext4
-  - Corruption does not happen with synthetic-data sgdata/kernel-5.15.4-201/qemu-6.2.0-rc2/ExFAT/VMDK(created by qemu-img)/ext4
-  - Corruption does happen with sgdata/kernel-5.15.5-200/qemu-6.2.0-rc2/ExFAT/VHDX(created by qemu-img)/ext4
-  - Corruption does not happen with sgdata/kernel-5.15.4-201/nbdkit-1.28.2-nbdplugin-qemu-6.2.0-0.rc2/ExFAT/vmdk/ntfs3
-  - Corruption does not happen with sgdata/kernel-5.15.4-201/nbdkit-1.28.2-nbdplugin/ExFAT/vmdk-nbd-vddkplugin/ntfs3
-  - Corruption does happen with sgdata/kernel-5.15.4-201/nbdkit-1.28.2-nbdplugin-qemu-6.2.0-0.rc2/ExFAT/VHDX/ntfs3
-  - Corruption does happen with sgdata/kernel-5.15.6-200 to kernel-5.15.13-200 /qemu-6.2.0-0.rc2/ExFAT/VHDX/ntfs3
-- On Windows-10, these tests may possibly be different bug. Also causes system-wide DiskIO stuck in addition to corruption https://github.com/cloudbase/wnbd/issues/63
-  - Corruption does happen with sgdata/**WIN10**-21H2-19044-1415/**WNBD**-0.2.2-4-g10c1fbe/qemu-6.2.0-rc4/ExFAT/VHDX/NTFS
-  - Corruption **does happen** with sgdata/**WIN10**-21H2-19044-1415/**WNBD**-0.2.2-4-g10c1fbe/qemu-6.2.0-rc4/ExFAT/**qcow2**/NTFS 
-- Possibly different bug, on Windows-10, corruption of virtual-disk from inside VM, no nbd . Maybe https://bugzilla.kernel.org/show_bug.cgi?id=215460 or https://bugzilla.kernel.org/show_bug.cgi?id=215563
-  - Win10-21H2-19044-1415/WHPX/ExFAT/qemu-6.2.0-rc4/alpine-linux-3.15/kernel-5.15.4/VHDX/ntfs3
-  - Win10-21H2-19044-1415/WHPX/ExFAT/qemu-6.2.0-rc4/alpine-linux-3.15/kernel-5.15.4/**qcow2**/ext4
-- Corruption does **not** happen with Fedora-35/kernel-5.17.0-0.rc3.89(SB)/qemu-6.2.0-2/Fedora-Rawhide-202208/kernel-5.17.0-0.rc3.89/ExFAT/**qcow2(dyn)**/ntfs3 data-src: VHDX(dyn)/ntfs3/sgdata
-- Corruption does **not** happen with Fedora-35/kernel-5.17.0-0.rc3.89(SB)/qemu-6.2.0-2/Fedora-Rawhide-202208/kernel-5.17.0-0.rc3.89/ExFAT/qcow2(dyn)/ntfs3 data-src: VHDX(dyn)/**ntfs-fuseblk**/sgdata
-- Corruption **does** happen with Fedora-35/kernel-5.17.0-0.rc3.89(SB)/qemu-6.2.0-2/Fedora-Rawhide-202208/kernel-5.17.0-0.rc3.89/ExFAT/**VHDX**/ntfs3 data-src: VHDX(dyn)/ntfs3/sgdata
-- Corruption **does** happen with Fedora-35/kernel-5.17.0-0.rc3.89(SB)/qemu-6.2.0-2/Fedora-Rawhide-202208/kernel-5.17.0-0.rc3.89/ExFAT/VHDX/ext4 data-src: VHDX(dyn)/**ntfs-fuseblk**/sgdata
-- Corruption **does** happen with Fedora-35/kernel-5.17.0-0.rc3.89(SB)/qemu-6.2.0-2/**Rocky-8.5-Workstation-20211114.iso**/**kernel-4.18.0-348.el8.0.2.x86_64**/ExFAT/VHDX/ext4 data-src: VHDX(dyn)/**ntfs-fuseblk**/sgdata
-
-ExFAT filesystem was considered because it does not have concept of sparse files eliminating that factor from troubleshooting. Furthermore, it may be incorrect to suspect NTFS3, ExFAT or NTFS3g-fuseblk only because they are new/recently mainstreamed filesystems, as there aren't any intense/complex filesystem operations. The filesystem is experiencing only though-put and files are simply copied into it without further operations. Furthermore, ext4 also experiences corruption if on VHDX.
-
-It just seems to me the VHDX support implementation has bugs, corrupts and hence is not reliable.
-
-The qemu test-suite needs test-cases added for testing for vhdx-stress and vhdx-throughput .
-
-More troubleshooting test results are summarized in https://gitlab.com/qemu-project/qemu/-/issues/727#note_745711084
-
-Chief suspect files
-- ~~kernel: nbd: [drivers/block/nbd.c](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/block/nbd.c)~~ can be made to happen via VM
-- ~~kernel: ntfs3~~ no ntfs3 partition required
-- ~~kernel 5.x series~~ bug exists in 4.18.0.348
-- ~~qemu: block~~ doesn't happen to other virtual-disk formats (raw,qcow2) 
-- qemu/VM : seems to happen only when using qemu-nbd or inside qemu-VM
-- qemu: [block/vhdx.c](https://gitlab.com/qemu-project/qemu/-/blob/master/block/vhdx.c) , [block/vhdx_log.c](https://gitlab.com/qemu-project/qemu/-/blob/master/block/vhdx-log.c) , [block/vhdx-endian.c](https://gitlab.com/qemu-project/qemu/-/blob/master/block/vhdx-endian.c) , [block/vhdx.h](https://gitlab.com/qemu-project/qemu/-/blob/master/block/vhdx.h),