summary refs log tree commit diff stats
path: root/results/classifier/zero-shot/105/graphic/1422307
blob: 1237aa557f3e057556a2f73c0bf937dd059276e6 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
graphic: 0.984
other: 0.968
assembly: 0.967
semantic: 0.964
device: 0.959
mistranslation: 0.959
instruction: 0.956
socket: 0.954
boot: 0.943
vnc: 0.941
network: 0.933
KVM: 0.890

qemu-nbd corrupts files

Dear all,

On Trusty, in certain situations, try to copy files over a qemu-nbd mounted file system leads to write errors (and thus, file corruption).

Here is the last example I tried:
-> virtual disk is a VDI disk
-> It has only one partition, in FAT

Here is my mount process:
# modprobe nbd max_part=63
# qemu-nbd -c /dev/nbd0 "virtual_disk.vdi"
# partprobe /dev/nbd0
# mount /dev/nbd0p1 /tmp/mnt/

Partition is properly mounted at that point:
/dev/nbd0p1 on /tmp/mnt type vfat (rw)

Now, when I copy a file (rather big, ~28MB):
# cp file_to_copy /tmp/mnt/ ; sync
# md5sum /tmp/mnt/file_to_copy
2efc9f32e4267782b11d63d2f128a363  /tmp/mnt/file_to_copy
# umount /tmp/mnt 
# mount /dev/nbd0p1 /tmp/mnt/
# md5sum /tmp/mnt/file_to_copy
42b0a3bf73f704d03ce301716d7654de  /tmp/mnt/file_to_copy

The first hash was obviously the right one.

On a previous attempt I did, I spotted thanks to vbindiff that parts of the file were just filed with 0s instead of actual data.
It will randomly work after several attempts to write.

Version information:
# qemu-nbd --version
qemu-nbd version 0.0.1
Written by Anthony Liguori.

Cheers,

Hi Pierre,

I can reproduce the bug with a 2 GB VDI image with a single FAT32-formatted partition (on git master):

# cp src.vdi test.vdi
# ./qemu-nbd -c /dev/nbd0 test.vdi
# dd if=/dev/urandom of=/dev/nbd0 bs=1M count=64
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 3.34091 s, 20.1 MB/s
# md5sum /dev/nbd0
bfa6726d0d8fe752c0c7dccbf770fae6  /dev/nbd0
# sync
# echo 1 > /proc/sys/vm/drop_caches
# md5sum /dev/nbd0
cb4762769e09ed6da5e327710bfb3996  /dev/nbd0
# ./qemu-nbd -d /dev/nbd0
/dev/nbd0 disconnected

Using qcow2 or not using NBD I cannot reproduce the issue. Using a qcow2 image and converting it to VDI, the issue appears again.

Using an empty VDI image, or one filled with random data, the issue does not appear either.

I have attached a qcow2 image for others to test:

# ./qemu-img convert -O vdi src.qcow2 test.vdi; ./qemu-nbd -c /dev/nbd0 test.vdi; dd if=/dev/urandom of=/dev/nbd0 bs=1M count=64; md5sum /dev/nbd0; sync; echo 1 > /proc/sys/vm/drop_caches; md5sum /dev/nbd0; ./qemu-nbd -d /dev/nbd0                                                                                                                                                                 
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 3.33071 s, 20.1 MB/s
9f683b4a58cecdd8da04ec2f1b7abc4a  /dev/nbd0
efb1cdd5ebe1dd326056eb2f2e500944  /dev/nbd0
/dev/nbd0 disconnected

Unfortunately, I do not yet know the cause of this issue.

Max



For whatever reason, using an empty image now works for me, too:

$ ./qemu-img create -f vdi test.vdi 64M; ./qemu-nbd -c /dev/nbd0 test.vdi; dd if=/dev/urandom of=/dev/nbd0 bs=1K count=16384; md5sum /dev/nbd0; sync; echo 1 > /proc/sys/vm/drop_caches; md5sum /dev/nbd0; ./qemu-nbd -d /dev/nbd0
Formatting 'test.vdi', fmt=vdi size=67108864 static=off
16384+0 records in
16384+0 records out
16777216 bytes (17 MB) copied, 0.982225 s, 17.1 MB/s
216f7abbf90bf2539163396bdb7fd7b9  /dev/nbd0
a42faf71124c1f6102fa39cea82a1c86  /dev/nbd0
/dev/nbd0 disconnected

Writing less than 16384 kB, the issue is not always reproducible; for me, it disappears around 16160 kB (it's fuzzy, sometimes it appears, sometimes it doesn't).

So far I was only able to reproduce the issue by connecting qemu-nbd to the the Linux NBD interface; connecting to qemu-nbd via TCP worked fine.

So, a couple of test cases:

VDI and NBD over /dev/nbd0:
# for i in $(seq 0 9); do ./qemu-img create -f vdi test.vdi 64M > /dev/null; ./qemu-nbd -c /dev/nbd0 test.vdi; sleep 1; ./qemu-img convert -n blob.raw /dev/nbd0; ./qemu-img convert /dev/nbd0 test1.raw; sync; echo 1 > /proc/sys/vm/drop_caches; ./qemu-img convert /dev/nbd0 test2.raw; ./qemu-nbd -d /dev/nbd0 > /dev/null; if ! ./qemu-img compare -q test1.raw test2.raw; then md5sum test1.raw test2.raw; echo "$i failed"; break; fi; done; echo 'done'
e5185b807948d65bb4e837d992cea429  test1.raw
9907ca700f6ee4d4cdb136bb90fd8df1  test2.raw
6 failed
done

VDI and NBD over TCP:
# for i in $(seq 0 9); do ./qemu-img create -f vdi test.vdi 64M > /dev/null; (./qemu-nbd -t test.vdi &); sleep 1; ./qemu-img convert -n blob.raw nbd://localhost; ./qemu-img convert nbd://localhost test1.raw; sync; echo 1 > /proc/sys/vm/drop_caches; ./qemu-img convert nbd://localhost test2.raw; killall qemu-nbd; if ! ./qemu-img compare -q test1.raw test2.raw; then md5sum test1.raw test2.raw; echo "$i failed"; break; fi; done; echo 'done'      
done

VDI and NBD over a Unix socket:
# for i in $(seq 0 9); do ./qemu-img create -f vdi test.vdi 64M > /dev/null; (./qemu-nbd -k /tmp/nbd -t test.vdi &); sleep 1; ./qemu-img convert -n blob.raw nbd+unix:///\?socket=/tmp/nbd; ./qemu-img convert nbd+unix:///\?socket=/tmp/nbd test1.raw; sync; echo 1 > /proc/sys/vm/drop_caches; ./qemu-img convert nbd+unix:///\?socket=/tmp/nbd test2.raw; killall qemu-nbd; if ! ./qemu-img compare -q test1.raw test2.raw; then md5sum test1.raw test2.raw; echo "$i failed"; break; fi; done; echo 'done'                                     
done

VDI without NBD:
# for i in $(seq 0 9); do ./qemu-img create -f vdi test.vdi 64M > /dev/null; ./qemu-img convert -n -O vdi blob.raw test.vdi; ./qemu-img convert test.vdi test1.raw; sync; echo 1 > /proc/sys/vm/drop_caches; ./qemu-img convert test.vdi test2.raw; if ! ./qemu-img compare -q test1.raw test2.raw; then md5sum test1.raw test2.raw; echo "$i failed"; break; fi; done; echo 'done'
done

qcow2 and NBD over /dev/nbd0:
# for i in $(seq 0 9); do ./qemu-img create -f qcow2 test.qcow2 64M > /dev/null; ./qemu-nbd -c /dev/nbd0 test.qcow2; sleep 1; ./qemu-img convert -n blob.raw /dev/nbd0; ./qemu-img convert /dev/nbd0 test1.raw; sync; echo 1 > /proc/sys/vm/drop_caches; ./qemu-img convert /dev/nbd0 test2.raw; ./qemu-nbd -d /dev/nbd0 > /dev/null; if ! ./qemu-img compare -q test1.raw test2.raw; then md5sum test1.raw test2.raw; echo "$i failed"; break; fi; done; echo 'done'
done

raw and NBD over /dev/nbd0:
# for i in $(seq 0 9); do ./qemu-img create -f raw test.raw 64M > /dev/null; ./qemu-nbd -f raw -c /dev/nbd0 test.raw; sleep 1; ./qemu-img convert -n blob.raw /dev/nbd0; ./qemu-img convert /dev/nbd0 test1.raw; sync; echo 1 > /proc/sys/vm/drop_caches; ./qemu-img convert /dev/nbd0 test2.raw; ./qemu-nbd -d /dev/nbd0 > /dev/null; if ! ./qemu-img compare -q test1.raw test2.raw; then md5sum test1.raw test2.raw; echo "$i failed"; break; fi; done; echo 'done'
done

In conclusion, the only combination I can reproduce the issue with is VDI with NBD over the Linux NBD interface. It doesn't seem to be the kernel's fault because other file formats work fine; it doesn't seem to be qemu-nbd's fault because not using the kernel interface works fine; and it doesn't seem to be VDI's fault because not using NBD or at least using NBD over TCP or Unix sockets works fine, too.

I'll keep looking into it.

Max

First insight: Having the VDI image in tmpfs or using --cache=none for qemu-nbd changes nothing, therefore the reason why dropping the caches affects the result is probably Linux's cache for the NBD block device.

Second insight: Using blkverify makes it look very much like qemu's VDI implementation is the culprit:

# for i in $(seq 0 9); do ./qemu-img create -f vdi /tmp/test.vdi 64M > /dev/null; ./qemu-img create -f raw /tmp/test.raw 64M > /dev/null; (./qemu-nbd -v -f raw -c /dev/nbd0 blkverify:/tmp/test.raw:/tmp/test.vdi &); sleep 1; ./qemu-img convert -n blob.raw /dev/nbd0; ./qemu-img convert /dev/nbd0 /tmp/test1.raw; sync; echo 1 > /proc/sys/vm/drop_caches; ./qemu-img convert /dev/nbd0 /tmp/test2.raw; ./qemu-nbd -d /dev/nbd0 > /dev/null; if ! ./qemu-img compare -q /tmp/test1.raw /tmp/test2.raw; then md5sum /tmp/test1.raw /tmp/test2.raw; echo "$i failed"; break; fi; done; echo 'done'
NBD device /dev/nbd0 is now connected to blkverify:/tmp/test.raw:/tmp/test.vdi
NBD device /dev/nbd0 is now connected to blkverify:/tmp/test.raw:/tmp/test.vdi
NBD device /dev/nbd0 is now connected to blkverify:/tmp/test.raw:/tmp/test.vdi
blkverify: read sector_num=27872 nb_sectors=256 contents mismatch in sector 27904
blkverify: read sector_num=28128 nb_sectors=256 contents mismatch in sector 28128
qemu-img: error while reading sector 24576: Input/output error
e5185b807948d65bb4e837d992cea429  /tmp/test1.raw
b8a195424a240ed77c849d89002fa23b  /tmp/test2.raw
2 failed
done

Max

I succeeded to reproduce the bug without NBD:

$ ./qemu-img create -f vdi test.vdi 2G
Formatting 'test.vdi', fmt=vdi size=2147483648 static=off
$ ./qemu-img create -f raw test.raw 2G
Formatting 'test.raw', fmt=raw size=2147483648
$ x86_64-softmmu/qemu-system-x86_64 -enable-kvm -drive if=virtio,file=blkverify:test.raw:test.vdi,format=raw -drive if=virtio,file=data.img,format=raw,format=raw -cdrom ~/tmp/arch.iso -m 512 -boot d
blkverify: read sector_num=810976 nb_sectors=256 contents mismatch in sector 811008

Operations in the guest:
$ dd if=/dev/vdb of=/dev/vda
$ dd if=/dev/vda of=/dev/null

So it really looks like something's broken in qemu's VDI block driver.

Max

Adding some locking in qemu's VDI implementation makes the bug disappear, at least I can't reproduce it anymore. I'll send a patch.

Max

So, if I understand well, it's "just" some kind of race condition in the handling of writes in the VDI driver of Qemu?

Does that mean it can also affect VMs running with VDI disk(s)?

On Wed, Feb 18, 2015 at 09:11:57AM -0000, Pierre Schweitzer wrote:
> Does that mean it can also affect VMs running with VDI disk(s)?

Yes, that's what Max's example showed.

Note that only raw, qcow2, and qed support in QEMU is designed for
running guests.  The other formats like vmdk and vdi are mainly for
qemu-img convert and will perform poorly because they don't handle
parallel I/O requests.

Stefan


Thanks Max & Stefan.

Could you please let me know when the commit makes it to the QEMU repository? And give me its commit ID?

So that I can ask for a bugfix backport in Trusty.

On Thu, Feb 19, 2015 at 05:48:03PM -0000, Pierre Schweitzer wrote:
> Thanks Max & Stefan.
> 
> Could you please let me know when the commit makes it to the QEMU
> repository? And give me its commit ID?
> 
> So that I can ask for a bugfix backport in Trusty.

Hi Pierre,
In case I forget, please follow this patch series from Max:
http://patchwork.ozlabs.org/patch/440713/

Stefan


Hi,
Was the patch abandoned? I haven't seen any further progress on it for a month...
Cheers,

Hi,

Has this patch already been incorporated? I'm observing the bug as well when copying files into a VDI image.

Kind regards,

Nicolas

Hi Nicolas,

It (that is, its v2) has been merged into upstream master (http://git.qemu.org/?p=qemu.git;a=commit;h=f0ab6f109630940146cbaf47d0cd99993ddba824) and is part of qemu 2.3.0 (and will probably become part of 2.2.2, too).

Max

Thanks for the extra information.

I'll open a new bug report at Ubuntu to get the fix backported to Trusty. Thanks!

Based on Max's comment this is Fix Released upstream.

And since Ubuntu Wily ships 2.3, I presume this is also fixed in Ubuntu in Wily. I'll mark that Fix Released too, and add a task for Trusty.

Please find attach a proposed debdiff for fixing the issue in Ubuntu Trusty by backporting the fix which is now in Wily.

Hello Pierre, or anyone else affected,

Accepted qemu into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/qemu/2.0.0+dfsg-2ubuntu1.18 in a few hours, and then in the -proposed repository.

Please help us by testing this new package.  See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed.  Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed.  In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in advance!

Tested 2.0.0+dfsg-2ubuntu1.18.
Cannot reproduce the issue. 

TEST CASE 1:
Attempt several times to copy files on a VDI disk, and couldn't trigger any corruption (nor deadlock - no regression). Given my test scenario, previously, I would already have hit the corruption.

TEST CASE 2:
Tested with test scenario provided in comment #4. Couldn't reproduce, and no regression.

TEST CASE 3:
Tested with test scenario provided in comment #5. Couldn't reproduce, and no regression.

Marking as verification-done

This bug was fixed in the package qemu - 2.0.0+dfsg-2ubuntu1.18

---------------
qemu (2.0.0+dfsg-2ubuntu1.18) trusty-proposed; urgency=medium

  * qemu-nbd-fix-vdi-corruption.patch:
    qemu-nbd: fix corruption while writing VDI volumes (LP: #1422307)

 -- Pierre Schweitzer <email address hidden>  Mon, 17 Aug 2015 11:43:39 +0200

The verification of the Stable Release Update for qemu has completed successfully and the package has now been released to -updates.  Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report.  In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.