QEMU-KVM / detect_zeroes causes KVM to start an unbounded number of threads on guest-side high I/O with large block sizes
QEMU-KVM in combination with "detect_zeroes=on" allows a guest to DoS the host. This is possible if the host has "detect_zeroes" enabled and the guest writes a large chunk of data with a huge block size to the drive.
E.g.: dd if=/dev/zero of=/tmp/DoS bs=1G count=1 oflag=direct
All QEMU versions since the introduction of detect_zeroes are affected; earlier versions are unaffected. This is absolutely critical, please fix it ASAP!
#####
Provided by Dominik Csapak:
source , bs , count , O_DIRECT : behaviour
detect_zeroes on:
urandom , bs 1M, count 1024, O_DIRECT: OK
file , bs 1M, count 1024, O_DIRECT: OK
/dev/zero , bs 1M, count 1024, O_DIRECT: OK
zero file , bs 1M, count 1024, O_DIRECT: OK
/dev/zero , bs 1G, count 1, O_DIRECT: NOT OK
zero file , bs 1G, count 1, O_DIRECT: NOT OK
zero file , bs 1G, count 1, no O_DIRECT: NOT OK
rand file , bs 1G, count 1, O_DIRECT: OK
rand file , bs 1G, count 1, no O_DIRECT: OK
discard on:
urandom , bs 1M, count 1024, O_DIRECT: OK
rand file , bs 1M, count 1024, O_DIRECT: OK
/dev/zero , bs 1M, count 1024, O_DIRECT: OK
zero file , bs 1M, count 1024, O_DIRECT: OK
/dev/zero , bs 1G, count 1, O_DIRECT: NOT OK
zero file , bs 1G, count 1, O_DIRECT: NOT OK
zero file , bs 1G, count 1, no O_DIRECT: NOT OK
rand file , bs 1G, count 1, O_DIRECT: OK
rand file , bs 1G, count 1, no O_DIRECT: OK
detect_zeroes off:
urandom , bs 1M, count 1024, O_DIRECT: OK
rand file , bs 1M, count 1024, O_DIRECT: OK
/dev/zero , bs 1M, count 1024, O_DIRECT: OK
zero file , bs 1M, count 1024, O_DIRECT: OK
/dev/zero , bs 1G, count 1, O_DIRECT: OK
zero file , bs 1G, count 1, O_DIRECT: OK
zero file , bs 1G, count 1, no O_DIRECT: OK
rand file , bs 1G, count 1, O_DIRECT: OK
rand file , bs 1G, count 1, no O_DIRECT: OK
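For completeness, the cases above can be driven with a short run inside the guest. This is only a minimal sketch, assuming a dedicated scratch disk at /dev/sdb and a 1 GiB random source file; all paths are placeholders:
# run inside the guest against a dedicated scratch disk (/dev/sdb assumed)
dd if=/dev/urandom of=/dev/sdb bs=1M count=1024 oflag=direct    # OK
dd if=/dev/zero    of=/dev/sdb bs=1M count=1024 oflag=direct    # OK
dd if=/dev/zero    of=/dev/sdb bs=1G count=1 oflag=direct       # NOT OK with detect_zeroes=on
dd if=/dev/zero    of=/dev/sdb bs=1G count=1                    # NOT OK, even without O_DIRECT
dd if=/dev/urandom of=/tmp/rand.img bs=1M count=1024            # prepare a 1 GiB random source file
dd if=/tmp/rand.img of=/dev/sdb bs=1G count=1 oflag=direct      # OK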
#####
Provided by Florian Strankowski:
bs - count - io-threads
512K - 2048 - 2
1M - 1024 - 2
2M - 512 - 4
4M - 256 - 6
8M - 128 - 10
16M - 64 - 18
32M - 32 - uncountable
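One way to watch the worker threads pile up while such a run is in progress is to poll the QEMU process on the host. A minimal sketch; the pgrep pattern is an assumption, adjust it to the actual process name:
# on the host, while the guest runs dd
QEMU_PID=$(pgrep -f qemu-system-x86_64 | head -n1)
while sleep 1; do
    echo "$(date +%T)  threads: $(ls /proc/$QEMU_PID/task | wc -l)"
done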
Please refer to further information here:
https://bugzilla.proxmox.com/show_bug.cgi?id=1368
Sorry about the visibility settings, this bug tracker drives me nuts.
Just to make this clear: this bug affects only LVM-backed storage (both LVM-thin and LVM-thick); file-based storage is not affected.
Status changed to 'Confirmed' because the bug affects multiple users.
I'm unable to reproduce this issue. The host stays responsive and the dd command completes in a reasonable amount of time. QEMU does not exceed the 64-thread pool size.
Please post steps to reproduce the issue using a minimal command-line without libvirt.
Here is information on my attempt to reproduce the problem:
Guest: Kernel 4.10.8-200.fc25.x86_64
Host: 4.10.11-200.fc25.x86_64
QEMU: qemu.git/master (e619b14746e5d8c0e53061661fd0e1da01fd4d60)
The LV is 1 GB on top of LUKS on a Samsung MZNLN256HCHP SATA SSD drive.
mpstat -P ALL 5 output:
11:02:02 AM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
11:02:07 AM all 3.36 0.00 6.22 34.54 0.25 0.50 0.00 3.11 0.00 52.03
11:02:07 AM 0 2.82 0.00 5.63 32.39 0.80 1.21 0.00 3.22 0.00 53.92
11:02:07 AM 1 3.02 0.00 6.04 28.77 0.20 0.20 0.00 3.02 0.00 58.75
11:02:07 AM 2 3.56 0.00 7.71 44.27 0.20 0.40 0.00 2.37 0.00 41.50
11:02:07 AM 3 3.81 0.00 5.61 32.46 0.00 0.40 0.00 4.01 0.00 53.71
vmstat 5 output:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 1617404 6484 3541468 0 0 2145 84794 1976 8814 8 8 64 20 0
0 0 0 1619492 6484 3538592 0 0 613 69340 1518 7430 6 7 70 17 0
0 0 0 1618920 6484 3538680 0 0 280 75199 1421 6811 6 7 52 35 0
pidstat -v -p $PID_OF_QEMU 5 output:
11:01:08 AM UID PID threads fd-nr Command
11:02:03 AM 0 13043 67 37 qemu-system-x86
11:02:08 AM 0 13043 67 37 qemu-system-x86
11:02:13 AM 0 13043 67 37 qemu-system-x86
$ sudo x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 1024 -cpu host \
-device virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5 \
-drive file=test.img,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on \
-device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=100 \
-drive file=/dev/path/to/testlv,if=none,id=drive-scsi1,format=raw,cache=none,aio=native,detect-zeroes=on \
-device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1,bootindex=101 \
-nographic
guest# dd if=/dev/zero of=/dev/sdb bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 15.0681 s, 71.3 MB/s
Please be so kind as to use a 6G LVM volume and run "dd if=/dev/zero of=/dev/sdb bs=3G count=2 oflag=direct". Please keep an eye on your processor usage in comparison to the threads created. It's harder to knock down an SSD-backed system than one with spinners.
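For reference, a test volume of that size could be created roughly like this; the volume group and LV names (vg0, testlv) are placeholders:
# on the host, assuming a volume group named vg0
lvcreate -L 6G -n testlv vg0
# attach /dev/vg0/testlv to the guest as in the command line above, with detect-zeroes=on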
After further investigation on IRC the following points were raised:
1. Non-vcpu threads in QEMU weren't being isolated. Libvirt can do this
using the <cputune> domain XML element (see the pinning sketch after
this list). The guest can create a high load if some QEMU threads are
unconstrained.
2. The wait% CPU stat was causing confusion. It's the idle time during
which synchronous I/O is pending. High wait% does not mean that the
system is under high CPU load. detect-zeroes=on can take a
synchronous I/O path even when aio=native is used, and this results
in wait% instead of idle%.
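As a rough illustration of point 1, the vCPUs and the emulator/worker threads can be pinned to separate host CPUs with virsh; this corresponds to the <cputune> element in the domain XML. The domain name (testvm) and the CPU lists are assumptions:
# pin the vCPUs to dedicated host CPUs ...
virsh vcpupin testvm --vcpu 0 --cpulist 2
virsh vcpupin testvm --vcpu 1 --cpulist 3
# ... and keep the emulator/worker threads on other CPUs
virsh emulatorpin testvm --cpulist 0-1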
I'm closing the bug.