permissions: 0.957
risc-v: 0.954
peripherals: 0.945
mistranslation: 0.942
vnc: 0.942
assembly: 0.941
device: 0.935
register: 0.935
architecture: 0.933
PID: 0.930
semantic: 0.923
user-level: 0.922
debug: 0.919
files: 0.917
boot: 0.905
kernel: 0.902
ppc: 0.900
VMM: 0.900
arm: 0.896
socket: 0.893
graphic: 0.891
KVM: 0.888
network: 0.880
virtual: 0.867
performance: 0.837
TCG: 0.802
hypervisor: 0.789
x86: 0.709
i386: 0.591

qemu-kvm 2.7 does not boot kvm VMs with virtio on top of VMware ESX

After today's Proxmox update, all my Linux VMs stopped booting.

# How to reproduce
- Have KVM on top of VMware ESX (I use VMware ESX 6)
- Boot Linux VM with virtio Disk drive.


# Result
virtio based VMs do not boot anymore:

root@demotuxdc:/etc/pve/nodes/demotuxdc/qemu-server# grep virtio0 100.conf 
bootdisk: virtio0
virtio0: pvestorage:100/vm-100-disk-1.raw,discard=on,size=20G

(initially with cache=writethrough, but that doesn't matter)

What happens instead is:

- BIOS displays "Booting from harddisk..."
- kvm process of VM loops at about 140% of Intel(R) Core(TM) i5-6260U CPU @ 1.80GHz Skylake dual core CPU

Disk of course has valid bootsector:

root@demotuxdc:/srv/pvestorage/images/100# file -sk vm-100-disk-1.raw 
vm-100-disk-1.raw: DOS/MBR boot sector DOS/MBR boot sector DOS executable (COM), boot code
root@demotuxdc:/srv/pvestorage/images/100# head -c 2048 vm-100-disk-1.raw | hd | grep GRUB
00000170  be 94 7d e8 2e 00 cd 18  eb fe 47 52 55 42 20 00  |..}.......GRUB .|
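
The boot sector check above can also be done on the signature bytes themselves. The following is a minimal sketch, assuming a POSIX shell and `od`; the function name `has_mbr_signature` is made up for illustration and is not part of any tool mentioned in this report. It only tests for the 0x55 0xAA marker at offset 510, not for a working boot loader.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): succeed if the file or device
# given as $1 carries the MBR boot signature 0x55 0xAA at bytes 510-511.
has_mbr_signature() {
    # Dump exactly two bytes starting at offset 510 as hex, without addresses,
    # then strip the spaces and newline that od adds around them.
    sig=$(od -An -tx1 -j510 -N2 "$1" 2>/dev/null | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
```

For example, `has_mbr_signature vm-100-disk-1.raw && echo "boot signature present"` would be one way to call it on the image shown above.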


# Workaround 1
- Change disk from virtio0 to scsi0
- Debian boots out of the box after this change
- SLES 12 needs a rebuilt initrd
- CentOS 7 too, but it seems that is not even enough and it still fails (even in hostonly="no" mode for dracut)


# Workaround 2
Downgrade pve-qemu-kvm 2.7.0-3 to 2.6.2-2.


# Expected results
Disk boots just fine via virtio like it did before.


# Downstream bug report
Downstream suggests an issue with upstream qemu-kvm:

https://bugzilla.proxmox.com/show_bug.cgi?id=1181



I traced this back to the switch to enabling virtio-1 mode by default in 2.7 in commit 9a4c0e220d8a4f82b5665d0ee95ef94d8e1509d5

forcing the old behaviour with a 2.6 machine type works.

I confirm that "qm set ID -machine pc-i440fx-2.6" on the machine in question lets it boot as a virtio machine again with Qemu 2.7.

Adding Gerd, Marcel, and Kevin

On 10/28/16 10:23, Fabian Grünbichler wrote:
> I traced this back to the switch to enabling virtio-1 mode by default in
> 2.7 in commit 9a4c0e220d8a4f82b5665d0ee95ef94d8e1509d5
> 
> forcing the old behaviour with a 2.6 machine type works.

I think this issue is a duplicate of the following RHBZ:

https://bugzilla.redhat.com/show_bug.cgi?id=1373154

and the *SeaBIOS* commit that makes it all work again is:

commit 0e21548b15e25e010c362ea13d170c61a3fcc899
Author: Gerd Hoffmann <email address hidden>
Date:   Fri Jul 3 11:07:05 2015 +0200

    virtio: pci cfg access

That SeaBIOS commit is part of the SeaBIOS 1.10.0 release.

However, QEMU 2.7 shipped with bundled SeaBIOS 1.9.3 binaries. See QEMU
commits 6e03a28e1cee (part of v2.7.0) and 6e99f5741ff1 (not part of any
tagged release yet).

The fix is probably the following:
- backport SeaBIOS commit 0e21548b15e2 to the stable 1.9 branch, for
  release 1.9.4
- bundle SeaBIOS 1.9.4 binaries with QEMU v2.7.1.

Thanks
Laszlo


Unfortunately, cherry-picking the SeaBIOS 1.10 binary update commit from qemu master (6e99f5741ff1) on top of v2.7.0 does not solve the issue (the only observable change is the version string that is displayed on booting, right when it hangs ;)).

I can still give your suggested route a try if you think it is worth it, but since the 1.10 release contains your suggested commit, I doubt it will change anything.

> However, QEMU 2.7 shipped with bundled SeaBIOS 1.9.3 binaries. See QEMU
> commits 6e03a28e1cee (part of v2.7.0) and 6e99f5741ff1 (not part of any
> tagged release yet).
> 
> The fix is probably the following:
> - backport SeaBIOS commit 0e21548b15e2 to the stable 1.9 branch, for
>   release 1.9.4
> - bundle SeaBIOS 1.9.4 binaries with QEMU v2.7.1.

I'd rather cherry-pick 6e99f5741ff1 into 2.7.1 ...

cheers,
  Gerd


This problem still exists as of now on Debian sid. Qemu version is "QEMU emulator version 2.10.1(Debian 1:2.10.0+dfsg-2)".


With "-machine type=pc-i440fx-x" where x > 2.6, every VM gets stuck at boot if the disk interface is virtio.

I use nested virtualization where the first level is VMware Fusion (which might not behave the same as ESX), and the second is qemu-kvm.


Hi,

I have exactly the same problem.

My stack:
- macOS Sierra 10.12.6
- VMware Fusion 10.1.1 (tried with 10.0.1 too)
- Linux 4.9.78 (tried with 4.9.65 too)
- Qemu 2.11.0 (tried with 2.10.1 too)

All is working great with i440fx (or q35) <= 2.6 but it doesn't boot on >= 2.7 and QEMU takes all the CPU.

It doesn't boot with the disk in virtio or virtio-scsi mode, but it does boot in scsi mode.

Exactly the same configuration on a baremetal (so no macOS and VMware) works great.

So I assume it's a bug with VMware and virtio.

We still hit a similar issue on CentOS 7 (qemu 2.9.0-16.el7_4.5.1 + libvirt 3.2.0-14.el7_4.3).

Any one of the following works around it for me:
a) run without KVM acceleration
or 
b) as comment #7 said, use "-machine type=pc-i440fx-x" where x <= 2.6
or
c) set "disable-modern=on" on the virtio PCI device

I found that the function _farcall16 in SeaBIOS is invoked (https://github.com/coreboot/seabios/blob/af0daeb2687ad2595482b8a71b02a082a5672ceb/src/stacks.c#L418)
and fails when the guest hangs at 'Booting from hard disk'.

The call sequence (in SeaBIOS rel-1.11.0-5-g14d91c3) is:
src/boot.c line 614, call_boot_entry->
src/stacks.c line 427, farcall16->
src/stacks.c line 411, _farcall16


However, the issue behaves differently on our two clusters.

For convenience I call them cluster A (6.0.0.0 3029758, E5 2640 v2, System x3650 M4) and cluster B (6.0.0.0 3029758, E5 2620 v4, System x3650 M5).

The issue occurs on cluster B but cannot be reproduced on cluster A (same qemu/libvirt/esxi).

my command:
/usr/libexec/qemu-kvm  -machine pc-i440fx-rhel7.3.0 \
-m 256 -drive file=centos.qcow2,if=none,id=drive-virtio-disk0 \
-device virtio-blk-pci,disable-modern=on,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0 \
-vnc :1 -chardev stdio,id=seabios -device isa-debugcon,iobase=0x402,chardev=seabios

I hope the above info helps fix the bug.


I'd like to share some other available workarounds:

1. Use "-machine type=pc-i440fx-x" where x <= 2.6
2. Add "disable-modern=on" option for all virtio block devices
3. Add "-global virtio-blk-pci.disable-modern=on"
4. Use software acceleration for virtual appliances ("-machine accel=tcg")

Hi Mykola. Thank you for this information. Any idea about the advantages / disadvantages of the different workarounds? Thanks.

Well, it turned out that setting "disable-modern=on" for virtio-blk-pci devices is not enough. You have to do the same for virtio-scsi-pci, and maybe for any other virtio devices you are using, to disable virtio 1.0. But the VM will still hang later in the boot process if you use virtio-rng-pci.

"-machine accel=tcg" will work, but at the cost of a performance penalty due to software virtualization.

So I found "-machine type=pc-i440fx-x" where x <= 2.6 to be the only reliable workaround.
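
To make the workarounds from this thread easier to compare, here is a minimal sketch that maps each one to the extra QEMU arguments quoted above. The function name `workaround_args` and the keys `machine`, `legacy-blk` and `tcg` are made up for illustration; only the argument strings themselves come from the comments in this report, with 2.6 picked as one value satisfying x <= 2.6.

```shell
#!/bin/sh
# Hypothetical helper (names are assumptions): print the extra QEMU
# arguments for one of the workarounds described in this thread.
workaround_args() {
    case "$1" in
        # Pin an old machine type so virtio-1 mode is not enabled by default.
        machine)    printf '%s\n' '-machine type=pc-i440fx-2.6' ;;
        # Disable virtio 1.0 for all virtio-blk-pci devices only; other
        # virtio device types were reported to need the same treatment.
        legacy-blk) printf '%s\n' '-global virtio-blk-pci.disable-modern=on' ;;
        # Fall back to software emulation, at a performance cost.
        tcg)        printf '%s\n' '-machine accel=tcg' ;;
        *)          echo "unknown workaround: $1" >&2; return 1 ;;
    esac
}
```

For example, `workaround_args machine` prints `-machine type=pc-i440fx-2.6`, which can then be appended to an existing qemu-kvm command line.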

This is a KVM bug.  It has been fixed in mainline Linux in

commit d391f1207067268261add0485f0f34503539c5b0
Author: Vitaly Kuznetsov <email address hidden>
Date:   Thu Jan 25 16:37:07 2018 +0100

    x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
    
    I was investigating an issue with seabios >= 1.10 which stopped working
    for nested KVM on Hyper-V. The problem appears to be in
    handle_ept_violation() function: when we do fast mmio we need to skip
    the instruction so we do kvm_skip_emulated_instruction(). This, however,
    depends on VM_EXIT_INSTRUCTION_LEN field being set correctly in VMCS.
    However, this is not the case.
    
    Intel's manual doesn't mandate VM_EXIT_INSTRUCTION_LEN to be set when
    EPT MISCONFIG occurs. While on real hardware it was observed to be set,
    some hypervisors follow the spec and don't set it; we end up advancing
    IP with some random value.
    
    I checked with Microsoft and they confirmed they don't fill
    VM_EXIT_INSTRUCTION_LEN on EPT MISCONFIG.
    
    Fix the issue by doing instruction skip through emulator when running
    nested.
    
    Fixes: 68c3b4d1676d870f0453c31d5a52e7e65c7448ae
    Suggested-by: Radim Krčmář <email address hidden>
    Suggested-by: Paolo Bonzini <email address hidden>
    Signed-off-by: Vitaly Kuznetsov <email address hidden>
    Acked-by: Michael S. Tsirkin <email address hidden>
    Signed-off-by: Radim Krčmář <email address hidden>


Although the commit mentions Hyper-V as L0 hypervisor, the same problem
pertains to ESXi.

The commit is included in v4.16.
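
Since the fix is tied to a kernel release, a quick version check on the KVM host (the Linux system running inside ESXi or Fusion) can hint at whether the workarounds are still needed. This is a minimal sketch, assuming plain mainline version numbering; the function name `kernel_has_fix` is made up, and a distribution kernel older than 4.16 may still carry a backport, so a negative result is only a hint.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): succeed if the kernel version
# string in $1 is 4.16 or newer, the release said above to contain the fix.
# Assumption: backports into older distribution kernels are NOT detected.
kernel_has_fix() {
    # Keep only the leading dotted number, e.g. "4.9.78" from "4.9.78-amd64".
    ver=$(printf '%s\n' "$1" | sed 's/[^0-9.].*$//')
    major=${ver%%.*}
    rest=${ver#*.}
    minor=${rest%%.*}
    # Newer major version, or major 4 with minor at least 16.
    [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 16 ]; }
}
```

It could be used as `kernel_has_fix "$(uname -r)" || echo "kernel predates v4.16, workaround may be needed"`.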

That is great news. Thanks for sharing!

Marking as fixed, according to comment #13