graphic: 0.764
assembly: 0.728
instruction: 0.721
network: 0.716
mistranslation: 0.697
device: 0.688
semantic: 0.665
KVM: 0.641
other: 0.621
vnc: 0.616
boot: 0.517
socket: 0.472

qemu nested virtualization is not working with Ubuntu16.04 + Intel CPU

# 1 What am I trying to do? #

I want to use `libvirt` and `qemu/KVM` with **nested virtualization**, as described
in [1] and [2].
**But it does not work with Ubuntu 16.04.** It worked some time ago, but not
anymore.


I want 2 levels of virtualization, like this:

* L0 – the bare metal host, running KVM on `Ubuntu 16.04`
* L1 – an `Ubuntu 16.04` VM running on L0; also called the "guest hypervisor",
  as it is itself capable of running KVM
* L2 – an `Ubuntu 16.04` VM running on L1, also called the "nested guest"


[1] https://docs.fedoraproject.org/en-US/quick-docs/using-nested-virtualization-in-kvm/
[2] https://www.linux-kvm.org/page/Nested_Guests


My goal is to deploy an `OpenStack` environment on top of VMs rather than on
bare metal hosts, for the convenience of a lab experiment. As a result, the
`OpenStack` nodes are L1 VMs. Compute nodes are L1 VMs as well, and the VMs
created with `OpenStack`, which run on the compute nodes, are L2 VMs.






# 2 What is my problem? #

I can **not** run my 2nd level of virtualization on 16.04:

* L0 is just fine: running `Ubuntu 16.04.5 LTS`, installed with the `.iso` image.
* L1: I install `libvirt` + `KVM` on L0. I can run VMs such as the `Ubuntu 16.04`
  cloud image on L0.
* L2: I install `libvirt` + `KVM` on L1 as well. But I **cannot** run VMs on
  L1: I get a `kernel panic` or a `general protection fault`.


**But if I do the same with Ubuntu 18.04** (on the same hardware) instead of
`Ubuntu 16.04`, it works without faults.
I don't change the configuration or the `virt-install` scripts (other than using
the 18.04 .iso and cloud image).






# 3 My libvirt installation for Ubuntu 16.04 #

I install `libvirt` + `KVM` on both L0 and L1 using a custom repository [3] from
the `OpenStack` team, because the version of `libvirt` in this repo is newer than
the one in the Ubuntu 16.04 official repo, and it matches the version of `libvirt`
in Ubuntu 18.04.

[3] https://wiki.ubuntu.com/OpenStack/CloudArchive
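
For anyone reproducing this setup, a minimal sketch of how such an installation
from the Ubuntu Cloud Archive might look (assuming the `queens` pocket, which is
what provides the libvirt 4.0 / QEMU 2.11 versions shown below; package names as
on Ubuntu 16.04):

```
# sketch only: enable the cloud-archive:queens pocket and install the virt stack
sudo apt install software-properties-common
sudo add-apt-repository cloud-archive:queens
sudo apt update
sudo apt install libvirt-bin qemu-kvm virtinst
```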






# 4 hardware and CPU #

CPU:
> Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz

Intel virtualization is enabled in the BIOS/UEFI.
The rest is standard: HDD, standard I/O...






# 5 .iso and cloud image #

I download the .iso for the L0 bare metal server and the cloud images
for the L1/L2 VMs from the official repositories:

Ubuntu 16.04
 * http://releases.ubuntu.com/16.04/
 * https://cloud-images.ubuntu.com/releases/16.04/release/

Ubuntu 18.04
 * http://releases.ubuntu.com/bionic/
 * https://cloud-images.ubuntu.com/releases/18.04/release/
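
For example, fetching the 16.04 cloud image could look like this (the exact
filename is an assumption, it changes with point releases, so check the listing
above first):

```
wget https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.img
```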






# 6 Details #

## Details about L0 Ubuntu 16.04 bare metal host ##
L0 is running `Ubuntu 16.04.5 LTS` installed with the .iso.


**kernel**
```
user@L0:~$ uname -a
Linux L0 4.4.0-137-generic #163-Ubuntu SMP Mon Sep 24 13:14:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
```

**libvirt version** running on L0
```
user@L0:~$ virsh version
Compiled against library: libvirt 4.0.0
Using library: libvirt 4.0.0
Using API: QEMU 4.0.0
Running hypervisor: QEMU 2.11.1
```

**qemu version detail**
```
ukvm2@kvm2:~$ qemu-system-x86_64 --version
QEMU emulator version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.5~cloud0)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
```

**KVM acceleration**
```
user@L0:~$ kvm-ok 
INFO: /dev/kvm exists
KVM acceleration can be used
```

**nested parameter**
```
user@L0:~$ cat /sys/module/kvm_intel/parameters/nested
Y
```
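
For reference, had this reported `N`, nesting could be enabled via a module
option. A minimal sketch, not needed here since it already reports `Y` (the
conf filename is arbitrary):

```
# only needed if the nested parameter reports N
echo "options kvm_intel nested=1" | sudo tee /etc/modprobe.d/kvm-nested.conf
sudo modprobe -r kvm_intel && sudo modprobe kvm_intel
cat /sys/module/kvm_intel/parameters/nested   # should now report Y
```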

**number of CPUs with the virtualization flag (vmx/svm)**
```
user@L0:~$ egrep -c '(vmx|svm)' /proc/cpuinfo
48
```



## Details about an L1 Ubuntu 16.04 VM ##
An L1 VM (running on L0) which runs `Ubuntu 16.04.5 LTS`,
installed from a cloud image.

**kernel**
```
user@L1-VM:~$ uname -a
Linux L1 4.4.0-137-generic #163-Ubuntu SMP Mon Sep 24 13:14:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
```

**libvirt version** running on the L1 VM
```
user@L1-VM:~$ sudo virsh version
Compiled against library: libvirt 4.0.0
Using library: libvirt 4.0.0
Using API: QEMU 4.0.0
Running hypervisor: QEMU 2.11.1
```

**qemu version detail**
```
user@L1-VM:~$ qemu-system-x86_64 --version
QEMU emulator version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.5~cloud0)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
```

**KVM acceleration**
```
user@L1-VM:~$ kvm-ok 
INFO: /dev/kvm exists
KVM acceleration can be used
```

**nested parameter**
```
user@L1-VM:~$ cat /sys/module/kvm_intel/parameters/nested
Y
```

**number of CPUs with the virtualization flag (vmx/svm)**, i.e. the vCPUs given by L0 to the L1 VM;
I give it 20 vCPUs.
```
user@L1-VM:~$ egrep -c '(vmx|svm)' /proc/cpuinfo
20
```



## L1 VM virt-install script parameters ##
If you want to reproduce an L1 VM, I followed [4]:

```
virt-install \
    --connect=qemu:///system \
    --name $VMName \
    --memory $RAM \
    --vcpus $VCPUS \
    --cpu host \
    --metadata description=$DESCRIPTION \
    --os-type linux \
    --os-variant ubuntu16.04 \
    --disk $DISK_PATH/$VMName.$DISK_FORMAT,size=$DISK_SIZE,bus=virtio \
    --disk $CFGIMG_PATH/config_$VMName.$DISK_FORMAT,device=cdrom \
    --network bridge=virbr0 \
    --graphics none \
    --console pty,target_type=serial \
    --hvm
```

[4] https://youth2009.org/post/kvm-with-ubuntu-cloud-image/
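
For completeness, a hypothetical set of values for the placeholders above
(example values only, not necessarily the exact ones used here), so the script
can be run as-is:

```
# hypothetical example values for the variables used in the virt-install call;
# the qcow2 disk and the config image are assumed to have been prepared
# beforehand (e.g. from the cloud image)
VMName=l1-xenial
DESCRIPTION=L1-guest-hypervisor
RAM=16384                              # MiB
VCPUS=20
DISK_PATH=/var/lib/libvirt/images
CFGIMG_PATH=/var/lib/libvirt/images
DISK_FORMAT=qcow2
DISK_SIZE=40                           # GiB
```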



## Details about an L2 VM ##

I want to create an L2 `Ubuntu 16.04.5 LTS` VM, installed from a cloud image,
within my L1 `KVM` VM. But whatever I do, my L2 VM crashes before it finishes
being instantiated. I get a `kernel panic` or a `general protection fault`.


Here is the log of an L2 VM after the instantiation failed:
```
user@L1-VM:~$ less /var/log/libvirt/qemu/VMNAME.log

2018-10-11T07:40:45.837151Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/1 (label charserial0)
2018-10-11T07:40:45.844279Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.invpcid [bit 10]
2018-10-11T07:40:45.848532Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.invpcid [bit 10]
```


If you want to reproduce an L2 VM running on L1, follow [4].


**However**, a CirrOS image can run on an L1 VM!






# 7 Thoughts #
I think this is a bug in either `Ubuntu 16.04` or `libvirt`.
All the information needed to reproduce the bug should be here.


If I do the same with `Ubuntu 18.04`, on the same hardware, following the same
steps but with the Ubuntu 18.04 .iso and cloud image, it works.

It works if:

* L0 = Ubuntu18.04 (.iso) + qemu/KVM
* L1 = Ubuntu18.04 (cloud image) + qemu/KVM
* L2 = Ubuntu18.04 (cloud image)


It also works if:

* L0 = Ubuntu18.04 (.iso) + qemu/KVM
* L1 = Ubuntu18.04 (cloud image) + qemu/KVM
* L2 = Ubuntu16.04 (cloud image)




Thank you for taking the time to read this!
--
nico



[update]
I tested some new combinations, #1 and #2 (see attachment), with
Ubuntu 18.04 and 16.04.

For the moment, and if it fits my needs, I think I will stick to
combination #1 and/or #2.

Hi Nicolas, interesting.

Seeing CPUID.07H:EBX.invpcid makes me wonder: IIRC that was a speedup feature long neglected by everyone, but it suddenly became important in the context of Meltdown avoidance. Maybe it wasn't passed/emulated in the older qemu, but the guest now insists on it or misdetects it?

This would also sort of match your statement "It worked some time ago, but not anymore", as the Meltdown fixes obviously came after the release of 16.04.


Let me try to recreate your initial case first with 16.04 -> 16.04 -> 16.04.

I took a freshly deployed Xenial host and deployed a Xenial guest at lvl1:
$ sudo apt install uvtool-libvirt
$ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=amd64 release=xenial label=daily
$ uvt-kvm create --memory 4096 --disk 30 --cpu 4 --password ubuntu xenial-guest-lvl1 arch=amd64 release=xenial label=daily

And then, in the lvl1 guest, I did the same to spawn a smaller lvl2 guest.
...
# note: back then (16.04) the nested default libvirt network needed to be brought up manually before the next command (see the sketch just below)
$ uvt-kvm create --password ubuntu xenial-guest-lvl2 arch=amd64 release=xenial label=daily
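
(The manual network step mentioned in the note above would presumably be
something along these lines; a sketch, assuming the stock "default" libvirt
network simply was not running in the lvl1 guest:)
$ virsh net-start default
$ virsh net-autostart default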


That guest runs just fine and is happy.
So it has to be part of your guest configuration in some way.
$ cat /proc/cpuinfo | grep invpcid
reports that it is available on the host (lvl0) but on none of the guests (lvl1/lvl2).

I did not see a warning like yours about CPUID.07H:EBX.invpcid (on any of the levels).
My guests are defined in the "most default" way possible, leaving most of the CPU construction to the defaults of libvirt/qemu.
virsh dumpxml content:
lvl1 => http://paste.ubuntu.com/p/fH57d5prmS/
lvl2 => http://paste.ubuntu.com/p/vQbcgfmfVv/

I wonder if your way of setting up the guests uses special CPU types that define the Meltdown-related features, like the -IBRS types, or even adds features like those mentioned in [1].

E.g. virt-manager would default to "Haswell-noTSX-IBRS" on my system with the virt stack of 16.04.

If I use that in my guest definition (on both levels)
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Haswell-noTSX-IBRS</model>
  </cpu>

Now I get invpcid in $ cat /proc/cpuinfo | grep invpcid in the lvl1 guest.
But since this type lacks the KVM features, I'll stop assuming and instead wait for your reply on how the guest CPU is modelled in your case.
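
(In case someone wants to try the same, one way to apply such a CPU model to an
existing guest is roughly the following; a sketch, using the guest name from
above:)
$ virsh edit xenial-guest-lvl1        # paste the <cpu> block shown above into the domain XML
$ virsh destroy xenial-guest-lvl1 && virsh start xenial-guest-lvl1   # restart so the change takes effect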

But in general, in that case, I could see this being potential trouble for (x86) nesting, which is generally known as "working great until it doesn't".

Waiting for your feedback on guest CPU definitions in your case.

[1]: https://www.berrange.com/posts/2018/06/29/cpu-model-configuration-for-qemu-kvm-on-x86-hosts/

Hi Christian,

First, I tried to create an lvl2 VM using your suggestion with `uvtool-libvirt`.
I tried this on one of my lvl1 VMs, which is an OpenStack compute node.

```
compute@L1: $ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=amd64 release=xenial label=daily
compute@L1: $ uvt-kvm create --memory 4096 --disk 30 --cpu 4 --password ubuntu xenial-guest-lvl1 arch=amd64 release=xenial label=daily
```
And this way, it works!
However, if I use the OpenStack API to create an lvl2 VM on this same compute
node, the OpenStack Nova VM fails.
.
.
.
.
To recap the 3 options tested here to create an lvl2 VM:
  1. OpenStack API -> FAIL
     The lvl2 VM gets stuck, for example at:
     "Starting Update UTMP about System Boot/Shutdown..."

  2. virt-install "by hand" -> FAIL
     If I do it like in [1], the VM generates the same error as in my 1st post.

  3. uvtool-libvirt -> SUCCESS
     Your example works just fine.

[1] https://youth2009.org/post/kvm-with-ubuntu-cloud-image/
.
.
.
.
You say:
  > That guest runs just fine and is happy.
  > So it has to be part of your guest configuration in some way.

I agree: maybe I should look further into the options of `virt-install`.
What I gave in my first post (the virt-install script) is my way of creating
lvl1 VMs.

    NB: It seems that, for the moment, --os-variant has no `ubuntu18.04` value.
        I keep this parameter set to ubuntu16.04, even when I want to create an
        18.04 VM.

The difference between Ubuntu 16.04 and 18.04 regarding `virt-install`:
  * 16.04: virt-install --version is 1.3.2
  * 18.04: virt-install --version is 1.5.1

So maybe the problem comes from `virt-install` and the way I configure a VM.
.
.
.
.
However, when going through the OpenStack API, I am not the one who provides
the guest configuration. I give the OpenStack API the info it needs to
create a new OpenStack instance (i.e. flavor, image type, cloud-init config,
etc.) and then the API ~converts~ this description to instantiate the
OpenStack instance on the compute node, which is running qemu/KVM.

I am not sure what the OpenStack API uses to do that. I assume it uses
python-libvirt [2], but I may be wrong.

[2] https://libvirt.org/docs/libvirt-appdev-guide-python/en-US/html/libvirt_application_development_guide_using_python-Guest_Domains-Lifecycle_Control.html#libvirt_application_development_guide_using_python-Guest_Domains-Lifecycle-Provisioning_and_Starting
.
.
.
.
You say:
  > I wonder if your way to setup the guests uses special CPU types [...]
  > 
  > Waiting for your feedback on guest CPU definitions in your case.

My CPU is an Intel Xeon Broadwell.

On lvl0, which has 48 cores:
```
baremetal@L0:cat /proc/cpuinfo | grep invpcid | wc -l
48
```

On lvl1, which is a VM with 20 vCPUs:
```
compute@L1:$ cat /proc/cpuinfo | grep invpcid | wc -l
20
```
.
.
.
.
Dumpxml of a working lvl1 VM "compute41":
https://paste.ubuntu.com/p/KMrCKGgvRg/

Dumpxml of a failing lvl2 VM created by the OpenStack/Nova API on "compute41":
https://paste.ubuntu.com/p/9FrhMWWgVk/

Dumpxml of a working lvl2 VM created by uvtool-libvirt:
https://paste.ubuntu.com/p/4CztPDW7fM/
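
For anyone following along, a quick way to compare the failing and the working
lvl2 definitions could be something like this (the domain names are
placeholders):

```
compute@L1:$ virsh dumpxml <nova-lvl2-domain>   > nova-lvl2.xml
compute@L1:$ virsh dumpxml <uvtool-lvl2-domain> > uvtool-lvl2.xml
compute@L1:$ diff -u nova-lvl2.xml uvtool-lvl2.xml
```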

One difference I see with your dumpxml is in the os part:
  machine='pc-i440fx-bionic' for me
  machine='pc-i440fx-xenial' for you

Maybe this is due to the way I install qemu with cloud-archive:queens [3].

[3] https://wiki.ubuntu.com/OpenStack/CloudArchive

.
.
.
.
qemu logs for the lvl2 VM created by uvtool-libvirt:
```
compute@L1:$ cat /var/log/libvirt/qemu/xenial-guest-lvl2.log

[...]
2018-10-12T14:28:03.317760Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
```

If I missed something, let me know!
--
Nicolas

Regarding my previous comment: I am in the "16.04 > 16.04 > 16.04" situation.
I mention this especially for my 3 dumpxml files.

Ok,
at the lvl1 definition OpenStack came up with its own CPU modelling, which in this case is actually:
  cpu mode='host-passthrough'
  + a bunch of required features
That is what gives your lvl1 the invpcid feature (so far so good).

At lvl2 we have
Nova:
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
    <topology sockets='1' cores='1' threads='1'/>
  </cpu>
vs uvtool
 <!-- has no definition, keeping defaults -->

Thanks for the data, Nicolas!
With that in mind I have set my LVL1 to run the same host-passthrough config that you reported.
Then I configured LVL2 to run the same host-model config.
Note: "my 16.04" would not allow "check='partial'", so I dropped it.
What version of libvirt is running in your lvl1 (or all levels)?
The current one here is 1.3.1-1ubuntu10.24.

I was glad to see that the uvtool-style guests work for you, as I assumed they would.
But even when using the same CPU definitions in my case, it works for me.
That is for "16.04 > 16.04 > 16.04" as well.

x86 nested virt is never really supported, just "as good as it happens to work". I wonder if that is one of those cases.
My chip is a somewhat older 12 core "Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz".

We have plenty of SW workarounds already, but if you can spend the time, I wonder if you could:
- reproduce the same on a different host CPU
- to confirm our current theory that the usage/emulation/nesting of invpcid is the root cause: on the failing case, add <feature policy='disable' name='invpcid'/> to the cpu section of the LVL2 definition. That would keep the rest as-is, but remove that feature (see the sketch just below).
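
A sketch of what the resulting cpu section of the Nova-created LVL2 guest would
look like (based on the host-model definition quoted above, with only the
invpcid line added):
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
    <topology sockets='1' cores='1' threads='1'/>
    <feature policy='disable' name='invpcid'/>
  </cpu>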


NB reply: there was no --os-variant for 18.04 released back then, but since there has been no change compared to former releases it doesn't matter; the only drawback is that people sometimes wonder if it is missing.

You say:
  > What version of libvirt is running in your lvl1 (or all levels)?

cf. my 1st post and below:

On my Ubuntu16.04 bare metal host:
    ```
    libvirt-bin      4.0.0-1ubuntu8.5~cloud0
    qemu-system-x86  1:2.11+dfsg-1ubuntu7.6~cloud0
    ```

On one Ubuntu16.04 lvl1 VM:
    ```
    libvirt-bin      4.0.0-1ubuntu8.3~cloud0
    qemu-system-x86  1:2.11+dfsg-1ubuntu7.4~cloud0
    ```
.
.
.
.
You say:
  > - [can you] reproduce the same on a different host CPU

If I can, I will try, on a much 'smaller' device (not a Xeon CPU).
Maybe tomorrow.
.
.
.
.
You say:
  > To confirm our current theory that the usage/emulation/nesting of invpcid 
  is the root cause [...]

Dumpxml of my lvl2 VM with "<feature policy='disable' name='invpcid'/>":
https://paste.ubuntu.com/p/WxvfBcHnF2/

And here is the boot log of the lvl2 VM with invpcid disabled
(the VM failed to boot; maybe I missed something):
https://paste.ubuntu.com/p/bkqDsT8VTy/

I followed [1]:
[1] https://youth2009.org/post/kvm-with-ubuntu-cloud-image/
.
.
.
.
you say:
  > NB-reply: there was no --os-variant 18.04 released back then

No big deal, it is just frustrating to instantiate an Ubuntu 18.04 VM and to have
to set this parameter to something else!
.
.
.
.
You say:
  > x86 nested virt is never really supported, just "as good as it happens to work". I wonder if that is one of those cases.

At first I thought nested virt could make my life easier regarding what I wanted
to achieve. There are several ways of testing and deploying OpenStack.
With the way I chose, I could 'simulate' a multi-host environment like in [2],
with several compute nodes, etc...

[2] https://github.com/nuagenetworks/nuage-openstack-ansible/wiki/Configure-OSA-Multi-node-Environment

But now I understand that nested virt is maybe still too much of a beta.
I will try with "Ubuntu 18.04 > Ubuntu 16/18 > {whatever OS}".
Maybe this is patched in Ubuntu 18.04.
And if it does not suit my needs, I will figure out something else (and more
bare metal ^^).

Metal as a Service (Ubuntu MAAS) looks good, but it is too much for me.
.
.
.
.
Thank you for your help. I think this bug is hard to identify and maybe harder to
patch. And I am not a virtualization, qemu or OpenStack expert
(for the moment!?), so I can't help you more.

[Expired for qemu (Ubuntu) because there has been no activity for 60 days.]

FWIW, bumping the kernel on the host (and most likely on the L1 VMs too) should work.
The HWE kernel in Xenial is the same version (4.15) as the kernel used by Bionic (18.04), so this should fix the problem:
$ apt install linux-generic-hwe-16.04
$ reboot
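
After the reboot, a quick sanity check that the HWE kernel is actually in use
(on L0 and on the L1 VMs):
$ uname -r    # should report a 4.15.x HWE kernel instead of 4.4.0-x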

BR,
Alex

Thank you for your suggestion @Alexandru !
.
.
.
I cannot try this fix because I have since moved on: I now use Ubuntu 18.04 for my L0 hypervisor, and I have also tried Ubuntu 18.04 on the L1 VMs.
.
.
.
However, this is very interesting. On my previous Ubuntu 16.04 hosts, I believe I used "linux-image-4.4.0-XYZ-generic".

I think I have encountered exactly the same issue as you, Nico. We have very similar setups.
Upgrading the kernel did not help.
The only thing that helped was setting OpenStack to use qemu instead of kvm for the L2 VMs, with the performance cost associated with doing that :( (a sketch of that setting is below).
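
For reference, the setting meant here is the libvirt virt_type option in
nova.conf on the compute node (the L1 VM). A minimal sketch; the exact file path
and section layout depend on the deployment:

```
# /etc/nova/nova.conf on the L1 compute node (path may differ per deployment)
[libvirt]
virt_type = qemu    # plain emulation instead of nested KVM; slower, but avoids the L2 crashes
```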