graphic: 0.764
hypervisor: 0.760
user-level: 0.753
architecture: 0.750
assembly: 0.728
kernel: 0.718
network: 0.716
permissions: 0.714
PID: 0.703
virtual: 0.700
mistranslation: 0.697
peripherals: 0.691
device: 0.688
register: 0.685
x86: 0.683
risc-v: 0.682
semantic: 0.665
performance: 0.651
KVM: 0.641
arm: 0.634
TCG: 0.618
vnc: 0.616
VMM: 0.613
files: 0.601
debug: 0.574
ppc: 0.566
boot: 0.517
socket: 0.472
i386: 0.307

qemu nested virtualization is not working with Ubuntu 16.04 + Intel CPU

# 1 What am I trying to do ? #

I want to use `libvirt` `qemu/KVM` with **nested virtualization**, as described
in [1] and [2].
**But it does not work with Ubuntu 16.04.** It worked some time ago, but not
anymore.


I want 2 levels of virtualization, like this:

* L0 – the bare metal host, running KVM on `Ubuntu 16.04`
* L1 – an `Ubuntu 16.04` VM running on L0; also called the "guest hypervisor",
  as it is itself capable of running KVM
* L2 – an `Ubuntu 16.04` VM running on L1, also called the "nested guest"


[1] https://docs.fedoraproject.org/en-US/quick-docs/using-nested-virtualization-in-kvm/
[2] https://www.linux-kvm.org/page/Nested_Guests


My goal is to deploy an `OpenStack` environment on top of VMs rather than on
bare metal hosts, for convenience in a lab experiment. As a result, the
`OpenStack` nodes are L1 VMs. The compute nodes are L1 VMs as well, and the VMs
created with `OpenStack`, which run on the compute nodes, are L2 VMs.




# 2 What is my problem ? #

I can **not** run my 2nd level of virtualization on 16.04:

* L0 is just fine: running `Ubuntu 16.04.5 LTS`, installed from the `.iso` image.
* L1: I install `libvirt` + `KVM` on L0. I can run VMs such as the `Ubuntu 16.04`
  cloud image on L0.
* L2: I install `libvirt` + `KVM` on L1 as well. But I **can not** run VMs on
  L1: I get a `kernel panic` or a `general protection fault`.


**But if I do the same with Ubuntu 18.04** (on the same hardware) instead of
`Ubuntu 16.04`, it works without faults.
I don't change the configuration or the `virt-install` scripts (other than using
the 18.04 .iso and cloud image).




# 3 My libvirt installation for Ubuntu 16.04 #

I install `libvirt` + `KVM` on both L0 and L1 using a custom repository [3] from
the `OpenStack` team, because the version of libvirt in this repo is newer than
the one in the official Ubuntu 16.04 repo and it matches the version of `libvirt`
in Ubuntu 18.04.

[3] https://wiki.ubuntu.com/OpenStack/CloudArchive




# 4 Hardware and CPU #

The CPU is:
> Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
> Intel virtualization is enabled in the BIOS/UEFI.

The rest is a standard HDD, standard I/O...




# 5 .iso and cloud image #

I download the .iso for the L0 bare metal server and the cloud images
for the L1/L2 VMs from the official repositories:

Ubuntu 16.04
 * http://releases.ubuntu.com/16.04/
 * https://cloud-images.ubuntu.com/releases/16.04/release/

Ubuntu 18.04
 * http://releases.ubuntu.com/bionic/
 * https://cloud-images.ubuntu.com/releases/18.04/release/




# 6 Details #

## Details about the L0 Ubuntu 16.04 bare metal host ##
L0 is running `Ubuntu 16.04.5 LTS`, installed from the .iso.
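For anyone reproducing this setup (this note is not in the original report): nested VMX has to be enabled on both L0 and L1. A minimal sketch of how the `nested` parameter is typically made persistent on an Intel host, assuming no VM is using KVM while the module is reloaded:

```
echo "options kvm_intel nested=1" | sudo tee /etc/modprobe.d/kvm-nested.conf
sudo modprobe -r kvm_intel      # fails if a VM is still using KVM
sudo modprobe kvm_intel
cat /sys/module/kvm_intel/parameters/nested   # should now print Y
```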
**kernel**
```
user@L0:~$ uname -a
Linux L0 4.4.0-137-generic #163-Ubuntu SMP Mon Sep 24 13:14:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
```

**libvirt version** running on L0
```
user@L0:~$ virsh version
Compiled against library: libvirt 4.0.0
Using library: libvirt 4.0.0
Using API: QEMU 4.0.0
Running hypervisor: QEMU 2.11.1
```

**qemu version detail**
```
ukvm2@kvm2:~$ qemu-system-x86_64 --version
QEMU emulator version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.5~cloud0)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
```

**KVM acceleration**
```
user@L0:~$ kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used
```

**nested parameter**
```
user@L0:~$ cat /sys/module/kvm_intel/parameters/nested
Y
```

**number of CPUs**
```
user@L0:~$ egrep -c '(vmx|svm)' /proc/cpuinfo
48
```



## Details about an L1 Ubuntu 16.04 VM ##
An L1 VM (running on L0) which runs `Ubuntu 16.04.5 LTS`,
installed from a cloud image.

**kernel**
```
user@L1-VM:~$ uname -a
Linux L1 4.4.0-137-generic #163-Ubuntu SMP Mon Sep 24 13:14:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
```

**libvirt version** running on the L1 VM
```
user@L1-VM:~$ sudo virsh version
Compiled against library: libvirt 4.0.0
Using library: libvirt 4.0.0
Using API: QEMU 4.0.0
Running hypervisor: QEMU 2.11.1
```

**qemu version detail**
```
user@L1-VM:~$ qemu-system-x86_64 --version
QEMU emulator version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.5~cloud0)
Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers
```

**KVM acceleration**
```
user@L1-VM:~$ kvm-ok
INFO: /dev/kvm exists
KVM acceleration can be used
```

**nested parameter**
```
user@L1-VM:~$ cat /sys/module/kvm_intel/parameters/nested
Y
```

**number of CPUs**, i.e. the vCPUs given by L0 to the L1 VM.
I give it 20 vCPUs.
```
user@L1-VM:~$ egrep -c '(vmx|svm)' /proc/cpuinfo
20
```



## L1 VM virt-install script parameters ##
If you want to reproduce an L1 VM: I followed [4] and use this script:

```
virt-install \
    --connect=qemu:///system \
    --name $VMName \
    --memory $RAM \
    --vcpus $VCPUS \
    --cpu host \
    --metadata description=$DESCRIPTION \
    --os-type linux \
    --os-variant ubuntu16.04 \
    --disk $DISK_PATH/$VMName.$DISK_FORMAT,size=$DISK_SIZE,bus=virtio \
    --disk $CFGIMG_PATH/config_$VMName.$DISK_FORMAT,device=cdrom \
    --network bridge=virbr0 \
    --graphics none \
    --console pty,target_type=serial \
    --hvm
```

[4] https://youth2009.org/post/kvm-with-ubuntu-cloud-image/



## Details about an L2 VM ##

I want to create an L2 `Ubuntu 16.04.5 LTS` VM, installed from a cloud image,
inside my L1 `KVM` VM. But whatever I do, the L2 VM crashes before it finishes
being instantiated. I get a `kernel panic` or a `general protection fault`.


Here is the log of an L2 VM after the instantiation failed:
```
user@L1-VM:~$ less /var/log/libvirt/qemu/VMNAME.log

2018-10-11T07:40:45.837151Z qemu-system-x86_64: -chardev pty,id=charserial0: char device redirected to /dev/pts/1 (label charserial0)
2018-10-11T07:40:45.844279Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.invpcid [bit 10]
2018-10-11T07:40:45.848532Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.07H:EBX.invpcid [bit 10]
```


If you want to reproduce an L2 VM running on L1, follow [4] as well.


**However**, a Cirros OS image does run on an L1 VM!




# 7 Thoughts #
I think this is a bug in either `Ubuntu 16.04` or `libvirt`.
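One extra data point that would help pin this down (this sketch is not from the original report; `VMNAME` is a placeholder): watch the failing L2 guest's serial console from inside L1, so the actual kernel panic / general protection fault text is captured as it happens.

```
user@L1-VM:~$ virsh list --all                  # find the failing L2 domain
user@L1-VM:~$ virsh start VMNAME --console      # boot it with the serial console attached
# or, if it is already running:
user@L1-VM:~$ virsh console VMNAME
```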
All the information needed to reproduce the bug should be here, I think.


If I do the same with `Ubuntu 18.04`, on the same hardware, following the same
steps but with the Ubuntu 18.04 .iso and cloud image, it works.

It works if:

* L0 = Ubuntu 18.04 (.iso) + qemu/KVM
* L1 = Ubuntu 18.04 (cloud image) + qemu/KVM
* L2 = Ubuntu 18.04 (cloud image)


It also works if:

* L0 = Ubuntu 18.04 (.iso) + qemu/KVM
* L1 = Ubuntu 18.04 (cloud image) + qemu/KVM
* L2 = Ubuntu 16.04 (cloud image)




Thank you for taking the time to read this!
--
nico



[update]
I tested two new combinations, #1 and #2 (see the attachment), with
Ubuntu 18.04 and 16.04.

For the moment, and as long as it fits my needs, I will stick to
combination #1 and/or #2.

Hi Nicolas, interesting.

Seeing CPUID.07H:EBX.invpcid makes me wonder - IIRC that was a speedup feature long neglected by everyone, but it suddenly became important in the context of Meltdown avoidance. Maybe it wasn't passed/emulated by the older qemu, but the guest now insists on it or misdetects it?

This would also sort of match your statement "It worked some time ago, but not anymore", as the Meltdown fixes obviously came after the release of 16.04.


Let me try to recreate your initial case first with 16.04 -> 16.04 -> 16.04.

I took a freshly deployed Xenial host and deployed a Xenial guest at lvl1:
$ sudo apt install uvtool-libvirt
$ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=amd64 release=xenial label=daily
$ uvt-kvm create --memory 4096 --disk 30 --cpu 4 --password ubuntu xenial-guest-lvl1 arch=amd64 release=xenial label=daily

And then in the lvl1 guest I did the same to spawn a smaller lvl2 guest:
...
# note: back then (16.04) the nested default libvirt network needed to be brought up manually before the next command
$ uvt-kvm create --password ubuntu xenial-guest-lvl2 arch=amd64 release=xenial label=daily


That guest runs just fine and is happy.
So it has to be part of your guest configuration in some way.
$ cat /proc/cpuinfo | grep invpcid
reports it as available on the host (lvl0) but on neither of the guests (lvl1/lvl2).

I did not see a warning like yours about CPUID.07H:EBX.invpcid on any of the levels.
My guests are defined in the "most default" way possible, leaving most of the CPU construction to the defaults of libvirt/qemu.
virsh dumpxml content:
lvl1 => http://paste.ubuntu.com/p/fH57d5prmS/
lvl2 => http://paste.ubuntu.com/p/vQbcgfmfVv/

I wonder if your way of setting up the guests uses special CPU types that define the Meltdown-related features - like the -IBRS types - or even adds features like those mentioned in [1].

E.g. virt-manager would default to "Haswell-noTSX-IBRS" on my system with the virt stack of 16.04.

If I use that in my guest definition (on both levels):
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Haswell-noTSX-IBRS</model>
  </cpu>

Now I do get invpcid from $ cat /proc/cpuinfo | grep invpcid in the lvl1 guest.
But since this type lacks the KVM features, I'll stop guessing and wait for your reply on how the guest CPU is modelled in your case.

In general, though, I could see this being a potential source of trouble for (x86) nesting, which is generally known for "working great until it doesn't".

Waiting for your feedback on guest CPU definitions in your case.

[1]: https://www.berrange.com/posts/2018/06/29/cpu-model-configuration-for-qemu-kvm-on-x86-hosts/
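For reference (not part of the original exchange): the information being asked for here can be pulled out of each level's domain definition and from inside each guest with something like the following sketch, where `GUEST` is a placeholder domain name.

```
# CPU section of a guest definition, as libvirt sees it
virsh dumpxml GUEST | sed -n '/<cpu/,/<\/cpu>/p'

# inside L0, L1 and L2: is the suspect flag actually exposed?
grep -c invpcid /proc/cpuinfo
```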
Hi Christian,

First, I tried to create a lvl2 VM using your suggestion with `uvtool-libvirt`.
I tried this on one of my lvl1 VMs, which is an OpenStack compute node.

```
compute@L1: $ uvt-simplestreams-libvirt --verbose sync --source http://cloud-images.ubuntu.com/daily arch=amd64 release=xenial label=daily
compute@L1: $ uvt-kvm create --memory 4096 --disk 30 --cpu 4 --password ubuntu xenial-guest-lvl1 arch=amd64 release=xenial label=daily
```
And this way, it works!
However, if I use the OpenStack API to create a lvl2 VM on this same compute
node, the OpenStack Nova VM fails.
.
.
.
.
To recap the 3 options tested here to create a lvl2 VM:
 1. OpenStack API -> FAIL
    The lvl2 VM is stuck, for example at:
    "Starting Update UTMP about System Boot/Shutdown..."

 2. virt-install "by hand" -> FAIL
    If I do it like in [1], the VM generates the same error as in my 1st post.

 3. uvtool-libvirt -> SUCCESS
    Your example works just fine.

[1] https://youth2009.org/post/kvm-with-ubuntu-cloud-image/
.
.
.
.
You say:
 > That guest runs just fine and is happy.
 > So it has to be part of your guest configuration in some way.

I agree: maybe I should look further into the options of `virt-install`.
What I gave in my first post (the virt-install script) is my way of creating
a lvl1 VM.

 NB: It seems that, for the moment, --os-variant has no `ubuntu18.04` value.
 I keep this parameter at ubuntu16.04, even when I want to create an
 18.04 VM.

The difference between Ubuntu 16.04 and 18.04 regarding `virt-install`:
 * 16.04: virt-install --version is 1.3.2
 * 18.04: virt-install --version is 1.5.1

So maybe the problem comes from `virt-install` and the way I configure a VM.
.
.
.
.
With the OpenStack API, however, I am not the one who provides the guest
configuration. I give the OpenStack API the info it needs to create a new
OpenStack instance (i.e. flavor, image type, cloud-init config, etc...) and
then the API ~converts~ this description to instantiate the OpenStack instance
on the compute node, which is running qemu/KVM.

I am not sure what the OpenStack API uses to do that. I assume it uses
python-libvirt [2], but I may be wrong.

[2] https://libvirt.org/docs/libvirt-appdev-guide-python/en-US/html/libvirt_application_development_guide_using_python-Guest_Domains-Lifecycle_Control.html#libvirt_application_development_guide_using_python-Guest_Domains-Lifecycle-Provisioning_and_Starting
.
.
.
.
You say:
 > I wonder if your way to setup the guests uses special CPU types [...]
 >
 > Waiting for your feedback on guest CPU definitions in your case.

My CPU is an Intel Xeon Broadwell.

On lvl0, which has 48 cores:
```
baremetal@L0:cat /proc/cpuinfo | grep invpcid | wc -l
48
```

On lvl1, which is a VM with 20 vCPUs:
```
compute@L1:$ cat /proc/cpuinfo | grep invpcid | wc -l
20
```
.
.
.
.
Dumpxml of a working lvl1 VM "compute41":
https://paste.ubuntu.com/p/KMrCKGgvRg/

Dumpxml of a failing lvl2 VM created by the OpenStack/Nova API on "compute41":
https://paste.ubuntu.com/p/9FrhMWWgVk/

Dumpxml of a working lvl2 VM created by uvtool-libvirt:
https://paste.ubuntu.com/p/4CztPDW7fM/

One difference I see with your dumpxml is in the os part:
  machine='pc-i440fx-bionic' for me
  machine='pc-i440fx-xenial' for you

Maybe this is due to the way I install qemu with cloud-archive:queens [3].

[3] https://wiki.ubuntu.com/OpenStack/CloudArchive
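A way to follow up on that machine-type difference (a sketch, not something actually run in the thread): list the machine types the qemu inside L1 provides, and confirm which repository that qemu binary came from.

```
compute@L1:$ qemu-system-x86_64 -machine help | grep i440fx
compute@L1:$ apt-cache policy qemu-system-x86      # shows whether the package is the cloud-archive build
```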
.
.
.
.
qemu log for the lvl2 VM created by uvtool-libvirt:
```
compute@L1:$ cat /var/log/libvirt/qemu/xenial-guest-lvl2.log

[...]
2018-10-12T14:28:03.317760Z qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
```

If I missed something, let me know!
--
Nicolas

In my previous comment I am in the "16.04 > 16.04 > 16.04" situation.
I mention this especially for my 3 dumpxml files.

Ok,
at the lvl1 definition OpenStack came up with its own CPU modelling, which in this case is actually:
  cpu mode='host-passthrough'
  + a bunch of required features
That is what gives your lvl1 the invpcid feature (so far so good).

At lvl2 we have
Nova:
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
    <topology sockets='1' cores='1' threads='1'/>
  </cpu>
vs uvtool:
  <!-- has no definition, keeping defaults -->

Thanks for the data Nicolas!
With that in mind I have set my LVL1 to run the same host-passthrough config that you reported.
Then I configured LVL2 to run the same host-model config.
Note: "my 16.04" would not allow "check='partial'", so I dropped it.
What version of libvirt is running in your lvl1 (or on all levels)?
Current is 1.3.1-1ubuntu10.24.

I was glad to see that the uvtool-style guests work for you, as I assumed they would.
But even with the same CPU definitions used in my case, it works for me.
That is for "16.04 > 16.04 > 16.04" as well.

x86 nested virt was never really supported, just "as good as it happens to work". I wonder if this is one of those cases.
My chip is a somewhat older 12-core "Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz".

We have plenty of SW workarounds already, but if you can spend the time, I wonder if you could:
- reproduce the same on a different host CPU
- to confirm our current theory that the usage/emulation/nesting of invpcid is the root cause, add <feature policy='disable' name='invpcid'/> to the cpu section of the failing LVL2 definition. That would keep the rest as-is, but remove that feature.


NB-reply: there was no --os-variant for 18.04 released back then, but since there was no change compared to the former releases it doesn't matter - the only drawback is that people sometimes wonder whether it is missing.

You say:
 > What version of libvirt is running in your lvl1 (or all levels)?

cf. my 1st post and below:

On my Ubuntu 16.04 bare metal host:
 ```
 libvirt-bin 4.0.0-1ubuntu8.5~cloud0
 qemu-system-x86 1:2.11+dfsg-1ubuntu7.6~cloud0
 ```

On one Ubuntu 16.04 lvl1 VM:
 ```
 libvirt-bin 4.0.0-1ubuntu8.3~cloud0
 qemu-system-x86 1:2.11+dfsg-1ubuntu7.4~cloud0
 ```
.
.
.
.
You say:
 > - [can you] reproduce the same on a different host CPU

If I can, I will try on a much 'smaller' device (not a Xeon CPU).
Maybe tomorrow.
.
.
.
.
You say:
 > To confirm our current theory that the usage/emulation/nesting of invpcid
 > is the root cause [...]

Dumpxml of my lvl2 VM with "<feature policy='disable' name='invpcid'/>":
https://paste.ubuntu.com/p/WxvfBcHnF2/

And here is the boot log of the lvl2 VM with invpcid disabled:
https://paste.ubuntu.com/p/bkqDsT8VTy/
The VM failed to boot. Maybe I missed something.

I followed [1]:
[1] https://youth2009.org/post/kvm-with-ubuntu-cloud-image/
.
.
.
.
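A small sanity check that might be worth adding here (not something done in the thread): confirm that the disable actually ended up in the active definition before drawing conclusions from the failed boot. `VMNAME` is a placeholder.

```
compute@L1:$ virsh dumpxml VMNAME | grep -i invpcid
# expected output:  <feature policy='disable' name='invpcid'/>
```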
You say:
 > NB-reply: there was no --os-variant 18.04 released back then

No big deal, it is just frustrating to instantiate an Ubuntu 18.04 VM and have
to set this parameter to something else!
.
.
.
.
You say:
 > x86 nested virt is never really supported, just "as good as it happens to work". I wonder if that is one of those cases.

At first I thought nested virt could make my life easier regarding what I wanted
to achieve. There are several ways of testing and deploying OpenStack.
With the way I chose, I could 'simulate' a multi-host environment like in [2],
with several compute nodes, etc...

[2] https://github.com/nuagenetworks/nuage-openstack-ansible/wiki/Configure-OSA-Multi-node-Environment

But now I understand that nested virt is maybe still too much of a beta.
I will try "Ubuntu 18.04 > Ubuntu 16/18 > {whatever OS}".
Maybe this is fixed in Ubuntu 18.04.
And if it does not suit my needs, I will figure out something else (and more
bare metal ^^).

Metal as a Service (Ubuntu MAAS) looks good, but it is too much for me.
.
.
.
.
Thank you for your help. I think this bug is hard to identify and maybe harder
to patch. And I am not a virtualization, qemu, or OpenStack expert
(for the moment!?), so I can't help you any further.

[Expired for qemu (Ubuntu) because there has been no activity for 60 days.]

FWIW, bumping the kernel on the host (and most likely on the L1 VMs too) should work.
The HWE kernel in Xenial is the same version (4.15) as the kernel used by Bionic (18.04), so this should fix the problem:
$ apt install linux-generic-hwe-16.04
$ reboot

BR,
Alex

Thank you for your suggestion @Alexandru!
.
.
.
I cannot try this fix, because I have since moved on: I now use Ubuntu 18.04 for my L0 hypervisor, and I have also tried Ubuntu 18.04 on the L1 VMs.
.
.
.
Still, very interesting. On my previous Ubuntu 16.04 hosts, I believe I used "linux-image-4.4.0-XYZ-generic".

I think I have encountered exactly the same issue as you, Nico. We have very similar setups.
Upgrading the kernel did not help.
The only thing that helped was setting OpenStack to use qemu instead of kvm for the L2 VMs, with the performance cost associated with doing that :(
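For completeness, a sketch of what that last workaround typically looks like on the compute node (L1). The config path and service name are the usual Ubuntu/Nova defaults, not something quoted from this thread, and may differ in other deployments:

```
# /etc/nova/nova.conf on the L1 compute node
# [libvirt]
# virt_type = qemu        # plain emulation instead of nested KVM; slower, but avoids the crashes
compute@L1:$ sudo systemctl restart nova-compute
```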