Windows 10 will not install using qemu-system-x86_64
Steps to reproduce:
1. Install virt-manager and OVMF if they are not already installed.
2. Copy the Windows and virtio ISO files to /var/lib/libvirt/images.
3. Use virt-manager on the local machine to create the VM with the required disk, CPUs and memory.
4. Select "Customize configuration before install", then select OVMF (UEFI) instead of SeaBIOS.
5. Set the first CDROM to the Windows installation ISO (and enable it in the boot options).
6. Add a second CDROM and load the virtio ISO into it.
7. Change the SPICE display to VNC.
I always get a security error from Windows and the installer fails to launch (the same setup works on RHEL and Fedora).
I tried updating QEMU from Focal's 4.2 to Groovy's 5.0, which did not help.
This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:
apport-collect 1915063
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.
apport information
We believe that the latest version of Windows does not play nicely with the older version of QEMU: it seems Windows breaks somewhere between QEMU 4.2.0-2 and 5.0.0-6 on AMD Ryzen based processors (which is what we have in the P620).
What is the recommended way for an Ubuntu customer to get going again with an updated version of QEMU?
I'm subscribing Christian to this bug, who is our QEMU expert.
Hi,
I recently installed a Win10 guest for a different case and still had the ISOs available.
So I tried to install an OVMF based guest through virt-manager as you described.
ISOs
- win10 20H2_v2
- virtio iso 0.1.190
I used the default "new guest" of virt-manager and then went into "customize before install" to also add the virtio drivers and select the OVMF mode.
BTW, I would not expect providing the virtio ISO to be related to triggering (or not triggering) the bug. If you could confirm that it also happens without it, that would make reproducing this a bit easier. The default config uses SATA.
Info: if the ISO does not auto-boot (this depends on setup details, e.g. how you add your extra CD images), you can still load the Windows EFI loader manually from the UEFI shell:
FS0:
cd EFI
cd BOOT
BOOTX64.EFI
I started my tests on 21.04 Hirsute as that is what I had around.
qemu 1:5.2+dfsg-6ubuntu2, virt-manager 1:3.2.0-3, ovmf 2020.11-2
That started up the installer just fine no matter if I used SATA or virtio for the root disk.
Next I tried Focal (as reported)
qemu 1:4.2-3ubuntu6.14, virt-manager 1:2.2.1-3ubuntu2.1, ovmf 0~20191122.bd85bf54-2ubuntu3.1
That also started the Windows installer without an error.
Therefore I can't confirm your issue (yet) and will set this to incomplete.
Have you in the meantime found more details about what exactly makes the issue trigger/pass for you?
Regarding your further comments, and for clarification:
- @Mark: I don't have a Ryzen based processor though, so your case could still be absolutely
valid, but I can't confirm or deny it at this time. Do you have any more data showing that it is
processor dependent?
- David said 5.0 didn't help, while Mark said it is fixed somewhere between 4.2.0-2 and 5.0.0-6. Are
we sure you are all talking about the same bug?
For those who want to (and can) try a newer version, give the virt stack from [1] a try; if that
resolves it for you, we can look for fixes between those versions.
- David, you said you "get a security error from windows"; can you be more specific about the error
that you get? That will also help to check whether you two are facing the same issue.
[1]: https://launchpad.net/~canonical-server/+archive/ubuntu/server-backports
I am not using a virtio based drive, so that should not be part of the issue. I normally do not use the virtio ISO until Windows is installed, to clear the errors in the Device Manager.
I also tried version 5.2 from the Hirsute release and that is not working either.
As a test I tried this on an Intel based machine, and it installs correctly even with the default version of qemu-system-x86_64 from Focal.
Attaching a screenshot of the error I get when installing on an AMD Ryzen Threadripper processor.
Thanks David,
I have no Threadripper around at the moment; I think I can get my hands on an EPYC Rome next week, but that isn't 100% the same.
But since you tried this on the latest qemu 5.2 and it fails there as well, it might be worth also reporting it upstream. That is a great community which might have run things on a Threadripper already and be able to point us to a qemu/kernel fix, or at least to an existing discussion about it.
For now I'm adding a qemu task here which will mirror this case to the mailing list.
I was playing around with this and found that if I change the configuration under CPUs from the default (uncheck "Copy host CPU configuration") and select qemu64 in the Model drop-down box, I can get it to work.
That is awesome David,
qemu64 is a very low common denominator with only very basic CPU features,
while "copy host" means "enable all you can".
We can surely work with that a bit, but until I get access to the same HW I need you to do it.
If you run `virsh domcapabilities` in a console, it will spew some XML at you. One of the sections will be for "host-model"; in my case that looks like:
Skylake-Client-IBRS
Intel
...
That means a named CPU type (the one that is closest to what you have) plus some features that are additionally enabled/disabled.
If you could please post the full output you have, that can be useful.
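For reference, here is a minimal Python sketch (not part of libvirt or virt-manager) that pulls the host-model section out of that XML: the named model, the vendor and the extra feature lines with their policies. It assumes the usual `<domainCapabilities><cpu><mode name='host-model'>` layout; the `SAMPLE` string and its two feature lines are made up for illustration, and `read_domcapabilities()` simply shells out to `virsh domcapabilities`.

```python
import subprocess
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple


def read_domcapabilities() -> str:
    """Run `virsh domcapabilities` and return its XML (requires libvirt on the host)."""
    result = subprocess.run(
        ["virsh", "domcapabilities"], capture_output=True, text=True, check=True
    )
    return result.stdout


def host_model_summary(
    domcaps_xml: str,
) -> Tuple[Optional[str], Optional[str], List[Tuple[str, str]]]:
    """Return (model, vendor, [(policy, feature), ...]) for the host-model CPU mode."""
    root = ET.fromstring(domcaps_xml)
    mode = root.find("./cpu/mode[@name='host-model']")
    if mode is None:
        raise ValueError("no host-model CPU mode in the domcapabilities output")
    features = [(f.get("policy", ""), f.get("name", "")) for f in mode.findall("feature")]
    return mode.findtext("model"), mode.findtext("vendor"), features


# Made-up sample shaped like real output; the two feature lines are illustrative only.
SAMPLE = """<domainCapabilities>
  <cpu>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Skylake-Client-IBRS</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='ss'/>
      <feature policy='disable' name='hle'/>
    </mode>
  </cpu>
</domainCapabilities>"""

if __name__ == "__main__":
    model, vendor, features = host_model_summary(SAMPLE)
    print(f"{model} ({vendor})")
    for policy, name in features:
        print(f"  {policy:8} {name}")
```

On a real host you would call `host_model_summary(read_domcapabilities())` instead of using `SAMPLE`.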
From there you could take two steps:
1. As you see in my example, it will list some CPU features on top of the named type.
If you remove them one by one, you might be able to identify the single CPU feature
that breaks in your case.
2. The named CPU type that you have also has a representation; it can be found in
/usr/share/libvirt/cpu_map...
That will list all the CPU features that make up the named type.
If #1 wasn't sufficient, you can now add those features to your guest definition one by one
in the disabled state.
A description of the underlying mechanism is here https://libvirt.org/formatdomain.html#cpu-model-and-topology
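The disabled-feature variant of the guest definition can be sketched as follows. This is a minimal, hypothetical Python helper (not a libvirt tool) that builds a `<cpu mode='custom'>` element with a named model and one `<feature policy='disable' .../>` line per feature to switch off, following the format described at the link above; the feature names in the usage line are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def custom_cpu_xml(model: str, disabled: Iterable[str] = ()) -> str:
    """Build a libvirt <cpu> element: a named CPU model plus features forced off."""
    cpu = ET.Element("cpu", mode="custom", match="exact")
    ET.SubElement(cpu, "model", fallback="forbid").text = model
    for name in disabled:
        # policy='disable' asks libvirt to hide this feature from the guest
        ET.SubElement(cpu, "feature", policy="disable", name=name)
    return ET.tostring(cpu, encoding="unicode")


if __name__ == "__main__":
    # Placeholder feature names, only to show the shape of the output.
    print(custom_cpu_xml("EPYC-Rome", ["feature-a", "feature-b"]))
```

The printed element could then replace the existing `<cpu>` block of the guest via `virsh edit <domain>`.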
path: /usr/bin/qemu-system-x86_64
domain: kvm
machine: pc-i440fx-hirsute
arch: x86_64
os firmware: efi
loader: /usr/share/OVMF/OVMF_CODE_4M.fd (type: rom, pflash; readonly: yes, no; secure: no)
cpu host-model: EPYC-Rome (vendor: AMD)
cpu custom models: qemu64, qemu32, phenom, pentium3, pentium2, pentium, n270, kvm64, kvm32, coreduo, core2duo, athlon, Westmere-IBRS, Westmere, Skylake-Server-noTSX-IBRS, Skylake-Server-IBRS, Skylake-Server, Skylake-Client-noTSX-IBRS, Skylake-Client-IBRS, Skylake-Client, SandyBridge-IBRS, SandyBridge, Penryn, Opteron_G5, Opteron_G4, Opteron_G3, Opteron_G2, Opteron_G1, Nehalem-IBRS, Nehalem, IvyBridge-IBRS, IvyBridge, Icelake-Server-noTSX, Icelake-Server, Icelake-Client-noTSX, Icelake-Client, Haswell-noTSX-IBRS, Haswell-noTSX, Haswell-IBRS, Haswell, EPYC-Rome, EPYC-IBPB, EPYC, Dhyana, Conroe, Cascadelake-Server-noTSX, Cascadelake-Server, Broadwell-noTSX-IBRS, Broadwell-noTSX, Broadwell-IBRS, Broadwell, 486
disk devices: disk, cdrom, floppy, lun
disk buses: ide, fdc, scsi, virtio, usb, sata
disk models: virtio, virtio-transitional, virtio-non-transitional
graphics: sdl, vnc, spice
hostdev mode: subsystem
hostdev startupPolicy: default, mandatory, requisite, optional
hostdev subsysType: usb, pci, scsi
hostdev pciBackend: default, vfio
rng models: virtio, virtio-transitional, virtio-non-transitional
rng backends: random, egd
OK, so you should be able to drop the feature lines of the host-model section one by one.
If that does not make it work yet, add the features of the named type one by one in the disabled state (i.e. remove them).
Eventually I hope you can identify the one feature that breaks it (re-add everything but this one to verify). Any chance you could do this iterative test? You could also "bisect" this list if you want to save some time.
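The "bisect" idea can be sketched in Python as below. It assumes exactly one feature is responsible, and each check is one install attempt that you run by hand: `works_with_disabled` is a hypothetical callback standing for "define the guest with these features disabled, boot the installer, report whether it gets past the security error".

```python
from typing import Callable, FrozenSet, Sequence


def find_breaking_feature(
    features: Sequence[str],
    works_with_disabled: Callable[[FrozenSet[str]], bool],
) -> str:
    """Binary-search for the single feature whose removal makes the install work."""
    candidates = list(features)
    if not candidates or not works_with_disabled(frozenset(candidates)):
        raise ValueError("disabling all candidates does not help; the culprit is elsewhere")
    while len(candidates) > 1:
        mid = len(candidates) // 2
        if works_with_disabled(frozenset(candidates[:mid])):
            candidates = candidates[:mid]  # culprit is in the disabled half
        else:
            candidates = candidates[mid:]  # culprit is in the other half
    culprit = candidates[0]
    # Final verification: re-add everything except the culprit.
    if not works_with_disabled(frozenset([culprit])):
        raise ValueError("not reproducible; more than one feature may be involved")
    return culprit
```

With N candidate features this needs roughly log2(N) + 2 install attempts instead of up to N.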
On Sat, 03 Apr 2021 16:52:13 -0000
Christian Ehrhardt wrote:
> That is awesome David,
> qemu64 is like a very low common denominator with only very basic CPU features.
> While "copy host" means "enable all you can".
It's also worth trying a real CPU topology if the EPYC CPU model is used,
i.e. use -smp with options that resemble a real EPYC CPU
(the number of core complexes is configured with the 'dies' option in QEMU).
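As a rough illustration of that suggestion, this Python sketch builds the `-smp` argument for an explicit topology, assuming a QEMU that accepts `dies=` and counts `cores=` per die. The 1 socket / 2 dies / 4 cores / 2 threads layout in the example is hypothetical and not taken from any specific EPYC part.

```python
from typing import List


def smp_args(sockets: int, dies: int, cores: int, threads: int) -> List[str]:
    """Return QEMU arguments for an explicit topology; cores are counted per die."""
    for name, value in (("sockets", sockets), ("dies", dies),
                        ("cores", cores), ("threads", threads)):
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value}")
    total = sockets * dies * cores * threads
    return ["-smp", f"{total},sockets={sockets},dies={dies},cores={cores},threads={threads}"]


if __name__ == "__main__":
    # Hypothetical layout: 1 socket, 2 dies, 4 cores per die, 2 threads per core.
    print(" ".join(smp_args(1, 2, 4, 2)))
```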
I have not done any of what you are asking, so I'm not exactly sure how to change those values. I have been looking and reading but not finding what I want, so I thought it might be better to just ask how to do what you are asking. I did try CPU type EPYC, and that did get past the error I am seeing on install.
On Wed, 07 Apr 2021 13:00:23 -0000
David Ober wrote:
> I have not done any of what you are asking so not exactly sure how to
> change those values, been looking and reading but not finding what I
> want so thought it might be better to just ask how to do what you are
> asking.
See https://libvirt.org/formatdomain.html#cpu-model-and-topology
for how to describe the topology in domain XML.
Pick a real AMD CPU for the CPU model you're having problems with,
and use its configuration to define the topology.
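When the topology is described in domain XML instead, the product sockets * dies * cores * threads has to agree with the guest's `<vcpu>` count, and as far as I know libvirt rejects a definition where it does not. The Python sketch below checks that before redefining the guest; the `EXAMPLE` domain and its numbers are hypothetical, and the optional `dies` attribute is only understood by newer libvirt releases.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def topology_vs_vcpus(domain_xml: str) -> Tuple[int, int]:
    """Return (vcpu count, sockets*dies*cores*threads) from a libvirt domain XML."""
    root = ET.fromstring(domain_xml)
    vcpu_text = root.findtext("vcpu")
    topo = root.find("./cpu/topology")
    if vcpu_text is None or topo is None:
        raise ValueError("domain XML needs both <vcpu> and <cpu><topology>")
    product = 1
    for attr in ("sockets", "dies", "cores", "threads"):
        product *= int(topo.get(attr, "1"))  # an omitted attribute counts as 1 here
    return int(vcpu_text.strip()), product


# Hypothetical guest definition; the numbers do not describe any particular EPYC part.
EXAMPLE = """<domain type='kvm'>
  <name>win10</name>
  <vcpu placement='static'>16</vcpu>
  <cpu mode='custom' match='exact'>
    <model fallback='forbid'>EPYC</model>
    <topology sockets='1' dies='2' cores='4' threads='2'/>
  </cpu>
</domain>"""

if __name__ == "__main__":
    vcpus, product = topology_vs_vcpus(EXAMPLE)
    print("ok" if vcpus == product else f"mismatch: {vcpus} vCPUs vs topology {product}")
```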
> I did try CPU type EPYC and that did get past the error I am
> seeing on install
So it works with EPYC but not with EPYC-Rome; then the topology
is probably not the issue.
CCing Babu,
who added the EPYC-Rome CPU model; maybe he can help.
I remember seeing something similar before. This was supposed to be fixed by the following Linux kernel commit:
commit 841c2be09fe4f495fe5224952a419bd8c7e5b455
Author: Maxim Levitsky
Date: Wed Jul 8 14:57:31 2020 +0300
kvm: x86: replace kvm_spec_ctrl_test_value with runtime test on the host
# git describe --contains 841c2be09fe4f495fe5224952a419bd8c7e5b455
v5.9-rc1~121^2~67
The problem seems to happen with the EPYC-Rome model, which exposes the STIBP feature but not IBRS.
Did you guys try "-cpu host"? It might work.
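To make "exposes STIBP but not IBRS" concrete: on AMD CPUs both are separate bits of CPUID leaf 0x80000008, register EBX (bit 14 for IBRS, bit 15 for STIBP). The Python sketch below decodes those bits from an EBX value you read yourself; only the four bits in the table are handled, and the value in the usage line is constructed for illustration rather than read from a real EPYC-Rome guest.

```python
from typing import Dict, Set

# AMD CPUID Fn8000_0008 EBX: bit position -> feature (only the bits relevant here).
AMD_8000_0008_EBX_BITS: Dict[int, str] = {12: "ibpb", 14: "ibrs", 15: "stibp", 24: "ssbd"}


def decode_ebx(ebx: int) -> Set[str]:
    """Return the names of the known speculation-control bits set in EBX."""
    return {name for bit, name in AMD_8000_0008_EBX_BITS.items() if ebx & (1 << bit)}


def stibp_without_ibrs(ebx: int) -> bool:
    """True for the combination described above as triggering the problem."""
    features = decode_ebx(ebx)
    return "stibp" in features and "ibrs" not in features


if __name__ == "__main__":
    constructed = (1 << 12) | (1 << 15) | (1 << 24)  # IBPB + STIBP + SSBD, no IBRS
    print(sorted(decode_ebx(constructed)), stibp_without_ibrs(constructed))
```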
Thanks Babu/Igor for chiming in!
@Babu
It exposes STIBP but not IBRS: isn't that what you tried to solve (for userspace) in QEMU via a v2 of the Rome CPU model?
=> https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01020.html
I recently pinged that, as it wasn't merged into the QEMU 6.0-rc.
Do you have any more insight into why it is still being held back?
If I might ask - how does the kernel fix you referenced interact with this proposed qemu change?
Assumptions (please correct me):
1. With the QEMU change and the Rome-v2 model, both features would be exposed and it would no longer crash (even on unfixed kernels).
2. With the kernel fix it will no longer crash, even with an unfixed QEMU?
Finally I'm able to test on a Threadripper myself now.
Note: regarding the commit that Babu identified, I'm on kernel 5.10.0-1020-oem, so that patch would already be applied. I need to find an older kernel to retry with as well.
(on that new kernel) I did a full Win10 install and it worked fine for me.
In regard to CPU types (for comparison) I got
qemu 1:4.2-3ubuntu6.15 / libvirt 6.0.0-0ubuntu8.8:
EPYC-Rome
AMD
With a more recent qemu/libvirt it isn't much different for this chip (there were some recent Milan changes, but those do not seem to matter for this chip).
qemu 1:5.2+dfsg-9ubuntu1 / libvirt 7.0.0-2ubuntu1
EPYC-Rome
AMD
I wasn't able to crash this setup with either an old (18.04) or a new (21.04) Ubuntu guest.
Installing Win10 worked fine for a while and didn't break as reported. But the setup I have goes through triple SSH tunnels and around the globe, which slows things down a lot :-/
This is how far I've got:
1. start up the install
2. select no license key -> custom install -> it started copying files
3. it goes into the first reboot
After this the latency kills me and virt-manager aborts the installation.
So far I did not hit (https://launchpadlibrarian.net/529734412/security.png) as reported by David.
@David - did this already pass the critical step for you? How early or late in the install did you hit the issue?
As I said, I'll probably need to find an older kernel anyway (one from before the commit that Babu referenced).
@Christian,
Yes. The following patch fixes the problem:
https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01020.html
I saw your ping on the patch. I am not sure why it has not been picked up. I am going to ping them today.
>If I might ask - how does the kernel fix you referenced interact with this proposed qemu change?
>Assumptions (please correct me):
The problem seems to happen when the guest tries to access the SPEC_CTRL register with the wrong settings. The kernel fix avoids writing those values and avoids the #GP fault.
>1. with the qemu change and using that Rome-v2 it would ask to expose both features and no more crash (even on unfixed kernels)
Yes. With the QEMU patch (EPYC-Rome-v2) this issue will be fixed.
>2. with the kernel fix it will no more crash, even with an unfixed qemu?
Yes, that is correct. We need at least one of these patches to fix this problem.
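Babu's answer can be illustrated with a small Python model of the two rules. It is a simplified sketch based on the description in kernel commit 841c2be0, not the actual KVM code: the old rule only accepts the STIBP bit of SPEC_CTRL when IBRS is advertised to the guest, while the fixed kernel simply tries the value on the host. `zen2_host_accepts` is a hypothetical stand-in for a host that supports STIBP and SSBD but not IBRS.

```python
from typing import Callable

# Bit layout of MSR_IA32_SPEC_CTRL.
SPEC_CTRL_IBRS = 1 << 0
SPEC_CTRL_STIBP = 1 << 1
SPEC_CTRL_SSBD = 1 << 2


def old_rule_allows(value: int, guest_has_ibrs: bool) -> bool:
    """Simplified pre-fix rule: without IBRS advertised, IBRS and STIBP bits are reserved."""
    valid = SPEC_CTRL_IBRS | SPEC_CTRL_STIBP | SPEC_CTRL_SSBD
    if not guest_has_ibrs:
        valid &= ~(SPEC_CTRL_IBRS | SPEC_CTRL_STIBP)
    return (value & ~valid) == 0


def new_rule_allows(value: int, host_accepts: Callable[[int], bool]) -> bool:
    """Post-fix rule: allow the value if writing it on the host does not #GP."""
    return host_accepts(value)


def zen2_host_accepts(value: int) -> bool:
    """Hypothetical host that implements STIBP and SSBD but not IBRS."""
    return (value & ~(SPEC_CTRL_STIBP | SPEC_CTRL_SSBD)) == 0


def guest_gets_gp(kernel_fixed: bool, qemu_exposes_ibrs: bool,
                  value: int = SPEC_CTRL_STIBP) -> bool:
    """Does a guest write of `value` to SPEC_CTRL end in #GP under this model?"""
    if kernel_fixed:
        return not new_rule_allows(value, zen2_host_accepts)
    return not old_rule_allows(value, guest_has_ibrs=qemu_exposes_ibrs)


if __name__ == "__main__":
    for kernel_fixed in (False, True):
        for qemu_ibrs in (False, True):
            print(f"kernel fixed={kernel_fixed!s:5} qemu exposes IBRS={qemu_ibrs!s:5} "
                  f"-> #GP={guest_gets_gp(kernel_fixed, qemu_ibrs)}")
```

In this model only the combination of an unfixed kernel and a QEMU model without IBRS produces the #GP, which matches "we need at least one of these patches".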
David used "5.6.0-1042.46-oem"; the closest I had was "5.6.0-1052-oem", so I tried that one.
With that my win10 install immediately crashed into the reported issue.
So to summarize:
1. I can reproduce it
2. Chances are high that it is fixed by kernel commit 841c2be0 "kvm: x86: replace kvm_spec_ctrl_test_value with runtime test on the host"
3. there are some qemu changes which might be related, but we need Babu to reply about if/how those are related
I need to get myself up to date on Ubuntu OEM kernels.
If there is a 5.6 series that is supposed to work on this hardware, then this patch needs to be backported there.
But if, on the other hand, getting 5.10.0-1020-oem (the one I had) or later as part of your 20.04 OEM kernel is a valid upgrade path, then that "is the fix" for you, @David.
This change was made by a bot.
Thanks @Babu for the clarifications!
I really hope that the QEMU patch makes it into v6.0; then I can better consider picking it up as a backport for QEMU (there is already a bug about that, bug 1921754, therefore I'm setting the QEMU task here to Invalid).
The last step I can provide for the kernel bug that this one is (before the rest of the work is with the kernel team) is to verify or falsify whether this also affects the non-OEM linux-generic kernel.
There the latest was 5.4.0.71.74 from focal-proposed, and the latest already released one is 5.4.0.70.73.
5.4.0.70.73 - failing
5.4.0.71.74 - failing
So while the almost-released OEM kernel based on 5.10 will cover this, the patch should indeed also be backported to linux-generic and all the other flavours; otherwise Windows (and potentially more) will no longer be usable as a KVM guest on such chips (Threadrippers, but maybe also other AMD chips that are not yet known).
The commit in question is marked for stable:
commit 841c2be09fe4f495fe5224952a419bd8c7e5b455
Author: Maxim Levitsky
Date: Wed Jul 8 14:57:31 2020 +0300
kvm: x86: replace kvm_spec_ctrl_test_value with runtime test on the host
To avoid complex and in some cases incorrect logic in
kvm_spec_ctrl_test_value, just try the guest's given value on the host
processor instead, and if it doesn't #GP, allow the guest to set it.
One such case is when host CPU supports STIBP mitigation
but doesn't support IBRS (as is the case with some Zen2 AMD cpus),
and in this case we were giving guest #GP when it tried to use STIBP
The reason why can can do the host test is that IA32_SPEC_CTRL msr is
passed to the guest, after the guest sets it to a non zero value
for the first time (due to performance reasons),
and as as result of this, it is pointless to emulate #GP condition on
this first access, in a different way than what the host CPU does.
This is based on a patch from Sean Christopherson, who suggested this idea.
Fixes: 6441fa6178f5 ("KVM: x86: avoid incorrect writes to host MSR_IA32_SPEC_CTRL")
Cc:
Suggested-by: Sean Christopherson
Signed-off-by: Maxim Levitsky
Message-Id:
Signed-off-by: Paolo Bonzini
It appears to be in `v5.4.102`, which is currently queued up for the cycle following the one just starting.
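A quick way to reason about upstream kernel versions with the two data points from this thread is sketched below in Python: the fix is in mainline from v5.9-rc1, and reportedly in the 5.4 stable series from v5.4.102. Anything else returns `None` because the thread does not say. Ubuntu package versions such as 5.4.0-71 use their own numbering and do not map to upstream stable patch levels, so this sketch only applies to upstream version tags.

```python
import re
from typing import Optional, Tuple


def parse_upstream_version(tag: str) -> Tuple[int, int, int]:
    """Parse 'v5.4.102', '5.10.0' or 'v5.9-rc1' into (major, minor, patch)."""
    m = re.match(r"v?(\d+)\.(\d+)(?:\.(\d+))?", tag.strip())
    if m is None:
        raise ValueError(f"not an upstream kernel version: {tag!r}")
    return int(m.group(1)), int(m.group(2)), int(m.group(3) or 0)


def has_spec_ctrl_fix(tag: str) -> Optional[bool]:
    """True/False when this thread's data points decide it, None when they do not."""
    major, minor, patch = parse_upstream_version(tag)
    if (major, minor) >= (5, 9):
        return True  # mainline: first contained in v5.9-rc1
    if (major, minor) == (5, 4):
        return patch >= 102  # 5.4 stable: reported to be in v5.4.102
    return None  # other series are not covered by this thread


if __name__ == "__main__":
    for tag in ("v5.9-rc1", "v5.4.101", "v5.4.102", "v5.6.19"):
        print(tag, has_spec_ctrl_fix(tag))
```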
The QEMU patch mentioned in comment #38 has already been merged, so I'm marking this as Fix Released there.