summary refs log tree commit diff stats
path: root/results/scraper/launchpad/1915063
blob: 70fa59c337f23539c5f2c0b52d1f6895cd2d2e0d (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
Windows 10 wil not install using qemu-system-x86_64

Steps to reproduce
install virt-manager and ovmf if nopt already there
copy windows and virtio iso files to /var/lib/libvirt/images

Use virt-manager from local machine to create your VMs with the disk, CPUs and memory required
    Select customize configuration then select OVMF(UEFI) instead of seabios
    set first CDROM to the windows installation iso (enable in boot options)
    add a second CDROM and load with the virtio iso
	change spice display to VNC

  Always get a security error from windows and it fails to launch the installer (works on RHEL and Fedora)
I tried updating the qemu version from Focals 4.2 to Groovy 5.0 which was of no help

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1915063

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

apport information

We believe that the latest version of Windows doesn't play nice with the older version of QEMU - it seems Windows is broken somewhere between 4.2.0-2 and 5.0.0-6 on AMD Ryzen based processors (which is what we have on the P620).

What are the recommendations for the best way for a Ubuntu customer to get going again, with an updated version of QEMU?


I'm subscribing Christian to this bug, who is our QEMU expert.

Hi,
I was just recently for a different case installing a Win10 guest and had the ISOs available.
So I was trying to install an OVMF based one through virt-manager as you asked.

ISOs
- win10 20H2_v2
- virtio iso 0.1.190

I used the default "new guest" of virt-manager and then went into "customize before install" to also add the virtio drivers and select the OVMF mode.

BTW I'd not expect that providing the virtio ISO is related to triggering (or not triggering the bug). If you could confirm that it happens without that as well it would make reproducing this a little bit easier. The default config uses SATA.

Info:
If the ISO is not auto-boointg (depends on some setup detail e.g. how you add your extra CD images) you can still manually load the Windows EFI loader like:
FS0:
cd EFI
cd BOOT
BOOTX64.EFI

I started my tests on 21.04 Hirsute as that is what I had around.
qemu 1:5.2+dfsg-6ubuntu2 virt-manager 1:3.2.0-3 ovmf 2020.11-2
That started up the installer just fine no matter if I used SATA or virtio for the root disk.

Next I tried Focal (as reported)
qemu 1:4.2-3ubuntu6.14 virt-manager 1:2.2.1-3ubuntu2.1 ovmf 0~20191122.bd85bf54-2ubuntu3.1
That also started the windows installer without an error.


Therefore I can't confirm your issue (yet) and will set this to incomplete.
Have you in the meantime found more details about what exactly makes the issue trigger/pass for you?

In regard your further comments and for clarification:
- @Mark: I don't have a Ryzen based processor thou, so your case could still be absolutely
  valid, yet I can't confirm/deny at the time. Do you have any more data showing that it is 
  processor dependent?
- David said 5.0 didn't help while Mark said between .2.0-2 and 5.0.0-6 it is fixed. Are
  we sure you all talk about the same bug?
  For those that want/can try a new version give the virt-stack from [1] a try, if that 
  resolves it for you we can look fox fixes in between those versions.
- David you said "get a security error from windows" can you be more specific about the error 
  that you get? That will also help to check if you two are facing the same issue.

[1]: https://launchpad.net/~canonical-server/+archive/ubuntu/server-backports

I am not using a virtio based drive so that should not be par of the issue, I normally do not use the virtio iso until windows is installed to clear errors in the device manager
I tried using even the version 5.2 from the hirsute release and that also is not working
As a test I tried doing this from an Intel based machine and it does install correctly using even the default version of qemu-system_x64 from focal
Attaching a screen show of the error I get when installing on an AMD Ryzen Threadripper processor



Thanks David,
I have no threadripper around atm, I think I can next week get hands on an EPYC Rome, but that isn't 100% the same.

But gladly you tried this on the latest qemu 5.2 and since it is failing there as well it might be worth to also report it upstream. That is a great community which might have ran things on a threadripper already and be able to point us to a qemu/kernel fix - or at least an existing discussions abut it.
For now I'm adding a qemu task here which will mirror this case to the mailing list.

I was playing around with this and find that if I change the Configuration under CPUs from the default (uncheck "Copy host CPU configuration") and select qemu64 in the Model drop down box I can get it to work

That is awesome David,
qemu64 is like a very low common denominator with only very basic CPU features.
While "copy host" means "enable all you can".

We can surely work with that a bit, but until I get access to the same HW I need you to do it.


If you run in a console `$virsh domcapabilities` it will spew some XML at you. One of the sections will be for "host-model". In my case that looks like

    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>Skylake-Client-IBRS</model>
      <vendor>Intel</vendor>
      <feature policy='require' name='ss'/>
      <feature policy='require' name='vmx'/>
      <feature policy='require' name='hypervisor'/>
...
    </mode>


That means a names CPU type (the one that is closest to what you have) and some feature additionally enabled/disabled.

If you could please post the full output you have, that can be useful.
From there you could go two steps.
1. as you see in my example it will list some cpu features on top of the named type.
   If you remove them one by one you might be able to identify the single-cpu featute
   that breaks in your case.
2. The named CPU that you have also has a representation, it can be found in
   /usr/share/libvirt/cpu_map...
   That ill list all the CPU features that make up the named type.
   If #1 wasn't sufficient, you can now add those to your guest definition one by one in disabled 
   state, example
    <feature policy='disable' name='ss'/>

A description of the underlying mechanism is here https://libvirt.org/formatdomain.html#cpu-model-and-topology

<domainCapabilities>
  <path>/usr/bin/qemu-system-x86_64</path>
  <domain>kvm</domain>
  <machine>pc-i440fx-hirsute</machine>
  <arch>x86_64</arch>
  <vcpu max='255'/>
  <iothreads supported='yes'/>
  <os supported='yes'>
    <enum name='firmware'>
      <value>efi</value>
    </enum>
    <loader supported='yes'>
      <value>/usr/share/OVMF/OVMF_CODE_4M.fd</value>
      <enum name='type'>
        <value>rom</value>
        <value>pflash</value>
      </enum>
      <enum name='readonly'>
        <value>yes</value>
        <value>no</value>
      </enum>
      <enum name='secure'>
        <value>no</value>
      </enum>
    </loader>
  </os>
  <cpu>
    <mode name='host-passthrough' supported='yes'/>
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-Rome</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='ssbd'/>
      <feature policy='require' name='xsaves'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='amd-ssbd'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='require' name='rdctl-no'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='mds-no'/>
      <feature policy='require' name='pschange-mc-no'/>
    </mode>
    <mode name='custom' supported='yes'>
      <model usable='yes'>qemu64</model>
      <model usable='yes'>qemu32</model>
      <model usable='no'>phenom</model>
      <model usable='yes'>pentium3</model>
      <model usable='yes'>pentium2</model>
      <model usable='yes'>pentium</model>
      <model usable='no'>n270</model>
      <model usable='yes'>kvm64</model>
      <model usable='yes'>kvm32</model>
      <model usable='no'>coreduo</model>
      <model usable='no'>core2duo</model>
      <model usable='no'>athlon</model>
      <model usable='no'>Westmere-IBRS</model>
      <model usable='yes'>Westmere</model>
      <model usable='no'>Skylake-Server-noTSX-IBRS</model>
      <model usable='no'>Skylake-Server-IBRS</model>
      <model usable='no'>Skylake-Server</model>
      <model usable='no'>Skylake-Client-noTSX-IBRS</model>
      <model usable='no'>Skylake-Client-IBRS</model>
      <model usable='no'>Skylake-Client</model>
      <model usable='no'>SandyBridge-IBRS</model>
      <model usable='yes'>SandyBridge</model>
      <model usable='yes'>Penryn</model>
      <model usable='no'>Opteron_G5</model>
      <model usable='no'>Opteron_G4</model>
      <model usable='yes'>Opteron_G3</model>
      <model usable='yes'>Opteron_G2</model>
      <model usable='yes'>Opteron_G1</model>
      <model usable='no'>Nehalem-IBRS</model>
      <model usable='yes'>Nehalem</model>
      <model usable='no'>IvyBridge-IBRS</model>
      <model usable='no'>IvyBridge</model>
      <model usable='no'>Icelake-Server-noTSX</model>
      <model usable='no'>Icelake-Server</model>
      <model usable='no'>Icelake-Client-noTSX</model>
      <model usable='no'>Icelake-Client</model>
      <model usable='no'>Haswell-noTSX-IBRS</model>
      <model usable='no'>Haswell-noTSX</model>
      <model usable='no'>Haswell-IBRS</model>
      <model usable='no'>Haswell</model>
      <model usable='yes'>EPYC-Rome</model>
      <model usable='yes'>EPYC-IBPB</model>
      <model usable='yes'>EPYC</model>
      <model usable='yes'>Dhyana</model>
      <model usable='yes'>Conroe</model>
      <model usable='no'>Cascadelake-Server-noTSX</model>
      <model usable='no'>Cascadelake-Server</model>
      <model usable='no'>Broadwell-noTSX-IBRS</model>
      <model usable='no'>Broadwell-noTSX</model>
      <model usable='no'>Broadwell-IBRS</model>
      <model usable='no'>Broadwell</model>
      <model usable='yes'>486</model>
    </mode>
  </cpu>
  <devices>
    <disk supported='yes'>
      <enum name='diskDevice'>
        <value>disk</value>
        <value>cdrom</value>
        <value>floppy</value>
        <value>lun</value>
      </enum>
      <enum name='bus'>
        <value>ide</value>
        <value>fdc</value>
        <value>scsi</value>
        <value>virtio</value>
        <value>usb</value>
        <value>sata</value>
      </enum>
      <enum name='model'>
        <value>virtio</value>
        <value>virtio-transitional</value>
        <value>virtio-non-transitional</value>
      </enum>
    </disk>
    <graphics supported='yes'>
      <enum name='type'>
        <value>sdl</value>
        <value>vnc</value>
        <value>spice</value>
      </enum>
    </graphics>
    <video supported='yes'>
      <enum name='modelType'>
        <value>vga</value>
        <value>cirrus</value>
        <value>vmvga</value>
        <value>qxl</value>
        <value>virtio</value>
        <value>none</value>
        <value>bochs</value>
        <value>ramfb</value>
      </enum>
    </video>
    <hostdev supported='yes'>
      <enum name='mode'>
        <value>subsystem</value>
      </enum>
      <enum name='startupPolicy'>
        <value>default</value>
        <value>mandatory</value>
        <value>requisite</value>
        <value>optional</value>
      </enum>
      <enum name='subsysType'>
        <value>usb</value>
        <value>pci</value>
        <value>scsi</value>
      </enum>
      <enum name='capsType'/>
      <enum name='pciBackend'>
        <value>default</value>
        <value>vfio</value>
      </enum>
    </hostdev>
    <rng supported='yes'>
      <enum name='model'>
        <value>virtio</value>
        <value>virtio-transitional</value>
        <value>virtio-non-transitional</value>
      </enum>
      <enum name='backendModel'>
        <value>random</value>
        <value>egd</value>
      </enum>
    </rng>
  </devices>
  <features>
    <gic supported='no'/>
    <vmcoreinfo supported='yes'/>
    <genid supported='yes'/>
    <backingStoreInput supported='yes'/>
    <backup supported='no'/>
    <sev supported='no'/>
  </features>
</domainCapabilities>

Ok, so you should be able to drop these lines one by one:

      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='ssbd'/>
      <feature policy='require' name='xsaves'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='amd-ssbd'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='require' name='rdctl-no'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='mds-no'/>
      <feature policy='require' name='pschange-mc-no'/>

If that does not yet make it work, add those one by one (removing the features of the named type)
    <feature policy='disable' name='3dnowprefetch'/>
    <feature policy='disable' name='abm'/>
    <feature policy='disable' name='adx'/>
    <feature policy='disable' name='aes'/>
    <feature policy='disable' name='amd-stibp'/>
    <feature policy='disable' name='apic'/>
    <feature policy='disable' name='arat'/>
    <feature policy='disable' name='avx'/>
    <feature policy='disable' name='avx2'/>
    <feature policy='disable' name='bmi1'/>
    <feature policy='disable' name='bmi2'/>
    <feature policy='disable' name='clflush'/>
    <feature policy='disable' name='clflushopt'/>
    <feature policy='disable' name='clwb'/>
    <feature policy='disable' name='clzero'/>
    <feature policy='disable' name='cmov'/>
    <feature policy='disable' name='cr8legacy'/>
    <feature policy='disable' name='cx16'/>
    <feature policy='disable' name='cx8'/>
    <feature policy='disable' name='de'/>
    <feature policy='disable' name='f16c'/>
    <feature policy='disable' name='fma'/>
    <feature policy='disable' name='fpu'/>
    <feature policy='disable' name='fsgsbase'/>
    <feature policy='disable' name='fxsr'/>
    <feature policy='disable' name='fxsr_opt'/>
    <feature policy='disable' name='ibpb'/>
    <feature policy='disable' name='lahf_lm'/>
    <feature policy='disable' name='lm'/>
    <feature policy='disable' name='mca'/>
    <feature policy='disable' name='mce'/>
    <feature policy='disable' name='misalignsse'/>
    <feature policy='disable' name='mmx'/>
    <feature policy='disable' name='mmxext'/>
    <feature policy='disable' name='movbe'/>
    <feature policy='disable' name='msr'/>
    <feature policy='disable' name='mtrr'/>
    <feature policy='disable' name='npt'/>
    <feature policy='disable' name='nrip-save'/>
    <feature policy='disable' name='nx'/>
    <feature policy='disable' name='osvw'/>
    <feature policy='disable' name='pae'/>
    <feature policy='disable' name='pat'/>
    <feature policy='disable' name='pclmuldq'/>
    <feature policy='disable' name='pdpe1gb'/>
    <feature policy='disable' name='perfctr_core'/>
    <feature policy='disable' name='pge'/>
    <feature policy='disable' name='pni'/>
    <feature policy='disable' name='popcnt'/>
    <feature policy='disable' name='pse'/>
    <feature policy='disable' name='pse36'/>
    <feature policy='disable' name='rdpid'/>
    <feature policy='disable' name='rdrand'/>
    <feature policy='disable' name='rdseed'/>
    <feature policy='disable' name='rdtscp'/>
    <feature policy='disable' name='sep'/>
    <feature policy='disable' name='sha-ni'/>
    <feature policy='disable' name='smap'/>
    <feature policy='disable' name='smep'/>
    <feature policy='disable' name='sse'/>
    <feature policy='disable' name='sse2'/>
    <feature policy='disable' name='sse4.1'/>
    <feature policy='disable' name='sse4.2'/>
    <feature policy='disable' name='sse4a'/>
    <feature policy='disable' name='ssse3'/>
    <feature policy='disable' name='svm'/>
    <feature policy='disable' name='syscall'/>
    <feature policy='disable' name='tsc'/>
    <feature policy='disable' name='umip'/>
    <feature policy='disable' name='vme'/>
    <feature policy='disable' name='wbnoinvd'/>
    <feature policy='disable' name='xgetbv1'/>
    <feature policy='disable' name='xsave'/>
    <feature policy='disable' name='xsavec'/>
    <feature policy='disable' name='xsaveerptr'/>
    <feature policy='disable' name='xsaveopt'/>

Eventually I'd hope you identify one feature (re add everything but this to verify) that breaks it. Any chance to do this iterative test? You could also "bisect" this list if you want to save some time.

On Sat, 03 Apr 2021 16:52:13 -0000
Christian Ehrhardt  <email address hidden> wrote:

> That is awesome David,
> qemu64 is like a very low common denominator with only very basic CPU features.
> While "copy host" means "enable all you can".

Also it's worth to try setting real CPU topology for if EPYC cpu model is used.
i.e. use -smp with options that resemble a real EPYC cpu
(for number of core complexes is configured with 'dies' option in QEMU)


> We can surely work with that a bit, but until I get access to the same
> HW I need you to do it.
> 
>
> If you run in a console `$virsh domcapabilities` it will spew some XML at you. One of the sections will be for "host-model". In my case that looks like
> 
>     <mode name='host-model' supported='yes'>
>       <model fallback='forbid'>Skylake-Client-IBRS</model>
>       <vendor>Intel</vendor>
>       <feature policy='require' name='ss'/>
>       <feature policy='require' name='vmx'/>
>       <feature policy='require' name='hypervisor'/>
> ...
>     </mode>
> 
> 
> That means a names CPU type (the one that is closest to what you have) and some feature additionally enabled/disabled.
> 
> If you could please post the full output you have, that can be useful.
> >From there you could go two steps.  
> 1. as you see in my example it will list some cpu features on top of the named type.
>    If you remove them one by one you might be able to identify the single-cpu featute
>    that breaks in your case.
> 2. The named CPU that you have also has a representation, it can be found in
>    /usr/share/libvirt/cpu_map...
>    That ill list all the CPU features that make up the named type.
>    If #1 wasn't sufficient, you can now add those to your guest definition one by one in disabled 
>    state, example
>     <feature policy='disable' name='ss'/>
> 
> A description of the underlying mechanism is here
> https://libvirt.org/formatdomain.html#cpu-model-and-topology
> 



I have not done any of what you are asking so not exactly sure how to change those values, been looking and reading but not finding what I want so thought it might be better to just ask how to do what yo are asking. I did try CPU type EPYC and that did get past the error I am seeing on install

On Wed, 07 Apr 2021 13:00:23 -0000
David Ober <email address hidden> wrote:

> I have not done any of what you are asking so not exactly sure how to
> change those values, been looking and reading but not finding what I
> want so thought it might be better to just ask how to do what yo are
> asking.

see https://libvirt.org/formatdomain.html#cpu-model-and-topology
for the way to describe topology in domain xml.
Pick a real AMD CPU for cpu model you're are having problem with,
and use its config to define topology.

> I did try CPU type EPYC and that did get past the error I am
> seeing on install
So it works with EPYC but not with ECPY-Rome, then probably topology
is not issue.

CCing Babu,
who added EPYC-Rome cpu model, maybe he can help




I remember seeing something similar before. This was supposed to be fixed by the linux kernel commit.

commit 841c2be09fe4f495fe5224952a419bd8c7e5b455
Author: Maxim Levitsky <email address hidden>
Date:   Wed Jul 8 14:57:31 2020 +0300

kvm: x86: replace kvm_spec_ctrl_test_value with runtime test on the host

# git describe --contains 841c2be09fe4f495fe5224952a419bd8c7e5b455
v5.9-rc1~121^2~67

Problem seems to happen with EPYC-Rome model which exposes the feature STIBP but not IBRS.

Did you guys  try "-cpu host"? It might work.


Thanks Babu/Igor for chiming in!

@Babu
That exposed STIBP but not IBRS - isn't that what you tried to solve (for userspace) in qemu via a v2 for the Rome chips?
=> https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01020.html

I was recently pinging that, as it wasn't merged into the qemu 6.0-rc
Do you have any more insight why this is held back still?

If I might ask - how does the kernel fix you referenced interact with this proposed qemu change?
Assumptions (please correct me):
1. with the qemu change and using that Rome-v2 it would ask to expose both features and no more crash (even on unfixed kernels)
2. with the kernel fix it will no more crash, even with an unfixed qemu?

Finally I'm able to test on a Threadripper myself now.

Note: In regard to the commit that Babu identified - I'm on kernel 5.10.0-1020-oem so that patch would be applied already. I need to find an older kernel to retry with that as well

(on that new kernel) I did a full Win10 install and it worked fine for me.

In regard to CPU types (for comparison) I got

qemu 1:4.2-3ubuntu6.15 / libvirt 6.0.0-0ubuntu8.8:
    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-Rome</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='ssbd'/>
      <feature policy='require' name='xsaves'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='amd-ssbd'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='require' name='rdctl-no'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='mds-no'/>
      <feature policy='require' name='pschange-mc-no'/>
    </mode>

With a more recent qemu/libvirt it isn't much different for this chip (there recently were some Milan changes, but those seem not to matter for this chip).

qemu  1:5.2+dfsg-9ubuntu1 / libvirt 7.0.0-2ubuntu1

    <mode name='host-model' supported='yes'>
      <model fallback='forbid'>EPYC-Rome</model>
      <vendor>AMD</vendor>
      <feature policy='require' name='x2apic'/>
      <feature policy='require' name='tsc-deadline'/>
      <feature policy='require' name='hypervisor'/>
      <feature policy='require' name='tsc_adjust'/>
      <feature policy='require' name='stibp'/>
      <feature policy='require' name='arch-capabilities'/>
      <feature policy='require' name='ssbd'/>
      <feature policy='require' name='xsaves'/>
      <feature policy='require' name='cmp_legacy'/>
      <feature policy='require' name='invtsc'/>
      <feature policy='require' name='amd-ssbd'/>
      <feature policy='require' name='virt-ssbd'/>
      <feature policy='require' name='rdctl-no'/>
      <feature policy='require' name='skip-l1dfl-vmentry'/>
      <feature policy='require' name='mds-no'/>
      <feature policy='require' name='pschange-mc-no'/>
    </mode>


I wasn't able to crash this setup with an old (18.04) nor a new 21.04) Ubuntu guest.
Installing Win10 worked fine for a while and didn't break as reported. But the setup I have goes through triple ssh-tunnels and around the globe - that slows things down a lot :-/
This is how far I've got:
1. start up the install
2. select no license key -> custom install -> it started copying files
3. it goes into the first reboot

After this the latency kills me and virt-manager starts to abort the installation.
So far I did not hit (https://launchpadlibrarian.net/529734412/security.png) as reported by David.
@David - did this already pass the critical step for you, how early or late in the install did you hit the issues.


As I said I'll probably need to find an older kernel anyway (to be before the commit that Babu referenced)

@Christian,
Yes. This following patch fixes the problem
 https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg01020.html
 
I saw your ping on the patch. I am not sure why it is not picked up. I am going ping them today.

>If I might ask - how does the kernel fix you referenced interact with this proposed qemu change?
>Assumptions (please correct me):
Problem seem to happen when guest tries to access the SPEC_CTRL register to with the wrong settings. The kernel fix avoids writing those values and avoids #GP fault.
  
>1. with the qemu change and using that Rome-v2 it would ask to expose both features and no more crash (even >on unfixed kernels)
Yes. With Qemu patch EPYC-Rome v2 this issue will be fixed.

>2. with the kernel fix it will no more crash, even with an unfixed qemu?
Yes, That is correct. We need at lease one of these patches to fix this problem.



David used "5.6.0-1042.46-oem", the closest I had was "5.6.0-1052-oem" so I tried that one.

With that my win10 install immediately crashed into the reported issue.

So to summarize:
1. I can reproduce it
2. Chances are high that it is fixed by kernel commit 841c2be0 "kvm: x86: replace kvm_spec_ctrl_test_value with runtime test on the host"
3. there are some qemu changes which might be related, but we need Babu to reply about if/how those are related

I need to get myself updated on Ubuntu oem kernels.
If there is a 5.6 series that is supposed to work on that, then this patch needs to be backported.
But if OTOH it is a valid upgrade path that you'll get the 5.10.0-1020-oem that I had or later as part of your 20.04 OEM then that "is the fix" for you @David.





This change was made by a bot.

Thanks @Babu for the clarifications!
I really hope that the qemu patch makes it in v6.0 - then I can better consider picking it up as backport for qemu (already have a bug about that in bug 1921754 - therefore I'm setting the qemu task here as invalid)

The last step I can provide for the kernel bug that this one here is (before the rest of the work is with the kernel Team) is to verify/falsify if that also affects the non-oem linux-generic kernel.
There the latest was 5.4.0.71.74 from focal-proposed and the latest already released one is 5.4.0.70.73.

5.4.0.70.73 - failing
5.4.0.71.74 - failing

So while the almost-released oem kernel based on 5.10 will cover this - the patch should indeed also be backported to linux-generic and all the other flavours - otherwise Windows (and potentially more) will no more be usable as KVM guest on such Chips (threadrippers, but maybe more AMD chips that are not yet known as well)

The commit in question is marked for stable:

  commit 841c2be09fe4f495fe5224952a419bd8c7e5b455
  Author: Maxim Levitsky <email address hidden>
  Date:   Wed Jul 8 14:57:31 2020 +0300

    kvm: x86: replace kvm_spec_ctrl_test_value with runtime test on the host
    
    To avoid complex and in some cases incorrect logic in
    kvm_spec_ctrl_test_value, just try the guest's given value on the host
    processor instead, and if it doesn't #GP, allow the guest to set it.
    
    One such case is when host CPU supports STIBP mitigation
    but doesn't support IBRS (as is the case with some Zen2 AMD cpus),
    and in this case we were giving guest #GP when it tried to use STIBP
    
    The reason why can can do the host test is that IA32_SPEC_CTRL msr is
    passed to the guest, after the guest sets it to a non zero value
    for the first time (due to performance reasons),
    and as as result of this, it is pointless to emulate #GP condition on
    this first access, in a different way than what the host CPU does.
    
    This is based on a patch from Sean Christopherson, who suggested this idea.
    
    Fixes: 6441fa6178f5 ("KVM: x86: avoid incorrect writes to host MSR_IA32_SPEC_CTRL")
    Cc: <email address hidden>
    Suggested-by: Sean Christopherson <email address hidden>
    Signed-off-by: Maxim Levitsky <email address hidden>
    Message-Id: <email address hidden>
    Signed-off-by: Paolo Bonzini <email address hidden>

It appears to be in `v5.4.102` which is currently queued up for the cycle following the one just starting.

The patch for QEMU that has been mentioned in comment #38 has been merged already, so I'm marking this as Fix-Released there.