qemu-system-x86_64 crashed with signal 31 in __pthread_setaffinity_new()
Unable to launch Default Fedora 29 images in gnome-boxes
ProblemType: Crash
DistroRelease: Ubuntu 19.04
Package: qemu-system-x86 1:3.1+dfsg-2ubuntu1
ProcVersionSignature: Ubuntu 4.19.0-12.13-generic 4.19.18
Uname: Linux 4.19.0-12-generic x86_64
ApportVersion: 2.20.10-0ubuntu20
Architecture: amd64
Date: Thu Feb 14 11:00:45 2019
ExecutablePath: /usr/bin/qemu-system-x86_64
KvmCmdLine: COMMAND STAT EUID RUID PID PPID %CPU COMMAND
MachineType: Dell Inc. Precision T3610
ProcEnviron: PATH=(custom, user)
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.19.0-12-generic root=UUID=939b509b-d627-4642-a655-979b44972d17 ro splash quiet vt.handoff=1
Signal: 31
SourcePackage: qemu
StacktraceTop:
__pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7f5771fbf680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
() at /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
() at /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
start_thread (arg=<optimized out>) at pthread_create.c:486
clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Title: qemu-system-x86_64 crashed with signal 31 in __pthread_setaffinity_new()
UpgradeStatus: Upgraded to disco on 2018-11-14 (91 days ago)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo video
dmi.bios.date: 11/14/2018
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A18
dmi.board.name: 09M8Y8
dmi.board.vendor: Dell Inc.
dmi.board.version: A01
dmi.chassis.type: 7
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA18:bd11/14/2018:svnDellInc.:pnPrecisionT3610:pvr00:rvnDellInc.:rn09M8Y8:rvrA01:cvnDellInc.:ct7:cvr:
dmi.product.name: Precision T3610
dmi.product.sku: 05D2
dmi.product.version: 00
dmi.sys.vendor: Dell Inc.
StacktraceTop:
__pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7f5771fbf680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
?? () from /tmp/apport_sandbox_8_pwkx51/usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
?? ()
?? ()
?? ()
I can confirm the reported issue
Trace looks similar:
--- stack trace ---
#0 0x00007f1570fec0bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7f156d4e3680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
__arg2 = 128
_a3 = 139730004883072
_a1 = 22587
resultvar = <optimized out>
__arg3 = 139730004883072
__arg1 = 22587
_a2 = 128
pd = <optimized out>
res = <optimized out>
#1 0x00007f156dc8dc73 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
No symbol table info available.
#2 0x00007f156dc8d5d7 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
No symbol table info available.
#3 0x00007f1570fe1164 in start_thread (arg=<optimized out>) at pthread_create.c:486
ret = <optimized out>
pd = <optimized out>
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139730004887296, -2085932122569588158, 140733496626446, 140733496626447, 0, 139730004883520, 2100820740254843458, 2100830499542516290}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#4 0x00007f1570f09def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
--- source code stack trace ---
#0 0x00007f1570fec0bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7f156d4e3680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
[Error: pthread_setaffinity.c was not found in source tree]
#1 0x00007f156dc8dc73 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
#2 0x00007f156dc8d5d7 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
#3 0x00007f1570fe1164 in start_thread (arg=<optimized out>) at pthread_create.c:486
[Error: pthread_create.c was not found in source tree]
#4 0x00007f1570f09def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
[Error: clone.S was not found in source tree]
libvirt XML that was generated:
<domain type="kvm">
<name>fedora29-wor</name>
<uuid>2f4e83f7-18ed-45e2-bbf7-eef9f1c6c6c0</uuid>
<title>Fedora 29 Workstation</title>
<metadata>
<boxes:gnome-boxes xmlns:boxes="https://wiki.gnome.org/Apps/Boxes">
<os-state>live</os-state>
<media-id>http://fedoraproject.org/fedora/29:0</media-id>
<media>/home/paelzer/Fedora-Workstation-Live-x86_64-29-1.2.iso</media>
</boxes:gnome-boxes>
<libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
<libosinfo:os id="http://fedoraproject.org/fedora/29"/>
</libosinfo:libosinfo>
</metadata>
<memory unit="KiB">2097152</memory>
<currentMemory unit="KiB">2097152</currentMemory>
<vcpu placement="static">2</vcpu>
<os>
<type arch="x86_64" machine="pc-q35-3.1">hvm</type>
<boot dev="cdrom"/>
<boot dev="hd"/>
</os>
<features>
<acpi/>
<apic/>
</features>
<cpu mode="host-passthrough" check="none">
<topology sockets="1" cores="2" threads="1"/>
</cpu>
<clock offset="utc">
<timer name="rtc" tickpolicy="catchup"/>
<timer name="pit" tickpolicy="delay"/>
<timer name="hpet" present="no"/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>destroy</on_reboot>
<on_crash>destroy</on_crash>
<pm>
<suspend-to-mem enabled="no"/>
<suspend-to-disk enabled="no"/>
</pm>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type="file" device="disk">
<driver name="qemu" type="qcow2" cache="writeback"/>
<source file="/home/paelzer/.local/share/gnome-boxes/images/fedora29-wor"/>
<target dev="vda" bus="virtio"/>
<address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
</disk>
<disk type="file" device="cdrom">
<driver name="qemu" type="raw"/>
<source file="/home/paelzer/Fedora-Workstation-Live-x86_64-29-1.2.iso" startupPolicy="mandatory"/>
<target dev="hdc" bus="sata"/>
<readonly/>
<address type="drive" controller="0" bus="0" target="0" unit="2"/>
</disk>
<controller type="usb" index="0" model="ich9-ehci1">
<address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x7"/>
</controller>
<controller type="usb" index="0" model="ich9-uhci1">
<master startport="0"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x0" multifunction="on"/>
</controller>
<controller type="usb" index="0" model="ich9-uhci2">
<master startport="2"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x1"/>
</controller>
<controller type="usb" index="0" model="ich9-uhci3">
<master startport="4"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x2"/>
</controller>
<controller type="sata" index="0">
<address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
</controller>
<controller type="pci" index="0" model="pcie-root"/>
<controller type="pci" index="1" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="1" port="0x10"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
</controller>
<controller type="pci" index="2" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="2" port="0x11"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
</controller>
<controller type="pci" index="3" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="3" port="0x12"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
</controller>
<controller type="pci" index="4" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="4" port="0x13"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/>
</controller>
<controller type="pci" index="5" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="5" port="0x14"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/>
</controller>
<controller type="virtio-serial" index="0">
<address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
</controller>
<controller type="ccid" index="0">
<address type="usb" bus="0" port="1"/>
</controller>
<interface type="user">
<mac address="52:54:00:ee:17:af"/>
<model type="virtio"/>
<address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</interface>
<smartcard mode="passthrough" type="spicevmc">
<address type="ccid" controller="0" slot="0"/>
</smartcard>
<serial type="pty">
<target type="isa-serial" port="0">
<model name="isa-serial"/>
</target>
</serial>
<console type="pty">
<target type="serial" port="0"/>
</console>
<channel type="spicevmc">
<target type="virtio" name="com.redhat.spice.0"/>
<address type="virtio-serial" controller="0" bus="0" port="1"/>
</channel>
<channel type="spiceport">
<source channel="org.spice-space.webdav.0"/>
<target type="virtio" name="org.spice-space.webdav.0"/>
<address type="virtio-serial" controller="0" bus="0" port="2"/>
</channel>
<input type="tablet" bus="usb">
<address type="usb" bus="0" port="2"/>
</input>
<input type="mouse" bus="ps2"/>
<input type="keyboard" bus="ps2"/>
<graphics type="spice">
<listen type="none"/>
<image compression="off"/>
<gl enable="yes"/>
</graphics>
<sound model="ich9">
<address type="pci" domain="0x0000" bus="0x00" slot="0x1b" function="0x0"/>
</sound>
<video>
<model type="virtio" heads="1" primary="yes">
<acceleration accel3d="yes"/>
</model>
<address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0"/>
</video>
<redirdev bus="usb" type="spicevmc">
<address type="usb" bus="0" port="3"/>
</redirdev>
<redirdev bus="usb" type="spicevmc">
<address type="usb" bus="0" port="4"/>
</redirdev>
<redirdev bus="usb" type="spicevmc">
<address type="usb" bus="0" port="5"/>
</redirdev>
<redirdev bus="usb" type="spicevmc">
<address type="usb" bus="0" port="6"/>
</redirdev>
<memballoon model="virtio">
<address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</memballoon>
</devices>
</domain>
Interestingly, the Ubuntu 18.10 image works.
So is it really an attribute of the guest that breaks it?
BTW - Arr, why does it spawn its own libvirtd ?!
Dear gnome boxes what are you doing?
0 1000 21610 1 20 0 85807204 68912 poll_s SLl pts/2 0:00 /usr/lib/x86_64-linux-gnu/webkit2gtk-4.0/WebKitWebProcess 2 15
0 1000 21612 1 20 0 85772584 34132 poll_s SLl pts/2 0:00 /usr/lib/x86_64-linux-gnu/webkit2gtk-4.0/WebKitNetworkProcess 3 15
0 1000 21649 1 20 0 1391464 39144 poll_s Sl ? 0:00 /usr/sbin/libvirtd --timeout=30
Thanks to "lsof +fg -p" some important paths:
The guest log is in /home/paelzer/.cache/libvirt/qemu/log/ubuntu18.10.log
Control sockets are at
/run/user/1000/libvirt/libvirt-sock
/run/user/1000/libvirt/libvirt-admin-sock
Now let's try to poke at it without that UI around it ...
The following gets me to that libvirt without Boxes in between:
$ virsh -c qemu+unix:///session?socket=/run/user/1000/libvirt/libvirt-sock list --all
For now I'll assume that it does NOT depend on the guest, but let's modify the working Ubuntu guest step by step to become more like the F29 guest and we will see.
1. different disks/ISOs/MAC (obviously)
2. F29 has gl enabled on the spice graphics
3. video F29: virtio Ubuntu: qxl
4. video has <acceleration accel3d='yes'/> set
That is all the difference, so it seems 3d'ish to me.
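The comparison above was done by hand. A minimal sketch of how such a diff could be automated, assuming two domain XML strings are already at hand; the helper name `flatten` and the two shortened sample documents are illustrative only and not taken from the real guests:

```python
import xml.etree.ElementTree as ET

def flatten(domain_xml: str) -> set:
    """Collect (element path, sorted attributes) pairs for every element."""
    items = set()

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        items.add((here, tuple(sorted(elem.attrib.items()))))
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(domain_xml), "")
    return items

# Shortened, hypothetical stand-ins for the working and the broken guest.
working = """<domain><devices>
<graphics type="spice"><gl enable="no"/></graphics>
<video><model type="qxl" heads="1"/></video>
</devices></domain>"""
broken = """<domain><devices>
<graphics type="spice"><gl enable="yes"/></graphics>
<video><model type="virtio" heads="1"><acceleration accel3d="yes"/></model></video>
</devices></domain>"""

# Elements (with their attributes) that only the broken guest has.
only_broken = flatten(broken) - flatten(working)
for path, attrs in sorted(only_broken):
    print(path, dict(attrs))
```

This ignores element order and text content, which is enough to spot attribute-level differences like the ones listed above.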
First change
<model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
to
<model type='virtio' heads='1' primary='yes'>
=> still working
Second change enable gl
<gl enable='no'/>
to
<gl enable='yes'/>
=> Broken
Let's revert the first change and keep only the second.
=> still broken.
So it is the enablement of gl, which I have been working on recently anyway (some apparmor changes to make it work in my former setup).
Thanks for sharing this bug. I need to analyze in more depth what is wrong here, and that might take a while.
Note: Since your guest crashed on start the crash has no private data - marking the bug public ...
For the time being as a workaround:
virsh -c qemu+unix:///session?socket=/run/user/1000/libvirt/libvirt-sock edit fedora29-wor
(assuming that is your guest name as well)
and switch off the gl enablement.
That gives me a perfectly working guest; I hope that helps you for now until a real fix is found.
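If the workaround has to be applied to several guests, the edit can also be scripted. A minimal sketch, assuming the domain XML has been dumped to a string first (for example with virsh dumpxml); the function name `disable_spice_gl` and the shortened sample XML are hypothetical:

```python
import xml.etree.ElementTree as ET

def disable_spice_gl(domain_xml: str) -> str:
    """Return the domain XML with every <graphics><gl enable=.../> set to 'no'."""
    root = ET.fromstring(domain_xml)
    for gl in root.findall("./devices/graphics/gl"):
        gl.set("enable", "no")
    return ET.tostring(root, encoding="unicode")

# Shortened sample, not the full gnome-boxes domain.
sample = """<domain type="kvm">
  <devices>
    <graphics type="spice">
      <listen type="none"/>
      <gl enable="yes"/>
    </graphics>
  </devices>
</domain>"""

patched = disable_spice_gl(sample)
print(patched)
```

Note that ElementTree rewrites namespace prefixes such as the boxes: and libosinfo: ones in <metadata> unless they are registered with ET.register_namespace, so for a real guest the interactive virsh edit is the safer route.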
FTR: this guest XML (not out of gnome-boxes) works on the very same Host system.
This runs qxl + gl=yes as well and does not fail.
We need to find what the difference between those two is as well.
<domain type='kvm'>
<name>ubuntu18.04</name>
<uuid>2f6bde7c-1d3d-498a-b96c-8920f165fa4c</uuid>
<metadata>
<libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
<libosinfo:os id="http://ubuntu.com/ubuntu/18.04"/>
</libosinfo:libosinfo>
</metadata>
<memory unit='KiB'>2097152</memory>
<currentMemory unit='KiB'>2097152</currentMemory>
<vcpu placement='static'>2</vcpu>
<os>
<type arch='x86_64' machine='pc-q35-3.1'>hvm</type>
<boot dev='hd'/>
</os>
<features>
<acpi/>
<apic/>
<vmport state='off'/>
</features>
<cpu mode='host-model' check='partial'>
<model fallback='allow'/>
</cpu>
<clock offset='utc'>
<timer name='rtc' tickpolicy='catchup'/>
<timer name='pit' tickpolicy='delay'/>
<timer name='hpet' present='no'/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<pm>
<suspend-to-mem enabled='no'/>
<suspend-to-disk enabled='no'/>
</pm>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type='file' device='disk'>
<driver name='qemu' type='qcow2'/>
<source file='/var/lib/libvirt/images/ubuntu18.04.qcow2'/>
<target dev='vda' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
</disk>
<disk type='file' device='cdrom'>
<driver name='qemu' type='raw'/>
<target dev='sda' bus='sata'/>
<readonly/>
<address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>
<controller type='usb' index='0' model='ich9-ehci1'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x7'/>
</controller>
<controller type='usb' index='0' model='ich9-uhci1'>
<master startport='0'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x0' multifunction='on'/>
</controller>
<controller type='usb' index='0' model='ich9-uhci2'>
<master startport='2'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x1'/>
</controller>
<controller type='usb' index='0' model='ich9-uhci3'>
<master startport='4'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x2'/>
</controller>
<controller type='sata' index='0'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
</controller>
<controller type='pci' index='0' model='pcie-root'/>
<controller type='pci' index='1' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='1' port='0x10'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
</controller>
<controller type='pci' index='2' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='2' port='0x11'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
</controller>
<controller type='pci' index='3' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='3' port='0x12'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
</controller>
<controller type='pci' index='4' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='4' port='0x13'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
</controller>
<controller type='pci' index='5' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='5' port='0x14'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
</controller>
<controller type='pci' index='6' model='pcie-root-port'>
<model name='pcie-root-port'/>
<target chassis='6' port='0x15'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
</controller>
<controller type='virtio-serial' index='0'>
<address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
</controller>
<interface type='network'>
<mac address='52:54:00:8c:31:fc'/>
<source network='default'/>
<model type='virtio'/>
<address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
</interface>
<serial type='pty'>
<target type='isa-serial' port='0'>
<model name='isa-serial'/>
</target>
</serial>
<console type='pty'>
<target type='serial' port='0'/>
</console>
<channel type='unix'>
<target type='virtio' name='org.qemu.guest_agent.0'/>
<address type='virtio-serial' controller='0' bus='0' port='1'/>
</channel>
<channel type='spicevmc'>
<target type='virtio' name='com.redhat.spice.0'/>
<address type='virtio-serial' controller='0' bus='0' port='2'/>
</channel>
<input type='tablet' bus='usb'>
<address type='usb' bus='0' port='1'/>
</input>
<input type='mouse' bus='ps2'/>
<input type='keyboard' bus='ps2'/>
<graphics type='spice'>
<listen type='none'/>
<image compression='off'/>
<gl enable='yes'/>
</graphics>
<sound model='ich9'>
<address type='pci' domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/>
</sound>
<video>
<model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
</video>
<redirdev bus='usb' type='spicevmc'>
<address type='usb' bus='0' port='2'/>
</redirdev>
<redirdev bus='usb' type='spicevmc'>
<address type='usb' bus='0' port='3'/>
</redirdev>
<memballoon model='virtio'>
<address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
</memballoon>
<rng model='virtio'>
<backend model='random'>/dev/urandom</backend>
<address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
</rng>
</devices>
</domain>
P.S. I'm on a trip next week so further response might take a while, sorry
Since my domain ran gl fine I was eliminating more differences one by one, keeping <gl enable='yes'/> to check if there is a second ingredient needed.
- do not set acceleration on virtio video dev
- machine type q35 -> i440fx (and all pcie->pci that comes with that)
- 1 instead of 4 vcpus
- no host passthrough
- no boot from CD
- add pae feature
- remove rtc/pit/hpet clock attributes
- usb ich9-[eu]hci1 -> piix3-uhci
- no smartcard entry
- no usb tablet
- use cirrus video card
- virtio channel
- no PM config
- console virtio serial
- no soundcard
- reduce memory
None of it makes it work, but the files are nearly identical now
That left only the actual disk+iso of fedora vs ubuntu cloudimg based qcow and that the boxes VM used userspace networking. Still the issue remained.
But I realized there is one more difference, the Boxes VM runs in user context while mine is a system level VM (qemu:///system) running the gl essentially headless until one connects to the local spice port.
But the gnome boxes VM was having the UI up immediately connecting to it once available.
So I defined the XML of the gnome-boxes VM in my qemu:///system libvirt context (I copied the files to /var/lib/libvirt/images and adapted the paths).
As expected, this makes it work, which is at least some lead to follow.
I can make the viewers (virt-viewer / virt-manager) crash when attaching to it semi-remotely - but that might be a broken setup for a local only spice definition.
When attaching viewers locally it works just fine.
In none of those cases does qemu crash, so it clearly isn't the same issue. Both fail with some glib errors, which makes sense since I try to use local-only features remotely (through ssh).
So to summarize:
- crash with gl enabled
- only triggers if run in user context
- gl works in system context (local viewers can attach and it works)
I'm out of obvious "change the config to check what it is" options.
But since it is at least reproducible I'll focus on the qemu backtrace itself next ...
Stack trace with slightly more info, as all debug info and source is installed here.
--- stack trace ---
#0 0x00007f2325ae00bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7f2321fe5680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
__arg2 = 128
_a3 = 139788870899328
_a1 = 17325
resultvar = <optimized out>
__arg3 = 139788870899328
__arg1 = 17325
_a2 = 128
pd = <optimized out>
res = <optimized out>
#1 0x00007f23227abd83 in util_queue_thread_func (input=input@entry=0x55a59a695bd0) at ../src/util/u_queue.c:252
cpuset = {__bits = {18446744073709551615 <repeats 16 times>}}
queue = 0x55a59a8952d0
thread_index = 0
__PRETTY_FUNCTION__ = "util_queue_thread_func"
#2 0x00007f23227ab6e7 in impl_thrd_routine (p=<optimized out>) at ../src/../include/c11/threads_posix.h:87
pack = {func = 0x7f23227aba70 <util_queue_thread_func>, arg = 0x55a59a695bd0}
#3 0x00007f2325ad5164 in start_thread (arg=<optimized out>) at pthread_create.c:486
ret = <optimized out>
pd = <optimized out>
now = <optimized out>
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139788870903552, 9195723382052266688, 140723610455422, 140723610455423, 0, 139788870899776, -9089523756422225216, -9089514281776799040}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = <optimized out>
#4 0x00007f23259fddef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
No locals.
--- source code stack trace ---
#0 0x00007f2325ae00bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7f2321fe5680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
[Error: pthread_setaffinity.c was not found in source tree]
#1 0x00007f23227abd83 in util_queue_thread_func (input=input@entry=0x55a59a695bd0) at ../src/util/u_queue.c:252
[Error: u_queue.c was not found in source tree]
#2 0x00007f23227ab6e7 in impl_thrd_routine (p=<optimized out>) at ../src/../include/c11/threads_posix.h:87
[Error: threads_posix.h was not found in source tree]
#3 0x00007f2325ad5164 in start_thread (arg=<optimized out>) at pthread_create.c:486
[Error: pthread_create.c was not found in source tree]
#4 0x00007f23259fddef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
[Error: clone.S was not found in source tree]
Eventually it is a "Program terminated with signal SIGSYS, Bad system call".
So we need to find what is bad about it.
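Before looking at the signal itself, the cpuset shown in frame #1 can be decoded by hand. A small sketch of the arithmetic, assuming 64-bit mask words as on x86_64; the constants are copied from the trace above:

```python
CPUSETSIZE_BYTES = 128        # cpusetsize argument in frame #0
WORD = 18446744073709551615   # value of each __bits entry in frame #1
WORDS = 16                    # "<repeats 16 times>"

bits_per_word = WORD.bit_length()               # width of one mask word
total_bits = WORDS * bits_per_word              # size of the whole mask in bits
cpus_requested = WORDS * bin(WORD).count("1")   # number of bits that are set

print(bits_per_word, total_bits, cpus_requested)
```

Every bit of the 128-byte mask is set, so the thread is asking to be allowed on every CPU the mask can represent.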
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7f2321fe6700 (LWP 17325) 0x00007f2325ae00bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7f2321fe5680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
2 Thread 0x7f2323ad3500 (LWP 17322) 0x00007f2326fe0fb7 in dri_bind_extensions (dri=dri@entry=0x55a59a7583e0, matches=matches@entry=0x7f2326fec340 <dri_core_extensions>, extensions=<optimized out>) at ../src/gbm/backends/dri/gbm_dri.c:286
3 Thread 0x7f2323acf700 (LWP 17323) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
A discussion with the kernel team pointed to seccomp at first:
...
<apw> grep it appears that seccomp is the only thing which triggers that signal
The stack in the breaking cases uses this by default
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
resourcecontrol is defined as:
"Disable process affinity and schedular priority"
Interestingly that is the global default; the qemu:///system qemu also runs with the same.
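To check quickly whether a given qemu command line (for example from the guest log) blocks affinity changes, the option string can be split apart. A simplified sketch, not QEMU's own option parser; the function name `parse_sandbox` is hypothetical:

```python
def parse_sandbox(optarg: str) -> dict:
    """Split a -sandbox argument into {'enabled': bool, key: value, ...}."""
    first, *rest = optarg.split(",")
    opts = {"enabled": first == "on"}
    for item in rest:
        key, _, value = item.partition("=")
        opts[key] = value
    return opts

opts = parse_sandbox(
    "on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny"
)
# Affinity syscalls are filtered only if the sandbox is on and
# resourcecontrol is set to deny.
affinity_blocked = opts["enabled"] and opts.get("resourcecontrol") == "deny"
print(affinity_blocked)
```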
I'd assume that:
libgl1-mesa-dri:amd64: /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
behaves differently depending on whether it is in a local UI session or not.
And it gets punished as soon as it tries to set the affinity, which it might only do in that case.
Implemented by
- https://git.qemu.org/?p=qemu.git;a=commit;h=24f8cdc5722476e12d8e39d71f66311b4fa971c1
Similar issue being fixed last year
- https://git.qemu.org/?p=qemu.git;a=commit;h=056de1e894155fbb99e7b43c1c4382d4920cf437
Libvirt has no means to fine-control it (yet), only to switch the whole feature of sandboxing on/off.
That matches what we see - it fails on init when spawning threads - most likely there it will set the affinity.
From Ubuntu's POV this is rather new as the code in Mesa came in with the fresh 18.3.0_rc4-1
It is possible that no one else saw it so far ...
It is in mesa upstream since
https://github.com/mesa3d/mesa/commit/d877451b48a59ab0f9a4210fc736f51da5851c9a
But opinions might differ ...
I'll subscribe upstream qemu to this bug and then post a summary here.
This will mirror the bug updates to the Mailing List; if there is no harsh feedback I'll propose a patch to remove sched_setaffinity from the list of blocked calls.
Summary:
- qemu crash when using GL
- "sched_setaffinity" is the syscall that is seccomp blocked and kills qemu
- the mesa i965 drivers (and your radeon as well) will do that call
- it is blocked by the current qemu -sandbox on,...,resourcecontrol=deny which is libvirt's default
- Implemented by qemu 24f8cdc572
- Similar issue being fixed last year qemu 056de1e894
- new code in mesa 18.3 since mesa d877451b48
I think we just need to allow sched_setaffinity with these new mesa drivers in the wild.
The alternative, detecting gl usage in libvirt and only then allowing resourcecontrol, IMHO seems over-engineered (it needs internals to actually pass on which seccomp subsets have to be switched) and not better (more syscalls will be unblocked then, as the -sandbox seccomp interface isn't fine-grained).
OTOH the man page literally says "... Disable process affinity ...", so I'm not sure we can just remove it. Maybe split resourcecontrol in two, put *affinity* in the new one and make its default not blocked, so that upper layers like libvirt will work until one explicitly states ... -sandbox on,affinity=on which no one wanting to use GL would do. That again seems too much.
Well, the discussion will happen either here on the ML/bug or later when submitting an RFC for it.
IMHO that mesa change is not valid. It is setting its affinity to run on all threads which is definitely *NOT* something we want to be allowed. Management applications want to control which CPUs QEMU runs on, and as such Mesa should honour the CPU placement that the QEMU process has.
This is a great example of why QEMU wants to use seccomp to block affinity changes to prevent something silently trying to use more CPUs than are assigned to this QEMU.
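What "honouring the CPU placement" could look like is sketched below in Python rather than in Mesa's C code. This is an illustration of the idea only, not the Mesa fix; the helper name `pick_worker_cpus` is hypothetical, and os.sched_getaffinity is Linux-only:

```python
import os

def pick_worker_cpus(wanted, allowed=None):
    """Return the CPUs a worker may use without widening the inherited mask."""
    if allowed is None:
        allowed = os.sched_getaffinity(0)  # mask inherited from the process
    chosen = set(wanted) & set(allowed)
    # Fall back to the inherited mask if the wish list does not overlap it.
    return chosen or set(allowed)

# A process that management software pinned to CPUs 2 and 3:
print(sorted(pick_worker_cpus(range(1024), allowed={2, 3})))
```

Asking for "all CPUs" then collapses to the CPUs that were assigned, so nothing ever needs more than the process already has.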
(I reported that issue a few days ago too: https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg06066.html)
Perhaps we can teach mesa to not change CPU affinity (some option, or environment variable, or seccomp check).
Daniel, when virgl/mesa will be running in a separate process (thanks to vhost-user-gpu), I suppose the rendering process will be free to change the CPU affinity. Does that make a difference if mesa thread is in qemu or a separate process, in this case?
As & when libvirt & QEMU support the external vhost processes for this I expect they will still restrict the CPU affinity and apply seccomp filters that are likely to be at least as strict as they are today.
I did wonder if we could set the action for some syscalls to be "errno" instead of "kill process", but I worry that could then result in silent mis-behaviour as processes fail to check return value as they blindly assume the call cannot fail.
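The concern about unchecked return values can be illustrated with a caller that does check. A minimal sketch, assuming a filter that returns EPERM instead of killing the process; the filter is simulated here by the hypothetical `denied` stand-in, and `try_set_affinity` is likewise an illustrative name:

```python
import os

def try_set_affinity(cpus, setter=None):
    """Try to set the caller's affinity and report failure instead of assuming success."""
    if setter is None:
        setter = os.sched_setaffinity  # Linux-only
    try:
        setter(0, cpus)
        return True
    except OSError as exc:
        print(f"affinity change refused: {exc}")
        return False

def denied(pid, cpus):
    # Stand-in for a seccomp filter whose action is "return EPERM".
    raise PermissionError(1, "Operation not permitted")

print(try_set_affinity({0}, setter=denied))
```

Code that skips the error check would carry on as if the affinity had changed, which is the silent misbehaviour described above.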
We should probably talk with mesa developers about providing a config option to prevent this affinity change. An env variable is workable if there's no other mechanism they can expose.
See also mesa bug:
https://bugs.freedesktop.org/show_bug.cgi?id=109695
Thanks Daniel and MarcAndre for chiming in here.
After thinking more about it I agree with Daniel that mesa should actually honor and stick with its affinity assignment.
For documentation purpose: the solution proposed on the ML is at https://lists.freedesktop.org/archives/mesa-dev/2019-February/215926.html
I also added a bug tracker to the freedesktop bug as task.
@Ubuntu-Desktop Team (now subscribed) - is there a chance we can revert [1] in mesa for now, before it is released with Disco? That would be needed until an accepted solution throughout the stack of libvirt/qemu/mesa is found.
Otherwise using GL backed qemu graphics will fail as outlined in the bug.
Once such a cross-package solution to the problem is found we can (if needed at all) SRU back the set of changes to all components required.
[1]: https://github.com/mesa3d/mesa/commit/d877451b48a59ab0f9a4210fc736f51da5851c9a
Adding Timo who maintains mesa.
Since upgrading Mesa from 18.2 to 18.3, launching a QEMU virtual machine with Spice OpenGL enabled (for virgl), causes QEMU to crash with SIGSYS inside the radeonsi driver. The reason for this is that the QEMU sandbox option 'resourcecontrol=deny' disables the sched_setaffinity syscall called in pthread_setaffinity_np, which is now used by the radeonsi driver.
A simple way to reproduce this problem is:
$ gdb --batch --ex run --ex bt --args qemu-system-x86_64 -spice gl=on -sandbox on,resourcecontrol=deny
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff45aa700 (LWP 23432)]
[New Thread 0x7ffff08e5700 (LWP 23433)]
[New Thread 0x7fffe3fff700 (LWP 23434)]
[New Thread 0x7fffe37fe700 (LWP 23435)]
Thread 4 "qemu-system-x86" received signal SIGSYS, Bad system call.
[Switching to Thread 0x7fffe3fff700 (LWP 23434)]
0x00007ffff68cc9cf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7fffe3ffe680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
34 ../sysdeps/unix/sysv/linux/pthread_setaffinity.c: No such file or directory.
#0 0x00007ffff68cc9cf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7fffe3ffe680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
#1 0x00007ffff12ba2b3 in util_queue_thread_func (input=input@entry=0x55555640b1f0) at ../src/util/u_queue.c:252
#2 0x00007ffff12b9c17 in impl_thrd_routine (p=<optimized out>) at ../src/../include/c11/threads_posix.h:87
#3 0x00007ffff68c1fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#4 0x00007ffff67f280f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
The problematic code at src/util/u_queue.c:252 was added in the following commit:
commit d877451b48a59ab0f9a4210fc736f51da5851c9a
Author: Marek Olšák <email address hidden>
Date: Mon Oct 1 15:51:06 2018 -0400
util/u_queue: add UTIL_QUEUE_INIT_SET_FULL_THREAD_AFFINITY
Initial version discussed with Rob Clark under a different patch name.
This approach leaves his driver unaffected.
Since setting the thread affinity seems non-essential here, the failing syscall should be handled gracefully, for example by setting a signal handler to ignore the SIGSYS signal.
Mesa needs a way to query that it can't set thread affinity.
To check for the availability of the syscall, one can try it in a child process and see if the child is terminated by a signal, e.g. like this:
#include <stdbool.h>
#include <unistd.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <sys/wait.h>

static bool
can_set_affinity()
{
   pid_t pid = fork();
   int status = 0;

   if (!pid) {
      /* Disable coredumps, because a SIGSYS crash is expected. */
      struct rlimit limit = { 0 };
      limit.rlim_cur = 1;
      limit.rlim_max = 1;
      setrlimit(RLIMIT_CORE, &limit);

      /* Test the syscall in the child process. */
      syscall(SYS_sched_setaffinity, 0, 0, 0);
      _exit(0);
   } else if (pid < 0) {
      return false;
   }

   if (waitpid(pid, &status, 0) < 0) {
      return false;
   }

   if (WIFSIGNALED(status)) {
      /* The child process was terminated by a signal,
       * thus the syscall cannot be used.
       */
      return false;
   }

   return true;
}
(In reply to Ahzo from comment #2)
> To check for the availability of the syscall, one can try it in a child
> process and see if the child is terminated by a signal, e.g. like this:
Afraid not, QEMU's seccomp filter blocks use of fork() too :-)
(In reply to Ahzo from comment #0)
> The problematic code at src/util/u_queue.c:252 was added in the following
> commit:
> commit d877451b48a59ab0f9a4210fc736f51da5851c9a
> Author: Marek Olšák <email address hidden>
> Date: Mon Oct 1 15:51:06 2018 -0400
>
> util/u_queue: add UTIL_QUEUE_INIT_SET_FULL_THREAD_AFFINITY
>
> Initial version discussed with Rob Clark under a different patch name.
> This approach leaves his driver unaffected.
>
>
> Since setting the thread affinity seems non-essential here, the failing
> syscall should be handled gracefully, for example by setting a signal
> handler to ignore the SIGSYS signal.
I'm curious what motivated this change to start with ? Even if QEMU was not enforcing seccomp filters, I think I'd consider it a bug for mesa to be setting its process affinity in this way. The mgmt application or sysadmin has decided that the process must have a certain affinity, based on how it/they want the host CPUs utilized. Why is mesa wanting to override this administrative policy decision to restrict CPU usage ?
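As a rough sketch of what respecting the administrative policy could mean in code: a library could intersect the CPU set it would like with the set the thread is currently allowed to run on, and only ever narrow within that. The helper name clamp_to_current_affinity is hypothetical, and the sketch assumes the current mask still reflects what the admin or mgmt application configured. Applying the result would still need sched_setaffinity, so this does not avoid the seccomp filter by itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical helper: store in *result the CPUs that are both wanted by
 * the caller and allowed by the calling thread's current affinity mask.
 * Returns false if the current mask cannot be queried or if the
 * intersection is empty, in which case the caller should change nothing. */
static bool
clamp_to_current_affinity(const cpu_set_t *wanted, cpu_set_t *result)
{
   cpu_set_t current;

   assert(wanted != NULL && result != NULL);

   /* pid 0 means the calling thread. */
   if (sched_getaffinity(0, sizeof(current), &current) != 0)
      return false;

   CPU_AND(result, wanted, &current);
   return CPU_COUNT(result) > 0;
}
```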
(In reply to Daniel P. Berrange from comment #4)
>
> I'm curious what motivated this change to start with ? Even if QEMU was not
> enforcing seccomp filters, I think I'd consider it a bug for mesa to be
> setting its process affinity in this way. The mgmt application or sysadmin
> has decided that the process must have a certain affinity, based on how
> it/they want the host CPUs utilized. Why is mesa wanting to override this
> administrative policy decision to restrict CPU usage ?
To improve performance on modern multi-core NUMA architectures.
Sent a quick RFC for an env variable workaround on the ML "[PATCH] RFC: Workaround for pthread_setaffinity_np() seccomp filtering".
(In reply to Daniel P. Berrange from comment #4)
> I'm curious what motivated this change to start with ? Even if QEMU was not
> enforcing seccomp filters, I think I'd consider it a bug for mesa to be
> setting its process affinity in this way. The mgmt application or sysadmin
> has decided that the process must have a certain affinity, based on how
> it/they want the host CPUs utilized. Why is mesa wanting to override this
> administrative policy decision to restrict CPU usage ?
The correct solution is to fix pthread_setaffinity such that it returns an error code instead of crashing.
An even better solution would be to have a virtual thread affinity that only the application can see and change, which should be silently masked by administrative policies not visible to the application.
(In reply to Marek Olšák from comment #7)
> An even better solution would be to have a virtual thread affinity that only
> the application can see and change, which should be silently masked by
> administrative policies not visible to the application.
Mesa doesn't really need explicit thread affinity at all. All it wants is that certain sets of threads run on the same CPU module; it doesn't care which particular CPU module that is. What's really needed is an API to express this affinity between threads, instead of to specific CPU cores.
(In reply to Daniel P. Berrange from comment #3)
> (In reply to Ahzo from comment #2)
> > To check for the availability of the syscall, one can try it in a child
> > process and see if the child is terminated by a signal, e.g. like this:
>
> Afraid not, QEMU's seccomp filter blocks use of fork() too :-)
Maybe it should, at least when using the spawn=deny option, but currently it doesn't. That option only blocks the fork, vfork and execve syscalls, but glibc's fork() function uses the clone syscall, and thus continues to work.
However, that behavior might be different when using other C library implementations, so it wouldn't be correct to rely on this.
One could use clone() instead of fork(), but future versions of qemu might block the clone syscall, as well.
Unfortunately, I'm not aware of a proper solution for this bug short of adding a new API to the kernel.
You can test 19.0~rc6 with this reverted on a ppa:
ppa:canonical-x/x-staging
should be built in 30min
Hi Timo,
I tried to test with the mesa from ppa:canonical-x/x-staging
But there is a dependency issue in that PPA - I can't install all packages from there.
It seems most of the X* packages will need a transition for the new mesa and those are not in this ppa right now.
Installing all that I can from the PPA doesn't resolve the issue, is there something more you need to upload to the PPA - or are there other things I'd need to do to install all of mesa?
This is the current mix of rc5/6 it gave me :-/
libegl-mesa0:amd64 19.0.0~rc5-1ubuntu0.1
libegl1-mesa:amd64 19.0.0~rc6-1ubuntu0.1
libgl1-mesa-dri:amd64 19.0.0~rc5-1ubuntu0.1
libgl1-mesa-glx:amd64 19.0.0~rc6-1ubuntu0.1
libglapi-mesa:amd64 19.0.0~rc5-1ubuntu0.1
libglx-mesa0:amd64 19.0.0~rc5-1ubuntu0.1
libwayland-egl1-mesa:amd64 19.0.0~rc6-1ubuntu0.1
mesa-va-drivers:amd64 19.0.0~rc5-1ubuntu0.1
mesa-vdpau-drivers:amd64 19.0.0~rc5-1ubuntu0.1
I don't have that issue on a chroot, so you should at least tell me why it would refuse to upgrade them all.. apt should show an error
The PPA was built against -proposed so I had to enable that to install all libs.
That done the 19.0.0~rc6-1ubuntu0.1 with the set affinity change reverted works quite nicely.
It would be great to get that into Ubuntu 19.04 until the involved upstreams agree on how to proceed; we can then sort out what to change in which package. That may well end up being after the cutoff and land in 19.10 instead.
Thanks Timo, let me know if you need another verification on this at any point to drive it into 19.04.
We're getting down to just a few bugs blocking 19.0, so I'm pinging those bugs to see what the progress is.
I'm removing this from the 19.0 blocking tracker. Generally we don't add bugs to block a release if they were present in the previous release; additionally, there doesn't seem to be any consensus on a solution at this moment. If there is a fix implemented I'd be happy to pull that into a later 19.0 release.
This bug was fixed in the package mesa - 19.0.0-1ubuntu1
---------------
mesa (19.0.0-1ubuntu1) disco; urgency=medium
* Merge from Debian. (LP: #1818516)
* revert-set-full-thread-affinity.diff: Fix qemu crash. (LP: #1815889)
-- Timo Aaltonen <email address hidden> Thu, 14 Mar 2019 18:48:18 +0200
(In reply to Michel Dänzer from comment #8)
> Mesa doesn't really need explicit thread affinity at all. All it wants is
> that certain sets of threads run on the same CPU module; it doesn't care
> which particular CPU module that is. What's really needed is an API to
> express this affinity between threads, instead of to specific CPU cores.
I think the thread affinity API is a correct way to optimize for CPU cache topologies. pthread is a basic user API. Security policies shouldn't disallow pthread functions.
FYI the QEMU change merged in the following pull request changed to return an EPERM errno for the thread affinity syscalls:
commit 12f067cc14b90aef60b2b7d03e1df74cc50a0459
Merge: 84bdc58c06 035121d23a
Author: Peter Maydell <email address hidden>
Date: Thu Mar 28 12:04:52 2019 +0000
Merge remote-tracking branch 'remotes/otubo/tags/pull-seccomp-20190327' into staging
pull-seccomp-20190327
# gpg: Signature made Wed 27 Mar 2019 12:12:39 GMT
# gpg: using RSA key DF32E7C0F0FFF9A2
# gpg: Good signature from "Eduardo Otubo (Senior Software Engineer) <email address hidden>" [full]
# Primary key fingerprint: D67E 1B50 9374 86B4 0723 DBAB DF32 E7C0 F0FF F9A2
* remotes/otubo/tags/pull-seccomp-20190327:
seccomp: report more useful errors from seccomp
seccomp: don't kill process for resource control syscalls
Signed-off-by: Peter Maydell <email address hidden>
IOW, mesa's usage of these syscalls will still be blocked, but it will no longer kill the process.
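With the filter returning an errno, a caller that checks the return value can simply carry on with its existing affinity. A minimal sketch of that pattern follows; try_set_affinity is a hypothetical name, not mesa's actual code, and it uses sched_setaffinity with pid 0, which applies to the calling thread.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try to apply cpuset to the calling thread.
 * Returns 0 on success or the errno value on failure (for example
 * EPERM when a seccomp filter rejects the call). Failure is treated
 * as non-fatal: the thread keeps whatever affinity it already had. */
static int
try_set_affinity(const cpu_set_t *cpuset)
{
   int err;

   assert(cpuset != NULL);

   if (sched_setaffinity(0, sizeof(*cpuset), cpuset) == 0)
      return 0;

   err = errno;
   fprintf(stderr, "thread affinity not changed: %s\n", strerror(err));
   return err;
}
```

Treating the failure as a soft error fits the observation earlier in the thread that setting the affinity seems non-essential for correctness.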
Thank you Daniel,
we will most likely keep Disco as-is for now and merge this in 19.10, where mesa can then drop the revert. I tagged it for 19.10 to be revisited.
This problem was solved by qemu [1], so this mesa bug can be closed.
[1] https://git.qemu.org/git/qemu.git/?a=commitdiff;h=9a1565a03b79d80b236bc7cc2dbce52a2ef3a1b8
Reopening/Assigning to Timo for eoan since there is a patch which can be dropped once qemu is fixed
I believe this was fixed by qemu 4.0 in eoan.
This bug was fixed in the package mesa - 19.2.4-1ubuntu1
---------------
mesa (19.2.4-1ubuntu1) focal; urgency=medium
* Merge from Debian.
* revert-set-full-thread-affinity.diff: Dropped, qemu is fixed now in
eoan and up. (LP: #1815889)
-- Timo Aaltonen <email address hidden> Wed, 20 Nov 2019 20:17:00 +0200