Diffstat (limited to 'results/classifier/zero-shot/108/performance/1815889')
| -rw-r--r-- | results/classifier/zero-shot/108/performance/1815889 | 929 |
1 files changed, 929 insertions, 0 deletions
diff --git a/results/classifier/zero-shot/108/performance/1815889 b/results/classifier/zero-shot/108/performance/1815889
new file mode 100644
index 000000000..fc7a8830d
--- /dev/null
+++ b/results/classifier/zero-shot/108/performance/1815889
@@ -0,0 +1,929 @@
+performance: 0.957
+device: 0.957
+permissions: 0.950
+debug: 0.950
+semantic: 0.949
+other: 0.940
+graphic: 0.939
+PID: 0.923
+socket: 0.921
+files: 0.918
+boot: 0.908
+network: 0.903
+vnc: 0.887
+KVM: 0.858
+
+qemu-system-x86_64 crashed with signal 31 in __pthread_setaffinity_new()
+
+Unable to launch Default Fedora 29 images in gnome-boxes
+
+ProblemType: Crash
+DistroRelease: Ubuntu 19.04
+Package: qemu-system-x86 1:3.1+dfsg-2ubuntu1
+ProcVersionSignature: Ubuntu 4.19.0-12.13-generic 4.19.18
+Uname: Linux 4.19.0-12-generic x86_64
+ApportVersion: 2.20.10-0ubuntu20
+Architecture: amd64
+Date: Thu Feb 14 11:00:45 2019
+ExecutablePath: /usr/bin/qemu-system-x86_64
+KvmCmdLine: COMMAND STAT EUID RUID PID PPID %CPU COMMAND
+MachineType: Dell Inc. Precision T3610
+ProcEnviron: PATH=(custom, user)
+ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.19.0-12-generic root=UUID=939b509b-d627-4642-a655-979b44972d17 ro splash quiet vt.handoff=1
+Signal: 31
+SourcePackage: qemu
+StacktraceTop:
+ __pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7f5771fbf680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34
+ () at /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
+ () at /usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so
+ start_thread (arg=<optimized out>) at pthread_create.c:486
+ clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
+Title: qemu-system-x86_64 crashed with signal 31 in __pthread_setaffinity_new()
+UpgradeStatus: Upgraded to disco on 2018-11-14 (91 days ago)
+UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo video
+dmi.bios.date: 11/14/2018
+dmi.bios.vendor: Dell Inc.
+dmi.bios.version: A18
+dmi.board.name: 09M8Y8
+dmi.board.vendor: Dell Inc.
+dmi.board.version: A01 +dmi.chassis.type: 7 +dmi.chassis.vendor: Dell Inc. +dmi.modalias: dmi:bvnDellInc.:bvrA18:bd11/14/2018:svnDellInc.:pnPrecisionT3610:pvr00:rvnDellInc.:rn09M8Y8:rvrA01:cvnDellInc.:ct7:cvr: +dmi.product.name: Precision T3610 +dmi.product.sku: 05D2 +dmi.product.version: 00 +dmi.sys.vendor: Dell Inc. + + + +StacktraceTop: + __pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7f5771fbf680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34 + ?? () from /tmp/apport_sandbox_8_pwkx51/usr/lib/x86_64-linux-gnu/dri/radeonsi_dri.so + ?? () + ?? () + ?? () + + + + + + + + +I can confirm the reported issue + +Trace looks similar: +--- stack trace --- +#0 0x00007f1570fec0bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7f156d4e3680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34 + __arg2 = 128 + _a3 = 139730004883072 + _a1 = 22587 + resultvar = <optimized out> + __arg3 = 139730004883072 + __arg1 = 22587 + _a2 = 128 + pd = <optimized out> + res = <optimized out> +#1 0x00007f156dc8dc73 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so +No symbol table info available. +#2 0x00007f156dc8d5d7 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so +No symbol table info available. +#3 0x00007f1570fe1164 in start_thread (arg=<optimized out>) at pthread_create.c:486 + ret = <optimized out> + pd = <optimized out> + now = <optimized out> + unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139730004887296, -2085932122569588158, 140733496626446, 140733496626447, 0, 139730004883520, 2100820740254843458, 2100830499542516290}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} + not_first_call = <optimized out> +#4 0x00007f1570f09def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 +No locals. 
+--- source code stack trace --- +#0 0x00007f1570fec0bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=128, cpuset=0x7f156d4e3680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34 + [Error: pthread_setaffinity.c was not found in source tree] +#1 0x00007f156dc8dc73 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so +#2 0x00007f156dc8d5d7 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so +#3 0x00007f1570fe1164 in start_thread (arg=<optimized out>) at pthread_create.c:486 + [Error: pthread_create.c was not found in source tree] +#4 0x00007f1570f09def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 + [Error: clone.S was not found in source tree] + + +libvirt XML that was generated: +<domain type="kvm"> + <name>fedora29-wor</name> + <uuid>2f4e83f7-18ed-45e2-bbf7-eef9f1c6c6c0</uuid> + <title>Fedora 29 Workstation</title> + <metadata> + <boxes:gnome-boxes xmlns:boxes="https://wiki.gnome.org/Apps/Boxes"> + <os-state>live</os-state> + <media-id>http://fedoraproject.org/fedora/29:0</media-id> + <media>/home/paelzer/Fedora-Workstation-Live-x86_64-29-1.2.iso</media> + </boxes:gnome-boxes> + <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0"> + <libosinfo:os id="http://fedoraproject.org/fedora/29"/> + </libosinfo:libosinfo> + </metadata> + <memory unit="KiB">2097152</memory> + <currentMemory unit="KiB">2097152</currentMemory> + <vcpu placement="static">2</vcpu> + <os> + <type arch="x86_64" machine="pc-q35-3.1">hvm</type> + <boot dev="cdrom"/> + <boot dev="hd"/> + </os> + <features> + <acpi/> + <apic/> + </features> + <cpu mode="host-passthrough" check="none"> + <topology sockets="1" cores="2" threads="1"/> + </cpu> + <clock offset="utc"> + <timer name="rtc" tickpolicy="catchup"/> + <timer name="pit" tickpolicy="delay"/> + <timer name="hpet" present="no"/> + </clock> + <on_poweroff>destroy</on_poweroff> + <on_reboot>destroy</on_reboot> + <on_crash>destroy</on_crash> + <pm> + <suspend-to-mem 
enabled="no"/> + <suspend-to-disk enabled="no"/> + </pm> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <disk type="file" device="disk"> + <driver name="qemu" type="qcow2" cache="writeback"/> + <source file="/home/paelzer/.local/share/gnome-boxes/images/fedora29-wor"/> + <target dev="vda" bus="virtio"/> + <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/> + </disk> + <disk type="file" device="cdrom"> + <driver name="qemu" type="raw"/> + <source file="/home/paelzer/Fedora-Workstation-Live-x86_64-29-1.2.iso" startupPolicy="mandatory"/> + <target dev="hdc" bus="sata"/> + <readonly/> + <address type="drive" controller="0" bus="0" target="0" unit="2"/> + </disk> + <controller type="usb" index="0" model="ich9-ehci1"> + <address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x7"/> + </controller> + <controller type="usb" index="0" model="ich9-uhci1"> + <master startport="0"/> + <address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x0" multifunction="on"/> + </controller> + <controller type="usb" index="0" model="ich9-uhci2"> + <master startport="2"/> + <address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x1"/> + </controller> + <controller type="usb" index="0" model="ich9-uhci3"> + <master startport="4"/> + <address type="pci" domain="0x0000" bus="0x00" slot="0x1d" function="0x2"/> + </controller> + <controller type="sata" index="0"> + <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/> + </controller> + <controller type="pci" index="0" model="pcie-root"/> + <controller type="pci" index="1" model="pcie-root-port"> + <model name="pcie-root-port"/> + <target chassis="1" port="0x10"/> + <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/> + </controller> + <controller type="pci" index="2" model="pcie-root-port"> + <model name="pcie-root-port"/> + <target chassis="2" port="0x11"/> + <address type="pci" domain="0x0000" 
bus="0x00" slot="0x02" function="0x1"/> + </controller> + <controller type="pci" index="3" model="pcie-root-port"> + <model name="pcie-root-port"/> + <target chassis="3" port="0x12"/> + <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/> + </controller> + <controller type="pci" index="4" model="pcie-root-port"> + <model name="pcie-root-port"/> + <target chassis="4" port="0x13"/> + <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/> + </controller> + <controller type="pci" index="5" model="pcie-root-port"> + <model name="pcie-root-port"/> + <target chassis="5" port="0x14"/> + <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/> + </controller> + <controller type="virtio-serial" index="0"> + <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/> + </controller> + <controller type="ccid" index="0"> + <address type="usb" bus="0" port="1"/> + </controller> + <interface type="user"> + <mac address="52:54:00:ee:17:af"/> + <model type="virtio"/> + <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/> + </interface> + <smartcard mode="passthrough" type="spicevmc"> + <address type="ccid" controller="0" slot="0"/> + </smartcard> + <serial type="pty"> + <target type="isa-serial" port="0"> + <model name="isa-serial"/> + </target> + </serial> + <console type="pty"> + <target type="serial" port="0"/> + </console> + <channel type="spicevmc"> + <target type="virtio" name="com.redhat.spice.0"/> + <address type="virtio-serial" controller="0" bus="0" port="1"/> + </channel> + <channel type="spiceport"> + <source channel="org.spice-space.webdav.0"/> + <target type="virtio" name="org.spice-space.webdav.0"/> + <address type="virtio-serial" controller="0" bus="0" port="2"/> + </channel> + <input type="tablet" bus="usb"> + <address type="usb" bus="0" port="2"/> + </input> + <input type="mouse" bus="ps2"/> + <input type="keyboard" bus="ps2"/> + <graphics type="spice"> + <listen 
type="none"/> + <image compression="off"/> + <gl enable="yes"/> + </graphics> + <sound model="ich9"> + <address type="pci" domain="0x0000" bus="0x00" slot="0x1b" function="0x0"/> + </sound> + <video> + <model type="virtio" heads="1" primary="yes"> + <acceleration accel3d="yes"/> + </model> + <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0"/> + </video> + <redirdev bus="usb" type="spicevmc"> + <address type="usb" bus="0" port="3"/> + </redirdev> + <redirdev bus="usb" type="spicevmc"> + <address type="usb" bus="0" port="4"/> + </redirdev> + <redirdev bus="usb" type="spicevmc"> + <address type="usb" bus="0" port="5"/> + </redirdev> + <redirdev bus="usb" type="spicevmc"> + <address type="usb" bus="0" port="6"/> + </redirdev> + <memballoon model="virtio"> + <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/> + </memballoon> + </devices> +</domain> + +Interestingly, the Ubuntu 18.10 image works. +So is it really an attribute of the guest that breaks it? + + +BTW - Arr, why does it spawn its own libvirtd ?! +Dear gnome boxes what are you doing? +0 1000 21610 1 20 0 85807204 68912 poll_s SLl pts/2 0:00 /usr/lib/x86_64-linux-gnu/webkit2gtk-4.0/WebKitWebProcess 2 15 +0 1000 21612 1 20 0 85772584 34132 poll_s SLl pts/2 0:00 /usr/lib/x86_64-linux-gnu/webkit2gtk-4.0/WebKitNetworkProcess 3 15 +0 1000 21649 1 20 0 1391464 39144 poll_s Sl ? 0:00 /usr/sbin/libvirtd --timeout=30 + +Thanks to "lsof +fg -p" some important paths: + +The guest log is in /home/paelzer/.cache/libvirt/qemu/log/ubuntu18.10.log +Control sockets are at +/run/user/1000/libvirt/libvirt-sock +/run/user/1000/libvirt/libvirt-admin-sock + +Now lets try to poke at it without that UI around it .... 
+
+
+The following gets me to non boxy libvirt:
+$ virsh -c qemu+unix:///session?socket=/run/user/1000/libvirt/libvirt-sock list --all
+
+For now I'll assume that it is NOT depending on the guest, but let's modify the working Ubuntu guest one by one to become more like the F29 guest and we will see.
+
+1. different disks/iso's/MAC (obviously)
+2. F29 has gl enabled on the spice graphics
+3. video F29: virtio Ubuntu: qxl
+4. video has <acceleration accel3d='yes'/> set
+
+That is all the difference, so it seems 3d'ish to me.
+
+First change
+<model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
+to
+<model type='virtio' heads='1' primary='yes'>
+=> still working
+
+Second change enable gl
+<gl enable='no'/>
+to
+<gl enable='yes'/>
+
+=> Broken
+
+Let's take back the First change but keep only the second.
+=> still broken.
+
+So it is the enablement of gl which I work on anyway recently (some apparmor changes to make it work in my former setup).
+
+Thanks for sharing this bug, but I need to analyze more in depth what is wrong here, but that might take a while.
+
+Note: Since your guest crashed on start the crash has no private data - marking the bug public ...
+
+
+For the time being as a workaround:
+ virsh -c qemu+unix:///session?socket=/run/user/1000/libvirt/libvirt-sock edit fedora29-wor
+(assuming that is your guest name as well)
+and switch off the gl enablement.
+Gives me a perfectly working guest, hope that helps you for now until a real fix is found.
+
+FTR: this guest XML (not out of gnome-boxes) works on the very same Host system.
+This runs qxl + gl=yes as well and does not fail.
+We need to find what the difference is between those as well.
+ +<domain type='kvm'> + <name>ubuntu18.04</name> + <uuid>2f6bde7c-1d3d-498a-b96c-8920f165fa4c</uuid> + <metadata> + <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0"> + <libosinfo:os id="http://ubuntu.com/ubuntu/18.04"/> + </libosinfo:libosinfo> + </metadata> + <memory unit='KiB'>2097152</memory> + <currentMemory unit='KiB'>2097152</currentMemory> + <vcpu placement='static'>2</vcpu> + <os> + <type arch='x86_64' machine='pc-q35-3.1'>hvm</type> + <boot dev='hd'/> + </os> + <features> + <acpi/> + <apic/> + <vmport state='off'/> + </features> + <cpu mode='host-model' check='partial'> + <model fallback='allow'/> + </cpu> + <clock offset='utc'> + <timer name='rtc' tickpolicy='catchup'/> + <timer name='pit' tickpolicy='delay'/> + <timer name='hpet' present='no'/> + </clock> + <on_poweroff>destroy</on_poweroff> + <on_reboot>restart</on_reboot> + <on_crash>destroy</on_crash> + <pm> + <suspend-to-mem enabled='no'/> + <suspend-to-disk enabled='no'/> + </pm> + <devices> + <emulator>/usr/bin/qemu-system-x86_64</emulator> + <disk type='file' device='disk'> + <driver name='qemu' type='qcow2'/> + <source file='/var/lib/libvirt/images/ubuntu18.04.qcow2'/> + <target dev='vda' bus='virtio'/> + <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/> + </disk> + <disk type='file' device='cdrom'> + <driver name='qemu' type='raw'/> + <target dev='sda' bus='sata'/> + <readonly/> + <address type='drive' controller='0' bus='0' target='0' unit='0'/> + </disk> + <controller type='usb' index='0' model='ich9-ehci1'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x7'/> + </controller> + <controller type='usb' index='0' model='ich9-uhci1'> + <master startport='0'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x0' multifunction='on'/> + </controller> + <controller type='usb' index='0' model='ich9-uhci2'> + <master startport='2'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' 
function='0x1'/> + </controller> + <controller type='usb' index='0' model='ich9-uhci3'> + <master startport='4'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x2'/> + </controller> + <controller type='sata' index='0'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/> + </controller> + <controller type='pci' index='0' model='pcie-root'/> + <controller type='pci' index='1' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='1' port='0x10'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/> + </controller> + <controller type='pci' index='2' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='2' port='0x11'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/> + </controller> + <controller type='pci' index='3' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='3' port='0x12'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/> + </controller> + <controller type='pci' index='4' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='4' port='0x13'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/> + </controller> + <controller type='pci' index='5' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='5' port='0x14'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/> + </controller> + <controller type='pci' index='6' model='pcie-root-port'> + <model name='pcie-root-port'/> + <target chassis='6' port='0x15'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/> + </controller> + <controller type='virtio-serial' index='0'> + <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/> + </controller> + <interface type='network'> + <mac address='52:54:00:8c:31:fc'/> + <source network='default'/> + <model 
type='virtio'/> + <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/> + </interface> + <serial type='pty'> + <target type='isa-serial' port='0'> + <model name='isa-serial'/> + </target> + </serial> + <console type='pty'> + <target type='serial' port='0'/> + </console> + <channel type='unix'> + <target type='virtio' name='org.qemu.guest_agent.0'/> + <address type='virtio-serial' controller='0' bus='0' port='1'/> + </channel> + <channel type='spicevmc'> + <target type='virtio' name='com.redhat.spice.0'/> + <address type='virtio-serial' controller='0' bus='0' port='2'/> + </channel> + <input type='tablet' bus='usb'> + <address type='usb' bus='0' port='1'/> + </input> + <input type='mouse' bus='ps2'/> + <input type='keyboard' bus='ps2'/> + <graphics type='spice'> + <listen type='none'/> + <image compression='off'/> + <gl enable='yes'/> + </graphics> + <sound model='ich9'> + <address type='pci' domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/> + </sound> + <video> + <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/> + <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/> + </video> + <redirdev bus='usb' type='spicevmc'> + <address type='usb' bus='0' port='2'/> + </redirdev> + <redirdev bus='usb' type='spicevmc'> + <address type='usb' bus='0' port='3'/> + </redirdev> + <memballoon model='virtio'> + <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/> + </memballoon> + <rng model='virtio'> + <backend model='random'>/dev/urandom</backend> + <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/> + </rng> + </devices> +</domain> + +P.S. I'm on a trip next week so further response might take a while, sorry + +Since my domain ran gl fine I was eliminating more differences one by one, keeping <gl enable='yes'/> to check if there is a second ingredient needed. 
+
+- do not set acceleration on virtio video dev
+- machine type q35 -> i440fx (and all pcie->pci that comes with that)
+- 1 instead of 4 vcpus
+- no host passthrough
+- no boot from CD
+- add pae feature
+- remove rtc/pit/hpet clock attributes
+- usb ich9-[eu]hci1 -> piix3-uhci
+- no smartcard entry
+- no usb tablet
+- use cirrus video card
+- virtio channel
+- no PM config
+- console virtio serial
+- no soundcard
+- reduce memory
+
+None of it makes it work, but the files are nearly identical now
+
+That left only the actual disk+iso of fedora vs ubuntu cloudimg based qcow and that the boxes VM used userspace networking. Still the issue remained.
+
+But I realized there is one more difference, the Boxes VM runs in user context while mine is a system level VM (qemu:///system) running the gl essentially headless until one connects to the local spice port.
+But the gnome boxes VM was having the UI up immediately connecting to it once available.
+
+So I defined the XML of the gnome-boxes VM in my qemu:///system libvirt context.
+This - as expected (I copied the files to /var/lib/libvirt/images and adapted the paths).
+This makes it work which is at least some lead to follow.
+
+I can make the viewers (virt-viewer / virt-manager) crash when attaching to it semi-remotely - but that might be a broken setup for a local only spice definition.
+
+When attaching viewers locally it works just fine.
+
+In none of those cases qemu crashes, so it clearly isn't the same. Both fail at some glib errors which makes sense since I try to remotely (through ssh) use local only features.
+
+
+So to summarize:
+- crash with gl enabled
+- only triggers if run in user context
+- gl works in system context (local viewers can attach and it works)
+
+I'm out of obvious "change the config to check what it is" options.
+But since it is at least reproducible I'll focus on the qemu backtrace itself next ...
+
+Stack trace with slightly more info as all DBG and source is installed here.
+ +--- stack trace --- +#0 0x00007f2325ae00bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7f2321fe5680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34 + __arg2 = 128 + _a3 = 139788870899328 + _a1 = 17325 + resultvar = <optimized out> + __arg3 = 139788870899328 + __arg1 = 17325 + _a2 = 128 + pd = <optimized out> + res = <optimized out> +#1 0x00007f23227abd83 in util_queue_thread_func (input=input@entry=0x55a59a695bd0) at ../src/util/u_queue.c:252 + cpuset = {__bits = {18446744073709551615 <repeats 16 times>}} + queue = 0x55a59a8952d0 + thread_index = 0 + __PRETTY_FUNCTION__ = "util_queue_thread_func" +#2 0x00007f23227ab6e7 in impl_thrd_routine (p=<optimized out>) at ../src/../include/c11/threads_posix.h:87 + pack = {func = 0x7f23227aba70 <util_queue_thread_func>, arg = 0x55a59a695bd0} +#3 0x00007f2325ad5164 in start_thread (arg=<optimized out>) at pthread_create.c:486 + ret = <optimized out> + pd = <optimized out> + now = <optimized out> + unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139788870903552, 9195723382052266688, 140723610455422, 140723610455423, 0, 139788870899776, -9089523756422225216, -9089514281776799040}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} + not_first_call = <optimized out> +#4 0x00007f23259fddef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 +No locals. 
+--- source code stack trace --- +#0 0x00007f2325ae00bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7f2321fe5680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34 + [Error: pthread_setaffinity.c was not found in source tree] +#1 0x00007f23227abd83 in util_queue_thread_func (input=input@entry=0x55a59a695bd0) at ../src/util/u_queue.c:252 + [Error: u_queue.c was not found in source tree] +#2 0x00007f23227ab6e7 in impl_thrd_routine (p=<optimized out>) at ../src/../include/c11/threads_posix.h:87 + [Error: threads_posix.h was not found in source tree] +#3 0x00007f2325ad5164 in start_thread (arg=<optimized out>) at pthread_create.c:486 + [Error: pthread_create.c was not found in source tree] +#4 0x00007f23259fddef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 + [Error: clone.S was not found in source tree] + +Eventually it is an "Program terminated with signal SIGSYS, Bad system call" +So we need to find what is bad about it. + + + +(gdb) info threads + Id Target Id Frame +* 1 Thread 0x7f2321fe6700 (LWP 17325) 0x00007f2325ae00bf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpus + et@entry=0x7f2321fe5680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34 + 2 Thread 0x7f2323ad3500 (LWP 17322) 0x00007f2326fe0fb7 in dri_bind_extensions (dri=dri@entry=0x55a59a7583e0, matches=matches@entry=0x7f2326fec34 + 0 <dri_core_extensions>, extensions=<optimized out>) at ../src/gbm/backends/dri/gbm_dri.c:286 + 3 Thread 0x7f2323acf700 (LWP 17323) syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38 + +A discussion with the kernel team pointed to seccomp at first: +... 
+<apw> grep it appears that seccomp is the only thing which triggers that signal
+
+The stack in the breaking cases uses this by default
+-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
+
+resourcecontrol is defined as:
+"Disable process affinity and schedular priority"
+
+Interestingly that is the global default, the qemu:///system qemu also runs with the same.
+I'd assume that:
+ libgl1-mesa-dri:amd64: /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
+behaves differently depending on whether it is on a local UI session or not.
+And it gets punished as soon as it tries to set-affinity which it might only do in that case.
+
+Implemented by
+- https://git.qemu.org/?p=qemu.git;a=commit;h=24f8cdc5722476e12d8e39d71f66311b4fa971c1
+Similar issue being fixed last year
+- https://git.qemu.org/?p=qemu.git;a=commit;h=056de1e894155fbb99e7b43c1c4382d4920cf437
+
+Libvirt has no means to fine-control it (yet), only to switch the whole feature of sandboxing on/off.
+
+That matches what we see - it fails on init when spawning threads - most likely there it will set the affinity.
+
+From Ubuntu's POV this is rather new as the code in Mesa came in with the fresh 18.3.0_rc4-1
+It is possible that no one else saw it so far ...
+It is in mesa upstream since
+ https://github.com/mesa3d/mesa/commit/d877451b48a59ab0f9a4210fc736f51da5851c9a
+
+But opinions might differ ...
+I'll subscribe upstream qemu to this bug and then post a summary here.
+This will mirror the bug updates to the Mailing List, if there is no harsh feedback I'll propose a patch to remove sched_setaffinity from the list of blocked calls.
+
+Summary:
+- qemu crash when using GL
+- "sched_setaffinity" is the syscall that is seccomp blocked and kills qemu
+- the mesa i965 drivers (and your radeon as well) will do that call
+- it is blocked by the current qemu -sandbox on,...,resourcecontrol=deny which is libvirt's default
+- Implemented by qemu 24f8cdc572
+- Similar issue being fixed last year qemu 056de1e894
+- new code in mesa 18.3 since mesa d877451b48
+
+I think we just need to allow sched_setaffinity with these new mesa drivers in the wild.
+The alternative to detect gl usage in libvirt and only then allow resourcecontrol IMHO seems over-engineered (needs internals to actually pass the need of seccomp subsets to be switched) and not better (more syscalls will be non-blocked then as the -sandbox interface isn't fine grained).
+
+OTOH the man page literally says "... Disable process affinity ...", so I'm not sure we can just remove it. Maybe split resourcecontrol in two, put *affinity* in the new one and make the default being not blocked - so that upper layers like libvirt will work until one explicitly states ... -sandbox on,affinity=on which no one wanting to use GL would do. That again seems too much.
+Well the discussion will happen either here on ML/bug or later when submitting an RFC for it.
+
+IMHO that mesa change is not valid. It is setting its affinity to run on all threads which is definitely *NOT* something we want to be allowed. Management applications want to control which CPUs QEMU runs on, and as such Mesa should honour the CPU placement that the QEMU process has.
+
+This is a great example of why QEMU wants to use seccomp to block affinity changes to prevent something silently trying to use more CPUs than are assigned to this QEMU.
+
+(I reported that issue a few days ago too: https://lists.gnu.org/archive/html/qemu-devel/2019-02/msg06066.html)
+
+Perhaps we can teach mesa to not change CPU affinity (some option, or environment variable, or seccomp check).
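The "environment variable" idea from the comment above could look roughly like the following. This is only a minimal sketch under stated assumptions: the variable name `EXAMPLE_NO_THREAD_AFFINITY` and the helper `affinity_change_allowed()` are hypothetical and are not options or functions that Mesa, QEMU or libvirt are known to provide. The point is simply that the library consults an opt-out before it ever reaches the affinity syscall.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name used only for this sketch; it is not an
 * option that Mesa, QEMU or libvirt is known to define. */
#define EXAMPLE_NO_AFFINITY_ENV "EXAMPLE_NO_THREAD_AFFINITY"

/* Returns true when the library may try to change thread affinity.
 * Unset, empty, or "0" is treated as "not disabled"; any other value
 * disables affinity changes. */
static bool
affinity_change_allowed(void)
{
   const char *value = getenv(EXAMPLE_NO_AFFINITY_ENV);

   if (value == NULL || value[0] == '\0' || strcmp(value, "0") == 0)
      return true;
   return false;
}
```

A caller would wrap the affinity call in `if (affinity_change_allowed()) { ... }`, and a management layer that sandboxes the process would export the variable before launching it. The check has to happen before the call, because under a "kill" seccomp action there is no error return to react to.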
+
+Daniel, when virgl/mesa will be running in a separate process (thanks to vhost-user-gpu), I suppose the rendering process will be free to change the CPU affinity. Does that make a difference if mesa thread is in qemu or a separate process, in this case?
+
+As & when libvirt & QEMU supports the external vhost processes for this I expect it will still restrict the CPU affinity and apply seccomp filters that are likely to be as strict as they are today at minimum.
+
+I did wonder if we could set the action for some syscalls to be "errno" instead of "kill process", but I worry that could then result in silent mis-behaviour as processes fail to check return value as they blindly assume the call cannot fail.
+
+We should probably talk with mesa developers about providing a config option to prevent this affinity change. An env variable is workable if there's no other mechanism they can expose.
+
+See also mesa bug:
+https://bugs.freedesktop.org/show_bug.cgi?id=109695
+
+Thanks Daniel and MarcAndre for chiming in here.
+After thinking more about it I agree with Daniel that actually mesa should honor and stick with its affinity assignment.
+
+For documentation purpose: the solution proposed on the ML is at https://lists.freedesktop.org/archives/mesa-dev/2019-February/215926.html
+I also added a bug tracker to the freedesktop bug as task.
+
+@Ubuntu-Desktop Team (now subscribed) - is there a chance we can revert [1] in mesa before it will be released with Disco for now? That would be needed until an accepted solution throughout the stack of libvirt/qemu/mesa is found.
+Otherwise using GL backed qemu graphics will fail as outlined in the bug.
+
+Once such a cross-package solution to the problem is found we can (if needed at all) SRU back the set of changes to all components required.
+
+[1]: https://github.com/mesa3d/mesa/commit/d877451b48a59ab0f9a4210fc736f51da5851c9a
+
+Adding Timo who maintains mesa.
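To make "honor and stick with its affinity assignment" concrete, here is a minimal sketch of clamping a requested CPU mask to the mask the thread already has, instead of requesting all CPUs as the backtrace's all-ones `cpuset` does. Everything named here (`struct cpu_mask`, `read_allowed_cpus`, `clamp_to_allowed`) is illustrative and not taken from Mesa or glibc. The raw `syscall()` style mirrors the C snippet posted in this thread; glibc's `sched_getaffinity()` with the `CPU_AND` macro would be the more usual spelling. It does not by itself solve the sandbox crash: a seccomp filter could deny the read-side syscall as well.

```c
#include <stddef.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* 128 bytes matches cpusetsize=128 in the backtraces (room for 1024 CPUs). */
#define MASK_BYTES 128
#define MASK_WORDS (MASK_BYTES / sizeof(unsigned long))

/* Illustrative type for this sketch; not a glibc or Mesa type. */
struct cpu_mask {
   unsigned long bits[MASK_WORDS];
};

/* Read the set of CPUs the calling thread is currently allowed to run on.
 * Returns 0 on success, -1 if the kernel (or a filter) rejects the call. */
static int
read_allowed_cpus(struct cpu_mask *allowed)
{
   memset(allowed, 0, sizeof(*allowed));
   long ret = syscall(SYS_sched_getaffinity, 0,
                      sizeof(allowed->bits), allowed->bits);
   return ret < 0 ? -1 : 0;
}

/* Intersect the wanted mask with the allowed mask, so a later
 * set-affinity request can never name a CPU outside the existing
 * placement. Returns the number of CPUs left in *out. */
static size_t
clamp_to_allowed(const struct cpu_mask *wanted,
                 const struct cpu_mask *allowed,
                 struct cpu_mask *out)
{
   size_t count = 0;

   for (size_t i = 0; i < MASK_WORDS; i++) {
      unsigned long word = wanted->bits[i] & allowed->bits[i];

      out->bits[i] = word;
      /* Count set bits by clearing the lowest one each iteration. */
      for (; word != 0; word &= word - 1)
         count++;
   }
   return count;
}
```

If the clamped mask comes back empty, or `read_allowed_cpus()` fails, the conservative choice is to skip the affinity change entirely and keep whatever placement the administrator set.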
+ + +Since upgrading Mesa from 18.2 to 18.3, launching a QEMU virtual machine with Spice OpenGL enabled (for virgl), causes QEMU to crash with SIGSYS inside the radeonsi driver. The reason for this is that the QEMU sandbox option 'resourcecontrol=deny' disables the sched_setaffinity syscall called in pthread_setaffinity_np, which is now used by the radeonsi driver. + +A simple way to reproduce this problem is: +$ gdb --batch --ex run --ex bt --args qemu-system-x86_64 -spice gl=on -sandbox on,resourcecontrol=deny +[Thread debugging using libthread_db enabled] +Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". +[New Thread 0x7ffff45aa700 (LWP 23432)] +[New Thread 0x7ffff08e5700 (LWP 23433)] +[New Thread 0x7fffe3fff700 (LWP 23434)] +[New Thread 0x7fffe37fe700 (LWP 23435)] + +Thread 4 "qemu-system-x86" received signal SIGSYS, Bad system call. +[Switching to Thread 0x7fffe3fff700 (LWP 23434)] +0x00007ffff68cc9cf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7fffe3ffe680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34 +34 ../sysdeps/unix/sysv/linux/pthread_setaffinity.c: No such file or directory. 
+#0 0x00007ffff68cc9cf in __pthread_setaffinity_new (th=<optimized out>, cpusetsize=cpusetsize@entry=128, cpuset=cpuset@entry=0x7fffe3ffe680) at ../sysdeps/unix/sysv/linux/pthread_setaffinity.c:34 +#1 0x00007ffff12ba2b3 in util_queue_thread_func (input=input@entry=0x55555640b1f0) at ../src/util/u_queue.c:252 +#2 0x00007ffff12b9c17 in impl_thrd_routine (p=<optimized out>) at ../src/../include/c11/threads_posix.h:87 +#3 0x00007ffff68c1fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486 +#4 0x00007ffff67f280f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 + + +The problematic code at src/util/u_queue.c:252 was added in the following commit: +commit d877451b48a59ab0f9a4210fc736f51da5851c9a +Author: Marek Olšák <email address hidden> +Date: Mon Oct 1 15:51:06 2018 -0400 + + util/u_queue: add UTIL_QUEUE_INIT_SET_FULL_THREAD_AFFINITY + + Initial version discussed with Rob Clark under a different patch name. + This approach leaves his driver unaffected. + + +Since setting the thread affinity seems non-essential here, the failing syscall should be handled gracefully, for example by setting a signal handler to ignore the SIGSYS signal. + +Mesa needs a way to query that it can't set thread affinity. + +To check for the availability of the syscall, one can try it in a child process and see if the child is terminated by a signal, e.g. like this: + +#include <stdbool.h> +#include <unistd.h> +#include <sys/resource.h> +#include <sys/syscall.h> +#include <sys/wait.h> + +static bool +can_set_affinity() +{ + pid_t pid = fork(); + int status = 0; + if (!pid) { + /* Disable coredumps, because a SIGSYS crash is expected. */ + struct rlimit limit = { 0 }; + limit.rlim_cur = 1; + limit.rlim_max = 1; + setrlimit(RLIMIT_CORE, &limit); + /* Test the syscall in the child process. 
*/ + syscall(SYS_sched_setaffinity, 0, 0, 0); + _exit(0); + } else if (pid < 0) { + return false; + } + if (waitpid(pid, &status, 0) < 0) { + return false; + } + if (WIFSIGNALED(status)) { + /* The child process was terminated by a signal, + * thus the syscall cannot be used. + */ + return false; + } + return true; +} + +(In reply to Ahzo from comment #2) +> To check for the availability of the syscall, one can try it in a child +> process and see if the child is terminated by a signal, e.g. like this: + +Afraid not, QEMU's seccomp filter blocks use of fork() too :-) + +(In reply to Ahzo from comment #0) +> The problematic code at src/util/u_queue.c:252 was added in the following +> commit: +> commit d877451b48a59ab0f9a4210fc736f51da5851c9a +> Author: Marek Olšák <email address hidden> +> Date: Mon Oct 1 15:51:06 2018 -0400 +> +> util/u_queue: add UTIL_QUEUE_INIT_SET_FULL_THREAD_AFFINITY +> +> Initial version discussed with Rob Clark under a different patch name. +> This approach leaves his driver unaffected. +> +> +> Since setting the thread affinity seems non-essential here, the failing +> syscall should be handled gracefully, for example by setting a signal +> handler to ignore the SIGSYS signal. + +I'm curious what motivated this change to start with ? Even if QEMU was not enforcing seccomp filters, I think I'd consider it a bug for mesa to be setting its process affinity in this way. The mgmt application or sysadmin has decided that the process must have a certain affinity, based on how it/they want the host CPUs utilized. Why is mesa wanting to override this administrative policy decision to restrict CPU usage ? + +(In reply to Daniel P. Berrange from comment #4) +> +> I'm curious what motivated this change to start with ? Even if QEMU was not +> enforcing seccomp filters, I think I'd consider it a bug for mesa to be +> setting its process affinity in this way. 
The mgmt application or sysadmin +> has decided that the process must have a certain affinity, based on how +> it/they want the host CPUs utilized. Why is mesa wanting to override this +> administrative policy decision to restrict CPU usage ? + +To improve performance on modern multi-core NUMA architectures. + +Sent a quick RFC for an env variable workaround on the ML "[PATCH] RFC: Workaround for pthread_setaffinity_np() seccomp filtering". + +(In reply to Daniel P. Berrange from comment #4) +> I'm curious what motivated this change to start with ? Even if QEMU was not +> enforcing seccomp filters, I think I'd consider it a bug for mesa to be +> setting its process affinity in this way. The mgmt application or sysadmin +> has decided that the process must have a certain affinity, based on how +> it/they want the host CPUs utilized. Why is mesa wanting to override this +> administrative policy decision to restrict CPU usage ? + +The correct solution is to fix pthread_setaffinity such that it returns an error code instead of crashing. + +An even better solution would be to have a virtual thread affinity that only the application can see and change, which should be silently masked by administrative policies not visible to the application. + +(In reply to Marek Olšák from comment #7) +> An even better solution would be to have a virtual thread affinity that only +> the application can see and change, which should be silently masked by +> administrative policies not visible to the application. + +Mesa doesn't really need explicit thread affinity at all. All it wants is that certain sets of threads run on the same CPU module; it doesn't care which particular CPU module that is. What's really needed is an API to express this affinity between threads, instead of to specific CPU cores. + +(In reply to Daniel P. 
Berrange from comment #3) +> (In reply to Ahzo from comment #2) +> > To check for the availability of the syscall, one can try it in a child +> > process and see if the child is terminated by a signal, e.g. like this: +> +> Afraid not, QEMU's seccomp filter blocks use of fork() too :-) + +Maybe it should, at least when using the spawn=deny option, but currently it doesn't. That option only blocks the fork, vfork and execve syscalls, but glibc's fork() function uses the clone syscall, and thus continues to work. +However, that behavior might be different when using other C library implementations, so it wouldn't be correct to rely on this. +One could use clone() instead of fork(), but future versions of qemu might block the clone syscall, as well. + +Unfortunately, I'm not aware of a proper solution for this bug short of adding a new API to the kernel. + +You can test 19.0~rc6 with this reverted on a ppa: + +ppa:canonical-x/x-staging + +should be built in 30min + +Hi Timo, +I tried to test with the mesa from ppa:canonical-x/x-staging +But there is a dependency issue in that PPA - I can't install all packages from there. +It seems most of the X* packages will need a transition for the new mesa and those are not in this ppa right now. + +Installing all that I can from the PPA doesn't resolve the issue, is there something more you need to upload to the PPA - or are there other things I'd need to do to install all of mesa? 
+ +This is the current mix of rc5/6 it gave me :-/ +libegl-mesa0:amd64 19.0.0~rc5-1ubuntu0.1 +libegl1-mesa:amd64 19.0.0~rc6-1ubuntu0.1 +libgl1-mesa-dri:amd64 19.0.0~rc5-1ubuntu0.1 +libgl1-mesa-glx:amd64 19.0.0~rc6-1ubuntu0.1 +libglapi-mesa:amd64 19.0.0~rc5-1ubuntu0.1 +libglx-mesa0:amd64 19.0.0~rc5-1ubuntu0.1 +libwayland-egl1-mesa:amd64 19.0.0~rc6-1ubuntu0.1 +mesa-va-drivers:amd64 19.0.0~rc5-1ubuntu0.1 +mesa-vdpau-drivers:amd64 19.0.0~rc5-1ubuntu0.1 + +I don't have that issue on a chroot, so you should at least tell me why it would refuse to upgrade them all.. apt should show an error + +The PPA was built against -proposed so I had to enable that to install all libs. +That done the 19.0.0~rc6-1ubuntu0.1 with the set affinity change reverted works quite nicely. + +It would be great to get that into Ubuntu 19.04 until the involved upstreams agreed how to proceed with it and we can then sort out what to do in which package. Which after all might be after cutoff and in 19.10 then. + +Thanks Timo, let me know if you need another verification on this at any point to drive it into 19.04. + +We're getting down to just a few bugs blocking 19.0, so I'm pinging those bugs to see what the progress is? + +I'm removing this from the 19.0 blocking tracker. Generally we don't add bugs to block a release if they were present in the previous release, additionally there doesn't seem to be any consensus on a solution, at this moment. If there is a fix implemented I'd be happy to pull that into a later 19.0 release. + +This bug was fixed in the package mesa - 19.0.0-1ubuntu1 + +--------------- +mesa (19.0.0-1ubuntu1) disco; urgency=medium + + * Merge from Debian. (LP: #1818516) + * revert-set-full-thread-affinity.diff: Fix qemu crash. (LP: #1815889) + + -- Timo Aaltonen <email address hidden> Thu, 14 Mar 2019 18:48:18 +0200 + +(In reply to Michel Dänzer from comment #8) +> Mesa doesn't really need explicit thread affinity at all. 
All it wants is +> that certain sets of threads run on the same CPU module; it doesn't care +> which particular CPU module that is. What's really needed is an API to +> express this affinity between threads, instead of to specific CPU cores. + +I think the thread affinity API is a correct way to optimize for CPU cache topologies. pthread is a basic user API. Security policies shouldn't disallow pthread functions. + +FYI, the QEMU change merged in the following pull request now returns an EPERM errno for the thread affinity syscalls: + +commit 12f067cc14b90aef60b2b7d03e1df74cc50a0459 +Merge: 84bdc58c06 035121d23a +Author: Peter Maydell <email address hidden> +Date: Thu Mar 28 12:04:52 2019 +0000 + + Merge remote-tracking branch 'remotes/otubo/tags/pull-seccomp-20190327' into staging + + pull-seccomp-20190327 + + # gpg: Signature made Wed 27 Mar 2019 12:12:39 GMT + # gpg: using RSA key DF32E7C0F0FFF9A2 + # gpg: Good signature from "Eduardo Otubo (Senior Software Engineer) <email address hidden>" [full] + # Primary key fingerprint: D67E 1B50 9374 86B4 0723 DBAB DF32 E7C0 F0FF F9A2 + + * remotes/otubo/tags/pull-seccomp-20190327: + seccomp: report more useful errors from seccomp + seccomp: don't kill process for resource control syscalls + + Signed-off-by: Peter Maydell <email address hidden> + +IOW, mesa's usage of these syscalls will still be blocked, but it will no longer kill the process. + +Thank you Daniel, +we will most likely keep Disco as-is for now and merge this in 19.10 where then mesa can drop the revert. I tagged it for 19.10 to be revisited. + +This problem was solved by qemu [1], so this mesa bug can be closed. + +[1] https://git.qemu.org/git/qemu.git/?a=commitdiff;h=9a1565a03b79d80b236bc7cc2dbce52a2ef3a1b8 + +Reopening/Assigning to Timo for eoan since there is a patch which can be dropped once qemu is fixed + +I believe this was fixed by qemu 4.0 in eoan. 
+ +This bug was fixed in the package mesa - 19.2.4-1ubuntu1 + +--------------- +mesa (19.2.4-1ubuntu1) focal; urgency=medium + + * Merge from Debian. + * revert-set-full-thread-affinity.diff: Dropped, qemu is fixed now in + eoan and up. (LP: #1815889) + + -- Timo Aaltonen <email address hidden> Wed, 20 Nov 2019 20:17:00 +0200 + |