diff options
Diffstat (limited to 'results/scraper/launchpad-without-comments/1771238')
| -rw-r--r-- | results/scraper/launchpad-without-comments/1771238 | 156 |
1 files changed, 156 insertions, 0 deletions
diff --git a/results/scraper/launchpad-without-comments/1771238 b/results/scraper/launchpad-without-comments/1771238 new file mode 100644 index 000000000..21f4e22b9 --- /dev/null +++ b/results/scraper/launchpad-without-comments/1771238 @@ -0,0 +1,156 @@ +Not able to passthrough > 32 PCIe devices to a KVM Guest + +Using an Ubuntu Server 16.04-based host with KVM hypervisor installed, we are unable to launch a vanilla Ubuntu Server 16.04.4 guest with >= 32 PCIe devices. It is 100% reproducible. Using fewer PCIe devices works fine. We are using the vanilla kvm and qemu packages from the Canonical repos. + +The ultimate goal is to create a KVM Guest wherein I can passthrough 44 PCI devices. + +When a KVM Guest launches, it also has some internal PCIe devices including host bridge, USB, IDE (for virtual disk), and virtual nic etc. + +Script used to launch all devices looks like this: + +#!/bin/bash +NAME=16gpuvm + +sudo qemu-img create -f qcow2 /home/lab/kvm/images/${NAME}.img 40G && +sudo virt-install \ +--name ${NAME} \ +--ram 716800 \ +--vcpus 88 \ +--disk path=/home/lab/kvm/images/${NAME}.img,format=qcow2 \ +--network bridge=virbr0 \ +--graphics none \ +--host-device 34:00.0 \ +--host-device 36:00.0 \ +--host-device 39:00.0 \ +--host-device 3b:00.0 \ +--host-device 57:00.0 \ +--host-device 59:00.0 \ +--host-device 5c:00.0 \ +--host-device 5e:00.0 \ +--host-device 61:00.0 \ +--host-device 62:00.0 \ +--host-device 63:00.0 \ +--host-device 65:00.0 \ +--host-device 66:00.0 \ +--host-device 67:00.0 \ +--host-device 35:00.0 \ +--host-device 3a:00.0 \ +--host-device 58:00.0 \ +--host-device 5d:00.0 \ +--host-device 2e:00.0 \ +--host-device 2f:00.0 \ +--host-device 51:00.0 \ +--host-device 52:00.0 \ +--host-device b7:00.0 \ +--host-device b9:00.0 \ +--host-device bc:00.0 \ +--host-device be:00.0 \ +--host-device e0:00.0 \ +--host-device e2:00.0 \ +--host-device e5:00.0 \ +--host-device e7:00.0 \ +--host-device c1:00.0 \ +--host-device c2:00.0 \ +--host-device c3:00.0 \ +--host-device c5:00.0 \ +--host-device c6:00.0 \ +--host-device c7:00.0 \ +--host-device b8:00.0 \ +--host-device bd:00.0 \ +--host-device e1:00.0 \ +--host-device e6:00.0 \ +--host-device b1:00.0 \ +--host-device b2:00.0 \ +--host-device da:00.0 \ +--host-device db:00.0 \ +--console pty,target_type=serial \ +--location http://ftp.ubuntu.com/ubuntu/dists/xenial/main/installer-amd64 \ +--initrd-inject=/home/lab/kvm/images/preseed.cfg \ +--extra-args=" +console=ttyS0,115200 +locale=en_US +console-keymaps-at/keymap=us +console-setup/ask_detect=false +console-setup/layoutcode=us +keyboard-configuration/layout=USA +keyboard-configuration/variant=USA +hostname=${NAME} +file=file:/preseed.cfg +" + +Passing > 32 device causes this issue: 32nd device hits a DPC error and the host/HV crashes: + + +Apr 25 22:34:35 xpl-evt-16 kernel: [18125.977496] dpc 0000:5b:10.0:pcie210: DPC containment event, status:0x0009 source:0x0000 +Apr 25 22:34:35 xpl-evt-16 kernel: [18125.977500] dpc 0000:5b:10.0:pcie210: DPC unmasked uncorrectable error detected, remove downstream devices +Apr 25 22:34:35 xpl-evt-16 kernel: [18125.994326] vfio-pci 0000:5e:00.0: Refused to change power state, currently in D3 +Apr 25 22:34:35 xpl-evt-16 kernel: [18125.994427] iommu: Removing device 0000:5e:00.0 from group 92 + + +From syslog (attached) + +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194358] dpc 0000:bb:04.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194387] dpc 0000:bb:10.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194413] dpc 0000:d9:00.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194439] dpc 0000:d9:01.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194472] dpc 0000:d9:02.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194499] dpc 0000:d9:03.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194526] dpc 0000:d9:04.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194553] dpc 0000:d9:0c.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194583] dpc 0000:df:00.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194619] dpc 0000:df:04.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194649] dpc 0000:df:10.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194679] dpc 0000:e4:00.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194709] dpc 0000:e4:04.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194742] dpc 0000:e4:10.0:pcie210: DPC error containment capabilities: Int Msg #3, RPExt- PoisonedTLP+ SwTrigger+ RP PIO Log 0, DL_ActiveErr+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.194763] pciehp 0000:00:1c.0:pcie004: Slot #0 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ LLActRep+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.195036] pciehp 0000:60:02.0:pcie204: Slot #2 AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock- NoCompl- LLActRep+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.195278] pciehp 0000:60:0a.0:pcie204: Slot #10 AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock- NoCompl- LLActRep+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.195513] pciehp 0000:c0:02.0:pcie204: Slot #2 AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock- NoCompl- LLActRep+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.195753] pciehp 0000:c0:0a.0:pcie204: Slot #10 AttnBtn+ PwrCtrl+ MRL+ AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock- NoCompl- LLActRep+ +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.196196] efifb: probing for efifb +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.196242] efifb: framebuffer at 0x9c000000, using 1920k, total 1920k +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.196247] efifb: mode is 800x600x32, linelength=3200, pages=1 +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.196250] efifb: scrolling: redraw +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.196254] efifb: Truecolor: size=8:8:8:8, shift=24:16:8:0 +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.206652] Console: switching to colour frame buffer device 100x37 +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.217034] fb0: EFI VGA frame buffer device +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.217173] intel_idle: MWAIT substates: 0x2020 +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.217174] intel_idle: v0.4.1 model 0x55 +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.220874] intel_idle: lapic_timer_reliable_states 0xffffffff +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.221219] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0 +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.221590] ACPI: Power Button [PWRF] +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.231089] ERST: Error Record Serialization Table (ERST) support is initialized. +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.231312] pstore: using zlib compression +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.231444] pstore: Registered erst as persistent store backend +Apr 25 22:37:13 xpl-evt-16 kernel: [ 2.232503] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC. + + + +All PCI devices go offline include NVMe. + +OS Drives go away, RAID-1 is remounted as RO, and eventually system crashes. + + +Apr 25 22:37:13 xpl-evt-16 rsyslogd-2007: action 'action 9' suspended, next retry is Wed Apr 25 22:37:43 2018 [v8.16.0 try http://www.rsyslog.com/e/2007 ] +Apr 25 22:37:13 xpl-evt-16 systemd-udevd[1383]: Process '/sbin/mdadm --incremental /dev/nvme1n1p2 --offroot' failed with exit code 1. +Apr 25 22:37:13 xpl-evt-16 systemd-udevd[1371]: Process '/sbin/mdadm --incremental /dev/nvme0n1p2 --offroot' failed with exit code 1. +Apr 25 22:37:13 xpl-evt-16 systemd[1]: Starting Apply Kernel Variables... +Apr 25 22:37:13 xpl-evt-16 systemd[1]: Mounted Configuration File System. +Apr 25 22:37:13 xpl-evt-16 systemd[1]: Mounted FUSE Control File System. +Apr 25 22:37:13 xpl-evt-16 systemd[1]: Started Apply Kernel Variables. +Apr 25 22:37:13 xpl-evt-16 systemd[1]: Found device /dev/disk/by-uuid/269E-631A. +Apr 25 22:37:13 xpl-evt-16 systemd[1]: Starting File System Check on /dev/disk/by-uuid/269E-631A... +Apr 25 22:37:13 xpl-evt-16 systemd[1]: Started File System Check Daemon to report status. +Apr 25 22:37:13 xpl-evt-16 systemd-fsck[1576]: fsck.fat 3.0.28 (2015-05-16) +Apr 25 22:37:13 xpl-evt-16 systemd-fsck[1576]: /dev/nvme0n1p1: 10 files, 1168/130812 clusters +Apr 25 22:37:13 xpl-evt-16 systemd[1]: Started File System Check on /dev/disk/by-uuid/269E-631A. +Apr 25 22:37:13 xpl-evt-16 systemd[1]: Mounting /boot/efi... +Apr 25 22:37:13 xpl-evt-16 systemd[1]: Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch. +Apr 25 22:37:13 xpl-evt-16 systemd[1]: Mounted /boot/efi. +Apr 25 22:37:13 xpl-evt-16 systemd[1]: Reached target Local File Systems. +Apr 25 22:37:13 xpl-evt-16 systemd[1]: Starting Preprocess NFS configuration... +Apr 25 22:37:13 xpl-evt-16 systemd[1]: Starting Create Volatile Files and Directories... +Apr 25 22:37:13 xpl-evt-16 systemd-tmpfiles[1714]: [/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring. +Apr 25 22:37:13 xpl-evt-16 systemd[1]: Starting openibd - configure Mellanox devices... +Apr 25 22:37:13 xpl-evt-16 kernel: [ 0.000000] random: get_random_bytes called from start_kernel+0x42/0x50d with crng_init=0 \ No newline at end of file |