diff options
Diffstat (limited to 'results/classifier/108/other/1471583')
| -rw-r--r-- | results/classifier/108/other/1471583 | 174 |
1 files changed, 174 insertions, 0 deletions
diff --git a/results/classifier/108/other/1471583 b/results/classifier/108/other/1471583 new file mode 100644 index 000000000..120f699ea --- /dev/null +++ b/results/classifier/108/other/1471583 @@ -0,0 +1,174 @@ +graphic: 0.795 +other: 0.757 +device: 0.737 +debug: 0.726 +semantic: 0.725 +network: 0.702 +permissions: 0.678 +PID: 0.670 +performance: 0.654 +vnc: 0.642 +boot: 0.635 +socket: 0.618 +KVM: 0.536 +files: 0.502 + +QCA988X Wifi Card Not PCI Passing Through + +CPU: Intel(R) Xeon(R) CPU E3-1265L v3 @ 2.50GHz +KVM: qemu-kvm-1.5.3-86.el7_1.2.x86_64 +Kernel: 4.1.1-1.el7.elrepo.x86_64, and kernel-3.10.0-229.7.2.el7.x86_64 +Host & Guest: CentOS 7.1 +Using virt-manager-1.1.0-12.el7.noarch to create, configure, and start guest + +I am trying to do a PCI passthrough of a QCA988X wifi card. It's a Doodle Labs military-grade 802.11ac miniPCI card, which uses the ath10k kernel driver. This card configures nicely on the host, and seems to pass through to the guest, but early in the boot of the guest it says "Unknown header type" at the wifi's bus address. And sure enough, lspci -vv on the host then shows: + !!! Unknown header type 7f + Kernel driver in use: vfio-pci + +When the guest has booted, of course it shows as an Unclassified device. Host and guest must run at least kernel 4.0 so the wifi card's current firmware will load, and so that its driver comes with the kernel. I have both host and guest set up for the wifi card. I tried running kernel 3.10 in the host and passing through the PCI device, but same behavior. + +I am passing through to the guest an Intel i350 ethernet card just fine, in fact I'm passing through two of its SR-IOV virt interfaces to the guest, so that works. + +On the host, before I start the guest, the wifi card looks like this (lspci -vv): + +0a:00.0 Network controller: Qualcomm Atheros QCA988x 802.11ac Wireless Network Adapter + Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ + Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- + Latency: 0, Cache Line Size: 64 bytes + Interrupt: pin A routed to IRQ 43 + Region 0: Memory at f7000000 (64-bit, non-prefetchable) [size=2M] + Expansion ROM at f7200000 [disabled] [size=64K] + Capabilities: [40] Power Management version 2 + Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+) + Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- + Capabilities: [50] MSI: Enable+ Count=8/8 Maskable+ 64bit- + Address: fee00618 Data: 0000 + Masking: 00000000 Pending: 00000000 + Capabilities: [70] Express (v2) Endpoint, MSI 00 + DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us + ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset- + DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- + RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- + MaxPayload 128 bytes, MaxReadReq 512 bytes + DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend- + LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us + ClockPM- Surprise- LLActRep- BwNot- + LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- + ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- + LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- + DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported + DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled + LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis- + Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- + Compliance De-emphasis: -6dB + LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1- + EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest- + Capabilities: [100 v1] Advanced Error Reporting + UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- + UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- + UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- + CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr- + CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ + AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn- + Capabilities: [140 v1] Virtual Channel + Caps: LPEVC=0 RefClk=100ns PATEntryBits=1 + Arb: Fixed- WRR32- WRR64- WRR128- + Ctrl: ArbSelect=Fixed + Status: InProgress- + VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans- + Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256- + Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01 + Status: NegoPending- InProgress- + Capabilities: [160 v1] Device Serial Number 00-00-00-00-00-00-00-00 + Kernel driver in use: ath10k_pci + +It probably needs a quirk like this to avoid bus resets: + +http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/quirks.c?id=c3e59ee4e76686b0c84ca8faa1011d10cd4ca1b8 + +IOW, add a line like this below the line added by the above patch: + +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset); + +Double check that vendor:device ID against 'lspci -nn', that's 168c:003c. + +It does sound exactly like what I'm seeing. + +From http://www.gossamer-threads.com/lists/linux/kernel/2054846 + +"Yes. If you *re*start the VM . ... The first start (after reboot) was not a problem." + +It seems clear that this problem began with kernel 3.13. I tried applying the backports ath10k to kernel 3.10, but the kernel didn't recognize it, or install put it in the wrong place or something. So I tried kernel 4.1 and the module that comes with it fails this way. From the listserv I should have thought this would be fixed by kernel 4.1, but then maybe my device is so new and this has to be device-specific? + +Thanks for your instructions Alex, but I don't fully understand. I searched for an options line I could put in /etc/modprobe.d/blacklist.conf to prevent PCI bus reset for the module, but couldn't find anything. The "Unknown header type" happens very early in the boot of the guest so I don't see how fixing the guest module would help. + +Maybe you're saying that I must compile the backports module with some patch in the above link and add the line you suggest? Only thing is I don't understand why the module install failed. And I don't know whether the host should have the patched ath10k module as well as the guest. Does the host need the module at all of PCI passthrough? + + + +Actually my card locks up *on* boot of the guest, not after its reboot. + +This generation of Atheros cards requires firmware files: https://wireless.wiki.kernel.org/en/users/Drivers/ath10k/firmware +So kernel 3.10 (default with CentOS 7.1) is not an option; 3.11 is the minimum, and that doesn't allow AP mode which I need. So hell, I might as well go with ElRepo's ml kernel, unless you recommend otherwise. + +I compile the ath10k module (from somewhere), and somehow supplant the default ath10k module that comes with the kernel. + + + +The host kernel ath10k drivers and firmware are irrelevant. The change I'm asking about in comment #2 requires a patch and recompile to the host kernel. + +You mean 'a patch and recompile to the -guest- kernel'? Otherwise I'm confused. + +No, the -host- kernel. The problem is that these Atheros WiFi chips do not come back when we do a PCI bus reset. These devices only offer two ways to reset the device, a power management reset and a PCI bus reset. The extent of a power management reset is poorly defined, so we tend to prefer a PCI bus reset. A PCI bus reset is a standard part of the PCI specification and the device is expected to return from reset and be accessible. These devices never recover from reset, resulting in the behavior you're seeing. I've been unsuccessful in contacting Qualcomm/Atheros regarding this problem, so we're effectively blacklisting the devices in the host kernel to disallow the PCI bus reset mechanism. + +I see. And now I also understand that the patch is to the *PCI* driver, not the ath10k driver. + +I was busily trying to get the SRPM for ElRepo's 4.1 kernel to recompile for the guest, but it is not there. He has something called "kernel-ml-4.1.1-1.el7.elrepo.nosrc.rpm" which is incomplete, and so is completely baffling. There are no instructions; instructions are for sissies... + +I was about to give up. + +But since I now see the patch has to be applied to the host kernel, I have a chance at getting the 3.10 SRPM. I guess I'd compile the kernel with rpmbuild and then just graft the PCI module into the regular binary kernel. + + + +Oh this is not good. The current kernel (3.10.0-229.7.2.el7.x86_64) that comes with the current CentOS (7.1) does not have in quirks.c the preceeding or succeeding stanzas in the patch here: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/quirks.c?id=c3e59ee4e76686b0c84ca8faa1011d10cd4ca1b8 + +Apparently that patch was designed for kernel 3.14, so I'd better not move forward with 3.10. Even kernel-plus is still at 3.10. And ElRepo's 4.1 kernel-ml doesn't come with a complete kernel SRPM. So I'm stuck now. I want to stay with el7 packages if possible. I'm running KVM so need a kernel compatible with that. + +I've figured out how to compile ElRepo's kernel-ml and make the change to pci's quirks. But now the KVM's VM won't even boot. It gives a popup with: +"Error starting domain: Unable to read from monitor: Connection reset by peer + +Traceback (most recent call last): + File "/usr/share/virt-manager/virtManager/asyncjob.py", line 89, in cb_wrapper + callback(asyncjob, *args, **kwargs) + File "/usr/share/virt-manager/virtManager/asyncjob.py", line 125, in tmpcb + callback(*args, **kwargs) + File "/usr/share/virt-manager/virtManager/domain.py", line 1393, in startup + self._backend.create() + File "/usr/lib64/python2.7/site-packages/libvirt.py", line 966, in create + if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self) +libvirtError: Unable to read from monitor: Connection reset by peer" + +It only happens when I add the Atheros QCS988x PCI device. + +No idea. + + +Oh I see. It's because the path that was shared on the host is no longer available, apparently causing this weird error message. + +0a:00.0 Network controller: Qualcomm Atheros QCA988x 802.11ac Wireless Network Adapter (rev ff) (prog-if ff) + !!! Unknown header type 7f + +So even earlier than PCI reset on the guest, my device on the host is getting jammed. + +I guess it has to go back. Nobody knows what's wrong. This is a Doodle Labs ACE-DB-3. + + + + +Yep, lack of interest here. The ACE-DB-3 (and probably all QCA988x) simply does not work with Linux. No more time for this. + +Looking through old bug tickets... can you still reproduce this issue with the latest version of QEMU and the kernel? Or could we close this ticket nowadays? + +[Expired for QEMU because there has been no activity for 60 days.] + |