graphic: 0.795
other: 0.757
assembly: 0.755
device: 0.737
semantic: 0.725
instruction: 0.719
network: 0.702
mistranslation: 0.658
vnc: 0.642
boot: 0.635
socket: 0.618
KVM: 0.536

QCA988X Wifi Card Not PCI Passing Through

CPU:  Intel(R) Xeon(R) CPU E3-1265L v3 @ 2.50GHz
KVM:  qemu-kvm-1.5.3-86.el7_1.2.x86_64
Kernel:  4.1.1-1.el7.elrepo.x86_64, and kernel-3.10.0-229.7.2.el7.x86_64
Host & Guest: CentOS 7.1
Using virt-manager-1.1.0-12.el7.noarch to create, configure, and start guest

I am trying to do a PCI passthrough of a QCA988X wifi card.  It's a Doodle Labs military-grade 802.11ac miniPCI card, which uses the ath10k kernel driver.  This card configures nicely on the host, and seems to pass through to the guest, but early in the boot of the guest it says "Unknown header type" at the wifi's bus address.  And sure enough, lspci -vv on the host then shows:
        !!! Unknown header type 7f
        Kernel driver in use: vfio-pci

When the guest has booted, of course it shows as an Unclassified device.  Host and guest must run at least kernel 4.0 so the wifi card's current firmware will load, and so that its driver comes with the kernel.  I have both host and guest set up for the wifi card.  I tried running kernel 3.10 in the host and passing through the PCI device, but same behavior.

I am passing an Intel i350 Ethernet card through to the guest just fine.  In fact, I'm passing two of its SR-IOV virtual functions through to the guest, so passthrough itself works.

On the host, before I start the guest, the wifi card looks like this (lspci -vv):

0a:00.0 Network controller: Qualcomm Atheros QCA988x 802.11ac Wireless Network Adapter
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 43
        Region 0: Memory at f7000000 (64-bit, non-prefetchable) [size=2M]
        Expansion ROM at f7200000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 2
                Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable+ Count=8/8 Maskable+ 64bit-
                Address: fee00618  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [70] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
                DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                        RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                        MaxPayload 128 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <4us, L1 <64us
                        ClockPM- Surprise- LLActRep- BwNot-
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Not Supported, TimeoutDis+, LTR-, OBFF Not Supported
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-
                         EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
        Capabilities: [100 v1] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
                AERCap: First Error Pointer: 00, GenCap- CGenEn- ChkCap- ChkEn-
        Capabilities: [140 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                        Status: NegoPending- InProgress-
        Capabilities: [160 v1] Device Serial Number 00-00-00-00-00-00-00-00
        Kernel driver in use: ath10k_pci

It probably needs a quirk like this to avoid bus resets:

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/quirks.c?id=c3e59ee4e76686b0c84ca8faa1011d10cd4ca1b8

IOW, add a line like this below the line added by the above patch:

DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x003c, quirk_no_bus_reset);

Double check that vendor:device ID against 'lspci -nn', that's 168c:003c.
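
A minimal sketch of that check, assuming the usual `lspci -nn` layout in which each line ends with a bracketed `[vendor:device]` pair. The sample line is illustrative only (it is not pasted from this system), and `parse_pci_id` is a hypothetical helper name:

```python
import re

# `lspci -nn` appends numeric IDs in square brackets. The class code looks
# like "[0280]" (no colon); the vendor:device pair looks like "[168c:003c]".
ID_PATTERN = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def parse_pci_id(lspci_nn_line):
    """Return (vendor, device) as integers, or None if no ID pair is present."""
    matches = ID_PATTERN.findall(lspci_nn_line)
    if not matches:
        return None
    vendor, device = matches[-1]  # the last bracketed pair on the line
    return int(vendor, 16), int(device, 16)


# Illustrative line only; compare against the real output on the host.
SAMPLE = ("0a:00.0 Network controller [0280]: Qualcomm Atheros QCA988x "
          "802.11ac Wireless Network Adapter [168c:003c]")

if __name__ == "__main__":
    vendor, device = parse_pci_id(SAMPLE)
    print("vendor=%04x device=%04x" % (vendor, device))
```

The device half (003c) is what goes into the second argument of the fixup line above; the vendor half (168c) should correspond to PCI_VENDOR_ID_ATHEROS.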

It does sound exactly like what I'm seeing.

From http://www.gossamer-threads.com/lists/linux/kernel/2054846

"Yes. If you *re*start the VM . ...  The first start (after reboot) was not a problem."

It seems clear that this problem began with kernel 3.13.  I tried applying the backports ath10k driver to kernel 3.10, but the kernel didn't recognize it, or the install put it in the wrong place or something.  So I tried kernel 4.1, and the module that comes with it fails this way.  From the mailing list thread I would have thought this would be fixed by kernel 4.1, but maybe my device is too new and the fix has to be device-specific?

Thanks for your instructions Alex, but I don't fully understand.  I searched for an options line I could put in /etc/modprobe.d/blacklist.conf to prevent PCI bus reset for the module, but couldn't find anything.  The "Unknown header type" happens very early in the boot of the guest so I don't see how fixing the guest module would help.

Maybe you're saying that I must compile the backports module with some patch in the above link and add the line you suggest?  Only thing is I don't understand why the module install failed.  And I don't know whether the host should have the patched ath10k module as well as the guest.  Does the host need the module at all for PCI passthrough?



Actually my card locks up *on* boot of the guest, not after its reboot.

This generation of Atheros cards requires  firmware files:  https://wireless.wiki.kernel.org/en/users/Drivers/ath10k/firmware
So kernel 3.10 (default with CentOS 7.1) is not an option;  3.11 is the minimum, and that doesn't allow AP mode which I need.  So hell, I might as well go with ElRepo's ml kernel, unless you recommend otherwise.

I compile the ath10k module (from somewhere), and somehow supplant the default ath10k module that comes with the kernel.



The host kernel ath10k drivers and firmware are irrelevant.  The change I'm asking about in comment #2 requires a patch and recompile to the host kernel.

You mean 'a patch and recompile to the -guest- kernel'?  Otherwise I'm confused.

No, the -host- kernel.  The problem is that these Atheros WiFi chips do not come back when we do a PCI bus reset.  These devices only offer two ways to reset the device, a power management reset and a PCI bus reset.  The extent of a power management reset is poorly defined, so we tend to prefer a PCI bus reset.  A PCI bus reset is a standard part of the PCI specification and the device is expected to return from reset and be accessible.  These devices never recover from reset, resulting in the behavior you're seeing.  I've been unsuccessful in contacting Qualcomm/Atheros regarding this problem, so we're effectively blacklisting the devices in the host kernel to disallow the PCI bus reset mechanism.
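
The two reset options mentioned above can be read off the `lspci -vv` dump earlier in this report: `FLReset-` means there is no function-level reset, and `NoSoftRst-` means a power management reset is possible. Here is a rough sketch of that reasoning, as a deliberately simplified model and not the kernel's actual selection logic (which also considers device-specific resets, slot resets, and bus topology). The function names and the `no_bus_reset_quirk` parameter are hypothetical:

```python
import re


def flag(lspci_vv_text, name):
    """Return True for 'name+', False for 'name-', None if the flag is absent."""
    match = re.search(r"\b" + re.escape(name) + r"([+-])", lspci_vv_text)
    return None if match is None else match.group(1) == "+"


def candidate_resets(lspci_vv_text, no_bus_reset_quirk=False):
    """List reset mechanisms that look usable, in a simplified model."""
    methods = []
    if flag(lspci_vv_text, "FLReset"):
        methods.append("flr")   # function-level reset advertised in DevCap
    if flag(lspci_vv_text, "NoSoftRst") is False:
        methods.append("pm")    # D3hot -> D0 transition resets the function
    if not no_bus_reset_quirk:
        methods.append("bus")   # secondary bus reset via the upstream bridge
    return methods


# The two relevant lines, copied from the lspci -vv output in this report.
SAMPLE = """\
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
                        ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
"""

if __name__ == "__main__":
    print(candidate_resets(SAMPLE))
    print(candidate_resets(SAMPLE, no_bus_reset_quirk=True))
```

In this model, marking the device with the no-bus-reset quirk leaves only the power management reset, which matches the intent described above.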

I see.  And now I also understand that the patch is to the *PCI* driver, not the ath10k driver.

I was busily trying to get the SRPM for ElRepo's 4.1 kernel to recompile for the guest, but it is not there.  He has something called "kernel-ml-4.1.1-1.el7.elrepo.nosrc.rpm" which is incomplete, and so is completely baffling.  There are no instructions;  instructions are for sissies...  

I was about to give up.

But since I now see the patch has to be applied to the host kernel, I have a chance at getting the 3.10 SRPM.   I guess I'd compile the kernel with rpmbuild and then just graft the PCI module into the regular binary kernel.



Oh this is not good.  The current kernel (3.10.0-229.7.2.el7.x86_64) that comes with the current CentOS (7.1) does not have, in quirks.c, the stanzas that precede and follow the change in this patch:  https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/pci/quirks.c?id=c3e59ee4e76686b0c84ca8faa1011d10cd4ca1b8

Apparently that patch was designed for kernel 3.14, so I'd better not move forward with 3.10.  Even kernel-plus is still at 3.10.  And ElRepo's 4.1 kernel-ml doesn't come with a complete kernel SRPM.  So I'm stuck now.  I want to stay with el7 packages if possible.  I'm running KVM so need a kernel compatible with that.

I've figured out how to compile ElRepo's kernel-ml and make the change to pci's quirks.  But now the KVM's VM won't even boot.  It gives a popup with:
"Error starting domain: Unable to read from monitor: Connection reset by peer

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 89, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 125, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1393, in startup
    self._backend.create()
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 966, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: Unable to read from monitor: Connection reset by peer"

It only happens when I add the Atheros QCA988x PCI device.

No idea.


Oh I see.  It's because the path that was shared on the host is no longer available, apparently causing this weird error message.

0a:00.0 Network controller: Qualcomm Atheros QCA988x 802.11ac Wireless Network Adapter (rev ff) (prog-if ff)
        !!! Unknown header type 7f

So even earlier than PCI reset on the guest, my device on the host is getting jammed.
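
For what it's worth, "header type 7f" together with "rev ff" and "prog-if ff" is what you would expect if every configuration-space read is coming back as 0xFF, which is how reads to a device that no longer responds typically appear. The header type byte sits at offset 0x0E, and its top bit is the multi-function flag, so 0xFF with that bit masked off is 0x7F. A small sketch of that decoding, using made-up byte buffers in place of a real device (the helper names are hypothetical):

```python
HEADER_TYPE_OFFSET = 0x0E   # header type byte in the standard PCI config header
MULTIFUNCTION_BIT = 0x80    # top bit of that byte marks a multi-function device


def header_layout(config):
    """Return the header layout code with the multi-function bit masked off."""
    return config[HEADER_TYPE_OFFSET] & ~MULTIFUNCTION_BIT & 0xFF


def looks_unresponsive(config):
    """True when the first 64 bytes of config space all read back as 0xFF."""
    header = config[:64]
    return len(header) == 64 and all(byte == 0xFF for byte in header)


# Synthetic buffers for illustration: one all-ones, one with plausible IDs.
dead = bytes([0xFF] * 64)
healthy = bytearray(64)
healthy[0:4] = bytes([0x8C, 0x16, 0x3C, 0x00])  # vendor 168c, device 003c (little-endian)

if __name__ == "__main__":
    print(hex(header_layout(dead)), looks_unresponsive(dead))        # 0x7f True
    print(hex(header_layout(healthy)), looks_unresponsive(healthy))  # 0x0 False
```

On a live host the same 64 bytes could be read from the device's sysfs `config` file, for example `/sys/bus/pci/devices/0000:0a:00.0/config`, assuming PCI domain 0000 for this card.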

I guess it has to go back.  Nobody knows what's wrong.  This is a Doodle Labs ACE-DB-3.




Yep, lack of interest here.  The ACE-DB-3 (and probably all QCA988x) simply does not work with Linux.  No more time for this.

Looking through old bug tickets... can you still reproduce this issue with the latest version of QEMU and the kernel? Or could we close this ticket nowadays?

[Expired for QEMU because there has been no activity for 60 days.]