1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
|
watchdog: BUG: soft lockup - CPU#N stuck for XXs!
Description of problem:
Repeatedly seeing Qemu VMs locking up with guest Linux kernel reporting:
"watchdog: BUG: soft lockup - CPU#<N> stuck for <XX>s!"
e.g.: "watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [swapper/5:0]"
When the guest VM is in this condition, the host Linux OS reports that the Qemu process is typically running steadily at ~250% CPU.
Steps to reproduce:
1. Windows 10 on an x64 PC (right on the metal).
2. VMWare Workstation running Fedora Workstation 38 x64 guest, in turn acting as host with nested virtualization.
3. Qemu 7.2.1 running on Fedora host with Fedora Server 38 x64 guest.
4. Invoke Qemu using F38 QCow2 image: `$ qemu-system-x86_64 -machine pc -cpu max -smp 8 -accel kvm -accel hvf -accel tcg -m 3G -nographic -hda Client.qcow2 -nic socket,model=virtio-net-pci,mcast=239.1.2.3:4567,mac=4a:e0:72:85:c0:fb -nic user,model=virtio-net-pci,mac=4a:e0:d8:cd:a5:e6,hostfwd=tcp:127.0.0.1:2288-:22`
5. Not necessarily right away, but pretty consistently if left running overnight, guest Linux kernel repeatedly reports CPU(s) stuck, guest VM is unresponsive
6. Host Linux `top` reports Qemu process using ~250% CPU.
Additional information:
Console log attached, small sample here:
```
[ 181.101152] watchdog: BUG: soft lockup - CPU#5 stuck for 26s! [swapper/5:0]
[ 181.145578] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nfg
[ 181.145578] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.2.9-300.fc38.x86_64 #1
[ 181.145578] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-1.fc38 04/01/2014
[ 181.145578] RIP: 0010:netif_receive_skb_list_internal+0x58/0x300
[ 181.145578] Code: 4c 89 74 24 08 49 89 ec 4c 89 74 24 10 4c 8b 6d 00 48 39 ef 75 14 eb 7c 49 8b 8
[ 181.145578] RSP: 0018:ff5a086d401b8da8 EFLAGS: 00000202
[ 181.145578] RAX: 0000000000000000 RBX: ff2d02b1b404c910 RCX: 0000000000000100
[ 181.145578] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ff2d02b1b404c910
[ 181.145578] RBP: ff2d02b18998a600 R08: 0000000000000001 R09: ff2d02b188bd5d00
[ 181.145578] R10: 000000000000000c R11: ffa7ad3980175000 R12: ff2d02b18998a600
[ 181.145578] R13: ff2d02b1b404c910 R14: ff5a086d401b8db0 R15: ff2d02b1882d19c0
[ 181.145578] FS: 0000000000000000(0000) GS:ff2d02b23cb40000(0000) knlGS:0000000000000000
[ 181.145578] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 181.145578] CR2: 00007f232000f0d8 CR3: 0000000027010000 CR4: 0000000000751ee0
[ 181.145578] PKRU: 55555554
[ 181.145578] Call Trace:
[ 181.145578] <IRQ>
[ 181.145578] napi_complete_done+0x6e/0x1a0
[ 181.145578] virtnet_poll+0x420/0x550 [virtio_net]
[ 181.145578] __napi_poll+0x2b/0x1b0
[ 181.145578] net_rx_action+0x2a5/0x360
[ 181.145578] ? vp_vring_interrupt+0x73/0x90
[ 181.145578] __do_softirq+0xfd/0x31a
[ 181.145578] __irq_exit_rcu+0xd7/0x140
[ 181.145578] common_interrupt+0xb9/0xd0
[ 181.145578] </IRQ>
[ 181.145578] <TASK>
[ 181.145578] asm_common_interrupt+0x22/0x40
[ 181.145578] RIP: 0010:native_safe_halt+0xb/0x10
```
|