Migration fails with rcu_preempt stall messages on Proxmox 8
Description of problem:
When I migrate a VM from one host to another, the migration fails with the following messages:

```
[  584.109502] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  584.109534] rcu: 	1-...!: (0 ticks this GP) idle=1408/0/0x0 softirq=8428/8428 fqs=0 (false positive?)
[  584.109556] 	(detected by 0, t=5252 jiffies, g=2953, q=74 ncpus=2)
[  584.109561] Sending NMI from CPU 0 to CPUs 1:
[  584.109587] NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0xb/0x10
[  584.110564] rcu: rcu_preempt kthread timer wakeup didn't happen for 5251 jiffies! g2953 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[  584.110585] rcu: 	Possible timer handling issue on cpu=1 timer-softirq=8006
[  584.110597] rcu: rcu_preempt kthread starved for 5252 jiffies! g2953 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
[  584.110614] rcu: 	Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[  584.110645] rcu: RCU grace-period kthread stack dump:
[  584.110658] task:rcu_preempt     state:I stack:0     pid:15    ppid:2      flags:0x00004000
[  584.110667] Call Trace:
[  584.110672]  <TASK>
[  584.110688]  __schedule+0x351/0xa20
[  584.110699]  ? rcu_gp_cleanup+0x480/0x480
[  584.110704]  schedule+0x5d/0xe0
[  584.110705]  schedule_timeout+0x94/0x150
[  584.110709]  ? __bpf_trace_tick_stop+0x10/0x10
[  584.110714]  rcu_gp_fqs_loop+0x141/0x4c0
[  584.110717]  rcu_gp_kthread+0xd0/0x190
[  584.110720]  kthread+0xe9/0x110
[  584.110725]  ? kthread_complete_and_exit+0x20/0x20
[  584.110728]  ret_from_fork+0x22/0x30
[  584.110735]  </TASK>
[  584.110736] rcu: Stack dump where RCU GP kthread last ran:
[  584.110747] Sending NMI from CPU 0 to CPUs 1:
[  584.110757] NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0xb/0x10
```

We can reproduce this easily on our R630 cluster, but the same migration works fine on our R730 and R740 clusters.
Steps to reproduce:
1. Create and start a VM.
2. Migrate the VM to another host (a scripted version is sketched below).
3. The migration fails with the messages shown above.
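
For reference, steps 1–2 can also be scripted against the Proxmox VE REST API. The sketch below uses the proxmoxer Python library; the node names, credentials, and VMID are placeholders rather than values from this report, and the test VM is assumed to already exist.

```python
# Minimal reproduction sketch via the Proxmox VE REST API using proxmoxer
# (pip install proxmoxer requests). All identifiers below are placeholders.
from proxmoxer import ProxmoxAPI

proxmox = ProxmoxAPI(
    "pve-r630-1.example.com",  # assumed source node (placeholder)
    user="root@pam",
    password="secret",         # placeholder credentials
    verify_ssl=False,
)

VMID = 100  # assumed test VM, created beforehand (e.g. via the GUI or qm create)

# Step 1: start the test VM on the source node.
proxmox.nodes("pve-r630-1").qemu(VMID).status.start.post()

# Step 2: live-migrate the VM to another node in the cluster;
# equivalent to "qm migrate 100 pve-r630-2 --online" on the CLI.
proxmox.nodes("pve-r630-1").qemu(VMID).migrate.post(
    target="pve-r630-2",  # assumed target node (placeholder)
    online=1,
)

# Step 3: on affected hosts, the guest console then shows the
# rcu_preempt stall messages quoted above.
```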
Additional information:
I downgraded pve-qemu-kvm from 8.1.2-4 to 8.0.2-3; the problem is the same.