watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]

Description of problem:

During live move/migration of virtual disks, VMs show severe stuttering and even CPU soft lockups, as described here: https://bugzilla.kernel.org/show_bug.cgi?id=199727

This also happens on some of our virtual machines when the I/O load inside the VM is high or the workload is fsync-centric.

I am searching for a way to mitigate this problem: I can live with stuttering/delays of several seconds, but CPU soft lockups of 22 s or more are unacceptable. I have searched the web for a long time now, but found neither a solution nor a way to troubleshoot this in more depth to find the real root cause.

If this issue report is not accepted because it is "non-native QEMU" (i.e. the Proxmox platform), please tell me which QEMU build/distro I can/should use instead (one with an easy-to-use live migration feature) to try to reproduce the problem.

Steps to reproduce:

1. Do a live migration of one or more virtual machine disks (a plain-QEMU reproduction sketch is given after the ioping trace below).
2. Watch "ioping -WWWYy test.dat" inside the VM being moved to observe disk latency.
3. The disk latency varies heavily; from time to time it goes up to values of tens of seconds, even leading to kernel messages like "kernel:[ 2155.520846] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]".

```
4 KiB >>> test.dat (ext4 /dev/sda1): request=55 time=1.07 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=56 time=1.24 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=57 time=567.4 ms (fast)
4 KiB >>> test.dat (ext4 /dev/sda1): request=58 time=779.0 ms (fast)
4 KiB >>> test.dat (ext4 /dev/sda1): request=59 time=589.0 ms (fast)
4 KiB >>> test.dat (ext4 /dev/sda1): request=60 time=1.57 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=61 time=847.7 ms (fast)
4 KiB >>> test.dat (ext4 /dev/sda1): request=62 time=933.0 ms
4 KiB >>> test.dat (ext4 /dev/sda1): request=63 time=891.4 ms (fast)
4 KiB >>> test.dat (ext4 /dev/sda1): request=64 time=820.8 ms (fast)
4 KiB >>> test.dat (ext4 /dev/sda1): request=65 time=1.02 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=66 time=2.44 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=67 time=620.7 ms (fast)
4 KiB >>> test.dat (ext4 /dev/sda1): request=68 time=1.03 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=69 time=1.24 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=70 time=1.42 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=71 time=1.36 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=72 time=1.41 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=73 time=1.33 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=74 time=2.36 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=75 time=1.46 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=76 time=1.45 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=77 time=1.28 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=78 time=1.41 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=79 time=2.33 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=80 time=1.39 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=81 time=1.35 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=82 time=1.54 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=83 time=1.52 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=84 time=1.50 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=85 time=2.00 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=86 time=1.47 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=87 time=1.26 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=88 time=1.29 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=89 time=2.05 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=90 time=1.44 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=91 time=1.43 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=92 time=1.72 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=93 time=1.77 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=94 time=2.56 s

Message from syslogd@iotest2 at Jan 14 14:51:12 ...
 kernel:[ 2155.520846] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/1:0]

4 KiB >>> test.dat (ext4 /dev/sda1): request=95 time=22.5 s (slow)
4 KiB >>> test.dat (ext4 /dev/sda1): request=96 time=3.56 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=97 time=1.52 s (fast)
4 KiB >>> test.dat (ext4 /dev/sda1): request=98 time=1.69 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=99 time=1.90 s
4 KiB >>> test.dat (ext4 /dev/sda1): request=100 time=1.15 s (fast)
4 KiB >>> test.dat (ext4 /dev/sda1): request=101 time=890.0 ms (fast)
4 KiB >>> test.dat (ext4 /dev/sda1): request=102 time=959.6 ms (fast)
4 KiB >>> test.dat (ext4 /dev/sda1): request=103 time=926.5 ms (fast)
4 KiB >>> test.dat (ext4 /dev/sda1): request=104 time=791.5 ms (fast)
4 KiB >>> test.dat (ext4 /dev/sda1): request=105 time=577.8 ms (fast)
4 KiB >>> test.dat (ext4 /dev/sda1): request=106 time=867.7 ms (fast)
```
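Regarding reproduction outside of Proxmox: the same kind of disk move can be driven on plain QEMU/KVM with the QMP drive-mirror command. Below is a minimal sketch, assuming a bootable qcow2 guest image, a drive id of "disk0" and a QMP socket at /tmp/qmp.sock; these paths and names are placeholders I chose for illustration, not values from the report above.

```
# Sketch: reproduce a live disk move on plain QEMU/KVM.
# Placeholders: image paths, drive id "disk0", socket /tmp/qmp.sock.

# 1. Start a guest with a named drive and a QMP control socket.
qemu-system-x86_64 -enable-kvm -m 2048 -smp 2 \
    -drive file=/var/lib/images/guest.qcow2,if=virtio,id=disk0,format=qcow2 \
    -qmp unix:/tmp/qmp.sock,server,nowait

# 2. Inside the guest, watch write/fsync latency as in the steps above:
#      ioping -WWWYy test.dat

# 3. On the host, start a live block copy ("disk move") via QMP.
#    qmp_capabilities must be sent before any other command; the mirror
#    job keeps running even if this monitor connection is closed later.
(
  echo '{"execute":"qmp_capabilities"}'
  sleep 1
  echo '{"execute":"drive-mirror","arguments":{"device":"disk0","target":"/var/lib/images/guest-moved.qcow2","sync":"full","format":"qcow2"}}'
  sleep 600
) | socat - UNIX-CONNECT:/tmp/qmp.sock
```

While the mirror job is copying, the ioping loop inside the guest should make it possible to check for the same kind of latency spikes as in the trace above.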