Diffstat (limited to 'results/classifier/accel-gemma3:12b/kvm/1581334')
-rw-r--r--  results/classifier/accel-gemma3:12b/kvm/1581334  101
1 file changed, 101 insertions, 0 deletions
diff --git a/results/classifier/accel-gemma3:12b/kvm/1581334 b/results/classifier/accel-gemma3:12b/kvm/1581334
new file mode 100644
index 000000000..4950a6065
--- /dev/null
+++ b/results/classifier/accel-gemma3:12b/kvm/1581334
@@ -0,0 +1,101 @@
+
+qemu + librbd takes high %sy cpu under high random io workload
+
+I have an I/O problem. When I run QEMU + Ceph (using librbd) and start a random I/O benchmark or some other high-load random I/O test, it exhausts all of my host CPU in %sy.
+It does not happen all the time, but once it appears it reproduces every time I start a random I/O benchmark (tested with fio).
+The only way to recover is to shut down my VM and start it again, but the problem comes back under high random I/O load.
+
+Some information:
+    Vendor      : HP
+    Product     : HP ProLiant BL460c Gen9
+    Kernel      : 3.16.0-4-amd64
+    Distro      : Debian
+    Version     : 8.4
+    Arch        : amd64
+    Qemu        : 2.1 ~ 2.6 (yes, I have already tested the latest QEMU 2.6, but the problem persists)
+    Ceph        : Hammer 0.94.5
+    Librbd      : 0.94.5 ~ 10.2 (I rebuilt librbd from the Ceph 10.2 source code, but the problem is still there)
+    Qemu config : cache=none (an example invocation is sketched after this list)
+    Qemu cpu&mem: 4core, 8GB
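+
+For reference, the RBD disk is attached to the guest roughly like this (a minimal sketch only; the pool name, image name and client id are placeholders, not my real configuration):
+----------------------------------------------
+# attach a Ceph RBD image via librbd with host caching disabled (cache=none)
+qemu-system-x86_64 -enable-kvm -smp 4 -m 8192 \
+    -drive file=rbd:rbd/vm-disk-1:id=admin,format=raw,if=virtio,cache=none
+----------------------------------------------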
+
+How can I reproduce the problem?
+
+while :; do bash randwrite.sh ; sleep 3600; done >test.log 2>&1 &
+(The sleep 3600 is the key to reproducing my problem. I don't know exactly how long the sleep needs to be, but one hour is enough: the problem reproduces easily after a long sleep, while if I keep the benchmark running without the sleep I can't reproduce it.)
+
+My randwrite.sh script
+----------------------------------------------
+#!/bin/sh
+# flush outstanding writes and drop the page cache so each run starts cold
+sync
+echo 3 > /proc/sys/vm/drop_caches
+
+FILENAME=/dev/vdc
+RUNTIME=100
+BLOCKSIZE=4K
+IOENGINE=libaio
+RESULTFILE=fio-randwrite.log
+IODEPTH=32
+RAMP_TIME=5
+SIZE=100G
+
+fio --numjobs 10 --norandommap --randrepeat=0 --readwrite=randwrite --ramp_time=$RAMP_TIME --bs=$BLOCKSIZE --runtime=$RUNTIME --iodepth=$IODEPTH --filename=$FILENAME --ioengine=$IOENGINE --direct=1 --name=iops_randwrite --group_reporting  | tee $RESULTFILE
+----------------------------------------------
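+
+To see where the CPU time goes while the benchmark runs, the %usr/%system split of the QEMU process can be watched on the host, for example like this (only an illustration assuming the sysstat package is installed; not the exact command from my debugging):
+----------------------------------------------
+# per-second %usr / %system breakdown for the QEMU process (needs sysstat)
+pidstat -u -p "$(pgrep -o qemu-system-x86)" 1
+----------------------------------------------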
+
+What happens after the problem appears?
+My VM takes a huge IOPS drop; in my case it falls from 15000 IOPS to 3500 IOPS. On top of that, the host CPU is exhausted in %sy. The top output looks like this.
+
+Host top output during the fio benchmark in the guest
+----------------------------------------------------
+Tasks: 284 total,   2 running, 282 sleeping,   0 stopped,   0 zombie
+%Cpu0  : 11.8 us, 66.7 sy,  0.0 ni, 21.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu1  : 12.7 us, 64.9 sy,  0.0 ni, 22.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu2  : 13.7 us, 64.5 sy,  0.0 ni, 21.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu3  : 13.2 us, 64.1 sy,  0.0 ni, 22.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu4  : 11.7 us, 65.4 sy,  0.0 ni, 22.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu5  : 13.2 us, 64.4 sy,  0.0 ni, 22.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu6  : 12.4 us, 65.1 sy,  0.0 ni, 22.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu7  : 13.6 us, 63.8 sy,  0.0 ni, 22.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu8  :  9.8 us, 73.0 sy,  0.0 ni, 17.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu9  :  7.8 us, 74.5 sy,  0.0 ni, 17.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu10 :  6.0 us, 81.4 sy,  0.0 ni,  6.6 id,  0.0 wa,  0.0 hi,  6.0 si,  0.0 st
+%Cpu11 :  8.4 us, 79.5 sy,  0.0 ni,  8.8 id,  0.0 wa,  0.0 hi,  3.4 si,  0.0 st
+%Cpu12 :  7.6 us, 80.7 sy,  0.0 ni,  7.0 id,  0.0 wa,  0.0 hi,  4.7 si,  0.0 st
+%Cpu13 :  7.4 us, 79.9 sy,  0.0 ni,  7.7 id,  0.0 wa,  0.0 hi,  5.0 si,  0.0 st
+%Cpu14 :  9.8 us, 75.4 sy,  0.0 ni, 11.4 id,  0.0 wa,  0.0 hi,  3.4 si,  0.0 st
+%Cpu15 :  6.7 us, 80.1 sy,  0.0 ni, 10.1 id,  0.0 wa,  0.0 hi,  3.0 si,  0.0 st
+%Cpu16 :  9.2 us, 69.2 sy,  0.0 ni, 17.5 id,  0.0 wa,  0.0 hi,  4.1 si,  0.0 st
+%Cpu17 :  9.9 us, 66.6 sy,  0.0 ni, 20.1 id,  0.0 wa,  0.0 hi,  3.4 si,  0.0 st
+%Cpu18 : 16.6 us, 49.0 sy,  0.0 ni, 34.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu19 : 16.7 us, 46.4 sy,  0.0 ni, 36.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu20 : 13.0 us, 50.8 sy,  0.0 ni, 36.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu21 : 18.9 us, 46.2 sy,  0.0 ni, 34.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu22 : 12.1 us, 52.9 sy,  0.0 ni, 35.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu23 : 15.9 us, 47.6 sy,  0.0 ni, 36.6 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu24 :  6.7 us, 62.0 sy,  0.0 ni, 31.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu25 :  7.6 us, 63.7 sy,  0.0 ni, 28.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu26 :  8.1 us, 75.8 sy,  0.0 ni, 16.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu27 :  6.7 us, 73.6 sy,  0.0 ni, 19.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu28 :  9.2 us, 74.3 sy,  0.0 ni, 16.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu29 :  8.2 us, 73.3 sy,  0.0 ni, 18.5 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu30 :  4.4 us, 73.1 sy,  0.0 ni, 22.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+%Cpu31 :  7.5 us, 69.6 sy,  0.0 ni, 22.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
+KiB Mem:  13217662+total,  3721572 used, 12845504+free,   283228 buffers
+KiB Swap:  4194300 total,        0 used,  4194300 free.  2242976 cached Mem
+
+  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
+30349 root      20   0 25.381g 499892  20640 R  2495  0.4 119:11.98 qemu-system-x86
+
+What have I tried?
+I used perf top and profiling to debug the problem. It shows what looks like a thread locking/deadlock problem. I also tested QEMU with kernel RBD (krbd), and that works fine.
+Here is the perf top output on the host.
+---------------------------------------------------------------
+   PerfTop:   12393 irqs/sec  kernel:87.3%  exact:  0.0% [4000Hz cycles],  (all, 32 CPUs)
+-------------------------------------------------------------------------------
+
+    75.25%  [kernel]            [k] _raw_spin_lock
+     1.17%  [kernel]            [k] futex_wait_setup
+     0.86%  libc-2.19.so        [.] malloc
+     0.58%  [kernel]            [k] futex_wake
+     0.55%  libc-2.19.so        [.] 0x00000000000ea96f
+     0.41%  [kernel]            [k] native_write_msr_safe
+---------------------------------------------------------------
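+
+The profile above is from perf top running system-wide on the host. An equivalent way to capture it for later inspection (the options here are illustrative, not the exact invocation I used) would be:
+---------------------------------------------------------------
+# sample all CPUs with call graphs for 30 seconds, then show the hottest symbols
+perf record -a -g -- sleep 30
+perf report --stdio | head -n 40
+---------------------------------------------------------------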
\ No newline at end of file