author    Christian Krinitsin <mail@krinitsin.com>  2025-06-16 16:59:00 +0000
committer Christian Krinitsin <mail@krinitsin.com>  2025-06-16 16:59:33 +0000
commit    9aba81d8eb048db908c94a3c40c25a5fde0caee6 (patch)
tree      b765e7fb5e9a3c2143c68b0414e0055adb70e785 /results/classifier/118/permissions/1581334
parent    b89a938452613061c0f1f23e710281cf5c83cb29 (diff)
download  emulator-bug-study-9aba81d8eb048db908c94a3c40c25a5fde0caee6.tar.gz
          emulator-bug-study-9aba81d8eb048db908c94a3c40c25a5fde0caee6.zip
add 18th iteration of classifier
Diffstat (limited to 'results/classifier/118/permissions/1581334')
-rw-r--r--  results/classifier/118/permissions/1581334  267
1 files changed, 267 insertions, 0 deletions
diff --git a/results/classifier/118/permissions/1581334 b/results/classifier/118/permissions/1581334
new file mode 100644
index 00000000..b1a44839
--- /dev/null
+++ b/results/classifier/118/permissions/1581334
@@ -0,0 +1,267 @@
+permissions: 0.865
+user-level: 0.857
+architecture: 0.790
+register: 0.779
+semantic: 0.776
+mistranslation: 0.758
+performance: 0.749
+debug: 0.733
+graphic: 0.727
+kernel: 0.720
+assembly: 0.717
+device: 0.707
+network: 0.701
+PID: 0.700
+risc-v: 0.697
+hypervisor: 0.685
+ppc: 0.662
+TCG: 0.660
+KVM: 0.659
+VMM: 0.655
+socket: 0.652
+virtual: 0.651
+peripherals: 0.634
+boot: 0.602
+vnc: 0.590
+x86: 0.588
+files: 0.538
+arm: 0.498
+i386: 0.493
+
+qemu + librbd takes high %sy cpu under high random io workload
+
+I have an IO problem. When I run QEMU + Ceph (using librbd) and start a random IO benchmark or some other high-load random IO test, it exhausts all of my host CPU in %sy (system) time.
+It doesn't happen all the time, but once it appears it reproduces every time I start a random IO benchmark (tested with fio).
+The only way to recover is to shut down my VM and start it again, and then the problem eventually comes back under high random IO load.
+
+Some information:
+ Vendor : HP
+ Product : HP ProLiant BL460c Gen9
+ Kernel : 3.16.0-4-amd64
+ Distro : Debian
+ Version : 8.4
+ Arch : amd64
+ Qemu : 2.1 ~ 2.6 (yes, I already tested the latest QEMU 2.6, but the problem is still there)
+ Ceph : Hammer 0.94.5
+ Librbd : 0.94.5 ~ 10.2 (I rebuilt librbd from the Ceph 10.2 source code, but the problem is still there)
+ Qemu config : cache=none
+ Qemu cpu&mem: 4core, 8GB
+
+How can I reproduce the problem?
+
+while :; do bash randwrite.sh ; sleep 3600; done >test.log 2>&1 &
+(The sleep 3600 is the key to reproducing my problem. I don't know exactly how long the sleep has to be, but one hour is enough. The problem reproduces easily after a long sleep; if I keep the benchmark running without sleeping, I can't reproduce it.)
+
+My randwrite.sh script
+----------------------------------------------
+#!/bin/sh
+sync
+echo 3 > /proc/sys/vm/drop_caches
+
+FILENAME=/dev/vdc
+RUNTIME=100
+BLOCKSIZE=4K
+IOENGINE=libaio
+RESULTFILE=fio-randwrite.log
+IODEPTH=32
+RAMP_TIME=5
+SIZE=100G
+
+fio --numjobs 10 --norandommap --randrepeat=0 --readwrite=randwrite --ramp_time=$RAMP_TIME --bs=$BLOCKSIZE --runtime=$RUNTIME --iodepth=$IODEPTH --filename=$FILENAME --ioengine=$IOENGINE --direct=1 --name=iops_randwrite --group_reporting | tee $RESULTFILE
+----------------------------------------------
+
+What happens after the problem appears?
+My VM takes a huge IOPS drop. In my case it drops from 15000 IOPS to 3500 IOPS. In addition, my host CPU is exhausted in %sy. The top output looks like this.
+
+Qemu Fio benchmark
+----------------------------------------------------
+Tasks: 284 total, 2 running, 282 sleeping, 0 stopped, 0 zombie
+%Cpu0 : 11.8 us, 66.7 sy, 0.0 ni, 21.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu1 : 12.7 us, 64.9 sy, 0.0 ni, 22.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu2 : 13.7 us, 64.5 sy, 0.0 ni, 21.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu3 : 13.2 us, 64.1 sy, 0.0 ni, 22.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu4 : 11.7 us, 65.4 sy, 0.0 ni, 22.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu5 : 13.2 us, 64.4 sy, 0.0 ni, 22.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu6 : 12.4 us, 65.1 sy, 0.0 ni, 22.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu7 : 13.6 us, 63.8 sy, 0.0 ni, 22.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu8 : 9.8 us, 73.0 sy, 0.0 ni, 17.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu9 : 7.8 us, 74.5 sy, 0.0 ni, 17.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu10 : 6.0 us, 81.4 sy, 0.0 ni, 6.6 id, 0.0 wa, 0.0 hi, 6.0 si, 0.0 st
+%Cpu11 : 8.4 us, 79.5 sy, 0.0 ni, 8.8 id, 0.0 wa, 0.0 hi, 3.4 si, 0.0 st
+%Cpu12 : 7.6 us, 80.7 sy, 0.0 ni, 7.0 id, 0.0 wa, 0.0 hi, 4.7 si, 0.0 st
+%Cpu13 : 7.4 us, 79.9 sy, 0.0 ni, 7.7 id, 0.0 wa, 0.0 hi, 5.0 si, 0.0 st
+%Cpu14 : 9.8 us, 75.4 sy, 0.0 ni, 11.4 id, 0.0 wa, 0.0 hi, 3.4 si, 0.0 st
+%Cpu15 : 6.7 us, 80.1 sy, 0.0 ni, 10.1 id, 0.0 wa, 0.0 hi, 3.0 si, 0.0 st
+%Cpu16 : 9.2 us, 69.2 sy, 0.0 ni, 17.5 id, 0.0 wa, 0.0 hi, 4.1 si, 0.0 st
+%Cpu17 : 9.9 us, 66.6 sy, 0.0 ni, 20.1 id, 0.0 wa, 0.0 hi, 3.4 si, 0.0 st
+%Cpu18 : 16.6 us, 49.0 sy, 0.0 ni, 34.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu19 : 16.7 us, 46.4 sy, 0.0 ni, 36.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu20 : 13.0 us, 50.8 sy, 0.0 ni, 36.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu21 : 18.9 us, 46.2 sy, 0.0 ni, 34.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu22 : 12.1 us, 52.9 sy, 0.0 ni, 35.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu23 : 15.9 us, 47.6 sy, 0.0 ni, 36.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu24 : 6.7 us, 62.0 sy, 0.0 ni, 31.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu25 : 7.6 us, 63.7 sy, 0.0 ni, 28.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu26 : 8.1 us, 75.8 sy, 0.0 ni, 16.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu27 : 6.7 us, 73.6 sy, 0.0 ni, 19.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu28 : 9.2 us, 74.3 sy, 0.0 ni, 16.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu29 : 8.2 us, 73.3 sy, 0.0 ni, 18.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu30 : 4.4 us, 73.1 sy, 0.0 ni, 22.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+%Cpu31 : 7.5 us, 69.6 sy, 0.0 ni, 22.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
+KiB Mem: 13217662+total, 3721572 used, 12845504+free, 283228 buffers
+KiB Swap: 4194300 total, 0 used, 4194300 free. 2242976 cached Mem
+
+ PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
+30349 root 20 0 25.381g 499892 20640 R 2495 0.4 119:11.98 qemu-system-x86
+
+What have I done so far?
+I used perf top and profiling to debug the problem. It looks like some kind of thread lock contention problem. I also tested QEMU with kernel RBD (krbd), and it works fine.
+Here is the perf top output on the host.
+---------------------------------------------------------------
+ PerfTop: 12393 irqs/sec kernel:87.3% exact: 0.0% [4000Hz cycles], (all, 32 CPUs)
+-------------------------------------------------------------------------------
+
+ 75.25% [kernel] [k] _raw_spin_lock
+ 1.17% [kernel] [k] futex_wait_setup
+ 0.86% libc-2.19.so [.] malloc
+ 0.58% [kernel] [k] futex_wake
+ 0.55% libc-2.19.so [.] 0x00000000000ea96f
+ 0.41% [kernel] [k] native_write_msr_safe
+---------------------------------------------------------------
+
+ affects linux
+
+
+Since this works fine with krbd, it sounds like the bug may be in librbd. Could you install debug symbols (the librbd1-dbg package) and when this occurs, attach to the qemu process with gdb and get a backtrace of all threads (there will be a lot of them) via 'gdb -p $pid' and in gdb 'thread apply all bt'?
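+
+A minimal sketch of the requested steps (assuming a Debian host and a qemu-system-x86 process; adjust the package and process names to the local setup):
+---------------------------------------------------------
+# install the Ceph debug symbols
+apt-get install librbd1-dbg librados2-dbg
+
+# when the slowdown occurs, attach to the qemu process and dump every thread's backtrace
+gdb -p "$(pgrep -o qemu-system-x86)" -batch -ex 'thread apply all bt' > qemu-threads.log
+---------------------------------------------------------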
+
+ affects ceph
+
+
+Here is the gdb output with librbd1-dbg and librados2-dbg installed.
+
+---------------------------------------------------------
+[Thread debugging using libthread_db enabled]
+Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
+0x00007ff8cf8dddff in ppoll () from /lib/x86_64-linux-gnu/libc.so.6
+
+Thread 522 (Thread 0x7ff8ccb56700 (LWP 4959)):
+#0 0x00007ff8cf8e2809 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
+#1 0x00007ff8cf198012 in ?? () from /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0
+#2 0x00007ff8cfbb10a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
+#3 0x00007ff8cf8e687d in clone () from /lib/x86_64-linux-gnu/libc.so.6
+
+Thread 521 (Thread 0x7ff8cc355700 (LWP 4960)):
+#0 0x00007ff8cf8e2809 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
+#1 0x00007ff8cf198012 in ?? () from /usr/lib/x86_64-linux-gnu/liblttng-ust.so.0
+#2 0x00007ff8cfbb10a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
+#3 0x00007ff8cf8e687d in clone () from /lib/x86_64-linux-gnu/libc.so.6
+
+Thread 520 (Thread 0x7ff8cbb54700 (LWP 4961)):
+#0 0x00007ff8cf8e2809 in syscall () from /lib/x86_64-linux-gnu/libc.so.6
+#1 0x0000000000833d00 in futex_wait (ev=0x1222e04 <rcu_call_ready_event>, val=4294967295) at util/qemu-thread-posix.c:292
+#2 0x0000000000833e8e in qemu_event_wait (ev=0x1222e04 <rcu_call_ready_event>) at util/qemu-thread-posix.c:399
+#3 0x0000000000849382 in call_rcu_thread (opaque=0x0) at util/rcu.c:250
+#4 0x00007ff8cfbb10a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
+#5 0x00007ff8cf8e687d in clone () from /lib/x86_64-linux-gnu/libc.so.6
+
+Thread 519 (Thread 0x7ff8cab1c700 (LWP 4974)):
+#0 0x00007ff8cfbb508f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
+#1 0x00007ff8d17a563b in ceph::log::Log::entry (this=0x1643a60) at log/Log.cc:353
+#2 0x00007ff8cfbb10a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
+#3 0x00007ff8cf8e687d in clone () from /lib/x86_64-linux-gnu/libc.so.6
+
+Thread 518 (Thread 0x7ff8c9b8d700 (LWP 4975)):
+#0 0x00007ff8cfbb5438 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
+#1 0x00007ff8d1535acb in WaitUntil (when=..., mutex=..., this=0x1692bd8) at ./common/Cond.h:71
+#2 WaitInterval (interval=..., mutex=..., cct=<optimized out>, this=0x1692bd8) at ./common/Cond.h:79
+#3 CephContextServiceThread::entry (this=0x1692b60) at common/ceph_context.cc:96
+#4 0x00007ff8cfbb10a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
+#5 0x00007ff8cf8e687d in clone () from /lib/x86_64-linux-gnu/libc.so.6
+
+
+Thread 517 (Thread 0x7ff8c938c700 (LWP 4976)):
+#0 0x00007ff8cfbb5438 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
+#1 0x00007ff8d15125c1 in WaitUntil (when=..., mutex=..., this=0x1694978) at common/Cond.h:71
+#2 SafeTimer::timer_thread (this=0x1694968) at common/Timer.cc:118
+#3 0x00007ff8d1512efd in SafeTimerThread::entry (this=<optimized out>) at common/Timer.cc:38
+#4 0x00007ff8cfbb10a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
+#5 0x00007ff8cf8e687d in clone () from /lib/x86_64-linux-gnu/libc.so.6
+
+Thread 516 (Thread 0x7ff8c8b8b700 (LWP 4977)):
+#0 0x00007ff8cfbb508f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
+#1 0x00007ff8d1696b70 in Wait (mutex=..., this=0x1693f50) at ./common/Cond.h:55
+#2 DispatchQueue::entry (this=0x1693ee8) at msg/simple/DispatchQueue.cc:196
+#3 0x00007ff8d16c65fd in DispatchQueue::DispatchThread::entry (this=<optimized out>) at msg/simple/DispatchQueue.h:103
+#4 0x00007ff8cfbb10a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
+#5 0x00007ff8cf8e687d in clone () from /lib/x86_64-linux-gnu/libc.so.6
+
+Thread 515 (Thread 0x7ff8c838a700 (LWP 4978)):
+#0 0x00007ff8cfbb508f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
+#1 0x00007ff8d1699ba6 in Wait (mutex=..., this=0x16940f0) at ./common/Cond.h:55
+#2 DispatchQueue::run_local_delivery (this=0x1693ee8) at msg/simple/DispatchQueue.cc:114
+#3 0x00007ff8d16c66dd in DispatchQueue::LocalDeliveryThread::entry (this=<optimized out>) at msg/simple/DispatchQueue.h:117
+#4 0x00007ff8cfbb10a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
+#5 0x00007ff8cf8e687d in clone () from /lib/x86_64-linux-gnu/libc.so.6
+
+Thread 514 (Thread 0x7ff8c7b89700 (LWP 4979)):
+#0 0x00007ff8cfbb508f in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
+#1 0x00007ff8d16bde99 in Wait (mutex=..., this=0x1694358) at ./common/Cond.h:55
+#2 SimpleMessenger::reaper_entry (this=0x1693d20) at msg/simple/SimpleMessenger.cc:211
+#3 0x00007ff8d16c6cdd in SimpleMessenger::ReaperThread::entry (this=<optimized out>) at msg/simple/SimpleMessenger.h:195
+#4 0x00007ff8cfbb10a4 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
+#5 0x00007ff8cf8e687d in clone () from /lib/x86_64-linux-gnu/libc.so.6
+........
+........
+........
+(the full log is in the attachment: debug.log)
+---------------------------------------------------------------------------
+
+Can you run 'perf top' against just the QEMU process? There was an email chain from nearly a year ago about tcmalloc causing extremely high '_raw_spin_lock' calls under high IOPS scenarios.
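+
+A minimal sketch of the requested per-process profiling (the qemu-system-x86 process name is an assumption; `pgrep -d,` joins multiple PIDs with commas, which is the list format perf expects):
+------------------------------------------------------------------
+# profile only the QEMU process instead of the whole host
+perf top -p "$(pgrep -d, qemu-system-x86)"
+------------------------------------------------------------------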
+
+Here is the 'perf top -p `pgrep qemu` -a' output. I have hit the tcmalloc problem on the OSD host before and fixed it with a larger TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES; the perf top profile of the tcmalloc problem looks a little different from my problem.
+------------------------------------------------------------------
+Samples: 15M of event 'cycles', Event count (approx.): 463535858808
+ 87.68% [kernel] [k] _raw_spin_lock
+ 1.02% [kernel] [k] futex_wait_setup
+ 0.66% [kernel] [k] futex_wake
+ 0.47% libc-2.19.so [.] malloc
+ 0.42% [kernel] [k] try_to_wake_up
+ 0.22% [kernel] [k] __unqueue_futex
+ 0.22% [kernel] [k] _raw_spin_lock_irq
+ 0.18% libpthread-2.19.so [.] pthread_mutex_lock
+ 0.18% libc-2.19.so [.] 0x00000000000f4dff
+ 0.17% libc-2.19.so [.] 0x00000000000ea96f
+ 0.15% [kernel] [k] idle_cpu
+ 0.15% libpthread-2.19.so [.] __pthread_mutex_unlock_usercnt
+ 0.14% [kernel] [k] get_futex_value_locked
+ 0.13% [kernel] [k] get_futex_key_refs.isra.13
+ 0.13% libc-2.19.so [.] free
+ 0.12% [kernel] [k] _raw_spin_lock_irqsave
+ 0.11% [kernel] [k] update_curr
+ 0.11% [kernel] [k] select_task_rq_fair
+ 0.10% [kernel] [k] native_write_msr_safe
+ 0.10% [kernel] [k] futex_wait_queue_me
+ 0.10% [kernel] [k] __schedule
+ 0.09% [kernel] [k] wake_futex
+Press '?' for help on key bindings
+----------------------------------------------------------------
+
+Perf top during the tcmalloc problem (log from the Ceph bug tracker email, the same tcmalloc problem I hit)
+-------------------------------------------------------------
+ 34.37% libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::FetchFromSpans
+ 18.06% libtcmalloc.so.4.1.2 [.] tcmalloc::ThreadCache::ReleaseToCentralCache
+ 13.76% libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::ReleaseToSpans
+ 1.45% libtcmalloc.so.4.1.2 [.] tcmalloc::CentralFreeList::RemoveRange
+-------------------------------------------------------------
+
+Some more tests I have done:
+1. running qemu with TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES set to 256 MB (see the sketch below), still got the problem
+2. preventing the CPU from entering the C3 and C6 states, still got the problem
+3. running qemu with aio=native, still got the problem
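+
+A minimal sketch of how the tcmalloc limit was raised for test 1 (how the variable reaches the qemu process depends on how the VM is launched; this assumes qemu is started from a shell that exports it):
+-------------------------------------------------------------
+# 256 MB tcmalloc thread-cache limit, exported before launching qemu
+export TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES=268435456
+# then start the guest as usual with the normal qemu-system-x86 command line
+-------------------------------------------------------------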
+
+Any chance you can re-test with a more recent kernel on the hypervisor host? If the spin lock contention were coming from user space, I would expect futex_wait_setup and futex_wake to be much higher.
+
+Looking through old bug tickets... can you still reproduce this issue with the latest available versions? Or could we close this ticket nowadays?
+
+Since there wasn't a reply within the last two months, I'm assuming this doesn't affect QEMU anymore, thus I'm closing this ticket for QEMU now.
+