qemu-system-x86 crash (reason: use after free in socket_reconnect_timeout when reconnecting vhost-user dev)
Description of problem:
(gdb) bt
#0 0x00007f205976b78b in raise () from /usr/lib64/libc.so.6
#1 0x00007f205976cab1 in abort () from /usr/lib64/libc.so.6
#2 0x00007f205976404a in ?? () from /usr/lib64/libc.so.6
#3 0x00007f20597640c2 in __assert_fail () from /usr/lib64/libc.so.6
#4 0x00007f20594ea556 in **qemu_mutex_lock_impl**(mutex=, file=, line=)
#5 0x00007f205957a4ef in **socket_reconnect_timeout** (opaque=)
#6 0x00007f205993b68d in ?? () from /usr/lib64/libglib-2.0.so.0
#7 0x00007f205993aba4 in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#8 0x00007f20594e5d49 in glib_pollfds_poll () at /usr/src/debug/qemu-4.1.0-666.x86_64/util/main-loop.c:218
#9 0x00007f20594e5dc2 in os_host_main_loop_wait (timeout=)
#10 0x00007f20594e5f5d in main_loop_wait (nonblocking=nonblocking@entry=0)
... ...
#14 0x0000560919e13180 in main (argc=80, argv=0x7ffebc1d0598, envp=0x7ffebc1d0820)
At this point, chr had already been freed by the hot unplug of the vhost-user dev.
I think the cause of the bug is as follows:
1. While the vhost-user dev is in the connecting state, the io-task-worker thread repeatedly tries to call
tcp_chr_connect_client_async to reconnect.
2. If a reconnect attempt fails, the io-task-worker thread hands off to the main thread to handle the error, and the
main thread calls qemu_chr_socket_restart_timer to schedule another reconnect.
3. However, if a hot unplug operation runs on the main thread before the io-task-worker thread hands off to it,
the qemu_chr_socket_restart_timer->socket_reconnect_timeout path uses the already released chardev and
crashes qemu.
In short, the root cause of this bug is a race between the io-task-worker reconnect process and
the main-thread hot unplug of the vhost-user dev.
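One common way to close this kind of race is to have the in-flight reconnect task hold its own reference on the object, so that hot unplug cannot free it before the completion callback has run. The following is only a minimal single-threaded sketch of that idea, replaying the ordering described above. It is not QEMU code and not necessarily how the bug should be fixed upstream; every `Demo*` and `demo_*` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the chardev; not a QEMU type. */
typedef struct DemoChardev {
    int refcount;      /* owners: the device plus any in-flight task */
    bool unplugged;    /* set by hot unplug; no reconnect after this */
    int timers_armed;  /* how many times the reconnect timer was re-armed */
} DemoChardev;

/* Counts frees so a caller can observe when the object really goes away. */
static int demo_free_count;

static DemoChardev *demo_chardev_new(void)
{
    DemoChardev *chr = calloc(1, sizeof(*chr));
    assert(chr != NULL);
    chr->refcount = 1; /* reference held by the device */
    return chr;
}

static void demo_chardev_ref(DemoChardev *chr)
{
    chr->refcount++;
}

static void demo_chardev_unref(DemoChardev *chr)
{
    assert(chr->refcount > 0);
    if (--chr->refcount == 0) {
        demo_free_count++;
        free(chr);
    }
}

/* Worker side: pin the chardev before the connect attempt starts. */
static DemoChardev *demo_task_start(DemoChardev *chr)
{
    demo_chardev_ref(chr);
    return chr;
}

/* Main-loop side: hot unplug drops only the device's reference. */
static void demo_hot_unplug(DemoChardev *chr)
{
    chr->unplugged = true;
    demo_chardev_unref(chr);
}

/* Main-loop side: completion of a failed connect attempt. */
static void demo_task_complete(DemoChardev *chr)
{
    if (!chr->unplugged) {
        chr->timers_armed++; /* device still present: re-arm the timer */
    }
    demo_chardev_unref(chr); /* release the task's pin */
}

/* Replays the ordering from the report: unplug runs before completion.
 * Returns (frees seen before completion) * 10 + (frees seen after). */
static int demo_replay_race(void)
{
    demo_free_count = 0;
    DemoChardev *chr = demo_chardev_new();
    DemoChardev *pinned = demo_task_start(chr);
    demo_hot_unplug(chr);        /* refcount 2 -> 1, nothing freed yet */
    int freed_before = demo_free_count;
    demo_task_complete(pinned);  /* skips re-arm, refcount 1 -> 0 */
    return freed_before * 10 + demo_free_count;
}
```

In this model the unplug only marks the object and drops one reference. The late completion still sees valid memory, notices the `unplugged` flag, skips re-arming the timer, and releases the last reference itself.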
Steps to reproduce:
1. In the qio_task_thread_worker function, add a sleep at the following position:
task->thread->completion = g_idle_source_new();
g_source_set_callback(task->thread->completion,
qio_task_thread_result, task, NULL);
**sleep(8);**
g_source_attach(task->thread->completion,
task->thread->context);
g_source_unref(task->thread->completion);
2. Kill the spdk process or dpdk process; qemu will then try to reconnect to the disconnected vhost-user dev of spdk or dpdk.
3. Hot unplug the disconnected vhost-user dev while the reconnect logic is in the sleep added above.
4. qemu_chr_socket_restart_timer then uses the chr after it has been freed, which crashes qemu.
Additional information: