Diffstat (limited to 'gitlab/issues/target_missing/host_missing/accel_missing/1195.toml')
| -rw-r--r-- | gitlab/issues/target_missing/host_missing/accel_missing/1195.toml | 28 |
1 files changed, 28 insertions, 0 deletions
diff --git a/gitlab/issues/target_missing/host_missing/accel_missing/1195.toml b/gitlab/issues/target_missing/host_missing/accel_missing/1195.toml
new file mode 100644
index 00000000..ec748ab5
--- /dev/null
+++ b/gitlab/issues/target_missing/host_missing/accel_missing/1195.toml
@@ -0,0 +1,28 @@
+id = 1195
+title = "Race condition during QEMU exit cleanup can lead to deadlock"
+state = "opened"
+created_at = "2022-09-05T18:10:15.519Z"
+closed_at = "n/a"
+labels = ["TCG plugins"]
+url = "https://gitlab.com/qemu-project/qemu/-/issues/1195"
+host-os = "n/a"
+host-arch = "n/a"
+qemu-version = "master"
+guest-os = "n/a"
+guest-arch = "n/a"
+description = """During the cleanup phase of QEMU exiting, there is a small race condition window that can lead QEMU to lock up completely:
+In the main QEMU thread, during the exit, the thread will execute the 'qemu_cleanup' function, which calls 'do_vm_stop', which calls 'pause_all_vcpus'. This method tries to (as the name suggests) stop/pause all the vcpu threads. At the same time, the vcpu thread might have just existed it's main mttcg exec loop, which means it will enter 'qemu_wait_io_event'. At this point, the following race condition can occur:
+- vcpu_thread - cpus.c:416 <= enters qemu_wait_io_event
+- shutdown_thread - cpus.c:555 <= enters pause_all_vcpus
+- vcpu_thread - cpus.c:418 <= cpu_thread_is_idle returns true, cpu->stop not set yet
+- shutdown_thread - cpus.c:560/561 <= sets cpu->stop and kicks the vcpu, but it's not waiting on cpu->halt_cond yet, so nothing happens
+- vcpu_thread - cpus.c:423 <= starts waiting on cpu->halt_cond
+- shutdown_thread - cpus.c:570 <= not all vcpus paused, so enters while loop
+- shutdown_thread - cpus.c:571 <= starts waiting on qemu_pause_cond
+- **deadlock**
+
+In my case, my plugin registers qemu_plugin_vcpu_idle_cb, so the race window is extended significantly in the vcpu thread (cpus.c:421) but I believe it can happen with the smaller race window as well.
+
+Note that this explanation is just based on my understanding of the code, and the final state of QEMU during the deadlock after I attached: The main thread (thread 1) was waiting on qemu_pause_cond in pause_all_vcpus, and the vcpu was waiting on cpu->halt_cond in qemu_wait_io_event, with no one else to wake either of them up. (This was following an exit that was triggered by a timeout signal)"""
+reproduce = """This is a race condition, so I don't have a reliable reproducer."""
+additional = "n/a"
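The interleaving described in the issue is a lost-wakeup pattern: the vcpu thread checks `cpu->stop` and then waits, and the kick arrives between those two steps. The sketch below is a simplified, single-threaded replay of that ordering in Python, not QEMU code. The `State` fields, `replay`, `deadlocked`, and the `recheck_before_wait` switch are all hypothetical names introduced for illustration. The switch stands in for the general remedy of re-testing the predicate atomically with the wait; whether QEMU adopted that approach is not stated in the issue.

```python
from dataclasses import dataclass


@dataclass
class State:
    """Hypothetical, heavily simplified stand-in for the state in the report."""
    stop: bool = False          # models cpu->stop
    vcpu_waiting: bool = False  # vcpu thread blocked on cpu->halt_cond
    vcpu_stopped: bool = False  # vcpu thread has acknowledged the stop request
    main_waiting: bool = False  # main thread blocked on qemu_pause_cond


def replay(recheck_before_wait: bool) -> State:
    """Replay the reported ordering of steps, one at a time."""
    s = State()

    # vcpu thread: enters qemu_wait_io_event; the idle check passes and
    # cpu->stop is not set yet, so it decides to wait.
    should_wait = not s.stop

    # shutdown thread: pause_all_vcpus sets cpu->stop and kicks the vcpu.
    # The vcpu is not waiting on halt_cond yet, so the kick has no effect.
    s.stop = True

    # vcpu thread: about to wait on halt_cond. If the predicate is re-tested
    # at this point (assumed atomic with the wait), the stop is noticed.
    if recheck_before_wait:
        should_wait = not s.stop
    if should_wait:
        s.vcpu_waiting = True
    else:
        s.vcpu_stopped = True

    # shutdown thread: not all vcpus are paused, so it waits on qemu_pause_cond.
    if not s.vcpu_stopped:
        s.main_waiting = True
    return s


def deadlocked(s: State) -> bool:
    """Both threads wait and neither is left to signal the other."""
    return s.vcpu_waiting and s.main_waiting


if __name__ == "__main__":
    print("stale check, then wait:", deadlocked(replay(False)))
    print("re-check before wait:  ", deadlocked(replay(True)))
```

In this model the stale check ends with both threads waiting, which matches the final state the reporter observed after attaching a debugger. A longer gap between the check and the wait, such as an idle callback registered by a plugin, only widens the window in which the kick can be lost.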