Diffstat (limited to 'gitlab/issues/target_missing/host_missing/accel_missing/1577.toml')
-rw-r--r-- gitlab/issues/target_missing/host_missing/accel_missing/1577.toml | 94
1 files changed, 94 insertions, 0 deletions
diff --git a/gitlab/issues/target_missing/host_missing/accel_missing/1577.toml b/gitlab/issues/target_missing/host_missing/accel_missing/1577.toml
new file mode 100644
index 00000000..9dd11418
--- /dev/null
+++ b/gitlab/issues/target_missing/host_missing/accel_missing/1577.toml
@@ -0,0 +1,94 @@
+id = 1577
+title = "device_del return is already in the process of unplug frequently"
+state = "closed"
+created_at = "2023-04-04T09:38:53.125Z"
+closed_at = "2023-06-12T11:21:23.183Z"
+labels = []
+url = "https://gitlab.com/qemu-project/qemu/-/issues/1577"
+host-os = "debian 11"
+host-arch = "x86_64"
+qemu-version = "v6.2 to v8.0"
+guest-os = "- OS/kernel version:"
+guest-arch = "## Description of problem"
+description = """Recently we updated QEMU from 6.1.1 to 7.1.0 and ran into an issue with the following error:
+
+command '{ "execute": "device_del", "arguments": { "id": "virtio-diskX" } }' for VM "id" failed ({ "return": {"class": "GenericError", "desc": "Device virtio-diskX is already in the process of unplug"} }).
+
+The issue is reproducible. With a delay of a few seconds before the hot-unplug, the hot-unplug works fine.
+
+After some digging, we found that commit 9323f892b39 may have introduced the issue.
+------------------ 
+    failover: fix unplug pending detection
+   
+    Failover needs to detect the end of the PCI unplug to start migration
+    after the VFIO card has been unplugged.
+   
+    To do that, a flag is set in pcie_cap_slot_unplug_request_cb() and reset in
+    pcie_unplug_device().
+   
+    But since
+        17858a169508 ("hw/acpi/ich9: Set ACPI PCI hot-plug as default on Q35")
+    we have switched to ACPI unplug and these functions are not called anymore
+    and the flag not set. So failover migration is not able to detect if card
+    is really unplugged and acts as it's done as soon as it's started. So it
+    doesn't wait the end of the unplug to start the migration. We don't see any
+    problem when we test that because ACPI unplug is faster than PCIe native
+    hotplug and when the migration really starts the unplug operation is
+    already done.
+   
+    See c000a9bd06ea ("pci: mark device having guest unplug request pending")
+        a99c4da9fc2a ("pci: mark devices partially unplugged")
+   
+    Signed-off-by: Laurent Vivier <lvivier@redhat.com>
+    Reviewed-by: Ani Sinha <ani@anisinha.ca>
+    Message-Id: <20211118133225.324937-4-lvivier@redhat.com>
+    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
+    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
+------------------  
+The purpose of the commit is to detect the end of the PCI device hot-unplug. However, we find the error confusing. How is it possible that a disk "is already in the process of unplug" during the first hot-unplug attempt? As far as I know, libvirt also encountered the issue, but simply ignored it:
+
+    https://bugzilla.redhat.com/show_bug.cgi?id=1878659
+   
+Hence our question: should the line below be present in acpi_pcihp_device_unplug_request_cb()?
+
+   pdev->qdev.pending_deleted_event = true;
+   
+It would be great if you as the author could give us a few hints.
+
+Thank you very much for your reply!
+
+Sincerely,
+
+Yu Zhang @ Compute Platform IONOS
+
+
+The issue is reproducible in our own stack, which is not easy to describe in a few command lines. We simplified it into the script below. Although the script is not able to reproduce the issue, it may help to understand it.
+ 
+```
+#!/bin/bash
+
+HOME=~
+QEMU=$HOME/qemu/bin/qemu-system-x86_64
+DISK1=$HOME/img/disk1.qcow2
+DISK4=$HOME/img/disk4.qcow2
+DISK5=$HOME/img/disk5.qcow2
+
+$QEMU \\
+  -cpu host -enable-kvm -m 2048 -smp 2 \\
+  -object iothread,id=iothread1 \\
+  -drive file=$DISK1,if=none,id=drive-virtio-disk1,format=qcow2,snapshot=off,discard=on,cache=none \\
+  -device virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk1,iothread=iothread1,num-queues=1,discard=on,id=virtio-disk1 \\
+  -object iothread,id=iothread4 \\
+  -drive file=$DISK4,if=none,id=drive-virtio-disk4,format=qcow2,snapshot=off,discard=on,cache=none \\
+  -device virtio-blk-pci,bus=pci.0,addr=0x5,drive=drive-virtio-disk4,iothread=iothread4,num-queues=1,discard=on,id=virtio-disk4 \\
+  -object iothread,id=iothread5 \\
+  -drive file=$DISK5,if=none,id=drive-virtio-disk5,format=qcow2,snapshot=off,discard=on,cache=none \\
+  -device virtio-blk-pci,bus=pci.0,addr=0x6,drive=drive-virtio-disk5,iothread=iothread5,num-queues=1,discard=on,id=virtio-disk5 \\
+  -qmp unix:./qmp-sock,server,nowait &
+
+sleep 5
+
+echo '{"execute":"qmp_capabilities"}{"execute": "device_del","arguments": { "id": "virtio-disk5"}}{"execute": "query-block"}' | nc -U -w 1 ./qmp-sock
+echo '{"execute":"qmp_capabilities"}{"execute": "device_del","arguments": { "id": "virtio-disk5"}}{"execute": "query-block"}' | nc -U -w 1 ./qmp-sock```"""
+reproduce = "n/a"
+additional = """Possible workaround: https://lore.kernel.org/qemu-devel/20230403131833-mutt-send-email-mst@kernel.org/T/#t"""
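
Note on handling the error from a management stack: the report describes device_del failing with "is already in the process of unplug" when the request is repeated before the guest has finished the unplug. Below is a minimal sketch of how a QMP client might treat that reply as "still pending" and wait for the DEVICE_DELETED event instead of failing. It is not part of the archived issue. It assumes the standard QMP reply shape ({"return": ...} or {"error": {"class": ..., "desc": ...}}) rather than the wrapped form printed by the reporter's stack, and matching on the error text is an assumption of this sketch, not a documented interface. The function names are hypothetical.

```python
# Substring of the error description quoted in the issue. Matching on
# free-form text is an assumption of this sketch and may break if the
# wording changes between QEMU versions.
UNPLUG_PENDING = "is already in the process of unplug"


def classify_device_del_reply(reply):
    """Classify a parsed QMP reply to device_del.

    Returns 'accepted' if the command succeeded, 'pending' if QEMU
    reports that an unplug request is already in flight, and 'failed'
    for any other error.
    """
    if "error" not in reply and "return" in reply:
        return "accepted"
    desc = reply.get("error", {}).get("desc", "")
    if UNPLUG_PENDING in desc:
        return "pending"
    return "failed"


def device_deleted(messages, device_id):
    """Return True if messages contain a DEVICE_DELETED event for device_id.

    messages is any iterable of parsed QMP messages (dicts), for example
    the lines read from the QMP socket after sending device_del.
    """
    for msg in messages:
        if msg.get("event") != "DEVICE_DELETED":
            continue
        if msg.get("data", {}).get("device") == device_id:
            return True
    return False
```

With helpers like these, a caller could send device_del once, treat both 'accepted' and 'pending' as "unplug requested", and consider the device gone only after device_deleted() reports the event for its id.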