diff options
Diffstat (limited to 'results/classifier/009/all')
| -rw-r--r-- | results/classifier/009/all/16056596 | 108 | ||||
| -rw-r--r-- | results/classifier/009/all/17743720 | 781 | ||||
| -rw-r--r-- | results/classifier/009/all/21221931 | 338 | ||||
| -rw-r--r-- | results/classifier/009/all/23448582 | 275 | ||||
| -rw-r--r-- | results/classifier/009/all/51610399 | 318 | ||||
| -rw-r--r-- | results/classifier/009/all/59540920 | 386 | ||||
| -rw-r--r-- | results/classifier/009/all/80570214 | 410 | ||||
| -rw-r--r-- | results/classifier/009/all/88225572 | 2910 | ||||
| -rw-r--r-- | results/classifier/009/all/92957605 | 428 | ||||
| -rw-r--r-- | results/classifier/009/all/95154278 | 165 | ||||
| -rw-r--r-- | results/classifier/009/all/96782458 | 1009 |
11 files changed, 7128 insertions, 0 deletions
diff --git a/results/classifier/009/all/16056596 b/results/classifier/009/all/16056596 new file mode 100644 index 000000000..e6f8e1f9c --- /dev/null +++ b/results/classifier/009/all/16056596 @@ -0,0 +1,108 @@ +permissions: 0.985 +other: 0.980 +semantic: 0.979 +debug: 0.978 +files: 0.975 +device: 0.973 +boot: 0.971 +graphic: 0.970 +performance: 0.970 +PID: 0.961 +socket: 0.952 +vnc: 0.946 +network: 0.940 +KVM: 0.934 + +[BUG][powerpc] KVM Guest Boot Failure and Hang at "Booting Linux via __start()" + +Bug Description: +Encountering a boot failure when launching a KVM guest with +'qemu-system-ppc64'. The guest hangs at boot, and the QEMU monitor +crashes. +Reproduction Steps: +# qemu-system-ppc64 --version +QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f) +Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers +# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine +pseries,accel=kvm \ +-m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \ + -device virtio-scsi-pci,id=scsi \ +-drive +file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 +\ +-device scsi-hd,drive=drive0,bus=scsi.0 \ + -netdev bridge,id=net0,br=virbr0 \ + -device virtio-net-pci,netdev=net0 \ + -serial pty \ + -device virtio-balloon-pci \ + -cpu host +QEMU 9.2.50 monitor - type 'help' for more information +char device redirected to /dev/pts/2 (label serial0) +(qemu) +(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but +unavailable: IRQ_XIVE capability must be present for KVM +Falling back to kernel-irqchip=off +** Qemu Hang + +(In another ssh session) +# screen /dev/pts/2 +Preparing to boot Linux version 6.10.4-200.fc40.ppc64le +(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 +(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 +15:20:17 UTC 2024 +Detected machine type: 0000000000000101 +command line: +BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le +root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M +Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) +Calling ibm,client-architecture-support... done +memory layout at init: + memory_limit : 0000000000000000 (16 MB aligned) + alloc_bottom : 0000000008200000 + alloc_top : 0000000030000000 + alloc_top_hi : 0000000800000000 + rmo_top : 0000000030000000 + ram_top : 0000000800000000 +instantiating rtas at 0x000000002fff0000... done +prom_hold_cpus: skipped +copying OF device tree... +Building dt strings... +Building dt structure... +Device tree strings 0x0000000008210000 -> 0x0000000008210bd0 +Device tree struct 0x0000000008220000 -> 0x0000000008230000 +Quiescing Open Firmware ... +Booting Linux via __start() @ 0x0000000000440000 ... +** Guest Console Hang + + +Git Bisect: +Performing git bisect points to the following patch: +# git bisect bad +e8291ec16da80566c121c68d9112be458954d90b is the first bad commit +commit e8291ec16da80566c121c68d9112be458954d90b (HEAD) +Author: Nicholas Piggin <npiggin@gmail.com> +Date: Thu Dec 19 13:40:31 2024 +1000 + + target/ppc: fix timebase register reset state +(H)DEC and PURR get reset before icount does, which causes them to +be +skewed and not match the init state. This can cause replay to not +match the recorded trace exactly. For DEC and HDEC this is usually +not +noticable since they tend to get programmed before affecting the + target machine. PURR has been observed to cause replay bugs when + running Linux. + + Fix this by resetting using a time of 0. + + Message-ID: <20241219034035.1826173-2-npiggin@gmail.com> + Signed-off-by: Nicholas Piggin <npiggin@gmail.com> + + hw/ppc/ppc.c | 11 ++++++++--- + 1 file changed, 8 insertions(+), 3 deletions(-) + + +Reverting the patch helps boot the guest. +Thanks, +Misbah Anjum N + diff --git a/results/classifier/009/all/17743720 b/results/classifier/009/all/17743720 new file mode 100644 index 000000000..e4ab63d55 --- /dev/null +++ b/results/classifier/009/all/17743720 @@ -0,0 +1,781 @@ +other: 0.984 +permissions: 0.981 +debug: 0.974 +graphic: 0.972 +device: 0.971 +performance: 0.965 +semantic: 0.962 +files: 0.961 +PID: 0.955 +socket: 0.954 +vnc: 0.945 +boot: 0.945 +network: 0.944 +KVM: 0.933 + +[Qemu-devel] [BUG] living migrate vm pause forever + +Sometimes, living migrate vm pause forever, migrate job stop, but very small +probability, I canât reproduce. +qemu wait semaphore from libvirt send migrate continue, however libvirt wait +semaphore from qemu send vm pause. + +follow stack: +qemu: +Thread 6 (Thread 0x7f50445f3700 (LWP 18120)): +#0 0x00007f504b84d670 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0 +#1 0x00005574eda1e164 in qemu_sem_wait (sem=sem@entry=0x5574ef6930e0) at +qemu-2.12/util/qemu-thread-posix.c:322 +#2 0x00005574ed8dd72e in migration_maybe_pause (s=0x5574ef692f50, +current_active_state=0x7f50445f2ae4, new_state=10) + at qemu-2.12/migration/migration.c:2106 +#3 0x00005574ed8df51a in migration_completion (s=0x5574ef692f50) at +qemu-2.12/migration/migration.c:2137 +#4 migration_iteration_run (s=0x5574ef692f50) at +qemu-2.12/migration/migration.c:2311 +#5 migration_thread (opaque=0x5574ef692f50) +atqemu-2.12/migration/migration.c:2415 +#6 0x00007f504b847184 in start_thread () from +/lib/x86_64-linux-gnu/libpthread.so.0 +#7 0x00007f504b574bed in clone () from /lib/x86_64-linux-gnu/libc.so.6 + +libvirt: +Thread 95 (Thread 0x7fdb82ffd700 (LWP 28775)): +#0 0x00007fdd177dc404 in pthread_cond_wait@@GLIBC_2.3.2 () from +/lib/x86_64-linux-gnu/libpthread.so.0 +#1 0x00007fdd198c3b07 in virCondWait (c=0x7fdbc4003000, m=0x7fdbc4002f30) at +../../../src/util/virthread.c:252 +#2 0x00007fdd198f36d2 in virDomainObjWait (vm=0x7fdbc4002f20) at +../../../src/conf/domain_conf.c:3303 +#3 0x00007fdd09ffaa44 in qemuMigrationRun (driver=0x7fdd000037b0, +vm=0x7fdbc4002f20, persist_xml=0x0, + cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss +</hostname>\n +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +flags=777, + resource=0, spec=0x7fdb82ffc670, dconn=0x0, graphicsuri=0x0, +nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, +migParams=0x7fdb82ffc900) + at ../../../src/qemu/qemu_migration.c:3937 +#4 0x00007fdd09ffb26a in doNativeMigrate (driver=0x7fdd000037b0, +vm=0x7fdbc4002f20, persist_xml=0x0, uri=0x7fdb780073a0 +"tcp://172.16.202.17:49152", + cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss</hostname>\n + <hos---Type <return> to continue, or q <return> to quit--- +tuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +flags=777, + resource=0, dconn=0x0, graphicsuri=0x0, nmigrate_disks=0, +migrate_disks=0x0, compression=0x7fdb78007990, migParams=0x7fdb82ffc900) + at ../../../src/qemu/qemu_migration.c:4118 +#5 0x00007fdd09ffd808 in qemuMigrationPerformPhase (driver=0x7fdd000037b0, +conn=0x7fdb500205d0, vm=0x7fdbc4002f20, persist_xml=0x0, + uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, +nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, +migParams=0x7fdb82ffc900, + cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss</hostname>\n + <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +flags=777, + resource=0) at ../../../src/qemu/qemu_migration.c:5030 +#6 0x00007fdd09ffdbb5 in qemuMigrationPerform (driver=0x7fdd000037b0, +conn=0x7fdb500205d0, vm=0x7fdbc4002f20, xmlin=0x0, persist_xml=0x0, +dconnuri=0x0, + uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, +listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0, +compression=0x7fdb78007990, + migParams=0x7fdb82ffc900, + cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss</hostname>\n + <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +flags=777, + dname=0x0, resource=0, v3proto=true) at +../../../src/qemu/qemu_migration.c:5124 +#7 0x00007fdd0a054725 in qemuDomainMigratePerform3 (dom=0x7fdb78007b00, +xmlin=0x0, + cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss</hostname>\n + <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +dconnuri=0x0, + uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, +resource=0) at ../../../src/qemu/qemu_driver.c:12996 +#8 0x00007fdd199ad0f0 in virDomainMigratePerform3 (domain=0x7fdb78007b00, +xmlin=0x0, + cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss</hostname>\n + <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +dconnuri=0x0, + uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, +bandwidth=0) at ../../../src/libvirt-domain.c:4698 +#9 0x000055d13923a939 in remoteDispatchDomainMigratePerform3 +(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, +rerr=0x7fdb82ffcbc0, + args=0x7fdb7800b220, ret=0x7fdb78021e90) at ../../../daemon/remote.c:4528 +#10 0x000055d13921a043 in remoteDispatchDomainMigratePerform3Helper +(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, +rerr=0x7fdb82ffcbc0, + args=0x7fdb7800b220, ret=0x7fdb78021e90) at +../../../daemon/remote_dispatch.h:7944 +#11 0x00007fdd19a260b4 in virNetServerProgramDispatchCall (prog=0x55d13af98b50, +server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620) + at ../../../src/rpc/virnetserverprogram.c:436 +#12 0x00007fdd19a25c17 in virNetServerProgramDispatch (prog=0x55d13af98b50, +server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620) + at ../../../src/rpc/virnetserverprogram.c:307 +#13 0x000055d13925933b in virNetServerProcessMsg (srv=0x55d13af90e60, +client=0x55d13b0156f0, prog=0x55d13af98b50, msg=0x55d13afbf620) + at ../../../src/rpc/virnetserver.c:148 +------------------------------------------------------------------------------------------------------------------------------------- +æ¬é®ä»¶åå ¶é件嫿æ°åä¸éå¢çä¿å¯ä¿¡æ¯ï¼ä» éäºåéç»ä¸é¢å°åä¸ååº +ç个人æç¾¤ç»ãç¦æ¢ä»»ä½å ¶ä»äººä»¥ä»»ä½å½¢å¼ä½¿ç¨ï¼å æ¬ä½ä¸éäºå ¨é¨æé¨åå°æ³é²ãå¤å¶ã +ææ£åï¼æ¬é®ä»¶ä¸çä¿¡æ¯ã妿æ¨éæ¶äºæ¬é®ä»¶ï¼è¯·æ¨ç«å³çµè¯æé®ä»¶éç¥å件人并å 餿¬ +é®ä»¶ï¼ +This e-mail and its attachments contain confidential information from New H3C, +which is +intended only for the person or entity whose address is listed above. Any use +of the +information contained herein in any way (including, but not limited to, total +or partial +disclosure, reproduction, or dissemination) by persons other than the intended +recipient(s) is prohibited. If you receive this e-mail in error, please notify +the sender +by phone or email immediately and delete it! + +* Yuchen (address@hidden) wrote: +> +Sometimes, living migrate vm pause forever, migrate job stop, but very small +> +probability, I canât reproduce. +> +qemu wait semaphore from libvirt send migrate continue, however libvirt wait +> +semaphore from qemu send vm pause. +Hi, + I've copied in Jiri Denemark from libvirt. +Can you confirm exactly which qemu and libvirt versions you're using +please. + +> +follow stack: +> +qemu: +> +Thread 6 (Thread 0x7f50445f3700 (LWP 18120)): +> +#0 0x00007f504b84d670 in sem_wait () from +> +/lib/x86_64-linux-gnu/libpthread.so.0 +> +#1 0x00005574eda1e164 in qemu_sem_wait (sem=sem@entry=0x5574ef6930e0) at +> +qemu-2.12/util/qemu-thread-posix.c:322 +> +#2 0x00005574ed8dd72e in migration_maybe_pause (s=0x5574ef692f50, +> +current_active_state=0x7f50445f2ae4, new_state=10) +> +at qemu-2.12/migration/migration.c:2106 +> +#3 0x00005574ed8df51a in migration_completion (s=0x5574ef692f50) at +> +qemu-2.12/migration/migration.c:2137 +> +#4 migration_iteration_run (s=0x5574ef692f50) at +> +qemu-2.12/migration/migration.c:2311 +> +#5 migration_thread (opaque=0x5574ef692f50) +> +atqemu-2.12/migration/migration.c:2415 +> +#6 0x00007f504b847184 in start_thread () from +> +/lib/x86_64-linux-gnu/libpthread.so.0 +> +#7 0x00007f504b574bed in clone () from /lib/x86_64-linux-gnu/libc.so.6 +In migration_maybe_pause we have: + + migrate_set_state(&s->state, *current_active_state, + MIGRATION_STATUS_PRE_SWITCHOVER); + qemu_sem_wait(&s->pause_sem); + migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER, + new_state); + +the line numbers don't match my 2.12.0 checkout; so I guess that it's +that qemu_sem_wait it's stuck at. + +QEMU must have sent the switch to PRE_SWITCHOVER and that should have +sent an event to libvirt, and libvirt should notice that - I'm +not sure how to tell whether libvirt has seen that event yet or not? + +Dave + +> +libvirt: +> +Thread 95 (Thread 0x7fdb82ffd700 (LWP 28775)): +> +#0 0x00007fdd177dc404 in pthread_cond_wait@@GLIBC_2.3.2 () from +> +/lib/x86_64-linux-gnu/libpthread.so.0 +> +#1 0x00007fdd198c3b07 in virCondWait (c=0x7fdbc4003000, m=0x7fdbc4002f30) at +> +../../../src/util/virthread.c:252 +> +#2 0x00007fdd198f36d2 in virDomainObjWait (vm=0x7fdbc4002f20) at +> +../../../src/conf/domain_conf.c:3303 +> +#3 0x00007fdd09ffaa44 in qemuMigrationRun (driver=0x7fdd000037b0, +> +vm=0x7fdbc4002f20, persist_xml=0x0, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss +> +</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +flags=777, +> +resource=0, spec=0x7fdb82ffc670, dconn=0x0, graphicsuri=0x0, +> +nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, +> +migParams=0x7fdb82ffc900) +> +at ../../../src/qemu/qemu_migration.c:3937 +> +#4 0x00007fdd09ffb26a in doNativeMigrate (driver=0x7fdd000037b0, +> +vm=0x7fdbc4002f20, persist_xml=0x0, uri=0x7fdb780073a0 +> +"tcp://172.16.202.17:49152", +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n <hos---Type <return> to continue, or q <return> +> +to quit--- +> +tuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +flags=777, +> +resource=0, dconn=0x0, graphicsuri=0x0, nmigrate_disks=0, +> +migrate_disks=0x0, compression=0x7fdb78007990, migParams=0x7fdb82ffc900) +> +at ../../../src/qemu/qemu_migration.c:4118 +> +#5 0x00007fdd09ffd808 in qemuMigrationPerformPhase (driver=0x7fdd000037b0, +> +conn=0x7fdb500205d0, vm=0x7fdbc4002f20, persist_xml=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, +> +nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, +> +migParams=0x7fdb82ffc900, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +flags=777, +> +resource=0) at ../../../src/qemu/qemu_migration.c:5030 +> +#6 0x00007fdd09ffdbb5 in qemuMigrationPerform (driver=0x7fdd000037b0, +> +conn=0x7fdb500205d0, vm=0x7fdbc4002f20, xmlin=0x0, persist_xml=0x0, +> +dconnuri=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, +> +listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0, +> +compression=0x7fdb78007990, +> +migParams=0x7fdb82ffc900, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +flags=777, +> +dname=0x0, resource=0, v3proto=true) at +> +../../../src/qemu/qemu_migration.c:5124 +> +#7 0x00007fdd0a054725 in qemuDomainMigratePerform3 (dom=0x7fdb78007b00, +> +xmlin=0x0, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +dconnuri=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, +> +resource=0) at ../../../src/qemu/qemu_driver.c:12996 +> +#8 0x00007fdd199ad0f0 in virDomainMigratePerform3 (domain=0x7fdb78007b00, +> +xmlin=0x0, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +dconnuri=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, +> +bandwidth=0) at ../../../src/libvirt-domain.c:4698 +> +#9 0x000055d13923a939 in remoteDispatchDomainMigratePerform3 +> +(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, +> +rerr=0x7fdb82ffcbc0, +> +args=0x7fdb7800b220, ret=0x7fdb78021e90) at ../../../daemon/remote.c:4528 +> +#10 0x000055d13921a043 in remoteDispatchDomainMigratePerform3Helper +> +(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, +> +rerr=0x7fdb82ffcbc0, +> +args=0x7fdb7800b220, ret=0x7fdb78021e90) at +> +../../../daemon/remote_dispatch.h:7944 +> +#11 0x00007fdd19a260b4 in virNetServerProgramDispatchCall +> +(prog=0x55d13af98b50, server=0x55d13af90e60, client=0x55d13b0156f0, +> +msg=0x55d13afbf620) +> +at ../../../src/rpc/virnetserverprogram.c:436 +> +#12 0x00007fdd19a25c17 in virNetServerProgramDispatch (prog=0x55d13af98b50, +> +server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620) +> +at ../../../src/rpc/virnetserverprogram.c:307 +> +#13 0x000055d13925933b in virNetServerProcessMsg (srv=0x55d13af90e60, +> +client=0x55d13b0156f0, prog=0x55d13af98b50, msg=0x55d13afbf620) +> +at ../../../src/rpc/virnetserver.c:148 +> +------------------------------------------------------------------------------------------------------------------------------------- +> +æ¬é®ä»¶åå ¶é件嫿æ°åä¸éå¢çä¿å¯ä¿¡æ¯ï¼ä» éäºåéç»ä¸é¢å°åä¸ååº +> +ç个人æç¾¤ç»ãç¦æ¢ä»»ä½å ¶ä»äººä»¥ä»»ä½å½¢å¼ä½¿ç¨ï¼å æ¬ä½ä¸éäºå ¨é¨æé¨åå°æ³é²ãå¤å¶ã +> +ææ£åï¼æ¬é®ä»¶ä¸çä¿¡æ¯ã妿æ¨éæ¶äºæ¬é®ä»¶ï¼è¯·æ¨ç«å³çµè¯æé®ä»¶éç¥å件人并å 餿¬ +> +é®ä»¶ï¼ +> +This e-mail and its attachments contain confidential information from New +> +H3C, which is +> +intended only for the person or entity whose address is listed above. Any use +> +of the +> +information contained herein in any way (including, but not limited to, total +> +or partial +> +disclosure, reproduction, or dissemination) by persons other than the intended +> +recipient(s) is prohibited. If you receive this e-mail in error, please +> +notify the sender +> +by phone or email immediately and delete it! +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +In migration_maybe_pause we have: + + migrate_set_state(&s->state, *current_active_state, + MIGRATION_STATUS_PRE_SWITCHOVER); + qemu_sem_wait(&s->pause_sem); + migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER, + new_state); + +the line numbers don't match my 2.12.0 checkout; so I guess that it's that +qemu_sem_wait it's stuck at. + +QEMU must have sent the switch to PRE_SWITCHOVER and that should have sent an +event to libvirt, and libvirt should notice that - I'm not sure how to tell +whether libvirt has seen that event yet or not? + + +Thank you for your attention. +Yes, you are right, QEMU wait semaphore in this place. +I use qemu-2.12.1, libvirt-4.0.0. +Because I added some debug code, so the line numbers doesn't match open qemu + +-----é®ä»¶åä»¶----- +å件人: Dr. David Alan Gilbert [ +mailto:address@hidden +] +åéæ¶é´: 2019å¹´8æ21æ¥ 19:13 +æ¶ä»¶äºº: yuchen (Cloud) <address@hidden>; address@hidden +æé: address@hidden +主é¢: Re: [Qemu-devel] [BUG] living migrate vm pause forever + +* Yuchen (address@hidden) wrote: +> +Sometimes, living migrate vm pause forever, migrate job stop, but very small +> +probability, I canât reproduce. +> +qemu wait semaphore from libvirt send migrate continue, however libvirt wait +> +semaphore from qemu send vm pause. +Hi, + I've copied in Jiri Denemark from libvirt. +Can you confirm exactly which qemu and libvirt versions you're using please. + +> +follow stack: +> +qemu: +> +Thread 6 (Thread 0x7f50445f3700 (LWP 18120)): +> +#0 0x00007f504b84d670 in sem_wait () from +> +/lib/x86_64-linux-gnu/libpthread.so.0 +> +#1 0x00005574eda1e164 in qemu_sem_wait (sem=sem@entry=0x5574ef6930e0) +> +at qemu-2.12/util/qemu-thread-posix.c:322 +> +#2 0x00005574ed8dd72e in migration_maybe_pause (s=0x5574ef692f50, +> +current_active_state=0x7f50445f2ae4, new_state=10) +> +at qemu-2.12/migration/migration.c:2106 +> +#3 0x00005574ed8df51a in migration_completion (s=0x5574ef692f50) at +> +qemu-2.12/migration/migration.c:2137 +> +#4 migration_iteration_run (s=0x5574ef692f50) at +> +qemu-2.12/migration/migration.c:2311 +> +#5 migration_thread (opaque=0x5574ef692f50) +> +atqemu-2.12/migration/migration.c:2415 +> +#6 0x00007f504b847184 in start_thread () from +> +/lib/x86_64-linux-gnu/libpthread.so.0 +> +#7 0x00007f504b574bed in clone () from +> +/lib/x86_64-linux-gnu/libc.so.6 +In migration_maybe_pause we have: + + migrate_set_state(&s->state, *current_active_state, + MIGRATION_STATUS_PRE_SWITCHOVER); + qemu_sem_wait(&s->pause_sem); + migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER, + new_state); + +the line numbers don't match my 2.12.0 checkout; so I guess that it's that +qemu_sem_wait it's stuck at. + +QEMU must have sent the switch to PRE_SWITCHOVER and that should have sent an +event to libvirt, and libvirt should notice that - I'm not sure how to tell +whether libvirt has seen that event yet or not? + +Dave + +> +libvirt: +> +Thread 95 (Thread 0x7fdb82ffd700 (LWP 28775)): +> +#0 0x00007fdd177dc404 in pthread_cond_wait@@GLIBC_2.3.2 () from +> +/lib/x86_64-linux-gnu/libpthread.so.0 +> +#1 0x00007fdd198c3b07 in virCondWait (c=0x7fdbc4003000, +> +m=0x7fdbc4002f30) at ../../../src/util/virthread.c:252 +> +#2 0x00007fdd198f36d2 in virDomainObjWait (vm=0x7fdbc4002f20) at +> +../../../src/conf/domain_conf.c:3303 +> +#3 0x00007fdd09ffaa44 in qemuMigrationRun (driver=0x7fdd000037b0, +> +vm=0x7fdbc4002f20, persist_xml=0x0, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss +> +</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +flags=777, +> +resource=0, spec=0x7fdb82ffc670, dconn=0x0, graphicsuri=0x0, +> +nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, +> +migParams=0x7fdb82ffc900) +> +at ../../../src/qemu/qemu_migration.c:3937 +> +#4 0x00007fdd09ffb26a in doNativeMigrate (driver=0x7fdd000037b0, +> +vm=0x7fdbc4002f20, persist_xml=0x0, uri=0x7fdb780073a0 +> +"tcp://172.16.202.17:49152", +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n +> +<name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n <hos---Type <return> to continue, or q +> +<return> to quit--- +> +tuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra".. +> +tuuid>., cookieinlen=207, cookieout=0x7fdb82ffcad0, +> +tuuid>cookieoutlen=0x7fdb82ffcac8, flags=777, +> +resource=0, dconn=0x0, graphicsuri=0x0, nmigrate_disks=0, +> +migrate_disks=0x0, compression=0x7fdb78007990, migParams=0x7fdb82ffc900) +> +at ../../../src/qemu/qemu_migration.c:4118 +> +#5 0x00007fdd09ffd808 in qemuMigrationPerformPhase (driver=0x7fdd000037b0, +> +conn=0x7fdb500205d0, vm=0x7fdbc4002f20, persist_xml=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, +> +nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, +> +migParams=0x7fdb82ffc900, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +flags=777, +> +resource=0) at ../../../src/qemu/qemu_migration.c:5030 +> +#6 0x00007fdd09ffdbb5 in qemuMigrationPerform (driver=0x7fdd000037b0, +> +conn=0x7fdb500205d0, vm=0x7fdbc4002f20, xmlin=0x0, persist_xml=0x0, +> +dconnuri=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, +> +listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0, +> +compression=0x7fdb78007990, +> +migParams=0x7fdb82ffc900, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +flags=777, +> +dname=0x0, resource=0, v3proto=true) at +> +../../../src/qemu/qemu_migration.c:5124 +> +#7 0x00007fdd0a054725 in qemuDomainMigratePerform3 (dom=0x7fdb78007b00, +> +xmlin=0x0, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +dconnuri=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, +> +dname=0x0, resource=0) at ../../../src/qemu/qemu_driver.c:12996 +> +#8 0x00007fdd199ad0f0 in virDomainMigratePerform3 (domain=0x7fdb78007b00, +> +xmlin=0x0, +> +cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n +> +<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n +> +<hostname>mss</hostname>\n +> +<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., +> +cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, +> +dconnuri=0x0, +> +uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, +> +dname=0x0, bandwidth=0) at ../../../src/libvirt-domain.c:4698 +> +#9 0x000055d13923a939 in remoteDispatchDomainMigratePerform3 +> +(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, +> +rerr=0x7fdb82ffcbc0, +> +args=0x7fdb7800b220, ret=0x7fdb78021e90) at +> +../../../daemon/remote.c:4528 +> +#10 0x000055d13921a043 in remoteDispatchDomainMigratePerform3Helper +> +(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, +> +rerr=0x7fdb82ffcbc0, +> +args=0x7fdb7800b220, ret=0x7fdb78021e90) at +> +../../../daemon/remote_dispatch.h:7944 +> +#11 0x00007fdd19a260b4 in virNetServerProgramDispatchCall +> +(prog=0x55d13af98b50, server=0x55d13af90e60, client=0x55d13b0156f0, +> +msg=0x55d13afbf620) +> +at ../../../src/rpc/virnetserverprogram.c:436 +> +#12 0x00007fdd19a25c17 in virNetServerProgramDispatch (prog=0x55d13af98b50, +> +server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620) +> +at ../../../src/rpc/virnetserverprogram.c:307 +> +#13 0x000055d13925933b in virNetServerProcessMsg (srv=0x55d13af90e60, +> +client=0x55d13b0156f0, prog=0x55d13af98b50, msg=0x55d13afbf620) +> +at ../../../src/rpc/virnetserver.c:148 +> +---------------------------------------------------------------------- +> +--------------------------------------------------------------- +> +æ¬é®ä»¶åå ¶é件嫿æ°åä¸éå¢çä¿å¯ä¿¡æ¯ï¼ä» éäºåéç»ä¸é¢å°åä¸ååº +> +ç个人æç¾¤ç»ãç¦æ¢ä»»ä½å ¶ä»äººä»¥ä»»ä½å½¢å¼ä½¿ç¨ï¼å æ¬ä½ä¸éäºå ¨é¨æé¨åå°æ³é²ãå¤å¶ã +> +ææ£åï¼æ¬é®ä»¶ä¸çä¿¡æ¯ã妿æ¨éæ¶äºæ¬é®ä»¶ï¼è¯·æ¨ç«å³çµè¯æé®ä»¶éç¥å件人并å 餿¬ +> +é®ä»¶ï¼ +> +This e-mail and its attachments contain confidential information from +> +New H3C, which is intended only for the person or entity whose address +> +is listed above. Any use of the information contained herein in any +> +way (including, but not limited to, total or partial disclosure, +> +reproduction, or dissemination) by persons other than the intended +> +recipient(s) is prohibited. If you receive this e-mail in error, +> +please notify the sender by phone or email immediately and delete it! +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + diff --git a/results/classifier/009/all/21221931 b/results/classifier/009/all/21221931 new file mode 100644 index 000000000..a925c3002 --- /dev/null +++ b/results/classifier/009/all/21221931 @@ -0,0 +1,338 @@ +permissions: 0.982 +other: 0.979 +network: 0.976 +device: 0.971 +debug: 0.971 +files: 0.967 +semantic: 0.967 +performance: 0.966 +socket: 0.957 +graphic: 0.948 +boot: 0.947 +PID: 0.945 +vnc: 0.944 +KVM: 0.913 + +[BUG] qemu git error with virgl + +Hello, + +i can't start any system if i use virgl. I get the following error: +qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion +`con->gl' failed. +./and.sh: line 27: 3337167 Aborted                qemu-x86_64 -m 4096 +-smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device +virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device +intel-hda,id=sound0,msi=on -device +hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci +-device usb-tablet,bus=xhci.0 -net +nic,macaddr=52:54:00:12:34:62,model=e1000 -net +tap,ifname=$INTERFACE,script=no,downscript=no -drive +file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads +Set 'tap3' nonpersistent + +i have bicected the issue: + +towo:Defiant> git bisect good +b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit +commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4 +Author: Paolo Bonzini <pbonzini@redhat.com> +Date:  Tue Oct 27 08:44:23 2020 -0400 + +   vl: remove separate preconfig main_loop +   Move post-preconfig initialization to the x-exit-preconfig. If +preconfig +   is not requested, just exit preconfig mode immediately with the QMP +   command. + +   As a result, the preconfig loop will run with accel_setup_post +   and os_setup_post restrictions (xen_restrict, chroot, etc.) +   already done. + +   Reviewed-by: Igor Mammedov <imammedo@redhat.com> +   Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> + + include/sysemu/runstate.h | 1 - + monitor/qmp-cmds.c       | 9 ----- + softmmu/vl.c             | 95 +++++++++++++++++++++--------------------------- + 3 files changed, 41 insertions(+), 64 deletions(-) + +Regards, + +Torsten Wohlfarth + +Cc'ing Gerd + patch author/reviewer. + +On 1/2/21 2:11 PM, Torsten Wohlfarth wrote: +> +Hello, +> +> +i can't start any system if i use virgl. I get the following error: +> +> +qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion +> +`con->gl' failed. +> +./and.sh: line 27: 3337167 Aborted                qemu-x86_64 -m 4096 +> +-smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device +> +virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device +> +intel-hda,id=sound0,msi=on -device +> +hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci +> +-device usb-tablet,bus=xhci.0 -net +> +nic,macaddr=52:54:00:12:34:62,model=e1000 -net +> +tap,ifname=$INTERFACE,script=no,downscript=no -drive +> +file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads +> +> +Set 'tap3' nonpersistent +> +> +i have bicected the issue: +> +> +towo:Defiant> git bisect good +> +b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit +> +commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4 +> +Author: Paolo Bonzini <pbonzini@redhat.com> +> +Date:  Tue Oct 27 08:44:23 2020 -0400 +> +> +   vl: remove separate preconfig main_loop +> +> +   Move post-preconfig initialization to the x-exit-preconfig. If +> +preconfig +> +   is not requested, just exit preconfig mode immediately with the QMP +> +   command. +> +> +   As a result, the preconfig loop will run with accel_setup_post +> +   and os_setup_post restrictions (xen_restrict, chroot, etc.) +> +   already done. +> +> +   Reviewed-by: Igor Mammedov <imammedo@redhat.com> +> +   Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> +> +> + include/sysemu/runstate.h | 1 - +> + monitor/qmp-cmds.c       | 9 ----- +> + softmmu/vl.c             | 95 +> +++++++++++++++++++++--------------------------- +> + 3 files changed, 41 insertions(+), 64 deletions(-) +> +> +Regards, +> +> +Torsten Wohlfarth +> +> +> + +On Sun, 3 Jan 2021 18:28:11 +0100 +Philippe Mathieu-Daudé <philmd@redhat.com> wrote: + +> +Cc'ing Gerd + patch author/reviewer. +> +> +On 1/2/21 2:11 PM, Torsten Wohlfarth wrote: +> +> Hello, +> +> +> +> i can't start any system if i use virgl. I get the following error: +> +> +> +> qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion +> +> `con->gl' failed. +Does following fix issue: + [PULL 12/55] vl: initialize displays _after_ exiting preconfiguration + +> +> ./and.sh: line 27: 3337167 Aborted                qemu-x86_64 -m 4096 +> +> -smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device +> +> virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device +> +> intel-hda,id=sound0,msi=on -device +> +> hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci +> +> -device usb-tablet,bus=xhci.0 -net +> +> nic,macaddr=52:54:00:12:34:62,model=e1000 -net +> +> tap,ifname=$INTERFACE,script=no,downscript=no -drive +> +> file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads +> +> +> +> Set 'tap3' nonpersistent +> +> +> +> i have bicected the issue: +> +> +> +> towo:Defiant> git bisect good +> +> b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit +> +> commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4 +> +> Author: Paolo Bonzini <pbonzini@redhat.com> +> +> Date:  Tue Oct 27 08:44:23 2020 -0400 +> +> +> +>    vl: remove separate preconfig main_loop +> +> +> +>    Move post-preconfig initialization to the x-exit-preconfig. If +> +> preconfig +> +>    is not requested, just exit preconfig mode immediately with the QMP +> +>    command. +> +> +> +>    As a result, the preconfig loop will run with accel_setup_post +> +>    and os_setup_post restrictions (xen_restrict, chroot, etc.) +> +>    already done. +> +> +> +>    Reviewed-by: Igor Mammedov <imammedo@redhat.com> +> +>    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> +> +> +> +>  include/sysemu/runstate.h | 1 - +> +>  monitor/qmp-cmds.c       | 9 ----- +> +>  softmmu/vl.c             | 95 +> +> ++++++++++++++++++++--------------------------- +> +>  3 files changed, 41 insertions(+), 64 deletions(-) +> +> +> +> Regards, +> +> +> +> Torsten Wohlfarth +> +> +> +> +> +> +> +> + +Hi Igor, + +yes, that fixes my issue. + +Regards, Torsten + +Am 04.01.21 um 19:50 schrieb Igor Mammedov: +On Sun, 3 Jan 2021 18:28:11 +0100 +Philippe Mathieu-Daudé <philmd@redhat.com> wrote: +Cc'ing Gerd + patch author/reviewer. + +On 1/2/21 2:11 PM, Torsten Wohlfarth wrote: +Hello, + +i can't start any system if i use virgl. I get the following error: + +qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion +`con->gl' failed. +Does following fix issue: + [PULL 12/55] vl: initialize displays _after_ exiting preconfiguration +./and.sh: line 27: 3337167 Aborted                qemu-x86_64 -m 4096 +-smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device +virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device +intel-hda,id=sound0,msi=on -device +hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci +-device usb-tablet,bus=xhci.0 -net +nic,macaddr=52:54:00:12:34:62,model=e1000 -net +tap,ifname=$INTERFACE,script=no,downscript=no -drive +file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads + +Set 'tap3' nonpersistent + +i have bicected the issue: +towo:Defiant> git bisect good +b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit +commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4 +Author: Paolo Bonzini <pbonzini@redhat.com> +Date:  Tue Oct 27 08:44:23 2020 -0400 + +    vl: remove separate preconfig main_loop + +    Move post-preconfig initialization to the x-exit-preconfig. If +preconfig +    is not requested, just exit preconfig mode immediately with the QMP +    command. + +    As a result, the preconfig loop will run with accel_setup_post +    and os_setup_post restrictions (xen_restrict, chroot, etc.) +    already done. + +    Reviewed-by: Igor Mammedov <imammedo@redhat.com> +    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> + +  include/sysemu/runstate.h | 1 - +  monitor/qmp-cmds.c       | 9 ----- +  softmmu/vl.c             | 95 +++++++++++++++++++++--------------------------- +  3 files changed, 41 insertions(+), 64 deletions(-) + +Regards, + +Torsten Wohlfarth + diff --git a/results/classifier/009/all/23448582 b/results/classifier/009/all/23448582 new file mode 100644 index 000000000..4cb453f2e --- /dev/null +++ b/results/classifier/009/all/23448582 @@ -0,0 +1,275 @@ +other: 0.990 +debug: 0.989 +permissions: 0.988 +semantic: 0.987 +graphic: 0.987 +performance: 0.985 +PID: 0.983 +socket: 0.982 +files: 0.979 +device: 0.979 +network: 0.973 +vnc: 0.973 +boot: 0.967 +KVM: 0.958 + +[BUG REPORT] cxl process in infinity loop + +Hi, all + +When I did the cxl memory hot-plug test on QEMU, I accidentally connected +two memdev to the same downstream port, the command like below: + +> +-object memory-backend-ram,size=262144k,share=on,id=vmem0 \ +> +-object memory-backend-ram,size=262144k,share=on,id=vmem1 \ +> +-device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ +> +-device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \ +> +-device cxl-upstream,bus=root_port0,id=us0 \ +> +-device cxl-downstream,port=0,bus=us0,id=swport00,chassis=0,slot=5 \ +> +-device cxl-downstream,port=0,bus=us0,id=swport01,chassis=0,slot=7 \ +same downstream port but has different slot! + +> +-device cxl-type3,bus=swport00,volatile-memdev=vmem0,id=cxl-vmem0 \ +> +-device cxl-type3,bus=swport01,volatile-memdev=vmem1,id=cxl-vmem1 \ +> +-M +> +cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=64G,cxl-fmw.0.interleave-granularity=4k +> +\ +There is no error occurred when vm start, but when I executed the âcxl listâ +command to view +the CXL objects info, the process can not end properly. + +Then I used strace to trace the process, I found that the process is in +infinity loop: +# strace cxl list +...... +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +write(3, "1\n\0", 3) = 3 +close(3) = 0 +access("/run/udev/queue", F_OK) = 0 + +[Environment]: +linux: V6.10-rc3 +QEMU: V9.0.0 +ndctl: v79 + +I know this is because of the wrong use of the QEMU command, but I think we +should +be aware of this error in one of the QEMU, OS or ndctl side at least. + +Thanks +Xingtao + +On Tue, 2 Jul 2024 00:30:06 +0000 +"Xingtao Yao (Fujitsu)" <yaoxt.fnst@fujitsu.com> wrote: + +> +Hi, all +> +> +When I did the cxl memory hot-plug test on QEMU, I accidentally connected +> +two memdev to the same downstream port, the command like below: +> +> +> -object memory-backend-ram,size=262144k,share=on,id=vmem0 \ +> +> -object memory-backend-ram,size=262144k,share=on,id=vmem1 \ +> +> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ +> +> -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \ +> +> -device cxl-upstream,bus=root_port0,id=us0 \ +> +> -device cxl-downstream,port=0,bus=us0,id=swport00,chassis=0,slot=5 \ +> +> -device cxl-downstream,port=0,bus=us0,id=swport01,chassis=0,slot=7 \ +> +same downstream port but has different slot! +> +> +> -device cxl-type3,bus=swport00,volatile-memdev=vmem0,id=cxl-vmem0 \ +> +> -device cxl-type3,bus=swport01,volatile-memdev=vmem1,id=cxl-vmem1 \ +> +> -M +> +> cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=64G,cxl-fmw.0.interleave-granularity=4k +> +> \ +> +> +There is no error occurred when vm start, but when I executed the âcxl listâ +> +command to view +> +the CXL objects info, the process can not end properly. +I'd be happy to look preventing this on QEMU side if you send one, +but in general there are are lots of ways to shoot yourself in the +foot with CXL and PCI device emulation in QEMU so I'm not going +to rush to solve this specific one. + +Likewise, some hardening in kernel / userspace probably makes sense but +this is a non compliant switch so priority of a fix is probably fairly low. + +Jonathan + +> +> +Then I used strace to trace the process, I found that the process is in +> +infinity loop: +> +# strace cxl list +> +...... +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> +access("/run/udev/queue", F_OK) = 0 +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> +access("/run/udev/queue", F_OK) = 0 +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> +access("/run/udev/queue", F_OK) = 0 +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> +access("/run/udev/queue", F_OK) = 0 +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> +access("/run/udev/queue", F_OK) = 0 +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> +access("/run/udev/queue", F_OK) = 0 +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> +access("/run/udev/queue", F_OK) = 0 +> +clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 +> +openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 +> +write(3, "1\n\0", 3) = 3 +> +close(3) = 0 +> +access("/run/udev/queue", F_OK) = 0 +> +> +[Environment]: +> +linux: V6.10-rc3 +> +QEMU: V9.0.0 +> +ndctl: v79 +> +> +I know this is because of the wrong use of the QEMU command, but I think we +> +should +> +be aware of this error in one of the QEMU, OS or ndctl side at least. +> +> +Thanks +> +Xingtao + diff --git a/results/classifier/009/all/51610399 b/results/classifier/009/all/51610399 new file mode 100644 index 000000000..2e420e72d --- /dev/null +++ b/results/classifier/009/all/51610399 @@ -0,0 +1,318 @@ +permissions: 0.988 +debug: 0.986 +boot: 0.986 +graphic: 0.986 +other: 0.985 +semantic: 0.984 +device: 0.984 +performance: 0.983 +files: 0.981 +PID: 0.978 +socket: 0.978 +KVM: 0.975 +vnc: 0.974 +network: 0.973 + +[BUG][powerpc] KVM Guest Boot Failure – Hangs at "Booting Linux via __start()” + +Bug Description: +Encountering a boot failure when launching a KVM guest with +qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor +crashes. +Reproduction Steps: +# qemu-system-ppc64 --version +QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f) +Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers +# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine +pseries,accel=kvm \ +-m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \ + -device virtio-scsi-pci,id=scsi \ +-drive +file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 +\ +-device scsi-hd,drive=drive0,bus=scsi.0 \ + -netdev bridge,id=net0,br=virbr0 \ + -device virtio-net-pci,netdev=net0 \ + -serial pty \ + -device virtio-balloon-pci \ + -cpu host +QEMU 9.2.50 monitor - type 'help' for more information +char device redirected to /dev/pts/2 (label serial0) +(qemu) +(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but +unavailable: IRQ_XIVE capability must be present for KVM +Falling back to kernel-irqchip=off +** Qemu Hang + +(In another ssh session) +# screen /dev/pts/2 +Preparing to boot Linux version 6.10.4-200.fc40.ppc64le +(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 +(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 +15:20:17 UTC 2024 +Detected machine type: 0000000000000101 +command line: +BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le +root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M +Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) +Calling ibm,client-architecture-support... done +memory layout at init: + memory_limit : 0000000000000000 (16 MB aligned) + alloc_bottom : 0000000008200000 + alloc_top : 0000000030000000 + alloc_top_hi : 0000000800000000 + rmo_top : 0000000030000000 + ram_top : 0000000800000000 +instantiating rtas at 0x000000002fff0000... done +prom_hold_cpus: skipped +copying OF device tree... +Building dt strings... +Building dt structure... +Device tree strings 0x0000000008210000 -> 0x0000000008210bd0 +Device tree struct 0x0000000008220000 -> 0x0000000008230000 +Quiescing Open Firmware ... +Booting Linux via __start() @ 0x0000000000440000 ... +** Guest Console Hang + + +Git Bisect: +Performing git bisect points to the following patch: +# git bisect bad +e8291ec16da80566c121c68d9112be458954d90b is the first bad commit +commit e8291ec16da80566c121c68d9112be458954d90b (HEAD) +Author: Nicholas Piggin <npiggin@gmail.com> +Date: Thu Dec 19 13:40:31 2024 +1000 + + target/ppc: fix timebase register reset state +(H)DEC and PURR get reset before icount does, which causes them to +be +skewed and not match the init state. This can cause replay to not +match the recorded trace exactly. For DEC and HDEC this is usually +not +noticable since they tend to get programmed before affecting the + target machine. PURR has been observed to cause replay bugs when + running Linux. + + Fix this by resetting using a time of 0. + + Message-ID: <20241219034035.1826173-2-npiggin@gmail.com> + Signed-off-by: Nicholas Piggin <npiggin@gmail.com> + + hw/ppc/ppc.c | 11 ++++++++--- + 1 file changed, 8 insertions(+), 3 deletions(-) + + +Reverting the patch helps boot the guest. +Thanks, +Misbah Anjum N + +Thanks for the report. + +Tricky problem. A secondary CPU is hanging before it is started by the +primary via rtas call. + +That secondary keeps calling kvm_cpu_exec(), which keeps exiting out +early with EXCP_HLT because kvm_arch_process_async_events() returns +true because that cpu has ->halted=1. That just goes around he run +loop because there is an interrupt pending (DEC). + +So it never runs. It also never releases the BQL, and another CPU, +the primary which is actually supposed to be running, is stuck in +spapr_set_all_lpcrs() in run_on_cpu() waiting for the BQL. + +This patch just exposes the bug I think, by causing the interrupt. +although I'm not quite sure why it's okay previously (-ve decrementer +values should be causing a timer exception too). The timer exception +should not be taken as an interrupt by those secondary CPUs, and it +doesn't because it is masked, until set_all_lpcrs sets an LPCR value +that enables powersave wakeup on decrementer interrupt. + +The start_powered_off sate just sets ->halted, which makes it look +like a powersaving state. Logically I think it's not the same thing +as far as spapr goes. I don't know why start_powered_off only sets +->halted, and not ->stop/stopped as well. + +Not sure how best to solve it cleanly. I'll send a revert if I can't +get something working soon. + +Thanks, +Nick + +On Tue Mar 18, 2025 at 7:09 AM AEST, misanjum wrote: +> +Bug Description: +> +Encountering a boot failure when launching a KVM guest with +> +qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor +> +crashes. +> +> +> +Reproduction Steps: +> +# qemu-system-ppc64 --version +> +QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f) +> +Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers +> +> +# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine +> +pseries,accel=kvm \ +> +-m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \ +> +-device virtio-scsi-pci,id=scsi \ +> +-drive +> +file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 +> +> +\ +> +-device scsi-hd,drive=drive0,bus=scsi.0 \ +> +-netdev bridge,id=net0,br=virbr0 \ +> +-device virtio-net-pci,netdev=net0 \ +> +-serial pty \ +> +-device virtio-balloon-pci \ +> +-cpu host +> +QEMU 9.2.50 monitor - type 'help' for more information +> +char device redirected to /dev/pts/2 (label serial0) +> +(qemu) +> +(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but +> +unavailable: IRQ_XIVE capability must be present for KVM +> +Falling back to kernel-irqchip=off +> +** Qemu Hang +> +> +(In another ssh session) +> +# screen /dev/pts/2 +> +Preparing to boot Linux version 6.10.4-200.fc40.ppc64le +> +(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 +> +(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 +> +15:20:17 UTC 2024 +> +Detected machine type: 0000000000000101 +> +command line: +> +BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le +> +root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M +> +Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) +> +Calling ibm,client-architecture-support... done +> +memory layout at init: +> +memory_limit : 0000000000000000 (16 MB aligned) +> +alloc_bottom : 0000000008200000 +> +alloc_top : 0000000030000000 +> +alloc_top_hi : 0000000800000000 +> +rmo_top : 0000000030000000 +> +ram_top : 0000000800000000 +> +instantiating rtas at 0x000000002fff0000... done +> +prom_hold_cpus: skipped +> +copying OF device tree... +> +Building dt strings... +> +Building dt structure... +> +Device tree strings 0x0000000008210000 -> 0x0000000008210bd0 +> +Device tree struct 0x0000000008220000 -> 0x0000000008230000 +> +Quiescing Open Firmware ... +> +Booting Linux via __start() @ 0x0000000000440000 ... +> +** Guest Console Hang +> +> +> +Git Bisect: +> +Performing git bisect points to the following patch: +> +# git bisect bad +> +e8291ec16da80566c121c68d9112be458954d90b is the first bad commit +> +commit e8291ec16da80566c121c68d9112be458954d90b (HEAD) +> +Author: Nicholas Piggin <npiggin@gmail.com> +> +Date: Thu Dec 19 13:40:31 2024 +1000 +> +> +target/ppc: fix timebase register reset state +> +> +(H)DEC and PURR get reset before icount does, which causes them to +> +be +> +skewed and not match the init state. This can cause replay to not +> +match the recorded trace exactly. For DEC and HDEC this is usually +> +not +> +noticable since they tend to get programmed before affecting the +> +target machine. PURR has been observed to cause replay bugs when +> +running Linux. +> +> +Fix this by resetting using a time of 0. +> +> +Message-ID: <20241219034035.1826173-2-npiggin@gmail.com> +> +Signed-off-by: Nicholas Piggin <npiggin@gmail.com> +> +> +hw/ppc/ppc.c | 11 ++++++++--- +> +1 file changed, 8 insertions(+), 3 deletions(-) +> +> +> +Reverting the patch helps boot the guest. +> +Thanks, +> +Misbah Anjum N + diff --git a/results/classifier/009/all/59540920 b/results/classifier/009/all/59540920 new file mode 100644 index 000000000..85d1e913a --- /dev/null +++ b/results/classifier/009/all/59540920 @@ -0,0 +1,386 @@ +other: 0.989 +files: 0.987 +permissions: 0.986 +graphic: 0.985 +debug: 0.985 +device: 0.985 +semantic: 0.985 +socket: 0.983 +performance: 0.983 +PID: 0.982 +network: 0.981 +boot: 0.980 +vnc: 0.977 +KVM: 0.970 + +[BUG] No irqchip created after commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator property") + +I apologize if this was already reported, + +I just noticed that with the latest updates QEMU doesn't start with the +following configuration: + +qemu-system-x86_64 -name guest=win10 -machine pc,accel=kvm -cpu +host,hv_vpindex,hv_synic ... + +qemu-system-x86_64: failed to turn on HyperV SynIC in KVM: Invalid argument +qemu-system-x86_64: kvm_init_vcpu failed: Invalid argument + +If I add 'kernel-irqchip=split' or ',kernel-irqchip=on' it starts as +usual. I bisected this to the following commit: + +commit 11bc4a13d1f4b07dafbd1dda4d4bf0fdd7ad65f2 (HEAD, refs/bisect/bad) +Author: Paolo Bonzini <address@hidden> +Date: Wed Nov 13 10:56:53 2019 +0100 + + kvm: convert "-machine kernel_irqchip" to an accelerator property + +so aparently we now default to 'kernel_irqchip=off'. Is this the desired +behavior? + +-- +Vitaly + +No, absolutely not. I was sure I had tested it, but I will take a look. +Paolo +Il ven 20 dic 2019, 15:11 Vitaly Kuznetsov < +address@hidden +> ha scritto: +I apologize if this was already reported, +I just noticed that with the latest updates QEMU doesn't start with the +following configuration: +qemu-system-x86_64 -name guest=win10 -machine pc,accel=kvm -cpu host,hv_vpindex,hv_synic ... +qemu-system-x86_64: failed to turn on HyperV SynIC in KVM: Invalid argument +qemu-system-x86_64: kvm_init_vcpu failed: Invalid argument +If I add 'kernel-irqchip=split' or ',kernel-irqchip=on' it starts as +usual. I bisected this to the following commit: +commit 11bc4a13d1f4b07dafbd1dda4d4bf0fdd7ad65f2 (HEAD, refs/bisect/bad) +Author: Paolo Bonzini < +address@hidden +> +Date:  Wed Nov 13 10:56:53 2019 +0100 +  kvm: convert "-machine kernel_irqchip" to an accelerator property +so aparently we now default to 'kernel_irqchip=off'. Is this the desired +behavior? +-- +Vitaly + +Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an +accelerator property") moves kernel_irqchip property from "-machine" to +"-accel kvm", but it forgets to set the default value of +kernel_irqchip_allowed and kernel_irqchip_split. + +Also cleaning up the three useless members (kernel_irqchip_allowed, +kernel_irqchip_required, kernel_irqchip_split) in struct MachineState. + +Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator +property") +Signed-off-by: Xiaoyao Li <address@hidden> +--- + accel/kvm/kvm-all.c | 3 +++ + include/hw/boards.h | 3 --- + 2 files changed, 3 insertions(+), 3 deletions(-) + +diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c +index b2f1a5bcb5ef..40f74094f8d3 100644 +--- a/accel/kvm/kvm-all.c ++++ b/accel/kvm/kvm-all.c +@@ -3044,8 +3044,11 @@ bool kvm_kernel_irqchip_split(void) + static void kvm_accel_instance_init(Object *obj) + { + KVMState *s = KVM_STATE(obj); ++ MachineClass *mc = MACHINE_GET_CLASS(current_machine); + + s->kvm_shadow_mem = -1; ++ s->kernel_irqchip_allowed = true; ++ s->kernel_irqchip_split = mc->default_kernel_irqchip_split; + } + + static void kvm_accel_class_init(ObjectClass *oc, void *data) +diff --git a/include/hw/boards.h b/include/hw/boards.h +index 61f8bb8e5a42..fb1b43d5b972 100644 +--- a/include/hw/boards.h ++++ b/include/hw/boards.h +@@ -271,9 +271,6 @@ struct MachineState { + + /*< public >*/ + +- bool kernel_irqchip_allowed; +- bool kernel_irqchip_required; +- bool kernel_irqchip_split; + char *dtb; + char *dumpdtb; + int phandle_start; +-- +2.19.1 + +Il sab 28 dic 2019, 09:48 Xiaoyao Li < +address@hidden +> ha scritto: +Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an +accelerator property") moves kernel_irqchip property from "-machine" to +"-accel kvm", but it forgets to set the default value of +kernel_irqchip_allowed and kernel_irqchip_split. +Also cleaning up the three useless members (kernel_irqchip_allowed, +kernel_irqchip_required, kernel_irqchip_split) in struct MachineState. +Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator property") +Signed-off-by: Xiaoyao Li < +address@hidden +> +Please also add a Reported-by line for Vitaly Kuznetsov. +--- + accel/kvm/kvm-all.c | 3 +++ + include/hw/boards.h | 3 --- + 2 files changed, 3 insertions(+), 3 deletions(-) +diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c +index b2f1a5bcb5ef..40f74094f8d3 100644 +--- a/accel/kvm/kvm-all.c ++++ b/accel/kvm/kvm-all.c +@@ -3044,8 +3044,11 @@ bool kvm_kernel_irqchip_split(void) + static void kvm_accel_instance_init(Object *obj) + { +   KVMState *s = KVM_STATE(obj); ++  MachineClass *mc = MACHINE_GET_CLASS(current_machine); +   s->kvm_shadow_mem = -1; ++  s->kernel_irqchip_allowed = true; ++  s->kernel_irqchip_split = mc->default_kernel_irqchip_split; +Can you initialize this from the init_machine method instead of assuming that current_machine has been initialized earlier? +Thanks for the quick fix! +Paolo + } + static void kvm_accel_class_init(ObjectClass *oc, void *data) +diff --git a/include/hw/boards.h b/include/hw/boards.h +index 61f8bb8e5a42..fb1b43d5b972 100644 +--- a/include/hw/boards.h ++++ b/include/hw/boards.h +@@ -271,9 +271,6 @@ struct MachineState { +   /*< public >*/ +-  bool kernel_irqchip_allowed; +-  bool kernel_irqchip_required; +-  bool kernel_irqchip_split; +   char *dtb; +   char *dumpdtb; +   int phandle_start; +-- +2.19.1 + +On Sat, 2019-12-28 at 10:02 +0000, Paolo Bonzini wrote: +> +> +> +Il sab 28 dic 2019, 09:48 Xiaoyao Li <address@hidden> ha scritto: +> +> Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an +> +> accelerator property") moves kernel_irqchip property from "-machine" to +> +> "-accel kvm", but it forgets to set the default value of +> +> kernel_irqchip_allowed and kernel_irqchip_split. +> +> +> +> Also cleaning up the three useless members (kernel_irqchip_allowed, +> +> kernel_irqchip_required, kernel_irqchip_split) in struct MachineState. +> +> +> +> Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an +> +> accelerator property") +> +> Signed-off-by: Xiaoyao Li <address@hidden> +> +> +Please also add a Reported-by line for Vitaly Kuznetsov. +Sure. + +> +> --- +> +> accel/kvm/kvm-all.c | 3 +++ +> +> include/hw/boards.h | 3 --- +> +> 2 files changed, 3 insertions(+), 3 deletions(-) +> +> +> +> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c +> +> index b2f1a5bcb5ef..40f74094f8d3 100644 +> +> --- a/accel/kvm/kvm-all.c +> +> +++ b/accel/kvm/kvm-all.c +> +> @@ -3044,8 +3044,11 @@ bool kvm_kernel_irqchip_split(void) +> +> static void kvm_accel_instance_init(Object *obj) +> +> { +> +> KVMState *s = KVM_STATE(obj); +> +> + MachineClass *mc = MACHINE_GET_CLASS(current_machine); +> +> +> +> s->kvm_shadow_mem = -1; +> +> + s->kernel_irqchip_allowed = true; +> +> + s->kernel_irqchip_split = mc->default_kernel_irqchip_split; +> +> +Can you initialize this from the init_machine method instead of assuming that +> +current_machine has been initialized earlier? +OK, will do it in v2. + +> +Thanks for the quick fix! +BTW, it seems that this patch makes kernel_irqchip default on to workaround the +bug. +However, when explicitly configuring kernel_irqchip=off, guest still fails +booting due to "KVM: failed to send PV IPI: -95" with a latest upstream kernel +ubuntu guest. Any idea about this? + +> +Paolo +> +> } +> +> +> +> static void kvm_accel_class_init(ObjectClass *oc, void *data) +> +> diff --git a/include/hw/boards.h b/include/hw/boards.h +> +> index 61f8bb8e5a42..fb1b43d5b972 100644 +> +> --- a/include/hw/boards.h +> +> +++ b/include/hw/boards.h +> +> @@ -271,9 +271,6 @@ struct MachineState { +> +> +> +> /*< public >*/ +> +> +> +> - bool kernel_irqchip_allowed; +> +> - bool kernel_irqchip_required; +> +> - bool kernel_irqchip_split; +> +> char *dtb; +> +> char *dumpdtb; +> +> int phandle_start; + +Il sab 28 dic 2019, 10:24 Xiaoyao Li < +address@hidden +> ha scritto: +BTW, it seems that this patch makes kernel_irqchip default on to workaround the +bug. +However, when explicitly configuring kernel_irqchip=off, guest still fails +booting due to "KVM: failed to send PV IPI: -95" with a latest upstream kernel +ubuntu guest. Any idea about this? +We need to clear the PV IPI feature for userspace irqchip. Are you using -cpu host by chance? +Paolo +> Paolo +> > } +> > +> > static void kvm_accel_class_init(ObjectClass *oc, void *data) +> > diff --git a/include/hw/boards.h b/include/hw/boards.h +> > index 61f8bb8e5a42..fb1b43d5b972 100644 +> > --- a/include/hw/boards.h +> > +++ b/include/hw/boards.h +> > @@ -271,9 +271,6 @@ struct MachineState { +> > +> >   /*< public >*/ +> > +> > -  bool kernel_irqchip_allowed; +> > -  bool kernel_irqchip_required; +> > -  bool kernel_irqchip_split; +> >   char *dtb; +> >   char *dumpdtb; +> >   int phandle_start; + +On Sat, 2019-12-28 at 10:57 +0000, Paolo Bonzini wrote: +> +> +> +Il sab 28 dic 2019, 10:24 Xiaoyao Li <address@hidden> ha scritto: +> +> BTW, it seems that this patch makes kernel_irqchip default on to workaround +> +> the +> +> bug. +> +> However, when explicitly configuring kernel_irqchip=off, guest still fails +> +> booting due to "KVM: failed to send PV IPI: -95" with a latest upstream +> +> kernel +> +> ubuntu guest. Any idea about this? +> +> +We need to clear the PV IPI feature for userspace irqchip. Are you using -cpu +> +host by chance? +Yes, I used -cpu host. + +After using "-cpu host,-kvm-pv-ipi" with kernel_irqchip=off, it can boot +successfully. + +> +Paolo +> +> +> > Paolo +> +> > > } +> +> > > +> +> > > static void kvm_accel_class_init(ObjectClass *oc, void *data) +> +> > > diff --git a/include/hw/boards.h b/include/hw/boards.h +> +> > > index 61f8bb8e5a42..fb1b43d5b972 100644 +> +> > > --- a/include/hw/boards.h +> +> > > +++ b/include/hw/boards.h +> +> > > @@ -271,9 +271,6 @@ struct MachineState { +> +> > > +> +> > > /*< public >*/ +> +> > > +> +> > > - bool kernel_irqchip_allowed; +> +> > > - bool kernel_irqchip_required; +> +> > > - bool kernel_irqchip_split; +> +> > > char *dtb; +> +> > > char *dumpdtb; +> +> > > int phandle_start; +> +> + diff --git a/results/classifier/009/all/80570214 b/results/classifier/009/all/80570214 new file mode 100644 index 000000000..b531fb673 --- /dev/null +++ b/results/classifier/009/all/80570214 @@ -0,0 +1,410 @@ +vnc: 0.983 +permissions: 0.983 +debug: 0.979 +semantic: 0.978 +other: 0.978 +graphic: 0.978 +performance: 0.976 +PID: 0.976 +network: 0.975 +socket: 0.975 +device: 0.974 +KVM: 0.971 +boot: 0.969 +files: 0.961 + +[Qemu-devel] [vhost-user BUG ?] QEMU process segfault when shutdown or reboot with vhost-user + +Hi, + +We catch a segfault in our project. + +Qemu version is 2.3.0 + +The Stack backtrace is: +(gdb) bt +#0 0x0000000000000000 in ?? () +#1 0x00007f7ad9280b2f in qemu_deliver_packet (sender=<optimized out>, flags=<optimized +out>, data=<optimized out>, size=100, opaque= + 0x7f7ad9d6db10) at net/net.c:510 +#2 0x00007f7ad92831fa in qemu_net_queue_deliver (size=<optimized out>, data=<optimized +out>, flags=<optimized out>, + sender=<optimized out>, queue=<optimized out>) at net/queue.c:157 +#3 qemu_net_queue_flush (queue=0x7f7ad9d39630) at net/queue.c:254 +#4 0x00007f7ad9280dac in qemu_flush_or_purge_queued_packets +(nc=0x7f7ad9d6db10, purge=true) at net/net.c:539 +#5 0x00007f7ad9280e76 in net_vm_change_state_handler (opaque=<optimized out>, +running=<optimized out>, state=100) at net/net.c:1214 +#6 0x00007f7ad915612f in vm_state_notify (running=0, state=RUN_STATE_SHUTDOWN) +at vl.c:1820 +#7 0x00007f7ad906db1a in do_vm_stop (state=<optimized out>) at +/usr/src/packages/BUILD/qemu-kvm-2.3.0/cpus.c:631 +#8 vm_stop (state=RUN_STATE_SHUTDOWN) at +/usr/src/packages/BUILD/qemu-kvm-2.3.0/cpus.c:1325 +#9 0x00007f7ad915e4a2 in main_loop_should_exit () at vl.c:2080 +#10 main_loop () at vl.c:2131 +#11 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at +vl.c:4721 +(gdb) p *(NetClientState *)0x7f7ad9d6db10 +$1 = {info = 0x7f7ad9824520, link_down = 0, next = {tqe_next = 0x7f7ad0f06d10, +tqe_prev = 0x7f7ad98b1cf0}, peer = 0x7f7ad0f06d10, + incoming_queue = 0x7f7ad9d39630, model = 0x7f7ad9d39590 "vhost_user", name = +0x7f7ad9d39570 "hostnet0", info_str = + "vhost-user to charnet0", '\000' <repeats 233 times>, receive_disabled = 0, +destructor = + 0x7f7ad92821f0 <qemu_net_client_destructor>, queue_index = 0, +rxfilter_notify_enabled = 0} +(gdb) p *(NetClientInfo *)0x7f7ad9824520 +$2 = {type = NET_CLIENT_OPTIONS_KIND_VHOST_USER, size = 360, receive = 0, +receive_raw = 0, receive_iov = 0, can_receive = 0, cleanup = + 0x7f7ad9288850 <vhost_user_cleanup>, link_status_changed = 0, +query_rx_filter = 0, poll = 0, has_ufo = + 0x7f7ad92886d0 <vhost_user_has_ufo>, has_vnet_hdr = 0x7f7ad9288670 +<vhost_user_has_vnet_hdr>, has_vnet_hdr_len = 0, + using_vnet_hdr = 0, set_offload = 0, set_vnet_hdr_len = 0} +(gdb) + +The corresponding codes where gdb reports error are: (We have added some codes +in net.c) +ssize_t qemu_deliver_packet(NetClientState *sender, + unsigned flags, + const uint8_t *data, + size_t size, + void *opaque) +{ + NetClientState *nc = opaque; + ssize_t ret; + + if (nc->link_down) { + return size; + } + + if (nc->receive_disabled) { + return 0; + } + + if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { + ret = nc->info->receive_raw(nc, data, size); + } else { + ret = nc->info->receive(nc, data, size); ----> Here is 510 line + } + +I'm not quite familiar with vhost-user, but for vhost-user, these two callback +functions seem to be always NULL, +Why we can come here ? +Is it an error to add VM state change handler for vhost-user ? + +Thanks, +zhanghailiang + +Hi + +On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang +<address@hidden> wrote: +> +The corresponding codes where gdb reports error are: (We have added some +> +codes in net.c) +Can you reproduce with unmodified qemu? Could you give instructions to do so? + +> +ssize_t qemu_deliver_packet(NetClientState *sender, +> +unsigned flags, +> +const uint8_t *data, +> +size_t size, +> +void *opaque) +> +{ +> +NetClientState *nc = opaque; +> +ssize_t ret; +> +> +if (nc->link_down) { +> +return size; +> +} +> +> +if (nc->receive_disabled) { +> +return 0; +> +} +> +> +if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { +> +ret = nc->info->receive_raw(nc, data, size); +> +} else { +> +ret = nc->info->receive(nc, data, size); ----> Here is 510 line +> +} +> +> +I'm not quite familiar with vhost-user, but for vhost-user, these two +> +callback functions seem to be always NULL, +> +Why we can come here ? +You should not come here, vhost-user has nc->receive_disabled (it +changes in 2.5) + +-- +Marc-André Lureau + +On 2015/11/3 22:54, Marc-André Lureau wrote: +Hi + +On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang +<address@hidden> wrote: +The corresponding codes where gdb reports error are: (We have added some +codes in net.c) +Can you reproduce with unmodified qemu? Could you give instructions to do so? +OK, i will try to do it. There is nothing special, we run iperf tool in VM, +and then shutdown or reboot it. There is change you can catch segfault. +ssize_t qemu_deliver_packet(NetClientState *sender, + unsigned flags, + const uint8_t *data, + size_t size, + void *opaque) +{ + NetClientState *nc = opaque; + ssize_t ret; + + if (nc->link_down) { + return size; + } + + if (nc->receive_disabled) { + return 0; + } + + if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { + ret = nc->info->receive_raw(nc, data, size); + } else { + ret = nc->info->receive(nc, data, size); ----> Here is 510 line + } + +I'm not quite familiar with vhost-user, but for vhost-user, these two +callback functions seem to be always NULL, +Why we can come here ? +You should not come here, vhost-user has nc->receive_disabled (it +changes in 2.5) +I have looked at the newest codes, i think we can still have chance to +come here, since we will change nc->receive_disable to false temporarily in +qemu_flush_or_purge_queued_packets(), there is no difference between 2.3 and 2.5 +for this. +Besides, is it possible for !QTAILQ_EMPTY(&queue->packets) to be true +in qemu_net_queue_flush() for vhost-user ? + +i will try to reproduce it by using newest qemu. + +Thanks, +zhanghailiang + +On 11/04/2015 10:24 AM, zhanghailiang wrote: +> +On 2015/11/3 22:54, Marc-André Lureau wrote: +> +> Hi +> +> +> +> On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang +> +> <address@hidden> wrote: +> +>> The corresponding codes where gdb reports error are: (We have added +> +>> some +> +>> codes in net.c) +> +> +> +> Can you reproduce with unmodified qemu? Could you give instructions +> +> to do so? +> +> +> +> +OK, i will try to do it. There is nothing special, we run iperf tool +> +in VM, +> +and then shutdown or reboot it. There is change you can catch segfault. +> +> +>> ssize_t qemu_deliver_packet(NetClientState *sender, +> +>> unsigned flags, +> +>> const uint8_t *data, +> +>> size_t size, +> +>> void *opaque) +> +>> { +> +>> NetClientState *nc = opaque; +> +>> ssize_t ret; +> +>> +> +>> if (nc->link_down) { +> +>> return size; +> +>> } +> +>> +> +>> if (nc->receive_disabled) { +> +>> return 0; +> +>> } +> +>> +> +>> if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { +> +>> ret = nc->info->receive_raw(nc, data, size); +> +>> } else { +> +>> ret = nc->info->receive(nc, data, size); ----> Here is +> +>> 510 line +> +>> } +> +>> +> +>> I'm not quite familiar with vhost-user, but for vhost-user, these two +> +>> callback functions seem to be always NULL, +> +>> Why we can come here ? +> +> +> +> You should not come here, vhost-user has nc->receive_disabled (it +> +> changes in 2.5) +> +> +> +> +I have looked at the newest codes, i think we can still have chance to +> +come here, since we will change nc->receive_disable to false +> +temporarily in +> +qemu_flush_or_purge_queued_packets(), there is no difference between +> +2.3 and 2.5 +> +for this. +> +Besides, is it possible for !QTAILQ_EMPTY(&queue->packets) to be true +> +in qemu_net_queue_flush() for vhost-user ? +The only thing I can image is self announcing. Are you trying to do +migration? 2.5 only support sending rarp through this. + +And it's better to have a breakpoint to see why a packet was queued for +vhost-user. The stack trace may also help in this case. + +> +> +i will try to reproduce it by using newest qemu. +> +> +Thanks, +> +zhanghailiang +> + +On 2015/11/4 11:19, Jason Wang wrote: +On 11/04/2015 10:24 AM, zhanghailiang wrote: +On 2015/11/3 22:54, Marc-André Lureau wrote: +Hi + +On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang +<address@hidden> wrote: +The corresponding codes where gdb reports error are: (We have added +some +codes in net.c) +Can you reproduce with unmodified qemu? Could you give instructions +to do so? +OK, i will try to do it. There is nothing special, we run iperf tool +in VM, +and then shutdown or reboot it. There is change you can catch segfault. +ssize_t qemu_deliver_packet(NetClientState *sender, + unsigned flags, + const uint8_t *data, + size_t size, + void *opaque) +{ + NetClientState *nc = opaque; + ssize_t ret; + + if (nc->link_down) { + return size; + } + + if (nc->receive_disabled) { + return 0; + } + + if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { + ret = nc->info->receive_raw(nc, data, size); + } else { + ret = nc->info->receive(nc, data, size); ----> Here is +510 line + } + +I'm not quite familiar with vhost-user, but for vhost-user, these two +callback functions seem to be always NULL, +Why we can come here ? +You should not come here, vhost-user has nc->receive_disabled (it +changes in 2.5) +I have looked at the newest codes, i think we can still have chance to +come here, since we will change nc->receive_disable to false +temporarily in +qemu_flush_or_purge_queued_packets(), there is no difference between +2.3 and 2.5 +for this. +Besides, is it possible for !QTAILQ_EMPTY(&queue->packets) to be true +in qemu_net_queue_flush() for vhost-user ? +The only thing I can image is self announcing. Are you trying to do +migration? 2.5 only support sending rarp through this. +Hmm, it's not triggered by migration, For qemu-2.5, IMHO, it doesn't have such +problem, +since the callback function 'receive' is not NULL. It is vhost_user_receive(). +And it's better to have a breakpoint to see why a packet was queued for +vhost-user. The stack trace may also help in this case. +OK, i'm trying to reproduce it. + +Thanks, +zhanghailiang +i will try to reproduce it by using newest qemu. + +Thanks, +zhanghailiang +. + diff --git a/results/classifier/009/all/88225572 b/results/classifier/009/all/88225572 new file mode 100644 index 000000000..292ea66b8 --- /dev/null +++ b/results/classifier/009/all/88225572 @@ -0,0 +1,2910 @@ +permissions: 0.992 +other: 0.987 +debug: 0.986 +PID: 0.984 +semantic: 0.976 +graphic: 0.974 +device: 0.970 +boot: 0.969 +performance: 0.965 +vnc: 0.958 +files: 0.957 +socket: 0.955 +network: 0.950 +KVM: 0.924 + +[BUG qemu 4.0] segfault when unplugging virtio-blk-pci device + +Hi, + +I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I +think it's because io completion hits use-after-free when device is +already gone. Is this a known bug that has been fixed? (I went through +the git log but didn't find anything obvious). + +gdb backtrace is: + +Core was generated by `/usr/local/libexec/qemu-kvm -name +sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +Program terminated with signal 11, Segmentation fault. +#0 object_get_class (obj=obj@entry=0x0) at +/usr/src/debug/qemu-4.0/qom/object.c:903 +903 return obj->class; +(gdb) bt +#0 object_get_class (obj=obj@entry=0x0) at +/usr/src/debug/qemu-4.0/qom/object.c:903 +#1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +  vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +#2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +  opaque=0x558a2f2fd420, ret=0) +  at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +#3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +  at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +#4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +  i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +#5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +#6  0x00007fff9ed75780 in ?? () +#7  0x0000000000000000 in ?? () + +It seems like qemu was completing a discard/write_zero request, but +parent BusState was already freed & set to NULL. + +Do we need to drain all pending request before unrealizing virtio-blk +device? Like the following patch proposed? +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +If more info is needed, please let me know. + +Thanks, +Eryu + +On Tue, 31 Dec 2019 18:34:34 +0800 +Eryu Guan <address@hidden> wrote: + +> +Hi, +> +> +I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I +> +think it's because io completion hits use-after-free when device is +> +already gone. Is this a known bug that has been fixed? (I went through +> +the git log but didn't find anything obvious). +> +> +gdb backtrace is: +> +> +Core was generated by `/usr/local/libexec/qemu-kvm -name +> +sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +Program terminated with signal 11, Segmentation fault. +> +#0 object_get_class (obj=obj@entry=0x0) at +> +/usr/src/debug/qemu-4.0/qom/object.c:903 +> +903 return obj->class; +> +(gdb) bt +> +#0 object_get_class (obj=obj@entry=0x0) at +> +/usr/src/debug/qemu-4.0/qom/object.c:903 +> +#1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +> +  vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +#2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +> +  opaque=0x558a2f2fd420, ret=0) +> +  at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +#3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +  at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +#4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +> +  i1=<optimized out>) at +> +/usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +#5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +#6  0x00007fff9ed75780 in ?? () +> +#7  0x0000000000000000 in ?? () +> +> +It seems like qemu was completing a discard/write_zero request, but +> +parent BusState was already freed & set to NULL. +> +> +Do we need to drain all pending request before unrealizing virtio-blk +> +device? Like the following patch proposed? +> +> +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> +If more info is needed, please let me know. +may be this will help: +https://patchwork.kernel.org/patch/11213047/ +> +> +Thanks, +> +Eryu +> + +On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +On Tue, 31 Dec 2019 18:34:34 +0800 +> +Eryu Guan <address@hidden> wrote: +> +> +> Hi, +> +> +> +> I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I +> +> think it's because io completion hits use-after-free when device is +> +> already gone. Is this a known bug that has been fixed? (I went through +> +> the git log but didn't find anything obvious). +> +> +> +> gdb backtrace is: +> +> +> +> Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> Program terminated with signal 11, Segmentation fault. +> +> #0 object_get_class (obj=obj@entry=0x0) at +> +> /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> 903 return obj->class; +> +> (gdb) bt +> +> #0 object_get_class (obj=obj@entry=0x0) at +> +> /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +> +>   vector=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +> +>   opaque=0x558a2f2fd420, ret=0) +> +>   at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +>   at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +> +>   i1=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> #6  0x00007fff9ed75780 in ?? () +> +> #7  0x0000000000000000 in ?? () +> +> +> +> It seems like qemu was completing a discard/write_zero request, but +> +> parent BusState was already freed & set to NULL. +> +> +> +> Do we need to drain all pending request before unrealizing virtio-blk +> +> device? Like the following patch proposed? +> +> +> +> +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> +> +> If more info is needed, please let me know. +> +> +may be this will help: +https://patchwork.kernel.org/patch/11213047/ +Yeah, this looks promising! I'll try it out (though it's a one-time +crash for me). Thanks! + +Eryu + +On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> On Tue, 31 Dec 2019 18:34:34 +0800 +> +> Eryu Guan <address@hidden> wrote: +> +> +> +> > Hi, +> +> > +> +> > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I +> +> > think it's because io completion hits use-after-free when device is +> +> > already gone. Is this a known bug that has been fixed? (I went through +> +> > the git log but didn't find anything obvious). +> +> > +> +> > gdb backtrace is: +> +> > +> +> > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > Program terminated with signal 11, Segmentation fault. +> +> > #0 object_get_class (obj=obj@entry=0x0) at +> +> > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > 903 return obj->class; +> +> > (gdb) bt +> +> > #0 object_get_class (obj=obj@entry=0x0) at +> +> > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +> +> >   vector=<optimized out>) at +> +> > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +> +> >   opaque=0x558a2f2fd420, ret=0) +> +> >   at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +> >   at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +> +> >   i1=<optimized out>) at +> +> > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > #6  0x00007fff9ed75780 in ?? () +> +> > #7  0x0000000000000000 in ?? () +> +> > +> +> > It seems like qemu was completing a discard/write_zero request, but +> +> > parent BusState was already freed & set to NULL. +> +> > +> +> > Do we need to drain all pending request before unrealizing virtio-blk +> +> > device? Like the following patch proposed? +> +> > +> +> > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > +> +> > If more info is needed, please let me know. +> +> +> +> may be this will help: +https://patchwork.kernel.org/patch/11213047/ +> +> +Yeah, this looks promising! I'll try it out (though it's a one-time +> +crash for me). Thanks! +After applying this patch, I don't see the original segfaut and +backtrace, but I see this crash + +[Thread debugging using libthread_db enabled] +Using host libthread_db library "/lib64/libthread_db.so.1". +Core was generated by `/usr/local/libexec/qemu-kvm -name +sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +Program terminated with signal 11, Segmentation fault. +#0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +addr=0, val=<optimized out>, size=<optimized out>) at +/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +1324 VirtIOPCIProxy *proxy = +VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +Missing separate debuginfos, use: debuginfo-install +glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 +pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 +(gdb) bt +#0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +addr=0, val=<optimized out>, size=<optimized out>) at +/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +#1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, +addr=<optimized out>, value=<optimized out>, size=<optimized out>, +shift=<optimized out>, mask=<optimized out>, attrs=...) at +/usr/src/debug/qemu-4.0/memory.c:502 +#2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, +value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized +out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 +<memory_region_write_accessor>, mr=0x56121846d340, attrs=...) + at /usr/src/debug/qemu-4.0/memory.c:568 +#3 0x0000561216837c66 in memory_region_dispatch_write +(mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 +#4 0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, +addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address +0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, +l=<optimized out>, mr=0x56121846d340) + at /usr/src/debug/qemu-4.0/exec.c:3279 +#5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, +attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at +/usr/src/debug/qemu-4.0/exec.c:3318 +#6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at +/usr/src/debug/qemu-4.0/exec.c:3408 +#7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized +out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address +0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) +at /usr/src/debug/qemu-4.0/exec.c:3419 +#8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at +/usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +#9 0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) +at /usr/src/debug/qemu-4.0/cpus.c:1281 +#10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at +/usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +#11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 +#12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 + +And I searched and found +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the same +backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add +blk_drain() to virtio_blk_device_unrealize()") is to fix this particular +bug. + +But I can still hit the bug even after applying the commit. Do I miss +anything? + +Thanks, +Eryu +> +Eryu + +On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: +> +> +On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +> On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> > On Tue, 31 Dec 2019 18:34:34 +0800 +> +> > Eryu Guan <address@hidden> wrote: +> +> > +> +> > > Hi, +> +> > > +> +> > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I +> +> > > think it's because io completion hits use-after-free when device is +> +> > > already gone. Is this a known bug that has been fixed? (I went through +> +> > > the git log but didn't find anything obvious). +> +> > > +> +> > > gdb backtrace is: +> +> > > +> +> > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > > Program terminated with signal 11, Segmentation fault. +> +> > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > 903 return obj->class; +> +> > > (gdb) bt +> +> > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > #1 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +> +> > > vector=<optimized out>) at +> +> > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > > #2 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +> +> > > opaque=0x558a2f2fd420, ret=0) +> +> > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +> > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +> +> > > i1=<optimized out>) at +> +> > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > > #6 0x00007fff9ed75780 in ?? () +> +> > > #7 0x0000000000000000 in ?? () +> +> > > +> +> > > It seems like qemu was completing a discard/write_zero request, but +> +> > > parent BusState was already freed & set to NULL. +> +> > > +> +> > > Do we need to drain all pending request before unrealizing virtio-blk +> +> > > device? Like the following patch proposed? +> +> > > +> +> > > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > > +> +> > > If more info is needed, please let me know. +> +> > +> +> > may be this will help: +https://patchwork.kernel.org/patch/11213047/ +> +> +> +> Yeah, this looks promising! I'll try it out (though it's a one-time +> +> crash for me). Thanks! +> +> +After applying this patch, I don't see the original segfaut and +> +backtrace, but I see this crash +> +> +[Thread debugging using libthread_db enabled] +> +Using host libthread_db library "/lib64/libthread_db.so.1". +> +Core was generated by `/usr/local/libexec/qemu-kvm -name +> +sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +> +Program terminated with signal 11, Segmentation fault. +> +#0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +addr=0, val=<optimized out>, size=<optimized out>) at +> +/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +1324 VirtIOPCIProxy *proxy = +> +VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +> +Missing separate debuginfos, use: debuginfo-install +> +glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +> +libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +> +libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 +> +pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 +> +(gdb) bt +> +#0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +addr=0, val=<optimized out>, size=<optimized out>) at +> +/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +#1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, +> +addr=<optimized out>, value=<optimized out>, size=<optimized out>, +> +shift=<optimized out>, mask=<optimized out>, attrs=...) at +> +/usr/src/debug/qemu-4.0/memory.c:502 +> +#2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, +> +value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, +> +access_size_min=<optimized out>, access_size_max=<optimized out>, +> +access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, +> +attrs=...) +> +at /usr/src/debug/qemu-4.0/memory.c:568 +> +#3 0x0000561216837c66 in memory_region_dispatch_write +> +(mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +> +attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 +> +#4 0x00005612167e036f in flatview_write_continue +> +(fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., +> +buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340) +> +at /usr/src/debug/qemu-4.0/exec.c:3279 +> +#5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, +> +addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out +> +of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318 +> +#6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +> +addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at +> +/usr/src/debug/qemu-4.0/exec.c:3408 +> +#7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, +> +addr=<optimized out>, attrs=..., attrs@entry=..., +> +buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +len=<optimized out>, is_write=<optimized out>) at +> +/usr/src/debug/qemu-4.0/exec.c:3419 +> +#8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at +> +/usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +> +#9 0x000056121682255e in qemu_kvm_cpu_thread_fn +> +(arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 +> +#10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at +> +/usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +> +#11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 +> +#12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 +> +> +And I searched and found +> +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the same +> +backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add +> +blk_drain() to virtio_blk_device_unrealize()") is to fix this particular +> +bug. +> +> +But I can still hit the bug even after applying the commit. Do I miss +> +anything? +Hi Eryu, +This backtrace seems to be caused by this bug (there were two bugs in +1706759): +https://bugzilla.redhat.com/show_bug.cgi?id=1708480 +Although the solution hasn't been tested on virtio-blk yet, you may +want to apply this patch: +https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html +Let me know if this works. + +Best regards, Julia Suvorova. + +On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: +> +On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: +> +> +> +> On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +> > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> > > On Tue, 31 Dec 2019 18:34:34 +0800 +> +> > > Eryu Guan <address@hidden> wrote: +> +> > > +> +> > > > Hi, +> +> > > > +> +> > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I +> +> > > > think it's because io completion hits use-after-free when device is +> +> > > > already gone. Is this a known bug that has been fixed? (I went through +> +> > > > the git log but didn't find anything obvious). +> +> > > > +> +> > > > gdb backtrace is: +> +> > > > +> +> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > > > Program terminated with signal 11, Segmentation fault. +> +> > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > 903 return obj->class; +> +> > > > (gdb) bt +> +> > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > #1 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +> +> > > > vector=<optimized out>) at +> +> > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > > > #2 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +> +> > > > opaque=0x558a2f2fd420, ret=0) +> +> > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +> > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +> +> > > > i1=<optimized out>) at +> +> > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > > > #6 0x00007fff9ed75780 in ?? () +> +> > > > #7 0x0000000000000000 in ?? () +> +> > > > +> +> > > > It seems like qemu was completing a discard/write_zero request, but +> +> > > > parent BusState was already freed & set to NULL. +> +> > > > +> +> > > > Do we need to drain all pending request before unrealizing virtio-blk +> +> > > > device? Like the following patch proposed? +> +> > > > +> +> > > > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > > > +> +> > > > If more info is needed, please let me know. +> +> > > +> +> > > may be this will help: +https://patchwork.kernel.org/patch/11213047/ +> +> > +> +> > Yeah, this looks promising! I'll try it out (though it's a one-time +> +> > crash for me). Thanks! +> +> +> +> After applying this patch, I don't see the original segfaut and +> +> backtrace, but I see this crash +> +> +> +> [Thread debugging using libthread_db enabled] +> +> Using host libthread_db library "/lib64/libthread_db.so.1". +> +> Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +> +> Program terminated with signal 11, Segmentation fault. +> +> #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +> addr=0, val=<optimized out>, size=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> 1324 VirtIOPCIProxy *proxy = +> +> VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +> +> Missing separate debuginfos, use: debuginfo-install +> +> glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +> +> libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +> +> libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 +> +> pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 +> +> (gdb) bt +> +> #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +> addr=0, val=<optimized out>, size=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> #1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, +> +> addr=<optimized out>, value=<optimized out>, size=<optimized out>, +> +> shift=<optimized out>, mask=<optimized out>, attrs=...) at +> +> /usr/src/debug/qemu-4.0/memory.c:502 +> +> #2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, +> +> value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, +> +> access_size_min=<optimized out>, access_size_max=<optimized out>, +> +> access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, +> +> attrs=...) +> +> at /usr/src/debug/qemu-4.0/memory.c:568 +> +> #3 0x0000561216837c66 in memory_region_dispatch_write +> +> (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +> +> attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 +> +> #4 0x00005612167e036f in flatview_write_continue +> +> (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., +> +> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> len=len@entry=2, addr1=<optimized out>, l=<optimized out>, +> +> mr=0x56121846d340) +> +> at /usr/src/debug/qemu-4.0/exec.c:3279 +> +> #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, +> +> addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 +> +> out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318 +> +> #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +> +> addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) +> +> at /usr/src/debug/qemu-4.0/exec.c:3408 +> +> #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, +> +> addr=<optimized out>, attrs=..., attrs@entry=..., +> +> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> len=<optimized out>, is_write=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/exec.c:3419 +> +> #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at +> +> /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +> +> #9 0x000056121682255e in qemu_kvm_cpu_thread_fn +> +> (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 +> +> #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +> +> #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 +> +> #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 +> +> +> +> And I searched and found +> +> +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the same +> +> backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add +> +> blk_drain() to virtio_blk_device_unrealize()") is to fix this particular +> +> bug. +> +> +> +> But I can still hit the bug even after applying the commit. Do I miss +> +> anything? +> +> +Hi Eryu, +> +This backtrace seems to be caused by this bug (there were two bugs in +> +1706759): +https://bugzilla.redhat.com/show_bug.cgi?id=1708480 +> +Although the solution hasn't been tested on virtio-blk yet, you may +> +want to apply this patch: +> +https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html +> +Let me know if this works. +Will try it out, thanks a lot! + +Eryu + +On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: +> +On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: +> +> +> +> On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +> > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> > > On Tue, 31 Dec 2019 18:34:34 +0800 +> +> > > Eryu Guan <address@hidden> wrote: +> +> > > +> +> > > > Hi, +> +> > > > +> +> > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I +> +> > > > think it's because io completion hits use-after-free when device is +> +> > > > already gone. Is this a known bug that has been fixed? (I went through +> +> > > > the git log but didn't find anything obvious). +> +> > > > +> +> > > > gdb backtrace is: +> +> > > > +> +> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > > > Program terminated with signal 11, Segmentation fault. +> +> > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > 903 return obj->class; +> +> > > > (gdb) bt +> +> > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > #1 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +> +> > > > vector=<optimized out>) at +> +> > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > > > #2 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +> +> > > > opaque=0x558a2f2fd420, ret=0) +> +> > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +> > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +> +> > > > i1=<optimized out>) at +> +> > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > > > #6 0x00007fff9ed75780 in ?? () +> +> > > > #7 0x0000000000000000 in ?? () +> +> > > > +> +> > > > It seems like qemu was completing a discard/write_zero request, but +> +> > > > parent BusState was already freed & set to NULL. +> +> > > > +> +> > > > Do we need to drain all pending request before unrealizing virtio-blk +> +> > > > device? Like the following patch proposed? +> +> > > > +> +> > > > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > > > +> +> > > > If more info is needed, please let me know. +> +> > > +> +> > > may be this will help: +https://patchwork.kernel.org/patch/11213047/ +> +> > +> +> > Yeah, this looks promising! I'll try it out (though it's a one-time +> +> > crash for me). Thanks! +> +> +> +> After applying this patch, I don't see the original segfaut and +> +> backtrace, but I see this crash +> +> +> +> [Thread debugging using libthread_db enabled] +> +> Using host libthread_db library "/lib64/libthread_db.so.1". +> +> Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +> +> Program terminated with signal 11, Segmentation fault. +> +> #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +> addr=0, val=<optimized out>, size=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> 1324 VirtIOPCIProxy *proxy = +> +> VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +> +> Missing separate debuginfos, use: debuginfo-install +> +> glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +> +> libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +> +> libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 +> +> pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 +> +> (gdb) bt +> +> #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +> addr=0, val=<optimized out>, size=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> #1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, +> +> addr=<optimized out>, value=<optimized out>, size=<optimized out>, +> +> shift=<optimized out>, mask=<optimized out>, attrs=...) at +> +> /usr/src/debug/qemu-4.0/memory.c:502 +> +> #2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, +> +> value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, +> +> access_size_min=<optimized out>, access_size_max=<optimized out>, +> +> access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, +> +> attrs=...) +> +> at /usr/src/debug/qemu-4.0/memory.c:568 +> +> #3 0x0000561216837c66 in memory_region_dispatch_write +> +> (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +> +> attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 +> +> #4 0x00005612167e036f in flatview_write_continue +> +> (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., +> +> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> len=len@entry=2, addr1=<optimized out>, l=<optimized out>, +> +> mr=0x56121846d340) +> +> at /usr/src/debug/qemu-4.0/exec.c:3279 +> +> #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, +> +> addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 +> +> out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318 +> +> #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +> +> addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) +> +> at /usr/src/debug/qemu-4.0/exec.c:3408 +> +> #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, +> +> addr=<optimized out>, attrs=..., attrs@entry=..., +> +> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> len=<optimized out>, is_write=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/exec.c:3419 +> +> #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at +> +> /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +> +> #9 0x000056121682255e in qemu_kvm_cpu_thread_fn +> +> (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 +> +> #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at +> +> /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +> +> #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 +> +> #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 +> +> +> +> And I searched and found +> +> +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the same +> +> backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add +> +> blk_drain() to virtio_blk_device_unrealize()") is to fix this particular +> +> bug. +> +> +> +> But I can still hit the bug even after applying the commit. Do I miss +> +> anything? +> +> +Hi Eryu, +> +This backtrace seems to be caused by this bug (there were two bugs in +> +1706759): +https://bugzilla.redhat.com/show_bug.cgi?id=1708480 +> +Although the solution hasn't been tested on virtio-blk yet, you may +> +want to apply this patch: +> +https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html +> +Let me know if this works. +Unfortunately, I still see the same segfault & backtrace after applying +commit 421afd2fe8dd ("virtio: reset region cache when on queue +deletion") + +Anything I can help to debug? + +Thanks, +Eryu + +On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: +> +On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: +> +> On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: +> +> > +> +> > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +> > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> > > > On Tue, 31 Dec 2019 18:34:34 +0800 +> +> > > > Eryu Guan <address@hidden> wrote: +> +> > > > +> +> > > > > Hi, +> +> > > > > +> +> > > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, +> +> > > > > I +> +> > > > > think it's because io completion hits use-after-free when device is +> +> > > > > already gone. Is this a known bug that has been fixed? (I went +> +> > > > > through +> +> > > > > the git log but didn't find anything obvious). +> +> > > > > +> +> > > > > gdb backtrace is: +> +> > > > > +> +> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > > > > Program terminated with signal 11, Segmentation fault. +> +> > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > 903 return obj->class; +> +> > > > > (gdb) bt +> +> > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > #1 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, +> +> > > > > vector=<optimized out>) at +> +> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > > > > #2 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( +> +> > > > > opaque=0x558a2f2fd420, ret=0) +> +> > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +> > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, +> +> > > > > i1=<optimized out>) at +> +> > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > > > > #6 0x00007fff9ed75780 in ?? () +> +> > > > > #7 0x0000000000000000 in ?? () +> +> > > > > +> +> > > > > It seems like qemu was completing a discard/write_zero request, but +> +> > > > > parent BusState was already freed & set to NULL. +> +> > > > > +> +> > > > > Do we need to drain all pending request before unrealizing +> +> > > > > virtio-blk +> +> > > > > device? Like the following patch proposed? +> +> > > > > +> +> > > > > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > > > > +> +> > > > > If more info is needed, please let me know. +> +> > > > +> +> > > > may be this will help: +https://patchwork.kernel.org/patch/11213047/ +> +> > > +> +> > > Yeah, this looks promising! I'll try it out (though it's a one-time +> +> > > crash for me). Thanks! +> +> > +> +> > After applying this patch, I don't see the original segfaut and +> +> > backtrace, but I see this crash +> +> > +> +> > [Thread debugging using libthread_db enabled] +> +> > Using host libthread_db library "/lib64/libthread_db.so.1". +> +> > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +> +> > Program terminated with signal 11, Segmentation fault. +> +> > #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +> > addr=0, val=<optimized out>, size=<optimized out>) at +> +> > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > 1324 VirtIOPCIProxy *proxy = +> +> > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +> +> > Missing separate debuginfos, use: debuginfo-install +> +> > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +> +> > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +> +> > libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 +> +> > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 +> +> > (gdb) bt +> +> > #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, +> +> > addr=0, val=<optimized out>, size=<optimized out>) at +> +> > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > #1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized +> +> > out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, +> +> > shift=<optimized out>, mask=<optimized out>, attrs=...) at +> +> > /usr/src/debug/qemu-4.0/memory.c:502 +> +> > #2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, +> +> > value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, +> +> > access_size_min=<optimized out>, access_size_max=<optimized out>, +> +> > access_fn=0x561216835ac0 <memory_region_write_accessor>, +> +> > mr=0x56121846d340, attrs=...) +> +> > at /usr/src/debug/qemu-4.0/memory.c:568 +> +> > #3 0x0000561216837c66 in memory_region_dispatch_write +> +> > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +> +> > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 +> +> > #4 0x00005612167e036f in flatview_write_continue +> +> > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., +> +> > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> > len=len@entry=2, addr1=<optimized out>, l=<optimized out>, +> +> > mr=0x56121846d340) +> +> > at /usr/src/debug/qemu-4.0/exec.c:3279 +> +> > #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, +> +> > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 +> +> > out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318 +> +> > #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +> +> > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized +> +> > out>) at /usr/src/debug/qemu-4.0/exec.c:3408 +> +> > #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, +> +> > addr=<optimized out>, attrs=..., attrs@entry=..., +> +> > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> > len=<optimized out>, is_write=<optimized out>) at +> +> > /usr/src/debug/qemu-4.0/exec.c:3419 +> +> > #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at +> +> > /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +> +> > #9 0x000056121682255e in qemu_kvm_cpu_thread_fn +> +> > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 +> +> > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at +> +> > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +> +> > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 +> +> > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 +> +> > +> +> > And I searched and found +> +> > +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the same +> +> > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add +> +> > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular +> +> > bug. +> +> > +> +> > But I can still hit the bug even after applying the commit. Do I miss +> +> > anything? +> +> +> +> Hi Eryu, +> +> This backtrace seems to be caused by this bug (there were two bugs in +> +> 1706759): +https://bugzilla.redhat.com/show_bug.cgi?id=1708480 +> +> Although the solution hasn't been tested on virtio-blk yet, you may +> +> want to apply this patch: +> +> +https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html +> +> Let me know if this works. +> +> +Unfortunately, I still see the same segfault & backtrace after applying +> +commit 421afd2fe8dd ("virtio: reset region cache when on queue +> +deletion") +> +> +Anything I can help to debug? +Please post the QEMU command-line and the QMP commands use to remove the +device. + +The backtrace shows a vcpu thread submitting a request. The device +seems to be partially destroyed. That's surprising because the monitor +and the vcpu thread should use the QEMU global mutex to avoid race +conditions. Maybe seeing the QMP commands will make it clearer... + +Stefan +signature.asc +Description: +PGP signature + +On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote: +> +On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: +> +> On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: +> +> > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: +> +> > > +> +> > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +> > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> > > > > On Tue, 31 Dec 2019 18:34:34 +0800 +> +> > > > > Eryu Guan <address@hidden> wrote: +> +> > > > > +> +> > > > > > Hi, +> +> > > > > > +> +> > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata +> +> > > > > > sandbox, I +> +> > > > > > think it's because io completion hits use-after-free when device +> +> > > > > > is +> +> > > > > > already gone. Is this a known bug that has been fixed? (I went +> +> > > > > > through +> +> > > > > > the git log but didn't find anything obvious). +> +> > > > > > +> +> > > > > > gdb backtrace is: +> +> > > > > > +> +> > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > > > > > Program terminated with signal 11, Segmentation fault. +> +> > > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > > 903 return obj->class; +> +> > > > > > (gdb) bt +> +> > > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > > #1 0x0000558a2c009e9b in virtio_notify_vector +> +> > > > > > (vdev=0x558a2e7751d0, +> +> > > > > > vector=<optimized out>) at +> +> > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > > > > > #2 0x0000558a2bfdcb1e in +> +> > > > > > virtio_blk_discard_write_zeroes_complete ( +> +> > > > > > opaque=0x558a2f2fd420, ret=0) +> +> > > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > > > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +> > > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized +> +> > > > > > out>, +> +> > > > > > i1=<optimized out>) at +> +> > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > > > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > > > > > #6 0x00007fff9ed75780 in ?? () +> +> > > > > > #7 0x0000000000000000 in ?? () +> +> > > > > > +> +> > > > > > It seems like qemu was completing a discard/write_zero request, +> +> > > > > > but +> +> > > > > > parent BusState was already freed & set to NULL. +> +> > > > > > +> +> > > > > > Do we need to drain all pending request before unrealizing +> +> > > > > > virtio-blk +> +> > > > > > device? Like the following patch proposed? +> +> > > > > > +> +> > > > > > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > > > > > +> +> > > > > > If more info is needed, please let me know. +> +> > > > > +> +> > > > > may be this will help: +https://patchwork.kernel.org/patch/11213047/ +> +> > > > +> +> > > > Yeah, this looks promising! I'll try it out (though it's a one-time +> +> > > > crash for me). Thanks! +> +> > > +> +> > > After applying this patch, I don't see the original segfaut and +> +> > > backtrace, but I see this crash +> +> > > +> +> > > [Thread debugging using libthread_db enabled] +> +> > > Using host libthread_db library "/lib64/libthread_db.so.1". +> +> > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +> +> > > Program terminated with signal 11, Segmentation fault. +> +> > > #0 0x0000561216a57609 in virtio_pci_notify_write +> +> > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized +> +> > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > > 1324 VirtIOPCIProxy *proxy = +> +> > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +> +> > > Missing separate debuginfos, use: debuginfo-install +> +> > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +> +> > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +> +> > > libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 +> +> > > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 +> +> > > (gdb) bt +> +> > > #0 0x0000561216a57609 in virtio_pci_notify_write +> +> > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized +> +> > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > > #1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized +> +> > > out>, addr=<optimized out>, value=<optimized out>, size=<optimized +> +> > > out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at +> +> > > /usr/src/debug/qemu-4.0/memory.c:502 +> +> > > #2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, +> +> > > value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, +> +> > > access_size_min=<optimized out>, access_size_max=<optimized out>, +> +> > > access_fn=0x561216835ac0 <memory_region_write_accessor>, +> +> > > mr=0x56121846d340, attrs=...) +> +> > > at /usr/src/debug/qemu-4.0/memory.c:568 +> +> > > #3 0x0000561216837c66 in memory_region_dispatch_write +> +> > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +> +> > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 +> +> > > #4 0x00005612167e036f in flatview_write_continue +> +> > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., +> +> > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> > > len=len@entry=2, addr1=<optimized out>, l=<optimized out>, +> +> > > mr=0x56121846d340) +> +> > > at /usr/src/debug/qemu-4.0/exec.c:3279 +> +> > > #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, +> +> > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address +> +> > > 0x7fce7dd97028 out of bounds>, len=2) at +> +> > > /usr/src/debug/qemu-4.0/exec.c:3318 +> +> > > #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +> +> > > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized +> +> > > out>) at /usr/src/debug/qemu-4.0/exec.c:3408 +> +> > > #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, +> +> > > addr=<optimized out>, attrs=..., attrs@entry=..., +> +> > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> > > len=<optimized out>, is_write=<optimized out>) at +> +> > > /usr/src/debug/qemu-4.0/exec.c:3419 +> +> > > #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) +> +> > > at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +> +> > > #9 0x000056121682255e in qemu_kvm_cpu_thread_fn +> +> > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 +> +> > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at +> +> > > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +> +> > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 +> +> > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 +> +> > > +> +> > > And I searched and found +> +> > > +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the same +> +> > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add +> +> > > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular +> +> > > bug. +> +> > > +> +> > > But I can still hit the bug even after applying the commit. Do I miss +> +> > > anything? +> +> > +> +> > Hi Eryu, +> +> > This backtrace seems to be caused by this bug (there were two bugs in +> +> > 1706759): +https://bugzilla.redhat.com/show_bug.cgi?id=1708480 +> +> > Although the solution hasn't been tested on virtio-blk yet, you may +> +> > want to apply this patch: +> +> > +https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html +> +> > Let me know if this works. +> +> +> +> Unfortunately, I still see the same segfault & backtrace after applying +> +> commit 421afd2fe8dd ("virtio: reset region cache when on queue +> +> deletion") +> +> +> +> Anything I can help to debug? +> +> +Please post the QEMU command-line and the QMP commands use to remove the +> +device. +It's a normal kata instance using virtio-fs as rootfs. + +/usr/local/libexec/qemu-kvm -name +sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \ + -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine +q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \ + -cpu host -qmp +unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait + \ + -qmp +unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait + \ + -m 2048M,slots=10,maxmem=773893M -device +pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \ + -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device +virtconsole,chardev=charconsole0,id=console0 \ + -chardev +socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait + \ + -device +virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \ + -chardev +socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait + \ + -device nvdimm,id=nv0,memdev=mem0 -object +memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 + \ + -object rng-random,id=rng0,filename=/dev/urandom -device +virtio-rng,rng=rng0,romfile= \ + -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \ + -chardev +socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait + \ + -chardev +socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock + \ + -device +vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M +-netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \ + -device +driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= + \ + -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults +-nographic -daemonize \ + -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on +-numa node,memdev=dimm1 -kernel /usr/local/share/kernel \ + -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 +i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 +console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 +root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro +rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 +agent.use_vsock=false init=/usr/lib/systemd/systemd +systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service +systemd.mask=systemd-networkd.socket \ + -pidfile +/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid +\ + -smp 1,cores=1,threads=1,sockets=96,maxcpus=96 + +QMP command to delete device (the device id is just an example, not the +one caused the crash): + +"{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}" + +which has been hot plugged by: +"{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}" +"{\"return\": {}}" +"{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}" +"{\"return\": {}}" + +> +> +The backtrace shows a vcpu thread submitting a request. The device +> +seems to be partially destroyed. That's surprising because the monitor +> +and the vcpu thread should use the QEMU global mutex to avoid race +> +conditions. Maybe seeing the QMP commands will make it clearer... +> +> +Stefan +Thanks! + +Eryu + +On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote: +> +On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote: +> +> On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: +> +> > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: +> +> > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: +> +> > > > +> +> > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +> > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> > > > > > On Tue, 31 Dec 2019 18:34:34 +0800 +> +> > > > > > Eryu Guan <address@hidden> wrote: +> +> > > > > > +> +> > > > > > > Hi, +> +> > > > > > > +> +> > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata +> +> > > > > > > sandbox, I +> +> > > > > > > think it's because io completion hits use-after-free when +> +> > > > > > > device is +> +> > > > > > > already gone. Is this a known bug that has been fixed? (I went +> +> > > > > > > through +> +> > > > > > > the git log but didn't find anything obvious). +> +> > > > > > > +> +> > > > > > > gdb backtrace is: +> +> > > > > > > +> +> > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > > > > > > Program terminated with signal 11, Segmentation fault. +> +> > > > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > > > 903 return obj->class; +> +> > > > > > > (gdb) bt +> +> > > > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > > > #1 0x0000558a2c009e9b in virtio_notify_vector +> +> > > > > > > (vdev=0x558a2e7751d0, +> +> > > > > > > vector=<optimized out>) at +> +> > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > > > > > > #2 0x0000558a2bfdcb1e in +> +> > > > > > > virtio_blk_discard_write_zeroes_complete ( +> +> > > > > > > opaque=0x558a2f2fd420, ret=0) +> +> > > > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > > > > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) +> +> > > > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > > > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized +> +> > > > > > > out>, +> +> > > > > > > i1=<optimized out>) at +> +> > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > > > > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > > > > > > #6 0x00007fff9ed75780 in ?? () +> +> > > > > > > #7 0x0000000000000000 in ?? () +> +> > > > > > > +> +> > > > > > > It seems like qemu was completing a discard/write_zero request, +> +> > > > > > > but +> +> > > > > > > parent BusState was already freed & set to NULL. +> +> > > > > > > +> +> > > > > > > Do we need to drain all pending request before unrealizing +> +> > > > > > > virtio-blk +> +> > > > > > > device? Like the following patch proposed? +> +> > > > > > > +> +> > > > > > > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > > > > > > +> +> > > > > > > If more info is needed, please let me know. +> +> > > > > > +> +> > > > > > may be this will help: +> +> > > > > > +https://patchwork.kernel.org/patch/11213047/ +> +> > > > > +> +> > > > > Yeah, this looks promising! I'll try it out (though it's a one-time +> +> > > > > crash for me). Thanks! +> +> > > > +> +> > > > After applying this patch, I don't see the original segfaut and +> +> > > > backtrace, but I see this crash +> +> > > > +> +> > > > [Thread debugging using libthread_db enabled] +> +> > > > Using host libthread_db library "/lib64/libthread_db.so.1". +> +> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +> +> > > > Program terminated with signal 11, Segmentation fault. +> +> > > > #0 0x0000561216a57609 in virtio_pci_notify_write +> +> > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized +> +> > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > > > 1324 VirtIOPCIProxy *proxy = +> +> > > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +> +> > > > Missing separate debuginfos, use: debuginfo-install +> +> > > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +> +> > > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +> +> > > > libstdc++-4.8.5-28.alios7.1.x86_64 +> +> > > > numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64 +> +> > > > zlib-1.2.7-16.2.alios7.x86_64 +> +> > > > (gdb) bt +> +> > > > #0 0x0000561216a57609 in virtio_pci_notify_write +> +> > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized +> +> > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > > > #1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized +> +> > > > out>, addr=<optimized out>, value=<optimized out>, size=<optimized +> +> > > > out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at +> +> > > > /usr/src/debug/qemu-4.0/memory.c:502 +> +> > > > #2 0x0000561216833c5d in access_with_adjusted_size +> +> > > > (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, +> +> > > > size=size@entry=2, access_size_min=<optimized out>, +> +> > > > access_size_max=<optimized out>, access_fn=0x561216835ac0 +> +> > > > <memory_region_write_accessor>, mr=0x56121846d340, attrs=...) +> +> > > > at /usr/src/debug/qemu-4.0/memory.c:568 +> +> > > > #3 0x0000561216837c66 in memory_region_dispatch_write +> +> > > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +> +> > > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 +> +> > > > #4 0x00005612167e036f in flatview_write_continue +> +> > > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., +> +> > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> > > > len=len@entry=2, addr1=<optimized out>, l=<optimized out>, +> +> > > > mr=0x56121846d340) +> +> > > > at /usr/src/debug/qemu-4.0/exec.c:3279 +> +> > > > #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, +> +> > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address +> +> > > > 0x7fce7dd97028 out of bounds>, len=2) at +> +> > > > /usr/src/debug/qemu-4.0/exec.c:3318 +> +> > > > #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +> +> > > > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized +> +> > > > out>) at /usr/src/debug/qemu-4.0/exec.c:3408 +> +> > > > #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, +> +> > > > addr=<optimized out>, attrs=..., attrs@entry=..., +> +> > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, +> +> > > > len=<optimized out>, is_write=<optimized out>) at +> +> > > > /usr/src/debug/qemu-4.0/exec.c:3419 +> +> > > > #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) +> +> > > > at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +> +> > > > #9 0x000056121682255e in qemu_kvm_cpu_thread_fn +> +> > > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 +> +> > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at +> +> > > > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +> +> > > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 +> +> > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 +> +> > > > +> +> > > > And I searched and found +> +> > > > +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the +> +> > > > same +> +> > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add +> +> > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this +> +> > > > particular +> +> > > > bug. +> +> > > > +> +> > > > But I can still hit the bug even after applying the commit. Do I miss +> +> > > > anything? +> +> > > +> +> > > Hi Eryu, +> +> > > This backtrace seems to be caused by this bug (there were two bugs in +> +> > > 1706759): +https://bugzilla.redhat.com/show_bug.cgi?id=1708480 +> +> > > Although the solution hasn't been tested on virtio-blk yet, you may +> +> > > want to apply this patch: +> +> > > +> +> > > +https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html +> +> > > Let me know if this works. +> +> > +> +> > Unfortunately, I still see the same segfault & backtrace after applying +> +> > commit 421afd2fe8dd ("virtio: reset region cache when on queue +> +> > deletion") +> +> > +> +> > Anything I can help to debug? +> +> +> +> Please post the QEMU command-line and the QMP commands use to remove the +> +> device. +> +> +It's a normal kata instance using virtio-fs as rootfs. +> +> +/usr/local/libexec/qemu-kvm -name +> +sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \ +> +-uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine +> +q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \ +> +-cpu host -qmp +> +unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait +> +\ +> +-qmp +> +unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait +> +\ +> +-m 2048M,slots=10,maxmem=773893M -device +> +pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \ +> +-device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device +> +virtconsole,chardev=charconsole0,id=console0 \ +> +-chardev +> +socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait +> +\ +> +-device +> +virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \ +> +-chardev +> +socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait +> +\ +> +-device nvdimm,id=nv0,memdev=mem0 -object +> +memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 +> +\ +> +-object rng-random,id=rng0,filename=/dev/urandom -device +> +virtio-rng,rng=rng0,romfile= \ +> +-device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \ +> +-chardev +> +socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait +> +\ +> +-chardev +> +socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock +> +\ +> +-device +> +vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M +> +-netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \ +> +-device +> +driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= +> +\ +> +-global kvm-pit.lost_tick_policy=discard -vga none -no-user-config +> +-nodefaults -nographic -daemonize \ +> +-object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on +> +-numa node,memdev=dimm1 -kernel /usr/local/share/kernel \ +> +-append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 +> +i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k +> +console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 +> +pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro +> +ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 +> +agent.use_vsock=false init=/usr/lib/systemd/systemd +> +systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service +> +systemd.mask=systemd-networkd.socket \ +> +-pidfile +> +/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid +> +\ +> +-smp 1,cores=1,threads=1,sockets=96,maxcpus=96 +> +> +QMP command to delete device (the device id is just an example, not the +> +one caused the crash): +> +> +"{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}" +> +> +which has been hot plugged by: +> +"{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}" +> +"{\"return\": {}}" +> +"{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}" +> +"{\"return\": {}}" +Thanks. I wasn't able to reproduce this crash with qemu.git/master. + +One thing that is strange about the latest backtrace you posted: QEMU is +dispatching the memory access instead of using the ioeventfd code that +that virtio-blk-pci normally takes when a virtqueue is notified. I +guess this means ioeventfd has already been disabled due to the hot +unplug. + +Could you try with machine type "i440fx" instead of "q35"? I wonder if +pci-bridge/shpc is part of the problem. + +Stefan +signature.asc +Description: +PGP signature + +On Tue, Jan 14, 2020 at 04:16:24PM +0000, Stefan Hajnoczi wrote: +> +On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote: +> +> On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote: +> +> > On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: +> +> > > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: +> +> > > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: +> +> > > > > +> +> > > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: +> +> > > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: +> +> > > > > > > On Tue, 31 Dec 2019 18:34:34 +0800 +> +> > > > > > > Eryu Guan <address@hidden> wrote: +> +> > > > > > > +> +> > > > > > > > Hi, +> +> > > > > > > > +> +> > > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata +> +> > > > > > > > sandbox, I +> +> > > > > > > > think it's because io completion hits use-after-free when +> +> > > > > > > > device is +> +> > > > > > > > already gone. Is this a known bug that has been fixed? (I +> +> > > > > > > > went through +> +> > > > > > > > the git log but didn't find anything obvious). +> +> > > > > > > > +> +> > > > > > > > gdb backtrace is: +> +> > > > > > > > +> +> > > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. +> +> > > > > > > > Program terminated with signal 11, Segmentation fault. +> +> > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > > > > 903 return obj->class; +> +> > > > > > > > (gdb) bt +> +> > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at +> +> > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 +> +> > > > > > > > #1 0x0000558a2c009e9b in virtio_notify_vector +> +> > > > > > > > (vdev=0x558a2e7751d0, +> +> > > > > > > > vector=<optimized out>) at +> +> > > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 +> +> > > > > > > > #2 0x0000558a2bfdcb1e in +> +> > > > > > > > virtio_blk_discard_write_zeroes_complete ( +> +> > > > > > > > opaque=0x558a2f2fd420, ret=0) +> +> > > > > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 +> +> > > > > > > > #3 0x0000558a2c261c7e in blk_aio_complete +> +> > > > > > > > (acb=0x558a2eed7420) +> +> > > > > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 +> +> > > > > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized +> +> > > > > > > > out>, +> +> > > > > > > > i1=<optimized out>) at +> +> > > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 +> +> > > > > > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 +> +> > > > > > > > #6 0x00007fff9ed75780 in ?? () +> +> > > > > > > > #7 0x0000000000000000 in ?? () +> +> > > > > > > > +> +> > > > > > > > It seems like qemu was completing a discard/write_zero +> +> > > > > > > > request, but +> +> > > > > > > > parent BusState was already freed & set to NULL. +> +> > > > > > > > +> +> > > > > > > > Do we need to drain all pending request before unrealizing +> +> > > > > > > > virtio-blk +> +> > > > > > > > device? Like the following patch proposed? +> +> > > > > > > > +> +> > > > > > > > +https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html +> +> > > > > > > > +> +> > > > > > > > If more info is needed, please let me know. +> +> > > > > > > +> +> > > > > > > may be this will help: +> +> > > > > > > +https://patchwork.kernel.org/patch/11213047/ +> +> > > > > > +> +> > > > > > Yeah, this looks promising! I'll try it out (though it's a +> +> > > > > > one-time +> +> > > > > > crash for me). Thanks! +> +> > > > > +> +> > > > > After applying this patch, I don't see the original segfaut and +> +> > > > > backtrace, but I see this crash +> +> > > > > +> +> > > > > [Thread debugging using libthread_db enabled] +> +> > > > > Using host libthread_db library "/lib64/libthread_db.so.1". +> +> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name +> +> > > > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. +> +> > > > > Program terminated with signal 11, Segmentation fault. +> +> > > > > #0 0x0000561216a57609 in virtio_pci_notify_write +> +> > > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, +> +> > > > > size=<optimized out>) at +> +> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > > > > 1324 VirtIOPCIProxy *proxy = +> +> > > > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); +> +> > > > > Missing separate debuginfos, use: debuginfo-install +> +> > > > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 +> +> > > > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 +> +> > > > > libstdc++-4.8.5-28.alios7.1.x86_64 +> +> > > > > numactl-libs-2.0.9-5.1.alios7.x86_64 +> +> > > > > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 +> +> > > > > (gdb) bt +> +> > > > > #0 0x0000561216a57609 in virtio_pci_notify_write +> +> > > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, +> +> > > > > size=<optimized out>) at +> +> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 +> +> > > > > #1 0x0000561216835b22 in memory_region_write_accessor +> +> > > > > (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, +> +> > > > > size=<optimized out>, shift=<optimized out>, mask=<optimized out>, +> +> > > > > attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502 +> +> > > > > #2 0x0000561216833c5d in access_with_adjusted_size +> +> > > > > (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, +> +> > > > > size=size@entry=2, access_size_min=<optimized out>, +> +> > > > > access_size_max=<optimized out>, access_fn=0x561216835ac0 +> +> > > > > <memory_region_write_accessor>, mr=0x56121846d340, attrs=...) +> +> > > > > at /usr/src/debug/qemu-4.0/memory.c:568 +> +> > > > > #3 0x0000561216837c66 in memory_region_dispatch_write +> +> > > > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, +> +> > > > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 +> +> > > > > #4 0x00005612167e036f in flatview_write_continue +> +> > > > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, +> +> > > > > attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out +> +> > > > > of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized +> +> > > > > out>, mr=0x56121846d340) +> +> > > > > at /usr/src/debug/qemu-4.0/exec.c:3279 +> +> > > > > #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, +> +> > > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address +> +> > > > > 0x7fce7dd97028 out of bounds>, len=2) at +> +> > > > > /usr/src/debug/qemu-4.0/exec.c:3318 +> +> > > > > #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, +> +> > > > > addr=<optimized out>, attrs=..., buf=<optimized out>, +> +> > > > > len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408 +> +> > > > > #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, +> +> > > > > addr=<optimized out>, attrs=..., attrs@entry=..., +> +> > > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of +> +> > > > > bounds>, len=<optimized out>, is_write=<optimized out>) at +> +> > > > > /usr/src/debug/qemu-4.0/exec.c:3419 +> +> > > > > #8 0x0000561216849da1 in kvm_cpu_exec +> +> > > > > (cpu=cpu@entry=0x56121849aa00) at +> +> > > > > /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 +> +> > > > > #9 0x000056121682255e in qemu_kvm_cpu_thread_fn +> +> > > > > (arg=arg@entry=0x56121849aa00) at +> +> > > > > /usr/src/debug/qemu-4.0/cpus.c:1281 +> +> > > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) +> +> > > > > at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 +> +> > > > > #11 0x00007fce7bef6e25 in start_thread () from +> +> > > > > /lib64/libpthread.so.0 +> +> > > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 +> +> > > > > +> +> > > > > And I searched and found +> +> > > > > +https://bugzilla.redhat.com/show_bug.cgi?id=1706759 +, which has the +> +> > > > > same +> +> > > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: +> +> > > > > Add +> +> > > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this +> +> > > > > particular +> +> > > > > bug. +> +> > > > > +> +> > > > > But I can still hit the bug even after applying the commit. Do I +> +> > > > > miss +> +> > > > > anything? +> +> > > > +> +> > > > Hi Eryu, +> +> > > > This backtrace seems to be caused by this bug (there were two bugs in +> +> > > > 1706759): +https://bugzilla.redhat.com/show_bug.cgi?id=1708480 +> +> > > > Although the solution hasn't been tested on virtio-blk yet, you may +> +> > > > want to apply this patch: +> +> > > > +> +> > > > +https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html +> +> > > > Let me know if this works. +> +> > > +> +> > > Unfortunately, I still see the same segfault & backtrace after applying +> +> > > commit 421afd2fe8dd ("virtio: reset region cache when on queue +> +> > > deletion") +> +> > > +> +> > > Anything I can help to debug? +> +> > +> +> > Please post the QEMU command-line and the QMP commands use to remove the +> +> > device. +> +> +> +> It's a normal kata instance using virtio-fs as rootfs. +> +> +> +> /usr/local/libexec/qemu-kvm -name +> +> sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \ +> +> -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine +> +> q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \ +> +> -cpu host -qmp +> +> unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait +> +> \ +> +> -qmp +> +> unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait +> +> \ +> +> -m 2048M,slots=10,maxmem=773893M -device +> +> pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \ +> +> -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device +> +> virtconsole,chardev=charconsole0,id=console0 \ +> +> -chardev +> +> socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait +> +> \ +> +> -device +> +> virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 +> +> \ +> +> -chardev +> +> socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait +> +> \ +> +> -device nvdimm,id=nv0,memdev=mem0 -object +> +> memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 +> +> \ +> +> -object rng-random,id=rng0,filename=/dev/urandom -device +> +> virtio-rng,rng=rng0,romfile= \ +> +> -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \ +> +> -chardev +> +> socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait +> +> \ +> +> -chardev +> +> socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock +> +> \ +> +> -device +> +> vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M +> +> -netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \ +> +> -device +> +> driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= +> +> \ +> +> -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config +> +> -nodefaults -nographic -daemonize \ +> +> -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on +> +> -numa node,memdev=dimm1 -kernel /usr/local/share/kernel \ +> +> -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 +> +> i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp +> +> reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests +> +> net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 +> +> rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 quiet +> +> systemd.show_status=false panic=1 nr_cpus=96 agent.use_vsock=false +> +> init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target +> +> systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket \ +> +> -pidfile +> +> /run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid +> +> \ +> +> -smp 1,cores=1,threads=1,sockets=96,maxcpus=96 +> +> +> +> QMP command to delete device (the device id is just an example, not the +> +> one caused the crash): +> +> +> +> "{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}" +> +> +> +> which has been hot plugged by: +> +> "{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}" +> +> "{\"return\": {}}" +> +> "{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}" +> +> "{\"return\": {}}" +> +> +Thanks. I wasn't able to reproduce this crash with qemu.git/master. +> +> +One thing that is strange about the latest backtrace you posted: QEMU is +> +dispatching the memory access instead of using the ioeventfd code that +> +that virtio-blk-pci normally takes when a virtqueue is notified. I +> +guess this means ioeventfd has already been disabled due to the hot +> +unplug. +> +> +Could you try with machine type "i440fx" instead of "q35"? I wonder if +> +pci-bridge/shpc is part of the problem. +Sure, will try it. But it may take some time, as the test bed is busy +with other testing tasks. I'll report back once I got the results. + +Thanks, +Eryu + diff --git a/results/classifier/009/all/92957605 b/results/classifier/009/all/92957605 new file mode 100644 index 000000000..4f840de67 --- /dev/null +++ b/results/classifier/009/all/92957605 @@ -0,0 +1,428 @@ +other: 0.997 +permissions: 0.996 +semantic: 0.995 +debug: 0.994 +performance: 0.994 +PID: 0.993 +device: 0.993 +socket: 0.993 +boot: 0.992 +network: 0.989 +graphic: 0.986 +files: 0.986 +KVM: 0.982 +vnc: 0.981 + +[Qemu-devel] Fwd: [BUG] Failed to compile using gcc7.1 + +Hi all, +I encountered the same problem on gcc 7.1.1 and found Qu's mail in +this list from google search. + +Temporarily fix it by specifying the string length in snprintf +directive. Hope this is helpful to other people encountered the same +problem. + +@@ -1,9 +1,7 @@ +--- +--- a/block/blkdebug.c +- "blkdebug:%s:%s", s->config_file ?: "", +--- a/block/blkverify.c +- "blkverify:%s:%s", +--- a/hw/usb/bus.c +- snprintf(downstream->path, sizeof(downstream->path), "%s.%d", +- snprintf(downstream->path, sizeof(downstream->path), "%d", portnr); +-- ++++ b/block/blkdebug.c ++ "blkdebug:%.2037s:%.2037s", s->config_file ?: "", ++++ b/block/blkverify.c ++ "blkverify:%.2038s:%.2038s", ++++ b/hw/usb/bus.c ++ snprintf(downstream->path, sizeof(downstream->path), "%.12s.%d", ++ snprintf(downstream->path, sizeof(downstream->path), "%.12d", portnr); + +Tsung-en Hsiao + +> +Qu Wenruo Wrote: +> +> +Hi all, +> +> +After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with gcc. +> +> +The error is: +> +> +------ +> +CC block/blkdebug.o +> +block/blkdebug.c: In function 'blkdebug_refresh_filename': +> +> +block/blkdebug.c:693:31: error: '%s' directive output may be truncated +> +writing up to 4095 bytes into a region of size 4086 +> +[-Werror=format-truncation=] +> +> +"blkdebug:%s:%s", s->config_file ?: "", +> +^~ +> +In file included from /usr/include/stdio.h:939:0, +> +from /home/adam/qemu/include/qemu/osdep.h:68, +> +from block/blkdebug.c:25: +> +> +/usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' output 11 +> +or more bytes (assuming 4106) into a destination of size 4096 +> +> +return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1, +> +^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +> +__bos (__s), __fmt, __va_arg_pack ()); +> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +> +cc1: all warnings being treated as errors +> +make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1 +> +------ +> +> +It seems that gcc 7 is introducing more restrict check for printf. +> +> +If using clang, although there are some extra warning, it can at least pass +> +the compile. +> +> +Thanks, +> +Qu + +Hi Tsung-en, + +On 06/11/2017 04:08 PM, Tsung-en Hsiao wrote: +Hi all, +I encountered the same problem on gcc 7.1.1 and found Qu's mail in +this list from google search. + +Temporarily fix it by specifying the string length in snprintf +directive. Hope this is helpful to other people encountered the same +problem. +Thank your for sharing this. +@@ -1,9 +1,7 @@ +--- +--- a/block/blkdebug.c +- "blkdebug:%s:%s", s->config_file ?: "", +--- a/block/blkverify.c +- "blkverify:%s:%s", +--- a/hw/usb/bus.c +- snprintf(downstream->path, sizeof(downstream->path), "%s.%d", +- snprintf(downstream->path, sizeof(downstream->path), "%d", portnr); +-- ++++ b/block/blkdebug.c ++ "blkdebug:%.2037s:%.2037s", s->config_file ?: "", +It is a rather funny way to silent this warning :) Truncating the +filename until it fits. +However I don't think it is the correct way since there is indeed an +overflow of bs->exact_filename. +Apparently exact_filename from "block/block_int.h" is defined to hold a +pathname: +char exact_filename[PATH_MAX]; +but is used for more than that (for example in blkdebug.c it might use +until 10+2*PATH_MAX chars). +I suppose it started as a buffer to hold a pathname then more block +drivers were added and this buffer ended used differently. +If it is a multi-purpose buffer one safer option might be to declare it +as a GString* and use g_string_printf(). +I CC'ed the block folks to have their feedback. + +Regards, + +Phil. ++++ b/block/blkverify.c ++ "blkverify:%.2038s:%.2038s", ++++ b/hw/usb/bus.c ++ snprintf(downstream->path, sizeof(downstream->path), "%.12s.%d", ++ snprintf(downstream->path, sizeof(downstream->path), "%.12d", portnr); + +Tsung-en Hsiao +Qu Wenruo Wrote: + +Hi all, + +After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with gcc. + +The error is: + +------ + CC block/blkdebug.o +block/blkdebug.c: In function 'blkdebug_refresh_filename': + +block/blkdebug.c:693:31: error: '%s' directive output may be truncated writing +up to 4095 bytes into a region of size 4086 [-Werror=format-truncation=] + + "blkdebug:%s:%s", s->config_file ?: "", + ^~ +In file included from /usr/include/stdio.h:939:0, + from /home/adam/qemu/include/qemu/osdep.h:68, + from block/blkdebug.c:25: + +/usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' output 11 or +more bytes (assuming 4106) into a destination of size 4096 + + return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1, + ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + __bos (__s), __fmt, __va_arg_pack ()); + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +cc1: all warnings being treated as errors +make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1 +------ + +It seems that gcc 7 is introducing more restrict check for printf. + +If using clang, although there are some extra warning, it can at least pass the +compile. + +Thanks, +Qu + +On 2017-06-12 05:19, Philippe Mathieu-Daudé wrote: +> +Hi Tsung-en, +> +> +On 06/11/2017 04:08 PM, Tsung-en Hsiao wrote: +> +> Hi all, +> +> I encountered the same problem on gcc 7.1.1 and found Qu's mail in +> +> this list from google search. +> +> +> +> Temporarily fix it by specifying the string length in snprintf +> +> directive. Hope this is helpful to other people encountered the same +> +> problem. +> +> +Thank your for sharing this. +> +> +> +> +> @@ -1,9 +1,7 @@ +> +> --- +> +> --- a/block/blkdebug.c +> +> - "blkdebug:%s:%s", s->config_file ?: "", +> +> --- a/block/blkverify.c +> +> - "blkverify:%s:%s", +> +> --- a/hw/usb/bus.c +> +> - snprintf(downstream->path, sizeof(downstream->path), "%s.%d", +> +> - snprintf(downstream->path, sizeof(downstream->path), "%d", +> +> portnr); +> +> -- +> +> +++ b/block/blkdebug.c +> +> + "blkdebug:%.2037s:%.2037s", s->config_file ?: "", +> +> +It is a rather funny way to silent this warning :) Truncating the +> +filename until it fits. +> +> +However I don't think it is the correct way since there is indeed an +> +overflow of bs->exact_filename. +> +> +Apparently exact_filename from "block/block_int.h" is defined to hold a +> +pathname: +> +char exact_filename[PATH_MAX]; +> +> +but is used for more than that (for example in blkdebug.c it might use +> +until 10+2*PATH_MAX chars). +In any case, truncating the filenames will do just as much as truncating +the result: You'll get an unusable filename. + +> +I suppose it started as a buffer to hold a pathname then more block +> +drivers were added and this buffer ended used differently. +> +> +If it is a multi-purpose buffer one safer option might be to declare it +> +as a GString* and use g_string_printf(). +What it is supposed to be now is just an information string we can print +to the user, because strings are nicer than JSON objects. There are some +commands that take a filename for identifying a block node, but I dream +we can get rid of them in 3.0... + +The right solution is to remove it altogether and have a +"char *bdrv_filename(BlockDriverState *bs)" function (which generates +the filename every time it's called). I've been working on this for some +years now, actually, but it was never pressing enough to get it finished +(so I never had enough time). + +What we can do in the meantime is to not generate a plain filename if it +won't fit into bs->exact_filename. + +(The easiest way to do this probably would be to truncate +bs->exact_filename back to an empty string if snprintf() returns a value +greater than or equal to the length of bs->exact_filename.) + +What to do about hw/usb/bus.c I don't know (I guess the best solution +would be to ignore the warning, but I don't suppose that is going to work). + +Max + +> +> +I CC'ed the block folks to have their feedback. +> +> +Regards, +> +> +Phil. +> +> +> +++ b/block/blkverify.c +> +> + "blkverify:%.2038s:%.2038s", +> +> +++ b/hw/usb/bus.c +> +> + snprintf(downstream->path, sizeof(downstream->path), "%.12s.%d", +> +> + snprintf(downstream->path, sizeof(downstream->path), "%.12d", +> +> portnr); +> +> +> +> Tsung-en Hsiao +> +> +> +>> Qu Wenruo Wrote: +> +>> +> +>> Hi all, +> +>> +> +>> After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with +> +>> gcc. +> +>> +> +>> The error is: +> +>> +> +>> ------ +> +>> CC block/blkdebug.o +> +>> block/blkdebug.c: In function 'blkdebug_refresh_filename': +> +>> +> +>> block/blkdebug.c:693:31: error: '%s' directive output may be +> +>> truncated writing up to 4095 bytes into a region of size 4086 +> +>> [-Werror=format-truncation=] +> +>> +> +>> "blkdebug:%s:%s", s->config_file ?: "", +> +>> ^~ +> +>> In file included from /usr/include/stdio.h:939:0, +> +>> from /home/adam/qemu/include/qemu/osdep.h:68, +> +>> from block/blkdebug.c:25: +> +>> +> +>> /usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' +> +>> output 11 or more bytes (assuming 4106) into a destination of size 4096 +> +>> +> +>> return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1, +> +>> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +> +>> __bos (__s), __fmt, __va_arg_pack ()); +> +>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +> +>> cc1: all warnings being treated as errors +> +>> make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1 +> +>> ------ +> +>> +> +>> It seems that gcc 7 is introducing more restrict check for printf. +> +>> +> +>> If using clang, although there are some extra warning, it can at +> +>> least pass the compile. +> +>> +> +>> Thanks, +> +>> Qu +> +> +signature.asc +Description: +OpenPGP digital signature + diff --git a/results/classifier/009/all/95154278 b/results/classifier/009/all/95154278 new file mode 100644 index 000000000..2dc0c2ffc --- /dev/null +++ b/results/classifier/009/all/95154278 @@ -0,0 +1,165 @@ +permissions: 0.989 +other: 0.953 +debug: 0.951 +device: 0.951 +graphic: 0.950 +PID: 0.949 +vnc: 0.948 +semantic: 0.937 +performance: 0.936 +files: 0.918 +KVM: 0.916 +socket: 0.913 +network: 0.913 +boot: 0.902 + +[Qemu-devel] [BUG] checkpatch.pl hangs on target/mips/msa_helper.c + +If checkpatch.pl is applied (using switch "-f") on file +target/mips/msa_helper.c, it will hang. + +There is a workaround for this particular file: + +These lines in msa_helper.c: + + uint## BITS ##_t S = _S, T = _T; \ + uint## BITS ##_t as, at, xs, xt, xd; \ + +should be replaced with: + + uint## BITS ## _t S = _S, T = _T; \ + uint## BITS ## _t as, at, xs, xt, xd; \ + +(a space is added after the second "##" in each line) + +The workaround is found by partial deleting and undeleting of the code in +msa_helper.c in binary search fashion. + +This workaround will soon be submitted by me as a patch within a series on misc +MIPS issues. + +I took a look at checkpatch.pl code, and it looks it is fairly complicated to +fix the issue, since it happens in the code segment involving intricate logic +conditions. + +Regards, +Aleksandar + +On Wed, Jul 04, 2018 at 03:35:18PM +0000, Aleksandar Markovic wrote: +> +If checkpatch.pl is applied (using switch "-f") on file +> +target/mips/msa_helper.c, it will hang. +> +> +There is a workaround for this particular file: +> +> +These lines in msa_helper.c: +> +> +uint## BITS ##_t S = _S, T = _T; \ +> +uint## BITS ##_t as, at, xs, xt, xd; \ +> +> +should be replaced with: +> +> +uint## BITS ## _t S = _S, T = _T; \ +> +uint## BITS ## _t as, at, xs, xt, xd; \ +> +> +(a space is added after the second "##" in each line) +> +> +The workaround is found by partial deleting and undeleting of the code in +> +msa_helper.c in binary search fashion. +> +> +This workaround will soon be submitted by me as a patch within a series on +> +misc MIPS issues. +> +> +I took a look at checkpatch.pl code, and it looks it is fairly complicated to +> +fix the issue, since it happens in the code segment involving intricate logic +> +conditions. +Thanks for figuring this out, Aleksandar. Not sure if anyone else has +the apetite to fix checkpatch.pl. + +Stefan +signature.asc +Description: +PGP signature + +On 07/11/2018 09:36 AM, Stefan Hajnoczi wrote: +> +On Wed, Jul 04, 2018 at 03:35:18PM +0000, Aleksandar Markovic wrote: +> +> If checkpatch.pl is applied (using switch "-f") on file +> +> target/mips/msa_helper.c, it will hang. +> +> +> +> There is a workaround for this particular file: +> +> +> +> These lines in msa_helper.c: +> +> +> +> uint## BITS ##_t S = _S, T = _T; \ +> +> uint## BITS ##_t as, at, xs, xt, xd; \ +> +> +> +> should be replaced with: +> +> +> +> uint## BITS ## _t S = _S, T = _T; \ +> +> uint## BITS ## _t as, at, xs, xt, xd; \ +> +> +> +> (a space is added after the second "##" in each line) +> +> +> +> The workaround is found by partial deleting and undeleting of the code in +> +> msa_helper.c in binary search fashion. +> +> +> +> This workaround will soon be submitted by me as a patch within a series on +> +> misc MIPS issues. +> +> +> +> I took a look at checkpatch.pl code, and it looks it is fairly complicated +> +> to fix the issue, since it happens in the code segment involving intricate +> +> logic conditions. +> +> +Thanks for figuring this out, Aleksandar. Not sure if anyone else has +> +the apetite to fix checkpatch.pl. +Anyone else but Paolo ;P +http://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg01250.html +signature.asc +Description: +OpenPGP digital signature + diff --git a/results/classifier/009/all/96782458 b/results/classifier/009/all/96782458 new file mode 100644 index 000000000..6fa03cc39 --- /dev/null +++ b/results/classifier/009/all/96782458 @@ -0,0 +1,1009 @@ +debug: 0.989 +permissions: 0.986 +performance: 0.985 +semantic: 0.984 +other: 0.982 +boot: 0.980 +PID: 0.980 +files: 0.978 +socket: 0.976 +vnc: 0.976 +device: 0.974 +graphic: 0.973 +network: 0.967 +KVM: 0.963 + +[Qemu-devel] [BUG] Migrate failes between boards with different PMC counts + +Hi all, + +Recently, I found migration failed when enable vPMU. + +migrate vPMU state was introduced in linux-3.10 + qemu-1.7. + +As long as enable vPMU, qemu will save / load the +vmstate_msr_architectural_pmu(msr_global_ctrl) register during the migration. +But global_ctrl generated based on cpuid(0xA), the number of general-purpose +performance +monitoring counters(PMC) can vary according to Intel SDN. The number of PMC +presented +to vm, does not support configuration currently, it depend on host cpuid, and +enable all pmc +defaultly at KVM. It cause migration to fail between boards with different PMC +counts. + +The return value of cpuid (0xA) is different dur to cpu, according to Intel +SDNï¼18-10 Vol. 3B: + +Note: The number of general-purpose performance monitoring counters (i.e. N in +Figure 18-9) +can vary across processor generations within a processor family, across +processor families, or +could be different depending on the configuration chosen at boot time in the +BIOS regarding +Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom processors; N +=4 for processors +based on the Nehalem microarchitecture; for processors based on the Sandy Bridge +microarchitecture, N = 4 if Intel Hyper Threading Technology is active and N=8 +if not active). + +Also I found, N=8 if HT is not active based on the broadwellï¼, +such as CPU E7-8890 v4 @ 2.20GHz + +# ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda +/data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming +tcp::8888 +Completed 100 % +qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff +qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: +kvm_put_msrs: +Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. +Aborted + +So make number of pmc configurable to vm ? Any better idea ? + + +Regards, +-Zhuang Yanying + +* Zhuangyanying (address@hidden) wrote: +> +Hi all, +> +> +Recently, I found migration failed when enable vPMU. +> +> +migrate vPMU state was introduced in linux-3.10 + qemu-1.7. +> +> +As long as enable vPMU, qemu will save / load the +> +vmstate_msr_architectural_pmu(msr_global_ctrl) register during the migration. +> +But global_ctrl generated based on cpuid(0xA), the number of general-purpose +> +performance +> +monitoring counters(PMC) can vary according to Intel SDN. The number of PMC +> +presented +> +to vm, does not support configuration currently, it depend on host cpuid, and +> +enable all pmc +> +defaultly at KVM. It cause migration to fail between boards with different +> +PMC counts. +> +> +The return value of cpuid (0xA) is different dur to cpu, according to Intel +> +SDNï¼18-10 Vol. 3B: +> +> +Note: The number of general-purpose performance monitoring counters (i.e. N +> +in Figure 18-9) +> +can vary across processor generations within a processor family, across +> +processor families, or +> +could be different depending on the configuration chosen at boot time in the +> +BIOS regarding +> +Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom processors; +> +N =4 for processors +> +based on the Nehalem microarchitecture; for processors based on the Sandy +> +Bridge +> +microarchitecture, N = 4 if Intel Hyper Threading Technology is active and +> +N=8 if not active). +> +> +Also I found, N=8 if HT is not active based on the broadwellï¼, +> +such as CPU E7-8890 v4 @ 2.20GHz +> +> +# ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda +> +/data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming +> +tcp::8888 +> +Completed 100 % +> +qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff +> +qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: +> +kvm_put_msrs: +> +Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. +> +Aborted +> +> +So make number of pmc configurable to vm ? Any better idea ? +Coincidentally we hit a similar problem a few days ago with -cpu host - it +took me +quite a while to spot the difference between the machines was the source +had hyperthreading disabled. + +An option to set the number of counters makes sense to me; but I wonder +how many other options we need as well. Also, I'm not sure there's any +easy way for libvirt etc to figure out how many counters a host supports - it's +not in /proc/cpuinfo. + +Dave + +> +> +Regards, +> +-Zhuang Yanying +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: +> +* Zhuangyanying (address@hidden) wrote: +> +> Hi all, +> +> +> +> Recently, I found migration failed when enable vPMU. +> +> +> +> migrate vPMU state was introduced in linux-3.10 + qemu-1.7. +> +> +> +> As long as enable vPMU, qemu will save / load the +> +> vmstate_msr_architectural_pmu(msr_global_ctrl) register during the +> +> migration. +> +> But global_ctrl generated based on cpuid(0xA), the number of +> +> general-purpose performance +> +> monitoring counters(PMC) can vary according to Intel SDN. The number of PMC +> +> presented +> +> to vm, does not support configuration currently, it depend on host cpuid, +> +> and enable all pmc +> +> defaultly at KVM. It cause migration to fail between boards with different +> +> PMC counts. +> +> +> +> The return value of cpuid (0xA) is different dur to cpu, according to Intel +> +> SDNï¼18-10 Vol. 3B: +> +> +> +> Note: The number of general-purpose performance monitoring counters (i.e. N +> +> in Figure 18-9) +> +> can vary across processor generations within a processor family, across +> +> processor families, or +> +> could be different depending on the configuration chosen at boot time in +> +> the BIOS regarding +> +> Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom +> +> processors; N =4 for processors +> +> based on the Nehalem microarchitecture; for processors based on the Sandy +> +> Bridge +> +> microarchitecture, N = 4 if Intel Hyper Threading Technology is active and +> +> N=8 if not active). +> +> +> +> Also I found, N=8 if HT is not active based on the broadwellï¼, +> +> such as CPU E7-8890 v4 @ 2.20GHz +> +> +> +> # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda +> +> /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming +> +> tcp::8888 +> +> Completed 100 % +> +> qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff +> +> qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: +> +> kvm_put_msrs: +> +> Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. +> +> Aborted +> +> +> +> So make number of pmc configurable to vm ? Any better idea ? +> +> +Coincidentally we hit a similar problem a few days ago with -cpu host - it +> +took me +> +quite a while to spot the difference between the machines was the source +> +had hyperthreading disabled. +> +> +An option to set the number of counters makes sense to me; but I wonder +> +how many other options we need as well. Also, I'm not sure there's any +> +easy way for libvirt etc to figure out how many counters a host supports - +> +it's not in /proc/cpuinfo. +We actually try to avoid /proc/cpuinfo whereever possible. We do direct +CPUID asm instructions to identify features, and prefer to use +/sys/devices/system/cpu if that has suitable data + +Where do the PMC counts come from originally ? CPUID or something else ? + +Regards, +Daniel +-- +|: +https://berrange.com +-o- +https://www.flickr.com/photos/dberrange +:| +|: +https://libvirt.org +-o- +https://fstop138.berrange.com +:| +|: +https://entangle-photo.org +-o- +https://www.instagram.com/dberrange +:| + +* Daniel P. Berrange (address@hidden) wrote: +> +On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: +> +> * Zhuangyanying (address@hidden) wrote: +> +> > Hi all, +> +> > +> +> > Recently, I found migration failed when enable vPMU. +> +> > +> +> > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. +> +> > +> +> > As long as enable vPMU, qemu will save / load the +> +> > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the +> +> > migration. +> +> > But global_ctrl generated based on cpuid(0xA), the number of +> +> > general-purpose performance +> +> > monitoring counters(PMC) can vary according to Intel SDN. The number of +> +> > PMC presented +> +> > to vm, does not support configuration currently, it depend on host cpuid, +> +> > and enable all pmc +> +> > defaultly at KVM. It cause migration to fail between boards with +> +> > different PMC counts. +> +> > +> +> > The return value of cpuid (0xA) is different dur to cpu, according to +> +> > Intel SDNï¼18-10 Vol. 3B: +> +> > +> +> > Note: The number of general-purpose performance monitoring counters (i.e. +> +> > N in Figure 18-9) +> +> > can vary across processor generations within a processor family, across +> +> > processor families, or +> +> > could be different depending on the configuration chosen at boot time in +> +> > the BIOS regarding +> +> > Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom +> +> > processors; N =4 for processors +> +> > based on the Nehalem microarchitecture; for processors based on the Sandy +> +> > Bridge +> +> > microarchitecture, N = 4 if Intel Hyper Threading Technology is active +> +> > and N=8 if not active). +> +> > +> +> > Also I found, N=8 if HT is not active based on the broadwellï¼, +> +> > such as CPU E7-8890 v4 @ 2.20GHz +> +> > +> +> > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda +> +> > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming +> +> > tcp::8888 +> +> > Completed 100 % +> +> > qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff +> +> > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: +> +> > kvm_put_msrs: +> +> > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. +> +> > Aborted +> +> > +> +> > So make number of pmc configurable to vm ? Any better idea ? +> +> +> +> Coincidentally we hit a similar problem a few days ago with -cpu host - it +> +> took me +> +> quite a while to spot the difference between the machines was the source +> +> had hyperthreading disabled. +> +> +> +> An option to set the number of counters makes sense to me; but I wonder +> +> how many other options we need as well. Also, I'm not sure there's any +> +> easy way for libvirt etc to figure out how many counters a host supports - +> +> it's not in /proc/cpuinfo. +> +> +We actually try to avoid /proc/cpuinfo whereever possible. We do direct +> +CPUID asm instructions to identify features, and prefer to use +> +/sys/devices/system/cpu if that has suitable data +> +> +Where do the PMC counts come from originally ? CPUID or something else ? +Yes, they're bits 8..15 of CPUID leaf 0xa + +Dave + +> +Regards, +> +Daniel +> +-- +> +|: +https://berrange.com +-o- +https://www.flickr.com/photos/dberrange +:| +> +|: +https://libvirt.org +-o- +https://fstop138.berrange.com +:| +> +|: +https://entangle-photo.org +-o- +https://www.instagram.com/dberrange +:| +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + +On Mon, Apr 24, 2017 at 11:27:16AM +0100, Dr. David Alan Gilbert wrote: +> +* Daniel P. Berrange (address@hidden) wrote: +> +> On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: +> +> > * Zhuangyanying (address@hidden) wrote: +> +> > > Hi all, +> +> > > +> +> > > Recently, I found migration failed when enable vPMU. +> +> > > +> +> > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. +> +> > > +> +> > > As long as enable vPMU, qemu will save / load the +> +> > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the +> +> > > migration. +> +> > > But global_ctrl generated based on cpuid(0xA), the number of +> +> > > general-purpose performance +> +> > > monitoring counters(PMC) can vary according to Intel SDN. The number of +> +> > > PMC presented +> +> > > to vm, does not support configuration currently, it depend on host +> +> > > cpuid, and enable all pmc +> +> > > defaultly at KVM. It cause migration to fail between boards with +> +> > > different PMC counts. +> +> > > +> +> > > The return value of cpuid (0xA) is different dur to cpu, according to +> +> > > Intel SDNï¼18-10 Vol. 3B: +> +> > > +> +> > > Note: The number of general-purpose performance monitoring counters +> +> > > (i.e. N in Figure 18-9) +> +> > > can vary across processor generations within a processor family, across +> +> > > processor families, or +> +> > > could be different depending on the configuration chosen at boot time +> +> > > in the BIOS regarding +> +> > > Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom +> +> > > processors; N =4 for processors +> +> > > based on the Nehalem microarchitecture; for processors based on the +> +> > > Sandy Bridge +> +> > > microarchitecture, N = 4 if Intel Hyper Threading Technology is active +> +> > > and N=8 if not active). +> +> > > +> +> > > Also I found, N=8 if HT is not active based on the broadwellï¼, +> +> > > such as CPU E7-8890 v4 @ 2.20GHz +> +> > > +> +> > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda +> +> > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true +> +> > > -incoming tcp::8888 +> +> > > Completed 100 % +> +> > > qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff +> +> > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: +> +> > > kvm_put_msrs: +> +> > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. +> +> > > Aborted +> +> > > +> +> > > So make number of pmc configurable to vm ? Any better idea ? +> +> > +> +> > Coincidentally we hit a similar problem a few days ago with -cpu host - +> +> > it took me +> +> > quite a while to spot the difference between the machines was the source +> +> > had hyperthreading disabled. +> +> > +> +> > An option to set the number of counters makes sense to me; but I wonder +> +> > how many other options we need as well. Also, I'm not sure there's any +> +> > easy way for libvirt etc to figure out how many counters a host supports - +> +> > it's not in /proc/cpuinfo. +> +> +> +> We actually try to avoid /proc/cpuinfo whereever possible. We do direct +> +> CPUID asm instructions to identify features, and prefer to use +> +> /sys/devices/system/cpu if that has suitable data +> +> +> +> Where do the PMC counts come from originally ? CPUID or something else ? +> +> +Yes, they're bits 8..15 of CPUID leaf 0xa +Ok, that's easy enough for libvirt to detect then. More a question of what +libvirt should then do this with the info.... + +Regards, +Daniel +-- +|: +https://berrange.com +-o- +https://www.flickr.com/photos/dberrange +:| +|: +https://libvirt.org +-o- +https://fstop138.berrange.com +:| +|: +https://entangle-photo.org +-o- +https://www.instagram.com/dberrange +:| + +> +-----Original Message----- +> +From: Daniel P. Berrange [ +mailto:address@hidden +> +Sent: Monday, April 24, 2017 6:34 PM +> +To: Dr. David Alan Gilbert +> +Cc: Zhuangyanying; Zhanghailiang; wangxin (U); address@hidden; +> +Gonglei (Arei); Huangzhichao; address@hidden +> +Subject: Re: [Qemu-devel] [BUG] Migrate failes between boards with different +> +PMC counts +> +> +On Mon, Apr 24, 2017 at 11:27:16AM +0100, Dr. David Alan Gilbert wrote: +> +> * Daniel P. Berrange (address@hidden) wrote: +> +> > On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: +> +> > > * Zhuangyanying (address@hidden) wrote: +> +> > > > Hi all, +> +> > > > +> +> > > > Recently, I found migration failed when enable vPMU. +> +> > > > +> +> > > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. +> +> > > > +> +> > > > As long as enable vPMU, qemu will save / load the +> +> > > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the +> +migration. +> +> > > > But global_ctrl generated based on cpuid(0xA), the number of +> +> > > > general-purpose performance monitoring counters(PMC) can vary +> +> > > > according to Intel SDN. The number of PMC presented to vm, does +> +> > > > not support configuration currently, it depend on host cpuid, and +> +> > > > enable +> +all pmc defaultly at KVM. It cause migration to fail between boards with +> +different PMC counts. +> +> > > > +> +> > > > The return value of cpuid (0xA) is different dur to cpu, according to +> +> > > > Intel +> +SDNï¼18-10 Vol. 3B: +> +> > > > +> +> > > > Note: The number of general-purpose performance monitoring +> +> > > > counters (i.e. N in Figure 18-9) can vary across processor +> +> > > > generations within a processor family, across processor +> +> > > > families, or could be different depending on the configuration +> +> > > > chosen at boot time in the BIOS regarding Intel Hyper Threading +> +> > > > Technology, (e.g. N=2 for 45 nm Intel Atom processors; N =4 for +> +processors based on the Nehalem microarchitecture; for processors based on +> +the Sandy Bridge microarchitecture, N = 4 if Intel Hyper Threading Technology +> +is active and N=8 if not active). +> +> > > > +> +> > > > Also I found, N=8 if HT is not active based on the broadwellï¼, +> +> > > > such as CPU E7-8890 v4 @ 2.20GHz +> +> > > > +> +> > > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m +> +> > > > 4096 -hda +> +> > > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true +> +> > > > -incoming tcp::8888 Completed 100 % +> +> > > > qemu-system-x86_64: error: failed to set MSR 0x38f to +> +> > > > 0x7000000ff +> +> > > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: +> +kvm_put_msrs: +> +> > > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. +> +> > > > Aborted +> +> > > > +> +> > > > So make number of pmc configurable to vm ? Any better idea ? +> +> > > +> +> > > Coincidentally we hit a similar problem a few days ago with -cpu +> +> > > host - it took me quite a while to spot the difference between +> +> > > the machines was the source had hyperthreading disabled. +> +> > > +> +> > > An option to set the number of counters makes sense to me; but I +> +> > > wonder how many other options we need as well. Also, I'm not sure +> +> > > there's any easy way for libvirt etc to figure out how many +> +> > > counters a host supports - it's not in /proc/cpuinfo. +> +> > +> +> > We actually try to avoid /proc/cpuinfo whereever possible. We do +> +> > direct CPUID asm instructions to identify features, and prefer to +> +> > use /sys/devices/system/cpu if that has suitable data +> +> > +> +> > Where do the PMC counts come from originally ? CPUID or something +> +else ? +> +> +> +> Yes, they're bits 8..15 of CPUID leaf 0xa +> +> +Ok, that's easy enough for libvirt to detect then. More a question of what +> +libvirt +> +should then do this with the info.... +> +Do you mean to do a validation at the begining of migration? in +qemuMigrationBakeCookie() & qemuMigrationEatCookie(), if the PMC numbers are +not equal, just quit migration? +It maybe a good enough first edition. +But for a further better edition, maybe it's better to support Heterogeneous +migration I think, so we might need to make PMC number configrable, then we +need to modify KVM/qemu as well. + +Regards, +-Zhuang Yanying + +* Zhuangyanying (address@hidden) wrote: +> +> +> +> -----Original Message----- +> +> From: Daniel P. Berrange [ +mailto:address@hidden +> +> Sent: Monday, April 24, 2017 6:34 PM +> +> To: Dr. David Alan Gilbert +> +> Cc: Zhuangyanying; Zhanghailiang; wangxin (U); address@hidden; +> +> Gonglei (Arei); Huangzhichao; address@hidden +> +> Subject: Re: [Qemu-devel] [BUG] Migrate failes between boards with different +> +> PMC counts +> +> +> +> On Mon, Apr 24, 2017 at 11:27:16AM +0100, Dr. David Alan Gilbert wrote: +> +> > * Daniel P. Berrange (address@hidden) wrote: +> +> > > On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: +> +> > > > * Zhuangyanying (address@hidden) wrote: +> +> > > > > Hi all, +> +> > > > > +> +> > > > > Recently, I found migration failed when enable vPMU. +> +> > > > > +> +> > > > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. +> +> > > > > +> +> > > > > As long as enable vPMU, qemu will save / load the +> +> > > > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the +> +> migration. +> +> > > > > But global_ctrl generated based on cpuid(0xA), the number of +> +> > > > > general-purpose performance monitoring counters(PMC) can vary +> +> > > > > according to Intel SDN. The number of PMC presented to vm, does +> +> > > > > not support configuration currently, it depend on host cpuid, and +> +> > > > > enable +> +> all pmc defaultly at KVM. It cause migration to fail between boards with +> +> different PMC counts. +> +> > > > > +> +> > > > > The return value of cpuid (0xA) is different dur to cpu, according +> +> > > > > to Intel +> +> SDNï¼18-10 Vol. 3B: +> +> > > > > +> +> > > > > Note: The number of general-purpose performance monitoring +> +> > > > > counters (i.e. N in Figure 18-9) can vary across processor +> +> > > > > generations within a processor family, across processor +> +> > > > > families, or could be different depending on the configuration +> +> > > > > chosen at boot time in the BIOS regarding Intel Hyper Threading +> +> > > > > Technology, (e.g. N=2 for 45 nm Intel Atom processors; N =4 for +> +> processors based on the Nehalem microarchitecture; for processors based on +> +> the Sandy Bridge microarchitecture, N = 4 if Intel Hyper Threading +> +> Technology +> +> is active and N=8 if not active). +> +> > > > > +> +> > > > > Also I found, N=8 if HT is not active based on the broadwellï¼, +> +> > > > > such as CPU E7-8890 v4 @ 2.20GHz +> +> > > > > +> +> > > > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m +> +> > > > > 4096 -hda +> +> > > > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true +> +> > > > > -incoming tcp::8888 Completed 100 % +> +> > > > > qemu-system-x86_64: error: failed to set MSR 0x38f to +> +> > > > > 0x7000000ff +> +> > > > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: +> +> kvm_put_msrs: +> +> > > > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. +> +> > > > > Aborted +> +> > > > > +> +> > > > > So make number of pmc configurable to vm ? Any better idea ? +> +> > > > +> +> > > > Coincidentally we hit a similar problem a few days ago with -cpu +> +> > > > host - it took me quite a while to spot the difference between +> +> > > > the machines was the source had hyperthreading disabled. +> +> > > > +> +> > > > An option to set the number of counters makes sense to me; but I +> +> > > > wonder how many other options we need as well. Also, I'm not sure +> +> > > > there's any easy way for libvirt etc to figure out how many +> +> > > > counters a host supports - it's not in /proc/cpuinfo. +> +> > > +> +> > > We actually try to avoid /proc/cpuinfo whereever possible. We do +> +> > > direct CPUID asm instructions to identify features, and prefer to +> +> > > use /sys/devices/system/cpu if that has suitable data +> +> > > +> +> > > Where do the PMC counts come from originally ? CPUID or something +> +> else ? +> +> > +> +> > Yes, they're bits 8..15 of CPUID leaf 0xa +> +> +> +> Ok, that's easy enough for libvirt to detect then. More a question of what +> +> libvirt +> +> should then do this with the info.... +> +> +> +> +Do you mean to do a validation at the begining of migration? in +> +qemuMigrationBakeCookie() & qemuMigrationEatCookie(), if the PMC numbers are +> +not equal, just quit migration? +> +It maybe a good enough first edition. +> +But for a further better edition, maybe it's better to support Heterogeneous +> +migration I think, so we might need to make PMC number configrable, then we +> +need to modify KVM/qemu as well. +Yes agreed; the only thing I wanted to check was that libvirt would have enough +information to be able to use any feature we added to QEMU. + +Dave + +> +Regards, +> +-Zhuang Yanying +-- +Dr. David Alan Gilbert / address@hidden / Manchester, UK + |