| author | Christian Krinitsin <mail@krinitsin.com> | 2025-07-03 19:39:53 +0200 |
|---|---|---|
| committer | Christian Krinitsin <mail@krinitsin.com> | 2025-07-03 19:39:53 +0200 |
| commit | dee4dcba78baf712cab403d47d9db319ab7f95d6 (patch) | |
| tree | 418478faf06786701a56268672f73d6b0b4eb239 /results/classifier/008/all | |
| parent | 4d9e26c0333abd39bdbd039dcdb30ed429c475ba (diff) | |
| download | qemu-analysis-dee4dcba78baf712cab403d47d9db319ab7f95d6.tar.gz qemu-analysis-dee4dcba78baf712cab403d47d9db319ab7f95d6.zip | |
restructure results
Diffstat (limited to 'results/classifier/008/all')
| -rw-r--r-- | results/classifier/008/all/16056596 | 108 | ||||
| -rw-r--r-- | results/classifier/008/all/17743720 | 781 | ||||
| -rw-r--r-- | results/classifier/008/all/21221931 | 338 | ||||
| -rw-r--r-- | results/classifier/008/all/23448582 | 275 | ||||
| -rw-r--r-- | results/classifier/008/all/51610399 | 318 | ||||
| -rw-r--r-- | results/classifier/008/all/59540920 | 386 | ||||
| -rw-r--r-- | results/classifier/008/all/80570214 | 410 | ||||
| -rw-r--r-- | results/classifier/008/all/88225572 | 2910 | ||||
| -rw-r--r-- | results/classifier/008/all/92957605 | 428 | ||||
| -rw-r--r-- | results/classifier/008/all/95154278 | 165 | ||||
| -rw-r--r-- | results/classifier/008/all/96782458 | 1009 |
11 files changed, 0 insertions, 7128 deletions
diff --git a/results/classifier/008/all/16056596 b/results/classifier/008/all/16056596 deleted file mode 100644 index e6f8e1f9c..000000000 --- a/results/classifier/008/all/16056596 +++ /dev/null @@ -1,108 +0,0 @@ -permissions: 0.985 -other: 0.980 -semantic: 0.979 -debug: 0.978 -files: 0.975 -device: 0.973 -boot: 0.971 -graphic: 0.970 -performance: 0.970 -PID: 0.961 -socket: 0.952 -vnc: 0.946 -network: 0.940 -KVM: 0.934 - -[BUG][powerpc] KVM Guest Boot Failure and Hang at "Booting Linux via __start()" - -Bug Description: -Encountering a boot failure when launching a KVM guest with -'qemu-system-ppc64'. The guest hangs at boot, and the QEMU monitor -crashes. -Reproduction Steps: -# qemu-system-ppc64 --version -QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f) -Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers -# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine -pseries,accel=kvm \ --m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \ - -device virtio-scsi-pci,id=scsi \ --drive -file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 -\ --device scsi-hd,drive=drive0,bus=scsi.0 \ - -netdev bridge,id=net0,br=virbr0 \ - -device virtio-net-pci,netdev=net0 \ - -serial pty \ - -device virtio-balloon-pci \ - -cpu host -QEMU 9.2.50 monitor - type 'help' for more information -char device redirected to /dev/pts/2 (label serial0) -(qemu) -(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but -unavailable: IRQ_XIVE capability must be present for KVM -Falling back to kernel-irqchip=off -** Qemu Hang - -(In another ssh session) -# screen /dev/pts/2 -Preparing to boot Linux version 6.10.4-200.fc40.ppc64le -(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 -(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 -15:20:17 UTC 2024 -Detected machine type: 0000000000000101 -command line: -BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le 
-root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M -Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) -Calling ibm,client-architecture-support... done -memory layout at init: - memory_limit : 0000000000000000 (16 MB aligned) - alloc_bottom : 0000000008200000 - alloc_top : 0000000030000000 - alloc_top_hi : 0000000800000000 - rmo_top : 0000000030000000 - ram_top : 0000000800000000 -instantiating rtas at 0x000000002fff0000... done -prom_hold_cpus: skipped -copying OF device tree... -Building dt strings... -Building dt structure... -Device tree strings 0x0000000008210000 -> 0x0000000008210bd0 -Device tree struct 0x0000000008220000 -> 0x0000000008230000 -Quiescing Open Firmware ... -Booting Linux via __start() @ 0x0000000000440000 ... -** Guest Console Hang - - -Git Bisect: -Performing git bisect points to the following patch: -# git bisect bad -e8291ec16da80566c121c68d9112be458954d90b is the first bad commit -commit e8291ec16da80566c121c68d9112be458954d90b (HEAD) -Author: Nicholas Piggin <npiggin@gmail.com> -Date: Thu Dec 19 13:40:31 2024 +1000 - - target/ppc: fix timebase register reset state -(H)DEC and PURR get reset before icount does, which causes them to -be -skewed and not match the init state. This can cause replay to not -match the recorded trace exactly. For DEC and HDEC this is usually -not -noticable since they tend to get programmed before affecting the - target machine. PURR has been observed to cause replay bugs when - running Linux. - - Fix this by resetting using a time of 0. - - Message-ID: <20241219034035.1826173-2-npiggin@gmail.com> - Signed-off-by: Nicholas Piggin <npiggin@gmail.com> - - hw/ppc/ppc.c | 11 ++++++++--- - 1 file changed, 8 insertions(+), 3 deletions(-) - - -Reverting the patch helps boot the guest. 
-Thanks, -Misbah Anjum N - diff --git a/results/classifier/008/all/17743720 b/results/classifier/008/all/17743720 deleted file mode 100644 index e4ab63d55..000000000 --- a/results/classifier/008/all/17743720 +++ /dev/null @@ -1,781 +0,0 @@ -other: 0.984 -permissions: 0.981 -debug: 0.974 -graphic: 0.972 -device: 0.971 -performance: 0.965 -semantic: 0.962 -files: 0.961 -PID: 0.955 -socket: 0.954 -vnc: 0.945 -boot: 0.945 -network: 0.944 -KVM: 0.933 - -[Qemu-devel] [BUG] living migrate vm pause forever - -Sometimes, living migrate vm pause forever, migrate job stop, but very small -probability, I canât reproduce. -qemu wait semaphore from libvirt send migrate continue, however libvirt wait -semaphore from qemu send vm pause. - -follow stack: -qemu: -Thread 6 (Thread 0x7f50445f3700 (LWP 18120)): -#0 0x00007f504b84d670 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0 -#1 0x00005574eda1e164 in qemu_sem_wait (sem=sem@entry=0x5574ef6930e0) at -qemu-2.12/util/qemu-thread-posix.c:322 -#2 0x00005574ed8dd72e in migration_maybe_pause (s=0x5574ef692f50, -current_active_state=0x7f50445f2ae4, new_state=10) - at qemu-2.12/migration/migration.c:2106 -#3 0x00005574ed8df51a in migration_completion (s=0x5574ef692f50) at -qemu-2.12/migration/migration.c:2137 -#4 migration_iteration_run (s=0x5574ef692f50) at -qemu-2.12/migration/migration.c:2311 -#5 migration_thread (opaque=0x5574ef692f50) -atqemu-2.12/migration/migration.c:2415 -#6 0x00007f504b847184 in start_thread () from -/lib/x86_64-linux-gnu/libpthread.so.0 -#7 0x00007f504b574bed in clone () from /lib/x86_64-linux-gnu/libc.so.6 - -libvirt: -Thread 95 (Thread 0x7fdb82ffd700 (LWP 28775)): -#0 0x00007fdd177dc404 in pthread_cond_wait@@GLIBC_2.3.2 () from -/lib/x86_64-linux-gnu/libpthread.so.0 -#1 0x00007fdd198c3b07 in virCondWait (c=0x7fdbc4003000, m=0x7fdbc4002f30) at -../../../src/util/virthread.c:252 -#2 0x00007fdd198f36d2 in virDomainObjWait (vm=0x7fdbc4002f20) at -../../../src/conf/domain_conf.c:3303 -#3 0x00007fdd09ffaa44 
in qemuMigrationRun (driver=0x7fdd000037b0, -vm=0x7fdbc4002f20, persist_xml=0x0, - cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss -</hostname>\n -<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -flags=777, - resource=0, spec=0x7fdb82ffc670, dconn=0x0, graphicsuri=0x0, -nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, -migParams=0x7fdb82ffc900) - at ../../../src/qemu/qemu_migration.c:3937 -#4 0x00007fdd09ffb26a in doNativeMigrate (driver=0x7fdd000037b0, -vm=0x7fdbc4002f20, persist_xml=0x0, uri=0x7fdb780073a0 -"tcp://172.16.202.17:49152", - cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss</hostname>\n - <hos---Type <return> to continue, or q <return> to quit--- -tuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -flags=777, - resource=0, dconn=0x0, graphicsuri=0x0, nmigrate_disks=0, -migrate_disks=0x0, compression=0x7fdb78007990, migParams=0x7fdb82ffc900) - at ../../../src/qemu/qemu_migration.c:4118 -#5 0x00007fdd09ffd808 in qemuMigrationPerformPhase (driver=0x7fdd000037b0, -conn=0x7fdb500205d0, vm=0x7fdbc4002f20, persist_xml=0x0, - uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, -nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, -migParams=0x7fdb82ffc900, - cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss</hostname>\n - <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -flags=777, - resource=0) at ../../../src/qemu/qemu_migration.c:5030 -#6 0x00007fdd09ffdbb5 in 
qemuMigrationPerform (driver=0x7fdd000037b0, -conn=0x7fdb500205d0, vm=0x7fdbc4002f20, xmlin=0x0, persist_xml=0x0, -dconnuri=0x0, - uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, -listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0, -compression=0x7fdb78007990, - migParams=0x7fdb82ffc900, - cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss</hostname>\n - <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -flags=777, - dname=0x0, resource=0, v3proto=true) at -../../../src/qemu/qemu_migration.c:5124 -#7 0x00007fdd0a054725 in qemuDomainMigratePerform3 (dom=0x7fdb78007b00, -xmlin=0x0, - cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss</hostname>\n - <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -dconnuri=0x0, - uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, -resource=0) at ../../../src/qemu/qemu_driver.c:12996 -#8 0x00007fdd199ad0f0 in virDomainMigratePerform3 (domain=0x7fdb78007b00, -xmlin=0x0, - cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss</hostname>\n - <hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -dconnuri=0x0, - uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, -bandwidth=0) at ../../../src/libvirt-domain.c:4698 -#9 0x000055d13923a939 in remoteDispatchDomainMigratePerform3 -(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, -rerr=0x7fdb82ffcbc0, - args=0x7fdb7800b220, ret=0x7fdb78021e90) at 
../../../daemon/remote.c:4528 -#10 0x000055d13921a043 in remoteDispatchDomainMigratePerform3Helper -(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, -rerr=0x7fdb82ffcbc0, - args=0x7fdb7800b220, ret=0x7fdb78021e90) at -../../../daemon/remote_dispatch.h:7944 -#11 0x00007fdd19a260b4 in virNetServerProgramDispatchCall (prog=0x55d13af98b50, -server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620) - at ../../../src/rpc/virnetserverprogram.c:436 -#12 0x00007fdd19a25c17 in virNetServerProgramDispatch (prog=0x55d13af98b50, -server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620) - at ../../../src/rpc/virnetserverprogram.c:307 -#13 0x000055d13925933b in virNetServerProcessMsg (srv=0x55d13af90e60, -client=0x55d13b0156f0, prog=0x55d13af98b50, msg=0x55d13afbf620) - at ../../../src/rpc/virnetserver.c:148 -------------------------------------------------------------------------------------------------------------------------------------- -æ¬é®ä»¶åå ¶é件嫿æ°åä¸éå¢çä¿å¯ä¿¡æ¯ï¼ä» éäºåéç»ä¸é¢å°åä¸ååº -ç个人æç¾¤ç»ãç¦æ¢ä»»ä½å ¶ä»äººä»¥ä»»ä½å½¢å¼ä½¿ç¨ï¼å æ¬ä½ä¸éäºå ¨é¨æé¨åå°æ³é²ãå¤å¶ã -ææ£åï¼æ¬é®ä»¶ä¸çä¿¡æ¯ã妿æ¨éæ¶äºæ¬é®ä»¶ï¼è¯·æ¨ç«å³çµè¯æé®ä»¶éç¥å件人并å 餿¬ -é®ä»¶ï¼ -This e-mail and its attachments contain confidential information from New H3C, -which is -intended only for the person or entity whose address is listed above. Any use -of the -information contained herein in any way (including, but not limited to, total -or partial -disclosure, reproduction, or dissemination) by persons other than the intended -recipient(s) is prohibited. If you receive this e-mail in error, please notify -the sender -by phone or email immediately and delete it! - -* Yuchen (address@hidden) wrote: -> -Sometimes, living migrate vm pause forever, migrate job stop, but very small -> -probability, I canât reproduce. -> -qemu wait semaphore from libvirt send migrate continue, however libvirt wait -> -semaphore from qemu send vm pause. 
-Hi, - I've copied in Jiri Denemark from libvirt. -Can you confirm exactly which qemu and libvirt versions you're using -please. - -> -follow stack: -> -qemu: -> -Thread 6 (Thread 0x7f50445f3700 (LWP 18120)): -> -#0 0x00007f504b84d670 in sem_wait () from -> -/lib/x86_64-linux-gnu/libpthread.so.0 -> -#1 0x00005574eda1e164 in qemu_sem_wait (sem=sem@entry=0x5574ef6930e0) at -> -qemu-2.12/util/qemu-thread-posix.c:322 -> -#2 0x00005574ed8dd72e in migration_maybe_pause (s=0x5574ef692f50, -> -current_active_state=0x7f50445f2ae4, new_state=10) -> -at qemu-2.12/migration/migration.c:2106 -> -#3 0x00005574ed8df51a in migration_completion (s=0x5574ef692f50) at -> -qemu-2.12/migration/migration.c:2137 -> -#4 migration_iteration_run (s=0x5574ef692f50) at -> -qemu-2.12/migration/migration.c:2311 -> -#5 migration_thread (opaque=0x5574ef692f50) -> -atqemu-2.12/migration/migration.c:2415 -> -#6 0x00007f504b847184 in start_thread () from -> -/lib/x86_64-linux-gnu/libpthread.so.0 -> -#7 0x00007f504b574bed in clone () from /lib/x86_64-linux-gnu/libc.so.6 -In migration_maybe_pause we have: - - migrate_set_state(&s->state, *current_active_state, - MIGRATION_STATUS_PRE_SWITCHOVER); - qemu_sem_wait(&s->pause_sem); - migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER, - new_state); - -the line numbers don't match my 2.12.0 checkout; so I guess that it's -that qemu_sem_wait it's stuck at. - -QEMU must have sent the switch to PRE_SWITCHOVER and that should have -sent an event to libvirt, and libvirt should notice that - I'm -not sure how to tell whether libvirt has seen that event yet or not? 
- -Dave - -> -libvirt: -> -Thread 95 (Thread 0x7fdb82ffd700 (LWP 28775)): -> -#0 0x00007fdd177dc404 in pthread_cond_wait@@GLIBC_2.3.2 () from -> -/lib/x86_64-linux-gnu/libpthread.so.0 -> -#1 0x00007fdd198c3b07 in virCondWait (c=0x7fdbc4003000, m=0x7fdbc4002f30) at -> -../../../src/util/virthread.c:252 -> -#2 0x00007fdd198f36d2 in virDomainObjWait (vm=0x7fdbc4002f20) at -> -../../../src/conf/domain_conf.c:3303 -> -#3 0x00007fdd09ffaa44 in qemuMigrationRun (driver=0x7fdd000037b0, -> -vm=0x7fdbc4002f20, persist_xml=0x0, -> -cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -> -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss -> -</hostname>\n -> -<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -> -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -> -flags=777, -> -resource=0, spec=0x7fdb82ffc670, dconn=0x0, graphicsuri=0x0, -> -nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, -> -migParams=0x7fdb82ffc900) -> -at ../../../src/qemu/qemu_migration.c:3937 -> -#4 0x00007fdd09ffb26a in doNativeMigrate (driver=0x7fdd000037b0, -> -vm=0x7fdbc4002f20, persist_xml=0x0, uri=0x7fdb780073a0 -> -"tcp://172.16.202.17:49152", -> -cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -> -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n -> -<hostname>mss</hostname>\n <hos---Type <return> to continue, or q <return> -> -to quit--- -> -tuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -> -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -> -flags=777, -> -resource=0, dconn=0x0, graphicsuri=0x0, nmigrate_disks=0, -> -migrate_disks=0x0, compression=0x7fdb78007990, migParams=0x7fdb82ffc900) -> -at ../../../src/qemu/qemu_migration.c:4118 -> -#5 0x00007fdd09ffd808 in qemuMigrationPerformPhase (driver=0x7fdd000037b0, -> -conn=0x7fdb500205d0, vm=0x7fdbc4002f20, persist_xml=0x0, -> -uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", 
graphicsuri=0x0, -> -nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, -> -migParams=0x7fdb82ffc900, -> -cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -> -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n -> -<hostname>mss</hostname>\n -> -<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -> -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -> -flags=777, -> -resource=0) at ../../../src/qemu/qemu_migration.c:5030 -> -#6 0x00007fdd09ffdbb5 in qemuMigrationPerform (driver=0x7fdd000037b0, -> -conn=0x7fdb500205d0, vm=0x7fdbc4002f20, xmlin=0x0, persist_xml=0x0, -> -dconnuri=0x0, -> -uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, -> -listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0, -> -compression=0x7fdb78007990, -> -migParams=0x7fdb82ffc900, -> -cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -> -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n -> -<hostname>mss</hostname>\n -> -<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -> -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -> -flags=777, -> -dname=0x0, resource=0, v3proto=true) at -> -../../../src/qemu/qemu_migration.c:5124 -> -#7 0x00007fdd0a054725 in qemuDomainMigratePerform3 (dom=0x7fdb78007b00, -> -xmlin=0x0, -> -cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -> -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n -> -<hostname>mss</hostname>\n -> -<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -> -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -> -dconnuri=0x0, -> -uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, -> -resource=0) at ../../../src/qemu/qemu_driver.c:12996 -> -#8 0x00007fdd199ad0f0 in virDomainMigratePerform3 (domain=0x7fdb78007b00, -> -xmlin=0x0, -> -cookiein=0x7fdb780084e0 "<qemu-migration>\n 
<name>mss-pl_652</name>\n -> -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n -> -<hostname>mss</hostname>\n -> -<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -> -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -> -dconnuri=0x0, -> -uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, dname=0x0, -> -bandwidth=0) at ../../../src/libvirt-domain.c:4698 -> -#9 0x000055d13923a939 in remoteDispatchDomainMigratePerform3 -> -(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, -> -rerr=0x7fdb82ffcbc0, -> -args=0x7fdb7800b220, ret=0x7fdb78021e90) at ../../../daemon/remote.c:4528 -> -#10 0x000055d13921a043 in remoteDispatchDomainMigratePerform3Helper -> -(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, -> -rerr=0x7fdb82ffcbc0, -> -args=0x7fdb7800b220, ret=0x7fdb78021e90) at -> -../../../daemon/remote_dispatch.h:7944 -> -#11 0x00007fdd19a260b4 in virNetServerProgramDispatchCall -> -(prog=0x55d13af98b50, server=0x55d13af90e60, client=0x55d13b0156f0, -> -msg=0x55d13afbf620) -> -at ../../../src/rpc/virnetserverprogram.c:436 -> -#12 0x00007fdd19a25c17 in virNetServerProgramDispatch (prog=0x55d13af98b50, -> -server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620) -> -at ../../../src/rpc/virnetserverprogram.c:307 -> -#13 0x000055d13925933b in virNetServerProcessMsg (srv=0x55d13af90e60, -> -client=0x55d13b0156f0, prog=0x55d13af98b50, msg=0x55d13afbf620) -> -at ../../../src/rpc/virnetserver.c:148 -> -------------------------------------------------------------------------------------------------------------------------------------- -> -æ¬é®ä»¶åå ¶é件嫿æ°åä¸éå¢çä¿å¯ä¿¡æ¯ï¼ä» éäºåéç»ä¸é¢å°åä¸ååº -> -ç个人æç¾¤ç»ãç¦æ¢ä»»ä½å ¶ä»äººä»¥ä»»ä½å½¢å¼ä½¿ç¨ï¼å æ¬ä½ä¸éäºå ¨é¨æé¨åå°æ³é²ãå¤å¶ã -> -ææ£åï¼æ¬é®ä»¶ä¸çä¿¡æ¯ã妿æ¨éæ¶äºæ¬é®ä»¶ï¼è¯·æ¨ç«å³çµè¯æé®ä»¶éç¥å件人并å 餿¬ -> -é®ä»¶ï¼ -> -This e-mail and its attachments contain confidential information from New -> -H3C, which is -> -intended only for the 
person or entity whose address is listed above. Any use -> -of the -> -information contained herein in any way (including, but not limited to, total -> -or partial -> -disclosure, reproduction, or dissemination) by persons other than the intended -> -recipient(s) is prohibited. If you receive this e-mail in error, please -> -notify the sender -> -by phone or email immediately and delete it! --- -Dr. David Alan Gilbert / address@hidden / Manchester, UK - -In migration_maybe_pause we have: - - migrate_set_state(&s->state, *current_active_state, - MIGRATION_STATUS_PRE_SWITCHOVER); - qemu_sem_wait(&s->pause_sem); - migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER, - new_state); - -the line numbers don't match my 2.12.0 checkout; so I guess that it's that -qemu_sem_wait it's stuck at. - -QEMU must have sent the switch to PRE_SWITCHOVER and that should have sent an -event to libvirt, and libvirt should notice that - I'm not sure how to tell -whether libvirt has seen that event yet or not? - - -Thank you for your attention. -Yes, you are right, QEMU wait semaphore in this place. -I use qemu-2.12.1, libvirt-4.0.0. -Because I added some debug code, so the line numbers doesn't match open qemu - ------é®ä»¶åä»¶----- -å件人: Dr. David Alan Gilbert [ -mailto:address@hidden -] -åéæ¶é´: 2019å¹´8æ21æ¥ 19:13 -æ¶ä»¶äºº: yuchen (Cloud) <address@hidden>; address@hidden -æé: address@hidden -主é¢: Re: [Qemu-devel] [BUG] living migrate vm pause forever - -* Yuchen (address@hidden) wrote: -> -Sometimes, living migrate vm pause forever, migrate job stop, but very small -> -probability, I canât reproduce. -> -qemu wait semaphore from libvirt send migrate continue, however libvirt wait -> -semaphore from qemu send vm pause. -Hi, - I've copied in Jiri Denemark from libvirt. -Can you confirm exactly which qemu and libvirt versions you're using please. 
- -> -follow stack: -> -qemu: -> -Thread 6 (Thread 0x7f50445f3700 (LWP 18120)): -> -#0 0x00007f504b84d670 in sem_wait () from -> -/lib/x86_64-linux-gnu/libpthread.so.0 -> -#1 0x00005574eda1e164 in qemu_sem_wait (sem=sem@entry=0x5574ef6930e0) -> -at qemu-2.12/util/qemu-thread-posix.c:322 -> -#2 0x00005574ed8dd72e in migration_maybe_pause (s=0x5574ef692f50, -> -current_active_state=0x7f50445f2ae4, new_state=10) -> -at qemu-2.12/migration/migration.c:2106 -> -#3 0x00005574ed8df51a in migration_completion (s=0x5574ef692f50) at -> -qemu-2.12/migration/migration.c:2137 -> -#4 migration_iteration_run (s=0x5574ef692f50) at -> -qemu-2.12/migration/migration.c:2311 -> -#5 migration_thread (opaque=0x5574ef692f50) -> -atqemu-2.12/migration/migration.c:2415 -> -#6 0x00007f504b847184 in start_thread () from -> -/lib/x86_64-linux-gnu/libpthread.so.0 -> -#7 0x00007f504b574bed in clone () from -> -/lib/x86_64-linux-gnu/libc.so.6 -In migration_maybe_pause we have: - - migrate_set_state(&s->state, *current_active_state, - MIGRATION_STATUS_PRE_SWITCHOVER); - qemu_sem_wait(&s->pause_sem); - migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER, - new_state); - -the line numbers don't match my 2.12.0 checkout; so I guess that it's that -qemu_sem_wait it's stuck at. - -QEMU must have sent the switch to PRE_SWITCHOVER and that should have sent an -event to libvirt, and libvirt should notice that - I'm not sure how to tell -whether libvirt has seen that event yet or not? 
- -Dave - -> -libvirt: -> -Thread 95 (Thread 0x7fdb82ffd700 (LWP 28775)): -> -#0 0x00007fdd177dc404 in pthread_cond_wait@@GLIBC_2.3.2 () from -> -/lib/x86_64-linux-gnu/libpthread.so.0 -> -#1 0x00007fdd198c3b07 in virCondWait (c=0x7fdbc4003000, -> -m=0x7fdbc4002f30) at ../../../src/util/virthread.c:252 -> -#2 0x00007fdd198f36d2 in virDomainObjWait (vm=0x7fdbc4002f20) at -> -../../../src/conf/domain_conf.c:3303 -> -#3 0x00007fdd09ffaa44 in qemuMigrationRun (driver=0x7fdd000037b0, -> -vm=0x7fdbc4002f20, persist_xml=0x0, -> -cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -> -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n <hostname>mss -> -</hostname>\n -> -<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -> -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -> -flags=777, -> -resource=0, spec=0x7fdb82ffc670, dconn=0x0, graphicsuri=0x0, -> -nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, -> -migParams=0x7fdb82ffc900) -> -at ../../../src/qemu/qemu_migration.c:3937 -> -#4 0x00007fdd09ffb26a in doNativeMigrate (driver=0x7fdd000037b0, -> -vm=0x7fdbc4002f20, persist_xml=0x0, uri=0x7fdb780073a0 -> -"tcp://172.16.202.17:49152", -> -cookiein=0x7fdb780084e0 "<qemu-migration>\n -> -<name>mss-pl_652</name>\n -> -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n -> -<hostname>mss</hostname>\n <hos---Type <return> to continue, or q -> -<return> to quit--- -> -tuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra".. 
-> -tuuid>., cookieinlen=207, cookieout=0x7fdb82ffcad0, -> -tuuid>cookieoutlen=0x7fdb82ffcac8, flags=777, -> -resource=0, dconn=0x0, graphicsuri=0x0, nmigrate_disks=0, -> -migrate_disks=0x0, compression=0x7fdb78007990, migParams=0x7fdb82ffc900) -> -at ../../../src/qemu/qemu_migration.c:4118 -> -#5 0x00007fdd09ffd808 in qemuMigrationPerformPhase (driver=0x7fdd000037b0, -> -conn=0x7fdb500205d0, vm=0x7fdbc4002f20, persist_xml=0x0, -> -uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, -> -nmigrate_disks=0, migrate_disks=0x0, compression=0x7fdb78007990, -> -migParams=0x7fdb82ffc900, -> -cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -> -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n -> -<hostname>mss</hostname>\n -> -<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -> -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -> -flags=777, -> -resource=0) at ../../../src/qemu/qemu_migration.c:5030 -> -#6 0x00007fdd09ffdbb5 in qemuMigrationPerform (driver=0x7fdd000037b0, -> -conn=0x7fdb500205d0, vm=0x7fdbc4002f20, xmlin=0x0, persist_xml=0x0, -> -dconnuri=0x0, -> -uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", graphicsuri=0x0, -> -listenAddress=0x0, nmigrate_disks=0, migrate_disks=0x0, nbdPort=0, -> -compression=0x7fdb78007990, -> -migParams=0x7fdb82ffc900, -> -cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -> -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n -> -<hostname>mss</hostname>\n -> -<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -> -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -> -flags=777, -> -dname=0x0, resource=0, v3proto=true) at -> -../../../src/qemu/qemu_migration.c:5124 -> -#7 0x00007fdd0a054725 in qemuDomainMigratePerform3 (dom=0x7fdb78007b00, -> -xmlin=0x0, -> -cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -> -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n 
-> -<hostname>mss</hostname>\n -> -<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -> -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -> -dconnuri=0x0, -> -uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, -> -dname=0x0, resource=0) at ../../../src/qemu/qemu_driver.c:12996 -> -#8 0x00007fdd199ad0f0 in virDomainMigratePerform3 (domain=0x7fdb78007b00, -> -xmlin=0x0, -> -cookiein=0x7fdb780084e0 "<qemu-migration>\n <name>mss-pl_652</name>\n -> -<uuid>1f2b2334-451e-424b-822a-ea10452abb38</uuid>\n -> -<hostname>mss</hostname>\n -> -<hostuuid>334e344a-4130-4336-5534-323544543642</hostuuid>\n</qemu-migra"..., -> -cookieinlen=207, cookieout=0x7fdb82ffcad0, cookieoutlen=0x7fdb82ffcac8, -> -dconnuri=0x0, -> -uri=0x7fdb780073a0 "tcp://172.16.202.17:49152", flags=777, -> -dname=0x0, bandwidth=0) at ../../../src/libvirt-domain.c:4698 -> -#9 0x000055d13923a939 in remoteDispatchDomainMigratePerform3 -> -(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, -> -rerr=0x7fdb82ffcbc0, -> -args=0x7fdb7800b220, ret=0x7fdb78021e90) at -> -../../../daemon/remote.c:4528 -> -#10 0x000055d13921a043 in remoteDispatchDomainMigratePerform3Helper -> -(server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620, -> -rerr=0x7fdb82ffcbc0, -> -args=0x7fdb7800b220, ret=0x7fdb78021e90) at -> -../../../daemon/remote_dispatch.h:7944 -> -#11 0x00007fdd19a260b4 in virNetServerProgramDispatchCall -> -(prog=0x55d13af98b50, server=0x55d13af90e60, client=0x55d13b0156f0, -> -msg=0x55d13afbf620) -> -at ../../../src/rpc/virnetserverprogram.c:436 -> -#12 0x00007fdd19a25c17 in virNetServerProgramDispatch (prog=0x55d13af98b50, -> -server=0x55d13af90e60, client=0x55d13b0156f0, msg=0x55d13afbf620) -> -at ../../../src/rpc/virnetserverprogram.c:307 -> -#13 0x000055d13925933b in virNetServerProcessMsg (srv=0x55d13af90e60, -> -client=0x55d13b0156f0, prog=0x55d13af98b50, msg=0x55d13afbf620) -> -at ../../../src/rpc/virnetserver.c:148 -> 
----------------------------------------------------------------------- -> ---------------------------------------------------------------- -> -æ¬é®ä»¶åå ¶é件嫿æ°åä¸éå¢çä¿å¯ä¿¡æ¯ï¼ä» éäºåéç»ä¸é¢å°åä¸ååº -> -ç个人æç¾¤ç»ãç¦æ¢ä»»ä½å ¶ä»äººä»¥ä»»ä½å½¢å¼ä½¿ç¨ï¼å æ¬ä½ä¸éäºå ¨é¨æé¨åå°æ³é²ãå¤å¶ã -> -ææ£åï¼æ¬é®ä»¶ä¸çä¿¡æ¯ã妿æ¨éæ¶äºæ¬é®ä»¶ï¼è¯·æ¨ç«å³çµè¯æé®ä»¶éç¥å件人并å 餿¬ -> -é®ä»¶ï¼ -> -This e-mail and its attachments contain confidential information from -> -New H3C, which is intended only for the person or entity whose address -> -is listed above. Any use of the information contained herein in any -> -way (including, but not limited to, total or partial disclosure, -> -reproduction, or dissemination) by persons other than the intended -> -recipient(s) is prohibited. If you receive this e-mail in error, -> -please notify the sender by phone or email immediately and delete it! --- -Dr. David Alan Gilbert / address@hidden / Manchester, UK - diff --git a/results/classifier/008/all/21221931 b/results/classifier/008/all/21221931 deleted file mode 100644 index a925c3002..000000000 --- a/results/classifier/008/all/21221931 +++ /dev/null @@ -1,338 +0,0 @@ -permissions: 0.982 -other: 0.979 -network: 0.976 -device: 0.971 -debug: 0.971 -files: 0.967 -semantic: 0.967 -performance: 0.966 -socket: 0.957 -graphic: 0.948 -boot: 0.947 -PID: 0.945 -vnc: 0.944 -KVM: 0.913 - -[BUG] qemu git error with virgl - -Hello, - -i can't start any system if i use virgl. I get the following error: -qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion -`con->gl' failed. 
-./and.sh: line 27: 3337167 Aborted                qemu-x86_64 -m 4096 --smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device -virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device -intel-hda,id=sound0,msi=on -device -hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci --device usb-tablet,bus=xhci.0 -net -nic,macaddr=52:54:00:12:34:62,model=e1000 -net -tap,ifname=$INTERFACE,script=no,downscript=no -drive -file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads -Set 'tap3' nonpersistent - -i have bicected the issue: - -towo:Defiant> git bisect good -b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit -commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4 -Author: Paolo Bonzini <pbonzini@redhat.com> -Date:  Tue Oct 27 08:44:23 2020 -0400 - -   vl: remove separate preconfig main_loop -   Move post-preconfig initialization to the x-exit-preconfig. If -preconfig -   is not requested, just exit preconfig mode immediately with the QMP -   command. - -   As a result, the preconfig loop will run with accel_setup_post -   and os_setup_post restrictions (xen_restrict, chroot, etc.) -   already done. - -   Reviewed-by: Igor Mammedov <imammedo@redhat.com> -   Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> - - include/sysemu/runstate.h | 1 - - monitor/qmp-cmds.c       | 9 ----- - softmmu/vl.c             | 95 -++++++++++++++++++++--------------------------- - 3 files changed, 41 insertions(+), 64 deletions(-) - -Regards, - -Torsten Wohlfarth - -Cc'ing Gerd + patch author/reviewer. - -On 1/2/21 2:11 PM, Torsten Wohlfarth wrote: -> -Hello, -> -> -i can't start any system if i use virgl. I get the following error: -> -> -qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion -> -`con->gl' failed. 
-> -./and.sh: line 27: 3337167 Aborted                qemu-x86_64 -m 4096 -> --smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device -> -virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device -> -intel-hda,id=sound0,msi=on -device -> -hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci -> --device usb-tablet,bus=xhci.0 -net -> -nic,macaddr=52:54:00:12:34:62,model=e1000 -net -> -tap,ifname=$INTERFACE,script=no,downscript=no -drive -> -file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads -> -> -Set 'tap3' nonpersistent -> -> -i have bicected the issue: -> -> -towo:Defiant> git bisect good -> -b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit -> -commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4 -> -Author: Paolo Bonzini <pbonzini@redhat.com> -> -Date:  Tue Oct 27 08:44:23 2020 -0400 -> -> -   vl: remove separate preconfig main_loop -> -> -   Move post-preconfig initialization to the x-exit-preconfig. If -> -preconfig -> -   is not requested, just exit preconfig mode immediately with the QMP -> -   command. -> -> -   As a result, the preconfig loop will run with accel_setup_post -> -   and os_setup_post restrictions (xen_restrict, chroot, etc.) -> -   already done. -> -> -   Reviewed-by: Igor Mammedov <imammedo@redhat.com> -> -   Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> -> -> - include/sysemu/runstate.h | 1 - -> - monitor/qmp-cmds.c       | 9 ----- -> - softmmu/vl.c             | 95 -> -++++++++++++++++++++--------------------------- -> - 3 files changed, 41 insertions(+), 64 deletions(-) -> -> -Regards, -> -> -Torsten Wohlfarth -> -> -> - -On Sun, 3 Jan 2021 18:28:11 +0100 -Philippe Mathieu-Daudé <philmd@redhat.com> wrote: - -> -Cc'ing Gerd + patch author/reviewer. -> -> -On 1/2/21 2:11 PM, Torsten Wohlfarth wrote: -> -> Hello, -> -> -> -> i can't start any system if i use virgl. 
I get the following error: -> -> -> -> qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion -> -> `con->gl' failed. -Does following fix issue: - [PULL 12/55] vl: initialize displays _after_ exiting preconfiguration - -> -> ./and.sh: line 27: 3337167 Aborted                qemu-x86_64 -m 4096 -> -> -smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device -> -> virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device -> -> intel-hda,id=sound0,msi=on -device -> -> hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci -> -> -device usb-tablet,bus=xhci.0 -net -> -> nic,macaddr=52:54:00:12:34:62,model=e1000 -net -> -> tap,ifname=$INTERFACE,script=no,downscript=no -drive -> -> file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads -> -> -> -> Set 'tap3' nonpersistent -> -> -> -> i have bicected the issue: -> -> -> -> towo:Defiant> git bisect good -> -> b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit -> -> commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4 -> -> Author: Paolo Bonzini <pbonzini@redhat.com> -> -> Date:  Tue Oct 27 08:44:23 2020 -0400 -> -> -> ->    vl: remove separate preconfig main_loop -> -> -> ->    Move post-preconfig initialization to the x-exit-preconfig. If -> -> preconfig -> ->    is not requested, just exit preconfig mode immediately with the QMP -> ->    command. -> -> -> ->    As a result, the preconfig loop will run with accel_setup_post -> ->    and os_setup_post restrictions (xen_restrict, chroot, etc.) -> ->    already done. 
-> -> -> ->    Reviewed-by: Igor Mammedov <imammedo@redhat.com> -> ->    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> -> -> -> ->  include/sysemu/runstate.h | 1 - -> ->  monitor/qmp-cmds.c       | 9 ----- -> ->  softmmu/vl.c             | 95 -> -> ++++++++++++++++++++--------------------------- -> ->  3 files changed, 41 insertions(+), 64 deletions(-) -> -> -> -> Regards, -> -> -> -> Torsten Wohlfarth -> -> -> -> -> -> -> -> - -Hi Igor, - -yes, that fixes my issue. - -Regards, Torsten - -Am 04.01.21 um 19:50 schrieb Igor Mammedov: -On Sun, 3 Jan 2021 18:28:11 +0100 -Philippe Mathieu-Daudé <philmd@redhat.com> wrote: -Cc'ing Gerd + patch author/reviewer. - -On 1/2/21 2:11 PM, Torsten Wohlfarth wrote: -Hello, - -i can't start any system if i use virgl. I get the following error: - -qemu-x86_64: ../ui/console.c:1791: dpy_gl_ctx_create: Assertion -`con->gl' failed. -Does following fix issue: - [PULL 12/55] vl: initialize displays _after_ exiting preconfiguration -./and.sh: line 27: 3337167 Aborted                qemu-x86_64 -m 4096 --smp cores=4,sockets=1 -cpu host -machine pc-q35-4.0,accel=kvm -device -virtio-vga,virgl=on,xres=1280,yres=800 -display sdl,gl=on -device -intel-hda,id=sound0,msi=on -device -hda-micro,id=sound0-codec0,bus=sound0.0,cad=0 -device qemu-xhci,id=xhci --device usb-tablet,bus=xhci.0 -net -nic,macaddr=52:54:00:12:34:62,model=e1000 -net -tap,ifname=$INTERFACE,script=no,downscript=no -drive -file=/media/daten2/image/lineageos.qcow2,if=virtio,index=1,media=disk,cache=none,aio=threads - -Set 'tap3' nonpersistent - -i have bicected the issue: -towo:Defiant> git bisect good -b4e1a342112e50e05b609e857f38c1f2b7aafdc4 is the first bad commit -commit b4e1a342112e50e05b609e857f38c1f2b7aafdc4 -Author: Paolo Bonzini <pbonzini@redhat.com> -Date:  Tue Oct 27 08:44:23 2020 -0400 - -    vl: remove separate preconfig main_loop - -    Move post-preconfig initialization to the x-exit-preconfig. 
If -preconfig -    is not requested, just exit preconfig mode immediately with the QMP -    command. - -    As a result, the preconfig loop will run with accel_setup_post -    and os_setup_post restrictions (xen_restrict, chroot, etc.) -    already done. - -    Reviewed-by: Igor Mammedov <imammedo@redhat.com> -    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> - -  include/sysemu/runstate.h | 1 - -  monitor/qmp-cmds.c       | 9 ----- -  softmmu/vl.c             | 95 -++++++++++++++++++++--------------------------- -  3 files changed, 41 insertions(+), 64 deletions(-) - -Regards, - -Torsten Wohlfarth - diff --git a/results/classifier/008/all/23448582 b/results/classifier/008/all/23448582 deleted file mode 100644 index 4cb453f2e..000000000 --- a/results/classifier/008/all/23448582 +++ /dev/null @@ -1,275 +0,0 @@ -other: 0.990 -debug: 0.989 -permissions: 0.988 -semantic: 0.987 -graphic: 0.987 -performance: 0.985 -PID: 0.983 -socket: 0.982 -files: 0.979 -device: 0.979 -network: 0.973 -vnc: 0.973 -boot: 0.967 -KVM: 0.958 - -[BUG REPORT] cxl process in infinity loop - -Hi, all - -When I did the cxl memory hot-plug test on QEMU, I accidentally connected -two memdev to the same downstream port, the command like below: - -> --object memory-backend-ram,size=262144k,share=on,id=vmem0 \ -> --object memory-backend-ram,size=262144k,share=on,id=vmem1 \ -> --device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ -> --device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \ -> --device cxl-upstream,bus=root_port0,id=us0 \ -> --device cxl-downstream,port=0,bus=us0,id=swport00,chassis=0,slot=5 \ -> --device cxl-downstream,port=0,bus=us0,id=swport01,chassis=0,slot=7 \ -same downstream port but has different slot! 
- -> --device cxl-type3,bus=swport00,volatile-memdev=vmem0,id=cxl-vmem0 \ -> --device cxl-type3,bus=swport01,volatile-memdev=vmem1,id=cxl-vmem1 \ -> --M -> -cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=64G,cxl-fmw.0.interleave-granularity=4k -> -\ -There is no error occurred when vm start, but when I executed the “cxl list” -command to view -the CXL objects info, the process can not end properly. - -Then I used strace to trace the process, I found that the process is in -infinity loop: -# strace cxl list -...... -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -write(3, "1\n\0", 3) = 3 -close(3) = 0 -access("/run/udev/queue", F_OK) = 0 -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -write(3, "1\n\0", 3) = 3 -close(3) = 0 -access("/run/udev/queue", F_OK) = 0 -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -write(3, "1\n\0", 3) = 3 -close(3) = 0 -access("/run/udev/queue", F_OK) = 0 -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -write(3, "1\n\0", 3) = 3 -close(3) = 0 -access("/run/udev/queue", F_OK) = 0 -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -write(3, "1\n\0", 3) = 3 -close(3) = 0 -access("/run/udev/queue", F_OK) = 0 -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -write(3, "1\n\0", 3) = 3 -close(3) = 0 -access("/run/udev/queue", F_OK) = 0 -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -write(3, "1\n\0", 3) = 3 -close(3) = 0 
-access("/run/udev/queue", F_OK) = 0 -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -write(3, "1\n\0", 3) = 3 -close(3) = 0 -access("/run/udev/queue", F_OK) = 0 - -[Environment]: -linux: V6.10-rc3 -QEMU: V9.0.0 -ndctl: v79 - -I know this is because of the wrong use of the QEMU command, but I think we -should -be aware of this error in one of the QEMU, OS or ndctl side at least. - -Thanks -Xingtao - -On Tue, 2 Jul 2024 00:30:06 +0000 -"Xingtao Yao (Fujitsu)" <yaoxt.fnst@fujitsu.com> wrote: - -> -Hi, all -> -> -When I did the cxl memory hot-plug test on QEMU, I accidentally connected -> -two memdev to the same downstream port, the command like below: -> -> -> -object memory-backend-ram,size=262144k,share=on,id=vmem0 \ -> -> -object memory-backend-ram,size=262144k,share=on,id=vmem1 \ -> -> -device pxb-cxl,bus_nr=12,bus=pcie.0,id=cxl.1 \ -> -> -device cxl-rp,port=0,bus=cxl.1,id=root_port0,chassis=0,slot=0 \ -> -> -device cxl-upstream,bus=root_port0,id=us0 \ -> -> -device cxl-downstream,port=0,bus=us0,id=swport00,chassis=0,slot=5 \ -> -> -device cxl-downstream,port=0,bus=us0,id=swport01,chassis=0,slot=7 \ -> -same downstream port but has different slot! -> -> -> -device cxl-type3,bus=swport00,volatile-memdev=vmem0,id=cxl-vmem0 \ -> -> -device cxl-type3,bus=swport01,volatile-memdev=vmem1,id=cxl-vmem1 \ -> -> -M -> -> cxl-fmw.0.targets.0=cxl.1,cxl-fmw.0.size=64G,cxl-fmw.0.interleave-granularity=4k -> -> \ -> -> -There is no error occurred when vm start, but when I executed the “cxl list” -> -command to view -> -the CXL objects info, the process can not end properly. -I'd be happy to look preventing this on QEMU side if you send one, -but in general there are are lots of ways to shoot yourself in the -foot with CXL and PCI device emulation in QEMU so I'm not going -to rush to solve this specific one. 
- -Likewise, some hardening in kernel / userspace probably makes sense but -this is a non compliant switch so priority of a fix is probably fairly low. - -Jonathan - -> -> -Then I used strace to trace the process, I found that the process is in -> -infinity loop: -> -# strace cxl list -> -...... -> -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -> -openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -> -write(3, "1\n\0", 3) = 3 -> -close(3) = 0 -> -access("/run/udev/queue", F_OK) = 0 -> -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -> -openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -> -write(3, "1\n\0", 3) = 3 -> -close(3) = 0 -> -access("/run/udev/queue", F_OK) = 0 -> -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -> -openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -> -write(3, "1\n\0", 3) = 3 -> -close(3) = 0 -> -access("/run/udev/queue", F_OK) = 0 -> -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -> -openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -> -write(3, "1\n\0", 3) = 3 -> -close(3) = 0 -> -access("/run/udev/queue", F_OK) = 0 -> -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -> -openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -> -write(3, "1\n\0", 3) = 3 -> -close(3) = 0 -> -access("/run/udev/queue", F_OK) = 0 -> -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -> -openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -> -write(3, "1\n\0", 3) = 3 -> -close(3) = 0 -> -access("/run/udev/queue", F_OK) = 0 -> -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -> -openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -> -write(3, "1\n\0", 3) = 3 -> -close(3) = 0 -> -access("/run/udev/queue", F_OK) = 0 -> -clock_nanosleep(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=1000000}, NULL) = 0 -> 
-openat(AT_FDCWD, "/sys/bus/cxl/flush", O_WRONLY|O_CLOEXEC) = 3 -> -write(3, "1\n\0", 3) = 3 -> -close(3) = 0 -> -access("/run/udev/queue", F_OK) = 0 -> -> -[Environment]: -> -linux: V6.10-rc3 -> -QEMU: V9.0.0 -> -ndctl: v79 -> -> -I know this is because of the wrong use of the QEMU command, but I think we -> -should -> -be aware of this error in one of the QEMU, OS or ndctl side at least. -> -> -Thanks -> -Xingtao - diff --git a/results/classifier/008/all/51610399 b/results/classifier/008/all/51610399 deleted file mode 100644 index 2e420e72d..000000000 --- a/results/classifier/008/all/51610399 +++ /dev/null @@ -1,318 +0,0 @@ -permissions: 0.988 -debug: 0.986 -boot: 0.986 -graphic: 0.986 -other: 0.985 -semantic: 0.984 -device: 0.984 -performance: 0.983 -files: 0.981 -PID: 0.978 -socket: 0.978 -KVM: 0.975 -vnc: 0.974 -network: 0.973 - -[BUG][powerpc] KVM Guest Boot Failure – Hangs at "Booting Linux via __start()” - -Bug Description: -Encountering a boot failure when launching a KVM guest with -qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor -crashes. 
-Reproduction Steps: -# qemu-system-ppc64 --version -QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f) -Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers -# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine -pseries,accel=kvm \ --m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \ - -device virtio-scsi-pci,id=scsi \ --drive -file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 -\ --device scsi-hd,drive=drive0,bus=scsi.0 \ - -netdev bridge,id=net0,br=virbr0 \ - -device virtio-net-pci,netdev=net0 \ - -serial pty \ - -device virtio-balloon-pci \ - -cpu host -QEMU 9.2.50 monitor - type 'help' for more information -char device redirected to /dev/pts/2 (label serial0) -(qemu) -(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but -unavailable: IRQ_XIVE capability must be present for KVM -Falling back to kernel-irqchip=off -** Qemu Hang - -(In another ssh session) -# screen /dev/pts/2 -Preparing to boot Linux version 6.10.4-200.fc40.ppc64le -(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 -(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 -15:20:17 UTC 2024 -Detected machine type: 0000000000000101 -command line: -BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le -root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M -Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) -Calling ibm,client-architecture-support... done -memory layout at init: - memory_limit : 0000000000000000 (16 MB aligned) - alloc_bottom : 0000000008200000 - alloc_top : 0000000030000000 - alloc_top_hi : 0000000800000000 - rmo_top : 0000000030000000 - ram_top : 0000000800000000 -instantiating rtas at 0x000000002fff0000... done -prom_hold_cpus: skipped -copying OF device tree... -Building dt strings... -Building dt structure... 
-Device tree strings 0x0000000008210000 -> 0x0000000008210bd0 -Device tree struct 0x0000000008220000 -> 0x0000000008230000 -Quiescing Open Firmware ... -Booting Linux via __start() @ 0x0000000000440000 ... -** Guest Console Hang - - -Git Bisect: -Performing git bisect points to the following patch: -# git bisect bad -e8291ec16da80566c121c68d9112be458954d90b is the first bad commit -commit e8291ec16da80566c121c68d9112be458954d90b (HEAD) -Author: Nicholas Piggin <npiggin@gmail.com> -Date: Thu Dec 19 13:40:31 2024 +1000 - - target/ppc: fix timebase register reset state -(H)DEC and PURR get reset before icount does, which causes them to -be -skewed and not match the init state. This can cause replay to not -match the recorded trace exactly. For DEC and HDEC this is usually -not -noticable since they tend to get programmed before affecting the - target machine. PURR has been observed to cause replay bugs when - running Linux. - - Fix this by resetting using a time of 0. - - Message-ID: <20241219034035.1826173-2-npiggin@gmail.com> - Signed-off-by: Nicholas Piggin <npiggin@gmail.com> - - hw/ppc/ppc.c | 11 ++++++++--- - 1 file changed, 8 insertions(+), 3 deletions(-) - - -Reverting the patch helps boot the guest. -Thanks, -Misbah Anjum N - -Thanks for the report. - -Tricky problem. A secondary CPU is hanging before it is started by the -primary via rtas call. - -That secondary keeps calling kvm_cpu_exec(), which keeps exiting out -early with EXCP_HLT because kvm_arch_process_async_events() returns -true because that cpu has ->halted=1. That just goes around he run -loop because there is an interrupt pending (DEC). - -So it never runs. It also never releases the BQL, and another CPU, -the primary which is actually supposed to be running, is stuck in -spapr_set_all_lpcrs() in run_on_cpu() waiting for the BQL. - -This patch just exposes the bug I think, by causing the interrupt. 
-although I'm not quite sure why it's okay previously (-ve decrementer -values should be causing a timer exception too). The timer exception -should not be taken as an interrupt by those secondary CPUs, and it -doesn't because it is masked, until set_all_lpcrs sets an LPCR value -that enables powersave wakeup on decrementer interrupt. - -The start_powered_off sate just sets ->halted, which makes it look -like a powersaving state. Logically I think it's not the same thing -as far as spapr goes. I don't know why start_powered_off only sets -->halted, and not ->stop/stopped as well. - -Not sure how best to solve it cleanly. I'll send a revert if I can't -get something working soon. - -Thanks, -Nick - -On Tue Mar 18, 2025 at 7:09 AM AEST, misanjum wrote: -> -Bug Description: -> -Encountering a boot failure when launching a KVM guest with -> -qemu-system-ppc64. The guest hangs at boot, and the QEMU monitor -> -crashes. -> -> -> -Reproduction Steps: -> -# qemu-system-ppc64 --version -> -QEMU emulator version 9.2.50 (v9.2.0-2799-g0462a32b4f) -> -Copyright (c) 2003-2025 Fabrice Bellard and the QEMU Project developers -> -> -# /usr/bin/qemu-system-ppc64 -name avocado-vt-vm1 -machine -> -pseries,accel=kvm \ -> --m 32768 -smp 32,sockets=1,cores=32,threads=1 -nographic \ -> --device virtio-scsi-pci,id=scsi \ -> --drive -> -file=/home/kvmci/tests/data/avocado-vt/images/rhel8.0devel-ppc64le.qcow2,if=none,id=drive0,format=qcow2 -> -> -\ -> --device scsi-hd,drive=drive0,bus=scsi.0 \ -> --netdev bridge,id=net0,br=virbr0 \ -> --device virtio-net-pci,netdev=net0 \ -> --serial pty \ -> --device virtio-balloon-pci \ -> --cpu host -> -QEMU 9.2.50 monitor - type 'help' for more information -> -char device redirected to /dev/pts/2 (label serial0) -> -(qemu) -> -(qemu) qemu-system-ppc64: warning: kernel_irqchip allowed but -> -unavailable: IRQ_XIVE capability must be present for KVM -> -Falling back to kernel-irqchip=off -> -** Qemu Hang -> -> -(In another ssh session) -> -# screen 
/dev/pts/2 -> -Preparing to boot Linux version 6.10.4-200.fc40.ppc64le -> -(mockbuild@c23cc4e677614c34bb22d54eeea4dc1f) (gcc (GCC) 14.2.1 20240801 -> -(Red Hat 14.2.1-1), GNU ld version 2.41-37.fc40) #1 SMP Sun Aug 11 -> -15:20:17 UTC 2024 -> -Detected machine type: 0000000000000101 -> -command line: -> -BOOT_IMAGE=(ieee1275/disk,msdos2)/vmlinuz-6.10.4-200.fc40.ppc64le -> -root=/dev/mapper/fedora-root ro rd.lvm.lv=fedora/root crashkernel=1024M -> -Max number of cores passed to firmware: 2048 (NR_CPUS = 2048) -> -Calling ibm,client-architecture-support... done -> -memory layout at init: -> -memory_limit : 0000000000000000 (16 MB aligned) -> -alloc_bottom : 0000000008200000 -> -alloc_top : 0000000030000000 -> -alloc_top_hi : 0000000800000000 -> -rmo_top : 0000000030000000 -> -ram_top : 0000000800000000 -> -instantiating rtas at 0x000000002fff0000... done -> -prom_hold_cpus: skipped -> -copying OF device tree... -> -Building dt strings... -> -Building dt structure... -> -Device tree strings 0x0000000008210000 -> 0x0000000008210bd0 -> -Device tree struct 0x0000000008220000 -> 0x0000000008230000 -> -Quiescing Open Firmware ... -> -Booting Linux via __start() @ 0x0000000000440000 ... -> -** Guest Console Hang -> -> -> -Git Bisect: -> -Performing git bisect points to the following patch: -> -# git bisect bad -> -e8291ec16da80566c121c68d9112be458954d90b is the first bad commit -> -commit e8291ec16da80566c121c68d9112be458954d90b (HEAD) -> -Author: Nicholas Piggin <npiggin@gmail.com> -> -Date: Thu Dec 19 13:40:31 2024 +1000 -> -> -target/ppc: fix timebase register reset state -> -> -(H)DEC and PURR get reset before icount does, which causes them to -> -be -> -skewed and not match the init state. This can cause replay to not -> -match the recorded trace exactly. For DEC and HDEC this is usually -> -not -> -noticable since they tend to get programmed before affecting the -> -target machine. PURR has been observed to cause replay bugs when -> -running Linux. 
-> -> -Fix this by resetting using a time of 0. -> -> -Message-ID: <20241219034035.1826173-2-npiggin@gmail.com> -> -Signed-off-by: Nicholas Piggin <npiggin@gmail.com> -> -> -hw/ppc/ppc.c | 11 ++++++++--- -> -1 file changed, 8 insertions(+), 3 deletions(-) -> -> -> -Reverting the patch helps boot the guest. -> -Thanks, -> -Misbah Anjum N - diff --git a/results/classifier/008/all/59540920 b/results/classifier/008/all/59540920 deleted file mode 100644 index 85d1e913a..000000000 --- a/results/classifier/008/all/59540920 +++ /dev/null @@ -1,386 +0,0 @@ -other: 0.989 -files: 0.987 -permissions: 0.986 -graphic: 0.985 -debug: 0.985 -device: 0.985 -semantic: 0.985 -socket: 0.983 -performance: 0.983 -PID: 0.982 -network: 0.981 -boot: 0.980 -vnc: 0.977 -KVM: 0.970 - -[BUG] No irqchip created after commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator property") - -I apologize if this was already reported, - -I just noticed that with the latest updates QEMU doesn't start with the -following configuration: - -qemu-system-x86_64 -name guest=win10 -machine pc,accel=kvm -cpu -host,hv_vpindex,hv_synic ... - -qemu-system-x86_64: failed to turn on HyperV SynIC in KVM: Invalid argument -qemu-system-x86_64: kvm_init_vcpu failed: Invalid argument - -If I add 'kernel-irqchip=split' or ',kernel-irqchip=on' it starts as -usual. I bisected this to the following commit: - -commit 11bc4a13d1f4b07dafbd1dda4d4bf0fdd7ad65f2 (HEAD, refs/bisect/bad) -Author: Paolo Bonzini <address@hidden> -Date: Wed Nov 13 10:56:53 2019 +0100 - - kvm: convert "-machine kernel_irqchip" to an accelerator property - -so aparently we now default to 'kernel_irqchip=off'. Is this the desired -behavior? - --- -Vitaly - -No, absolutely not. I was sure I had tested it, but I will take a look. 
-Paolo -Il ven 20 dic 2019, 15:11 Vitaly Kuznetsov < -address@hidden -> ha scritto: -I apologize if this was already reported, -I just noticed that with the latest updates QEMU doesn't start with the -following configuration: -qemu-system-x86_64 -name guest=win10 -machine pc,accel=kvm -cpu host,hv_vpindex,hv_synic ... -qemu-system-x86_64: failed to turn on HyperV SynIC in KVM: Invalid argument -qemu-system-x86_64: kvm_init_vcpu failed: Invalid argument -If I add 'kernel-irqchip=split' or ',kernel-irqchip=on' it starts as -usual. I bisected this to the following commit: -commit 11bc4a13d1f4b07dafbd1dda4d4bf0fdd7ad65f2 (HEAD, refs/bisect/bad) -Author: Paolo Bonzini < -address@hidden -> -Date:  Wed Nov 13 10:56:53 2019 +0100 -  kvm: convert "-machine kernel_irqchip" to an accelerator property -so aparently we now default to 'kernel_irqchip=off'. Is this the desired -behavior? --- -Vitaly - -Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an -accelerator property") moves kernel_irqchip property from "-machine" to -"-accel kvm", but it forgets to set the default value of -kernel_irqchip_allowed and kernel_irqchip_split. - -Also cleaning up the three useless members (kernel_irqchip_allowed, -kernel_irqchip_required, kernel_irqchip_split) in struct MachineState. 
- -Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator -property") -Signed-off-by: Xiaoyao Li <address@hidden> ---- - accel/kvm/kvm-all.c | 3 +++ - include/hw/boards.h | 3 --- - 2 files changed, 3 insertions(+), 3 deletions(-) - -diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c -index b2f1a5bcb5ef..40f74094f8d3 100644 ---- a/accel/kvm/kvm-all.c -+++ b/accel/kvm/kvm-all.c -@@ -3044,8 +3044,11 @@ bool kvm_kernel_irqchip_split(void) - static void kvm_accel_instance_init(Object *obj) - { - KVMState *s = KVM_STATE(obj); -+ MachineClass *mc = MACHINE_GET_CLASS(current_machine); - - s->kvm_shadow_mem = -1; -+ s->kernel_irqchip_allowed = true; -+ s->kernel_irqchip_split = mc->default_kernel_irqchip_split; - } - - static void kvm_accel_class_init(ObjectClass *oc, void *data) -diff --git a/include/hw/boards.h b/include/hw/boards.h -index 61f8bb8e5a42..fb1b43d5b972 100644 ---- a/include/hw/boards.h -+++ b/include/hw/boards.h -@@ -271,9 +271,6 @@ struct MachineState { - - /*< public >*/ - -- bool kernel_irqchip_allowed; -- bool kernel_irqchip_required; -- bool kernel_irqchip_split; - char *dtb; - char *dumpdtb; - int phandle_start; --- -2.19.1 - -Il sab 28 dic 2019, 09:48 Xiaoyao Li < -address@hidden -> ha scritto: -Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an -accelerator property") moves kernel_irqchip property from "-machine" to -"-accel kvm", but it forgets to set the default value of -kernel_irqchip_allowed and kernel_irqchip_split. -Also cleaning up the three useless members (kernel_irqchip_allowed, -kernel_irqchip_required, kernel_irqchip_split) in struct MachineState. -Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an accelerator property") -Signed-off-by: Xiaoyao Li < -address@hidden -> -Please also add a Reported-by line for Vitaly Kuznetsov. 
---- - accel/kvm/kvm-all.c | 3 +++ - include/hw/boards.h | 3 --- - 2 files changed, 3 insertions(+), 3 deletions(-) -diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c -index b2f1a5bcb5ef..40f74094f8d3 100644 ---- a/accel/kvm/kvm-all.c -+++ b/accel/kvm/kvm-all.c -@@ -3044,8 +3044,11 @@ bool kvm_kernel_irqchip_split(void) - static void kvm_accel_instance_init(Object *obj) - { -   KVMState *s = KVM_STATE(obj); -+  MachineClass *mc = MACHINE_GET_CLASS(current_machine); -   s->kvm_shadow_mem = -1; -+  s->kernel_irqchip_allowed = true; -+  s->kernel_irqchip_split = mc->default_kernel_irqchip_split; -Can you initialize this from the init_machine method instead of assuming that current_machine has been initialized earlier? -Thanks for the quick fix! -Paolo - } - static void kvm_accel_class_init(ObjectClass *oc, void *data) -diff --git a/include/hw/boards.h b/include/hw/boards.h -index 61f8bb8e5a42..fb1b43d5b972 100644 ---- a/include/hw/boards.h -+++ b/include/hw/boards.h -@@ -271,9 +271,6 @@ struct MachineState { -   /*< public >*/ --  bool kernel_irqchip_allowed; --  bool kernel_irqchip_required; --  bool kernel_irqchip_split; -   char *dtb; -   char *dumpdtb; -   int phandle_start; --- -2.19.1 - -On Sat, 2019-12-28 at 10:02 +0000, Paolo Bonzini wrote: -> -> -> -Il sab 28 dic 2019, 09:48 Xiaoyao Li <address@hidden> ha scritto: -> -> Commit 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an -> -> accelerator property") moves kernel_irqchip property from "-machine" to -> -> "-accel kvm", but it forgets to set the default value of -> -> kernel_irqchip_allowed and kernel_irqchip_split. -> -> -> -> Also cleaning up the three useless members (kernel_irqchip_allowed, -> -> kernel_irqchip_required, kernel_irqchip_split) in struct MachineState. -> -> -> -> Fixes: 11bc4a13d1f4 ("kvm: convert "-machine kernel_irqchip" to an -> -> accelerator property") -> -> Signed-off-by: Xiaoyao Li <address@hidden> -> -> -Please also add a Reported-by line for Vitaly Kuznetsov. 
-Sure. - -> -> --- -> -> accel/kvm/kvm-all.c | 3 +++ -> -> include/hw/boards.h | 3 --- -> -> 2 files changed, 3 insertions(+), 3 deletions(-) -> -> -> -> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c -> -> index b2f1a5bcb5ef..40f74094f8d3 100644 -> -> --- a/accel/kvm/kvm-all.c -> -> +++ b/accel/kvm/kvm-all.c -> -> @@ -3044,8 +3044,11 @@ bool kvm_kernel_irqchip_split(void) -> -> static void kvm_accel_instance_init(Object *obj) -> -> { -> -> KVMState *s = KVM_STATE(obj); -> -> + MachineClass *mc = MACHINE_GET_CLASS(current_machine); -> -> -> -> s->kvm_shadow_mem = -1; -> -> + s->kernel_irqchip_allowed = true; -> -> + s->kernel_irqchip_split = mc->default_kernel_irqchip_split; -> -> -Can you initialize this from the init_machine method instead of assuming that -> -current_machine has been initialized earlier? -OK, will do it in v2. - -> -Thanks for the quick fix! -BTW, it seems that this patch makes kernel_irqchip default on to workaround the -bug. -However, when explicitly configuring kernel_irqchip=off, guest still fails -booting due to "KVM: failed to send PV IPI: -95" with a latest upstream kernel -ubuntu guest. Any idea about this? - -> -Paolo -> -> } -> -> -> -> static void kvm_accel_class_init(ObjectClass *oc, void *data) -> -> diff --git a/include/hw/boards.h b/include/hw/boards.h -> -> index 61f8bb8e5a42..fb1b43d5b972 100644 -> -> --- a/include/hw/boards.h -> -> +++ b/include/hw/boards.h -> -> @@ -271,9 +271,6 @@ struct MachineState { -> -> -> -> /*< public >*/ -> -> -> -> - bool kernel_irqchip_allowed; -> -> - bool kernel_irqchip_required; -> -> - bool kernel_irqchip_split; -> -> char *dtb; -> -> char *dumpdtb; -> -> int phandle_start; - -Il sab 28 dic 2019, 10:24 Xiaoyao Li < -address@hidden -> ha scritto: -BTW, it seems that this patch makes kernel_irqchip default on to workaround the -bug. 
-However, when explicitly configuring kernel_irqchip=off, guest still fails -booting due to "KVM: failed to send PV IPI: -95" with a latest upstream kernel -ubuntu guest. Any idea about this? -We need to clear the PV IPI feature for userspace irqchip. Are you using -cpu host by chance? -Paolo -> Paolo -> > } -> > -> > static void kvm_accel_class_init(ObjectClass *oc, void *data) -> > diff --git a/include/hw/boards.h b/include/hw/boards.h -> > index 61f8bb8e5a42..fb1b43d5b972 100644 -> > --- a/include/hw/boards.h -> > +++ b/include/hw/boards.h -> > @@ -271,9 +271,6 @@ struct MachineState { -> > -> >   /*< public >*/ -> > -> > -  bool kernel_irqchip_allowed; -> > -  bool kernel_irqchip_required; -> > -  bool kernel_irqchip_split; -> >   char *dtb; -> >   char *dumpdtb; -> >   int phandle_start; - -On Sat, 2019-12-28 at 10:57 +0000, Paolo Bonzini wrote: -> -> -> -Il sab 28 dic 2019, 10:24 Xiaoyao Li <address@hidden> ha scritto: -> -> BTW, it seems that this patch makes kernel_irqchip default on to workaround -> -> the -> -> bug. -> -> However, when explicitly configuring kernel_irqchip=off, guest still fails -> -> booting due to "KVM: failed to send PV IPI: -95" with a latest upstream -> -> kernel -> -> ubuntu guest. Any idea about this? -> -> -We need to clear the PV IPI feature for userspace irqchip. Are you using -cpu -> -host by chance? -Yes, I used -cpu host. - -After using "-cpu host,-kvm-pv-ipi" with kernel_irqchip=off, it can boot -successfully. 
- -> -Paolo -> -> -> > Paolo -> -> > > } -> -> > > -> -> > > static void kvm_accel_class_init(ObjectClass *oc, void *data) -> -> > > diff --git a/include/hw/boards.h b/include/hw/boards.h -> -> > > index 61f8bb8e5a42..fb1b43d5b972 100644 -> -> > > --- a/include/hw/boards.h -> -> > > +++ b/include/hw/boards.h -> -> > > @@ -271,9 +271,6 @@ struct MachineState { -> -> > > -> -> > > /*< public >*/ -> -> > > -> -> > > - bool kernel_irqchip_allowed; -> -> > > - bool kernel_irqchip_required; -> -> > > - bool kernel_irqchip_split; -> -> > > char *dtb; -> -> > > char *dumpdtb; -> -> > > int phandle_start; -> -> - diff --git a/results/classifier/008/all/80570214 b/results/classifier/008/all/80570214 deleted file mode 100644 index b531fb673..000000000 --- a/results/classifier/008/all/80570214 +++ /dev/null @@ -1,410 +0,0 @@ -vnc: 0.983 -permissions: 0.983 -debug: 0.979 -semantic: 0.978 -other: 0.978 -graphic: 0.978 -performance: 0.976 -PID: 0.976 -network: 0.975 -socket: 0.975 -device: 0.974 -KVM: 0.971 -boot: 0.969 -files: 0.961 - -[Qemu-devel] [vhost-user BUG ?] QEMU process segfault when shutdown or reboot with vhost-user - -Hi, - -We catch a segfault in our project. - -Qemu version is 2.3.0 - -The Stack backtrace is: -(gdb) bt -#0 0x0000000000000000 in ?? 
() -#1 0x00007f7ad9280b2f in qemu_deliver_packet (sender=<optimized out>, flags=<optimized -out>, data=<optimized out>, size=100, opaque= - 0x7f7ad9d6db10) at net/net.c:510 -#2 0x00007f7ad92831fa in qemu_net_queue_deliver (size=<optimized out>, data=<optimized -out>, flags=<optimized out>, - sender=<optimized out>, queue=<optimized out>) at net/queue.c:157 -#3 qemu_net_queue_flush (queue=0x7f7ad9d39630) at net/queue.c:254 -#4 0x00007f7ad9280dac in qemu_flush_or_purge_queued_packets -(nc=0x7f7ad9d6db10, purge=true) at net/net.c:539 -#5 0x00007f7ad9280e76 in net_vm_change_state_handler (opaque=<optimized out>, -running=<optimized out>, state=100) at net/net.c:1214 -#6 0x00007f7ad915612f in vm_state_notify (running=0, state=RUN_STATE_SHUTDOWN) -at vl.c:1820 -#7 0x00007f7ad906db1a in do_vm_stop (state=<optimized out>) at -/usr/src/packages/BUILD/qemu-kvm-2.3.0/cpus.c:631 -#8 vm_stop (state=RUN_STATE_SHUTDOWN) at -/usr/src/packages/BUILD/qemu-kvm-2.3.0/cpus.c:1325 -#9 0x00007f7ad915e4a2 in main_loop_should_exit () at vl.c:2080 -#10 main_loop () at vl.c:2131 -#11 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at -vl.c:4721 -(gdb) p *(NetClientState *)0x7f7ad9d6db10 -$1 = {info = 0x7f7ad9824520, link_down = 0, next = {tqe_next = 0x7f7ad0f06d10, -tqe_prev = 0x7f7ad98b1cf0}, peer = 0x7f7ad0f06d10, - incoming_queue = 0x7f7ad9d39630, model = 0x7f7ad9d39590 "vhost_user", name = -0x7f7ad9d39570 "hostnet0", info_str = - "vhost-user to charnet0", '\000' <repeats 233 times>, receive_disabled = 0, -destructor = - 0x7f7ad92821f0 <qemu_net_client_destructor>, queue_index = 0, -rxfilter_notify_enabled = 0} -(gdb) p *(NetClientInfo *)0x7f7ad9824520 -$2 = {type = NET_CLIENT_OPTIONS_KIND_VHOST_USER, size = 360, receive = 0, -receive_raw = 0, receive_iov = 0, can_receive = 0, cleanup = - 0x7f7ad9288850 <vhost_user_cleanup>, link_status_changed = 0, -query_rx_filter = 0, poll = 0, has_ufo = - 0x7f7ad92886d0 <vhost_user_has_ufo>, has_vnet_hdr = 0x7f7ad9288670 
-<vhost_user_has_vnet_hdr>, has_vnet_hdr_len = 0, - using_vnet_hdr = 0, set_offload = 0, set_vnet_hdr_len = 0} -(gdb) - -The corresponding codes where gdb reports error are: (We have added some codes -in net.c) -ssize_t qemu_deliver_packet(NetClientState *sender, - unsigned flags, - const uint8_t *data, - size_t size, - void *opaque) -{ - NetClientState *nc = opaque; - ssize_t ret; - - if (nc->link_down) { - return size; - } - - if (nc->receive_disabled) { - return 0; - } - - if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { - ret = nc->info->receive_raw(nc, data, size); - } else { - ret = nc->info->receive(nc, data, size); ----> Here is 510 line - } - -I'm not quite familiar with vhost-user, but for vhost-user, these two callback -functions seem to be always NULL, -Why we can come here ? -Is it an error to add VM state change handler for vhost-user ? - -Thanks, -zhanghailiang - -Hi - -On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang -<address@hidden> wrote: -> -The corresponding codes where gdb reports error are: (We have added some -> -codes in net.c) -Can you reproduce with unmodified qemu? Could you give instructions to do so? - -> -ssize_t qemu_deliver_packet(NetClientState *sender, -> -unsigned flags, -> -const uint8_t *data, -> -size_t size, -> -void *opaque) -> -{ -> -NetClientState *nc = opaque; -> -ssize_t ret; -> -> -if (nc->link_down) { -> -return size; -> -} -> -> -if (nc->receive_disabled) { -> -return 0; -> -} -> -> -if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { -> -ret = nc->info->receive_raw(nc, data, size); -> -} else { -> -ret = nc->info->receive(nc, data, size); ----> Here is 510 line -> -} -> -> -I'm not quite familiar with vhost-user, but for vhost-user, these two -> -callback functions seem to be always NULL, -> -Why we can come here ? 
-You should not come here, vhost-user has nc->receive_disabled (it -changes in 2.5) - --- -Marc-André Lureau - -On 2015/11/3 22:54, Marc-André Lureau wrote: -Hi - -On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang -<address@hidden> wrote: -The corresponding codes where gdb reports error are: (We have added some -codes in net.c) -Can you reproduce with unmodified qemu? Could you give instructions to do so? -OK, i will try to do it. There is nothing special, we run iperf tool in VM, -and then shutdown or reboot it. There is change you can catch segfault. -ssize_t qemu_deliver_packet(NetClientState *sender, - unsigned flags, - const uint8_t *data, - size_t size, - void *opaque) -{ - NetClientState *nc = opaque; - ssize_t ret; - - if (nc->link_down) { - return size; - } - - if (nc->receive_disabled) { - return 0; - } - - if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { - ret = nc->info->receive_raw(nc, data, size); - } else { - ret = nc->info->receive(nc, data, size); ----> Here is 510 line - } - -I'm not quite familiar with vhost-user, but for vhost-user, these two -callback functions seem to be always NULL, -Why we can come here ? -You should not come here, vhost-user has nc->receive_disabled (it -changes in 2.5) -I have looked at the newest codes, i think we can still have chance to -come here, since we will change nc->receive_disable to false temporarily in -qemu_flush_or_purge_queued_packets(), there is no difference between 2.3 and 2.5 -for this. -Besides, is it possible for !QTAILQ_EMPTY(&queue->packets) to be true -in qemu_net_queue_flush() for vhost-user ? - -i will try to reproduce it by using newest qemu. 
- -Thanks, -zhanghailiang - -On 11/04/2015 10:24 AM, zhanghailiang wrote: -> -On 2015/11/3 22:54, Marc-André Lureau wrote: -> -> Hi -> -> -> -> On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang -> -> <address@hidden> wrote: -> ->> The corresponding codes where gdb reports error are: (We have added -> ->> some -> ->> codes in net.c) -> -> -> -> Can you reproduce with unmodified qemu? Could you give instructions -> -> to do so? -> -> -> -> -OK, i will try to do it. There is nothing special, we run iperf tool -> -in VM, -> -and then shutdown or reboot it. There is change you can catch segfault. -> -> ->> ssize_t qemu_deliver_packet(NetClientState *sender, -> ->> unsigned flags, -> ->> const uint8_t *data, -> ->> size_t size, -> ->> void *opaque) -> ->> { -> ->> NetClientState *nc = opaque; -> ->> ssize_t ret; -> ->> -> ->> if (nc->link_down) { -> ->> return size; -> ->> } -> ->> -> ->> if (nc->receive_disabled) { -> ->> return 0; -> ->> } -> ->> -> ->> if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { -> ->> ret = nc->info->receive_raw(nc, data, size); -> ->> } else { -> ->> ret = nc->info->receive(nc, data, size); ----> Here is -> ->> 510 line -> ->> } -> ->> -> ->> I'm not quite familiar with vhost-user, but for vhost-user, these two -> ->> callback functions seem to be always NULL, -> ->> Why we can come here ? -> -> -> -> You should not come here, vhost-user has nc->receive_disabled (it -> -> changes in 2.5) -> -> -> -> -I have looked at the newest codes, i think we can still have chance to -> -come here, since we will change nc->receive_disable to false -> -temporarily in -> -qemu_flush_or_purge_queued_packets(), there is no difference between -> -2.3 and 2.5 -> -for this. -> -Besides, is it possible for !QTAILQ_EMPTY(&queue->packets) to be true -> -in qemu_net_queue_flush() for vhost-user ? -The only thing I can image is self announcing. Are you trying to do -migration? 2.5 only support sending rarp through this. 
- -And it's better to have a breakpoint to see why a packet was queued for -vhost-user. The stack trace may also help in this case. - -> -> -i will try to reproduce it by using newest qemu. -> -> -Thanks, -> -zhanghailiang -> - -On 2015/11/4 11:19, Jason Wang wrote: -On 11/04/2015 10:24 AM, zhanghailiang wrote: -On 2015/11/3 22:54, Marc-André Lureau wrote: -Hi - -On Tue, Nov 3, 2015 at 2:01 PM, zhanghailiang -<address@hidden> wrote: -The corresponding codes where gdb reports error are: (We have added -some -codes in net.c) -Can you reproduce with unmodified qemu? Could you give instructions -to do so? -OK, i will try to do it. There is nothing special, we run iperf tool -in VM, -and then shutdown or reboot it. There is change you can catch segfault. -ssize_t qemu_deliver_packet(NetClientState *sender, - unsigned flags, - const uint8_t *data, - size_t size, - void *opaque) -{ - NetClientState *nc = opaque; - ssize_t ret; - - if (nc->link_down) { - return size; - } - - if (nc->receive_disabled) { - return 0; - } - - if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) { - ret = nc->info->receive_raw(nc, data, size); - } else { - ret = nc->info->receive(nc, data, size); ----> Here is -510 line - } - -I'm not quite familiar with vhost-user, but for vhost-user, these two -callback functions seem to be always NULL, -Why we can come here ? -You should not come here, vhost-user has nc->receive_disabled (it -changes in 2.5) -I have looked at the newest codes, i think we can still have chance to -come here, since we will change nc->receive_disable to false -temporarily in -qemu_flush_or_purge_queued_packets(), there is no difference between -2.3 and 2.5 -for this. -Besides, is it possible for !QTAILQ_EMPTY(&queue->packets) to be true -in qemu_net_queue_flush() for vhost-user ? -The only thing I can image is self announcing. Are you trying to do -migration? 2.5 only support sending rarp through this. 
-Hmm, it's not triggered by migration, For qemu-2.5, IMHO, it doesn't have such -problem, -since the callback function 'receive' is not NULL. It is vhost_user_receive(). -And it's better to have a breakpoint to see why a packet was queued for -vhost-user. The stack trace may also help in this case. -OK, i'm trying to reproduce it. - -Thanks, -zhanghailiang -i will try to reproduce it by using newest qemu. - -Thanks, -zhanghailiang -. - diff --git a/results/classifier/008/all/88225572 b/results/classifier/008/all/88225572 deleted file mode 100644 index 292ea66b8..000000000 --- a/results/classifier/008/all/88225572 +++ /dev/null @@ -1,2910 +0,0 @@ -permissions: 0.992 -other: 0.987 -debug: 0.986 -PID: 0.984 -semantic: 0.976 -graphic: 0.974 -device: 0.970 -boot: 0.969 -performance: 0.965 -vnc: 0.958 -files: 0.957 -socket: 0.955 -network: 0.950 -KVM: 0.924 - -[BUG qemu 4.0] segfault when unplugging virtio-blk-pci device - -Hi, - -I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I -think it's because io completion hits use-after-free when device is -already gone. Is this a known bug that has been fixed? (I went through -the git log but didn't find anything obvious). - -gdb backtrace is: - -Core was generated by `/usr/local/libexec/qemu-kvm -name -sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. -Program terminated with signal 11, Segmentation fault. 
-#0 object_get_class (obj=obj@entry=0x0) at -/usr/src/debug/qemu-4.0/qom/object.c:903 -903 return obj->class; -(gdb) bt -#0 object_get_class (obj=obj@entry=0x0) at -/usr/src/debug/qemu-4.0/qom/object.c:903 -#1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, -  vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 -#2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( -  opaque=0x558a2f2fd420, ret=0) -  at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 -#3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) -  at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 -#4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, -  i1=<optimized out>) at /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 -#5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 -#6  0x00007fff9ed75780 in ?? () -#7  0x0000000000000000 in ?? () - -It seems like qemu was completing a discard/write_zero request, but -parent BusState was already freed & set to NULL. - -Do we need to drain all pending request before unrealizing virtio-blk -device? Like the following patch proposed? -https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html -If more info is needed, please let me know. - -Thanks, -Eryu - -On Tue, 31 Dec 2019 18:34:34 +0800 -Eryu Guan <address@hidden> wrote: - -> -Hi, -> -> -I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I -> -think it's because io completion hits use-after-free when device is -> -already gone. Is this a known bug that has been fixed? (I went through -> -the git log but didn't find anything obvious). -> -> -gdb backtrace is: -> -> -Core was generated by `/usr/local/libexec/qemu-kvm -name -> -sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. -> -Program terminated with signal 11, Segmentation fault. 
-> -#0 object_get_class (obj=obj@entry=0x0) at -> -/usr/src/debug/qemu-4.0/qom/object.c:903 -> -903 return obj->class; -> -(gdb) bt -> -#0 object_get_class (obj=obj@entry=0x0) at -> -/usr/src/debug/qemu-4.0/qom/object.c:903 -> -#1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, -> -  vector=<optimized out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 -> -#2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( -> -  opaque=0x558a2f2fd420, ret=0) -> -  at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 -> -#3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) -> -  at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 -> -#4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, -> -  i1=<optimized out>) at -> -/usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 -> -#5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 -> -#6  0x00007fff9ed75780 in ?? () -> -#7  0x0000000000000000 in ?? () -> -> -It seems like qemu was completing a discard/write_zero request, but -> -parent BusState was already freed & set to NULL. -> -> -Do we need to drain all pending request before unrealizing virtio-blk -> -device? Like the following patch proposed? -> -> -https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html -> -> -If more info is needed, please let me know. -may be this will help: -https://patchwork.kernel.org/patch/11213047/ -> -> -Thanks, -> -Eryu -> - -On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: -> -On Tue, 31 Dec 2019 18:34:34 +0800 -> -Eryu Guan <address@hidden> wrote: -> -> -> Hi, -> -> -> -> I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I -> -> think it's because io completion hits use-after-free when device is -> -> already gone. Is this a known bug that has been fixed? (I went through -> -> the git log but didn't find anything obvious). 
-> -> -> -> gdb backtrace is: -> -> -> -> Core was generated by `/usr/local/libexec/qemu-kvm -name -> -> sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. -> -> Program terminated with signal 11, Segmentation fault. -> -> #0 object_get_class (obj=obj@entry=0x0) at -> -> /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> 903 return obj->class; -> -> (gdb) bt -> -> #0 object_get_class (obj=obj@entry=0x0) at -> -> /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, -> ->   vector=<optimized out>) at -> -> /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 -> -> #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( -> ->   opaque=0x558a2f2fd420, ret=0) -> ->   at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 -> -> #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) -> ->   at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 -> -> #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, -> ->   i1=<optimized out>) at -> -> /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 -> -> #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 -> -> #6  0x00007fff9ed75780 in ?? () -> -> #7  0x0000000000000000 in ?? () -> -> -> -> It seems like qemu was completing a discard/write_zero request, but -> -> parent BusState was already freed & set to NULL. -> -> -> -> Do we need to drain all pending request before unrealizing virtio-blk -> -> device? Like the following patch proposed? -> -> -> -> -https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html -> -> -> -> If more info is needed, please let me know. -> -> -may be this will help: -https://patchwork.kernel.org/patch/11213047/ -Yeah, this looks promising! I'll try it out (though it's a one-time -crash for me). Thanks! 
- -Eryu - -On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: -> -On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: -> -> On Tue, 31 Dec 2019 18:34:34 +0800 -> -> Eryu Guan <address@hidden> wrote: -> -> -> -> > Hi, -> -> > -> -> > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I -> -> > think it's because io completion hits use-after-free when device is -> -> > already gone. Is this a known bug that has been fixed? (I went through -> -> > the git log but didn't find anything obvious). -> -> > -> -> > gdb backtrace is: -> -> > -> -> > Core was generated by `/usr/local/libexec/qemu-kvm -name -> -> > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. -> -> > Program terminated with signal 11, Segmentation fault. -> -> > #0 object_get_class (obj=obj@entry=0x0) at -> -> > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > 903 return obj->class; -> -> > (gdb) bt -> -> > #0 object_get_class (obj=obj@entry=0x0) at -> -> > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > #1  0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, -> -> >   vector=<optimized out>) at -> -> > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 -> -> > #2  0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( -> -> >   opaque=0x558a2f2fd420, ret=0) -> -> >   at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 -> -> > #3  0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) -> -> >   at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 -> -> > #4  0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, -> -> >   i1=<optimized out>) at -> -> > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 -> -> > #5  0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 -> -> > #6  0x00007fff9ed75780 in ?? () -> -> > #7  0x0000000000000000 in ?? () -> -> > -> -> > It seems like qemu was completing a discard/write_zero request, but -> -> > parent BusState was already freed & set to NULL. 
-> -> > -> -> > Do we need to drain all pending request before unrealizing virtio-blk -> -> > device? Like the following patch proposed? -> -> > -> -> > -https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html -> -> > -> -> > If more info is needed, please let me know. -> -> -> -> may be this will help: -https://patchwork.kernel.org/patch/11213047/ -> -> -Yeah, this looks promising! I'll try it out (though it's a one-time -> -crash for me). Thanks! -After applying this patch, I don't see the original segfaut and -backtrace, but I see this crash - -[Thread debugging using libthread_db enabled] -Using host libthread_db library "/lib64/libthread_db.so.1". -Core was generated by `/usr/local/libexec/qemu-kvm -name -sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. -Program terminated with signal 11, Segmentation fault. -#0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, -addr=0, val=<optimized out>, size=<optimized out>) at -/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -1324 VirtIOPCIProxy *proxy = -VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); -Missing separate debuginfos, use: debuginfo-install -glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 -libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 -libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 -pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 -(gdb) bt -#0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, -addr=0, val=<optimized out>, size=<optimized out>) at -/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -#1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, -addr=<optimized out>, value=<optimized out>, size=<optimized out>, -shift=<optimized out>, mask=<optimized out>, attrs=...) 
at -/usr/src/debug/qemu-4.0/memory.c:502 -#2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, -value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, access_size_min=<optimized -out>, access_size_max=<optimized out>, access_fn=0x561216835ac0 -<memory_region_write_accessor>, mr=0x56121846d340, attrs=...) - at /usr/src/debug/qemu-4.0/memory.c:568 -#3 0x0000561216837c66 in memory_region_dispatch_write -(mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, -attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 -#4 0x00005612167e036f in flatview_write_continue (fv=fv@entry=0x56121852edd0, -addr=addr@entry=841813602304, attrs=..., buf=buf@entry=0x7fce7dd97028 <Address -0x7fce7dd97028 out of bounds>, len=len@entry=2, addr1=<optimized out>, -l=<optimized out>, mr=0x56121846d340) - at /usr/src/debug/qemu-4.0/exec.c:3279 -#5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, addr=841813602304, -attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, len=2) at -/usr/src/debug/qemu-4.0/exec.c:3318 -#6 0x00005612167e4a1b in address_space_write (as=<optimized out>, -addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at -/usr/src/debug/qemu-4.0/exec.c:3408 -#7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, addr=<optimized -out>, attrs=..., attrs@entry=..., buf=buf@entry=0x7fce7dd97028 <Address -0x7fce7dd97028 out of bounds>, len=<optimized out>, is_write=<optimized out>) -at /usr/src/debug/qemu-4.0/exec.c:3419 -#8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at -/usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 -#9 0x000056121682255e in qemu_kvm_cpu_thread_fn (arg=arg@entry=0x56121849aa00) -at /usr/src/debug/qemu-4.0/cpus.c:1281 -#10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at -/usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 -#11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 -#12 0x00007fce7bc1ef1d in clone () from 
/lib64/libc.so.6 - -And I searched and found -https://bugzilla.redhat.com/show_bug.cgi?id=1706759 -, which has the same -backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add -blk_drain() to virtio_blk_device_unrealize()") is to fix this particular -bug. - -But I can still hit the bug even after applying the commit. Do I miss -anything? - -Thanks, -Eryu -> -Eryu - -On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: -> -> -On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: -> -> On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: -> -> > On Tue, 31 Dec 2019 18:34:34 +0800 -> -> > Eryu Guan <address@hidden> wrote: -> -> > -> -> > > Hi, -> -> > > -> -> > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I -> -> > > think it's because io completion hits use-after-free when device is -> -> > > already gone. Is this a known bug that has been fixed? (I went through -> -> > > the git log but didn't find anything obvious). -> -> > > -> -> > > gdb backtrace is: -> -> > > -> -> > > Core was generated by `/usr/local/libexec/qemu-kvm -name -> -> > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. -> -> > > Program terminated with signal 11, Segmentation fault. 
-> -> > > #0 object_get_class (obj=obj@entry=0x0) at -> -> > > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > > 903 return obj->class; -> -> > > (gdb) bt -> -> > > #0 object_get_class (obj=obj@entry=0x0) at -> -> > > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > > #1 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, -> -> > > vector=<optimized out>) at -> -> > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 -> -> > > #2 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( -> -> > > opaque=0x558a2f2fd420, ret=0) -> -> > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 -> -> > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) -> -> > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 -> -> > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, -> -> > > i1=<optimized out>) at -> -> > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 -> -> > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 -> -> > > #6 0x00007fff9ed75780 in ?? () -> -> > > #7 0x0000000000000000 in ?? () -> -> > > -> -> > > It seems like qemu was completing a discard/write_zero request, but -> -> > > parent BusState was already freed & set to NULL. -> -> > > -> -> > > Do we need to drain all pending request before unrealizing virtio-blk -> -> > > device? Like the following patch proposed? -> -> > > -> -> > > -https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html -> -> > > -> -> > > If more info is needed, please let me know. -> -> > -> -> > may be this will help: -https://patchwork.kernel.org/patch/11213047/ -> -> -> -> Yeah, this looks promising! I'll try it out (though it's a one-time -> -> crash for me). Thanks! -> -> -After applying this patch, I don't see the original segfaut and -> -backtrace, but I see this crash -> -> -[Thread debugging using libthread_db enabled] -> -Using host libthread_db library "/lib64/libthread_db.so.1". 
-> -Core was generated by `/usr/local/libexec/qemu-kvm -name -> -sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. -> -Program terminated with signal 11, Segmentation fault. -> -#0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, -> -addr=0, val=<optimized out>, size=<optimized out>) at -> -/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -> -1324 VirtIOPCIProxy *proxy = -> -VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); -> -Missing separate debuginfos, use: debuginfo-install -> -glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 -> -libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 -> -libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 -> -pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 -> -(gdb) bt -> -#0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, -> -addr=0, val=<optimized out>, size=<optimized out>) at -> -/usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -> -#1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, -> -addr=<optimized out>, value=<optimized out>, size=<optimized out>, -> -shift=<optimized out>, mask=<optimized out>, attrs=...) at -> -/usr/src/debug/qemu-4.0/memory.c:502 -> -#2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, -> -value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, -> -access_size_min=<optimized out>, access_size_max=<optimized out>, -> -access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, -> -attrs=...) -> -at /usr/src/debug/qemu-4.0/memory.c:568 -> -#3 0x0000561216837c66 in memory_region_dispatch_write -> -(mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, -> -attrs=attrs@entry=...) 
at /usr/src/debug/qemu-4.0/memory.c:1503 -> -#4 0x00005612167e036f in flatview_write_continue -> -(fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., -> -buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, -> -len=len@entry=2, addr1=<optimized out>, l=<optimized out>, mr=0x56121846d340) -> -at /usr/src/debug/qemu-4.0/exec.c:3279 -> -#5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, -> -addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 out -> -of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318 -> -#6 0x00005612167e4a1b in address_space_write (as=<optimized out>, -> -addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) at -> -/usr/src/debug/qemu-4.0/exec.c:3408 -> -#7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, -> -addr=<optimized out>, attrs=..., attrs@entry=..., -> -buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, -> -len=<optimized out>, is_write=<optimized out>) at -> -/usr/src/debug/qemu-4.0/exec.c:3419 -> -#8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at -> -/usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 -> -#9 0x000056121682255e in qemu_kvm_cpu_thread_fn -> -(arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 -> -#10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at -> -/usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 -> -#11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 -> -#12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 -> -> -And I searched and found -> -https://bugzilla.redhat.com/show_bug.cgi?id=1706759 -, which has the same -> -backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add -> -blk_drain() to virtio_blk_device_unrealize()") is to fix this particular -> -bug. -> -> -But I can still hit the bug even after applying the commit. Do I miss -> -anything? 
-Hi Eryu, -This backtrace seems to be caused by this bug (there were two bugs in -1706759): -https://bugzilla.redhat.com/show_bug.cgi?id=1708480 -Although the solution hasn't been tested on virtio-blk yet, you may -want to apply this patch: -https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html -Let me know if this works. - -Best regards, Julia Suvorova. - -On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: -> -On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: -> -> -> -> On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: -> -> > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: -> -> > > On Tue, 31 Dec 2019 18:34:34 +0800 -> -> > > Eryu Guan <address@hidden> wrote: -> -> > > -> -> > > > Hi, -> -> > > > -> -> > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I -> -> > > > think it's because io completion hits use-after-free when device is -> -> > > > already gone. Is this a known bug that has been fixed? (I went through -> -> > > > the git log but didn't find anything obvious). -> -> > > > -> -> > > > gdb backtrace is: -> -> > > > -> -> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name -> -> > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. -> -> > > > Program terminated with signal 11, Segmentation fault. 
-> -> > > > #0 object_get_class (obj=obj@entry=0x0) at -> -> > > > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > > > 903 return obj->class; -> -> > > > (gdb) bt -> -> > > > #0 object_get_class (obj=obj@entry=0x0) at -> -> > > > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > > > #1 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, -> -> > > > vector=<optimized out>) at -> -> > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 -> -> > > > #2 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( -> -> > > > opaque=0x558a2f2fd420, ret=0) -> -> > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 -> -> > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) -> -> > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 -> -> > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, -> -> > > > i1=<optimized out>) at -> -> > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 -> -> > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 -> -> > > > #6 0x00007fff9ed75780 in ?? () -> -> > > > #7 0x0000000000000000 in ?? () -> -> > > > -> -> > > > It seems like qemu was completing a discard/write_zero request, but -> -> > > > parent BusState was already freed & set to NULL. -> -> > > > -> -> > > > Do we need to drain all pending request before unrealizing virtio-blk -> -> > > > device? Like the following patch proposed? -> -> > > > -> -> > > > -https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html -> -> > > > -> -> > > > If more info is needed, please let me know. -> -> > > -> -> > > may be this will help: -https://patchwork.kernel.org/patch/11213047/ -> -> > -> -> > Yeah, this looks promising! I'll try it out (though it's a one-time -> -> > crash for me). Thanks! 
-> -> -> -> After applying this patch, I don't see the original segfaut and -> -> backtrace, but I see this crash -> -> -> -> [Thread debugging using libthread_db enabled] -> -> Using host libthread_db library "/lib64/libthread_db.so.1". -> -> Core was generated by `/usr/local/libexec/qemu-kvm -name -> -> sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. -> -> Program terminated with signal 11, Segmentation fault. -> -> #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, -> -> addr=0, val=<optimized out>, size=<optimized out>) at -> -> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -> -> 1324 VirtIOPCIProxy *proxy = -> -> VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); -> -> Missing separate debuginfos, use: debuginfo-install -> -> glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 -> -> libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 -> -> libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 -> -> pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 -> -> (gdb) bt -> -> #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, -> -> addr=0, val=<optimized out>, size=<optimized out>) at -> -> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -> -> #1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, -> -> addr=<optimized out>, value=<optimized out>, size=<optimized out>, -> -> shift=<optimized out>, mask=<optimized out>, attrs=...) at -> -> /usr/src/debug/qemu-4.0/memory.c:502 -> -> #2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, -> -> value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, -> -> access_size_min=<optimized out>, access_size_max=<optimized out>, -> -> access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, -> -> attrs=...) 
-> -> at /usr/src/debug/qemu-4.0/memory.c:568 -> -> #3 0x0000561216837c66 in memory_region_dispatch_write -> -> (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, -> -> attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 -> -> #4 0x00005612167e036f in flatview_write_continue -> -> (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., -> -> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, -> -> len=len@entry=2, addr1=<optimized out>, l=<optimized out>, -> -> mr=0x56121846d340) -> -> at /usr/src/debug/qemu-4.0/exec.c:3279 -> -> #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, -> -> addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 -> -> out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318 -> -> #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, -> -> addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) -> -> at /usr/src/debug/qemu-4.0/exec.c:3408 -> -> #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, -> -> addr=<optimized out>, attrs=..., attrs@entry=..., -> -> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, -> -> len=<optimized out>, is_write=<optimized out>) at -> -> /usr/src/debug/qemu-4.0/exec.c:3419 -> -> #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at -> -> /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 -> -> #9 0x000056121682255e in qemu_kvm_cpu_thread_fn -> -> (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 -> -> #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at -> -> /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 -> -> #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 -> -> #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 -> -> -> -> And I searched and found -> -> -https://bugzilla.redhat.com/show_bug.cgi?id=1706759 -, which has the same -> -> backtrace as above, and it seems commit 
7bfde688fb1b ("virtio-blk: Add -> -> blk_drain() to virtio_blk_device_unrealize()") is to fix this particular -> -> bug. -> -> -> -> But I can still hit the bug even after applying the commit. Do I miss -> -> anything? -> -> -Hi Eryu, -> -This backtrace seems to be caused by this bug (there were two bugs in -> -1706759): -https://bugzilla.redhat.com/show_bug.cgi?id=1708480 -> -Although the solution hasn't been tested on virtio-blk yet, you may -> -want to apply this patch: -> -https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html -> -Let me know if this works. -Will try it out, thanks a lot! - -Eryu - -On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: -> -On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: -> -> -> -> On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: -> -> > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: -> -> > > On Tue, 31 Dec 2019 18:34:34 +0800 -> -> > > Eryu Guan <address@hidden> wrote: -> -> > > -> -> > > > Hi, -> -> > > > -> -> > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, I -> -> > > > think it's because io completion hits use-after-free when device is -> -> > > > already gone. Is this a known bug that has been fixed? (I went through -> -> > > > the git log but didn't find anything obvious). -> -> > > > -> -> > > > gdb backtrace is: -> -> > > > -> -> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name -> -> > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. -> -> > > > Program terminated with signal 11, Segmentation fault. 
-> -> > > > #0 object_get_class (obj=obj@entry=0x0) at -> -> > > > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > > > 903 return obj->class; -> -> > > > (gdb) bt -> -> > > > #0 object_get_class (obj=obj@entry=0x0) at -> -> > > > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > > > #1 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, -> -> > > > vector=<optimized out>) at -> -> > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 -> -> > > > #2 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( -> -> > > > opaque=0x558a2f2fd420, ret=0) -> -> > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 -> -> > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) -> -> > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 -> -> > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, -> -> > > > i1=<optimized out>) at -> -> > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 -> -> > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 -> -> > > > #6 0x00007fff9ed75780 in ?? () -> -> > > > #7 0x0000000000000000 in ?? () -> -> > > > -> -> > > > It seems like qemu was completing a discard/write_zero request, but -> -> > > > parent BusState was already freed & set to NULL. -> -> > > > -> -> > > > Do we need to drain all pending request before unrealizing virtio-blk -> -> > > > device? Like the following patch proposed? -> -> > > > -> -> > > > -https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html -> -> > > > -> -> > > > If more info is needed, please let me know. -> -> > > -> -> > > may be this will help: -https://patchwork.kernel.org/patch/11213047/ -> -> > -> -> > Yeah, this looks promising! I'll try it out (though it's a one-time -> -> > crash for me). Thanks! 
-> -> -> -> After applying this patch, I don't see the original segfaut and -> -> backtrace, but I see this crash -> -> -> -> [Thread debugging using libthread_db enabled] -> -> Using host libthread_db library "/lib64/libthread_db.so.1". -> -> Core was generated by `/usr/local/libexec/qemu-kvm -name -> -> sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. -> -> Program terminated with signal 11, Segmentation fault. -> -> #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, -> -> addr=0, val=<optimized out>, size=<optimized out>) at -> -> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -> -> 1324 VirtIOPCIProxy *proxy = -> -> VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); -> -> Missing separate debuginfos, use: debuginfo-install -> -> glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 -> -> libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 -> -> libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 -> -> pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 -> -> (gdb) bt -> -> #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, -> -> addr=0, val=<optimized out>, size=<optimized out>) at -> -> /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -> -> #1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized out>, -> -> addr=<optimized out>, value=<optimized out>, size=<optimized out>, -> -> shift=<optimized out>, mask=<optimized out>, attrs=...) at -> -> /usr/src/debug/qemu-4.0/memory.c:502 -> -> #2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, -> -> value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, -> -> access_size_min=<optimized out>, access_size_max=<optimized out>, -> -> access_fn=0x561216835ac0 <memory_region_write_accessor>, mr=0x56121846d340, -> -> attrs=...) 
-> -> at /usr/src/debug/qemu-4.0/memory.c:568 -> -> #3 0x0000561216837c66 in memory_region_dispatch_write -> -> (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, -> -> attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 -> -> #4 0x00005612167e036f in flatview_write_continue -> -> (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., -> -> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, -> -> len=len@entry=2, addr1=<optimized out>, l=<optimized out>, -> -> mr=0x56121846d340) -> -> at /usr/src/debug/qemu-4.0/exec.c:3279 -> -> #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, -> -> addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 -> -> out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318 -> -> #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, -> -> addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized out>) -> -> at /usr/src/debug/qemu-4.0/exec.c:3408 -> -> #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, -> -> addr=<optimized out>, attrs=..., attrs@entry=..., -> -> buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, -> -> len=<optimized out>, is_write=<optimized out>) at -> -> /usr/src/debug/qemu-4.0/exec.c:3419 -> -> #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at -> -> /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 -> -> #9 0x000056121682255e in qemu_kvm_cpu_thread_fn -> -> (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 -> -> #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at -> -> /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 -> -> #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 -> -> #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 -> -> -> -> And I searched and found -> -> -https://bugzilla.redhat.com/show_bug.cgi?id=1706759 -, which has the same -> -> backtrace as above, and it seems commit 
7bfde688fb1b ("virtio-blk: Add -> -> blk_drain() to virtio_blk_device_unrealize()") is to fix this particular -> -> bug. -> -> -> -> But I can still hit the bug even after applying the commit. Do I miss -> -> anything? -> -> -Hi Eryu, -> -This backtrace seems to be caused by this bug (there were two bugs in -> -1706759): -https://bugzilla.redhat.com/show_bug.cgi?id=1708480 -> -Although the solution hasn't been tested on virtio-blk yet, you may -> -want to apply this patch: -> -https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html -> -Let me know if this works. -Unfortunately, I still see the same segfault & backtrace after applying -commit 421afd2fe8dd ("virtio: reset region cache when on queue -deletion") - -Anything I can help to debug? - -Thanks, -Eryu - -On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: -> -On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: -> -> On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: -> -> > -> -> > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: -> -> > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: -> -> > > > On Tue, 31 Dec 2019 18:34:34 +0800 -> -> > > > Eryu Guan <address@hidden> wrote: -> -> > > > -> -> > > > > Hi, -> -> > > > > -> -> > > > > I'm using qemu 4.0 and hit segfault when tearing down kata sandbox, -> -> > > > > I -> -> > > > > think it's because io completion hits use-after-free when device is -> -> > > > > already gone. Is this a known bug that has been fixed? (I went -> -> > > > > through -> -> > > > > the git log but didn't find anything obvious). -> -> > > > > -> -> > > > > gdb backtrace is: -> -> > > > > -> -> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name -> -> > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. -> -> > > > > Program terminated with signal 11, Segmentation fault. 
-> -> > > > > #0 object_get_class (obj=obj@entry=0x0) at -> -> > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > > > > 903 return obj->class; -> -> > > > > (gdb) bt -> -> > > > > #0 object_get_class (obj=obj@entry=0x0) at -> -> > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > > > > #1 0x0000558a2c009e9b in virtio_notify_vector (vdev=0x558a2e7751d0, -> -> > > > > vector=<optimized out>) at -> -> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 -> -> > > > > #2 0x0000558a2bfdcb1e in virtio_blk_discard_write_zeroes_complete ( -> -> > > > > opaque=0x558a2f2fd420, ret=0) -> -> > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 -> -> > > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) -> -> > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 -> -> > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized out>, -> -> > > > > i1=<optimized out>) at -> -> > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 -> -> > > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 -> -> > > > > #6 0x00007fff9ed75780 in ?? () -> -> > > > > #7 0x0000000000000000 in ?? () -> -> > > > > -> -> > > > > It seems like qemu was completing a discard/write_zero request, but -> -> > > > > parent BusState was already freed & set to NULL. -> -> > > > > -> -> > > > > Do we need to drain all pending request before unrealizing -> -> > > > > virtio-blk -> -> > > > > device? Like the following patch proposed? -> -> > > > > -> -> > > > > -https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html -> -> > > > > -> -> > > > > If more info is needed, please let me know. -> -> > > > -> -> > > > may be this will help: -https://patchwork.kernel.org/patch/11213047/ -> -> > > -> -> > > Yeah, this looks promising! I'll try it out (though it's a one-time -> -> > > crash for me). Thanks! 
-> -> > -> -> > After applying this patch, I don't see the original segfaut and -> -> > backtrace, but I see this crash -> -> > -> -> > [Thread debugging using libthread_db enabled] -> -> > Using host libthread_db library "/lib64/libthread_db.so.1". -> -> > Core was generated by `/usr/local/libexec/qemu-kvm -name -> -> > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. -> -> > Program terminated with signal 11, Segmentation fault. -> -> > #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, -> -> > addr=0, val=<optimized out>, size=<optimized out>) at -> -> > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -> -> > 1324 VirtIOPCIProxy *proxy = -> -> > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); -> -> > Missing separate debuginfos, use: debuginfo-install -> -> > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 -> -> > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 -> -> > libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 -> -> > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 -> -> > (gdb) bt -> -> > #0 0x0000561216a57609 in virtio_pci_notify_write (opaque=0x5612184747e0, -> -> > addr=0, val=<optimized out>, size=<optimized out>) at -> -> > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -> -> > #1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized -> -> > out>, addr=<optimized out>, value=<optimized out>, size=<optimized out>, -> -> > shift=<optimized out>, mask=<optimized out>, attrs=...) at -> -> > /usr/src/debug/qemu-4.0/memory.c:502 -> -> > #2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, -> -> > value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, -> -> > access_size_min=<optimized out>, access_size_max=<optimized out>, -> -> > access_fn=0x561216835ac0 <memory_region_write_accessor>, -> -> > mr=0x56121846d340, attrs=...) 
-> -> > at /usr/src/debug/qemu-4.0/memory.c:568 -> -> > #3 0x0000561216837c66 in memory_region_dispatch_write -> -> > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, -> -> > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 -> -> > #4 0x00005612167e036f in flatview_write_continue -> -> > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., -> -> > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, -> -> > len=len@entry=2, addr1=<optimized out>, l=<optimized out>, -> -> > mr=0x56121846d340) -> -> > at /usr/src/debug/qemu-4.0/exec.c:3279 -> -> > #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, -> -> > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address 0x7fce7dd97028 -> -> > out of bounds>, len=2) at /usr/src/debug/qemu-4.0/exec.c:3318 -> -> > #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, -> -> > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized -> -> > out>) at /usr/src/debug/qemu-4.0/exec.c:3408 -> -> > #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, -> -> > addr=<optimized out>, attrs=..., attrs@entry=..., -> -> > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, -> -> > len=<optimized out>, is_write=<optimized out>) at -> -> > /usr/src/debug/qemu-4.0/exec.c:3419 -> -> > #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) at -> -> > /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 -> -> > #9 0x000056121682255e in qemu_kvm_cpu_thread_fn -> -> > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 -> -> > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at -> -> > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 -> -> > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 -> -> > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 -> -> > -> -> > And I searched and found -> -> > -https://bugzilla.redhat.com/show_bug.cgi?id=1706759 -, 
which has the same -> -> > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add -> -> > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular -> -> > bug. -> -> > -> -> > But I can still hit the bug even after applying the commit. Do I miss -> -> > anything? -> -> -> -> Hi Eryu, -> -> This backtrace seems to be caused by this bug (there were two bugs in -> -> 1706759): -https://bugzilla.redhat.com/show_bug.cgi?id=1708480 -> -> Although the solution hasn't been tested on virtio-blk yet, you may -> -> want to apply this patch: -> -> -https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html -> -> Let me know if this works. -> -> -Unfortunately, I still see the same segfault & backtrace after applying -> -commit 421afd2fe8dd ("virtio: reset region cache when on queue -> -deletion") -> -> -Anything I can help to debug? -Please post the QEMU command-line and the QMP commands used to remove the -device. - -The backtrace shows a vcpu thread submitting a request. The device -seems to be partially destroyed. That's surprising because the monitor -and the vcpu thread should use the QEMU global mutex to avoid race -conditions. Maybe seeing the QMP commands will make it clearer... 
- -Stefan - -On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote: -> -On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: -> -> On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: -> -> > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: -> -> > > -> -> > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: -> -> > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: -> -> > > > > On Tue, 31 Dec 2019 18:34:34 +0800 -> -> > > > > Eryu Guan <address@hidden> wrote: -> -> > > > > -> -> > > > > > Hi, -> -> > > > > > -> -> > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata -> -> > > > > > sandbox, I -> -> > > > > > think it's because io completion hits use-after-free when device -> -> > > > > > is -> -> > > > > > already gone. Is this a known bug that has been fixed? (I went -> -> > > > > > through -> -> > > > > > the git log but didn't find anything obvious). -> -> > > > > > -> -> > > > > > gdb backtrace is: -> -> > > > > > -> -> > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name -> -> > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. -> -> > > > > > Program terminated with signal 11, Segmentation fault. 
-> -> > > > > > #0 object_get_class (obj=obj@entry=0x0) at -> -> > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > > > > > 903 return obj->class; -> -> > > > > > (gdb) bt -> -> > > > > > #0 object_get_class (obj=obj@entry=0x0) at -> -> > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > > > > > #1 0x0000558a2c009e9b in virtio_notify_vector -> -> > > > > > (vdev=0x558a2e7751d0, -> -> > > > > > vector=<optimized out>) at -> -> > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 -> -> > > > > > #2 0x0000558a2bfdcb1e in -> -> > > > > > virtio_blk_discard_write_zeroes_complete ( -> -> > > > > > opaque=0x558a2f2fd420, ret=0) -> -> > > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 -> -> > > > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) -> -> > > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 -> -> > > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized -> -> > > > > > out>, -> -> > > > > > i1=<optimized out>) at -> -> > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 -> -> > > > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 -> -> > > > > > #6 0x00007fff9ed75780 in ?? () -> -> > > > > > #7 0x0000000000000000 in ?? () -> -> > > > > > -> -> > > > > > It seems like qemu was completing a discard/write_zero request, -> -> > > > > > but -> -> > > > > > parent BusState was already freed & set to NULL. -> -> > > > > > -> -> > > > > > Do we need to drain all pending request before unrealizing -> -> > > > > > virtio-blk -> -> > > > > > device? Like the following patch proposed? -> -> > > > > > -> -> > > > > > -https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html -> -> > > > > > -> -> > > > > > If more info is needed, please let me know. -> -> > > > > -> -> > > > > may be this will help: -https://patchwork.kernel.org/patch/11213047/ -> -> > > > -> -> > > > Yeah, this looks promising! I'll try it out (though it's a one-time -> -> > > > crash for me). Thanks! 
-> -> > > -> -> > > After applying this patch, I don't see the original segfaut and -> -> > > backtrace, but I see this crash -> -> > > -> -> > > [Thread debugging using libthread_db enabled] -> -> > > Using host libthread_db library "/lib64/libthread_db.so.1". -> -> > > Core was generated by `/usr/local/libexec/qemu-kvm -name -> -> > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. -> -> > > Program terminated with signal 11, Segmentation fault. -> -> > > #0 0x0000561216a57609 in virtio_pci_notify_write -> -> > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized -> -> > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -> -> > > 1324 VirtIOPCIProxy *proxy = -> -> > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); -> -> > > Missing separate debuginfos, use: debuginfo-install -> -> > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 -> -> > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 -> -> > > libstdc++-4.8.5-28.alios7.1.x86_64 numactl-libs-2.0.9-5.1.alios7.x86_64 -> -> > > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 -> -> > > (gdb) bt -> -> > > #0 0x0000561216a57609 in virtio_pci_notify_write -> -> > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized -> -> > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -> -> > > #1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized -> -> > > out>, addr=<optimized out>, value=<optimized out>, size=<optimized -> -> > > out>, shift=<optimized out>, mask=<optimized out>, attrs=...) at -> -> > > /usr/src/debug/qemu-4.0/memory.c:502 -> -> > > #2 0x0000561216833c5d in access_with_adjusted_size (addr=addr@entry=0, -> -> > > value=value@entry=0x7fcdeab1b8a8, size=size@entry=2, -> -> > > access_size_min=<optimized out>, access_size_max=<optimized out>, -> -> > > access_fn=0x561216835ac0 <memory_region_write_accessor>, -> -> > > mr=0x56121846d340, attrs=...) 
-> -> > > at /usr/src/debug/qemu-4.0/memory.c:568 -> -> > > #3 0x0000561216837c66 in memory_region_dispatch_write -> -> > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, -> -> > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 -> -> > > #4 0x00005612167e036f in flatview_write_continue -> -> > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., -> -> > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, -> -> > > len=len@entry=2, addr1=<optimized out>, l=<optimized out>, -> -> > > mr=0x56121846d340) -> -> > > at /usr/src/debug/qemu-4.0/exec.c:3279 -> -> > > #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, -> -> > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address -> -> > > 0x7fce7dd97028 out of bounds>, len=2) at -> -> > > /usr/src/debug/qemu-4.0/exec.c:3318 -> -> > > #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, -> -> > > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized -> -> > > out>) at /usr/src/debug/qemu-4.0/exec.c:3408 -> -> > > #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, -> -> > > addr=<optimized out>, attrs=..., attrs@entry=..., -> -> > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, -> -> > > len=<optimized out>, is_write=<optimized out>) at -> -> > > /usr/src/debug/qemu-4.0/exec.c:3419 -> -> > > #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) -> -> > > at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 -> -> > > #9 0x000056121682255e in qemu_kvm_cpu_thread_fn -> -> > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 -> -> > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at -> -> > > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 -> -> > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 -> -> > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 -> -> > > -> -> > > And I searched 
and found -> -> > > -https://bugzilla.redhat.com/show_bug.cgi?id=1706759 -, which has the same -> -> > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add -> -> > > blk_drain() to virtio_blk_device_unrealize()") is to fix this particular -> -> > > bug. -> -> > > -> -> > > But I can still hit the bug even after applying the commit. Do I miss -> -> > > anything? -> -> > -> -> > Hi Eryu, -> -> > This backtrace seems to be caused by this bug (there were two bugs in -> -> > 1706759): -https://bugzilla.redhat.com/show_bug.cgi?id=1708480 -> -> > Although the solution hasn't been tested on virtio-blk yet, you may -> -> > want to apply this patch: -> -> > -https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html -> -> > Let me know if this works. -> -> -> -> Unfortunately, I still see the same segfault & backtrace after applying -> -> commit 421afd2fe8dd ("virtio: reset region cache when on queue -> -> deletion") -> -> -> -> Anything I can help to debug? -> -> -Please post the QEMU command-line and the QMP commands used to remove the -> -device. -It's a normal kata instance using virtio-fs as rootfs. 
- -/usr/local/libexec/qemu-kvm -name -sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \ - -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine -q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \ - -cpu host -qmp -unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait - \ - -qmp -unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait - \ - -m 2048M,slots=10,maxmem=773893M -device -pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \ - -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device -virtconsole,chardev=charconsole0,id=console0 \ - -chardev -socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait - \ - -device -virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \ - -chardev -socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait - \ - -device nvdimm,id=nv0,memdev=mem0 -object -memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 - \ - -object rng-random,id=rng0,filename=/dev/urandom -device -virtio-rng,rng=rng0,romfile= \ - -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \ - -chardev -socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait - \ - -chardev -socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock - \ - -device -vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M --netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \ - -device -driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= - \ - -global 
kvm-pit.lost_tick_policy=discard -vga none -no-user-config -nodefaults --nographic -daemonize \ - -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on --numa node,memdev=dimm1 -kernel /usr/local/share/kernel \ - -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 -i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k console=hvc0 -console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 pci=lastbus=0 -root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro ro -rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 -agent.use_vsock=false init=/usr/lib/systemd/systemd -systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service -systemd.mask=systemd-networkd.socket \ - -pidfile -/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid -\ - -smp 1,cores=1,threads=1,sockets=96,maxcpus=96 - -QMP command to delete device (the device id is just an example, not the -one caused the crash): - -"{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}" - -which has been hot plugged by: -"{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}" -"{\"return\": {}}" -"{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}" -"{\"return\": {}}" - -> -> -The backtrace shows a vcpu thread submitting a request. The device -> -seems to be partially destroyed. That's surprising because the monitor -> -and the vcpu thread should use the QEMU global mutex to avoid race -> -conditions. Maybe seeing the QMP commands will make it clearer... -> -> -Stefan -Thanks! 
- -Eryu - -On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote: -> -On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote: -> -> On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: -> -> > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: -> -> > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: -> -> > > > -> -> > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: -> -> > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: -> -> > > > > > On Tue, 31 Dec 2019 18:34:34 +0800 -> -> > > > > > Eryu Guan <address@hidden> wrote: -> -> > > > > > -> -> > > > > > > Hi, -> -> > > > > > > -> -> > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata -> -> > > > > > > sandbox, I -> -> > > > > > > think it's because io completion hits use-after-free when -> -> > > > > > > device is -> -> > > > > > > already gone. Is this a known bug that has been fixed? (I went -> -> > > > > > > through -> -> > > > > > > the git log but didn't find anything obvious). -> -> > > > > > > -> -> > > > > > > gdb backtrace is: -> -> > > > > > > -> -> > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name -> -> > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. -> -> > > > > > > Program terminated with signal 11, Segmentation fault. 
-> -> > > > > > > #0 object_get_class (obj=obj@entry=0x0) at -> -> > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > > > > > > 903 return obj->class; -> -> > > > > > > (gdb) bt -> -> > > > > > > #0 object_get_class (obj=obj@entry=0x0) at -> -> > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > > > > > > #1 0x0000558a2c009e9b in virtio_notify_vector -> -> > > > > > > (vdev=0x558a2e7751d0, -> -> > > > > > > vector=<optimized out>) at -> -> > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 -> -> > > > > > > #2 0x0000558a2bfdcb1e in -> -> > > > > > > virtio_blk_discard_write_zeroes_complete ( -> -> > > > > > > opaque=0x558a2f2fd420, ret=0) -> -> > > > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 -> -> > > > > > > #3 0x0000558a2c261c7e in blk_aio_complete (acb=0x558a2eed7420) -> -> > > > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 -> -> > > > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized -> -> > > > > > > out>, -> -> > > > > > > i1=<optimized out>) at -> -> > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 -> -> > > > > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 -> -> > > > > > > #6 0x00007fff9ed75780 in ?? () -> -> > > > > > > #7 0x0000000000000000 in ?? () -> -> > > > > > > -> -> > > > > > > It seems like qemu was completing a discard/write_zero request, -> -> > > > > > > but -> -> > > > > > > parent BusState was already freed & set to NULL. -> -> > > > > > > -> -> > > > > > > Do we need to drain all pending request before unrealizing -> -> > > > > > > virtio-blk -> -> > > > > > > device? Like the following patch proposed? -> -> > > > > > > -> -> > > > > > > -https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html -> -> > > > > > > -> -> > > > > > > If more info is needed, please let me know. 
-> -> > > > > > -> -> > > > > > may be this will help: -> -> > > > > > -https://patchwork.kernel.org/patch/11213047/ -> -> > > > > -> -> > > > > Yeah, this looks promising! I'll try it out (though it's a one-time -> -> > > > > crash for me). Thanks! -> -> > > > -> -> > > > After applying this patch, I don't see the original segfaut and -> -> > > > backtrace, but I see this crash -> -> > > > -> -> > > > [Thread debugging using libthread_db enabled] -> -> > > > Using host libthread_db library "/lib64/libthread_db.so.1". -> -> > > > Core was generated by `/usr/local/libexec/qemu-kvm -name -> -> > > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. -> -> > > > Program terminated with signal 11, Segmentation fault. -> -> > > > #0 0x0000561216a57609 in virtio_pci_notify_write -> -> > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized -> -> > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -> -> > > > 1324 VirtIOPCIProxy *proxy = -> -> > > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); -> -> > > > Missing separate debuginfos, use: debuginfo-install -> -> > > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 -> -> > > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 -> -> > > > libstdc++-4.8.5-28.alios7.1.x86_64 -> -> > > > numactl-libs-2.0.9-5.1.alios7.x86_64 pixman-0.32.6-3.1.alios7.x86_64 -> -> > > > zlib-1.2.7-16.2.alios7.x86_64 -> -> > > > (gdb) bt -> -> > > > #0 0x0000561216a57609 in virtio_pci_notify_write -> -> > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, size=<optimized -> -> > > > out>) at /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -> -> > > > #1 0x0000561216835b22 in memory_region_write_accessor (mr=<optimized -> -> > > > out>, addr=<optimized out>, value=<optimized out>, size=<optimized -> -> > > > out>, shift=<optimized out>, mask=<optimized out>, attrs=...) 
at -> -> > > > /usr/src/debug/qemu-4.0/memory.c:502 -> -> > > > #2 0x0000561216833c5d in access_with_adjusted_size -> -> > > > (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, -> -> > > > size=size@entry=2, access_size_min=<optimized out>, -> -> > > > access_size_max=<optimized out>, access_fn=0x561216835ac0 -> -> > > > <memory_region_write_accessor>, mr=0x56121846d340, attrs=...) -> -> > > > at /usr/src/debug/qemu-4.0/memory.c:568 -> -> > > > #3 0x0000561216837c66 in memory_region_dispatch_write -> -> > > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, -> -> > > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 -> -> > > > #4 0x00005612167e036f in flatview_write_continue -> -> > > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, attrs=..., -> -> > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, -> -> > > > len=len@entry=2, addr1=<optimized out>, l=<optimized out>, -> -> > > > mr=0x56121846d340) -> -> > > > at /usr/src/debug/qemu-4.0/exec.c:3279 -> -> > > > #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, -> -> > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address -> -> > > > 0x7fce7dd97028 out of bounds>, len=2) at -> -> > > > /usr/src/debug/qemu-4.0/exec.c:3318 -> -> > > > #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, -> -> > > > addr=<optimized out>, attrs=..., buf=<optimized out>, len=<optimized -> -> > > > out>) at /usr/src/debug/qemu-4.0/exec.c:3408 -> -> > > > #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, -> -> > > > addr=<optimized out>, attrs=..., attrs@entry=..., -> -> > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of bounds>, -> -> > > > len=<optimized out>, is_write=<optimized out>) at -> -> > > > /usr/src/debug/qemu-4.0/exec.c:3419 -> -> > > > #8 0x0000561216849da1 in kvm_cpu_exec (cpu=cpu@entry=0x56121849aa00) -> -> > > > at /usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 -> -> > > > #9 
0x000056121682255e in qemu_kvm_cpu_thread_fn -> -> > > > (arg=arg@entry=0x56121849aa00) at /usr/src/debug/qemu-4.0/cpus.c:1281 -> -> > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) at -> -> > > > /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 -> -> > > > #11 0x00007fce7bef6e25 in start_thread () from /lib64/libpthread.so.0 -> -> > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 -> -> > > > -> -> > > > And I searched and found -> -> > > > -https://bugzilla.redhat.com/show_bug.cgi?id=1706759 -, which has the -> -> > > > same -> -> > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: Add -> -> > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this -> -> > > > particular -> -> > > > bug. -> -> > > > -> -> > > > But I can still hit the bug even after applying the commit. Do I miss -> -> > > > anything? -> -> > > -> -> > > Hi Eryu, -> -> > > This backtrace seems to be caused by this bug (there were two bugs in -> -> > > 1706759): -https://bugzilla.redhat.com/show_bug.cgi?id=1708480 -> -> > > Although the solution hasn't been tested on virtio-blk yet, you may -> -> > > want to apply this patch: -> -> > > -> -> > > -https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html -> -> > > Let me know if this works. -> -> > -> -> > Unfortunately, I still see the same segfault & backtrace after applying -> -> > commit 421afd2fe8dd ("virtio: reset region cache when on queue -> -> > deletion") -> -> > -> -> > Anything I can help to debug? -> -> -> -> Please post the QEMU command-line and the QMP commands use to remove the -> -> device. -> -> -It's a normal kata instance using virtio-fs as rootfs. 
-> -> -/usr/local/libexec/qemu-kvm -name -> -sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \ -> --uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine -> -q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \ -> --cpu host -qmp -> -unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait -> -\ -> --qmp -> -unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait -> -\ -> --m 2048M,slots=10,maxmem=773893M -device -> -pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \ -> --device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device -> -virtconsole,chardev=charconsole0,id=console0 \ -> --chardev -> -socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait -> -\ -> --device -> -virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 \ -> --chardev -> -socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait -> -\ -> --device nvdimm,id=nv0,memdev=mem0 -object -> -memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 -> -\ -> --object rng-random,id=rng0,filename=/dev/urandom -device -> -virtio-rng,rng=rng0,romfile= \ -> --device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \ -> --chardev -> -socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait -> -\ -> --chardev -> -socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock -> -\ -> --device -> -vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M -> --netdev tap,id=network-0,vhost=on,vhostfds=3,fds=4 \ -> --device -> 
-driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= -> -\ -> --global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -> --nodefaults -nographic -daemonize \ -> --object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on -> --numa node,memdev=dimm1 -kernel /usr/local/share/kernel \ -> --append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 i8042.direct=1 -> -i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp reboot=k -> -console=hvc0 console=hvc1 iommu=off cryptomgr.notests net.ifnames=0 -> -pci=lastbus=0 root=/dev/pmem0p1 rootflags=dax,data=ordered,errors=remount-ro -> -ro rootfstype=ext4 quiet systemd.show_status=false panic=1 nr_cpus=96 -> -agent.use_vsock=false init=/usr/lib/systemd/systemd -> -systemd.unit=kata-containers.target systemd.mask=systemd-networkd.service -> -systemd.mask=systemd-networkd.socket \ -> --pidfile -> -/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid -> -\ -> --smp 1,cores=1,threads=1,sockets=96,maxcpus=96 -> -> -QMP command to delete device (the device id is just an example, not the -> -one caused the crash): -> -> -"{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}" -> -> -which has been hot plugged by: -> -"{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}" -> -"{\"return\": {}}" -> -"{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}" -> -"{\"return\": {}}" -Thanks. I wasn't able to reproduce this crash with qemu.git/master. 
- -One thing that is strange about the latest backtrace you posted: QEMU is -dispatching the memory access instead of using the ioeventfd code that -that virtio-blk-pci normally takes when a virtqueue is notified. I -guess this means ioeventfd has already been disabled due to the hot -unplug. - -Could you try with machine type "i440fx" instead of "q35"? I wonder if -pci-bridge/shpc is part of the problem. - -Stefan -signature.asc -Description: -PGP signature - -On Tue, Jan 14, 2020 at 04:16:24PM +0000, Stefan Hajnoczi wrote: -> -On Tue, Jan 14, 2020 at 10:50:58AM +0800, Eryu Guan wrote: -> -> On Mon, Jan 13, 2020 at 04:38:55PM +0000, Stefan Hajnoczi wrote: -> -> > On Thu, Jan 09, 2020 at 12:58:06PM +0800, Eryu Guan wrote: -> -> > > On Tue, Jan 07, 2020 at 03:01:01PM +0100, Julia Suvorova wrote: -> -> > > > On Tue, Jan 7, 2020 at 2:06 PM Eryu Guan <address@hidden> wrote: -> -> > > > > -> -> > > > > On Thu, Jan 02, 2020 at 10:08:50AM +0800, Eryu Guan wrote: -> -> > > > > > On Tue, Dec 31, 2019 at 11:51:35AM +0100, Igor Mammedov wrote: -> -> > > > > > > On Tue, 31 Dec 2019 18:34:34 +0800 -> -> > > > > > > Eryu Guan <address@hidden> wrote: -> -> > > > > > > -> -> > > > > > > > Hi, -> -> > > > > > > > -> -> > > > > > > > I'm using qemu 4.0 and hit segfault when tearing down kata -> -> > > > > > > > sandbox, I -> -> > > > > > > > think it's because io completion hits use-after-free when -> -> > > > > > > > device is -> -> > > > > > > > already gone. Is this a known bug that has been fixed? (I -> -> > > > > > > > went through -> -> > > > > > > > the git log but didn't find anything obvious). -> -> > > > > > > > -> -> > > > > > > > gdb backtrace is: -> -> > > > > > > > -> -> > > > > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name -> -> > > > > > > > sandbox-5b8df8c6c6901c3c0a9b02879be10fe8d69d6'. -> -> > > > > > > > Program terminated with signal 11, Segmentation fault. 
-> -> > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at -> -> > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > > > > > > > 903 return obj->class; -> -> > > > > > > > (gdb) bt -> -> > > > > > > > #0 object_get_class (obj=obj@entry=0x0) at -> -> > > > > > > > /usr/src/debug/qemu-4.0/qom/object.c:903 -> -> > > > > > > > #1 0x0000558a2c009e9b in virtio_notify_vector -> -> > > > > > > > (vdev=0x558a2e7751d0, -> -> > > > > > > > vector=<optimized out>) at -> -> > > > > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio.c:1118 -> -> > > > > > > > #2 0x0000558a2bfdcb1e in -> -> > > > > > > > virtio_blk_discard_write_zeroes_complete ( -> -> > > > > > > > opaque=0x558a2f2fd420, ret=0) -> -> > > > > > > > at /usr/src/debug/qemu-4.0/hw/block/virtio-blk.c:186 -> -> > > > > > > > #3 0x0000558a2c261c7e in blk_aio_complete -> -> > > > > > > > (acb=0x558a2eed7420) -> -> > > > > > > > at /usr/src/debug/qemu-4.0/block/block-backend.c:1305 -> -> > > > > > > > #4 0x0000558a2c3031db in coroutine_trampoline (i0=<optimized -> -> > > > > > > > out>, -> -> > > > > > > > i1=<optimized out>) at -> -> > > > > > > > /usr/src/debug/qemu-4.0/util/coroutine-ucontext.c:116 -> -> > > > > > > > #5 0x00007f45b2f8b080 in ?? () from /lib64/libc.so.6 -> -> > > > > > > > #6 0x00007fff9ed75780 in ?? () -> -> > > > > > > > #7 0x0000000000000000 in ?? () -> -> > > > > > > > -> -> > > > > > > > It seems like qemu was completing a discard/write_zero -> -> > > > > > > > request, but -> -> > > > > > > > parent BusState was already freed & set to NULL. -> -> > > > > > > > -> -> > > > > > > > Do we need to drain all pending request before unrealizing -> -> > > > > > > > virtio-blk -> -> > > > > > > > device? Like the following patch proposed? -> -> > > > > > > > -> -> > > > > > > > -https://lists.gnu.org/archive/html/qemu-devel/2017-06/msg02945.html -> -> > > > > > > > -> -> > > > > > > > If more info is needed, please let me know. 
-> -> > > > > > > -> -> > > > > > > may be this will help: -> -> > > > > > > -https://patchwork.kernel.org/patch/11213047/ -> -> > > > > > -> -> > > > > > Yeah, this looks promising! I'll try it out (though it's a -> -> > > > > > one-time -> -> > > > > > crash for me). Thanks! -> -> > > > > -> -> > > > > After applying this patch, I don't see the original segfaut and -> -> > > > > backtrace, but I see this crash -> -> > > > > -> -> > > > > [Thread debugging using libthread_db enabled] -> -> > > > > Using host libthread_db library "/lib64/libthread_db.so.1". -> -> > > > > Core was generated by `/usr/local/libexec/qemu-kvm -name -> -> > > > > sandbox-a2f34a11a7e1449496503bbc4050ae040c0d3'. -> -> > > > > Program terminated with signal 11, Segmentation fault. -> -> > > > > #0 0x0000561216a57609 in virtio_pci_notify_write -> -> > > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, -> -> > > > > size=<optimized out>) at -> -> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -> -> > > > > 1324 VirtIOPCIProxy *proxy = -> -> > > > > VIRTIO_PCI(DEVICE(vdev)->parent_bus->parent); -> -> > > > > Missing separate debuginfos, use: debuginfo-install -> -> > > > > glib2-2.42.2-5.1.alios7.x86_64 glibc-2.17-260.alios7.x86_64 -> -> > > > > libgcc-4.8.5-28.alios7.1.x86_64 libseccomp-2.3.1-3.alios7.x86_64 -> -> > > > > libstdc++-4.8.5-28.alios7.1.x86_64 -> -> > > > > numactl-libs-2.0.9-5.1.alios7.x86_64 -> -> > > > > pixman-0.32.6-3.1.alios7.x86_64 zlib-1.2.7-16.2.alios7.x86_64 -> -> > > > > (gdb) bt -> -> > > > > #0 0x0000561216a57609 in virtio_pci_notify_write -> -> > > > > (opaque=0x5612184747e0, addr=0, val=<optimized out>, -> -> > > > > size=<optimized out>) at -> -> > > > > /usr/src/debug/qemu-4.0/hw/virtio/virtio-pci.c:1324 -> -> > > > > #1 0x0000561216835b22 in memory_region_write_accessor -> -> > > > > (mr=<optimized out>, addr=<optimized out>, value=<optimized out>, -> -> > > > > size=<optimized out>, shift=<optimized out>, mask=<optimized out>, -> -> > > > > 
attrs=...) at /usr/src/debug/qemu-4.0/memory.c:502 -> -> > > > > #2 0x0000561216833c5d in access_with_adjusted_size -> -> > > > > (addr=addr@entry=0, value=value@entry=0x7fcdeab1b8a8, -> -> > > > > size=size@entry=2, access_size_min=<optimized out>, -> -> > > > > access_size_max=<optimized out>, access_fn=0x561216835ac0 -> -> > > > > <memory_region_write_accessor>, mr=0x56121846d340, attrs=...) -> -> > > > > at /usr/src/debug/qemu-4.0/memory.c:568 -> -> > > > > #3 0x0000561216837c66 in memory_region_dispatch_write -> -> > > > > (mr=mr@entry=0x56121846d340, addr=0, data=<optimized out>, size=2, -> -> > > > > attrs=attrs@entry=...) at /usr/src/debug/qemu-4.0/memory.c:1503 -> -> > > > > #4 0x00005612167e036f in flatview_write_continue -> -> > > > > (fv=fv@entry=0x56121852edd0, addr=addr@entry=841813602304, -> -> > > > > attrs=..., buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out -> -> > > > > of bounds>, len=len@entry=2, addr1=<optimized out>, l=<optimized -> -> > > > > out>, mr=0x56121846d340) -> -> > > > > at /usr/src/debug/qemu-4.0/exec.c:3279 -> -> > > > > #5 0x00005612167e0506 in flatview_write (fv=0x56121852edd0, -> -> > > > > addr=841813602304, attrs=..., buf=0x7fce7dd97028 <Address -> -> > > > > 0x7fce7dd97028 out of bounds>, len=2) at -> -> > > > > /usr/src/debug/qemu-4.0/exec.c:3318 -> -> > > > > #6 0x00005612167e4a1b in address_space_write (as=<optimized out>, -> -> > > > > addr=<optimized out>, attrs=..., buf=<optimized out>, -> -> > > > > len=<optimized out>) at /usr/src/debug/qemu-4.0/exec.c:3408 -> -> > > > > #7 0x00005612167e4aa5 in address_space_rw (as=<optimized out>, -> -> > > > > addr=<optimized out>, attrs=..., attrs@entry=..., -> -> > > > > buf=buf@entry=0x7fce7dd97028 <Address 0x7fce7dd97028 out of -> -> > > > > bounds>, len=<optimized out>, is_write=<optimized out>) at -> -> > > > > /usr/src/debug/qemu-4.0/exec.c:3419 -> -> > > > > #8 0x0000561216849da1 in kvm_cpu_exec -> -> > > > > (cpu=cpu@entry=0x56121849aa00) at -> -> > > > > 
/usr/src/debug/qemu-4.0/accel/kvm/kvm-all.c:2034 -> -> > > > > #9 0x000056121682255e in qemu_kvm_cpu_thread_fn -> -> > > > > (arg=arg@entry=0x56121849aa00) at -> -> > > > > /usr/src/debug/qemu-4.0/cpus.c:1281 -> -> > > > > #10 0x0000561216b794d6 in qemu_thread_start (args=<optimized out>) -> -> > > > > at /usr/src/debug/qemu-4.0/util/qemu-thread-posix.c:502 -> -> > > > > #11 0x00007fce7bef6e25 in start_thread () from -> -> > > > > /lib64/libpthread.so.0 -> -> > > > > #12 0x00007fce7bc1ef1d in clone () from /lib64/libc.so.6 -> -> > > > > -> -> > > > > And I searched and found -> -> > > > > -https://bugzilla.redhat.com/show_bug.cgi?id=1706759 -, which has the -> -> > > > > same -> -> > > > > backtrace as above, and it seems commit 7bfde688fb1b ("virtio-blk: -> -> > > > > Add -> -> > > > > blk_drain() to virtio_blk_device_unrealize()") is to fix this -> -> > > > > particular -> -> > > > > bug. -> -> > > > > -> -> > > > > But I can still hit the bug even after applying the commit. Do I -> -> > > > > miss -> -> > > > > anything? -> -> > > > -> -> > > > Hi Eryu, -> -> > > > This backtrace seems to be caused by this bug (there were two bugs in -> -> > > > 1706759): -https://bugzilla.redhat.com/show_bug.cgi?id=1708480 -> -> > > > Although the solution hasn't been tested on virtio-blk yet, you may -> -> > > > want to apply this patch: -> -> > > > -> -> > > > -https://lists.nongnu.org/archive/html/qemu-devel/2019-12/msg05197.html -> -> > > > Let me know if this works. -> -> > > -> -> > > Unfortunately, I still see the same segfault & backtrace after applying -> -> > > commit 421afd2fe8dd ("virtio: reset region cache when on queue -> -> > > deletion") -> -> > > -> -> > > Anything I can help to debug? -> -> > -> -> > Please post the QEMU command-line and the QMP commands use to remove the -> -> > device. -> -> -> -> It's a normal kata instance using virtio-fs as rootfs. 
-> -> -> -> /usr/local/libexec/qemu-kvm -name -> -> sandbox-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d \ -> -> -uuid e03f6b6b-b80b-40c0-8d5b-0cbfed1305d2 -machine -> -> q35,accel=kvm,kernel_irqchip,nvdimm,nosmm,nosmbus,nosata,nopit \ -> -> -cpu host -qmp -> -> unix:/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait -> -> \ -> -> -qmp -> -> unix:/run/vc/vm/debug-a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/qmp.sock,server,nowait -> -> \ -> -> -m 2048M,slots=10,maxmem=773893M -device -> -> pci-bridge,bus=pcie.0,id=pci-bridge-0,chassis_nr=1,shpc=on,addr=2,romfile= \ -> -> -device virtio-serial-pci,disable-modern=false,id=serial0,romfile= -device -> -> virtconsole,chardev=charconsole0,id=console0 \ -> -> -chardev -> -> socket,id=charconsole0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/console.sock,server,nowait -> -> \ -> -> -device -> -> virtserialport,chardev=metricagent,id=channel10,name=metric.agent.channel.10 -> -> \ -> -> -chardev -> -> socket,id=metricagent,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/metric.agent.channel.sock,server,nowait -> -> \ -> -> -device nvdimm,id=nv0,memdev=mem0 -object -> -> memory-backend-file,id=mem0,mem-path=/usr/local/share/containers-image-1.9.0.img,size=268435456 -> -> \ -> -> -object rng-random,id=rng0,filename=/dev/urandom -device -> -> virtio-rng,rng=rng0,romfile= \ -> -> -device virtserialport,chardev=charch0,id=channel0,name=agent.channel.0 \ -> -> -chardev -> -> socket,id=charch0,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/kata.sock,server,nowait -> -> \ -> -> -chardev -> -> socket,id=char-6fca044b801a78a1,path=/run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/vhost-fs.sock -> -> \ -> -> -device -> -> vhost-user-fs-pci,chardev=char-6fca044b801a78a1,tag=kataShared,cache-size=8192M -> -> -netdev 
tap,id=network-0,vhost=on,vhostfds=3,fds=4 \ -> -> -device -> -> driver=virtio-net-pci,netdev=network-0,mac=76:57:f1:ab:51:5c,disable-modern=false,mq=on,vectors=4,romfile= -> -> \ -> -> -global kvm-pit.lost_tick_policy=discard -vga none -no-user-config -> -> -nodefaults -nographic -daemonize \ -> -> -object memory-backend-file,id=dimm1,size=2048M,mem-path=/dev/shm,share=on -> -> -numa node,memdev=dimm1 -kernel /usr/local/share/kernel \ -> -> -append tsc=reliable no_timer_check rcupdate.rcu_expedited=1 -> -> i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 i8042.noaux=1 noreplace-smp -> -> reboot=k console=hvc0 console=hvc1 iommu=off cryptomgr.notests -> -> net.ifnames=0 pci=lastbus=0 root=/dev/pmem0p1 -> -> rootflags=dax,data=ordered,errors=remount-ro ro rootfstype=ext4 quiet -> -> systemd.show_status=false panic=1 nr_cpus=96 agent.use_vsock=false -> -> init=/usr/lib/systemd/systemd systemd.unit=kata-containers.target -> -> systemd.mask=systemd-networkd.service systemd.mask=systemd-networkd.socket \ -> -> -pidfile -> -> /run/vc/vm/a670786fcb1758d2348eb120939d90ffacf9f049f10b337284ad49bbcd60936d/pid -> -> \ -> -> -smp 1,cores=1,threads=1,sockets=96,maxcpus=96 -> -> -> -> QMP command to delete device (the device id is just an example, not the -> -> one caused the crash): -> -> -> -> "{\"arguments\":{\"id\":\"virtio-drive-5967abfb917c8da6\"},\"execute\":\"device_del\"}" -> -> -> -> which has been hot plugged by: -> -> "{\"arguments\":{\"cache\":{\"direct\":true,\"no-flush\":false},\"driver\":\"raw\",\"file\":{\"driver\":\"file\",\"filename\":\"/dev/dm-18\"},\"node-name\":\"drive-5967abfb917c8da6\"},\"execute\":\"blockdev-add\"}" -> -> "{\"return\": {}}" -> -> "{\"arguments\":{\"addr\":\"01\",\"bus\":\"pci-bridge-0\",\"drive\":\"drive-5967abfb917c8da6\",\"driver\":\"virtio-blk-pci\",\"id\":\"virtio-drive-5967abfb917c8da6\",\"romfile\":\"\",\"share-rw\":\"on\"},\"execute\":\"device_add\"}" -> -> "{\"return\": {}}" -> -> -Thanks. 
I wasn't able to reproduce this crash with qemu.git/master. -> -> -One thing that is strange about the latest backtrace you posted: QEMU is -> -dispatching the memory access instead of using the ioeventfd code that -> -that virtio-blk-pci normally takes when a virtqueue is notified. I -> -guess this means ioeventfd has already been disabled due to the hot -> -unplug. -> -> -Could you try with machine type "i440fx" instead of "q35"? I wonder if -> -pci-bridge/shpc is part of the problem. -Sure, will try it. But it may take some time, as the test bed is busy -with other testing tasks. I'll report back once I got the results. - -Thanks, -Eryu - diff --git a/results/classifier/008/all/92957605 b/results/classifier/008/all/92957605 deleted file mode 100644 index 4f840de67..000000000 --- a/results/classifier/008/all/92957605 +++ /dev/null @@ -1,428 +0,0 @@ -other: 0.997 -permissions: 0.996 -semantic: 0.995 -debug: 0.994 -performance: 0.994 -PID: 0.993 -device: 0.993 -socket: 0.993 -boot: 0.992 -network: 0.989 -graphic: 0.986 -files: 0.986 -KVM: 0.982 -vnc: 0.981 - -[Qemu-devel] Fwd: [BUG] Failed to compile using gcc7.1 - -Hi all, -I encountered the same problem on gcc 7.1.1 and found Qu's mail in -this list from google search. - -Temporarily fix it by specifying the string length in snprintf -directive. Hope this is helpful to other people encountered the same -problem. 
- -@@ -1,9 +1,7 @@ ---- ---- a/block/blkdebug.c -- "blkdebug:%s:%s", s->config_file ?: "", ---- a/block/blkverify.c -- "blkverify:%s:%s", ---- a/hw/usb/bus.c -- snprintf(downstream->path, sizeof(downstream->path), "%s.%d", -- snprintf(downstream->path, sizeof(downstream->path), "%d", portnr); --- -+++ b/block/blkdebug.c -+ "blkdebug:%.2037s:%.2037s", s->config_file ?: "", -+++ b/block/blkverify.c -+ "blkverify:%.2038s:%.2038s", -+++ b/hw/usb/bus.c -+ snprintf(downstream->path, sizeof(downstream->path), "%.12s.%d", -+ snprintf(downstream->path, sizeof(downstream->path), "%.12d", portnr); - -Tsung-en Hsiao - -> -Qu Wenruo Wrote: -> -> -Hi all, -> -> -After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with gcc. -> -> -The error is: -> -> ------- -> -CC block/blkdebug.o -> -block/blkdebug.c: In function 'blkdebug_refresh_filename': -> -> -block/blkdebug.c:693:31: error: '%s' directive output may be truncated -> -writing up to 4095 bytes into a region of size 4086 -> -[-Werror=format-truncation=] -> -> -"blkdebug:%s:%s", s->config_file ?: "", -> -^~ -> -In file included from /usr/include/stdio.h:939:0, -> -from /home/adam/qemu/include/qemu/osdep.h:68, -> -from block/blkdebug.c:25: -> -> -/usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' output 11 -> -or more bytes (assuming 4106) into a destination of size 4096 -> -> -return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1, -> -^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -> -__bos (__s), __fmt, __va_arg_pack ()); -> -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -> -cc1: all warnings being treated as errors -> -make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1 -> ------- -> -> -It seems that gcc 7 is introducing more restrict check for printf. -> -> -If using clang, although there are some extra warning, it can at least pass -> -the compile. 
-> -> -Thanks, -> -Qu - -Hi Tsung-en, - -On 06/11/2017 04:08 PM, Tsung-en Hsiao wrote: -Hi all, -I encountered the same problem on gcc 7.1.1 and found Qu's mail in -this list from google search. - -Temporarily fix it by specifying the string length in snprintf -directive. Hope this is helpful to other people encountered the same -problem. -Thank your for sharing this. -@@ -1,9 +1,7 @@ ---- ---- a/block/blkdebug.c -- "blkdebug:%s:%s", s->config_file ?: "", ---- a/block/blkverify.c -- "blkverify:%s:%s", ---- a/hw/usb/bus.c -- snprintf(downstream->path, sizeof(downstream->path), "%s.%d", -- snprintf(downstream->path, sizeof(downstream->path), "%d", portnr); --- -+++ b/block/blkdebug.c -+ "blkdebug:%.2037s:%.2037s", s->config_file ?: "", -It is a rather funny way to silent this warning :) Truncating the -filename until it fits. -However I don't think it is the correct way since there is indeed an -overflow of bs->exact_filename. -Apparently exact_filename from "block/block_int.h" is defined to hold a -pathname: -char exact_filename[PATH_MAX]; -but is used for more than that (for example in blkdebug.c it might use -until 10+2*PATH_MAX chars). -I suppose it started as a buffer to hold a pathname then more block -drivers were added and this buffer ended used differently. -If it is a multi-purpose buffer one safer option might be to declare it -as a GString* and use g_string_printf(). -I CC'ed the block folks to have their feedback. - -Regards, - -Phil. -+++ b/block/blkverify.c -+ "blkverify:%.2038s:%.2038s", -+++ b/hw/usb/bus.c -+ snprintf(downstream->path, sizeof(downstream->path), "%.12s.%d", -+ snprintf(downstream->path, sizeof(downstream->path), "%.12d", portnr); - -Tsung-en Hsiao -Qu Wenruo Wrote: - -Hi all, - -After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with gcc. 
- -The error is: - ------- - CC block/blkdebug.o -block/blkdebug.c: In function 'blkdebug_refresh_filename': - -block/blkdebug.c:693:31: error: '%s' directive output may be truncated writing -up to 4095 bytes into a region of size 4086 [-Werror=format-truncation=] - - "blkdebug:%s:%s", s->config_file ?: "", - ^~ -In file included from /usr/include/stdio.h:939:0, - from /home/adam/qemu/include/qemu/osdep.h:68, - from block/blkdebug.c:25: - -/usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' output 11 or -more bytes (assuming 4106) into a destination of size 4096 - - return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1, - ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - __bos (__s), __fmt, __va_arg_pack ()); - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -cc1: all warnings being treated as errors -make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1 ------- - -It seems that gcc 7 is introducing more restrict check for printf. - -If using clang, although there are some extra warning, it can at least pass the -compile. - -Thanks, -Qu - -On 2017-06-12 05:19, Philippe Mathieu-Daudé wrote: -> -Hi Tsung-en, -> -> -On 06/11/2017 04:08 PM, Tsung-en Hsiao wrote: -> -> Hi all, -> -> I encountered the same problem on gcc 7.1.1 and found Qu's mail in -> -> this list from google search. -> -> -> -> Temporarily fix it by specifying the string length in snprintf -> -> directive. Hope this is helpful to other people encountered the same -> -> problem. -> -> -Thank your for sharing this. 
-> -> -> -> -> @@ -1,9 +1,7 @@ -> -> --- -> -> --- a/block/blkdebug.c -> -> - "blkdebug:%s:%s", s->config_file ?: "", -> -> --- a/block/blkverify.c -> -> - "blkverify:%s:%s", -> -> --- a/hw/usb/bus.c -> -> - snprintf(downstream->path, sizeof(downstream->path), "%s.%d", -> -> - snprintf(downstream->path, sizeof(downstream->path), "%d", -> -> portnr); -> -> -- -> -> +++ b/block/blkdebug.c -> -> + "blkdebug:%.2037s:%.2037s", s->config_file ?: "", -> -> -It is a rather funny way to silent this warning :) Truncating the -> -filename until it fits. -> -> -However I don't think it is the correct way since there is indeed an -> -overflow of bs->exact_filename. -> -> -Apparently exact_filename from "block/block_int.h" is defined to hold a -> -pathname: -> -char exact_filename[PATH_MAX]; -> -> -but is used for more than that (for example in blkdebug.c it might use -> -until 10+2*PATH_MAX chars). -In any case, truncating the filenames will do just as much as truncating -the result: You'll get an unusable filename. - -> -I suppose it started as a buffer to hold a pathname then more block -> -drivers were added and this buffer ended used differently. -> -> -If it is a multi-purpose buffer one safer option might be to declare it -> -as a GString* and use g_string_printf(). -What it is supposed to be now is just an information string we can print -to the user, because strings are nicer than JSON objects. There are some -commands that take a filename for identifying a block node, but I dream -we can get rid of them in 3.0... - -The right solution is to remove it altogether and have a -"char *bdrv_filename(BlockDriverState *bs)" function (which generates -the filename every time it's called). I've been working on this for some -years now, actually, but it was never pressing enough to get it finished -(so I never had enough time). - -What we can do in the meantime is to not generate a plain filename if it -won't fit into bs->exact_filename. 
- -(The easiest way to do this probably would be to truncate -bs->exact_filename back to an empty string if snprintf() returns a value -greater than or equal to the length of bs->exact_filename.) - -What to do about hw/usb/bus.c I don't know (I guess the best solution -would be to ignore the warning, but I don't suppose that is going to work). - -Max - -> -> -I CC'ed the block folks to have their feedback. -> -> -Regards, -> -> -Phil. -> -> -> +++ b/block/blkverify.c -> -> + "blkverify:%.2038s:%.2038s", -> -> +++ b/hw/usb/bus.c -> -> + snprintf(downstream->path, sizeof(downstream->path), "%.12s.%d", -> -> + snprintf(downstream->path, sizeof(downstream->path), "%.12d", -> -> portnr); -> -> -> -> Tsung-en Hsiao -> -> -> ->> Qu Wenruo Wrote: -> ->> -> ->> Hi all, -> ->> -> ->> After upgrading gcc from 6.3.1 to 7.1.1, qemu can't be compiled with -> ->> gcc. -> ->> -> ->> The error is: -> ->> -> ->> ------ -> ->> CC block/blkdebug.o -> ->> block/blkdebug.c: In function 'blkdebug_refresh_filename': -> ->> -> ->> block/blkdebug.c:693:31: error: '%s' directive output may be -> ->> truncated writing up to 4095 bytes into a region of size 4086 -> ->> [-Werror=format-truncation=] -> ->> -> ->> "blkdebug:%s:%s", s->config_file ?: "", -> ->> ^~ -> ->> In file included from /usr/include/stdio.h:939:0, -> ->> from /home/adam/qemu/include/qemu/osdep.h:68, -> ->> from block/blkdebug.c:25: -> ->> -> ->> /usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' -> ->> output 11 or more bytes (assuming 4106) into a destination of size 4096 -> ->> -> ->> return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1, -> ->> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -> ->> __bos (__s), __fmt, __va_arg_pack ()); -> ->> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -> ->> cc1: all warnings being treated as errors -> ->> make: *** [/home/adam/qemu/rules.mak:69: block/blkdebug.o] Error 1 -> ->> ------ -> ->> -> ->> It seems that gcc 7 is introducing more restrict 
check for printf. -> ->> -> ->> If using clang, although there are some extra warning, it can at -> ->> least pass the compile. -> ->> -> ->> Thanks, -> ->> Qu -> -> -signature.asc -Description: -OpenPGP digital signature - diff --git a/results/classifier/008/all/95154278 b/results/classifier/008/all/95154278 deleted file mode 100644 index 2dc0c2ffc..000000000 --- a/results/classifier/008/all/95154278 +++ /dev/null @@ -1,165 +0,0 @@ -permissions: 0.989 -other: 0.953 -debug: 0.951 -device: 0.951 -graphic: 0.950 -PID: 0.949 -vnc: 0.948 -semantic: 0.937 -performance: 0.936 -files: 0.918 -KVM: 0.916 -socket: 0.913 -network: 0.913 -boot: 0.902 - -[Qemu-devel] [BUG] checkpatch.pl hangs on target/mips/msa_helper.c - -If checkpatch.pl is applied (using switch "-f") on file -target/mips/msa_helper.c, it will hang. - -There is a workaround for this particular file: - -These lines in msa_helper.c: - - uint## BITS ##_t S = _S, T = _T; \ - uint## BITS ##_t as, at, xs, xt, xd; \ - -should be replaced with: - - uint## BITS ## _t S = _S, T = _T; \ - uint## BITS ## _t as, at, xs, xt, xd; \ - -(a space is added after the second "##" in each line) - -The workaround is found by partial deleting and undeleting of the code in -msa_helper.c in binary search fashion. - -This workaround will soon be submitted by me as a patch within a series on misc -MIPS issues. - -I took a look at checkpatch.pl code, and it looks it is fairly complicated to -fix the issue, since it happens in the code segment involving intricate logic -conditions. - -Regards, -Aleksandar - -On Wed, Jul 04, 2018 at 03:35:18PM +0000, Aleksandar Markovic wrote: -> -If checkpatch.pl is applied (using switch "-f") on file -> -target/mips/msa_helper.c, it will hang. 
-> -> -There is a workaround for this particular file: -> -> -These lines in msa_helper.c: -> -> -uint## BITS ##_t S = _S, T = _T; \ -> -uint## BITS ##_t as, at, xs, xt, xd; \ -> -> -should be replaced with: -> -> -uint## BITS ## _t S = _S, T = _T; \ -> -uint## BITS ## _t as, at, xs, xt, xd; \ -> -> -(a space is added after the second "##" in each line) -> -> -The workaround is found by partial deleting and undeleting of the code in -> -msa_helper.c in binary search fashion. -> -> -This workaround will soon be submitted by me as a patch within a series on -> -misc MIPS issues. -> -> -I took a look at checkpatch.pl code, and it looks it is fairly complicated to -> -fix the issue, since it happens in the code segment involving intricate logic -> -conditions. -Thanks for figuring this out, Aleksandar. Not sure if anyone else has -the apetite to fix checkpatch.pl. - -Stefan -signature.asc -Description: -PGP signature - -On 07/11/2018 09:36 AM, Stefan Hajnoczi wrote: -> -On Wed, Jul 04, 2018 at 03:35:18PM +0000, Aleksandar Markovic wrote: -> -> If checkpatch.pl is applied (using switch "-f") on file -> -> target/mips/msa_helper.c, it will hang. -> -> -> -> There is a workaround for this particular file: -> -> -> -> These lines in msa_helper.c: -> -> -> -> uint## BITS ##_t S = _S, T = _T; \ -> -> uint## BITS ##_t as, at, xs, xt, xd; \ -> -> -> -> should be replaced with: -> -> -> -> uint## BITS ## _t S = _S, T = _T; \ -> -> uint## BITS ## _t as, at, xs, xt, xd; \ -> -> -> -> (a space is added after the second "##" in each line) -> -> -> -> The workaround is found by partial deleting and undeleting of the code in -> -> msa_helper.c in binary search fashion. -> -> -> -> This workaround will soon be submitted by me as a patch within a series on -> -> misc MIPS issues. -> -> -> -> I took a look at checkpatch.pl code, and it looks it is fairly complicated -> -> to fix the issue, since it happens in the code segment involving intricate -> -> logic conditions. 
-> -> -Thanks for figuring this out, Aleksandar. Not sure if anyone else has -> -the apetite to fix checkpatch.pl. -Anyone else but Paolo ;P -http://lists.nongnu.org/archive/html/qemu-devel/2018-07/msg01250.html -signature.asc -Description: -OpenPGP digital signature - diff --git a/results/classifier/008/all/96782458 b/results/classifier/008/all/96782458 deleted file mode 100644 index 6fa03cc39..000000000 --- a/results/classifier/008/all/96782458 +++ /dev/null @@ -1,1009 +0,0 @@ -debug: 0.989 -permissions: 0.986 -performance: 0.985 -semantic: 0.984 -other: 0.982 -boot: 0.980 -PID: 0.980 -files: 0.978 -socket: 0.976 -vnc: 0.976 -device: 0.974 -graphic: 0.973 -network: 0.967 -KVM: 0.963 - -[Qemu-devel] [BUG] Migrate failes between boards with different PMC counts - -Hi all, - -Recently, I found migration failed when enable vPMU. - -migrate vPMU state was introduced in linux-3.10 + qemu-1.7. - -As long as enable vPMU, qemu will save / load the -vmstate_msr_architectural_pmu(msr_global_ctrl) register during the migration. -But global_ctrl generated based on cpuid(0xA), the number of general-purpose -performance -monitoring counters(PMC) can vary according to Intel SDN. The number of PMC -presented -to vm, does not support configuration currently, it depend on host cpuid, and -enable all pmc -defaultly at KVM. It cause migration to fail between boards with different PMC -counts. - -The return value of cpuid (0xA) is different dur to cpu, according to Intel -SDNï¼18-10 Vol. 3B: - -Note: The number of general-purpose performance monitoring counters (i.e. N in -Figure 18-9) -can vary across processor generations within a processor family, across -processor families, or -could be different depending on the configuration chosen at boot time in the -BIOS regarding -Intel Hyper Threading Technology, (e.g. 
N=2 for 45 nm Intel Atom processors; N -=4 for processors -based on the Nehalem microarchitecture; for processors based on the Sandy Bridge -microarchitecture, N = 4 if Intel Hyper Threading Technology is active and N=8 -if not active). - -Also I found, N=8 if HT is not active based on the broadwellï¼, -such as CPU E7-8890 v4 @ 2.20GHz - -# ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda -/data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming -tcp::8888 -Completed 100 % -qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff -qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: -kvm_put_msrs: -Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. -Aborted - -So make number of pmc configurable to vm ? Any better idea ? - - -Regards, --Zhuang Yanying - -* Zhuangyanying (address@hidden) wrote: -> -Hi all, -> -> -Recently, I found migration failed when enable vPMU. -> -> -migrate vPMU state was introduced in linux-3.10 + qemu-1.7. -> -> -As long as enable vPMU, qemu will save / load the -> -vmstate_msr_architectural_pmu(msr_global_ctrl) register during the migration. -> -But global_ctrl generated based on cpuid(0xA), the number of general-purpose -> -performance -> -monitoring counters(PMC) can vary according to Intel SDN. The number of PMC -> -presented -> -to vm, does not support configuration currently, it depend on host cpuid, and -> -enable all pmc -> -defaultly at KVM. It cause migration to fail between boards with different -> -PMC counts. -> -> -The return value of cpuid (0xA) is different dur to cpu, according to Intel -> -SDNï¼18-10 Vol. 3B: -> -> -Note: The number of general-purpose performance monitoring counters (i.e. N -> -in Figure 18-9) -> -can vary across processor generations within a processor family, across -> -processor families, or -> -could be different depending on the configuration chosen at boot time in the -> -BIOS regarding -> -Intel Hyper Threading Technology, (e.g. 
N=2 for 45 nm Intel Atom processors; -> -N =4 for processors -> -based on the Nehalem microarchitecture; for processors based on the Sandy -> -Bridge -> -microarchitecture, N = 4 if Intel Hyper Threading Technology is active and -> -N=8 if not active). -> -> -Also I found, N=8 if HT is not active based on the broadwellï¼, -> -such as CPU E7-8890 v4 @ 2.20GHz -> -> -# ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda -> -/data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming -> -tcp::8888 -> -Completed 100 % -> -qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff -> -qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: -> -kvm_put_msrs: -> -Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. -> -Aborted -> -> -So make number of pmc configurable to vm ? Any better idea ? -Coincidentally we hit a similar problem a few days ago with -cpu host - it -took me -quite a while to spot the difference between the machines was the source -had hyperthreading disabled. - -An option to set the number of counters makes sense to me; but I wonder -how many other options we need as well. Also, I'm not sure there's any -easy way for libvirt etc to figure out how many counters a host supports - it's -not in /proc/cpuinfo. - -Dave - -> -> -Regards, -> --Zhuang Yanying --- -Dr. David Alan Gilbert / address@hidden / Manchester, UK - -On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: -> -* Zhuangyanying (address@hidden) wrote: -> -> Hi all, -> -> -> -> Recently, I found migration failed when enable vPMU. -> -> -> -> migrate vPMU state was introduced in linux-3.10 + qemu-1.7. -> -> -> -> As long as enable vPMU, qemu will save / load the -> -> vmstate_msr_architectural_pmu(msr_global_ctrl) register during the -> -> migration. -> -> But global_ctrl generated based on cpuid(0xA), the number of -> -> general-purpose performance -> -> monitoring counters(PMC) can vary according to Intel SDN. 
The number of PMC -> -> presented -> -> to vm, does not support configuration currently, it depend on host cpuid, -> -> and enable all pmc -> -> defaultly at KVM. It cause migration to fail between boards with different -> -> PMC counts. -> -> -> -> The return value of cpuid (0xA) is different dur to cpu, according to Intel -> -> SDNï¼18-10 Vol. 3B: -> -> -> -> Note: The number of general-purpose performance monitoring counters (i.e. N -> -> in Figure 18-9) -> -> can vary across processor generations within a processor family, across -> -> processor families, or -> -> could be different depending on the configuration chosen at boot time in -> -> the BIOS regarding -> -> Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom -> -> processors; N =4 for processors -> -> based on the Nehalem microarchitecture; for processors based on the Sandy -> -> Bridge -> -> microarchitecture, N = 4 if Intel Hyper Threading Technology is active and -> -> N=8 if not active). -> -> -> -> Also I found, N=8 if HT is not active based on the broadwellï¼, -> -> such as CPU E7-8890 v4 @ 2.20GHz -> -> -> -> # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda -> -> /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming -> -> tcp::8888 -> -> Completed 100 % -> -> qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff -> -> qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: -> -> kvm_put_msrs: -> -> Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. -> -> Aborted -> -> -> -> So make number of pmc configurable to vm ? Any better idea ? -> -> -Coincidentally we hit a similar problem a few days ago with -cpu host - it -> -took me -> -quite a while to spot the difference between the machines was the source -> -had hyperthreading disabled. -> -> -An option to set the number of counters makes sense to me; but I wonder -> -how many other options we need as well. 
Also, I'm not sure there's any -> -easy way for libvirt etc to figure out how many counters a host supports - -> -it's not in /proc/cpuinfo. -We actually try to avoid /proc/cpuinfo whereever possible. We do direct -CPUID asm instructions to identify features, and prefer to use -/sys/devices/system/cpu if that has suitable data - -Where do the PMC counts come from originally ? CPUID or something else ? - -Regards, -Daniel --- -|: -https://berrange.com --o- -https://www.flickr.com/photos/dberrange -:| -|: -https://libvirt.org --o- -https://fstop138.berrange.com -:| -|: -https://entangle-photo.org --o- -https://www.instagram.com/dberrange -:| - -* Daniel P. Berrange (address@hidden) wrote: -> -On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: -> -> * Zhuangyanying (address@hidden) wrote: -> -> > Hi all, -> -> > -> -> > Recently, I found migration failed when enable vPMU. -> -> > -> -> > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. -> -> > -> -> > As long as enable vPMU, qemu will save / load the -> -> > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the -> -> > migration. -> -> > But global_ctrl generated based on cpuid(0xA), the number of -> -> > general-purpose performance -> -> > monitoring counters(PMC) can vary according to Intel SDN. The number of -> -> > PMC presented -> -> > to vm, does not support configuration currently, it depend on host cpuid, -> -> > and enable all pmc -> -> > defaultly at KVM. It cause migration to fail between boards with -> -> > different PMC counts. -> -> > -> -> > The return value of cpuid (0xA) is different dur to cpu, according to -> -> > Intel SDNï¼18-10 Vol. 3B: -> -> > -> -> > Note: The number of general-purpose performance monitoring counters (i.e. 
-> -> > N in Figure 18-9) -> -> > can vary across processor generations within a processor family, across -> -> > processor families, or -> -> > could be different depending on the configuration chosen at boot time in -> -> > the BIOS regarding -> -> > Intel Hyper Threading Technology, (e.g. N=2 for 45 nm Intel Atom -> -> > processors; N =4 for processors -> -> > based on the Nehalem microarchitecture; for processors based on the Sandy -> -> > Bridge -> -> > microarchitecture, N = 4 if Intel Hyper Threading Technology is active -> -> > and N=8 if not active). -> -> > -> -> > Also I found, N=8 if HT is not active based on the broadwellï¼, -> -> > such as CPU E7-8890 v4 @ 2.20GHz -> -> > -> -> > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda -> -> > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -incoming -> -> > tcp::8888 -> -> > Completed 100 % -> -> > qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff -> -> > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: -> -> > kvm_put_msrs: -> -> > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. -> -> > Aborted -> -> > -> -> > So make number of pmc configurable to vm ? Any better idea ? -> -> -> -> Coincidentally we hit a similar problem a few days ago with -cpu host - it -> -> took me -> -> quite a while to spot the difference between the machines was the source -> -> had hyperthreading disabled. -> -> -> -> An option to set the number of counters makes sense to me; but I wonder -> -> how many other options we need as well. Also, I'm not sure there's any -> -> easy way for libvirt etc to figure out how many counters a host supports - -> -> it's not in /proc/cpuinfo. -> -> -We actually try to avoid /proc/cpuinfo whereever possible. We do direct -> -CPUID asm instructions to identify features, and prefer to use -> -/sys/devices/system/cpu if that has suitable data -> -> -Where do the PMC counts come from originally ? CPUID or something else ? 
-Yes, they're bits 8..15 of CPUID leaf 0xa - -Dave - -> -Regards, -> -Daniel -> --- -> -|: -https://berrange.com --o- -https://www.flickr.com/photos/dberrange -:| -> -|: -https://libvirt.org --o- -https://fstop138.berrange.com -:| -> -|: -https://entangle-photo.org --o- -https://www.instagram.com/dberrange -:| --- -Dr. David Alan Gilbert / address@hidden / Manchester, UK - -On Mon, Apr 24, 2017 at 11:27:16AM +0100, Dr. David Alan Gilbert wrote: -> -* Daniel P. Berrange (address@hidden) wrote: -> -> On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: -> -> > * Zhuangyanying (address@hidden) wrote: -> -> > > Hi all, -> -> > > -> -> > > Recently, I found migration failed when enable vPMU. -> -> > > -> -> > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. -> -> > > -> -> > > As long as enable vPMU, qemu will save / load the -> -> > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the -> -> > > migration. -> -> > > But global_ctrl generated based on cpuid(0xA), the number of -> -> > > general-purpose performance -> -> > > monitoring counters(PMC) can vary according to Intel SDN. The number of -> -> > > PMC presented -> -> > > to vm, does not support configuration currently, it depend on host -> -> > > cpuid, and enable all pmc -> -> > > defaultly at KVM. It cause migration to fail between boards with -> -> > > different PMC counts. -> -> > > -> -> > > The return value of cpuid (0xA) is different dur to cpu, according to -> -> > > Intel SDNï¼18-10 Vol. 3B: -> -> > > -> -> > > Note: The number of general-purpose performance monitoring counters -> -> > > (i.e. N in Figure 18-9) -> -> > > can vary across processor generations within a processor family, across -> -> > > processor families, or -> -> > > could be different depending on the configuration chosen at boot time -> -> > > in the BIOS regarding -> -> > > Intel Hyper Threading Technology, (e.g. 
N=2 for 45 nm Intel Atom -> -> > > processors; N =4 for processors -> -> > > based on the Nehalem microarchitecture; for processors based on the -> -> > > Sandy Bridge -> -> > > microarchitecture, N = 4 if Intel Hyper Threading Technology is active -> -> > > and N=8 if not active). -> -> > > -> -> > > Also I found, N=8 if HT is not active based on the broadwellï¼, -> -> > > such as CPU E7-8890 v4 @ 2.20GHz -> -> > > -> -> > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m 4096 -hda -> -> > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -> -> > > -incoming tcp::8888 -> -> > > Completed 100 % -> -> > > qemu-system-x86_64: error: failed to set MSR 0x38f to 0x7000000ff -> -> > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: -> -> > > kvm_put_msrs: -> -> > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. -> -> > > Aborted -> -> > > -> -> > > So make number of pmc configurable to vm ? Any better idea ? -> -> > -> -> > Coincidentally we hit a similar problem a few days ago with -cpu host - -> -> > it took me -> -> > quite a while to spot the difference between the machines was the source -> -> > had hyperthreading disabled. -> -> > -> -> > An option to set the number of counters makes sense to me; but I wonder -> -> > how many other options we need as well. Also, I'm not sure there's any -> -> > easy way for libvirt etc to figure out how many counters a host supports - -> -> > it's not in /proc/cpuinfo. -> -> -> -> We actually try to avoid /proc/cpuinfo whereever possible. We do direct -> -> CPUID asm instructions to identify features, and prefer to use -> -> /sys/devices/system/cpu if that has suitable data -> -> -> -> Where do the PMC counts come from originally ? CPUID or something else ? -> -> -Yes, they're bits 8..15 of CPUID leaf 0xa -Ok, that's easy enough for libvirt to detect then. More a question of what -libvirt should then do this with the info.... 
- -Regards, -Daniel --- -|: -https://berrange.com --o- -https://www.flickr.com/photos/dberrange -:| -|: -https://libvirt.org --o- -https://fstop138.berrange.com -:| -|: -https://entangle-photo.org --o- -https://www.instagram.com/dberrange -:| - -> ------Original Message----- -> -From: Daniel P. Berrange [ -mailto:address@hidden -> -Sent: Monday, April 24, 2017 6:34 PM -> -To: Dr. David Alan Gilbert -> -Cc: Zhuangyanying; Zhanghailiang; wangxin (U); address@hidden; -> -Gonglei (Arei); Huangzhichao; address@hidden -> -Subject: Re: [Qemu-devel] [BUG] Migrate failes between boards with different -> -PMC counts -> -> -On Mon, Apr 24, 2017 at 11:27:16AM +0100, Dr. David Alan Gilbert wrote: -> -> * Daniel P. Berrange (address@hidden) wrote: -> -> > On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: -> -> > > * Zhuangyanying (address@hidden) wrote: -> -> > > > Hi all, -> -> > > > -> -> > > > Recently, I found migration failed when enable vPMU. -> -> > > > -> -> > > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. -> -> > > > -> -> > > > As long as enable vPMU, qemu will save / load the -> -> > > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the -> -migration. -> -> > > > But global_ctrl generated based on cpuid(0xA), the number of -> -> > > > general-purpose performance monitoring counters(PMC) can vary -> -> > > > according to Intel SDN. The number of PMC presented to vm, does -> -> > > > not support configuration currently, it depend on host cpuid, and -> -> > > > enable -> -all pmc defaultly at KVM. It cause migration to fail between boards with -> -different PMC counts. -> -> > > > -> -> > > > The return value of cpuid (0xA) is different dur to cpu, according to -> -> > > > Intel -> -SDNï¼18-10 Vol. 3B: -> -> > > > -> -> > > > Note: The number of general-purpose performance monitoring -> -> > > > counters (i.e. 
N in Figure 18-9) can vary across processor -> -> > > > generations within a processor family, across processor -> -> > > > families, or could be different depending on the configuration -> -> > > > chosen at boot time in the BIOS regarding Intel Hyper Threading -> -> > > > Technology, (e.g. N=2 for 45 nm Intel Atom processors; N =4 for -> -processors based on the Nehalem microarchitecture; for processors based on -> -the Sandy Bridge microarchitecture, N = 4 if Intel Hyper Threading Technology -> -is active and N=8 if not active). -> -> > > > -> -> > > > Also I found, N=8 if HT is not active based on the broadwellï¼, -> -> > > > such as CPU E7-8890 v4 @ 2.20GHz -> -> > > > -> -> > > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m -> -> > > > 4096 -hda -> -> > > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -> -> > > > -incoming tcp::8888 Completed 100 % -> -> > > > qemu-system-x86_64: error: failed to set MSR 0x38f to -> -> > > > 0x7000000ff -> -> > > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: -> -kvm_put_msrs: -> -> > > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. -> -> > > > Aborted -> -> > > > -> -> > > > So make number of pmc configurable to vm ? Any better idea ? -> -> > > -> -> > > Coincidentally we hit a similar problem a few days ago with -cpu -> -> > > host - it took me quite a while to spot the difference between -> -> > > the machines was the source had hyperthreading disabled. -> -> > > -> -> > > An option to set the number of counters makes sense to me; but I -> -> > > wonder how many other options we need as well. Also, I'm not sure -> -> > > there's any easy way for libvirt etc to figure out how many -> -> > > counters a host supports - it's not in /proc/cpuinfo. -> -> > -> -> > We actually try to avoid /proc/cpuinfo whereever possible. 
We do -> -> > direct CPUID asm instructions to identify features, and prefer to -> -> > use /sys/devices/system/cpu if that has suitable data -> -> > -> -> > Where do the PMC counts come from originally ? CPUID or something -> -else ? -> -> -> -> Yes, they're bits 8..15 of CPUID leaf 0xa -> -> -Ok, that's easy enough for libvirt to detect then. More a question of what -> -libvirt -> -should then do this with the info.... -> -Do you mean to do a validation at the begining of migration? in -qemuMigrationBakeCookie() & qemuMigrationEatCookie(), if the PMC numbers are -not equal, just quit migration? -It maybe a good enough first edition. -But for a further better edition, maybe it's better to support Heterogeneous -migration I think, so we might need to make PMC number configrable, then we -need to modify KVM/qemu as well. - -Regards, --Zhuang Yanying - -* Zhuangyanying (address@hidden) wrote: -> -> -> -> -----Original Message----- -> -> From: Daniel P. Berrange [ -mailto:address@hidden -> -> Sent: Monday, April 24, 2017 6:34 PM -> -> To: Dr. David Alan Gilbert -> -> Cc: Zhuangyanying; Zhanghailiang; wangxin (U); address@hidden; -> -> Gonglei (Arei); Huangzhichao; address@hidden -> -> Subject: Re: [Qemu-devel] [BUG] Migrate failes between boards with different -> -> PMC counts -> -> -> -> On Mon, Apr 24, 2017 at 11:27:16AM +0100, Dr. David Alan Gilbert wrote: -> -> > * Daniel P. Berrange (address@hidden) wrote: -> -> > > On Mon, Apr 24, 2017 at 10:23:21AM +0100, Dr. David Alan Gilbert wrote: -> -> > > > * Zhuangyanying (address@hidden) wrote: -> -> > > > > Hi all, -> -> > > > > -> -> > > > > Recently, I found migration failed when enable vPMU. -> -> > > > > -> -> > > > > migrate vPMU state was introduced in linux-3.10 + qemu-1.7. -> -> > > > > -> -> > > > > As long as enable vPMU, qemu will save / load the -> -> > > > > vmstate_msr_architectural_pmu(msr_global_ctrl) register during the -> -> migration. 
-> -> > > > > But global_ctrl generated based on cpuid(0xA), the number of -> -> > > > > general-purpose performance monitoring counters(PMC) can vary -> -> > > > > according to Intel SDN. The number of PMC presented to vm, does -> -> > > > > not support configuration currently, it depend on host cpuid, and -> -> > > > > enable -> -> all pmc defaultly at KVM. It cause migration to fail between boards with -> -> different PMC counts. -> -> > > > > -> -> > > > > The return value of cpuid (0xA) is different dur to cpu, according -> -> > > > > to Intel -> -> SDNï¼18-10 Vol. 3B: -> -> > > > > -> -> > > > > Note: The number of general-purpose performance monitoring -> -> > > > > counters (i.e. N in Figure 18-9) can vary across processor -> -> > > > > generations within a processor family, across processor -> -> > > > > families, or could be different depending on the configuration -> -> > > > > chosen at boot time in the BIOS regarding Intel Hyper Threading -> -> > > > > Technology, (e.g. N=2 for 45 nm Intel Atom processors; N =4 for -> -> processors based on the Nehalem microarchitecture; for processors based on -> -> the Sandy Bridge microarchitecture, N = 4 if Intel Hyper Threading -> -> Technology -> -> is active and N=8 if not active). -> -> > > > > -> -> > > > > Also I found, N=8 if HT is not active based on the broadwellï¼, -> -> > > > > such as CPU E7-8890 v4 @ 2.20GHz -> -> > > > > -> -> > > > > # ./x86_64-softmmu/qemu-system-x86_64 --enable-kvm -smp 4 -m -> -> > > > > 4096 -hda -> -> > > > > /data/zyy/test_qemu.img.sles12sp1 -vnc :99 -cpu kvm64,pmu=true -> -> > > > > -incoming tcp::8888 Completed 100 % -> -> > > > > qemu-system-x86_64: error: failed to set MSR 0x38f to -> -> > > > > 0x7000000ff -> -> > > > > qemu-system-x86_64: /data/zyy/git/test/qemu/target/i386/kvm.c:1833: -> -> kvm_put_msrs: -> -> > > > > Assertion `ret == cpu->kvm_msr_buf->nmsrs' failed. -> -> > > > > Aborted -> -> > > > > -> -> > > > > So make number of pmc configurable to vm ? 
Any better idea ? -> -> > > > -> -> > > > Coincidentally we hit a similar problem a few days ago with -cpu -> -> > > > host - it took me quite a while to spot the difference between -> -> > > > the machines was the source had hyperthreading disabled. -> -> > > > -> -> > > > An option to set the number of counters makes sense to me; but I -> -> > > > wonder how many other options we need as well. Also, I'm not sure -> -> > > > there's any easy way for libvirt etc to figure out how many -> -> > > > counters a host supports - it's not in /proc/cpuinfo. -> -> > > -> -> > > We actually try to avoid /proc/cpuinfo whereever possible. We do -> -> > > direct CPUID asm instructions to identify features, and prefer to -> -> > > use /sys/devices/system/cpu if that has suitable data -> -> > > -> -> > > Where do the PMC counts come from originally ? CPUID or something -> -> else ? -> -> > -> -> > Yes, they're bits 8..15 of CPUID leaf 0xa -> -> -> -> Ok, that's easy enough for libvirt to detect then. More a question of what -> -> libvirt -> -> should then do this with the info.... -> -> -> -> -Do you mean to do a validation at the begining of migration? in -> -qemuMigrationBakeCookie() & qemuMigrationEatCookie(), if the PMC numbers are -> -not equal, just quit migration? -> -It maybe a good enough first edition. -> -But for a further better edition, maybe it's better to support Heterogeneous -> -migration I think, so we might need to make PMC number configrable, then we -> -need to modify KVM/qemu as well. -Yes agreed; the only thing I wanted to check was that libvirt would have enough -information to be able to use any feature we added to QEMU. - -Dave - -> -Regards, -> --Zhuang Yanying --- -Dr. David Alan Gilbert / address@hidden / Manchester, UK - |
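The last thread above settles on "bits 8..15 of CPUID leaf 0xa" as the source of the PMC count. A minimal sketch of that extraction follows, together with one plausible reading of why writing 0x7000000ff to MSR 0x38f fails on the destination. The function names are hypothetical and appear neither in the thread nor in QEMU. The fixed-counter field (EDX bits 0..4) and the IA32_PERF_GLOBAL_CTRL bit layout are taken from the architectural performance monitoring description in the Intel SDM, not from the thread. `__get_cpuid()` is the helper from the GCC/Clang `<cpuid.h>` header.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang header providing __get_cpuid() */
#endif

/* General-purpose PMC count: bits 8..15 of EAX from CPUID leaf 0xA. */
unsigned pmc_count_from_eax(uint32_t eax)
{
    return (eax >> 8) & 0xffu;
}

/* Fixed-function counter count: bits 0..4 of EDX from CPUID leaf 0xA. */
unsigned fixed_count_from_edx(uint32_t edx)
{
    return edx & 0x1fu;
}

/*
 * Bits that may be set in IA32_PERF_GLOBAL_CTRL (MSR 0x38f):
 * bits 0..n_gp-1 enable the general-purpose counters,
 * bits 32..32+n_fixed-1 enable the fixed-function counters.
 */
uint64_t global_ctrl_mask(unsigned n_gp, unsigned n_fixed)
{
    assert(n_gp <= 32 && n_fixed <= 32);
    uint64_t gp = (n_gp == 0) ? 0 : (UINT64_MAX >> (64 - n_gp));
    uint64_t fixed = (n_fixed == 0) ? 0 : (UINT64_MAX >> (64 - n_fixed));
    return gp | (fixed << 32);
}

/*
 * Hypothetical helper: the host's general-purpose PMC count,
 * or -1 when not built for x86 or when leaf 0xA is not available.
 */
int host_pmc_count(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax, ebx, ecx, edx;
    if (__get_cpuid(0xa, &eax, &ebx, &ecx, &edx)) {
        return (int)pmc_count_from_eax(eax);
    }
#endif
    return -1;
}
```

Under these assumptions, a source with 8 general-purpose and 3 fixed counters yields `global_ctrl_mask(8, 3) == 0x7000000ff`, which is exactly the value in the error message. A destination with only 4 general-purpose counters accepts at most `0x70000000f`, so bits 4..7 of the migrated value would fall outside what that host exposes.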